Gaussian Processes

Gaussian process regression module suited to learn and predict energies and forces

Example:

gp = GaussianProcess(kernel, noise)
gp.fit(train_configurations, train_forces)
gp.predict(test_configurations)
class mff.gp.GaussianProcess(kernel=None, noise=1e-10, optimizer=None, n_restarts_optimizer=0)

Gaussian process class Class of GP regression of QM energies and forces

Parameters:
  • kernel (obj) – A kernel object (typically a two or three body)
  • noise (float) – The regularising noise level (typically named sigma_n^2)
  • optimizer (str) – The kind of optimization of marginal likelihood (not implemented yet)
X_train_

list – The configurations used for training

alpha_

array – The coefficients obtained during training

L_

array – The lower triangular matrix from cholesky decomposition of gram matrix

K

array – The kernel gram matrix

calc_gram_ee(X)

Calculate the force-force kernel gram matrix

Parameters:X (list) – list of N training configurations, which are M x 5 matrices
Returns:The energy energy gram matrix, has dimensions N x N
Return type:K (matrix)
calc_gram_ff(X)

Calculate the force-force kernel gram matrix

Parameters:X (list) – list of N training configurations, which are M x 5 matrices
Returns:The force-force gram matrix, has dimensions 3N x 3N
Return type:K (matrix)
fit(X, y, nnodes=1)

Fit a Gaussian process regression model on training forces

Parameters:
  • X (list) – training configurations
  • y (np.ndarray) – training forces
  • nnodes (int) – number of CPU workers to use, default is 1
fit_energy(X, y, nnodes=1)

Fit a Gaussian process regression model using local energies.

Parameters:
  • X (list) – training configurations
  • y (np.ndarray) – training energies
  • nnodes (int) – number of CPU workers to use, default is 1
  • energy of each configuration is E/N where E is the total (The) –
  • energy and N the atoms in that snapshot (snapshot) –
fit_force_and_energy(X, y_force, y_energy, nnodes=1)

Fit a Gaussian process regression model using forces and energies

Parameters:
  • X (list) – training configurations
  • y_force (np.ndarray) – training forces
  • y_energy (np.ndarray) – training local energies
  • nnodes (int) – number of CPU workers to use, default is 1
fit_update(X2_up, y2_up, nnodes=1)

Update an existing energy-energy gram matrix with a list of new datapoints

Parameters:
  • X2_up (list) – training configurations
  • y2_up (np.ndarray) – training forces
  • nnodes (int) – number of CPU workers to use, default is 1
fit_update_energy(X2_up, y2_up, nnodes=1)

Update an existing energy-energy gram matrix with a list of new datapoints

Parameters:
  • X2_up (list) – training configurations
  • y2_up (np.ndarray) – training energies
  • nnodes (int) – number of CPU workers to use, default is 1
fit_update_single(X2_up, y2_up, nnodes=1)

Update an existing force-force gram matrix with a single new datapoint

fit_update_single_energy(X2_up, y2_up, nnodes=1)

Update an existing energy-energy gram matrix with a single new datapoint

load(filename)

Load a saved GP model

Parameters:filename (str) – name of the file where the GP is saved
log_marginal_likelihood(theta=None, eval_gradient=False)

Returns log-marginal likelihood of theta for training data.

Parameters:
  • theta – array-like, shape = (n_kernel_params,) or None Kernel hyperparameters for which the log-marginal likelihood is evaluated. If None, the precomputed log_marginal_likelihood of self.kernel_.theta is returned.
  • eval_gradient – bool, default: False If True, the gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta is returned additionally. If True, theta must not be None.
Returns:

float

Log-marginal likelihood of theta for training data.

log_likelihood_gradient : array, shape = (n_kernel_params,), optional

Gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta. Only returned when eval_gradient is True.

Return type:

log_likelihood

predict(X, return_std=False)

Predict forces using the Gaussian process regression model

We can also predict based on an unfitted model by using the GP prior. In addition to the mean of the predictive distribution, also its standard deviation (return_std=True)

Parameters:
  • X (list) – Target configurations where the GP is evaluated
  • return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.
Returns:

Mean of predictive distribution at target configurations. y_std (np.ndarray): Standard deviation of predictive distribution at target

configurations. Only returned when return_std is True.

Return type:

y_mean (np.ndarray)

predict_energy(X, return_std=False)

Predict energies using the Gaussian process regression model

We can also predict based on an unfitted model by using the GP prior. In addition to the mean of the predictive distribution, also its standard deviation (return_std=True)

Parameters:
  • X (list) – Target configurations where the GP is evaluated
  • return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.
Returns:

Mean of predictive distribution at target configurations. y_std (np.ndarray): Standard deviation of predictive distribution at target

configurations. Only returned when return_std is True.

Return type:

y_mean (np.ndarray)

predict_energy_map(X, return_std=False)

Predict energies using the Gaussian process regression model

We can also predict based on an unfitted model by using the GP prior. In addition to the mean of the predictive distribution, also its standard deviation (return_std=True)

Parameters:
  • X (list) – Target configurations where the GP is evaluated
  • return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.
Returns:

Mean of predictive distribution at target configurations. y_std (np.ndarray): Standard deviation of predictive distribution at target

configurations. Only returned when return_std is True.

Return type:

y_mean (np.ndarray)

predict_energy_single(X, return_std=False)

Predict energies from forces only using the Gaussian process regression model

This function evaluates the GP energies for a set of test configurations.

Parameters:
  • X (np.ndarray) – Target configurations where the GP is evaluated
  • return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.
Returns:

Mean of predictive distribution at target configurations. y_std (np.ndarray): Standard deviation of predictive distribution at target

configurations. Only returned when return_std is True.

Return type:

y_mean (np.ndarray)

predict_energy_single_map(X, return_std=False)

Predict energies from forces only using the Gaussian process regression model

This function evaluates the GP energies for a set of test configurations.

Parameters:
  • X (np.ndarray) – Target configurations where the GP is evaluated
  • return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.
Returns:

Mean of predictive distribution at target configurations. y_std (np.ndarray): Standard deviation of predictive distribution at target

configurations. Only returned when return_std is True.

Return type:

y_mean (np.ndarray)

predict_single(X, return_std=False)

Predict forces using the Gaussian process regression model

We can also predict based on an unfitted model by using the GP prior. In addition to the mean of the predictive distribution, also its standard deviation (return_std=True)

Parameters:
  • X (np.ndarray) – Target configuration where the GP is evaluated
  • return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.
Returns:

Mean of predictive distribution at target configurations. y_std (np.ndarray): Standard deviation of predictive distribution at target

configurations. Only returned when return_std is True.

Return type:

y_mean (np.ndarray)

pseudo_log_likelihood()

Returns pseudo log-likelihood of the training data.

Parameters:
  • theta – array-like, shape = (n_kernel_params,) or None Kernel hyperparameters for which the log-marginal likelihood is evaluated. If None, the precomputed log_marginal_likelihood of self.kernel_.theta is returned.
  • eval_gradient – bool, default: False If True, the gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta is returned additionally. If True, theta must not be None.
Returns:

float

Log-marginal likelihood of theta for training data.

log_likelihood_gradient : array, shape = (n_kernel_params,), optional

Gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta. Only returned when eval_gradient is True.

Return type:

log_likelihood

save(filename)

Dump the current GP model for later use

Parameters:filename (str) – name of the file where to save the GP
class mff.gp.ThreeBodySingleSpeciesGP(theta, noise=1e-10, optimizer=None, n_restarts_optimizer=0)
build_grid(dists, element1)

Function that builds and predicts energies on a cube of values

class mff.gp.TwoBodySingleSpeciesGP(theta, noise=1e-10, optimizer=None, n_restarts_optimizer=0)

Two Body Kernel

Module that contains the expressions for the 2-body single-species and multi-species kernel. The module uses the Theano package to create the energy-energy, force-energy and force-force kernels through automatic differentiation of the energy-energy kernel. The module is used to calculate the energy-energy, energy-force and force-force gram matrices for the Gaussian processes, and supports multi processing. The module is called by the gp.py script.

Example:

from twobodykernel import TwoBodySingleSpeciesKernel
kernel = kernels.TwoBodySingleSpeciesKernel(theta=[sigma, theta, r_cut])
ee_gram_matrix = kernel.calc_gram_e(training_configurations, number_nodes)
class mff.kernels.twobodykernel.BaseTwoBody(kernel_name, theta, bounds)

Two body kernel class Handles the functions common to the single-species and multi-species two-body kernels.

Parameters:
  • kernel_name (str) – To choose between single- and two-species kernel
  • theta[0] (float) – lengthscale of the kernel
  • theta[1] (float) – decay rate of the cutoff function
  • theta[2] (float) – cutoff radius
  • bounds (list) – bounds of the kernel function.
k2_ee

object – Energy-energy kernel function

k2_ef

object – Energy-force kernel function

k2_ef_loc

object – Local Energy-force kernel function

k2_ff

object – Force-force kernel function

calc(X1, X2)

Calculate the force-force kernel between two sets of configurations.

Parameters:
  • X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
  • X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns:

N1*3 x N2*3 matrix of the matrix-valued kernels

Return type:

K (matrix)

calc_ee(X1, X2)

Calculate the energy-energy kernel between two sets of configurations.

Parameters:
  • X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
  • X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns:

N1 x N2 matrix of the scalar-valued kernels

Return type:

K (matrix)

calc_ef(X1, X2)

Calculate the energy-force kernel between two sets of configurations.

Parameters:
  • X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
  • X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns:

N1 x N2*3 matrix of the vector-valued kernels

Return type:

K (matrix)

calc_ef_loc(X1, X2)

Calculate the local energy-force kernel between two sets of configurations. Used only during mapping since it is faster than calc_ef and equivalent in that case.

Parameters:
  • X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
  • X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns:

N1 x N2*3 matrix of the vector-valued kernels

Return type:

K (matrix)

calc_gram(X, nnodes=1, eval_gradient=False)

Calculate the force-force gram matrix for a set of configurations X.

Parameters:
  • X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
  • nnodes (int) – Number of CPU nodes to use for multiprocessing (default is 1)
  • eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
Returns:

N*3 x N*3 gram matrix of the matrix-valued kernels

Return type:

gram (matrix)

calc_gram_e(X, nnodes=1, eval_gradient=False)

Calculate the energy-energy gram matrix for a set of configurations X.

Parameters:
  • X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
  • nnodes (int) – Number of CPU nodes to use for multiprocessing (default is 1)
  • eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
Returns:

N x N gram matrix of the scalar-valued kernels

Return type:

gram (matrix)

calc_gram_ef(X, nnodes=1, eval_gradient=False)

Calculate the energy-force gram matrix for a set of configurations X. This returns a non-symmetric matrix which is equal to the transpose of the force-energy gram matrix.

Parameters:
  • X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
  • nnodes (int) – Number of CPU nodes to use for multiprocessing (default is 1)
  • eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
Returns:

N x N*3 gram matrix of the vector-valued kernels

Return type:

gram (matrix)

class mff.kernels.twobodykernel.TwoBodySingleSpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))

Two body single species kernel.

Parameters:
  • theta[0] (float) – lengthscale of the kernel
  • theta[1] (float) – decay rate of the cutoff function
  • theta[2] (float) – cutoff radius
static compile_theano()

This function generates theano compiled kernels for global energy and force learning

The position of the atoms relative to the central one, and their chemical species are defined by a matrix of dimension Mx5 here called r1 and r2.

Returns:energy-energy kernel k2_ef (func): energy-force kernel k2_ff (func): force-force kernel
Return type:k2_ee (func)
class mff.kernels.twobodykernel.TwoBodyTwoSpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))

Two body two species kernel.

Parameters:
  • theta[0] (float) – lengthscale of the kernel
  • theta[1] (float) – decay rate of the cutoff function
  • theta[2] (float) – cutoff radius
static compile_theano()

This function generates theano compiled kernels for global energy and force learning

The position of the atoms relative to the central one, and their chemical species are defined by a matrix of dimension Mx5 here called r1 and r2.

Returns:energy-energy kernel k2_ef (func): energy-force kernel k2_ff (func): force-force kernel
Return type:k2_ee (func)

Three Body Kernel

Module that contains the expressions for the 3-body single-species and multi-species kernel. The module uses the Theano package to create the energy-energy, force-energy and force-force kernels through automatic differentiation of the energy-energy kernel. The module is used to calculate the energy-energy, energy-force and force-force gram matrices for the Gaussian processes, and supports multi processing. The module is called by the gp.py script.

Example:

from threebodykernel import ThreeBodySingleSpeciesKernel
kernel = kernels.ThreeBodySingleSpeciesKernel(theta=[sigma, theta, r_cut])
ee_gram_matrix = kernel.calc_gram_e(training_configurations, number_nodes)
class mff.kernels.threebodykernel.BaseThreeBody(kernel_name, theta, bounds)

Three body kernel class Handles the functions common to the single-species and multi-species three-body kernels.

Parameters:
  • kernel_name (str) – To choose between single- and two-species kernel
  • theta[0] (float) – lengthscale of the kernel
  • theta[1] (float) – decay rate of the cutoff function
  • theta[2] (float) – cutoff radius
  • bounds (list) – bounds of the kernel function.
k3_ee

object – Energy-energy kernel function

k3_ef

object – Energy-force kernel function

k3_ef_loc

object – Local Energy-force kernel function

k3_ff

object – Force-force kernel function

calc(X1, X2)

Calculate the force-force kernel between two sets of configurations.

Parameters:
  • X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
  • X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns:

N1*3 x N2*3 matrix of the matrix-valued kernels

Return type:

K (matrix)

calc_ee(X1, X2)

Calculate the energy-energy kernel between two sets of configurations.

Parameters:
  • X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
  • X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns:

N1 x N2 matrix of the scalar-valued kernels

Return type:

K (matrix)

calc_ef(X1, X2)

Calculate the energy-force kernel between two sets of configurations.

Parameters:
  • X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
  • X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns:

N1 x N2*3 matrix of the vector-valued kernels

Return type:

K (matrix)

calc_ef_loc(X1, X2)

Calculate the local energy-force kernel between two sets of configurations. Used only during mapping since it is faster than calc_ef and equivalent in that case.

Parameters:
  • X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
  • X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns:

N1 x N2*3 matrix of the vector-valued kernels

Return type:

K (matrix)

calc_gram(X, nnodes=1, eval_gradient=False)

Calculate the force-force gram matrix for a set of configurations X.

Parameters:
  • X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
  • nnodes (int) – Number of CPU nodes to use for multiprocessing (default is 1)
  • eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
Returns:

N*3 x N*3 gram matrix of the matrix-valued kernels

Return type:

gram (matrix)

calc_gram_e(X, nnodes=1, eval_gradient=False)

Calculate the energy-energy gram matrix for a set of configurations X.

Parameters:
  • X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
  • nnodes (int) – Number of CPU nodes to use for multiprocessing (default is 1)
  • eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
Returns:

N x N gram matrix of the scalar-valued kernels

Return type:

gram (matrix)

calc_gram_ef(X, nnodes=1, eval_gradient=False)

Calculate the energy-force gram matrix for a set of configurations X. This returns a non-symmetric matrix which is equal to the transpose of the force-energy gram matrix.

Parameters:
  • X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
  • nnodes (int) – Number of CPU nodes to use for multiprocessing (default is 1)
  • eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
Returns:

N x N*3 gram matrix of the vector-valued kernels

Return type:

gram (matrix)

class mff.kernels.threebodykernel.ThreeBodySingleSpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))

Three body two species kernel.

Parameters:
  • theta[0] (float) – lengthscale of the kernel
  • theta[1] (float) – decay rate of the cutoff function
  • theta[2] (float) – cutoff radius
static compile_theano()

This function generates theano compiled kernels for energy and force learning ker_jkmn_withcutoff = ker_jkmn #* cutoff_ikmn

The position of the atoms relative to the centrla one, and their chemical species are defined by a matrix of dimension Mx5

Returns:energy-energy kernel k3_ef (func): energy-force kernel k3_ff (func): force-force kernel
Return type:k3_ee (func)
class mff.kernels.threebodykernel.ThreeBodyTwoSpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))

Three body two species kernel.

Parameters:
  • theta[0] (float) – lengthscale of the kernel
  • theta[1] (float) – decay rate of the cutoff function
  • theta[2] (float) – cutoff radius
static compile_theano()

This function generates theano compiled kernels for energy and force learning ker_jkmn_withcutoff = ker_jkmn #* cutoff_ikmn

The position of the atoms relative to the centrla one, and their chemical species are defined by a matrix of dimension Mx5

Returns:energy-energy kernel k3_ef (func): energy-force kernel k3_ff (func): force-force kernel
Return type:k3_ee (func)