Gaussian Processes¶
Gaussian process regression module suited to learn and predict energies and forces
Example:
gp = GaussianProcess(kernel, noise)
gp.fit(train_configurations, train_forces)
gp.predict(test_configurations)
-
class
mff.gp.GaussianProcess(kernel=None, noise=1e-10, optimizer=None, n_restarts_optimizer=0)¶ Gaussian process class Class of GP regression of QM energies and forces
Parameters: - kernel (obj) – A kernel object (typically a two or three body)
- noise (float) – The regularising noise level (typically named sigma_n^2)
- optimizer (str) – The kind of optimization of marginal likelihood (not implemented yet)
-
X_train_¶ list – The configurations used for training
-
alpha_¶ array – The coefficients obtained during training
-
L_¶ array – The lower triangular matrix from cholesky decomposition of gram matrix
-
K¶ array – The kernel gram matrix
-
calc_gram_ee(X)¶ Calculate the force-force kernel gram matrix
Parameters: X (list) – list of N training configurations, which are M x 5 matrices Returns: The energy energy gram matrix, has dimensions N x N Return type: K (matrix)
-
calc_gram_ff(X)¶ Calculate the force-force kernel gram matrix
Parameters: X (list) – list of N training configurations, which are M x 5 matrices Returns: The force-force gram matrix, has dimensions 3N x 3N Return type: K (matrix)
-
fit(X, y, nnodes=1)¶ Fit a Gaussian process regression model on training forces
Parameters: - X (list) – training configurations
- y (np.ndarray) – training forces
- nnodes (int) – number of CPU workers to use, default is 1
-
fit_energy(X, y, nnodes=1)¶ Fit a Gaussian process regression model using local energies.
Parameters: - X (list) – training configurations
- y (np.ndarray) – training energies
- nnodes (int) – number of CPU workers to use, default is 1
- energy of each configuration is E/N where E is the total (The) –
- energy and N the atoms in that snapshot (snapshot) –
-
fit_force_and_energy(X, y_force, y_energy, nnodes=1)¶ Fit a Gaussian process regression model using forces and energies
Parameters: - X (list) – training configurations
- y_force (np.ndarray) – training forces
- y_energy (np.ndarray) – training local energies
- nnodes (int) – number of CPU workers to use, default is 1
-
fit_update(X2_up, y2_up, nnodes=1)¶ Update an existing energy-energy gram matrix with a list of new datapoints
Parameters: - X2_up (list) – training configurations
- y2_up (np.ndarray) – training forces
- nnodes (int) – number of CPU workers to use, default is 1
-
fit_update_energy(X2_up, y2_up, nnodes=1)¶ Update an existing energy-energy gram matrix with a list of new datapoints
Parameters: - X2_up (list) – training configurations
- y2_up (np.ndarray) – training energies
- nnodes (int) – number of CPU workers to use, default is 1
-
fit_update_single(X2_up, y2_up, nnodes=1)¶ Update an existing force-force gram matrix with a single new datapoint
-
fit_update_single_energy(X2_up, y2_up, nnodes=1)¶ Update an existing energy-energy gram matrix with a single new datapoint
-
load(filename)¶ Load a saved GP model
Parameters: filename (str) – name of the file where the GP is saved
-
log_marginal_likelihood(theta=None, eval_gradient=False)¶ Returns log-marginal likelihood of theta for training data.
Parameters: - theta – array-like, shape = (n_kernel_params,) or None
Kernel hyperparameters for which the log-marginal likelihood is
evaluated. If None, the precomputed log_marginal_likelihood
of
self.kernel_.thetais returned. - eval_gradient – bool, default: False If True, the gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta is returned additionally. If True, theta must not be None.
Returns: - float
Log-marginal likelihood of theta for training data.
- log_likelihood_gradient : array, shape = (n_kernel_params,), optional
Gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta. Only returned when eval_gradient is True.
Return type: log_likelihood
- theta – array-like, shape = (n_kernel_params,) or None
Kernel hyperparameters for which the log-marginal likelihood is
evaluated. If None, the precomputed log_marginal_likelihood
of
-
predict(X, return_std=False)¶ Predict forces using the Gaussian process regression model
We can also predict based on an unfitted model by using the GP prior. In addition to the mean of the predictive distribution, also its standard deviation (return_std=True)
Parameters: - X (list) – Target configurations where the GP is evaluated
- return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.
Returns: Mean of predictive distribution at target configurations. y_std (np.ndarray): Standard deviation of predictive distribution at target
configurations. Only returned when return_std is True.
Return type: y_mean (np.ndarray)
-
predict_energy(X, return_std=False)¶ Predict energies using the Gaussian process regression model
We can also predict based on an unfitted model by using the GP prior. In addition to the mean of the predictive distribution, also its standard deviation (return_std=True)
Parameters: - X (list) – Target configurations where the GP is evaluated
- return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.
Returns: Mean of predictive distribution at target configurations. y_std (np.ndarray): Standard deviation of predictive distribution at target
configurations. Only returned when return_std is True.
Return type: y_mean (np.ndarray)
-
predict_energy_map(X, return_std=False)¶ Predict energies using the Gaussian process regression model
We can also predict based on an unfitted model by using the GP prior. In addition to the mean of the predictive distribution, also its standard deviation (return_std=True)
Parameters: - X (list) – Target configurations where the GP is evaluated
- return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.
Returns: Mean of predictive distribution at target configurations. y_std (np.ndarray): Standard deviation of predictive distribution at target
configurations. Only returned when return_std is True.
Return type: y_mean (np.ndarray)
-
predict_energy_single(X, return_std=False)¶ Predict energies from forces only using the Gaussian process regression model
This function evaluates the GP energies for a set of test configurations.
Parameters: - X (np.ndarray) – Target configurations where the GP is evaluated
- return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.
Returns: Mean of predictive distribution at target configurations. y_std (np.ndarray): Standard deviation of predictive distribution at target
configurations. Only returned when return_std is True.
Return type: y_mean (np.ndarray)
-
predict_energy_single_map(X, return_std=False)¶ Predict energies from forces only using the Gaussian process regression model
This function evaluates the GP energies for a set of test configurations.
Parameters: - X (np.ndarray) – Target configurations where the GP is evaluated
- return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.
Returns: Mean of predictive distribution at target configurations. y_std (np.ndarray): Standard deviation of predictive distribution at target
configurations. Only returned when return_std is True.
Return type: y_mean (np.ndarray)
-
predict_single(X, return_std=False)¶ Predict forces using the Gaussian process regression model
We can also predict based on an unfitted model by using the GP prior. In addition to the mean of the predictive distribution, also its standard deviation (return_std=True)
Parameters: - X (np.ndarray) – Target configuration where the GP is evaluated
- return_std (bool) – If True, the standard-deviation of the predictive distribution of the target configurations is returned along with the mean.
Returns: Mean of predictive distribution at target configurations. y_std (np.ndarray): Standard deviation of predictive distribution at target
configurations. Only returned when return_std is True.
Return type: y_mean (np.ndarray)
-
pseudo_log_likelihood()¶ Returns pseudo log-likelihood of the training data.
Parameters: - theta – array-like, shape = (n_kernel_params,) or None
Kernel hyperparameters for which the log-marginal likelihood is
evaluated. If None, the precomputed log_marginal_likelihood
of
self.kernel_.thetais returned. - eval_gradient – bool, default: False If True, the gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta is returned additionally. If True, theta must not be None.
Returns: - float
Log-marginal likelihood of theta for training data.
- log_likelihood_gradient : array, shape = (n_kernel_params,), optional
Gradient of the log-marginal likelihood with respect to the kernel hyperparameters at position theta. Only returned when eval_gradient is True.
Return type: log_likelihood
- theta – array-like, shape = (n_kernel_params,) or None
Kernel hyperparameters for which the log-marginal likelihood is
evaluated. If None, the precomputed log_marginal_likelihood
of
-
save(filename)¶ Dump the current GP model for later use
Parameters: filename (str) – name of the file where to save the GP
-
class
mff.gp.ThreeBodySingleSpeciesGP(theta, noise=1e-10, optimizer=None, n_restarts_optimizer=0)¶ -
build_grid(dists, element1)¶ Function that builds and predicts energies on a cube of values
-
-
class
mff.gp.TwoBodySingleSpeciesGP(theta, noise=1e-10, optimizer=None, n_restarts_optimizer=0)¶
Two Body Kernel¶
Module that contains the expressions for the 2-body single-species and multi-species kernel. The module uses the Theano package to create the energy-energy, force-energy and force-force kernels through automatic differentiation of the energy-energy kernel. The module is used to calculate the energy-energy, energy-force and force-force gram matrices for the Gaussian processes, and supports multi processing. The module is called by the gp.py script.
Example:
from twobodykernel import TwoBodySingleSpeciesKernel
kernel = kernels.TwoBodySingleSpeciesKernel(theta=[sigma, theta, r_cut])
ee_gram_matrix = kernel.calc_gram_e(training_configurations, number_nodes)
-
class
mff.kernels.twobodykernel.BaseTwoBody(kernel_name, theta, bounds)¶ Two body kernel class Handles the functions common to the single-species and multi-species two-body kernels.
Parameters: - kernel_name (str) – To choose between single- and two-species kernel
- theta[0] (float) – lengthscale of the kernel
- theta[1] (float) – decay rate of the cutoff function
- theta[2] (float) – cutoff radius
- bounds (list) – bounds of the kernel function.
-
k2_ee¶ object – Energy-energy kernel function
-
k2_ef¶ object – Energy-force kernel function
-
k2_ef_loc¶ object – Local Energy-force kernel function
-
k2_ff¶ object – Force-force kernel function
-
calc(X1, X2)¶ Calculate the force-force kernel between two sets of configurations.
Parameters: - X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
- X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns: N1*3 x N2*3 matrix of the matrix-valued kernels
Return type: K (matrix)
-
calc_ee(X1, X2)¶ Calculate the energy-energy kernel between two sets of configurations.
Parameters: - X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
- X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns: N1 x N2 matrix of the scalar-valued kernels
Return type: K (matrix)
-
calc_ef(X1, X2)¶ Calculate the energy-force kernel between two sets of configurations.
Parameters: - X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
- X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns: N1 x N2*3 matrix of the vector-valued kernels
Return type: K (matrix)
-
calc_ef_loc(X1, X2)¶ Calculate the local energy-force kernel between two sets of configurations. Used only during mapping since it is faster than calc_ef and equivalent in that case.
Parameters: - X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
- X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns: N1 x N2*3 matrix of the vector-valued kernels
Return type: K (matrix)
-
calc_gram(X, nnodes=1, eval_gradient=False)¶ Calculate the force-force gram matrix for a set of configurations X.
Parameters: - X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
- nnodes (int) – Number of CPU nodes to use for multiprocessing (default is 1)
- eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
Returns: N*3 x N*3 gram matrix of the matrix-valued kernels
Return type: gram (matrix)
-
calc_gram_e(X, nnodes=1, eval_gradient=False)¶ Calculate the energy-energy gram matrix for a set of configurations X.
Parameters: - X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
- nnodes (int) – Number of CPU nodes to use for multiprocessing (default is 1)
- eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
Returns: N x N gram matrix of the scalar-valued kernels
Return type: gram (matrix)
-
calc_gram_ef(X, nnodes=1, eval_gradient=False)¶ Calculate the energy-force gram matrix for a set of configurations X. This returns a non-symmetric matrix which is equal to the transpose of the force-energy gram matrix.
Parameters: - X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
- nnodes (int) – Number of CPU nodes to use for multiprocessing (default is 1)
- eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
Returns: N x N*3 gram matrix of the vector-valued kernels
Return type: gram (matrix)
-
class
mff.kernels.twobodykernel.TwoBodySingleSpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))¶ Two body single species kernel.
Parameters: - theta[0] (float) – lengthscale of the kernel
- theta[1] (float) – decay rate of the cutoff function
- theta[2] (float) – cutoff radius
-
static
compile_theano()¶ This function generates theano compiled kernels for global energy and force learning
The position of the atoms relative to the central one, and their chemical species are defined by a matrix of dimension Mx5 here called r1 and r2.
Returns: energy-energy kernel k2_ef (func): energy-force kernel k2_ff (func): force-force kernel Return type: k2_ee (func)
-
class
mff.kernels.twobodykernel.TwoBodyTwoSpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))¶ Two body two species kernel.
Parameters: - theta[0] (float) – lengthscale of the kernel
- theta[1] (float) – decay rate of the cutoff function
- theta[2] (float) – cutoff radius
-
static
compile_theano()¶ This function generates theano compiled kernels for global energy and force learning
The position of the atoms relative to the central one, and their chemical species are defined by a matrix of dimension Mx5 here called r1 and r2.
Returns: energy-energy kernel k2_ef (func): energy-force kernel k2_ff (func): force-force kernel Return type: k2_ee (func)
Three Body Kernel¶
Module that contains the expressions for the 3-body single-species and multi-species kernel. The module uses the Theano package to create the energy-energy, force-energy and force-force kernels through automatic differentiation of the energy-energy kernel. The module is used to calculate the energy-energy, energy-force and force-force gram matrices for the Gaussian processes, and supports multi processing. The module is called by the gp.py script.
Example:
from threebodykernel import ThreeBodySingleSpeciesKernel
kernel = kernels.ThreeBodySingleSpeciesKernel(theta=[sigma, theta, r_cut])
ee_gram_matrix = kernel.calc_gram_e(training_configurations, number_nodes)
-
class
mff.kernels.threebodykernel.BaseThreeBody(kernel_name, theta, bounds)¶ Three body kernel class Handles the functions common to the single-species and multi-species three-body kernels.
Parameters: - kernel_name (str) – To choose between single- and two-species kernel
- theta[0] (float) – lengthscale of the kernel
- theta[1] (float) – decay rate of the cutoff function
- theta[2] (float) – cutoff radius
- bounds (list) – bounds of the kernel function.
-
k3_ee¶ object – Energy-energy kernel function
-
k3_ef¶ object – Energy-force kernel function
-
k3_ef_loc¶ object – Local Energy-force kernel function
-
k3_ff¶ object – Force-force kernel function
-
calc(X1, X2)¶ Calculate the force-force kernel between two sets of configurations.
Parameters: - X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
- X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns: N1*3 x N2*3 matrix of the matrix-valued kernels
Return type: K (matrix)
-
calc_ee(X1, X2)¶ Calculate the energy-energy kernel between two sets of configurations.
Parameters: - X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
- X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns: N1 x N2 matrix of the scalar-valued kernels
Return type: K (matrix)
-
calc_ef(X1, X2)¶ Calculate the energy-force kernel between two sets of configurations.
Parameters: - X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
- X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns: N1 x N2*3 matrix of the vector-valued kernels
Return type: K (matrix)
-
calc_ef_loc(X1, X2)¶ Calculate the local energy-force kernel between two sets of configurations. Used only during mapping since it is faster than calc_ef and equivalent in that case.
Parameters: - X1 (list) – list of N1 Mx5 arrays containing xyz coordinates and atomic species
- X2 (list) – list of N2 Mx5 arrays containing xyz coordinates and atomic species
Returns: N1 x N2*3 matrix of the vector-valued kernels
Return type: K (matrix)
-
calc_gram(X, nnodes=1, eval_gradient=False)¶ Calculate the force-force gram matrix for a set of configurations X.
Parameters: - X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
- nnodes (int) – Number of CPU nodes to use for multiprocessing (default is 1)
- eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
Returns: N*3 x N*3 gram matrix of the matrix-valued kernels
Return type: gram (matrix)
-
calc_gram_e(X, nnodes=1, eval_gradient=False)¶ Calculate the energy-energy gram matrix for a set of configurations X.
Parameters: - X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
- nnodes (int) – Number of CPU nodes to use for multiprocessing (default is 1)
- eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
Returns: N x N gram matrix of the scalar-valued kernels
Return type: gram (matrix)
-
calc_gram_ef(X, nnodes=1, eval_gradient=False)¶ Calculate the energy-force gram matrix for a set of configurations X. This returns a non-symmetric matrix which is equal to the transpose of the force-energy gram matrix.
Parameters: - X (list) – list of N Mx5 arrays containing xyz coordinates and atomic species
- nnodes (int) – Number of CPU nodes to use for multiprocessing (default is 1)
- eval_gradient (bool) – if True, evaluate the gradient of the gram matrix
Returns: N x N*3 gram matrix of the vector-valued kernels
Return type: gram (matrix)
-
class
mff.kernels.threebodykernel.ThreeBodySingleSpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))¶ Three body two species kernel.
Parameters: - theta[0] (float) – lengthscale of the kernel
- theta[1] (float) – decay rate of the cutoff function
- theta[2] (float) – cutoff radius
-
static
compile_theano()¶ This function generates theano compiled kernels for energy and force learning ker_jkmn_withcutoff = ker_jkmn #* cutoff_ikmn
The position of the atoms relative to the centrla one, and their chemical species are defined by a matrix of dimension Mx5
Returns: energy-energy kernel k3_ef (func): energy-force kernel k3_ff (func): force-force kernel Return type: k3_ee (func)
-
class
mff.kernels.threebodykernel.ThreeBodyTwoSpeciesKernel(theta=(1.0, 1.0, 1.0), bounds=((0.01, 100.0), (0.01, 100.0), (0.01, 100.0)))¶ Three body two species kernel.
Parameters: - theta[0] (float) – lengthscale of the kernel
- theta[1] (float) – decay rate of the cutoff function
- theta[2] (float) – cutoff radius
-
static
compile_theano()¶ This function generates theano compiled kernels for energy and force learning ker_jkmn_withcutoff = ker_jkmn #* cutoff_ikmn
The position of the atoms relative to the centrla one, and their chemical species are defined by a matrix of dimension Mx5
Returns: energy-energy kernel k3_ef (func): energy-force kernel k3_ff (func): force-force kernel Return type: k3_ee (func)