GDMLPredict
- class mbgdml._gdml.predict.GDMLPredict(model, batch_size=None, num_workers=None, max_memory=None, max_processes=None, use_torch=False)
Query trained sGDML force fields.
This class is used to load a trained model and make energy and force predictions for new geometries. GPU support is provided through PyTorch (requires the optional torch dependency to be installed).
Important
This is only used in mbgdml._gdml.train.GDMLTrain._recov_int_const() and mbgdml._gdml.train.model_errors().
Note
The parameters batch_size and num_workers are only relevant if this code runs on a CPU. Both can be set automatically via the function prepare_parallel. Running calculations via PyTorch is only recommended with available GPU hardware; CPU calculations are faster with our NumPy implementation.
- Parameters:
model (dict) – Data structure that holds all parameters of the trained model. This object is the output of GDMLTrain.train.
batch_size (int, optional) – Chunk size for processing parallel tasks.
num_workers (int, optional) – Number of parallel workers.
max_memory (int, optional) – Limit the maximum memory usage [GB]. This is only a soft limit that cannot always be enforced.
max_processes (int, optional) – Limit the maximum number of processes. Otherwise, all CPU cores are used. This parameter has no effect if use_torch=True.
use_torch (bool, optional) – Use PyTorch to calculate predictions.
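A minimal sketch of constructing the predictor. The model argument is a plain dict (the output of GDMLTrain.train, typically stored on disk as an .npz archive); "toy_model.npz" and its keys below are stand-ins, not a real trained model:

```python
import numpy as np

# Toy archive standing in for a trained sGDML model file (hypothetical keys).
np.savez("toy_model.npz", sig=np.array(20), c=np.array(0.0))

# A stored model is loaded back into the plain dict that GDMLPredict expects.
model = dict(np.load("toy_model.npz", allow_pickle=True))
print(sorted(model))  # ['c', 'sig']

# With a real trained model, the predictor would be built like this:
# from mbgdml._gdml.predict import GDMLPredict
# gdml = GDMLPredict(model, max_processes=4, use_torch=False)
```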
- _set_bulk_mp(bulk_mp=False)
Toggles bulk prediction mode.
If bulk prediction is enabled, the prediction is parallelized across input geometries, i.e. each worker generates the complete prediction for one query. Otherwise (depending on the number of available CPU cores), the input geometries are processed sequentially, but each of them may be processed by multiple workers at once (in chunks).
Note
This parameter can be optimally determined using prepare_parallel.
- Parameters:
bulk_mp (bool, optional) – Enable or disable bulk prediction mode.
- _set_chunk_size(chunk_size=None)
Set chunk size for each worker process.
Every prediction is generated as a linear combination of the training points that the model is comprised of. If multiple workers are available (and bulk mode is disabled), each one processes an (approximately equal) part of those training points. The chunk size then determines how much of a process's workload is passed to NumPy's underlying low-level routines at once. If the chunk size is smaller than the number of points the worker is supposed to process, it processes them in multiple steps using a loop. This can sometimes be faster, depending on the available hardware.
Note
This parameter can be optimally determined using prepare_parallel.
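The chunked linear combination described above can be illustrated with plain NumPy (a toy stand-in for the library's internals; all names and sizes here are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
K = rng.random((200, 9))   # per-training-point contributions (toy data)
alphas = rng.random(200)   # regression coefficients (toy data)

# All 200 training points in a single low-level NumPy call:
full = alphas @ K

# The same result computed in chunks, as a worker with chunk_size=64
# would loop over its share of the training points:
chunk_size = 64
out = np.zeros(9)
for i in range(0, len(alphas), chunk_size):
    out += alphas[i:i + chunk_size] @ K[i:i + chunk_size]

print(np.allclose(full, out))  # True
```

Whether the looped version is faster depends on cache sizes and BLAS behavior on the given hardware, which is why the library benchmarks it rather than guessing.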
- _set_num_workers(num_workers=None, force_reset=False)
Set the number of processes to use during prediction.
If bulk_mp is True, each worker handles the entire generation of a single prediction (this is for querying multiple geometries at once). If bulk_mp is False, each worker may handle only a part of a prediction (chunks are defined in 'wkr_starts_stops'). In that scenario, multiple processes are used to distribute the work of generating a single prediction.
This number should not exceed the number of available CPU cores.
Note
This parameter can be optimally determined using prepare_parallel.
- get_GPU_batch()
Get the batch size used by the GPU implementation to process bulk predictions (predictions for multiple input geometries at once).
This value is determined on the fly depending on the available GPU memory.
- predict(R=None, return_E=True)
Predict energy and forces for multiple geometries.
This function can run on the GPU if the optional PyTorch dependency is installed and use_torch=True was specified during initialization of this class. Optionally, the descriptors and descriptor Jacobians for the same geometries can be provided, if already available from previous calculations.
Note
The order of the atoms in R is not arbitrary and must be the same as the one used for training the model.
- Parameters:
R (numpy.ndarray, optional) – A 2D array of size M x 3N containing the Cartesian coordinates of each atom of M molecules. If this parameter is omitted, the training error is returned. Note that the training geometries need to be set right after initialization using set_R() for this to work.
return_E (bool, default: True) – If False, only the forces are returned.
- Returns:
numpy.ndarray – Energies stored in a 1D array of size M (omitted if return_E is False).
numpy.ndarray – Forces stored in a 2D array of size M x 3N.
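A hedged sketch of the query layout predict expects: M geometries flattened into rows of length 3N, with the atom order matching the training data. The gdml instance and the predict calls are shown as comments, since they require a trained model:

```python
import numpy as np

def make_query(geometries):
    """Stack M geometries, each an (N, 3) Cartesian array, into the
    (M, 3N) layout that predict() expects. The atom order must match
    the one used for training the model."""
    return np.stack([g.reshape(-1) for g in geometries])

# Three hypothetical 3-atom geometries (e.g. water monomers).
rng = np.random.default_rng(0)
R = make_query([rng.random((3, 3)) for _ in range(3)])
print(R.shape)  # (3, 9)

# With an initialized GDMLPredict instance `gdml` (hypothetical):
# E, F = gdml.predict(R)                  # E: shape (3,), F: shape (3, 9)
# F = gdml.predict(R, return_E=False)     # forces only
```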
- prepare_parallel(n_bulk=1, n_reps=1, return_is_from_cache=False)
Find and set the optimal parallelization parameters for the currently loaded model, running on a particular system. The result also depends on the number of geometries n_bulk that will be passed at once when calling the predict function.
This function runs a benchmark in which the prediction routine is repeatedly called n_reps times (default: 1) with varying parameter configurations, while the runtime is measured for each one. The optimal parameters are then cached for fast retrieval in future calls of this function.
We recommend calling this function after initialization of this class, as it will drastically increase the performance of the predict function.
Note
Depending on the parameter n_reps, this routine may take some seconds/minutes to complete. However, once a statistically significant number of benchmark results has been gathered for a particular configuration, it starts returning almost instantly.
- Parameters:
n_bulk (int, optional) – Number of geometries that will be passed to the predict function in each call (performance will be optimized for that exact use case).
n_reps (int, optional) – Number of repetitions (bigger value: more accurate, but also slower).
return_is_from_cache (bool, optional) – If enabled, this function returns a second value indicating whether the returned results were obtained from cache.
- Returns:
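The benchmarking idea behind prepare_parallel can be sketched generically: time the prediction routine n_reps times under each candidate configuration and keep the fastest. Here fake_predict and its cost model are stand-ins, not the library's internals:

```python
import time

def fake_predict(chunk_size):
    # Toy cost model: pretend chunk_size=64 is optimal on this hardware.
    time.sleep(0.005 * (1 + abs(chunk_size - 64) / 64))

def benchmark(configs, n_reps=3):
    """Time n_reps calls under each configuration; return the fastest."""
    best_cfg, best_t = None, float("inf")
    for cfg in configs:
        t0 = time.perf_counter()
        for _ in range(n_reps):
            fake_predict(cfg)
        elapsed = time.perf_counter() - t0
        if elapsed < best_t:
            best_cfg, best_t = cfg, elapsed
    return best_cfg

print(benchmark([16, 64, 256]))  # → 64 under this toy cost model
```

The real routine additionally caches the winning configuration, so repeated calls on the same model and hardware return almost instantly.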
- set_R_d_desc(R_d_desc)
Store a reference to the training geometry descriptor Jacobians.
This function must be called before set_alphas() can be used. This routine is used during iterative model training.
- Parameters:
R_d_desc (numpy.ndarray, optional) – A 3D array of size M x D x 3N containing the descriptor Jacobians for M molecules. The descriptor has dimension D, with 3N partial derivatives with respect to the 3N Cartesian coordinates of each atom.
- set_R_desc(R_desc)
Store a reference to the training geometry descriptors.
This can accelerate iterative model training.
- Parameters:
R_desc (numpy.ndarray, optional) – A 2D array of size M x D containing the descriptors of dimension D for M molecules.
- set_alphas(alphas_F, alphas_E=None)
Reconfigure the current model with a new set of regression parameters.
R_d_desc needs to be set for this function to work. This routine is used during iterative model training.
- Parameters:
alphas_F (numpy.ndarray) – 1D array containing the new model parameters.
alphas_E (numpy.ndarray, optional) – 1D array containing the additional new model parameters, if energy constraints are used in the kernel (use_E_cstr=True).
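The iterative-training update path formed by set_R_desc, set_R_d_desc, and set_alphas can be sketched with illustrative shapes (M training geometries, descriptor dimension D, N atoms; the gdml calls are comments, since they require an initialized predictor):

```python
import numpy as np

M, D, N = 10, 6, 3
R_desc = np.zeros((M, D))            # training descriptors, M x D
R_d_desc = np.zeros((M, D, 3 * N))   # descriptor Jacobians, M x D x 3N
alphas_F = np.zeros(M * 3 * N)       # force regression coefficients, 1D

print(R_desc.shape, R_d_desc.shape, alphas_F.shape)

# On an initialized GDMLPredict instance `gdml` (hypothetical):
# gdml.set_R_desc(R_desc)      # optional; accelerates iterative training
# gdml.set_R_d_desc(R_d_desc)  # required before set_alphas()
# gdml.set_alphas(alphas_F)    # swap in new parameters each iteration
```

Because the descriptors and Jacobians are set once and only the coefficients change, each training iteration avoids recomputing the geometry-dependent quantities.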