PredictSet#
- class mbgdml.data.PredictSet(pset=None, Z_key='Z', R_key='R', E_key='E', F_key='F')[source]#
A predict set is a data set that stores mbGDML-predicted energies and forces instead of training data.
When analyzing many structures with mbGDML, it is easier to predict all many-body contributions once and then analyze the stored data.
- Parameters:
- property E_true#
True energies from data set.
- Type:
- property F_true#
True forces from data set.
- Type:
- property comp_ids#
A 1D array relating entity_id to a fragment label for chemical components or species. Labels could be WAT or h2o for water, MeOH for methanol, bz for benzene, etc. There are no standardized labels for species. The index of the label is the respective entity_id. For example, a water and methanol molecule could be ['h2o', 'meoh'].
Examples
Suppose we have a structure containing a water and methanol molecule. We can use the labels of h2o and meoh (which could be anything): ['h2o', 'meoh']. Note that each label is a str.
- Type:
- property entity_ids#
1D array specifying which atoms belong to which entities.
An entity represents a related set of atoms such as a single molecule, several molecules, or a functional group. For mbGDML, an entity usually corresponds to a model trained to predict energies and forces of those atoms. Each entity_id is an int starting from 0.
It is conceptually similar to the PDBx/mmCIF _atom_site.label_entity_id data item.
Examples
A single water molecule would be [0, 0, 0]. A water (three atoms) and methanol (six atoms) molecule in the same structure would be [0, 0, 0, 1, 1, 1, 1, 1, 1].
- Type:
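As a small illustration of how entity_ids and comp_ids relate, the sketch below groups atom indices by entity and looks up each entity's label. It uses plain Python lists rather than arrays, and the helper name atoms_by_entity is hypothetical; it is not part of mbGDML.

```python
from collections import defaultdict

# Hypothetical structure: one water (3 atoms) followed by one methanol (6 atoms).
entity_ids = [0, 0, 0, 1, 1, 1, 1, 1, 1]
# The index of each label is the entity_id it describes.
comp_ids = ["h2o", "meoh"]


def atoms_by_entity(entity_ids):
    """Map each entity_id to the list of atom indices that belong to it."""
    groups = defaultdict(list)
    for atom_index, entity_id in enumerate(entity_ids):
        groups[entity_id].append(atom_index)
    return dict(groups)


groups = atoms_by_entity(entity_ids)
# Look up the component label of every entity through its index in comp_ids.
labels = {entity_id: comp_ids[entity_id] for entity_id in groups}

for entity_id, atom_indices in groups.items():
    print(entity_id, labels[entity_id], atom_indices)
```

Because comp_ids is indexed by entity_id, it has one entry per entity, while entity_ids has one entry per atom.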
- load_dataset(dset, Z_key='Z', R_key='R', E_key='E', F_key='F')[source]#
Loads data set in preparation to create a predict set.
- Parameters:
dset (str or dict) – Path to data set or dict with at least the following data:
Z_key (str, default: Z) – dict key in dset for atomic numbers.
R_key (str, default: R) – dict key in dset for atomic Cartesian coordinates.
E_key (str, default: E) – dict key in dset for energies.
F_key (str, default: F) – dict key in dset for atomic forces.
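The role of the four key parameters can be pictured with a plain dict. The sketch below is not how load_dataset is implemented; select_arrays is a hypothetical helper, the custom key names are made up, and all numbers are placeholders rather than real data. The nesting of the lists (one structure of three atoms) is likewise an assumption.

```python
def select_arrays(dset, Z_key="Z", R_key="R", E_key="E", F_key="F"):
    """Return atomic numbers, coordinates, energies, and forces from a dict.

    Hypothetical helper: it only shows how configurable keys pick out data.
    """
    missing = [key for key in (Z_key, R_key, E_key, F_key) if key not in dset]
    if missing:
        raise KeyError(f"data set is missing keys: {missing}")
    return dset[Z_key], dset[R_key], dset[E_key], dset[F_key]


# Placeholder data for a single water structure stored under custom keys.
dset = {
    "z": [8, 1, 1],
    "R": [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]],
    "E_ref": [0.0],
    "F": [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]],
}

# Non-default keys are passed explicitly; R and F use the defaults.
Z, R, E, F = select_arrays(dset, Z_key="z", E_key="E_ref")
```

Passing the key names separately means a data set does not have to be renamed to the default Z, R, E, and F labels before use.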
- load_models(models, predict_model, use_ray=False, n_workers=1, ray_address='auto', wkr_chunk_size=100)[source]#
Loads model(s) in preparation to create a predict set.
- Parameters:
models (list of mbgdml.models.Model) – Machine learning model objects that contain all information to make predictions using predict_model.
predict_model (callable) – A function that takes Z, R, entity_ids, nbody_gen, model and computes energies and forces. This will be turned into a ray remote function if use_ray = True. This can return total properties or all individual \(n\)-body energies and forces.
use_ray (bool, default: False) – Use ray to parallelize computations.
n_workers (int, default: 1) – Total number of workers available for ray. This is ignored if use_ray is False.
ray_address (str, default: "auto") – Ray cluster address to connect to.
wkr_chunk_size (int, default: 100) – Number of \(n\)-body structures to assign to each spawned worker with ray.
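To make the idea behind wkr_chunk_size concrete, the sketch below enumerates \(n\)-body combinations of entities and splits them into fixed-size chunks, one chunk per unit of work. It is only an illustration using the standard library: nbody_combinations and chunked are hypothetical helpers, and mbGDML's own generator and ray scheduling may work differently.

```python
from itertools import combinations, islice


def nbody_combinations(entity_ids, order):
    """Yield every combination of `order` distinct entities."""
    unique_ids = sorted(set(entity_ids))
    return combinations(unique_ids, order)


def chunked(iterable, chunk_size):
    """Yield successive lists of at most `chunk_size` items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical structure: four water molecules with three atoms each.
entity_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

# All 2-body (dimer) combinations, split into chunks of four.
pairs = nbody_combinations(entity_ids, 2)
chunks = list(chunked(pairs, 4))

for worker_index, chunk in enumerate(chunks):
    print(worker_index, chunk)
```

Larger chunks mean fewer, longer tasks per worker; smaller chunks spread work more evenly at the cost of more scheduling overhead.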
- nbody_predictions(nbody_orders, use_ray=False, n_workers=1, ray_address='auto')[source]#
Energies and forces of all structures including nbody_orders contributions.
Predict sets have data that is broken down into many-body contributions. This function sums the many-body contributions up to the specified level; for example, 3 returns the energy and force predictions when including one-, two-, and three-body contributions/corrections.
- Parameters:
- Returns:
numpy.ndarray – Energy of structures up to and including n-order corrections.
numpy.ndarray – Forces of structures up to and including n-order corrections.
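The summation described above can be sketched with plain Python. The storage layout here (a dict mapping each order to a list of per-structure energy contributions) is an assumption made for illustration, not the internal format of a predict set, and the numbers are placeholders. Forces would be summed the same way, element by element.

```python
def sum_up_to_order(contributions, max_order):
    """Sum per-structure contributions for orders 1 through max_order.

    `contributions` is a hypothetical dict: order -> list of energies,
    one value per structure.
    """
    n_structures = len(contributions[1])
    totals = [0.0] * n_structures
    for order in range(1, max_order + 1):
        for i, value in enumerate(contributions[order]):
            totals[i] += value
    return totals


# Placeholder 1-, 2-, and 3-body energy contributions for two structures.
contributions = {
    1: [-10.0, -12.0],
    2: [-0.5, -0.75],
    3: [0.25, 0.125],
}

energies_2body = sum_up_to_order(contributions, 2)
energies_3body = sum_up_to_order(contributions, 3)
```

Comparing the totals at successive orders shows how much each higher-order correction changes the prediction for a given structure.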