PredictSet

class mbgdml.data.PredictSet(pset=None, Z_key='Z', R_key='R', E_key='E', F_key='F')

A predict set is a data set that stores mbGDML-predicted energies and forces instead of training data.

When analyzing many structures with mbGDML, it is easier to predict all many-body contributions once and then analyze the stored data.

Parameters:
  • pset (str or dict, default: None) – Predict set path or dictionary to initialize with.

  • Z_key (str, default: Z) – dict key in pset for atomic numbers.

  • R_key (str, default: R) – dict key in pset for Cartesian coordinates.

  • E_key (str, default: E) – dict key in pset for energies.

  • F_key (str, default: F) – dict key in pset for atomic forces.

property E_true

True energies from the data set.

Type:

numpy.ndarray

property F_true

True forces from the data set.

Type:

numpy.ndarray

asdict()

Converts object into a custom dict.

Return type:

dict

property comp_ids

A 1D array relating entity_id to a fragment label for chemical components or species. Labels could be WAT or h2o for water, MeOH for methanol, bz for benzene, etc. There are no standardized labels for species. The index of the label is the respective entity_id. For example, a water and methanol molecule could be ['h2o', 'meoh'].

Examples

Suppose we have a structure containing a water molecule and a methanol molecule. We can use the labels h2o and meoh (which could be anything): ['h2o', 'meoh']. Note that each label is a str, while its index (the entity_id) is an int.

Type:

numpy.ndarray
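Because the index of each label is its entity_id, per-atom labels can be recovered by indexing comp_ids with entity_ids. The sketch below uses plain NumPy with the water/methanol example from this page; it does not call mbGDML, and the arrays are written out by hand for illustration.

```python
import numpy as np

# Water (3 atoms) followed by methanol (6 atoms), as in the entity_ids example.
entity_ids = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1])
# Index i of comp_ids is the label of entity i; the label strings are arbitrary.
comp_ids = np.array(["h2o", "meoh"])

# Fancy indexing maps every atom to the label of the entity it belongs to.
atom_labels = comp_ids[entity_ids]
```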

property e_unit

Units of energy. Options are eV, hartree, kcal/mol, and kJ/mol.

Type:

str

property entity_ids

1D array specifying which atoms belong to which entities.

An entity represents a related set of atoms such as a single molecule, several molecules, or a functional group. For mbGDML, an entity usually corresponds to a model trained to predict energies and forces of those atoms. Each entity_id is an int starting from 0.

It is conceptually similar to the PDBx/mmCIF _atom_site.label_entity_id data item.

Examples

A single water molecule would be [0, 0, 0]. A water (three atoms) and methanol (six atoms) molecule in the same structure would be [0, 0, 0, 1, 1, 1, 1, 1, 1].

Type:

numpy.ndarray
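When the atoms of each entity are stored contiguously, an entity_ids array like the one above can be built from the number of atoms per entity. This is a NumPy sketch under that contiguity assumption; atoms_per_entity is a hypothetical name, not an mbGDML attribute.

```python
import numpy as np

# Number of atoms in each entity, in the order they appear in R (assumed contiguous).
atoms_per_entity = [3, 6]  # water, methanol

# Entity i is repeated once for every atom it contains.
entity_ids = np.repeat(np.arange(len(atoms_per_entity)), atoms_per_entity)

# Atom indices that belong to entity 1 (methanol).
meoh_atoms = np.flatnonzero(entity_ids == 1)
```

The boolean test entity_ids == 1 also works when the atoms are not contiguous, so selecting an entity's atoms does not depend on that assumption.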

load(pset)

Reads a predict set and loads its data.

Parameters:

pset (str or dict) – Path to predict set npz file or a dictionary.
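Since pset may be either an npz path or a dict, it can help to see that the two forms carry the same arrays. The sketch below writes a small dict to an npz file with NumPy and reads it back into a dict. The key names and contents are hypothetical and not mbGDML's actual predict-set layout.

```python
import os
import tempfile

import numpy as np

# Hypothetical predict-set contents; mbGDML's real key names may differ.
pset_dict = {
    "name": np.array("predictset"),
    "e_unit": np.array("kcal/mol"),
    "Z": np.array([8, 1, 1]),
    "R": np.zeros((2, 3, 3)),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "predictset.npz")
    np.savez_compressed(path, **pset_dict)
    # np.load returns a lazy NpzFile; dict() reads every array into memory
    # so the file can be closed afterwards.
    with np.load(path) as npz:
        loaded = dict(npz)

# Strings are stored as 0-d arrays; .item() recovers the Python str.
e_unit = loaded["e_unit"].item()
```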

load_dataset(dset, Z_key='Z', R_key='R', E_key='E', F_key='F')

Loads data set in preparation to create a predict set.

Parameters:
  • dset (str or dict) – Path to a data set or a dict containing at least atomic numbers, Cartesian coordinates, energies, and forces under the keys given below.

  • Z_key (str, default: Z) – dict key in dset for atomic numbers.

  • R_key (str, default: R) – dict key in dset for atomic Cartesian coordinates.

  • E_key (str, default: E) – dict key in dset for energies.

  • F_key (str, default: F) – dict key in dset for atomic forces.
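The key arguments let a data set store its arrays under non-default names. The helper below is hypothetical (it is not part of mbGDML) and only illustrates the remapping and a basic shape check. The shapes it expects, Z (n_atoms,), R and F (n_structures, n_atoms, 3), and E (n_structures,), are an assumption based on common GDML data-set conventions.

```python
import numpy as np


def extract_dataset(dset, Z_key="Z", R_key="R", E_key="E", F_key="F"):
    """Hypothetical helper: pull Z, R, E, F out of a dict using custom keys."""
    keys = (Z_key, R_key, E_key, F_key)
    missing = [k for k in keys if k not in dset]
    if missing:
        raise KeyError(f"data set is missing keys: {missing}")
    Z, R, E, F = (np.asarray(dset[k]) for k in keys)
    # Assumed shapes: Z (n_atoms,), R and F (n_structures, n_atoms, 3), E (n_structures,).
    n_structures, n_atoms = R.shape[:2]
    if Z.shape != (n_atoms,) or F.shape != R.shape or E.shape != (n_structures,):
        raise ValueError("inconsistent array shapes")
    return Z, R, E, F


# A data set that stores its arrays under non-default key names.
dset = {
    "atomic_numbers": np.array([8, 1, 1]),
    "coords": np.zeros((4, 3, 3)),
    "energies": np.zeros(4),
    "forces": np.zeros((4, 3, 3)),
}
Z, R, E, F = extract_dataset(
    dset, Z_key="atomic_numbers", R_key="coords", E_key="energies", F_key="forces"
)
```

With mbGDML itself, the same dict would be passed as load_dataset(dset, Z_key="atomic_numbers", R_key="coords", E_key="energies", F_key="forces").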

load_models(models, predict_model, use_ray=False, n_workers=1, ray_address='auto', wkr_chunk_size=100)

Loads model(s) in preparation to create a predict set.

Parameters:
  • models (list of mbgdml.models.Model) – Machine learning model objects that contain all information to make predictions using predict_model.

  • predict_model (callable) – A function that takes Z, R, entity_ids, nbody_gen, and model as arguments and computes energies and forces. This will be turned into a ray remote function if use_ray is True. It can return either total properties or all individual n-body energies and forces.

  • use_ray (bool, default: False) – Use ray to parallelize computations.

  • n_workers (int, default: 1) – Total number of workers available for ray. This is ignored if use_ray is False.

  • ray_address (str, default: "auto") – Ray cluster address to connect to.

  • wkr_chunk_size (int, default: 100) – Number of n-body structures to assign to each spawned worker with ray.
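The argument order of predict_model is documented above, but what each argument holds is not. The toy function below is a sketch under stated assumptions, not mbGDML's implementation: R holds a single structure with shape (n_atoms, 3), nbody_gen yields tuples of entity_ids, and model is a hypothetical callable returning an energy and forces for just the selected atoms. It returns total properties by summing the model's output over every combination.

```python
import itertools

import numpy as np


def toy_predict_model(Z, R, entity_ids, nbody_gen, model):
    """Hypothetical predict_model: sum a model's outputs over entity combinations."""
    E = 0.0
    F = np.zeros_like(R, dtype=float)
    for combo in nbody_gen:
        # Select the atoms of every entity in this n-body combination.
        mask = np.isin(entity_ids, combo)
        e, f = model(Z[mask], R[mask])
        E += e
        F[mask] += f
    return E, F


def toy_model(Z, R):
    # Stand-in model: energy equals the number of atoms, forces are zero.
    return float(len(Z)), np.zeros_like(R, dtype=float)


Z = np.array([8, 1, 1, 8, 1, 1])
R = np.random.default_rng(0).normal(size=(6, 3))
entity_ids = np.array([0, 0, 0, 1, 1, 1])

# The only two-body combination of two water molecules is (0, 1).
dimers = itertools.combinations(range(2), 2)
E, F = toy_predict_model(Z, R, entity_ids, dimers, toy_model)
```

Keeping predict_model a plain function of its inputs, with no shared state, is what makes it straightforward to wrap as a ray remote function when use_ray is True.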

name

File name of the predict set.

Default: "predictset"

Type:

str

nbody_predictions(nbody_orders, use_ray=False, n_workers=1, ray_address='auto')

Energies and forces of all structures including the contributions of the orders in nbody_orders.

Predict sets store data broken down into many-body contributions. This function sums the contributions of the requested orders; for example, nbody_orders=[1, 2, 3] returns the energy and force predictions including one-, two-, and three-body contributions (corrections).

Parameters:
  • nbody_orders (list of int) – n-body orders to include.

  • use_ray (bool, default: False) – Use ray to parallelize computations.

  • n_workers (int, default: 1) – Total number of workers available for ray. This is ignored if use_ray is False.

  • ray_address (str, default: "auto") – Ray cluster address to connect to.

Returns:

  • numpy.ndarray – Energies of structures including the requested n-body contributions.

  • numpy.ndarray – Forces of structures including the requested n-body contributions.
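Conceptually, the returned arrays are sums over the stored per-order contributions. The sketch below assumes a hypothetical layout, a dict mapping each order n to (E_n, F_n) with E_n of shape (n_structures,) and F_n of shape (n_structures, n_atoms, 3); this is not mbGDML's storage format, and the numbers are made up for illustration.

```python
import numpy as np


def sum_nbody_contributions(contributions, nbody_orders):
    """Sum stored n-body energy and force contributions for the requested orders.

    Hypothetical layout: contributions[n] = (E_n, F_n).
    """
    E = sum(contributions[n][0] for n in nbody_orders)
    F = sum(contributions[n][1] for n in nbody_orders)
    return E, F


n_structures, n_atoms = 2, 6
# Made-up illustrative values, not real predictions.
contributions = {
    1: (np.array([-10.0, -10.5]), np.zeros((n_structures, n_atoms, 3))),
    2: (np.array([-0.8, -0.6]), np.full((n_structures, n_atoms, 3), 0.1)),
    3: (np.array([0.05, 0.02]), np.full((n_structures, n_atoms, 3), -0.01)),
}

E_12, F_12 = sum_nbody_contributions(contributions, [1, 2])
E_123, F_123 = sum_nbody_contributions(contributions, [1, 2, 3])
```

Storing each order separately is what lets a predict set answer several truncation levels (here two-body and three-body) without repeating the model predictions.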

prepare()

Prepares a predict set by calculating the decomposed energy and force contributions.

Note

You must load a data set first to specify Z, R, entity_ids, and comp_ids.

property theory

The level of theory used to compute energy and gradients of the data set.

Type:

str