PredictSet
- class mbgdml.data.PredictSet(pset=None, Z_key='Z', R_key='R', E_key='E', F_key='F')
A predict set is a data set with mbGDML predicted energy and forces instead of training data.
When analyzing many structures using mbGDML it is easier to predict all many-body contributions once and then analyze the stored data.
- Parameters:
- property E_true
True energies from data set.
- Type:
- property F_true
True forces from data set.
- Type:
- property comp_ids
A 1D array relating entity_id to a fragment label for chemical components or species. Labels could be WAT or h2o for water, MeOH for methanol, bz for benzene, etc. There are no standardized labels for species. The index of the label is the respective entity_id. For example, a water and methanol molecule could be ['h2o', 'meoh'].
Examples
Suppose we have a structure containing a water and a methanol molecule. We can use the labels h2o and meoh (which could be anything): ['h2o', 'meoh']. Note that each label is a str; its index in the array is the entity_id.
- Type:
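As a minimal sketch of the indexing described for comp_ids (plain NumPy, not a call into mbGDML; the values are illustrative), the label of an entity can be found by using its entity_id as an index:

```python
import numpy as np

# Illustrative values only: one water (entity 0) and one methanol (entity 1).
comp_ids = np.array(["h2o", "meoh"])

# The position of each label is the entity_id it describes.
for entity_id, label in enumerate(comp_ids):
    print(entity_id, label)

# Look up the label of entity 1.
label_of_entity_1 = comp_ids[1]
```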
- property entity_ids
1D array specifying which atoms belong to which entities.
An entity represents a related set of atoms such as a single molecule, several molecules, or a functional group. For mbGDML, an entity usually corresponds to a model trained to predict energies and forces of those atoms. Each entity_id is an int starting from 0.
It is conceptually similar to the PDBx/mmCIF _atom_site.label_entity_id data item.
Examples
A single water molecule would be [0, 0, 0]. A water (three atoms) and methanol (six atoms) molecule in the same structure would be [0, 0, 0, 1, 1, 1, 1, 1, 1].
- Type:
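The water and methanol example can be reproduced with a short NumPy sketch. The helper names (atoms_per_entity, methanol_atoms) are hypothetical and not part of mbGDML; the sketch only shows how such an array could be built and used to select the atoms of one entity.

```python
import numpy as np

# Atoms per entity: water has 3 atoms, methanol has 6.
atoms_per_entity = [3, 6]

# Repeat each entity_id once per atom it owns.
entity_ids = np.repeat(np.arange(len(atoms_per_entity)), atoms_per_entity)

# Indices of the atoms that belong to entity 1 (the methanol).
methanol_atoms = np.flatnonzero(entity_ids == 1)
```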
- load_dataset(dset, Z_key='Z', R_key='R', E_key='E', F_key='F')
Loads a data set in preparation to create a predict set.
- Parameters:
  - dset (str or dict) – Path to a data set, or a dict containing at least the data stored under the keys below.
  - Z_key (str, default: Z) – dict key in dset for atomic numbers.
  - R_key (str, default: R) – dict key in dset for atomic Cartesian coordinates.
  - E_key (str, default: E) – dict key in dset for energies.
  - F_key (str, default: F) – dict key in dset for atomic forces.
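A dict using the default keys might look like the sketch below. The array shapes are assumptions (they are not stated here), the values are placeholders, and a real data set may need additional fields. The mbGDML call is left commented out so the snippet runs without the package installed.

```python
import numpy as np

n_structures = 2  # hypothetical number of structures

# A single water molecule: O, H, H.
Z = np.array([8, 1, 1])

# Assumed shapes: R and F are (n_structures, n_atoms, 3); E is (n_structures,).
R = np.zeros((n_structures, Z.size, 3))
E = np.zeros(n_structures)
F = np.zeros_like(R)

# Keys match the defaults Z_key='Z', R_key='R', E_key='E', F_key='F'.
dset = {"Z": Z, "R": R, "E": E, "F": F}

# With mbGDML installed, the dict could then be passed along these lines:
# from mbgdml.data import PredictSet
# pset = PredictSet()
# pset.load_dataset(dset)
```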
- load_models(models, predict_model, use_ray=False, n_workers=1, ray_address='auto', wkr_chunk_size=100)
Loads model(s) in preparation to create a predict set.
- Parameters:
  - models (list of mbgdml.models.Model) – Machine learning model objects that contain all information to make predictions using predict_model.
  - predict_model (callable) – A function that takes Z, R, entity_ids, nbody_gen, model and computes energies and forces. This will be turned into a ray remote function if use_ray = True. This can return total properties or all individual \(n\)-body energies and forces.
  - use_ray (bool, default: False) – Use ray to parallelize computations.
  - n_workers (int, default: 1) – Total number of workers available for ray. This is ignored if use_ray is False.
  - ray_address (str, default: "auto") – Ray cluster address to connect to.
  - wkr_chunk_size (int, default: 100) – Number of \(n\)-body structures to assign to each spawned worker with ray.
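To illustrate the argument order expected of predict_model, here is a toy stand-in. Everything in it is an assumption made for illustration: R is treated as one structure of shape (n_atoms, 3), nbody_gen as an iterable of entity_id tuples, and model as a plain callable rather than an mbgdml.models.Model. It is not one of mbGDML's own predictors.

```python
import numpy as np


def toy_predict_model(Z, R, entity_ids, nbody_gen, model):
    """Hypothetical predict_model: sums model output over entity combinations.

    Assumes R is one structure of shape (n_atoms, 3), nbody_gen yields tuples
    of entity_ids, and model is a plain callable returning (energy, forces).
    """
    E = 0.0
    F = np.zeros_like(R, dtype=float)
    for entity_combo in nbody_gen:
        # Atoms belonging to any entity in this combination.
        atom_idx = np.flatnonzero(np.isin(entity_ids, entity_combo))
        e, f = model(Z[atom_idx], R[atom_idx])
        E += e
        F[atom_idx] += f
    return E, F


def toy_model(Z, R):
    """Hypothetical model: energy is the atom count, forces are all ones."""
    return float(len(Z)), np.ones_like(R, dtype=float)


# Two water molecules, each its own entity.
Z = np.array([8, 1, 1, 8, 1, 1])
R = np.zeros((6, 3))
entity_ids = np.array([0, 0, 0, 1, 1, 1])
nbody_gen = [(0,), (1,)]  # 1-body combinations of the two entities

E, F = toy_predict_model(Z, R, entity_ids, nbody_gen, toy_model)
```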
- nbody_predictions(nbody_orders, use_ray=False, n_workers=1, ray_address='auto')
Energies and forces of all structures including nbody_orders contributions.
Predict sets have data that is broken down into many-body contributions. This function sums the many-body contributions up to the specified level; for example, 3 returns the energy and force predictions when including one-, two-, and three-body contributions/corrections.
- Parameters:
- Returns:
  - numpy.ndarray – Energy of structures up to and including n-order corrections.
  - numpy.ndarray – Forces of structures up to and including n-order corrections.
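The summation over many-body contributions can be sketched with plain NumPy. The storage layout (dicts keyed by n-body order), the helper sum_contributions, and the numbers are all hypothetical; the sketch also assumes nbody_orders is a list of the orders to include. It shows the arithmetic only, not how PredictSet stores its data.

```python
import numpy as np

# Hypothetical stored contributions for 2 structures of 3 atoms each,
# keyed by n-body order. The values are made up for illustration.
E_nbody = {
    1: np.array([-10.0, -11.0]),
    2: np.array([-0.5, -0.4]),
    3: np.array([0.05, 0.02]),
}
F_nbody = {n: np.full((2, 3, 3), float(n)) for n in (1, 2, 3)}


def sum_contributions(E_nbody, F_nbody, nbody_orders):
    """Sum energy and force contributions over the requested n-body orders."""
    E = sum(E_nbody[n] for n in nbody_orders)
    F = sum(F_nbody[n] for n in nbody_orders)
    return E, F


# Include one-, two-, and three-body contributions.
E_total, F_total = sum_contributions(E_nbody, F_nbody, [1, 2, 3])
```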