DataSet#
- class mbgdml.data.DataSet(dset_path=None, Z_key='Z', R_key='R', E_key='E', F_key='F')[source]#
For creating, loading, manipulating, and using data sets.
- Parameters:
dset_path (str, optional) – Path to a npz file.
Z_key (str, default: Z) – dict key in dset_path for atomic numbers.
R_key (str, default: R) – dict key in dset_path for Cartesian coordinates.
E_key (str, default: E) – dict key in dset_path for energies.
F_key (str, default: F) – dict key in dset_path for atomic forces.
- property E#
The energies of structure(s).
A numpy.ndarray with shape of (n,) where n is the number of structures.
- Type:
numpy.ndarray
- property F#
Atomic forces of atoms in structure(s).
A numpy.ndarray with shape of (m, n, 3) where m is the number of structures and n is the number of atoms, each with three Cartesian components.
- Type:
numpy.ndarray
- property comp_ids#
A 1D array relating entity_id to a fragment label for chemical components or species. Labels could be WAT or h2o for water, MeOH for methanol, bz for benzene, etc. There are no standardized labels for species. The index of the label is the respective entity_id.
Examples
Suppose we have a structure containing a water and a methanol molecule. We can use the labels h2o and meoh (which could be anything): ['h2o', 'meoh']. Note that each label is a str.
- Type:
numpy.ndarray
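The relationship between comp_ids and entity_ids can be sketched with plain Python lists standing in for the arrays of a loaded data set. The values are illustrative, and atom_labels is a hypothetical helper name, not part of mbgdml.

```python
# Plain-list stand-ins for the DataSet properties (assumed values).
entity_ids = [0, 0, 0, 1, 1, 1, 1, 1, 1]  # water (3 atoms), methanol (6 atoms)
comp_ids = ["h2o", "meoh"]  # the index of each label is its entity_id


def atom_labels(entity_ids, comp_ids):
    """Return the component label of every atom (hypothetical helper)."""
    return [comp_ids[entity_id] for entity_id in entity_ids]


labels = atom_labels(entity_ids, comp_ids)
print(labels)
```

Because the label is looked up by position, reordering comp_ids without renumbering entity_ids would silently mislabel atoms.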
- convertE(E_units)[source]#
Convert energies and update e_unit.
- Parameters:
E_units (str) – Desired units of energy. Options are 'eV', 'hartree', 'kcal/mol', and 'kJ/mol'.
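To illustrate what such an energy conversion amounts to, here is a minimal sketch in plain Python. It does not call mbgdml; convert_energies and the HARTREE_IN table are hypothetical names, and the factors are the CODATA 2018 values for one hartree.

```python
# One hartree expressed in each unit (CODATA 2018 values).
HARTREE_IN = {
    "hartree": 1.0,
    "eV": 27.211386245988,
    "kcal/mol": 627.5094740631,
    "kJ/mol": 2625.4996394799,
}


def convert_energies(energies, from_units, to_units):
    """Scale a list of energies from one unit to another (hypothetical helper)."""
    factor = HARTREE_IN[to_units] / HARTREE_IN[from_units]
    return [energy * factor for energy in energies]


# Two made-up energies in hartree, converted to kcal/mol.
converted = convert_energies([-76.4, -76.3], "hartree", "kcal/mol")
print(converted)
```

Routing every conversion through a single pivot unit (hartree here) keeps the table to one entry per unit instead of one per pair.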
- convertF(force_e_units, force_r_units, e_units, r_units)[source]#
Convert forces.
Does not change e_unit or r_unit.
- Parameters:
force_e_units (str) – Specifies package-specific energy units used in calculation. Available units are 'eV', 'hartree', 'kcal/mol', and 'kJ/mol'.
force_r_units (str) – Specifies package-specific distance units used in calculation. Available units are 'Angstrom' and 'bohr'.
e_units (str) – Desired units of energy. Available units are 'eV', 'hartree', 'kcal/mol', and 'kJ/mol'.
r_units (str) – Desired units of distance. Available units are 'Angstrom' and 'bohr'.
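Since a force is an energy divided by a distance, the conversion factor is the energy factor divided by the distance factor. The sketch below shows this arithmetic on nested lists shaped like (m, n, 3). It is not the mbgdml implementation; convert_forces and both tables are hypothetical, with CODATA 2018 factors.

```python
# One hartree and one bohr expressed in each unit (CODATA 2018 values).
HARTREE_IN = {
    "hartree": 1.0,
    "eV": 27.211386245988,
    "kcal/mol": 627.5094740631,
    "kJ/mol": 2625.4996394799,
}
BOHR_IN = {"bohr": 1.0, "Angstrom": 0.529177210903}


def convert_forces(forces, force_e_units, force_r_units, e_units, r_units):
    """Convert forces given as energy/distance (hypothetical helper)."""
    e_factor = HARTREE_IN[e_units] / HARTREE_IN[force_e_units]
    r_factor = BOHR_IN[r_units] / BOHR_IN[force_r_units]
    factor = e_factor / r_factor  # energy scales up, distance scales down
    return [
        [[component * factor for component in atom] for atom in structure]
        for structure in forces
    ]


# One structure with one atom, in hartree/bohr (made-up values).
forces = [[[1.0, 0.0, -1.0]]]
converted = convert_forces(forces, "hartree", "bohr", "kcal/mol", "Angstrom")
print(converted)
```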
- convertR(R_units)[source]#
Convert coordinates and update r_unit.
- Parameters:
R_units (str) – Desired units of coordinates. Options are 'Angstrom' or 'bohr'.
- property entity_ids#
1D array specifying which atoms belong to which entities.
An entity represents a related set of atoms such as a single molecule, several molecules, or a functional group. For mbGDML, an entity usually corresponds to a model trained to predict energies and forces of those atoms. Each entity_id is an int starting from 0.
It is conceptually similar to the PDBx/mmCIF _atom_site.label_entity_id data item.
Examples
A single water molecule would be [0, 0, 0]. A water (three atoms) and methanol (six atoms) molecule in the same structure would be [0, 0, 0, 1, 1, 1, 1, 1, 1].
- Type:
numpy.ndarray
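One common use of entity_ids is recovering which atom indices belong to each entity. A minimal sketch, using a plain list in place of the array and a hypothetical helper named atoms_by_entity:

```python
from collections import defaultdict

# Water (three atoms) followed by methanol (six atoms), as in the example above.
entity_ids = [0, 0, 0, 1, 1, 1, 1, 1, 1]


def atoms_by_entity(entity_ids):
    """Map each entity_id to the indices of its atoms (hypothetical helper)."""
    groups = defaultdict(list)
    for atom_index, entity_id in enumerate(entity_ids):
        groups[entity_id].append(atom_index)
    return dict(groups)


groups = atoms_by_entity(entity_ids)
print(groups)
```

The resulting indices can then be used to slice the atom axis of arrays such as R or F for a single entity.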
- property mb#
Many-body expansion order of this data set. This is None if the data set does not contain many-body energies and forces.
- Type:
- property mb_dsets_md5#
MD5 hashes of all data sets used to remove n-body contributions from this data set.
- Type:
- property mb_models_md5#
MD5 hashes of all models used to remove n-body contributions from this data set.
- Type:
- property md5#
Unique MD5 hash of data set.
Notes
Z and R are always used to generate the MD5 hash. If available, mbgdml.data.DataSet.E and mbgdml.data.DataSet.F are also used.
- Type:
str
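The rule in the note (always hash Z and R, and include E and F only when present) can be illustrated with the standard library. This is only a sketch of the idea: the serialization used here (JSON of nested lists) is an assumption, so sketch_md5 will not reproduce the hashes mbgdml actually computes.

```python
import hashlib
import json


def sketch_md5(Z, R, E=None, F=None):
    """Hash Z and R always, and E and F only when provided (illustrative)."""
    payload = {"Z": Z, "R": R}
    if E is not None:
        payload["E"] = E
    if F is not None:
        payload["F"] = F
    # sort_keys gives identical bytes for identical content.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.md5(encoded).hexdigest()


# A single made-up water structure.
Z = [8, 1, 1]
R = [[[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]]

hash_without_E = sketch_md5(Z, R)
hash_with_E = sketch_md5(Z, R, E=[-76.4])
print(hash_without_E, hash_with_E)
```

Under this scheme, adding energies or forces to a data set changes its hash, so the same structures with and without labels are treated as different data sets.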
- property r_prov_ids#
Specifies structure set IDs/labels and corresponding MD5 hashes.
Keys are the Rset IDs (int) and values are MD5 hashes (str) for the particular structure set.
This is used as a breadcrumb trail that records where each structure in the data set originates.
Examples
>>> dset.r_prov_ids
{0: '2339670ad87a606cb11a72191dfd9f58'}
- Type:
dict
- property r_prov_specs#
An array specifying where each structure in R originates from.
A (n_R, 2 + n_entity) array where each row contains the Rset ID from r_prov_ids (e.g., 0, 1, 2, etc.), then the structure index and the entity_ids from the original full structure in the structure set.
If there has been no previous sampling, an array of shape (1, 0) is returned.
- Type:
numpy.ndarray
Examples
>>> dset.r_prov_specs  # [r_prov_id, r_index, entity_1, entity_2, entity_3]
array([[0, 985, 46, 59, 106],
       [0, 174, 51, 81, 128]])
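Reading a row of r_prov_specs together with r_prov_ids can be sketched as follows. Plain lists and a dict stand in for the data set properties, using the values from the examples above; describe_origin is a hypothetical helper, not part of mbgdml.

```python
# Values copied from the r_prov_ids and r_prov_specs examples.
r_prov_ids = {0: "2339670ad87a606cb11a72191dfd9f58"}
r_prov_specs = [
    [0, 985, 46, 59, 106],
    [0, 174, 51, 81, 128],
]


def describe_origin(spec_row, r_prov_ids):
    """Split one row into its Rset hash, structure index, and entity_ids."""
    r_prov_id, r_index, *entity_ids = spec_row
    return {
        "rset_md5": r_prov_ids[r_prov_id],
        "structure_index": r_index,
        "entity_ids": entity_ids,
    }


origins = [describe_origin(row, r_prov_ids) for row in r_prov_specs]
for origin in origins:
    print(origin)
```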
- property theory#
The level of theory used to compute energy and gradients of the data set.
- Type: