DataSet
- class mbgdml.data.DataSet(dset_path=None, Z_key='Z', R_key='R', E_key='E', F_key='F')
For creating, loading, manipulating, and using data sets.
- Parameters:
  - dset_path (str, optional) – Path to an npz file.
  - Z_key (str, default: Z) – dict key in dset_path for atomic numbers.
  - R_key (str, default: R) – dict key in dset_path for Cartesian coordinates.
  - E_key (str, default: E) – dict key in dset_path for energies.
  - F_key (str, default: F) – dict key in dset_path for atomic forces.
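The key parameters above let a file that stores its arrays under nonstandard names still be read. The following is a minimal sketch of that idea using a plain dict in place of a loaded npz file; `read_fields` is a hypothetical helper, not part of mbGDML, the array values are placeholders, and treating energies and forces as optional is an assumption.

```python
def read_fields(data, Z_key="Z", R_key="R", E_key="E", F_key="F"):
    """Hypothetical helper: select the four arrays from a dict-like object."""
    # Atomic numbers and coordinates are required in this sketch.
    fields = {"Z": data[Z_key], "R": data[R_key]}
    # Assumption: energies and forces may be absent from the file.
    if E_key in data:
        fields["E"] = data[E_key]
    if F_key in data:
        fields["F"] = data[F_key]
    return fields


# Stand-in for a loaded npz file that uses custom key names (placeholder values).
raw = {
    "z": [8, 1, 1],
    "coords": [[[0.0, 0.0, 0.0], [0.0, 0.76, 0.59], [0.0, -0.76, 0.59]]],
    "energies": [-76.4],
}
fields = read_fields(raw, Z_key="z", R_key="coords", E_key="energies")
print(sorted(fields))
```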
- property E
The energies of structure(s).
A numpy.ndarray with shape of (n,) where n is the number of structures.
- Type: numpy.ndarray
- property F
Atomic forces of atoms in structure(s).
A numpy.ndarray with shape of (m, n, 3) where m is the number of structures and n is the number of atoms, each with three Cartesian components.
- Type: numpy.ndarray
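To make the array layouts concrete, here is a small sketch that uses nested lists in place of NumPy arrays. It assumes one energy per structure and one force vector per atom per structure; `shape` is a hypothetical helper and all values are placeholders.

```python
def shape(nested):
    """Return the shape of a regular nested list, NumPy-style."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(dims)


# Two structures (m = 2) of three atoms each (n = 3); values are placeholders.
E = [-1.0, -2.0]
F = [[[0.0, 0.0, 0.0] for _ in range(3)] for _ in range(2)]

print(shape(E), shape(F))
```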
- property comp_ids
A 1D array relating entity_id to a fragment label for chemical components or species. Labels could be WAT or h2o for water, MeOH for methanol, bz for benzene, etc. There are no standardized labels for species. The index of the label is the respective entity_id.
Examples
Suppose we have a structure containing a water and methanol molecule. We can use the labels of h2o and meoh (which could be anything): ['h2o', 'meoh']. Note that each label is a str, whereas its entity_id is the int index of that label.
- Type:
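Because the index of each label is its entity_id, a per-atom label list can be recovered by indexing. The sketch below uses plain lists and the water and methanol example from this page; it illustrates the relationship only and does not call mbGDML.

```python
comp_ids = ["h2o", "meoh"]  # index 0 -> water, index 1 -> methanol
entity_ids = [0, 0, 0, 1, 1, 1, 1, 1, 1]  # three water atoms, six methanol atoms

# Look up the component label of every atom through its entity_id.
atom_labels = [comp_ids[entity_id] for entity_id in entity_ids]
print(atom_labels)
```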
- convertE(E_units)
Converts energies and updates e_unit.
- Parameters:
  - E_units (str) – Desired units of energy. Options are 'eV', 'hartree', 'kcal/mol', and 'kJ/mol'.
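The conversion itself amounts to multiplying by a ratio of unit factors. The following sketch shows one way to do that for the four listed units, using CODATA 2018 factors relative to hartree; `convert_energy` and `HARTREE_TO` are hypothetical names, and the constants mbGDML uses internally may differ.

```python
# Value of one hartree in each supported unit (CODATA 2018).
HARTREE_TO = {
    "hartree": 1.0,
    "eV": 27.211386245988,
    "kcal/mol": 627.5094740631,
    "kJ/mol": 2625.4996394799,
}


def convert_energy(values, from_units, to_units):
    """Hypothetical helper: rescale a list of energies between units."""
    factor = HARTREE_TO[to_units] / HARTREE_TO[from_units]
    return [value * factor for value in values]


energies_kcal = convert_energy([1.0, 0.5], "hartree", "kcal/mol")
print(energies_kcal)
```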
- convertF(force_e_units, force_r_units, e_units, r_units)
Converts forces.
Does not change e_unit or r_unit.
- Parameters:
  - force_e_units (str) – Specifies package-specific energy units used in calculation. Available units are 'eV', 'hartree', 'kcal/mol', and 'kJ/mol'.
  - force_r_units (str) – Specifies package-specific distance units used in calculation. Available units are 'Angstrom' and 'bohr'.
  - e_units (str) – Desired units of energy. Available units are 'eV', 'hartree', 'kcal/mol', and 'kJ/mol'.
  - r_units (str) – Desired units of distance. Available units are 'Angstrom' and 'bohr'.
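Forces have units of energy per distance, so the conversion factor is the energy factor divided by the distance factor. The sketch below illustrates this with nested lists shaped like F; `convert_forces` and the factor tables are hypothetical, use CODATA 2018 values, and are not taken from mbGDML.

```python
# Value of one hartree / one bohr in each supported unit (CODATA 2018).
HARTREE_TO = {
    "hartree": 1.0,
    "eV": 27.211386245988,
    "kcal/mol": 627.5094740631,
    "kJ/mol": 2625.4996394799,
}
BOHR_TO = {"bohr": 1.0, "Angstrom": 0.529177210903}


def convert_forces(forces, force_e_units, force_r_units, e_units, r_units):
    """Hypothetical helper: rescale (m, n, 3) nested-list forces."""
    e_factor = HARTREE_TO[e_units] / HARTREE_TO[force_e_units]
    r_factor = BOHR_TO[r_units] / BOHR_TO[force_r_units]
    factor = e_factor / r_factor  # energy per distance
    return [
        [[component * factor for component in atom] for atom in structure]
        for structure in forces
    ]


# One structure with one atom; placeholder force of 1 hartree/bohr along x.
forces = [[[1.0, 0.0, 0.0]]]
converted = convert_forces(forces, "hartree", "bohr", "kcal/mol", "Angstrom")
print(converted)
```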
- convertR(R_units)
Converts coordinates and updates r_unit.
- Parameters:
  - R_units (str) – Desired units of coordinates. Options are 'Angstrom' or 'bohr'.
- property entity_ids
1D array specifying which atoms belong to which entities.
An entity represents a related set of atoms such as a single molecule, several molecules, or a functional group. For mbGDML, an entity usually corresponds to a model trained to predict energies and forces of those atoms. Each entity_id is an int starting from 0.
It is conceptually similar to the PDBx/mmCIF _atom_site.label_entity_id data item.
Examples
A single water molecule would be [0, 0, 0]. A water (three atoms) and methanol (six atoms) molecule in the same structure would be [0, 0, 0, 1, 1, 1, 1, 1, 1].
- Type:
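The examples above follow directly from the number of atoms in each entity. A minimal sketch, assuming atoms are ordered entity by entity; both helpers are hypothetical and not part of mbGDML.

```python
def build_entity_ids(atoms_per_entity):
    """Hypothetical helper: expand per-entity atom counts into entity_ids."""
    entity_ids = []
    for entity_id, n_atoms in enumerate(atoms_per_entity):
        entity_ids.extend([entity_id] * n_atoms)
    return entity_ids


def atoms_of_entity(entity_ids, entity_id):
    """Hypothetical helper: atom indices that belong to one entity."""
    return [i for i, e in enumerate(entity_ids) if e == entity_id]


# Water (three atoms) followed by methanol (six atoms).
entity_ids = build_entity_ids([3, 6])
methanol_atoms = atoms_of_entity(entity_ids, 1)
print(entity_ids, methanol_atoms)
```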
- property mb
Many-body expansion order of this data set. This is None if the data set does not contain many-body energies and forces.
- Type:
- property mb_dsets_md5
MD5 hashes of all data sets used to remove n-body contributions from data sets.
- Type:
- property mb_models_md5
MD5 hashes of all models used to remove n-body contributions from models.
- Type:
- property md5
Unique MD5 hash of data set.
Notes
Z and R are always used to generate the MD5 hash. If available, mbgdml.data.DataSet.E and mbgdml.data.DataSet.F are also used.
- Type:
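The note above can be illustrated with a short hashing sketch: Z and R always feed the hash, and E and F are added only when present. This is an assumption-laden illustration, not mbGDML's actual procedure; `dataset_md5` is a hypothetical helper, the JSON serialization is an arbitrary choice, and the resulting hashes will not match those produced by the library.

```python
import hashlib
import json


def dataset_md5(Z, R, E=None, F=None):
    """Hypothetical helper: hash Z and R, plus E and F when available."""
    md5 = hashlib.md5()
    for array in (Z, R, E, F):
        if array is None:
            continue  # skip E or F when the data set does not have them
        # Assumption: arrays are serialized as JSON text before hashing.
        md5.update(json.dumps(array).encode("utf-8"))
    return md5.hexdigest()


Z = [8, 1, 1]
R = [[[0.0, 0.0, 0.0], [0.0, 0.76, 0.59], [0.0, -0.76, 0.59]]]  # placeholders
hash_without_E = dataset_md5(Z, R)
hash_with_E = dataset_md5(Z, R, E=[-76.4])
print(hash_without_E != hash_with_E)
```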
- property r_prov_ids
Specifies structure set IDs/labels and corresponding MD5 hashes.
Keys are the Rset IDs (int) and values are MD5 hashes (str) for the particular structure set.
This is used as a breadcrumb trail that specifies where each structure in the data set originates from.
Examples
>>> dset.r_prov_ids
{0: '2339670ad87a606cb11a72191dfd9f58'}
- Type:
- property r_prov_specs
An array specifying where each structure in R originates from.
A (n_R, 2 + n_entity) array where each row contains the Rset ID from r_prov_ids (e.g., 0, 1, 2, etc.), then the structure index, and then the entity_ids from the original full structure in the structure set.
If there has been no previous sampling, an array of shape (1, 0) is returned.
- Type:
Examples
>>> dset.r_prov_specs  # [r_prov_id, r_index, entity_1, entity_2, entity_3]
array([[0, 985, 46, 59, 106],
       [0, 174, 51, 81, 128]])
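Each row can be decoded back to its origin by combining it with r_prov_ids. The sketch below reuses the example values from this page as plain Python objects; `decode_spec` is a hypothetical helper, not an mbGDML function.

```python
r_prov_ids = {0: "2339670ad87a606cb11a72191dfd9f58"}
r_prov_specs = [
    [0, 985, 46, 59, 106],
    [0, 174, 51, 81, 128],
]


def decode_spec(row, r_prov_ids):
    """Hypothetical helper: split one row into its provenance parts."""
    r_prov_id, r_index, *entity_ids = row
    return {
        "rset_md5": r_prov_ids[r_prov_id],  # which structure set
        "r_index": r_index,                 # which structure in that set
        "entity_ids": entity_ids,           # which entities were sampled
    }


origins = [decode_spec(row, r_prov_ids) for row in r_prov_specs]
print(origins[0])
```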
- property theory
The level of theory used to compute energies and gradients of the data set.
- Type: