`RDF`#

class mbgdml.analysis.rdf.RDF(Z, entity_ids, comp_ids, cell_vectors, bin_width=0.05, rdf_range=(0.0, 15.0), inter_only=True, use_ray=False, ray_address='auto', n_workers=1)[source]#

Handles calculating the radial distribution function (RDF), \(g(r)\), of a constant volume simulation.

Parameters:

Z (numpy.ndarray, ndim: 1) – Atomic numbers of all atoms in the system.
entity_ids (numpy.ndarray, ndim: 1) – Integers that specify which fragment each atom belongs to for all structures.
comp_ids (numpy.ndarray, ndim: 1) – Labels for each entity_id used to determine the desired entity for RDF computations.
cell_vectors (numpy.ndarray) – The three cell vectors.
inter_only (bool, default: True) – Only intermolecular distances are allowed. If True, atoms that have the same entity_id are ignored.
use_ray (bool, default: False) – Use ray to parallelize computations.
n_workers (int, default: 1) – Total number of workers available for ray. This is ignored if use_ray is False.
ray_address (str, default: "auto") – Ray cluster address to connect to.

run(R, comp_id_pair, entity_idxs, step=1)[source]#

Perform the RDF computation.

Parameters:

R (numpy.ndarray, ndim: 3) –
Cartesian coordinates of all atoms in the system under periodic boundary conditions.

Tip

We recommend a memory-map for large R to reduce memory requirements.
comp_id_pair (tuple) – The component ID of the entities to consider. For example, ('h2o', 'h2o') or ('h2o', 'meoh').
entity_idxs (tuple, ndim: 1 or 2) – The atom indices in each component to compute distances from.
step (int) – Number of structures/frames to skip between each analyzed frame.

Returns:

numpy.ndarray – self.bins: \(r\) as the midpoint of each histogram bin.
numpy.ndarray – self.results: \(g(r)\) with respect to bins.

Examples

Suppose we want to compute the \(O_{w}\)-\(O_{m}\) RDF where \(O_{w}\) and \(O_{m}\) are the oxygen atoms of water and methanol, respectively. We can define our system as such.

>>> import numpy as np
>>> Z = np.array([8, 1, 1, 8, 1, 6, 1, 1, 1]*25)
>>> entity_ids = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1]*25)
>>> comp_ids = np.array(['h2o', 'meoh']*25)
>>> cell_vectors = np.array([[16., 0., 0.], [0., 16., 0.], [0., 0., 16.]])

Note

The above information is an arbitrary system made up of 25 water and 25 methanol molecules. They are contained in a 16 Angstrom periodic box where the atoms are always in the same order: water, methanol, water, methanol, water, etc.

This information completely specifies our system to prepare for computing the RDF. We initialize our object with this information.

>>> from mbgdml.analysis.rdf import RDF
>>> rdf = RDF(Z, entity_ids, comp_ids, cell_vectors)

From here we need to specify what RDF to compute with rdf.run(). We assume the Cartesian coordinates for R are already loaded as a 3D array or memory-map.

The last two pieces of information are comp_id_pair and entity_idxs. comp_id_pair specifies what components or species we want to compute our RDF with respect to. In this example, we want \(O_{w}\)-\(O_{m}\). entity_idxs specifies which atom in each entity to use. Oxygen is the first atom in both water and methanol (i.e., index of 0).

>>> comp_id_pair = ('h2o', 'meoh')
>>> entity_idxs = (0, 0)

We can then compute our RDF!

>>> bins, gr = rdf.run(R, comp_id_pair, entity_idxs)

Notes

inter_only only comes into play when there is a chance of also computing intramolecular distances during the RDF calculation. Take the hydroxyl OH RDF in a pure methanol simulation for instance. Our comp_id_pair and entity_idxs would be ('meoh', 'meoh') and (0, 1). The O-H intramolecular bond distance would be a perfectly valid atom pair. Usually we are interested in intermolecular distances. inter_only controls whether intramolecular distances are included (inter_only = False) or not (inter_only = True).

entity_idxs can specify one or more atoms for each component. For example, if you wanted to compute the OH RDF of pure water you could use entity_idxs = (0, (1, 2)).

TODO: Support different cell sizes for each structure.

RDF#

`RDF`#