RDF
#
- class mbgdml.analysis.rdf.RDF(Z, entity_ids, comp_ids, cell_vectors, bin_width=0.05, rdf_range=(0.0, 15.0), inter_only=True, use_ray=False, ray_address='auto', n_workers=1)[source]#
Handles calculating the radial distribution function (RDF), \(g(r)\), of a constant volume simulation.
- Parameters:
Z (
numpy.ndarray
, ndim:1
) – Atomic numbers of all atoms in the system.entity_ids (
numpy.ndarray
, ndim:1
) – Integers that specify which fragment each atom belongs to for all structures.comp_ids (
numpy.ndarray
, ndim:1
) – Labels for eachentity_id
used to determine the desired entity for RDF computations.cell_vectors (
numpy.ndarray
) – The three cell vectors.inter_only (
bool
, default:True
) – Only intermolecular distances are allowed. IfTrue
, atoms that have the sameentity_id
are ignored.use_ray (
bool
, default:False
) – Use ray to parallelize computations.n_workers (
int
, default:1
) – Total number of workers available for ray. This is ignored ifuse_ray
isFalse
.ray_address (
str
, default:"auto"
) – Ray cluster address to connect to.
- run(R, comp_id_pair, entity_idxs, step=1)[source]#
Perform the RDF computation.
- Parameters:
R (
numpy.ndarray
, ndim:3
) –Cartesian coordinates of all atoms in the system under periodic boundary conditions.
Tip
We recommend a memory-map for large
R
to reduce memory requirements.comp_id_pair (
tuple
) – The component ID of the entities to consider. For example,('h2o', 'h2o')
or('h2o', 'meoh')
.entity_idxs (
tuple
, ndim:1
or2
) – The atom indices in each component to compute distances from.step (
int
) – Number of structures/frames to skip between each analyzed frame.
- Returns:
numpy.ndarray
–self.bins
: \(r\) as the midpoint of each histogram bin.numpy.ndarray
–self.results
: \(g(r)\) with respect tobins
.
Examples
Suppose we want to compute the \(O_{w}\)-\(O_{m}\) RDF where \(O_{w}\) and \(O_{m}\) are the oxygen atoms of water and methanol, respectively. We can define our system as such.
>>> import numpy as np >>> Z = np.array([8, 1, 1, 8, 1, 6, 1, 1, 1]*25) >>> entity_ids = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1]*25) >>> comp_ids = np.array(['h2o', 'meoh']*25) >>> cell_vectors = np.array([[16., 0., 0.], [0., 16., 0.], [0., 0., 16.]])
Note
The above information is an arbitrary system made up of 25 water and 25 methanol molecules. They are contained in a 16 Angstrom periodic box where the atoms are always in the same order: water, methanol, water, methanol, water, etc.
This information completely specifies our system to prepare for computing the RDF. We initialize our object with this information.
>>> from mbgdml.analysis.rdf import RDF >>> rdf = RDF(Z, entity_ids, comp_ids, cell_vectors)
From here we need to specify what RDF to compute with
rdf.run()
. We assume the Cartesian coordinates forR
are already loaded as a 3D array or memory-map.The last two pieces of information are
comp_id_pair
andentity_idxs
.comp_id_pair
specifies what components or species we want to compute our RDF with respect to. In this example, we want \(O_{w}\)-\(O_{m}\).entity_idxs
specifies which atom in each entity to use. Oxygen is the first atom in both water and methanol (i.e., index of0
).>>> comp_id_pair = ('h2o', 'meoh') >>> entity_idxs = (0, 0)
We can then compute our RDF!
>>> bins, gr = rdf.run(R, comp_id_pair, entity_idxs)
Notes
inter_only
only comes into play when there is a chance of also computing intramolecular distances during the RDF calculation. Take the hydroxyl OH RDF in a pure methanol simulation for instance. Ourcomp_id_pair
andentity_idxs
would be('meoh', 'meoh')
and(0, 1)
. The O-H intramolecular bond distance would be a perfectly valid atom pair. Usually we are interested in intermolecular distances.inter_only
controls whether intramolecular distances are included (inter_only = False
) or not (inter_only = True
).entity_idxs
can specify one or more atoms for each component. For example, if you wanted to compute the OH RDF of pure water you could useentity_idxs = (0, (1, 2))
.TODO: Support different cell sizes for each structure.