Trajectory Data — colvarsfinder.utils
- Author:
Wei Zhang
- Year:
2022
- Copyright:
GNU Public License v3
This module implements the function integrate_md_langevin()
and integrate_sde_overdamped()
for sampling trajectory data of (molecular) dynamical systems, the function calc_weights()
for calculating weights of the states, and the class WeightedTrajectory
for holding trajectory information.
Typical usage
The main purpose of this module is to prepare training data for training tasks in the module colvarsfinder.core
.
Assume that the invariant distribution of the system is \(\mu\) at temperature \(T\).
One can use this module in the following two steps.
Run
integrate_md_langevin()
orintegrate_sde_overdamped()
to sample states \((x_l)_{1\le l \le n}\) of the (biased) system at a possibly higher temperature \(T_{sim} \ge T\);Use
calc_weights()
to generate a CSV file that contains the weights \((v_l)_{1\le l \le n}\) of the sampled states.
The weights are calculated in a way such that one can approximate the distribution \(\mu\) using the biased data \((x_l)_{1\le l \le n}\) by
for test functions \(f\), where \(d\) is the dimension of the system.
Using the states and weights generated in the above two steps, one can construct the class WeightedTrajectory
, which can then be passed on to the training task classes in the module colvarsfinder.core
.
Classes
- class colvarsfinder.utils.WeightedTrajectory(universe=None, input_ag=None, traj_filename=None, weight_filename=None, min_w=0.0, max_w=inf, verbose=True)[source]
Class that stores trajectory data assoicated to an MDAnalysis Universe.
- Parameters:
universe (
MDAnalysis.core.universe.Universe
) – a MDAnalysis Universe that contains trajectory datainput_ag (
MDAnalysis.core.groups.AtomGroup
) – atom group used for input. All atoms are selected, if it is none. This argument is relevent only when universe is not None.traj_filename (str) – filename of a text file that contains trajectory data
weight_filename (str) – filename of a CSV file that contains weights of trajectory
min_w (float) – minimal value of weights below which the corresponding states will be discarded
max_w (float) – maximal value of weights above which the corresponding states will be discarded
verbose (bool) – print more information if ture
Note
Trajectory data will be loaded from universe if it is not None (for MD systems). Otherwise, trajectory data will be loaded from the text file specified by traj_filename (for trajectory data of general stochastic systems).
When loading trajectory data from text file, each line of the file should contain \(d+1\) floats, separated by space. The first float is the current time, and the remaining \(d\) floats are coordinates of the states.
Note
Weights are loaded from a CSV file if a filename is provided. They are normalized such that its mean value is one. Then, states in the trajectory data whose weights are not within [min_w, max_w] will be discarded, and weights of the remaining states are normalized again to have mean value one.
All weights will be set to one, if weight_filename is None.
- Raises:
FileNotFoundError – if univsere is None and traj_filename does not point to an existing file.
ValueError – if the lengths of trajectory data and weights are not the same.
- trajectory
states in the trajectory. When the trajectory is loaded from a universe, its shape is \([n, N, 3]\), where \(n\) =n_frames is the number of states, and \(N\) is the number of atoms of the system. When the trajectory is loaded from a text file, its shape is \([n, d]\), where \(d\) is system’s dimension.
- Type:
numpy array
- n_frames
number of states in the trajectory
- Type:
int
- weights
weights of states
- Type:
1d numpy array
- dt
time between two consecutive states. The unit is ns for MD systems.
- Type:
float
Functions
- colvarsfinder.utils.integrate_md_langevin(pdb, system, integrator, n_steps, sampling_output_path, pre_steps=0, traj_dcd_filename='traj.dcd', csv_filename='output.csv', report_interval=100, report_interval_stdout=100, plumed_script=None)[source]
Generate trajectory data by integrating Langevin dynamics using OpenMM.
- Parameters:
pdb (
openmm.app.pdbfile.PDBFile
) – PDB filesystem (
openmm.openmm.System
) – system to simulateintegrator (
openmm.openmm.Integrator
) – integrator used to simulate the systemn_steps (int) – total number of steps to integrate
sampling_output_path (str) – directory to save results
pre_steps (int) – number of warm-up steps to run before integrating the system for n_steps
traj_dcd_filename (str) – filename of the DCD file to save trajectory
csv_filename (str) – filename of the CSV file to store statistics
report_interval (int) – how often to write trajectory to DCD file and to write statistics to CSV file
report_interval_stdout (int) – how often to print to stdout
plumed_script (str) – script of PLUMED commands
This class encloses commands to simulate a Langevin dynamics using OpenMM package.
Note
The first line of the CSV file contains titles of the output statistics. Each of the following lines records the time, potential energy, total energy, and temperature of the system, separated by a comma. An example of the output CSV file is given below.
Optionally, by taking advantage of the OpenMM PLUMED plugin, a PLUMED script can be provided, either to record statistics or even to add biasing forces.
Example
Below is an example of the CSV file containing statistics recorded during simulation.
#"Time (ps)","Potential Energy (kJ/mole)","Total Energy (kJ/mole)","Temperature (K)" 1.9999999999998905,-20.050460354856185,40.50556969779072,285.61632731254656 2.9999999999997806,-15.15398066696433,84.48038169480243,469.9317413501974 3.9999999999996705,15.302661169416211,121.54570823139409,501.1020187082061 5.000000000000004,12.581352923170044,96.93859523113919,397.87624303095436 6.000000000000338,-12.222791961491907,69.45248734168707,385.22659570846105 7.000000000000672,-16.41391301364837,91.95546048082197,511.13097116410677 8.000000000001005,12.60815636162124,79.3336199461453,314.71484888738695 ...
- colvarsfinder.utils.integrate_sde_overdamped(pot_obj, n_steps, sampling_output_path, X0=None, pre_steps=0, step_size=0.01, traj_txt_filename='traj.txt', csv_filename='output.csv', report_interval=100, report_interval_stdout=100)[source]
Generate trajectory data by integrating overdamped Langevin dynamics using Euler-Maruyama scheme.
- Parameters:
pot_obj – class that specifies potential \(V\)
n_steps (int) – total number of steps to integrate
sampling_output_path (str) – directory to save results
X0 (1d numpy array of float) – initial position. Random initial position will be used if it is None.
pre_steps (int) – number of warm-up steps to run before integrating the system
step_size (float) – step-size to integrate SDE, unit: dimensionless
traj_txt_filename (str) – filename of the text file to save trajectory
csv_filename (str) – filename of the CSV file to store statistics
report_interval (int) – how often to write trajectory to text file and to write statistics to CSV file
report_interval_stdout (int) – how often to print to stdout
Note
pot_obj is an object of a class which has attributes dim (int), beta (positive real), and memeber functions V and gradV.
The first line of the CSV file (output) contains titles of the output statistics. Each of the following lines records the time and energy of the system, separated by a comma. An example of the output CSV file is given below.
Example
Below is an example of pot_obj.
class mypot(object): def __init__(self): self.dim = 2 self.beta = 1.0 def V(self, x): return 0.5 * x[0]**2 + 2.0 * x[1]**2 def gradV(self, x): return np.array([x[0], 4.0 * x[1]]) pot_obj = mypot()
Example
Below is an example of the CSV file containing statistics recorded during simulation.
Time,Energy 0.0,5.423187853452168 1.0,6.17342106224266 2.0,1.1311978694547915 3.0,0.8239473412429524 4.0,0.02854608581728689 5.0,1.327977569243032 6.0,5.452589054795339 7.0,3.61438132406788 8.0,2.297568687904894 ...
- colvarsfinder.utils.calc_weights(csv_filename, sampling_beta, sys_beta, traj_weight_filename='weights.txt', energy_col_idx=1)[source]
Calculate weights of the trajectory data.
- Parameters:
csv_filename (str) – filename of a CSV file generated by
integrate_md_langevin()
sampling_beta (float) – \(\beta_{sim}\)
sys_beta (float) – \(\beta_{sys}\)
traj_weight_filename (str) – filename to output the weights of trajectory
energy_col_idx (int) – the index of the column of the CSV file, based on which the weights will be calculated.
This function uses certain column (specified by energy_col_idx) of the CSV file (specified by csv_filename) as energy, and computes the weights simply by
\[v_i = \frac{1}{Z} \mathrm{e}^{-(\beta_{sys}-\beta_{sim}) V_i}\,,\]where \(i=1,\dots, l\) is the index of states, \(V_i\) is the energy value, and \(Z\) is the normalizing constant such that the mean value of the weights are one.
Example
Below is an example of the file generated by
calc_weights()
:0.8979287896010564 0.5424829921777613 0.02360908666276056 0.03124009081426178 0.4012103169413903 0.617590719346622 0.031154032337133607 0.299688734996611 0.03230412279837258 ...