Trajectory Data — `colvarsfinder.utils`

Author:: Wei Zhang
Year:: 2022
Copyright:: GNU Public License v3

This module implements the function integrate_md_langevin() and integrate_sde_overdamped() for sampling trajectory data of (molecular) dynamical systems, the function calc_weights() for calculating weights of the states, and the class WeightedTrajectory for holding trajectory information.

Typical usage

The main purpose of this module is to prepare training data for training tasks in the module colvarsfinder.core. Assume that the invariant distribution of the system is \(\mu\) at temperature \(T\). One can use this module in the following two steps.

Run integrate_md_langevin() or integrate_sde_overdamped() to sample states \((x_l)_{1\le l \le n}\) of the (biased) system at a possibly higher temperature \(T_{sim} \ge T\);

Use calc_weights() to generate a CSV file that contains the weights \((v_l)_{1\le l \le n}\) of the sampled states.

The weights are calculated in a way such that one can approximate the distribution \(\mu\) using the biased data \((x_l)_{1\le l \le n}\) by

\[\int_{\mathbb{R}^{d}} f(x) \mu(dx) \approx \frac{\sum_{l=1}^n v_l f(x_l)}{\sum_{l=1}^n v_l}\,,\]

for test functions \(f\), where \(d\) is the dimension of the system.

Using the states and weights generated in the above two steps, one can construct the class WeightedTrajectory, which can then be passed on to the training task classes in the module colvarsfinder.core.

Classes

class colvarsfinder.utils.WeightedTrajectory(universe=None, input_ag=None, traj_filename=None, weight_filename=None, min_w=0.0, max_w=inf, verbose=True)[source]

Class that stores trajectory data assoicated to an MDAnalysis Universe.

Parameters:

universe (MDAnalysis.core.universe.Universe) – a MDAnalysis Universe that contains trajectory data
input_ag (MDAnalysis.core.groups.AtomGroup) – atom group used for input. All atoms are selected, if it is none. This argument is relevent only when universe is not None.
traj_filename (str) – filename of a text file that contains trajectory data
weight_filename (str) – filename of a CSV file that contains weights of trajectory
min_w (float) – minimal value of weights below which the corresponding states will be discarded
max_w (float) – maximal value of weights above which the corresponding states will be discarded
verbose (bool) – print more information if ture

Note

Trajectory data will be loaded from universe if it is not None (for MD systems). Otherwise, trajectory data will be loaded from the text file specified by traj_filename (for trajectory data of general stochastic systems).
When loading trajectory data from text file, each line of the file should contain \(d+1\) floats, separated by space. The first float is the current time, and the remaining \(d\) floats are coordinates of the states.

Note

Weights are loaded from a CSV file if a filename is provided. They are normalized such that its mean value is one. Then, states in the trajectory data whose weights are not within [min_w, max_w] will be discarded, and weights of the remaining states are normalized again to have mean value one.
All weights will be set to one, if weight_filename is None.

Raises:

FileNotFoundError – if univsere is None and traj_filename does not point to an existing file.
ValueError – if the lengths of trajectory data and weights are not the same.

trajectory

states in the trajectory. When the trajectory is loaded from a universe, its shape is \([n, N, 3]\), where \(n\) =n_frames is the number of states, and \(N\) is the number of atoms of the system. When the trajectory is loaded from a text file, its shape is \([n, d]\), where \(d\) is system’s dimension.

Type:: numpy array

n_frames

number of states in the trajectory

Type:: int

weights

weights of states

Type:: 1d numpy array

dt

time between two consecutive states. The unit is ns for MD systems.

Type:: float

Functions

colvarsfinder.utils.integrate_md_langevin(pdb, system, integrator, n_steps, sampling_output_path, pre_steps=0, traj_dcd_filename='traj.dcd', csv_filename='output.csv', report_interval=100, report_interval_stdout=100, plumed_script=None)[source]

Generate trajectory data by integrating Langevin dynamics using OpenMM.

Parameters:

pdb (openmm.app.pdbfile.PDBFile) – PDB file
system (openmm.openmm.System) – system to simulate
integrator (openmm.openmm.Integrator) – integrator used to simulate the system
n_steps (int) – total number of steps to integrate
sampling_output_path (str) – directory to save results
pre_steps (int) – number of warm-up steps to run before integrating the system for n_steps
traj_dcd_filename (str) – filename of the DCD file to save trajectory
csv_filename (str) – filename of the CSV file to store statistics
report_interval (int) – how often to write trajectory to DCD file and to write statistics to CSV file
report_interval_stdout (int) – how often to print to stdout
plumed_script (str) – script of PLUMED commands

This class encloses commands to simulate a Langevin dynamics using OpenMM package.

Note

The first line of the CSV file contains titles of the output statistics. Each of the following lines records the time, potential energy, total energy, and temperature of the system, separated by a comma. An example of the output CSV file is given below.
Optionally, by taking advantage of the OpenMM PLUMED plugin, a PLUMED script can be provided, either to record statistics or even to add biasing forces.

Example

Below is an example of the CSV file containing statistics recorded during simulation.

output.csv

#"Time (ps)","Potential Energy (kJ/mole)","Total Energy (kJ/mole)","Temperature (K)"
9999999999998905,-20.050460354856185,40.50556969779072,285.61632731254656
9999999999997806,-15.15398066696433,84.48038169480243,469.9317413501974
9999999999996705,15.302661169416211,121.54570823139409,501.1020187082061
000000000000004,12.581352923170044,96.93859523113919,397.87624303095436
000000000000338,-12.222791961491907,69.45248734168707,385.22659570846105
000000000000672,-16.41391301364837,91.95546048082197,511.13097116410677
000000000001005,12.60815636162124,79.3336199461453,314.71484888738695
...

colvarsfinder.utils.integrate_sde_overdamped(pot_obj, n_steps, sampling_output_path, X0=None, pre_steps=0, step_size=0.01, traj_txt_filename='traj.txt', csv_filename='output.csv', report_interval=100, report_interval_stdout=100)[source]

Generate trajectory data by integrating overdamped Langevin dynamics using Euler-Maruyama scheme.

Parameters:

pot_obj – class that specifies potential \(V\)
n_steps (int) – total number of steps to integrate
sampling_output_path (str) – directory to save results
X0 (1d numpy array of float) – initial position. Random initial position will be used if it is None.
pre_steps (int) – number of warm-up steps to run before integrating the system
step_size (float) – step-size to integrate SDE, unit: dimensionless
traj_txt_filename (str) – filename of the text file to save trajectory
csv_filename (str) – filename of the CSV file to store statistics
report_interval (int) – how often to write trajectory to text file and to write statistics to CSV file
report_interval_stdout (int) – how often to print to stdout

Note

pot_obj is an object of a class which has attributes dim (int), beta (positive real), and memeber functions V and gradV.
The first line of the CSV file (output) contains titles of the output statistics. Each of the following lines records the time and energy of the system, separated by a comma. An example of the output CSV file is given below.

Example

Below is an example of pot_obj.

class mypot(object):
    def __init__(self):
        self.dim = 2
        self.beta = 1.0

    def V(self, x):
        return 0.5 * x[0]**2 + 2.0 * x[1]**2

    def gradV(self, x):
        return np.array([x[0], 4.0 * x[1]])

pot_obj = mypot()

Example

Below is an example of the CSV file containing statistics recorded during simulation.

output.csv

Time,Energy
0,5.423187853452168
0,6.17342106224266
0,1.1311978694547915
0,0.8239473412429524
0,0.02854608581728689
0,1.327977569243032
0,5.452589054795339
0,3.61438132406788
0,2.297568687904894
...

colvarsfinder.utils.calc_weights(csv_filename, sampling_beta, sys_beta, traj_weight_filename='weights.txt', energy_col_idx=1)[source]

Calculate weights of the trajectory data.

Parameters:

csv_filename (str) – filename of a CSV file generated by integrate_md_langevin()
sampling_beta (float) – \(\beta_{sim}\)
sys_beta (float) – \(\beta_{sys}\)
traj_weight_filename (str) – filename to output the weights of trajectory
energy_col_idx (int) – the index of the column of the CSV file, based on which the weights will be calculated.

This function uses certain column (specified by energy_col_idx) of the CSV file (specified by csv_filename) as energy, and computes the weights simply by

\[v_i = \frac{1}{Z} \mathrm{e}^{-(\beta_{sys}-\beta_{sim}) V_i}\,,\]

where \(i=1,\dots, l\) is the index of states, \(V_i\) is the energy value, and \(Z\) is the normalizing constant such that the mean value of the weights are one.

Example

Below is an example of the file generated by calc_weights():

weights.txt

8979287896010564
5424829921777613
02360908666276056
03124009081426178
4012103169413903
617590719346622
031154032337133607
299688734996611
03230412279837258
...

Trajectory Data — colvarsfinder.utils

Classes

Functions

Trajectory Data — `colvarsfinder.utils`