Trajectory Data — colvarsfinder.utils

Author:

Wei Zhang

Year:

2022

Copyright:

GNU Public License v3

This module implements the function integrate_md_langevin() and integrate_sde_overdamped() for sampling trajectory data of (molecular) dynamical systems, the function calc_weights() for calculating weights of the states, and the class WeightedTrajectory for holding trajectory information.

Typical usage

The main purpose of this module is to prepare training data for training tasks in the module colvarsfinder.core. Assume that the invariant distribution of the system is \(\mu\) at temperature \(T\). One can use this module in the following two steps.

  1. Run integrate_md_langevin() or integrate_sde_overdamped() to sample states \((x_l)_{1\le l \le n}\) of the (biased) system at a possibly higher temperature \(T_{sim} \ge T\);

  2. Use calc_weights() to generate a CSV file that contains the weights \((v_l)_{1\le l \le n}\) of the sampled states.

The weights are calculated in a way such that one can approximate the distribution \(\mu\) using the biased data \((x_l)_{1\le l \le n}\) by

\[\int_{\mathbb{R}^{d}} f(x) \mu(dx) \approx \frac{\sum_{l=1}^n v_l f(x_l)}{\sum_{l=1}^n v_l}\,,\]

for test functions \(f\), where \(d\) is the dimension of the system.

Using the states and weights generated in the above two steps, one can construct the class WeightedTrajectory, which can then be passed on to the training task classes in the module colvarsfinder.core.

Classes

class colvarsfinder.utils.WeightedTrajectory(universe=None, input_ag=None, traj_filename=None, weight_filename=None, min_w=0.0, max_w=inf, verbose=True)[source]

Class that stores trajectory data assoicated to an MDAnalysis Universe.

Parameters:
  • universe (MDAnalysis.core.universe.Universe) – a MDAnalysis Universe that contains trajectory data

  • input_ag (MDAnalysis.core.groups.AtomGroup) – atom group used for input. All atoms are selected, if it is none. This argument is relevent only when universe is not None.

  • traj_filename (str) – filename of a text file that contains trajectory data

  • weight_filename (str) – filename of a CSV file that contains weights of trajectory

  • min_w (float) – minimal value of weights below which the corresponding states will be discarded

  • max_w (float) – maximal value of weights above which the corresponding states will be discarded

  • verbose (bool) – print more information if ture

Note

  1. Trajectory data will be loaded from universe if it is not None (for MD systems). Otherwise, trajectory data will be loaded from the text file specified by traj_filename (for trajectory data of general stochastic systems).

  2. When loading trajectory data from text file, each line of the file should contain \(d+1\) floats, separated by space. The first float is the current time, and the remaining \(d\) floats are coordinates of the states.

Note

  1. Weights are loaded from a CSV file if a filename is provided. They are normalized such that its mean value is one. Then, states in the trajectory data whose weights are not within [min_w, max_w] will be discarded, and weights of the remaining states are normalized again to have mean value one.

  2. All weights will be set to one, if weight_filename is None.

Raises:
  • FileNotFoundError – if univsere is None and traj_filename does not point to an existing file.

  • ValueError – if the lengths of trajectory data and weights are not the same.

trajectory

states in the trajectory. When the trajectory is loaded from a universe, its shape is \([n, N, 3]\), where \(n\) =n_frames is the number of states, and \(N\) is the number of atoms of the system. When the trajectory is loaded from a text file, its shape is \([n, d]\), where \(d\) is system’s dimension.

Type:

numpy array

n_frames

number of states in the trajectory

Type:

int

weights

weights of states

Type:

1d numpy array

dt

time between two consecutive states. The unit is ns for MD systems.

Type:

float

Functions

colvarsfinder.utils.integrate_md_langevin(pdb, system, integrator, n_steps, sampling_output_path, pre_steps=0, traj_dcd_filename='traj.dcd', csv_filename='output.csv', report_interval=100, report_interval_stdout=100, plumed_script=None)[source]

Generate trajectory data by integrating Langevin dynamics using OpenMM.

Parameters:
  • pdb (openmm.app.pdbfile.PDBFile) – PDB file

  • system (openmm.openmm.System) – system to simulate

  • integrator (openmm.openmm.Integrator) – integrator used to simulate the system

  • n_steps (int) – total number of steps to integrate

  • sampling_output_path (str) – directory to save results

  • pre_steps (int) – number of warm-up steps to run before integrating the system for n_steps

  • traj_dcd_filename (str) – filename of the DCD file to save trajectory

  • csv_filename (str) – filename of the CSV file to store statistics

  • report_interval (int) – how often to write trajectory to DCD file and to write statistics to CSV file

  • report_interval_stdout (int) – how often to print to stdout

  • plumed_script (str) – script of PLUMED commands

This class encloses commands to simulate a Langevin dynamics using OpenMM package.

Note

  1. The first line of the CSV file contains titles of the output statistics. Each of the following lines records the time, potential energy, total energy, and temperature of the system, separated by a comma. An example of the output CSV file is given below.

  2. Optionally, by taking advantage of the OpenMM PLUMED plugin, a PLUMED script can be provided, either to record statistics or even to add biasing forces.

Example

Below is an example of the CSV file containing statistics recorded during simulation.

output.csv
#"Time (ps)","Potential Energy (kJ/mole)","Total Energy (kJ/mole)","Temperature (K)"
1.9999999999998905,-20.050460354856185,40.50556969779072,285.61632731254656
2.9999999999997806,-15.15398066696433,84.48038169480243,469.9317413501974
3.9999999999996705,15.302661169416211,121.54570823139409,501.1020187082061
5.000000000000004,12.581352923170044,96.93859523113919,397.87624303095436
6.000000000000338,-12.222791961491907,69.45248734168707,385.22659570846105
7.000000000000672,-16.41391301364837,91.95546048082197,511.13097116410677
8.000000000001005,12.60815636162124,79.3336199461453,314.71484888738695
...
colvarsfinder.utils.integrate_sde_overdamped(pot_obj, n_steps, sampling_output_path, X0=None, pre_steps=0, step_size=0.01, traj_txt_filename='traj.txt', csv_filename='output.csv', report_interval=100, report_interval_stdout=100)[source]

Generate trajectory data by integrating overdamped Langevin dynamics using Euler-Maruyama scheme.

Parameters:
  • pot_obj – class that specifies potential \(V\)

  • n_steps (int) – total number of steps to integrate

  • sampling_output_path (str) – directory to save results

  • X0 (1d numpy array of float) – initial position. Random initial position will be used if it is None.

  • pre_steps (int) – number of warm-up steps to run before integrating the system

  • step_size (float) – step-size to integrate SDE, unit: dimensionless

  • traj_txt_filename (str) – filename of the text file to save trajectory

  • csv_filename (str) – filename of the CSV file to store statistics

  • report_interval (int) – how often to write trajectory to text file and to write statistics to CSV file

  • report_interval_stdout (int) – how often to print to stdout

Note

  1. pot_obj is an object of a class which has attributes dim (int), beta (positive real), and memeber functions V and gradV.

  2. The first line of the CSV file (output) contains titles of the output statistics. Each of the following lines records the time and energy of the system, separated by a comma. An example of the output CSV file is given below.

Example

Below is an example of pot_obj.

class mypot(object):
    def __init__(self):
        self.dim = 2
        self.beta = 1.0

    def V(self, x):
        return 0.5 * x[0]**2 + 2.0 * x[1]**2

    def gradV(self, x):
        return np.array([x[0], 4.0 * x[1]])

pot_obj = mypot()

Example

Below is an example of the CSV file containing statistics recorded during simulation.

output.csv
Time,Energy
0.0,5.423187853452168
1.0,6.17342106224266
2.0,1.1311978694547915
3.0,0.8239473412429524
4.0,0.02854608581728689
5.0,1.327977569243032
6.0,5.452589054795339
7.0,3.61438132406788
8.0,2.297568687904894
...
colvarsfinder.utils.calc_weights(csv_filename, sampling_beta, sys_beta, traj_weight_filename='weights.txt', energy_col_idx=1)[source]

Calculate weights of the trajectory data.

Parameters:
  • csv_filename (str) – filename of a CSV file generated by integrate_md_langevin()

  • sampling_beta (float) – \(\beta_{sim}\)

  • sys_beta (float) – \(\beta_{sys}\)

  • traj_weight_filename (str) – filename to output the weights of trajectory

  • energy_col_idx (int) – the index of the column of the CSV file, based on which the weights will be calculated.

This function uses certain column (specified by energy_col_idx) of the CSV file (specified by csv_filename) as energy, and computes the weights simply by

\[v_i = \frac{1}{Z} \mathrm{e}^{-(\beta_{sys}-\beta_{sim}) V_i}\,,\]

where \(i=1,\dots, l\) is the index of states, \(V_i\) is the energy value, and \(Z\) is the normalizing constant such that the mean value of the weights are one.

Example

Below is an example of the file generated by calc_weights():

weights.txt
0.8979287896010564
0.5424829921777613
0.02360908666276056
0.03124009081426178
0.4012103169413903
0.617590719346622
0.031154032337133607
0.299688734996611
0.03230412279837258
...