MMTSB Tool Set - Analysis of replica exchange simulations

Objective and Overview

This tutorial will illustrate how to use the MMTSB Tool Set to analyze replica exchange simulations, including weighted-histogram analysis to obtain free energy maps.

In order to generate the input data for this tutorial it is necessary to run the peptide folding replica exchange tutorialfirst. It is assumed that the output from that tutorial is available. Start by going to the main replica exchange output directory.

1. Information about replica exchange simulation

The tool rexinfo.pl can provide various output related to the progress of the replica exchange simulation (rather than the sampled conformations which we will look at later).

For example, it is possible to examine how the temperature varies for a given replica with:
    cd rex
    rexinfo.pl -byclient aa1
Alternatively, you can be in your tutorial directory and use the command:
    rexinfo.pl -byclient aa1 -dir rex
"Client" means replica here, and for all-atom replica exchange simulations, the replicas are named aa1 through aaN. The output from the above command contains the temperature in the fifth column, and the internal energy in the sixth column. The RMSD with respect to a reference structure may be available in the seventh column if a reference structure was provided during the replica exchange run.

Extract the variation of temperature for each of the four replicas and plot the results to examine whether each of the replicas samples the entire range of temperatures (as it should).

Another way to query the replica exchange output is to look at which clients are at the lowest temperature at each cycle:
    
  rexinfo.pl -inx 1:100 -bytemp 300    
The output from this command is very similar as in the previous case. Since there are 20,000 cycles if the replica exchange simulation from the previous tutorial is complete, we are limiting the output here to the first 100 cycles. You will notice that the temperature in column 5 is not exactly 300K. This is because we have spaced temperatures exponentially between 275 and 500K and the temperature closest to 300K (299.5K) is picked automatically.

Often, it is most interesting to find out which structures are sampled most at the lowest temperature as these conformations have the lowest free energies. The following command examines the last 100 cycles and determines at which temperature different replicas spend most of heir time:
    
  rexinfo.pl -ranklast 100    
The output from this command shows the average rank for each replica in terms of temperature (with 1 corresponding to the lowest temperature) as well as the percentage of time spent at the lowest temperature. The conformations from the replica with the lowest average rank and largest percentage spent at the lowest temperature are most favorable at the lowest temperature.

2. Extraction of single conformations

The sampled conformations (saved at the end of each replica exchange cycle) are stored in a subdirectory for each replica (aa1 to aaN). If the archive parameter was specified during the replica exchange run, all of the conformations are stored in a single file (prod.coor.archive). They can be accessed with readArchive.pl. An example of how to use this tool is given in the following:
  readArchive.pl -inx 100 -xtract -pdb aa1/final.pdb aa1/prod.coor.archive
This command extracts the conformation after the 100ths cycle from replica aa1. The extracted structure is written out in PDB format to a file set100.pdb.

Use this command in combination with rexinfo.pl to extract the structure from the most favorable replica at the lowest temperature after 10,000 cycles and compare with the most favorable structures after 20,000 cycles.

3. Analysis of multiple conformations

The readArchive.pl tool can also be used to analyze a number of conformations from a given replica. For example, the following command calculates the radius of gyration for the conformations that are sampled by the first replica during the first 200 cycles:
    
  readArchive.pl -inx 1:200 -apply rgyr.pl \
                   -pdb aa1/final.pdb aa1/prod.coor.archive    
Because readArchive.pl operates on single archive files, it cannot easily analyze properties as a function of temperature instead of replica. A different, more general tool, rexanalysis.pl, is available for that purpose. It is used as in the following example:
    
  rexanalysis.pl -inx 1:200 -bytemp 300 -apply rgyr.pl    
In this example, the radius of gyration is calculated for the conformations from different replicas that are found at 300K during the last 1,000 cycles.

As in the previous readArchive.pl command the radius of gyration calculation is performed with the external rgyr.pl command. In fact, any program or script could be used instead. This provides a lot of flexibility but it is not particularly efficient for the analysis of large replica exchange simulations. A more efficient option is to provide a perl function analyze that takes a Molecule object as an argument and returns a value.

In the radius of gyration calculation example such a function would look simply like this (similar to what the rgyr.pl script does):
    
  sub analyze {     
    my $mol=shift;     
    return &Analyze::radiusOfGyration($mol,0);    
  }        
  1;    
Enter this piece of perl code into a file named rgyr.function and rerun the replica exchange analysis tool with the following options:
    
  rexanalysis.pl -inx 1:200 -bytemp 300 -function rgyr.function      
This command should give essentially the same output at a much faster speed because the time for starting an external program for each conformation is saved.

4. 2D WHAM free energy analysis

A powerful method for analyzing the results from replica exchange simulations is weighted-histogram analysis that is most convenient and informative when based on two reaction coordinates that are appropriate for distinguishing the sampled conformations.

In order to prepare the input for the WHAM analysis we need the values of the two reaction coordinates for each temperature at each replica exchange cycle that is being used for analysis. The rexanalysis.pl tool generates such data if a specific temperature is not given:
    
  rexanalysis.pl -inx 10000:20000 -step 20 -function rgyr.function > rgyr.data    
In this case we will skip the first 10,000 cycles that we will consider equilibration. We are also only using every 20th conformation to limit the amount of data we are generating. Use rexanalysis.pl in combination with genseq.pl to also generate the percentage of helicity as a second reaction coordinate. While there are different ways to complete this task it might be easiest to write another function based on genseq.pl that calculates the helicity in the following command:

  rexanalysis.pl -inx 10000:20000 -step 20 -function helicity.function > helicity.data    
Note: writing this function will take some time, but should give you greater familiarity with the data structures of the MMTSB toolset. It may be helpful to consider what elements or specific functions in the genseq.pl script need to be invoked to compute helicity. In addition, given the standard output from genseq.pl, one should consider what post-processing of this output will be required to concatenate the relevant information.

Once we have the data for the reaction coordinates we can run the WHAM analysis with the following command:
    
  rexanalysis.pl -inx 10000:20000 -step 20 \
                 -wham rgyr:rgyr.data:5:15:20=helicity:helicity.data:0:1:5 \
                 -whamtemp 300:400    
The arguments to the -wham option describes the reaction coordinates, including the desired range and binning for the histogram generation. The temperatures at which the PMF surfaces are generated are given with -whamtemp. It is important to exactly match the replica indices and step size (given with -inx and -step) to what was used when the data files were generated so that the data is matched correctly to the energies during the WHAM analysis.

As a result of this command two output files are generated in the current directory containing the two-dimensional PMF profiles at 300K (rgyr_helicity.300.pmf) and 400K (rgyr_helicity.400.pmf).

The profiles can be plotted with gnuplot using the following commands:
    
  set cntrparam levels discr 0,0.5,1,1.5,2,3,5    
  set dgrid3d 5,20,2    
  set contour base 
  unset surface
  set view 0,0,1.2  
  splot "rgyr_helicity.300.pmf" us 1:2:3 w lines    
Compare the surfaces between 300 and 400K. What do you learn from this?

Written by M. Feig