This tutorial will illustrate how to use the MMTSB Tool Set to analyze
replica exchange simulations, including weighted-histogram analysis to
obtain free energy maps.
In order to generate the input data for this tutorial it is necessary
to run the peptide folding
replica exchange tutorialfirst.
It is assumed that the output from that tutorial is available. Start by going to the
main replica exchange output directory.
1. Information about replica exchange simulation
The tool
rexinfo.pl can provide various output related
to the progress of the replica exchange simulation (rather than
the sampled conformations which we will look at later).
For example, it is possible to examine how the temperature varies
for a given replica with:
cd rex
rexinfo.pl -byclient aa1
Alternatively, you can be in your tutorial directory and use the command:
rexinfo.pl -byclient aa1 -dir rex
"Client" means replica here, and for all-atom replica exchange
simulations, the replicas are named aa1 through aaN.
The output from the above command contains the temperature in the
fifth column, and the internal energy in the sixth column. The RMSD
with respect to a reference structure may be available in the seventh
column if a reference structure was provided during the replica
exchange run.
Extract the variation of temperature for each of the four replicas
and plot the results to examine whether each of the replicas
samples the entire range of temperatures (as it should).
Another way to query the replica exchange output is to look
at which clients are at the lowest temperature at each cycle:
rexinfo.pl -inx 1:100 -bytemp 300
The output from this command is very similar as in the previous case.
Since there are 20,000 cycles if the replica exchange simulation from
the previous tutorial is complete, we are limiting the output here to
the first 100 cycles. You will notice that the temperature in column
5 is not exactly 300K. This is because we have spaced temperatures
exponentially between 275 and 500K and the temperature closest to
300K (299.5K) is picked automatically.
Often, it is most interesting to find out which structures are sampled
most at the lowest temperature as these conformations have the lowest
free energies. The following command examines the last 100 cycles and
determines at which temperature different replicas spend most of
heir time:
rexinfo.pl -ranklast 100
The output from this command shows the average rank for each replica
in terms of temperature (with 1 corresponding to the lowest temperature)
as well as the percentage of time spent at the lowest temperature.
The conformations from the replica with the lowest average rank and
largest percentage spent at the lowest temperature are most favorable
at the lowest temperature.
2. Extraction of single conformations
The sampled conformations (saved at the end of each replica exchange cycle) are
stored in a subdirectory for each replica (aa1 to aaN). If the
archive
parameter was specified during the replica exchange run, all of the conformations
are stored in a single file (
prod.coor.archive). They can be accessed with
readArchive.pl. An example of how to use this tool is given in the
following:
readArchive.pl -inx 100 -xtract -pdb aa1/final.pdb aa1/prod.coor.archive
This command extracts the conformation after the 100
ths cycle from replica aa1.
The extracted structure is written out in PDB format to a file
set100.pdb.
Use this command in combination with
rexinfo.pl to extract the structure from
the most favorable replica at the lowest temperature after 10,000 cycles and compare
with the most favorable structures after 20,000 cycles.
3. Analysis of multiple conformations
The
readArchive.pl tool can also be used to analyze a number of conformations
from a given replica. For example, the following command calculates the radius of
gyration for the conformations that are sampled by the first replica during the
first 200 cycles:
readArchive.pl -inx 1:200 -apply rgyr.pl \
-pdb aa1/final.pdb aa1/prod.coor.archive
Because
readArchive.pl operates on single archive files, it cannot
easily analyze properties as a function of temperature instead of replica.
A different, more general tool,
rexanalysis.pl, is available for that
purpose. It is used as in the following example:
rexanalysis.pl -inx 1:200 -bytemp 300 -apply rgyr.pl
In this example, the radius of gyration is calculated for the conformations
from
different replicas that are found at 300K during the last 1,000 cycles.
As in the previous
readArchive.pl command the radius of gyration calculation
is performed with the external
rgyr.pl command. In fact, any program or
script could be used instead. This provides a lot of flexibility but it is not
particularly efficient for the analysis of large replica exchange simulations. A more
efficient option is to provide a perl function
analyze that takes a
Molecule object as an argument and returns a value.
In the radius of gyration calculation example such a function would look simply like this
(similar to what the
rgyr.pl script does):
sub analyze {
my $mol=shift;
return &Analyze::radiusOfGyration($mol,0);
}
1;
Enter this piece of perl code into a file named
rgyr.function and rerun
the replica exchange analysis tool with the following options:
rexanalysis.pl -inx 1:200 -bytemp 300 -function rgyr.function
This command should give essentially the same output at a much faster speed because the time
for starting an external program for each conformation is saved.
4. 2D WHAM free energy analysis
A powerful method for analyzing the results from replica exchange simulations is
weighted-histogram analysis that is most convenient and informative when based
on two reaction coordinates that are appropriate for distinguishing the sampled
conformations.
In order to prepare the input for the WHAM analysis we need the values of the
two reaction coordinates for each temperature at each replica exchange cycle that
is being used for analysis. The
rexanalysis.pl tool generates such data
if a specific temperature is not given:
rexanalysis.pl -inx 10000:20000 -step 20 -function rgyr.function > rgyr.data
In this case we will skip the first 10,000 cycles that we will consider equilibration.
We are also only using every 20th conformation to limit the amount of data we are
generating. Use
rexanalysis.pl in combination with
genseq.pl to also generate
the percentage of helicity as a second reaction coordinate. While there are different ways to
complete this task it might be easiest to write another function based on
genseq.pl that
calculates the helicity in the following command:
rexanalysis.pl -inx 10000:20000 -step 20 -function helicity.function > helicity.data
Note: writing this function will take some time, but should give you greater familiarity with the
data structures of the MMTSB toolset. It may be helpful to consider what elements or specific
functions in the
genseq.pl script need to be invoked to compute helicity. In addition,
given the standard output from
genseq.pl, one should consider what post-processing of this
output will be required to concatenate the relevant information.
Once we have the data for the reaction coordinates we can run the WHAM analysis with
the following command:
rexanalysis.pl -inx 10000:20000 -step 20 \
-wham rgyr:rgyr.data:5:15:20=helicity:helicity.data:0:1:5 \
-whamtemp 300:400
The arguments to the
-wham option describes the reaction coordinates, including
the desired range and binning for the histogram generation. The temperatures at which the
PMF surfaces are generated are given with
-whamtemp. It is important to exactly
match the replica indices and step size (given with
-inx and
-step)
to what was used when the data files were generated so that the data is matched
correctly to the energies during the WHAM analysis.
As a result of this command two output files are generated in the current directory
containing the two-dimensional PMF profiles at 300K (
rgyr_helicity.300.pmf)
and 400K (
rgyr_helicity.400.pmf).
The profiles can be plotted with gnuplot using the following commands:
set cntrparam levels discr 0,0.5,1,1.5,2,3,5
set dgrid3d 5,20,2
set contour base
unset surface
set view 0,0,1.2
splot "rgyr_helicity.300.pmf" us 1:2:3 w lines
Compare the surfaces between 300 and 400K. What do you learn from this?