Recentering/orienting and removing water from the
trajectory
We must first process the raw trajectory to facilitate some of the later
analyses. The trajectrory was obtained from a simulation using periodic boundary
conditions (PBC), which means that it is possible for the solute (the protein)
to be partly out of the primary simulation box if we look at the raw trajectory.
By recentering the trajectory we move solvent molecules, according to the PBC,
so that the protein is in the center of the box in each frame. Furthermore we
then re-orient each frame so that the protein is superimposed on the coordinates
of the initial protein structure, thus removing overall protein
rotation/translation motions. For a very flexible system the removal of overall
rotation may be a non-trivial task, but grx1 is compact and quite rigid so we
base the superposition on all atoms in the protein.
You run the job in this way:
$CHARMMEXEC < orient-recenter.inp > orient-recenter.out
The script uses the CHARMM MERGE command (dynamc.doc). The output file
contains numerous messages about and is quite large. Two new trajectory files
are produced, both of which have the protein translation/rotation removed. One
has the solvent recentered around the protein, and the other has the solvent
completely removed. MERGE can also be used to join or split trajectory files or
to change (reduce) sampling frequency.
Note that this input file (as do all the other examples in this tutorial)
starts by "calling" (STREam, see miscom.doc) another file, init.str, which
contains all commands necessary to setup this system:
* Open&read rtf,param,psf,crd for grx1 (1 ns part of full simulation)
* Open "raw" trajectory on unit 51,
* recentered/oriented trajectory on unit 61
*
read rtf card name top_all22_prot_cust.inp
read para card name par_all22_prot_3.inp
read psf card name grx1_solv.psf
read coor card name grx1_solv_min.crd
define SOLUTE sele segid grx1 end
set MXTIME 81
set NF 1
set SG grx1
set xtla 56.0
set xtlb 46.0
set xtlc 46.0
open read form unit 50 name grx1_solv.frm
open read unform unit 51 name grx1_solv.trj
open read unform unit 61 name grx1_solv_rc.trj
! TO USE THIS TRAJECTORY: DELETE ATOM SELE .NOT. SOLUTE END
open read unform unit 62 name grx1_prot_orient.trj
return
The idea behind this is that (most of) the analysis scripts can be used on
(not too) different systems without change - all that you have to do is to
substitute a new "init.str" with the appropriate information for the new system
to be analyzed.
Root Mean Square Deviation (RMSD) and Radius of Gyration
(RGYR)
Two simple characteristics of a structure are i) its RMSD wrt to
some reference structure (eg, the starting point of a simulation), which
tells us something about the dissimilarity between the structures - 1Å RMSD is
barely visible to the eye, and RMSD 10Å is very different, and ii) its
RGYR, which is a combined measure of its overall size and shape - a sphere has
the smallest RGYR of all bodies of the same volume (=0.6*sphere-radius);
increasing the volume, making the shape more anisotropic or having more of its
mass at the periphery all lead to an increased RGYR. The proper definition
of RGYR in mechanics is mass-weighted, which you can get by adding keywor dMASS
to the COOR RGYR command.
You run the job in this way:
$CHARMMEXEC < rmsd-rgyr.inp > rmsd-rgyr.out
This script uses the CORREL module (correl.doc) to extract the RMSD and RGYR
timeseries, and the results are output to one file. For single structures the
COOR command (corman.doc) is used. The results could have been output to one
file for each property, but we use the ability to edit the dimensions of the
extracted time series in CORREL (edit ... veccod) to put all the
information into one file.
Hydrogen Bonds
Hydrogen bonding patterns often provide useful information. In this exercise
we use the COOR HBOND (corman.doc) command to find hydrogen bonds in a single
structure, as well as average number of hydrogen bonds and their average life
times from a trajectory. Hydrogen bonds can be defined in many ways, but we use
a simple geometric criterion: A hydrogen bond exists if the distance between the
hydrogen and acceptor atoms is less than 2.4Å. This works well in practice. Note
that when we have information about the hydrogen position, the hydrogen bond
definition is not very sensitive to angular criteria, as are often used when
determining hydrogen bonds in X-ray structures (lacking hydrogens). We will look
at hydrogen bonds within the protein, between protein and water, and also on
water-mediated hydrogen bonding contacts between different parts of the protein,
ie hydrogen bonded interactions of the form A/D - water - A/D, where A/D
denotes a hydrogen bond donor or acceptor in the protein. COOR HBOND uses the
information about acceptors and donors in the PSF so we can use quite simple,
and general atom-selections; all hydrogen bonds between the acceptors/donors in
the two selections are calculated. A second form of the command (COOR CONTact)
just applies the distance criterion to all the selected atoms.
You run the job in this way:
$CHARMMEXEC < hbonds.inp > hbonds.out
The results are presented as hydrogen bonds per each acceptor/donor in the
first atom-selection; if the VERBose keyword is present a break-down in terms of
accptors/donors in the second atom-selection is given. NB! The VERBose keyword
is not useful in a CHARMM loop when you want to extract total number of
hydrogen bonds through CHARMM variables (subst.doc). <occupancy> is the average
number of hydrogen bonds formed by a given accptor/donor during the trajectory.
The <lifetime> (in ps) is the average duration of each instance of a given
hydrogen bond.
Secondary Structure
COOR SECS (corman.doc) computes the secondary structure characteristics of a
protein using the Kabsch&Sander algorithm (Kabsch, W., and Sander, C. (1983).
Dictionary of Protein Secondary Structure: Pattern Recognition of
Hydrogen-Bonded and Geometrical Features. Biopolymers 22, 2577-2637.),
which is based on backbone hydrogen bond patterns; this is the same as in the
DSSP program (almost... the CHARMM implementation is a slight simplification and
uses the hydrogen-acceptor-atom definition of a hydrogen bond).
You run the job in this way:
$CHARMMEXEC < secondary-structure.inp > secondary-structure.out
Results are summarized in the output file, and are also returned as a
numerical flag in the WMAIN array.
NMR Relaxation Rates and Generalized Order Parameters
MD simulations and NMR experiments often cover similar time-scales.
Relaxation phenomena observed in NMR experiments depend on motional behavior of
nuclear dipoles in the macromolecule, and the NMR module in CHARMM has been
designed to allow efficient extraction of NMR related parameters from a
trajectory (nmr.doc). This example analyzes relaxtaion parameters (relaxation
rates, generalized order parameters, configurational entropy estimates) for the
grx1 backbone amide groups.
You run the job in this way:
$CHARMMEXEC < nmr.inp > nmr.out
Results, with some details such as the computed correlation functions are
given in the output file, and are also summarized in the nh.dat file. Note that
we use the trajectory with overall protein (translation)/rotation removed - we
assume that internal and overall motions occur on very different time scales so
that we can confidently separate these motions and focus on the internal
motions.
Phi/Psi Distributions and Phi/Psi Based Clustering
Peptide backbone dihdrals phi and psi determine the overall fold of a
protein. In this example we first extract histograms of phi and psi
distribution from the trajectory for a number of residues (correl.doc) and then
use some of the dihdrals that vary during the simulation for clustering the
structures in the trajectory
You run the dihedral distribution job in this way (NB! this is the one
example that requires command-line arguments):
$CHARMMEXEC < phi-psi-dist.inp > phi-psi-dist.out NFIRST=2 NLAST=60
There are more than 60 residues in grx1, but some internal limit in CHARMM
restricts us to less than about 70 residues for a single CORREL analysis of
phi/psi dihedral; if you want data for all residues just run the job
several times with appropriate NFIRST and NLAST arguments (note that the very
first and last residues cannot be analyzed with this script, since the Phi and
Psi definitions go beyond one single residue).
After the TRAJ command in the CORREL module has been executed averages and
fluctuations are printed out for each time series; you can use this as a guide
to find dihedrals that are likely to be of interest (ie, they populate
different regions); primary candidates have large fluctuations. Phi and Psi
histograms for each residue are output to a file named phi-psi-n.dat, where "n"
is the residue number.
Having checked the results of the phi/psi distributions we may now try
clustering conformations (correl.doc):
$CHARMMEXEC < cluster.inp > cluster.out
Pair-wise RMSD Matrix and 2D Projection
In this exercise we compute the a matrix whose of RMSDs between all pairs of
N conformations in the trajectory, and then project the result onto a 2D plane,
represented by a set of coordinate pairs {pi,qi}, i=1,N,
with the property that the distance between (pi,qi) and
(pj,qj) is optimally close to the RMSD between conformers
i and j (Levitt, M. (1983). Molecular Dynamics of Native Protein.II. Analysis
and Nature of Motion. J Mol Biol 168, 621-657.). The CHARMM
implementation of this algorithm is for a 2D projection (easy to visualize), but
it is easily generalized to several dimensions.
You run the job in this way:
$CHARMMEXEC < rmsd-matrix.inp > rmsd-matrix.out
The 2D projection requires a non-linear optimization, and thus is dependent
on initial estimeates (generated using random numbers), and it is strongly
suggested to run the procedure several times, with different seeds (PQseed
keyword) for the random number generator in order to get a feel for the
stability and validity of the 2D projection.
Fluorescence Anisotropy
The motion of chromophores may be detected by following the time-dependence
of the polarized components of fluorescence emission following a brief pulse of
polarized excitation. This time dependent anisotropy can also be computed from
the (rank two) auto correlation of a unit vector along the transmission dipole
(or, more strictly, the corelation between absorption and emission dipoles). Trp
is the most useful intrinsic chromophore in proteins; Tyr also has a certain
fluorescence, but less intense than that of Trp. Extrinsic probes (dansyl
chloride and many others) are often used. The fast decay of this fluorescence
anisotropy means that we usually can decouple the internal motions from the
overall rotational diffusion of the protein (see also the nmr exercise).
You run the job in this way:
$CHARMMEXEC < trp-anisotropy.inp > trp-anisotropy.out
Solvent Dynamics and Structure
All cells live an aqueous environment and their insides (cytoplasm)
are also largely aqueous. Biomacromolecules thus have evolved to function with
this in "mind" (membrane proteins are of course different in this respect);
consquently CHARMM has well-developed facilities to analyze solvation behavior
(see also the Hydrogen Bonds exercise). Here we will use the COOR ANAL
(corman.doc) module to extract solvent dynamics (translational and rotational
diffusion) and structure in terms of radial distribution functions.
You run the job in this way:
$CHARMMEXEC < solvent.inp > solvent.out
Further Examples
The above exercises have been meant to show some of the analysis capabilities
of CHARMM. Most important is that you have to formulate your own scientifically
relevant questions, and then find a way to answer them. CHARMM is an evolving
research tool, and by combining its existing analysis facilites in various ways
using the scripting language you get a long way towards obtaining the data that
you need. More examples can be found at the CHARMM Web-site
www.charmm.org, in the Script Archive forum,
to which everybody is encouraged to submit their own scripts, for analysis,
structure manipulation, simulation or ....