This tutorial explores the ensemble computing capabilities of the MMTSB Tool Set.
The example system is the WW domain from Yes-associated protein (Yap), for which a
structure has been reported in complex with a bound proline-rich peptide. We will
estimate the binding energy of this complex using molecular dynamics and MM/GB energy estimates.
The tutorial illustrates how to use the MMTSB Tool Set to run an MD simulation
of the protein-ligand complex, create an ensemble directory structure from the resulting
trajectory, and then use the ensemble analysis tools in the MMTSB Tool Set to calculate
an approximate binding energy.
Binding energies will be estimated following the MMPB/SA or MMGB/SA scheme. Using
the conformations sampled during a molecular dynamics simulation of the complex,
average energies are calculated separately for the complex, the receptor, and the
substrate (the bound peptide ligand). The binding energy can then be estimated from the difference:
ΔΔG(binding) = ΔG(complex) - ΔG(receptor) - ΔG(ligand)
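To make the bookkeeping concrete, here is a minimal Python sketch of the difference above. The energy values are invented purely for illustration and the function name is hypothetical; none of this is part of the MMTSB Tool Set. Because all three energies are evaluated on the same snapshots, averaging the per-snapshot differences gives the same number as subtracting the three averages.

```python
from statistics import mean


def binding_estimate(e_complex, e_receptor, e_substrate):
    """Average of the per-snapshot differences complex - receptor - substrate."""
    if not (len(e_complex) == len(e_receptor) == len(e_substrate)):
        raise ValueError("need one energy per snapshot for each species")
    per_snapshot = [
        c - r - s for c, r, s in zip(e_complex, e_receptor, e_substrate)
    ]
    return mean(per_snapshot)


# Made-up energies in kcal/mol for three snapshots (illustration only).
e_complex = [-1250.0, -1244.0, -1247.5]
e_receptor = [-980.0, -978.5, -979.0]
e_substrate = [-240.0, -236.5, -238.0]

estimate = binding_estimate(e_complex, e_receptor, e_substrate)

# Averaging is linear, so the difference of the averages is the same number.
difference_of_means = mean(e_complex) - mean(e_receptor) - mean(e_substrate)

print(f"average of differences: {estimate:.2f} kcal/mol")
print(f"difference of averages: {difference_of_means:.2f} kcal/mol")
```

This equivalence only holds when receptor and substrate energies come from the same snapshots as the complex, which is the situation in this tutorial.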
1. Initial system setup
Obtain (or copy) the experimental structure of the Yap WW domain from the
Protein Data Bank. The PDB code is 1JMQ.
Because the structure was solved by NMR spectroscopy, there are multiple models in the PDB file. Extract the first model with:
convpdb.pl -model 1 1JMQ.pdb > yapww.pdb
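For readers unfamiliar with multi-model PDB files, the following Python sketch shows the idea behind selecting one model: coordinate records sit between MODEL and ENDMDL records, and only the block with the requested number is kept. This is an illustration under that assumption, not the implementation of convpdb.pl, and the function name and toy records are hypothetical.

```python
def extract_model(pdb_lines, model_number=1):
    """Keep the coordinate records of a single MODEL block."""
    kept = []
    inside = False
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "MODEL":
            # The model serial number follows the record name.
            inside = int(line[6:].split()[0]) == model_number
        elif record == "ENDMDL":
            inside = False
        elif inside and record in ("ATOM", "HETATM", "TER"):
            kept.append(line)
    return kept


# Toy two-model file (illustration only, not taken from 1JMQ).
toy = [
    "MODEL        1",
    "ATOM      1  N   GLY A   1       0.000   0.000   0.000",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   GLY A   1       1.000   1.000   1.000",
    "ENDMDL",
]

first = extract_model(toy, 1)
print(len(first), "record(s) kept from model 1")
```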
Next, we will minimize the experimental structure with a distance-dependent dielectric
as a fast approximation of the solvent environment:
minCHARMM.pl -par dielec=rdie,epsilon=4,minsteps=50 \
-log min.log yapww.pdb > yapww.min.pdb
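To see why a distance-dependent dielectric acts as a crude stand-in for solvent screening, the Python sketch below compares the Coulomb energy of two unit charges for a constant dielectric of 1 and for a dielectric that grows as 4r, which is how dielec=rdie,epsilon=4 is interpreted here. The conversion factor of 332.0716 kcal Å/(mol e²) is, as far as we know, the value CHARMM uses; treat it and the function names as assumptions rather than code from the MMTSB Tool Set.

```python
COULOMB_KCAL = 332.0716  # kcal*Angstrom/(mol*e^2); assumed CHARMM value


def coulomb_cdie(qi, qj, r, epsilon=1.0):
    """Coulomb energy with a constant dielectric: E = k*qi*qj / (eps*r)."""
    return COULOMB_KCAL * qi * qj / (epsilon * r)


def coulomb_rdie(qi, qj, r, epsilon=4.0):
    """Coulomb energy with eps(r) = eps*r: E = k*qi*qj / (eps*r*r)."""
    return COULOMB_KCAL * qi * qj / (epsilon * r * r)


# Opposite unit charges at two separations (in Angstrom).
for r in (5.0, 10.0):
    e_const = coulomb_cdie(+1.0, -1.0, r)
    e_rdie = coulomb_rdie(+1.0, -1.0, r)
    print(f"r = {r:4.1f} A  constant: {e_const:8.3f}  rdie: {e_rdie:8.3f} kcal/mol")
```

The distance-dependent form falls off as 1/r² rather than 1/r, so long-range charge interactions are damped much more strongly, mimicking solvent screening at almost no extra cost.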
2. Molecular dynamics sampling
We are now ready to run a molecular dynamics simulation of the complex to obtain
conformational samples for the MMGB/SA analysis. Because no explicit solvent is
present, the simulation will be run with a generalized Born (GB) implicit solvent
model (GBMV). Note that this simulation is far too short for a production analysis
of this type, but it will suffice for the purposes of this tutorial.
mdCHARMM.pl -par gb,dynsteps=2000,dynoutfrq=100 -trajout yapww.dcd \
-log dynamics.log -final yapww.md.pdb yapww.min.pdb
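A quick back-of-the-envelope check shows how little sampling this run produces. The Python sketch below assumes that dynoutfrq is the number of steps between saved trajectory frames. The 2 fs time step is an assumption made only for this illustration, since the tutorial does not state it; the CHARMM log should show the value actually used.

```python
dynsteps = 2000    # total MD steps, from the command above
dynoutfrq = 100    # steps between saved frames (assumed meaning)
timestep_fs = 2.0  # ASSUMPTION: time step is not stated in this tutorial

n_frames = dynsteps // dynoutfrq
total_ps = dynsteps * timestep_fs / 1000.0

print(f"{n_frames} frames covering {total_ps:.1f} ps (under the assumed time step)")
```

Under these assumptions, the later averages rest on about twenty closely spaced snapshots.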
3. Extraction of ensemble data
For the subsequent analysis we will take advantage of the ensemble computing facility
in the MMTSB Tool Set. In order to use the ensemble computing tools we have to first
generate an ensemble directory structure from the trajectory file.
This can be done with the following command:
processDCD.pl -ensdir ens -ens complex yapww.md.pdb yapww.dcd
In this case the ensemble data is stored in a subdirectory 'ens' and the conformations
from the trajectory are available through the 'complex' tag.
Next, we will extract the substrate (chain P) and receptor (chain A) for each complex
into separate files for later analysis:
ensrun.pl -dir ens -new substrate complex convpdb.pl -chain P
ensrun.pl -dir ens -new receptor complex convpdb.pl -chain A
We now have three files for each snapshot: complex.pdb, receptor.pdb, and
substrate.pdb. You should find these files in any of the ensemble subdirectories.
You will see, however, that the files are automatically compressed to save space.
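The chain-based split performed above relies on the chain identifier stored in column 22 of each ATOM or HETATM record in the PDB format. The Python sketch below illustrates that selection on toy records. It is not how convpdb.pl is implemented, and the helper names are hypothetical.

```python
def select_chain(pdb_lines, chain_id):
    """Return ATOM/HETATM records whose chain identifier (column 22) matches."""
    return [
        line
        for line in pdb_lines
        if line.startswith(("ATOM", "HETATM"))
        and len(line) > 21
        and line[21] == chain_id  # column 22 is index 21 when counting from zero
    ]


def atom_line(serial, chain, resseq):
    """Build a toy fixed-column ATOM record for a CA atom (illustration only)."""
    return (
        f"ATOM  {serial:5d}  CA  GLY {chain}{resseq:4d}"
        "       0.000   0.000   0.000"
    )


# Toy complex: two receptor atoms (chain A) and one substrate atom (chain P).
toy_complex = [atom_line(1, "A", 1), atom_line(2, "A", 2), atom_line(3, "P", 1)]

receptor = select_chain(toy_complex, "A")
substrate = select_chain(toy_complex, "P")
print(len(receptor), "receptor record(s),", len(substrate), "substrate record(s)")
```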
4. Energy evaluation
We are now ready to evaluate the energies that we need for the binding free
energy estimate. First, we calculate the total energy for the complex:
ensrun.pl -dir ens -set dgcomplex:1 complex enerCHARMM.pl \
-par gb,nocut complex.pdb
Second, we will estimate the energies for the substrate and receptor alone:
ensrun.pl -dir ens -set dgreceptor:1 complex enerCHARMM.pl \
-par gb,nocut receptor.pdb
ensrun.pl -dir ens -set dgsubstrate:1 complex enerCHARMM.pl \
-par gb,nocut substrate.pdb
We can take a look at the results with getprop.pl:
getprop.pl -dir ens -prop dgcomplex,dgreceptor,dgsubstrate complex
... or combine the results to get the binding free energy for each snapshot:
getprop.pl -dir ens -prop dgcomplex-dgreceptor-dgsubstrate complex
... or obtain the average value:
getprop.pl -dir ens -score avg \
-prop dgcomplex-dgreceptor-dgsubstrate complex
The result is likely around -30 kcal/mol. That number seems fairly large in magnitude,
but our analysis so far has neglected the entropic effects related to the substrate
and receptor, which generally oppose binding (the entropic effects of the solvent
are included in the implicit solvent model).
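When reporting an average like this, it helps to quote its spread as well. The Python sketch below computes the mean, sample standard deviation, and standard error from a list of per-snapshot binding energies. The numbers are invented for illustration and are not results from this tutorial, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two snapshots to estimate a spread")
    sd = stdev(values)
    return mean(values), sd, sd / sqrt(n)


# Made-up per-snapshot binding energies in kcal/mol (illustration only).
per_snapshot = [-31.2, -28.4, -30.9, -29.5, -30.0]

avg, sd, sem = summarize(per_snapshot)
print(f"mean = {avg:.2f}, sd = {sd:.2f}, sem = {sem:.2f} kcal/mol")
```

Snapshots taken close together in a short trajectory are correlated, so a standard error computed this way should be read as a lower bound on the real uncertainty.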