MMTSB Tool Set - Ensemble computing

This tutorial explores the ensemble computing capabilities of the MMTSB Tool Set. The example system is the WW domain from Yes-associated protein (Yap), for which a structure in complex with a bound proline-rich peptide has been reported. We will estimate the binding energy of this complex from molecular dynamics sampling and MM/GB energy estimates.

The tutorial illustrates how to use the MMTSB Tool Set to run an MD simulation of the protein-ligand complex, create an ensemble directory structure from the resulting trajectory, and then use the ensemble analysis tools in the MMTSB Tool Set to calculate an approximate binding energy.

Binding energies will be estimated following the MMPB/SA or MMGB/SA scheme. From the conformations sampled during a molecular dynamics simulation of the complex, average energies are calculated separately for the complex, the receptor, and the substrate (the bound ligand). The binding energy can then be estimated from the difference:

ΔΔG(binding) = ΔG(complex) - ΔG(receptor) - ΔG(ligand)
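
To make the bookkeeping concrete, the following minimal Python sketch evaluates the difference above from three lists of per-snapshot energies. The function name `binding_estimate` and all energy values are hypothetical and chosen only for illustration; they are not output from the MMTSB Tool Set.

```python
from statistics import mean


def binding_estimate(e_complex, e_receptor, e_substrate):
    """Difference of ensemble-averaged energies, in the units of the inputs."""
    return mean(e_complex) - mean(e_receptor) - mean(e_substrate)


# Made-up energies (kcal/mol) for three snapshots, for illustration only
e_complex = [-1250.0, -1248.0, -1252.0]
e_receptor = [-1000.0, -1001.0, -999.0]
e_substrate = [-220.0, -219.0, -221.0]

print(binding_estimate(e_complex, e_receptor, e_substrate))
```

In the workflow below, the three lists correspond to energies evaluated for the complex, the receptor, and the substrate taken from the same set of snapshots.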

1. Initial system setup

Obtain a copy of the experimental structure of the YAP WW domain from the Protein Data Bank. The PDB code is 1JMQ.

Because the structure was solved by NMR spectroscopy there are multiple models in the PDB file. Extract the first model with:

    convpdb.pl -model 1 1JMQ.pdb > yapww.pdb

Next, we will minimize the experimental structure with a distance-dependent dielectric as a fast approximation of the solvent environment:

    minCHARMM.pl -par dielec=rdie,epsilon=4,minsteps=50 \
                 -log min.log yapww.pdb > yapww.min.pdb
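
As a rough illustration of what a distance-dependent dielectric does, the sketch below compares the Coulomb energy of two point charges with a constant dielectric and with a dielectric that grows linearly with distance. It assumes that `dielec=rdie,epsilon=4` corresponds to an effective dielectric of 4·r, as in CHARMM's RDIE option; the function names are hypothetical and this is not the code path used by minCHARMM.pl.

```python
CCELEC = 332.0716  # Coulomb constant in kcal*Angstrom/(mol*e^2), the value used by CHARMM


def coulomb_cdie(q1, q2, r, epsilon=1.0):
    """Coulomb energy (kcal/mol) with a constant dielectric epsilon; r in Angstrom."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return CCELEC * q1 * q2 / (epsilon * r)


def coulomb_rdie(q1, q2, r, epsilon=4.0):
    """Coulomb energy (kcal/mol) with dielectric epsilon*r (assumed RDIE form)."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return CCELEC * q1 * q2 / (epsilon * r * r)


# Opposite unit charges at increasing separation
for r in (2.0, 4.0, 8.0):
    print(r, coulomb_cdie(1.0, -1.0, r), coulomb_rdie(1.0, -1.0, r))
```

With the distance-dependent form the interaction falls off as 1/r² rather than 1/r, which crudely mimics the screening of long-range electrostatics by solvent.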

2. Molecular dynamics sampling

We are now ready to run a molecular dynamics simulation of the complex to obtain conformational samples for the MMGB/SA analysis. Because no explicit solvent is present, the simulation is run with a generalized Born (GB) implicit solvent model (GBMV). The simulation is far too short for a reliable analysis of this type, but it will suffice for the purposes of this tutorial.

    mdCHARMM.pl -par gb,dynsteps=2000,dynoutfrq=100 -trajout yapww.dcd \
                -log dynamics.log -final yapww.md.pdb yapww.min.pdb
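
The number of snapshots available for the later analysis follows from the two parameters above. The small sketch below assumes that `dynoutfrq` is the interval, in steps, at which frames are written to the trajectory file; the helper `n_frames` is hypothetical.

```python
def n_frames(dynsteps, dynoutfrq):
    """Number of frames saved if one frame is written every dynoutfrq steps."""
    if dynoutfrq <= 0:
        raise ValueError("dynoutfrq must be positive")
    return dynsteps // dynoutfrq


print(n_frames(2000, 100))
```

Under that assumption the run above yields 20 frames, which is a very small sample for averaging energies.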

3. Extraction of ensemble data

For the subsequent analysis we will take advantage of the ensemble computing facility in the MMTSB Tool Set. In order to use the ensemble computing tools we have to first generate an ensemble directory structure from the trajectory file. This can be done with the following command:

    processDCD.pl -ensdir ens -ens complex yapww.md.pdb yapww.dcd

In this case the ensemble data is stored in a subdirectory 'ens' and the conformations from the trajectory are available through the 'complex' tag.

Next, we will extract the substrate (chain P) and the receptor (chain A) from each snapshot of the complex into separate files for later analysis:

    ensrun.pl -dir ens -new substrate complex convpdb.pl -chain P
    ensrun.pl -dir ens -new receptor complex convpdb.pl -chain A
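
Conceptually, selecting a chain amounts to keeping the coordinate records whose chain identifier matches. The following Python sketch shows the idea using the fixed-column PDB format, where the chain identifier is in column 22. It is only an illustration with a hypothetical function name and hand-written records, not the implementation used by convpdb.pl.

```python
def select_chain(pdb_lines, chain_id):
    """Keep ATOM/HETATM records whose chain identifier (column 22) matches."""
    selected = []
    for line in pdb_lines:
        is_coordinate = line.startswith(("ATOM", "HETATM"))
        if is_coordinate and len(line) > 21 and line[21] == chain_id:
            selected.append(line)
    return selected


# Two hand-written records for illustration (not taken from 1JMQ)
example = [
    "ATOM      1  N   GLY A   1      11.104   6.134  -6.504  1.00  0.00",
    "ATOM      2  N   PRO P   1       1.000   2.000   3.000  1.00  0.00",
]

for record in select_chain(example, "P"):
    print(record)
```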

We now have three files for each snapshot: complex.pdb, receptor.pdb, and substrate.pdb. You should find these files in each of the ensemble subdirectories. Note, however, that the files are automatically compressed to save disk space.
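
If you want to inspect the snapshot files outside of the tool set, a short Python sketch such as the following can locate and read them. It makes no assumption about the exact subdirectory layout and simply searches below the ensemble directory; it does assume that compressed files are gzip files with a `.gz` suffix. The function names are hypothetical.

```python
import gzip
from pathlib import Path


def find_snapshots(ensdir, tag="complex"):
    """Return sorted paths of <tag>.pdb or <tag>.pdb.gz files below ensdir."""
    root = Path(ensdir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob(tag + ".pdb*") if p.is_file())


def read_pdb(path):
    """Read a PDB file as a list of lines, handling gzip compression if present."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return handle.read().splitlines()


if __name__ == "__main__":
    # 'ens' is the ensemble directory created with processDCD.pl above
    for snapshot in find_snapshots("ens", "substrate"):
        print(snapshot, len(read_pdb(snapshot)))
```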

4. Energy evaluation

We are now ready to evaluate the energies that we need for the binding free energy estimate. First, we calculate the total energy for the complex:

    ensrun.pl -dir ens -set dgcomplex:1 complex enerCHARMM.pl \
              -par gb,nocut complex.pdb

Second, we will estimate the energies for the substrate and receptor alone:

    ensrun.pl -dir ens -set dgreceptor:1 complex enerCHARMM.pl \
              -par gb,nocut receptor.pdb
    ensrun.pl -dir ens -set dgsubstrate:1 complex enerCHARMM.pl \
              -par gb,nocut substrate.pdb

We can take a look at the results with getprop.pl:

    getprop.pl -dir ens -prop dgcomplex,dgreceptor,dgsubstrate complex

... or combine the results to get the binding free energy estimate for each snapshot:

    getprop.pl -dir ens -prop dgcomplex-dgreceptor-dgsubstrate complex

... or obtain the average value:

    getprop.pl -dir ens -score avg \
               -prop dgcomplex-dgreceptor-dgsubstrate complex
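
The arithmetic behind the last two commands can be sketched in a few lines of Python. The example below forms the per-snapshot difference, averages it, and also reports a naive standard error of the mean. The function name and the energy values are hypothetical, and the standard error is an addition for illustration that is not computed by the commands above.

```python
from math import sqrt
from statistics import mean, stdev


def snapshot_binding(dg_complex, dg_receptor, dg_substrate):
    """Per-snapshot differences, their mean, and a naive standard error."""
    diffs = [c - r - s for c, r, s in zip(dg_complex, dg_receptor, dg_substrate)]
    avg = mean(diffs)
    # The standard error needs at least two snapshots
    sem = stdev(diffs) / sqrt(len(diffs)) if len(diffs) > 1 else float("nan")
    return diffs, avg, sem


# Made-up energies (kcal/mol) for three snapshots, for illustration only
dg_complex = [-1250.0, -1248.0, -1252.0]
dg_receptor = [-1000.0, -1001.0, -999.0]
dg_substrate = [-220.0, -219.0, -221.0]

diffs, avg, sem = snapshot_binding(dg_complex, dg_receptor, dg_substrate)
print(diffs, avg, sem)
```

Because all three energies are evaluated on the same snapshots, the average of the per-snapshot differences equals the difference of the three averages. Consecutive MD snapshots are correlated, so the naive standard error tends to underestimate the true uncertainty.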

The result is likely to be around -30 kcal/mol. This value is fairly large in magnitude, but the analysis so far neglects the entropic contributions of the substrate and the receptor (the entropic effects of the solvent are included in the implicit solvent model).