Loop modeling with Modeller and MMTSB Tool Set

This tutorial will illustrate how to use Modeller in combination with the MMTSB Tool Set to model a missing loop in a given protein structure.

We will revisit the structure of ribonuclease A from PDB code 1RNU, shown on the right. In this structure, residues 16-23 are not resolved. These residues will be modeled in this tutorial.

1. Preparation of input files

Copy the PDB file for 1RNU into the working directory as 1RNU.pdb and extract the protein chain with the following command:

    convpdb.pl -nsel protein 1RNU.pdb > 1rnu.pdb
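Before modeling, it can be useful to confirm which residues are missing from the extracted structure. The following Python sketch finds them. Its helper functions are our own and are not part of the MMTSB Tool Set. It collects residue numbers from the ATOM records using the fixed PDB column layout, then reports breaks in the numbering. It assumes a single chain without insertion codes, and it cannot detect residues that are missing at the chain termini.

```python
def residue_numbers(pdb_lines):
    """Return the sorted residue numbers found in ATOM records.

    Assumes a single chain and no insertion codes.
    """
    numbers = set()
    for line in pdb_lines:
        if line.startswith("ATOM"):
            # Fixed-column PDB format: residue sequence number in columns 23-26
            numbers.add(int(line[22:26]))
    return sorted(numbers)


def find_gaps(numbers):
    """Return (first_missing, last_missing) pairs for breaks in a sorted numbering."""
    gaps = []
    for prev, curr in zip(numbers, numbers[1:]):
        if curr - prev > 1:
            gaps.append((prev + 1, curr - 1))
    return gaps
```

For example, find_gaps(residue_numbers(open("1rnu.pdb"))) should include the gap (16, 23) described above, along with any other internal breaks present in the file.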

2. Generation of loop models

We will now call Modeller to generate 200 different models for residues 16 through 23. The sequence for the missing residues is STSAASSS. The command is:

    loopModel.pl -models 200 -loop 16:STSAASSS 1rnu.pdb > modeller.scores

This command will take 10-20 minutes to finish. When it is done, the current directory will contain the completed structures in the files model.1.pdb through model.200.pdb. In addition, the output file modeller.scores contains the score that Modeller assigned to each structure.
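The -loop argument gives the number of the first missing residue, followed by the one-letter sequence of the loop. We assume here that the argument follows this start:sequence format. Under that assumption, the hypothetical helper below expands the specification into numbered residues. This is a quick check that the sequence length matches the gap: eight residues starting at residue 16 end at residue 23.

```python
# Standard one-letter to three-letter amino acid codes
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def expand_loop(spec):
    """Expand a 'start:SEQUENCE' loop specification into (number, residue) pairs."""
    start_text, sequence = spec.split(":")
    start = int(start_text)
    return [(start + i, THREE_LETTER[aa]) for i, aa in enumerate(sequence.upper())]


residues = expand_loop("16:STSAASSS")
print(residues[0], residues[-1])  # (16, 'SER') (23, 'SER')
```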

3. Model analysis

We will proceed by checking the structures into an ensemble in order to facilitate further analysis:

    checkin.pl -dir ens model model.?.pdb model.??.pdb model.???.pdb
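The file list is presumably split into three patterns (model.?.pdb, model.??.pdb and model.???.pdb) to keep the models in numeric order. The shell sorts the matches of each pattern as strings, so a single pattern such as model.*.pdb would place model.10.pdb before model.2.pdb. The short Python sketch below illustrates the difference between the two orderings.

```python
import re


def model_index(filename):
    """Extract the integer N from a file name of the form model.N.pdb."""
    match = re.fullmatch(r"model\.(\d+)\.pdb", filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return int(match.group(1))


names = [f"model.{i}.pdb" for i in range(1, 201)]

lexicographic = sorted(names)             # plain string order
numeric = sorted(names, key=model_index)  # order produced by the three patterns

print(lexicographic[:4])  # ['model.1.pdb', 'model.10.pdb', 'model.100.pdb', 'model.101.pdb']
print(numeric[:4])        # ['model.1.pdb', 'model.2.pdb', 'model.3.pdb', 'model.4.pdb']
```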

We can attach the Modeller scores from the third column of modeller.scores to the ensemble members, as the property score, with the following command:

    setprop.pl -dir ens -f modeller.scores -inx 3 model score
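For a quick look at the scores outside the ensemble, you can also parse the file directly. The sketch below makes two assumptions about modeller.scores. First, it expects one whitespace-separated line per model, in the order model.1.pdb, model.2.pdb and so on, with the score in the third column (the column that -inx 3 selects). Second, it assumes that lower Modeller scores are better. Check both assumptions against your own file.

```python
def parse_scores(lines, column=3):
    """Return floats from the given 1-based whitespace-separated column.

    Blank lines and lines starting with '#' are skipped. One line per model,
    in model order, is assumed.
    """
    scores = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        scores.append(float(fields[column - 1]))
    return scores


def best_models(scores, n=5):
    """Return the 1-based model indices of the n lowest scores (lower assumed better)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return [i + 1 for i in order[:n]]
```

For example, best_models(parse_scores(open("modeller.scores"))) returns the indices of the five lowest-scoring models.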

Let us now cluster the conformations based only on the differences for residues 16 through 23:

    enscluster.pl -kclust -l 16:23 -nolsqfit -radius 2 -dir ens model

All of the structures should already be oriented in the same way, so we do not need to superimpose them before comparing them; this is why the command uses -nolsqfit. Check the clusters that have been generated with the following command:

    showcluster.pl -dir ens model
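The sketch below illustrates why clustering without a fit makes sense when all models share the same frame. It uses a simplified greedy radius clustering. Each loop conformation joins the first existing cluster whose representative lies within the radius; otherwise it starts a new cluster. Distances are RMSDs over the loop coordinates, computed without superposition. The radius is treated as an RMSD of 2 Å, our reading of -radius 2. This is not necessarily the algorithm that kclust implements.

```python
import math


def rmsd_no_fit(a, b):
    """RMSD between two equally long lists of (x, y, z) tuples, without superposition."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(total / len(a))


def radius_cluster(conformations, radius=2.0):
    """Greedy leader clustering; returns one cluster label (0, 1, ...) per conformation."""
    leaders = []  # index of the conformation that founded each cluster
    labels = []
    for i, coords in enumerate(conformations):
        for label, leader in enumerate(leaders):
            if rmsd_no_fit(coords, conformations[leader]) <= radius:
                labels.append(label)
                break
        else:
            # No existing cluster is close enough: start a new one
            leaders.append(i)
            labels.append(len(leaders) - 1)
    return labels
```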

Now we can use bestcluster.pl to find the best cluster (according to the average Modeller score):

    bestcluster.pl -dir ens -prop score model
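The ranking criterion described above, the average score per cluster, can be sketched as follows. Lower values are assumed to be better, which also holds for the energies and RMSD values used later. The function and its inputs are hypothetical, and bestcluster.pl may apply additional criteria that this sketch does not reproduce.

```python
from statistics import mean


def best_cluster(labels, scores):
    """Return (cluster, average score) for the cluster with the lowest mean score.

    labels[i] is the cluster of model i and scores[i] its score.
    """
    by_cluster = {}
    for label, score in zip(labels, scores):
        by_cluster.setdefault(label, []).append(score)
    averages = {label: mean(values) for label, values in by_cluster.items()}
    best = min(averages, key=averages.get)
    return best, averages[best]
```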

4. All-atom scoring

In addition to the Modeller score, we will also calculate CHARMM implicit solvent free energy estimates. This requires a brief minimization of each structure. The minimization of all the structures in the ensemble is accomplished with:

    ensmin.pl -l 1rnu.pdb 16:23 -par dielec=rdie,epsilon=4 -dir ens model min

This command minimizes only the loop conformations (residues 16 through 23) and keeps the rest of each structure as in the input structure 1rnu.pdb. We use a distance-dependent dielectric (dielec=rdie, epsilon=4) here to reduce the computational cost, but even with this option the step will take a few minutes to complete. If you have multiple CPUs/cores, you can use the option -cpus N to parallelize the calculation. The result is a new set of structures under the tag min. Next, we calculate the energies of the minimized structures with:

    enseval.pl -par gb,nocut -set mmgbsa=total -dir ens min
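With a distance-dependent dielectric, the dielectric constant grows linearly with distance. With epsilon=4 this gives ε(r) = 4r, so the Coulomb energy between two charges falls off as 1/r² instead of 1/r. We assume here that dielec=rdie,epsilon=4 maps onto this CHARMM-style screening. The functions below are a hedged illustration of a single pairwise term, with the Coulomb constant rounded to 332 kcal·Å/(mol·e²). They are not how CHARMM is actually invoked.

```python
COULOMB_CONSTANT = 332.0  # kcal*Angstrom/(mol*e^2), rounded


def coulomb_rdie(q1, q2, r, epsilon=4.0):
    """Pairwise Coulomb energy (kcal/mol) with distance-dependent dielectric eps(r) = epsilon * r."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return COULOMB_CONSTANT * q1 * q2 / (epsilon * r * r)


def coulomb_cdie(q1, q2, r, epsilon=1.0):
    """Pairwise Coulomb energy (kcal/mol) with a constant dielectric epsilon."""
    if r <= 0:
        raise ValueError("distance must be positive")
    return COULOMB_CONSTANT * q1 * q2 / (epsilon * r)


# Opposite unit charges 2 Angstrom apart: -20.75 with rdie vs -41.5 with a constant epsilon of 4
print(coulomb_rdie(1.0, -1.0, 2.0), coulomb_cdie(1.0, -1.0, 2.0, epsilon=4.0))
```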

The all-atom energies are now available under the property mmgbsa for the min tag. In order to compare them with the Modeller scores and to use the clustering information for the original structures (under the model tag), we can transfer the data with the following command:

    getprop.pl -dir ens -prop mmgbsa min | setprop.pl -dir ens -f - -inx 2 model mmgbsa

Now we can try the bestcluster.pl command again, but this time using the all-atom score:

    bestcluster.pl -dir ens -prop mmgbsa model

Are the clusters ranked differently?
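One way to quantify the answer is a rank correlation between the two cluster rankings. You would take the per-cluster averages reported by the two bestcluster.pl runs, listed in the same cluster order. The sketch below computes Spearman's rank correlation for lists without ties. A value near 1 means the Modeller score and the all-atom energy rank the clusters the same way, and a value near -1 means they rank them in opposite order. The helper names are our own.

```python
def ranks(values):
    """Return 1-based ranks (1 = lowest value); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for two equally long lists without ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))
```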

5. Comparison with experimental structure

A newer structure of ribonuclease A (PDB code 7RSA) actually includes coordinates for residues 16 through 23. Copy the PDB file for 7RSA into the working directory as 7RSA.pdb and compare how much the predicted structures deviate from the experimental structure:

    ensrun.pl -set rmsca:1 -dir ens model rms.pl -fitxl -l 16:23 -out CA `pwd`/7RSA.pdb

The previous command calculates the root mean square deviation (RMSD) of the C-alpha coordinates for residues 16 through 23 after superimposing the rest of the structure, and stores the result under the property name rmsca. We can now use bestcluster.pl again to find out whether the best-scoring cluster also corresponds to the cluster with the lowest RMSD:

    bestcluster.pl -dir ens -prop rmsca model
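The rmsca value combines two steps. First comes a least-squares superposition that uses only the atoms outside the loop; this is our reading of -fitxl, which excludes the selection from the fit. Second, the RMSD is computed over the loop C-alpha atoms. The NumPy sketch below reproduces that idea, using the Kabsch algorithm for the fit. It assumes the model and reference C-alpha coordinates have already been matched residue by residue into arrays of equal shape.

```python
import numpy as np


def kabsch(mobile, target):
    """Return rotation R and translation t such that mobile @ R.T + t best fits target."""
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    h = (mobile - mobile_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so that R is a proper rotation
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = target_center - mobile_center @ rotation.T
    return rotation, translation


def loop_rmsd(model, reference, loop_mask):
    """Fit on atoms outside the loop, then return the RMSD over loop atoms only.

    model, reference: (N, 3) arrays of matched C-alpha coordinates
    loop_mask: boolean array of length N, True for loop residues (e.g. 16-23)
    """
    core = ~loop_mask
    rotation, translation = kabsch(model[core], reference[core])
    fitted = model @ rotation.T + translation
    diff = fitted[loop_mask] - reference[loop_mask]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```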

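Find the models with the lowest RMSD and with the best scores, and compare them visually using VMD. The following ensfiles.pl commands list the structures in cluster t.3 sorted by RMSD, by Modeller score, and by all-atom energy, respectively; substitute the cluster you are interested in: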

    ensfiles.pl -cluster t.3 -sort rmsca -dir ens model
    ensfiles.pl -cluster t.3 -sort score -dir ens model
    ensfiles.pl -cluster t.3 -sort mmgbsa -dir ens model
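The comparison comes down to checking whether the top-ranked model under each criterion is the same structure. The sketch below does this for a hypothetical list of records, each holding a file name, a cluster label and the three properties rmsca, score and mmgbsa. It mirrors the idea behind the ensfiles.pl commands but does not read the MMTSB ensemble directory; lower values are assumed to be better for all three properties.

```python
def top_models(records, cluster, props=("rmsca", "score", "mmgbsa")):
    """For one cluster, return {property: file name with the lowest value}."""
    members = [r for r in records if r["cluster"] == cluster]
    if not members:
        raise ValueError(f"no models in cluster {cluster}")
    return {p: min(members, key=lambda r: r[p])["file"] for p in props}
```

If all three entries of the returned dictionary name the same file, the scoring functions and the experimental comparison agree on the best model in that cluster.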