This tutorial will illustrate how to use the MMTSB Tool Set to access a variety of
tools for template-based protein structure prediction.
As an example we will predict the structure of human peptidyl-prolyl cis-trans isomerase
G with the following sequence:
1. Secondary Structure Prediction
Copy the sequence into a file called sequence, then run PSIPRED with
the following command:
psipred.pl sequence > 2ndary.prediction
This command will take a few minutes to complete because PSIPRED first runs
PSI-BLAST to obtain a sequence profile.
Take a look at the secondary structure. You should find that this protein is
predicted to consist mostly of extended segments (E) rather than helices (H).
2. Identification of Templates
In order to find templates from PDB structures with similar sequences we will
run PSI-BLAST with the following command:
psiblast.pl -pdb -log psiblast.log sequence > psiblast.alignments
The -pdb option is used to indicate that only sequences of structures
from the PDB will be searched. This command also takes a few minutes because
the entire genomic sequence database is searched initially to build a sequence
profile.
The output from this command contains the top scoring alignments to known PDB
structures that could be used as templates. What is the function of these
templates? Does it match the function of the protein that we want to predict?
The psiblast.pl tool can also be used to extract single alignments
in FASTA format from the log file with the following command:
psiblast.pl -readlog psiblast.log -no 4 sequence > alignment.4
Use this command to extract the first 10 alignments into separate files.
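The extraction can be scripted with a shell loop. The following is a minimal sketch that reuses only the options shown above. The function name extract_alignments is a placeholder introduced here, and the sketch assumes that psiblast.log contains at least as many alignments as requested.

```shell
# Hypothetical helper (not part of the MMTSB Tool Set): extract the first
# N alignments (default 10) from psiblast.log into alignment.1 ... alignment.N
# by repeating the psiblast.pl -readlog command shown above.
extract_alignments() {
    n="${1:-10}"
    i=1
    while [ "$i" -le "$n" ]; do
        # Stop at the first failure so a missing alignment is noticed early.
        psiblast.pl -readlog psiblast.log -no "$i" sequence > "alignment.$i" || return 1
        i=$((i + 1))
    done
}
```

Calling extract_alignments without an argument writes alignment.1 through alignment.10 in the current directory.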
3. Template-based Modeling
From the alignment files we can build template-based models with buildModel.pl:
buildModel.pl alignment.1 > model.1.pdb
This script performs a number of tasks, including side chain modeling and modeling
of loops with fewer than 12 residues using Modeller.
After the modeling is complete, the output will tell you
which part of the structure was built and which parts are missing from the model
that was generated.
Repeat this step for all ten alignments.
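One way to repeat the step is a loop over the alignment files. This is a sketch only: build_models is a hypothetical helper name, and the alignment.N and model.N.pdb file names follow the naming used in this tutorial.

```shell
# Hypothetical helper: build model.N.pdb from alignment.N for N = 1..count
# (default 10). A failed run is reported on stderr, and the loop continues
# with the next alignment.
build_models() {
    n="${1:-10}"
    i=1
    while [ "$i" -le "$n" ]; do
        if ! buildModel.pl "alignment.$i" > "model.$i.pdb"; then
            echo "buildModel.pl failed for alignment.$i" >&2
        fi
        i=$((i + 1))
    done
}
```

Running build_models 10 produces the ten model files used in the rest of this section.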
What is the consensus range of residues that is covered by all models?
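The consensus range can also be estimated directly from the model files. The sketch below is an assumption-laden shortcut rather than an MMTSB tool: the consensus_range function is introduced here, it reads residue numbers from columns 23-26 of the C-alpha ATOM records (the standard PDB layout), and it only compares the first and last residue of each model, so gaps inside a model are not detected.

```shell
# Print "first-last" for the residue range shared by all given PDB files.
# Only C-alpha ATOM records are considered; the residue number is taken
# from columns 23-26 and the atom name from columns 13-16 (PDB format).
consensus_range() {
    awk '
        # Fold the range of the file just finished into the running overlap.
        function update() {
            if (lo == "") return
            if (first == "" || lo > first) first = lo
            if (last == "" || hi < last) last = hi
        }
        # A new file starts: record the previous range and reset.
        FNR == 1 { if (seen) update(); seen = 1; lo = ""; hi = "" }
        substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " {
            r = substr($0, 23, 4) + 0
            if (lo == "" || r < lo) lo = r
            if (hi == "" || r > hi) hi = r
        }
        END { if (seen) update(); print first "-" last }
    ' "$@"
}
```

For example, consensus_range model.*.pdb prints a single range, which should be checked against the per-model messages from buildModel.pl before it is used for truncation.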
In the following we will truncate all of the models to the same length and score
them to decide which one is the best model. Because we have 10 structures we will
use the ensemble computing facility to make life easier.
We begin by creating an ensemble from the 10 models:
checkin.pl -dir ens model model.*.pdb
Now truncate all of the models to the same length:
ensrun.pl -new truncated -dir ens model convpdb.pl -sel :
We can minimize and score the truncated models with:
ensmin.pl -par minsteps=100,dielec=rdie,epsilon=4 -dir ens truncated min
ensrun.pl -set score:1 -dir ens min enerCHARMM.pl -par gb,nocut
The best model can be found with getprop.pl as the model with
the most negative energy:
getprop.pl -prop score -dir ens min | sort -k2n
Take a look at this model with VMD. You can also examine its secondary
structure:
genseq.pl -out onesec -dssp
How well does the secondary structure from this model match the predicted
secondary structure?
4. Comparison with Experimental Structure
The experimental structure is given in the file native.pdb. It is also
available from the Protein Data Bank with the ID 2GW2.
We can compare our predicted structures with the native by calculating root
mean square deviations of our models:
ensrun.pl -set rmsnative:1 -dir ens min rms.pl -fit -out CA \
The -fit option is needed to perform a least-squares fit superposition
of the models with the native before calculating the RMSD values. -nowarn
suppresses warning messages about missing atoms/residues in the experimental
structure. In this example we will look at C-alpha RMSD values.
Use getprop.pl again to check whether the best-scoring structure
also corresponds to the model with the smallest RMSD:
getprop.pl -prop rmsnative,score -dir ens min
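The comparison can be automated with a small filter. This sketch assumes that getprop.pl prints one line per model with three whitespace-separated columns in the order index, rmsnative, score. That layout is an assumption and should be checked against the actual output. The compare_best function is a hypothetical helper introduced here.

```shell
# Read lines of "index rmsnative score" (assumed column layout) on stdin and
# report which model has the lowest score and which has the lowest RMSD.
compare_best() {
    awk '
        NF >= 3 {
            if (n == 0 || $3 + 0 < best_score) { best_score = $3 + 0; score_idx = $1 }
            if (n == 0 || $2 + 0 < best_rms)   { best_rms = $2 + 0;   rms_idx = $1 }
            n++
        }
        END {
            if (n == 0) exit 1
            print "lowest score: model " score_idx
            print "lowest RMSD: model " rms_idx
        }
    '
}
```

It would be used as getprop.pl -prop rmsnative,score -dir ens min | compare_best. If the two lines name different models, the scoring function did not select the model closest to the native structure.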