MMTSB Tool Set - Template-based protein structure prediction

This tutorial will illustrate how to use the MMTSB Tool Set to access a variety of tools for template-based protein structure prediction.

As an example, we will predict the structure of human peptidyl-prolyl cis-trans isomerase G, which has the following sequence:

     RPRCFFDIAINNQPAGRVVFELFSDVCPKTCENFRCLCTGEKGTGKSTQKPLH
     YKSCLFHRVVKDFMVQGGDFSEGNGRGGESIYGGFFEDESFAVKHNAAFLLSM
     ANRGKDTNGSQFFITTKPTPHLDGHHVVFGQVISGQEVVREIENQKTDAASKP
     FAEVRILSCGELIP
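One way to create the sequence file used in the next step is a shell here-document. This is a small sketch that reproduces the sequence shown above. It assumes that a plain file of one-letter residue codes is acceptable input, which is what the instructions below imply.

```shell
# Write the target sequence (one-letter codes, 173 residues) to a file named "sequence".
cat > sequence <<'EOF'
RPRCFFDIAINNQPAGRVVFELFSDVCPKTCENFRCLCTGEKGTGKSTQKPLH
YKSCLFHRVVKDFMVQGGDFSEGNGRGGESIYGGFFEDESFAVKHNAAFLLSM
ANRGKDTNGSQFFITTKPTPHLDGHHVVFGQVISGQEVVREIENQKTDAASKP
FAEVRILSCGELIP
EOF

# Sanity check: count only the residue letters in the file (should report 173).
tr -cd 'A-Z' < sequence | wc -c
```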

1. Secondary structure prediction

Copy the sequence into a file called sequence, then run PSIPRED with the following command:

    psipred.pl sequence > 2ndary.prediction

This command takes a few minutes to complete because PSIPRED first runs PSI-BLAST to build a sequence profile. When it finishes, inspect the predicted secondary structure in 2ndary.prediction. You should find that this protein is predicted to consist mostly of extended segments (E, beta strands) rather than helices (H).
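To put a number on "mostly extended", you can tally the predicted states. The helper below is a hypothetical sketch (count_ss is not an MMTSB tool). It assumes you have copied the per-residue string of H/E/C codes out of 2ndary.prediction, because the exact layout of that file is not described here.

```shell
# count_ss: tally helix (H), strand (E) and coil (C) codes in a string of
# one-letter secondary-structure states. Hypothetical helper, not an MMTSB tool.
count_ss() {
  local ss=$1 h e c
  h=$(printf '%s' "$ss" | tr -cd 'H' | wc -c)   # keep only H characters, then count them
  e=$(printf '%s' "$ss" | tr -cd 'E' | wc -c)
  c=$(printf '%s' "$ss" | tr -cd 'C' | wc -c)
  # $((...)) strips any padding that some wc implementations add
  echo "H=$((h)) E=$((e)) C=$((c))"
}
```

For example, count_ss CCEEEEHHHCC prints H=3 E=4 C=4.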

2. Identification of Templates

To find PDB structures with similar sequences that can serve as templates, we will run PSI-BLAST with the following command:

    psiblast.pl -pdb -log psiblast.log sequence > psiblast.alignments

The -pdb option restricts the search to sequences of structures in the PDB. This command also takes a few minutes, because the full protein sequence database (not just the PDB subset) is searched first to build a sequence profile. Reference copies of the output files, psiblast.log and psiblast.alignments, are available with the tutorial materials.

The output contains the top-scoring alignments to known PDB structures that could be used as templates. What is the function of these templates? Does it match the function of the protein that we want to predict?

The psiblast.pl tool can also extract single alignments in FASTA format from the log file. For example, the following command extracts alignment number 4:

    psiblast.pl -readlog psiblast.log -no 4 sequence > alignment.4
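Extracting ten alignments one command at a time is repetitive, so a loop is convenient. In this sketch, extract_alignments and the PSIBLAST override variable are hypothetical names introduced for illustration; the loop only repeats the psiblast.pl -readlog command shown above with -no set to 1, 2, ..., N.

```shell
# extract_alignments N: write alignments 1..N from psiblast.log to
# alignment.1 ... alignment.N (hypothetical helper wrapping psiblast.pl).
# Set PSIBLAST to override the command name; it defaults to psiblast.pl.
extract_alignments() {
  local n=${1:-10} i
  for i in $(seq 1 "$n"); do
    "${PSIBLAST:-psiblast.pl}" -readlog psiblast.log -no "$i" sequence > "alignment.$i"
  done
}
```

Running extract_alignments 10 in the directory that contains psiblast.log and sequence should produce alignment.1 through alignment.10.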

Use this command to extract the first 10 alignments into separate files (alignment.1 through alignment.10).

Before you continue, you need to install the PDB template files. Download pdbfiles.tar and run the following commands:

    mkdir pdb
    cd pdb
    tar xf pdbfiles.tar
    cd ..
    export PDBDIR=`pwd`/pdb

3. Template-based Modeling

From the alignment files we can build template-based models with buildModel.pl:

    buildModel.pl alignment.1 > model.1.pdb

This script performs a number of tasks, including side-chain modeling and modeling of loops shorter than 12 residues with Modeller. When modeling is complete, the output reports which parts of the structure were built and which parts are missing from the generated model.

Repeat this step for all ten alignments.
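As with the alignment extraction, a loop saves typing. This is a sketch only: build_models and the BUILDMODEL override variable are hypothetical names, and the loop simply repeats the buildModel.pl command above for each alignment file.

```shell
# build_models N: run buildModel.pl on alignment.1..N, writing model.1.pdb ... model.N.pdb
# (hypothetical helper; set BUILDMODEL to override the command name).
build_models() {
  local n=${1:-10} i
  for i in $(seq 1 "$n"); do
    "${BUILDMODEL:-buildModel.pl}" "alignment.$i" > "model.$i.pdb"
  done
}
```

Run build_models 10 in the same shell session in which PDBDIR was exported, so that the template files installed earlier can be found.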

Which range of residues is covered by all ten models?
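One way to answer this is to read the first and last residue number of each model and take the overlap. The awk-based helper below is a hypothetical sketch: it reads residue numbers from columns 23-26 of ATOM records (the standard PDB layout) and assumes each model covers a single contiguous stretch, so internal gaps are not detected. It prints the overlap in the <from>:<to> form used by the truncation command.

```shell
# consensus_range FILE...: print "from:to", the residue range present in every file.
# Only ATOM records are read; HETATM records are ignored. Gaps inside a model are not checked.
consensus_range() {
  awk '
    FNR == 1 { nfile++ }                      # first line of a new input file
    /^ATOM/ {
      res = substr($0, 23, 4) + 0             # residue number, PDB columns 23-26
      if (!(nfile in lo) || res < lo[nfile]) lo[nfile] = res
      if (!(nfile in hi) || res > hi[nfile]) hi[nfile] = res
    }
    END {
      first = 1
      for (i = 1; i <= nfile; i++) {          # overlap = largest start, smallest end
        if (first || lo[i] > from) from = lo[i]
        if (first || hi[i] < to)   to   = hi[i]
        first = 0
      }
      print from ":" to
    }' "$@"
}
```

Usage: consensus_range model.*.pdb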

Next, we will truncate all of the models to the same length and score them to decide which one is the best model. Because we have 10 structures, we will use the ensemble computing facility to make life easier.

We begin by creating an ensemble from the 10 models:

    checkin.pl -dir ens model model.*.pdb

Now truncate all of the models to the same length, replacing <from> and <to> with the first and last residue numbers of the consensus range:

    ensrun.pl -new truncated -dir ens model convpdb.pl -sel <from>:<to>

We can minimize and score the truncated models with:

    ensmin.pl -par minsteps=100,dielec=rdie,epsilon=4 -dir ens truncated min
    ensrun.pl -set score:1 -dir ens min enerCHARMM.pl -par gb,nocut

The best model is the one with the most negative energy. List the models sorted by score with getprop.pl:

    getprop.pl -prop score -dir ens min | sort -k 2 -n
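If you only want the top entry, you can keep the first line of the sorted list. This hypothetical helper assumes, as the sort command above does, that the score is in column 2. It forces the C locale so that the decimal point is parsed consistently.

```shell
# best_model: read "name value" lines and print the one with the lowest
# (most negative) value in column 2. Hypothetical helper for illustration.
best_model() {
  LC_ALL=C sort -k 2,2 -n | head -n 1
}
```

Usage: getprop.pl -prop score -dir ens min | best_model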

Take a look at this model with VMD. You can also examine its secondary structure with DSSP, where <model.pdb> stands for the PDB file of the best model:

    genseq.pl -out onesec -dssp <model.pdb>

How well does the secondary structure from this model match the predicted secondary structure?

4. Comparison with Experimental Structure

The experimental structure is given in the file native.pdb. It is also available from the Protein Data Bank with the ID 2GW2.

We can compare our predicted structures with the native structure by calculating the root-mean-square deviation (RMSD) of each model:

    ensrun.pl -set rmsnative:1 -dir ens min rms.pl -fit -out CA \
              -nowarn `pwd`/native.pdb

The -fit option performs a least-squares superposition of each model onto the native structure before the RMSD values are calculated. The -nowarn option suppresses warning messages about atoms or residues missing from the experimental structure. In this example we will look at C-alpha RMSD values.
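For reference, the C-alpha RMSD after least-squares fitting is conventionally defined as below. Here x_i and y_i are the positions of the i-th matched C-alpha atom in the native structure and in the model, N is the number of matched atoms, and R (a rotation matrix) and t (a translation vector) are chosen to minimize the deviation. This is the textbook definition; exactly how rms.pl pairs residues between model and native is not covered here.

```latex
\mathrm{RMSD} = \min_{R,\,\mathbf{t}} \sqrt{\frac{1}{N}\sum_{i=1}^{N}
  \left\lVert \mathbf{x}_i - \left(R\,\mathbf{y}_i + \mathbf{t}\right) \right\rVert^{2}}
```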

Use getprop.pl again to check whether the best-scoring model is also the model with the smallest RMSD:

    getprop.pl -prop rmsnative,score -dir ens min
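To make the comparison explicit, a small awk filter can pick out both minima. This is a hypothetical sketch that assumes getprop.pl prints one line per model in the order name, rmsnative, score (following the property order requested) with no header line. Check the actual output before relying on it.

```shell
# check_best: read "name rmsnative score" lines and print the lowest-energy model,
# the lowest-RMSD model, and whether they are the same. Hypothetical helper.
check_best() {
  awk '
    NR == 1 || $3 + 0 < best_score { best_score = $3 + 0; by_score = $1 }
    NR == 1 || $2 + 0 < best_rms   { best_rms = $2 + 0;   by_rms = $1 }
    END { print by_score, by_rms, (by_score == by_rms ? "same" : "different") }'
}
```

Usage: getprop.pl -prop rmsnative,score -dir ens min | check_best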