The objective of this tutorial is to
become familiar with the MMTSB Tool Set and learn how to prepare a system for
modeling tasks starting with a file from the Protein Data Bank. The goal
is to prepare a solvated system of a protein-RNA complex that can be used as the input for
simulation studies.
In this tutorial the double-stranded RNA-binding protein
Xlrbpa from Xenopus laevis is used as an example. A crystal structure for the
protein-RNA complex is available from the Protein Data Bank
with the PDB code
1DI2. A picture of the asymmetric unit is shown on the right. More
information about the crystal structure is available from this paper.
The system consists of two RNA-binding domains (chains A and B), four chains
of RNA (C, D, E, G), and the oxygen positions of 359 crystallographic water molecules.
The RNA in the biological complex is obtained by using the symmetry copy of chain E
instead of chain G. The first RNA strand consists of chain E followed by chain C, the
second RNA strand of chain D followed by the symmetry copy of chain E. According to
the PDB entry, the crystal structure was solved for a mutated protein (N112M).
1. Obtain/copy the PDB file 1DI2.pdb from the Protein Data Bank to the current
working directory.
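One way to fetch the file from the command line is sketched below. The download URL pattern and the helper names `pdb_url` and `fetch_pdb` are assumptions made for illustration and are not part of the MMTSB Tool Set; copying the file by hand works just as well.

```shell
# Hypothetical helpers, not part of the MMTSB Tool Set.
# Assumes the RCSB file server layout https://files.rcsb.org/download/<ID>.pdb.
pdb_url() {
  printf 'https://files.rcsb.org/download/%s.pdb\n' "$1"
}

# Download a PDB entry into the current working directory (requires curl).
fetch_pdb() {
  curl -fsSL -o "$1.pdb" "$(pdb_url "$1")"
}

pdb_url 1DI2      # prints the URL that fetch_pdb would download
# fetch_pdb 1DI2  # uncomment to write 1DI2.pdb
```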
2. Extract the protein chains A and B
convpdb.pl -chain A 1DI2.pdb > 1di2.proteinA.pdb
convpdb.pl -chain B 1DI2.pdb > 1di2.proteinB.pdb
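A quick sanity check on the extracted files is to list the chain IDs they contain. The sketch below reads the chain ID from column 22 of ATOM and HETATM records, as laid out in the PDB file format. `list_chains` is a hypothetical helper, not an MMTSB command.

```shell
# Hypothetical helper: print the chain IDs found in a PDB file,
# in order of first appearance, separated by spaces.
list_chains() {
  awk '/^(ATOM|HETATM)/ {
         c = substr($0, 22, 1)               # chain ID is column 22
         if (!(c in seen)) {
           seen[c] = 1
           printf "%s", (n++ ? " " : "") c
         }
       }
       END { print "" }' "$1"
}

# Report chains for whichever of these files already exist.
for f in 1DI2.pdb 1di2.proteinA.pdb 1di2.proteinB.pdb; do
  if [ -f "$f" ]; then
    printf '%s: ' "$f"
    list_chains "$f"
  fi
done
```

Each extracted protein file should report a single chain.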
3. Reverse the N112M mutation with mutate.pl to obtain the biological sequence. Because the
mutation script does not handle chain IDs well, the chain ID of the input structure is first removed
(set to blank) and later reset to A or B, respectively. Note how multiple commands can be combined through
pipes so that the output of one command is used as input for the next.
convpdb.pl -setchain " " 1di2.proteinA.pdb | mutate.pl -seq 112:N | \
convpdb.pl -setchain A > 1di2.proteinA.mutated.pdb
Repeat for chain B.
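The chain B command follows from the chain A command by substituting the chain ID in the file names and in the final -setchain. The sketch below only prints the pipeline for each chain, so it can be inspected before it is run. `revert_cmd` is a hypothetical helper name, and the chain B file names are assumed to follow the pattern used for chain A.

```shell
# Hypothetical dry-run helper: print the step 3 pipeline for one chain ID.
revert_cmd() {
  printf 'convpdb.pl -setchain " " 1di2.protein%s.pdb | mutate.pl -seq 112:N | convpdb.pl -setchain %s > 1di2.protein%s.mutated.pdb\n' "$1" "$1" "$1"
}

# Show the commands for both protein chains.
for chain in A B; do
  revert_cmd "$chain"
done

# To execute instead of print (requires the MMTSB Tool Set on the PATH):
#   revert_cmd B | sh
```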
4. Combine the two protein chains into a single PDB file containing both chains.
convpdb.pl -merge 1di2.proteinB.mutated.pdb 1di2.proteinA.mutated.pdb > \
1di2.protein.pdb
Take a look at the resulting structure with VMD. It should look like this:
5. Extract the nucleic acid chains C, D, and E
convpdb.pl -chain C 1DI2.pdb > 1di2.rnaC.pdb
...
6. Generate the symmetry copy of chain E. The rotation matrix is available from the header
of the PDB file (see the second matrix under 'REMARK 350 APPLY THE FOLLOWING TO CHAINS: E').
convpdb.pl -rotate -1 0 0 0 1 0 0 0 -1 1di2.rnaE.pdb > 1di2.rnaE.sc.pdb
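To see what this rotation does to individual coordinates, the sketch below applies a 3x3 matrix to a single point. It assumes the nine numbers are given row by row (for this diagonal matrix the order makes no difference) and, like the command above, applies no translation. `rotate_point` is a hypothetical helper, not an MMTSB command.

```shell
# Hypothetical helper: multiply a 3x3 matrix (row by row) with a point.
# usage: rotate_point m11 m12 m13 m21 m22 m23 m31 m32 m33 x y z
rotate_point() {
  awk -v a="$1" -v b="$2" -v c="$3" \
      -v d="$4" -v e="$5" -v f="$6" \
      -v g="$7" -v h="$8" -v i="$9" \
      -v x="${10}" -v y="${11}" -v z="${12}" \
      'BEGIN {
         printf "%.3f %.3f %.3f\n", a*x + b*y + c*z, d*x + e*y + f*z, g*x + h*y + i*z
       }'
}

# The matrix from step 6 flips the signs of x and z and keeps y.
rotate_point -1 0 0 0 1 0 0 0 -1 1.0 2.0 3.0   # prints: -1.000 2.000 -3.000

# Show the matrices recorded in the PDB header, if the file is present.
if [ -f 1DI2.pdb ]; then
  grep '^REMARK 350' 1DI2.pdb
fi
```

Flipping x and z while keeping y corresponds to a 180-degree rotation about the y axis.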
7. Generate the first RNA strand from chain E followed by chain C. This requires renumbering
of the chains from the original PDB.
convpdb.pl -setchain C -renumber 1 1di2.rnaE.pdb > \
1di2.rna.strand1.part1.pdb
convpdb.pl -renumber 11 1di2.rnaC.pdb > 1di2.rna.strand1.part2.pdb
Merge the two parts into a single file:
convpdb.pl -merge 1di2.rna.strand1.part2.pdb \
1di2.rna.strand1.part1.pdb > 1di2.rna.strand1.pdb
Repeat to generate strand 2 from chain D followed by the symmetry copy of chain E.
Merge the two strands into a single file:
convpdb.pl -merge 1di2.rna.strand2.pdb 1di2.rna.strand1.pdb > \
1di2.rna.pdb
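The start value 11 for chain C only gives continuous numbering if the renumbered chain E ends at residue 10. This can be checked by counting the residues in each part, as in the sketch below, which counts distinct combinations of chain ID and residue number in ATOM and HETATM records. `count_residues` is a hypothetical helper, not an MMTSB command.

```shell
# Hypothetical helper: count distinct residues in a PDB file.
# Columns 22-27 hold the chain ID, residue number and insertion code.
count_residues() {
  awk '/^(ATOM|HETATM)/ {
         key = substr($0, 22, 6)
         if (!(key in seen)) { seen[key] = 1; n++ }
       }
       END { print n + 0 }' "$1"
}

# Report residue counts for the strand parts that already exist.
for f in 1di2.rna.strand1.part1.pdb 1di2.rna.strand1.part2.pdb; do
  if [ -f "$f" ]; then
    printf '%s: %s residues\n' "$f" "$(count_residues "$f")"
  fi
done
```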
Take a look at the resulting structure with VMD. It should look like this:
8. If you took a closer look at the resulting RNA duplex you might have noticed that the two
strands appear to be broken in the middle. This is an artifact of how the crystal structure
was obtained. The strand breaks can be "repaired" with a very short minimization in CHARMM because
CHARMM can automatically add missing atoms. The following command carries out only 10 steps of
steepest descent minimization. The 'nodeoxy' flag is needed to tell CHARMM that we are
working with RNA instead of DNA.
minCHARMM.pl -par nodeoxy,minsteps=0,sdsteps=10 1di2.rna.pdb > \
1di2.rna.fixed.pdb
Take a look at the resulting structure. The strand breaks should be gone. Also, the structure
now contains all hydrogens, which are missing from the PDB structure because
they are difficult to resolve experimentally.
9. We are now ready to combine the protein and RNA into a single file:
convpdb.pl -merge 1di2.rna.fixed.pdb 1di2.protein.pdb > 1di2.complex.pdb
The model built so far should look like the following image:
10. Next, we will add solvent to the complex. The crystal structure already
contains a number of water molecules. We will keep those and add additional
waters around them to fill a simulation box.
First, we extract the waters from the PDB file and add missing hydrogens. It is
convenient to keep the X-ray waters in a separate chain (chain X):
convpdb.pl -nsel water 1DI2.pdb | complete.pl | convpdb.pl \
-setchain X > 1di2.xray.waters.pdb
The X-ray waters are combined with the complex ...
convpdb.pl -merge 1di2.xray.waters.pdb 1di2.complex.pdb > \
1di2.complex.waters.pdb
... and then solvated in a cubic box with at least 9 Å between the edge of the box and
the complex or any of the X-ray waters:
convpdb.pl -solvate -cutoff 9 -cubic 1di2.complex.waters.pdb > \
1di2.complex.solvated.pdb
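For a rough idea of how large the box will be, the sketch below estimates a cubic box edge as the largest coordinate extent of the input plus twice the cutoff. This is a back-of-the-envelope estimate under that assumption; how convpdb.pl actually constructs the box may differ. `box_edge` is a hypothetical helper.

```shell
# Hypothetical helper: estimate a cubic box edge in Angstrom as
# (largest extent along x, y or z) + 2 * cutoff.
# usage: box_edge file.pdb cutoff
box_edge() {
  awk -v cut="$2" '
    /^(ATOM|HETATM)/ {
      x = substr($0, 31, 8) + 0      # x, y, z are columns 31-54
      y = substr($0, 39, 8) + 0
      z = substr($0, 47, 8) + 0
      if (!n++) { xmin = xmax = x; ymin = ymax = y; zmin = zmax = z }
      if (x < xmin) xmin = x; if (x > xmax) xmax = x
      if (y < ymin) ymin = y; if (y > ymax) ymax = y
      if (z < zmin) zmin = z; if (z > zmax) zmax = z
    }
    END {
      if (!n) exit 1                 # no coordinates found
      ext = xmax - xmin
      if (ymax - ymin > ext) ext = ymax - ymin
      if (zmax - zmin > ext) ext = zmax - zmin
      printf "%.1f\n", ext + 2 * cut
    }' "$1"
}

# Estimate for the pre-solvation file, if it exists.
if [ -f 1di2.complex.waters.pdb ]; then
  box_edge 1di2.complex.waters.pdb 9
fi
```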
11. As a last step, we need to add counterions to neutralize the system. The charge of
the protein-RNA complex can be obtained from CHARMM with the following command. Again,
the 'nodeoxy' option is used because the complex contains RNA instead of DNA.
enerCHARMM.pl -par nodeoxy -charge 1di2.complex.pdb
Counterions are added to the solvated system by specifying the number of positive (SOD) and/or
negative (CLA) ions. Decide from the output of the previous command how many ions of which type
are needed to neutralize this system, and then run the following command:
convpdb.pl -ions <type>:<num> 1di2.complex.solvated.pdb > 1di2.complex.solvions.pdb
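The choice of ion type and count follows directly from the sign and size of the net charge, assuming monovalent ions so that each ion offsets one unit of charge. The sketch below turns a charge value into the `<type>:<num>` form used above. `ions_for_charge` is a hypothetical helper, and the example charge is made up, not the actual charge of this complex.

```shell
# Hypothetical helper: map a net charge to a <type>:<num> ion specification.
# Negative charge -> SOD (positive ions); positive charge -> CLA (negative ions).
# The charge is rounded to the nearest integer; zero yields no output.
ions_for_charge() {
  awk -v q="$1" 'BEGIN {
    n = int(q < 0 ? -q + 0.5 : q + 0.5)
    if (n == 0) exit
    printf "%s:%d\n", (q < 0 ? "SOD" : "CLA"), n
  }'
}

ions_for_charge -4   # made-up example charge; prints: SOD:4
```

The printed value can then be passed to the -ions option in place of `<type>:<num>`.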
You can check whether the resulting system is neutral with:
enerCHARMM.pl -par nodeoxy -charge 1di2.complex.solvions.pdb
Finally, we should have a fully solvated system that is ready as a starting point for simulations.