CTBP SUMMER SCHOOL 2009

Part 1. BASIC AMBER TUTORIAL: SIMULATIONS ON DUPLEX DNA

By Ross Walker and Dave Case

Pictured above is the average structure from a 1 nanosecond molecular dynamics simulation of a 10 base pair poly(A)-polt(T) DNA duplex. The calculation was run in explicit solvent using periodic boundaries and the particle mesh Ewald method of treating long range electrostatics. The average structure was generated using ptraj by RMS fitting all of the DNA atoms in 1,000 snapshots at 1 ps intervals and then averaging the coordinates.

1 Introduction

The purpose of this tutorial is to provide an initial introduction to setting up and running simulations using the MMTSB toolset interface into AMBER software. (It is based on AMBER 11 and AmberTools 1.2 but should work reasonably well with other versions). In this tutorial we run a series of simulations on simple DNA or RNA structures. We will first figure out how to generate a starting structure and then use this structure as input to the MMTSB toolset.

In order to run a classical molecular dynamics simulation with Sander a number of files are required. These are (using their default filenames):

prmtop - a file containing a description of the molecular topology and the necessary force field parameters.
inpcrd (or a restrt from a previous run) - a file containing a description of the atom coordinates and optionally velocities and current periodic box dimensions.
mdin - the sander input file consisting of a series of namelists and control variables that determine the options and type of simulation to be run.

The MMTSB toolset hides this complexity, and permits one to carry out a variety of simulations using on a PDB file as input. This is great unless things go wrong. A basic purpose of this tutorial is to show you how to run simple molecular mechanics simulations using the tooset, and how to diagnose and correct some common problems.

2 Getting started

2.1 Setting up your environment

To get started, you need to set up some environment variables:

export MMTSBDIR=/path/to/mmtsb/tools

export PATH="$MMTSBDIR/bin:$MMTSBDIR/perl:$PATH"

To check things, the command

which enerAmber.pl

should return a pointer to the $MMTSBDIR/perl directory.

2.2 Getting a PDB file

Molecular mechanics simulations always need to start from initial set of coordinates. One common location is http://www.rcsb.org, commonly known as the Protein Data Bank, or PDB. One famous structure is the original "Dickerson dodecamer", a 12 base-pair fragment of DNA. We have a slightly modified version of this file here: 1BNA.pdb; you should save this into some working directory. [The only modification made was to remove the "HOH" water residues, since we don’t want to use those right now. Generally, some editing of PDB files is necessary; we will disucss this further below.

2.3 Simple MMTSB commands

In the simplest example, you could determine the Amber energy of this starting structure with the simple command:

enerAmber.pl 1BNA.pdb

This should return the string "-1543.3000", a number that by itself is not very helpful; but we can later compare it to other energies.

Molecular mechanics is a complex business, and the simple command listed above has useful options that you need to learn about. To start, type

enerAmber.pl -help

to see some of the options available. Amber-specific options are given by the -par option, described in Amber Parameters. For example, a more useful comand might be this:

enerAmber.pl -log log1 -par gb=1 1BNA.pdb

This would use an implicit solvent model, and save additional information about the calcultion in the file log1. If you scroll towards the bottom of this file, you will see this:

                   FINAL RESULTS

   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER

     1      -4.3948E+03     2.2886E+01     1.1525E+02     C2’       342

 BOND    =       94.3964  ANGLE   =      829.3651  DIHED      =      462.5271

 VDWAALS =     -365.6113  EEL     =      155.9866  EGB        =    -2851.4828

 1-4 VDW =      263.1737  1-4 EEL =    -2983.1773  RESTRAINT  =        0.0000

This gives a breakdown of the energy into individual pieces. We won’t discuss these in detail here, but they should be mostly understandable if you are familiar with molecular mechanics force fields. For more details, consult the AmberTools Users’ Manual.

2.4 Doing a simple minimization

The next simplest thing we can try is an energy minimization. Again, the MMTSB toolset hides most of the complexity, so you can just do this:

minAmber.pl -par gb=1 -log log2 1BNA.pdb > 1BNA.min1.pdb

After a slight delay, this creates a new PDB file with (partially) minimized coordinates. You can find the energy of the minimized structure quite easily:

enerAmber.pl -par gb=1 1BNA.min1.pdb

This returns "-5934.8" (or close to that), which is the new energy. You can use a text editor to examine the log2 file to see how the minimzation took place, and use VMD or some other visualization program to look at the minimzed structure, and to compare it to the X-ray structure.

3 Other starting points

One doesn’t always have a crystal structure to start from, or a model structure might be available in a pdb format that differs from that of the PDB itself. (The latter is especially true right now, since the PDB recently moved to format version 3, but many other programs that create pdb files have not yet followed suit. In this section, we will discuss how to handle some more general situations.

3.1 Using the w3dna server

Wilma Olson’s group at Rutgers has created a nice web server, http://w3dna.rutgers.edu for generating starting structures for nucleic acids. Go to this site, and click on "Reconstruction"; then choose a helix type (I chose "BDNA (generic)" for my example), then type in a sequence (I chose "ATATATATAT") and press "Build". You can then download the pdb file (which will probably be called s0.pdb) and look at it with your favorite visualization tool. However, this web server still uses the older version 2 format for its PDB files. If you try to use this with MMTSB, you will get a strange result:

enerAmber.pl -par gb=1 s0.pdb

will return "0.0000" for the energy. This is a signal that something has gone awry. When this happens (or even when it doesn’t) you can examine the leap.out file to find out what might be the problem. This file contains lots of information from the LEaP module of Amber, and you have to learn how to plow your way through it. If you do that with your example, you should see some messages like these:

Unknown residue: T number: 1 type: Nonterminal

.....

Created a new atom named: P within residue: .R

Created a new atom named: O1P within residue: .R

Created a new atom named: O2P within residue: .R

Added missing heavy atom: .R.A

These messages clearly look like problems, but it might not be clear at first glance what is wrong. Why doesn’t the program recognize the "T" residue? A clue is in the final line above: the program is trying to add an O2’ atom to the first adenine residue. That sounds wrong, since this is supposed to be DNA, not RNA. Here’s what is going on: in version 3 of the pdb format, as in biochemistry texts, adenine is called "A" and deoxy-adenine is called "DA". But earlier versions of the PDB (which unfortunately is what is being used by the Rutgers server) use "A" for both forms of nucleotide. Amber is expecting version 3 input, and hence thinks the "A" residue is missing its O2’ atom, and that there is no "T" residue.

How can we create a version 3 file? You might want to look at the remediator.pl script from the Richardson group (Google to find its current location). This converts pdb version 2 to pdb version 3 files:

remediator.pl s0.pdb > s0.pdbv3

enerAmber.pl -par gb=1 s0.pdbv3

No (real) joy here, since the last commands still returns "0.0000". But don’t give up — look again at the leap.out file. There are actually fewer real errors:

Created a new atom named: P within residue: .R

Created a new atom named: OP1 within residue: .R

Created a new atom named: OP2 within residue: .R

Added missing heavy atom: .R.A

Created a new atom named: P within residue: .R

Created a new atom named: OP1 within residue: .R

Created a new atom named: OP2 within residue: .R

So, what’s up with the O2’ message again?? Look at the s0.pdbv3 file with a text editor and you will see that the conversion changed all of the resdiues to deoxy-residues except for the first residue. This is a bug in the conversion program: it looks for O2’ atoms, but doesn’t do this correctly for the first residue. So, manually change the first residue from "A" to "DA".

What about the "Created new atom" messages? Basically, the w3dna server has put phosphates at the 5’ position of each chain, but Amber doesn’t think they should be there. You can either edit the PDB file to conform with what Amber wants (easy to do, but maybe not what you really want!), or you can try to tell Amber that the phosphates should really be there. We’ll take the easy way out for now, and just manually remove the 6 atoms listed above. Now the command:

enerAmber.pl -par gb=1 s0.pdbv3

returns -"2405.1000" (for the sequence and helix I chose). So we are on our way!

What is the take-home lesson here? First, not all PDB files are created equal, and you often have to edit them before sending them to Amber; recall that even in the "simple" 1BNA example above, we removed the water molecules by hand. Then, you have to keep running enerAmber.pl (or equivalently, the tleap program in Amber) and learn to decipher its error messages and to make the required fixes. This is generally not all that hard to do, but it something that becomes a lot easier with experience.

4 What about RNA?

With DNA under our belt, let’s try something simple with RNA. First, get 2JYM.pdb into a working directory. Now, try the usual:

enerAmber.pl -par gb=1 2JYM.pdb

This should return "-5073.4000". Wasn’t that easy? You might want to thank the PDB format, which (finally, in version 3) distinguishes DNA from RNA in a natural way, as described above. If you think this was neat, try the following: 1BUF.pdb, a DNA file with hydrogen atoms (from an NMR determination) and 1P7E.pdb, a protein file from NMR, that includes hydrogen as well.

Extra credit: do the same things, using enerCHARMM.pl.
More extra credit: run some short MD simulations, and visualize the resulting trajectories.