Skip to content

wells-wood-research/drMD

Repository files navigation

⚕️ drMD: Molecular Dynamics for Experimentalists ⚕️

Automated workflow for running molecular dynamics simulations with Amber and Openmm

⚕️ README Contents ⚕️

  1. Installation
  2. Running drMD: Running drMD from the command line | Running drMD as a Python Module
  3. Config Syntax
  4. drMD Selection Syntax: keyword | customSelection
  5. Adding Restraints in drMD: restraintInfo | restraintType | parameters
  6. Running Metadynamics with drMD: metaDynamicsInfo | height | biasFactor | frequency | biases
  7. Worked Examples

⚕️ GitHub Installation ⚕️

We recommend that you use the following steps to install drMD:

  1. Clone this repository
git clone https://github.com/wells-wood-research/drMD
  1. Create and activate conda environment
conda create -n drMD python=3.10
conda activate drMD
  1. Install AmberTools (needs to be before OpenMM) with conda
conda install -c conda-forge ambertools=23
  1. Install OpenMM with conda
conda install -c omnia openmm
  1. Install OpenBabel with conda
conda install -c conda-forge openbabel
  1. Install other python libraries with pip
pip install -r requirements.txt

⚕️ Pip Installation ⚕️

If you want to integrate drMD into a python-based pipeline, you can install drMD with pip and use it as a python module:

  1. Create and activate conda environment
conda create -n drMD python=3.10
conda activate drMD
  1. Install drMD with pip
pip install drMD
  1. Install AmberTools (needs to be before OpenMM) with conda
conda install -c conda-forge ambertools=23
  1. Install OpenMM with conda
conda install -c omnia openmm
  1. Install OpenBabel with conda
conda install -c conda-forge openbabel

⚕️ Running drMD ⚕️

Now that you have successfully set up the dependencies for drMD, you are nearly ready to run some bimolecular simulations!

🧠 Running drMD from the command line

If you have used the GitHub installation method, you can run drMD using the following command:

python /path/to/drMD.py --config config.yaml

🧠 Running drMD as a python module

If you have used the Pip installation method, you can import drMD as a python module, and as following:

import drMD

myBatchConfig = "/path/to/config.yaml"

drMD.main(myBatchConfig)

This config file contains all of the user inputs drMD needs to run a series of bimolecular simulations. The following section will detail the correct formatting of this config.yaml file

⚕️ Config syntax ⚕️

The config.yaml file is in the YAML format Inputs are grouped by theme and are stored as nested dictionaries and lists. The next few sections will detail the correct formatting of the config.yaml file

For most entries in the config file, the default value will be used unless otherwise specified. If your config file is improperly formatted, or contains a parameter that is not supported, drMD provide useful guidance to help you fix this issue.

🧠 pathInfo

The pathInfo entry in the config file is a dictionary containing two parameters:

🫀 inputDir

(DirectoryPath) This is the absolute path towards a directory containing PDB files that will be used as starting points for your simulations.

Default Value: Current working directory

⚕️ To Perform Replicate simulations, simply create copies of your starting PDB files in the inputDir, with each copy named with a unique number. For example, your inputDir could contain my_protein_1.pdb, my_protein_2.pdb, etc.

🫀 outputDir

(DirectoryPath) This is the absolute path towards a directory that you want your drMD outputs to be written to.

Default Value: /inputDir/outputs

⚕️ The outputDir will be created if it does not already exist at the point of running drMD

⚕️ Within outputDir, a directory will be created for each PDB file contained in inputDir, in this document, these subdirectories will be referred to as runDirs

Example pathInfo:

pathInfo:
  inputDir: "/home/esp/scriptDevelopment/drMD/01_inputs"
  outputDir: "/home/esp/scriptDevelopment/drMD/02_outputs"

🧠 hardwareInfo

This config entry tells drMD about your computer hardware and how you want to use it to run your simulations The hardwareInfo entry in the config file is a dictionary containing three parameters:

🫀 platform

(str) This is the platform that will be used to run simulations in OpenMM. Accepted arguments for platform are "CUDA", "OpenCL", and "CPU"

Default Value: CPU

⚕️ If you have access to GPU acceleration using CUDA, we recommend this option. If you cant use CUDA but have access to OpenCL, this is a close second. If you don't have a GPU you can use the CPU option, this will be a lot slower. Energy minimisation calculations do not benefit from GPU acceleration, so you should use the CPU option for these

🫀 parallelCPU

(int) This is the number of simulations that will be run in parallel

Default Value: 1

🫀 subprocessCpus

(int) This is the number of cpu cores that will be allocated to each simulation.

Default Value: 1

⚕️ The total CPU usage will be parallelCPU * subprocessCpus, so make sure you have enough CPUs when you set these parameters

Example hardwareInfo:

hardwareInfo:
  parallelCPU: 16
  platform: "CUDA"
  subprocessCpus: 2

This will use CUDA to achieve GPU acceleration and run 16 simulations in parallel using 2 cores each for a total usage of 32 cores.

🧠 miscInfo

This section allows you to set some general options for your simulations:

🫀 pH

(int or float) This is the pH of your simulation, this will affect the protonation states of your protein and any ligands present in your simulation

Default Value: 7

🫀 firstAidMaxRetries

(int) This is the maximum number of times that drMD will attempt to recover from an error in a simulation

Default Value: 10

⚕️ This option can be very helpful for rescuing crashed simulations. However don't rely on it too much. If your simulation keeps crashing you may want to reduce the temperature or timestep parameters instead to make it more stable

🫀 boxGeometry

(str) This is the shape of the solvation box that will be used in your simulations. Accepted arguments for boxGeometry are *"cubic" or "octahedral"

Default Value: cubic

🫀 writeMyMethodsSection

(bool) If set to TRUE, drMD will automatically write a methods section for you to use in your publications or thesis

Default Value: True

⚕️ drMD methods sections contain all of the information one might need to replicate your simulations. The formatting of these methods section may be too robotic and repetitive for you, feel free to reformat them as you see fit.

🫀 skipPdbTriage

(bool) drMD runs a pdbTriage protocol to check the validity of your PDB files before running your simulations. To disable this step, set this parameter to True

Default Value: False

🫀 trajectorySelections

(list of dicts) This entry controls the atoms that are written to your MD trajectories. You can specify any selection of atoms that you whish to write to your trajectories. For a full description of how to do this, see drMD Selection syntax

Default Value: [{"selection": {"keyword": 'all'}}]} (this will write all atoms to your trajectory files)

Example miscInfo:

miscInfo:
  pH: 7.4
  firstAidMaxRetries: 10
  boxGeometry: "cubic"
  writeMyMethodsSection: True

Simulations will be run with a pH of 7.4 in a cubic solvation box. The maximum number of first-aid retries will be 10. A methods section will automatically be generated.

🧠 ligandInfo

The ligandInfo entry in the config file is optional and may be used if your PDB files have organic ligand or cofactors. These small molecules will not have parameters in the AMBER forcefield, drMD will run an automated protocol to generate these parameters for you. To do this, you will need to tell drMD some things about each ligand you whish tp simulate.

⚕️ The ligandInfo entry is optional. drMD will automatically detect ligand in your PDB files. It will also detect parameter files in your input directory. If you have frcmod and mol2 files for your ligand already made, they must be located in your inputDir

ligandInfo is a list of dictionaries that contain the following parameters:

🫀 ligandName

(str) This is the three letter name of the ligand, this will be used to match the residue names in your PDB files

🫀 protons

(bool) This is a to tell drMD whether you have protons on your ligand. If set to FALSE, drMD will run an automated protonation protocol to add protons to your ligand

⚕️ The automatic protonation protocol only works reliably for simple organic ligands.

⚕️ For more complex ligand, we recommended that you manually add protons in your input PDB file prior to running drMD

🫀 charge

(int) This is the formal charge of the ligand

🫀 toppar

(bool) This is to tell drMD whether you have an frcmod file for your ligand already made. If you already have one, it must be located in the 01_ligand_parameters directory within your outputDir

🫀 mol2

(bool) This is to tell drMD whether you have a mol2 file for your ligand already made. If you already have one, it must be located in the 01_ligand_parameters directory within your outputDir

Example ligandInfo:

ligandInfo:
  - ligandName: "FMN"
    protons: True
    charge: -1
    toppar: False
    mol2: False
  - ligandName: "TPA"
    protons: True
    charge: -2
    toppar: False
    mol2: False

This ligandInfo tells drMD to expect two ligands: FMN and TPA. FMN has a formal charge of -1 and TPA has a formal charge of -2. Both ligands already have protons, so drMD will not add any. For both ligands the toppar and mol2 parameters are set to False, drMD will automatically generate these files for you


🧠 simulationInfo

This is the real meat and potatoes of the drMD config file.

The simulationInfo entry in the config file is a list of dictionaries containing information about each simulation.

Each simulation detailed in simulationInfo will be run in sequence, with the output of the previous simulation being the starting point for the next simulation. Each simulation dictionary contains the following parameters:

🫀 stepName

(str) This is the name of the step that will be used to create a subdirectory in the runDir, we recommend prefixing these names with numbers to make them order nicely

🫀 simulationType

(str) This is the type of simulation that will be run. Accepted arguments are:

  • EM This will run a steepest-decent Energy Minimisation step.

⚕️ We recommend that you run one of these steps before any other simulation steps

  • NVT: This will run an NVT (constant volume) molecular dynamics simulation
  • NPT: This will run an NPT (constant pressure) molecular dynamics simulation

⚕️ For the majority of protein simulations, the NPT ensemble is used for production MD simulations, while the NVT ensemble is only used in equilibration steps

  • META: This will run a Metadynamics simulation

Selecting simulation temperature

For most simulations, a constant temperature is used. In this case the following parameter is required:

🫀 temperature

(int) This is the temperature of the simulation in Kelvin

If you wish to change the temperature throughout the simulation, the following parameter is required:

🫀 temperatureRange

(list of int) This is a list of integers (again, in Kelvin) that will be used to change the temperature throughout the simulation.

Energy Minimisation Parameters

For Energy Minimisation steps, the following additional parameters are required:

🫀 maxIterations

(int) This is the maximum number of iterations that will be run in the Energy Minimisation step. If this parameter is set to -1, the step will run until the energy converges.

Example Energy Minimisation syntax:

simulationInfo:
  - stepName: "01_energy_minimisation"
    type: "EM"
    temp: 300
    maxIterations: -1

This will run a energy minimisation until the energy converges

Generic Simulation Parameters

For "normal" MD simulations using NVT or NpT ensembles, as well as for Metadynamics simulations, the following additional parameters are required:

🫀 duration

(str) This is the duration of the simulation step, as a string "int unit" eg. "1000 ps"

🫀 timestep

(str) This is the timestep of the simulation, as a string "int unit" eg. "2 fs"

🫀 heavyAtoms

(bool) If set to True, the mass of each proton will be set to 4 amu instead of 1 amu in this simulation step (this increase in mass is compensated by reducing the mass of nearby heavy atoms). On its own, this setting does nothing. It does however allow you to perform you simulation step with a longer timestep (we recommend 4 fs). Simulations run with longer timesteps will be faster, but can be prone to instability.

🫀 logInterval

(str) This is the frequency that the simulation will write to file using built-in OpemMM reporters. As a string "int unit" eg. "100 ps"

Example NVT simulation syntax:

simulationInfo:
  - stepName: "02_NVT_pre-equilibration"
    type: "NVT"
    duration: "100 ps"
    timestep: "2 fs"
    temp: 300
    logInterval: "10 ps"

This will run a 100 ps NVT molecular dynamics simulation with a timestep of 2 fs, a temp of 300 and a logInterval of 10 ps


⚕️ Simulation Aftercare ⚕️

After all of your simulations have been run, drMD contains some simple utilities for organising your output files and deleting any unwanted files.

If you want to do any post-processing, you will need to provide the following parameter in your config file:

🧠 aftercareInfo

(dict) This is a dictionary containing the parameters for the post-simulation processing

If you wish to collect PDB files that represent the last frame of each simulation, you may include the following parameter in aftercareInfo:

🧠 endpointInfo

(dict) This is a dictionary containing the parameters the following parameters:

🫀 stepNames

(list) This is a list of strings containing the names of the steps in the simulation, these should match the stepNames that you have used in your simulationInfo dictionary (described above). Endpoint PDB files will be gathered for these steps

🫀 removeAtoms

(list) This is a list of dictionaries containing the selections of atoms to be removed from the PDB files. For a full description of how to do this, see drMD Selection syntax

Molecular Dynamics simulations can generate very large output files that can become rather unwieldy and difficult to analyse. One way to quickly see the most important parts of your simulation is to perform clustering on your simulation trajectories. To do this with drMD, include the following parameter in your config file:

🧠 clusterInfo

(dict) This is a dictionary containing the parameters following parameters:

🫀 stepNames

(list) This is a list of strings containing the names of the steps in the simulation, these should match the stepNames that you have used in your simulationInfo dictionary (described above). Clustering will be performed on trajectories of these steps

🫀 nClusters

(int) This is the number of clusters PDB files that will be generated

🫀 clusterBy

(list) This is a list of selections of atoms to cluster on. If you want to explore the motions of one particular group of atoms in your system (e.g. the backbone of a protein), you can include a selection of these atoms in this parameter. For a full description of how to do this, see drMD Selection syntax

🫀 removeAtoms

(list) This is a list of dictionaries containing the selections of atoms to be removed from the output cluster PDB files. For a full description of how to do this, see drMD Selection syntax

🫀 collateVitalsReports

(bool) If True, will collate vitals reports from the trajectories generated by the MD simulations into the 00_vitals_reports directory in your specified output directory.


⚕️ drMD Selection syntax ⚕️

When creating restraints, metadynamics bias variables or running post-simulation clustering, you will need to specify the selection of atoms that the restraints will be applied to. To do this, you will need to supply a "selection" dictionary. This dictionary must contain the following parameter:

🫀 keyword

(str) This is the keyword that will be used to specify the selection. Accepted arguments are:

  • "all" : This will select all atoms in your system
  • "protein" : This will select all protein atoms in your system
  • "water" : This will select all water molecules in your system
  • "ions": This will select all ions in your system
  • "ligand" : This will select all non-protein, non-water and non-ion atoms in your system
  • "custom" : This will select all atoms that match the customSelection (see below)

Example use of keywords in the selection dictionary:

selection:
  keyword: "water"

This will select all water molecules in your system

If you have used the custom keyword, you will need to use an additional parameter the selection dictionary:

🫀 customSelection

(list) This is a list of dictionaries containing details of the atoms that will be selected. Each dictionary in the list must contain the following parameters:

🫀 CHAIN_ID

(str, list of str, or _) This is the chain ID of the atom to be selected

🫀 RES_NAME

(str, list of str, or _) This is the three-letter residue name of the atom to be selected

🫀 RES_ID

(int, list of int, or _) This is the residue ID of the atom to be selected

🫀 ATOM_NAME

(str, list of str, or _) This is the atom name of the atom to be selected

For the above parameters:

  • if a single string (or int for RES_ID) is provided, the selection will match that value for the given parameter
  • if a list is provided, the selection will match any value in the list for the given parameter
  • if the wildcard character "_" is provided, the selection will match all values for the given parameter

⚕️ This selection method is used to match atoms using columns of your PDB files. To work out how to identify what inputs you need to select your atoms of interest, simply open your input PDB file in PyMOL and use labels to identify the relevant CHAIN_ID, RES_NAME, RES_ID, and ATOM_NAME values

Example customSelection syntax, (this is probably more complex than you will need, but shows off what you can do):

selection:
  keyword: custom
  customSelection:
    - {CHAIN_ID: [A,B], RES_NAME: _, RES_ID:  _, ATOM_NAME: CA}
    - {CHAIN_ID: A, RES_NAME: SER, RES_ID: 131, ATOM_NAME: [CB, HB1, HB2, OG, HG1]}
    - {CHAIN_ID: C, RES_NAME: FMN, RES_ID: _, ATOM_NAME: _}

In the above example, a selection containing the following:

  • all CA atoms in chains A and B
  • the atoms CB, HB1, HB2, OG, and HG1 in residue Ser131 in chain A
  • all atoms in the residue FMN in chain C

⚕️ Adding Restraints with drMD ⚕️

If you wish to perform simulations with restraints, create a restraintsInfo dictionary in the simulation step:

🧠 restraintInfo

(list of dict) This is a list of dictionaries containing information about each restraints.

Within the restraintInfo list, you must provided at least one dictionary that contains the following parameters:

🫀 restraintType

(str) This is the type of restraints that will be added. Accepted arguments are: "distance", "angle", "dihedral", "position"

🫀 parameters

(dict) This is a dictionary containing the parameters for the restraints.

Within the parameters dictionary, all restraint types require the following parameter:

🫀 k

(int) This is the force constant of the restraint (int), given in kJ/mol A^-2

Additional entries in the parameters dictionary depend on the type of restraints:

🫀 r0

(int or float) Required for distance restraints. This is the distance in Angstroms that the restraint should be applied to

🫀 theta0

(int or float) Required for angle restraints. This is the angle in degrees that the angle should be constrained to

🫀 phi0

(int or float) Required for torsion restraints. This is the angle in degrees that the dihedral should be constrained to

All restraints require the selection parameter. This tells drMD what atoms to apply the restraint to

🫀 selection

(list of dicts) This is a dictionary containing information on the selection of atoms that the restraints will be applied to.

⚕️ The selection method is shared between multiple different inputs in the drMD config file. This is described in more detail in the next section

Example restraints syntax:

  ## position restraint to all protein atoms
    restraints:
    - restraintType: "position"
      selection:
        keyword: "protein"
      parameters:
        k: 1000
  ## distance restraint between CA atoms of residues 1 and 2
    - restraintType: "distance"
      selection: 
        keyword: "custom"  
        customSelection:
          - {CHAIN_ID: "A", RES_NAME: "ALA", RES_ID: 1, ATOM_NAME: "CA"}
          - {CHAIN_ID: "A", RES_NAME: "ALA", RES_ID: 2, ATOM_NAME: "CA"}
      parameters:
        k: 1000
        r0: 3

This example will add the following restraints:

  • Position restraints to the protein atoms with a force constant of 1000 kJ/mol
  • A 3 Angstrom distance restraint between the CA atoms of residues 1 and 2 of the protein, with a force constant of 1000 kJ/mol

For a detailed explanation of how to select chains, residues, and atoms for restraints, see the drMD Selection syntax section.


⚕️ Running Metadynamics with drMD ⚕️

To run a metadynamics simulation, first set the simulationType to "META".

Once you have selected your simulationType, you will need to include an additional metaDynamicsInfo dictionary in your simulation dictionary:

🧠 metaDynamicsInfo

(dict) This is a dictionary containing the parameters for the Metadynamics simulation,

Within the metaDynamicsInfo dictionary, you must provide the following parameters:

🫀 height

(int) This is the height parameter used in the Metadynamics simulation

🫀 biasFactor

(int) This is the bias factor parameter used in the Metadynamics simulation

🫀 frequency

(int) How often (in time steps) gaussians will be added to the bias potential

🫀 biases

(list of dicts) This is a list of dictionaries containing information about each biasVariable.

Within each dictionary in biases you must provide the following parameters:

🫀 biasVar

(str) This is the type of biasVariable that will be added.

Accepted arguments for biasVar are "RMSD", "torsion" "distance" and "angle"

🫀 minValue

(float) This is the minimum value of the biasVariable.

🫀 maxValue

(float) This is the maximum value of the biasVariable.

🫀 biasWidth

(float) This determines the width of gaussians added to the bias potential

🫀 selection

(dict) This is a dictionary containing information on the selection of atoms that the biasVariable will be applied to. The selection syntax is identical to that used for the restraints. For a full description of how to do this, see drMD Selection syntax

⚕️ Depending on the type of bias variable, different numbers of atoms must be selected:

  • Distance bias variables require two atoms to be selected
  • Angle bias variables require three atoms to be selected
  • Torsion bias variables require four atoms to be selected

⚕️ You will also need to specify at least one biasVariable for the simulation to sample. You can specify as many biasVariables as you wish, with one dictionary per biasVariable

Example MetaDynamics syntax:

    metaDynamicsInfo:
      height: 2
      biasFactor: 5
      frequency: 50
      biases: 
        - biasVar: "RMSD"
          minValue: 0
          maxValue: 10
          biasWidth: 1
          selection: 
            keyword: "backbone"
        - biasVar: "torsion"
          selection: 
            keyword: "custom"
            customSelection:
            - {CHAIN_ID: "A", RES_NAME: "ALA", RES_ID: 1, ATOM_NAME: "CA"}
            - {CHAIN_ID: "A", RES_NAME: "ALA", RES_ID: 2, ATOM_NAME: "CA"}
            - {CHAIN_ID: "A", RES_NAME: "ALA", RES_ID: 3, ATOM_NAME: "CA"}
            - {CHAIN_ID: "A", RES_NAME: "ALA", RES_ID: 4, ATOM_NAME: "CA"}

This example will add a RMSD bias to the backbone of the protein and a torsion bias between the CA atoms of residues 1, 2, 3, and 4 of the protein

Advanced YAML-ing with variables

If you are running multiple simulation steps that share the same parameters, you can use variables in the YAML config file. This will most commonly come up when you are applying position restraints during equilibration steps. Below is a standard syntax for a pair of equilibration steps:

simulationInfo:
  - stepName: "01_NVT_pre-equilibration"
    simulationType: "NVT"
    duration: "100 ps"
    timestep: "4 fs"
    heavyProtons: True
    temperature: 300
    logInterval: "10 ps"
    restraintInfo: 
    - restraintType: "position"
      parameters:
        k: 1000
      selection:
        keyword: "protein"
    - restraintType: "position"
      selection: 
        keyword: "ligand"  
      parameters:
        k: 1000

  - stepName: "02_NPT_pre-equilibration"
    simulationType: "NPT"
    duration: "100 ps"
    timestep: "4 fs"
    heavyProtons: True
    temperature: 300
    logInterval: "10 ps"
    restraintInfo:
      - restraintType: "position"
        parameters:
          k: 1000
        selection:
          keyword: "protein"
      - restraintType: "position"
        selection: 
          keyword: "ligand"  
        parameters:
          k: 1000

Instead of repeating the restraintsInfo section each time, equilibriumRestraints can be defined as a variable, then re-used in each equilibration step:

equilibrationRestraints: &equilibrationRestraints
    - restraintType: "position"
      parameters:
        k: 1000
      selection:
        keyword: "protein"

    - restraintType: "position"
      selection: 
        keyword: "ligand"  
      parameters:
        k: 1000

simulationInfo:
  - stepName: "01_NVT_pre-equilibration"
    simulationType: "NVT"
    duration: "100 ps"
    timestep: "4 fs"
    heavyProtons: True
    temperature: 300
    logInterval: "10 ps"
    restraintInfo: *equilibrationRestraints

  - stepName: "02_NPT_pre-equilibration"
    simulationType: "NPT"
    duration: "100 ps"
    timestep: "4 fs"
    heavyProtons: True
    temperature: 300
    logInterval: "10 ps"
    restraintInfo: *equilibrationRestraints

⚕️ WORKED EXAMPLES ⚕️

In this section we will go through a series of examples of how to use this package. We will start with the simplest possible simulation setup and gradually build up to more complex ones.

Worked Example 1: Molecular Dynamics Simulation of a Protein

In this example, we want to simulate the dynamics of the protein isPETase.

This example introduces the following concepts:

  • Removal of co-crystalised ions and small molecules
  • Changing pathInfo in a config file
  • Running drMD from command line
  • Simple equilibrium protocol
  • Production MD simulations

PDB preparation

To begin with, we download a crystal structure of isPETase from the Protein Data Bank PDB:6eqe. Next we need to remove any random ions and small molecules that have co-crystalised in the structure (we recommend this is done in Pymol). Once this file has been saved, we place it into the inputFiles directory.

Removal of co-crystalised ions and small molecules

Before and after removal of co-crystalised waters, ions and small molecules

Config file setup

Next we need to construct our configuration file. As we don't want to do anything too fancy, we will use the standard_MD_config.yaml as a template. We copy this file into inputFiles directory:

cp ./Prescriptions/standard_MD_config.yaml ./inputFiles/isPETase_standard_MD.yaml

The first thing we need to do is change the pathInfo section of this config file. We need to set the inputDir and outputDir entries to match with our own directory paths (this change for your own directory system!)

pathInfo:
  inputDir: "/path/to/inputFiles"
  outputDir: "/path/to/outputFiles"

We don't need to change anything else in the config file. If you want to change anything else, refer to the Config Syntax section of this README for more information.

To run our simulations, we first navigate to the main drMD directory:

cd /path/to/drMD

Then we activate our conda environment (see Installation for more information):

conda activate drMD

Finally we run drMD using our config file:

python ./src/drMD.py ./inputFiles/isPETase_standard_MD.yaml

First, you will see the drMD logo in your terminal. The output text immediately below the logo will keep you updated with what is going on in your simulation.

drMD logo with loading bar

An example of drMD command line output. Here we can see that drMD is running prep steps for system 6eqe_1

Preparation steps

Initially prep steps are being performed. During this step the following are performed:

  • Our protein is protonated at our specified pH (7.4).
  • Our protein is placed in a solvent box which is filled with TIP3P water molecules
  • Chloride ions are added to the solvent box to balance the overall positive charge of our protein

To see output files associated with this step, check the ./outputFiles/6eqe/00_prep directory.

Once the prep steps have finished, our isPETase_standard_MD.yaml file instructs drMD to perform an energy minimisation step. The resulting file can be found in the ./outputFiles/6eqe/01_minimisation directory.

We recommend that you open PDB file in Pymol to check the results of this step. If you slowly rotate the structure, you >will be able to see that the water molecules have a strange, unphysical order to them. This is due to the imperfections in the algorithm used to place the waters in the first place. The purpose of the next two simulation steps is to remove this unphysical behavior.

Pre-equilibration protocol

The steps 02_NVT_pre-equilibration and 03_NPT_pre-equilibration are used to equilibriate the water molecules in our system.

As we are only equilibrating water, we apply position restraints to the protein and ligand. This is done using the restraintInfo section of our config file (see Adding Restraints with drMD and Advanced YAML-ing with variables for more information).

The 02_NVT_pre-equilibration step is performed under the NVT ensemble. This means that the number of particles (N), the volume of the system (V), and the temperature (T) are all constant. Under the NVT ensemble the pressure of the system is allowed to change.

If we look at the trajectory of this step in Pymol (which can be done by loading the trajectory.pdb file then the trajectory.dcd file), we can see that the water molecules contract, leaving voids at the corners of the solvent box.

In the next step 03_NPT_pre-equilibration, we use the NPT ensemble. This means that the number of particles (N), the volume of the system (V), and the pressure (P) are all constant. Under the NPT ensemble, the Volume of the system is allowed to change. If we look at the trajectory of this step in Pymol, we can see that the solvent box itself contracts, eliminating the voids at the corners of the box.

Slow integrator step

Sometimes the removal of restraints can cause numerical errors. To prevent this, we run the 04_NpT_slowIntegrator step using a small timestep of 0.5 fs. This can prevent the next couple of steps from crashing.

If some of your simulations fail, you can try to run these slow integrator steps before other simulation steps.

Equilibriation step

The 05_Equilibration step is run under the NPT ensemble for 5 nanoseconds. The purpose of this step is to ensure that our system is equilibrated. To make sure of this, we can check the vitals report that drMD generates. If the vitals report indicates that the system is not equilibrated, we can extend the duration of this step before running our production simulation.

Removal of co-crystalised ions and small molecules

An example of drMD vitals report for the 05_Equilibration step. We can see that all of the key simulation properties have converged. This gives us good confidence that our system is equilibrated

Production MD

The 06_Production_MD step is run under the NPT ensemble for 50 nanoseconds. This is the production simulation that we are interested in. Depending on our research goals, we will take measurements and/or extract structures from the trajectory of this step.

To perform routine analysis on your production MD trajectories we recommend the MDAnalysis and MDTraj python packages.

Worked Example 2: Restrained MD Simulations of a Protein-Ligand Complex

In this example, we want to investigate the catalytic pose of the ligand PET-tetramer in the active site of isPETase.

This example introduces the following concepts:

  • Ligand preparation
  • The ligandInfo section of the config file
  • Creation of custom restraints in drMD
  • Clustering simulation trajectories

PDB preparation

For this set of simulations, we need to prepare a PDB file of our protein-ligand complex. We start with a cleaned up structure of isPETase (see previous worked example). We recommend this is done using the GNINA molecular docking software. Once we have generated this protein-ligand complex, we place it as a PDB file into our inputFiles directory.

Molecular docking of PET-tetramer ligand to isPETase

An example of a protein-ligand complex generated by GNINA. Note that GNINA generates a large number of alternate conformations for the ligand. It will be up to you to decide which conformations to use as input(s) to drMD

Config file setup

Next we need to construct our configuration file. Again, we are going to use the standard_MD_config.yaml as a template. We copy this file into inputFiles directory:

cp ./Prescriptions/standard_MD_config.yaml ./inputFiles/isPETase_ligand_MD.yaml

We then need to modify the inputFiles/isPETase_ligand_MD.yaml file. First we need to change the pathInfo section see previous example.

Next we need to add a ligandInfo section to our config file. This section will tell drMD useful information about our PET-tetramer ligand.

ligandInfo:
  - ligandName: "PET"
    charge: 0
    protons: False
    toppar: False
    mol2: False

Here we have told drMD that our ligand is called "PET" and has a formal charge of 0. We have also told drMD that it needs to add protons to our ligand and that it does not already have a toppar or mol2 file for our ligand - these will be automatically generated by drMD.

We will use the same pre-equilibration and equilibration steps as our previous worked example.

Creation of custom restraints

To sample from only catalytically active poses of the PET-tetramer ligand in our isPETase active site, we need to add custom restraints to our MD simulation. From QM/MM calculations, we know that the catalytic pose of PET in the active site of isPETase relies on the geometry between the ester carbonyl of the ligand and the residues Ser160, Met161 and Tyr87 (see image below).

The ideal catalytic geometry of PET in the active site of isPETase

The catalytic pose of PET in the active site of isPETase. The catalytic Ser160 residue is 3 Å from the scissile ester carbon of the ligand and forms an angle of 109.5° relative to the ester carbonyl bond. The ester oxygen of the ligand is 2.7 Angstroms from backbone nitrogen atoms of the residues Met161 and Tyr87

To do this, we will create a restraintsInfo section in our 06_Production_MD step of our config file. To do this nice and cleanly, we can create a variable that contains our custom restraints:

catalyticPoseRestraints: &catalyticPoseRestraints
  ## create a distance restraint between Tyr87-N and PET-O2 with a k of 100 and r0 of 2.7 Angstroms
  - restraintType: distance
    parameters:
      k: 100
      r0: 2.7
    selection:
      keyword: custom
      customSelection:
        - {CHAIN_ID: A, RES_NAME: TYR, RES_ID: 87, ATOM_NAME: N}
        - {CHAIN_ID: B, RES_NAME: PET, RES_ID: 1, ATOM_NAME: O2}

  ## create a distance restraint between Met161-N and PET-O2 with a k of 100 and r0 of 2.7 Angstroms
  - restraintType: distance
    parameters:
      k: 100
      r0: 2.7
    selection:
      keyword: custom
      customSelection:
        - {CHAIN_ID: A, RES_NAME: MET, RES_ID: 161, ATOM_NAME: N}
        - {CHAIN_ID: B, RES_NAME: PET, RES_ID: 1, ATOM_NAME: O2}

  ## create a distance restraint between Ser160-OG and PET-C9 with a k of 100 and r0 of 3.0 Angstroms
  - restraintType: distance
    parameters:
      k: 100
      r0: 3.0
    selection:
      keyword: custom
      customSelection:
        - {CHAIN_ID: A, RES_NAME: SER, RES_ID: 160, ATOM_NAME: OG}
        - {CHAIN_ID: B, RES_NAME: PET, RES_ID: 1, ATOM_NAME: C9}

  ## create an angle restraint between Ser160-OG, PET-C9 and PET-O2 with a k of 100 and theta0 of 109.5 degrees
  - restraintType: angle
    parameters:
      k: 100
      theta0: 109.5
    selection:
      keyword: custom
      customSelection:
        - {CHAIN_ID: A, RES_NAME: SER, RES_ID: 160, ATOM_NAME: OG}
        - {CHAIN_ID: B, RES_NAME: PET, RES_ID: 1, ATOM_NAME: C9}
        - {CHAIN_ID: B, RES_NAME: PET, RES_ID: 1, ATOM_NAME: O2}

We can then include these restraints in out 06_Production_MD step of our config file:

  - stepName: "06_Production_MD"
    simulationType: "NpT"
    duration: "50 ns"
    timestep: "4 fs"
    heavyProtons: True
    temperature: 300
    logInterval: "50 ps"
    restraintInfo: *catalyticPoseRestraints

Clustering MD trajectories

We want to investigate the conformational space that our ligand can adopt, whilst being in its catalytic pose. Instead of looking through the entire trajectory manually in Pymol (see previous example for how to do this), we can cluster our trajectory.

To do this, we will create a clusterInfo section in our aftercareInfo section of our config file (see clusterInfo for more information):

aftercareInfo:
  clusterInfo:
    stepNames: ["06_Production_MD"] 
    nClusters: 10
    clusterBy:
    - selection:
        keyword: "ligand"
    removeAtoms:
      - selection:
          keyword: "water"
      - selection:
          keyword: "ions"

Here we have told drMD to cluster the trajectory generated by our 06_Production_MD step. 10 PDB files will be created in the path/to/outputDir/00_clustered_pdbs directory. These files can be used as a starting point for your analysis or an input structures for further computational tools.

Worked Example 3: Energy Minimisation of Multiple Structures

In this example, we want to perform an energy minimisation calculation on multiple structures.

This example introduces the following concepts:

  • Parallelisation of calculations in drMD
  • Cleanup of directories

There are many situations in computational protein science where one may want to perform an energy minimisation calculation on a large library of structures. With the explosion of AI-driven structure prediction tools, this is becoming more common. Some of these tools (such as Alphafold 2) have an energy minimisation step built-in, while others (such as Omegafold) do not. In these cases, drMD can be used to perform the energy minimisation step for you.

PDB preparation

For this example, we will be using some output files from Omegafold, located in the ExampleInputs/Worked_example_3_Mass_EM directory on our github repository. These files are good to go as-is. We can simply place them in our inputDir directory.

Our input structures in this example

3D structures of our input proteins. These were generated by Omegafold

Config file setup

We can use the pre-built energy minimisation configuration file. As always, we need to change the pathInfo entry to match with our own directory system (see Previous Example for more information).

If we look into our configuration file, we can see that the hardwareInfo section uses more than one CPU:

hardwareInfo:
  parallelCPU: 16
  platform: CUDA
  subprocessCpus: 1

This means that we will be running 16 energy minimisation calculations in parallel, each using 1 CPU core. This is a good idea as energy minimisation calculations do not benefit much from GPU acceleration. By running 16 parallel calculations, we can speed up our calculations by a factor of 16.

In this config file, we have a simple one-step simulationInfo section:

simulationInfo:
  - stepName: 01_energy_minimisation
    simulationType: EM
    temperature: 300
    maxIterations: -1

This will run an energy minimisation step until it reaches convergence.

In the aftercareInfo section, we have the following:

aftercareInfo:
  endPointInfo:
    stepNames: [01_energy_minimisation]
    removeAtoms: 
    - selection:
        keyword: ions
    - selection:
        keyword: water
  removeAllSimulationDirs: True

The endpointInfo section tells drMD to collect the final PDB file generated from each energy minimisation calculation and place them in the path/to/outputDir/00_collated_pdbs/01_energy_minimisation directory. Within the endPointInfo section, we have specified that we want to remove all water and ion atoms from the PDB file. This ensures that we have the same atoms in our output files as our input files. The removeAllSimulationDirs parameter tells drMD to delete all prep and simulation directories once the job is complete. This will leave us with only the path/to/outputDir/00_collated_pdbs/01_energy_minimisation directory. This can be useful as it saves on disk space.

Running Energy Minimisation

When we run these simulations, drMD will display a progress bar for each calculation:

Loading Bars

Loading bars for each energy minimisation calculation

Worked Example 4: Using drMD defaults

In this example, we will run the same simulation as in Worked Example 1 using a very minimal config file.

This example introduces the following concepts:

  • Default values in drMD

As you read the configSyntax section of this README you will notice that for most config entries, drMD can provide a default value. This means that if you do not specify a value for a parameter, drMD will use the default value provided for that parameter.

☣️ WARNING: If you do specify a value for a parameter but it is incorrect, drMD will not attempt to use a default value. Instead a handy error message will be displayed.

Config file setup

In the Prescriptions directory, we have a config file named lazy_config.yaml. This config file has the following:

## sparse pathInfo
pathInfo: 
  inputDir: "/home/esp/scriptDevelopment/drMD/ExampleInputs/test_standard"
## sparse hardwareInfo
hardwareInfo:
  platform: CUDA
## no miscInfo
## sparse simulationInfo
simulationInfo:
  - stepName: 01_energy_minimisation
    simulationType: EM
  - stepName: 02_NVT_pre-equilibration
    simulationType: NVT
    duration: 100 ps
    heavyProtons: True
  - stepName: 03_NPT_pre-equilibration
    duration: 100 ps
    heavyProtons: True
  - stepName: 04_slow_step
    duration: 50 ps
    heavyProtons: True
    timestep: 0.5 fs
  - stepName: 05_equilibration
    duration: 5 ns
    heavyProtons: True
  - stepName: 06_production_MD
    duration: 50 ns
    heavyProtons: True   

We have set up the config file as follows:

  • In the pathInfo section, we have not specified the outputDir . drMD will use a default value for outputDir, which is /current/working/directory/outputs.

  • In hardwareInfo we have only specified platform, so drMD will use a default value for parallelCPU and subprocessCpus (both values are set to 1).

  • We have completely neglected to include a miscInfo section. drMD will create this section from scratch with default values.

  • In the simulationInfo section, we have included the minimal amount of information that drMD needs to reproduce the simulation in Worked Example 1.

    • Each simulation step has a stepName entry, this is mandatory.
    • For the 01_energy_minimisation step, we have specified a simulationType of EM. For energy minimisation calculations a maxIterations parameter is required, drMD will use a default value of -1 (this means that the step will run until it reaches convergence).
    • For the remaining simulation steps, we have specified the duration and heavyProtons parameters.
    • We will let drMD use defaults for timestep (4 fs when heavyProtons is True), temperature (300 K) , logInterval (10 ps) and simulationType (NpT).
    • Where we do not want a default value to be used, we have specified a value: For the02_NVT_pre-equilibration step we want use the canonical (NVT) ensemble, so we have specified the simulationType as NVT. For the 04_slow_step step, we we have specified a timestep of 0.5 fs.

After we run drMD, a per-run config file will be created in the `/path/to/outputDir/00_configs' directory. You can see that

hardwareInfo:
  parallelCPU: 1
  platform: CUDA
  subprocessCpus: 1
miscInfo:
  boxGeometry: cubic
  firstAidMaxRetries: 10
  pH: 7
  skipPdbTriage: false
  trajectorySelections:
  - selection:
      keyword: all
  writeMyMethodsSection: true
pathInfo:
  inputDir: /home/esp/scriptDevelopment/drMD/ExampleInputs/test_standard
  inputPdb: /home/esp/scriptDevelopment/drMD/ExampleInputs/test_standard/6eqe_1.pdb
  outputDir: /home/esp/scriptDevelopment/drMD/outputs/6eqe_1
  outputName: 6eqe_1
proteinInfo:
  proteinName: 6eqe_1
  protons: false
simulationInfo:
- maxIterations: -1
  simulationType: EM
  stepName: 01_energy_minimisation
  temperature: 300
- duration: 100 ps
  heavyProtons: true
  logInterval: 10 ps
  simulationType: NVT
  stepName: 02_NVT_pre-equilibration
  temperature: 300
  timestep: 4 fs
- duration: 100 ps
  heavyProtons: true
  logInterval: 10 ps
  simulationType: NPT
  stepName: 03_NPT_pre-equilibration
  temperature: 300
  timestep: 4 fs
- duration: 50 ps
  heavyProtons: true
  logInterval: 10 ps
  simulationType: NPT
  stepName: 04_slow_integration_step
  temperature: 300
  timestep: 0.5 fs
- duration: 5 ns
  heavyProtons: true
  logInterval: 10 ps
  simulationType: NPT
  stepName: 05_equilibration
  temperature: 300
  timestep: 4 fs
- duration: 50 ns
  heavyProtons: true
  logInterval: 10 ps
  simulationType: NPT
  stepName: 06_production_MD
  temperature: 300
  timestep: 4 fs

Worked Example 5: Metadynamics Simulation of Alanine Dipeptide

This section was written by Eva Notari (Thanks Eva!)

In this example, we will run a metadynamics simulation to explore the φ and ψ angles of alanine-dipeptide.

This example introduces the following concepts:

  • Metadynamics in drMD
  • Collective Variables (CVs)
  • Metadynamics parameters
  • Construction of Bias Variables in drMD
  • Free Energy Landscapes

Molecular Dynamics simulations can normally be run in the ns or μs timescales. However, a variety of events of interest to protein scientists, such as protein folding, protein-ligand binding, and transport of molecules through biological membranes, occur on larger timescales (ms or even s). As a consequence, these processes cannot be adequately sampled with standard Molecular Dynamics simulations. Metadynamics is an enhanced sampling technique that helps accelerate such rare events and can also provide quantitative descriptions of the processes under study by computing their free energies.

Metadynamics accelerates sampling along one or more user-specified variables, called Collective Variables (CVs). The chosen CVs should describe the process of interest. For example, if the process being studied is the stretching of a peptide, the CV could be the distance between the first and the last Cα atoms. Metadynamics will then output the ΔG of the process as a function of the chosen CV(s).

Config file setup

A simple example to illustrate the use of metadynamics is alanine dipeptide (an alanine residue capped with acetyl and N-methyl groups). Alanine dipeptide is commonly used as a toy system for simulations, with the two backbone torsions φ (C-N-Cα-C) and ψ (N-Cα-C-N) used as the CVs.

Alanine dipeptide with Phi and Psi angles annotated

Alanine dipeptide with φ and ψ backbone torsions shown

For each CV, we need to select appropriate metadynamics parameters. We recommend surveying the literature for suitable parameters, as they will depend on the process of interest and the nature of the variables chosen.

For our alanine dipeptide example, we can specify the CVs and their parameters in our config file during the 06_Metadynamics step, which in this case replaces the standard MD production step used in Worked Example 1.

  - stepName: 06_Metadynamics
    simulationType: META
    duration: 5 ns
    timestep: 2 fs
    heavyProtons: False
    temperature: 300
    logInterval: 2 ps
    metaDynamicsInfo:
      height: 0.8
      biasFactor: 10
      frequency: 500
      biases:
        - biasVar: torsion
          minValue: -180
          maxValue: 180
          biasWidth: 5.73
          selection: 
            keyword: custom
            customSelection:
            - {CHAIN_ID: A, RES_NAME: ACE, RES_ID: 1, ATOM_NAME: C}
            - {CHAIN_ID: A, RES_NAME: ALA, RES_ID: 2, ATOM_NAME: N}
            - {CHAIN_ID: A, RES_NAME: ALA, RES_ID: 2, ATOM_NAME: CA}
            - {CHAIN_ID: A, RES_NAME: ALA, RES_ID: 2, ATOM_NAME: C}
        - biasVar:  torsion
          minValue: -180
          maxValue: 180
          biasWidth: 5.73
          selection: 
            keyword: custom
            customSelection:
            - {CHAIN_ID: A, RES_NAME: ALA, RES_ID: 2, ATOM_NAME: N}
            - {CHAIN_ID: A, RES_NAME: ALA, RES_ID: 2, ATOM_NAME: CA}
            - {CHAIN_ID: A, RES_NAME: ALA, RES_ID: 2, ATOM_NAME: C}
            - {CHAIN_ID: A, RES_NAME: NME, RES_ID: 3, ATOM_NAME: N}

The above config sets up a metadynamics simulation with the backbone torsions φ and ψ as the CVs. The simulation will explore a range of [-180,180] degrees for each CV. Gaussians will be deposited every 500 simulation steps (1 ps), with a Gaussian width of 0.1 rad (5.73 degrees) for each CV. The height of the initial Gaussian will be 0.8 kJ/mol and the bias (temperature) factor is set to 10.

For a detailed technical description of metadynamics, we recommend this review. Note that drMD implements Well-Tempered metadynamics.

Free energy landscape of collective variables

Once our 06_Metadynamics step is complete, drMD will automatically create a file called freeEnergy.csv, located in the /path/to/06_Metadynamics/reporters_and_plots directory. This file contains the free energy landscape of the with respect to our chosen collective variables. When we plot this free energy landscape, we see the following:

The Free Energy Landscape of φ and ψ torsions of alanine dipeptide

The Free energy landscape of φ and ψ torsions of alanine dipeptide

We can compare our results with other studies. If we run our simulations for more time, it is likely that our results will compare more favorably with literature.

Worked Example 6: Metadynamics Simulation of Chignolin Peptide

This section was written by Eva Notari (Thanks Eva!)

In this example, we will run a metadynamics simulation to explore the folding of the chignolin peptide.

This example introduces the following concepts:

  • TODO

Chignolin PDB 1UAO is another toy system commonly used in Molecular Dynamics simulations. It is a fast folding, 10 -residue miniprotein. There are two hydrogen bonds that are key to the folding of the chignolin peptide:

  • Asp3-O --- Gly7-N
  • Asp3-N –-- Thr8 O

Chignolin Peptide with key hydrogen bonds annotated

Chignolin structure showing hydrogen bonds important for its secondary structure (Asp3 O – Gly7 N and Asp3 N – Thr8 O)

Config file setup

We will use the distances between chignolin's hydrogen bond donor-acceptor pairs to define our collective variables. As with the previous example, we will create a new 05_Metadynamics step in the place of the 05_Production_MD step used in the config file from worked example 1

- stepName: 05_Metadynamics
    simulationType: META
    duration: 10 ns
    timestep: 2 fs
    heavyProtons: False
    temperature: 300
    logInterval: 10 ps
    metaDynamicsInfo:
      height: 0.8
      biasFactor: 10
      frequency: 500
      biases:
        - biasVar: distance
          minValue: 2.7
          maxValue: 10
          biasWidth: 0.5
          selection: 
            keyword: custom
            customSelection:
            - {CHAIN_ID: A, RES_NAME: ASP, RES_ID: 3, ATOM_NAME: O}
            - {CHAIN_ID: A, RES_NAME: GLY, RES_ID: 7, ATOM_NAME: N}
        - biasVar:  distance
          minValue: 2.7
          maxValue: 10
          biasWidth: 0.5
          selection: 
            keyword: custom
            customSelection:
            - {CHAIN_ID: A, RES_NAME: ASP, RES_ID: 3, ATOM_NAME: N}
            - {CHAIN_ID: A, RES_NAME: THR, RES_ID: 8, ATOM_NAME: O}

In this 05_Metadynamics step, we have set the height, biasFactor and frequency parameters for the metadynamics simulation. We also have created two distance bias variables, one for each hydrogen bond.