(This tutorial is related to work published at https://doi.org/10.3390/nano12193379).

First, we will build a cellulose nanofibril structure using the program Cellulose-builder. The program offers a user-friendly interface and a set of bash scripts to build the cellulose structure in PDB format with different cellobiose chain lengths and geometries (crystalline, fibril, and monolayer). The program allows imposing periodicity along the chain axis for all model types and 3-dimensional periodicity for the crystalline model. The program supports geometry building for various cellulose polymorphs (I-α, I-β, II, and III(I)). 

We will build a cellulose nanofibril with 10 cellobiose units of the 1-4 linked I-β polymorph for this tutorial. 

Cellulose builder web interface: 

If you prefer to avoid the fuss of downloading and running a script, click here to access the online GUI for geometry construction (Go to the next subsection for a command-line bash script version for structure building). Below is an image of the webserver with queries chosen to build a single nanofibril of 1-4 linked I-β cellulose with 10 cellobiose units (20 D-glucose molecules) per chain. We want to model to represent a slice of the cellulose fibre, so we choose to link the terminal glucose molecules. 

PHASE: I-BETA
PBC: NONE
PCB_c: TRUE
Select Input: fibril
Input 1: 10

 

NOTE: When choosing to build a crystalline representation of cellulose, it is important to go through the article and check the significance of the three input integer options, which define the number of chains on origin and centre layers, and the chain length. Below is an example of parameters to build a structure with 6 origin layers, 4 chains/origin centre layers, and 5 cellobiose units/chains.  

After clicking on the submit button, you can download the results in a zip file that includes the structure output in PDB format (crystal.pdb) and other parameter files supporting the AMBER MD package (not significant for this tutorial). Navigate to the download directory using the terminal and use the following command to extract results from the compressed file: 

unzip <filename>.zip  

cd <filename> 

Cellulose Builder command-line scripts:

If you prefer to dive into the scripts and look at how the geometries and different structure types are built, this is the step for you. The bash script source code of cellulose builder can be downloaded here (GNU license).  Using the scripts to build the model has few installation dependencies (Octave, VMD and psfgen). Check the official documentation of the program here for instructions on how to install the pre-requisites. Remember to add the dependencies executables to the PATH to your “bashrc” or “bash-profile”. Okay, now that we have all the dependencies installed, let’s extract the scripts from the compressed file using the command-line terminal. 

 

For tarball file: tar -xzf cellulose-builder_July_2013.tar.gz 

For zip file: unzip cellulose-builder_July_2013.zip 

Next, start by navigating to the cellulose-builder_July_2013 directory. We first modify the file input.inp to choose the I-beta polymorph and chain periodicity. (You can also use any text editor of your choice) 

vi input.inp (below image shows the choice of parameters for I-beta fibril with periodicity along the axis of chain) 

 

Now run the bash script using the following command: 

./cellulose-builder fibril 10 

If all dependencies and executable paths are correctly defined, you will see a new directory named crystal, inside which there will be a PDB format structure file named crystal.pdb  

  

Now let’s take a closer look at the structure to get an idea of the orientation and interactions. 

pymol crystal.pdb 

You can navigate through the structure to look at the orientation and intramolecular interaction within the chain and across parallel chains. 



Now let’s set up the first MD simulations for the fibril system. A trivial step here is to generate the Gromacs compatible topology files for the cellulose system. Here we use a single D-glucose molecule parameterization to describe the bonded and non-bonded parameters of the system. The parameters (bonds, angles, dihedrals and partial charges) have been already derived with methods compatible and validated for Amber14 force fields, so we will not go into those detail in this tutorial. We will use the Amber14 force field parameters for this tutorial, which can be downloaded as amber14-mfc-pp.tgz. Unarchive the force field tarball into a new directory: 

mkdir force-fields 

mv ~/Downloads/amber14-mfc-pp.tgz force-fields/ 

cd force-fields 

tar -xzf amber14-mfc-pp.tgz 

There should now be a subdirectory named amber14sb_OL15.ff and two files, residuetypes.dat and specbond.dat. 

 

NOTE: If you have time, have a look at the files inside the amber14sb_OL15.ff directory. The glucose molecule parameters are described in the following files: 

  • New-glc.rtp: Residue topology with partial charges and bond descriptions 
  • New-glc.r2b: Residue name translation database 
  • New-glc.arn: Atom renaming database 

For a detailed description of the file types, check this link on the Gromacs webpage. 

 

We want the Gromacs library to access these force fields from any working directory of the computer,  so let’s add the path to the bashrc file of your system. 

echo export GMXLIB=”<path-to-force-field-directory>” >> ~/.bashrc 

or run this command to add the path to the environment variables.

export GMXLIB=”<path-to-force-field-directory>”

Now, let’s start generating the topologies for a cellulose nanofibril in water. Please create a new directory, cellullose-fibril-water and copy the structure file we generate (crystal.pdb) to this new directory. Under this directory, we set up our MD simulations using an approach that allows linking the terminal residues of the cellulose chain across the periodic box.  

Prepare the PDB file to match Gromacs RTP description 

sed -i "s/BGLC/IBG /" <structure-filename>.pdb 

Generate preliminary topology files to be used for the [atom] section description 

gmx_mpi pdb2gmx -f crystal.pdb -o pdb2gmx.gro -p topol-pdb2gmx.top -water tip3p 

The pdb2gmx tool will process the structure and prompt to choose a force field: 

(Choose option 1:AMBER14sb-OL15 + beta-glucose GLYCAM06h) 

Ignore the long bond warning here since pdb2gmx does not support periodic covalent bonding, and we will re-define these using x2top a few steps later. 

 

ImportantCompute the cellulose fibril chain length from the monomer count to define the length of box dimension "c" to allow covalent bonding of the terminal residues. In our case, we have 10 monomer/chain ==> 10.38 nm length.  

gmx_mpi editconf -f pdb2gmx.gro -o mfc-20mono.gro -box 7 8 10.38

Now let's generate a topology with periodic bonded parameters using "x2top" instead of pdb2gmx. 

gmx x2top -f mfc-20mono.gro -o topol-x2top.top -ff select -pbc -name IBG -noparam -alldih 

(Choose option 1:AMBER14sb-OL15 + beta-glucose GLYCAM06h) 

The next step is to add some corrections to the topology files and generate the final cellulose topology file "newtopol.top". 

sed -i '/\[ dihedrals \]/,/\[ system \]/s/1 *$/9/' topol-x2top.top 

sed -n '/Include forcefield parameters/,/\[ bonds \]/p' topol-pdb2gmx.top | sed '/bond/d' >& newtopol.top && sed -n '/\[ bonds \]/,/\[ system \]/p' topol-x2top.top | sed '/\[ system \]/d' >> newtopol.top && sed -n '/Include Position restraint file/,//p' topol-pdb2gmx.top >> newtopol.top 


Solvate

The rest of the MD workflow is now similar to the other Gromacs tutorials. We fill the fibril cell with water: 

gmx_mpi solvate -cp mfc-20mono.gro -cs spc216.gro -p newtopol.top -o box-sol.gro 

The solvated system is now assembled, and the output structure file box-sol.gro can be visualized with PyMol. Also, check for the modifications in the [ molecules ] section of the topology file (newtopol.top), where the number of waters should be included. 

 

Energy minimization

Before we begin the production dynamics, we want to relax the system and remove any steric clashes, using an energy minimization run. A key parameter necessary to simulate periodic/infinite cellulose chain parameters is the addition of "periodic_molecules" to the mdp file. Generate the binary input tpr file using mdp parameter file em.mdp

gmx_mpi grompp -f em.mdp -p newtopol.top -c box-sol.gro -o em 

We can now proceed with the relaxation MD step. Try to run the next step on the cluster, as it may take about an hour on a regular computer. You will have to copy all the directories we created to the cluster (force fields and cellulose-fibril-water). Important, do not forget to add the line export GMXLIB=”<path-to-force-field-directory>” to your bashrc file. 

gmx_mpi mdrun -deffnm em 

Evaluate if your minimization run was successful by checking the em.log file. If not, then try to check what is happening based on the log output. Consider doing a quick analysis of the evolution of the potential energy of the system. 

gmx_mpi energy -f em.edr -o em.xvg 

Choose option number for Potential and then “0” to terminate the analysis input. Use Xmgrace or any other program of choice to visualize the energy plot. 

You can also visualize the optimized structure in the em.gro file. But before that, we will fix any periodicity issues in the structure since Gromacs may not always put back the structure as a whole after implementing a periodic box. 

echo “Cellulose System” | gmx_mpi trjconv -f em.gro -s em.tpr -pbc mol -center -o em.gro 

 

Equilibration

A successful run will ensure we have a good starting model for the equilibration step. We will perform a two-step equilibration of the solvated fibril, starting with temperature equilibration with the canonical (NVT) ensemble and followed by an isothermal-isobaric ensemble (NPT) of 500ps each. The short time is sufficient for equilibrating small/medium systems, but we check the output to re-affirm the plan. 

We now repeat the grompp and mdrun commands using a different mdp file nvt.mdp: 

gmx_mpi grompp -f nvt.mdp -p newtopol.top -c em.gro -r em.gro -o nvt 

gmx_mpi mdrun -deffnm nvt -dlb yes 

At the end of a successful run, analyze the temperature evolution and output structure: 

gmx_mpi energy -f nvt.edr -o nvt.xvg

Choose the option "Temperature" followed by "0" to end the input prompt. Now, using any plotting tool of choice (Xmgrace), you can visualize the temperature fluctuation.

Echo “Cellulose System” | gmx_mpi trjconv -f nvt.gro -s nvt.tpr -pbc mol -center -o nvt.gro

After bringing the system's temperature to a stable range of around 300K, we will now use a barostat in combination with the thermostat to model the system with an environment that closely resembles the experimental conditions.  In the second step equilibration, we will run the NPT ensemble run for 500 ps with a V-rescale thermostat, weakly-coupled Berendsen barostat and position restraints for heavy atoms. Note that we implement semi-isotropic pressure coupling to separate the way the fibril feels pressure along the chain axis (Z) and the other two axes. The pressure coupling is isotropic in the X and Y direction. The NPT run parameter file is available as npt.mdp.

; Pressure coupling 
pcoupl                  = Berendsen     ;Parrinello-Rahman for production run (NPT) 
pcoupltype           = semiisotropic ; uniform scaling of box vectors in x-y and independent z 
tau_p                   = 2.0           ; time constant, in ps 
ref_p                   = 1.0 1.0      ; reference pressure, in bar 
compressibility         = 4.5e-5 4.5e-5 ; isothermal compressibility of water, bar^-1 
refcoord_scaling        = com 

 

Now, let’s start the pressure equilibration run. 

gmx_mpi grompp -f npt.mdp -p newtopol.top -c nvt.gro -r nvt.gro -t nvt.cpt -o npt 

gmx_mpi mdrun -deffnm npt -dlb yes 

Analyze the pressure equilibration using the following command: 

gmx_mpi energy -f npt.edr -o npt.xvg  

Choose the options "Pressure" followed by "0" to end the input prompt and visualize the plot in the output file npt.xvg 

Next, we again make the structure whole using the trjconv tool: 

echo “Cellulose System” | gmx_mpi trjconv -f npt.gro -s npt.tpr -pbc mol -center -o npt.gro 

As expected, the pressure value will fluctuate widely, but the average should be closer to the reference pressure value of 1 bar. The reason behind the fluctuations has been explained in detail in the Gromacs manual and tutorial. 

We use the same NPT ensemble for the production MD run but without any position restraints and a tighter pressure coupling for using the Parrionello-Rahman barostat. You can either copy the npt.mdp to md.mdp and make the below changes or download the parameter file md.mdp.


pcoupl  = Parrinello-Rahman  

 

Production run & Analysis

Okay, finally time to start the production run.

gmx_mpi grompp -f md.mdp -p newtopol.top -c npt.gro -o topol 

gmx_mpi mdrun -s topol -dlb yes 

Make sure to check the run ends successfully. In case of time limit issues on the cluster, it is possible to restart the MD run from the last saved time-point (state.cpt) using the following command:

gmx_mpi mdrun -s topol -dlb yes -cpi state.cpt

Once completed, make the output structure in the file confout.gro whole again using trjconv tool. Check the structure with PyMol for any visually distinct feature in the fibril model.

echo “Cellulose System” | gmx_mpi trjconv -f confout.gro -s topol.tpr -pbc mol -center -o confout-10ns.gro 

 In this section, we only analyse the stability of the model during the 10 ns run. It is always a good idea to evaluate the RMSD fluctuation of the most relevant part of the system (cellulose fibril in our case).

echo “Cellulose Cellulose” | gmx_mpi rms -f traj.trr -s topol.tpr -o rmsd 

In the plot, there may be an initial rise in root-mean-square fluctuations but it should stabilize within 2 ns with only small fluctuations. Next, we will check the potential energy evolution over the trajectory if the stability in the RMSD curve matches the energy landscape.

gmx_mpi energy -f ener.edr -o energy.xvg 

Choose the options "Potential", "Temperature", and "Density" followed by "0" to end the prompt. Now open the plot file and evaluate if the trajectory needs to be extended based on the energy landscape and density fluctuations.


Click here to move to the tutorial for building a polypropylene solvent box

  • No labels

7 Comments

  1. Dear Antti Karttunen,

    Thank you for updating this document. This is my first guidance on Gromacs and I really appreciate your clear instruction.

    May I ask you a question here? The existing cellulose-builder can build a 36-chain cellulose model conveniently. How can I build an 18-chain cellulose model in the hexaonal 2-3-4-4-3-2 arrangement?

    Thank you for your help.

  2. Hello, the guide has been originally written by Vaibhav Modi. I have not used Cellulose Builder myself, so unfortunately I don't know the answer. Perhaps you could build a larger model and then simply remove some of the chains with the help of a visualization program that can edit PDB files. Some codes are listed here: https://www.rcsb.org/docs/additional-resources/molecular-graphics-software.

    Best,
    Antti

    1. Thank you very much for your information, I will try the method as mentioned. 

      Have a nice day,

      Jingyu

  3. Dear Antti Karttunen,

    Thank you for writing such a great tutorial. I have a question.

    A difference between GLYCAM force field and Amber force field is the way to deal with the 1-4 nonbonded interaction. The fudgeLJ and fudgeQQ is different. So I want to know whether you have consider the issue. According to the tutorial, the "amber14sb_OL15.ff/forcefield.itp" was included in the topol. So the fudgeLJ and fudgeQQ is 0.5 and 0.8333, respectively. But in the [pairs] part, there is no modification.

    Best wishes!

    Wei Wu

    Donghua University in China


    1. Dear Wei Wu,

      I discussed with Vaibhav Modi, the lead author of the tutorial, and here is a reply to your question:

      The differences between GLYCAM and Amberff were considered during the parameterization. To generate Amberff and Gromacs compatible parameters we implemented the "acpype" scripts which also account for the scaling of dihedrals and 1-4 interactions very efficiently. It would of course be also possible to use Gromacs compatible GLYCAM parameters but we switched to Amber to include olefins as well for the composite model later.

      More details with references to acpype.py can be found from our original publication at: https://doi.org/10.3390/nano12193379

      Best wishes,
      Antti

  4. Thank you for contributing to the new enthusiasts of molecular dynamics with this tutorial! I'm new to using GROMACS.

    I followed the instructions you provided diligently, but when trying to generate the topol-x2top.top file, I encountered this message:

    "Cannot find the force field for atom O4-21 with 1 bond.

    Cannot find the force field for atom C1-64 with 3 bonds.

    Cannot find the force field for atom O4-105 with 1 bond.

    Cannot find the force field for atom C1-148 with 3 bonds,"

    along with this error: "Could only find a force field type for 14893 out of 15120 atoms."

    I would appreciate it if you could provide me with guidance regarding this. Thank you.


    Ángel Fuentes

    Universidad de Pamplona, Colombia.

  5. Thank you for providing detailed tutorial and nice work. Thank you Professor

    I am interested in cellulose nanofibril using gromacs. I am also facing the same problem as mentioned above. 

    Cannot find the force field for atom C1-8170 with 3 bonds,"

    Fetal  error: 
    "Could only find a force field type for 8112 out of 8190 atoms."

    Can you kindly assist to fix this problem. Thank you