(This tutorial is related to work published at https://doi.org/10.3390/nano12193379).
First, we will build a cellulose nanofibril structure using the program Cellulose-builder. The program offers a user-friendly interface and a set of bash scripts to build the cellulose structure in PDB format with different cellobiose chain lengths and geometries (crystalline, fibril, and monolayer). The program allows imposing periodicity along the chain axis for all model types and 3-dimensional periodicity for the crystalline model. The program supports geometry building for various cellulose polymorphs (I-α, I-β, II, and III(I)).
We will build a cellulose nanofibril with 10 cellobiose units of the 1-4 linked I-β polymorph for this tutorial.
Cellulose builder web interface:
If you prefer to avoid the fuss of downloading and running a script, click here to access the online GUI for geometry construction (Go to the next subsection for a command-line bash script version for structure building). Below is an image of the webserver with queries chosen to build a single nanofibril of 1-4 linked I-β cellulose with 10 cellobiose units (20 D-glucose molecules) per chain. We want to model to represent a slice of the cellulose fibre, so we choose to link the terminal glucose molecules.
PHASE: I-BETA
PBC: NONE
PCB_c: TRUE
Select Input: fibril
Input 1: 10
NOTE: When choosing to build a crystalline representation of cellulose, it is important to go through the article and check the significance of the three input integer options, which define the number of chains on origin and centre layers, and the chain length. Below is an example of parameters to build a structure with 6 origin layers, 4 chains/origin centre layers, and 5 cellobiose units/chains.
After clicking on the submit button, you can download the results in a zip file that includes the structure output in PDB format (crystal.pdb) and other parameter files supporting the AMBER MD package (not significant for this tutorial). Navigate to the download directory using the terminal and use the following command to extract results from the compressed file:
unzip <filename>.zip
cd <filename>
Cellulose Builder command-line scripts:
If you prefer to dive into the scripts and look at how the geometries and different structure types are built, this is the step for you. The bash script source code of cellulose builder can be downloaded here (GNU license). Using the scripts to build the model has few installation dependencies (Octave, VMD and psfgen). Check the official documentation of the program here for instructions on how to install the pre-requisites. Remember to add the dependencies executables to the PATH to your “bashrc” or “bash-profile”. Okay, now that we have all the dependencies installed, let’s extract the scripts from the compressed file using the command-line terminal.
For tarball file: tar -xzf cellulose-builder_July_2013.tar.gz
For zip file: unzip cellulose-builder_July_2013.zip
Next, start by navigating to the cellulose-builder_July_2013 directory. We first modify the file input.inp to choose the I-beta polymorph and chain periodicity. (You can also use any text editor of your choice)
vi input.inp (below image shows the choice of parameters for I-beta fibril with periodicity along the axis of chain)
Now run the bash script using the following command:
./cellulose-builder fibril 10
If all dependencies and executable paths are correctly defined, you will see a new directory named crystal, inside which there will be a PDB format structure file named crystal.pdb.
Now let’s take a closer look at the structure to get an idea of the orientation and interactions.
pymol crystal.pdb
You can navigate through the structure to look at the orientation and intramolecular interaction within the chain and across parallel chains.
Now let’s set up the first MD simulations for the fibril system. A trivial step here is to generate the Gromacs compatible topology files for the cellulose system. Here we use a single D-glucose molecule parameterization to describe the bonded and non-bonded parameters of the system. The parameters (bonds, angles, dihedrals and partial charges) have been already derived with methods compatible and validated for Amber14 force fields, so we will not go into those detail in this tutorial. We will use the Amber14 force field parameters for this tutorial, which can be downloaded as amber14-mfc-pp.tgz. Unarchive the force field tarball into a new directory:
mkdir force-fields
mv ~/Downloads/amber14-mfc-pp.tgz force-fields/
cd force-fields
tar -xzf amber14-mfc-pp.tgz
There should now be a subdirectory named amber14sb_OL15.ff and two files, residuetypes.dat and specbond.dat.
NOTE: If you have time, have a look at the files inside the amber14sb_OL15.ff directory. The glucose molecule parameters are described in the following files:
- New-glc.rtp: Residue topology with partial charges and bond descriptions
- New-glc.r2b: Residue name translation database
- New-glc.arn: Atom renaming database
For a detailed description of the file types, check this link on the Gromacs webpage.
We want the Gromacs library to access these force fields from any working directory of the computer, so let’s add the path to the bashrc file of your system.
echo export GMXLIB=”<path-to-force-field-directory>” >> ~/.bashrc
or run this command to add the path to the environment variables.
export GMXLIB=”<path-to-force-field-directory>”
Now, let’s start generating the topologies for a cellulose nanofibril in water. Please create a new directory, cellullose-fibril-water and copy the structure file we generate (crystal.pdb) to this new directory. Under this directory, we set up our MD simulations using an approach that allows linking the terminal residues of the cellulose chain across the periodic box.
Prepare the PDB file to match Gromacs RTP description
sed -i "s/BGLC/IBG /" <structure-filename>.pdb
Generate preliminary topology files to be used for the [atom] section description
gmx_mpi pdb2gmx -f crystal.pdb -o pdb2gmx.gro -p topol-pdb2gmx.top -water tip3p
The pdb2gmx tool will process the structure and prompt to choose a force field:
(Choose option 1:AMBER14sb-OL15 + beta-glucose GLYCAM06h)
Ignore the long bond warning here since pdb2gmx does not support periodic covalent bonding, and we will re-define these using x2top a few steps later.
Important: Compute the cellulose fibril chain length from the monomer count to define the length of box dimension "c" to allow covalent bonding of the terminal residues. In our case, we have 10 monomer/chain ==> 10.38 nm length.
gmx_mpi editconf -f pdb2gmx.gro -o mfc-20mono.gro -box 7 8 10.38
Now let's generate a topology with periodic bonded parameters using "x2top" instead of pdb2gmx.
gmx x2top -f mfc-20mono.gro -o topol-x2top.top -ff select -pbc -name IBG -noparam -alldih
(Choose option 1:AMBER14sb-OL15 + beta-glucose GLYCAM06h)
The next step is to add some corrections to the topology files and generate the final cellulose topology file "newtopol.top".
sed -i '/\[ dihedrals \]/,/\[ system \]/s/1 *$/9/' topol-x2top.top
sed -n '/Include forcefield parameters/,/\[ bonds \]/p' topol-pdb2gmx.top | sed '/bond/d' >& newtopol.top && sed -n '/\[ bonds \]/,/\[ system \]/p' topol-x2top.top | sed '/\[ system \]/d' >> newtopol.top && sed -n '/Include Position restraint file/,//p' topol-pdb2gmx.top >> newtopol.top
Solvate
The rest of the MD workflow is now similar to the other Gromacs tutorials. We fill the fibril cell with water:
gmx_mpi solvate -cp mfc-20mono.gro -cs spc216.gro -p newtopol.top -o box-sol.gro
The solvated system is now assembled, and the output structure file box-sol.gro can be visualized with PyMol. Also, check for the modifications in the [ molecules ] section of the topology file (newtopol.top), where the number of waters should be included.
Energy minimization
Before we begin the production dynamics, we want to relax the system and remove any steric clashes, using an energy minimization run. A key parameter necessary to simulate periodic/infinite cellulose chain parameters is the addition of "periodic_molecules" to the mdp file. Generate the binary input tpr file using mdp parameter file em.mdp
gmx_mpi grompp -f em.mdp -p newtopol.top -c box-sol.gro -o em
We can now proceed with the relaxation MD step. Try to run the next step on the cluster, as it may take about an hour on a regular computer. You will have to copy all the directories we created to the cluster (force fields and cellulose-fibril-water). Important, do not forget to add the line export GMXLIB=”<path-to-force-field-directory>” to your bashrc file.
gmx_mpi mdrun -deffnm em
Evaluate if your minimization run was successful by checking the em.log file. If not, then try to check what is happening based on the log output. Consider doing a quick analysis of the evolution of the potential energy of the system.
gmx_mpi energy -f em.edr -o em.xvg
Choose option number for Potential and then “0” to terminate the analysis input. Use Xmgrace or any other program of choice to visualize the energy plot.
You can also visualize the optimized structure in the em.gro file. But before that, we will fix any periodicity issues in the structure since Gromacs may not always put back the structure as a whole after implementing a periodic box.
echo “Cellulose System” | gmx_mpi trjconv -f em.gro -s em.tpr -pbc mol -center -o em.gro
Equilibration
A successful run will ensure we have a good starting model for the equilibration step. We will perform a two-step equilibration of the solvated fibril, starting with temperature equilibration with the canonical (NVT) ensemble and followed by an isothermal-isobaric ensemble (NPT) of 500ps each. The short time is sufficient for equilibrating small/medium systems, but we check the output to re-affirm the plan.
We now repeat the grompp and mdrun commands using a different mdp file nvt.mdp:
gmx_mpi grompp -f nvt.mdp -p newtopol.top -c em.gro -r em.gro -o nvt
gmx_mpi mdrun -deffnm nvt -dlb yes
At the end of a successful run, analyze the temperature evolution and output structure:
gmx_mpi energy -f nvt.edr -o nvt.xvg
Choose the option "Temperature" followed by "0" to end the input prompt. Now, using any plotting tool of choice (Xmgrace), you can visualize the temperature fluctuation.
Echo “Cellulose System” | gmx_mpi trjconv -f nvt.gro -s nvt.tpr -pbc mol -center -o nvt.gro
After bringing the system's temperature to a stable range of around 300K, we will now use a barostat in combination with the thermostat to model the system with an environment that closely resembles the experimental conditions. In the second step equilibration, we will run the NPT ensemble run for 500 ps with a V-rescale thermostat, weakly-coupled Berendsen barostat and position restraints for heavy atoms. Note that we implement semi-isotropic pressure coupling to separate the way the fibril feels pressure along the chain axis (Z) and the other two axes. The pressure coupling is isotropic in the X and Y direction. The NPT run parameter file is available as npt.mdp.
; Pressure coupling
pcoupl = Berendsen ;Parrinello-Rahman for production run (NPT)
pcoupltype = semiisotropic ; uniform scaling of box vectors in x-y and independent z
tau_p = 2.0 ; time constant, in ps
ref_p = 1.0 1.0 ; reference pressure, in bar
compressibility = 4.5e-5 4.5e-5 ; isothermal compressibility of water, bar^-1
refcoord_scaling = com
Now, let’s start the pressure equilibration run.
gmx_mpi grompp -f npt.mdp -p newtopol.top -c nvt.gro -r nvt.gro -t nvt.cpt -o npt
gmx_mpi mdrun -deffnm npt -dlb yes
Analyze the pressure equilibration using the following command:
gmx_mpi energy -f npt.edr -o npt.xvg
Choose the options "Pressure" followed by "0" to end the input prompt and visualize the plot in the output file npt.xvg
Next, we again make the structure whole using the trjconv tool:
echo “Cellulose System” | gmx_mpi trjconv -f npt.gro -s npt.tpr -pbc mol -center -o npt.gro
As expected, the pressure value will fluctuate widely, but the average should be closer to the reference pressure value of 1 bar. The reason behind the fluctuations has been explained in detail in the Gromacs manual and tutorial.
We use the same NPT ensemble for the production MD run but without any position restraints and a tighter pressure coupling for using the Parrionello-Rahman barostat. You can either copy the npt.mdp to md.mdp and make the below changes or download the parameter file md.mdp.
pcoupl = Parrinello-Rahman
Production run & Analysis
Okay, finally time to start the production run.
gmx_mpi grompp -f md.mdp -p newtopol.top -c npt.gro -o topol
gmx_mpi mdrun -s topol -dlb yes
Make sure to check the run ends successfully. In case of time limit issues on the cluster, it is possible to restart the MD run from the last saved time-point (state.cpt) using the following command:
gmx_mpi mdrun -s topol -dlb yes -cpi state.cpt
Once completed, make the output structure in the file confout.gro whole again using trjconv tool. Check the structure with PyMol for any visually distinct feature in the fibril model.
echo “Cellulose System” | gmx_mpi trjconv -f confout.gro -s topol.tpr -pbc mol -center -o confout-10ns.gro
In this section, we only analyse the stability of the model during the 10 ns run. It is always a good idea to evaluate the RMSD fluctuation of the most relevant part of the system (cellulose fibril in our case).
echo “Cellulose Cellulose” | gmx_mpi rms -f traj.trr -s topol.tpr -o rmsd
In the plot, there may be an initial rise in root-mean-square fluctuations but it should stabilize within 2 ns with only small fluctuations. Next, we will check the potential energy evolution over the trajectory if the stability in the RMSD curve matches the energy landscape.
gmx_mpi energy -f ener.edr -o energy.xvg
Choose the options "Potential", "Temperature", and "Density" followed by "0" to end the prompt. Now open the plot file and evaluate if the trajectory needs to be extended based on the energy landscape and density fluctuations.
Click here to move to the tutorial for building a polypropylene solvent box
5 Comments
Jingyu Li
Dear Antti Karttunen,
Thank you for updating this document. This is my first guidance on Gromacs and I really appreciate your clear instruction.
May I ask you a question here? The existing cellulose-builder can build a 36-chain cellulose model conveniently. How can I build an 18-chain cellulose model in the hexaonal 2-3-4-4-3-2 arrangement?
Thank you for your help.
Antti Karttunen
Hello, the guide has been originally written by Vaibhav Modi. I have not used Cellulose Builder myself, so unfortunately I don't know the answer. Perhaps you could build a larger model and then simply remove some of the chains with the help of a visualization program that can edit PDB files. Some codes are listed here: https://www.rcsb.org/docs/additional-resources/molecular-graphics-software.
Best,
Antti
Jingyu Li
Thank you very much for your information, I will try the method as mentioned.
Have a nice day,
Jingyu
wuwei
Dear Antti Karttunen,
Thank you for writing such a great tutorial. I have a question.
A difference between GLYCAM force field and Amber force field is the way to deal with the 1-4 nonbonded interaction. The fudgeLJ and fudgeQQ is different. So I want to know whether you have consider the issue. According to the tutorial, the "amber14sb_OL15.ff/forcefield.itp" was included in the topol. So the fudgeLJ and fudgeQQ is 0.5 and 0.8333, respectively. But in the [pairs] part, there is no modification.
Best wishes!
Wei Wu
Donghua University in China
Antti Karttunen
Dear Wei Wu,
I discussed with Vaibhav Modi, the lead author of the tutorial, and here is a reply to your question:
The differences between GLYCAM and Amberff were considered during the parameterization. To generate Amberff and Gromacs compatible parameters we implemented the "acpype" scripts which also account for the scaling of dihedrals and 1-4 interactions very efficiently. It would of course be also possible to use Gromacs compatible GLYCAM parameters but we switched to Amber to include olefins as well for the composite model later.
More details with references to acpype.py can be found from our original publication at: https://doi.org/10.3390/nano12193379
Best wishes,
Antti