This page lists all the databases and software tools used for the wiki.
Most of the information in the wiki is from the UniProt Knowledgebase (UniProtKB).
The proteins in this wiki are ordered by their unique UniProtKB ID. Each protein page also links to the protein's UniProtKB entry, which can be used to find additional information about it.
If UniProt already lists an experimentally determined structure for the protein, that structure is used in the wiki. The PDB ID of such a model is given in the filename after the gene name or UniProt ID. The method used to determine the structure (X-ray/NMR) is stated under the model.
A database that lists amino acid and nucleotide sequences for proteins.
All the genome information in the wiki (position in the genome and the nucleotide sequence) was retrieved from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The search was done using the ordered locus names from UniProt.
In this wiki, the ordered locus name is listed first as the gene name if UniProtKB provides one, followed by the gene name in brackets. Proteins that do not have an ordered locus name have no genome information in the wiki.
The KEGG entries for proteins with ordered locus names can be accessed from UniProt under the Cross-references section.
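The naming convention described above can be sketched in Python. This is only an illustration: the function name `wiki_gene_label` and the example identifiers are hypothetical, and it assumes that "brackets" means round brackets.

```python
from typing import Optional


def wiki_gene_label(ordered_locus: Optional[str], gene_name: Optional[str]) -> str:
    """Build a gene label: ordered locus name first, gene name in brackets."""
    if ordered_locus and gene_name:
        return f"{ordered_locus} ({gene_name})"
    # Fall back to whichever name exists; empty string if neither does.
    return ordered_locus or gene_name or ""


def has_genome_information(ordered_locus: Optional[str]) -> bool:
    """Genome data was looked up by ordered locus name, so it requires one."""
    return bool(ordered_locus)


# Hypothetical identifiers, used only to show the label format.
print(wiki_gene_label("LOCUS_0001", "fdxA"))
print(wiki_gene_label(None, "fdxA"))
```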
A database of protein domains and functional sites and their identifiable features.
InterPro was used to find new proteins for the wiki based on their ferredoxin domains. The database recognizes protein domains in a given amino acid sequence, and it can also be used to find proteins that contain a given domain.
The following domains were used to search for the proteins in the wiki:
- IPR017896 (4Fe-4S ferredoxin-type, iron-sulphur binding domain)
- IPR001041 (2Fe-2S ferredoxin-type iron-sulfur binding domain)
In addition to finding proteins with certain domains, the database also contains signatures for functional sites, which could be used to identify protein domains in unknown amino acid sequences.
Protein modelling tools
These tools were used to generate the simulated models in the wiki. For a further comparison of modelling tools, see this presentation.
Most of the simulated models were generated using Phyre2, with some probable ligands added using 3DLigandSite. Phyre2 builds a 3D model from an amino acid sequence mainly by comparing it to templates.
Normal mode was used for most models. However, this often leaves parts of the amino acid sequence out of the model if the ends of the protein do not align with any template. The sequence in the model is therefore often slightly shorter than the sequence listed in the database.
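One way to check how much of a sequence a model covers is to count the residues present in the model file. The sketch below is not part of the wiki's workflow. It assumes a standard fixed-column PDB file, and the function names and the example records are made up for illustration.

```python
from typing import Optional, Set


def modelled_residues(pdb_text: str, chain_id: Optional[str] = None) -> Set[int]:
    """Collect residue numbers that have an alpha-carbon (CA) atom.

    If chain_id is None, residues from every chain are counted.
    """
    residues = set()
    for line in pdb_text.splitlines():
        # ATOM records use fixed columns (1-based): atom name 13-16,
        # chain ID 22, residue number 23-26.
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        if line[12:16].strip() != "CA":
            continue
        if chain_id is not None and line[21] != chain_id:
            continue
        residues.add(int(line[22:26]))
    return residues


def coverage(pdb_text: str, sequence_length: int) -> float:
    """Fraction of the full sequence that is present in the model."""
    return len(modelled_residues(pdb_text)) / sequence_length


# Made-up records: a model containing residues 5-7 of a 10-residue sequence.
example_pdb = "\n".join([
    "ATOM      1  CA  MET A   5",
    "ATOM      2  CA  LYS A   6",
    "ATOM      3  CB  LYS A   6",
    "ATOM      4  CA  THR A   7",
])
print(sorted(modelled_residues(example_pdb)))
print(coverage(example_pdb, 10))
```

A coverage clearly below 1.0 indicates that the ends (or other parts) of the sequence were left out of the model.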
Sometimes Phyre2 reports that better coverage of the amino acid sequence could be achieved by using multiple templates in Intensive mode. All such sequences were resubmitted in Intensive mode to get better coverage. In the wiki, this is noted under the link to the model.
Phyre2 does not support modelling selenocysteine (U). To make sequences containing selenocysteine compatible with Phyre2, each selenocysteine was replaced with cysteine (C).
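The substitution described above can be sketched in Python as follows. This is a minimal illustration, not the script used for the wiki; the function name `replace_selenocysteine` and the example sequence are hypothetical.

```python
def replace_selenocysteine(sequence: str) -> str:
    """Return the sequence with selenocysteine (U) replaced by cysteine (C)."""
    # Remove whitespace and normalise case so pasted sequences also work.
    cleaned = "".join(sequence.split()).upper()
    return cleaned.replace("U", "C")


# Hypothetical sequence containing one selenocysteine.
example_sequence = "MKTAUGHC"
print(replace_selenocysteine(example_sequence))
```

Because the substitution changes the residue, a model built this way shows cysteine at positions where the real protein has selenocysteine.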
Kelley, L., Mezulis, S., Yates, C. et al. The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc 10, 845–858 (2015). https://doi.org/10.1038/nprot.2015.053
3DLigandSite predicts ligand binding sites for a given amino acid sequence or protein 3D model. The ligand locations are predicted using known 3D structures.
Phyre2 should automatically submit the generated model to 3DLigandSite if the confidence is high enough. However, the link to 3DLigandSite in the Phyre2 report seems to be broken. Because 3DLigandSite itself generates a 3D model from an amino acid sequence using Phyre2, jobs should be submitted directly to 3DLigandSite instead of Phyre2.
Unlike I-TASSER's ligand prediction, 3DLigandSite adds all the predicted ligands to a single model. Because many ferredoxins with very similar binding sites exist, the models often contain a large number of 4Fe-4S complexes in slightly different orientations, which can make viewing the models somewhat confusing. The tool often fails to predict ferredoxin complexes for larger proteins, which might be because few similar 3D structures are known.
After 20.7., many jobs in 3DLigandSite have become stuck in the Submitted or Running state. The amino acid sequence is always submitted to Phyre2 correctly, but the job does not progress after the model is finished. Resubmitting the finished .pdb model sometimes fixes the issue.
Wass, M.N., Kelley, L.A. & Sternberg, M.J. 3DLigandSite: predicting ligand-binding sites using similar structures. Nucleic Acids Res 38 (Suppl), W469–W473 (2010).
Some models were generated using I-TASSER. It works similarly to Phyre2 but can predict ligands natively. However, it is much slower than Phyre2 and therefore not well suited to generating large numbers of models.
Ligand prediction seemed to work well the few times I tested it, but the tool can only export a model with either all the possible ligands or just one of them. For models with many ferredoxin complexes, the ability to pick and choose multiple ligands for the model would be useful.
- A Roy, A Kucukural, Y Zhang. I-TASSER: a unified platform for automated protein structure and function prediction. Nature Protocols, 5: 725-738 (2010).
- J Yang, R Yan, A Roy, D Xu, J Poisson, Y Zhang. The I-TASSER Suite: Protein structure and function prediction. Nature Methods, 12: 7-8 (2015).
- J Yang, Y Zhang. I-TASSER server: new development for protein structure and function predictions. Nucleic Acids Research, 43: W174-W181 (2015).
Model viewing tools
All pictures of the modelled proteins were taken using UCSF Chimera. One picture was taken using the Interactive 1 (ribbons) preset and two using Interactive 3 (hydrophobicity surface) preset. The first surface model was taken from the same angle as the ribbon model, while the second was taken from the other side of the protein.
The ribbon models show the N-terminus in blue and the C-terminus in red. The surface models show hydrophilic amino acids in blue and hydrophobic amino acids in red. For models with multiple chains or proteins, only the first chain matching the amino acid sequence is shown.
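Picking the first chain that matches a sequence can be sketched as below. This is only an illustration, not necessarily how the chains were chosen for the wiki. It assumes a standard fixed-column PDB file and treats a chain as matching when its residues form a contiguous part of the target sequence; the function names and example records are made up.

```python
from typing import Dict, Optional

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_text: str) -> Dict[str, str]:
    """Read one-letter sequences per chain from the CA atoms, in file order."""
    sequences: Dict[str, str] = {}
    for line in pdb_text.splitlines():
        # Fixed columns (1-based): atom name 13-16, residue name 18-20,
        # chain ID 22.
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        if line[12:16].strip() != "CA":
            continue
        chain = line[21]
        # Residues outside the 20 standard amino acids become "X".
        letter = THREE_TO_ONE.get(line[17:20], "X")
        sequences[chain] = sequences.get(chain, "") + letter
    return sequences


def first_matching_chain(pdb_text: str, target: str) -> Optional[str]:
    """Return the ID of the first chain whose sequence occurs in target."""
    for chain, sequence in chain_sequences(pdb_text).items():
        if sequence and sequence in target:
            return chain
    return None


# Made-up records: chain A does not match the target, chain B does.
example_pdb = "\n".join([
    "ATOM      1  CA  GLY A   1",
    "ATOM      2  CA  SER A   2",
    "ATOM      3  CA  MET B   1",
    "ATOM      4  CA  LYS B   2",
    "ATOM      5  CA  THR B   3",
])
print(first_matching_chain(example_pdb, "AMKTC"))
```

Exact substring matching is a simplification: a chain with unresolved residues in the middle would not match, so a real check might need sequence alignment instead.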