Plant Proteome Databases and Bioinformatic Tools: An Expert Review and Comparative Insights

Abstract

Historically, plant biology studies have lagged behind systems biology studies in animals and humans. However, there are signs of positive change as evidenced by the rise of big data in plant proteomics, and the availability of data science tools and next-generation sequencing technologies. Currently, the sequence information on nearly 300 plant species is available although they are curated to varying degrees of sophistication. This has led to significant enrichment of representations in the corresponding plant proteome databases. Analysis of the proteome component of an organism offers structural, functional, and network scale insights. Moreover, the development of high-throughput mass spectrometric techniques has augmented our understanding of proteins and their expression patterns under various conditions. Several thousand proteins can now be identified from a single mass spectrometric analysis. In this expert review, we provide an in-depth analysis on plant proteome databases, how to access them, and, importantly, the biological, research, and application contexts in which each database is significant, their comparative strengths, and limitations. We aimed in this analysis to reach out to young scholars embarking on plant biology and proteomic research as well as to those already established in the field so as to provide integrated critical analyses of plant proteome databases and bioinformatics tools in this nascent field of systems sciences. In conclusion, plant proteome research is an emerging and exciting frontier of integrative biology scholarship and innovation. Our future efforts must also be invested in integrating the available databases to allow for multiomics data analysis, research, and development.

Introduction

The advent of next-generation sequencing platforms has led to generation of massive amounts of biological big data. In addition, high-throughput mass spectrometric techniques have further amplified the amount of available proteomic data. The article “A draft map of the human proteome” has recently reported the identification of proteins encoded by 17,294 genes (Kim et al., 2014).

Plant proteome research is an exciting and essential frontier of integrative biology. The present expert review offers a deeper understanding of the plant proteome databases, how to access them, and, importantly, the biological, research, and application contexts in which each database is significant, their comparative strengths, and limitations (Fig. 1).

FIG. 1.

Schematic representation of the plant proteome databases and bioinformatics tools useful for plant proteome analysis.

We aimed to reach out to young scholars embarking on new careers in plant systems biology and proteomics, as well as those already established in the field, so as to provide integrated critical analyses of plant proteome databases and bioinformatics tools in this nascent field of systems sciences.

Dawn of Proteomics Big Data

Proteomics big data are now emerging in plant research as well. Proteome analysis in plant systems has been carried out using various tissue samples such as leaves, roots, flowers, and fruits, among others (Feng et al., 2017; Hopff et al., 2013; Jia et al., 2017; Szymanski et al., 2017). Initially, many studies were carried out using two-dimensional electrophoresis (2-DE) gel-based techniques. For example, SWISS-2DPAGE (http://world-2dpage.expasy.org/swiss-2dpage/) houses data related to 2-DE gels, including those of Arabidopsis thaliana, wherein one can locate a protein from UniProtKB/Swiss-Prot on the gel. However, the last update for the database was made in 2011.

Advancements in chromatographic techniques, together with the improved resolving power of mass spectrometers, are now paving the way for gel-free techniques of plant proteome analysis (Takac et al., 2017; Tan et al., 2017). Subcellular proteomics studies have been carried out to gain a comprehensive understanding of compartment-specific data (Albenne et al., 2013; Lee et al., 2013; Narula et al., 2013). Such data sets can be queried for experimental evidence about protein subcellular location (Hooper et al., 2017).
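Gel-free workflows usually identify proteins from peptides released by protease digestion, most commonly with trypsin. The following minimal Python sketch illustrates in silico tryptic digestion under the widely used rule of cleaving after K or R unless the next residue is P. The function names, the minimum peptide length, and the toy sequences are assumptions made for illustration; "proteotypic" here only means that a peptide is unique within the supplied set of sequences, which is a simplification of what dedicated prediction tools evaluate.

```python
import re

def tryptic_digest(protein, missed_cleavages=0, min_length=6):
    """Return tryptic peptides of `protein` (cleave after K/R, not before P)."""
    # Zero-width split: position preceded by K or R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` additional neighbouring fragments.
        last = min(start + missed_cleavages, len(fragments) - 1)
        for end in range(start, last + 1):
            peptide = "".join(fragments[start:end + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides

def unique_peptides(proteome, **digest_options):
    """Map each peptide found in exactly one protein to that protein's name."""
    owners = {}
    for name, sequence in proteome.items():
        for peptide in set(tryptic_digest(sequence, **digest_options)):
            owners.setdefault(peptide, set()).add(name)
    return {pep: next(iter(names)) for pep, names in owners.items() if len(names) == 1}

if __name__ == "__main__":
    # Toy sequences, not real plant proteins.
    toy = {"P1": "AAAAAAKCCCCCCK", "P2": "AAAAAAKDDDDDDK"}
    print(unique_peptides(toy))
```

Allowing one or two missed cleavages (`missed_cleavages=1` or `2`) is a common choice when building search spaces, at the cost of a larger peptide list.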

In addition to proteome analysis, high-resolution mass spectrometers allow the analysis of protein post-translational modifications. In this context, data have been recently accumulating, for example, on phosphorylation, acetylation, and succinylation (Hartl et al., 2017; Kumar et al., 2017; Zhen et al., 2016). Information about sites of modifications is documented in databases such as The Arabidopsis Protein Phosphorylation Site Database (PhosPhAt), PHOsphorylation SIte DAtabase (PHOSIDA), and Plant Protein Phosphorylation DataBase (P3DB) for phosphorylation, and the eukaryotic Writers, Erasers and Readers protein of Histone Acetylation and Methylation system Database (WERAM) for histone acetylation and methylation (Durek et al., 2010; Gao et al., 2009; Gnad et al., 2011; Zulawski et al., 2013).
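Post-translational modifications are detected in mass spectrometry as characteristic mass shifts on the modified peptide. As a rough illustration, the Python sketch below computes the monoisotopic mass and m/z of a peptide with and without the three modifications named above, using standard monoisotopic residue masses and mass shifts (phosphorylation +79.96633 Da, acetylation +42.01057 Da, succinylation +100.01604 Da). The function names and the example peptide are assumptions made for this sketch; site localization is not modeled.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added for the free N- and C-termini
PROTON = 1.00728   # mass of a proton, used for m/z

# Monoisotopic mass shifts (Da) of the modifications discussed in the text.
PTM_SHIFT = {"phospho": 79.96633, "acetyl": 42.01057, "succinyl": 100.01604}

def peptide_mass(sequence, ptms=()):
    """Neutral monoisotopic mass of a peptide carrying the listed PTMs."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return mass + sum(PTM_SHIFT[name] for name in ptms)

def mz(mass, charge):
    """Mass-to-charge ratio of a peptide ion with `charge` added protons."""
    return (mass + charge * PROTON) / charge

if __name__ == "__main__":
    peptide = "SAMPLER"  # illustrative sequence only
    for ptms in [(), ("phospho",), ("acetyl",), ("succinyl",)]:
        m = peptide_mass(peptide, ptms)
        print(ptms or "unmodified", round(m, 4), round(mz(m, 2), 4))
```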

Apart from these dedicated databases, dbPTM provides access to available data on various post-translational modifications, and OMICtools allows users to navigate various resources and prediction tools for analysis of post-translational modifications (Henry et al., 2014; Huang et al., 2016). High-throughput proteomic studies generate large volumes of data, on the order of 15–20 GB per day.

The ProteomeXchange consortium was created to facilitate storage and dissemination of mass spectrometry-derived data. Data can be submitted to the public data repository Proteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/archive/). It serves as a “go to” resource for researchers to archive available data for (re)analysis.

The selection of starting materials for proteomics studies is the most crucial step in conducting an experiment. Databases such as the phenotype database of Arabidopsis mutant traits, AraPheno, and RARGE enable selection of plants based upon the reported phenotypes (Akiyama et al., 2014; Lloyd and Meinke, 2012; Seren et al., 2017).

The Human Protein Atlas and the Human Protein Reference Database (HPRD) house human protein information including, but not limited to, their tissue and subcellular localization, expression and association with diseases, post-translational modification, and protein–protein interactions (Keshava Prasad et al., 2009; Uhlen et al., 2015; Thul et al., 2017). The wealth of information in Arabidopsis is available in The Arabidopsis Information Resource (TAIR).

Protein Sequence Databases

Sequence information for most of the sequenced plant genomes can be retrieved from databases such as NCBI Viridiplantae, GenBank, DDBJ, and UniProt that host related protein sequence information in addition to nucleotide sequence information. Sequence retrieval can also be performed from databases dedicated to plants such as Phytozome, PlabiPD, and Gramene database, which are subsets of the EnsemblPlants database and PlantGDB.
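As a hedged illustration of programmatic sequence retrieval, the Python sketch below builds a query against the UniProt REST search endpoint for reviewed (Swiss-Prot) entries of a given organism and parses the returned FASTA text. The helper names and the small `size` default are choices made for this sketch; NCBI taxonomy ID 3702 corresponds to A. thaliana. The network call is isolated in `fetch_proteins` so that the URL builder and parser can be used offline.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"

def build_uniprot_url(taxon_id, reviewed=True, size=5):
    """Build a UniProtKB search URL returning FASTA for one organism."""
    query = f"organism_id:{taxon_id} AND reviewed:{str(reviewed).lower()}"
    params = {"query": query, "format": "fasta", "size": size}
    return UNIPROT_SEARCH + "?" + urlencode(params)

def parse_fasta(text):
    """Parse FASTA text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}

def fetch_proteins(taxon_id, size=5):
    """Download and parse a small batch of sequences (requires network access)."""
    with urlopen(build_uniprot_url(taxon_id, size=size)) as response:
        return parse_fasta(response.read().decode("utf-8"))

if __name__ == "__main__":
    print(build_uniprot_url(3702))  # 3702 = Arabidopsis thaliana
```

Calling `fetch_proteins(3702)` would return a handful of reviewed Arabidopsis entries; for whole-proteome downloads, the bulk download options offered by the databases themselves are generally more appropriate than paged searches.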

Dedicated databases are also available for individual plant species, such as TAIR for A. thaliana, TOMATOMICS for Solanum lycopersicum, maizeGDB for Zea mays, the Rice Genome Annotation Project for Oryza sativa ssp. japonica cv. Nipponbare, the legume-specific protein database LegProt, and the wheat proteome database (Duncan et al., 2017; Harper et al., 2016; Kawahara et al., 2013; Kudo et al., 2017; Lei et al., 2011; Reiser et al., 2017). The list of databases available for plant proteome analysis is given in Table 1.

Table 1.

Databases Essential for Plant Proteome Analysis

Database	Context in which the database is important for plant and integrative biology	Remarks on comparative strengths and limitations	Access by
NCBI: Under the International Nucleotide Sequence Database Collaboration (GenBank, DDBJ, European Nucleotide Archive)	• Protein databases can be downloaded and used for searching the mass spectrometry data • Protein sequence similarity searches can be performed using BLAST search tools against the nonredundant or individual plant database • Allows alignment of multiple sequences	• Includes protein sequences from GenPept, RefSeq, Swiss-Prot, PIR, PRF, and PDB • Includes predicted proteome from GenBank • Includes nonredundant and well-annotated protein sequences from RefSeq • Taxonomically diverse • Updated weekly • Predicted annotations are often not experimentally validated • No information related to post-translational modifications	https://www.ncbi.nlm.nih.gov
UniProtKB	• Protein databases can be downloaded and used for searching the mass spectrometry data • Allows conversion of IDs of one database to another • Allows BLAST searches • Allows alignment of multiple sequences • Retrieving protein functional annotations	• Includes protein sequences from EMBL-Bank/GenBank/DDBJ databases • Comprehensive resource for protein sequences • UniProtKB/Swiss-Prot (reviewed) is nonredundant and well annotated • Sequences can be fetched programmatically • UniProtKB/TrEMBL (unreviewed) contains redundant sequences • Functional annotations of proteins are not well characterized • Next-generation sequencing/MS/MS resource not available • No portal for comparing spectral information	www.uniprot.org/
Phytozome	• Protein databases can be downloaded and used for searching the mass spectrometry data • PhytoMine allows sequence retrieval and analysis • Allows BLAST searches • Fetch GO terms • Fetch orthologous proteins	• Repository of sequenced plant genomes • Scores expression of plant genes • BLAST searches against other organisms cannot be performed	https://phytozome.jgi.doe.gov/pz/portal.html
PlabiPD	• Sequence download of few plant genomes • Allows protein functional annotation using Mercator • Allows Mapman ontology analysis	• Scores expression of plant genes • Provide cladogram of flowering and nonflowering plants • Sequence information of limited number of plants	www.plabipd.de/portal/sequenced-plant-genomes
Gramene database	• Sequence download of large number of plant species • Pathway analysis • Allows BLAST searches • Query gene expression in various plants	• Well annotated • Linked to various external databases	http://ensembl.gramene.org/species.html
EnsemblPlants	• Subset of EnsemblGenomes • Sequence download of large number of plant species • BLAST analysis • Multiple sequence alignment	• Documents noncoding RNAs • Linked to various external databases	http://plants.ensembl.org/index.html
PlantGDB	• Sequence download of large number of plant species • BLAST analysis • Multiple sequence alignment	• Linked to various other external databases • Not updated very frequently	www.plantgdb.org/
TAIR database	• Sequence download of Arabidopsis thaliana • BLAST analysis • Sequence alignment • Fetch GO terms • Fetch publications for genes of interest • Archive functional annotations	• Well annotated • Linked to other databases and tools • Information related to Arabidopsis mutants • Requires subscription for unlimited access	https://www.arabidopsis.org/
maizeGDB	• Sequence download of Zea mays • BLAST analysis • Fetch GO terms • Fetch publications using search option • Pathway analysis	• Information related to maize mutants • Catalog of newly identified genes	http://archive.maizegdb.org/zmdb.php
TOMATOMICS	• Sequence download of Solanum lycopersicum • BLAST analysis • Information regarding mutants and tomato full-length cDNA clones • Fetch GO terms, functional annotations	• Sequence information for various cultivars	http://bioinf.mind.meiji.ac.jp/tomatomics/
Rice genome annotation project database	• Sequence download of Oryza sativa Nipponbare • BLAST analysis • Fetch GO terms, functional annotations • Analysis of conserved domains • Coexpression analysis	• Extensive resource for O. sativa Nipponbare • Sequence information for various cultivars not available • Not linked to other resources	http://rice.plantbiology.msu.edu/
SoyBase	• Sequence download of Glycine max • BLAST analysis • Information related to mutants • Coexpression analysis • Fetch GO terms, functional annotations	• Extensive resource for soy bean	https://soybase.org/
LegProt: legume-specific protein database	• Download translated sequence of A. thaliana, G. max, Lotus japonicus, Medicago sativa, Medicago truncatula, Lupinus albus, Phaseolus vulgaris, and Pisum sativum	• Translated sequences of gene models, cDNA • Limited number of sequences	http://plantgrn.noble.org/LegumeIP/
Wheat proteome database	• Query information related to wheat proteins and transcripts • Fetch orthologous proteins from Arabidopsis • Sequence download • View peptides observed in mass spectrometry experiments • Pathway analysis	• User friendly, easy to navigate • Data arranged according to protein expression in 24 different tissues • Proteins belonging to selected pathways can be viewed • MRM detection has been mentioned	www.wheatproteome.org
On plant kinases
PlantsP	• Query information related to Arabidopsis kinases • Information related to T-DNA mutants for Arabidopsis kinases	• Not updated recently	http://plantsp.genomics.purdue.edu/
EKPD: Eukaryotic protein kinases and protein phosphatases database	• Sequence download for kinases and phosphatases from various organisms • Query classification of kinases • Prediction of protein PTMs	• User friendly • Kinases and phosphatases are well classified • Linked to several other external databases related to protein phosphorylation • Covers kinase and phosphatase sequence information for large number of organisms	http://ekpd.biocuckoo.org/
On plant transcription factors
AtTFDB: Arabidopsis transcription factor database	• Download plant transcription factor sequences • Download predicted protein–protein interaction • Sequence alignment	• Linked to external databases • Transcription factors are well classified	http://agris-knowledgebase.org/AtTFDB/
ATRM: Arabidopsis transcriptional regulatory map	• Download Arabidopsis transcription factor sequences • View transcription factor networks	• Linked to PlantTFDB	http://atrm.cbi.pku.edu.cn/
PlantTFDB	• Download transcription factor sequences of several plant species • Functional enrichment analysis	• Transcription factors are well classified	http://planttfdb.cbi.pku.edu.cn/index.php
RARTF: RIKEN arabidopsis transcription factor database	• Fetch transcription factor families of Arabidopsis • Download cDNA sequences • Information about mutants • Multiple sequence alignment • Generate phylogenic trees, functional motifs • BLAST analysis • Retrieve expression data	• Transcription factors are well classified into families • Limited to Arabidopsis only	http://rarge.psc.riken.jp/rartf/
STIFDB: Stress responsive transcription factor database	• View list of rice and Arabidopsis transcription factors differentially regulated in response to biotic and abiotic stresses • Fetch related functional annotations • View enrichment profiles	• Fetch orthologous proteins • Transcription factors can be queried individually or according to stress • Multiple stresses are included • Not updated recently	http://caps.ncbs.res.in/stifdb2/index.html
LegumeTFDB			http://legumetfdb.psc.riken.jp/
CicerTransDB: Cicer transcription factor database	• Search, browse, and download transcription factors of Cicer arietinum • Retrieve related functional annotations	• Linked to external databases	www.cicertransdb.esy.es/
PpTFDB	• Browse and download transcription factors of Cajanus cajan • Fetch functional annotations • Fetch GO terms	• Categorized according to families • Protein phylogenetic analysis • Linked to other related databases	http://14.139.229.199/PpTFDB/Home.aspx
PvTFDB	• Browse transcription factors of P. vulgaris • Fetch functional annotations • Fetch GO terms • Tissue-specific expression analysis	• Categorized according to families • Protein phylogenetic analysis	www.multiomics.in/PvTFDB/
wDBTF: Database of wheat transcription factor	• Browse transcription factors of Triticum aestivum		http://wwwappli.nantes.inra.fr:8180/wDBFT/
FmTFDb: foxtail millet transcription factors database	• Browse transcription factors of Setaria italica • Fetch functional annotations • Fetch GO terms	• Categorized according to families • Protein phylogenetic analysis	http://59.163.192.91/FmTFDb/index.html
TreeTFDB: Database of the transcription factors from six economically important tree crops	• Browse transcription factors of six economically important trees		http://treetfdb.bmep.riken.jp/index.pl

BLAST, basic local alignment search tool; GO, gene ontology; MRM, multiple reaction monitoring; MS/MS, mass spectrometry/mass spectrometry (tandem mass spectrometry); PTMs, post-translational modifications; TAIR, The Arabidopsis Information Resource.

Plant Kinase Databases

The model plant Arabidopsis has twice as many protein kinases as the human complement (Zulawski et al., 2014). With a total of 1052 kinases and 162 phosphatases (Wang et al., 2014a), the numbers are still growing, making it important to characterize the roles of these signaling components in a context-dependent manner.

After the first publication related to the classification of Arabidopsis kinases (Shiu and Bleecker, 2001), more kinases were subsequently added to the list (Wang et al., 2014a; Zulawski et al., 2014). Sequence information of Arabidopsis kinases can be retrieved from sources such as the database PlantsP (Tchieu et al., 2003). The database contains information related to 979 unique Arabidopsis kinases and 125 phosphatases classified into various groups. A few kinases from other plant species are also included. The database has, however, not been populated with the newly identified kinases. The complement of Arabidopsis kinases is also available in P3DB (1186 entries), although it is not downloadable as a single list.
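Because the kinase counts differ between resources, a practical first step when combining them is to normalize identifiers and compare the resulting sets. The Python sketch below assumes that each resource can be exported as a list of Arabidopsis AGI locus identifiers (for example, AT1G01010, optionally with a gene model suffix such as .1); the function names are hypothetical and the identifiers in the example are placeholders, not actual kinase assignments from PlantsP, EKPD, or P3DB.

```python
import re

# AGI locus codes: "AT", chromosome (1-5, C or M), "G", five digits.
AGI_PATTERN = re.compile(r"^AT[1-5CM]G\d{5}$")

def normalize_agi(identifier):
    """Upper-case an AGI code, drop any gene model suffix, reject malformed IDs."""
    locus = identifier.strip().upper().split(".")[0]
    return locus if AGI_PATTERN.match(locus) else None

def compare_sources(sources):
    """Return (union, shared, unique-per-source) sets of normalized locus IDs."""
    cleaned = {
        name: {locus for locus in map(normalize_agi, ids) if locus}
        for name, ids in sources.items()
    }
    union = set().union(*cleaned.values())
    shared = set.intersection(*cleaned.values()) if cleaned else set()
    unique = {
        name: ids - set().union(*(other for key, other in cleaned.items() if key != name))
        for name, ids in cleaned.items()
    }
    return union, shared, unique

if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    example = {
        "PlantsP": ["At1g01010.1", "AT2G02020", "bad_id"],
        "EKPD": ["AT2G02020", "AT3G03030"],
        "P3DB": ["at2g02020.2", "AT4G04040"],
    }
    union, shared, unique = compare_sources(example)
    print(len(union), sorted(shared), unique)
```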

Perhaps the most comprehensive list of Arabidopsis kinases is currently available at EKPD, the Eukaryotic protein Kinases and protein Phosphatases Database (Wang et al., 2014a). This database was updated in 2013 and currently contains protein kinases and protein phosphatases from 84 eukaryotic species. They are further divided into groups and classes based upon their catalytic domains. The information is publicly accessible and can be downloaded.

Information from 22 plant species (A. thaliana, Arabidopsis lyrata, Brassica rapa, Glycine max, Populus trichocarpa, S. lycopersicum, Solanum tuberosum, Vitis vinifera, Brachypodium distachyon, Hordeum vulgare, Musa acuminata, Oryza glaberrima, Oryza indica, O. sativa, Oryza brachyantha, Setaria italica, Sorghum bicolor, Z. mays, Selaginella moellendorffii, Physcomitrella patens, Chlamydomonas reinhardtii, and Cyanidioschyzon merolae) has been included. For Arabidopsis, 1052 kinases and 162 phosphatases are categorized into groups. The protein and/or domain sequences can be downloaded from the advanced search option, wherein the sequences are arranged according to the organisms. The databases containing information related to plant kinases are listed in Table 1.

Plant Transcription Factor Databases

Some of the available plant transcription factor databases have been reviewed and compared earlier (Mitsuda and Ohme-Takagi, 2009). The database of Arabidopsis transcription factors (DATF) (Guo et al., 2005) is linked to ATRM: Arabidopsis Transcriptional Regulatory Map, which is a subset of the PlantTFDB (Jin et al., 2014). The latter contains information about transcription factors of >80 plant species.

The Arabidopsis transcription factor database (AtTFDB) houses a large collection of Arabidopsis transcription factors that can be fetched using either the locus ID or gene names (Yilmaz et al., 2011). The Arabidopsis transcription factors can also be browsed according to their gene families. As the database was created using TAIR9, revisions in the TAIR10 database (if any) need to be incorporated. Another database of Arabidopsis transcription factors is the RIKEN Arabidopsis Transcription Factor database (RARTF) (Iida et al., 2005). A comprehensive collection of biotic and abiotic stress responsive putative transcription factors from Arabidopsis and rice (O. sativa subsp japonica and O. sativa subsp indica) is available at Stress Responsive Transcription Factor Database (STIFDB V2.0) (Shameer et al., 2009).

The Arabidopsis transcription factors are linked to the source references; however, they have not been updated since 2012. The LegumeTFDB houses information on G. max, Lotus japonicus, and Medicago truncatula transcription factors (Mochida et al., 2010). The Cicer Transcription Factor Database (CicerTransDB) has recently been developed for the legume crop Cicer arietinum L. (Gayali et al., 2016). Other legume-specific TF databases include pigeonpea PpTFDB and Phaseolus vulgaris PvTFDB (Bhawna et al., 2016; Singh et al., 2017).

Database resources for other plants include wDBTF for wheat, FmTFDb for foxtail millet, and TreeTFDB, a database of the transcription factors from six economically important trees (Bonthala et al., 2014; Mochida et al., 2013; Romeuf et al., 2010). Although information related to plant transcription factors is available, variations in the absolute numbers and classification systems are evidence of the need for more sophisticated curation. The databases containing information related to plant transcription factors are listed in Table 1.

Plant Organellar Proteomics Databases

The SUBA (SUBcellular location of proteins in Arabidopsis) database serves to coalesce information on protein localization from large-scale organellar proteomics and green fluorescent protein (GFP) localization studies conducted in Arabidopsis (Hooper et al., 2017). The database includes data sets based on various protein localization prediction tools and data retrieved from Swiss-Prot annotation.

The current version, SUBA4, whose bibliographic references were last updated in June 2016, houses nearly 60,000 experimental protein location claims. The SUBAcon feature determines the consensus location of query proteins from experimental and in silico predictions.
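To make the idea of a consensus location concrete, the Python sketch below combines experimental and predicted location calls for one protein by a simple weighted vote. This is not the SUBAcon algorithm; the evidence categories, the weights (experimental calls counted twice as heavily as predictions), and the alphabetical tie-breaking are all assumptions chosen only to illustrate how heterogeneous evidence might be reconciled.

```python
from collections import defaultdict

# Assumed weights: experimental evidence (MS, GFP) outweighs in silico predictions.
WEIGHTS = {"ms": 2.0, "gfp": 2.0, "predicted": 1.0}

def consensus_location(calls):
    """Return the best-supported location from (evidence_type, location) pairs.

    Ties are broken alphabetically so that the result is deterministic.
    Returns None when no calls are supplied.
    """
    scores = defaultdict(float)
    for evidence_type, location in calls:
        scores[location] += WEIGHTS[evidence_type]
    if not scores:
        return None
    return max(sorted(scores), key=lambda location: scores[location])

if __name__ == "__main__":
    # Hypothetical evidence for a single protein.
    calls = [
        ("ms", "plastid"),
        ("predicted", "mitochondrion"),
        ("predicted", "mitochondrion"),
        ("predicted", "plastid"),
    ]
    print(consensus_location(calls))
```

In practice, weights of this kind would need to be calibrated against proteins of known location rather than fixed by hand.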

The Plant Proteome Database (PPDB) houses data from organellar proteome studies (Sun et al., 2009). Users can access lists of proteins by choosing a function, biochemical pathway, subcellular location, or a post-translational modification. Eukaryotic Subcellular Localization DataBase (eSLDB) contains the protein subcellular localization information of eukaryotic organisms (Pierleoni et al., 2007). A nonredundant list of 30,600 Arabidopsis proteins is available along with their localization information, if any. However, most of the data available are based upon in silico analysis and the data have not been updated.

ARAMEMNON catalogs the Arabidopsis membrane proteins. The current release includes proteins from nine plant species (A. thaliana, V. vinifera, P. trichocarpa, S. lycopersicum, Cucumis melo, O. sativa, Z. mays, B. distachyon, and M. acuminata) (Schwacke and Flugge, 2018). Features associated with membrane proteins such as probable lipid modifications [glycosylphosphatidylinositol (GPI)-attachment, prenylation, myristoylation, and palmitoylation], details of transmembrane spanning regions (alpha helices and beta barrel positions), and their subcellular location can be displayed based on the outputs of various prediction tools. A dedicated section is reserved for membrane transporters, referred to as the plant “permeome.” Although well curated, the database does not include data sets from large-scale mass spectrometric analysis that would aid in enriching the number of representations.

The Arabidopsis Nucleolar Protein database (AtNoPDB) consists of nucleolar proteins identified through proteomics analysis (Brown et al., 2005). The data set has been compared with that of the human complement (Andersen et al., 2005). Although well structured, the incorporation of currently available data from Arabidopsis and other plant species would make it more useful (Gonzalez-Camacho and Medina, 2004; Palm et al., 2016).

The AT_CHLORO database was constructed to index the Arabidopsis chloroplast proteins identified through mass spectrometry. The knowledge base is enriched by the incorporation of data from three highly enriched subplastidial regions (envelope, stroma, and thylakoids) and two subthylakoidal regions (grana and stroma-lamellae) (Bruley et al., 2012; Ferro et al., 2010; Tomizioli et al., 2014). The AT_CHLORO database is the first accurate mass and time database dedicated to plants. The database is also linked to other databases that allow users to confirm the annotation of query protein(s).

The AT_CHLORO database was last updated in 2015. The mitochondrial proteome of Arabidopsis is available at two different sites, both referred to as the Arabidopsis Mitochondrial Protein Database. In the former, orthologous proteins are compared across a diverse range of organisms to demonstrate the divergence of the plant mitochondrial proteome. The latter contains information gathered from small-scale proteome studies such as 2-DE analysis and sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE). The WallProtDB has been created as a resource for plant cell wall proteomics (San Clemente and Jamet, 2015). The list of databases containing information related to protein subcellular localization is given in Table 3.

Table 3.

Databases, Tools, and Resources for Functional Proteomics Analysis

Database	Context in which the database is important for plant and integrative biology	Remarks on comparative strengths and limitations	Access by
On plant organellar proteomics
SUBA: SUBcellular location of proteins in Arabidopsis	• Retrieve subcellular localization information for Arabidopsis proteins • Functional enrichment analysis • Coexpression analysis	• Linked to CropPAL database for querying protein subcellular localization in crop plants • Linked to other protein databases • Includes mass spectrometry-based evidence, if any	http://suba3.plantenergy.uwa.edu.au/
PPDB: The Plant Proteome Database	• Retrieve proteomics studies conducted in Arabidopsis, rice, and maize • Retrieve list of subcellular, comparative and PTMs-based proteomics studies • BLAST analysis • Fetch functional annotations for proteins of interest	• Multiple options for querying • Data available for only three plants • Not frequently updated	http://ppdb.tc.cornell.edu/
eSLDB: Eukaryotic Subcellular Localization DataBase	• Fetch proteins based upon experimental or predicted functional annotation • View and download proteins based upon subcellular localization	• Entries are linked to UniProt • Limited to Arabidopsis only (among plants) • Not updated • Data limited to small set of proteins	http://gpcr.biocomp.unibo.it/esldb/database.htm
ARAMEMNON	• Fetch functional annotations for proteins of interest • Fetch GO terms • Fetch protein sequence • View transmembrane domain regions if any • Predict presence of lipid modifications • Multiple sequence alignment	• Updated regularly • Linked to external databases such as TAIR, NCBI, Araport, and UniProt and multiple subcellular localization databases	http://aramemnon.uni-koeln.de/proj_view.ep?id=start
AtNoPDB: Arabidopsis nucleolar protein database	• Fetch Arabidopsis nucleolar proteins • Fetch human ortholog	• Currently accessible	http://bioinf.scri.sari.ac.uk/cgi-bin/atnopdb/home
AT_Chloro	• Fetch Arabidopsis chloroplast proteins • Fetch related functional annotation and view MapMan Bins • View peptides observed in mass spectrometry and their peptides	• Linked to related external databases • Proteins segregated into envelope, thylakoid, grana/stroma categories	http://at-chloro.prabi.fr/at_chloro/
Arabidopsis mitochondrial protein database	• Fetch information regarding Arabidopsis mitochondrial proteins		www.plantenergy.uwa.edu.au/applications/ampdb/index.html
WallProtDB	• Retrieve literature on plant cell wall proteomics • BLAST search to retrieve putative cell wall proteins • Search cell wall proteins identified through mass spectrometry	• Information about experimental condition and plant ecotypes • Linked to ProtAnnDB that houses information such as predicted subcellular localization, FASTA sequence, and predicted sites of PTMs	www.polebio.lrsv.ups-tlse.fr/WallProtDB/
On protein structure and domain databases
CYBIONIX	• An online portal, which houses information regarding multiple databases, sequence alignment, and other bioinformatics tools	• This portal links to a large number of bioinformatics tools and databases under one roof • There is no separate section for plant databases	http://cybionix.com/bioinformatics/databases/
Bioinformatics software and tools	• An online portal, which houses information regarding multiple databases, sequence alignment, and other analysis tools	• This portal links to a large number of bioinformatics tools and databases under one roof • This portal has a subsection called Plant Bioinfo. DB, which has links to 47 different plant databases	http://bioinformaticssoftwareandtools.co.in/index.php
Protein data bank	• Database contains information of 3D molecular structures of proteins, DNA, and RNA	• It has 3D molecular structure information of several plant organisms • The 3D structures with low resolutions are not appropriate for the studies • Only 2102 plant protein 3D structures are available	www.ebi.ac.uk/pdbe/node/1
On protein–protein interaction databases
IntAct molecular interaction database	• Retrieve experimentally validated interacting partners for multiple organisms • Retrieve related literature on interaction	• Linked to external databases including NCBI, UniProt, and protein–protein interaction databases	www.ebi.ac.uk/intact/
PPIM: Protein–protein interaction database for maize	• Retrieve predicted and experimentally validated interacting partners for maize • Pathway analysis • Fetch GO terms and protein domain information		http://comp-sysbio.org/ppim/
Computational system biology			http://comp-sysbio.org/index.html
AtPID: Arabidopsis thaliana protein interactome database	• Retrieve predicted and experimentally validated interacting partners for Arabidopsis proteins	• Linked to NCBI and UniProt • Related GO terms and literature can be fetched	www.megabionet.org/atpid/webfile/
Plant interactome project	• Retrieve interacting partners for Arabidopsis proteins	• Needs further development for robustness	http://signal.salk.edu/interactome.html
STRING database	• Retrieve proteins interacting with protein of interest• Retrieve coexpression data for interacting partners	• Linked to NCBI, Uniprot, PDB, and Kyoto Encyclopedia of Genes and Genomes and other pathway and protein–protein interaction databases• Hosts predicted and experimentally validated interactions	http://string-db.org/cgi/input.pl?UserId=agF0sT02FyDK&sessionId=6BMDmyna8PsZ&input_page_active_form=organisms
BioGRID	• Retrieve proteins interacting with protein of interest • Retrieve related literature	• Hosts experimentally validated interactions • Linked to external databases such as TAIR, Entrez, UniProt, and RefSeq • Enlists high-throughput and low-throughput studies	https://thebiogrid.org/
AIM: Arabidopsis interactome modules database	• Retrieve integrated interactome data sets	• Integrates predicted and experimentally validated interactions from multiple resources • Interactive viewer	http://probes.pw.usda.gov/AIM/
GeneMANIA	• Retrieve proteins interacting with protein of interest • Retrieve associated literature, GO terms	• Hosts predicted and experimentally validated interactions • Network displayed is structured to differentiate between experimentally validated and predicted interactions, coexpression data, and shared domains	http://genemania.org/
Other resources
OMICtools	• Provides extensive lists of tools for various kinds of analysis such as prediction of protein PTMs, protein structure analysis, targeted proteomics analysis, etc.	• Well-categorized lists of tools	https://omictools.com/
ePlant	• View functional annotation of Arabidopsis genes/proteins• Mutant phenotype information• View tissue and organellar-specific expression	• Linked to various external databases• Multiple data download is relatively slow	http://bar.utoronto.ca/eplant
Araport: The Arabidopsis Information Portal, ThaleMine	• Sequence download for Arabidopsis• Retrieve functional annotations• Fetch GO terms• View publications for genes of interest• Retrieve gene expression data• Fetch interacting partners• Fetch orthologous proteins from Phytozome• View and order mutants	• Extensive for Arabidopsis• Frequently updated• Open source• Linked to external databases	www.araport.org/
CORNET (CORrelation NETworks)	• Fetch gene coexpression data• Fetch experimental and predicted interacting partners• Fetch related publications	• Data from multiple resources are available	http://bioinformatics.psb.ugent.be/cornet/
Arabidopsis Proteotypic Predictor	• Predicts A. thaliana tryptic peptides, missed cleavage sites, molecular weight and possible methionine oxidation	• The tool can calculate the peptide redundancy and predict whether it is a proteotypic peptide or not• The tool is limited to A. thaliana proteome	www.plantenergy.uwa.edu.au/APP/
MRMaid	• Predicts peptides and their possible product ions for A. thaliana proteins	• Predicts peptide m/z, product m/z, product ion relative intensity, charge (z) and retention time based on available chromatography conditions and instrument option• Limited to A. thaliana and human proteins	http://138.250.31.29/mrmaid/
Cytoscape	• Analysis of protein interaction networks	• Visualize and study the properties of networks• It allows multiple modules for different kinds of studies	www.cytoscape.org/
Other databases
Human protein reference database	• BLAST search can be performed for plant proteins against human proteins• Retrieve information about human orthologs	• Consists of 30,047 proteins, 41,327 protein–protein interactions, 93,710 PTMs, 112,158 protein expression entries, 22,490 subcellular localizations, and 470 domains for human proteins• This website also includes links to:◂NetPath: for pathway analysis◂Human Proteinpedia: a community portal for sharing and integration of protein data◂PhosphoMotif Finder: gives information on kinase/phosphatase motifs curated from literature• Human proteins can be searched using accession numbers of OMIM, Swiss-Prot, Entrez Gene, HGNC, GenProt, PDB, NetPath• Information on protein post-translational modifications is not updated	www.hprd.org/
The human protein atlas	• Retrieve information about human protein orthologs	• The human proteome is divided into three major parts: tissue atlas, cell atlas, and pathology atlas• Human proteome information such as cell graphic, RNA isoform data, RNA gene data, subcellular localization data, pathology data, and normal tissue data can be downloaded• No information available on PTMs and protein–protein interactions	www.proteinatlas.org/

3D, three-dimensional.

Plant Phosphoproteomics Databases

The Arabidopsis Protein Phosphorylation Site Database (PhosPhAt 4.0) currently houses one of the largest collections of protein phosphorylation sites identified through mass spectrometry (Durek et al., 2010). The database is presented as a web application to enable data retrieval. The availability of annotated phosphopeptide spectra makes it useful for designing targeted proteomics analyses and for data verification. The database also gives users access to curated information on kinase–target interactions for several kinases and phosphatases (Zulawski et al., 2013). Data from mass spectrometric studies as well as from kinase–substrate interaction studies can also be submitted to the database.

P3DB, which was initiated with protein phosphorylation data from oilseed rape, was later expanded to integrate data from other plant species (Gao et al., 2009). Currently, P3DB 3.0 hosts phosphosites belonging to 16,477 phosphoproteins. A list of 1186 plant kinases and 159 phosphatases is available in the database. A Basic Local Alignment Search Tool (BLAST) utility is also available to retrieve orthologous phosphoproteins or phosphopeptide sequences. A link to the Musite website is provided to predict the sites of phosphorylation in a query sequence. Other extended data include protein domain, protein–protein interaction, and ontology information, along with kinase (or phosphatase)–substrate information and Kinase Client Assay (KiC Assay) data, if available (Yao and Xu, 2017; Yao et al., 2014). Users can also choose to submit their data into the respective experimental categories.

The database of Phospho sites in PlanTs (dbPPT) serves as an integrated resource for protein phosphorylation events reported in plants. It contains both manually curated information and data from other public databases, including PhosPhAt (Durek et al., 2010) and P3DB (Gao et al., 2009). The database was launched in 2014 (Cheng et al., 2014), and information on several thousand phosphorylation sites from 20 plant species is currently available (dbPPT 1.0), making it the most comprehensive collection of plant phosphorylation sites to date.

Available data can be retrieved using various search options such as the organism, function, gene name, protein name, taxa ID, UniProt accession number, reference source database, and sequence source database. The output provides the phosphopeptide sequence along with the site of phosphorylation, the database source, and the associated reference. The protein sequence, together with sequence annotations and domain information, is embedded in the output. The search speed is, however, rather slow. It would be more useful if the data were available in a tabulated downloadable format. The Medicago PhosphoProtein Database contains mass spectrometry-derived phosphoproteome data of M. truncatula, a major model legume (Grimsrud et al., 2010). The databases containing information on the plant phosphoproteome are listed in Table 2.

Table 2.

Databases and Tools Essential for Post-translational Modification Analysis

Database	Context in which database is important for plant and integrative biology	Remarks on comparative strengths and limitations	Access by
On plant phosphoproteomics
PhosPhAt: Arabidopsis protein phosphorylation site database	• Download Arabidopsis phosphopeptides with experimental evidence• View spectral information• View site of phosphorylation• Fetch kinase targets• Prediction of protein phosphorylation	• Limited to Arabidopsis only	http://phosphat.uni-hohenheim.de/index.html
P3DB: Plant protein phosphorylation dataBase	• Fetch information related to kinases from A. thaliana, Vitis vinifera, Brassica napus, Glycine max, Medicago truncatula, Nicotiana tabacum, Oryza sativa, Solanum tuberosum, Zea mays• View phosphopeptides• View phosphorylated residue• Fetch protein–protein interactors• Fetch kinase and phosphatase substrates	• Linked to external databases such as TAIR, Ensembl Genomes and UniProt• Linked to Musite for predicting protein phosphorylation• Not updated regularly	www.p3db.org/
dbPPT: database of Phospho sites in PlanTs	• Most extensive resource for plant phosphopeptides		http://dbppt.biocuckoo.org
MedicagoPhosphoProteinDatabase	• Information regarding phosphoprotein, phosphopeptide, and phosphosite data specific to Medicago can be fetched easily• BLAST search can be performed against Medicago proteome database using peptide sequence and phosphorylation site motif information	• This repository contains information on 3457 unique phosphopeptides with 3404 nonredundant phosphosites, which belong to 829 proteins• No information on interactions of Medicago phosphoproteins	www.phospho.medicago.wisc.edu/db/index.php
Computational tools for predicting PTMs
Disorder-enhanced phosphorylation sites predictor (DISPHOS)	• Protein phosphorylation at serine, threonine, and tyrosine can be predicted by using protein sequence in FASTA format• The localization will be scored to express the confidence	• The tool can be used to predict the phosphorylation site against A. thaliana proteome• Only 100 predictions can be performed per IP address per day	www.dabi.temple.edu/disphos/
NetPhosK	• Predicts serine, threonine, and tyrosine phosphorylation sites for all eukaryotic organisms	• The tool can predict the phosphorylation sites on any given protein sequence using ensembles of neural networks• Only 2000 sequences can be given as input at a time	www.cbs.dtu.dk/services/NetPhosK/
scan-X	• An online motif prediction tool that takes motif-x data as input		http://scan-x.med.harvard.edu/scan-x.html
Musite	• Predicts phosphorylation sites for all eukaryotic organisms	• The input query can be given either as a UniProt accession number or as the protein sequence itself• Options such as modification type, organism, and kinase are available to improve the prediction• Only 100 protein accession numbers or sequences can be given as query at a time	http://musite.sourceforge.net/
PlantPhos	• Predicts phosphorylation at serine, threonine, and tyrosine residues of plant proteins	• It uses hidden Markov models built from catalytic kinase motifs for the prediction• Protein sequence data of up to 2 MB can be given as the input file	http://csb.cse.yzu.edu.tw/PlantPhos/

FASTA, FAST-All.

Protein Structure and Domain Databases

A few websites, such as CYBIONIX and Bioinformatics Software and Tools, are particularly useful because they collect links to sequence retrieval resources as well as to various analysis tools. The Protein Data Bank is an integrated resource of protein structures and other related information. A list of databases and tools for proteomics data analysis is given in Table 3.

Protein–Protein Interaction Databases

The in vivo interactions between proteins can define the set of plant-specific functions. A set of 6200 binary interactions among ∼2700 proteins has been catalogued by the Arabidopsis Interactome Mapping Consortium (Consortium, 2011). Such interactions are also documented in the IntAct Molecular Interaction Database (Orchard et al., 2014). Recently PPIM, a protein–protein interaction database, was created for maize (Zhu et al., 2016). The database Computational System Biology houses a list of plant protein–protein interaction databases such as PPIM and the Database of Interacting Proteins in Oryza sativa (DIPOS), which is dedicated to protein interactions in rice.

The A. thaliana Protein Interactome Database (AtPID) is a resource wherein protein interactions are integrated with genotype–phenotype associations of Arabidopsis mutants (Cui et al., 2008). The Plant Interactome project hosted by the Salk Institute intends to document protein–protein interactions of Arabidopsis using yeast two-hybrid and protein array technologies (http://signal.salk.edu/interactome.html).

The STRING database can be used to query protein interactions from multiple species including Arabidopsis. The data include both predicted interactions and those with experimental evidence. The Biological General Repository for Interaction Datasets (BioGRID) contains a large collection of protein–protein interactions from several organisms (Chatr-Aryamontri et al., 2017).
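As an illustration of programmatic access, the following minimal Python sketch builds a request for the public STRING REST API and parses a tab-separated reply. It rests on several assumptions that should be checked against the current STRING API documentation: the base URL, the `network` method, the `identifiers`, `species`, and `required_score` parameters, and the `preferredName_A`, `preferredName_B`, and `score` column names. The function names are hypothetical and the sketch makes no network request itself.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed base URL of the public STRING REST API (verify before use).
STRING_API = "https://string-db.org/api"


def build_network_url(identifiers, species=3702, required_score=700, fmt="tsv"):
    """Build a URL for the STRING 'network' method.

    species=3702 is the NCBI taxonomy ID of Arabidopsis thaliana.
    required_score is the minimum combined score on a 0-1000 scale.
    Multiple identifiers are joined with a carriage return, which
    urlencode() turns into %0D.
    """
    params = {
        "identifiers": "\r".join(identifiers),
        "species": species,
        "required_score": required_score,
    }
    return f"{STRING_API}/{fmt}/network?{urlencode(params)}"


def parse_network_tsv(text):
    """Parse a TSV reply into (protein_a, protein_b, score) tuples.

    The column names used here are assumptions about the reply format.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        (row["preferredName_A"], row["preferredName_B"], float(row["score"]))
        for row in reader
    ]
```

The URL returned by `build_network_url(["AT1G01060", "AT2G46830"])` could then be fetched with any HTTP client, for example `urllib.request.urlopen`, and the decoded reply passed to `parse_network_tsv`.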

BioGRID can be queried using the gene identifier, and the displayed results include the protein's gene ontology (GO) category and the publication from which the information was retrieved. The Arabidopsis Interactome Modules Database (AIM) aims at identifying the protein constituents within interactome modules (Wang et al., 2014b). The data are linked to gene expression data sets derived from microarray analysis. The database is also intended to catalog interologs in various other plant species. GeneMANIA contains data sets from other interactome databases and enables the user to predict the functions of genes based upon their protein–protein interactions (Warde-Farley et al., 2010). Databases and tools related to plant protein–protein interactions are listed in Table 3.
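The idea behind interologs can be sketched in a few lines of Python: an interaction observed between two proteins in a reference species is transferred to another species whenever both partners have orthologs there. This is only a generic illustration of interolog mapping, not the procedure used by AIM. The function name, the input structures, and the identifiers in the usage note are all hypothetical.

```python
from itertools import product


def map_interologs(interactions, orthologs):
    """Transfer reference-species interactions to a target species.

    interactions: iterable of (protein_a, protein_b) pairs observed in
        the reference species.
    orthologs: dict mapping a reference protein to the set of its
        orthologs in the target species.
    Returns a set of frozensets, each holding one predicted
    (unordered) target-species pair. Self-interactions are skipped
    for simplicity.
    """
    predicted = set()
    for a, b in interactions:
        # Every combination of orthologs of the two partners is a
        # candidate interolog; a partner without orthologs yields none.
        for a_t, b_t in product(orthologs.get(a, ()), orthologs.get(b, ())):
            if a_t != b_t:
                predicted.add(frozenset((a_t, b_t)))
    return predicted
```

For example, with the placeholder input `[("AtA", "AtB")]` and the orthology table `{"AtA": {"OsA"}, "AtB": {"OsB1", "OsB2"}}`, the function predicts the pairs OsA–OsB1 and OsA–OsB2. In practice such predictions would still need filtering, for instance by coexpression or shared localization.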

Computational Tools for Predicting Post-Translational Modifications (PTMs)

Computational tools such as DISPHOS, NetPhosK, KinasePhos, and scan-X have been widely used for predicting protein phosphorylation sites. DISPHOS (www.dabi.temple.edu/disphos/) bases its prediction upon the disordered region surrounding the phosphorylation site, whereas NetPhosK predicts kinase-specific eukaryotic protein phosphorylation sites (Blom et al., 2004). KinasePhos utilizes sequence-based amino acid coupling-pattern analysis and solvent accessibility (Huang et al., 2005). The scan-X tool can predict the kinase recognition motifs within a phosphoproteome data set (Chou and Schwartz, 2011). These tools are, however, generic in nature and have mostly been trained using human phosphoproteome data sets.

Only a few tools have been designed for the analysis of phosphorylation sites on plant proteins. Musite is an application that predicts protein phosphorylation sites using local sequence similarities, protein disorder scores, and amino acid frequencies (Gao et al., 2010). The current version (Musite 1.0) can perform the analysis for 6 eukaryotic organisms, including A. thaliana, and provides kinase-specific prediction models for 13 kinases or kinase families. The application can also be downloaded as an open-source standalone tool. The PhosPhAt database also has a built-in plant-specific phosphorylation site prediction tool trained to predict phosphorylation on Ser, Thr, and Tyr residues (pSer, pThr, and pTyr).

PlantPhos is a web tool developed for the prediction of phosphorylation sites on plant proteins based upon the recognition of kinase motifs on the substrates (Lee et al., 2011). This tool has been trained using the experimental Arabidopsis phosphorylation data available in the TAIR9 database. Significantly conserved motifs are clustered using maximal dependence decomposition. Rice_Phospho 1.0 was developed to predict protein phosphorylation sites in rice (Lin et al., 2015). Bioinformatic tools for predicting other post-translational modifications (PTMs) include computational tools for the prediction of lysine acetylation (Basu, 2013; Deng et al., 2016) and of ubiquitination sites (Chen et al., 2015; Walton et al., 2016). The list of software tools for performing PTM prediction on plant proteins is given in Table 2.
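To make the notion of motif-based site prediction concrete, the following Python sketch scans a protein sequence for two simplified kinase recognition patterns. It is a deliberately naive illustration and does not reproduce the models used by PlantPhos, scan-X, or any other tool named above. The two patterns (a proline-directed [ST]P motif and a basophilic R-x-x-[ST] motif) are textbook simplifications chosen as assumptions for this example, and the function and dictionary names are hypothetical.

```python
import re

# Simplified, illustrative consensus patterns. Each pattern is wrapped
# in a lookahead so that overlapping candidate sites are all reported;
# group 1 captures the candidate phospho-acceptor residue.
MOTIFS = {
    "proline-directed ([ST]P)": re.compile(r"(?=([ST])P)"),
    "basophilic (R..[ST])": re.compile(r"(?=R..([ST]))"),
}


def scan_motifs(sequence):
    """Return sorted (position, residue, motif_name) tuples.

    Positions are 1-based and refer to the candidate phosphorylated
    residue within the protein sequence.
    """
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence):
            hits.append((match.start(1) + 1, match.group(1), name))
    return sorted(hits)
```

A pattern match of this kind only flags candidate sites. The tools discussed here add statistical models, disorder information, or training on experimental phosphoproteome data to reduce the many false positives that short motifs produce.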

Other Computational Tools

OMICtools contains a suite of links to various software tools that are useful for proteomics data interpretation (Henry et al., 2014). The newly developed platform ePlant allows users to seamlessly navigate the available data on Arabidopsis, including genome, proteome, interactome, transcriptome, and three-dimensional molecular structure data (Waese et al., 2017). Along with the download option, it is also useful for data visualization. The Arabidopsis Information Portal (Araport) is a resource developed for plant biology with special emphasis on integrative data analysis (Krishnakumar et al., 2015). ThaleMine, developed by the Araport team, provides access to data available on Arabidopsis such as RNA-seq and array expression, coexpression, protein interactions, homologs, pathways, publications, alleles, germplasm, and phenotypes (Krishnakumar et al., 2017).

Another tool that allows integrative data analysis is CORNET (CORrelation NETworks). The platform allows access to coexpression data, protein–protein interactions, regulatory interactions, and functional annotations (Van Bel and Coppens, 2017). The Arabidopsis Proteotypic Predictor is a web-based tool that enables users to select candidate transitions for Selected Reaction Monitoring (Taylor et al., 2014). Another such web-based tool is MRMaid, which serves as a resource of transitions for a subset of Arabidopsis proteins (Fan et al., 2012). Tools for molecular network visualization and data integration include Cytoscape and its plugins (Killcoyne et al., 2009).
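The first step in selecting peptides for Selected Reaction Monitoring, an in silico tryptic digest followed by a precursor m/z calculation, can be sketched as below. This is a generic textbook calculation written under stated assumptions, not the algorithm of the Arabidopsis Proteotypic Predictor or MRMaid: trypsin is assumed to cleave C-terminal to K or R except before P, masses are standard monoisotopic values rounded to five decimals, and all function names are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056          # added once per peptide (free N- and C-termini)
PROTON = 1.00728          # mass of the charge carrier
MET_OXIDATION = 15.99491  # mass shift per oxidized methionine


def tryptic_peptides(sequence, missed_cleavages=0):
    """Digest after K/R (not before P), allowing missed cleavages."""
    # Zero-width split; requires Python 3.7 or later.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def peptide_mass(peptide, oxidized_met=0):
    """Monoisotopic mass of a neutral peptide, optionally with Met oxidation."""
    if oxidized_met > peptide.count("M"):
        raise ValueError("more oxidations requested than methionines present")
    residues = sum(RESIDUE_MASS[aa] for aa in peptide)
    return residues + WATER + oxidized_met * MET_OXIDATION


def precursor_mz(peptide, charge=2, oxidized_met=0):
    """m/z of the peptide precursor ion at the given positive charge."""
    return (peptide_mass(peptide, oxidized_met) + charge * PROTON) / charge
```

For the made-up sequence `"MKRPSTKAAR"`, `tryptic_peptides` returns `["MK", "RPSTK", "AAR"]` with no missed cleavages. Deciding whether a peptide is proteotypic would additionally require checking that it occurs in only one protein of the proteome, which is omitted here.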

Future Perspectives

The dawn of big data in proteomics is changing the landscape of systems biology research practices. Both emerging and established investigators need access to a wide range of databases in plant proteomics, together with knowledge of their comparative strengths and limitations. The present review has attempted to address this knowledge gap in the field.

It is noteworthy that plant biology has historically and markedly lagged behind systems biology studies in animals and humans. However, there are signs of positive change as evidenced in the present review and the rise of big data in plant proteomics. Still, the data submitted to public data repositories for public access remain scattered across various databases and websites. There is a need for well-designed and curated protein databases akin to the HPRD and the Human Protein Atlas.

There is also a severe shortage of plant databases for biological pathways comparable to NetPath (Kandasamy et al., 2010). Detailed knowledge of the stress signaling pathways and of the pathways involved in plant secondary metabolite synthesis is important for targeting molecular components to improve stress tolerance or yields. Finally, we suggest that our future efforts must also be invested in integrating the available databases to allow for multiomics data analysis, research, and development in the field of plant biology.

Acknowledgments

The authors thank Yenepoya (Deemed to be University) for full access to the instrumentation facility. P. S. is funded by the Early Career Research Award (Award no.: ECR/2016/000365) from the Science & Engineering Research Board (SERB), Government of India. C.N.K. is a recipient of a Junior Research Fellowship under the OLAV THON Foundation-funded grant at Yenepoya.

Author Disclosure Statement

The authors declare they have no financial conflicts of interest.

References

Akiyama, Kurotani, Iida, Kuromori, Shinozaki, and Sakurai. (2014). RARGE II: An integrated phenotype database of Arabidopsis mutant traits using a controlled vocabulary. Plant Cell Physiol, 55, e4.

Albenne, Canut, and Jamet. (2013). Plant cell wall proteomics: The leadership of Arabidopsis thaliana. Front Plant Sci, 4, 111.

Andersen, Lam, Leung, et al. (2005). Nucleolar proteome dynamics. Nature, 433, 77–83.

Basu. (2013). Computational prediction of lysine acetylation proteome-wide. Methods Mol Biol, 981, 127–136.

Bhawna, Bonthala, and Gajula. (2016). PvTFDB: A Phaseolus vulgaris transcription factors database for expediting functional genomics in legumes. Database, 2016, baw114.

Blom, Sicheritz-Ponten, Gupta, Gammeltoft, and Brunak. (2004). Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics, 4, 1633–1649.

Bonthala, Muthamilarasan, Roy, and Prasad. (2014). FmTFDb: A foxtail millet transcription factors database for expediting functional genomics in millets. Mol Biol Rep, 41, 6343–6348.

Brown, Shaw, and Marshall. (2005). Arabidopsis nucleolar protein database (AtNoPDB). Nucleic Acids Res, 33, D633–D636.

Bruley, Dupierris, Salvi, Rolland, and Ferro. (2012). AT_CHLORO: A chloroplast protein database dedicated to sub-plastidial localization. Front Plant Sci, 3, 205.

Chatr-Aryamontri, Oughtred, Boucher, et al. (2017). The BioGRID interaction database: 2017 update. Nucleic Acids Res, 45, D369–D379.

Chen, Zhou, Zhang, and Song. (2015). Towards more accurate prediction of ubiquitination sites: A comprehensive review of current methods, tools and features. Brief Bioinform, 16, 640–657.

Cheng, Deng, Wang, Ren, Liu, and Xue. (2014). dbPPT: A comprehensive database of protein phosphorylation in plants. Database, 2014, bau121.

Chou and Schwartz. (2011). Using the scan-x Web site to predict protein post-translational modifications. Curr Protoc Bioinformatics, Chapter 13, Unit 13.16.

Consortium AIM. (2011). Evidence for network evolution in an Arabidopsis interactome map. Science, 333, 601–607.

Cui, Li, et al. (2008). AtPID: Arabidopsis thaliana protein interactome database - An integrative platform for plant systems biology. Nucleic Acids Res, 36, D999–D1008.

Deng, Wang, Zhang, et al. (2016). GPS-PAIL: Prediction of lysine acetyltransferase-specific modification sites from protein sequences. Sci Rep, 6, 39787.

Duncan, Trosch, Fenske, Taylor, and Millar. (2017). Resource: Mapping the Triticum aestivum proteome. Plant J, 89, 601–616.

Durek, Schmidt, Heazlewood, et al. (2010). PhosPhAt: The Arabidopsis thaliana phosphorylation site database. An update. Nucleic Acids Res, 38, D828–D834.

Fan, Mohareb, Jones, and Bessant. (2012). MRMaid: The SRM assay design tool for Arabidopsis and other species. Front Plant Sci, 3, 164.

Feng, Wang, Lu, Zhang, and Han. (2017). Proteomics analysis reveals a dynamic diurnal pattern of photosynthesis-related pathways in maize leaves. PLoS One, 12, e0180670.

Ferro, Brugiere, Salvi, et al. (2010). AT_CHLORO, a comprehensive chloroplast proteome database with subplastidial localization and curated information on envelope proteins. Mol Cell Proteomics, 9, 1063–1084.

Gao, Agrawal, Thelen, and Xu. (2009). P3DB: A plant protein phosphorylation database. Nucleic Acids Res, 37, D960–D962.

Gao, Thelen, Dunker, and Xu. (2010). Musite, a tool for global prediction of general and kinase-specific phosphorylation sites. Mol Cell Proteomics, 9, 2586–2600.

Gayali, Acharya, Lande, Pandey, Chakraborty, and Chakraborty. (2016). CicerTransDB 1.0: A resource for expression and functional study of chickpea transcription factors. BMC Plant Biol, 16, 169.

Gnad, Gunawardena, and Mann. (2011). PHOSIDA 2011: The posttranslational modification database. Nucleic Acids Res, 39, D253–D260.

Gonzalez-Camacho and Medina. (2004). Identification of specific plant nucleolar phosphoproteins in a functional proteomic analysis. Proteomics, 4, 407–417.

Grimsrud, den Os, Wenger, et al. (2010). Large-scale phosphoprotein analysis in Medicago truncatula roots provides insight into in vivo kinase activity in legumes. Plant Physiol, 152, 19–28.

Guo, He, Liu, et al. (2005). DATF: A database of Arabidopsis transcription factors. Bioinformatics, 21, 2568–2569.

Harper, Gardiner, Andorf, and Lawrence. (2016). MaizeGDB: The maize genetics and genomics database. Methods Mol Biol, 1374, 187–202.

Hartl, Fussl, Boersema, et al. (2017). Lysine acetylome profiling uncovers novel histone deacetylase substrate proteins in Arabidopsis. Mol Syst Biol, 13, 949.

Henry, Bandrowski, Pepin, Gonzalez, and Desfeux. (2014). OMICtools: An informative directory for multi-omic data analysis. Database (Oxford), 2014.

Hooper, Castleden, Tanz, Aryamanesh, and Millar. (2017). SUBA4: The interactive data analysis centre for Arabidopsis subcellular protein locations. Nucleic Acids Res, 45, D1064–D1074.

Hopff, Wienkoop, and Luthje. (2013). The plasma membrane proteome of maize roots grown under low and high iron conditions. J Proteomics, 91, 605–618.

Huang, Lee, Tzeng, and Horng. (2005). KinasePhos: A web tool for identifying protein kinase-specific phosphorylation sites. Nucleic Acids Res, 33, W226–W229.

Huang, Su, Kao, et al. (2016). dbPTM 2016: 10-year anniversary of a resource for post-translational modification of proteins. Nucleic Acids Res, 44, D435–D446.

Iida, Seki, Sakurai, et al. (2005). RARTF: Database and tools for complete sets of Arabidopsis transcription factors. DNA Res, 12, 247–256.

Jia, Sun, Li, and Zhang. (2017). An integrated analysis of protein abundance, transcript level and tissue diversity to reveal developmental regulation of maize. J Proteome Res, 17, 822–833.

Jin, Zhang, Kong, Gao, and Luo. (2014). PlantTFDB 3.0: A portal for the functional and evolutionary study of plant transcription factors. Nucleic Acids Res, 42, D1182–D1187.

Kandasamy, Mohan, Raju, et al. (2010). NetPath: A public resource of curated signal transduction pathways. Genome Biol, 11, R3.

Kawahara, de la Bastide, Hamilton, et al. (2013). Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice (N Y), 6, 4.

Keshava Prasad, Goel, Kandasamy, et al. (2009). Human Protein Reference Database - 2009 update. Nucleic Acids Res, 37, D767–D772.

Killcoyne, Carter, Smith, and Boyle. (2009). Cytoscape: A community-based framework for network modeling. Methods Mol Biol, 563, 219–239.

Kim, Pinto, Getnet, et al. (2014). A draft map of the human proteome. Nature, 509, 575–581.

Krishnakumar, Contrino, Cheng, et al. (2017). ThaleMine: A Warehouse for Arabidopsis Data Integration and Discovery. Plant Cell Physiol, 58, e4.

Krishnakumar, Hanlon, Contrino, et al. (2015). Araport: The Arabidopsis information portal. Nucleic Acids Res, 43, D1003–D1009.

Kudo, Kobayashi, Terashima, et al. (2017). TOMATOMICS: A web database for integrated omics information in tomato. Plant Cell Physiol, 58, e8.

Kumar, Khare, Sharma, and Wani. (2017). Engineering crops for future: A phosphoproteomics approach. Curr Protein Pept Sci, 19, 413–426.

Lee, Taylor, and Millar. (2013). Recent advances in the composition and heterogeneity of the Arabidopsis mitochondrial proteome. Front Plant Sci, 4, 4.

Lee, Bretana, and Lu. (2011). PlantPhos: Using maximal dependence decomposition to identify plant phosphorylation sites with substrate site specificity. BMC Bioinformatics, 12, 261.

Lei, Dai, Watson, Zhao, and Sumner. (2011). A legume specific protein database (LegProt) improves the number of identified peptides, confidence scores and overall protein identification success rates for legume proteomics. Phytochemistry, 72, 1020–1027.

Lin, Song, Tao, et al. (2015). Rice_Phospho 1.0: A new rice-specific SVM predictor for protein phosphorylation sites. Sci Rep, 5, 11940.

Lloyd and Meinke. (2012). A comprehensive dataset of genes with a loss-of-function mutant phenotype in Arabidopsis. Plant Physiol, 158, 1115–1129.

Mitsuda and Ohme-Takagi. (2009). Functional analysis of transcription factors in Arabidopsis. Plant Cell Physiol, 50, 1232–1248.

Mochida, Yoshida, Sakurai, Yamaguchi-Shinozaki, Shinozaki, and Tran. (2010). LegumeTFDB: An integrative database of Glycine max, Lotus japonicus and Medicago truncatula transcription factors. Bioinformatics, 26, 290–291.

Mochida, Yoshida, Sakurai, Yamaguchi-Shinozaki, Shinozaki, and Tran. (2013). TreeTFDB: An integrative database of the transcription factors from six economically important tree crops for functional predictions and comparative and functional genomics. DNA Res, 20, 151–162.

Narula, Datta, Chakraborty, and Chakraborty. (2013). Comparative analyses of nuclear proteome: Extending its function. Front Plant Sci, 4, 100.

Orchard, Ammari, Aranda, et al. (2014). The MIntAct project - IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res, 42, D358–D363.

Palm, Simm, Darm, et al. (2016). Proteome distribution between nucleoplasm and nucleolus and its relation to ribosome biogenesis in Arabidopsis thaliana. RNA Biol, 13, 441–454.

Pierleoni, Martelli, Fariselli, and Casadio. (2007). eSLDB: Eukaryotic subcellular localization database. Nucleic Acids Res, 35, D208–D212.

Reiser, Subramaniam, Li, and Huala. (2017). Using the Arabidopsis information resource (TAIR) to find information about Arabidopsis genes. Curr Protoc Bioinformatics, 60, 1.11.11–1.11.45.

Romeuf, Tessier, Dardevet, Branlard, Charmet, and Ravel. (2010). wDBTF: An integrated database resource for studying wheat transcription factor families. BMC Genomics, 11, 185.

San Clemente and Jamet. (2015). WallProtDB, a database resource for plant cell wall proteomics. Plant Methods, 11, 2.

Schwacke and Flugge. (2018). Identification and characterization of plant membrane proteins using ARAMEMNON. Methods Mol Biol, 1696, 249–259.

Seren, Grimm, Fitz, et al. (2017). AraPheno: A public database for Arabidopsis thaliana phenotypes. Nucleic Acids Res, 45, D1054–D1059.

Shameer, Ambika, Varghese, Karaba, Udayakumar, and Sowdhamini. (2009). STIFDB-Arabidopsis stress responsive transcription factor database. Int J Plant Genomics, 2009, 583429.

Shiu and Bleecker. (2001). Receptor-like kinases from Arabidopsis form a monophyletic gene family related to animal receptor kinases. Proc Natl Acad Sci U S A, 98, 10763–10768.

Singh, Sharma, Singh, and Sharma. (2017). PpTFDB: A pigeonpea transcription factor database for exploring functional genomics in legumes. PLoS One, 12, e0179736.

Sun, Zybailov, Majeran, Friso, Olinares, and van Wijk. (2009). PPDB, the plant proteomics database at Cornell. Nucleic Acids Res, 37, D969–D974.

Szymanski, Levin, Savidor, et al. (2017). Label-free deep shotgun proteomics reveals protein dynamics during tomato fruit tissues development. Plant J, 90, 396–417.

Takac, Samajova, and Samaj. (2017). Integrating cell biology and proteomic approaches in plants. J Proteomics, 169, 165–175.

Tan, Lim, and Lau. (2017). Proteomics in commercial crops: An overview. J Proteomics, 169, 176–188.

Taylor, Fenske, Castleden, Tomaz, Nelson, and Millar. (2014). Selected reaction monitoring to determine protein abundance in Arabidopsis using the Arabidopsis proteotypic predictor. Plant Physiol, 164, 525–536.

Tchieu, Fana, Fink, et al. (2003). The PlantsP and PlantsT functional genomics databases. Nucleic Acids Res, 31, 342–344.

Thul, Akesson, Wiking, et al. (2017). A subcellular map of the human proteome. Science, 356.

Tomizioli, Lazar, Brugiere, et al. (2014). Deciphering thylakoid sub-compartments using a mass spectrometry-based approach. Mol Cell Proteomics, 13, 2147–2167.

Uhlen, Fagerberg, Hallstrom, et al. (2015). Proteomics. Tissue-based map of the human proteome. Science, 347, 1260419.

Van Bel and Coppens. (2017). Exploring plant co-expression and gene-gene interactions with CORNET 3.0. Methods Mol Biol, 1533, 201–212.

Waese, Fan, Pasha, et al. (2017). ePlant: Visualizing and exploring multiple levels of data for hypothesis generation in plant biology. Plant Cell, 29, 1806–1821.

Walton, Stes, Cybulski, et al. (2016). It's time for some “site”-seeing: Novel tools to monitor the ubiquitin landscape in Arabidopsis thaliana. Plant Cell, 28, 6–16.

Wang, Liu, Cheng, et al. (2014a). EKPD: A hierarchical database of eukaryotic protein kinases and protein phosphatases. Nucleic Acids Res, 42, D496–D502.

Wang, Thilmony, Zhao, Chen, and Gu. (2014b). AIM: A comprehensive Arabidopsis interactome module database and related interologs in plants. Database, 2014, bau117.

Warde-Farley, Donaldson, Comes, et al. (2010). The GeneMANIA prediction server: Biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res, 38, W214–W220.

Yao, Ge, Wu, et al. (2014). P3DB 3.0: From plant phosphorylation sites to protein networks. Nucleic Acids Res, 42, D1206–D1213.

Yao and Xu. (2017). Bioinformatics analysis of protein phosphorylation in plant systems biology using P3DB. Methods Mol Biol, 1558, 127–138.

Yilmaz, Mejia-Guerra, Kurz, Liang, Welch, and Grotewold. (2011). AGRIS: The Arabidopsis gene regulatory information server, an update. Nucleic Acids Res, 39, D1118–D1122.

Zhen, Deng, Wang, et al. (2016). First comprehensive proteome analyses of lysine acetylation and succinylation in seedling leaves of Brachypodium distachyon L. Sci Rep, 6, 31576.

Zhu, Wu, Xu, et al. (2016). PPIM: A protein-protein interaction database for maize. Plant Physiol, 170, 618–626.

Zulawski, Braginets, and Schulze. (2013). PhosPhAt goes kinases - Searchable protein kinase target information in the plant phosphorylation site database PhosPhAt. Nucleic Acids Res, 41, D1176–D1184.

Zulawski, Schulze, Braginets, Hartmann, and Schulze. (2014). The Arabidopsis Kinome: Phylogeny and evolutionary insights into functional diversification. BMC Genomics, 15, 548.