In Silico Prediction of Drug Targets in Phytopathogenic Pseudomonas syringae pv. phaseolicola: Charting a Course for Agrigenomics Translation Research

Abstract

Pseudomonas syringae pv. phaseolicola is a major plant pathogen that causes halo blight disease and is of worldwide importance. The emerging post-genomics field of agrigenomics, together with the availability of whole genome sequences of a number of pathogens and host organisms, offers the promise of identifying potential drug targets using sequence comparison approaches. On the other hand, the lack of gene expression data for most phytopathogenic microbes remains a formidable barrier. The present study aimed at the prediction of drug targets in Pseudomonas syringae pv. phaseolicola by exploiting knowledge of codon usage bias for gene expression subtractively, supported by gene expression analysis and sequence comparisons. Based on screening of the Database of Essential Genes using blastx, 158 of the total 5172 genes of P. syringae pv. phaseolicola were shortlisted as essential genes. Similarity search for these 158 essential genes against available host-plant sequences (Phaseolus vulgaris) led to the identification of homologues of 21 genes in the host genome, thus leaving a subset of 137 genes. Expression analysis of these 137 genes using RSCU_gene, validated by microarray gene expression data, suggested that 22 genes had higher expression levels in the cell, and their products have therefore been identified as putative drug targets. The gene ontology analysis of these 22 genes revealed their indispensable roles in pivotal metabolic pathways of P. syringae pv. phaseolicola. Upon comparison of the sequences of these genes with other soil bacteria, we identified two genes that were unique to P. syringae pv. phaseolicola. The products of these genes can potentially be utilized for drug development so as to control halo blight disease and thereby accelerate translation research in the nascent field of agrigenomics.

Introduction

Pseudomonas syringae is a model in plant pathology as it infects a wide range of plants, including the model plant Arabidopsis thaliana (Arnold et al., 2011). Some of the diseases caused in major crops include bacterial speck in tomatoes (Pseudomonas syringae pv. tomato DC3000), brown spot (Pseudomonas syringae pv. syringae), and halo blight in beans (Pseudomonas syringae pv. phaseolicola) (Rico et al., 2011). The halo blight pathogen is primarily seed-borne, but can also be spread from plant to plant by rain. Halo blight in the common bean (Phaseolus vulgaris) is a devastating disease worldwide and has remained difficult to control. Different races of the pathogen (P. syringae pv. phaseolicola) cause either disease symptoms or a hypersensitive response in a series of differentially reacting bean cultivars. The molecular genetics of the co-evolution of the pathogen and the host have been investigated in depth, leading to important discoveries in the field of plant–microbe interactions (Arnold et al., 2011). In recent years, a number of pathogen avirulence genes, type III secretion system components, and corresponding host resistance genes have been cloned and identified (Godfrey et al., 2011; Lovell et al., 2011; Vencato et al., 2006).

Extensive genetic and genomic resources are available for P. syringae pv. phaseolicola (http://www.pseudomonas-syringae.org/psp_home.html; Joardar et al., 2005), providing an ideal opportunity to elucidate mechanisms of pathogenicity, plant host susceptibility/resistance, and virulence determinants (Godfrey et al., 2010). The implications will also be useful for the discovery of agricultural bactericides (Hernandez-Morales et al., 2009). In recent years, various computational methods have been developed using either protein structure (Rajendran et al., 2012) or genome-based information (Bakheet and Doig, 2010; Katara et al., 2011) for prediction of drug targets in various human pathogenic bacteria. However, no such efforts for predicting drug targets in phytopathogens have been reported, primarily because of the unavailability of gene expression data from diseased conditions. The efficacy of current bactericides and fungicides could be increased in the future without affecting crop yield if they are designed using careful approaches based on gene expression data. Here we attempt to predict drug targets for P. syringae pv. phaseolicola, a well-known phytopathogenic bacterium that causes halo blight disease in Phaseolus vulgaris and is responsible for huge economic losses worldwide. For this analysis, we developed an approach that uses sequence similarity and codon usage bias subtractively. Following Katara et al. (2011), we presume that drug targets are essential for growth and viability and are also highly selective for the pathogen with respect to the host. We have also considered their availability for interaction with drugs (based on their expression). To the best of our knowledge, this is the first time such approaches have been applied to predict drug targets in a phytopathogen.

Materials and Methods

Sequence resources

Gene sequences of P. syringae pv. phaseolicola were downloaded from GenBank (ftp://ftp.ncbi.nlm.nih.gov). Available sequences of its host plant Phaseolus vulgaris and of the nitrogen-fixing bacterium Rhizobium etli were used as BLAST databases through the NCBI BLAST facilities (http://www.ncbi.nlm.nih.gov/blast).

Identification of genes of interest

The complete set of genes of the pathogen was subjected to blastx (Altschul et al., 1997) against the essential genes of γ-proteobacteria available at the Database of Essential Genes (DEG; http://tubic.tju.edu.cn/deg/; Zhang and Lin, 2009). An expectation value (E-value) cut-off of 0.001 and a minimum bit-score cut-off of 100 were used as the baseline to identify the housekeeping (essential) genes in the pathogen. These housekeeping genes of the pathogen were subjected to blastn against the EST sequences of the host plant at the NCBI server (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Genes with homologs in the host were excluded, and the remaining nonhomologous genes were compiled and considered as genes of interest.
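The two filtering steps described above can be illustrated with a minimal sketch. It assumes the BLAST results are available as 12-column tab-separated text (query ID in column 1, E-value in column 11, bit score in column 12); the function names, gene identifiers, and scores below are hypothetical and are not taken from the study.

```python
import csv
import io

E_VALUE_MAX = 1e-3   # E-value cut-off stated in the text
BIT_SCORE_MIN = 100  # minimum bit-score cut-off stated in the text


def essential_candidates(blast_tabular, e_max=E_VALUE_MAX, bits_min=BIT_SCORE_MIN):
    """Return query IDs with at least one hit passing both cut-offs.

    Assumes 12-column tabular BLAST output: column 1 is the query ID,
    column 11 the E-value and column 12 the bit score.
    """
    keep = set()
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query_id, e_value, bit_score = row[0], float(row[10]), float(row[11])
        if e_value <= e_max and bit_score >= bits_min:
            keep.add(query_id)
    return keep


def subtract_host_homologs(essential_ids, host_hit_ids):
    """Genes of interest = essential genes without a significant host hit."""
    return set(essential_ids) - set(host_hit_ids)


# Toy example with made-up identifiers and scores (not data from the study).
demo = io.StringIO(
    "geneA\tDEG1\t80.0\t200\t40\t0\t1\t600\t1\t200\t1e-50\t250\n"
    "geneB\tDEG2\t30.0\t50\t35\t0\t1\t150\t1\t50\t0.5\t25\n"
    "geneC\tDEG3\t60.0\t120\t48\t0\t1\t360\t1\t120\t1e-20\t110\n"
)
essential = essential_candidates(demo)
genes_of_interest = subtract_host_homologs(essential, {"geneC"})
```

In this toy input, geneB fails both cut-offs, and geneC is removed in the subtraction step because it is listed as having a host homolog, which leaves geneA as the only gene of interest.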

Prediction of Relative Synonymous Codon Usage

The complete set of essential genes that did not share significant similarity with the ESTs of the target host was subjected to CAIcal (Puigbo et al., 2008), using the eubacterial genetic code, to predict nucleotide composition, relative synonymous codon usage (RSCU), and codon usage (Agarwal and Grover, 2008; Hassan et al., 2009; Sharp and Li, 1987) [Eq. 1]. The geometric mean of these RSCU values was then taken to predict RSCU_gene (relative synonymous codon usage for the complete gene) for all genes [Eq. 2].

\begin{align*}
\mathrm{RSCU}_{xyz} = f_{obs}(xyz) / f_{exp}(xyz) \quad\quad [\mathrm{Eq.\ 1}]
\end{align*}

\begin{align*}
\mathrm{RSCU}_{gene} = \left( \prod_{l=1}^{L} \mathrm{RSCU}_{xyz}(l) \right)^{1/L} \quad\quad [\mathrm{Eq.\ 2}]
\end{align*}

where f_obs (xyz) is the observed frequency of codon xyz for amino acid ‘X’, f_exp (xyz) is the expected frequency of codon xyz (if all synonymous codons for amino acid ‘X’ were used equally), and L is the length of a particular gene in codons.
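As an illustration of Eqs. 1 and 2, the following is a minimal sketch rather than the CAIcal implementation. It assumes that the RSCU values are computed from a pooled set of reference coding sequences and that codons with an RSCU of zero are skipped when taking the geometric mean; both choices, as well as the function names and toy sequences, are assumptions made for this example.

```python
import math
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codon -> amino acid map (stop codons marked '*'); codon assignments are the
# same in the standard and bacterial genetic codes.
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split a coding sequence into sense codons, dropping stops and
    partial or ambiguous triplets."""
    seq = seq.upper().replace("U", "T")
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return [c for c in triplets if CODON_TO_AA.get(c, "*") != "*"]


def rscu_table(reference_seqs):
    """Eq. 1: observed codon count divided by the count expected if all
    synonymous codons of the amino acid were used equally."""
    counts = Counter(c for s in reference_seqs for c in codons(s))
    table = {}
    for codon, aa in CODON_TO_AA.items():
        if aa == "*":
            continue
        synonyms = [c for c, a in CODON_TO_AA.items() if a == aa]
        total = sum(counts[c] for c in synonyms)
        expected = total / len(synonyms)
        table[codon] = counts[codon] / expected if total else 0.0
    return table


def rscu_gene(seq, table):
    """Eq. 2: geometric mean of RSCU over the codons of one gene.

    Codons with RSCU 0 in the table are skipped (an assumption of this
    sketch, made to keep the logarithm defined).
    """
    values = [table[c] for c in codons(seq) if table[c] > 0]
    if not values:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Toy reference: alanine encoded three times by GCT and once by GCC.
biased = rscu_table(["GCTGCTGCTGCC"])
score = rscu_gene("GCTGCT", biased)  # about 3.0, since RSCU(GCT) = 3 / (4 / 4)
```

A gene built only from the preferred codon scores well above 1, whereas a gene that uses all synonymous codons equally scores 1, which is the behaviour the RSCU_gene thresholds in the Results rely on.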

In order to measure the nonuniformity of codon usage within a group of synonymous codons for a gene, the effective number of codons (Nc) (Fuglsang, 2004; Wright, 1990) was calculated using codon usage information [Eq. 3]:

\begin{align*}
\mathrm{Nc} = \mathrm{Codon}_{max} - \mathrm{Codon}_{gene} \quad\quad [\mathrm{Eq.\ 3}]
\end{align*}

where Codon_max is the maximum number of available codons, excluding stop codons, and Codon_gene is the number of codons used in a gene.

Expression of the genes that were suggested to be highly expressed on the basis of codon usage bias was also validated against cDNA microarray data available in the Gene Expression Omnibus (GEO) database (www.ncbi.nlm.nih.gov/geo/) under series GSE14625, GSE14983, and GSE14998 (Hernandez-Morales et al., 2009).

Annotation

The finally selected genes (probable drug targets) were subjected to Blast2GO (Conesa et al., 2005) to determine their metabolic pathways and their cellular and functional annotation.

Subtraction of genes shared by nitrogen fixing bacteria

In order to avoid undesired targeting of the symbiotic R. etli, the finally selected genes from P. syringae pv. phaseolicola were compared for sequence similarity with protein sequences of R. etli, using NCBI blastx (Altschul et al., 1997).

Results

Genes of interest

Similarity search for a complete set of genes in DEG led us to identify 158 essential genes for P. syringae pv. phaseolicola. However, 21 of these were found sharing considerable sequence similarity with nucleotide sequence of P. vulgaris and were thus eliminated from further analysis. The remaining 137 genes built our collection of ‘genes of interest’.

RSCU and Nc of genes of interest

All the genes could be classified into three categories: high expression (RSCU_gene ≥1.1), moderate expression (0.9 ≤ RSCU_gene < 1.1), and low expression (RSCU_gene ≤0.89). Only 16% (22 genes) of the genes fell into the first category of high expression (Table 1). All of these genes also had a gene expression ratio (Log₂) ranging between +0.316 and +1.533, based on microarray data analysis (Table 1). The remaining genes were divided fairly equally between the moderate and low expression categories.
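The classification rule above can be written as a short sketch. The function name is hypothetical, and values falling between 0.89 and 0.9 are assigned to the low category here, which is an assumption because the text leaves that gap open.

```python
from collections import Counter


def expression_class(rscu_gene):
    """Assign an expression category from RSCU_gene using the cut-offs in the text.

    Values between 0.89 and 0.9 are treated as 'low' (an assumption).
    """
    if rscu_gene >= 1.1:
        return "high"
    if rscu_gene >= 0.9:
        return "moderate"
    return "low"


# Hypothetical RSCU_gene values, used only to show the call pattern.
example_scores = [1.46, 1.10, 1.05, 0.90, 0.89, 0.72]
summary = Counter(expression_class(v) for v in example_scores)

# Share of highly expressed genes reported in the text: 22 of 137.
high_share_percent = round(100 * 22 / 137)
```

With the counts given in the text, 22 of 137 genes corresponds to about 16%, which matches the reported proportion.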

Table 1.

Statistics of Highly Expressed Genes

S. No.	Gene ID	Product name	Locus_tag	RSCU	Nc	Gene expression ratio (Log₂)*
1	3555703	30S ribosomal protein S13	PSPPH_4570	1.46	35	1.533
2	3559360	Nucleoside diphosphate kinase	PSPPH_1316	1.42	31	0.783
3	3560114	Scaffold protein	PSPPH_1310	1.34	37	0.893
4	3556741	30S ribosomal protein S12	PSPPH_4597	1.34	37	1.186
5	3557914	30S ribosomal protein S20	PSPPH_0718	1.3	33	1.06
6	3556008	30S ribosomal protein S9	PSPPH_4125	1.3	35	1.076
7	3558615	Ribonuclease H	PSPPH_1712	1.281	38	0.67
8	3558452	50S ribosomal protein L22	PSPPH_4587	1.243	38	0.746
9	3559333	50S ribosomal protein L29	PSPPH_4584	1.21	40	0.8
10	3559908	50S ribosomal protein L1	PSPPH_4602	1.209	39	0.946
11	3560210	Inorganic pyrophosphatase	PSPPH_4672	1.204	40	0.386
12	3557790	50S ribosomal protein L13	PSPPH_4126	1.148	40	1.2
13	3555949	30S ribosomal protein S7	PSPPH_4596	1.141	40	0.756
14	3558283	Iron-sulfur cluster-binding protein	PSPPH_4343	1.126	44	0.796
15	3559215	RNA polymerase-binding protein DksA	PSPPH_0861	1.125	43	0.826
16	3555815	Shikimate kinase	PSPPH_0395	1.124	43	0.566
17	3558487	Electron transfer flavoprotein subunit beta	PSPPH_1964	1.12	42	0.576
18	3558018	30S ribosomal protein S11	PSPPH_4569	1.109	44	1.3
19	3557299	50S ribosomal protein L6	PSPPH_4577	1.109	44	0.732
20	3555832	FKBP-type peptidyl-prolyl cis-trans isomerase	PSPPH_0723	1.1	40	0.553
21	3560375	Succinate dehydrogenase, cytochrome b556 subunit	PSPPH_1976	1.1	45	0.316
22	3558843	50S ribosomal protein L3	PSPPH_4592	1.1	44	0.586

*Mean of gene expression values (test/normal) from cDNA microarray experiments (A: GSE14625, B: GSE14983, and C: GSE14998). All values are positive, indicating high expression of these genes.

For all the genes analyzed, Nc values ranged from 31 to 59, but for highly expressed genes, it ranged from 31 to 45 (Table 1). Pattern analysis for Nc and RSCU_gene suggested that there is a strong negative correlation between Nc and RSCU_gene (Fig. 1) with Pearson coefficient of correlation (r) being −0.89.
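The correlation analysis can be sketched as follows, using only the RSCU and Nc columns of the 22 highly expressed genes in Table 1. Because the reported coefficient appears to refer to all genes analyzed, the value computed from this subset is not expected to reproduce it exactly; the helper function is written out by hand and is illustrative only.

```python
import math

# RSCU and Nc columns of Table 1 (22 highly expressed genes, in table order).
rscu = [1.46, 1.42, 1.34, 1.34, 1.3, 1.3, 1.281, 1.243, 1.21, 1.209, 1.204,
        1.148, 1.141, 1.126, 1.125, 1.124, 1.12, 1.109, 1.109, 1.1, 1.1, 1.1]
nc = [35, 31, 37, 37, 33, 35, 38, 38, 40, 39, 40,
      40, 40, 44, 43, 43, 42, 44, 44, 40, 45, 44]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


r_table1 = pearson_r(rscu, nc)  # negative: Nc falls as RSCU rises
```

A negative coefficient on this subset is consistent with the direction of the relationship reported for the full gene set.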

FIG. 1.

Relation between Nc and RSCU_gene. (A) Low-expression genes; (B) moderate-expression genes; (C) high-expression genes.

Gene ontology and subtractive comparative genomics against R. etli

The 22 genes shortlisted as targets were further subjected to gene ontology analysis using Blast2GO (Table 2). More than 45% of the shortlisted genes were found to code for proteins involved in metabolic processes of biopolymers (i.e., nucleic acids and proteins), including cellular metabolic process, macromolecule metabolic process, and nitrogen compound metabolic process. The rest of the gene products were found to be associated with other vital biological processes, including biosynthesis and biogenesis processes (Fig. 2).

FIG. 2.

Summary of the biological processes in which products of the predicted targets are involved.

Table 2.

Summary of Predicted Metabolic Pathway Using KEGG Pathway Analysis

S. No.	Seq ID	Enzyme/Enzyme ID	Pathways	Pathway ID
1	PSPPH_0395	Shikimate kinase/EC:2.7.1.71	Phenylalanine, tyrosine, and tryptophan biosynthesis	Path.map00400
			Metabolic pathways	Path.map01100
			Biosynthesis of alkaloids derived from shikimate pathway	Path.map01063
			Biosynthesis of phenylpropanoids	Path.map01061
			Biosynthesis of secondary metabolites	Path.map01110
2	PSPPH_1316	Nucleoside-diphosphate kinase/EC:2.7.4.6	Metabolic pathways	Path.map01100
			Pyrimidine metabolism	Path.map00240
			Purine metabolism	Path.map00230
			Biosynthesis of secondary metabolites	Path.map01110
3	PSPPH_4672	Inorganic pyrophosphatase/EC:3.6.1.1	Oxidative phosphorylation	Path.map00190

In correspondence with the above results, the largest number of gene products were found to have binding activity. Around one-fourth of the genes encoded structural constituents of the ribosome (Fig. 3). Most of the gene products were mapped as a part of the cell or organelle, and the rest of the products mapped to intracellular locations (Fig. 4).

FIG. 3.

Molecular function of the products of the predicted drug target genes.

FIG. 4.

Summary of the cellular components where products of the predicted target genes are localized.

Ten pathways, as listed in Table 2, were identified in which three of the ‘target gene products’ participate. For five of the gene products (encoded by PSPPH_1310, PSPPH_4343, PSPPH_0861, PSPPH_1964, and PSPPH_1976) no enzyme codes were found, and for the remaining gene products no corresponding data could be curated from the KEGG database. Whereas homologs of 20 of the shortlisted genes could be traced in R. etli, no homolog could be found for two gene products (PSPPH_1310 and PSPPH_0723).

Discussion

Drug target identification is an important and sensitive first step in the drug discovery process, and a target must satisfy various selection criteria to pass to the next stage (Hefti, 2008; Lipinski et al., 2001). Identifying microbe-specific ‘essential’ proteins that can be targeted when designing drugs is one possible way to control crop diseases. We used gene sequence information to predict drug targets in P. syringae pv. phaseolicola by subtractive genomics, followed by a codon usage approach to evaluate gene expression levels. We found that the combination of these two approaches (sequence pattern and codon bias) provides a good method for finding drug targets. The combined approach used here differs from earlier methods (Bakheet and Doig, 2010; Katara et al., 2011; Sakharkar et al., 2008) and also uses a second subtraction step to avoid accidental targeting of ‘friendly species’. Extensive literature surveys suggest that such methods have not previously been applied to phytopathogens and plant hosts.

Along the lines of Sakharkar et al. (2008) and Katara et al. (2011), 137 genes that are essential for P. syringae pv. phaseolicola and do not share any significant similarity with the host genes were identified as ‘genes of interest’. In addition to being essential and unique to the pathogen, a candidate gene also needs to be highly expressed to be of interest to plant pathologists. A highly expressed gene would code for a sufficiently high number of protein molecules available as drug targets. A protein against which a drug is designed should be available in sufficient stoichiometric amounts in the cells at all time points across the diversity of life stages and stress conditions. We identified 22 genes as highly expressed based on RSCU_gene (≥1.1) analysis and verified this using microarray data analysis (Table 1). Analysis of available cDNA microarray data for expression of genes of P. syringae pv. phaseolicola under series GSE14625 (response to bean pod extract), GSE14983 (response to apoplastic fluid of bean leaf), and GSE14998 (response to bean leaf extract) (Hernandez-Morales et al., 2009) verified the theoretical statistical results obtained for these 22 genes through RSCU analysis (Table 1), and thus all of these can be considered potential targets for control of P. syringae pv. phaseolicola.

The effective number of codons (Nc) in any given gene signifies the use of a specific number of synonymous codons for their respective amino acids. When all the sense codons are used randomly, Nc has a value of 61. Similarly, it takes an extreme value of 20 when only one synonym is used for each amino acid. Thus, the lower the Nc value, the stronger the codon bias, and vice versa (Hassan et al., 2009). In the present study, a strong negative correlation was obtained between RSCU_gene and Nc, suggesting that these two measures in combination can be used as a powerful tool for prediction of gene expression levels, especially in cases where microarray data are not available. We have further been successful in annotating metabolic pathways for three of the 22 enzymes/proteins identified as drug targets (Table 2). Enzymes such as shikimate kinase and nucleoside diphosphate kinase, which are indispensable because they are involved in more than one pathway, are ideal drug targets (Saidemberg et al., 2011).

It must be remembered that a drug designed to target an enzyme that is universal in nature can also eliminate friendly and symbiotic microbial species such as Rhizobium sp., in addition to the pathogen. In the present context, however, it should be noted that for the control of P. syringae pv. phaseolicola the drug must be applied to seeds (Arnold et al., 2011), whereas R. etli, the friendly symbiont of Phaseolus vulgaris, inhabits the rhizosphere. Nevertheless, to further reduce the risk of accidentally targeting R. etli through inappropriate or alternative modes of drug application (such as spraying on a standing crop), we compared the probable target genes with the proteome of R. etli using NCBI blastx. For two proteins out of 22 (i.e., scaffold protein and FKBP-type peptidyl-prolyl cis-trans isomerase), no significant matches were obtained between P. syringae pv. phaseolicola and R. etli.

Our results thus provide starting material for the future discovery of drugs against P. syringae pv. phaseolicola. Twenty-two drug targets have been suggested. Because halo blight caused by P. syringae pv. phaseolicola is a seed-borne disease, seed disinfection is the most efficient method for providing pathogen-free seeds of P. vulgaris. Scaffold protein and FKBP-type peptidyl-prolyl cis-trans isomerase can be exploited as targets for designing drugs to be applied on standing crops. Although experimental validation of all these targets is still recommended, we believe that both the results and the methods discussed above are likely to be of use to scientists working in agrigenomics translational research.

Acknowledgments

The authors acknowledge the DBT center for bioinformatics facility at the Department of Bioscience and Biotechnology, Banasthali University, Banasthali, India, for providing essential facilities for completion of this research work. The authors also wish to thank the anonymous reviewers and the handling editor for constructive criticism, which contributed to the evolution of this manuscript to its present state.

Author Disclosure Statement

No competing financial interests exist.

References

Agarwal, Grover. 2008. Nucleotide composition and amino acid usage in AT-rich hyperthermophilic species. Open Bioinform J, 2:11–19.

Altschul, Madden, Schaffer, et al. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res, 25:3389–3402.

Arnold, Lovell, Jackson, Mansfield. 2011. Pseudomonas syringae pv. phaseolicola: From 'has bean' to supermodel. Mol Plant Pathol, 12:617–627.

Bakheet, Doig. 2010. Properties and identification of antibiotic drug targets. BMC Bioinform, 11:195.

Conesa, Gotz, Garcia-Gomez, Terol, Talon, Robles. 2005. Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics, 21:3674–3676.

Fuglsang. 2004. The 'effective number of codons' revisited. Biochem Biophys Res Commun, 317:957–964.

Godfrey, Mansfield, Corry, Lovell, Jackson, Arnold. 2010. Confocal imaging of Pseudomonas syringae pv. phaseolicola colony development in bean reveals reduced multiplication of strains containing the genomic island PPHGI-1. Mol Plant Microbe Interact, 23:1294–1302.

Hassan, Mahalingam, Kumar. 2009. Synonymous codon usage analysis of thirty two Mycobacteriophage genomes. Adv Bioinformatics, 316936.

Hefti. 2008. Requirements for a lead compound to become a clinical candidate. BMC Neurosci, 9:S7.

Hernandez-Morales, De La Torre-Zavala, Ibarra-Laclette, et al. 2009. Transcriptional profile of Pseudomonas syringae pv. phaseolicola NPS3121 in response to tissue extracts from a susceptible Phaseolus vulgaris L. cultivar. BMC Microbiol, 9:257.

Joardar, Lindeberg, Jackson, et al. 2005. Whole-genome sequence analysis of Pseudomonas syringae pv. phaseolicola 1448A reveals divergence among pathovars in genes involved in virulence and transposition. J Bacteriol, 187:6488–6498.

Katara, Grover, Kuntal, Sharma. 2011. In silico prediction of drug targets in Vibrio cholerae. Protoplasma, 248:799–804.

Lipinski, Lombardo, Dominy, Feeney. 2001. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev, 46:3–26.

Lovell, Jackson, Mansfield, et al. 2011. In planta conditions induce genomic changes in Pseudomonas syringae pv. phaseolicola. Mol Plant Pathol, 12:167–176.

Puigbo, Bravo, Garcia-Vallve. 2008. CAIcal: A combined set of tools to assess codon usage adaptation. Biol Direct, 3:38.

Rajendran, Purohit, Sethumadhavan. 2012. In silico investigation of molecular mechanism of laminopathy caused by a point mutation (R482W) in lamin A/C protein. Amino Acids, 43:603–615.

Rico, McCraw, Preston. 2011. The metabolic interface between Pseudomonas syringae and plant cells. Curr Opin Microbiol, 14:31–38.

Saidemberg, Passarelli, Rodrigues, Basso, Santos, Palma. 2011. Shikimate kinase (EC 2.7.1.71) from Mycobacterium tuberculosis: Kinetics and structural dynamics of a potential molecular target for drug development. Curr Med Chem, 18:1299–1310.

Sakharkar, Sakharkar, Chow. 2008. Biocomputational strategies for microbial drug target identification. Methods Mol Med, 142:1–9.

Sharp, Li. 1987. The codon Adaptation Index--A measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res, 15:1281–1295.

Vencato, Tian, Alfano, et al. 2006. Bioinformatics-enabled identification of the HrpL regulon and type III secretion system effector proteins of Pseudomonas syringae pv. phaseolicola 1448A. Mol Plant Microbe Interact, 19:1193–1206.

Wright. 1990. The 'effective number of codons' used in a gene. Gene, 87:23–29.

Zhang, Lin. 2009. DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes. Nucleic Acids Res, 37:D455–458.