Prediction of Essential Proteins of Klebsiella pneumoniae using Integrative Bioinformatics and Systems Biology Approach: Unveiling New Avenues for Drug Discovery

Abstract

Klebsiella pneumoniae is an opportunistic multidrug-resistant bacterial pathogen responsible for various health care-associated infections. The prediction of proteins that are essential for the survival of bacterial pathogens can greatly facilitate the drug development and discovery pipeline toward target identification. To this end, the present study reports a comprehensive computational approach integrating bioinformatics and systems biology-based methods to identify essential proteins of K. pneumoniae involved in vital processes. From the proteome of this pathogen, we predicted a total of 854 essential proteins based on sequence, protein–protein interaction (PPI) and genome-scale metabolic model methods. These predicted essential proteins are involved in vital processes for cellular regulation such as translation, metabolism, and biosynthesis of essential factors, among others. Cluster analysis of the PPI network revealed the highly connected modules involved in the basic functionality of the organism. Further, the predicted consensus set of essential proteins of K. pneumoniae was evaluated by comparing them with existing resources (NetGenes and PATHOgenex) and literature. The findings of this study offer guidance toward understanding cell functionality, thereby facilitating the understanding of pathogen systems and providing a way forward to shortlist potential therapeutic candidates for developing novel antimicrobial agents against K. pneumoniae. In addition, the research strategy presented herein is a fusion of sequence and systems biology-based approaches that offers prospects as a model to predict essential proteins for other pathogens.

Introduction

Klebsiella pneumoniae is an opportunistic bacterial pathogen that causes several hospital-acquired infections, including pneumonia, urinary tract infection, and bacteremia, in immunocompromised patients in health care settings (CDC.gov, 2023; Chang et al., 2021). In recent years, this pathogen has become a significant public health threat due to the emergence of multidrug-resistant and hypervirulent strains, which have been reported in various parts of the world (Awoke et al., 2021; Banerjee et al., 2021; Chen et al., 2023). Indeed, K. pneumoniae has developed resistance to multiple antibiotics, including last-resort agents such as colistin and carbapenems, through various resistance mechanisms (Petrosillo et al., 2019). The increased mortality associated with multidrug-resistant strains of K. pneumoniae demands the development of new therapeutic interventions for the infections caused by this pathogen (Li et al., 2023; Xu et al., 2017).

In this scenario, several studies have attempted to identify potential targets and drugs in K. pneumoniae using various omics-based approaches (Ali et al., 2022; Ramos et al., 2018; Serral et al., 2022), while others have sought to enhance existing treatment options (Bayatinejad et al., 2023; Shein et al., 2021).

For effective therapeutics discovery, exploring the essential genes/proteins of a pathogenic organism can help in comprehending the pathogen system and pinpointing factors that are vital to key biological processes (Plaimas et al., 2010). The proteins encoded by the essential genes of an organism are termed essential proteins; they are indispensable for the growth and survival of the organism. Identifying such essential proteins in a pathogen facilitates the shortlisting of potential target candidates, which can then be used for designing antimicrobial agents. Multiple subtractive genomics-based studies on pathogenic bacteria have used this strategy to filter candidate targets for drug target identification and vaccine development (Khan et al., 2022; Shanmugham and Pan, 2013; Solanki and Tiwari, 2018; Uddin et al., 2018).

These studies mainly relied on a sequence-based approach that compares the pathogen proteins with a known set of essential proteins to select proteins for further analysis. Similarly, network-based methods and genome-scale metabolic model (GSSM) approaches have been reported for the identification of essential genes/proteins in various organisms (Gollapalli et al., 2021; Sertbas and Ulgen, 2020; Wuchty and Uetz, 2014).

Researchers have developed and applied several approaches encompassing both in vitro and in silico analysis to identify essential genes/proteins in various organisms, ranging from archaea and bacteria to eukaryotes (Guo et al., 2021). The identification methods have evolved to utilize various omics data and machine/deep learning-based approaches, which require large, high-quality datasets for such prediction (Aromolaran et al., 2021; Yue et al., 2022). In this scenario, with the availability of multiple types of biological data, including proteome sequences and interactomes, combining several analyses would be advantageous because the predictions consolidate results from different methods. Moreover, prediction of essential proteins in K. pneumoniae using an integrative approach that fuses sequence and systems biology-based methods has not yet been reported.

To this end, the present study aims to predict essential proteins in this pathogen using an integrative computational approach comprising sequence, interaction network, and metabolic model analyses, which would be a first step toward developing novel antibiotic agents.

Materials and Methods

The workflow utilized in the present study to predict essential proteins in K. pneumoniae is depicted in Figure 1.

FIG. 1.

Overview of the methodology adopted in the study comprising various approaches to predict essential proteins in Klebsiella pneumoniae. GSMM/GSSM, Genome-Scale Metabolic Model; PPI, protein–protein interaction.

Data retrieval

The complete reference proteome sequence of K. pneumoniae subsp. pneumoniae (strain ATCC 700721/MGH 78578) was retrieved from UniProtKB (Proteome ID: UP000000265) (UniProt Consortium, 2023). The protein–protein interaction (PPI) network data of K. pneumoniae was mapped using the retrieved whole proteome sequence, with a confidence score ≥0.900 and was imported into the Cytoscape workspace from the STRING database (v11) (Szklarczyk et al., 2019) using StringApp (Doncheva et al., 2019).

Network and cluster analysis

The mapped PPI network of K. pneumoniae was visualized in Cytoscape (version 3.8.2), and various network centrality measures were computed using NetworkAnalyzer, a built-in Cytoscape tool for computing network topological metrics (Shannon et al., 2003). Nodes were ranked with cytoHubba, a Cytoscape app that identifies essential nodes in a network and ranks them based on different topological analysis methods (Chin et al., 2014). Cluster analysis of the constructed PPI network was performed using Molecular Complex Detection (MCODE), a Cytoscape app (Bader and Hogue, 2003), and enrichment analysis of the clusters was carried out with StringApp, excluding redundant terms (p < 0.05).

Essentiality prediction

The present study predicted essential proteins of K. pneumoniae using an integrative bioinformatics and systems biology approach comprising the ensemble of sequence, PPI network, and GSSM methods.

Sequence-based prediction

The Database of Essential Genes (DEG) is an online database that comprises experimentally identified essential genes/proteins of bacteria, archaea, and eukaryotes (Luo et al., 2014). The retrieved whole proteome sequence of Klebsiella pneumoniae MGH 78578 was searched against the essential proteins of all bacteria present in DEG 10 using BLASTp with the criteria of E-value ≤ 1e-05 and query coverage ≥ 80%. A protein was considered essential in the present study if it showed similarity to at least one database sequence under these criteria. In addition, the proteome sequence was subjected to essentiality prediction using Geptop 2.0, a prediction server for gene essentiality of prokaryotes (Wen et al., 2019).
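The similarity criteria above can be written as a simple filter over tabular BLASTp output. The following is a minimal sketch, assuming the hits were exported with a custom four-column layout (for example `-outfmt "6 qseqid sseqid evalue qcovs"`); the function name, the sample identifiers, and the values are hypothetical and are not taken from the study, and the authors may have computed query coverage differently.

```python
def essential_by_similarity(blast_lines, max_evalue=1e-5, min_qcov=80.0):
    """Return query IDs with at least one hit passing both thresholds.

    Each line is assumed to hold four tab-separated fields:
    query ID, subject ID, E-value, query coverage (%).
    """
    essential = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        query_id, _subject_id, evalue, qcov = line.split("\t")
        if float(evalue) <= max_evalue and float(qcov) >= min_qcov:
            essential.add(query_id)
    return essential


# Hypothetical hits, for illustration only.
example_hits = [
    "protA\tDEG_x1\t1e-50\t95",
    "protA\tDEG_x2\t1e-03\t99",  # fails the E-value cutoff
    "protB\tDEG_x3\t1e-20\t60",  # fails the coverage cutoff
    "protC\tDEG_x4\t1e-05\t80",  # passes: both thresholds are inclusive
]
print(sorted(essential_by_similarity(example_hits)))
```

A query is retained as soon as one of its hits satisfies both thresholds, which mirrors the "at least one hit" rule stated above.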

PPI network-based prediction

The nodes of the constructed PPI network of K. pneumoniae were ranked based on network topological parameters using the cytoHubba (11 parameters) and NetworkAnalyzer (clustering coefficient) plugins. For each topological centrality parameter, the top 1000 nodes were considered and compared. A protein node was considered an essential protein from this analysis if it ranked within the top 1000 for at least two centrality parameters.
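The selection rule described above (top-ranked nodes per parameter, retained when they recur in at least two parameters) can be sketched as follows. This is an illustrative sketch rather than the cytoHubba implementation: the function name and the toy scores are hypothetical, every parameter is assumed to rank higher scores first, and ties are broken arbitrarily by input order.

```python
from collections import Counter


def consensus_top_nodes(scores_by_measure, top_n=1000, min_measures=2):
    """Nodes that rank in the top `top_n` of at least `min_measures` measures.

    `scores_by_measure` maps a measure name to a {node: score} dict.
    Higher scores are assumed to rank first for every measure.
    """
    hits = Counter()
    for scores in scores_by_measure.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        hits.update(ranked[:top_n])
    return {node for node, count in hits.items() if count >= min_measures}


# Hypothetical scores for four nodes under three measures.
toy_scores = {
    "degree": {"a": 5, "b": 4, "c": 1, "d": 1},
    "betweenness": {"a": 0.9, "c": 0.5, "b": 0.1, "d": 0.0},
    "closeness": {"d": 0.8, "b": 0.7, "a": 0.2, "c": 0.1},
}
print(sorted(consensus_top_nodes(toy_scores, top_n=2)))
```

With `top_n=2`, nodes `a` and `b` each appear in two of the three top lists and are retained, whereas `c` and `d` appear only once.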

GSSM-based prediction

The GSSM (iYL1228) of Klebsiella pneumoniae MGH 78578 (Liao et al., 2011) present in the BiGG Models database (Norsigian et al., 2020) was analyzed using the MetaNetX platform, an interactive online resource for the automated construction, annotation, and analysis of large-scale metabolic networks (Moretti et al., 2021). A gene/peptide knockout (PKO) analysis was carried out on the iYL1228 model to identify genes/peptides whose knockout hinders the growth of the organism.
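The logic of a single-gene knockout screen can be illustrated on a toy network. The sketch below is deliberately simplified and is not the analysis performed by MetaNetX: instead of optimizing a biomass flux under stoichiometric constraints, it only checks whether a target metabolite remains reachable from the nutrients once the reactions that depend on the deleted gene are removed. The data layout, the function names, and the toy reactions are all hypothetical.

```python
def active_reactions(reactions, knocked_out):
    """Reactions that keep at least one intact enzyme after the knockout.

    Each reaction lists alternative enzymes (isozymes); each enzyme is a
    set of genes that must all be present (a complex).
    """
    return [r for r in reactions
            if any(not (complex_ & knocked_out) for complex_ in r["enzymes"])]


def can_make(target, nutrients, reactions):
    """Expand the set of producible metabolites until nothing new appears."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for r in reactions:
            if r["substrates"] <= available and not r["products"] <= available:
                available |= r["products"]
                changed = True
    return target in available


def essential_genes(genes, nutrients, reactions, target="biomass"):
    """Genes whose single knockout makes the target unreachable."""
    return {g for g in genes
            if not can_make(target, nutrients, active_reactions(reactions, {g}))}


# Hypothetical toy network: g1 has no backup, g2/g3 are isozymes,
# g4 and g5 form an obligate complex.
toy_reactions = [
    {"substrates": {"glc"}, "products": {"A"}, "enzymes": [{"g1"}]},
    {"substrates": {"A"}, "products": {"B"}, "enzymes": [{"g2"}, {"g3"}]},
    {"substrates": {"B"}, "products": {"biomass"}, "enzymes": [{"g4", "g5"}]},
]
toy_genes = ["g1", "g2", "g3", "g4", "g5"]
print(sorted(essential_genes(toy_genes, {"glc"}, toy_reactions)))
```

In this toy case, the isozymes g2 and g3 compensate for each other and are not flagged, whereas g1 and both members of the g4/g5 complex are.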

Finally, if two or more prediction approaches predicted a protein to be essential, then it was considered as an essential protein in this study. This set of essential proteins was compared with NetGenes, an online database comprising machine learning-based prediction of essential genes in 2711 bacteria (Senthamizhan et al., 2021). Further, the expression of these essential proteins was explored using the PATHOgenex platform, a comprehensive RNA atlas of global expression profiles of 32 human pathogens under stress conditions (Avican et al., 2021).

Results and Discussion

Proteome of K. pneumoniae and interactome data

The complete proteome sequence of K. pneumoniae subsp. pneumoniae MGH 78578 comprises 5126 proteins encoded in a single circular chromosome with five plasmids (pKPN3, pKPN4, pKPN5, pKPN6, pKPN7). Out of 5126 proteins, only 715 were in the reviewed category in UniProtKB, indicating that these entries have been manually curated and annotated. The genome accessions and the protein count are provided in Supplementary Table S1. The PPI data from the STRING database for K. pneumoniae had 5797 distinct protein-coding genes belonging to the core STRING type. The PPI data of K. pneumoniae in the STRING database is based on integrated data from various resources. This core set represents the most reliable and best supported PPI data, based on multiple lines of evidence and biological relevance.

Upon mapping the retrieved sequence data to the PPI data of the STRING database, a total of 4447 proteins were mapped with an interaction score ≥0.900 (highest confidence). This reduction in the number of proteins mapped (4447 out of 5126) to the PPI data was due to the stringent cutoff of the confidence score, limited data/annotations and interactome complexity.

PPI network of K. pneumoniae

The mapped PPI data of K. pneumoniae, imported into Cytoscape, was analyzed with global and local network topological parameters. In the PPI network, the proteins are represented as nodes, and their interactions are represented as edges. The constructed PPI network of K. pneumoniae comprised 4447 nodes with 8897 interactions (Supplementary Fig. S1), with 7.67 as the average number of neighbors (Table 1). The average number of neighbors refers to the average degree of connectivity between the proteins, that is, the average number of interacting partners each protein has in the network.

The average network degree can be computed by summing the degrees of all nodes and dividing by the total number of nodes in the network. The degree of a node (protein) is defined as the number of edges (interactions) connected to the specific protein in the network. The degree suggests the importance of a node (protein), where proteins with high degrees are referred to as hubs, which play a central role in the network and are crucial for various biological functions (Zotenko et al., 2008).
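As an illustration of these definitions, the degree of each node and the average degree can be computed directly from an edge list. This is a minimal sketch on a hypothetical toy network, assuming an undirected graph without self-loops or duplicate edges; the function names are illustrative.

```python
from collections import Counter


def node_degrees(edges, nodes=()):
    """Degree of every node in an undirected edge list.

    `nodes` may list extra nodes so that isolated ones count with degree 0.
    """
    degree = Counter({node: 0 for node in nodes})
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def average_degree(edges, nodes=()):
    """Sum of all node degrees divided by the number of nodes."""
    degree = node_degrees(edges, nodes)
    return sum(degree.values()) / len(degree)


# Hypothetical toy network: A is the hub, E is isolated.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
toy_nodes = ["A", "B", "C", "D", "E"]
degrees = node_degrees(toy_edges, toy_nodes)
print(degrees.most_common(1))                # the highest-degree node
print(average_degree(toy_edges, toy_nodes))
```

Whether isolated nodes are included in the denominator changes the average, so this choice should be stated when such values are reported.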

In contrast, proteins with a lower degree have fewer interactions and may be involved in specific interactions for a specialized function within a cell. Analyzing the degree distribution in the PPI network can provide insights into the overall topology and structure of the network. It assists in identifying highly connected proteins that can act as key regulators or mediators in cellular processes and also in understanding how proteins interact to form functional modules or participate in a pathway. In addition, it indicates the robustness and vulnerability of the network to perturbations. The computed global network statistical properties of the constructed network are given in Table 1. The degree distribution of the constructed PPI network of K. pneumoniae (Fig. 2) exhibited a long-tail pattern following a power-law distribution, indicating the scale-free property of the network.

FIG. 2.

Scatter plot of the degree distribution of the constructed PPI network of Klebsiella pneumoniae.

Table 1.

Global Network Topology of the Constructed Protein–Protein Interaction Network of Klebsiella pneumoniae

Network statistics
Number of nodes	4447
Number of edges	8897
Avg. number of neighbors	7.67
Network diameter	17
Network radius	8
Characteristic path length	6.13
Clustering coefficient	0.44
Network density	0.004
Network heterogeneity	1.56
Network centralization	0.065
Connected components	1881

The power-law distribution implies that a small number of nodes (proteins), referred to as hubs, have many more interactions (a higher degree) than most of the proteins, which have few interactions. This signifies the network's resilience to random perturbations, whereas the hubs act as crucial points whose loss can potentially disrupt cellular functional regulation (Albert, 2005).
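A rough way to examine such a long-tailed degree distribution is to fit a straight line to log P(k) against log k, where P(k) is the fraction of nodes with degree k; for a power law P(k) ∝ k^(−γ), the slope is −γ. The sketch below is an assumption-laden illustration and not the procedure used in the study: the function name and the synthetic degree list are hypothetical, and a log-log regression of this kind is only a rough diagnostic rather than a rigorous test of scale-freeness.

```python
import math
from collections import Counter


def power_law_exponent(degrees):
    """Least-squares slope of log P(k) versus log k, returned as gamma.

    P(k) is the fraction of nodes with degree k; nodes with degree 0 are
    ignored because log(0) is undefined.
    """
    counts = Counter(k for k in degrees if k > 0)
    total = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(n / total) for n in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope


# Synthetic degrees constructed so that the node count is proportional to k**-2.
synthetic_degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
gamma = power_law_exponent(synthetic_degrees)
print(round(gamma, 3))
```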

The network radius is the smallest eccentricity in the network, that is, the shortest path length from the most central node (protein) to the node farthest from it, and reflects the spread or dispersion of the network. The smaller the radius, the more compact and tightly interconnected the network is, with most of the proteins lying close to each other. Herein, the PPI network of K. pneumoniae had a network radius of eight, indicating that information or a signal from the most central node (protein) must traverse eight edges to reach its farthest node. The characteristic path length of the K. pneumoniae PPI network was 6.13. It is the average shortest path length, in edges, between pairs of nodes in the network and reflects the efficiency of information flow.
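These path-based statistics can be illustrated with a breadth-first search on a small graph. The sketch below assumes a connected, undirected, unweighted graph stored as an adjacency dictionary; the function names and the toy chain are hypothetical, and tools such as NetworkAnalyzer may treat disconnected components differently.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest-path length (in edges) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def path_statistics(adjacency):
    """Radius, diameter and characteristic path length of a connected graph."""
    eccentricities = []
    total, pairs = 0, 0
    for source in adjacency:
        dist = bfs_distances(adjacency, source)
        eccentricities.append(max(dist.values()))  # farthest node from source
        total += sum(dist.values())
        pairs += len(dist) - 1
    return min(eccentricities), max(eccentricities), total / pairs


# Hypothetical toy network: a chain A-B-C-D-E.
toy_chain = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
radius, diameter, char_path_length = path_statistics(toy_chain)
print(radius, diameter, char_path_length)
```

For this chain, the central node C reaches every other node within two edges (radius 2), the two ends are four edges apart (diameter 4), and the average shortest path over all node pairs is 2.0.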

The clustering coefficient of the PPI network was 0.44, reflecting the local interconnectedness among the nodes: on average, 44% of the possible connections between the neighbors of a node are present. This relatively high clustering coefficient suggests the existence of tightly connected complexes in the network. Several attributes, such as data availability, experimental techniques, and functional specificity, contribute to the representation of networks, which could be the limiting factors in network analysis.
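The clustering coefficient can be made concrete with a short computation. The sketch below assumes the common definition in which the local coefficient of a node is the fraction of its neighbor pairs that are directly connected, and the network value is the average over all nodes, with nodes that have fewer than two neighbors counted as zero; this convention, the function names, and the toy network are assumptions for illustration.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adjacency[node]
    if len(neighbors) < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    linked = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    possible = len(neighbors) * (len(neighbors) - 1) / 2
    return linked / possible


def average_clustering(adjacency):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)


# Hypothetical toy network: triangle A-B-C with a pendant node D on A.
toy_graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(round(average_clustering(toy_graph), 4))
```

Here B and C each score 1, A scores 1/3 because only one of its three neighbor pairs is connected, and D scores 0, giving an average of 7/12.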

The analysis of the global properties of the K. pneumoniae PPI network provides insight into various important structural and functional properties of the network, such as scale-freeness, hubs, information flow, and connectedness. The scale-free topology indicates the network's resilience to perturbation, signifying the role of hubs in regulating cellular processes that are potentially essential to the organism. These findings enhance our understanding of K. pneumoniae protein interactions and their functional and organizational implications. In addition, they could shed light on potential key regulators or pathways for further exploration to combat infections and develop therapeutic strategies.

Cluster and enrichment analysis

The cluster analysis of the constructed PPI network of K. pneumoniae resulted in 211 clusters with a minimum degree of >2. It revealed densely connected modules whose members have significant functional relationships and are likely to be involved in a common biological pathway or process. The 211 clusters indicate that the PPI network is modular with distinct functional units. Each identified cluster was associated with a specific biological process, functional module, or protein complex. The top 10 clusters (Supplementary Fig. S2), identified based on the MCODE analysis and scores, are listed in Table 2. Enrichment analysis of these modules revealed the gene ontology terms related to the clusters and the pathways associated with the cellular processes.

Table 2.

Top 10 Cluster Scores Based on MCODE Analysis of the Constructed Protein–Protein Interaction Network of Klebsiella pneumoniae with the Number of Nodes and Edges

Cluster	Score (density × number of nodes)	Nodes	Edges
1	56.6	58	1615
2	17.8	18	152
3	15.6	16	117
4	11	12	61
5	11	11	55
6	10.8	11	54
7	10.182	12	56
8	10	10	45
9	9.1	10	41
10	8.5	9	34

MCODE, Molecular Complex Detection.
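The scores in Table 2 can be reproduced from the node and edge counts, given the definition in the column header (density multiplied by the number of nodes). The sketch below assumes that density is the fraction of possible undirected edges present, excluding self-loops; the function names are illustrative.

```python
def cluster_density(n_nodes, n_edges):
    """Edge density of an undirected cluster without self-loops."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def cluster_score(n_nodes, n_edges):
    """Score as given in the Table 2 header: density times number of nodes."""
    return cluster_density(n_nodes, n_edges) * n_nodes


# (nodes, edges) for clusters 3, 5 and 7 of Table 2
for n_nodes, n_edges in [(16, 117), (11, 55), (12, 56)]:
    print(n_nodes, n_edges, round(cluster_score(n_nodes, n_edges), 3))
```

Under this assumption the three clusters score 15.6, 11.0, and 10.182, in agreement with the tabulated values; cluster 5 reaches the maximum possible score for 11 nodes because all 55 possible edges are present.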

The top three clusters with the highest scores are shown with STRING-based representations in Figure 3. Cluster 1 consisted of 58 nodes with 1615 edges, mostly ribosomal proteins (L13, S20, L2, L9, S17, L35, etc.) involved in rRNA (ribosomal RNA) or tRNA (transfer RNA) binding. Ribosomes are essential cellular machines responsible for protein synthesis and comprise both rRNA and ribosomal proteins. Cluster 1 captured the ribosomal complexes, including the subunits that together carry out translation, an essential process in cellular functioning. Moreover, the dominance of ribosomal proteins in this cluster implies the role of these proteins in cellular regulation, homeostasis, and growth. In this cluster, ribosome, RNA polymerase, and protein export were found to be significantly enriched KEGG pathways, and nucleic acid binding (oligonucleotide/oligosaccharide-binding fold), zinc-binding ribosomal protein, beta-barrel domain, translation protein SH3-like domain superfamily, and ribosomal protein S5 domain 2-type fold were enriched terms in InterPro domains.

FIG. 3.

Top 3 clusters of the Klebsiella pneumoniae protein–protein interaction network showing various nodes (proteins) interacting as functional modules associated with different vital biological processes.

Cluster 2 had 18 nodes with 152 edges that were enriched with proteins involved in cobalamin biosynthesis, including the tetrapyrrole methylase, CbiD, CbiC, CobU, CbiF, CbiH, CbiO, and so on, which signifies a functionally cohesive group of proteins involved in cobalamin biosynthesis. It has been reported that CbiD is essential for cobalamin biosynthesis in both Salmonella typhimurium and Bacillus megaterium (Raux et al., 1998). It is known that bacteria synthesize cobalamin, which is an essential coenzyme for vital cellular processes such as nucleic acid metabolism and amino acid synthesis. This cluster signifies the interactions between proteins coordinated in cobalamin synthesis, which is important for bacterial growth and metabolism. The pathway enrichment of cluster 2 showed significant pathways that the proteins were involved in, namely pyruvate metabolism, carbon metabolism, biosynthesis of secondary metabolites, and propanoate metabolism, which are essential for cellular regulation and metabolism.

Similarly, cluster 3 comprised 16 nodes with 117 edges, mainly subunits of the NADH-quinone oxidoreductase complex (NADH, reduced nicotinamide adenine dinucleotide), such as NuoA, NuoB, NuoC, NuoH, NuoI, NuoN, and NuoK. The cluster was enriched in ubiquinone and peptidase M16 activity with NADH-quinone oxidoreductase/Mrp antiporter domains, indicating a potential role in substrate or ion transport across membranes. The NADH-quinone oxidoreductase complex has been reported to have a key role in bacterial respiration for ATP generation (Friedrich et al., 2016). The protein peptidase M16 belongs to the metallopeptidase family and has diverse functional roles, including proteolytic activity, that are important for protein processing and degradation (Turner and Nalivaeva, 2011). The proteins of this cluster were mainly involved in electron transport and energy production for cellular respiration. The presence of ATP-binding cassette transporters in the identified cluster indicates their involvement in nutrient uptake and metabolite efflux in the bacterial cell for maintaining homeostasis (Davidson et al., 2008).

The pathway enrichment analysis of the cluster showed that the proteins were mainly involved in porphyrin metabolism and transport. Porphyrin metabolism is associated with heme synthesis, which is important for several cellular processes such as electron transport, oxygen binding, and so on (Dailey et al., 2017).

The cluster analysis with enrichment of the modules identified from the PPI network unveiled the modularity property of the network involved in distinct functions/processes. The top three clusters comprised proteins associated with various essential functions such as translation, cobalamin biosynthesis, energy metabolism, and transport. These findings highlight the functional organization of clusters in the PPI network belonging to biological pathways and processes that are central to the cellular functioning of K. pneumoniae.

Predicted essential proteins of K. pneumoniae

The prediction of essential genes/proteins in an organism aids in identifying factors that are essential to the organism's survival and thus can be considered as drug target candidates. These factors can be utilized for the development of drugs with selective binding to avoid side effects on the host. In addition, they also help in understanding the core cellular functional regulations, which have broad applications in the area of applied biotechnology and biomedicine (Dong et al., 2020). The current study utilized a combination of ensemble approaches comprising integrative systems biology and bioinformatics methods to predict the essential proteins in K. pneumoniae.

The sequence-based prediction identified essential proteins based on similarity to existing experimentally determined essential proteins presented in the DEG database. The concept in this similarity-based method is that proteins that have a high similarity to known essential proteins of other organisms/pathogens are potentially essential as they could be homologs or orthologs in nature. These proteins are often found to be highly conserved across different closely related species, maintaining functional conservation that is essential to basic cellular functionality, which can assist in studying evolution and biological functional conservation.

The sequence-based approach resulted in 744 K. pneumoniae proteins that were similar to essential proteins identified in various bacterial strains of Escherichia coli, Salmonella enterica, Providencia stuartii, Shewanella oneidensis, Pseudomonas aeruginosa, Vibrio cholerae, Haemophilus influenzae, and so on. Thus, these 744 proteins are likely to be essential in K. pneumoniae. These organisms are gram-negative bacteria, indicating the conservation of these proteins across different species. For instance, the ribosomal proteins and subunits (large and small) identified as essential had high similarity (≈100%) and are known to be highly conserved across bacterial species (van den Elzen et al., 2023; Lecompte et al., 2002).

Other proteins identified based on the similarity approach are peptidyl-tRNA hydrolase, recombinase A, dihydroorotase, Ribonuclease 3, pyridoxine/pyridoxamine 5′-phosphate oxidase, queuine tRNA-ribosyltransferase, and so on. These proteins were found to be involved in major essential processes such as tRNA processing, translation, DNA recombination, ribosomal small subunit biogenesis, RNA catabolic process, regulation of DNA-templated transcription, nicotinamide adenine dinucleotide phosphate (NADP) metabolic process, and cell wall organization, including lipid A biosynthetic process and peptidoglycan biosynthetic process. These processes are vital for regular cellular functionality, and the potential indispensability of these proteins warrants further investigation toward drug development. Since the similarity-based approach is highly dependent on the conservation and annotations of gene/protein information, it relies on the availability of experimental data.

In this context, prediction-based methods using the sequence data could provide reliable predictions at the genome scale for gene essentiality. One such tool is Geptop 2.0, which draws on public essential gene repositories to identify essential genes in prokaryotes. This tool predicted 394 proteins as essential proteins in K. pneumoniae. For example, the proteins UDP-N-acetylmuramate—L-alanine ligase (encoded by murC), small ribosomal subunit protein uS4 (rpsD), phenylalanine—tRNA ligase alpha subunit (pheS), uridylate kinase (pyrH), and elongation factor Ts (tsf) were some of the proteins with high essentiality scores. Geptop classifies a protein as essential (1) or non-essential (0) based on essentiality scores that are defined by a cumulative formula and normalized to a range from zero to one. The closer the score is to one, the more likely the gene/protein is to be essential; the reported average area under the curve (AUC) of the tool is ∼0.84 (Wen et al., 2019).

Functional analysis of the Geptop-predicted essential proteins showed that these proteins were mainly involved in key biological processes, including translation, peptidoglycan biosynthesis, ATP binding, rRNA binding, lipoprotein biosynthesis, methionyl-tRNA aminoacylation, and so on. The results of the prediction-based method were in accordance with the similarity-based approach in revealing the core essential biological processes in bacteria, indicating functional significance.

In addition to the sequence-based approach (similarity and prediction), using the constructed PPI network of K. pneumoniae, a network centrality-based approach was utilized to rank the proteins to shortlist essential proteins using 11 topological parameters computed based on cytoHubba along with another topological parameter, clustering coefficient. The correlation plot based on these 12 topological parameters (degree, Edge Percolated Component, Maximum Neighborhood Component, Density of Maximum Neighborhood Component, Maximal Clique Centrality, bottleneck, eccentricity, closeness, radiality, betweenness, stress, and clustering coefficient) is depicted in Supplementary Figure S3, showing the relationship between computed values.

Based on these 12 topological parameters, the protein nodes of the PPI network were ranked (top 1000), and a protein was considered essential if it appeared in the top-ranked list of at least two centrality measures. This PPI-based prediction resulted in a total of 1131 proteins that possess central properties in the network; perturbation of such proteins could collapse the network, and they are therefore possibly essential in K. pneumoniae.

Some of the proteins identified as essential based on the network topologies were ribosomal proteins (30S S9, 30S S5, 50S L15, 50S L6, 50S L2, etc.), translation elongation factor LepA, DNA-directed RNA polymerase, translation initiation factor IF-3, cobalamin biosynthesis protein CbiG, threonine—tRNA ligase, NADH-quinone oxidoreductase subunit C, pyruvate-flavodoxin oxidoreductase, Type II secretion system proteins (J, I, L, G, K), CoA-acylating propionaldehyde dehydrogenase, and so on. Pyruvate-flavodoxin oxidoreductase was identified as the most highly connected node, with a degree of 82 in the PPI network of K. pneumoniae; it is a crucial enzyme in the metabolic repertoire that allows bacteria to adapt and survive in different environmental conditions. This network-based approach provides a unique way to integrate data resources in the context of PPIs to assess functional relationships, which is crucial in essentiality, but it is limited by data quality, availability, and completeness.

Overall, these network topologies provide reliable metrics to pinpoint essential proteins as they denote their central roles and contribution toward network functionality and integrity, enhancing confidence in the prediction of essentiality. However, combining multiple topologies and integrating them with experiments is necessary to validate these predictions.

Further, the GSSM iYL1228 of K. pneumoniae was subjected to gene/PKO analysis, resulting in 118 peptides that critically affected biomass production and were lethal when knocked out. These 118 peptides/proteins are predicted to be essential in the analyzed GSSM of K. pneumoniae. This experimentally validated metabolic model comprises 1229 genes with 1658 metabolites involved in 2262 reactions. Cytidine 5′-triphosphate (CTP) synthase [EC (Enzyme Commission number) 6.3.4.2], orotidine 5′-phosphate decarboxylase (EC 4.1.1.23), phosphatidylserine synthase, acyl carrier protein, tryptophan synthase alpha chain (EC 4.2.1.20), ATP phosphoribosyltransferase (EC 2.4.2.17), histidine biosynthesis bifunctional protein HisB, and UDP-glucose 6-dehydrogenase were some of the identified essential genes/proteins that crucially affect growth.

It was observed that these essential proteins were part of crucial biosynthetic processes such as lipid A biosynthesis, carbohydrate metabolism, amino acid biosynthesis, phosphorylation, polysaccharide biosynthesis, fatty acid biosynthesis, de novo inosine monophosphate/adenosine monophosphate (IMP/AMP) biosynthesis, transsulfuration, chorismate metabolic process, and so on. These results can assist in designing and developing tools such as metabolic engineering for various applications and potentially enhance our understanding of complex metabolic networks that are fundamental to life forms.

To obtain a comprehensive set of essential proteins in K. pneumoniae, proteins were compared between different prediction approaches, and it was found that 33 proteins (Fig. 4; Table 3) were common in all prediction approaches. Herein, the final set of essential proteins was considered based on the presence of a protein in two or more prediction approaches. A total of 854 proteins (Supplementary Table S2) were identified as the consensus set of essential proteins based on the results of the combination approach. These 854 proteins can be further evaluated experimentally to validate the findings from the study.
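The comparison across prediction approaches amounts to grouping proteins by the exact combination of approaches that flagged them, which yields the regions of a Venn diagram, the consensus set (two or more approaches), and the core set (all approaches). The sketch below uses hypothetical protein identifiers and toy sets, not the study's data; the function name is illustrative.

```python
from collections import defaultdict


def venn_regions(predictions):
    """Group proteins by the exact combination of approaches that flagged them."""
    membership = defaultdict(set)
    for approach, proteins in predictions.items():
        for protein in proteins:
            membership[protein].add(approach)
    regions = defaultdict(set)
    for protein, approaches in membership.items():
        regions[frozenset(approaches)].add(protein)
    return dict(regions)


# Hypothetical toy predictions from four approaches.
toy_predictions = {
    "DEG": {"P1", "P2", "P3"},
    "Geptop": {"P1", "P2"},
    "PPI": {"P1", "P4"},
    "GSSM": {"P1", "P5"},
}
regions = venn_regions(toy_predictions)

# Consensus: flagged by two or more approaches; core: flagged by all of them.
consensus = set().union(*(p for combo, p in regions.items() if len(combo) >= 2))
core = regions.get(frozenset(toy_predictions), set())
print(sorted(consensus), sorted(core))
```

In this toy case, P1 is flagged by all four approaches (the core set) and P2 by two of them, so the consensus set contains P1 and P2, while P3, P4, and P5 are each supported by a single approach and are excluded.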

FIG. 4.

Venn diagram showing the number of essential proteins overlapping with different prediction approaches. DEG, database of essential genes; GSSM, genome-scale metabolic model.

Table 3.

List of 33 Klebsiella pneumoniae Proteins Predicted as Essential By All Methods in the Present Study

UniProtKB ID	Gene name(s)	Protein name(s)	Length (AA)	3D structure(s) available in PDB
A6TEJ5	glmM	Phosphoglucosamine mutase	445	—
A6TD54	pyrG	CTP synthase	545	—
A6T4N3	murG	UDP-N-acetylglucosamine—N-acetylmuramyl-(pentapeptide) pyrophosphoryl-undecaprenol N-acetylglucosamine transferase	356	—
A6TG33	glmS	Glutamine—fructose-6-phosphate aminotransferase	609	—
A6T706	msbA	ATP-binding transport protein multicopy suppressor of htrB	582	—
A6TES2	accB	Biotin carboxyl carrier protein of acetyl-CoA carboxylase	155	—
A6T4N1	murD	UDP-N-acetylmuramoylalanine—D-glutamate ligase	438	—
A6TDG3	thyA	Thymidylate synthase	264	—
A6T7K1	purB	Adenylosuccinate lyase	456	—
A6TEK8	murA	UDP-N-acetylglucosamine 1-carboxyvinyltransferase	419	—
A6TES3	accC	Biotin carboxylase	449	—
A6T4Y3	lpxA	Acyl-[acyl-carrier-protein]—UDP-N-acetylglucosamine O-acyltransferase	262	—
A6T4M8	murE	UDP-N-acetylmuramoyl-L-alanyl-D-glutamate—2,6-diaminopimelate ligase	495	—
A6T4Y1	lpxD	UDP-3-O-(3-hydroxymyristoyl) glucosamine N-acyltransferase	341	—
A6T7F5	fabF	3-oxoacyl-[acyl-carrier-protein] synthase 2	413	—
A6T7F2	fabD	Malonyl CoA-acyl carrier protein transacylase	309	—
A6T4N9	lpxC	UDP-3-O-acyl-N-acetylglucosamine deacetylase	305	—
A6T7F8	tmk	Thymidylate kinase	213	—
A6T4N0	mraY	Phospho-N-acetylmuramoyl-pentapeptide-transferase	360	—
A6T4Y7	accA	Acetyl-coenzyme A carboxylase carboxyl transferase subunit alpha	319	—
A6TGR3	purH	Bifunctional purine biosynthesis protein PurH	529	—
A6TCC2	guaA	GMP synthase	525	—
A6TG34	glmU	Bifunctional protein GlmU	456	8CU9
A6TGJ2	dapF	Diaminopimelate epimerase	275	—
A6TFP7	gmk	Guanylate kinase	207	—
A6T4Y4	lpxB	Lipid-A-disaccharide synthase	383	—
A6TC15	aroC	Chorismate synthase	361	—
A6TAN5	kdsA	2-dehydro-3-deoxyphosphooctonate aldolase	284	—
A6T4N4	murC	UDP-N-acetylmuramate—L-alanine ligase	491	—
A6T4M9	murF	UDP-N-acetylmuramoyl-tripeptide—D-alanyl-D-alanine ligase	452	—
A6TC02	accD	Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta	306	—
A6T7F4	acpP	Acyl carrier protein	78	—
A6TCC3	guaB	Inosine-5′-monophosphate dehydrogenase	488	—

AA, amino acids; PDB, Protein Data Bank.

In silico validation of predicted essential proteins

The final consensus set of 854 predicted essential proteins was compared with the NetGenes database, which uses network-based features to predict essential proteins from STRING PPI data. Of the essential proteins predicted in the present study, 242 (Fig. 5) were also listed as essential in NetGenes, which supports the validity of the approach. A recent study that curated a set of Klebsiella metabolic models reported 57 genes as essential on all substrates in all strains considered (Hawkey et al., 2022); notably, all 57 of these genes/proteins were also identified as essential in the present study.

FIG. 5.

Venn diagram showing the number of common genes/proteins predicted as essential of Klebsiella pneumoniae in the present study and essential proteins of the NetGenes database.
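
The comparison with NetGenes and with the literature-derived gene set amounts to set overlap between the predicted proteins and a reference list. A minimal Python sketch of such a comparison is given below, assuming both lists use the same identifier namespace; the function name and the toy identifiers are hypothetical and do not represent the study data.

```python
def compare_with_reference(predicted, reference):
    """Summarise the overlap between predicted and reference essential sets."""
    predicted, reference = set(predicted), set(reference)
    shared = predicted & reference
    return {
        "shared": len(shared),
        "only_predicted": len(predicted - reference),
        "only_reference": len(reference - predicted),
        # Fraction of the predicted set that the reference also supports
        "fraction_of_predicted": len(shared) / len(predicted) if predicted else 0.0,
    }


# Toy identifiers (placeholders, not study data).
predicted = {"P1", "P2", "P3", "P4"}
reference = {"P3", "P4", "P9"}
literature_core = {"P3", "P4"}

summary = compare_with_reference(predicted, reference)
print(summary)

# True when every literature-derived essential gene is recovered by the prediction
all_recovered = literature_core <= predicted
print(all_recovered)
```

With the counts reported above, 242 shared proteins out of 854 correspond to roughly 28% of the consensus set, and the recovery of all 57 literature-derived genes corresponds to the subset check in the last lines.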

In addition, the expression of the genes encoding the predicted essential proteins was explored using the PATHOgenex platform, an RNA atlas of gene expression data for human bacterial pathogens. These genes were differentially expressed under various conditions, namely acidic stress (As), bile stress (Bs), low iron (Li), hypoxia (Hyp), nutritional downshift (Nd), nitrosative stress (Ns), osmotic stress (Oss), oxidative stress (Oxs), stationary phase (Sp), temperature (Tm), and virulence-inducing condition (Vic) (Supplementary Fig. S4). This condition-specific expression suggests that these proteins are associated with the adaptation of the pathogen to different environments. In particular, essential proteins expressed under the Vic condition are potentially associated with pathogenicity and can therefore be considered as candidate targets for developing antimicrobial agents.

Overall, the findings of this study can contribute to our fundamental understanding of essentiality and cellular functions that can, in turn, lead to target-based drug discovery and development to tackle infectious diseases caused by bacterial pathogens.

Conclusions

The prediction of essential genes/proteins has several applications in biology and medicine. It helps to elucidate the core biological processes vital for an organism's survival. Identifying essential proteins also supports targeted drug discovery by shortlisting potential therapeutic targets that could facilitate the development of novel antimicrobial agents against pathogenic infections. The present study used an integrative bioinformatics and systems biology approach to identify essential proteins in K. pneumoniae. Combining proteome sequence, protein interaction, and genome-scale metabolic modeling methods resulted in 854 essential proteins involved in major cellular and biological processes. Cluster analysis of the PPI network of K. pneumoniae revealed the top regulatory pathways and processes that are vital for cellular functions. Comparison of the predicted essential proteins with existing databases and the literature supported the validity of the prediction approach.

The findings of this study provide a starting point for target-based drug discovery, in which these essential proteins can be shortlisted as candidate targets for novel antimicrobial agents. Moreover, the methodology adopted herein can be applied to other multidrug-resistant pathogens that cause human infections.

Footnotes

Acknowledgments

G.P. is supported by the Department of Biotechnology (DBT), Government of India, under the DBT-BINC research fellowship program (DBT-BINC/2017/PU/6). The authors are indebted to the Department of Bioinformatics, Pondicherry University, Pondicherry, for providing the computational facility to carry out the work.

Authors' Contributions

G.P.: Conceptualization, methodology, data analysis, visualization, and writing—original draft, review, and editing. A.P.: Conceptualization, formal analysis, resources, supervision, and writing—review and editing. All authors have read and approved the final version of the article.

Author Disclosure Statement

The authors declare they have no conflicting financial interests.

Funding Information

No funding was received for this article.

Supplementary Material

References

Albert. Scale-free networks in cell biology. J Cell Sci, 2005; 118(21):4947–4957; doi: 10.1242/jcs.02714.

Ali, Alam, Hasan, et al. Potential therapeutic targets of Klebsiella pneumoniae: A multi-omics review perspective. Brief Funct Genomics, 2022; 21(2):63–77; doi: 10.1093/bfgp/elab038.

Aromolaran, Aromolaran, Isewon, et al. Machine learning approach to gene essentiality prediction: A review. Brief Bioinform, 2021; 22(5):1–19; doi: 10.1093/bib/bbab128.

Avican, Aldahdooh, Togninalli, et al. RNA atlas of human bacterial pathogens uncovers stress dynamics linked to infection. Nat Commun, 2021; 12(1):3282; doi: 10.1038/s41467-021-23588-w.

Awoke, Teka, Seman, et al. High prevalence of multidrug-resistant Klebsiella pneumoniae in a Tertiary Care Hospital in Ethiopia. Antibiotics (Basel), 2021; 10(8):1007; doi: 10.3390/antibiotics10081007.

Bader, Hogue CWV. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 2003; 4:2; doi: 10.1186/1471-2105-4-2.

Banerjee, Wangkheimayum, Sharma, et al. Extensively drug-resistant hypervirulent Klebsiella pneumoniae from a series of neonatal sepsis in a Tertiary Care Hospital, India. Front Med, 2021; 8:645955; doi: 10.3389/fmed.2021.645955.

Bayatinejad, Salehi, Beigverdi, et al. In vitro antibiotic combinations of colistin, meropenem, amikacin, and amoxicillin/clavulanate against multidrug-resistant Klebsiella pneumonia isolated from patients with ventilator-associated pneumonia. BMC Microbiol, 2023; 23(1):298; doi: 10.1186/s12866-023-03039-w.

CDC.gov. Klebsiella pneumoniae in Healthcare Settings. 2023. Available from: https://www.cdc.gov/hai/organisms/klebsiella/klebsiella.html [Last accessed: December 29, 2023].

Chang, Sharma, Dela Cruz, et al. Clinical epidemiology, risk factors, and control strategies of Klebsiella pneumoniae infection. Front Microbiol, 2021; 12:750662; doi: 10.3389/fmicb.2021.750662.

Chen, Zhang, Liao. Hypervirulent Klebsiella pneumoniae. Infect Drug Resist, 2023; 16:5243–5249; doi: 10.2147/IDR.S418523.

Chin C-H, Chen S-H, Wu H-H, et al. cytoHubba: Identifying hub objects and sub-networks from complex interactome. BMC Syst Biol, 2014; 8(Suppl. 4):S11; doi: 10.1186/1752-0509-8-S4-S11.

Dailey, Dailey, Gerdes, et al. Prokaryotic heme biosynthesis: Multiple pathways to a common essential product. Microbiol Mol Biol Rev, 2017; 81(1):e00048-16; doi: 10.1128/MMBR.00048-16.

Davidson, Dassa, Orelle, et al. Structure, function, and evolution of bacterial ATP-binding cassette systems. Microbiol Mol Biol Rev, 2008; 72(2):317–364; doi: 10.1128/MMBR.00031-07.

Doncheva, Morris, Gorodkin, et al. Cytoscape StringApp: Network analysis and visualization of proteomics data. J Proteome Res, 2019; 18(2):623–632; doi: 10.1021/acs.jproteome.8b00702.

Dong, Jin Y-T, Hua H-L, et al. Comprehensive review of the identification of essential genes using computational methods: Focusing on feature implementation and assessment. Brief Bioinform, 2020; 21(1):171–181; doi: 10.1093/bib/bby116.

Friedrich, Dekovic, Burschel. Assembly of the Escherichia coli NADH: Ubiquinone Oxidoreductase (respiratory Complex I). Biochim Biophys Acta, 2016; 1857(3):214–223; doi: 10.1016/j.bbabio.2015.12.004.

Gollapalli, Selvan GT, H M, et al. Genome-scale protein interaction network construction and topology analysis of functional hypothetical proteins in Helicobacter pylori divulges novel therapeutic targets. Microb Pathog, 2021; 161:105293; doi: 10.1016/j.micpath.2021.105293.

Guo, Ju, Chen, et al. Research on the computational prediction of essential genes. Front Cell Dev Biol, 2021; 9:803608; doi: 10.3389/fcell.2021.803608.

Hawkey, Vezina, Monk, et al. A curated collection of Klebsiella metabolic models reveals variable substrate usage and gene essentiality. Genome Res, 2022; 32(5):1004–1014; doi: 10.1101/gr.276289.121.

Khan, Jalal, Uddin. An integrated in silico based subtractive genomics and reverse vaccinology approach for the identification of novel vaccine candidate and chimeric vaccine against XDR Salmonella typhi H58. Genomics, 2022; 114(2):110301; doi: 10.1016/j.ygeno.2022.110301.

Lecompte, Ripp, Thierry J-C, et al. Comparative analysis of ribosomal proteins in complete genomes: An example of reductive evolution at the domain scale. Nucleic Acids Res, 2002; 30(24):5382–5390; doi: 10.1093/nar/gkf693.

Li, Huang, Rao, et al. Bacteremia mortality: A systematic review and meta-analysis. Front Cell Infect Microbiol, 2023; 13:1157010; doi: 10.3389/fcimb.2023.1157010.

Liao Y-C, Huang T-W, Chen F-C, et al. An experimentally validated genome-scale metabolic reconstruction of Klebsiella pneumoniae MGH 78578, iYL1228. J Bacteriol, 2011; 193(7):1710–1717; doi: 10.1128/JB.01218-10.

Luo, Lin, Gao, et al. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic Acids Res, 2014; 42(D1):D574–D580; doi: 10.1093/nar/gkt1131.

Moretti, Tran VDT, Mehl, et al. MetaNetX/MNXref: Unified namespace for metabolites and biochemical reactions in the context of metabolic models. Nucleic Acids Res, 2021; 49(D1):D570–D574; doi: 10.1093/nar/gkaa992.

Norsigian, Pusarla, McConn, et al. BiGG Models 2020: Multi-strain genome-scale models and expansion across the phylogenetic tree. Nucleic Acids Res, 2020; 48(D1):D402–D406; doi: 10.1093/nar/gkz1054.

Petrosillo, Taglietti, Granata. Treatment options for colistin resistant Klebsiella pneumoniae: Present and future. J Clin Med Res, 2019; 8(7):934; doi: 10.3390/jcm8070934.

Plaimas, Eils, König. Identifying essential genes in bacterial metabolic networks with machine learning methods. BMC Syst Biol, 2010; 4:56; doi: 10.1186/1752-0509-4-56.

Ramos PIP, Fernández Do Porto, Lanzarotti, et al. An integrative, multi-omics approach towards the prioritization of Klebsiella pneumoniae drug targets. Sci Rep, 2018; 8(1):10755; doi: 10.1038/s41598-018-28916-7.

Raux, Lanois, Warren, et al. Cobalamin (vitamin B12) biosynthesis: Identification and characterization of a Bacillus megaterium cobI Operon. Biochem J, 1998; 335(1):159–166; doi: 10.1042/bj3350159.

Senthamizhan, Ravindran B, Raman. NetGenes: A database of essential genes predicted using features from interaction networks. Front Genet, 2021; 12:722198; doi: 10.3389/fgene.2021.722198.

Serral, Pardo, Sosa, et al. Pathway driven target selection in Klebsiella pneumoniae: Insights into carbapenem exposure. Front Cell Infect Microbiol, 2022; 12:773405; doi: 10.3389/fcimb.2022.773405.

Sertbas, Ulgen. Genome-scale metabolic modeling for unraveling molecular mechanisms of high threat pathogens. Front Cell Dev Biol, 2020; 8:566702; doi: 10.3389/fcell.2020.566702.

Shanmugham, Pan. Identification and characterization of potential therapeutic candidates in emerging human pathogen Mycobacterium abscessus: A novel hierarchical in silico approach. PLoS One, 2013; 8(3):e59126; doi: 10.1371/journal.pone.0059126.

Shannon, Markiel, Ozier, et al. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res, 2003; 13(11):2498–2504; doi: 10.1101/gr.1239303.

Shein AMS, Wannigama, Higgins, et al. Novel colistin-EDTA combination for successful eradication of colistin-resistant Klebsiella pneumoniae catheter-related biofilm infections. Sci Rep, 2021; 11(1):21676; doi: 10.1038/s41598-021-01052-5.

Solanki, Tiwari. Subtractive proteomics to identify novel drug targets and reverse vaccinology for the development of chimeric vaccine against Acinetobacter baumannii. Sci Rep, 2018; 8(1):9044; doi: 10.1038/s41598-018-26689-7.

Szklarczyk, Gable, Lyon, et al. STRING v11: Protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res, 2019; 47(D1):D607–D613; doi: 10.1093/nar/gky1131.

Turner, Nalivaeva. Metalloproteases and proteolytic processing. In: Post-Translational Modifications in Health and Disease. Springer New York: New York, NY, USA; 2011; pp. 457–482; doi: 10.1007/978-1-4419-6382-6_19.

Uddin, Siddiqui, Azam, et al. Identification and characterization of potential druggable targets among hypothetical proteins of extensively drug resistant Mycobacterium tuberculosis (XDR KZN 605) through subtractive genomics approach. Eur J Pharm Sci, 2018; 114:13–23; doi: 10.1016/j.ejps.2017.11.014.

UniProt Consortium. UniProt: The Universal Protein Knowledgebase in 2023. Nucleic Acids Res, 2023; 51(D1):D523–D531; doi: 10.1093/nar/gkac1052.

van den Elzen, Helena-Bueno, Brown, et al. Ribosomal proteins can hold a more accurate record of bacterial thermal adaptation compared to rRNA. Nucleic Acids Res, 2023; 51(15):8048–8059; doi: 10.1093/nar/gkad560.

Wen Q-F, Liu, Dong, et al. Geptop 2.0: An updated, more precise, and faster geptop server for identification of prokaryotic essential genes. Front Microbiol, 2019; 10:1236; doi: 10.3389/fmicb.2019.01236.

Wuchty, Uetz. Protein-protein interaction networks of E. coli and S. cerevisiae are similar. Sci Rep, 2014; 4:7187; doi: 10.1038/srep07187.

Xu, Sun, Ma. Systematic review and meta-analysis of mortality of patients infected with carbapenem-resistant Klebsiella pneumoniae. Ann Clin Microbiol Antimicrob, 2017; 16(1):18; doi: 10.1186/s12941-017-0191-3.

Yue, Ye, Peng P-Y, et al. A deep learning framework for identifying essential proteins based on multiple biological information. BMC Bioinformatics, 2022; 23(1):318; doi: 10.1186/s12859-022-04868-8.

Zotenko, Mestre, O'Leary, et al. Why do hubs in the yeast protein interaction network tend to be essential: Reexamining the connection between the network topology and essentiality. PLoS Comput Biol, 2008; 4(8):e1000140; doi: 10.1371/journal.pcbi.1000140.
