MEditome: Computational Detection of RNA Edit Sites Using de Novo Assembly in Microbiomes

Abstract

RNA editing is a post-transcriptional modification that alters single-nucleotide sites within RNA strands, thus diversifying transcriptomes and proteomes and modulating gene expression. Although RNA editing is well characterized in eukaryotes and in a few microbes, it remains largely unexplored in entire microbiomes. Recent studies have demonstrated that A-to-I RNA editing contributes to bacterial adaptation and pathogenicity. Previously, we developed MetaEdit, a reference-based computational pipeline to detect RNA edit sites in microbiomes. While MetaEdit successfully identified RNA edit sites in Escherichia coli within the context of the human gut microbiome, including previously reported loci, it relied primarily on aligning reads to reference genomes of target bacteria. This dependence on reference genomes introduced potential biases, as editing could only be identified in organisms with reference genomes, while editing in novel microbial strains missing from the reference databases could be overlooked. Even for organisms with reference genomes, the search for edit sites was inefficient since it had to be conducted one reference genome at a time.

Here, we introduce MEditome, which employs de novo assembly to overcome these limitations. This change enables the detection of RNA edit sites across all microbial organisms in the microbiome, including novel bacterial strains for which comprehensive reference genomes are unavailable. Using sequencing data from the Integrative Human Microbiome Project, MEditome identified 2,295 unique RNA editing sites across diverse bacterial taxa. Several of these overlap with edits previously detected by MetaEdit in E. coli, in the hok/gef gene family and in arginine-associated genes, providing in silico validation of accuracy. We observed taxon-specific editing patterns and gene-level differential editing associated with inflammatory bowel disease, highlighting RNA editing as a potential regulatory mechanism influencing microbial adaptation and host–microbe interactions.

Keywords

RNA editing; post-transcriptional; A-to-I editing; metagenomics; metatranscriptomics

1. INTRODUCTION

RNA editing is an important post-transcriptional modification that alters RNA sequences, thereby expanding the functional diversity of the transcriptome (Gott and Emeson, 2000). In eukaryotes, RNA editing mechanisms (e.g., adenosine-to-inosine or A-to-I editing) catalyzed by adenosine deaminases acting on RNA (ADARs) are well-studied and known to influence diverse cellular processes, including neuronal development, immune responses, and disease pathogenesis (Blow et al., 2004; Christofi and Zaravinos, 2019; Mallela and Nishikura, 2012). In contrast, RNA editing in bacteria has not been investigated sufficiently. It was first identified in bacterial tRNAs (Wolf et al., 2002). Until recently, editing in bacterial mRNAs was scarcely reported, with limited instances shown to influence bacterial pathogenicity and stress responses (Bar-Yaacov et al., 2017; Nie et al., 2020). RNA editing in the context of complex microbial ecosystems, such as the human gut microbiome, remains largely unexplored. The human gut microbiome comprises numerous bacterial taxa integral to host physiology, contributing to various health conditions including inflammatory bowel diseases (IBD). Understanding RNA editing within these microbial communities could reveal novel adaptive mechanisms critical for microbial survival and host–microbe interactions. Several computational tools have been developed to detect RNA editing in eukaryotes, including REDItools (Picardi and Pesole, 2013), RES-Scanner (Wang et al., 2016), and SPRINT (Zhang et al., 2017), which primarily rely on reference genomes and have been widely applied in eukaryotic systems. However, these approaches are not well-suited for microbiome-scale analyses, where incomplete reference catalogs, high strain-level diversity, and frequent horizontal gene transfer introduce substantial bias and limit sensitivity (Nayfach and Pollard, 2016; Lloyd-Price et al., 2019).

To address this gap, we developed MetaEdit, a reference-based computational pipeline that successfully identified RNA edit sites in any microbe that has a reference sequence cataloged in genomic databases. We used it to show for the first time that RNA editing occurs in microbes, even in the context of complex microbial communities. Furthermore, within microbiome samples, edited E. coli genes may have a potential role in microbial regulation and host interactions (Mehta et al., 2025).

However, reliance on reference genomes limited the scope of MetaEdit to detect editing sites, especially for diverse microbial communities with significant strain variability. Furthermore, it could only be used with one reference genome at a time. To address this critical limitation, we now introduce MEditome, an innovative pipeline leveraging de novo genome assembly, enabling unbiased detection of RNA edit sites across entire microbiome samples without dependence on reference genomes. This methodological advancement is pivotal for uncovering the full scope and biological significance of RNA editing in microbiomes, ultimately expanding our understanding of microbial adaptability, pathogenicity, and host–microbe interactions.

2. METHODS

MEditome employs a reference-free, de novo assembly-based approach to detect RNA editing in microbiomes using metagenomic and metatranscriptomic sequencing data. A schematic of the pipeline can be found in Figure 1, and a detailed algorithm is presented in Algorithm 1. First, metagenomic and metatranscriptomic reads are trimmed and filtered for quality and host genome contamination. High-quality contigs are assembled from the metagenomic reads and serve as the scaffold for alignment. Both metagenomic and metatranscriptomic reads are then mapped back to these contigs to identify candidate RNA editing sites based on discrepancies between the alignments of the metagenomic reads and those of the metatranscriptomic reads. To ensure reliability, candidate sites are filtered based on alignment quality metrics and read support. Cross-sample validation is performed by multiple sequence alignment with Multiple Alignment using Fast Fourier Transform (MAFFT) (Katoh and Standley, 2013), allowing the identification of conserved RNA edit sites in cohorts of individuals.

FIG. 1.

MEditome Pipeline: (A) Metagenomics (MGX) and metatranscriptomics (MTX) reads sequenced from the same stool sample are quality trimmed and filtered. De novo assembly of the MGX reads then generates a contig library, to which the MGX and MTX reads are aligned. (B) The eggNOG-mapper tool predicts the open reading frames (ORFs) and gene orthologs for the resulting contigs by mapping them to its database. It also assigns functional annotations, including KEGG pathways, KEGG Orthologs (KO), Enzyme Commission (EC) numbers, and taxonomic assignments. (C) The RNA editing calling algorithm predicts RNA edit sites for each contig based on editing signals. (D) Annotations such as gene family, taxonomic information, KO, EC, and KEGG pathway are added to each RNA edit site. Any sites present in fewer than five samples and not belonging to any known gene family are filtered out.

Statistical methods are then applied to assess the significance of RNA editing at each site, followed by differential analysis between disease and healthy cohorts to identify edit sites potentially associated with pathophysiological states. Finally, functional annotation and orthology inference are carried out using eggNOG-mapper (Huerta-Cepas et al., 2019) to characterize the biological context of the edited loci, providing insights into their potential roles in microbial function and host–microbiome interactions.

2.1. Dataset

Our dataset consisted of paired metagenomic and metatranscriptomic sequence data obtained from 834 samples from 109 individuals diagnosed with Crohn’s disease (CD), ulcerative colitis (UC), or non-inflammatory bowel disease (non-IBD) (Lloyd-Price et al., 2019), all of which were collected as part of the Integrative Human Microbiome Project (iHMP2). To minimize population-specific genetic and environmental heterogeneity that could confound RNA editing analyses, we restricted this study to individuals self-reported as Caucasian, consistent with prior iHMP2 subgroup analyses, resulting in a final dataset of 748 paired samples from 96 individuals. While this filtering improves internal consistency, it may limit generalizability and is addressed as a limitation.

2.2. Quality control and data preprocessing

Raw sequencing reads underwent stringent quality control using Trim Galore (Krueger, 2012) and FastQC (Andrews, 2010) to remove adapters and low-quality bases (Phred score $< 20$; Algorithm 1, step 1). Because of our focus on microbial organisms, human genomic contamination was filtered out by aligning trimmed reads to the human reference genome (HG38) using Bowtie2 (Langmead and Salzberg, 2012). Ribosomal sequences were also filtered out from the metatranscriptomic data through alignment to the SILVA rRNA database (Quast et al., 2013).

2.3. De novo assembly

High-quality metagenomic reads were assembled into contigs using MEGAHIT (Li et al., 2015). Assembled contigs provide greater confidence for downstream steps such as annotation, and de novo assembly yields unbiased assemblies suitable for diverse microbial communities. All metagenomic reads were then aligned to the assembled contigs using Bowtie2 (Langmead and Salzberg, 2012), followed by an identical step with the metatranscriptomic reads (Algorithm 1, step 2). For convenience, we will refer to these as DNA alignments and RNA alignments, respectively. High-quality alignments were ensured by retaining only reads with mapping quality scores $> 30$. The same steps were employed in the iHMP2 study to generate high-quality metagenomic contigs. To mitigate biases introduced by fragmented assemblies and low-abundance organisms, MEditome applies conservative downstream filtering. Only contigs and positions with a minimum of 10× coverage in both metagenomic and metatranscriptomic alignments were retained. Contigs with insufficient coverage or ambiguous taxonomic annotation were excluded from statistical and validation analyses, prioritizing specificity over sensitivity.

2.4. Gene prediction and ortholog assignment

Following de novo assembly, protein-coding genes were identified and taxonomically annotated with eggNOG-mapper v2.2 (database 5.0) in metagenome mode (Cantalapiedra et al., 2021). Step 3 in Algorithm 1 yields, for every sample, (i) a General Feature Format (GFF) file describing the coordinates, strand, and identifier of each predicted ORF, and (ii) a FASTA file of translated ORF sequences with their eggNOG ortholog assignments, KEGG/COG terms, and taxonomic lineage. These files form the scaffold for downstream mapping of editing events to genes and orthologous groups.

2.5. RNA editing detection and DNA mutations filtering

In step 4 of Algorithm 1, candidate RNA editing sites were identified by comparing nucleotide composition between metatranscriptomic (RNA) and metagenomic (DNA) reads aligned to the same de novo assembled contigs. A candidate edit site was defined as a contig position where nucleotide heterogeneity was observed exclusively in RNA reads while DNA reads consistently supported a single reference nucleotide. To distinguish RNA editing events from genomic SNPs or population-level polymorphisms, positions exhibiting mixed alleles or alternative allele support in metagenomic reads were excluded. Only single-nucleotide substitutions consistent with known bacterial RNA editing signatures (e.g., A→G and T→C) were retained. Insertions, deletions, and non-canonical mismatches were filtered out. For each retained site, an RNA editing score was calculated as the proportion of RNA reads supporting the edited nucleotide relative to total RNA coverage at that position. Sites with editing scores below 0.03 (3%) were excluded; this threshold lies well above typical sequencing error rates (about 1%) (Ross et al., 2013).
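The site-calling rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the MEditome implementation: the function name `call_edit_site`, the dictionary-based read counts, and the parameter `max_dna_alt_fraction` are all hypothetical. In particular, the text does not state how much alternative-allele support in DNA reads is tolerated, so that tolerance is exposed as a parameter and defaults to zero here.

```python
from typing import Dict, Optional

# Substitution types named in the text: A->G, and T->C for the opposite strand.
CANONICAL = {("A", "G"), ("T", "C")}


def call_edit_site(
    rna: Dict[str, int],
    dna: Dict[str, int],
    min_score: float = 0.03,
    max_dna_alt_fraction: float = 0.0,  # assumption: tolerance is not given in the text
) -> Optional[dict]:
    """Return an edit call for one contig position, or None if it is filtered out.

    `rna` and `dna` map each base (A, C, G, T) to its read count at the position.
    """
    dna_total = sum(dna.values())
    rna_total = sum(rna.values())
    if dna_total == 0 or rna_total == 0:
        return None

    # The reference base is the base best supported by the DNA reads.
    ref = max(dna, key=dna.get)
    dna_alt = dna_total - dna[ref]
    if dna_alt / dna_total > max_dna_alt_fraction:
        return None  # mixed DNA alleles: treated as a possible SNP, not an edit

    # Keep only canonical substitutions and compute the editing score.
    for ref_base, edited_base in CANONICAL:
        if ref == ref_base and rna.get(edited_base, 0) > 0:
            score = rna[edited_base] / rna_total
            if score >= min_score:
                return {"ref": ref, "edit": edited_base, "score": score}
    return None
```

With RNA counts of 52 A and 70 G over a DNA position supported only by A, the score is 70/122, which rounds to 0.57.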

2.6. Mapping edits to genes and orthologs

Because each metagenomic assembly is sample-specific, contig identifiers, and hence the genes predicted on those contigs, are unique to a single library. To place RNA editing events into a common coordinate system (step 5), we first converted every edit position from contig space to gene space. The GFF file for each sample was parsed to obtain the start, end, and strand of every coding sequence; the raw contig coordinate of the edit was verified to fall within a coding sequence; and its 1-based offset from the gene’s 5′ end was calculated, with strand correction when required. The resulting intragenic coordinates were then grouped by eggNOG seed ortholog, and the corresponding protein sequences were aligned with MAFFT (Katoh and Standley, 2013), allowing orthologous editing sites from different samples to be projected onto the same column of the multiple-sequence alignment for downstream comparative analyses.
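The contig-to-gene conversion can be illustrated with a short Python sketch. The helper names `parse_gff_cds` and `contig_to_gene_offset` are hypothetical, and the sketch assumes standard nine-column, tab-separated GFF lines in which coding sequences carry the feature type `CDS` with 1-based inclusive coordinates.

```python
from typing import Optional, Tuple


def parse_gff_cds(line: str) -> Optional[Tuple[str, int, int, str]]:
    """Extract (contig, start, end, strand) from a GFF line, or None if not a CDS."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 9 or fields[2] != "CDS":
        return None
    return fields[0], int(fields[3]), int(fields[4]), fields[6]


def contig_to_gene_offset(pos: int, start: int, end: int, strand: str) -> Optional[int]:
    """Convert a 1-based contig position to a 1-based offset from the gene's 5' end.

    Returns None when the position lies outside the coding sequence.
    """
    if not (start <= pos <= end):
        return None
    if strand == "+":
        return pos - start + 1
    if strand == "-":
        # On the minus strand the 5' end of the gene sits at `end`.
        return end - pos + 1
    raise ValueError(f"unexpected strand: {strand!r}")
```

For a gene spanning contig positions 100 to 200, position 105 maps to offset 6 on the plus strand and to offset 96 on the minus strand.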

2.7. Edit-site filtering and cross-sample validation

To ensure the robustness of identifying RNA editing events, in step 6, we implemented a multi-tiered filtering and validation framework. A candidate edit site was retained only if it was observed in at least five independent samples, each meeting stringent read support criteria: a minimum of $10 \times$ coverage in both metagenomic (DNA) and metatranscriptomic (RNA) alignments, and at least three RNA reads supporting the edited nucleotide.
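A minimal sketch of this recurrence filter follows, assuming each observation is a dictionary with the hypothetical keys `site`, `sample`, `dna_cov`, `rna_cov`, and `edited_reads`. How a site is keyed across samples (for example, by ortholog and alignment column) is left to the caller.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set


def recurrent_sites(
    observations: Iterable[Dict],
    min_samples: int = 5,
    min_cov: int = 10,
    min_edited_reads: int = 3,
) -> Set[Hashable]:
    """Return the sites supported by at least `min_samples` distinct samples.

    A sample counts toward a site only if it meets the coverage and
    edited-read thresholds stated in the text.
    """
    samples_per_site = defaultdict(set)
    for obs in observations:
        passes = (
            obs["dna_cov"] >= min_cov
            and obs["rna_cov"] >= min_cov
            and obs["edited_reads"] >= min_edited_reads
        )
        if passes:
            samples_per_site[obs["site"]].add(obs["sample"])
    return {
        site for site, samples in samples_per_site.items()
        if len(samples) >= min_samples
    }
```

Counting distinct sample identifiers, rather than observations, keeps repeated records from the same sample from inflating the support for a site.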

To confirm positional homology across genomes, we leveraged multiple sequence alignments generated using MAFFT (Katoh and Standley, 2013). Only those editing sites that consistently aligned to the same nucleotide position and context across all supporting samples (at least five) were retained for downstream analysis. This step ensured that the editing signal was not confounded by misalignments or assembly fragmentation. We then identified the functional impact of edited codons and flagged any disruptive edits (step 7).

The resulting high-confidence set of candidate edits was subjected to statistical evaluation (step 7) to distinguish true biological modifications from sequencing artifacts or stochastic variation. This rigorous filtering process produced a conservative and reliable catalog of reproducible RNA editing events across the microbiome, suitable for comparative and functional analyses.

2.8. Functional annotation and impact prediction

To elucidate the biological relevance of the curated RNA editing catalog, we combined the seed-ortholog assignments (see Section 2.4) with domain, pathway, and taxonomic metadata provided by eggNOG-mapper (Cantalapiedra et al., 2021). Each edited ORF was linked to its KEGG ortholog, COG category, and taxonomic lineage, enabling pathway-level and clade-specific summaries of editing frequency.

Protein consequences were evaluated in two steps. First, both the reference and edited coding sequences were translated in silico with Biopython (Cock et al., 2009) under the bacterial genetic code. Second, each codon change was classified as synonymous, missense, or nonsense; missense substitutions that overlapped residues annotated by eggNOG-mapper as catalytic or ligand-binding sites were flagged as potentially disruptive (Cantalapiedra et al., 2021).
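The codon-level classification can be sketched as follows. To stay dependency-free, this illustration builds the codon table by hand instead of calling Biopython; the bacterial code (NCBI translation table 11) assigns the same amino acids to codons as the standard code and differs only in its permitted start codons. The function name `classify_edit` is hypothetical.

```python
from typing import Tuple

# Codon table in TCAG order; amino-acid assignments shared by the
# standard code and the bacterial code (translation table 11).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_edit(cds: str, offset: int, new_base: str) -> Tuple[str, str, str]:
    """Classify a single-base change at a 1-based offset from the 5' end of `cds`.

    Returns (reference amino acid, edited amino acid, effect).
    """
    codon_index, within = divmod(offset - 1, 3)
    codon = cds[3 * codon_index: 3 * codon_index + 3].upper()
    if offset < 1 or len(codon) < 3:
        raise ValueError("offset does not fall inside a complete codon")
    edited = codon[:within] + new_base.upper() + codon[within + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[edited]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    else:
        # Note: a stop codon changed to an amino acid also lands here,
        # because the text names only three classes.
        effect = "missense"
    return ref_aa, alt_aa, effect
```

For example, an A-to-G change at the second position of the codon AAA yields AGA, a lysine-to-arginine missense substitution.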

Together, this functional annotation framework provides a comprehensive understanding of the biological context and potential consequences of RNA editing in the microbiome—spanning gene function, evolutionary conservation, taxonomic origin, pathway participation, and protein-level impact.

2.9. Statistical analysis

To assess the statistical significance of RNA edit sites, we employed a multi-layered approach integrating both site-level and gene-level analyses. For each candidate RNA editing site, we applied a binomial test to evaluate whether the proportion of edited bases observed in RNA reads significantly exceeded what would be expected by random sequencing error alone. The null hypothesis assumes that the observed editing proportion is due to background noise, while the alternative supports a true post-transcriptional modification.
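The site-level test can be illustrated with a one-sided binomial tail probability. The sketch below is written in pure Python rather than with a statistics library, and it assumes a background error rate of 1%, a value taken from the sequencing error rate cited in Section 2.5; the error rate actually used in the pipeline's null model is not stated here. The function name `binom_upper_tail` is hypothetical.

```python
import math


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


# Assumed background error rate (hypothetical choice for this sketch).
ERROR_RATE = 0.01

# p-value for 70 edited reads out of 122 RNA reads under the null of pure error.
p_value = binom_upper_tail(70, 122, ERROR_RATE)
```

Working with log-probabilities avoids overflow in the binomial coefficients at the high coverages seen for some sites.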

In addition to evaluating individual sites, we conducted differential RNA editing analyses across disease and control groups to identify editing sites associated with clinical phenotypes. This analysis was performed at two levels: (i) site-level differential editing, comparing RNA editing scores across cohorts using non-parametric Kruskal–Wallis tests, and (ii) gene-level aggregation, where editing scores from multiple sites within the same gene were combined to assess broader patterns of regulation. This gene-centric view accounts for the potential clustering of edits within functionally important regions and reduces site-level noise.
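For the three-cohort comparison, the Kruskal–Wallis statistic can be computed as sketched below. This is an illustrative pure-Python version, not the code used in the study: the function name `kruskal_wallis_3` is hypothetical, the p-value uses the usual chi-square approximation, and the closed-form tail exp(-H/2) holds only for two degrees of freedom, that is, exactly three groups.

```python
import math
from typing import List, Sequence, Tuple


def kruskal_wallis_3(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Kruskal-Wallis H (tie-corrected) and approximate p-value for three groups."""
    if len(groups) != 3 or any(len(g) == 0 for g in groups):
        raise ValueError("exactly three non-empty groups are required")

    # Pool all values, remembering the group each one came from.
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    rank_sums: List[float] = [0.0, 0.0, 0.0]
    tie_term = 0.0

    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1..j and share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        for _, gi in pooled[i:j]:
            rank_sums[gi] += avg_rank
        i = j

    h = 12.0 / (n * (n + 1)) * sum(
        rs ** 2 / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0.0:
        raise ValueError("all values are identical")
    h /= correction

    # Chi-square survival function with 2 degrees of freedom.
    return h, math.exp(-h / 2.0)
```

Each group would hold the editing scores of one cohort (UC, CD, or non-IBD) at a single site or gene.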

To adjust for the large number of statistical tests performed across the transcriptome, we applied the Benjamini–Hochberg procedure (Benjamini and Hochberg, 1995) to control the false discovery rate (FDR). Thresholds depended on the level of analysis: site-level tests used FDR below 0.05, whereas gene-level and other aggregated analyses used FDR below 0.1, reflecting the reduced multiple-testing burden and the exploratory aim of integrative analyses. Furthermore, we performed taxon-specific RNA editing analyses to investigate whether certain bacterial clades, defined at the family or class level, exhibited unique editing profiles. This stratified analysis enabled the identification of clade-restricted RNA editing signatures, offering insights into potential lineage-specific post-transcriptional regulatory mechanisms within the microbiome.
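The Benjamini–Hochberg adjustment itself is short enough to sketch directly. The function name `benjamini_hochberg` is hypothetical; the logic follows the standard step-up procedure, in which each sorted p-value is scaled by m/rank and monotonicity is enforced from the largest p-value downward.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # so that adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Example with four hypothetical p-values and the site-level threshold of 0.05.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in adjusted]
```

Applying a threshold of 0.05 or 0.1 to the adjusted values corresponds to the site-level and gene-level cutoffs, respectively.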

2.10. Code availability and tool versions

The MEditome pipeline is publicly available (Mehta, 2025), including all scripts, parameters, and documentation required to reproduce the analyses presented in this study. Tool versions used: Trim Galore (v0.6.2), FastQC (v0.11.6), Bowtie2 (v2.4.4), MEGAHIT (v1.2.1), eggNOG-mapper (v2.2; database v5.0), and MAFFT (v7.145).

3. RESULTS

Starting from a dataset of 748 paired metagenomic and metatranscriptomic samples collected from 96 individuals, MEditome identified a total of 2,295 unique RNA edit sites across the cohort. These edit sites were distributed across a wide range of microbial taxa and genes and were annotated using eggNOG-mapper, enabling the identification of gene orthologs, functional categories, and taxonomic origins. Our findings reveal that RNA editing is more widespread than previously reported, occurring not only in tRNA genes but also across a broad range of protein-coding genes. This expands the known landscape of microbial RNA editing beyond its traditionally recognized role in tRNA modification, suggesting broader regulatory or adaptive functions in microbial physiology. Table 1 lists the high-confidence missense sites that MEditome found in previously reported genes (Mehta et al., 2025). To explore the distribution of RNA edit sites, we analyzed site- and gene-level patterns across taxa. We then applied statistical tests to identify differential RNA editing between disease and control groups, at both the individual-site and aggregated gene levels.

Table 1.
High-Confidence Missense RNA Edit Sites in Genes Previously Reported to Undergo Editing

Gene	Contig ID	Position	Base change	AA change	Effect	Edit score	RA	RC	RG	RT	DA	DC	DG	DT
hutU	450_5	1361	A–G	K→R	missense	0.57	52	0	70	0	28	0	0	0
pth	2913_2	40	A–G	I→V	missense	0.41	56	0	39	0	12	0	0	0
pth	15637_6	40	A–G	I→V	missense	0.34	52	0	27	0	40	0	1	0
hsp18	4128_0	104	T–C	S→P	missense	0.33	0	301	0	620	0	0	0	20
gdh	4786_6	1325	A–G	Q→R	missense	0.22	849	1	237	1	41	0	0	0
pckA	5829_0	1174	A–G	S→G	missense	0.19	652	2	154	0	56	0	0	0
pckA	15242_3	1174	A–G	S→G	missense	0.14	101	0	18	0	64	0	0	0
gap	9519_26	940	T–C	K→E	missense	0.03	1	89	8	2701	0	0	0	119
nusG	13806_3	280	T–C	R→C	missense	0.09	330	0	31	0	70	0	1	0
nusG	13806_3	274	A–G	K→E	missense	0.07	324	0	25	0	48	0	0	0
hsp18	8387_0	691	A–G	S→P	missense	0.03	1716	3	61	0	185	0	1	0
hutU	450_5	1355	A–G	Q→R	missense	0.33	49	0	24	0	28	0	0	0
czcA	20584_0	367	A–G	I→V	missense	0.04	580	1	22	0	14	0	0	0
metG	33137_1	193	A–G	D→G	missense	0.09	341	0	41	0	19	0	0	0
metG	27633_0	193	A–G	D→G	missense	0.06	105	0	12	0	22	0	0	0
metG	18355_0	193	A–G	D→G	missense	0.06	128	0	11	0	24	0	0	0
metG	33220_2	193	A–G	D→G	missense	0.06	133	0	9	0	17	0	0	0
argS	34330_0	1397	A–G	A→T	missense	0.06	60	0	13	0	11	0	0	0
argS	13263_0	1397	A–G	A→T	missense	0.05	55	0	10	0	10	0	0	0
argS	10031_0	1397	A–G	A→T	missense	0.05	59	0	11	0	12	0	0	0

RA, RC, RG, and RT denote RNA read counts for bases A, C, G, and T, respectively. DA, DC, DG, and DT denote corresponding DNA read counts.

3.1. Distribution of RNA editing sites across taxa

The distribution of the 2,295 RNA edit sites across the microbiome was highly family-specific, with 11 bacterial families accounting for the majority of edit sites (see Fig. 2). The top seven families (Bacteroidaceae, Ruminococcaceae, Rikenellaceae, Porphyromonadaceae, Lachnospiraceae, Clostridiaceae, and Eubacteriaceae) together had over 2,100 of the 2,295 edit sites. Three of these seven families belong to the order Bacteroidales in the phylum Bacteroidota, and this order alone accounts for the majority (1,711) of the edit sites. The other four belong to the class Clostridia in the phylum Bacillota. This non-uniform distribution suggests that RNA editing is not a random occurrence but may represent a lineage-specific regulatory mechanism active in select microbial groups, with particular emphasis on the order Bacteroidales and the class Clostridia. A similar distribution of the count of edit sites by phylum and class is shown in the Appendix (see Figs. A1 and A2). It is worth noting that the edit sites are most abundant in the phylum Bacteroidota, with the Bacillota (formerly Firmicutes) a distant second. A minuscule number were found in the phylum Proteobacteria.

FIG. 2.

Family-level distribution of RNA editing sites across the gut microbiome. Bar plot showing the number of unique high-confidence RNA editing sites assigned to bacterial families. Editing sites are non-uniformly distributed, with Bacteroidaceae and Ruminococcaceae contributing the majority of events, consistent with their high abundance in the Integrative Human Microbiome Project (iHMP2) samples.

Importantly, the top four families were also among the most abundant taxa in the iHMP2 dataset, as reported in previous analyses of gut metagenomes from individuals with and without IBD (Lloyd-Price et al., 2019). For example, Bacteroidaceae alone accounted for over 65% of average microbial abundance in healthy individuals, while Ruminococcaceae and Lachnospiraceae were consistently detected at moderate-to-high abundance across all cohorts. The co-occurrence of high abundance and elevated RNA editing activity in these taxa strongly suggests that editing may play a functional role in modulating gene expression or adaptation in dominant gut microbes.

The edit site number distributions at other levels of the taxonomic hierarchy are provided in the Appendix.

3.2. Site-specific differential RNA editing

To investigate whether individual RNA edit sites differ between disease states, we performed site-level differential analysis using the Kruskal–Wallis test across three diagnostic groups: UC, CD, and controls (non-IBD). After adjusting for multiple comparisons using the Benjamini–Hochberg method, no individual RNA edit sites exhibited statistically significant differences (FDR $< 0.05$) among the groups. This lack of significant site-level differences suggests that RNA editing variations associated with disease states may not be localized at specific sites but could be more apparent at broader levels, such as gene-level aggregations or pathway-wide alterations. Previous research has highlighted that RNA editing in bacteria often occurs in clusters rather than isolated events, indicating potential coordinated regulatory mechanisms. For example, Bar-Yaacov et al. (2017) reported clustered RNA edit sites within specific bacterial genes, suggesting that these clusters may facilitate rapid and synchronized adaptation to environmental stressors or host interactions. Similarly, Nie et al. (2020) observed clustered A-to-I RNA editing sites associated with increased bacterial pathogenicity and stress resistance, further reinforcing the notion that RNA editing operates in a coordinated fashion at the gene or operon level, influencing broader biological pathways essential for specific functions such as bacterial survival and adaptation. These findings underline the significance of investigating RNA editing at the gene level to understand its comprehensive functional implications in microbial ecosystems.

3.3. RNA editing in highly conserved genes across multiple samples

To explore whether RNA editing patterns are functionally meaningful at broader biological scales, we aggregated RNA editing sites at the gene level, enabling a more integrated analysis of editing activity within entire coding regions. While this analysis ignores the microbial taxa in which the editing may be occurring, it is in line with the broader hypothesis that microbiomes act in a coordinated fashion, with the community as a whole serving as a reservoir of gene families that impart functional capabilities to the collective. This approach accounts for the tendency of RNA editing in bacteria to occur in clusters within genes (Bar-Yaacov et al., 2017; Nie et al., 2020), rather than as isolated single-nucleotide changes. Clustered editing also allows the resulting proteins to carry more changes than would be possible with a single-nucleotide edit.

We investigated whether RNA editing preferentially targets functionally important genes by identifying highly conserved bacterial genes that exhibit multiple RNA editing sites across multiple microbiome samples. This analysis aimed to determine whether essential microbial functions, typically encoded by evolutionarily stable genes, are subject to post-transcriptional modifications. Our analysis identified several genes exhibiting extensive RNA editing across numerous samples. Notably, genes such as dnaK (Heat shock 70 kDa protein), gap (Glyceraldehyde-3-phosphate dehydrogenase), groL (Chaperonin), and pckA (Phosphoenolpyruvate carboxykinase) demonstrated both a high number of RNA edit sites and widespread occurrence across more than 200 samples each, indicating pervasive RNA editing activity across diverse microbial populations. The genes with the highest number of edit sites in a large number of samples are shown in Figure 3.

FIG. 3.

Highly conserved genes exhibit widespread RNA editing across samples. Top 20 genes ranked by the number of samples in which RNA editing was detected. Bars indicate the number of samples containing an editing site within that gene, with total edit-site counts shown in parentheses. Many of these genes (e.g., dnaK, gap, groL, pckA) encode essential metabolic or stress-response functions.

Among these highly edited genes, several encode universally conserved microbial proteins, including dnaK, groL, and gap. These genes perform essential cellular functions such as protein folding, stress responses, glycolysis, and energy metabolism, and their evolutionary conservation is well established in bacterial phylogenetic studies (Woese et al., 1990; Ciccarelli et al., 2006; Jordan et al., 2002). The pronounced RNA editing observed in these conserved genes points to an additional layer of regulatory complexity previously underappreciated in microbiological research.

Conventionally, highly conserved genes maintain low sequence variability at the DNA level to preserve essential cellular functions (Jordan et al., 2002). However, our findings reveal frequent RNA edit sites in these conserved genes, suggesting RNA editing as a potential adaptive mechanism allowing transient functional diversification without permanent genomic alterations. Previous studies have highlighted the role of RNA editing in microbial adaptation, particularly under environmental stress conditions (Bar-Yaacov et al., 2017; Nie et al., 2020).

The widespread and consistent RNA editing observed in metabolic enzymes, notably gap and pckA, aligns with previous reports identifying these enzymes as critical under nutrient-limiting and inflammatory conditions frequently associated with diseases such as IBD (Shepherd et al., 2018).

Collectively, our data strongly suggest that RNA editing preferentially targets conserved bacterial genes involved in vital cellular processes. The frequent and widespread editing across multiple taxa and samples underscores the role of RNA editing as a dynamic regulatory strategy, potentially facilitating rapid microbial adaptation to environmental challenges or host-associated conditions.

3.4. Gene-specific differential editing

Next, we investigated the relationship between clusters of edit sites and disease. Using mean RNA edit scores across the genes normalized by approximate gene length, we performed differential RNA editing analysis at the gene level across the three diagnostic groups: UC, CD, and non-IBD controls. After correcting for multiple hypothesis testing using the Benjamini–Hochberg procedure, several genes exhibited significant differences in editing levels at FDR < 0.1. Figure 4 presents a heatmap of the mean gene-level RNA editing scores in the three cohorts. Among the genes showing differential editing, bcd2 (acyl-CoA dehydrogenase, involved in fatty-acid β-oxidation; higher editing in CD), pckA (phosphoenolpyruvate carboxykinase; higher editing in controls), gap (glyceraldehyde-3-phosphate dehydrogenase; lower editing in UC), and hutU (urocanate hydratase, involved in histidine catabolism) stood out for their group-specific editing patterns (see Fig. 5). Note that some of the genes with the largest number of edit sites are present in this heatmap, but are not necessarily front-runners in displaying disease-specific editing.

FIG. 4.

Heatmap of mean gene-level RNA editing scores for selected biologically relevant genes across ulcerative colitis (UC), Crohn’s disease (CD), and non-inflammatory bowel disease (non-IBD) controls. Editing scores represent the mean aggregated site-level editing normalized by gene length. Genes shown were significant at a false discovery rate (FDR) < 0.1 under Kruskal–Wallis testing. Color scale indicates relative editing intensity.

FIG. 5.

Disease-associated differences in RNA editing for key metabolic genes. Violin plots showing the distribution of gene-level RNA editing scores for pckA, bcd2, gap, and hutU across UC, CD, and non-IBD cohorts. Each point represents an individual sample. Group differences were assessed using pairwise Mann–Whitney U tests with FDR correction; adjusted p values are shown.

These results support the hypothesis that RNA editing may be involved in adaptive post-transcriptional regulation of bacterial gene function in response to host-associated conditions. The disease-specific editing patterns observed in metabolic genes such as gap and pckA align with previous reports describing metabolic shifts and microbial functional reprogramming in the gut microbiome during IBD progression (Lloyd-Price et al., 2019). Collectively, our gene-level analysis suggests that RNA editing contributes to functional modulation of microbial activity, potentially enhancing the ability of gut bacteria to respond dynamically to inflammatory environments.

3.5. Gene-based differential RNA editing within taxonomic groups

Further stratifying the analysis by taxonomic group highlighted distinct differential editing patterns at the gene level in each family. Within the Ruminococcaceae family (see Fig. 6), the gene nagA (N-acetylglucosamine-6-phosphate deacetylase, involved in amino sugar metabolism) exhibited significantly lower editing levels in CD samples, while the Bacteroidaceae family (see Fig. 7) showed increased hsp20 (small heat shock protein) editing in UC samples. These taxon-specific patterns illustrate the nuanced interplay between bacterial taxa and host disease states, suggesting that RNA editing within specific taxa contributes to functional adaptations of the microbiome, potentially influencing the progression and severity of IBD.
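The pairwise comparisons behind such taxon-stratified results can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the figure legends state only that Mann–Whitney U tests with FDR correction were used, so the choice of the two-sided normal approximation (with tie and continuity corrections) is an assumption, as are all function names. The caller is assumed to have already restricted the scores to one gene within one taxonomic family.

```python
import math
from itertools import combinations


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    order = sorted(range(n), key=pooled.__getitem__)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        t = j - i + 1  # size of this tie block
        tie_term += t ** 3 - t
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average 1-based rank
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical
        return u1, 1.0
    z = max((abs(u1 - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    return u1, math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def pairwise_tests(group_scores):
    """group_scores: {group: [scores]} for one gene in one family (assumed)."""
    pairs = list(combinations(sorted(group_scores), 2))
    pvals = [mann_whitney_u(group_scores[a], group_scores[b])[1]
             for a, b in pairs]
    return dict(zip(pairs, benjamini_hochberg(pvals)))


# Made-up scores for a single gene within a single family.
toy = {
    "CD": [1, 2, 3, 4, 5],
    "UC": [6, 7, 8, 9, 10],
    "nonIBD": [1.5, 2.5, 3.5, 4.5, 5.5],
}
for pair, q in pairwise_tests(toy).items():
    print(pair, round(q, 4))
```

In this sketch the correction is applied across the group pairs for a single gene. Whether the study corrected within each gene, within each family, or across all tests is not stated here, so that scope is also an assumption.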

FIG. 6.

Taxon-specific gene-level RNA editing patterns. Violin plots showing RNA editing scores for selected genes within the Ruminococcaceae family across diagnostic groups. These analyses highlight lineage-specific RNA editing differences that are masked in global gene-level aggregation. Statistical testing was performed using Mann–Whitney U tests with FDR correction. Adjusted p values are shown.

FIG. 7.

Taxon-specific gene-level RNA editing patterns. Violin plots showing RNA editing scores for selected genes within the Bacteroidaceae family across diagnostic groups. These analyses highlight lineage-specific RNA editing differences that are masked in global gene-level aggregation. Statistical testing was performed using Mann–Whitney U tests with FDR correction. Adjusted p values are shown.

Additional violin plots for significantly differentially edited genes from other families (including Porphyromonadaceae) are shown in Appendix Figure A3.

4. DISCUSSION

Our study highlights RNA editing as an important mechanism for microbial adaptation within the human gut microbiome. Using de novo assembly allowed unbiased detection of editing sites across diverse, understudied bacterial taxa and unclassified novel strains. Enrichment of RNA editing in bacterial families such as Bacteroidaceae and Ruminococcaceae suggests adaptive responses to inflammatory conditions, supporting prior associations with functional dysbiosis in IBD.

We validated our predictions in two ways: (i) overlap with previously confirmed edits in E. coli (Bar-Yaacov et al., 2017; Mehta et al., 2025), and (ii) stringent multi-sample filtering that excludes DNA mutations. Nevertheless, experimental validation (e.g., Sanger sequencing, strand-specific RNA-seq) will be critical in future studies. A limitation of our approach is that low-abundance taxa with fragmented assemblies may be underrepresented because of coverage filters. While this filtering reduces false positives, it may lead us to underestimate editing diversity in rare microbes. In addition, our statistical models do not currently adjust for host and technical covariates (e.g., age, medication, sequencing depth), as these data were sparse in iHMP2. Future work using linear mixed models will better account for confounders in experiments designed to study RNA editing.

The observed taxon-specific RNA editing indicates evolutionary adaptations tailored to distinct bacterial metabolic and physiological demands, consistent with previous studies of isolated bacteria. Despite extensive analyses, we did not identify significant differential editing at individual sites across disease states. This suggests that site-level variations are subtle or that effects act across broader clusters of sites, consistent with the clustered RNA editing patterns described in the literature.

Notably, significant gene-level editing in stress-response and metabolic genes, including dnaK, gap, pckA, and groL, highlights their potential roles in bacterial adaptation to inflammatory stress. To further contextualize these findings, we report effect sizes for gene-level differential RNA editing as both absolute differences in mean editing scores and log₂ fold changes (Supplementary Table S1). Genes exhibiting the largest effect sizes encode enzymes central to microbial metabolism and stress adaptation. For example, gap (glyceraldehyde-3-phosphate dehydrogenase), a key glycolytic enzyme involved in energy production, showed reduced RNA editing in ulcerative colitis relative to non-IBD controls (Δmean ≈ −0.04, log₂FC ≈ −0.39), while pckA (phosphoenolpyruvate carboxykinase), which plays a central role in gluconeogenesis and metabolic flexibility, exhibited lower editing in ulcerative colitis and Crohn’s disease relative to non-IBD controls (Δmean ≈ −0.04 to −0.02, log₂FC ≈ −0.27). In contrast, bcd2 (acyl-CoA dehydrogenase), involved in fatty-acid β-oxidation, displayed elevated RNA editing in Crohn’s disease relative to both ulcerative colitis and non-IBD controls (Δmean ≈ +0.06, log₂FC ≈ +0.23). Together, these effect sizes indicate post-transcriptional modulation of core microbial metabolic pathways in response to host inflammatory environments. Similar metabolic reprogramming of the gut microbiome has been reported in inflammatory bowel disease, where shifts in carbohydrate and lipid metabolism are hallmarks of dysbiosis (Lloyd-Price et al., 2019; Shepherd et al., 2018). Additionally, taxon-specific editing patterns (e.g., nagA in Ruminococcaceae, ompA (outer membrane protein) in Bacteroidaceae) reflect nuanced microbial–host interactions, potentially influencing disease severity.
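The two effect-size measures reported above can be computed as in the following sketch. The sample values are hypothetical and were chosen only so that the output resembles the magnitudes reported for gap; they are not data from the study. The function name is illustrative, and because the handling of genes with a zero mean score is not stated, the sketch simply rejects that case.

```python
import math
from statistics import mean


def effect_sizes(case_scores, reference_scores):
    """Return (delta_mean, log2_fold_change) of case relative to reference."""
    case_mean = mean(case_scores)
    ref_mean = mean(reference_scores)
    if case_mean <= 0 or ref_mean <= 0:
        raise ValueError("log2 fold change needs strictly positive means")
    return case_mean - ref_mean, math.log2(case_mean / ref_mean)


# Hypothetical gene-level editing scores (not data from the study).
uc_scores = [0.11, 0.13, 0.15]
control_scores = [0.15, 0.17, 0.19]

delta, log2fc = effect_sizes(uc_scores, control_scores)
print(f"delta mean = {delta:.2f}, log2FC = {log2fc:.2f}")
# prints: delta mean = -0.04, log2FC = -0.39
```

Reporting both measures is useful because the same absolute difference corresponds to a larger fold change when the baseline editing score is small.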

In conclusion, our findings underscore RNA editing as a critical, dynamic, and taxon-specific regulatory mechanism in the gut microbiome, warranting future experimental validation and functional studies to elucidate its therapeutic potential.

AUTHORS’ CONTRIBUTIONS

A.M.: Conceptualization, methodology, software, formal analysis, investigation, data curation, visualization, writing—original draft. V.S.: Methodology, software, formal analysis, writing—review and editing. K.M.: Resources, biological interpretation. G.N.: Conceptualization, supervision, project administration, funding acquisition, writing—review and editing.

ACKNOWLEDGMENT

The authors thank ICCABS 2025 for the invitation to contribute to the Journal of Computational Biology (JCB) special issue. The authors also thank the Integrative Human Microbiome Project (iHMP2) consortium for generating and making available the multi-omics datasets used in this study. The authors also acknowledge the Florida International University High Performance Computing resources for computational support.

AUTHOR DISCLOSURE STATEMENT

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

FUNDING INFORMATION

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References

Andrews. FastQC: A quality control tool for high throughput sequence data. Babraham Bioinformatics; 2010. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Bar-Yaacov, Mordret, Biniashvili, et al. RNA editing in bacteria recodes multiple proteins and regulates an evolutionarily conserved toxin-antitoxin system. Genome Res 2017;27(10):1696–1703; doi: 10.1101/gr.222760.117

Benjamini, Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B 1995;57(1):289–300.

Blow, Futreal, Wooster, et al. A survey of RNA editing in human brain. Genome Res 2004;14(12):2379–2387.

Cantalapiedra, Hernández-Plaza, Letunic, et al. eggNOG-mapper v2: Functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol Biol Evol 2021;38(12):5825–5829.

Christofi, Zaravinos. RNA editing in the forefront of epitranscriptomics and human health. J Transl Med 2019;17(1):319.

Ciccarelli, Doerks, von Mering, et al. Toward automatic reconstruction of a highly resolved tree of life. Science 2006;311(5765):1283–1287.

Cock PJA, Antao, Chang, et al. Biopython: Freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009;25(11):1422–1423.

Gott, Emeson. Functions and mechanisms of RNA editing. Annu Rev Genet 2000;34:499–531.

Huerta-Cepas, Szklarczyk, Heller, et al. eggNOG 5.0: A hierarchical, functionally and phylogenetically annotated orthology resource. Nucleic Acids Res 2019;47(D1):D309–D314.

Jordan, Rogozin, Wolf, et al. Essential genes are more evolutionarily conserved than nonessential genes in bacteria. Genome Res 2002;12(6):962–968.

Katoh, Standley. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol Biol Evol 2013;30(4):772–780.

Krueger. Trim Galore: A wrapper tool around Cutadapt and FastQC. Babraham Bioinformatics; 2012. Available from: https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/

Langmead, Salzberg. Fast gapped-read alignment with Bowtie 2. Nat Methods 2012;9(4):357–359.

Li, Liu C-M, Luo, et al. MEGAHIT: An ultra-fast single-node solution for large metagenomics assembly. Bioinformatics 2015;31(10):1674–1676.

Lloyd-Price, Arze, Ananthakrishnan, et al.; IBDMDB Investigators. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 2019;569(7758):655–662.

Mallela, Nishikura. A-to-I editing of protein coding and noncoding RNAs. Crit Rev Biochem Mol Biol 2012;47(6):493–501.

Mehta. MEditome: A computational framework for microbial RNA editing discovery and analysis. Version 1.0. GitHub repository, 2025. Available from: https://github.com/ameht014/MEditome

Mehta, Stebliankin, Mathee, et al. MetaEdit: Computational identification of RNA editing in microbiomes. In: Computational Advances in Bio and Medical Sciences. Lecture Notes in Computer Science, vol. 15599. Springer; 2025; pp. 157–170; doi: 10.1007/978-3-032-02489-3_12

Nayfach, Pollard. Toward accurate and quantitative comparative metagenomics. Cell 2016;166(5):1103–1116; doi: 10.1016/j.cell.2016.08.007

Nie, Wang, He, et al. A-to-I RNA editing in bacteria increases pathogenicity and tolerance to oxidative stress. PLoS Pathog 2020;16(8):e1008740.

Picardi, Pesole. REDItools: High-throughput RNA editing detection made easy. Bioinformatics 2013;29(14):1813–1814; doi: 10.1093/bioinformatics/btt287

Quast, Pruesse, Yilmaz, et al. The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools. Nucleic Acids Res 2013;41(Database issue):D590–D596.

Ross, Russ, Costello, et al. Characterizing and measuring bias in sequence data. Genome Biol 2013;14(5):R51; doi: 10.1186/gb-2013-14-5-r51

Shepherd, DeLoache, Pruss, et al. An exclusive metabolic niche enables strain engraftment in the gut microbiota. Nature 2018;557(7705):434–438.

Wang, Wang, Ji, et al. RES-Scanner: A software package for genome-wide identification of RNA-editing sites. Gigascience 2016;5(1):37; doi: 10.1186/s13742-016-0143-4

Woese, Kandler, Wheelis. Towards a natural system of organisms: Proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci USA 1990;87(12):4576–4579.

Wolf, Gerber, Keller. tadA, an essential tRNA-specific adenosine deaminase from Escherichia coli. EMBO J 2002;21(14):3841–3851.

Zhang, Lu, Yan, et al. SPRINT: An SNP-free toolkit for identifying RNA editing sites. Bioinformatics 2017;33(22):3538–3548; doi: 10.1093/bioinformatics/btx473
