Abstract
Background:
Autophagy and apoptosis are cellular processes that maintain cellular homeostasis and remove damaged or aged organelles or aggregated and misfolded proteins. Stress factors initiate the signaling pathways common to autophagy and apoptosis. An imbalance in the autophagy and apoptosis, led by cascade of molecular mechanism prior to both processes culminate into neurodegeneration.
Objective:
In present study, we urge to investigate the codon usage pattern of genes which are common before initiating autophagy and apoptosis.
Methods:
In the present study, we took up eleven genes (DAPK1, BECN1, PIK3C3 (VPS34), BCL2, MAPK8, BNIP3 L (NIX), PMAIP1, BAD, BID, BBC3, MCL1) that are part of molecular signaling mechanism prior to autophagy and apoptosis. We analyzed dinucleotide odds ratio, codon bias, usage, context, and rare codon analysis.
Results:
CpC and GpG dinucleotides were abundant, with the dominance of G/C ending codons as preferred codons. Clustering analysis revealed that MAPK8 had a distinct codon usage pattern compared to other envisaged genes. Both positive and negative contexts were observed, and GAG-GAG followed by CTG-GCC was the most abundant codon pair. Of the six synonymous arginine codons, two codons CGT and CGA were the rarest.
Conclusions:
The information presented in the study may be used to manipulate the process of autophagy and apoptosis and to check the pathophysiology associated with their dysregulation.
Keywords
INTRODUCTION
Autophagy is a catabolic process that is essential to maintain cellular homeostasis. Homeostasis is maintained by the degradation and recycling of degraded cellular products like aggregated or misfolded proteins, and damaged organelles. Autophagic dysfunction has been linked to various ailments including obesity [1], heart diseases [2], neurodegeneration [3], iron accumulation [4], cancer [5], diabetes [6] autoimmune [7] and inflammatory diseases [8], and general aging of the body [9].
Apoptosis is an evolutionary conserved programmed cell death mechanism. It brings morphological changes coupled with enzyme-dependent biochemical processes and cellular removal from the body [10]. Low apoptosis with high proliferation rates provokes cancer. On the other hand, higher apoptosis rates, with standard cellular proliferation rate, results in degenerative diseases like Alzheimer’s disease (AD), Parkinson’s disease (PD), and rheumatoid arthritis.
With the growing age, autophagy decreases [11] while apoptosis increases [12], and there are molecular mechanisms that regulate the cross-talk between apoptosis and autophagy [13]. Dysregulated autophagy and apoptosis are causatives of several neurodegenerative disorders like familial parkin disorder [14, 15], AD and focal cerebral ischemia [16], and Huntington’s disease progression [17], amyotrophic lateral sclerosis, PD, AD, and polyglutamine (polyQ) disease [18], and amyotrophic lateral sclerosis [19] and an interplay between autophagy and apoptosis has been meticulously reviewed by Mariño et al. [20].
Considering the importance of autophagy and apoptosis in the pathology of neurodegenerative disorders, we investigated common pathways leading to autophagy and apoptosis. Based on an extensive review by Mariño et al. [20], we found eleven genes (DAPK1, Beclin1, VPS34, BCL2, JNK, NIX, NOXA, BAD, BID, PUMA, and MCL1) those are shared in the process to autophagy and apoptosis. In the present study, we performed compositional analysis, dinucleotide odds ratio analysis, codon usage, codon context, and rare codon analysis with the investigation of evolutionary forces (selection and mutation) acting on gene sequences. Before the initiation of the process of autophagy and apoptosis, a shared pathway encompassing these eleven genes is present. Understanding molecular patterns present in these genes might help understand their functions and, in the future, might be used to target aberrant autophagy and apoptosis leading to neurodegenerative disorders.
MATERIALS AND METHODS
Sequence retrieval
The transcripts of DAPK1, Beclin1, VPS34, BCL2, JNK, NIX, NOXA, BAD, BID, PUMA, and MCL1 genes were retrieved from National Centre for Biotechnology Information (NCBI). Transcripts’ lengths varied between 165 bp and 4293 bp. The transcripts started with ATG, ended with stop codons, and devoid of ambiguous bases were considered for the study. For the DAPK1, BECN1, PIK3C3, BCL2, MAPK8, BNIP3L, PMAIP1, BAD, BID, BBC3, and MCL1 genes 05, 05, 02, 04, 17, 02, 05, 02, 07, 05, and 03 transcripts, respectively, fulfilled the criteria. Accession numbers of transcripts are given in Supplementary Table 1. We understand that the number of genes is less to get statistically very accurate data, but this is the highest number of genes responsible for common pathways resulting in autophagy and apoptosis. To compare our data with the unrelated family of sequences, we compared our sequences with that of transcripts associated sequences common in cancer and neurodegeneration [21].
Compositional analysis
Overall Nucleotide composition % A, % T, % C, and % G was determined along with the composition at the first, second, and third codon positions (% A3, % T3, % C3, and % G3). Percent GC3 is the determinant of both composition and codon bias [22]. The average % GC at the first and second codon position (% GC12) and % GC at the third codon position (% GC3) were determined to conduct regression analysis. Regression analysis is deciphering the neural forces acting on the nucleotide sequence.
Dinucleotide odds ratio
The odds ratio suggests the likelihood of any dinucleotide in the coding sequence. It is the ratio between the observed and expected frequency of the dinucleotide. Dinucleotides with values below 0.78 and above 1.23 are considered underrepresented and overrepresented, respectively [23].
Relative synonymous codon usage (RSCU)
RSCU is a measure for determining codon bias depicted as the ratio of the observed to the expected frequency. Values below 0.6 indicate underrepresented codons, while values above 1.6 indicate overrepresented codons [24]. RSCU values >1 and <1 indicate preferred and nonpreferred codons, respectively [25]. Codon usage difference between envisaged sequences and the sequences common in cancer and neurodegeneration was determined through t-test analysis with 10000 permutation combinations.
Codon adaptation index (CAI)
Sharp and Li (1987) [26] proposed the codon adaptation index as a measure of codon adaptation of a given gene. Higher CAI value showed a higher level of gene expression. CAIcal server developed by Puigbo et al. [27] was used to determine the CAI and the value range between 0 and 1.
Effective number of codons (ENc)
ENc is a non–directional measure of codon usage bias; thus, higher values show lower bias. Due to mutation pressure and genetic drift, % GC3 composition influences Enc [28]; however, it remains unaffected by gene length. The value ranges between 20 and 60. The value of 20 indicates extreme bias where only one codon is used out of the synonymous codons available. In contrast, 60 indicates that all available synonymous codons are used for encoding an amino acid. The values above 45 are generally considered lower bias [29].
Clustering analysis
For clustering purposes, ClustVis web tool was used that uses the dist and hclust functions based on R [30]. A heatmap is generated based on RSCU values.
Codon pair context analysis
Codon pair context provides insight into the possibilities of co-occurrence of two codons together. Some codon pairs appear greater in number in any gene than other codon pairs, called codon pair bias. Codon pair frequency was determined using Anconda2.0 software that prepares the table of codon pair usage based on contingency analysis and preferred and non-preferred pairs of codons are obtained [31].
Rare codon analysis
Although optimal codons are widely taken as a parameter that is responsible for the higher expression of genes, the significance of rare codons also cannot be ignored. Rare codons may be present within the ramp sequence, which is a 20–40 nucleotide stretch and present at the 5’ end of any gene, controlling the rate of translation and thereby regulating the proper folding of the protein [32]. Moreover, the addition of rare codons may influence the translation rate and resulting protein yield and solubility [33]. Rare codon analysis was done in Anaconda 2.0 software [34].
RESULTS
Compositional analysis revealed GC-rich transcripts
Based on the compositional analysis, it was evident that the gene transcripts are GC-rich. Overall, % GC was 53.54±10.3%, while % AT was 46.45±10.3%. The overall GC percentage appeared wide, from 76.28% (BBC3) to 40.45% (PIK3C3). Overall average % G, % A, and % G composition were the highest while % T, % G, and % T were the lowest at the first, second, and third codons, respectively (Fig. 1).

Percentage composition of nucleotides at three codon positions. In composition % A1, % G1%, C2%, T2%, and G2% outliers are present. Boxes represent the compositions of 25% –75% of samples. The small box inside the outer bigger box indicated the mean value. The figure shows that % T composition at the first codon position is the least among all nucleotides, and percent composition varied highly for all four nucleotides at the third codon position.
Odds ratio analysis revealed overrepresented ApG, CpC, GpA, GpC, and GpG dinucleotides
We analyzed 16 dinucleotide pairs present, and the average value of five dinucleotides (ApG, CpC, GpA, GpC, GpG) was found overrepresented (odds ratio>1.23). For dinucleotide, CpC and GpG, the odds ratio was more than 1.3; for the other three dinucleotides, it was between 1.23 and 1.3 (Fig. 2). CpC and GpG dinucleotides were highly used, while TpA and GpT were the least used dinucleotides.

Dinucleotide odds ratio for genes common in molecular mechanisms leading to autophagy and apoptosis.Bars representing the CpC and GpGdinucleotides showed maximum odds ratio. The result demonstrates that the odds ratio varied very little for CpT and GpAdinucleotides, while it is pretty variable for CpC and CpG. The small box inside the outer bigger box indicated the mean value. Based on the figure, it is evident that % T composition at the first codon position is the least among all nucleotides, and percent composition varied highly for all the four nucleotides at third codon position.
RSCU analysis revealed dominance of G/C ending codons
Based on the average RSCU value, we determined the most preferred synonymous codon for each amino acid, and out of 18 amino acids, G/C ending codons were preferred for fifteen amino acids. In contrast, only three preferred codons were A/T ending. The RSCU table for all eleven envisaged genes is given in Table 1.
Average RSCU value for genes
RSCU values of the envisaged genes were compared with the RSCU values of the sequences common to cancer and neurodegeneration, and the t-test was performed. The result is given in Table 2 The results revealed that out of 59 codons (excluding Met, Trp, and stop codons), for 17 codons, the codon usage was significantly different among the envisaged genes and genes commonly involved in cancer and neurodegeneration.
Comparison of codon usage of autophagy apoptosis-related sequences with that of sequences involved in cancer and neurodegeneration [21]
***p<0.001; **p<0.01; *p<0.05; NS, non-significant.
With increasing codon bias, gene expression is reduced
To determine the effect of codon bias on gene expression, we did a correlation analysis between CAI and codon bias (ENc). We found a significant negative Pearson correlation (r =–0.668, p<0.0001), suggestive of the fact that with increasing bias in codon usage, gene expression also increased. A positive correlation of CAI with overall GC content (0.425, p<0.001) was found. The analysis suggested that if codon bias is enhanced, protein expression is also increased, and thus, manipulation in the codon bias might alter gene expression level.
Clustering analysis revealed different codon usage pattern of MAPK8 gene from all other genes
When the transcripts of the eleven genes envisaged were subjected to hierarchal clustering, the transcripts were clustered mainly into two groups. MAPK8 gene transcripts clustered separately from all other gene transcripts. All transcripts belonging to a single gene were clustered together except for MCL1 transcripts. A total of 3 transcripts were present in the MCL1 gene, where two transcripts (NM_021960.5 and NM_182763.3) clustered separately from one transcript (NM_001197320.2) (Fig. 3).

RSCU based clustering analysis for genes common to molecular mechanisms involving autophagy and apoptosis. Out of 3 transcripts of the MCL1 gene, two transcripts clustered separately from the remaining transcript. MAPK8 gene transcripts formed a separate cluster from the other transcripts, showing a different gene origin.
Codon context
Sometimes codon contexts appear to act as molecular signatures [35]. Both codon bias and codon context bias analyses have implications for optimizing and facilitating gene expression [36]. There is a presence of codon context in the genes common in molecular mechanisms following autophagy and apoptosis. The 5’ context is given in Fig. 4. Codon context at the start and +2 codon might influence gene expression by 20 folds and result from an evolutionary adaptation [37]. A context bias is far stronger in low-expressed genes than in high-expressed ones, and context-level polarity exists where low and reproducible gene product levels are required [38]. Both preferred and non-preferred codon pairs are present in the present study. The GAG-GAG codon pair was most abundant, followed by the CTG-GCC codon pair. If the codon context changes, a codon pair may become unfavored. GAG-TGC and GAG-TCG are examples, present only three times in all the envisaged sequences.

Differential display map for codon context analysis. The color coding scale is indicated in the figure. A positive context is displayed with pink, while a negative context is displayed with blue. Here both X and Y axes are 5′ and 3′ codons of a codon pair, respectively.
Out of six, two Arg codons are rarely used in genes common to molecular mechanisms leading to autophagy and apoptosis. The frequency of the TAA codon was the least and TGA codon was relatively common as stop codon. Codons CGT and CGA encoding for Arg were the least used codons proceeding by stop codons. The most abundant codons were CTG (Leu), GAG (Glu), and CAG (Gly), with frequencies 3.82%, 3.68%, and 3.36%, respectively.
DISCUSSION
Autophagy and apoptosis are self-destructive mechanisms that eliminate aged or damaged cells and organelles. Apart from the homeostatic function, apoptosis also has a role in metabolism during starvation conditions. In response to several kinds of stresses, the cell may undergo preferentially through autophagy or apoptosis, and the choice depends on the magnitude of the stimulus. Stresses like reactive oxygen species, ceramide, the elevation of cytosolic Ca2+, p53, BH3-only proteins, and death-associated protein kinases can stimulate both apoptosis and autophagy [39]. A critical balance between autophagy and apoptosis is essential for cell fate, especially for long-lived cells, including neurons. Neurodegenerative disorders like PD result from progressive neuron loss in the substantianigra pars compacta. In PD, BCL2 family members play a major role in regulating apoptosis and autophagy. Therefore, restoring dysregulated autophagy and apoptosis could be a promising strategy for treating PD [40]. Cross-talk between autophagy and apoptosis has also been shown in heart disease [41]. In the case of atherosclerosis, moderate autophagy can protect cells from apoptosis and necrosis through clearing damaged organelle, reduced inflammation, and plaque stabilization, and dysregulated autophagy promotes inflammation and leads to the development of atherosclerosis [42]. Although autophagy and apoptosis are markedly different mechanisms, some pathways regulate both autophagy and apoptosis and can supplement apoptosis [41]. Since both processes are involved in complex cross-talk, and few mechanisms lead to both autophagy and apoptosis, if we control these mechanisms, eventually, it will regulate autophagy and apoptosis. Based on an exhaustive review by Mariño et al. [20], we chose eleven genes responsible for regulating these degradative types of machinery. If we could modulate the expression of these genes, in the future, we would be able to control or at least delay ailments caused by aberrant autophagy or apoptosis.
We first initiated the molecular characterization and did nucleotide compositional analysis. Nucleotide composition influences codon usage and is specific within genes or species. It is a feature that uniquely affects the codon usage bias [43]. There is a difference in codon usage pattern in prokaryotes and eukaryotes. The G ending or C ending codons are generally preferred in eukaryotes compared to AT ending codons. The conclusion is based on a comparison of prokaryotes and eukaryotes having similar GC content (p = 0.75) [44]. Our study is no exception to the general phenomenon, and the genes were found GC rich. In the envisaged genes, G and G3 were the highest while T and T3 were lowest. In the SPANX gene in human, A and G3 were high while T and T3 were low [45]. It is a general observation that if a genome is AT-rich, A/T ending codons are preferred while vice versa in the opposite case [46]; with some exceptions, the case like in the Drosophila genome where 65% of nucleotide content is AT, still in all optimal codons are G or C endings [47]. In the present study also maximum of the preferred codon was G/C ending. The impact of nucleotide composition greatly affected the dinucleotide content. In several organisms, dinucleotide odds ratio is shown to affect the overall codon usage bias [48]. Dinucleotides reflect the overall composition and indicate a selection or role in physiological functions [22]. TpA dinucleotide was the least used, followed by GpT, TpT, and ApT. The TpA dinucleotide was found to be underrepresented in several studies and found selectional forces behind it [49]. Being part of two of the three stop codons (TAG and TAA) and roles in mRNA destabilizing effect, TpA dinucleotide is subjected to negative selection [50]. CpG is the second dinucleotide that is frequently under negative selection; however, in the present study, it was only marginally underrepresented, and in fact, in three genes, BBC3, BCL2, and MCL1, it was indeed overrepresented. The CpG dinucleotide has been found to be overrepresented in a few predicted isoforms of pro-apoptotic Bim gene isoforms and speculated to be under positive selective forces [22]. The CpG residues finely tune the transcriptional activity [51] and expression level, and changes in the intragenic level of CpG content might result in the depletion in protein quantities [52]. CpC and GpG overrepresentation were observed in our study in concordance with our previous results obtained in proapoptotic Bim gene isoforms [22]. CpC has been implicated in non-CpG methylation and indicated in disorders including AD, Rett syndrome, fragile X syndrome, PD, and amyotrophic lateral sclerosis [53]. GGGGCC stretch harboring both the GpG and CpC are present in the C9orf72 gene [54], with the abundance of GpG dinucleotide [50]. GpG and CpC dinucleotide overabundance could be the result of the composition factor, too, since our sequences are GC-rich. High CpG content is allied with higher proinflammatory cytokine production and NF-κB activation that eventually leads to high pathogenicity of bacterial and viral pathogens [24], the contrarily long-term evolutionary history of SARS-CoV-2 lineage showed a gradual reduction in CpG content potentially to escape from their host defense mechanisms [55]. Thus, the dinucleotide pattern reveals several unique molecular features. In the present analysis, the unique pattern of dinucleotide is obtained that might be attributed to a complex interplay of factors, including selection, mutation, nucleotide composition, and GC-biased gene conversion [24]. In our study, GC content was shown to be associated with the codon adaptation index, and our results are in concordance with the large-scale analysis of codon usage bias in 4868 bacterial genomes [56]. Codon pair context is an important phenomenon influencing gene expression. Some codon pairs are preferred while others are not and called codon pair bias. It affects translational fidelity and efficiency and is under selective pressure. The significance of synonymous codon pairs has been implicated in diseases [57,58, 57,58]. Stop codon readthrough occurs when ribosomes misread the stop codon and insert an amino acid instead of chain termination. It is desirable when a disease is caused by a premature termination codon in a critical gene. It has been found that stop codon context plays a role in the stimulation of termination codon readthrough by aminoglycosides, and this miscoding has been observed during the translation of histone genes, selenoprotein genes, and S-adenosylmethionine decarboxylase [59]. Nonsense mutations leading to the formation of stop codons are a potential cause of peroxisomal biogenesis disorders. One of the most severe peroxisome biogenesis disorders is Zellweger Syndrome, which has fatal cerebral, renal, and hepatic symptoms. The impact of readthrough-stimulation inducer drugs on stop codon is highly context-specific. It is a deciding factor in choosing the correct drug and amount to be given to patients for these readthrough-based therapies [60]. Furthermore, translation termination is context-dependent in Ciliates where alternative nuclear genetic codes are present [61]. The leaky stop codon context is found to be associated with codon context after the stop codon that determines readthrough efficiency [62]. Identical codon pairing increases translational efficiency within the genes since similar codons encoding for the same amino acid are translated by the same tRNA before it is diffused from the ribosome [63]. GAG-GAG codon pair was the most abundant codon pair in our study. The same codon pair is abundant in leaf cutter ant (Atta cephalotes) [64], and (Marsupenaeus japonicas) [65] and thus shows some evolutionary significance. Among different populations, CTG, GAG, and ACA codon pairing were significantly different [66]. Thus, codon pairing also appears to be part of the selectional drive and influences protein expression. In the experiment of Chevance et al. (2014), codon pairs’ impact on the translation speed rate in lac operon has been studied. A codon context with arginine codons and successive proline codons were amongst the slowest codon pairs translated in vivo. Two arginine encoding codons CGA and CGT were the rarest in the present study, and when their context with 3’ proline was tested, we found that with CGA-CCG, CGA-CCC, and CGA-CCA were present in the frequency of 4, 5, and 17. While CGA-CCU was absent. The codon context of CGU with any of the proline codons was completely absent, and we speculated strong detrimental effects of this codon pair on protein expression, which resulted in selection force-based elimination of these codons. Not only the optimal codons but the rare codon also influences gene expression. Rare codons prevent the traffic jam of ribosomes during the translation process; at rare codons, ribosomes halt, and the protein is folded correctly downstream. Rare codons can cause ribosome stalling at the 5’ end and may inhibit translation initiation itself [32]. As a proof of concept, in an Escherichia coli model, a molecular device is constructed with an engineered rare codon by Wang and colleagues [68] to optimize the metabolic pathways using the rare codon AGG. Altered expression level is observed in fabZ, fabG, fabI, and tesA’ genes, the genes involved in the fatty acid synthesis pathway. Our analysis is completely based on in-silico work. Thus, we cannot prove the same claim for our envisaged genes until it is validated in an in vivo system (which can be part of further investigation by other researchers); however, the information from the experiment of Wang et al. [68] underscores the significance of the usage of rare codon as a molecular device to regulate protein expression and our result could further fill the knowledge gap for genes present as a common pathway leading to apoptosis and autophagy.
Conclusion
Autophagy and apoptosis are imperative to cell homeostasis and share common molecular processes. Therefore, manipulating these molecular mechanisms may pave the way to manipulate and control autophagy and apoptosis. In present study, we envisaged eleven genes (DAPK1, BECN1, PIK3C3 (VPS34), BCL2, MAPK8, BNIP3 L (NIX), PMAIP1, BAD, BID, BBC3, MCL1) that are part of molecular signaling mechanism prior to autophagy and apoptosis. MAPK8 gene transcripts clustered separately from all other gene transcripts showing independent evolution of the gene. Codon pairs GAG-GAG and CTG-GCC were the most abundant codon pairs in the envisaged genes, thus disrupting this context might be used role in reducing the expression of these genes.CTG (Leu), GAG (Glu), and CAG (Gly), were the most frequent codons while CGT and CGA (both encoding for Arg) were the rarest codons (Table 3). Rare codons CGT and CGA also can be placed in ramp sequence to limit the rate of translation and diseases which arose due to misfolded protein might be addressed. Disrupting favorable codon pairs and introduction of rare codons slows down the rate the translation while introduction of optimal codons improves the rate. Our study investigated molecular characters like overall composition and associated parameters like dinucleotide odds ratio and RSCU analysis. We performed codon context, codon pair, and rare codon analyses, and such information might help in manipulating the said processes employing the advanced gene modification techniques. It can be used in fighting ailments caused by aberrant autophagy and apoptosis.
Rare codon analysis of genes common to molecular mechanisms leading to autophagy and apoptosis
AUTHOR CONTRIBUTIONS
Rekha Khandia (Conceptualization; Data curation; Investigation; Methodology; Supervision; Writing – original draft; Writing – review & editing); Pankaj Gurjar (Conceptualization; Formal analysis; Investigation; Software; Writing – original draft; Writing – review & editing); Victoria Romashchenko (Conceptualization; Formal analysis; Software; Writing – review & editing); George Zouganelis (Conceptualization; Data curation; Writing – original draft; Writing – review & editing); Sami A. Al-Hussain (Conceptualization; Data curation; Writing – original draft); Athanasios Alexiou (Conceptualization; Data curation; Validation; Writing – original draft); Magdi E.A. Zaki (Conceptualization; Data curation; Formal analysis; Writing – original draft; Writing – review & editing).
Footnotes
ACKNOWLEDGMENTS
The authors appreciate the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU), Saudi Arabia; Barkatullah University, India; Saveetha Institute of Medical and Technical Sciences, India; Novel Global Community Educational Foundation,; Australia; RUDN University, Russian Federation; and University of Derby, United Kingdom for conducting this work.
FUNDING
The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research through the project number IFP-IMSIU-2023131. The authors also appreciate the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) for supporting and supervising this project.
CONFLICT OF INTEREST
The authors declare no conflict of interest. Rekha Khandia and Pankaj Gurjar are Editorial Board Members of this journal but were not involved in the peer-review process of this article nor had access to any information regarding its peer-review.
DATA AVAILABILITY
The data supporting the findings of this study are available within the article and/or its supplementary material.
