Abstract
The transition from genomic ribonucleic acid (RNA) to deoxyribonucleic acid (DNA) in primitive cells may have created a selection pressure, arising from a global weakening of the N-glycosyl bonds, that refined the genetic alphabet. Hydrolytic rupture of these bonds, termed deglycosylation, leaves an abasic site, which is the single greatest threat to the stability and integrity of genomic DNA. Deglycosylation rates are highly dependent on the identity of the nucleobase. Modifications to the bases, such as deamination, oxidation, and alkylation, can further increase deglycosylation rates, suggesting that the native bases provide optimum N-glycosyl bond stability. To protect their genomes, cells have evolved highly specific DNA repair enzymes called glycosylases that detect and remove these damaged bases. In RNA, however, many of these modified bases are introduced deliberately. This dichotomous treatment of base modifications may have originated in the RNA world. Modified bases would have been advantageous for the functional and structural repertoire of catalytic RNAs. Yet in an early DNA world, the utility of these heterocycles was greatly diminished, and their presence posed a distinct liability to the stability of cellular genomes. Natural selection for the bases most resistant to deglycosylation, along with the recruitment of DNA repair, would have ensured the viability of early DNA life. Key Words: DNA; Nucleic acids; RNA world; Asteroid; Chemical evolution; Ribozymes. Astrobiology 12, 884–891.
Introduction: A Selection Pressure in the Early DNA World
Prebiotic Chemistry and Alternative Bases
Studies in prebiotic chemistry routinely suggest that the native bases, while synthetically accessible and common, were likely present in a mixture with numerous other nucleobases (Orgel, 2004; Borquez et al., 2005; Benner et al., 2010). Deamination, aromatic substitution, oxidation, and alkylation of exocyclic amines are examples of modifications that readily occur with purines and pyrimidines under prebiotic conditions (Robertson and Miller, 1995a; Robertson and Miller, 1995b; Shapiro, 1995; Levy and Miller, 1998; Shapiro, 1999; Siegel and Tor, 2005; Powner et al., 2009; Barks et al., 2010). Analyses of carbonaceous meteorites also suggest that both the extraterrestrial and the early terrestrial environments may have been populated with a diversity of related heterocycles (Botta and Bada, 2002; Martins et al., 2008; Callahan et al., 2011). Furthermore, work showing that alternative bases and base pairs can both replace and expand the genetic alphabet continues to underscore the question of why nature selected the native letters (Piccirilli et al., 1990; Benner, 2004; Benner and Sismour, 2005; Chiba and Inouye, 2010). The discovery of bacteriophages in which a modified base completely replaces one of the native letters (Fig. 1) is a testament to the utility of modified letters in a functional DNA alphabet. Pyrimidine derivatives appear to be the most common of these modifications (Warren, 1980), but in at least one case 2,6-diaminopurine (Dap) was found to completely replace adenine (Kirnos et al., 1977).

Examples of bacteriophages that employ a modified base (HmU, 5-hydroxymethyluracil; HmC, 5-hydroxymethylcytosine; mC, 5-methylcytosine; Dap, 2,6-diaminopurine; Hyp, hypoxanthine) completely replacing one of the native bases in their genomes (Warren, 1980). All of the modified bases shown here are also considered prebiotically relevant heterocycles, and the purines have recently been identified in meteorites (Callahan et al., 2011). Interestingly, while hypoxanthine (present in nucleic acids as the nucleoside inosine) is routinely read as guanine in RNA, it has not been identified in functional bacteriophage DNA genomes.
Several selection pressures favoring the native bases over other accessible heterocycles have been discussed, including greater photochemical stability (Abo-Riziq et al., 2005; Serrano-Andres and Merchan, 2009), decreased susceptibility to tautomerization (Roberts et al., 1997; Robinson et al., 1998), and greater resistance to decomposition (Levy and Miller, 1998). Consequently, many investigations have assumed that the native bases could have been selected during the prebiotic epoch (Powner et al., 2009; Powner et al., 2010; Sutherland, 2010), in a pre-RNA world (Joyce, 2002; Bean et al., 2009; Engelhart and Hud, 2010), and/or in the RNA world (Joyce, 1989; Bean et al., 2007). However, the emergence of DNA would have offered another plausible opportunity for base selection or refinement.
The Emergence of Labile N-Glycosyl Bonds and DNA Repair
The transition from genomic RNA to DNA is widely accepted to be the result of a selection pressure on early forms of life to overcome the kinetic instability of the 3′,5′-phosphodiester bond in ribonucleotides (Fig. 2) (Lazcano et al., 1988; Li and Breaker, 1999). However, removal of the 2′-OH group in deoxyribonucleotides weakened the N-glycosyl bonds (Lindahl, 1993). DNA is therefore vulnerable to a specific type of hydrolytic damage called deglycosylation, the loss of a nucleobase via rupture of the N-glycosidic bond. Unlike in RNA, where the 3′,5′-phosphodiester bonds undergo transesterification by the neighboring 2′-OH group, the loss of genetic information and backbone stability in DNA depends on the specific identity of the bases (Gates, 2009). Under physiological conditions, the purines (A/G) deglycosylate more frequently than the pyrimidines (C/T) (Fig. 2), and heat, divalent metal ions, or low pH can accelerate these reactions even further (Lindahl, 1993). RNA can also suffer depurination, but only at significantly reduced rates and at low pH (Kochetkov, 1972) (Fig. 2). Importantly, even slight modifications to the DNA bases, such as deamination, methylation, or oxidation, typically increase deglycosylation rates, suggesting that the native bases may provide optimum N-glycosyl bond stability (Schroeder and Wolfenden, 2007; Gates, 2009).
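Half-life values such as those summarized in Fig. 2 imply first-order (or pseudo-first-order) kinetics, so they can be converted to rate constants and compared directly. The sketch below assumes simple first-order decay of each N-glycosyl bond; the function names are hypothetical, and any half-lives passed to them are illustrative inputs, not measured values from the studies cited here.

```python
import math


def rate_constant_from_half_life(half_life: float) -> float:
    """First-order rate constant k = ln(2) / t_half (units: 1/time)."""
    if half_life <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life


def fraction_intact(half_life: float, elapsed: float) -> float:
    """Fraction of N-glycosyl bonds still intact after `elapsed` time,
    assuming first-order decay: exp(-k * t)."""
    k = rate_constant_from_half_life(half_life)
    return math.exp(-k * elapsed)


def relative_rate(half_life_a: float, half_life_b: float) -> float:
    """How many times faster bond A cleaves than bond B (k_A / k_B).
    Since k = ln(2) / t_half, the ratio reduces to t_half_B / t_half_A."""
    if half_life_a <= 0 or half_life_b <= 0:
        raise ValueError("half-lives must be positive")
    return half_life_b / half_life_a
```

For example, with hypothetical half-lives of 2 and 40 time units, `relative_rate(2.0, 40.0)` returns 20.0, meaning the first bond would cleave about 20 times faster than the second. Working with rate constants rather than raw half-lives makes such comparisons between bases independent of the time units chosen.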

Half-life values for spontaneous damage to RNA and DNA. Both polymers are highly susceptible to spontaneous hydrolysis and subsequent chemistry, as indicated by the color (red: most vulnerable, blue: vulnerable, black: minimally vulnerable) and width of each arrow, based on experimentally determined kinetics for single-stranded polymers at neutral pH, extrapolated to 25–37°C. Although an arrow marks only one 5′-phosphodiester bond, every other RNA backbone linkage is also vulnerable to cleavage. Phosphodiesters in DNA appear to be essentially stable within the lifetime of any organism. Instead, it is the formation of abasic sites by deglycosylation that exposes the Achilles' heel of DNA. Abasic sites in DNA, being hemiacetals, are in equilibrium with their open-chain aldehydes (about 1%) and are prone to β-elimination reactions and strand cleavage. Experimental conditions:
The product of deglycosylation is called an apurinic or apyrimidinic (AP) site, also known as an abasic site (Fig. 2) (Lindahl, 1993). The formation of an AP site is the single greatest threat to the integrity and stability of DNA (Lindahl, 1993; Gates, 2009). AP sites are both powerful mutagenic lesions (Loeb and Preston, 1986) and, because of their reactivity, cytotoxic species that can lead to strand cleavage (Shapiro, 1981; Lhomme et al., 1999; Boiteux and Guillet, 2004). In RNA, by contrast, abasic sites are more resistant to degradation than their DNA counterparts (Fig. 2) (Küpfer and Leumann, 2006). It is important to note that while DNA is extremely resistant to direct phosphodiester bond cleavage (Williams et al., 1999; Schroeder et al., 2006), after deglycosylation it can readily suffer from the same problem that plagues RNA, namely backbone cleavage (Eigner et al., 1961; Sugiyama et al., 1994).
The generation of abasic sites would have posed a formidable barrier to the persistence of DNA life had it not been for the recruitment of repair proteins (Jensen, 1976; Friedberg et al., 2006). Modern cells devote substantial resources to the surveillance and maintenance of their genomes, including the repair of AP sites and damaged bases (Lindahl and Wood, 1999). The base excision repair (BER) pathway is a preeminent cellular defense mechanism that uses some enzymes to detect and remove lesions and others to repair AP sites (Baute and Depicker, 2008). The need for cells to recruit enzymes that repair AP sites could have been the primary pressure behind the origin of this highly sophisticated pathway. AP-site repair activity has recently been discovered within the enzymatic capabilities of a DNA polymerase as well, hinting at an early pressure to combine genomic maintenance with proper replication (Banos et al., 2010). Particularly advantageous to the evolution of BER, however, was the recruitment of a class of enzymes known as glycosylases, which specifically find and remove damaged bases as a preventive measure to ensure the “health” of a cell's genome (O'Brien, 2006). Primitive glycosylases may have evolved to differentiate bases simply by the relative ease of glycosidic bond excision. This ability to distinguish normal from damaged bases on the basis of N-glycosyl stability has been observed even in modern glycosylases (O'Brien and Ellenberger, 2004; Bennett et al., 2006; O'Brien, 2006). Many extant glycosylases, however, are proposed to use other strategies to detect and remove lesions (Friedman and Stivers, 2010). But in an early DNA world that relied on less sophisticated forms of BER, glycosylases exploiting differences in glycosidic bond stability would seem to be the simplest solution.
Greater N-Glycosyl Stability May Have Aided in the Utility of Diverse Bases in the RNA World
Intriguingly, many of the damaged DNA bases so diligently removed by glycosylases are exactly the modifications that proteins introduce into transfer RNA (tRNA), ribosomal RNA (rRNA), and messenger RNA (mRNA) (Table 1). Cells seem to exploit the enhanced stability of N-glycosyl bonds in RNA. Modified bases such as alkylated purines and pyrimidines (e.g., 7-methylguanine, 3-methylcytosine) that are unstable lesions in DNA are found in RNAs (Limbach et al., 1994; Gates, 2009). Hypoxanthine, which has a weaker glycosidic bond than A or G and is a particularly potent mutagenic lesion in DNA (Schroeder and Wolfenden, 2007), is a ubiquitous modification in RNA, where it serves as a reliable guanine surrogate in RNA editing (Fig. 1, Table 1) (Nishikura, 2010). Even uracil, the base excluded from the DNA alphabet, is known to deglycosylate faster than thymine at neutral pH (Shapiro and Kang, 1969). Not all modifications found in RNA are necessarily excluded from DNA. Two nucleobases that are used in DNA, 5-methylcytosine and 5-hydroxymethylcytosine (Table 1, Fig. 1), present interesting peculiarities. These bases, along with the parent cytosine, are notorious for their rapid rates of spontaneous deamination compared with the other letters (Fig. 2) (Levy and Miller, 1998). Yet cytosine is the one base that nature has selected for modification, and its modified forms can make up a substantial presence in DNA (Poole et al., 2001; Nabel et al., 2011). From the viewpoint of genetic fidelity, deamination of C, mC, and HmC is the most problematic, but with regard to DNA stability, cytosine retains one of the strongest glycosidic bonds (Fig. 2).
A detailed study that used deoxynucleosides to measure spontaneous deglycosylation rates further showed that cytosine forms the strongest N-glycosyl bond of all the native bases (Schroeder and Wolfenden, 2007). One possible explanation for nature's particular choice of the C5 position of cytosine for modification in DNA is that such changes have a comparatively minimal global impact on N-glycosyl stability. Many other modifications highlighted in Table 1 (D, HoU, 1mA, N6mA, isoG), while employed in RNA, are necessarily removed from DNA because they cannot maintain genetic fidelity or function. Still lacking, however, are thorough investigations of the relative deglycosylation rates of these and other modified bases compared with the native letters, which could provide a quantitative perspective on how various modifications affect the N-glycosyl bonds.
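Once relative deglycosylation rates are available, one common way to express them quantitatively is as a difference in activation free energy. The sketch below assumes that both the modified and the native nucleoside follow the Eyring equation with identical pre-exponential factors, so that the rate ratio alone fixes the barrier difference; this is an assumption for illustration, not a result of the studies cited here, and the function name is hypothetical.

```python
import math

# Gas constant in kJ/(mol*K).
R_KJ_PER_MOL_K = 8.314462618e-3


def delta_delta_g_activation(k_modified: float, k_native: float,
                             temp_k: float = 298.15) -> float:
    """Activation free-energy difference (kJ/mol), modified minus native.

    Assumes Eyring kinetics with identical prefactors for both bases:
        ddG_act = -R * T * ln(k_modified / k_native)
    A negative result means the modified base's N-glycosyl bond has a
    lower barrier, i.e. it is more labile than the native bond.
    """
    if k_modified <= 0 or k_native <= 0:
        raise ValueError("rate constants must be positive")
    if temp_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return -R_KJ_PER_MOL_K * temp_k * math.log(k_modified / k_native)
```

Under these assumptions, a modified base that deglycosylates 10 times faster than its native counterpart at 25°C corresponds to a barrier lowered by roughly 5.7 kJ/mol. Because the result depends only on the ratio of rate constants, the units of `k_modified` and `k_native` cancel as long as they match.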
What is the evolutionary origin of the diversity seen in RNA bases? Although all these heterocycles result from post-transcriptional modifications, their structures resemble the side chains of amino acids (Lazcano, 1994; Robertson and Miller, 1995a; Levy and Miller, 1999). Because such functionalized bases could have supplied RNA with chemical groups akin to those that amino acids later provided in proteins, the greater stability of RNA N-glycosyl bonds may have been an advantageous feature in the RNA world (Fig. 3). Modified and exotic bases could have expanded the repertoire of catalytically competent RNA oligomers (Robertson and Miller, 1995a; Cermakian and Cedergren, 1998; Benner et al., 1999; Levy and Miller, 1999; Forterre and Grosjean, 2009; Nguyen and Burrows, 2011) without the consequence of rapid deglycosylation. However, with the emergence of DNA and the greater utility of proteins, the functionalized bases would have become obsolete and a detriment to the survival of life (Fig. 3). While the selection of the DNA bases may have favored those with the greatest glycosidic bond stability, the eventual refinement of RNA bases would have largely mirrored this selection, but on the basis of energetic costs to the cell. With the increasing takeover by proteins, only the bases most essential for structural and functional roles in RNA would have persisted. Modified bases identified in contemporary RNA may actually have been part of the original, larger family of diverse bases used in the RNA world (Cermakian and Cedergren, 1998).

Hypothesis diagram illustrating a refinement of the genetic alphabet.
Conclusion
The selection of the native bases did not occur in any single hypothetical period. More likely, a continuous process of refinement directed their selection throughout the prebiotic and early biotic epochs. As suggested here, the fundamental difference in hydrolytic susceptibility between RNA and DNA may have contributed to this refinement process. While life in the RNA world may have been challenged by the nature of the sugar moiety and its impact on backbone stability, in an early DNA world the governing pressure would have come from the identity of the attached nucleobase.
In this sense, the arrival of DNA should not be considered just a later modification of RNA; rather, it is a unique biopolymer in its own right that challenged life to adapt to its specific chemical vulnerabilities, to further refine the genetic alphabet, and to evolve repair pathways that allowed for the ubiquity of DNA as we know it.
Footnotes
Acknowledgments
We are grateful to the NIH for support (grant number GM 069773), and to Dr. Ulrich Muller and the reviewers for their helpful comments on our manuscript.
Author Disclosure Statement
No competing financial interests exist.
Abbreviations
AP, apurinic or apyrimidinic; BER, base excision repair; Dap
