Abstract
The pursuit of DNA demethylation has a colorful history, but it was not until 2009 that the stars of this story, the Ten-eleven-translocation (Tet) family of proteins, were identified. Tet proteins convert 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), which Tet proteins can further oxidize to 5-formylcytosine and 5-carboxylcytosine as steps toward DNA demethylation. Recent studies have revealed that 5hmC-mediated DNA demethylation can play essential roles in diverse biological processes, including development and disease. Here, we review recent discoveries in 5hmC-mediated DNA demethylation in the context of stem cells and development.
Introduction
Since its identification in mammalian DNA, 5hmC has received much attention, and more details about its features and functions have been uncovered. Present evidence indicates that 5hmC can be further oxidized to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), both of which can be excised by thymine-DNA glycosylase (TDG) and replaced with unmodified cytosine through the base excision repair pathway [8,9]. Several excellent reviews have highlighted the features of 5hmC-mediated DNA demethylation in the genome and its potential biological functions [10–15]. Here, we summarize recent discoveries about the distribution landscape of 5hmC and its derivatives and the technologies used to detect them, with a focus on their functions in stem cells and development.
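The oxidation-and-excision pathway described above can be summarized as a small directed graph. The following Python sketch is a hypothetical illustration, not an established tool: the names `Base`, `next_states`, and `routes_to_cytosine` are our own, and the model encodes only the steps named here (iterative Tet oxidation, and TDG excision of 5fC or 5caC followed by base excision repair), ignoring enzyme kinetics.

```python
from enum import Enum


class Base(Enum):
    C = "C"        # unmodified cytosine
    MC = "5mC"     # 5-methylcytosine
    HMC = "5hmC"   # 5-hydroxymethylcytosine
    FC = "5fC"     # 5-formylcytosine
    CAC = "5caC"   # 5-carboxylcytosine


# Iterative Tet oxidation: 5mC -> 5hmC -> 5fC -> 5caC
TET_OXIDATION = {Base.MC: Base.HMC, Base.HMC: Base.FC, Base.FC: Base.CAC}

# Bases excised by TDG; base excision repair then restores unmodified C
TDG_SUBSTRATES = {Base.FC, Base.CAC}


def next_states(base, tet=True, tdg=True):
    """Return the set of bases reachable from `base` in one enzymatic step."""
    out = set()
    if tet and base in TET_OXIDATION:
        out.add(TET_OXIDATION[base])
    if tdg and base in TDG_SUBSTRATES:
        out.add(Base.C)  # TDG excision followed by base excision repair
    return out


def routes_to_cytosine(start=Base.MC, tet=True, tdg=True):
    """Enumerate every enzymatic route from `start` back to unmodified C."""
    routes = []

    def walk(path):
        last = path[-1]
        if last is Base.C and len(path) > 1:
            routes.append([b.value for b in path])
            return
        # Sort for a deterministic order; the graph is acyclic, so this terminates.
        for nxt in sorted(next_states(last, tet, tdg), key=lambda b: b.value):
            walk(path + [nxt])

    walk([start])
    return routes
```

Calling `routes_to_cytosine()` returns the two routes 5mC → 5hmC → 5fC → 5caC → C and 5mC → 5hmC → 5fC → C, while `routes_to_cytosine(tdg=False)` returns an empty list, loosely mirroring the accumulation of 5fC and 5caC when TDG is depleted.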
Catalyzing Enzymes and Binding Proteins
There are three Tet proteins, all of which are 2-oxoglutarate- and iron(II)-dependent dioxygenases, and each has a C-terminal catalytic domain (cysteine-rich region): Tet1 (aa 1367–2039), Tet2 (aa 916–1921), and Tet3 (aa 697–1668) (Fig. 1). In addition, Tet1 and Tet3 contain a CXXC zinc finger domain, a known DNA-binding domain, at their N-termini (Fig. 1). Interestingly, Tet2 lacks this CXXC domain, which raises the question of how it binds to DNA to carry out its catalytic function. Very recently, Ko et al. found that IDAX (also known as CXXC4), which was originally part of an ancestral Tet2 gene but became a separate gene during evolution, encodes the Tet2 CXXC domain [16]. The CXXC domain of IDAX preferentially binds to unmethylated CpG-rich DNA, and IDAX directly interacts with Tet2 through its CXXC domain and recruits Tet2 to promoters and CpG islands [16].

Fig. 1. Schematic representation of ten-eleven-translocation (TET) protein structure and TET-mediated DNA demethylation.
Following the roadmap of the DNA methylation field, a critical question in the demethylation field is which proteins bind 5hmC (the "readers"). Several studies have yielded candidate 5hmC-binding proteins, although the search for more specific ones continues. Our previous study showed that the dosage of the Rett syndrome protein methyl-CpG-binding protein 2 (MeCP2) inversely correlates with the global amount of 5hmC [17], but loss of MeCP2 does not change the overall distribution of 5hmC in the genome [17,18]. Later, it was found that MeCP2 binds 5hmC as well as 5mC, and that the R133C mutation, found in some Rett syndrome patients, severely diminishes its binding to 5hmC while only subtly affecting its binding to 5mC [18]. Subsequently, more factors, such as Uhrf1 (ubiquitin-like with PHD and ring finger domains 1), Thy28, Mpg (N-methylpurine DNA glycosylase), Neil3 (endonuclease VIII-like 3), and Recql (ATP-dependent DNA helicase), were found to bind 5hmC in mouse embryonic stem cells (mESCs) [19] (Table 1). Intriguingly, these factors can also bind 5mC, so how the functional outcome is determined needs further exploration. Moreover, upon mESC differentiation, neuronal progenitor cells show a distinct set of 5hmC-binding factors that only partially overlaps with that in mESCs [19]. Unlike Uhrf1, which binds 5mC and 5hmC with similar capability, Uhrf2 specifically binds 5hmC in neuronal systems [19]. The dynamic nature of 5hmC readers suggests that the functional roles mediated by 5hmC could be cell type-specific.
8-week-old mouse (C57BL/6 strain). 5hmC abundance varied among different types of neurons (ranging from 0.2% in granule cells to 0.6% in Purkinje cells).
mESC, mouse embryonic stem cells; NPCs, neuronal progenitor cells; mC, methylcytosine; hmC, hydroxymethylcytosine; fC, formylcytosine; caC, carboxylcytosine; N/A, not yet known.
Present evidence also indicates an interaction between 5hmC-binding proteins and Tet proteins (the writers). For example, Uhrf2 overexpression enhances Tet1 catalytic activity and increases the global levels of 5hmC and its oxidized derivatives [19]. Yildirim et al. found that methyl-CpG-binding domain protein 3 (Mbd3) is enriched at CpG-rich promoters and that its binding correlates strongly with Tet1 binding [20]. Tet1 depletion dramatically inhibits Mbd3 binding to 5hmC-containing DNA regions, suggesting that this binding is Tet1-dependent [20]. Further, Mbd3 depletion significantly reduces global 5hmC levels, indicating a role for Mbd3, via its interaction with Tet proteins, in establishing or maintaining 5hmC in the genome [20]. This study also shed light on the function of the methyl-binding domain of Mbd3, which had long been a mystery.
Distribution Features
The main obstacle to investigating the function of 5hmC (and its oxidized derivatives) has been mapping its genome-wide distribution, because of its generally low abundance. Classic methods, such as thin-layer chromatography and mass spectrometry, are typically used to quantify the global levels of modified nucleotides. Today, many new technologies can be combined with high-throughput sequencing to profile 5hmC and its derivatives genome-wide. These methods include hydroxymethylated DNA immunoprecipitation sequencing [21], chemical labeling [22,23], and single-molecule real-time sequencing [24]; the pros and cons of each method have been discussed elsewhere [10].
Using the available technologies, several conserved features of the 5hmC landscape have emerged. First, in both neuronal cells and mESCs, 5hmC is enriched in gene bodies (including 5′UTRs, exons, 3′UTRs, and intragenic regions) and CpG islands, but it is depleted at transcription start sites (TSSs) and intergenic regions [17,23,25]. Second, in both mESCs and neuronal cells, 5hmC is enriched in euchromatin but depleted in heterochromatin [17,25]. Third, in both the developing and the mature mammalian brain, 5hmC is depleted on the X chromosome [17,25]. The depletion of 5hmC on the inactive X chromosome in females is not surprising, but it is surprising that 5hmC is also depleted from the X chromosome in males. Reactivation of the X chromosome is accompanied by the accumulation of 5hmC [26]. Further studies are needed to clarify the exact roles of 5hmC during X chromosome inactivation and reactivation.
Even more impressive than these conserved features are the dynamic properties of 5hmC. The global level of 5hmC varies between tissues and cell types [27]. 5hmC is highly enriched in brain (∼40% of the level of 5mC in Purkinje neurons) and is about 10 times more abundant in neurons than in mESCs, whereas it is present at lower levels in other tissues (∼5–10% of the level of 5mC) (Table 1) [7]. In ESCs, the overall level of 5hmC decreases upon differentiation; however, in tissue-specific stem cells, such as neural stem/progenitor cells and hematopoietic stem cells (HSCs), 5hmC is acquired upon differentiation [17,28,29]. Moreover, the distribution pattern of 5hmC differs markedly between cell types. For example, 99,238 5mC-enriched regions and 115,913 5hmC-enriched regions were found in human ESCs (hESCs) [30]. During the differentiation of hESCs into neuronal progenitor cells, 26,044 differentially hydroxymethylated regions and 16,123 differentially methylated regions were uncovered, mainly in promoters, exons, and enhancers [30]. Finally, the enrichment of 5hmC and the depletion of 5mC on gene bodies vary among neuronal cell types [18]. Notably, the levels of 5fC and 5caC are much lower than those of 5mC and 5hmC (0.03% and 0.01% of the level of 5mC in mESCs, respectively), and they are barely detectable in brain [31]. These findings further suggest that the function of 5hmC and its derivatives in epigenetic regulation could be cell and tissue type-specific.
Single-base-resolution mapping by Tet-assisted bisulfite sequencing (TAB-seq) has revealed further details about 5hmC in the genome. Yu et al. found that in ESCs, 5hmC is enriched at both promoters and enhancers, with significantly higher levels at distal regulatory elements than at promoter-proximal elements [32]. Nearly half of the 5hmC in hESCs localized to distal regulatory elements, and almost all 5hmC was found in the CpG context (99.89% in hESCs and 98.7% in mESCs) [32], which is expected given that methylation occurs predominantly at CpGs. 5hmC is lowest at high-CpG promoters, whereas it is several times more abundant at low- and intermediate-CpG promoters [32]. Precise mapping of 5hmC in the genome will advance our understanding of the molecular mechanisms by which 5hmC functions.
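The logic of comparing standard bisulfite sequencing (BS-seq) with TAB-seq can be sketched in Python. This is a simplified illustration under stated assumptions: chemistry is treated as ideal (after bisulfite, unmodified C, 5fC, and 5caC read as T; in TAB-seq, 5hmC is first glucosylated, which protects it so that it reads as C, whereas 5mC is oxidized by Tet to 5caC and reads as T), and the names `READOUT`, `read_out`, `SiteCounts`, and `estimate_levels` are hypothetical. Real analyses must also correct for incomplete conversion and protection.

```python
from dataclasses import dataclass

# Idealized read-out of each cytosine state ("C" = read as C, "T" = read as T).
READOUT = {
    # BS-seq: 5mC and 5hmC resist bisulfite conversion; C, 5fC, and 5caC do not.
    "BS-seq": {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "T", "5caC": "T"},
    # TAB-seq: glucosylated 5hmC is protected; 5mC is oxidized by Tet to 5caC.
    "TAB-seq": {"C": "T", "5mC": "T", "5hmC": "C", "5fC": "T", "5caC": "T"},
}


def read_out(states, protocol):
    """Return the sequenced bases for a list of cytosine states under a protocol."""
    table = READOUT[protocol]
    return "".join(table[s] for s in states)


@dataclass
class SiteCounts:
    """Hypothetical read counts at one CpG from matched BS-seq and TAB-seq libraries."""
    bs_c: int
    bs_t: int
    tab_c: int
    tab_t: int


def estimate_levels(site):
    """Estimate 5mC and 5hmC fractions at one CpG by subtraction (ideal chemistry).

    BS-seq C fraction ~ 5mC + 5hmC; TAB-seq C fraction ~ 5hmC alone.
    """
    bs_total = site.bs_c + site.bs_t
    tab_total = site.tab_c + site.tab_t
    if bs_total == 0 or tab_total == 0:
        raise ValueError("both libraries need coverage at this site")
    modified = site.bs_c / bs_total
    hmc = site.tab_c / tab_total
    mc = max(modified - hmc, 0.0)  # clamp: sampling noise can make this negative
    return {"5mC": mc, "5hmC": hmc}
```

For example, `read_out(["5mC", "5hmC", "C"], "BS-seq")` gives `"CCT"`, whereas the same states give `"TCT"` under TAB-seq, showing that only TAB-seq separates 5hmC from 5mC. With 60 of 100 BS-seq reads and 20 of 100 TAB-seq reads showing C, `estimate_levels` returns approximately 0.4 for 5mC and 0.2 for 5hmC.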
Two independent groups further mapped 5fC genome-wide in mESCs and found that it is recognized and excised by TDG [33,34]. TDG depletion significantly increases the global levels of 5fC and 5caC, with no effect on the 5hmC level [33,34]. In terms of genomic distribution, Song et al. [33] found that 5fC is enriched at exons and enhancers but depleted at intergenic regions and repeat elements, including LINEs, LTRs, and DNA repeats. Shen et al. [34] found that most 5fC and 5caC peaks induced by Tdg depletion are located outside promoter and exonic regions, overlapping instead with proximal and distal regulatory regions, including active enhancers [33,34]. More than 20% of 5fC-clustered regions localized to enhancers, particularly poised enhancers marked with H3K4me1 but not H3K27ac [33]. TDG-depleted mESCs did not display morphological defects and showed only subtle alterations in overall gene expression [34]. Notably, even though 5caC accumulated at the binding sites of the pluripotency transcription factors Oct4, Nanog, Sox2, and Esrrb, the expression of these factors did not change [34]. Thus, further exploration is needed to fully reveal the functions of 5fC and 5caC.
Function
It was once assumed that, given their structural relationship, 5hmC simply plays the opposite role to 5mC; however, mounting evidence indicates that this assumption is too simplistic and that 5hmC-mediated DNA demethylation may have multiple functions. For example, Tet1 preferentially binds regions of high and intermediate CpG density, and Tet1 binding sites are enriched at promoters, exons, and TSSs [21,23,25]. Tet1-bound promoters can be categorized into three types: (1) actively transcribed genes marked by H3K36me3 and H3K4me3, (2) Polycomb repressive complex 2 (PRC2)-repressed genes marked by H3K27me3, and (3) bivalent genes marked by both H3K27me3 and H3K4me3 [21,23,25]. Because these histone marks are associated with different transcriptional states, these studies attest to the multifaceted character of Tet proteins.
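The three promoter categories above amount to a simple rule on histone marks. The sketch below is a hypothetical helper (`classify_promoter` is our own name, not part of any cited analysis) that applies those rules to a set of mark names; checking bivalency first is our design choice, so that a promoter carrying both H3K4me3 and H3K27me3 is never labeled active or repressed.

```python
def classify_promoter(marks):
    """Assign a Tet1-bound promoter to one of the three categories in the text.

    `marks` is any iterable of histone-mark names observed at the promoter.
    """
    marks = set(marks)
    has_k4me3 = "H3K4me3" in marks
    has_k27me3 = "H3K27me3" in marks
    has_k36me3 = "H3K36me3" in marks

    if has_k4me3 and has_k27me3:
        return "bivalent"               # category 3: H3K27me3 plus H3K4me3
    if has_k27me3:
        return "PRC2-repressed"         # category 2: H3K27me3 without H3K4me3
    if has_k4me3 and has_k36me3:
        return "actively transcribed"   # category 1: H3K36me3 plus H3K4me3
    return "unclassified"               # pattern not covered by the three categories
```

For instance, `classify_promoter(["H3K4me3", "H3K27me3"])` returns `"bivalent"`, while a promoter with only H3K4me3 falls outside the three categories and is returned as `"unclassified"`.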
Tet1 and Tet2 were also found to associate with the pluripotency factor Nanog and to synergistically enhance reprogramming efficiency in a manner that depends on their catalytic activity [35]. The enrichment of 5hmC at promoters and gene bodies is positively correlated with gene expression [17,21,23,25]. The expression of these genes also requires the depletion of 5mC on gene bodies; that is, the ratio of 5hmC to 5mC determines whether or not a gene is expressed [35]. Below, we discuss the functional roles of Tet proteins and 5hmC in different types of stem cells.
Embryonic Development and ESCs
Genome-wide DNA demethylation occurs in both developing primordial germ cells (PGCs) and fertilized oocytes (zygotes). In zygotes, active demethylation, which is largely independent of DNA replication, occurs on the paternal genome but not on the maternal genome. The maternal genome instead undergoes passive demethylation during DNA replication in cleavage-stage embryos, in the absence of the maintenance methyltransferase DNMT1 [36], but the mechanism behind this remains elusive. Tet3 is highly expressed in oocytes and zygotes, and its expression decreases from the two-cell stage onward [37,38]. Knockdown of Tet3 significantly impairs active demethylation of the paternal genome [37,38]. Another study showed that Tet3 localizes specifically to the paternal pronucleus and that in Tet3-deficient zygotes, active DNA demethylation cannot proceed and embryonic development fails. Maternal Tet3 is responsible for converting 5mC to 5hmC in the paternal genome of developing zygotes, and Tet3 deletion at the oocyte stage significantly reduces the fertility of female mice [39].
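The replication-dependent (passive) loss of methylation on the maternal genome can be illustrated with an idealized model. In this hypothetical Python sketch (the function names are ours), a single CpG site starts fully methylated on both strands; at each round of semiconservative replication, every parental strand keeps its mark and is paired with a new strand that is methylated only if maintenance methylation (the DNMT1 role) copies the parental mark. The model ignores de novo methylation and active, Tet-mediated demethylation, and because it tracks every duplex explicitly, it is meant only for a handful of rounds.

```python
def replicate(duplexes, maintenance):
    """One round of semiconservative replication at a single CpG site.

    Each duplex is a (strand_a, strand_b) pair of booleans (True = methylated).
    Each parental strand is kept and paired with a newly synthesized strand,
    which is methylated only if maintenance methylation copies the parental mark.
    """
    daughters = []
    for a, b in duplexes:
        for parent in (a, b):
            daughters.append((parent, parent and maintenance))
    return daughters


def methylated_strand_fraction(duplexes):
    """Fraction of all strands that carry a methyl mark at this CpG."""
    strands = [s for duplex in duplexes for s in duplex]
    return sum(strands) / len(strands)


def passive_dilution(rounds, maintenance=False):
    """Methylation level after each replication round, starting fully methylated."""
    duplexes = [(True, True)]
    levels = [methylated_strand_fraction(duplexes)]
    for _ in range(rounds):
        duplexes = replicate(duplexes, maintenance)
        levels.append(methylated_strand_fraction(duplexes))
    return levels
```

With maintenance switched off, `passive_dilution(3)` returns `[1.0, 0.5, 0.25, 0.125]`: the methylated fraction halves with each round of replication. With `maintenance=True`, the level stays at 1.0.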
The three Tet genes display differential expression patterns in different tissue types and at different cellular stages. Tet1 is the most abundant in mESCs, at about five times the level of Tet2, whereas Tet3 is expressed at very low levels [40]. Loss of Tet1 or Tet2 alone moderately decreases global 5hmC levels, but dual knockdown leads to a more drastic reduction (around 75–80%) [40]. Upon ESC differentiation, the levels of Tet1 and Tet2 decrease while Tet3 increases, and global 5hmC decreases accordingly. These data suggest that Tet1- and Tet2-mediated 5hmC is strongly correlated with the maintenance of ESC pluripotency. In support of this, one study found that Tet1 depletion impaired ESC self-renewal [8]. Later, several other studies revealed that the Oct4–Sox2 pluripotency complex regulates the expression of Tet1 and Tet2 and that Tet1 depletion skews ESC differentiation; however, these studies did not detect changes in cell morphology or in the expression of pluripotency factors after Tet1 knockdown [40–42]. In agreement with this, an in vivo study showed that Tet1 knockout ESCs can generate viable and fertile mice [64].
Tet1 and Tet2, whose expression in ESCs is regulated by Oct4, have also been implicated in DNA demethylation during PGC development [43]. Tet3 is expressed in the preimplantation zygote, where it is proposed to mediate the conversion of 5mC to 5hmC immediately after fertilization [38,39]. In hESCs, 5hmC preferentially associates with active enhancers over poised enhancers, and 5hmC-enriched enhancers are also marked with H3K4me1, H3K9ac, and H3K27ac [44,45]. Thus, 5hmC could potentially interact with histone modifications to regulate gene expression, making it important for cell identity.
Reprogramming and iPSCs
Global epigenetic reprogramming is critical for the generation of induced pluripotent stem cells (iPSCs) [46,47]. Given the important roles of DNA methylation in the reprogramming process, it is informative to examine the roles of 5hmC during reprogramming. Modulation of Tet proteins alters the expression of several pluripotency factors, including Esrrb, Oct4, and Klf4, as well as enrichment at their binding sites; these observations prompted active exploration of whether Tet proteins play roles in iPSC induction [35,48]. During iPSC induction, Tet1 expression increases significantly, and 5hmC levels increase accordingly [35,48,49]. Precocious expression of Tet1 can significantly enhance the efficiency of iPSC induction, and its catalytic domain is critical for this function [35,48,49]. One key step in reprogramming is the demethylation and reactivation of Oct4. Tet1 promotes demethylation (conversion of 5mC to 5hmC) of CpGs at the Oct4 promoter and enhancer, which facilitates Oct4 reactivation and expression at an early stage (day 1) of reprogramming [48]. Further, Tet1 together with Sox2, Klf4, and c-Myc can achieve reprogramming, so that Tet1 can replace Oct4 in the reprogramming cocktail, and the resulting iPSCs are fully pluripotent and can generate mice [48]. These results indicate essential roles for Tet1-catalyzed hydroxymethylation in somatic reprogramming.
The establishment of a normal epigenetic landscape is critical for iPSC induction. Tet1 promotes the demethylation of the Oct4 enhancer and promoter and thus facilitates reprogramming [48]. Both Tet1 and Tet2 physically interact with Nanog, and, consistently, depletion of Tet1 and Tet2 reduces the efficiency with which mouse embryonic fibroblasts and embryonic germ cells are reprogrammed to iPSCs [35]. Further, Tet2 depletion not only reduces the clustering of 5hmC at Nanog and Esrrb but also affects chromatin modifications at the Nanog and Esrrb loci [50]. Interestingly, although the overall hydroxymethylation landscape is quite similar between ESCs and iPSCs, some differentially hydroxymethylated regions cluster at the subtelomeric regions of specific chromosomes in iPSCs [51].
Leukemia and HSCs
TET1 was originally identified in acute myeloid leukemia (AML) as a fusion partner of the histone H3 lysine 4 (H3K4) methyltransferase MLL (mixed-lineage leukemia) [52]. Loss of TET2 function is strongly associated with AML and with a variety of myelodysplastic syndromes and myeloproliferative disorders: TET2 mutations have been identified in myeloproliferative neoplasms, myelodysplastic syndrome, and AML, and these mutations are associated with poorer patient outcomes. TET2 depletion induces the overproliferation of HSCs and skews the differentiation of blood progenitors, promoting the granulomonocytic lineage while inhibiting the lymphoid and erythroid lineages [29,53–55]. TET2 depletion also results in a high incidence of myeloid malignancies, leading to the death of nearly a third of the mice [29]. The association of TET proteins with cancer could help explain the global DNA hypomethylation and locus-specific DNA hypermethylation seen in cancers. These studies also indicate that TET proteins are important for normal development, including hematopoiesis.
Besides leukemia, Tet proteins and 5hmC have been implicated in other types of cancer, including melanoma, glioma, and breast cancer, all of which display global loss of 5hmC [56–59]. In human melanoma, there is a progressive loss of global 5hmC caused by decreased expression of the key enzymes IDH2 and TET family members, especially TET2 [57]. Re-establishing the lost 5hmC landscape could suppress tumor invasion and growth [57]. These studies suggest that a fine balance between 5mC and 5hmC is critical for maintaining the normal state of tissues and cells and that 5hmC could potentially serve as a diagnostic marker and therapeutic target in cancer.
Neuronal Development and Neural Stem Cells
DNA methylation at cytosines (5mC) is known to play important roles in neurogenesis by regulating the proliferation and survival of neural progenitors and the dendritic growth of newborn neurons in both embryonic and adult brains [1,60,61]. 5hmC is more abundant in brain than in any other tissue studied, and genome-wide mapping has revealed that 5hmC is acquired during postnatal development and displays dynamic features [7,17]. During neuronal development, 5hmC is enriched on genes activated during neurodevelopment and on genes encoding synaptic proteins [17,62]. Furthermore, 5hmC is enriched on enhancers and gene bodies and marks exon–intron boundaries in brain [17,18,62], further supporting the importance of 5hmC-mediated epigenetic regulation in brain development and function [5,10,14].
During Xenopus embryogenesis, Tet3 expression shows a distinct spatial pattern and gradually increases to a peak [63]. Tet3 deficiency causes abnormal expression of several key developmental genes, including Pax6 (eye and neural marker), Sox2 (neural stem cell marker), Otx2 (anterior neural marker), Sox9 and Snail (neural crest markers), Neurogenin-related 1 and N-tubulin (primary neuron markers), and Shh and Ptc-1 (Sonic hedgehog signaling), leading to eyeless, small-head, and lethal phenotypes; this suggests that Tet3 is important for neuronal development [63].
In the embryonic mouse brain, the 5hmC level increases during neuronal differentiation, while 5mC shows a significant reduction [28]. In both embryonic and postnatal brain, 5hmC is enriched at gene bodies and promoters but depleted at TSSs and enhancers [17,18,28]. Genes activated during neuronal differentiation lose H3K27me3 in their gene bodies and promoters while accumulating 5hmC at intragenic regions during embryonic neurogenesis [28]. Although a previous study indicated that Tet1 deficiency is compatible with brain development, with no abnormal anatomical or morphological features observed [64], Zhang et al. found that, in the adult mouse brain, the absence of Tet1 impaired the self-renewal and proliferation of adult neural progenitor cells and, consistently, decreased the number of newly generated neurons [65]. Overall, these results suggest that 5hmC plays roles in embryonic brain development and adult neurogenesis.
Evidence that Tet proteins regulate the expression of neurotrophic factors and neurogenesis has prompted investigations into their function in learning and memory [65,66]. Tet1 depletion was found to cause defects in spatial learning and short-term memory [65]. However, another study detected no defects in memory formation but did uncover impaired memory extinction, which could be due to downregulation of neuronal activity-regulated genes and abnormal synaptic plasticity [67]. Whether this discrepancy reflects differences in experimental parameters is unclear. Overexpressing Tet1 (catalytic domain) in the hippocampus specifically impairs hippocampus-dependent long-term associative memory formation while leaving general baseline behaviors and learning intact [68]. Unexpectedly, overexpression of Tet1m (a catalytically inactive mutant) had effects on gene expression and behavioral outcome similar to those of Tet1, suggesting that the effect of Tet1 on gene expression and behavior is independent of its catalytic activity [68]. More detailed studies are needed to reveal the underlying mechanisms.
Mechanism
Recent studies have shed light on how Tet-mediated 5hmC modification functions. All three Tet proteins can form a complex with O-GlcNAc transferase (OGT) and facilitate OGT-mediated O-GlcNAcylation of histones, especially H2B, in both mouse and human cells [69–71]. More than 90% of OGT binding sites co-localize with Tet1 occupancy, which preferentially occurs at promoter regions. Depletion of Tet1 abolishes OGT binding to chromatin, suggesting that Tet1 might serve as a linker between OGT and genomic DNA. Moreover, the association between OGT and chromatin depends on Tet1 activity, and the interaction between Tet1 and OGT can also stabilize Tet2 binding to DNA [69–71]. Meanwhile, the loss of Tet2 decreases the global levels of H2B O-GlcNAcylation and H3K4me3, thereby altering gene transcription [69,70].
DNA enriched in 5hmC is more easily released from chromatin by MNase digestion than 5mC-containing DNA, suggesting that 5hmC is associated with more accessible chromatin [18]. The binding of Tet1 to CpG-rich promoters is not only required for the expression of transcriptionally active (H3K4me3-marked) genes but also necessary for the repression of Polycomb-repressed (H3K27me3-marked) genes through complexes with PRC2 or SIN3A [23]. 5hmC can also be highly enriched at the TSSs of bivalent genes marked by H3K4me3 and H3K27me3 [23]. All of these studies indicate an interaction between Tet-mediated 5hmC and histone modifications.
It is plausible that Tet proteins also interact with other epigenetic pathways. Several microRNAs, including miR-22, miR-29, and miR-26a, can regulate Tet proteins and TDG, and overexpressing mimics of these microRNAs decreases bulk 5hmC levels [59,72,73]. A very recent study found that miR-22 maintains the HSC pool by negatively regulating Tet2 [73]. The same microRNA also plays a role in breast cancer: miR-22 promotes tumor invasion and metastasis by silencing the metastasis suppressor miR-200, which it achieves by inhibiting demethylation of the miR-200 promoter, and endogenous miR-22 expression inversely correlates with the survival and clinical outcome of breast cancer patients [59]. miR-22 expression is inversely correlated with the expression of Tet1 and Tet2 in patients, and knockdown of Tet proteins significantly blunts the reduction in cell invasion caused by miR-22 inhibition [59]. Collectively, these studies reveal extensive crosstalk between the Tet–5hmC pathway and other epigenetic pathways.
The activity of Tet proteins is also regulated by other factors. Previous studies have shown that vitamin C can induce widespread but specific demethylation and increase the efficiency of iPSC induction [74–76]. Vitamin C treatment rapidly increased global 5hmC levels in hESCs and mESCs without affecting the expression of Tet and Dnmt genes [74,75]. The methylation status of a group of promoters changed, and the expression of most of the corresponding genes was upregulated [74,75]. Further study indicated that vitamin C enhances Tet1 activity and serves as a cofactor of Tet1 during demethylation [74,75]. Because vitamin C is present in the body and can be obtained from food, these discoveries further link the environment with epigenetic modulation and make epigenetics even more colorful.
Concluding Remarks
In summary, 5hmC-mediated DNA demethylation is involved in embryonic development, cell fate determination, cancer, reprogramming, and neuronal function through different mechanisms. 5hmC does not simply serve as an intermediate in DNA demethylation; it can also act as a stable epigenetic modification.
Although remarkable progress has been made since the discovery of DNA demethylation, several interesting questions remain. First, the mechanisms by which 5hmC and its derivatives function are still poorly understood, and the discovery of more specific binding proteins would greatly help clarify them. Second, current studies indicate that particular Tet proteins play dominant roles in particular diseases; why this is so, given that each Tet protein can convert 5mC to 5hmC, remains unknown. Finally, it is unclear why 5hmC is dynamic at some loci but stable at others during development. These findings prompt us to ask whether 5hmC at distinct genomic regions, such as enhancers, promoters, and gene bodies, plays the same or different roles in gene expression, and whether 5hmC at particular regions is more critical for specific functions. Identifying the factors that interact with Tet proteins at different 5hmC loci could be a ripe area for future research.
Footnotes
Acknowledgments
We thank Ms. C. Strauss for critical reading of the article. X.L. is supported by the National Key Basic Research Program of China (no. 2014CB943001) and the National Natural Science Foundation of China (31371309).
Author Disclosure Statement
No competing financial interests exist.
