Abstract
Long noncoding RNAs (lncRNAs) are transcribed RNA molecules >200 nucleotides in length. They comprise a diverse class of transcripts that structurally resemble mRNAs, but do not encode proteins. With the characterization of lncRNAs and their acceptance as crucial regulators of numerous developmental and biological pathways, the study of lncRNAs has gradually become one of the hot topics in the field of RNA biology. In this article, we highlight recent progress in lncRNA studies, including their classification, functional characterization, and potential roles in disease development.
Introduction
Characterization of lncRNAs
Classification
In mammals, <2% of the genome encodes proteins, whereas most of the genome is transcribed to yield complex patterns of interlaced and overlapping transcripts that include thousands of lncRNA transcripts (Mercer et al., 2009; Wilusz et al., 2009). Some researchers have attempted to classify lncRNAs by their genomic proximity to protein-coding genes into five types: (1) sense, (2) antisense, (3) bidirectional, (4) intronic, and (5) intergenic (Fig. 1) (Mattick, 2009; Mercer et al., 2009). Others consider “lncRNA” to be only a blanket term that encompasses mRNA-like ncRNAs, long intergenic ncRNAs (lincRNAs), as well as antisense and intron-encoded transcripts, transcribed ultraconserved regions, and transcribed pseudogenes (Gibb et al., 2011). Genes are organized in specific regions of the genome. If lncRNAs are classified by their genomic proximity to protein-coding genes, it is easy to locate an lncRNA and to obtain information about the adjacent genes, which could provide crucial clues for decoding the potential roles of lncRNAs in biological processes.

Genomic organization of long noncoding RNAs (lncRNAs). Schematic diagram illustrating the complexity of the network of the long noncoding transcript (black box) that is associated with a representative gene (hollow box). Sense: The lncRNA sequence overlaps with the sense strand of a protein-coding gene. Antisense: The lncRNA sequence overlaps with the antisense strand of a protein-coding gene. Bidirectional: The lncRNA sequence is located on the opposite strand from a protein-coding gene whose transcription is initiated <1000 base pairs away. Intronic: The lncRNA sequence is derived entirely from within an intron of another transcript. This may be either a true independent transcript or a product of pre-mRNA processing. Intergenic: The lncRNA sequence is not located near any other protein-coding loci.
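The five positional categories described above can be expressed as a simple decision procedure. The following is a minimal sketch only, not a method used by any of the cited studies. The `Transcript` class, the `classify` function, the half-open coordinate convention, and the order in which the categories are tested are all assumptions made for illustration; the 1000-bp window is taken from the definition of "bidirectional" given above. A real annotation pipeline would compare each lncRNA against every annotated gene rather than a single one.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Transcript:
    """Hypothetical minimal transcript record (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"
    exons: List[Tuple[int, int]] = field(default_factory=list)

    @property
    def tss(self) -> int:
        # Transcription start site: left end on "+", right end on "-".
        return self.start if self.strand == "+" else self.end - 1


def introns(gene: Transcript) -> List[Tuple[int, int]]:
    """Return the gaps between consecutive exons of the gene."""
    exons = sorted(gene.exons)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
        if next_start > prev_end
    ]


def classify(lnc: Transcript, gene: Transcript, window: int = 1000) -> str:
    """Classify an lncRNA relative to ONE protein-coding gene.

    Assumed test order: intronic, then sense/antisense overlap,
    then bidirectional, otherwise intergenic.
    """
    if lnc.chrom != gene.chrom:
        return "intergenic"
    # Intronic: the lncRNA lies entirely within one intron.
    if any(s <= lnc.start and lnc.end <= e for s, e in introns(gene)):
        return "intronic"
    overlaps = lnc.start < gene.end and gene.start < lnc.end
    same_strand = lnc.strand == gene.strand
    if overlaps:
        return "sense" if same_strand else "antisense"
    # Bidirectional: opposite strand, start sites < `window` bp apart.
    if not same_strand and abs(lnc.tss - gene.tss) < window:
        return "bidirectional"
    return "intergenic"


# Illustrative coordinates only; not taken from any real annotation.
gene = Transcript("chr1", 1000, 5000, "+",
                  exons=[(1000, 1500), (3000, 3500), (4500, 5000)])
print(classify(Transcript("chr1", 2000, 2500, "-"), gene))
print(classify(Transcript("chr1", 200, 700, "-"), gene))
```

Testing the intronic case before the overlap case is a design choice: an intronic lncRNA also overlaps the gene body, so the order determines which label it receives.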
Conservation
Due to recent and rapid adaptive selection, lncRNAs are more pliant to evolutionary pressures than protein-coding genes, as evidenced by the existence of many lineage-specific lncRNAs, such as Xist or Air (Nesterova et al., 2001; Pang et al., 2006). Thus, the primary sequence of lncRNAs is generally less conserved than protein-coding exons. Further, it is less conserved than mRNAs, miRNAs, and snoRNAs across different species (Bentwich et al., 2005; Pang et al., 2006). However, a few lncRNAs still contain some highly conserved elements. Nineteen percent of highly conserved phastCons elements occur in known introns, and another 32% in unannotated regions (Siepel et al., 2005).
Biological Roles of lncRNAs
The majority of the non-protein-coding transcripts belong to the group of lncRNA transcripts. Many studies have indicated that these transcripts play crucial roles in a variety of biological processes (Table 1).
lncRNAs, long noncoding RNAs; lincRNA, long intergenic ncRNAs; Gas5, growth-arrest-specific5; TERRA, telomeric repeat-containing RNA.
Cis-inactivation: dosage compensation and gene imprinting
X chromosome inactivation (XCI), the transcriptional silencing of one X chromosome that occurs in female mammals to compensate for X-linked gene-dosage imbalance, is controlled and regulated by a cis-acting region on the X chromosome termed the X-inactivation center (XIC) (Wood and Oakey, 2006). Four lncRNAs identified in the XIC region (Xist, Tsix, Xite, and RepA) are involved in chromosome inactivation (Lee and Lu, 1999; Ogawa et al., 2008; Koerner et al., 2009). The 17-kb-long Xist is expressed from the inactive chromosome, where it recruits histone-modifying enzymes that promote chromatin condensation in cis (Erwin and Lee, 2008). RepA, a 1.6-kb lncRNA within Xist, directly binds the polycomb repressive complex 2 (PRC2) (Zhao et al., 2008). This complex apparently initiates histone H3K27 trimethylation in cis, which then spreads along and transcriptionally inactivates the X chromosome. Jpx, another lncRNA, is developmentally regulated and accumulates during XCI. It mainly functions as a molecular switch for XCI (Tian et al., 2010).
Gregg et al. (2010) performed a genome-wide characterization of imprinting in the mouse embryonic and adult brain. They identified a great number of putative lncRNAs associated with parental allelic effects in expression in the brain. These lncRNAs may play an important role in regulating the gene-imprinting mode (Gregg et al., 2010). The nuclear lncRNAs, Air and Kcnq1ot1, can coat their target loci in an allele-specific manner and induce repressive histone modification through the H3K9 histone methyltransferase G9a (Nagano et al., 2008; Pandey et al., 2008). The H19 gene encodes a 2.3-kb lncRNA that is exclusively expressed from the maternal allele, and it plays an important role in genomic imprinting during growth and development (Gabory et al., 2010).
Cell fate specification
Dynamic lncRNA expression can be either concordant or discordant with that of nearby protein-coding genes during ES cell differentiation. Differential lncRNA expression can also be detected during T-cell differentiation and during neuronal and glial cell differentiation. This evidence suggests that lncRNAs may be involved in regulating cell fate specification (Dinger et al., 2008; Pang et al., 2009; Mercer et al., 2010). Cesana et al. (2011) identified a muscle-specific lncRNA, linc-MD1, which governs the timing of muscle differentiation by acting as a competing endogenous RNA (ceRNA) in mouse and human myoblasts. Downregulation or overexpression of linc-MD1 correlates with retardation or anticipation of the muscle differentiation program, respectively (Cesana et al., 2011).
lncRNAs have also been implicated in global remodeling of the epigenome and gene expression during reprogramming of somatic cells to induced pluripotent stem cells (iPSCs). Loewer et al. (2010) searched for lncRNAs whose expression is specifically changed in human iPSCs compared to the cells of origin. They identified a subset of those that are upregulated in iPSCs compared to ES cells. iPSC-enriched lncRNA loci are bound by the key pluripotency transcription factors OCT4, SOX2, and NANOG, and knockdown of OCT4 results in the downregulation of the lncRNAs, suggesting that increased lncRNA expression may promote reprogramming (Loewer et al., 2010). Ulitsky et al. (2011) used chromatin marks, poly(A)-site mapping, and RNA-Seq data to identify more than 550 distinct lincRNAs in zebrafish. Antisense reagents targeting conserved regions of two zebrafish lincRNAs cause developmental defects. Reagents targeting splice sites cause the same defects, which are rescued by adding either the mature lincRNA or its human or mouse ortholog. This study provides a roadmap for the identification and analysis of lincRNAs in model organisms and shows that lincRNAs play crucial biological roles during embryonic development, with functionality conserved despite limited sequence conservation (Ulitsky et al., 2011). Guttman et al. (2011) performed loss-of-function studies on most lincRNAs expressed in mouse embryonic stem (ES) cells and characterized the effects on gene expression. They found that lincRNAs primarily affect gene expression in trans. Knockdown of dozens of lincRNAs causes either exit from the pluripotent state or upregulation of lineage commitment programs, suggesting that lincRNAs play key roles in the circuitry controlling the ES cell state (Guttman et al., 2011).
Cell apoptosis and cell cycle control
The lncRNA growth-arrest-specific5 (Gas5) can sensitize a cell to apoptosis by regulating the activity of glucocorticoids in response to nutrient starvation. Upon cellular stress induced by limited growth factors, Gas5 accumulates through a 5′-oligopyrimidine tract that confers RNA stability. Gas5 binds to the DNA-binding domain of the glucocorticoid receptor (GR) and prevents GR interaction with cognate glucocorticoid response elements. Under normal conditions, GR target genes, such as cellular inhibitor of apoptosis 2, suppress apoptosis and inhibit the cell-death executioner caspases 3, 7, and 9 (Webster et al., 2002; Kino et al., 2010). In response to DNA damage, p53 directly induces the expression of lincRNA-p21, an ∼3-kb transcript located in the proximity of the cell cycle regulator gene Cdkn1a. lincRNA-p21 acts as a repressor in the p53-dependent transcriptional response by repressing the transcription of genes that interfere with apoptosis. Furthermore, lincRNA-p21 interacts with heterogeneous nuclear ribonucleoprotein K (hnRNP-K) and recruits it to repress a host of genes known to be inhibited by p53 expression (Mourtada-Maarabouni et al., 2008). Hu et al. (2011) examined the expression of lncRNAs during erythropoiesis and identified an erythroid-specific lncRNA with antiapoptotic activity. Inhibition of this lncRNA blocks erythroid differentiation and promotes apoptosis. This lncRNA represses expression of Pycard, a proapoptotic gene, which explains in part its inhibition of programmed cell death (Hu et al., 2011). Vax2os1 is a retina-specific lncRNA whose expression is restricted to the mouse ventral retina. Spatiotemporal misexpression of Vax2os1 causes cell cycle alterations in photoreceptor progenitor cells, suggesting that Vax2os1 functions as a regulator of the cell cycle in the mammalian retina during development (Meola et al., 2012).
Nuclear architecture and subnuclear compartments
Paraspeckles (PSPs) are nuclear bodies associated with the nuclear retention of specific mRNAs as a means of controlling gene expression (Bond and Fox, 2009). The lncRNA MEN ɛ/β associates with the core PSP proteins and maintains paraspeckle integrity. Temporary and reversible blocking of transcription leads to the disassembly of PSP components, whereas reversal of the transcriptional block results in reassembly of PSP proteins on nascent MEN ɛ/β lncRNAs, not on mature MEN ɛ/β (Clemson et al., 2009; Mao et al., 2010; Souquere et al., 2010). MALAT1 is transcribed from the region downstream of the MEN ɛ/β gene on the same chromosome and is specifically retained in nuclear speckles, which are involved in the assembly, modification, and/or storage of components of the pre-mRNA-processing machinery (Ji et al., 2003; Zhao et al., 2009).
Several repeat-associated lncRNAs are localized to specific nuclear architectures. The telomeric repeat-containing RNA (TERRA), also termed telomeric RNA (TelRNA), is transcribed from the heterochromatic telomeric regions at least in part by RNAP II. It is heterogeneous in length, accumulates at telomeres, and may work together with RNA surveillance factors to ensure telomere replication and length homeostasis (Azzalin et al., 2007; Schoeftner and Blasco, 2007). A subclass of Sat III in the pericentromeric heterochromatin is transcribed upon heat shock and gives rise to highly asymmetrical transcripts. These Sat III transcripts are polyadenylated, heterogeneous in size, and associated with nuclear stress bodies (Valgardsdottir et al., 2008). Retrotransposons located immediately 5′ of protein-coding loci frequently act as alternative promoters and/or express noncoding RNAs. Faulkner et al. (2009) conducted a genome-wide screen and identified 23,000 candidate regulatory regions derived from retrotransposons, in addition to more than 2000 examples of bidirectional transcription. More importantly, most of the transcripts are lncRNAs (Faulkner et al., 2009).
Chromatin modification
lncRNAs can mediate epigenetic changes by recruiting chromatin-remodeling complexes to specific genomic loci. A 2.2-kb ncRNA residing in the HOXC locus, termed HOTAIR, can interact with PRC2, and is required for PRC2 occupancy and histone H3 lysine-27 trimethylation of the HOXD locus. Transcription of lncRNA may thus demarcate chromosomal domains of gene silencing at a distance (Rinn et al., 2007). HOTAIR can also serve as a scaffold, providing binding surfaces to assemble select histone modification enzymes and thereby specifying the pattern of histone modifications on target genes (Tsai et al., 2010). From an analysis of four mouse cell types, Guttman et al. (2009) identified 1250 unannotated intergenic regions at least 5 kb in size through a computational approach. These lincRNAs are entirely situated in the intervening regions between genes, and are enriched in evolutionarily conserved sequences (Guttman et al., 2009). HOTTIP, a lincRNA transcribed from the 5′ tip of the HOXA locus, can coordinate the activation of several 5′ HOXA genes in vivo. HOTTIP RNA binds the adaptor protein WDR5 directly and targets WDR5/MLL complexes across HOXA, driving histone H3 lysine-4 trimethylation and gene transcription. Thus, by serving as key intermediates that transmit information from higher order chromosomal looping into chromatin modifications, lincRNAs may organize chromatin domains to coordinate long-range gene activation (Wang et al., 2011).
RNA processing
MALAT1 interacts with SR proteins and influences the distribution of these and other splicing factors in nuclear speckle domains, and thereby regulates alternative splicing (Tripathi et al., 2010). A highly conserved tRNA-like small RNA of 61 nucleotides was identified from the MALAT1 locus. In contrast to the long MALAT1 transcript that localizes to nuclear speckles, the small RNA exclusively localizes to the cytoplasm. RNase P cleaves downstream of a genomically encoded poly(A)-rich tract to simultaneously generate the 3′ end of the abundant MALAT1 transcript and the 5′ end of the small RNA. The finding reveals a novel 3′ end-processing mechanism by which a single locus can yield both a stable nuclear retained noncoding RNA with a short poly(A) tail-like moiety and a small tRNA-like cytoplasmic RNA (Wilusz et al., 2008).
Enhancer- and promoter-associated lncRNAs
Preker et al. (2008) revealed a class of short, polyadenylated, and highly unstable RNAs. These promoter upstream transcripts (PROMPTs) arise ∼0.5 to 2.5 kb upstream of active transcription start sites. PROMPT transcription takes place in both sense and antisense directions with respect to the downstream gene. One possibility is that PROMPT transcription has a more general function, either by providing reservoirs of RNAPII molecules that can facilitate rapid activation of the downstream gene, and/or by altering the chromatin structure (Preker et al., 2008).
Mammalian genomes contain vast intergenic regions that can be transcribed into various types of short noncoding RNAs and lncRNAs. De Santa et al. (2010) used chromatin signatures to characterize extragenic transcription sites targeted by RNA pol II in a highly regulated response, the endotoxin activation of macrophages. They found that a significant portion of extragenic transcription sites are associated with the chromatin signature characteristic of enhancers (De Santa et al., 2010). Kim et al. (2010) used genome-wide sequencing methods to study stimulus-dependent enhancer function in mouse cortical neurons. They observed activity-regulated RNAPII binding to thousands of enhancers. Notably, RNAPII at enhancers bidirectionally transcribes a novel class of enhancer RNAs (eRNAs) within enhancer domains defined by the presence of histone H3 monomethylated at lysine 4. The level of eRNA expression at neuronal enhancers positively correlates with the level of messenger RNA synthesis at nearby genes, suggesting that eRNA synthesis occurs specifically at enhancers that are actively engaged in promoting mRNA synthesis (Kim et al., 2010). In addition, Ørom et al. (2010) used a GENCODE annotation of the human genome to characterize over a thousand lncRNAs that are expressed in multiple cell lines. They found an enhancer-like function for a set of these lncRNAs in human cell lines (Ørom et al., 2010).
lncRNAs and Disease
lncRNAs and cancer progression
lncRNAs are emerging as new players in the cancer paradigm, with potential roles in both oncogenic and tumor-suppressive pathways. These novel transcripts are frequently aberrantly expressed in a variety of human cancers. MALAT1 is widely expressed in normal human and mouse tissues. It shows abnormal expression in numerous human carcinomas, including those of the breast, pancreas, lung, colon, prostate, liver, and ovary (Lin et al., 2006; Lai et al., 2011). RNA interference-mediated silencing of MALAT1 impairs the in vitro migration of lung adenocarcinoma cells through concomitant regulation of motility-associated genes via transcriptional and/or post-transcriptional means (Tano et al., 2010). Similarly, short hairpin RNA inhibition of MALAT1 reduces the proliferation and invasive potential of a cervical cancer cell line (Guo et al., 2010). Chung et al. (2011) revealed that a novel lncRNA termed “PRNCR1” (prostate cancer noncoding RNA 1) is upregulated in some prostate cancer cells as well as in the precursor lesion, prostatic intraepithelial neoplasia. Knockdown of PRNCR1 attenuates the viability of prostate cancer cells and the transactivation activity of the androgen receptor, suggesting that PRNCR1 is involved in prostate carcinogenesis, possibly through androgen receptor activity (Chung et al., 2011). Prensner et al. (2011) discovered 121 unannotated prostate cancer-associated ncRNA transcripts (PCATs) by an ab initio assembly of high-throughput sequencing of polyA+ RNA (RNA-Seq) from a cohort of 102 prostate tissues and cell lines. One lncRNA, PCAT-1, was found to be a prostate-specific regulator of cell proliferation and a target of PRC2 (Prensner et al., 2011).
H19 is reported to be reactivated during adult tissue regeneration and tumorigenesis. H19 is highly expressed in liver metastases derived from a range of carcinomas. By controlling oxygen pressure during tumor cell growth and H19 expression levels, Matouk et al. (2007) investigated the role of H19 expression in vitro and in vivo in hepatocellular carcinoma (HCC) and bladder carcinoma. H19 RNA harbors protumorigenic properties; thus, the H19 gene behaves as an oncogene and may serve as a potential new target for antitumor therapy (Matouk et al., 2007). lncRNA-HEIH is upregulated in HCC. It plays a key role in G0/G1 arrest and associates with enhancer of zeste homolog 2 (EZH2); this association is required for the repression of EZH2 target genes (Yang et al., 2011b). The product of the MYC oncogene is widely deregulated in cancer and functions as a regulator of gene transcription. c-Myc significantly induces the expression of the H19 noncoding RNA in diverse cell types, including breast epithelial, glioblastoma, and fibroblast cells. The c-Myc oncogene directly induces the H19 noncoding RNA by allele-specific binding to potentiate tumorigenesis (Barsyte-Lovejoy et al., 2006).
Melanoma can occur in any part of the body that contains melanocytes. It is less common than other skin cancers, but it is much more dangerous and causes the majority (75%) of deaths related to skin cancer. Levels of one lncRNA, SPRY4-IT1, are markedly increased in melanoma cells, but not in normal skin cells (Khaitan et al., 2011). p15 antisense (p15AS), which is upregulated in a leukaemia sample, can inhibit the expression of the p15 gene, a cyclin-dependent kinase inhibitor, in cis and in trans through heterochromatin formation, but not DNA methylation (Yu et al., 2008). In addition, many HOX-associated lncRNAs show differential expression between primary breast carcinomas and distant metastases (Gupta et al., 2010). Many p53-dependent lncRNAs induced in response to DNA damage have also been identified (Huarte et al., 2010).
lncRNAs and other diseases
An inherited form of α-thalassaemia is caused by the translocation of an antisense lncRNA to a location near the alpha-globin gene (HBA2). The translocation and induction of the lncRNA result in epigenetic silencing of the HBA2 gene and this form of human anemia (Tufarelli et al., 2003). Through a large-scale case–control association study using single-nucleotide polymorphism (SNP) markers, a susceptibility locus for myocardial infarction on chromosome 22q12.1 was mapped to an lncRNA, MIAT (myocardial infarction-associated transcript) (Ishii et al., 2006). Genome-wide association studies also identified a region associated with coronary artery disease that encompasses an lncRNA, ANRIL (McPherson et al., 2007; Pasmant et al., 2007). PRINS (psoriasis-associated RNA induced by stress), an lncRNA, is upregulated in the skin of patients with psoriasis and contributes to psoriasis via the downregulation of G1P3, a gene encoding a protein with antiapoptotic effects in keratinocytes (Sonkoly et al., 2005). The increased expression of the antisense transcript of the BACE1 gene in response to cell stressors has been implicated in the progression of Alzheimer's disease (Faghihi et al., 2008; Khalil et al., 2008). Patients with SCA8 have a trinucleotide expansion in an lncRNA named ataxin 8 opposite strand (ATXN8OS), which is antisense to the KLHL1 gene. The involvement of this mutation in SCA8 disease progression has been confirmed in a transgenic mouse model. Transgenic mice with this repeat expansion have a progressive neurological phenotype similar to that of humans with SCA8 (Moseley et al., 2006).
lncRNAs as disease-diagnostic biomarkers
Cancer-specific miRNAs are detectable in the blood, sputum, and urine of cancer patients. Thus, they are suitable as biomarkers for cancer diagnosis (Krutovskikh and Herceg, 2010; Duttagupta et al., 2011). lncRNA expression profiles are also altered in several types of cancer, including human prostate cancer, renal cell carcinoma, breast cancer, ovarian cancer, and human lung adenocarcinoma, raising the possibility that lncRNAs may become promising biomarkers for disease diagnosis. The prostate-specific lncRNA DD3 shows higher specificity than serum prostate-specific antigen (PSA) and can be developed into a highly specific, nucleic acid amplification-based marker (Hessels et al., 2003; Tinzl et al., 2004). The HCC-associated lncRNA HULC is upregulated in the blood of hepatocarcinoma patients, suggesting its potential use in hepatocarcinoma diagnosis (Panzitt et al., 2007). HOTAIR (HOX antisense intergenic RNA) is markedly increased in breast tumors, suggesting that it may be a powerful predictor of patient outcomes such as metastasis and death. In addition, increased HOTAIR expression in HCC makes it a candidate biomarker for predicting tumor recurrence in HCC patients who have undergone liver transplantation, as well as a potential therapeutic target (Yang et al., 2011a). The SNORD-host RNA Zfas1, a transcript antisense to the 5′ end of the protein-coding gene Znfx1, is highly expressed in the mammary gland and downregulated in breast tumors compared to normal tissue, suggesting its potential application in the diagnosis of breast cancer (Askarian-Amiri et al., 2011).
A conserved noncoding antisense transcript for β-secretase-1 (BACE1) regulates BACE1 mRNA and subsequently BACE1 protein expression in vitro and in vivo. BACE1 is upregulated in Alzheimer's disease, and thus could be exploited as a biomarker for the diagnosis of Alzheimer's disease (Pandey et al., 2008). ANRIL is expressed in tissues and cell types affected by atherosclerosis, and its altered expression could be used to predict the development of coronary artery disease (Broadbent et al., 2008; Jarinova et al., 2009).
Like their smaller noncoding RNA counterparts, lncRNAs represent a significant untapped resource for developing diagnostics and therapies. Although our understanding of how lncRNAs cause disease still lags far behind our understanding of their protein counterparts, some features of lncRNAs make them ideal candidates for therapeutic intervention.
Conclusions and Perspectives
Despite recent and rapid progress in lncRNA research, a great number of important questions remain to be answered. How do specific proteins interact with lncRNAs? How do these interactions lead to functional consequences? How do lncRNA–protein interactions mediate specific functions? In general, the nucleotide sequences of lncRNAs are not well conserved among different species. However, the secondary structures formed by lncRNAs appear to be conserved, for example, the double stem-loop structure for PRC2 binding (Zhao et al., 2008; Kanhere et al., 2010; Maenner et al., 2010). Can lncRNAs function like proteins through their conformational versatility to form various secondary structures? In addition, several lines of evidence have shown that even small-scale mutations, such as SNPs, can affect lncRNA structure and function. Thus, future studies are required to elucidate the mechanism by which mutations in lncRNA functional motifs affect their regulatory domains and compromise their ability to interact with other molecules, thereby leading to the pathogenesis of disease.
Unlike mature mRNAs, which are transported to the cytoplasm after processing, most lncRNAs are predominantly found in the nucleus. Only a small percentage of lncRNAs are primarily detected in the cytoplasm, and few lncRNAs seem to be equally abundant in both compartments. Recently, three lncRNAs were found to be generated from the mitochondrial genome. These mitochondrial lncRNAs form intermolecular duplexes, and their abundance is cell- and tissue-specific (Rackham et al., 2011). This evidence raises the question of whether the specific cellular localization of lncRNAs can help to predict their function.
In conclusion, although a lot of key questions remain unanswered, the recent flurry of studies regarding the roles of lncRNAs in various biological processes clearly suggests the potential roles of lncRNAs as important regulators in many gene regulatory networks.
Footnotes
Acknowledgments
This work was supported by the Doctoral Fund (B-8812-11-0197 to B.Y., A-2400-11-0209 to Z.-H.W.), and the Shanghai cultivation fund for Outstanding Young Teachers (B-5409-11-0012 to Z.-H.W.).
Disclosure Statement
No competing financial interests exist.
