Abstract
The pathogenesis of Alzheimer’s disease (AD) is identified to be significantly regulated by long non-coding RNA (lncRNA) based on in vivo and clinical experiments. Single nucleotide polymorphisms (SNPs) can strongly impact expression and function of lncRNA in AD, and previous genome-wide associations studies (GWAS) have discovered substantial amount of risk SNPs associated with AD. However, current studies omit the important information about SNPs when identifying potential AD-related lncRNAs. In addition to single discovery approach and small-scale samples in these studies, the number of lncRNAs discovered as keys in AD is limited. Here, multiple computational methods were integrated to discover novel and key lncRNA of the pathology of AD. First, large-scale GWAS data involved in three ethnicities were collected from two authoritative sources, and meta-analyses were conducted to find SNPs significantly associated with AD (tag SNPs). Second, these tag SNPs together with their linkage disequilibrium information were used to discover potential lncRNAs related to AD. Third, after validation by microarray probe re-annotation of 1,282 samples and RNA-seq data analysis of 117 samples, respectively, a total of five key lncRNAs of AD were identified. Finally, possible function of these lncRNAs was predicted by genome mapping, expression quantitative trait loci, differential co-expression, and gene set enrichment analysis. Based on function prediction, four of the five key lncRNAs were identified to affect the risk of AD by regulating corresponding pathogenic genes and pathways, which are involved in regulation of amyloid-β peptide and the immune system. In summary, these findings can facilitate the discovery of potential disease-related lncRNAs and enhance understanding of the pathogenesis of AD.
Keywords
INTRODUCTION
Alzheimer’s disease (AD) is a chronic neurodegenerative disorder that starts slowly and worsens over time [1]. The accumulation of amyloid-β peptide (Aβ) is commonly recognized as a main characterization and driver of AD pathogenesis [2, 3]. So far, >35 million people around the world and 5.5 million in the United States suffer from AD [4]. Because of the lack of effective preventive measures and drugs, the number of AD patients is estimated to rapidly increase in the next forty years [5 –7]. In particular, according to two independent studies in 2011 and 2015, newly diagnosed AD patients are expected to reach 10–15 million and 135 million by 2050 in the United State and around the world, respectively [8 –10].
Long non-coding RNAs (lncRNAs) are a kind of considerable non-protein coding transcripts [11]. Until now, about 167,150 lncRNAs have been identified in the human genome [12], and hundreds of them are considered to be involved in many key aspects of cellular process, such as establishment of cell identity, cell cycle control, and apoptosis [13]. Recent studies have shown that the pathogenesis of many human diseases involve the dysregulations of lncRNAs [14 –16], and several AD-related lncRNAs have been identified by experiments [17 –19]. For example, Mus et al. found that lncRNA BC200, a translational regulator targeting eukaryotic initiation factor 4A, is significantly upregulated in AD patients compared to age-matched healthy individuals. The overexpression of BC200 can impair synaptic plasticity in AD neurons [20]. Faghihi et al. found that lncRNA BACE1-AS, a conserved antisense transcript β-site amyloid precursor protein, is a crucial regulator in AD pathophysiology. BACE1-AS concentration is significantly elevated in AD patients, and the overexpressed BACE1-AS enhances amyloid-β protein precursor (AβPP) processing and the abundance of Aβ1–42 in AD brain [17]. Ciarlo et al. identified an AD-related lncRNA 51A which maps in antisense configuration to intron 1 of the SORL1 gene. The overexpressed 51A in AD brain results in a decreased synthesis of SORL1 variant A (a canonical long protein associated with impaired processing of AβPP) by affecting the alternative splicing of SORL1 gene, and further leads to increased Aβ formation [18]. Moreover, according to genome-wide association study (GWAS) findings by Lambert et al., two single nucleotide polymorphisms (SNPs) rs190982 and rs11771145 (locating in the MEF2C and EPHA1, respectively) are significantly associated with AD, and MEF2C and EPHA1 are actually located in lncRNAs [21]. These experiments identified the novel lncRNAs with great accuracy, but their significant costs in time and investment limit our understanding of the function of lncRNAs in the pathology of AD [22 –26]. Compared with other complex diseases, only a few AD-related lncRNAs and their functions have been little identified [27 –30]. In order to overcome the deficiencies, computational methods are applied in the identification of potential AD-related lncRNAs which can substantially enhance the discovery efficiency [31 –34]. Recently, a strategy of microarray probe reannotation was adopted and 108 lncRNAs were found to be potentially related to AD in temporal cortex [31]. Another study finds 488 differential expressed lncRNAs between AD patients and healthy individuals in entorhinal cortex by a similar approach [32]. However, because of the single discovery approach and small-scale samples (16 and 20 individuals, respectively) utilized in those studies, the results are not fairly reliable making it difficult to explain the functions of these lncRNAs, and it is highly possible that some novel lncRNAs playing a key role in AD are omitted.
Recently, the SNPs were identified to contribute to the dysregulation of lncRNAs expression and play an important role in the association between lncRNAs and diseases [35 –39]. Ever since the first application of GWAS in 2005 [40], a large number of SNPs significantly associated with various diseases have been identified on a genome-wide scale. For example, Harold et al. compared 3,941 AD cases with 7,848 controls and identified two susceptibility SNPs (rs11136000 and rs3851179) at the CLU and PICALM gene, respectively [41]. Naj et al. analyzed the genotyping data of 6,922 AD cases and 24,666 controls and identified four risk loci (rs4938933, rs9349407, rs11767557, and rs3865444) at the MS4A4A, CD2AP, EPHA1, and CD33 gene, respectively [42]. In 2014, a susceptibility locus (rs277470) at PLXNA4 gene was identified associated with AD among Caucasian, African American, and Japanese populations [43]. Recently, three common loci (rs3764650, rs3752246, and rs4147929) in ABCA7 gene were re-assessed and confirmed to increase the risk of AD by a meta-analysis [44]. Nevertheless, nearly all of these discoveries focus on the protein-coding genes, and the non-coding risk variants are ignored [45]. Moreover, the existence of linkage disequilibrium (LD) in GWAS can present many SNPs associated with AD even if only one of them is the true risk variant, which may lead to neglect of the association between disease and lncRNA SNPs if they are in the LD with the nominal susceptible SNPs [46]. Therefore, the mining from GWAS data can facilitate the effective discovery of novel AD-related lncRNAs.
In this study, multiple computational approaches were integrated to identify novel lncRNAs playing a key role in the pathology of AD. First, large-scale GWAS data involving three ethnicities (European, African, and Asian ancestry) were collected from AlzGene [47] and International Genomics of Alzheimer’s Project (IGAP) [21], and the meta-analyses were then conducted to eliminate the inconsistency and find the SNPs significantly associated with AD (tag SNPs). Second, the tag SNPs together with their LD information were used to discover the lncRNAs potentially related to AD. Third, 1,282 microarray samples and 117 RNA-seq samples from the Gene Expression Omnibus (GEO) database were used to validate and identify key lncRNAs of AD. Finally, a total five key lncRNAs of AD were identified, and their possible functions were predicted by applying genome mapping, expression quantitative trait loci (eQTL), differential co-expression, and gene set enrichment analysis (GSEA). The brief flow chart of this study is illustrated in Fig. 1 (details are described in Supplementary Figure 1). In summary, these findings can facilitate the discovery of novel and key lncRNAs in AD and enhance the understanding of the pathogenesis of AD.

The flow chart of the study design for identifying the key lncRNAs in pathology of AD and predicting their functions.
METHODS
Collecting GWAS data and identifying AD-related tag SNP
AlzGene is a publicly available resource providing AD genetic variants. It catalogued 695 genes and 2,973 polymorphisms from 1,395 GWAS studies involved in European, African, and Asian ancestry ethnicities [47]. We used the R package ‘meta’ (http://cran.r-project.org/web/packages/meta/index.html) to conduct systemic meta-analyses. Frist, we extracted the detailed genotype data of each SNP both in AD patients and controls from AlzGene. Second, we used two quantities, Cochran’s Q and I
2, to measure the heterogeneity of each SNP among different studies. The Cochran’s Q approximately followed a chi-squared distribution with k–1 degrees of freedom (where k was the number of studies), and the I
2 value was calculated through Cochran’s
Identifying the non-coding SNPs in LD with the tag SNPs
HaploReg is a tool for exploring annotations about the linked variants at the regions of non-coding sequence, which is based on the LD information of SNPs from the 1000 Genomes Project [46]. HaploReg (v4) defined a core set of 52,054,804 variants mainly consisting of SNPs for the four ancestral super-populations, i.e., African (AFR), American (AMR), Asian (ASN), and European (EUR) ancestry [46]. We uploaded these tag SNPs to the HaploReg using a text file (one tag SNP ID per line), and calculated the squared correlation (r 2) between each tag SNP and the non-coding SNPs in different ethnicities (including African, American, Asian, and European). Then, according to the ethnicities of tag SNPs, we selected the non-coding SNPs in LD with them in corresponding ethnicities (the LD threshold was set as r 2≥0.8). Parameters of HaploReg were set to the default values.
Identifying the lncRNAs potentially related to AD using non-coding SNPs
lncRNASNP database provides a comprehensive resource about lncRNAs and their related SNPs. It systematically integrates the information of SNPs and lncRNAs from dbSNP of the NCBI and LNCipedia database, respectively, and identifies 495,729 SNPs in > 30,000 human lncRNA transcripts [35]. In this study, we uploaded these non-coding SNPs in LD with the tag SNPs to lncRNASNP (v2.0) database (one SNP ID for each line), and mapped the genomic locations of these non-coding SNPs and lncRNAs. Then, we selected the lncRNAs which contain at least one of these non-coding SNP, and defined them as the potentially AD-related lncRNAs.
Validating and discovering the key lncRNAs in the pathology of AD
In order to further validate the association between AD and those lncRNAs identified in the previous step, the expression levels of the lncRNAs were compared between AD patients and healthy persons. First, we selected all the AD-related Affymetrix Human expression profiling microarray datasets from GEO database by searching with the keyword: “Alzheimer’s Disease”. Then, we re-annotated the microarray probes to measure expression levels of the lncRNAs potentially related to AD according to the strategy in a previous study [53]. Briefly, we first downloaded the reference sequences of these potentially AD-related lncRNAs in FASTA format from NONCODE database [12]. Second, probe sets of the microarrays were aligned to the lncRNA sequences using SeqMap tool [54, 55], and the lncRNA-specific probe sets were obtained which contain at least four probes uniquely mapped to the lncRNA sequences without mismatch. Finally, quality controls for the microarrays were performed using Expression Console software, and the gene differential expression analysis between AD brains and controls were conducted using Affymetrix Transcriptome Analysis Console software. The expression difference threshold was set as fold-change (FC) ≥1.5 and p-values less than 0.05 after multiple testing according to previous studies [56 –59].
If the samples of original microarray studies are from same brain regions and ethnicities, and the results of lncRNA expression patterns are inconsistent, a meta-analysis strategy was further performed to eliminate inconsistency [60 –62]. Briefly, the meta-FC values were calculated using a linear amalgamation method, which combined the FC values by giving weights to the effect size in each study. The meta p-values were calculated using a Fisher’s method based on the x 2 distribution. In addition, Given that the microarray samples are mainly from European ancestry ethnicity, we further used a RNA-seq dataset of European ancestry ethnicity (GSE95587 [63]) to perform an lncRNA differential expression analysis and verify the results of microarray analysis. First, we downloaded the sequence data of GSE95587 (including 84 AD cases and 33 controls) from the NCBI Sequence Read Archive (SRA) database and converted them into FASTQ files using the SRA Toolkit software. Then, quantification of the lncRNA transcripts was performed by mapping the RNA-seq reads to their corresponding reference sequences and calculating the transcript per million (TPM) values using the Kallisto software, which is a fast and highly accurate tool to quantify transcript abundance from large-scale RNA-seq data using a k-mer lookup (instead of the traditional alignment step) [64]. Finally, we used the R package ‘DESeq’ to identify the differential expressed lncRNAs between AD and control according to the threshold of FC≥1.5 and p-value < 0.05 [65]. The p-values of differential expression analysis are corrected by the multiple testing (Benjamini–Hochberg method).
Inferring the function of the key lncRNAs in the pathology of AD
To explore the functions of those key lncRNAs identified in the previous step, four strategies (including genome mapping, eQTL, differential co-expression, and GSEA analysis) were applied to assess the regulations of lncRNAs to the known AD pathogenic genes both in cis and trans levels, and discovery the biological pathways associated AD. Recent studies showed that many lncRNAs can effect on the expression of protein-coding genes located closely to them in genome (particularly within 2 kb), and these lncRNAs and protein-coding genes tend to be associated with the same diseases [66, 67]. So, one of the strategies to infer the functions of the key lncRNAs in the pathology of AD are to test whether some of the known AD pathogenic genes exist within 2 kb of these lncRNAs or not. For this purpose, we first loaded all the data of the LNCipedia database into the University of California Santa Cruz (UCSC) Genome Browser [68, 69]. LNCipedia is a comprehensive database of human lncRNAs which contains the information about location, sequence, and structure of the currently known human lncRNAs [70]. Then, we used the UCSC Genome Browser to locate both these identified lncRNAs and known AD pathogenic genes in human genome. Here, we used GRCh38/hg38 as human genome references.
Other studies reported that the expression of lncRNAs can be affected by the SNPs located in them, and about 25% of these SNPs also influence the expression of protein-coding genes neighboring the lncRNAs [35], which implied that part of the cis-action of lncRNAs to protein-coding genes may be associated the SNPs located in them. To further explore the functions of those key lncRNAs in the pathology of AD, cis eQTL analysis was conducted to test whether the SNPs located in these lncRNAs affect the expression of known AD pathogenic genes or not by Braineac database [71] and Genotype-Tissue Expression (GTEx) project [72] (p < 0.01). Braineac and GTEx analyzed the microarray datasets of 10 (Cerebellar Cortex, Frontal Cortex, Hippocampus, Medulla, Occipital Cortex, Putamen, Substantia Nigra, Temporal Cortex, Thalamus, and Intralobular White Matter) and 13 (Amygdala, Anterior Cingulate, Caudate, Cerebellum, Cortex, Frontal Cortex, Hippocampus, Hypothalamus, Nucleus Accumbens, Putamen, Spinal Cord, and Substantia Nigra) brain regions from 131 and 175 postmortem donors using the 1000 Genomes Project as reference, respectively. More detailed information is described in the original publications [71, 72]. First, we selected all the SNPs located in these identified lncRNAs using the lncRNASNP database. Then, we uploaded these SNPs into the Braineac and GTEx databases, respectively, and tested whether these SNPs affected the expression of known AD pathogenic genes or not. The p-values of eQTL are corrected by multiple testing (Benjamini–Hochberg method).
Besides this cis action, it is confirmed that lncRNAs can also regulate the protein-coding genes in trans level [73]. Therefore, to further explore the functions of the key lncRNAs in the pathology of AD, the GEO data were used to conduct the differential co-expression analysis. Based on the strategy of measuring lncRNA expression level as described in the previous step, we used the same samples to identify differential expression of the protein-coding genes between the AD patients and healthy individuals (FC≥1.5 and p < 0.05). Then, correlation coefficients (R) and significance level between the differential expression genes and the lncRNAs were calculated, and the co-expressed genes were selected according to the threshold (|R|≥0.9 and p < 0.001). If there are known AD pathogenic genes co-expressed with those lncRNAs, the functions of the corresponding lncRNA in AD are likely associated with it. Finally, besides those known pathogenic genes, other genes co-expressed with key lncRNAs in the co-expression networks can be used to identify the biological pathways associated AD. Therefore, we selected the protein-coding genes co-expressed with each of these identified lncRNAs, respectively, and used the R package ‘clusterProfiler’ to conduct the GO enrichment analysis [74]. The enrichment analysis is based on the classical Hypergeometric distribution test, and the adjusted p-values are calculated by the multiple testing (Benjamini–Hochberg method). The cutoff criterion for the enrichment analysis is set to adjusted p-values less than 0.01.
RESULTS AND DISCUSSION
Identifying the lncRNAs potentially related to AD using non-coding SNPs
We used multiple tools in the three-stage analysis to identify lncRNAs potentially related to AD. In stage 1, a total of 478 tag SNPs were selected from the European, African, and Asian ancestry ethnicities, which include 458 SNPs from AlzGene according to the results of the meta-analyses of GWAS studies and 20 SNPs from IGAP. We found that most of the tag SNPs’ allele frequencies are not significantly different between ethnicities (Supplementary Tables 1 and 2). In stage 2, we used HaploReg (v4) tool and discovered 6,220 non-coding SNPs within the LD of those 478 tag SNPs (r 2≥0.8), and of which 4,985 non-coding SNPs are from European ancestry ethnicity, 39 are from African, and 1,472 are from Asian (Supplementary Table 3). In stage 3, by the lncRNASNP, we found that 186 lncRNAs contain at least one of the 6,220 non-coding SNPs, and defined them as potential AD-related lncRNAs (Supplementary Table 4).
Validating and discovering the key lncRNAs in the pathology of AD
By the re-annotation of microarray probes for the lncRNAs potentially related to AD, we found that a total 317 probes can represent 26 corresponding lncRNA sequences. These probes are from five types of Affymetrix microarray, i.e., Human Genome U133A, Human Genome U133B, Human Genome U133 Plus 2.0, Human Gene 1.0 ST, and Human Gene 1.1 ST microarray (Supplementary Table 5). Then, after removing the unsuitable microarrays (the original studies are not designed by the case-control method or the ethnicity and diagnosis of the samples are unclear), e.g., Possible AD), we found a total 1,282 samples in 5 microarray datasets (GSE84422 [75], GSE43326 [76], GSE5281 [77], GSE36980 [78], GSE39420 [79]) from GEO database. Particularly, the 1,282 samples include 563 AD cases and 330 controls from European ancestry ethnicity, 134 AD cases and 56 controls from African ancestry ethnicity, and 95 AD cases and 104 controls from Asian ancestry ethnicity. Base on differential expression analysis and microarray meta-analysis between AD patients and normal people of the 1,282 samples, 12 lncRNAs were identified significantly differentially expressed between AD cases and controls (FC≥1.5 and p < 0.05) (Table 1).
The information of the differentially expressed lncRNAs identified by microarray
To further validate the association between AD and the potential lncRNAs, we used the Kallisto software to quantify the transcript abundance (TPM values) of these lncRNAs from an RNA-seq dataset (including 84 AD cases and 33 controls). Then, after the differential expression analysis of these lncRNAs (based on the TPM values) by the R package ‘DESeq’, 12 lncRNAs were found significantly differentially expressed between AD cases and controls (FC≥1.5 and p < 0.05) (Table 2), and 5 of them are overlapped with findings of microarray analysis. We therefore defined these 5 lncRNAs (NONHSAT152299.1, NONHSAT016928.2, NONHSAT157359.1, NONHSAT160355.1, and NONHSAT018519.2) as the key lncRNAs in the pathology of AD.
The information of the differentially expressed lncRNAs identified by RNA-seq
In particular, we found that the expression of lncRNA NONHSAT018519.2 in European AD patients is significantly decreased compared with healthy persons in Amygdala, Nucleus Accumbens, and Primary Visual Cortex tissues (FC = –1.90, –2.32 and –9.42, respectively) (Fig. 2a-c). Interestingly, according to results of the GTEx project [72], for the healthy European ancestry individuals, NONHSAT018519.2 is significantly highly expressed in brain tissues compared with 40 other human tissues (Fig. 2d). All these phenomena implied that the highly expressed NONHSAT018519.2 in brain plays an important role for healthy persons, and the decreased expression of NONHSAT018519.2 may be associated with the pathology of AD.

The expression levels of lncRNA NONHSAT018519.2 in AD patients and healthy persons (a-c). To AD patients, NONHSAT018519.2 expression significantly decreases compared with healthy persons in the Amygdala (a), Nucleus Accumbens (b), and Primary Visual Cortex (c), and the differentially expressed NONHSAT018519.2 is only identified in these three brain tissues. d) Results of Genotype-Tissue Expression (GTEx) project show that NONHSAT018519.2 is highly expressed in brain tissues of healthy individuals compared with other 40 human tissues. All the samples are collected from European ancestry populations.
Genome mapping analysis using UCSC Genome Browser and LNCipedia
To infer the function of those lncRNAs identified as key to AD pathology, both the lncRNAs and known AD pathogenic genes were located in genome using UCSC Genome Browser and LNCipedia. Finally, a total of two pathogenic genes with known functions, BDNF and ADAM12, were located within 2 kb of two key lncRNAs, NONHSAT018519.2 and NONHSAT016928.2, respectively. Moreover, the genes, BDNF and ADAM12, are the only protein-coding gene within 2 kb of their corresponding lncRNAs. The detailed information was described in Fig. 3.

The location of lncRNAs and protein-coding genes in human genome. a) The locations (GRCh38/hg38) of lncRNA NONHSAT018519.2 and the protein-coding gene BDNF near it. b) The locations of lncRNA NONHSAT016928.2 and the protein-coding gene ADAM12 near it. The green lines with a solid red box were lncRNAs according to LNCipedia database. The LNCipedia ID of these lncRNAs was in one-to-one correspondence with NONCODE ID. The black lines with a dashed box were the protein-coding genes.
Based on the results of previous studies, BDNF (brain-derived neurotrophic factor) has a neuroprotective effect against AD by facilitating neurotransmitter release and reducing Aβ peptide toxicity, and shows a lower expression in AD brains [80, 81]. Similarly, the expression of ADAM12 (a proteolytic enzyme) is reduced in AD brain, and it provides a broad neuroprotection by mediating neurotoxic effect of Aβ peptide [82]. In view of the downregulated (FC = –1.90, –2.32, –9.42 and –2.79,in the Amygdala, Nucleus Accumbens, Primary Visual Cortex, and Fusiform Gyrus, respectively) and upregulated expression (FC = 13.68, 1.60 and 1.55 in Hippocampus, Putamen, and Fusiform Gyrus, respectively) of NONHSAT018519.2 and NONHSAT016928.2 in AD patients compared with the healthy persons, respectively (Tables 1 and 2), the function of them in AD may be associated with regulation of the expression of BDNF and ADAM12 in the opposite directions.
The eQTL analysis based on the data of the Braineac and GTEx projects
The Braineac and GTEx projects involve 10 and 13 brain regions, respectively, and 6 of which are consistent with this study (Amygdala, Hippocampus, Nucleus Accumbens, Putamen, Temporal Cortex, and Thalamus). To further explore the functions of the key lncRNAs in the pathology of AD, cis eQTL analysis was performed to test whether the SNPs located in the lncRNAs affect the expression of the known AD pathogenic genes in the 6 brain regions. As shown in Fig. 4a, the SNP rs7232 (p = 0.0011) and rs12453 (p = 0.00057) in lncRNA NONHSAT160355.1 significantly downregulate the expression of a known AD pathogenic gene TCN1 in Temporal Cortex, which participates in the regulation of homocysteine in brain to increase risk of AD [83]. Then, as seen in Fig. 4b and c, rs12453 is in LD with the SNP rs983392 (r 2 = 0.821568), which is identified to be associated with a reduced risk of AD (OR = 0.9; 95% CI = 0.87–0.92; p = 1.6E –16) [21]. Moreover, the expression of NONHSAT160355.1 in AD patients is significantly increased compared with healthy persons (FC = 1.64, 2.85 and 2.26 in Temporal Cortex, Thalamus, and Fusiform Gyrus, respectively) (Tables 1 and 2). Therefore, the possible functions of NONHSAT160355.1 in the pathology of AD can be inferred as promoting the expression of TCN1.

The rs7232 and rs12453 in NONHSAT160355.1 show an eQTL effect on FADS1 expression. a) The SNP rs7232 (chr11:60173126 T > A) and rs12453 (chr11:60178272 T > C) affect the expression of gene TCN1 (chr11:59852808-59866568) by an eQTL pattern in the Temporal Cortex (TCTX). b) As the result of Lambert et al.’s study [21], rs983392 (chr11:60156035 A > G) is a susceptibility locus associated with AD, and it significantly decreases the risk of AD (OR = 0.9; 95% IC = 0.87–0.92). c) The rs7232 and rs12453 are located in the region of lncRNA NONHSAT160355.1, and the rs12453 is in the LD with the SNP rs983392 (r 2 = 0.821568) based on 1000 Genomes Project. All the location information is according to the human reference genome hg38.
The differential co-expression analysis and enrichment analysis
To further explore the trans regulation of 5 key lncRNAs and their functions acting upon AD pathways, we performed the differential co-expression analysis and enrichment analysis. After the differential co-expression analysis, a total 555 protein-coding genes with significantly differential expression between AD cases and controls were discovered to co-express with 3 key lncRNAs, NONHSAT152299.1, NONHSAT157359.1, and NONHSAT160355.1 (|R|≥ and p < 0.001). Detail information is provided in Supplementary Table 6. Among these protein-coding genes, we found that four known AD susceptibility genes (C4A, C4B, TCF4, GRIP1) co-expressed with the lncRNA NONHSAT152299.1 (Table 3). Particularly, the number of copies for both C4A and C4B are increased in AD patients, which significantly affect the expression of complement component 4 protein (a key complement factor in Aβ peptide metabolism) [84]. TCF4 functions as a transcriptional repressor of BACE1 gene by binding BACE1 promoter, and further represses Wnt/β-catenin stimulation in AD [85]. The GRIP1 affects the functions of PFA1 and PFA2 in recognizing Aβ monomers, protofibrils, and fibrils and the structures of their antigen binding fragments in AD [86]. Moreover, the polymorphisms of C4A, C4B, TCF4, and GRIP1 are proved to be significantly associated with the pathology of AD. In addition, these four susceptibility genes locate in different chromosomes than the lncRNAs co-expressed with them, and they may be regulated by lncRNAs in trans level.
Co-expression between the known AD pathogenic genes and the lncRNA NONHSAT152299.1
As the results of the gene set enrichment analysis, we found that 11 enriched GO terms related to the pathology of AD are significantly associated with the lncRNA NONHSAT160355.1 and NONHSAT152299.1 (p < 0.01) (Table 4). Particularly, three enriched GO terms associated with the lncRNA NONHSAT160355.1 are involved in the immune response. As reported, immune response participates in the regulation of metabolic processes related to AD [87]. In a recent study, the same pathways (GO:0019886, GO:0002495, and GO:0002504) are also identified to be associated the pathology of AD by an AD-related variants enrichment in GO database [88]. Interestingly, immune responses associated with NONHSAT160355.1 are related to antigen processing and presentation via the major histocompatibility complex (MHC) class II, which is a cell surface protein to bind and present antigen peptide fragments to T cell and markedly high expressed in AD brain [89]. As a result of eQTL analysis in a previous step, NONHSAT160355.1 was found to significantly affect the expression of a known AD pathogenic gene (TCN1) in the Temporal Cortex (Fig. 4). As reported, the TCN1 participates in the regulation of homocysteine in brain to increase risk of AD, and homocysteine can oxidize the HLA antigens to affect the antigen processing and presentation related to MHC class II [83, 90]. Combined with the higher expression of the NONHSAT160355.1 in AD patients compared with controls (FC = 1.64, 2.85 and 2.26 in Temporal Cortex, Thalamus, and Fusiform Gyrus, respectively) (Tables 1 and 2), the lncRNA may increase the risk of AD by promoting the expression of TCN1 to affect the antigen processing and presentation related to MHC class II.
GO enrichment results of the protein-encoding genes co-expressed with lncRNAs
The remaining 8 enriched GO terms associated with the lncRNA NONHSAT152299.1 are involved the development of nervous system. Moreover, GRIP1 is negatively co-expressed with NONHSAT152299.1 in Thalamus (R = –0.9003; p = 2.73E –08) (Table 3), and it is demonstrated to play a critical role for the development of nervous system in brain [91 –93]. Given that the higher expression of NONHSAT152299.1 in AD patients compared with controls (FC = 2.06 in Thalamus) (Table 1), the lncRNA may increase the risk of AD by inhibiting the expression of GRIP1 to affect the development of nervous system in brain.
Conclusion
Based on the GWAS, microarray, and RNA-seq data, this study identified a total of five key lncRNAs in the pathology of AD, which contain the SNPs significantly associated with AD and are differentially expressed between AD cases and controls. By combining the results of function prediction by genome mapping, eQTL, differential co-expression, and GSEA analysis, four of the key lncRNAs were identified to increase or decrease the risk of AD by regulating disease-related pathogenic genes and pathways. Particularly, the lncRNA NONHSAT160355.1, NONHSAT152299.1, and NONHSAT016928.2 were identified as accelerating the risk of AD by 1) promoting TCN1 expression to affect the antigen processing and presentation related to MHC class II, 2) inhibiting GRIP1 expression to affect the development of nerve system in brain, and 3) inhibiting ADAM12 expression to aggravate neurotoxic effect of Aβ peptide, respectively. The NONHSAT018519.2 demonstrates a neuroprotective effect against AD by promoting BDNF expression to facilitate neurotransmitter release and reducing Aβ peptide toxicity. These findings can help to discover novel key lncRNAs in the pathology of AD and improve the understanding of the pathogenesis of AD.
Footnotes
ACKNOWLEDGMENTS
The National Natural Science Foundation of China (81872798), National Key Research and Development Program of China (2018YFC0910500), Innovation Project on Industrial Generic Key Technologies of Chongqing (cstc2015zdcy-ztzx120003), and Fundamental Research Funds for the Central Universities (2018QNA7023, 10611CDJXZ238826, 2018CDQYSG0007, CDJZR14468801, CDJKXB14011).
