Abstract
A recent study sequenced the full coding region of SORL1 in 1,255 early-onset Alzheimer’s disease (EOAD) cases and 1,938 control individuals, and investigated the contribution of genetic variability in SORL1 to EOAD risk in a European cohort. This study identified six common variants and five low-frequency variants in the SORL1 coding sequence. However, none of these 11 variants was significantly associated with EOAD risk after adjusting for multiple testing. We asked whether these 11 SORL1 variants identified in European EOAD contribute to late-onset Alzheimer’s disease (LOAD) risk in individuals of European ancestry. Here, we tested the association between these 11 SORL1 variants and LOAD risk in individuals of European ancestry using a large-scale LOAD GWAS. Our results indicate that three genetic variants (rs2070045, rs2276412, and rs17125548), as well as the variants they tag, contribute to LOAD risk in the European population. We further investigated whether these variants could affect SORL1 expression using multiple expression quantitative trait loci (eQTLs) datasets. Our findings suggest that three genetic variants (rs2070045, rs1699102, and rs3824968) could significantly regulate SORL1 expression in human brain tissues. We believe that our findings provide important supplementary information about the involvement of SORL1 variants in LOAD risk.
INTRODUCTION
Alzheimer’s disease (AD) is the most common neurodegenerative disease in the elderly [1]. According to the 2016 dementia report from the World Health Organization, 47.5 million people worldwide have dementia, with 7.7 million new cases every year [2]. The number of people with dementia is expected to reach 75.6 million in 2030 and to almost triple by 2050, to 135.5 million [2]. To investigate the underlying genetic risk of AD, large-scale genome-wide association studies (GWAS), candidate gene studies, and pathway analyses of GWAS have been widely performed, and have yielded important new insights into the genetic mechanisms of AD [1, 3–25]. Some novel AD susceptibility loci have been identified in European populations and successfully replicated in other populations [5–6, 8–24].
It is well established that the SORL1 gene is significantly associated with AD risk, particularly in large-scale GWAS [26–28]. In a recent study, Verheijen et al. sequenced the full coding region of SORL1 in 1,255 early-onset AD (EOAD) cases and 1,938 control individuals, and investigated the contribution of genetic variability in SORL1 to EOAD risk in a European cohort [29]. They identified six common variants (rs12364988, rs2298813, rs78274293, rs2070045, rs1699102, and rs3824968) with minor allele frequency (MAF) ≥0.05, and five low-frequency variants (rs117260922, rs146903951, rs62617129, rs2276412, and rs17125548) with 0.01 ≤ MAF <0.05 in the SORL1 coding sequence [29].
Verheijen et al. performed a single-variant association analysis and found that the common variant rs78274293 showed a nominally significant association with EOAD (p = 0.03) [29]. However, none of these 11 variants was significantly associated with EOAD risk after adjusting for multiple testing [29]. Verheijen et al. further compared their results with previous findings in late-onset AD (LOAD) cohorts [26]. In 2015, Vardarajan et al. performed a family- and cohort-based genetic association study to identify functional SORL1 mutations in LOAD cases of Caribbean-Hispanic origin, and reported significant associations of the rs117260922 (p = 7.68E-07) and rs2298813 (p = 6.09E-07) variants with LOAD risk [26].
In their discussion, Verheijen et al. suggested that the lack of association between these SORL1 variants and EOAD risk may indicate reduced pathogenic relevance of these variants in EOAD compared with LOAD [29]. Cohort ethnicity and founder effects may also cause discrepancies in variant frequency and direction of effect [29]. These considerations prompted us to ask whether the 11 SORL1 variants identified in European EOAD contribute to LOAD risk in individuals of European ancestry.
Here, we aim to replicate and validate the potential association between these variants and LOAD risk using a large-scale LOAD GWAS dataset. In addition, evidence shows that genetic variants can modify gene expression and thereby affect disease risk [1, 30–32]. We therefore further investigate whether these variants could affect SORL1 expression using multiple expression quantitative trait loci (eQTLs) datasets.
MATERIALS AND METHODS
The LOAD GWAS dataset
The LOAD GWAS dataset comes from a large-scale meta-analysis of LOAD GWAS in individuals of European descent, performed by the International Genomics of Alzheimer’s Project (IGAP) [28]. This dataset consists of 74,046 individuals. In stage 1, the IGAP genotyped and imputed 7,055,881 SNPs and performed a meta-analysis of four GWAS datasets, including 17,008 cases and 37,154 controls, from four consortia: the Alzheimer’s Disease Genetic Consortium (ADGC), the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, the European Alzheimer’s Disease Initiative (EADI), and the Genetic and Environmental Risk in Alzheimer’s Disease (GERAD) Consortium [28]. In stage 2, 11,632 SNPs were genotyped and tested for replication in 8,572 cases and 11,312 controls of European ancestry originating from Austria, Belgium, Finland, Germany, Greece, Hungary, Italy, Spain, Sweden, the UK, and the United States [28]. Here, we obtained the summary results for stage 1 and for the combined stage 1+2 [28].
Genetic association analysis
Here, we investigated the potential association between these 11 variants and LOAD susceptibility using the summary association results from the study above [28]. If a variant was not available in the LOAD GWAS dataset, we used HaploReg (version 4) to identify proxy SNPs based on the linkage disequilibrium (LD) information in the 1000 Genomes Project [33]. We selected tagged SNPs with r2 ≥0.8. All p values from the LOAD GWAS are reported as nominal values, and the significance threshold is p < 0.05.
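The lookup-with-proxy-fallback procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the dictionaries `gwas_p` and `proxies` and the function `lookup` are hypothetical stand-ins for the IGAP summary file and a HaploReg query result, and the only p values filled in are the stage 1 values quoted in the Results.

```python
NOMINAL_P = 0.05  # nominal significance threshold used in the text
R2_MIN = 0.8      # LD threshold for accepting a proxy SNP

# Hypothetical excerpt of GWAS summary statistics: rsID -> stage 1 p value.
gwas_p = {
    "rs2070045": 1.16e-02,
    "rs2276412": 4.68e-07,
    "rs17125548": 7.15e-05,
}

# Hypothetical LD lookup result: query rsID -> list of (proxy rsID, r2).
# Left empty for rs117260922 because no proxy with r2 >= 0.8 was reported.
proxies = {"rs117260922": []}


def lookup(rsid):
    """Return (snp_used, p) for a variant, falling back to its best LD proxy."""
    if rsid in gwas_p:
        return rsid, gwas_p[rsid]
    candidates = [
        (r2, proxy)
        for proxy, r2 in proxies.get(rsid, [])
        if r2 >= R2_MIN and proxy in gwas_p
    ]
    if not candidates:
        return None, None  # variant cannot be tested in this dataset
    _, best = max(candidates)  # proxy with the highest r2
    return best, gwas_p[best]


for rsid in ["rs2070045", "rs2276412", "rs17125548", "rs117260922"]:
    snp, p = lookup(rsid)
    if p is None:
        print(f"{rsid}: not available, no proxy with r2 >= {R2_MIN}")
    else:
        flag = "nominally significant" if p < NOMINAL_P else "not significant"
        print(f"{rsid}: tested via {snp}, p = {p:.2e} ({flag})")
```

Returning the SNP actually tested alongside the p value keeps it explicit whether a result refers to the query variant or to a proxy.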
The GWAS eQTLs datasets
The GWAS eQTLs datasets comprise six brain-expression GWAS eQTLs datasets based on 773 brain samples: 177 non-AD cerebellar samples, 197 non-AD temporal cortex samples, 197 AD cerebellar samples, and 202 AD temporal cortex samples, together with the combined AD and non-AD samples in human cerebellum tissue (374 samples) and in human temporal cortex tissue (399 samples) [30]. The non-AD samples had several brain pathologies, including progressive supranuclear palsy, Lewy body disease, corticobasal degeneration, frontotemporal lobar degeneration, multiple system atrophy, and vascular dementia [30].
The Braineac eQTLs dataset
The Braineac eQTLs dataset is from the UK Brain Expression Consortium (UKBEC), which aims to provide the scientific community with a valid instrument for investigating the genes and SNPs associated with neurological disorders [34]. This dataset includes 10 brain regions from 134 neuropathologically normal individuals of European descent [34]. The 10 brain regions are cerebellar cortex, frontal cortex, hippocampus, medulla, occipital cortex, putamen, substantia nigra, temporal cortex, thalamus, and intralobular white matter [34]. In Braineac, Affymetrix GeneChip Human Exon 1.0 ST arrays were used to measure gene expression [34]. Gene expression at the transcript level is the Winsorized mean over exon-specific levels [34].
eQTLs analysis
In brief, a linear regression analysis was applied to evaluate the potential cis-association between eQTLs and gene expression under an additive model in all the eQTLs datasets above. We downloaded the summary results from the six brain-expression GWAS datasets to directly evaluate the potential association between these 11 variants and SORL1 gene expression [30]. From the Braineac database, we downloaded the SORL1 gene expression data and the genotype data of genetic variants within 1 Mb upstream of the transcription start site and 1 Mb downstream of the transcription end site [34]. We evaluated the potential association between these 11 variants and SORL1 gene expression using R. All p values from the eQTLs analysis are reported as nominal values, and the significance threshold is p < 0.05.
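As an illustration of the additive model described above, the following sketch regresses expression on effect-allele dosage (0, 1, or 2 copies) by ordinary least squares. It is a minimal example under stated assumptions: the function name `additive_eqtl` and the toy data are hypothetical and are not taken from the study, no covariates are included, and the p value (which would come from a t distribution with n − 2 degrees of freedom) is left out to keep the example dependency-free.

```python
import math


def additive_eqtl(dosages, expression):
    """Ordinary least squares fit of expression on allele dosage (0, 1, 2).

    Returns (beta, se, t): the slope, its standard error, and the
    t statistic, which has n - 2 degrees of freedom.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se, beta / se


# Synthetic toy data (NOT from the study): effect-allele dosage and an
# arbitrary expression value for each of eight individuals.
dosages = [0, 0, 1, 1, 1, 2, 2, 2]
expression = [5.0, 5.2, 5.4, 5.7, 5.5, 6.1, 5.9, 6.0]

beta, se, t = additive_eqtl(dosages, expression)
direction = "increased" if beta > 0 else "reduced"
print(f"beta = {beta:.3f}, SE = {se:.3f}, t = {t:.2f} ({direction} expression)")
```

In R, a call of the form `lm(expression ~ dosage)` fits the same model; the exact call and any covariates used in the study are not stated in the text.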
RESULTS
Genetic association analysis
We found that three low-frequency variants (rs117260922, rs146903951, and rs62617129) were not available in the LOAD GWAS dataset (stage 1, and combined stage 1+2). Using HaploReg (version 4) and the LD information from the 1000 Genomes Project (EUR) [33], we searched for SNPs tagged by each of these three low-frequency variants with r2 ≥0.8. However, we did not identify any tagged variant. Thus, our subsequent analyses focused on the remaining 8 genetic variants.
Interestingly, three of the remaining 8 genetic variants are significantly associated with LOAD risk in stage 1 (p < 0.05): rs2070045 (p = 1.16E-02), rs2276412 (p = 4.68E-07), and rs17125548 (p = 7.15E-05). More importantly, rs2276412 and rs17125548 remain significantly associated with LOAD risk in stage 1+2 (p < 0.05), with p = 3.46E-08 and p = 6.13E-06, respectively. Of these, rs2276412 reached genome-wide significance (p < 5.00E-08), as described in Table 1.
Table 1. SORL1 variants and AD susceptibility
*Significant associations with p < 0.05 are bolded. Beta, overall estimated effect size for the effect allele; SE, overall standard error for effect size estimate; p value, meta-analysis p value using regression coefficients (beta and standard error). Beta is the regression coefficient, based on the effect allele using an additive model. Position, chromosome 11 (hg19); Beta >0 and Beta <0 mean that the effect allele increases and reduces LOAD risk, respectively.
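To make the footnote's quantities concrete, the sketch below converts a beta and its standard error into an odds ratio, a z statistic, and a two-sided p value. It assumes that beta is on the log-odds scale, as is usual for case-control GWAS, and uses a normal approximation; the function name `wald_summary` and the input values are hypothetical and are not taken from the table.

```python
import math

GENOME_WIDE_P = 5e-8  # genome-wide significance threshold quoted in the text


def wald_summary(beta, se):
    """Return (odds_ratio, z, two_sided_p) for a log-odds effect estimate."""
    z = beta / se
    # Two-sided tail probability of a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(beta), z, p


# Hypothetical effect estimate for illustration only (not a reported value).
beta, se = -0.10, 0.02
odds_ratio, z, p = wald_summary(beta, se)
direction = "increases" if beta > 0 else "reduces"
print(f"OR = {odds_ratio:.3f}, z = {z:.2f}, p = {p:.2e}")
print(f"effect allele {direction} risk; genome-wide significant: {p < GENOME_WIDE_P}")
```

A negative beta gives an odds ratio below 1, which corresponds to the footnote's statement that the effect allele reduces risk.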
We further investigated whether the variants tagged by these three genetic variants could also contribute to LOAD risk. We used HaploReg (version 4) and LD information from the 1000 Genomes Project (EUR) to select all variants tagged by rs2070045, rs2276412, and rs17125548 with r2 ≥0.8 [33]. We obtained 8, 13, and 11 genetic variants tagged by rs2070045, rs2276412, and rs17125548, respectively, each count including the query variant, as described in Supplementary Table 1. We then evaluated the potential association between these tagged genetic variants and LOAD risk using the LOAD GWAS datasets. Interestingly, all these genetic variants are significantly associated with LOAD risk in stage 1 or stage 1+2 (p < 0.05), as described in Tables 2–4.
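For readers unfamiliar with the r2 criterion, the sketch below computes the standard LD r2 between two biallelic loci from the four haplotype counts and applies the 0.8 cut-off. It illustrates the statistic only and is not how HaploReg is queried; the function name `ld_r2` and the haplotype counts are hypothetical.

```python
R2_MIN = 0.8  # tagging threshold used in the text


def ld_r2(n_ab, n_aB, n_Ab, n_AB):
    """r2 between two biallelic loci, computed from the four haplotype counts."""
    total = n_ab + n_aB + n_Ab + n_AB
    p_ab = n_ab / total
    p_a = (n_ab + n_aB) / total  # frequency of allele a at the first locus
    p_b = (n_ab + n_Ab) / total  # frequency of allele b at the second locus
    d = p_ab - p_a * p_b         # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical haplotype counts for a query SNP and a candidate tagged SNP.
r2 = ld_r2(n_ab=90, n_aB=5, n_Ab=5, n_AB=900)
print(f"r2 = {r2:.3f}; tagged at r2 >= {R2_MIN}: {r2 >= R2_MIN}")
```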
Table 2. 8 genetic variants tagged by rs2070045 and AD susceptibility
*Significant associations with p < 0.05 are bolded. Beta, overall estimated effect size for the effect allele; SE, overall standard error for effect size estimate; p value, meta-analysis p value using regression coefficients (beta and standard error). Beta is the regression coefficient, based on the effect allele using an additive model. Position, chromosome 11 (hg19); Beta >0 and Beta <0 mean that the effect allele increases and reduces LOAD risk, respectively.
Table 3. 13 genetic variants tagged by rs2276412 and AD susceptibility
*Significant associations with p < 0.05 are bolded. Beta, overall estimated effect size for the effect allele; SE, overall standard error for effect size estimate; p value, meta-analysis p value using regression coefficients (beta and standard error). Beta is the regression coefficient, based on the effect allele using an additive model. Position, chromosome 11 (hg19); Beta >0 and Beta <0 mean that the effect allele increases and reduces LOAD risk, respectively.
Table 4. 11 genetic variants tagged by rs17125548 and AD susceptibility
*Significant associations with p < 0.05 are bolded. Beta, overall estimated effect size for the effect allele; SE, overall standard error for effect size estimate; p value, meta-analysis p value using regression coefficients (beta and standard error). Beta is the regression coefficient, based on the effect allele using an additive model. Position, chromosome 11 (hg19); Beta >0 and Beta <0 mean that the effect allele increases and reduces LOAD risk, respectively.
eQTLs analysis
We found six genetic variants (rs12364988, rs3824968, rs2070045, rs1699102, rs2298813, and rs2276412) to be available in the six brain-expression GWAS datasets. Interestingly, three genetic variants, rs2070045, rs1699102, and rs3824968, could significantly regulate SORL1 expression. In summary, the rs1699102 C allele was significantly associated with increased (beta >0) SORL1 expression in two of the six brain-expression GWAS eQTLs datasets. The rs2070045 G allele was significantly associated with SORL1 expression in four of the six datasets, with both increased (beta >0) and reduced (beta <0) expression observed. The rs3824968 A allele was likewise significantly associated with SORL1 expression in four of the six datasets, with both increased (beta >0) and reduced (beta <0) expression observed. All these results are described in Table 5 (significance level 0.05).
Table 5. Genetic variants and SORL1 expression in the human brain tissues
*Significant associations with p < 0.05 are listed. AD, Alzheimer’s disease; CER, cerebellum; TX, temporal cortex; Beta is the regression coefficient, based on the effect allele using an additive model. Beta >0 and Beta <0 mean that the effect allele is associated with increased and reduced gene expression, respectively.
We found six genetic variants, rs12364988, rs3824968, rs2070045, rs1699102, rs2298813, and rs78274293, to be available in the Braineac dataset. Interestingly, three genetic variants, rs2070045, rs1699102, and rs3824968, could also significantly regulate SORL1 expression, but only in human putamen tissue. More importantly, the rs1699102 C allele, the rs2070045 G allele, and the rs3824968 A allele were each associated only with increased (beta >0) SORL1 expression in human putamen tissue, as described in Table 5.
DISCUSSION
Taken together, Verheijen et al. identified 11 genetic variants in the SORL1 coding sequence, none of which showed a significant association with EOAD risk in the European cohort [29]. Verheijen et al. compared their findings with those from LOAD cohorts of Caribbean-Hispanic origin, and highlighted the rs117260922 and rs2298813 variants as significantly associated with LOAD risk [26]. These findings indicate that SORL1 variants, although lacking association with EOAD risk, may be significantly associated with LOAD risk.
In recent years, large-scale GWAS have been widely performed and have helped to identify common AD genetic variants [1, 35]. The available large-scale AD GWAS datasets make it possible to validate a finding rapidly. Here, we tested the association between the 11 SORL1 variants identified in European EOAD and LOAD risk in individuals of European ancestry using a large-scale LOAD GWAS. Our results indicate that three genetic variants, rs2070045, rs2276412, and rs17125548, as well as the variants they tag, contribute to LOAD risk in the European population. We believe that our findings provide important supplementary information about the involvement of SORL1 variants in LOAD risk.
Meanwhile, evidence shows that genetic variants can modify gene expression and affect disease risk, even when they lack association in genetic association studies [1, 36–38]. Using HaploReg (version 4) [33], we found that rs2070045, rs2276412, and rs17125548 and their tagged variants are located in non-protein-coding regions (Supplementary Table 2), which indicates that these variants may regulate the expression of nearby genes. Here, we further evaluated these genetic variants using the six brain-expression GWAS eQTLs datasets and the Braineac eQTLs dataset [30, 34]. Our findings suggest that three genetic variants (rs2070045, rs1699102, and rs3824968) could significantly regulate SORL1 expression in human brain tissues. We further found that these expression associations may be independent of tissue and disease status. These three genetic variants could influence SORL1 expression in several human brain regions, such as cerebellum, temporal cortex, and putamen, and in different diagnostic groups, including AD cases [30], neuropathologically normal individuals [34], and combined AD and non-AD samples with several brain pathologies [30].
We also selected the 17 coding exonic variants that Vardarajan et al. identified as significantly associated with LOAD in a Caribbean-Hispanic population [26]. However, most of these 17 variants are rare and are not available in the AD GWAS dataset or the eQTLs datasets. Only the rs2298813 variant is available in the AD GWAS dataset, and it is described in Table 1.
SORL1 is a neuronal apolipoprotein E receptor that is predominantly expressed in the central nervous system. Recently, aberrant expression of SORL1 has been implicated in AD pathogenesis [39]. Reduced SORL1 expression has been reported in brain tissue from sporadic AD [27], and is also associated with increased amyloid-β peptide production [27]. Hence, the three genetic variants, rs2070045, rs1699102, and rs3824968, may be potential targets to reduce the SORL1 expression level. We believe that these findings provide further insight into the regulatory mechanisms by which these genetic variants affect AD risk. In the future, we will validate the association findings and perform functional studies of the significant eQTLs in human cell lines. Meanwhile, other studies are still required to verify our findings.
