Abstract
Large-scale genome-wide association studies have reported the EPHA1 rs11767557 variant to be associated with Alzheimer’s disease (AD) risk in the European population. However, it is still unclear how this variant functionally contributes to the underlying disease pathogenesis. The rs11767557 variant is located approximately 3 kb upstream of the EPHA1 gene. We hypothesize that rs11767557 may modify the expression of nearby genes such as EPHA1 and thereby contribute to AD risk. Until now, the potential association between rs11767557 and the expression of nearby genes has not been reported. Here, we evaluate the potential association between rs11767557 and EPHA1 expression using multiple large-scale eQTLs datasets in human brain tissues and whole blood. The results show that the rs11767557 variant could significantly regulate EPHA1 gene expression specifically in human whole blood. These findings may provide important supplementary information about the regulatory mechanisms of the rs11767557 variant in AD risk.
INTRODUCTION
Alzheimer’s disease (AD) is the most common neurodegenerative disease [1–4]. In recent years, large-scale genome-wide association studies (GWAS), in particular the International Genomics of Alzheimer’s Project (IGAP), and sequencing analyses have identified several AD risk loci in the European population, including GAB2, ABCA7, APOE, BIN1, CASS4, CD2AP, CD33, CELF1, CLU, CR1, EPHA1, FERMT2, HLA-DRB5/DRB1, INPP5D, MEF2C, MS4A6A, NME8, PICALM, PTK2B, SLC24A4, SORL1, ZCWPW1, and TREM2 [5–25]. Interestingly, some of these variants were successfully replicated in other populations [6–23, 26].
Most AD variants are located in non-coding regions, and may alter the expression of nearby genes [23, 28]. These variants influencing gene expression are expression quantitative trait loci (eQTLs) [3, 29–37]. Allen et al. evaluated the potential association between AD risk variants and the expression of 6 risk genes including ABCA7, BIN1, CLU, MS4A4A, MS4A6A, and PICALM in human brain [29]. Their results showed that rs11136000, rs2304933, and rs2304935 variants could significantly regulate the expression of nearby genes [29]. Karch et al. investigated the influence of AD risk variants on the expression levels of 10 risk genes including ABCA7, BIN1, CD2AP, CD33, CLU, CR1, EPHA1, MS4A6A, MS4A6E, and PICALM using brain tissue from AD cases and normal controls [38]. However, they did not report any significant association between these AD risk variants and the expression levels of their corresponding risk genes [38]. Allen et al. genotyped 12 AD risk variants including 10 variants reported by the IGAP consortium and 2 variants at the CR1 and CD2AP gene [39]. They further performed an eQTLs analysis to evaluate the association of these AD risk variants with the expression levels of 34 genes using 400 cerebellum and temporal cortex samples [39]. The results indicated that AD risk variants could significantly regulate the expression levels of CR1, HLA-DRB1, and PILRB genes [39]. Karch et al. found that the ZCWPW1 rs1476679 variant could regulate the expression of PILRB and GATS genes, and CELF1 rs7120548 variant in linkage disequilibrium with rs10838725 was associated with MTCH2 expression [40].
EPHA1 is a member of the ephrin receptor subfamily [41]. Ephrins and Eph receptors are membrane bound proteins which play roles in cell and axon guidance and in synaptic development and plasticity [41]. EPHA1 is expressed by CD4-positive T lymphocytes and monocytes [42]. EPHA1 could regulate cell morphology and motility, and play additional roles in apoptosis and inflammation [41]. EPHA1 is also an important gene for immune response [43–45]. In 2011, a large-scale GWAS reported EPHA1 rs11767557 variant to be associated with AD susceptibility in a European population [41]. The rs11767557 variant is located approximately 3 kb upstream of EPHA1 gene [46]. We think that rs11767557 may modify the expression of nearby genes such as EPHA1 and cause AD risk. Until now, the potential association between rs11767557 and the expression of nearby genes had not been reported in previous studies [29, 38–40]. Here, we evaluate the potential association between rs11767557 and EPHA1 expression using multiple large-scale eQTLs datasets in human brain tissues and the whole blood.
MATERIALS AND METHODS
Linkage disequilibrium analysis
We performed a linkage disequilibrium analysis using the HaploReg (version 4.1) online database, based on the linkage disequilibrium information from the 1000 Genomes Project (EUR) [47]. Variants with r2 ≥ 0.8 relative to rs11767557 were defined as tagged by rs11767557. All tagged variants were then used in the genetic association analysis, enhancer enrichment analysis, and eQTLs analysis. The enhancer enrichment analysis evaluates in which cell types these variants are significantly enriched.
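The r2 ≥ 0.8 tagging criterion can be illustrated with a minimal sketch. HaploReg supplies precomputed r2 values, so this is not the procedure used in the study; it only shows how the statistic is defined for two biallelic variants. It assumes phased haplotypes coded 0 (reference) and 1 (alternate), and the function names and toy haplotypes are hypothetical.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic variants from phased haplotypes (0 = ref, 1 = alt)."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype lists must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # alt allele frequency at variant A
    p_b = sum(hap_b) / n                      # alt allele frequency at variant B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both variants must be polymorphic")
    return d * d / denom


def tagged_variants(index_hap, candidates, threshold=0.8):
    """Names of candidate variants whose r^2 with the index variant meets the threshold."""
    return [name for name, hap in candidates.items()
            if ld_r2(index_hap, hap) >= threshold]


# Toy haplotypes (invented for illustration, not 1000 Genomes data).
index_variant = [1, 1, 1, 0, 0, 0, 0, 0]
candidates = {
    "snp_perfect_proxy": [1, 1, 1, 0, 0, 0, 0, 0],   # r^2 = 1
    "snp_weak_proxy":    [1, 1, 0, 0, 0, 0, 0, 0],   # r^2 = 5/9
}
print(tagged_variants(index_variant, candidates))
```

With these toy data only the perfect proxy passes the 0.8 threshold.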
Genetic association analysis
We selected the large-scale meta-analysis of AD GWAS [23]. This dataset consisted of 74,046 individuals. Stage 1 included 17,008 cases and 37,154 controls [23]. Stage 2 included 8,572 cases and 11,312 controls [23]. We investigated the potential association between genetic variants tagged by rs11767557 and AD risk using the summary association results [23]. All p values from the AD GWAS are nominal, and p < 0.05 was taken as the significance threshold. More detailed information has been described in our previous studies [18, 48].
Enhancer enrichment analysis
Enhancers are DNA regulatory sequences that can regulate tissue-specific gene expression [49, 50]. To evaluate the overlap of rs11767557 tagged variants with predicted enhancers in each reference epigenome, we performed an enhancer enrichment analysis using HaploReg (version 4.1) [47]. The epigenomic data are from the Roadmap Epigenomics project [47]. There are four different methods to define the enhancers: the 15-state core model, the 25-state model incorporating imputed epigenomes, the H3K4me1/H3K4me3 peaks, and the H3K27ac/H3K9ac peaks [47]. These four methods differ in the source of the epigenomes [47]. The core 15-state model and the 25-state model use the imputed marks, H3K4me1/H3K4me3 use peaks from H3K4me1 and H3K4me3, and H3K27ac/H3K9ac use peaks from H3K27ac and H3K9ac [47].
We selected all 1000 Genomes variants with a minor allele frequency above 5% in any population as the background set. The overlap of variants with enhancers in each cell type is compared to the background set [47]. A binomial test is used to evaluate the enrichment relative to the background set [47]. The uncorrected p values are reported. There are a total of 127 different cell types in HaploReg (version 4.1) [47]. We limited our analysis to 28 blood cell types and 13 brain cell types, as eQTLs datasets in human brain tissues and whole blood are widely and publicly available. To validate the enhancer enrichment analysis, we then performed an eQTLs analysis using multiple datasets in human brain tissues and whole blood.
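The binomial enrichment test can be sketched as follows. This is an illustration of the general idea only: it assumes a one-sided upper-tail test in which the background set supplies the per-variant probability of overlapping an enhancer, which may differ in detail from the HaploReg implementation. The function name and all numbers are hypothetical.

```python
from math import comb


def binomial_enrichment_p(observed, n_variants, background_rate):
    """One-sided P(X >= observed) for X ~ Binomial(n_variants, background_rate)."""
    if not 0 <= observed <= n_variants:
        raise ValueError("observed must lie between 0 and n_variants")
    if not 0.0 <= background_rate <= 1.0:
        raise ValueError("background_rate must be a probability")
    q = 1.0 - background_rate
    # Sum the exact binomial probabilities from the observed count upwards.
    return sum(
        comb(n_variants, k) * background_rate ** k * q ** (n_variants - k)
        for k in range(observed, n_variants + 1)
    )


# Invented example: 3 of 6 tagged variants overlap enhancers in one cell type,
# while 10% of background variants overlap enhancers in that cell type.
n_variants, observed, background_rate = 6, 3, 0.10
expected = n_variants * background_rate          # the "E" in an O/E comparison
p_value = binomial_enrichment_p(observed, n_variants, background_rate)
print(f"O = {observed}, E = {expected:.2f}, p = {p_value:.5f}")
```

Because the p values are reported uncorrected, a result of this kind in one of many cell types is best read as suggestive rather than conclusive.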
Braineac eQTLs datasets
The Brain eQTL Almanac (Braineac) is a web-based resource to access the UK Brain Expression Consortium (UKBEC) dataset [51]. Braineac includes 10 eQTLs datasets in 10 brain tissues of 134 neuropathologically normal individuals [51]. The 10 brain tissues are cerebellar cortex, frontal cortex, hippocampus, medulla (specifically inferior olivary nucleus), occipital cortex (specifically primary visual cortex), putamen, substantia nigra, temporal cortex, thalamus, and intralobular white matter [51]. More detailed information is described in the original study [51].
Brain expression GWAS eQTLs datasets
The brain expression GWAS includes 6 eQTLs datasets in 773 brain samples: 197 AD cerebellar samples, 202 AD temporal cortex samples, 177 non-AD cerebellar samples, 197 non-AD temporal cortex samples, the combined cerebellar AD and non-AD subjects (CER_All, n = 374), and the combined temporal cortex AD and non-AD subjects (TX_All, n = 399) [52]. The AD and non-AD samples were analyzed both separately and jointly in both cerebellar and temporal cortex tissues. The non-AD samples have various brain pathologies, including progressive supranuclear palsy, Lewy body disease, corticobasal degeneration, frontotemporal lobar degeneration, multiple system atrophy, and vascular dementia [52].
GTEx eQTLs datasets
The Genotype-Tissue Expression (GTEx) database (version 6) includes a total of 44 tissues, 449 donors, and 7,051 samples, with at least 70 samples in each tissue [53]. The donors died from several causes, including traumatic injury, cerebrovascular disease, heart disease, and liver, renal, respiratory, and neurological diseases [54]. We limited our analysis to 10 human brain tissues: anterior cingulate cortex, caudate basal ganglia, cerebellar hemisphere, cerebellum, cortex, frontal cortex BA9, hippocampus, hypothalamus, nucleus accumbens basal ganglia, and putamen basal ganglia [53].
eQTLs analysis in human brain tissues
In brief, a linear regression analysis was applied to evaluate the potential association between eQTLs and gene expression under an additive model in all the eQTLs datasets above [51–53]. A cis window is the genetic region within which we evaluate the potential association between eQTLs and gene expression. In Braineac and GTEx, the cis window was defined as 1 Mb upstream of the transcription start site to 1 Mb downstream of the transcription end site [51, 53]. In the brain expression GWAS, the cis window was defined as within ±100 kb of the targeted gene [52]. In Braineac, we downloaded the EPHA1 gene expression data and the genotype data of genetic variants located within 1 Mb upstream of the transcription start site and 1 Mb downstream of the transcription end site [51]. We used R to evaluate the association between rs11767557 and EPHA1 expression. Meanwhile, we downloaded the summary results from the online brain expression GWAS (https://www.synapse.org/#!Synapse:syn3157225) and the GTEx (version 6, https://gtexportal.org/home/datasets) database to directly evaluate the association between rs11767557 and EPHA1 expression.
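The additive model can be illustrated with a minimal sketch: expression is regressed on the genotype dosage (0, 1, or 2 copies of the effect allele), so the slope is the expected change in expression per additional allele. The analysis in this study was run in R; the Python version below is only an illustration, it omits covariates that the original datasets may have adjusted for, and the function name and data are hypothetical.

```python
from math import sqrt


def additive_eqtl_fit(dosages, expression):
    """Simple linear regression of expression on allele dosage.

    Returns (beta, standard error of beta, t statistic).
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("genotype has no variation")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    beta = sxy / sxx                              # per-allele effect on expression
    intercept = mean_e - beta * mean_g
    rss = sum((e - (intercept + beta * g)) ** 2
              for g, e in zip(dosages, expression))
    se = sqrt(rss / (n - 2) / sxx)                # residual variance / Sxx
    t_stat = beta / se if se > 0 else float("inf")
    return beta, se, t_stat


# Invented toy data: dosage of the effect allele and normalized expression.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]
beta, se, t_stat = additive_eqtl_fit(dosages, expression)
print(f"beta = {beta:.3f}, SE = {se:.3f}, t = {t_stat:.2f}")
```

A p value would then follow from the t distribution with n − 2 degrees of freedom, as reported by standard regression routines such as `lm` in R.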
eQTLs analysis in human whole blood
We investigated the potential effect of the rs11767557 variant on the expression of the EPHA1 gene using multiple large-scale eQTLs datasets in whole blood, including 5,311 individuals [55], 2,765 individuals [56], 2,116 individuals [57], and 5,257 individuals [58]. In all these datasets, a linear regression analysis or the Spearman correlation coefficient was used to detect the potential association between genetic variants and the expression of neighboring genes [55–59]. More detailed information is described in the original studies [55–58].
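For the datasets that use the Spearman correlation coefficient, the statistic is the Pearson correlation computed on ranks, which makes it robust to non-normal expression values. The sketch below is a plain illustration with average ranks for ties; the function names and data are hypothetical and do not reproduce the pipelines of the cited studies.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least 2 paired observations")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("each variable must take at least two distinct values")
    return cov / sqrt(vx * vy)


# Invented toy data: allele dosage (heavily tied) against expression.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]
print(f"rho = {spearman_rho(dosages, expression):.4f}")
```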
AD case-control gene expression analysis
To investigate the potential differential expression of the EPHA1 gene in AD blood, we analyzed the gene expression profiles from GEO (ID: GSE63060). The GSE63060 dataset contains 145 AD cases and 104 normal elderly controls [60]. We used the NCBI web tool GEO2R [61] for differential expression analysis. We defined EPHA1 as significantly differentially expressed if the fold change was <0.5 or >2 and p < 0.05 in AD cases compared with controls.
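The decision rule above can be written compactly on the log2 scale, where a fold change <0.5 or >2 is equivalent to an absolute log2 fold change greater than 1. The sketch below assumes the input is a log2 fold change, as is typical of limma-style output tables; the function name and example values are hypothetical.

```python
from math import log2


def is_differentially_expressed(log2_fold_change, p_value,
                                fc_cutoff=2.0, alpha=0.05):
    """True if fold change is > fc_cutoff or < 1/fc_cutoff, and p < alpha."""
    passes_fold_change = abs(log2_fold_change) > log2(fc_cutoff)
    return passes_fold_change and p_value < alpha


# Invented examples; only the p value 0.58 echoes a number from this study,
# and the log2 fold change paired with it is hypothetical.
print(is_differentially_expressed(1.2, 0.01))    # large change, significant
print(is_differentially_expressed(0.1, 0.58))    # small change, not significant
print(is_differentially_expressed(-1.5, 0.001))  # strong down-regulation
```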
RESULTS
Linkage disequilibrium analysis
Using HaploReg (version 4.1), we identified 6 variants tagged by rs11767557 with r2 ≥ 0.8. The detailed information about the linkage disequilibrium, position, and allele frequency in four populations is described in Table 1.
6 variants tagged by rs11767557 with r2 ≥ 0.8
AFR, frequency in African samples; AMR, frequency in Admixed American samples; ASN, frequency in East Asian samples; EUR, frequency in European samples; LD, linkage disequilibrium; SNP, single nucleotide polymorphism; Ref, reference allele; Alt, altered allele.
Genetic association analysis
In the large-scale AD GWAS dataset, all 6 genetic variants are significantly associated with AD risk in stage 1+2 (p < 0.05). Interestingly, all these variants reach genome-wide significance (p < 5.00E-08), as described in Table 2.
6 variants tagged by rs11767557 and AD susceptibility
*Significant associations with p < 0.05 are bolded. Beta, overall estimated effect size for the effect allele; SE, overall standard error for effect size estimate; p value, meta-analysis p value using regression coefficients (beta and standard error). Beta is the regression coefficient, based on the effect allele using an additive model. Position, chromosome 7 (hg19). Beta >0 and Beta <0 mean that the effect allele increases and reduces AD risk, respectively.
Enhancer enrichment analysis
To perform the enhancer enrichment analysis, we selected four different methods to define the enhancers including the 15-state core model, the 25-state model incorporating imputed epigenomes, the H3K4me1/H3K4me3 peaks and the H3K27ac/H3K9ac peaks [47]. Interestingly, the results showed that these 6 genetic variants are significantly enriched in some blood cell types and brain cells (Table 3, significance threshold 5.00E-02). These findings indicate that these 6 genetic variants are likely to regulate gene expression in the blood and brain cell types.
Enhancer enrichment analysis using HaploReg
Significant associations with p < 0.05 are bolded. O, observed; E, expected; BLD, blood; BRN, brain
eQTLs analysis in human brain tissues
We found that the rs11767557 variant was available in all the Braineac and brain expression GWAS eQTLs datasets. However, the rs11767557 variant was available in only four of the 10 GTEx human brain tissues: cerebellar hemisphere, cerebellum, cortex, and frontal cortex BA9. The eQTLs analysis showed no significant association between the rs11767557 variant and EPHA1 gene expression in these brain tissues. More detailed information is provided in Table 4.
rs11767557 variant and EPHA1 gene expression in human brain tissues
Significance level p < 0.05; rs11767557, chr7 : 143109139 (hg19).
eQTLs analysis in human whole blood
The rs11767557 variant is available in three of these four large-scale eQTLs datasets, including 5,311 individuals [55], 2,116 individuals [57], and 5,257 individuals [58]. Interestingly, the rs11767557 C allele is significantly associated with increased EPHA1 expression in these three eQTLs datasets, with p = 9.67E-09 [55], p = 1.10E-13 [57], and p = 4.11E-20 [58]. The rs11767557 variant is not available in the remaining dataset [56]. However, in that dataset the alleles of four variants tagged by rs11767557 and in linkage disequilibrium with the rs11767557 C allele are significantly associated with increased EPHA1 expression [56]. Meanwhile, the results also indicate that the rs11767557 C allele is significantly associated with increased expression of other nearby genes, including EPHA1-AS1, TAS2R60, TAS2R62P, OR2R1P, OR10AC1P, TAS2R41, and ZYX. More detailed information is provided in Table 5.
rs11767557 variant and EPHA1 gene expression in blood
Significance level p < 0.05; rs11767557, chr7 : 143109139 (hg19); NA, not available; Z-score = effect (beta)/standard error; Beta is the regression coefficient based on the effect allele. Beta >0 and Beta <0 mean that the effect allele is associated with increased and reduced gene expression, respectively.
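The relationship between beta, standard error, Z-score, and p value in the legend above can be sketched as follows. The conversion assumes the Z-score is approximately standard normal under the null hypothesis, which is reasonable for the large blood datasets but is an assumption here; the function names and the example beta and standard error are hypothetical and are not taken from Table 5.

```python
from math import erfc, sqrt


def z_score(beta, se):
    """Z-score = effect (beta) / standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return beta / se


def two_sided_p(z):
    """Two-sided p value for a standard normal Z-score."""
    return erfc(abs(z) / sqrt(2))


def direction(beta):
    """Reading of the sign of beta with respect to the effect allele."""
    if beta > 0:
        return "increased expression"
    if beta < 0:
        return "reduced expression"
    return "no change"


# Invented example values.
beta, se = 0.5, 0.1
z = z_score(beta, se)
print(f"Z = {z:.2f}, p = {two_sided_p(z):.2e}, effect allele: {direction(beta)}")
```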
AD case-control gene expression analysis
Using the GSE63060 dataset with 145 AD cases and 104 normal elderly controls [60], we did not identify any differential expression of the EPHA1 gene between AD cases and normal elderly controls (p = 0.58).
DISCUSSION
In 2011, a GWAS highlighted a significant association between rs11767557 variant and AD risk in European population [23]. Andrews et al. analyzed 1,626 non-demented older Australians of European ancestry and identified rs11767557 variant to be significantly associated with quadratic rate of change [62]. Carrasquillo et al. selected more than 2,000 cognitively normal Caucasians, and identified rs11767557 variant T allele to be associated with increased rates of memory decline in subjects with a final diagnosis of mild cognitive impairment or AD [63].
Until now, it has remained unclear how the rs11767557 variant functionally contributes to the underlying disease pathogenesis. The rs11767557 variant is located approximately 3 kb upstream of the EPHA1 gene. Increasing evidence shows that genetic variants in non-coding regions may modify gene expression and contribute to disease risk [3, 59]. Several studies have evaluated the potential association between AD risk variants and the expression of nearby AD risk genes [29, 38–40]. However, none of these studies reported any significant association between rs11767557 and the expression of nearby genes [29, 38–40].
Here, we performed a linkage disequilibrium analysis, genetic association analysis, enhancer enrichment analysis, and eQTLs analysis. Our results show that 6 genetic variants tagged by rs11767557 are significantly associated with AD risk, and enriched in blood cell types and brain cells. We evaluate the potential expression association between rs11767557 and EPHA1 using large-scale eQTLs datasets in human brain tissues and the whole blood. Our results show that rs11767557 variant could significantly regulate EPHA1 gene expression specifically in human whole blood.
It is well known that tissue and disease specific factors may exert their influences on gene expression [52, 64]. To perform an eQTLs analysis in brain tissues, we selected three different datasets including the Braineac [51], brain expression GWAS datasets [52], and the GTEx [54]. There are some differences in these datasets. The Braineac eQTLs dataset includes 10 brain regions of 134 neuropathologically normal individuals with European descent [51]. Here, we can directly perform an eQTLs analysis using the 134 neuropathologically normal individuals without the adjustment for disease status. The brain expression GWAS datasets included two brain tissues and samples with a wide variety of brain pathologies such as AD, progressive supranuclear palsy, Lewy body disease, corticobasal degeneration, frontotemporal lobar degeneration, multiple system atrophy, and vascular dementia [52]. In GTEx, there are 10 brain tissues, and most donors (95%) are neuropathologically normal individuals [54]. For example, the donors with neurological diseases are only about 3.7% of these donors age 20–39, and 2.3% of these donors age 60–71 [54].
Here, we found that the rs11767557 variant could not significantly regulate EPHA1 gene expression in human brain tissues. Three reasons may explain this negative result in human brain tissues. First, it is reported that genetic variants may modify gene expression and cause disease risk [30, 65]. In reality, some eQTLs have more ubiquitous effects, and others may need tissue, cell, region, and disease specific factors to exert their influences on gene expression [52, 64]. Second, the enhancer enrichment analysis shows that 6 genetic variants are significantly enriched in blood cell types and brain cells. However, this is a bioinformatics prediction, and the rs11767557 variant itself may not be located in enhancer histone marks in human brain tissues. Third, the sample sizes of these brain eQTLs datasets are small compared with the large-scale sample sizes of the blood eQTLs datasets. In summary, we believe that our findings may provide important supplementary information about the regulatory mechanisms of the rs11767557 variant in AD risk.
Despite these interesting results, we recognize some limitations in our study. First, our findings provide moderate but not conclusive evidence of an association between rs11767557 and EPHA1 gene expression. Further functional studies in human cell lines would be helpful. However, we could not perform a direct validation due to the limitations of our experimental conditions. Second, we did not identify any significant association between rs11767557 and EPHA1 gene expression in human brain tissues. Future studies using larger brain eQTLs datasets, together with validation studies, should further evaluate our findings.
ACKNOWLEDGMENTS
This work was supported by funding from the National Natural Science Foundation of China (Grant No. 61571152), the National High-tech R&D Program of China (863 Program) (No: 2014AA021505, 2015AA020101, 2015AA020108), the National Science and Technology Major Project (No: 2013ZX03005012 and 2016YFC1202302), and the Tianjin Basic Research and Frontier Technology Program (No. 13JCYBJC39500).
