Abstract
Background
Alzheimer's disease and related dementias (ADRD) frequently co-occur with comorbidities such as diabetes and cardiovascular diseases in elderly populations.
Objective
Utilize a life-course approach to identify genetic variants that are associated with the co-occurrence of ADRD and another comorbid condition.
Methods
Research data from African American participants of the Indianapolis-Ibadan Dementia Project (IIDP) linked with electronic medical record (EMR) data and genome-wide association study (GWAS) data were utilized. The age of onset for ADRD was obtained from longitudinal follow-up of the IIDP study. Age of onset for comorbid conditions was obtained from EMR. The analysis included 1177 African Americans, among whom 174 were diagnosed with ADRD. A semi-parametric marginal bivariate survival model was used to examine the influence of single nucleotide polymorphisms (SNPs) on dual time-to-event outcomes while adjusting for sex, years of education, and the first principal component of GWAS data.
Results
Targeted analysis of 20 SNPs that were reported to be associated with ADRD revealed that six were significantly associated with dual-disease outcomes, specifically congestive heart failure and cancer. In addition, eight novel SNPs were identified for associations with both ADRD and a comorbid condition.
Conclusions
Using a bivariate survival model approach, we identified genetic variants associated not only with ADRD, but also with comorbid conditions. Our utilization of dual-disease models represents a novel analytic strategy for uncovering shared genetic variants for multiple disease phenotypes.
Introduction
Alzheimer's disease and related dementias (ADRD) are increasingly prevalent among the aging population in the United States, presenting significant challenges to the healthcare system.1 Notably, many individuals diagnosed with ADRD have concurrent comorbidities, such as heart diseases,2,3 hypertension,4,5 and diabetes.5,6 Recent research indicates that the onset ages of some of these comorbid conditions may significantly influence the risk of developing ADRD. For instance, midlife obesity, hypertension, and high cholesterol are associated with increased risks of ADRD in late life, while late-life obesity and hypertension have been linked to decreased risk.7 These findings highlight the importance of considering the timing of comorbid conditions in determining the risk of ADRD.
In the past decade, multiple genome-wide association studies (GWAS) for ADRD have been conducted. Recent large-scale GWAS have identified >100 loci associated with ADRD.8 However, due to the scarcity of data from cohorts containing information on multiple diseases, the current approach to identifying genetic variants for two diseases involves analyzing summary statistics obtained from GWAS conducted independently for each disease in different cohorts. Variants with p-values below a predetermined threshold in the GWAS of both cohorts are identified as shared genetic variants between the two diseases.9
Using separate GWAS data to identify shared genetic variants for multiple diseases has inherent limitations. First, conducting separate GWAS in different cohorts does not utilize information from individuals with multiple diseases, raising uncertainty regarding the establishment of true connections between diseases. Second, each separate GWAS focuses on finding genetic associations with a single disease, rather than exploring the relationship between comorbid conditions. Lastly, most current GWAS are conducted using a cross-sectional framework, neglecting the timing of disease onset and progression.
The availability of extensive electronic medical records (EMR) data has stimulated studies using EMR and genotype data integration to examine common genetic risks for disease co-occurrence.10–15 However, current studies predominantly utilize cross-sectional analytic approaches, which do not account for the sequential nature of comorbid disease onsets. Consequently, these analytic approaches fail to leverage the rich information contained in EMR data.
The objective of our study is to apply a novel life-course analytic approach to identify genetic variants that are associated with the co-occurrence of ADRD and another comorbid condition in a longitudinal cohort of African Americans. These findings hold promise for illuminating biological pathways associated with multiple diseases, paving the way for tailored treatments and interventions for elderly individuals with multiple comorbid conditions.
Methods
Study population
The study population consisted of the African American participants of the Indianapolis-Ibadan Dementia Project (IIDP). All were age 65 or older residing in Indianapolis, Indiana. Recruitment was conducted at two time points. During the first recruitment in 1992, 2212 African Americans aged 65 or older living in Indianapolis were enrolled. In 2001, the project enrolled 1893 additional African American community-dwelling participants 70 years and older. All participants agreed to undergo regular follow-up cognitive assessment and clinical evaluations. Details on the assembling of the original cohort and the enrichment cohort are described previously.16,17 The Indianapolis-Ibadan Dementia Project was approved by Institutional Review Boards at Indiana University and the University of Ibadan, Nigeria.
The IIDP study followed a two-stage design with a screening evaluation every two to three years followed by a more comprehensive home-based clinical evaluation. Participants underwent a screening evaluation using the Community Screening Interview for Dementia (CSID).18 Participants were selected for a more comprehensive home-based clinical evaluation involving cognitive testing based upon the CERAD instruments, an informant interview, an interviewer-conducted home evaluation for function (CHIF),19 and a neurological examination. Dementia was diagnosed with both the Diagnostic and Statistical Manual of Mental Disorders, Revised Third Edition (DSM-III-R)20 and International Classification of Diseases, 10th Revision (ICD-10)21 criteria. AD was diagnosed using criteria proposed by NINCDS/ADRDA.22 These diagnostic criteria were applied to all evaluations and involved the same physician diagnostic panel, ensuring consistency in clinical diagnoses during the entire 19 years of the study.
Electronic medical records
Electronic medical records were obtained from the Indiana Network for Patient Care. The Indiana Network for Patient Care is a regional health information exchange that integrates clinical information from the five major health care systems in Indianapolis in support of medical care. 23 Of the 4105 participants enrolled in IIDP, 3778 (92%) were identified in the Indiana Network for Patient Care using social security numbers, name, sex, and date of birth. For each individual, we retrieved ICD-9 codes and time of diagnosis for the following comorbid conditions: coronary artery disease, congestive heart failure, chronic obstructive pulmonary disease, type II diabetes, cancer, depression, hyperlipidemia, renal disease, liver disease, and stroke.
Genotyping, quality control and genotype imputation
DNA was extracted from whole blood and genotyped on the Illumina Human Omni1Quad array at the Broad Institute. A cohort of 1178 individuals was genotyped for 823,561 variants. Rigorous quality control procedures were performed, and variants were filtered based on the following criteria: Hardy-Weinberg equilibrium p-values ≤ 0.0001 (168), missing rates >0.05 (323), and minor allele frequencies <0.03 (31,675). After QC filtering, 753,414 high-quality variants were obtained for subsequent analyses.
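The three variant-level filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names are hypothetical, genotypes are assumed to be coded as minor-allele counts (0, 1, 2, or None for a missing call), and the Hardy-Weinberg check uses a simple 1-df chi-square test, whereas QC software often uses an exact test.

```python
import math

def maf_and_missing(genotypes):
    """Return (minor allele frequency, missing rate) for one variant."""
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return 0.0, missing_rate
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq), missing_rate

def hwe_chisq_pvalue(genotypes):
    """P-value of a 1-df chi-square test of Hardy-Weinberg equilibrium."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    observed = [called.count(k) for k in (0, 1, 2)]
    q = (observed[1] + 2 * observed[2]) / (2 * n)  # frequency of the counted allele
    p = 1 - q
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))

def passes_qc(genotypes, hwe_min=1e-4, max_missing=0.05, min_maf=0.03):
    """Keep a variant only if it clears all three thresholds quoted in the text."""
    maf, missing_rate = maf_and_missing(genotypes)
    if missing_rate > max_missing or maf < min_maf:
        return False
    return hwe_chisq_pvalue(genotypes) > hwe_min
```

A variant is dropped as soon as it fails any one filter, which mirrors how the exclusion counts are reported separately for each criterion.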
Statistical analysis
African American participants with EMR records who were not diagnosed with dementia at baseline were included in this analysis. Baseline demographic characteristics including age, sex, years of education, and rates of comorbid conditions between individuals with ADRD and those without ADRD were compared using the Chi-square test for categorical variables and t-test for continuous variables.
The age of onset for ADRD was determined from the research data in IIDP based on longitudinal evaluation of cognitive function and clinical diagnoses. Age of onset for ADRD and each comorbid condition was used as a time-to-event variable, which was censored by either age at death or by the age at the date of the EMR data pull (January 26, 2015). To identify genetic variants associated with both ADRD and a comorbid condition, we used the semi-parametric marginal model for dual time-to-event outcomes proposed by Prentice and Zhao. 24 In this approach, a proportional hazard model was used for times to dual event outcomes. Estimating equations were used to estimate model parameters for these marginal hazard models. The hazard ratio (HR) estimate of a genetic variant for the dual outcomes represents the ratio of the hazard for dual disease occurrence in carriers compared to the hazard in non-carriers. Therefore, an HR greater than 1 indicates an earlier onset of the dual diseases in carriers compared to non-carriers.
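As a sketch of the model structure, the marginal hazards can be written in the following general form. The notation here is ours and is an assumption for illustration; the text does not give the exact parameterization used in the analysis.

```latex
\begin{aligned}
\lambda_{10}(t_1 \mid z) &= \lambda_{10}(t_1)\,\exp\!\left(z^{\top}\beta_{10}\right), \\
\lambda_{01}(t_2 \mid z) &= \lambda_{01}(t_2)\,\exp\!\left(z^{\top}\beta_{01}\right), \\
\lambda_{11}(t_1, t_2 \mid z) &= \lambda_{11}(t_1, t_2)\,\exp\!\left(z^{\top}\beta_{11}\right).
\end{aligned}
```

Here $t_1$ denotes age at ADRD onset, $t_2$ denotes age at onset of the comorbid condition, $z$ is the covariate vector (the genetic variant plus adjustment covariates), and the baseline rates $\lambda_{10}(\cdot)$, $\lambda_{01}(\cdot)$, and $\lambda_{11}(\cdot,\cdot)$ are left unspecified. Under this form, the dual-outcome hazard ratio for a variant is the exponential of its coefficient in $\beta_{11}$.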
Semi-parametric marginal models for dual diseases, i.e., ADRD and a comorbid condition, were conducted for each variant. The models included sex, years of education, and the first principal component of the GWAS data as covariates. To reduce the impact of rare events on model parameter estimates, comorbid conditions with fewer than 10 co-occurrences with ADRD were excluded. The R package ‘mhazard’ was used for conducting the semi-parametric marginal models for dual diseases.
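The data preparation implied by this description can be sketched in Python as follows. This is an illustration only: the `Participant` fields and function names are hypothetical, and the sketch covers just the construction of censored dual outcomes and the co-occurrence rule, not the model fitting, which was done with the R package 'mhazard'.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Participant:
    adrd_onset: Optional[float]      # age at ADRD onset; None if never diagnosed
    comorbid_onset: Optional[float]  # age at comorbid diagnosis in the EMR; None if absent
    censor_age: float                # age at death or at the EMR data pull, whichever came first

def time_and_event(onset, censor_age):
    """Return (observed age, event indicator) for one outcome."""
    if onset is not None and onset <= censor_age:
        return onset, 1
    return censor_age, 0

def dual_outcome(p):
    """Censored time-to-event pairs for ADRD and the comorbid condition."""
    return (time_and_event(p.adrd_onset, p.censor_age),
            time_and_event(p.comorbid_onset, p.censor_age))

def n_cooccurrences(cohort):
    """Number of participants in whom both events were observed."""
    return sum(1 for p in cohort if all(d == 1 for _, d in dual_outcome(p)))

def keep_condition(cohort, min_cooccurrences=10):
    """Exclude comorbid conditions with fewer than 10 co-occurrences with ADRD."""
    return n_cooccurrences(cohort) >= min_cooccurrences
```

Each comorbid condition would be screened with `keep_condition` before any variant-level model is fitted for that ADRD and comorbidity pair.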
Two sets of analyses were performed. In the first analysis, we focused on the variants previously identified as associated with ADRD,25 setting the significance level for the hazard ratio estimates at 0.05 for confirmatory analyses. To compare results from the dual-outcome models to those obtained from a single disease model, we also conducted Cox's proportional hazard models for time to a single disease. In addition, we conducted these dual disease models further adjusting for the Apolipoprotein E (APOE) gene, given its established role as a risk factor for ADRD.26 The second analysis included all available variants, aiming to discover additional variants associated with ADRD and comorbid conditions that had not been reported as AD-associated. The significance level for hazard ratio estimates in this analysis was set at 5e-6. Adjusted p-values were computed using the false discovery rate method. Manhattan plots for each dual disease outcome were generated using the FUMA software.27 All statistical analyses were conducted using R.
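The text does not name the specific false discovery rate procedure. Assuming the commonly used Benjamini-Hochberg step-up method, the adjustment can be sketched in Python as below; `bh_adjust` is a hypothetical helper written for illustration, not code from the study.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum ensures that a variant with a smaller raw p-value never receives a larger adjusted value than one ranked after it.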
Results
Baseline characteristics of participants with ADRD and the non-ADRD group are compared in Table 1. Individuals in the ADRD group were older and had significantly fewer years of education compared to the non-ADRD group. Those with ADRD also had a higher proportion of APOE ε4 carriers than the non-ADRD group. Furthermore, the proportions of individuals with type II diabetes, cancer, and hyperlipidemia were significantly lower in the ADRD group compared to the non-ADRD group.
Characteristics of African American participants from the Indianapolis-Ibadan Dementia Project.
In the first analysis of targeted ADRD variants, six variants (Figure 1) were significantly associated with dual diseases involving ADRD and another comorbid condition. Among these, minor alleles of five variants (rs1476679C, rs7274581C, rs610932G, rs744373A, and rs7274581C) were associated with earlier age of onset for dual disease outcomes. Specifically, compared with non-carriers, carriers of rs1476679C had nearly double the hazard of developing both ADRD and coronary artery disease (hazard ratio (HR) = 1.973, 95% CI [1.112, 3.503]), and carriers of rs7274581C had nearly double the hazard of developing both ADRD and hyperlipidemia (HR = 1.869, 95% CI [1.164, 3.000]). Conversely, rs3851179 T in the PICALM gene was protective against developing both ADRD and congestive heart failure (HR = 0.506, 95% CI [0.022, 0.751]). While these variants showed significant associations with dual disease outcomes, they were not significantly associated with ADRD or the comorbid conditions in single disease models, except for rs3851179 T, which was significantly associated with a lower risk of developing ADRD (Table 2).
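To read these estimates, it can help to recall that hazard ratio confidence intervals are typically built on the log scale. The Python sketch below assumes a Wald-type interval, exp(log HR ± 1.96 × SE), which the text does not state explicitly; the helper names are hypothetical. It recovers the standard error of the log hazard ratio from the reported interval for ADRD and coronary artery disease and then rebuilds the interval around the point estimate.

```python
import math

def log_hr_se_from_ci(lower, upper, z=1.96):
    """Standard error of log(HR) implied by a Wald-type 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def wald_ci(hr, se, z=1.96):
    """Wald-type confidence interval for a hazard ratio."""
    log_hr = math.log(hr)
    return math.exp(log_hr - z * se), math.exp(log_hr + z * se)

# Reported estimate for ADRD and coronary artery disease: HR = 1.973, 95% CI [1.112, 3.503].
se = log_hr_se_from_ci(1.112, 3.503)
ci_low, ci_high = wald_ci(1.973, se)
```

Under this assumption the rebuilt interval closely matches the reported one, and an interval that excludes 1 corresponds to significance at the 0.05 level used in the confirmatory analysis.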

Forest plot of hazard ratios of significant SNPs on comorbidities.
Hazard ratio (HR) estimates of significant variants in dual disease models (ADRD and a comorbid condition) in the targeted analysis. Hazard ratios from Cox's models for single diseases were also included.
In the APOE-adjusted models (Table 3), the APOE ε4 allele was consistently identified as a risk factor for dual diseases, with the majority of associated p-values below 0.05. Controlling for APOE, the allele rs1476679C on the ZCWPW1 gene was identified as a significant risk factor for two additional dual disease outcomes, ADRD with cancer and ADRD with congestive heart failure, in addition to its previously noted association with coronary artery disease. Similarly, the allele rs7274581C on the CASS4 gene was a significant risk factor for hyperlipidemia and ADRD adjusting for APOE, extending its established link with type II diabetes and ADRD. On the other hand, two alleles, rs3851179 T and rs610932G, previously significant for dual diseases, were no longer significant when adjusting for the APOE ε4 allele.
Hazard ratio (HR) estimates and p-values for significant variants in dual-disease models adjusted by APOE ε4 allele.
In subsequent GWAS analyses, eight variants were identified (p-value < 5e-6) in the dual disease models (Table 4). Minor alleles of all eight variants had hazard ratio estimates greater than one, indicating increased risks for both ADRD and comorbid conditions. Among these variants, rs7830188, rs1850440, rs1806267, and rs2281732 had p-values <0.05 in the single disease models for both ADRD and the comorbid conditions, although they fell short of the 5e-6 threshold for the GWAS analyses. None of these variants reached the GWAS threshold in the traditional single disease models for either ADRD or a comorbid condition.
Hazard ratios (HR) of significant variants in dual-disease models for ADRD and a comorbid condition from GWAS. Hazard ratios from Cox's models for single diseases were also included.
Manhattan plots for each dual ADRD and comorbidity outcome are in Figures 2 and 3, and Supplemental Figure 1. For the dual diseases of ADRD and coronary artery disease (Figure 2), a locus is observed on chromosome 17, led by rs1806267 within the NUP88 gene of the nucleoporin family. Additionally, for the dual diseases of ADRD and congestive heart failure (Figure 3), rs2281732 serves as the leading variant, revealing a locus on chromosome 9 within genes TRIM14 and NANS, associated with cell cycle regulation and a protein-coding region, respectively.

Manhattan plot of dual-disease model for ADRD and coronary artery disease, and the locus zoom plot region around rs1806267 on chromosome 17.

Manhattan plot of dual-disease model for ADRD and congestive heart failure, and the locus zoom plot of region around rs2281732 on chromosome 9.
Discussion
We applied a novel statistical model to identify genetic variants associated with ADRD and comorbid conditions. Confirmatory analyses of 20 variants previously linked to ADRD 25 revealed six of them exhibited associations with both ADRD and a comorbid condition using dual-disease models. Moreover, employing the same analytical method in a genome-wide analysis identified eight novel variants associated with both ADRD and another comorbid condition. These findings underscore the potential of these new models for genetic variant discovery associated with multiple disease phenotypes.
To the best of our knowledge, this study represents the first attempt to utilize age of onset data from individuals with multiple diseases to identify shared genetic variants associated with the risk of ADRD and a comorbid disease. Most current GWAS on human diseases model disease outcome as a binary case-control variable in a logistic regression while adjusting for covariates such as age, sex, and genetic principal components. In contrast, our approach considers the timing of events, the varying lengths of observation periods among patients, and censoring at the end of the observation period; in logistic regression frameworks, censored individuals may simply be treated as disease-free. Previous research has shown that Cox models can offer greater power than logistic regression, particularly for data collected under specific study designs.28,29 In this novel application of marginal dual time-to-event models, we demonstrate the potential to identify shared genetic variants associated with the risk of two diseases.
Several of the ADRD variants identified in the confirmatory analysis shown in Figure 1 have been found to be associated with these comorbid conditions in studies outside of GWAS. The locus PICALM, specifically rs3851179, has drawn attention for its potential role in sub-valvular aortic stenosis, in both animal studies and investigations conducted by multiple research groups.30–32 Similarly, the association of rs610932 with cancer has been linked to the involvement of the MS4A6A gene, with particular relevance to pathological grading and prognosis in ovarian cancer.33 MS4A6A has also been implicated in various cancers, including glioma34 and prostate cancer.35 Moreover, a Portuguese primary care-based study reported a significant association between BIN1 rs744373 and dyslipidemia.36 Furthermore, the involvement of the MS4A genes in immune response, and the roles of BIN1 and PICALM in endocytosis have been linked to the risk of ADRD.37,38 These findings align with the results from our confirmatory analyses, indicating the potential involvement of these variants in the biological pathways underlying both ADRD and these comorbid diseases.
The eight variants identified in our GWAS analysis have not been previously reported as related to ADRD in other GWAS. However, two of these SNPs have been associated with other conditions. The rs1850440 T allele, identified in our study as a risk factor for both ADRD and cancer, is located in the STK39 gene (serine/threonine kinase 39). In a GWAS conducted with Japanese individuals, rs1850440 was associated with acute encephalopathy with biphasic seizures and late reduced diffusion.39 This study further reported that the minor allele T of rs1850440 correlated with increased expression of the STK39 gene in peripheral blood, suggesting a potential link between SNP rs1850440 and brain function. Additionally, Gene Set Enrichment Analysis of RNA-seq data from non-small cell lung cancer specimens indicated that STK39 expression is significantly correlated with cancer-related processes and pathways, including metastasis, cell cycle, apoptosis, and the p38 pathway.40 The variant rs1806267 lies within the NUP88 gene (nucleoporin 88). Nucleoporins are key components of the nuclear pore complex (NPC) in eukaryotic cells, and NPC dysfunction is a hallmark of various neurodegenerative disorders, including Alzheimer's disease, Huntington's disease, and amyotrophic lateral sclerosis.41 Other nucleoporins, such as NUP160 and NUP153, play important roles in the cardiovascular system.42 While NUP88 has not been directly identified in these processes, its potential impact on neurodegeneration and cardiovascular diseases may occur through interactions with, or mislocalization of, NPC components. Very limited information has been published regarding the other variants, and further studies are necessary to confirm our findings and determine their functions.
It is not surprising that the majority of variants identified in our analyses are associated with both ADRD and cardiovascular conditions. This observation is consistent with extensive literature on the established link between cardiovascular diseases and increased risk of developing ADRD.43,44 Numerous studies have reported on the interplay between cardiovascular health and cognitive decline, underscoring the significance of vascular factors in the pathogenesis of ADRD. This body of evidence reinforces the expectation of shared genetic susceptibility underlying these two prevalent and interconnected health conditions. Recent research has also reported connections between ADRD and cancer. Some studies45–47 have reported a reduced risk of cancer in individuals with Alzheimer's disease, and vice versa. This observation has led researchers to explore potential molecular mechanisms underlying this relationship, such as the involvement of common signaling pathways and shared biological processes like inflammation and oxidative stress.
In our models of dual disease onsets, we also presented results using Cox's proportional hazard models for each individual disease separately. We observed the absence of statistical significance for some variants in these single disease models, despite their associations with the dual diseases. We hypothesize that genetic variants show a stronger association with dual-disease outcomes than individual outcomes due to the heterogeneous nature of individual diseases, whereas dual-disease outcomes may represent more homogeneous disease mechanisms. However, this hypothesis requires further examination with additional research data in future studies.
Utilizing EMR data has the potential for diagnostic bias, where participants with one condition may receive an earlier diagnosis of another condition due to increased interactions with the healthcare system. However, our analyses focus on examining how genetic variant carriers exhibit different hazard rates for dual diseases compared to non-carriers. Our previous analyses of this cohort revealed a cohort effect on the incidence of ADRD, which was mostly accounted for by comorbid conditions. 48 Since we are now modeling the age of onset for both cardiovascular conditions and ADRD, we did not include the year of birth in our model.
Our analysis has several strengths. First, we utilized longitudinal and individual patient health histories contained in EMR data to identify genetic variants for both ADRD and a comorbid condition. Our time-to-event modeling framework utilizes age of onset data, providing a more appropriate modeling option compared to binary disease outcomes. Second, our novel marginal dual-outcome models were constructed using information from individuals experiencing the co-occurrence of ADRD and these comorbid conditions. This approach contrasts with studies that utilize summary statistics from different cohorts. By adopting a time-to-dual event modeling approach within the same longitudinal cohort, we gain valuable insights into the risk of dual diseases and their relationship. Third, the diagnosis of ADRD in our analyses was obtained from direct longitudinal evaluations of the participant cohort, rather than relying on EMR diagnoses, which may lag behind symptom onset. This enhances the reliability of our analysis, allowing a more accurate age of onset for ADRD to be used. Last, the relatively long follow-up period of the IIDP study enhances the robustness of our findings, enabling a comprehensive examination of long-term health outcomes.
Our study also has important limitations. The sample size of our study cohort is moderate compared to many large GWAS, thereby constraining the statistical power of both our confirmatory and GWAS analyses in identifying SNPs with small to moderate associations with dual disease outcomes. Another limitation is that the current marginal dual-disease models do not account for the competing risk of death, which may bias results given the increased mortality in older populations. Future development of statistical methods is needed to extend this model framework to appropriately account for competing risks. Furthermore, our study cohort comprises only African American participants, potentially limiting the generalizability of our results to other demographic groups. Validation in larger, ethnically diverse cohorts is needed to confirm our findings. Lastly, our analyses focused on GWAS data. Future research on functional data such as gene expression or proteomic data using artificial intelligence or machine learning methods is needed.49,50
In summary, we utilized a novel dual-disease modeling framework to analyze longitudinal data from the IIDP study, integrating EMR with genotype data. Through this approach, we identified genetic variants associated not only with ADRD but also with comorbid conditions. Our utilization of dual-disease models represents a novel analytic strategy for uncovering shared genetic variants for multiple disease phenotypes.
Supplemental Material
Supplemental material, sj-docx-1-alz-10.1177_13872877241289054 for Genetic variants for Alzheimer's disease and comorbid conditions by Minmin Pan, Dongbing Lai, Frederick Unverzagt, Liana Apostolova, Hugh C. Hendrie, Andrew Saykin, Tatiana Foroud and Sujuan Gao in Journal of Alzheimer's Disease
Footnotes
Acknowledgments
The authors have no acknowledgements to report.
Author contributions
Minmin Pan (Formal analysis; Writing – original draft); Dongbing Lai (Supervision; Writing – review & editing); Frederick Unverzagt (Writing – review & editing); Liana Apostolova (Writing – review & editing); Hugh C Hendrie (Writing – review & editing); Andrew Saykin (Writing – review & editing); Tatiana Foroud (Writing – review & editing); Sujuan Gao (Conceptualization; Supervision; Writing – review & editing).
Funding
The research is partially supported by NIH grants K07AG076659, P30AG072976, R01AG009956, U24AG021886, and U01AG057195.
Declaration of conflicting interests
LA has provided consultation to Eli Lilly, Biogen, Two Labs, FL Dept Health, Genentech, NIH Biobank, GE Healthcare, Eisai, Roche Diagnostics, and Alnylam. LA receives the following research support: NIA U01 AG057195, NIA R01 AG057739, NIA P30 AG010133, Alzheimer Association LEADS GENETICS 19-639372, Alzheimer Association SG-23-1061716, Roche Diagnostics RD005665, AVID Pharmaceuticals, Life Molecular Imaging. LA has received honoraria for participating in independent data safety monitoring boards and providing educational CME lectures and programs. LA has stock in Cassava Sciences.
Data availability
The data supporting the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
Supplemental material for this article is available online.
References