Abstract
Background
Smoking has been linked to dementia, but the causal relationship has not been well established.
Objective
Our study used a Mendelian randomization (MR) framework to examine the impact of different stages and kinds of smoking behavior on cognitive status.
Methods
We analyzed a Health and Retirement Study sample, categorizing cognitive status into three levels (normal, cognitive impairment-no dementia, dementia) and using self-reported smoking behaviors. We used multivariable logistic regressions to examine associations and MR to examine potential causality. We used smoking polygenic scores as instruments for one-sample MR and validated through two-sample MR with genome-wide association study summary statistics.
Results
Current smoking was associated with 1.33 times higher odds of cognitive impairment-no dementia (95% CI: 1.06, 1.65) in European ancestry participants (N = 7708). Among participants who had ever smoked, each additional 10 years of smoking was associated with 1.11 times higher odds of cognitive impairment-no dementia (95% CI: 1.10, 1.22). Using the ever smoking polygenic score as a validated instrumental variable, we detected strong causal effects of ever smoking, current smoking, and total smoking years on cognitive impairment (all p < 0.001). Two-sample MR showed no evidence of causality between smoking behaviors and Alzheimer's disease. No causality was observed in the African ancestry sample (N = 1928).
Conclusions
Smoking behavior was cross-sectionally associated with and potentially on the causal pathway of cognitive impairment-no dementia in the larger European ancestry sample. However, no associations were observed with dementia, and the findings did not replicate across ancestry groups. The causal relationship between smoking and cognitive health remains suggestive but not conclusive. Promoting smoking cessation remains a prudent public health strategy to prevent numerous health conditions, and its potential impact on cognitive health warrants further investigation.
Introduction
Dementia is a neurodegenerative disorder characterized by memory loss, impaired language function, difficulty with problem-solving, and other cognitive changes that impair a person's daily life. 1 In the United States, 14% of people aged over 70 are living with dementia, at an estimated total cost of $345 billion in 2023. 2 Determining the causal risk factors of dementia provides insights into its pathophysiology, which can support effective interventions that reduce dementia risk. Multiple factors contribute to the incidence of dementia, including genetic and environmental factors, 3 among which cigarette smoking is one of the most prevalent but controversial.
With a prevalence of 20.6% among adults, smoking was estimated to account for 10.8% of total dementia cases in the United States,4,5 assuming a causal relationship between smoking and dementia. However, prior studies have compared smokers and non-smokers in case-control or cross-sectional studies of dementia and found mixed results. For example, in 19 case-control studies, ever smoking was strongly associated with lower odds of Alzheimer's disease relative to never smoking. 6 A meta-analysis conducted in seven low- and middle-income countries with 11,143 individuals showed no significant association between smoking and the onset of any dementia. 7 In contrast, a few longitudinal studies of older individuals showed smoking was associated with a 1.3–2.3 times higher risk of dementia and Alzheimer's disease.8–10 Moreover, smoking was found to increase dementia risk in the Chinese, 11 Japanese, 12 and Korean13,14 populations but not in White or African American populations. One explanation for these inconsistent results is the potential overlap of risk and protective factors. Smoking behaviors might be related to many potential confounding factors, such as alcohol consumption, an unhealthy diet, and limited exercise. In turn, ill health could lead to a reduction or cessation of smoking, introducing potential bias due to reverse causality, and survival bias may be present because smokers die younger. 15 Despite numerous studies on the association between smoking behavior and dementia, the causal relationship between the two has not been rigorously assessed.
Mendelian randomization (MR) is a method using genetic variants as instrumental variables to assess causality under specific assumptions. Because alleles are allocated at random under Mendel's laws of segregation and independent assortment, MR is analogous to a randomized controlled trial in which individuals are randomized to carry genetic variants that may modify the risk of exposure. Since genetic variants are fixed at conception, they precede the onset of health disorders and environmental exposures and can overcome many drawbacks of observational epidemiology studies, such as confounding and reverse causation. 16 MR testing requires a strong genetic predictor, and smoking behavior is partially heritable. A recent genome-wide association study (GWAS) has linked a total of 566 genetic variants in 406 loci to multiple stages of smoking (initiation, cessation, or heaviness). 17 Given the significant genetic component of smoking behaviors, instrumental variables of genetic predisposition to smoking may be valid for MR analyses.
In our current study, we first conducted conventional cross-sectional analyses to confirm associations between smoking behaviors (smoking initiation, smoking cessation, age at initiation, cigarettes per day) and cognitive impairment using a United States cohort of older adults. To assess a potential causal relationship, we applied a one-sample MR framework, in which the genetic variants, exposure, and outcome were measured in the same sample. Specifically, we performed a one-sample MR for smoking behaviors and cognitive impairment stratified by European and African genetic ancestries. Last, we validated our causal inference results using two-sample MR analyses 18 with public GWAS summary statistics to evaluate causality between smoking behavior and Alzheimer's disease, the most common type of dementia. 2
Methods
Conventional association analyses and one-sample MR
Study sample
We conducted our conventional association analyses and one-sample MR in the Health and Retirement Study, a publicly available, national longitudinal panel study of individuals over 50 in the United States. 19 The Health and Retirement Study has collected data on health and economic information related to aging every two years since 1992, and more than 43,000 individuals have participated to date. 19 Participants provided written informed consent, and these secondary data analyses were approved by the University of Michigan Institutional Review Board (HUM00128220). Non-identifiable data in the Health and Retirement Study are publicly available (https://hrs.isr.umich.edu). Our cross-sectional sample combined multiple waves of Health and Retirement Study (waves 2006 to 2018) to maximize the number of participants with genetic and phenotypic data available. Proxy respondents were excluded from our analyses.
Cognition measures
A participant's cognitive status at the most recent visit was used as our main outcome variable. Cognitive status in self-respondents was categorized into three levels as normal, cognitive impairment-no dementia, or dementia. The method for categorization was based on a 27-point scale, according to participants’ performance on a series of cognitive tests, including immediate and delayed 10-noun free recall, serial 7 subtraction, and backward counting from 20. 20 Cognitive status cut points were established by Crimmins et al. and were validated clinically and empirically with an area under curve score of 0.84. 21 We also conducted a sensitivity analysis using continuous cognitive function measures. For respondents with missing cognition data, the Health and Retirement Study performed imputations using a multivariate, regression-based procedure, assuming that the data were not missing at random. This process incorporated relevant demographic, health, and economic variables, along with cognitive variables from prior and current waves. Descriptive statistics and correlations between observed and imputed values were used to assess the consistency of the imputation process. Further details are available in the Health and Retirement Study documentation. 22 All cognition-related data were retrieved from the cross-wave imputation of cognitive functioning data. 22
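The three-level categorization described above can be illustrated with a short sketch. The cut points used here (12 to 27 normal, 7 to 11 cognitive impairment-no dementia, 0 to 6 dementia) are an assumption based on the commonly cited thresholds for the 27-point scale; they are not stated in this text, and the function name is hypothetical.

```python
def classify_cognition(score: int) -> str:
    """Map a 27-point cognition score to a three-level cognitive status.

    Cut points are assumed (not given in the text): 12-27 normal,
    7-11 cognitive impairment-no dementia, 0-6 dementia.
    """
    if not 0 <= score <= 27:
        raise ValueError("score must be between 0 and 27")
    if score >= 12:
        return "normal"
    if score >= 7:
        return "cognitive impairment-no dementia"
    return "dementia"


# Scores at and around each assumed boundary
statuses = [classify_cognition(s) for s in (27, 12, 11, 7, 6, 0)]
print(statuses)
```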
Smoking behaviors and covariate measures
All smoking behavior variables were self-reported at each measurement wave in the Health and Retirement Study. 19 Ever smoking was defined as having smoked more than 100 cigarettes in a respondent's lifetime (not including pipes or cigars). A total of nine smoking variables were used in our primary analyses, including smoking initiation (ever/never smoking), current smoking (yes/no), cigarettes per day (current and when smoking the most, reported in ever smokers only), age at starting/stopping smoking (years, reported in ever smokers only), years since started/stopped smoking (years, reported in ever smokers only), and smoking duration (years, reported in ever smokers only, calculated as age at the current wave for current smokers, or age at stopping smoking for former smokers, minus age at starting smoking).
Baseline characteristics used in our analysis included demographic characteristics, behavioral risk factors, and chronic health conditions. Age (years) was calculated as current wave year minus self-reported year of birth. Sex (male/female), years of education, body mass index (BMI, kg/m²), alcohol consumption (ever/never drink), history of hypertension (yes/no), diabetes (yes/no), and stroke (yes/no) were self-reported. We also used one self-reported mental health index, derived from the Center for Epidemiologic Studies Depression (CESD) scale, to represent participants’ depressive symptom levels. All smoking and covariate variables were collected at the same wave as their last cognitive visit and were retrieved from the RAND HRS Longitudinal File V1 March 2023. 19
Genetic measures
Starting in 2006, respondents provided saliva samples after reading and signing a consent form during an enhanced face-to-face interview. Genotype measures were obtained using the Illumina HumanOmni2.5 BeadChip, and genotyping was conducted by the Center for Inherited Disease Research. Genotype data that passed initial quality control were released to and analyzed by the Quality Assurance/Quality Control analysis team at the University of Washington. Details of the genotype collection and quality control are available elsewhere. 23 Genetic ancestry was identified through the union of self-reported race/ethnicity and genetic principal component analysis on genome-wide single nucleotide polymorphisms (SNPs) calculated across all participants. Ancestry-specific principal components were created in each ancestry sample to adjust for hidden population structures within ancestry. All genetic data were downloaded from the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS, dataset #NG00119). 24
We used a polygenic risk score for each smoking behavior that had SNP-level GWAS weights available from the GWAS and Sequencing Consortium of Alcohol and Nicotine use (GSCAN), including smoking initiation, cessation, cigarettes per day, and age at initiation, as our primary instrumental variables in the one-sample MR analysis. The list of the contributing studies is shown in Supplemental Table 1. Importantly, the Health and Retirement Study was not part of the discovery GWAS. A polygenic risk score is a single quantitative measure of cumulative genetic risk. It aggregates multiple individual loci across the human genome and weights them by effect sizes derived from a GWAS. 25 Smoking polygenic risk scores were calculated using all independent significant SNPs (p-value < 5.0E−08, LD r² < 0.6) that overlapped between the GWAS summary statistics and the Health and Retirement Study genetic database. All polygenic scores were standardized (mean = 0, standard deviation = 1) within each genetic ancestry category. SNPs in the apolipoprotein E (APOE) gene have been identified as strong genetic risk factors for dementia. 26 Thus, we also built a binary variable for APOE ε4 allele carrier status (having at least one copy of the ε4 allele, yes/no) using genetic measures provided by the Health and Retirement Study. Notably, the SNPs used to construct the smoking polygenic risk scores do not overlap with the genomic region encompassing the APOE gene.
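As an illustration of how a polygenic risk score aggregates loci, the sketch below computes a weighted sum of effect-allele dosages and standardizes it within a group. The SNP identifiers, weights, and genotypes are hypothetical, and the use of the population standard deviation is an assumption; the study's actual scoring pipeline is not described at this level of detail.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2) across SNPs."""
    return sum(weights[snp] * dosages[snp] for snp in weights)


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (population SD)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical GWAS effect sizes for three independent SNPs
weights = {"rs_a": 0.02, "rs_b": -0.01, "rs_c": 0.03}

# Hypothetical genotypes for four participants of one ancestry group
genotypes = [
    {"rs_a": 0, "rs_b": 1, "rs_c": 2},
    {"rs_a": 1, "rs_b": 1, "rs_c": 0},
    {"rs_a": 2, "rs_b": 0, "rs_c": 1},
    {"rs_a": 1, "rs_b": 2, "rs_c": 1},
]

raw_scores = [polygenic_score(g, weights) for g in genotypes]
z_scores = standardize(raw_scores)
print(raw_scores, z_scores)
```

Standardizing within ancestry means that a one-unit change in the score corresponds to one standard deviation within that group, which is how the instrument associations are reported later.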
The sample selection for these analyses is shown in Supplemental Figure 1. We excluded participants with missing cognitive status, genetic data, smoking behaviors, or other covariates. Proxy respondents were excluded since no genetic data were collected for those participants. Since the risk factors and underlying neuropathological features of dementia are considerably different for people aged under 50 or over 90, 27 we excluded participants who were in these age groups, as well as those in the Asset and Health Dynamics among the Oldest Old and Children of the Depression study, the two oldest cohorts of the Health and Retirement Study. 28
Statistical analyses
Analyses were carried out separately for European (primary) and African (sensitivity) ancestry samples. Distributions of categorical covariates were described using counts and frequencies. Distributions of continuous covariates were described using means and standard deviations. To examine potential bivariate differences in baseline characteristics within the included sample across different ancestry groups and cognitive status outcome groups, we used the χ2 test and analysis of variance as appropriate.
We first examined associations between smoking behaviors and cognitive status in our analytical sample. To assess the proportional association between smoking behaviors and the three-level cognition variable, ordinal logistic regression was first attempted. However, we observed a violation of the proportional odds assumption for multiple variables, indicating heterogeneous associations of covariates across different levels of cognitive status. Thus, we performed multivariable logistic regression instead, treating cognitive impairment-no dementia and dementia as separate outcomes with normal cognition as the reference group. The sample size (N) varied across models due to differing numbers of eligible respondents to specific smoking behavior questions in the Health and Retirement Study. Variables including ‘years since started smoking,’ ‘age stopped smoking,’ and ‘total smoking years’ were only applicable to participants who indicated they had ever smoked. Consequently, analyses of these variables were conducted in this subset of respondents. Our baseline regression models were adjusted for age, sex, years of education, last cognitive visit wave, and five ancestry-specific principal components. To assess the robustness of the multivariable association testing to additional covariates, we repeated the logistic regressions additionally adjusting for alcohol consumption, BMI, CESD score, history of hypertension, diabetes, and stroke. We further adjusted for APOE ε4 allele carrier status as a precision variable to reduce the standard errors of the regression models and hence narrow the confidence intervals (CIs) on coefficients of interest. We present results as odds ratios (ORs) and 95% CIs.
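To make the reporting convention concrete, the sketch below converts a logistic regression coefficient and its standard error into an odds ratio with a Wald 95% CI, optionally rescaled to a multi-unit increase such as ten years of smoking. The coefficient and standard error are hypothetical, and the helper name is an assumption rather than part of the study's code.

```python
import math


def odds_ratio_ci(beta, se, scale=1.0, z=1.96):
    """Return (OR, lower, upper) for a `scale`-unit increase in the predictor.

    beta and se are the log-odds coefficient and its standard error
    per one unit of the predictor; z = 1.96 gives a Wald 95% interval.
    """
    estimate = math.exp(beta * scale)
    lower = math.exp((beta - z * se) * scale)
    upper = math.exp((beta + z * se) * scale)
    return estimate, lower, upper


# Hypothetical per-year coefficient, reported per 10 years of smoking
or10, lo10, hi10 = odds_ratio_ci(beta=0.01, se=0.004, scale=10)
print(round(or10, 2), round(lo10, 2), round(hi10, 2))
```

Because the interval is symmetric on the log-odds scale, the point estimate is the geometric mean of the two confidence limits.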
We then conducted MR through instrumental variable regressions using the AER R package. 29 Mendelian randomization methods were based on the following three assumptions: 1) relevance assumption: the genetic variants are associated with the risk factor of interest; 2) independence assumption: there are no unmeasured confounders of the association between genetic variants and outcome; 3) exclusion restriction: the genetic variants affect the outcome only through the effect on the risk factor of interest (Figure 1). 30

Figure 1. A one-sample Mendelian randomization framework and assumptions. Mendelian randomization assumptions: (i) the genetic variants associate with the exposure(s), (ii) there are no unmeasured confounders of the association between genetic variants and outcome and (iii) the genetic variants affect the outcome only through the effect on the exposure(s).
We followed current guidelines for MR analysis. 31 To examine the potential for a smoking polygenic score to serve as an instrumental variable for each smoking behavior, we performed multivariable logistic (for binary smoking variables) or linear (for continuous smoking variables) regression between smoking polygenic scores and smoking behaviors. The F-statistics were calculated to assess instrument strength, with >10 indicating a sufficiently strong instrument. 32 To assess inferred causality between smoking behaviors and cognitive status, two-stage least squares models were used to calculate the causal effect controlling for confounders at each step. In the first stage, we regressed the exposure on the genetic variants (e.g., smoking polygenic score) and relevant covariates. In the second stage, the outcome was regressed on the predicted values of the exposure from the first regression and the same covariates. 33 We controlled for age, sex, years of education, and the first five ancestry-specific principal components in all instrumental variable regression analyses.
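The two-stage procedure and the instrument-strength check can be sketched on simulated data. This is a simplified illustration, not the study's AER-based analysis: it uses a single instrument, continuous exposure and outcome, and no covariates, and all variable names and effect sizes are hypothetical.

```python
import random
from statistics import mean


def ols(x, y):
    """Simple linear regression of y on x; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """Proportion of variance in y explained by a linear fit on x."""
    a, b = ols(x, y)
    my = mean(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


random.seed(1)
n = 5000
pgs = [random.gauss(0, 1) for _ in range(n)]         # instrument
confounder = [random.gauss(0, 1) for _ in range(n)]  # unmeasured
exposure = [0.5 * g + u + random.gauss(0, 1) for g, u in zip(pgs, confounder)]
# Simulated true causal effect of exposure on outcome is 0.3
outcome = [0.3 * x + u + random.gauss(0, 1) for x, u in zip(exposure, confounder)]

# Stage 1: regress the exposure on the instrument and check its strength
a1, b1 = ols(pgs, exposure)
predicted_exposure = [a1 + b1 * g for g in pgs]
r2 = r_squared(pgs, exposure)
f_stat = r2 / (1 - r2) * (n - 2)  # F-statistic for a single instrument

# Stage 2: regress the outcome on the predicted exposure
_, iv_estimate = ols(predicted_exposure, outcome)

# Conventional regression, biased here by the unmeasured confounder
_, naive_estimate = ols(exposure, outcome)
print(round(f_stat, 1), round(iv_estimate, 2), round(naive_estimate, 2))
```

In this simulation the conventional estimate is inflated by the confounder, while the instrumental variable estimate should land near the simulated effect. Standard errors from a manual second stage are not valid, which is one reason dedicated routines are used in practice.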
For sensitivity analyses, to assess the robustness of our MR findings, we employed the Limited Information Maximum Likelihood (LIML) method, 34 which is known to provide more reliable estimates when dealing with weak genetic instruments. Unlike the traditional two-stage least squares approach, LIML is less sensitive to weak instrument bias, potentially yielding more accurate estimates of causal effects. 32 This method estimates the causal association between smoking behaviors and cognitive outcomes while accounting for the limitations associated with instrument strength. Moreover, to account for potential confounding factors, we employed an Inverse Probability of Treatment Weighting (IPTW) method 35 to estimate the causal effect of smoking on cognitive performance. Propensity scores were estimated using logistic regression, with smoking status as the outcome and age, sex, years of education, and the first five ancestry-specific principal components as predictor variables. The propensity score represents the probability of exposure (e.g., being a smoker) given these confounding factors. Using these scores, we calculated inverse probability weights. Specifically, individuals who smoked were weighted by the inverse of their propensity score (1/propensity score), and non-smokers were weighted by the inverse of 1 minus their propensity score (1/(1 − propensity score)). These weights were applied in a weighted linear regression model to create a pseudo-population in which the distribution of measured confounders is balanced between smokers and non-smokers. This approach allowed us to estimate the association between smoking and cognitive performance while minimizing the influence of these measured confounders. In addition, to determine if the associations and causal effects differed by age or sex, we conducted stratified analyses in the European sample. Participants were grouped by age (70 and older versus under 70) and by sex (female versus male). Finally, we repeated all analyses in the African ancestry sample, as applicable. We did not conduct a stratified analysis for the African ancestry sample due to the small sample size.
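The inverse probability weighting step described above can be sketched as follows. The propensity scores are taken as given (hypothetical values) rather than estimated by logistic regression, and the weighted difference in means stands in for a weighted linear regression with smoking as the only predictor; both simplifications are assumptions made for brevity.

```python
def iptw_weights(treated, propensity):
    """1/p for exposed individuals, 1/(1 - p) for unexposed individuals."""
    return [1 / p if t else 1 / (1 - p) for t, p in zip(treated, propensity)]


def weighted_mean(values, weights):
    """Mean of values with each observation scaled by its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical data: smoking status, propensity score, cognition score
smoker = [1, 1, 0, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2, 0.2]
cognition = [10, 14, 15, 18, 17]

weights = iptw_weights(smoker, propensity)

# Weighted mean cognition in each group of the pseudo-population
mean_smokers = weighted_mean(
    [c for c, s in zip(cognition, smoker) if s],
    [w for w, s in zip(weights, smoker) if s],
)
mean_nonsmokers = weighted_mean(
    [c for c, s in zip(cognition, smoker) if not s],
    [w for w, s in zip(weights, smoker) if not s],
)
effect = mean_smokers - mean_nonsmokers
print(weights, round(effect, 2))
```

Individuals whose observed exposure was unlikely given their covariates receive larger weights, which is what rebalances the measured confounders between the two groups.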
Two-sample MR
Study consortia and genome-wide association studies
For the exposure phenotypes (smoking behaviors), we used the same set of GWAS summary statistics from the GSCAN consortium as our one-sample MR, including smoking initiation, cigarettes per day, age of initiation, and smoking cessation 17 (Supplemental Table 1). Summary statistics from a GWAS on Alzheimer's disease were identified from a consortium GWAS from 2019 by Kunkle et al. 36 Alzheimer's disease is the most prevalent form of dementia and was the most similar trait to those used in our one-sample MR for which a GWAS was available. We also selected these GWAS studies based on their large sample sizes. All the GWAS summary statistics are publicly available.17,36 Information on recruitment procedures and diagnostic criteria is detailed in the original publications. These GWAS estimates were selected from studies conducted by different consortia, ensuring minimal sample overlap between the exposure and outcome associations.
Instrumental variable selection
We used the default settings in the TwoSampleMR R package 37 to conduct the selection of genetic instruments from the exposure (smoking behaviors) GWASs. Specifically, genome-wide significant (p-value < 5.0E−08) SNPs were extracted. 38 Significant SNPs were then clumped (clumping window of 10,000 kb, LD r² cutoff 0.001) for independence using a European reference panel to control for linkage disequilibrium. We used a large clumping window and a small LD r² cutoff to reduce the likelihood of selecting correlated instruments and ensure the independence of our genetic instruments, thereby enhancing the robustness and validity of our instrumental variable analysis.39,40 These significant SNPs were checked for association with Alzheimer's disease to ensure that they were specific to smoking behavior. None of the SNPs were associated with Alzheimer's disease at the genome-wide level of significance (p-value < 5.0E−08), suggesting that the instruments were independent of the outcome based on current GWAS data.
Statistical analyses
To calculate the causal estimate of a smoking phenotype on Alzheimer's disease, we first calculated the Wald ratio for each selected instrumental variable (i.e., each individual SNP). The SNP-specific estimates were then meta-analyzed using inverse variance weighting to combine the effects of the different instrumental variables into an overall beta estimate, which was transformed into an OR.32,41,42
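The Wald ratio and inverse-variance-weighted steps can be illustrated with a small sketch. It uses a fixed-effect pooling formula and a first-order standard error for each ratio, which is an assumption and may differ from the TwoSampleMR default; the SNP summary statistics are hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Per-SNP causal estimate and its first-order standard error."""
    return beta_outcome / beta_exposure, se_outcome / abs(beta_exposure)


def ivw(ratios, ses):
    """Fixed-effect inverse-variance-weighted pooled estimate and SE."""
    weights = [1 / s ** 2 for s in ses]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return estimate, math.sqrt(1 / sum(weights))


# Hypothetical summary statistics: (beta_exposure, beta_outcome, se_outcome)
snps = [(0.10, 0.020, 0.010), (0.20, 0.030, 0.010), (0.05, 0.015, 0.010)]

ratios, ses = zip(*(wald_ratio(bx, by, se) for bx, by, se in snps))
beta_ivw, se_ivw = ivw(ratios, ses)
odds_ratio = math.exp(beta_ivw)  # log odds converted to an odds ratio
print(round(beta_ivw, 4), round(se_ivw, 4), round(odds_ratio, 3))
```

SNPs with stronger exposure associations yield more precise ratios and therefore contribute more weight to the pooled estimate.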
We also assessed potential violations of the MR assumptions. To evaluate the relevance assumption, we estimated the strength of instrumental variables using the proportion of variance of the exposures explained by the SNPs (R²) and F-statistics. 32 To test the exclusion restriction assumption, we applied MR-Egger to detect possible violations of the assumption due to directional horizontal pleiotropy. 41 We examined other MR approaches to minimize bias, including weighted median, a median of the weighted estimates that provides a consistent effect even if up to 50% of the instrument weight comes from pleiotropic variants, 43 and weighted mode, which assumes the most common causal effect is consistent with the true causal effect. 44 Heterogeneity among SNP-specific causal estimates was tested using Cochran's Q test 45 and illustrated using scatter plots.
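Two of the diagnostics named above have simple closed forms, sketched here with hypothetical inputs. The F-statistic formula based on R², sample size, and number of SNPs is a commonly used approximation and is assumed rather than taken from this text; Cochran's Q is computed around a fixed-effect inverse-variance-weighted estimate.

```python
def f_statistic(r2, n, k):
    """Instrument strength from variance explained (r2), sample size n, k SNPs."""
    return (r2 / (1 - r2)) * ((n - k - 1) / k)


def cochran_q(ratios, ses):
    """Weighted sum of squared deviations of SNP estimates from the IVW estimate."""
    weights = [1 / s ** 2 for s in ses]
    pooled = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return sum(w * (r - pooled) ** 2 for w, r in zip(weights, ratios))


# Hypothetical values: 100 SNPs explaining 2% of variance in 100,000 people
f_stat = f_statistic(r2=0.02, n=100_000, k=100)

# Hypothetical per-SNP Wald ratios and their standard errors
ratios = [0.20, 0.15, 0.30]
ses = [0.10, 0.05, 0.20]
q_stat = cochran_q(ratios, ses)
print(round(f_stat, 2), round(q_stat, 3))
```

Under the null hypothesis of no heterogeneity, Q is compared against a chi-square distribution with one fewer degrees of freedom than the number of SNPs.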
As sensitivity analyses, MR pleiotropy residual sum and outlier (MR-PRESSO) tests were performed to detect outlier SNPs which may bias estimates through horizontal pleiotropy. 46 A leave-one-out analysis was also performed to identify any influential SNP that was disproportionately responsible for the result of each MR study. Finally, we tested reverse MR, where Alzheimer's disease was regarded as the exposure and smoking behaviors as outcomes, using the same analytic methods. All two-sample MR analyses were conducted with the TwoSampleMR R package. 37
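The leave-one-out idea can be shown in a few lines: the pooled estimate is recomputed with each SNP removed in turn, and a large shift flags an influential instrument. The sketch assumes a fixed-effect inverse-variance-weighted estimator and uses hypothetical SNP names and estimates.

```python
def ivw_estimate(ratios, ses):
    """Fixed-effect inverse-variance-weighted pooled estimate."""
    weights = [1 / s ** 2 for s in ses]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


def leave_one_out(names, ratios, ses):
    """Pooled estimate after dropping each SNP in turn, keyed by the dropped SNP."""
    results = {}
    for i, name in enumerate(names):
        kept_ratios = ratios[:i] + ratios[i + 1:]
        kept_ses = ses[:i] + ses[i + 1:]
        results[name] = ivw_estimate(kept_ratios, kept_ses)
    return results


# Hypothetical per-SNP Wald ratios and standard errors
names = ["rs_a", "rs_b", "rs_c"]
ratios = [0.20, 0.15, 0.30]
ses = [0.10, 0.05, 0.20]

full = ivw_estimate(ratios, ses)
loo = leave_one_out(names, ratios, ses)
print(round(full, 4), {k: round(v, 4) for k, v in loo.items()})
```

Here dropping the most precise SNP (rs_b) moves the estimate the most, which is the pattern a leave-one-out plot is designed to reveal.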
All analyses were conducted in R version 4.2.2. 47 We considered p-values < 0.05 statistically significant unless otherwise specified. Code to reproduce all analyses in this manuscript is available on GitHub (https://github.com/bakulskilab/Smoking_Dementia_MR).
Results
Study sample descriptive statistics
The Health and Retirement Study analytical sample included 7708 participants of European genetic ancestry and 1928 participants of African genetic ancestry (Table 1). Compared to the African ancestry sample, the European ancestry sample had lower proportions of smokers, abnormal cognition, and chronic health conditions, while smokers in the European ancestry sample had longer smoking periods and smoked more cigarettes per day.
Table 1. Characteristics of sample participants (N = 9636) stratified by ancestry, Health and Retirement Study, Wave 2006–2018. a
APOE: Apolipoprotein E; CESD: Center for Epidemiologic Studies Depression.
All smoking and covariate variables were collected at the same wave as their last cognitive visit. All the statistics were calculated based on non-missing data for each variable. Categorical variables were calculated with count (frequency), and continuous variables were calculated with mean (standard deviation).
The overall p-value was calculated from chi-square test or analysis of variance for categorical or continuous variables as appropriate, interpreted as differences between groups.
Within the European ancestry sample, smoking initiation and current smoking were more common among those with cognitive impairment (Tables 2 and 3). Those with cognitive impairment also had started smoking longer ago, stopped smoking more recently, and had smoked for more total years than those with normal cognition. Thus, these five smoking behavior variables were used as primary exposures in further assessments of associational and causal inference models. Former and current smokers had higher polygenic scores (PGSs) for smoking initiation and cessation than never smokers. In contrast, no significant differences were observed in other smoking-related PGSs across smoking groups. Demographic covariates (age, sex, and education) and health condition covariates (stroke status, hypertension status, diabetes status, BMI, ever-drinking alcohol, and CESD score) were significantly associated with cognitive status and smoking status and were treated as covariates in adjusted models.
Table 2. Characteristics of European sample participants (N = 7708) stratified by cognitive status, Health and Retirement Study, Wave 2006–2018. a
APOE: Apolipoprotein E; CESD: Center for Epidemiologic Studies Depression; PGS: polygenic score.
All the statistics were calculated based on non-missing data for each variable. Categorical variables were calculated with count (percent), and continuous variables were calculated with mean (standard deviation).
The overall p-value was calculated from chi-square test or analysis of variance for categorical or continuous variables as appropriate, interpreted as differences between groups.
The genetic instrumental variables (PGSs) were built with genome-wide significant single nucleotide polymorphisms identified from genome-wide association studies.
Table 3. Characteristics of European sample participants (N = 7708) stratified by smoking status, Health and Retirement Study, Wave 2006–2018. a
APOE: Apolipoprotein E; CESD: Center for Epidemiologic Studies Depression; PGS: polygenic score.
All the statistics were calculated based on non-missing data for each variable. Categorical variables were calculated with count (frequency), and continuous variables were calculated with mean (standard deviation).
The overall p-value was calculated from chi-square test or analysis of variance for categorical or continuous variables as appropriate, interpreted as differences between groups.
The genetic instrumental variables (PGSs) were built with genome-wide significant single nucleotide polymorphisms identified from genome-wide association studies.
Associations between smoking behaviors and cognitive status in the Health and Retirement Study
Multivariable logistic regression analysis showed that smoking initiation and current smoking were associated with higher odds of cognitive impairment-no dementia in the European ancestry sample (Table 4). After adjusting for demographics, health conditions, and Alzheimer's disease genetics, ever-smokers had 1.26 (95% CI: 1.08, 1.46) times higher odds of cognitive impairment-no dementia relative to never-smokers and normal cognition, and the association was stronger for current smokers relative to former and never smokers (OR = 1.33, 95% CI: 1.06, 1.65). For participants who had ever smoked, we observed that a ten-year increase in smoking duration was associated with 1.11 (95% CI: 1.10, 1.22) times higher odds of cognitive impairment-no dementia. Finally, among former smokers, a ten-year increase in the age of stopping smoking was associated with 1.10 times higher odds of cognitive impairment-no dementia (95% CI: 1.01, 1.22). No statistically significant association was observed between any of the smoking behaviors and odds of dementia. Results from the continuous cognitive function analysis (total score of 27) were consistent with those from the categorical variable analysis (Supplemental Table 2).
Table 4. Associations between smoking behaviors and cognitive status in European ancestry sample (N = 7708), Health and Retirement Study, Wave 2006–2018. a
AD: Alzheimer's disease; APOE: Apolipoprotein E; BMI: body mass index; CESD: Center for Epidemiologic Studies Depression; CI: confidence interval; CIND: Cognitive Impairment-No Dementia; OR: odds ratio.
All the logistic regression analyses used “normal cognitive status” as the reference group.
All the models were adjusted for age, sex, years of education, last cognitive visit wave, and five ancestry-specific principal component sets.
Adjusted for history of hypertension, diabetes, stroke, ever drink alcohol, BMI, and CESD score in addition to variables in b.
Adjusted for APOE4 status in addition to variables in c.
Sample sizes varied across models due to differing numbers of eligible respondents. Only ever smokers were asked about time of smoking initiation, cessation, and total years, and response frequencies to these questions varied.
One-sample MR in the Health and Retirement Study
Assessing smoking polygenic scores as valid instrumental variables for smoking behaviors
Among all four polygenic scores, only polygenic scores for smoking initiation (nSNP = 257) and smoking cessation (nSNP = 29) were associated with one or more exposures of interest (Table 5). We chose to use the smoking initiation polygenic risk score as our primary instrumental variable given its stronger associations with smoking behaviors compared to the smoking cessation polygenic risk score. More specifically, in the European ancestry sample, after adjusting for age, sex, years of education, and ancestry-specific PCs, a one-standard deviation increase in smoking initiation polygenic score was associated with 1.15 times higher odds of membership in the ever-smoking group (95% CI: 1.10, 1.20). Similarly, a one-standard deviation increase in smoking initiation polygenic score was associated with 1.14 times higher odds of current smoking behavior (95% CI: 1.06, 1.23). Among ever smokers, the smoking initiation polygenic score was associated with longer total smoking years (effect size: 0.80, 95% CI: 0.21, 1.40). Formal test statistics confirmed the smoking initiation polygenic score as a valid instrument for these three smoking behaviors (F-statistics: ever smoking 34.48, current smoking 34.39, total smoking years 16.37).
Table 5. Associations between smoking polygenic risk scores and smoking behaviors in European ancestry sample (N = 7708), Health and Retirement Study, Wave 2006–2018. a
CI: confidence interval; OR: odds ratio; PC: principal component; PGS: Polygenic Score; SNP, single nucleotide polymorphisms.
Logistic regression analyses for binary outcomes (ever smoking, current smoking), and results were reported as ORs. Linear regression analyses for continuous outcomes (years since start, age stop, and total smoking years), and results were reported as beta coefficients. All the models were adjusted for age, sex, and five ancestry-specific principal component sets.
Number of SNPs used to build the PGS accordingly.
Next, we checked the associations of the smoking initiation polygenic risk score with baseline covariates using linear regressions and visualized with forest plots (Figure 2). The smoking initiation polygenic risk score was unrelated to all potential confounding factors except for a weak association with years of education. Thus, we adjusted for education as a covariate in the following MR analyses. This adjustment aims to account for the potential confounding effect of education on the association between PGS, smoking behaviors, and cognitive performance.

Associations between smoking initiation polygenic risk score and covariates in the European ancestry sample (N = 7708), Health and Retirement Study, Wave 2006–2018. Covariates were collected at the same wave as their last cognitive visit. APOE: Apolipoprotein E; BMI: body mass index; CESD: Center for Epidemiologic Studies Depression; PGS: polygenic score; SNP: single nucleotide polymorphisms.
Inferring causality between smoking behaviors and cognitive status
We next tested the inferred causal relationship between smoking behaviors and cognitive status, using the smoking initiation polygenic score as an instrumental variable with a one-sample MR framework (Table 6). In the one-sample MR analysis, we detected strong causal effects of ever smoking (p < 0.001) and current smoking (p < 0.001) on cognitive impairment-no dementia and on dementia. For example, current smokers had 1.71 (95% CI: 1.53, 1.91) times higher risk of cognitive impairment-no dementia relative to never-smokers and normal cognition. Among smokers, we also saw a weak but significant inferred causal effect of each 10-year increase in total smoking years on cognitive impairment-no dementia and on dementia. Similar results were observed when using the continuous cognitive score as the outcome. Specifically, we identified an inverse causal effect of ever smoking and current smoking on cognitive scores compared to never smokers, as well as an inverse causal effect of total years of smoking on cognitive scores among smokers (Supplemental Table 3). Sensitivity analyses using the LIML method produced stronger associations than the traditional two-stage least square estimates for the effect of smoking status on cognitive status (Supplemental Table 4). The IPTW method, which adjusted for potential confounders (age, sex, years of education, and five ancestry-specific principal component sets) through propensity score weighting, produced results consistent with the primary findings for cognitive impairment-no dementia. However, the previously observed significant causal effects of ever smoking and current smoking on dementia were no longer detected (Supplemental Table 5).
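The logic of the one-sample instrumental variable estimate can be sketched with a two-stage least squares calculation on simulated data. This is a simplified illustration: it uses a continuous outcome, a single instrument, and no covariates, whereas the analyses above used categorical cognitive status with adjustment. All variable names, effect sizes, and the unmeasured confounder `u` are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Slope from a simple linear regression of y on x."""
    return covariance(x, y) / covariance(x, x)


def two_stage_least_squares(z, x, y):
    """Stage 1: predict exposure x from instrument z. Stage 2: regress y on the prediction."""
    first_stage_slope = ols_slope(z, x)
    mz, mx = mean(z), mean(x)
    x_hat = [mx + first_stage_slope * (zi - mz) for zi in z]
    return ols_slope(x_hat, y)


# Simulated data: u confounds exposure and outcome; z affects y only through x.
rng = random.Random(1)
n = 20000
true_effect = 0.5
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.3 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [true_effect * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

naive = ols_slope(x, y)                 # biased by the unmeasured confounder
iv = two_stage_least_squares(z, x, y)   # uses only variation explained by z
print(f"naive OLS slope = {naive:.3f}, 2SLS estimate = {iv:.3f}")
```

With a single instrument, the two-stage estimate reduces to the ratio cov(z, y) / cov(z, x). A weak first stage makes that denominator small and the estimate unstable.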
Causal estimates with one-sample Mendelian randomization between smoking behaviors and cognitive status in European ancestry sample (N = 7708), Health and Retirement Study, Wave 2006–2018. a
CI: confidence interval; OR: odds ratio.
All models used the smoking initiation polygenic risk score as the instrumental variable. All the models were adjusted for age, sex, years of education, and five ancestry-specific principal component sets.
Sex and age stratified results for the European ancestry sample are presented in Supplemental Tables 6–7. In both sexes and across younger (<70) and older (≥70) age groups, we observed positive associations between smoking initiation, current smoking, smoking duration, age of stopping smoking, and cognitive impairment-no dementia. These associations were stronger in males and younger age groups. No significant association was observed between any smoking behavior and dementia. Across both sex and age groups using the same MR framework, we identified strong causal effects of ever smoking, current smoking, and smoking duration on cognitive impairment-no dementia and dementia. Notably, these associations were stronger in males compared to females, while the causal effects were more pronounced in older age groups.
African ancestry sample results
In the African ancestry sample (N = 1928), we also observed that smoking initiation was associated with higher odds of cognitive impairment-no dementia (Supplemental Table 8). After adjusting for demographics, health conditions, and Alzheimer's disease genetics, ever-smokers had 1.32 (95% CI: 1.05, 1.68) times higher odds of cognitive impairment-no dementia relative to never-smokers and normal cognition. However, no association was found between any of the smoking polygenic scores and smoking behaviors (Supplemental Table 9), so no further MR analysis was performed.
Two-sample MR with public GWAS summary statistics
The results of the two-sample MR analyses investigating the causal relationship between Alzheimer's disease and four smoking behaviors are shown in Table 7 and Figure 3. Our inverse variance weighted results showed no evidence that genetic liability to any of these smoking behaviors affects Alzheimer's disease risk, consistent with results from the other MR methods, including MR-Egger and weighted median. In the main MR analyses, the effect sizes (ORs) of smoking behaviors on Alzheimer's disease ranged from 1.02 to 1.18.
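For readers unfamiliar with the inverse variance weighted estimator, the following minimal sketch shows the fixed-effect form computed from per-SNP summary statistics. The four SNP effects and standard errors are invented for illustration and are not taken from the GWAS used here. The function name `ivw` is an assumption.

```python
import math


def ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted estimate and its standard error.

    Each SNP contributes a Wald ratio (outcome effect / exposure effect),
    weighted by beta_exposure^2 / se_outcome^2.
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    std_error = math.sqrt(1.0 / sum(weights))
    return estimate, std_error


# Hypothetical summary statistics (log-odds scale for the outcome).
bx = [0.030, 0.025, 0.041, 0.018]   # SNP effects on the smoking behavior
by = [0.004, -0.002, 0.006, 0.001]  # SNP effects on the disease outcome
se = [0.010, 0.012, 0.009, 0.011]   # standard errors of the outcome effects

log_or, log_or_se = ivw(bx, by, se)
lower = math.exp(log_or - 1.96 * log_or_se)
upper = math.exp(log_or + 1.96 * log_or_se)
print(f"IVW OR = {math.exp(log_or):.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

A confidence interval for the OR that spans 1 corresponds to the kind of null finding reported above.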

Scatter plots showing the Mendelian randomization effect of each smoking behavior on Alzheimer's disease.
Results of the two-sample Mendelian randomization analyses showing the bidirectional effects between smoking behaviors and Alzheimer's disease outcome using independent significant SNPs.
CI: confidence interval; MR: Mendelian randomization; OR: odds ratio; SNP: single nucleotide polymorphisms.
According to the test results of the MR assumptions (Table 7), the genetic instruments explained 0.1% to 0.9% of the variance in smoking behaviors. All F-statistics were above the standard cutoff (F-statistic > 10), suggesting that our MR analyses had sufficient instrumental strength that would not be affected by weak instrument bias. 32 Significant heterogeneity was apparent in our instrumental variables for smoking initiation (MR Egger, Q p-value = 0.027; inverse variance weighted, Q p-value = 0.031) and cigarettes per day (inverse variance weighted, Q p-value = 0.042), which were also shown in funnel plots and leave-one-out plots (Supplemental Figure 2). The MR-Egger intercept was centered around zero for all MR analyses. No outlier SNPs were identified by MR-PRESSO. Altogether, there was no evidence of unbalanced horizontal pleiotropy in our two-sample MR analyses.
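Two of the assumption checks mentioned above, Cochran's Q for heterogeneity and the MR-Egger intercept for directional pleiotropy, can be sketched from summary statistics as follows. This is a simplified illustration that omits standard errors and p-values. The function names and input values are hypothetical.

```python
def cochran_q(beta_exposure, beta_outcome, se_outcome):
    """Cochran's Q around the IVW estimate; compare to chi-square with len - 1 df."""
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    ivw_est = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw_est) ** 2 for w, r in zip(weights, ratios))
    return q, len(ratios) - 1


def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Weighted regression of outcome effects on exposure effects with an intercept.

    SNPs are first oriented so that every exposure effect is positive.
    A non-zero intercept suggests unbalanced (directional) pleiotropy.
    """
    bx = [abs(b) for b in beta_exposure]
    by = [o if b >= 0 else -o for b, o in zip(beta_exposure, beta_outcome)]
    w = [1.0 / se ** 2 for se in se_outcome]
    sw = sum(w)
    mean_bx = sum(wi * b for wi, b in zip(w, bx)) / sw
    mean_by = sum(wi * o for wi, o in zip(w, by)) / sw
    s_xy = sum(wi * (b - mean_bx) * (o - mean_by) for wi, b, o in zip(w, bx, by))
    s_xx = sum(wi * (b - mean_bx) ** 2 for wi, b in zip(w, bx))
    slope = s_xy / s_xx
    intercept = mean_by - slope * mean_bx
    return intercept, slope


# Hypothetical summary statistics for five SNPs.
bx = [0.030, 0.025, 0.041, 0.018, 0.036]
by = [0.004, -0.002, 0.006, 0.001, 0.009]
se = [0.010, 0.012, 0.009, 0.011, 0.010]

q_stat, q_df = cochran_q(bx, by, se)
egger_intercept, egger_slope = mr_egger(bx, by, se)
print(f"Cochran's Q = {q_stat:.2f} on {q_df} df")
print(f"MR-Egger intercept = {egger_intercept:.4f}, slope = {egger_slope:.3f}")
```

Heterogeneity (a large Q) and directional pleiotropy (a non-zero Egger intercept) are distinct diagnostics, which is why significant heterogeneity can coexist with an intercept centered on zero.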
We performed sensitivity analysis to examine reverse causality with Alzheimer's disease as the exposure and risk of the four smoking behaviors as outcomes. We found no clear evidence to suggest a causal effect of Alzheimer's disease on any of the smoking behaviors (Supplemental Table 10).
Discussion
Dementia is prevalent, and the identification of potentially modifiable risk factors is critically needed. The present study was conducted among older adults from combined waves of the Health and Retirement Study. To our knowledge, this is the first study to examine the causal relationships between different stages and kinds of tobacco use and cognitive status using both a one- and two-sample MR framework. We first tested the cross-sectional associations between smoking behaviors and cognitive status to benchmark with prior research. We observed that smoking initiation was associated with higher odds of cognitive impairment-no dementia in both European and African ancestry samples, after adjusting for demographics, health conditions, and Alzheimer's disease genetics. We also observed that current smoking, smoking duration, and age stopped smoking were associated with higher odds of cognitive impairment-no dementia. These associations were stronger in males and in individuals under 70 years of age. Then we newly applied an MR framework to examine whether smoking behaviors were causally associated with cognitive status, using a genetic predisposition to smoking initiation as an instrumental variable. We found evidence of causal effects of smoking initiation, current smoking, and smoking duration on cognitive impairment and dementia in the European ancestry sample using one-sample MR. The effects were particularly strong in males and individuals over 70 years old. However, these findings were not consistent in the smaller African ancestry sample, and they were not consistent in the two-sample MR framework.
The positive associations found between smoking initiation, current smoking, and cognitive impairment were similar to previous studies.8,48,49 Smokers may have elevated chronic oxidative stress in the brain and other organ systems, which may trigger dementia-pathophysiological processes. Computed tomography and magnetic resonance-based studies also found supportive evidence of abnormalities in brain morphology, perfusion, and neurochemistry in smokers. 50 Although there are several hypothesized pathways through which smoking may influence cognitive function, we were not able to identify specific mechanisms in this study.
An interesting finding from our association analysis was that statistically significant associations were observed between smoking behaviors and cognitive impairment-no dementia, but not dementia, in both ancestry samples. One explanation could be selective survival, which affects both dementia and smoking. In other words, our samples might be biased toward healthier smokers (individuals who survived or did not experience significant smoking-related morbidities), so the smoking-dementia relationship was likely underestimated. Smokers may die prematurely from other smoking-related diseases before developing dementia. In the Health and Retirement Study, the genotyped respondents were longer-lived than the non-genotyped respondents. 51 Moreover, the development of other smoking-related morbidities, such as cardiovascular diseases and cancer, may also impede participation in the longitudinal Health and Retirement Study. For example, we observed a significantly higher incidence of past stroke in the excluded sample (12.7%, N = 2786) compared to the participants included in our analyses (9.8%, N = 9636, p < 0.001).
In our analysis, we observed an increase in the magnitude of the OR for “current versus non-current smoker” comparison when moving from the “health condition” model with fewer covariates to the “AD genetics” model with more covariates. This phenomenon, while unexpected, may be attributed to the complex interactions between smoking status, health conditions, and genetic predispositions to AD. It is possible that genetic factors associated with AD exert a stronger influence on smoking behavior than initially anticipated, thereby amplifying the OR in the presence of these covariates. Additionally, the inclusion of genetic variables may uncover underlying confounding effects that were not apparent in the health condition model alone. This finding underscores the importance of considering genetic predispositions when examining the impact of smoking on health outcomes. Future research should further investigate these interactions to elucidate the mechanisms driving this association and validate our results.
In this study, we used one-sample MR models to assess causality and found evidence indicating that smoking initiation, current smoking, and longer smoking duration had causal relations with cognitive impairment. We sought to validate these findings with two-sample MR, but the nearest health outcome to cognitive impairment that had an available GWAS was Alzheimer's disease. These two-sample MR methods using public summary statistics yielded non-significant estimates, which showed little evidence for a causal relationship between smoking behaviors and Alzheimer's disease. Although Alzheimer's disease is the most common etiology of cognitive impairment, other diseases, such as cerebrovascular diseases, may also cause cognitive decline and dementia. 2 Thus, the one-sample and two-sample MR results do not necessarily contradict each other. Given that we were only able to broadly classify cognitive status in our current study, additional studies examining dementia subtypes and domains of cognition (e.g., episodic memories, mental status, and vocabulary) will be able to establish the specificity of associations and causations.
Assessing the validity of MR models requires a thorough evaluation of several assumptions. We performed formal tests to validate the smoking initiation polygenic risk score as an instrument for multiple smoking behaviors in the European ancestry sample (relevance assumption met). We acknowledge that while the F-statistic is a commonly used measure to assess the strength of genetic instruments, relying solely on this metric from a single study may not be sufficient to rule out weak instrument bias. 32 To assess the robustness of our MR findings, we employed the LIML method, 34 which is less sensitive to weak instruments than the standard two-stage least square method. Results showed that the LIML estimates were higher and more significant compared to the two-stage least square estimates, likely reflecting the robustness of LIML to weak instrument bias. 32 This suggests that the two-stage least square estimates may have been attenuated toward the null due to the weaker genetic instruments. Although the independence and exclusion restriction assumptions cannot be tested pragmatically, we made several attempts to test for potential violations. The lack of association between smoking initiation polygenic risk score and Alzheimer's disease polygenic risk score in our European sample indicates an absence of pleiotropy in the one-sample MR analysis. Additionally, the null associations with other potential confounders (apart from a weak association with years of education, which was adjusted for in the MR models) suggest that any direct associations between genetic predisposition for smoking and other confounders are minimal, thereby limiting their impact on our primary findings (Figure 2). Multiple tests of assumptions used in the two-sample MR suggested no evidence of unbalanced horizontal pleiotropy (Table 7). Consistent estimates across multiple methods also suggest that bias is less likely (Figure 3). 52 However, we observed small R2 values in the assumption check (Table 7), which indicates that the genetic instruments explain only a small proportion of the variance in the exposures. This can lead to weak instrument bias, where the instruments are not sufficiently strong to reliably estimate the causal effect, resulting in biased and imprecise estimates.
Higher smoking initiation polygenic scores were associated with greater likelihood to engage in smoking behaviors in the European ancestry sample only. Because our methods for computing the smoking polygenic scores depended on summary statistics from a GWAS that had focused primarily on participants of European ancestry, the results are likely non-generalizable to other ancestral groups. 17 Furthermore, we had a relatively small sample of African ancestry (1928 versus 7708 for the European ancestry sample). Thus, we were not surprised by the null association observed in the African ancestry sample. A similar study should be conducted in a larger African ancestry sample, with polygenic scores based on smoking behavior GWASs performed specifically in African ancestry populations.
This study has several strengths. First, studies examining smoking behaviors and cognitive status in the context of genetics are somewhat limited, and thus our findings contribute to an important yet sparse literature. Second, instead of using a single SNP, which only accounts for a very small fraction of variability in the incidence of the disease or behavior, we combined multiple independent variants (polygenic scores) to increase the amount of variation explained, and thus, increase the power of testing our hypotheses. 53 Third, we used data from two genetic ancestries in the Health and Retirement Study, which were independent from the parent GWAS of both smoking and Alzheimer's disease. Fourth, we examined the causal relationships with both one-sample and two-sample MR. Using a single population sample allows for the comparison of MR and conventional association findings in the same individuals, while two-sample MR uses genetic summary-level data, which are often available for larger sample sizes and thus provide increased statistical power to detect a causal effect. 41 These two MR methods are complementary, and using them together can provide more robust causal results.
Several limitations to this work are worth noting. First, because our research questions were tailored to different analytic samples, such as the general population (smokers and non-smokers) and specific population (smokers only), we are unable to directly compare the odds ratios across models. We focused our interpretations within a given sample instead of between samples. Second, although we attempted to account for confounding and population stratification, we caution against overinterpreting the causality of the findings. Future studies are warranted to replicate our findings in other cohorts and study populations. Third, cognitive status is not as well-defined as other more explicitly defined variables, such as smoking behaviors, and was examined at only one point, which almost certainly does not completely account for the lifetime variation in the trait instrumented by the gene. The next steps should include examining cognitive trajectories and dementia incidence.
In conclusion, in an analysis of the Health and Retirement Study, we observed smoking behaviors were cross-sectionally associated with higher odds of cognitive impairment-no dementia in European ancestry samples; however, no significant associations were observed with dementia. These findings were not replicated across ancestry groups, and evidence of causality was suggestive but not consistent across methods and samples. Our study adds to evidence supporting associations between smoking initiation, current smoking, and smoking duration with cognitive impairment-no dementia, which posits smoking cessation as a protective factor. Smoking behaviors are potentially modifiable, and interventions may decrease cases of cognitive impairment. While we observed suggestive evidence of a causal relationship in certain analyses, these findings were not universally consistent in our study. Therefore, promoting smoking cessation remains a prudent general public health strategy, as smoking is a known risk factor for numerous health conditions, and its potential impact on cognitive health warrants further investigation. However, additional studies on the time-span effect of smoking behaviors on cognitive impairment are critically needed, as well as studies in diverse genetic ancestries.
Supplemental Material
Supplemental material, sj-docx-1-alz-10.1177_13872877251320562 for Understanding causal estimates of smoking behaviors for cognitive impairment: A Mendelian randomization study by Mingzhou Fu, Herong Wang, Erin B. Ware and Kelly M. Bakulski in Journal of Alzheimer's Disease
Footnotes
Acknowledgments
The authors would like to thank the participants of the Health and Retirement Study. We would additionally like to thank the scientific community for helpful comments on our medRxiv posting.
Author contributions
Mingzhou Fu (Conceptualization; Data curation; Formal analysis; Methodology; Software; Validation; Visualization; Writing – original draft); Herong Wang (Validation; Visualization; Writing – review & editing); Erin B Ware (Conceptualization; Funding acquisition; Supervision; Writing – review & editing); Kelly M Bakulski (Conceptualization; Funding acquisition; Supervision; Writing – review & editing).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: All authors were supported by grants from the National Institute on Aging (R01 AG055406 and R01 AG067592). KB was supported by grants from the National Institute for Environmental Health Sciences and the National Institute for Minority Health and Health Disparities (R01 ES025531; R01 ES025574; and R01 MD013299). EW was supported by a grant from the National Institute on Aging (R01 AG055654). This work was supported by the Michigan Center for the Demography of Aging (P30 AG012846) and the Michigan Alzheimer's Disease Center (P30 AG053760). The funders did not have any role in the design, analysis, interpretation, or writing of the manuscript.
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article. Kelly M. Bakulski is an Editorial Board Member of this journal but was not involved in the peer-review process of this article nor had access to any information regarding its peer-review. The remaining authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
Publicly available datasets were analyzed in this study. Outcome and covariate data can be downloaded from the Health and Retirement Study public use dataset (HRS core files or RAND HRS Longitudinal File 2020 (V1)), produced and distributed by the University of Michigan with funding from the National Institute on Aging (grant number NIA U01AG009740), Ann Arbor, MI (2023). The RAND HRS Longitudinal File is an easy-to-use dataset based on the HRS core data. This file was developed at RAND with funding from the National Institute on Aging and the Social Security Administration.
Genetic data is available from the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (accession number NG00119.v1). The Health and Retirement Study genetic data is sponsored by the National Institute on Aging (grant numbers U01AG009740, RC2AG036495, and RC4AG039029) and was conducted by the University of Michigan.
Supplemental material
Supplemental material for this article is available online.
Correction (April 2025):
In Table 4, under the “Total smoking years (10-year)” row, the values in the “Health condition” column were previously reported as “1.10 (1.11, 1.21)” and have been corrected to “1.11 (1.10, 1.21).”
References
