Multimodal machine learning for early risk stratification of post-stroke cognitive impairment

Abstract

Background

Post-stroke cognitive impairment (PSCI) is a major vascular contributor to dementia, significantly impacting long-term recovery and quality of life. Developing accurate prediction models is essential for early identification and timely intervention in high-risk individuals.

Objective

To develop and validate a stacking-based multimodal machine learning model integrating clinical, demographic, and neuroimaging features for early PSCI prediction in acute ischemic stroke (AIS) patients.

Methods

In this retrospective cohort study, 1070 AIS patients admitted to Lianyungang First People's Hospital from January 2020 to August 2023 were included. Demographic, clinical, and neuroimaging data were collected, and cognitive function was assessed 3–6 months post-stroke. PSCI was defined as a z-score ≤ −2.0 in at least one of five cognitive domains. A stacking ensemble model was developed, combining six base algorithms: XGBoost, Gradient Boosting Decision Trees, CatBoost, Support Vector Machine, Logistic Regression, and LightGBM. The final prediction was generated by a meta-model trained on base model outputs.

Results

Of the 1070 patients (mean age 67.4 ± 9.3 years, 61.5% male), 37.2% developed PSCI. The stacking model achieved 98.13% accuracy, 0.9972 AUC, and 0.9744 F1-score in internal validation. External validation showed 81.00% accuracy, 0.9049 AUC, and 0.8780 recall. Key predictors of PSCI included infarct volume, cortical lesions, medial temporal lobe atrophy, and baseline NIHSS score.

Conclusions

This stacking-based multimodal machine learning model demonstrates robust predictive performance for PSCI risk in AIS patients, serving as a reliable tool for early detection that may inform personalized intervention strategies to prevent progression to post-stroke dementia.

Keywords

Alzheimer's disease, machine learning, magnetic resonance imaging, post-stroke cognitive impairment, predictors, vascular dementia

Introduction

Post-stroke cognitive impairment (PSCI) has become a key concern in neurology because of its significant impact on patients' quality of life, daily functioning, and social interactions. PSCI typically occurs within six months after a stroke and is characterized by deficits in attention, memory, language, and executive functions.¹ These cognitive impairments severely affect the patient's ability to live independently and can lead to long-term disability. Traditional cognitive assessment tools such as the Montreal Cognitive Assessment (MoCA) and the Mini-Mental State Examination (MMSE) are commonly used in clinical practice, but they are limited in quantifying the multidimensional risk factors of PSCI and do not meet the demands of early screening.^2,3 Therefore, a more precise tool for early identification of PSCI is needed to improve patient care after stroke.

Previous studies have identified major risk factors associated with PSCI, including hypertension, smoking, diabetes, and dyslipidemia.⁴ These modifiable risk factors account for approximately 90% of the total risk for stroke, with hypertension being the most significant risk factor, closely linked to both ischemic stroke and hemorrhagic stroke. The growing importance of multi-omics approaches has contributed to discovering novel biomarkers and understanding the molecular mechanisms of PSCI.⁵ These techniques include advanced neuroimaging methods such as magnetic resonance imaging (MRI), positron emission tomography (PET), and diffusion tensor imaging (DTI), as well as cerebrospinal fluid (CSF) analysis, genetic testing, and blood-based biomarkers. While PET and CSF analysis offer valuable insights into neurodegenerative pathology, their invasive nature, specialized technical requirements, and high cost restrict their accessibility. It is worth noting that the burden of stroke is particularly heavy in low- and middle-income countries, which account for 75% of global stroke-related deaths and 81% of stroke-related disability-adjusted life years.⁶ Thus, techniques such as genetic testing remain impractical for routine clinical use due to financial constraints. In contrast, MRI offers a non-invasive and clinically feasible approach to assessing brain structure, providing key insights into PSCI-related changes while balancing diagnostic accuracy and resource availability, particularly in settings with access to MRI technology.^5–8

PSCI exhibits diverse phenotypic presentations and can be influenced by demographic factors as well as comorbidities associated with stroke. A recent study by Hua et al. (2023) identified significant variations in cognitive trajectories—encompassing global cognition, memory, and visuospatial abilities—before and after stroke within a large cohort of 13,311 patients.⁹ Notably, cognitive decline is most pronounced in the early post-stroke phase, and these changes may be influenced by educational background and other individual differences in cognitive capacity.⁸

However, despite considerable progress in identifying risk factors and biomarkers for PSCI, several issues remain unresolved. A major limitation of previous studies is the reliance on traditional cognitive assessment tools, such as the MMSE and MoCA, which are insufficient for addressing the multifaceted nature of PSCI and its diverse clinical manifestations.^2,3 Furthermore, although neuroimaging features, such as small vessel disease (SVD), have been shown to significantly contribute to PSCI prediction, most studies focus on single features and lack a comprehensive approach that integrates multi-modal imaging data and functional assessments.^10,11 These gaps underscore the need for an integrated, easily implementable model capable of rapidly assessing the possibility of PSCI onset, particularly in resource-limited settings where advanced diagnostic tools may not be readily available.

In this study, we aim to address this gap by proposing an innovative stacking multi-modal machine learning model for the early prediction of PSCI. The model integrates the patient's demographic data, neuroimaging features, clinical variables including vascular risk factors and depression, and functional assessments of speech, vision, hearing, and swallowing. The model employs base classifiers such as LightGBM and XGBoost,¹² with a support vector machine (SVM) meta-model to enhance prediction accuracy. Through this multi-modal framework, we propose a novel approach to early detection of PSCI, which may provide more timely and accurate intervention for high-risk patients in clinical settings.

Methods

Study design and population

This retrospective cohort study included 1070 patients admitted to the Department of Neurology at Lianyungang First People's Hospital between January 2020 and August 2023 with a diagnosis of acute ischemic stroke (AIS). In addition to laboratory and imaging assessments, all included patients underwent a standardized neuropsychological battery 3 to 6 months following stroke onset. The study was approved by the Medical Ethics Committee of The First People's Hospital of Lianyungang (approval No. KY-20240826002-01), and all study participants provided informed consent.

Patients were included based on the following criteria (Figure 1): (1) aged 50–90 years, (2) diagnosis of cerebral infarction consistent with the American Heart Association/American Stroke Association (AHA/ASA) guidelines for the early management of patients with AIS, confirmed by MRI showing acute infarction lesions and exclusion of other brain conditions that may lead to dementia,¹³ (3) admitted within 7 days of symptom onset, (4) no history of mood disorders or psychiatric illnesses, (5) willingness to undergo cognitive assessment, (6) availability of informed consent, and (7) complete cranial MRI data. Patients were excluded if they had (1) a history of major psychiatric disorders (e.g., schizophrenia, bipolar disorder) or a severe, active mood disorder, (2) inability or refusal to provide informed consent, (3) premorbid cognitive decline, defined as a documented diagnosis of dementia or mild cognitive impairment, or any history of cognitive impairment treatment prior to the index stroke, as ascertained through medical records and caregiver interviews, (4) neurodegenerative diseases, hereditary cerebral small vessel disease, or other significant brain pathologies that could independently cause cognitive impairment, or (5) inability to complete neuropsychological testing due to hearing loss, poor cooperation, or severe neurological deficits, including aphasia. A total of 133 patients (11.1% of the enrolled cohort of 1203) were excluded due to incomplete neuroimaging data or loss to follow-up at the 3-to-6-month cognitive assessment. The final analysis was therefore conducted on 1070 patients with complete data.

Figure 1.

Inclusion and exclusion flow chart of the PSCI study.

Post-stroke cognitive impairment

Cognitive performance was assessed within 7 days post-stroke using the MoCA and MMSE to evaluate cognitive status during the acute phase. To mitigate potential practice effects arising from item overlap, the administration order of the two tests was counterbalanced across participants. For the assessment of PSCI, the cognitive function was assessed 3 to 6 months after stroke using a standardized neuropsychological battery evaluating five cognitive domains¹⁴: memory (story memory, word list learning), language (picture naming, verbal fluency), executive function (Stroop Color-Word Test, Trail Making Test Part B), attention, and visuospatial function (Clock Drawing Test, Rey-Osterrieth Complex Figure Test). For patients with mild aphasia, only non-verbal subtests were administered, and the total score was prorated accordingly. Raw scores from each test were adjusted for age, sex, and education to generate standardized z-scores. Domain-specific scores were calculated as the average z-score of all relevant tests within each cognitive domain. PSCI was defined as having a z-score ≤ −2 standard deviations in at least one cognitive domain (memory, language, executive function, attention, or visuospatial function).
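The PSCI definition above can be illustrated with a minimal Python sketch. It assumes that test-level z-scores have already been adjusted for age, sex, and education; the patient values, the dictionary layout, and the function names are hypothetical and are not taken from the study data or code.

```python
from statistics import mean

# Hypothetical test-level z-scores for one patient, grouped by cognitive domain.
# The values are illustrative only.
patient = {
    "memory": [-2.4, -1.9],
    "language": [-0.5, -0.8],
    "executive": [-1.2, -1.6],
    "attention": [-0.3],
    "visuospatial": [-0.9, -1.1],
}

PSCI_THRESHOLD = -2.0  # cut-off stated in the text


def domain_scores(test_z):
    """Average the test-level z-scores within each cognitive domain."""
    return {domain: mean(scores) for domain, scores in test_z.items()}


def has_psci(test_z, threshold=PSCI_THRESHOLD):
    """Flag PSCI when at least one domain-level z-score is at or below the threshold."""
    return any(score <= threshold for score in domain_scores(test_z).values())


scores = domain_scores(patient)
print(scores)
print("PSCI:", has_psci(patient))
```

In this example the memory domain averages below the threshold, so the patient is flagged even though the other four domains are within range.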

Brain imaging

All participants underwent brain MRI within seven days of stroke onset using two 3.0-T whole-body MRI systems (GE Signa HDx or Philips Ingenia) with an 8-channel head coil. Imaging protocols included T1-weighted imaging (T1WI), T2-weighted imaging (T2WI), and fluid-attenuated inversion recovery (FLAIR) sequences, which provided high-resolution structural details for stroke evaluation. Imaging data included critical lesion characteristics, such as laterality, multifocality, and volume of infarction. Lesion locations were categorized as cortical, subcortical, or infratentorial regions. Additionally, cortical thickness and subcortical volumes were measured from baseline structural T1-weighted images using FreeSurfer 7.4.1 to obtain quantitative morphometric measures. Strategic lesion locations associated with cognitive function were identified, including the basal ganglia, thalamus, hippocampus, caudate nucleus, inferomedial temporal gyrus, and angular gyrus.^10,15 Global brain metrics, such as total brain surface area and volume, and region-specific measures, including hippocampal and amygdala volumes, were extracted. The brain volume refers to the total brain parenchyma volume excluding the ventricular volume (such as the lateral ventricles and third ventricle), providing a more accurate measure of the brain's gray and white matter.¹⁵ Cortical thickness was assessed in regions including the medial temporal lobe, cingulate gyrus, parahippocampal gyrus, and temporal gyri, along with additional areas such as the entorhinal cortex, inferior temporal gyrus, middle temporal gyrus, superior temporal gyrus, lateral orbitofrontal cortex, temporal pole, and transverse temporal gyri. Additionally, the volumes of the whole brain, along with the bilateral hippocampi and amygdalae, were measured.¹⁵ Quality control was rigorously conducted by trained technicians, and any data with significant artifacts were excluded from analysis. 
Small vessel disease was evaluated based on the presence of lobar or deep chronic microbleeds, white matter hyperintensities (WMH) rated on the modified Fazekas scale, and lacunar infarcts.¹⁶ For the predictive model, the WMH rating was converted into a binary variable: a patient was classified as having significant WMH if the Fazekas score was ≥ 2 in either region (periventricular or deep white matter). Furthermore, medial temporal lobe atrophy (MTLA) was assessed using the Scheltens scale,¹⁷ and significant MTLA was defined as a Scheltens score of ≥ 2. These imaging features were integrated into predictive modeling for PSCI.
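The binarization rules for WMH and MTLA can be written as two small helper functions. This is a sketch under stated assumptions: the two Fazekas regions are taken to be the periventricular and deep white matter ratings, a single Scheltens score per patient is assumed, and the function names are hypothetical.

```python
def significant_wmh(periventricular_score, deep_score):
    """Significant WMH if the Fazekas score is >= 2 in either region (assumed regions)."""
    return periventricular_score >= 2 or deep_score >= 2


def significant_mtla(scheltens_score):
    """Significant MTLA if the Scheltens score is >= 2."""
    return scheltens_score >= 2


print(significant_wmh(1, 2))  # one region reaches the cut-off
print(significant_mtla(1))    # below the cut-off
```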

Statistical analysis

Continuous variables were expressed as mean ± standard deviation (SD) or median with interquartile range (IQR), as appropriate, and categorical variables as frequencies and percentages. Comparisons between the PSCI and post-stroke no cognitive impairment (PSNCI) groups were conducted using t-tests or Mann-Whitney U tests for continuous variables and chi-square or Fisher's exact tests for categorical variables. Statistical significance was set at p < 0.001.
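As an illustration of the categorical comparisons, the sketch below implements a chi-square test for a 2 × 2 table in plain Python and applies it to two rows of Table 1. The use of Yates' continuity correction is an assumption, since the text does not state whether a correction was applied; the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test for the 2x2 table [[a, b], [c, d]]; returns (statistic, p)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    stat = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff = max(diff - 0.5, 0.0)  # continuity correction (assumed)
        stat += diff ** 2 / e
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p


# Atrial fibrillation: 37 of 672 (PSNCI) versus 59 of 398 (PSCI).
stat_af, p_af = chi_square_2x2(37, 672 - 37, 59, 398 - 59)
# Hypertension: 518 of 672 (PSNCI) versus 306 of 398 (PSCI).
stat_htn, p_htn = chi_square_2x2(518, 672 - 518, 306, 398 - 306)
print(f"Atrial fibrillation: chi2 = {stat_af:.2f}, p = {p_af:.2e}")
print(f"Hypertension: chi2 = {stat_htn:.2f}, p = {p_htn:.3f}")
```

Under this assumption, the atrial fibrillation comparison gives p < 0.001 and the hypertension comparison gives p = 1, in line with the values listed in Table 1.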

PSCI risk model

The pipeline consists of two components: selection of essential features and construction of the PSCI risk model. Initially, a total of 69 features were considered for inclusion in the model (Supplemental Table 1). The dataset was split into training and test sets at a ratio of 8:2 with stratification, ensuring consistent class proportions in both sets. In the first step, we designed a subgroup-based recursive feature elimination method to select key discriminative features. The 69 candidate features were first divided into four clinically meaningful subgroups: cerebral atrophy score, cerebral small vessel disease burden, infarct score, and other features. (1) Select the base models: we used six base models, namely XGBoost,¹⁸ gradient boosting decision tree (GBDT),¹⁹ CatBoost,²⁰ SVM,²¹ logistic regression,²² and LightGBM.²³ Mathematical details of these base models are provided in the Supplemental Material.
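The stratified 8:2 split can be sketched with scikit-learn's train_test_split. The feature matrix here is random placeholder data with the cohort's dimensions (1070 patients, 69 candidate features, 398 PSCI cases); only the class counts come from the text, and the use of scikit-learn is an assumption about tooling.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels with the cohort's class counts: 398 PSCI (1) and 672 PSNCI (0).
y = np.array([1] * 398 + [0] * 672)
# Placeholder features; the real 69 features are not reproduced here.
X = np.random.default_rng(42).normal(size=(1070, 69))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print("train:", X_train.shape, "test:", X_test.shape)
print("test PSCI:", int(y_test.sum()), "test PSNCI:", int((y_test == 0).sum()))
```

With stratification, the 20% hold-out keeps the class proportions of the full cohort, which is consistent with the 80 PSCI and 134 PSNCI test patients reported in the Results.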

Then, Bayesian optimization based on the Tree-structured Parzen Estimator (TPE) was used with 10-fold cross-validation to tune the hyperparameters of each model,²⁴ within a nested cross-validation scheme for the hyperparameter search. (2) For each subgroup independently, the tuned model was passed to the recursive feature elimination (RFE) procedure with the RFE parameter n_features_to_select = 1, so that features were eliminated one at a time. At each iteration, RFE removed the least important feature from the current feature set and refitted the model with the same hyperparameters, repeating until all features had been traversed. (3) Record the importance of each feature based on the RFE ranking. For each subgroup, we retained the top-performing features based on the cross-validated AUC. Then, for each subgroup, the two models with the highest cross-validated AUC were selected as base learners, and the features retained by these base learners were combined. The final set of 22 features was the union of the features retained by the top-two models across all four subgroups (Supplemental Table 2).
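The RFE ranking step for a single subgroup can be sketched as follows. This is a simplified illustration, not the study's implementation: it uses synthetic data, a logistic regression as the only estimator, and omits the TPE hyperparameter search (which would require an additional library such as Optuna or Hyperopt). For brevity, the ranking and the cross-validated AUC are computed on the same data, which a nested scheme would avoid.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for one feature subgroup (8 hypothetical features).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=42
)

estimator = LogisticRegression(max_iter=1000)

# n_features_to_select=1 with step=1 removes one feature per iteration until a
# single feature remains, which yields a complete ranking (1 = most important).
rfe = RFE(estimator=estimator, n_features_to_select=1, step=1).fit(X, y)
ranking = rfe.ranking_

# Evaluate the top-k features by 10-fold cross-validated AUC and keep the best k.
order = np.argsort(ranking)
aucs = {}
for k in range(1, X.shape[1] + 1):
    cols = order[:k]
    aucs[k] = cross_val_score(estimator, X[:, cols], y, cv=10, scoring="roc_auc").mean()

best_k = max(aucs, key=aucs.get)
print("ranking:", ranking.tolist())
print("best k:", best_k, "AUC:", round(aucs[best_k], 4))
```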

In the second step, we fused these base models in a stacking ensemble: the base-model outputs were passed to a meta-model, which produced the final prediction. The model therefore consists of two parts, the base models and the meta-model. For the base models, we considered diversity from four perspectives: algorithmic diversity, feature diversity, sample diversity, and random diversity. In the feature selection stage described above, we used six base models (XGBoost, GBDT, CatBoost, SVM, Logistic Regression, and LightGBM). For stacking, we selected eight base learners (the top-two models from each of the four subgroups) as the first-level estimators. Bayesian optimization was used to choose how many and which of these eight models to combine, and their outputs were input into the meta-model, whose output is the prediction of the stacking ensemble. This whole process was wrapped as a single model for TPE-based Bayesian optimization, during which the hyperparameters of the base models were frozen. The final PSCI risk model was the stacking ensemble with the best-performing meta-learner (SVM) after hyperparameter optimization. All analyses were performed with a fixed random seed (42) to ensure reproducibility.
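A stacking ensemble with an SVM meta-learner can be sketched with scikit-learn's StackingClassifier. The example is illustrative only: it uses synthetic data with 22 features, three scikit-learn base learners as stand-ins for the eight selected models (XGBoost, CatBoost, and LightGBM are not used here), and default hyperparameters rather than the tuned values from the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 22 selected features.
X, y = make_classification(
    n_samples=400, n_features=22, n_informative=6, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# First-level estimators (stand-ins for the study's base learners).
base_learners = [
    ("gbdt", GradientBoostingClassifier(n_estimators=50, random_state=42)),
    ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("svm", make_pipeline(StandardScaler(), SVC())),
]

# The meta-learner is trained on out-of-fold predictions of the base learners.
stack = StackingClassifier(estimators=base_learners, final_estimator=SVC(), cv=5)
stack.fit(X_train, y_train)

# The SVM meta-learner exposes a decision function, which is used to rank cases for AUC.
auc = roc_auc_score(y_test, stack.decision_function(X_test))
print("test AUC:", round(auc, 4))
```

Training the meta-learner on out-of-fold base predictions (cv=5 here) limits the leakage that would occur if it saw predictions made on the base learners' own training data.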

External validation

To evaluate the generalization ability of the model, data from 100 AIS patients admitted to our institution since August 2023 were used as an external validation set. The data were standardized to ensure consistency with the training set and then input into the trained predictive model. Performance metrics, including AUC, accuracy, sensitivity, specificity, and F1 score, were calculated to assess the model's performance on new data.
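One common way to keep validation data consistent with the training set is to standardize it with the mean and standard deviation estimated on the training data. The sketch below shows this with NumPy on random placeholder values; whether the study reused the training-set parameters in this way is an assumption, as the text does not specify it.

```python
import numpy as np

rng = np.random.default_rng(42)
# Placeholder data: 856 training patients and 100 validation patients, 3 features.
X_train = rng.normal(loc=50.0, scale=10.0, size=(856, 3))
X_external = rng.normal(loc=52.0, scale=12.0, size=(100, 3))

# Estimate the scaling parameters on the training data only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

# Apply the same parameters to both sets; the validation set is never refitted.
X_train_std = (X_train - mu) / sigma
X_external_std = (X_external - mu) / sigma

print("train mean after scaling:", X_train_std.mean(axis=0).round(3))
print("external mean after scaling:", X_external_std.mean(axis=0).round(3))
```

Because the validation set is scaled with training-set parameters, its standardized means need not be zero; any shift reflects a genuine difference between the two samples.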

Results

Baseline characteristics

Of the 1070 individuals included in this study, 398 patients were diagnosed with PSCI and 672 were identified as PSNCI (Figure 1). The mean age of the cohort was 67.4 ± 9.3 years, and neuropsychological assessments were conducted approximately 4 months after stroke onset (Table 1). The median National Institutes of Health Stroke Scale (NIHSS) score was 2 (IQR 1–5). Baseline characteristics were compared between the two groups to ensure the validity of subsequent analyses (Tables 1 and 2).

Table 1.

Demographic and clinical characteristics of PSCI and PSNCI groups.

	PSNCI (n = 672)	PSCI (n = 398)	p
Age	65.1 (8.99)	71.6 (9.25)	<0.001
Gender (m/f)	435/237	219/179	0.002
Education	7.67 (4.61)	5.44 (4.80)	<0.001
Height	169 (66.7)	165 (8.32)	0.101
Weight	69.7 (10.6)	68.1 (11.6)	0.024
BMI	25.1 (3.26)	25.1 (3.55)	0.909
Smoking, n (%)	304 (45.2%)	153 (38.4%)	0.035
Drinking, n (%)	247 (36.8%)	140 (35.2%)	0.65
Hypertension, n (%)	518 (77.1%)	306 (76.9%)	1
Diabetes, n (%)	263 (39.1%)	164 (41.2%)	0.546
Cancer, n (%)	48 (7.14%)	37 (9.30%)	0.253
Obstructive sleep apnea, n (%)	230 (34.2%)	158 (39.7%)	0.083
Atrial fibrillation, n (%)	37 (5.51%)	59 (14.8%)	<0.001
Coronary heart disease, n (%)	75 (11.2%)	65 (16.3%)	0.02
SBP	147 (17.5)	148 (18.7)	0.437
TC	4.63 (1.10)	4.54 (1.15)	0.22
Creatinine	66.7 (29.8)	66.0 (25.7)	0.679
Hb	139 (18.1)	134 (18.3)	<0.001
Glycated hemoglobin	6.58 (1.70)	6.65 (1.73)	0.494
Hospital stay days	9.54 (3.89)	10.3 (4.59)	0.006
TIA history, n (%)	273 (40.6%)	196 (49.2%)	0.007
Stroke history, n (%)	124 (18.5%)	126 (31.7%)	<0.001
Thrombolytic therapy, n (%)	122 (18.2%)	44 (11.1%)	0.003

PSCI: post-stroke cognitive impairment; PSNCI: post-stroke no cognitive impairment; BMI: body mass index; SBP: systolic blood pressure; TC: total cholesterol; Hb: hemoglobin; TIA: transient ischemic attack.

Table 2.

Neurological impairment after stroke in PSCI and PSNCI groups.

	PSNCI (n = 672)	PSCI (n = 398)	p
MoCA	26.1 (1.77)	20.0 (1.96)	<0.001
MMSE	27.8 (1.26)	23.6 (2.24)	<0.001
Hearing Decline, n (%)	225 (33.5%)	168 (42.2%)	0.005
Visual Impairment, n (%)	353 (52.5%)	263 (66.1%)	<0.001
Speech Clarity Issues, n (%)	215 (32.0%)	189 (47.5%)	<0.001
Swallowing Difficulties, n (%)	89 (13.2%)	105 (26.4%)	<0.001
Depression, n (%)	134 (19.9%)	226 (56.8%)	<0.001
mRS	1.41 (0.79)	2.06 (1.07)	<0.001
GCS	14.8 (0.91)	14.2 (1.54)	<0.001
NIHSS	2.17 (2.62)	4.86 (4.90)	<0.001
TOAST:			<0.001
CE	31 (4.61%)	44 (11.1%)
LAA	233 (34.7%)	211 (53.0%)
SVO	397 (59.1%)	133 (33.4%)
UD and OD	11 (1.64%)	10 (2.51%)

PSCI: post-stroke cognitive impairment; PSNCI: post-stroke no cognitive impairment; MoCA: Montreal Cognitive Assessment; MMSE: Mini-Mental State Examination; NIHSS: National Institutes of Health Stroke Scale; GCS: Glasgow Coma Scale; mRS: Modified Rankin Scale; TOAST: Trial of Org 10172 in Acute Stroke Treatment; LAA: large artery atherosclerosis; SVO: small vessel occlusion; CE: cardioembolism; UD: undetermined; OD: other determined.

Neuroimaging findings showed that the PSCI group had significantly larger infarct volumes (median: 2477.5 mm³, 95% CI: 2300.0–3036.0 mm³ versus median: 1802.0 mm³, 95% CI: 1730.0–1893.0 mm³; p < 0.001), a higher prevalence of cortical or strategic lesions, and a higher prevalence of significant MTLA (64.8% versus 34.2%; p < 0.001). Additionally, microbleeds (45.5% versus 35.6%; p = 0.002) and WMH (54.5% versus 29.6%; p < 0.001) were more common in the PSCI group (Table 3).

Table 3.

MR imaging characteristics of PSCI and PSNCI groups.

	PSNCI (n = 672)	PSCI (n = 398)	p
Lacunar infarction, n (%)	606 (90.2%)	380 (95.5%)	0.003
Microbleeds, n (%)	239 (35.6%)	181 (45.5%)	0.002
Medial temporal lobe atrophy, n (%)	230 (34.2%)	258 (64.8%)	<0.001
Brain white matter hyperintensities, n (%)	199 (29.6%)	217 (54.5%)	<0.001
Infarct volume	2389 (3046)	9979 (19364)	<0.001
Multifocality, n (%)	253 (37.6%)	272 (68.3%)	<0.001
Cortical, n (%)	187 (27.8%)	213 (53.5%)	<0.001
Subcortical, n (%)	37 (5.51%)	42 (10.6%)	0.003
Deep, n (%)	406 (60.4%)	233 (58.5%)	0.589
Infratentorial, n (%)	172 (25.6%)	73 (18.3%)	0.008
Left-sided lesions, n (%)	300 (44.6%)	206 (51.8%)	0.029
Whole brain surface area	1438321 (246697)	1361482 (235606)	<0.001
Whole brain volume	888622 (77405)	855247 (87237)	<0.001

These findings show that the PSCI and PSNCI groups differed significantly in demographic, clinical, and neuroimaging characteristics, indicating the range of factors associated with post-stroke cognitive impairment.

Essential features

Our screening process identified 22 key features with significant discriminative power for predicting PSCI risk. A complete description of all 22 selected features, including their categories and units of measurement, is provided in Supplemental Table 2. These features encompass various categories, including brain atrophy, small vessel disease burden, infarct scores, and other relevant features. These features were used in subsequent modeling and evaluation of model prediction performance.

Prediction models

To validate the performance of the prediction models, we randomly stratified 20% of the overall dataset (80 PSCI, 134 PSNCI) as an independent test set, while the remaining samples were used for model training. We compared the predictive performance of six machine learning models (XGBoost, GBDT, CatBoost, SVM, Logistic Regression, and LightGBM) with the PSCI risk model on the test set. The evaluation metrics, including accuracy, AUC, F1-score, precision, and recall, are presented for the seven models (Table 4).

Table 4.

Comparison of machine learning model performance for the prediction of PSCI.

	Accuracy	AUC	F1-score	Precision	Recall
PSCI Risk Model	0.9813	0.9972	0.9744	1.0000	0.9500
CatBoost	0.9626	0.9964	0.9518	0.9186	0.9875
GBDT	0.9346	0.9943	0.9157	0.8837	0.9500
LightGBM	0.9439	0.9945	0.9286	0.8864	0.9750
Logistic Regression	0.9626	0.9969	0.9518	0.9186	0.9875
SVM	0.9533	0.9966	0.9390	0.9167	0.9625
XGBoost	0.9533	0.9936	0.9398	0.9070	0.9750

The results indicate that the fusion strategy of combining multiple models provides superior overall performance compared with individual models (Figure 2). The stacking ensemble achieved the highest accuracy, AUC, F1-score, and precision among the seven models, with the most notable advantages in F1-score and precision. Its recall (0.9500), however, equaled that of GBDT and was lower than that of the other five base learners (0.9625–0.9875).
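The relationship between the metrics in Table 4 can be checked from confusion-matrix counts. The counts below are not reported in the text; they are inferred from the reported precision and recall together with the test set composition (80 PSCI, 134 PSNCI) and should be read as an assumption. The function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Inferred counts (assumption): PSCI risk model and CatBoost on the 214-patient test set.
stacking = classification_metrics(tp=76, fp=0, fn=4, tn=134)
catboost = classification_metrics(tp=79, fp=7, fn=1, tn=127)

for name, m in [("PSCI risk model", stacking), ("CatBoost", catboost)]:
    print(name, {k: round(v, 4) for k, v in m.items()})
```

Under these inferred counts, the stacking model makes no false-positive predictions but misses four PSCI cases, whereas CatBoost misses one case at the cost of seven false positives, which illustrates the precision and recall trade-off visible in Table 4.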

Figure 2.

Performance comparison of six machine learning models (XGBoost, GBDT, CatBoost, SVM, Logistic Regression, and LightGBM) with the PSCI risk model on an independent test set (20% of the total dataset). The x-axis represents the five evaluation metrics (accuracy, AUC, F1-score, precision, and recall), and the y-axis represents the corresponding metric values. The results show that the ensemble strategy of combining multiple models outperforms the individual models on accuracy, AUC, F1-score, and precision.

Feature importance

We utilized SHapley Additive exPlanations (SHAP) values from the PSCI risk model to determine the importance of features for PSCI prediction (Figure 3A).²⁵ The analysis revealed that MoCA and MMSE during the acute phase of stroke were the most critical variables, with wide SHAP value distributions indicating significant impact on model output. High feature values (magenta) are concentrated in the negative SHAP value region (Figure 3B), suggesting that higher MoCA and MMSE scores negatively contribute to the model's prediction. Conversely, low feature values (light blue) are concentrated in the positive SHAP value region, indicating that lower MoCA and MMSE scores positively influence the prediction. Additionally, brain atrophy scores such as “Whole Brain Volume”, “Cingulate-Isthmus Thickness (L)”, and “Temporal Pole Thickness (R)” were inversely related to PSCI risk. In contrast, “Infarct Volume” from the infarct scores showed a positive correlation, indicating that larger infarct volumes are associated with higher PSCI risk. Figure 3A confirms that MoCA carries the largest mean absolute SHAP value, followed by MMSE, underscoring the dominant role of acute-phase cognitive assessments. Figure 3B further shows that lower MoCA and MMSE scores consistently shift predictions toward PSCI, while greater whole brain volume and cortical thickness values are associated with negative SHAP contributions, reducing predicted PSCI risk.

Figure 3.

The SHapley Additive exPlanations values of the best prediction model. (A) Bar plot showing the mean absolute SHAP values of each feature, reflecting overall feature importance. (B) Beeswarm plot illustrating the SHAP value distribution for each feature across all patients. Each dot represents an individual patient; magenta dots indicate high feature values and light blue dots indicate low feature values. Positive SHAP values indicate a positive contribution to the predicted risk of PSCI, while negative SHAP values indicate a protective effect.
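The global importance shown in a SHAP bar plot is the mean absolute SHAP value of each feature across patients. The sketch below computes this ranking with NumPy from a small hypothetical SHAP matrix; the numbers are invented for illustration and are not the study's SHAP values.

```python
import numpy as np

feature_names = ["MoCA", "MMSE", "Whole brain volume", "Infarct volume"]

# Hypothetical SHAP matrix: rows are patients, columns are features.
shap_values = np.array([
    [-0.30, -0.20, -0.05,  0.02],
    [ 0.40,  0.25,  0.10,  0.08],
    [-0.25, -0.15, -0.02, -0.01],
    [ 0.35,  0.10,  0.06,  0.09],
])

# Mean absolute SHAP value per feature, sorted from most to least important.
importance = np.abs(shap_values).mean(axis=0)
order = np.argsort(importance)[::-1]
ranked = [(feature_names[i], float(importance[i])) for i in order]

for name, value in ranked:
    print(f"{name}: {value:.4f}")
```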

Local explanation for different evaluation units

Local explanations provide insights into the contribution and direction of each variable's effect on the model's decision for individual evaluation units (Figure 4). For one evaluation unit randomly selected from each of the PSCI and PSNCI samples, we visualized the local explanation using the SHAP algorithm. The vertical axis represents the variables used in the model and their values, while the horizontal axis indicates the magnitude and direction of each variable's influence on the model's decision. Blue represents inhibitory effects, and red represents promotive effects. The model output for the evaluation unit, obtained by accumulating the contributions of all variables, is denoted by f(x). MoCA, MMSE, and brain atrophy-related variables made the largest contributions to the model's decisions.

Figure 4.

Local explanations showing the contribution and direction of each variable's impact on the model's decision for individual evaluation units. The y-axis represents feature names and values; the x-axis represents the magnitude and direction of each feature's contribution to the model output f(x), relative to the baseline E[f(x)] = 0.38. SHAP algorithm was used to visualize the local explanation results for evaluation units randomly selected from PSCI and PSNCI samples.
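The relation between the baseline and the model output in a local SHAP explanation follows from the additivity property of SHAP values. In the notation of the caption above, with M denoting the number of model features (22 in this study) and φ_i(x) the SHAP value of feature i for evaluation unit x (symbols introduced here for illustration):

```latex
f(x) = E[f(x)] + \sum_{i=1}^{M} \phi_i(x), \qquad E[f(x)] = 0.38
```

Features with φ_i(x) > 0 move the output above the baseline toward PSCI, and features with φ_i(x) < 0 move it below the baseline.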

External validation

The model retained good predictive performance on the external validation set, which consisted of 100 patients, including 41 (41%) with PSCI. The proposed PSCI risk model achieved accuracy = 0.810 (95% CI: 0.722–0.875), AUC = 0.905 (95% CI: 0.839–0.971), F1-score = 0.791 (95% CI: 0.707–0.862), precision = 0.720 (95% CI: 0.595–0.844), and recall = 0.878 (95% CI: 0.778–0.978). The proportion of PSCI events (41%) was comparable to that in the development cohort (37.2%), reducing concern about event sparsity. The confidence intervals, particularly the lower bound of 0.839 for AUC, indicate acceptable reliability of the model's generalization. Although performance was lower than on the internal test set, these results indicate that the model generalizes to external data and can predict the occurrence of PSCI.
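Confidence intervals for proportions such as accuracy and recall can be computed in several ways, and the text does not state which method was used. The sketch below shows two standard options, a Wilson score interval and a normal-approximation (Wald) interval. The count of 81 correct predictions follows from the reported accuracy; the count of 36 detected PSCI cases out of 41 is inferred from the reported recall and is an assumption.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wilson_interval(successes, n, z=Z95):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def wald_interval(successes, n, z=Z95):
    """Normal-approximation (Wald) interval, clipped to [0, 1]."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


acc_lo, acc_hi = wilson_interval(81, 100)  # 81 of 100 patients classified correctly
rec_lo, rec_hi = wald_interval(36, 41)     # 36 of 41 PSCI cases detected (inferred)

print(f"accuracy 95% CI (Wilson): {acc_lo:.3f} to {acc_hi:.3f}")
print(f"recall 95% CI (Wald): {rec_lo:.3f} to {rec_hi:.3f}")
```

With these inputs, the Wilson interval for accuracy and the Wald interval for recall happen to coincide with the reported bounds (0.722–0.875 and 0.778–0.978), although this does not establish which method the analysis actually used.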

Discussion

This study successfully developed a predictive model for PSCI by integrating machine learning techniques with recursive feature elimination and hyperparameter optimization. SHAP analysis was employed to identify and interpret the most influential features, revealing MoCA performance, MMSE performance, whole brain volume, cingulate-isthmus thickness, temporal pole thickness, age, and infarct volume as the most impactful predictors. The model incorporated multimodal features, including neuropsychological scales, imaging features (e.g., brain volume, cingulate gyrus thickness, temporal pole thickness), and clinical data (e.g., age, hemoglobin, NIHSS scores), to significantly enhance predictive accuracy. Structural brain metrics, such as brain volume and local thickness, were strongly correlated with cognitive decline, particularly in post-stroke populations.²⁶ The integration of multimodal data not only captured complex interactions comprehensively but also improved both prediction accuracy and interpretability.^27,28 Furthermore, the Stacking ensemble method, combining models like SVM and logistic regression, provided distinct advantages in handling complex, nonlinear data, enhancing both predictive accuracy and robustness.²⁹

MoCA and MMSE scores, assessed within seven days post-stroke, emerged as the most influential predictors for PSCI. MoCA, with its high sensitivity to mild cognitive impairment, and MMSE, a more generalized cognitive status measure, provided complementary insights into cognitive deficits. The high SHAP values for these assessments confirm their critical role in early PSCI identification. Previous research has underscored the importance of early cognitive assessments, especially in stroke patients, to facilitate timely interventions.^10,30 Clinically, these non-invasive and accessible tools enable routine cognitive screening, supporting their role as core features in the PSCI prediction model.

Beyond functional assessments, neuroimaging features provided the structural basis for these cognitive deficits, further reinforcing the predictive value of the model. Whole brain volume, a marker of brain atrophy, strongly predicted cognitive decline, emphasizing the role of structural brain metrics in dementia-related impairments.²⁶ Regional measures, including cingulate-isthmus and temporal pole thickness, highlighted their roles in attention, executive function, memory, and language, with SHAP values confirming their relevance.^27,28 Infarct volume was another significant predictor, with larger lesions linked to severe cognitive deficits due to extensive neural damage. These findings highlight the importance of integrating neuroimaging metrics into PSCI prediction models to improve accuracy and sensitivity.

The predictive value of these neuroimaging features is further supported by established neurocognitive frameworks and neurobiological mechanisms. Whole brain volume reflects global structural integrity, with progressive atrophy following stroke representing a recognized substrate of secondary neurodegeneration and cognitive decline.³¹ The cingulate-isthmus, as a core node of the default mode network, is critically involved in memory consolidation and self-referential processing, and cortical thinning in this region has been consistently linked to cognitive deterioration.³² At the neurobiological level, post-stroke injury to the cingulate gyrus disrupts the structural integrity of the cingulum bundle, a white matter tract connecting the cingulate cortex to the hippocampus and parahippocampal gyrus,³³ thereby impairing hippocampal-dependent memory consolidation and contributing to the disconnection of large-scale cognitive networks.^34,35 Temporal pole thickness, reflecting a convergence zone for semantic memory and language comprehension, has similarly been associated with impairments in attention and memory when structurally compromised.³⁶ Ischemic damage to the temporal pole further disrupts the uncinate fasciculus, a white matter pathway critical for semantic memory retrieval and multimodal associative processing,³⁷ resulting in fragmentation of the temporal-frontal and temporal-limbic networks. 
Beyond focal lesion effects, stroke triggers secondary neurodegeneration through Wallerian degeneration of connected fiber tracts,³⁸ as evidenced by longitudinal neuroimaging studies demonstrating progressive cortical thinning and subcortical atrophy in regions remote from the primary infarct.^35,39 Concurrently, sustained neuroinflammatory cascades, including microglial activation and pro-inflammatory cytokine release, have been identified as key contributors to diffuse cognitive decline following stroke.⁴⁰ Infarct volume compounds these vulnerabilities by disrupting long-range cortical-subcortical connectivity, with larger lesions exerting strategic effects on cognitive networks.³¹ Together, these converging pathophysiological mechanisms provide a neurobiological foundation for the predictive value of cingulate-isthmus and temporal pole thickness in the proposed PSCI model, bridging the SHAP-identified imaging features with known pathophysiology.

Building upon the feature selection and neuroimaging insights described above, the integration of RFE and TPE significantly enhanced the performance of predictive models, as demonstrated in this study. Specifically, the optimized PSCI risk model achieved superior results across multiple metrics, including an AUC of 0.9972, an accuracy of 0.9813, and an F1 score of 0.9744, outperforming traditional machine learning models such as CatBoost and XGBoost. RFE effectively streamlined the feature selection process by iteratively removing irrelevant features, improving model interpretability and generalizability.^29,41 Similarly, TPE demonstrated its capacity for hyperparameter optimization by systematically exploring parameter combinations, ensuring robust performance across feature sets.⁴² This combination of RFE and TPE not only improved predictive accuracy but also enhanced model robustness, providing a reliable foundation for PSCI risk stratification.
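The elimination loop that RFE performs can be illustrated with a minimal sketch. The code below is not the pipeline used in this study: it assumes a plain logistic regression as the ranking model, uses synthetic data with hypothetical feature names (`signal_a`, `signal_b`, `noise_a`, `noise_b`), and omits the TPE hyperparameter search described above.

```python
import math
import random


def fit_logistic(X, y, epochs=300, lr=0.1):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - target  # predicted probability minus label
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def recursive_feature_elimination(X, y, names, n_keep):
    """Refit the model and drop the feature with the smallest |weight| until n_keep remain."""
    kept = list(range(len(names)))
    while len(kept) > n_keep:
        X_sub = [[row[j] for j in kept] for row in X]
        w, _ = fit_logistic(X_sub, y)
        weakest = min(range(len(kept)), key=lambda k: abs(w[k]))
        kept.pop(weakest)
    return [names[j] for j in kept]


# Synthetic data (assumption): two features shifted by class and two pure-noise
# features, all on a comparable scale so that |weight| is a usable importance score.
random.seed(0)
names = ["signal_a", "signal_b", "noise_a", "noise_b"]
y = [i % 2 for i in range(300)]
X = []
for label in y:
    shift = 1.0 if label == 1 else -1.0
    X.append([
        1.5 * shift + random.gauss(0, 1),
        1.2 * shift + random.gauss(0, 1),
        random.gauss(0, 1),
        random.gauss(0, 1),
    ])

selected = recursive_feature_elimination(X, y, names, n_keep=2)
print(selected)
```

Because the model is refitted after every removal, the ranking of the remaining features can change between rounds, which is what distinguishes RFE from a one-pass filter.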

The superior performance of our stacking model is further supported by comparison with recently published PSCI prediction models. Existing machine learning approaches have predominantly relied on either clinical variables alone or limited imaging inputs, which may constrain their capacity to capture the full complexity of post-stroke cognitive decline.^10,43,44 In contrast, our model integrates comprehensive multimodal inputs spanning neuropsychological assessments, structural neuroimaging, and clinical variables, enabling a more holistic representation of PSCI risk. As demonstrated in Table 4, this integration contributed to consistently superior performance across accuracy, AUC, F1-score, and precision metrics compared with all individual benchmark models evaluated in this study.
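For reference, the metrics compared above follow standard definitions, and a minimal sketch of how accuracy, precision, F1-score, and AUC are computed is given here. The labels, scores, and the 0.5 decision threshold are hypothetical toy values chosen for illustration, not study data; AUC is computed as the proportion of positive-negative pairs ranked correctly, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
    }


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = PSCI, 0 = no PSCI.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the metrics can diverge between validation cohorts.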

This study combines multimodal data and machine learning techniques for PSCI prediction. However, several limitations should be acknowledged. Firstly, the external validation was limited to temporal validation within our institution, without validation from external centers. This may affect the generalizability of the findings to other populations or settings. Secondly, the quantitative indicators used for cerebral small vessel disease in this study may require further refinement for improved accuracy. Thirdly, the exclusion of patients with hearing impairment, aphasia, or severe neurological deficits was necessitated by the requirement for reliable completion of standardized neuropsychological assessments. However, this criterion may introduce selection bias toward less severely affected patients, as excluded individuals tend to have more severe strokes and are potentially at higher risk for PSCI. Fourthly, the external validation dataset comprised only 100 patients from a single institution, which may limit the stability and reliability of the reported performance metrics. Although the proportion of PSCI events was comparable to that in the development cohort and confidence intervals remained acceptably narrow, larger multicenter external validation cohorts are needed to assess the generalizability of the model more robustly. Lastly, the final analysis was restricted to complete cases, with 133 patients (11.1%) excluded because their data were no longer retrievable; thus, the possibility of selection bias cannot be entirely excluded. Addressing these limitations in subsequent studies could contribute to the development of more robust and generalizable PSCI prediction models.
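The effect of cohort size on the precision of a reported accuracy can be illustrated with a Wilson score interval. The sketch below uses the external validation figures reported in this study (81.00% accuracy in 100 patients, that is, 81 correct classifications) and, for contrast, a hypothetical cohort of 1000 patients with the same proportion. The choice of the Wilson method and of the 95% level (z = 1.96) are assumptions made for illustration, since the interval method used in the study is not restated here.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Reported external validation: 81 of 100 patients correctly classified.
low_100, high_100 = wilson_interval(81, 100)

# Hypothetical larger cohort with the same observed accuracy (assumption).
low_1000, high_1000 = wilson_interval(810, 1000)

print(f"n=100:  {low_100:.3f} to {high_100:.3f}")
print(f"n=1000: {low_1000:.3f} to {high_1000:.3f}")
```

The interval narrows roughly with the square root of the sample size, which gives a quantitative sense of how much a larger multicenter cohort would stabilize the estimate.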

Conclusion

This study demonstrates methodological innovation in feature selection, model optimization, and data integration. The combination of RFE with hyperparameter optimization allowed for improved accuracy at each feature elimination step.^45,46 The integration of multimodal data enhanced both model performance and clinical insights.^10,30 Additionally, the stacking ensemble and SHAP-based analysis provided an effective, interpretable clinical decision support tool for PSCI prediction.^47,48 This model has demonstrated potential for application in the early risk stratification of post-stroke cognitive impairment. However, given the absence of calibration analysis, decision curve analysis, and multicenter external validation, further studies are warranted before broader clinical application can be recommended.
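As an indication of what the decision curve analysis mentioned above would involve, the sketch below computes net benefit at a single threshold probability, using the standard definition NB = TP/n − (FP/n) × pt/(1 − pt), and compares it with a treat-all strategy. The outcome labels, predicted probabilities, and the threshold of 0.5 are hypothetical values for illustration only and are not derived from the study cohort.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of acting on predictions at or above a threshold probability."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, probs) if p >= threshold and t == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = PSCI) and model-predicted probabilities.
y_true = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
probs = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1, 0.3, 0.05, 0.8, 0.15]

nb_model = net_benefit(y_true, probs, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(nb_model, nb_all)
```

Repeating the calculation over a range of thresholds and plotting the results against the treat-all and treat-none (net benefit of zero) strategies yields the decision curve.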

Supplemental Material

Supplemental material, sj-docx-1-alz-10.1177_13872877261454538 for Multimodal machine learning for early risk stratification of post-stroke cognitive impairment by Xingyongpei Zheng, Panpan Zhao, Na Wang, Xinyu Wang, Ziyi Dong, Caihong Gu, Yong Sun, Xinru Gu and Xinyu Zhou in Journal of Alzheimer's Disease

Footnotes

Acknowledgements

The authors sincerely thank all the patients who participated in this study for their cooperation and support.

Ethical considerations

Institutional Review Board approval was obtained. The study was approved by the Medical Ethics Committee of The First People's Hospital of Lianyungang, and all study participants provided informed consent (KY-20240826002-01).

Consent to participate

All study participants provided informed consent (KY-20240826002-01).

Consent for publication

Not applicable

Author contribution(s)

Xingyongpei Zheng: Conceptualization; Funding acquisition; Resources; Software; Supervision; Writing – original draft; Writing – review & editing.

Panpan Zhao: Data curation; Writing – review & editing.

Na Wang: Data curation; Writing – review & editing.

Xinyu Wang: Data curation; Writing – review & editing.

Ziyi Dong: Formal analysis; Writing – review & editing.

Caihong Gu: Project administration; Writing – review & editing.

Yong Sun: Methodology; Writing – review & editing.

Xinru Gu: Project administration; Writing – review & editing.

Xinyu Zhou: Validation; Visualization; Writing – original draft; Writing – review & editing.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Science and Technology Bureau of Lianyungang, Jiangsu Province, China (Project No. SF2426), High-level Project Cultivation Program of Lianyungang Medical Education Innovation Research Center, Nanjing Medical University (Project No. LYGYB09 and LYGYB20), Natural Science Foundation of the Jiangsu Higher Education Institutions of China Programme (Project No. 25KJB360002), and Zhongda Hospital Affiliated to Southeast University, Jiangsu Province High-Level Hospital Pairing Assistance Construction Funds (zdlyg13).

Declaration of conflicting interests

The authors declare no competing financial interests or personal relationships that could influence the work reported in this paper. The funding source (Science and Technology Bureau of Lianyungang, Jiangsu Province; Project SF2426) had no involvement in study design, data collection, analysis, interpretation, manuscript preparation, or publication decisions.

Data availability statement

The data and code used in this study are available from the corresponding author upon reasonable request.

Supplemental material

Supplemental material for this article is available online.

References

1. O’Donnell, Chin, Rangarajan, et al. Global and regional effects of potentially modifiable risk factors associated with acute stroke in 32 countries (INTERSTROKE): a case-control study. Lancet 2016; 388: 761–775.

2. Pendlebury, Mariz, Bull, et al. MoCA, ACE-R, and MMSE versus the National Institute of Neurological Disorders and Stroke–Canadian Stroke Network vascular cognitive impairment harmonization standards neuropsychological battery after TIA and stroke. Stroke 2012; 43: 464–469.

3. K-H, Cho S-J, et al. Cognitive impairment evaluated with vascular cognitive impairment harmonization standards in a multicenter prospective stroke cohort in Korea. Stroke 2013; 44: 786–788.

4. Zerna, Thomalla, Campbell BCV, et al. Current practice and future directions in the diagnosis and acute treatment of ischaemic stroke. Lancet 2018; 392: 1247–1256.

5. Shao, Zhou, et al. Multi-omics research strategies in ischemic stroke: a multidimensional perspective. Ageing Res Rev 2022; 81: 101730.

6. Pandian, Kalkonde, Sebastian, et al. Stroke systems of care in low-income and middle-income countries: challenges and opportunities. Lancet 2020; 396: 1443–1451.

7. Reas, Shadrin, Frei, et al. Improved multimodal prediction of progression from MCI to Alzheimer’s disease combining genetics with quantitative brain MRI and cognitive measures. Alzheimers Dement 2023; 19: 5151–5158.

8. Lim J-S, Lee J-J, Woo C-W. Post-stroke cognitive impairment: pathophysiological insights into brain disconnectome from advanced neuroimaging analysis techniques. J Stroke 2021; 23: 297–311.

9. Hua, Dong, Chen G-C, et al. Trends in cognitive function before and after stroke in China. BMC Med 2023; 21: 204.

10. Lee, Yeo N-Y, Ahn H-J, et al. Prediction of post-stroke cognitive impairment after acute ischemic stroke using machine learning. Alzheimers Res Ther 2023; 15: 147.

11. Georgakis, Fang, Düring, et al. Cerebral small vessel disease burden and cognitive and functional outcomes after stroke: a multicenter prospective cohort study. Alzheimers Dement 2023; 19: 1152–1163.

12. Bonkhoff, Rübsamen, Grefkes, et al. Development and validation of prediction models for severe complications after acute ischemic stroke: a study based on the Stroke Registry of Northwestern Germany. J Am Heart Assoc 2022; 11: e023175.

13. Jauch, Saver, Adams Jr, et al. Guidelines for the early management of patients with acute ischemic stroke: a guideline for healthcare professionals from the American Heart Association/American Stroke Association. Stroke 2013; 44: 870–947.

14. Hachinski, Iadecola, Petersen, et al. National Institute of Neurological Disorders and Stroke–Canadian Stroke Network vascular cognitive impairment harmonization standards. Stroke 2006; 37: 2220–2241.

15. McEvoy, Fennema-Notestine, Roddey, et al. Alzheimer disease: quantitative structural neuroimaging for detection and prediction of clinical and structural changes in mild cognitive impairment. Radiology 2009; 251: 195–205.

16. Fazekas, Chawluk, Alavi, et al. MR signal abnormalities at 1.5T in Alzheimer’s dementia and normal aging. AJR Am J Roentgenol 1987; 149: 351–356.

17. Scheltens, Launer, Barkhof, et al. Visual assessment of medial temporal lobe atrophy on magnetic resonance imaging: interobserver reliability. J Neurol 1995; 242: 557–560.

18. Chen, Guestrin. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY: Association for Computing Machinery, 2016, pp.785–794.

19. Friedman. Greedy function approximation: a gradient boosting machine. Ann Stat 2001; 29: 1189–1232.

20. Prokhorenkova, Gusev, Vorobev, et al. CatBoost: unbiased boosting with categorical features. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, Red Hook, NY: Curran Associates Inc., 2018, pp.6639–6649.

21. Cortes, Vapnik. Support-vector networks. Mach Learn 1995; 20: 273–297.

22. Cox. The regression analysis of binary sequences. J R Stat Soc Ser B Stat Methodol 1958; 20: 215–232.

23. Meng, Finley, et al. LightGBM: a highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems 30. Red Hook, NY: Curran Associates, 2017, pp.3146–3154.

24. Bergstra, Bardenet, Bengio, et al. Algorithms for hyper-parameter optimization. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, Red Hook, NY: Curran Associates Inc., 2011, pp.2546–2554.

25. Lundberg, Lee S-I. A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, Red Hook, NY: Curran Associates Inc., 2017, pp.4768–4777.

26. McKhann, Knopman, Chertkow, et al. The diagnosis of dementia due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association Workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement 2011; 7: 263–269.

27. Desikan, Fan, Wang, et al. Genetic assessment of age-associated Alzheimer disease risk: development and validation of a polygenic hazard score. PLoS Med 2017; 14: e1002258.

28. Long, Coble, Xiong, et al. Preclinical Alzheimer’s disease biomarkers accurately predict cognitive and neuropathological outcomes. Brain 2022; 145: 4506–4518.

29. Giorgio, Jagust, Baker, et al. A robust and interpretable machine learning approach using multimodal biological data to predict future pathological tau accumulation. Nat Commun 2022; 13: 1887.

30. Jessen, Amariglio, Buckley, et al. The characterisation of subjective cognitive decline. Lancet Neurol 2020; 19: 271–278.

31. Rost, Brodtmann, Pase, et al. Post-stroke cognitive impairment and dementia. Circ Res 2022; 130: 1252–1271.

32. Thomas, Sheelakumari, Kannath, et al. Regional cerebral blood flow in the posterior cingulate and precuneus and the entorhinal cortical atrophy score differentiate mild cognitive impairment and dementia due to Alzheimer disease. AJNR Am J Neuroradiol 2019; 40: 1658–1664.

33. Bubb, Metzler-Baddeley, Aggleton. The cingulum bundle: anatomy, function, and dysfunction. Neurosci Biobehav Rev 2018; 92: 104–127.

34. Delano-Wood, Stricker, Sorg, et al. Posterior cingulum white matter disruption and its associations with verbal memory and stroke risk in mild cognitive impairment. J Alzheimers Dis 2012; 29: 589–603.

35. Haque, Gabr, Hasan, et al. Ongoing secondary degeneration of the limbic system in patients with ischemic stroke: a longitudinal MRI study. Front Neurol 2019; 10: 154–164.

36. Martikainen, Kemppainen, Johansson, et al. Brain β-amyloid and atrophy in individuals at increased risk of cognitive decline. AJNR Am J Neuroradiol 2019; 40: 80–85.

37. Zhang, Chang, Park, et al. Uncinate fasciculus and its cortical terminals in aphasia after subcortical stroke: a multi-modal MRI study. Neuroimage Clin 2021; 30: 102597.

38. Zedde, Grisendi, Assenza, et al. Stroke-induced secondary neurodegeneration of the corticospinal tract-time course and mechanisms underlying signal changes in conventional and advanced magnetic resonance imaging. J Clin Med 2024; 13: 1969.

39. Brodtmann, Werden, Khlif, et al. Neurodegeneration over 3 years following ischaemic stroke: findings from the cognition and neocortical volume after stroke study. Front Neurol 2021; 12: 754204.

40. Leng, Edison. Neuroinflammation and microglial activation in Alzheimer disease: where do we go from here? Nat Rev Neurol 2021; 17: 157–172.

41. Borchert, Azevedo, Badhwar, et al. Artificial intelligence for diagnostic and prognostic neuroimaging in dementia: a systematic review. Alzheimers Dement 2023; 19: 5885–5904.

42. Liang, et al. Improving genomic prediction with machine learning incorporating TPE for hyperparameters optimization. Biology (Basel) 2022; 11: 1647.

43. Wang, Chen, et al. Predicting post-stroke cognitive impairment using machine learning: a prospective cohort study. J Stroke Cerebrovasc Dis 2023; 32: 107354.

44. Zhong, Zhao, et al. Development and validation of a machine learning-based risk prediction model for post-stroke cognitive impairment. Sci Rep 2025; 15: 32942.

45. Richhariya, Tanveer, Rashid. Diagnosis of Alzheimer’s disease using universum support vector machine based recursive feature elimination (USVM-RFE). Biomed Signal Process Control 2020; 59: 101903.

46. Fayemiwo, Olowookere, Olaniyan, et al. Immediate word recall in cognitive assessment can predict dementia using machine learning techniques. Alzheimers Res Ther 2023; 15: 111.

47. Boutet, Madhavan, Elias GJB, et al. Predicting optimal deep brain stimulation parameters for Parkinson’s disease using functional MRI and machine learning. Nat Commun 2021; 12: 3043.

48. Albizu, Fang, Indahlastari, et al. Machine learning and individual variability in electric field characteristics predict tDCS treatment response. Brain Stimul 2020; 13: 1753–1764.
