Abstract
Background
Alzheimer's disease (AD) is strongly associated with slowly progressive hippocampal atrophy. Elucidating the relationships between local morphometric changes and disease status for early diagnosis could be aided by machine learning algorithms trained on neuroimaging datasets.
Objective
This study aimed to develop machine learning models, based on structural magnetic resonance imaging (sMRI) of the bilateral hippocampi, to accurately identify individuals across the AD severity spectrum and to predict their cognitive function.
Methods
High-resolution sMRI data of 120 patients with AD dementia, 232 patients with amnestic mild cognitive impairment (aMCI), and 206 healthy controls (HCs) were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI). The classification performance and cognitive predictive ability of hippocampal volume were evaluated by multivariate pattern analysis using support vector machine (SVM) classification and relevance vector regression (RVR), respectively, as implemented in the Pattern Recognition for Neuroimaging Toolbox. For validation, the analyses were repeated using a biomarker-based regrouping method and an independent local dataset.
Results
The SVM classifier achieved total accuracies of 94.17%, 80.85%, and 70.74% and areas under the receiver operating characteristic curve of 0.97, 0.87, and 0.72 for HC versus AD dementia, HC versus aMCI, and aMCI versus AD dementia classification, respectively. The RVR model significantly predicted both baseline cognitive function and mean cognitive function over three years of follow-up. Qualitatively consistent results were obtained with the biomarker-based regrouping method and the local dataset.
Conclusions
Machine learning methods based on the bilateral hippocampi distinguished individuals across the AD severity spectrum, with the highest accuracy for AD dementia versus HC, and significantly predicted baseline and longitudinal cognitive function.
Keywords
Introduction
Alzheimer's disease (AD) is a chronic and irreversible neurodegenerative disease that predominantly affects the elderly. 1 It is characterized by progressive neuronal degeneration and ensuing tissue atrophy, ultimately leading to cognitive impairment and dementia. Around 46.8 million people worldwide are currently living with AD-associated dementia, and the prevalence is expected to triple by 2050 owing to population aging. 2 Amnestic mild cognitive impairment (aMCI) is another common age-related condition, characterized by cognitive dysfunction with intact daily functioning 3 and a high risk of progression to AD dementia. 4 Early diagnosis remains challenging because symptoms are non-specific and the underlying pathology is difficult to detect with non-invasive approaches. 5 Although no curative treatment exists, the available anti-amyloid-β (Aβ) therapies (such as lecanemab and donanemab) represent a real breakthrough, 6 so early diagnosis is of great significance for timely intervention, which may help preserve cognition, maintain patients' daily functioning, and reduce caregiver burden.
However, the routine use of AD detection methods such as positron emission tomography (PET) and cerebrospinal fluid (CSF) testing for early diagnosis and large-scale screening is limited by their complex operation, high cost, and invasiveness. 7 In contrast, structural magnetic resonance imaging (sMRI) overcomes these limitations and is the preferred neuroimaging modality for monitoring AD-associated brain atrophy, because its high tissue contrast permits accurate measurement of structural properties such as surface area, thickness, and volume. 8 Recent morphometric studies have also attempted to identify structural biomarkers indicative of disease state for diagnosis and for monitoring disease progression. 9 In addition, automated machine learning techniques can extract more information from images, further improving diagnostic accuracy. Previous studies have shown that machine learning classifiers based on sMRI can accurately diagnose AD, with diagnostic accuracies ranging from 80% to 100%, and that the hippocampus is the strongest contributor to classification decisions.10–12
The hippocampus is one of the first brain structures to show detectable neurodegenerative changes and atrophy during AD progression, before the onset of cognitive symptoms. Hippocampal volume is currently the best-validated sMRI biomarker for AD. 13 The high spatial resolution and contrast of sMRI allow visualization of subtle morphometric changes in the hippocampus that could aid AD diagnosis, 14 and hippocampal volume has shown promise for disease monitoring and cognitive function forecasting in clinical trials.15,16 It is therefore worthwhile to construct generalizable machine learning models based on sMRI of the bilateral hippocampi for the early diagnosis and cognitive function prediction of AD.
The primary goal of the current study was to develop and test machine learning models that distinguish individuals across the AD severity spectrum and predict both baseline and longitudinal cognitive function from objective morphometric measures of the bilateral hippocampi derived from sMRI.
Methods
Participants
The data analyzed in the current study were obtained from Alzheimer’s Disease Neuroimaging Initiative (ADNI, http://adni.loni.usc.edu), an ongoing, multisite and longitudinal dataset. The ADNI was launched in 2003 as a public-private partnership, led by principal investigator Michael W. Weiner, MD. The primary goal of ADNI was to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. The study was approved by the Institutional Review Boards (IRBs) of all participating institutions. Informed written consent was obtained from all participants at each site. The detailed criteria of inclusion and exclusion are described on the website (http://www.adni-info.org).
Participants in the current study were recruited at baseline from ADNI-2. All of them underwent CSF Aβ42 measurement, sMRI scanning, and a battery of cognitive tests at baseline, and completed at least one cognitive assessment during three years of follow-up. In total, 558 participants were included (120 AD dementia patients, 232 aMCI patients, and 206 healthy controls (HCs)).
This study was approved by the Medical Research Ethics Committee of the First Affiliated Hospital of Anhui Medical University, China, according to the Declaration of Helsinki, as well as the institutional review board of all participating sites in ADNI. Informed consent was provided by all subjects.
Cognitive assessments
All participants received a standardized set of neuropsychological evaluations, including the Mini-Mental State Examination (MMSE), Montreal Cognitive Assessment (MoCA), and Auditory Verbal Learning Test (AVLT). Descriptions of the cognitive scales are provided in the Supplemental Material.
CSF biomarkers
CSF was collected by lumbar puncture, aliquoted to 500 μL in polypropylene tubes, and stored at −80°C. The concentrations of CSF biomarkers (Aβ42, t-tau, and p-tau) were measured using Elecsys electrochemiluminescence immunoassays according to the methods on the website (http://adni.loni.usc.edu).
MRI data acquisition and image pre-processing
The sMRI data were acquired using MPRAGE or equivalent protocols at varying resolutions, with a slice thickness of 1.2 mm, and had undergone several preprocessing steps performed by ADNI research groups: geometric distortions caused by gradient nonlinearity were corrected, B1 image intensity nonuniformities were corrected, and the N3 histogram peak sharpening algorithm was applied to reduce residual intensity nonuniformity.
The anatomical images were pre-processed using the voxel-based morphometry (VBM) 8 toolbox (http://dbm.neuro.uni-jena.de/vbm.html), implemented in Statistical Parametric Mapping (SPM) 8 (www.fil.ion.ucl.ac.uk/spm/software/spm8/), running on Matlab 2018 (MathWorks, USA). Detailed pre-processing steps are provided in the Supplemental Material.
Machine learning models
Machine learning was performed using the Pattern Recognition for Neuroimaging Toolbox (PRoNTo) running in Matlab 2018, 17 with age, sex, education, and total intracranial volume (TIV) regressed out as covariates. To evaluate the statistical significance of model performance, permutation tests were conducted: each model was retrained 1000 times with randomly permuted labels, and the observed performance was compared with the resulting null distribution.
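Covariate regression and label-permutation testing can be sketched in a few lines of Python. This is a minimal, stdlib-only illustration under simplifying assumptions, not PRoNTo's implementation: only one covariate is regressed out (the study used four, and PRoNTo applies the adjustment to the kernel), `score_fn` is a hypothetical stand-in for retraining and cross-validating the classifier, and the p value uses the common (k + 1)/(n_perm + 1) convention, which is assumed rather than confirmed to match PRoNTo. All data values are hypothetical.

```python
import random


def residualize(feature, covariate):
    """Remove the linear effect of ONE covariate from a feature (simple OLS).

    Simplified sketch: the study regressed out age, sex, education, and TIV,
    and PRoNTo performs the adjustment on the kernel rather than on features.
    """
    n = len(feature)
    mean_c = sum(covariate) / n
    mean_f = sum(feature) / n
    s_cc = sum((c - mean_c) ** 2 for c in covariate)
    s_cf = sum((c - mean_c) * (f - mean_f) for c, f in zip(covariate, feature))
    slope = s_cf / s_cc
    intercept = mean_f - slope * mean_c
    # Residual = observed value minus the value predicted from the covariate.
    return [f - (intercept + slope * c) for f, c in zip(feature, covariate)]


def permutation_p_value(score_fn, labels, n_perm=1000, seed=0):
    """Compare the observed score with scores obtained after shuffling labels.

    score_fn(labels) is a hypothetical stand-in for "retrain the model with
    these labels and return its cross-validated performance".
    """
    rng = random.Random(seed)
    observed = score_fn(labels)
    n_at_least_as_good = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if score_fn(shuffled) >= observed:
            n_at_least_as_good += 1
    # Adding 1 to numerator and denominator counts the observed labelling
    # itself, so the smallest attainable p value is 1 / (n_perm + 1).
    return observed, (n_at_least_as_good + 1) / (n_perm + 1)


# Hypothetical toy data: volume (arbitrary units) that partly depends on age,
# for 5 patients (label 1) and 5 controls (label 0).
ages = [70, 72, 74, 76, 78, 70, 72, 74, 76, 78]
volumes = [3.0, 2.9, 2.9, 2.8, 2.7, 3.9, 3.8, 3.8, 3.7, 3.6]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
adjusted = residualize(volumes, ages)


def mean_difference(lbls):
    """Toy performance score: control mean minus patient mean of adjusted volume."""
    controls = [v for v, lab in zip(adjusted, lbls) if lab == 0]
    patients = [v for v, lab in zip(adjusted, lbls) if lab == 1]
    return sum(controls) / len(controls) - sum(patients) / len(patients)


observed, p = permutation_p_value(mean_difference, labels)
print(f"observed difference = {observed:.3f}, permutation p = {p:.4f}")
```

Under this convention, 1000 permutations give a smallest attainable p value of 1/1001, which is consistent with the frequent report of p = 0.001 in the Results.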
Performance of the classification model
Class labels (AD dementia, aMCI, or HC) were first assigned to each subject's smoothed images. A DARTEL gray matter mask was applied as the first-level mask, and a mask of the bilateral hippocampi extracted from the Brainnetome atlas 18 was used as the second-level mask to build the kernel. In this atlas, the hippocampus is further divided into rostral and caudal subregions along its long axis. To verify the stability of the classification results, two alternative second-level masks of the bilateral hippocampi were also used: one extracted from the commonly used Automated Anatomical Labeling (AAL) atlas, and one obtained from a preliminary release of the EADC-ADNI harmonized segmentation protocol project. 19
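The two-level masking and kernel construction can be illustrated with a toy Python sketch. It treats the masks as a simple voxel-wise intersection (a voxel is kept only if it lies in both the gray matter mask and the hippocampal mask) and builds a linear, dot-product kernel between subjects; this is an assumed simplification of what PRoNTo does internally, and the 8-voxel "images" and helper names are hypothetical.

```python
def apply_masks(image, first_level_mask, second_level_mask):
    """Keep voxels inside BOTH masks (e.g., gray matter AND hippocampus)."""
    return [value
            for value, in_gm, in_roi in zip(image, first_level_mask, second_level_mask)
            if in_gm and in_roi]


def linear_kernel(feature_vectors):
    """Subject-by-subject matrix of dot products between masked feature vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in feature_vectors]
            for u in feature_vectors]


# Hypothetical 8-voxel images, flattened to 1-D lists, for three subjects.
gray_matter_mask = [1, 1, 1, 1, 0, 0, 1, 1]
hippocampus_mask = [0, 1, 1, 0, 0, 1, 1, 0]  # stand-in for a bilateral hippocampal ROI
subjects = [
    [0.2, 0.5, 0.6, 0.1, 0.0, 0.3, 0.4, 0.2],
    [0.3, 0.4, 0.5, 0.2, 0.1, 0.2, 0.3, 0.1],
    [0.1, 0.2, 0.3, 0.3, 0.0, 0.1, 0.2, 0.3],
]
features = [apply_masks(s, gray_matter_mask, hippocampus_mask) for s in subjects]
K = linear_kernel(features)
print(features)
print(K)
```

Swapping the second-level mask (Brainnetome whole, rostral, or caudal hippocampus; AAL; EADC-ADNI) changes only which voxels survive, so the rest of the pipeline can stay identical across mask comparisons.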
Support vector machine (SVM; Cortes and Vapnik) classification was used for the discrimination analysis. SVM is a supervised linear classifier for pattern recognition: it finds a hyperplane in feature space that separates the classes while maximizing the margin, that is, the distance between the hyperplane and the closest training samples. It is commonly used in neuroimaging because it generalizes well from relatively small training samples. During training, each feature is assigned a weight, and together the weights define the separating hyperplane; the label of a test sample is then determined by the sign of the weighted sum of its features plus a bias term. In this study, the default soft-margin parameter C = 1 was used.
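The soft-margin objective and decision rule described above can be made concrete with a small Python sketch. It minimizes the standard primal objective 0.5·||w||² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)) by plain full-batch subgradient descent, a deliberately simple teaching solver swapped in for illustration; it is not the optimizer used by PRoNTo. The learning rate, epoch count, and toy data are hypothetical choices.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=2000):
    """Full-batch subgradient descent on the soft-margin primal objective

        0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b)),

    with labels y_i in {-1, +1}. A teaching sketch, not a production solver.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = w[:]  # gradient of the 0.5 * ||w||^2 regularization term
        grad_b = 0.0
        for x_i, y_i in zip(X, y):
            if y_i * (dot(w, x_i) + b) < 1:  # inside the margin or misclassified
                for k in range(n_features):
                    grad_w[k] -= C * y_i * x_i[k]
                grad_b -= C * y_i
        w = [w_k - lr * g_k for w_k, g_k in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """The sign of the weighted feature sum plus bias gives the class label."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical 2-feature toy data (+1 = one group, -1 = the other).
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y, C=1.0)
# Distance between the two margin hyperplanes w.x + b = +1 and w.x + b = -1.
margin_width = 2.0 / dot(w, w) ** 0.5
print("weights:", w, "bias:", b, "margin width:", margin_width)
print([predict(w, b, x) for x in X])
```

A larger C penalizes margin violations more heavily (narrower margin, fewer training errors); C = 1 is simply the toolbox default reported in the text.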
A leave-one-out cross-validation (LOOCV) scheme was used to estimate the classification performance of each model.20,21 Receiver operating characteristic (ROC) curves were then constructed from the LOOCV results to assess how well the classifiers distinguished among AD dementia, aMCI, and HC individuals. Classification performance was summarized as the area under the ROC curve (AUC), with a larger AUC indicating better performance.
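LOOCV and the AUC can be sketched as follows. Each subject is held out once, the model is fit on the remaining subjects, and the held-out subject's continuous score is recorded; the AUC is then computed as the probability that a randomly chosen positive case outscores a randomly chosen negative case (ties count one half), which equals the area under the empirical ROC curve. The nearest-class-mean classifier and the volume values are hypothetical stand-ins chosen only to keep the example stdlib-only; the study itself used the SVM.

```python
def loocv_scores(X, y, fit, score):
    """Leave-one-out CV: train on n - 1 subjects, score the held-out subject."""
    held_out_scores = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        held_out_scores.append(score(model, X[i]))
    return held_out_scores


def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical stand-in classifier: nearest class mean on a single feature.
def fit_nearest_mean(X, y):
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)


def score_nearest_mean(model, x):
    mean_pos, mean_neg = model
    return abs(x - mean_neg) - abs(x - mean_pos)  # > 0 means closer to the positive class


volumes = [3.1, 3.3, 2.9, 3.4, 2.2, 2.4, 2.1, 2.6]  # hypothetical, arbitrary units
is_patient = [0, 0, 0, 0, 1, 1, 1, 1]
scores = loocv_scores(volumes, is_patient, fit_nearest_mean, score_nearest_mean)
print("LOOCV AUC:", roc_auc(scores, is_patient))
```

Because the AUC uses the ranking of held-out scores rather than a single threshold, it summarizes the whole ROC curve, which is why it complements the threshold-based accuracy figures.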
Performance of the regression model
Because the whole-hippocampus mask from the Brainnetome atlas yielded the highest classification accuracy, the regression model was built with the same masks: a DARTEL gray matter mask as the first-level mask and the Brainnetome bilateral hippocampal mask as the second-level mask.
Relevance vector regression (RVR) is a sparse Bayesian kernel regression method for predicting continuous values; its model has the same functional form as support vector regression (SVR), namely a weighted sum of kernel functions centred on the training samples. During training, the prior placed on each weight drives most weights to zero, so that only a small subset of samples (the relevance vectors) contributes to the prediction. Compared with SVR, RVR does not require tuning of a soft-margin parameter and typically yields sparser models, which reduces model complexity and may improve prediction accuracy. 22 Prediction performance was evaluated using the Pearson correlation coefficient (R), mean squared error (MSE), and coefficient of determination (R²) between predicted and observed scores.
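The three regression metrics can be computed as below. Two definitions of R² are in common use: the squared Pearson correlation, and 1 − SS_res/SS_tot. The R² values reported in Table 4 appear consistent with the squared-correlation definition (for example, 0.62² ≈ 0.38), but this is an inference from the reported numbers; both are shown because they can diverge sharply. The toy predictions are hypothetical and deliberately biased by +1 so that the two definitions give 1.0 and 0.2, respectively.

```python
import math


def pearson_r(observed, predicted):
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    s_op = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    s_oo = sum((o - mo) ** 2 for o in observed)
    s_pp = sum((p - mp) ** 2 for p in predicted)
    return s_op / math.sqrt(s_oo * s_pp)


def mse(observed, predicted):
    """Mean squared error between observed and predicted scores."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r2_squared_correlation(observed, predicted):
    """R^2 defined as the square of the Pearson correlation."""
    return pearson_r(observed, predicted) ** 2


def r2_residual(observed, predicted):
    """R^2 defined as 1 - SS_res / SS_tot (penalizes systematic bias)."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]   # hypothetical cognitive scores
predicted = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated but offset by +1
print(pearson_r(observed, predicted), mse(observed, predicted),
      r2_squared_correlation(observed, predicted), r2_residual(observed, predicted))
```

The example shows why R and MSE are reported together: a prediction can track individual differences perfectly (R = 1) while still being systematically off (MSE = 1).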
Statistical analysis
Statistical analyses were performed using the Statistical Package for the Social Sciences (SPSS) 23.0. Continuous variables were compared using one-way analysis of variance (ANOVA) with post hoc Bonferroni correction for pairwise comparisons, and categorical variables were compared among groups using the χ² test. Group differences in bilateral hippocampal volume were assessed with a general linear model including age, sex, education, apolipoprotein E (APOE) ε4 allele status, and TIV as covariates. A p value < 0.05 was considered statistically significant.
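The one-way ANOVA F statistic and the Bonferroni adjustment reduce to simple arithmetic, sketched below with hypothetical three-group data. Converting F into a p value requires the F distribution with (k − 1, n − k) degrees of freedom, which is outside the standard library (SciPy's `scipy.stats.f_oneway` performs the full test), so the sketch stops at the statistic. The Bonferroni step multiplies each pairwise p value by the number of comparisons and caps the result at 1.

```python
def one_way_anova_f(groups):
    """F = between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    k, n = len(groups), len(all_values)
    grand_mean = sum(all_values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def bonferroni(p_values):
    """Multiply each p value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical values for three groups (e.g., HC, aMCI, AD dementia).
hc, amci, ad = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]
f = one_way_anova_f([hc, amci, ad])
adjusted = bonferroni([0.01, 0.02, 0.5])  # hypothetical pairwise p values
print("F =", f, "Bonferroni-adjusted p values:", adjusted)
```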
Validation analysis
Validation analysis of regrouped participants of ADNI by CSF Aβ42 measurements
To represent AD progression in more detail, participants were further regrouped according to CSF Aβ42 level using a pre-established cutoff, with Aβ42 < 980 pg/mL defined as Aβ positive (A+). 23 Participants were categorized into the following subgroups: HC A-/A+, cognitively normal with negative/positive Aβ status; aMCI A-/A+, aMCI with negative/positive Aβ status; and AD dementia A-/A+, AD dementia with negative/positive Aβ status. The SVM model was then applied to distinguish among HC A-, aMCI A+, and AD dementia A+ participants.
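The regrouping rule can be written as a short Python sketch: Aβ status is assigned with a strict "less than 980 pg/mL" rule, combined with the clinical diagnosis, and only the HC A-, aMCI A+, and AD dementia A+ subgroups are retained for classification. The record layout (`id`, `diagnosis`, `ab42` keys) and the participant values are hypothetical.

```python
AB42_CUTOFF_PG_PER_ML = 980.0  # CSF Aβ42 strictly below this value is classed as A+


def amyloid_status(ab42_pg_per_ml):
    return "A+" if ab42_pg_per_ml < AB42_CUTOFF_PG_PER_ML else "A-"


def subgroup(diagnosis, ab42_pg_per_ml):
    """Combine the clinical diagnosis with Aβ status, e.g. 'aMCI A+'."""
    return f"{diagnosis} {amyloid_status(ab42_pg_per_ml)}"


# Subgroups entered into the SVM classification.
ANALYSED_SUBGROUPS = {"HC A-", "aMCI A+", "AD dementia A+"}


def select_for_svm(participants):
    return [p for p in participants
            if subgroup(p["diagnosis"], p["ab42"]) in ANALYSED_SUBGROUPS]


participants = [  # hypothetical records
    {"id": "s1", "diagnosis": "HC", "ab42": 1450.0},
    {"id": "s2", "diagnosis": "HC", "ab42": 720.0},
    {"id": "s3", "diagnosis": "aMCI", "ab42": 610.0},
    {"id": "s4", "diagnosis": "aMCI", "ab42": 1200.0},
    {"id": "s5", "diagnosis": "AD dementia", "ab42": 540.0},
    {"id": "s6", "diagnosis": "AD dementia", "ab42": 980.0},  # exactly at cutoff -> A-
]
selected = select_for_svm(participants)
print([(p["id"], subgroup(p["diagnosis"], p["ab42"])) for p in selected])
```

Because the rule is strictly "less than", a value of exactly 980 pg/mL falls in the A- group; whether the original analysis handled boundary values the same way is not stated.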
Validation analysis of participants from local dataset
To verify the stability of model performance, the analyses were repeated in an independent cohort from our local hospital, the First Affiliated Hospital of Anhui Medical University. In total, 423 participants matched for age, sex, and education were included (161 AD dementia patients, 130 aMCI patients, and 132 HCs). Details of the inclusion and exclusion criteria, cognitive assessment, MRI data acquisition, and image pre-processing steps are provided in the Supplemental Material.
Results
Cohort characteristics
The demographic data of all subjects are summarized in Table 1. No significant differences in age, education, ethnicity, race, sex ratio, or TIV among HC, aMCI, and AD dementia groups were detected (all p>0.05). However, body mass index was significantly higher in the aMCI and HC groups compared to AD dementia group (both p < 0.05). As expected, the cognitive assessment scores (MMSE, MoCA, and AVLT at baseline and during follow-up) and the concentration of Aβ42 were significantly higher in the HC group compared to AD dementia and aMCI groups, and higher in the aMCI group than the AD dementia group, while the proportion of APOE ε4 carriers and the concentration of t-tau and p-tau had the opposite trend (all p < 0.05).
Characteristics of the AD dementia, aMCI, and HC groups in the ADNI database.
AD: Alzheimer's disease; aMCI: amnestic mild cognitive impairment; HC: healthy control; BMI: body mass index; APOE: apolipoprotein E; MMSE: Mini-Mental State Examination; MoCA: Montreal Cognitive Assessment; AVLT: Auditory Verbal Learning Test; CSF: cerebrospinal fluid; Aβ: amyloid-β; TIV: total intracranial volume.
Comparison of hippocampal volume
The volumes of the whole, rostral, and caudal hippocampus were largest in HC participants, intermediate in aMCI patients, and smallest in AD dementia patients (p < 0.05, Table 2 and Figure 1).

Extraction of bilateral whole, rostral, and caudal hippocampi and intergroup comparisons of hippocampal volume in Alzheimer's disease (AD) dementia, amnestic mild cognitive impairment (aMCI), and healthy control (HC) groups (A-C, Bonferroni correction, ****p < 0.0001).
Group differences in gray matter volume of bilateral hippocampi between AD dementia, aMCI, and HC groups.
AD: Alzheimer's disease; aMCI: amnestic mild cognitive impairment; HC: healthy control.
Performance of the SVM classifier
The predictive power of the classifier was evaluated by calculating total accuracy, balanced accuracy, sensitivity, and specificity in the cross-validation testing phase (Table 3), and ROC curves were constructed from the cross-validated outputs (Figure 2). Compared with the rostral and caudal hippocampus, the SVM model trained on sMRI of the whole hippocampus achieved the highest accuracy, yielding total accuracies of 94.17%, 80.85%, and 70.74%; balanced accuracies of 93.30%, 80.87%, and 68.35% (all p = 0.001); sensitivities of 94.31%, 83.00%, and 56.59%; and specificities of 93.91%, 78.59%, and 78.92% for AD dementia versus HC, aMCI versus HC, and AD dementia versus aMCI, respectively (AUC = 0.97, 0.87, and 0.72, respectively). Similar results were obtained using the other two hippocampal masks (Supplemental Table 1).
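The four threshold-based metrics reported above follow from the confusion matrix of the cross-validated predictions. The sketch below uses hypothetical counts with unequal class sizes to show how total accuracy (weighted by class size) and balanced accuracy (the mean of sensitivity and specificity) diverge when classes are imbalanced; the counts are not taken from this study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Metrics from a binary confusion matrix (positive class = patients)."""
    sensitivity = tp / (tp + fn)  # proportion of positives correctly identified
    specificity = tn / (tn + fp)  # proportion of negatives correctly identified
    total_accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2  # insensitive to class sizes
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "total_accuracy": total_accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts: 50 positives and 200 negatives.
m = classification_metrics(tp=45, fn=5, tn=160, fp=40)
print(m)
```

With these counts, total accuracy (0.82) sits closer to the majority-class specificity (0.80), while balanced accuracy (0.85) weights both classes equally; close agreement between the two, as in the AD dementia versus HC and aMCI versus HC comparisons, suggests class imbalance had little effect.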

ROC curves from the support vector machine classifier and the general ROC model for classification of the AD spectrum based on the whole, rostral and caudal hippocampus (A-C). ROC: receiver operating characteristic; AUC: area under the curve; AD: Alzheimer's disease; aMCI: amnestic mild cognitive impairment; HC: healthy control.
Accuracy and area under the ROC curve (AUC) by hippocampal volume in classifying the AD dementia, aMCI, and HC groups.
ROC: receiver operating characteristic; AD: Alzheimer's disease; aMCI: amnestic mild cognitive impairment; HC: healthy control.
Performance of the RVR prediction model
Because the SVM model based on the whole hippocampus achieved the highest accuracy, the whole-hippocampus mask was used for the RVR prediction model (Table 4). The model significantly predicted not only baseline cognitive performance (MoCA: R = 0.62, p = 0.001, R² = 0.38, p = 0.010, MSE = 12.36, p = 0.001; AVLT-immediate: R = 0.63, p = 0.001, R² = 0.40, p = 0.003, MSE = 106.29, p = 0.001; AVLT-learning: R = 0.50, p = 0.001, R² = 0.25, p = 0.010, MSE = 5.96, p = 0.001) but also mean cognitive function over three years of follow-up (MoCA: R = 0.64, p = 0.001, R² = 0.41, p = 0.010, MSE = 18.07, p = 0.001; AVLT-immediate: R = 0.43, p = 0.001, R² = 0.19, p = 0.020, MSE = 110.45, p = 0.001; AVLT-learning: R = 0.38, p = 0.001, R² = 0.15, p = 0.023, MSE = 4.17, p = 0.001).
The performance of the prediction model.
MoCA: Montreal Cognitive Assessment; AVLT: Auditory Verbal Learning Test; R: Pearson correlation coefficient; MSE: mean squared error; R²: coefficient of determination.
Validation analysis
The SVM classification model yielded similar total accuracy for distinguishing AD dementia A+ from HC A- (95.80%), AD dementia A+ from aMCI A+ (70.80%), and aMCI A+ from HC A- (82.01%) by regrouping ADNI participants according to the concentration of CSF Aβ42 (Table 5).
Accuracy and area under the ROC curve (AUC) by hippocampal volume in classifying the AD dementia
ROC: receiver operating characteristic; AD: Alzheimer's disease; aMCI: amnestic mild cognitive impairment; HC: healthy control.
The SVM classification model yielded qualitatively similar results in the local dataset: total accuracy was similar for distinguishing AD dementia from HC (95.56%), lower for distinguishing aMCI from HC (74.05%), and higher for distinguishing AD dementia from aMCI (76.63%) (Supplemental Table 6). The RVR model also performed well in predicting baseline cognitive function in the local dataset (Supplemental Table 7).
Discussion
We developed and tested generalizable machine learning models, using PRoNTo implemented in Matlab, to distinguish patients across the AD spectrum and to predict baseline and longitudinal cognitive performance from sMRI measures of bilateral hippocampal volume. The main findings were as follows: (1) the SVM model showed high accuracy for distinguishing patients from controls and reasonable accuracy for distinguishing AD dementia from aMCI; (2) the RVR model significantly predicted baseline cognitive function and mean cognitive function over three years of follow-up; and (3) qualitatively consistent results were obtained with different hippocampal masks, a regrouping method based on CSF Aβ, and an independent local dataset.
The SVM model demonstrated high accuracy for distinguishing patients from controls and reasonable accuracy for distinguishing AD dementia from aMCI. It distinguished AD dementia patients from controls with a balanced accuracy of 93.30% (AUC = 0.97) and aMCI patients from controls with a balanced accuracy of 80.87% (AUC = 0.87), outperforming many established models trained on existing databases24–27 or on patients recruited at individual study sites.28,29 The close agreement between total and balanced accuracy suggests that performance was not inflated by class imbalance, and the SVM method also offers efficiency, interpretability, and robustness. 30 However, accuracy was only acceptable for distinguishing aMCI from AD dementia (balanced accuracy = 68.35%, AUC = 0.72),26,31 underscoring the need for training with much larger numbers of aMCI images. We selected SVM because it is well suited to small samples and offers good generalization and explainability; however, it is limited in scalability, practicability, and computational complexity.32,33 Continued training with larger image sets and the application of other models, such as random forests, may further improve classification accuracy, thereby facilitating earlier diagnosis and the identification of structural features predictive of the transition from aMCI to AD dementia.
The hippocampus is one of the first brain regions to exhibit AD pathology and ultimately one of the most severely damaged, as focal lesions of apoptotic neurons accumulate and lead to detectable hippocampal shrinkage. 34 Consistent with numerous previous reports, we found reduced bilateral hippocampal volume in both AD and aMCI patients.35,36 Classification based on hippocampal subregions was less accurate than classification based on the whole hippocampus. This suggests that inter-group differences across the whole hippocampus provide the classifier with more information than either subregion alone, giving whole-hippocampal volume a slightly greater role in AD diagnosis. Based on findings of hippocampal degeneration in AD models and patient samples, numerous studies over the past decade have used sMRI to non-invasively capture disease-related structural changes in the whole hippocampus.37,38 It may be possible to build models that predict AD onset and progression from regional structural changes, alone or combined with changes in other brain regions.
The MMSE is the most widely used screening tool for AD worldwide, while the MoCA is routinely applied in the clinical diagnosis of aMCI, 39 as total scores on both scales correlate strongly with disease progression (i.e., with the extent of cortical and hippocampal damage). In the current study, the AVLT was also included to provide a more targeted assessment of short- and long-term memory deficits. Previous studies have reported strong relationships between AVLT scores and AD-related hippocampal volume, highlighting the value of the AVLT for detecting and diagnosing AD at early stages. 40 Our results suggest that the RVR model based on bilateral hippocampal volume can predict baseline and follow-up scores on these scales, further supporting a contribution of the bilateral hippocampi to cognitive dysfunction in AD.
Regarding generalization, similar classification accuracies were obtained when participants were regrouped by CSF Aβ level in addition to clinical diagnosis, suggesting that the models were stable. Moreover, the model showed similar predictive performance in the local dataset and the ADNI database, which differ in race, educational level, and socioeconomic status, indicating that it can generalize to independent datasets.
This study has several limitations: (1) participants in the local dataset were mainly Han Chinese, so this cohort lacks diversity, and the absence of biomarker, genetic, and socioeconomic data further limits its representativeness; (2) the models were trained using only hippocampal sMRI, and their discriminative ability was relatively weak for aMCI, so training with images from other modalities, such as functional MRI and PET, may improve classification accuracy; (3) hippocampal segmentation was based on SPM, and the results were not validated with segmentation methods implemented in other software, such as FreeSurfer; (4) the inclusion and exclusion criteria reduced the likelihood of mixed pathologies, which limits the generalizability of the findings.
Conclusion
The SVM and RVR machine learning models based on hippocampal volume accurately distinguished AD from matched controls with satisfactory generalization and significantly predicted baseline and longitudinal cognitive function. These findings highlight the key role of the hippocampus in the early detection and diagnosis of AD and support the clinical application of machine learning classification models based on hippocampal structure. Additional training with larger image samples may further improve classification performance and enhance the utility of this tool for AD diagnosis and prognosis.
Supplemental Material
sj-docx-1-alz-10.1177_13872877241296130 - Supplemental material for Identification and cognitive function prediction of Alzheimer's disease based on multivariate pattern analysis of hippocampal volumes
Supplemental material, sj-docx-1-alz-10.1177_13872877241296130 for Identification and cognitive function prediction of Alzheimer's disease based on multivariate pattern analysis of hippocampal volumes by Ziwen Gao, Wanqiu Zhu, Yuqing Li, Wei Ye, Xiao Chen, Shanshan Zhou, Xiaohu Li, Xiaoshu Li, Yongqiang Yu and the Alzheimer's Disease Neuroimaging Initiative in Journal of Alzheimer's Disease
Footnotes
Acknowledgments
We thank all the participants for their cooperation during this study.
Data collection and sharing for the Alzheimer's Disease Neuroimaging Initiative (ADNI) is funded by the National Institute on Aging (National Institutes of Health Grant U19AG024904). The grantee organization is the Northern California Institute for Research and Education. In the past, ADNI has also received funding from the National Institute of Biomedical Imaging and Bioengineering, the Canadian Institutes of Health Research, and private sector contributions through the Foundation for the National Institutes of Health (FNIH) including generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics.
Author contributions
Gao Ziwen (Writing – original draft); Wanqiu Zhu (Writing – original draft); Yuqing Li (Data curation); Wei Ye (Data curation); Xiao Chen (Data curation); Shanshan Zhou (Data curation); Xiaohu Li (Data curation); Xiaoshu Li (Writing – review & editing); Yongqiang Yu (Writing – review & editing).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was financially supported by the National Natural Science Foundation of China (No. 82071905).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability
Supplemental material
Supplemental material for this article is available online.
References
