Abstract
Background:
Noninvasive identification of amyloid-β (Aβ) is important for better clinical management of mild cognitive impairment (MCI) patients.
Objective:
To investigate whether radiomics features in the hippocampus in MCI improve the prediction of cerebrospinal fluid (CSF) Aβ42 status when integrated with clinical profiles.
Methods:
A total of 407 MCI subjects from the Alzheimer’s Disease Neuroimaging Initiative were allocated to training (n = 324) and test (n = 83) sets. Radiomics features (n = 214) from the bilateral hippocampus were extracted from magnetic resonance imaging (MRI). A cut-off of <192 pg/mL was applied to define CSF Aβ42 status. After feature selection, random forest classifiers with subsampling methods were used to develop three models to predict CSF Aβ42 status: 1) a radiomics model; 2) a clinical model based on clinical profiles; and 3) a combined model based on radiomics and clinical profiles. Their predictive performance was validated in the test set. A prediction model using hippocampus volume was also developed and validated.
Results:
The best-performing radiomics model showed an area under the curve (AUC) of 0.674 in the test set. The best-performing clinical model showed an AUC of 0.758 in the test set. The best-performing combined model showed an AUC of 0.823 in the test set. The hippocampal volume model showed a lower performance, with an AUC of 0.543 in the test set.
Conclusion:
Radiomics models from MRI can help predict CSF Aβ42 status in MCI patients and potentially triage the patients for invasive and costly Aβ tests.
INTRODUCTION
Mild cognitive impairment (MCI) is often considered a prodromal stage of Alzheimer’s disease (AD); however, patients with MCI can vary, with different rates of progression toward AD [1]. The identification of MCI patients at risk for dementia due to AD is of utmost importance to predicting disease prognosis, as well as for potential preventative and therapeutic treatments [2]. Therefore, biomarker-based detection of initial amyloid-β (Aβ) pathology is important for better clinical management of MCI, potentially providing an opportunity to start disease-modifying therapies prior to progression of AD.
Aβ pathology can be assessed by measurement of Aβ concentrations in the cerebrospinal fluid (CSF) or via molecular imaging techniques, such as positron emission tomography (PET) scans using a specific radioligand for Aβ [3]. However, obtaining CSF by lumbar puncture is invasive, and PET scans are costly, are invasive due to radiation exposure, and are not always available [4]. Therefore, finding non-invasive predictive biomarkers for Aβ status could reduce both the number of invasive examinations required and the financial burden.
Structural neuroimaging using magnetic resonance imaging (MRI) has been shown to be useful in characterizing dementia and cognitive decline due to AD pathology [5, 6]. Structural changes in AD-vulnerable structures, such as the entorhinal cortex, hippocampus, and temporal lobe, have been reported to be diagnostic indicators of cognitive impairment and have even been used for the prediction of amyloid pathology [6]. Compared with CSF studies and PET scans, MRI has the advantages of being non-invasive and of having its cost reimbursed in most countries. Therefore, if MRI can predict Aβ pathology, it would hold potential advantages over CSF studies or PET scans.
Radiomics is an emerging field that extracts automated quantifications of radiologic phenotypes using data characterization algorithms [7]. Because radiomics models use high-throughput imaging features, they are more likely to reveal hidden information that is inaccessible with single-parameter approaches. To the best of our knowledge, there has been no previous study of the use of radiomics to predict amyloid pathology in patients with MCI. We hypothesized that radiomics features of brain MRI, along with machine learning techniques, in MCI patients would improve the prediction of CSF Aβ42 status when integrated with clinical and genetic profiles.
METHODS
Patient population
Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. For up-to-date information, see http://www.adni-info.org.
A total of 494 patients diagnosed with MCI who were enrolled in the Alzheimer’s Disease Neuroimaging Initiative-GO (ADNI-GO) and ADNI2 database were included in this study. The eligible patients were those who completed baseline visits and underwent MRI. Of these, we excluded those who had 1) missing demographics or neuropsychological (NP) test data (n = 68), 2) errors in hippocampus masks or severe artifacts on MRI (n = 18), or 3) errors in radiomics processing (n = 1). Finally, 407 patients were enrolled in this study. The enrolled patients were randomly allocated to training (n = 324) and test (n = 83) sets (Fig. 1).

Fig. 1. Patient inclusion flowchart.
Apolipoprotein E (APOE) gene polymorphism was assessed, and patients were divided into ɛ4 carriers (ɛ4/ɛ4 or ɛ3/ɛ4) and non-carriers according to the presence of the APOE ɛ4 allele. NP test results, including the Mini-Mental State Examination (MMSE), the 11-item Alzheimer’s Disease Assessment Scale cognitive subscale (ADAS-cog), Logical Memory I (LM I) immediate recall, and Logical Memory II (LM II) delayed recall, were obtained for the MCI patients [8–10]. The total number of story units recalled in LM I was labeled as the LM I total score, and the total number of story units recalled in LM II was labeled as the LM II total score. The total number of cues in LM II was labeled as the LM II cue score. CSF Aβ42 was measured for all patients with available CSF samples by the ADNI Biomarker Core at the University of Pennsylvania School of Medicine [11]. CSF Aβ42 was dichotomized into Aβ− and Aβ+ groups using a previously defined CSF concentration threshold (Aβ+ if CSF Aβ42 <192 pg/mL) [12].
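The dichotomization step can be illustrated with a minimal Python sketch. The function name and the example concentrations are hypothetical and serve only to show how the strict <192 pg/mL cut-off behaves at the boundary; they are not ADNI data.

```python
# Cut-off stated in the text: CSF Abeta42 below 192 pg/mL is labeled amyloid-positive.
ABETA42_CUTOFF_PG_ML = 192.0


def is_abeta_positive(csf_abeta42_pg_ml: float) -> bool:
    """Return True when the CSF Abeta42 concentration is below the cut-off."""
    return csf_abeta42_pg_ml < ABETA42_CUTOFF_PG_ML


# Hypothetical concentrations (pg/mL), for illustration only.
example_concentrations = [150.0, 192.0, 240.5]
example_labels = [is_abeta_positive(value) for value in example_concentrations]
```

Because the comparison is strict, a value of exactly 192 pg/mL falls in the Aβ− group under this reading of the threshold.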
MRI acquisition
MRIs were acquired using a 3-Tesla system as per standardized protocols compatible with the ADNI [13]. T1-weighted images were acquired using an axial three-dimensional spoiled gradient echo sequence. Axial T2 fluid-attenuated inversion recovery images were acquired.
Image postprocessing and radiomics feature extraction
Automated mask extraction of the right and left hippocampi was performed using volBrain (https://volbrain.upv.es/) [14, 15], which is a robust automatic pipeline for brain segmentation with high accuracy [16]. After denoising with an adaptive nonlocal means filter, images were affine-registered to the Montreal Neurological Institute space using Advanced Normalization Tools software [17], corrected for image inhomogeneities using N4, and finally, intensity normalized [18]. Then, the hippocampus was segmented based on a multi-atlas framework combining nonlinear registration and patch-based label fusion [19]. Two experienced neuroradiologists (Y.W.P. and M.P., with 8 years and 10 years of experience, respectively) visually checked for segmentation or registration errors by overlaying each subject’s native-space-transformed ROI masks onto their T1-weighted images and corrected the errors in consensus.
For radiomics analysis, all images were resampled to 1-mm isotropic voxels across all patients. A total of 107 radiomics features were extracted from each hippocampus ROI, comprising shape features, first-order features, and second-order features; the second-order features consisted of gray level co-occurrence matrix, gray level run-length matrix, gray level size zone matrix, gray level dependence matrix, and neighboring gray tone difference matrix features (Supplementary Table 1). A total of 214 radiomics features (107 features × two ROIs [right and left hippocampi]) were obtained. The feature extraction was performed using an open-source Python-based package (PyRadiomics, version 2.0) [20].
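To give an intuition for what a second-order feature measures, the following is a minimal pure-Python sketch of one such feature, the contrast of a gray level co-occurrence matrix (GLCM), on a tiny 2D image. It is an illustration only and not the PyRadiomics implementation: real radiomics software works on the 3D mask and typically aggregates several directions, and the function name, the single pixel offset, and the toy images are assumptions made for this example.

```python
from collections import Counter


def glcm_contrast(image, offset=(0, 1)):
    """Contrast of a symmetric, normalized GLCM for a single pixel offset.

    image  : 2D list of integer gray levels
    offset : (row step, column step) separating the two pixels of each pair
    """
    d_row, d_col = offset
    n_rows, n_cols = len(image), len(image[0])
    counts = Counter()
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                i, j = image[r][c], image[r2][c2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # symmetric: count each pair in both directions
    total = sum(counts.values())
    # Contrast = sum over gray-level pairs (i, j) of p(i, j) * (i - j)^2
    return sum(n / total * (i - j) ** 2 for (i, j), n in counts.items())


# Toy images with the same size but different texture (hypothetical values).
flat_image = [[1, 1], [1, 1]]
checker_image = [[0, 1], [1, 0]]
flat_contrast = glcm_contrast(flat_image)
checker_contrast = glcm_contrast(checker_image)
```

The two toy images occupy the same number of pixels, so a volume-like measure cannot separate them, whereas the GLCM contrast is zero for the flat image and positive for the checkerboard.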
Intracranial volume-corrected hippocampus volume measurement
Total intracranial volume (ICV) and bilateral hippocampus volume were generated by the volBrain pipeline, and the hippocampus volume was normalized using ICV ([hippocampus volume/ICV]×100). ICV-adjusted hippocampal volumes were used for the construction of prediction models of Aβ status. A logistic regression model was constructed in the training set and validated in the test set.
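The ICV normalization described above is a simple ratio; a minimal Python sketch follows. The function name and the example volumes are hypothetical, and both volumes are assumed to be expressed in the same unit.

```python
def icv_adjusted_volume(hippocampus_volume: float, icv: float) -> float:
    """Hippocampus volume expressed as a percentage of intracranial volume.

    Implements (hippocampus volume / ICV) x 100; both inputs must share a unit.
    """
    if icv <= 0:
        raise ValueError("ICV must be positive")
    return hippocampus_volume / icv * 100.0


# Hypothetical volumes in the same unit, for illustration only.
example_ratio = icv_adjusted_volume(hippocampus_volume=7.0, icv=1400.0)
```

With these illustrative inputs the adjusted volume is about 0.5, that is, the hippocampi occupy roughly half a percent of the intracranial volume.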
Radiomics feature selection and machine learning models with performance evaluation
After z-score normalization of all imaging features, the least absolute shrinkage and selection operator (LASSO) with 10-fold cross-validation was applied for feature selection once the training and test sets had been split [21]. LASSO is designed to avoid overfitting and is known to be suitable for analyzing high-dimensional datasets, such as radiomics features. To evaluate whether radiomics improves prediction over models based on clinical profiles alone, three types of models were trained as follows: 1) a model based on radiomics features; 2) a clinical model based on demographics (age, sex, and education), APOE ɛ4 status, and NP test results (MMSE, ADAS-cog, LM I, and LM II); and 3) a combined model based on radiomics features and clinical features. For classification, we applied the random forest (RF) algorithm. Hyperparameters were optimized by random search. In addition, to overcome data imbalance, each machine learning model was trained as follows: 1) without subsampling, 2) with the synthetic minority over-sampling technique (SMOTE), and 3) with random over-sampling examples (ROSE) [22, 23]. Thus, a total of nine combinations of RF-based prediction models with different subsampling methods were trained and validated. Performance was evaluated in the training set with 10-fold cross-validation and validated in the test set. The area under the curve (AUC), accuracy, sensitivity, and specificity of each model were obtained. The machine learning algorithms were trained and validated using Python 3 with the scikit-learn library (v0.21.2). The overall process is shown in Fig. 2.
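Two of the preprocessing ideas in this paragraph, z-score normalization and SMOTE-style over-sampling, can be sketched in plain Python as follows. This is a simplified illustration and not the pipeline used in the study: it assumes that scaling statistics are estimated from the training set only and that over-sampling is applied to training data only (the text does not state either detail), and all function names and numbers are hypothetical.

```python
import random


def zscore_fit(rows):
    """Column means and standard deviations estimated from the given rows."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = []
    for col, mean in zip(columns, means):
        variance = sum((value - mean) ** 2 for value in col) / n
        stds.append(variance ** 0.5 or 1.0)  # fall back to 1.0 for constant columns
    return means, stds


def zscore_apply(rows, means, stds):
    """Scale rows with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to the base row.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(row, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature data, for illustration only.
train_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
test_rows = [[2.5, 25.0]]
means, stds = zscore_fit(train_rows)
train_scaled = zscore_apply(train_rows, means, stds)
test_scaled = zscore_apply(test_rows, means, stds)
minority_scaled = train_scaled[:3]  # pretend these rows form the minority class
extra_rows = smote_like(minority_scaled, n_new=2)
```

Each synthetic row lies on a line segment between two existing minority rows, which is the core idea of SMOTE; ROSE instead draws new examples from a smoothed distribution around existing ones and is not shown here.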

Fig. 2. Workflow of image processing, radiomics feature extraction, and machine learning. LASSO, least absolute shrinkage and selection operator; RF, random forest.
Statistical analysis
For analysis of baseline characteristics and neuropsychological test scores, either Student’s t-test or the Mann-Whitney U test was used for continuous variables according to normality. The chi-square test was used for categorical variables. Logistic regression using ICV-adjusted hippocampal volumes was applied to construct a prediction model for Aβ status to evaluate the predictive performance of hippocampus volume. All statistical analyses were performed using the statistical software R (version 3.6.0; R Foundation for Statistical Computing). Statistical significance was set at p < 0.05.
RESULTS
Patient characteristics
The baseline characteristics and NP test results for the 407 MCI patients in the training and test sets are summarized in Table 1. In both the training and test sets, the CSF Aβ42+ group was significantly older (p = 0.001 and p = 0.003 in the training and test sets, respectively), had a higher prevalence of APOE ɛ4 carriers (p < 0.001 and p < 0.001 in the training and test sets, respectively), showed higher scores in ADAS-cog (p < 0.001 and p = 0.014 in the training and test sets, respectively), and had lower LM I total (p < 0.001 and p = 0.008 in the training and test sets, respectively) and LM II total scores (p < 0.001 and p = 0.014 in the training and test sets, respectively) compared to the CSF Aβ42− group.
Table 1. Clinical characteristics in the training and test sets
Data are presented as the number of patients (%) or mean±SD. *p-values were calculated using Student’s t-test for continuous variables and chi-square test for categorical variables, to compare subject characteristics between the CSF Aβ42− and CSF Aβ42+ groups in the training and test sets, respectively. †p-values were calculated using Student’s t-test for continuous variables and chi-square test for categorical variables, to compare subject characteristics between the training and test sets. Aβ, amyloid-β; ADAS, Alzheimer’s Disease Assessment Scale; APOE, apolipoprotein E; CSF, cerebrospinal fluid; LM, logical memory; MMSE, Mini-Mental State Exam.
There were no differences in the clinical characteristics and NP test results between the training and test sets.
ICV-adjusted hippocampus volume model performance
The ICV-adjusted hippocampus volume model showed an AUC of 0.595 in the training set to predict CSF Aβ status. In the test set, the hippocampus volume model showed an AUC, accuracy, sensitivity, and specificity of 0.543 (95% confidence interval [CI]: 0.433–0.653), 55.4%, 64.2%, and 40%, respectively.
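For readers less familiar with the AUC, it can be read as the probability that a randomly chosen Aβ+ patient receives a higher predicted score than a randomly chosen Aβ− patient, so a value near 0.5 is close to chance. The following minimal Python sketch computes it by direct pairwise comparison; the function name and the toy labels and scores are hypothetical and are not study data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative case.

    labels : iterable of 1 (positive) or 0 (negative)
    scores : predicted scores, higher meaning more likely positive
    Ties between a positive and a negative score count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute an AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive-negative pairs are ranked correctly.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
# A model that gives every patient the same score cannot rank anyone.
chance_auc = roc_auc([1, 1, 0, 0], [0.3, 0.3, 0.3, 0.3])
```

The pairwise form is quadratic in the number of patients, which is acceptable for a test set of this size; library routines use a rank-based computation instead.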
Radiomics features and classification performance
In the radiomics model, 33 radiomics features (17 from the right and 16 from the left hippocampi) were selected to predict CSF Aβ status in the training set (Supplementary Table 2). The selected features consisted of seven shape features, four first-order features, and 22 second-order features (e.g., gray level run-length matrix, gray-level size zone matrix, gray level dependence matrix, and neighboring gray tone difference matrix). The AUCs ranged from 0.594 to 0.718 in the training set. In the test set, the radiomics model with the highest predictive power among the various combinations of ML models was RF with ROSE, with an AUC, accuracy, sensitivity, and specificity of 0.674 (95% confidence interval [CI]: 0.557–0.790), 65.1%, 82.7%, and 35.5%, respectively.
In the clinical model, patient sex, age, education (years), ADAS-cog, LM I total score, and APOE ɛ4 status were included to predict CSF Aβ status in the training set. The AUCs ranged from 0.723 to 0.769 in the training set. In the test set, the clinical model with the highest predictive power among the various combinations of ML models was RF with ROSE, with an AUC, accuracy, sensitivity, and specificity of 0.758 (95% CI: 0.656–0.861), 71.1%, 67.3%, and 77.4%, respectively.
In the combined model of radiomics and clinical features, 32 out of 33 radiomics features from the radiomics model were retained, and five out of six clinical features from the clinical model were retained (Supplementary Table 3). The features that dropped out from the LASSO procedure in the combined model, compared to the radiomics and clinical models, were one first-order feature (minimum) and patient sex, respectively. The AUCs ranged from 0.732 to 0.804 in the training set. The combined model with the highest predictive power in the test set was RF with SMOTE, with an AUC, accuracy, sensitivity, and specificity of 0.823 (95% CI: 0.734–0.912), 77.1%, 84.6%, and 64.5%, respectively. The combined model showed higher performance than either the radiomics (AUC 0.674) or clinical model (AUC 0.758) in the test set. The diagnostic performances of the three models in the test set are summarized in Table 2.
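The relationship between accuracy, sensitivity, and specificity can be made concrete with a short Python sketch. The confusion-matrix counts used below (44 true positives, 8 false negatives, 20 true negatives, 11 false positives) are an assumption: they are one set of counts, back-calculated here, that is consistent with the percentages reported for the combined model in the 83-patient test set, and they are not taken from the study's tables. The function name is likewise hypothetical.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Sensitivity, specificity, and accuracy (in percent) from confusion-matrix counts."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),  # detected positives / all positives
        "specificity": 100.0 * tn / (tn + fp),  # detected negatives / all negatives
        "accuracy": 100.0 * (tp + tn) / (tp + fn + tn + fp),
    }


# ASSUMED counts, back-calculated to match the reported percentages (n = 83).
combined = diagnostic_metrics(tp=44, fn=8, tn=20, fp=11)
combined_rounded = {name: round(value, 1) for name, value in combined.items()}
```

Under these assumed counts the rounded values are 84.6% sensitivity, 64.5% specificity, and 77.1% accuracy, matching the figures reported above; this also shows that accuracy is a prevalence-weighted mix of the other two measures.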
Table 2. Diagnostic performance of each machine learning model in the test set for prediction of CSF Aβ positivity
AUC, area under the curve; CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value; ROSE, random over-sampling example; SMOTE, synthetic minority over-sampling technique.
DISCUSSION
In this study, we developed and validated a prediction model based on a combination of clinical and radiomics features that could predict Aβ positivity based on CSF analysis at the single-subject level. The combined model involving both clinical and radiomics features showed the best performance (AUC: 0.823), followed by the clinical model (AUC: 0.758) and the radiomics model (AUC: 0.674) in the test set, demonstrating the utility and robustness of the combined model. These results indicate the independent contributions of radiomics and clinical features in identifying MCI with CSF Aβ pathology and the added value of radiomics beyond the effects of clinical features.
Accumulation of Aβ pathology is one of the hallmark pathologic characteristics of the AD continuum and precedes the onset of cognitive symptoms by decades [6]. Recently, many amyloid-modifying therapy trials in AD subjects have failed to show effectiveness [24–26], and one of the presumed reasons for failure is the enrollment of clinically heterogeneous subjects who did not have increased cerebral Aβ plaques and were unlikely to have had AD pathology [27]. Therefore, the identification of Aβ biomarkers via CSF Aβ or PET is important for diagnosing the AD continuum in both research and clinical settings. However, these biomarkers are not routinely acquired in clinics, owing to limited resources, high costs, and the need for invasive procedures. Therefore, practical methods that use commonly available clinical and MRI data to determine candidates for amyloid biomarker testing may be helpful.
In this context, many previous studies have attempted to develop predictive models for identifying Aβ positivity with various predictors, such as demographic features, APOE ɛ4 status, results of NP tests, and/or MRI features [28–33]. An early study using NP test results showed good performance, with an AUC of around 0.77–0.86 [32], although its small number of patients and lack of validation limit its value. Other studies have shown comprehensive parameters to be associated with Aβ status, although most were performed without proper validation, which may have led to over-fitted results [34]. A recent study applying a data-driven algorithm with clinical features showed an AUC of 0.71 on validation [33], indicating only fair performance, unlike the high predictive performance noted in previous studies. Furthermore, several studies have used data that are not easily accessible, such as blood-based biomarkers, which limits their wide clinical application [35–37]. Meanwhile, our model integrating radiomics and clinical features showed good performance in both the training set and the test set, supporting its robustness. The robust predictive capacity of the combined predictive model in early AD continuum patients can help triage subjects for more invasive and costly Aβ testing.
Although previous radiomics studies in the neuroradiology field have mostly focused on neuro-oncology [38–40], there have also been several recent studies using radiomics analysis on T1-weighted images in AD. These studies using radiomics features have shown promising results and have indicated that radiomics features are helpful not only in the diagnosis of AD but also in the prediction of disease progression [41–43]. In our study, various shape features, such as maximum 2D or 3D diameter from the right and left hippocampi, which reflect volume information, were included in the radiomics-based model, which is in line with previous studies on decreased volume of the hippocampus in MCI patients with amyloid pathology [29–31]. However, although MCI patients with amyloid pathology have lower volumes in various brain regions, including the hippocampus, many predictive models using hippocampal volume offer only a fair degree of diagnostic performance [30, 31]. In line with those results, we also found that ICV-corrected hippocampal volume showed an AUC of 0.543 in predicting amyloid pathology, suggesting that hippocampal volume alone may not be a reliable predictor.
Interestingly, the majority of the radiomics features retained in the combined model were second-order features (21 out of 32 selected features), with the remainder being shape and first-order features. Our hypothesis for this observation is that second-order features capture the spatial variation in signal intensity and thus tend to extract information that is imperceptible to the naked eye. Moreover, second-order features may reflect signal intensity variation from the deposition of Aβ plaques and extract biologic information different from volume, which is the traditional imaging biomarker of AD. Previous studies have shown that Aβ plaques in AD can be reflected in MRI signal intensity [43, 44]. T1 relaxation time is known to be related to many factors, such as macromolecular integrity, the relationship between free and bound water, and the neuronal loss potentially associated with Aβ plaques, and these changes may also be captured by radiomics analysis [41]. This is further supported by a recent study that showed radiomics features to be reflective of underlying histology [45]. However, a further follow-up study with histopathologic correlation is necessary to prove our hypothesis of a direct relationship between radiomics features and deposition of Aβ plaques in the brain.
Notably, nearly all the radiomics features were retained in the combined model after the LASSO procedure in our study. This suggests that most radiomics features harbor information independent from clinical features, which may provide added value in predicting CSF Aβ status. However, the prediction of CSF Aβ status by radiomics features alone was not optimal, confirming the importance of clinical features. Nonetheless, our results indicate that radiomics features add value to clinical features.
Our study has several limitations. First, we only included the radiomics features of the hippocampus, as previous studies have shown good performance using the hippocampus mask for the classification and prediction of AD [5, 43]. However, volume changes occur not only in the hippocampus but also in other AD signature regions, such as the entorhinal cortex and precuneus [6]. Thus, the radiomics prediction model could be improved by adding radiomics information on other anatomical structures. Further, whole-brain investigation should be performed in future studies. Second, CSF Aβ status was used as the gold standard for Aβ positivity rather than PET imaging. It could be argued that the performance of the prediction model could be sensitive to the selection of the gold-standard method. However, the agreement between CSF and PET determinations of Aβ positivity is very high, particularly in the intermediate ranges where thresholds for positivity typically lie [46, 47].
In conclusion, an MRI radiomics-based model can help predict CSF Aβ42 status in MCI patients and can potentially triage these patients for invasive and costly Aβ tests.
ACKNOWLEDGMENTS
This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. NRF-2020R1C1C1005724), the Basic Science Research Program through the NRF of Korea funded by the Ministry of Education (2020R1I1A1A01071648), and a new faculty research seed money grant of Yonsei University College of Medicine for 2020 (3-2020-0031).
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
