Abstract
Background:
Amyloid-β positivity (Aβ+) based on PET imaging is part of the enrollment criteria for many of the clinical trials of Alzheimer’s disease (AD), particularly in trials for amyloid-targeted therapy. Predicting Aβ positivity prior to PET imaging can decrease unnecessary patient burden and costs of running these trials.
Objective:
The aim of this study was to evaluate the performance of a machine learning model in estimating the individual risk of Aβ+, using PET imaging as the gold standard.
Methods:
We used data from an amnestic mild cognitive impairment (aMCI) subset of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort to develop and validate the models. The predictors of Aβ status included demographics and ApoE4 status in all models plus a combination of neuropsychological tests (NP), MRI volumetrics, and cerebrospinal fluid (CSF) biomarkers.
Results:
The models that included NP and MRI measures separately showed an area under the receiver operating characteristic curve (AUC) of 0.74 and 0.72, respectively. However, using NP and MRI measures jointly in the model did not improve prediction. The models including CSF biomarkers significantly outperformed other models, with AUCs between 0.89 and 0.92.
Conclusions:
Predictive models can be effectively used to identify persons with aMCI likely to be amyloid positive on a subsequent PET scan.
INTRODUCTION
Cerebral amyloid-β (Aβ) deposition is a hallmark pathologic change in Alzheimer’s disease (AD) and is believed to precede dementia by many years [1]. In the last decade, many clinical trials have tried to use targeted therapies to lower brain Aβ, but all these trials have failed to achieve significant effects on clinical endpoints [2–4]. Major proposed reasons for failure include clinical heterogeneity of participants, selection of an inappropriate biological target (i.e., merely reducing amyloid production or aggregation cannot modify disease progression) [5], enrollment of individuals based on unreliable criteria, and inclusion of individuals who did not have increased cerebral Aβ and were unlikely to have had AD pathology [6].
To address some of these limitations, the new NIA-AA Research Framework has proposed to use biomarkers of Aβ deposition, pathologic tau, and neurodegeneration [AT(N)] to diagnose AD and decrease heterogeneity in research study samples. Similarly, more recent clinical trials have used biomarkers of amyloid status measured in cerebrospinal fluid (CSF) or in the brain using positron emission tomography (PET) [7]. While amyloid PET is considered non-invasive, and may be more reliable than CSF biomarkers [8], its utility in both research and clinical practice has been limited. Factors that have prevented widespread use of PET imaging in research and practice include availability, economic factors (high costs, not being covered by insurance), and patient or caregiver’s concerns (safety, burden, tolerance, and radiation exposure) [9].
Recruitment of eligible amnestic mild cognitive impairment (aMCI) patients is a major bottleneck in conducting secondary prevention trials; as few as 10–20% of potential MCI patients are actually trial-eligible [10]. In addition, only 40–60% of aMCI patients are likely to be Aβ positive based on the current gold standard of amyloid PET, which further limits the number of trial-eligible individuals [11]. Without using any predictive models, to establish Aβ positivity, all enrolled participants (based on initial clinical diagnostic criteria) require amyloid PET imaging at the time of screening. Therefore, predicting Aβ positivity prior to PET imaging can decrease unnecessary patient burden and costs of running the trials.
In addition, were a treatment to become available for the prevention of AD in persons with aMCI, implementation in clinical practice might be difficult. Amyloid PET would be an expensive option for identifying individuals eligible for treatment. One option might be to develop and use risk prediction models and screening algorithms similar to those used in cardiovascular disease [12] or various types of cancer [13, 14]. Using this approach, data gathered at lower cost (e.g., neurocognitive tests and MRIs) could be used to predict Aβ positivity. Amyloid PET would then be performed only in the subgroup of individuals predicted to have a positive amyloid scan. Machine learning (ML) techniques provide a promising method for predicting amyloid positivity. These approaches are specifically designed to predict outcomes and provide a feasible approach for exploiting and managing complex and high-dimensional data [15, 16]. Developing practical predictive models can drive a major shift in clinical care for both primary and secondary prevention purposes [17–21].
The primary goal of this study was to compare the relative sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of different combinations of features (demographics, APOE ɛ4 status, neuropsychological tests, MRI volumetrics, and CSF biomarkers) used in an ML model to predict PET Aβ positivity. The model was developed in a subsample of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) aMCI population and was subsequently validated using an independent sample from the same cohort. Because the availability, burden, and costs of these measures differ (e.g., MRIs require staying still for long periods and lumbar puncture is an invasive procedure), we evaluated the predictive value of each of the multimodal features separately and jointly.
METHODS
Study design and participants
The data used for this analysis were downloaded from the ADNI database (http://www.adni.loni.usc.edu) in March 2019. The ADNI is an ongoing cohort, which was launched in 2003 as a public–private partnership. The individuals included in the current study were initially recruited as part of ADNI-GO and ADNI-2 between 2009 and 2013. This study was approved by the Institutional Review Boards (IRB) of all participating institutions. Informed written consent was obtained from all participants at each site.
A total of 369 participants diagnosed with MCI who were enrolled in ADNI-GO and ADNI-2 were eligible for this study. Eligible individuals had completed the baseline visit and had MRI and amyloid PET imaging in the same wave of the study. All ADNI participants with a diagnosis of MCI were diagnosed as having amnestic MCI; this diagnostic classification required Mini-Mental State Examination (MMSE) scores between 24 and 30 (inclusive), a memory complaint, objective memory loss measured by education-adjusted scores on the Wechsler Memory Scale Logical Memory II, a Clinical Dementia Rating (CDR) of 0.5, absence of significant impairment in other cognitive domains, essentially preserved activities of daily living, and absence of dementia. Participants whose scans failed to meet quality control or had unsuccessful image analysis were excluded from this study.
Study measures
Neuropsychological data
Neuropsychological (NP) tests included the MMSE, the 11-item Alzheimer’s Disease Assessment Scale cognitive subscale (ADAS-cog), and Logical Memory II [22–24]. These tests were available for all participants in ADNI studies from the beginning of the cohort and therefore were not a limiting factor for inclusion of participants in this study. All NP measures were entered into models as continuous variables.
APOE gene status
APOE ɛ4 allele (ApoE4) frequency was available for all participants included in this study. Participants were classified as ApoE4-negative (–) if they carried no ApoE4 allele or ApoE4-positive (+) if they carried at least one ApoE4 allele.
MRI acquisition and preprocessing
MRIs were obtained across different sites of the ADNI study with a unified protocol (for more information, please see http://adni.loni.usc.edu/). MRI data were automatically processed using the FreeSurfer software package (available at http://surfer.nmr.mgh.harvard.edu/) by the Schuff and Tosun laboratory at the University of California-San Francisco as part of the ADNI shared data-set. FreeSurfer methods for identifying regions and calculating regional brain volumes have been described previously in detail [25]. Volumes of 47 regions of interest (ROIs), derived from FreeSurfer software, were used as MRI indicators. For the purpose of this study, the volumes of all ROIs were normalized for total intracranial volume (TICV), and the ratio of ROI volume (ROIv) to TICV [i.e., (ROIv/TICV) x mean whole population ROIv] was used in the analyses and reported throughout the manuscript unless otherwise specified.
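The head-size adjustment described above can be sketched in Python as follows. This is only an illustration and not the ADNI or FreeSurfer processing code: the function name `normalize_roi_volumes` and the sample volumes are hypothetical, and the scaling factor simply follows the bracketed expression as it is written in the text.

```python
from statistics import mean

def normalize_roi_volumes(roi_volumes, ticv):
    """Adjust one ROI's volumes for head size.

    roi_volumes: raw ROI volume for each participant (same units throughout)
    ticv: total intracranial volume for each participant
    Follows the expression in the text: (ROIv / TICV) x mean population ROIv.
    """
    if len(roi_volumes) != len(ticv):
        raise ValueError("one TICV value is required per participant")
    scale = mean(roi_volumes)  # mean ROI volume across the whole sample
    return [scale * v / t for v, t in zip(roi_volumes, ticv)]

# Hypothetical values in cubic centimeters, for illustration only.
hippocampus = [3.4, 3.9, 3.1]
ticv = [1400.0, 1600.0, 1300.0]
adjusted = normalize_roi_volumes(hippocampus, ticv)
```

Under this scheme, two participants with the same ROIv/TICV ratio receive the same adjusted value regardless of absolute head size.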
PET imaging acquisition and preprocessing
Florbetapir PET images were obtained across different sites of the ADNI study with a unified protocol (for more information, please see http://adni.loni.usc.edu/methods/pet-analysis-method/pet-analysis/). Data were processed at the Jagust lab at the University of California, Berkeley. Details of the methods used to process PET images have been previously described [26]. In brief, a native-space MRI scan for each subject was segmented and parcellated with FreeSurfer to define cortical grey matter regions of interest (frontal, anterior/posterior cingulate, lateral parietal, lateral temporal) that make up a summary cortical ROI. In addition, five reference regions were created (cerebellar grey matter, whole cerebellum, brainstem/pons, eroded subcortical white matter, and a composite reference region). Subsequently, each PET scan was coregistered to the corresponding MRI, and the mean florbetapir uptake within the cortical and reference regions was computed. A florbetapir SUVR was calculated by averaging across the four cortical regions and dividing this summary ROI by the uptake in the whole cerebellum. To establish amyloid positivity or negativity, a florbetapir SUVR cutoff of 1.11 was used, as recommended by previous studies [27]. For the purpose of this study, we only used the first florbetapir PET scan obtained from each participant.
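The summary SUVR and the positivity rule described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the Jagust lab pipeline: the region keys, the function names, and the unweighted mean are hypothetical choices, and treating values strictly above 1.11 as positive is an assumption because the text does not say how a value exactly at the cutoff is handled.

```python
from statistics import mean

SUVR_CUTOFF = 1.11  # florbetapir cutoff stated in the text (whole-cerebellum reference)
CORTICAL_REGIONS = ("frontal", "cingulate", "lateral_parietal", "lateral_temporal")

def summary_suvr(cortical_uptake, whole_cerebellum_uptake):
    """Mean uptake over the four cortical regions divided by reference uptake."""
    missing = [r for r in CORTICAL_REGIONS if r not in cortical_uptake]
    if missing:
        raise KeyError(f"missing cortical regions: {missing}")
    if whole_cerebellum_uptake <= 0:
        raise ValueError("reference uptake must be positive")
    cortical_mean = mean(cortical_uptake[r] for r in CORTICAL_REGIONS)
    return cortical_mean / whole_cerebellum_uptake

def is_amyloid_positive(suvr, cutoff=SUVR_CUTOFF):
    """Assumption: strictly above the cutoff counts as amyloid positive."""
    return suvr > cutoff

# Hypothetical uptake values, for illustration only.
uptake = {"frontal": 1.30, "cingulate": 1.40,
          "lateral_parietal": 1.20, "lateral_temporal": 1.30}
suvr = summary_suvr(uptake, whole_cerebellum_uptake=1.0)
status = is_amyloid_positive(suvr)
```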
CSF biomarkers
CSF samples were batch processed by the ADNI Biomarker Core at the University of Pennsylvania School of Medicine, and CSF tau, p-tau181p, and Aβ1-42 were measured for all participants with a CSF sample [28]. These data were available for 335 participants (90.5% of the whole sample) and sections of data analysis that required CSF measures were limited to these participants. CSF measures were included as continuous variables in ML models. However, for simplicity, in Table 1 individuals were classified according to CSF concentration thresholds (tau: >93 pg/mL; p-tau181p: >23 pg/mL; Aβ1-42: <192 pg/mL) previously established to maximize sensitivity and specificity for autopsy-confirmed AD [29].
Table 1. Demographics and clinical characteristics of study participants according to group
a Values are means ± SD unless otherwise stated. b Percentage of individuals carrying at least one E4 allele. CDR-SB, Clinical Dementia Rating scale Sum of Boxes; MMSE, Mini-Mental State Exam; ADAS, Alzheimer’s Disease Assessment Scale; LM, logical memory-delayed recall. Hippocampal volume is reported in cubic centimeters.
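As an illustration of the descriptive classification used for Table 1, the thresholds quoted above can be applied as in the Python sketch below. The function name `csf_flags` and the example concentrations are hypothetical; the models themselves used the continuous values, not these binary flags.

```python
# Thresholds in pg/mL as quoted in the text [29].
TAU_CUTOFF = 93.0
PTAU_CUTOFF = 23.0
ABETA_CUTOFF = 192.0

def csf_flags(tau, ptau181, abeta42):
    """Return which CSF markers fall on the abnormal side of each cutoff."""
    return {
        "tau_abnormal": tau > TAU_CUTOFF,         # elevated total tau
        "ptau_abnormal": ptau181 > PTAU_CUTOFF,   # elevated p-tau181p
        "abeta_abnormal": abeta42 < ABETA_CUTOFF  # reduced Abeta1-42
    }

# Hypothetical concentrations, for illustration only.
flags = csf_flags(tau=110.0, ptau181=20.0, abeta42=150.0)
```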
Data analysis
Training and validation samples
The training and validation of the ML model were performed using the split-half method. For this purpose, participants were randomly split into two independent samples with approximately equal numbers of Aβ- and Aβ+ individuals based on PET imaging. One sample was used as the training data-set and the other sample was used for validation of the models. This validation method tests how well the trained ML model generalizes to data that were never presented to the ML algorithm.
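A stratified split-half of this kind can be sketched in plain Python as below. This is a minimal sketch, not the study code (which was written in MATLAB): the function name, the seed, and the label counts in the example are hypothetical and chosen only to resemble a sample of 369 people.

```python
import random

def stratified_split_half(labels, seed=0):
    """Split indices into two halves, dividing each class as evenly as possible."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        half = (len(idx) + 1) // 2  # the extra person goes to training if odd
        train.extend(idx[:half])
        valid.extend(idx[half:])
    return sorted(train), sorted(valid)

# Hypothetical labels: 1 = amyloid positive on PET, 0 = amyloid negative.
labels = [1] * 225 + [0] * 144
train_idx, valid_idx = stratified_split_half(labels)
```

Splitting within each class keeps the proportion of Aβ+ individuals nearly identical in the two halves, which makes performance figures from the two samples comparable.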
Selection of feature-sets (indicators)
Demographics (age, sex, and education), ApoE4 status, NP tests, all available volumetric MRI measures (FreeSurfer outputs), and all CSF biomarkers mentioned above were used as features in the predictive models. We chose 7 different feature-sets and compared the performance of ML models that used these feature-sets for classification. In addition to demographics and ApoE4, the models included the following features: Model 1) NP tests; Model 2) MRI volumetrics; Model 3) CSF biomarkers; Model 4) NP tests plus MRI volumetrics; Model 5) NP tests plus CSF biomarkers; Model 6) MRI volumetrics plus CSF biomarkers; Model 7) NP tests, MRI volumetrics, plus CSF biomarkers.
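The seven feature-sets are every non-empty combination of the three optional modalities added to the common base, which the following Python sketch enumerates. The short labels follow the abbreviations used in the table footnotes; the variable names are hypothetical.

```python
from itertools import combinations

BASE = ("Dem", "ApoE4")                   # included in every model
MODALITIES = ("NP", "MRI-v", "CSF-b")     # optional feature blocks

feature_sets = {}
model_id = 1
for size in range(1, len(MODALITIES) + 1):
    for combo in combinations(MODALITIES, size):
        feature_sets[model_id] = BASE + combo
        model_id += 1

for mid, features in feature_sets.items():
    print(f"Model {mid}: {' + '.join(features)}")
```

Enumerating singles first, then pairs, then the full set reproduces the Model 1 to Model 7 ordering given in the text.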
Machine learning model
Analysis and computation of ML methods were conducted using MATLAB (version 2018b). We used Ensemble Linear Discriminant (ELD) ML models for the purpose of classification and pattern recognition. ELD belongs to the family of classification methods known as ensemble learning, in which the outputs of an ensemble of simple, low-accuracy classifiers trained on subsets of features are combined (e.g., by a weighted average of the individual decisions), so that the resulting ensemble decision rule has a higher accuracy than that obtained by each of the individual classifiers [30, 31]. In this work, we combined linear discriminant functions (i.e., hyperplanes that dichotomize the samples based on subsets of features) in order to construct the ensemble classifier. To avoid overfitting, we trained the models for a maximum of 100 cycles. We monitored the learning curve and terminated training at the cycle with the lowest misclassification rate. The parameters for the models were optimized automatically through the hyperparameter optimization process in MATLAB.
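The general idea of combining linear discriminants trained on feature subsets can be sketched in Python with NumPy as below. This is an illustrative random-subspace ensemble with unweighted majority voting on synthetic data, not a reproduction of the MATLAB models used in the study: the function names, the number of learners, the subspace size, and the data are all assumptions, and no hyperparameter optimization or learning-curve monitoring is included.

```python
import numpy as np

def fit_lda(X, y):
    """Two-class linear discriminant; the score is X @ w + b (positive => class 1)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance, lightly regularized for numerical stability.
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X) - 2)
    S = S + 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(S, mu1 - mu0)
    b = -0.5 * (mu0 + mu1) @ w + np.log(len(X1) / len(X0))
    return w, b

def fit_subspace_ensemble(X, y, n_learners=25, subspace_size=2, seed=0):
    """Train each discriminant on a random subset of the feature columns."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_learners):
        cols = rng.choice(X.shape[1], size=subspace_size, replace=False)
        w, b = fit_lda(X[:, cols], y)
        members.append((cols, w, b))
    return members

def predict_ensemble(members, X):
    """Unweighted majority vote over the individual discriminants."""
    votes = np.mean([(X[:, cols] @ w + b) > 0 for cols, w, b in members], axis=0)
    return (votes > 0.5).astype(int)

# Synthetic data for illustration: 3 informative features and 3 noise features.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 6))
X[:, :3] += y[:, None] * 2.0

members = fit_subspace_ensemble(X, y)
accuracy = float(np.mean(predict_ensemble(members, X) == y))
```

Each member sees only two features, so individually it is a weak classifier; the vote across members is what gives the ensemble its accuracy.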
Training the classification model
Data from the training sample (N = 185) were used for training of the classifier (Fig. 1). Models were trained to recognize Aβ- versus Aβ+ individuals using all sets of the features as described above. A 10-fold cross-validation procedure was used in all models for testing validity of the models. Cross-validation is an established statistical method for validating a predictive model, which involves training several parallel models, each based on a subset of the training data. Subsequently, the model performance is evaluated based on the average accuracy in predicting the labels of the omitted portion of the training data [32]. The performance of each model was calculated based on the total percentage of correct classification (accuracy), sensitivity, specificity, PPV, NPV, and area under the curve (AUC).

Fig. 1. Study design diagram. aMCI, amnestic mild cognitive impairment; ML, machine learning.
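The 10-fold cross-validation procedure described in this section can be sketched in plain Python as follows. The helper names and the `fit` and `predict` callables are hypothetical placeholders; a majority-class classifier stands in for the actual model purely to keep the example self-contained.

```python
import random
from collections import Counter

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def cross_validated_accuracy(X, y, fit, predict, k=10, seed=0):
    """Average accuracy on the held-out fold across all k folds."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

# Placeholder classifier: always predicts the most common training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def predict_majority(model, X_test):
    return [model] * len(X_test)

# Hypothetical training sample of 185 people (1 = amyloid positive).
X = [[float(i)] for i in range(185)]
y = [1] * 113 + [0] * 72
cv_accuracy = cross_validated_accuracy(X, y, fit_majority, predict_majority)
```

A real model would replace the two placeholder functions; the fold logic stays the same.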
Prediction of amyloid status in the validation sample
Following training of the models, they were applied to the validation sample to predict amyloid positivity of each person (Fig. 1). Using the same feature-sets used for training of models, each individual was assigned to “predicted Aβ-” or “predicted Aβ+” groups. The performance of the predicted outcome was evaluated using the results obtained from PET imaging. Accuracy, sensitivity, specificity, PPV, and NPV for each model were estimated.
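The performance measures listed above follow directly from the confusion matrix of predicted versus PET-confirmed status. The Python sketch below shows the standard definitions; the function name and the eight example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, PPV, and NPV for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # PET-positive people correctly flagged
        "specificity": ratio(tn, tn + fp),  # PET-negative people correctly cleared
        "ppv": ratio(tp, tp + fp),          # flagged people who are PET positive
        "npv": ratio(tn, tn + fn),          # cleared people who are PET negative
    }

# Hypothetical PET results and model predictions, for illustration only.
pet_status = [1, 1, 1, 0, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = classification_metrics(pet_status, predicted)
```

PPV and NPV depend on the prevalence of amyloid positivity in the sample, so values estimated in one cohort may not transfer to a population with a different prevalence.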
Inverse cross-validation
To further validate the models, we performed an inverse cross-validation by training the ML model using the half-sample that was used for prediction previously and using the half-sample that was used for training as the prediction subset. Considering that the results of this analysis were very similar to those of the initial model (see Supplementary Tables 1 and 2) and to avoid confusion, we primarily focus on the results of the first model for the rest of this article.
Data availability
Data from the ADNI cohort are publicly available. The code used for this paper is available upon request.
RESULTS
Sample characteristics
Participants with aMCI had an average age of 71.2 years (SD = 7.2) and 54.5% were men. In both subsamples (training and validation), in comparison with Aβ- subgroup, the Aβ+ subgroup was older and had less favorable performance on NP tests, had smaller hippocampal volumes and had a CSF profile that was more similar to AD. Table 1 summarizes participants’ demographics and clinical characteristics.
Developing the amyloid prediction models in the training subsample
Performance of ELD models using 7 different feature-sets for classification of the training sample as Aβ- or Aβ+ on PET is summarized in Table 2. In the training set, the AUCs of models including demographics, ApoE4, and NP tests or MRI volumetrics (models 1 and 2) were 0.74 and 0.72, respectively. The combination of NP with MRI (model 4, AUC = 0.70) did not improve the prediction. The AUC of the model including demographics, ApoE4, and CSF markers alone was substantially higher (model 3, AUC = 0.86); however, neither the addition of NP (model 5, AUC = 0.89) nor of MRI (model 6, AUC = 0.90) substantially improved the model. The combination of all measures yielded an AUC of 0.90 (model 7).
Table 2. Performance of Ensemble Linear Discriminant (ELD) classifiers in differentiating Aβ- versus Aβ+ in training set (subsample 1)
Dem, demographics; NP, neuropsychological tests; MRI-v, all MRI volumetrics; CSF-b, all CSF biomarkers.
Performance of the amyloid prediction models in the validation subsample
After the ELD models were developed, they were applied to data from the validation sample to assign participants to Aβ- or Aβ+ (Table 3). The AUCs of models including demographics, ApoE4, and NP tests or MRI volumetrics (models 1 and 2) were 0.72 and 0.71, respectively. The AUC of the model including demographics, ApoE4, and CSF markers as features was higher (model 3, AUC = 0.86). Inclusion of both MRI volumetrics and NP tests as features in the same model did not substantially change model performance in comparison with models including them separately (model 4, AUC = 0.73). Models that included CSF measures (models 3, 5, 6, 7) had substantially better performance than models that did not include them (see Table 3 for details).
Table 3. Performance of Ensemble Linear Discriminant (ELD) classifiers in predicting Aβ- versus Aβ+ in validation (test) set (subsample 2)
Dem, demographics; NP, neuropsychological tests; MRI-v, all MRI volumetrics; CSF-b, all CSF biomarkers.
DISCUSSION
In this study, we evaluated the value of machine learning models in predicting amyloid positivity based on florbetapir PET scans. We showed that, in the validation sample, the AUCs of models that used NP tests, MRI volumetrics, or CSF biomarkers were 0.72, 0.71, and 0.86, respectively. Addition of MRI measures to NP tests in the models did not lead to improvement in the prediction performance. As expected, addition of CSF measures noticeably improved performance of models.
A few studies have previously proposed different types of predictive models for detecting cerebral amyloid positivity based on demographics, NP tests, MRI measures, and blood or CSF-based biomarkers [33–38]. For example, Kander et al. [34] reported AUCs of 0.59–0.67 for individual NP tests, an AUC of 0.64 using all NP tests, and an AUC of 0.64 for hippocampal volume. Similar to our findings, they showed that adding imaging biomarkers to NP tests in the multivariate analysis does not improve the AUC. Palmqvist et al. [36] applied a forward selection logistic regression model to demographics, ApoE4, NP tests, and white matter lesions for prediction of amyloid positivity and achieved AUCs of 0.80–0.82 in ADNI. Kim et al. [35] used similar variables and, using logistic regression, developed a nomogram that achieved predictive AUCs of 0.74–0.77.
A common limitation of the previous studies is that in many cases they used scores of individual tests or relied on data from one or two modalities, which limited investigation of the incremental value of combining various modalities. Understanding the joint and separate value of different feature sets is of interest to new clinical trials, as it could affect recruitment strategies because of the cost and burden associated with each modality. Obtaining demographic information, NP tests, and ApoE4 status is relatively easy and inexpensive; however, obtaining and processing MRIs is more burdensome (to both the patient and researcher/clinician), and obtaining CSF biomarkers is difficult considering the invasive nature of lumbar punctures. On the other hand, MRI is routinely obtained both in trials and in practice to identify or exclude structural factors that could contribute to MCI, such as mass lesions or vascular disease. Given that the MRI is part of the evaluation, the incremental cost usually arises from image processing and not image acquisition.
It is important to note that the performance of the prediction models (and therefore their effectiveness) should be interpreted in light of the clinical or research question and the clinical setting. One setting in which such models could be of use is primary care screening, especially when an effective treatment for Aβ+ patients becomes available. In such settings, models with the highest sensitivity are more suitable. Another setting in which these models could be used is enrichment of AD clinical trials in which Aβ positivity on PET scan is an enrollment criterion. In such cases, amyloid risk models with high PPV are the most desirable for reducing the number of unnecessary PET scans and decreasing the costs and burden of the trial. For example, assume a trial design that requires 1000 Aβ+ aMCI participants to be enrolled, with Aβ status verified using amyloid PET. Assuming that the aMCI population from which participants are selected is similar to the ADNI cohort, the prevalence of Aβ+ individuals with aMCI would be 61.0%. Therefore, without use of any predictive models, 1639 individuals who have passed the initial clinical prescreening should undergo amyloid PET screening to identify 1000 Aβ+ individuals. Using a predictive model incorporating demographics, ApoE4 status, and NP (model 1 in Table 3) can decrease the number of participants who undergo a PET scan to 1263 individuals (approximately 23% decrease in number of screening PET scans), and reduce the costs by >2.5 million USD (with an approximate cost of 5000 USD for acquisition and analysis of each PET scan), while concurrently decreasing the number of people undergoing this invasive and time-consuming procedure. This cost-saving calculation is in line with reports of previous studies that have suggested using predictive models to enrich clinical trials [36, 38].
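The scan counts in the enrichment example above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only: the function name is hypothetical, and the PPV shown is back-calculated from the 1263 scans quoted in the text rather than read from Table 3.

```python
def pet_scans_needed(target_positives, positive_rate):
    """Expected number of PET scans needed to find target_positives Abeta+ people.

    positive_rate is the fraction of scanned people who turn out Abeta+:
    the prevalence when everyone is scanned, or the model's PPV when only
    people predicted to be Abeta+ are scanned.
    """
    if not 0 < positive_rate <= 1:
        raise ValueError("positive_rate must be in (0, 1]")
    return round(target_positives / positive_rate)

target = 1000                                   # Abeta+ participants required
baseline = pet_scans_needed(target, 0.61)       # no model: prevalence of 61.0%
with_model = 1263                               # scans quoted in the text for Model 1
implied_ppv = target / with_model               # back-calculated, about 0.79
reduction = 1 - with_model / baseline           # fraction of scans avoided

print(f"Scans without a model: {baseline}")
print(f"Scans with Model 1:    {with_model} (implied PPV {implied_ppv:.2f})")
print(f"Reduction in scans:    {reduction:.1%}")
```

The same function shows why PPV is the quantity of interest for trial enrichment: every increase in the fraction of scanned people who are truly Aβ+ lowers the number of scans needed to reach the enrollment target.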
It should be noted that in these studies and in our example above, the costs associated with clinical prescreening and NP testing are either ignored or assumed to be zero because the data are obtained through an online platform. However, in practice, most clinical trials still require a clinic visit for clinical prescreening and NP testing, which costs approximately $1000 per person in the USA (considerably less in Europe [39]). The number needed to screen in a design using amyloid PET predictive models is substantially higher: in the example above, clinical data and NP tests should be obtained from a total of 2193 participants to identify 1263 individuals who are predicted to be amyloid positive based on Model 1. Therefore, the costs of in-person clinical visits can potentially offset the savings from obtaining fewer PET scans. Considering that AD therapy is moving toward using drugs targeting tau or combination therapies (e.g., tau and amyloid), in the long run, such predictive models along with online prescreening tools can substantially decrease the costs of trials while decreasing the number of people undergoing invasive and time-consuming procedures. Additionally, considering the high PPV of models that include CSF biomarkers (>90%) and the lower costs of obtaining and analyzing CSF (approximately $1000 in 2019), it might be a reasonable choice to replace amyloid PET data with CSF data when obtaining PET scans is not an option.
A few limitations of this study should be mentioned. First, ADNI is not a population-based study and there are strict inclusion and exclusion criteria for selection of participants, which can affect generalizability of our findings. Therefore, validating these models in other population-based studies and clinical trials’ data is an essential next step. Moreover, the inclusion criteria in the ADNI study may further limit the applicability of the findings presented here to a broader range of patients. This study focused on aMCI subjects and it is possible that in a broader selection of the MCI population or in individuals with subjective cognitive complaints who do not meet MCI criteria, the models might show different capabilities in prediction of amyloid status. Although we showed that using our models can reduce the costs of conducting research trials or clinical practice, it is difficult to estimate the burden that obtaining additional tests (e.g., MRIs, lumbar punctures, etc.) imposes on patients, caregivers, or researchers and clinicians. Ultimately, efficiency of clinical trials depends not just on reducing the cost of amyloid PET scanning but on the identification of persons who will progress in the absence of treatment and who are more likely to respond to treatment. Similar approaches have been used extensively for conducting research in other neurodegenerative diseases such as Parkinson’s disease and have shown substantial potential for use. In a subsequent study, we plan to investigate the rate of progression in various groups as identified by predictive models.
To conclude, our results indicate that predictive models can be effectively used to decrease the number of participants who need to undergo amyloid PET scans. This approach can potentially decrease the costs of the trial and also decrease the burden on patients and caregivers who are participating in the trial. By implementing a step-by-step screening (adaptable design) procedure to enroll participants in trials and using validated predictive models, we can reduce the number of screen failures due to biomarker inclusion criteria and the associated costs. A similar approach can be used to improve clinical decision-making with the least associated cost and burden for treatment of patients in the AD continuum when effective treatments targeted at AD pathology become available.
ACKNOWLEDGMENTS
Data collection and sharing for ADNI project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
This work was supported by a grant from the Alzheimer’s Association (Ezzati, 2019-AACSF-641329), and the Leonard and Sylvia Marx Foundation. Authors of this study were supported in part by grants from National Institutes of Health NIA 2 P01 AG03949 (Lipton), NIA 1R01AG039409-01 (Lipton), NIH K01AG054700 (Zammit), and the Czap Foundation (Lipton).
