Abstract
A variety of imaging, neuropsychological, and genetic measures have been suggested as potential biomarkers for the identification of mild cognitive impairment (MCI) in patients who later develop Alzheimer’s disease (AD). Here, we systematically evaluated the most promising combinations of these biomarkers regarding discrimination between stable and converter MCI and reflection of disease staging. Alzheimer’s Disease Neuroimaging Initiative data of AD (n = 144), controls (n = 112), stable (n = 265) and converter (n = 177) MCI, for which apolipoprotein E status, neuropsychological evaluation, and structural, glucose, and amyloid imaging were available, were included in this study. Naïve Bayes classifiers were built on AD and controls data for all possible combinations of these biomarkers, with and without stratification by amyloid status. All classifiers were then applied to the MCI cohorts. We obtained an accuracy of 76% for discrimination between converter and stable MCI with glucose positron emission tomography as a single biomarker. This accuracy increased to about 87% when including further imaging modalities and genetic information. We also identified several biomarker combinations as strong predictors of time to conversion. Use of amyloid validated training data resulted in increased sensitivities and decreased specificities for differentiation between stable and converter MCI when amyloid was included as a biomarker but not for other classifier combinations. Our results indicate that fully independent classifiers built only on AD and controls data and combining imaging, genetic, and/or neuropsychological biomarkers can more reliably discriminate between stable and converter MCI than single modality classifiers. Several biomarker combinations are identified as strongly predictive for the time to conversion to AD.
INTRODUCTION
Alzheimer’s disease (AD) is a complex disorder of deteriorating cognition with multiple known neuropathological mechanisms which include amyloid-β (Aβ) and tau deposition and neurodegeneration. Numerous genetic and nongenetic risk factors of this neuropathology such as apolipoprotein E (APOE) genotype, neuropsychological measures, and in vivo measures of atrophy, glucose utilization, and amyloid depositions have been identified in studies on AD [1–6]. Considering several of these biomarkers together has been shown to be a promising way of improving diagnostic accuracy, and researchers are now integrating them into the revised diagnostic criteria for AD [7, 8]. However, the understanding of which biomarkers provide an additive value when combined with others is rather limited. This applies even more to the early stages of the disease.
A large proportion of patients with amnestic mild cognitive impairment (MCI) are now considered to represent an early AD stage [7, 9]. A series of studies have been performed with the aim of increasing the diagnostic accuracy in MCI. Whilst most studies have focused on single biomarkers [10–17], multiple studies have also applied machine learning algorithms to compare biomarkers and their combinations with the intent of capturing different aspects of the complex pathophysiology of AD [18–27]. A consistent finding across the multimodal studies is increased accuracy ranging between 60% and 90% for discrimination between stable mild cognitive impairment (sMCI) and MCI converting to AD (cMCI) when information from different biomarkers is combined. However, none of these studies systematically evaluated the additive value of all of these biomarkers and their combinations in the same MCI population. Furthermore, applications of extensive parameter optimization procedures to increase cross-validation performance might have led to an overestimate of accuracies achievable for new data, a problem that is commonly referred to as overfitting. Accuracies reported when performing a strict separation of training and testing data, which is considered the gold standard of machine learning, are typically lower, ranging below 80% [20, 28]. A further aspect that has been commonly shown is that classifiers trained on AD and healthy controls can be applied to reliably discriminate between cMCI and sMCI. Another common limitation of most of the above-mentioned studies is the use of non-histopathologically validated training cohorts to establish the classifiers. The known limited accuracy of clinical diagnoses may lead to the inclusion of other dementias in the AD groups or of preclinical AD as healthy controls [29, 30]. Both could reduce the capability of the classifiers to discriminate between new AD and control cases.
While there are still no sufficiently large histopathologically confirmed datasets available for most of the biomarkers, novel amyloid positron emission tomography tracers provide a close in vivo approximation of the corresponding AD histopathology [31]. Thus, using this information to identify AD and control training cases may further increase accuracies reported for different biomarkers.
A further aspect neglected in previous studies is the sensitivity of identified biomarkers to disease staging. Earlier studies have mostly focused on the categorical question of conversion versus non-conversion, without evaluating if the identified biomarkers also reflect disease staging as indicated, for example by the time to conversion to AD (TTC). This aspect might yet be essential to monitor progression in clinical trials focusing on early disease stages and because potential treatment is considered to be more beneficial for patients when loss of function is not yet strongly advanced.
Given that genetic risk, deterioration of cognition, Aβ deposition, and brain structural and functional biomarkers contribute to the diagnosis of AD, we systematically evaluated the potential of combinations of these factors to accurately stratify the MCI population according to risk of conversion to AD and disease staging. We hypothesize that a combination of biomarkers covering several genetic, behavioral, and neuropathological factors will provide higher sensitivity for early AD detection and disease staging as compared to the best-performing single-modality biomarker. Further, we hypothesize that the use of only amyloid negative healthy controls and amyloid positive AD for training the classifiers will further improve the discrimination accuracies for cMCI and sMCI.
METHODS
Subjects
All ADNI1, ADNI-GO and ADNI2 (ADNI: Alzheimer’s Disease Neuroimaging Initiative) data available as of December 2013 for AD patients, healthy control subjects (HC), and amnestic sMCI and cMCI patients with APOE genotype and neuropsychological evaluation were included in the study. Additionally, an imaging sub-cohort was identified from these data for which each of the following imaging biomarkers was available for at least one of the time points: structural magnetic resonance imaging (sMRI), [18F]fluorodeoxyglucose positron emission tomography (FDG-PET) and/or [18F]AV45-PET (florbetapir) (Table 1). To avoid biases in accuracies due to use of different amyloid compounds, we restricted our analyses to AV45-PET as a tracer with greater availability in the ADNI database [32]. For sMCI, an inclusion criterion of at least 2 years of follow-up was applied to ensure stability of the diagnosis over time. For cMCI, all three imaging modalities had to be available prior to or at conversion to AD. The final dataset for APOE and neuropsychology included data of 144 AD, 112 HC, 177 cMCI, and 265 sMCI, with overall 958 observations (number of subjects times number of visits) for MCI and 750 observations for AD and HC (Table 1, Supplementary Material 1).
Diagnosis of AD was based on National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association (NINCDS/ADRDA) criteria [33]. Imaging and genetic biomarkers evaluated in our study were not part of criteria used by the ADNI to establish diagnostic labels of MCI or AD. The study was conducted according to the Declaration of Helsinki. Written informed consent was obtained from all participants before protocol-specific procedures were performed.
Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies, and non-profit organizations, as a $60 million, 5-year public-private partnership. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment, can be combined to measure the progression of mild cognitive impairment (MCI) and early AD. Determination of sensitive and specific markers of very early AD progression is intended to aid researchers and clinicians to develop new treatments and monitor their effectiveness, as well as lessen the time and cost of clinical trials.
The Principal Investigator of this initiative is Michael W. Weiner, MD, VA Medical Center and University of California – San Francisco. ADNI is the result of efforts of many co-investigators from a broad range of academic institutions and private corporations, and subjects have been recruited from over 50 sites across the U.S. and Canada. The initial goal of ADNI was to recruit 800 subjects, but ADNI has been followed by ADNI-GO and ADNI-2. To date these three protocols have recruited over 1500 adults, ages 55 to 90, to participate in the research, consisting of cognitively normal older individuals, people with early or late MCI, and people with early AD. The follow-up duration of each group is specified in the protocols for ADNI-1, ADNI-2, and ADNI-GO. Subjects originally recruited for ADNI-1 and ADNI-GO had the option to be followed in ADNI-2. For up-to-date information, see www.adni-info.org.
Demographic and neuropsychological measures
Between-group differences in gender across all groups were evaluated using a chi-square test for independent samples. Analyses of variance (p < 0.05) and subsequent post-hoc t-tests (p < 0.05 Bonferroni corrected for multiple comparisons) were applied to evaluate differences in age, education, and neuropsychological scores. The following 6 neuropsychological scores were included in the classification analysis based on their availability for most of the subjects: Mini Mental State Examination (MMSE [34]), Geriatric Depression Scale (GDS [35]), Alzheimer’s Disease Assessment Scale (ADAS [36]), Rey Auditory Verbal Learning Test – immediate and delayed recall (RAVLT immediate and RAVLT delayed [37, 38]) and Functional Activities Questionnaire (FAQ [39]) (Table 1).
Imaging data
The MRI dataset included standard T1-weighted images obtained with different 1.5T and 3T scanner types using a three-dimensional magnetization-prepared rapid gradient-echo sequence varying in repetition time and echo time with in-plane resolution of 1.25 × 1.25 mm and 1.2 mm slice thickness. If both 1.5T and 3T data were available for the same time points, only the 3T data were used. Overall, there was a significantly higher proportion of 3T data in the AD group in the training data (p < 0.001). There was no significant difference in distribution of scanner types between cMCI and sMCI (p > 0.1). All images were corrected for distortions and B1-field non-uniformities as described on the ADNI website (http://adni.loni.usc.edu/).
FDG-PET and AV45-PET data were downloaded at the most advanced pre-processing stage (excluding smoothing) provided by ADNI. In brief, the pre-processing provided by ADNI included a within subject co-registration and averaging of all PET frames from the same time-point, interpolation to 1.5 mm cubic voxels and global mean intensity normalization. Detailed description of this pre-processing pipeline can be found on the ADNI website (http://adni.loni.usc.edu/methods/pet-analysis/pre-processing/) listed under point 3. Though other intensity normalization procedures have been shown to be more sensitive for differentiation of AD and HC subjects using FDG-PET [40–42], the choice of an optimum reference region is less clear for AV45-PET. To avoid systematic differences in pre-processing between the two PET modalities, we restricted our analyses to global mean intensity normalization. Similarly, correction of PET data for partial volume effects using structural MRIs acquired at the same time points can also improve their sensitivity for AD detection [43–45]. However, appropriate correction for these effects would require structural data acquired at the same time points. Due to a relatively low availability of AV45 and sMRI data for the same time points, applying this correction would have resulted in exclusion of a significant proportion of available PET data. To avoid this data loss, correction for partial volume effects was not applied in this study.
Pre-processing of imaging data
Pre-processing of all imaging data was performed in SPM8 (Wellcome Trust Centre for Neuroimaging, http://www.fil.ion.ucl.ac.uk/spm/) implemented in Matlab 7.12 (MathWorks, Inc, Sherborn, MA, USA). The pre-processing pipeline consisted of co-registration of all imaging modalities for the same visits of each subject, segmentation of sMRI data using NewSegment, spatial normalization using diffeomorphic image registration (DARTEL) [46] with subsequent affine registration into the Montreal Neurological Institute (MNI) space, and smoothing with a Gaussian kernel of 8 mm FWHM. To reduce computational time, the DARTEL template was computed from a random representative subsample of 300 scans. The obtained grey matter images were additionally modulated to preserve the total amount of signal from each region. All further analyses were restricted to a mask obtained by applying a probability threshold of 0.2 to the first and the last DARTEL templates co-registered to the MNI space [47]. To reduce the dimensionality of imaging data for the Bayesian feature selection procedure described below, all pre-processed images were downsampled to an isotropic resolution of 6 mm.
Feature selection
Imaging features reported in previous studies may be biased by differences in general characteristics of the cohorts (e.g., demographic factors or disease severity) or in image pre-processing (e.g., differential smoothing or spatial normalization). To ensure that the features used from the different imaging modalities for subsequent classification are not affected by such differences, we adopted our own feature selection approach for the current study. All feature selection steps for imaging data were performed using a subset of AD and HC data. The subset was selected from the whole training dataset and included only the earliest time points for AD and HC for which all three imaging modalities were available (AD: n = 38, HC: n = 93). This selection step was performed to ensure that exactly the same subjects were used to identify the most relevant features across the three imaging modalities. To avoid a bias towards a specific modality (e.g., using only amyloid positive AD and negative HC), all AD and HC subjects in this subset were used for feature selection.
Feature selection for imaging data was performed using a Bayesian Markov Blanket approach integrated in the Causal Explorer toolbox implemented in Matlab [48, 49]. In brief, the algorithm identifies features that are relevant for Bayesian separation between AD and HC subjects at a predefined statistical threshold. The setting for continuous data with conditioning set size of 0 was used for feature identification. A full Bonferroni corrected threshold of p < 0.05 was applied to identify most relevant sMRI and FDG-PET features. For AV45-PET this already conservative threshold resulted in a very high number of features covering the whole brain. To reduce the AV45-PET feature set to a comparable size as observed for FDG-PET and sMRI, a Bonferroni corrected threshold of p < 0.000001 was applied. The feature selection procedure resulted in identification of 13 clusters for FDG-PET, 13 clusters for AV45-PET, and 29 clusters for sMRI (Fig. 1, cluster images are provided in Supplementary 2). Mean values extracted from each of the identified clusters were used for subsequent classification. All cluster images will be published on nitrc.org upon acceptance of this manuscript.
Naïve Bayes Classification
We used a Naïve Bayes (NB) classification algorithm, as implemented in Matlab 7.12, to evaluate the predictive accuracy of different genetic, neuropsychological, and imaging biomarkers for differentiation between cMCI and sMCI. In brief, the NB approach provides a probability for each new case to belong to a particular class based on frequencies for categorical features and on means and standard deviations for continuous features as observed in training data. Similarly to a clinician-based decision, the NB approach considers all biomarkers as independent evidence for assignment to one of the diagnostic classes. A strong advantage of the NB classifier as compared to most other machine learning algorithms is its capability to deal with sparse, categorical, and continuous data and the posterior probability it provides for each new case to belong to a particular class. As the NB approach does not require any extensive parameter optimization, it also reduces the risk of overfitting the classifier to the training data.
NB classifiers were first built using all available AD and HC data separately for each of the modalities (APOE genotype, neuropsychological scores, AV45-PET, FDG-PET, and sMRI). In a further analysis, NB classifiers were then built for all possible combinations of imaging biomarkers with APOE genotype and neuropsychological profiles. For all classifiers, equal prior probability was set for AD and HC classes to avoid the risk that the classifier is biased by the differential numbers of training cases per class.
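The way such a classifier combines categorical and continuous evidence under equal class priors can be illustrated with a minimal Python sketch. This is not the Matlab implementation used in the study: the feature names (`apoe`, `fdg`), the toy values, the add-one smoothing of category frequencies, and the choice to skip missing features are all assumptions made for illustration.

```python
import math
from collections import Counter

CATEGORICAL = {"apoe"}  # hypothetical feature name; all other features are continuous


def fit_nb(rows, labels):
    """Estimate per-class frequencies (categorical) and mean/variance (continuous)."""
    levels = {f: {r[f] for r in rows if r[f] is not None} for f in CATEGORICAL}
    classes = {}
    for cls in sorted(set(labels)):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        stats = {}
        for feature in rows[0]:
            values = [r[feature] for r in cls_rows if r[feature] is not None]
            if feature in CATEGORICAL:
                stats[feature] = Counter(values)
            else:
                # Assumes at least two distinct values per class (non-zero variance).
                mean = sum(values) / len(values)
                var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
                stats[feature] = (mean, var)
        classes[cls] = stats
    return {"classes": classes, "levels": levels}


def posterior(model, row):
    """Posterior class probabilities for one case, with equal priors for all classes."""
    log_post = {}
    for cls, stats in model["classes"].items():
        lp = math.log(1.0 / len(model["classes"]))  # equal prior per class
        for feature, value in row.items():
            if value is None:
                continue  # assumption: a missing feature contributes no evidence
            if feature in CATEGORICAL:
                counts = stats[feature]
                n_levels = len(model["levels"][feature])
                # Add-one smoothing (an assumption) avoids zero probabilities.
                lp += math.log((counts[value] + 1) / (sum(counts.values()) + n_levels))
            else:
                mean, var = stats[feature]
                lp += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        log_post[cls] = lp
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {cls: math.exp(v - top) / norm for cls, v in log_post.items()}


# Synthetic training cases (not ADNI data).
train_rows = [
    {"apoe": "e4/e4", "fdg": 0.90}, {"apoe": "e3/e4", "fdg": 1.00}, {"apoe": "e4/e4", "fdg": 0.95},
    {"apoe": "e3/e3", "fdg": 1.30}, {"apoe": "e3/e3", "fdg": 1.25}, {"apoe": "e3/e4", "fdg": 1.35},
]
train_labels = ["AD", "AD", "AD", "HC", "HC", "HC"]
model = fit_nb(train_rows, train_labels)

full_case = posterior(model, {"apoe": "e4/e4", "fdg": 0.92})
sparse_case = posterior(model, {"apoe": "e4/e4", "fdg": None})
```

In this toy setting the case with both features is assigned to AD with near certainty, whereas the case with only the genotype available receives a more moderate AD probability, which mirrors how each biomarker acts as independent evidence.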
The obtained NB classifiers were then applied to MCI data having the same biomarker constellations. APOE genotype was treated as a categorical variable, while all other measurements were treated as continuous. Applying the NB classifiers to the MCI data resulted in one set of predicted labels for each biomarker constellation for the MCI subset having the corresponding biomarker measures. An assignment of sMCI as HC and of cMCI as AD was considered as correct. Balanced accuracies ((sensitivity+specificity)/2), sensitivities, specificities, receiver operating characteristics (ROC) curves, and the area under the curve (AUC) were computed based on predicted labels by each NB output. To evaluate the prognostic values of each biomarker combination for cMCI and because neuropsychological information is used to establish the AD diagnosis therewith inducing circularity in the classification problem, all metrics for this group were computed separately for biomarker data acquired before and at conversion to AD. Further, to test if the obtained balanced accuracies were significantly above chance, we ran permutation statistics (1000 permutations) for each biomarker constellation randomly shuffling the stable and converter MCI labels to the biomarker data and then computed balanced accuracy for each permutation. We then computed z-scores and corresponding p-values for the balanced accuracies obtained on real data relative to means and standard deviations obtained in randomly permuted data.
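The balanced accuracy and the permutation-based z-score described above can be sketched in Python as follows. The labels are synthetic, and the one-sided normal-approximation p-value is an assumption, since the text does not state whether a one- or two-sided test was used.

```python
import math
import random
import statistics


def balanced_accuracy(true_labels, predicted_labels):
    """(sensitivity + specificity) / 2, counting cMCI->AD and sMCI->HC as correct."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == "cMCI" and p == "AD" for t, p in pairs)
    fn = sum(t == "cMCI" and p == "HC" for t, p in pairs)
    tn = sum(t == "sMCI" and p == "HC" for t, p in pairs)
    fp = sum(t == "sMCI" and p == "AD" for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def permutation_z(true_labels, predicted_labels, n_perm=1000, seed=0):
    """Compare the observed balanced accuracy with a label-shuffled null distribution."""
    rng = random.Random(seed)
    observed = balanced_accuracy(true_labels, predicted_labels)
    shuffled = list(true_labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly reassign stable/converter labels
        null.append(balanced_accuracy(shuffled, predicted_labels))
    z = (observed - statistics.mean(null)) / statistics.stdev(null)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # one-sided upper-tail p-value (assumption)
    return observed, z, p


# Synthetic example: 10 converters (8 labelled AD) and 10 stable cases (9 labelled HC).
true_labels = ["cMCI"] * 10 + ["sMCI"] * 10
predicted_labels = ["AD"] * 8 + ["HC"] * 2 + ["HC"] * 9 + ["AD"] * 1
observed, z, p = permutation_z(true_labels, predicted_labels)
```

Because shuffling preserves the number of cases per class, sensitivity and specificity remain defined in every permutation.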
As AV45-PET was only added in ADNI2, the average time to conversion for these data was significantly lower. To control for this, we recomputed all sensitivity metrics for the testing data after matching them for TTC. Although the NB classifier is considered to be relatively robust regarding the number of training data, we aimed to exclude potential biases caused by these differences. For this reason, we repeated all training and classification procedures with the same, randomly drawn number of training cases as available for the biomarker constellation with the lowest number of cases.
Histopathological evaluation is still considered the gold standard for AD diagnosis. Thus, stratifying training data based on an in vivo biomarker of histopathology might improve its performance for early AD detection. Considering AV45-PET information as its in vivo approximation, we evaluated this possibility of using only data of amyloid positive AD and amyloid negative HC to train the classifiers. For these analyses, a previously reported threshold of 1.1 was applied to the mean AV45-PET standard uptake value ratio extracted from the selected clusters in the training dataset including only HC with a mean amyloid load below and AD patients with a mean above this threshold [50]. Applying this threshold resulted in an average exclusion of about 25% of the training data. Differences in accuracies, sensitivities, and specificities obtained using all versus amyloid thresholded data were evaluated using one-sample t-tests (p < 0.05 Bonferroni corrected for multiple comparisons) assuming no differences between the classifiers. As classification based on AV45-PET data might be differentially biased by application of an amyloid threshold for selection of training data, one-sample t-tests were performed separately for classifiers with and without this biomarker.
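The selection of amyloid-validated training cases can be sketched in a few lines of Python. The record layout (`diagnosis`, `suvr`) and the values are hypothetical, and cases lying exactly on the threshold are excluded here because the text only specifies "below" for HC and "above" for AD.

```python
AMYLOID_THRESHOLD = 1.1  # mean AV45-PET SUVR threshold reported in the text


def amyloid_validated(cases, threshold=AMYLOID_THRESHOLD):
    """Keep only amyloid positive AD and amyloid negative HC training cases."""
    kept = []
    for case in cases:
        if case["diagnosis"] == "AD" and case["suvr"] > threshold:
            kept.append(case)
        elif case["diagnosis"] == "HC" and case["suvr"] < threshold:
            kept.append(case)
    return kept


# Synthetic training cases (not ADNI data).
training = [
    {"id": 1, "diagnosis": "AD", "suvr": 1.35},
    {"id": 2, "diagnosis": "AD", "suvr": 1.02},  # amyloid negative AD: excluded
    {"id": 3, "diagnosis": "HC", "suvr": 0.98},
    {"id": 4, "diagnosis": "HC", "suvr": 1.24},  # amyloid positive HC: excluded
]
validated = amyloid_validated(training)
excluded_fraction = 1 - len(validated) / len(training)
```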
To illustrate the contribution of the APOE genotype, both the training and the testing dataset were stratified by the APOE allele combinations computing the relative proportion of either AD or cMCI in the respective populations. Lastly, we evaluated the possibility to use all biomarker combinations to predict the time to AD diagnosis as an index of future cognitive decline. For this we computed regression analyses to predict TTC using z-transformed probabilities provided by the NB classifiers for cMCI data for each classifier and using only biomarker data acquired before conversion to AD. To provide a more quantitative metric of the predictive power of each classifier for TTC, we reported Pearson’s correlation coefficients between observed TTC values and those predicted by the regression analyses. A squared Pearson’s correlation coefficient (determination coefficient) provides the percentage of variance explained in the target variable by the variables used as predictors.
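The TTC analysis described above can be illustrated with a minimal Python sketch: classifier probabilities are z-transformed, a simple linear regression predicts TTC, and the Pearson correlation between observed and predicted TTC is squared to give the explained variance. The probabilities and TTC values below are synthetic, and a single-predictor least-squares fit is an assumption about the form of the regression.

```python
import math
import statistics


def zscore(values):
    """Z-transform a list of values using the sample standard deviation."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def pearson(a, b):
    """Pearson correlation coefficient between two equally long lists."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


# Synthetic cMCI cases: NB probability of the AD class and months until conversion.
nb_probability = [0.55, 0.62, 0.70, 0.81, 0.90, 0.97]
ttc_months = [36, 30, 24, 24, 12, 6]

z_prob = zscore(nb_probability)
slope, intercept = fit_line(z_prob, ttc_months)
predicted_ttc = [slope * z + intercept for z in z_prob]
r = pearson(ttc_months, predicted_ttc)
explained_variance = r ** 2  # determination coefficient
```

By the same arithmetic, a correlation of r = 0.65 corresponds to a determination coefficient of 0.65² ≈ 0.42, i.e., about 42% of the variance in TTC.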
To enable a clearer interpretation of all above mentioned analyses, we have further ranked all biomarker constellations by sensitivities matched for TTC, specificities and correlations with observed TTC. All biomarker combinations were then sorted by the average rank of these three metrics.
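The ranking step can be sketched in Python as follows. The combination names and metric values are hypothetical, and assigning tied values their average rank is an assumption, as the text does not describe tie handling.

```python
def rank_descending(values):
    """Rank 1 is the highest value; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


# Hypothetical values: (TTC-matched sensitivity, specificity, correlation with TTC).
metrics = {
    "combination_A": (0.90, 0.60, 0.65),
    "combination_B": (0.80, 0.86, 0.70),
    "combination_C": (0.70, 0.70, 0.10),
}
names = list(metrics)
per_metric_ranks = [rank_descending([metrics[n][m] for n in names]) for m in range(3)]
average_rank = {n: sum(r[i] for r in per_metric_ranks) / 3 for i, n in enumerate(names)}
sorted_names = sorted(names, key=lambda n: average_rank[n])
```

In this toy example, combination_B comes first because it ranks well on all three metrics, even though combination_A has the highest sensitivity.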
RESULTS
Demographic and neuropsychological results
The groups did not differ with respect to age and gender (Table 1). There was a significant difference in education. Post-hoc t-tests revealed differences in education only between AD and HC (t(254) = 3.5; p = 0.001) but not between cMCI and sMCI (t(440) = 0.2; p = 0.875). Comparisons of neuropsychological scores revealed significant between-group differences in all six measures (Table 1). Subsequent post-hoc t-tests revealed significant differences between AD and HC (all p < 0.001) in all measures. When comparing cMCI and sMCI, all measures except for GDS (p = 1.0) were also significantly different.
Classification results
Classification results for all biomarker combinations are displayed in Table 2 and Fig. 2. All classifiers performed significantly above chance level for differentiating between sMCI and cMCI (all p < 0.01). For single biomarkers, highest balanced accuracy (74.5%), specificity (83.9%), and AUC (0.824) were obtained using FDG-PET only (Fig. 3a). In contrast, highest sensitivity of 85.4%, but at the cost of a very low specificity (52.4%), was obtained using a classifier based on neuropsychological scores. Adjustment for TTC resulted in an even higher sensitivity of 92.9% for this classifier (Table 2). A strong increase in sensitivity when adjusting for TTC was also observed for sMRI. Lowest single modality classifier performance with a balanced accuracy of 59.5% was obtained for APOE followed by AV45-PET with 63.5%. At time of conversion, highest sensitivity of 100% for single biomarkers was obtained for neuropsychological scores followed by FDG-PET with 90% being the most sensitive imaging biomarker.
When evaluating all possible combinations of imaging biomarkers with APOE and neuropsychological information, highest balanced accuracy of 85% and highest sensitivity of 100% were obtained for the combination of AV45-PET and sMRI with neuropsychological scores (Table 2, Fig. 3b). In contrast, when adjusting the testing data for TTC, 86.8% was the highest accuracy observed for the combination of APOE, FDG-PET, and sMRI. This combination also showed the highest specificity of 86.1% and an AUC of 0.84. The lowest balanced accuracy (60.6%) was obtained for the combination of APOE with AV45- and FDG-PET. All classifier results were comparable when matching for size of the training cohorts (Table 3).
The use of only amyloid negative controls and amyloid positive AD did not significantly change accuracies [t(16) = –2.024; p = 0.36], sensitivities [t(16) = –2.083; p = 1.0], and specificities [t(16) = –2.083; p = 0.3240] for classifiers that did not include AV45-PET (Table 4). The only strong change for these classifiers was observed for the combination of neuropsychological profiles with FDG-PET and sMRI information for which the sensitivity increased by 10% whilst the specificity decreased by 14%. In contrast, significant changes with average sensitivity increases by 7% [t(17) = –4.244; p = 0.006] and specificity decreases by 12% [t(17) = –7.965; p < 0.001] were observed for biomarker combinations that included AV45-PET. Differences in accuracies, though on average lower for amyloid thresholded data, were not significant [t(17) = –1.940; p = 0.42].
When using z-transformed NB probabilities as predictors for TTC, several biomarker combinations showed a significant association with TTC. The strongest significant correlations with TTC of r = 0.65 (p < 0.001) were observed when using classifier predictions based on neuropsychological scores and either FDG-PET or sMRI (Table 2, Fig. 4b, c). For output of single modality classifiers, the strongest and only significant correlation with TTC (r = 0.41) was observed for neuropsychological profiles (Fig. 4a).
DISCUSSION
In this study we demonstrated that a fully independent classifier built only on AD and HC data, which includes imaging, genetic and neuropsychological biomarkers, can reliably discriminate between sMCI and cMCI outperforming previously reported accuracies. We show that combinations of biomarkers reflecting several pathophysiological mechanisms, genetics and cognition provide greatest sensitivities in the MCI population. Further, we identify biomarker combinations providing very accurate estimations of TTC as an indicator of future disease progression. By controlling all of the evaluated combinations for potential differences in TTC and size of the training data we additionally account for some known aspects which might have biased the observed accuracies. In the single biomarker setting, highest sensitivity and strongest association with disease staging is found for neuropsychological information. In contrast, highest specificity and the overall accuracy are achieved by FDG-PET.
By controlling for TTC and combining APOE with structural and glucose imaging, we obtain an accuracy of about 87% for differentiation between cMCI and sMCI, outperforming all other combinations evaluated in our study. The observed accuracy also substantially outperforms most previously reported accuracies for this discrimination problem [10, 51–57]. The improved discrimination when adding APOE to both imaging modalities can be explained by its known strong positive and negative predictive value for particular allele combinations as illustrated in Fig. 5 [2, 58–60]. Furthermore, these results demonstrate that a known genetic risk factor combined with neuropsychological information and two in vivo measures of neuropathological mechanisms like brain atrophy and neurodegeneration better predict the final phenotype of conversion. Importantly, as compared to most previous studies the high accuracy was achieved using fully independent training and testing data therewith reducing the risk of overfitting. Though differential sensitivities of different combinations of imaging, genetic, and neuropsychological biomarkers for discrimination between cMCI and sMCI were repeatedly reported in previous studies, these estimates were mostly computed in different MCI subpopulations, e.g., not each patient had each imaging biomarker. This aspect limits the comparability of accuracies of different imaging biomarker combinations due to potential differences in diagnostics, training sets, or other between-group differences in clinical or demographic characteristics across the MCI populations included for different modalities. By evaluating all imaging biomarkers in the same MCI subjects we account for these potential biases. We identify FDG-PET as the most accurate single modality biomarker differentiating between cMCI and sMCI. 
This finding is consistent with conclusions of a recent comprehensive meta-analysis reporting higher accuracies for FDG-PET as compared to other imaging and clinical biomarkers to detect AD related pathology [61].
Also consistent with previous studies, we find that a combination of FDG-PET and sMRI results in a substantially improved accuracy for early AD detection [47, 62–64]. Adding APOE genotype to this combination further increases the observed accuracy. This combination also results in the highest specificity of 86%. We observed a similarly high accuracy for the combination of neuropsychological profiles with AV45-PET and sMRI. However, this good performance is mostly driven by a very high sensitivity whilst the specificity is comparably low. Correspondingly, these two combinations of biomarkers might provide alternative enrichment strategies for clinical trials where high sensitivity or specificity might be prioritized.
Most importantly, for the first time we identified biomarker combinations which not only allow a very accurate discrimination of cMCI and sMCI but are also strongly predictive, at the individual subject level, of future cognitive decline as indicated by TTC. In a single biomarker setting only neuropsychological scores are a significant predictor of future disease as indicated by a 0.4 correlation with TTC. However, combining these with either FDG-PET or sMRI increases the explained variance in TTC to above 40% (squared Pearson correlation coefficients). These findings suggest that both biomarker combinations are highly sensitive to future disease progression. This aspect might be particularly important in clinical trials aiming to identify MCI patients at earlier and/or more homogeneous disease stages. Though many previous studies focused on identification of biomarker combinations that increase the risk of conversion to AD, only few of the studies so far also evaluated the link between identified biomarkers and TTC [22, 53]. By focusing on hazard ratios these studies identified risk factors associated with TTC at group level. These factors do not yet necessarily allow an accurate prediction of progression for individual patients. Furthermore, none of the above-mentioned studies performed an exhaustive comparison of different imaging, genetic and neuropsychological biomarkers to identify constellations that are most sensitive to TTC. The strongly significant association identified in our study for the combinations of neuropsychological scores with either FDG-PET or sMRI information indicates a high potential of these modality combinations to provide prognostic information for individual MCI patients. Besides this already highly important information for the patients, the established relationships can also be applied in clinical trials to identify MCI patients at particular disease stages.
Several promising phase III studies targeting mechanisms in AD have recently failed [65–67], and post-hoc analyses of these failures suggest that the inclusion of AD patients at quite advanced disease stages might partially explain the lack of observed treatment effects [68]. More recent AD trials therefore aim to focus on prodromal AD stages as the primary intervention window [69]. The capability to accurately diagnose AD at its early stages might therefore be crucial for their success, and the identified biomarker combinations might provide a sensitive stratification mechanism to identify patients at earlier disease stages in future clinical studies.
Contrary to our prior expectation of obtaining more accurate classifiers when using only amyloid-positive AD and amyloid-negative HC to train the classifiers, no benefit in terms of accuracy is observed for discrimination of cMCI and sMCI. Interestingly, we see a strong dissociation between classifiers with and without AV45-PET in terms of the obtained sensitivities and specificities. Whilst no major changes are observed for classifiers not including AV45-PET, we see a strong shift towards increased sensitivity and decreased specificity in classifier combinations including this biomarker. Previous epidemiological studies have shown that patients with amnestic sMCI, as included in the ADNI, are at high risk of conversion to AD when followed for a period of up to 10 years [9, 70]. The classification of a higher percentage of sMCI patients as AD might therefore in fact more closely reflect the true differentiation between AD and non-AD MCI than the criterion of a stable follow-up of two years that we apply for sMCI. However, these considerations remain speculative until sMCI populations with longer follow-up than the one included in our study become available.
Lastly, we observed that combining AV45- and FDG-PET information even reduced accuracy compared to the single biomarkers. This finding suggests a low consistency between these two imaging modalities in the evaluated MCI population. A potential reason for this might be that amyloid deposition is rather dissociated from disease progression as reflected by functional imaging markers. The lack of clinical benefit in pharmacological trials aiming at amyloid clearance, despite successful reduction of these deposits, supports this assumption [71–73].
Even though in the present study we aimed to account for most potential limitations and biases common to these types of studies, several limitations still need to be considered when interpreting the reported findings. First of all, in our effort to identify a homogeneous subpopulation of the ADNI cohort having the constellation of all biomarkers included in our study, we had to discard a large amount of data available in the ADNI dataset. In particular, for the constellations of biomarkers including AV45-PET and for data at the time point of conversion, applying these filtering criteria resulted in a very low and varying number of MCI testing cases depending on the biomarker constellation. The data loss is mostly due to the fact that AV45-PET was only included in ADNI-GO and ADNI-2, and to the sparse acquisition of some of the imaging measures. For this reason, we limited our discussion of the accuracies obtained for data at the time point of conversion, as they need to be validated in samples that are substantially larger than those evaluated in the current study. Correspondingly, the low numbers of testing data need to be considered when interpreting sensitivities obtained using the affected combinations. Because of the varying and small numbers of testing cases for each biomarker constellation, we also did not directly compare the classifier performances to each other but only to chance-level performance. This formal testing needs to be performed when a sufficiently large amount of data becomes available, covering all of the studied modalities in exactly the same MCI population.
A second limitation of our study is related to the pre-processing pipelines applied to the imaging data. Numerous studies, including our own previous work, have provided evidence that particular pre-processing steps omitted in our study, e.g., partial volume effect correction or adjustment for age-related effects, can further improve the sensitivity of the single imaging modalities for discrimination between AD and HC or cMCI and sMCI [40, 74]. Due to the high sparsity of the available imaging data, applying these pre-processing steps would have resulted in further exclusion of a substantial amount of imaging data, eventually leading to a very limited sample size of the training and testing datasets. As demonstrated by the earlier studies cited above, better optimized pre-processing pipelines should further increase the accuracies observed here for the single-biomarker classifiers. Therefore, if anything, our results are likely to underestimate the achievable accuracies.
A further limitation of our study is the pre-selection of neuropsychological and clinical tests used to differentiate between stable and converter MCI. Our major motivation for pre-selecting tests from the extensive test battery included in the ADNI was to cover the major domains affected in AD with a reasonable number of tests that could be integrated in a standard clinical setting. However, inclusion of other neuropsychological and functional measures might further increase the accuracies achievable with these types of biomarkers.
Lastly, though we validated the obtained classifiers using fully independent testing data, the reported classifier performances remain limited to the ADNI dataset with its restrictive inclusion and exclusion criteria. They are therefore likely to overestimate the accuracies achievable in a standard clinical setting in the presence of other possible dementia syndromes [75].
To summarize, in our study we provide strong evidence that fully automated classifiers based on a combination of imaging, genetic, and/or neuropsychological biomarkers can reliably and very accurately discriminate between stable and converter MCI. Further, we demonstrate the high sensitivity of some of the identified biomarker combinations to future disease progression as indicated by the time to conversion to AD. The results of our study further confirm the high degree of pathological and clinical heterogeneity of AD [76], thus suggesting that the combined use of genetic, imaging, and neuropsychological biomarkers in the framework of endophenotypes for this disorder could increase the power of identifying individuals at risk for conversion. Notably, these biomarker combinations could be used for enrichment of clinical trials to identify MCI patients at earlier AD stages.
ACKNOWLEDGMENTS
Juergen Dukart, Fabio Sambataro, and Alessandro Bertolino are full-time employees of F. Hoffmann-La Roche, Basel, Switzerland. The authors received no specific funding for this work. F. Hoffmann-La Roche provided support in the form of salary for authors Juergen Dukart, Fabio Sambataro, and Alessandro Bertolino, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen Idec, Inc.; Bristol-Myers Squibb Company; Eisai, Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Synarc, Inc.; and Takeda Pharmaceutical Company. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
