Abstract
Keywords
INTRODUCTION
Alzheimer’s disease (AD) is a progressive neurodegenerative disorder and the most prevalent cause of dementia. Cognitive decline in AD starts many years before diagnosis and accelerates with disease progression [1]. Currently there are no available treatments for modifying the course of the disease, although several clinical trials are underway [2–4]. Mild cognitive impairment (MCI) can be a transitional stage from asymptomatic preclinical AD to clinical dementia due to AD. Incidence of progression to AD in individuals with MCI is estimated to be between 10–15% per year [5]. But not all MCI subjects develop AD; some may remain cognitively stable for many years and others may experience improvement in cognition [6]. It is extremely important to develop criteria that can be used to separate the MCI subjects at imminent risk of conversion to an AD diagnosis from those who will remain stable so that disease modifying treatments, when they become available, can be applied as early as possible and also for reducing the cost of future clinical trials by targeting individuals with MCI with AD as the underlying cause.
In assessing the risk of conversion from MCI to AD multiple variables and factors can be considered including age, sex, ethnicity, family history, genetic information, brain imaging data, cerebrospinal fluid (CSF) biomarkers, clinical information, and cognitive test scores. In addition, the progressive nature of AD means that the rate of change of some of these variables with time can be informative for diagnosis. The complex interactions between these variables make it very difficult for humans to process this information for prediction purposes. For this reason, researchers have been developing advanced pattern recognition algorithms to teach computers to predict incipient Alzheimer-type dementia in subjects with MCI with varying degrees of success [5, 7–14]. A recent review of literature with regard to multivariate analysis and machine learning in AD research is given by Falahati et al. [15].
We have recently developed a novel measure of hippocampal volumetric integrity (HVI) based on structural MRI [16]. The HVI is an estimate of the fraction of brain parenchyma in a predefined adaptive hippocampal region of interest (ROI). Lower values of this normalized measure indicate higher hippocampal atrophy. The advantages of HVI are that it takes less than one minute to compute bilaterally and fully automatically, is obtained from raw MRI volumes without requiring any preprocessing, and does not require correction for intracranial volume. Using HVI and its rate of change with respect to time as features in a linear support vector machine (SVM) algorithm achieved 97% accuracy in separating healthy controls from AD patients in an independent cohort [16]. The main objective of this paper is to determine the efficacy of HVI as a feature in the more clinically interesting problem of classification between MCI subjects who would remain stable (stable MCI or sMCI) and those who would progress to develop AD (progressive MCI or pMCI).
The rate of progression of neurodegeneration in AD can only be measured longitudinally and may enable more specific and sensitive characterizations of the disease. For example, there is evidence that the trajectory of cognitive performance may be a better indicator of an imminent AD diagnosis than the level of cognitive performance at a given time [17]. Despite this, the majority of previous approaches for predicting AD in MCI have been based on static measures, for example, structural MRI at baseline only [5, 7–14]. Therefore, in this paper we use longitudinal structural MRI and neuropsychiatric tests for sMCI versus pMCI classification.
The Random Forest machine learning algorithm [18] is a powerful technique that has not been as widely applied to AD diagnosis as other algorithms [15]. In a comparison of 10 supervised classification algorithms based on 8 different performance measures and application to 11 binary classification problems, the Random Forest method was a close second-best performer [19]. Recent applications of Random Forest classification to the problem of MCI-to-AD prediction have been promising [13, 14].
In the present study, we applied a Random Forest classification algorithm using longitudinal measurement of HVI, a novel index of hippocampal volumetric integrity derived from structural MRI, cognitive test scores, and demographic and genetic information in order to predict whether or not a patient with MCI is at imminent risk of incipient AD. Acronyms used in this paper are defined in Table 1.
MATERIALS AND METHODS
Study subjects
Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. For up-to-date information, see http://www.adni-info.org.
The participants in this study were MCI subjects in the “Complete year 1 visits” ADNI-1 standardized dataset [20]. The subjects’ supplementary data were accessed in October 2013. There are 311 MCI subjects in this cohort. To be classified as MCI in ADNI-1, subjects had Mini-Mental State Examination (MMSE) scores between 24–30 (inclusive), a memory complaint, objective memory loss measured by education adjusted scores on Wechsler Memory Scale Logical Memory II, a Clinical Dementia Rating (CDR) of 0.5, absence of significant levels of impairment in other cognitive domains, essentially preserved activities of daily living, and an absence of dementia.
We further classified the MCI subjects into two groups. Those whose diagnoses were MCI at baseline and remained so for the duration of their observation period of at least 3 years were labeled as stable MCI (sMCI). Those whose diagnoses were MCI at baseline and at one-year follow-up but were subsequently diagnosed as AD some time during their remaining observation period were labeled as progressive MCI (pMCI). At the time of their probable AD diagnosis, the pMCI subjects had MMSE scores between 20–26 (inclusive), CDR of 0.5 or 1.0, and met NINCDS-ADRDA criteria for probable AD. Based on these definitions, our cohort was reduced to 78 sMCI (54 men and 24 women) and 86 pMCI (55 men and 31 women) subjects.
Each of the 164 subjects underwent two structural MRI scans. One scan was conducted at baseline and a second follow-up scan at approximately one year later. In addition, the subjects underwent two neuropsychiatric test batteries, again one set of tests given near the time of the baseline MRI scan and repeated at approximately one year later near the time of the follow-up MRI scan. More information about the MRI scans and the neuropsychiatric test battery is given in the following two subsections.
MRI data
We downloaded baseline and nominal one-year follow-up 3D MRI volumes for each of the 164 subjects in our study (328 volumes in total). The volumes were raw (as acquired) with no preprocessing performed. The mean (±SD) baseline to follow-up interval was 1.00 (±0.05) year. All MRI volumes were acquired on 1.5 Tesla scanners using T1-weighted magnetization prepared rapid gradient echo pulse sequences with matrix sizes 192×192×160–170 or 256×256×166–184, in-plane voxel resolutions 0.94 to 1.25 mm, 1.2 mm slice thickness, repetition times 2300–2400 ms for multi-coil phased array head coils and 3000 ms for birdcage or volume coils, inversion time 1000 ms, and 8° flip angle. More details about the MRI acquisition protocol are given in [21].
Neuropsychiatric tests
We also downloaded results of longitudinal neuropsychiatric tests. A set of tests was given near the time of the baseline MRI scan and repeated approximately one year later near the time of the follow-up MRI scan. The test battery included the following: (1) The MMSE [22], ranging from 0 to 30, where a score of 20 to 24 suggests mild dementia, 13 to 20 suggests moderate dementia, and less than 12 indicates severe dementia; (2) The Clinical Dementia Rating Sum of Boxes (CDRSB) [23, 24], the sum of scores in each of six domains of functioning: memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care, where the score in each domain ranges from 0 to 3. Thus, the CDRSB ranges from 0 (no impairment) to 18 (severe impairment in all six domains); (3) The 11-item Alzheimer’s Disease Assessment Scale-cognitive subscale (ADAS-Cog) [25], which ranges from 0 to 70 with higher scores reflecting greater cognitive impairment; and (4) The modified ADAS-Cog 13-item scale [26], which adds to ADAS-Cog a number cancellation task and a delayed free recall task, for a total of 85 points where higher scores indicate greater impairment.
Hippocampal volumetric integrity (HVI)
Details of the algorithm for HVI computation are given in [16]. Briefly, HVI is the fraction of a region expected to encompass the hippocampus in a normal brain that is occupied by tissue (rather than CSF). The fully automated, fast, reliable and robust process is based on 3D T1-weighted structural MRI and involves identification of the mid-sagittal plane [27] and the anterior and posterior commissures [28] on the MRI scan, from which a rigid-body transformation is performed to a standard orientation. Once in standard space, based on a priori training, 230 landmarks in the vicinity of the hippocampi are detected by template matching, from which two (one for each hemisphere) 12-parameter affine transformations are computed. The composite (rigid-body + affine) transformations are applied to probabilistic left and right hippocampi labels determined based on manual tracings of hippocampi on scans from 65 normal subjects. Thus, a volume is determined (separately for each hemisphere) that is expected to encompass the hippocampus in a normal brain. Finally, an automated histogram analysis method using the expectation maximization (EM) algorithm is used to determine the fraction of this region that is occupied by brain tissue (rather than CSF). This fraction is termed the HVI. The histogram analysis method is illustrated in Fig. 1. Its purpose is to determine a CSF intensity threshold $I_{\mathrm{CSF}}$. To this end, a Gaussian mixture model with 5 terms is fitted to the histogram of the voxel intensities in the hippocampus ROI using the EM algorithm. In Fig. 1, the voxel intensity histogram is given by the jagged line, whereas the EM fit is given by the smooth thicker line. The $(1 - \alpha)$th percentile value of the histogram is denoted by $I_\alpha$ (default $\alpha = 0.25$). A gray matter intensity peak $I_{\mathrm{gm}}$ is found as the peak of the EM fit of the histogram in the intensity region $[c I_\alpha, I_\alpha]$ (default $c = 0.4$), and $I_{\mathrm{CSF}}$ is defined as $I_{\mathrm{CSF}} = I_{\mathrm{gm}} - \gamma I_\alpha$ with default $\gamma = 0.2$. The HVI is essentially the area under the histogram curve for voxel intensities above $I_{\mathrm{CSF}}$.
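The thresholding step described above can be illustrated with a minimal sketch. It is not the KAIBA implementation: the 5-term Gaussian mixture fitted by EM is replaced here by a 3-bin moving average of the histogram, the percentile is read as the 100(1 − α)th percentile, and the function name, the bin count, and the synthetic intensities are all assumptions made for illustration.

```python
def tissue_fraction(intensities, alpha=0.25, c=0.4, gamma=0.2, n_bins=50):
    """Fraction of ROI voxels above a CSF threshold (simplified sketch).

    Assumption: a moving-average histogram stands in for the EM mixture fit.
    """
    values = sorted(intensities)
    n = len(values)
    # I_alpha: intensity at the 100*(1 - alpha) percentile (assumed reading).
    i_alpha = values[min(n - 1, int((1.0 - alpha) * n))]

    # Histogram of the ROI intensities.
    lo, hi = values[0], values[-1]
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        counts[min(n_bins - 1, int((v - lo) / width))] += 1

    # Smoothed histogram (stand-in for the EM fit) and bin centres.
    smooth = []
    for b in range(n_bins):
        window = counts[max(0, b - 1): b + 2]
        smooth.append(sum(window) / len(window))
    centers = [lo + (b + 0.5) * width for b in range(n_bins)]

    # I_gm: peak of the smoothed histogram inside [c * I_alpha, I_alpha].
    candidates = [b for b in range(n_bins) if c * i_alpha <= centers[b] <= i_alpha]
    if not candidates:
        raise ValueError("no histogram bin falls inside the search interval")
    i_gm = centers[max(candidates, key=lambda b: smooth[b])]

    # I_CSF = I_gm - gamma * I_alpha; tissue is everything above it.
    i_csf = i_gm - gamma * i_alpha
    return sum(1 for v in values if v > i_csf) / n


# Synthetic ROI: 200 dark (CSF-like) voxels and 800 brighter (tissue-like) voxels.
roi = [10] * 200 + [90] * 300 + [100] * 300 + [110] * 200
hvi = tissue_fraction(roi)
```

Because the result is a ratio of voxel counts within the ROI, it does not depend on the absolute size of the region, which is the property the text relies on when it describes HVI as a normalized measure.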
We computed the HVI for the right and left hippocampi on the baseline and follow-up MRI scans for all 164 subjects. Software (KAIBA) for computing the HVI is available online at www.nitrc.org/projects/art.
Feature space for machine learning
In machine learning, observations obtained from objects or individuals (in this application the MCI patients) to be classified into groups are referred to as features. For each individual, the features are collected into a vector referred to as the feature vector. The dimension of this vector, which we denote by d, equals the number of observations collected from each subject. The d-dimensional feature vectors can be thought of as single points in a d-dimensional abstract space, which is referred to as the feature space. The aim of machine learning algorithms is to define boundaries in the feature space, referred to as decision boundaries, that best separate the individuals into the defined classes (in this application sMCI or pMCI). In this work, we use the notation $x^{(i)}$ to describe the feature vector for a given subject i. In this section we summarize the d features that comprise our feature vectors.
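In total, for classification purposes we used d = 16 features measured from each MCI subject as shown in Table 2. Features 1–4 (i.e., the first four components of the feature vector) were subjects’ age, years of education, number of apolipoprotein E ɛ4 alleles (APOE4), and sex. The APOE4 was defined as a categorical variable with three levels (0, 1, and 2) that indicates the number of ɛ4 alleles carried by the subject. Features 5 and 6 were the average MMSE [i.e., (MMSE@Baseline + MMSE@Followup)/2] and the average rate of change of MMSE with respect to time from baseline to follow-up, denoted by ΔMMSE/Δt and given by:
\[ \frac{\Delta \mathrm{MMSE}}{\Delta t} = \frac{\mathrm{MMSE@Followup} - \mathrm{MMSE@Baseline}}{\Delta t}, \]
where Δt is the time interval between the baseline and follow-up assessments. The remaining features were defined analogously as pairs of a level and a rate of change: features 7–12 for the other cognitive scores (CDRSB, ADAS11, and ADAS13) and features 13–16 for the left and right HVI, with the even-numbered feature in each pair being the rate of change (Table 2).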
To summarize, from each of the 164 MCI subjects in the study, we obtained values for the 16 variables described in the previous paragraph. The variables age, years of education, APOE4 status, sex, MMSE, CDRSB, ADAS11, and ADAS13 were downloaded directly from ADNI. The baseline and follow-up values of the left and right HVI were computed by KAIBA using baseline and one-year follow-up MRI scans downloaded from ADNI. Our aim was to train a machine learning algorithm that uses these 16 measures, that is, the feature vector, to predict whether an MCI subject would remain stable or would convert to AD. For this purpose, we used the Random Forest algorithm described in the following section.
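As an illustration of how a longitudinal measure is turned into a pair of features, the sketch below computes the average level and the average rate of change from a baseline value, a follow-up value, and the interval between them. The function name and the example scores are hypothetical and are not taken from the study data.

```python
def level_and_rate(baseline, followup, interval_years):
    """Return (average level, average rate of change per year) for one measure."""
    if interval_years <= 0:
        raise ValueError("follow-up must come after baseline")
    level = (baseline + followup) / 2.0
    rate = (followup - baseline) / interval_years
    return level, rate


# Hypothetical subject: MMSE of 27 at baseline and 25 exactly one year later.
mmse_level, mmse_rate = level_and_rate(27, 25, 1.0)
```

The same function would apply unchanged to the other longitudinal measures (cognitive scores and left and right HVI), with a negative rate indicating decline for MMSE and HVI and a positive rate indicating worsening for scales where higher scores mean greater impairment.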
Random Forest algorithm
In this paper, we utilize the Random Forest algorithm [18], implemented in the randomForest R package [29], as a supervised binary (sMCI versus pMCI) classification algorithm. In this context, a dataset $D = \{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(n)}, y^{(n)})\}$ is given, where n is the number of subjects, $y^{(i)} \in \{0, 1\}$ represents the subject label (e.g., 0 for sMCI and 1 for pMCI), and $x^{(i)}$ are d-dimensional feature vectors measured from each subject. In this study, the dimension of the feature space d is 16. In general, the features can be numerical variables (e.g., age) or categorical (e.g., sex or APOE4 genotype). A supervised classifier essentially uses the training data D to define a function $y = f(x; D)$ which assigns a label y to a subject with a feature vector x for whom the true label is unknown at the time of classification. The decision tree approach is a classical supervised method for defining $f(x; D)$ (i.e., training a classifier using D). However, a single decision tree classifier may have a large variance, that is, a small change in the feature vector x could change the assigned label y. To overcome this instability, the Random Forest algorithm [18] uses the method of bootstrap aggregation (also known as bagging) where, instead of a single decision tree, an ensemble of B trees (B = 5000 in our application) is trained, hence the term forest. In this method, the decision function of each tree, denoted by $y_b = f(x; D_b)$ ($b = 1, 2, \ldots, B$), is trained using a bootstrap sample $D_b$ from the complete dataset D (i.e., $D_b$ is obtained by sampling from D with replacement), where the size of $D_b$ is also n. Another element of the Random Forest method is that, when choosing a variable and a corresponding threshold value to guide the decision flow at any given node of a tree, the entire feature space is not used. Rather, a random subspace of size m < d of the feature space is first selected by sampling without replacement from $\{1, 2, \ldots, d\}$, and the node variable is then selected from within this subspace. The size of the random subspace (m) is a parameter of the algorithm for training the Random Forest classifier, with the default value of $m = \lfloor \sqrt{d} \rfloor$ (i.e., m = 4 for d = 16). In summary, the Random Forest method trains an ensemble of B decision trees using bagging and random subspaces. Given a feature vector x with an unknown label, classifications $\{f(x; D_1), f(x; D_2), \ldots, f(x; D_B)\}$ are made by all B trees in the forest and a final label y is estimated by aggregating the results, usually by a majority vote.
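The three ingredients summarized above (bootstrap samples, random subspaces, and majority voting) can be sketched in a few lines. This is a deliberately reduced illustration and not the randomForest R package used in the study: each "tree" is a single-split stump, so the random subspace is drawn once per tree rather than at every node, and all function names and the toy data are assumptions.

```python
import math
import random
from collections import Counter


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Choose the (feature, threshold) in feature_ids with the lowest weighted Gini."""
    best = None
    for j in feature_ids:
        values = sorted({row[j] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2.0  # midpoint between neighbouring observed values
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    if best is None:  # every candidate feature is constant in this sample
        label = majority(y)
        return (feature_ids[0], math.inf, label, label)
    return best[1:]


def train_forest(X, y, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random subspace."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(1, math.isqrt(d))  # subspace size floor(sqrt(d))
    forest = []
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        features = rng.sample(range(d), m)          # m features without replacement
        forest.append(fit_stump([X[i] for i in bag], [y[i] for i in bag], features))
    return forest


def predict(forest, x):
    """Aggregate the individual tree decisions by majority vote."""
    votes = [left if x[j] <= t else right for j, t, left, right in forest]
    return majority(votes)


# Toy data with d = 4 features: class 0 near 0, class 1 near 10.
X = [[0.1 * i + 0.01 * j for j in range(4)] for i in range(10)]
X += [[10 + 0.1 * i + 0.01 * j for j in range(4)] for i in range(10)]
y = [0] * 10 + [1] * 10
forest = train_forest(X, y)
```

Calling `predict(forest, x)` on a new feature vector returns the label chosen by most stumps; a full implementation would grow each tree to many nodes and redraw the random subspace at each one.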
Out-of-bag (OOB) estimation of classification accuracy
Since each tree in the Random Forest method is built using a bootstrap sample $D_b$ from the complete dataset D, on average about 37% of the training samples are not used in the process of building any given tree. These are referred to as the out-of-bag (OOB) samples and can be used to estimate the generalization error of the Random Forest. Consider a given sample $(x^{(i)}, y^{(i)})$; bootstrap sampling causes it to be an OOB sample in approximately 0.37B trees. In OOB estimation of classification accuracy, the 0.37B trees for which the sample is OOB are used to classify the sample. Let us denote the OOB estimate of the label for the sample by $\hat{y}^{(i)}$. This is then compared with the known label of the sample, that is, $y^{(i)}$. Since every sample in the entire dataset D is OOB for a subset of trees of similar number, the process can be repeated for all n samples in D. Then, by comparing the OOB estimates $\{\hat{y}^{(1)}, \hat{y}^{(2)}, \ldots, \hat{y}^{(n)}\}$ with the known labels $\{y^{(1)}, y^{(2)}, \ldots, y^{(n)}\}$, one can obtain the OOB estimate of the accuracy, which is simply the percentage of samples for which $\hat{y}^{(i)}$ matches $y^{(i)}$. Empirical evidence [30] suggests that the OOB estimate of the true accuracy is as accurate as using an independent test set of size n for estimating the true accuracy. Therefore, using the OOB estimate removes the need for a separate test set.
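The figure of about 37% follows from the probability that a given sample is missed by all n draws of a bootstrap sample, (1 − 1/n)^n, which approaches 1/e ≈ 0.368 as n grows. The sketch below evaluates this expression for n = 164 and checks it against a simple simulation; the function names, the number of simulated trees, and the seed are illustrative assumptions.

```python
import random


def oob_probability(n):
    """Probability that one sample is absent from a bootstrap sample of size n."""
    return (1.0 - 1.0 / n) ** n


def simulated_oob_fraction(n, n_trees, seed=0):
    """Average fraction of samples left out of bag over n_trees bootstrap draws."""
    rng = random.Random(seed)
    left_out = 0
    for _ in range(n_trees):
        in_bag = {rng.randrange(n) for _ in range(n)}  # distinct sampled indices
        left_out += n - len(in_bag)
    return left_out / (n * n_trees)


p_exact = oob_probability(164)                     # cohort size used in this study
p_sim = simulated_oob_fraction(164, n_trees=2000)  # fewer trees than B = 5000, for speed
```

With B = 5000 trees, this probability implies that each subject is classified by roughly 1,800 trees that never saw that subject during training.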
Assessment of variables for classification
We used the mean reduction of Gini impurity index as a measure of variable importance for classification [30]. During training of a decision tree, whenever a variable is used to split a parent node into two descendent nodes, the sum of the Gini impurity indices of the descendent nodes is smaller than the Gini impurity index of the parent node. The greater the difference, the better the variable has performed in splitting the cases into pure classes. The mean amount of reduction over all nodes in the forest where a given variable is used is taken as an indication of the importance of that variable for classification.
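A minimal sketch of the impurity reduction at a single node is given below. It assumes the conventional definition in which the impurities of the two descendent nodes are weighted by their share of the parent's cases; the function names and the toy labels are illustrative and do not come from the study.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    children = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n
    return gini_impurity(parent) - children


parent = [0, 0, 1, 1]                                  # two cases from each class
perfect = gini_decrease(parent, [0, 0], [1, 1])        # split separates the classes
useless = gini_decrease(parent, [0, 1], [0, 1])        # split leaves the mix unchanged
```

A variable importance score of the kind described above would then be obtained by averaging such decreases over every node in the forest at which the variable is used.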
RESULTS
Observation period
The mean±SD of the observation period for the sMCI group (n = 78) was 5.88±2.48 years with a minimum of 3 years starting from the time of their baseline MRI scan. By definition, the diagnosis of these subjects remained MCI during this period. We imposed the minimum 3-year observation period requirement for the sMCI group to reduce sample noise, that is, to reduce the chance that some of the subjects included in the sMCI are actually progressive cases. The follow-up periods did not significantly differ between men (5.22±2.68 y) and women (6.17±2.35 y).
In the pMCI group (n = 86) the diagnosis changed from MCI to probable AD sometime during their observation period. In this group, the mean±SD time interval from their baseline MRI scan to probable AD diagnosis was 2.84±1.50 years with a minimum of 1 year. The conversion times did not significantly differ between men (2.91±1.47 y) and women (2.72±1.57 y).
Feature vector variables
Table 2 shows comparisons between sMCI and pMCI groups separately in each of the 16 variables comprising the feature space. Statistical comparisons for the categorical variables sex and APOE4 were made using the chi-square test. Significances of group differences in the remaining 14 variables were assessed using the Mann-Whitney U test.
The groups did not differ in age or sex distribution. There was a trend toward the pMCI group being marginally more educated (p = 0.064) than the sMCI group. There were statistically significant mean differences in all the remaining 13 variables. The distribution of the APOE4 factor was significantly different between groups ($p < 10^{-3}$), indicating that a significantly greater proportion of pMCI subjects carried one or two copies of the apolipoprotein E ɛ4 allele. In all cognitive measures (MMSE, CDRSB, ADAS11, and ADAS13), the pMCI group had significantly poorer average scores and their scores deteriorated at significantly faster rates as compared to the sMCI group. Also, the HVI values were significantly lower bilaterally and declined at significantly faster rates in the pMCI group as compared to the sMCI group.
Since we find sex differences in the accuracy of Random Forest classification with higher accuracy in women (see below), we further examined the 13 variables in Table 2 that were statistically significantly different between sMCI and pMCI groups to determine the effect sizes separately for men and women. The results are given in Table 3. We found that the effect sizes in 11 of the 13 variables were larger in women, which helps to better understand the higher classification accuracy in women compared to men.
Random Forest classification
We trained a Random Forest classifier with B = 5000 trees using the default value of $m = \lfloor \sqrt{d} \rfloor = 4$ for the dimension of the random subspaces used by the algorithm. To reiterate, the aim of the Random Forest classifier was to classify the MCI subjects into two groups, sMCI and pMCI; in other words, to predict conversion, or lack thereof, from MCI to AD based on the 16 measured/calculated quantities obtained at baseline and one-year follow-up from each MCI subject.
The results of the OOB estimation, presented as a confusion matrix, are summarized in Table 4. The estimated classification sensitivity [TP/(TP+FN)] was 86.0%. The estimated classification specificity [TN/(TN+FP)] was 78.2%. The estimated overall accuracy of the classifier [(TN+TP)/(TN+FP+TP+FN)] was 82.3%. The receiver operating characteristic (ROC) curve for the classifier is shown in Fig. 2 (solid line). The area under the curve (AUC) was 0.83. The performance of the classifier was considerably better for women (sensitivity: 93.6%; specificity: 83.3%; overall accuracy: 89.1%) than for men (sensitivity: 81.8%; specificity: 75.9%; overall accuracy: 78.9%). The relative importance of the 16 variables, as measured by the mean reduction in the Gini impurity index when the variable is used as a decision variable at a tree node, is shown in Fig. 3. According to this criterion, ADAS13 was the most important variable for classification, followed by the rate of change with respect to time of the right HVI.
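The three summary measures can be reproduced from the four cells of a confusion matrix, as sketched below. The counts in the example are not copied from Table 4: they are the integers that are consistent with the reported group sizes (86 pMCI, 78 sMCI) and the reported overall percentages, and should be treated as an inference. The function name is likewise an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and overall accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Inferred counts (assumption): pMCI is the positive class, sMCI the negative class.
overall = classification_metrics(tp=74, fn=12, tn=61, fp=17)
percentages = {name: round(100 * value, 1) for name, value in overall.items()}
```

Under these inferred counts, `percentages` agrees with the sensitivity, specificity, and overall accuracy quoted in the text.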
In order to assess the contribution of longitudinal measurements to classification performance, we retrained the Random Forest classifier after excluding the six features representing the rates of change of variables that could only be obtained longitudinally (Table 2: features 6, 8, 10, 12, 14, and 16). The performance of the classifier reduced considerably yielding an estimated 75.6% sensitivity, 69.2% specificity, and 72.6% overall accuracy. The ROC curve for this analysis is shown in Fig. 2 (dashed line). The AUC reduced to 0.77.
We also retrained the classifier by using only the four features related to the HVI (Table 2: features 13–16). In this case the overall classification accuracy reduced to 68.3% and the AUC of the ROC curve reduced to 0.74 (Fig. 2, dash-dot line). On the other hand, when we retrained the classifier by using all the non-HVI related features (Table 2: features 1–12), we obtained an overall classification accuracy of 71.9% and an AUC of 0.77 (Fig. 2, dotted line). In both cases, the classification performance was considerably lower than the 82.3% achieved using all features. Combining a number of weak learners into a single strong learner in this way is the defining property of ensemble methods such as the Random Forest.
DISCUSSION
Several biomarkers have shown prediction power for the early detection of AD. Among the most established and clinically validated ones are various neuropsychiatric measures of cognitive function [22–26], measures derived from structural MRI, particularly the hippocampus volume [31–33], CSF total tau (t-tau), tau phosphorylated at threonine 181(p-tau181) and Aβ1–42, positron emission tomography (PET) measures of Aβ plaques assessed by 11C-PiB or 18F-AV-45 and measures of regional cerebral rates of glucose metabolism derived from 18F-FDG PET [34]. Measuring CSF biomarkers requires a spinal tap and PET imaging is more expensive and not as widely available as MRI. Training data for machine learning are also limited for CSF biomarkers and PET measures. Therefore, for pragmatic clinical utility, the current study was limited to measures based on longitudinal structural MRI, neuropsychiatric tests, APOE genotype, and simple demographic information for sMCI versus pMCI classification. In all, a 16-dimensional feature vector was used for classification.
We used a novel measure of HVI derived from structural MRI as a biomarker. HVI is computed fully automatically, quickly (in less than 1 minute), and reliably without requiring any pre-processing of the structural MRI images (e.g., distortion correction). Another advantage is that HVI does not require correction for intracranial volume, for two main reasons. Firstly, HVI is a normalized measure (tissue fraction of the ROI). Therefore, if the scale of the ROI changed, the HVI value would remain the same, that is, HVI is scale-invariant. Secondly, before HVI computation, we apply to the MRI a local 12-parameter affine transformation computed based on landmark detection, which to a large extent normalizes the size of the immediate medial temporal structures surrounding the hippocampus across individuals. Further details of the algorithm for HVI computation are given in [16].
The Random Forest classifier was able to predict conversion from MCI to AD dementia with an estimated accuracy of 82.3%. Although it is difficult to precisely compare this result with previous studies because of the different datasets used to train and evaluate the classification algorithms and the different methods used for estimating the prediction accuracy, a review of the literature indicates that the typical accuracy of the current algorithms for predicting MCI to AD conversion is in the low 70% range [5, 7–14]. Therefore, it can be said that the overall estimated accuracy of our Random Forest classifier is high in comparison to previous reports.
A very interesting finding of this research is that the accuracy of classification when broken down by sex (Table 4) was considerably higher in women (89.1%) than in men (78.9%). To the best of our knowledge, the classification accuracy in women is the highest reported in any application of machine learning to the current problem. A possible explanation for the greater accuracy achieved in predicting progression from MCI to AD in women could simply be sampling bias in the ADNI dataset, in which case this result may not be generalizable to the MCI population. On the other hand, multiple studies suggest that the trajectories of structural and functional changes in the brain and the associated clinical and cognitive deficits in the course of AD development may be different between men and women. The apolipoprotein E ɛ4 allele, the strongest known genetic factor for sporadic AD [35], has been shown to confer greater risk of AD development in women [36–39]. In cross-sectional studies, atrophy of the medial temporal lobe structures in MCI and mild to moderate AD has been shown to be associated with APOE genotype with greater prominence in women compared to men [40, 41]. Longitudinally, sex and age interactions have also been shown in brain atrophy rates obtained by voxelwise analyses [42, 43]. ROI analyses also show sex differences in atrophy rates of several brain structures including the hippocampus and entorhinal cortex in healthy controls, MCI, and AD subjects [16, 43]. Sex differences have also been observed in the rates of decline of clinical and cognitive measures in MCI subjects with faster deterioration in women [44, 45]. Taken together, these studies suggest that the patterns of AD predictors at baseline and their rates of change with time are different between men and women with MCI.
Therefore, it is conceivable that there may be stronger indicators of imminent AD dementia (or lack thereof) in women with MCI as compared to men, and that the results of this paper may be generalized to the MCI population. Future replication studies using independent datasets are necessary to determine whether the results of this paper are simply due to a sampling bias in ADNI or whether in fact it is possible to predict MCI to AD conversion in women with greater accuracy than in men.
We also found that the effect size of HVI difference between sMCI and pMCI groups is larger for the right hippocampus as compared to the left hippocampus (Table 2), and that the HVI and its rate of change on the right hippocampus are more important variables for classification than those of left hippocampus (Fig. 3). These results are consistent with several volumetric studies that have found the right hippocampus to be more affected than the left in AD and MCI [46–48].
A limitation of the current approach is the requirement of longitudinal baseline and one-year follow-up MRI volumes for prediction of MCI to AD conversion. A shorter baseline to follow-up period (e.g., 6 months) would make the current approach a much more practical proposition for both diagnosis and clinical trials. But it would also likely reduce the signal-to-noise ratio of the longitudinal variables used for prediction. Therefore, some loss in the prediction accuracy could be expected when the initial observation period is reduced. It may be possible to compensate for this potential reduction in accuracy by including additional AD biomarkers as features in the classification process. Future studies are required to assess the performance of the Random Forest classification algorithms trained based on shorter observation periods with additional biomarkers.
Additional biomarkers could for example be related to the size and shape of the corpus callosum derived from structural MRI which many studies have shown to be different between mild AD and normal subjects both in their baseline levels and rates of change longitudinally [49–52]. Other potential structural MRI biomarkers are measures of atrophy in the entorhinal cortex [53–55] and posterior cingulate [56, 57]. The feature space could also be augmented by additional clinical and cognitive measures, such as the Functional Activities Questionnaire (FAQ) [58], a low cost 10-item measure ranging from 0 (normal) to 30 (maximum impairment) which has been shown to have power in predicting a change in diagnostic status from MCI to AD [59–61].
Features used for classification could also include other MRI modalities such as resting-state functional MRI [62], diffusion tensor imaging [63], and arterial spin labeling [64], each of which have shown promise in discriminating between normal aging and AD. Thus a multimodality approach could potentially improve performance. Our use of raw T1-weighted imaging requires the shortest acquisition time and least pre- and post-processing.
It should also be mentioned that the MRI volumes used in the current paper had been acquired using 1.5 Tesla scanners. Intuitively, using higher signal-to-noise ratio volumes that can be obtained from modern 3 Tesla MRI scanners could potentially increase the accuracy of the prediction methods presented in this paper. However, potential improvements need to be demonstrated in future studies.
The mean±SD baseline to probable AD diagnosis in the pMCI group was 2.84±1.50 years. This means that after the baseline and 1-year follow-up scans are acquired, if a prediction of imminent conversion to probable AD is made based on the techniques developed in this paper, there is on average a 1.8-year window of treatment aimed at slowing or stopping disease progression. This window of opportunity for therapeutic intervention can be extended by six months if the length of the observation period can be reduced from one year to six months without sacrificing accuracy, e.g., by using 3 Tesla scanners and including corpus callosum and other structural MRI biomarkers.
Another limitation of this study is that even though the accuracies achieved, particularly in women, are among the highest reported in the literature, roughly 80% specificity/sensitivity is still not high enough to be applicable for routine clinical work. Thus, the challenge remains to increase the accuracy of future computer-aided diagnosis methods. However, any prediction performance above the level of chance will reduce the cost of clinical trials of therapeutic interventions. It should also be kept in mind that sample noise in both training and testing data (e.g., mislabeling progressive cases as sMCI) will always impose a fundamental limit on achievable accuracy.
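To illustrate why above-chance prediction can reduce trial cost, the following Python sketch computes the fraction of true converters among subjects that a classifier flags as progressive (the positive predictive value). The 15% conversion rate is an assumed, illustrative figure and is not an estimate from this study. The 80% sensitivity and specificity are the rough values quoted above. The function name is hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of classifier-positive subjects who are true converters (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# ASSUMPTION: illustrative proportion of MCI subjects who convert during the trial period.
ASSUMED_CONVERSION_RATE = 0.15

# Rough sensitivity/specificity quoted in the text.
ppv_classifier = positive_predictive_value(0.80, 0.80, ASSUMED_CONVERSION_RATE)

# A chance-level classifier (50% sensitivity and specificity) gives no enrichment.
ppv_chance = positive_predictive_value(0.50, 0.50, ASSUMED_CONVERSION_RATE)

print(f"Converters among flagged subjects (classifier): {ppv_classifier:.1%}")
print(f"Converters among flagged subjects (chance):     {ppv_chance:.1%}")
```

With these assumed inputs, a chance-level selection leaves the proportion of converters at the base rate, whereas the classifier raises it substantially, so fewer enrolled subjects would be needed to observe a given number of conversions.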
In conclusion, we have shown in this paper that, using longitudinal HVI measures obtained from structural MRI along with a number of common cognitive tests and demographic and genetic information, it is possible to accurately predict whether an MCI subject will remain stable or progress to AD-type dementia. We also found that the estimated prediction accuracy is considerably higher in women than in men. An important feature of this work is that our approach is designed to be clinically practical and viable. The measures used in our prediction are widely available, non-invasive, and measurable without much difficulty and without requiring substantial expertise or preprocessing of data. We avoid preprocessing of structural MRI scans (e.g., distortion correction, B1 field inhomogeneity correction, etc.) on the grounds that these procedures may not be easily available for routine clinical work. We take advantage of longitudinal data based on evidence that biomarkers of AD differ between stable and progressive MCI groups not only at baseline but also in their rates of change. Finally, we use the Random Forest machine learning algorithm, a powerful technique that has not been as widely applied to the current problem as other algorithms. Future studies on independent datasets are necessary to confirm our finding of sex differences in prediction accuracy. If confirmed, this finding will have important implications for both clinical practice and research.
ACKNOWLEDGMENTS
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
