Abstract
Background/Objective:
Overlapping clinical symptoms often complicate differential diagnosis between patients with Alzheimer’s disease (AD) and behavioral variant frontotemporal dementia (bvFTD). Magnetic resonance imaging (MRI) reveals disease-specific structural and functional differences that aid in differentiating AD from bvFTD patients. However, the benefit of combining structural and functional connectivity measures to differentiate these dementia types on a per-subject basis is not yet known.
Methods:
Anatomical, diffusion tensor (DTI), and resting-state functional MRI (rs-fMRI) of 30 patients with early stage AD, 23 with bvFTD, and 35 control subjects were collected and used to calculate measures of structural and functional tissue status. All measures were used separately or selectively combined as predictors for training an elastic net regression classifier. Each classifier’s ability to accurately distinguish dementia-types was quantified by calculating the area under the receiver operating characteristic curves (AUC).
Results:
Highest AUC values for AD and bvFTD discrimination were obtained when mean diffusivity, full correlations between rs-fMRI-derived independent components, and fractional anisotropy (FA) were combined (0.811). Similarly, combining gray matter density (GMD), FA, and rs-fMRI correlations resulted in highest AUC of 0.922 for control and bvFTD classifications. This, however, was not observed for control and AD differentiations. Classifications with GMD (0.940) and a GMD and DTI combination (0.941) resulted in similar AUC values (p = 0.41).
Conclusion:
Combining functional and structural connectivity measures improves dementia-type differentiation and may contribute to a more accurate and better substantiated differential diagnosis of AD and bvFTD patients. Imaging protocols for differential diagnosis may benefit from also including DTI and rs-fMRI.
INTRODUCTION
Alzheimer’s disease (AD) and behavioral variant frontotemporal dementia (bvFTD) are the most common causes of young onset dementia [1–3]. Accurate and confident differentiation between these disease types is crucial for the proper management, prognosis, and potential treatment of patients with dementia [4–6]. Yet, despite distinct clinical diagnostic criteria [1, 7], heterogeneity and overlap of clinical manifestations often complicate differential diagnosis [8].
Complementary to clinically derived correlates, magnetic resonance imaging (MRI) has been shown to reveal important disease-specific brain changes that may corroborate differential diagnosis on an individual basis [9]. Differences in the extent and location of gray matter (GM) and white matter (WM) atrophy have been recognized as important markers that distinguish each dementia-type [10–14]. Additionally, varying levels of diffusion tensor imaging (DTI)-derived measures elucidate regional differences in WM integrity impairment that are specific to AD or bvFTD patients [15–17]. It has even been suggested that these integrity impairments may precede or facilitate cortical degeneration [18]. However, despite these detectable group differences, MRI-based single-subject classifications remain challenging. Per-subject analyses are affected more by between-subject variation than group-based analyses are, and it is still unclear which MRI measure contributes most to MRI-based dementia-type classifications [18–20]. Particularly in the earlier disease stages, accurate dementia-type classifications based on structural neuroimaging alone may be hampered by atrophy and tract-specific deficit patterns that overlap or are hardly distinguishable [17, 22].
Functional connectivity measures may inform on dementia state [23–25] or dementia progression [23–25] even well before clinical or structural differences can be detected [25, 26]. Differences in functional connectivity strengths as measured with resting-state functional MRI (rs-fMRI) have been heralded as early and useful markers to differentiate AD and bvFTD patients [27, 28]. However, the contribution of rs-fMRI measures as a (complementary) measure to differentiate dementia-types on an individual basis is still largely unclear [27, 30].
It may be argued that a single MRI-derived measure cannot sufficiently capture the complex pathophysiological processes that underlie dementia development. Multiparametric MRI-based statistical algorithms integrate various MRI measures to compute a single, quantitative probabilistic index and have been shown to be more accurate in differentiating cognitively healthy subjects from patients with dementia than algorithms based on a single MRI measure [18, 32]. We speculate that MRI-based classification algorithms that differentiate dementia-types benefit from integrating structural and functional connectivity measures. In this study we therefore aimed to determine the diagnostic accuracy of MRI-based classification algorithms to, on a subject basis, differentiate between AD and bvFTD patients, when combining measures derived from anatomical MRI, DTI, and rs-fMRI.
METHODS
Participants
This two-center study involved a retrospective analysis of data previously published [18, 28] and was conducted in accordance with regional research regulations and conformed to the Declaration of Helsinki. Local medical ethics committees of both centers approved the study and all patients gave written informed consent for their clinical and biological data to be used for research purposes.
For this study, we selected 37 patients with probable AD and 28 patients with bvFTD, who visited either the Alzheimer Center of the VU University Medical center (VUmc) (AD: n = 22, bvFTD: n = 19) or the Alzheimer Center of the Erasmus University Medical Center Rotterdam (EMC) (AD: n = 15, bvFTD: n = 9). In addition, we included 40 cognitively normal controls that were recruited by local newspaper advertisements (VUmc: n = 22; EMC: n = 18).
Patients underwent a standardized one-day assessment including medical and informant-based history, medical history (dementia, psychiatry, cardiovascular) of first-degree relatives, physical and neurological examination, blood tests, neuropsychological assessment, and brain MRI. Diagnoses were made in a multidisciplinary consensus meeting according to the core clinical criteria of the National Institute on Aging and the Alzheimer’s Association workgroup for probable AD [7] and according to the clinical diagnostic criteria of FTD for bvFTD [1]. To minimize center effects, all diagnoses were re-evaluated in a panel that included clinicians from both centers. Controls were screened for memory complaints, family history of dementia, drugs- or alcohol abuse, major psychiatric disorder, and neurological or cerebrovascular diseases. They underwent an assessment that included medical history, physical examination, neuropsychological assessment, and brain MRI comparable to the patient work-up.
For both cohorts, inclusion criteria were: 1) age between 40 and 80 years and 2) availability of a T1-weighted 3-dimensional MRI (3DT1w) scan, a diffusion tensor imaging (DTI) dataset, and a rs-fMRI T2*-weighted scan. Exclusion criteria were: 1) large image artifacts (n = 7); 2) failure of imaging analyzing software to process MR scans (n = 10); and 3) gross brain pathology other than atrophy, including severe white matter hyperintensities and/or lacunar infarctions in deep gray matter structures.
MRI acquisition and review
Patients and controls of the VUmc cohort were scanned at the VUmc Amsterdam using a 3T MRI scanner (Signa HDxt, GE Healthcare, Milwaukee, WI, USA) with an 8-channel head coil with foam padding to restrict head motion. Patients and controls of the EMC cohort were scanned at the Leiden University Medical Center (LUMC) using a 3T MRI scanner (Achieva, Philips Medical Systems, Best, The Netherlands) with an 8-channel SENSE head coil. MRI sequence parameter settings are detailed in Supplementary Table 1. In brief, the imaging protocol included a whole-brain near-isotropic 3D T1-weighted (3DT1w) sequence for cortical and subcortical tissue-type segmentation, a diffusion tensor imaging (DTI) sequence for assessments of white matter integrity, and a resting state functional MRI T2*-weighted MRI for the calculation of functional connectivity measures. Participants were instructed to lie still with their eyes closed and not to fall asleep during rs-fMRI. Additionally, a 3D fluid attenuated inversion recovery, dual-echo T2-weighted, and susceptibility weighted imaging datasets were acquired to allow for review of brain pathology other than atrophy by an experienced radiologist.
Demographics
*versus control subjects, p < 0.01. Mean±standard deviation or n (%). AD, Alzheimer’s disease; bvFTD, behavioral variant frontotemporal dementia; MMSE, Mini-Mental State Examination.
MRI preprocessing
Preprocessing of 3DT1w images involved non-uniformity correction [33] and segmentation of parenchymal tissue signal from surrounding tissue [34]. Images were then spatially aligned to the MNI152 2×2×2 mm T1 template (Montreal Neurological Institute, Canada) using a non-linear registration procedure with a warp resolution of 10 mm [35]. Voxel-wise densities of GM, WM, and cerebrospinal fluid (CSF) were determined with the initial steps of the voxel-based morphometry pipeline of the Statistical Parametric Mapping toolbox (SPM12; Functional Imaging Laboratory, University College London, London, UK) [36] implemented in MATLAB R2015b (MathWorks, Natick, MA). Default settings were used for tissue-type segmentations, with two exceptions: the image’s origin was manually placed approximately on the anterior commissure, and the light cleanup option was applied to remove any remaining non-brain tissue. Deep gray matter (DGM) structures, comprising the left and right thalamus, caudate nucleus, putamen, globus pallidus, nucleus accumbens, amygdala, and hippocampus, were separately identified using a dedicated registration and segmentation procedure with default settings [37].
For DTI, preprocessing included motion and eddy-current induced distortion correction and gradient vector direction correction [38]. The corrected DTIs were subsequently used to voxel-wise calculate measures of fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity (AxD; largest eigenvalue), and radial diffusivity (RD; average of the two eigenvalues λ2 and λ3) using weighted least squares fitting [39]. The tract-based spatial statistics (TBSS) pipeline was used to create a study-specific TBSS-skeleton, by non-linearly aligning all FA maps to the FMRIB58_FA template [40]. The derived skeleton was thresholded at 0.2 to ensure values originated from WM tissue (see below).
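The four diffusion measures follow directly from the three eigenvalues of the fitted tensor. The sketch below uses the standard textbook definitions; the function name `dti_scalars` is illustrative and is not part of the processing pipeline used in the study, which relied on dedicated software for tensor fitting.

```python
import math


def dti_scalars(l1, l2, l3):
    """Return (FA, MD, AxD, RD) for one voxel from its tensor eigenvalues."""
    # Sort so that l1 is the largest eigenvalue, as assumed in the text.
    l1, l2, l3 = sorted((l1, l2, l3), reverse=True)
    md = (l1 + l2 + l3) / 3.0   # mean diffusivity
    axd = l1                    # axial diffusivity: largest eigenvalue
    rd = (l2 + l3) / 2.0        # radial diffusivity: mean of l2 and l3
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    # Fractional anisotropy; defined as 0 for an all-zero tensor.
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return fa, md, axd, rd
```

With these definitions, an isotropic tensor yields FA = 0 and a tensor with a single non-zero eigenvalue yields FA = 1.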
For rs-fMRI, preprocessing included motion correction [41], brain extraction, spatial smoothing using a Gaussian kernel with a full width half maximum of 3 mm, grand mean intensity normalization, motion artifact removal, and high-pass temporal filtering (cutoff frequency = 0.01 Hz). Motion artifacts were removed from the data using the ICA-based automatic removal of motion artifacts (ICA-AROMA, v0.3-beta) procedure [42]. Subsequently, the rs-fMRI volumes were linearly aligned to the corresponding 3DT1w image [43]. Spatial alignments to the MNI152 template were achieved by concatenating the registration parameters of the previous step with the nonlinear parameters from the 3DT1w image to the MNI152 template.
All registration and segmentation steps were critically reviewed and errors were corrected accordingly.
Feature extraction
Two anatomical atlases were used to parcellate the entire brain. These atlases were used to extract, in each subject’s native space, cortical and subcortical GM and WM features from the structural 3DT1w-images. The 48 cortical regions of the Harvard–Oxford (HO) probabilistic anatomical brain atlas were split into left and right hemisphere regions, resulting in 96 distinct cortical regions that covered the complete cortical GM [44]. For WM regional analysis, we selected 20 WM tract regions from the probabilistic Johns-Hopkins-University (JHU) white-matter tractography atlas [45]. Voxel probability values less than 25% were excluded. The remaining values were used to weight the WM or GM densities. Weighted GM densities were calculated by weighting the GM segmentations by the probability of a voxel being part of that specific HO brain atlas-derived region. Weighted WM densities were calculated by weighting the WM segmentations by the probability of a voxel being part of that specific JHU brain atlas-derived tract. This way we emphasized the regions’ likelihood of being GM or WM without introducing bias resulting from conservatively selecting brain regions. For DGM regions, the dedicated segmentations of DGM, hippocampus, and amygdala were used to calculate the regional volumes normalized by intracranial volume to compensate for individual differences in brain volume [46]. This resulted in a feature vector of 110 GMD values (96 cortical GMD values and 14 DGM volumes), and a feature vector of 20 average WMD values per subject.
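The probability weighting described above can be sketched as follows. The text does not state how the weighted values were normalized, so the sketch assumes a weighted mean (division by the sum of the retained probabilities); the function name and the flat per-voxel input layout are likewise illustrative.

```python
def weighted_regional_density(density, atlas_prob, min_prob=0.25):
    """Probability-weighted mean tissue density for one atlas region.

    density    -- tissue density per voxel (e.g., GM or WM segmentation)
    atlas_prob -- probability per voxel of belonging to the region (0..1)
    min_prob   -- voxels below this atlas probability are excluded
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for d, p in zip(density, atlas_prob):
        if p < min_prob:
            continue  # exclude voxels with atlas probability below 25%
        weighted_sum += p * d
        weight_total += p
    # Assumption: normalize by the summed weights (weighted mean).
    return weighted_sum / weight_total if weight_total > 0 else float("nan")
```

Repeating this for each of the 96 cortical regions, or each of the 20 WM tracts, gives one feature per region per subject.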
DTI-based features were calculated by, on a voxel-wise basis, projecting each subject’s FA, MD, AxD, or RD values onto the TBSS group skeleton. Analogous to the anatomical WMD features, the 20 WM tracts of the probabilistic JHU white-matter tractography atlas were used to calculate a weighted mean FA, MD, AxD, and RD per tract per subject. This resulted in four feature vectors per subject (mean FA, MD, AxD, and RD), each containing 20 values.
The functional connectivity features were calculated by combining all processed rs-fMRI datasets in a temporally concatenated independent component analysis (ICA) [47], with dimensionality fixed at 70 components and an ICA threshold of 0.99 [48]. This meant that each voxel included in the ICA map was 99 times more likely to be part of that component than to be caused by Gaussian background noise. For each subject, we calculated the mean time course for each component, weighted by the ICA weight map and GM probability of that component’s region. These mean time courses were subsequently used to determine the functional connectivity of a component with the 69 other components. Functional connectivity was either expressed as full correlations (corr) or as sparse L1-regularized partial correlations (pcor) between the components’ time courses. Partial correlations were calculated using the graphical lasso algorithm [49]. Each of the two functional connectivity measures thus yielded one feature vector of (70*69)/2 = 2415 (partial) correlations per subject.
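To illustrate how 70 component time courses lead to 2415 features, the sketch below computes one full (Pearson) correlation per unordered pair of components. It covers the full correlations only; the L1-regularized partial correlations require a graphical lasso solver and are not shown. The function names are illustrative.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length time courses."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def connectivity_features(time_courses):
    """One full correlation per unordered component pair (upper triangle)."""
    n_components = len(time_courses)
    return [
        pearson(time_courses[i], time_courses[j])
        for i, j in combinations(range(n_components), 2)
    ]
```

For n components this returns n(n-1)/2 values, which is 2415 for n = 70.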
Differential classification
These feature vectors were subsequently used to train an elastic net regression algorithm with default settings to provide a statistically sound solution for the imbalance between a large number of features and a small number of subjects. An elastic net regression model estimates a sparse regression model that selects a subset of all the features provided as input by imposing feature selection and feature weight penalties during regression, effectively selecting only those features relevant for classification [50–52]. A cross validation procedure was used to determine the optimal set of penalty parameters and generalized classification performance of the elastic net classifiers. Cross-validation reduces classification bias by iteratively subdividing the data in separate test and training sets. In this study we used two nested, 10-fold cross validation loops. The first, outer, loop was used to determine the overall classification performance, while the second, inner, loop subdivided the training set further to determine the lowest binomial deviance that corresponded with the best operational parameters for the penalty terms (including the optimal number of features) without overestimating classification performance [53, 54]. This process was repeated 10 times to ascertain that each subject was part of the test set exactly once. To ensure that estimated feature regression coefficients were conditional on subject age and gender and to adjust for scanner effects, age, gender, and center were included into the model without any penalty. For AD and bvFTD differentiations, models were furthermore made conditional on disease severity by also including disease duration (in months). The entire classification procedure was repeated 50 times to reduce variance resulting from random partitioning in training and test folds, and to report the range of observed outcomes under different train and test conditions.
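The data partitioning behind the nested cross-validation can be sketched as below. This shows only the index bookkeeping: the outer loop holds out each subject exactly once, and the inner loop re-splits the outer training set for penalty tuning. Model fitting is omitted, folds are not stratified by diagnosis, and the function names and seed handling are assumptions made for illustration rather than details of the study's R implementation.

```python
import random


def k_folds(indices, k, rng):
    """Shuffle the indices and split them into k near-equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_subjects, k_outer=10, k_inner=10, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, inner_validation) pairs built
    from outer_train only, so test subjects never influence tuning.
    """
    rng = random.Random(seed)
    outer = k_folds(range(n_subjects), k_outer, rng)
    for i, test in enumerate(outer):
        train = [s for j, fold in enumerate(outer) if j != i for s in fold]
        inner_folds = k_folds(train, k_inner, rng)
        inner = [
            ([s for m, f in enumerate(inner_folds) if m != j for s in f], val)
            for j, val in enumerate(inner_folds)
        ]
        yield train, test, inner
```

Repeating the whole procedure with different seeds corresponds to the 50 repetitions used to average out the effect of random partitioning.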
We trained three types of classification models following the above-described procedures. One model aimed to differentiate AD from bvFTD patients. To contextualize the classification performance values of this model, we also trained two additional models. A second model aimed to differentiate control subjects from bvFTD patients, and a third model to differentiate control subjects from AD patients. For each subject, these models produced a predicted value between 0 and 1. The higher the predicted value, the more likely the subject belonged to the bvFTD group (or to the AD group in the case of the control versus AD model).
Classification performance
Classification performances were quantified using a threshold-independent measure based on receiver-operating characteristic (ROC) analyses. After each 10-fold nested cross-validation, the predictions were used to calculate ROC curves by continuously increasing a threshold between 0 and 1 and for each threshold classifying a subject either as control, AD, or bvFTD patient, depending on the model being evaluated. These classifications were then compared with the subject’s actual differential diagnosis. The area under this ROC curve (AUC) was calculated as a measure of classification performance insensitive to the distribution of each patient group [55]. Additionally, we calculated the optimal operating point on each curve to calculate the model’s accuracy, sensitivity, and specificity given equal class distribution and equal penalty for false positive and false negative predictions. Reported AUC, accuracy, sensitivity, and specificity values are averages obtained from repeating the cross-validation 50 times.
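A threshold sweep of this kind can be sketched as follows. The AUC is obtained by trapezoidal integration of the ROC points. For the operating point, the sketch assumes that "equal class distribution and equal penalty" corresponds to maximizing sensitivity + specificity - 1 (the Youden index); this interpretation and all function names are assumptions.

```python
def roc_points(labels, scores):
    """ROC points (FPR, TPR) from (0, 0) to (1, 1); label 1 = positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Sweep the threshold over every distinct predicted value.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by trapezoidal integration."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def optimal_operating_point(points):
    """Point maximizing TPR - FPR (assumed equal costs and equal priors)."""
    return max(points, key=lambda p: p[1] - p[0])
```

Sensitivity is the TPR of the selected point and specificity is one minus its FPR.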
Multiparametric classifications
To determine whether combining multiple MRI measures improved classification, we first assessed classification performance for each single measure separately by alternately providing all features of that specific measure (i.e., 110 GMD, 20 WMD, 20 FA, 20 MD, 20 AxD, 20 RD, 2415 corr, or 2415 pcor features) as input for cross-validation. Subsequently, we concatenated measures step-wise, each time adding all features of a new measure to the best performing measure combination (i.e., highest AUC) of the previous step, until all measures and thus all features were included in the model. The model subsequently determined the importance of each feature by assigning it a weight.
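The step-wise concatenation amounts to a greedy forward search over measures, sketched below. The `evaluate` callback stands in for a full repeated nested cross-validation returning a mean AUC; it, the function name, and the toy scorer with its invented numbers are purely illustrative and do not reproduce results from this study.

```python
def stepwise_combine(measures, evaluate):
    """Greedy forward concatenation of measures.

    measures -- list of measure names
    evaluate -- callable taking a tuple of names and returning an AUC
    Returns a list of (combination, auc), one entry per step.
    """
    remaining = list(measures)
    current = ()
    path = []
    while remaining:
        # Try adding each remaining measure to the current combination.
        scored = [(evaluate(current + (m,)), m) for m in remaining]
        best_auc, best_measure = max(scored)
        current = current + (best_measure,)
        remaining.remove(best_measure)
        path.append((current, best_auc))
    return path


# Toy scorer with invented gains (not results from the study).
toy_gain = {"A": 0.20, "B": 0.10, "C": -0.05}


def toy_evaluate(combo):
    return 0.5 + sum(toy_gain[m] for m in combo)


path = stepwise_combine(["A", "B", "C"], toy_evaluate)
best_combo, best_auc = max(path, key=lambda step: step[1])
```

Because the search continues until every measure is included, the best model is the step with the highest AUC, which need not be the final step.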
All classification analyses and evaluations were implemented in R (R core 2010, GLMnet package) [51].
Statistical analysis
Statistical analyses of between-group differences were performed using SPSS (IBM SPSS Version 22.0, IBM, Armonk, NY). Demographic group differences in age, MMSE, and disease duration were assessed using analysis of variance (ANOVA). Sex and center distributions were assessed with χ2 tests. Permutation testing was used to determine whether single or multiparametric models performed above chance level (one-tailed, N = 5000). The maximum statistic method was used for family-wise error correction [56]. A bootstrap percentile method was used to compare ROC curves of single measures (two-tailed, N = 5000), and between single measures and the best performing multiparametric combination (one-tailed, N = 5000) [57, 58]. False discovery rate correction within each patient-group was used to correct for multiple comparisons. p < 0.05 was considered statistically significant.
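A one-tailed permutation test of an AUC against chance can be sketched as follows. As a simplification, the sketch permutes diagnostic labels against a fixed set of predictions; whether the study re-trained its models on permuted labels is not stated here, and the maximum statistic correction across models is not shown. Function names and the seed are illustrative.

```python
import random


def rank_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(labels, scores, n_perm=5000, seed=0):
    """One-tailed p-value for the observed AUC exceeding chance."""
    rng = random.Random(seed)
    observed = rank_auc(labels, scores)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the label-prediction link
        if rank_auc(shuffled, scores) >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

The smallest attainable p-value is 1/(n_perm + 1), which is why a large number of permutations is needed when small thresholds are of interest.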
RESULTS
Demographics
For this study, 88 subjects met the inclusion criteria. Thirty patients were diagnosed with probable AD and 23 with bvFTD. Thirty-five control subjects were included to evaluate the classification models involving control subjects. MMSE values of control subjects were higher than those of patients. MMSE values of AD patients were comparable to those of bvFTD patients (Table 1).
Classification performance: AD versus bvFTD
For AD and bvFTD differentiations, combining multiple MRI measures within a multiparametric classification model resulted in higher mean AUC values than models based on a single MRI measure (Fig. 1; Table 2). The AUC values did, however, depend on the combination being considered. After step-wise adding multiple MRI-derived measures to the best performing single MRI measure model (i.e., MD: 0.708 [0.625–0.775] (mean AUC [min-max]); Table 2 underlined), highest AUC, sensitivity, and specificity values were obtained for a classification model that included MD, full correlations, and FA (0.811 [0.755–0.862]; Table 2: bold; Table 3: bold). AUC values did not differ significantly among the single measure-based classifications. However, AUC values obtained with the multiparametric model were significantly higher than those obtained with single MRI measure-based models (Table 3), except for MD-based classifications (p = 0.06).

Receiver-operating characteristic curves of classifications between AD and bvFTD patients. For each differential diagnosis, measurements of gray matter density (GMD), white matter density (WMD), fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity (AxD), radial diffusivity (RD), full correlation between rs-fMRI-derived independent components (corr), and L1-regularized partial correlation between rs-fMRI-derived independent components (pcor) were separately assessed. The multiparametric curve resulted from step-wise combining MRI measures, each time adding a measure to the combination with the highest AUC values of the previous step. Highest AUC was obtained by combining MD, corr, and FA measurements (mean AUC [min-max]: 0.811 [0.755–0.862]). The diagonal line represents random classification performance.
Mean AUC values for AD versus bvFTD classifications using single or combination of multiple MRI-derived measures
Multiparametric models result from stepwise adding measures to the best performing classification model of the previous step, starting with the best performing single MRI measure (i.e., MD, underlined). Bold: best performing model. Italic: mean AUC significantly above chance level after family-wise error rate correction. GMD, gray matter density; WMD, white matter density; FA, fractional anisotropy; MD, mean diffusivity; AxD, axial diffusivity; RD, radial diffusivity; corr, full correlations between ICA components; pcor, L1-regularized partial correlations between ICA components; AUC, area under the ROC curve.
Classification performance values of AD versus bvFTD classifications using single or multiple MRI-derived measures
Mean, minimum, and maximum area under the ROC curve (AUC) after 50 times 10-fold cross-validation. Bold: best performing model, Underlined: best performing single measure-based model. Mean sensitivity, specificity, and classification accuracy were derived from the optimal operating point on the ROC (cutoff). GMD, gray matter density; WMD, white matter density; FA, fractional anisotropy; MD, mean diffusivity; AxD, axial diffusivity; RD, radial diffusivity; corr, full correlations between ICA components; pcor, L1-regularized partial correlations between ICA components; AUC, area under the ROC curve. *p < 0.05 versus multiparametric model, #p < 0.05 versus chance, ##p < 0.001 versus chance.
Classification performance: Control versus dementia
MRI measures that resulted in highest classification performance rates for control versus dementia classifications (i.e., control versus bvFTD or control versus AD) were different from those obtained with AD versus bvFTD classifications. For control and bvFTD classifications, highest AUC, sensitivity, and specificity values were found for a multiparametric model that included measurements of FA, GMD, and full correlations (0.922 [0.877–0.954]; Table 4: bold; Supplementary Table 2: bold). AUC values of this multiparametric model were significantly higher than those obtained with single MRI measure-based models, except for FA- (0.862 [0.810–0.903], p = 0.08) and GMD-based (0.858 [0.827–0.896], p = 0.11) classifications (Table 4). For classifications between control subjects and AD patients on the other hand, classification performance values of a combination of GMD and DTI measures (0.941 [0.910–0.966]) were not better than classifications based on GM measurements only (0.940 [0.913–0.962], p = 0.41) (Table 5, Supplementary Table 3). While the combination outperformed the other single measure-based classifications, GMD-based classifications were also better than all other single MRI measure classification except for RD-based classifications (0.832 [0.784–0.865], p = 0.10).
Classification performance values of control versus bvFTD classifications using single or multiple MRI-derived measures
Mean, minimum, and maximum area under the ROC curve (AUC) after 50 times 10-fold cross validation. Bold: best performing model. Underlined: best performing single measure-based model. Mean sensitivity, specificity, and classification accuracy were derived from the optimal operating point on the ROC (cutoff). GMD, gray matter density; WMD, white matter density; FA, fractional anisotropy; MD, mean diffusivity; AxD, axial diffusivity; RD, radial diffusivity; corr, full correlations between ICA components; pcor, L1-regularized partial correlations between ICA components; AUC, area under the ROC curve. *p < 0.05 versus multiparametric model, #p < 0.05 versus chance, ##p < 0.001 versus chance.
Classification performance values of control versus AD classifications using single or multiple MRI-derived measures
Mean, minimum, and maximum area under the ROC curve (AUC) after 50 repetitions. Bold: best performing model. Underlined: best performing single measure-based model. Mean sensitivity, specificity, and classification accuracy were derived from the optimal operating point on the ROC (cutoff). GMD, gray matter density; WMD, white matter density; FA, fractional anisotropy; MD, mean diffusivity; AxD, axial diffusivity; RD, radial diffusivity; corr, full correlations between ICA components; pcor, L1-regularized partial correlations between ICA components; AUC, area under the ROC curve. *p < 0.05 versus multiparametric model, §p < 0.05 versus GMD, #p < 0.05 versus chance, ##p < 0.001 versus chance.
Features selected
MRI-derived measures outweighed the contribution of gender, age, and center to the classification. AD and bvFTD differentiations using gender, age, and center distributions only did not outperform random chance (AUC = 0.596, p = 0.12). AUC values of the best performing AD versus bvFTD model (i.e., combination of MD, full correlations, and FA) were similar to those of classifications in which age (p = 0.89), gender (p = 0.70), or center (p = 0.14) was excluded as a covariate. AUC values likewise did not differ significantly when these covariates were excluded from the best performing control versus bvFTD model (i.e., combination of FA, GMD, and full correlations: age p = 0.10, gender p = 0.65, center p = 0.50) or control versus AD model (i.e., combination of GMD, AxD, FA, RD, MD: age p = 0.60, gender p = 0.64, center p = 0.79).
Nested cross-validation resulted in models that selected a subset of all the MRI features provided as input for classification. The best performing AD versus bvFTD model selected 5% of all the MD, full correlation, and FA features provided for classification. This model focused on differences in the uncinate fasciculus, forceps minor, cingulum bundle, and corticospinal tracts, and on altered functional connectivity with the dorsal default mode network. In contrast, the control versus dementia models also included GM regions for classification. The best performing control versus bvFTD model selected 3% of all provided features and considered the anterior thalamic radiation, corticospinal tract, inferior longitudinal fasciculus, and hippocampal regions as key regions for classification. The best performing control versus AD model selected 37% of all the features provided and primarily focused on volume differences in DGM structures including the hippocampus, nucleus accumbens, pallidum, and thalamus.
DISCUSSION
This study investigated the diagnostic accuracy of MRI-based classification algorithms to, on a subject basis, differentiate between AD and bvFTD patients using anatomical MRI, DTI, and rs-fMRI. Our study showed that classification algorithms that combine measures of DTI and rs-fMRI are more effective in discriminating AD from bvFTD patients than algorithms that use DTI, GMD, or rs-fMRI measures separately. Furthermore, when compared with classification models that aim to differentiate patients from controls, we found that the level of improvement was conditional on the MRI measure and dementia-type differentiation being considered.
GM- and DTI-derived measures have been heralded as important corroborating measures for differentiating AD and bvFTD patients [15, 31]. Classification methods that combined these measures invariably showed improvement of classification accuracy over models that used these measures separately [15, 31]. We also found that combining measures improves classification accuracy. However, we found highest classification rates when DTI and rs-fMRI measures were combined. GM-based values contributed, either as a single (for AD) or complementary measure (for bvFTD), to the differentiation of dementia versus control subjects, but not to the discrimination of AD and bvFTD patients. Differential diagnosis between dementias may be challenged by overlapping GM atrophy patterns, particularly in subcortical regions [21, 22]. Both best performing dementia versus control models considered volume reductions in subcortical regions like the hippocampus as decisive for classification; these regions may therefore have contributed little to the differentiation between the dementia-types. Distinct microstructural changes in WM may be potential markers for bvFTD. Compared with AD patients, bvFTD patients show distinct differences in diffusion, especially in key regions like the uncinate fasciculus, genu of the corpus callosum, and cingulum bundle [16, 59]; these differences may even precede cortical degeneration [26]. In line with these findings, we found that the best performing bvFTD classification models specifically focused on these regions of DTI-derived WM pathology. We also found that AD versus bvFTD classifications improved when rs-fMRI measures were added. Altered connectivity measurements, mainly with the dorsal default mode network regions, were considered important and, in the end, increased classification performance rates.
This corresponds with other work showing that connectivity deviations of the default mode network, alongside those of the salience/executive control network, contrast between AD and bvFTD patients [12, 60]. Nevertheless, the role of rs-fMRI as a separate measure for disease-type classifications has not yet been established. It has found limited application for single-subject disease-type classifications [27, 32]. In AD versus control classifications its performance rates are consistently among the lowest compared with other MRI measures [29, 32], and we also found that rs-fMRI as a single measure was effective neither in differentiating dementia types nor in differentiating patients from control subjects.
We first trained classification models with varying combinations of MRI measures to differentiate AD from bvFTD patients. Subsequently, we trained models that aimed to differentiate AD or bvFTD patients from cognitively healthy controls in order to assess classification model accuracy under conditions where large MRI-based differences are expected [10, 28]. Combining MRI-derived structural and functional measures improves bvFTD classifications. The level of improvement is, however, dependent on the dementia-types and MRI measures being considered. Substantial improvement was, for instance, not observed for AD versus control classifications, whereas other studies did [29, 32] or did not [30] show benefit from GM, DTI, and rs-fMRI measures. Furthermore, in line with previous work, we observed that concatenating all measures did not result in the most accurate classification model, and as more measures were added the level of improvement became less apparent [29, 61]. The marginal improvements observed for some measures or combinations may indicate a negligible contribution to the model. These may result from variability due to non-stationarities like correlation across features and samples within our data [62]. These measures most likely add insufficient clinical relevance and may not justify the additional processing time.
Overall, classification performance values for dementia-type differentiations were consistently lower than those for dementia versus control subject differentiations. Dementia-type classifications remain challenging. Compared with control versus patient comparisons, differences between patients are more subtle and, particularly in the earlier stages of the disease, may be obscured by overlapping patterns of GM atrophy [21, 63], WM integrity impairment [17], or comparable patterns of functional connectivity loss in specific functional network regions [12, 28]. It remains questionable whether any of these algorithms can fully capture the complexity of the structural and functional dynamics of the neurodegenerative processes underlying dementia. We speculate that other algorithms may further exploit and weigh the additional information from multiple measures [64]. Examples are algorithms that use more sophisticated feature combination approaches, like sparse group lasso models [61], or hierarchical or longitudinal algorithms that first differentiate patients from a general population and subsequently differentiate between dementia-types. Incorporating other or additional imaging-derived biomarkers, such as cerebral blood flow [65], amplitude of low frequency fluctuations [32], GM derived connectomics [19], or diffusion tractography derived graph-based analytics [61], may further contribute to MRI-based dementia-type classification estimates without increasing diagnostic complexity.
In our analysis, we took several steps to reduce classification bias and augment generalizability of our results. First, while specific functional network connections were found to deviate between AD and bvFTD patients [12, 60], we hypothesized that restricting our analysis to specific network regions may introduce bias and may unnecessarily exclude other regions that show deficits at different stages of the disease [23, 66]. The number of network regions that need to be distinguished to optimally differentiate between dementia-types still needs clarification. ICA dimensionality continues to be an actively debated topic and is a trade-off between detail in functionally connected regions and feature space dimensionality. Second, we employed a repeated nested cross-validation approach in which two levels of data segregation ensured unbiased operational parameter optimization and classification performance estimation [53]. Third, for classification, we used regularized regression to establish homogeneous and stable dementia-type estimates and to accommodate proper selection of relevant features despite the high dimensionality and collinearity of our data [52].
Our study was limited by the clinical diagnoses used to validate the differentiations. While our multidisciplinary, multicenter team carefully diagnosed each patient according to the newest criteria for AD [7] or bvFTD [1], uncertainty in the diagnosis remains. Overlapping clinical symptoms may complicate dementia-type differentiations [8] and postmortem pathological data to confirm the diagnosis were unavailable. Furthermore, we included patients with possible and probable FTD diagnosis to maximize our patient cohort. This may have increased the complexity of our classifications. We included relatively young patients who were diagnosed in a relatively early stage of the disease and were therefore likely to have less apparent structural or functional deficits. Nevertheless, we were able to differentiate bvFTD patients from controls with high accuracy. Further validations with larger, multicenter cohorts are necessary to contextualize and compare our findings. The trained models depend on both random and non-random class differences in the training sample and, especially in light of our limited population sizes, we cannot reliably differentiate between real and random class differences in the trained models [62]. Consequently, we refrained from biological interpretation of the model’s parameters and from speculating on the exact measure order, and we interpreted the features selected for classification cautiously.
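The two levels of data segregation in repeated nested cross-validation can be sketched as follows. This is an illustrative, standard-library-only sketch and not the study's implementation: the classifier is a toy sparse linear model (soft-thresholded class-mean differences) standing in for the elastic net, and the fold counts, penalty grid `lams`, function names, and synthetic data are all assumptions. The point is that the penalty is selected in the inner loop using training subjects only, while the AUC is computed on outer test subjects that played no part in that selection.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit(X, y, lam):
    """Toy sparse linear model (NOT elastic net): soft-thresholded mean differences."""
    weights = []
    for j in range(len(X[0])):
        diff = (mean(row[j] for row, t in zip(X, y) if t == 1)
                - mean(row[j] for row, t in zip(X, y) if t == 0))
        shrunk = max(abs(diff) - lam, 0.0)  # small differences shrink to zero
        weights.append(shrunk if diff > 0 else -shrunk)
    return weights


def predict(weights, X):
    """Linear score per subject; higher means more likely class 1."""
    return [sum(w * x for w, x in zip(weights, row)) for row in X]


def split(X, y, held_out):
    """Return (X_rest, y_rest, X_held, y_held) for a list of held-out indices."""
    held = set(held_out)
    rest = [i for i in range(len(y)) if i not in held]
    return ([X[i] for i in rest], [y[i] for i in rest],
            [X[i] for i in held_out], [y[i] for i in held_out])


def nested_cv_auc(X, y, lams, k_outer=5, k_inner=4, repeats=3, seed=0):
    """Mean outer-fold AUC over repeated nested cross-validation."""
    rng = random.Random(seed)
    outer_aucs = []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k_outer, rng):
            X_tr, y_tr, X_te, y_te = split(X, y, test_idx)
            inner_folds = stratified_folds(y_tr, k_inner, rng)

            def inner_auc(lam):
                # Inner loop: score a penalty using training subjects only.
                vals = []
                for val_idx in inner_folds:
                    X_fit, y_fit, X_val, y_val = split(X_tr, y_tr, val_idx)
                    vals.append(auc(predict(fit(X_fit, y_fit, lam), X_val), y_val))
                return mean(vals)

            best_lam = max(lams, key=inner_auc)
            # Outer loop: refit on all training subjects, test on unseen subjects.
            outer_aucs.append(auc(predict(fit(X_tr, y_tr, best_lam), X_te), y_te))
    return mean(outer_aucs)


def make_toy_data(n_neg=30, n_pos=23, n_feat=10, shift=2.0, seed=1):
    """Synthetic two-class data; only the first two features carry signal."""
    rng = random.Random(seed)
    y = [0] * n_neg + [1] * n_pos
    X = [[rng.gauss(shift * t if j < 2 else 0.0, 1.0) for j in range(n_feat)]
         for t in y]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(round(nested_cv_auc(X, y, lams=[0.0, 0.25, 0.5]), 3))
```

Selecting the penalty on the same subjects used to report performance would tend to give optimistic AUC values, which is the bias the nested design is meant to avoid.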
In conclusion, in this study we investigated whether multiparametric MRI improves MRI-based differential diagnosis of dementia-types with overlapping clinical symptoms. We found that combining multiple MRI-derived measures of structural and functional connectivity improves MRI-based differentiations between bvFTD and AD patients. Our results imply that current MRI protocols for differential diagnosis of AD or bvFTD may benefit from adding functional and diffusion connectivity measures complementary to the anatomical (GM-based) measurements already being acquired. Yet, the MRI measures and dementia-types being differentiated should be carefully considered to attain optimal results. These results furthermore highlight the potential of these multiparametric imaging-based classification algorithms to aid in and possibly improve diagnosis, particularly in situations where experienced neuroradiologists or other supporting diagnostic measures are available to a limited extent. Further analysis should reveal how these results generalize to larger cohorts, to cases where patients’ symptoms are less conspicuous, or to subjects in a pre-symptomatic disease stage. Nevertheless, these observations may serve as initial guidance toward integrating quantitative imaging-based measures that may contribute to more confident per-subject differential diagnoses and subsequent tailored patient care.
Footnotes
ACKNOWLEDGMENTS
Gradient non-linearity corrections were kindly provided by GE medical systems, Milwaukee.
This study was supported by VICI grant no. 016.130.677 of the Netherlands Organization for Scientific Research (I). The VUmc Alzheimer Center is supported by Alzheimer Nederland and Stichting VUmc Fonds. Research of the VUmc Alzheimer Center is part of the neurodegeneration research program of the Neuroscience Campus Amsterdam.
