Abstract
Background:
Hippocampus (HC) atrophy is a hallmark of early Alzheimer’s disease (AD). Atrophy rates can be measured by high-resolution structural MRI. Longitudinal studies have previously shown sex differences in the progression of functional and cognitive deficits and rates of brain atrophy in early AD dementia. It is important to corroborate these findings on independent datasets.
Objective:
To study the rates of HC atrophy over a one-year period in patients with probable AD and cognitively normal (CN) subjects using longitudinal MRI scans obtained from the Minimal Interval Resonance Imaging in AD (MIRIAD) database.
Methods:
We used a novel algorithm to compute an index of hippocampal (volumetric) integrity (HI) at baseline and at one-year follow-up in 43 mild-to-moderate probable AD patients and 22 CN subjects in MIRIAD. The diagnostic power of longitudinal HI measurement was assessed using a support vector machine (SVM) classifier.
Results:
The HI was significantly reduced in the AD group (p < 10⁻²⁰). In addition, the annualized percentage rate of reduction in HI was significantly greater in the AD group (p < 10⁻¹³). Within the AD group, the annual reduction of HI in women was significantly greater than in men (p = 0.008). The accuracy of SVM classification between AD and CN subjects was estimated to be 97% by 10-fold cross-validation.
Conclusion:
In the MIRIAD patients with probable AD, the HC atrophies at a significantly faster rate in women than in men. Female sex is a risk factor for faster descent into AD. The HI measure has potential for use in AD diagnosis, as a biomarker of AD progression, and as a therapeutic target in clinical trials.
Keywords
INTRODUCTION
The hippocampus (HC) is a component of the medial temporal lobe limbic system and plays a central role in the formation, consolidation, and retention of recent (or declarative) memory [1]. Atrophy of the HC occurs early in the pathogenesis of Alzheimer's disease (AD) [2] and can be detected by structural magnetic resonance imaging (MRI) [3, 4]. Thus, HC atrophy beyond that expected for age has been proposed as a core neuroimaging biomarker of AD [5, 6]. This article presents a recently developed, fully automatic, and rapid technique for measuring an index of hippocampal (volumetric) integrity (HI) from three-dimensional (3D) T1-weighted MRI scans.
We present a preliminary study of the impact of normal aging and presumed AD pathology on HI using a publicly available image database [7]. In particular, we determine whether HI and its rate of change are different between cognitively normal (CN) individuals and patients with probable AD, assess the influence of age and sex on HI, and evaluate its discrimination power for classifying between CN and AD subjects.
There have been three general approaches to characterizing HC atrophy on MRI. The first and most basic method is to determine the HC volume by manually tracing its boundary on high-resolution structural MRI scans using specialized software [8–11]. Unfortunately, manual measurement is tedious and time-consuming, suffers from intra- and inter-observer variability, and requires extensive operator training; moreover, protocols vary significantly across laboratories. These issues have hindered the use of manual measurement as a routine, clinically viable approach to assessing HC atrophy.
The second general approach to HC volume measurement has been to develop automated algorithms for HC segmentation [12–23]. While excellent progress has been made in this area, automated methods are less robust than manual measurements [24], can be computationally expensive [12], are not widely available, and often require extensive preprocessing of the MRI scans (inhomogeneity correction, tissue segmentation, distortion correction, etc.) and technical operator expertise [15].
Regardless of whether manual or automatic methods are used for HC volume determination, the results should be corrected for intra-cranial volume (ICV), with which they are significantly correlated. However, accurate measurement of the ICV is itself a non-trivial problem [25].
As the third approach to characterizing HC atrophy, several research groups have formulated variables that can be measured from MRI scans and that reflect atrophy in some sense, but are not direct measurements of HC volume per se [26–28]. The basic premise behind this group of techniques is that neurodegeneration tends to replace brain parenchyma with cerebrospinal fluid (CSF). Hence, these measurements try to capture the relative volume of brain parenchyma and CSF, and changes therein, within well-defined regions of interest in the medial temporal lobe. The HI index introduced in this paper follows this approach. Its advantages are that HI can be computed quickly (∼1 min), does not require ICV measurement, can be applied to raw MRI scans without any preprocessing, and requires little, if any, image-processing expertise on the part of the user, while being at least as sensitive as volume measurement to HC abnormality.
MATERIALS AND METHODS
MRI data
We analyzed longitudinal MRI scans from the Minimal Interval Resonance Imaging in Alzheimer’s Disease (MIRIAD) public database [7]. The MIRIAD study was designed to investigate the feasibility of using longitudinal MRI as an outcome measure for clinical trials of AD treatments. MIRIAD includes volumetric MRI brain scans from 23 cognitively normal (CN) subjects and 46 mild to moderate probable AD patients. All subjects were scheduled for imaging at 0, 2, 6, 14, 26, 38, and 52 weeks from baseline, with two back-to-back scans conducted at 0, 6, and 38 weeks.
The MIRIAD scans are three-dimensional (3D) sagittal T1-weighted volumes acquired on a 1.5 Tesla Signa MRI scanner (GE Medical Systems, Milwaukee, WI) using an inversion recovery prepared fast spoiled gradient recalled (IR-FSPGR) pulse sequence with matrix size: 256 × 256 × 124, voxel size: 0.9375 × 0.9375 × 1.5 mm³, TR: 15 ms, TE: 5.4 ms, flip angle: 15°, and TI: 650 ms.
For each subject, we selected a baseline and a follow-up scan (each referred to henceforth as a volume). We used volume 1 of week 0 as the baseline and the volume at week 52 as the follow-up. However, prior to image analysis, the selected volumes were visually inspected, and alternatives were selected if severe motion artifacts were present. This was the case in 9 subjects, listed in Supplementary Material A. Four subjects (3 AD and 1 CN) were excluded from the study because we could not identify an appropriate baseline and/or follow-up volume, for reasons listed in Supplementary Material A. The demographics of the remaining 22 CN and 43 AD subjects are given in Table 1.
Image registration
In longitudinal analyses, it is essential to treat all volumes over time exactly the same [29]. Interpolation bias can occur if, for example, the baseline volume is kept untouched and the follow-up volume is re-sampled to the native space of the baseline volume. Bias may also be introduced if a single time point is used consistently in a preprocessing step, for example, as the reference volume in a registration step.
Let $V_b$ and $V_f$ represent the baseline and follow-up volumes, respectively. The volumes are registered according to the flowchart shown in Fig. 1. (1) The two volumes are input to an inverse-consistent rigid-body registration algorithm, the Automatic Temporal Registration Algorithm (ATRA) of the Automatic Registration Toolbox (ART). This algorithm produces a rigid-body transformation matrix $T_{bf}$ ($T_{fb}$) that maps positions in the baseline (follow-up) volume to the corresponding positions in the follow-up (baseline) volume. Inverse consistency implies $T_{fb} = T_{bf}^{-1}$, so it does not matter whether the baseline or the follow-up volume is chosen as the reference in the registration algorithm. (2) Following determination of the forward ($T_{bf}$) and backward ($T_{fb}$) transformations, the square roots of these matrices, $T_{bf}^{1/2}$ and $T_{fb}^{1/2}$, are computed. (3) Subsequently, the baseline and follow-up volumes are resampled using $T_{bf}^{1/2}$ and $T_{fb}^{1/2}$, respectively, resulting in two spatially registered volumes $\tilde{V}_b$ and $\tilde{V}_f$. The advantage of this approach is that the baseline and follow-up volumes are treated in exactly the same way, thus avoiding asymmetric interpolation bias. (4) Finally, the spatially registered baseline and follow-up volumes are averaged to give $V_{avg} = (\tilde{V}_b + \tilde{V}_f)/2$.
Note that in order to find the square-root transformations, such as $T_{bf}^{1/2}$ satisfying $T_{bf}^{1/2}\,T_{bf}^{1/2} = T_{bf}$, we used concepts from group theory [30]. Rigid-body transformation matrices in 3D are members of the special Euclidean group SE(3), with associated Lie algebra se(3). For every transformation matrix $T \in SE(3)$ one can find a corresponding matrix $g \in se(3)$, and vice versa, such that $T = e^{g}$. The square root of the transformation is then simply $T^{1/2} = e^{g/2}$.
In summary, the image registration pipeline (Fig. 1) has two input volumes, the baseline ($V_b$) and the follow-up ($V_f$) in their respective native spaces (as originally acquired), and three output volumes: the baseline ($\tilde{V}_b$) and follow-up ($\tilde{V}_f$) volumes transformed to an average space midway between them, each having undergone a single interpolation operation, and their average $V_{avg}$.
An example registration is shown in Fig. 2, where orthogonal sections through the left HC are shown in $\tilde{V}_b$ (left column) and $\tilde{V}_f$ (right column).
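The following short derivation, added for illustration, shows why the Lie-algebra square root behaves as required and why it keeps the scheme symmetric. It assumes that $g$ is the principal logarithm of $T_{bf}$ (rotation angle below π), which is our assumption rather than a detail stated in the text; $\Omega$ and $u$ denote the rotational and translational parts of $g$.

```latex
\begin{aligned}
&T_{bf} = e^{g}, \qquad
  g = \begin{pmatrix} \Omega & u \\ 0 & 0 \end{pmatrix} \in se(3), \quad \Omega^{\top} = -\Omega, \\
&\bigl(T_{bf}^{1/2}\bigr)^{2} = e^{g/2}\, e^{g/2} = e^{g/2 + g/2} = e^{g} = T_{bf}
  \quad (\text{since } g/2 \text{ commutes with itself}), \\
&T_{fb} = T_{bf}^{-1} = e^{-g} \;\Rightarrow\; T_{fb}^{1/2} = e^{-g/2} = \bigl(T_{bf}^{1/2}\bigr)^{-1}, \\
&x_f = T_{bf}\, x_b \;\Rightarrow\; T_{fb}^{1/2}\, x_f = e^{-g/2}\, e^{g}\, x_b = e^{g/2}\, x_b = T_{bf}^{1/2}\, x_b .
\end{aligned}
```

The last line shows that a baseline point and its corresponding follow-up point are carried to the same location in the midway space, each by one half-transformation.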
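A minimal Python sketch of the symmetric midway resampling is given below. It assumes that both volumes share one voxel grid and that $T_{bf}$ is available as a 4 × 4 homogeneous matrix in voxel coordinates; ATRA's own file formats and world-coordinate handling are not reproduced. SciPy's `logm`/`expm` provide the Lie-algebra square root and `scipy.ndimage.affine_transform` performs the resampling; the helper names `se3_sqrt` and `midway_average` are hypothetical.

```python
import numpy as np
from scipy.linalg import expm, logm
from scipy.ndimage import affine_transform


def se3_sqrt(T):
    """Square root of a 4x4 rigid-body matrix via the Lie algebra: exp(log(T) / 2)."""
    g = np.real(logm(T))  # principal logarithm; discard negligible imaginary parts
    return expm(g / 2.0)


def midway_average(v_b, v_f, T_bf, order=1):
    """Resample baseline and follow-up volumes into the halfway space and average them.

    Assumes both volumes share one voxel grid and that T_bf is a 4x4 homogeneous
    matrix mapping baseline voxel coordinates to follow-up voxel coordinates.
    """
    half_bf = se3_sqrt(T_bf)                  # moves baseline points to the midway space
    half_fb = se3_sqrt(np.linalg.inv(T_bf))   # moves follow-up points to the midway space
    # affine_transform pulls values (input_coord = matrix @ output_coord), so each
    # volume is resampled with the inverse of its forward half-transformation.
    # Each volume is interpolated exactly once.
    v_b_mid = affine_transform(v_b, np.linalg.inv(half_bf), order=order)
    v_f_mid = affine_transform(v_f, np.linalg.inv(half_fb), order=order)
    return v_b_mid, v_f_mid, 0.5 * (v_b_mid + v_f_mid)
```

For example, if the follow-up volume is the baseline shifted by two voxels, both resampled volumes end up shifted by one voxel and coincide, which is the intended symmetric behavior.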
Hippocampus integrity computation
Hippocampal (volumetric) integrity (HI) is defined as the fraction of tissue (non-CSF) found in a region that is expected to encompass a normal HC. Lower values indicate increased atrophy of the HC. The rate of atrophy may be obtained from a pair of HI measurements at two different times in a longitudinal study. The steps for computing the HI longitudinally are outlined below. We describe only the steps for computing the HI for the left HC at baseline, denoted by $\eta_{lb}$. The HI for the left HC at follow-up ($\eta_{lf}$), the right HC at baseline ($\eta_{rb}$), and the right HC at follow-up ($\eta_{rf}$) are computed similarly.
(1) The baseline and follow-up volumes are registered using the inverse-consistent unbiased registration algorithm outlined above.
(2) The mid-sagittal plane (MSP) is detected on the post-registration average volume $V_{avg}$ using the method described in [31].
(3) The points where the anterior commissure (AC) and the posterior commissure (PC) cross the MSP are detected automatically using the method described in [32].
(4) A rigid-body transformation $T_{PIL}$ is computed from the MSP and AC/PC locations and applied to $V_{avg}$ to transform it to a standard posterior-inferior-left (PIL) orientation.
(5) Based on a priori training data computed offline, 106 landmarks in the vicinity of the left HC (124 landmarks for the right HC) are detected by template matching.
(6) Based on the detected landmarks, a 12-parameter affine transformation $T_{LM}$ is computed by least squares that maps the detected landmarks in PIL space as closely as possible to their expected locations, which are determined a priori.
(7) The composite transformation $(T_{LM}\,T_{PIL})^{-1}$ is applied to a probabilistic left HC label, determined a priori from manual tracings on multiple MRI volumes from a separate dataset, to project the probabilistic label onto the space of the $V_{avg}$ volume. Let $L$ denote the projected probabilistic label, which is also in the space of the registered baseline volume $\tilde{V}_b$. Figure 3 shows the 0% and 50% isocontours of the probabilistic left HC label projected on $\tilde{V}_b$ in one case. Supplementary Material B presents more details on the construction of the HC probabilistic label.
(8) An automated analysis of the histogram of the voxels in $\tilde{V}_b$ that coincide with the non-zero voxels in $L$ is performed using the expectation maximization (EM) algorithm [33] to determine a CSF intensity threshold $I_{CSF}$. Briefly, (a) a Gaussian mixture model with 5 terms is fitted to the histogram of the voxel intensities in the HC ROI; (b) the $(1-\alpha)$th percentile value of the histogram is denoted by $I_\alpha$ (default $\alpha = 0.25$); (c) a gray matter intensity peak $I_{gm}$ is found as the peak of the EM fit of the histogram in the intensity interval $[cI_\alpha, I_\alpha]$ (default $c = 0.4$); and (d) $I_{CSF}$ is defined as $I_{CSF} = I_{gm} - \gamma I_\alpha$, with default $\gamma = 0.2$.
(9) Finally, $\eta_{lb}$ is defined as:
$$\eta_{lb} = \frac{\sum_{v} L(v)\, H\!\left(\tilde{V}_b(v) - I_{CSF}\right)}{\sum_{v} L(v)} \qquad (1)$$
where the sums run over all voxels $v$ and $H$ is the unit step function (1 for positive arguments and 0 otherwise), so that $\eta_{lb}$ is the label-weighted fraction of non-CSF voxels in the HC region.
Note that steps (1)–(4) are performed only once, as they need not be repeated for computing $\eta_{lf}$, $\eta_{rb}$, and $\eta_{rf}$; steps (5)–(7) are performed twice, once for computing $\eta_{lb}$ and $\eta_{lf}$ for the left HC and once for computing $\eta_{rb}$ and $\eta_{rf}$ for the right HC; and steps (8) and (9) are performed four times, for the left and right HC at baseline and follow-up.
While the above processing steps have been described in the context of a longitudinal study with two time points, it is straightforward to apply the HI computational method to cross-sectional studies in which only a single MRI volume is available. In this case, step (1) above (i.e., symmetric unbiased registration) is not necessary, and in the subsequent steps $V_{avg}$ is simply replaced by the available MRI volume.
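The landmark-based affine fit described above (the 12-parameter transformation $T_{LM}$ estimated by least squares) can be sketched with NumPy as follows. This is a minimal illustration, assuming that landmark detection has already produced corresponding detected and expected coordinates; the template-matching detector and the a priori expected locations are not reproduced, and the function name `fit_affine_lstsq` is hypothetical.

```python
import numpy as np


def fit_affine_lstsq(src, dst):
    """Least-squares 12-parameter affine map taking src landmarks onto dst landmarks.

    src, dst: (n, 3) arrays of corresponding coordinates (n >= 4, not all coplanar).
    Returns a 4x4 homogeneous matrix A such that dst_i ~= A @ [src_i, 1].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    X = np.hstack([src, np.ones((src.shape[0], 1))])  # (n, 4) homogeneous source points
    # Solve X @ M ~= dst for the 4x3 parameter matrix M (12 unknowns).
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)
    A = np.eye(4)
    A[:3, :] = M.T  # rows: [linear part | translation]
    return A
```

With 106 (left) or 124 (right) landmarks, the 12 unknowns are heavily over-determined, so the least-squares solution tends to average out errors in individual landmark detections.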
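The histogram-based CSF threshold and the HI ratio can be sketched in NumPy as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: the 5-term Gaussian mixture is fitted to the ROI voxel intensities directly by a simple EM loop (rather than to a binned histogram), the peak of the fitted density in $[cI_\alpha, I_\alpha]$ is located by a grid search, and HI is computed as the label-weighted fraction of voxels brighter than $I_{CSF}$, which is our reading of the definition of HI. Defaults follow the text (α = 0.25, c = 0.4, γ = 0.2); all function names are hypothetical.

```python
import numpy as np


def fit_gmm_1d(x, k=5, n_iter=200):
    """Fit a k-term 1-D Gaussian mixture to samples x with EM; returns (w, mu, var)."""
    x = np.asarray(x, dtype=float)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)  # spread initial means over the data
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step in the log domain to avoid underflow for distant components
        log_p = np.log(w) - 0.5 * np.log(2 * np.pi * var) - 0.5 * (x[:, None] - mu) ** 2 / var
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step; small floors keep near-empty components numerically harmless
        nk = r.sum(axis=0) + 1e-12
        w = nk / x.size
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var


def gmm_pdf(t, w, mu, var):
    """Evaluate the fitted mixture density at intensities t."""
    t = np.asarray(t, dtype=float)[:, None]
    return (w * np.exp(-0.5 * (t - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)).sum(axis=1)


def csf_threshold(roi, alpha=0.25, c=0.4, gamma=0.2):
    """I_CSF = I_gm - gamma * I_alpha, following steps (a)-(d) in the text."""
    roi = np.asarray(roi, dtype=float)
    w, mu, var = fit_gmm_1d(roi)                       # (a) 5-term mixture fit
    i_alpha = np.percentile(roi, 100 * (1 - alpha))    # (b) (1 - alpha)th percentile
    grid = np.linspace(c * i_alpha, i_alpha, 1024)
    i_gm = grid[np.argmax(gmm_pdf(grid, w, mu, var))]  # (c) peak in [c*I_alpha, I_alpha]
    return i_gm - gamma * i_alpha                      # (d)


def hippocampal_integrity(volume, label, i_csf):
    """Label-weighted fraction of voxels brighter than the CSF threshold."""
    label = np.asarray(label, dtype=float)
    tissue = np.asarray(volume, dtype=float) > i_csf
    return float((label * tissue).sum() / label.sum())
```

In use, `roi` would be the intensities of $\tilde{V}_b$ where $L > 0$, and `hippocampal_integrity` would be called with the whole registered volume and the projected label, since voxels outside the label receive zero weight.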
MIRIAD data analysis
Using the procedure outlined above, for each of the 65 usable subjects in the MIRIAD dataset we computed four quantities: the HI for the left and right HC at baseline ($\eta_{lb}$ and $\eta_{rb}$) and at follow-up ($\eta_{lf}$ and $\eta_{rf}$). The average processing time for obtaining HI bilaterally was 48 seconds on a single Intel 2.4 GHz processing core.
Analysis showed (see Results and Fig. 4) that the left and right HC measures are highly correlated. Therefore, we averaged the bilateral HI at baseline and at follow-up to obtain a single HI index at baseline, $\eta_b = (\eta_{lb} + \eta_{rb})/2$, and a single HI index at follow-up, $\eta_f = (\eta_{lf} + \eta_{rf})/2$.
Regression analysis with the HI indices as dependent variables and age and diagnosis as independent variables showed HI to be significantly associated with age. Therefore, age correction was performed for all HI indices. For example, the baseline HI was corrected as $\eta_b \leftarrow \eta_b - \beta\,(\mathrm{Age} - 70)$, where β is the regression coefficient for age and Age is in years. Henceforth, all analyses are based on age-corrected quantities.
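The age correction can be illustrated with ordinary least squares in NumPy. This sketch assumes a design matrix with an intercept, age, and a 0/1 diagnosis indicator, which is our reading of a regression "with age and diagnosis as independent variables"; the reference age of 70 years follows the text, and the function name `age_correct` is hypothetical.

```python
import numpy as np


def age_correct(eta, age, is_ad, ref_age=70.0):
    """Regress eta on age and diagnosis, then remove the age effect.

    Returns (corrected_eta, beta), where beta is the age coefficient and
    corrected_eta = eta - beta * (age - ref_age).
    """
    eta = np.asarray(eta, dtype=float)
    age = np.asarray(age, dtype=float)
    design = np.column_stack([np.ones_like(age), age, np.asarray(is_ad, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, eta, rcond=None)  # [intercept, age, diagnosis]
    beta = coef[1]
    return eta - beta * (age - ref_age), beta
```

The returned β can then be reused to correct other HI indices, such as the baseline and follow-up values, with the same formula.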
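To obtain a single measure of HI per subject, the baseline and follow-up HI were averaged to obtain $\eta = (\eta_b + \eta_f)/2$. We hypothesized that η is significantly lower in AD subjects than in CN subjects. In addition, the annualized percentage change in HI for each subject was computed as:
$$\kappa = 100 \times \frac{\eta_f - \eta_b}{\eta_b\,\Delta t} \qquad (2)$$
where $\Delta t$ is the time interval in years between the baseline and follow-up scans.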
To determine how well the AD cases can be discriminated from the CN cases in the (κ, η) space, we used these variables as features to train a support vector machine (SVM) classifier with a linear kernel [34]. We estimated the classification accuracy by 10-fold cross-validation. The SVM regularization parameter C, which weights the slack variables, was selected from the set {10⁻⁴, 10⁻³, …, 10⁴} using a second-level (nested) cross-validation.
For comparison, we computed HC volume (HCV) using the longitudinal processing stream of FreeSurfer v5.3.0 and repeated the CN versus AD group-difference tests and SVM analyses after correcting HCV for age and intra-cranial volume. Instructions for running FreeSurfer in longitudinal mode can be found at http://surfer.nmr.mgh.harvard.edu. FreeSurfer uses probabilistic atlases of structures and of the spatial relationships between structures. Its longitudinal processing stream uses inverse-consistent registration to produce an unbiased within-subject template. FreeSurfer took approximately 24 hours per subject on a single Intel 2.4 GHz processing core and produced over 35 segmentations, of which we used only the hippocampal volumes and the ICV. We processed the 65 subjects in this study in parallel on a 32-node Linux cluster.
Differences in group means were tested using independent-samples t-tests. Differences in variances were tested using Levene's test for equality of variances. All statistical analyses were performed using SPSS Version 22 (IBM Corp., Armonk, NY).
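The nested cross-validation scheme can be sketched with scikit-learn as follows. This is an illustrative setup, not the authors' code: the feature standardization step, the stratified and shuffled folds, the 10-fold inner loop (the text does not state the inner fold count), and the function name are our assumptions; the linear kernel, the 10 outer folds, and the C grid {10⁻⁴, …, 10⁴} follow the text.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def nested_cv_accuracy(X, y, seed=0):
    """Outer 10-fold CV accuracy of a linear SVM whose C is tuned by an inner CV."""
    model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    grid = {"svc__C": [10.0 ** p for p in range(-4, 5)]}  # {1e-4, ..., 1e4}
    inner = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed + 1)
    tuned = GridSearchCV(model, grid, cv=inner, scoring="accuracy")
    # C is re-selected inside every outer training fold, so the outer
    # accuracy estimate is not biased by the tuning step.
    return float(np.mean(cross_val_score(tuned, X, y, cv=outer, scoring="accuracy")))
```

Here `X` would hold one row (η, κ) per subject and `y` the diagnostic labels (e.g., 0 = CN, 1 = AD).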
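The group comparisons can also be reproduced outside SPSS with SciPy. This is a minimal sketch, assuming Student's equal-variance t-test, the mean-centred form of Levene's test (SciPy's default is median-centred), and Cohen's d computed with the pooled standard deviation; the text does not state which variant of d was used, and the helper name `compare_groups` is hypothetical.

```python
import numpy as np
from scipy import stats


def compare_groups(a, b):
    """Return (t-test p, Levene p, Cohen's d) for two independent samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    t_res = stats.ttest_ind(a, b)              # Student's t-test, equal variances assumed
    lev = stats.levene(a, b, center="mean")    # classic mean-centred Levene test
    na, nb = a.size, b.size
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2))
    d = (a.mean() - b.mean()) / pooled_sd      # Cohen's d with pooled SD
    return float(t_res.pvalue), float(lev.pvalue), float(d)
```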
RESULTS
We utilized 22 CN (11 males, 11 females) and 43 probable AD (18 males, 25 females) subjects from the MIRIAD dataset (Table 1). The mean (±SD) age of the entire cohort (n = 65) was 69.32 ± 7.03 y at baseline. The mean age was not statistically different between the AD and CN groups. In the AD group, the mean age was not statistically different between females and males. However, in the CN group the baseline age of females (65.81 ± 4.89 y) was significantly lower than that of males (72.96 ± 7.57 y) (p = 0.016). In females, the mean baseline age of the CN group was lower than that of the AD group; however, the difference was not statistically significant (p = 0.12). In males, the mean baseline age of the CN group was higher than that of the AD group; however, the difference was not statistically significant (p = 0.18).
The mean time interval between baseline and follow-up scans (Δt) was 0.97 ± 0.09 y. The mean Δt was not statistically different between the AD and CN groups. Within each group, Δt was not statistically different between sexes.
The left and right HI (before age correction) at both baseline (Fig. 4a) and follow-up (Fig. 4b) were highly correlated (baseline r = 0.873; follow-up r = 0.895). Therefore, for the remaining analyses we averaged the bilateral HI to obtain single measures at baseline ($\eta_b$) and follow-up ($\eta_f$).
There was a significant partial correlation between η and age after controlling for diagnosis (r = −0.425, p < 0.0005), indicating a general reduction of HI with age. Multiple linear regression of η on age and diagnostic group estimated the age coefficient as β = −0.005 (1/y), which was used to correct $\eta_b$ and $\eta_f$, and hence η and κ, for age. The following statistical results pertain to these age-corrected variables.
The mean η was significantly lower in the AD group (0.656 ± 0.083) than in the CN group (0.876 ± 0.048) (p < 10⁻²⁰; Cohen's d = 3.44) (Fig. 4c, vertical axis). Within each of the AD and CN groups, η was not significantly different between males and females, despite CN females being younger than their male counterparts. The variance of η was significantly higher in AD than in CN (p = 0.003). Within the CN group, the variance was not statistically different between males and females, despite females being younger. However, within the AD group, the variance of η was higher in females than in males.
The mean annual percentage decline in HI, as measured by κ in Equation (2), was significantly faster in the AD group (−5.65 ± 3.06 %/y) than in the CN group (−0.42 ± 0.89 %/y) (p < 10⁻¹³; Cohen's d = 2.83) (Fig. 4c, horizontal axis). Furthermore, within the AD group, females had a significantly faster rate of decline (−6.61 ± 3.30 %/y) than males (−4.31 ± 2.12 %/y) (p = 0.008). This result remained significant after correction for disease status as measured by the average of the baseline and follow-up MMSE scores (p = 0.013). However, there was no sex difference in κ within the CN group. The variance of κ was significantly higher in AD than in CN (p < 0.0001). In addition, within the AD group, the variance of κ was higher in females than in males (p = 0.038). However, there was no sex difference in variance within the CN group.
When η and κ were used as features to train an SVM classifier to separate AD from CN subjects, the classification accuracy estimated by 10-fold cross-validation was 97%. Figure 4c shows the scatter plot of η (vertical axis) versus κ (horizontal axis) for all 65 subjects. The diagonal line shows the decision boundary found by the SVM algorithm.
The above analyses were repeated for HCV and its annual percentage change obtained using FreeSurfer v5.3.0. After correction for age and intra-cranial volume, HCV was significantly smaller in the AD group than in the CN group (p < 10⁻¹⁶; Cohen's d = 3.08) (Fig. 5, vertical axis). In addition, the annual percentage decline of HCV was faster in the AD group than in the CN group (p < 10⁻⁴; Cohen's d = 1.18) (Fig. 5, horizontal axis). A linear SVM classifier trained on HCV and its annual rate of decline achieved a 10-fold cross-validation accuracy of 89% (Fig. 5). The rate of HCV decline in the AD group did not differ statistically between males and females.
DISCUSSION
This paper introduced a new index for characterizing the structural integrity of the HC on 3D T1-weighted MRI volumes. It is based on the idea that atrophy replaces brain parenchyma with CSF within a standardized probabilistic HC region of interest, thus reducing the ratio of brain tissue voxels to the total number of voxels comprising the probabilistic HC label map (Equation 1).
As hypothesized, analysis of the MIRIAD data showed that HI was significantly lower in AD patients than in CN subjects and that it decreased at a significantly faster rate in AD. To determine whether these measures have discrimination power at the case level, age-corrected η and κ were used in a two-dimensional feature space (Fig. 4c) to classify AD versus CN cases using an SVM classifier with a linear kernel. This simple combination achieved a high classification accuracy of 97%, estimated by 10-fold cross-validation.
As seen in Fig. 4c, one CN female subject (subject 243, 73 years old) was misclassified by the SVM algorithm as AD. Visual examination of this subject's MRI volumes showed marked enlargement of the lateral ventricles with significant apparent hippocampal atrophy (Fig. 2). This subject shows characteristics of AD, both in terms of a low η = 0.74 and a relatively fast decline, κ = −1.44 %/y. Unfortunately, we did not have a follow-up clinical diagnosis for this case.
The high accuracy of the simple linear two-dimensional SVM classifier (97%) indicates that HI and its rate of change are highly sensitive and specific to AD when the classification task is reduced to distinguishing between AD and CN subjects. It is likely that the performance of this simple classifier will diminish in other, more difficult, but perhaps clinically more relevant classification tasks, for example, discriminating between different dementia types [35] or predicting conversion from mild cognitive impairment to AD [36, 37]. To achieve equally good results for these potentially more challenging problems, it may be necessary to expand the feature space to include variables such as age, sex, genotype, neurocognitive measures, and CSF biomarkers. More sophisticated classifiers, such as SVMs with non-linear kernels, may also be required.
As seen in Fig. 4c, the combination of hippocampal integrity η with its annual change κ clearly improved the discrimination between AD and CN control subjects compared to each measure alone. Quantitatively, the SVM classifier achieves a 10-fold cross-validation accuracy of 95% using η alone, and 92% using κ alone, as opposed to 97% when using both measures. As expected, availability of longitudinal information improves diagnostic accuracy [38].
Our statistical analysis showed that within the MIRIAD AD cases, the HI in females decreased at a significantly faster rate than in males. In fact, Fig. 4c shows that among the 43 AD cases, the 12 fastest decliners were all female. In the US, the number of women with AD has been estimated to be about twice that of men [39–41]. While some of this discrepancy may be attributed to the greater longevity of women, a faster decline in women, as suggested by our results, could partly explain the difference in AD prevalence between the sexes.
The sex difference in the rate of HC atrophy found in this work is consistent with several previous studies [42–44] that also reported faster atrophy of certain brain structures in women than in men with AD. Our results also provide a neuroanatomical substrate for the findings of the longitudinal study by Holland et al. [44] and of the recent longitudinal study by Lin et al. [45], which showed that cognitive and functional measures in subjects with mild cognitive impairment deteriorated faster in women than in men.
In addition to its mean, the distribution of κ in women with AD differed from that in men in its variance. One factor that could explain the higher sample variance in the rate of HI reduction in women is a possible interaction between apolipoprotein E ɛ4 (APOE4) genotype and sex. APOE4 is the strongest known genetic risk factor for sporadic AD [46]. Studies have found that APOE4 confers greater AD risk in women [44, 47–49], and that while APOE4 significantly accelerates the rate of decline of brain structures in AD [44], its effect size is larger in women [50, 51]. Further studies in which genotype information is available are needed to elucidate this finding.
The automated hippocampal volume measurements obtained using FreeSurfer differentiated between the AD and CN groups both in terms of mean HCV and its annual rate of decline, with effect sizes of 3.08 and 1.18, respectively. However, these effect sizes were smaller than those of η (3.44) and κ (2.83), indicating that the HI quantities separate the CN and AD groups better. This result is also reflected in the SVM classification accuracy of 89% using HCV, compared with 97% using HI. In addition, the sex difference in atrophy progression was not statistically significant when HCV measured by FreeSurfer was used.
A limitation of this study with regard to the finding of sex differences in the rate of HI decline is the small number of AD subjects (43) and the fact that the female AD group (n = 25) was larger than the male group (n = 18). Another limitation is the retrospective nature of the study. Therefore, it will be necessary in the future to corroborate these findings in larger groups of AD subjects in a prospective study.
In conclusion, we have developed a novel measure of HC structural integrity and applied it to one-year longitudinal data from the MIRIAD dataset. The results showed that this measure, along with its longitudinal rate of change, has excellent discrimination power for classifying between probable AD and CN subjects. Remarkably, the analyses showed that in the MIRIAD dataset, HC atrophy in women with AD advances at an average rate approximately 1.5 times faster than in men with AD (−6.61 vs. −4.31 %/y). Risk factors for AD such as APOE4 may interact with sex to accelerate disease progression in women.
ACKNOWLEDGMENTS
The authors would like to thank Mehrad Taskindoust for proofreading the manuscript and Elaine Bermudez for helping in data analysis. This work was partly supported by grant DK064087. Data used in the preparation of this article were obtained from the MIRIAD database. The MIRIAD investigators did not participate in analysis or writing of this report. The MIRIAD dataset is made available through the support of the UK Alzheimer’s Society (Grant RF116). The original data collection was funded through an unrestricted educational grant from GlaxoSmithKline (Grant 6GKC).
Authors’ disclosures available online (http://www.j-alz.com/manuscript-disclosures/15-0780r2).
