Abstract
Keywords
INTRODUCTION
Cognitive deficits are the defining features of dementia. These impairments are strong predictors of functional outcome [1, 2], and are associated with alterations in brain structure and function [3]. Hence, the focus of recent neurocognitive studies is on individuals at risk for developing dementia. This risk period, typically associated with mild cognitive impairment (MCI), is signified by a measurable deterioration in cognitive function that is greater than expected based upon an individual’s age and education, but does not meaningfully affect a person’s daily functioning [4]. Although MCI has been a major research focus in recent years [5], establishing the diagnosis of MCI [6] and monitoring disease progress using neuropsychological function over time remain challenging. Moreover, there is little work in developing measures that focus on and monitor individual differences in neurocognition, despite significant evidence of heterogeneity in disease presentation and progression. Recent investigations of within-individual variability (WIV; or intraindividual variability) suggest that this measure confers unique predictive information about cognitive functioning beyond mean performance [7–10] and is a relatively stable characteristic of an individual.
Early and accurate detection of cognitive impairments that precede dementia will enhance understanding of possible individual differences in disease trajectory as well as clinical management. To this effect, recent studies of neuropsychological function in MCI confirm the utility of neuropsychological tests for early detection and prevention strategies [11, 12]. Not surprisingly, use of an efficient but multi-dimensional neuropsychological inventory (CERAD-NB) is more accurate at distinguishing MCI or AD from healthy individuals (HC) than brief screening measures. Yet, diagnostic accuracy declines when using these instruments to distinguish between HC and MCI or MCI and AD [11]. Difficulty in differentiating MCI is likely due to several factors including, but not limited to: 1) the heterogeneity of causes of MCI diagnosis; 2) variable progression rates from MCI to AD per year [13]; 3) the historical focus of research on cross-sectional differentiation of MCI from AD and healthy older adults (see [6]); and 4) the relative dearth of valid screening measures for detecting subtle deficits of early stage or prodromal AD. The importance of this last point cannot be overstated, as the value of any test will be in its ability to accurately differentiate diagnostic features and identify markers of further decline.
Neurocognitive variability across tests has been measured before in aging and lifespan research [7], but with inconsistent nomenclature, including ‘dispersion’ [14, 15], ‘within-person variability’ [16], ‘within-person across neuropsychological test variability’ [17], and ‘intra-individual differences’ [7]. Here we define WIV as inconsistent relative strengths and weaknesses in test performances within or across domains [7, 18]. Thus, WIV is estimated within a single person, but across several tasks in several domains and therefore provides an index of evenness, or consistency, of neurocognitive performance. Specifically, a low WIV value indicates a relatively consistent within-individual performance profile, whereas a high WIV value indicates uneven performance profiles [15]. Measuring WIV therefore could be another, more sensitive way, to document the emergence of problematic individual performance differences. This type of variability differs from ‘intra-individual variability’ (e.g., across-trial IIV) within a given test [19, 20], which typically focuses on consistency in performance speed. WIV has been operationalized two ways: 1) variability associated with measuring one individual at one time point across multiple neurocognitive tasks; or 2) measuring one individual on a single task across multiple occasions [7]. It is important to note that when WIV is small, mean performance is a robust metric; however, if WIV is high, the utility of mean performance diminishes. Thus, exclusive reliance upon mean performance without considering WIV may lead to inaccurate conclusions [21, 22]. Consequently, WIV has emerged as a useful construct for assessing cognitive performance in many disorders. WIV is higher than normal in individuals with cognitive decline, Parkinson’s disease, frontotemporal dementia, ADHD (for reviews see [7, 22]), and in dementia [10]. 
Here, we used MCI and AD data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) sample and compared WIV in those individuals who transitioned from one diagnostic category to another within one year. Using a measure of across-task variability, we hypothesized that 1) WIV would improve upon the diagnostic classification achieved with mean neurocognitive performance and 2) WIV would be a sensitive index (e.g., higher at baseline, more change over time) in individuals who transition from MCI to AD.
MATERIALS AND METHODS
Study population: The Alzheimer’s disease neuroimaging initiative
Data used in this study were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu) and approval for this project was granted [23]. The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessments can be combined to measure the progression of MCI and early AD. For up-to-date information, see http://www.adni-info.org.
The neuropsychological data were collected in 229 healthy (HC), 397 MCI, and 193 AD individuals. There were significantly more males diagnosed with MCI at baseline χ2(2) = 12.37, p = 0.002. Overall, HC were older than MCI patients (p = 0.023), but not the AD patients (Table 1). HC attained higher levels of education than the AD patients (p < 0.001), but not the MCI patients (Table 1).
Diagnostic assessments included history, physical and neurological examinations conducted by experienced clinicians. On the basis of these data, a consensus diagnosis was established using standardized clinical criteria for AD, MCI [24], or other neurological or psychiatric conditions presenting with cognitive impairment (see [25] for description). Screening assessments included the Mini-Mental State Examination (MMSE; [26]) and Clinical Dementia Rating [2]. Informed consent for the use of all data was obtained from all persons, in accord with university institutional review board–approved protocols. As expected, MMSE differed by group with AD<MCI<HC (F(1, 809)=454.73, p < 0.001).
The ADNI neuropsychological battery includes tests that assess multiple cognitive domains including memory, executive functioning, attention and language [24]. Detailed descriptions of the tasks including administration and scoring instructions can be found here: http://www.adni-info.org/Scientists/ADNIStudyProcedures.html. For the current analysis, memory tests included immediate and delayed recall of the Rey Auditory Verbal Learning Test (RAVLT) [27] and immediate and delayed recall from the Wechsler Memory Scale-Revised [28] Logical Memory subtests. Executive function and attention tasks included the Digit Span Test (forward and backward) [28], the Trail Making Tests (Part A and B) [29], and the Digit Symbol Substitution Test [30]. Tests of language function included semantic word-list generation (animal and vegetable fluency) and visual confrontation naming (Boston Naming Test (BNT); [31]). Additional neuropsychological measures included the Alzheimer’s Disease Assessment Scale–Cognitive subscale (ADAS-Cog; [32]) and the Clock Drawing Test [33]. Total scores for each task were used as outcome measures. For measures with immediate and delayed memory scores, each score was considered separately. For the Trail Making Test, the difference in time between the completion of A and B was used as the outcome measure. For the current study, neuropsychological data from the baseline visit and 12-month visit were included.
Neuropsychological within-individual-variability (WIV)
Performance values were transformed to their standard equivalents based on the means and standard deviation (SD) of the healthy sample. An index of WIV across tasks was calculated for each participant in SD units (see [17]). This index of variability has been used in other studies [9, 34], and reflects variation within a single person across several neurocognitive tasks and is therefore an index of an individual’s evenness or dispersion of performance across neuropsychological domains. WIV was calculated at baseline and at the 12-month follow-up. In addition, a global index of neurocognitive performance (GNP) was calculated by averaging the standardized scores (z-scores) across all neuropsychological tasks. Three individuals were excluded from analysis for having completed fewer than half of the tasks. All other participants completed a minimum of 9 of the 12 neuropsychological tasks.
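Within-individual variability: $\mathrm{WIV}_i = \sqrt{\dfrac{\sum_{k=1}^{K} (Z_{ik} - A_i)^2}{K - 1}}$
where $Z_{ik}$ is the kth test score for the ith individual and $A_i = \dfrac{1}{K}\sum_{k=1}^{K} Z_{ik}$ is that individual’s mean standardized score across the $K$ completed tests.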
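The across-task index described in this section can be illustrated with a minimal sketch. It assumes that WIV is the sample standard deviation (K − 1 denominator) of one person's z-scores and that GNP is their mean; the test names, healthy-sample norms, and raw scores below are hypothetical and are not ADNI values. The sketch also does not reverse-score timed or error-based measures, which a real analysis would need to decide on.

```python
from math import sqrt
from statistics import mean


def z_scores(raw, ref_mean, ref_sd):
    """Standardize one person's raw scores against healthy-sample norms.

    Tests the person did not complete (value None) are skipped.
    """
    return {
        test: (score - ref_mean[test]) / ref_sd[test]
        for test, score in raw.items()
        if score is not None
    }


def gnp(z):
    """Global neurocognitive performance: mean z-score over completed tests."""
    return mean(z.values())


def wiv(z):
    """Within-individual variability: sample SD of z-scores over completed tests."""
    a_i = gnp(z)
    k = len(z)
    return sqrt(sum((value - a_i) ** 2 for value in z.values()) / (k - 1))


# Hypothetical healthy-sample norms and one hypothetical participant.
ref_mean = {"ravlt_delay": 8.0, "digit_symbol": 46.0, "bnt": 28.0, "animals": 20.0}
ref_sd = {"ravlt_delay": 4.0, "digit_symbol": 10.0, "bnt": 2.0, "animals": 5.0}
person = {"ravlt_delay": 0.0, "digit_symbol": 36.0, "bnt": 28.0, "animals": 15.0}

z = z_scores(person, ref_mean, ref_sd)
print(round(gnp(z), 3), round(wiv(z), 3))
```

In this toy profile the z-scores are -2, -1, 0, and -1, so the mean is one SD below the healthy reference while the spread across tests is what the WIV index captures. A perfectly even profile would give a WIV of zero regardless of its mean.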
Statistical analysis
WIV values were entered as dependent measures in ANCOVAs with MMSE, age, and sex included as factors and education as a covariate. Post-hoc contrasts were used to examine interactions; Satterthwaite corrections were used when equal variances could not be assumed; corrected degrees of freedom are reported where appropriate. Pearson correlations were performed between demographic and performance variables. Cohen’s d values are presented for group-specific WIV comparisons. The diagnostic accuracy for each measure (or combination of measures) was calculated as the area under the receiver operating characteristic (ROC) curve (AUC). The AUC measure represents the mean sensitivity value for all possible values of specificity, and larger AUC values indicate more accurate classification of participants. The logistic ROC analysis used a 10-fold cross-validation approach to estimate AUC and optimal cut-off score. AUCs were compared using DeLong’s non-parametric method [35]. A cut-off score for each measure that best differentiated diagnostic groups was determined using the Youden Index [36], which maximizes the sum of sensitivity and specificity. The classification accuracy (probability of correct classification of a subject with or without impairment at a given cut-off score) was calculated based upon these cut-off scores (Table 2). Diagnostic accuracy of the GNP and WIV (or combination of measures) was compared via Chi-Square analysis. Classification accuracy of each measure was compared using the Wilcoxon Signed Rank test. All statistical analyses were performed in R [37].
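To make the AUC and Youden Index steps concrete, the following is a minimal sketch rather than the authors' R pipeline. It computes an empirical AUC as the proportion of impaired/healthy pairs ranked correctly and scans observed scores for the cut-off that maximizes J = sensitivity + specificity − 1. It omits the logistic model, the 10-fold cross-validation, and DeLong's comparison; the function names and WIV values are hypothetical.

```python
def auc(pos, neg):
    """Empirical AUC: probability that a random impaired case scores higher
    than a random healthy case, counting ties as one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(pos, neg):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J,
    where a score >= cutoff is classified as impaired."""
    best = None
    for c in sorted(set(pos) | set(neg)):
        sens = sum(p >= c for p in pos) / len(pos)
        spec = sum(n < c for n in neg) / len(neg)
        j = sens + spec - 1.0
        if best is None or j > best[1]:
            best = (c, j, sens, spec)
    return best


# Hypothetical WIV values (higher = more uneven profile).
mci_wiv = [1.4, 1.1, 0.9, 1.6, 1.2]
hc_wiv = [0.6, 0.8, 1.0, 0.7, 0.9]

print(auc(mci_wiv, hc_wiv))
print(youden_cutoff(mci_wiv, hc_wiv))
```

For a measure where lower scores indicate impairment, such as GNP, the same functions can be applied to the negated scores.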
RESULTS
Within-individual variability and global neurocognitive performance in AD and MCI
WIV differed by diagnostic group, F(2, 810) = 258.42, p < 0.0001, but not by age (p = 0.68) or sex (p = 0.44) and there were no significant interactions (Fig. 1). As hypothesized, AD patients had higher WIV than MCI [t(305.31) = 8.76, p < 0.001; Cohen’s d = 0.80] or HC [t(238.46) = 21.58, p < 0.001; Cohen’s d = 2.18] and MCI had higher WIV than HC [t(610.16) = 20.25, p < 0.001; Cohen’s d = 1.54]. These effects remained significant in an age and gender matched subsample (see Supplementary Material). Exploratory group-specific analysis found that in HC, age F(1, 220) = 5.01, p = 0.02, was significantly related to WIV, while there were no effects of MMSE, sex, nor any interactions. In the MCI group, a lower MMSE score F(1, 391) = 24.18, p < 0.001 and lower education, F(1, 391) = 8.62, p = 0.004, were associated with higher WIV; age, sex, and the interactions were not significant. In AD, lower MMSE scores were associated with higher WIV, F(1, 391) = 10.05, p < 0.002; age, sex, and the interactions were not significant. Associations between MMSE and WIV are displayed in Fig. 2. As expected, GNP differed by diagnostic group, F(2, 801) = 519.42, p < 0.0001, by age F(1, 801) = 5.20, p < 0.03, but not sex (p = 0.07), and there were no significant interactions. As seen in Table 1, HC outperformed MCI, who in turn outperformed AD. Plots of performance by task are shown in the Supplementary Material.
The association between WIV and GNP was measured using Pearson correlation. Higher variability was associated with poorer GNP across diagnostic group (Pearson r = –0.78, p < 0.001). This relationship was more prominent in the two patient groups: AD (r = –0.67, p < 0.001) and MCI (r = –0.62, p < 0.001), but was also significant in the HC (r = –0.23, p < 0.001). In addition, variability at baseline was associated with variability at 12-month follow-up (r = 0.73, p < 0.001). Again, this association was strongest in the two patient groups: AD (r = 0.58, p < 0.001), MCI (r = 0.54, p < 0.001); however, HC (r = 0.37, p < 0.001) also showed a significant positive correlation.
ROC analysis of GNP and WIV in AD, MCI, and HC
The ROC analysis was used to evaluate the diagnostic accuracy of each measure (GNP and WIV) to discriminate AD and MCI from each other and from healthy cognitive subjects. Graphic representations of the ROC curves are provided in Fig. 3, and Table 2 shows clinically relevant cut-offs for the GNP, WIV, and combination of the two measures. The diagnostic accuracies of GNP and WIV were excellent for HC versus AD, with AUCs > 0.96. Diagnostic accuracies were lower, but still very good in the GNP (0.94) and WIV (0.89) for HC versus MCI. Diagnostic accuracies of both measures were the lowest when differentiating AD from MCI, yet the AUCs were still moderate to good: GNP (0.81) and WIV (0.72). A comparison of AUC between GNP and WIV is presented in the Supplementary Material.
Combining WIV and GNP improves diagnostic accuracy in MCI
Combining WIV with GNP significantly improved diagnostic accuracy as compared to using either measure alone for discriminating MCI from HC (Table 2/Fig. 4, green lines). Specifically, considering WIV in addition to GNP improved diagnostic accuracy when differentiating HC from MCI [Z = 3.32, p < 0.0001]. This was due to an increase in specificity (improvement in identifying HC). GNP alone was best for differentiating HC from AD and MCI from AD; WIV did not significantly increase diagnostic discrimination (AUC) in these two comparisons. However, using the combination of optimal and clinically relevant cut-off scores (a score below either cut-off score) of the GNP and/or WIV to classify individuals resulted in an increase in classification accuracy of 4% at the optimal cut-off for HC versus AD (V = 153, p < 0.01), 9% for HC and MCI (V = 1711, p < 0.001) and 11% for MCI and AD (V = 1378, p < 0.001). Cut-off scores for each measure are provided in Table 2.
WIV is higher in individuals that transition from MCI to AD
The majority of individuals (83%) had neuropsychological data available at 12-month follow-up (Table 1). Approximately 10% of individuals had a diagnostic change within 12 months, the vast majority transitioning from MCI to AD (85% or 58/68 subjects). Since the majority of diagnostic change was seen within the MCI group, follow-up statistical analyses were conducted only within the MCI→MCI and MCI→AD groups. The change in variability was calculated as WIV at 12 months minus WIV at baseline. The change in WIV is shown for all groups in Fig. 4. Age did not differ between the MCI→MCI and MCI→AD groups (p = 0.21). The MCI→AD group had more change in WIV over a 12-month period than the MCI→MCI group [F(1, 314) = 6.51, p = 0.01] (Fig. 4). On average, there was a 12% increase in variability in the MCI→AD group, while the MCI→MCI group had approximately a 6% increase over the same time period. In addition, the MCI→AD group had higher baseline WIV [t(77.88) = 3.20, p = 0.002] and higher WIV at 12 months [t(97.05) = 3.16, p = 0.001]. Importantly, HC→HC showed lower WIV at 12 months (∼3% reduction), while AD→AD patients showed, on average, a 17% increase in WIV. There were a small number of individuals (n = 7) with a change from MCI→HC; however, this group was too small to perform meaningful statistical analysis.
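The change score used here is a simple difference, sketched below. The percent change relative to baseline is included only as an assumption about how the reported percentage increases might be expressed; the text does not state the exact formula, and the input values are hypothetical.

```python
def wiv_change(baseline, followup):
    """Return (absolute change, percent change relative to baseline).

    Absolute change follows the stated definition: WIV at 12 months minus
    WIV at baseline. The percent form is an illustrative assumption.
    """
    delta = followup - baseline
    return delta, 100.0 * delta / baseline


# Hypothetical participant whose WIV rises from 1.00 to 1.12 SD units.
delta, pct = wiv_change(1.00, 1.12)
print(round(delta, 2), round(pct, 1))
```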
DISCUSSION
We report the utility of measuring neurocognitive WIV in individuals at risk for dementia. We found lower GNP and higher WIV in MCI and AD as compared to healthy older adults. Higher variability was associated with lower MMSE and poorer overall neurocognitive performance; however, this was more prominent in MCI and AD patients than in healthy individuals. Most importantly, we show that diagnostic change from MCI to AD corresponds with greater baseline WIV and a larger 12-month change in WIV. In addition, we show that WIV can add to the differentiation between diagnostic groups, particularly between healthy individuals and MCI.
The preponderance of the neurocognitive research in dementia emphasizes mean group differences in neurocognitive performance, which typically ignores within-individual variability. As we show, within-individual variability appears to be a useful tool to monitor individual differences in neurocognition, and aids in diagnostic differentiation of MCI from healthy individuals. WIV has not been thoroughly studied in AD and MCI, but is more common in aging and lifespan research. Our findings indicate that WIV is associated with general cognitive performance (i.e., MMSE), but not age; these findings parallel other large studies of aging [14, 15]. Yet, the specific aims of these two studies were not to directly compare healthy individuals to those with MCI or AD, although Lindenberger & Baltes (1997) provide exploratory findings in a small cohort of individuals with dementia. Another study, which measured the deviation of measures of cognition from crystalized intelligence in healthy aging, found age to be associated with higher variability [38], particularly when compared to healthy young adults. In addition, Rabbitt (1993) concludes that when neurocognitive function begins to decline in old age these abilities do not “all go together when they go” [38] (p. 385). More recently, a study of limited neuropsychological data from the ADNI indicates that computing variability within items or across tests provides a useful summary measure of performance in AD, MCI, and healthy cognitive aging [39], and we show similar results when measuring global WIV in the ADNI sample. More recent evidence from another sample suggests that high WIV is characteristic of patients with MCI [10], which we replicate. In addition, we show that WIV, in combination with GNP, may help to identify those individuals with a progressive pathological process earlier in the course of disease. 
Specifically, our measure of variability, which has been used before in aging research [7], examines intra-individual consistency of performance across neurocognitive domains. Our finding that AD and MCI groups were more varied in their performance than our HC group fits with prior reports in aging [21] and dementia [10]. For example, Reckess et al. [10] measured across-task WIV in 528 individuals, 395 with clinical symptoms. Their findings indicate that WIV increases with symptom severity in MCI, but shrinks in those with significant dementia. In conjunction with our findings, this indicates that WIV may be more sensitive in detecting subtle change in those people with subtle cognitive impairments (such as MCI), but is less meaningful in frank dementia, where mean neuropsychological performance is near floor and sufficient for detecting dementia-level impairment. We do not find lower WIV in advanced AD as compared to MCI; however, Reckess et al. [10] estimated variability using the standard deviation of performance for groups of individuals with specific MMSE scores, whereas we estimated variability over the entire performance spectrum.
Most importantly, WIV appears to provide a sensitive measure to detect small, but potentially meaningful change over short periods of time in MCI. We find MCI individuals that transition to clinical dementia show higher WIV at baseline and a larger increase in WIV over a 12-month period. Thus, higher WIV may reflect domain specific deterioration of cognitive performance. This inconsistency in performance suggests that relative declines in one area of cognition as compared to another are an important signal of overall deterioration of the neural system. This suggests that WIV provides researchers and clinicians a tool that is sensitive to subtle changes in the disease course that traditional approaches fail to detect. Likewise, our findings suggest that WIV may help reduce some of the heterogeneity in the definition of MCI; however, these measures need to be validated in larger samples, prospectively. Similar to changes in average performance, inconsistencies in variability are likely due to underlying changes in the neural architecture; however, this remains to be confirmed. Other studies [22, 40] suggest consistency, or lack thereof, may be a sensitive index of performance over time that is related, in part, to neurotransmitter function [19, 41] or white matter integrity [9]. While WIV provides an index of deterioration it does not identify the specific domain affected. Nonetheless, it is noteworthy that we were able to detect these subtle changes in WIV, at least within the relatively short time parameters of this study. Future studies should consider measuring the relationship between response slowing/variability at the task level and brain function (i.e., diffusion MRI) in dementia.
While our findings are intriguing, there are several limitations to consider. First, we use global scores for performance and variability and these are calculated across domain. Given the modest number of tests in each neurobehavioral domain, the global scores may have been biased by the psychometric characteristics of the test instruments used. Although age-related differences in GNP and WIV appear systematic, there are likely multiple factors that contribute to variability. Differences in WIV therefore may depend to a large extent on the specific constellation of abilities being measured; if so, its generalizability needs further scrutiny. Directly measuring WIV in neurocognitive performance provides a general view of neurocognitive ability. However, Cole et al. [40] argue that using a composite index of neurocognitive domains, such as WIV, provides a better index of consistency in neurocognitive ability. Furthermore, WIV can be advantageous in elucidating common underlying mechanisms of information processing that result in increased variability [42]. This approach may be more sensitive to detecting change over time by taking advantage of the variability within an individual to aid in determining individuals at-risk for dementia. Future longitudinal follow-up studies could expand upon the variability findings by elucidating the specific neurocognitive domains responsible for higher variability in MCI and AD and by examining variability across time within a specific test. The contribution of WIV to diagnostic classification was significant, but small. We acknowledge that there is only a small increase in the AUC of the ROC curve for HC versus MCI, however we believe the more important finding is the increase in classification accuracy at the proposed cut-offs since this translates to mean performance scores that could be used clinically. 
Given the heterogeneity in the MCI diagnosis, it is unlikely that a global measure of variability will suffice for detecting very subtle performance changes. Measuring variability on a trial-wise basis may further improve the capability of variability in classification. Future studies that implement computerized neurocognitive testing and record reaction time on a trial-by-trial basis should consider evaluating variability in MCI and AD. Moreover, our results support the potential utility of WIV as a dimension or metric related to cognitive decline. Initial evaluation of the construct validity of this new dimension suggests that WIV correlates negatively with average performance. The negative association between higher WIV and lower GNP data adds validity to our measure as the findings align with the substantial literature showing decreases in neurocognitive ability with increasing age and dementia [21]. A single measure of variability over a battery of tasks does not replace a thorough neuropsychological evaluation, as variability is common even in healthy adults [43]. However, measurement of within-person variability as a complementary measure to traditional mean performance metrics provides a generalizable index of neurocognitive performance that is informative and potentially useful in the study of dementia.
Despite being a major research focus in recent years [5], identifying individuals at risk for developing dementia using traditional neurocognitive assessments remains challenging. This is likely due, in part, to defining and diagnosing MCI (and more recently in individuals with subjective memory complaints) in terms of acquired impairment in neurocognitive domains specifically affected in AD [44]. While this approach ensures clinical continuity between MCI and AD, traditional tests tend to be insensitive to early, subtle deficits. Moreover, this approach assumes that early deficits in at-risk individuals are the result of higher order (cortico-cortical) neural disruptions; however, recent work suggests that lower level neural deficits may be more prominent in MCI [45]. We believe that early identification and monitoring of disease progression can be improved upon in MCI by increasing the specificity of neurocognitive testing by measuring variability in neurocognitive performance. The present study of within-individual variability provides evidence that capturing variability in neurocognitive performance is a useful index in MCI and dementia, and may be a beneficial screening tool. Specifically, increases in performance variability may index vulnerability and potential transition to dementia. Moreover, the use of a patient-centered metric, such as WIV, is ideal for monitoring subtle change in performance over extended periods of time, may reflect the unfolding of neurocognitive dysfunction, and may be associated with brain deterioration that would go undetected if only mean performance is considered.
Footnotes
ACKNOWLEDGMENTS
This work was supported by NIA AG10124, NIMH K01 MH102609 (Dr. Roalf), and the Institute of Aging and Alzheimer’s Disease Core Center Pilot Funding Program (Dr. Roalf).
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health. The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
