Abstract
Background:
Cognitive change in mild cognitive impairment (MCI), a likely prodrome to Alzheimer’s disease, can be tracked with repeated neuropsychological assessments, but there has been little work quantifying these changes over time. Cognitive change can be statistically examined using standardized regression-based (SRB) formulas, which yield a z-score indicating amount of change compared to a normative group.
Objective:
To use SRB z-scores to quantify cognitive change in a sample of patients classified as MCI at baseline, and to compare cognitive change in those who remained MCI on follow-up (MCI-Stable) and those who progressed to dementia (MCI-Decline).
Methods:
SRB z-scores were calculated for each test in a comprehensive neuropsychological battery for each of 283 MCI patients from a cognitive disorders clinic who were re-assessed after approximately one and one-half years.
Results:
There was a significant decline between timepoints across all cognitive tests, with the greatest amount of decline on tests of learning and memory. Group differences were seen on nearly all cognitive tests, with the MCI-Decline group showing more decline (i.e., significantly larger and negative z-scores) than the MCI-Stable participants. Notable cognitive decline was also observed in the MCI-Stable group, with z-scores ranging from –0.01 to –2.24 compared to normative data.
Conclusion:
This study highlights the amount of cognitive decline that occurs in MCI, including for those who remain “stable” and those who progress to dementia. It also demonstrates the value of the SRB method in more clearly quantifying cognitive decline, which may help identify individuals most vulnerable to MCI progression.
INTRODUCTION
Repeated neuropsychological assessments allow clinicians to track cognitive change in patients over time. This is particularly relevant for mild cognitive impairment (MCI), a condition known to remain stable across time or progress to dementia (or less commonly to revert to normal). For example, the change of performance on cognitive measures administered at two time points can be examined to evaluate for disease progression. Indeed, the American Academy of Clinical Neuropsychology recommends that neuropsychologists thoughtfully measure change between repeat assessments to inform clinical interpretations [1]. However, it should be noted that a significant decline on cognitive testing performance is not required for progression from MCI to dementia, as most diagnostic criteria indicate that a worsening of one’s ability to perform daily activities (e.g., driving, managing medications, completing household chores) is necessary to infer this worsening disease state.
In a study that examined change over five years in participants determined to be cognitively intact at baseline, Machulda and colleagues [2] noted that those who had developed MCI or dementia at follow-up were found to perform lower at baseline than those who remained cognitively intact. Individuals likely to progress also showed different trajectories over time in individual cognitive domains [2]. Other studies that have looked at progression across time have found similar changes in MCI, including declines in verbal memory, attention and working memory, language, and a lack of practice effects on cognitive measures [3–6]. However, most studies on longitudinal change in MCI have dichotomized the outcome (e.g., stability or decline) rather than consider a more quantitative determination of change.
There are multiple methods of statistically examining change, including the Reliable Change Index (RCI) and standardized regression-based (SRB) formulas, which each yield z-scores that can be interpreted as reflecting decline, stability, or improvement in performance between time points. The RCI accounts for practice effects by quantifying the difference between a patient’s two test scores compared to the change in some normative sample [7]. Conversely, simple SRB formulas utilize multiple regression to predict a Time 2 score from a Time 1 score, and the predicted and observed Time 2 scores are compared [8]. In both formulas, z-scores below –1.645 traditionally indicate “decline” between timepoints, z-scores between –1.645 and +1.645 indicate “stability,” and z-scores above +1.645 indicate “improvement” [9]. One advantage of the SRB method over the RCI is that the former controls for baseline level of cognitive functioning, which might be particularly important for MCI and dementia research, given that these individuals will have lower scores at baseline that may not exhibit much change between Time 1 and Time 2. Maassen and colleagues [10] presented a method for estimating SRB results based on normative data found in test manuals and empirical papers (e.g., means and standard deviations of scores at Time 1 and Time 2, test-retest correlations). Although this adaptation of the SRB method has made it more accessible to clinicians and researchers, there are relatively few studies that have used it to address clinically meaningful questions.
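The three-way interpretation of change z-scores described above can be sketched in a few lines. This is a minimal illustration, not code from the study: the function name `classify_change` is hypothetical, and treating scores exactly at the cutoff as change is an assumption.

```python
# Cutoffs for interpreting RCI or SRB z-scores, as described in the text.
DECLINE_CUTOFF = -1.645
IMPROVEMENT_CUTOFF = 1.645


def classify_change(z: float) -> str:
    """Label a change z-score as decline, stability, or improvement.

    Assumption: scores exactly at a cutoff are counted as change
    ("at or below" / "at or above").
    """
    if z <= DECLINE_CUTOFF:
        return "decline"
    if z >= IMPROVEMENT_CUTOFF:
        return "improvement"
    return "stability"


if __name__ == "__main__":
    for z in (-2.24, -0.01, 1.80):
        print(f"z = {z:+.2f}: {classify_change(z)}")
```

Keeping the cutoffs as named constants makes it easy to substitute a different confidence band if a clinic or study prefers one.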
The purpose of this study is twofold. First, since there remains a need for clinical research that examines the continuous trajectories of cognitive change in different patient groups [1], the current study aimed to characterize and quantify cognitive change across repeated comprehensive evaluations using the estimated SRB formulas of Maassen et al. [10] in a sample of patients classified as MCI at baseline. By using a standardized score to quantify change in cognitive performance, individuals can be compared to the normative sample, allowing us to better understand their change across time. Understanding the cognitive trajectories in MCI may also shed light on those at greater risk of progression who require earlier intervention. Furthermore, having a consistent outcome value (i.e., z-scores from SRBs) allows one to compare change in different cognitive domains. With regard to this aim, we predicted decline between timepoints across the sample as a whole. Second, the current study aimed to use SRB z-scores to objectively determine cognitive status at follow-up and compare cognitive change in those who clinically progressed to dementia and those who remained stable at MCI. We hypothesized that the subgroup that progressed to dementia would show more cognitive decline at follow-up than the subgroup who remained at MCI. It was expected that higher-order tasks, such as measures of language and executive functioning, would show more decline in the progressing subgroup when compared to more basic cognitive functions (e.g., attention, processing speed). On tests of learning and memory, although additional decline was predicted, it may be attenuated due to “floor effects” (i.e., such low scores at baseline that they cannot decline much at follow-up) in both groups.
METHODS
Participants
Two hundred eighty-three patients who presented to a cognitive disorders clinic were included in the current study. All met criteria for MCI at an initial (i.e., Time 1) neurological visit, met criteria for either MCI or dementia at a follow-up (i.e., Time 2) neurological visit, and had neuropsychological evaluations at both Times 1 and 2. An MCI diagnosis (at Times 1 or 2) was based on cognitive deficits in one or more domains and report of intact daily functioning (e.g., driving, managing finances, medications) as determined by a neurologist. A dementia diagnosis (at Times 1 or 2) was also based on neurologist determination of cognitive deficits in two or more domains and report of impaired daily functioning that was not better explained by another medical or psychiatric condition. As such, the main difference between MCI and dementia was that the latter group had impairments of daily functioning and the former group did not. Participants who remained classified as MCI at Time 2 were categorized as “MCI-Stable” (N = 156, 55.1%). Participants who met criteria for dementia at Time 2 were categorized as “MCI-Decline” (N = 127, 44.9%). Table 1 shows descriptive statistics of demographic information for the entire sample, as well as the two follow-up groups.
Table 1. Sample demographics
°M(SD) presented for this variable. MCI, mild cognitive impairment.
Procedures
All study procedures were approved by the University of Utah Institutional Review Board, which included the use of clinical data for research purposes. As noted above, all participants presented to a cognitive disorders clinic for concerns about cognitive problems. Following an initial visit (i.e., Time 1) with a neurologist and a neuropsychological evaluation, a neurologist diagnosed all participants with MCI (any subtype). During a follow-up visit (i.e., Time 2) with a neurologist and another neuropsychological evaluation, the neurologist diagnosed participants with either MCI (any subtype) or dementia (any etiology). Time 2 evaluations were completed approximately one and one-half years (M = 525.4 days, SD = 300.4, range = 169–2,443) after the Time 1 evaluations. It should be reiterated that the neurologist’s diagnoses at Times 1 and 2 were used. Even though these diagnoses may have been informed by the neuropsychological evaluation, they were also informed by the neurologist’s examination, report of the patient and a collateral source about the patient’s cognitive and daily functioning, and neuroimaging and other diagnostic tests. It was not standard clinical practice to formally calculate any reliable change metrics for repeat cases in this clinic. As such, there is expected to be minimal circularity between the diagnoses and the amount of cognitive change seen across the two neuropsychological evaluations.
Measures
Test manuals (and research papers) typically report test-retest data for limited subsets of the cognitive variables and standardization sample. As such, we were forced to utilize some raw scores and some demographically-corrected scores to calculate SRB formulas. This highlights the importance of making available comprehensive data that includes the necessary components for estimated SRB formulas (i.e., means and standard deviations of scores at T1 and T2, and the test-retest reliability coefficient; see Method), allowing clinicians and researchers to employ them more broadly.
Within each neuropsychological evaluation, the following cognitive tests were administered:

Animals, Vegetables, Fruits [11] (AVF; Category Fluency) is a timed semantic fluency test in which participants must say as many words as possible according to the specified category. The raw score for this test that was included in the analysis was the number of correct words produced across three 60-second trials, with higher scores indicating better performance. The SRB formula for this test was computed using 15-month test-retest reliability values from a Mayo Clinic Study of Aging cohort of 614 cognitively intact older adults, ages 65 to 85 at baseline [12].

Boston Naming Test [13] (BNT) and Boston Naming Test Second Edition [14] (BNT-2) are sequential versions of a confrontational naming test in which participants provide the names of 60 pictures. Both versions contain the same test stimuli. The version of the test administered in patient cases was dependent on date and test availability. The raw score included in analyses was the number of total correct responses, where higher scores indicate better performance. The SRB formula for this test was computed using test-retest reliability values from the BNT/BNT-2 re-administered after one year in a sample of 122 cognitively intact older adults, ages 57 to 85 at baseline [15].

Brief Visuospatial Memory Test – Revised [16] (BVMT-R) is a test of visual memory in which six figures are learned over three trials (Total Recall) and recalled following a 20–25-minute delay (Delayed Recall). Points for this test were awarded for accurate reproduction and placement of the six figures in each condition, with higher scores indicating better performance. This raw score was included in analyses, and test-retest reliability data for the SRB formula were obtained from 71 adults (M = 43.5 years) retested after an average of 55.6 days as reported in the BVMT-R Professional Manual [16].

Controlled Oral Word Association Test [17] (COWAT) from the Multilingual Aphasia Battery is a timed phonemic fluency test in which participants must provide as many words as possible according to the specified letter. The raw score included in analyses for this test was the number of correct words generated across three 60-second trials, with higher scores indicating better performance. Six-month test-retest reliability values from updated normative data using a sample of 360 cognitively intact adults, ages 16 to 70 at baseline, were used to generate the SRB formula [18].

Geriatric Depression Scale [19] (GDS) is a 30-item self-report questionnaire for older adults which assesses symptoms of depression, where higher scores indicate more depressive symptoms. Total raw score was included in analyses, and 7-day test-retest values were obtained from an empirical paper examining GDS reliability in a community sample of 30 older adults, ages 60 to 89 [20].

Hopkins Verbal Learning Test – Revised [21] (HVLT-R) is a test of verbal memory in which 12 words are learned over three trials (Total Recall) and recalled following a 20–25-minute delay (Delayed Recall). The raw score included in analyses was the total number of correct words recalled for each condition. Two-week test-retest data were obtained from two papers presenting the reliability of neuropsychological measures in samples of 40 cognitively intact adults, ages 57 to 82 at baseline [22, 23].

Trail Making Test [24] (TMT) consists of two parts. TMT Part A is a timed test to assess visual scanning and processing speed in which participants connect 24 randomly arranged numbers in sequence. TMT Part B is a timed test to assess executive functioning in which participants switch between connecting randomly arranged numbers and letters in sequence. For both conditions, the score was the time to complete, with lower scores indicating better performance. This raw score was included in analyses. The SRB formula for this test was computed using values from a paper presenting test-retest reliability of neuropsychological measures over 3–16 months in 384 cognitively intact adults, ages 15 to 83 [23].

Wechsler Memory Scale, Third Edition [25] (WMS-III) and Fourth Edition [26] (WMS-IV) are sequential versions of a measure of learning and memory from which two subtests were administered. The version administered was dependent on the date and test availability. Logical Memory I (LMI, older adult version, ages 65–90) is a test of episodic learning in which one story is learned over two trials and another story is learned over one trial. Logical Memory II (LMII, older adult version, ages 65–90) is a test of episodic memory in which the previously presented stories are recalled following a 20–25-minute delay. The score for this test was the number of correct details from each story recalled, with higher scores indicating better performance. The age-corrected scaled score was included in analyses, and test-retest reliability values from 71 older adults, ages 65–90, as reported in the WMS-III and WMS-IV Technical and Interpretive Manuals were used to compute the SRB formula [26]. The retest interval ranged from 14 to 84 days (M = 23 days).

Wechsler Test of Adult Reading [27] (WTAR) and the Test of Premorbid Functioning [28] (ToPF) are measures of premorbid intellect requiring participants to pronounce irregularly spelled words. The score for this test was the number of correctly pronounced words, with higher scores indicating better performance. The demographically-corrected standard score was included in analyses, and 2–12 week (M = 35 days) test-retest reliability values from 71 older adults, ages 75 to 89, as reported in the WTAR/ToPF manuals were used [27, 28].

Wechsler Adult Intelligence Scale – Third Edition [29] (WAIS-III) and Fourth Edition [30] (WAIS-IV) are sequential versions of a test of intelligence for use with individuals 16–90 years old. The version of the five subtests administered was dependent on date of administration and test availability. Block Design is a timed test of constructional ability in which participants must reproduce abstract designs using blocks. The score for this test was the number of correct designs, plus an added time bonus for faster time to complete. Similarities is a test of verbal reasoning in which participants describe how pairs of words are similar. The score for this test was the total number of correct items, with an added bonus for responses using greater abstraction. Digit Span is a test of simple attention in which participants repeat series of numbers in forward, backward, and sequential order. The score was the total of correct number strings across the three conditions. Matrix Reasoning is a test of nonverbal reasoning in which participants identify logical sequences of abstract designs. The score was the number of correct responses. Digit Symbol/Coding are tests of complex attention and processing speed in which participants must match and copy digits or symbols. The number of correctly copied digits or symbols was the score for this test. For each WAIS-III and WAIS-IV subtest, the age-adjusted scaled score was used in analyses. Test-retest reliability over 8 to 82 days (M = 22 days) was obtained from 298 adults with an average age of 52.6, as reported in the WAIS-IV Technical and Interpretive manual [29, 30].
Although different versions with updated normative datasets were administered across the patient sample for some measures (e.g., WAIS-III and WAIS-IV), most test stimuli and protocols were identical. Across this large cohort, it is not expected that these different versions would notably change current results, especially given that all SRB comparisons were made within individuals.
Statistical procedures
Less than 5% of total cases in the dataset were missing at random. Nonetheless, multiple imputation was used to create and analyze five imputed datasets using the SPSS Missing Values module. Pooled parameter estimates were included in analyses. For parameters for which the module did not provide a pooled estimate, the parameter estimates from the five imputed datasets were averaged to create a pooled estimate.
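The fallback pooling step described above (averaging a parameter across the five imputed datasets) can be illustrated as follows. This is only a sketch: the function name and the five values are hypothetical, and it covers the pooled point estimate only, not the pooled standard error.

```python
from statistics import mean


def pool_by_averaging(estimates: list[float]) -> float:
    """Pool one parameter across imputed datasets by simple averaging."""
    if not estimates:
        raise ValueError("at least one imputed estimate is required")
    return mean(estimates)


# Hypothetical estimates of the same parameter from five imputed datasets.
imputed_estimates = [-1.52, -1.48, -1.55, -1.50, -1.45]
pooled = pool_by_averaging(imputed_estimates)

if __name__ == "__main__":
    print(f"pooled estimate: {pooled:.2f}")
```

Averaging the per-dataset estimates is also how the pooled point estimate is defined under Rubin's rules, so this fallback agrees with the usual pooled value for the estimate itself.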
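To quantify the amount of cognitive change that occurred between Time 1 and Time 2, simple SRB change formulae [8] were calculated to obtain an estimate of change between assessments, correcting for practice effects, test-retest reliability, regression to the mean, and a patient’s baseline score. Traditional SRBs are calculated as $\mathrm{SRB} = (T_2 - T_2')/\mathrm{SEE}$, where $T_2$ is the observed Time 2 score, $T_2'$ is the predicted Time 2 score, and SEE is the standard error of the estimate. In the current study, these components were estimated from normative test-retest data.

Specifically, Maassen et al.’s SRB model [10] was estimated through the following procedure:

1. The estimated beta weight (b) was calculated by dividing the standard deviation of the control group at Time 2 ($S_2$) by the standard deviation of the control group at Time 1 ($S_1$), as in $b = S_2/S_1$.
2. The estimated constant (c) was calculated by finding the product of b and the mean of the control group at Time 1 ($M_1$), and subtracting that value from the mean of the control group at Time 2 ($M_2$), as in $c = M_2 - bM_1$.
3. The estimated standard error of the estimate (SEE) was calculated by:
   a. Finding the sum of $S_1$ squared and $S_2$ squared, as in $S_1^2 + S_2^2$.
   b. Finding one minus the control group’s test-retest correlation ($r_{12}$), as in $1 - r_{12}$.
   c. Multiplying the results of steps a and b, as in $(S_1^2 + S_2^2)(1 - r_{12})$.
   d. Taking the square root of the resulting product, as in $\mathrm{SEE} = \sqrt{(S_1^2 + S_2^2)(1 - r_{12})}$.
4. The estimated predicted Time 2 score ($T_2'$) was calculated by multiplying the participant’s observed Time 1 score ($T_1$) by b and adding c, as in $T_2' = bT_1 + c$.
5. The estimated SRB z-score was calculated by subtracting the predicted Time 2 score from the observed Time 2 score and dividing the difference by the SEE, as in $z = (T_2 - T_2')/\mathrm{SEE}$.

This procedure was repeated for each participant for each test in the battery.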
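To make the estimated SRB calculation concrete, the sketch below works through it for a single test and a single patient. It assumes the standard form of the Maassen et al. estimate (slope from the ratio of normative standard deviations, SEE from the pooled variances and test-retest correlation). The class name, function name, and all normative values are hypothetical and are not taken from the study.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NormativeRetest:
    """Test-retest summary statistics from a normative (control) sample."""
    m1: float   # mean at Time 1
    s1: float   # standard deviation at Time 1
    m2: float   # mean at Time 2
    s2: float   # standard deviation at Time 2
    r12: float  # test-retest correlation


def estimated_srb_z(t1: float, t2: float, norms: NormativeRetest) -> float:
    """Return the estimated SRB z-score for observed scores t1 and t2."""
    b = norms.s2 / norms.s1                    # estimated beta weight
    c = norms.m2 - b * norms.m1                # estimated constant
    see = math.sqrt((norms.s1 ** 2 + norms.s2 ** 2) * (1 - norms.r12))
    predicted_t2 = b * t1 + c                  # predicted Time 2 score
    return (t2 - predicted_t2) / see


# Hypothetical normative values for illustration only.
example_norms = NormativeRetest(m1=25.0, s1=5.0, m2=27.0, s2=5.0, r12=0.75)
example_z = estimated_srb_z(t1=20.0, t2=17.0, norms=example_norms)

if __name__ == "__main__":
    print(f"estimated SRB z = {example_z:.2f}")
```

With these illustrative values, b = 1 and c = 2, so a patient scoring 20 at Time 1 is predicted to score 22 at Time 2 because of the normative practice effect. An observed score of 17 therefore falls 5 points below prediction, which is about 1.41 SEE units below expectation.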
A multivariate ANOVA (MANOVA) was used to address Aims 1 and 2 in order to account for multiple dependent variables in the comparison between collective z-scores at each timepoint. First, to examine cognitive change using SRBs in the total sample, the model intercept was evaluated to determine whether the mean z-score of the total sample statistically differed from 0, indicating a significant change in cognitive performance between time points. Second, to examine cognitive change in the two subgroups, the MANOVA was examined for comparisons between the MCI-Stable and MCI-Decline subgroups on their estimated SRB z-scores. In each MANOVA, the 17 cognitive and mood SRB z-scores were the dependent variables. Follow-up univariate ANOVAs were conducted for each dependent variable, signifying change in individual measures. An alpha level of p < 0.05 was used to determine statistical significance in all analyses.
RESULTS
Demographic data on the entire sample and the two groups are presented in Table 1. There were no significant differences in demographic variables between the MCI-Stable and MCI-Decline groups. Cognitive and mood scores are presented in Table 2, as well as correlation coefficients between scores at the two timepoints.
Table 2. Cognitive and mood scores
Raw scores presented above, except those marked with ∧ and #, where ∧ indicates scaled score and # indicates standard score. Availability of normative test-retest data determined the type of score used. Statistically significant (i.e., p < 0.05) correlation coefficients are denoted with *. WTAR, Wechsler Test of Adult Reading; ToPF, Test of Premorbid Functioning; WAIS-III/IV, Wechsler Adult Intelligence Scale – Third Edition/Fourth Edition; TMT, Trail Making Test; COWAT, Controlled Oral Word Association Test; AVF, Animals, Vegetables, Fruits; BNT/BNT-2, Boston Naming Test /Second Edition; HVLT-R, Hopkins Verbal Learning Test – Revised; BVMT-R, Brief Visuospatial Memory Test – Revised; WMS-III/IV, Wechsler Memory Scale, Third Edition/Fourth Edition; GDS, Geriatric Depression Scale.
Cognitive change in the total sample
Results of the overall MANOVA examining change in the total sample showed a significant difference between timepoints on the collective estimated SRB z-scores (i.e., statistical significance of the intercept; Pillai’s Trace = 0.87, F(18, 264) = 95.19, p < 0.001,
Table 3. Cognitive and mood SRB z-scores derived from Maassen et al. (2006) [10]
Z-scores presented above. ¶ denotes statistically significant change between timepoints in the total sample. † denotes tests for which z-scores between groups significantly differed (i.e., p < 0.05) at follow-up. Z-score signs were flipped for Trails A and B and the GDS (denoted by ‡) such that negative scores reflect decline and positive scores reflect improvement. SRB, standardized regression-based score; MCI, mild cognitive impairment; WTAR, Wechsler Test of Adult Reading; ToPF, Test of Premorbid Functioning; WAIS-III/IV, Wechsler Adult Intelligence Scale – Third Edition/Fourth Edition; TMT, Trail Making Test; COWAT, Controlled Oral Word Association Test; AVF, Animals, Vegetables, Fruits; BNT/BNT-2, Boston Naming Test/Second Edition; HVLT-R, Hopkins Verbal Learning Test – Revised; BVMT-R, Brief Visuospatial Memory Test – Revised; WMS-III/IV, Wechsler Memory Scale, Third Edition/Fourth Edition; GDS, Geriatric Depression Scale.
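The sign convention noted for Table 3 can be sketched as a small helper. On Trails A and B and the GDS, a higher raw score (longer completion time, more depressive symptoms) is worse, so the z-score is negated to keep "negative means decline" consistent across the battery. The function name and test labels below are hypothetical.

```python
# Tests on which a higher raw score reflects worse performance or more symptoms.
LOWER_IS_BETTER = {"TMT Part A", "TMT Part B", "GDS"}


def orient_z(test_name: str, z: float) -> float:
    """Flip the sign of z for lower-is-better tests so that negative values
    always reflect decline and positive values reflect improvement."""
    return -z if test_name in LOWER_IS_BETTER else z


if __name__ == "__main__":
    # A patient who took longer than predicted on TMT Part B has a positive
    # unflipped z-score, which is reported as a negative (decline) value.
    print(orient_z("TMT Part B", 0.90))
    print(orient_z("HVLT-R Total Recall", -2.00))
```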
Cognitive change in the two subgroups
Results of the MANOVA tests of between-subjects effects showed a significant difference between the MCI-Stable and MCI-Decline groups on the collective estimated SRB z-scores (Pillai’s Trace = 0.24, F(18, 264) = 4.33, p < 0.001,
Follow-up ANOVAs were used to examine each cognitive test individually. There were significant differences between the MCI-Stable and MCI-Decline group on the following cognitive tests: WTAR/ToPF (F = 4.53, p < 0.05
DISCUSSION
The current study examined cognitive change across one and one-half years in patients classified as MCI at baseline using Maassen and colleagues’ estimated SRB method [10]. Such fine-grained analysis of cognitive change within individuals can clarify diagnoses and prognoses in both clinical and research settings. Results for the entire sample demonstrated decline in all cognitive domains across the retest interval, with the greatest amount of decline occurring on tests of learning and memory. Interestingly, there was relatively more decline on the immediate recall trials than the delayed recall trials (e.g., HVLT-R and BVMT-R Total Recall = –2.56 and –2.16, respectively; Delayed Recall = –1.77 and –1.69, respectively). The greater range of possible scores in immediate recall trials and potential “floor effects” on the delay trials (i.e., scores so low at baseline that it is difficult to get worse on follow-up) may explain this finding. Compared to the decline on the memory tests, the SRB z-scores for other cognitive tests were more modest. Declines on tests of attention and processing speed were the next most notable series of changes in this clinical cohort (e.g., SRB z-scores ranging from –0.65 to –1.49). Declines on tests of language, construction, and executive functioning were quite similar. Notably, on a test of estimated premorbid functioning, a small decline was seen in the entire sample, even though such cognitive abilities are thought to be generally resistant to cognitive decline [31]. Overall, these results are consistent with other studies that have tracked individuals with MCI across time [2–5]; however, few studies have quantified change in this way.
The current study also sought to quantify the amount of cognitive change in those who remained stable over time (i.e., retained their MCI diagnosis at a follow-up visit) compared to those who progressed (i.e., were diagnosed with dementia at a follow-up visit). Consistent with expectations and the results of the entire sample, both groups demonstrated decline in their cognition across the retest interval. Despite this, the MCI-Stable group showed less decline at follow-up on all tests of attention and processing speed, naming and semantic fluency, non-verbal reasoning and set-shifting, and all memory measures. The course of MCI is traditionally viewed dichotomously [32], as stability or progression to dementia (or sometimes trichotomously, with some individuals reverting to normal [33, 34]). Admittedly, the diagnoses at Times 1 and 2 made by the neurologists in our clinic also reflected this view. However, such dichotomous outcomes are likely missing important continuous changes that occur within patients. For example, in the current study, those classified as MCI-Stable had mean negative SRB z-scores on all cognitive scores. Although these changes may reflect progression within MCI (e.g., single-domain MCI progressing to multi-domain MCI [35]), they still can be captured and tracked across time. The SRB method allows one to quantify that change and potentially identify those in need of closer follow-ups and intervention.
Even though both groups cognitively declined across time, those who progressed to dementia showed more decline than those who retained their MCI diagnosis. Within the reliable change literature, ±1.645 is typically used as a demarcation point of change, with scores at or below –1.645 indicating decline and those at or above +1.645 indicating improvement [8, 9]. As seen in Table 3, the MCI-Stable group had three mean cognitive scores that fell at or below –1.645 compared to the normative data: HVLT-R Total Recall, HVLT-R Delayed Recall, and BVMT-R Total Recall. Those in the MCI-Decline group showed six scores at or below –1.645, including on tests of learning and memory, processing speed, and set shifting. Furthermore, since these SRBs control for baseline level of scores, this cannot be viewed as solely due to floor effects. Regardless, this greater range and depth of decline since Time 1, especially in these domains, may explain why functional impairment has developed in the MCI-Decline patients, leading to a dementia diagnosis. For example, decline in memory contributes to the inability to complete daily tasks such as medication and financial management. Although greater executive decline in the MCI-Decline group was not observed in all executive measures of the battery, significant decline in set shifting may have important implications for functional decline, for instance, for driving ability.
These z-scores computed using information from the normative sample data may help clarify the nature of patients’ decline over time that might be missed when not considering “normal change” on cognitive tests over time. In some measures, the “raw” scores suggested no/minimal decline, but the SRB z-scores indicated a clearer downward trajectory for these patients. Therefore, the estimated SRB method more sensitively identified and accurately quantified this change. This method also allows one to identify specific individuals with more notable decline. For example, even though the mean SRB z-score for both groups was above –1.645 for WAIS-III/-IV Matrix Reasoning, 6% of those classified as MCI-Stable showed z-scores below that cutoff, and 17% of the MCI-Decline group had z-scores below that cutoff. By identifying these individuals with specific types of cognitive decline, specific types of interventions may be offered to these individuals (e.g., personalized medicine).
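The individual-level use described above amounts to counting how many patients in a group fall beyond the decline cutoff on a given test. A minimal sketch follows; the function name and the z-scores are hypothetical, and scores exactly at the cutoff are counted as decline by assumption.

```python
def proportion_declined(z_scores: list[float], cutoff: float = -1.645) -> float:
    """Return the share of patients whose SRB z-score is at or below cutoff."""
    if not z_scores:
        raise ValueError("z_scores must not be empty")
    return sum(z <= cutoff for z in z_scores) / len(z_scores)


# Hypothetical SRB z-scores for ten patients on a single test.
example_scores = [-0.3, -2.1, 0.4, -1.0, -1.7, 0.9, -0.6, -1.2, 0.1, -0.8]
example_share = proportion_declined(example_scores)

if __name__ == "__main__":
    print(f"{example_share:.0%} of patients showed reliable decline")
```

In this illustrative group the mean z-score is well above the cutoff, yet two of the ten patients still meet the criterion for decline, which is the kind of pattern a group mean alone would hide.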
The current findings may also assist with understanding the cause of decline. Although the sample examined in the current study included multiple MCI subtypes and likely multiple etiologies, the majority of the sample was classified as having amnestic MCI (single or multi-domain) at Time 1. Therefore, it is likely that a large proportion of the sample had some Alzheimer’s disease (AD) pathology. As such, the greater decreases in semantic (but not phonemic) fluency and naming in the MCI-Decline group would be expected, as they are typically seen in patients with early AD [36, 37]. Similarly, despite quite poor scores on measures of learning and memory at baseline in both groups, the MCI-Decline group showed greater levels of decline on all memory measures than their stable counterparts. Such preferential decline in memory would also be consistent with AD [38]. Whereas previous research has indicated that attention and verbal reasoning scores do not seem to differentiate groups of MCI and AD [39, 40], our results were mixed, with significantly greater decline in the MCI-Decline group on all measures of attention and processing speed, but levels of decline comparable to the MCI-Stable group on a verbal reasoning test. Our observed differences between stable and progressing MCI in this study suggest that these changes in measures over time might be more sensitive to distinguishing groups than previously appreciated.
One other observation worth mentioning concerns the degree of variability in these change scores. Even though mean SRB z-scores for all neuropsychological scores in both groups were negative, the standard deviations of these z-scores were quite large. For example, as is seen in Table 3, the mean SRB z-score for the MCI-Stable group on TMT Part B was –0.90 (and its z-score was reversed so that negative values indicated decline), but its standard deviation was 2.73. Therefore, even though the “average” patient who retained their MCI diagnosis at Time 2 declined on this measure compared to the normative data, it is clear that not all of them did. With such a wide standard deviation, it is reasonable to suspect that a sizeable minority actually improved on this test more than the normative group (e.g., –0.90 + 2.73 = +1.83). As alluded to earlier, SRBs can also be used to identify individual patients who show atypical patterns of change, both negative and positive. These “change outliers” may provide valuable insights for both the progression and resiliency in the face of disease.
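As a rough illustration of how wide this spread is, the sketch below estimates what share of a group with a mean z-score of –0.90 and a standard deviation of 2.73 would fall beyond each ±1.645 cutoff. It assumes the z-scores are approximately normally distributed, which the study does not report, so the output is illustrative rather than a finding.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Mean and standard deviation reported for the MCI-Stable group on TMT Part B.
MEAN_Z = -0.90
SD_Z = 2.73

# Assumption: z-scores in the group are approximately normally distributed.
share_improved = 1.0 - normal_cdf(1.645, MEAN_Z, SD_Z)
share_declined = normal_cdf(-1.645, MEAN_Z, SD_Z)

if __name__ == "__main__":
    print(f"expected share at or above +1.645: {share_improved:.1%}")
    print(f"expected share at or below -1.645: {share_declined:.1%}")
```

Under that assumption, roughly 18% of the group would exceed the improvement cutoff and roughly 39% would fall below the decline cutoff, so the negative group mean coexists with a sizeable minority of apparent improvers.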
Although the findings of this study strongly support the use of SRB z-scores in repeated neuropsychological evaluations, it is critical for the clinician to understand the psychometric features of the SRB method that limit its use in particular situations. One complexity to consider is the interpretation of “floor” scores at both timepoints, which is a common case in individuals with MCI. For example, scores at floor level at both timepoints yield misleading z-scores that may underestimate change in certain measures, such as HVLT-R Delayed Recall and TMT-B. Interpreting SRB z-scores may be impractical in the case of very low test scores where meaningful change cannot be captured. Additionally, because different normative data may be used for different measures, as was the case in the current study, one should be cautious when making comparisons of change across tests. Taken together, a multimodal approach that includes quantitative and qualitative comparison of test performance is best suited for understanding cognitive decline in individual patients. A change score calculator (i.e., Microsoft Excel spreadsheet) that includes the data used to generate these SRB z-scores is available upon request from the corresponding author for use in individual clinical cases.
This study is not without limitations. First, the selection of the normative data for the estimated SRBs raises some important issues. In choosing normative data, we wanted to use data that most clinicians and researchers would also have easy access to. As such, we primarily relied on test-retest data from test manuals. However, retest data in test manuals frequently differ from retest data in clinical settings. For example, while the patient sample was re-assessed after about one- and one-half years, retest intervals for the normative data used to compute z-scores were much shorter. Critically, several studies show that retest interval accounts for little to no variance beyond Time 1 scores in SRB models predicting Time 2 performance [8, 41–45]. Some studies show this is also the case for age, such that the age of participants contributes very little, or not at all, to performance change over time [42, 46–51]. However, other research has found age to be a strong predictor of longitudinal performance [52–54]. Some of the normative samples used for comparison in this study were considerably younger than the current sample. Thus, some extrapolation to the study sample was required, which may have skewed the quantification of cognitive decline. Second, the Maassen et al. formula [10] is limited in that it predicts the Time 2 score from the Time 1 score using normative data, without consideration of other important factors such as retest interval or demographic information. Even though the Time 1 score is consistently the best predictor of the Time 2 score, and demographics and retest interval explain relatively little additional variance in the prediction of change across time [41–48], the expansion of SRB methods to include these factors may improve the interpretation of change in performance across time.
Third, although diagnoses of MCI and dementia at Times 1 and 2 were made by a neurologist (and not the neuropsychologist), some circularity could have been present, as the neurologist had access to the report of the neuropsychological evaluation when making the diagnoses. However, it seems unlikely that the neurologist calculated estimated SRBs to make diagnoses, and it was not standard practice for the neuropsychologist to include this information in their report. Fourth, as the current sample came from a cognitive disorders clinic at an academic medical center, the findings may not fully generalize to the larger population of patients with MCI. Our cohort had the opportunity to be seen for two neurological and neuropsychological visits over approximately one- and one-half years, which suggests adequate healthcare resources. As such, less economically-advantaged patients with MCI may show different rates of change. Additionally, our sample was almost exclusively Caucasian, so an examination of the rates of cognitive change in more racially- and ethnically-diverse cohorts is needed [49]. Despite these limitations, this study demonstrated the value of the SRB method in more clearly capturing and quantifying cognitive decline over time, and highlighted that patients classified as diagnostically stable may also show notable cognitive decline. These results appear to augment existing studies in research cohorts [50, 51] in our understanding of cognitive change in MCI.
Footnotes
ACKNOWLEDGMENTS
The authors have no acknowledgements to report.
FUNDING
The authors have no funding to report.
CONFLICT OF INTEREST
The authors have no conflict of interest to report.
DATA AVAILABILITY
The data supporting the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.
