Short-Term Practice Effects on Cognitive Tests Across the Late Life Cognitive Spectrum and How They Compare to Biomarkers of Alzheimer’s Disease

Abstract

Background:

Practice effects on cognitive testing in mild cognitive impairment (MCI) and Alzheimer’s disease (AD) remain understudied, especially with respect to how they compare to biomarkers of AD.

Objective:

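The current study sought to add to this growing literature by examining short-term practice effects across the late life cognitive spectrum and comparing them to biomarkers of AD.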

Methods:

Cognitively intact older adults (n = 68), those with amnestic MCI (n = 52), and those with mild AD (n = 45) completed a brief battery of cognitive tests at baseline and again after one week, and they also completed a baseline amyloid PET scan, a baseline MRI, and a baseline blood draw to obtain APOE ɛ4 status.

Results:

The intact participants showed significantly larger baseline cognitive scores and practice effects than the other two groups on overall composite measures. Those with MCI showed significantly larger baseline scores and practice effects than AD participants on the composite. For amyloid deposition, the intact participants had significantly less tracer uptake, whereas MCI and AD participants were comparable. For total hippocampal volumes, all three groups were significantly different in the expected direction (intact > MCI > AD). For APOE ɛ4, the intact participants had significantly fewer copies of ɛ4 than the MCI and AD participants. The effect sizes of the baseline cognitive scores and practice effects were comparable, and they were significantly larger than effect sizes of biomarkers in 7 of the 9 comparisons.

Conclusion:

Baseline cognition and short-term practice effects appear to be sensitive markers in late life cognitive disorders, as they separated groups better than commonly-used biomarkers in AD. Further development of baseline cognition and short-term practice effects as tools for clinical diagnosis, prognostic indication, and enrichment of clinical trials seems warranted.

Keywords

Alzheimer’s disease, amyloid, biomarkers, brain imaging, effect sizes, mild cognitive impairment, neuropsychological testing, practice effects

INTRODUCTION

Practice effects on repeated cognitive testing have recently received considerable interest as they relate to mild cognitive impairment (MCI) and Alzheimer’s disease (AD). For example, in a review of this literature, Jutten et al. [1] identified 27 studies that have looked at practice effects in late life cognitive conditions. These authors reported that smaller practice effects were consistently associated with worse diagnostic status (e.g., MCI, dementia) and/or future cognitive decline. Additionally, smaller practice effects were associated with AD risk factors (e.g., family history, brain pathology) and neurodegeneration biomarkers (e.g., amyloid positivity, APOE ɛ4). Other empirical work has shown that practice effects can mask the severity of cognitive deficits and thereby delay the diagnosis of a cognitive disorder [2, 3]. Similarly, Jacobs et al. [4] warned about the potential negative impact on clinical trials in AD if practice effects are not considered.

However, the existing studies on practice effects in AD fail to provide a clear picture of the complexities of this topic. For example, in the 27 studies reviewed by Jutten et al. [1], only one study examined the spectrum of late life cognition (i.e., intact, MCI, AD), five examined only two of these groups (e.g., intact versus MCI), and most only considered one group (e.g., intact individuals). Without the inclusion of a fuller range of diagnostic severity and comparisons between these various phases, our view of these artificial improvements due to repeated exposure to test materials is incomplete.

Additionally, much of what we know about practice effects in AD comes from clinical databases and longitudinal research projects that tend to retest individuals over relatively long periods of time (e.g., 1–2 years of follow-up). In the 27 studies reviewed by Jutten et al. [1], about half used brief retest intervals (e.g., same day to two weeks). The remaining studies used notably longer retest intervals of 6 weeks to 4+ years. Additionally, the studies using brief intervals averaged about 75 participants per study, whereas those with longer retest intervals had more than six times that number of participants per study. With the majority of the data coming from studies with longer retest intervals, our understanding of practice effects may be skewed because other factors, including normal aging, disease progression, interventions, and retention biases, may be altering the magnitude and direction of cognitive changes. Shorter intervals appear less confounded and therefore allow a purer study of practice effects.

Finally, few studies have looked at how practice effects compare to the various biomarkers of AD. Again, in the 27 studies reviewed by Jutten et al. [1], about half considered biomarkers, with APOE being the most common and only three studies utilizing more than one biomarker. This review also reported that practice effects were associated with biomarkers in 65% of the published papers that examined them. Clearly, more information is needed about how practice effects compare to biomarkers in AD and related dementias.

Therefore, the current study sought to address some of these gaps in the literature by examining short-term practice effects, collected across one week, in older adults who were either cognitively intact or impaired, including amnestic MCI and mild AD. Baseline cognitive scores and biomarker data (e.g., brain amyloid deposition, hippocampal volumes, APOE status) were also available for participants. For baseline cognitive scores and practice effects, a gradient was expected based on cognitive status, with the cognitively intact individuals showing the largest baseline scores/practice effects, those with AD showing the smallest baseline scores/practice effects, and the individuals with MCI falling between the other two groups. The same gradient was expected for the biomarkers, such that the intact group would show the lowest rates of biomarker signal, followed by the MCI participants, with those with AD having the highest rates of biomarker signal.

MATERIALS AND METHODS

Participants

One hundred sixty-five older adults were recruited from a cognitive disorders clinic (53.6%) or through the community (46.4%) to participate in a larger study of brain imaging and neuropsychological testing across the dementia spectrum. Their mean age was 74.1 (SD = 6.0) years and their mean education was 16.0 (SD = 2.5) years. Most were Caucasian (98.2%) and 56.6% were female. Mean premorbid intellectual functioning, as measured by the Reading subtest of the Wide Range Achievement Test–4 (WRAT-4) [5], was in the average range (M = 110.0, SD = 8.6), and self-rated depressive symptoms were minimal on the 15-item Geriatric Depression Scale (GDS, M = 1.2, SD = 1.3) [6].

Participants from the cognitive disorders clinic were recruited with a clinical diagnosis of either amnestic MCI (single or multi-domain) or AD based on a neurological visit, neuropsychological evaluation, and brain imaging. Participants from the community were largely recruited as cognitively intact controls; however, a minority of amnestic MCI and AD cases (13% and 12%, respectively) were identified in the community. Confirmation of group assignment was made with the Alzheimer’s Disease Neuroimaging Initiative classification battery [7], which included the Mini-Mental State Examination [8], the Clinical Dementia Rating Scale [9], and the Wechsler Memory Scale–Revised [10] Logical Memory II Paragraph A. Based on these criteria, 68 participants were classified as cognitively intact, 52 as amnestic MCI (single or multi-domain), and 45 as mild AD.

Participants were included if they were 65 years of age or older and had a knowledgeable collateral source available to comment on their cognition and daily functioning. Participants were excluded for medical comorbidities likely to affect cognition (including neurological conditions, current severe depression, substance abuse, and major psychiatric conditions), the inability to complete MRI or PET, the inability to complete cognitive assessments due to inadequate vision, hearing, or manual dexterity, and enrollment in a clinical drug trial related to anti-amyloid agents. Additional exclusion criteria included elevated depression as indicated by a score of greater than 5 on the 15-item Geriatric Depression Scale, and moderate or severe dementia as indicated by a Clinical Dementia Rating score of 2 or greater or a Mini-Mental State Examination score of less than 20.

Procedure

Procedures were approved by the local Institutional Review Board before participants enrolled. Following informed consent/assent, participants underwent testing with the Alzheimer’s Disease Neuroimaging Initiative battery and other neuropsychological testing at a baseline visit. They returned about a week later (M = 6.9 days, SD = 0.7) for repeat neuropsychological testing and an MRI of the brain. Within about a month (M = 36.4 days, SD = 49.0), they returned to receive an amyloid PET scan of the brain using ¹⁸F-Flutemetamol and a blood draw to determine APOE ɛ4 status.

Neuropsychological measures

The following neuropsychological tests were administered at a baseline visit and again after approximately one week. To maximize practice effects, alternate forms of these tests were not used. Additionally, these test results were neither part of the clinical evaluation of individuals nor of the research classification that confirmed group assignment (i.e., they were independent of any diagnosis).

Brief Visuospatial Memory Test–Revised (BVMT-R) [11] is a visual memory task with six designs in six locations on a card learned over three trials, with correct designs and locations summed for the Total Recall score (range = 0–36). The Delayed Recall score is the number of correct designs and locations recalled after a 20–25-min delay (range = 0–12). For all BVMT-R scores, higher values indicate better performance.

Hopkins Verbal Learning Test–Revised (HVLT-R) [12] is a verbal memory task with 12 words learned over three trials, with the correct words summed for the Total Recall score (range = 0–36). The Delayed Recall score is the number of correct words recalled after a 20–25-min delay (range = 0–12). For all HVLT-R scores, higher values indicate better performance.

Symbol Digit Modalities Test (SDMT) [13] is a divided attention and psychomotor speed task, with the number of correct responses in 90 s being the total score (range = 0–110), and higher values indicate better performance.

Trail Making Test Parts A (TMT-A) and B (TMT-B) [14] are tests of visual scanning/processing speed and set shifting/complex mental flexibility, respectively. For each part, the score is the time to complete the task (range = 0–180 s for TMT-A, and range = 0–300 s for TMT-B), and higher values indicate poorer performance.

Two additional measures were administered at the baseline visit to better characterize the individuals in this study.

GDS–15-item version [6] is a self-report scale of depressive symptoms experienced over the past week. Higher raw scores indicate more depressive symptoms.

The WRAT-4 Reading subtest [5], in which a participant reads irregular words, was administered to assess premorbid intellect. Using normative data, age-corrected standard scores are generated (M = 100, SD = 15), with higher scores indicating higher premorbid intellect.

Amyloid imaging

Amyloid imaging was performed using ¹⁸F-Flutemetamol, which is a radioactive diagnostic agent indicated for PET imaging of the brain to estimate amyloid-β neuritic plaque density in adult patients with cognitive impairment. ¹⁸F-Flutemetamol was produced under PET current Good Manufacturing Practice standards and conducted under an approved Food and Drug Administration Investigational New Drug application. Twenty minutes of emission imaging was performed 90 min after the injection of approximately 185 MBq (5 mCi) of ¹⁸F-Flutemetamol. A GE Discovery PET/CT 710 (GE Healthcare) was used in this study. This PET/CT scanner has full width at half-maximum spatial resolution of 5.0 mm and excellent performance characteristics [15, 16]. Volumes of interest were automatically generated by using the CortexID Suite analysis software (GE Healthcare). ¹⁸F-Flutemetamol binding was analyzed using a regional semi-quantitative technique described by Vandenberghe et al. [17] and refined by Thurfjell et al. [18]. The CortexID Suite software generates semi-quantitative regional (prefrontal, anterior cingulate, precuneus/posterior cingulate, parietal, mesial temporal, lateral temporal, occipital, sensorimotor, cerebellar grey matter, and whole cerebellum) standardized uptake value ratios (SUVRs) normalized to the pons. A composite SUVR in the cerebral cortex was generated automatically and normalized to the pons using the CortexID Suite software [19].

MRI

MRI was acquired on a 3.0-T Siemens Prisma scanner with a 64-channel head coil. Structural data were acquired using an MP2RAGE sequence (TR = 5000 ms, TE = 2.93 ms, acquired sagittally, resolution = 1×1×1 mm) to obtain high quality, whole brain 1 mm isotropic T1 images with improved signal homogeneity in ∼7 min. All MRI scans were examined for the presence of common artifacts, including motion, susceptibility, and distortion, and were determined to be of sufficient quality for quantitative analysis. All data were processed on the same workstation using FreeSurfer image analysis suite v6.0 (http://surfer.nmr.mgh.harvard.edu/) to estimate total intracranial and hippocampal volumes. Technical details are described previously [20–22]. To address head size differences, hippocampal volumes were expressed as a proportion of the estimated total intracranial volume. Left and right hemispheric volumes were summed to create total hippocampal volume adjusted by total intracranial volume.
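As a minimal sketch of the head-size adjustment described above, the snippet below sums left and right hippocampal volumes and divides by the estimated total intracranial volume. The function name, the input volumes, and the multiplier of 1000 are illustrative assumptions; the text does not state how the proportion was scaled for reporting.

```python
def adjusted_hippocampal_volume(left_mm3, right_mm3, etiv_mm3, scale=1000.0):
    """Total hippocampal volume as a proportion of intracranial volume.

    `scale` is a hypothetical convenience multiplier (an assumption, not
    taken from the source) that makes the small proportion easier to read.
    """
    if etiv_mm3 <= 0:
        raise ValueError("intracranial volume must be positive")
    # Sum both hemispheres, then express the total relative to head size.
    return (left_mm3 + right_mm3) / etiv_mm3 * scale


# Hypothetical volumes in cubic millimeters, for illustration only.
example = adjusted_hippocampal_volume(3300.0, 3400.0, 1_560_000.0)
print(round(example, 2))
```

Dividing by intracranial volume is only one way to correct for head size; residual-based corrections are a common alternative and could give different group contrasts.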

APOE genotyping

APOE genotyping was conducted on whole blood samples using polymerase chain reaction and fluorescence monitoring with hybridization probes. The number of APOE ɛ4 alleles was determined for each participant.

Data analysis

A baseline cognitive composite score was generated by averaging the seven baseline demographically corrected T-scores (BVMT-R Total and Delay Recall, HVLT-R Total and Delay Recall, SDMT, TMT-A, TMT-B) using normative data from the test manuals for the memory tests and the Mayo Older Adult Normative Studies [23] for the non-memory tests.

To quantify practice effects on the tests that were repeated after approximately one week, the standardized regression-based change formulae of Hammers et al. [24] were applied to the current sample. Those formulae were developed on 200 robustly cognitively intact older adults who were administered this same battery across one week. Baseline scores, age, education, sex, estimate of premorbid intellect, and retest interval were used to predict one-week scores on each test. The predicted one-week scores were subtracted from the observed one-week scores and divided by the standard error of the estimate from the original regression models in Hammers et al., which resulted in a z-score of change that indicated how much the current sample deviated from the expected practice effects in the Hammers et al. sample (see Table 1). For these z-scores of change, positive values indicated more improvement than expected and negative values indicated less improvement than expected (or possibly more decline than expected). These z-scores of change were calculated for the seven scores from the repeated battery: HVLT-R Total Recall, HVLT-R Delayed Recall, BVMT-R Total Recall, BVMT-R Delayed Recall, SDMT, TMT-A, and TMT-B. A practice effects composite was generated by averaging the seven z-scores, and this composite was used as the outcome in the primary analysis. Note that the signs of the z-scores of change were reversed for TMT-A and TMT-B so that, for all seven scores, positive values indicated more improvement than expected.

Table 1

Standardized regression-based change formulae of Hammers et al. [24]

Cognitive score	Formula
BVMT-R Total Recall	(T2 –(11.01+(T1*0.52) –(age*0.13)+(ed*0.17)+(sex*0.41)+(WRAT*0.13)))/3.78
BVMT-R Delay Recall	(T2 –(2.78+(T1*0.49) –(age*0.04)+(ed*0.07)+(sex*0.23)+(WRAT*0.04)))/1.35
HVLT-R Total Recall	(T2 –(13.13+(T1*0.65)))/3.34
HVLT-R Delay Recall	(T2 –(5.90+(T1*0.49)))/1.17
SDMT	(T2 –(19.83+(T1*0.83) –(age*0.21)+(ed*0.37)))/4.92
TMT-A	(T2 –(12.21+(T1*0.61)))/14.23
TMT-B	(T2 –(–9.94+(T1*0.47)+(age*1.41) –(ed*0.93) –(sex*1.94) –(WRAT*0.22) –(interval*2.12)))/22.69

In the formulae: T2, raw cognitive score at one-week visit; T1, raw cognitive score at baseline visit; age, years old; ed, years of education; sex: 0, male and 1, female; WRAT, age-corrected standard score on Reading subtest of the Wide Range Achievement Test; interval, days between baseline and one-week visits. BVMT-R, Brief Visuospatial Memory Test–Revised; HVLT-R, Hopkins Verbal Learning Test–Revised; SDMT, Symbol Digit Modalities Test; TMT, Trail Making Test.
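The formulae in Table 1 can be sketched in code as follows. This is an illustrative transcription, not the authors' software: the coefficients are copied from Table 1, whereas the function names, the dictionary layout, and the example participant are hypothetical. Each z-score is the observed one-week score minus the predicted one-week score, divided by the standard error of the estimate, and the composite reverses the sign for the two timed Trail Making scores as described in the text.

```python
from statistics import mean

# Coefficients transcribed from Table 1. "b0" is the intercept and "see" is
# the standard error of the estimate; predictors absent from a formula are omitted.
SRB_COEFFICIENTS = {
    "BVMT-R Total Recall": {"b0": 11.01, "t1": 0.52, "age": -0.13, "ed": 0.17,
                            "sex": 0.41, "wrat": 0.13, "see": 3.78},
    "BVMT-R Delay Recall": {"b0": 2.78, "t1": 0.49, "age": -0.04, "ed": 0.07,
                            "sex": 0.23, "wrat": 0.04, "see": 1.35},
    "HVLT-R Total Recall": {"b0": 13.13, "t1": 0.65, "see": 3.34},
    "HVLT-R Delay Recall": {"b0": 5.90, "t1": 0.49, "see": 1.17},
    "SDMT": {"b0": 19.83, "t1": 0.83, "age": -0.21, "ed": 0.37, "see": 4.92},
    "TMT-A": {"b0": 12.21, "t1": 0.61, "see": 14.23},
    "TMT-B": {"b0": -9.94, "t1": 0.47, "age": 1.41, "ed": -0.93, "sex": -1.94,
              "wrat": -0.22, "interval": -2.12, "see": 22.69},
}
TIMED_TESTS = {"TMT-A", "TMT-B"}  # lower raw scores are better on these


def srb_z(test, t1, t2, **predictors):
    """z-score of change: (observed T2 - predicted T2) / SEE."""
    c = SRB_COEFFICIENTS[test]
    predicted = c["b0"] + c["t1"] * t1
    for name in ("age", "ed", "sex", "wrat", "interval"):
        if name in c:  # only predictors that appear in this test's formula
            predicted += c[name] * predictors[name]
    return (t2 - predicted) / c["see"]


def practice_effects_composite(baseline, retest, **predictors):
    """Average the seven z-scores, reversing the sign for timed tests."""
    zs = []
    for test in SRB_COEFFICIENTS:
        z = srb_z(test, baseline[test], retest[test], **predictors)
        zs.append(-z if test in TIMED_TESTS else z)
    return mean(zs)


# Hypothetical participant, for illustration only (sex: 0 = male, 1 = female).
demo = {"age": 73, "ed": 16, "sex": 1, "wrat": 110, "interval": 7}
base = {"BVMT-R Total Recall": 24, "BVMT-R Delay Recall": 9,
        "HVLT-R Total Recall": 26, "HVLT-R Delay Recall": 9,
        "SDMT": 45, "TMT-A": 30, "TMT-B": 77}
week = {"BVMT-R Total Recall": 31, "BVMT-R Delay Recall": 11,
        "HVLT-R Total Recall": 29, "HVLT-R Delay Recall": 10,
        "SDMT": 48, "TMT-A": 28, "TMT-B": 71}
composite = practice_effects_composite(base, week, **demo)
print(round(srb_z("HVLT-R Total Recall", 26, 29), 2))  # about -0.31
```

In this sketch a raw gain of three words on HVLT-R Total Recall still yields a slightly negative z-score, because the formula expects a somewhat larger gain from someone with that baseline score.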

The three groups (intact, MCI, AD) were compared on demographic variables (age, education, sex, race) with ANOVA and chi-square tests, and any significant differences were considered as covariates in the primary analyses. For the primary analyses, a series of five ANCOVAs was run, with the markers of AD (baseline cognitive composite, practice effects composite SRB z-score, composite SUVR, total hippocampal volume adjusted by total intracranial volume, number of APOE ɛ4 alleles) as the dependent variable, group as the independent variable, and age as the covariate (as it was the only demographic variable that was significantly different among the groups, F[2,164] = 7.05, p = 0.001). In secondary analyses, effect sizes were calculated for each comparison in the primary analyses, and analogous effect sizes were compared with a z-test [25]. For example, the effect size of the intact versus MCI subjects on the overall practice effects composite SRB was compared to the effect size of the intact versus MCI subjects on composite SUVR. In total, 15 such comparisons of effect sizes were made (i.e., three diagnostic group comparison effect sizes × five marker comparison effect sizes). To examine if practice effects contributed additional information, above and beyond baseline cognition, a final ANCOVA compared the three groups on the overall practice effects composite SRB, with age and the baseline cognitive composite as covariates. To reduce the risk of Type I error, an alpha level of 0.01 was used throughout all analyses.
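One way to compare two effect sizes with a z-test is sketched below. The text does not spell out the formula taken from reference [25], so the large-sample variance approximation for Cohen's d and the treatment of the two effect sizes as independent are assumptions here, and the function names are hypothetical. The example inputs are the intact versus AD effect sizes for the practice effects composite and composite SUVR reported in Table 4, with the group sizes from Table 2.

```python
import math


def d_variance(d, n1, n2):
    """Large-sample approximation to the sampling variance of Cohen's d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def compare_effect_sizes(d_a, d_b, n1, n2):
    """z-test on the difference between two Cohen's d values.

    Assumes the two effect sizes are independent, which is a simplification
    when both markers are measured in the same participants.
    """
    se = math.sqrt(d_variance(d_a, n1, n2) + d_variance(d_b, n1, n2))
    z = (d_a - d_b) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal curve
    return z, p


# Intact (n = 68) versus AD (n = 45): practice effects d = 4.15, SUVR d = 2.02.
z, p = compare_effect_sizes(4.15, 2.02, 68, 45)
print(round(z, 2), p < 0.01)
```

Under these assumptions the sketch gives a z-value close to the one reported for this comparison in Table 4, but small discrepancies are expected for other cells because the number of participants with each biomarker differed.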

RESULTS

As noted earlier, the three groups were significantly different on age (F[2,164] = 7.05, p = 0.001), with the participants with AD being significantly older than the participants in the other two groups. However, there were no statistically significant differences between the groups on years of completed education (p = 0.08), sex (p = 0.77), or race (p = 0.56). As seen in Table 2, the groups were also significantly different on the 15-item Geriatric Depression Scale, Clinical Dementia Rating, Mini-Mental State Examination, delayed recall of Story A of Logical Memory, and all three biomarkers (all p-values < 0.01). Baseline, one-week, and practice effects scores are presented in Table 3.

Table 2

Baseline demographics, clinical, and biomarker data

Variable	Intact	MCI	AD
N	68	52	45
Age (y)^a	72.7 (5.2)	73.7 (5.6)	76.8 (6.8)
Education (y)	16.6 (2.3)	15.4 (2.8)	15.9 (2.3)
Female	59%	54%	57%
White	99%	96%	100%
Retest interval (days)	6.8 (0.8)	7.0 (0.6)	6.9 (0.8)
WRAT-4 Reading	111.3 (8.2)	108.3 (9.2)	110.0 (8.5)
GDS-15^b	0.9 (1.1)	1.6 (1.4)	1.4 (1.3)
MMSE^c	29.4 (0.8)	26.5 (1.9)	23.0 (2.8)
Logical Memory IIA^c	12.0 (3.8)	2.8 (2.8)	1.2 (1.4)
CDR Global^c	0.0 (0.0)	0.5 (0.0)	0.8 (0.2)
Baseline Cognitive Composite^c	55.5 (5.4)	40.1 (7.6)	31.3 (5.6)
Composite SUVR^b	0.51 (0.12)	0.76 (0.15)	0.77 (0.14)
Hippocampal volume^c	4.29 (0.78)	3.66 (0.62)	3.08 (0.97)
APOE ɛ4 (frequency)^b	19+/49–	36+/15–*	33+/12–

Superscripts indicate that the groups were significantly different at p < 0.01. ^aintact = MCI > AD, ^bintact > MCI = AD, ^cintact > MCI > AD. WRAT-4, Wide Range Achievement Test–4, presented as standard scores (M = 100, SD = 15); GDS-15, 15-item version of the Geriatric Depression Scale; MMSE, Mini-Mental State Examination; Logical Memory IIA, delayed recall of Story A of the Logical Memory subtest of the Wechsler Memory Scale–Revised; CDR, Clinical Dementia Rating scale global rating; SUVR, standardized uptake value ratio; ɛ4: +, individuals with one or two alleles and –, individuals with no alleles; MCI, mild cognitive impairment; AD, Alzheimer’s disease. *One MCI subject did not have APOE results. Baseline Cognitive Composite is presented as T-scores (M = 50, SD = 10).

Table 3

Cognitive scores at baseline and one-week and practice effects

Cognitive score	Baseline^a			One-week^a			Practice effect^b
	Intact	MCI	AD	Intact	MCI	AD	Intact	MCI	AD
BVMT-R Total Recall	24.2 (5.4)	11.6 (6.1)	6.5 (3.1)	30.9 (4.0)	16.0 (7.9)	7.4 (3.6)	–0.08 (0.79)	–2.09 (1.46)	–3.69 (0.88)
BVMT-R Delay Recall	9.5 (1.9)	4.0 (3.2)	0.9 (1.1)	10.6 (1.5)	5.2 (3.6)	1.9 (2.1)	0.44 (0.93)	–1.40 (1.75)	–2.74 (1.28)
HVLT-R Total Recall	25.8 (4.9)	16.0 (5.2)	11.4 (4.5)	29.0 (5.1)	18.2 (5.5)	13.2 (4.0)	–0.25 (1.17)	–1.60 (1.21)	–2.18 (0.61)
HVLT-R Delay Recall	9.4 (2.1)	2.6 (2.9)	0.5 (1.5)	10.1 (1.8)	4.4 (3.2)	1.4 (1.9)	–0.31 (1.04)	–2.35 (1.92)	–4.09 (1.29)
SDMT	45.1 (8.1)	36.7 (8.4)	26.7 (10.7)	48.5 (8.6)	38.0 (9.7)	28.9 (10.2)	0.07 (0.89)	–0.51 (1.20)	–0.63 (0.98)
TMT-A	30.7 (8.1)	38.9 (18.8)	60.5 (35.4)	27.9 (7.5)	37.5 (11.6)	53.6 (28.3)	0.22 (0.41)	–0.11 (0.64)	–0.30 (1.00)
TMT-B	76.9 (30.3)	118.1 (58.8)	212.0 (90.8)	70.7 (25.8)	123.9 (73.6)	200.1 (93.5)	0.19 (0.94)	–1.18 (2.43)	–2.50 (2.97)
Overall composite	–	–	–	–	–	–	0.04 (0.40)	–1.32 (0.82)	–2.30 (0.75)

^araw scores, ^bz-scores. BVMT-R, Brief Visuospatial Memory Test–Revised; HVLT-R, Hopkins Verbal Learning Test–Revised; SDMT, Symbol Digit Modalities Test; TMT, Trail Making Test; MCI, mild cognitive impairment; AD, Alzheimer’s disease.

Primary analyses

In the primary analysis of baseline cognition, the three groups were significantly different on the baseline composite (F[2,165] = 233.66, p < 0.001), after covarying age, with group and age sharing approximately 74% of the variance with baseline cognition (R² = 0.74). Post-hoc comparisons showed that all three groups were significantly different from each other (p-values < 0.001), with the intact subjects having the largest baseline cognition composite, the AD subjects having the smallest baseline composite, and the MCI subjects falling between the other two.

In the primary analysis of practice effects, the three groups were significantly different on composite SRB z-score (F[2,165] = 180.30, p < 0.001), after covarying age, with group and age sharing approximately 70% of the variance with practice effects (R² = 0.70). Post-hoc comparisons showed that all three groups were significantly different from each other (p-values < 0.001), with the intact subjects having the largest practice effect, the AD subjects having the smallest practice effect, and the MCI subjects falling between the other two.

In the primary analysis of amyloid deposition, the three groups were also significantly different on composite SUVR (F[2,161] = 69.39, p < 0.001), with age as the covariate. Group and age shared approximately 47% of the variance with this biomarker (R² = 0.47). In the post-hoc comparisons, the intact individuals had significantly smaller composite SUVRs than individuals with MCI or AD (p-values < 0.001). However, those with MCI and AD had comparable composite SUVRs (p = 0.64).

When considering total hippocampal volume adjusted by total intracranial volume after controlling for age, there was a statistically significant effect of group (F[2,149] = 23.34, p < 0.001), with group and age sharing approximately 33% of the variance with total hippocampal volume (R² = 0.33). Similar to the post-hoc comparisons of practice effects, each group was significantly different from the other (intact versus MCI p < 0.001, intact versus AD p < 0.001, MCI versus AD p = 0.005).

Finally, for the primary analysis of APOE ɛ4 alleles, there was also a group effect (F[2,161] = 25.91, p < 0.001), with age as a covariate. Group and age shared approximately 26% of the variance with the number of APOE ɛ4 alleles (R² = 0.26). Post-hoc comparisons revealed a similar pattern to composite SUVR, with the intact participants having significantly fewer ɛ4 alleles than either of the other two groups (p-values < 0.001), but the MCI and AD participants having comparable numbers (p = 0.14).

Secondary analyses

In the secondary analyses of effect sizes derived from the primary analyses, nearly all group comparisons (e.g., intact versus MCI, MCI versus AD) yielded large effect sizes for all AD markers (e.g., baseline cognition composite d = 1.31–4.41, practice effects composite SRB z-score d = 1.24–4.15, composite SUVR d = 0.05–2.02, total hippocampal volume d = 0.72–1.40, number of APOE ɛ4 alleles d = 0.14–1.11, see Table 4). However, when effect sizes were compared across markers with z-tests, some interesting patterns emerged. Of the nine comparisons that involved the practice effects composite compared to the three biomarkers (composite SUVR, total hippocampal volume, APOE ɛ4 alleles), seven showed significantly larger effect sizes for the practice effects composite (all p-values < 0.001). This same finding was also observed for the baseline cognition composite compared to the three biomarkers (all p-values < 0.001). Conversely, none of the nine comparisons showed significantly larger effect sizes for the biomarkers compared to either the baseline or practice effects composite. Finally, the differences in effect sizes between the baseline cognition and practice effects composites were not statistically significant (z-values = 0.21–0.56, p-values = 0.58–0.83; see Table 4). The interested reader can contact the first author for the comparisons between the biomarkers (e.g., total hippocampal volume versus composite SUVR) for each group comparison (e.g., intact versus MCI).

Table 4

Effect sizes for markers of Alzheimer’s disease

Marker comparison	Intact versus MCI	Intact versus AD	MCI versus AD
PE versus composite SUVR	2.20 versus 1.89, 0.98, 0.33	4.15 versus 2.02, 5.19, <0.01	1.24 versus 0.05, 3.93, <0.01
PE versus hippocampal volume	2.20 versus 0.87, 4.29, <0.01	4.15 versus 1.40, 6.84, <0.01	1.24 versus 0.72, 1.68, 0.09
PE versus APOE	2.20 versus 0.98, 4.01, <0.01	4.15 versus 1.11, 7.71, <0.01	1.24 versus 0.14, 3.66, <0.01
PE versus baseline cognition	2.20 versus 2.37, 0.56, 0.58	4.15 versus 4.38, 0.54, 0.60	1.24 versus 1.30, 0.21, 0.83

Each cell contains the effect sizes (i.e., Cohen’s d) for practice effects versus another marker, the z-value comparing those two effect sizes, and the p-value of that comparison. PE, practice effects via the standardized regression-based change formulae of Hammers et al. (2021); SUVR, standardized uptake value ratio; MCI, mild cognitive impairment; AD, Alzheimer’s disease.
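For readers who want to see how a Cohen's d of this size arises, the sketch below computes d from group means and standard deviations using a pooled standard deviation. It is an approximation under stated assumptions: the inputs are the unadjusted practice effects composite values from Table 3 and the group sizes from Table 2, whereas the reported effect sizes were derived from the age-adjusted primary analyses, so the two need not match exactly. The function name is hypothetical.

```python
import math


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Cohen's d using the pooled standard deviation of two groups."""
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)


# Practice effects composite, mean (SD): intact 0.04 (0.40), MCI -1.32 (0.82),
# AD -2.30 (0.75); group sizes 68, 52, and 45.
d_intact_mci = cohens_d(0.04, 0.40, 68, -1.32, 0.82, 52)
d_mci_ad = cohens_d(-1.32, 0.82, 52, -2.30, 0.75, 45)
print(round(d_intact_mci, 2), round(d_mci_ad, 2))
```

With these unadjusted inputs the sketch lands close to the 2.20 and 1.24 shown in Table 4 for the same two group contrasts.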

A final analysis was calculated to examine if practice effects contributed additional information, above and beyond baseline cognition, with an ANCOVA that compared the three groups on the practice effects composite SRB, with age and the baseline cognitive composite as covariates. In this analysis, practice effects remained statistically significant (F[2,165] = 10.89, p < 0.001) and contributed additional variance to separating the groups, above and beyond age and baseline cognition.

DISCUSSION

Practice effects on cognitive testing have received increased interest as a potential marker in MCI and AD [1], both for informing clinical practice and potentially enriching clinical trials in AD. Unfortunately, many studies have considered practice effects across longer periods of time (e.g., months to years), which can obscure their actual value with other confounding factors (e.g., cognitive changes with aging and disease, interventions, etc.). Furthermore, few studies have examined how practice effects compare to biomarkers of AD. The current study sought to address some of these gaps in the literature by examining practice effects over a short period of time (i.e., approximately one week) across the late life spectrum of cognition and also considering how practice effects compared to traditional biomarkers of AD in separating phases of this spectrum. Consistent with hypotheses, significant group differences were seen on a composite measure of practice effects, such that intact older adults had larger practice effects than those with MCI, who had larger practice effects than those with AD. This same gradient was largely seen across baseline cognition and three traditional biomarkers of AD: composite SUVR, total hippocampal volume, and number of APOE ɛ4 alleles. Overall, these findings are not surprising, as the existing literature has frequently reported this same pattern [7, 26–29].

However, a closer look at the results suggests that baseline cognition and practice effects may make for a better marker of AD than the traditional biomarkers. For example, in the primary analyses, only baseline cognition, practice effects, and hippocampal volumes separated all groups along the spectrum of late life cognitive functioning in the expected pattern of intact > MCI > AD. Amyloid deposition via a composite SUVR and number of APOE ɛ4 alleles adequately separated the intact individuals from the impaired individuals, but these two markers failed to adequately separate the two impaired groups. The poorer ability of amyloid deposition and APOE to separate the MCI and AD participants might be a product of evolution of these biomarkers over time. For example, one’s genetic risk via APOE does not change throughout the progression from intact to MCI to AD (i.e., it remains constant). As such, as the disease progresses (e.g., MCI to AD), this marker confers no additional information. Similarly, although amyloid appears to accumulate over a relatively long prodromal period in AD [30], it also appears to plateau towards conversion to AD [31], and thereby provides less information once the disease has progressed to a certain stage. Since hippocampal volume loss starts later in the disease process [30], it seems to be able to separate later disease stages, like MCI progressing to AD. Practice effects and baseline performance on cognitive testing seem to be similar to hippocampal volumes (and perhaps tau) in this way.

Although the advantage of practice effects compared to the other markers of AD is evident in the relatively larger R² values in the primary analyses, their ability to separate these groups might be best highlighted by the secondary analyses comparing effect sizes across the markers. Even though all markers tended to yield large effect sizes when making group comparisons within the marker (e.g., intact versus MCI for composite SUVR d = 1.89; MCI versus AD for hippocampal volume d = 0.72), the baseline cognition and practice effects markers led to significantly larger effect sizes than any of the AD biomarkers in 7 of the 9 comparisons (and had comparable effect sizes in the other two comparisons). When comparing intact and MCI subjects, baseline cognition and practice effects led to significantly larger effect sizes than hippocampal volumes and APOE. When comparing MCI and AD subjects, the effect sizes for baseline cognition and practice effects were larger than those for amyloid deposition and APOE. Moreover, when comparing intact and AD subjects, the effect sizes for practice effects were larger than those for all three biomarkers, as were those for the baseline cognition composite. Since the effect sizes used to power a future clinical trial can dramatically alter the cost and burden of that trial, baseline cognition and practice effects appear to offer an advantage over these AD biomarkers.

Although not formally evaluated in this study, baseline cognition and practice effects have other relative benefits over traditional biomarkers of AD in current use in clinical practice and clinical trials. For example, baseline cognition and short-term practice effects are relatively inexpensive to collect, they require no special equipment (beyond testing supplies), they can be collected anywhere, they pose no safety risks, they are non-invasive, and they are relatively well-tolerated in impaired individuals. Additionally, the baseline testing often provides valuable clinical information about an individual’s cognitive status and possible diagnosis (even though some of this can be gained with traditional biomarkers too). As indicated in the Results, baseline cognition and practice effects composites showed a clear linear pattern across the three groups, with intact > MCI > AD. Conversely, amyloid SUVR and APOE did not tend to separate amnestic MCI from mild AD, with effect sizes in the 0.05–0.14 range. Even though baseline cognition and practice effects showed strong separation among groups, these are cross-sectional results that would need to be validated in actual longitudinal progression from one phase to the next (e.g., worsening baseline cognition and practice effects indicate conversion from MCI to AD). If validated in prospective studies, then practice effects may be considered as an outcome variable in clinical trials of disease-modifying agents. Although baseline cognition is sometimes used as an outcome in clinical trials, it is often a global measure of cognition (e.g., Mini-Mental State Examination or Alzheimer’s Disease Assessment Scale–Cognitive subscale). These findings suggest that more sensitive and domain-specific measures of baseline cognition may be more valuable than global measures.
As such, with larger effect sizes than traditional biomarkers of AD, baseline cognition and practice effects need to be more strategically leveraged in clinical settings and future research endeavors.

Since both baseline cognition and practice effects effectively separated these groups and had comparable effect sizes, it becomes debatable whether practice effects are worth the extra effort to collect. That is, baseline cognition, which only requires one visit of cognitive testing, would seem to be preferred compared to practice effects, which require two visits. In situations where one needs to find differences between groups (e.g., diagnostically, subject selection), baseline cognition might be the more efficient method. However, practice effects might be more useful in other situations. First, practice effects may provide additional value, above and beyond baseline cognition, in separating these groups. For example, when the primary analysis for the practice effects composite was re-run with baseline cognition as another covariate (with age), it was observed that practice effects remained statistically significant in contributing additional variance to separating the groups. Second, practice effects may be more useful for tracking progression over time. For example, Oltra-Cucarella et al. [32] found that change across a short period of time (i.e., practice effects) was associated with increased risk of AD across six years in individuals with amnestic MCI. Others have also shown the prognostic value of practice effects, above and beyond baseline cognition [33–35]. Third, practice effects appear predictive of biomarker positivity. In an experimental design, Papp et al. [36] demonstrated the superiority of learning curves (i.e., practice effects) over cross-sectional differences (i.e., baseline cognition) between those who were amyloid positive versus negative. Similar findings have been noted in the literature on practice effects [37–39]. As such, there may be situations where practice effects are worth the extra costs.

One last observation that seems worth mentioning is the relatively larger standard deviations on the practice effect scores in the participants with MCI compared to those in the other groups. On the practice effects composite, the standard deviation of the MCI subjects was 106% larger than that of the intact subjects and 10% larger than that of the AD subjects. It has been previously suggested that there are likely two subgroups of MCI, those that benefit from practice and those that do not [40], and that these subgroups have differential patterns of decline [35]. It is expected that the identification of these subgroups and their differential enrollment in clinical trials may make for more efficient studies by focusing on those individuals who are more likely to decline over the course of the trial. It is also worth noting that the standard deviation of the baseline cognitive composite was larger in the MCI group, but less so than for practice effects (e.g., 40% larger than that of the intact group).
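The "percent larger" comparisons above can be read as ratios of standard deviations. The short sketch below shows the arithmetic only. The standard deviation values in it are hypothetical placeholders, not the study's data, and `percent_larger` is an illustrative helper name.

```python
def percent_larger(sd_a: float, sd_b: float) -> float:
    """Return how much larger sd_a is than sd_b, as a percentage of sd_b."""
    if sd_b <= 0:
        raise ValueError("reference standard deviation must be positive")
    return (sd_a / sd_b - 1.0) * 100.0


# Hypothetical SDs (NOT the study's values), chosen only to illustrate the ratio.
sd_mci, sd_intact = 1.03, 0.50
print(f"MCI SD is {percent_larger(sd_mci, sd_intact):.0f}% larger than the intact SD")
```

Under this reading, a value of 106% corresponds to an SD ratio of about 2.06, meaning the spread of scores in the MCI group was roughly double that of the intact group.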

The current study is not without its limitations. First, the current sample was relatively well-educated and had little racial/ethnic diversity. As such, the generalizability of these results to a more diverse population is unclear. Second, despite examining practice effects on multiple cognitive tests spanning memory and processing speed, these findings may miss other sensitive markers of disease and its progression that could come from examining practice effects on tests of attention, visuospatial perception and construction, language, and executive functioning. Third, the current study focused on amnestic MCI and mild AD, which precludes our understanding of how practice effects might separate more advanced cases of dementia (e.g., moderate, severe) and different etiologies of dementia (e.g., vascular dementia, dementia with Lewy bodies, frontotemporal dementia). Fourth, only three commonly used biomarkers of AD were utilized, although the extension of these findings to imaging, cerebrospinal fluid, and blood-based biomarkers of tau, neurofilament light chain, and neuroinflammatory markers seems reasonable. Fifth, the practice effects SRBs from Hammers et al. [24] were developed on cognitively intact individuals, so their applicability to impaired individuals with notably lower scores is uncertain. Finally, there is frequently potential circularity between independent and dependent variables in these types of studies. However, in the current case, this appears to be relatively low. Whereas the criteria from the Alzheimer’s Disease Neuroimaging Initiative used performance on two cognitive tests (Mini-Mental State Examination and the Wechsler Memory Scale–Revised Logical Memory II Paragraph A) to classify individuals as intact, MCI, or mild AD, these tests were not part of the practice effects battery in the current study.
Despite these limitations, the future refinement of short-term practice effects as a tool for clinical diagnosis, prognostic indication, and enrichment of clinical trials seems warranted.

AUTHOR CONTRIBUTIONS

Kevin Duff (Conceptualization; Formal analysis; Funding acquisition; Investigation; Methodology; Project administration; Writing – original draft; Writing – review & editing); Dustin B. Hammers (Conceptualization; Writing – original draft; Writing – review & editing); Vincent Koppelmans (Writing – original draft; Writing – review & editing); Jace B. King (Methodology; Writing – original draft; Writing – review & editing); John M. Hoffman (Conceptualization; Funding acquisition; Investigation; Methodology; Project administration; Writing – original draft; Writing – review & editing).

ACKNOWLEDGMENTS

The authors have no acknowledgments to report.

FUNDING

The project described was supported by a research grant from the National Institute on Aging (R01AG055428), and it was registered at clinicaltrials.gov (NCT03466736). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Aging or the National Institutes of Health. This project also utilized REDCap, which is supported by 8UL1TR000105 (formerly UL1RR025764) NCATS/NIH.

CONFLICT OF INTEREST

The authors have no conflict of interest to report.

DATA AVAILABILITY

The data supporting the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

References

1. Jutten, Grandoit, Foldi, Sikkes SAM, Jones, Choi, Lamar, Louden DKN, Rich, Tommet, Crane, Rabin (2020) Lower practice effects as a marker of cognitive performance and dementia risk: A literature review. Alzheimers Dement (Amst) 12, e12055.

2. Sanderson-Cimino, Elman, Tu, Gross, Panizzon, Gustavson, Bondi, Edmonds, Eglit GML, Eppig, Franz, Jak, Lyons, Thomas, Williams, Kremen, Alzheimer’s Disease Neuroimaging Initiative (2022) Cognitive practice effects delay diagnosis of MCI: Implications for clinical trials. Alzheimers Dement (N Y) 8, e12228.

3. Sanderson-Cimino, Elman, Tu, Gross, Panizzon, Gustavson, Bondi, Edmonds, Eppig, Franz, Jak, Lyons, Thomas, Williams, Kremen (2022) Practice effects in mild cognitive impairment increase reversion rates and delay detection of new impairments. Front Aging Neurosci 14, 847315.

4. Jacobs, Ard, Salmon, Galasko, Bondi, Edland (2017) Potential implications of practice effects in Alzheimer’s disease prevention trials. Alzheimers Dement (N Y) 3, 531–535.

5. Wilkinson, Robertson (2006) WRAT 4: Wide Range Achievement Test, professional manual, Psychological Assessment Resources, Inc., Lutz, FL.

6. Yesavage, Brink, Rose, Lum, Huang, Adey, Leirer (1982) Development and validation of a geriatric depression screening scale: A preliminary report. J Psychiatr Res 17, 37–49.

7. Petersen, Aisen, Beckett, Donohue, Gamst, Harvey, Jack Jr., Jagust, Shaw, Toga, Trojanowski, Weiner (2010) Alzheimer’s Disease Neuroimaging Initiative (ADNI): Clinical characterization. Neurology 74, 201–209.

8. Folstein, Folstein, McHugh (1975) “Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 12, 189–198.

9. Morris (1993) The Clinical Dementia Rating (CDR): Current version and scoring rules. Neurology 43, 2412–2414.

10. Wechsler (1987) Manual for the Wechsler Memory Scale - Revised, The Psychological Corporation, San Antonio, TX.

11. Benedict RHB (1997) Brief Visuospatial Memory Test-Revised, Psychological Assessment Resources, Inc, Odessa, FL.

12. Brandt, Benedict RHB (1997) Hopkins Verbal Learning Test-Revised, Psychological Assessment Resources, Inc, Odessa, FL.

13. Smith (1973) Digit Symbol Substitution Test, Western Psychological Services, Los Angeles.

14. Reitan (1992) Trail Making Test: Manual for administration and scoring, Reitan Neuropsychology Laboratory.

15. Sunderland, Christian (2015) Quantitative PET/CT scanner performance characterization based upon the society of nuclear medicine and molecular imaging clinical trials network oncology clinical simulator phantom. J Nucl Med 56, 145–152.

16. Yester, Al-Senan, White (2014) NEMA testing of GE Discovery 710 PET scanner compared to a simplified protocol for routine testing of PET scanners. J Nucl Med 55, 2157.

17. Vandenberghe, Van Laere, Ivanoiu, Salmon, Bastin, Triau, Hasselbalch, Law, Andersen, Korner, Minthon, Garraux, Nelissen, Bormans, Buckley, Owenius, Thurfjell, Farrar, Brooks (2010) 18F-flutemetamol amyloid imaging in Alzheimer disease and mild cognitive impairment: A phase 2 trial. Ann Neurol 68, 319–329.

18. Thurfjell, Lilja, Lundqvist, Buckley, Smith, Vandenberghe, Sherwin (2014) Automated quantification of 18F-flutemetamol PET activity for categorizing scans as negative or positive for brain amyloid: Concordance with visual image reads. J Nucl Med 55, 1623–1628.

19. Lundqvist, Lilja, Thomas, Lotjonen, Villemagne, Rowe, Thurfjell (2013) Implementation and validation of an adaptive template registration method for 18F-flutemetamol imaging data. J Nucl Med 54, 1472–1478.

20. Fischl, Dale (2000) Measuring the thickness of the human cerebral cortex from magnetic resonance images. Proc Natl Acad Sci U S A 97, 11050–11055.

21. Fischl, Salat, Busa, Albert, Dieterich, Haselgrove, van der Kouwe, Killiany, Kennedy, Klaveness, Montillo, Makris, Rosen, Dale (2002) Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain. Neuron 33, 341–355.

22. Fischl, van der Kouwe, Destrieux, Halgren, Segonne, Salat, Busa, Seidman, Goldstein, Kennedy, Caviness, Makris, Rosen, Dale (2004) Automatically parcellating the human cerebral cortex. Cereb Cortex 14, 11–22.

23. Ivnik, Malec, Smith, Tangalos, Petersen (1996) Neuropsychological tests’ norms above age 55: COWAT, BNT, MAE Token, WRAT-R Reading, AMNART, STROOP, TMT, and JLO. Clin Neuropsychol 10, 262–278.

24. Hammers, Suhrie, Dixon, Porter, Duff (2021) Validation of one-week reliable change methods in cognitively intact community-dwelling older adults. Neuropsychol Dev Cogn B Aging Neuropsychol Cogn 28, 472–492.

25. Borenstein, Hedges, Higgins, Rothstein (2009) Introduction to Meta-Analysis, Wiley, New York.

26. Fennema-Notestine, Hagler Jr., McEvoy, Fleisher, Wu, Karow, Dale, Alzheimer’s Disease Neuroimaging Initiative (2009) Structural MRI biomarkers for preclinical and mild Alzheimer’s disease. Hum Brain Mapp 30, 3238–3253.

27. Apostolova, Hwang, Kohannim, Avila, Elashoff, Jack Jr., Shaw, Trojanowski, Weiner, Thompson, Alzheimer’s Disease Neuroimaging Initiative (2014) ApoE4 effects on automated diagnostic classifiers for mild cognitive impairment and Alzheimer’s disease. Neuroimage Clin 4, 461–472.

28. Walhovd, Fjell, Brewer, McEvoy, Fennema-Notestine, Hagler Jr., Jennings, Karow, Dale, Alzheimer’s Disease Neuroimaging Initiative (2010) Combining MR imaging, positron-emission tomography, and CSF biomarkers in the diagnosis and prognosis of Alzheimer disease. AJNR Am J Neuroradiol 31, 347–354.

29. Wang, Kennedy, Goldberg, Fowler, Cutter, Schneider (2020) Using practice effects for targeted trials or sub-group analysis in Alzheimer’s disease: How practice effects predict change over time. PLoS One 15, e0228064.

30. Jack Jr., Knopman, Jagust, Petersen, Weiner, Aisen, Shaw, Vemuri, Wiste, Weigand, Lesnick, Pankratz, Donohue, Trojanowski (2013) Tracking pathophysiological processes in Alzheimer’s disease: An updated hypothetical model of dynamic biomarkers. Lancet Neurol 12, 207–216.

31. Koivunen, Scheinin, Virta, Aalto, Vahlberg, Nagren, Helin, Parkkola, Viitanen, Rinne (2011) Amyloid PET imaging in patients with mild cognitive impairment: A 2-year follow-up study. Neurology 76, 1085–1090.

32. Oltra-Cucarella, Sanchez-SanSegundo, Ferrer-Cascales, Alzheimer Disease Neuroimaging Initiative (2022) Predicting Alzheimer’s disease with practice effects, APOE genotype and brain metabolism. Neurobiol Aging 112, 111–121.

33. Duff, Beglinger, Schultz, Moser, McCaffrey, Haase, Westervelt, Langbehn, Paulsen (2007) Practice effects in the prediction of long-term cognitive outcome in three patient samples: A novel prognostic index. Arch Clin Neuropsychol 22, 15–24.

34. Duff, Beglinger, Moser, Paulsen, Schultz, Arndt (2010) Predicting cognitive change in older adults: The relative contribution of practice effects. Arch Clin Neuropsychol 25, 81–88.

35. Duff, Lyketsos, Beglinger, Chelune, Moser, Arndt, Schultz, Paulsen, Petersen, McCaffrey (2011) Practice effects predict cognitive outcome in amnestic mild cognitive impairment. Am J Geriatr Psychiatry 19, 932–939.

36. Papp, Jutten, Soberanes, Weizenbaum, Hsieh, Molinare, Buckley, Betensky, Marshall, Johnson, Rentz, Sperling, Amariglio (2024) Early detection of amyloid-related changes in memory among cognitively unimpaired older adults with daily digital testing. Ann Neurol 95, 507–517.

37. Duff, Hammers, Dalley BCA, Suhrie, Atkinson, Rasmussen, Horn, Beardmore, Burrell, Foster, Hoffman (2017) Short-term practice effects and amyloid deposition: Providing information above and beyond baseline cognition. J Prev Alzheimers Dis 4, 87–92.

38. Duff, Horn, Foster, Hoffman (2015) Short-term practice effects and brain hypometabolism: Preliminary data from an FDG PET Study. Arch Clin Neuropsychol 30, 264–270.

39. Duff, Anderson, Mallik, Suhrie, Atkinson, Dalley BCA, Morimoto, Hoffman (2018) Short-term repeat cognitive testing and its relationship to hippocampal volumes in older adults. J Clin Neurosci 57, 121–125.

40. Duff, Beglinger, Van Der Heiden, Moser, Arndt, Schultz, Paulsen (2008) Short-term practice effects in amnestic mild cognitive impairment: Implications for diagnosis and treatment. Int Psychogeriatr 20, 986–999.