Abstract
Background:
The Cogstate Brief Battery (CBB) is a computerized cognitive test battery used commonly to identify cognitive deficits related to Alzheimer’s disease (AD). However, AD and normative samples used to understand the sensitivity of the CBB to AD in the clinic have been limited, as have the outcome measures studied.
Objective:
This study investigated the sensitivity of CBB outcomes, including potential composite scores, to cognitive impairment in mild cognitive impairment (MCI) and dementia due to AD, in carefully selected samples.
Methods:
Samples consisted of 4,871 cognitively unimpaired adults and 184 adults who met clinical criteria for MCI (Clinical Dementia Rating (CDR) = 0.5) or dementia (CDR > 0.5) due to AD and who were CBB naive. Speed and accuracy measures from each test were examined, and theoretically and statistically derived composites were created. Sensitivity and specificity of classification of cognitive impairment were compared between outcomes.
Results:
Individual CBB measures of learning and working memory showed high discriminability for AD-related cognitive impairment for CDR 0.5 (AUCs ∼ 0.79–0.88), and CDR > 0.5 (AUCs ∼ 0.89–0.96) groups. Discrimination ability for theoretically derived CBB composite measures was high, particularly for the Learning and Working Memory (LWM) composite (CDR 0.5 AUC = 0.90, CDR > 0.5 AUC = 0.97). As expected, statistically optimized linear composite measures showed strong discrimination abilities albeit similar to the LWM composite.
Conclusions:
In older adults, the CBB is effective for discriminating cognitive impairment due to MCI or AD-dementia from unimpaired cognition with the LWM composite providing the strongest sensitivity.
INTRODUCTION
Computer-controlled cognitive tests are being used increasingly to understand the nature and magnitude of cognitive impairment in symptomatic Alzheimer’s disease (AD) [1–3]. This understanding raises their potential for use in identifying cognitive impairment in clinical settings, a sign necessary for a diagnosis of dementia or mild cognitive impairment (MCI) [4–6]. However, when compared to standardized neuropsychological tests, refined from decades of use in AD, studies suggest some computerized tests may require further optimization for use in clinical settings [5, 7]. Identification of suboptimal use-cases for new tests is important because this provides a strong foundation for improving precision. Further, optimization strategies developed in response to the suboptimal use-cases may increase understanding of the relative strengths and limitations of computerized tests in AD research and clinical contexts [1].
The Cogstate Brief Battery (CBB) is a computer-controlled set of tests of psychomotor function, attention, working memory, and visual learning [8]. These tests were developed initially to detect the change in cognition that characterizes the early stages of AD through use of short administration times, simple rules and requirements for performance, and use of consistent visual stimuli and stimulus-response mapping across all tests [8, 9]. Tests in the CBB, and the method for their administration, were also designed to be unaffected by individuals’ linguistic, cultural, or socioeconomic backgrounds. However, to minimize risks from false positive classification of change, performance on each CBB test was defined by a single outcome selected based on it being drawn from a normal, or corrected-to-normal, distribution [10, 11]. In AD, CBB tests of visual learning and working memory show the greatest sensitivity to cognitive change in the preclinical, prodromal, and dementia stages. For example, performance accuracy on the visual learning and working memory tests, measured individually or combined into a Learning/Working Memory composite score, shows qualitatively and quantitatively similar decline over intervals from six months to five years in preclinical and prodromal AD [12–15]. These same performance measures remain stable over the same intervals in cognitively unimpaired (CU) older adults without AD biomarkers [16], while change in performance speed on tests of psychomotor function and simple attention was not different between AD and matched CU groups [13, 17]. In preclinical AD, decline in visual learning and working memory was increased further by carriage of the apolipoprotein E (APOE) ɛ4 allele and associated with loss of hippocampal volume [18–20].
In adults with clinically classified AD dementia of mild severity, decline in performance on tests of visual learning and working memory was also evident over 18 and 36 months and improved following short periods of treatment with donepezil or the histamine antagonist GSK239512 [21–23].
Given their responsiveness to AD-related cognitive change, CBB tests came to be utilized for identifying cognitive impairment in individuals at risk for AD dementia in the clinic [4–7, 25]. Identification of cognitive impairment requires comparison of an individual’s CBB performance to relevant normative data ranges. In some studies, CBB learning and working memory tests showed a high accuracy for identifying MCI (i.e., Clinical Dementia Rating [CDR] = 0.5) or dementia due to AD (i.e., CDR between 1 and 3) [6, 26]. However, other studies reported only moderate sensitivity of CBB tests in classifying MCI, and this was lower than that observed for other standardized memory tests [7, 28]. Differences in the ability of CBB tests to identify early AD in different samples may have occurred for multiple reasons. First, restriction of CBB performance measures to one per test, based on their sensitivity to cognitive change, may have resulted in other individual or composite performance measures with stronger sensitivity to cognitive impairment being overlooked [29]. Second, CBB normative samples have varied in size and selection criteria between studies, including requirements for supervised administration, practice, and provision of complete performance, and this may have reduced the precision of estimates of normal performance [7]. Third, while current Cogstate normative data is stratified by age, it is possible that sex or education levels could also exert non-trivial influence on decisions about cognitive impairment and may therefore need to be taken into account [30]. Finally, the main validation of CBB performance measures for detecting cognitive impairment in the clinic in symptomatic AD was based on analyses from the Australian Imaging, Biomarkers and Lifestyle (AIBL) study [31, 32].
The highly restrictive AIBL inclusion criteria and high socio-economic status of this sample may have limited the extent to which criteria for cognitive impairment developed here could be generalized to other samples of symptomatic AD.
To challenge these hypotheses, performance on the CBB tests can be investigated in groups of CU adults and adults with symptomatic AD who are naïve to the CBB and to AD research in general. Individuals being assessed for entry into clinical trials evaluating experimental drugs for AD are one representative sample for this purpose. This is because they are generally recruited from AD clinics and the general population, and not from prospective research cohorts. Furthermore, entry into such clinical trials requires the individuals’ clinical, demographic, general medical, and AD biomarker status to be confirmed by detailed inclusion/exclusion criteria based on standardized assessments conducted by trained clinicians and certified laboratories. In addition, clinical decisions about cognitive impairment in AD can be based on test batteries that yield multiple performance measures, for which data distributions are characterized by some restriction of range around the maximum performance scores, and which can be combined theoretically or using statistical dimension reduction methods to generate composite measures. This means that individual or composite performance measures from the CBB, not used or deliberately rejected from studies of cognitive change, can be explored for their sensitivity to cognitive impairment in early symptomatic AD.
The aim of this study was to investigate the ability of CBB tests to detect cognitive impairment in symptomatic AD through comparison to matched CU older adults. First, the effects of age, sex, and education on speed and accuracy performance measures of each CBB test were determined from a large group of CU adults recruited from different centers and selected using rigorous inclusion criteria. Second, performance on all possible CBB performance measures, including existing and novel composite scores, was compared between this CU group and a group of adults with symptomatic AD being evaluated for entry into a large international clinical trial to establish their sensitivity in detection of AD-related cognitive impairment.
MATERIALS AND METHODS
Participants
Data for the CU group was drawn from the first screening visit of a sample of 5,001 older adults aged between 65 and 85 years recruited from 67 sites in the United States, Canada, and Australia as part of the Anti-Amyloid Treatment in Asymptomatic Alzheimer’s Disease (A4) study [33, 34]. To qualify for CBB assessment, participants had to be classified as cognitively normal based on a Mini-Mental State Examination (MMSE) score of ≥25 and a CDR score of 0. Exclusion criteria included a diagnosis of cognitive impairment or dementia, the use of AD medications, significant anxiety or depression, a history of, or intercurrent, cerebrovascular disease, and unstable medical conditions [9].
Data for the AD group was drawn from the first CBB attempt from the screening or baseline visits of 194 individuals, recruited from specialist dementia care centers in Europe, who were ultimately randomized into the ADAMANT study [35], for which the inclusion and exclusion criteria have been previously outlined in detail. Briefly, all participants met the criteria for probable AD according to the revised National Institute on Aging/Alzheimer’s Association criteria [36]. All had an MMSE total score ≥20 and ≤26, a brain MRI consistent with the diagnosis of AD, and evidence of the AD pathophysiological process (one or both of the following): 1) medial temporal lobe atrophy on brain MRI (a Scheltens score of ≥2 on a scale of 0–4 on the more atrophied side), 2) positive AD biomarker signature in the cerebrospinal fluid (total tau protein > 400 pg/mL and pT181 tau protein > 60 pg/mL and Aβ42 < 600 pg/mL and Aβ42:Aβ40 ratio < 0.089). Patients were aged 50–85 years inclusive. Stable therapy with an acetylcholinesterase inhibitor for at least 3 months before screening was required; patients taking memantine also had to be on stable therapy for at least 3 months. Patients were excluded if there was a central nervous system disorder other than AD that could be the cause of dementia, including MRI abnormalities such as infarction in the territory of large vessels, more than one lacunar infarct or any lacunar infarct in a strategically important location, confluent hemispheric deep white matter lesions (Fazekas grade 3), or other focal lesions that may be responsible for the cognitive status of the patient.
Patients were also excluded for severe comorbidities such as recent cancer, recent myocardial infarction, poorly controlled diabetes, poorly controlled congestive heart failure, severe renal insufficiency, relevant psychiatric illness, epilepsy, chronic liver disease, chronic infectious disease (hepatitis B, hepatitis C, HIV, or syphilis), uncorrected hypothyroidism, or B12 hypovitaminosis. As the ADAMANT study evaluated the effect of active immunotherapy, patients with immunodeficiency, clinically relevant autoimmune disease, or current or expected immunosuppressive or immunomodulatory treatment were excluded as well. In terms of disease severity, 105 patients were classified with a CDR score of 0.5, and 89 were classified with a CDR score > 0.5 (see Table 2 for details of the final analysis sample).
Theoretical and statistical composites developed from the CBB performance measures, their rationale, and their computation
DET, Detection Test; IDN, Identification Test; ONB, One Back Test; OCL, One Card Learning Test; AD, Alzheimer’s disease. ¹Equations for each statistically derived composite score were estimated from the study sample. See the Methods section for details on their derivation.
Cogstate brief battery
The CBB was presented to all participants by trained and certified raters using a laptop or tablet computer according to standardized administration procedures and in the participant’s most-used language [6, 8]. At the beginning of each test, written instructions and testing rules were presented on the screen and were also read aloud to the participant. An interactive demonstration of the CBB followed, and practice trials were presented to allow participants to demonstrate their understanding of the rules. Each Cogstate test is presented in the form of a card game and requires participants to respond using a “Yes” or “No” button on each trial, with “Yes” responses made with the right hand. Participants were instructed and trained to “respond as fast and as accurately as possible.” For each test, responses faster than 100 milliseconds (ms) were defined as anticipatory and responses slower than 3000 ms were defined as abnormally slow, with both excluded from further analysis. For each CBB test, the speed and accuracy of performance were computed for each participant. Test performance accuracy was defined as the proportion of correct responses made from the total trials attempted, and test performance speed was defined as the average speed of correct responses. To optimize performance data for use in parametric statistical analyses, distributions of proportion correct scores were transformed using an arcsine square root transformation (arcsine), and distributions of mean reaction times were transformed using a logarithmic base 10 (log10 ms) transformation. In the present study, both speed and accuracy measures from each test were used to define performance, resulting in eight CBB outcome measures. The CBB tests are described below in their order of presentation.
The Detection (DET) test assesses psychomotor function and is based on the simple reaction time test paradigm. Participants must attend to the task in the center of the screen and follow the rule, “Has the card turned face up? Press ‘yes’ as soon as it does”. The task continued until 25 correct responses were recorded or the maximum allowed time (2 min) had elapsed.
The Identification (IDN) test assesses visual attention and is based on a choice reaction time test paradigm. Participants must attend to the card in the center of the screen and follow the rule “Is the face-up card red? As the card turns, press the ‘yes’ or ‘no’ buttons”. This task continued until 35 correct responses had been recorded or the maximum allowed time (2 min) had elapsed.
The One Card Learning Test (OCL) assesses visual learning and is based on a pattern separation paradigm. Participants must attend to the card in the center of the screen and follow the rule “Have you seen this card before in this task? As the card turns, press ‘yes’ or ‘no’”. Participants were required to learn a series of four cards (target cards) presented in pseudo-random order with 6 distracter (i.e., non-repeating) cards in groups of 10 trials for 80 trials or until the maximum time allowed (3 min) had elapsed. This 80-trial version of the OCL (i.e., the OCL80) has now been superseded by a shorter, easier version (OCL48), which has been shown to be more sensitive to AD-related memory impairment [37]. Thus, prior to analysis, the OCL80 accuracy scores were transformed to their equivalent OCL48 accuracy scores using the algorithm described previously [37].
The One-Back (ONB) test assesses working memory and is based on the n-back (1-back condition) paradigm. Participants must attend to the card in the center of the screen and use the rule “Is this card the same as that on the immediately previous trial? As the card turns over, press ‘yes’ or ‘no’”. Forty-two cards were shown in which correct responses occurred on 50% of trials. This task continued until the 42 trials were completed or the maximum allowed time (3 min) had elapsed.
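The scoring rules described for the CBB (exclusion of anticipatory and abnormally slow responses, proportion correct with an arcsine square root transformation, and mean correct reaction time with a log10 transformation) can be illustrated with a short sketch. This is not the Cogstate scoring software: the function name `cbb_outcomes` and the trial format are hypothetical, and two details are assumptions, namely that excluded responses are dropped from the accuracy denominator and that the logarithm is applied to the mean reaction time rather than to each response.

```python
import math

def cbb_outcomes(trials, min_rt_ms=100, max_rt_ms=3000):
    """Score one test from a list of (reaction_time_ms, is_correct) tuples.

    Returns (arcsine-transformed accuracy, log10-transformed speed).
    Hypothetical helper for illustration only.
    """
    # Drop anticipatory (<100 ms) and abnormally slow (>3000 ms) responses.
    valid = [(rt, ok) for rt, ok in trials if min_rt_ms <= rt <= max_rt_ms]
    if not valid:
        raise ValueError("no valid trials")
    correct_rts = [rt for rt, ok in valid if ok]
    # Accuracy: proportion correct among valid trials (assumed denominator),
    # then the arcsine square root transformation.
    prop_correct = len(correct_rts) / len(valid)
    accuracy = math.asin(math.sqrt(prop_correct))
    # Speed: log10 of the mean reaction time of correct responses (assumption).
    if correct_rts:
        speed = math.log10(sum(correct_rts) / len(correct_rts))
    else:
        speed = float("nan")
    return accuracy, speed

# Example with made-up trials: one anticipatory and one slow response are dropped.
example = [(50, True), (400, True), (600, True), (500, False), (3500, True)]
accuracy, speed = cbb_outcomes(example)
```

Applying the same function to each of the four tests would give the eight outcome measures (one speed and one accuracy score per test) analyzed in this study.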
Procedure
All CBB data in this study was drawn from participants enrolled in investigations into biological aspects of AD dementia [33–35]. To complete the CBB, individuals were seated at a desk and the computer/tablet placed in front of them. Participants adjusted their seating position to be comfortable interacting with the testing computer. The test supervisor provided background material on the CBB using standardized instructions. This process continued for each test until all were completed.
Data analyses
Data for the speed and accuracy of performance on each test was computed for each participant, and each participant’s age, sex and years of education completed were recorded. All statistical analyses were undertaken in R version 4.2.0 [38] and proceeded in three stages.
Determination of the effects of age, sex, and education level on CBB measures in healthy older adults
To determine whether age, sex and education level should be controlled in CBB reference data, a series of ordinary least squares linear regression analyses was conducted in which each performance measure was regressed on age (in years), sex (male versus female) and education level (< 13 years versus ≥13 years). Predictions from these regression models and the proportion of variance in performance explained by each predictor, accounting for the effects of the other two (i.e., squared semi-partial correlations), were computed.
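The squared semi-partial correlation for a predictor equals the drop in R² when that predictor is removed from the full model. The sketch below shows that computation in plain Python; it is an illustration under stated assumptions rather than the authors' R code, and the helper names (`solve`, `r_squared`, `semipartial_r2`) and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(X, y):
    """R^2 from an OLS regression of y on the columns of X (intercept added)."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def semipartial_r2(X, y, j):
    """Variance uniquely explained by predictor j: R^2(full) - R^2(without j)."""
    reduced = [[v for i, v in enumerate(r) if i != j] for r in X]
    return r_squared(X, y) - r_squared(reduced, y)

# Toy data (made up): two uncorrelated binary predictors; y depends on the first only.
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
y = [0.0, 1.0, 0.0, 1.0]
unique_first = semipartial_r2(X, y, 0)
unique_second = semipartial_r2(X, y, 1)
```

In the study's setting the columns of `X` would be age, sex, and the dichotomized education level, and `y` one of the eight CBB outcome measures.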
Development of novel CBB composite measures
To identify combinations of CBB performance measures that might improve sensitivity to AD-related cognitive impairment, a series of theoretically defined and statistically derived composite scores were developed from the eight CBB outcome measures. These composite scores, their rationale and computation are summarized in Table 1.
The theoretically defined composites consisted of composite scores that a) have established validity, b) are constructed from speed of performance measures from each test, c) are constructed from the accuracy of performance measures from each test, or d) are constructed from speed and accuracy of the tests shown to be most sensitive to AD-related cognitive impairment. Statistically defined composite scores were developed from the application of dimension reduction techniques to the study data.
Two statistically derived composite scores were developed using exploratory factor analysis (EFA) of speed of performance measures and of accuracy of performance measures. Two other statistically derived composites were created using linear discriminant analysis (LDA) and logistic regression to find the performance measures and their weightings that best discriminate between the CU and total symptomatic AD groups. EFA was conducted on the CU group with the psych R package [39] and used to derive composite measures that optimally explain shared variance in either speed or accuracy performance measures. For this EFA, the maximum loading of each variable was retained, and all other loadings were set to zero before Bartlett’s method [40] was used to calculate the factor weights from which composite measures for each individual were computed. This method was used to ensure that only performance speed measures were included in the speed composite, and only performance accuracy measures were included in the accuracy composite. For each composite, the EFA procedure determined the relative weights of performance measures from each test. The LDA was undertaken using the MASS package [41] with the aim of identifying the linear combination of performance scores that best separated the CU and AD groups. Fisher’s linear discriminant, which finds the linear combination of variables that maximizes the ratio of the between-groups sum of squares to the within-groups sum of squares [42], was calculated and used as a composite measure. Logistic regression was also applied to find a linear combination of all 8 variables to separate the two groups. Here, a binary outcome variable “isAD” (0 = CU group, 1 = combined AD group) was regressed on the 8 outcome measures, giving the log odds that an observation was AD. The log odds for each case were then used as a composite score.
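To make the idea of a discriminant-based composite concrete, the following sketch computes Fisher's linear discriminant for two groups measured on two variables, using the closed form w = S_w⁻¹(m₁ − m₀), where S_w is the pooled within-group scatter matrix. It is an illustration only: the study used the R MASS package on all eight outcome measures, whereas this example restricts itself to two made-up variables so that the 2 × 2 inverse can be written out by hand. The function names are hypothetical, and MASS scales its coefficients differently, so only the direction of the weights is comparable.

```python
def fisher_weights_2d(group0, group1):
    """Fisher's discriminant weights for two groups of (x1, x2) observations."""
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        # Within-group sums of squares and cross-products around the group mean.
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    m0, m1 = mean(group0), mean(group1)
    a0, b0, c0 = scatter(group0, m0)
    a1, b1, c1 = scatter(group1, m1)
    # Pooled within-group scatter matrix S_w = [[a, b], [b, c]].
    a, b, c = a0 + a1, b0 + b1, c0 + c1
    det = a * c - b * b
    d0, d1 = m1[0] - m0[0], m1[1] - m0[1]
    # w = S_w^{-1} (m1 - m0), using the explicit inverse of a 2x2 matrix.
    return ((c * d0 - b * d1) / det, (-b * d0 + a * d1) / det)

def composite(point, w):
    """Composite score: projection of one observation onto the discriminant."""
    return w[0] * point[0] + w[1] * point[1]

# Made-up data: the groups differ on the first variable only.
group0 = [(0, 0), (1, 0), (0, 1), (1, 1)]
group1 = [(3, 0), (4, 0), (3, 1), (4, 1)]
w = fisher_weights_2d(group0, group1)
scores0 = [composite(p, w) for p in group0]
scores1 = [composite(p, w) for p in group1]
```

Because the weights are estimated from the same sample in which discrimination is then evaluated, composites of this kind are expected to look favorable in that sample, which is why they serve here as an upper reference point for the theoretically defined composites.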
Known groups validity: Discrimination ability of individual and composite CBB measures for cognitive impairment associated with MCI and AD-dementia
Three statistical approaches were used to ascertain the discrimination ability of the CBB individual measures and statistical and theoretical composites for AD-related cognitive impairment. Analyses of demographic factors identified only age and sex to have non-trivial influence on CBB performance. As such, the total AD and CU groups were matched on age, t(188) = 0.896, p = 0.372, and sex, χ2(1) = 1.16, p = 0.28, and comparisons were made without adjustment for demographic variables.
First, for each measure, Glass’s Δ was calculated to provide scale-free effect sizes reflecting the magnitude of impairment in mean performance in the AD groups. The statistical significance of these differences was determined using a series of Welch’s t-tests. Second, a receiver operating characteristic (ROC) analysis was undertaken, and the area under the curve (AUC) and its associated 95% confidence interval [44] were computed so that the ability to identify AD-related cognitive impairment could be compared and ranked between the individual and composite CBB measures [45]. Third, to provide clinical context for these analyses, the discriminability of each individual outcome and composite was investigated under a condition where the criterion for impairment was performance ≥1 standard deviation unit below the mean in the CU group [6, 47]. Assuming normality of the CU data, this criterion yields a specificity of approximately 84% (the proportion of a normal distribution lying above −1 SD).
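The three quantities named in this paragraph can be sketched in a few lines of standard-library Python. Glass's Δ scales the group mean difference by the control-group standard deviation; the AUC is computed here through its Mann-Whitney interpretation (the probability that a randomly chosen impaired participant scores worse than a randomly chosen control); and the specificity expected from a fixed z-score cutoff follows from the normal cumulative distribution function. The function names and example scores are hypothetical, lower scores are assumed to indicate worse performance, and the study itself used the pROC R package.

```python
import statistics

def glass_delta(control, clinical):
    """Mean difference (clinical minus control) in control-group SD units."""
    diff = statistics.mean(clinical) - statistics.mean(control)
    return diff / statistics.stdev(control)

def auc(control, clinical):
    """Probability that a clinical score is lower than a control score.

    Ties count as one half (Mann-Whitney formulation of the ROC AUC).
    """
    wins = sum(
        1.0 if x < c else 0.5 if x == c else 0.0
        for x in clinical
        for c in control
    )
    return wins / (len(clinical) * len(control))

def expected_specificity(z_cutoff=-1.0):
    """Share of a normal control distribution scoring above the cutoff."""
    return 1.0 - statistics.NormalDist().cdf(z_cutoff)

# Made-up scores for illustration.
control_scores = [1, 2, 3, 4, 5]
clinical_scores = [0, 1, 2]
delta = glass_delta(control_scores, clinical_scores)
area = auc(control_scores, clinical_scores)
spec_at_1sd = expected_specificity(-1.0)
```

The sign of Δ depends on the direction of subtraction; the tables in this paper adjust the sign so that positive values indicate worse performance in the clinical group.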
To determine the validity of this criterion for false positive classification of AD-related cognitive impairment, an ‘optimal’ criterion for classifying AD-related cognitive impairment was also computed for each composite with reference to Youden’s J index: the threshold at which the sum of sensitivity and specificity is maximized. Sensitivity and specificity were then calculated for both these separate criteria/cutoffs. These analyses were undertaken using the pROC R package [48] and were conducted separately for the MCI subgroup (i.e., CU versus MCI groups), and for the AD-dementia subgroup (CU versus AD-dementia groups). To facilitate comparison of the different outcome measures, scores that discriminated AD-related cognitive impairment from CU with an AUC > 0.7 were considered as acceptable, > 0.8 were considered excellent, and > 0.9 were considered outstanding [45].
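A minimal sketch of the Youden's J search and the AUC interpretation bands described above follows. It assumes that lower scores indicate worse performance and that a participant is classified as impaired when the score is at or below the cutoff; candidate cutoffs are the observed scores, which differs slightly from pROC (it places thresholds midway between adjacent observed values). The function names, the "below acceptable" label, and the example data are hypothetical.

```python
def youden_threshold(control, clinical):
    """Return (cutoff, sensitivity, specificity) that maximizes Youden's J.

    J = sensitivity + specificity - 1; impaired means score <= cutoff.
    """
    best = None
    for cutoff in sorted(set(control) | set(clinical)):
        sens = sum(x <= cutoff for x in clinical) / len(clinical)
        spec = sum(x > cutoff for x in control) / len(control)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]

def auc_label(auc_value):
    """Interpretation bands used in this study for AUC values."""
    if auc_value > 0.9:
        return "outstanding"
    if auc_value > 0.8:
        return "excellent"
    if auc_value > 0.7:
        return "acceptable"
    return "below acceptable"  # label not used in the paper; added for completeness

# Made-up scores for illustration.
control_scores = [5, 6, 7, 8, 9]
clinical_scores = [1, 2, 3, 6]
cutoff, sensitivity, specificity = youden_threshold(control_scores, clinical_scores)
```

Because the Youden cutoff is chosen from the same data it is evaluated on, the resulting sensitivity and specificity are optimistic relative to a prespecified rule such as the 1 SD criterion, which is one reason both are reported side by side.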
Control of Familywise Type I error
Familywise Type I error rate for the multiple comparisons made in the present study was controlled by setting alpha for statistical tests of comparisons at 0.01, for three reasons. First, performance on the CBB individual and composite outcome measures is moderately correlated, so criteria adjusting Type I error on the assumption of independence (e.g., Bonferroni or Holm corrections) would be too conservative. Second, measures of effect size were computed for all comparisons and effects, and trivial effects (i.e., d < 0.2, sr2 < 0.01; [49]) were not interpreted, irrespective of their statistical significance. Third, this is an important and developing area of medicine where there is a need for effective clinical and research tools and in which findings will be challenged in further studies. Thus, setting alpha at 0.01 provides a good balance between the probability of Type I and Type II errors [50].
RESULTS
Missing data
In the CU group, 5,001 adults began the CBB, with 2.6% of cases (n = 130) failing to provide complete data on all four tests in the one session. Non-completion rates were 0.52% for the Detection test, 0.84% for the Identification test, 0.78% for the One Back test, and 1.28% for the One Card Learning test. For the AD group, 194 adults began the CBB, with 5.15% (n = 10) of cases failing to provide complete data for all four tests in the one session. Participants with missing data for any test were removed from analyses. The final CU, MCI, and AD-dementia groups with complete CBB data are described in Table 2.
Demographic characteristics of participants by diagnosis group
N-miss, number of missing observations; MMSE, Mini-Mental State Examination; CDR, Clinical Dementia Rating.
Effects of age, sex, and education level on CBB performance in CU older adults
Ordinary least squares linear regressions were conducted on both the speed and accuracy measures for each CBB test with age in years at time of assessment, sex, and years of education as predictor variables. The results are summarized in Supplementary Table 1, and Supplementary Table 2 contains univariate and bivariate descriptive statistics for the CU sample.
Each regression analysis was statistically significant with the total amount of variation explained ranging from 2% to 3% (see adjusted R2 values in Supplementary Table 1). With the effects of sex and education level controlled, statistically significant effects of age were observed for CBB speed and accuracy measures on each CBB test, with performance accuracy becoming worse and performance speed becoming slower as age in years increased (Fig. 1). With the variation in age and education level controlled, sex was associated systematically with performance speed, with males responding faster on each test. However, the variation explained by sex was uniformly small (i.e., 0 to 1.2%). With variation due to age and sex controlled, there was no strong or systematic effect of education on test performance. Where a statistically significant relationship was observed, the magnitude of the effect was trivial; for example, the largest amount of variance explained by education was for OCL accuracy (0.5%), followed by ONB accuracy (0.2%) and DET speed (0.2%). Thus, only age and sex were included as covariates in group comparisons. Figure 1 visualizes the age and sex effects for both accuracy and speed of performance for each test.

Predicted effects for age and sex (male, solid lines; female, dotted lines) following statistical control of the effect of education for speed and accuracy of performance on each CBB test. Shading represents 95% confidence intervals. DET, Detection; IDN, Identification test; ONB, One Back test; OCL, One Card Learning test.
Statistical derivation of composite scores from CBB performance data
The six theoretically defined composites were described in the data analyses section. To derive the statistical composites, principal-axis EFA with oblimin rotation was used in the CU sample to determine the latent factor structure underlying the CBB. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (KMO = 0.69) and Bartlett’s test of sphericity (χ2(28) = 10166.13, p < 0.001) confirmed the data was appropriate for EFA. The method agreement procedure [51] was used to determine the number of factors to extract. Two factors were extracted as the 2-factor solution was most favored (by 35.71% of methods; Optimal coordinates, Acceleration factor, Parallel analysis, Kaiser criterion, SE Scree). Supplementary Table 3 shows rotated factor loadings, the communality, and the factor weights for each variable. Factor 1 loaded on the performance speed measure of each test, and factor 2 loaded on the performance accuracy measure of each test. These two factors were uncorrelated (r = 0.06) and accounted for 40.91% of the total variance of the original data (Factor 1 = 27.17%, Factor 2 = 13.74%). The solution exhibited simple structure as there were no strong cross-loadings among the factors (range: –0.18 to 0.23). These cross-loadings were then set to 0 before Bartlett’s method [40] was used to create factor weights, ensuring each variable would only contribute to one factor. These factor weights were then used to calculate a composite score for each participant on each factor, labelled as ‘Factor analysis speed composite’ and ‘Factor analysis accuracy composite’.
The LDA examined the linear combination (via Fisher’s linear discriminant; [42]) of the eight test outcomes that best separated the combined MCI and AD-dementia group from the CU group. Each participant’s score on this linear combination was then calculated and used as a composite score. The logistic regression sought a linear combination of all eight variables (plus an intercept) that separated the AD and CU groups and provided the log odds that an observation was AD. These log odds were then used as the composite score. The coefficients, both unstandardized and standardized, for the linear combination of variables defining both the LDA and logistic regression composite measures are shown in Supplementary Table 4. The standardized coefficients reveal that ONB speed and accuracy, OCL speed and accuracy, and (to a lesser extent) DET accuracy were the most important measures in the construction of both the LDA and logistic regression composites, although the relative importance of these measures differed between the two composites. Finally, the density distributions of each composite measure by diagnosis group are provided in Supplementary Figure 1.
Known groups validity: Discrimination ability of individual and composite CBB performance measures in MCI and AD dementia
Prior to analyses comparing performance between the CU and AD groups, the extent to which education influenced performance in the AD group was checked by comparing performance between groups with < 13 years and 13 or more years of education. No statistically significant effect of education was identified in these analyses (see Supplementary Table 5). The ability of each individual CBB measure, and of the theoretically and statistically derived composite scores, to classify cognitive impairment in CDR 0.5 and CDR > 0.5 is summarized in Fig. 2 and Tables 3 and 4. The AUC for each outcome compared provides a reference for ranking the discrimination ability of these different measures (Fig. 2).

Ability to discriminate cognitive impairment in MCI (blue triangles) or AD dementia (purple dots) from CU for individual CBB measures and the theoretically defined and statistically defined composite scores, measured by the Area Under the Receiver Operating Characteristics Curve (AUC). Error bars represent 95% confidence intervals.
Discrimination ability of CBB measures to cognitive decline in Alzheimer’s Disease (CDR > 0.5)
Crit, Criterion; Sens, Sensitivity; Spec, Specificity. ‘Youden’s J’ and ‘1 SD threshold’ refer to the criterion, sensitivity, and specificity, when applying an optimal decision rule via Youden’s J index (Youden’s J) or a prespecified 1 SD deficit decision rule (1 SD Threshold). Values in square brackets are 95% confidence intervals. *Sign for estimate of effect size adjusted so that positive values indicate performance in dementia group worse than controls and vice versa.
Discrimination ability of CBB measures to cognitive decline in Mild Cognitive Impairment (CDR 0.5)
Crit, Criterion; Sens, Sensitivity; Spec, Specificity. ‘Youden’s J’ and ‘1 SD threshold’ refer to the criterion, sensitivity, and specificity, when applying an optimal decision rule via Youden’s J index (Youden’s J) or a prespecified 1 SD deficit decision rule (1 SD Threshold). Values in square brackets are 95% confidence intervals. *Sign for estimate of effect size adjusted so that positive values indicate performance in MCI group worse than controls and vice versa.
For the individual CBB test measure, AUC values greater than 0.8 were observed only for the accuracy of performance on the ONB and OCL tests. As expected, AUCs for both tests were higher in the AD-dementia group than for the MCI group. For individual or composite CBB measures that reflected performance speed, AUCs were low and equivalent in the MCI and AD dementia groups. Of the theoretically defined composites, those that included accuracy measures from both the OCL and ONB tests provided the highest AUCs, with the Learning/Working Memory (LWM) composite providing the best discriminability in both MCI and AD dementia groups. Adding accuracy outcomes from the IDN or DET tests did not improve the ability of composites to discriminate AD-related cognitive impairment (i.e., the Total Error or Average Accuracy composites). Similarly, there was no increase in the ability of accuracy-based composites to discriminate AD-related cognitive impairment with incorporation of speed of performance measures from the same tests (i.e., ONB Speed/Accuracy, OCL Speed/Accuracy or the OCL and ONB Speed/Accuracy composites). Of the statistically derived composites, the Logistic regression composite provided the highest AUC, although this was not greater than that from the theoretically defined LWM composite (Fig. 2). The LDA composite also had excellent ability to detect cognitive impairment MCI and AD dementia. While this high discriminability also extended to the EFA derived composite of performance accuracy, the EFA derived measure of performance speed did not discriminate groups much better than that of the theoretically defined speed composite (Average speed). For both speed-based composites, the ability to detect cognitive impairment was similar for MCI and AD-dementia groups.
The data in Table 3 allow the performance of the individual and composite CBB measures in AD dementia to be considered in more detail. First, group differences on all individual and composite outcomes, except speed of performance on the Detection and Identification tests, were statistically significant. Second, as expected, the individual and composite outcome measures providing the highest AUCs (Fig. 2) also showed the largest magnitudes of impairment from comparison of group means (i.e., Glass’s Δ), with some AD-dementia group means being three or more standard deviation units below that of the matched CU group (e.g., the logistic regression and LDA composites). Based on the magnitude of impairment alone, measures with the greatest sensitivity to AD-related cognitive impairment were the logistic regression and LDA composites, an outcome expected because their composition was based on optimization of performance measures in the present sample that best separated the groups when combined. The theoretically defined LWM and Total Error composites also showed magnitudes of impairment close to three SD units. Finally, individual and composite CBB outcomes that reflected only speed of performance showed magnitudes of impairment in the AD-dementia group of approximately one SD below that of the matched CU adults (average speed composites and factor analysis speed composite). With respect to estimates of sensitivity and specificity, accuracy on the OCL and ONB tests provided the greatest sensitivity (> 88%) when using an ‘optimal’ threshold, although this sensitivity was reduced slightly when specificity was held at approximately 85% using a threshold of 1 SD impairment from the CU group mean. Of the theoretically defined composites, the LWM composite provided the greatest sensitivity and specificity, although composites based only on CBB error scores provided strong sensitivity to AD-dementia-related cognitive impairment.
Of the statistically derived composites, the logistic regression and LDA composites provided the greatest sensitivity to AD-related cognitive impairment while retaining high specificity in both cases.
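The two decision rules compared above can be illustrated with a minimal sketch. It assumes scores on which higher values indicate better performance and classifies a score as impaired when it falls at or below the cutoff; the function names and the data are hypothetical and are not drawn from the study samples.

```python
from statistics import mean, stdev

def sens_spec(cu_scores, ad_scores, cutoff):
    """Sensitivity and specificity when scores <= cutoff are called impaired."""
    sensitivity = sum(s <= cutoff for s in ad_scores) / len(ad_scores)
    specificity = sum(s > cutoff for s in cu_scores) / len(cu_scores)
    return sensitivity, specificity

def youden_cutoff(cu_scores, ad_scores):
    """Search observed scores for the cutoff maximising J = sens + spec - 1."""
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(cu_scores) | set(ad_scores)):
        sens, spec = sens_spec(cu_scores, ad_scores, cutoff)
        j = sens + spec - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

def one_sd_cutoff(cu_scores):
    """Prespecified rule: 1 SD below the CU group mean."""
    return mean(cu_scores) - stdev(cu_scores)

# Hypothetical accuracy-type scores (higher = better); not study data.
cu = [1.10, 1.05, 0.98, 1.02, 0.95, 1.08, 0.90, 1.00, 0.85, 1.12]
ad = [0.70, 0.82, 0.75, 0.88, 0.60, 0.92, 0.78, 0.65]

j_cut, j = youden_cutoff(cu, ad)
sd_cut = one_sd_cutoff(cu)
print("Youden's J rule:", j_cut, sens_spec(cu, ad, j_cut))
print("1 SD rule:", round(sd_cut, 3), sens_spec(cu, ad, sd_cut))
```

Because the Youden cutoff is tuned to the sample at hand, it can be expected to give sensitivity at least as favorable as a prespecified rule in that same sample, which is one reason the two sets of estimates are reported side by side.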
The pattern of magnitudes of impairment, sensitivity/specificity, and AUC estimates for the individual and composite CBB outcomes in the MCI group was quantitatively reduced but qualitatively similar to that observed for AD dementia (Table 4). Of the individual CBB tests, differences in performance between the MCI and CU groups were statistically significant for all measures except for speed on the DET and IDN tests, with the magnitude of these differences between 0.56 and 2.05 SD units. Magnitudes of impairment in the MCI group were greatest for the LWM composite score, although the magnitude of impairment on all composites that included OCL and ONB accuracy measures was statistically significant, with impairment greater than 2 SD units even when these composites included performance speed measures from the same tests. Group differences for the speed measures from individual tests, or when combined as speed composites, did not yield large magnitudes of impairment. Again, as expected, the logistic regression and LDA composites provided very large magnitudes of impairment in the MCI group. Of these outcomes, the greatest sensitivity to MCI-related cognitive impairment was observed for the ONB accuracy score, although estimates for the OCL accuracy score, the LWM composite, the Total Error composite, and the logistic regression and LDA composites were also close to 0.8. In general, sensitivity estimates were reduced when specificity was controlled using the ≥1 SD impairment criterion, although under this criterion, estimates of specificity remained relatively high (> 79%) for the LWM composite and the logistic regression and LDA composites.
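For readers less familiar with the two summary statistics used throughout Tables 3 and 4, the following sketch shows how Glass’s Δ and the AUC can be computed from two groups of scores. It assumes higher scores indicate better performance and uses the rank-based (Mann-Whitney) definition of the AUC; the function names and data are hypothetical, and the study’s own software and confidence interval methods are not reproduced here.

```python
from statistics import mean, stdev

def glass_delta(cu_scores, patient_scores):
    """Group mean difference scaled by the CU (control) SD.
    Positive values mean the patient group performs worse."""
    return (mean(cu_scores) - mean(patient_scores)) / stdev(cu_scores)

def auc(cu_scores, patient_scores):
    """Probability that a randomly chosen CU score exceeds a randomly
    chosen patient score, counting ties as one half."""
    wins = 0.0
    for c in cu_scores:
        for p in patient_scores:
            if c > p:
                wins += 1.0
            elif c == p:
                wins += 0.5
    return wins / (len(cu_scores) * len(patient_scores))

# Hypothetical scores (higher = better); not study data.
cu = [1.0, 1.1, 0.9, 1.2, 0.8]
mci = [0.7, 0.9, 0.6, 0.8]

print("Glass's delta:", round(glass_delta(cu, mci), 2))
print("AUC:", auc(cu, mci))
```

Scaling by the control group SD alone, rather than a pooled SD, keeps the effect size expressed in units of normal variation, which matches the “SD units below the CU group” phrasing used in the text.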
DISCUSSION
The results show that, in older CU adults and adults with symptomatic AD, the CBB measures of visual learning and working memory have the greatest sensitivity to cognitive impairment associated with early symptomatic AD. The study was conducted in older adults whose medical and cognitive status had been defined using the rigorous inclusion/exclusion criteria required for AD clinical trials and, as they were not participants in prospective studies of AD, the older adults studied here were naive to the CBB. As such, assessments on the CBB are like the clinical assessments made of older adults who seek help for their cognitive abilities. The ability to discriminate AD-related cognitive impairment from normal cognition was compared between individual performance speed and accuracy measures for each CBB test, as well as for their combination in theoretically- and statistically-derived composite scores (Table 1, Fig. 2). Within this context, accuracy of performance on the OCL and ONB tests, individually or combined according to theoretical models or statistical optimization, showed the greatest ability to discriminate CU from MCI (i.e., CDR = 0.5) and dementia (i.e., CDR > 0.5) patient groups (Tables 3 and 4). As has been reported previously [13, 28], individual and composite measures of psychomotor function and attention were relatively unimpaired in both MCI and AD-dementia groups, and integration of performance speed measures into learning and working memory composite scores did not increase the ability of those composites to discriminate CU from AD-related cognitive impairment (Fig. 2). Together, these results suggest the CBB measures that have been used most commonly as markers of cognitive impairment in individuals at risk for AD dementia remain the most appropriate for that use [6].
The finding that accuracy of performance on the working memory and visual learning tests provides the best discrimination of MCI and AD-dementia from CU adults is consistent with earlier CBB data from the AIBL study, which also found accuracy measures from these tests, alone or combined into a learning and working memory composite score, to be the most sensitive to cognitive impairment [6, 52]. The sensitivity of the Learning/Working Memory composite to cognitive impairment in MCI due to AD in the present study is also consistent with findings from a Mayo Clinic Study of Aging (MCSA) cohort, where the same composite yielded a very large magnitude of impairment (Hedges’ g = 2.12) in a relatively small group of individuals with MCI due to AD (i.e., n = 15) compared to CU older adults with amyloid and tau PET levels within normal limits (n = 146; [7]). This large impairment magnitude yielded a 93% sensitivity and 79% specificity for classification of MCI due to AD when the classification criterion was derived from the MCSA AD and CU subsamples through application of Youden’s J index. This sensitivity for classification of MCI due to AD was reduced (73%) when specificity was controlled by requiring impairment be classified by performance ≥1 SD below older age-stratified normative means for Cogstate tests. In the current MCI sample, the sensitivity to cognitive impairment was also reduced for most composites when specificity was controlled using the ≥1 SD impairment criterion compared to the optimal criteria identified from application of Youden’s index (Table 4). It is possible that the reduction in sensitivity resulting from application of older Cogstate normative data may have been due to performance variability in those data being inflated by inclusion of performance from unsupervised or incomplete assessments [7].
In this study, the potential for uncontrolled variance in reference data to influence estimates of sensitivity was reduced through use of data from a large group of CU older adults who had satisfied rigorous inclusion/exclusion criteria and who completed all four CBB tests. Further, as exploratory analyses showed that sex accounted for some variance in normative performance beyond that accounted for by age, we ensured that the CU and AD groups were matched on both age and sex. With this precision operating, the strong ability of CBB learning and working memory tests to discriminate AD-related cognitive impairment suggests that normative data for the CBB should also be restricted to a homogeneous group and organized according to both age and sex.
The current study also investigated the extent to which the ability of the CBB to detect AD-related cognitive impairment could be improved by exploiting different combinations of the speed and accuracy performance measures from the four CBB tests. By comparing different combinations of these scores in the same sample, a framework for understanding how such outcomes relate to one another and to the nature of cognitive impairment in AD was developed. For both the MCI and the AD-dementia group, impairment in accuracy of performance on the learning and working memory tests was substantial and was increased further by their combination (Tables 3 and 4). While speed of performance on the learning and working memory tests was also impaired in both groups, the magnitude was considerably less than that observed for accuracy measures (Table 3). However, this raised the possibility that inclusion of performance speed into learning and working memory composites could improve their ability to detect AD-related cognitive impairment. This was not the case, at least when speed measures were given equal weighting in the composite calculation. For example, compared to the Learning/Working Memory composite computed from performance accuracy measures only, learning and working memory composites that included measures of performance speed showed less impairment, lower sensitivity, and a lower AUC for identification of cognitive impairment in MCI and AD-dementia (Fig. 2). The low sensitivity of CBB performance speed measures is consistent with neuropsychological models of early AD, which emphasize that psychomotor and simple attentional processes remain relatively unaffected in early symptomatic AD [46, 53]. These data are also consistent with findings from multiple independent cohorts that CBB measures of performance speed, alone or combined into composite scores, are not impaired in early biomarker-defined AD [7–9, 54].
This finding suggests therefore that etiologies other than, or in addition to, AD, should be considered in older adults whose performance on the CBB is characterized by impairment on both learning/working memory and attention/psychomotor function composites or on the individual tests from which they are derived.
The very large sample studied here provided an opportunity to challenge the extent to which CBB composite scores, computed using equal weighting of their component measures, could be improved through accommodation of any shared variance. First, separate factor analyses of the CBB speed and accuracy measures indicated that single factors could describe each aspect of performance in CU data. However, comparison between CU and AD groups on these factors indicated that none provided magnitudes of impairment or AUCs larger than the theoretically defined composites of speed or accuracy computed using equal weighting of performance measures (i.e., Total Errors, Average Accuracy, or Average Speed; Tables 3 and 4). Thus, equal weighting of CBB scores was sufficient for generation of composite scores able to detect both AD-dementia and MCI-related cognitive impairment. This provides the additional benefit that the composites retain a stronger theoretical link to the tests themselves and to the psychological paradigms upon which they are based [8, 55]. The second statistical approach applied logistic regression and LDA to CBB performance data from the CU and AD samples to identify the combination of tests, and their relative weightings, that provided the greatest discrimination between groups. As expected, composite scores derived using this method provided the most accurate discrimination of AD from CU groups (Tables 3 and 4). However, given their potential for sample dependence, the component performance scores identified, and their exact weightings, should be considered exploratory or hypothetical and their validity then challenged in other AD groups. Of relevance to the current analyses, though, is the extent to which the performance measures identified, and their weightings, were different from those used in the theoretically defined composites.
Interestingly, the logistic regression and LDA methods provided different weightings for different performance measures, even though their discrimination ability was equivalent. Both techniques weighted ONB and OCL accuracy, as well as OCL speed, most heavily. They also included speed of performance on the ONB and DET tests with much smaller, yet still relevant, weightings. However, while the composite derived from the logistic regression weighted ONB and OCL equivalently, in the composite identified from the LDA, ONB accuracy had the greatest weighting (by a factor of ∼2). As the sensitivity of these different statistical composites was similar, the exact weighting of these individual measures (e.g., ONB accuracy, OCL accuracy) may not be important, provided they remain within a wide optimal range.
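The contrast between equal and unequal weighting can be made concrete with a small sketch. It assumes, for illustration only, that each component measure is standardized against the CU mean and SD before being combined as a weighted mean of z-scores; the measure names, reference values, and weights are hypothetical and are not the weights estimated in this study.

```python
from statistics import mean, stdev

def standardize(value, cu_values):
    """z-score of one score relative to the CU reference distribution."""
    return (value - mean(cu_values)) / stdev(cu_values)

def composite(person, cu_reference, weights=None):
    """Weighted mean of z-scores; equal weights when `weights` is None."""
    measures = list(cu_reference)
    if weights is None:
        weights = {m: 1.0 for m in measures}
    total = sum(weights[m] for m in measures)
    weighted_sum = sum(
        weights[m] * standardize(person[m], cu_reference[m]) for m in measures
    )
    return weighted_sum / total

# Hypothetical CU reference data and one individual's scores (higher = better).
cu_reference = {"onb_acc": [1.2, 1.3, 1.4], "ocl_acc": [0.9, 1.0, 1.1]}
person = {"onb_acc": 1.1, "ocl_acc": 0.9}

equal = composite(person, cu_reference)
# Illustrative unequal weighting: ONB accuracy weighted twice as heavily.
onb_heavy = composite(person, cu_reference, {"onb_acc": 2.0, "ocl_acc": 1.0})
print(round(equal, 2), round(onb_heavy, 2))
```

When the component measures are impaired to a similar degree, doubling one weight shifts the composite only modestly, which is in keeping with the observation that the exact weighting may matter little within a wide range.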
The Learning/Working Memory composite score has been applied in other studies also aiming to determine its ability to classify cognitive impairment in individuals with early symptomatic AD. In general, where MCI is due to AD (e.g., due to the presence of abnormal amyloid biomarkers), estimates of discriminability (e.g., AUC = 0.90, Fig. 1) and group mean difference are high (e.g., Glass’s Δ = 2.05, Table 4). However, where the etiology of MCI is unclear, or is likely to reflect other non-AD processes, sensitivity of the Learning/Working Memory composite to MCI-related cognitive impairment is reduced [5, 43]. For example, a magnitude of impairment of 0.97 was observed for the Learning/Working Memory composite in adults with MCI of an undefined etiology from the MCSA cohort. Impairment of a similar magnitude was observed for the Learning/Working Memory composite in a small MCI group, with undefined etiology, from the ADNI study (0.71). While the lower-than-expected sensitivity may threaten the diagnostic accuracy of the CBB for MCI (p. 584; [7]), such results cannot be generalized to models of the sensitivity of the CBB to AD-related cognitive impairment. For example, the current and past studies show that in MCI due to AD there is substantial impairment on the Learning/Working Memory composite and relatively little impairment on the Psychomotor/Attention composite [6, 52]. Furthermore, in clinically classified MCI not due to AD (i.e., where amyloid levels are within normal limits), impairment on the Psychomotor/Attention composite can become greater than, or equal to, that in prodromal AD [5, 52]. Given the epidemiological design of the MCSA sample, it is likely that many of the participants classified with prevalent MCI, who performed the CBB, did not have AD, although this needs to be confirmed using biomarkers in the future. Furthermore, in the MCSA both the CU and MCI groups are likely to contain individuals with high rates of comorbid CNS disease.
For example, published estimates of comorbidities in the MCSA show high rates of history of stroke (15%), cardiovascular disease (33%), hypertension (79%) and atrial fibrillation (16%; [56]), and analyses of representative sub-samples of the MCSA show high levels of myocardial infarction, coronary arterial bypass grafting, diabetes, obesity and depression [56–58]. Each of these conditions is associated with cognitive impairment in older adults, often manifesting as a decline in psychomotor function and attention, and is therefore excluded in specific AD studies such as AIBL, or in AD clinical trials such as A4 and ADAMANT, in which there is particular emphasis on removing individuals with, or at increased risk of, vascular cognitive impairment or dementia [31, 60]. Furthermore, despite variation in geographical, linguistic, or socioeconomic characteristics, methods for recruitment and screening into clinical trials do deliver homogeneous study samples. In the current study this is seen in the equivalence in the average age and distributions of sex in the A4 and ADAMANT samples, and in the observation that education history did not influence CBB performance in either sample.
Evidence that the cognitive impairment observed in the MCSA prevalent MCI group reflected the impact of non-AD biology is also seen in the impairment on the Psychomotor/Attention composite (g = 0.88) being almost as large as that observed for the Learning/Working Memory composite (g = 0.97). Impairment on the Psychomotor/Attention composite was also greater than the 0.19 standard deviation impairment observed in the Aβ+ MCI group in this study, and greater than the 0.5 standard deviation impairment observed in the AIBL cohort [6, 52]. Furthermore, comparison of CBB performance between MCSA MCI groups with normal and abnormal AD biomarkers indicated that while the Learning/Working Memory composite was substantially worse in the prodromal AD group (g = 1.43), the Psychomotor/Attention composite was worse in the non-AD MCI group (g = 0.22). Thus, while it is important for epidemiological studies to understand the sensitivity of computerized tests to screen for cognitive impairment in older adults more generally, a more appropriate approach to exploring the sensitivity of the CBB to prevalent MCI in the MCSA sample would be to allow classification of abnormality on either the Learning/Working Memory or the Psychomotor/Attention composite, or both. Such an approach would assist in understanding the ability of individual or composite measures from the CBB to detect cerebrovascular disease, either alone or in combination with AD.
One final and important point is that performance measures from the CBB should never be used to “diagnose” MCI (p.56) [7, 61]. While cognitive impairment is a necessary condition for MCI, it is not sufficient. Additional criteria such as acknowledgement by the individual, or a confidant, that their cognition has become worse over time, and that functional activities of daily living remain unimpaired are required [62, 63]. Furthermore, when these criteria are met and AD is suspected, it becomes necessary to rule out non-AD factors that explain cognitive impairment, such as cerebrovascular disease, serious systemic disease with direct or indirect CNS effects or mood disorders. In the current study, and in related studies from AIBL, no claims that the CBB can diagnose MCI or AD dementia have been made.
Taken together, the data from the current study indicate that accuracy of performance on the CBB tests of visual learning and working memory can detect AD-related CNS disruption and that, with care, these can be applied to assist with clinical decisions about the presence of cognitive impairment in individuals at risk for the disease. Importantly, the data here are drawn from tightly controlled experimental studies and will therefore require further validation in community samples of older adults at risk of AD. Such studies are underway. As the data in this study support a recommendation from the MCSA researchers that there be greater precision in the CBB normative data, it will now be important that community studies and ongoing research studies gain access to such data. One issue to investigate further is the extent to which the presence of AD biomarkers in the normative sample influenced decisions about cognitive impairment. While in CU adults, abnormal levels of AD biomarkers (i.e., preclinical AD) are associated with cognitive decline, cross-sectional studies comparing cognition between adults with preclinical AD and CU adults with AD biomarkers in normal limits show there to be no, or only very small, differences (e.g., ds ≤ ∼0.2). While there is evidence that removal of CU adults with occult AD from normative samples can increase age-stratified normative means and reduce the estimates of variation associated with those means, these effects are generally very small, and operate to the greatest extent in older (i.e., > 75 years) adults [64, 65]. While the cognitive effects of AD biomarkers in CU adults are important to models of AD development, their small magnitude suggests their presence will have limited impact on clinical decision-making [9, 66].
For the current CU sample, AD biomarker levels were not considered, because the aim was to obtain as large a normative sample as possible and because each CU individual had met rigorous medical and cognitive inclusion/exclusion criteria. Now that the individual and composite performance scores from the CBB have been developed, future research can be conducted in CU samples to determine the extent to which normative cut-scores are influenced by AD biology. Such investigations can also be extended to other clinical trial data in symptomatic AD to provide replication and potentially further refinement of the CBB outcomes and their sensitivity and specificity to AD-related cognitive impairment. Data from placebo groups from AD clinical trials can also be harnessed to examine the stability, reliability, and sensitivity to change of the theoretically- and statistically-derived composite scores developed here. Finally, it is now important to conduct studies with similar designs to this one in older individuals with cognitive impairment arising from etiologies other than, or in addition to, AD.
Footnotes
ACKNOWLEDGMENTS
We thank all participants for their participation in the A4 and ADAMANT studies, the data from which were used for the analyses in this paper.
FUNDING
The ADAMANT study was funded by AXON Neuroscience SE. The Anti-Amyloid Treatment in Asymptomatic Alzheimer’s Disease (A4) study was funded through a Public-Private-Philanthropic Partnership with funding and/or in-kind support from the National Institute on Aging, Eli Lilly and Co., Avid, Cogstate, Accelerating Medicines Partnership, Alzheimer’s Association, GHR Foundation, an anonymous foundation, and additional philanthropic organizations. YY Lim is funded by an NHMRC Career Development Fellowship (GNT1162645) and an NHMRC Emerging Leadership Investigator Grant (GNT2009550). The authors have no other funding to report.
CONFLICT OF INTEREST
JW, AS, and PM are/were full-time employees of Cogstate Ltd, the company that distributes the Cogstate Brief Battery. PN, CPG, MO, and SK were employees of AXON at the time that these data were collected. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
DATA AVAILABILITY
Data from the analyses included in this study are available from the authors upon written request.
