Abstract
This study investigated the utility of four WAIS-IV Digit Span (DS) indices (traditional Reliable Digit Span [RDS], RDS-Working Memory [RDS-WM], RDS-Revised [RDS-R], and DS Age-Corrected Scaled Score [ACSS]) as embedded performance validity tests (PVTs) among a sample of 342 consecutive adults referred for neuropsychological evaluation of ADHD. All DS indices had acceptable classification accuracy (areas under the curve: .73–.76) for detecting invalid performance with optimal cut-scores of RDS ≤7 (35% sensitivity/93% specificity), RDS-WM ≤7 (56% sensitivity/86% specificity), RDS-R ≤12 (48% sensitivity/85% specificity), and ACSS ≤7 (46% sensitivity/87% specificity). Although all indices were able to detect invalid performance, DS indices incorporating the more complex working memory trials of the task yielded the best accuracy for identification of invalid test performance among adults referred for ADHD evaluation.
Introduction
Attention-Deficit/Hyperactivity Disorder (ADHD) is a prevalent psychiatric disorder that commonly results in referral for neuropsychological evaluation (Faraone & Biederman, 2005; Kessler et al., 2006; Schroeder, Martin, & Walling, 2019). Although ADHD is a frequent referral question, its accurate assessment is often complicated by concerns about invalid test performance, especially given the potential external incentives associated with a diagnosis, such as access to psychostimulant medication and academic/standardized testing accommodations (e.g., Teter et al., 2005). ADHD is also highly heterogeneous in terms of associated cognitive sequelae (e.g., Pievsky & McGrath, 2018), thereby leading to variability in neuropsychological performance and difficulty establishing standards for adequate test engagement and the presence of self-reported symptoms, particularly in adults (Barkley et al., 2011; Faraone et al., 2004; Suhr et al., 2008). Prior research reported high base rates of performance invalidity (i.e., 20–50%; Martin & Schroeder, 2020; Suhr et al., 2008; Sullivan et al., 2007) and exaggerated symptom reporting (e.g., 21%; Leib et al., 2021) among adults referred for ADHD evaluation. These findings underscore the importance of performance validity testing in this population (Hirsch & Christiansen, 2018; Marshall et al., 2010; Resch et al., 2021), mirroring more general practice standards calling for routine administration of performance validity tests (PVTs) during all neuropsychological evaluations (Sweet et al., 2021).
The use of both freestanding and embedded PVTs not only ensures that test results are an accurate representation of a patient’s true cognitive abilities, but also increases confidence in conclusions drawn from such evaluations in clinical and research settings. Embedded PVTs may have particular utility given their ability to assess validity at numerous time points throughout an evaluation without inflating testing time (Boone, 2009). Among embedded PVTs, Reliable Digit Span (RDS; Greiffenstein et al., 1994) is one of the most heavily researched, commonly used, and well-validated (Boone, 2007; Martin et al., 2015). RDS was originally developed using the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III; Wechsler, 1997) Digit Span (DS) subtest, which only included Forward and Backward trials. RDS was later revised for the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV; Wechsler, 2008) to include the newly added Sequencing trial (RDS-R). Although RDS and, to a lesser extent, RDS-R and the DS age-corrected scaled score (ACSS) have been extensively cross-validated in a multitude of clinical populations (e.g., Schroeder et al., 2012; Webber & Soble, 2018), there is a relative paucity of research examining DS PVTs specifically among adult ADHD populations. This is surprising considering that DS is frequently administered as a test of attention/working memory or as part of the full WAIS-IV IQ testing during ADHD evaluations. As such, this study aimed to examine the utility of DS indices as embedded PVTs among a large sample of adults referred for ADHD evaluation. Specifically, it aimed to compare the classification accuracy and corresponding sensitivity/specificity of these indices, determine optimal cut-scores, and assess which index most accurately detects performance invalidity among adult ADHD referrals.
Method
Participants
This cross-sectional study examined data from a consecutive academic medical center sample of 348 adults who were referred for neuropsychological evaluation of possible ADHD by their treating physician or psychiatrist and who consented to the inclusion of their data for research. Six patients were missing a criterion PVT and were thereby excluded, resulting in a final sample of 342. The sample had a mean age of 27.8 years (SD = 6.67), education of 15.7 years (SD = 2.06), and Test of Premorbid Functioning (TOPF; Pearson, 2009) estimated Full-Scale IQ of 105.88 (SD = 7.75). Sex distribution was 38% male (N = 130) and 62% female (N = 212). The sample had the following ethnoracial composition: Non-Hispanic White (N = 159; 46%), Hispanic (N = 84; 25%), Black (N = 48; 14%), Asian (N = 32; 9%), and other race/ethnicity (N = 19; 6%). All patients identified English as their primary language and completed evaluation procedures in English. Sixty-eight percent (N = 231) were college students. Eighty-five percent were classified into the valid group (N = 290) and 15% into the invalid group (N = 52) based on performance on seven independent criterion PVTs (see Criterion PVT section below). Among the 290 in the valid group, 235 met formal diagnostic criteria for ADHD (see ADHD Diagnostic Workup section below).
Measures
ADHD Diagnostic Workup
All patients received a detailed and multimodal assessment to determine if they met Diagnostic and Statistical Manual of Mental Disorders-Fifth Edition (DSM-5; APA, 2013) criteria for ADHD. This workup included: 1) a thorough chart/record review (including previous testing and academic records, if available); 2) a semi-structured clinical interview that assessed DSM-5 criteria and concomitant mental disorders, as well as psychosocial and developmental history; 3) the Clinical Assessment of Attention Deficit-Adult (CAT-A; Bracken & Boatwright, 2005), an inventory of childhood and adulthood ADHD symptoms with embedded symptom validity scales (Leib et al., 2021; White, Ovsiew, et al., 2020); 4) a uniform, comprehensive neuropsychological test battery; and 5) the Minnesota Multiphasic Personality Inventory-Second Edition-Restructured Form (Ben-Porath & Tellegen, 2008).
Criterion PVTs
All patients were administered a battery of seven freestanding and embedded PVTs that previously have been cross-validated in ADHD samples or ADHD-predominant mixed clinical samples, including the Dot Counting Test (Abramson et al., 2021; Boone et al., 2002); Rey 15-Item Test Recall/Recognition (Ashendorf et al., 2021); Rey Auditory Verbal Learning Test Effort Score (Pliskin et al., 2021); Brief Visuospatial Memory Test-Revised Recognition Discrimination (Bailey et al., 2018; Resch et al., 2020); Stroop Test Word Reading T-Score (White, Korinek, et al., 2020); Trail Making Test-Part A (White, Korinek, et al., 2020); and Verbal Fluency (F/A/S; White, Korinek, et al., 2020). Consistent with current empirical findings and neuropsychology practice standards, those with ≤1 criterion PVT failures were classified into the valid group and those with ≥2 failures were assigned to the invalid group (e.g., Critchfield et al., 2019; Jennette et al., 2021; Larrabee, 2014; Rhoads et al., 2021; Schroeder, Martin, Heinrichs, & Baade, 2019; Soble et al., 2020).
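The grouping rule described above (≤1 criterion PVT failure is valid, ≥2 failures is invalid) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the dictionary keys, and the example patient are hypothetical, and the pass/fail decision for each criterion PVT is assumed to have been made already using that test's own published cut-score.

```python
from typing import Mapping


def classify_validity(pvt_failed: Mapping[str, bool]) -> str:
    """Assign a validity group from criterion PVT outcomes.

    `pvt_failed` maps each criterion PVT label to True if the patient
    failed it. Per the rule in the text, <=1 failure is 'valid' and
    >=2 failures is 'invalid'.
    """
    n_failures = sum(1 for failed in pvt_failed.values() if failed)
    return "invalid" if n_failures >= 2 else "valid"


# Hypothetical patient: one failure across the seven criterion PVTs.
example_patient = {
    "DCT": False,
    "RFIT": True,
    "RAVLT ES": False,
    "BVMT-R RD": False,
    "Stroop WR": False,
    "TMT-A": False,
    "FAS": False,
}
example_group = classify_validity(example_patient)
```

A single isolated failure leaves the patient in the valid group. This is the reason the rule requires at least two failures before performance is classified as invalid.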
Digit Span PVTs
As part of their comprehensive neuropsychological evaluation, all patients were administered the WAIS-IV DS subtest. Four previously identified DS PVTs were examined in this study: (1) traditional RDS (Forward/Backward), (2) RDS-WM (Backward/Sequencing), (3) RDS-R (Forward/Backward/Sequencing), and (4) DS ACSS.
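The three RDS-based indices differ only in which DS conditions are summed. The sketch below assumes the conventional RDS scoring rule (the longest digit string for which both trials were passed in a condition, summed across conditions; Greiffenstein et al., 1994), which is not spelled out in this section. The data layout, function names, and example responses are hypothetical. DS ACSS is omitted because it requires the WAIS-IV normative tables.

```python
from typing import Dict, Tuple

# span length -> (trial 1 correct, trial 2 correct); layout is an assumption
Trials = Dict[int, Tuple[bool, bool]]


def longest_reliable_span(trials: Trials) -> int:
    """Longest span length at which BOTH trials were passed (0 if none)."""
    passed = [length for length, (t1, t2) in trials.items() if t1 and t2]
    return max(passed, default=0)


def digit_span_indices(forward: Trials, backward: Trials,
                       sequencing: Trials) -> Dict[str, int]:
    """Compute RDS, RDS-WM, and RDS-R from trial-level DS data."""
    f = longest_reliable_span(forward)
    b = longest_reliable_span(backward)
    s = longest_reliable_span(sequencing)
    return {
        "RDS": f + b,        # Forward + Backward
        "RDS-WM": b + s,     # Backward + Sequencing
        "RDS-R": f + b + s,  # Forward + Backward + Sequencing
    }


# Hypothetical responses for one examinee.
forward = {3: (True, True), 4: (True, True), 5: (True, False), 6: (False, False)}
backward = {2: (True, True), 3: (True, True), 4: (False, True)}
sequencing = {2: (True, True), 3: (True, True), 4: (True, True), 5: (False, False)}
indices = digit_span_indices(forward, backward, sequencing)
```

In this example a span length counts only when both trials at that length are correct, so the single correct 5-digit Forward trial does not raise the Forward contribution above 4.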
Data Analytic Plan
Spearman correlations evaluated associations among the DS indices and criterion PVTs in the valid group. Descriptive analyses were calculated for each DS PVT, and independent samples t-tests examined significant performance differences between the validity groups. Receiver operating characteristic (ROC) curve analyses determined the classification accuracy for each DS PVT, and optimal cut-scores that maximized sensitivity/specificity were selected for each DS PVT with a statistically significant area under the curve (AUC). Classification accuracy (AUCs) was interpreted as low (.50–.69), acceptable (.70–.79), excellent (.80–.89), or outstanding (≥.90; Hosmer et al., 2013). Optimal cut-scores were defined in this study as those that maximized sensitivity for invalid detection while maintaining minimally accepted specificity levels (i.e., >84% but preferably closer to >90%; Boone, 2013; Larrabee, 2003). All analyses were first conducted for the overall sample (i.e., all adult ADHD referrals), and then replicated among the subsample of patients who met DSM-5 diagnostic criteria for ADHD to investigate whether the four embedded DS PVTs functioned similarly in a more homogeneous ADHD sample.
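The cut-score selection logic described above can be illustrated with a short standard-library sketch. It is not the authors' analysis code (ROC analyses are typically run in a statistics package): the function names are hypothetical, the scores are made-up toy data, and the AUC is computed with the rank-based (Mann-Whitney) formulation. A score at or below the cut is treated as a PVT failure, and the specificity floor follows the ">84%" criterion in the text.

```python
from typing import Optional, Sequence, Tuple


def auc(valid: Sequence[float], invalid: Sequence[float]) -> float:
    """P(random invalid score < random valid score), ties counted as 0.5."""
    wins = 0.0
    for inv in invalid:
        for val in valid:
            if inv < val:
                wins += 1.0
            elif inv == val:
                wins += 0.5
    return wins / (len(valid) * len(invalid))


def interpret_auc(value: float) -> str:
    """Descriptive bands from the text (Hosmer et al., 2013)."""
    if value >= 0.90:
        return "outstanding"
    if value >= 0.80:
        return "excellent"
    if value >= 0.70:
        return "acceptable"
    return "low"


def sens_spec(valid: Sequence[float], invalid: Sequence[float],
              cut: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score <= cut is flagged as invalid."""
    sensitivity = sum(s <= cut for s in invalid) / len(invalid)
    specificity = sum(s > cut for s in valid) / len(valid)
    return sensitivity, specificity


def optimal_cut(valid: Sequence[float], invalid: Sequence[float],
                specificity_floor: float = 0.84
                ) -> Optional[Tuple[float, float, float]]:
    """Cut with the highest sensitivity among those with specificity > floor."""
    best = None
    for cut in sorted(set(valid) | set(invalid)):
        sn, sp = sens_spec(valid, invalid, cut)
        if sp > specificity_floor and (best is None or sn > best[1]):
            best = (cut, sn, sp)
    return best


# Toy scores only; NOT data from this study.
toy_valid = [8, 9, 9, 10, 10, 11, 7, 12, 9, 10]
toy_invalid = [6, 7, 7, 8, 9]
toy_auc = auc(toy_valid, toy_invalid)
toy_best = optimal_cut(toy_valid, toy_invalid)
```

Raising the cut can only increase sensitivity and decrease specificity, so the selected cut is the most liberal one that still clears the specificity floor.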
Results
Intercorrelations between Performance Validity Tests among the Valid Group.
Note. n = 290. DCT: Dot Counting Test; RFIT: Rey 15-Item Test/Recognition; FAS: Verbal Fluency Test; RAVLT ES: Rey Auditory Verbal Learning Test Effort Score; BVMT-R RD: Brief Visuospatial Memory Test-Revised Recognition Discrimination; TMT-A: Trail Making Test-Part A; RDS: Reliable Digit Span Forward and Backward; RDS-WM: RDS Backward and Sequencing; RDS-R: RDS Forward, Backward, and Sequencing; ACSS: Digit Span Age-Corrected Scaled Score.
*p < .05.
**p < .01.
Performance on Digit Span Embedded Validity Indicators by Validity Group.
Note. RDS: Reliable Digit Span Forward and Backward; RDS-WM: RDS Backward and Sequencing; RDS-R: RDS Forward, Backward, and Sequencing; ACSS: Digit Span Age-Corrected Scaled Score.
***p < .001.
Accuracy of the Digit Span Embedded Validity Indicators for the Overall Sample.
Note. N = 342. RDS: Reliable Digit Span Forward and Backward; RDS-WM: RDS Backward and Sequencing; RDS-R: RDS Forward, Backward, and Sequencing; ACSS: Digit Span Age-Corrected Scaled Score; AUC: area under the curve; SN: sensitivity; SP: specificity; PPV: positive predictive value; NPV: negative predictive value. Bolded scores denote optimal cut-scores for detecting invalid performance.
***p < .001.
Accuracy of the Digit Span Embedded Validity Indicators for the Subsample Diagnosed with Attention-Deficit/Hyperactivity Disorder.
Note. N = 235. RDS: Reliable Digit Span Forward and Backward; RDS-WM: RDS Backward and Sequencing; RDS-R: RDS Forward, Backward, and Sequencing; ACSS: Digit Span Age-Corrected Scaled Score; AUC: area under the curve; SN: sensitivity; SP: specificity; PPV: positive predictive value; NPV: negative predictive value. Bolded scores denote optimal cut-scores for detecting invalid performance.
***p < .001.
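The table notes above refer to positive and negative predictive values (PPV, NPV), which depend on the base rate of invalid performance as well as on sensitivity and specificity. The sketch below shows the standard Bayes-rule relationship. The function name is hypothetical, and the example inputs (56% sensitivity and 86% specificity for RDS-WM ≤7, with the 15% invalid base rate of the overall sample) are taken from the text only for illustration. Values computed this way may differ slightly from tabled values because the inputs are rounded.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      base_rate: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a given base rate of invalid performance."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    true_neg = specificity * (1.0 - base_rate)
    false_neg = (1.0 - sensitivity) * base_rate
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative inputs drawn from the reported (rounded) statistics.
ppv, npv = predictive_values(sensitivity=0.56, specificity=0.86, base_rate=0.15)
```

Because PPV falls as the base rate falls, the same cut-score yields fewer trustworthy positive findings in settings where invalid performance is rarer than in this sample.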
Discussion
This study cross-validated four previously identified PVTs derived from the WAIS-IV DS subtest in a large sample of adults referred for neuropsychological evaluation of ADHD. Among the total sample, results revealed that all four DS PVTs (i.e., RDS, RDS-WM, RDS-R, DS ACSS) reliably differentiated valid from invalid performance and demonstrated acceptable classification accuracies (see Table 3). Among the subsample of patients who met DSM-5 diagnostic criteria for ADHD, all four DS PVTs significantly distinguished valid from invalid performance and produced nearly identical classification accuracy statistics to the overall sample. Therefore, these results indicate that these cut-scores can be used in the evaluation of suspected ADHD. Importantly, given the high correlations between these four DS scores, they cannot be used as four independent PVTs as this may inflate the false positive rate (Boone, 2013). Rather, only one DS score (and preferably the one with the strongest psychometrics) should be selected as the PVT. Interestingly, although all four DS PVTs were able to detect performance invalidity, the traditional RDS PVT evidenced the least robust psychometric properties when compared to RDS-WM and RDS-R, even at a higher cut-score of ≤7, suggesting that the latter embedded DS PVTs may be more advantageous in detecting suboptimal performance when examining a patient with suspected ADHD. Finally, although the DS ACSS showed adequate detection of performance invalidity, it may ultimately be less useful in actual clinical practice, as the optimal cut-score represents a score that falls in the low average range (i.e., ≤7ss), which may lead to difficulties in distinguishing suboptimal engagement from a genuine weakness on this measure (see Ovsiew et al., 2020). 
Overall, the results from the current study are broadly consistent with research regarding the utility of RDS indices in the detection of invalid performance, as findings from previous studies resulted in similar classification accuracy across RDS indices (Babikian et al., 2006). However, previous findings were mixed regarding the sensitivity/specificity of individual RDS measures, indicating further research is needed to establish consistent guidelines regarding the use of individual RDS indices as PVTs (Webber & Soble, 2018; Webber et al., 2020; Young et al., 2012).
Methodological strengths of the current study included the utilization of a demographically diverse sample and the use of a standardized and multimodal diagnostic assessment protocol to facilitate the accurate diagnosis of ADHD. Additionally, the study analyses were replicated with the subsample of patients who met DSM-5 diagnostic criteria for ADHD to establish that the four DS PVTs functioned comparably regardless of whether patients met ADHD diagnostic criteria. Finally, a standardized battery of diverse and well-validated criterion PVTs was used to detect suboptimal neuropsychological performance, which maximized the accuracy of our validity classifications and optimized internal validity (Bilder et al., 2014). Despite these strengths, results should be interpreted in the context of a few limitations. Many of the criterion PVTs used to ascertain patients’ validity status were embedded measures, which are often less psychometrically robust than freestanding PVTs (e.g., Bain et al., 2021; Ovsiew et al., 2020; Pliskin et al., 2021). However, the seven independent criterion PVTs administered as a part of this neuropsychological battery evidenced adequate psychometric properties in other cross-validation studies. Considering research indicating that aggregating validity indices improves their capacity to detect invalidity (e.g., Larrabee, 2008), a high rate of false positive or false negative classification errors is unlikely in the current investigation. Further, corroborating informant reports of ADHD symptoms were largely unavailable; however, this is not uncommon in assessment of adult ADHD. Finally, although the sample’s mean estimated premorbid FSIQ (i.e., 105.88) closely approximated the population mean, the mean educational attainment (i.e., 15.7 years) of the sample was relatively high. Thus, study findings may not generalize to individuals with ADHD and lower educational attainment, and replication among an ADHD sample with greater educational diversity is warranted.
In sum, results of this study emphasize the clinical utility of using RDS as an embedded PVT in neuropsychological evaluations for ADHD. Findings further suggest that RDS-WM and RDS-R, although less commonly used than RDS and DS ACSS, may better detect invalid test performance given their greater sensitivity (56% and 48%, respectively) at acceptable specificity (86% and 85%, respectively). Because the WAIS-IV is already commonly used in ADHD evaluations (Woods et al., 2002), RDS indices (particularly RDS-WM and RDS-R) can easily be incorporated as additional PVTs as a part of a comprehensive neuropsychological assessment.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
