Abstract
Embedded performance validity tests (PVTs), like Digit Span PVTs from Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV), offer a valuable means of evaluating validity without extending administration time. This study investigated the utility of novel indices of performance inconsistency for WAIS-IV Digit Span (DS IRs) in the detection of invalid performance among 705 adults referred for ADHD evaluation. Results showed DS IR indices were inadequate in classifying overall validity status (areas under the curve = 0.52–0.59). Predictably, four established Digit Span PVTs effectively distinguished between valid and invalid performance score groups (areas under the curve = 0.74–0.78) with 32–49% sensitivity and 86–93% specificity at optimal cut-scores. Overall, individuals with noncredible performance scores did not differ significantly from those with valid scores regarding performance inconsistency on WAIS-IV Digit Span.
Introduction
Attention-deficit/hyperactivity disorder (ADHD) is among the most common referrals for neuropsychological evaluation (Rabin et al., 2016). Assessing for ADHD in adults comes with its own set of challenges, including the nonspecific nature of attentional complaints and difficulty obtaining relevant data (e.g., collateral reports of childhood functioning and accessing relevant school records). Moreover, adult ADHD assessment in educational settings is complicated by potentially powerful external incentives, such as access to psychostimulant prescriptions and standardized testing accommodations (Ovsiew et al., 2023). Predictably, a significant portion of adults undergoing ADHD evaluation display invalid cognitive test results (20–50%; Martin & Schroeder, 2020; Hirsch & Christiansen, 2018) and exaggerated symptom reporting (16–22%; Ovsiew et al., 2023). As such, assessment of validity via multiple performance validity tests (PVTs) is an essential component of every ADHD evaluation, as is the case for all neuropsychological evaluations (Sweet et al., 2021).
Embedded PVTs are particularly useful due to their ability to assess validity throughout a neuropsychological evaluation without increasing assessment time and burden (Boone, 2009) and their strong diagnostic accuracy in discerning genuine from feigned ADHD symptomology (Jasinski et al., 2011). A commonly used and extensively cross-validated embedded PVT is Reliable Digit Span (RDS; Greiffenstein et al., 1994), which was originally derived from Wechsler Adult Intelligence Scale-Revised (WAIS-R; Wechsler, 1981) Digit Span (DS) subtest and was most recently revised using the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV; Wechsler, 2008). Presently, four primary DS PVTs exist: RDS (DS Forward/Backward), RDS-Working Memory (RDS-WM; Backward/Sequencing), RDS-revised (RDS-R; Forward/Backward/Sequencing), and DS Age-Corrected Scaled Score (ACSS). Most of these DS PVTs have been well-validated in various clinical populations (Resch et al., 2023; Schroeder et al., 2012; Webber & Soble, 2018), including ADHD (e.g., Bing-Canar et al., 2022), and have shown good accuracy for detecting invalid neuropsychological test performance.
Some researchers have also examined how various DS process variables can enhance RDS’s accuracy as a PVT. Most notably, Babikian et al. (2006) posited that DS response latency may serve as an embedded PVT and found a strong inverse relationship between response latency and the likelihood of answering correctly. Accordingly, they concluded that a long delay in repeating numbers on three- and four-digit strings was more sensitive to feigned performance than RDS. Findings such as these highlight the potential utility of identifying and validating other DS metrics that increase its accuracy as an embedded PVT beyond that of traditional RDS scores.
Individuals with ADHD have been described as consistently inconsistent in their performance over time, with this variability often emerging due to testing environments, symptomology, and high rates of comorbidities (Longridge et al., 2019; Marshall et al., 2021). On the DS subtest in particular, patients can inconsistently respond on DS trials yet still obtain a high RDS score, which is based solely on the longest span in which both trials were answered correctly, irrespective of prior trials. Thus, an examinee’s total score may not reflect their level of subtest variability, possibly minimizing their performance deficits or their level of test engagement (i.e., performance invalidity). The present study aimed to determine if consistency of DS performance may function as a novel embedded PVT to detect invalid performance in a clinical sample of adults referred for ADHD evaluation and to compare its accuracy to established DS PVTs.
Method
Participants
Sample Demographics and Performance on Digit Span Indices by Validity Group.
Note. Valid = 0–1 PVT failures; invalid = ≥2 PVT failures.
DSF-I: Digit Span Forward Inconsistency; DSB-I: Digit Span Backward Inconsistency; DSS-I: Digit Span Sequencing Inconsistency; LDS-F: Longest Digit Span Forward; LDS-B: Longest Digit Span Backward; LDS-S: Longest Digit Span Sequencing; DSF-IR: Digit Span Forward Inconsistency Ratio; DSB-IR: Digit Span Backward Inconsistency Ratio; DSS-IR: Digit Span Sequencing Inconsistency Ratio; RDS: Reliable Digit Span (Forward/Backward); RDS-WM: RDS-Working Memory (Backward/Sequencing); RDS-R: RDS-Revised (Forward/Backward/Sequencing); DS ACSS: Digit Span Age-Corrected Scaled Score.
*p < .05.
**p < .01.
***p < .001.
Measures
Criterion Performance Validity Tests
Participants were administered the following five independent PVTs (two freestanding; three embedded) throughout their evaluations, all of which have previously been cross-validated in the context of adult ADHD evaluations: Dot Counting Test (cutoff ≥14; Abramson et al., 2023), Rey 15-Item Test + Recognition (cutoff ≤23; Ashendorf et al., 2021), Stroop Word Reading T-score (cutoff ≤28; Khan et al., 2022), Rey Auditory Verbal Learning Test Effort Score (cutoff ≤12; Phillips et al., 2023), and Trail Making Test Part A T-score (cutoff ≤34; Ashendorf et al., 2017).
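The grouping rule implied by these cutoffs (valid = 0–1 criterion PVT failures; invalid = ≥2 failures, per the table note) can be illustrated with a short sketch. This is not the authors' code: the dictionary keys, function names, and example scores are hypothetical, and only the cutoff values and their directions are taken from the text.

```python
# Cutoffs as reported in the text. A score at or beyond the cutoff, in the
# stated direction, counts as a failure. The key names are hypothetical labels.
CUTOFFS = {
    "dct": (">=", 14),            # Dot Counting Test
    "rfit": ("<=", 23),           # Rey 15-Item Test + Recognition
    "stroop_word_t": ("<=", 28),  # Stroop Word Reading T-score
    "ravlt_es": ("<=", 12),       # RAVLT Effort Score
    "tmt_a_t": ("<=", 34),        # Trail Making Test Part A T-score
}


def count_failures(scores):
    """Count how many of the five criterion PVTs an examinee failed."""
    failures = 0
    for name, (direction, cutoff) in CUTOFFS.items():
        value = scores[name]
        failed = value >= cutoff if direction == ">=" else value <= cutoff
        failures += int(failed)
    return failures


def validity_group(scores):
    """Apply the grouping rule: 0-1 failures = valid, 2 or more = invalid."""
    return "invalid" if count_failures(scores) >= 2 else "valid"


# Made-up examinees for illustration only (not study data).
examinee_a = {"dct": 10, "rfit": 28, "stroop_word_t": 45, "ravlt_es": 15, "tmt_a_t": 30}
examinee_b = {"dct": 16, "rfit": 20, "stroop_word_t": 45, "ravlt_es": 15, "tmt_a_t": 40}

print(validity_group(examinee_a))  # one failure (TMT-A)
print(validity_group(examinee_b))  # two failures (DCT, RFIT)
```

Storing the comparison direction alongside each cutoff keeps the Dot Counting Test, where higher scores indicate failure, from being scored like the other four measures.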
Digit Span PVTs
Participants were administered the WAIS-IV DS subtest. The four established DS PVTs, which have been cross-validated in adult ADHD samples (Bing-Canar et al., 2022), were included in this study: (1) traditional RDS (Forward/Backward), (2) RDS-WM (Backward/Sequencing), (3) RDS-R (Forward/Backward/Sequencing), and (4) DS ACSS. DS inconsistency ratio (IR) scores were calculated by summing the number of times a participant obtained an item score of “1” (i.e., demonstrated an inconsistency error) and dividing this sum by their longest digit span (LDS) in the corresponding DS trial (i.e., Forward; Backward; Sequencing). DS IRs were examined in lieu of raw inconsistency totals for two reasons. First, valid examinees with higher LDS scores have more opportunities for error with each subsequent trial administered before discontinuation and thus would be more likely to be penalized by scoring systems that emphasize the number of inconsistency occurrences rather than the ratio of inconsistencies to items attempted. Second, it was hypothesized that invalid participants would have more frequent instances of inconsistent performance and a lower total LDS score for each DS condition, thereby producing higher DS IRs relative to valid participants.
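The IR calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scoring code: each item is represented as a hypothetical (span length, item score) pair with item scores of 0, 1, or 2; LDS is taken to be the longest span with at least one correct trial; and an examinee with no correct trials is given no ratio. The item sequences are invented and do not reproduce the actual WAIS-IV item structure.

```python
def inconsistency_ratio(items):
    """Return the inconsistency ratio for one Digit Span condition.

    items: list of (span_length, item_score) tuples, where item_score is
           0, 1, or 2 (number of trials passed on that item).
    An item score of 1 is counted as one inconsistency. The count is divided
    by the longest digit span (LDS), assumed here to be the longest span
    length with at least one trial passed.
    """
    inconsistencies = sum(1 for _, score in items if score == 1)
    passed_spans = [span for span, score in items if score >= 1]
    if not passed_spans:
        return None  # assumption: ratio is undefined when no trial was passed
    lds = max(passed_spans)
    return inconsistencies / lds


# Two made-up examinees with the same LDS (6) but different consistency.
inconsistent = [(2, 2), (3, 1), (4, 2), (5, 1), (6, 1), (7, 0)]
consistent = [(2, 2), (3, 2), (4, 2), (5, 2), (6, 1), (7, 0)]

print(inconsistency_ratio(inconsistent))  # 3 inconsistencies / LDS of 6
print(inconsistency_ratio(consistent))    # 1 inconsistency / LDS of 6
```

The two examples share the same longest span yet yield different ratios, which is the kind of within-subtest variability that a span-based score alone would not capture.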
Data Analytic Plan
Spearman correlations evaluated associations among the DS IRs, DS PVTs, and criterion PVTs. Descriptive statistics were calculated for each DS score, and analyses of variance (ANOVAs) tested for significant performance differences between the validity groups. Skewness (<.80) and kurtosis (≤.70) values did not indicate the need for nonparametric tests. The false discovery rate (FDR; Benjamini & Hochberg, 1995) procedure with an FDR of .05 was used to control for multiple comparisons. Receiver operating characteristic (ROC) curve analyses determined the classification accuracy of each DS IR and DS PVT for differentiating valid from invalid participants. Classification accuracy was considered acceptable for ROCs with areas under the curve (AUCs) of ≥ .70 (Hosmer et al., 2013). The optimal cut-score for each DS IR and DS PVT was the score yielding maximal sensitivity while maintaining a minimally acceptable specificity of at least 85%, preferably ≥90% (Boone, 2013; Larrabee, 2003). In addition to overall validity status, ROC curves assessed the accuracy of DS IRs for classifying passing versus failing each previously validated DS PVT (RDS; RDS-WM; RDS-R; DS ACSS).
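The cut-score rule described above (maximize sensitivity subject to specificity of at least 85%) and the AUC can be illustrated with a small, dependency-free sketch. It assumes a measure on which lower scores indicate invalid performance and a failure rule of "score ≤ cutoff"; the function names and toy scores are hypothetical and are not study data or the authors' analysis code.

```python
def sens_spec(valid_scores, invalid_scores, cutoff):
    """Sensitivity and specificity when 'score <= cutoff' is classified as invalid."""
    sensitivity = sum(s <= cutoff for s in invalid_scores) / len(invalid_scores)
    specificity = sum(s > cutoff for s in valid_scores) / len(valid_scores)
    return sensitivity, specificity


def optimal_cutoff(valid_scores, invalid_scores, min_specificity=0.85):
    """Return (cutoff, sensitivity, specificity) with maximal sensitivity among
    cutoffs whose specificity is at least min_specificity, or None if none qualify."""
    best = None
    for cutoff in sorted(set(valid_scores) | set(invalid_scores)):
        sn, sp = sens_spec(valid_scores, invalid_scores, cutoff)
        if sp >= min_specificity and (best is None or sn > best[1]):
            best = (cutoff, sn, sp)
    return best


def auc(valid_scores, invalid_scores):
    """AUC as the probability that a randomly chosen invalid examinee scores
    lower than a randomly chosen valid examinee (ties count as one half)."""
    wins = 0.0
    for i in invalid_scores:
        for v in valid_scores:
            if i < v:
                wins += 1.0
            elif i == v:
                wins += 0.5
    return wins / (len(valid_scores) * len(invalid_scores))


# Toy scores for illustration only.
valid = [7, 8, 9, 10, 10, 11, 11, 12, 12, 13]
invalid = [6, 7, 8, 9, 10]

print(optimal_cutoff(valid, invalid))  # cutoff with its sensitivity and specificity
print(auc(valid, invalid))             # compare against the .70 acceptability threshold
```

For an index on which higher scores are hypothesized to indicate invalid performance, as with the DS IRs, the comparisons would be reversed. In practice a statistics package would normally be used for ROC analysis; the explicit loops here are only meant to make the selection rule visible.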
Results
Correlations Between Performance Validity Tests Among the Valid and Invalid Groups.
Note. n = 609 (valid); n = 96 (invalid). DSF-IR: Digit Span Forward Inconsistency Ratio; DSB-IR: Digit Span Backward Inconsistency Ratio; DSS-IR: Digit Span Sequencing Inconsistency Ratio; RDS: Reliable Digit Span (Forward/Backward); RDS-WM: RDS-Working Memory (Backward/Sequencing); RDS-R: RDS-Revised (Forward/Backward/Sequencing); DS ACSS: Digit Span Age-Corrected Scaled Score; DCT: Dot Counting Test; RFIT: Rey-15 Item Test/Recognition; RAVLT ES: Rey Auditory Verbal Learning Test Effort Score; TMT-A: Trail Making Test-Part A T-score; Stroop: Stroop Word T-score. Correlations above the diagonal are among the valid group and correlations below the diagonal are among the invalid group.
*p < .05.
**p < .01.
Accuracy of the Digit Span Inconsistency Ratios as Embedded Validity Indicators.
Note. n = 705. DSF-IR: Digit Span Forward Inconsistency Ratio; DSB-IR: Digit Span Backward Inconsistency Ratio; DSS-IR: Digit Span Sequencing Inconsistency Ratio; RDS: Reliable Digit Span (Forward/Backward); RDS-WM: RDS-Working Memory (Backward/Sequencing); RDS-R: RDS-Revised (Forward/Backward/Sequencing); DS ACSS: Digit Span Age-Corrected Scaled Score; AUC: area under the curve; SN: sensitivity; SP: specificity. Bolded scores denote optimal cut-scores for detecting invalid performance.
*p < .05.
**p < .01.
***p < .001.
Discussion
Elevated rates of invalid test performance among adults referred for ADHD evaluation underscore the importance of performance validity testing during assessment. This study investigated the utility of consistency of WAIS-IV DS performance as a novel embedded PVT in a large sample of adults referred for neuropsychological evaluation of ADHD. As hypothesized, invalid patients had significantly lower total LDS scores for each DS condition; surprisingly, however, the valid and invalid groups did not differ appreciably in their number of inconsistencies (either ratios or raw totals). Further, none of the three DS IRs differentiated valid from invalid performance with acceptable classification accuracy. Although the Digit Span Backward Inconsistency Ratio (DSB-IR) showed adequate detection of performance invalidity on RDS, this finding is less useful in clinical practice: no evidence was found that DSB-IR provides information above and beyond RDS, so using the two indices in conjunction would be redundant. These results do not support the use of DS IRs as PVTs during clinical assessment for ADHD. By contrast, as expected, all four previously validated DS PVTs significantly distinguished between valid and invalid performances in this sample, indicating that use of one previously cross-validated, embedded PVT from the DS subtest remains a best practice during neuropsychological evaluation of ADHD. Further, from a strictly psychometric standpoint, RDS-WM, RDS-R, and DS ACSS all outperformed RDS and yielded strikingly similar sensitivity (47–49%) and specificity (86–88%) values at their respective optimal cut-scores.
That said, in clinical practice RDS-WM and RDS-R may be more desirable than DS ACSS as the optimal ACSS cutoff (≤7) reflects a normatively low average score and highlights the “invalid before impaired” conundrum that can be seen among PVTs in which a single score conveys information regarding validity status and cognitive function. In short, valid-impaired and invalid performance may be psychometrically indistinguishable (Erdodi & Lichtenstein, 2017).
Investigation of the DS IRs aimed to find an embedded PVT that could be used in addition to established embedded PVTs to more accurately differentiate valid from invalid performance using the DS subtest. Research has demonstrated that the combined use of the four existing, embedded DS PVTs as independent tests of validity increases the risk of a false positive because they are not independent indices (Boone, 2013). This conclusion was replicated by the strong correlations observed between RDS, RDS-WM, RDS-R, and DS ACSS across valid and invalid groups in this study. However, because the DS IRs were not strongly correlated with the existing embedded PVTs, they could have added incremental value to the use of the DS subtest as a PVT, given that aggregating independent PVTs has been shown to improve the accuracy of validity determination (Larrabee, 2008). Unfortunately, their low sensitivity to invalid performance precluded this. Alternatively, patterns of performance on DS examined in other clinical samples (e.g., children with traumatic brain injury and older adults) have demonstrated that inconsistencies may be due to task-fatigue effects during which a participant experiences lower vigilance or attentional arousal with sustained attention or effort (LaBelle et al., 2019; Warschausky et al., 1996). Given the prominence of within-task performance inconsistency in ADHD, it is certainly possible that the DS IRs simply are unable to differentiate inconsistency due to invalid test performance from bona fide momentary fluctuations in executive control seen in ADHD (Friedman et al., 2022; Marshall et al., 2021).
The current study had several methodological strengths, including a demographically diverse sample, use of five independent criterion PVTs that have previously been validated among adult ADHD samples to establish validity groups, and inclusion of all established DS PVTs against which the novel DS IRs could be directly compared. Nevertheless, some study limitations also exist. First, information from third-party sources such as school records or parent reports, which are common in child ADHD evaluations, was unavailable; however, this lack of corroborating information is more common in adult ADHD evaluations (McIntosh et al., 2009). Second, although the sample analyzed was racially diverse, it was on average highly educated (15.86 years), and thus their performance may not be representative of the broader population.
Overall, the results of this study suggested that DS IRs are not useful as PVTs, in sharp contrast to the established DS PVTs, which robustly discriminated valid from invalid performers. It is possible that the DS IRs could be useful measures of another construct relevant to ADHD evaluations given the disorder’s conceptual basis in inconsistent working memory performance, especially when working on tasks that exceed cognitive capacity (Friedman et al., 2022). Thus, further research is needed to determine possible clinical uses of DS IRs. Exploratory research seeking additional convergent and discriminative evidence obtained from measures already commonly administered within assessment batteries is a critical endeavor in establishing efficient and comprehensive psychoeducational evaluation practices.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
