An Examination of WAIS-IV Digit Span Performance Inconsistency as a Novel Embedded Performance Validity Test Among Adults Clinically Referred for Attention-Deficit/Hyperactivity Disorder

Abstract

Embedded performance validity tests (PVTs), such as the Digit Span PVTs from the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV), offer a valuable means of evaluating validity without extending administration time. This study investigated the utility of novel WAIS-IV Digit Span inconsistency ratios (DS IRs) for detecting invalid performance among 705 adults referred for ADHD evaluation. Results showed that DS IRs were inadequate for classifying overall validity status (areas under the curve = 0.52–0.59). As expected, four established Digit Span PVTs effectively distinguished between the valid and invalid performance groups (areas under the curve = 0.74–0.78), with 32–49% sensitivity and 86–93% specificity at optimal cut-scores. Overall, individuals with noncredible performance did not differ meaningfully from those with valid performance in their degree of inconsistency on WAIS-IV Digit Span.

Keywords

neuropsychology, assessment, embedded performance validity, ADHD

Introduction

Attention-deficit/hyperactivity disorder (ADHD) is among the most common referrals for neuropsychological evaluation (Rabin et al., 2016). Assessing for ADHD in adults comes with its own set of challenges, including the nonspecific nature of attentional complaints and difficulty obtaining relevant data (e.g., collateral reports of childhood functioning and school records). Moreover, adult ADHD assessment in educational settings is complicated by potentially powerful external incentives, such as access to psychostimulant prescriptions and standardized testing accommodations (Ovsiew et al., 2023). Predictably, a substantial proportion of adults undergoing ADHD evaluation produce invalid cognitive test results (20–50%; Hirsch & Christiansen, 2018; Martin & Schroeder, 2020) and exaggerated symptom reports (16–22%; Ovsiew et al., 2023). As such, assessment of validity via multiple performance validity tests (PVTs) is an essential component of every ADHD evaluation, as is the case for all neuropsychological evaluations (Sweet et al., 2021).

Embedded PVTs are particularly useful because they assess validity throughout a neuropsychological evaluation without increasing assessment time and burden (Boone, 2009) and have shown strong diagnostic accuracy in discerning genuine from feigned ADHD symptomatology (Jasinski et al., 2011). A commonly used and extensively cross-validated embedded PVT is Reliable Digit Span (RDS; Greiffenstein et al., 1994), which was originally derived from the Digit Span (DS) subtest of the Wechsler Adult Intelligence Scale-Revised (WAIS-R; Wechsler, 1981) and was most recently revised using the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV; Wechsler, 2008). Presently, four primary DS PVTs exist: RDS (DS Forward/Backward), RDS-Working Memory (RDS-WM; Backward/Sequencing), RDS-Revised (RDS-R; Forward/Backward/Sequencing), and the DS Age-Corrected Scaled Score (ACSS). Most of these DS PVTs have been well validated in various clinical populations (Resch et al., 2023; Schroeder et al., 2012; Webber & Soble, 2018), including ADHD (e.g., Bing-Canar et al., 2022), and have shown good accuracy for detecting invalid neuropsychological test performance.

Some researchers have also examined whether various DS process variables can enhance the accuracy of RDS as a PVT. Most notably, Babikian et al. (2006) proposed that DS response latency may serve as an embedded PVT and found a strong inverse relationship between response latency and the likelihood of answering correctly. Accordingly, they concluded that a long delay in repeating three- and four-digit strings was more sensitive to feigned performance than RDS. Such findings highlight the potential utility of identifying and validating other DS metrics that improve its accuracy as an embedded PVT beyond traditional RDS scores.

Individuals with ADHD have been described as consistently inconsistent in their performance over time, with this variability often attributed to testing environments, symptomatology, and high rates of comorbidity (Longridge et al., 2019; Marshall et al., 2021). On the DS subtest in particular, examinees can respond inconsistently across trials yet still obtain a high RDS score, which is based solely on the longest span at which both trials were answered correctly, irrespective of performance on earlier items. Thus, an examinee’s total score may not reflect their degree of within-subtest variability, possibly masking performance deficits or reduced test engagement (i.e., performance invalidity). The present study aimed to determine whether consistency of DS performance can function as a novel embedded PVT for detecting invalid performance in a clinical sample of adults referred for ADHD evaluation and to compare its accuracy with that of established DS PVTs.
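To make this scoring logic concrete, the sketch below computes RDS, RDS-WM, and RDS-R from item-level trial outcomes, using the condition pairings listed above. The record layout (a mapping from span length to a pair of trial outcomes) and the function names are hypothetical, and WAIS-IV start and discontinue rules are not modeled. Because each reliable span is simply the longest span at which both trials were correct, the forward record in the usage example contributes a span of 6 despite two inconsistent items at spans 3 and 5.

```python
from typing import Dict, Tuple

# Hypothetical item-level record for one DS condition:
# span length -> (trial 1 correct, trial 2 correct)
Condition = Dict[int, Tuple[bool, bool]]


def reliable_span(condition: Condition) -> int:
    """Longest span at which BOTH trials were correct; earlier failures are ignored."""
    spans = [span for span, (t1, t2) in condition.items() if t1 and t2]
    return max(spans, default=0)


def rds_variants(fwd: Condition, bwd: Condition, seq: Condition) -> Dict[str, int]:
    """Sum reliable spans across the condition pairings used by each DS PVT."""
    f, b, s = reliable_span(fwd), reliable_span(bwd), reliable_span(seq)
    return {"RDS": f + b, "RDS-WM": b + s, "RDS-R": f + b + s}


if __name__ == "__main__":
    # Forward record with inconsistent items (one trial correct) at spans 3 and 5
    fwd = {2: (True, True), 3: (True, False), 4: (True, True),
           5: (False, True), 6: (True, True), 7: (False, False)}
    bwd = {2: (True, True), 3: (True, True), 4: (True, False), 5: (False, False)}
    seq = {2: (True, True), 3: (True, True), 4: (True, True),
           5: (True, False), 6: (False, False)}
    print(rds_variants(fwd, bwd, seq))  # {'RDS': 9, 'RDS-WM': 7, 'RDS-R': 13}
```

The two inconsistent forward items leave no trace in any of the three totals, which is the gap the inconsistency ratios examined here were intended to fill.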

Method

Participants

This cross-sectional study examined data from a large consecutive series of 780 adults referred for ADHD evaluation at a university academic medical center from 2018 to 2024. All participants underwent a comprehensive diagnostic evaluation, which included record review, completion of a detailed history questionnaire, a clinical interview by a board-certified clinical neuropsychologist, administration of a focused neuropsychological test battery, and self-report measures of ADHD and psychopathology. All participants consented to the retention of their evaluation data in an ongoing, IRB-approved neuropsychological database repository, data from which have been published previously (e.g., Bing-Canar et al., 2022; Scimeca et al., 2021); however, because the database is deidentified and data collection is continuous, the degree of sample overlap with prior studies could not be determined. Participants missing a criterion PVT (n = 50) or for whom DS inconsistency data were not coded in the database (n = 25) were excluded, yielding a final study sample of 705 (see Table 1 for sample demographics). In brief, the sample was ethnoracially diverse and composed primarily of undergraduate or graduate/professional students. Validity groups were formed from five independent criterion PVTs (see below): participants who failed 0–1 PVTs were classified as valid, and those who failed ≥2 PVTs were classified as invalid (Jennette et al., 2022).

Table 1.

Sample Demographics and Performance on Digit Span Indices by Validity Group.

	Valid (n = 609)		Invalid (n = 96)
	M (SD)	Range	M (SD)	Range	F	ηp²
Age (years)	28.33 (7.10)	18–60	28.06 (7.58)	18–55	0.12	.000
Education (years)	15.96 (2.05)	9–20	15.26 (2.37)	8–20	9.27**	.013
	N	%	N	%	χ²
Sex					1.74
Male	236	39%	44	46%
Female	373	61%	52	54%
Race/ethnicity					11.50*
Non-Hispanic White	269	44%	30	31%
Hispanic	142	23%	32	33%
Non-Hispanic Black	90	15%	22	23%
Asian/Pacific Islander	67	11%	8	9%
Other	41	7%	4	4%
Student status					4.98
Undergraduate	195	32%	28	29%
Graduate/professional	228	37%	28	29%
Not current student	186	31%	40	42%
Past diagnosis of ADHD					1.43
No	436	72%	63	66%
Yes	173	28%	33	34%
Digit Span Score/Index	M (SD)		M (SD)		F	ηp²
DSF-I	1.01 (0.88)		1.11 (0.89)		1.21	.002
DSB-I	1.07 (0.93)		0.80 (0.72)		7.44**	.010
DSS-I	1.10 (0.95)		0.98 (0.97)		1.34	.002
LDS-F	6.86 (1.25)		6.03 (1.12)		37.74***	.051
LDS-B	5.18 (1.38)		4.09 (1.13)		54.06***	.071
LDS-S	6.15 (1.24)		5.13 (1.18)		57.73***	.076
DSF-IR	0.14 (0.12)		0.18 (0.14)		7.12**	.010
DSB-IR	0.20 (0.16)		0.19 (0.17)		0.19	.000
DSS-IR	0.17 (0.14)		0.17 (0.15)		0.08	.000
RDS	10.30 (2.14)		8.47 (1.86)		62.38***	.082
RDS-WM	9.53 (1.97)		7.75 (1.75)		69.53***	.090
RDS-R	15.54 (2.75)		12.80 (2.49)		83.97***	.107
DS ACSS	10.46 (2.72)		7.72 (3.32)		87.35***	.111

Note. Valid = 0–1 PVT failures; invalid = ≥2 PVT failures.

DSF-I: Digit Span Forward Inconsistency; DSB-I: Digit Span Backward Inconsistency; DSS-I: Digit Span Sequencing Inconsistency; LDS-F: Longest Digit Span Forward; LDS-B: Longest Digit Span Backward; LDS-S: Longest Digit Span Sequencing; DSF-IR: Digit Span Forward Inconsistency Ratio; DSB-IR: Digit Span Backward Inconsistency Ratio; DSS-IR: Digit Span Sequencing Inconsistency Ratio; RDS: Reliable Digit Span (Forward/Backward); RDS-WM: RDS-Working Memory (Backward/Sequencing); RDS-R: RDS-Revised (Forward/Backward/Sequencing); DS ACSS: Digit Span Age-Corrected Scaled Score.

*p < .05.

**p < .01.

***p < .001.
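Because each comparison in Table 1 involves two groups and N = 705, the ANOVAs have df = (1, 703), and each partial eta squared can be recovered from its F value as ηp² = (F × df_effect) / (F × df_effect + df_error). The short check below assumes one-way between-groups ANOVAs with those degrees of freedom; under that assumption it reproduces the tabled effect sizes for education (.013), RDS (.082), and DS ACSS (.111). The function name is hypothetical.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Two validity groups -> df_effect = 1; N = 705 -> df_error = 705 - 2 = 703
TABLE_1_F = {"Education": 9.27, "RDS": 62.38, "DS ACSS": 87.35}

if __name__ == "__main__":
    for label, f in TABLE_1_F.items():
        print(f"{label}: {partial_eta_squared(f, 1, 703):.3f}")
```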

Measures

Criterion Performance Validity Tests

Participants were administered the following five independent PVTs (two freestanding; three embedded) throughout their evaluations, all of which have previously been cross-validated in the context of adult ADHD evaluations: Dot Counting Test (cutoff ≥14; Abramson et al., 2023), Rey 15-Item Test + Recognition (cutoff ≤23; Ashendorf et al., 2021), Stroop Word Reading T-score (cutoff ≤28; Khan et al., 2022), Rey Auditory Verbal Learning Test Effort Score (cutoff ≤12; Phillips et al., 2023), and Trail Making Test Part A T-score (cutoff ≤34; Ashendorf et al., 2017).
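The group-assignment rule described under Participants (0–1 criterion PVT failures = valid; ≥2 = invalid) can be sketched as follows, using the cutoffs listed above. The dictionary keys and function names are hypothetical, and the sketch assumes that each criterion score is already on the metric to which its published cutoff applies. Among these five criteria, the Dot Counting Test is the only one on which higher scores indicate failure.

```python
from typing import Dict, Tuple

# Criterion PVT cutoffs as listed in this article.
# "ge": failure at or above the cutoff; "le": failure at or below it.
CRITERION_CUTOFFS: Dict[str, Tuple[str, float]] = {
    "dot_counting": ("ge", 14),
    "rey15_plus_recognition": ("le", 23),
    "stroop_word_t": ("le", 28),
    "ravlt_effort_score": ("le", 12),
    "tmt_a_t": ("le", 34),
}


def count_failures(scores: Dict[str, float]) -> int:
    """Number of criterion PVTs failed. A missing score raises KeyError,
    mirroring the exclusion of participants missing a criterion PVT."""
    failures = 0
    for name, (direction, cutoff) in CRITERION_CUTOFFS.items():
        score = scores[name]
        failed = score >= cutoff if direction == "ge" else score <= cutoff
        failures += int(failed)
    return failures


def validity_group(scores: Dict[str, float]) -> str:
    """0-1 failures -> 'valid'; 2 or more failures -> 'invalid'."""
    return "invalid" if count_failures(scores) >= 2 else "valid"


if __name__ == "__main__":
    example = {"dot_counting": 10, "rey15_plus_recognition": 26,
               "stroop_word_t": 45, "ravlt_effort_score": 14, "tmt_a_t": 30}
    print(count_failures(example), validity_group(example))  # 1 valid
```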

Digit Span PVTs

Participants were administered the WAIS-IV DS subtest. The four established DS PVTs, which have been cross-validated in adult ADHD samples (Bing-Canar et al., 2022), were included in this study: (1) traditional RDS (Forward/Backward), (2) RDS-WM (Backward/Sequencing), (3) RDS-R (Forward/Backward/Sequencing), and (4) DS ACSS. DS inconsistency ratio (IR) scores were calculated separately for each DS condition (Forward, Backward, and Sequencing) by counting the number of items on which a participant obtained an item score of “1” (i.e., passed only one of the two trials, an inconsistency error) and dividing this count by the participant’s longest digit span (LDS) in that condition. DS IRs were examined in lieu of raw inconsistency totals for two reasons. First, valid examinees with higher LDS scores are administered more items before discontinuation and thus have more opportunities for error; they would therefore be penalized by scoring systems that emphasize the number of inconsistencies rather than their ratio to the span attained. Second, it was hypothesized that invalid participants would show more frequent inconsistencies and lower LDS scores in each DS condition, both of which would produce higher DS IRs relative to valid participants.
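A minimal sketch of the DS IR computation for a single condition follows. It assumes that item scores are recorded as 0, 1, or 2 (the sum of two trials) alongside each item's span length, and that LDS is the longest span at which at least one trial was correct; the function name and input layout are hypothetical.

```python
import math
from typing import List


def inconsistency_ratio(item_scores: List[int], span_lengths: List[int]) -> float:
    """DS IR for one condition: number of item scores equal to 1, divided by LDS.

    item_scores: 0, 1, or 2 per administered item (sum of two trials).
    span_lengths: number of digits for each item, in the same order.
    Returns NaN when no trial was passed (LDS undefined).
    """
    if len(item_scores) != len(span_lengths):
        raise ValueError("item_scores and span_lengths must be the same length")
    if any(s not in (0, 1, 2) for s in item_scores):
        raise ValueError("item scores must be 0, 1, or 2")
    inconsistencies = sum(1 for s in item_scores if s == 1)
    # Assumed LDS definition: longest span with at least one correct trial
    passed = [n for s, n in zip(item_scores, span_lengths) if s >= 1]
    if not passed:
        return math.nan
    return inconsistencies / max(passed)


if __name__ == "__main__":
    # Two inconsistent items (spans 3 and 5) and LDS = 6, so IR = 2/6
    print(round(inconsistency_ratio([2, 1, 2, 1, 2, 0], [2, 3, 4, 5, 6, 7]), 4))
```

Because LDS takes only a few integer values, only a limited set of ratios is attainable (e.g., 2/8 = 25.00% and 2/7 ≈ 28.57%). The cut-scores reported in Table 3 (e.g., ≥26.79%, ≥30.95%, ≥35.42%) appear to fall midway between such adjacent attainable values.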

Data Analytic Plan

Spearman correlations evaluated associations among the DS IRs, DS PVTs, and criterion PVTs. Descriptive statistics were calculated for each DS score, and analyses of variance (ANOVAs) tested for performance differences between the validity groups. Skewness (<.80) and kurtosis (≤.70) values did not indicate the need for nonparametric tests. The false discovery rate (FDR; Benjamini & Hochberg, 1995) procedure, with the FDR set at .05, was used to control for multiple comparisons. Receiver operating characteristic (ROC) curve analyses determined the classification accuracy of each DS IR and DS PVT for differentiating valid from invalid participants. Classification accuracy was considered acceptable for areas under the curve (AUCs) ≥ .70 (Hosmer et al., 2013). Optimal cut-scores for each DS IR and DS PVT were selected to maximize sensitivity while maintaining a minimum specificity of 85%, preferably closer to 90% or higher (Boone, 2013; Larrabee, 2003). In addition to overall validity status, ROC curves assessed the accuracy of DS IRs for classifying pass versus fail status on each previously validated DS PVT (RDS, RDS-WM, RDS-R, and DS ACSS).
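The sketch below illustrates three steps from this plan under simplifying assumptions: a rank-based (Mann-Whitney) estimate of the AUC, a search for the cut-score with maximal sensitivity subject to a specificity floor, and the Benjamini-Hochberg step-up procedure. It assumes that higher scores indicate invalidity, as for the DS IRs; scores such as RDS, on which lower values are suspect, would need to be negated first. It also uses observed scores as candidate thresholds rather than the midpoints that some statistical packages report. Function names are hypothetical, and this is not the software used in the study.

```python
from typing import List, Optional, Tuple


def auc(valid: List[float], invalid: List[float]) -> float:
    """P(random invalid score > random valid score), counting ties as 1/2."""
    wins = 0.0
    for x in invalid:
        for y in valid:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(valid) * len(invalid))


def best_cutoff(valid: List[float], invalid: List[float],
                min_specificity: float = 0.85) -> Optional[Tuple[float, float, float]]:
    """Cut-score c (flag if score >= c) with maximal sensitivity, given
    specificity >= min_specificity; ties go to the higher specificity."""
    best: Optional[Tuple[float, float, float]] = None
    for c in sorted(set(valid) | set(invalid)):
        sens = sum(x >= c for x in invalid) / len(invalid)
        spec = sum(y < c for y in valid) / len(valid)
        if spec < min_specificity:
            continue
        if best is None or (sens, spec) > (best[1], best[2]):
            best = (c, sens, spec)
    return best


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Reject H0 for the k smallest p-values, where k is the largest rank i
    with p_(i) <= (i / m) * q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k = rank
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected
```

The cut-score search mirrors the selection rule stated above: sensitivity is maximized only among thresholds that keep specificity at or above the chosen floor.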

Results

DS IRs yielded generally weak, nonsignificant correlations with each other and with the criterion PVTs, suggesting that the DS IRs were largely independent of the criterion PVTs used to establish validity groups. DS IRs also had weak, variably significant correlations with the previously validated DS PVTs (see Table 2). Performance on DS indices by validity group is presented in Table 1. The valid group performed significantly better (i.e., obtained higher scores) on all four previously validated DS PVTs and on the LDS indices, with medium-sized effects (ηp² = .051–.111). By contrast, differences between validity groups on the DS inconsistency indices were negligible (ηp² ≤ .010).

Table 2.

Correlations Between Performance Validity Tests Among the Valid and Invalid Groups.

	DSF-IR	DSB-IR	DSS-IR	RDS	RDS-WM	RDS-R	DS ACSS	DCT	RFIT	RAVLT ES	TMT-A	Stroop
DSF-IR	-	−.09*	.04	−.26**	−.09*	−.23**	−.12**	−.01	.02	.03	.00	.00
DSB-IR	.22*	-	.00	−.14**	−.20**	−.13**	−.01	.03	.05	−.00	−.01	.01
DSS-IR	.11	.09	-	−.06	−.24**	−.20**	−.07	−.01	−.02	.11**	−.08	−.04
RDS	.27**	−.29**	.00	-	.71**	.90**	.85**	−.24**	.14**	.06	.06	.17**
RDS-WM	−.03	−.29**	−.18	.78**	-	.90**	.83**	−.28**	.18**	.18**	.10*	.16**
RDS-R	−.20	−.25*	−.06	.93**	.91**	-	.92**	−.28**	.16**	.11**	.11**	.20**
DS ACSS	−.02	−.12	.07	.87**	.84**	.92**	-	−.32**	.19**	.14**	.11**	.21**
DCT	.02	−.01	−.04	−.26**	−.29**	−.31**	−.35**	-	−.17**	−.15**	−.22**	−.31**
RFIT	−.14	−.01	.18	.34**	.26*	.33**	.32**	−.26*	-	.14**	.08*	.05
RAVLT ES	−.00	−.05	−.04	.26*	.32**	.27**	.33**	−.10	.23*	-	.03	.04
TMT-A	−.11	−.01	−.10	−.10	−.19	−.14	−.12	.19	−.18	−.30**	-	.30**
Stroop	−.20*	−.13	−.05	−.08	−.12	−.11	−.15	−.10	−.29**	−.14	.12	-

Note. n = 609 (valid); n = 96 (invalid). DSF-IR: Digit Span Forward Inconsistency Ratio; DSB-IR: Digit Span Backward Inconsistency Ratio; DSS-IR: Digit Span Sequencing Inconsistency Ratio; RDS: Reliable Digit Span (Forward/Backward); RDS-WM: RDS-Working Memory (Backward/Sequencing); RDS-R: RDS-Revised (Forward/Backward/Sequencing); DS ACSS: Digit Span Age-Corrected Scaled Score; DCT: Dot Counting Test; RFIT: Rey 15-Item Test + Recognition; RAVLT ES: Rey Auditory Verbal Learning Test Effort Score; TMT-A: Trail Making Test Part A T-score; Stroop: Stroop Word Reading T-score. Correlations above the diagonal are for the valid group; correlations below the diagonal are for the invalid group.

*p < .05.

**p < .01.

ROC curve analyses (Table 3) showed that DS IRs generally had poor classification accuracy (AUCs = .49–.72) for differentiating those who failed versus passed each of the four established DS PVTs (i.e., RDS; RDS-WM; RDS-R; DS ACSS). Similarly, DS IRs were unable to accurately classify overall validity status, as determined by the independent criterion PVTs (AUCs = .52–.59). By contrast, all four of the previously validated DS PVTs demonstrated acceptable classification accuracy (AUCs = .74–.78) for detecting invalid performance with 32–49% sensitivity and 86–93% specificity at optimal cut-scores.

Table 3.

Accuracy of the Digit Span Inconsistency Ratios as Embedded Validity Indicators.

	AUC	Cutoff	SN	SP	30% base rate		20% base rate		10% base rate
					PPV	NPV	PPV	NPV	PPV	NPV
RDS (634 valid/71 invalid)
DSF-IR	.67***	≥26.79%	.39	.84	0.51	0.76	0.38	0.85	0.21	0.93
		≥30.95%	.32	.94	0.70	0.76	0.57	0.85	0.37	0.93
		≥35.42%	.20	.97	0.74	0.74	0.63	0.83	0.43	0.92
DSB-IR	.72***	≥30.95%	.59	.82	0.58	0.82	0.45	0.89	0.27	0.95
		≥38.75%	.32	.90	0.58	0.76	0.44	0.84	0.26	0.92
		≥41.43%	.27	.95	0.70	0.75	0.57	0.84	0.38	0.92
DSS-IR	.56	-	-	-	-	-	-	-	-	-
RDS-WM (575 valid/130 invalid)
DSF-IR	.56*	≥23.61%	.32	.77	0.37	0.73	0.26	0.82	0.13	0.91
		≥26.79%	.26	.84	0.41	0.73	0.29	0.82	0.15	0.91
		≥30.95%	.18	.93	0.52	0.73	0.39	0.82	0.22	0.91
DSB-IR	.67***	≥30.95%	.41	.82	0.49	0.76	0.36	0.85	0.20	0.93
		≥38.75%	.27	.91	0.56	0.74	0.43	0.83	0.25	0.92
		≥46.43%	.15	.96	0.62	0.72	0.48	0.82	0.29	0.91
DSS-IR	.63***	≥23.61%	.54	.74	0.47	0.79	0.34	0.87	0.19	0.94
		≥30.95%	.40	.84	0.52	0.77	0.38	0.85	0.22	0.93
		≥38.75%	.22	.95	0.65	0.74	0.52	0.83	0.33	0.92
RDS-R (584 valid/121 invalid)
DSF-IR	.64***	≥23.61%	.40	.79	0.45	0.75	0.32	0.84	0.17	0.92
		≥26.79%	.35	.85	0.50	0.75	0.37	0.84	0.21	0.92
		≥30.95%	.26	.95	0.69	0.75	0.57	0.84	0.37	0.92
DSB-IR	.61***	≥30.95%	.34	.81	0.43	0.74	0.31	0.83	0.17	0.92
		≥38.75%	.21	.90	0.47	0.73	0.34	0.82	0.19	0.91
		≥46.43%	.15	.96	0.62	0.72	0.48	0.82	0.29	0.91
DSS-IR	.58**	≥30.95%	.33	.82	0.44	0.74	0.31	0.83	0.17	0.92
		≥38.75%	.19	.95	0.62	0.73	0.49	0.82	0.30	0.91
		≥41.43%	.12	.96	0.56	0.72	0.43	0.81	0.25	0.91
DS ACSS (586 valid/119 invalid)
DSF-IR	.56*	≥26.79%	.24	.83	0.38	0.72	0.26	0.81	0.14	0.91
		≥30.95%	.21	.94	0.60	0.74	0.47	0.83	0.28	0.91
		≥38.75%	.10	.98	0.68	0.72	0.56	0.81	0.36	0.91
DSB-IR	.56	-	-	-	-	-	-	-	-	-
DSS-IR	.49	-	-	-	-	-	-	-	-	-
Overall validity (609 valid/96 invalid)
DSF-IR	.59**	≥26.79%	.28	.83	0.41	0.73	0.29	0.82	0.15	0.91
		≥30.95%	.16	.92	0.46	0.72	0.33	0.81	0.18	0.91
		≥35.42%	.08	.96	0.46	0.71	0.33	0.81	0.18	0.90
DSB-IR	.50	-	-	-	-	-	-	-	-	-
DSS-IR	.52	-	-	-	-	-	-	-	-	-
RDS	.74***	≤6	.14	.98	0.75	0.73	0.64	0.82	0.44	0.91
		≤7	.32	.93	0.66	0.76	0.53	0.85	0.34	0.92
		≤8	.51	.81	0.53	0.79	0.40	0.87	0.23	0.94
RDS-WM	.75***	≤6	.24	.96	0.72	0.75	0.60	0.83	0.40	0.92
		≤7	.49	.86	0.60	0.80	0.47	0.87	0.28	0.94
		≤8	.74	.66	0.48	0.86	0.35	0.91	0.19	0.96
RDS-R	.77***	≤11	.30	.95	0.72	0.76	0.60	0.84	0.40	0.92
		≤12	.47	.88	0.63	0.79	0.49	0.87	0.30	0.94
		≤13	.69	.77	0.56	0.85	0.43	0.91	0.25	0.96
DS ACSS	.78***	≤6	.30	.95	0.72	0.76	0.60	0.84	0.40	0.92
		≤7	.49	.88	0.64	0.80	0.51	0.87	0.31	0.94
		≤8	.68	.77	0.56	0.85	0.43	0.91	0.25	0.96

Note. n = 705. DSF-IR: Digit Span Forward Inconsistency Ratio; DSB-IR: Digit Span Backward Inconsistency Ratio; DSS-IR: Digit Span Sequencing Inconsistency Ratio; RDS: Reliable Digit Span (Forward/Backward); RDS-WM: RDS-Working Memory (Backward/Sequencing); RDS-R: RDS-Revised (Forward/Backward/Sequencing); DS ACSS: Digit Span Age-Corrected Scaled Score; AUC: area under the curve; SN: sensitivity; SP: specificity. Bolded scores denote optimal cut-scores for detecting invalid performance.

*p < .05.

**p < .01.

***p < .001.
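The PPV and NPV columns in Table 3 follow from sensitivity, specificity, and an assumed base rate of invalidity via Bayes' theorem. As a check under that assumption, the sketch below recomputes the values for RDS at its ≤7 cut-score (SN = .32, SP = .93). Rounded to two decimals, it reproduces the tabled PPVs (0.66, 0.53, 0.34) and NPVs (0.76, 0.85, 0.92) at base rates of 30%, 20%, and 10%. The function name is hypothetical.

```python
from typing import Tuple


def predictive_values(sens: float, spec: float, base_rate: float) -> Tuple[float, float]:
    """PPV and NPV for a given base rate of invalid performance."""
    true_pos = sens * base_rate
    false_pos = (1 - spec) * (1 - base_rate)
    true_neg = spec * (1 - base_rate)
    false_neg = (1 - sens) * base_rate
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    for base_rate in (0.30, 0.20, 0.10):
        ppv, npv = predictive_values(0.32, 0.93, base_rate)
        print(f"base rate {base_rate:.0%}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

As the base rate falls, PPV drops sharply while NPV rises. This is why a cut-score with high specificity matters most in settings where invalid performance is less common.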

Discussion

Elevated rates of invalid test performance among adults referred for ADHD evaluation underscore the importance of performance validity testing during assessment. This study investigated the utility of WAIS-IV DS performance consistency as a novel embedded PVT in a large sample of adults referred for neuropsychological evaluation of ADHD. As hypothesized, invalid patients had significantly lower LDS scores in each DS condition; surprisingly, however, the valid and invalid groups did not meaningfully differ in their number of inconsistencies (as either ratios or raw totals). Further, none of the three DS IRs differentiated valid from invalid performance with acceptable classification accuracy. Although the Digit Span Backward Inconsistency Ratio (DSB-IR) showed acceptable accuracy for classifying RDS failure (AUC = .72), this finding is of limited clinical use: no evidence was found that DSB-IR provides information beyond RDS itself, so using the two in conjunction would be redundant. These results do not support the use of DS IRs as PVTs in clinical assessment of ADHD. By contrast, as expected, all four previously validated DS PVTs significantly distinguished between valid and invalid performance in this sample, indicating that using a previously cross-validated embedded DS PVT remains best practice during neuropsychological evaluation of ADHD. Further, from a strictly psychometric standpoint, RDS-WM, RDS-R, and DS ACSS all outperformed RDS and yielded strikingly similar sensitivity (47–49%) and specificity (86–88%) at their respective optimal cut-scores.
That said, in clinical practice RDS-WM and RDS-R may be preferable to DS ACSS, because the optimal ACSS cutoff (≤7) corresponds to a normatively low average score. This highlights the “invalid before impaired” conundrum that can arise with PVTs in which a single score conveys information about both validity status and cognitive function. In short, valid-impaired and invalid performance may be psychometrically indistinguishable (Erdodi & Lichtenstein, 2017).

Investigation of the DS IRs aimed to identify an embedded PVT that could be used alongside established embedded PVTs to differentiate valid from invalid performance on the DS subtest more accurately. Research has shown that treating the four existing embedded DS PVTs as independent tests of validity increases the risk of false-positive classification, because they are not independent indices (Boone, 2013). The strong correlations observed among RDS, RDS-WM, RDS-R, and DS ACSS in both validity groups in this study are consistent with this concern. Because the DS IRs were not strongly correlated with the existing embedded PVTs, they could in principle have contributed nonredundant validity information from the DS subtest, as aggregating independent PVTs has been shown to improve the accuracy of validity determination (Larrabee, 2008). Unfortunately, their low sensitivity to invalid performance precluded this. Alternatively, studies of DS performance patterns in other clinical samples (e.g., children with traumatic brain injury and older adults) suggest that inconsistencies may reflect task-fatigue effects, in which vigilance or attentional arousal declines during sustained attention or effort (LaBelle et al., 2019; Warschausky et al., 1996). Given the prominence of within-task performance inconsistency in ADHD, the DS IRs may simply be unable to distinguish inconsistency due to invalid test performance from bona fide momentary fluctuations in executive control seen in ADHD (Friedman et al., 2022; Marshall et al., 2021).

The current study had several methodological strengths, including a demographically diverse sample, the use of five independent criterion PVTs previously validated in adult ADHD samples to establish validity groups, and the inclusion of all established DS PVTs, against which the novel DS IRs could be directly compared. Nevertheless, some limitations exist. First, information from third-party sources such as school records or parent reports, which is common in child ADHD evaluations, was unavailable; however, this lack of corroborating information is more common in adult ADHD evaluations (McIntosh et al., 2009). Second, although the sample was racially diverse, it was highly educated on average (M = 15.86 years), and thus its performance may not be representative of the broader population.

Overall, the results of this study suggest that DS IRs are not useful as PVTs, in sharp contrast to the established DS PVTs, which robustly discriminated valid from invalid performers. The DS IRs might nonetheless measure another construct relevant to ADHD evaluations, given the disorder’s conceptual link to inconsistent working memory performance, especially on tasks that exceed cognitive capacity (Friedman et al., 2022). Further research is therefore needed to determine possible clinical uses of DS IRs. Exploratory research seeking additional convergent and discriminant evidence from measures already commonly administered within assessment batteries is critical to establishing efficient and comprehensive psychoeducational evaluation practices.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Justyna Piszczor

Jason R. Soble

References

Abramson, D. A., White, D. J., Rhoads, Carter, D. A., Hansen, N. D., Resch, Z. J., Jennette, K. J., Ovsiew, G. P., & Soble, J. R. (2023). Cross-validating the Dot Counting Test among an adult ADHD clinical sample and analyzing the effect of ADHD subtype and comorbid psychopathology. Assessment, 30(2), 264–273. https://doi.org/10.1177/10731911211050895

Ashendorf, Clark, E. L., & Humphreys, C. T. (2021). The Rey 15-item memory test in US Veterans. Journal of Clinical and Experimental Neuropsychology, 43(3), 324–331. https://doi.org/10.1080/13803395.2021.1932761

Ashendorf, Clark, E. L., & Sugarman, M. A. (2017). Performance validity and processing speed in a VA polytrauma sample. The Clinical Neuropsychologist, 31(5), 857–866. https://doi.org/10.1080/13854046.2017.1285961

Babikian, Boone, K. B., & Arnold (2006). Sensitivity and specificity of various digit span scores in the detection of suspect effort. The Clinical Neuropsychologist, 20(1), 145–159. https://doi.org/10.1080/13854040590947362

Benjamini & Hochberg (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57(1), 289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x

Bing-Canar, Phillips, M. S., Shields, A. N., Ogram Buckley, C. M., Chang, Khan, Skymba, Ovsiew, Resch, Jennette, & Soble, J. R. (2022). Cross-validation of multiple WAIS-IV digit span embedded performance validity indices among a large sample of adult attention deficit/hyperactivity disorder clinical referrals. Journal of Psychoeducational Assessment, 40(5), 678–688. https://doi.org/10.1177/07342829221081921

Boone, K. B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examinations. The Clinical Neuropsychologist, 23(4), 729–741. https://doi.org/10.1080/13854040802427803

Boone, K. B. (2013). Clinical practice of forensic neuropsychology. Guilford Press.

Erdodi, L. A., & Lichtenstein, J. D. (2017). Invalid before impaired: An emerging paradox of embedded validity indicators. The Clinical Neuropsychologist, 31(6–7), 1029–1046. https://doi.org/10.1080/13854046.2017.1323119

Friedman, L. M., Rapport, M. D., & Fabrikant-Abzug (2022). Consistently inconsistent working memory performance among children with ADHD: Evidence of Response Accuracy Variability (RAV). Journal of Psychopathology and Behavioral Assessment, 44(3), 787–799. https://doi.org/10.1007/s10862-022-09967-7

Greiffenstein, M. F., Baker, W. J., & Gola (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6(3), 218–224. https://doi.org/10.1037/1040-3590.6.3.218

Hirsch & Christiansen (2018). Faking ADHD? Symptom validity testing and its relation to self-reported, observer-reported symptoms, and neuropsychological measures of attention in adults with ADHD. Journal of Attention Disorders, 22(3), 269–280. https://doi.org/10.1177/1087054715596577

Hosmer, D. W., Lemeshow, & Sturdivant, R. X. (2013). Applied logistic regression (3rd ed.). John Wiley & Sons.

Jasinski, L. J., Harp, J. P., Berry, D. T., Shandera-Ochsner, A. L., Mason, L. H., & Ranseen, J. D. (2011). Using symptom validity tests to detect malingered ADHD in college students. The Clinical Neuropsychologist, 25(8), 1415–1428. https://doi.org/10.1080/13854046.2011.630024

Jennette, K. J., Williams, C. P., Resch, Z. J., Ovsiew, G. P., Durkin, N. M., O’Rourke, J. J., Marceaux, J. C., Critchfield, E. A., & Soble, J. R. (2022). Assessment of differential neurocognitive performance based on the number of performance validity tests failures: A cross-validation study across multiple mixed clinical samples. The Clinical Neuropsychologist, 36(7), 1915–1932. https://doi.org/10.1080/13854046.2021.1900398

Khan, Rauch, A. A., Obolsky, M. A., Skymba, Barwegen, K. C., Wisinger, A. M., Ovsiew, G. P., Jennette, K. J., Soble, J. R., & Resch, Z. J. (2022). A comparison of embedded validity indicators from the Stroop Color and Word Test among adults referred for clinical evaluation of suspected or confirmed attention-deficit/hyperactivity disorder. Psychological Assessment, 34(7), 697–703. https://doi.org/10.1037/pas0001137

LaBelle, D. R., Lee, B. G., & Miller, J. B. (2019). Dissociation of executive and attentional elements of the digit span task in a population of older adults: A latent class analysis. Assessment, 26(7), 1386–1398. https://doi.org/10.1177/1073191117714556

Larrabee, G. J. (2003). Detection of malingering using atypical performance patterns on standard neuropsychological tests. The Clinical Neuropsychologist, 17(3), 410–425. https://doi.org/10.1076/clin.17.3.410.18089

Larrabee, G. J. (2008). Aggregation across multiple indicators improves the detection of malingering: Relationship to likelihood ratios. The Clinical Neuropsychologist, 22(4), 666–679. https://doi.org/10.1080/13854040701494987

Longridge, Norman, Henley, Newlove Delgado, & Ford (2019). Investigating the agreement between the clinician and research diagnosis of attention deficit hyperactivity disorder and how it changes over time; a clinical cohort study. Child and Adolescent Mental Health, 24(2), 133–141. https://doi.org/10.1111/camh.12285

Marshall, Hoelzle, & Nikolas (2021). Diagnosing attention-deficit/hyperactivity disorder (ADHD) in young adults: A qualitative review of the utility of assessment measures and recommendations for improving the diagnostic process. The Clinical Neuropsychologist, 35(1), 165–198. https://doi.org/10.1080/13854046.2019.1696409

Martin, P. K., & Schroeder, R. W. (2020). Base rates of invalid test performance across clinical non-forensic contexts and settings. Archives of Clinical Neuropsychology, 35(6), 717–725. https://doi.org/10.1093/arclin/acaa017

McIntosh, Kutcher, Binder, Levitt, Fallu, & Rosenbluth (2009). Adult ADHD and comorbid depression: A consensus-derived diagnostic algorithm for ADHD. Neuropsychiatric Disease and Treatment, 5, 137–150. https://doi.org/10.2147/NDT.S4720

Ovsiew, G. P., Cerny, B. M., Boer, A. B. D., Petry, L. G., Resch, Z. J., Durkin, N. M., & Soble, J. R. (2023). Performance and symptom validity assessment in attention deficit/hyperactivity disorder: Base rates of invalidity, concordance, and relative impact on cognitive performance. The Clinical Neuropsychologist, 37(7), 1498–1515. https://doi.org/10.1080/13854046.2022.2162440

Phillips, M. S., Wisinger, A. M., Lapitan-Moore, F. T., Ausloos-Lozano, J. E., Bing-Canar, Durkin, N. M., Ovsiew, G. P., Resch, Z. J., Jennette, K. J., & Soble, J. R. (2023). Cross-validation of multiple embedded performance validity indices in the Rey Auditory Verbal Learning Test and Brief Visuospatial Memory Test-Revised in an adult attention deficit/hyperactivity disorder clinical sample. Psychological Injury and Law, 16(1), 27–35. https://doi.org/10.1007/s12207-022-09443-3

Rabin, L. A., Paolillo, & Barr, W. B. (2016). Stability in test usage practices of clinical neuropsychologists in the United States and Canada over a 10-year period: A follow-up survey of INS and NAN members. Archives of Clinical Neuropsychology, 31(3), 206–230. https://doi.org/10.1093/arclin/acw007

Resch, Z. J., Cerny, B. M., Ovsiew, G. P., Jennette, K. J., Bing-Canar, Rhoads, & Soble, J. R. (2023). A direct comparison of 10 WAIS-IV Digit Span embedded validity indicators among a mixed neuropsychiatric sample with varying degrees of cognitive impairment. Archives of Clinical Neuropsychology, 38(4), 619–632. https://doi.org/10.1093/arclin/acac082

Schroeder, R. W., Twumasi-Ankrah, Baade, L. E., & Marshall, P. S. (2012). Reliable digit span: A systematic review and cross-validation study. Assessment, 19(1), 21–30. https://doi.org/10.1177/1073191111428764

Scimeca, L. M., Holbrook, Rhoads, Cerny, B. M., Jennette, K. J., Resch, Z. J., Obolsky, M. A., Ovsiew, G. P., & Soble, J. R. (2021). Examining Conners Continuous Performance Test-3 (CPT-3) embedded performance validity indicators in an adult clinical sample referred for ADHD evaluation. Developmental Neuropsychology, 46(5), 347–359. https://doi.org/10.1080/87565641.2021.1951270

Sweet, J. J., Heilbronner, R. L., Morgan, J. E., Larrabee, G. J., Rohling, M. L., Boone, K. B., Kirkwood, M. W., Schroeder, R. W., Suhr, J. A., & Conference Participants. (2021). American Academy of Clinical Neuropsychology (AACN) 2021 consensus statement on validity assessment: Update of the 2009 AACN consensus conference statement on neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 35, 1053–1106. https://doi.org/10.1080/13854046.2021.1896036

Warschausky, Kewman, D. G., & Selim (1996). Attentional performance of children with traumatic brain injury: A quantitative and qualitative analysis of digit span. Archives of Clinical Neuropsychology, 11(2), 147–153. https://doi.org/10.1016/0887-6177(95)00004-6

Webber, T. A., & Soble, J. R. (2018). Utility of various WAIS-IV Digit Span indices for identifying noncredible performance validity among cognitively impaired and unimpaired examinees. The Clinical Neuropsychologist, 32, 657–670. https://doi.org/10.1080/13854046.2017.1415374

Wechsler (1981). Wechsler Adult Intelligence Scale-Revised. Psychological Corporation.

Wechsler (2008). Wechsler Adult Intelligence Scale (4th ed.). Pearson Assessment.