Abstract
The present study examined the impact of performance validity test (PVT) failure on the Test of Premorbid Functioning (TOPF) in a sample of 252 neuropsychological patients. Word reading performance differed significantly according to PVT failure status, and number of PVTs failed accounted for 7.6% of the variance in word reading performance, even after controlling for education. Furthermore, individuals failing ≥2 PVTs were twice as likely as individuals passing all PVTs (33% vs. 16%) to have abnormally low obtained word reading scores relative to demographically predicted scores when using a normative base rate of 10% to define abnormality. When compared with standardization study clinical groups, those failing ≥2 PVTs were twice as likely as patients with moderate to severe traumatic brain injury and as likely as patients with Alzheimer’s dementia to obtain abnormally low TOPF word reading scores. Findings indicate that TOPF word reading based estimates of premorbid functioning should not be interpreted in individuals invalidating cognitive testing.
Introduction
When conducting cognitive assessments, psychologists frequently establish estimates of premorbid intellectual functioning in order to determine whether an examinee’s current functioning represents a decline from previous abilities. Estimates of premorbid functioning are often formulated based on examinee demographic information and/or performance on “hold” tests believed to be relatively resistant to neurological injury or degenerative processes (Barona, Reynolds, & Chastain, 1984; Crawford, 1992). While some clinicians might incorporate this information qualitatively, standardized methods for predicting premorbid intellectual functioning are readily available and have been demonstrated to be more accurate than impression-based estimates (Crawford, Millar, & Milne, 2001; Pearson, 2009).
The Advanced Clinical Solutions package (ACS; Pearson, 2009) for the Wechsler Adult Intelligence Scale–Fourth Edition (WAIS-IV; Wechsler, 2008) and Wechsler Memory Scale–Fourth Edition (WMS-IV; Wechsler, 2009) contains the Test of Premorbid Functioning (TOPF), which provides standardized approaches for estimating premorbid abilities. The TOPF allows clinicians to estimate premorbid cognitive functioning based on demographic information, a word reading test, or a combined equation that incorporates both of these methods. Similar to previous word reading measures, the TOPF word reading test consists of words with atypical grapheme-to-phoneme translations, requiring examinees to have prior knowledge of the words in order to pronounce them correctly (Barona et al., 1984; Nelson, 1982; Pearson, 2009; Wiens, Bryan, & Crossen, 1993). Because word reading tests primarily rely on previously learned information, they are believed to be less influenced by current cognitive ability and recently acquired neurocognitive deficits than other neuropsychological measures, allowing for their use as measures of premorbid functioning (Crawford, 1992; Green et al., 2008; Johnstone, Hogg, Schopp, Kapila, & Edwards, 2002; Pearson, 2009).
In comparing the three premorbid estimation methods available via the TOPF, the combined word reading and simple demographic model accounted for the greatest proportion of Full Scale Intelligence Quotient (FSIQ) variance (62%) in the standardization sample (Pearson, 2009). This was followed by word reading performance alone (49%) and, finally, simple demographic prediction alone (46%). Despite these and similar findings showing that demographic information and word reading test performance strongly correlate with FSIQ (e.g., Dykiert & Deary, 2013; Schretlen, Buffington, Meyer, & Pearson, 2005), limitations in premorbid prediction models based on this information have been noted, especially with regard to certain populations (e.g., Reynolds, 1997).
One limitation is that estimates based on current test performance are only accurate insofar as the tests used to estimate premorbid ability are immune to current cognitive change. Although word reading test performance is generally believed to be relatively resistant to neurological injury in comparison with other neuropsychological measures, certain neurological diagnoses, including dementia, severe traumatic brain injury (TBI), and aphasia, are associated with declines in word reading test performance (Leritz, McGlinchey, Lundgren, Grande, & Milberg, 2008; Mathias, Bowden, Bigler, & Rosenfeld, 2007; McFarlane, Welch, & Rogers, 2006; Patterson, Graham, & Hodges, 1994; Pearson, 2009). Furthermore, using word reading test performance in individuals with long-standing verbal deficits, such as those with verbal learning disorders, would likely underestimate levels of predicted intellectual functioning in those individuals. Therefore, word reading methods may not provide accurate estimates of premorbid functioning for all examinees, and deciding which premorbid estimation method is most appropriate for a given examinee requires some level of clinical discernment.
Given that failure on performance validity tests (PVTs) accounts for a significant proportion of the variance in neuropsychological test performance (Constantinou, Bauer, Ashendorf, Fisher, & McCaffrey, 2005; Meyers, Volbrecht, Axelrod, & Reinsch-Boothby, 2011), clinicians might wonder about the accuracy of estimating premorbid functioning from word reading measures in examinees invalidating neuropsychological testing. According to a recent survey, such concern may be of particular relevance given that 99.7% and 93.6% of neuropsychologists believe validity testing to be desirable, if not mandatory, in forensic and clinical evaluations, respectively (Martin, Schroeder, & Odland, 2015). Results reported by Martin, Schroeder, Heinrichs, and Baade (2015) as well as findings by Green, Rohling, Lees-Haley, and Allen (2001) have shown verbal ability test performance to be negatively affected by PVT failure, although the magnitude of this effect appears to be somewhat less than that seen in other cognitive domains. Three studies (Davis, McHugh, Axelrod, & Hanks, 2012; Sawyer, Young, Roper, & Rach, 2014; Whitney, Shepard, Mariner, Mossbarger, & Herman, 2010) have examined the effects of test invalidity on word reading tests, specifically, and the findings were mixed.
Whitney et al. (2010) examined the construct validity of the Wechsler Test of Adult Reading (WTAR) among a sample of U.S. military veterans by examining the associations between the WTAR and verbal intellectual functioning and memory. Additionally, the study compared WTAR performance between those failing (n = 26) and passing (n = 98) the Test of Memory Malingering (TOMM; Tombaugh, 1996). While those failing the TOMM performed worse across measures of true memory ability than those passing the TOMM, there were no differences in WTAR performance across the two groups, leading the authors to conclude that the WTAR word reading test may remain robust even in cases of neuropsychological invalidity.
Davis et al. (2012) examined neuropsychological test performance in individuals who underwent neuropsychological evaluations for reported TBI in the context of civil litigation or disability claims. The study found that 69 individuals failing two or more PVTs performed significantly worse than 61 individuals passing all administered PVTs across neuropsychological tests including the North American Adult Reading Test (NAART). On average, individuals in the PVT fail group obtained a standard score of 92.5 on the NAART, as compared with a score of 101.3 obtained by those in the PVT pass group.
Sawyer et al. (2014) examined the impact of test validity on a number of neuropsychological measures including verbally mediated “hold tests,” such as the Word Reading subtest from the Wide Range Achievement Test–Fourth Edition (WRAT-4; Wilkinson & Robertson, 2006). The study found that 32 patients failing 2 or more PVTs performed significantly worse on the Word Reading subtest than 84 patients passing all PVTs in a sample of U.S. veterans. Classification based on number of PVT failures accounted for 4% of the variance in WRAT-4 Word Reading scores. In comparison, PVT classification accounted for 7% to 12% of the variance on verbal subtests and 8% to 19% of the variance on perceptual reasoning tests of the WAIS-IV.
While these previous studies have investigated the impact of performance invalidity on various word reading tests, further research is needed for several reasons. First, the findings and conclusions varied across these studies, with two studies indicating an association between PVT failure and word reading test performance and one indicating no association. Thus, the existence and magnitude of a relationship between performance invalidity and word reading performance remains unclear. Second, only one of the above studies (i.e., Sawyer et al., 2014) controlled for the effect of education when examining the relationship between invalidity and word reading performance. Not controlling for education is problematic given that word reading test performance correlates strongly with education (r = .55; Pearson, 2009) and because level of education was higher in the PVT pass group relative to the PVT fail group in each of the above studies. Therefore, further research is needed to clarify whether word reading performance is affected by performance invalidity above and beyond the effects of educational differences in those passing and failing PVTs. Third, no studies to date have investigated the impact of performance invalidity on the TOPF, which is the only word reading test developed and validated to estimate WAIS-IV and WMS-IV test performance. Finally, while the above studies provide useful information regarding the possible association between performance invalidity and word reading test performance, the studies do not speak to the frequency of invalid word reading performance in those failing validity testing. While examining group differences can be helpful in determining an effect of a grouping variable on a given outcome, such analysis does not demonstrate the likelihood that individuals of a group of interest are influenced by the effect.
The goal of the current study was to examine the impact of PVT failure on TOPF word reading test performance by testing several hypotheses. First, it was hypothesized that PVT failure status would have a significant effect on TOPF word reading test performance, with individuals failing PVTs performing worse than those passing PVTs. Second, it was hypothesized that TOPF word reading performance would be associated with degree of PVT failure, even after controlling for education. Third, it was hypothesized that a greater proportion of individuals failing ≥2 PVTs, as compared with those passing PVTs, would obtain a score on the TOPF word reading test that was abnormally low relative to the score expected given demographic information. Testing these hypotheses was believed to be of clinical relevance as it is currently unclear whether TOPF word reading estimates of premorbid functioning can be relied on in individuals failing PVTs.
Method
Participants
Patients were identified retrospectively from an archival database consisting of 661 individuals referred for comprehensive outpatient neuropsychological evaluations at a university medical center. Only patients administered at least the TOPF word reading subtest and simple demographic questionnaire, the WAIS-IV, and three or more PVTs were retained for the current study. Additionally, only patients age 20 and above were included, as TOPF simple demographic estimates of premorbid functioning for individuals below age 20 are formulated using different demographic information than for individuals age 20 and above. Specifically, the demographic-based model for examinees below age 20 uses years of parent education instead of years of examinee education and omits occupation as a model predictor. Individuals diagnosed with dementia or cognitive impairment secondary to left-hemisphere cerebrovascular insult were excluded because word-reading estimates of premorbid functioning are often inaccurate in these individuals (Pearson, 2009). No remaining patients had a reported history or diagnosis of verbal learning disorder. Diagnoses were made prior to the initiation of this study and at the time of the patient’s initial evaluation based on information gathered from medical records, collateral sources, patient interview and observation, and neuropsychological test data. Three patients were excluded due to incomplete TOPF information. No other missing values were identified. This left an overall sample of 252 patients.
Study patients were classified according to whether they failed 0 PVTs (n = 160), 1 PVT (n = 50), or ≥2 PVTs (n = 42). Diagnoses for each group are listed in Table 1. The type and frequency of various identified external incentives by group can be seen in Table 2. An external incentive was identified in 83.4% of those failing ≥2 PVTs, but in only 34.4% of those passing PVTs. The most commonly identified external incentive across groups was “seeking or maintaining disability.” The proportions of patients with a known external incentive, by general diagnostic category, were as follows: mild TBI (54%), moderate/severe TBI (60%), psychiatric diagnosis (43%), somatoform/conversion disorder (59%), mild cognitive impairment (20%), cerebrovascular disease (38%), other health issue (43%), cognitive disorder NOS/other cognitive disorder (41%), and no disorder/unsubstantiated cognitive decline (55%). It should, however, be noted that failure to identify an external incentive does not guarantee a true absence of an external incentive (van Egmond & Kummeling, 2002). Additionally, base rates of malingering in forensic neuropsychological samples range from 30% to 54% (Ardolf, Denney, & Houston, 2007; Larrabee, 2003; Mittenberg, Patton, Canyock, & Condit, 2002), suggesting that approximately 50% or more of individuals with a clear external incentive provide valid performance. Thus, neither the absence nor the presence of a known external incentive can definitively rule in or rule out the possibility that a patient’s test performance is in fact influenced by an external incentive. Nevertheless, these numbers provide additional descriptive information regarding the sample and group characteristics.
Frequencies of Diagnostic Impressions Separated by PVT Classification.
Note: PVT = performance validity test; TBI = traumatic brain injury; ADHD = attention deficit hyperactivity disorder; BIF = borderline intellectual functioning; HIV = human immunodeficiency virus.
Presence and Type of External Incentive.
Note: PVT = performance validity test.
Measures
All patients were administered comprehensive neuropsychological batteries consisting of the TOPF, WAIS-IV, at least three PVTs, and other neuropsychological testing, all of which was administered according to standardized procedures. The PVTs used in the current study, their associated cutoffs, and their administration and failure rates can be seen in Table 3. Eight PVTs were used overall to determine PVT status; however, no patient was administered more than 7 PVTs and the average number of PVTs administered was 4.65 (SD = 1.14). Five of the eight PVTs were stand-alone measures and three of the eight were PVTs embedded within measures of true neuropsychological functioning. Additionally, none of the patients were categorized on the basis of embedded PVT test performance alone, and all patients were administered at least one standalone PVT.
Cutoff Points for Administered Performance Validity Tests.
Note. PVT = performance validity test; IR = inconsistent responses; WAIS-IV = Wechsler Adult Intelligence Scale–4th Edition; %adm = proportion of overall sample administered the PVT; % Fail = proportion administered the measure who failed the measure.
Opinions as to whether the administration of multiple PVTs within an evaluation decreases overall specificity rates as the number of PVTs administered increases have been mixed (Davis & Millis, 2014; Larrabee, 2014; Odland, Lammy, Martin, Grote, & Mittenberg, 2015). Larrabee (2014), as well as Davis and Millis (2014), present empirical data showing specificity to be 88.9% and 85%, respectively, when 2 PVTs are failed and 7 PVTs are administered, whereas Odland et al. (2015) suggest that specificity declines to 82.5% when 7 PVTs are administered with only 2 PVTs failed. It should be noted, however, that in the present study, only 5 patients in the ≥2 PVT fail group were administered the maximum number of PVTs administered (i.e., 7), and that all these patients failed at least 4 of the PVTs. Such failure is indicative of perfect specificity according to clinical data reported by Larrabee as well as Davis and Millis, and indicative of 97.1% specificity according to Monte Carlo simulated data reported by Odland et al. (2015). Thus, despite discussion that increased PVT administration might potentially inflate false positive rates, acceptable specificity was maintained in classifying patients to the ≥2 PVT fail group for the current study across varying standards of specificity.
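One way to see why specificity is expected to fall as more PVTs are administered is a simple binomial model. The sketch below is an illustration only: it assumes that PVTs are statistically independent and that every PVT has the same per-test specificity (90% here). Neither assumption is taken from the present study or from the cited papers, and the function name is hypothetical. The cited figures come from clinical or simulated data and need not agree with this simplified model.

```python
from math import comb


def specificity_at_threshold(n_tests: int, fail_threshold: int,
                             per_test_specificity: float) -> float:
    """Probability that a validly performing examinee fails FEWER than
    `fail_threshold` of `n_tests` PVTs, i.e., is not flagged as invalid.

    Assumes independent tests with identical specificity (a simplification).
    """
    p_fail = 1.0 - per_test_specificity  # per-test false positive rate
    return sum(
        comb(n_tests, k) * p_fail**k * (1.0 - p_fail) ** (n_tests - k)
        for k in range(fail_threshold)
    )


# Assumed per-test specificity of 0.90 (hypothetical value for illustration).
spec_two_of_seven = specificity_at_threshold(7, 2, 0.90)
spec_four_of_seven = specificity_at_threshold(7, 4, 0.90)
print(f"fail >=2 of 7: specificity = {spec_two_of_seven:.3f}")
print(f"fail >=4 of 7: specificity = {spec_four_of_seven:.3f}")
```

Under these assumptions, requiring more failures before classifying a patient as invalid raises specificity, which is the logic behind noting that the patients administered 7 PVTs each failed at least 4.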
The current study used TOPF estimates of premorbid FSIQ derived from word reading test performance and simple demographic information. Several additional TOPF values were used for the current analyses. These additional values require some explanation given that they may be unfamiliar to some readers. The obtained TOPF word reading standard score (referred to as the “actual” TOPF score by the ACS manual) is an age-corrected standard score based on an examinee’s TOPF word reading raw score. It is this value that is used to calculate word reading derived estimates of FSIQ as well as estimates of other WAIS-IV indices, by means of equipercentile equating. In addition to the obtained word reading test standard score, the ACS package provides a predicted word reading standard score, which is a prediction of an examinee’s TOPF word reading performance based on his or her demographic information. The predicted word reading score approximates what an examinee’s word reading score should be based on demographic information, and provides a benchmark against which to compare an examinee’s obtained word reading score. The greater the discrepancy between an obtained and predicted word reading test standard score, the less likely it becomes that the obtained score is one that normally occurs in the healthy population (Pearson, 2009). Additionally, the ACS package provides normative base rates, indicating the frequencies at which discrepancies between obtained versus demographically predicted word reading scores occurred in the normative sample, which are intended to aid clinicians in determining whether a given discrepancy is abnormal. The ACS manual suggests that clinicians interpreting TOPF derived premorbid estimates take into account this base rate information and use caution in interpreting the TOPF word reading derived premorbid estimates when an abnormally low base rate is obtained.
While the ACS manual does not specify at what base rate a discrepancy should be considered abnormal, for the purposes of the current study, a base rate of 10% or less was considered to indicate abnormality.
Data Analyses
ANOVAs were used to compare the PVT pass, 1 PVT fail, and ≥2 PVT fail groups on several variables related to the TOPF, including TOPF word reading standard score. Additionally, the magnitude of the discrepancy between TOPF word reading data and demographic derived data was compared. This was done in two ways. First, groups were compared regarding the discrepancy between the obtained TOPF word reading standard score and the demographically predicted word reading standard score. Second, groups were compared regarding the discrepancy between FSIQ estimates derived from the TOPF simple demographics model and the FSIQ estimates derived from the TOPF word reading model. Both discrepancies were calculated by subtracting the TOPF word reading derived score from the demographically derived score. Thus, the differences are not absolute values but, instead, reflect the average standard score values that TOPF word reading scores fall below demographic based scores. Finally, the FSIQ estimate derived from a combination of both word reading test performance and simple demographic information was compared across groups. Across all group ANOVAs, Bonferroni corrected post hoc comparisons were conducted to examine pairwise differences.
To examine the effects of PVT failure on TOPF word reading test performance while controlling for education, a partial correlation was performed. Additionally, discrepancies between obtained TOPF word reading standardized scores and demographically predicted TOPF word reading standardized scores were analyzed and compared with normative sample base rates from the ACS prediction sample. As noted above, obtained TOPF word reading scores that were lower than the demographically predicted TOPF word reading scores to an extent seen in 10% or less of the normative sample were considered abnormally low. The proportion of patients with abnormally low obtained word reading scores was compared between the PVT pass and the ≥2 PVT fail group using χ2 analysis. Finally, a receiver operating characteristic (ROC) analysis was used to determine area under the curve (AUC) for the TOPF word reading obtained versus predicted discrepancy base rate to determine if the normative base rate could be used to identify those individuals providing invalid test performances.
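The discrepancy and abnormality rules described above can be sketched in a few lines. This is a minimal illustration, not the ACS scoring procedure: the function names and the example scores are hypothetical, and the normative base rate is treated as an input because the ACS base rate tables are not reproduced here.

```python
def discrepancy(predicted_ss: float, obtained_ss: float) -> float:
    """Demographically predicted score minus word reading derived score.

    Positive values mean the obtained word reading score falls below
    the demographically predicted score.
    """
    return predicted_ss - obtained_ss


def is_abnormally_low(predicted_ss: float, obtained_ss: float,
                      base_rate_pct: float, cutoff_pct: float = 10.0) -> bool:
    """Flag an obtained score as abnormally low when it is below the
    predicted score and a discrepancy that large occurs in `cutoff_pct`
    percent or less of the normative sample (10% in the current study)."""
    return (discrepancy(predicted_ss, obtained_ss) > 0
            and base_rate_pct <= cutoff_pct)


# Hypothetical examinee: predicted 100, obtained 88, base rate of 8%.
print(discrepancy(100, 88))                          # 12
print(is_abnormally_low(100, 88, base_rate_pct=8))   # True
print(is_abnormally_low(100, 97, base_rate_pct=35))  # False
```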
Because it is not uncommon for both validly performing and invalidly performing patients to fail 1 PVT (Victor, Boone, Serpa, Buehler, & Ziegler, 2009), a common approach within PVT research is to exclude patients failing 1 PVT for the purpose of more accurately categorizing true-valid and true-invalid patients, following a known-groups method (e.g., Davis et al., 2012; Sawyer et al., 2014; Wolfe et al., 2010). At the same time, it has been noted (e.g., Mossman et al., 2014) that important information may be lost by excluding ambiguous cases in PVT research (i.e., those failing 1 PVT), as such cases are common in clinical situations and often pose the greatest interpretative challenge to clinicians. Because of this, all analyses, except for the final two analyses listed above, compared those passing PVTs to both the 1 PVT fail and ≥2 PVT fail groups. For the final two analyses, which compared the proportion of abnormally low obtained versus predicted word reading scores between valid and invalid test-takers within the current sample alongside two identified clinical groups of the ACS and then examined the accuracy of obtained versus predicted word reading discrepancies in identifying invalid test-takers, a known-groups method was thought most appropriate and those failing 1 PVT were excluded.
Results
Demographic information is available in Table 4. Group differences in education and age were analyzed using ANOVAs. Education varied significantly across groups, F(2, 249) = 7.882, p < .001. Post hoc comparisons using Bonferroni correction revealed years of education to differ significantly between the PVT pass group (M = 14.6) and the ≥2 PVT fail group (M = 12.9; p < .001). All other pairwise comparisons regarding education were nonsignificant. Differences in age across PVT groups were not significant at p < .05. Neither distribution of gender, χ2 = 0.782, p = .239, nor ethnicity, χ2 = 1.547, p = .177, differed significantly across the PVT pass and ≥2 PVT fail groups.
Demographic Information.
Note. PVT = performance validity test.
Significant at p < .001.
As illustrated by Table 5, PVT failure had a significant effect on TOPF word reading performance, with those failing 1 PVT and those failing ≥2 PVTs both performing significantly worse on TOPF word reading than those in the PVT pass group. Additionally, individuals in the PVT failure groups, as compared with those passing all PVTs, produced significantly lower standardized scores on the TOPF word reading test relative to their demographically predicted word reading performance. More specifically, on the TOPF word reading test, individuals failing ≥2 PVTs performed approximately 9 standard score points, on average, below their demographically predicted TOPF score. In comparison, those passing all PVTs averaged only a 3-point discrepancy between their obtained TOPF word reading standard score and their demographically predicted TOPF word reading score. A similar trend emerged when comparing discrepancies between TOPF estimated FSIQ and simple demographic estimated FSIQ across PVT status groups. Finally, both PVT fail groups produced significantly lower combined TOPF word reading and simple demographic premorbid FSIQ estimates as compared with the combined premorbid FSIQ estimate produced by the PVT pass group.
TOPF Performance According to PVT Failure Status.
Note. ACS = Advanced Clinical Solutions; TOPF = Test of Premorbid Functioning; PVT = Performance Validity Test; FSIQ = Full Scale Intelligence Quotient; SS = standard score.
Indicates a statistically significant difference (p < .05) between the given PVT failure group and the PVT pass group using Bonferroni corrected post hoc comparisons.
TOPF word reading performance was significantly related to both number of PVTs failed, r(250) = −.330, p < .001, and years of education, r(250) = .455, p < .001. Additionally, number of PVTs failed was also related to years of education, r(250) = −.198, p = .002. Therefore, a partial correlation was analyzed to examine whether the association between PVT failure and TOPF word reading performance was independent of the effect of education. When controlling for education, the relationship between number of PVTs failed and TOPF word reading performance remained significant, r(249) = −.275, p < .001. PVT failure accounted for 7.6% of the variance in TOPF word reading performance, even after partialling out the effects of education.
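For readers who wish to check the partial correlation, it can be recovered from the three zero-order correlations reported above using the standard first-order partial correlation formula. The sketch below uses only the reported coefficients; the function and variable names are illustrative.

```python
from math import sqrt


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))


r_topf_pvt = -0.330  # TOPF word reading with number of PVTs failed
r_topf_edu = 0.455   # TOPF word reading with years of education
r_pvt_edu = -0.198   # number of PVTs failed with years of education

r_partial = partial_correlation(r_topf_pvt, r_topf_edu, r_pvt_edu)
print(f"partial r = {r_partial:.3f}")                # partial r = -0.275
print(f"variance explained = {r_partial**2:.1%}")    # variance explained = 7.6%
```

Squaring the partial correlation gives the proportion of TOPF word reading variance associated with PVT failure once education is held constant.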
The proportion of patients in the ≥2 PVT fail group with an abnormally low TOPF word reading score was significantly greater than the proportion of patients in the PVT pass group with an abnormally low score (χ2 = 6.70, p = .011) when abnormal was defined as an obtained TOPF word reading score that was lower than a demographically predicted score to an extent seen in 10% or less of the normal population. As seen in Figure 1, when using a normative base rate cutoff of 10%, patients failing PVTs were twice as likely as patients passing PVTs to produce an abnormally low TOPF word reading score (33% vs. 16%). Also in Figure 1, it is evident that even when using different base rate criteria to define abnormality, individuals remained at a higher risk for producing abnormally low TOPF word reading scores if failing versus passing PVTs. That is, of individuals in the ≥2 PVT fail group, 14% obtained a TOPF word reading score that was lower than their demographically predicted TOPF word reading score to an extent seen in 5% or less of the ACS normative sample. In comparison, only 7% of individuals in the PVT pass group produced such a discrepancy. Similarly, 55% of the ≥2 PVT fail group versus 34% of the PVT pass group obtained a TOPF word reading score lower than TOPF demographically predicted score to an extent equivalent to a 25% normative base rate. The difference in proportion of abnormally low TOPF obtained scores between the ≥2 PVT fail group and the PVT pass group was statistically significant at p < .05 when using a normative sample base rate of 25% (χ2 = 5.83, p = .013), but not a base rate of 5%. For comparison, Figure 1 also shows the proportion of individuals with Alzheimer’s disease and moderate/severe TBIs (clinical comparison groups obtained from the ACS manual) who obtained abnormally low TOPF scores when various base rate criteria were applied.

Rates of abnormally low TOPF word reading scores when using various normative base rates and comparing PVT pass, PVT fail, and ACS clinical groups.
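The group comparison at the 10% base rate can be illustrated with a Pearson χ2 test on a 2 × 2 table. The cell counts are not reported in the text, so the counts below are assumptions chosen to agree with the reported group sizes (n = 42 and n = 160) and the rounded percentages (33% and 16%); they should not be read as the study data. The function name is hypothetical and no continuity correction is applied.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table
           low   not low
    row 1    a      b
    row 2    c      d
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# ASSUMED counts, consistent with 33% of 42 and 16% of 160 after rounding.
fail_low, fail_not_low = 14, 28    # >=2 PVT fail group (n = 42)
pass_low, pass_not_low = 25, 135   # PVT pass group (n = 160)

chi2 = chi_square_2x2(fail_low, fail_not_low, pass_low, pass_not_low)
print(f"chi-square = {chi2:.2f}")  # about 6.70 with these assumed counts
```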
An ROC analysis examining the accuracy of the word reading obtained versus predicted discrepancy normative base rate in classifying those failing versus passing PVTs produced an AUC of 0.67 (see Figure 2), whereas an AUC of 0.70 or higher is needed to demonstrate acceptable discrimination (Larrabee & Berry, 2007; Ross, Millis, Krukowski, Putnam, & Adams, 2004). Using a normative base rate of 7.2% as a cutoff score resulted in a false positive rate of 10% (i.e., specificity of 90%) and a sensitivity of 19%.

ROC curve demonstrating TOPF discrepancy base rates as a means for identifying those failing versus passing PVTs.
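As a reminder of what these ROC statistics summarize, the sketch below computes an AUC and the sensitivity and specificity at a cutoff from discrepancy base rates. The data are invented toy values, not study data, and the function names are hypothetical. Lower base rates are treated as more indicative of invalid performance.

```python
def auc_lower_is_positive(invalid: list, valid: list) -> float:
    """AUC as the probability that a randomly chosen invalid case has a
    lower base rate than a randomly chosen valid case (ties count half)."""
    wins = 0.0
    for i in invalid:
        for v in valid:
            if i < v:
                wins += 1.0
            elif i == v:
                wins += 0.5
    return wins / (len(invalid) * len(valid))


def sensitivity_specificity(invalid: list, valid: list, cutoff: float):
    """Classify as invalid when the base rate is at or below `cutoff`."""
    sensitivity = sum(x <= cutoff for x in invalid) / len(invalid)
    specificity = sum(x > cutoff for x in valid) / len(valid)
    return sensitivity, specificity


# Toy base rates (percent) for illustration only.
toy_invalid = [2, 5, 12, 30, 45]
toy_valid = [8, 20, 35, 50, 60, 75]

print(auc_lower_is_positive(toy_invalid, toy_valid))
print(sensitivity_specificity(toy_invalid, toy_valid, cutoff=7.2))
```

An AUC of 0.50 corresponds to chance-level discrimination, which is why values below 0.70 are regarded as inadequate for classification purposes.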
Discussion
The current study examined whether those failing PVTs are also likely to underperform on the TOPF. To answer this question, TOPF performance was compared across PVT fail and PVT pass groups, and the frequency of abnormally low TOPF scores in the PVT fail group was compared with that found in other clinical groups. The latter analysis was conducted using ACS features that make the TOPF unique among word reading based premorbid predictors. The TOPF allows the comparison between an examinee’s obtained word reading performance and the word reading performance predicted by his or her demographic information. Not only does the ACS package provide the magnitude of the difference between obtained and predicted standardized TOPF word reading scores, it also allows users to see the frequencies at which such differences occurred in the normative sample. In doing so, the ACS package provides a means for determining the validity of TOPF performance in estimating premorbid intellectual functioning. When an examinee’s obtained TOPF word reading score is abnormally low relative to his or her TOPF demographically predicted word reading score, as demonstrated by a low normative base rate of this difference, the likelihood that TOPF word reading provides an accurate representation of premorbid functioning decreases. For the purposes of the current study, a discrepancy was considered abnormal if occurring in 10% or less of the normative sample.
Base Rates for Abnormally Low TOPF Performance
The current study found that individuals failing ≥2 PVTs were more than three times as likely to obtain abnormally low TOPF word reading scores as those in the normative sample and approximately twice as likely as those in our clinical sample of individuals passing PVTs. Furthermore, the proportion of individuals failing ≥2 PVTs who produced abnormally low TOPF word reading scores was roughly equivalent to that of the ACS Alzheimer’s dementia group and twice that of the ACS moderate/severe TBI group. This is salient given that previous research suggests word reading performance to not accurately estimate premorbid ability in many individuals of these clinical groups (Leritz et al., 2008; Mathias et al., 2007; McFarlane et al., 2006; Patterson et al., 1994; Pearson, 2009).
While the current study defined abnormality as occurring in 10% or less of the normative sample, a similar pattern was also found when using base rates of 5% and 25%, although the group difference was not statistically significant at the 5% base rate. Overall, these findings provide evidence that underperformance on TOPF word reading is more likely in individuals failing validity testing versus individuals passing validity testing. This conclusion is further supported by the study’s findings that individuals failing versus passing PVTs obtained significantly lower TOPF word reading standard scores and had significantly greater discrepancies between their FSIQ estimated scores derived from TOPF word reading performance versus those derived from their simple demographics.
Use of ≥2 PVT and 1 PVT Fail Groups
In comparing base rates between those failing ≥2 PVTs and those passing all PVTs, a known-groups design approach was employed. Such a method has the advantage of comparing characteristics of two groups believed to exemplify two poles of a given construct while minimizing the likelihood of misclassification. For the current study, those in the ≥2 PVT fail group are believed to be true-invalid test-takers and those in the PVT pass group are believed to be true-valid test-takers. In addition to base rate comparisons, several analyses were also conducted to examine the differences between group means on TOPF variables using ANOVAs. A known-groups approach can also be employed for these analyses by examining the results of the pairwise post hoc comparisons between those passing all PVTs and those failing ≥2 PVTs.
At the same time, analyses comparing group mean performance also included a group of patients failing 1 PVT. While research suggests that the validity status of such patients may be ambiguous (Victor et al., 2009), such cases occur frequently in clinical settings and it was believed that the TOPF performance of such individuals would also be of interest to clinicians. While this group is likely heterogeneous, consisting of both true-valid and true-invalid test-takers, it is of interest that this group also performed significantly below the PVT pass group. Interestingly, while those failing 1 PVT performed significantly worse than those passing all PVTs across outcome variables listed in Table 5, their scores were consistently better than those failing ≥2 PVTs. Such findings suggest that as confidence in performance invalidity increases, TOPF word reading performance decreases, a conclusion further supported by the finding that number of PVTs failed was inversely correlated with TOPF word reading performance.
Comparison to Previous Research
The findings of the current study are consistent with those seen in two of three previous studies that examined associations between PVT failure and word reading test performance. Davis et al. (2012) and Sawyer et al. (2014) found performance invalidity to negatively impact performance on the NAART and WRAT-4, respectively. Conversely, the current findings are at odds with those of Whitney et al. (2010), who concluded that word reading test performance is resistant to validity test failure as indicated by TOMM classification. Of note, however, study inclusion criteria stated by Whitney et al. indicated that only a subset of potential patients were administered the TOMM and thus eligible for the study. The choice of administering the TOMM was influenced by clinical observations of the patient’s credibility; patients “whose reported symptoms of medical or neurological illness raised suspicion of negative response bias” were included (Whitney et al., 2010, p. 198) and any patients who had an incentive to perform well on testing were excluded. This is in contrast to the inclusion/exclusion criteria used by the current study as well as those reported in both Davis et al. and Sawyer et al. Unlike Whitney et al., these later studies did not intentionally select individuals who raised suspicion of response bias, or exclude individuals with an incentive to do well, practices that could potentially increase base rates of malingering in the sample. This is notable because inflating malingering base rates would potentially increase the rate of false negatives in the PVT pass group by reducing negative predictive power.
Whitney et al. (2010) also differed in their approach to classifying patient test performance as valid or invalid. Davis et al. (2012), Sawyer et al. (2014), and the current study each administered multiple PVTs to patients, and separated patients according to whether they passed all PVTs or failed two or more PVTs, a practice purported to better discriminate between valid and invalid examinees as compared with the practice of classifying individuals based on pass or failure of a single test (Chafetz et al., 2015; Larrabee, 2008; Odland et al., 2015). Conversely, Whitney et al. classified patients solely on TOMM performance. Such an approach would be expected to further amplify rates of false negatives in the PVT pass group, as research shows that the TOMM, when not used in combination with other measures, produces false negative rates (1 − sensitivity) of approximately 50% (Bashem et al., 2014; Buddin et al., 2014; Greve, Ord, Curtis, Bianchini, & Brennan, 2008; Schroeder et al., 2013). Differences in the findings between Whitney et al. (2010) and the other studies could, therefore, potentially be due to weaker separation of true valid and true invalid examinees secondary to the former’s group assignment approaches.
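The base rate argument in the two preceding paragraphs can be illustrated with Bayes' rule. The following is a minimal sketch, not an analysis from the study: the 50% sensitivity is taken from the TOMM literature cited above, whereas the 90% specificity and the three base rates are assumptions chosen only for illustration, and the function name is hypothetical.

```python
def negative_predictive_power(base_rate, sensitivity, specificity):
    """Probability that an examinee who passes the PVT is truly valid.

    base_rate   -- proportion of truly invalid examinees in the sample
    sensitivity -- proportion of invalid examinees the PVT detects
    specificity -- proportion of valid examinees the PVT passes
    """
    true_negatives = (1.0 - base_rate) * specificity
    false_negatives = base_rate * (1.0 - sensitivity)
    return true_negatives / (true_negatives + false_negatives)


# Sensitivity of 0.50 follows the TOMM figure cited in the text.
# Specificity of 0.90 and the base rates below are illustrative assumptions.
for base_rate in (0.10, 0.30, 0.50):
    npp = negative_predictive_power(base_rate, sensitivity=0.50, specificity=0.90)
    print(f"base rate {base_rate:.0%}: negative predictive power = {npp:.3f}")
```

Under these assumed values, negative predictive power falls from about .94 at a 10% base rate to about .64 at a 50% base rate, which is the sense in which a sample enriched for invalid responders would leave more false negatives in a single-test PVT pass group.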
Consideration of Level of Education
A major contribution of the current study is that it demonstrates that PVT failure negatively impacts word reading test performance independent of the impact of level of education. In each of the three previous studies examining the relationship between PVT failure and word reading test performance, those failing PVTs had lower educational attainment than those passing PVTs. This is problematic given that education level is strongly correlated with word reading test performance (Pearson, 2009).
The current study demonstrated an association between increased PVT failure and lower word reading test performance, after removing the effects of education, in two ways. First, a partial correlation was used to demonstrate that number of PVT failures significantly related to, and accounted for 7.4% of the variance of, TOPF word reading score even after controlling for education. Additionally, many of the analyses in this study used as their outcome variable the discrepancy between an obtained word reading score and a demographically predicted word reading score. In doing so, each individual’s TOPF word reading performance was considered in relation to his or her simple demographic information, obviating the impact of educational differences across PVT status groups. For example, the demographically predicted TOPF word reading standard score of a 50-year-old White male with 14 years of education, who works in customer service, is 100. The demographically predicted TOPF word reading standard score of an individual who completed 10, as opposed to 14, years of education, with otherwise identical demographic information, is 89. If these same two individuals both obtained an actual TOPF word reading score of 85, the magnitude of the discrepancy between their predicted and obtained scores would differ as a function of level of education. That is, the individual with 14 years of education would produce a 15-point discrepancy, whereas the individual with 10 years of education would produce a 4-point discrepancy. Thus, by considering the discrepancy between demographically predicted performance and actual performance, as opposed to considering actual performance alone, these analyses effectively correct for education. As a result of these two approaches, findings from this study demonstrate that differences in TOPF word reading performance seen between those failing and passing PVTs cannot be explained by differences in level of education.
Differences in TOPF word reading performance between individuals passing and failing PVTs are best attributed to rate of PVT failure/degree of test invalidity.
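The worked example of the two examinees with 14 and 10 years of education can be restated as a short calculation. This is only a sketch: the predicted scores of 100 and 89 and the obtained score of 85 are the values given in the text, the demographic prediction itself comes from the ACS software and is not reproduced here, and the function name is hypothetical. The conversion to standard deviation units assumes the usual standard score metric (SD = 15).

```python
STANDARD_SCORE_SD = 15  # assumed standard score metric (mean 100, SD 15)


def predicted_minus_obtained(predicted, obtained):
    """Discrepancy in standard score points; positive means obtained < predicted."""
    return predicted - obtained


# Values from the worked example in the text: years of education -> predicted score.
predicted_by_education = {14: 100, 10: 89}
obtained_score = 85  # same obtained score for both hypothetical examinees

discrepancies = {
    years: predicted_minus_obtained(predicted, obtained_score)
    for years, predicted in predicted_by_education.items()
}

for years, points in discrepancies.items():
    sd_units = points / STANDARD_SCORE_SD
    print(f"{years} years of education: {points}-point discrepancy ({sd_units:.2f} SD)")
```

The same obtained score thus corresponds to a discrepancy of 15 points for one examinee and 4 points for the other, which is why the discrepancy, rather than the raw obtained score, is the education-adjusted outcome.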
Clinical Significance
Although the mean TOPF scaled score for those failing 2 or more PVTs fell within the average range of functioning, the findings are still of notable clinical significance. On average, the difference in TOPF word reading performance between those passing and those failing PVTs was approximately two thirds of a standard deviation. Additionally, those failing PVTs performed approximately two thirds of a standard deviation (9 standard score points) below that expected given their demographic information on TOPF word reading. Such a difference is of the same magnitude as that often seen in literature comparing processing speed in healthy controls versus patients with moderate/severe TBI (e.g., Donders & Strong, 2015), a difference concluded to be of both statistical and clinical significance. Additionally, we believe such a discrepancy, when speaking specifically to premorbid estimates, to be especially important. For example, definitions for mild neurocognitive disorder (ND) in the current DSM require a one standard deviation decline from premorbid abilities. Thus, if an examinee’s premorbid estimate were two thirds of a standard deviation below their actual premorbid ability, this could easily cause the examinee to be classified as normal when they would otherwise be classified as mild ND or as mild ND when they would otherwise be classified as major ND. Additionally, it is important to note that group mean comparisons often mask significant findings within individuals of the group. For this reason, the proportion of individuals with abnormally low TOPF actual relative to predicted scores was analyzed and compared across groups. In these analyses, reported in Figure 1, the authors found that one third of the PVT fail group produced abnormally low TOPF scores and that this was twice the proportion seen in a clinical group passing PVTs and three times that seen in the normative sample.
Despite the finding that TOPF word reading performance is often suppressed in the presence of failed validity testing, the current results do not support the TOPF word reading test base rate comparison as an embedded measure of performance validity. Specifically, the present study used a receiver operating characteristic (ROC) analysis to examine the accuracy of the TOPF word reading obtained versus predicted discrepancy normative base rate provided by ACS software as a validity indicator. Analyses found that acceptable specificity (≥90%) was achieved when a normative base rate of 7.2% was used as a cutoff score; however, such a cutoff score resulted in only 19% sensitivity and the AUC, overall, was only 0.67. Multiple authors (e.g., Hosmer & Lemeshow, 2000; Larrabee & Berry, 2007; Ross et al., 2004) recommend that an AUC of 0.70 or greater is needed to indicate that an instrument provides acceptable discrimination. Thus, the current findings indicate that the TOPF discrepancy base rate does not provide adequate discrimination as an embedded validity measure.
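For readers less familiar with these indices, the sketch below shows how sensitivity, specificity, and AUC are obtained when a low normative base rate is treated as the sign of invalidity. It is an illustration only: the two lists of base rates are invented toy values, not data from the current study, the function names are hypothetical, and only the 7.2% cutoff is taken from the text.

```python
def sensitivity_specificity(invalid_base_rates, valid_base_rates, cutoff):
    """Flag a case as invalid when its normative base rate is at or below the cutoff."""
    flagged_invalid = sum(rate <= cutoff for rate in invalid_base_rates)
    passed_valid = sum(rate > cutoff for rate in valid_base_rates)
    sensitivity = flagged_invalid / len(invalid_base_rates)
    specificity = passed_valid / len(valid_base_rates)
    return sensitivity, specificity


def area_under_curve(invalid_base_rates, valid_base_rates):
    """AUC as the probability that a randomly chosen invalid case has a lower
    base rate than a randomly chosen valid case (ties count as one half)."""
    wins = 0.0
    for invalid_rate in invalid_base_rates:
        for valid_rate in valid_base_rates:
            if invalid_rate < valid_rate:
                wins += 1.0
            elif invalid_rate == valid_rate:
                wins += 0.5
    return wins / (len(invalid_base_rates) * len(valid_base_rates))


# Toy base rates (percent of normative sample), invented for illustration only.
toy_invalid = [3.0, 6.5, 12.0, 25.0, 40.0]
toy_valid = [5.0, 15.0, 30.0, 45.0, 60.0, 75.0, 80.0, 90.0, 95.0, 99.0]

sens, spec = sensitivity_specificity(toy_invalid, toy_valid, cutoff=7.2)
auc = area_under_curve(toy_invalid, toy_valid)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, AUC = {auc:.2f}")
```

The toy values separate the groups far better than the study data did, so the printed indices should not be read as estimates of TOPF accuracy; the point is only that a cutoff fixed to hold specificity at or above 90% can leave sensitivity low when the two distributions overlap.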
It is important to note that there are cases where TOPF word reading performance will be substantially lower than what would be expected based on demographic information, and yet such performances still provide valid, stable, and uncompromised representations of word reading ability. Such would be the case, for example, if an individual obtained an average level of educational attainment despite having a below average reading aptitude. In general, however, an abnormally low obtained word reading score versus demographically predicted TOPF word reading score raises the suspicion that current clinical factors (whether invalidity or a neurological condition), as opposed to premorbid factors, are influencing an examinee’s TOPF word reading performance. As the degree of the discrepancy between TOPF obtained word reading scores versus TOPF demographically predicted scores increases, the likelihood of the TOPF performance being a valid estimate of premorbid ability decreases.
Limitations
Limitations of the current study include use of a predominantly Caucasian sample collected from a single university medical center. Thus, racial and regional differences in the association between PVT performance and word reading test performance could not be examined despite these factors being considered in analyses which incorporated demographic predictions of TOPF performance and FSIQ. Additionally, the study sample was above average in education, which likely explains why demographically predicted estimates of FSIQ were also higher than the normative mean. These factors should be remembered when considering the generalizability of the results to specific individuals, especially to those with differing races, levels of education, and geographical locations. Finally, the results should not be generalized to individuals diagnosed with dementia, left hemisphere stroke, or verbal learning disorders, groups excluded from the current study due to the impact of these pathologies on word-reading tests and/or PVTs.
Because the sample is composed of patients with a variety of neurologic and psychiatric diagnoses, a uniform indication of injury or condition severity is not available. Thus, a potential limitation of the study is that there is no quantitative means for comparing injury or severity level across groups; however, diagnostic heterogeneity across PVT fail and pass groups is common across the PVT literature (e.g., Davis & Millis, 2014; Meyers et al., 2014; Schroeder & Marshall, 2011; Victor et al., 2009). Furthermore, qualitative comparison between the PVT fail and pass groups of the current study suggests that those in the ≥2 PVT fail group would not be expected to have greater cognitive dysfunction than those in the PVT pass group. For example, moderate/severe brain injury made up approximately 10% of the PVT pass group and only 2% of the PVT fail group. Patients listed as moderate/severe TBI, Mild Cognitive Impairment, or Cognitive Disorder NOS comprised 33% of the PVT pass group, and only 5% of the PVT fail group. Conversely, psychiatric disorders including somatoform or conversion, mild TBI, and unsubstantiated cognitive decline made up the majority of patients in the ≥2 PVT fail group. This suggests that those in the fail group are less severely cognitively impaired and that impairment severity is not responsible for group TOPF differences.
One might wonder if the differences between PVT status groups evident within the mixed sample might also emerge when looking specifically at certain diagnostic groups. While the use of clinical samples heterogeneous with respect to diagnosis is common within the PVT literature, use of mild TBI samples is also common given that mild TBI is a commonly alleged etiology of cognitive difficulties in forensic neuropsychological cases. Unfortunately, only a small proportion of the current sample reported cognitive complaints secondary to mild TBI, and thus statistical comparison between PVT status groups with mild TBI would have been notably underpowered. At the same time, it is worth mentioning, from a purely descriptive standpoint, that the 5 patients with mild TBI failing ≥2 PVTs averaged a TOPF standard score of 91.4, which was lower than that of patients passing validity testing with either mild TBI (TOPF = 103.7; n = 7) or moderate/severe TBI (TOPF = 95.2; n = 15). Statistical comparison between PVT groups within a mild TBI sample large enough to provide adequate power would help to demonstrate that these results can be easily generalized to certain specific populations, which would be expected given that performance invalidity should manifest similarly regardless of the reported reason for cognitive complaints.
Conclusions
Overall, the current study, along with two of three previous studies examining relationships between validity test failure and word reading test performance, found performance invalidity to negatively impact word reading test performance. Furthermore, the current study found that even when word reading test performance is combined with demographic information to estimate premorbid FSIQ, these estimates are significantly lower in those failing versus passing performance validity testing. Given these findings, it is recommended that clinicians avoid estimating premorbid intellectual abilities based on word reading test performance in individuals who fail performance validity testing. In such individuals, accurate estimates of premorbid functioning are more likely to be obtained by using nonperformance based (i.e., demographic) information exclusively.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
