Abstract
It is almost self-evident that test results will be unreliable and misleading if those undergoing assessments do not make a full effort on testing. Nevertheless, objective tests of effort have not typically been used with young adults to determine whether test results are valid or not. Because of the potential economic and/or recreational benefits of obtaining the diagnosis of attention deficit hyperactivity disorder (ADHD) or a learning disability (LD), concerns have been raised regarding the ease with which unimpaired young adults can feign either of these disorders to gain access to test accommodations, stimulant medication, or disability benefits. Much evidence has been presented recently regarding the need for symptom validity tests (SVTs) in assessment of college-aged students seeking diagnoses of LD and/or ADHD. Four cases are presented here in which intelligence and other test scores of young adults greatly underestimated their actual abilities, owing to poor effort that sometimes went undetected. Selected effort tests for use with young adults are discussed. Objective testing of effort is recommended to avoid misinterpreting invalid test data, which is why the use of effort tests is now standard practice in forensic neuropsychology.
All psychological testing is an attempt to sample specific behaviours when an individual is given particular tasks to complete. Psychological testing demands adherence to standardized administration procedures to ensure reliability and validity of test results (Sattler, 1988). Standardization permits the clinician to establish procedural consistency and control over the administration of psychological tasks by holding constant the testing protocol. The ideal procedural methodology makes sure that, as far as possible, examinees are subjected to the same tasks in the same manner (Anastasi, 1988; Kamphaus, 1993; Kaufman, 1994). An integral part of this procedural methodology is adherence to the recommended guidelines for establishing rapport and the promotion of optimal effort levels across psychological and neuropsychological tasks.
An attempt is made by the psychological administrator to elicit the best performance from the student during the testing. The tester tries to engage cooperation and to encourage the participant to pay close attention to the task demands and take the testing seriously (Reitan & Wolfson, 1997). The tester tries to establish environmental conditions that minimize distractions, increase active participation, stimulate interest in the tasks, and minimize anxiety and fear responses (Kaufman, 1994; Prifitera & Saklofske, 1998; Prifitera, Saklofske, & Weiss, 2005). Hence, effective performance hinges on the ability of the examiner to establish rapport with the participant (Kaufman & Lichtenberger, 2000) and the ability to make sure that the participant works hard during the psychological or neuropsychological evaluation.
Despite a general acceptance of the need for optimal conditions to encourage the participant’s best effort on testing, little research has been done that details how to measure effort levels in participants objectively and how to identify when effort is suboptimal. A large percentage of clinicians use clinical judgment to determine the degree of effort put forth on the tasks by their clients; however, research has suggested that clinicians’ judgments of effort levels are inaccurate, deficient, and clinically faulty (Dawes, Faust, & Meehl, 1989; Oldershaw & Bagby, 1997). Faust, Hart, and Guilmette (1988) added to this research by demonstrating that when children were coached into demonstrating neurocognitive impairment, most clinicians failed to detect malingering. Faust, Hart, Guilmette, and Arkes (1988) replicated the study with adolescents and found the same results. Clinical judgment may therefore not be enough to identify invalid data owing to poor effort.
Poor performance on psychological and neuropsychological tasks may result from actual cognitive impairment and define a clinical picture accurately representing the participant’s overall ability profile. However, individuals who deliberately put forth poor effort will produce test results that are invalid and unreliable. Some clinicians discern poor effort by examining consistency levels across task performance (Kaufman, 1994; Reitan & Wolfson, 1997). Others simply assume that children and young adults have no vested interest in performing poorly in any psychological or neuropsychological evaluation. The assumption that these participants will exert consistent and optimal levels of effort in the testing has been challenged by a number of researchers (Faust, Hart, & Guilmette, 1988; McCaffrey & Lynch, 1992; Rogers, 1997). In particular, concerns have been raised regarding the ease with which young adults can feign the symptoms of ADHD and/or LD to access academic accommodations and other disability-related benefits offered at the postsecondary level (e.g., Harrison, 2006; Mullis, 2003; Suhr, Hammers, Dobbins-Buckland, Zimak, & Hughes, 2008; Sullivan, May, & Galbally, 2007). Recent research shows that there is good reason to ensure that optimal effort is exerted by the client being evaluated in these types of assessments. Indeed, in addition to gaining access to academic accommodations, Canadian postsecondary students diagnosed with LD or ADHD can receive provincial and federal bursaries for purchasing computers and assistive adaptive technology software, obtain tax benefits as a person with a disability, and access medications such as stimulants for illicit use (Azar, 2008; Conti, 2004; Harrison, 2004; Harrison, Edwards, & Parker, 2007). 
As such, there appear to be many secondary gain incentives for students in Canada to feign LD or ADHD, and research has clearly shown that effort must be evaluated in such situations to ensure detection of possible response bias (Iverson, 2006).
Children with ADHD often perform poorly on a variety of academic and psychological tasks and demonstrate significant motivation difficulties, even though there is evidence of adequate cognitive capacities (Barkley, 1999; Douglas, 1983; Flaro & Green, 2000). Of particular concern, it has been demonstrated that those motivated to feign LD can do so easily on tests of phonological awareness, word decoding, reading and processing speed (Harrison, Edwards, & Parker, 2008), and other tests of academic fluency (Sullivan et al., 2007). Those wishing to feign ADHD can do so in a manner that is undetectable when filling out self-report symptom inventories (Fisher & Watkins, 2008; Harrison et al., 2007; Jachimowicz & Geiselman, 2004; Quinn, 2003), measures of processing speed (Harrison et al., 2007), neuropsychological tests (Suhr et al., 2008), and on computerized vigilance tests (e.g., Henry, 2005; Leark, Dixon, Hoffman, & Huynh, 2002; Quinn, 2003). Such research evidence supports the need for SVTs in assessment of students seeking accommodations at the postsecondary level, as the base rate for malingering or symptom exaggeration in this group of students may meet or exceed the rate of malingering found in other medico-legal contexts (Alfano & Boone, 2007; Harrison & Edwards, 2010; Suhr et al., 2008; Sullivan et al., 2007).
The lack of objective measures for assessing effort levels in psychological and educational testing results primarily from an emphasis on determining the antecedent causes of poor effort. This may be a viable theoretical position, but the fact of the matter is that poor effort contaminates the validity and the reliability of the test results, which sometimes leads to serious misinterpretations and misdiagnoses. In the literature, little attention has been paid to the consequences of poor effort, which include inconsistent task performance, inconsistent test results over time, overestimation of impairment, and misdiagnosis. To illustrate the importance of effort testing in psychoeducational assessments, several case studies will be presented.
The first two cases report data from students who admitted to having deliberately malingered after being confronted with data from symptom validity tests after their initial assessment; retest data are provided for both participants. The third student demonstrated a decline in test performance only when being assessed to document his need for accommodations at the postsecondary level. Given the significant decline in his scores over time, poor effort appears the most parsimonious explanation for his sudden deterioration in test performance. The last student likely performed poorly initially due to antisocial personality traits and lack of investment in the assessment. In all cases, specific details that would allow for client identification have been altered, but the actual test data and salient information have remained relatively intact.
Student 1: Kim’s Reading Fluency Increased by 21 Points
Kim was a 1st-year university student who requested an assessment, on referral from a school counsellor, to determine if he might have LD or ADHD. He was not achieving the marks expected by his family (his marks to that point had been solid Bs). As shown in Table 1, his initial test scores would suggest someone of modest intellectual ability who may have actually been overachieving academically relative to his ability; however, he also performed poorly on the Word Memory Test (WMT; Green, 2003; Green, Allen, & Astner, 1996; Green & Astner, 1995) and the Victoria Symptom Validity Test (VSVT; Slick, Hopp, Strauss, & Thompson, 1997).
Table 1. Scaled Scores for Students 1 and 2 Who Admitted Malingering in Their First Assessment
Note: Scores in bold indicate significant changes in scores.
After receiving feedback that his test scores could not be interpreted due to low effort, Kim returned to the referring counsellor and subsequently admitted that he had indeed deliberately attempted to feign cognitive problems in the hopes that he could secure academic accommodations and supports. He had lost face with his family due to his lower-than-expected marks and hoped that by being diagnosed as LD or ADHD he could regain their respect and have a legitimate reason for his unacceptable performance.
Upon retesting, it is noteworthy that many of his scores on timed tests improved remarkably. For example, his reading fluency score on the Woodcock-Johnson Psycho-Educational Battery–III (WJPB; Woodcock, McGrew, & Mather, 2001) improved by 21 points, and his VCI and POI scores on the Wechsler Adult Intelligence Scale–III (Wechsler, 1997) improved by one standard deviation (SD). With his improved effort, there was no evidence to support the presence of LD. Clearly, his low effort in the first assessment had affected his obtained scores on at least some of the tests administered.
Student 2: Suzie Admits to Symptom Exaggeration
Suzie was a 17-year-old woman with a longstanding identification as a student with an LD who was assessed to update her documentation so as to obtain accommodations and supports at the postsecondary level. Suzie was last tested in Grade 3, and the report was brief and did not provide a clear diagnostic statement. Nevertheless, she had obtained numerous supports and services since that time, including unlimited time on tests and exams, a scribe for tests, purchasing of a laptop for written work, and additional specialized software for reading print material. She wanted to retain all of these supports at the postsecondary level.
Before testing began, Suzie was reminded to invest her full effort in testing and warned that failure to do so would result in her scores being declared invalid. Nevertheless, her scores on two different SVTs (the WMT and the VSVT, see Table 1) both strongly suggested that she was not investing full effort in some or all aspects of the assessment. She was informed that the data could not be interpreted and that the psychologist could not determine whether she actually met the diagnostic criteria for an LD. Suzie became extremely upset and pleaded with the examiner, saying that she had only exaggerated this to ensure that she could continue to obtain the supports on which she had come to rely and that she had feared that her LD had been “cured” and would not show up on the tests administered. She wanted to be retested right away so that her accommodations could be put in place immediately and a laptop could be purchased for her by the federal disability bursary fund; she was informed that this would not be possible. Later, her mother insisted that Suzie had not deliberately feigned but had reduced effort only on tasks that reflected her severe reading impairment; she did not see that this would create a problem with test interpretation and argued that her daughter should be given the benefit of the doubt.
Mother and daughter agreed to a reevaluation of Suzie 6 months after the initial test. While most of the tests administered in the second assessment were different from the first (to avoid practice effects obscuring actual impairments), tests such as the newer WAIS-IV (Wechsler, Coalson & Raiford, 2008) and the WJPB-III were readministered to evaluate the effect of improved effort. Different symptom validity tests were included in the second battery (the DASH [Harrison et al., 2008] and the b Test [Boone et al., 2000]), and her performance was interpreted as within normal limits. As may be seen in Table 1, performance did not improve universally but increased specifically on timed reading tests and timed visual processing tests (one SD higher on processing speed from the WJPB and Digit Symbol on the WAIS). Working memory remained an area of weakness, as did specific tests of arithmetic (not shown in Table 1). The conclusions from the second assessment were that Suzie did show evidence of a lifelong impairment in arithmetic skills secondary to a working memory deficit and that accommodations specifically tailored to this disability would be appropriate. Unlimited time on all exams, however, was not warranted based on her timed reading test performance or any underlying processing problems with word decoding or phonemic awareness. Suzie and her mother accepted this diagnosis but were both disappointed to learn that Suzie would not qualify for a laptop being purchased for her at taxpayers’ expense.
Student 3: Andrew Shows Unusual Deterioration
Andrew had been assessed first in grade school due to parental concerns that he was underachieving. Although no formal diagnosis was provided in the first assessment, he was subsequently provided with academic accommodations in late elementary and high school. He was retested in Grade 11 at the insistence of his parents so that his documentation could be updated for entry into postsecondary education. In this assessment, he was diagnosed as having ADHD because his Verbal Comprehension Index (VCI) score was 23 points higher than both his working memory and processing speed indices on the WAIS-III, and a language-based LD was suggested because there was a 37-point discrepancy between VCI and his obtained score on the WJPB-III reading fluency subtest. This was despite the fact that his score on the Test of Variables of Attention (TOVA; Greenberg, 1991) was normal relative to most other adults, no early history of attention or concentration difficulties was reported, his word reading and pseudoword decoding scores were average or better, written expressive skills on a timed writing test were very superior, and his performance on a timed reading comprehension test (Nelson Denny; Brown, Fishco, & Hanna, 1993) was well within the normal range for young adults.
This documentation was deemed insufficient to provide him the accommodations and supports he requested at postsecondary-level education, and so Andrew and his parents sought an updated assessment privately. As part of the assessment, the evaluator administered the Test of Memory Malingering (TOMM; Tombaugh, 1997). The TOMM has been shown to be a highly specific test but one that is not particularly sensitive to malingering (Gervais, Rohling, Green, & Ford, 2004). Hence, a passing score on the TOMM (a raw score of 45 or better on Trial 2 and a raw score of 45 or better on long-term recognition) does not guarantee full effort, but a score below this level is almost always due to feigning of symptoms. Indeed, apart from a severe dementing process that renders the individual incapable of independent living, no other neurological injury or illness can explain a below average score on this test, especially on the long-term recognition subtest (Iverson, LePage, Koehler, Shojania, & Badii, 2007; Rees, Tombaugh, & Boulay, 2001; Teichner & Wagner, 2004).
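The pass criterion described above can be written out as a minimal sketch. The function and constant names are hypothetical, the cutoff of 45 is the one stated in the text, and the example scores are invented for illustration; they are not data from any case in this article.

```python
# Raw-score cutoff stated in the text for Trial 2 and for long-term recognition.
TOMM_CUTOFF = 45


def tomm_passes(trial2_raw: int, retention_raw: int, cutoff: int = TOMM_CUTOFF) -> bool:
    """Return True only when both raw scores are at or above the cutoff."""
    return trial2_raw >= cutoff and retention_raw >= cutoff


# Invented example scores (hypothetical, not case data).
example_pass = tomm_passes(trial2_raw=50, retention_raw=48)
example_fail = tomm_passes(trial2_raw=44, retention_raw=50)
```

Because the test is described as specific but not particularly sensitive, a True result in this sketch should be read only as "no evidence of poor effort on this measure", not as proof of full effort.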
As shown in Table 2, it is evident that the scores he obtained on the TOMM cannot be explained by any known neurological process other than either severe dementia or malingering. Indeed, the scores Andrew returned on Trial 2 and retention on the TOMM fall more than 9 SDs below those returned by individuals with documented traumatic brain injuries, and 1.5 and 2.7 SDs, respectively, below individuals diagnosed with dementia who could no longer live independently in the community (Tombaugh, 1997). Clearly, since Andrew could live independently and was able to attend school full time, these scores cannot reflect an early dementing process.
Table 2. Scaled Scores for Student 3
The psychological assessor explained away the findings from the TOMM, saying that they were so low precisely because of Andrew’s very severe and debilitating ADHD, which caused him to be so impaired that he could not attend to information in his environment. This rationalization seems to be at odds with Andrew’s actual behaviour. Of note is the fact that Andrew had never been in a motor vehicle accident despite driving for many years, and had performed as a high-level athlete in a sport that requires sustained attention and alertness. Also of note is the pronounced decline in Andrew’s functioning specifically on timed performance tests only when being reevaluated to document his need for extra time on tests and exams. Given that individuals typically improve their performance on psychometric tests after prior experience taking them (Catron & Thompson, 1979), it is difficult to explain his one SD or greater drop in performance on all of the WJPB-III fluency tests, his two SD drop in performance on Symbol Search and Digit Symbol relative to his last assessment, and his two SD drop on the processing speed index of the WAIS. Unusual, too, was his impaired reading speed on the Nelson Denny, a measure of how much of a passage is read in 1 min, even though his overall reading comprehension score over a 20-min period fell within the average range relative to most other adults.
Also difficult to understand was how Andrew’s performance on a test of vigilance and attention (the TOVA) was interpreted as normal when in high school, but his performance on a similar test of vigilance (the Integrated Visual and Auditory Continuous Performance Test [IVA]; Sandford & Turner, 1995) was interpreted as atypical when seen for his reassessment. ADHD is a developmental disability with symptoms that tend to be more impairing in the earlier grades (Hill & Schoener, 1996) and which decrease to some extent with age and with stimulant medication treatment. It is therefore difficult to understand how he could have performed in the normal range when on medication for the second evaluation (in high school), and yet have declined substantially and developed more symptoms of ADHD over the course of 2 years while continuing to take stimulant medication, especially when the normal developmental course for ADHD in general is for symptoms to decline to some extent.
Student 4: Melissa’s Reading Level and FSIQ Increased Significantly Over Time
Melissa was a 16-year-old girl from a high-achieving, wealthy professional family, who was referred for a neuropsychological evaluation by a friend of the family, Dr. Y, who requested assessment of a wide range of cognitive abilities. Dr. Y was concerned that significant reading difficulties might be holding back Melissa’s progress in school. He noted his perception of personality differences between Melissa and her older, highly academic sister. During a clinical interview with the second author, Melissa said that she really did not see the importance of formal academic studies. She indicated that she had problems with reading, saying that she read more slowly than her friends and that she did not like studying because it was too difficult. As a result of her reading difficulties, she claimed that she had become tired of school, which required too much effort.
Melissa was administered a number of effort measures and she achieved normal-range performance on all of them. Hence, we assume that her neuropsychological results were valid and reliable. On a measure of intellectual functioning (Multidimensional Aptitude Battery), her results were in the superior range and at the 93rd percentile. Reading skills on the Wide Range Achievement Test–3 were in the average range (53rd percentile). Listening abilities, memory functioning, abstract abilities, attention, visual search, and alternating mental set were all within normal to above-average ranges.
Assessment of personality functioning with the MMPI-2 indicated the presence of antisocial tendencies. A high score on Scale 4 suggested difficulty incorporating the values and standards of society and suggested that she was prone to engaging in a wide range of antisocial and conduct-disordered behaviour. It would be expected that Melissa would be rebellious toward authority figures. Moreover, the personality test results characterized Melissa as probably being impulsive and having significant difficulty with delaying gratification.
When the current results were compared with those from previous testing, some important discrepancies emerged. In the previous assessment, her intelligence was only average (FSIQ = 94), whereas it was now in the superior range (FSIQ = 122). No objective tests of effort were used in the previous psychological evaluation. The most likely explanation for her poor showing in the initial assessment was that she was not putting forth her best effort, which would be consistent with her antisocial tendencies and her expressed attitude of boredom and lack of motivation in school. In the current assessment, in contrast, she passed all effort measures. In a similar vein, her reading comprehension skills in the previous assessment were noted to be below the normal mean and at only the 23rd percentile. In this assessment, her reading comprehension skills were found to be at the 96th percentile using the Woodcock-Johnson Passage Comprehension subtest. This represents another discrepancy between test results produced over time. Melissa’s personality features and the presence of antisocial tendencies were thought to be contributing to significant differences in test results over time. It was also noted that there was a strong possibility of a mood disturbance. Her mood was slightly elevated when found to be making a full effort but she had previously been through at least one episode of major depression.
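To give a sense of the size of these discrepancies, the following sketch expresses them in standard deviation units. It assumes the conventional IQ metric (mean 100, SD 15) and a normal distribution for converting percentiles to z scores; neither assumption is stated in the text, and the function names are hypothetical. The two reading percentiles may also come from different tests, so their comparison is only approximate.

```python
from statistics import NormalDist

# Assumption: FSIQ is on the conventional metric with SD = 15.
IQ_SD = 15.0


def change_in_sd_units(before: float, after: float, sd: float = IQ_SD) -> float:
    """Difference between two standard scores, expressed in SD units."""
    return (after - before) / sd


def percentile_to_z(percentile: float) -> float:
    """Convert a percentile rank (0-100, exclusive) to a z score, assuming normality."""
    return NormalDist().inv_cdf(percentile / 100.0)


# FSIQ values reported in the text: 94 previously, 122 currently.
fsiq_gain_sd = change_in_sd_units(94, 122)

# Reading comprehension percentiles reported in the text: 23rd, then 96th.
reading_gain_sd = percentile_to_z(96) - percentile_to_z(23)
```

Under these assumptions the FSIQ difference is a little under two SDs and the reading difference is larger still, which is well beyond what retest effects alone would ordinarily be expected to produce.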
Effort Tests for Use in All Psychoeducational Assessments
All four cases discussed above demonstrate the importance and the necessity of using objective measures of effort in psychological and psychoeducational evaluations. Without these effort measures, we must rely on our clinical judgment for the analysis of inconsistent performance. In one of these cases, clinical judgment failed and invalid data were interpreted as valid. Poor effort has serious consequences, including inconsistent task performance, underestimation of the individual’s true abilities, and misdiagnosis.
Comments have traditionally been made about effort in nearly all assessment reports, reflecting the widespread belief that the effort applied to testing is of critical importance to the validity of the test results. It is surprising, therefore, that until recently there were no objective ways to measure effort levels in psychoeducational assessments. Over the past decade, however, several researchers have explored the possibility of testing young adults using symptom validity tests originally devised for use in neuropsychological assessments, such as the WMT (Green, 2003; Green et al., 1996; Green & Astner, 1995), the VSVT (Slick et al., 1997), and the TOMM (Tombaugh, 1997). New effort measures have been created for use in evaluation of adult LD, such as the DASH (Harrison et al., 2008) and the b Test (Boone et al., 2000). The WMT is available in several languages, including German, French, Spanish, and English.
Initial research showed that the Word Memory Test was a viable psychological instrument that could be used even with severely impaired children (Flaro & Green, 2000). In an extension of the latter study, 135 children between the ages of 7 and 18 years, as a group, achieved results on the WMT effort subtests similar to those from parents seeking custody of their children (Green & Flaro, 2003). In this study, most children diagnosed with fetal alcohol syndrome, schizophrenia, bipolar disorder, ADHD, conduct disorder, and LDs demonstrated no difficulty exceeding the adult cutoffs for good effort on the Word Memory Test, as long as they were 7 years of age or older and with a Grade 3 reading level or higher.
Recent research on the Word Memory Test continues to support this instrument as a potentially valuable objective measure of effort in postsecondary-aged students. Even those with severe reading disabilities are able to read and correctly respond to the words presented on the WMT (Larochette & Harrison, 2012), and as such it is not vulnerable to misclassifying postsecondary-level students due to their actual disability. In addition, Sullivan et al. (2007) demonstrated that almost half of those undergoing assessments for ADHD at their university-based clinic failed the WMT, and about 16% of those presenting for assessment of LD failed, suggesting that many young adults may be motivated to feign or exaggerate their symptoms to obtain a diagnosis.
Many adults tested with the WMT have an incentive to appear more impaired than they are because they are seeking compensation for disability. In more than 2,000 adults tested by the second author and by Dr. Roger Gervais (personal communication), 30% of cases failed the WMT effort subtests, compared with only 11% of 263 children tested clinically. In a recent analysis of data from 116 parents seeking custody of children, only two cases failed the WMT effort subtests (Flaro, Green, & Robertson, 2007). This is a very important group because it consists of adults who had a positive incentive to appear competent. The Court and the Department of Social Services relied on the results of the assessment to determine whether these parents were to be given custody of their children. In 65% of cases, custody of their children was denied. This group contained many people with very significant cognitive impairment, which was evident, for example, in the fact that 60% of cases made more than 60 errors on the Category Test and 20% had an FSIQ of less than 80. Nevertheless, 98.2% of cases passed the WMT effort subtests (i.e., scored more than 82.5% correct on immediate recognition, delayed recognition, and consistency of performance across the two subtests). Those judged unfit to be parents were of lower-than-average intelligence (mean FSIQ = 86, SD = 12) but their mean scores on the WMT effort subtests were almost perfect, at 98% correct (SD = 4) on immediate recognition, 98% correct (SD = 3) on delayed recognition and 97% (SD = 4) on consistency. In this sample of adults, the number of false positive results (i.e., cases who failed the WMT despite making a good effort) was zero. Two parents failed the WMT but they both admitted that they had made a poor effort on testing. They had changed their minds in the course of a drawn-out custody battle and now did not want their children returned to their care. A year later, one of these parents returned for testing and passed the WMT.
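The pass criterion quoted in this paragraph (more than 82.5% correct on immediate recognition, delayed recognition, and consistency) can be sketched as follows. The function and constant names are hypothetical, and the sketch covers only the simple cutoff described here, not any further interpretive steps a test manual may specify.

```python
# Cutoff stated in the text: a pass requires MORE than 82.5% on every effort measure.
WMT_CUTOFF_PERCENT = 82.5


def wmt_effort_passes(immediate: float, delayed: float, consistency: float,
                      cutoff: float = WMT_CUTOFF_PERCENT) -> bool:
    """Return True only when all three percent-correct scores exceed the cutoff."""
    return all(score > cutoff for score in (immediate, delayed, consistency))


# Group means reported in the text for parents judged unfit: 98%, 98%, 97%.
unfit_parent_means_pass = wmt_effort_passes(98, 98, 97)
```

Requiring all three scores to exceed the cutoff means that a single low score is enough to fail, which matches the wording of the criterion as given in the text.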
Some SVTs, such as the Victoria Symptom Validity Test and the Validity Indicator Profile (VIP; Frederick, 2003), have been used to identify normal individuals instructed to feign ADHD or LD (Frazier, Frazier, Busch, Kerwood, & Demaree, 2008), but these measures have not yet demonstrated acceptable sensitivity and specificity in clinical populations. Indeed, the only measures to date that show potential for accurately discriminating feigners from those with actual LD or ADHD have been the Word Memory Test (Sullivan et al., 2007) and the Dyslexia Assessment of Simulation or Honesty (DASH; Harrison et al., 2008; Harrison, Edwards, Armstrong, & Parker, 2010). The DASH, however, is an experimental test without a proven track record or independent replication of the findings.
Conclusions
We know that a participant’s level of effort during the assessment will determine the validity and the reliability of the test data. When individuals fail to put forth their best effort, we cannot be sure of their actual ability profile. As demonstrated in the four cases presented in this article, failing to identify less-than-optimal effort levels can have detrimental effects on the individual in terms of diagnosis, placement, programming, treatment intervention, and future educational opportunities. Without some way to measure effort levels objectively during an assessment, conclusions based on test results must be considered tentative at best. At the worst, they must be considered inaccurate and invalid. In the emerging field of symptom validity testing, we now have a number of objective effort measures, including but not limited to the WMT, the DASH, the TOMM, and the VSVT, which provide us with the means to detect objectively when effort is below the level needed to obtain valid test results. In the future, it is likely that any psychologist or neuropsychologist conducting psychoeducational assessments will be required to employ objective effort measures as an integral part of the test battery, just as it is now the accepted practice to incorporate such tests in adult neuropsychological assessments (Bush et al., 2005; Iverson, 2006). In times when resources and supports for students with disabilities may become scarce, it is of the utmost importance for clinicians to ensure that they are not inappropriately providing diagnoses to students who would not otherwise qualify for such a label and who are reducing the resources available to those who truly need such supports and services.
Footnotes
Dr. Green is the owner of Green’s Publishing, and is the developer and publisher of both the Word Memory Test and the Medical Symptom Validity Test.
The authors received no financial support for the research, authorship, and/or publication of this article.
