Abstract
From 1985 to 2009, public school enrollment in elementary and secondary schools rose 26%, and the number of students with disabilities in these schools rose steadily through 2005 (National Center for Education Statistics, 2009). One consequence of that growth has been an increase in the number of students seeking test accommodations under the Americans With Disabilities Act of 1990 (as amended by the ADA Amendments Act of 2008). The most commonly requested test accommodation is extended time, particularly for students with learning disabilities (LDs) and ADHD (Ofiesh, Mather, & Russell, 2005).
Despite the large number of requests for test accommodations over the past 20 years, the process by which eligibility decisions are made for those services has had relatively little empirical guidance. Even the most basic assumption, that students with ADHD will inevitably struggle on timed tests, has never been validated. Moreover, there is no compelling scientific evidence that extended time and a distraction-free environment will ameliorate any purported test-taking deficits in individuals diagnosed with this disorder. As a result, educational institutions and testing organizations are granting accommodations based more on intuition or advocacy than on science.
Some research does show that students with ADHD generally have less academic success than their peers. For example, Barkley (2006) noted that 10% to 39% of children with ADHD are found to have an LD, most often in reading. A recent meta-analysis supported that assertion by identifying a large effect size difference of 0.73 in reading skills in studies comparing students with and without ADHD (Frazier, Youngstrom, Glutting, & Watkins, 2007), and such problems were noted to continue for postsecondary students (Ofiesh, Hughes, & Scott, 2004; Polderman, Boomsma, Bartels, Verhulst, & Huizink, 2010). More generally, studies have documented that students with ADHD are more apt to have lower grades, standardized test scores, and work productivity, as well as higher rates of grade retention (Cordón & Day, 1996; Ofiesh et al., 2004).
In addition to deficits in reading and academic skills, students with ADHD may have weaker learning/study strategies. One study found that college students with ADHD self-report more difficulty than peers in note taking, outlining, summarizing information, and test taking (DuPaul, 2007). DuPaul also reported more difficulties with time management, use of appropriate test strategies, selecting the main ideas, and concentration. They tended to possess a negative attribution style to test performance as well as motivational deficiencies, including less persistence and preference for easier work. DuPaul and Volpe (2009) argued that cognitive (i.e., vigilance and memory) and behavior deficits (i.e., disruptive behavior) could mediate the impact of ADHD on achievement.
Test anxiety is another factor that might impair the test performances of students with ADHD, particularly on high-stakes exams such as the Scholastic Assessment Test (SAT). Because individuals with ADHD are prone to comorbid anxiety disorders, one might think that sitting for high-stakes exams might trigger an anxious reaction (Frazier et al., 2007). If true, it is possible that performance anxiety would impair performance. Indeed, a review of hundreds of studies found that higher test anxiety was consistently associated with lower performance on individual tests administered in laboratory settings, as well as with students’ grade point averages (GPAs) and standardized achievement test scores. A subset of those studies showed that high–test anxiety groups typically took longer than low–test anxiety groups (d = 0.3; Reaser, Prevatt, Petscher, & Proctor, 2007). To this point, college students with LDs have been found to experience significantly more anxiety toward timed, multiple-choice tests, including less confidence and greater amounts of stress; however, no research has examined this relationship for students with ADHD (Heiman & Percel, 2003; Stevens, 2000).
Given the various skill and strategy weaknesses associated with ADHD, coupled with negative self-perceptions and anxiety, it would seem likely that many students with ADHD would be at a disadvantage on standardized tests. However, no evidence exists to confirm that ADHD is actually associated with poor test taking. The purpose of the present study was to examine certain test-taking skills, strategies, and perceptions in high school students with and without ADHD. Because we were interested in speeded test taking, our measures of test-taking skills include reading speed, vocabulary, decoding and comprehension under time constraints, use of time during the test, movement around the comprehension test (navigation), and number of test items completed during a time period. Specifically, we compared students with and without ADHD on reading tasks (i.e., reading speed, decoding, vocabulary, and comprehension), test-taking variables (i.e., time usage, switches from passage to question to response, and reading strategy use), and self-perception variables (i.e., test anxiety, self-perception of testing and reading). We were particularly interested in differences in reading comprehension and in which variables best predicted this outcome. We were also interested in comparing high and low comprehension performers, to see which variables were associated with doing well or poorly on a speeded reading comprehension test similar to those found on most high-stakes exams. Because TestTracker is a relatively new assessment system, we also wanted to see how typical students would perform on the tasks, and so the students with ADHD were excluded from some analyses.
Method
Participants
A total of 1,003 high school students from 15 schools in New York State participated in the study. Each school district agreed to offer students an opportunity to have their test-taking skills assessed in exchange for feedback on their performance. A total of 784 of these students were included in analyses as participants (38 with ADHD). A number of students were not included in this particular study for the following reasons: (a) 128 had a different or comorbid diagnosis, (b) 3 did not complete all aspects of the assessment, (c) 21 (1 ADHD) had invalid profiles due to low effort scores, and (d) 67 (2 ADHD) had scores that fell more than two standard deviations from the mean on one of the reading speed (44), decoding (9), vocabulary (13), or comprehension (1) measures. Many of the students listed above either gave incomplete effort, lost track of what they were doing during a task, made a computer error, or performed in an irregular fashion (i.e., reading speed more than 1,000 words per minute).
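The two-standard-deviation exclusion rule described above can be sketched in a few lines. This is an illustration only, not the authors' procedure: the function name `flag_outliers`, the use of the sample standard deviation, and the reading speeds are all assumptions introduced here.

```python
from statistics import mean, stdev

def flag_outliers(scores, n_sd=2.0):
    """Return one boolean per score: True if the score lies more than
    n_sd sample standard deviations from the sample mean."""
    m = mean(scores)
    sd = stdev(scores)  # sample SD; the paper does not say which SD was used
    return [abs(x - m) > n_sd * sd for x in scores]

# Hypothetical reading speeds in words per minute (not study data).
wpm = [210, 195, 230, 205, 220, 1100, 215, 200, 225, 190]
flags = flag_outliers(wpm)
kept = [x for x, flagged in zip(wpm, flags) if not flagged]
```

In this made-up sample only the 1,100 wpm value is flagged. Note that a single extreme score inflates both the mean and the standard deviation, so a rule of this kind is sensitive to how many irregular cases are present.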
Students were considered to meet criteria for ADHD if they self-identified, had a professional evaluation indicating an ADHD diagnosis, and were recognized by the school district as a student with special needs (Individualized Educational Plan [IEP] or 504 Plan). These students also had to be receiving test accommodations in the schools. In each school, a professional staff member confirmed that the students had a diagnosis and 504 Plan, and were eligible for test accommodations. We were unable to test children individually, so we relied on school determinations. To do this systematically, we required formal evidence of school acknowledgment of ADHD. It is possible that our screening method excluded some students with ADHD who were not receiving any school services. Students with a different primary diagnosis (e.g., autism, anxiety, bipolar disorder, LDs) were excluded from this study, even if ADHD had been noted as a comorbid diagnosis. This was done to recruit a more “pure” ADHD group and to avoid cases in which ADHD symptoms may be the result of a different disorder. We could not reliably ascertain ADHD subtype information or current use of medication at the time of testing. Students in the comparison group were drawn from the same classes and had to be free of diagnosed disorders or special education services and accommodations.
Table 1 lists the demographic data for both groups. Analyses showed no significant differences between groups for age, or distributions by grade, sex, race, or parent education. The sample did include more females than males; however, this difference is typical for a college-bound population. The sample did not include dropouts, vocational trainees, or students with behavioral, emotional, or cognitive disabilities, most of which include higher rates of males.
Table 1. Demographic Data for Students With and Without ADHD.
Materials
TestTracker
TestTracker is a computer-based (online) assessment system designed to measure test taking skills and behaviors similar to those needed on high-stakes tests. In this study, it was used to deliver all reading tests and questionnaires as well as to record performance and monitor behavior. TestTracker measures the activities of a test taker and the time spent on each activity to the millisecond. Once TestTracker is initiated, it guides students through a variety of tasks and questionnaires. Briefly, these include tasks of reading speed, comprehension, vocabulary, and decoding, as well as questionnaires on demographics, ADHD symptoms, strategy use, perceived reading speed, and need for extended time. All data gathered through TestTracker were routed to a main server for storage.
Reading speed test
Participants were presented with a reading passage in which they were instructed to read for comprehension, and as if time were an important factor. The reading passage was 389 words in length and had a Flesch–Kincaid readability level of 12.0 (Brown, Fishco, & Hanna, 1993). Using TestTracker, students clicked on a start button to see the passage and were instructed to click on a stop button when they had completed reading. This test took between 1 and 5 min. The number of words read per minute was recorded.
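As a rough illustration of how a words-per-minute score could be derived from the start and stop clicks, the sketch below divides the passage length by the elapsed time. The function name and the millisecond input are assumptions (TestTracker's actual implementation is not described here); the 389-word length is taken from the text.

```python
PASSAGE_WORDS = 389  # passage length reported in the text

def words_per_minute(elapsed_ms, n_words=PASSAGE_WORDS):
    """Convert an elapsed reading time in milliseconds to words per minute."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed time must be positive")
    return n_words / (elapsed_ms / 60_000)

# A student who takes exactly 2 minutes reads 194.5 words per minute.
two_minutes = words_per_minute(120_000)
# Finishing in 20 seconds implies a rate above 1,000 wpm, the kind of
# irregular value described as grounds for exclusion.
implausible = words_per_minute(20_000) > 1000
```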
Reading comprehension test
The reading comprehension test was the primary test of interest. On TestTracker, all students were presented with passages and questions resembling a computerized, timed, high-stakes reading comprehension test. Ten passages with five questions each were available for presentation to ensure that most students would continue working for the entire 20-min period allocated to this test. Multiple prompts encouraged students to approach this test as they would any other high-stakes test (e.g., the SAT). This measure was found to correlate moderately with the Comprehension test score of the Nelson–Denny Reading Test (NDRT; r = .51; Woodcock, McGrew, & Mather, 2001).
The order of the passages in the test was based on the level of readability, with the first passage being the easiest to read and the last being the hardest. The reading passages were developed to be similar to sample passages from the SAT and the GRE (Graduate Record Exam). By applying the Flesch–Kincaid Readability formula, it was determined that readability estimates on these passages varied between the 9th and 15th grade levels. In addition, similar to the GRE and SAT, all passages in the study had between 300 and 400 words. Furthermore, strong efforts were made to avoid subject material in which students would be likely to have background knowledge (e.g., history, literature, science). This was done so as to measure passage comprehension rather than previous learning. Based on the purpose of the study, it was critical that students were not able to simply look at the questions, without reading the passage, to answer correctly. Moreover, questions were composed across a range of difficulty and level of inference. TestTracker recorded data on the following comprehension measures: (a) total items attempted, correct, and percentage correct; (b) time utilization (time spent reading the passages, questions, and responses); and (c) navigation style (number of switches across the passage, question, and responses; self-reported strategy).
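For readers unfamiliar with the readability estimate, the standard Flesch–Kincaid grade-level formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below applies it to raw counts. The function name is hypothetical, and the sentence and syllable counts in the example are invented, since the text reports only passage lengths.

```python
def flesch_kincaid_grade(n_words, n_sentences, n_syllables):
    """Flesch-Kincaid grade level computed from raw text counts."""
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

# Hypothetical counts for a 350-word passage (within the 300-400 word
# range stated in the text); sentence and syllable counts are made up.
grade = flesch_kincaid_grade(n_words=350, n_sentences=18, n_syllables=560)
```

Longer sentences and longer words both raise the estimate, which is why passages of similar length can still span several grade levels.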
Vocabulary test
Vocabulary items consisted of a single target word followed by five possible choices. Participants were instructed to designate which word from the list of choices is a synonym to the target word. There were 80 items on the test and students had 2 min to answer as many of the items as possible. The number of items attempted and correct was recorded, yielding an overall percentage correct. This test was similar to vocabulary tasks on the ACT (American College Test), SAT, and GRE, and commercial tests such as the NDRT. Vocabulary target words were selected from sample ACT, SAT, and GRE exams and graded word lists (8th-16th grades). A total of 100 items were piloted on 500 high school and college students. Items with a difficulty level between 25% and 80% were retained and ordered from easiest to most difficult. In a validity study, the vocabulary percentage correct score from TestTracker correlated (r = .64) with vocabulary scores from the NDRT.
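The item-selection step can be sketched as a filter on pilot difficulty followed by a sort. This sketch assumes that "difficulty level" means the proportion of pilot respondents answering an item correctly and that the 25% and 80% bounds are inclusive; neither detail is stated in the text, and the item identifiers and values are invented.

```python
def select_items(prop_correct, low=0.25, high=0.80):
    """Keep items whose pilot proportion correct lies in [low, high] and
    return their ids ordered from easiest (highest proportion) to hardest."""
    kept = {item: p for item, p in prop_correct.items() if low <= p <= high}
    return sorted(kept, key=lambda item: kept[item], reverse=True)

# Hypothetical pilot results: item id -> proportion answering correctly.
pilot = {"w1": 0.95, "w2": 0.62, "w3": 0.18, "w4": 0.80, "w5": 0.41}
order = select_items(pilot)  # w1 is too easy and w3 too hard to retain
```

Ordering items from easiest to hardest means that, on a 2-min speeded test, the number of items attempted also indicates how far into the difficulty range a student progressed.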
Decoding/word recognition test
This task was designed to assess a student’s fluency of word/nonword recognition. Participants were presented with letter strings (3-6 letters) and asked to decide if the string was a real word or a pseudoword (similar to a lexical decision task). Sixty words were selected from graded word lists (8th-16th) and 60 nonwords were generated that matched the string length, orthography, and approximate phonology of the real words (e.g., “aisle” vs. “niehl”). These 120 items were piloted and 90 items were retained whose difficulty levels ranged from 25% to 90%. Items were ordered from easy to hard, and students had 2 min to answer as many of the items as possible. The number of items correct and attempted was recorded, yielding an overall percentage correct. This test was based on similar standardized decoding tasks such as the Word Attack subtest of the Woodcock–Johnson–III (Scruggs & Mastropieri, 1988). In a validity study, this task correlated with the Word Attack subtest of the Woodcock–Johnson–III (r = .47).
Self-Evaluation of Performance on Timed Academic Reading (SEPTAR)
The SEPTAR (Kleinmann & Lewandowski, 2005) was employed to assess students’ self-perceptions of their reading speed in timed high-stakes situations (e.g., exams) and their perceived need for extra time. Nine statements such as “I am a slow reader,” “I have trouble finishing timed tests,” and “I could do better on my exams if I had additional time” were posed to each student. They were asked to rate their self-perceptions on a 5-point Likert-type scale ranging from 1 (strongly disagree) to 5 (strongly agree). Scores range from 9 to 45 with higher scores reflecting students’ lower confidence in their timed academic reading skills. Initial psychometric analyses found good reliability (α = .89; Kleinmann & Lewandowski, 2005). An exploratory factor analysis found the scale to be unidimensional. In addition, scores on the scale were significantly related to reading speed (r = .41), reading comprehension (r = .39), and processing speed (r = .20; Kleinmann & Lewandowski, 2005). The SEPTAR took between 1 and 2 min to complete.
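To make the scoring and reliability figures concrete, the sketch below sums a set of Likert ratings and computes Cronbach's alpha using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function names and the response data are illustrative assumptions and do not reproduce the published analyses.

```python
from statistics import variance

def total_score(ratings, n_items=9, low=1, high=5):
    """Sum one respondent's ratings after checking count and range.
    With nine items rated 1-5, totals run from 9 to 45."""
    if len(ratings) != n_items:
        raise ValueError("unexpected number of ratings")
    if any(r < low or r > high for r in ratings):
        raise ValueError("rating outside the scale range")
    return sum(ratings)

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one equal-length list of item
    ratings per respondent."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical three-item responses from four respondents.
responses = [[1, 2, 1], [3, 3, 4], [4, 5, 4], [2, 2, 3]]
alpha = cronbach_alpha(responses)
```

Alpha approaches 1 when respondents who rate one item highly also rate the others highly, which is consistent with the unidimensional structure reported for the scale.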
Timed Test Anxiety Inventory (TTAI)
This test was specifically designed to assess student perceptions of anxiety on timed tests such as high-stakes exams. It was patterned after the Test Anxiety Inventory (TAI; Taylor & Deane, 2004). Eight items were presented using a 4-point scale ranging from 1 (almost never) to 4 (almost always). Scores range from 8 to 32 with higher scores reflecting an individual’s increased level of anxiety about taking tests. An example of an item is “timed exams make me particularly nervous.” Factor analysis of the TTAI revealed one factor with a Cronbach’s alpha of .80. The TTAI correlated with the TAI five-item scale at r = .69. The TTAI took between 1 and 2 min to complete.
Demographic questionnaire
Participants completed a brief computerized demographic questionnaire on TestTracker including questions about age, gender, ethnicity, year in school, estimated GPA, SAT scores, previous professional diagnoses (i.e., ADHD, LD in reading, anxiety, depression), other disabilities that would interfere with their test-taking ability (i.e., visual problems, arm or hand problem that limited the ability to use computer), use of test accommodations, and English as a primary language.
Effort measure
TestTracker contains a perceived effort measure. Participants are asked to report their effort (as a percentage from 1 to 100) on the reading comprehension test.
Results
First, the data were subjected to tests of homogeneity of variance (Levene’s Test for Equality of Variances). All variables met requirements for homogeneity of variance. Second, analysis of variance tests were performed to compare males and females on all dependent measures. No group differences were found, so data were collapsed across sex for all analyses. It should be noted that the absence of sex differences has been reported elsewhere in persons with ADHD (Frazier et al., 2007).
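Levene's test asks whether groups differ in the spread of their scores. A minimal sketch of the test statistic W, in its mean-centered form, is given below. The function name and data are hypothetical, and the p value, which would come from an F distribution with k − 1 and N − k degrees of freedom, is not computed here.

```python
from statistics import mean

def levene_w(*groups):
    """Levene's W statistic (mean-centered form) for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each score from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_bar_i = [mean(zi) for zi in z]
    z_bar = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical scores for two small groups (not study data).
w = levene_w([1, 2, 3], [2, 4, 6])
```

A small W relative to the critical F value indicates that the assumption of equal variances is tenable, which is the condition the analyses of variance rely on.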
Mean scores, standard deviations, and group comparison statistics are presented for both groups in Table 2. The data are more notable for similarities than differences. Students with ADHD produced significantly lower scores on comprehension accuracy (p < .05), vocabulary accuracy (p < .01), and decoding number correct (p < .05). Interestingly, they did not differ on comprehension and vocabulary (number of items correct), reading speed, test navigation (number of switches), test anxiety score, or self-perceptions of test taking. Also of note, the ADHD group spent less time reading the comprehension passages and more time examining the questions (both p < .05) than did its peer group. Clearly, most predictions were not supported by the results. In this sample, there were only a few modest differences in reading skills and none of the anticipated differences in navigation, strategy use, or self-perceptions. It should be noted that as an exploratory study, we did not use a Bonferroni correction for multiple significance tests. Thus, even these modest results must be viewed cautiously.
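To show what the uncorrected significance levels imply, the sketch below computes a Bonferroni-adjusted alpha and a pooled-standard-deviation Cohen's d. The number of tests (12) and the score lists are hypothetical; the text does not report how many comparisons were run, and this is not a reanalysis of the study data.

```python
from math import sqrt
from statistics import mean, variance

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests

def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# With a hypothetical 12 comparisons, each test would need p < .05 / 12,
# that is, roughly p < .004, to remain significant.
adjusted = bonferroni_alpha(0.05, 12)

# Hypothetical accuracy scores for two small groups.
d = cohens_d([2, 4, 6], [1, 3, 5])
```

Under a correction of this kind, differences significant only at p < .05 would probably not survive, which is the reason for the caution expressed above.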
Table 2. Mean Scores and Group Comparisons for ADHD and Non-ADHD Students.
Note: wpm = words per minute; SEPTAR = Self-Evaluation of Performance on Timed Academic Reading.
Number of times the reader switched from reading the passage or question or response.
*p < .05. **p < .01.
Table 3. Mean Scores and Group Comparisons for Top and Bottom 15% Comprehension Scores on TestTracker.
Note: wpm = words per minute; SEPTAR = Self-Evaluation of Performance on Timed Academic Reading. All Fs significant at p < .001.
Number of times the reader switched from reading the passage or question or response.
Next, we examined relationships among the variables within the non-ADHD group, with a particular interest in relationships to reading comprehension (number correct). As one might expect, vocabulary score correlated most strongly with comprehension (r = .75), followed by decoding score (r = .50) and reading speed (r = .45). Other significant relationships with comprehension included SEPTAR (r = −.42), test navigation (r = .34), and test anxiety (r = −.28). These data suggest that performance on a reading comprehension measure as found on the SAT and GRE might be multiply determined by one’s reading skills, self-perceptions toward testing, and approach to the task. Therefore, we included these key variables in a regression analysis to predict reading comprehension score. Five variables were found to be significant (p < .001) predictors of reading comprehension accounting for 62% of the variance: vocabulary (β = .57), SEPTAR (β = .46), reading speed (β = −.12), number of switches (β = .11), and decoding (β = .08).
Exploratory analyses were performed to examine the test-taking profiles of students that are most successful on the comprehension test (total number correct). We created top (n = 157) and bottom (n = 159) groups of performers from the non-ADHD sample based on the upper and lower 15% scores on the comprehension test. We were interested in determining what seems to make a good or poor test taker in general, and then to see if the ADHD group looked more like the high or low performers. A comparison of the high and low groups across our set of dependent measures revealed significant differences on every variable, indicating that the top group had better overall reading skills, more positive perceptions of testing, less test anxiety, and a more active navigation style (more switching from passage to question to response). Despite all these differences, the groups did not differ in the amount of time spent reading passages, questions, or responses. A regression was conducted on the comprehension score of the top group. Only two variables emerged as significant predictor variables predicting 42% of the variance: vocabulary (β = .38) and reading speed (β = .27). Last, the ADHD group performed between the high and low comprehension groups on all measures.
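One way to form extreme groups of this kind is to find the scores at the 15% cut points and include every participant at or beyond them. This is a sketch under stated assumptions: the function name, the handling of tied scores, and the data are introduced here, and the text does not say how ties at the cut points were treated.

```python
def extreme_groups(scores, percent=15):
    """Split participants into bottom and top groups.

    scores maps participant id -> comprehension number correct.
    Everyone tied with a cut-point score is included, so the two groups
    can end up slightly different in size."""
    ordered = sorted(scores.values())
    k = max(1, len(ordered) * percent // 100)
    low_cut, high_cut = ordered[k - 1], ordered[-k]
    bottom = [pid for pid, s in scores.items() if s <= low_cut]
    top = [pid for pid, s in scores.items() if s >= high_cut]
    return bottom, top

# Hypothetical sample of 20 students with scores 1 through 20.
sample = {f"s{i}": i for i in range(1, 21)}
bottom, top = extreme_groups(sample)
```

With no tied scores each group contains exactly three of the 20 hypothetical students. The sketch does not guard against the degenerate case in which the two cut points coincide.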
Discussion
The study findings are noteworthy in revealing that the test-taking performance of students with ADHD is nearly indistinguishable from that of typical peers. The only differences between the groups were modest; they mainly involved several reading variables (i.e., comprehension and vocabulary accuracy, decoding number correct), and yielded effect sizes less than .50. By contrast, the groups were similar on the number of items attempted and reading speed. As such, they did not differ on measures that examined speed or the number of items completed on a test. Furthermore, on measures of comprehension strategy, effort, test anxiety, and test-taking perceptions, the groups of students were no different. An additional finding was that the ADHD group spent less time than peers reading the passage, yet spent more time than peers reading questions. Last, the ADHD group performed similarly to the average nondisabled student. In other words, they tended to score below the top 15% and better than the bottom 15% of the controls.
The results provide some support for previous findings suggesting that students with ADHD may not always be as accurate in their work, but that they do not necessarily take more time or work more slowly than peers. As noted by other researchers, students with ADHD often demonstrate performance deficits, such as making careless errors, on academic tasks (Barkley, 2006; Cordón & Day, 1996; DuPaul, 2007).
Our results support this finding on three of our reading variables. Interestingly, we did not find a difference in reading speed. However, the reading speed task involves silent reading with no measure of reading accuracy. It is possible that students with ADHD read as many words per minute as peers yet made more errors. On other measures related to time usage, the ADHD group apportioned time across passages, questions, and responses in broadly the same way as peers, apart from the passage and question reading times noted above. Moreover, the ADHD group was able to attempt the same number of items in a fixed time span as their peers on decoding, vocabulary, and comprehension tests. Reading speed and rate of work per unit of time were comparable to those of nondisabled peers. These results are supported by recent findings showing that children with ADHD completed significantly more problems correctly per minute when given standard time compared with extended time (Pariseau, Pelham, Fabiano, Massetti, & Hart, 2010), and Miller and Lewandowski (2012) found that college students with ADHD performed identically to peers on a reading comprehension test when given standard time, and dramatically better than peers when given extended time. The combined findings suggest that individuals with ADHD may not be very different from controls in the number of test items they access in a set amount of time, yet may be more prone to careless mistakes. They tended to spend more time reading questions and less time reading passages than peers, which might suggest a strategy difference for some or a level of uncertainty regarding the question. It is possible that students with ADHD may need extra time to review their work and to be sure they understand the questions, rather than giving a hurried or careless answer.
The results from this study reflect a departure from parallel research on a similar disability group, students with LDs. Several studies have shown that students with LD perform less well on tests, lack effective test-taking strategies, voice more concerns about testing, and report higher than average levels of test anxiety (Cahalan-Laitusis, King, Cline, & Bridgeman, 2006; Heiman & Percel, 2003; LoShiavo & Shatz, 2002; Lufi, Ohasha, & Cohen, 2004). In this study, students with ADHD did not follow the LD pattern on these measures. They did not show elevated levels of test anxiety, did not have negative self-perceptions of testing, and used the same strategies as did their peers. Thus, it appears that high school students with ADHD are not greatly disadvantaged in a testing situation, particularly with regard to completing test items on a timed exam.
Test taking, in general, has not been a widely studied phenomenon despite its importance and popularity in education. The present study selected reading comprehension as the preeminent measure in the test battery, largely because it is similar to sections on most high-stakes exams (e.g., SAT, ACT, GRE, LSAT [Law School Admission Test]). First, we found that reading accurately and having a good vocabulary are strongly related to comprehension performance. Second, confidence in one’s testing ability and low levels of test anxiety help to predict comprehension performance. Interestingly, when we compared low and high comprehension performers, vocabulary and reading speed were again important predictors of comprehension outcome. The main reasons for the high-low analyses were to see what variables differentiated these types of performers, what characterized the successful test taker, and how students with ADHD would fare in relation to the high and low groups. As might be expected, successful test takers had better reading speeds and vocabularies, were more confident and less anxious about test taking, and made more switches on the comprehension task, suggesting a more active search strategy for correct answers. Students in the ADHD group performed better than the bottom 15% on every variable, suggesting that they are more average than impaired on the measures used in this study.
In this study, it seems that there are clear predictors of performance on our simulated high-stakes reading comprehension test, and of these, students with ADHD show slight weaknesses in vocabulary accuracy (make more errors) and decoding (get fewer correct). Again, it should be noted that these are performance deficits and not speed-related deficits per se. It could be that students with ADHD sacrificed accuracy for speed, although there is no way to know if this trade-off occurred. If students rush to get through an exam and make more errors, then they may be in need of some extra time to review work and correct errors, or alternatively, they may need to be taught how to maintain high accuracy on a timed test. One could use TestTracker to do such individual analysis of test taking, but such analysis is obscured when examining group data.
The results from this study must be viewed cautiously for several reasons. First, this study is the first of its kind and represents an initial attempt to examine test-taking behaviors in students with ADHD. Second, the students with ADHD were not evaluated or diagnosed within the study. The ADHD designations were applied by clinical professionals and verified by participating high schools. Each student qualified for an IEP or 504 Plan and was eligible to receive test accommodations. In the educational system, these students were considered as having ADHD; however, we did not verify these diagnoses. In addition, we could not control the use of medication by the students with ADHD. Of the 38 students, 30 indicated that they took medication for the disorder. This could have had an effect on the rather positive performances delivered by this group. It should be noted that they take most tests under the same medication conditions, even when getting test accommodations like extra time or a separate room. Thus, the withholding of medication is not a viable manipulation for such a study and would actually hinder ecological validity. As noted earlier, this was also a somewhat “purified” ADHD sample. We chose to rule out comorbid conditions that were considered primary and have substantial symptom overlap with ADHD (e.g., bipolar disorder). This likely restricted our ADHD sample and may have removed some of the more impaired examinees. A future large-scale study could examine a variety of clinical groups including a comorbid group, particularly if the diagnoses can be confirmed within the study.
A potential limitation in this study is the use of a new technology for examining test-taking performance. TestTracker is an online assessment system that attempts to delineate a student’s test-taking profile. Research shows that it has adequate psychometric properties and that it measures what it purports to measure; however, limited research is available to date. A recent study applying this system to high school students with and without LDs found robust differences on all reading, time usage, reading strategy, and test perception variables (Berger, Lewandowski, & Gordon, 2010). These findings add to the validity of TestTracker as an assessment system, as well as highlight differences in test taking between students with ADHD and LD. It should be noted that neither of these studies induced a high-stakes testing environment, and that the TestTracker tasks are of a much shorter duration than those found on high-stakes exams like the SAT. Moreover, the variables measured on TestTracker do not take into account test-taking behaviors such as eliminating foils, educated guessing, selecting main ideas, or a host of other behaviors. Thus, there may be differences in test taking between the groups that we were unable to measure. Future research may be able to examine a broader set of test-taking variables than the present study, and ideally could investigate areas other than reading (e.g., writing and math).
Despite the exploratory nature of the study and aforementioned limitations, this work demonstrates the feasibility and potential utility of research on test-taking behaviors. For example, we found that students with ADHD tended to make more errors than peers, but read just as fast, attempted as many items, and managed time similarly to peers. This suggests a need to identify and remediate the sources of inaccuracy, not merely to extend testing time. Taking tests is a major educational enterprise that warrants further study. Students with and without disabilities may struggle on classroom and high-stakes tests, thus restricting opportunities for educational and professional advancement. Perhaps systems like TestTracker can enable a better understanding of test-taking strengths and weaknesses in our students, thereby informing interventions that allow students to demonstrate what they know. Teaching students how to overcome their weaknesses and perform up to their capability on tests may reduce the reliance on extended time as a panacea.
Footnotes
Acknowledgements
The authors would like to thank the students and high schools for their participation, Joshua Gordon for his computer programming expertise, and Cassie Berger, PhD for her assistance.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Dr. Lewandowski and Dr. Gordon developed TestTracker. We used our own money to develop this tool. It has not been sold commercially yet, but we hope to get it to market eventually.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
