Abstract
College students with attention deficit hyperactivity disorder (ADHD) often request and receive extended time to complete high-stakes exams and classroom tests. This study examined the performances and behaviors of college students on computerized simulations of high-stakes exams. Thirty-five college students with ADHD were compared to 185 typical peers on measures of reading decoding, speed, vocabulary, comprehension, test strategies, time management, and test anxiety. Results indicated no differences between students with and without ADHD on various reading (decoding, speed, vocabulary, and comprehension) and test-taking variables (time utilization, navigation style, or strategy use), yet significant differences were present regarding their perceptions of, and anxiety during, test taking. It appears that students with ADHD perform similarly to peers on timed reading tests, although they think they perform less well and worry more about their performance.
Enrollment in postsecondary institutions rose 38% between 1999 and 2009, to 20.4 million students (National Center for Education Statistics, 2010). Most of these institutions require students to take standardized, high-stakes admission tests such as the Scholastic Assessment Test (SAT) and the American College Testing (ACT) exam. Because they are not provided information about the factors that contributed to their low scores, students who struggle on these tests have limited options for improving performance other than practicing on items of similar content. Indeed, low performance can result from a range of weaknesses, both in academic skills and test-taking behaviors. Unfortunately, students only receive feedback on their scores and rankings.
Students with disabilities are at particular risk to underperform on high-stakes tests. Without an assessment and intervention for their test-taking weaknesses, they are often left to apply for test accommodations, a practice that has been on the rise for the past two decades (Cox, Herner, Demczyk, & Nieberding, 2006; Ranseen & Parks, 2005). Although test agencies and educational institutions are legally mandated to provide students with disabilities equal access to exams and, therefore, must evaluate students and manage their test accommodation requests, almost no research exists to inform the test accommodation decision process.
Students with ADHD represent one of the largest disability groups in college populations (Weyandt & DuPaul, 2006). Yet, children and young adults with ADHD often demonstrate academic difficulties that result in lower grades and test scores, more grade retention and dropping out of school, more diagnoses of learning disabilities, and less overall academic success than peers (see reviews in Barkley, 2006; Barkley, Murphy, & Fischer, 2008; DuPaul & Volpe, 2009; Weyandt & DuPaul, 2006). For example, a meta-analysis demonstrated significantly lower achievement scores for students with ADHD (Frazier, Youngstrom, Glutting, & Watkins, 2007). Similarly, DuPaul (2007) found the symptoms of ADHD to be predictive of future academic problems (e.g., grade retention, special education, and high school drop-out). DuPaul also noted that on standardized achievement tests, students with ADHD scored between 10 and 30 points lower than their peers. The Frazier et al. (2007) meta-analysis involved 72 studies conducted since 1990. Moderate effect sizes (d = 0.55–0.73) were found between participants with and without ADHD, with the largest effect occurring in the content area of reading (d = 0.73). Frazier et al. also specifically investigated academic problems in college students, using average effect sizes from the meta-analysis for comparison to the college students. They concluded that the academic problems associated with ADHD extended beyond childhood and into the college years.
In addition to weaknesses in reading and academic skills, students with ADHD may have weaknesses in learning and study strategies. A recent study (Reaser, Prevatt, Petscher, & Proctor, 2007) found that college students with ADHD reported greater levels of difficulty than peers in note taking, outlining, summarizing information, and test taking. Students with ADHD also reported more difficulties with time management, use of appropriate test strategies, selecting the main ideas, and concentration. Finally, they tended to possess a negative attribution style with regard to their test performance as well as motivational deficiencies, including less persistence and preference for easier work. DuPaul et al. (2004) further indicated that classroom behaviors such as motivation, study skills, and academic engagement may mediate the lower test scores of this group. Some have explained the academic skill and procedural problems of students with ADHD in terms of deficits in executive functioning (e.g., Barkley, 1997, 2006).
Another factor that could affect the test performance of students with ADHD, particularly on high-stakes tests such as the SAT, is test-related anxiety. Individuals with ADHD have a greater likelihood of anxiety disorders than the general population (Barkley, 2006). It could be the case that a high-stakes test situation exacerbates anxiety for these individuals. In a meta-analysis, Hembree (1988) reviewed more than 500 studies on test anxiety, finding that it was consistently associated with lower performance on tests, as well as with lower GPAs and standardized test scores (d = 0.52). However, it is difficult to determine whether anxious students perform worse due to their anxiety or whether poorer performance on tests heightens an individual’s anxiety regarding performance.
If students with ADHD do indeed have skill and strategy weaknesses coupled with negative self-perceptions and anxiety, they may be at a disadvantage on standardized tests. Many high-stakes tests involve reading comprehension, and this is a skill area shown to be weak in students with ADHD. The purpose of the current study was to examine test-taking skills, strategies, and perceptions in college students with and without ADHD to determine ways in which these students differ when taking standardized tests. We expected college students with ADHD to perform more poorly than peers on various measures of reading and test-taking abilities and to have more negative self-perceptions of their test performance.
Method
Participants
A total of 220 students from a private university in the northeastern United States were enrolled in the study (35 with ADHD, 185 without ADHD). They were recruited from introductory psychology classes and received research credit for participation. To be included in the ADHD group, participants had to first identify themselves as having been professionally diagnosed with ADHD. In addition, they had to be approved to receive accommodations at the postsecondary level and/or meet criteria for ADHD on the ADHD Self-Report Scale (ASRS) symptom checklist (Kessler et al., 2005). In other words, those who were not receiving test accommodations had to meet criterion (four or more symptoms) on the ASRS (six-item form), and non-ADHD peers had to score below criterion (fewer than four symptoms). Of the ADHD sample, all 35 students indicated they had a professional diagnosis, and 26 reported that they received test accommodations at the university; the remaining 9 students had clinically significant symptom counts on the ASRS. Twenty-one of the ADHD students reported that they take medication for ADHD, yet only six reported having taken this medication on the day of testing. There were no differences in reading comprehension detected between those on or off medication, so all 35 students with ADHD were included in analyses. Due to high comorbidity levels of learning disabilities, anxiety, and depression in students with ADHD at the college level, students with these diagnoses were not excluded from analyses. Fourteen students in the sample with ADHD had at least one comorbid diagnosis (i.e., nine of these students reported diagnosed anxiety, six reported depression, and seven reported a learning disability). Students without ADHD were drawn from the same class, had English as a primary language, no history of psychiatric diagnosis, and fewer than four symptoms on the ASRS checklist.
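The inclusion rule described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, argument names, and the choice to return None for unclassifiable cases are hypothetical, and the additional peer requirements (English as a primary language, no psychiatric history) are omitted for brevity.

```python
from typing import Optional

ASRS_CRITERION = 4  # four or more symptoms on the six-item ASRS (from the text)


def assign_group(self_reported_diagnosis: bool,
                 receives_accommodations: bool,
                 asrs_symptom_count: int) -> Optional[str]:
    """Return "ADHD", "peer", or None (hypothetical helper, not study code)."""
    meets_asrs = asrs_symptom_count >= ASRS_CRITERION
    if self_reported_diagnosis:
        # A reported diagnosis must be backed by accommodations and/or the ASRS.
        if receives_accommodations or meets_asrs:
            return "ADHD"
        return None
    # Peers must score below criterion on the ASRS.
    if not meets_asrs:
        return "peer"
    return None


print(assign_group(True, False, 5))   # ADHD (diagnosis + ASRS criterion)
print(assign_group(False, False, 2))  # peer
```

Returning None rather than forcing a label keeps ambiguous cases visible instead of silently placing them in either group.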
Demographic data for both samples are presented in Table 1. Participants ranged in age from 17 to 37; the mean age for the ADHD group was
Demographic Data
Note: SES = socioeconomic status; GPA = grade point average; SAT = Verbal SAT test score. SES estimate calculated based on the four-factor Hollingshead formula (Hollingshead, 1975). Based on missing data, sample sizes by variable are as follows: age = 35 ADHD, 185 peer; SES = 33 ADHD, 178 peer; GPA = 32 ADHD, 126 peer; and Verbal SAT = 22 ADHD, 95 peer.
*p < .05. **p < .01.
Materials
TestTracker
TestTracker is a web-delivered assessment system designed to measure testing skills and behaviors displayed on high-stakes tests. In this study it was used to deliver all reading tests and questionnaires as well as to record performance and monitor behavior. TestTracker guides students through a variety of tasks: reading speed, comprehension, vocabulary, decoding, and various self-report measures. The focus on reading is based on the most common component of many high-stakes tests, reading comprehension. The TestTracker measures are described further in the order they were administered.
Reading speed
Participants were presented with a reading passage in which they were instructed to read for comprehension, but as if time were an important factor. The reading passage was 389 words in length and had a Flesch–Kincaid readability level of 12.0. Using TestTracker, students clicked on a start button to see the passage and were instructed to click on a stop button when they had completed reading. This test took between 1 and 5 minutes. The number of words read per minute was recorded. The passage was followed by two comprehension questions as a check on engagement. Students needed to get at least one answer correct to continue. The reading speed task was patterned after standardized measures such as the Reading Rate subtest of the Nelson–Denny Reading Test (NDRT; Brown, Fishco, & Hanna, 1993). Correlations with other such tasks have been modest (Berger, 2010; NDRT, r = .41; Reading Fluency subtest from the Woodcock Johnson-III Tests of Achievement (Woodcock, McGrew, & Mather, 2001), r = .35; and oral reading fluency, r = .62).
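As a minimal sketch of how a words-per-minute score could be derived from the start and stop clicks, consider the following. The function name and the use of elapsed seconds are assumptions for illustration; the scoring code used by TestTracker is not described in this article.

```python
PASSAGE_WORDS = 389  # passage length reported in the text


def words_per_minute(word_count: int, elapsed_seconds: float) -> float:
    """Convert a word count and an elapsed reading time into words per minute."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return word_count * 60.0 / elapsed_seconds


# The task reportedly took between 1 and 5 minutes, which bounds the rate.
fastest = words_per_minute(PASSAGE_WORDS, 60)   # reader who finished in 1 minute
slowest = words_per_minute(PASSAGE_WORDS, 300)  # reader who finished in 5 minutes
print(round(fastest, 1), round(slowest, 1))
```

Under these assumptions, the reported completion times imply reading rates of roughly 78 to 389 words per minute.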
Reading comprehension
The reading comprehension task consisted of 10 passages, each followed by 5 multiple-choice questions. Students had 20 min to complete as many of the questions as possible. This task was patterned after similar tasks found on the SAT, ACT, and GRE exams as well as commercial tests such as the NDRT. Several factors were considered in the development of this test. First, the readability of all passages on the Flesch–Kincaid Grade-Level estimate ranged from 9th- to 15th-grade level. The order of the passages in the test was based on level of readability, with the first passage being the easiest to read and so on. The reading passages were 300 to 400 words in length, similar to SAT and ACT exams. Also, efforts were made to avoid subject material in which college students would likely have background knowledge (e.g., college- or high school–level history, literature, science, etc.). Items included a balance of inferential and factual questions. The comprehension task was scored for (a) total items attempted, correct, and percent accuracy; (b) time utilization (time spent on each passage, question, and answer); (c) navigation style (number of switches across passage, question, and answer); and (d) strategy employed. The results of a validity investigation (Berger, 2010) found that the comprehension score from TestTracker had an internal consistency reliability of .81, a concurrent validity coefficient of .51 with the NDRT comprehension score, and demonstrated a large effect size difference between high school students with and without learning disabilities (d = 0.84).
Vocabulary
This test was similar to vocabulary tasks on the ACT, SAT, and GRE and commercial tests such as the NDRT. Items presented a single target word followed by five possible choices, one of which was a synonym of the target word. There were 80 items on the test and students had 2 minutes to answer as many items as possible. The number of items correct and attempted was recorded, yielding an overall percent correct. Vocabulary target words were selected from sample ACT, SAT, and GRE exams and graded word lists (8th to 16th grades). A total of 100 items were piloted on 500 high school and college students. Items with a difficulty level between 25% and 80% were retained and ordered from easiest to most difficult. Berger (2010) found that the vocabulary accuracy score from TestTracker had an internal consistency reliability of .82, a concurrent validity coefficient of .64 with the NDRT vocabulary score, and demonstrated a large effect size difference between high school students with and without learning disabilities (d = 0.91).
Decoding/word recognition
This task was designed to assess a student’s fluency of word/nonword recognition. Participants were presented with letter strings (3-6 letters) and asked to decide whether the string was a real word or a pseudoword (similar to a lexical decision task). Sixty words were selected from graded word lists (8th to 16th grades), and 60 nonwords were generated that matched the string length, orthography, and approximate phonology of the real words (e.g., “aisle” vs. “niehl”). These 120 items were piloted and 90 items were retained whose difficulty levels ranged from 25% to 90%. Items were ordered from easy to hard. Students had 2 minutes to answer as many items as possible. Berger (2010) found that this task had an internal consistency reliability of .86, correlated .47 with the Word Attack subtest of the Woodcock Johnson-III, and demonstrated a large effect size difference between high school students with and without learning disabilities (d = 0.84).
Demographic questionnaire
Participants completed a brief computerized demographic questionnaire on TestTracker including questions about age, gender, ethnicity, year in school, estimated grade point average, SAT scores, socioeconomic status, whether they had ever received any relevant diagnoses, whether they had any other disabilities that would interfere with their test-taking ability, and whether English was their primary language.
ADHD Self-Report Scale (ASRS; Kessler et al., 2005)
Each participant completed the ASRS to determine the extent of his or her ADHD symptoms. The ASRS version used in this study is composed of six items and has been determined to be psychometrically equivalent to the full ASRS (composed of 18 items). Scale ratings were used to verify that the participants in the ADHD group endorsed four or more items in the clinically significant range. Internal consistency estimates range from 0.63 to 0.72. Test–retest reliability estimates have been in the range of 0.58 to 0.77. Validity of the ASRS has been shown through strong concordance with clinician diagnoses (area under the receiver operating characteristic curve of 0.90). The scale is recommended by the World Health Organization as a screener for adults with ADHD.
Self-Evaluation of Performance on Timed Academic Reading (SEPTAR)
The SEPTAR (Kleinmann & Lewandowski, 2005) was employed to assess students’ self-perceptions of their reading speed in timed high-stakes situations (e.g., exams) and their perceived need for extra time on tests. The scale consists of nine items such as, “I am a slow reader,” “I have trouble finishing timed tests,” and “I could do better on my exams if I had additional time.” Statements were rated on a 5-point Likert-type scale ranging from 1 (strongly disagree) to 5 (strongly agree). Scores on the scale have been found to be significantly related to NDRT reading speed (r = .41), NDRT reading comprehension (r = .39), and WAIS-IV (Wechsler Adult Intelligence Scale, 4th ed.) processing speed score (r = .20).
Strategy question
A question regarding reading comprehension strategy use was created for this study based on research regarding the way in which students approach comprehension tests (Daneman & Hannon, 2001; Farr, Pritchard, & Smitten, 1990). Students were asked to select the strategy that best fit their approach on the comprehension task. The six choices were (a) read the entire passage thoroughly and then answered questions, (b) skimmed the passage and then answered questions, (c) read the question(s) and then went back to read the entire passage, (d) read the question and then went back to skim the passage to find the correct answer, (e) read the question first and selected an answer based on prior knowledge, and (f) read the question first and selected an answer based on an educated guess.
Timed Test Anxiety Scale (TTAS)
This test was designed specifically for this study to assess student test anxiety on timed tests such as high-stakes exams. It was patterned after the Test Anxiety Inventory (TAI; Taylor & Deane, 2004). Nine items were endorsed on a 4-point scale ranging from 1 (almost never) to 4 (almost always). An example of an item is “timed exams make me particularly nervous.” Factor analysis of the TTAS revealed one factor with a Cronbach’s alpha of .80. The TTAS correlated with the TAI 5-item scale at r = .69.
Effort measure
A measure of perceived effort was obtained by asking students what level of effort (expressed as percentage from 0 to 100) they put into completing the comprehension test. Participants whose effort test scores were 2 standard deviations below the mean were excluded from the study (n = 4, all non-ADHD students).
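The exclusion rule above can be illustrated with a short sketch. Several details are assumptions rather than reported facts: the function name is hypothetical, the sample standard deviation is used, the cutoff is treated as inclusive, and the effort ratings are invented for demonstration.

```python
from statistics import mean, stdev


def low_effort_indices(efforts, n_sd=2.0):
    """Return indices of ratings at or below mean - n_sd * SD (assumed rule)."""
    cutoff = mean(efforts) - n_sd * stdev(efforts)  # sample SD is an assumption
    return [i for i, effort in enumerate(efforts) if effort <= cutoff]


# Hypothetical self-reported effort percentages (0 to 100), not study data.
ratings = [90, 85, 95, 88, 92, 90, 87, 93, 91, 20]
print(low_effort_indices(ratings))  # only the last rating falls below the cutoff
```

Because a single extreme rating inflates the standard deviation, a rule of this kind tends to flag only very low effort reports.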
Procedure
Participants were tested in small groups (< 15) in a classroom and were provided laptops to complete their tests. The entire battery of tests was automated, including instructions for each task. The order of tests was: reading speed, reading comprehension, decoding, vocabulary, test anxiety and self-perception scales, and demographic questionnaire. During the reading comprehension task, participants had to move the cursor over what they wished to view (i.e., passages, questions, answer choices). While the cursor was over a section, the participant was able to see the entire text in that section. This method prevented participants from seeing more than one section at a time. For all tasks, TestTracker was designed so that participants were required to select a response for each item and were not allowed to return to previous questions once they had passed them. TestTracker displayed both the question number and the amount of time remaining in the test. All test movements were tracked and recorded to the millisecond. The entire battery took approximately 40 min.
Results
Perceived Effort
The perceived effort estimates add to the likelihood that the results of the reading measures are reliable. The effort estimates were 87.7% for ADHD students and 89.3% for peers, indicating relatively high and similar reports of effort.
Demographics
Because there were different proportions of men and women in the two groups, we examined potential sex differences on test-taking variables. No significant differences existed between men and women, either in the entire sample (n = 220) or in the non-ADHD sample alone (n = 185). Specifically, in the entire sample, men and women did not differ significantly on reading speed, t(218) = 0.10, p = .99; comprehension score, t(218) = 0.46, p = .65; vocabulary score, t(218) = 0.31, p = .56; or decoding score, t(218) = 1.94, p = .06. There were also no significant differences between men and women in the total time they spent on passages, t(218) = 0.53, p = .60; total number of section switches, t(218) = 0.83, p = .41; SEPTAR score, t(218) = 1.08, p = .28; or TTAS score, t(218) = 1.07, p = .29.
Group Comparisons on Reading Measures
The Levene test was used to test equality of variances between groups for all key variables. Normal distributions and homogeneity of variance were found for all measures. Group comparisons were based on the t-test procedure (using weighted means) for unpaired samples that have unequal sizes and equal variance. Table 2 summarizes the group (ADHD vs. peer) comparison data on the primary reading measures (i.e., speed, comprehension items correct and attempted, vocabulary items correct, and decoding items correct). Comparisons of group means yielded no significant differences between the ADHD and peer groups on any of the performance measures. Specifically, there were no statistical differences between groups for reading speed, t(218) = 1.30, p = .20; comprehension items correct, t(218) = 0.54, p = .59; comprehension items attempted, t(218) = 0.54, p = .59; vocabulary, t(218) = 0.48, p = .63; or decoding, t(218) = 0.63, p = .53.
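For readers unfamiliar with the equal-variance t test for groups of unequal size, the following is a minimal sketch of the pooled-variance statistic and its degrees of freedom. The function name is hypothetical and the scores are placeholders, not study data; the sketch does not compute a p value, and it is not the analysis code used in this study.

```python
import math
from statistics import mean, variance


def pooled_t_test(group_a, group_b):
    """Return (t, df) for an independent-samples t test assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weight each group's sample variance by its own degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, df


# Placeholder scores with the study's group sizes (35 and 185).
adhd_scores = [float(i % 7) for i in range(35)]
peer_scores = [float(i % 7) for i in range(185)]
t_value, degrees_of_freedom = pooled_t_test(adhd_scores, peer_scores)
print(degrees_of_freedom)  # 35 + 185 - 2 = 218
```

With group sizes of 35 and 185, the degrees of freedom equal 218, matching the t(218) values reported. A p value would then be read from the t distribution with those degrees of freedom, for example with a statistics package such as SciPy (`scipy.stats.ttest_ind` with `equal_var=True`).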
Group Differences on Reading and Test-Taking Variables
All p > .10.
Group Comparisons on Test-Taking Measures
As described previously, test-taking measures included time utilization (the amount of time one spent on the passages, questions, and responses) and navigation style (the number of switches one made between the passages, questions, and responses). The ADHD group did not differ from the peer group in total time spent on passages, t(218) = 0.42, p = .57; time on questions, t(218) = 1.67, p = .10; or time on responses, t(218) = 0.04, p = .96. Similarly, the groups did not differ when the total number of switches, t(218) = 0.56, p = .58, was examined (see Table 2).
Group Differences on Perception Variables
Note: SEPTAR = Self-Evaluation of Performance on Timed Academic Reading; TTAS = Timed Test Anxiety Scale; ASRS = ADHD Self-Report Scale.
p < .001.
Test taking was further analyzed based on self-reported strategy use. Participants were asked to select one of six strategies that they used during the comprehension test. The ADHD group did not differ from the peer group regarding the strategy used, χ²(5, N = 220) = 0.48, p = .92. The majority of students in both groups (55% of ADHD, and 57% of peers) chose the strategy, “Read the entire passage thoroughly and then tried to answer each question.” The second most popular strategy (34% ADHD, 32% peers) was, “Read the question and then went back to skim the passage to find the correct answer.” Only a small percentage of students (2%–5%) endorsed two of the other four strategies, “Skimmed the passage and then answered questions,” and “Read the question(s) and then went back to read the entire passage.”
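The chi-square comparison of strategy choice across groups can be sketched as follows. The function name is hypothetical, and the cell counts are invented for illustration (they only share the study's group sizes of 35 and 185); the actual counts are not reported here.

```python
def chi_square_independence(table):
    """Return (statistic, df) for a Pearson chi-square test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and strategy.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: rows are groups, columns are the six strategies.
strategy_counts = [
    [19, 1, 1, 12, 1, 1],    # ADHD group, n = 35 (invented values)
    [105, 6, 5, 59, 5, 5],   # peer group, n = 185 (invented values)
]
statistic, df = chi_square_independence(strategy_counts)
print(df)  # (2 - 1) * (6 - 1) = 5
```

A 2 x 6 table yields 5 degrees of freedom, consistent with the reported test. The sketch assumes every strategy was chosen by at least one participant; an empty column would produce an expected count of zero and a division error.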
Group Comparisons on Perception Measures
Data comparing the ADHD and peer groups on self-report measures are summarized in Table 3. Significant group differences existed in responses on the SEPTAR, the TTAS, and the ASRS. Specifically, the ADHD group reported higher scores on the SEPTAR, t(218) = 5.66, p < .001, d = 1.05, indicating that this group perceived themselves to experience more difficulty in reading under timed conditions than their peers. In addition, the ADHD group reported higher scores on the TTAS, t(218) = 5.93, p < .001, d = 0.85, indicating that the ADHD group perceived themselves as more anxious about taking tests than their peers. The data from the ASRS-v1.1 also verified group status: as expected, the ADHD group perceived themselves to have more inattentive and impulsive symptoms than the peer group, t(218) = 14.29, p < .001, d = 2.29. Finally, mean scores on “perceived effort” were not significantly different between groups, t(218) = 0.75, p = .45.
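As a rough aid to interpreting these effect sizes, a standardized mean difference can be approximated from an independent-samples t value and the two group sizes. The sketch below uses the common pooled-standard-deviation conversion; the function name is hypothetical, and this article does not state which formula produced its reported d values, so the conversion need not reproduce them exactly.

```python
import math


def cohens_d_from_t(t_value: float, n_a: int, n_b: int) -> float:
    """Approximate Cohen's d (pooled SD) from an independent-samples t value."""
    return t_value * math.sqrt(1 / n_a + 1 / n_b)


# SEPTAR comparison reported above: t(218) = 5.66 with groups of 35 and 185.
septar_d = cohens_d_from_t(5.66, 35, 185)
print(round(septar_d, 2))
```

For the SEPTAR comparison the conversion gives a value close to the reported d of 1.05. Values computed with a different standardizer (for example, one group's standard deviation rather than the pooled value) can differ from this approximation.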
Discussion
Contrary to our predictions, we found no differences between groups on any of the reading tests. Students with ADHD demonstrated comparable reading speed, word recognition, vocabulary, and comprehension to that of the peer group. They also attempted as many items as peers on each timed task and spent the same amount of time as peers reading passages. Their approach to testing was no different from peers, preferring the same comprehension strategies and navigation style (i.e., number of switches). Despite these similarities, those with ADHD perceived themselves as having more difficulty in reading under timed conditions and reported more test-related anxiety than their peers. It also should be noted that both reading speed (d = .26) and time spent reading questions (d = .30) showed small effect size differences between groups in favor of non-ADHD students. These may be areas that would be significant with a larger, more impaired ADHD sample.
The ADHD group did not differ significantly from peers on the four reading tasks or in test-taking behavior (i.e., time management, number of switches, comprehension strategy). These results were surprising given the academic performance (Barkley, 2006; DuPaul, 2007; Frazier et al., 2007) and executive functioning deficits (Barkley, 2006; Willcutt, Doyle, Nigg, Faraone, & Pennington, 2005) that individuals with ADHD have demonstrated in other studies. These results were even more surprising given that these students with ADHD often receive 50% to 100% extra time on tests yet performed as well as peers on speeded tests.
Several explanations for the similarity between the ADHD and peer groups seem plausible. It is possible that some students in this sample may have a mild form of ADHD and/or are higher functioning than a noncollege ADHD group. These students were attending a competitive university and had above-average SAT scores and higher socioeconomic status than peers. Therefore, they may be more academically competent and better test takers than a general sample of individuals with the disorder. It may be that these ADHD students were not experiencing a high degree of academic impairment, and consequently, had reading skills comparable to peers.
Another interpretation is that some students included in the ADHD group may have been inaccurately diagnosed with ADHD. It is possible that some of these students were diagnosed based on their symptom reports without meeting the DSM-IV-TR (Diagnostic and Statistical Manual of Mental Disorders, 4th ed., text rev.; American Psychiatric Association, 2000) impairment criterion (Gordon et al., 2006; Joy, Julius, Akter, & Baron, 2010). If colleges rely on self-report of symptoms or a previous diagnosis based on only symptoms, they may be providing resources to students who are not significantly impaired in their actual functioning. We did not provide clinical testing of psychoeducational functioning; thus, we could not determine whether the ADHD group had areas of impairment in everyday life. Another confounding factor is medication use. It is possible that some in the ADHD group (n = 6) were aided by the use of medication and, thus, the disorder did not negatively affect their test performance as much as it would otherwise.
Not only did students with ADHD perform as well as their peers on various reading measures, they also attempted as many test items as their peers under timed conditions, had close to the same reading speed, and used their time in the same way as peers. Interestingly, a recent study found that children with ADHD completed significantly more problems correctly per minute when given standard time compared with extended time (Pariseau, Pelham, Fabiano, Massetti, & Hart, 2010), another indication that their problem is not too little time to complete work. These findings, coupled with our results, suggest that students with ADHD, based on their performance speed, use of time, and number of items completed, may not warrant more time than peers to take a test. Hence, care must be exercised in confirming diagnoses, assessing for an ADA level of substantial impairment, and apportioning extended time to anyone with a diagnosis. This is particularly important in light of research that shows virtually all students improve performance on a speeded test when given extended time (for reviews, see Lovett, 2010; Sireci, Scarpati, & Li, 2005). As we have noted elsewhere, universal design in testing (i.e., developing tests and procedures that allow greater access to all examinees) seems to be a more valid and fair approach than guessing who should get more time and how much (Lewandowski, Lovett, Parolin, Gordon, & Codding, 2007; Lewandowski, Lovett, & Rogers, 2008).
Interestingly, the only significant differences in the study involved the student self-perceptions. Even though they did not perform differently, students with ADHD perceived themselves as being slower readers and inferior test takers. They also expressed more anxiety about taking timed tests. These results correspond to the findings of Reaser et al. (2007) who also found that college students with ADHD reported difficulties with test taking and associated activities. The results of our study raise questions regarding whether some individuals in college who are receiving test accommodations are simply those who are self-conscious or anxious enough about testing to actually seek accommodations. Alternatively, perhaps students with ADHD, even mild forms of the disorder, have received negative feedback about test performance, which in turn has lowered their self-efficacy with regard to taking timed tests.
The findings and implications of this study should be considered in light of several limitations. As noted, we did not individually assess students to make diagnoses or determine one’s level of impairment. Also, this was not a high-stakes testing condition, and our measures were quite brief compared to an SAT or an ACT. It is likely that our ADHD sample may be higher functioning than the ADHD population in general, thus constraining the generalizability of our findings. The sample sizes were quite uneven as a result of sampling a large introductory class, and although this was considered in analyses, there is an increased chance of confounded results.
Given the lack of research in the area of test taking, the increasingly advanced assessment software that is becoming available, and the feasibility of this research protocol for widespread administration, several directions for further research exist. First, it seems essential that the battery of tests (TestTracker) used in this study undergo additional validation if it is going to be a useful tool for research and clinical intervention. Also, studies like this one need to be replicated with larger and broader samples of participants, particularly including different subtypes of ADHD. In addition, it would be interesting to compare other clinical groups with ADHD students, such as those with anxiety, learning disabilities, and individuals who speak English as a second language. There is reason to believe that these types of students might struggle with speeded, standardized exams.
The findings of the current study, though tentative and requiring replication, do suggest that college students with ADHD do not differ from peers in their test-taking speed, skills, or time management, yet may be receiving as much as 100% extra time on exams. Perhaps educational and testing agencies, and/or clinicians making the ADHD diagnosis, should examine test-taking skills before awarding double time on an exam to someone merely with a diagnosis. It would seem that a data-based approach to the use of test accommodations such as extended time is worth considering. Also, it seems reasonable to use an assessment system such as TestTracker to identify test-taking weakness in students that could be targeted for intervention.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
