Abstract
Students often feel time pressure when taking tests, and students with disabilities are sometimes given extended time testing accommodations, but little research has been done on the factors that affect students’ test-taking speed. In the present study, 253 students at two colleges completed measures of processing speed, reading fluency, and self-reports of their reading and test-taking skills, as well as a standardized paper-and-pencil reading comprehension task. The time taken to complete the reading comprehension task was not significantly related to students’ accuracy on the task, but it was predicted by students’ reading fluency and by their self-reports of problems with timed reading/test-taking. Students’ processing speed did not significantly predict comprehension task completion time or accuracy when reading fluency and self-reports were held constant. We discuss the implications of these and other results for making determinations about extended time testing accommodations, as well as for future research studies.
Many students feel time pressure when taking exams, both in their classes and in high-stakes contexts (e.g., admissions tests, statewide tests, certification tests). In one study (Lewandowski, Lovett, Codding, & Gordon, 2008), almost half (45.4%) of a large sample of nondisabled college students reported not performing well on timed standardized tests; an even higher proportion of students with attention deficit hyperactivity disorder (ADHD) diagnoses (67.7%) reported the same problem. In a related study (Lewandowski, Lambert, Lovett, Panahon, & Sytsma, 2014), 87% of nondisabled college students reported that having 50% additional time would benefit them on high-stakes tests; about the same proportion of students with disabilities (88%) agreed. Even with regard to typical classroom exams, Lewandowski et al. (2014) found that 73% of students believed that having 50% additional time would benefit their scores. As timed tests are so common, many successful test-taking strategies involve managing time appropriately (Dodeen, 2008). However, little is known about why students vary in the time that they take to complete tests.
Students with recognized disabilities—especially learning disabilities and ADHD—are often given additional time to complete tests as part of a package of academic accommodations (Newman et al., 2011). However, the provision of extended time accommodations is controversial (Lovett & Lewandowski, 2015). Giving different students different amounts of time to complete tests can lead to unfair score comparisons, especially on high-stakes tests where students compete for selective opportunities, such as admission to college or graduate programs (Freedman, 2003). Even in noncompetitive situations, giving additional time can artificially inflate scores, overestimating students’ skill levels, and keeping a test from measuring how fluent and automatic students’ skills are (Phillips, 2011). Scholars have suggested that diagnostic test data be used to estimate how much extended time a student with a disability condition should be given on real-world exams (Ofiesh, Hughes, & Scott, 2004), but no clear guidelines have been developed for translating specific diagnostic test scores into precise amounts of needed additional time. This is another reason that it is important to understand the determinants of test completion time.
Prior Research on Test Completion Time
Most past research on test completion time has examined its relationship to test performance in classroom settings, asking whether students who take longer to finish tests tend to do better or worse than their peers. The relationship between performance and completion time is often found to be nonsignificant (Bridges, 1985; Foos, 1989; Nevo & Spector, 1979), but sometimes there are small effects (Landrum, Carlson, & Manwaring, 2009). When effects are found, they usually favor students who take less time to complete the tests. Again, however, the effects are inconsistent and tend to be very modest in magnitude, and scholars have often concluded that there is no substantial, robust relationship between completion speed and overall performance on classroom tests.
A different research strategy involves predicting whether a student would benefit from additional testing time, an indirect index of their speed of test-taking. Ofiesh, Mather, and Russell (2005) gave a diagnostic battery of brief speeded measures to 43 college students with learning disabilities, and then gave each student 20 min to complete one version of a reading comprehension test, and gave them 32 min (60% additional time) to complete a second version of the test. Students were said to “need” additional time if they were unable to finish the 20-min version in the allotted time, or if their score on the 32-min version was substantially higher. These investigators found that only two of 10 diagnostic test scores were significant predictors of students’ need for additional time, and even then the prediction was far from perfect (rs = .36 and .38). More recently, Lovett and Leja (2015) conducted a similar study with a general sample of college students, giving the students a battery that included a highly speeded reading comprehension test, on which each student was first given 10 min, and then an additional 5 min (50% additional time), to see how much their score improved. Students’ processing speed and reading fluency failed to significantly predict their degree of improvement from the additional time.
The Present Study
The majority of test accommodation requests are for extended time (Newman et al., 2011). Diagnostic tests are commonly used, along with other data, to document a need for extended time. In adolescents and adults, the Nelson Denny Reading Test (NDRT; Brown, Fishco, & Hanna, 1993) timed reading comprehension task is one of these diagnostic tests (Sparks & Lovett, 2014). Therefore, in the present study, we sought to answer two questions: First, what is the relationship between NDRT comprehension test completion time and comprehension performance (i.e., number of items correct)? Second, how well do reading fluency, processing speed, and a self-report measure of timed reading and test-taking skills predict each of the two outcome variables (i.e., completion time and comprehension performance)?
We used a reading comprehension test (the NDRT) as a proxy for a real-world test, based on the logic that the vast majority of tests in educational settings require reading comprehension to access the test items, and some high-stakes tests (e.g., admissions tests for college, graduate school, and professional schools) include formal reading comprehension sections. Admittedly, real-world tests vary greatly in the amount of reading required, and many teacher-made classroom tests do not include lengthy passages to read; however, even typical multiple-choice items are sometimes based on multi-sentence question stems, and students must often take tests based on lengthy reading of material that will be covered on the tests. In any case, in the present study, we were particularly interested in predicting students’ speed in completing tests with significant reading requirements.
We included reading fluency as a predictor for two reasons. First, there is a large literature from K-12 educational settings showing a relationship between reading fluency and performance on high-stakes reading comprehension tests (for one review of relevant literature, see Reschly, Busch, Betts, Deno, & Long, 2009). Second, scholars have suggested using reading fluency scores in college students to determine which students need extended time testing accommodations (e.g., Ofiesh et al., 2004). We included processing speed measures for the latter reason as well; processing speed deficits are often cited to argue for extended time accommodations. Finally, self-reports of timed testing and reading skills were included as students’ self-perceptions of need are significant predictors of their likelihood of actually requesting accommodations (Barnard-Brak, Davis, Tate, & Sulak, 2009).
No extended time accommodations were provided to participants during the study, but the study was designed to better understand why some students may be more likely to need such accommodations. Rather than control how much time or how much additional time students were given on a test, as previous studies had done, we chose to examine how much time students would actually use and which characteristics would predict this use of time.
Method
Participants
The participants were 253 undergraduates (125 males) at two institutions, a medium-sized public college and a large private university, both in the Northeastern United States. Most of the students were freshmen (n = 124) or sophomores (n = 109), with an average age for the total sample of 19 years (SD = 2.5 years). Most participants (75%) were White, with approximately equal numbers of Asian (9%), Hispanic (8%), and African American (8%) participants. The demographic breakdowns for students at the two institutions were very similar; most students at each institution were White freshmen or sophomores who were 18 or 19 years old. We included students with reported disability conditions, and 24% reported having at some point received a disability diagnosis, most commonly ADHD or a psychological disorder such as anxiety or depression; 4% of the participants reported being eligible for some kind of disability accommodations at their institution. These percentages are not unusual when viewed in the context of national statistics (e.g., National Center for Education Statistics, 2013).
Measures
Processing speed measures
Two subtests from the Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV; Wechsler, 2008) were used to assess participants’ speed of information processing. The first subtest given, Digit-Symbol Coding, requires copying symbols that are paired with numerical digits for 135 digit stimuli. Symbol Search, the second subtest given, requires that the participants search rows of symbols (60 rows are present on the test) to determine whether the row contains either of two symbols, and indicate whether they are present. These tests are both highly speeded with a time limit of 2 min each. When the scaled (normed) scores from the two tests are summed and rescaled, the resulting Processing Speed Index (PSI) score has an average internal consistency reliability coefficient of rxx = .90 and an average test-retest stability coefficient (over intervals of approximately one month) of rxx = .87.
Reading fluency test
The Reading Fluency subtest from the Woodcock-Johnson Tests of Achievement, Third Edition (WJ-III; Woodcock, McGrew, & Mather, 2001) was used to assess participants’ reading fluency. In this task, students have 3 min to indicate whether each of 98 brief sentences (e.g., “A bird can fly.”) is true or false. The WJ-III manual reports good test-retest reliability for this score, even with an extended test-retest interval of one year (rxx = .88). The raw score from this task was used.
Reading comprehension test
The comprehension subtest from the NDRT (Brown et al., 1993), Form G, was used as a proxy for a timed academic test (as in a teacher-made or high-stakes test), as most academic tests require reading comprehension to access the items. The test contains 38 multiple-choice items pertaining to seven passages; the format is similar to the reading comprehension sections of high-stakes tests such as the SAT and GRE. Students were asked to indicate their answer to each of the multiple-choice questions by circling the letter of the desired answer in the test booklet.
The NDRT technical manual reports good alternate-form reliability (rxx = .81) for the comprehension test score. We altered the directions slightly, instructing students to work “as quickly and as accurately” as they could, to remind students about the importance of both performance factors. In addition, students were told that there was no time limit (unlike the 20-min time limit in the standardized version of the test), and so they were to take as long as they needed, but to let the experimenter know as soon as they were finished.
An additional score, the “Reading Rate” score, is available on the NDRT comprehension test, by asking students to indicate which line of the first passage they are reading, 1 min after the test begins, as a rough estimate of the number of words that they can read per minute. We asked students to record this, and so we had access to their 1-min Reading Rate as well as their comprehension score.
Perceptions of time needs
Participants were given the nine-item Self-Evaluation of Performance on Timed Academic Reading (SEPTAR; Kleinmann, 2005), a questionnaire used to assess students’ perceptions of their need for extra time when reading and taking exams. Sample items are “I am a slow reader” and “I would do better on exams if I were faster.” Participants agreed or disagreed with the items on a 5-point Likert scale, and higher scores on the SEPTAR represent more self-reported problems with timed reading and test-taking. Psychometric analyses of the SEPTAR have found good reliability for the total score (α = .89), a unidimensional factor structure, and significant correlations with actual speeded cognitive performance measures, showing evidence of construct validity.
Demographic questionnaire
Finally, participants completed a brief demographic questionnaire including items about gender, age, ethnicity, year in school (e.g., freshman), and whether they had ever received any relevant diagnoses (e.g., learning disability, psychiatric disorder) and were eligible for academic/testing accommodations.
Procedure
Participants were recruited from introductory psychology classes at the two institutions and were given a small amount of extra credit toward those classes for their participation. Testing sessions were conducted in quiet classrooms in groups of five to 15 students per session, along with a lead experimenter and a secondary proctor who ensured that a standardized procedure was followed. After completing an informed consent form and a demographic questionnaire, students completed the other measures. The measures were administered in counterbalanced order across sessions, except for the reading comprehension test, which was always last so that students could leave when they were finished. Participants were instructed to raise their hand as soon as they had completed the reading comprehension test, at which point the experimenter or proctor would note the time and mark the test booklet with that time.
Results
Table 1 shows the descriptive statistics for our principal variables. The values are in expected ranges, based on previous research as well as the test manuals’ interpretations. For instance, students’ mean processing speed score on the WAIS was in the average range (M = 105.21). Of note is that the average student took 1,056 s (17.6 min) to complete the NDRT comprehension measure, and 22% of the sample failed to complete the test in 20 min (the standard administration time). This last statistic is not due to students with disabilities being slower; of the 55 students who failed to complete the test in 20 min, 78% reported no disability conditions. We did not conduct separate analyses by disability type given the small number of students reporting any particular disability condition and our lack of opportunity to validate certain disabilities, such as ADHD.
Table 1. Descriptive Statistics for Principal Variables.
Note. NDRT = Nelson Denny Reading Test; WAIS = Wechsler Adult Intelligence Scale; WJ = Woodcock-Johnson (raw score); SEPTAR = Self-Evaluation of Performance on Timed Academic Reading. For all variables, n = 253.
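The summary figures quoted in the text for Table 1 can be checked with a few lines of arithmetic. The sketch below uses only values stated in the Results paragraph; the head count behind the 78% figure is derived here for illustration and is not a number reported in the article.

```python
# Values taken from the Results text.
mean_completion_s = 1056      # mean NDRT completion time, in seconds
n_total = 253                 # full sample size
n_over_limit = 55             # students who did not finish within 20 min
share_no_disability = 0.78    # of those 55, proportion reporting no disability

mean_completion_min = mean_completion_s / 60
pct_over_limit = 100 * n_over_limit / n_total

# Derived for illustration only (not reported in the article):
# the approximate number of students behind the 78% figure.
approx_no_disability = round(share_no_disability * n_over_limit)

print(f"mean completion time: {mean_completion_min:.1f} min")
print(f"did not finish in 20 min: {pct_over_limit:.0f}% of the sample")
print(f"of those, roughly {approx_no_disability} reported no disability")
```

The converted mean (17.6 min) and the proportion exceeding the standard limit (about 22%) agree with the values given in the text.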
Table 2 shows the correlations between the principal variables; several correlations are of particular note. First, the time taken to complete the NDRT was not significantly related to comprehension performance (number of items correct), r(251) = .04, n.s. Second, the number of words read in the first minute of the NDRT was strongly related to how long it took to complete the entire measure, r(251) = −.55, p < .001, yet it had little relation to how well students performed on the comprehension test, r(251) = .09, n.s. Third, the cognitive/academic test scores and self-reports were all significantly related to NDRT comprehension completion time: for the WAIS PSI, r(251) = −.18, p < .01; for the WJ Reading Fluency score, r(251) = −.41, p < .001; for SEPTAR scores, r(251) = .35, p < .001. The other correlations in the table generally showed expected relationships based on prior research and theoretical expectations. For instance, processing speed was significantly related to reading fluency, r(251) = .40, p < .001. However, processing speed had only a small positive relationship to comprehension performance, r(251) = .14, p < .05.
Table 2. Intercorrelations of Principal Variables.
Note. NDRT = Nelson Denny Reading Test; WAIS = Wechsler Adult Intelligence Scale; WJ = Woodcock-Johnson (raw score); SEPTAR = Self-Evaluation of Performance on Timed Academic Reading. All correlations based on df = 251.
*p < .05. **p < .01. ***p < .001.
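As an illustration of how the significance levels attached to these correlations follow from the sample size, a Pearson r can be converted to a t statistic with df = n − 2 = 251. The sketch below does this for the correlations quoted in the text. The function names are hypothetical, and the p values use a standard normal approximation to the t distribution, which is an assumption of this sketch (reasonable at df = 251) rather than the authors' procedure.

```python
import math


def r_to_t(r: float, df: int) -> float:
    """t statistic for testing a Pearson r against zero, with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


def approx_two_tailed_p(t: float) -> float:
    """Two-tailed p from the standard normal distribution.

    Used as a large-df approximation to the t distribution; this is an
    assumption of the sketch, not an exact t-test.
    """
    return math.erfc(abs(t) / math.sqrt(2.0))


DF = 251  # n = 253 participants, so df = n - 2

# Correlations quoted in the Results text.
reported = {
    "completion time vs. items correct": 0.04,
    "1-min reading rate vs. completion time": -0.55,
    "WAIS PSI vs. completion time": -0.18,
    "WJ Reading Fluency vs. completion time": -0.41,
    "SEPTAR vs. completion time": 0.35,
    "WAIS PSI vs. items correct": 0.14,
}

for label, r in reported.items():
    t = r_to_t(r, DF)
    p = approx_two_tailed_p(t)
    print(f"{label}: r = {r:+.2f}, t({DF}) = {t:+.2f}, approx. p = {p:.4f}")
```

Under this approximation, the pattern matches the reported significance levels: r = .04 is nonsignificant, r = .14 falls below .05 but not .01, and r = −.18 falls below .01.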
Table 3 shows the results of a simultaneous multiple regression analysis in which WAIS PSI, WJ Reading Fluency, and SEPTAR scores were used to jointly predict NDRT completion time. The R² for the entire regression model was .20. Reading Fluency and SEPTAR scores were significant predictors in the model, whereas the WAIS PSI score was not a significant predictor.
Table 3. Simultaneous Multiple Regression Predicting NDRT Completion Time.
Note. NDRT = Nelson Denny Reading Test; WAIS = Wechsler Adult Intelligence Scale; WJ = Woodcock-Johnson (raw score); SEPTAR = Self-Evaluation of Performance on Timed Academic Reading.
*p < .05. **p < .01. ***p < .001.
Finally, Table 4 shows a second simultaneous multiple regression analysis with the same predictors, but predicting the number of correct items on the NDRT comprehension test. The R² for the entire regression model was .08. In this analysis, only the SEPTAR score was a statistically significant predictor in the model.
Table 4. Simultaneous Multiple Regression Predicting NDRT Number Correct.
Note. NDRT = Nelson Denny Reading Test; WAIS = Wechsler Adult Intelligence Scale; WJ = Woodcock-Johnson (raw score); SEPTAR = Self-Evaluation of Performance on Timed Academic Reading.
*p < .05. **p < .01. ***p < .001.
Discussion
The present study was conducted to investigate college students’ time usage on a reading comprehension test, as reading comprehension skills are needed to access virtually all tests, both in classroom and in high-stakes settings. First, we found that more than 20% of our students failed to complete the NDRT comprehension test in the standard allotted time, 20 min. This is interesting, in that diagnosticians sometimes interpret a student failing to finish in 20 min as a sign of a learning disability or a need for extended time accommodations. In fact, it is not especially uncommon for college students to need more than the standard time allotment to complete the test, and most students who took more than 20 min reported having never been diagnosed with any disabilities. Indeed, according to the NDRT manual, an even higher proportion of college students in the NDRT normative sample failed to complete the comprehension test in 20 min, probably because they were using separate answer forms rather than circling desired answers in the test booklet itself (as our participants did).
Second, students’ 1-min reading rate on the reading comprehension test was not related to their reading comprehension performance, yet it was strongly related to the time that they took to complete the test as a whole (r = −.55). This suggests that students’ test-taking speed is relatively stable over the course of an exam; the students who were farther into a test after just 1 min generally finished earlier (i.e., took less time to complete the test), although this had no bearing on actual comprehension performance.
Third, we found that students’ test completion time was not related to how many questions they answered correctly on the test. This finding contradicts casual inferences made by clinicians who claim that an examinee could perform optimally with extra time. Students who used more than the allotted 20 min performed no better than students who finished within 20 min. Inspection of the scatterplot suggested no curvilinear relationships either (the null correlation was also replicated when we only considered data from nondisabled students, suggesting that disability-related deficits did not attenuate the correlation in the overall sample). This result converges with the findings of several previous studies using instructor-made classroom tests (Bridges, 1985; Foos, 1989; Nevo & Spector, 1979), and increases the generality of those prior findings. If we think of speed and accuracy as two dimensions of test-taking behavior, it appears that students are just as likely to be slow and accurate, as they are to be slow and inaccurate, fast and accurate, or fast and inaccurate. This may be because separate factors determine speed and accuracy. Accuracy on the NDRT reading comprehension test may have been due more to general language comprehension skills, intelligence, general knowledge, and vocabulary knowledge, whereas speed on the test may have been affected more by personality tendencies such as compulsiveness and indecisiveness (as well as by reading fluency, which we measured). Further research is needed to examine additional variables that predict comprehension speed and/or accuracy.
Fourth, we found that although processing speed, reading fluency, and self-reports of timed reading/test-taking skills all predicted completion time on the NDRT reading comprehension test, only the latter two variables were statistically significant predictors in a multiple regression model. It appears that whatever value processing speed has as a predictor is accounted for by the other variables.
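To illustrate how a predictor with a significant zero-order correlation can contribute almost nothing once a correlated predictor is held constant, the sketch below computes standardized regression weights for a simplified two-predictor model from the correlations reported in the Results (PSI with completion time, −.18; reading fluency with completion time, −.41; PSI with reading fluency, .40). This is not the three-predictor model in Table 3: SEPTAR is omitted because its correlations with the other predictors are not quoted in the text, and the function name is hypothetical.

```python
def two_predictor_betas(r_y1: float, r_y2: float, r_12: float):
    """Standardized weights and R^2 for a criterion y regressed on two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r_squared


# Correlations quoted in the Results text.
r_psi_time = -0.18      # WAIS PSI with NDRT completion time
r_fluency_time = -0.41  # WJ Reading Fluency with NDRT completion time
r_psi_fluency = 0.40    # WAIS PSI with WJ Reading Fluency

beta_psi, beta_fluency, r2 = two_predictor_betas(
    r_psi_time, r_fluency_time, r_psi_fluency
)

print(f"beta (PSI):             {beta_psi:+.3f}")
print(f"beta (Reading Fluency): {beta_fluency:+.3f}")
print(f"R^2 (two predictors):   {r2:.3f}")
```

In this simplified model, the standardized weight for processing speed shrinks to about −.02, whereas the weight for reading fluency stays near −.40, and the two predictors together explain roughly 17% of the variance in completion time. This is consistent with the interpretation that the predictive value of processing speed overlaps with that of reading fluency.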
Finally, we found that the same three predictors accounted for less of the variance in NDRT number correct (8% vs. 20% for completion time), and the only predictor that was significant in this multiple regression model was students’ self-reports of their timed reading/test-taking skills. Apparently, speeded performance measures (including time-to-completion on the NDRT) relate only weakly, if at all, to (untimed) accuracy. Even the 20% of variance explained in completion time leaves much variance unexplained, but this finding appears consistent with research cited in the introduction, and suggests the need for further exploration of other predictors, a topic we return to below.
Implications for Clinical Practice
Diagnosticians and education professionals often make decisions about which students require extended time testing accommodations. Diagnosticians typically conclude their evaluation reports by making recommendations concerning appropriate academic accommodations, and education professionals make final decisions about whether to grant accommodations on classroom and high-stakes tests. Unfortunately, little research has been available to guide individual decision making, so even when decision makers have access to data from a student’s diagnostic testing, decisions about extended time accommodations are difficult (see, for example, Ofiesh & Hughes, 2002). However, the present study’s results provide some guidance.
First, processing speed measures appear to be of limited value compared with reading fluency measures, when predicting test completion time for reading-heavy tests. Extended time accommodation recommendations should not be predicated solely on low processing speed scores, and so a diagnostic assessment battery should go beyond an IQ test (and any processing speed measures that it may include). In contrast, reading fluency is a promising indicator of test-taking speed.
Second, students’ self-reports of performance on timed reading/test-taking should be solicited, but interpreted with caution. Admittedly, the present study found that these self-reports predicted test completion time, even when processing speed and reading fluency were held constant. However, self-reports are easily influenced, especially by incentives, and so students can claim to be a slow reader or test-taker when they are not. In this study, there was no incentive for a student to claim to be a slow reader. However, given that most college students believe that extended time would be helpful to them (Lewandowski et al., 2014), there is a strong incentive to report such problems when undergoing diagnostic assessments. Therefore, self-reports should be recorded, but students’ incentives and motivation should be considered when interpreting them.
Third, diagnosticians, education professionals, and counselors should assure students that slow test-taking is not a reflection of their ability to do well on a test when adequate time is provided. In our study, many students were relatively slow test-takers and nonetheless obtained relatively high scores on the reading comprehension test. The lack of a relationship between speed and accuracy, at least in our college sample, may suggest the distinct importance of a student’s test-taking style, without significant implications for someone’s knowledge and comprehension skills. One could also interpret this finding as supportive of a universal design approach to assessment (e.g., Ketterlin-Geller, 2005), in which liberal time limits would be provided, and all students would have more time to access test items, so long as speed is not a skill that the test developers/users wish to measure.
Finally, diagnosticians who use the NDRT as part of a diagnostic assessment battery, and education professionals who rely on NDRT scores for decision making, should know that many college students do not finish the NDRT reading comprehension task in the allotted 20 min. Furthermore, failing to finish in 20 min was not suggestive of a poor (untimed) score or a disability condition in our sample, and so merely showing that a student does not finish in 20 min, or that their score improves when given additional time, is not a pathognomonic sign. As with any such sign, test performance must first be shown to be atypical—far from what is expected for someone’s age—before we suspect that a clinical problem is present.
In summary, the results from the present study suggest that certain current practices for determining a student’s need for extended time testing accommodations should be revised. This conclusion converges with other findings showing a need for more evidence-based practice in the area of testing accommodations decisions (for review, see Lovett & Lewandowski, 2015).
Limitations and Directions for Future Research
Our study had several expected limitations, each of which suggests possibilities for future research. First, we used the NDRT reading comprehension task as a proxy for a real-world reading-based test, but our findings may have been different if we had used a real-world test with genuine consequences (e.g., grades). The logistics of such a study would be cumbersome, but we would recommend that researchers attempt to give a diagnostic battery to students who will be in an actual real-world testing situation (e.g., college students who will be taking a classroom test in a large course or who are preparing for a graduate or professional school admissions test) and try to predict the time that it takes to complete that real-world measure.
Second and relatedly, the NDRT is a reading-heavy test, and so the present results may not generalize to tests with less extensive reading requirements. Another advantage of studies with actual classroom or high-stakes tests (see above) is that researchers would be able to examine whether results differ depending on the amount of reading that must be done on the exam. It would not be surprising if, for instance, reading fluency is not as good a predictor of test completion time on an essay test where the reading consists of only a brief prompt.
Third, our study did not measure several traits or skills that could plausibly affect students’ test-taking speed. The total R² for our regression model predicting NDRT completion time was only .20, and so approximately 80% of the variability in test-taking speed would be accounted for by factors that we did not measure. Earlier in the discussion of our results, we mentioned several potential factors (e.g., personality traits, vocabulary knowledge, general intelligence), and future studies should measure such factors and investigate their predictive value. In any case, we had not chosen our brief battery of predictors in an attempt to account for as much variance as possible, and so given the goals of this particular study, we were not surprised by the relatively low R² value.
Fourth, we did not apply a firm time limit on the NDRT; instead of measuring students’ performance under a time limit, we separately measured completion time and (untimed) performance. Although we asked students to work “as quickly and accurately as possible,” having a firm time limit (as is found on many high-stakes tests, as well as on the standard version of the NDRT) may have led to different test-taking behavior. It is also possible that our predictors would have been more effective at predicting performance under a time limit, rather than predicting completion time per se.
Finally, the present study did not involve any true experimental manipulations, and so we are unable to say confidently what causes individual differences in test-taking speed. We would encourage future investigators to conduct intervention studies in which potential causes, such as reading fluency, are increased through training, to determine if this actually causes students to complete real-world tests (or their proxies) in less time. Similarly, personality traits (such as indecisiveness) and test-taking style preferences may be amenable to intervention. Since timed tests are unlikely to go away any time soon, it would be preferable if we could design tests or put interventions in place to help students take these tests under standard administration conditions, rather than recommending extended time accommodations for so many students with disability diagnoses.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
