Abstract
Differential granting of extra-examination time (EET) is commonly based on learning disabilities (LD) status: EET is granted to LD examinees and is denied to nondisabled examinees. We argue that LD serves as a proxy for the extent to which time limitation affects the examinee’s test score (e). Hence, the validity of the LD-based EET granting policy depends on how well LD status serves as a proxy for e. Reanalysis of 11 comparative experimental studies of the effect of EET shows that LD status is a poor proxy for e. The proportion of nondisabled examinees who benefit from EET roughly equals the corresponding proportion among LD students. Implications of these results for the validity and fairness of this policy are discussed.
Exclusive granting of extra-examination time (EET) on standardized tests to individuals with learning disabilities (LD) is widely practiced across all domains of educational and psychological testing (Lewandowski, Cohen, & Lovett, 2013). Notable examples are the SAT and ACT, where the amount of EET is typically time and a half, that is, 1.5 times the original time limit (Cohen, n.d.; CollegeBoard, n.d.; Ishii, 2011; The Princeton Review, n.d.). This policy is typically justified in terms of the interaction hypothesis (IH; Koenig & Bachman, 2004; Lewandowski et al., 2013; Sireci, Scarpati, & Li, 2005) or the “Maximum Potential Thesis” (MPT; Zuriff, 2000), according to which test accommodations improve only the test scores of examinees with disabilities; examinees without disabilities will not obtain higher scores when taking the test with the same accommodations (Sireci et al., 2005).
Specifically, the rationale for providing EET exclusively to examinees with LD consists of two components. The first component justifies granting EET to students with LD: Because these examinees process certain kinds of information slowly and their performance is therefore impeded (Lyon, Shaywitz, & Shaywitz, 2003), their scores are negatively biased by time limitation (Savage, 2004). That is, their scores are lower than the corresponding hypothetical scores that would have been obtained under untimed conditions. From the perspective of test validity theory, time limitation introduces construct-irrelevant variance for examinees with LD (Phillips, 1994; Sireci et al., 2005).
The second component is intended to justify the denial of EET to nondisabled examinees. It claims that nondisabled examinees perform to the best of their capability under time-limited test conditions; their scores are not negatively affected by the time limit and, therefore, these examinees will not benefit from EET (Lewandowski et al., 2013; Phillips, 1994; Sireci et al., 2005). If the assumptions on which this rationale relies hold, then the EET accommodation helps examinees with LD demonstrate their true knowledge, skills, and abilities without disadvantaging nondisabled examinees. In turn, this promotes fairness in testing and leads to more accurate score interpretations for examinees with LD and nondisabled examinees alike (Ranseen & Parks, 2005; Sireci et al., 2005; Zuriff, 2000).
The Underlying Causal Model
Underlying the exclusive granting of EET to examinees with LD, therefore, is a causal model that attributes the detrimental and invalidating effects of time limitation on an examinee’s (power) test score solely to LD. The existence of other potential factors is implicitly denied. Importantly, however, the literature offers no theoretical justification for this position. Although the assumption that examinees with LD, particularly those with reading disabilities, are likely to be negatively affected by time limits rests on compelling theoretical arguments, no justification has been offered for the complementary assumption that nondisabled examinees are not affected by such limits and, therefore, that their scores will not improve when they take the test with this accommodation. That is, no theoretical justification has been offered for the assumption that LD is the only possible explanation for the attenuating and invalidating effect of time limitations on examinees’ test scores.
Critical examination of this assumption reveals its implausibility: Additional factors that may hamper an examinee’s functioning under time limits are easy to imagine (e.g., a general negative main effect of time limitation, and slow cognitive processing due to other factors such as low intelligence, low working memory capacity, or high susceptibility to time pressure). The existence of such factors is inconsistent, on theoretical grounds, with the exclusive causal attribution of the negative effects of time limitation to LD, and it challenges the legitimacy of granting this common form of accommodation exclusively to examinees with LD (Phillips, 1994). If LD is not the only possible cause of nonoptimal functioning under time limits, at least some nondisabled examinees may also be affected by limited test time and could therefore benefit from extra time.
The implications of this possibility are highly significant, both theoretically and practically, particularly because nondisabled individuals constitute the vast majority of the examinee population. Denying EET to nondisabled examinees who could benefit from it biases and invalidates their test scores as measures of the construct the test is supposed to measure, and it introduces unfairness into the measurement procedure. Indeed, recent studies point to the possible benefits that EET may offer nondisabled examinees (Gregg & Nelson, 2010; Lesaux, Pearson, & Siegel, 2006; Lewandowski et al., 2013; Lovett, 2010; Ranseen & Parks, 2005; Sireci et al., 2005; Zuriff, 2000). Hence, one cannot but conclude that the assumption on which the justification for exclusively granting EET to examinees with LD relies, namely, that nondisabled individuals perform to the best of their ability under timed conditions and are not affected by time limitations, is both theoretically problematic and empirically unwarranted.
We suggest that the theoretical treatment of the EET accommodation, and the justifications provided for its selective granting, have been hampered by its conceptualization as necessarily related to LD. The purpose of this article is twofold: (a) to propose a different conceptualization, which, unlike the currently accepted one, justifies not only the granting of EET to some examinees but also its withholding from others, and (b) to evaluate the empirical validity of the current EET granting policy in light of this reconceptualization. The article consists of three sections. The first section (a) presents and explicates the psychometric rationale for granting EET at all and for allocating it differentially; (b) specifies the psychometrically normative EET granting policy and points out that it cannot be applied, because the underlying criterion is hypothetical; and (c) introduces the notion of a “proxy”-based EET granting policy. Within this conceptualization, LD status is shown to be a proxy for the true (albeit unknown) criterion for EET granting. Accordingly, the validity of exclusively granting EET to LD examinees depends on the similarity between this proxy-based policy and the normative one. Quantitative validity measures are suggested, and their relation to the IH is examined. The second section presents empirical estimates of these measures, based on a reanalysis of the results of pertinent published studies. The last section concludes.
Reconceptualizing the EET Granting Policy
The Psychometric Rationale for EET Granting
By definition, speed of performance is not part of the theoretical construct measured by “power” standardized tests or examinations. Hence, ideally, such tests should be untimed (Lu & Sireci, 2007). Nevertheless, for obvious practical reasons, the time allocated to examinees on such tests is typically limited. The imposition of time limits, which in this case constitutes a construct-irrelevant feature of the test administration procedure, may lower the scores of some or all examinees to varying extents, thereby impairing the validity both of the absolute interpretation of those examinees’ scores and of the relative interpretation (i.e., rank order) of the entire score set.
Under these circumstances—and assuming that the time limitation cannot be avoided—EET granting is not only desirable but also obligatory to maintain the validity of the test scores. Our main argument is that, ideally, the differential granting of EET should be based on the expected magnitude of the detrimental effect of time limitation on the examinees’ test scores, regardless of their disability status. Here lies the basic difference between the suggested conceptualization and the disability-based current conceptualization. Formally, let
$$e_{Ti} = Y_{Ti} - Y_{\infty i}$$
stand for the effect of the specific time limit T on examinee i, where $Y_{Ti}$ is the score of examinee i under time limit T and $Y_{\infty i}$ denotes his or her “correct” test score under untimed conditions. The difference between them, $e_{Ti}$, thus indicates the (“Platonic”) measurement error induced by the specific time limitation. Under the widely shared assumption that time limitations can only lower test scores, $e_{Ti}$ is either zero or negative, and its magnitude reflects the strength of the detrimental effect of time limitation on the examinee’s test score. Without loss of generality, we assume that the test time T is fixed for a given test and, for brevity, drop the index T from our notation where appropriate.
The purpose of EET is to compensate for the detrimental effects of time limitations. To achieve this aim, EET should be granted to each examinee i on the basis of his or her value of e for the given test, according to the following rule:
$$\Delta_i = f_i(e_i)$$
where $\Delta_i$ is the amount of extra time that is both necessary and sufficient to eliminate $e_i$ for examinee i, and $f_i$ is a zero-preserving monotonic function that specifies the value of $\Delta$ for each value of e; namely, $\Delta = 0$ if $e = 0$, and increasing amounts of $\Delta$ are granted as the absolute value of e increases. Note that $f_i$ is an individual function, which may vary between individuals with the same e value and may thus lead to different values of $\Delta$ for different examinees.
We submit that implementing an extra-time granting policy that satisfies the above criteria can successfully “level the playing field” by entirely eliminating $e_i$. We refer to this policy as the “normative” policy and to the amount $\Delta_i$ it specifies for examinee i as the “correct” $\Delta_i$ amount.
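As an illustration only, the normative rule can be sketched in a few lines of Python. The linear form of $f_i$ (a per-examinee rate of extra minutes per lost score point) and the function names are hypothetical assumptions; the rule itself requires only that each $f_i$ be zero-preserving and monotonic in $|e_i|$.

```python
# Illustrative sketch of the normative rule Delta_i = f_i(e_i).
# ASSUMPTION (hypothetical): each f_i is linear, Delta_i = rate_i * |e_i|,
# with rate_i in extra minutes per lost score point. The rule only requires
# f_i to be zero-preserving and monotonic in |e_i|.


def time_effect(score_timed: float, score_untimed: float) -> float:
    """e_i = Y_Ti - Y_inf_i; zero or negative if time limits can only lower scores."""
    return score_timed - score_untimed


def normative_extra_time(e: float, rate: float) -> float:
    """Hypothetical f_i: zero extra time when e == 0, more as |e| grows."""
    if e > 0:
        raise ValueError("e must be zero or negative under the stated assumption")
    return rate * abs(e)


# Two hypothetical examinees with the same e but different individual f_i,
# plus one examinee unaffected by the time limit.
e_same = time_effect(score_timed=42, score_untimed=50)   # -8
e_none = time_effect(score_timed=50, score_untimed=50)   # 0
print(normative_extra_time(e_same, rate=1.5))   # 12.0
print(normative_extra_time(e_same, rate=2.5))   # 20.0
print(normative_extra_time(e_none, rate=2.5))   # 0.0
```

The two examinees with $e = -8$ receive different amounts because their (hypothetical) $f_i$ differ, which is exactly why the normative policy needs individual knowledge that no test administrator has.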
The Need for a Proxy
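Implementation of the normative EET granting policy suggested above requires knowledge of $e_i$ and $f_i$ prior to testing. Because both are obviously unknown to the test administrator, the normative policy is only a hypothetical, unattainable ideal. Nonetheless, it is necessary for the proper conceptualization and evaluation of any actual EET granting policy, which can rely only on a known, albeit imperfect, proxy of e, denoted X. Under such a policy, the amount of EET actually granted to examinee i, $\Delta^*_i$, is determined by $X_i$ rather than by $e_i$. In the LD-based policy, the proxy is LD status: $X_i = 1$ for examinees with LD and $X_i = 0$ for nondisabled examinees, and EET is granted if and only if $X_i = 1$.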
The Validity of the LD-Based EET Granting Policy
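The validity of any proxy-based EET granting policy depends on the similarity of the EET amounts it actually grants ($\Delta^*$) to the corresponding correct amounts that the normative policy would have granted ($\Delta$), that is, on the closeness of $\Delta^*$ to $\Delta$. This closeness depends in turn on how strongly the proxy X is related to e, which can be indexed by the correlation $r_{Xe}$. Because the LD-based policy grants EET on an all-or-none basis, it can err in two ways: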
Granting EET to LD examinees who derive no benefit from it (Type I error), and
Withholding EET from nondisabled examinees who derive benefit from it (Type II error).
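The two error types can be made concrete with a small classification sketch. It assumes, for illustration, that an examinee “benefits” from EET whenever $e < 0$ and that the LD-based policy grants EET if and only if $X = 1$; the sample and all names are hypothetical. Each error rate is expressed as a proportion of the relevant group (LD for Type I, nondisabled for Type II).

```python
# Sketch: classifying the decisions of the LD-based policy against the normative one.
# ASSUMPTIONS (illustrative): an examinee "benefits" from EET when e < 0, and the
# LD-based policy grants EET if and only if X == 1. The sample is hypothetical.
from typing import List, NamedTuple, Tuple


class Examinee(NamedTuple):
    x: int      # LD status: 1 = LD, 0 = nondisabled
    e: float    # effect of the time limit on the score (zero or negative)


def error_type(examinee: Examinee) -> str:
    benefits = examinee.e < 0
    granted = examinee.x == 1
    if granted and not benefits:
        return "Type I"    # EET granted to an LD examinee who does not benefit
    if benefits and not granted:
        return "Type II"   # EET withheld from a nondisabled examinee who benefits
    return "correct"


def error_rates(sample: List[Examinee]) -> Tuple[float, float]:
    """Type I rate among LD examinees, Type II rate among nondisabled examinees."""
    ld = [s for s in sample if s.x == 1]
    non_ld = [s for s in sample if s.x == 0]
    type_1 = sum(error_type(s) == "Type I" for s in ld) / len(ld)
    type_2 = sum(error_type(s) == "Type II" for s in non_ld) / len(non_ld)
    return type_1, type_2


sample = [Examinee(1, -5.0), Examinee(1, 0.0), Examinee(0, -3.0), Examinee(0, 0.0)]
print([error_type(s) for s in sample])  # ['correct', 'Type I', 'Type II', 'correct']
print(error_rates(sample))              # (0.5, 0.5)
```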
All else being equal, the higher the relative frequency of these errors, particularly Type II errors (“false negatives”), the lower the validity and fairness of the LD-based policy and the weaker its justification. Hence the critical importance of estimating both $r_{Xe}$ and the prevalence of the two error types when evaluating the validity, fairness, and justification of the LD-based approach to EET granting.
Unfortunately, no empirical estimates of $r_{Xe}$ or of the probabilities of the two types of error associated with LD-based EET granting are available in the literature. They can, however, be derived from the results of published experimental studies of the effect of EET on the test scores of LD and non-LD examinees that used a within-subject design, that is, two test scores (standard time [S] and extended time [L]) for each examinee in each of the two groups (LD and non-LD). This article estimates $r_{Xe}$ and the probabilities of the two error types in all 11 studies meeting this requirement that were identified by a search of several electronic databases.
Method
Database
Table 1 displays the 11 studies included in this article. Five of these studies were included in Sireci et al.’s (2005) meta-analysis of the effect of test accommodations on LD and non-LD students. For each study, the table provides information about the tests, the examinees (age and number), and test time. As is evident from the table, the original test time was very short in the majority of the studies (an average of 19.9 min); in the seven studies in which an extension was granted, the time extension was relatively long (on average about double the original test time); and in four studies, the time was unlimited. Four studies used two-stage tests, and seven used different tests for the extended time condition. The age of examinees was concentrated in the lower grades (seven out of nine studies for which this information was provided). Finally, due to statistical power considerations, in 10 out of 11 studies, the proportion of LD participants (ca. 50%) exceeds considerably their true proportion in the population (5%-15%).
Table 1. The Eleven Studies.
Note. Studies marked with an asterisk were included in Sireci, Scarpati, and Li’s (2005) meta-analysis. LD = learning disabilities; CBMs = curriculum-based measurements; GRE = Graduate Record Examination; ITBS = Iowa Tests of Basic Skills.
Non-LD Sample a.
Non-LD Sample b.
Time limit was not noted in the dissertation.
This population consists of above-average (n = 20) and average (n = 22) students.
Six participants out of the total population of this group were defined as physically handicapped.
Statistical Analysis
To estimate the correlation between e and X, we computed, for every examinee, the gain score following time extension, $G_S = Y_L - Y_S$, where S and L denote the standard-time and extended-time conditions, respectively. These gain scores estimate the corresponding absolute values of $e_S$, on the implicit assumption that the extended time L allows examinees to attain their correct test scores, or at least to approach them. Because LD status is known for all examinees, the correlation between X and e can then be estimated by the negative of the correlation between X and $G_S$, $r_{XG}$ (where, as already mentioned, X = 0 for nondisabled students and X = 1 for LD students). Because, theoretically, the extended time can only improve scores relative to those obtained by the same examinees in the standard condition, gain scores can only be zero or positive; hence, any negative values obtained experimentally are entirely attributable to measurement error. Indeed, because of measurement error, $|r_{XG}|$ underestimates $|r_{Xe}|$.
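Because X is binary in this case, $r_{XG}$ is a point-biserial correlation, which can be obtained directly from the standard conversion (for a proportion p of LD examinees and q = 1 − p)
$$r_{XG} = \frac{d_G}{\sqrt{d_G^2 + 1/(pq)}},$$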
where $d_G$ (Cohen, 1988) is the standardized mean difference in gain scores $G_S$ between the LD and non-LD groups (see Rice & Harris, 2005). For each study, calculating $d_G$ requires the mean and standard deviation of the difference scores in each group (LD and non-LD). The studies fall into two groups according to whether or not they report, within each group, the mean and standard deviation of within-examinee difference scores. To compute these statistics for the studies that did not report them, we contacted the authors of those studies individually (with the exception of Lewandowski et al., 2013) and asked them for the individual gain scores $G_S$. Unfortunately, none of them provided the required information. Therefore, for the studies that did not report statistics of intraindividual difference scores, the mean difference score in each group (LD and non-LD) was calculated by subtracting the mean score on the standard-time test from the mean score on the extended-time test. The standard deviations of the difference scores $G_S$ in each group were estimated on the conservative assumption that the within-group (LD and non-LD) correlation between scores on the two tests was
In addition, we calculated the proportions of positive and nonpositive (zero or negative) gain scores for each test, assuming that $G_S$ is normally distributed. The proportion of nonpositive gain scores among LD examinees estimated the Type I error rate, because these examinees derived no benefit from EET. Similarly, the proportion of positive gain scores among nondisabled examinees estimated the Type II error rate, because these examinees benefited from extra time but would not be granted it under the LD-based policy. In sum, we computed three (interrelated) indicators of the validity of the LD-based EET granting policy for each of the 17 tests reported in the 11 studies:
The standardized mean difference $d_G$ between gain scores following time extension in the LD and nondisabled groups;
The correlation rXG between LD status (X = 0 nondisabled, X = 1 LD) and gain scores; and
The proportions of Type I and Type II errors.
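For studies reporting only group means and standard deviations, the first and third indicators can be approximated as sketched below. The sketch assumes the standard identity $\mathrm{Var}(Y_L - Y_S) = s_L^2 + s_S^2 - 2\rho s_L s_S$ with an assumed within-group correlation $\rho$ (the value used in the analysis is not reproduced here, so `RHO` is a placeholder), the usual pooled-SD form of Cohen’s d, and a normal model for $G_S$. All numbers and names are hypothetical.

```python
# Sketch of the summary-statistics route, under stated assumptions:
#  - SD of G_S from the two test SDs and an ASSUMED correlation rho (placeholder);
#  - d_G as Cohen's d with a pooled SD (group 1 = LD, group 0 = non-LD);
#  - P(G_S > 0) from a normal model of the gain scores.
from math import sqrt
from statistics import NormalDist


def diff_sd(sd_s: float, sd_l: float, rho: float) -> float:
    """SD of G_S = Y_L - Y_S given the two test SDs and their correlation rho."""
    return sqrt(sd_s**2 + sd_l**2 - 2 * rho * sd_s * sd_l)


def cohens_d(mean_1, sd_1, n_1, mean_0, sd_0, n_0) -> float:
    """Standardized mean difference using the pooled within-group SD."""
    pooled = sqrt(((n_1 - 1) * sd_1**2 + (n_0 - 1) * sd_0**2) / (n_1 + n_0 - 2))
    return (mean_1 - mean_0) / pooled


def prop_positive(mean_gain: float, sd_gain: float) -> float:
    """P(G_S > 0) under a normal approximation to the gain-score distribution."""
    return 1 - NormalDist(mean_gain, sd_gain).cdf(0)


RHO = 0.5  # placeholder for the assumed within-group correlation

# Hypothetical group summaries: (mean_S, mean_L, sd_S, sd_L, n)
ld = (20.0, 26.0, 5.0, 5.0, 30)
non_ld = (25.0, 30.0, 5.0, 5.0, 30)

gain_ld, sd_ld = ld[1] - ld[0], diff_sd(ld[2], ld[3], RHO)
gain_non, sd_non = non_ld[1] - non_ld[0], diff_sd(non_ld[2], non_ld[3], RHO)

d_g = cohens_d(gain_ld, sd_ld, ld[4], gain_non, sd_non, non_ld[4])
type_1 = 1 - prop_positive(gain_ld, sd_ld)   # LD examinees with nonpositive gain
type_2 = prop_positive(gain_non, sd_non)     # non-LD examinees with positive gain
print(round(d_g, 3), round(type_1, 3), round(type_2, 3))
```

In this made-up example the two groups gain almost equally ($d_G = 0.2$), yet most nondisabled examinees (about 84%) have positive gains, which is the pattern that drives a high Type II rate.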
Results
LD Status and Gains Following Time Extension
Figure 1 ranks the 17 tests reported in the 11 studies according to the estimated standardized mean difference $d_G$ between gain scores in the LD and non-LD groups and the associated $r_{XG}$ values. To control for the effect of between-study variability in the relative sizes of the LD and non-LD groups on the value of $r_{XG}$ for a given $d_G$, the two groups in each study were weighted in inverse proportion to their over- or underrepresentation in the study sample, so that the ratio of their weighted sizes was 9:1 (non-LD to LD), the typical ratio in the population. As a result, the same conversion applies to every test:
$$r_{XG} = \frac{d_G}{\sqrt{d_G^2 + 1/(0.1 \times 0.9)}} \approx \frac{d_G}{\sqrt{d_G^2 + 11.1}}.$$
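The effect of the 9:1 weighting can be seen by applying the large-sample d-to-r conversion for a binary variable, $r = d/\sqrt{d^2 + 1/(pq)}$, to the same $d_G$ under two group compositions. This is a sketch under that assumed conversion; the $d_G$ value and function name are hypothetical.

```python
# Sketch of the d-to-r conversion for a binary X with group proportions p and
# q = 1 - p (large-sample form r = d / sqrt(d^2 + 1/(p*q))). The same d_G yields
# a smaller r_XG when LD examinees are 10% rather than 50% of the (weighted) sample.
from math import sqrt


def d_to_r(d: float, p_ld: float) -> float:
    q = 1.0 - p_ld
    return d / sqrt(d**2 + 1.0 / (p_ld * q))


d_g = 0.2                              # hypothetical standardized mean gain difference
r_balanced = d_to_r(d_g, p_ld=0.5)     # typical study sample (~50% LD): about 0.0995
r_population = d_to_r(d_g, p_ld=0.1)   # weighted 9:1 (non-LD:LD): about 0.0599
print(round(r_balanced, 4), round(r_population, 4))
```

Weighting every study to the same population ratio removes this composition effect, so differences in $r_{XG}$ across tests reflect differences in $d_G$ only.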

Figure 1. Rank order of the 17 tests in increasing order of the standardized mean gain score difference between the LD and non-LD groups ($d_G$), with the associated $r_{XG}$ value computed as $r_{XG} = d_G / \sqrt{d_G^2 + 1/(0.1 \times 0.9)}$.

Figure 2. Joint distribution of the proportions of LD and non-LD examinees who benefited from EET in the 17 tests.
As evident from Figures 1 and 2, in all but two of the 17 tests the relationship between LD status and gain from EET is weak, and its direction varies between tests. In Figure 1, this is reflected in the negligible absolute values of $d_G$, the difference in mean gain score between the LD and non-LD groups, and in the correspondingly low correlations $r_{XG}$ (absolute value < 0.14 for 15 of the tests); in Figure 2, it is reflected in the approximate equality of the LD and non-LD groups in the estimated percentage of examinees who benefited from EET. The extreme result of Runyan’s (1991) study looks highly suspect and merits further examination. Regrettably, Dr. Runyan was unable to provide us with the raw data of her study, which had been lost in an accident.
False Negatives and False Positives
Figure 3 shows the joint distribution of the estimated proportions of the two error types had the reviewed studies implemented an LD-based EET granting policy—thereby withholding EET from all the non-LD examinees. Each point in Figure 3 represents one test.

Figure 3. Joint distribution of the relative frequencies of Type I and Type II errors in the 17 tests.
As indicated in Figure 3, the estimated probability of Type I error (i.e., “false positives”) is generally low, ranging from 0% to 61% across tests, with a median of 0%. The probability of Type II error, or “false negatives,” is much higher and far more consequential: Across all tests it ranges from 36% to 100%, with a median of 100%. That is, the LD-based policy erroneously denies time extension to the vast majority of the examinees who could benefit from it.
Discussion
As clearly indicated by all the validity indicators used, the LD-based EET granting policy is found here to be empirically invalid and biased against nondisabled examinees. Contrary to the key assumption underlying this policy, LD status appears to be a poor proxy for gain from time extension: Nondisabled examinees’ mean gain scores following EET in the reviewed studies are, in fact, similar to those of their LD counterparts, and the proportion of nondisabled examinees who benefit from EET (48%-100% across tests, with a median of 64%) roughly equals the corresponding proportion among LD examinees. Therefore, had the LD-based policy, which withholds EET from all nondisabled examinees, been implemented in these studies, it would have produced high percentages of Type II errors, namely, unwarranted denial of EET to examinees who need it. About 90% of all the individuals who benefited from time extension in these studies (i.e., had positive gain scores), and who are therefore entitled to EET, would have been erroneously denied EET by this policy. The psychometric, ethical, and social implications of this result are particularly critical in view of the high relative frequency of examinees who benefited from time extension (about two thirds; Figure 2). Combining the 90% probability of Type II error found in this study with the two-thirds base rate of benefiting from time extension indicates that the individuals who would have been unduly denied EET by the LD-based policy (i.e., the nondisabled examinees who benefited from time extension) constitute a robust majority (about 60%) of the examinee population.
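The arithmetic behind the 90% and 60% figures can be written out explicitly. The derivation assumes, as the text does, a 9:1 ratio of nondisabled to LD examinees and a benefit rate $\pi$ that is roughly the same (about two thirds) in both groups:

```latex
P(\text{nondisabled} \mid \text{benefit})
  = \frac{0.9\,\pi}{0.9\,\pi + 0.1\,\pi} = 0.9,
\qquad
P(\text{benefit} \wedge \text{denied})
  = P(\text{benefit})\, P(\text{nondisabled} \mid \text{benefit})
  \approx \tfrac{2}{3} \times 0.9 = 0.6 .
```

Because $\pi$ cancels in the first ratio, the 90% share follows from the population ratio alone whenever the two groups benefit at similar rates.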
Of course, these estimates may to some extent be specific to the particular set of studies on which they are based and to their characteristics (participants, test content, number and difficulty of items, original time frame, absolute and relative amount of time extension, and so on), as well as to the estimation method and its underlying assumptions. A more direct approach to estimating the relative frequency of Type II errors (i.e., false negatives) under the LD-based EET granting policy, suggested by one reviewer of this work, would be a planned experimental study using individual data. Such a study could provide information about the percentage of students (both with and without LD) who would benefit from extended time, thereby dispensing with the need for estimation and its associated assumptions.
Furthermore, as aptly noted by one reviewer of this article, most of these studies used research measures rather than actual high-stakes tests. These research measures may have been more highly speeded than actual high-stakes tests are, leading to the very high degree of benefit by both LD and nondisabled participants. Indeed, Lewandowski et al. (2013) even indicated that they made their test more speeded to avoid ceiling effects. Thus, the estimates reported here should not be considered as universal constants but rather as illustrative examples, in need of replication by additional studies using “real-world” measures. However, the across-study consistency of the results, despite the sizable heterogeneity of the study sample in terms of relevant characteristics, suggests that a high percentage of false negatives is inevitable in the implementation of the LD-based EET granting policy, particularly in view of the relatively low frequency of LD in natural populations.
It can thus be tentatively concluded that the LD-based EET granting policy fails to achieve its aim: It leaves the vast majority of the biasing and invalidating effects of time limitation on “power” test scores uncorrected. Furthermore, if additional research shows that these findings generalize to real-world measures, our results also point to the falseness of the theoretical basis of the LD-based EET granting policy, which attributes the detrimental effects of time limitations on power test scores to LD. As these results indicate, LD is but one minor factor among many that may cause time limitations to affect test scores detrimentally: 90% of the examinees who benefited from EET are nondisabled.
The empirical invalidity of the LD-based EET granting policy revealed by our study is particularly relevant to the weaker formulation of the IH (Sireci et al., 2005), which requires only that score gains following EET be smaller, on average, in the nondisabled group than in the LD group (i.e., a positive $r_{XG}$ value, in contrast to the virtually perfect correlation implicitly required by the IH’s original formulation), and which ignores the resulting high proportion of “false negatives.” This weaker formulation therefore fails to provide a psychometrically and ethically valid justification for the exclusive granting of EET to LD examinees.
The policy implication of our results is clear. Until a valid proxy of the detrimental effect of time limitation on test scores is found, there is only one way to revise the present policy and improve both the validity and the fairness of the testing enterprise: taking time out of the examination equation, that is, granting liberal time limits universally. This recommendation is consistent with the emerging consensus regarding the imposition of time limits on tests when speed is not part of the construct to be measured, as part of the broader Universal Test Design (UTD) strategy (e.g., Lewandowski et al., 2013; Lovett, 2010; Royer & Randall, 2012; Sireci et al., 2005).
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
