Abstract
With the shift from No Child Left Behind (NCLB) to Every Student Succeeds Act (ESSA), accountability models are being changed. Given the past 15 years of reporting on student subgroups and 10 years using various growth models, accountability systems can now be better informed. In this study, we analyze identification and services of students with specific learning disabilities (SLDs). First, we document the degree to which they are identified and receive special services for three cohorts and then document changes in proficiency and growth on a state test. Next, we use two measures of growth to document progress: a transition matrix and a multilevel model. We found that some students change in their identification as SLDs over three grades with resulting differences in special education supports, but the effect is negligible on growth. Accountability systems, therefore, may not need to be based on complex models using time-varying student characteristics.
Advancements have been made in legislative enactments to focus on reporting outcomes for (sub)populations of students, so performance is no longer hidden in an overall average. Furthermore, the focus is on individual student growth rather than change with intact groups in accountability systems. These (sub)populations have been both reported and researched with somewhat unequal attention to definition, participation, and performance. At the same time, the analytical tools used to highlight outcomes have increasingly attended to the impact and the consequences from such participation.
The purpose of this article is to report on outcomes for students with learning disabilities (LDs) in state accountability systems, address subgroups of students at varying levels of specificity, and consider an appropriate growth model within the accountability system. In particular, we focus on students with and without specific learning disabilities (SLDs) and the degree to which they receive special education services over three middle school grades. Just as the main outcomes (e.g., average performance on a state test) may mask important findings for subgroups (e.g., with or without a disability), the granularity of the subgroup, likewise, may not represent the outcomes for students in a specific subgroup (e.g., students with different types of disabilities and receiving different amounts of special education services). Similarly, the outcomes themselves may be equally problematic: With many different types of outcomes possible to report in an accountability system (e.g., proficiency status, percentile ranks, normal curve equivalent scores, and scale scores), results may not be fully or equally consistent or sensitive. Therefore, the rationale of the study is to document the identification of students in a subgroup (special education or SLDs) and the degree to which they receive special education services as it is associated with progress on a state test over three grades using two different outcome metrics.
Legislative Contexts for Subgroup Performance and Growth
With the No Child Left Behind (NCLB) Act (U.S. Department of Education, 2001), performance on statewide testing programs was required to be reported for subgroups by disability, race, ethnicity, English language proficiency, and gender. For nearly a decade, these results have been used to focus on accountability and ensure resources were being allocated for all students to become proficient, including students from these subgroups. Probably the most important dimension of this act was that this subgroup performance would be evaluated against a defined set of content standards. The fatal flaw in this logic, however, was the use of status-based measures, which confounds outcome changes with population changes.
In the final years of NCLB, a pilot program was introduced using various growth models in states (U.S. Department of Education, 2005, 2009) with further analytical models incorporated by Race to the Top. Finally, in the most recent accountability legislation, the Every Student Succeeds Act (ESSA; 2015), even more flexibility has been introduced in the measures being deployed. This sequence of legislative acts has systematically improved the integrity of accountability models not only with attention to subgroups but also in consideration of growth over time within these populations. Two important assumptions are made, however, when addressing growth over time for subgroups of students that lead to a conundrum for accountability systems.
First, an assumption is made that subgroups are well defined and stable over time even though students’ special education status can change from year to year (Nese, Stevens, Schulte, Tindal, & Elliott, in press; Schulte & Stevens, 2015). This assumption becomes problematic when some student characteristics (e.g., being with a disability and/or with a specific language proficiency) are changing at the same time progress is being made on growth models (which also may be different at various ages). And, as noted by the National Center on Assessment and Accountability for Special Education (NCAASE; 2018), when students begin to respond successfully to specialized interventions and then move out of special education, improvement on accountability systems is misleading as their results are reported for general education.
Second, the choice of growth model needs to be considered, given that each has measurement and statistical assumptions (Stevens et al., 2017). Previous research has documented the insensitivity of transition matrices using proficiency levels over that of multilevel models using scaled scores for general education populations (Tindal, Nese, & Stevens, 2017) as well as for students with significant cognitive disabilities (Tindal & Nese, 2015; Tindal, Nese, Farley, Saven, & Elliot, 2016). In the remainder of the article, we address the performance of subgroups of students, particularly those with SLDs receiving different amounts of special education services over time. Finally, we consider the manner in which growth is documented (whether using transition matrices or multilevel models as well as the functional form of the latter).
Stability of Students With SLDs
Controversy over the identification of students with SLDs has existed since the initial definition promulgated by the National Advisory Committee on Handicapped Children (1968) and the legislation of PL-94-142 in 1975. A decade later, Ysseldyke, Algozzine, Shinn, and McGue (1979) critically questioned the clarity of SLD definitions, and then conducted scores of research studies through the Institute for Research on Learning Disabilities (IRLD). In a culminating report summarizing several years of this research, Ysseldyke et al. (1982) list the following conclusions.
The special education team decision-making process is inconsistent and operated to verify problems cited by teachers in a search for pathology.
Placement decisions are not based on the student performance data but more on “naturally occurring” student characteristics.
Many students without disabilities (SWoD) are being found eligible for special education services.
No defensible process exists to determine whether students should receive LD services even though the process is becoming more “sophisticated.”
With several criteria available to identify a student with an SLD, more than 75% qualify on at least one of them.
When analyzing performance on various diagnostic tests, no differences exist between students with LDs and students considered low achieving.
The most important decision made in the identification of students with SLDs is the referral by the teacher for an assessment to be conducted.
In the 30 years since the research by Ysseldyke et al. (1982), identifying students with SLDs continues to be a significant issue in the field, if only because the various options being put forward may expand rather than decrease the variance (see Fuchs & Vaughn, 2012; Reschly, 2005). The Learning Disabilities Association of America (n.d.) describes the eligibility criterion as the time when a child reaches Tier 3 level in a response-to-intervention system (RTI) along with consideration of several other student characteristics such as achievement commensurate with peers and discrepancy between ability and achievement. The National Center for Learning Disabilities (2014) cites many survey statistics that summarize the most recent perspectives such as public perceptions, prevalence, and troubling trends, noting that RTI systems are “now being used in the majority of schools, districts and states as a way to identify and address learning, attentional and behavioral issues” (p. 34).
Performance/Progress of Students With and Without Disabilities
An important study on the impact of special education status on achievement growth was reported by Stevens, Schulte, Elliott, Nese, and Tindal (2015). In this study, growth and gaps in mathematics achievement tests were compared for students with disabilities (SWD) in several exceptionality categories based on their special education status in third grade versus students without disabilities (SWoD). They reported that growth for all groups was present but decelerated over successive grades. Several important predictors related to student characteristics were also related to growth, including race/ethnicity, gender (which had very small effects for quadratic growth), parental education, economically disadvantaged, and English language proficiency (which was not significant for linear or quadratic growth). “Addition of the socio-demographic predictors accounted for approximately 45%, 13%, and 7% of the variance in student intercepts, slopes, and curvature, respectively in comparison with the unconditional longitudinal model” (p. 55). When growth was expressed as an effect size (ES), achievement gaps were found between students in specific exceptionality categories and SWoD. For students with disabilities, these effect sizes were .74, .54, .32, and .24 for Grades 3 and 4, 4 and 5, 5 and 6, and 6 and 7, respectively. Considerable variance was found in the ES for specific disabilities. The average effect sizes for SWoD across these grade pairs (3–4, 4–5, 5–6, and 6–7) were .78, .74, .48, and .32, respectively.
In a similar focus on students with specific disabilities, Wei, Lenz, and Blackorby (2013) used the Special Education Elementary Longitudinal Study (SEELS) data, “a nationally representative sample of students with disabilities, ages 7 to 17, classified according to the federal special education disability categories” (p. 155). In this study, growth trajectories for students with LDs were documented relative to students with other disabilities (while also analyzing growth by gender, race–ethnicity, and socioeconomic status). They reported that growth for LDs was not higher than for all subgroups, without LD as the referent group, and that a number of coefficients for exceptionality groups were not significant as linear slope or quadratic meaning; even with reference to coefficients (not significance), some exceptionalities were higher.
Finally, Stevens and Schulte (2017) reported on the interaction of students with LDs and receiving free and reduced-price lunch (FRPL), being Black, and being either male or female; they documented the subsequent influence on performance as well as growth trajectories. The significant advancement of this study was that not only were partial effects reported but also true interactions of combinations of characteristics were associated with greater risk of mathematics performance. They found that three significant interactions were present for (a) LD versus non-LD with male versus female, (b) LD versus non-LD with Black versus White, and (c) LD versus non-LD with FRPL status. Differences were found for intercept, slope, and curvature. All six possible comparisons were significant for intercept only, although the biggest difference was between non-LD male and LD female. In conclusion, growth for LD or non-LD students differed markedly, depending on whether the student was Black versus White, or receiving FRPL or not. Furthermore, these differences were more dramatic than indicated by partial effects.
Not only are subgroups of students with disabilities being studied more closely but also increasing research is concurrently appearing on the movement of students with disabilities (or with specific disabilities) in and out of these classifications. For example, Schulte and Stevens (2015) classified students with disabilities into three partially overlapping groups according to their participation in special education from third to seventh grades. The groups varied on the basis of whether they received special education services (a) in the first year of this interval (Grade 3), regardless of their status in subsequent grades; (b) anytime between Grades 3 and 7; or (c) throughout the entire interval. Using multilevel growth models (Raudenbush & Bryk, 2002), they found similar growth for students who were identified on the basis of their third-grade special education status (Group “a” above) or identified on the basis of whether they had ever been placed in special education during the grade interval (Group “b” above). In contrast, students in special education during the entire interval (Group “c” above) had the lowest intercept and showed the lowest rate of growth. The effect was that “shifting classifications complicate the interpretation of longitudinal studies of achievement growth for students with disabilities” (p. 372).
These findings have important implications for documenting the effects of special education programs, though the granularity and stability of these populations also need to be considered. We focus on students with SLDs because it is the largest category of students being served in special education. In the most recent Annual Report to Congress (U.S. Department of Education, 2015b), the total number of students aged 6 through 21 who were served under the Individuals With Disabilities Education Act (IDEA), Part B, was nearly 6 million, representing 8.5% of the resident population in this age band (note these data were from 2013). Of this group, nearly 40% were students with SLDs, though the identification rates have been decreasing over the past decade, as noted earlier.
Summary and Purpose: Reporting Status and Performance
In summary, we know the percentages of students with disabilities being served in special education programs and their movement into and out of special education, and some sense of outcomes for both groups. Furthermore, the two studies by Stevens and Schulte (2017) and Schulte and Stevens (2015) represent advancements in the research on students with disabilities. In the former study, impact is in relation to combinations of student characteristics and in the latter study, impact is reported in association with changes in special education services.
Our current study extends the work of Stevens et al. (2015) by analyzing a different state test (Oregon rather than North Carolina), using three different cohorts, tracking annual change in SLD status and receipt of special education services over three grades, and applying two different models of growth. We also increase the data set beyond the school effect data set used earlier, which was constrained to allow comparison of eight growth models (percent proficient, average gain score, transition matrix, school growth percentiles, value added models, and three different multilevel linear growth models) and, therefore, resulted in removal of 45% of the student records. Specifically, we address the following three questions:
First, we expected results similar to those reported by Schulte and Stevens (2015) on identification of an SLD with varying amounts of time receiving special education services. Second, given that this state used RTI in its identification system and that students were in middle school, we expected only modest changes in the SLD services they received. Most students would either be always or never SLDs and always or never receive special education services. Some students, however, might either be identified and begin receiving special education service in Grade 6 and then later have no such services (in Grades 7 and/or 8) or not be identified and receive services in the early grade(s) and then begin later (in Grades 7 and/or 8). Very few students would change in the middle grade and, indeed, we found only 104 such students. We have labeled students who received services for only one or two grades as sometimes SLD with partial service (for only 1 or 2 years).
We expected differential changes in proficiency over the 3-year period for different student subgroups, based on the results reported by Stevens et al. (2015) on subgroups (SWD vs. SWoD) and Wei et al. (2013) for students with SLDs. However, these changes would either be muted, given the insensitivity of transition matrices (Tindal et al., 2017), or be related to the amount of services provided. Students who were sometimes SLD and received services in only some of the three grades (partial) would show higher proficiency levels than those always labeled SLD and received services in all three grades (always), in part, reflecting either a different population or the responsiveness of the special education supports.
The growth results reported by Stevens and Schulte (2017) led us to expect that students who never receive services would show the highest initial level of performance and the highest rate of growth. More important, however, when this group is used as the reference, students who were sometimes SLDs and received special education services in only one or two grades (partial) would perform higher initially and grow at a steeper rate than students who were always SLDs and received services in all three grades.
Method
The study is based on an extant data set from a larger multistate longitudinal project, using three cohorts of students during a 5-year interval. In this section, we describe the student sample, the state test, and the models used to operationalize accountability.
Student Sample
The original sample for this study included three cohorts of students progressing from Grades 6 to 8 from 2007–2008 to 2009–2010, from 2008–2009 to 2010–2011, and from 2009–2010 to 2011–2012. We refer to each of these cohorts according to the year they completed Grade 6: Cohort 08, Cohort 09, and Cohort 10, respectively. Students were included if their scores in any of the three grades on the general statewide mathematics assessment, the Oregon Assessment of Knowledge and Skills (OAKS), contributed to Adequate Yearly Progress school performance calculations. Very few students were missing test scores, ranging from 0.1% to 0.6% of all test records by cohort and grade. Because we utilized statewide reporting data, no demographic data were missing. However, demographic data did vary slightly across the years (e.g., FRPL eligibility). In these cases, the “wandering” demographics were collapsed to the median demographic. In cases when a student had two demographic records and only two test scores (with the third missing), the specific demographic was randomly assigned between the two. This produced a stable demographic profile that was non–time varying. The final sample included approximately 42,860, 42,332, and 42,922 students across the three cohorts, respectively, for a total of 128,114 students. Means and standard deviations (SDs) by grade, cohort, and SLD status (collapsed to always, sometimes, or never classified as SLDs) are reported in Table 1.
Table 1. Performance on State Test for Three Groups of SLD/Special Education Service and Across Three Grades and Three Cohorts: Count, Mean, and Standard Deviation.
The Oregon Department of Education (ODE) had emphasized RTI as the primary model for identification of students with SLDs. Annually, funds had been directed to a school district in the state to lead training and provide support for other districts in adopting and implementing this system. In general, a classification of SLDs was made after screening indicated low performance with subsequent lack of progress over a period of time in the presence of increasingly intense (tiers of) interventions. As students moved into successive grades, this RTI system was applied to continue the SLD classification or reclassify the student as not having an SLD. Thus, we were able to organize our data file as always, sometimes, and never with an SLD during the 3-year interval of the study. Students who were always with SLDs received special education in all three grades, students who were sometimes with SLDs received partial special education services, and students who were never with SLDs never received special education services. Note that students in the sometimes services category included both students who had previously been labeled with an SLD and then later were reclassified without it, as well as students who had not previously been classified with an SLD but then later were classified with this label. This grouping was done independently for each of the three cohorts, which represented unique student populations who participated in all 3 years of the study.
State Test Used to Show Growth
The OAKS is a summative, computer adaptive assessment based on the Oregon Content Standards (ODE, 2007). Alignment of items to content standards is specific to grade levels and subject areas, with technical adequacy reported from research conducted using 2008–2009 and 2009–2010 mathematics test data (ODE, 2012b). Teachers are expected to administer the test under standard administration conditions (ODE, 2012a, 2012c). These technical reports present technical adequacy information for the OAKS mathematics tests, including test information curves, suggesting adequate reliability across a wide range of ability levels within grades and subgroups (i.e., ethnicity, English learners [ELs], and students with disabilities). Approximately 90% of the sample had a standard error of measurement of three RIT points. This standard error increased for students in the tails of the distribution, particularly for students in the first and 99th percentiles. Concurrent validity evidence is sufficient, with correlations ranging from .74 to .80 for the California Achievement Test, and .76 to .85 for the Iowa Test of Basic Skills (ODE, 2007). Information on the development, operational procedures, and technical properties of the test is publicly available at http://www.ode.state.or.us/search/page/?id=1302. Annual updates are available, describing outcomes and changes to the assessment system.
A vertical scale score (referred to as an RIT score by ODE) has been developed using a Rasch model. This scaled score has been centered on 200 points in Grade 3. Using a bookmarking procedure, RIT scores have been converted to proficiency categories in which content experts and special educators reviewed items that were sorted from the easiest to the most difficult and identified those items that shifted to a different category. This standard setting process directed panelists to first locate the item distinguishing meets from nearly meets, then identify the item that distinguished meets from exceeds, and finally locate the item distinguishing nearly meets from far below. These decisions were made in three rounds: (a) independently, (b) in small groups in which panelists compared their category breaks with each other, and (c) in small groups with impact data available to allow them to adjust their decisions (ODE, 2008).
In the current study, outcome measures are reported as the percentage of students in each proficiency category derived from the RIT scale scores and as the RIT scale scores themselves. Two types of proficiency categories are reported using (a) a 5-point scale (1 = very low, 2 = low, 3 = nearly meets, 4 = meets, and 5 = exceeds) and (b) a 2-point scale, with does not meet representing Categories 1 to 3 and meets representing Categories 4 and 5. We report on the changes across the 5-point proficiency categories.
We evaluated the pattern of three student characteristics across the three grades: SLD, FRPL, and EL. As noted earlier, we grouped students for each cohort into those with an SLD and receiving special education services during the entire three grades (always), those whose classification changed during the three grades (sometimes) and received partial special education services over the three grades, and those who were never classified with an SLD and never received special education services.
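The grouping rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the input format (one SLD indicator per grade), and the student records are hypothetical and are not taken from the study's data file.

```python
def classify_sld_pattern(flags):
    """Collapse per-grade SLD indicators for Grades 6, 7, and 8 into one label.

    `flags` is a sequence of three truthy/falsy values (hypothetical input
    format): whether the student was classified with an SLD in each grade.
    """
    if len(flags) != 3:
        raise ValueError("expected one SLD indicator per grade (6, 7, 8)")
    flags = [bool(f) for f in flags]
    if all(flags):
        return "always"      # SLD classification in all three grades
    if not any(flags):
        return "never"       # no SLD classification in any grade
    return "sometimes"       # classification changed at least once


# Hypothetical student records: id -> SLD indicator in Grades 6, 7, 8.
records = {
    "s1": (1, 1, 1),
    "s2": (1, 0, 0),   # exited after Grade 6
    "s3": (0, 1, 1),   # entered in Grade 7
    "s4": (0, 0, 0),
}
groups = {sid: classify_sld_pattern(f) for sid, f in records.items()}
print(groups)
```

Note that both entry and exit patterns fall into the same "sometimes" label, which mirrors how the sometimes category combines students who gained and students who lost the classification.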
Performance Using Transition Tables and Multilevel Growth
We organized student outcomes to document transition across proficiency categories as well as longitudinal growth over three grades using multilevel models (both an unconditional and a conditional model) that depict the amount of growth for subgroups of SLD students and cohorts.
Transition tables
These tables refer to the cross-tabulation of proficiency (counts and percentages) across two successive grades. The axis on the left represents the five proficiency categories for the later grade, and successive columns represent proficiency categories from the earlier grade. The intersecting cell is the transition, which we report as percentages.
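A minimal sketch of such a cross-tabulation follows. It assumes each cell is expressed as a percentage of all students in the group (the text does not state the denominator), and the function name and the example pairs are hypothetical.

```python
from collections import Counter

CATEGORIES = [1, 2, 3, 4, 5]  # 1 = very low ... 5 = exceeds


def transition_table(pairs, categories=CATEGORIES):
    """Cross-tabulate proficiency categories across two successive grades.

    `pairs` holds (earlier_grade_category, later_grade_category) per student.
    Returns table[later][earlier] = percent of all students in that cell
    (assumption: the denominator is the whole group, not the column).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no students to tabulate")
    counts = Counter(pairs)
    total = len(pairs)
    return {
        later: {
            earlier: 100.0 * counts[(earlier, later)] / total
            for earlier in categories
        }
        for later in categories
    }


# Hypothetical (Grade 7, Grade 8) categories for four students.
example = [(2, 2), (2, 2), (2, 3), (4, 4)]
table = transition_table(example)
for later in CATEGORIES:  # rows: later grade; columns: earlier grade
    print(later, [table[later][earlier] for earlier in CATEGORIES])
```

Rows index the later grade and columns the earlier grade, matching the layout described above.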
Multilevel modeling
Multilevel models were fit to the data with time (school year) nested within students. An unconditional linear growth model was fit first, followed by a nonlinear model with time transformed according to the natural logarithm. As discussed in the “Results” section, the nonlinear model displayed the better fit to the data, and was, thus, retained. We then fit a covariate-adjusted model, with students’ cohort, race/ethnicity, sex, FRPL status, and limited English proficiency (LEP) status entered as student-level variables predicting students’ initial achievement and rate of growth. Because these variables were entered at the student level, any changes across time were collapsed to the most common value (i.e., the mode). Finally, the full model was fit, with students’ SLD status (always, sometimes, and never) added as a student-level predictor of students’ initial achievement and rate of growth. The full model was, therefore, specified as
where
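In assumed notation (the symbols here are illustrative and are not taken from the original), a two-level specification consistent with the description above can be sketched as follows. The wave coding of 1, 2, and 3 for Grades 6, 7, and 8 is inferred from the log(2) and log(3) terms used when the gains are reported.

```latex
\begin{aligned}
\text{Level 1 (time within student):}\quad
  Y_{ti} &= \pi_{0i} + \pi_{1i}\,\ln(\mathrm{wave}_{ti}) + e_{ti} \\
\text{Level 2 (student):}\quad
  \pi_{0i} &= \beta_{00} + \sum_{q=1}^{Q} \beta_{0q} X_{qi} + r_{0i} \\
  \pi_{1i} &= \beta_{10} + \sum_{q=1}^{Q} \beta_{1q} X_{qi} + r_{1i}
\end{aligned}
```

Here $Y_{ti}$ would be the RIT score of student $i$ at wave $t$; $\ln(\mathrm{wave}_{ti})$ equals 0 in Grade 6, so $\pi_{0i}$ represents initial achievement and $\pi_{1i}$ the rate of growth on the log-time scale; $X_{qi}$ are the student-level predictors (cohort, race/ethnicity, sex, FRPL, LEP, and SLD status indicators); $r_{0i}$ and $r_{1i}$ are student-level random effects that are allowed to correlate; and $e_{ti}$ is the within-student residual.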
Results
We summarize the results from the three main questions that focus on (a) changes in SLD status and receipt of special education services over three grades (6–8) and three cohorts as well as their performance on a state test, (b) movement across five proficiency categories for the three groups of students, and (c) multilevel models that condition growth as a function of SLD status and special education service for the three cohorts.
The results for the first research question are displayed in Table 1 for all possible patterns of SLD classification and special education services across the three grades and three cohorts: the total number of students represented in each pattern, as well as the corresponding performance on the state test (M and SD for the RIT score) for each grade. For all SLD groups, the differences in scaled scores were greater between Grades 6 and 7 than between Grades 7 and 8. The majority of students with a classification were always SLDs with special education service across all three grades (71%). However, more than a quarter of students were sometimes classified as SLDs and, therefore, received special education services differently at some point across the three grades. Two dominant patterns were found: (a) students with an SLD in Grade 6 receiving special education services who were later reclassified and had these services removed (in Grade 7 and/or Grade 8), given they were no longer labeled with an SLD; or (b) students without an SLD in Grade 6 who were later identified with an SLD and began receiving special education services (in Grade 7 and/or Grade 8). Very few students had a different classification in the middle grade than in the beginning or ending grade. The scores of students who always had an SLD and, therefore, received special education across all three grades differed by only about 0.25 SD from the scores of those whose status changed somewhere from Grade 6 to Grade 8. This pattern was stable across all three cohorts.
To answer the second question, we developed a transition matrix that displays the change in proficiency category from Grade 7 (horizontal axis) to Grade 8 (vertical axis) in Figure 1 for students who were always SLDs and received special education for all three grades, sometimes SLDs and received special education partially during the three grades, and never SLDs and never received special education services across the three grades. This figure displays a heatmap, with darker colors indicating higher proportions of students in that transition cell. The percent proficient (with scores of 4 or 5) from either group of students always with an SLD and receiving special education services or sometimes with an SLD and receiving special education partially in the three grades was lower than students never classified with SLDs and never receiving special education. The former two groups did not differ in two transitions, whether they reflected low or meets in the proficiency categories, with the two major transitions being from 2 to 2 and 4 to 4.

Figure 1. Percent of students transitioning across proficiency categories from Grades 7 to 8 using a heatmap for three groups of students.
We began our analysis of longitudinal growth over three grades using scale scores (Research Question 3) and the effect of SLD status with receipt of special education services on both intercept and slope. First, we displayed the growth across the three grades for the three SLD service groups using RIT scores to determine whether a linear or nonlinear model was needed. Then, we conducted a multilevel model of growth, first unconditionally with no student characteristics and then included cohort, race, gender, FRPL, English language, and SLD special education services. Finally, we documented effect sizes from grade to grade for SLD special education services.
Figure 2 displays the model implied and observed means from the full model for each SLD service status within each cohort for the demographic reference group (i.e., White female students who were not eligible for FRPL and were not designated as having LEP). As can be seen, the model was highly nonlinear, which was largely accounted for by the log transformation. However, for this particular group of students, the model tended to overestimate the achievement of students who were always identified as SLDs receiving special education services over all three grades and sometimes SLDs, therefore, receiving special education only partially in the interval.

Figure 2. Model-based growth trajectories compared with the raw means by cohort.
We began the multilevel growth model, first by fitting an unconditional linear growth model, and compared this model with a nonlinear model using a log transformation applied to the assessment wave, as described above, which resulted in a substantial reduction in the model AIC (
The addition of the control variables to the model resulted in a substantial reduction in the model AIC (
Table 2 provides the parameter estimates for the final model (see Note 1). Although it includes parameter estimates for all of our control variables, we only summarize our primary variable of interest—shifting SLD status and receipt of special education services over time. Students identified as always SLDs and receiving special services in all three grades had the lowest initial achievement, scoring 10.21 points lower than the reference group, averaged across all demographic characteristics included in the model. However, they were among the fastest growing groups of students, progressing 0.49 points more than the reference group, on average, from Grades 6 to 7 (i.e., log(2) × 0.70), and 0.28 points more, on average, from Grades 7 to 8 (i.e., log(3) × 0.70 − log(2) × 0.70). Students identified as sometimes SLDs with partial receipt of special education services had an initial achievement 8.09 points lower, on average, but also progressed faster than the reference group, with average gains 0.42 points higher from Grades 6 to 7 and 0.25 points higher from Grades 7 to 8.
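The arithmetic behind these grade-to-grade gains can be checked with a short sketch. It assumes waves coded 1, 2, and 3 for Grades 6, 7, and 8 (implied by the log(2) and log(3) terms) and uses the 0.70 slope difference quoted above for the always SLD group; the function name is hypothetical.

```python
import math


def gain_difference(coef, wave_from, wave_to):
    """Extra gain relative to the reference group between two waves.

    With time entered as ln(wave), a slope difference `coef` implies an
    extra gain of coef * (ln(wave_to) - ln(wave_from)).
    """
    return coef * (math.log(wave_to) - math.log(wave_from))


ALWAYS_SLD_SLOPE = 0.70  # slope difference for the always SLD group (from the text)

gain_6_to_7 = gain_difference(ALWAYS_SLD_SLOPE, 1, 2)  # ln(2) * 0.70
gain_7_to_8 = gain_difference(ALWAYS_SLD_SLOPE, 2, 3)  # (ln(3) - ln(2)) * 0.70

print(round(gain_6_to_7, 2))  # 0.49
print(round(gain_7_to_8, 2))  # 0.28
```

Because the log transformation compresses later waves, the same slope difference yields a smaller extra gain from Grades 7 to 8 than from Grades 6 to 7.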
Table 2. Multilevel Model Fixed Effects: Full Model With Cohort, Student Demographics, and SLD Status.
Note. CIs reported by profiling the log-likelihood, which allows for nonuniform distributions. SLD = specific learning disability; CI = confidence interval; FRPL = free and reduced-price lunch; EL = English learner; LEP = limited English proficiency.
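The differential gains summarized for Table 2 follow from the log-transformed time metric. The short Python sketch below reproduces that arithmetic. It assumes that the assessment wave is coded 1, 2, and 3 for Grades 6, 7, and 8 and that the slope differences relative to the reference group are 0.70 (always SLDs) and 0.61 (sometimes SLDs), the values reported in this article; the function and variable names are illustrative only and do not come from the original analysis.

```python
import math

# Slope differences (per unit of log-wave) relative to the never-SLD reference
# group, as reported in the article; the dictionary keys are illustrative names.
SLOPE_DIFF = {"always_sld": 0.70, "sometimes_sld": 0.61}


def differential_gain(slope_diff, wave_from, wave_to):
    """Extra gain over the reference group between two waves (wave 1 = Grade 6)."""
    return slope_diff * (math.log(wave_to) - math.log(wave_from))


for group, b in SLOPE_DIFF.items():
    g67 = differential_gain(b, 1, 2)  # Grade 6 to Grade 7
    g78 = differential_gain(b, 2, 3)  # Grade 7 to Grade 8
    print(f"{group}: Grades 6-7 {g67:.2f}, Grades 7-8 {g78:.2f}")
# always_sld: Grades 6-7 0.49, Grades 7-8 0.28
# sometimes_sld: Grades 6-7 0.42, Grades 7-8 0.25
```

Because log(3) − log(2) is smaller than log(2) − log(1), any positive slope difference yields a smaller differential gain in the second interval than in the first, which is the decelerating pattern described in the text.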
Table 3 displays the random effect estimates from the full model. Students’ initial achievement varied between students with an SD of 8.00 points (95% confidence interval [CI] = [7.96, 8.04]), whereas the average growth varied with an SD of 2.80 (95% CI = [2.73, 2.86]).
Table 3. Multilevel Model Random Effects: Full Model (With Cohort, Student Demographics, and SLD Status).
Note. CIs reported by profiling the log-likelihood, which allows for nonuniform distributions. Students’ initial achievement and rate of growth correlated at –.16 (95% CI = [–0.17, –0.15]). SLD = specific learning disability; CI = confidence interval.
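For readers who prefer variances and covariances to SDs and correlations, the reported random effects can be converted with simple arithmetic. The sketch below uses only the three point estimates reported for Table 3 (SDs of 8.00 and 2.80 and a correlation of –.16); the variable names are illustrative, and the resulting matrix is a restatement of those estimates, not an additional result.

```python
# Random-effect point estimates reported in the article.
sd_intercept = 8.00  # SD of students' initial achievement
sd_slope = 2.80      # SD of students' growth
corr = -0.16         # correlation between initial achievement and growth

# Covariance implied by the correlation and the two SDs.
cov = corr * sd_intercept * sd_slope

# 2 x 2 covariance matrix of the student-level random effects.
cov_matrix = [
    [sd_intercept ** 2, cov],
    [cov, sd_slope ** 2],
]

for row in cov_matrix:
    print([round(value, 3) for value in row])
```

The small negative covariance restates the finding that students who started lower tended to grow slightly faster.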
The average gains are also displayed in Table 4 in terms of effect sizes (ES) by SLD status and receipt of special education services. Generally, the ES findings mirror the findings from the modeling: All three groups show a 1 SD difference from Grade 6 to Grade 7 and overall from Grade 6 to Grade 8. At the same time, the ES for the gain from Grade 7 to Grade 8 is considerably lower (only a quarter to a third of an SD). These findings are similar to those reported in Figure 1, depicting the change in scale score performance as nonlinear, with more growth across the first two grades than across the later two grades (with a deflection in the middle grade).
Table 4. Effect Size Estimates of Gain Scores by Amount of Special Education Services From Grade to Grade.
Note. SLD = specific learning disability.
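The article does not state the formula used for the gain effect sizes in Table 4. As a hedged illustration only, the sketch below uses one common convention, the mean gain divided by the pooled SD of the two grades. The function name and the scale scores are hypothetical and are not drawn from the study data.

```python
from statistics import mean, stdev


def gain_effect_size(scores_earlier, scores_later):
    """Mean gain divided by the pooled SD of the two grades (an assumed convention)."""
    gain = mean(scores_later) - mean(scores_earlier)
    pooled_sd = ((stdev(scores_earlier) ** 2 + stdev(scores_later) ** 2) / 2) ** 0.5
    return gain / pooled_sd


# Hypothetical scale scores for the same five students in two adjacent grades.
grade6 = [210, 214, 218, 222, 226]
grade7 = [217, 220, 225, 228, 235]

print(round(gain_effect_size(grade6, grade7), 2))
```

Other conventions (e.g., dividing by the SD of the earlier grade only, or by the SD of the gain scores) would give different values, so any comparison with Table 4 would require knowing which definition the authors used.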
Discussion
We focused on three questions for students with SLDs receiving special education services for three groups (always, sometimes, and never) across three grades (6–8) and three cohorts: (a) the number of students in the three groups and their performance on a state test in each grade, (b) the amount of change in proficiency over the final two grades for these three groups, and (c) improvement on a scale score across the three grades for these three groups.
In conclusion, most students were either always identified with an SLD and, therefore, received special education services throughout the entire three grades or they were never classified as SLDs and, therefore, never received special education services. A sizable group of students, however, received special education services for only one or two grades: They were identified with an SLD at the beginning of the interval but not later, or not at the beginning of the interval but then later. Both groups of students with SLDs and receiving special education services (always or only sometimes over the grades) were below their general education peers (in both proficiency level and Grade 6 starting performance); however, they showed significant improvement relative to this general education referent, whether the outcome was expressed as transition across proficiency categories or improvement using scale scores.
The two groups of students with SLDs and receiving special education services (whether always or sometimes) differed from each other, but the difference was small in practical terms and not sufficient to lead to different outcomes on the state test. About 20% to 25% of the students from both groups remained in the not proficient category of 2 from Grade 7 to Grade 8, whereas 15% to 20% remained in the proficient category of 4. When the analysis used the state scale score (RIT), students who were always identified with SLDs and received special education services in all three grades started out at Grade 6 about 1.6 RIT points lower than students who were sometimes identified with SLDs and received special education services for only 1 or 2 of the 3 years. The difference between these two groups in growth was only 0.10 RIT points over the three grades.
These findings are consistent with previous NCAASE research (Stevens & Schulte, 2017; Stevens et al., 2015) on growth and gaps: Students with SLDs tend to be lower on state test performance but improve over time. This improvement may be statistically significant, but probably does not lessen the gap between them and those without disabilities.
Student Samples and Classification Issues for SWD and SLDs
Perhaps the most significant finding is that the composition of student subgroups needs to be considered, whether students are being served in special education in general or are identified with an SLD (Stevens & Schulte, 2017; Stevens et al., 2015; Wei et al., 2013). Particularly for students with SLDs (always or sometimes over three grades), identification through an RTI model conflates issues of identification with receipt of special education services. For example, students first identified with an SLD in Grade 7 or 8 may have been positive responders to RTI for a year or two. In contrast, students identified with an SLD in Grade 6 but later classified as not having an SLD in Grade 7 or 8 may represent a different kind of success, in which RTI supports their integration into general education without special services.
Differences from previous research are apparent in the grouping of students as SLDs, as well as their grade/ages and demographics. Our study is most comparable with Schulte and Stevens (2015) in grouping of students (always and sometimes), though they only considered special education status whereas we focused on SLDs. We also considered a reference group (never classified as SLDs in each grade) and did not consider students with SLDs only at the beginning of the time period (referred to as once). Furthermore, the longitudinal research by both Schulte and Stevens (2015) and Stevens et al. (2015) spanned Grades 3 through 7, whereas we studied students in Grades 6 through 8. Finally, all three previous studies (Schulte & Stevens, 2015; Stevens & Schulte, 2017; Stevens et al., 2015) were conducted with a North Carolina data set, reflecting a population considerably different from those in Oregon. These differences were particularly prevalent on characteristics such as race–ethnicity, ELs, and FRPL. And, as specifically noted in this last study by Stevens and Schulte, interactions were found between these characteristics and SLDs, which we did not study.
In all these studies, the percentage of students with disabilities (irrespective of type) was higher than the most recent figures from the Annual Report to Congress (U.S. Department of Education, 2015). In our study, the percentage of SWD was 11% in all grades (6, 7, and 8) across all three cohorts (2008, 2009, and 2010). In the previous studies by the NCAASE researchers, the percentages were 12% at Wave 1 (in Grade 3), 16% ever, and 6% always (Schulte & Stevens, 2015). For Stevens et al. (2015), 14% of the total sample and 12% of the analyzed sample were in special education, and about 6% of both samples were identified with SLDs. Finally, Stevens and Schulte (2017) reported that about 6% of the sample was identified with SLDs. In contrast, nationally in 2013, nearly 6 million students aged 6 through 21 years were served under the IDEA, Part B, representing 8.5% of the resident population in that age range; of these students, 40% were identified with SLDs (U.S. Department of Education, 2015). In the current study, 7% of the students were identified with SLDs in all grades (6, 7, and 8) across all three cohorts (2008, 2009, and 2010).
The identification of SLD needs to be considered when interpreting our findings. When SLD was first prominent in the early and mid-1970s through the 1980s, teacher recommendation appeared to be the dominant identification criterion, and this group was indistinguishable from low-achieving students using a multitude of identification criteria (Ysseldyke et al., 1979). Yet, by the early 2000s, different models for identification were being promulgated, including RTI (Fuchs & Vaughn, 2012; Reschly, 2005), along with complementary considerations such as strengths and weaknesses. As noted by the National Center for Learning Disabilities (2014), special educational services may have reached into general education. Clearly, identifying students is not the isolated issue it once was in the decade of research conducted by Ysseldyke et al. (1979) and Ysseldyke et al. (1982) or Shepard and Smith (1983). In the time period of this research, RTI must be considered as not only an issue of identification but also the provision of special education services through multiple tiers.
Documenting Growth on State Accountability Measures
When the measurement of growth was based on changes in proficiency, our findings were consistent with previous research noting that the majority of students remain in the same category from 1 year to the next. This insensitivity of transition matrices to change is similar to the findings reported by Tindal et al. (2017). In the current study as well as the previous research, transition matrices tend to be proficiency centric: Once students are in a category, they tend to remain in that category. This intransigence of proficiency categories also appears for students with significant disabilities (Tindal & Nese, 2015; Tindal et al., 2016).
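To make the idea of a transition matrix concrete, the following minimal Python sketch tabulates the proportion of students moving from each proficiency category in one grade to each category in the next. The category labels and the eight student records are hypothetical and serve only to show the computation; they are not the study data, and the function name is illustrative.

```python
from collections import Counter


def transition_matrix(levels_before, levels_after, categories):
    """Row-normalized proportions of students moving between proficiency categories."""
    counts = Counter(zip(levels_before, levels_after))
    matrix = {}
    for start in categories:
        row_total = sum(counts[(start, end)] for end in categories)
        matrix[start] = {
            end: (counts[(start, end)] / row_total if row_total else 0.0)
            for end in categories
        }
    return matrix


# Hypothetical proficiency categories for eight students in Grades 7 and 8.
grade7 = [2, 2, 2, 3, 3, 4, 4, 4]
grade8 = [2, 2, 3, 3, 3, 4, 4, 3]

tm = transition_matrix(grade7, grade8, categories=[2, 3, 4])
for start, row in tm.items():
    print(start, {end: round(p, 2) for end, p in row.items()})
```

In this toy example the largest proportion in every row falls on the diagonal, which is the pattern of students remaining in the same category from one year to the next.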
In the current study, we found that the intercepts for students with SLDs (always and sometimes) were significantly lower (by about 10 and 8 scale score points, respectively) than the intercept for students never with SLDs and never receiving special education services. The growth of both SLD groups differed significantly from that of this reference group (0.70 and 0.61, respectively); the CIs for these two growth rates, however, overlapped. Furthermore, this rate of growth is unlikely to close the achievement gap.
Our results on achievement level are consistent with previous studies, but our results on growth rates differ. However, our findings may be a function of differences in the student samples being studied, as well as the measures and growth models being used. For example, Morgan, Farkas, and Wu (2011) reported slower growth, using a multilevel model, for students with SLDs and speech-language disabilities. However, they were using the Early Childhood Longitudinal Study–Kindergarten Cohort (ECLS-K) with kindergarten students who were then studied over a 5-year period. Wei et al. (2013) also used a multilevel model and reported on different disability classifications using the SEELS database of students aged 7 to 17 years. Their main conclusion was that “students in 11 disability categories not only had lower math achievement, but their math achievement also grew more slowly than that of their peers in the general population in elementary school” (p. 162). Stevens and Schulte (2017) reported significantly lower initial levels of performance and lower growth rates for LD, while also documenting the effects of other demographic characteristics. Finally, Schulte and Stevens (2015) reported significantly lower initial levels of achievement and the lowest growth rates on the North Carolina state test for students always in special education from Grades 3 to 7 (a 5-year span).
Our findings on closing the gap using effect sizes were a bit more positive than those of Schulte and Stevens (2015), who reported that “a substantial gap between the achievement of SWDs and non-SWDs was observed at each grade level” (p. 14). When we computed effect sizes (see Table 4), we confirmed that, for both groups of students with SLDs (always and sometimes), the ES was greater than that for students who were never identified with SLDs and never received special services. According to Schulte and Stevens (2015), “the gap widened more across grades when SWD subgroup membership was allowed to vary by year as in NCLB” (p. 14). However, we found that the ES gap for both SLD groups narrowed from Grade 6 to Grade 7 and overall, from Grade 7 to Grade 8.
Not directly addressed in the previous research is the relation between identification as a student with a disability (or SLD) and the amount of special education services received. Because students in this study were identified with an SLD using an RTI model, this consideration becomes important. Students can be identified earlier or later (and receive special services or have them removed) in a more fluid manner that integrates special and general education. For example, if Tier 2 interventions are not working, services are intensified (in time, grouping, explicitness, etc.), and the student is identified with an SLD. At the same time, if services are effective in Tier 3, they are more likely to be deployed in a less intensive manner in Tier 2, and the student is no longer identified with a disability. This approach presumes a functional rather than a medical model of disability. However, given that this study was in mathematics and with middle school teachers, the RTI model needs to also consider student class schedules, different teachers from grade to grade, subject matter differences, and so forth, all of which complicate the RTI model used in identification.
Limitations
A few limitations in the manner in which we conducted our study need to be considered and temper our interpretations. Specifically, we need to consider issues in the identification of an SLD, the test being used, and the final outcome for growth models (particularly for transition matrices).
Although we found only slight differences between students who were always identified as SLDs and those who were sometimes identified as SLDs, the system used for SLD identification has changed considerably in the past 10 years. RTI has become the dominant model, and it differs considerably from the ability–achievement discrepancy model, which was likely more prominent in the decade before the interval in which the data for our study were collected (2007–2008 to 2011–2012). As noted earlier, this identification system may be highly related to provision of services.
Another limitation is that the results are based on only a single state test that is no longer administered. An important justification for studying a test no longer being deployed, however, is that the findings from this study can be better interpreted with this test in the context of the other research being conducted with the NCAASE. Furthermore, the results can serve as a reference, should the same analyses be conducted with different state tests or with interim assessments. Now that states have adopted a common test as part of a consortium (e.g., Partnership for Assessment of Readiness for College and Careers [PARCC] and Smarter Balanced), future research should provide more solid outcomes for interpreting important influences on growth, whether it is the population being studied, the measure used to test students, or the model deployed for documenting level of achievement and growth.
Finally, our results for transition matrices rest on a complex logic chain that depends not only on the reliability of the test (in this study, OAKS) but also on the process for setting achievement standards (Cizek & Bunch, 2007). In this state, the benchmarking method was deployed with three phases that included marking items at three cut points: (a) between meets and exceeds, (b) between nearly meets and meets, and (c) between low and very low. This process was first done independently and then through consensus, following which an impact analysis was conducted so raters could adjust their cut points. Of course, in this process, judgments are made that may be problematic. In the end, a scale is reduced from a wide range to one that is only a few points, so it is not surprising that transition matrices are insensitive. Yet, using more sensitive scales reveals lower initial performance and statistically significant but minor improvement.
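The loss of sensitivity that comes from reducing a scale score to a few achievement levels can be illustrated with a small Python sketch. The three cut scores below are hypothetical placeholders (they are not the OAKS cut scores), and the function name is illustrative; the point is only that a real gain on the scale can leave the reported category unchanged.

```python
from bisect import bisect_right

# Hypothetical cut scores separating four achievement levels (NOT the OAKS cuts).
CUTS = [220, 230, 240]


def level(scale_score):
    """Map a scale score to an ordinal level (1 to 4) using the cut scores."""
    return bisect_right(CUTS, scale_score) + 1


# A hypothetical student gains 8 scale score points but stays in the same level.
before, after = 221, 229
print(level(before), level(after), after - before)  # prints "2 2 8"
```

A transition matrix built from such levels would record this student as unchanged, whereas a growth model fit to the scale scores would register the 8-point gain.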
Implications
Conclusions about school accountability should consider the specific student subgroups being reported, the measure being used, and the growth model being deployed. As noted by Schulte and Stevens (2015), “alternate methods of identifying the SWD subgroup for NCLB or other public reporting of disaggregated test results should be considered that include tracking students after they leave special education” (p. 15). Particularly with time-varying covariates (English language, disability status, and probably, most important, FRPL as the proxy for poverty), analytical models need to reflect the changing nature of student characteristics. And, in this time series, policy changes and their implications need to be documented to punctuate and condition possible explanations.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded in part by a Cooperative Service Agreement from the Institute of Education Sciences (IES) establishing the National Center on Assessment and Accountability for Special Education (NCAASE; PR/Award Number R324C110004). The findings and conclusions do not necessarily represent the views or opinions of the U.S. Department of Education.
