Abstract
We examined whether U.S. schools systemically discriminate when suspending or otherwise disciplining students with disabilities (SWD). Eighteen studies met inclusion criteria. We coded 147 available risk estimates from these 18 studies. Of four studies including individual-level controls for infraction reasons, over half of the available estimates (i.e., 14 of 24, or 58%) failed to indicate that SWD were more likely to be suspended than otherwise similar students without disabilities. Of the seven available estimates adjusted for the strong confound of individual-level behavior, most (i.e., five of seven, or 71%) failed to indicate that SWD were more likely to be suspended. The other two estimates indicating SWD were more likely to be suspended were from one study. We also examined whether SWD were less likely to be suspended than otherwise similar students without disabilities. There was no strong evidence of this. Empirical evidence regarding whether U.S. schools discriminate when disciplining SWD is currently inconclusive.
Students with disabilities (SWD) have been reported to be disproportionately suspended from U.S. schools (Achilles, McLaughlin, & Croninger, 2007; Losen & Gillespie, 2012; U.S. Department of Education Office for Civil Rights [OCR], 2014), including SWD who are of color (Achilles et al., 2007; Krezmien, Leone, & Achilles, 2006; Losen & Gillespie, 2012; U.S. Government Accountability Office [GAO], 2018). These reported disparities have led to suggestions that U.S. schools use discriminatory disciplinary practices (Kim, Losen, & Hewitt, 2010; Losen & Gillespie, 2012). Ensuring that SWD are not being unfairly suspended is important because of suspension’s associations with lower academic achievement, school dropout, substance abuse, juvenile delinquency, and adult criminality (Katsiyannis, Thompson, Barrett, & Kingree, 2013; Mittleman, 2018; Morris & Perry, 2016; Mowen & Brent, 2016; Noltemeyer, Ward, & Mcloughlin, 2015). Being identified as disabled has been hypothesized to increase the risk for being suspended and so of entering the metaphorical school-to-prison pipeline (Behnken et al., 2014), although other work finds this not to be the case (Wright, Morgan, Coyne, Beaver, & Barnes, 2014).
U.S. schools may legally suspend SWD. However, the Individuals With Disabilities Education Act (IDEA) provides SWD with greater legal protections than are afforded students without disabilities (Rothstein & Johnson, 2014; Ryan, Katsiyannis, Peterson, & Chmelar, 2007). U.S. schools can suspend SWD as they would students without disabilities for a total of 10 or fewer school days per year. However, for suspensions exceeding 10 total school days, SWD are entitled to receive (a) continued access to special education services, (b) manifestation determination reviews to assess whether the suspension was related to their disabilities or to failures by their schools to properly implement their individualized education programs (IEPs), and (c) functional behavior assessments and behavior implementation plans (Rothstein & Johnson, 2014; Ryan et al., 2007). The IDEA also requires local education agencies to report whether there is significant disproportionality in the extent to which SWD who are of color are suspended or otherwise disciplined. The Equity in IDEA Rule (U.S. Department of Education [DoE], 2016) would, if implemented, further expand these requirements. The rule’s implementation was delayed to allow for further scientific study including whether the disciplinary disparities result from systemic bias or instead from alternative explanatory factors (DoE, 2018). A federal district court subsequently ordered implementation of the Equity in IDEA Rule to continue based on procedural grounds. The DoE (2019) is implementing the Rule while also appealing the court’s decision.
Using a Differential Treatment Standard to Assess for the Use of Discriminatory Disciplinary Practices
Because U.S. schools may legally suspend SWD, evidence of disparities in suspension between students with and without disabilities is insufficient to infer that the disparities result from the systemic use of discriminatory disciplinary practices. Instead, it may be that the disparities result from differences in the extent to which students with and without disabilities engage in disruptive or other types of behaviors that might reasonably result in suspension (e.g., fighting, threatening a teacher, bringing a weapon to school). A wide range of disability conditions, including the conditions for which most SWD in the United States are identified (e.g., learning disabilities, speech or language impairments, attention deficit hyperactivity disorders), are associated with impairments in impulse control and self-regulation, attentional difficulties, and more frequent externalizing problem behaviors (DuPaul, Gormley, & Laracy, 2013; Larson, Russ, Kahn, & Halfon, 2011; Peyre et al., 2016) and so behaviors that might make suspension more likely. SWD may also be more likely to engage in disruptive behaviors as a result of experiencing academic difficulties (Goldston et al., 2007; Morgan, Farkas, & Wu, 2009). Academic difficulties increase the risk for self-reported feelings of anger and peer rejection (Morgan, Farkas, & Wu, 2012), teacher-rated externalizing problem behaviors (Morgan, Farkas, Tufis, & Sperling, 2008), and teacher-rated attention deficit hyperactivity and conduct disorder symptoms (Morgan et al., 2016).
Contrasts between students who are similarly situated including in regard to factors materially relevant to being suspended such as engaging in disruptive behavior would help establish whether and to what extent U.S. schools suspend or otherwise discipline SWD in ways that discriminate based on disability status (OCR, 2016; National Research Council [NRC], 2004). The OCR (2016) states that “students are similarly situated when they are comparable (even if not identical) in all material respects” (p. 8). When evaluating for differential treatment in the use of disciplinary practices the most materially relevant factor regarding whether students are directly comparable is their behavior in school (Huang, 2018; Wright et al., 2014), although additional factors may also help ensure that students are similarly situated. These additional factors might include age, grade, and other indicators of school performance including academic achievement (OCR, 2016). This standard of differentially treating similarly behaving students has been used to test for racial discrimination in suspension (e.g., Kinsler, 2011; Wright et al., 2014). For example, Skiba et al. (2011) found that among students engaging in minor misbehaviors, Black and Hispanic elementary students were more likely to be suspended. Among those engaging in disruptive behaviors, students of color were also more likely to be suspended or expelled. Skiba et al. (2014) found that conditional on individual-level behavior and other covariates as well as aggregate-level factors, Black students were more likely than White students to be expelled. Contrasting similarly behaving students accounts for the strong confound of differential involvement in behaviors that might reasonably result in suspension and so provides stronger evidence of the use of discriminatory disciplinary practices (Huang, 2018; Wright et al., 2014). 
The OCR (2018) uses the differential treatment standard when deciding whether to conduct civil rights investigations of whether U.S. schools are using discriminatory disciplinary practices.
Contrasting similarly situated students with and without disabilities would also allow for an evaluation of whether SWD are less likely to be suspended or, if suspended, to receive suspensions of shorter durations. SWD might be less likely than students without disabilities to be suspended or to receive shorter suspensions because of federal monitoring and funding reallocation mandates in districts reporting significant disproportionality in discipline as well as IDEA’s legal protections pertaining specifically to the total days of suspensions that SWD receive. In analyses using school fixed effects and conditioning on prior offenses, achievement, and economic disadvantage, Kinsler (2011) found that SWD received shorter suspensions than students without disabilities. Whether SWD including those who are of color are less likely to be suspended than otherwise similar students without disabilities has yet to be systematically evaluated.
Findings from a best-evidence synthesis of whether U.S. schools are more likely to suspend or otherwise discipline SWD than otherwise similar students without disabilities would help establish the strength of the empirical evidence base of whether schools systemically use disciplinary practices that discriminate against SWD. Such findings would also inform federal civil rights legislation and regulation. A best-evidence synthesis is particularly timely given the DoE’s (2018) request for further scientific study regarding whether significant disproportionality in discipline results from systemic bias or instead from factors such as differential involvement in behaviors likely to result in exclusionary discipline.
Purpose of This Synthesis
We synthesized the best-available empirical evidence to evaluate whether and to what extent U.S. schools use discriminatory practices when suspending or otherwise disciplining SWD. We were particularly interested in establishing whether the well-documented greater risk of suspension of SWD is attributable to disability status and so possibly due to the use of discriminatory disciplinary practices by schools or instead to alternative explanatory factors including differential involvement in behaviors likely to result in suspension (Huang, 2018). To better inform federal legislation and policy making, we also examined the extent to which the available studies analyzed nationally representative samples and so reported generalizable results. We examined the following three sets of research questions:
What is the strength of the empirical evidence that SWD are more likely than otherwise similar students without disabilities to be suspended, particularly as increasingly strong confounds are accounted for including individual-level behavior? Is there consistent evidence that among similarly behaving students, U.S. schools differentially suspend SWD? To better inform federal legislation and policy making, we also examined to what extent the available evidence is based on nationally representative samples.
What is the strength of the empirical evidence that SWD who are of color are more likely to be suspended than otherwise similar SWD who are White?
What is the strength of the empirical evidence that SWD are instead less likely to be suspended?
Method
We searched for empirical studies published prior to June 2018 in four electronic databases (ERIC, Web of Science, Pubmed, and PsycInfo) as well as in Google Scholar. We used the following search terms: suspension, school discipline, expulsion, and special education or disability. We also included the terms race or racial/ethnic minority in our database search. Two doctoral students independently conducted two rounds of initial search and identified a total of 112 nonduplicated studies. Supplemental Figure 1, available online, displays a PRISMA diagram of the search process. Following this initial search, a postdoctoral scholar independently completed a third search as an additional fidelity check. In this third round, we reviewed the reference lists of studies found via search terms as well as in prior reviews (e.g., Fenning & Rose, 2007; Gregory, Skiba, & Noguera, 2010) to identify other eligible studies. This identified four additional studies from the four databases and seven additional studies through reference lists, yielding a total of 123 empirical studies for initial inclusion consideration. In the second stage, we excluded studies examining disciplinary actions not including suspension risk or using only samples of students without disabilities. This removed 73 studies, leaving 50 studies reporting on some type of suspension risk for SWD.
The two doctoral students and the postdoctoral scholar independently judged the eligibility of the 50 studies using six-part inclusion criteria. We calculated an intercoder agreement rate (i.e., the number of included and excluded studies that all members agreed upon divided by the total number of studies) of 92%. An intercoder meeting led to the unanimous inclusion of 18 studies at the final stage of our best-evidence synthesis. Fourteen of the 18 studies examined risk factors for suspension specifically. Four additional studies examined risk factors for exclusionary discipline more generally (e.g., suspension but also expulsion) and so are analyzed separately in the online supplemental material.
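The agreement rate defined above (unanimous decisions divided by the total number of studies) can be sketched as follows. The function name and the coder votes are hypothetical and purely illustrative; they are not the coding data from this synthesis.

```python
def intercoder_agreement(decisions):
    """Share of studies on which every coder made the same include/exclude call.

    `decisions` maps a study label to a list of booleans, one per coder
    (True = include, False = exclude).
    """
    if not decisions:
        raise ValueError("no studies to compare")
    # A study counts as agreed upon only when all coders gave the same vote.
    unanimous = sum(1 for votes in decisions.values() if len(set(votes)) == 1)
    return unanimous / len(decisions)


# Hypothetical votes from three coders on five studies (illustrative only).
example_votes = {
    "study_a": [True, True, True],
    "study_b": [False, False, False],
    "study_c": [True, False, True],  # one coder disagrees
    "study_d": [False, False, False],
    "study_e": [True, True, True],
}
agreement = intercoder_agreement(example_votes)
print(f"{agreement:.0%}")
```

Under this definition, a 92% rate across 50 studies corresponds to unanimous decisions on 46 of them.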
Inclusion Criteria
We applied the following six-part inclusion criteria to the 50 studies identified across the first two rounds of search. First, did the study use a quantitative empirical design? Studies were excluded if they either qualitatively or theoretically examined suspension of SWD (e.g., Brobbey, 2018). Second, did the study report numerical estimates in the form of regression coefficients corresponding to suspension of SWD? This criterion excluded studies that provided no regression estimates in their results (e.g., Garnett, 2014; Losen & Gillespie, 2012; Miller & Meyers, 2010; Morrison & D’Incau, 1997; Skiba, Poloni-Staudinger, Gallini, Simons, & Feggins-Azziz, 2006; Whitford, Katsiyannis, & Counts, 2016). Third, we included peer-reviewed studies, working papers, and dissertations in our synthesis. Doing so helped limit potential publication bias (Joober, Schmitz, Annable, & Boksa, 2012; Pigott, Valentine, Polanin, Williams, & Canada, 2013). Fourth, did the study analyze a sample of students attending U.S. schools between kindergarten and 12th grade? This criterion excluded two studies conducted in the United Kingdom (i.e., Ford et al., 2018; Paget et al., 2018). Fifth, did the study’s analyses include at least one covariate when estimating the risk attributable to disability status? We included only studies that used at least one covariate to synthesize estimates that were less ambiguously attributable to disability status. This fifth criterion excluded one study (i.e., Losen, Hodson, Ee, & Martinez, 2014). Sixth, did the study report risk estimates based on a reference group of students without disabilities? This last criterion excluded five studies that estimated the risk of suspension but only between students with specific disabilities (e.g., students with learning vs. students with behavioral disabilities; Achilles et al., 2007; Bowman-Perrott et al., 2011; Duran, Zhou, Frew, Kwok, & Benz, 2013; Goran & Gage, 2011; Sullivan, Van Norman, & Klingbeil, 2014).
We excluded these studies because they did not report estimates pertaining to whether SWD were more or less likely to be suspended than students without disabilities. A total of 18 studies were included in the best-evidence synthesis after applying the six-part criteria.
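As a minimal sketch, the six-part screen can be thought of as a conjunction of six yes/no checks per study. The record type, field names, and example studies below are hypothetical; they only illustrate the logic and do not reproduce our coding sheet.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ScreenedStudy:
    label: str                  # hypothetical identifier
    quantitative_design: bool   # 1. quantitative empirical design
    reports_regression: bool    # 2. regression coefficients for suspension of SWD
    eligible_report_type: bool  # 3. peer-reviewed study, working paper, or dissertation
    us_k12_sample: bool         # 4. students attending U.S. schools, K-12
    uses_covariate: bool        # 5. at least one covariate in the risk estimate
    non_swd_reference: bool     # 6. reference group of students without disabilities


def failed_criteria(study):
    """Return the names of the criteria a study fails (empty list = include)."""
    return [
        f.name
        for f in fields(study)
        if f.name != "label" and not getattr(study, f.name)
    ]


# Hypothetical screening records (illustrative only).
screened = [
    ScreenedStudy("hypothetical_1", True, True, True, True, True, True),
    ScreenedStudy("hypothetical_2", True, True, True, False, True, True),
    ScreenedStudy("hypothetical_3", True, True, True, True, False, False),
]
included = [s.label for s in screened if not failed_criteria(s)]
print(included)
```

Returning the list of failed criteria, rather than a single yes/no flag, keeps a record of why each study was excluded.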
Best-Evidence Methodology
A best-evidence synthesis examines studies with the strongest internal and external validity (Slavin, 1986). Our minimal criterion for study inclusion was use of at least one covariate in analyses reporting on the risk of suspension associated with disability status. We then examined how the estimates reported in these studies fluctuated as both aggregate- and individual-level units of analysis and covariates were included. We were particularly interested in results from studies that used regression or other methods to approximate contrasts between students who were similarly situated including in regard to behavior and so the factor most materially relevant to being suspended. We also examined to what extent the results changed with the inclusion of covariates related to either the types of infractions committed by students or their assessed behavior. We considered estimates that controlled for directly assessed and individual-level behavior (e.g., self-reported or teacher or parent ratings of an individual student’s behavior but not school administrator surveys of the percentage of the school’s students who had engaged in fights) as the best available. We also examined the extent to which these estimates were based on analyses of nationally representative samples and so generalizable to the U.S. school-aged population.
Analyses
Supplementary Table 1, available online, reports on descriptive characteristics of the 18 studies. Tables 1 and 2 display our coding results. Table 1 displays estimates of whether SWD were more likely to be suspended conditional on an increasingly rigorous set of controls. Table 2 displays estimates of whether SWD were less likely to be suspended conditional on an increasingly rigorous set of controls. For each study, we calculated the ratio of significant regression coefficients finding that SWD were more or less likely to be suspended relative to the total number of regression coefficients reported (see Tables 1 and 2, respectively). For example, K. A. Anderson, Howard, and Graham (2007) reported three estimates of the likelihood of suspension for SWD. Only one of these indicated that SWD were significantly more likely to be suspended than their peers. None of the estimates in K. A. Anderson et al. found that SWD were significantly less likely to be suspended than their peers. Therefore, estimates for K. A. Anderson et al. are reported as “1/3” in Table 1 and as “0/3” in Table 2, respectively.
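The per-study tally can be sketched as below. The three-way coding of each estimate ("more", "less", or "ns" for not statistically significant) is an assumed encoding chosen for illustration; the example list reflects only what is stated above for K. A. Anderson et al. (2007), namely three estimates, one significantly indicating greater risk and none significantly indicating lower risk.

```python
def tally(estimates, direction):
    """Format the count of significant estimates in `direction` over all estimates.

    Each estimate is coded "more" (significantly more likely to be suspended),
    "less" (significantly less likely), or "ns" (not statistically significant).
    """
    if direction not in ("more", "less"):
        raise ValueError("direction must be 'more' or 'less'")
    hits = sum(1 for e in estimates if e == direction)
    return f"{hits}/{len(estimates)}"


# Coding implied by the description of K. A. Anderson et al. (2007) above.
anderson = ["more", "ns", "ns"]

table1_entry = tally(anderson, "more")  # entry for Table 1
table2_entry = tally(anderson, "less")  # entry for Table 2
print(table1_entry, table2_entry)
```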
Table 1. Characteristics of Studies Finding Students With Disabilities Are More Likely to Be Suspended. (Table body not reproduced; table footnote: nationally representative sample.)
Table 2. Characteristics of Studies Finding That Students With Disabilities Are Less Likely to Be Suspended. (Table body not reproduced; table footnote: nationally representative sample.)
We arranged the rows in both tables as follows. Row 1 displays results from studies analyzing only individual-level data and so without contextualizing the results within schools, districts, or states by also including aggregate-level covariates. Row 2 displays estimates from studies analyzing both individual- and aggregate-level data (e.g., multilevel models with both individual- and school-level controls or school fixed-effects models) but without any type of adjustment for individual-level behavior. Row 3 displays estimates more distally adjusted for behavior by controlling for the type of school conduct code violation or type of infraction for which students had been suspended at the aggregate level (e.g., the schoolwide percentage of fighting incidents). Row 4 again displays estimates adjusting for the infraction type at the individual level. Row 5 displays estimates adjusted more proximally for a student’s own behavior whether as reported by students or by teachers.
The coded studies were not always explicitly designed to investigate whether SWD were being inappropriately suspended. Of the 18 studies included in this synthesis, 14 studies examined risk factors for suspension separately from other types of exclusionary discipline (e.g., expulsion). Only three of the 14 studies reporting risk estimates for suspension explicitly stated a research question relating to SWD (i.e., Camacho, 2016; Krezmien et al., 2006; Krezmien, Travers, & Camacho, 2017). Three other studies investigated whether, more generally, individual-level sociodemographic factors related to the risk for suspension or other disciplinary practices (i.e., Anyon et al., 2014; Cholewa, Hull, Babcock, & Smith, 2017; Sullivan, Klingbeil, & Van Norman, 2013). The other eight studies investigated racial or ethnic disparities in disciplinary practices but included disability status as a covariate (i.e., K. A. Anderson et al., 2007; Anyon et al., 2016; Cornell, Maeng, Huang, Shukla, & Konold, 2018; Huang, 2018; Kinsler, 2011; Morris & Perry, 2016; Roch & Edwards, 2017; Wright et al., 2014). These studies best approximated contrasts between similarly situated students by including a range of covariates in the regression models.
Some studies reported multiple analyses. For example, four of the 18 studies included dependent variables that aggregated suspension with other types of exclusionary discipline (e.g., expulsion). We report estimates from these four studies separately in Supplementary Tables 2 and 3, available online. We considered these estimates as indicating the risk for exclusionary discipline generally because they were not disaggregated for suspension specifically. Ten of the remaining 14 best-evidence studies that reported specifically on suspension used a dichotomous dependent variable of whether students were ever suspended. The other four studies analyzed rates of suspension (i.e., Camacho, 2016), number of days suspended (i.e., Kinsler, 2011), the percentage of schoolwide disciplinary actions that were out-of-school suspensions (i.e., Roch & Edwards, 2017), or whether students were suspended once or more than once (i.e., Sullivan et al., 2013). We included these significant coefficients in Tables 1 and 2. For studies reporting several sets of analyses in which the level of analysis differed across models, we separated the estimates in rows of Tables 1 and 2. For example, we included two coefficients from Huang’s (2018) models containing aggregate- and individual-level data in Table 1’s row 2 and two different coefficients that adjusted for individual-level behavior in row 5. Figure 1 displays the average odds ratio (OR) effect sizes from the best-available studies. Figure 1’s row 1 displays estimates from studies controlling for infraction type but only at the aggregate level. Row 2 displays estimates from studies controlling for infraction type but at the individual level. Row 3 displays estimates from studies controlling for individual-level behavioral assessments. We viewed Table 1’s row 5, Table 2’s row 5, and Figure 1’s row 3 results as the best empirical evidence currently available of whether U.S. schools are suspending SWD in ways that may be discriminatory.

Figure 1. Average odds ratios (ORs) reported within select best-available studies (using ln of each OR to put in additive form, averaging, and then exponentiating the average). Estimates from Sullivan, Klingbeil, and Van Norman (2013) correspond to row 3 of Tables 1 and 2 that condition on infraction type measured at the aggregate level. Estimates from Anyon et al. (2014, 2016) and Cornell, Maeng, Huang, Shukla, and Konold (2018) correspond to row 4 of Tables 1 and 2 that condition on infraction type measured at the individual level. Estimates from Huang (2018) and Wright, Morgan, Coyne, Beaver, and Barnes (2014) correspond to row 5 of Tables 1 and 2 that condition on individual behavioral assessments. Average OR represents the average value of all estimates within the study. The significant OR represents only the average value of the estimates that were statistically significant. Two studies (Kinsler, 2011; Roch & Edwards, 2017) did not report ORs and so are not included in Figure 1.
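The averaging procedure described in the caption (take the natural log of each OR, average, then exponentiate) amounts to a geometric mean. A minimal sketch follows; the function name and the input values are illustrative assumptions, not estimates from the coded studies.

```python
import math


def average_odds_ratio(odds_ratios):
    """Average ORs on the log scale, then exponentiate back to an OR."""
    if not odds_ratios:
        raise ValueError("need at least one odds ratio")
    if any(o <= 0 for o in odds_ratios):
        raise ValueError("odds ratios must be positive")
    mean_log = sum(math.log(o) for o in odds_ratios) / len(odds_ratios)
    return math.exp(mean_log)


# Hypothetical ORs: one doubles the odds, the other halves them.
example_ors = [2.0, 0.5]
log_scale_average = average_odds_ratio(example_ors)
arithmetic_average = sum(example_ors) / len(example_ors)
print(log_scale_average, arithmetic_average)
```

Averaging on the log scale treats an OR of 2.0 and an OR of 0.5 as equal and opposite, so the two cancel to an OR of about 1.0, whereas the arithmetic mean of the same values is 1.25. The average of only the statistically significant estimates would be obtained by passing just that subset to the same function.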
Results
Are SWD More Likely to Be Suspended Following Controls for Aggregate- and Individual-Level Covariates but Not Including Behavior?
Of studies using only individual-level covariates, 65 of 100 estimates (i.e., 65%) indicated that SWD were more likely to be suspended than students without disabilities. Across the risk estimates adjusted for individual- and aggregate-level covariates but not the strong confounds of differential behavioral functioning, seven of the seven available estimates (100%) indicated that SWD were more likely to be suspended. Adjusting for infraction type at the aggregate level (i.e., controlling for the schoolwide proportion of certain discipline code violations), nine of the nine estimates (i.e., 100%) indicated that SWD were more likely to be suspended. However, controlling for individual-level infractions resulted in over half of the estimates (i.e., 14 of 24, or 58%) failing to indicate that SWD were more likely to be suspended. Of the seven available risk estimates adjusted for individual-level behavior, most (i.e., five of seven, or 71%) failed to indicate SWD were more likely to be suspended.
Table 1’s row 1 includes models that analyzed individual-level data but without adjusting for behavior as well as without adjusting for school-level characteristics through regression adjustment or fixed-effects models. Of the 100 available estimates, 65 (i.e., 65%) indicated that SWD were more likely to be suspended. Because these studies did not adjust the risk estimates for school-level factors, the estimates did not account for variability between schools in their use of “zero-tolerance” disciplinary policies that may also be related to the risk of being suspended (Kinsler, 2011). That is, estimates from these studies were not adjusted for school contexts where suspension may be more likely to occur.
Row 2 includes estimates that were adjusted for both individual- and aggregate-level covariates including through school fixed effects (e.g., Huang, 2018) or multilevel modeling with school-level controls to account for clustering (e.g., Sullivan et al., 2013). Each of the seven available estimates (i.e., 100%) indicated that SWD were more likely to be suspended. Only three of the 18 studies analyzed nationally representative data (Cholewa et al., 2017; Huang, 2018; Wright et al., 2014). Six of the 11 estimates (55%) from the three studies indicated SWD were more likely to be suspended than students without disabilities.
Do U.S. Schools Differentially Suspend SWD Among Similarly Behaving Students?
Table 1’s rows 3, 4, and 5 display risk estimates from studies that adjusted for behavior, either for the type of school conduct code infraction at the aggregate (i.e., row 3) or individual level (i.e., row 4) or for directly assessed individual-level behavior (i.e., row 5). We considered row 4’s estimates as relatively more rigorous than row 3’s estimates because using aggregate-level data to make individual-level inferences often yields spurious results and so is subject to the “ecological fallacy” (Robinson, 1950). We considered estimates from the two studies in Table 1’s row 5 as the most rigorous evidence available regarding whether, among students who are similarly situated including on the factor most materially related to being suspended (i.e., directly assessed, individual-level behavior), SWD were more likely to be suspended. We considered row 5’s estimates to be more rigorous than row 3’s or row 4’s estimates because of the possibility of unaccounted-for heterogeneity in behavior that may have occurred within as well as between types of infractions. For example, among students suspended for the same infraction of fighting, some students might have behaved in ways that were more violent than others.
Of the 14 studies reporting risk estimates specifically for suspension, six reported results adjusted for infraction type. Controlling for school- or teacher-reported infractions helped isolate the risk of suspension among otherwise similar SWD. That is, this method controlled for inadvertently contrasting students disciplined for violent infractions like possessing a weapon to those suspended for nonviolent infractions like disobedience. Two of these six studies adjusted for infraction type by controlling for aggregate proportions of disciplinary offenses across schools. Four included individual adjustments for specific school code violations. For example, Roch and Edwards (2017) controlled for the number of incidents that occurred within schools and Anyon et al. (2016) controlled for the type of disciplinary referral. Similar to studies that included both individual- and aggregate-level data but without adjustments for individual-level infraction type, each of the estimates (i.e., 100%) including infraction controls but at the aggregate level indicated that SWD were more likely to be suspended in U.S. schools. Yet controlling for infractions at the aggregate level may have resulted in biased estimates (Robinson, 1950). As shown in Table 1’s row 4, adjustment for individual-level infraction type resulted in a 58-percentage-point decrease (100% – 42% = 58%) in the percentage of estimates reporting that SWD were more likely to be suspended.
We also observed some external and internal validity limitations in the six studies in Table 1’s rows 3 and 4. None of the studies analyzed nationally representative samples. Therefore, the results may not generalize to the U.S. school-aged population. Teacher-reported infraction reasons may also be biased due to subjective judgment of behavior. The reported estimates did not always replicate within the same study. For example, Cornell et al. (2018) conducted analyses of out-of-school suspension twice using the same variables but based on two different analytical samples, yielding one estimate for each analysis estimating the risk of suspension for SWD. The associated confidence interval excluded 1.00 for one of these estimates. The other was not statistically significant.
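Whether an OR's confidence interval contains 1.00 is what separates a statistically significant estimate from a nonsignificant one. The sketch below assumes a Wald-type 95% interval computed from a logistic regression coefficient and its standard error; the coded studies may have computed their intervals differently, and the coefficient and standard error values are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal critical value for a two-sided 95% interval


def odds_ratio_ci(log_odds, std_error, z=Z_95):
    """Return (OR, lower, upper) for a Wald interval built on the log-odds scale."""
    lower = math.exp(log_odds - z * std_error)
    upper = math.exp(log_odds + z * std_error)
    return math.exp(log_odds), lower, upper


def excludes_one(lower, upper):
    """True when the interval lies entirely above or entirely below 1.00."""
    return lower > 1.0 or upper < 1.0


# Hypothetical coefficients with the same standard error (illustrative only).
or_a, lo_a, hi_a = odds_ratio_ci(log_odds=0.5, std_error=0.2)
or_b, lo_b, hi_b = odds_ratio_ci(log_odds=0.3, std_error=0.2)
significant_a = excludes_one(lo_a, hi_a)
significant_b = excludes_one(lo_b, hi_b)
print(significant_a, significant_b)
```

In this illustration the first interval lies wholly above 1.00 and the second spans it, so only the first estimate would be coded as indicating that SWD were more likely to be suspended.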
We identified only two studies assessing suspension that controlled for the strong confound of directly assessed, individual-level behavior. Only two of the seven coefficients (29%) from these two studies indicated that SWD were more likely to be suspended. Both coefficients were from one study (i.e., Huang, 2018). The other five of the seven estimates (71%) did not indicate that SWD were more likely to be suspended than similarly behaving students without disabilities (Wright et al., 2014).
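The row-by-row percentages reported in this section follow directly from the counts of estimates in each row of Table 1. The short sketch below recomputes them from the counts stated in the text; the row labels and variable names are ours and chosen for illustration.

```python
# (estimates indicating SWD more likely to be suspended, total estimates),
# taken from the counts reported in the text for Table 1's rows 1 to 5.
table1_rows = {
    "row 1: individual-level covariates only": (65, 100),
    "row 2: individual- and aggregate-level covariates": (7, 7),
    "row 3: aggregate-level infraction type": (9, 9),
    "row 4: individual-level infraction type": (10, 24),
    "row 5: individual-level behavior": (2, 7),
}

summary = {}
for label, (more_likely, total) in table1_rows.items():
    pct_more = round(100 * more_likely / total)
    pct_not = round(100 * (total - more_likely) / total)
    summary[label] = (pct_more, pct_not)
    print(f"{label}: {pct_more}% more likely, {pct_not}% not")

total_estimates = sum(total for _, total in table1_rows.values())
print(total_estimates)
```

The row totals sum to the 147 coded risk estimates, and the drop from row 3 to row 4 is 100 minus 42, or 58 percentage points.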
Figure 1 displays average ORs from studies (Table 1, rows 4 and 5) that estimated the risk of suspension among students with and without disabilities similarly situated on the factor most materially relevant to being suspended (i.e., directly assessed, individual-level behavior). The averaged OR estimates from studies controlling for infraction types or directly assessed behavior ranged from 1.2 to 1.9 and 1.05 to 1.8, respectively. We briefly detail the methods and results from these two best-evidence studies.
Huang (2018) analyzed data from 10th graders in the 1988 National Education Longitudinal Study (NELS). The data included many student-reported variables including ratings of 18 delinquency-related attitudes and whether the students reported ever being suspended. Although Black students did not report more misbehavior or endorse more deviant attitudes than White students after controlling for gender, socioeconomic status, and family structure, the suspension rate of Black students was twice that of the White students. For SWD, the ORs for suspension were between 1.69 and 1.84 in the four reported logistic regression models.
Disability status was included as a covariate in Model 3 of a series of logistic regressions predicting receipt of an out-of-school suspension in 10th grade. The initial estimate of the risk for out-of-school suspension attributable to disability status after controlling for sex, race or ethnicity, and other covariates was an OR of 1.84. This coefficient was reduced to a statistically significant OR of 1.70 after controlling for additional covariates in Models 4 and 5. These other statistically significant self-reported covariates included having fought with students (OR of 4.04), disregarding rules (OR of 1.08), and having drunk alcohol or smoked cigarettes (ORs of 1.25 and 1.42, respectively). The estimates were also adjusted for school fixed effects.
In contrast, Wright et al. (2014) analyzed the more recent Early Childhood Longitudinal Study-Kindergarten Cohort of 1998–1999 (ECLS-K) and found that the average of teacher-rated problem behavior from kindergarten to third grade fully explained racial disparities in suspension by eighth grade. The analyses also included measures of the school environment (e.g., school size, percentage of students receiving free or reduced lunch, percentage Black enrollment, and teacher race and ethnicity). Unlike Huang (2018) who controlled for student self-reported and concurrent behavior and attitudes, Wright et al. controlled for prior behavior using teacher ratings and several parent-reported measures concurrently assessed at eighth grade including delinquent behaviors, whether the school was of “good” or “bad” quality, and additional covariates. In contrast to Huang, Wright et al.’s ORs for SWD across Models 1 to 3 ranged from 0.90 to 1.04. None of these were statistically significant. The OR for SWD declined from 1.04 to 0.90 when prior problem behavior was included as a control. Additional inclusion of an interaction term for prior as well as contemporaneous behavior resulted in an OR for SWD of 0.93.
Both Huang (2018) and Wright et al. (2014) analyzed nationally representative, longitudinal data sets. Wright et al.’s analyses estimated the risk of suspension in eighth grade among students whose behavior was similarly rated by teachers in earlier grades. It is also possible that the parent-reported delinquency assessed in Wright et al.’s study may have better captured differences in behavior relative to Huang’s student-reported attitudes and behaviors. Teachers and parents may have been relatively more objective when reporting delinquency than students.
Comparisons of the two studies were also limited by differences in behavioral measurement and sampling. In Huang’s (2018) study, students may have underreported their own delinquency or misbehavior. In Wright et al.’s (2014) study, parents may not have been fully aware of their children’s misbehavior or delinquency. It is also unclear how accessible the self-report questions in Huang’s study were to SWD. Huang analyzed NELS data from students who attended high school in 1990, so these data were older than Wright et al.’s ECLS-K data. Wright et al.’s analyses estimated suspension risk across elementary to middle school but not high school. Huang estimated risk during high school but not elementary or middle school. Differences in the sampled time periods and age groups may explain the inconsistent findings reported by the two best-evidence studies.
Are Students of Color With Disabilities More Likely to Be Suspended Than SWD Who Are White?
We coded for statistically significant interactions between disability and racial- or ethnic-minority status for suspension risk. Only three of the 14 studies of suspension reported such interactions. None contrasted the suspension risk for SWD who are White to that of SWD who are of color while also controlling for at least one covariate. The available estimated interactions instead were of the risk of suspension for SWD who are of color relative to (a) students of color without disabilities (K. A. Anderson et al., 2007) or (b) SWD who are White but without adjusting for covariates (Krezmien et al., 2006, 2017).
Are SWD Less Likely to Be Suspended Than Students Without Disabilities?
We also examined the included studies for whether SWD were less likely to be suspended than students without disabilities. Table 2 shows that only 10 of 147 estimates (7%) indicated SWD were less likely to be suspended than students without disabilities. Studies including only individual-level data without accounting for the risks attributable to schooling environments yielded only seven of 100 estimates (7%) indicating that SWD were less likely to be suspended. Among studies including both individual- and aggregate-level data but not infraction or behavioral controls, SWD were less likely to be suspended in zero of seven estimates (0%).
In rows 3 and 4 of Table 2, zero of nine estimates (0%) and three of 24 estimates (13%), respectively, indicated that SWD were less likely to be suspended than students without disabilities. Each of these three estimates was from Kinsler (2011), who found that SWD received shorter suspensions when controlling for individual infractions. Specifically, sixth- and ninth-grade students with physical disabilities attending new schools received suspensions of shorter duration than students without disabilities (the ninth-grade estimate was nonsignificant in models with school fixed effects). However, in row 5, zero of seven estimates (0%) conditioned on directly assessed, individual-level behavior indicated that SWD were less likely to be suspended than students without disabilities.
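The row-by-row counts reported in the two preceding paragraphs can be checked with a short tally. The following is a minimal sketch in Python that uses only the counts stated in the text. The row labels are illustrative shorthand inferred from the surrounding description, not the headings of Table 2 itself, and the helper name `pct` is hypothetical.

```python
import math

# Estimates indicating SWD were LESS likely to be suspended, by Table 2 row,
# as stated in the text: (label, count_less_likely, total_estimates).
# Labels are shorthand assumptions, not the table's own headings.
rows = [
    ("row 1: individual-level data only", 7, 100),
    ("row 2: individual- and aggregate-level data", 0, 7),
    ("row 3", 0, 9),
    ("row 4", 3, 24),
    ("row 5: directly assessed individual-level behavior", 0, 7),
]

def pct(count, total):
    """Percentage rounded half up to the nearest whole number."""
    return math.floor(100 * count / total + 0.5)

total_less = sum(count for _, count, _ in rows)
total_estimates = sum(total for _, _, total in rows)

for label, count, total in rows:
    print(f"{label}: {count} of {total} ({pct(count, total)}%)")
print(f"all rows: {total_less} of {total_estimates} "
      f"({pct(total_less, total_estimates)}%)")
```

Half-up rounding is used here because three of 24 is exactly 12.5%, which the text reports as 13%; Python's built-in `round` would instead round that value to the nearest even integer.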
Are SWD More or Less Likely to Experience Exclusionary Discipline?
As an additional sensitivity check, we examined four studies that combined assessments of suspension with other discipline-related indicators of exclusionary discipline (i.e., K. P. Anderson & Ritter, 2017a, 2017b; Theriot, Craun, & Dupper, 2010; Vincent, Sprague, & Tobin, 2012). Such discipline typically combined out-of-school suspensions with other types of exclusion from the school environment such as expulsion or removal to an alternative education setting. As shown in Supplementary Table 2, only six of 25 estimates (24%) indicated that SWD were more likely to experience exclusionary discipline. Findings from Supplementary Table 3 show that only seven of 25 estimates (28%) indicated that SWD were less likely to experience exclusionary discipline. None of the four studies controlled for the strong confound of directly assessed, individual-level behavior.
Discussion
We synthesized findings from 18 studies to examine the best-available empirical evidence of whether U.S. schools discriminate based on disability status when suspending SWD. These 18 studies included 14 studies reporting specifically on whether SWD are more likely to be suspended than students without disabilities. Four additional studies reported on exclusionary discipline generally but reported no disaggregated risk estimates for suspension specifically. We were especially interested in studies approximating contrasts between similarly situated students including in regard to their behavior in school. Although often unaccounted for in existing work, differential involvement in disruptive or other problem behaviors is a strong potential confound for between-group disparities in discipline (Huang, 2018; Wright et al., 2014). Accounting for this confound allows for contrasts between similarly behaving students and so provides stronger evidence of whether U.S. schools use discriminatory disciplinary practices (Huang, 2018; NRC, 2004; OCR, 2016; Wright et al., 2014).
Of the 14 studies reporting specifically on suspension, six adjusted the risk estimates for school conduct code infractions. Of the 33 available risk estimates, nine of nine (100%) using aggregate-level controls but only 10 of 24 (42%) using individual-level controls indicated that SWD were more likely to be suspended. Only two studies controlled for directly assessed, individual-level behavior. These studies reported seven risk estimates. Most of these (i.e., five of seven, or 71%) failed to indicate that SWD were more likely to be suspended. The five nonsignificant estimates were conditioned on teacher-rated behaviors (Wright et al., 2014). The two statistically significant estimates indicating that SWD were more likely to be suspended were conditioned on student self-reported attitudes and behaviors (i.e., Huang, 2018). The percentage of risk estimates indicating SWD were more likely to be suspended declined by 71 percentage points (i.e., from 100% to 29%) following adjustments for individual- instead of aggregate-level confounds including directly assessed behavior. We found no evidence to indicate that SWD are less likely to be suspended than similarly behaving students without disabilities. Our supplemental analysis found no evidence to indicate SWD were more or less likely to experience exclusionary discipline than similarly behaving students without disabilities.
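The shares summarized in this paragraph follow directly from the reported counts. As a minimal sketch, assuming only the counts given in the text (the variable and function names are hypothetical), the arithmetic can be reproduced as:

```python
import math

def pct(count, total):
    """Percentage rounded half up to the nearest whole number."""
    return math.floor(100 * count / total + 0.5)

# Estimates indicating SWD were MORE likely to be suspended, by type of control.
aggregate_infraction = pct(9, 9)     # aggregate-level infraction controls
individual_infraction = pct(10, 24)  # individual-level infraction controls
direct_behavior = pct(2, 7)          # directly assessed individual-level behavior

# Estimates failing to indicate elevated risk under behavioral controls.
not_more_likely = pct(5, 7)

# Change between the weakest and strongest controls, in percentage points.
decline_points = aggregate_infraction - direct_behavior

print(aggregate_infraction, individual_infraction, direct_behavior)
print(not_more_likely, decline_points)
```

Because the starting share is 100%, the decline in percentage points and the relative decline take the same value here; the two would differ for any other baseline.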
Limitations
Our synthesis has at least six limitations. First, we synthesized a limited number of empirical studies. This was especially true for those studies approximating contrasts between similarly behaving students and so accounting for the strong confound of differential involvement in behaviors that might reasonably result in disciplinary action (Huang, 2018; Wright et al., 2014). Additional empirical studies accounting for this strong confound, particularly if based on nationally representative samples, would better establish whether U.S. schools are using discriminatory practices when suspending SWD. Second, we were unable to systematically analyze potential moderators of suspension risk. As a result, it is currently unclear whether SWD who are of color are more likely to be suspended than otherwise similar SWD who are White. Third, we were unable to independently confirm the appropriateness of the specific disciplinary actions in the synthesized studies.
Fourth, we were unable to systematically examine whether students with specific disability conditions are more likely to be disciplined than similarly behaving students without the specific disability conditions. Although five studies disaggregated some results by disability type, each assessed distinct disability conditions. For example, Kinsler (2011) included covariates only for learning disabilities or physical disabilities. In contrast, Anyon et al. (2014) included separate covariates only for emotional or behavioral disabilities. It remains unknown whether and to what extent students with specific disability conditions are more likely to be suspended relative to students without the conditions while also accounting for directly assessed, individual-level behavior.
Fifth, our risk estimates may be conservative. We synthesized findings from studies including at least one covariate in the analyses. Although IDEA allows U.S. schools to discipline SWD as they would students without disabilities for 10 days or less (Rothstein & Johnson, 2014; Ryan et al., 2007), disentangling whether SWD engaged in behaviors that resulted in disciplinary actions because of their disabilities or instead because of other factors is challenging from both an administrative and a methodological perspective. Sixth, our synthesis was not designed to evaluate the effectiveness of the disciplinary practices being used by U.S. schools including for SWD. Other work has reported on alternatives to suspension that may more effectively manage disruptive or problem behavior including by students with or at risk for disabilities (e.g., Cook et al., 2018; Flannery, Fenning, Kato, & McIntosh, 2014).
Contributions and Implications
No prior synthesis has systematically evaluated the strength of the evidence regarding whether U.S. schools differentially suspend or otherwise discipline SWD relative to otherwise similar students without disabilities. An important contribution of our synthesis is to show that the evidence base regarding whether U.S. schools use discriminatory practices when disciplining SWD is currently limited as well as inconclusive due to inconsistent methods and sampling. To our knowledge, rigorous evidence that U.S. schools may discriminate when disciplining SWD is currently found in only one study (i.e., Huang, 2018), which itself is based on a sample of students who attended high school in 1990. Another study analyzing more recent data from an elementary and middle school sample found no evidence to suggest that U.S. schools are more likely to discipline SWD than similarly behaving students without disabilities (Wright et al., 2014). Federal legislation and regulations mandate that U.S. schools monitor the extent to which SWD of color are being disciplined. Another important contribution of our synthesis is to show that, despite these federal legislative and regulatory monitoring mandates, no empirical evidence is currently available indicating that SWD who are of color are more likely to be suspended than similarly behaving SWD who are White. Such contrasts have yet to be conducted.
Our findings suggest the need for further rigorous empirical study of disciplinary disparities for SWD, particularly studies that can account for differential involvement in behaviors that might reasonably result in being disciplined (Huang, 2018). Although such contrasts have more commonly been approximated by controlling for infraction type, such controls do not allow for an examination of “the sources of variance that may enter into the disciplinary procession prior to the administrative decision” (Skiba et al., 2014, p. 663). Bias may be more evident for subjective rather than objective types of infractions (e.g., teacher referrals for defiance vs. fighting; Girvan, Gion, McIntosh, & Smolkowski, 2017), but such bias is not modeled in controls for infraction type, especially when using aggregate-level controls. Studies that estimate the risk of discipline for SWD while controlling for directly assessed, individual-level behavior would provide stronger evidence regarding whether SWD are being disciplined in ways that are discriminatory (Huang, 2018; Wright et al., 2014). Empirical studies that condition the risk estimates on directly assessed classroom behavior by well-trained and independent observers, thereby limiting measurement error that may result from use of teacher or parent ratings or student self-report as statistical controls, would be especially valuable. Qualitative and mixed-method studies investigating the decision-making processes of school practitioners when considering the use of exclusionary discipline for SWD would also advance the knowledge base.
Currently, methodological and sampling differences in the available empirical studies preclude strong inferences regarding whether disciplinary disparities between students with and without disabilities result from the systemic use of discriminatory practices by U.S. schools. Further research is needed to better establish the empirical evidence base including as might be used to justify the Equity in IDEA Rule’s monitoring mandates (DoE, 2018). Other work analyzing nationally representative data sets finds no empirical evidence to support IDEA’s monitoring mandates regarding whether significant disproportionality in disability identification results from systemic bias (Morgan, Farkas, Cook, et al., 2017). Instead, students of color are repeatedly found to be less likely to be identified than similarly situated students who are White while attending U.S. schools (Hibel, Farkas, & Morgan, 2010; Morgan, Farkas, Hillemeier, & Maczuga, 2017). These mandates do not account for between-group differences including differential involvement in behaviors that might reasonably result in experiencing school disciplinary practices. Federal and state monitoring should make some attempt to account for differential involvement in problem behaviors as well as other explanatory factors when attempting to identify U.S. schools that may be using discriminatory practices when suspending SWD, including those of color.
Conclusion
Suspension increases the risk of life course adversities including felony arrest and incarceration, and so should be used judiciously if at all. Descriptive evidence of disparities is insufficient to infer that U.S. schools systemically discriminate when suspending or otherwise disciplining students based on their disability status or, for SWD of color, based on their race or ethnicity. Rigorous evidence of the use of discriminatory practices is obtained after accounting for alternative explanatory factors including differential involvement in behaviors that might result in disciplinary disparities (Huang, 2018). Yet such confounds have rarely been accounted for (Huang, 2018; Wright et al., 2014). The available empirical work is currently inconclusive regarding whether U.S. schools systemically discriminate based on disability status when suspending SWD.
Supplemental Material
EC868528_Supplementary_Materials – Supplemental material for Are U.S. Schools Discriminating When Suspending Students With Disabilities? A Best-Evidence Synthesis
Supplemental material, EC868528_Supplementary_Materials for Are U.S. Schools Discriminating When Suspending Students With Disabilities? A Best-Evidence Synthesis by Paul L. Morgan, Yangyang Wang, Adrienne D. Woods, Zoe Mandel, George Farkas and Marianne M. Hillemeier in Exceptional Children
