Abstract
A full canon of empirical literature shows that students who are African American, Latinx, or American Indian/Alaskan Native, and students who are male, diagnosed with disabilities, or from low socioeconomic backgrounds are more likely to experience exclusionary discipline practices in U.S. schools. Though there is a growing commitment to mitigating discipline disparities through alternative programming, it is clear that disproportionality in the application of harmful discipline practices persists. The purpose of this literature synthesis was to examine the effectiveness of empirically studied school-based interventions in reducing disproportionality in discipline practices. We analyzed articles that assessed both prevention and intervention program effects using at least one outcome variable representing exclusionary discipline, either in the form of office discipline referrals or suspension/expulsion rates. Included studies used experimental, quasi-experimental, or observational research designs that disaggregated student outcomes by race, ethnicity, gender, disability, or other sociodemographic categories. We identified 20 articles meeting inclusion criteria, four of which provided direct evidence of disproportionality reduction using interaction terms. Results indicate limited evidence that available programs reduce discipline disparities and that common programs may function as a protective factor for White and female students while failing to do so for marginalized students. Findings identify promising areas for future research.
Exclusionary discipline includes a spectrum of punitive practices, including office discipline referrals (ODRs), in-school suspensions, out-of-school suspensions, expulsions, and referrals to the juvenile justice system (Noltemeyer & Mcloughlin, 2010). Research has indicated that exclusionary discipline results in a host of negative outcomes, such as lost instructional time (Losen & Whitaker, 2017), lowered academic outcomes (Noltemeyer et al., 2015), and increased likelihood of truancy and dropout (Balfanz et al., 2015; Fabelo et al., 2011; Rumberger & Losen, 2017). Additionally, higher suspension and expulsion rates have been associated with reduced schoolwide academic performance (Skiba & Rausch, 2006) and perceptions of school climate (Gregory et al., 2011).
Racial disparities in the disciplinary process have been well documented in the literature (Losen et al., 2015; Losen & Gillespie, 2012; Welsh & Little, 2018). African American students are more likely to receive an ODR than their White peers (Skiba et al., 2011), and evidence suggests that African American students are more likely to receive an ODR for subjective infractions—those that require teacher judgment, such as disruption or defiance—as opposed to objective infractions (e.g., tardiness or truancy) compared with White peers (Skiba et al., 2002; Smolkowski, Girvan, et al., 2016). Following an infraction, African American students are also more likely to receive harsher consequences than White peers, even when the behavior violation is similar (Okonofua & Eberhardt, 2015; Skiba et al., 2002). Using a nationally representative sample, Skiba et al. (2011) examined patterns of discipline disparities in 364 elementary and middle schools. They found that “both initial referral to the office and administrative decisions made as a result of that referral significantly contribute[d] to racial and ethnic disparities in school discipline” (p. 101), indicating that disparities exist at multiple points within the disciplinary process and that solutions must be applied across this process.
Latinx and American Indian/Alaska Native students are also more likely to experience exclusionary discipline compared with White peers (Losen et al., 2015; Losen & Gillespie, 2012). Although exclusionary discipline research on Latinx students has been less consistent, potentially due to impacts of localized political shifts (Mora, 2014), migration trends (Jaworsky et al., 2012), and contexts of reception (Rumbaut, 2005), research has found that Latinx students are underrepresented in the application of exclusionary discipline in early-school years and overrepresented in later years (Losen et al., 2015; Skiba et al., 2011). For American Indian students, research has pointed to persistent and substantial disparities, particularly in discipline outcomes (e.g., Nowicki, 2018); however, relatively small sample sizes have posed measurement challenges (Hussar et al., 2020), and few studies exist that have foregrounded outcomes for American Indian students in general. Studies have consistently described Asian and Asian American students as underrepresented in exclusionary discipline practices (see Losen et al., 2015), suggesting the impact of model minority stereotypes on disciplinary processes (see Ngo & Lee, 2007).
In addition to racial disparities, studies have also highlighted discrepancies in discipline practices for males (e.g., Bradshaw et al., 2010), students of low socioeconomic status (e.g., Petras et al., 2011), and students labeled with disabilities (e.g., Brobbey, 2018; Sullivan et al., 2013). Intersectionality (i.e., intersecting forms of oppression, exclusion, and erasure; see Annamma et al., 2018; Blake et al., 2017) refers to the compounding experiences of social injustice for students at the intersections of these groups (Artiles, 2011). Examining the experiences of students who exist at these intersections is critical in understanding the complex phenomena that underlie disparate practices (Crenshaw, 1991). For example, African American students with disabilities are suspended at higher rates than White students with disabilities (Losen, 2018; Losen & Gillespie, 2012), and African American males with disabilities are suspended at rates double those of White males with disabilities and triple those of White females without disabilities (Cruz & Rodl, 2018; Morris & Perry, 2017; Wallace et al., 2008), indicating the interplay among racism, ableism, and gender issues in school discipline systems (Annamma et al., 2018; Blake et al., 2017).
Reports have shown extensive variation in ODR, suspension, and expulsion rates between schools and districts, suggesting that local factors affect exclusionary risk (Losen et al., 2015; Skiba, 2015). School features, such as enrollment (Camacho & Krezmien, 2018) and diversity of the student body (Anyon et al., 2014), have been cited as significant factors predicting the use of exclusionary practices. Other work has implicated schools’ malleable features, such as average suspension rates (Theriot et al., 2010) and average school achievement (Rausch & Skiba, 2005), indicating school climate and organizational behavior as predictive factors. Studies have also examined teacher perspectives toward discipline (Skiba et al., 2014; Skiba & Knesting, 2001) and found that teachers’ classroom management skills (Skiba et al., 2014) highly predict referral rates, suggesting that practitioner behavior may be a lever for improving discipline disparities (e.g., Okonofua et al., 2016). As a whole, this body of work underscores the complexity of exclusionary discipline practices (see Welsh & Little, 2018, for a comprehensive review).
Proposed Solutions
School discipline practices exist on a continuum that spans from prevention to postinfraction intervention. In considering this continuum, Gregory et al. (2017) developed an integrated framework for application in discipline disparity reduction, which includes prevention, intervention strategies, and structures that offer both. The framework identifies 10 requisite components for addressing disparities across the discipline continuum: (a) supportive relationships, (b) bias-aware classrooms, (c) academic rigor, (d) culturally relevant teaching, (e) opportunities for learning and correcting behavior within the classroom, (f) data-based programming, (g) problem-solving approaches to discipline, (h) inclusion of student and family voice, (i) reintegration after conflict, and (j) multitiered systems of support. Gregory et al. emphasized that interventions focused only on narrow or singular aspects of the discipline continuum are unlikely to rectify entrenched inequities.
Previous Reviews
Welsh and Little (2018) conducted a comprehensive review focused on factors that contribute to discipline disparities and the nascent evidence for alternatives to exclusionary discipline. The authors uncovered a variety of programs designed to support schools and districts in reducing exclusionary discipline practices overall, including School-Wide Positive Behavior Interventions and Supports (SWPBIS; e.g., Gage et al., 2018), restorative justice (e.g., Fronius et al., 2019), and professional development interventions targeting teachers’ culturally responsive practices (e.g., Bottiani et al., 2018). The authors found that many of these programmatic approaches problematically focused on “fixing” student behavior and assimilating students to existing school culture, rather than adjusting existing structures to support the students’ varied socioemotional needs. The authors theorized that such programs remain unlikely to reduce disparities. However, their review did not examine specific quantitative outcomes (e.g., ODR rates) or program moderators and mediators (e.g., feasibility of implementation or program fidelity requirements). Additionally, though a multitude of systematic reviews exist on school discipline—particularly for SWPBIS and restorative justice—none have examined direct relationships between program implementation and disparity reduction.
SWPBIS Research
SWPBIS seeks to improve discipline practices across school systems through universal screening, a continuum of interventions and supports, and data-based progress monitoring. The program is grounded in many robust, high-quality studies indicating improved prosocial student behavior (e.g., Bradshaw et al., 2012), and many components of Gregory et al.’s (2017) framework are included in the program (e.g., multitiered systems and data-based decision making). However, SWPBIS research has rarely disaggregated impact by race, gender, disability label, or other sociodemographic groups known to experience higher rates of exclusionary discipline (e.g., Flannery et al., 2014). Studies that have included race, gender, and/or disability variables have indicated that SWPBIS may benefit White students more than other students (e.g., Vincent & Tobin, 2011) and females more than males (e.g., Bradshaw et al., 2012), whereas results have been inconclusive regarding students with disabilities (e.g., Vincent & Tobin, 2011).
In response to these findings, researchers have developed models for incorporating culturally responsive practices into SWPBIS (see Fallon et al., 2012), but evidence regarding the efficacy of such add-on programming in reducing discipline disparities is emergent. In Gage et al.’s (2018) review of SWPBIS’ impact on disciplinary exclusion, the authors found a statistically large treatment effect of SWPBIS on reducing suspensions overall. However, their review was based on just four experimental research studies that met criteria for rigor. Only one of these studies (Bradshaw et al., 2012) examined disproportionately disciplined student subgroups (the study examined differential effects by gender).
Restorative Justice Research
School-based restorative justice has been described as a collective approach to building mutual respect and inviting student participation in the development of school community (Buckley & Maxwell, 2007; Mansfield et al., 2018). Restorative justice also features many components of Gregory et al.’s (2017) framework, such as encouraging supportive relationships and problem-solving approaches to discipline. Pilot studies have shown efficacy in reducing racial discipline disparities (e.g., Augustine et al., 2018; Jain et al., 2014; Sumner et al., 2010). However, peer-reviewed studies have been largely qualitative in nature (e.g., González, 2012), which, though important for understanding the program, leaves open questions regarding direct efficacy. Song and Swearer (2016) described the ways in which restorative justice has become a part of the cultural zeitgeist, carrying forward a belief that it reduces disparities without a full canon of research to support these claims. Fronius et al.’s (2019) review of restorative justice in schools uncovered one randomized controlled trial (RCT; Augustine et al., 2018), implemented in one school district, that demonstrated a reduction in discipline disparities between African American and White students, and another observational study (Jain et al., 2014), also implemented in a single school district, that produced mixed results. Fronius et al. emphasized that “most programs are still at the infancy stage” (p. 21) and that further research was needed to understand the direct effects of restorative justice in reducing discipline disparities.
Wholistic Evidence
Both SWPBIS and restorative justice have been shown to reduce exclusionary punishment. However, previous systematic reviews have not examined whether these programs reduce disparities in such practices for marginalized groups. In their review, Welsh and Little (2018) detailed the vast scope of documented discipline disparities and emergent solutions, and they hypothesized why disparities remain despite the implementation of such programs. Their review did not examine for whom these interventions have been shown to work or the minimum implementation fidelity with which schools must implement them to reduce disparities (see Welsh & Little, 2018, Table S3). In other words, they examined all studies related to discipline reduction, including studies that did not disaggregate by demographic category. Welsh and Little (2018) asserted a need for further research to illuminate these unexplored areas in order to provide “a better sense of not only whether alternative approaches to exclusionary discipline are working but also why” (p. 785). They concluded that the phenomenon is multifaceted and remains rife with unanswered empirical questions, including whether it is possible for schools to address disparities with currently available programs. Given Welsh and Little’s foundational findings, in this review we examine the body of evidence available for empirically studied, school-based interventions in reducing disproportionality in school discipline practices and associated key program implementation features.
Implementation Challenges
It is critical to note that schoolwide programs, such as restorative justice and SWPBIS, come at a cost to schools and districts, which may attenuate the benefits reported in highly controlled research. Newmann et al. (2001) detailed the perils of initiatives that require intensive time and energy but lack immediate success, as these commonly incur high costs, cause teacher fatigue, and struggle to improve teaching and learning in a sustainable way (Forman et al., 2013). Relatedly, recent studies on SWPBIS have underscored the importance of implementation fidelity (e.g., Gage et al., 2018), as outcomes have varied as a function of implementation quality (Kim et al., 2018). Given the costs and training it takes to achieve fidelity for schoolwide programs, it remains unclear whether feasible solutions are available to schools seeking to reduce discipline disparities. If implementation fidelity is out of reach, then intended outcomes and scaling up remain untenable (Fixsen et al., 2013).
It is critical, then, to understand how disparity-reduction research considers program efficacy in relation to treatment fidelity. The proportion of studies that were demonstration programs (i.e., implemented by researchers to ensure implementation fidelity under controlled conditions) remains unexamined. Therefore, we sought to determine whether there was an efficacy difference between demonstration programs and those implemented on a routine basis in schools. In doing so, we relied on Wilson et al.’s (2003) definition of demonstration programs as “those implemented and evaluated by a researcher mainly for research or demonstration purposes” and routine practice programs as “those in which the program being studied already exist[ed] in the school on an ongoing basis and the evaluation [was] conducted either by school-based or outside researchers” (p. 137). Wilson et al. pointed to the relevance of this distinction in that some meta-analyses have yielded smaller effect sizes for routine practice programs than for demonstration programs. We sought to examine whether this discrepancy existed in the extant research on discipline disparity reduction.
Purpose of the Study
In alignment with Welsh and Little’s (2018) conclusions and Gregory et al.’s (2017) identified priorities for future research, this analysis reconceptualizes the literature on school discipline by answering the following questions: (a) Which programs designed to reduce exclusionary discipline are associated with reduced disproportionality for students who are male, African American, Latinx, or American Indian/Alaskan Native, labeled with a disability, and/or of low socioeconomic status? (b) Which components of Gregory et al.’s framework for equitable discipline are included in these programs? (c) How do studies consider treatment fidelity in the context of program application and efficacy? By extracting overall effect sizes, we examined whether efforts to reduce exclusionary discipline were associated with an equal reduction in discipline for students of all sociodemographic categories, or whether reduction occurred at different rates for different student populations. The analysis included both referral incidents and consequence application (e.g., in-school suspension, out-of-school suspension, expulsion). We connected empirically investigated programs to Gregory et al.’s 10 components to determine which areas of the framework have been substantiated, and we identified key themes in the literature describing the ways in which schools can implement programs in the most efficient and effective manner.
Method
Given that there were very few common design features across the studies, we conducted an integrative best-evidence synthesis (Slavin, 1986) rather than a meta-analysis. A best-evidence synthesis identifies studies using explicit inclusion criteria and reports on effect sizes while also reporting on key themes. We conducted a systematic review of empirical studies examining the relationship between school- or district-wide behavior support programs and disciplinary exclusion (i.e., the dependent variable). We conducted this review in three phases: (a) title and abstract search, (b) full text review, and (c) data extraction and literature synthesis.
Phase 1: Title and Abstract Search
We used a comprehensive search strategy to locate articles that were peer reviewed, published, and relevant to the field of schoolwide discipline. In consultation with a research librarian, we performed systematic searches using keywords. We began with general search terms for literature related to school discipline: Along with the connector “and,” we searched the terms “school discipline” and “race OR ethnic OR ethnicity OR diverse OR diversity” located in article abstracts. We also used the term school discipline in combination with other sociodemographic category terms: “gender OR socioeconomic OR disability OR special education.” We then searched for specific terms related to school discipline, replacing “school discipline” with the terms “Positive Behavior Interventions and Supports,” “school wide positive behavior interventions,” “social emotional learning,” “restorative justice,” “restorative practices,” and “social justice discipline.” After reviewing these results, we performed additional searches using found search terms with potential for identifying programs meant to reduce exclusionary discipline, such as “teen court” and “socio-emotional learning.” We conducted our search throughout Spring 2019 and, thus, did not include articles published after May 2019.
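The term combinations described above can be sketched as a small script. This is an illustration only, not the authors' search procedure: the helper names `or_group` and `build_queries` are hypothetical, the generic AND/OR string format is an assumption rather than the syntax of any particular database, and pairing every program term with both sociodemographic term groups is our reading of the text. The restriction to article abstracts is not encoded.

```python
# Illustrative sketch: assemble Boolean search strings from the terms
# listed in the text. The query format is a generic assumption and is
# not tied to any specific database interface.

TOPIC_TERMS = [
    "school discipline",
    "Positive Behavior Interventions and Supports",
    "school wide positive behavior interventions",
    "social emotional learning",
    "restorative justice",
    "restorative practices",
    "social justice discipline",
]

DEMOGRAPHIC_GROUPS = [
    ["race", "ethnic", "ethnicity", "diverse", "diversity"],
    ["gender", "socioeconomic", "disability", "special education"],
]


def or_group(terms):
    """Join terms with OR, quoting multiword phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_queries(topic_terms, demographic_groups):
    """Pair each topic phrase with each OR-group using AND."""
    return [
        f'"{topic}" AND {or_group(group)}'
        for topic in topic_terms
        for group in demographic_groups
    ]


queries = build_queries(TOPIC_TERMS, DEMOGRAPHIC_GROUPS)
for query in queries:
    print(query)
```

Keeping the term lists as data makes it straightforward to log exactly which combinations were run, which supports the kind of per-term tabulation reported for the retrieval process.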
Next, we performed hand searches of identified articles’ reference lists to find additional studies potentially eligible for the synthesis. As noted earlier, some of the programs available have a substantial evidence base, including several literature reviews and meta-analyses (e.g., SWPBIS); thus, we conducted hand searches of these articles for relevant citations (Boneshefski & Runge, 2014; Bottiani et al., 2018; Bouchard & Wong, 2017; Cotter Stalker, 2017; Durlak et al., 2011; Fallon et al., 2012; Gage et al., 2018; Gregory et al., 2017; Mallett, 2016; Mitchell et al., 2018; Öğülmüş & Vuran, 2016; Welsh & Little, 2018). Search engines used included EBSCO, PsycINFO, ERIC, PubMed, and JSTOR. We conducted these ancestral searches of relevant article citation lists using Google Scholar. We set search criteria for articles published from 1990 to present, as school policies and programs meant to reduce exclusionary discipline emerged during this time (Kafka, 2011).
Next, we entered each article’s title and abstract into an Excel database and coded them with four inclusion criteria: (a) the study was empirical and published in a peer-reviewed journal (i.e., this review excluded book chapters, technical reports, master’s theses, dissertations, and conceptual papers); (b) the study examined outcomes in public school settings (i.e., studies examining juvenile justice or criminal justice systems were excluded); (c) the study used a quantitative or mixed-method research design (i.e., studies utilizing solely qualitative methods were excluded); and (d) the study included exclusionary discipline as an outcome variable. Any articles fitting the aforementioned inclusion criteria (or with unclear coding results based on the title and abstract) were kept for a full text review.
This initial search yielded 506 articles for potential inclusion in the study, 83 of which met criteria for a second round of review. Table 1 displays the number of articles at each phase of the retrieval process by search term. After applying Phase 1 criteria, the most common reason for exclusion of articles about “restorative practices,” “social justice,” or “disproportionality” was that the manuscript was conceptual or used qualitative methods (e.g., interview data regarding school/classroom climate or adult perceptions and attitudes). For articles about “SWPBIS,” “teen court,” and “socioemotional learning,” the most common reason for exclusion was the absence of exclusionary discipline as an outcome variable. For example, many teen court studies examined dependent variables related to substance abuse, and many SWPBIS studies examined fidelity measures or predictors of sustained program implementation.
Table 1. Article retrieval process
Note. Ancestral and meta-analysis searches only included titles not previously identified in our original search. SWPBIS = School-Wide Positive Behavior Interventions and Supports; SEL = Socioemotional Learning.
Phase 2: Full Text Review
During this phase, we reviewed the full texts of all studies that met eligibility in Phase 1 using the following inclusion criteria: (a) whether the study question, purpose, and hypothesis included effects for an outcome variable related to exclusionary discipline (i.e., ODR, suspensions, or expulsions); (b) whether the study included outcomes disaggregated by sociodemographic categories; and (c) whether the study used a methodological design appropriate to understanding intervention efficacy, including RCTs, quasi-experimental designs (QED), or observational research designs (ORD) with extensive control covariates that compared groups receiving one or more identifiable intervention with one or more control condition. We included studies that presented both pre- and posttest measures on at least one qualifying outcome variable, or that used a pretest-posttest design in which measures of at least one qualifying outcome variable were taken before and after intervention on the same participants, including single-group designs and multiple-group designs involving different interventions. We also included observational studies that used administrative data sets with extensive control covariates that could be compared. Studies describing interventions without evaluation data were excluded.
Both primary researchers independently read and coded the 83 articles for inclusion, and we calculated interobserver agreement using Cohen’s kappa. Interobserver agreement for Phase 2 resulted in substantial agreement (κ = 0.77, CI = [0.61, 0.93]), with disagreement on five studies. After reviewing inclusion criteria and definitions, the authors independently reread articles on which disagreements occurred and subsequently achieved nearly perfect agreement (κ = 0.90, CI = [0.79, 1.01]), with disagreement remaining on three articles. We discussed these three articles until consensus was reached, which resulted in the inclusion of 20 studies.
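To make the agreement statistic concrete, the following minimal sketch computes Cohen's kappa for two coders' include/exclude decisions, along with a simple large-sample confidence interval. The ratings are hypothetical and are not the study's data, and the standard-error formula is one common approximation; the source does not state which formula the authors used.

```python
from collections import Counter
from math import sqrt


def cohens_kappa(rater_a, rater_b):
    """Return (kappa, observed agreement, chance agreement)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same code.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal counts.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return kappa, p_observed, p_chance


def kappa_interval(kappa, p_observed, p_chance, n, z=1.96):
    """Approximate Wald interval (an assumed, simplified standard error)."""
    se = sqrt(p_observed * (1 - p_observed) / (n * (1 - p_chance) ** 2))
    return kappa - z * se, kappa + z * se


# Hypothetical decisions for 10 articles; the coders disagree on one.
coder_1 = ["include"] * 3 + ["exclude"] * 7
coder_2 = ["include"] * 2 + ["exclude"] * 8

kappa, p_obs, p_exp = cohens_kappa(coder_1, coder_2)
lower, upper = kappa_interval(kappa, p_obs, p_exp, n=len(coder_1))
print(f"kappa = {kappa:.2f}, CI = [{lower:.2f}, {upper:.2f}]")
```

An interval of this Wald form is not bounded at 1, so an upper limit slightly above 1 can arise when agreement is very high.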
Phase 3: Study Coding and Data Extraction
Both primary researchers coded eligible studies using a predetermined coding protocol (see Table 2). Our descriptive coding included a wide variety of study characteristics, including the publication year, the intervention type, and the sample population (i.e., age/grade, demographic characteristics). We included the design (i.e., RCT, QED, or ORD), measures (e.g., implementation tools), and attrition. We also coded for outcome (i.e., ODR, out-of-school suspension, other), sociodemographic disaggregation of outcomes (e.g., race, gender, disability status), and any other control variables included (e.g., socioeconomic status).
Table 2. Article coding and descriptions
Note. RCT = randomized controlled trial; ORD = observational research design; QED = quasi-experimental design; IEP = individualized education program; FRL = free or reduced price lunch; ELL = English language learner; RJ = restorative justice; CRT = culturally responsive teaching; MTSS = multitiered systems of support; SWPBIS = School-Wide Positive Behavior Interventions and Supports; ODR = office discipline referral; OSS = out-of-school suspension; ISS = in-school suspension; PD = professional development; MTP-S = My Teaching Partner-Secondary; AI/AN = American Indian/Alaskan Native; PI = Pacific Islander.
To assess implementation feasibility, we coded for two features: (a) whether the program was demonstration or routine and (b) which measures of implementation fidelity—if any—were used. (See previous discussion regarding how demonstration programs may show higher efficacy than routine programs, whereas routine programs may provide more viable evidence for implementation feasibility; Wilson et al., 2003.) We also analyzed the ways that researchers conceptualized adherence to the program (i.e., implementation fidelity), as the frequency of schools attempting implementation but unable to reach fidelity may indicate a lack of feasibility.
Finally, in alignment with Gregory et al.’s (2017) framework, we analyzed the location on the discipline continuum at which the intervention occurred (i.e., prevention of an offense citation vs. differential processing following an offense) and which of the 10 equity components were present in the program or intervention. Both primary researchers independently read and coded the 20 articles and compared codebooks using MAXQDA 2020 (Version 20.3.0). When disagreements occurred, the authors independently reread articles and discussed the codes until consensus was reached.
After descriptive coding, we began data extraction. We tabulated whether each study included a regression coefficient for a specific demographic category (e.g., race, gender, disability) with an interaction term for the treatment under study (e.g., SWPBIS or restorative justice), given the study design. For example, ORDs might report a time × treatment × demographic category interaction, whereas RCTs might report a treatment × category interaction. These interaction terms provided statistical evidence of differences in slope for treated students in a demographic subgroup compared with treated students in the reported reference group. We also extracted data from each study that provided evidence of pre- and posttreatment impact for each demographic group; in other words, we extracted odds ratios or coefficients that indicated whether the treatment reduced exclusionary discipline for that group between the pre- and postintervention period, regardless of effect compared with the reference group. Although most studies reported the results of regression models as odds ratios or beta coefficients, studies varied widely by methodological features. For studies reporting beta coefficients, we exponentiated to odds ratios using conversion methods outlined by Borenstein et al. (2009). We arranged these results in Table 3 from the most to the least methodologically rigorous. We classified RCTs as the most rigorous way to provide evidence of treatment effects, so these were ranked the highest, followed by ORDs with extensive control covariates, and then studies that did not randomize or use control covariates and only reported descriptive results.
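The relationship between a reported coefficient and an odds ratio, and the role of an interaction term, can be illustrated with a short sketch. It assumes the coefficients come from a logistic (log-odds) model, which is an assumption about the underlying studies rather than something stated for each one; the numeric values and the function name `odds_ratio_from_beta` are hypothetical.

```python
from math import exp


def odds_ratio_from_beta(beta, se=None, z=1.96):
    """Exponentiate a log-odds coefficient; add a Wald CI if se is given."""
    odds_ratio = exp(beta)
    if se is None:
        return odds_ratio, None
    return odds_ratio, (exp(beta - z * se), exp(beta + z * se))


# Hypothetical logit coefficients (not taken from any included study):
beta_treatment = -0.50    # treatment effect for the reference group
beta_interaction = 0.30   # treatment x demographic-category interaction

# Treatment odds ratio for the reference group.
or_reference, _ = odds_ratio_from_beta(beta_treatment)
# Treatment odds ratio for the subgroup adds the interaction on the log scale.
or_subgroup, _ = odds_ratio_from_beta(beta_treatment + beta_interaction)
# The exponentiated interaction is the ratio of the two odds ratios.
or_ratio, _ = odds_ratio_from_beta(beta_interaction)

print(f"reference OR = {or_reference:.2f}")
print(f"subgroup OR  = {or_subgroup:.2f}")
print(f"ratio of ORs = {or_ratio:.2f}")
```

In this hypothetical case both odds ratios fall below 1, so discipline decreases for both groups, but the ratio above 1 means the reduction is smaller for the subgroup. That is the pattern an interaction term can reveal and a single-group pre/post comparison cannot.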
Table 3. Program by category interaction effects
Note. When only one number is reported in brackets, it represents a standard error rather than a confidence interval. RCT = randomized controlled trial; ORD = observational research design; ns = nonsignificant; OSS = out-of-school suspension; ODR = office discipline referral; PBIS = Positive Behavior Interventions and Supports; MTP-S = My Teaching Partner-Secondary; SWPBIS = School-Wide Positive Behavior Interventions and Supports; RJ = restorative justice; SWD = students with disabilities; AI/AN = American Indian/Alaskan Native.
Results
Descriptive Results
Our analytic sample contained nine articles on SWPBIS: five examined the efficacy of the program alone, two examined the efficacy of the program supplemented with cultural responsivity training and/or individual teacher coaching, and two examined the efficacy of an intervention that was not SWPBIS but included SWPBIS components (i.e., use of data-based decision making and universal language for behavior expectations). We also analyzed six studies focused on restorative justice, one of which evaluated any alternative to out-of-school suspension, including in-school suspension and restorative practices, and three of which analyzed the same data set collected from Denver Public Schools between 2011 and 2015. The remaining three studies also analyzed district-level data sets. We found just one study that referred to a social-emotional learning program (Osher et al., 2014), and this study included comprehensive, system-wide interventions in addition to the social-emotional learning component (e.g., student-centered planning teams, data-based decision making, and instructional coaches). Five articles evaluated the impact of teacher professional development on disciplinary outcomes; two of these supported teacher application of the Virginia Student Threat Assessment following an offense (Cornell et al., 2012; Cornell et al., 2018). The remaining three studies implemented professional development for teachers designed to reduce racial discipline disparities by addressing teacher attitudes, perceptions, skills, and actions.
Estimates of Program Efficacy in Reducing Disproportionality
Our first research question asked which programs were associated with reduced disparities in exclusionary discipline application for students who are African American, Latinx, or American Indian/Alaskan Native, male, labeled with a disability, or of low socioeconomic status. Given that the most direct way to examine differential impact was with studies that reported differences in slope between demographic groups before and after treatment, we tabulated studies that provided an interaction term between the category and treatment. We found only four studies that provided such a term, and these studies, reported in Table 3, yielded mixed results. Bradshaw et al. (2012) examined SWPBIS and found increased ODRs and suspensions by treatment for males and reduced ODRs but increased suspensions by treatment for students with disabilities, whereas Cruz and Rodl (2018) examined SWPBIS and found increased suspensions by treatment for African American and Latinx students. Anyon et al. (2016) examined restorative justice and found increased ODRs but reduced suspensions for African American students in treatment conditions.
The only program for which we found explicit disparity reduction by treatment was Gregory, Hafen, et al.’s (2016) study of a one-to-one coaching intervention, which found a larger reduction in exclusionary discipline by treatment for African American students compared with the overall student population. We found one additional study in which the authors conducted such an analysis but did not report the results due to nonsignificance (Gregory et al., 2018). Table 3 depicts the reported interaction terms for each study in ranked order of study methodology, and in what follows, we further substantiate these findings by discussing key themes found across the sample of 20 articles.
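To make the notion of an interaction term concrete, the following minimal sketch uses hypothetical counts (not drawn from any reviewed study) to show that, in the simplest unadjusted case, the exponentiated interaction coefficient from a logistic model with treatment, group, and treatment-by-group terms equals the ratio of the two group-specific treatment odds ratios. The function name and all numbers are illustrative assumptions; the reviewed studies generally used multilevel models with covariates, which this sketch does not reproduce.

```python
import math


def odds_ratio(events_treat, n_treat, events_ctrl, n_ctrl):
    """Odds ratio of an event (e.g., receiving an ODR), treatment vs. control."""
    odds_treat = events_treat / (n_treat - events_treat)
    odds_ctrl = events_ctrl / (n_ctrl - events_ctrl)
    return odds_treat / odds_ctrl


# Hypothetical counts of students with at least one ODR (illustration only).
# Reference group: 30 of 200 treated vs. 50 of 200 control students.
or_reference = odds_ratio(30, 200, 50, 200)
# Focal group: 60 of 200 treated vs. 80 of 200 control students.
or_focal = odds_ratio(60, 200, 80, 200)

# In a saturated logistic model, exp(interaction coefficient) is this ratio.
interaction_or = or_focal / or_reference
interaction_coef = math.log(interaction_or)

print(f"Treatment OR, reference group: {or_reference:.2f}")
print(f"Treatment OR, focal group:     {or_focal:.2f}")
print(f"Interaction OR (focal / reference): {interaction_or:.2f}")
```

In this hypothetical case both treatment odds ratios fall below 1, so discipline decreases in each group, yet the interaction odds ratio exceeds 1 because the reduction is smaller for the focal group. This is the pattern of overall reduction without disparity reduction that an interaction term is designed to detect.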
School-Wide Positive Behavior Interventions and Supports
The majority of studies, both in our initial search and final sample, examined SWPBIS (see Table 1). The five included studies indicated that (a) SWPBIS alone was either inconsistent or ineffective in reducing disparities and (b) the program may have benefited White, female students more so than other demographic groups. Only two studies provided direct interaction terms (Bradshaw et al., 2012; Cruz & Rodl, 2018), and findings suggested continued or worsening inequities for males, students with disabilities, and African American, Latinx, and American Indian/Alaskan Native students over the course of program implementation.
These results can be further explored alongside other studies that examined SWPBIS without interaction terms, which suggest that SWPBIS has provided general reductions in exclusionary discipline but has not reduced disparities among student groups. Vincent et al. (2011) found that using SWPBIS with fidelity did not affect the discipline disparity between African American and White students, and that, although suspensions for both groups decreased at an equal rate, African American students remained overrepresented in suspensions. Vincent et al.’s (2011) study also found that, in non-SWPBIS schools, the gap between African American and White students widened over time due to an increase in the number of suspensions for African American students. In their QED, Gage et al. (2019) found that African American students and students with disabilities in schools implementing SWPBIS with fidelity had significantly fewer out-of-school suspensions than those in nonimplementing schools. The authors reported standardized mean difference effect sizes for all students (OR = 0.37, g = −0.55) compared with students with disabilities (OR = 0.36, g = −0.56) and African American students (OR = 0.57, g = −0.31), which indicated that the program reduced suspensions for African American students but that the effect was smaller for this group than for the overall student population. Though studies show that SWPBIS reduces exclusionary discipline overall, the available evidence does not demonstrate a reduction in the disciplinary disparities that negatively affect marginalized student subgroups.
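The paired odds ratios and standardized mean differences reported above can be related through a common logit-based conversion, d = ln(OR) × √3 / π. The sketch below applies that conversion to the three reported odds ratios. Whether Gage et al. (2019) used this exact conversion is an assumption on our part, and the sketch ignores the small-sample correction that distinguishes Hedges’s g from d; the function name is illustrative.

```python
import math


def odds_ratio_to_smd(odds_ratio):
    """Logit-based conversion of an odds ratio to a standardized mean difference."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


# Odds ratios and g values as quoted in the text for Gage et al. (2019).
reported = {
    "all students": (0.37, -0.55),
    "students with disabilities": (0.36, -0.56),
    "African American students": (0.57, -0.31),
}

for group, (or_value, g_reported) in reported.items():
    d = odds_ratio_to_smd(or_value)
    print(f"{group}: OR = {or_value:.2f} -> d = {d:.2f} (reported g = {g_reported:.2f})")
```

Under these assumptions the converted values round to the reported effect sizes, which makes the comparison explicit: an odds ratio closer to 1 for African American students corresponds to a smaller standardized effect than that for the overall student population.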
Finally, some smaller descriptive studies (McIntosh et al., 2018; Scott et al., 2012) showed that single schools implementing a modified, equity-focused version of SWPBIS may be effective in reducing disparities. In these studies, researchers supported schools’ use of a disaggregated data-tracking system to assist faculty in identifying problems, building awareness, developing goals for improvement, and tracking progress with data, all concepts derived from SWPBIS. Similar to the classic SWPBIS studies discussed in the previous paragraph, these two studies documented changes in ODRs across time for all students, but differed in that they found a greater effect for African American students compared with White peers. However, because of their nonrandomized designs, these studies did not provide evidence of a direct causal connection between the programs implemented and reductions in discipline disparities.
Restorative Justice
Five ORD studies (see Table 2) indicated both the preventative benefits of strengthening teacher-student relationships and the intervention benefits of providing alternatives to suspension and expulsion; however, results were often complicated by teachers’ differential ability to implement recommended techniques. Similar to the findings in the SWPBIS literature, this set of research did not provide evidence that the program reduced discipline disparities despite reducing exclusionary discipline overall. Using the same data set from Denver Public Schools, three studies examined the impact of district policy changes that included staff training in restorative practices and policy recommendations that students be offered a restorative intervention following a discipline action. The only study in this group to report interaction terms was Anyon et al.’s (2016), which examined the differential processing of students (n = 9,921) after an infraction and found that being African American was associated with increased ODRs (OR = 1.41) but decreased out-of-school suspensions (OR = 0.80). However, both of these findings were nonsignificant, and African American students and students with disabilities remained overrepresented in exclusionary discipline despite longitudinal district-wide policy changes encouraging alternatives. Similarly, Gregory et al.’s (2018) findings, using the same data set with a postinfraction student sample (n = 9,039), demonstrated that student participation in a restorative intervention reduced the odds that a student would be suspended. However, after controlling for other covariates, including participation in restorative interventions, discipline-referred African American students were still 11% more likely to receive an out-of-school suspension compared with discipline-referred White peers. In the third Denver study, Anyon et al. (2014) included the entire student sample (n = 87,997) to analyze infraction prevention and found that schools reduced overall out-of-school suspensions through a combination of both in-school suspension (i.e., another exclusionary practice) and restorative approaches. The authors did not include interaction effects but reported that even after controlling for participation in restorative justice, sociodemographic characteristics such as race/ethnicity, gender, special education status, and socioeconomic status were all significantly associated with increased ODRs and out-of-school suspensions.
Several studies combined concepts from SWPBIS and restorative justice (e.g., Hashim et al., 2018), and these suggested potential systemic improvements for marginalized student groups. In addition to the studies suggesting that restorative approaches can positively impact “differential processing” (Gregory et al., 2018, p. 168) after a referral, other studies examined the potential of restorative approaches to transform teacher-student relationships and thus prevent ODRs. Together, these studies suggested the benefits of strengthening classroom climate and teachers’ ability to implement restorative justice. Mansfield et al. (2018) examined one large district implementing the SaferSanerSchools Whole-School Change program (Mirsky, 2011), which focused on mutual respect and community among school stakeholders. The authors reported a reduction in suspensions for African American students and students with disabilities but noted that disparities were not eliminated entirely. Similarly, using teacher reports of restorative justice implementation in the classroom and student reports of perceived teacher respect, Gregory, Clawson, et al. (2016) found that “higher Restorative Justice implementers narrowed the racial discipline gap but did not eradicate it in their referral patterns” (p. 342). Gregory, Clawson, et al. noted that because the relationship between implementation and students’ perceived respect from teachers held across racial groups, the discipline gap may be better conceptualized as a “relationship gap” (p. 345).
Other Programs
Studies of programs other than SWPBIS and restorative justice mainly focused on professional development related to culturally responsive or empathetic practices. These studies indicated the potential of one-on-one teacher coaching in preventing disparities in ODRs, but they showed mixed results regarding the impact of brief professional development in prevention efforts. Gregory, Hafen, et al.’s (2016) and Bradshaw et al.’s (2018) RCTs both examined the effects of one-to-one coaching programs in which teachers received ongoing and personalized feedback on instructional segments. Teachers in these interventions showed significantly lower use of ODRs, and, using a direct interaction term (see Table 3), Gregory, Hafen, et al. (2016) found that teachers who incorporated higher order thinking into their instruction significantly reduced ODRs for African American students (OR = 0.98, p < .01), suggesting that improved pedagogy affected discipline disparities.
Aiming to examine more cost-effective and sustainable approaches, Bradshaw et al.’s (2018) study also included five 60-minute faculty trainings on cultural responsiveness for all teachers, and they found that teachers who received the professional development training showed improved attitudes and beliefs about discipline but did not reduce their ODR frequency, indicating that without the individual coaching component, teachers struggled to implement suggested strategies. C. R. Cook et al. (2018) and Okonofua et al. (2016) also studied the impact of professional development programs. C. R. Cook et al.’s (2018) study assisted teachers in building respectful relationships and providing proportionate responses to student behavior in the classroom, whereas Okonofua et al.’s (2016) study supported teachers in adopting an empathetic mind-set when considering classroom discipline practices. Both studies uncovered promising results. In Okonofua et al.’s study, students in classrooms with a teacher who participated in the intervention were half as likely to be suspended over the school year compared with students in nontreatment classrooms (OR = 0.42, p < .001), and this effect held when controlling for student race, gender, and suspension in the prior year. However, neither study provided an interaction effect, and both sets of authors suggested the need for expanded future study.
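Because an odds ratio is not itself a ratio of probabilities, it can help to see how an odds ratio of 0.42 maps onto statements such as “half as likely.” The sketch below uses the standard relation RR = OR / (1 − p0 + p0 × OR), where p0 is the baseline risk in the comparison group. The baseline suspension rates are hypothetical values chosen for illustration, not figures from Okonofua et al. (2016), and the relation is only approximate for covariate-adjusted odds ratios.

```python
def odds_ratio_to_risk_ratio(odds_ratio, baseline_risk):
    """Convert an odds ratio to a risk ratio given the comparison-group risk."""
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


reported_or = 0.42  # odds ratio quoted in the text

# Hypothetical baseline suspension rates in nontreatment classrooms.
for baseline_risk in (0.05, 0.10, 0.30):
    rr = odds_ratio_to_risk_ratio(reported_or, baseline_risk)
    print(f"baseline risk = {baseline_risk:.2f} -> risk ratio = {rr:.2f}")
```

Across these assumed baseline rates the implied risk ratio stays between the odds ratio itself and roughly one half, and it moves closer to the odds ratio as the outcome becomes rarer.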
Gregory’s Framework
Our second research question sought to identify which components of Gregory et al.’s (2017) framework were present across the studies and whether patterns emerged regarding these components and the strength of study findings. As aforementioned, Gregory et al.’s framework stipulated the need for preventative approaches that reduce ODRs, intervention approaches that improve differential processing of referred students, and integrated systems that address both (i.e., multitiered systems of support). We coded articles for explicit inclusion of Gregory et al.’s components. For example, although SWPBIS is a system that emphasizes multitiered systems of support, Vincent and Tobin’s (2011) study focused on schools using the Effective Behavior Support Survey (Sugai et al., 2000), which focused primarily on Tier 1 implementation rather than implementation of the entire program. As described in Vincent and Tobin’s (2011) study, the survey focused on clearly defined behavior expectations and use of data to drive decisions. Therefore, we coded this study as an approach that provided data-based inquiry, opportunities for correcting and learning from behavior, and problem-solving approaches to discipline. We did not code this study as including multitiered systems, as there was no evidence that participating schools implemented all three tiers. Figure 1 depicts each component of the framework and the location at which each study fell based on at least one relevant code in the authors’ description (i.e., codes were not mutually exclusive).

Figure 1. Connections to Gregory et al.’s (2017) framework.
Preventative Approaches
We grouped preventative approaches into two categories: those in which the teacher provided preventative environmental adjustments, including (a) bias-aware practices, (b) academic rigor, and (c) culturally relevant teaching, and those in which interactions between teachers and students prevented the need for removal from the classroom, including (d) building supportive relationships and (e) opportunities for learning and correcting behavior within the classroom. Of the studies that examined preventative efforts (k = 11), the most abundant evidence existed for coaching or professional development addressing teachers’ ability to build supportive relationships and correct behavior with an empathetic lens. We found only one study that explicitly examined culturally responsive teaching and awareness of bias (Bradshaw et al., 2018), although several studies potentially included these components but placed explicit emphases on other factors. In these studies, researchers presented disaggregated data to faculty, which may be considered a way of building a more bias-aware teaching staff, as results from disaggregated data often led to faculty discussions about the reasons behind disproportionate ODRs (e.g., Scott et al., 2012). Finally, although Gregory’s framework suggested academic rigor and instructional responsiveness as key components in rectifying inequities, we found just one study (Gregory, Hafen, et al., 2016) that included academic rigor. Its results were promising, but the limited evidence for this preventative component represents a notable gap in the literature.
Intervention Approaches
Most studies examining intervention approaches included (a) data-based programming (k = 8), (b) problem-solving approaches to discipline (k = 11), (c) inclusion of student and family voice (k = 6), and (d) reintegration after conflict (k = 9). By design, restorative justice addresses these aspects of Gregory et al.’s (2017) framework directly. Studies focused on this program included problem-solving approaches by students, families, faculty, and leadership following an infraction and emphasized plans for students to repair harm and rejoin the learning environment. Similarly, Cornell et al. (2012) and Cornell et al. (2018) examined the use of the Virginia Threat Assessment protocol to inform disciplinary responses for students who had made a threat of violence. Although threats of violence in schools remain rare, responses may be subject to bias and, therefore, are a working component in the school-to-prison pipeline. Relative to control conditions, the protocol was effective in funneling students into counseling rather than long-term suspension or expulsion. The authors found no significant disparities between African American, Latinx, and White students in long-term suspension rates after use of the protocol. These studies, along with the restorative justice research, provided evidence in support of problem-solving approaches to discipline in improving differential processing for African American students, in particular. Furthermore, these studies suggested that when interdisciplinary teams (e.g., school psychologists, teachers, families, and administrators) critically analyze discipline cases, their review may support informed decisions that reduce bias in the process of assigning consequences. However, our results indicate that these components alone are not sufficient to eradicate entrenched inequities, as they are ineffective at preventing ODRs for marginalized groups.
Whole-School Approaches
Although Gregory et al. (2017) outlined a limited description of systems that address both prevention and intervention (i.e., multitiered systems of support), we found several studies (k = 6) that comprised various systemic improvements meant to achieve reduction in ODRs and suspensions. Osher et al. (2014) studied the effects of comprehensive, district-wide interventions (e.g., student-centered planning teams, data-based decision making, and staffing schools with instructional coaches), and Hashim and colleagues (2018) studied a layered approach to discipline (i.e., the district began with SWPBIS, enacted policies banning suspension for defiance, and ultimately adopted a restorative justice philosophical approach). In accordance with Gregory’s recommendations, both studies examined districts that implemented multitiered systems, the use of data to refine interventions, alignment of prevention and intervention systems to address immediate needs, and support of schools and staff to implement research-based programs. In both studies, suspensions trended downward, but disparities remained for “frequently disciplined subgroups” (Hashim et al., 2018, p. 184). Furthermore, it remains unclear whether reductions were related to increased teacher capacity or the districts’ policy shifts, such as a ban that prevented suspension without necessarily altering school climate and classroom management.
Local Control and Sustained Uptake
Our final research question sought to identify the feasibility of identified programs for application in authentic school contexts. As Fixsen et al. (2013) noted, implementation is complex and context specific; there are critical differences in program feasibility given the context of everyday school settings and factors required for successful implementation and sustained uptake (e.g., training time, fiscal demands, human resource demands, program complexity). To examine the viability of programs in real-world applications, we assessed the ways in which studies trained faculty and leadership and enacted scalable, sustained uptake.
Our analysis showed that, though simple and cost-effective, traditional models of teacher professional development meant to add cultural responsivity to programs already being implemented in districts were less effective than ongoing, growth-in-practice approaches. However, it is critical to note that one such growth-in-practice approach, one-to-one coaching, was not sustainable. Gregory, Hafen, et al. (2016) and Bradshaw et al. (2018) both examined coaching models implemented by researcher-trained and researcher-supervised coaches, and study authors acknowledged the “additional cost and burden associated with coaching individual teachers, both in terms of teacher time and coach time” (Bradshaw et al., 2018, p. 122). To this point, Osher et al.’s (2014) study included support from coaches hired from within the district and similarly indicated that training costs and hiring difficulties created significant barriers for the one-to-one coaching portion of the intervention, likely contributing to the study’s null findings.
Whereas our findings indicate the complexity and system-wide investment required to disrupt entrenched inequities in discipline disparities, approaches must be jointly anchored in feasibility and effectiveness to optimize the likelihood of achieving equitable practices and outcomes in real-world school settings (C. R. Cook et al., 2018). Hashim et al. (2018) described a 10-year staged reform series in Los Angeles Unified School District that required extensive faculty and leadership training, an independently hired implementation auditor, a district-level task force, and a more than $4.9-million cost to the district. Even still, the largest reduction in disparities occurred only after the district prohibited student suspensions for willful defiance, rather than after implementation of SWPBIS and restorative interventions. Though this indicates the need for both policy and practice solutions, it is unclear whether the costly measures undertaken by the district improved climate and culture in conjunction with the suspension ban. On the other hand, although some studies’ interventions implemented shorter, more cost-effective teacher-training modules (e.g., Okonofua et al., 2016), studies in our sample that implemented such professional development demonstrated mixed or null results. It is critical for future research to grapple with this tension.
Treatment Versus Intent to Treat
To further examine evidence for feasibility and sustained uptake, we identified the differences in each study’s approach to application in authentic school contexts. We encountered critical differences between results of studies that used treatment-to-fidelity approaches (e.g., Gage et al., 2019) versus intent-to-treat approaches that ignored treatment adherence and prioritized the randomization process (e.g., Gregory, Hafen, et al., 2016). B. G. Cook and Odom (2013) argued that “implementation is the critical link between research and practice” (p. 138), and if schools struggle to implement with fidelity, program efficacy matters little. It is no surprise that researchers who took a treatment-to-fidelity approach—eliminating schools that tried a program but did not achieve fidelity (i.e., Gage et al., 2019)—demonstrated higher levels of efficacy than researchers who took an intent-to-treat approach and kept schools in the sample regardless of fidelity (i.e., Cruz & Rodl, 2018). It is well established that implementation fidelity is critical (Fixsen et al., 2013; Kim et al., 2018), and our results further demonstrate that schools may struggle to implement particular interventions without robust researcher support (e.g., C. R. Cook et al., 2018) or external grant funding (e.g., Mansfield et al., 2018). Both demonstration studies, which provide direct researcher support for the intervention, and routine studies, which only classify schools achieving a certain metric of fidelity as treated schools, can provide valuable evidence that a program may work. However, further research is needed to establish whether schools can implement these programs on a routine basis to fidelity without considerable outside support and funding.
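The gap between treatment-to-fidelity and intent-to-treat estimates can be illustrated with simple arithmetic. The sketch below uses hypothetical suspension rates and a hypothetical share of schools reaching fidelity, and it assumes that assigned schools that do not reach fidelity have the same outcomes as control schools. None of these values come from the reviewed studies, and the function name is illustrative.

```python
def compare_estimates(control_rate, fidelity_rate, share_reaching_fidelity):
    """Return (intent-to-treat effect, treatment-to-fidelity effect) as rate differences."""
    # Assumption: assigned schools that miss fidelity look like control schools.
    assigned_rate = (
        share_reaching_fidelity * fidelity_rate
        + (1 - share_reaching_fidelity) * control_rate
    )
    itt_effect = assigned_rate - control_rate
    fidelity_effect = fidelity_rate - control_rate
    return itt_effect, fidelity_effect


# Hypothetical values: 10% suspension rate in control schools, 6% in schools
# implementing with fidelity, and 60% of assigned schools reaching fidelity.
itt_effect, fidelity_effect = compare_estimates(0.10, 0.06, 0.60)

print(f"Intent-to-treat effect:       {itt_effect:+.3f}")
print(f"Treatment-to-fidelity effect: {fidelity_effect:+.3f}")
```

Under these assumptions the intent-to-treat estimate is the fidelity-based estimate scaled by the share of schools that reach fidelity, so it is necessarily smaller in magnitude. A fidelity-based comparison carries its own caveat: schools that reach fidelity may differ systematically from those that do not, so the larger estimate may partly reflect selection rather than program effect.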
Discussion
In recent decades, scholars have undertaken considerable effort to understand discipline disparities (Welsh & Little, 2018), yet there is a dearth of quality research focused explicitly on disparity-reducing interventions. Our findings indicate that research on school discipline has largely cohered not only around a “color-evasive” approach (Annamma et al., 2017) but also around a larger, neurotypical, and socially normative approach that avoids addressing the wide range of variability present in diverse classrooms. This has resulted in insufficient data regarding the extent to which embedded structural and personal biases affect intervention effectiveness. The studies we found that overtly examined differential reductions concluded with a common refrain: “Although our data suggests that the rate of suspension and expulsion decreased, disparities may remain” (Osher et al., 2014, p. 1). The primary purpose of this analysis was to provide relevant program direction to districts seeking to reduce disparities in exclusionary discipline. It is clear that schools have a variety of options for reducing ODRs, out-of-school suspensions, and expulsion rates, and it is encouraging that many of these options reduced exclusionary practices overall. However, it is also clear that schools lack programming to reduce disproportionality in exclusionary discipline practices.
Our analysis indicated that schools have several comprehensive programs available that address student-teacher relationships and teacher practice at various points on the discipline continuum. Programs under study demonstrated a shift from exclusively focusing on student behavior toward relationships within the school and community, as recommended in Gregory et al.’s (2017) framework. A small set of studies indicated that teacher coaching strategies may lead to reduced exclusionary discipline for all students, including those belonging to vulnerable groups, but few provided significant evidence of disparity reduction, and—critically—many require costly adoption measures and ongoing external support, which may limit true potential for impact (Smolkowski, Strycker, et al., 2016).
There is substantial evidence from rigorous research that SWPBIS reduces ODRs and out-of-school suspensions, and this is reflected in our analysis. Perhaps because SWPBIS is not designed with a specific equity focus, we did not find evidence that SWPBIS alone is effective at reducing disparities. In fact, we found evidence that it may exacerbate gaps in certain contexts. Vincent and Tobin (2011) posited that we still know very little about how SWPBIS implementation differs in culturally homogeneous versus culturally heterogeneous schools and that context may be an important consideration that has gone unaddressed in SWPBIS implementation. Welsh and Little (2018) discussed the conceptual underpinnings of programs primarily designed to address student behavior without consideration of larger underlying drivers of disproportionality, and they theorized that “addressing the biases and cultural clashes that may be driving discipline disparities” (p. 773) is a critical component in addressing entrenched inequities. McIntosh et al. (2018) and Scott et al. (2012) provided preliminary evidence that being intentional with school personnel regarding the goal of reducing inequitable discipline practices—building awareness around disparities and tracking progress with disaggregated data—can supplement an SWPBIS framework, but these strategies are still unlikely to address the underlying drivers that Welsh and Little emphasized.
We also confirmed a paucity of rigorous evidence for restorative justice’s capacity to reduce exclusionary discipline, both overall and in terms of disparities. We found no high-quality RCTs examining restorative justice, although some articles indicated that larger RCTs may be forthcoming. Studies that used extensive control covariates in an ORD were largely focused on the same data set (i.e., Denver Public Schools), and these examined differential processing rather than prevention. Though the differential processing line of research is critical—given that African American and Latinx students often receive harsher consequences after a disciplinary infraction—programs addressing postinfraction processing are situated on just one end of the prevention-intervention continuum. These studies did not examine the potential for restorative justice to prevent initial infractions, and, as was the case in Denver Public Schools, they often required a student to take sole responsibility for an incident through the reintegration process. In the Denver Public Schools studies, if students failed to accept responsibility because of perceived injustice or unfairness in the initial infraction assignment, they were assigned an exclusionary consequence. Restorative justice advances the idea that authentic spaces be built for students and staff to work through conflict, but these spaces are not power-free and can quickly become a site for surveillance when students are forced to share their motivations for certain behavior (Lustick, 2017). Whereas Mansfield et al. (2018) stated that a restorative approach prioritizes “engaging students socially in the school community” as opposed to “social control” (p. 306), our results suggest that—in forcing students to admit to and repair harm on the intervention end of the continuum without addressing fairness through prevention—a level of social control remains present in some of this research. 
Given the popularity of the program among social justice leaders and within the public lexicon, and the potential ability of the program to address student-teacher relationships and school climate, further research using rigorous methodology is critical.
Considering the aforementioned studies on restorative justice, and the two studies that examined the Virginia Student Threat Assessment protocol (Cornell et al., 2012; Cornell et al., 2018), there are options for schools seeking equity in differential processing following a serious infraction (e.g., a threat of violence). We found that schools may address this issue through use of uniform protocols that provide alternatives to the zero-tolerance narrative. Multidisciplinary, school-based teams trained to assess and respond to student infractions with a well-defined yet flexible process should apply such a protocol (Cornell et al., 2012), and the process should support students who have caused harm in the learning environment to repair the harm and remain in or return to the learning environment (Anyon et al., 2016). As mentioned, however, these studies did not provide guidance on prevention efforts, again indicating a focus on the reactionary side of the school-discipline continuum. Schools that support more positive social bonds among practitioners and students and increase feelings of belonging for all students may reduce the need for these processing protocols.
In conjunction with Welsh and Little (2018), our results highlight the continued proliferation of “color-evasive” interventions available to schools. However, we identified a small set of studies (e.g., Gregory, Hafen, et al., 2016; Okonofua et al., 2016) that did not identify specific equity foci within their intervention frameworks but still found disparity reduction for African American students in particular. These studies showed that teachers who gained a deeper understanding of students as individuals, and who provided instruction that communicated high expectations for analytic thinking, can become a powerful driver for disparity reduction, even without explicit training to reduce implicit bias and increase cultural consciousness. These interventions should be implemented with caution, as we also found that key environmental factors in learning environments did not always provide a uniformly positive impact on different groups; the impact of risk and protective factors varies based on developmental timing, family and social circumstances, and niche-specific contexts that exist across cultures (Masten, 2015). For example, it may be that certain programs in the school context (i.e., SWPBIS absent culturally responsive pedagogy) may function as a protective factor for White and female students while failing to do so for non-White and male students. Welsh and Little (2018) questioned whether the conceptual underpinnings behind the array of available alternative approaches sufficiently address the sociocultural causes of discipline disparities. This is a critical question for future study, especially given that some scholars assert that interventions meant to “fix students of color” (Gorski, 2019, p. 58) perpetuate deficit ideologies and, thus, exacerbate racial inequities.
Finally, we found little evidence that studies have explicitly addressed students’ intersectional identities in the context of unfair discipline practices, and the ways in which oppressive policies and practices have disparate effects on groups whose needs are poorly served through policy and practice design (Crenshaw, 1991). Studies most commonly included disaggregated racial outcomes, and a small set of studies included disability labels or gender, but almost no studies included outcomes for the most vulnerable students (e.g., African American males with disabilities). We believe this relates to insufficient theorization in this literature of the underlying drivers of discipline disparities. Though Mansfield et al. (2018) included the importance of intersectionality in selecting interventions meant to support marginalized students, and Gregory, Clawson, et al. (2016) discussed the humanist origins behind restorative justice in regard to its use in building equity, clear grounding in an established theoretical framework is critical in guiding researchers to select interventions that address underlying drivers of disparities and, thus, affect outcomes. Future studies should be grounded in a theoretical or conceptual framework that provides a rationale for the selected intervention, analytic design, and included covariates. Doing so may more effectively target the intersectional impact of racism, ableism, and sexism that pervades inequitable systems within schools (Annamma, 2014) but remains unaddressed in existing programs. Given the complexity of inequities, as evidenced in the efficacy of programs in reducing discipline enactment but not disparities, it is likely that disparate practices are rooted in different student groups accessing fundamentally different school programs and resources in a given context (Carter et al., 2013; Carter et al., 2017; Orfield & Ee, 2014). Thus, clarifying the epistemological assumptions underlying proposed solutions is critical.
Limitations
This review is limited in that it did not include books, book chapters, dissertations, or other works apart from peer-reviewed journal articles. Several studies, particularly doctoral dissertations related to restorative justice, may have provided additional evidence of its efficacy (see Fronius et al., 2019, for a complete review). However, we chose not to include these studies, given that they had not undergone peer review. Additionally, Fronius et al. (2019) reported that several large-scale RCTs of restorative justice were under way, but these had not been published at the time of this writing. Future work should consider these studies in an effort to reduce publication bias and increase our understanding of restorative justice’s efficacy in reducing disparities. This review also did not include qualitative studies because we aimed to focus on direct evidence of disparity reduction rather than on the processes and perceptions experienced by practitioners implementing these interventions. However, we acknowledge that additional information related to discipline disproportionality is available from a wide range of sources that are relevant in informing future implementation research.
In addition, this study was limited in that we were not able to conduct a full-scale meta-analysis. We reported odds ratios as they appeared in the original studies. Although odds ratios are a measure of effect size in that they identify the strength and direction of a relationship, we recognize that they are unstandardized effect sizes. Studies that use different units of measure for the dependent variable (e.g., counts vs. binary indicators) are therefore difficult to compare. The limited number of available studies, along with their vastly different methodological designs, analyses, covariate adjustments, and samples, meant that a meta-analysis was not feasible. Future research should consider the smaller set of RCTs available and use meta-analytic procedures to explore moderators for each program type.
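To illustrate why effect sizes reported on different metrics resist direct comparison, the following minimal sketch computes two common ratio measures from the same 2x2 table. The counts are invented for illustration and are not drawn from any reviewed study; the function names are likewise hypothetical.

```python
def odds_ratio(treated_yes, treated_no, control_yes, control_no):
    """Odds of the outcome in the treated group divided by the odds in the control group."""
    return (treated_yes / treated_no) / (control_yes / control_no)


def risk_ratio(treated_yes, treated_no, control_yes, control_no):
    """Proportion with the outcome in the treated group divided by the control proportion."""
    treated_risk = treated_yes / (treated_yes + treated_no)
    control_risk = control_yes / (control_yes + control_no)
    return treated_risk / control_risk


# Hypothetical counts (assumption): 20 of 100 program students suspended,
# 40 of 100 comparison students suspended.
table = (20, 80, 40, 60)

or_value = odds_ratio(*table)   # approximately 0.375
rr_value = risk_ratio(*table)   # approximately 0.5

print(f"odds ratio: {or_value:.3f}")
print(f"risk ratio: {rr_value:.3f}")
```

Both values fall below 1 and so agree on direction, but they differ in magnitude even though they describe identical data. Pooling such estimates across studies without conversion to a common metric would therefore misstate program effects.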
Conclusions and Future Directions
This study examined key components of disparity-reducing discipline programs, in alignment with Gregory et al.’s (2017) framework, and the available evidence underlying these programs’ impact on discipline disparities. Our analysis indicated a trend toward research on multilevel programs that address student-teacher relationships and teacher practices. We also uncovered several gaps in the literature that should be prioritized in future study. We found that few studies offered disaggregated estimates of program efficacy by demographic group, a critical aspect of research if we are to understand whether such programs actually address unfair practices. Future studies should include interaction terms wherever possible to further elucidate impact on marginalized groups. Additionally, we found that the ways in which treatment fidelity was considered affected interpretation of program efficacy. We therefore emphasize that future studies must be clear about how fidelity is measured and included in analyses. Although preliminary studies might examine treatment fidelity, intervention research must eventually consider routine implementation before a program can be considered effective and, ultimately, feasible. Program implementation studies could begin with demonstration studies, and, when these are found to be efficacious, researchers should then proceed to studies that examine implementation in typical school conditions prior to costly, highly controlled RCTs (Hill et al., 2013). Additionally, future studies should be clear about whether a program supports prevention or intervention, or whether the program addresses multiple aspects of the discipline continuum, in line with Gregory et al.’s (2017) framework.
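As a minimal sketch of what an interaction term captures, consider a program evaluated separately in two student groups. In a logistic model containing only treatment, group, and their product, the interaction coefficient equals the log of the ratio of the two group-specific odds ratios. The counts and group labels below are invented for illustration and do not represent any reviewed study.

```python
import math


def odds_ratio(treated_yes, treated_no, control_yes, control_no):
    """Odds of the outcome in the treated group divided by the odds in the control group."""
    return (treated_yes / treated_no) / (control_yes / control_no)


# Hypothetical counts (assumption), ordered as:
# (treated with referral, treated without, control with referral, control without)
group_a = (10, 90, 20, 80)  # program appears protective for this group
group_b = (30, 70, 30, 70)  # program shows no association for this group

or_a = odds_ratio(*group_a)
or_b = odds_ratio(*group_b)

# Ratio of odds ratios, with group_b as the reference group.
ratio_of_ors = or_a / or_b

# Equivalent to the treatment-by-group coefficient in the saturated logistic model.
interaction_coef = math.log(ratio_of_ors)

print(f"OR group A: {or_a:.3f}, OR group B: {or_b:.3f}")
print(f"interaction (log ratio of ORs): {interaction_coef:.3f}")
```

A nonzero interaction coefficient signals that the program's association with discipline outcomes differs between groups, which is the pattern a pooled estimate would conceal. In practice, the coefficient would be estimated with a regression model that adjusts for covariates and reports a standard error, rather than computed from raw counts.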
Our analysis also highlighted the importance of local context in understanding students’ social-emotional needs and development. Although we identified some studies that examined national- and state-level data, most analyzed a single district with unique needs and resources. For example, Osher et al.’s (2014) study of a Cleveland-area school district demonstrated the need to understand an educational agency’s unique challenges before implementing a program meant to support the school, district, and community. Programs in urban California may serve needs and goals different from those of programs supporting students in rural Idaho. Studies that use learning-lab techniques (Bal et al., 2018) or design-based school improvement techniques (Mintrop, 2016) that engage with schools’ and districts’ contextual needs are an area for future research in reducing discipline disparities. Researchers must be explicit in describing the local contexts in which interventions are studied.
Finally, one of our most notable findings was that one study (Gregory, Hafen, et al., 2016) provided robust evidence of disparity reduction even though the intervention focused on instructional practices rather than on equity-specific discipline practices. This is notable because few studies on exclusionary discipline have examined the relationship between classroom instruction and classroom management, both of which are critical factors in student engagement. The idea that teachers should hold high expectations for all students is well established in the empirical base for culturally responsive pedagogy (see Hammond, 2014; Ladson-Billings, 1995; Valenzuela, 1999). That teachers in Gregory, Hafen, et al.’s study integrated higher order thinking skills into their pedagogy and, in doing so, significantly reduced discipline disparities indicates the need for further analysis of increased academic rigor as a way to reduce discipline disparities. The authors hypothesized that, through “the opportunity to engage in cognitively demanding problem-solving tasks, Black students may detect their teachers’ high expectations and confidence in them as scholars” (Gregory, Hafen, et al., 2016, p. 186).
We believe it is a critical gap that recommendations for improving discipline practices have revolved around school and classroom climate, multitiered systems of support, and collaboration among practitioners (Morgan et al., 2014) without attending to the depth and rigor of instruction. We cannot overstress the need for further research in this area, as it holds potential both for increasing positive relationships between teachers and students and for building academic opportunities for marginalized groups. It may be that “training teachers to strengthen the motivating and engaging qualities of instruction” (Anyon et al., 2016, p. 1688) functions as a prevention strategy. Though this is an understudied approach in the disparity reduction literature, we encourage schools and districts wishing to address discipline inequities to consider the integrated nature of pedagogy and social/emotional supports.
In addition to understanding the dynamic nature of schools as they comprise students’ sociocultural contexts, studies must consider the ways that programming may improve school and classroom climate and student-teacher interactions through instructional and pedagogical design. Though we found a lack of empirically supported solutions, there is reason for optimism as the field learns from this body of work and systematically tests innovative approaches in line with Gregory et al.’s (2017) framework. As disparities research identifies systemic and structural issues that pervade educational opportunity, this study provides insight into a small but critical aspect of school quality. Given the persistent racial and class stratification that exists in society (e.g., racial segregation, discriminatory housing policies, unequal access to health care; see Desmond & Emirbayer, 2015) and that, in turn, affects how schools and districts operate and ultimately shapes educational opportunities and outcomes (Carter et al., 2013), future educational research has the potential to provide a rich understanding of interventions that build students’ identities as learners and allow teachers to bring empathy and understanding to the learning environment.
Authors
REBECCA A. CRUZ earned her PhD in special education from a joint doctoral program between University of California, Berkeley, and San Francisco State University (SFSU) and is an assistant professor of education at San Jose State University, 1 Washington Sq., San Jose 95112-3613, USA; email:
ALLISON R. FIRESTONE is a PhD candidate in the University of California, Berkeley, and San Francisco State University’s joint doctoral program in special education (University of California Berkeley, 2121 Berkeley Way, 4th Floor, GSE, Berkeley, CA 94118; email:
JANELLE E. RODL earned her PhD in special education from a joint doctoral program between University of California, Los Angeles, and California State University, Los Angeles, and is an assistant professor of special education (mild/moderate disabilities) at San Francisco State University, 1600 Holloway Ave., San Francisco, CA 94132-1722, USA; email:
