Abstract
Racial disparities in school discipline remain central to education policy discussions. Recent research points to the importance of how discipline disparities are measured for the conclusions drawn about the extent of the problem and whether such disparities are improving. This brief uses data from Maryland to demonstrate how the choice of metric for the Black-White discipline gap can drastically change conclusions about whether the gap is closing or widening, as well as about whether particular districts or schools have high or low racial disparities in discipline. Implications for educational researchers and practitioners studying school discipline are discussed.
Purpose of This Research Brief
School discipline has garnered increasing attention among policymakers and the public. Much of this has been driven by evidence of racial disparities in discipline (Gregory et al., 2010). Largely absent from such discussions, however, has been a consideration of how the metrics used to estimate racial disparities may matter for the conclusions drawn. Recent work by Girvan et al. (2019) found that metrics of discipline disparities differ in their properties in ways that may influence conclusions drawn from their use.
Although such measurement issues have been described in a select set of practitioner guides (Bollmer et al., 2014; Petrosino et al., 2017) and policy reports (Curran & Finch, 2018; Porowski et al., 2014), in public testimony (Losen, 2018), and in other commentary (Scanlan, 2015), most policy documents and peer-reviewed research studies of school discipline rely on a single metric of racial disparities, with little consideration of how doing so may affect the conclusions (Gregory et al., 2010; Scanlan, 2015). This contrasts with the study of racial disparities in other areas of education (e.g., Bollmer et al., 2007). For example, in special education much consideration has been given to how racial disparities in placement are measured, and federal guidance codifies approaches to measuring special education disparities (see Bollmer et al., 2014; Office of Special Education Programs, 2017).
This research brief compares two commonly used approaches to measuring racial disparities in discipline, namely risk ratios (RR) and risk differences (RD), along with a third metric, raw differential representation (RDR), which prior work has argued has desirable properties (Girvan et al., 2019). Using data from Maryland as an example, this brief demonstrates that the choice of metric can significantly alter conclusions about whether discipline gaps are closing over time and whether particular districts have relatively high or low racial disparities in discipline. In doing so, it highlights the practical implications of the measurement issues discussed by Girvan and colleagues (2019).
Methodology
This analysis drew on data from the Maryland State Department of Education (MSDE) and from the federal Department of Education’s Office for Civil Rights. Conclusions about the Black-White discipline gap were compared using RRs, RDs, and RDR. The methodological appendix contains further details about the data and analysis.
RRs (see Equation 1) are one of the most common ways to measure racial disparities in discipline. They are also the metric codified by the Department of Education for measuring racial disparities in special education placement (Office of Special Education Programs, 2017).
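Written out, a risk ratio divides one group’s risk of discipline by the other’s. In the notation below, which is assumed here for illustration, n_B and n_W are the numbers of Black and White students disciplined, N_B and N_W are their enrollments, and p_B and p_W are the resulting discipline rates:

```latex
\mathrm{RR} = \frac{p_B}{p_W} = \frac{n_B / N_B}{n_W / N_W}
```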
RRs yield the commonly cited statistic that Black students are three to four times as likely as their White peers to experience a suspension from school. 1
In contrast, RDs (see Equation 2) represent the difference between the proportion of Black students suspended and the proportion of White students suspended.
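Using the same kind of notation (assumed here for illustration: n_B and n_W are the numbers of Black and White students disciplined, N_B and N_W their enrollments, and p_B and p_W their discipline rates), the risk difference is a simple subtraction of rates:

```latex
\mathrm{RD} = p_B - p_W = \frac{n_B}{N_B} - \frac{n_W}{N_W}
```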
For example, if a given district disciplined 15% of its Black students and 5% of its White students, it would have an RR of 3 (15% divided by 5%) and an RD of 0.10 or, equivalently, 10 percentage points (15% minus 5%).
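The arithmetic in this example can be checked with a short sketch. The function names are hypothetical and the rates are entered as proportions:

```python
def risk_ratio(black_rate: float, white_rate: float) -> float:
    """Risk ratio (RR): Black discipline rate divided by White discipline rate."""
    return black_rate / white_rate


def risk_difference(black_rate: float, white_rate: float) -> float:
    """Risk difference (RD): Black rate minus White rate, as a proportion."""
    return black_rate - white_rate


# Worked example from the text: 15% of Black and 5% of White students disciplined.
rr = risk_ratio(0.15, 0.05)
rd = risk_difference(0.15, 0.05)
print(f"RR = {rr:.2f}")  # RR = 3.00
print(f"RD = {rd:.2f} ({rd * 100:.0f} percentage points)")  # RD = 0.10 (10 percentage points)
```

Floating-point division returns a value a hair below 3, which is why the sketch formats the results to two decimals.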
A third measure, RDR, captures information about the number of students affected by the disparity (a feature not captured by RR or RD).
RDR represents the number of Black students who experience a disciplinary event above and beyond the number expected if rates of discipline for Black and White students were equal. The RDR metric has the benefit of being more stable over time and of communicating the number of students affected by the disparity (Girvan et al., 2019). It is, however, highly correlated with the enrollment size of the group of interest and therefore has limited utility for comparisons across districts with varying enrollment of the target group. 2 Table 1 provides a concise summary of the relative strengths and weaknesses of each of these three metrics.
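A minimal sketch of RDR follows. It assumes that "the number expected if rates were equal" means applying the White discipline rate to Black enrollment, so RDR = n_B − N_B × (n_W / N_W), which works out to Black enrollment multiplied by the RD. The function name and district counts are hypothetical:

```python
def raw_differential_representation(black_enrolled: int, black_disciplined: int,
                                    white_enrolled: int, white_disciplined: int) -> float:
    """Black students disciplined beyond the count expected at the White rate.

    Assumption for this sketch: the White rate is the baseline for "equal rates".
    """
    white_rate = white_disciplined / white_enrolled
    expected_black = black_enrolled * white_rate  # count if Black students had the White rate
    return black_disciplined - expected_black


# Hypothetical district: 15% of 2,000 Black and 5% of 4,000 White students disciplined.
print(raw_differential_representation(2000, 300, 4000, 200))  # 200.0

# Same rates with twice the Black enrollment: RR and RD are unchanged, but RDR doubles.
print(raw_differential_representation(4000, 600, 4000, 200))  # 400.0
```

The second call illustrates why RDR tracks the size of the target group's enrollment.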
Table 1. Comparison of Discipline Disparity Metrics
Note. Strengths and weaknesses summarized here are drawn in part from findings in Girvan et al. (2019), where readers can find greater detail about the technical properties of each metric.
Key Findings
Is the Black-White Discipline Gap in Maryland Closing or Widening? Depends on the Metric
Conclusions about changes in discipline gaps over time can be sensitive to the metric used. Panel 1 of Figure 1 shows the percentage of students who experienced a suspension or expulsion in Maryland from 2010 to 2015. Overall rates of suspension or expulsion for both Black and White students decreased during this period; however, whether the Black-White discipline gap decreased or increased depends on how it is measured. As shown in Panel 2 of Figure 1, the Black-White discipline gap as measured by the RR increased during this period, whereas the gap as measured by the RD decreased. This is consistent with findings that RRs tend to be negatively correlated with underlying rates of discipline, whereas RDs tend to be positively correlated with them (Girvan et al., 2019; Scanlan, 2015). As shown in Panel 3, RDR also decreased, from approximately 20,000 students to 15,000, more closely aligning with the trend in RD than in RR. To the extent that stakeholders have evidence that overall rates of discipline are decreasing, RD or RDR may be more desirable measures of disproportionality, because they are less susceptible than RR to showing increasing disparities merely as a function of decreases in underlying discipline rates.
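A hypothetical illustration of how falling rates for both groups can move RR and RD in opposite directions. The rates below are invented for the example, not the Maryland figures in Figure 1. Both groups' rates fall, with the White rate falling proportionally faster:

```python
def gap_metrics(black_rate: float, white_rate: float) -> tuple[float, float]:
    """Return (RR, RD in percentage points) for a pair of subgroup rates."""
    return black_rate / white_rate, (black_rate - white_rate) * 100


# Hypothetical rates (proportions suspended or expelled), not the Figure 1 data.
earlier = gap_metrics(0.15, 0.05)   # Black 15%, White 5%
later = gap_metrics(0.12, 0.035)    # Black 12%, White 3.5%

for label, (rr, rd_pp) in (("Earlier year", earlier), ("Later year", later)):
    print(f"{label}: RR = {rr:.2f}, RD = {rd_pp:.1f} p.p.")
# Earlier year: RR = 3.00, RD = 10.0 p.p.
# Later year: RR = 3.43, RD = 8.5 p.p.
```

Here the gap "widens" by RR and "narrows" by RD even though both groups are being disciplined less often.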

Figure 1. Black and White suspension or expulsion rates over time with corresponding Black-White discipline gap measured as risk ratios, risk differences, and raw differential representation.
The Curious Case of Montgomery County Public Schools
The choice of metric also can influence conclusions about whether a particular district has a relatively large or small racial discipline gap (Curran & Finch, 2018). Table 2 shows the Black-White discipline gap measured at the school district level for each district in Maryland for the 2015–2016 school year. The left-hand panel shows district-level enrollment and number of suspensions by race for districts ordered alphabetically. The right-hand panel shows rates of discipline by race along with RR, RD, and RDR, each ordered by magnitude of the rate or metric of disparity.
Table 2. Maryland School District Enrollment and Discipline Data Arranged by Black-White Discipline Gap by Metric Type
Note. Disparities in OSS rate calculated from 2015–2016 Civil Rights Data Collection data. OSS rates represent unduplicated counts of the number of students of a subgroup experiencing OSS relative to the number of students of each subgroup in the district as reported by the district to the federal government. Districts with fewer than 20 White or Black students were omitted due to low sample size. OSS = out-of-school suspension; p.p. = percentage points.
Strikingly, Montgomery County Public Schools (in bold in Table 2) appears at the top of the RR list and at the bottom of the RD list. In other words, Montgomery has either the largest Black-White discipline gap in the state (as measured by RR) or the smallest (as measured by RD). Notably, in terms of raw rates of discipline, Montgomery has the lowest White and Black suspension rates of any district in the state. This may explain, in part, the stark differences in Montgomery’s ranking across RR and RD, as RRs tend to be higher and RDs lower when underlying rates of discipline are lower. Consequently, examining RRs and RDs together with the underlying rates of discipline is important for contextualization.
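The ranking reversal can be reproduced with a small hypothetical example. The three districts and their rates are invented, not taken from Table 2; District A is given the lowest underlying rates:

```python
# Hypothetical districts (not Table 2 data): (Black rate, White rate) as proportions.
districts = {
    "District A": (0.04, 0.01),   # low underlying rates
    "District B": (0.12, 0.06),
    "District C": (0.20, 0.12),   # high underlying rates
}


def rank(metric):
    """Return district names ordered from largest to smallest gap under `metric`."""
    return sorted(districts, key=lambda d: metric(*districts[d]), reverse=True)


by_rr = rank(lambda black, white: black / white)
by_rd = rank(lambda black, white: black - white)
print("Ranked by RR:", by_rr)  # Ranked by RR: ['District A', 'District B', 'District C']
print("Ranked by RD:", by_rd)  # Ranked by RD: ['District C', 'District B', 'District A']
```

As with Montgomery, the low-rate district has the largest gap by RR and the smallest by RD.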
Montgomery ranks higher on RDR (fifth among all districts), but this is largely driven by the size of its Black student enrollment (fourth in the state). Although this points to a greater absolute number of Black students affected by the racial disparity in Montgomery than in many other districts, the RDR metric is less useful for determining whether districts with differing enrollments of Black students are excelling or struggling at promoting racially equitable discipline. Coupling RDR with metrics that are scaled to the target group’s enrollment is useful when attempting to gauge relative performance.
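A hypothetical sketch of pairing RDR with an enrollment-scaled view. It assumes RDR uses the White rate as its baseline, so RDR equals Black enrollment multiplied by the RD. Both districts are invented:

```python
# Hypothetical districts (not Table 2 data): Black enrollment and RD as a proportion.
districts = {
    "Large district": {"black_enrolled": 20_000, "rd": 0.03},
    "Small district": {"black_enrolled": 1_000, "rd": 0.10},
}

results = {}
for name, d in districts.items():
    # Assumption: RDR = Black enrollment x RD (White-rate baseline).
    rdr = d["black_enrolled"] * d["rd"]
    # Scaling RDR back to the group's enrollment gives a per-student gap.
    per_100 = rdr / d["black_enrolled"] * 100
    results[name] = (rdr, per_100)
    print(f"{name}: RDR = {rdr:.0f} students, {per_100:.1f} per 100 Black students")
# Large district: RDR = 600 students, 3.0 per 100 Black students
# Small district: RDR = 100 students, 10.0 per 100 Black students
```

The large district affects more students in absolute terms, while the small district has the wider gap relative to its enrollment. Each view answers a different question.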
Underlying Rates Matter Too
Finally, it is worth noting that RRs, RDs, and RDR can miss important differences in schools’ use of discipline. For example, in 2015–2016, Benjamin Stoddert Middle School (Charles County, Maryland) had an RR of 1.08, an RD of 0.97 percentage points, and an RDR of less than five, indicating virtually no Black-White discipline gap. Similarly, Violetville Elementary/Middle (Baltimore City, Maryland) had an RR of 1.13, an RD of 0.82 percentage points, and an RDR of about two, all quite similar to Stoddert’s. Despite these similar values on each metric, Stoddert suspended nearly twice the share of students in each subgroup (12.7% of White students and 13.6% of Black students) as Violetville (6.3% of White students and 7.1% of Black students). In short, attention to the overall levels of suspension for each subgroup is also important.
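One way to keep the underlying rates in view is to report them next to the disparity metrics. The sketch below uses the rounded rates given in the text. Recomputing from these rounded figures yields an RR of about 1.07 and an RD of about 0.9 percentage points for Stoddert, rather than the reported 1.08 and 0.97, presumably because the published rates are rounded:

```python
# Rounded suspension rates (percent) reported in the text for 2015-2016.
schools = {
    "Benjamin Stoddert Middle": {"white": 12.7, "black": 13.6},
    "Violetville Elementary/Middle": {"white": 6.3, "black": 7.1},
}

metrics = {}
for name, r in schools.items():
    rr = r["black"] / r["white"]
    rd_pp = r["black"] - r["white"]  # percentage points
    metrics[name] = (rr, rd_pp)
    # Report the disparity metrics alongside the underlying subgroup rates.
    print(f"{name}: White {r['white']}%, Black {r['black']}%, "
          f"RR = {rr:.2f}, RD = {rd_pp:.1f} p.p.")

# Stoddert's rates are roughly double Violetville's for both subgroups.
stoddert, violetville = schools.values()
print(f"White rate ratio: {stoddert['white'] / violetville['white']:.2f}")
print(f"Black rate ratio: {stoddert['black'] / violetville['black']:.2f}")
```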
Implications
This brief highlights the drastic differences in our understanding of discipline disparities that can result from the choice of metric. The choice of metric can alter whether we conclude that discipline gaps are increasing or decreasing over time and can drastically alter whom we identify as high or low performers in discipline equity. In doing so, this brief underscores the potentially large practical implications of measurement issues raised elsewhere (Girvan et al., 2019; Gregory et al., 2010; Losen, 2018; Petrosino et al., 2017; Scanlan, 2015).
It is important to note that no single metric is inherently correct or wrong. The metrics simply represent different ways of measuring discipline disparities, and each has its own strengths and weaknesses (see Table 1). To the extent that policymakers and practitioners value reducing suspension rates for all subgroups in addition to improving racial equity in discipline, using other metrics alongside RR may be desirable, because RR is prone to increase as underlying rates of discipline decrease. RDs would capture decreases in the percentage-point difference in students disciplined, and RDR would capture changes in the number of minority students disciplined above and beyond what would be expected if rates were equal across subgroups. Presenting these metrics together with the rates of discipline for each subgroup helps contextualize a district’s or school’s performance. In sum, relying solely on a single metric may yield narrow conclusions. Considering multiple metrics can therefore help guide efforts to identify and combat differential identification of misconduct, differential administrative responses to misconduct, and differential effectiveness of reforms to discipline. Researchers, policymakers, and educators should be aware of how measurement choices can change the conclusions drawn from the data.
Recommendations
Policies pertaining to and research examining discipline disparities should employ multiple metrics of disparities.
Raw levels of discipline for subgroups should also be reported and examined.
Researchers should further probe the sensitivity of discipline disparities to the metric used to better understand under what circumstances differing conclusions may be drawn based on the metric used (see Girvan et al., 2019, for a detailed examination of the properties of various metrics).
Addressing racial disparities in discipline remains a crucial priority for policymakers and educators. Doing so with appropriate attention to the metrics used can help ensure such efforts are as productive as possible at improving outcomes and equity for all students.
Methodological Appendix
This research brief drew on data from two sources. The first was the Maryland State Department of Education’s ([MSDE’s] 2010a, 2010b, 2011a, 2011b, 2012a, 2012b, 2013a, 2013b, 2014a, …) annual reports on school enrollment and discipline use. The second was data collected by the federal Department of Education’s Office for Civil Rights as part of the Civil Rights Data Collection (CRDC).
This methodological appendix is organized by section of the research brief. First, I present the analytic approach and data sources that support the findings of the section “Is the Black-White Discipline Gap in Maryland Closing or Widening? Depends on the Metric.” Then, I present the analytic approach and data sources that support the findings of the sections “The Curious Case of Montgomery County Public Schools” and “Underlying Rates Matter Too.”
