Abstract
As the use of risk assessments for correctional populations has grown, so has concern that these instruments exacerbate existing racial and ethnic disparities. While much of the attention arising from this concern has focused on how algorithms are designed, relatively little consideration has been given to how risk assessments are used. To this end, the present study tests whether application of the risk principle would help preserve predictive accuracy while, at the same time, mitigating disparities. Using a sample of 9,529 inmates released from Minnesota prisons who had been assessed multiple times during their confinement on a fully-automated risk assessment, this study relies on both actual and simulated data to examine the impact of program assignment decisions on changes in risk level from intake to release. The findings showed that while the risk principle was applied in practice to some extent, the simulated results indicated that greater adherence to it would increase reductions in risk levels and minimize the disparities observed at intake. The simulated data further revealed that the most favorable outcomes would be achieved not only by applying the risk principle but also by expanding program capacity for higher-risk inmates in order to adequately reduce their risk.
Introduction
Risk assessments for correctional populations are generally used to prospectively identify those who have a greater risk of violating the rules of prison or jail, the conditions of community supervision or, more broadly, the laws of society. Correctional authorities use risk assessments to guide a host of decisions that are often intended to enhance public safety and make better use of limited resources. Because risk assessments rely, to a large extent, on historical data to make predictions about future behavior, these instruments are, in many ways, like holding up a mirror to criminal justice system policies, practices and decisions (Mayson, 2019).
Yet, given the disproportionate involvement of racial and ethnic minorities in the criminal justice system, some critics have argued that risk assessments perpetuate and perhaps even exacerbate these disparities (Angwin et al., 2016; Doleac & Stevenson, 2016). Because one of the best predictors of what will occur in the future is past behavior, a person’s criminal history has consistently been shown to be a strong predictor of recidivism (Caudy et al., 2013; Duwe, 2014). However, if that person’s criminal history is the by-product of prior discriminatory decisions and practices, even if only in part, then the risk assessment’s prediction about his or her likelihood for recidivism will presumably contain this bias.
In response to this dilemma, one of the common assumptions has been that disparities can, and should, be resolved within the instrument itself in order to achieve greater equity (Chouldechova, 2017; Johndrow & Lum, 2019; Wadsworth et al., 2018). In other words, risk assessments should be designed to yield predictions that are free of disparities. As Berk et al. (2021) have demonstrated, however, it is mathematically impossible to design a risk assessment instrument that simultaneously maximizes predictive accuracy and fairness. As a result, the apparent solution to this predicament involves designing risk assessments in which trade-offs are made between fairness and accuracy. For example, the removal of some items on a risk assessment may help produce greater equity but would, at the same time, weaken the instrument’s predictive validity.
While much of the concern over fairness has zeroed in on the data and algorithms used to design risk assessments, relatively little attention has been given to how they are used in practice. This focus likely stems from the impression that risk assessments are used largely, if not exclusively, for punitive purposes with those involved in the criminal justice system. Although the equity concern has been applied to risk assessments in general, much of it has centered on pre-sentencing risk assessments, such as those used for pre-trial detention and sentencing decisions. In the case of pre-trial detention, for example, risk assessments have been used to help decide whether someone should be confined or released. Moreover, because pre-sentencing assessments are typically administered only once, there is no subsequent opportunity for a person to reduce his or her risk.
But with post-sentencing risk assessments based on the risk-needs-responsivity (RNR) model, such as those typically used with probationers, prisoners and parolees, the purpose is more likely to be rehabilitative. During the last few decades of the twentieth century, evidence began to accumulate that challenged the infamous conclusion drawn in the 1970s that “nothing works” (Martinson, 1974). The rise of the “what works” literature within corrections gradually led to the emergence of the principles of effective correctional intervention and the RNR model, which identifies who should be treated (risk), what areas should be treated (needs), and how treatment should be delivered (responsivity) (Andrews et al., 2006; Bonta & Andrews, 2007).
The risk principle, in particular, holds that programming resources should be concentrated among those with a higher risk for recidivism. To successfully desist from crime, higher-risk individuals generally require a greater dose or intensity of programming (Bonta et al., 2000; Lowenkamp & Latessa, 2005). Moreover, at a broader level, adhering to the risk principle helps maximize an effective intervention’s impact on recidivism (Duwe & Kim, 2018). Correctional agencies can therefore get the most bang for their treatment buck by providing higher-risk individuals with more programming resources.
While the RNR model places a premium on prioritizing higher-risk individuals for programming, it also assumes that interventions which target criminogenic needs (dynamic risk factors) are more likely to decrease recidivism because changes can be made in these factors. The emphasis on identifying criminogenic needs has figured prominently in the development of instruments that include both static and dynamic predictors of recidivism. More recent assessments have also been designed to better integrate protective factors (i.e., factors that reduce recidivism risk) within the assessment process, follow individuals from intake to case closure, and be administered on multiple occasions (Brennan et al., 2009; Hamilton et al., 2016, 2017; Taxman, 2018). Therefore, under the RNR framework, if an individual is assessed as higher risk at the time of intake, she would not only be prioritized for programming that addresses her criminogenic needs but would also receive multiple assessments prior to release to determine whether and to what extent her risk has been reduced.
As the above demonstrates, the primary purpose of post-sentencing risk assessments used within the RNR model is to provide supportive and therapeutic resources to those who need them the most. This has important implications for how to optimize the accuracy and equity of post-sentencing risk assessment instruments. Suppose, for example, that a correctional system observes disparities in risk for its population. If this system follows the risk principle, then there should also be disparities in program participation. To illustrate, if all of Group A is high risk compared to half of Group B, then Group A’s rate of program participation should be roughly double that of Group B. And, if the programming delivered was effective in reducing recidivism, then fidelity to the risk principle would help mitigate the disparities in risk observed between the two groups. Therefore, rather than exacerbating racial disparities, responsible use of post-sentencing risk assessments should yield more equitable outcomes.
Present Study
Given the ongoing focus on the data and algorithms used in the design of risk assessments, existing research has yet to rigorously examine how the use of these instruments might help reduce racial and ethnic disparities. To this end, the present study analyzes the extent to which application of the risk principle affects program assignment decisions and, by extension, differences in risk by race and ethnicity. In November 2016, the Minnesota Department of Corrections (MnDOC) implemented the Minnesota Screening Tool Assessing Recidivism Risk (MnSTARR) 2.0, a gender-specific, fully-automated instrument that assesses risk for multiple types of recidivism. Relying on the MnSTARR 2.0 assessment data is advantageous in two ways. First, because the MnSTARR 2.0 does not need to be scored manually by correctional staff, an assessment schedule is used to automatically generate multiple assessments during each person’s period in prison. In particular, assessments are automatically generated when persons enter prison and just before they are released, which makes it possible to evaluate changes in risk from admission to release. Second, because of the automated process, all assessments are scored the same way; in other words, any differences in risk between admission and release are not attributable to inter-rater disagreement, which is endemic to manually-scored assessments (Duwe & Rocque, 2017).
In examining changes in risk levels among 9,529 inmates released between 2017 and 2019, the results presented later show that the MnDOC did not fully adhere to the risk principle in making program assignment decisions. As a result, this study also analyzes simulated data that assume greater fidelity to the risk principle. In doing so, it addresses the question: If the risk principle had been assiduously followed, how would that have changed racial and ethnic disparities in program participation and risk level changes? Further, this study analyzes simulated data that assume greater risk principle adherence along with increased program capacity. That is, if more programming had been available and delivered to higher-risk individuals, what effect would it have had on disparities in recidivism risk?
The Development and Validation of the MnSTARR 2.0
Prior to the debut of the MnSTARR 2.0, the MnDOC implemented the original MnSTARR in April 2013. A recidivism risk assessment that was manually scored by prison caseworkers, the MnSTARR was a “multiple-band” instrument that assessed risk separately for male and female prisoners for five different types of recidivism (nonviolent, felony, nonsexual violent, first-time sexual offending, and repeat sexual offending) over a 4-year follow-up period (Duwe, 2014). The MnSTARR was developed on the assumption that risk factors vary by gender, resulting in separate recidivism risk scales for males and females (Duwe, 2014). Both males and females were assessed for their risk of nonviolent, felony, and nonsexual violent recidivism, although only males were assessed for their risk of either first-time or repeat sexual offending. 1 Males without a history of sexual offending were assessed for their risk of committing a first-time sex offense, whereas those with a sexual offending history were assessed for their risk of sexual recidivism. 2
Because the MnSTARR was created to be a risk assessment tool, it was not designed to identify which needs areas should be targeted for programming. Yet, because the MnSTARR’s noncriminal history/dynamic items measured observable behavior in prison, such as misconduct or completion of programming, the assessment indicated which needs areas improved or grew worse while an individual was incarcerated. For example, association with antisocial peers is a major criminogenic need. On the MnSTARR, active membership in a security threat group (i.e., gang) is a dynamic factor (because people in prison can gain or lose active membership while confined) that increases recidivism risk. In contrast, receiving visits in prison, which generally increases prosocial support and has been associated with reduced recidivism in Minnesota (Duwe & Clark, 2013) and elsewhere (Mitchell et al., 2016), decreases risk for some measures of recidivism. Similarly, completing chemical dependency treatment in prison, which addresses substance abuse (a moderate criminogenic need), lowers an individual’s recidivism risk according to the MnSTARR.
In the MnSTARR development study, the overall sample consisted of 11,375 males and 1,100 females who were released from prison between 2003 and 2006. Multiple logistic regression was the classification method used to develop the MnSTARR, and backward stepwise selection, along with bootstrap resampling, was used to identify significant predictors. Using the area under the receiver operating characteristic curve (AUC) as the sole metric for predictive validity, Duwe (2014) reported that the optimism-corrected estimates ranged from 0.73 to 0.80 across the five recidivism measures for males and from 0.73 to 0.81 across the three recidivism measures for females. Because the MnSTARR was originally designed to be manually scored through a database review, the development study also included an inter-rater reliability assessment among MnDOC caseworkers, which found an overall intraclass correlation coefficient (ICC) of 0.84 for the eight recidivism measures (Duwe, 2014).
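To make the AUC concrete, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen recidivist receives a higher score than a randomly chosen non-recidivist. It is an illustration only; the function name `auc` and the toy scores are hypothetical, and the bootstrap optimism correction used in the development study is not reproduced here.

```python
from itertools import product


def auc(scores, outcomes):
    """Rank-based AUC: the probability that a randomly chosen recidivist
    (outcome 1) has a higher predicted score than a randomly chosen
    non-recidivist (outcome 0). Tied pairs count as one half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one recidivist and one non-recidivist")
    credit = 0.0
    for p, n in product(pos, neg):  # every recidivist/non-recidivist pair
        if p > n:
            credit += 1.0
        elif p == n:
            credit += 0.5
    return credit / (len(pos) * len(neg))


# Toy example: four people, two of whom were reconvicted.
print(auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 3 of 4 pairs ordered correctly
```

The quadratic pairwise loop is chosen for transparency rather than speed; an AUC of 0.5 corresponds to chance ordering and 1.0 to perfect separation.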
In a more recent study that externally validated the MnSTARR on a sample of 3,985 inmates released from Minnesota prisons in 2014, Duwe and Rocque (2019) found that it achieved adequate predictive performance, with an average AUC of 0.73 for males and 0.77 for females. Nonetheless, the MnSTARR would have achieved better predictive performance had it used an automated scoring process. The findings also showed that the MnSTARR performed better for Whites than for non-Whites, and that automated scoring would have reduced the magnitude of this difference.
Prior to the MnSTARR development study, the MnDOC had used the Level of Service/Case Management Inventory (LS/CMI) and, before that, the Level of Service Inventory-Revised (LSI-R) to assess risk and need. Because the MnSTARR significantly outperformed the LSI-R in predicting multiple types of recidivism for Minnesota prisoners, the MnDOC began using the MnSTARR as its risk assessment instrument in April 2013. Per MnDOC policy, administration of the MnSTARR was limited to prisoners who were confined more than 180 days. Thus, inmates whose imprisonment periods were 180 days or less did not receive a MnSTARR or LS/CMI assessment. The MnDOC has continued to use the LS/CMI, but strictly as a needs assessment instrument. More specifically, use of the LS/CMI has been limited to inmates assessed as higher risk on the MnSTARR, because these are the individuals who generally get prioritized for institutional programming.
In November 2016, the MnDOC transitioned from the MnSTARR, an assessment manually scored by correctional staff, to the MnSTARR 2.0, a fully-automated assessment (Duwe & Rocque, 2017). The MnSTARR 2.0 extracts data from the state’s criminal history repository to populate the criminal history items on the instrument, while data from the Correctional Operations Management System (COMS), the MnDOC’s centralized database, are pulled to populate items pertaining to demographic characteristics (e.g., gender, age, and marital status), institutional behavior (e.g., discipline convictions and gang affiliation), and participation in programming (e.g., earning a post-secondary degree in prison, completing chemical dependency treatment, and completing cognitive-behavioral therapy). The only MnSTARR 2.0 items that are not auto-populated are those for the MnSOST-4 (Duwe, 2019), the sex offense recidivism risk scale that has continued to be scored manually by correctional staff. Still, once a MnSOST-4 assessment has been completed, the MnSOST-4 score is extracted from COMS and uploaded into the MnSTARR 2.0 assessment.
In addition to using an automated scoring process, there are several other notable differences between the original MnSTARR and the MnSTARR 2.0. First, the data set was expanded from 2003 to 2006 releases to include all persons released between 2003 and 2010. The training set for the MnSTARR 2.0 included individuals released during the 2003 to 2008 period, whereas the test set comprised releases from 2009 to 2010. Second, whereas logistic regression was used to develop the original MnSTARR, regularized logistic regression was the classification method used to create the MnSTARR 2.0. Third, in response to potential concerns over the stigma associated with the sex offender label for those who have never been convicted of a sex offense, the first-time sexual offending risk scale was removed. Finally, due to the automated scoring process, the MnSTARR 2.0 includes nearly 50 items (for both male and female offenders), which is more than double the number of items that were on the MnSTARR.
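The sketch below illustrates, under stated assumptions, the two design choices just described: a temporal split in which earlier releases train the model and later releases test it, and a penalized (regularized) logistic regression. The source does not say which penalty, penalty strength, or fitting routine was used for the MnSTARR 2.0; this example assumes an L2 (ridge) penalty fitted by plain batch gradient descent, and the synthetic data, field names (`release_year`, `x`, `reconvicted`), and helper names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def temporal_split(records, train_years, test_years):
    """Split records by release year (e.g., train on 2003-2008, test on 2009-2010)."""
    train = [r for r in records if r["release_year"] in train_years]
    test = [r for r in records if r["release_year"] in test_years]
    return train, test


def fit_l2_logistic(X, y, lam=0.1, lr=0.1, epochs=1000):
    """Minimize mean log loss + (lam / 2) * ||w||^2 by batch gradient descent.
    The intercept b is left unpenalized."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        # Gradient of the penalty term adds lam * w to each coefficient's step.
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
    return w, b


# Synthetic data (NOT MnDOC data): one item x whose higher values raise risk.
random.seed(0)
records = []
for _ in range(300):
    x = random.random()
    records.append({
        "release_year": random.randint(2003, 2010),
        "x": [x],
        "reconvicted": 1 if x + random.gauss(0, 0.2) > 0.5 else 0,
    })

train, test = temporal_split(records, range(2003, 2009), range(2009, 2011))
w, b = fit_l2_logistic([r["x"] for r in train], [r["reconvicted"] for r in train])
print(w, b)
```

Raising `lam` shrinks the coefficient toward zero; this shrinkage is what distinguishes a regularized fit from ordinary logistic regression, and it becomes more useful as the number of candidate items grows.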
Because the MnSTARR 2.0 is not scored manually by correctional staff, Minnesota prisoners generally receive at least three assessments during their confinement. Most MnSTARR 2.0 assessments are generated through an overnight batch process in which assessments are run according to a person’s confinement schedule. That is, on the basis of this schedule, prisoners receive auto-generated assessments on (1) the day they enter prison, (2) 130 days prior to release (for release planning purposes), and (3) the day before their release from prison. People with longer lengths of stay (i.e., more than 18 months) also receive an updated MnSTARR 2.0 assessment every 12 months. In addition to the auto-generated assessments, MnDOC staff can produce MnSTARR 2.0 assessments on their own within COMS. More specifically, by clicking a button in the COMS application, staff can generate an up-to-date MnSTARR 2.0 assessment in less than 15 seconds.
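The schedule described above can be sketched as a small date function. This is a hypothetical reconstruction, not MnDOC code: the function name is invented, the actual release date stands in for whatever projected release date the batch process uses, "more than 18 months" is approximated as more than 548 days, and the 12-month updates are assumed to fall on anniversaries of admission. Staff-initiated assessments are not modeled.

```python
from datetime import date, timedelta


def assessment_dates(admission, release):
    """Hypothetical reconstruction of the auto-generated MnSTARR 2.0 schedule."""
    dates = {admission}                            # (1) day of admission
    day_before = release - timedelta(days=1)       # (3) day before release
    pre_release = release - timedelta(days=130)    # (2) release planning
    for d in (pre_release, day_before):
        if d > admission:                          # skip dates before intake
            dates.add(d)
    # Assumption: "more than 18 months" approximated as more than 548 days,
    # with updates on each 12-month anniversary of admission.
    if (release - admission).days > 548:
        years = 1
        while True:
            try:
                anniversary = admission.replace(year=admission.year + years)
            except ValueError:                     # Feb 29 in a non-leap year
                anniversary = date(admission.year + years, 3, 1)
            if anniversary >= day_before:
                break
            dates.add(anniversary)
            years += 1
    return sorted(dates)


for d in assessment_dates(date(2017, 1, 10), date(2019, 3, 1)):
    print(d.isoformat())
```

For a short stay, the 130-day date falls before admission and is dropped, leaving only the intake and pre-release assessments.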
As with the original MnSTARR, the MnSTARR 2.0 contains dynamic items that reflect whether needs areas have grown worse or improved while someone is in prison. In addition to security threat group (STG) involvement, documented suicidal tendencies and prison misconduct increase the risk of recidivism. On the other hand, the MnSTARR 2.0 contains nine dynamic items that decrease recidivism risk. These nine items consist of interventions that have been found to decrease recidivism among Minnesota prisoners, including visitation (Duwe & Clark, 2013), education programming (Duwe & Clark, 2014), work release (Duwe, 2015a), the Challenge Incarceration Program (CIP) (Duwe & Kerschner, 2008), chemical dependency (CD) treatment (Duwe, 2010), cognitive-behavioral therapy (CBT) (Duwe & Clark, 2015), the EMPLOY program (Duwe, 2015b), and the InnerChange Freedom Initiative (IFI) (Duwe & King, 2013), which is now known as the Prison Fellowship Academy (PFA).
Table 1 shows the effect sizes for each of the nine interventions across the three recidivism measures for males and females. These effect sizes are derived from the regularized logistic regression (RLR) models that were trained on releases from 2003 to 2008 and tested on those released in 2009 and 2010. As Table 1 indicates, the size of the effect for each intervention varies across the recidivism measures as well as by gender. For example, some interventions, such as CBT for males, do not reduce risk for every recidivism measure. Moreover, some interventions, such as CIP, have a greater effect for males, whereas others, such as education programming, have a greater effect for females.
MnSTARR 2.0 Intervention Effect Sizes by Recidivism Type.
Persons assessed on the MnSTARR 2.0 receive a “score” for each of the three recidivism measures that reflects the probability (ranging from 0% to 100%) of reconviction within 3 years of release. For example, a male prisoner without a prior history of sexual offending could receive a score of, say, 20% for violent recidivism, 50% for felony recidivism, and 60% for non-violent recidivism at the time of intake. Now assume that, by the time he is reassessed at release, he has participated in EMPLOY. As the effect sizes for the EMPLOY program for males in Table 1 indicate, his risk scores would then drop to 18.4% for violent recidivism, 59.4% for non-violent recidivism, and 40% for felony recidivism.
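The arithmetic in this example can be reproduced by treating each of EMPLOY's effect sizes for males (given later in this section as 8% for violent, 1% for non-violent, and 20% for felony recidivism) as a proportional reduction in the corresponding probability. The sketch below assumes that form because it matches the worked numbers; how the instrument combines several risk-reducing items at once is not described here, so only a single item is applied. The names `EMPLOY_MALE` and `apply_item` are hypothetical.

```python
# Proportional reductions for males who participate in EMPLOY, as reported
# in the text (8% violent, 1% non-violent, 20% felony).
EMPLOY_MALE = {"violent": 0.08, "non_violent": 0.01, "felony": 0.20}


def apply_item(scores, reductions):
    """Scale each recidivism probability by (1 - reduction) for that measure.
    Measures without an entry in `reductions` are left unchanged."""
    return {m: p * (1.0 - reductions.get(m, 0.0)) for m, p in scores.items()}


intake = {"violent": 0.20, "felony": 0.50, "non_violent": 0.60}
release = apply_item(intake, EMPLOY_MALE)
print({m: round(100 * p, 1) for m, p in release.items()})
```

Under this reading, 20% × 0.92 = 18.4%, 60% × 0.99 = 59.4%, and 50% × 0.80 = 40%, matching the example.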
Each person assessed on the MnSTARR 2.0 receives a risk level—low, medium, high, or very high. As shown in Table 2, each recidivism measure has a cut point for the four risk levels. The cut points for each recidivism measure are not only tied to the observed base rates for males and females, but they also reflect a priority in assessing risk for more serious, violent offenses. For example, while a score of 30% would fall within the very high risk range for violent recidivism, it would be in the low range for either felony or non-violent recidivism. However, a score of 30% would be nearly double the violent recidivism base rate for male Minnesota prisoners, and it would place this person at about the 85th percentile; in other words, only 15% of male Minnesota prisoners would have a higher violent recidivism probability.
MnSTARR 2.0 Recidivism Probabilities and Risk Categories.
For the MnSTARR 2.0, a person’s overall risk level is the highest level reached by any of his or her recidivism scores (i.e., probabilities); that is, it is set by whichever score meets or exceeds the highest risk level cut point. Returning to the example discussed above, a male prisoner with scores of 20% for violent recidivism, 60% for non-violent recidivism, and 50% for felony recidivism would fall into the high risk category. More specifically, even though his non-violent and felony recidivism scores are in the medium range, his violent recidivism score is in the high range, which is why he would be classified as high risk at the time of intake. Recall, however, that participating in EMPLOY would lower his scores to 18.4% for violent recidivism (medium range), 59.4% for non-violent recidivism (medium range), and 40% for felony recidivism (low range). Therefore, after having participated in EMPLOY, this person would have seen his MnSTARR 2.0 risk level drop from high at the time of intake to medium by the time of release.
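The rule that the overall level is set by the highest level reached on any single measure can be expressed as a maximum over per-measure levels. Because the Table 2 cut points are not reproduced in this text, the sketch below uses hypothetical cut points chosen only to reproduce the example (high at intake, medium at release, and 30% counting as very high for violent but low for felony and non-violent recidivism); they should not be read as the MnSTARR 2.0 values.

```python
LEVELS = ["low", "medium", "high", "very high"]

# HYPOTHETICAL cut points: lower bounds for (medium, high, very high).
# These are NOT the Table 2 values; they were picked only so that the
# worked example in the text is reproduced.
HYPOTHETICAL_CUTS = {
    "violent": (0.10, 0.19, 0.30),
    "felony": (0.45, 0.65, 0.80),
    "non_violent": (0.45, 0.70, 0.85),
}


def measure_level(p, cuts):
    """Number of ascending cut points that p meets or exceeds (0 = low)."""
    return sum(p >= c for c in cuts)


def overall_level(scores, cuts=HYPOTHETICAL_CUTS):
    """Overall level = the highest level reached by any single measure."""
    return LEVELS[max(measure_level(p, cuts[m]) for m, p in scores.items())]


print(overall_level({"violent": 0.20, "non_violent": 0.60, "felony": 0.50}))
print(overall_level({"violent": 0.184, "non_violent": 0.594, "felony": 0.40}))
```

Taking the maximum means a single elevated score, here violent recidivism, is enough to raise the overall classification, which reflects the stated priority on violent offending.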
It is worth clarifying that the effect sizes for EMPLOY (or any of the other items on the MnSTARR 2.0) are estimates of its influence on actual recidivism outcomes. As with any estimate, there is some uncertainty in the impact that the MnSTARR 2.0 items, including EMPLOY, would have on observed recidivism outcomes for those assessed on the instrument. By contrast, the effect of EMPLOY (or any other item on the MnSTARR 2.0) on a person’s risk scores is not an estimate. As shown earlier in Table 1, participating in EMPLOY proportionally reduces a male prisoner’s risk scores by 8% for violent recidivism, 1% for non-violent recidivism, and 20% for felony recidivism (e.g., from 20% to 18.4% for violent recidivism). Because these effect sizes were fixed when the MnSTARR 2.0 was developed and validated, the precise impact of this item (or any other item on the MnSTARR 2.0) on recidivism risk scores is already known.
Data and Method
The dataset for this study consisted of 9,529 inmates released from Minnesota prisons between January 1, 2017 and December 31, 2019. Given that the MnSTARR 2.0 debuted on November 16, 2016, the sample contained only individuals who entered prison on or after November 16, 2016 and were released prior to 2020. The sample included 8,105 males and 1,424 females, whose average length of stay (LOS) in prison was 9 months. Together, they received a total of 31,966 MnSTARR 2.0 assessments during their confinement, an average of about 3.4 assessments per person.
To examine the impact that program assignment decisions may have on risk levels and, more narrowly, on racial and ethnic disparities, this study developed three distinct scenarios. The first scenario consisted of actual results, while the latter two contained simulated outcomes. For all three scenarios, this study examines (1) risk level at the time of intake, (2) participation in programs included within the MnSTARR 2.0, and (3) risk level at the time of release.
Even though the MnSTARR 2.0 contains nine risk-reducing interventions, only four of these are generally amenable to the risk principle. Both work release and CIP are early release programs which, as the name implies, grant participants an earlier release from prison. Accordingly, the criteria for both programs, some of which are based on Minnesota statutes, favor persons who tend to be lower risk. Education programming, especially adult basic education (ABE), is typically provided to any inmate who does not have a secondary degree, regardless of recidivism risk. While the risk principle could be applied to visitation, particularly if a community volunteer visitation program were run within a prison system, this is not currently the case within Minnesota’s prison system. Instead, as long as inmates have social supports who are willing and able to visit, they can receive visits irrespective of their risk level.
The remaining four programs—CD treatment, CBT, EMPLOY, and IFI—are available for both male and female prisoners, and the risk principle can be applied to assignment decisions for each one. In fact, the MnDOC uses the MnSTARR 2.0 risk level in a simple algorithm to help prioritize prisoners for CD treatment, although risk accounts for less than half of the points in the algorithm (assessed chemical dependency need and type of offense account for the remaining points). The major constraint to participating in any of these four programs involves the duration of each program and, by extension, the length of time a person is confined. For example, it generally takes at least 3 months to successfully participate in either the CBT or EMPLOY programs. Given the amount of time it takes to process new admissions to prison, it is uncommon to see prisoners with less than 120 days in prison participating in these programs. CD treatment, meanwhile, often takes at least 6 months to complete, while IFI is a much longer program that takes at least 18 months to finish.
As noted above, the first scenario examined in this study involves the observed risk levels at intake and release along with actual participation in the four programs. In the second scenario, this study assumes that program capacity would remain the same as in the first scenario. However, program slots would be filled by the highest-risk inmates, provided they had a sufficient length of stay to participate. Specifically, it was assumed that the confinement period would need to be at least 4 months for CBT and EMPLOY, 6 months for CD treatment, and 18 months for IFI. For participation in multiple programs, the LOS was assumed to need to be at least 7 months for both CBT and EMPLOY; 10 months for CD treatment and either CBT or EMPLOY; 13 months for CD treatment, CBT, and EMPLOY; 21 months for IFI and either CBT or EMPLOY; 24 months for IFI and CD treatment; 25 months for IFI, CBT, and EMPLOY; and 30 months for all four programs. This study then simulates the effects of these risk-based program assignment decisions on MnSTARR 2.0 risk levels at the time of release.
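A minimal sketch of the second scenario's assignment logic follows. The minimum LOS values come from the text; everything else is an assumption made for illustration: risk is coded as an ordinal intake level, ties within a level are broken arbitrarily by ID, programs are filled one at a time starting with the longest, and program combinations that the text does not list are treated as infeasible. The study's actual simulation procedure may differ.

```python
# Minimum LOS (months) per program combination, as listed in the text.
# Combinations the text does not list (e.g., IFI + CD + CBT) are treated
# as infeasible here, which is an assumption of this sketch.
MIN_LOS = {
    frozenset({"CBT"}): 4,
    frozenset({"EMPLOY"}): 4,
    frozenset({"CD"}): 6,
    frozenset({"IFI"}): 18,
    frozenset({"CBT", "EMPLOY"}): 7,
    frozenset({"CD", "CBT"}): 10,
    frozenset({"CD", "EMPLOY"}): 10,
    frozenset({"CD", "CBT", "EMPLOY"}): 13,
    frozenset({"IFI", "CBT"}): 21,
    frozenset({"IFI", "EMPLOY"}): 21,
    frozenset({"IFI", "CD"}): 24,
    frozenset({"IFI", "CBT", "EMPLOY"}): 25,
    frozenset({"IFI", "CD", "CBT", "EMPLOY"}): 30,
}


def risk_based_assignment(inmates, capacity, order=("IFI", "CD", "CBT", "EMPLOY")):
    """Fill each program's fixed number of slots with the highest-risk inmates
    whose LOS can accommodate the program on top of any already assigned.
    `risk` is ordinal (0 = low ... 3 = very high); ties are broken by id."""
    assigned = {i["id"]: set() for i in inmates}
    ranked = sorted(inmates, key=lambda i: (-i["risk"], i["id"]))
    for program in order:
        slots = capacity.get(program, 0)
        for inmate in ranked:
            if slots == 0:
                break
            combo = frozenset(assigned[inmate["id"]] | {program})
            need = MIN_LOS.get(combo)
            if need is not None and inmate["los_months"] >= need:
                assigned[inmate["id"]].add(program)
                slots -= 1
    return assigned


inmates = [
    {"id": "A", "risk": 3, "los_months": 5},
    {"id": "B", "risk": 2, "los_months": 12},
    {"id": "C", "risk": 1, "los_months": 30},
    {"id": "D", "risk": 3, "los_months": 2},
]
print(risk_based_assignment(inmates, {"CD": 1, "CBT": 2}))
```

In this toy run, the very high risk inmate with only 2 months to serve cannot be placed anywhere, while the lower-risk inmate with a long stay is passed over because the fixed slots are exhausted by higher-risk candidates.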
In the third scenario, this study applies the same assumptions regarding the minimum LOS for participating in the four programs, either for a single program or multiple programs. This scenario assumes, however, that program capacity would expand and be available to all inmates with a risk level of high or very high at the time of intake as long as they have a sufficient LOS. This study then simulates the impact of risk-based assignment decisions with expanded capacity on risk levels at the time of release. In doing so, this scenario also helps illustrate the extent to which there may be gaps in program capacity.
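The third scenario can be approximated by counting, for each program, the high or very high risk inmates whose LOS meets that program's minimum, and comparing actual participation against that count. This sketch treats programs independently (it ignores the multi-program LOS thresholds) and uses hypothetical function names; it shows one plausible way to express a capacity gap as the share of needed slots left unfilled, not the study's exact computation.

```python
SINGLE_PROGRAM_MIN_LOS = {"CBT": 4, "EMPLOY": 4, "CD": 6, "IFI": 18}
HIGH = 2  # ordinal risk: 0 low, 1 medium, 2 high, 3 very high


def expanded_capacity(inmates, program):
    """Slots needed if every high or very high risk inmate with enough LOS
    for this single program were enrolled (programs treated independently)."""
    need = SINGLE_PROGRAM_MIN_LOS[program]
    return sum(1 for i in inmates if i["risk"] >= HIGH and i["los_months"] >= need)


def capacity_gap(actual_participants, needed_slots):
    """Share of the expanded capacity that actual delivery left unfilled.
    Negative values would mean actual delivery exceeded the expanded need."""
    if needed_slots == 0:
        return 0.0
    return 1.0 - actual_participants / needed_slots


inmates = [
    {"risk": 3, "los_months": 5},
    {"risk": 2, "los_months": 3},
    {"risk": 1, "los_months": 24},
    {"risk": 2, "los_months": 20},
]
for program in ("CBT", "CD", "IFI"):
    print(program, expanded_capacity(inmates, program))
print(capacity_gap(1, 50))  # 1 participant where 50 slots were needed
```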
Analytical Procedures
To evaluate the effects of risk principle fidelity, this study begins by presenting descriptive statistics on program participation rates by race/ethnicity across the three scenarios for males and females. Next, to estimate these effects, a series of multiple logistic regression models was run for each of the four programs across the three scenarios for males and females. More specifically, program participation was the dependent variable in the logistic regression models, while the independent variables were those that may affect whether someone participates in a program. In addition to risk level at the time of intake and LOS, the covariates included race/ethnicity and dynamic items on the MnSTARR 2.0 such as prison misconduct, suicidal tendencies, involvement in a security threat group (STG) (i.e., gang affiliation), and involvement in risk-reducing interventions.
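For reference, each of these models can be written in the standard logistic regression form below, with one model per program, scenario, and gender. The notation is introduced here for illustration only; the source does not state how risk level and race/ethnicity were coded (e.g., as indicator variables) or which categories served as references, so those details are assumptions.

```latex
\operatorname{logit} \Pr(Y_i = 1)
  = \beta_0
  + \sum_{k} \beta_k \,\mathrm{Risk}_{ik}
  + \beta_{L}\,\mathrm{LOS}_i
  + \sum_{g} \delta_g \,\mathrm{Race}_{ig}
  + \boldsymbol{\gamma}^{\top} \mathbf{d}_i,
\qquad
\mathrm{OR}(b) = e^{b}
```

Here Y_i = 1 if person i participated in the program, Risk_ik and Race_ig are indicators for intake risk level k and race/ethnicity group g, d_i collects the dynamic MnSTARR 2.0 items (misconduct, suicidal tendencies, STG involvement, and other risk-reducing interventions), and b stands for any single coefficient. Exponentiating a coefficient gives the odds ratio reported in the results tables; a value above 1 indicates higher odds of participation.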
This study then examines changes in risk level from intake to release across the three scenarios by race/ethnicity and gender. For example, if someone assessed as very high risk at intake dropped to medium by the time of release, this would be a two-category risk level reduction (−2). Conversely, if someone assessed as medium at intake increased to high by the time of release, this would be a one-category risk level increase (+1). These analyses thus reveal the extent to which variations in risk principle fidelity and program capacity affect changes in risk levels from intake to release and, more important, disparities in risk.
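The change measure described above is the signed difference between the ordinal positions of the release and intake risk levels. A minimal Python sketch, with hypothetical function names, reproduces the two examples and summarizes the share of people whose level fell, stayed the same, or rose:

```python
LEVELS = ["low", "medium", "high", "very high"]


def level_change(intake_level, release_level):
    """Signed number of categories moved; negative values are reductions."""
    return LEVELS.index(release_level) - LEVELS.index(intake_level)


def summarize(pairs):
    """Share of people whose risk level fell, stayed the same, or rose."""
    changes = [level_change(a, b) for a, b in pairs]
    n = len(changes)
    return {
        "reduced": sum(c < 0 for c in changes) / n,
        "same": sum(c == 0 for c in changes) / n,
        "increased": sum(c > 0 for c in changes) / n,
    }


print(level_change("very high", "medium"))  # the -2 example in the text
print(level_change("medium", "high"))       # the +1 example in the text
print(summarize([("very high", "medium"), ("medium", "high"),
                 ("low", "low"), ("high", "high")]))
```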
Results
In Table 3, the results show program participation rates across the three scenarios by race/ethnicity. Of the 9,529 released prisoners in the sample, females made up 15% (N = 1,424). Non-Hispanic Whites accounted for 54% of the males and 58% of the females. African-Americans constituted a much larger percentage of the male population than of the female population, while the reverse was true for American Indians.
Program Participation Rates by Race/Ethnicity and Scenarios.
Note. 1 = Actual Results; 2 = Risk-Based Simulation; 3 = Risk and Capacity Simulation.
The actual results show that 30% of the males completed CD treatment, compared to 15% of the females. The other three programs were used sparingly: for both males and females, each had a participation rate below 5%. Relative to their share of the total population, Whites were more likely to participate in both CD treatment and the EMPLOY program, while African-Americans were more likely to participate in the CBT and IFI programs. American Indians were not more likely to participate in any of the four programs.
When we look at the results from the risk-based assignment simulation, we see decreases in the White program participation rate for each program for both males and females. Conversely, the rates increase for African-American and American Indian inmates for nearly every program. For example, while only one of the 41 male CBT participants was American Indian, a risk-based assignment process would increase this number to 12 (29%).
The results from the risk/capacity simulation hew more closely to the overall racial/ethnic distribution for both males and females. What is notable, however, is the apparent gap between program capacity and the actual delivery of programming. If we consider the risk/capacity simulation to be optimal capacity for higher-risk inmates, then programming for males was under capacity by 22% for CD treatment, 41% for IFI, 86% for EMPLOY, and 98% for CBT. For females, programming was under capacity by 55% for CD treatment, 75% for IFI, 82% for EMPLOY, and 98% for CBT. The higher percentages for CBT and EMPLOY reflect the fact that inmates in prison for relatively short periods of time (i.e., less than 6 months) are much more likely to be warehoused (Duwe & Clark, 2017).
While Table 3 provides the racial/ethnic breakdown in program participation rates across the three scenarios, other factors also have a bearing on program participation. In Tables 4 and 5, a series of logistic regression models was estimated to examine the influence of these factors on program participation across the three scenarios. In Table 4, which shows the odds ratios for males, we see that LOS was positively associated with participation in all four programs. However, higher-risk males were significantly more likely to participate in only two of the programs, CD treatment and EMPLOY. On the other hand, lower-risk males were significantly more likely to participate in IFI, while risk level did not have a significant effect for CBT. Even though CD treatment favored higher-risk inmates, African-Americans were still significantly less likely to participate.
Multiple Logistic Regression: Male Program Participation by Race/Ethnicity and Scenario.
Note. 1 = Actual Results; 2 = Risk-Based Simulation; 3 = Risk/Capacity Simulation. *p < .05. **p < .01.
Multiple Logistic Regression: Female Program Participation by Race/Ethnicity and Scenario.
Note. 1 = Actual Results; 2 = Risk-Based Simulation; 3 = Risk/Capacity Simulation. *p < .05. **p < .01.
In all four of the risk-based simulation models, we see a much stronger effect for risk level in program assignment decisions. In the model for IFI, the results suggest African-Americans were significantly more likely to be selected. In the risk/capacity simulation, risk level still has a strong effect on participant assignment for all four programs.
The odds ratios from the logistic regression models for females are presented in Table 5, which shows that risk level had a significant effect on assignment decisions for two of the programs: CD treatment and CBT. Race/ethnicity was not a significant predictor in any of the actual results models. In both simulation models, risk level is, as expected by design, a significant predictor for all four programs. In the risk-based simulation, both African-Americans and American Indians were significantly more likely to be assigned to CD treatment and EMPLOY.
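Tables 4 and 5 report odds ratios, and an odds ratio is the exponentiated logistic regression coefficient, exp(β). The sketch below is illustrative only. It fits a logistic regression with an intercept and a single binary predictor (for example, higher versus lower risk) by Newton-Raphson on grouped counts. It then checks that exp(β) equals the odds ratio computed directly from the 2 × 2 table. The counts are hypothetical. Because the study's models include additional covariates such as LOS and race/ethnicity, their odds ratios are adjusted rather than simple 2 × 2 ratios.

```python
import math


def fit_binary_logit(cells, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson.

    cells: list of (x, y, n) tuples, where x and y are 0/1 and n is a count.
    Returns (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # score (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0  # entries of the information matrix
        for x, y, n in cells:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += n * (y - p)
            g1 += n * (y - p) * x
            w = n * p * (1.0 - p)
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: beta += I^{-1} g, solved explicitly for the 2x2 case.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1


def odds_ratio_2x2(part_1, nonpart_1, part_0, nonpart_0):
    """Odds of participation in group x=1 divided by odds in group x=0."""
    return (part_1 / nonpart_1) / (part_0 / nonpart_0)


# Hypothetical counts: 60 of 200 higher-risk and 30 of 200 lower-risk inmates participated.
cells = [(1, 1, 60), (1, 0, 140), (0, 1, 30), (0, 0, 170)]
b0, b1 = fit_binary_logit(cells)
print(round(math.exp(b1), 4))                      # 2.4286
print(round(odds_ratio_2x2(60, 140, 30, 170), 4))  # 2.4286
```

An odds ratio above 1 (here about 2.4) means the group coded x = 1 had higher odds of participation. A value below 1 means lower odds.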
In Tables 6 through 8, the results show changes in risk level from intake to release across the three scenarios by race/ethnicity for males and females. The actual results in Table 6 for males show that 11.5% decreased by at least one risk level, 3.5% increased by at least one risk level, and the remaining 85% left prison with the same risk level as when they entered. In the risk-based simulation, 15% would have had a risk-level reduction, a 30% increase over the actual rate (11.5%). At 16.2%, African-Americans would have had the highest rate of risk-level reduction. American Indians would have shown the greatest improvement (a 52% increase), with their rate of risk-level reduction rising from 7.3% in the actual results to 11.1% in the risk-based scenario.
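The percentage increases reported in this section are relative changes in the rate of risk-level reduction, not percentage-point differences. The minimal sketch below shows both calculations using figures from the text. The reported percentages appear to have been computed from unrounded rates. For example, the 67% increase reported for the males' risk/capacity scenario implies a rate slightly above the rounded 19%. Recomputing from the rounded values in the text may therefore differ by a point or two.

```python
def relative_change_pct(new_rate: float, base_rate: float) -> float:
    """Relative change from base_rate to new_rate, in percent."""
    return 100 * (new_rate - base_rate) / base_rate


def point_change(new_rate: float, base_rate: float) -> float:
    """Absolute change in percentage points."""
    return new_rate - base_rate


# Male rates of risk-level reduction: actual (11.5%) vs. risk-based simulation (15%).
print(round(relative_change_pct(15.0, 11.5)))  # 30
print(round(point_change(15.0, 11.5), 1))      # 3.5
# American Indian males: 7.3% actual vs. 11.1% simulated.
print(round(relative_change_pct(11.1, 7.3)))   # 52
```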
Risk Level Change by Race/Ethnicity for Males.
Risk Level Change by Race/Ethnicity for Females.
Risk Levels by Race/Ethnicity and Scenario.
In the risk/capacity simulation, 19% would have had a risk-level reduction, a 67% increase over the actual rate (11.5%). Once again, with a rate of 20.2%, African-American males would have had the highest proportion of risk-level decreases. Moreover, American Indian males would have shown the greatest improvement over the actual results, with a risk-level reduction rate (15.2%) more than double the actual rate (7.3%).
As shown in Table 7, 19.6% of females had a risk-level reduction, 1% experienced an increase, and the remainder, roughly 80%, stayed at the same risk level. Slightly more than 9% of African-American females and of American Indian females experienced a risk-level reduction, less than half the overall rate (19.6%). Because the four programs for females adhered more closely to the risk principle than those for males, the risk-based simulation did not produce a sizable jump in risk-level reductions. Indeed, 20.6% would have had a risk-level decrease in this simulation, a 5% increase over the actual rate (19.6%). Still, 13% of African-Americans would have experienced a risk-level decrease, a 39% improvement over the actual results. Further, the rate of risk-level reduction for American Indians would have been 14%, a 53% increase.
In the risk/capacity simulation, 31% of females would have had a risk-level reduction, a 59% improvement over the actual results (19.6%). The proportion with a risk-level reduction would have been 23% for African-Americans, an increase of 141% over the actual results (9.6%). For American Indian females, 22% would have experienced a risk-level reduction, an improvement of 136% over the actual results (9.4%).
Examining the same data, Table 8 breaks out the risk-level changes in more detail by race/ethnicity for males and females. Several broad findings from this table are worth highlighting. Because the risk-based simulation placed the greatest priority on assigning the highest-risk individuals to programming, we see a reduction in the percentage of very high cases across all groups, but a corresponding increase in the percentage of those in the high category. In other words, by giving the highest-risk individuals greater access to programming, the risk-based simulation produced more cases in which people moved from the very high category at intake to the high category at release.
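Table 8 amounts to a cross-tabulation of risk level at intake against risk level at release. The minimal sketch below shows how such transitions can be tallied and classified as decreases, increases, or no change. The four-level ordering of categories is an assumption made for illustration and may not match the instrument's actual labels. The records are invented.

```python
from collections import Counter

# Assumed ordering of categories from lowest to highest risk (illustrative only).
LEVELS = ["low", "moderate", "high", "very high"]
RANK = {level: i for i, level in enumerate(LEVELS)}


def direction(intake: str, release: str) -> str:
    """Classify a single intake-to-release change in risk level."""
    diff = RANK[release] - RANK[intake]
    if diff < 0:
        return "decrease"
    if diff > 0:
        return "increase"
    return "same"


def summarize(pairs):
    """Percentage of people whose risk level decreased, stayed the same, or increased."""
    counts = Counter(direction(i, r) for i, r in pairs)
    total = len(pairs)
    return {k: 100 * counts[k] / total for k in ("decrease", "same", "increase")}


# Hypothetical (intake, release) pairs.
pairs = [
    ("very high", "high"),
    ("very high", "very high"),
    ("high", "high"),
    ("high", "moderate"),
    ("moderate", "high"),
]
transitions = Counter(pairs)  # full intake-by-release cross-tabulation
print(transitions[("very high", "high")])  # 1
print(summarize(pairs))  # {'decrease': 40.0, 'same': 40.0, 'increase': 20.0}
```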
Not surprisingly, the risk/capacity simulation delivered the best outcomes. Due to greater program capacity, this simulation yielded the greatest number of risk level changes. More specifically, in addition to people assessed as very high risk, those in the high risk category were also able to reduce their risk. And, because the implementation of the increased capacity was consistent with the risk principle, this simulation achieved more equitable outcomes for risk levels at the time of release.
Conclusion
The risk principle implies that if there are disparities in risk within a correctional population, then there should also be disparities in program participation. The results from this study showed disparities in risk at the time of intake, with African-American and American Indian inmates, in particular, assessed as higher risk. Although the MnDOC applied the risk principle to some extent in program assignment decisions, its application was uneven across the programs examined in this study. When the risk principle was fully applied in the risk-based simulation, African-American and American Indian individuals, in particular, would have been more likely to participate in programming. The risk-based simulation also suggests there would be more risk-level reductions overall, with African-American and American Indian prisoners benefiting disproportionately from greater fidelity to the risk principle.
Just as a rising tide lifts all boats, the same is true of an increase in program capacity. The results from the risk/capacity simulation suggest that expanded capacity would produce more risk-level reductions overall. Yet, as long as program assignment remained consistent with the risk principle, the rising tide would have provided a greater lift to the boats that needed it the most. Indeed, an increase in programming resources would have further mitigated disparities, especially for African-American and American Indian individuals.
But the results also reveal the size of the gap between how much programming is currently delivered and how much could, or should, be provided. About half of the 9,529 individuals in this study did not participate in interventions that would have lowered their risk, a pattern that is not unique to Minnesota. The development and validation of the risk assessment recently created for the Federal Bureau of Prisons showed that roughly half of its releases had not participated in effective programming prior to release (U.S. Department of Justice, 2019). If reducing recidivism continues to be an important goal for U.S. prison systems, then program capacity likely needs to increase substantially. If, however, an increase in programming is deemed too costly or impractical, then state and federal prison populations likely need to shrink. Yet, given the magnitude of the programming gap, both an increase in programming and decarceration may be needed to achieve better recidivism outcomes.
The RNR model, when applied with fidelity, provides correctional systems with an established framework for not only reducing recidivism but also minimizing disparities (Hamilton et al., 2019a, 2019b). Because this model is designed to deliver more supportive resources to those who are higher risk, risk assessment instruments should be as accurate as possible in predicting the outcome. To illustrate, suppose two people in prison would, if all things were equal, have a similar recidivism risk. In practice, however, all things are seldom equal. The first person has a higher recidivism risk, per the assessment being used, because he has a longer criminal history. That history is attributable, at least in part, to having grown up in a disadvantaged, high-crime community subject to aggressive policing practices. Given that people released from prison often return to the neighborhoods they came from, let us further assume he will go back to that same community. The second person, by contrast, will be released to a more desistance-friendly location with ample resources. To successfully desist from crime, the first person will likely need more resources while in prison than the second person. But with an assessment that aims to remove disparities in predicted risk, the first person would not be prioritized for programming any differently than the second person, because their assessed levels of risk would be similar.
This example underscores the need to distinguish between pre- and post-sentencing risk assessments and, more important, between how risk assessments are designed and how they are used. Post-sentencing risk assessments used within the RNR framework should be designed to be as accurate as possible in order to promote more equitable outcomes. The same may not necessarily be true for pre-sentencing risk assessments. If a risk assessment is designed to maximize accuracy but exacerbates disparities because it is used for more punitive purposes, then perhaps trade-offs between accuracy and equity should be made during the design process.
Although the RNR model can help achieve more equitable results by providing more resources to those who need them most, it is worth emphasizing that this approach will not eliminate disparities. After all, discrimination and bias have long pervaded the criminal justice system and society in general. Given the extent of these disparities, combined with the relatively brief periods many people spend in prison, correctional systems are limited in what they can do to close the gap. Nevertheless, the gap could be narrowed by following the risk principle and, more broadly, the RNR model.
There are several limitations of this study that are important to mention. First, because needs assessment data were unavailable for much of the sample, the analyses assumed the four programs would be appropriate as long as individuals were higher risk and had a sufficient amount of time left to serve. Second, even though this study focused on changes in risk levels, a risk-level reduction does not guarantee better recidivism outcomes. If, for example, expanding program capacity diluted the quality of the interventions, then the program items on the MnSTARR 2.0 would overestimate the reduction in risk. Finally, because the MnSTARR 2.0 has been customized to Minnesota’s prisoners, results from a home-grown risk assessment instrument in one state may not generalize to other correctional populations.
While this study was able to leverage the automated assessment data from the MnSTARR 2.0 to examine the application of the risk principle more closely, there are a number of promising avenues for future research. First, even though the vast majority of correctional agencies use risk assessment instruments, the extent to which they adhere to the RNR principles remains unclear. Future research should therefore pay closer attention to how risk assessments are used in practice. Second, while it is important to consider changes in assessed risk, it is also critical to examine the impact of implementing the risk principle on actual outcomes such as recidivism or prison misconduct. Finally, given the findings from this study and other recent research on the programming gap in U.S. prison systems, future research should evaluate whether efforts to close this gap would lead to better outcomes for recidivism, prison misconduct, and post-prison employment.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
