Abstract
This study examined the nature and impacts of the professional override on the Level of Service Inventory–Ontario Revision (LSI-OR), using a large archival database of 40,539 individuals’ information. Research questions focused on the predictive validity of various LSI-OR risk metrics, including total risk/need scores, initial risk categories, and adjusted risk categories, for various types of recidivism; how professional overrides were used; whether they were used more with some groups than others; and whether their impacts varied depending on recidivism type. Overrides were applied in 15.4% of cases, most often (94.1%) to increase risk levels. Override use varied based on gender, race, and the nature of index offenses. Based on receiver operating characteristic analyses, the results generally indicated that adjusted risk levels (incorporating professional overrides) demonstrated inferior predictive validity relative to unadjusted metrics. The results suggest a need for increased caution and consistency in the application of professional overrides.
Forensic risk assessment measures are used in various criminal justice, law enforcement, and health care contexts. These measures have been described as evolving through four generations (Bonta, 1996), from unstructured clinical judgments (first generation), to statistically based and atheoretical actuarial tools (second generation), to theoretically informed structured instruments comprising at least some dynamic variables (third generation), and most recently, to tools that incorporate an ongoing case management component (fourth generation). Although a large body of research has demonstrated that structured approaches outperform first-generation approaches with regard to the prediction of recidivism (Andrews et al., 2006; Grove et al., 2000), contemporary forensic assessors and researchers have embraced, and debated the merits of, varied approaches (Singh et al., 2014).
The Evolution of the Clinical Versus Actuarial Debate
Reviews of clinical and actuarial prediction techniques have tended to favor the latter category, whether they have focused on wide-ranging fields (Dawes et al., 1989; Grove et al., 2000; Meehl, 1954) or focused on the prediction of recidivism in particular (Ægisdottir et al., 2006). Based on such findings, Ægisdottir and colleagues argued that disregarding the higher performing actuarial approaches could be unethical and may work to the detriment of public safety and individual rights. However, other researchers and practitioners have challenged the purported supremacy of strict actuarial techniques. Closely aligned with arguments on this side of the debate is the widespread use of the structured professional judgment (SPJ) approach. SPJ measures were developed to emphasize structure, prevention, and flexibility (Hart & Boer, 2009). Unlike actuarial tools that combine data in a predetermined fashion, SPJ tools typically require the evaluator to consider a number of risk factors individually, and then formulate final appraisals of risk or case prioritization on the balance of information. The manner of combining the data to formulate conclusions is left to the discretion of the evaluator. Arguments in favor of this approach have criticized the methodological rigor of actuarial research (e.g., Litwack, 2002), challenged the applicability of group data to individuals (Hart et al., 2007), and emphasized the identification of risk management targets over statistical predictors. Although these tools have been supported by meta-analyses treating them in an actuarial manner (Campbell et al., 2009; Rossdale et al., 2020; Yang et al., 2010) or in a binary manner (i.e., low vs. moderate/high; Fazel et al., 2012), such evaluative strategies deviate from the tools’ clinical application. That said, a meta-analytic review contained in an unpublished doctoral dissertation (Chevalier, 2017) found that SPJ summary risk ratings demonstrated predictive validity for recidivism. 
Additional studies that were not included in Chevalier’s review have also found support for SPJ summary risk ratings, both in terms of overall risk/case prioritization (e.g., Hogan & Olver, 2019; Vargen et al., 2020) and in terms of imminent risk of institutional outcomes (e.g., Hogan & Olver, 2016, 2018).
Combining Clinical and Actuarial Data and the Professional Override
In light of the aforementioned debate, still other authors have argued for a combination of actuarial and clinical decision making. Webster et al. (2002), for instance, argued for the use of clinical information to supplement actuarial and prediction-based assessment measures to form a more complete assessment report and integrated understanding of each case. Evaluators may wish to increase a person’s risk level if a risk factor appears to be driving offending behavior, but is not captured by a particular tool; alternatively, they may wish to reduce a person’s risk level if that person is actuarially high risk, but physically incapacitated. Practically speaking, professional overrides provide one mechanism for combining such data. For instance, the principle of the professional override has been included in the level of service (LS) family of assessment measures (discussed in greater detail below) since the original conception of the risk–need–responsivity (RNR) principles (Andrews et al., 1990). The risk principle of RNR posits that rehabilitative services are most effective when matched to the individual’s risk level (e.g., high-intensity treatment programming for the highest risk individuals, and minimal or no programming for the lowest risk individuals), and the use of structured risk scales is recommended to select risk levels. That said, overrides allow for one’s initial risk level, determined by scale items, to be either increased or decreased to arrive at a final risk level. The professional override is often described with the caveats that it should be used sparingly and only with reasonable justifications (Andrews et al., 2006); however, its use has not received the systematic monitoring originally recommended.
Based on studies of varied tools and offending populations, the incremental predictive value of the professional override has been challenged. For example, two published studies of overrides and the Level of Service/Case Management Inventory (LS/CMI; Andrews et al., 2004), focused on Canadian samples of persons who have sexually offended (PSO) from Ontario (Wormith et al., 2012) and persons convicted of any offense in Quebec (Guay & Parent, 2018), found that adjusted scores resulted in a reduction in predictive validity, and in no improvement in predictive validity, respectively. Studies conducted by Schmidt et al. (2016) and Vaswani and Merone (2014), examining a youth variant of the LS instruments, found that overrides reduced predictive validity for multiple recidivism categories, among adolescents in Canada and Scotland, respectively. McCafferty (2017) evaluated the effect of overrides on the predictive validity of a general risk assessment instrument for juveniles in Ohio, and found that overrides resulted in similar, but reduced, predictive validity for recidivism relative to unadjusted scores. Although they did not report detailed inferential statistics in their study, T. H. Cohen et al. (2016) examined overrides with an actuarial tool for general recidivism among persons under federal probation in the United States, and observed that those receiving upward overrides appeared to reoffend at rates associated with their unadjusted risk levels. The effect was reversed for downward overrides, but the authors urged caution in the interpretation of this trend, due to small numbers. Finally, in three studies using sexual offense–specific actuarial tools for adults (Duwe & Rocque, 2018; Hanson et al., 2007; Storey et al., 2012), overrides reduced the predictive accuracy of unadjusted scores for sexual recidivism.
Limited existing research has also ventured beyond predictive validity, to examine other aspects of overrides and their application. For instance, although there are some exceptions, including an older unpublished doctoral dissertation (Girard, 1999) and a more recent study by Guay and Parent (2018), most of the studies cited above suggest a tendency for assessors to override risk levels upward more than downward. This tendency is particularly notable in light of preliminary evidence from Guay and Parent’s study suggesting that upward overrides exerted a more positive effect on predictive validity than downward overrides. In Schmidt and colleagues’ (2016) study, youth who had sexually offended were more likely to be subjected to an override (74%) than those who had not done so (42%). A similar pattern was observed in Wormith and colleagues’ (2012) study, as the rate of overrides among the sexual offense group (35%) exceeded the rate observed in a non–sexual offense group (15%).
Individual factors, such as gender and race, may also play a role in the application of overrides. In one study, Hsu et al. (2009) examined the Level of Service Inventory–Revised (LSI-R; Andrews & Bonta, 1995) in an Australian sample, and noted that overrides were applied to males nearly twice as often as females. Although T. H. Cohen and colleagues (2016) also found that males received overrides at a higher rate than females, this discrepancy can be attributed to policy overrides; females received an approximately equivalent but marginally higher rate of discretionary overrides. Wormith and colleagues’ (2012) study suggested that the direction of overrides is relevant to the relationship with gender, as female gender was correlated with downward overrides. Regarding racial differences, Hsu and colleagues observed that overrides were applied more often to non-Indigenous males (14%) than their Indigenous (10%) counterparts. In a study of services offered to youth with probation sentences, Campbell and colleagues (2018) found that although scores on a youth variant of the LS instruments predicted court officers’ recommendations for programming, race was also an incremental predictor of program referrals. In contrast with Hsu and colleagues’ findings, members of racial minority groups were more likely to be referred for additional correctional programming after controlling for risk scores, which can be interpreted as a form of upward override. Thus, it appears that the relationships between diversity factors and overrides are complex.
Notably, the handful of studies evaluating the impact of overrides on risk assessment instruments have suggested that overrides often reduce, but do not necessarily negate, the predictive validity of these instruments. The research cited above also indicates that although rates vary, overrides are regularly applied in various criminal justice contexts. As such, replication and expansion of the existing literature are indicated, to better understand how overrides are applied, and how they may affect various instruments, groups, and criminal justice processes.
The LS Family of Risk/Need Assessment Instruments
The Level of Service Inventory–Ontario Revision (LSI-OR; Andrews et al., 1995) is a general risk/needs instrument, based on RNR principles, designed to match treatment intensity to risk level and to identify criminogenic needs. The core risk/need items of the LSI-OR are identical to those included in the internationally published LS/CMI and the Level of Service/Risk, Need, Responsivity (LS/RNR; Andrews et al., 2008), and as such, research evaluating the predictive validity of the general risk/need score of one instrument is equally applicable to the others (Wormith, 2019). The LS instruments are the most often used risk assessment tools in the world (Wormith, 2011), and have demonstrated predictive validity for recidivism among diverse groups (e.g., Girard & Wormith, 2004; Olver & Kingston, 2019; Rettinger, 1998). Meta-analytic evidence (Olver, Stockdale, & Wormith, 2014) supports the predictive validity of the LS instruments in general, and also suggests that the LSI-OR risk component performs as well as or better than the previous iterations.
The Current Study
The studies reviewed above suggest that overrides do not improve upon the predictive validity of purely actuarial methods. However, it also appears that adjusted actuarial scores may nonetheless predict recidivism, and various authors have offered empirical and theoretical arguments in favor of overrides and SPJ processes, which arguably involve analogous processes. The research base focused on overrides remains relatively small, particularly when considering individual instruments. Given that the LS tools are widely used to inform criminal justice processes, and that professionals are applying overrides using processes that are not well or fully understood, we believe that further research is warranted.
The current study was designed to shed further light on the nature of professional overrides applied to the LSI-OR, and their associations with individual characteristics. To this end, we examined data pertaining to persons with criminal convictions (PCC) from Ontario, which included LSI-OR risk ratings completed by supervising officers. First, we sought to describe and examine how overrides were used to adjust risk levels overall. We hypothesized that more overrides would be used to increase than decrease risk levels. Second, we examined whether overrides were used more with some index offense groups than others, and the extent to which such categorization predicted overrides. We hypothesized that overrides would be applied more often to males than females, and that sexual index offenses would predict upward overrides. Due to the limited and inconsistent findings cited above (Campbell et al., 2018; Hsu et al., 2009), we did not formulate hypotheses based on race; that said, we conducted exploratory analyses to determine whether racial categories were associated with discrepant override rates. Third, we examined the impacts of overrides on the predictive validity of the LSI-OR for various outcomes. Based on extant research, we hypothesized that overrides would decrease predictive validity for general recidivism. In light of the wide application of the LSI-OR and its foundation as a tool to predict general recidivism, we also explored whether the impacts of overrides on predictive validity differed among subcategories of recidivism. Finally, based on a suggestion from a peer reviewer, and in keeping with Guay and Parent’s (2018) analyses, we conducted separate predictive validity analyses to explore the impacts of upward and downward overrides.
Method
Sample
The sample was derived from a large archival database, comprising all PCC in Ontario, who (a) were released from custody after serving at least 1 month’s sentence, (b) commenced a conditional sentence, or (c) commenced a probation or intermittent sentence in the calendar years of 2007 and 2008 within the Ontario Ministry of Community Safety and Correctional Services (MCSCS). The original sample consisted of 41,117 PCC, after duplicate entries were removed. Cases were deleted for a number of reasons, including death during sentence (257 cases), release to a penitentiary or other correctional facility (49 cases), and release to immigration- or deportation-related proceedings, or other jurisdictional transfers (444 cases). A further 1,850 cases were removed because their files indicated that they recidivated, but the offense occurred prior to either their custody release date or their LSI-OR assessment date. The final sample consisted of 40,539 PCC; Table 1 contains demographic information for the overall sample, and for index offense–based subsamples. Individuals were categorized according to index offense type (i.e., violent, sexual, or nonviolent) using the Offense Severity Scale (OSS; Stasiuk et al., 1996), which discriminates among Canadian Criminal Code offenses. For example, violent offenses were those that involved acts, attempted acts, and threatened acts with the potential to cause physical harm to another person (e.g., homicide and related offenses, assault and related offenses). Approximately 53.0% of the sample (n = 21,471) comprised persons with violent index offenses (PVIO), 3.4% (n = 1,357) comprised persons with sexual index offenses (PSIO), and 43.7% (n = 17,709) comprised persons with nonviolent (i.e., neither violent nor sexual) index offenses (PNVIO).
Table 1. Index Offense Frequencies and General Recidivism Base Rates, Stratified by Demographic Variables
Note. n = 38,689. Row percentages are offered separately for gender and race.
Measures
The LSI-OR
An LSI-OR assessment is required for all PCC in Ontario serving a sentence of at least 30 days (Wormith, 1997). The main users of the LSI-OR are probation, parole, and correctional officers. The LSI-OR contains both a General Risk/Need and Specific Risk/Need section. The General Risk/Need Scale (Section A) contains eight subscales: Criminal History (eight items), Education/Employment (nine items), Family/Marital (four items), Leisure/Recreation (two items), Companions (four items), Procriminal Attitude/Orientation (four items), Substance Abuse (eight items), and Antisocial Pattern (four items). Each of these 43 items is scored on a dichotomous scale (i.e., 1 = present, 0 = absent), and the items are summed to create a total General Risk/Need score and eight subscale scores.
The General Risk/Need score has a possible range of scores from 0 to 43, which correspond with one of five ordinal risk categories; total scores of 0 to 4 correspond with very low risk, scores of 5 to 10 correspond with low risk, scores of 11 to 19 correspond with medium risk, scores of 20 to 29 correspond with high risk, and scores of 30 to 43 correspond with very high risk. Assessors may apply a professional override, which increases or decreases the initial risk level to a different final risk level. When initial risk levels are overridden, an override change score can be computed by subtracting the initial risk level from the final risk level (e.g., +3 for a case overridden from low [second category] to very high [fifth category] risk).
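The scoring rules described above can be illustrated with a minimal sketch. The score bands and the override change score follow the description in the text; the function names and data layout are hypothetical and are not part of the LSI-OR or its scoring materials.

```python
# Illustrative sketch only: names are hypothetical, not part of the LSI-OR.
# Score bands follow the five ordinal categories described in the text.
RISK_BANDS = [
    (0, 4, "very low"),
    (5, 10, "low"),
    (11, 19, "medium"),
    (20, 29, "high"),
    (30, 43, "very high"),
]


def initial_risk_level(total_score):
    """Return (ordinal level 1-5, label) for a General Risk/Need total of 0-43."""
    for level, (low, high, label) in enumerate(RISK_BANDS, start=1):
        if low <= total_score <= high:
            return level, label
    raise ValueError("total score must be an integer from 0 to 43")


def override_change_score(initial_level, final_level):
    """Final minus initial ordinal level; positive values are upward overrides."""
    return final_level - initial_level


level, label = initial_risk_level(8)     # a total of 8 falls in the low band
print(level, label)                      # 2 low
print(override_change_score(level, 5))   # 3 (low overridden to very high)
```

A change score of 0 corresponds to a case in which no override was applied.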
Outcome Data
Recidivism was defined as any offense that returned the individual to MCSCS custody as documented in MCSCS’ Offender Tracking Information System (OTIS). This outcome was selected because it is more inclusive than formal convictions; because all individuals were under community supervision, a formal trial and conviction were not necessary to trigger a return to custody. Although no approach is perfect, we believe this approach more accurately captures new offending behavior and addresses some of the concerns associated with conservative outcome variables and underreporting of crime in general (Scurich & John, 2019). Five recidivism variables were computed. First, a dichotomous general recidivism variable was created (yes = 1, no = 0). Second, a categorical outcome variable was computed to discriminate among types of recidivism (i.e., 1 = violent, 2 = sexual, 3 = nonviolent, 4 = unknown). Three additional dichotomous variables were created for violent, sexual, and nonviolent recidivism.
Procedure
Ethical review and approval were provided by a Western Canadian university. All study data were extracted by an MCSCS employee and provided to the authors in SPSS and Excel files. Data were screened for outliers and data entry errors prior to analyses, which were conducted with IBM SPSS Statistics version 21.0. Individuals who met the criteria for being released from a custodial sentence or for starting a conditional, probation, or intermittent sentence in 2007 and 2008 were identified in OTIS and had their demographic, criminal history, and LSI-OR information extracted on May 15, 2013. The latter types of sentences are forms of community supervision, under which an individual is allowed to reside in the community under specific conditions; intermittent sentences allow the individual to serve a custodial sentence on weekends, while remaining under community supervision on weekdays. The aforementioned information was then merged with the individuals’ release from supervision and recidivism information. Using this sample from the 2007 and 2008 calendar years allowed for a follow-up period between approximately 4 and 6 years for all individuals.
Data Analytic Plan
Although the magnitude of effects is likely of greatest interest to most readers, statistical significance tests were conducted and reported for reference (threshold for statistical significance set at p < .05). Receiver operating characteristic (ROC) analyses to generate area under the curve (AUC) values were conducted to assess the baseline predictive validity of the LSI-OR for all recidivism variables. Higher AUC values indicate stronger predictive accuracy (e.g., an AUC of .75 indicates a 75% probability of a randomly selected recidivist scoring higher on the LSI-OR than a randomly selected nonrecidivist). According to Rice and Harris (2005), AUC values of .556, .639, and .714 may be considered categorical thresholds for small, medium, and large effect sizes, respectively. These analyses were stratified across multiple groups based on demographics, index offense, and recidivism type. Analyses were repeated using the initial and final risk levels to determine the effect of the professional override on predictive validity. Comparisons among AUC values involved reviewing 95% confidence intervals for overlap, which constitutes an approximate test of statistical significance.
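The probabilistic interpretation of the AUC described above can be made concrete with a short sketch. It computes the AUC directly as the proportion of recidivist and nonrecidivist pairs in which the recidivist has the higher risk level (ties counted as one half), and labels the result using the Rice and Harris (2005) thresholds cited in the text. The data and function names are invented for illustration; this is not the SPSS procedure used in the study.

```python
def auc(recidivist_scores, nonrecidivist_scores):
    """Probability that a random recidivist outscores a random nonrecidivist.

    Ties are counted as one half, which matches the usual ROC area for
    ordinal predictors such as five-level risk categories.
    """
    wins = 0.0
    for r in recidivist_scores:
        for n in nonrecidivist_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(recidivist_scores) * len(nonrecidivist_scores))


def effect_size_label(auc_value):
    """Label an AUC using the Rice and Harris (2005) thresholds."""
    if auc_value >= 0.714:
        return "large"
    if auc_value >= 0.639:
        return "medium"
    if auc_value >= 0.556:
        return "small"
    return "below small"


# Hypothetical risk levels (1-5), invented for illustration only.
recidivists = [3, 4, 4, 5, 2]
nonrecidivists = [1, 2, 3, 2, 4]

value = auc(recidivists, nonrecidivists)
print(round(value, 2), effect_size_label(value))  # 0.78 large
```

Running the same function once with initial risk levels and once with final risk levels, for the same cases, gives the kind of paired comparison described in this section.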
Multiple comparisons of override rates were conducted. Mann–Whitney tests were used for binary comparisons of risk levels and override scores between males and females. Corresponding effect sizes, based on r (Fritz et al., 2012), were computed, and described using J. Cohen’s (1992) effect size descriptors of small (.10), medium (.30), and large (.50). Kruskal–Wallis tests were used for comparisons involving more than two groups.
To further evaluate relationships among individual characteristics and the application of overrides, we conducted a series of hierarchical logistic regression analyses. Characteristic(s) of interest were entered in the first step, followed by the addition of initial risk levels in the second step. These analyses were intended to examine the magnitude of apparent relationships between salient individual variables and overrides, and the extent to which these relationships were independent of differences in risk. Finally, Cox regression survival analyses were conducted to investigate whether override change scores made significant contributions to the prediction of time to recidivism, controlling for initial risk level and for time at risk.
Results
The overall base rate of general recidivism was 43.5% (n = 16,816). Of the observed recidivism offenses, 40.2% were violent (n = 6,855), 1.3% were sexual (n = 226), and 58.5% (n = 9,733) were nonviolent. Base rates of general recidivism for PVIO, PSIO, and PNVIO were 44.3%, 25.7%, and 43.8%, respectively. Further information pertaining to the recidivism base rates stratified by demographic variables is presented in Table 1. Mean follow-up time across the entire sample (for recidivists, the recidivism offense constituted the end of follow-up) was 3.9 years, with a standard deviation of 2.2 years, and a maximum of 6.4 years. Although not a primary focus of the study, ROC analyses were conducted to establish the predictive validity of the General Risk/Need scores, to inform consideration of the generalizability of other findings. Large AUCs (.747–.762, p < .001) were observed for general recidivism, among the overall sample, males and females, and all index offense groups.
Use of the Override
To evaluate the frequency and nature of overrides, we compared initial and final risk levels. In total, 15.4% of cases received an override (5,954), with 94.1% of overrides used to increase risk levels (5,601) and 5.9% used to decrease risk levels (353). Table 2 contains an initial-by-final risk-level matrix, with recidivism rates in each cell in parentheses (e.g., of the 211 cases with an initial very low risk level and a final low risk level, 14.7% recidivated).
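As a rough illustration of the comparison described above, the following sketch derives an override rate and the shares of upward and downward overrides from paired initial and final risk levels. The cases are hypothetical and the function name is an assumption introduced here; the study's figures came from the archival MCSCS data.

```python
def summarize_overrides(initial_levels, final_levels):
    """Summarize override frequency and direction from paired ordinal levels (1-5)."""
    changes = [final - initial for initial, final in zip(initial_levels, final_levels)]
    upward = sum(1 for change in changes if change > 0)
    downward = sum(1 for change in changes if change < 0)
    overridden = upward + downward
    return {
        "override_rate": overridden / len(changes),
        "share_upward": upward / overridden if overridden else 0.0,
        "share_downward": downward / overridden if overridden else 0.0,
    }


# Ten hypothetical cases, invented for illustration only.
initial = [1, 2, 3, 3, 4, 2, 5, 3, 2, 1]
final = [1, 3, 3, 5, 4, 2, 4, 3, 2, 1]

summary = summarize_overrides(initial, final)
print(summary)  # 3 of 10 cases overridden; 2 upward, 1 downward
```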
Table 2. Distribution and Recidivism Rates for All Offenders in an Initial-by-Final Risk-Level Matrix
Note. Percentages denote recidivism rates of individuals in each cell.
Gender and Overrides
In all, 16.4% of males (5,321) received an override, with 95.0% of overrides (5,053) increasing risk levels and 5.0% (268) decreasing risk levels; 10.2% of females (633) received an override, with 86.6% (548) increasing risk levels and 13.4% (85) decreasing risk levels. Mann–Whitney tests supported the observations that, relative to females, males had higher initial risk levels (U = 89,191,284.00, Z = −14.835, p < .001) and final risk levels (U = 83,550,322.00, Z = −22.254, p < .001). Based on effect sizes (Fritz et al., 2012), the difference in initial risk levels based on gender (r = .08) did not meet Cohen’s threshold for a small effect, but the difference in final risk levels did pass this threshold (r = .11). Using hierarchical logistic regression analysis (Table 4), male gender predicted upward overrides when entered alone (eB = 1.90, p < .001), and incrementally predicted upward overrides to a greater degree when entered alongside initial risk level (eB = 2.39, p < .001).
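The effect sizes reported above appear consistent with the common conversion r = |Z| / √N described by Fritz et al. (2012). The sketch below applies that conversion to the reported Z values. The total N of 38,689 is an assumption taken from the table notes, and the exact N underlying each test may differ slightly; the function names are hypothetical.

```python
import math


def mann_whitney_r(z, n_total):
    """Effect size r for a Mann-Whitney test, computed as |Z| / sqrt(N)."""
    return abs(z) / math.sqrt(n_total)


def cohen_label(r):
    """Label r using J. Cohen's (1992) descriptors: .10, .30, and .50."""
    if r >= 0.50:
        return "large"
    if r >= 0.30:
        return "medium"
    if r >= 0.10:
        return "small"
    return "below small"


N_ASSUMED = 38689  # assumption: sample size taken from the table notes

r_initial = mann_whitney_r(-14.835, N_ASSUMED)  # gender difference, initial levels
r_final = mann_whitney_r(-22.254, N_ASSUMED)    # gender difference, final levels

print(round(r_initial, 2), cohen_label(r_initial))  # 0.08 below small
print(round(r_final, 2), cohen_label(r_final))      # 0.11 small
```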
Race and Overrides
Subsamples stratified by self-reported race differed on initial risk levels, final risk levels, and override change scores (Table 3); the differences were supported by Kruskal–Wallis tests. Hierarchical logistic regression analyses were conducted to further evaluate the impact of race on upward override use, with detailed information provided in Table 4. Identifying as White and identifying as Indigenous were each associated with increased odds of upward overrides, although the odds ratios decreased in magnitude when controlling for initial risk level. Identifying as Hispanic and identifying as Asian were each associated with decreased odds of upward overrides; these odds ratios also decreased in magnitude when controlling for initial risk level. Neither identifying as Black nor selecting the Other category (i.e., racial identity documented as Other, unknown, or the individual declined to specify) incrementally predicted overrides beyond initial risk levels.
Table 3. Mean Differences Between Offenders by Race on Initial Risk Levels, Final Risk Levels, and Override Change Scores
n = 24,380.
p < .001.
Table 4. Logistic Regression Analyses: Incremental Associations of Diversity Factors and Offense Category With Upward Overrides, Controlling for Initial Risk Level
Note. n = 38,688. CI = confidence interval.
*p < .05. **p < .01. ***p < .001.
Index Offense Category and Overrides
Of the PSIO, nearly half (49.8%) received an override, with 98.8% of these overrides used to increase risk levels. In contrast, of the PVIO, only 18.9% received an override, with 95% of overrides used to increase risk levels. Finally, of the PNVIO, 8.6% received an override, with 88.6% used to increase risk levels. As seen in Table 5, the PSIO group demonstrated the lowest mean initial risk level, followed by PNVIO, and then PVIO. In contrast, the PSIO demonstrated the highest mean final risk level, and the highest mean override change score. These differences were supported by Kruskal–Wallis tests. Using hierarchical logistic regression analysis (Table 4), having a sexual index offense predicted upward overrides when entered alone in the first step (eB = 6.34, p < .001), and incrementally predicted upward overrides with larger odds ratio when entered alongside initial risk level in the second step (eB = 7.02, p < .001).
Table 5. Mean Differences on Initial Risk Levels, Final Risk Levels, and Override Change Scores
Note. n = 38,687.
n = 20,450.
p < .001.
Overrides and Predictive Validity
Predictive Validity of the Initial and Final Risk Levels
Additional ROC analyses were conducted for the prediction of the four recidivism variables using the initial and final risk levels (Table 6). Examining the entire sample, the AUC values were smaller for final risk levels than for initial risk levels with nonoverlapping 95% confidence intervals, for violent, nonviolent, and general recidivism. Final risk levels had higher AUC values for sexual recidivism than initial risk levels, although the 95% confidence intervals overlapped in this case; readers should note that the analyses pertaining to sexual recidivism among females are included for reference only, as they are based on only three instances of recidivism. With regard to the magnitude of effects, using Rice and Harris’ (2005) descriptors, the initial risk levels produced a large AUC for general recidivism, whereas the final risk levels produced an AUC at the upper threshold of the medium range. Both initial and final risk levels produced medium-sized AUCs for violent and for nonviolent recidivism. As for sexual recidivism, the AUC for initial risk levels was small, whereas the AUC for final risk levels was medium. Table 6 contains detailed information pertaining to AUCs based on varied outcomes and index offense types.
Table 6. AUC Values for the Initial and Final Risk Levels’ Predictive Validity for Recidivism
Note. 95% confidence intervals in brackets. AUC = area under the curve.
Analyses pertaining to sexual recidivism among females included only three recidivists.
*p < .05. **p < .01. ***p < .001.
Table 7. Cox Regression Survival Analyses: Examination of Incremental Validity of Override Change Scores for Recidivism
Note. n = 38,688. CI = confidence interval.
*p < .05. **p < .01. ***p < .001.
Incremental Predictive Validity of Override Change Scores
When controlling for initial risk level and time-at-risk using Cox regression survival analysis (Table 7), override change scores did not predict general recidivism. Comparable findings with regard to subtypes of recidivism were mixed. When controlling for initial risk level, upward overrides were associated with a reduced hazard of nonviolent recidivism, but an increased hazard of violent and sexual recidivism. Two further analyses were conducted, to parse out the potentially unique contributions of upward and downward overrides, respectively. Neither type of override incrementally predicted recidivism, after controlling for initial risk level.
Discussion
The current study examined the professional override function of the LSI-OR among a diverse Canadian sample. We endeavored to determine whether and how it was used in this sample, including whether the override was applied more often with certain groups, and whether and how the override affected predictive validity.
Overrides were used in just over 15% of cases, and as hypothesized, considerably more of these overrides were used to increase (94.1%) than to decrease (5.9%) risk levels. The finding that professional overrides were disproportionately used to increase rather than decrease risk levels is generally consistent with previous research studies of various tools and populations (e.g., T. H. Cohen et al., 2016; Hogg, 2011; McCafferty, 2017; Schmidt et al., 2016; Wormith et al., 2012), although Guay and Parent’s (2018) study using a French translation of the LS/CMI is a notable exception to this pattern. The tendency to inflate risk ratings through professional judgment observed in this study is arguably reminiscent of the first-generation risk assessment practices that came to light through study of the Baxstrom cohort (Steadman & Keveles, 1972), and which ultimately helped precipitate the wide adoption of structured tools.
Rates of override use notably varied with gender. Males received marginally higher initial risk–level ratings, but the discrepancy increased for final risk ratings. Logistic regression analyses indicated that when holding initial risk level constant, male gender was associated with a 139% increase in the odds of receiving an upward override. Although the literature summarized above does not contain a clear consensus regarding gender and overrides (T. H. Cohen et al., 2016; Hsu et al., 2009; Wormith et al., 2012), we consider the current results to be generally consistent with a trend in the literature for male gender to be associated with upward overrides. Considered alongside evidence indicating that females who have offended tend to receive more lenient sentences than their male counterparts (Sandler & Freeman, 2011; Shields & Cochran, 2020), these findings support Shields and Cochran’s hypothesis that females tend to be perceived by professionals as posing a lower risk than males, even when holding actuarial risk scores constant. Conclusions regarding whether these apparent gender discrepancies are justified, including the observed discrepancy in override use, may be informed by the predictive validity findings described below.
With regard to override differences based on self-identified race, the results were complex and suggested that baseline risk is critical to the interpretation of the data. Each group was subjected to an overall increase in risk level, although the magnitude of changes varied. When comparing among the groups using simple override change scores, individuals identifying as Asian saw the greatest increases in risk level, followed in decreasing order by those identifying as Hispanic, Black, White, and Indigenous. However, hierarchical logistic regression analyses, controlling for initial risk levels, revealed a different pattern. For instance, despite seeing the smallest overall increase in risk level, the Indigenous category was associated with the largest odds of an upward override of any racial group. The odds ratio indicated that identifying as Indigenous was associated with a 198% increase in the odds of an upward override before controlling for risk; when holding initial risk level constant, it was associated with a smaller, but nonetheless notable, 55% increase in the odds of an upward override. In comparison, although individuals identifying as Asian were subjected to the largest overall increase in risk level, logistic regression analyses indicated that this category was actually associated with a 30% reduction in the odds of an upward override when controlling for initial risk level. These results build on existing literature (Campbell et al., 2018; Hsu et al., 2009) in that they suggest that the relationship between race and overrides varies among particular groups, and that meaningful interpretations of such discrepancies require consideration of diversity, as well as baseline risk. In light of the impacts of risk assessments on the rights of individuals, these results highlight the need for further study of the variables that might explain such discrepancies.
As expected, variability in rates of override use was also observed among the index offense–based categories. Consistent with expectations and previous research (Schmidt et al., 2016; Wormith et al., 2012), although all groups were subjected to increases in risk level, PSIO were subjected to the greatest overall increase. Although PSIO had lower initial risk levels than PVIO and PNVIO, they had significantly higher final risk levels than the other groups. The odds ratio from logistic regression analysis indicated that having a sexual index offense was associated with a 534% increase in the odds of an upward override before controlling for risk; when holding initial risk level constant, it was associated with a 602% increase in the odds of an upward override.
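For readers less familiar with the odds metric, the percentage changes in odds reported in this section can be restated as odds ratios (OR = 1 + percentage change / 100) and, given a baseline rate, as probabilities. The following is a minimal Python sketch of that arithmetic only. The function names are illustrative, and the 10% baseline override rate is a hypothetical value chosen for the example rather than a figure estimated in this study.

```python
def percent_change_to_odds_ratio(percent_change):
    """Convert a percentage change in odds (e.g., 239 or -30) to an odds ratio."""
    return 1.0 + percent_change / 100.0


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Return the probability implied by multiplying the baseline odds by odds_ratio."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Percentage changes in the odds of an upward override, as reported in the text
# (all with initial risk level held constant).
reported_percent_changes = {
    "male gender": 239,
    "Indigenous": 55,
    "Asian": -30,
    "sexual index offense": 602,
}

# HYPOTHETICAL baseline probability of an upward override, for illustration only.
hypothetical_baseline = 0.10

odds_ratios = {
    label: percent_change_to_odds_ratio(pct)
    for label, pct in reported_percent_changes.items()
}

for label, odds_ratio in odds_ratios.items():
    implied = apply_odds_ratio(hypothetical_baseline, odds_ratio)
    print(f"{label}: OR = {odds_ratio:.2f}, implied probability = {implied:.3f}")
```

The sketch illustrates why a large percentage change in odds does not translate into an equally large change in probability: the implied probability depends on the baseline rate, which is why the baseline here is flagged as an assumption.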
The disproportionately high rate of overrides among the sexual offense group warrants comment, particularly considering the group’s relatively low rate of recidivism. On one hand, this trend could reflect the fact that the LSI-OR is arguably less applicable to, and less predictive of (Olver, Stockdale, & Wormith, 2014), sexual recidivism than other types of recidivism. Specialized risk tools are in common use, reflecting empirical evidence that certain risk factors, such as deviant sexual interests, are differentially predictive of sexual recidivism (e.g., Hanson & Morton-Bourgon, 2005). In the current sample, it is possible that specialized sexual recidivism risk assessments were used to inform overrides of the LSI-OR risk levels, which could arguably constitute an appropriate and standardized application of the override, particularly in light of the Cox regression analyses suggesting that upward overrides were associated with a 172% increase in the hazard of sexual recidivism. Unfortunately, information regarding any additional assessments applied to this sample was not available. On the other hand, it is also possible that this trend reflects the fact that members of the public and professionals alike tend to overestimate sexual recidivism rates (Brown et al., 2008; Olver & Barlow, 2010) and to hold negative attitudes toward PSO (Corăbian, 2016; Craig, 2005). Such attitudes and beliefs are so prevalent that they have been identified as potential impediments to effective policy development, treatment interventions, community reintegration, and the development of appropriate community-based support for PSO (Corăbian & Hogan, 2012; Willis et al., 2010). 
Although the relative contributions of these types of factors could not be evaluated, our findings suggest that a fuller understanding of the professional override may be gleaned from studies that differentiate among index offense types, rather than assuming homogeneous patterns of override use across populations.
With regard to the impact of professional overrides on predictive validity, the results varied by outcome. As evidenced by AUC values and associated 95% confidence intervals, the predictive validity of the LSI-OR risk levels for general, violent, and nonviolent recidivism decreased as a result of overrides, whereas the predictive validity of the LSI-OR for sexual recidivism appeared to increase as a result of overrides. It is also worth noting that overrides did not improve predictive accuracy for either gender group. Due to low numbers, readers should interpret the reported AUCs related to female PSIO (n = 16) and female sexual recidivism cautiously; only three females sexually recidivated, leading to questionable, and ultimately statistically nonsignificant (p > .05), results. The general tendency for unadjusted metrics to produce larger predictive effect sizes is consistent with previous research using the LSI-OR and LS/CMI (Guay & Parent, 2018; Hogg, 2011; Wormith et al., 2012); however, it should be noted that the final risk levels also predicted recidivism, often with respectable medium-sized effects. With regard to the ROC findings related to sexual recidivism, readers should also note that when comparing the predictive validity of the initial and final risk levels for sexual recidivism among PSIO alone, the results were reversed, with initial risk levels demonstrating stronger predictive validity than the final risk levels. This is likely the more representative and meaningful comparison due to base rate considerations, and reflects the approach taken by Olver, Stockdale, and Wormith (2014) in their meta-analysis.
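To make the AUC comparison concrete, the following minimal Python sketch computes the AUC as the probability that a randomly selected recidivist received a higher risk level than a randomly selected nonrecidivist, with ties counted as one half. The eight cases, their risk levels, and their outcomes are entirely hypothetical and were constructed only to show how upward overrides applied to individuals who did not reoffend can lower the AUC; they are not data from this study, and the function is an illustration rather than the analytic procedure used here.

```python
def auc(risk_levels, outcomes):
    """AUC as the proportion of (recidivist, nonrecidivist) pairs in which the
    recidivist has the higher risk level; tied pairs count as one half."""
    recidivists = [r for r, y in zip(risk_levels, outcomes) if y == 1]
    nonrecidivists = [r for r, y in zip(risk_levels, outcomes) if y == 0]
    if not recidivists or not nonrecidivists:
        raise ValueError("AUC requires at least one case in each outcome group")
    concordant = 0.0
    for r_pos in recidivists:
        for r_neg in nonrecidivists:
            if r_pos > r_neg:
                concordant += 1.0
            elif r_pos == r_neg:
                concordant += 0.5
    return concordant / (len(recidivists) * len(nonrecidivists))


# HYPOTHETICAL data: ordinal risk levels (1 = very low ... 5 = very high).
initial_levels = [1, 2, 2, 3, 3, 4, 4, 5]
recidivated = [0, 0, 0, 0, 1, 0, 1, 1]

# Two nonrecidivists receive upward overrides (2 -> 4 and 3 -> 5).
final_levels = [1, 4, 2, 5, 3, 4, 4, 5]

auc_initial = auc(initial_levels, recidivated)
auc_final = auc(final_levels, recidivated)
print(f"AUC, initial levels: {auc_initial:.3f}")
print(f"AUC, final levels:   {auc_final:.3f}")
```

In this constructed example the overrides move two nonrecidivists above or level with the recidivists, which reduces the share of correctly ordered pairs and therefore the AUC.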
Further detail regarding predictive validity was gleaned from Cox regression survival analyses, which demonstrated that overrides did not incrementally predict general recidivism, the outcome the instrument was designed and calibrated to predict, after controlling for initial risk level and time at risk. For nonviolent recidivism, again controlling for the above-noted variables, an upward override of one level was associated with an 8% reduction in the hazard of recidivism. In contrast, comparable analyses focused on violent recidivism and sexual recidivism suggested that an upward override of one level was associated with an increase in the hazard of recidivism of 11% and 172%, respectively. As discussed previously, it is possible that the increase in the hazard of sexual recidivism associated with overrides could reflect appropriately applied overrides, based on formal or informal specialized assessment criteria. However, in light of the aforementioned research on professional attitudes and fear associated with sexual offending, it is also possible that in some cases, the act of designating a person who has sexually offended as high risk increased the likelihood of any future misconduct being detected and sanctioned. Given the disparate effects of overrides on the different outcomes, future research should likely move beyond a rudimentary comparison of predictive effect sizes, to evaluate how overrides may influence outcomes, rather than simply how they predict them.
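The per-level hazard changes reported above can be restated as hazard ratios and as Cox regression coefficients (the natural log of the hazard ratio). The minimal Python sketch below shows that arithmetic. It also extrapolates to an override of two levels, which rests on an assumption not confirmed by this article: that override magnitude entered the model as a single linear term, so that per-level effects multiply. Function names are illustrative.

```python
import math


def hazard_ratio_from_percent(percent_change):
    """Convert a percentage change in hazard (e.g., 172 or -8) to a hazard ratio."""
    return 1.0 + percent_change / 100.0


def cox_coefficient(hazard_ratio):
    """Cox regression coefficient (log hazard ratio) implied by a hazard ratio."""
    return math.log(hazard_ratio)


def hazard_ratio_for_levels(per_level_hazard_ratio, levels):
    """Hazard ratio for an override of several levels, ASSUMING a linear
    (per-level multiplicative) effect; this is an illustrative assumption."""
    return per_level_hazard_ratio ** levels


# Percentage changes in hazard per one-level upward override, as reported in the text.
reported_percent_changes = {"nonviolent": -8, "violent": 11, "sexual": 172}

hazard_ratios = {
    outcome: hazard_ratio_from_percent(pct)
    for outcome, pct in reported_percent_changes.items()
}

for outcome, hr in hazard_ratios.items():
    beta = cox_coefficient(hr)
    two_level = hazard_ratio_for_levels(hr, 2)
    print(f"{outcome}: HR = {hr:.2f}, coefficient = {beta:.3f}, "
          f"two-level HR under the linearity assumption = {two_level:.2f}")
```

A hazard ratio below 1 corresponds to a negative coefficient, as for nonviolent recidivism, and a hazard ratio above 1 to a positive coefficient, as for violent and sexual recidivism.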
Potentially complicating the interpretation of these results is the fact that the data analyzed in this study came from real-world applications of the tool. That is, the LSI-OR was being used not to predict recidivism, but to match services to individuals to reduce risk. As such, one might reasonably argue that by overriding risk levels upward, the evaluators may have allowed the individuals to access additional resources, thereby reducing the likelihood of a new offense, and in turn undermining predictive validity. Although such an effect is possible, it bears mentioning that the initial risk levels and total risk/need scores are designed for this purpose, and these metrics remained predictive of recidivism despite their own potential relationship to intervention. Nonetheless, tracking intervention data would assist in the interpretation of the results of comparable future studies.
Some clinicians and researchers (e.g., Webster et al., 2002) have argued that evaluators should be free to supplement actuarial measures with clinical information to form a comprehensive understanding of the risk/need profile of each client. In theory, the consideration of unique characteristics and circumstances when assessing risk is defensible. For instance, confidence in actuarial-based risk estimates might decrease if an individual is physically incapacitated by injury or illness. In other cases, evaluators may wish to incorporate the potentially risk-reducing effects of effective correctional treatment (Gannon et al., 2019). Alternatively, an upward override may be warranted in certain circumstances, based on the presence of a particularly salient risk factor, or based on concerns regarding the potential severity of recidivism (e.g., violent rape vs. technical breaches). At minimum, the fact that adjusted risk levels predict recidivism, as do SPJ summary risk ratings, suggests that such approaches should not be dismissed altogether. However, research conducted by Schaefer and Williamson (2018) suggests that deviations from assessment guidelines can also be influenced by other assessor variables, ranging from exhaustion to self-confidence. Regardless, the current results indicate that real-world applications of the professional override did not improve the utility of the LSI-OR, and therefore may not constitute an appropriate mechanism for incorporating clinical information. One promising alternative is the consideration of structured measures of dynamic risk. 
A growing research base suggests that when using appropriate instruments, change measurement can improve risk prediction for violence (Coupland & Olver, 2020; de Vries Robbé et al., 2015; Hogan & Olver, 2016, 2018, 2019; Lewis et al., 2013; Penney et al., 2016), sexual recidivism (Beggs & Grace, 2011; Olver, Beggs Christofferson, et al., 2014; Olver et al., 2007), and general recidivism (Rojas & Olver, 2020).
Finally, this study also raises issues regarding the strengths and limitations of categorical/nominal risk communication more generally. Recent efforts have been undertaken to standardize risk communication (Hanson et al., 2017) and to provide more precise definitions of risk categories. Although the implications of the common risk language are still being investigated (Hogan, 2020; Hogan & Sribney, 2019), greater precision in defining risk levels may carry over into improved decision making related to professional overrides.
This study demonstrated that the predictive validity of LSI-OR risk ratings was reduced by the application of professional overrides among a large and diverse sample. The main limitation of this research is its observational nature. Although the size and representativeness of the sample can increase confidence in the generalizability of the results, the findings do not speak to why the observed patterns were present. For example, it is not possible to determine why evaluators tended to override risk ratings upward or why they disproportionately applied overrides to PSIO. In light of the large number of analyses, the possibility of Type I errors should also be acknowledged, although concerns in this regard are mitigated by the large sample size, robust effect sizes, and consistency with extant research. In addition, the current results pertain to assessments and follow-ups that occurred between 2007 and 2013 and, therefore, the results may be less representative of current circumstances than more recent data; that said, the apparent consistencies between this and other relevant studies arguably mitigate such concerns. The nature of the study also provides some significant advantages. By reflecting real-world practices, the findings provide insight into the field validity of the LSI-OR, LS/CMI, and LS/RNR. Moreover, despite some limitations such as the lack of information regarding interventions, the data did allow for the consideration of race, offense type, and gender.
Ultimately, the results of this study indicate that professional overrides should be applied cautiously. If nothing else, it would be prudent for evaluators to document and standardize their processes for applying overrides so that they may be evaluated with greater rigor in the future, particularly given that little standard guidance is offered by tool developers. Future research may explore what factors and circumstances might contribute to beneficial and problematic overrides, respectively. Although risk assessment requires evaluators to exercise good judgment, the potential pitfalls of biases and heuristics in risk assessment remain relevant concerns five decades on from the release of the Baxstrom patients (Steadman & Keveles, 1972).
Footnotes
Authors’ Note:
Both Laura Orton and Neil Hogan gratefully acknowledge the invaluable support and influence of Dr. Stephen Wormith on their academic careers, and wish to dedicate this article to his memory. This manuscript stems from a master’s thesis project undertaken by Laura Orton, under the supervision of J. Stephen Wormith. Dr. J. Stephen Wormith received royalties from sales of the Level of Service/Case Management Inventory from its publisher, Multi-Health Systems. The views expressed in this article do not necessarily reflect the views of the authors’ respective organizations.
