Abstract
Effective intervention with offenders requires accurate identification of their risk-relevant propensities. In this prospective study, 139 Canadian community supervision officers were trained to assess the risk factors and criminogenic needs of adult male sexual offenders using structured risk tools. Recidivism outcomes were recorded for 768 offenders (average age of 41 years, approximately half had child victims, 14% Aboriginal) during an average 7-year follow-up period. All forms of recidivism (sexual, violent, any) were predicted by sex crime specific risk tools based on static, historical factors (Static-99R; Static-2002R) and by tools designed to assess psychologically meaningful risk factors of sexual offenders (STABLE-2000; STABLE-2007). Professional overrides of the Static-99 scores did not improve predictive accuracy. STABLE-2007 scores added incrementally over STATIC scores for all recidivism outcomes, but only for complete cases, suggesting meaningful variation in the extent to which community supervision officers can assess psychologically meaningful risk factors for sexual offenders.
Managing the risk presented by sexual offenders is a preoccupation of the criminal justice system. Even a single instance of sexual recidivism by a known offender can invoke considerable public scrutiny. Most sexual offenders in Canada and the United States will serve part of their sentence under community supervision. The effectiveness of community supervision for sexual offenders requires the accurate identification of risk-relevant propensities. Sexual offenders vary in their risk for recidivism, as well as in the reasons for that risk. For example, community supervision officers would want to set different conditions and treatment plans for an otherwise prosocial man who has molested boys as a sports coach compared with a substance-abusing pimp with a history of vindictive rape.
During the past 20 years, there has been considerable research identifying the risk-relevant propensities of sexual offenders (Hanson & Bussière, 1998; Hanson & Morton-Bourgon, 2005; Mann, Hanson, & Thornton, 2010). These factors, also called criminogenic needs (Andrews & Bonta, 2010), fall within the broad categories of general criminality (e.g., negative peer associations, low self-control, substance abuse) and sex crime specific criminality (e.g., deviant sexual interests, sexual preoccupations, emotional congruence with children). For general criminality, there are a number of validated risk tools that are commonly used by community supervision officers, such as the Level of Service Assessments (Andrews, Bonta, & Wormith, 2010) and the Statistical Information on Recidivism–Revised (SIR-R1; Nafekh & Motiuk, 2002), which also predict general recidivism among sexual offenders (Hanson & Morton-Bourgon, 2009; Wormith, Hogg, & Guzzo, 2012).
There is much less research concerning structured risk scales for the assessment of sex crime specific needs. The most commonly used risk scales for sexual offenders have focused on static, historical factors (e.g., Static-99; McGrath, Cumming, Burchard, Zeoli, & Ellerby, 2010). Although static factors, such as the number of prior offenses, can be efficient risk markers, it is not obvious how they can be used to infer risk-relevant propensities (Bonta, 1996). Consequently, there is a need for empirically validated risk tools for sexual offenders that help community supervision officers identify the sources of risk and track changes over time.
The Sexual Violence Risk-20 (Boer, Hart, Kropp, & Webster, 1997) was one of the earliest of the current generation of structured risk scales for sexual offenders. Based on a structured professional judgment (SPJ) approach, evaluators rate 20 factors relevant to case management, and then provide an overall assessment (low/moderate/high) concerning the extent to which the offenders have life problems worthy of clinical intervention (Hart & Boer, 2010). This instrument is primarily aimed at forensic mental health practitioners, including items such as major mental illness, negative attitude toward intervention, and psychopathy (Hare, 2003).
Thornton’s Structured Risk Assessment (SRA) framework is another plausible approach to assessing the criminogenic needs of sexual offenders (Thornton, 2002). Originally an SPJ measure (Thornton, 2002), this framework has been organized into a structured scale (Structured Risk Assessment–Forensic Version [SRA-FV]; Thornton & Knight, 2015). The SRA framework, however, may not be ideally suited to the context of community supervision because SRA scores are largely determined by life history and are relatively insensitive to changes in current functioning.
The Violence Risk Scale–Sexual Offender version (VRS-SO; Olver, Wong, Nicholaichuk, & Gordon, 2007) combines 7 static and 17 dynamic risk factors into empirically derived recidivism rate estimates. The dynamic factors are grouped into three broad categories of sexual deviance, criminality, and treatment responsivity. It is the only measure for sex offenders that provides empirically derived adjustments to initial risk ratings based on how the offender progresses (or not) in treatment (Olver, Beggs Christofferson, Grace, & Wong, 2014). Given the VRS-SO’s focus on treatment, it is unclear how this scale should be applied to offenders who have not recently been in treatment.
One risk tool that was specifically designed for community supervision of sexual offenders is the Sex Offender Treatment Intervention and Progress Scale (SOTIPS; McGrath, Lasher, & Cumming, 2012). It contains 16 items grouped into three categories: (a) sexual deviance, (b) criminality, and (c) social stability and support. Initial research has been promising (McGrath et al., 2012), and an extended SOTIPS replication study is currently underway (Michael Miner, personal communication, February 28, 2014).
The current study focuses on the STABLE-2007 (Hanson, Harris, Scott, & Helmus, 2007), which is the most commonly used measure of sexual offender needs in Canada and the United States (Kelley, Barahal, Thornton, & Olson, 2015; McGrath et al., 2010). It contains 13 items related to sexual self-regulation (e.g., sexualized coping, deviant sexual interests), general self-regulation (e.g., poor cognitive problem solving), social relationships (e.g., lack of concern for others), and cooperation with supervision. Translations of the scoring manual (Fernandez, Harris, Hanson, & Sparks, 2014) are currently available in French, German, Dutch, Swedish, and Estonian. STABLE-2007 was the product of a program of research informed by quantitative reviews (Hanson & Bussière, 1998; Hanson & Morton-Bourgon, 2004, 2005), by interviews with community supervision officers (Hanson & Harris, 2000, 2001), and by other risk scales (Beech, 1998; Thornton, 2002). Hanson and Harris created STABLE-2000 in 2000, tested it in the Dynamic Supervision Project (DSP; see Hanson & Harris, 2013; Harris & Hanson, 2003, 2010), and used the results of that study to create STABLE-2007 (Hanson et al., 2007).
Independent validation studies of STABLE-2007 have been conducted by Eher, Matthes, Schilling, Haubner-MacLean, and Rettenberger (2012); Eher et al. (2013); Looman and Abracen (2012); and Sowden (2013). STABLE-2007 significantly predicted sexual recidivism in all studies except Sowden (2013) and was significantly incremental to Static-99/R in Eher et al. (2013) and Looman and Abracen (2012; but not Eher et al., 2012, or Sowden, 2013). In all four studies, STABLE-2007 was scored by research psychologists based on files of offenders serving custodial sentences.
Although there is general consensus concerning the risk-relevant propensities of sexual offenders, and evidence that these propensities can be assessed reliably by trained professionals using structured risk tools, we know little about the qualifications and training required to implement these measures with adequate fidelity. Given that certain constructs included in STABLE-2007 require considerable psychological inference (e.g., sexualized coping), it is an open question whether they can be reliably assessed by community supervision officers as part of their routine duties. In the current study, we examined the ability of community supervision officers to assess the criminogenic needs of adult male sexual offenders using STABLE data from the DSP (Hanson et al., 2007). Officers were trained to assess a range of static and dynamic risk factors and then used their risk scores to guide case management. Once trained, the officers were asked to score consecutive cases, resulting in a relatively complete sample of adjudicated sexual offenders that included all types (child victims, adult victims, non-contact, intrafamilial) in the proportions naturally found in each jurisdiction. In summary, this study examined the applied use of these risk tools with routine, unselected sexual offenders using a prospective design.
Method
The original DSP study included approximately 1,000 sexual offenders from Canada and two U.S. states (Iowa and Alaska) who started supervision between 2001 and 2005. Only the Canadian samples were included in the current study because of the difficulty of obtaining reliable recidivism information from the U.S. states. In addition, four Canadian female offenders were excluded. These women (aged 23 to 62) had all sexually offended against children (aged 6 to 14) who were family members or acquaintances. Three of the four women had been hospitalized overnight for a psychiatric condition.
All offenders were adult males starting a period of community supervision (probation or parole) following a recent conviction for a sexual offense. Offenders were excluded if they had been in the community for 6 months prior to the first assessment. For the vast majority of offenders, their most recent conviction was a sexually motivated offense; however, certain cases were included whose most recent offense was not sexually motivated provided that they began supervision for a sexually motivated offense within the past 2 years. In all cases, the supervising officers considered the offenders’ primary problem to be sexual offending and were supervising them as such.
Data on offenders were obtained from 139 supervision officers from 14 different jurisdictions: all Canadian provinces and territories, as well as the Atlantic region of the Correctional Service of Canada (CSC). Descriptive data on the supervision officers were not collected. Based on current practices at the time, the sample of officers would have included men, women, diverse levels of expertise, sex crime specialists, and officers with general caseloads. The use of the risk tools was mandated by the jurisdictions, but the submission of data was voluntary. Consequently, the sample does not include all adjudicated sex offenders in these jurisdictions. Of 887 Canadian male offenders registered in this project, 119 were removed because no relevant assessment information was provided. Of the 768 included, 764 had Static-99R scores, 710 had Static-2002R scores, 616 had at least one STABLE-2000 assessment, and 615 had at least one STABLE-2007 assessment.
On average, the offenders were 41 years old (n = 768, SD = 13.5, range: 18-84 years) and serving their first sentence for a sexual offense conviction (69% [490/710] had no prior sentencing occasions for a sex offense). Half the sample had child victims (22% [162] were incest offenders, and 27% [206] had any unrelated child victim), 35% (267) had adult victims, 10% (76) had only non-contact offenses, and 6% (44) were mixed offenders (n = 738 for offense type classification). Most of the sexual offenses (77%, 588/763) involved physical contact but not overt physical injury. Eleven percent of the offenders (82/746) had ever been hospitalized overnight for a psychiatric condition, and 5% (37/746) had been diagnosed as developmentally delayed (DD; low intellectual functioning). Two out of three offenders (67.5%, 516/764) had ever lived with a lover (married or common law) for at least 2 years. Three out of 10 offenders (29.5%, 224/760) had been separated from their biological parents prior to age 16. Fourteen percent of the offenders (106/760) self-identified as being of Aboriginal heritage.
Measures
STABLE-2000
The STABLE-2000 (Hanson et al., 2007) is a mechanical risk tool assessing dynamic risk factors among adult male sex offenders. It has 16 items organized into six subsections (significant social influences, intimacy deficits, sexual self-regulation, attitudes, general self-regulation, and cooperation with supervision). Each item is assessed using a 3-point rating system, where 0 refers to no problem, 1 refers to some concern/slight problem, and 2 refers to present/definite concern. Total scores on the STABLE-2000 are calculated by summing the highest score on each subsection, resulting in total scores ranging from 0 to 12, where scores of 0 to 4 indicate low risk, 5 to 8 indicate moderate risk, and scores of 9 or higher indicate high risk. Although officers submitted multiple STABLE assessments, only the first assessment was used.
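As an illustration of the STABLE-2000 total-score rule described above, the following Python sketch sums the highest rating within each subsection and maps the total onto the three risk bands. It is a hypothetical sketch, not the official coding form: the function names, the dictionary layout, and the example ratings are assumptions, and the number of items per subsection in the example is illustrative only.

```python
# Hypothetical sketch of the STABLE-2000 total-score rule; the input layout
# (subsection name -> list of 0/1/2 item ratings) is an assumption.

def stable2000_total(subsections: dict[str, list[int]]) -> int:
    """Sum the highest item rating within each subsection (range 0-12)."""
    total = 0
    for name, ratings in subsections.items():
        if not ratings or any(r not in (0, 1, 2) for r in ratings):
            raise ValueError(f"subsection {name!r} needs ratings of 0, 1, or 2")
        total += max(ratings)  # only the most severe item in a subsection counts
    return total


def stable2000_category(total: int) -> str:
    """Map a total to the bands given in the text: 0-4 low, 5-8 moderate, 9+ high."""
    if total <= 4:
        return "low"
    if total <= 8:
        return "moderate"
    return "high"


# Illustrative ratings only (item counts per subsection are not the real ones).
example_ratings = {
    "significant social influences": [1],
    "intimacy deficits": [0, 2, 1],
    "sexual self-regulation": [1, 1, 0],
    "attitudes": [0, 0, 1],
    "general self-regulation": [2, 1, 0],
    "cooperation with supervision": [0],
}
example_total = stable2000_total(example_ratings)  # 1 + 2 + 1 + 1 + 2 + 0 = 7
print(example_total, stable2000_category(example_total))
```

Because only the maximum per subsection enters the total, adding further concerning items within an already-rated subsection does not raise the score, which is why the range is capped at 12 for six subsections.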
STABLE-2007
The STABLE-2007 (Fernandez et al., 2014; Hanson et al., 2007) is an empirical actuarial risk tool assessing dynamic risk factors among adult male sex offenders. The scale was developed by revising the STABLE-2000 scale based on preliminary results from this dataset (Hanson et al., 2007). Specifically, the three attitude items were removed and then the items for lovers/relationship stability and deviant sexual interest were revised to more clearly incorporate past behavior. Last, the emotional identification with children item was restricted to apply only to offenders with at least one victim less than 14 years old. Consequently, the STABLE-2007 has 13 items organized into five subsections (significant social influences, intimacy deficits, sexual self-regulation, general self-regulation, and cooperation with supervision).
Total scores on the STABLE-2007 are calculated by summing all item scores. Given that the emotional identification with children item is scored only for offenders with child victims, this means that total scores can range from 0 to 26 for offenders with child victims and 0 to 24 for other offender types. Based on the total score, offenders can be assigned to one of three risk categories: low risk (0-3), moderate risk (4-11), and high risk (12+). For these analyses, STABLE-2007 scores were calculated only if there was no more than one item with missing information. Officers submitted scores on an earlier version of the scale (STABLE-2000), and STABLE-2007 was calculated by dropping and modifying the previously noted items (see Hanson et al., 2007).
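The STABLE-2007 summing rule, the child-victim restriction, and the missing-data rule described above can be sketched as follows. The item keys and function names are hypothetical, and treating a single missing item as contributing nothing to the sum is an assumption of this sketch; the text does not state how the one permitted missing item was handled.

```python
from typing import Optional

# Hypothetical key for the one item scored only for offenders with child victims.
EIC_ITEM = "emotional identification with children"


def stable2007_total(items: dict[str, Optional[int]], has_child_victim: bool) -> Optional[int]:
    """Sum STABLE-2007 item ratings (0-2); None marks a missing item.

    Returns None when more than one applicable item is missing. Treating a
    single missing item as contributing 0 is an assumption of this sketch.
    """
    # The child-specific item is not scored for offenders without child victims.
    applicable = {k: v for k, v in items.items() if has_child_victim or k != EIC_ITEM}
    expected = 13 if has_child_victim else 12
    if len(applicable) != expected:
        raise ValueError(f"expected {expected} applicable items, got {len(applicable)}")
    missing = sum(v is None for v in applicable.values())
    if missing > 1:
        return None
    return sum(v for v in applicable.values() if v is not None)


def stable2007_category(total: int) -> str:
    """Bands from the text: 0-3 low, 4-11 moderate, 12+ high."""
    if total <= 3:
        return "low"
    if total <= 11:
        return "moderate"
    return "high"


# Twelve generic items with hypothetical names, six rated 1 and six rated 0.
generic = {f"item_{i:02d}": (1 if i <= 6 else 0) for i in range(1, 13)}
print(stable2007_total({**generic, EIC_ITEM: 2}, has_child_victim=True))
print(stable2007_total({**generic, EIC_ITEM: 2}, has_child_victim=False))
```

The same item ratings yield different totals depending on victim type, which is why the maximum is 26 for offenders with child victims and 24 otherwise.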
Static-99R
Static-99R (Hanson & Thornton, 2000; Helmus, Thornton, Hanson, & Babchishin, 2012) is an empirically derived actuarial risk assessment tool designed to predict sexual recidivism in adult male sex offenders (see also www.static99.org). The Static-99R items are identical to those in the Static-99 with the exception of updated age weights. The scale has 10 items assessing criminal history, victim characteristics, age, and relationship history. Static-99R has been found to have moderate predictive accuracy for sexual recidivism (area under the receiver operating characteristic curve [AUC] = .69; Helmus, Hanson, Thornton, Babchishin, & Harris, 2012). In this study, the supervising officers completed the original Static-99 scale. Static-99R was later computed by using the offender’s date of birth to calculate the updated age item.
Static-2002R
Static-2002R (Hanson & Thornton, 2003; Helmus, Thornton, et al., 2012) is also an empirical actuarial risk assessment tool for adult male sex offenders (see also www.static99.org). The scale has 14 items grouped into five main subscales: age at release (1 item), persistence of sex offending (3 items), sexual deviance (3 items), relationship to victims (2 items), and general criminality (5 items). The Static-2002R items are identical to the original Static-2002 with the exception of updated age weights. Static-2002R has moderate predictive accuracy (AUC = .69), similar to that of Static-99R (Babchishin, Hanson, & Helmus, 2012). Static-2002 was scored in 2005 by two research assistants based on victim information provided by supervising officers in their Static-99 assessment and from official criminal records (Helmus & Hanson, 2007).
Static-99 Override
After completing a file review, recording demographic and offense history, and scoring the Static-99, officers were asked the following: Are there any exceptional circumstances that support an override of the Static risk assessment (e.g., historical offender being low risk because last known offense was 30 years ago, and there is no evidence of subsequent offending or a first time offender is high risk due to a large number of victims)? The response options were (a) there are no exceptional circumstances, (b) there are exceptional circumstances that increase risk, and (c) there are exceptional circumstances that decrease risk. For the purpose of statistical analyses, these response options were coded as 0, +1, and −1, respectively.
Procedure
Assessments were completed as part of the routine supervision practices of the officers (Hanson et al., 2007). Supervising officers attended a 2-day training session, although in rare cases, officers were trained by apprenticing with other local officers.
On average, each of the 139 participating officers submitted information on 5.5 cases (Mdn = 3). The rate of participation varied (SD = 6.9, range: 1-57), with 33 officers reporting on only one case and 29 officers reporting on only two cases. The completeness of the submitted data also varied: 343 of the 768 cases were complete, defined as including the Static-99, the Static-99 override, at least one STABLE-2000, and at least one ACUTE-2000.
Rater reliability was assessed by six experts re-scoring 92 cases. Given our selection procedures, the reliability files contained a disproportionate number of complete cases. For the total scores, rater reliability was high: For Static-99, the intraclass correlation coefficient (ICC) was .91 (n = 88) and for STABLE-2000, the ICC was .89 (n = 87). The ICC for the individual STABLE-2000 items ranged from .66 to .92 (Mdn = .83). The only item with unacceptable levels of rater reliability was the Static-99 override rating (ICC = .15, n = 74). The above ICCs likely overestimate reliability because the second raters used case files prepared by the officer who originally scored the case. As well, although the second raters were directed to make independent ratings, the original scores were contained in the case files.
Recidivism
All of the risk measures (Static-99R, Static-2002R, STABLE-2000, and STABLE-2007) were designed to assess the likelihood of new sexually motivated offenses for individuals who have already been charged or convicted for a sexually motivated offense against an identifiable victim. Nevertheless, other types of recidivism were considered because the predictors of general recidivism overlap with the predictors of sexual recidivism, and non-sexual recidivism is more common than sexual recidivism among individuals convicted of sexual offenses. Specifically, we examined five different recidivism outcomes: (a) sexual recidivism—any sexually motivated offense (regardless of the charge or conviction), including Category B sex offenses as defined by the Static-99 coding rules (e.g., possession of child pornography, prostitution; Harris, Phenix, Hanson, & Thornton, 2003); (b) sexual recidivism including breaches (“any sex”)—all sexual recidivism as well as sexual breaches, defined as official sanctions for sexually motivated violations of the conditions of community supervision (e.g., being in the company of children); (c) violent recidivism—all crimes involving direct confrontation with the victim. This category included contact sexual offenses, but excluded non-contact sex offenses and sexually motivated breaches; (d) any criminal recidivism—included all crimes (sexual, violent, or non-violent) but excluded technical offenses; and (e) any recidivism—included all crimes (sexual, violent, non-violent), as well as all technical offenses (e.g., breach of conditional release).
Information concerning new offenses was gathered from provincial and national criminal history records, supervising officers, and local police jurisdictions. National criminal history records maintained by the Royal Canadian Mounted Police (RCMP) were received in August 2005, June 2006, and March 2011. Provincial records were received from British Columbia (January 2006), Manitoba (April 2005), and Ontario (December 2005). The Offender Management System of CSC was checked in May 2005 for the CSC offenders registered in the project. Police jurisdictions were contacted to obtain information on all new sexual and violent recidivism incidents identified from official records (as well as technical or non-violent incidents where a serious sanction was applied). Police contacts also resulted in the identification of new recidivism incidents not appearing on official criminal records. Last, newspaper databases were searched for each offender in the summer of 2011 to identify new charges or convictions that may not be on criminal records.
Recidivism was considered to have occurred if the agency reporting the information believed that the offense occurred. For breaches, however, an official record of parole revocation or a new conviction for violation of conditional release was required. Given that criminal history records were the major source of recidivism information, the vast majority of recidivism events were linked to an officially recorded charge or conviction.
For violent and sexual recidivism incidents, information about the date and circumstances of the offense was obtained in all but a few cases. For other recidivism types, the date and offense circumstances were known only if the supervising officer provided this information. Classification of offense types was based on the behavior (if available) as opposed to the official charge. So, for violent and sex offenses, the date recorded in the dataset was generally the date of offense. For non-violent recidivism outcomes, it was often the date of charge or conviction. The follow-up period was calculated from the date of the first assessment information to the date that the last recidivism information was received (or until death or deportation). The offender start dates ranged from February 2001 to October 2005. Eighty-four percent of cases were followed until March 2011 (range: 0.2-10.1 years, M = 7.4, SD = 2.2, Mdn = 7.9).
Overview of Analyses
Each analysis was run independently by two people, drawn from the first two authors and two undergraduate students (psychology, criminology). Discrimination accuracy was measured using the AUC (Swets, Dawes, & Monahan, 2000). The AUC can be interpreted as the probability that a randomly selected recidivist would have a higher risk score than a randomly selected non-recidivist. Hanley and McNeil’s (1983) approach was used to compare AUCs between different groups.
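The probability interpretation of the AUC can be computed directly by comparing every recidivist with every non-recidivist, counting ties as one half. The Python sketch below also gives the Hanley and McNeil standard error approximation and a difference test for AUCs from two independent groups (for example, completed vs. incomplete cases), in which the correlation term of Hanley and McNeil’s (1983) comparison formula is zero because the groups share no cases. Function names and toy scores are hypothetical; this is not the software used in the study.

```python
import math


def auc(recidivist_scores, nonrecidivist_scores):
    """AUC as P(score of random recidivist > score of random non-recidivist); ties count 1/2."""
    wins = 0.0
    for r in recidivist_scores:
        for n in nonrecidivist_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(recidivist_scores) * len(nonrecidivist_scores))


def hanley_mcneil_se(a, n_recidivists, n_nonrecidivists):
    """Standard error of an AUC using the Hanley-McNeil approximation."""
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    variance = (a * (1 - a)
                + (n_recidivists - 1) * (q1 - a * a)
                + (n_nonrecidivists - 1) * (q2 - a * a)) / (n_recidivists * n_nonrecidivists)
    return math.sqrt(variance)


def compare_independent_aucs(a1, se1, a2, se2, z_crit=1.96):
    """Difference between AUCs from two independent groups with a 95% CI.

    Independent groups share no cases, so the correlation term drops out.
    """
    diff = a1 - a2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return diff, (diff - z_crit * se_diff, diff + z_crit * se_diff)


# Toy scores (hypothetical), not study data.
recid = [5, 7, 3, 6]
nonrecid = [1, 2, 3, 4, 5, 2]
a = auc(recid, nonrecid)  # 21 of 24 pairs favor the recidivist -> 0.875
print(a, hanley_mcneil_se(a, len(recid), len(nonrecid)))
```

The pairwise loop is quadratic in sample size, which is adequate for samples of a few hundred offenders; rank-based formulas give the same value more efficiently.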
Incremental predictive accuracy was examined using Cox regression (Allison, 1984), which estimates the relative rate of recidivism (hazard ratio) associated with one or more predictor variables from survival data with unequal follow-up times. For Cox regression, the offender’s jurisdiction (i.e., province or CSC region) was used as a stratification variable, allowing for separate baseline hazard functions.
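The role of the stratification variable can be made concrete with a minimal single-covariate Cox model fitted by Newton-Raphson. Risk sets are formed only among offenders in the same jurisdiction, so each jurisdiction keeps its own unspecified baseline hazard while sharing one regression coefficient. This is a hypothetical sketch (Breslow handling of tied event times, toy data, invented function name), not the software used in the study, and it does not attempt the multi-covariate models (STATIC and STABLE entered together) reported below.

```python
import math


def fit_stratified_cox(times, events, x, strata, max_iter=50, tol=1e-10):
    """Newton-Raphson estimate of beta for a one-covariate stratified Cox model.

    Risk sets are built only within each stratum, so every stratum (here, a
    jurisdiction) keeps its own baseline hazard. Tied event times are handled
    with the Breslow approximation.
    """
    n = len(times)
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0  # first derivative of the partial log-likelihood
        info = 0.0   # observed information (negative second derivative)
        for i in range(n):
            if not events[i]:
                continue
            risk_set = [j for j in range(n) if strata[j] == strata[i] and times[j] >= times[i]]
            weights = [math.exp(beta * x[j]) for j in risk_set]
            s0 = sum(weights)
            s1 = sum(w * x[j] for w, j in zip(weights, risk_set))
            s2 = sum(w * x[j] ** 2 for w, j in zip(weights, risk_set))
            mean_x = s1 / s0
            score += x[i] - mean_x
            info += s2 / s0 - mean_x ** 2
        if info <= 0:
            raise ValueError("covariate does not vary within any risk set")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Toy data (hypothetical): follow-up years, recidivism flag, risk score, jurisdiction.
beta = fit_stratified_cox([1, 2, 3], [1, 1, 0], [1, 0, 1], ["A", "A", "A"])
print(beta, math.exp(beta))  # the hazard ratio is exp(beta)
```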
Recidivism estimates for combining STABLE-2007 with either Static-99R or Static-2002R were calculated using life table survival analysis (Soothill & Gibbens, 1978). Although the median follow-up was 8 years, recidivism estimates are presented only up to 5 years because criminal events take considerable time to appear on official records (Bureau, 2015).
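A life table survival analysis of the kind described above can be sketched as follows, using the standard actuarial adjustment in which cases censored within an interval count as at risk for half of it. The function name, the yearly intervals, and the toy data are assumptions for illustration; the study’s exact interval choices are not specified here.

```python
def life_table_recidivism(years, recidivated, interval=1.0, horizon=5.0):
    """Actuarial life-table estimates of cumulative recidivism at the end of each interval.

    years: time to recidivism for recidivists, or to the end of follow-up
    (censoring) for everyone else. Cases censored within an interval are
    counted as at risk for half of it.
    """
    survival = 1.0
    estimates = []
    for k in range(int(round(horizon / interval))):
        start, end = k * interval, (k + 1) * interval
        at_risk = sum(1 for t in years if t >= start)
        events = sum(1 for t, r in zip(years, recidivated) if r and start <= t < end)
        censored = sum(1 for t, r in zip(years, recidivated) if not r and start <= t < end)
        effective_n = at_risk - censored / 2
        hazard = events / effective_n if effective_n > 0 else 0.0
        survival *= 1 - hazard
        estimates.append(1 - survival)  # cumulative recidivism rate
    return estimates


# Toy data (hypothetical), reported up to 3 years.
print(life_table_recidivism([0.5, 1.5, 2.0, 3.0, 3.0],
                            [True, False, True, False, False], horizon=3.0))
```

Truncating the horizon, as the study does at 5 years, simply stops the product before the later intervals where few offenders remain at risk.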
Results
The overall recidivism rates were 10.8% (n = 83) for new sexual crimes, 12.9% (n = 99) for sexual crimes or sexually motivated breaches, 21.4% (n = 164) for any violent recidivism, 31.5% (n = 242) for any new crime, and 36.2% (n = 278) for any new transgression including breaches (all percentages based on a total sample of 768 offenders).
Descriptive statistics for the risk measures are presented in Table 1. The correlation between Static-99R and Static-2002R was .92, and between STABLE-2000 and STABLE-2007 was .91. The correlations between the STATIC and STABLE measures ranged from .40 to .45. The total scores of all the risk scales were significantly related to all outcomes (see Table 2). AUC values ranged from a low of .652 for STABLE-2000 predicting sexual recidivism to a high of .759 for Static-2002R predicting any recidivism (including breaches). In general, the STATIC risk scales (Static-99R; Static-2002R) showed somewhat better discrimination than the STABLE risk scales (STABLE-2000; STABLE-2007; Mdn = .732 vs. Mdn = .677).
Table 1. Descriptive Statistics for Risk Measures
Note. All correlations p < .001.
Table 2. Discrimination of Risk Measures for Five Recidivism Outcomes
Note. AUC = area under the receiver operating characteristic curve; CI = confidence interval.
Most STABLE-2007 items were positively associated with all types of recidivism (59 out of 65 comparisons; see Appendix A). The weakest effects were found for the emotional identification with children and deviant sexual interests items, neither of which was significantly related to any of the recidivism outcomes. Five of the six STABLE-2000 items that were removed or revised for STABLE-2007 were not significantly related to either of the sexual recidivism outcomes. Readers are cautioned, however, that given the large number of item-outcome comparisons (95), some of the variation in predictive accuracy is simply due to chance.
Completed cases displayed better predictive accuracy than incomplete cases (see Tables 2 and 3). Completed cases contained the Static-99, the Static-99 override, at least one STABLE-2000, and at least one ACUTE-2000. Of the 139 officers, 42 always submitted completed cases, 105 submitted at least one completed case, and 34 never submitted completed cases (the median proportion of completed cases per officer was 50%). Officers were considerably consistent in their rate of submitting completed cases (r² = .44). To estimate this between-officer effect size, logistic regression was used with case completion as the dependent variable (0 = incomplete; 1 = completed) and officer as a categorical predictor (139 categories). The effect size was then computed as the square of the Pearson correlation between the predicted values and the outcome (Hosmer, Lemeshow, & Sturdivant, 2013, §5.2.5). The proportion of completed cases was not significantly related to the number of cases each officer submitted (r = −.149, p = .081). The recidivism rates for completed cases were not significantly different from the rates for incomplete cases: odds ratios for the five outcomes ranged from .82 to .91 (with .5 added to each cell; Fleiss, Levin, & Paik, 2003), in the direction of lower recidivism rates for the completed cases, and two-tailed Fisher’s exact test p values ranged from .29 to .67.
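Two of the computations in the preceding paragraph can be sketched in Python: the odds ratio with .5 added to each cell, and the between-officer effect size. With officer as the only categorical predictor, the logistic model is saturated, so each case’s fitted probability is simply its officer’s observed completion rate (strictly, for officers whose cases were all complete or all incomplete, the maximum-likelihood fit only approaches 1 or 0). The table orientation, function names, and toy data are assumptions.

```python
from collections import defaultdict


def haldane_odds_ratio(a, b, c, d):
    """Odds ratio for the 2 x 2 table [[a, b], [c, d]] with 0.5 added to each cell.

    Assumed layout: rows are completed / incomplete cases, columns are
    recidivists / non-recidivists, so values below 1 mean lower odds of
    recidivism for completed cases.
    """
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def between_officer_r2(officer_ids, completed):
    """Squared correlation between fitted and observed case completion.

    Fitted value for each case = its officer's completion rate (the fitted
    probability of a saturated logistic model with officer as predictor).
    """
    counts = defaultdict(lambda: [0, 0])  # officer -> [completed, submitted]
    for officer, done in zip(officer_ids, completed):
        counts[officer][0] += done
        counts[officer][1] += 1
    fitted = [counts[o][0] / counts[o][1] for o in officer_ids]
    y = [float(c) for c in completed]
    n = len(y)
    mean_f, mean_y = sum(fitted) / n, sum(y) / n
    cov = sum((f - mean_f) * (v - mean_y) for f, v in zip(fitted, y))
    var_f = sum((f - mean_f) ** 2 for f in fitted)
    var_y = sum((v - mean_y) ** 2 for v in y)
    if var_f == 0 or var_y == 0:
        return 0.0  # no between-officer variation to explain
    return cov ** 2 / (var_f * var_y)


# Toy data (hypothetical): three officers, 1 = completed case.
print(between_officer_r2(["a", "a", "a", "b", "c", "c"], [1, 1, 0, 0, 1, 1]))
print(haldane_odds_ratio(3, 1, 1, 3))
```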
Table 3. Comparing Discrimination for Completed Versus Incomplete Assessments (Sexual Recidivism)
Note. AUC = area under the receiver operating characteristic curve; CI = confidence interval.
For sexual recidivism, the differences in AUCs between the completed and incomplete cases were large and statistically significant for all scales except Static-99R (for Static-99R, AUC of .80 vs. .68, difference = .116, 95% confidence interval [CI] of [−.001, .232]; for Static-2002R, .80 vs. .68, difference = .127, 95% CI of [.011, .243]; for STABLE-2000, .73 vs. .56, difference = .169, 95% CI of [.027, .310]; for STABLE-2007, .76 vs. .58, difference = .182, 95% CI of [.044, .319]). Even though the predictive accuracy of the STATIC instruments was reduced in the incomplete cases, they still significantly predicted the outcome. In contrast, neither STABLE-2000 nor STABLE-2007 significantly predicted sexual recidivism for incomplete cases (AUCs of .56 and .58, respectively), whereas for the completed cases their predictive accuracies were moderate to large (AUCs of .73 and .76, respectively). Of the 13 STABLE-2007 items, 12 had a higher AUC for the completed cases. If case completeness were unrelated to item accuracy, the probability that at least 12 of the 13 comparisons would favor completed cases would be small (exact binomial test, p = .002).
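The reported p value corresponds to a one-tailed exact binomial test in which, under the null hypothesis, each item is equally likely to favor either group. The arithmetic can be checked with a short Python sketch (the function name is hypothetical).

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): a one-tailed exact binomial test."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 12 of 13 items favored completed cases; under the null each item is a coin flip.
p_value = binomial_upper_tail(12, 13)  # (13 + 1) / 2**13 = 14/8192
print(round(p_value, 4))  # 0.0017, reported as p = .002
```

A two-tailed version would double this value (about .003), so the reported figure reflects the directional hypothesis that completed cases show better item-level accuracy.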
Of the 343 completed cases, officers recommended override of the Static-99 risk score in 61 cases (17.8%). Risk was increased in 54 cases and decreased in 7 cases. The overrides had little relationship to recidivism (AUC for sexual recidivism = .554, 95% CI [.445, .663]). Although the Cox regression beta weights for the incremental effect over Static-99 were positive for all five recidivism outcomes (see Table 4), none were statistically significant. When the override value (−1, 0, +1) was added to the Static-99 total score, the AUCs for the adjusted scores were lower for four of the five outcomes (the exception being any criminal recidivism, where the AUC increased slightly from .715 to .721). The absolute values of all the changes were small (median change of −.005 AUC units). The same patterns were observed when Static-99R or Static-2002R was used as the control variable.
Table 4. Incremental Predictive Accuracy of Professional Override (−1, 0, +1)
Note. Total sample size of 343 for the AUC analyses, with the number of recidivists identical to those in the Cox regression analyses (using jurisdiction as strata). AUC = area under the receiver operating characteristic curve; CI = confidence interval.
As can be seen in Table 5, STABLE-2007 scores added incrementally to four of the five recidivism outcomes after controlling for Static-99R and to all five outcomes after controlling for Static-2002R. For sexual recidivism controlling for Static-99R, the incremental effect did not reach conventional levels of significance (p = .081, two-tailed). When the analysis was restricted to the completed cases, however, STABLE-2007 was significantly incremental to both STATIC measures for all recidivism outcomes. For the completed cases, the median incremental hazard ratio was 1.08, meaning that each 1-point increase in STABLE-2007 total score was associated with an expected 8% increase in the rate of recidivism, averaged across the follow-up period. Given that STABLE-2007 and the STATIC measures were incremental to each other, we created rules for combining both measures into an overall evaluation of risk (see Appendix B). The combination rules start with the STATIC risk category and then move up one category if the STABLE-2007 score is high and down one category if the STABLE-2007 score is low. Low and high STABLE-2007 scores were defined as scores about 1 SD below or above the mean in routine samples of sexual offenders (i.e., 0-3 = low, 4-11 = moderate, 12 or higher = high; Fernandez et al., 2014). There were three exceptions to the rules: no category was lower than low; no category was higher than high for Static-2002R; and offenders with high Static-99R scores (6+) remained in the high category even when their STABLE-2007 score was low. Of the three offenders with high Static-99R scores and low STABLE-2007 scores, two sexually reoffended. No offenders had both high Static-2002R scores and low STABLE-2007 scores.
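The combination rules and the hazard-ratio arithmetic in the preceding paragraph can be sketched in Python. This is a hypothetical illustration: the STATIC category labels below are placeholders (the actual categories and cut-offs are in Appendix B and are not reproduced here), the function names are invented, and the sketch clamps at both ends of whatever category list it is given, which matches the stated Static-2002R rule; how a high Static-99R score combined with a high STABLE-2007 score was handled is not stated in this section, so keeping it at the top category is an assumption. The hazard-ratio function assumes proportional hazards, under which a per-point ratio compounds multiplicatively.

```python
# Placeholder STATIC category labels (assumption; see Appendix B for the real ones).
HYPOTHETICAL_STATIC_LEVELS = ("low", "moderate-low", "moderate-high", "high")


def stable2007_shift(stable_total: int) -> int:
    """+1 for high STABLE-2007 (12+), -1 for low (0-3), 0 for moderate (4-11)."""
    if stable_total >= 12:
        return 1
    if stable_total <= 3:
        return -1
    return 0


def combined_category(static_index: int, stable_total: int,
                      levels=HYPOTHETICAL_STATIC_LEVELS,
                      keep_top_when_stable_low: bool = False) -> str:
    """Move the STATIC category one step according to STABLE-2007, within bounds.

    keep_top_when_stable_low mirrors the Static-99R exception: offenders in the
    top (high, 6+) category stay there even when STABLE-2007 is low.
    """
    shift = stable2007_shift(stable_total)
    top = len(levels) - 1
    if keep_top_when_stable_low and static_index == top and shift < 0:
        shift = 0
    return levels[min(max(static_index + shift, 0), top)]


def relative_rate(hazard_ratio: float, units: float) -> float:
    """Rate multiplier for a difference of `units` points under proportional hazards."""
    return hazard_ratio ** units


print(combined_category(1, 15))
print(combined_category(3, 2, keep_top_when_stable_low=True))
print(round(relative_rate(1.08, 5), 2))  # about 1.47 for a 5-point difference
```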
Table 5. Incremental Predictive Accuracy of STABLE-2007
Note. HR = hazard ratio.
The discrimination of the combined STATIC/STABLE risk categories is displayed in Table 2. The AUCs were mostly moderate to large, ranging from .69 to .75 for all assessments (Mdn = .73) and from .72 to .81 for the completed assessments (Mdn = .77). Readers should note that AUC values for risk categories are expected to be lower than for continuous scores because precision is lost when scores are grouped into categories.
Discussion
The current study examined the extent to which community supervision officers could assess the criminogenic needs of sexual offenders. When officers used a structured approach to identifying risk-relevant propensities (STABLE-2000/STABLE-2007), their scores were significantly related to sexual, violent, and any recidivism. Furthermore, STABLE-2007 provided incremental information over Static-99R and Static-2002R (well-validated risk tools based on criminal history and demographic information) for most types of recidivism.
There was considerable variability in the predictive accuracy of the risk measures based on whether or not the case assessments were complete. Specifically, when officers completed all the measures requested of them (static, stable, acute, override), the predictive accuracies of the STATIC and STABLE assessments were high, and higher than when officers submitted incomplete cases. Among incomplete cases, the STATIC measures (Static-99R, Static-2002R) continued to be moderately accurate. In contrast, neither STABLE-2000 nor STABLE-2007 significantly predicted the intended outcome (sexual recidivism) among the incomplete cases.
The substantial difference in predictive accuracy for STABLE-2007 for the complete and incomplete cases (AUC values of .76 vs. .58, respectively) cannot be interpreted with confidence because it was not expected. Our working hypothesis is that it was related to the conscientiousness with which the officers engaged in the risk assessment process. Although risk assessment has become a core task of community supervision, it is not clear that all officers have the skills and motivation to provide valid assessments. Miller and Maloney (2013), for example, found that only half of the probation officers in their sample were substantive compliers with risk/needs assessments, with the remaining officers divided between bureaucratic compliers and cynical compliers. Consequently, it is likely that a non-trivial portion of the officers in the current study did not fully engage in the assessment process.
The largest differences in predictive accuracy for the completed/incomplete cases were for the sexual crime specific items that required psychological inference (see Table 3). Whereas the effect sizes for the completed cases were moderate (AUC > .60) for hostility toward women, sex as coping, sexual entitlement attitudes, and deviant sexual interests, the effect sizes for these items hovered near chance levels (.46-.52) for the incomplete cases. In contrast, the predictive accuracies for cooperation with supervision, impulsive acts, and poor cognitive problem solving were similar for the complete and incomplete cases (significant AUCs ranging from .57 to .66). Given that these latter items are relevant to all probationers (sexual or non-sexual), it is likely that most officers were already skilled at identifying these features prior to the training provided in the current project.
Overall, we found the predictive accuracy of the STABLE risk tools to be similar to that observed in other settings. The AUC values we found for STABLE-2000 predicting sexual recidivism (.65 for the full sample; .73 for completed cases) bracketed those of Saum (2006; AUC = .68) and Webb, Craissati, and Keen (2007; AUC = .72). Eher et al. (2012) found a somewhat lower value (.62).
For STABLE-2007 predicting sexual recidivism, we observed AUC values of .67 overall and .76 for complete cases. Eher’s research group observed intermediate AUC values for STABLE-2007 predicting sexual recidivism of .71 for Austrian prisoners and .71 for German forensic psychiatric patients (Eher et al., 2012; Eher et al., 2013). Sowden (2013), however, found only small effects for STABLE-2007 predicting sexual recidivism among Canadian federal sexual offenders attending a high-intensity treatment program (AUC of .56 for pre-treatment scores; .58 for post-treatment scores). It was unclear whether Sowden had the necessary information to score STABLE-2007, as the rater reliability in her study was only ICC = .38 for the pre-treatment scores and ICC = .56 for the post-treatment scores.
The significant incremental effect of STABLE-2007 scores over STATIC scores contrasts with the failure of unstructured professional overrides to improve risk prediction. At the time that this study was designed (2000), unstructured adjustments to empirical actuarial risk tools were considered a plausible approach to risk assessment (Hanson, 1998, p. 53). There is now professional consensus that structured risk assessments are more accurate than unstructured risk assessments (Hanson, 2009; Heilbrun, Yasuhara, & Shah, 2010; Skeem & Monahan, 2011). Furthermore, since 2000, professional opinion has become increasingly disillusioned with post hoc adjustments to actuarial risk tools (Campbell & DeClue, 2010; Hanson & Morton-Bourgon, 2009; Wormith et al., 2012), a trend supported by the current findings. Post hoc adjustments likely fail because evaluators have difficulty judging whether particular characteristics are risk-relevant, and, if characteristics are risk-relevant, the extent to which they have been already captured by the existing items.
According to Kahneman and Klein (2009), mechanical risk tools outperform professional judgment when (a) the number of relevant variables is large, (b) the effect of any particular variable is small, and (c) evaluators do not receive rapid feedback concerning the accuracy of their decisions. All three conditions apply to risk assessment with sexual offenders. As observed in the current study and previous meta-analyses (Hanson & Bussière, 1998; Hanson & Morton-Bourgon, 2005), no variable was strongly related to sexual recidivism. Often, years pass between the assessment date and the date at which recidivism becomes known, if it becomes known at all. Consequently, it should not be surprising that the evaluators in the current study, like many before them (Bengtson & Långström, 2007; Cocozza & Steadman, 1976; Quinsey & Maguire, 1986), failed to demonstrate intuitive expertise for offender risk assessment.
Limitations
Empirically derived risk assessment instruments gain credibility through replication in diverse, independent samples. Although the final version of STABLE-2007 has been validated in independent samples (Eher et al., 2012, 2013), the construction of STABLE-2007 was based on a single sample. Consequently, there is the risk of overfitting, that is, selecting variables that do not generalize to other samples. A related concern is that potentially important variables may have been missed. For example, attitude variables included in STABLE-2000 were dropped from STABLE-2007 because they were unrelated to recidivism in previous versions of this dataset; however, subsequent research in other settings has found the STABLE-2000 child molester attitudes item to significantly predict sexual recidivism among child molesters (Helmus, Hanson, Babchishin, & Mann, 2013). Consequently, we believe that the propensities targeted by STABLE-2007 are credible but are unlikely to be ideal.
Large sample sizes are needed for the development of empirically derived risk tools. In agreement with Vergouwe, Steyerberg, Eijkemans, and Habbema (2005), we believe that approximately 100 events and 100 non-events are required for stable estimates. Compared with this standard, the number of recidivists in the current study was barely adequate for evaluating the prediction of sex crime recidivism (83 recidivists) and insufficient for reliable analyses of subgroups or incremental effects.
Risk measures may also perform differently with offenders of different ethnic background (Babchishin, Blais, & Helmus, 2012; Långström, 2004). Analyses of an earlier version of this dataset found that STABLE-2007 worked poorly for offenders of Aboriginal heritage (Helmus, Babchishin, & Blais, 2012). Consequently, further research is needed before STABLE-2007 can be used confidently with Aboriginal sexual offenders.
Offenders with developmental delays may also have distinct risk factors that are not addressed by STABLE-2007. Hanson, Sheahan, and VanZuylen’s (2013) meta-analysis concluded that Static-99 works as intended with developmentally delayed (DD) sexual offenders. The small number of DD offenders in the current sample (8 sexual recidivists out of a total of 38) limits our ability to make strong statements concerning the utility of STABLE-2007 with sexual offenders with developmental delays.
Directions for Research
STABLE-2007 is intended to assess risk-relevant propensities, that is, offender characteristics that can be used to guide interventions and community safety planning with sexual offenders. Although there is reasonable evidence that the items in STABLE-2007 are related to recidivism risk (Hanson & Harris, 2013), further research is needed concerning whether the STABLE items are dynamic risk factors (Kraemer et al., 1997). Currently, there is some evidence that STABLE items change during the course of treatment (Nunes, Babchishin, & Cortoni, 2011; Sowden, 2013); however, there is no research demonstrating that changes in STABLE items are related to changes in recidivism risk. Furthermore, little is known about how the use of STABLE-2007, or any other risk tool, influences community supervision practices. Based on research on community supervision for general offenders (Bonta, Rugge, Scott, Bourgon, & Yessine, 2008), officers require more than valid risk tools to meaningfully influence recidivism risk.
Implications for Practice
The reliability and predictive accuracy of STABLE-2007 were sufficient to justify its use in applied risk assessment with adult male sexual offenders. The use of any evidence-based risk tool, of course, requires a preliminary judgment concerning the similarity of the case-at-hand to the participants in the samples for which the tool was intended and validated. The greater the differences, the greater the inferences required to interpret the results. Without research evidence, however, it is difficult for evaluators to know the extent to which any observed differences on factors external to the risk scheme actually make a difference in the offender’s overall risk.
Given that there were consistent incremental effects of Static-99R and Static-2002R over STABLE-2007, evaluators should combine STABLE-2007 with a static risk tool when conducting comprehensive risk evaluations. Tables with recidivism estimates based on combining STABLE-2007 with Static-99R, Static-2002R, or Risk Matrix-2000 Sex (Helmus, Hanson, Babchishin, & Thornton, 2015) are available upon request.
The results also suggest that care is needed when implementing STABLE-2007, given that a substantial portion of the current sample submitted STABLE assessments that had little or no relationship to recidivism. High fidelity implementation requires officer training, effective methods for motivating evaluators, ongoing supervision, and systemic support from senior managers (for more guidance on implementation and fidelity issues, see Fernandez et al., 2014).
Footnotes
Appendix A
Appendix B
The views expressed are those of the authors and not necessarily those of Public Safety Canada. We would like to thank the community supervision officers who took the time to contribute data to this project, Marie Coligado and Jessica Woodley for double-running analyses, and the following individuals for assistance with data coding and management: Terri-Lynne Scott, Jennifer Cooney, Erik Gaudreault, Shannon Hodgson, Cathrine Pettersen, Shelley Price, Chelsea Sheahan, and Kimberly Smallshaw.
Notes
He speaks on the history of prison architecture, risk assessment for violent, sexual, and pornography offenders, psychopathy, high-risk offenders, and sexual offenders with prominent cognitive delays.
