Abstract
This study presents the results from efforts to revise the Minnesota Sex Offender Screening Tool–Revised (MnSOST-R), one of the most widely used sex offender risk-assessment tools. The updated instrument, the MnSOST-3, contains nine individual items, six of which are new. The population for this study consisted of the cross-validation sample for the MnSOST-R (N = 220) and a contemporary sample of 2,315 sex offenders released from Minnesota prisons between 2003 and 2006. To score and select items for the MnSOST-3, we used predicted probabilities generated from a multiple logistic regression model. We used bootstrap resampling to not only refine our selection of predictors but also internally validate the model. The results indicate the MnSOST-3 has a relatively high level of predictive discrimination, as evidenced by an apparent AUC of .821 and an optimism-corrected AUC of .796. The findings show the MnSOST-3 is well calibrated with actual recidivism rates for all but the highest risk offenders. Although estimating a penalized maximum likelihood model did not improve the overall calibration, the results suggest the MnSOST-3 may still be useful in helping identify high-risk offenders whose sexual recidivism risk exceeds 50%. Results from an interrater reliability assessment indicate the instrument, which is scored in a Microsoft Excel application, has an adequate degree of consistency across raters (ICC = .83 for both consistency and absolute agreement).
Introduction
The 1990s were a fertile period for the development of actuarial tools designed to predict sex offense recidivism. In response to a Minnesota Department of Corrections (MnDOC) report calling for a more formal and uniform process to identify predatory and violent offenders, Epperson and colleagues started work on developing the Minnesota Sex Offender Screening Tool (MnSOST) in 1991 (Epperson, Kaul, Huot, Goldman, & Alexander, 2003). Five years later, they began efforts to revise the MnSOST, eventually resulting in the Minnesota Sex Offender Screening Tool–Revised (MnSOST-R). During this same period, Quinsey and colleagues developed the Violence Risk Appraisal Guide (VRAG) along with the Sex Offender Risk Appraisal Guide (SORAG; Harris, Rice, & Quinsey, 1993; Quinsey, Harris, Rice, & Cormier, 1998; Quinsey, Rice, & Harris, 1995). In 1997, Hanson presented the Rapid Risk Assessment for Sex Offense Recidivism (RRASOR), a four-item tool that was later combined with the Structured Anchored Clinical Judgment (SACJ) to produce the Static-99 (Hanson & Thornton, 1999) and eventually the Static-2002 (Hanson & Thornton, 2003). That same year, Boer, Hart, Kropp, and Webster (1997) also published their work on the Sexual Violence Risk-20 (SVR-20).
Since its inception, the MnSOST-R has been one of the most widely used sex offender risk-assessment tools (McGrath, Cumming, Burchard, Zeoli, & Ellerby, 2009). The original version of the tool (MnSOST), which was developed on a sample of 256 sex offenders released from Minnesota prisons, consisted of 21 items that were selected using empirical methods but were scored on the basis of clinical judgment. In contrast, both the selection and scoring of items on the MnSOST-R, which contains five fewer items (16), were based on empirical methods (a modified Nuffield weight system; Epperson et al., 2003). Of the 16 items on the MnSOST-R, 12 are historical or static factors, whereas the other four are considered dynamic factors.
In assessing the predictive accuracy of risk-assessment instruments, the area under the receiver operating characteristic (ROC) curve is the most widely used statistic. With values that range from 0 to 1, the area under the curve (AUC) statistic is interpreted as the probability that a randomly selected recidivist has a higher value on a risk-assessment instrument than a randomly selected nonrecidivist. Values at either end of the spectrum (0 or 1) reflect complete separation between recidivists and nonrecidivists, whereas a value of .50 indicates the prediction tool does no better than chance. Epperson and colleagues (2003) reported an AUC value of .77 for the sample on which the MnSOST-R was developed. In later research on a cross-validation sample of 220 sex offender releasees, Epperson et al. (2003) indicated the MnSOST-R had an AUC value of .73 for this sample. The results from the cross-validation sample therefore suggest that a randomly selected sexual recidivist will have a higher MnSOST-R value than a nonrecidivist 73% of the time.
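The probabilistic reading of the AUC given above can be made concrete with a direct pairwise count. The following is a minimal sketch, not the procedure used in the study; the scores are hypothetical and chosen only for illustration, and ties are counted as half a "win," which is the usual convention.

```python
from itertools import product


def auc(recidivist_scores, nonrecidivist_scores):
    """Share of (recidivist, nonrecidivist) pairs in which the recidivist
    has the higher score; tied pairs count as one half."""
    wins = 0.0
    for r, n in product(recidivist_scores, nonrecidivist_scores):
        if r > n:
            wins += 1.0
        elif r == n:
            wins += 0.5
    return wins / (len(recidivist_scores) * len(nonrecidivist_scores))


# Hypothetical instrument scores (not data from the study).
recidivists = [8, 5, 11, 3]
nonrecidivists = [2, 5, 1, 4, 7]

print(auc(recidivists, nonrecidivists))  # 0.775
```

In this toy example, 15.5 of the 20 possible pairs favor the recidivist, so a randomly selected recidivist outscores a randomly selected nonrecidivist 77.5% of the time.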
Although the MnSOST-R has been one of the most widely used sexual recidivism actuarial tools (McGrath et al., 2009), only a handful of validation studies have been completed, some of which have been published in peer-reviewed journals (Barbaree, Seto, Langton, & Peacock, 2001; Bartosh, Garby, Lewis, & Gray, 2003; Boccaccini, Murrie, Caperton, & Hawes, 2009; Knight & Thornton, 2007; Kropp, 2002; Langton, Barbaree, Harkins, Peacock, & Arenovich, 2008; Langton et al., 2007; Parent, Guay, & Knight, 2011). With AUC values ranging from .59 to .73, the predictive accuracy of the MnSOST-R has generally been lower than that reported by Epperson and colleagues in the cross-validation sample. Still, its accuracy in predicting sexual recidivism is comparable to the other widely used risk-assessment instruments for adult sex offenders. For example, in their comprehensive study of the most commonly used sexual recidivism risk-assessment instruments, Knight and Thornton (2007) reported the MnSOST-R had AUC values that were slightly above the overall average. Moreover, in their meta-analysis of 118 prediction studies, 12 of which examined the MnSOST-R, Hanson and Morton-Bourgon (2009) reported a Cohen’s d value of .76, which suggests the instrument has moderately high predictive accuracy.
Although the predictive discrimination of the MnSOST-R is, on the whole, similar to that of other risk-assessment instruments, the instrument has faced a number of criticisms. First, the original development of the MnSOST-R did not include a cross-validation sample (Vrieze & Grove, 2008; Wollert, 2002). Second, the development sample for the MnSOST-R oversampled for sexual recidivists, producing an artificially high baseline rate (Wollert, 2002). Third, unlike other sexual recidivism risk-assessment instruments, research on the development of the MnSOST-R (or MnSOST) has not been published in a peer-reviewed journal (Langton et al., 2008; Vrieze & Grove, 2008; Wollert, 2002).
In addition to these issues, assessing the predictive accuracy of the MnSOST-R on a contemporary sample of Minnesota sex offenders is complicated by substantial changes in policy and practice that have taken place over the past few decades. Since the inception of the Community Notification Act in 1997, Minnesota has used a tiered risk-management system in which the level of community notification is based on the offender’s predicted risk of sexual recidivism. Sex offenders with a high predicted risk of sexual recidivism are given the most extensive level of notification (i.e., community meetings held by law enforcement, publication of the offender’s photograph and offense description on the Minnesota Department of Corrections’ website, etc.), whereas those with lower risk are given more limited forms of notification. Because the MnSOST-R is used by End of Confinement Review Committees to determine risk levels for offenders, it has anchored Minnesota’s tiered risk-management system, which has been found to be effective in reducing sexual recidivism. In particular, Duwe and Donnay (2008) reported that broad community notification, which is applied to the highest risk sex offenders, significantly reduces sexual recidivism.
The MnSOST-R has also been used to screen sex offenders for civil commitment. Since the early 1990s, Minnesota has used civil commitment statutes to indefinitely confine more than 500 offenders who had served time in prison for a sex offense. Over the past two decades, approximately 7% of sex offenders released from Minnesota’s prisons have been civilly committed. By incapacitating some of the higher risk sex offenders, the expanding use of civil commitment has likely had an effect, albeit to an unknown extent, on the sexual recidivism rate.
In addition, Minnesota uses a determinate sentencing system in which offenders serve the initial two thirds of their sentence in prison and the remaining one third under community supervision. When the Minnesota legislature increased the penalties for sex offenses in the late 1980s and early 1990s, it meant not only that sex offenders were going to spend more time in prison but also that they would spend more time under postprison community supervision. During the 1990s, Minnesota increased its use of intensive supervision with sex offenders. Due largely to longer and more intensive periods of postprison supervision, sex offenders have returned to prison more frequently as technical violators (Minnesota Department of Corrections, 2007).
Thus, sex offenders released from Minnesota prisons over the past 15 years have been more likely to be civilly committed, to be subjected to broad community notification, to be intensively supervised, to have their parole revoked for a technical violation, and to be incarcerated for longer periods of time. The growing use of these external constraints has likely been responsible, at least to some extent, for the declining sexual recidivism rates observed in Minnesota since the early 1990s (Minnesota Department of Corrections, 2007). Consistent with the risk principle, the application of such constraints has been concentrated among the highest risk offenders, who have had sexual recidivism rates similar to those considered to be lower risk (see Duwe & Donnay, 2008). Yet because of the similarity in reoffense rates between higher risk (i.e., those with higher MnSOST-R values) and lower risk (i.e., those with lower MnSOST-R values) offenders, using sexual recidivism as the measure to assess the predictive accuracy of the MnSOST-R on a contemporary sample of Minnesota sex offenders yields relatively low AUC values. Indeed, as shown later in Table 3, the MnSOST-R has an AUC value of .55 for the 2,315 sex offenders released from Minnesota prisons between 2003 and 2006.
Although the MnSOST-R has figured prominently in the apparent effectiveness of Minnesota’s tiered risk-management system, it is nevertheless necessary to continue efforts to accurately evaluate the validity and reliability of risk-assessment decisions. Given the potential impact of risk-assessment decisions on public safety and the offenders themselves, accurately predicting sexual recidivism—with or without external constraints—is paramount. Therefore, in this study, we attempt to develop an assessment tool that continues to be a valid predictor of not only sexual recidivism in the absence of external constraints but also sexual reoffending for those who face significant constraints or modifications to the environment.
Present Study
We revise the MnSOST-R in this study by developing the MnSOST-3. In doing so, we examined a sample of 2,535 sex offenders, most of whom were released from Minnesota prisons during a relatively recent 4-year period (2003-2006). In scoring and selecting items for the MnSOST-3, we used predicted probabilities generated from a multiple logistic regression model. We internally validated the model by using a bootstrap resampling method. To estimate the consistency with which the MnSOST-3 can be scored, we conducted an interrater reliability assessment.
In the following section, we first describe the methods used in the selection and scoring of items on the MnSOST-R and the other commonly used sexual recidivism actuarial tools. Next, we discuss the advantages of using a predictive logistic regression model to develop a risk-assessment actuarial tool. We then explain the data and methodology used to develop the MnSOST-3 and present the findings from our analyses. We conclude by exploring the implications for policy and practice.
Sex Offender Risk-Assessment Tools: Item Selection and Scoring
The most commonly used sexual recidivism actuarial instruments—Static-2002, SORAG, and MnSOST-R—have used similar procedures to select and score items. Indeed, both the SORAG and MnSOST-R were developed using a modified Nuffield (1982) weighting scheme. With the MnSOST-R, for example, items were scored by first cross-tabulating potential individual items with recidivism rates and then comparing those rates with the baseline rate. Items were given a value of 0 if they were within ±5% of the base rate. Items received a value of ±1 if the associated sexual recidivism rate was within ±5%-9% of the baseline rate. A value of ±2 was assigned to items with a distance of ±10%-14% from the baseline rate, whereas a value of ±3 was given to items with a distance of ±15%-19%. All items with a distance of ±20% or more from the baseline rate were assigned a value of ±4. Individual items were retained in the MnSOST-R if: (a) the assigned value was different from 0, (b) the item was consistent with existing theory and/or practice, (c) the association with sexual recidivism was p < .10, and (d) the items significantly improved the prediction of sexual reoffending in a hierarchical logistic regression model at the p ≤ .20 level (Epperson et al., 2003).
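The banding rule described above can be sketched as a small function. This is an illustrative reading rather than the original scoring code. It assumes the distances are measured in percentage points, and it assigns a distance of exactly 5 points to the ±1 band, because the description leaves that boundary ambiguous. The function name is hypothetical.

```python
def nuffield_weight(item_rate, base_rate):
    """Map an item's recidivism rate to a weight between -4 and +4.

    Both rates are percentages (e.g., 27 for 27%). Assumed bands, in
    percentage points from the base rate:
    under 5 -> 0, 5-9 -> 1, 10-14 -> 2, 15-19 -> 3, 20 or more -> 4.
    The sign follows the direction of the difference.
    """
    diff = item_rate - base_rate
    magnitude = min(int(abs(diff) // 5), 4)
    return magnitude if diff >= 0 else -magnitude


# Hypothetical rates against a hypothetical 20% base rate.
for rate in (24, 27, 8, 37, 45):
    print(rate, nuffield_weight(rate, base_rate=20))
```

Note that every rate inside a band receives the same weight, which is the coarseness that a regression coefficient on a continuous predictor avoids.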
In developing the Static-2002, Hanson and Thornton (2003) also began selecting items by first examining the univariate relationships between sexual recidivism and variables from the following content areas: age at release, sex offense history, deviant sexual interests, range of available victims, and general criminality. Variables that had a significant association with sexual recidivism were combined within each of the five content areas and were retained in the model if they produced any improvement in predictive accuracy. To determine whether the content areas significantly contributed to the prediction of sexual recidivism, Hanson and Thornton conducted multivariate statistical analyses. Weights were then assigned to the content areas based on the size of their impact on sexual recidivism.
Predictive Logistic Regression Modeling
In their development of the SORAG, Quinsey et al. (1995) argued that multiple regression is not a suitable method to use for item selection and scoring, at least in comparison with Nuffield’s modified method, because regression coefficients are computed to maximize fit, which leads to shrinkage in variance accounted for in cross-validation samples. To be sure, overfitting is the most serious problem involved with the creation of a predictive regression model. Advances in statistical modeling, however, have led to widespread use of multiple regression in the development of prediction tools over the past few decades. Much of this work has taken place in the biomedical field, although there are several recent examples from criminal justice (Brennan, Breitenbach, & Dieterich, 2008; Lowenkamp & Whetzel, 2009; Mossman, 2007; Williams & Grant, 2006).
Using logistic regression modeling to develop the MnSOST-3 offers several advantages. First, it provides a more accurate and detailed measurement of the effects that continuous predictors (e.g., age at release or number of sex offenses) may have on sexual recidivism. Indeed, Barbaree, Blanchard, and Langton (2003) have argued that most risk-assessment instruments fail to adequately account for the significant effect that age has on recidivism risk. For example, in assigning a value of +1 to offenders below the age of 31 and a value of –1 to those 31 and older, the MnSOST-R assumes the risk of sexual recidivism is, all else being equal, the same for offenders who are 35 and 60, respectively, at the time of release. A regression coefficient, however, is able to capture the incremental variation in recidivism risk across the age spectrum.
Second, existing sexual recidivism risk-assessment instruments, including the MnSOST-R, assume that the only predictors of sexual recidivism are main effects. In their discussion of ways to improve the accuracy of risk-assessment instruments, Knight and Thornton (2007) suggest using statistical models that can account for interactions. Accordingly, using multiple logistic regression provides the opportunity to explore whether there are interaction effects that significantly predict sexual recidivism.
Finally, the predicted probabilities derived from a multiple logistic regression model yield risk estimates that are relatively easy to understand and interpret. For example, rather than receiving a value of, say, 8 on the MnSOST-R, an offender’s value on the MnSOST-3 is the predicted probability of sexual reoffense within 4 years (e.g., 8%). In addition, because the predicted probability can be easily converted into a percentile rank (e.g., a predicted probability of 8% places an offender in the 90th percentile), this approach would make it easier to assess a given offender’s risk of reoffense relative to other sex offenders.
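As a rough illustration of how a logistic regression model yields a predicted probability and a percentile rank, consider the sketch below. The intercept, coefficients, predictor names, and reference probabilities are all hypothetical placeholders, not the MnSOST-3 estimates. The percentile rank is computed here as the share of reference offenders with a strictly lower predicted probability, which is one of several possible conventions.

```python
import math
from bisect import bisect_left


def predicted_probability(intercept, coefficients, values):
    """Inverse-logit of the linear predictor b0 + sum(b_k * x_k)."""
    linear = intercept + sum(coefficients[k] * values[k] for k in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


def percentile_rank(p, reference_probabilities):
    """Percentage of the reference group with a probability below p."""
    ordered = sorted(reference_probabilities)
    return 100.0 * bisect_left(ordered, p) / len(ordered)


# Hypothetical model and offender (illustration only).
coefficients = {"age_at_release": -0.03, "prior_sex_offenses": 0.40}
offender = {"age_at_release": 28, "prior_sex_offenses": 2}
p = predicted_probability(-3.5, coefficients, offender)

# Hypothetical predicted probabilities for a reference group.
reference = [0.01, 0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.08, 0.10]
print(round(p, 3), percentile_rank(p, reference))
```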
In developing the MnSOST-3, our overarching goal is to improve the accuracy of sexual recidivism risk predictions for incarcerated Minnesota sex offenders. Moreover, because the MnSOST-3 would be used on a daily basis by MnDOC staff, we wanted to use an approach that could be integrated within the Correctional Operations Management Systems (COMS), the database maintained by the MnDOC. Yet given the widespread use of the MnSOST-R, selecting a scale construction method that yields a flexible instrument amenable for implementation outside Minnesota was also an important consideration. In using multiple logistic regression, we believe these goals have been achieved. The equations used to estimate a multiple logistic regression model and the corresponding predicted probabilities can be integrated within the MnDOC’s COMS database. Yet the scoring of items and computation of MnSOST-3 values for individual offenders can also be accomplished in a broadly accessible environment such as a Microsoft Excel spreadsheet.
In addition to the advantages associated with predictive regression modeling, revising the MnSOST-R yields several other benefits. In assessing whether the individual items on the MnSOST-R are still statistically significant predictors of sexual recidivism, the coefficients generated from the multiple logistic regression model can be used to recalibrate the weights (or values) assigned to these items. As the predictive accuracy of the MnSOST-R is comparable with the other commonly used sex offender risk-assessment instruments, this issue is an important consideration.
In this study, we examine the predictive validity of factors that, to our knowledge, have received little or no consideration in the development of sex offender risk-assessment tools. Given that the best predictor of future behavior is often past behavior, existing risk-assessment instruments have, for good reason, focused primarily on prior sexual offending characteristics. There may be other factors, however, that significantly predict sexual recidivism. By disaggregating the nonsexual criminal histories of sex offenders and examining the relationship between sexual recidivism and interventions such as postprison supervision, we identify several promising predictors of sexual reoffending.
Revising the MnSOST-R also provides the opportunity to develop a risk-assessment tool on a relatively recent sample of released sex offenders. Given the declining rates of sexual recidivism observed over the past few decades, this is an important consideration. Moreover, using a contemporary sample addresses concerns that have been raised about the inflated sexual recidivism baseline rate for the MnSOST-R development sample.
MnSOST-3 Sample
In developing the MnSOST-3, we examined 2,535 sex offenders who were drawn from two separate samples: the MnSOST-R cross-validation sample and a contemporary sample of released sex offenders. Included among the 2,535 sex offenders were 99 offenders whose only sex offense conviction(s) occurred as a juvenile, 53 “intrafamilial fondlers” (incest-only offenders whose sex offenses consisted exclusively of nonpenetration sexual contact, a group for whom the MnSOST-R has had limited predictive accuracy), and 12 offenders whose only sex-related offense(s) involved possession of child pornography. We included these groups of offenders in the development sample because they have at least one prior sex or sex-related offense, which triggers the need to assess their risk for sexual recidivism, as evidenced by the fact that MnSOST-R assessments were administered to these offenders while they were in prison.
The MnSOST-R cross-validation sample contains 220 offenders released from Minnesota prisons during the early 1990s, whereas the contemporary sample includes 2,315 sex offenders released from Minnesota prisons between 2003 and 2006. During this 4-year period, there were 134 sex offenders who were released from prison but were not at risk to reoffend because they were civilly committed. Due to the absence of an at-risk period, we excluded the 134 civilly committed offenders. Yet, as discussed later, we completed MnSOST-3 assessments on these offenders to further assess the validity of the instrument.
We used the contemporary and MnSOST-R cross-validation samples to develop the MnSOST-3 for a few reasons. First, due to the recent decline in sexual recidivism rates and to concerns raised about the inflated baseline rate for the MnSOST-R development sample, it was necessary to select a group of sex offenders who had recently been released from prison. Second, to ensure the MnSOST-3 predicts sexual recidivism risk without constraints as accurately as the MnSOST-R, it was also necessary to include the MnSOST-R cross-validation sample. Although data were available on the sample used to develop the MnSOST-R, we did not use this sample to develop the MnSOST-3 because it oversampled for recidivists. As shown later, however, we use the MnSOST-R development sample to help cross-check the predictive validity of the MnSOST-3.
All 2,535 sex offenders in this study were scored at least once on the MnSOST-R. In some instances, offenders received more than one MnSOST-R assessment during the same sentence. For the offenders in the contemporary sample who had more than one MnSOST-R assessment during their confinement, we selected the most recent score prior to their release date. Minnesota prisoners receive a MnSOST-R assessment if they have at least one sex offense in their history for which documentation is available. Of the 2,535 offenders, 67% were incarcerated for a sex offense whereas the remaining 33% had a nonsexual index offense.
MnSOST-3 Items
To recalibrate the weights assigned to the 16 items on the MnSOST-R, we created binary measures for the dichotomous items (e.g., under any form of supervision, sex offense committed in a public place, force used, multiple acts, offended against a 13-15-year-old victim). For example, on the MnSOST-R, offenders who have committed a sex offense in a public location are given a value of “2,” whereas those who have not committed a sex offense in public are assigned a value of “0.” We modified the scoring of these items in the multiple logistic regression analyses by giving them values of either “0” or “1.” For the categorical measures on the MnSOST-R (e.g., length of sexual offending history, different age groups, stranger victims, adolescent antisocial behavior, pattern of recent alcohol or drug abuse, employment history, chemical dependency treatment, and sex offender treatment), we transformed these into dichotomous dummy variables. For example, on the MnSOST-R, offenders whose history of sexual offending is less than 1 year receive a value of “–1,” offenders with a history between 1 and 6 years are given a value of “3,” whereas those with a history in excess of 6 years are assigned a value of “0.” For the multiple logistic regression analyses, the following three variables were created for length of sexual offending history: less than 1 year (1 = yes; 0 = no), 1 to 6 years (1 = yes; 0 = no), and more than 6 years (1 = yes; 0 = no). Less than 1 year was the reference in the statistical analysis. Finally, we transformed three of the MnSOST-R items—number of sex offenses, discipline convictions, and age at release—into continuous variables.
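The recoding described above can be sketched as follows. The function and variable names are hypothetical, and the treatment of the category boundaries (e.g., exactly 1 year or exactly 6 years) is an assumption; only the three categories and the choice of "less than 1 year" as the reference come from the text.

```python
REFERENCE = "history_lt_1_year"  # reference category named in the text


def binary_item(mnsost_r_value):
    """Recode a dichotomous MnSOST-R item (e.g., scored 0 or 2) as 0 or 1."""
    return int(mnsost_r_value != 0)


def offending_history_dummies(years):
    """Create the three indicators for length of sexual offending history."""
    return {
        "history_lt_1_year": int(years < 1),
        "history_1_to_6_years": int(1 <= years <= 6),
        "history_gt_6_years": int(years > 6),
    }


def model_columns(dummies):
    """Drop the reference category before entering the dummies in a model."""
    return {k: v for k, v in dummies.items() if k != REFERENCE}


print(binary_item(2))                               # 1
print(model_columns(offending_history_dummies(3)))  # 1-to-6-year indicator set
```

Omitting the reference category means the coefficients on the two remaining indicators are interpreted relative to offenders with less than 1 year of sexual offending history.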
To identify whether there are, in addition to the 16 MnSOST-R items, other factors predictive of sexual recidivism, we gathered all of the data collected by the MnDOC and maintained in COMS on the 2,535 offenders. The data included information relating to demographics, prior criminal history (e.g., total number of convictions, age at first conviction, type of offense, etc.), educational level (e.g., presence or absence of high school degree or general equivalency diploma at admission and release from prison), institutional misconduct (e.g., whether the offender received any disciplinary sanctions, the total number of disciplinary convictions, the type of institutional misconduct, etc.), gang membership (i.e., security threat group), involvement in institutional programming (e.g., anger management classes, critical thinking courses, etc.), prison visitation (e.g., whether offenders were visited in prison, the number of times they were visited in prison, number of prison visits divided by length of stay, etc.), length of stay in prison during the most recent incarceration period prior to release, total prison time served during the current sentence, type of offense (e.g., sex offense, assault, robbery, failure to register as a predatory offender, etc.), type of prison admission (e.g., new court commitment, probation violator, and supervised release violator), and whether they were released to supervision and, if so, what type of supervision (e.g., regular supervision and intensive supervised release). A full list of the variables used can be obtained from the authors on request. To facilitate valid and reliable scoring of the MnSOST-3, we focused on identifying items that significantly predicted sexual recidivism, were consistent with existing theory and/or research, and were relatively objective measures that are consistently available in COMS.
Measuring Sexual Recidivism
We collected sex offense reconviction data on the 2,535 sex offenders through December 31, 2010. For the offenders in the contemporary sample released toward the end of 2006, 4 years was the maximum follow-up period. As logistic regression assumes that each offender has the same amount of time in which to reoffend, we limited the follow-up period to 4 years for all 2,535 offenders in this study.
We defined sexual recidivism as a reconviction for a new sex crime within 4 years of release. In operationalizing sex crimes, we included only hands-on sex offenses. In doing so, we excluded noncontact, sex-related offenses such as possession of child pornography or indecent exposure. We used reconviction as the recidivism measure because it reduces the likelihood of including false positives (i.e., cases that are not truly instances of sexual recidivism). Although rearrest is arguably a more sensitive measure of recidivism and, thus, increases the chances of identifying more true positives (i.e., actual sex reoffenses), it also increases the odds of including more false positives. In addition, information on the date(s) when the reoffense occurred was seldom available in the rearrest data but was consistently present in the conviction data. Offense date information was necessary to exclude cases of “pseudo recidivism,” as there were a handful of offenders who returned to prison for a “new” sex offense that had been committed prior to the beginning of their previous prison term. For example, an offender who was incarcerated from 2002 to 2005 might be reconvicted in 2008 for an offense committed in 1998. In these instances, we did not consider the reconviction to be a recidivism event.
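The recidivism definition above can be expressed as a simple rule. The sketch below is an interpretation, not the study's code. It assumes that the 4-year window is applied to the conviction date (the text does not say whether the offense date or the conviction date must fall within the window), and the record layout and function names are hypothetical.

```python
from datetime import date

FOLLOW_UP_YEARS = 4


def add_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def is_sexual_recidivist(prior_admission, release, reconvictions):
    """reconvictions: iterable of (offense_date, conviction_date, hands_on)."""
    window_end = add_years(release, FOLLOW_UP_YEARS)
    for offense_date, conviction_date, hands_on in reconvictions:
        if not hands_on:
            continue  # noncontact offenses are excluded
        if offense_date < prior_admission:
            continue  # "pseudo recidivism": offense predates the prior term
        if release < conviction_date <= window_end:
            return True
    return False


# The example from the text: incarcerated 2002-2005, reconvicted in 2008
# for an offense committed in 1998 (exact days are hypothetical).
print(is_sexual_recidivist(
    date(2002, 1, 1), date(2005, 1, 1),
    [(date(1998, 6, 1), date(2008, 3, 1), True)],
))  # False
```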
We obtained reconviction data from both the Minnesota Bureau of Criminal Apprehension (BCA) and the Federal Bureau of Investigation (FBI). Whereas the BCA data include only convictions that occur in Minnesota, the FBI criminal history data contain information on convictions that took place outside Minnesota. As with any recidivism study, official criminal history data will likely underestimate the actual extent to which the sex offenders examined here recidivated.
The recidivism data revealed that 102 (4.0%) of the 2,535 offenders had been reconvicted of a new sex offense within 4 years of their release from prison. The 4-year sexual reconviction rate was 12.3% in the MnSOST-R cross-validation sample and 3.3% in the contemporary sample.
Developing the MnSOST-3
Existing research has identified three types of validity important for predictive regression modeling: apparent, internal, and external (Harrell, Lee, & Mark, 1996). Apparent validity refers to performance on the sample used to develop the prediction model. In examining the performance of the model on the population underlying the sample, internal validity is concerned with whether the model can be reproduced. External validity, meanwhile, focuses on the generalizability of the model by looking at how well it performs on a related, but slightly different, population. Applied to the present study, apparent validity addresses the performance of the MnSOST-3 on the sample used to develop it. While internal validity tells us how well the MnSOST-3 would likely perform on other samples of Minnesota sex offenders, external validity would assess MnSOST-3 performance on non-Minnesota sex offender populations. In this study, we focus on apparent and internal validity.
To assess apparent validity, statistics such as ROC curves may be estimated on the development sample to determine the predictive accuracy of the model. As for internal validity, three main methods have been developed to determine the reproducibility of a prediction model. The split-population, or data splitting, method has been the most popular approach in the development of sexual recidivism risk-assessment tools. With this method, a portion (e.g., one half or two thirds) of the sample is used to develop the prediction model. The developed model is then applied to the remaining portion to test the internal validity of the model. Despite its popularity, this approach wastes data (Harrell et al., 1996).
Cross-validation, or k-fold validation, is more efficient than the split-population approach because it involves repeated data splitting. Research has demonstrated, however, that bootstrap resampling is the most efficient internal validation technique (Steyerberg, Bleeker, Moll, Grobbee, & Moons, 2003; Steyerberg et al., 2001). Developed by Efron (1979), bootstrap resampling involves repeatedly drawing samples, with replacement, from the overall sample to generate estimates of error. In doing so, it makes full use of the data set for developing and validating models while also providing error estimates that have relatively low variability and minimal bias (Harrell, 2001; Steyerberg et al., 2001). As discussed shortly, we use bootstrap resampling not only to refine our selection of items for the MnSOST-3 but also to calculate estimates of optimism due to overfitting.
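One common way to estimate optimism with the bootstrap, in the spirit of Harrell et al. (1996), is to refit the model on each bootstrap sample, compare its AUC on that sample with its AUC on the original data, and subtract the average gap from the apparent AUC. The sketch below illustrates that general procedure under stated assumptions; it is not the authors' implementation. The `fit` argument stands in for model estimation, and the toy "model" and simulated data are hypothetical.

```python
import random


def auc(scores, labels):
    """Pairwise AUC; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def optimism_corrected_auc(rows, labels, fit, n_boot=200, seed=0):
    """fit(rows, labels) must return a function mapping a row to a score.

    Returns (apparent AUC, estimated optimism, corrected AUC).
    """
    rng = random.Random(seed)
    n = len(rows)
    model = fit(rows, labels)
    apparent = auc([model(r) for r in rows], labels)
    gaps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # size n, with replacement
        b_rows = [rows[i] for i in idx]
        b_labels = [labels[i] for i in idx]
        if len(set(b_labels)) < 2:
            continue  # AUC is undefined unless both outcomes are present
        b_model = fit(b_rows, b_labels)
        boot_auc = auc([b_model(r) for r in b_rows], b_labels)
        orig_auc = auc([b_model(r) for r in rows], labels)
        gaps.append(boot_auc - orig_auc)
    if not gaps:
        raise ValueError("no usable bootstrap samples")
    optimism = sum(gaps) / len(gaps)
    return apparent, optimism, apparent - optimism


def fit_direction(rows, labels):
    """Toy stand-in for a model: keep the sign of a single predictor
    that discriminates best on the training data."""
    sign = 1.0 if auc(rows, labels) >= 0.5 else -1.0
    return lambda x: sign * x


# Simulated data (illustration only): one predictor, about 20% base rate.
data_rng = random.Random(1)
labels = [1 if data_rng.random() < 0.2 else 0 for _ in range(150)]
rows = [data_rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

apparent, optimism, corrected = optimism_corrected_auc(rows, labels, fit_direction)
print(round(apparent, 3), round(optimism, 3), round(corrected, 3))
```

In a real application, `fit` would estimate the full logistic regression model, including any variable selection steps, so that the optimism estimate reflects the entire modeling process.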
Selection of Predictors
Stepwise variable selection procedures are frequently used in the development of prediction models. Although there are a variety of stepwise methods available, the two main approaches are forward selection and backward selection. Under forward selection, a variable does not enter the model unless it is statistically significant at a predetermined level (e.g., α = .05). With backward selection, a variable is removed from the model if its level of statistical significance exceeds the established alpha level. Stepwise routines have been criticized on a number of grounds, especially for producing biased regression coefficients (Tibshirani, 1996) and for capitalizing on chance features of the data (Judd, McClelland, & Ryan, 2008). Still, because backward selection is generally preferable to forward selection (Harrell et al., 1996), it is the approach we use here.
We conducted multiple logistic regression analyses on the offenders in the development sample to identify significant predictors of sexual recidivism. In addition to including the 16 items from the MnSOST-R, we examined a host of variables derived from COMS data. Using an alpha of .10, we examined more than 100 potential predictors. Following Efron and Gong (1983), we added predictors one at a time until no further single addition achieved significance level α = .10. The main effects model showed there were 10 predictors that had a significant effect (p < .10) on sexual recidivism (see “Main Effects” model in Table 1). Among the 10 significant predictors, there were 45 possible two-way interaction effects, each of which we tested. Using an alpha of .05, we found six interaction effects that were statistically significant (see “Full” model in Table 1).
Multiple Logistic Regression Models for MnSOST-3
In an effort to develop a more parsimonious prediction model, we used bootstrap resampling to refine the selection of predictors included in the MnSOST-3. More specifically, we retained only the predictors that were consistently significant in the bootstrap samples. Although the bootstrap variable selection method has been discussed in the literature (Efron & Gong, 1983), there is no widely accepted “rule of thumb” threshold for retaining or removing predictors. Zhao (1998) recommended using at least a 40% cutoff (i.e., predictors are retained in at least 40% of the bootstrap samples), whereas Cooke and colleagues (2009) used a 60% threshold. Here we use a relatively high threshold (70%) to determine whether predictors should be included in the model.
After estimating 1,000 bootstrap samples from our 16-predictor model, there were five predictors (one main effect and four interaction terms) that were statistically significant at the .05 level in less than 70% of the samples. After removing these five predictors, we estimated another 1,000 bootstrap samples. The results show the remaining 11 predictors were statistically significant at the .05 level in at least 70% of the bootstrap samples.
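The retention rule described above can be sketched in a few lines. This is a minimal illustration rather than the procedure as actually coded for the study: the function names and the `is_significant` callback are hypothetical, with the callback standing in for refitting the logistic regression model on each bootstrap sample and testing a predictor at the .05 level.

```python
import random


def bootstrap_inclusion_rates(rows, predictors, is_significant,
                              n_boot=1000, seed=0):
    """Share of bootstrap samples in which each predictor is flagged.

    is_significant(sample, name) -> bool is a hypothetical callback that
    stands in for refitting the model on `sample` and testing `name`.
    """
    rng = random.Random(seed)
    n = len(rows)
    counts = {name: 0 for name in predictors}
    for _ in range(n_boot):
        # One bootstrap sample: n draws with replacement from the rows.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        for name in predictors:
            if is_significant(sample, name):
                counts[name] += 1
    return {name: counts[name] / n_boot for name in predictors}


def retain(rates, threshold=0.70):
    """Keep predictors flagged in at least `threshold` of the samples."""
    return [name for name, rate in rates.items() if rate >= threshold]
```

Lowering `threshold` to 0.40 or 0.60 would reproduce the more lenient cutoffs cited above; the 70% default mirrors the threshold used here.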
Discussion of multiple logistic regression results
The results from the reduced 11-item model are presented in Table 1, whereas a description of how these predictors were coded can be found in the appendix. Of the nine main effects in the model, three are items derived from the MnSOST-R (public place, completion of chemical dependency and sex offender treatment, and age at release). Although the predatory offense sentences item is somewhat similar to the number of sex/sex-related convictions item on the MnSOST-R, it is arguably a much broader measure of sexual offending history. Moreover, even among the three items derived directly from the MnSOST-R, it is worth noting that they are measured differently for the MnSOST-3. For example, public place is a dichotomous measure (as opposed to a categorical item), completion of both chemical dependency and sex offender treatment is a dichotomous measure that merges these two categorical items on the MnSOST-R, whereas age at release is a continuous, rather than a dichotomous, measure. Although a visual inspection of the residuals did not reveal any signs of nonlinearity for either age at release or number of predatory offenses, we tested for nonlinearity by estimating a model with a logarithmic transformation of both predictors. Neither coefficient, however, was statistically significant at the .10 level, which suggests that recidivism or, more specifically, the logit of the recidivism measure used here is linearly related to age at release and number of predatory offenses.
The results presented in Table 1 are generally consistent with existing research. We found, for example, that the risk of sexual recidivism was significantly less for offenders who completed both chemical dependency and sex offender treatment in prison, a finding that dovetails with prior research on offenders from Minnesota (Duwe, 2010; Duwe & Goldman, 2009) and in general (Lösel & Schmucker, 2005; Mitchell, Wilson, & MacKenzie, 2007). Similar to prior research on sex offenders (Hanson & Morton-Bourgon, 2004) and, more narrowly, those from Minnesota (Epperson et al., 2003), the risk was significantly greater for younger sex offenders and those with more prior predatory offenses, more predatory offenses that involved male victims, and a history of committing a sex-related offense in a public location.
A greater number of felony sentences significantly increased the odds of reoffending sexually. We also found that the risk of sexual recidivism was significantly greater for offenders with convictions for violations of orders for protection (VOFP), stalking, or harassment. In addition to measuring impulsivity, this measure may tap into rule noncompliance and intimacy deficits, which have been found to be salient predictors in previous research (Hanson & Morton-Bourgon, 2004). Given the significant interaction between VOFP and age at release, we found that the risk of sexual recidivism was especially pronounced for younger offenders who had VOFP, stalking, or harassment sentences. The results showed that offenders with disorderly conduct convictions in the 3 years preceding their commitment to prison had a significantly elevated risk of recidivism. The risk was greater, however, for older offenders with recent disorderly conduct convictions due to the significant interaction between age at release and disorderly conduct convictions in the 3 years prior to commitment to prison.
We found that offenders who were released to no supervision (because their sentence had expired) were significantly more likely to reoffend sexually than those who were released to some form of postprison supervision. Offenders were typically released to no supervision if they had multiple stays in prison as a release violator or had accumulated substantial extended incarceration disciplinary time stemming from institutional misconduct or failure to complete a sex offender treatment directive. The finding regarding the absence of postrelease supervision is consistent with recent research on offenders in general, which has shown that prisoners who “max out” are significantly more likely to reoffend (Ostermann, 2009; Schlager & Robbins, 2008). Moreover, in their validation study of the MnSOST-R and Static-99, Boccaccini and colleagues (2009) found that the risk of sexual recidivism was significantly greater for sex offenders who were discharged (i.e., released to no supervision).
Assessing Predictive Accuracy
The validity, or accuracy, of a prediction model is often assessed by examining its predictive discrimination and calibration (Harrell et al., 1996; Steyerberg, 2009). With the MnSOST-3, predictive discrimination looks at how well it separates recidivists from nonrecidivists. Calibration, however, examines the extent to which there is agreement between the predicted probabilities of recidivism and the actual rates of reoffending. In light of the recent decline in sexual recidivism, one of the concerns raised about tools such as the MnSOST-R is that, despite having good predictive discrimination, it overestimates the risk of sexual recidivism (Wollert, 2006). With a well-calibrated model, however, the predicted probabilities closely correspond with the observed recidivism rates. In the ensuing sections, we examine predictive discrimination of the MnSOST-3 before moving on to an assessment of its calibration with actual rates of sexual recidivism.
Predictive discrimination
We first analyzed the apparent predictive discrimination for the MnSOST-3 by estimating ROC curves for the predicted probabilities derived from the main effects, full, and reduced models (the MnSOST-3 is the reduced model). The results showed an AUC of .819 for the main effects model, .835 for the full 16-item model, and .821 for the reduced 11-item model. To determine the extent to which these ROC curves overestimate predictive discrimination due to overfitting, we estimated optimism values for each model based on the method described by Efron and Tibshirani (1993).
First, as shown above, we obtained upwardly biased (i.e., overly optimistic) AUC estimates of apparent predictive discrimination for the three models based on the full sex offender sample (N = 2,535) examined here. Second, we drew a bootstrap sample from this full offender sample and then obtained maximum likelihood estimates of beta weights based on the bootstrap sample. Third, we calculated an AUC value for that bootstrap sample. Fourth, we applied the beta weights developed from the bootstrap model to the full offender sample and obtained AUC values for these results. Fifth, we generated optimism estimates by calculating the differences in AUC values obtained during the third and fourth steps. Sixth, we repeated Steps 2 through 5 200 times, keeping track of the differences obtained at each iteration. Seventh, we used the average of the 200 differences generated during Step 6 as our “bootstrap estimate” of optimism for each model. Finally, we calculated an optimism-corrected AUC estimate for each model by subtracting the optimism average obtained during the seventh step from the apparent AUC value produced during the first step.
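The steps above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: all function names are hypothetical, the logistic model is fit by plain gradient ascent (an approximation to maximum likelihood that assumes roughly standardized predictors), and `simulate_data` generates synthetic toy data that has no connection to the offender sample.

```python
import math
import random


def auc(scores, labels):
    """Chance that a random positive case outscores a random negative one
    (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_logistic(X, y, steps=100, lr=0.5):
    """Approximate ML beta weights by gradient ascent (intercept first)."""
    beta = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for row, yi in zip(X, y):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = yi - 1.0 / (1.0 + math.exp(-z))
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def linear_predictor(beta, X):
    """Logit scores; they rank cases exactly as predicted probabilities do."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]


def optimism_corrected_auc(X, y, n_boot=200, seed=0):
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(linear_predictor(fit_logistic(X, y), X), y)  # Step 1
    diffs = []
    while len(diffs) < n_boot:                                   # Step 6
        idx = [rng.randrange(n) for _ in range(n)]               # Step 2
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # AUC is undefined without both outcomes; redraw
        beta_b = fit_logistic(Xb, yb)
        auc_boot = auc(linear_predictor(beta_b, Xb), yb)         # Step 3
        auc_full = auc(linear_predictor(beta_b, X), y)           # Step 4
        diffs.append(auc_boot - auc_full)                        # Step 5
    optimism = sum(diffs) / n_boot                               # Step 7
    return apparent, optimism, apparent - optimism               # Final step


def simulate_data(n=150, seed=1):
    """Synthetic toy data (not from the study): two predictors, 0/1 outcome."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        p = 1.0 / (1.0 + math.exp(-(-1.0 + 1.2 * x1 + 0.3 * x2)))
        X.append([x1, x2])
        y.append(1 if rng.random() < p else 0)
    return X, y
```

Calling `optimism_corrected_auc(*simulate_data())` returns the apparent AUC, the average optimism, and their difference. With real data, a standard statistical package's logistic regression routine would replace `fit_logistic`.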
As shown in Table 2, the optimism values were .036 for the main effects model, .055 for the full 16-item model, and .025 for the reduced 11-item model. As a result, the optimism-corrected AUC values are .783 for the main effects model, .780 for the 16-item model, and .796 for the reduced 11-item model. The results suggest that removing five marginally important predictors (one main effect and four interaction terms) to create a more parsimonious model produced a more stable model. The optimism-corrected AUC value of .796 for the MnSOST-3 provides an unbiased estimate of internal validity that adjusts for overfitting. It may also represent an upper-level estimate as to what may be expected in validation studies on non-Minnesota sex offenders.
Optimism-Corrected AUC Estimates
In examining the predictive discrimination of the MnSOST-3, it is worth comparing its performance not only among several different samples but also with the MnSOST-R. For the offenders released from prison between 2003 and 2006 (contemporary sample), the AUC for the MnSOST-3 was .824 compared with .550 for the MnSOST-R. For the cross-validation sample, the MnSOST-3 had an AUC value of .792 in comparison with .758 for the MnSOST-R. As noted by Epperson et al. (2003), the MnSOST-R development sample contained 256 sex offenders released from prison during the late 1980s and early 1990s. Yet because the data needed to fully score the MnSOST-3 were unavailable for 13 offenders in the MnSOST-R development sample, we limited our analyses to the remaining 243 offenders. The results in Table 3 show the AUC for the MnSOST-3 (.752) was slightly lower than that of the MnSOST-R (.758). The AUC values for the MnSOST-R development and cross-validation samples are not the same as those reported by Epperson and colleagues (2003) due to the different definition of sexual recidivism we used here; that is, Epperson et al. (2003) defined sexual recidivism as a new sex offense rearrest within 6 years. Overall, the findings suggest that while the MnSOST-3 has higher predictive accuracy for offenders recently released from prison who are subject to significant external constraints, it performs no worse than the MnSOST-R for offenders released from prison more than 20 years ago who were exposed to relatively few external constraints (i.e., the MnSOST-R development and cross-validation samples).
MnSOST-3 and MnSOST-R Predictive Discrimination Across Samples
In Table 3, we take a closer look at the predictive discrimination of the MnSOST-3 on the MnSOST-R development sample. Epperson and colleagues (2003) distinguished the offenders in the MnSOST-R development sample on the basis of whether they were rapists or molesters. Of the 243 offenders from the MnSOST-R development sample who were examined in this study, 140 had been classified as rapists and the other 103 as molesters. The AUC values for the MnSOST-3 were, compared with those for the MnSOST-R, lower for rapists (.733) but higher for child molesters (.781).
In Table 4, we present additional performance measures for the MnSOST-3. We see that the top 1% of offenders had a MnSOST-3 value of 40% or higher. The top 5% had a value of 13.5% or higher, the top 10% had a value of 8% or higher, and the top 15% had a value of 5.5% or higher. Among the 251 offenders with a MnSOST-3 value of 8% or higher (the top 10%), there were 55 who were recidivists, which amounts to a reconviction rate of 22%. Considering the sexual recidivism rate was 4% for the sample, the reconviction rate for the top 10% is 5.4 times greater than the overall rate. For every true positive (i.e., recidivist) identified at the 8% cut point, there were nearly four false positives (nonrecidivists). As there were a total of 102 recidivists, the 55 recidivists with MnSOST-3 values of 8% or higher accounted for 54% (capture rate) of the total recidivists.
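The arithmetic behind the 8% cut point can be reproduced directly from the counts reported above (251 offenders at or above the cut point, 55 of them recidivists, 102 recidivists among 2,535 offenders). The function and key names below are illustrative only.

```python
def cut_point_metrics(n_flagged, n_flagged_recidivists, n_total, n_recidivists):
    """Performance measures for one cut point, computed from raw counts."""
    base_rate = n_recidivists / n_total
    reconviction_rate = n_flagged_recidivists / n_flagged
    false_positives = n_flagged - n_flagged_recidivists
    return {
        "base_rate": base_rate,
        "reconviction_rate": reconviction_rate,
        "relative_to_base": reconviction_rate / base_rate,
        "false_per_true_positive": false_positives / n_flagged_recidivists,
        "capture_rate": n_flagged_recidivists / n_recidivists,
    }


# Counts for the 8% cut point (top 10%) as reported in the text.
metrics = cut_point_metrics(251, 55, 2535, 102)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The same function can be applied to the other cut points once the corresponding counts are taken from Table 4.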
MnSOST-3 Performance Metrics
As noted earlier, we did not examine 134 sex offenders released from Minnesota prisons between 2003 and 2006 because they were civilly committed. Still, to further test the validity of the MnSOST-3, we generated MnSOST-3 values for these offenders. The average MnSOST-3 value for the 134 civilly committed offenders was 10.5%, which is 2.6 times higher than the overall average. One of the criteria for civil commitment or sexually violent predator (SVP) decisions is the determination that the offender is either “substantially likely” or “more likely than not” to reoffend sexually, which roughly translates into a probability of 51% or higher. Only four of the offenders (3%), however, had a MnSOST-3 value greater than 50%, and only nine (7%) had an upper 95% confidence interval (CI) that exceeded 50%. Moreover, as noted above, fewer than 1% of the 2,535 offenders had a MnSOST-3 value that exceeded 50%, which is substantially lower than the rate (7%) at which Minnesota sex offenders have been civilly committed over the past few decades. These findings should not be considered too surprising, however, given that a recent report on Minnesota’s civil commitment program found that county of commitment, which is unrelated to sexual recidivism risk, was a significant factor in determining whether sex offenders were civilly committed (civil commitment decisions are finalized at the county level in Minnesota; Minnesota Office of the Legislative Auditor, 2011).
Calibration
In Table 5, we present data on the distribution of MnSOST-3 values and the corresponding 95% CIs. Although the predicted probabilities from a logistic regression model can vary from 0% to 100%, the MnSOST-3 values for the 2,535 offenders ranged from a low of 0% to a high of 98%. Only 0.7% of the sample, or 19 offenders, had a MnSOST-3 value of 50% or higher, whereas a little more than 1% (N = 34) had an upper CI at or above 50%. Two percent of the sample had a MnSOST-3 value of 25% or higher, whereas 7% had a value of 10% or higher. Nearly half of the sample (46%) had a MnSOST-3 value below 2%, whereas a little more than one fifth (21%) had a value below 1%. Overall, 78% had a value below 4%, which was the sexual recidivism rate observed among the 2,535 sex offenders.
Distribution of MnSOST-3 Values and 95% Confidence Intervals
We first assessed the calibration of the MnSOST-3 by generating a Lowess plot, which is shown in Figure 1. Whereas the dotted line represents actual rates of sexual reoffending among the 2,535 sex offenders, the solid line denotes the predicted probabilities (i.e., MnSOST-3 values) derived from the logistic regression model. The plot indicates a tight correspondence between actual recidivism rates and predicted probabilities for offenders with MnSOST-3 values less than 40%, which suggests the MnSOST-3 is well calibrated with actual sexual recidivism rates for roughly 99% of the sample (see Table 6). The two lines begin to diverge, however, when we reach the 40% mark. As actual recidivism rates fall below the predicted probabilities for MnSOST-3 values greater than 40%, the plot suggests the MnSOST-3 overestimates sexual recidivism risk for the top 1% of offenders in the sample.

Lowess plot for MnSOST-3 values and observed sexual recidivism
Calibration Between Actual Recidivism Rates and MnSOST-3 Values
To further assess the calibration of the MnSOST-3, we estimated a Hosmer-Lemeshow test in which sexual recidivism was regressed on MnSOST-3 values. The test was statistically significant at the .05 level (χ² = 30.520; df = 8; p < .001), which suggests the MnSOST-3 is not well calibrated with the observed rates of sexual recidivism in the sample. Given the calibration pattern observed in the Lowess plot, we reestimated a Hosmer-Lemeshow test for the bottom 99% of the sample (i.e., the 25 offenders with the highest MnSOST-3 values were excluded). This time, the test was not statistically significant at the .05 level (χ² = 14.291; df = 8; p = .074), which suggests the MnSOST-3 has adequate calibration for the bottom 99% (i.e., sex offenders with MnSOST-3 values below 40%).
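A minimal sketch of the Hosmer-Lemeshow calculation is given below, assuming the conventional setup of 10 groups of roughly equal size formed by sorting cases on predicted probability (which yields the 8 degrees of freedom reported above). The function names are illustrative, and the p value helper uses a closed form that holds only for even degrees of freedom.

```python
import math


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (chi-square, df) comparing observed and expected event counts.

    Assumes every group has an expected count strictly between 0 and its
    size; otherwise the denominator below is zero.
    """
    pairs = sorted(zip(probs, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        # Combines the event and non-event terms for this group.
        chi2 += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return chi2, groups - 2


def chi2_sf_even_df(x, df):
    """Upper-tail p value of a chi-square statistic (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)
    return math.exp(-half) * total


# p values implied by the two test statistics reported in the text.
print(round(chi2_sf_even_df(14.291, 8), 3))  # 0.074
print(chi2_sf_even_df(30.520, 8) < 0.001)    # True
```

Because the statistic depends on how cases are grouped, different software may return slightly different values for the same predictions.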
In an effort to produce a better calibrated instrument, we estimated a penalized maximum likelihood model (Firth, 1993). Although this estimation procedure does not affect the predictive discrimination of the model, it “shrinks” the regression coefficients so as to achieve greater calibration. However, because the lack of calibration was isolated to a relatively small subset of the highest risk cases, the penalized maximum likelihood procedure did not produce a significant improvement in the overall fit of the model. Indeed, after shrinking the regression coefficients, the Hosmer-Lemeshow test was still significant at the .05 level (χ² = 27.965; df = 8; p < .001).
Despite the lack of calibration at the top end of the model, we anticipate the MnSOST-3 could still be useful in identifying the highest risk offenders. Among the 19 sex offenders with MnSOST-3 values greater than 50%, the average MnSOST-3 value was 67% and the sexual recidivism rate was 53% (10 recidivated). Thus, although the MnSOST-3 overestimated risk for these 19 offenders, their observed recidivism rate suggests they were, as a group, “more likely than not” to reoffend.
Reliability
We examined the reliability of scoring the MnSOST-3 by conducting an interrater reliability assessment. We randomly selected 20 sex offenders who were released from Minnesota prisons between January 1 and June 30, 2010, on whom a MnSOST-R had been scored. Following a 4-hr training session, eight assessors in the MnDOC’s Risk Assessment and Community Notification (RACN) Unit each scored the selected cases on the Microsoft Excel application of the MnSOST-3 over a 5-day period. We created an age-at-release calculator on the Excel spreadsheet so as to facilitate the valid and reliable scoring of data for this item. The eight raters in this study had, on average, 7 years of experience in scoring sex offender risk-assessment instruments. We analyzed the degree of interrater reliability among the eight assessors for these 20 cases by estimating intraclass correlation coefficients (ICC) using a two-way random effects model.
The results showed that the single-measure ICC for the eight raters was .826 for both consistency and absolute agreement of ratings for the 20 cases (see Table 7). The item-level data show that ratings were most consistent for age at release, which may be due in part to the creation of a calculator for this item. The ratings were least consistent, however, for VOFP/stalking/harassment sentences. Although most (6) of the items on the MnSOST-3 are continuous (as opposed to binary or dichotomous) measures, which presumably increases the margin for error in scoring items, the items on this instrument are largely objective measures. Overall, the findings suggest the MnSOST-3 has an adequate degree of reliability.
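For readers who wish to reproduce this type of analysis, the sketch below computes single-measure ICCs for consistency and absolute agreement from a complete cases-by-raters table, using the standard mean squares from a two-way analysis of variance. It is an illustration only: the function name is hypothetical, the study's own software and settings are not described here, and the small ratings table is made up for demonstration.

```python
def icc_two_way(ratings):
    """Single-measure ICCs from a complete table.

    ratings[i][j] is the score assigned to case i by rater j.
    Returns (consistency, absolute_agreement).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    case_means = [sum(row) / k for row in ratings]
    rater_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_cases = k * sum((m - grand) ** 2 for m in case_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)

    ms_cases = ss_cases / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = (ss_total - ss_cases - ss_raters) / ((n - 1) * (k - 1))

    # Consistency ignores systematic differences between raters.
    consistency = (ms_cases - ms_error) / (ms_cases + (k - 1) * ms_error)
    # Absolute agreement penalizes raters who score uniformly high or low.
    agreement = (ms_cases - ms_error) / (
        ms_cases + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )
    return consistency, agreement


# Hypothetical scores for 4 cases from 3 raters (not study data).
toy = [[2.0, 2.5, 2.0], [10.0, 9.5, 11.0], [4.0, 4.0, 5.0], [30.0, 28.0, 31.0]]
print(icc_two_way(toy))
```

When raters do not differ systematically in their average scores, the two coefficients converge, which is consistent with the identical values reported here.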
MnSOST-3 Interrater Reliability Assessment
Note: ICC = intraclass correlation coefficient. All coefficients were statistically significant at the .01 level.
Discussion
The main objective in developing the MnSOST-3 was to further enhance the predictive validity of risk-assessment decisions for Minnesota sex offenders. The evidence presented here suggests the MnSOST-3 predicts sexual recidivism with a relatively high degree of accuracy. Although its predictive discrimination was greatest for recently released sex offenders who were subject to contemporary risk-management practices, its performance was comparable with the MnSOST-R for those released from prison more than 20 years ago who were not subject to these practices. The findings further show the MnSOST-3 is well calibrated with actual recidivism rates for all but the highest risk offenders. Although estimating a penalized maximum likelihood model did not improve the overall calibration of the model, the MnSOST-3 may still be a useful tool in helping identify high-risk offenders whose sexual recidivism risk exceeds 50%. Finally, the results from the interrater reliability assessment indicate the MnSOST-3 can be scored with a sufficient level of consistency across raters.
Unlike the developers of some other commonly used sex offender risk-assessment tools, we did not specifically attempt to develop a widely applicable instrument. As a result, the relatively high predictive accuracy of the MnSOST-3 reported here may not generalize to sex offender populations in other jurisdictions. After all, Minnesota is, in several potentially important ways, different from the rest of the United States. Even though Minnesota is, compared with the other 49 states, generally in the middle of the pack for population size and crime rate, it has the second lowest incarceration rate in the nation. As Minnesota relies more heavily on local sanctions (e.g., jail and community supervision), prison beds are generally reserved for offenders who have committed very serious offenses and/or have lengthy criminal histories.
Use of the MnSOST-3 outside of Minnesota may also be limited by the level of data needed to accurately score the instrument. In particular, given that six of the nine items relate, in some form or another, to criminal history (both sexual and nonsexual), access to complete and accurate criminal history data is imperative. The MnSOST-3 would therefore have diminished value for agencies that have limited access to these data or in jurisdictions where the criminal history data are less than complete. In addition, although we anticipate the items included on the MnSOST-3 would likely be significant predictors of sexual recidivism for populations of non-Minnesota sex offenders, the weights (i.e., coefficient values) applied to these items are less likely to generalize to other populations.
These limitations notwithstanding, the relatively high optimism-corrected AUC for the MnSOST-3 suggests it may still be among the better risk scales even if there is a reduction in its predictive accuracy for other sex offender populations. Nevertheless, determining the extent to which the MnSOST-3 is generalizable to non-Minnesota sex offender populations ultimately depends on the completion of validation studies. Accordingly, we suggest that jurisdictions outside Minnesota consider using the MnSOST-3 alongside externally validated risk-assessment instruments (e.g., Static, SORAG, MnSOST-R) until results from validation studies are available.
Given that our sample contains prisoners whose index offenses included both sexual and nonsexual crimes, the MnSOST-3 can be used to assess postrelease sexual recidivism risk for offenders who have at least one documented sex offense in their history regardless of whether their index offense is a sex crime. The sample also included 53 intrafamilial fondlers, 99 offenders whose only sex offense conviction(s) occurred as a juvenile, and 12 child pornography offenders—a group that has expanded in size over the past decade (Wolak, Finkelhor, & Mitchell, 2011). Due to these relatively small numbers, we recommend exercising a great deal of caution in using the MnSOST-3 on sex offenders who fall into one of these three groups. Again, we anticipate that external validation studies will help reveal the extent to which the MnSOST-3 has predictive validity for these groups of offenders.
In an effort to facilitate the completion of validation studies and the use of the MnSOST-3 in other jurisdictions, we have provided descriptions of how the nine individual items were coded in the appendix. Moreover, we have prepared a more detailed coding manual and have developed the MnSOST-3 so that it can be scored in a Microsoft Excel spreadsheet. Both the coding manual and the Microsoft Excel application of the MnSOST-3 can be obtained from the authors.
Regardless of its generalizability to other jurisdictions, we believe the development of the MnSOST-3 holds several implications for sex offender risk assessment. Most notably, logistic regression modeling may be a promising approach to consider in the enhancement of existing instruments or the creation of new ones. With this method, it is possible to examine interaction effects and achieve more accurate, detailed measurement of significant continuous predictors such as age at release or number of prior sex/predatory offenses. Moreover, this approach yields a relatively straightforward prediction of sexual recidivism, which could be particularly helpful for those jurisdictions that are required to make civil commitment (sexually violent predator) decisions. The MnSOST-3 value offers a directly applicable reoffense probability, albeit for a 4-year follow-up period, and the 95% CIs provide a range in which the true risk of sexual recidivism likely falls. It is important to bear in mind, however, that civil commitment laws generally consider sexual recidivism risk over the course of an offender’s lifetime, not within a limited follow-up period. Therefore, although most sex offender recidivists in Minnesota tend to sexually reoffend within the first 3 to 5 years following their release from prison (Minnesota Department of Corrections, 2007), the relatively brief follow-up period we used is the biggest limitation of this study. Still, it is worth noting that the regression modeling approach is flexible enough so that the MnSOST-3 can be updated in the future with longer follow-up data.
The development of the MnSOST-3 also suggests there are other factors, unrelated to sexual criminal history, which may be worth considering in future research on sexual recidivism. Nonsex offenses such as violations of orders for protection may tap into intimacy deficits, a factor that has been found to be a significant predictor of sexual recidivism. In addition, existing actuarials have not accounted for the impact postprison supervision (or the lack thereof) may have on reoffending.
The development and implementation of risk-management systems typically include the collection of a wide variety of data so as to facilitate determinations of risk. The use of more sophisticated statistical techniques that more fully exploit the information routinely available on offenders will likely yield more accurate risk-assessment tools. Indeed, the use of predictive logistic regression modeling may well represent the next generation of risk assessment. Yet other techniques available such as random forests (Berk, Sherman, Barnes, Kurtz, & Ahlman, 2009) and classification and regression trees (Breiman, Friedman, Olshen, & Stone, 1984; Monahan et al., 2006) may also further advance the development of more accurate risk-assessment tools, not just for sex offenders but for offenders in general.
Appendix
The following lists the nine items on the MnSOST-3 and describes how they were measured. The coding manual for the MnSOST-3, which provides a more complete description of these items, can be obtained from the authors.
Acknowledgements
The authors wish to thank Bill Donnay for his support of this project as well as Jeremy Britzius, Dwight Close, Brian Heinsohn, Donn Nelson, Kevin Nelson, Jeff Olson, and Jack Rusinoff for their work in scoring MnSOST-3 assessments. The manuscript and, more specifically, the MnSOST-3 instrument benefitted significantly from the helpful comments provided by Associate Editor Michael Seto, the three anonymous reviewers, Doug Epperson, Robin Goldman, Steve Huot, and Jim Kaul.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
