Abstract
This study evaluated the validity of the Static-99 and Static-99R in assessing sexual recidivism in Switzerland, based on a sample of 142 male sex offenders. Both tools showed predictive validity, but the Static-99R had better discrimination (OR = 1.82, AUC = .81) and calibration (Brier = .078, P/E = 0.96) than the Static-99. A cut score of four on the Static-99R maximized sensitivity (92.9%) and specificity (60.2%). However, although most offenders (98.7%) with a score < 4 did not commit sexual offenses in the 5-year follow-up period, only one in five (20.3%) offenders with a score ≥ 4 actually recidivated. Furthermore, the predicted number of recidivists in the well above average risk category (Static-99R ≥ 6) was 24% higher than expected in routine samples. The results suggest that the Static-99R may be a useful screening tool to identify low-risk individuals but offenders with scores ≥ 4 should be subjected to a more thorough assessment.
The ability to accurately estimate the likelihood of future criminal behavior is of great importance to clinicians, policy makers, and the public (Baldwin, 2015). Having started with Burgess’ (1928) efforts to differentiate between low-, moderate-, and high-risk offenders, development of actuarial risk assessment instruments to assess recidivism risk in violent and sexual offenders expanded in the 1970s (e.g., Cocozza & Steadman, 1974), and a number of instruments have been developed (Singh, Grann, & Fazel, 2011). Such instruments have proved to be more accurate assessments of recidivism risk than those based on clinical judgment alone (Ægisdóttir et al., 2006; Andrews, Bonta, & Wormith, 2006; Hanson, Bourgon, Helmus, & Hodgson, 2009) and now play a central role in the criminal justice system, helping in the decision-making process pertaining to sentencing, prison classification, offender treatment, case management, release, and community supervision (Fazel, Singh, Doll, & Grann, 2012). The present study evaluated the validity of the Static-99 (Hanson & Thornton, 1999) and its revised version, the Static-99R (Helmus, Thornton, Hanson, & Babchishin, 2012), in assessing sexual recidivism among sex offenders in Switzerland.
The Static-99 is among the most commonly used instruments to estimate sexual recidivism risk (Hanson, Lunetta, Phenix, Neeley, & Epperson, 2014; Helmus, Hanson, Thornton, Babchishin, & Harris, 2012). It was developed based on a sample of 1,301 offenders from Canada and the United Kingdom for use with male sex offenders who are at least 18 years of age at time of release. By 2009, the Static-99 had already been cross-validated in 63 samples (Hanson & Morton-Bourgon, 2009). Evidence of its predictive validity has also been observed in German-speaking countries, including Austria, Germany, and Switzerland (Endrass, Urbaniok, Held, Vetter, & Rossegger, 2009; Rettenberger, Matthes, Boer, & Eher, 2010; Stadtland et al., 2006).
Because previous research has shown that older sex offenders have a lower recidivism risk than younger ones (e.g., Barbaree & Blanchard, 2008) and the original Static-99 did not adequately consider age at release (Helmus, Thornton et al., 2012), in 2012 the Static-99 was revised to include a new age weighting. The original “age-at-release” variable, previously dichotomously coded, was expanded into four coding categories. In its development study (Helmus, Thornton et al., 2012), this revised version, named Static-99R, showed slightly better predictive validity overall and a substantially better fit for older offenders than the recidivism estimates from the Static-99. In addition, the Static-99R was found to have good discrimination in a study including 4,037 offenders from five countries (odds ratio [OR] = 1.38, 95% confidence interval [CI] = [1.29, 1.48]; Hanson, Babchishin, Helmus, & Thornton, 2013) and in a meta-analysis including 8,055 offenders from eight countries (area under the curve [AUC] = .70, 95% CI [.66, .73]; Helmus, Hanson et al., 2012). However, the predicted sexual recidivism rates of the tool had large and significant variability across samples (Helmus, Hanson et al., 2012).
To our knowledge, only a single study in Austria (Rettenberger, Haubner-Maclean, & Eher, 2013) has examined whether the Static-99R has better predictive validity than the original Static-99 in a German-speaking country. Contrary to the findings mentioned above, this study found that the Static-99 discriminated between sexual recidivists and nonrecidivists better than the Static-99R in a prison sample, but both instruments were well calibrated. Given the lack of research on the Static-99R in German-speaking countries, it is necessary to examine the predictive validity of this revised version in countries such as Switzerland. This is especially relevant considering that a prior validation of the Static-99 in Switzerland (see Endrass et al., 2009) suggested that, although yielding a good level of discrimination (AUC = .76), the recidivism rates reported by Harris, Phenix, Thornton, and Hanson (2003) for routine samples and the recidivism rates observed in that study did not exactly concur. In that study, which followed 69 violent and sex offenders for 5 years after release, the expected sexual recidivism rates for the Static-99 were higher than the recidivism rates observed in the Swiss sample, especially in the lower risk categories.
Most research on the predictive validity of risk assessment tools has focused on discrimination (ability to differentiate recidivists from nonrecidivists), but calibration (accurate quantification of recidivism risk) is essential for risk communication (Cook, 2008). Although it is important to accurately classify offenders according to nominal labels (e.g., low, moderate, and high risk), these are often interpreted inconsistently by policy makers and practitioners (Helmus, Hanson et al., 2012). Therefore, it is also necessary to quantify recidivism risk (e.g., 25% of the offenders in the well above average risk category of the Static-99R are expected to recidivate) to evaluate offenders’ dangerousness. For this reason, the expected recidivism rates provided in the tools’ norms must be similar to the recidivism rates in the current sample to justify the use of an actuarial risk assessment tool like the Static-99 or Static-99R for medico-legal decision making (Rossegger et al., 2013). If the recidivism rates provided in the tool’s norms substantially differed from the recidivism rates in the cross-validation sample, the communication of risk would be inaccurate.
The present study investigates the ability of the Static-99 and Static-99R to assess sexual recidivism risk among sex offenders in Switzerland, and compares the performance of these two tools utilizing complementary measures of discrimination and calibration. It was hypothesized that: (H1) the Static-99 and Static-99R would evidence adequate discrimination and calibration in Switzerland, and (H2) the Static-99R would have better predictive validity than the Static-99. The results are important for the criminal justice system as they shed light on whether the Static-99 and Static-99R can effectively be used for the risk assessment of sex offenders in Switzerland, and may help practitioners to decide which of these instruments to use in their routine forensic practice. In addition, the results may be useful for research purposes, because they may help to generalize prior findings on the predictive validity of the Static-99 and Static-99R, and be integrated into future revisions of the tools’ norms.
Method
Sampling and Procedure
The sample was composed of violent and sex offenders supervised by the criminal justice system of the Canton of Zurich, Switzerland. It included a cohort of offenders registered by the Office of Corrections in August 2000, who were convicted with a minimum sentence length of 10 months or had received a court-mandated intervention. The sample also included all offenders who started treatment in the Department of Mental Health services (DMH) of the Office of Corrections of the Canton of Zurich between January 1997 and December 2009. From these two cohorts, only adult male offenders convicted of a contact sexual offense, who had been discharged into the community, and for whom enough information to score the entire Static-99 was available were included. After excluding offenders who did not reach a 5-year period in the community after release, whether because of death (n = 5), deportation or emigration (n = 18), or imprisonment for a nonsexual offense that left them without the opportunity to spend 5 years in the community by the time their criminal records were last reviewed (n = 4), a final study sample of 142 offenders remained. Although small, the sample includes two total cohorts under the supervision of the Office of Corrections, who were discharged from prisons, secure facilities, and outpatient clinics. In addition, there were no potential selection biases concerning the process of choosing the participants. Therefore, the sample is likely to be representative of the male sex offender population in the Canton of Zurich, having both internal and external validity.
Information used to code the Static-99 was obtained exclusively from official records, which included judicial verdicts, criminal records, correctional files, clinical files, and forensic expert opinions (when available), and which were last reviewed in September 2013. The Static-99 was coded by master’s-level psychologists working at the DMH, following the coding rules of the tool (Harris et al., 2003). To avoid bias in the ratings, the raters were blind to the offenders’ recidivism outcomes. The age-at-release item was subsequently adjusted using the new age weights to transform the Static-99 into a Static-99R.
This study exclusively uses data from criminal justice files that belong to the Office of Corrections of the Canton of Zurich and are not protected under medical privacy laws. All analyses were performed on anonymized data. Therefore, there was no need to submit the study to an external ethics committee. We report how we determined our sample size, all data exclusions, manipulations, and measures in the study.
Variables
Static-99 and Static-99R
The Static-99 comprises 10 items found to be associated with sexual recidivism: (a) age at release, (b) ever lived with a lover (for at least 2 years), (c) index nonsexual violence, (d) prior nonsexual violence, (e) prior sex offenses (charges and convictions), (f) four or more prior sentencing dates (excluding index offense), (g) any convictions for non-contact sex offenses, (h) any unrelated victims, (i) any stranger victims, and (j) any male victims. The Static-99R includes the same 10 items but with an updated weight for age at release. Each item is dichotomously coded as 0 or 1, except the “prior sex offenses” item which is coded 0 to 3. Age at release is coded dichotomously in the Static-99 (1, 18-24.9; 0, ≥ 25 years) and in four categories in the Static-99R (1, 18-34.9; 0, 35-39.9; –1, 40-59.9; –3, ≥ 60 years). Item scores are then summed resulting in a total score ranging from 0 to a possible maximum of 12 for the Static-99 and from −3 to 12 for the Static-99R, where higher scores indicate higher risk of sexual recidivism.
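The age recoding described above can be sketched in a few lines. This is a minimal illustration, not the developers' scoring software: the function names are hypothetical, and the conversion assumes, as stated in the Procedure, that only the age-at-release item differs between the two versions.

```python
def age_item_static99(age: float) -> int:
    """Static-99 age-at-release item: 1 for ages 18-24.9, 0 for 25 or older."""
    if age < 18:
        raise ValueError("the tool is intended for offenders aged 18 or older")
    return 1 if age < 25 else 0


def age_item_static99r(age: float) -> int:
    """Static-99R age-at-release item with the four revised weights."""
    if age < 18:
        raise ValueError("the tool is intended for offenders aged 18 or older")
    if age < 35:
        return 1
    if age < 40:
        return 0
    if age < 60:
        return -1
    return -3


def to_static99r(static99_total: int, age: float) -> int:
    """Swap the old age weight for the new one; the other nine items are unchanged."""
    return static99_total - age_item_static99(age) + age_item_static99r(age)
```

Under these assumptions, an offender with a Static-99 total of 4 released at age 45 would receive a Static-99R total of 3, whereas the same total at age 30 would become 5.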
Based on the total score of the Static-99, offenders are classified into one of four risk categories: low risk (L; ≤ 1), low-moderate risk (L-M; 2-3), moderate-high risk (M-H; 4-5), and high risk (H; ≥ 6). For the Static-99R, a revised classification scheme was developed in order to improve the construct validity of the risk categories (Hanson, Babchishin, Helmus, Thornton, & Phenix, 2017), where offenders are classified into one of five groups: very low risk (I; −3 to −2), below average risk (II; −1 to 0), average risk (III; 1-3), above average risk (IVa; 4-5), and well above average risk (IVb; ≥ 6). The Static-99 and Static-99R then provide expected sexual recidivism rates for different reference groups and for different follow-up periods. The estimates provided for routine samples (i.e., samples representative of typical sex offenders in correctional systems; see Harris et al., 2003) in a 5-year period were utilized because they most closely match the sample being assessed. For the Static-99, the 2009 recidivism rates were utilized (Helmus, Hanson, & Thornton, 2009), and for the Static-99R, the 2016 norms were utilized (Phenix, Helmus, & Hanson, 2016). These norms, as well as the descriptive information of the samples used to develop them, are available at the tool developers’ website (Static-99, n.d.).
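For illustration, the two classification schemes can be written as simple threshold look-ups. The function names are hypothetical, and the level label III for Static-99R scores of 1 to 3 follows the abbreviations given in the note to Table 2.

```python
def static99_category(score: int) -> str:
    """Four nominal risk categories of the original Static-99."""
    if score <= 1:
        return "low"
    if score <= 3:
        return "low-moderate"
    if score <= 5:
        return "moderate-high"
    return "high"


def static99r_category(score: int) -> str:
    """Five risk levels of the revised Static-99R scheme."""
    if score <= -2:
        return "I (very low risk)"
    if score <= 0:
        return "II (below average risk)"
    if score <= 3:
        return "III (average risk)"
    if score <= 5:
        return "IVa (above average risk)"
    return "IVb (well above average risk)"
```

Note that a score of 4 is the lower bound of both the moderate-high category of the Static-99 and level IVa of the Static-99R.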
Interrater reliability on a random subsample of 15 cases coded by two raters revealed excellent levels of agreement in the Static-99 total score (ICC [2,1] = .96, p < .001; excellent is considered > .75; Cicchetti, 2001). However, the internal consistency of the scale was poor (α = .48; poor is considered < .60; George & Mallery, 2010), which is not surprising considering that Static-99R items have been found to conform to a three-factor structure (Brouillette-Alarie, Babchishin, Hanson, & Helmus, 2016).
Recidivism
As we were interested in the ability of the tools to identify the most dangerous sex offenders, recidivism was defined as a new charge or conviction for a contact sexual offense (offenses involving an identifiable child or non-consenting adult victim) occurring within a fixed 5-year post-release follow-up period. Time spent incarcerated for a nonsexual offense did not count as “time-at-risk.” The recidivism variable was coded dichotomously (0, nonrecidivist; 1, recidivist).
Analyses
The predictive validity of the Static-99 and Static-99R was tested through different measures of discrimination and calibration. Specifically, regarding discrimination, the ability of the tools to discriminate between sexual recidivists and nonrecidivists was examined through (a) the OR calculated from logistic regressions, and (b) the AUC, as well as values of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy (ACC) across all possible cut scores. A Youden index (J) was calculated to identify the cut score of the tools at which sensitivity and specificity were maximized. The OR indicates the change in the odds of recidivism associated with a one-unit increase in the tools’ total score. The AUC measures the probability that a randomly chosen recidivist would have a higher score on the tools than a randomly chosen nonrecidivist. Values above .71 can be considered large in violence risk assessment (Rice & Harris, 2005).
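The probabilistic reading of the AUC and the Youden index can be made concrete with a short sketch. The function names and the toy scores in the usage note are hypothetical and are not the study data; ties between a recidivist and a nonrecidivist are counted as half, which is the usual convention for the Mann-Whitney form of the AUC.

```python
def auc(recidivist_scores, nonrecidivist_scores):
    """Share of recidivist/nonrecidivist pairs in which the recidivist scores higher."""
    wins = 0.0
    for r in recidivist_scores:
        for n in nonrecidivist_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(recidivist_scores) * len(nonrecidivist_scores))


def youden_by_cut(recidivist_scores, nonrecidivist_scores, cuts):
    """J = sensitivity + specificity - 1, flagging 'score >= cut' as high risk."""
    result = {}
    for cut in cuts:
        sens = sum(s >= cut for s in recidivist_scores) / len(recidivist_scores)
        spec = sum(s < cut for s in nonrecidivist_scores) / len(nonrecidivist_scores)
        result[cut] = sens + spec - 1.0
    return result
```

With the toy scores `[4, 5, 6, 3]` for recidivists and `[0, 1, 2, 3, 4, 2]` for nonrecidivists, 22 of the 24 pairs favor the recidivist (counting ties as half), and the cut score with the highest J is selected with `max(j, key=j.get)`.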
Regarding calibration, we assessed the fit between the sexual recidivism rates predicted by our logistic regression models and the actual outcome, and also the fit between the recidivism rates in our sample (both observed and predicted) and those provided by the tools’ developers, to assess if they are appropriate for use in the present population. Specifically, calibration was examined through (a) the Brier score, which compared (1) the recidivism rates observed in the current sample (observed rates) with the rates predicted by our regression models (predicted rates) and (2) the recidivism rates observed in the current sample with the rates expected for routine samples (expected rates), and (b) the P/E indexes (predicted divided by expected number of recidivists). The confidence intervals (CIs) of the P/E indexes were calculated with an adaptation of the formula recommended by Hanson (2017).
The Brier score measures the accuracy of probabilistic predictions by quantifying their correspondence, for every case in the sample, with the observed outcome. The score is expressed as an average squared difference and ranges between 0 and 1, where lower values indicate higher performance. By comparing observed and predicted recidivism rates, we assess how accurately the predicted probabilities of our logistic regressions fit the actual data, and by comparing observed and expected rates we assess the credibility of the recidivism estimates provided by the tools’ developers. However, the Brier score cannot be used to compare predicted and expected rates directly (it requires a dichotomous outcome while probabilities are continuous), which was desirable in this study due to the low generalizability of the observed rates. Furthermore, the Brier score does not inform about where the discrepancies between the recidivism rates in our sample and the tools’ norms are. Therefore, we also calculated the P/E index (equivalent to calibration in the large) for the overall sample and each risk category as well, to highlight areas of miscalibration. The P/E index is an aggregate statistic analogous to a single-sample t test, where the current data are compared with a known value (Hanson, 2017). If a scale shows perfect calibration, the P/E index is equal to 1. A 95% CI in the P/E indexes that includes 1 means that there is no statistically significant difference (p > .05) between predicted and expected rates.
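Both calibration measures can be sketched briefly. The Brier score follows directly from the definition above. The form of the confidence interval for the P/E index is an assumption: it takes the log-based interval commonly used for ratios of counts and puts the predicted number of recidivists in the place of the observed number. It is not a formula quoted from the study, although it does reproduce the threshold of 9.10 predicted recidivists reported under Limitations. The function names are hypothetical.

```python
import math


def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


def pe_index(predicted, expected, z=1.96):
    """P/E ratio with an approximate 95% CI (assumed log-based form, see lead-in)."""
    ratio = predicted / expected
    half_width = z * math.sqrt(1.0 / predicted)
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)
```

For the well above average risk category of the Static-99R, `pe_index(5.86, 4.74)` gives a ratio of about 1.24 with an interval that includes 1, while `pe_index(9.10, 4.74)` gives a lower bound of approximately 1. As a point of reference for the Brier score, a constant prediction equal to the observed base rate (14/142) would score about .089.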
The descriptive and inferential analyses were conducted in the software Stata 14.1 (StataCorp., 2015) for Windows 10. G*Power 3.1.9.2 (Faul, Erdfelder, Lang, & Buchner, 2007) was used to perform post hoc power analyses.
Results
Sample Characteristics
The sample includes 142 adult male sex offenders charged or convicted of child molestation (59.2%, n = 84) or rape (40.9%, n = 58). At conviction, the mean age of the offenders was 39.0 years (SD = 11.9) and the majority were Swiss citizens (83.8%, n = 119), single (57.5%, n = 81), employed (74.5%, n = 102), and had completed secondary school (65.4%, n = 87). Prior record of violent or sexual offenses was present in 43.3% of offenders (n = 61). After 5 years of follow-up, 9.9% (n = 14) of the offenders had sexually reoffended.
The average total Static-99 score was 3.49 (Mdn = 3, SD = 1.76, range 0-8). Based on this tool, 13.4% (n = 19) of offenders were classified as “low risk,” 40.9% (n = 58) as “low-moderate risk,” 31.7% (n = 45) as “moderate-high risk,” and 14.1% (n = 20) as “high risk.” Regarding the Static-99R, the average score was 3.15 (Mdn = 3, SD = 2.10, range −3-9) and it classified 2.1% (n = 3) of offenders as “very low risk,” 6.3% (n = 9) as “below average risk,” 46.5% (n = 66) as “average risk,” 31.7% (n = 45) as “above average risk,” and 13.4% (n = 19) as “well above average risk.”
Discrimination
The OR from the logistic regression indicated a 70% increase in the odds of sexual recidivism for each one-unit increase in the Static-99 total score (OR = 1.70, 95% CI [1.20, 2.39], p = .003) and an 82% increase in the odds of recidivism for each one-unit increase in the Static-99R score (OR = 1.82, 95% CI [1.31, 2.53], p < .001). The Static-99R was better able to differentiate recidivists from nonrecidivists (AUC = .81, 95% CI [.69, .93]) than the Static-99 (AUC = .76, 95% CI [.64, .88]). However, the difference between the AUCs of the two tools did not reach statistical significance at the 5% level (χ² = 2.68, p = .102).
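Because the OR applies to each one-unit increase, its effect compounds multiplicatively over larger score differences. The following sketch assumes a logistic model that is linear in the total score, as in the regressions reported here; the function names are hypothetical.

```python
def percent_increase_in_odds(odds_ratio: float) -> float:
    """Percentage change in the odds for a one-unit increase in the score."""
    return (odds_ratio - 1.0) * 100.0


def odds_multiplier(odds_ratio: float, score_difference: int) -> float:
    """Factor by which the odds change across a given difference in total score."""
    return odds_ratio ** score_difference
```

Under this assumption, a four-point difference on the Static-99R corresponds to odds of sexual recidivism roughly 11 times higher (1.82 raised to the fourth power), which is a statement about odds and not about probabilities.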
Table 1 presents the sensitivity, specificity, PPV, NPV, and ACC across Static-99 and Static-99R cut scores. The cut score of 4 represented the threshold where sensitivity and specificity were maximized for both Static-99 (J = .44) and Static-99R (J = .53). Using this cut score to categorize low-risk (score < 4) and high-risk (score ≥ 4) offenders, the Static-99R correctly classified 92.9% of the recidivists as high-risk (sensitivity) and 60.2% of the nonrecidivists as low-risk (specificity). Although 98.7% of the offenders classified as low-risk remained free of sexual offenses in the 5 years after release (NPV), only 20.3% of those classified as high-risk actually recidivated (PPV). Overall, the cut score correctly classified 63.4% of the offenders (ACC). The Static-99 had lower sensitivity (85.7%), specificity (58.6%), PPV (18.5%), NPV (97.4%), and ACC (61.3%).
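The classification statistics at a given cut score all derive from a single 2 × 2 table. In the sketch below, the function name is hypothetical, and the cell counts in the usage note are not taken from Table 1: they are back-calculated from the percentages reported above together with the 14 recidivists, 128 nonrecidivists, and 64 offenders with a Static-99R score ≥ 4.

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV, NPV, accuracy, and Youden's J from a 2x2 table."""
    sensitivity = tp / (tp + fn)   # recidivists flagged as high risk
    specificity = tn / (tn + fp)   # nonrecidivists flagged as low risk
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),     # flagged high risk who recidivated
        "npv": tn / (tn + fn),     # flagged low risk who did not recidivate
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "youden_j": sensitivity + specificity - 1.0,
    }
```

Calling `classification_metrics(tp=13, fn=1, fp=51, tn=77)` with these back-calculated counts reproduces the reported Static-99R values at a cut score of 4 after rounding. It also shows why the PPV is low despite high sensitivity: with a base rate near 10%, the 51 false positives far outnumber the 13 true positives.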
Sensitivity, Specificity, PPV, NPV, and ACC at Each Static-99 and Static-99R Cut Score.
Note. PPV = positive predictive value; NPV = negative predictive value; ACC = accuracy (proportion correctly classified). Values are reported as percentages. Bold indicates the cut score with the highest Youden index.
Calibration
The predictions of the Static-99R in the current sample produced a lower Brier score (.078) than the predictions of the Static-99 (.083), indicating that this model was more efficient (72% congruence between the observed recidivism rates and the predicted probabilities of the model). The expected recidivism rates from the tools’ developers produced similar Brier scores (.079 for the Static-99R and .085 for the Static-99).
Table 2 presents the observed, predicted, and expected number of recidivists across Static-99 and Static-99R total scores and risk categories (the recidivism rates and number of recidivists across each Static-99 and Static-99R score are presented in the Appendix). There were no significant differences between the number of recidivists predicted and expected with either the Static-99 or the Static-99R. While the number of recidivists predicted by the Static-99 in the current sample was on average 32% higher (P/E = 1.32) than expected in routine samples for a 5-year period, the number of recidivists predicted by the Static-99R was on average only 4% lower (P/E = 0.96) than expected. In addition, there were no significant differences between the predicted and expected number of recidivists in the different risk categories of either tool. However, the CI of the P/E index could not be calculated reliably for the lower risk categories of the tools because the expected number of recidivists was too low. Furthermore, although not statistically significant, some differences were quite substantial. For example, the number of recidivists predicted by the Static-99 for the high-risk category was 80% higher (P/E = 1.80) than expected in routine samples for this category, and the number of recidivists predicted by the Static-99R for the well above average risk category was 24% higher (P/E = 1.24).
Number of Recidivists Across Static-99 and Static-99R Total Scores and Risk Categories.
Note. O = observed number of recidivists; P = predicted number of recidivists; E = expected number of recidivists; P/E = predicted/expected index; CI = confidence interval of the P/E indexes. 95% CIs that include 1 mean there is no statistically significant difference between predicted and expected number of recidivists at the 5% level (p > .05). L = low; L-M = low-moderate; M-H = moderate-high; H = high; I = very low risk; II = below average risk; III = average risk; IVa = above average risk; IVb = well above average risk.
Figure 1 illustrates the observed, predicted, and expected sexual recidivism rates for each Static-99 and Static-99R score. As can be seen in the figure, the general pattern is that the predicted recidivism rates in the current sample were higher than the recidivism norms for offenders with higher scores on the tools, specifically offenders with scores > 3 on the Static-99 and > 5 on the Static-99R.

Observed, predicted, and expected 5-year sexual recidivism rates across Static-99 and Static-99R total scores.
Discussion
This study compared the validity of the Static-99 and Static-99R in assessing sexual recidivism in Switzerland. A sample of 142 male sex offenders from the Canton of Zurich was followed 5 years after release into the community and information on new charges and convictions for a contact sexual offense was collected. It was hypothesized that both tools would have adequate predictive validity in Switzerland and that the Static-99R would have better overall performance than the Static-99. In line with our hypotheses, both tools appeared appropriate to identify recidivists and quantify their risk, but the Static-99R outperformed its original version in all measures of predictive validity.
Specifically, regarding the first hypothesis, both tools had acceptable discrimination, but the Static-99R had a higher OR (1.82 vs. 1.70) and AUC (.81 vs. .76) than the Static-99. The OR and AUC of the Static-99R in the current sample were quite high compared with prior studies (see Hanson et al., 2013; Helmus, Hanson et al., 2012). However, these values are very similar to those obtained in a study on the field validity of the Static-99R in California (OR = 1.73, AUC = .81; see Hanson et al., 2014). The careful operationalization of sexual recidivism used in the current study may have been a contributing factor to this relatively high OR and AUC. Specifically, we used both new charges and convictions as indicators of recidivism, and excluded time spent incarcerated for a nonsexual offense from the 5-year time-at-risk period, which may have increased the recidivism base rate and the discrimination ability of the assessment tools.
The Static-99R also had higher sensitivity (92.9% vs. 85.7%), specificity (60.2% vs. 58.6%), PPV (20.3% vs. 18.5%), NPV (98.7% vs. 97.4%), and ACC (63.4% vs. 61.3%) than the Static-99 using a cut score of four, which was the threshold that maximized sensitivity and specificity in both tools. A score of four has also been used in the United States to make legal decisions regarding sex offenders’ sentences (e.g., Boccaccini et al., 2012). Using this classification, although most offenders (98.7%) with a low score on the Static-99R (i.e., < 4) would remain free of future sexual offenses, only one in five (20.3%) of the offenders with a high score (i.e., ≥ 4) would actually recidivate. This method could therefore have huge repercussions for the human rights of misclassified offenders and for the costs borne by the criminal justice system. However, it is important to note that this (low) PPV is a lower bound estimate. When recidivism data are taken from criminal records, crimes that stay undiscovered and do not come to the attention of the court are not included in this statistic. Thus, some offenders may stay recorded as nonrecidivists when in fact they have reoffended. This means that there is no known false positive rate.
Regarding calibration, both tools appeared to quantify offenders’ risk adequately in the current sample. The Static-99R appeared to have better calibration than the Static-99 because its Brier score was lower when using both the predictions of our models (.078 vs. .083) and the norms of the tools’ developers (.079 vs. .085). Furthermore, the similar Brier scores produced when using the predicted probabilities of our models and the expected recidivism rates (.078 vs. .079 for the Static-99R) suggest that the tools’ norms and the predictions of our model are comparable. This was confirmed by the P/E indexes, as the indexes for the total score of both tools were nonsignificant. Again, the Static-99R showed better calibration than the Static-99 because the predicted number of recidivists with this tool was on average only 4% lower than expected for routine samples, whereas the predicted number of recidivists from the Static-99 was on average 32% higher.
Furthermore, there were no significant P/E indexes across the risk categories of either tool, suggesting that their norms may be useful for risk communication in the Swiss sex offender population. These results are tentative, however, because the CI of the P/E indexes in the lower risk categories of the tools could not be calculated. In addition, some results had large confidence intervals, related to the small number of recidivists in the current sample, which may have masked significant differences between predicted and expected number of recidivists. In fact, there was a considerable discrepancy between predicted and expected number of recidivists in the higher risk categories of the tools (scores ≥ 6). This was also evident in the calibration plot, which shows that the tools are better calibrated for lower risk offenders. It must be noted that, in the current study, recidivism was defined as a new contact sexual offense. However, the tool was developed to estimate the risk of committing any kind of sexual offense. Although the number of non-contact offenses was small in the present sample (n = 1), this difference in the operationalization of the outcome variable may contribute (to a small extent) to the differences between the recidivism rates predicted in the current sample and the recidivism rates expected in routine samples.
Nevertheless, in addition to the findings of the present study, the meta-analysis of Helmus, Hanson, et al. (2012) previously evidenced that there is considerable variation in the absolute recidivism rates associated with specific Static-99R scores across settings and samples. This may call into question the applicability of these recidivism rates to different populations (Hanson et al., 2014). Although the recidivism rates predicted in the present study could be used instead, our sample size (N = 142) and number of recidivists (n = 14) are too small to produce stable and generalizable estimates. Therefore, the norms provided by the tools’ developers, which are based on 10 samples from six different countries (including Germany and Austria), 4,325 offenders, and 358 recidivists, should be preferred.
In addition, this finding indicates that although risk assessment tools may have good discrimination ability, they frequently fail to quantify risk with adequate certainty for effective decision making (Singh, Grann, Lichtenstein, Långström, & Fazel, 2012). Based on our model, less than one-third of the offenders in the well above average risk category of the Static-99R were predicted to recidivate (5.86/19). This low recidivism rate, even among the higher risk offenders as classified by the norms of the Static-99R, shows that it should not be used as the sole determinant in the estimation of sexual recidivism risk. As the tool includes only static factors associated with sexual reconviction (Harris et al., 2003), additional factors, especially dynamic ones, may need to be considered to better assess recidivism risk. Furthermore, a shift from atheoretical risk scales to the assessment of psychologically meaningful constructs derived from latent variable analyses has been suggested (Brouillette-Alarie et al., 2016). For instance, Brouillette-Alarie et al. (2016) evidenced that Static-99R and Static-2002R (Helmus, Thornton et al., 2012) items represent three dimensions: (a) persistence/paraphilia, (b) youthful stranger aggression, and (c) general criminality. The use of such constructs may improve the validity and reliability of risk assessments and provide useful information for offender treatment as well.
Regarding the second hypothesis, the results evidenced that, although the Static-99 and Static-99R both seem to have acceptable predictive validity in Switzerland, the Static-99R, with its new weighting scheme for the age-at-release item created to account for the age-recidivism relationship, outperforms its original version in all indices of discrimination and calibration. Currently, practitioners and correctional agencies in Switzerland and other countries are debating whether to adopt the methodology of the Static-99R instead of the Static-99. The finding of this study is consistent with the recommendations of the tools’ developers to utilize the Static-99R and its more up-to-date recidivism norms (Static-99, n.d.).
Limitations and Implications
This study has two major limitations. First, the sample size (N = 142) and the absolute number of recidivists (n = 14) were relatively small, which reduces the precision of the analyses. For example, although the 5-year sexual recidivism base rate in the current study (9.9%) is similar to that of other cross-validations of the Static-99R (11.1%; see Helmus, Hanson et al., 2012), there were no observed recidivists in the “low-risk” category of the Static-99 or in the “very low risk” and “below average risk” categories of the Static-99R. Examining a larger sample would increase confidence in the current findings. Nevertheless, post hoc power analyses (one-way) revealed that, given the observed effect sizes (OR = 1.82, AUC = .81), the available sample provided 87% power for the logistic regressions and 99% power for the AUC analyses (using the Wilcoxon Mann–Whitney test for two groups as a proxy; an AUC of .81 corresponds to a Cohen’s d of 1.2 [see Rice & Harris, 2005]). The lack of power is more pronounced in the calibration analyses with the P/E index because the statistical power of this test is determined solely by the number of recidivists (Hanson, 2017). Even so, regarding the well above average risk category of the Static-99R, 9.10 predicted recidivists (instead of 5.86) would have been necessary to make the difference between predicted and expected recidivists (4.74) statistically significant.
Second, although the sample is representative of the Canton of Zurich, which has the largest correctional system in the country, the results could differ in other cantons of Switzerland, which limits the generalizability of the present findings. Unfortunately, it is difficult to collect national data in Switzerland because, until recently, incarcerations were documented at the cantonal level only. At present, national data are being collected by the Federal Office of Statistics (Bundesamt für Statistik), which may help develop nationally representative samples in future recidivism studies.
Despite these limitations, this study has implications for practice. First, the findings suggest that the Static-99R has better overall performance than the Static-99 in the current sample. The Static-99R may therefore be more appropriate for use in Switzerland than the Static-99. Second, regarding discrimination, it appears that a cut score of four on the Static-99R is best for classifying individuals as low- or high-risk offenders. However, although this threshold has high accuracy in identifying nonrecidivists, it has very low accuracy in identifying those who actually recidivated. In view of this evidence, the Static-99R might best be used as the first part of a multi-step approach to risk assessment, in which offenders with a score < 4 on the Static-99R are classified as low-risk individuals, while offenders with a score ≥ 4 are subjected to a more thorough assessment to better evaluate their recidivism risk. Third, regarding calibration, the results suggest that the Static-99R and its current norms may be useful for risk communication in the Swiss population. However, for offenders in the well above average risk category of the tool (score ≥ 6), the recidivism rates in the current sample tended to be higher than the rates expected in routine samples. Therefore, the norms of the Static-99R may not quantify the recidivism risk of high-risk offenders as accurately.
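The accuracy figures for the cut score of four can be recomputed from the per-score counts of offenders and observed recidivists reported in the Appendix. The following sketch is illustrative only; the variable and function names are hypothetical, it is not the authors' analysis code, and it treats a score ≥ 4 as a positive screen.

```python
# Static-99R counts from the Appendix: score -> (N, observed recidivists)
STATIC99R_COUNTS = {
    -3: (1, 0), -2: (2, 0), -1: (1, 0), 0: (8, 0), 1: (19, 1),
    2: (25, 0), 3: (22, 0), 4: (30, 4), 5: (15, 3), 6: (10, 2),
    7: (7, 3), 8: (1, 1), 9: (1, 0),
}


def classification_metrics(counts, cut):
    """Return sensitivity, specificity, PPV, and NPV (in %) for score >= cut."""
    tp = sum(r for s, (n, r) in counts.items() if s >= cut)      # flagged, recidivated
    fp = sum(n - r for s, (n, r) in counts.items() if s >= cut)  # flagged, did not
    fn = sum(r for s, (n, r) in counts.items() if s < cut)       # missed recidivists
    tn = sum(n - r for s, (n, r) in counts.items() if s < cut)   # correctly ruled out
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
    }


metrics = classification_metrics(STATIC99R_COUNTS, cut=4)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

With these counts, 13 of the 14 recidivists scored ≥ 4, but so did 51 of the 128 nonrecidivists, which is why the negative predictive value is high while the positive predictive value stays near one in five.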
Conclusion
The present study compared the predictive validity of two of the most commonly applied risk assessment tools for sex offenders, the Static-99 and the Static-99R, in Switzerland. Both instruments were associated with severe sexual recidivism in a 5-year post-release period, but the Static-99R had better discrimination and calibration than the Static-99. However, considering the low positive predictive value and the higher recidivism rates among well above average risk offenders in the current sample compared with the norms of the tool, the Static-99R would be better used to rule out low-risk individuals and to refer the remaining offenders for a more thorough risk assessment. Although the sample is likely representative of the male sex offender population in the Canton of Zurich, additional research is necessary to support the generalizability of the present findings to the Swiss population.
Appendix
Sexual Recidivism Rates and Number of Recidivists Across Static-99 and Static-99R Scores.
| Score/total | N | O (%) | O (n) | P (%) | P (n) | E (%) | E (n) |
|---|---|---|---|---|---|---|---|
| Static-99ᵃ | | | | | | | |
| 0 | 3 | 0.0 | 0 | 0.7 | 0.02 | 2.3 | 0.07 |
| 1 | 16 | 0.0 | 0 | 2.2 | 0.35 | 3.2 | 0.51 |
| 2 | 24 | 4.2 | 1 | 3.3 | 0.79 | 4.3 | 1.03 |
| 3 | 34 | 2.9 | 1 | 6.3 | 2.14 | 5.7 | 1.94 |
| 4 | 24 | 12.5 | 3 | 8.4 | 2.02 | 7.7 | 1.85 |
| 5 | 21 | 19.0 | 4 | 15.0 | 3.15 | 10.2 | 2.14 |
| 6 | 13 | 23.1 | 3 | 23.9 | 3.11 | 13.4 | 1.74 |
| 7 | 5 | 40.0 | 2 | 28.5 | 1.43 | 17.4 | 0.87 |
| 8 | 2 | 0.0 | 0 | 48.9 | 0.98 | 22.3 | 0.45 |
| Total | 142 | | 14 | | 13.99 | | 10.60 |
| Static-99Rᵇ | | | | | | | |
| –3 | 1 | 0.0 | 0 | 0.2 | 0.00 | 0.9 | 0.01 |
| –2 | 2 | 0.0 | 0 | 0.4 | 0.01 | 1.3 | 0.03 |
| –1 | 1 | 0.0 | 0 | 0.7 | 0.01 | 1.9 | 0.02 |
| 0 | 8 | 0.0 | 0 | 1.4 | 0.11 | 2.8 | 0.22 |
| 1 | 19 | 5.3 | 1 | 2.1 | 0.40 | 3.9 | 0.74 |
| 2 | 25 | 0.0 | 0 | 3.6 | 0.90 | 5.6 | 1.40 |
| 3 | 22 | 0.0 | 0 | 5.7 | 1.25 | 7.9 | 1.74 |
| 4 | 30 | 13.3 | 4 | 9.9 | 2.97 | 11.0 | 3.30 |
| 5 | 15 | 20.0 | 3 | 16.4 | 2.46 | 15.2 | 2.28 |
| 6 | 10 | 20.0 | 2 | 22.6 | 2.26 | 20.5 | 2.05 |
| 7 | 7 | 42.9 | 3 | 35.6 | 2.49 | 27.2 | 1.90 |
| 8 | 1 | 100 | 1 | 48.8 | 0.49 | 35.1 | 0.35 |
| 9 | 1 | 0.0 | 0 | 62.1 | 0.62 | 43.8 | 0.44 |
| Total | 142 | | 14 | | 13.97 | | 14.48 |
Note. O = observed recidivism; P = predicted recidivism; E = expected recidivism.
ᵃ Expected recidivism norms from Helmus, Hanson, and Thornton (2009).
ᵇ Expected recidivism norms from Phenix, Helmus, and Hanson (2016).
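The P/E index is the ratio of the predicted to the expected number of recidivists. As an illustration, the sketch below recomputes it from the Static-99R rows of the table, both overall and for the well above average risk category (score ≥ 6). It is not the authors' analysis code, the names are hypothetical, and it does not attempt the significance test discussed in the Limitations section.

```python
# Static-99R rows from the Appendix: score -> (predicted n, expected n)
STATIC99R_PE = {
    -3: (0.00, 0.01), -2: (0.01, 0.03), -1: (0.01, 0.02), 0: (0.11, 0.22),
    1: (0.40, 0.74), 2: (0.90, 1.40), 3: (1.25, 1.74), 4: (2.97, 3.30),
    5: (2.46, 2.28), 6: (2.26, 2.05), 7: (2.49, 1.90), 8: (0.49, 0.35),
    9: (0.62, 0.44),
}


def pe_index(rows, min_score=None):
    """Ratio of predicted to expected recidivists, optionally for score >= min_score."""
    selected = [
        (p, e) for score, (p, e) in rows.items()
        if min_score is None or score >= min_score
    ]
    predicted = sum(p for p, _ in selected)
    expected = sum(e for _, e in selected)
    return predicted / expected


overall_pe = pe_index(STATIC99R_PE)
well_above_average_pe = pe_index(STATIC99R_PE, min_score=6)
print(f"Overall P/E: {overall_pe:.2f}")
print(f"P/E for score >= 6: {well_above_average_pe:.2f}")
```

The overall ratio matches the P/E of 0.96 reported for the Static-99R, and the ratio for scores ≥ 6 (5.86 predicted against 4.74 expected) corresponds to the 24% excess noted for the well above average risk category.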
Acknowledgements
The authors thank Cornel Gmür and Jay P. Singh for their collaboration on a prior version of this study, Madeleine Kirschstein for her comments on the manuscript and assistance in revising the text, and BioScience Writers, LLC, for editing the final version of the manuscript.
Authors’ Note
The present manuscript has not been published elsewhere and is not currently under consideration by any other journal. The authors received no funding to conduct the research and have no conflicts of interest, financial or otherwise. The authors take responsibility for the integrity of the data and the accuracy of the data analyses and have made every effort to avoid inflating statistically significant results.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
