Abstract
To examine the predictive accuracy of four well-established risk assessment instruments (PCL-R, HCR-20, SVR-20, and Static-99) in an important subgroup of sexual offenders, these instruments were assessed retrospectively based on information from forensic psychiatric court reports in a sample of 90 released male sexual homicide offenders (out of an original sample of 166) in Germany. Follow-up information about criminal reconvictions after release was obtained from the federal criminal records. Total scores as well as subscales and single items of these risk assessment instruments did not predict sexual recidivism, and only some of them had moderate predictive power regarding nonsexual violent recidivism. Possible explanations for these unexpected results are the retrospective study design with missing information about influences during the long duration of detention and time after release, the small sample size, and the possibility that the risk assessment instruments investigated are valid for general sex offender samples but not for the particular subgroup of offenders with sexually motivated homicides.
In most jurisdictions risk assessment procedures are an integral part of the criminal justice system. For sexual offenders in particular the assessment of recidivism risk has made important advances during the past decades (e.g., Andrews & Bonta, 2006; Hanson, 2009; Harris & Hanson, 2010). In modern forensic psychology and psychiatry there are basically three different methodological approaches to risk assessment (e.g., Boer & Hart, 2009; Craig, Browne, & Beech, 2008; Hanson, 2009; Hart & Boer, 2009): unstructured clinical judgement (UCJ), actuarial risk assessment instruments (ARAIs), and structured professional judgment (SPJ). Intuitively made UCJs—even if performed by experienced clinicians—should not be taken into account in professional risk assessment settings any longer, since they cannot be regarded as a scientific procedure and should therefore not even be named “professional” (Hanson, 2009). ARAIs—at the other end of the spectrum—represent highly structured risk assessment scales that use combinations of empirically determined and thoroughly operationalized predictor variables (e.g., Craig et al., 2008; Hanson & Morton-Bourgon, 2009; Quinsey, Harris, Rice, & Cormier, 2006). Recent meta-analyses and reviews have shown that both standardized risk assessment approaches—ARAIs and SPJ—yielded moderate to good predictive accuracy with a slight superiority of ARAIs over the clinical approaches (e.g., Hanson & Morton-Bourgon, 2009; Quinsey et al., 2006).
However, in spite of many research studies into the reliability, validity, and predictive accuracy of risk assessment tools, unresolved problems and inconsistent research results still remain (e.g., Craig, Browne, Stringer, & Beech, 2004). One important and commonly discussed research question is the predictive accuracy of instruments for different sex offender subtypes. In their developmental study of the Static-99, Hanson and Thornton (2000) concluded that the predictive accuracy of the instrument proved satisfactory for both rapists and child molesters. Although they found no differences in these two subgroups, more recent studies on the differential validity of risk assessment tools across different sexual offender subgroups indicate that the predictive accuracy of particular instruments varies depending on offender type (e.g., Bartosh, Garby, Lewis, & Gray, 2003; Rettenberger, Matthes, Boer, & Eher, 2010). Furthermore, there are subgroups of sex offenders—for example, hands-off offenders, sex offenders with adult male victims, or sexual homicide perpetrators—for which there is little or no empirical evidence for the predictive accuracy of both ARAIs and SPJ tools (Barbaree, Langton, & Peacock, 2006).
The aim of this study was therefore an elaborate test of the predictive accuracy of criminal risk assessment procedures for a sample of male perpetrators of sexually motivated homicides. These procedures were developed in Canada and the United States but are also in use in Germany and have in the meantime been cross-validated and standardized for use with general samples of sex offenders. We expected that the risk assessment instruments used in this study would also be suitable for predicting various types of recidivism (including homicidal, sexual, violent and other offences) in sexual homicide perpetrators.
Earlier results of our Hamburg research group indicated that the rates of recidivism of perpetrators of sexual homicides were comparable with those of perpetrators of sexual crimes in general (e.g., Berner, Briken, Habermann, & Hill, 2008; Hill, Habermann, Klusmann, Berner, & Briken, 2008). Of 90 German men who had committed a sexual homicide, only 3.3% (n = 3) reoffended with a new attempted or completed homicide within approximately 12 years at risk after release from imprisonment or after the end of involuntary treatment. Two of these offences were regarded as sexual (Berner et al., 2008; Hill, Habermann et al., 2008). The recidivism rates estimated by the Kaplan-Meier procedure for a time at risk of 20 years were 23.1% (95% confidence interval [CI]: [10.2%, 36.0%]) for sexual offences (including sexual homicide), 18.3% (95% CI: [9.8%, 26.8%]) for nonsexual violent offences, 35.7% (95% CI: [22.6%, 48.8%]) for the combined group of any violent offence, and 58.4% (95% CI: [45.9%, 70.9%]) for nonviolent offences. These figures were largely comparable with the recidivism rates of general sex offender samples that were determined in extensive meta-analyses (e.g., Hanson & Morton-Bourgon, 2004).
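The Kaplan-Meier procedure mentioned above can be illustrated with a minimal sketch. It is not the software used by the research group; the function name and the follow-up data are hypothetical, and confidence intervals are not computed. Each offender contributes a time at risk and a flag stating whether that time ended in a reconviction (event) or in the end of follow-up (censored).

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at every time with an event.

    durations: years at risk per person (hypothetical illustration data).
    events:    True if the time ended in a reconviction, False if censored.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        reconvicted = 0
        leaving = 0
        # Collect everyone whose time at risk ends exactly at t.
        while i < len(pairs) and pairs[i][0] == t:
            reconvicted += 1 if pairs[i][1] else 0
            leaving += 1
            i += 1
        if reconvicted:
            # Product-limit step: multiply by the share that "survives" t.
            survival *= 1.0 - reconvicted / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical example: five persons, two reconvictions, three censored.
curve = kaplan_meier([2, 5, 5, 8, 12], [True, False, True, False, False])
estimated_recidivism = 1.0 - curve[-1][1]  # cumulative rate at last event
```

The estimated recidivism rate at a given time is one minus the survival probability, which is how censored cases (offenders with shorter follow-up) still contribute to a 20-year estimate.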
On the one hand, the empirical investigations into the predictive accuracy of standardized risk assessment instruments for different sexual offender subgroups mentioned above further support the hypothesis that these instruments are also able to predict recidivism in sexual homicide perpetrators because, despite the substantial variability between different sex offender subgroups, most instruments show at least a small amount of predictability for every subgroup (Bartosh et al., 2003; Hanson & Thornton, 2000; Looman & Abracen, 2010; Rettenberger et al., 2010). On the other hand, previous studies by our research group in Hamburg indicate that in the case of this particular group of perpetrators the customary risk factors, which are highly valid for sex offenders as a whole, possess weaker predictive power. In these earlier data analyses of the risk assessment instruments, however, only mean values were compared and commonly used effect sizes were not reported (Hill, Habermann et al., 2008). In addition, the instruments were only investigated with regard to their aggregated scores or the total risk estimation, and not with regard to their individual subscales or items (Berner et al., 2008; Hill, Habermann et al., 2008). We found no significant differences in the survival curves for sexual and general violent recidivism between recidivists and nonrecidivists in terms of their mean scores on different risk assessment tools. However, differences between both groups were found with regard to nonsexual violent reoffences. Another relevant finding was that most of the investigated risk factors were associated with a reduced chance of being released, indicating that decisions about release had been influenced either consciously or intuitively by the presence or absence of well-established risk factors (Hill, Habermann et al., 2008).
The present study is part of a large-scale research project concerning the clinical, criminological, and legal aspects of sexual murderers using a comprehensive and comparatively large sample of offenders convicted of sexually motivated murder in Germany (e.g., Berner et al., 2008; Briken, Habermann, Kafka, Berner, & Hill, 2006; Briken, Nika, & Berner, 1999; Hill, Habermann, Berner, & Briken, 2006; Hill, Habermann et al., 2008; Hill, Ujeyl et al., 2008; Ujeyl et al., 2008). The main aim of the present study was to examine the predictive accuracy of four well-established risk assessment instruments: the Static-99 (Hanson & Thornton, 2000), the Historical-Clinical-Risk Management-20 (HCR-20; Webster, Douglas, Eaves, & Hart, 1997), the Sexual Violence Risk-20 (SVR-20; Boer, Hart, Kropp, & Webster, 1997), and the Psychopathy Checklist-Revised (PCL-R; Hare, 2003). To achieve comparability with the existing body of risk assessment research, we used commonly reported effect sizes to investigate the predictive accuracy of the instruments. Furthermore, we examined different recidivism criteria in order to test for differential effects, and we also investigated the predictive accuracy of the different subscales and the individual items of the instruments.
Method
Database and Sample Characteristics
Psychiatric court reports on 166 men who had committed a sexual homicide between 1945 and 1991 were evaluated retrospectively by three raters (A.H., N.H., P.B.) for sociodemographic, criminal, and clinical factors, using standardized diagnostic and risk assessment instruments. The reports had been written by 20 forensic psychiatrists from four major German forensic psychiatric centers. Detailed information on the recruitment of the reports has been described elsewhere (Briken, Habermann, Berner, & Hill, 2005; Hill et al., 2006, 2007). Adopting Ressler, Burgess, and Douglas’s (1988) definition of sexual homicide, we included offences in which at least one of the following criteria was fulfilled:
Attempted or completed sexual intercourse (oral, anal, or vaginal);
Exposure of the primary or secondary sexual parts of the victim’s body;
Victim’s being left naked or seminaked;
Sexual positioning of the victim’s body;
Insertion of foreign objects into the victim’s body cavities;
Semen on or near the victim’s body;
Substitute sexual activity (e.g., masturbation, exhibitionistic or voyeuristic behavior);
Sexual interest admitted to by the offender;
Sadistic fantasies admitted to by the offender.
The mean number of sexual homicide criteria per offender was M = 3.7 (SD = 1.5).
The forensic reports (mean length M = 58 pages, SD = 35) were requested mainly in order to assess criminal responsibility (67.5%) or for risk assessment prior to release or changes in security levels of imprisonment (32.5%). An initial systematic evaluation of the reports as well as further available documentation (verdicts, supplementary psychological and neurological reports, etc.) was performed using an operationalized questionnaire that also included the risk assessment instruments described below. In a second step, information about new criminal convictions was obtained from excerpts of the Federal Criminal Records (FCR) between 2002 and 2003 for all cases to investigate recidivism. Four types of recidivism were differentiated in the evaluation of the excerpts: (a) homicide, (b) sexual offence, (c) (nonsexual) violent offence, and (d) other (nonsexual and nonviolent) offence. The research assistants involved in the evaluation of the FCR excerpts were “blind” to the assessments of the report evaluators; conversely, the evaluators had no access to the FCR excerpts when they rated the risk assessment instruments (and thus no knowledge of any recidivism).
The follow-up sample consisted of n = 139 (83.7%) cases for whom FCR excerpts held any entries. For n = 27 (16.3%) cases of the total sample the FCR excerpts exhibited no information about their criminal history and, therefore, these cases were excluded from further analysis. The lack of data can be due to different reasons (Hill, Habermann et al., 2008): A federal criminal record file is deleted one year after the person has died or reached the age of 90 years or after defined periods, if the offender was sentenced to a limited detention in prison (e.g., after 20 years, for any sentence involving hands-on sexual offences) and was not reconvicted for a new offence.
Forty-nine of the 139 offenders (35%) had not yet been released from custody (imprisonment and/or closed forensic psychiatric settings) as a result of their latest conviction due to one or more sexual homicides. The 90 offenders who had been released (65%) had spent on average M = 12.2 years (SD = 7.6) in custody, with recurrent periods spent in custody excluded from the calculation. Those who had not been released had been in custody for an average of M = 20.6 years (SD = 7.9) at the point in time of the FCR excerpts. The group of released offenders differed significantly from the group of offenders who had not yet been released with regard to several criminal risk factors (Hill, Habermann, et al., 2008): The group of nonreleased offenders had more often committed sexually motivated crimes before the homicide, had more often committed multiple sexual homicides, and showed a higher prevalence of sexual sadism, of any other paraphilia, and of an antisocial personality disorder. The nonreleased offenders showed significantly higher scores in the PCL-R, HCR-20, SVR-20, and the Static-99, which indicates that risk-relevant and/or risk-associated variables were taken into account in the courts’ decisions to release.
The predictive validity of the risk assessment instruments was analyzed on the basis of the data from the 90 released sexual homicide offenders. The average age at the time of release was M = 38.78 years (SD = 11.37, range 17-70). While 75.6% (n = 68) of the released offenders had served a prison sentence, 18.9% (n = 17) had been detained in forensic psychiatric hospitals. Forty-three percent (43.3%, n = 39) of the released offenders had committed at least one sexual offense prior to the sexual homicide, and a further 25.6% (n = 23) at least one nonsexual violent crime. The recidivism rates were 15.6% (n = 14; median follow-up period, calculated from the date of release until the date of reconviction or the end of the follow-up period, respectively = 10.22 years) for sexual, 28.9% (n = 26; median follow-up = 7.78 years) for violent (including sexual), and 47.8% (n = 43; median follow-up = 5.64 years) for general recidivism.
Risk Assessment Instruments
For the present study we used the Static-99 (Hanson & Thornton, 2000), the Historical-Clinical-Risk Management-20 (HCR-20; Webster et al., 1997), the Sexual Violence Risk-20 (SVR-20; Boer et al., 1997), and the Psychopathy Checklist-Revised (PCL-R; Hare, 2003).
The Static-99 is a brief actuarial instrument for the assessment of risk of sexual and violent recidivism in adult sexual offenders (Harris, Phenix, Hanson, & Thornton, 2003). It is the most commonly used and best validated actuarial instrument for sexual offenders (Anderson & Hanson, 2009). It was developed in 1999 by R. Karl Hanson (Canada) and David Thornton (Great Britain) and is used regularly for risk assessment in North America and some countries in Europe. The Static-99 consists of 10 static risk factors: age when exposed to risk, any live-in intimate relationship for 2 or more years, any index offence of nonsexual violence, prior offences of nonsexual violence, prior charges or convictions for sexual offences, prior sentencing dates, any convictions for noncontact sexual offences, any victims not related to the perpetrator, any victims unknown to the perpetrator, and any male victims. The individual risk factors of a sexual offender add up to a maximum total score of 12 that is subsequently translated into four risk categories: low (0-1), medium low (2-3), medium high (4-5), and high (6 or more). Hanson and Thornton (2000) tested the predictive accuracy of the Static-99 using four different mixed data sets of 1,301 sexual offenders (including child molesters and rapists from prisons as well as high-security forensic psychiatric settings) collected in Canada and the United Kingdom. The Static-99 rendered predictive accuracy values of AUC = .71 for sexual and AUC = .69 for any violent (including sexual) recidivism. Further results have been published from studies with Canadian (Barbaree, Seto, Langton, & Peacock, 2001), Dutch (de Vogel, de Ruiter, van Beek, & Mead, 2004), Belgian (Ducro & Pham, 2006), German (Stadtland et al., 2005), Austrian (Rettenberger & Eher, 2006), and Swedish (Sjöstedt & Långström, 2001) samples of sex offenders, among others. The predictive accuracy values for sexual recidivism ranged from moderate (AUC = .66; Ducro & Pham, 2006) to good (AUC = .76; Sjöstedt & Långström, 2001).
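The translation of a Static-99 total score into the four risk categories described above can be written as a short lookup. This is a minimal sketch for illustration only; the function name is hypothetical, and it covers only the category cutoffs quoted in the text, not the scoring of the 10 individual items.

```python
def static99_risk_category(total_score: int) -> str:
    """Map a Static-99 total score (0-12) to the category named in the text."""
    if not 0 <= total_score <= 12:
        raise ValueError("Static-99 total scores range from 0 to 12")
    if total_score <= 1:
        return "low"
    if total_score <= 3:
        return "medium low"
    if total_score <= 5:
        return "medium high"
    return "high"  # 6 or more


# Example: a total score of 5 falls into the "medium high" band (4-5).
category = static99_risk_category(5)
```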
The SVR-20 is a structured clinical guideline designed for the assessment of risk of sexual violence in adult sex offenders. The instrument was developed out of a thorough consideration of the empirical literature and the clinical expertise of a number of clinicians. The SVR-20 consists of 20 items, divided into three domains: psychosocial adjustment (11 items: sexual deviance, victim of child abuse, psychopathy, major mental illness, substance use problems, suicidal or homicidal ideation, relationship problems, employment problems, past nonsexual violent offences, past nonviolent offences, and past supervision failure), sexual offences (7 items: high density sex offences, multiple sex offence types, physical harm to victim(s) in sex offences, use of weapons or threats of death in sex offences, escalation in frequency or severity of sex offences, extreme minimization or denial of sex offences, and attitudes that support or condone sex offences), and future plans (2 items: lacks realistic plans and negative attitude towards intervention), which have to be coded by an experienced forensic clinician. Although originally designed as a structured clinical guideline, it is not uncommon for research purposes to add the items up. In this case, the instrument becomes a conceptual actuarial measure (Hanson & Morton-Bourgon, 2007). Current studies provide first indications of the predictive validity of the SVR-20 and its cross-cultural transferability (e.g., de Vogel et al., 2004; Rettenberger, Boer, & Eher, 2011; Stadtland et al., 2005).
The HCR-20 is, like the SVR-20, a Structured Professional Judgment (SPJ) instrument and is used to establish the risk of violent behavior (Webster et al., 1997). The original version of the instrument consisted of 20 items, divided into three domains: items regarding the candidate’s history (10 items: previous violence, young age at first violent incident, relationship instability, employment problems, substance use problems, major mental illness, psychopathy, early maladjustment, personality disorder, and prior supervision failure), clinical items (five items: lack of insight, negative attitudes, active symptoms of major mental illness, impulsivity, and unresponsive to treatment), and risk management items (5 items: plans lack feasibility, exposure to destabilizers, lack of personal support, noncompliance with remediation attempts, and stress). The HCR-20 is internationally the most frequently used and best researched SPJ instrument (for a current overview see, for instance, Douglas & Reeves, 2010). Recent meta-analytic findings support its reliability (Nikolova et al., 2006) and (predictive) validity (Guy, 2008) in a variety of settings and for a variety of offender populations.
Although the PCL-R was originally developed as a psychometric instrument for measuring the construct of psychopathy, research shows that the PCL-R also performs reasonably in predicting general, sexual, and violent recidivism in both prison and forensic psychiatric populations (e.g., Hare, 2003; Quinsey et al., 2006; Rice & Harris, 1995). Because the PCL-R is often used as an assessment tool for dangerousness and recidivism risk (Hare, Clark, Grann, & Thornton, 2000), it was also included in the present study. The PCL-R is based on semistructured interviews and a review of file information. It measures glibness or superficial charm, grandiose sense of self-worth, need for stimulation and proneness to boredom, pathological lying, conning and manipulativeness, lack of remorse or guilt, shallow affect, callousness and lack of empathy, parasitic lifestyle, poor behavioral control, promiscuous sexual behavior, early behavior problems, lack of realistic and long-term goals, impulsivity, irresponsibility, failure to accept responsibility for one’s own actions, many short-term marital relationships, juvenile delinquency, revocation of conditional release, and criminal versatility. Scale scores are obtained by adding together the ratings, with a total possible score of 40. For the purpose of the present study the currently discussed 2-factor, 4-facet model (Hare, 2007) was used, which differentiates psychopathy into two factors, each consisting of two facets. The first factor reflects affective (Facet 1) and interpersonal (Facet 2) components of the disorder and the second reflects an antisocial lifestyle (Facet 3) and general criminality (Facet 4; e.g., Weaver, Meyer, van Nort, & Tristan, 2006).
Over the past few years German versions of all these instruments have become available. In the meantime, a number of cross-validation studies exist with general sexual and nonsexual violent offender samples which have also shown predominantly good reliability and validity indices (e.g., Dahle, 2006, 2007; Dahle, Schneider, & Ziethen, 2007; Rettenberger et al., 2010; Stadtland et al., 2005).
These four risk assessment instruments were coded retrospectively using exclusively file information from the psychiatric court reports for each of the offenders in the follow-up sample (n = 139), that is, for both the 90 who had been released and the 49 who had not yet been released. The original psychiatric court reports contained no scores on the risk assessment measures which were part of the present study. Thus, the scores of the risk assessment instruments used in the present study do not include any diagnostic information derived from clinical interviews. The interrater reliability for the total scores of the risk assessment instruments was satisfactory, with values between 0.77 and 0.87 (ICC; single measure). In addition to the total scores of these instruments, relevant subscales were examined with regard to their predictive ability, since substantial differences in risk prediction have been reported for subscales of the instruments (e.g., Gray, Taylor, & Snowden, 2008; Hare, 2003; Looman & Abracen, 2010; Rettenberger et al., 2010): While the HCR-20 and SVR-20 can each be divided into three subscales, the current 2-factor, 4-facet model formed the basis for the testing of the predictive accuracy of the PCL-R (Hare, 2003).
Data Analysis
The concurrent validity of the instruments was analyzed using Pearson correlations of the raw scores of each instrument. The predictive accuracy of the risk assessment instruments was determined by calculating the AUCs of the Receiver Operating Characteristics (ROCs; Hanley & McNeil, 1982; Rice & Harris, 1995, 2005). The ROC curve is produced by plotting the hit and false-alarm rates across the possible cutoff values. This statistical procedure is commonly used for examining the predictive accuracy of binary decisions such as “release” or “do not release” (Mossman, 1994). The AUC values lie between 0 and 1, with .5 indicating prediction at the chance level and 1 indicating perfect prediction. In the context of recidivism prediction, the AUC is commonly interpreted as the probability that a randomly selected recidivist will have a higher score on a risk variable than will a randomly selected nonrecidivist (Seto, 2005). Because of its low sensitivity to base rates of recidivism and to users’ biases for or against Type I or Type II prediction error and because it is easy to interpret, the AUC is a standard measure of diagnostic and predictive accuracy in clinical and forensic research (Mossman, 1994). Furthermore, none of the measures commonly used in the past provide an efficient way to evaluate the usefulness of predictive tests (Rice & Harris, 1995). Referring to Cohen (1992), Rice and Harris (2005) formulated the following criteria for classification of the predictive accuracy of risk assessment tools: AUC values of .71 or above are classified as good and values between .64 and .70 are classified as moderate. Significant AUC values that are below .64 are classified as small. Douglas, Webster, Hart, Eaves, and Ogloff (2001) report the following interpretation of critical AUC values: AUC values of .70 and above are classified as moderate and values above .75 are classified as good.
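The probabilistic reading of the AUC and the Rice and Harris (2005) labels described above can be illustrated with a minimal sketch. It is not the software used in the study; the function names and the scores are hypothetical. Rounding to two decimals before labeling is an assumption made here so that values between .70 and .71 receive a label, and the labels are meant for AUC values that are statistically significant.

```python
from typing import Sequence


def auc_from_scores(recidivists: Sequence[float],
                    nonrecidivists: Sequence[float]) -> float:
    """Probability that a random recidivist outscores a random nonrecidivist.

    Ties count as half a "win", which makes the result equal to the area
    under the empirical ROC curve.
    """
    if not recidivists or not nonrecidivists:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for r in recidivists:
        for n in nonrecidivists:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(recidivists) * len(nonrecidivists))


def rice_harris_label(auc: float) -> str:
    """Label an AUC using the cutoffs quoted in the text.

    Assumption: the AUC is rounded to two decimals before comparison.
    """
    rounded = round(auc, 2)
    if rounded >= 0.71:
        return "good"
    if rounded >= 0.64:
        return "moderate"
    return "small"


# Hypothetical total scores for three recidivists and four nonrecidivists.
auc = auc_from_scores([6, 7, 5], [4, 5, 6, 3])  # 10 of 12 pairs -> 0.833...
label = rice_harris_label(auc)
```

An AUC of .5 arises when recidivists and nonrecidivists cannot be told apart by their scores, which is the chance level referred to in the text.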
Results
Data for scoring the risk assessment instruments were available for all 90 offenders of the released subgroup of the follow-up sample as well as for the 49 offenders who had not yet been released from custody (imprisonment and/or closed forensic psychiatric settings). For both subsamples together (n = 139) the Static-99 mean raw total score was M = 5.54 (SD = 1.96, range 0-11), for the SVR-20 M = 24.02 (SD = 6.52, range 7-37), for the HCR-20 M = 18.27 (SD = 4.45, range 7-37), and for the PCL-R M = 17.21 (SD = 8.35, range 0-37). For the 90 offenders of the released subgroup the Static-99 mean raw total score was M = 5.20 (SD = 1.86, range 0-10), for the SVR-20 M = 22.40 (SD = 6.09, range 7-34), for the HCR-20 M = 15.70 (SD = 4.45, range 6-26), and for the PCL-R M = 15.72 (SD = 8.29, range 0-33). For the 49 offenders who had not yet been released the Static-99 mean raw total score was M = 6.16 (SD = 1.99, range 2-11), for the SVR-20 M = 27.00 (SD = 6.27, range 13-37), for the HCR-20 M = 19.80 (SD = 5.89, range 9-37), and for the PCL-R M = 19.94 (SD = 7.83, range 4-37). The differences between both subgroups in the mean raw total scores indicate that elevated scores in the risk assessment instruments were associated with a lower probability of being released (Hill, Habermann et al., 2008).
The product-moment-correlations of the four risk assessment instruments are presented in Table 1. All four procedures showed in part highly significant positive intercorrelations. Previous studies have also investigated the intercorrelations between age (at the time of release from custody), actuarial item responses, and recidivism (e.g., Barbaree, Langton, & Blanchard, 2007; Barbaree, Langton, Blanchard, & Boer, 2008; Helmus, Thornton, Hanson, & Babchishin, 2011) and have reported considerable negative correlations between age-at-release and actuarial scores indicating an age-related influence on risk assessment scoring—at least for actuarial instruments. In the present study, the correlations between age-at-release (n = 90) and the sum scores of the instruments were for the Static-99 r = –.01 (p = .96), for the PCL-R r = .22 (p < .05), for the SVR-20 r = .26 (p < .05), and for the HCR-20 r = .25 (p < .05).
Table 1. Correlations Between the Raw Scores of the Risk Assessment Instruments (N = 90)
Note: PCL-R = Psychopathy Checklist–Revised; HCR-20 = Historical-Clinical-Risk Management-20; SVR-20 = Sexual Violence Risk-20.
* p < .05. ** p < .01. *** p < .001.
The predictive validity indices of the four risk assessment instruments (total scores as well as subscales) are shown in Table 2. For the three categories of recidivism (a) with a sexual offence, (b) with a sexual or (nonsexual) violent offence, and (c) with another (nonviolent) offence, none of the risk assessment instruments or their subscales showed a significant predictive result. Only for criterion (d), recidivism with a (nonsexual) violent offence, did the aggregated (total) scores of the HCR-20 and the SVR-20 show moderate validity indices. Among the subscales, the HCR-20 clinical subscale also attained a moderate level of predictive accuracy for violent (nonsexual) recidivism.
Table 2. The Predictive Accuracy of the Risk Assessment Instruments and Their Subscales Using AUC Values
Note: PCL-R = Psychopathy Checklist–Revised; HCR-20 = Historical-Clinical-Risk Management-20; SVR-20 = Sexual Violence Risk-20.
Values in brackets are p values and 95% confidence intervals.
An analysis of the bivariate correlations between the individual items in the four instruments and recidivism categories rendered the following significant results (presentation of the data in full was decided against here for reasons of clarity):
Item 2 of the PCL-R (grandiose sense of self-worth) correlated positively with (nonsexual) violent recidivism (r = .21, p < .05).
Item 12 of the HCR-20 (negative attitudes) correlated positively with (nonsexual) violent recidivism (r = .28, p < .01).
Item 6 of the HCR-20 (major mental illness) as well as the identical item 3 of the SVR-20 correlated negatively with general recidivism (in each case r = –.23, p < .05).
Item 15 of the SVR-20 (use of weapon or threats of death towards the victim) correlated negatively both with general (r = –.21, p < .05) and with (nonsexual) violent recidivism (r = –.30, p < .01).
Item 18 of the SVR-20 (attitudes that support or condone sex offences) correlated positively with (nonsexual) violent recidivism (r = .22, p < .05).
Significant correlations arose for the Static-99 between item 4 (prior offences of nonsexual violence) and (nonsexual) violent recidivism (r = .22, p < .05), between item 2 (relationship status—any live-in intimate relationship for 2 or more years) and sexual as well as sexual and/or violent recidivism (in each case r = .23, p < .05), as well as between item 3 (conviction for any index offence of nonsexual violence) and (nonsexual) violent recidivism (r = –.24, p < .05).
The remaining items from the four instruments did not show any significant correlation with the recidivism criteria.
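The item-level analysis above correlates an item rating with a binary recidivism outcome. A minimal sketch of such a bivariate (Pearson) correlation follows; the function name and the ratings are hypothetical and do not reproduce the study's data, and significance testing is omitted.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with n >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical item ratings and outcomes (1 = reconvicted, 0 = not).
item_ratings = [0, 1, 2, 2]
reconvicted = [0, 0, 1, 1]
r = pearson_r(item_ratings, reconvicted)
```

A positive r means that higher item ratings go together with reconviction, and a negative r, as reported for some items above, means the opposite.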
Discussion
Particularly in the case of the release of perpetrators who have committed serious sexual offences, including sexual homicide, risk assessments have a high relevance and influence in many jurisdictions. In contrast to the great number of research publications that exist on the validity of forensic diagnostic and risk assessment procedures for sex and violent offenders in general, the accuracy of such instruments when applied to perpetrators of sexual homicides has yet to be sufficiently empirically tested (Hill, Habermann et al., 2008). The present study is the first detailed empirical investigation into the predictive validity of four internationally used risk assessment instruments, using a sample of German men who have committed sexual homicides.
Both interrater reliability indices and convergent validity yielded, as with other subgroups of sex offenders (Barbaree et al., 2001; Rettenberger et al., 2010), significant positive correlations. These results show once again that a comparatively reliable assessment of criminal risk factors can be conducted using standardized risk assessment tools, as long as they are used by appropriately trained forensic psychologists and psychiatrists. However, one has to bear in mind that even satisfactory—with reference to commonly used conventional interpretation guidelines (e.g., Fleiss, 1981; Hart & Boer, 2009)—interrater reliability indices as reported here could be sufficient to explain the lack of predictive validity. This issue may be especially relevant in empirical investigations like the present study where risk assessment instruments were applied retrospectively to relatively low base rate outcomes like (severe) sexual and violent recidivism. With regard to the convergent validity the comparatively high intercorrelations suggest that all of the procedures evaluate the same construct—in this case the probability of recidivism or reconviction for sex offenders.
The correlations between the sum scores of the instruments and age-at-release were not as high as in previous studies investigating the age-related influence on actuarial risk assessment scales (Barbaree et al., 2007; Barbaree et al., 2008; Helmus et al., 2011). In these previous studies the researchers found considerable negative correlations between age-at-release and actuarial risk assessment instruments and concluded that there exists an age-related influence in actuarial risk assessment which is beyond what is captured by the age items in the scales (Barbaree et al., 2007). In the present study only one actuarially developed instrument was used, the Static-99 (Hanson & Thornton, 2000; Helmus et al., 2011), and for this measure no significant correlation with age-at-release was found. For the other instruments low but significant positive correlations were obtained, suggesting that higher scores on the scales would lead to later release. It can be assumed that this is due to the fact that a relatively high proportion of sexual homicide offenders was sentenced to potentially lifetime imprisonment or to a potentially undetermined detention in a forensic hospital (Hill, Habermann et al., 2008). Thus, it can be hypothesized that sexual homicide offenders with higher scores on risk assessment instruments (i.e., with higher recidivism risk) have been held comparably longer in custody and, therefore, might have been released at an older age than sexual homicide offenders assessed as low or moderate risk.
The instruments PCL-R, HCR-20, SVR-20, and Static-99, which are valid predictors for the total group of sex and violent offenders, do not, however, appear to allow a valid prediction of recidivism for offenders who have committed sexual homicides, at least if the instruments rely on the assessment of written reports alone (most of which date from the point in time of sentencing of the sexual homicide). According to our data, these instruments do not fulfill the minimum standards required for a sufficient level of predictive validity: whereas a calculation of the validity indices for new homicide crimes had to be dropped due to the low rate of recidivism of 3.3%, nonsignificant AUC values were obtained for the prediction of new sexual, violent, sexual/violent, and other offences independent of which instrument or subscale was used. Only for the recidivism criterion “(nonsexual) violent recidivism” could a moderate level of predictive accuracy be calculated for the total scores of the HCR-20 and the SVR-20. Only for this recidivism criterion did our study confirm the predictive power of applied risk assessment tools that had been found in previous studies (e.g., Barbaree et al., 2001; de Vogel et al., 2004; Sjöstedt & Långström, 2001). Apart from the low recidivism rate for new homicide crimes, the recidivism rates for general, sexual, and nonsexual violent reconvictions in the present study were similar to those found in sexual offenders in general (e.g., Hanson & Bussière, 1998; Hill, Habermann et al., 2008) and allow, therefore, meaningful statistical analyses of the predictive utility of the risk assessment instruments for these outcome criteria.
At the individual item level, no particularities could be observed with regard to predictive accuracy either. Thus, only a few significant correlations resulted overall; in fact, for some risk factors negative correlations could be observed between individual items and recidivism criteria that were contrary to expectations (e.g., major mental illness or use of weapon or threats of death towards the victim).
Since abundant empirical data regarding the predictive accuracy of standardized risk assessment tools suggest moderate to good predictability for various subgroups of offenders, the possible reasons for the unexpected results in our sample of sexual homicide perpetrators need to be discussed. As mentioned above, according to previous investigations, the recidivism rates for men who had committed sexual homicides with regard to reoffending with sexual and violent crimes were largely comparable with those of other international sex offender samples (Berner et al., 2008; Hill, Habermann et al., 2008). This shows first of all that the event to be predicted, “relevant recidivism,” also occurs within this group, and as such that both the inclusion criteria necessary for using prediction methods and the outcome criteria to be predicted are present.
A major risk-relevant difference from other sex offender samples that have been investigated could be the relatively long period of confinement as well as the considerably small proportion of those released in our sexual homicide perpetrator sample. Thus, one could speak of a selection effect. As is suggested by the results of a comparison of released and nonreleased perpetrators of sexual homicides in our sample, a risk-relevant systematic sample bias can be assumed in this selection, since the group of the unreleased offenders in our sample showed higher scores in the applied risk assessment instruments (Hill, Habermann et al., 2008). Together with the above-mentioned significant positive correlation between age-at-release and risk assessment scores, the findings of the present study indicate that the presence of risk factors is connected to a reduced probability of release (whereby the data available cannot explain whether the consideration of risk factors on the part of the decision makers is systematic or rather more intuitive). Therefore, it might be possible that the results of the risk assessment instruments were already taken into account in the decisions for or against release and that the variance (i.e., the predictive power) of the risk assessment instruments was too small to discriminate between recidivists and nonrecidivists. However, looking at the measures of dispersion of the total raw scores of the risk assessment instruments in the Results section, there are almost no differences between the standard deviations of both subsamples together (n = 139; i.e., the released and the not released sexual homicide offenders) and the released subsample (n = 90) alone. The standard deviations of the total sample as well as of both subgroups differ only marginally for all four instruments.
This finding supports the assumption that also for the released subsample the distribution of raw scores was basically sufficient to differentiate between high(er) and low(er) risk offenders.
Another explanation for the unexpected results in this study could be that the number of released offenders (i.e., the sample size) was too small to verify the predictive power of the risk assessment instruments. This issue is related to another potential source of error: the precision of ROC-related measures in small samples (Hanczar et al., 2010). Even if the use of the AUC metric is still referred to as the gold standard method to assess diagnostic and predictive accuracy (e.g., Lobo, Jiménez-Valverde, & Real, 2008; Mossman, 1994; Rice & Harris, 1995, 2005), its accuracy has to be interpreted cautiously, especially in relatively small samples with considerably low base rates (e.g., Eher, Rettenberger, Schilling, & Pfäfflin, 2008). However, as Hanczar and colleagues (2010) pointed out, in applied research settings there is so far, first, no simple rule of thumb to determine whether a sample is large enough for the application of the AUC metric and, second, even if one does assume that the sample size may be too small for ROC-related measures, no simple solution available. Furthermore, none of the measures commonly used in the past provide an alternative way to evaluate the usefulness of predictive tests (Rice & Harris, 1995).
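The dependence of AUC precision on sample size and base rate can be illustrated with a short sketch. It is not part of the analyses reported here. It assumes the large-sample standard error approximation of Hanley and McNeil (1982), and the function names, the AUC value of 0.65, and the group sizes in the usage note are hypothetical and chosen for illustration only.

```python
import math


def auc_from_scores(recidivist_scores, nonrecidivist_scores):
    """AUC as the proportion of recidivist/nonrecidivist pairs in which the
    recidivist has the higher score (ties count one half)."""
    wins = 0.0
    for r in recidivist_scores:
        for n in nonrecidivist_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(recidivist_scores) * len(nonrecidivist_scores))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Approximate standard error of an AUC (Hanley & McNeil, 1982).
    n_pos = number of recidivists, n_neg = number of nonrecidivists."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def auc_confidence_interval(auc, n_pos, n_neg, z=1.96):
    """Normal-approximation confidence interval, clipped to [0, 1]."""
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


if __name__ == "__main__":
    # Hypothetical scenarios: same AUC and total n = 90, different base rates.
    for n_pos in (3, 18):
        low, high = auc_confidence_interval(0.65, n_pos, 90 - n_pos)
        print(f"recidivists={n_pos:2d}  95% CI = [{low:.2f}, {high:.2f}]")
```

With only a handful of recidivists among 90 released offenders, the interval around a moderate AUC becomes wide enough to include the chance level of 0.5, which is consistent with the caution expressed above about ROC-related measures at low base rates. The normal approximation itself is questionable with so few events, so the sketch should be read as a rough indication of imprecision and not as an exact interval.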
A further sample bias is also conceivable: although we do not have data regarding the extent of pre- as well as postrelease treatment and care of the released participants (for instance psycho-, sociotherapeutic or pharmacological treatment, as well as supervision), it can be assumed that, considering the seriousness of a sexual homicide, a comparatively high level of interventional effort was put in place. The relatively extensive risk management effort may have invalidated the instruments’ predictive power in this sample. However, it should be pointed out that the risk assessment instruments applied in this study have been shown to predict criminal recidivism in treated sex offender samples (e.g., de Vogel et al., 2004; Rettenberger et al., 2010). The insufficient predictive power of the instruments in this retrospective investigation could also be explained by the fact that in our study the instruments were scored predominantly (just under two thirds) on the basis of reports on criminal liability in the context of the sentencing procedure; that is, between the beginning of the detention (and thus about the time at which these reports were written) and release, an average of 12.2 years had passed.
It is possible that the predictors investigated were valid for general sex offender samples, but not for the particular subgroup of offenders with sexually motivated homicides. Earlier investigations have already indicated that the age at the time when the index offence (here sexual homicide) was committed and age on release were insufficiently or not at all represented in these applied instruments, and these factors showed predictive power for recidivism with sexual and violent offences in this sample: young age at the time of the homicide and at the time of release correlated with higher recidivism rates (Berner et al., 2008; Hill, Habermann et al., 2008). Other characteristics that are more closely related to the course of events during the offence, above all in sexually motivated homicides, and which have already been discussed at length in the past (e.g., Hill et al., 2007; Ressler et al., 1988), could be of more importance for risk prediction, particularly in this subgroup of sexual offenders. Current investigations suggest that characteristics of the way a crime was committed are important for other groups of sex offenders as well (Dahle, Biedermann, Gallasch-Nemitz, & Janka, 2010). Additionally, studies of our Hamburg research group on the prevalence of personality disorders and sexual disorders amongst offenders who have committed sexual homicides found high prevalence rates for particular disorders, for example, paraphilias, especially sexual sadism, sexual dysfunctions, and cluster-B personality disorders (e.g., Hill et al., 2007; Briken et al., 2006). This raises the question as to the predictive power of specific psychopathological configurations, even though for the sample investigated here, the relevant psychiatric diagnoses (such as antisocial personality disorder or paraphilias in general and sexual sadism in particular) did not predict the relevant sexual or nonsexual violent recidivism (Hill, Habermann et al., 2008).
There are some relevant limitations in the present study that should be addressed. The information concerning the investigated risk factors was derived retrospectively from an evaluation of forensic psychiatric court reports. Thus, although at least three instruments (PCL-R, HCR-20, and SVR-20—the Static-99 can be used with file information only; Harris et al., 2003) require information derived from a clinical interview, for the purposes of the present study the risk instruments were coded by using file information only. Previous studies about the PCL-R, for example, have indicated that the sum scores of retrospectively file-based ratings are substantially lower than the sum scores derived by using clinical interview data and file information together (Bolt, Hare, Vitale, & Newman, 2004). Hare (2003) has assumed that especially the affective and interpersonal facets of psychopathy require clinical interview data for a reliable diagnostic evaluation. The relatively low PCL-R total score in the present study—compared to other cross-validation studies about the predictive utility of the PCL-R in sex offenders (e.g., Rettenberger et al., 2010; de Vogel et al., 2004)—seems to support this assumption. Thus, although most studies about the psychometric properties of risk assessment instruments have used retrospective research designs, a prospective design might have yielded different results (e.g., Hart & Boer, 2009). However, a prospective study with a follow-up period of more than 25 years is a difficult task.
Furthermore, follow-up information was restricted to data from the German federal criminal records, that is, recidivism rates were based only on reconvictions. It is known that many sexual as well as nonsexual violent acts remain undetected and are underestimated when relying solely on conviction data (Prentky, Lee, Knight, & Cerce, 1997). No information was available about risk-relevant factors or protective factors during detention for the sexual homicide or after release (e.g., treatment, supervision, social and psychological developments and circumstances). This should be made the focus of further research. A clear strength of the study is the long follow-up period. The sample may not be representative of sexual homicide perpetrators in Germany or other countries. However, recruiting reports from four different German forensic centers and the size of the sample give some confidence that the results are not merely due to selection bias.
In summary, sufficient predictive validity could not be measured for the four internationally most commonly used forensic risk assessment instruments PCL-R, HCR-20, SVR-20, and Static-99 especially for the sexual as well as nonsexual violent recidivism in this sample of sexual homicide offenders. Because of the different possible explanations for these unexpected results in this particular sample, one should not rush to disregard the application of these instruments in sexual homicide perpetrators altogether. Further investigations are necessary in order to explain the lack of predictive value of variables whose predictive power has been confirmed in several studies, and also to investigate further risk—and protective—factors in addition to the customary risk factors in this particular group of offenders.
Footnotes
Acknowledgements
Andreas Hill and Martin Rettenberger contributed equally to this work. The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint first authors.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors would like to thank the Deutsche Forschungsgemeinschaft for financial support (Grant Nos. BE 2280/2-1 and BE 2280/2-2).
