Abstract
Some actuarial and structured professional judgment (SPJ) risk-assessment instruments have already demonstrated their validity and predictive accuracy in expert criminal forensic evaluations. In contrast, little is known about the effectiveness of instruments that identify protective factors in predicting the risk of recidivism. The present study was designed to evaluate the validity and predictive accuracy of the Structured Assessment of Protective Factors for Violence Risk (SAPROF) in 94 violent and sexual violent offenders assessed in a Swiss pretrial criminal forensic context. The SAPROF showed good interrater reliability and was significantly correlated with predominantly dynamic instruments but not with predominantly actuarial instruments. However, in terms of predictive accuracy, the SAPROF did not perform as well as expected when compared with other instruments and with previous SAPROF validation studies. These results have implications for the use of the SAPROF for risk assessment in criminal forensic contexts.
Keywords
Introduction
Traditionally, forensic psychologists and psychiatrists have been called on to diagnose defendants and determine whether they can be held criminally accountable for their actions. Increasingly, however, the criminal justice system is also tasking these professionals with assessing defendants’ risk of violence, chances of relapse, and threat to public safety over long or even indeterminate periods of time (Redding, Floyd, & Hawk, 2001; Senon, Lopez, & Cario, 2012). In Switzerland, as in Germany and elsewhere, experts are also expected to advise on mandatory treatment verdicts, which can be delivered in addition to or in place of standard sentences. The treatment can be outpatient, inpatient, or even indefinite commitment. Such verdicts aim to protect public safety by preventing violent relapses and can be handed down regardless of criminal accountability, psychiatric diagnosis, and sentence length (Gasser & Gravier, 2007). Experts are asked to carry out risk assessments for mandatory treatment verdicts before or during the trial and also afterwards, about every 2 years, to help judges decide whether the verdict should be revoked or, on the contrary, extended.
Despite their critical attitude toward the capabilities of forensic science, judges follow experts’ recommendations in most cases, particularly regarding treatment and risk assessment (Delacrausaz & Gasser, 2012). It is therefore essential to select consistent, reliable, and valid methods for conducting expert criminal forensic evaluations (Hardie, Elcock, & Mackay, 2008).
The criminal forensic evaluation process varies among experts. Initially, experts were hesitant to use instruments when making professional judgments, and their evaluations often relied solely on unstructured judgment. However, the use of instruments is becoming a more common and desirable practice, given that structured instruments have been shown to significantly improve expert decision making (Grisso, 2003; Heilbrun, Rogers, & Otto, 2002). Many different instruments are used to assess violence risk. A recent study revealed that forensic psychologists and psychiatrists used 286 different general psychology, actuarial, and/or structured professional judgment (SPJ) instruments in around 800 evaluations (Neal & Grisso, 2014). Another international survey of mental health professionals reported the use of more than 400 instruments, with more than half of the risk assessments conducted in the past 12 months having used such an instrument (Singh et al., 2014).
Studies on the various instruments have shown that the effectiveness of general psychology instruments is poor. In contrast, both actuarial and SPJ risk-assessment instruments have already demonstrated their validity and predictive accuracy (Singh & Fazel, 2010; Yang, Wong, & Coid, 2010).
The ongoing debate is about whether actuarial or SPJ instruments predict risk more accurately (Guy, 2008; Guy, Packer, & Warnken, 2012; Pedersen, Rasmussen, & Elsass, 2010). It is important to note that, contrary to guideline recommendations, SPJ instruments are sometimes used as actuarial instruments; the Historical Clinical Risk Management–20 (HCR-20) is one example (Douglas & Reeves, 2010). With both types of instrument, the professional systematically assesses the subject on a set of predetermined risk factors that have shown significant empirical relationships with violent recidivism. The essential difference between the actuarial and the SPJ approach lies in how the final risk judgment is reached. With actuarial instruments, the evaluator rates the patient on each item and sums the item scores, and the total is converted algorithmically into a conclusion about the likelihood of violence. With SPJ instruments, the evaluator also rates the patient and sums the item scores. However, the evaluator additionally draws on personal expertise and knowledge to interpret, combine, and weigh the risk factors, taking into account their relevance to the individual case and to the anticipated intervention, in order to reach a global clinical evaluation called a “final risk judgment” (Lavoie, Guy, & Douglas, 2009; Tully, Chou, & Browne, 2013).
Some studies show that actuarial risk-assessment instruments are more predictive than SPJs (Hanson & Morton-Bourgon, 2004; Sjöstedt & Grann, 2002); however, others show the opposite, with SPJs outperforming actuarial risk-assessment instruments in prediction (de Vries Robbe, de Vogel, & Douglas, 2013; Pedersen et al., 2010). A study on the same instrument used as an actuarial instrument and as an SPJ found no differences in prediction ability between the two approaches (Lodewijks, Doreleijers, de Ruiter, & Borum, 2008).
Actuarial instruments mainly evaluate static risk factors, whereas SPJs rely predominantly on dynamic risk factors. Static risk factors are conceptually unchangeable, whereas dynamic risk factors are susceptible to change and can thus be targeted by interventions (Belfrage & Douglas, 2002). Progress can be measured and used in reassessments to evaluate potential changes in the level of risk. However, the association between changes in dynamic factors and changes in risk over time has not yet been clearly demonstrated. Some studies suggest that risk predictions based on changes in dynamic risk factors outperform those based on static factors (de Vries Robbe, Mann, Maruna, & Thornton, 2015); others do not (Morgan, Kroner, Mills, Serna, & McDonald, 2013). In any event, beyond any incremental predictive validity, detecting dynamic risk factors can help clinicians focus interventions and choose the most appropriate treatment programs, thereby contributing to violence prevention.
In addition, research on risk assessment over the last decade has underlined the importance not only of static and dynamic risk factors but also of identifying protective factors. Following the Good Lives intervention model (Ward & Fortune, 2014), protective factors are defined as strengths that can prevent individuals from committing violence by counterbalancing or weakening risk factors (de Vries Robbe, de Vogel, Koster, & Bogaerts, 2015; Jones, Brown, Robinson, & Frey, 2014). Treatments to prevent violence should thus aim to minimize risk factors and strengthen protective factors. Protective factors include a few static factors, such as intelligence and childhood attachment, but are mainly dynamic factors, such as coping abilities, social support, and empathy skills (Lodewijks et al., 2008).
By taking protective factors into account, treatment can focus on positive individual features, which may be more motivating for offenders. A strength-based approach allows offenders to be considered not only as a sum of risk factors but also as individuals with positive features (Jones et al., 2014). This can reduce offenders’ feelings of stigmatization and fatalism and therefore offer more interesting, individualized, and motivating treatment plans and perspectives. Criminal desistance theory describes the importance of an offender learning to perceive himself as the author of his own change (Maruna, 2000). Moreover, an integrated assessment of risk and protection could reduce risk-assessment bias, notably false positives, as protective factors might moderate the impact of overvalued risk factors (DeMatteo, Heilbrun, & Marczyk, 2005).
The Structured Assessment of Protective Factors for Violence Risk (SAPROF) was introduced in the Netherlands as an adjunct to other SPJs. It assesses general protective factors for recidivism in adults convicted of violent or sexual violent crimes (de Vogel, de Ruiter, Bouman, & de Vries Robbe, 2009). In two retrospective studies, the authors of the SAPROF showed that the instrument predicts the absence of violence among violent and sexual violent patients after discharge from their forensic psychiatric care unit for dangerous sexual violent offenders (de Vries Robbe, de Vogel, & de Spa, 2011; de Vries Robbe, de Vogel, et al., 2015). The SAPROF has shown good interrater reliability and significant negative correlations with other SPJ risk instruments. It has also demonstrated incremental validity, notably when combined with other risk-assessment instruments (de Vries Robbe et al., 2013).
To our knowledge, only three studies have analyzed the validity and accuracy of the SAPROF in other countries and populations: two German retrospective studies and one Irish prospective study. The German studies explored and validated the SAPROF’s psychometric characteristics in 66 juvenile sex offenders and in 30 adult sex offenders (Klein et al., 2012; Yoon, Spehr, & Briken, 2011). Klein et al.’s study examined the relationship between risk and protective factors measured by the Screening Tool for the Assessment of Young Sexual Offenders’ Risk (STAYSOR), the Structured Assessment of Violence Risk in Youth (SAVRY), and the SAPROF. Significant negative correlations were found between risk and protective factors. Yoon et al.’s study analyzed correlations between risk and protective factors measured by the SAPROF, the Sexual Violence Risk–20 (SVR-20), and the Static-99. The SAPROF showed significant negative correlations with the SVR-20 and was not correlated with the Static-99. According to the authors, these findings reflected the dynamic nature of the instrument. Neither study analyzed the SAPROF’s predictive validity. The third study, conducted in Ireland, showed that the SAPROF predicted self-harm and violence toward others during treatment in an adult forensic psychiatric population (Abidin et al., 2013). More generally, further findings are needed to confirm the incremental validity of the SAPROF in violence assessment. In particular, there have been no validation studies in Switzerland or in other French-speaking countries, even though the SAPROF was introduced into forensic evaluations in the French-speaking part of Switzerland a couple of years ago.
The objectives of this study are (a) to evaluate the validity and predictive accuracy of the SAPROF in violent and sexual violent offenders in Switzerland and (b) to determine whether and how the SAPROF can strengthen risk assessments in criminal forensic contexts, notably in pretrial assessments when combined with other instruments. The role of offense type (violent or sexual violent) was also considered.
Method
Participants
The sample consisted entirely of adult males who had been assessed by our forensic unit between 2000 and 2006, just prior to and/or during their trials, and who were subsequently convicted of violent and/or sexual violent offenses. Offenses were classified as violent when they involved force or the threat of force against another human being (i.e., homicide, attempted murder, manslaughter, assault and battery, armed robbery, and kidnapping) and as sexual violent when they involved any type of sexual assault with physical contact (i.e., rape, sexual coercion, sexual abuse, and child molestation). There were 106 eligible candidates whose forensic reports met these criteria, but 12 offenders were excluded because they had never been released after sentencing, mainly because of mandatory treatment or indefinite commitment verdicts.
As shown in Table 1, the final sample comprised 94 offenders who were, for the most part, Swiss or European citizens under 45. They had been sentenced to mandatory jail terms of around 3 years on average and 20 years at most. Half were in pretrial detention when the expert forensic assessments were carried out. The most common professional training among the offenders was in craft trades (38%, n = 36) or the service industry (21%, n = 20); 11% (n = 10) reported working in occupations requiring only elementary-level education, and 15% (n = 14) reported no professional training. However, about 82% (n = 77) of the offenders were unemployed at the time of the offense.
Study Samples’ Demographics, Psychiatric, and Criminological Characteristics
Note. N = 94. ICD = International Classification of Diseases.
In terms of their criminal histories, 51% (n = 48) of the offenders had been convicted of violent (nonsexual) offenses, and 49% (n = 46) had been convicted of sexual violent offenses. About 32% (n = 30) had been convicted of more than one type of violent offense, and 40% (n = 38) had also been convicted of associated nonviolent crimes.
With regard to psychopathology, diagnoses were mainly personality disorders (59%, n = 56) and substance abuse (49%, n = 46). Offenders had also been diagnosed with mental retardation (27%, n = 30) and with depression or anxiety (23.5%, n = 24). Furthermore, about 10% (n = 10) of the offenders had not been diagnosed with any psychiatric disorder, 65% (n = 60) of the offenders had a history of one or two psychiatric disorders, and about 25% (n = 23) had a history of three or more.
Measures
SAPROF
The SAPROF is a structured professional judgment instrument for evaluating protective factors in violent and sexual violent offenders (de Vogel et al., 2009). It was intended to be used in combination with risk-assessment instruments such as the HCR-20 and SVR-20. The SAPROF comprises 17 protective factors sorted into three categories: Internal (e.g., intelligence, empathy, and coping), Motivational (e.g., work, financial management, and motivation for treatment), and External (e.g., social network, intimate relationship, and external control). Each item is scored from 0 to 2 according to whether the factor is present and whether it is protective: 0 (not present and not protective), 1 (probably present and probably protective), or 2 (clearly present and clearly protective). Total protective scores (TPSs) therefore range from 0 to 34. Subtracting the SAPROF TPS from the total risk score (TRS) of the HCR-20 or SVR-20 yields a total integrated risk score (TIRS). Based on the coding results, the rater also makes a final protection judgment (FPJ; 0 = low protection, 1 = moderate protection, 2 = high protection). The FPJ is not based solely on the sum of the item scores; rather, following an SPJ approach, it is an overall individual assessment by the rater. Combining the SAPROF FPJ with the final risk judgment (FRJ) made using the HCR-20 or SVR-20 yields an overall risk assessment, known as a final integrated risk judgment or integrative final risk judgment (FIRJ).
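The arithmetic part of SAPROF scoring (TPS and TIRS) can be illustrated with a short sketch. The function names and example ratings below are hypothetical; only the item count, the 0 to 2 rating scale, and the subtraction rule come from the description above. The FPJ and FIRJ are clinical judgments and are deliberately not computed.

```python
SAPROF_ITEMS = 17  # 17 protective items across the Internal, Motivational, External scales


def total_protective_score(ratings):
    """Sum of the 17 SAPROF item ratings (each 0, 1 or 2), so the range is 0-34."""
    if len(ratings) != SAPROF_ITEMS:
        raise ValueError(f"expected {SAPROF_ITEMS} ratings, got {len(ratings)}")
    if any(r not in (0, 1, 2) for r in ratings):
        raise ValueError("each item must be rated 0, 1 or 2")
    return sum(ratings)


def total_integrated_risk_score(total_risk_score, tps):
    """TIRS = HCR-20 (or SVR-20) total risk score minus the SAPROF TPS."""
    return total_risk_score - tps


# Hypothetical offender: item ratings and an HCR-20 TRS of 22.
ratings = [2, 1, 1, 0, 2, 1, 1, 1, 0, 1, 2, 1, 0, 1, 1, 2, 1]
tps = total_protective_score(ratings)            # → 18
print(tps, total_integrated_risk_score(22, tps))  # → 18 4
```

Because the TRS ranges from 0 to 40 and the TPS from 0 to 34, the TIRS can in principle range from −34 to 40.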
HCR-20
The HCR-20 is a well-known structured professional judgment instrument used to assess the risk of violence in adult populations for civil, clinical, forensic, and criminal justice purposes (Webster, Douglas, Eaves, & Hart, 1997). It comprises 20 items in three risk-factor categories: 10 historical (e.g., previous violence, relationship instability, substance use problems, and psychopathy), five clinical (e.g., lack of insight, impulsivity), and five risk-management (e.g., plans lack feasibility, stress). Each item is scored 0 (not present), 1 (possibly present), or 2 (definitely present), yielding a TRS out of 40. Taking all the rating results into account and using an SPJ approach, the rater makes an FRJ on a 3-point scale (0 = low risk, 1 = moderate risk, 2 = high risk). A large review of HCR-20 studies reported good to excellent interrater reliability as well as moderate to large predictive validity for further violence (Douglas & Reeves, 2010).
Violence Risk Appraisal Guide (VRAG)
The VRAG is a 12-item actuarial instrument that has been widely used to predict the risk of violence in violent offenders (Quinsey, Harris, Rice, & Cormier, 1998). The 12 items are purely static and can be assessed exhaustively from criminal and clinical records; examples include the criminal history score for convictions and charges for nonviolent offenses prior to the index offense, age at the index offense, victim injury, and meeting Diagnostic and Statistical Manual of Mental Disorders (3rd ed.; DSM-III; American Psychiatric Association, 1980) criteria for any personality disorder or for schizophrenia. The total score ranges from −24 to 38 and can be divided into three risk categories: −24 to −8 = low, −7 to 13 = medium, and 14 to 38 = high (Harris et al., 2003). The VRAG showed acceptable accuracy in a Swiss validation study (Urbaniok, Endrass, Rossegger, & Noll, 2007).
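The three VRAG risk categories used here amount to a simple lookup on the total score. The sketch below encodes only the cut-offs stated above; the function name is hypothetical, and the range check mirrors the −24 to 38 range given in the text.

```python
def vrag_category(total: int) -> str:
    """Map a VRAG total score to the low/medium/high categories described above."""
    if not -24 <= total <= 38:
        raise ValueError("VRAG total outside the -24 to 38 range given in the text")
    if total <= -8:
        return "low"       # -24 to -8
    if total <= 13:
        return "medium"    # -7 to 13
    return "high"          # 14 to 38


for score in (-15, 0, 20):
    print(score, vrag_category(score))
```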
SVR-20
The SVR-20 is a structured professional judgment instrument similar to the HCR-20, used to evaluate sexual offenders’ risk of sexual violence (Boer, Hart, Kropp, & Webster, 1997). It comprises 20 items in three categories of risk factors: psychosocial adjustment (e.g., sexual deviation, victim of child abuse, psychopathy, substance use problems, suicidal/homicidal ideation, relationship problems, past nonsexual violent offenses), history of sexual offenses (e.g., high density, use of weapons or threats), and future plans (e.g., lack of realistic plans, negative attitude toward intervention). Each item is scored 0 (not present), 1 (possibly present), or 2 (definitely present), for a total score out of 40. Taking all the rating results into account and employing an SPJ approach, the rater makes a final risk judgment on a 3-point scale (0 = low risk, 1 = moderate risk, 2 = high risk). Several studies have shown that the SVR-20 is a reliable risk-assessment instrument (Rettenberger, Hucker, Boer, & Eher, 2009).
Static-99
The Static-99 is a widely used and validated actuarial instrument, based on static factors, designed to assess the risk of sexual recidivism among adult male sex offenders (Hanson & Thornton, 1999). It has also been found valid in predicting violent recidivism (Barbaree, Seto, Langton, & Peacock, 2001). It comprises 10 items on historical risk factors (e.g., age, relationship history, prior sentencing, prior convictions, victimology-related items) and is coded exhaustively from file information. Total scores range from 0 to 12 and can be sorted into four categories: 0–1 = low risk, 2–3 = moderate-low risk, 4–5 = moderate-high risk, and 6 or more = high risk. The Static-99 showed acceptable accuracy in a Swiss validation study (Endrass, Urbaniok, Held, Vetter, & Rossegger, 2009).
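The four Static-99 categories can be sketched the same way. The function name is hypothetical. The sketch reads the top band as "6 or more" because 4–5 is already the moderate-high band, so a score of 6 has to fall in the high-risk category.

```python
def static99_category(total: int) -> str:
    """Map a Static-99 total (0-12) to the four categories described above."""
    if not 0 <= total <= 12:
        raise ValueError("Static-99 total must lie between 0 and 12")
    if total <= 1:
        return "low"            # 0-1
    if total <= 3:
        return "moderate-low"   # 2-3
    if total <= 5:
        return "moderate-high"  # 4-5
    return "high"               # 6 or more


print([static99_category(s) for s in (0, 3, 5, 6, 12)])
```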
Psychopathy Checklist–Revised (PCL-R)
The PCL-R (Hare, 1991; Hare, Clark, Grann, & Thornton, 2000) was also fully rated. However, in this study, the PCL-R was used only to score the psychopathy-related items of the HCR-20, SVR-20, and VRAG, and PCL-R scores are not analyzed here.
Procedures
After the procedure had been approved by the Human Research Ethics Committee of Lausanne University Hospital, the study data manager, a member of our unit not involved in rating for this study, gathered and sorted the data. He reviewed all the forensic reports drawn up by the Lausanne Forensic Unit prior to or during trial between 2000 and 2006 and sorted them into violent and sexual violent charges. He retrieved the official criminal-record data from the Swiss Federal Justice Office. Of the initial sample of 161 reports, 106 were selected because they concerned convictions for violent or sexual violent offenses. He also collected and managed information about length of detention, date of release, and any convictions for any type of offense after release. Offenders’ criminal histories were monitored for 3 years following the date of release, a follow-up period considered ideal under the Swiss Federal Statistics Office guidelines on recidivism in Switzerland (Office Fédéral de la Statistique, 2009).
Six researchers trained in the use of the instruments rated all the reports in the sample with the SAPROF, HCR-20, and VRAG. They also used the SVR-20 and Static-99 to rate the reports on sexual violent offenses. For all instruments, official French translations of the manuals/guidelines were used. For all the structured professional judgment instruments (SAPROF, HCR-20, and SVR-20), both total scores and SPJ final judgments were recorded for comparison. Categorical scores were calculated for the actuarial instruments (VRAG and Static-99) for descriptive purposes only. Subscale scores were calculated for the SAPROF only. Ratings were made by pairs of researchers in different combinations. Each pair rated three to five instruments independently, then met to discuss their scores and reach a consensus on the scoring for all the instruments. This procedure is rare in the literature; however, the SAPROF authors strongly recommend using consensus ratings when assessing the SAPROF and HCR-20, as these appear to predict violence risk better than single ratings (de Vogel & de Ruiter, 2006; de Vogel, van den Broek, & de Vries Robbe, 2014). This was also the case in a study on the SVR-20 (Logan & Watt, 2001).
The information was gathered from the forensic assessment reports, which included evaluations, summaries of three or four interviews with the offenders, sometimes summaries of interviews with the offender’s family, and conclusions and recommendations to the court and/or judges about the future risk of violence or reconvictions and about treatment plans. In general, these reports also included past criminal reports, police reports, clinical and psychological reports, as well as past final forensic expert reports. They did not include the judge’s final decision on the charges that led to the forensic assessment.
The researchers were not aware of the recidivism outcome during the rating process. Confidentiality was ensured by using a nonidentifying code assigned by the data manager for each offender throughout the duration of the study. Researchers did not have access to the identifying codes, and the data manager was blind to the rating scores.
Data Analysis
Interrater reliability
The interrater reliability of the independent (single-researcher) ratings on the SAPROF, HCR-20, VRAG, SVR-20, and Static-99 was measured with intraclass correlations (ICCs). Critical ICC values for single measures were ICC ≥ .75 = excellent, .60 ≤ ICC < .75 = good, .40 ≤ ICC < .60 = moderate, and ICC < .40 = poor (Fleiss, 1986).
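The text does not state which ICC model was used. The sketch below assumes the two-way random-effects, absolute-agreement, single-measure form (often written ICC(2,1)) and computes it from ANOVA mean squares. It then applies the Fleiss (1986) bands quoted above. The rater data are hypothetical.

```python
def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    ratings: one row per offender, one column per rater (at least 2 raters).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between offenders
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                   # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def icc_label(icc):
    """Fleiss (1986) bands as stated in the text."""
    if icc >= 0.75:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "moderate"
    return "poor"


# Hypothetical SAPROF total scores from two raters for five offenders.
pairs = [[12, 13], [20, 19], [8, 8], [15, 17], [25, 24]]
value = icc_2_1(pairs)
print(round(value, 2), icc_label(value))
```

If a consistency-type ICC or an average-measures ICC was used instead, the denominator would differ. The sketch should therefore be read as one plausible form, not the study's exact computation.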
Descriptive analyses
Consensus scores were used for the descriptive analyses. For each offender, we calculated the SAPROF FPJ, total score, and subscale scores (Internal, Motivational, and External), the HCR-20 total score and final judgment, and the VRAG total score. For sexual violent offenders, we also calculated the SVR-20 total and final judgment scores and the Static-99 total score. One-way ANOVAs were conducted to investigate potential differences between violent and sexual violent offenders. The Type I error rate was set at .05.
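With offense type as the single factor, each one-way ANOVA compares group means via the F ratio of between-group to within-group mean squares. The sketch below computes only F and its degrees of freedom, using hypothetical scores. The p-value would come from the F distribution with those degrees of freedom (for example, `scipy.stats.f.sf(F, df_between, df_within)`), which is omitted here so the block has no dependencies. With two groups, F equals the square of the pooled-variance t statistic.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on the given groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical HCR-20 totals for violent vs. sexual violent offenders.
violent = [22, 18, 25, 20, 17]
sexual_violent = [19, 16, 21, 18, 15]
print(one_way_anova_f(violent, sexual_violent))
```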
Concurrent validity
The consensus scores were used for analyses of concurrent validity, examined through bivariate Pearson correlations. Correlations were calculated between the SAPROF total score, FPJ, and three subscale scores (Internal, Motivational, External) on the one hand and, on the other, (a) the HCR-20 total score and final judgment, (b) the VRAG total score, and (c) for sexual violent offenders, the SVR-20 total and final judgment scores and the Static-99 total score. Significant correlations were classified as high (|r| ≥ .50), moderate (.30 ≤ |r| < .50), or low (.25 ≤ |r| < .30) (Cohen, 1988).
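A minimal sketch of the correlation step follows, assuming the strength bands apply to the absolute value of r. That assumption fits this study, where the expected SAPROF–risk correlations are negative. Significance testing is a separate step and is not shown. The function names and example scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length lists with at least 3 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_strength(r):
    """Label a (significant) r with the cut-offs stated above; the sign is ignored."""
    size = abs(r)
    if size >= 0.50:
        return "high"
    if size >= 0.30:
        return "moderate"
    if size >= 0.25:
        return "low"
    return "below the low threshold"


# Hypothetical SAPROF totals and HCR-20 totals for six offenders.
saprof = [20, 14, 9, 17, 11, 6]
hcr20 = [12, 19, 26, 15, 22, 28]
r = pearson_r(saprof, hcr20)
print(round(r, 2), correlation_strength(r))
```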
Predictive validity
The SAPROF, HCR-20, and VRAG consensus scores were used for the analyses of predictive validity, which were based on receiver operating characteristic (ROC) curves. ROC analyses plot the true positive rate (sensitivity) against the false positive rate (1 − specificity) across all possible cut-off scores, and predictive accuracy is summarized by the area under the curve (AUC). The AUC is the probability that a randomly selected recidivist would score higher on the instrument than a randomly selected nonrecidivist (Rice & Harris, 2005). An AUC of .00 represents perfect negative prediction, an AUC of .50 chance prediction, and an AUC of 1.00 perfect positive prediction. Conventional critical AUC values are as follows: .75 and above = large; .70 to .74 = moderate to large. Following Swets’s (1988) recommendations, a more detailed and rather conservative classification was used in the present study, as in a recent predictive validity study (Gammelgård, Koivisto, Eronen, & Kaltiala-Heino, 2015): AUC values were considered excellent (.90 and above), good (.80 to .89), fair/moderate (.70 to .79), poor (.60 to .69), and failing (.50 to .59). Such guidelines allow a more reliable interpretation of AUC values, which is of utmost importance given the impact of an expert’s recommendations in a pretrial context, especially in countries with a penal system similar to ours. To compare the AUC values of the SAPROF ratings with those of the other instruments, AccuRoc analysis was used to apply the nonparametric method described by DeLong, DeLong, and Clarke-Pearson (1988).
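The probabilistic definition of the AUC given above can be computed directly by comparing every recidivist with every nonrecidivist, with ties counted as one half. This is the Mann–Whitney U statistic divided by the number of pairs. The sketch below does this and then applies the Swets-style bands. It treats .80 as the lower bound of "good" on the assumption that the bands are contiguous, and labels values under .50 as below chance. The function names and scores are hypothetical.

```python
def auc_mann_whitney(recidivists, nonrecidivists):
    """P(random recidivist scores higher than random nonrecidivist); ties count 1/2."""
    wins = 0.0
    for a in recidivists:
        for b in nonrecidivists:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(recidivists) * len(nonrecidivists))


def swets_band(auc):
    """Conservative AUC bands used in the present study (contiguous bands assumed)."""
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair/moderate"
    if auc >= 0.60:
        return "poor"
    if auc >= 0.50:
        return "failing"
    return "below chance"


# Hypothetical HCR-20 totals by 3-year outcome.
recid = [24, 19, 27, 15]
nonrecid = [12, 18, 9, 20, 14, 16]
a = auc_mann_whitney(recid, nonrecid)
print(round(a, 2), swets_band(a))
```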
Predictive analyses were conducted only on the sample as a whole, because the group of recidivists, particularly among sexual violent offenders, was too small for separate analyses by offense type. All analyses were performed with IBM® SPSS® Statistics Version 22, except for the AUC comparisons, which were performed with R software version 10.
Outcome
Our outcome for predictive accuracy was recidivism versus no recidivism. For our purposes, a recidivist is an offender convicted of one of the offenses we considered violent or sexual violent (as described in the Participants section) who was reconvicted within the follow-up period after release. As mentioned in the Procedures section above, the recidivism data were retrieved from the criminal records kept by the Swiss Federal Justice Office, and follow-up was set at 3 years after release. General recidivism was defined as any reconviction for any nonviolent offense after release (e.g., drug dealing or traffic offenses). Violent recidivism was defined as a reconviction for a violent offense after release, and sexual recidivism as a reconviction for a sexual violent offense after release. Sexual/violent recidivism was defined as a reconviction for a violent and/or sexual violent offense after release, that is, violent and sexual recidivism combined.
All outcomes were coded in a binary manner (1 = recidivated, 0 = did not recidivate), together with the conviction date of the first new offense in each category, to allow survival analyses.
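The coding rule above (a binary flag plus time to the first qualifying reconviction within 3 years of release) can be sketched as follows. Several details are assumptions: the window is checked against conviction dates, since those are the dates recorded; the category labels are hypothetical; and the caller decides which categories count toward each outcome.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def code_outcome(release, reconvictions, categories, years=3):
    """Return (1, days to first qualifying reconviction) or (0, None).

    reconvictions: list of (conviction_date, category) pairs (hypothetical labels).
    categories: set of categories that count for this outcome.
    """
    end = add_years(release, years)
    hits = sorted(d for d, cat in reconvictions
                  if cat in categories and release < d <= end)
    if not hits:
        return 0, None
    return 1, (hits[0] - release).days


# Hypothetical offender released on 1 January 2005.
record = [(date(2006, 1, 1), "nonviolent"), (date(2007, 7, 1), "violent"),
          (date(2009, 1, 1), "sexual")]
print(code_outcome(date(2005, 1, 1), record, {"violent", "sexual"}))  # → (1, 911)
print(code_outcome(date(2005, 1, 1), record, {"sexual"}))             # → (0, None)
```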
Results
Interrater Reliability
As shown in Table 2, the interrater reliability for the SAPROF total scores, subscale scores, and final judgment scores was good to excellent. ICC values for the SAPROF were lower than for the other instruments but still within acceptable limits, including the ICCs for the total protection score and the Internal subscale among sexual violent offenders (both ICCs = .64).
SAPROF, HCR-20, VRAG, SVR-20, Static-99 Interrater Reliability by Offense Type
Note. SAPROF = Structured Assessment of Protective Factors for Violence Risk; HCR-20 = Historical Clinical Risk Management–20; VRAG = Violence Risk Appraisal Guide; SVR-20 = Sexual Violence Risk–20; ICC = intraclass correlation.
The interrater reliabilities for the HCR-20 total score, final risk judgment, and integrated judgment were all excellent, as was that for the SVR-20 total score. The interrater reliabilities for the SVR-20 final risk and integrated judgments were good. The VRAG and Static-99 scores had the highest interrater reliability (both ICCs > .95).
Descriptive analyses
Table 3 shows the mean total and subscale scores for the SAPROF, HCR-20, and VRAG for the whole sample, as well as the SVR-20 and Static-99 scores for sexual violent offenders. There were no significant differences between violent and sexual violent offenders in mean total scores on any of the instruments compared.
SAPROF, HCR-20, VRAG, SVR-20, Static-99 Descriptive Statistics by Offense Type
Note. SAPROF = Structured Assessment of Protective Factors for Violence Risk; HCR-20 = Historical Clinical Risk Management–20; VRAG = Violence Risk Appraisal Guide; SVR-20 = Sexual Violence Risk–20.
Table 4 shows the distribution of final judgments. In most cases, the SAPROF final protection judgments, HCR-20 final risk judgments, and SAPROF-HCR-20 integrated judgments were low to medium (FPJ = 97%, FRJ = 87%, and FIRJ = 91% of the sample). This was also the case for the SVR-20 final risk judgments and SAPROF-SVR-20 integrated judgments (both = 93% of the sexual violent offenders in the sample). Low and medium frequencies were almost the same for the FRJ and FIRJ (FRJ low, n = 40; FRJ medium, n = 43; FIRJ low, n = 45; FIRJ medium, n = 41). The FPJ included more medium than low ratings (low, n = 21; medium, n = 70), as did both the SVR-20 FRJ and the SAPROF-SVR-20 FIRJ (FRJ low, n = 15; FRJ medium, n = 28; FIRJ low, n = 18; FIRJ medium, n = 24).
SAPROF, HCR-20, SVR-20 Final Judgments Frequency Distribution by Offense Type
Note. SAPROF = Structured Assessment of Protective Factors for Violence Risk; HCR-20 = Historical Clinical Risk Management–20; SVR-20 = Sexual Violence Risk–20; FPJ = final protection judgment; FRJ = final risk judgment; FIRJ = integrative final risk judgment; ns = not significant.
There were significant differences between violent and sexual violent offenders for the HCR-20 final risk judgment and the SAPROF-HCR-20 final integrated judgment, with more sexual violent than violent offenders in the low-risk categories (FRJ: p < .001; FIRJ: p < .014).
Concurrent validity
As shown in Table 5, high to moderate significant negative correlations were found between the HCR-20 total and final risk scores and the SAPROF total, final protection, and subscale scores, except for the External subscale. Higher SAPROF total and Internal subscale scores were highly correlated with lower HCR-20 total scores. Correlations were generally lower with the HCR-20 final risk judgment than with the HCR-20 total score but were still significant. The External subscale showed no significant correlation with any of the other four instruments. Higher SAPROF Internal subscale scores were highly correlated with lower SVR-20 total scores. Moderate to low significant negative correlations were found between the SAPROF final protection and total scores and the SVR-20 total score. Conversely, none of the SAPROF final protection or subscale scores were significantly correlated with the SVR-20 final risk judgment or with the VRAG or Static-99 total scores.
SAPROF Concurrent Validity Correlations With HCR-20, VRAG, SVR-20, Static-99
Note. SAPROF = Structured Assessment of Protective Factors for Violence Risk; HCR-20 = Historical Clinical Risk Management–20; VRAG = Violence Risk Appraisal Guide; SVR-20 = Sexual Violence Risk–20.
*p < .05. **p < .01.
Predictive validity
Reconviction occurred on average 742 days after release. Among reoffenders, violent offenders were reconvicted significantly earlier than sexual violent offenders (639 vs. 845 days, p < .012).
About 30% of the offenders were reconvicted within 3 years of their release, and one in three of them was reconvicted for both nonviolent and violent reoffenses. Violent offenders were about 3 times more likely than sexual violent offenders to reoffend within 3 years, both in terms of general recidivism (n = 21 vs. n = 6) and sexual/violent recidivism, that is, violent and sexual recidivism combined (n = 10 vs. n = 3). Only three offenders (one violent, two sexual violent) were reconvicted for sexual violent reoffenses. None were convicted of nonviolent sexual offenses. Sexual recidivism alone, as defined in the Method section, was therefore excluded from our analysis.
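The "about 3 times" comparison can be checked with the group sizes reported in the Participants section (48 violent and 46 sexual violent offenders). The sketch assumes the comparison is a ratio of recidivism proportions, which the text does not specify.

```python
def rate_ratio(events_a, n_a, events_b, n_b):
    """Ratio of two proportions: risk in group a divided by risk in group b."""
    return (events_a / n_a) / (events_b / n_b)


N_VIOLENT, N_SEXUAL = 48, 46  # group sizes from the Participants section

general = rate_ratio(21, N_VIOLENT, 6, N_SEXUAL)         # about 3.35
sexual_violent = rate_ratio(10, N_VIOLENT, 3, N_SEXUAL)  # about 3.19
overall_general = (21 + 6) / (N_VIOLENT + N_SEXUAL)      # about 0.29 of the sample
print(round(general, 2), round(sexual_violent, 2), round(overall_general, 2))
```

Both ratios are slightly above 3, and the overall general-recidivism proportion (27 of 94) matches the "about 30%" figure.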
Table 6 presents the AUC values of the SAPROF, HCR-20, and VRAG for sexual/violent and general recidivism. The SAPROF total score had fair predictive validity for sexual/violent recidivism (AUC = .70) and poor predictive validity for general recidivism (AUC = .64). The SAPROF final protection judgment had poor predictive validity for both sexual/violent and general recidivism (AUCs = .63 and .69, respectively).
SAPROF, HCR-20, and VRAG Ratings’ AUC Values for Sexual/Violent and General Recidivism
Note. SAPROF = Structured Assessment of Protective Factors for Violence Risk; HCR-20 = Historical Clinical Risk Management–20; VRAG = Violence Risk Appraisal Guide; AUC = area under the curve; CI = confidence interval.
SAPROF AUC values were computed for the prediction of nonrecidivism, because higher SAPROF scores indicate stronger protection.
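The AUC used throughout this section can be read as the probability that a randomly chosen case with the predicted outcome scores higher than a randomly chosen case without it. The following minimal sketch computes it by direct pairwise comparison (the Mann-Whitney formulation, with ties counted as one half); the function name `auc` and the eight offenders' data are hypothetical, not study data. It also illustrates the orientation noted for the SAPROF: scoring a protection total against nonrecidivism gives exactly 1 minus its AUC against recidivism, so both orientations carry the same information.

```python
def auc(scores, outcomes):
    """AUC as the share of (outcome = 1, outcome = 0) pairs ordered correctly.

    Equivalent to the Mann-Whitney U statistic divided by (n1 * n0); ties count 0.5.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each outcome group")
    total = 0.0
    for p in pos:
        for n in neg:
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos) * len(neg))


# Hypothetical toy data for eight offenders (not study data).
reconvicted = [1, 0, 1, 0, 0, 1, 0, 0]          # 1 = reconvicted within follow-up
risk_total = [22, 18, 25, 10, 12, 15, 9, 20]     # risk instrument: higher = more risk
protection_total = [5, 12, 4, 16, 8, 9, 18, 9]   # protective instrument: higher = more protection

nonrecidivism = [1 - y for y in reconvicted]

auc_risk = auc(risk_total, reconvicted)                    # risk score vs. recidivism
auc_protect = auc(protection_total, nonrecidivism)         # protection score vs. nonrecidivism
auc_protect_flipped = auc(protection_total, reconvicted)   # same score vs. recidivism

print(f"risk AUC = {auc_risk:.3f}")                        # 13 of 15 pairs ordered correctly
print(f"protection AUC (nonrecidivism) = {auc_protect:.3f}")
print(f"protection AUC (recidivism) = {auc_protect_flipped:.3f}")
```

The double loop is quadratic in sample size, which is unproblematic for samples of the size analyzed here.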
The Motivational subscale score had fair predictive validity for sexual/violent recidivism and poor predictive validity for general recidivism (AUCs = .76 and .69, respectively). The Internal subscale score had poor predictive validity for sexual/violent recidivism and failing predictive validity for general recidivism (AUCs = .65 and .57, respectively). The External subscale had failing predictive validity for both sexual/violent and general recidivism (both AUCs = .47).
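The verbal labels attached to AUC values in this section (failing, poor, fair, good) follow fixed cut-offs that are not restated here. The sketch below assumes conventional 0.10-wide bands (failing below .60, poor .60 to .69, fair .70 to .79, good .80 to .89, excellent .90 and above); these bands are inferred from the labels used in the text rather than taken from the Method section, and the helper `auc_label` is hypothetical. It checks that the assumed bands reproduce the SAPROF labels reported above.

```python
def auc_label(value):
    """Verbal label for an AUC, using assumed 0.10-wide bands (see lead-in)."""
    if value >= 0.90:
        return "excellent"
    if value >= 0.80:
        return "good"
    if value >= 0.70:
        return "fair"
    if value >= 0.60:
        return "poor"
    return "failing"


# SAPROF AUCs and the labels attached to them in the text above.
reported = [
    ("total score, sexual/violent", 0.70, "fair"),
    ("total score, general", 0.64, "poor"),
    ("final protection judgment, sexual/violent", 0.63, "poor"),
    ("final protection judgment, general", 0.69, "poor"),
    ("Motivational, sexual/violent", 0.76, "fair"),
    ("Motivational, general", 0.69, "poor"),
    ("Internal, sexual/violent", 0.65, "poor"),
    ("Internal, general", 0.57, "failing"),
    ("External, sexual/violent", 0.47, "failing"),
    ("External, general", 0.47, "failing"),
]

for description, value, label_in_text in reported:
    print(f"{description}: AUC = {value:.2f} -> {auc_label(value)} (text: {label_in_text})")
```

An AUC below .50, as for the External subscale, means the scores order cases slightly worse than chance in the predicted direction.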
Both the HCR-20 total score and the SAPROF-HCR-20 integrated total score had good predictive validity for sexual/violent recidivism (AUCs = .85 and .82, respectively) and fair predictive validity for general recidivism (AUCs = .74 and .72, respectively). Both the HCR-20 final risk judgment and the SAPROF-HCR-20 integrated final judgment had poor predictive validity for sexual/violent recidivism (both AUCs = .68) and fair predictive validity for general recidivism (AUCs = .70 and .71, respectively).
The DeLong comparisons showed no significant difference in prediction accuracy between the HCR-20 and the SAPROF-HCR-20 integrated scores (HCR-20 total score vs. SAPROF-HCR-20 integrated total score: sexual/violent recidivism, Z = .75, p = .45; general recidivism, Z = .54, p = .58; HCR-20 final risk judgment vs. SAPROF-HCR-20 integrated final judgment: sexual/violent recidivism, Z = .05, p = .95; general recidivism, Z = .37, p = .71).
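The DeLong comparisons test whether two AUCs computed on the same offenders differ, taking into account that the two scores are correlated. A minimal pure-Python sketch of the DeLong nonparametric test follows; the function name `delong_test` and the ten cases are hypothetical, and the sketch is not necessarily the software used in the study. It computes each case's placement values (structural components), estimates their covariance across the two scores, and refers the AUC difference divided by its standard error to a standard normal distribution.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if x ranks above y, 0.5 if tied, 0 otherwise."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def delong_test(scores_a, scores_b, outcomes):
    """Compare two correlated AUCs measured on the same cases (DeLong test).

    Returns (auc_a, auc_b, z, two_sided_p). Outcomes are coded 1/0.
    """
    pos = [i for i, y in enumerate(outcomes) if y == 1]
    neg = [i for i, y in enumerate(outcomes) if y == 0]
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two cases in each outcome group")

    v10, v01, aucs = [], [], []
    for s in (scores_a, scores_b):
        # Placement value of each positive case among the negatives, and vice versa.
        v10.append([sum(_psi(s[i], s[j]) for j in neg) / n for i in pos])
        v01.append([sum(_psi(s[i], s[j]) for i in pos) / m for j in neg])
        aucs.append(sum(v10[-1]) / m)

    def cov(comp, r, t, k):
        # Sample covariance of the placement values of scores r and t (k cases).
        return sum((comp[r][q] - aucs[r]) * (comp[t][q] - aucs[t]) for q in range(k)) / (k - 1)

    # Var(AUC_a - AUC_b) = S_aa + S_bb - 2 S_ab, with S = S10 / m + S01 / n.
    var_diff = 0.0
    for r, t, w in ((0, 0, 1.0), (1, 1, 1.0), (0, 1, -2.0)):
        var_diff += w * (cov(v10, r, t, m) / m + cov(v01, r, t, n) / n)
    if var_diff <= 0:
        raise ValueError("variance of the AUC difference is zero; test undefined")

    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the standard normal
    return aucs[0], aucs[1], z, p


# Hypothetical toy data (not study data): two instruments scored on the same 10 cases.
outcomes = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
score_a = [22, 25, 15, 18, 10, 12, 9, 19, 20, 11]
score_b = [14, 20, 16, 15, 13, 17, 10, 12, 11, 9]

auc_a, auc_b, z, p = delong_test(score_a, score_b, outcomes)
print(f"AUC_a = {auc_a:.3f}, AUC_b = {auc_b:.3f}, Z = {z:.2f}, p = {p:.2f}")
```

The last step of the test is a standard normal lookup; for example, `math.erfc(0.75 / math.sqrt(2))` is about .45, in line with the p value reported for the first comparison above.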
The VRAG had the highest AUC values and good predictive validity for both sexual/violent and general recidivism (AUCs = .83 and .80, respectively). The DeLong comparisons showed no significant difference in prediction accuracy between the VRAG and the HCR-20 total score (sexual/violent recidivism, Z = .54, p = .59; general recidivism, Z = .90, p = .37).
Discussion
This study is part of a broader research project started in 2013 to evaluate the validity and predictive accuracy of several risk and protection evaluation instruments used in forensic evaluation. Specifically, it investigated the SAPROF’s validity and predictive accuracy, as well as its incremental value in predicting recidivism as compared with and when combined with other instruments in the assessment of Swiss violent and sexual violent offenders.
The results were conclusive on the SAPROF’s interrater reliability and concurrent validity, except at the subscale level. The results on the SAPROF’s predictive validity, however, were less convincing. At best, the SAPROF showed fair predictive AUC values, and, except for the Motivational subscale, its subscales showed poor or failing predictive AUC values. Above all, whether compared with or combined with other instruments, the SAPROF did not demonstrate incremental predictive validity for either sexual/violent or general recidivism.
Our work fully confirms the interrater reliability of the SAPROF. Good interrater reliability is an essential quality of a usable assessment instrument. A meta-analysis of 118 prediction studies empirically explored the importance of interrater reliability; it showed that assessments with high interrater reliability are more predictive than assessments with low interrater reliability (Hanson & Morton-Bourgon, 2009).
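Interrater reliability for continuous total scores is commonly summarized with an intraclass correlation coefficient (ICC). The ICC model used in the study is not stated in this section, so the following minimal sketch assumes the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss's notation, computed from the ANOVA mean squares; the function name `icc_2_1` and the ratings are hypothetical.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per offender, each row holding one total score per rater.
    """
    n = len(ratings)       # subjects (offenders)
    k = len(ratings[0])    # raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((rm - grand) ** 2 for rm in row_means)   # between offenders
    ss_cols = n * sum((cm - grand) ** 2 for cm in col_means)   # between raters
    ss_error = ss_total - ss_rows - ss_cols                     # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical totals from two raters for four offenders (not study data).
identical = [[10, 10], [14, 14], [7, 7], [20, 20]]
offset = [[10, 12], [14, 16], [7, 9], [20, 22]]  # rater 2 always scores 2 points higher

print(f"ICC(2,1), identical ratings: {icc_2_1(identical):.3f}")
print(f"ICC(2,1), constant offset:   {icc_2_1(offset):.3f}")
```

Because this form measures absolute agreement, a rater who is systematically 2 points higher lowers the ICC even though the two raters rank the offenders identically; a consistency-type ICC would not penalize that offset.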
The concurrent validity results in our research mainly confirm previous SAPROF studies (Klein et al., 2012; Yoon et al., 2011). They provide further evidence that the SAPROF is predominately dynamic in nature: the SAPROF total score and final protection judgment both had significant negative correlations with predominately dynamic instruments such as the HCR-20 and SVR-20, and no significant correlations with predominately static instruments. In addition, the correlation values indicated that the SAPROF only partially overlaps with the HCR-20, and to an even lesser extent with the SVR-20. Our results therefore provide empirical support for the view that the presence of protective factors does not necessarily imply the absence of risk, even though both address the same construct (de Vries Robbe et al., 2013; Parent, Guay, & Knight, 2012). This suggests that integrating protective factors into assessments might be necessary for a complete picture of violent behavior.
However, our findings did not confirm previous SAPROF validation studies (Abidin et al., 2013; de Vries Robbe et al., 2011; de Vries Robbe, de Vogel, et al., 2015) with regard to the concurrent validity of the SAPROF subscales. We suggest that subscale validity and accuracy are still unresolved issues that warrant further examination. Except for the Motivational subscale, the SAPROF subscales showed mainly poor or failing predictive validity. The External subscale appears to be the most problematic: it showed no significant correlations with any of the other instruments and no significant predictive AUC values. This may indicate a problem with the SAPROF’s internal consistency at the subscale level. Future investigations could focus on more rigorous statistical validation of the SAPROF subscales.
This study’s results on the SAPROF’s predictive validity conflicted with previous studies (Abidin et al., 2013; de Vries Robbe et al., 2011; de Vries Robbe, de Vogel, et al., 2015) and were less promising, notably because they failed to demonstrate the SAPROF’s incremental value in predicting recidivism. That the VRAG had the highest AUC values is unsurprising given the low-to-moderate risk level of the study sample. This is in line with previous studies in which actuarial instruments seemed better at identifying low-risk individuals (Fazel, Singh, Doll, & Grann, 2012; Sjöstedt & Grann, 2002). It is also consistent with the HCR-20 total score being more predictive than the final judgments for both sexual/violent and general recidivism. More compelling still, the HCR-20 total score did not differ significantly from the VRAG in predictive accuracy. We therefore suggest that, especially when assessing low-to-moderate risk individuals, the HCR-20 should be the main assessment instrument in pretrial forensic criminal evaluations, given the frequent time constraints and the strong need for efficiency. Unlike the VRAG, the HCR-20 allows the measurement of dynamic factors that could serve as a baseline for repeated posttrial follow-ups or as guidance on the most appropriate compulsory treatment measures. Finally, the literature supports the HCR-20 as a well-established, consistent, and valid instrument (Douglas & Reeves, 2010).
Given our recommendation for the HCR-20, should it be assumed that the SAPROF has no place in pretrial criminal forensic evaluations? Not entirely. At this stage, we feel that the SAPROF could be beneficial in providing more complete information when advising on treatment and also when future reassessments are expected. The SAPROF was designed not only for counterbalancing risk assessment in violence and sexual violence prediction but also for measuring dynamic protective factors and therefore providing information about offenders’ strengths. The presence or absence of these factors can act as a guide in planning interventions and lead to more complete and tailored needs-based treatment than if only dynamic risk factors were assessed. The SAPROF, together with the HCR-20 or SVR-20, could then be used in the posttrial stage as well to measure changes in protective and risk factors. Reassessments would show treatment progress (or the lack thereof) and aid in legal decision making. Caution is required when considering SAPROF subscales. Further validation is needed before including the SAPROF systematically as an assessment instrument of the risk of reoffense.
Our study should not be used as a basis for generalizations, owing to its relatively small sample size and the low-risk status of the offenders. Future research will need to validate the SAPROF in other offender populations (e.g., high-risk offenders and sexual nonviolent offenders such as exhibitionists and voyeurs) and in samples large enough to predict specific types of sexual recidivism (e.g., sexual violent vs. sexual nonviolent offenses). In addition, the recidivism data we retrieved reflected only convictions recorded in the official criminal records, which are known to represent only the “tip of the iceberg” of all violence. Although less sensitive than charges, convictions are more reliable.
All the offenders in our sample were evaluated by the same expert forensic unit; however, this unit’s reports are the ones most sought after for these kinds of criminal evaluations. The reports were well organized and rich in supplementary information, so no data were missing when we completed the various study instruments. Even though the sources were complete, the study was still retrospective in design. Follow-up studies are needed to further investigate the benefits and limitations of using the SAPROF in criminal forensic evaluations.
