Abstract
We surveyed evaluators who conduct sexually violent predator evaluations (N = 95) regarding the frequency with which they use the Psychopathy Checklist–Revised (PCL-R), their rationale for use, and scoring practices. Findings suggest that evaluators use the PCL-R in sexually violent predator cases because of its perceived versatility, providing information about both mental disorder and risk. Several findings suggested gaps between research and routine practice. For example, relatively few evaluators reported providing the factor and facet scores that may be the strongest predictors of future offending, and many assessed the combination of PCL-R scores and sexual deviance using deviance measures (e.g., paraphilia diagnoses) that have not been examined in available studies. There was evidence of adversarial allegiance in PCL-R score interpretation, as well as a “bias blind spot” in PCL-R and other risk measure (Static-99R) scoring; evaluators tended to acknowledge the possibility of bias in other evaluators but not in themselves. Findings suggest the need for evaluators to carefully consider the extent to which their practices are consistent with emerging research and to be attuned to the possibility that working in adversarial settings may influence their scoring and interpretation practices.
There is ample evidence that forensic evaluators use the Hare (2003) Psychopathy Checklist–Revised (PCL-R) when conducting evaluations of sex offender risk. Surveys show that the PCL-R is the second most commonly used measure in sex offender risk assessment, with only the Static-99 (Hanson & Thornton, 2000) used by more evaluators (Jackson & Hess, 2007; Neal & Grisso, 2014). Sex offender risk assessment also appears to be one of the most common contexts in which evaluators administer the PCL-R and testify about PCL-R results. Reviews of U.S. and Canadian case law found that about 60% of cases describing PCL-R results involved sexual offenders (DeMatteo, Edens, Galloway, Cox, et al., 2014; Edens, Cox, Smith, DeMatteo, & Sörman, 2015).
Why Do Evaluators Use the PCL-R in Sex Offender Risk Assessment?
But why, exactly, do evaluators use the PCL-R for sex offender risk assessment? The PCL-R is a measure of psychopathic personality traits, which are more clearly associated with generalized offending than sexual offending (Hare, 2003). Indeed, findings from meta-analyses suggest that psychopathy measure scores are only moderately associated with sexual recidivism (Hanson & Morton-Bourgon, 2005; Hawes, Boccaccini, & Murrie, 2013). Across 20 studies, the association between PCL-R Total scores and sexual recidivism was d = .40 (Hawes et al., 2013), a meaningful effect, but much smaller than the d = .67 effect for actuarial measures of sexual recidivism risk (see Hanson & Morton-Bourgon, 2009).
It could be that evaluators use the PCL-R for sex offender risk assessment because some research suggests that offenders with high levels of both psychopathy and sexual deviance are at an especially high risk for reoffending, the so-called “deadly combination” of sex offender risk factors (Hare, 1999, p. 189). The PCL-R manual (Hare, 2003) details findings from several psychopathy and sexual deviance studies and concludes that “the PCL-R has relatively good predictive validity with respect to sexual offenses, particularly when combined with measures of sexual deviance” (p. 154). This conclusion is consistent with meta-analytic findings, which suggest that the odds of sexual recidivism are about 3 times higher for sexual offenders with high levels of both psychopathy and sexual deviance than for other sexual offenders (Hawes et al., 2013). This effect (d ≈ .85) is as large as effects for measures specifically designed to predict sexual recidivism (see Hanson & Morton-Bourgon, 2009).
But there are questions about the extent to which findings from psychopathy and sexual deviance interaction studies generalize to widespread forensic practice (Hawes et al., 2013). Only six studies have examined the possible benefits of combining PCL-R and sexual deviance measure scores for predicting sexual recidivism, and three used phallometric measures that may not be available to many evaluators (G. T. Harris et al., 2003; Looman, Morphett, & Abracen, 2012; Rice & Harris, 1997). The remaining three used item or total scores from other assessment measures as indicators of sexual deviance (Hildebrand, de Ruiter, & de Vogel, 2004; Olver & Wong, 2006; Seto, Harris, Rice, & Barbaree, 2004), including the Screening Scale for Pedophilic Interests (SSPI; Seto & Lalumière, 2001), Violence Risk Scale: Sex Offender Version (VRS:SO; Wong, Olver, Nicholaichuk, & Gordon, 2004), and the Sexual Violence Risk–20 (SVR-20; Boer, Hart, Kropp, & Webster, 1997); however, none of these measures appear to be widely used among evaluators conducting sex offender risk assessments (Jackson & Hess, 2007; Neal & Grisso, 2014). If evaluators do consider the extent to which offenders have high levels of both psychopathy and sexual deviance, they may be assessing deviance using methods that have not been examined in the interaction studies.
Another possible explanation for the use of the PCL-R in sex offender risk assessment is that evaluators use the PCL-R as a measure of non-sexual violence or general recidivism. Meta-analytic findings provide stronger support for the use of PCL-R scores for this purpose, with PCL-R Total scores better predicting non-sexual violence (d = .63) than sexual violence among sexual offenders (d = .40; Hawes et al., 2013). These findings may be more relevant in a general presentencing evaluation of a sex offender, in which the court may be concerned about more general violence and criminality, but less relevant in sexually violent predator (SVP) commitment evaluations, in which the focus is solely sexual violence (see Witt & Conroy, 2009). SVP laws allow for the postrelease civil commitment of sexual offenders believed to be at an especially high risk for reoffending due to an underlying mental illness or abnormality. In the 20 states with SVP laws (as well as the federal system), evaluators are asked to determine whether the offender has a mental or behavioral abnormality that predisposes him to commit future acts of sexual violence (Witt & Conroy, 2009).
It may be that SVP evaluators use the PCL-R to measure traits relevant to mental illness or mental abnormality as defined in SVP statutes, as opposed to risk for recidivism. A prior survey of SVP evaluators found that the most common method for determining such an abnormality was to consider whether the offender had both a personality disorder and a history of sexual offending (Jackson & Hess, 2007). A recent review of SVP statutes found that 14 included language indicating that offenders with personality disorders may qualify for commitment (DeMatteo, Murphy, Galloway, & Krauss, 2015). Although psychopathy is not formally recognized as a personality disorder in the current Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association, 2013), courts have ruled that a finding of mental abnormality need not correspond with a particular psychiatric diagnosis (Kansas v. Hendricks, 1997). It could also be that evaluators use the PCL-R as a measure of antisocial traits to assist in diagnosing antisocial personality disorder, a mental illness that some courts have ruled is sufficient for SVP commitment (Care and Treatment of Heikes v. State, 2005; In re commitment of Adams, 1998; Murrell v. State, 2007). For example, the U.S. Court of Appeals (Seventh Circuit) ruled in Brown v. Watters (2010) that if a personality disorder (e.g., antisocial personality disorder) was severe enough to cause an inability to control behavior, it was sufficient for commitment.
How Do Evaluators Use PCL-R Results in Sex Offender Risk Assessment?
Although several surveys have asked forensic evaluators how often they use the PCL-R (e.g., Jackson & Hess, 2007; Neal & Grisso, 2014; Viljoen, McLachlan, & Vincent, 2010), most provide only limited information about how or why evaluators use PCL-R results. For example, no surveys have examined whether evaluators report factor or facet scores, and none have asked evaluators to identify the cut scores they use when forming conclusions about an offender’s level of psychopathy or risk.
Evaluators can combine PCL-R item scores to obtain a total score, two factor scores, and four facet scores. Factor 1 consists of an Interpersonal facet (Facet 1) and an Affective facet (Facet 2), and Factor 2 consists of an Impulsive Lifestyle facet (Facet 3) and an Antisocial Behavior facet (Facet 4). Although no survey has asked evaluators whether they report PCL-R factor or facet scores, case law reviews and studies of evaluator reports suggest that the majority (e.g., >80%) do not report either (Blais & Forth, 2014; DeMatteo, Edens, Galloway, Cox, et al., 2014; Edens et al., 2015). If SVP evaluators report only PCL-R Total scores, they may be excluding information about the PCL-R scores that are most predictive of recidivism risk. For example, Factor 2 scores are stronger predictors of sexual recidivism (d = .44) than Factor 1 scores (d = .17), and may even be better predictors than Total scores (d = .36 in studies also reporting effects for factor scores; Hawes et al., 2013). In the five studies that have reported effects for facet scores, Facet 4 (d = .40) is the only facet score significantly associated with future sexual offending (d = .01-.09 for Facets 1-3; Hawes et al., 2013). The pattern is even stronger for predicting non-sexually violent recidivism among sexual offenders, with Factor 2 scores (d = .70) being much stronger predictors than Factor 1 scores (d = .06).
Existing studies also provide only limited information about whether evaluators provide a categorical interpretation of an offender’s PCL-R results and the cut scores that form the basis of their categorical interpretations. Although taxometric analyses of PCL-R scores suggest that psychopathic traits are best conceptualized as falling along a continuum (Edens, Marcus, Lilienfeld, & Poythress, 2006; Walters, Marcus, Edens, Knight, & Sanford, 2011), clinicians prefer to communicate risk assessment results using categorical as opposed to numerical messages (Heilbrun et al., 2004; Viljoen et al., 2010). For PCL-R results, evaluators appear to prefer dichotomous interpretations. A recent study of evaluator reports found that 59.4% contained a statement concerning whether the offender met criteria for psychopathy, whereas 20.7% provided a categorical risk level (e.g., high, moderate, low), and less than 1% provided a probabilistic estimate of risk (Blais & Forth, 2014).
But what cut score do evaluators use when making categorical decisions? The original PCL-R manual (Hare, 1991) suggested a cut score of 30 for identifying offenders as psychopaths, and many early PCL-R studies used this cut score to compare psychopaths with non-psychopaths (see Hare, 2003). The current PCL-R manual (Hare, 2003) describes several possible cut scores for identifying offenders with high or especially high levels of psychopathy (e.g., 25, 27, 30, 33) but does not make a strong recommendation for a particular score, raising questions about the variability among clinicians using cut scores in forensic practice.
One factor that could explain variability in cut score use is the retaining party for which an evaluator typically performs evaluations (Murrie, Boccaccini, Guarnera, & Rufino, 2013). Findings from several field studies suggest that evaluators testifying for the prosecution tend to assign higher PCL-R scores than evaluators testifying for the defense (DeMatteo, Edens, Galloway, Cox, et al., 2014; Edens et al., 2015; Murrie, Boccaccini, Johnson, & Janke, 2008; Murrie et al., 2009). It may be that the same factors that contribute to these PCL-R scoring differences also lead to differences in score interpretation (Murrie & Boccaccini, 2015). For example, prosecution-retained evaluators might require a lower score (e.g., 25) than defense evaluators (e.g., 30) to identify offenders as having a high level of psychopathy.
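To make the interpretive consequence concrete, the following minimal sketch applies the two illustrative cut scores mentioned above (25 and 30) to a single hypothetical offender. The function name, the dictionary of cut scores, and the example Total score of 27 are assumptions introduced purely for illustration; they are not drawn from any evaluator's actual practice.

```python
# Illustrative only: the cut scores are the hypothetical examples from the
# text (25 vs. 30); the Total score of 27 is an invented example value.
EXAMPLE_HIGH_CUTS = {"prosecution": 25, "defense": 30}


def meets_high_cut(total_score: int, high_cut: int) -> bool:
    """Return True when a PCL-R Total score is at or above the cut score."""
    return total_score >= high_cut


example_total = 27
labels = {
    side: meets_high_cut(example_total, cut)
    for side, cut in EXAMPLE_HIGH_CUTS.items()
}
print(labels)
```

Under these assumed values, the same Total score would be described as a high level of psychopathy by one evaluator and not by the other, even though the two evaluators agree on the score itself.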
Another issue related to categorical interpretation of PCL-R results is whether evaluators use the term psychopath to describe an offender. Labeling a sexual offender as psychopathic may lead to especially negative impressions about the offender (Guy & Edens, 2003, 2006). Although describing an offender as having high levels of psychopathic traits does not necessarily lead to this type of labeling effect, labeling an offender as a “psychopath” does (Boccaccini, Murrie, Clark, & Cornell, 2008; Murrie, Boccaccini, McCoy, & Cornell, 2007). Existing studies of evaluator practices provide somewhat contrasting views about labeling in the field. One survey found that 41% of evaluators who used the PCL-R with adults labeled high scoring offenders as psychopaths (Viljoen et al., 2010), but data collected from court cases and evaluator reports suggest that evaluators label fewer than 25% of offenders they evaluate (Blais & Forth, 2014; DeMatteo et al., 2014).
Current Study
We surveyed forensic evaluators (N = 95) who use the PCL-R in SVP evaluations to obtain information about how and why they use the PCL-R in SVP cases. We were especially interested in whether evaluators considered the combined pattern of PCL-R scores and sexual deviance when forming conclusions about recidivism risk, and how those who did so assessed sexual deviance. We asked evaluators to identify the cut scores they used to classify offenders as having high, moderate, and low levels of psychopathy and examined whether cut score use varied depending on the side for which evaluators typically performed evaluations. When we examined Static-99R reporting practices in an overlapping sample of 109 SVP evaluators, we found that defense-retained evaluators were more likely to endorse interpretation practices suggesting the lowest possible level of risk for any given score, whereas prosecution-retained evaluators were more likely to endorse practices suggesting the highest possible level of risk (Chevalier, Boccaccini, Murrie, & Varela, 2015). If this allegiance effect for score interpretation also applies to the PCL-R, we should find that defense evaluators require higher scores than state evaluators before labeling an offender as moderate or high in psychopathy.
Finally, we asked evaluators about the extent to which they believed that their own PCL-R scores, as well as the scores assigned by other evaluators, are influenced by the side or agency that requests the evaluation. Although data from both field and experimental studies show that state evaluators assign higher scores than defense evaluators in SVP cases (DeMatteo, Edens, Galloway, Toney Smith, et al., 2014; Murrie et al., 2013; Murrie et al., 2009), we suspected that most evaluators would report believing that their own PCL-R scoring was relatively unaffected by the side of retention. Indeed, several lines of emerging research suggest that forensic evaluators appear to view other evaluators, but not themselves, as susceptible to bias (Murrie, Boccaccini, Guarnera, Rufino, & Binns, 2012; Neal, 2011), a pattern of effects suggesting the possibility of a “bias blind spot” (Pronin, Lin, & Ross, 2002) among forensic evaluators.
Method
Participants
Participants were SVP evaluators who responded to an email request about PCL-R and Static-99R reporting practices. We posted recruitment announcements on the American Psychology-Law Society (APLS) and Association for the Treatment of Sexual Abusers (ATSA) listservs and sent individual recruitment emails to known SVP evaluators. Recruitment emails provided a link to the survey (hosted by SurveyMonkey), and those who participated received a US$10.00 Amazon.com gift card. Of the 118 evaluators who responded to the survey, 95 (80.5%) reported using the PCL-R. We reported findings for Static-99R use and reporting practices in a separate article (Chevalier et al., 2015).
Among the 95 evaluators who used the PCL-R, most reported having conducted SVP evaluations for at least 5 years (n = 69, 72.6%), and many (n = 63, 66.3%) reported having conducted more than 50 SVP evaluations. Most participants (n = 86, 90.5%) reported a PhD or PsyD degree, with others reporting an MA (n = 4, 4.2%), an MD (n = 3, 3.2%), or not listing a degree (n = 2, 2.1%). The most commonly represented states/jurisdictions were Wisconsin (17.9%, n = 17), Washington (13.7%, n = 13), Virginia (13.7%, n = 13), and Texas (13.7%, n = 13). When asked who typically refers SVP cases to them, 46.3% (n = 44) reported usually or always the state agency (e.g., corrections, mental health) responsible for initial SVP selection, 17.9% (n = 17) reported usually or always the state/petitioner, 21.1% (n = 20) reported usually or always the attorney for the respondent (defense), 10.5% (n = 10) reported a fairly even split involving two or more of the above, and 4.2% (n = 4) did not provide a response. We did not ask participants who endorsed receiving referrals from multiple sources to identify those sources.
Our recruitment procedures do not allow for a direct assessment of the response rate to our survey. At the time of the survey, the ATSA membership directory listed 246 members who reported providing consultation or training related to both “civil commitment services” and “risk assessment.” Although this does not mean that all of these professionals conduct SVP evaluations, and some SVP evaluators are not ATSA members, our sample does not appear to represent an especially small subset of SVP evaluators.
Survey Questions
The survey included 17 questions about PCL-R use and reporting practices in SVP evaluations, including how often they used the PCL-R (all, most, some, or few SVP evaluations) and the “best estimate of the average PCL-R Total score” they assigned to offenders “for SVP evaluations,” to offenders “who sexually offend against adults (only),” and to offenders “who sexually offend against children.” We then asked evaluators why they used the PCL-R, whether they assessed the combination of psychopathy and sexual deviance, and how they assessed sexual deviance (see Table 1). We asked evaluators to respond to questions about their reporting practices concerning total, factor, and facet scores, the importance they placed on these scores, whether they made categorical interpretations of PCL-R results, and whether they labeled or diagnosed offenders as psychopaths (see Table 2). Finally, we asked evaluators to rate the likelihood that side of retention might influence evaluators’ PCL-R scoring practices (1 = not likely, 3 = very likely) and the likelihood that side of retention might influence their own PCL-R scoring practices (1 = not likely, 3 = very likely). To examine the extent to which any bias blind spot effect may be unique to the PCL-R, we also asked the same two questions about Static-99R scoring (Helmus, Thornton, Hanson, & Babchishin, 2012).
Table 1. PCL-R Use in Sexually Violent Predator Evaluations.
Note. PCL-R = Psychopathy Checklist–Revised; SVP = sexually violent predator; VRS-SO = Violence Risk Scale–Sex Offender Version.
Participants could endorse or list more than one option.
Table 2. PCL-R Score Reporting in Sexually Violent Predator Evaluations.
Note. PCL-R = Psychopathy Checklist–Revised.
Participants could endorse more than one option.
Results
When asked how frequently they used the PCL-R, 32.6% of evaluators reported using it in all SVP evaluations, 27.4% reported using it in most SVP evaluations, 22.1% reported using it in some SVP evaluations, and 17.9% reported using it in few SVP evaluations (see Table 1). Evaluators’ estimates of the average PCL-R Total score they assigned in SVP evaluations ranged widely, from 6 to 31 (M = 21.25, SD = 4.23). They reported assigning higher scores to those who offended only against adults (M = 23.40, SD = 3.52) than those who offended against children (M = 18.04, SD = 4.94), which is consistent with PCL-R scores from sex offender samples (see Brown, Dargis, Mattern, Tsonis, & Newman, 2015). This was a large (d = 1.27, 95% confidence interval [CI] = [0.76, 1.31]) and statistically significant difference, t(79) = 9.26, p < .001. But, once again, there was a notable amount of variability in estimates, with average scores ranging from 15 to 30 for those who offended only against adults and from 8 to 29 for those who offended against children.
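As a rough check on the size of this difference, the sketch below computes a standardized mean difference from the rounded means and standard deviations reported above. It assumes a simple pooled-SD formula that weights the two sets of estimates equally; the function name is hypothetical, and the article does not state which formula produced the reported d = 1.27, so small discrepancies (from rounding or from a paired-design formula) are to be expected.

```python
import math


def cohens_d_pooled(mean_1: float, sd_1: float, mean_2: float, sd_2: float) -> float:
    """Standardized mean difference using an equally weighted pooled SD.

    This is one common formula, assumed here for illustration; it is not
    necessarily the formula used in the original analysis.
    """
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd


# Rounded summary values taken from the text: estimated average scores for
# offenders against adults only vs. offenders against children.
d_estimate = cohens_d_pooled(23.40, 3.52, 18.04, 4.94)
print(round(d_estimate, 2))
```

With these rounded inputs the estimate comes out at about 1.25, close to the reported value and well within the reported confidence interval.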
Evaluators’ estimates of the average PCL-R Total score they assigned in SVP evaluations were similar among those who most often worked for the prosecution (M = 21.36, SD = 3.93), a state agency (M = 21.03, SD = 3.93), or the defense (M = 21.61, SD = 3.85). There was a similar pattern of minimal differences for scores assigned to those who offended sexually against children and scores assigned to those who offended sexually against adults, with none of the differences between the three evaluator groups approaching statistical significance for any of the three “average PCL-R score” questions, Fs(1, 69) < 1.35, ps > .27,
Why Do Evaluators Use the PCL-R in SVP Cases?
The most commonly reported reason for using the PCL-R was to provide information about risk for sexual reoffending (67.4%), although almost one half of the evaluators reported using the PCL-R as a measure of mental illness/abnormality (45.3%) or risk for violent recidivism (40.0%; see Table 1). There were eight evaluators who did not endorse any of these reasons for PCL-R use. When these eight evaluators were asked to explain why they used the PCL-R, four (4.2%) described using it as a measure of psychopathic traits without indicating a specific purpose, one (1.1%) described using it to gauge treatment effectiveness and supervisability, one (1.1%) described only scoring the PCL-R when a different evaluator had scored the measure incorrectly, and two (2.1%) did not provide a reason for use. There were no significant differences between prosecution, state, and defense evaluators in response to questions about why they used the PCL-R, which scores they reported, and which scores they believed were most relevant for predicting recidivism.
Many evaluators (n = 42, 44.2%) endorsed more than one reason for PCL-R use. The most common combination was using the PCL-R as a measure of both reoffense risk (sexual or violent) and mental illness/abnormality (n = 30, 31.6%). Among the 74 (77.9%) evaluators who reported using the PCL-R as a measure of either sexual or non-sexual violence risk, 28 (37.8%) endorsed using it for both sexual and non-sexual violence, 36 (48.6%) reported using it for sexual violence only, and 10 (13.5%) reported using it for non-sexual violence only.
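Because these percentages use two different denominators (all 95 PCL-R users versus the 74 who used the PCL-R as a risk measure), a short sketch may help readers trace them. The counts are taken from the paragraph above; the helper name and dictionary keys are illustrative assumptions.

```python
def percent(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)


pclr_users = 95   # evaluators who reported using the PCL-R
risk_users = 74   # subset using it as a measure of sexual or non-sexual violence risk

# Breakdown of the 74 risk-measure users, as reported in the text.
risk_breakdown = {
    "sexual and non-sexual violence": 28,
    "sexual violence only": 36,
    "non-sexual violence only": 10,
}

share_of_all_users = percent(risk_users, pclr_users)
shares_within_risk_users = {
    label: percent(count, risk_users) for label, count in risk_breakdown.items()
}
print(share_of_all_users, shares_within_risk_users)
```

The three subgroup counts sum to 74, so the within-subgroup percentages describe the risk-measure users only and should not be read as proportions of the full sample.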
PCL-R and Sexual Deviance
Most evaluators (n = 76, 80.0%) reported directly addressing the combination of an offender’s PCL-R score and level of sexual deviance. Among these 76 evaluators, the most frequently endorsed methods for assessing deviance were a documented history of deviant sexual behavior (96.1%) and paraphilia diagnosis (82.9%). Fewer evaluators (44.7%) reported using plethysmography results, perhaps because plethysmography services are more commonplace in some jurisdictions than others. For example, 15 of the 34 evaluators who reported using plethysmography results performed evaluations in Wisconsin. Fewer evaluators (28.9%) reported using a psychological or behavioral assessment instrument designed to assess sexual deviance. The only instruments of this sort used by more than one evaluator were the Abel Assessment for Sexual Interests (see www.ablescreening.com), SSPI, and Multiphasic Sex Inventory (Nichols & Molinder, 1984), but each of these was used by four or fewer evaluators (see Table 1).
It was common for evaluators to endorse using multiple methods for assessing level of sexual deviance. We listed four possible methods in our survey (see Table 1), and the mean number of methods endorsed by the 76 evaluators who assessed the PCL-R and deviance interaction was 2.53 (SD = 0.84). The most common combination was to use both a documented history of deviant sexual behavior and a paraphilia diagnosis (n = 62, 81.6%).
PCL-R Score Reporting
Nearly all evaluators (95.7%) reported providing results for PCL-R Total scores in their SVP evaluation reports, with only about half (47.9%) reporting factor scores and a third (30.9%) reporting facet scores (Table 2). There were four evaluators (4.2%) who did not endorse any of these reporting options, suggesting that they do not report any scores. Many evaluators reported providing raw score (n = 74, 77.9%) and/or percentile values in their reports (n = 64, 67.4%), but few reported providing T-score values (n = 24, 25.3%).
When asked to choose the one PCL-R score that was most predictive of risk for recidivism, the most common response was the PCL-R Total score for both sexual (58.9%) and violent (48.4%) recidivism. Among those who selected a score other than the Total score, Factor 2 and Facet 4 scores were the most common choices (see Table 2).
PCL-R Score Interpretation
Many evaluators (69.5%) reported providing a categorical interpretation of PCL-R scores, although few reported labeling offenders as psychopaths (7.4%) or diagnosing offenders with psychopathy (5.3%; see Table 2). State agency evaluators were somewhat more likely (79.1%) to make a categorical interpretation of PCL-R scores than prosecution evaluators (64.7%), who were somewhat more likely to make a categorical interpretation than defense evaluators (50.0%), although this pattern of differences only approached statistical significance, χ²(2, N = 80) = 5.34, p = .06, Cramér’s V = .26, 95% CI = [0.09, 0.50].
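The reported effect size can be recovered from the chi-square statistic itself. The sketch below applies the standard Cramér's V formula to the values reported above; the function name is hypothetical, and the 3 × 2 table shape (three evaluator groups by whether a categorical interpretation was made) is an assumption inferred from the 2 degrees of freedom rather than something stated in the text.

```python
import math


def cramers_v(chi_square: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V = sqrt(chi-square / (N * min(rows - 1, cols - 1)))."""
    smaller_dimension = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi_square / (n * smaller_dimension))


# Assumed 3 x 2 table: evaluator group (state agency, prosecution, defense)
# by categorical interpretation (yes, no).
v_estimate = cramers_v(chi_square=5.34, n=80, n_rows=3, n_cols=2)
print(round(v_estimate, 2))
```

Under that assumption the formula yields a value of about .26, matching the reported effect size.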
There was wide variability in the cut scores evaluators used to identify offenders as having high, moderate, and low levels of psychopathy (see Table 3). For example, when we asked evaluators to complete the sentence “[I] believe a score of __ and higher indicates a high level of psychopathy,” responses ranged from 20 to 37 (M = 28.11, SD = 3.25). There was even more variability when we asked about the lowest score that indicates a moderate level of psychopathy (M = 20.73, range = 14-34) and the highest score that indicates a low level of psychopathy (M = 15.91, range = 9-33).
Table 3. PCL-R Total Score Interpretation.
Note. PCL-R = Psychopathy Checklist–Revised.
We used a repeated-measures ANOVA to compare responses to these three cut-score questions from evaluators who tended to work for the prosecution, a state agency, and the defense. There was a medium-sized main effect for evaluator type, F(2, 70) = 5.20, p = .008,
One defense evaluator reported using especially high PCL-R cut scores (high = 37, moderate = 34, low = 33). We did not exclude this evaluator from our original analysis because we expected defense evaluators to use higher cut scores than other evaluators. Nevertheless, we reran the ANOVA with the evaluator excluded to examine whether there was still evidence of prosecution, state agency, and defense evaluator differences without this potential outlier. Even with this evaluator removed, there was still evidence of a main effect for evaluator type, F(2, 69) = 3.51, p = .04, although the effect was somewhat smaller (
Bias Blind Spot
There were 91 evaluators who completed all four questions about the extent to which PCL-R and Static-99R scoring were influenced by side of retention. We used a repeated-measures ANOVA to examine whether there was evidence of a bias blind spot in risk measure scoring, and whether the effect was larger for the PCL-R than for the Static-99R (see Table 4). As expected, there were large main effects indicating that evaluators rated their own scoring as less influenced by the side of retention than evaluators in general, F(1, 90) = 129.15, p < .001,
Table 4. Perceived Susceptibility to Adversarial Allegiance (n = 91).
Note. Evaluators rated items from 1 = not likely to be influenced to 3 = very likely to be influenced. PCL-R = Psychopathy Checklist–Revised; CI = confidence interval.
p < .001.
Discussion
The PCL-R continues to be one of the most commonly used measures among SVP evaluators. We found that 60.0% of SVP evaluators reported using the PCL-R in either all or most of their SVP evaluations, which is similar to—if not slightly higher than—the 51.2% rate reported in an earlier SVP evaluator survey (Jackson & Hess, 2007). But beyond these updated data about the frequency of PCL-R use, our survey results reveal more details about how and why evaluators use the PCL-R in these SVP cases.
Multiple Uses for PCL-R Scores
Our findings suggest that the frequent use of the PCL-R by SVP evaluators may be a product of its perceived versatility. Because the PCL-R is a measure of personality traits that are associated with reoffending, scores from this single measure may be relevant for addressing two different SVP criteria: the presence of a mental abnormality/disorder and risk for reoffending. Although evaluators were more likely to report using the PCL-R as a measure of either risk for violent or sexual recidivism (77.9%) than mental illness/abnormality (45.3%), about a third (31.6%) endorsed using it as a measure of both risk and mental disorder/abnormality.
Among those who reported using the PCL-R as a risk measure, most endorsed using it as a measure of risk for sexual violence as opposed to non-sexual violence, which appears reasonable given the focus of SVP laws on sexual (rather than general or violent) offending. But the generally modest association between PCL-R scores and sexual recidivism (d ≈ .40; Hawes et al., 2013) suggests that the PCL-R is not the most useful measure for assessing risk for sexual recidivism. To be clear, SVP evaluators in our survey do not appear to rely on the PCL-R as the only, or even the primary, measure of sexual recidivism risk. Almost all (96.8%) of the evaluators who used the PCL-R also reported using the Static-99R, which is a stronger predictor of sexual recidivism (see Hanson & Morton-Bourgon, 2009).
The finding that most evaluators who use the PCL-R also use the Static-99R raises questions about how evaluators integrate scores from these measures, especially when one appears to be a stronger predictor of sexual recidivism than the other. There is no agreed-upon method for integrating scores across multiple measures (Vrieze & Grove, 2010), and research examining the combination of PCL-R and Static-99R scores suggests that PCL-R scores do not contribute to the prediction of future sexual offending once Static-99R scores are taken into account (Looman et al., 2012). Informal attempts at integration, based on clinical judgment, are unlikely to facilitate accurate decision making, given the well-known shortcomings of clinical judgment (see Grove & Meehl, 1996).
It may be that many evaluators do not attempt to integrate results from different measures. A prior survey found that most SVP evaluators who used multiple measures reported the result of each instrument alone, without any attempt at integration (Jackson & Hess, 2007). But the frequent use of both the Static-99R and PCL-R (as a risk measure) suggests that evaluators must use some approach—possibly unique to each evaluator—for coming to conclusions about risk when scores on these measures suggest different levels of risk. Identifying and testing these strategies are important areas of future research, for both the PCL-R and other risk assessment instruments. Especially useful to practitioners would be research reporting recidivism rates (and classification errors) based on different PCL-R and Static-99R (or other measure) score combinations.
Integrating PCL-R Scores and Sexual Deviance Assessment Findings
Many evaluators (80.0%) reported attempting to integrate PCL-R results with their assessment of sexual deviance. This finding is promising in that evaluators appear to be following the guidance provided by the PCL-R manual (Hare, 2003) and published studies (see Hawes et al., 2013). But a more in-depth examination suggests that there are significant gaps between research and practice. The 44.7% of evaluators who use plethysmography results along with PCL-R results to form conclusions about risk are arguably in the best position to defend their practice with research, but the body of research supporting this approach is smaller and more variable than it may initially appear. Only two of the three studies examining the interaction of PCL-R scores and plethysmography results found evidence of a predictive effect, and these two studies used markedly different PCL-R scores to identify high psychopathy offenders (17.5 vs. 25.0; G. T. Harris et al., 2003; Rice & Harris, 1997).
Moreover, few evaluators reported using the standardized measures of sexual deviance (e.g., VRS-SO, SSPI) with documented support in PCL-R and sexual deviance interaction studies (Olver & Wong, 2006; Seto et al., 2004). Instead, the two most commonly endorsed methods for operationalizing sexual deviance were a documented history of deviant sexual behavior (96.1%) and a paraphilia diagnosis (82.9%). Both seem intuitively reasonable, but neither has been examined in any study addressing the PCL-R and sexual deviance interaction. The two methods also tend to be redundant, in that a paraphilia diagnosis is usually based on a documented history of sexual behaviors.
Some evaluators may question the need to use a sexual deviance instrument when the information they obtain by reviewing an offender’s history and considering diagnostic criteria overlaps considerably with the items on standardized deviance measures. For example, the SSPI asks evaluators to consider whether a male who has at least one child victim had a male victim, had multiple victims, had a victim below the age of 11, and had an unrelated victim (Seto & Lalumière, 2001), information that SVP evaluators would certainly consider. Using a deviance measure ensures that evaluators consider and integrate this information in a structured and empirically supported way. Without a measure, evaluators may use only some of the relevant information or may combine the relevant information with other data that are not empirically associated with reoffending. For example, most evaluators (82.9%) reported using paraphilia diagnoses when considering the combination of PCL-R scores and sexual deviance. But research suggests that paraphilia diagnoses by themselves are not necessarily associated with sexual recidivism (see Kingston, Seto, Firestone, & Bradford, 2010; Moulden, Firestone, Kingston, & Bradford, 2009), raising questions about the utility of diagnoses in risk assessment (with or without the PCL-R).
It may be that the broad concept of sexual deviance requires a better operational definition. Current conceptualizations include atypical sexual interest (e.g., prepubescent victim) as well as sexual regulation and intensity of sexual interest (Hanson, 2010), which are conceptually distinct and may relate to recidivism risk in different ways. At this point, it seems clear that there is a wide gap between research and practice with respect to deviance assessment. The field would benefit from studies examining the validity of common assessment approaches (e.g., diagnosis), either alone or in conjunction with standardized measures, as well as studies examining the incremental validity of findings from different assessment approaches.
One limitation of all PCL-R and sexual deviance studies is that none provide sufficient information about classification accuracy. Ideally, researchers should report recidivism rates, sensitivity, specificity, and predictive power values across a wide range of PCL-R and sexual deviance measure cut-score combinations. For example, researchers using the PCL-R and the SSPI could report classification accuracy rates for groups created based on an SSPI cut score of 2 and PCL-R cut scores of 10, 15, 20, 25, and 30. They could then repeat the analyses using SSPI cut scores of 3, 4, and so on. Because there will often be a large number of possible cut-score combinations, it is important for researchers to show how classification accuracy changes as the cut scores change (across both measures), and to defend recommended cut scores by showing how classification properties change as scores move away from the recommended cut scores.
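The kind of reporting described above can be illustrated with a minimal sketch. It assumes a hypothetical data set of (PCL-R Total score, SSPI score, recidivism outcome) records and an illustrative decision rule in which an offender is flagged as high risk only when both scores meet or exceed their cut scores. The function names, the toy records, and the decision rule are assumptions made for illustration; they are not drawn from any published study.

```python
from itertools import product


def classification_stats(records, pcl_cut, sspi_cut):
    """Return classification accuracy for one cut-score combination.

    records: iterable of (pcl_total, sspi_score, recidivated) tuples.
    An offender is flagged only when BOTH scores meet or exceed their
    cut scores (an illustrative rule, not an established one).
    """
    tp = fp = tn = fn = 0
    for pcl, sspi, recidivated in records:
        flagged = pcl >= pcl_cut and sspi >= sspi_cut
        if flagged and recidivated:
            tp += 1  # flagged, reoffended
        elif flagged:
            fp += 1  # flagged, did not reoffend (false positive)
        elif recidivated:
            fn += 1  # not flagged, reoffended (false negative)
        else:
            tn += 1  # not flagged, did not reoffend

    def ratio(numerator, denominator):
        # None marks a statistic that is undefined for this combination.
        return numerator / denominator if denominator else None

    return {
        "pcl_cut": pcl_cut,
        "sspi_cut": sspi_cut,
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),  # recidivism rate among flagged offenders
        "npv": ratio(tn, tn + fn),
    }


def cut_score_grid(records, pcl_cuts, sspi_cuts):
    """Compute the statistics for every SSPI x PCL-R cut-score combination."""
    records = list(records)
    return [
        classification_stats(records, pcl_cut, sspi_cut)
        for sspi_cut, pcl_cut in product(sspi_cuts, pcl_cuts)
    ]


# Hypothetical toy records: (PCL-R Total, SSPI, recidivated).
records = [
    (32, 4, True), (27, 3, True), (26, 2, False), (18, 4, True),
    (22, 1, False), (12, 3, False), (9, 0, False), (30, 5, False),
]

grid = cut_score_grid(records, pcl_cuts=(10, 15, 20, 25, 30), sspi_cuts=(2, 3, 4))
for row in grid:
    print(row)
```

Tabulating every row of the grid, rather than a single combination, is what allows readers to see how false positive and false negative rates shift as either cut score moves.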
At best, existing PCL-R and sexual deviance studies provide information about one cut score or one cut-score combination, usually created to conduct survival analysis. Although these studies sometimes report recidivism rates based on this single set of cut scores (see, for example, Hildebrand et al., 2004; Olver & Wong, 2006), they do not report other relevant statistics, such as sensitivity, specificity, and predictive power, which provide important information about different types of classification errors (i.e., false positive, false negative). To be fair, this is a limitation of the PCL-R research literature in general, with only a handful of studies (e.g., Boccaccini, Turner, Murrie, & Rufino, 2012) providing these classification accuracy statistics for more than one PCL-R cut score.
Limited Use of PCL-R Factor and Facet Scores
Although recent studies suggest that PCL-R Factor 2 and Facet 4 scores may be stronger predictors of recidivism among sexual offenders than other scores (see Hawes et al., 2013), few evaluators reported that Factor 2 or Facet 4 scores were the most relevant for assessing sexual recidivism risk. Instead, most reported that total scores were the most relevant for predicting both sexual and non-sexual violence among sexual offenders. Similarly, fewer than half reported including factor or facet scores in their evaluation reports.
On one hand, these findings suggest that evaluators may be failing to take full advantage of PCL-R results and failing to prioritize the PCL-R data that may be most relevant to risk. On the other hand, neither the PCL-R manual nor individual studies provide clear recommendations for interpreting factor or facet scores as they relate to sex offender risk. Many of the available facet and factor score findings were published recently, and therefore are not included in the PCL-R manual (Hare, 2003). Furthermore, most factor and facet studies only show that the likelihood of recidivism increases as Factor 2 or Facet 4 scores increase. As with PCL-R Total score research, these factor and facet score studies often provide only limited information for practice because they rarely report classification accuracy findings for a range of cut scores and rarely provide detailed information about different types of classification errors. Evaluators who read these studies can conclude that offenders with higher scores are more likely to reoffend than offenders with lower scores, but they are left with unanswered questions about how to interpret specific factor and facet score results. For example, how high does a Facet 4 score need to be to indicate a high level of risk?
Risk assessment researchers now agree that studies should report classification accuracy statistics for a range of risk measure cut scores (Singh, Yang, Mulvey, & The RAGGE Group, 2015). We echo these recommendations and encourage researchers to report cut-score findings. There is also a need for more factor and facet score research, including studies examining the possibility of a statistical interaction between these scores and measures of sexual deviance. The most recent meta-analysis identified only five studies that allowed for an examination of the relationship between facet scores and sexual recidivism (Hawes et al., 2013).
Categorical Score Interpretation and Adversarial Allegiance
We found that 69.5% of evaluators provide categorical interpretations of PCL-R results, although few evaluators reported using the term “psychopath” (7.4%) or diagnosing offenders with psychopathy (5.3%). These findings are promising in suggesting that evaluators may be cautious about the potentially pejorative impact of these labels on decision makers (Boccaccini, Murrie, Clark, & Cornell, 2008; Guy & Edens, 2003, 2006).
There was, however, considerable variability in the cut scores evaluators used to classify offenders as low, moderate, and high risk (see Table 3). For example, the cut scores for high psychopathy ranged from 20 to 37. Thus, just as prior research has documented significant evaluator differences in PCL-R scoring (Boccaccini, Murrie, Rufino, & Gardner, 2014; Boccaccini, Turner, & Murrie, 2008), there also appear to be evaluator differences in PCL-R score interpretation. Some of these cut scores are well outside of those provided in the PCL-R manual, such as cut scores of 37 or 20 to indicate high psychopathy and 33 to indicate low psychopathy.
Some of the variability in cut score use was explained by adversarial affiliation. Defense evaluators tended to require higher scores than prosecution and state agency evaluators for placing offenders in any psychopathy category. In other words, defense evaluators set a higher threshold for what they considered a “high” or “moderate” score, whereas prosecution evaluators set a lower threshold. These cut-score findings suggest the possibility of adversarial allegiance in score interpretation. Field and experimental findings converge to suggest that being retained by one side in an adversarial case can influence PCL-R scoring (see Murrie & Boccaccini, 2015). But the current survey findings suggest that allegiance may also affect score interpretation, such that opposing experts may assign different labels to the same score, in a manner that reflects the perspective of the side for whom they usually work.
There is now indirect evidence for an allegiance effect on score interpretation in both Static-99R (Chevalier et al., 2015) and PCL-R score interpretation in SVP cases. The evidence is indirect because evaluators were not randomly assigned to sides, and the extent to which the apparent allegiance effect in score interpretation is a by-product of selection effects (e.g., attorneys selecting experts with known interpretation tendencies) or is caused by working repeatedly with one party is unclear. Nevertheless, our cut-score interpretation findings highlight the need for more research in this area, especially experimental research that can control for selection effects (e.g., Murrie et al., 2013).
Bias Blind Spot
Unlike other studies comparing scores from prosecution and defense evaluators (e.g., Edens et al., 2015; Murrie et al., 2009), we did not find evidence of systematic score differences across prosecution, state agency, and defense-retained evaluators. Nonetheless, evaluators reported that they considered PCL-R scores somewhat vulnerable to bias or allegiance effects. Specifically, they reported that PCL-R scores, in general, may be influenced by the side that retained the evaluator, but that the scores they assigned were not influenced by allegiance. As applied to forensic evaluators, these “bias blind spot” findings show that evaluators recognize that adversarial arrangements may lead other evaluators to be biased, but—perhaps because introspection is an insufficient method to recognize one’s own bias—genuinely consider themselves free from bias.
There is no reason to suspect that this bias blind spot exists for only PCL-R scoring. Indeed, we found a similar—although somewhat smaller—bias blind spot effect for Static-99R scoring. Although researchers have not conclusively identified the factors driving adversarial allegiance effects (Murrie & Boccaccini, 2015), it could be that those who refuse to admit the possibility of bias in their decision making are least well equipped to engage in a thoughtful consideration of alternative perspectives, which may reduce the likelihood of allegiance. It is also possible that those who believe that their scores are not influenced by allegiance are also those who already engage in this type of thoughtful consideration. Either way, our findings provide evidence of large (d > .85) bias blind spot effects in forensic assessment and highlight the need for research examining the association between the willingness to admit the possibility of bias and actual bias in scoring and decision making.
Limitations
Like other modern, Internet-based surveys of professional practice, our sample is limited to those who are involved in professional listservs (or had other professional connections that allowed us to contact them) and opted to respond to a request for participation in exchange for a small payment. Although we have no direct way of determining the extent to which our sample represents the broader population of evaluators who administer the PCL-R in SVP evaluations, we had more than twice as many respondents as the most comparable SVP evaluator survey (N = 41 in Jackson & Hess, 2007).
Another limitation is that, at times, we failed to anticipate the need for more information. For example, we did not ask evaluators why they used the PCL-R in some cases rather than others or to provide information about whether they were required to use the PCL-R or follow particular score-reporting practices. As another example, our questions about the average PCL-R scores evaluators assigned to offenders did not clarify how evaluators should consider those who offended against both adults and children, who tend to score higher on the PCL-R than those who offended against children only (Brown et al., 2015). We also did not ask evaluators about the variability in the scores they assign across cases, which can also differ substantially among evaluators (P. B. Harris, Boccaccini, & Murrie, 2015).
A final set of limitations relates to findings from null hypothesis significance testing. Our subgroups of defense and prosecution evaluators were relatively small, which limited the statistical power of our analyses examining differences related to side of retention. At the same time, we did not employ a family-wise correction for statistical significance testing due to the small sample and the conservative nature of such an adjustment. We conducted more than 20 null hypothesis significance tests, suggesting that we should expect at least one significant difference by chance.
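The arithmetic behind this caution can be sketched briefly. The example assumes a conventional alpha of .05, exactly 20 tests, and independence among tests. All three are simplifying assumptions for illustration rather than details of the analyses reported here, and the function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of significant results if every null hypothesis is true."""
    return n_tests * alpha


def familywise_error_rate(n_tests, alpha):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha):
    """Per-test threshold under a Bonferroni family-wise correction."""
    return alpha / n_tests


n_tests, alpha = 20, 0.05  # assumed values, for illustration only
print(expected_false_positives(n_tests, alpha))  # about 1 test
print(familywise_error_rate(n_tests, alpha))     # about 0.64
print(bonferroni_alpha(n_tests, alpha))          # about 0.0025
```

Under these assumptions, roughly one spurious significant result would be expected, and a Bonferroni threshold would be strict enough to cost considerable power in subgroups of this size.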
Conclusion
Our survey findings provide the first detailed account of PCL-R use by SVP evaluators. Beyond providing descriptive information for the field, our findings highlight several important gaps between research and practice. We hope that these findings encourage researchers to conduct studies that more directly examine and inform routine practice. For example, researchers should study the usefulness of combining paraphilia diagnoses with PCL-R results for risk assessment and provide the types of cut-score and classification accuracy statistics that will help evaluators make informed decisions. We recommend reporting cut score and classification accuracy statistics for each PCL-R score (total, factor, or facet) that is predictive of recidivism. In studies that examine the usefulness of integrating PCL-R results with those from another measure (e.g., Static-99R, sexual deviance assessment), researchers should provide clear recommendations for integrating results based on results of analyses examining score combinations from the two measures.
For practitioners, our findings highlight the need to carefully consider the extent to which routine practices are consistent with existing and emerging empirical research. This survey revealed several evaluator practices that are commonplace and practical (e.g., reporting only PCL-R Total scores, relying on less formal or structured measures of sexual deviance), but less congruent with emerging research than other approaches (e.g., considering PCL-R factor and facet scores, using more formal and structured measures of sexual deviance). Although the available research does not yet provide some of the detailed information that would be most useful to practice (e.g., cut scores, classification guidelines), many evaluators have yet to take full advantage of the emerging research data that are available. Although the PCL-R manual (Hare, 2003) remains a crucial resource, it is now over a decade old and cannot adequately reflect the recent, emerging research most relevant to the specialized task of SVP evaluation. Similarly, our survey focused on only a limited subset of PCL-R reporting and use issues, and did not address other potentially important issues, such as the emerging pattern of smaller predictive effects for PCL-R scores assigned for real-world as opposed to research purposes (see Hawes et al., 2013; Murrie, Boccaccini, Rufino, & Caperton, 2012).
Finally, our findings highlight the need for evaluators to consider the possibility of adversarial allegiance influencing their assessment practices. Although some of our findings suggested that certain score interpretation practices are likely influenced by adversarial affiliation, evaluators tended to see this as a problem for others but not themselves.
The problems and challenges this survey reveals are probably not unique to the PCL-R, to SVP evaluations, or even to forensic mental health practice. For example, the broader field of forensic science struggles with similar problems, such as limited reliability and validity when their assessment techniques are applied in the field, and contextual biases that result from serving a particular party in an adversarial system (National Research Council, Committee on Identifying the Needs of the Forensic Science Community, 2009). But this survey suggests opportunity for improvement in one area of forensic psychology practice, by revealing a number of specific ways researchers and practitioners can work to examine and improve the reliability, validity, and objectivity of PCL-R use in risk assessments of sexual offenders.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in part by the National Science Foundation, Law & Social Science Program Award SES 0961082.
