Abstract
The study compares agreement between Department of Health Services (DHS) evaluators’ opinions and court decisions in all 132 sexually violent person (SVP) trials in Wisconsin from 2012 through 2016. Previous research on mock jurors in simulated SVP cases may not extend to real-world SVP legal proceedings. This is the first study to directly compare evaluator opinions with court decisions in actual SVP commitment cases. Trial courts found 81% of participants to be an SVP (SVP+). Courts agreed with the DHS evaluator’s opinion in 67% of the cases, which represents slight to fair agreement beyond chance. Trial courts agreed with evaluators’ SVP+ opinions far more often than they did with evaluators’ SVP– (not SVP) opinions. The rates of SVP+ opinions differed widely among DHS evaluators. The implications for public policy and forensic practice are discussed.
Twenty U.S. states have enacted laws to permit the civil commitment and confinement of sex offenders as sexually violent persons (SVPs) after they complete their prison term. Some states refer to “sexually violent predator,” “dangerous sex respondent,” or “sexually dangerous individual.” Such laws typically define an SVP as a convicted sex offender who (a) has a mental condition that predisposes them to commit certain sex offenses and (b) poses a specified risk to reoffend. In determining whether a respondent meets an SVP risk threshold, courts consider the testimony of expert witness forensic evaluators. Courts generally accept the opinions of mental health professionals in forensic decisions, such as competence to stand trial (Gowensmith, Murrie, & Boccaccini, 2012; Zapf, Hubbard, Cooper, Wheeles, & Ronan, 2004), legal sanity (Gowensmith, Murrie, & Boccaccini, 2013; Guarnera & Murrie, 2017; Murrie & Warren, 2005), and involuntary detention (Hilton & Simmons, 2001). However, there has been little research on the agreement between evaluator opinions and court decisions in SVP trials.
Studies of simulated and actual SVP trials have consistently found high rates of commitment decisions. The single SVP court in Texas committed 100% of respondents (Boccaccini, Murrie, & Turner, 2014; Turner, Boccaccini, Murrie, & Harris, 2015). In online polls, 84% to 85% of community residents chose to commit the respondent, even if all they knew was that the respondent had been referred as an SVP (Scurich & Krauss, 2013, 2014). In California, 82% of mock jurors favored commitment. Last, like most states, Texas does not define “likely” to reoffend. Half of the jurors in a study of SVP trials in Texas judged that 1% risk meant likely to reoffend; most said 15% risk meant likely (Knighton, Murrie, Boccaccini, & Turner, 2014).
Perceived Credibility of Expert Testimony
Evaluator Credibility
Studies on the perceived credibility of evaluators have also yielded mixed results. Boccaccini et al. (2014) found that jurors attributed disagreement between evaluators to the complexity of the case, not to whether the defense or prosecution retained the evaluator. However, jurors in Boccaccini, Turner, Murrie, Henderson, and Chevalier (2013) judged that prosecution experts were more credible and better able to predict recidivism than defense experts, but they were more skeptical when hearing testimony from opposing experts.
The Static-99R
Most SVP evaluators use the Static-99R (Phenix, Helmus, & Hanson, 2015) or its predecessor, the Static-99 (Phenix, Helmus, & Hanson, 2012), to predict individuals’ risk of sexual recidivism (Kelley, Ambroziak, Thornton, & Barahal, 2018; Neal & Grisso, 2014).
Research on juror confidence in the Static-99/R in SVP trials has yielded mixed results. In several studies, Boccaccini and his colleagues polled jurors after they had served in SVP trials in Texas. Every respondent was committed (one case resulted in a mistrial, but the individual was committed at the retrial). Most jurors in one study judged that evaluators who used actuarial scales like the Static-99R could predict risk more accurately (Boccaccini et al., 2014). However, jurors in two other studies (Krauss, McCabe, & Lieberman, 2012; Turner et al., 2015) said they were influenced less by the Static-99 than by other factors that are poor predictors of recidivism. Jurors in both studies judged prosecution evaluators to be more credible. In a third study, jurors’ judgments of offenders’ risk were unrelated to their Static-99 score (Boccaccini et al., 2013).
Scurich and Krauss (2013) recruited subjects online to read details of an actual SVP case and decide if the respondent should be committed. They manipulated the actuarial score and found that a high Static-99 score increased the rate of commit decisions, but a low Static-99 score did not decrease the commit rate. Subjects accepted information that indicated high risk but rejected information that indicated low risk, even though the information came from the same source. Using the same procedure, Scurich and Krauss (2014) found the rate of commitment judgments did not depend on the information they provided subjects; 89% of subjects concluded that just being referred as an SVP justified commitment. Krauss et al. (2012) recruited subjects in California who had been excused from jury duty. After watching a video of a simulated SVP trial, most of the mock jurors said their decision to commit the offender was influenced more by clinical testimony than by an actuarial scale.
Limitations of Previous Research
Mock jurors formed their opinions on their own, after reviewing a brief synopsis of the case, knowing their opinion had no real-world consequence. In actual trials, jurors spend several days listening to testimony, legal arguments, and instructions from the bench before deliberating with other jurors. They also know their verdicts can impose or continue an individual’s confinement. Also, research subjects are not randomly selected and may not represent individuals who are called to jury duty. Therefore, the results of analog studies on juror opinions may not generalize to real-life SVP trials.
Much of the SVP research was conducted in Texas, which is unique in having an outpatient-only SVP program. Jurors may be more willing to commit respondents, knowing they will be supervised in the community rather than incarcerated. Also, all SVP trials in Texas are heard in Montgomery County, judged the third most conservative county in that state (Jones, 2014). Of the 20 states with SVP laws, Texas is one of only five states considered highly conservative, according to a 2017 Gallup poll (Saad, 2018). Thus, the results of the Texas SVP studies may not generalize to other states.
To address those limitations, I compared DHS evaluator opinions and court verdicts in all 132 Wisconsin SVP trials from 2012 through 2016. Direct comparisons of evaluators, juries, and judges in actual SVP trials overcome the problems of mock trials. Wisconsin is similar to most other states with SVP programs. Thus, the results of the study should generalize to other states.
The Wisconsin SVP Program
Background
The Wisconsin SVP statute (Wisconsin State Legislature, 2019) was enacted in May 1994. It defines an SVP as a convicted sex offender who suffers from a condition that predisposes them to commit sexually violent acts and makes them likely (more likely than not) to commit a sexually violent offense. Wisconsin courts interpret “more likely than not” to mean greater than 50% probability (State v. Smalley, 2007). Sexually violent offenses against persons aged 16 and over include sexual assaults that use or threaten to use a dangerous weapon or that cause pregnancy (first degree); that use or threaten to use violence; that cause injury, disease, or mental anguish; or if the victim is unconscious or intoxicated (second degree); or that have sexual contact without the person’s consent (third degree). Sexually violent offenses against children include sexual assault of a child under 13 years old (first degree), sexual assault of a child under 16 (second degree), repeated sexual assault of the same child, incest with a child, child enticement, and sexual assault of a child placed in care. Sexually violent offenses may also include intentional or reckless homicide, battery, false imprisonment, burglary, and robbery if they were sexually motivated. Sexually violent offenses need not include physical force or threat.
Precommitment
The Department of Corrections (DOC) reviews all inmates who have been convicted of a sex offense before they are released from prison (C. Tire, personal communication, August 16, 2017). A DOC specialist first screens the inmates, referring about 18% of them to a review board. Of those, the board refers about half for a Special Purpose Evaluation (SPE) by a DOC evaluator. SPE evaluators find that about a third of those offenders meet criteria for commitment. By this process, the DOC refers about 3% of convicted sex offenders to the Department of Justice (DOJ) or District Attorney, which can petition the circuit court in the inmate’s county of residence. The court typically finds probable cause and orders the individual to be detained, pending a commitment trial.
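The figure of about 3% follows from multiplying the stage-by-stage referral rates. A minimal sketch of that arithmetic, assuming "about half" means 0.50 and "about a third" means 1/3 (the text gives only these approximations; the stage labels and function name are illustrative):

```python
# Stage-wise referral rates from the DOC screening process described above.
# "About half" and "about a third" are taken as 0.50 and 1/3 (assumptions).
stages = [
    ("specialist screen -> review board", 0.18),
    ("review board -> Special Purpose Evaluation", 0.50),
    ("SPE -> meets commitment criteria", 1 / 3),
]


def cumulative_referral_rate(stage_rates):
    """Multiply the conditional rates to get the overall referral rate."""
    rate = 1.0
    for _label, proportion in stage_rates:
        rate *= proportion
    return rate


overall = cumulative_referral_rate(stages)
print(f"Overall referral rate: {overall:.1%}")
```

Each rate is conditional on passing the previous stage, which is why the rates multiply rather than add.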
Once the court has found probable cause for commitment, a DHS evaluator is assigned to examine the respondent, providing an independent second opinion. “Respondent” here denotes an offender being evaluated for commitment while “patient” denotes an offender who has been committed for treatment. Evaluators are assigned cases on a quasi-random basis, based on the evaluator’s workload and trial schedules, without regard to details of the cases. Both the DOC and DHS evaluators who examined the respondent testify as expert witnesses at the precommitment trial, along with any private evaluators retained by the prosecution or defense. The DOC evaluator always supports commitment; otherwise, the DOC would not have referred the respondent for commitment. In precommitment trials, the respondent, the state, and the court each has the right to demand a jury. Most precommitment trials are heard by juries, whose verdict must be unanimous.
Postcommitment
Once committed, patients must be examined annually by a DHS evaluator to determine whether they still meet criteria for commitment. SVPs may petition the court to be discharged from commitment. If the court finds probable cause, it convenes a trial. The last DHS evaluator to examine the patient is called to testify, along with the DOC SPE evaluator and any private evaluators retained by the prosecution and defense. If both DOC and DHS evaluators agree, the case rarely goes to trial unless either side can retain an independent evaluator with an opposing opinion. Thus, the courts consider opposing opinions of at least two evaluators in virtually all SVP cases. In petition-for-release trials, the subject, his attorney, or the prosecution may request a six-person jury. In practice, about half the petition-for-release trials are bench trials, heard by a judge alone. In jury trials, five out of six jurors must agree on a verdict.
DHS Examinations
Evaluators diagnose virtually every offender with at least one predisposing mental disorder. Thus, their SVP+ opinions almost always rest on the offender’s risk to reoffend. Evaluators routinely reported a Static-99R recidivism rate and risk category but most did not assign a final risk probability. Instead, they stated whether the subject’s risk to reoffend was more likely than not, or greater than 50%. DHS evaluators were arguably less susceptible to an allegiance effect, in which expert witness opinions tend to favor the side that retains them (Chevalier, Boccaccini, Murrie, & Varela, 2015; Murrie & Boccaccini, 2015; Murrie, Boccaccini, Guarnera, & Rufino, 2013; Murrie et al., 2009; Rufino, Boccaccini, Hawes, & Murrie, 2012). Although the evaluators are state employees, the state does not “retain” them nor does DHS influence them to favor either side. DHS evaluators may take SVP cases privately, but they cannot privately examine a patient whom they had ever examined for DHS or vice versa. Wisconsin currently confines about 330 individuals who were committed as SVPs or who are confined pending a precommitment trial. As of June 2016, 22 years since Wisconsin enacted its SVP program, courts had discharged 130 patients from commitment (Whitaker, 2017).
Method
Procedure
I compared the opinions of DHS evaluators with court decisions as to whether or not a respondent was an SVP (SVP+ or SVP–) in all 132 SVP trials in Wisconsin in 2012 through 2016. Of those, 53 were precommitment and 79 were petitions for discharge. Seventy cases were jury trials and 62 were bench trials. An institutional review board approved the study. Evaluator opinions and subject variables were obtained from a database maintained by the Sand Ridge evaluation unit. Court decisions were drawn from the Circuit Court Case Management Database maintained by the State of Wisconsin. All 13 evaluators who were employed by the Sand Ridge Evaluation Unit at the time the 132 cases were tried in court participated in the study (eight men, five women). Each was a doctoral level, licensed psychologist. Evaluators testified in an average of 6.5 trials (SD = 4.1; range = 4-22) from 2012 through 2016.
Participants
Participants were adult male sex offenders who had either been detained under probable cause pending a commitment trial (respondents) or who had been committed for treatment (patients). I did not include 56 cases in which the court accepted a stipulated agreement in lieu of a trial either before or after commitment. The participant mean age was 51.0 years (SD = 10.7; range = 24-80). Ethnic/racial composition was 61.1% White, 31.1% African American, 6.1% Native American, and 1.6% other or mixed ethnicity. Mean time at Sand Ridge was 100 months (SD = 70; range = 3.2 months to 21 years; Mdn = 94.2 months). Of the 88 (66.7%) who had participated in sex offender treatment, their mean time in treatment was 80 months (SD = 60; range = 4 days to 15.4 years; Mdn = 91.5 months).
All participants had been scored on the Static-99R, an actuarial scale composed of 10 empirically derived recidivism risk factors. Like the earlier Static-99, the Static-99R yields a total score that corresponds to a 10-year recidivism rate of sex offenders who are released from custody. The rate can be used to predict an individual’s risk of sexual recidivism (Elwood, 2016). The Static-99 and Static-99R have shown moderate validity in discriminating sexual recidivists (Hanson, Lunetta, Phenix, Neeley, & Epperson, 2014; Helmus, Hanson, Thornton, Babchishin, & Harris, 2012; Reeves, Ogloff, & Simmons, 2017) and have been found to outperform clinical judgment (Bengtson & Långström, 2008).
The mean Static-99R score was 5.40 (SD = 1.7; range = 1-9), which predicts that about 35% of high-risk sex offenders would be charged with another sex offense within 10 years of release from custody, using either the 2009 Static-99R high-risk recidivism rates (Elwood, Kelley, & Mundt, 2017) or the 2015 high-risk rates (Hanson, Thornton, Helmus, & Babchishin, 2016). Participants had also been scored on the Psychopathy Checklist–Revised (PCL-R; Hare, 2003), a semistructured scale that is widely used to quantify psychopathy. Some studies have found that a high PCL-R score predicts sexual recidivism, especially when combined with sexual deviance (Hawes, Boccaccini, & Murrie, 2012). Participants’ mean most-recent PCL-R score was 25.2 (SD = 5.0; range = 9-37.5), which Hare (2003) considers high psychopathy.
Analyses
I calculated evaluator × court agreement indices for all trials combined and then separately for precommitment trials, petition-for-release trials, jury trials, and bench trials. I used kappa (k; Cohen, 1960) because it is the most common measure of agreement beyond that expected by chance. However, k is highly sensitive to differences in the marginal totals of the 2 × 2 table, which can yield paradoxical results (Cicchetti & Feinstein, 1990; Feinstein & Cicchetti, 1990): (a) a high prevalence (base rate) of one outcome can result in a low k despite high agreement and (b) a rater bias toward one outcome can result in a high k despite low agreement. Prevalence and bias can be quantified by separate indices, the Prevalence Index (PI) and the Bias Index (BI; Cunningham, 2009), both of which equal 0 if neither effect is present. To address these paradoxes, I calculated prevalence-adjusted and bias-adjusted kappa (PABAK; Byrt, Bishop, & Carlin, 1993) and positive and negative predictive values (PPV and NPV) separately for positive and negative categories (SVP+/SVP–). PPV and NPV are the conditional probabilities that the court’s decision will agree with the evaluator’s SVP+ opinion and SVP– opinion, respectively. I calculated PPV and NPV and their Bayesian credible intervals from 2 × 2 tables using an online calculator (Post_Test_Probabilities; Crawford, Garthwaite, & Betkowska, 2009). I report several benchmarks (Altman, 1991; Cicchetti & Sparrow, 1981; Landis & Koch, 1977) for k and PABAK values because no consensus has emerged over which benchmark is most useful.
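The indices named above can all be computed from the four cells of a 2 × 2 evaluator-by-court table. The sketch below is illustrative only and rests on several assumptions. The cell counts are back-calculated from the overall percentages reported in the Results (132 trials, 64.4% evaluator SVP+, 81.1% court SVP+, 66.7% agreement), not copied from the tables. PI and BI are taken as absolute values. PPV and NPV are plain proportions, not the Bayesian estimates produced by the calculator cited above. The function name is hypothetical.

```python
def agreement_indices(a, b, c, d):
    """Agreement indices for a 2 x 2 evaluator-by-court table.

    a: evaluator SVP+, court SVP+    b: evaluator SVP+, court SVP-
    c: evaluator SVP-, court SVP+    d: evaluator SVP-, court SVP-
    """
    n = a + b + c + d
    p_obs = (a + d) / n  # observed proportion of agreement
    # Agreement expected by chance, from the row and column totals.
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    nan = float("nan")
    return {
        "agreement": p_obs,
        "kappa": (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else nan,
        "pabak": 2 * p_obs - 1,          # prevalence- and bias-adjusted kappa
        "pi": abs(a - d) / n,            # Prevalence Index (absolute value)
        "bi": abs(b - c) / n,            # Bias Index (absolute value)
        "ppv": a / (a + b) if a + b else nan,  # court agrees | evaluator SVP+
        "npv": d / (c + d) if c + d else nan,  # court agrees | evaluator SVP-
    }


# Back-calculated (assumed) overall counts; see the note above.
idx = agreement_indices(a=74, b=11, c=33, d=14)
for name, value in idx.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, kappa comes out near .19 while PABAK is one third, which illustrates how unbalanced marginal totals pull kappa below the adjusted index.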
Results
SVP+ Rates
Overall, trial courts decided that 81.1% of the respondents were SVPs while evaluators judged 64.4% to be sexually violent (Table 1). In bench trials, judges alone ruled for commitment in 66.1% of the trials, while evaluators judged 53.2% to be sexually violent. Overall, court decisions were not significantly related to respondents’ age, Static-99R score, PCL-R (Psychopathy Checklist–Revised) score, time in treatment, or time incarcerated at Sand Ridge (Table 2), or to race (Caucasian/other), χ2(1) = .09, p = .76. The mean rate of SVP+ opinions by individual evaluators was 60% (range = 29%-100%). Juries found more respondents to be sexually violent (84.3%) than did judges alone (77.4%), though the difference was not significant, χ2(1) = 1.01, p = .31.
Table 1. Agreement Frequencies
Note. SVP = sexually violent person.
Table 2. Commitment Variables in Court Trials
Note. PCL-R = Psychopathy Checklist–Revised.
In precommitment trials, courts found 88.7% of the respondents were SVPs, while evaluators judged 77.4% to be an SVP. Courts were more likely to find a respondent to be an SVP in precommitment trials (88.7%) than in petition-for-discharge trials (75.9%), though the difference was not significant, χ2(1) = 3.08, p = .08. Juries found 88.1% (37/42) of respondents to be sexually violent; judges alone found 90.9% (10/11) were sexually violent. The difference between them was not significant, χ2(1) = .067, p = .80.
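The jury versus bench comparison can be checked from the counts reported above (juries 37 of 42 SVP+, judges 10 of 11 SVP+). A minimal sketch, assuming an uncorrected Pearson chi-square on the 2 × 2 table; the text does not state which variant was used, and the function name is illustrative:

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table.

    Rows are groups (e.g., jury, bench); columns are outcomes (SVP+, SVP-).
    Returns (chi2, p) for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper-tail probability of a chi-square variate with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Precommitment trials: juries 37 SVP+ / 5 SVP-, judges 10 SVP+ / 1 SVP-.
chi2, p = pearson_chi2_2x2(37, 5, 10, 1)
print(f"chi2(1) = {chi2:.3f}, p = {p:.2f}")
```

With these counts the statistic is small and the p value is far from significance, in line with the conclusion above. Small differences from the reported values may reflect rounding or a different test variant. Because one cell holds a single case, an exact test may be more appropriate than the chi-square approximation.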
In petition-for-release trials, courts decided that 75.9% of the respondents were SVPs, while evaluators judged 55.7% to be an SVP, a significant difference, χ2(1) = 7.12, p = .008.
Juries decided that 78.6% (22/28) of the patients were SVPs, while judges alone found that 74.5% (38/51) were SVPs. Again, the difference between juries and judges was not significant, χ2(1) = .16, p = .69.
Court by Evaluator Agreement
The court agreed with the evaluator’s opinion in 66.7% of the trials (Table 3), k = .188, which is considered slight (Landis & Koch, 1977) to poor (Altman, 1991; Cicchetti & Sparrow, 1981) agreement. Unbalanced marginals are clearly apparent, and both the prevalence (PI) and bias (BI) indices are elevated. Given these results, PABAK may be a better agreement measure than k (Cunningham, 2009). Applying the above k benchmarks, the PABAK value is considered fair agreement (Altman, 1991; Cicchetti & Sparrow, 1981; Landis & Koch, 1977). Court by evaluator agreement varied widely, depending on whether the evaluator judged the respondent to be an SVP. The probability that the court would agree with an SVP+ opinion was nearly 3 times the probability that it would agree with an SVP– opinion.
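The three benchmark systems assign different verbal labels to the same coefficient. The sketch below encodes the cutoffs as they are commonly cited for each source; the exact boundaries are an assumption here and should be checked against the original publications. The function names are illustrative.

```python
def landis_koch(k):
    """Landis & Koch (1977) labels, using commonly cited cutoffs."""
    if k < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if k <= upper:
            return label
    return "almost perfect"


def altman(k):
    """Altman (1991) labels, using commonly cited cutoffs."""
    for upper, label in [(0.20, "poor"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "good")]:
        if k <= upper:
            return label
    return "very good"


def cicchetti_sparrow(k):
    """Cicchetti & Sparrow (1981) labels, using commonly cited cutoffs."""
    for upper, label in [(0.40, "poor"), (0.60, "fair"), (0.75, "good")]:
        if k < upper:
            return label
    return "excellent"


k = 0.188  # overall kappa reported above
labels = {f.__name__: f(k) for f in (landis_koch, altman, cicchetti_sparrow)}
print(labels)
```

For k = .188 the labels are slight under Landis and Koch and poor under the other two systems, matching the description above.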
Table 3. Evaluator × Court Agreement Indices
Note. PI = Prevalence Index; BI = Bias Index (Cunningham, 2009); PABAK = prevalence-adjusted and bias-adjusted kappa (Byrt, Bishop, & Carlin, 1993); PPP = positive predictive power, p (agreement | evaluator SVP+ opinion) with 95% Bayesian credible interval; NPP = negative predictive power, p (agreement | evaluator SVP– opinion) with 95% Bayesian credible interval; CI = confidence interval.
Courts agreed with the evaluator in 66.0% of precommitment trials. PABAK reflected poor agreement by the k benchmarks cited earlier. Courts agreed with 86% of evaluators’ SVP+ opinions but none of their SVP– opinions. As a result, k could not be calculated. In petition-for-release trials, courts agreed with the evaluator in 67.1% of trials. Using the same k benchmarks, both k and PABAK reflect fair agreement. Courts agreed more than twice as often with the evaluators’ SVP+ opinions as with their SVP– opinions. Juries agreed with the evaluator’s opinion in 67.1% of trials overall, which reflects slight agreement (k) to poor agreement (PABAK) by benchmarks cited earlier. Juries agreed with evaluators’ SVP+ opinions 5 times as often as they did with SVP– opinions. In bench trials, judges and evaluators agreed 66.1% of the time, which is considered poor (Cicchetti & Sparrow, 1981) to fair (Altman, 1991; Landis & Koch, 1977). Judges agreed with evaluators’ SVP+ opinions more than twice as often as they did with their SVP– opinions.
Discussion
Court SVP+ Decisions
The high rate of SVP+ decisions is consistent with studies of mock SVP jurors (Krauss et al., 2012; Scurich & Krauss, 2013; Scurich & Krauss, 2014), actual SVP jurors (Boccaccini et al., 2014; Turner et al., 2015), and community residents (Levenson, Brannon, Fortney, & Baker, 2007). Because offenders are tried in their county of residence, jurors may be reluctant to release them back to their own community. Most judges in petition-for-release bench trials presided over the original commitment trial and may have been biased to confirm their original decision. The finding that court decisions were unrelated to Static-99R scores is consistent with previous studies (Boccaccini et al., 2013; Krauss et al., 2012; Turner et al., 2015).
The format that evaluators used to communicate risk may have contributed to the high rate of SVP+ decisions (Hilton, Scurich, & Helmus, 2015). Most evaluators reported Static-99/R risk categories, but only a few assigned a final risk probability. Both judges (Kwartner, Lyons, & Boccaccini, 2006) and jurors (Varela, Boccaccini, Cuervo, Murrie, & Clark, 2014) consider risk categories more useful in making decisions than numeric risk probabilities. Scurich (2018) cautioned that risk categories may inherently convey a value judgment.
Evaluator SVP+ Opinions
Evaluators varied widely in their rates of SVP+ opinions. One evaluator judged that every offender they examined was sexually violent, whereas another evaluator reached that decision in less than a third of the cases. The divergence in evaluator opinions cannot be attributed to offender characteristics because evaluators are assigned cases quasi-randomly. It cannot be due to information variance because the record format and content were similar across cases. It cannot be explained by different or ambiguous SVP+ risk thresholds because the Wisconsin statute explicitly defines “likely” as “more likely than not,” which courts interpret as greater than 50%. Last, the wide range of evaluator SVP+ opinions cannot be attributed to an allegiance effect because individual evaluators had no obvious allegiance or incentive to favor the prosecution or defense.
Most DHS evaluators use the Static-99R, but their independence means each evaluator can use it in different ways to predict risk. They can choose risk groups from either the 2009 or 2015 samples and can predict risk by either logistic regression (Elwood et al., 2017; Phenix et al., 2015) or Bayesian posterior probabilities (Elwood, 2018). Some DHS evaluators augment the Static-99R with other scales, such as the VRS-SO (Violence Risk Scale–Sexual Offender Version; Olver, Beggs-Christofferson, Grace, & Wong, 2014). They also employ different methods to predict lifetime risk of actually committing a sex offense, not merely being charged with a new offense within 10 years. Thus, the most plausible explanation for the wide disparity in the rate of evaluator SVP+ opinions is the wide variance in methods that evaluators used to predict risk. The method variance itself may reflect individual evaluator biases (Guarnera, Murrie, & Boccaccini, 2017; Neal & Brodsky, 2016).
Court × Evaluator Agreement
Trial courts showed slight to poor agreement with DHS evaluators when judged by k and poor to fair agreement when judged by PABAK. Agreement did not differ significantly between jury and bench trials or between precommitment and petition-for-release trials. Because trial courts found that most respondents were sexually violent, they obviously agreed more with evaluators who most often shared that opinion. Agreement rates varied widely between SVP+ and SVP– evaluator opinions. Overall, courts were 3 times more likely to agree with an evaluator’s SVP+ opinion than with an evaluator who supported dismissal or discharge. Juries were 5 times more likely to agree with evaluators’ SVP+ opinions while judges were more than twice as likely. In precommitment cases, none of the 53 courts agreed with an evaluator who judged that the respondent was not an SVP.
These results are consistent with previous findings that mock jurors tend to reject evaluators’ opinions that deviate from their preferred opinion before hearing the case (Scurich & Krauss, 2013). Clearly, jurors’ preferred outcome in this study was commitment. Court decisions were unrelated to variables that have been found to predict risk (e.g., Static-99R score and time in treatment), which is again consistent with previous research. Turner et al. (2015) contend that evaluators need to better educate courts on the value of evidence-based scales like the Static-99R, over intuitive factors that do not predict recidivism. However, Turner et al. (2015) themselves reported that informing jurors of the research on risk factors had no appreciable effect on their opinion. Moreover, an evaluator’s opportunity to educate jurors depends on the questions raised by attorneys.
Limitations
The results of this study may not generalize across states that have different statutes and definitions and use different methods to evaluate, commit, and discharge offenders deemed to be sexually violent (DeMatteo, Murphy, Galloway, & Krauss, 2015). I did not assess other variables that may be related to the agreement between courts and evaluators. For example, I did not assess differences between opposing attorneys. Prosecutors with more experience in SVP cases than defense attorneys may have made more compelling cases for commitment. Likewise, I did not assess individual differences between DHS evaluators or poll judges or jurors about how they perceived the quality of evaluators’ testimony. Different evaluator methods, experience, and demeanor on the stand may have affected court decisions.
Implications for Public Policy and Forensic Practice
Public Policy
One possible way to reduce the bias toward SVP+ decisions in precommitment trials is to identify high-risk sex offenders and provide evidence-based treatment in either a DOC or DHS facility before they are released, rather than wait until they complete their criminal sentence. In that way, offenders would have the opportunity to reduce their risk enough through treatment so that DOC would not refer them for commitment. DeMatteo et al. (2015) point out that while SVP programs in the United States detain offenders under civil commitment laws, other countries with postsentence detention (Canada, Germany, Australia, and the United Kingdom) manage detainees within their criminal justice system.
One possible remedy for the bias in SVP+ decisions in petition-for-release decisions is to remove trials from discharge decisions altogether. The Iowa Department of Human Services (DHS) annually examines offenders who were committed as SVPs. If an evaluator recommends discharge, the Civil Commitment Unit for Sexual Offenders (CCUSO) usually stipulates to release (patients do not petition for release themselves). A judge then typically orders a release plan without convening a jury. Most release plans include supervision for a year before the patient is discharged from commitment. The Iowa practice avoids an adversarial trial and may mean that courts are better informed by expert witness testimony when making SVP discharge decisions.
Forensic Practice
If evaluators’ risk predictions vary widely, they cannot all be equally credible. Guarnera et al. (2017) and Neal and Brodsky (2016) contend that the first step in addressing evaluator bias is to track evaluator opinions. I contend the second step is to calibrate evaluators by reducing method variance. Calibration has long been used to improve interrater agreement in behavioral observation and diagnoses of mental disorders. In the SVP context, each evaluator’s opinion could be compared with either a standard protocol or the modal opinion in a group of peers. An evaluator whose opinion deviates from a standard or a norm could use that feedback to either revise their method or offer a rationale to support it. Ongoing calibration would reflect the evolving sex offender risk assessment literature.
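Tracking evaluator opinions against a group norm, as described above, can be sketched in a few lines. This is an illustration only, not a procedure from the study. The counts are hypothetical, the pooled SVP+ rate stands in for the "norm," and the 0.20 tolerance is an arbitrary placeholder that a peer-review group would need to set for itself.

```python
def flag_outlying_evaluators(counts, tolerance=0.20):
    """Compare each evaluator's SVP+ rate with the pooled rate.

    counts: {evaluator_id: (svp_plus_opinions, total_opinions)}
    tolerance: hypothetical cutoff on the absolute difference in rates.
    Returns (pooled_rate, {evaluator_id: (rate, difference, flagged)}).
    """
    total_plus = sum(plus for plus, _ in counts.values())
    total_n = sum(n for _, n in counts.values())
    pooled = total_plus / total_n
    report = {}
    for evaluator, (plus, n) in counts.items():
        rate = plus / n
        diff = rate - pooled
        report[evaluator] = (rate, diff, abs(diff) > tolerance)
    return pooled, report


# Hypothetical counts for illustration only; not data from the study.
example = {"A": (10, 10), "B": (4, 14), "C": (6, 10)}
pooled, report = flag_outlying_evaluators(example)
for evaluator, (rate, diff, flagged) in report.items():
    print(f"{evaluator}: rate={rate:.2f}, diff={diff:+.2f}, flagged={flagged}")
```

A flag in this sketch is a prompt for review rather than a finding of error: the evaluator could either revise the method or document a rationale for it.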
I propose the best method to calibrate SVP evaluators is through formal, recurring, peer review. Peer review is widely used by medical facilities, is considered a best practice, and is mandated by the Joint Commission on Accreditation of Healthcare Organizations (Vyas & Hozain, 2014). Peer review can be used to assess evaluators’ forensic competence and methods, just as it is used to assess physicians’ medical competence and procedures. The Specialty Guidelines for Forensic Psychology (American Psychological Association [APA], 2011) and Standards for Educational and Psychological Testing (APA, 2014) provide a structure for peer review standards, similar to clinical practice guidelines in medicine. Internal peer review could be augmented by periodic, informal external review.
Conclusion
I believe this is the first study that directly compares evaluator opinions with court decisions in actual SVP commitment cases in the United States. Because the study assessed juries and judges in actual SVP trials, the results can be interpreted directly. Wisconsin trial courts committed or sustained commitment of most respondents. In both precommitment and petition-for-release trials, courts showed only slight to fair agreement with the evaluator’s opinion. They agreed far more often with evaluators who supported commitment than with evaluators who supported dismissal or discharge. Evaluators differed widely in their rates of SVP+ opinions, likely due to wide method variance.
The study is important because it extends previous analog research on mock jurors and simulated trials to sworn jurors in actual SVP commitment trials. It supports Scurich and Krauss’ (2014) contention that jurors have limited ability to evaluate expert witness testimony or recidivism risk, thereby rendering the legal criteria to discriminate SVPs “largely superfluous” (p. 101). The results of this study suggest the need to consider changes in both public policy and forensic practice.
Footnotes
Author’s Note:
Sharon Kelley, Rachel Kahn, and James Mundt contributed to this study. All opinions expressed in this article are my own and not necessarily those of the Sand Ridge Evaluation Unit or the Wisconsin Department of Health Services.
