Agreement Between Courts and SVP Evaluators in the State of Wisconsin

Abstract

The study compares agreement between Department of Health Services (DHS) evaluators’ opinions and court decisions in all 132 sexually violent person (SVP) trials in Wisconsin from 2012 through 2016. Previous research on mock jurors in simulated SVP cases may not extend to real-world SVP legal proceedings. This is the first study to directly compare evaluator opinions with court decisions in actual SVP commitment cases. Trial courts found 81% of participants to be SVPs (SVP+). Courts agreed with the DHS evaluator’s opinion in 67% of the cases, which represents slight to fair agreement beyond chance. Trial courts agreed with evaluators’ SVP+ opinions far more often than they did with evaluators’ SVP– (not SVP) opinions. The rates of SVP+ opinions differed widely among DHS evaluators. The implications for public policy and forensic practice are discussed.

Keywords

civil commitment, sexually violent persons, risk assessment

Twenty U.S. states have enacted laws to permit the civil commitment and confinement of sex offenders as sexually violent persons (SVPs) after they complete their prison term. Some states refer to “sexually violent predator,” “dangerous sex respondent,” or “sexually dangerous individual.” Such laws typically define an SVP as a convicted sex offender who (a) has a mental condition that predisposes them to commit certain sex offenses and (b) poses a specified risk to reoffend. In determining whether a respondent meets an SVP risk threshold, courts consider the testimony of expert witness forensic evaluators. Courts generally accept the opinions of mental health professionals in forensic decisions, such as competence to stand trial (Gowensmith, Murrie, & Boccaccini, 2012; Zapf, Hubbard, Cooper, Wheeles, & Ronan, 2004), legal sanity (Gowensmith, Murrie, & Boccaccini, 2013; Guarnera & Murrie, 2017; Murrie & Warren, 2005), and involuntary detention (Hilton & Simmons, 2001). However, there has been little research on the agreement between evaluator opinions and court decisions in SVP trials.

Studies of simulated and actual SVP trials have consistently found high rates of commitment decisions. The single SVP court in Texas committed 100% of respondents (Boccaccini, Murrie, & Turner, 2014; Turner, Boccaccini, Murrie, & Harris, 2015). In online polls, 84% to 85% of community residents chose to commit the respondent, even if all they knew was that the respondent had been referred as an SVP (Scurich & Krauss, 2013, 2014). In California, 82% of mock jurors favored commitment. Last, like most states, Texas does not define “likely” to reoffend. Half of the jurors in a study of SVP trials in Texas judged that 1% risk meant likely to reoffend; most said 15% risk meant likely (Knighton, Murrie, Boccaccini, & Turner, 2014).

Perceived Credibility of Expert Testimony

Evaluator Credibility

Studies on the perceived credibility of evaluators have also yielded mixed results. Boccaccini et al. (2014) found that jurors attributed disagreement between evaluators to the complexity of the case, not to whether the defense or prosecution retained the evaluator. However, jurors in Boccaccini, Turner, Murrie, Henderson, and Chevalier (2013) judged that prosecution experts were more credible and better able to predict recidivism than defense experts, but they were more skeptical when hearing testimony from opposing experts.

The Static-99R

Most SVP evaluators use the Static-99R (Phenix, Helmus, & Hanson, 2015) or its predecessor, the Static-99 (Phenix, Helmus, & Hanson, 2012), to predict individuals’ risk of sexual recidivism (Kelley, Ambroziak, Thornton, & Barahal, 2018; Neal & Grisso, 2014).

Research on juror confidence in the Static-99/R in SVP trials has yielded mixed results. In several studies, Boccaccini and his colleagues polled jurors after they had served in SVP trials in Texas. Every respondent was committed (one case resulted in a mistrial, but the individual was committed at the retrial). Most jurors in one study judged that evaluators who used actuarial scales like the Static-99R could predict risk more accurately (Boccaccini et al., 2014). However, jurors in two other studies (Krauss, McCabe, & Lieberman, 2012; Turner et al., 2015) said they were influenced less by the Static-99 than by other factors that are poor predictors of recidivism. Jurors in both studies judged prosecution evaluators to be more credible. In a third study, jurors’ judgments of offenders’ risk were unrelated to their Static-99 score (Boccaccini et al., 2013).

Scurich and Krauss (2013) recruited subjects online to read details of an actual SVP case and decide if the respondent should be committed. They manipulated the actuarial score and found that a high Static-99 score increased the rate of commit decisions, but a low Static-99 score did not decrease the commit rate. Subjects accepted information that indicated high risk but rejected information that indicated low risk, even though the information came from the same source. Using the same procedure, Scurich and Krauss (2014) found the rate of commitment judgments did not depend on the information they provided subjects; 89% of subjects concluded that just being referred as an SVP justified commitment. Krauss et al. (2012) recruited subjects in California who had been excused from jury duty. After watching a video of a simulated SVP trial, most of the mock jurors said their decision to commit the offender was influenced more by clinical testimony than by an actuarial scale.

Limitations of Previous Research

Mock jurors formed their opinions on their own, after reviewing a brief synopsis of the case, knowing their opinion had no real-world consequence. In actual trials, jurors spend several days listening to testimony, legal arguments, and instructions from the bench before deliberating with other jurors. They also know their verdicts can impose or continue an individual’s confinement. Also, research subjects are not randomly selected and may not represent individuals who are called to jury duty. Therefore, the results of analog studies on juror opinions may not generalize to real-life SVP trials.

Much of the SVP research was conducted in Texas, which is unique in having an outpatient-only SVP program. Jurors may be more willing to commit respondents, knowing they will be supervised in the community rather than incarcerated. Also, all SVP trials in Texas are heard in Montgomery County, judged the third most conservative county in that state (Jones, 2014). Of the 20 states with SVP laws, Texas is one of only five states considered highly conservative, according to a 2017 Gallup poll (Saad, 2018). Thus, the results of the Texas SVP studies may not generalize to other states.

To address those limitations, I compared DHS evaluator opinions and court verdicts in all 132 Wisconsin SVP trials from 2012 through 2016. Direct comparisons of evaluators, juries, and judges in actual SVP trials overcome the problems of mock trials. Wisconsin is similar to most other states with SVP programs. Thus, the results of the study should generalize to other states.

The Wisconsin SVP Program

Background

The Wisconsin SVP statute (Wisconsin State Legislature, 2019) was enacted in May 1994. It defines an SVP as a convicted sex offender who suffers from a condition that predisposes them to commit sexually violent acts and makes them likely (more likely than not) to commit a sexually violent offense. Wisconsin courts interpret “more likely than not” to mean greater than 50% probability (State v. Smalley, 2007). Sexually violent offenses against persons aged 16 and over include sexual assaults that use or threaten to use a dangerous weapon or that cause pregnancy (first degree); that use or threaten to use violence; that cause injury, disease, or mental anguish; or if the victim is unconscious or intoxicated (second degree); or that have sexual contact without the person’s consent (third degree). Sexually violent offenses against children include sexual assault of a child under 13 years old (first degree), sexual assault of a child under 16 (second degree), repeated sexual assault of the same child, incest with a child, child enticement, and sexual assault of a child placed in care. Sexually violent offenses may also include intentional or reckless homicide, battery, false imprisonment, burglary, and robbery if they were sexually motivated. Sexually violent offenses need not include physical force or threat.

Precommitment

The Department of Corrections (DOC) reviews all inmates convicted of a sex offense before they are released from prison (C. Tire, personal communication, August 16, 2017). A DOC specialist first screens the inmates, referring about 18% of them to a review board. Of those, the board refers about half for a Special Purpose Evaluation (SPE) by a DOC evaluator. SPE evaluators find that about a third of those offenders meet criteria for commitment. By this process, the DOC refers about 3% of convicted sex offenders to the Department of Justice (DOJ) or District Attorney, which can petition the circuit court in the inmate’s county of residence. The court typically finds probable cause and orders the individual to be detained, pending a commitment trial.
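The overall referral rate follows from multiplying the three stage rates. The short Python sketch below checks that arithmetic; the variable names are illustrative, and the stage rates are the rounded figures given in the text, so the result is approximate.

```python
# Approximate stage rates described in the text (all values are rounded).
screen_rate = 0.18  # share of inmates the DOC specialist refers to the review board
board_rate = 0.50   # share the board refers for a Special Purpose Evaluation (SPE)
spe_rate = 1 / 3    # share that SPE evaluators find meet commitment criteria

# The overall referral rate is the product of the three conditional rates.
overall_rate = screen_rate * board_rate * spe_rate
print(f"Overall referral rate: {overall_rate:.1%}")
```

The product is about 3%, which matches the overall rate reported for the DOC screening process.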

Once the court has found probable cause for commitment, a DHS evaluator is assigned to examine the respondent, providing an independent second opinion. “Respondent” here denotes an offender being evaluated for commitment while “patient” denotes an offender who has been committed for treatment. Evaluators are assigned cases on a quasi-random basis, based on the evaluator’s workload and trial schedules, without regard to details of the cases. Both the DOC and DHS evaluators who examined the respondent testify as expert witnesses at the precommitment trial, along with any private evaluators retained by the prosecution or defense. The DOC evaluator always supports commitment; otherwise, the DOC would not have referred the respondent for commitment. In precommitment trials, the respondent, the state, and the court each has the right to demand a jury. Most precommitment trials are heard by juries, whose verdict must be unanimous.

Postcommitment

Once committed, patients must be examined annually by a DHS evaluator to determine whether they still meet criteria for commitment. SVPs may petition the court to be discharged from commitment. If the court finds probable cause, it convenes a trial. The last DHS evaluator to examine the patient is called to testify, along with the DOC SPE evaluator and any private evaluators retained by the prosecution and defense. If both DOC and DHS evaluators agree, the case rarely goes to trial unless either side can retain an independent evaluator with an opposing opinion. Thus, the courts consider opposing opinions of at least two evaluators in virtually all SVP cases. In petition-for-release trials, the subject, his attorney, or the prosecution may request a six-person jury. In practice, about half the petition-for-release trials are bench trials, heard by a judge alone. In jury trials, five out of six jurors must agree on a verdict.

DHS Examinations

Evaluators diagnose virtually every offender with at least one predisposing mental disorder. Thus, their SVP+ opinions almost always rest on the offender’s risk to reoffend. Evaluators routinely reported a Static-99R recidivism rate and risk category but most did not assign a final risk probability. Instead, they stated whether the subject’s risk to reoffend was more likely than not, or greater than 50%. DHS evaluators were arguably less susceptible to an allegiance effect, in which expert witness opinions tend to favor the side that retains them (Chevalier, Boccaccini, Murrie, & Varela, 2015; Murrie & Boccaccini, 2015; Murrie, Boccaccini, Guarnera, & Rufino, 2013; Murrie et al., 2009; Rufino, Boccaccini, Hawes, & Murrie, 2012). Although the evaluators are state employees, the state does not “retain” them nor does DHS influence them to favor either side. DHS evaluators may take SVP cases privately, but they cannot privately examine a patient whom they had ever examined for DHS or vice versa. Wisconsin currently confines about 330 individuals who were committed as SVPs or who are confined pending a precommitment trial. As of June 2016, 22 years since Wisconsin enacted its SVP program, courts had discharged 130 patients from commitment (Whitaker, 2017).

Method

Procedure

I compared the opinions of DHS evaluators with court decisions as to whether or not a respondent was an SVP (SVP+ or SVP–) in all 132 SVP trials in Wisconsin from 2012 through 2016. Of those, 53 were precommitment and 79 were petitions for discharge. Seventy cases were jury trials and 62 were bench trials. An institutional review board approved the study. Evaluator opinions and subject variables were obtained from a database maintained by the Sand Ridge Evaluation Unit. Court decisions were drawn from the Circuit Court Case Management Database maintained by the State of Wisconsin. All 13 evaluators who were employed by the Sand Ridge Evaluation Unit at the time the 132 cases were tried in court participated in the study (eight men, five women). Each was a doctoral-level, licensed psychologist. Evaluators testified in an average of 10.2 trials (132 trials across 13 evaluators; SD = 4.1; range = 4-22) from 2012 through 2016.

Participants

Participants were adult male sex offenders who had either been detained under probable cause pending a commitment trial (respondents) or who had been committed for treatment (patients). I did not include 56 cases in which the court accepted a stipulated agreement in lieu of a trial either before or after commitment. The participant mean age was 51.0 years (SD = 10.7; range = 24-80). Ethnic/racial composition was 61.1% White, 31.1% African American, 6.1% Native American, and 1.6% other or mixed ethnicity. Mean time at Sand Ridge was 100 months (SD = 70; range = 3.2 months to 21 years; Mdn = 94.2 months). Of the 88 (66.7%) who had participated in sex offender treatment, their mean time in treatment was 80 months (SD = 60; range = 4 days to 15.4 years; Mdn = 91.5 months).

All participants had been scored on the Static-99R, an actuarial scale composed of 10 empirically derived recidivism risk factors. Like the earlier Static-99, the Static-99R yields a total score that corresponds to a 10-year recidivism rate of sex offenders who are released from custody. The rate can be used to predict an individual’s risk of sexual recidivism (Elwood, 2016). The Static-99 and Static-99R have shown moderate validity in discriminating sexual recidivists (Hanson, Lunetta, Phenix, Neeley, & Epperson, 2014; Helmus, Hanson, Thornton, Babchishin, & Harris, 2012; Reeves, Ogloff, & Simmons, 2017) and have been found to outperform clinical judgment (Bengtson & Långström, 2008).

The mean Static-99R score was 5.40 (SD = 1.7; range = 1-9), which predicts that about 35% of high-risk sex offenders would be charged with another sex offense within 10 years of release from custody, using either the 2009 Static-99R high-risk recidivism rates (Elwood, Kelley, & Mundt, 2017) or the 2015 high-risk rates (Hanson, Thornton, Helmus, & Babchishin, 2016). Participants had also been scored on the Psychopathy Checklist–Revised (PCL-R; Hare, 2003), a semistructured scale that is widely used to quantify psychopathy. Some studies have found that a high PCL-R score predicts sexual recidivism, especially when combined with sexual deviance (Hawes, Boccaccini, & Murrie, 2012). Participants’ mean most-recent PCL-R score was 25.2 (SD = 5.0; range = 9-37.5), which Hare (2003) considers high psychopathy.

Analyses

I calculated evaluator × court agreement indices for all trials combined and separately for precommitment and petition-for-release trials and for jury and bench trials. I used kappa (k; Cohen, 1960) because it is the most common measure of agreement beyond that expected by chance. However, k is highly sensitive to differences in the marginal totals of the 2 × 2 table, which can yield paradoxical results (Cicchetti & Feinstein, 1990; Feinstein & Cicchetti, 1990): (a) a high prevalence (base rate) of one outcome can result in a low k despite high agreement and (b) a rater bias toward one outcome can result in a high k despite low agreement. Prevalence and bias can be quantified by separate indices, the Prevalence Index (PI) and the Bias Index (BI; Cunningham, 2009), both of which equal 0 if neither effect is present. To address these paradoxes, I calculated prevalence-adjusted and bias-adjusted kappa (PABAK; Byrt, Bishop, & Carlin, 1993) and positive and negative predictive values (PPV and NPV; labeled PPP and NPP in Table 3) separately for positive and negative categories (SVP+/SVP–). PPV and NPV are the conditional probabilities that the court’s decision will agree with the evaluator’s SVP+ or SVP– opinion, respectively. I calculated PPV and NPV and their Bayesian credible intervals from 2 × 2 tables using an online calculator (Post_Test_Probabilities; Crawford, Garthwaite, & Betkowska, 2009). I report several benchmarks (Altman, 1991; Cicchetti & Sparrow, 1981; Landis & Koch, 1977) for k and PABAK values because no consensus has emerged over which benchmark is most useful.
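The agreement indices described above can be computed directly from the four cells of a 2 × 2 table. The following Python sketch uses the standard formulas for observed agreement, Cohen's kappa, PI, BI, PABAK, and the simple (non-Bayesian) predictive proportions. The function name and cell labels are illustrative assumptions. The example counts are the overall cell frequencies reported in Table 1. The sketch does not reproduce the Bayesian credible intervals from the online calculator, and it does not guard against empty rows or perfect chance agreement.

```python
def agreement_indices(a, b, c, d):
    """Agreement statistics for a 2 x 2 evaluator-by-court table.

    a: evaluator SVP+, court SVP+    b: evaluator SVP+, court SVP-
    c: evaluator SVP-, court SVP+    d: evaluator SVP-, court SVP-
    """
    n = a + b + c + d
    p_obs = (a + d) / n  # observed proportion of agreement
    # Agreement expected by chance, from the row and column totals.
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return {
        "agreement": p_obs,
        "kappa": (p_obs - p_exp) / (1 - p_exp),
        "PI": (a - d) / n,       # prevalence index
        "BI": (b - c) / n,       # bias index
        "PABAK": 2 * p_obs - 1,  # prevalence- and bias-adjusted kappa
        "PPV": a / (a + b),      # p(court SVP+ | evaluator SVP+)
        "NPV": d / (c + d),      # p(court SVP- | evaluator SVP-)
    }

# Overall cell counts from Table 1.
overall = agreement_indices(74, 11, 33, 14)
for name, value in overall.items():
    print(f"{name}: {value:.3f}")
```

With the overall counts, kappa is about .19 and PABAK about .33, which shows how unbalanced marginals pull kappa well below the raw agreement rate of about 67%.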

Results

SVP+ Rates

Overall, trial courts decided that 81.1% of the respondents were SVPs while evaluators judged 64.4% to be sexually violent (Table 1). In bench trials, judges alone ruled for commitment in 77.4% of the trials, while evaluators judged 53.2% to be sexually violent. Overall, court decisions were not significantly related to respondents’ age, Static-99R score, PCL-R (Psychopathy Checklist–Revised) score, time in treatment, or time incarcerated at Sand Ridge (Table 2) or race, Caucasian/other: χ²(1) = .09, p = .76. The mean rate of SVP+ opinions by individual evaluators was 60% (range = 29%-100%). Juries found more respondents to be sexually violent (84.3%) than did judges alone (77.4%), though the difference was not significant, χ²(1) = 1.01, p = .31.

Table 1:

Agreement Frequencies

Trial type	Evaluator opinion	Court SVP+	Court SVP–	Total
Overall	SVP+	74	11	85
Overall	SVP–	33	14	47
Overall	Total	107	25	132
Precommitment	SVP+	35	6	41
Precommitment	SVP–	12	0	12
Precommitment	Total	47	6	53
Petition for release	SVP+	39	5	44
Petition for release	SVP–	21	14	35
Petition for release	Total	60	19	79

Note. SVP = sexually violent person.

Table 2:

Commitment Variables in Court Trials

Variable	Discharge/Dismiss (n = 25) M	Discharge/Dismiss (n = 25) SD	Commit (n = 107) M	Commit (n = 107) SD	t(130)	p	Cohen’s d
Age in years	50.00	10.52	51.28	10.79	.54	.59	.12
Static-99R score	5.12	1.45	5.47	1.70	.94	.35	.22
PCL-R score	25.51	5.23	25.17	4.96	−.31	.76	.07
Years in treatment	5.00	6.30	4.53	4.94	−.40	.69	.08
Years at Sand Ridge	10.00	5.37	7.92	5.87	−1.62	.11	.39

Note. PCL-R = Psychopathy Checklist–Revised.
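The group comparisons in Table 2 can be approximated from the summary statistics alone. The Python sketch below assumes an independent-samples t test with pooled variance, which is consistent with the reported 130 degrees of freedom but is not stated in the text. The function name is illustrative. Because the inputs are rounded means and standard deviations, recomputed values (Cohen's d in particular) can differ slightly from the published ones.

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance t and Cohen's d for the difference (group 2 - group 1)."""
    df = n1 + n2 - 2
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    t = (m2 - m1) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    d = abs(m2 - m1) / pooled_sd
    return t, d, df

# Years at Sand Ridge: discharge/dismiss group (n = 25) vs. commit group (n = 107).
t, d, df = pooled_t(10.00, 5.37, 25, 7.92, 5.87, 107)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

The sign of t follows the commit-minus-discharge convention used in the table, so a negative value means the committed group had the lower mean.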

In precommitment trials, courts found 88.7% of the respondents were SVPs, while evaluators judged 77.4% to be an SVP. Courts were more likely to find a respondent to be an SVP in precommitment trials (88.7%) than in petition-for-discharge trials (75.9%), though the difference was not significant, χ²(1) = 3.08, p = .08. Juries found 88.1% (37/42) of respondents to be sexually violent; judges alone found 90.9% (10/11) were sexually violent. The difference between them was not significant, χ²(1) = .067, p = .80.

In petition-for-release trials, courts decided that 75.9% of the respondents were SVPs, while evaluators judged 55.7% to be an SVP, a significant difference, χ²(1) = 7.12, p = .008.

Juries decided that 78.6% (22/28) of the patients were SVPs, while judges alone found that 74.5% (38/51) were SVPs. Again, the difference between juries and judges was not significant, χ²(1) = .16, p = .69.
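The chi-square comparisons in this section can be checked with a short calculation. The Python sketch below assumes a Pearson chi-square test without continuity correction, which the text does not specify. The function name is illustrative. The example uses the overall jury and bench counts implied by the reported percentages (59 of 70 jury verdicts and 48 of 62 bench verdicts were SVP+).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table, df = 1."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1, the upper-tail p-value equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Jury trials: 59 SVP+ and 11 SVP- verdicts; bench trials: 48 SVP+ and 14 SVP-.
chi2, p = chi_square_2x2(59, 11, 48, 14)
print(f"chi2(1) = {chi2:.2f}, p = {p:.2f}")
```

The same function applies to any of the jury-versus-bench or precommitment-versus-release comparisons once the four cell counts are known.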

Court by Evaluator Agreement

The court agreed with the evaluator’s opinion in 66.7% of the trials (Table 3). Overall, k = .188, which is considered slight (Landis & Koch, 1977) to poor (Altman, 1991; Cicchetti & Sparrow, 1981) agreement. The marginal totals are clearly unbalanced, and both the prevalence (PI) and bias (BI) indices are elevated. Given these results, PABAK may be a better agreement measure than k (Cunningham, 2009). Applying the above k benchmarks, the PABAK value is considered poor (Cicchetti & Sparrow, 1981) to fair (Altman, 1991; Landis & Koch, 1977) agreement. Court by evaluator agreement varied widely, depending on whether the evaluator judged the respondent to be an SVP. The probability that the court would agree with an SVP+ opinion was nearly 3 times the probability that it would agree with an SVP– opinion.

Table 3:

Evaluator × Court Agreement Indices

Trial	n	% agreement	k	PI	BI	PABAK	PPP (95% CI)	NPP (95% CI)	PPP/NPP
Overall	132	66.7	.188	.46	−.17	.334	87.1 [79.3, 93.3]	29.8 [17.7, 43.4]	2.9
Precommitment	53	66.0	<0	.66	−.11	.320	86.3 [74.3, 94.9]	0.00 [.00, 17.1]	∞
Petition for release	79	67.1	.300	.32	−.20	.342	88.6 [77.8, 96.1]	40.0 [24.6, 56.5]	2.2
Jury	70	67.1	.015	.59	−.10	.342	84.6 [73.8, 93.0]	16.7 [3.8, 36.5]	5.1
Bench	62	66.1	.298	.31	−.24	.322	90.9 [79.4, 98.0]	37.9 [21.6, 55.9]	2.4

Note. PI = Prevalence Index; BI = Bias Index (Cunningham, 2009); PABAK = prevalence-adjusted and bias-adjusted kappa (Byrt, Bishop, & Carlin, 1993); PPP = positive predictive power, p (agreement | evaluator SVP+ opinion); NPP = negative predictive power, p (agreement | evaluator SVP– opinion); CI = confidence interval, reported here as a 95% Bayesian credible interval.

Courts agreed with the evaluator in 66.0% of precommitment trials. PABAK reflected poor agreement by the k benchmarks cited earlier. Courts agreed with 86% of evaluators’ SVP+ opinions but none of their SVP– opinions. As a result, k was less than 0 (Table 3), indicating agreement no better than chance. In petition-for-release trials, courts agreed with the evaluator in 67.1% of trials. Using the same k benchmarks, both k and PABAK reflect fair agreement. Courts agreed more than twice as often with the evaluators’ SVP+ opinions as with their SVP– opinions. Juries agreed with the evaluator’s opinion in 67.1% of trials overall, which reflects slight agreement (k) to poor agreement (PABAK) by benchmarks cited earlier. Juries agreed with evaluators’ SVP+ opinions 5 times as often as they did with SVP– opinions. In bench trials, judges and evaluators agreed 66.1% of the time, which is considered poor (Cicchetti & Sparrow, 1981) to fair (Altman, 1991; Landis & Koch, 1977). Judges agreed with evaluators’ SVP+ opinions more than twice as often as they did with their SVP– opinions.

Discussion

Court SVP+ Decisions

The high rate of SVP+ decisions is consistent with studies of mock SVP jurors (Krauss et al., 2012; Scurich & Krauss, 2013; Scurich & Krauss, 2014), actual SVP jurors (Boccaccini et al., 2014; Turner et al., 2015), and community residents (Levenson, Brannon, Fortney, & Baker, 2007). Because offenders are tried in their county of residence, jurors may be reluctant to release them back to their own community. Most judges in petition-for-release bench trials presided over the original commitment trial and may have been biased to confirm their original decision. The finding that court decisions were unrelated to Static-99R scores is consistent with previous studies (Boccaccini et al., 2013; Krauss et al., 2012; Turner et al., 2015).

The format that evaluators used to communicate risk may have contributed to the high rate of SVP+ decisions (Hilton, Scurich, & Helmus, 2015). Most evaluators reported Static-99/R risk categories, but only a few assigned a final risk probability. Both judges (Kwartner, Lyons, & Boccaccini, 2006) and jurors (Varela, Boccaccini, Cuervo, Murrie, & Clark, 2014) consider risk categories more useful in making decisions than numeric risk probabilities. Scurich (2018) cautioned that risk categories may inherently convey a value judgment.

Evaluator SVP+ Opinions

Evaluators varied widely in their rates of SVP+ opinions. One evaluator judged that every offender they examined was sexually violent, whereas another evaluator reached that decision in less than a third of the cases. The divergence in evaluator opinions cannot be attributed to offender characteristics because evaluators are assigned cases quasi-randomly. It cannot be due to information variance because the record format and content were similar across cases. It cannot be explained by different or ambiguous SVP+ risk thresholds because the Wisconsin statute explicitly defines “likely” as “more likely than not,” which courts interpret as greater than 50%. Last, the wide range of evaluator SVP+ opinions cannot be attributed to an allegiance effect because individual evaluators had no obvious allegiance or incentive to favor the prosecution or defense.

Most DHS evaluators use the Static-99R, but their independence means each evaluator can use it in different ways to predict risk. They can choose risk groups from either the 2009 or 2015 samples and can predict risk by either logistic regression (Elwood et al., 2017; Phenix et al., 2015) or Bayesian posterior probabilities (Elwood, 2018). Some DHS evaluators augment the Static-99R with other scales, such as the VRS-SO (Violence Risk Scale–Sexual Offender Version; Olver, Beggs-Christofferson, Grace, & Wong, 2014). They also employ different methods to predict lifetime risk of actually committing a sex offense, not merely being charged with a new offense within 10 years. Thus, the most plausible explanation for the wide disparity in the rate of evaluator SVP+ opinions is the wide variance in methods that evaluators used to predict risk. The method variance itself may reflect individual evaluator biases (Guarnera, Murrie, & Boccaccini, 2017; Neal & Brodsky, 2016).

Court × Evaluator Agreement

Trial courts showed slight to poor agreement with DHS evaluators when judged by k and poor to fair agreement when judged by PABAK. Agreement did not differ significantly between jury and bench trials or between precommitment and petition-for-release trials. Because trial courts found that most respondents were sexually violent, they obviously agreed more with evaluators who most often shared that opinion. Agreement rates varied widely between SVP+ and SVP– evaluator opinions. Overall, courts were nearly 3 times as likely to agree with an evaluator’s SVP+ opinion as with an evaluator who supported dismissal or discharge. Juries were 5 times as likely to agree with evaluators’ SVP+ opinions, while judges were more than twice as likely. In the 12 precommitment cases in which the evaluator judged that the respondent was not an SVP, the court never agreed.

These results are consistent with previous findings that mock jurors tend to reject evaluators’ opinions that deviate from their preferred opinion before hearing the case (Scurich & Krauss, 2013). Clearly, jurors’ preferred outcome in this study was commitment. Court decisions were unrelated to variables that have been found to predict risk (e.g., Static-99R score and time in treatment), which is again consistent with previous research. Turner et al. (2015) contend that evaluators need to better educate courts on the value of evidence-based scales like the Static-99R, over intuitive factors that do not predict recidivism. However, Turner et al. (2015) themselves reported that informing jurors of the research on risk factors had no appreciable effect on their opinion. Moreover, an evaluator’s opportunity to educate jurors depends on the questions raised by attorneys.

Limitations

The results of this study may not generalize across states that have different statutes and definitions and use different methods to evaluate, commit, and discharge offenders deemed to be sexually violent (DeMatteo, Murphy, Galloway, & Krauss, 2015). I did not assess other variables that may be related to the agreement between courts and evaluators. For example, I did not assess differences between opposing attorneys. Prosecutors with more experience in SVP cases than defense attorneys may have made more compelling cases for commitment. Likewise, I did not assess individual differences among DHS evaluators or poll judges or jurors on how they perceived the quality of evaluators’ testimony. Different evaluator methods, experience, and demeanor on the stand may have affected court decisions.

Implications for Public Policy and Forensic Practice

Public Policy

One possible way to reduce the bias toward SVP+ decisions in precommitment trials is to identify high-risk sex offenders and provide evidence-based treatment in either a DOC or DHS facility before they are released, rather than wait until they complete their criminal sentence. In that way, offenders would have the opportunity to reduce their risk enough through treatment so that DOC would not refer them for commitment. DeMatteo et al. (2015) point out that while SVP programs in the United States detain offenders under civil commitment laws, other countries with postsentence detention (Canada, Germany, Australia, and the United Kingdom) manage detainees within their criminal justice system.

One possible remedy for the bias toward SVP+ decisions in petition-for-release trials is to remove trials from discharge decisions altogether. The Iowa Department of Human Services (DHS) annually examines offenders who were committed as SVPs. If an evaluator recommends discharge, the Civil Commitment Unit for Sexual Offenders (CCUSO) usually stipulates to release (patients do not petition for release themselves). A judge then typically orders a release plan without convening a jury. Most release plans include supervision for a year before the patient is discharged from commitment. The Iowa practice avoids an adversarial trial and may mean that courts are better informed by expert witness testimony when making SVP discharge decisions.

Forensic Practice

If evaluators’ risk predictions vary widely, they cannot all be equally credible. Guarnera et al. (2017) and Neal and Brodsky (2016) contend that the first step in addressing evaluator bias is to track evaluator opinions. I contend the second step is to calibrate evaluators by reducing method variance. Calibration has long been used to improve interrater agreement in behavioral observation and diagnoses of mental disorders. In the SVP context, each evaluator’s opinion could be compared with either a standard protocol or the modal opinion in a group of peers. An evaluator whose opinion deviates from a standard or a norm could use that feedback to either revise their method or offer a rationale to support it. Ongoing calibration would reflect the evolving sex offender risk assessment literature.

I propose the best method to calibrate SVP evaluators is through formal, recurring, peer review. Peer review is widely used by medical facilities, is considered a best practice, and is mandated by the Joint Commission on Accreditation of Healthcare Organizations (Vyas & Hozain, 2014). Peer review can be used to assess evaluators’ forensic competence and methods, just as it is used to assess physicians’ medical competence and procedures. The Specialty Guidelines for Forensic Psychology (American Psychological Association [APA], 2011) and Standards for Educational and Psychological Testing (APA, 2014) provide a structure for peer review standards, similar to clinical practice guidelines in medicine. Internal peer review could be augmented by periodic, informal external review.

Conclusion

I believe this is the first study that directly compares evaluator opinions with court decisions in actual SVP commitment cases in the United States. Because the study assessed juries and judges in actual SVP trials, the results can be interpreted directly. Wisconsin trial courts committed or sustained commitment of most respondents. In both precommitment and petition-for-release trials, courts showed only slight to fair agreement with the evaluator’s opinion. They agreed far more often with evaluators who supported commitment than with evaluators who supported dismissal or discharge. Evaluators differed widely in their rates of SVP+ opinions, likely due to wide method variance.

The study is important because it extends previous analog research on mock jurors and simulated trials to sworn jurors in actual SVP commitment trials. It supports Scurich and Krauss’s (2014) contention that jurors have limited ability to evaluate expert witness testimony or recidivism risk, thereby rendering the legal criteria to discriminate SVPs “largely superfluous” (p. 101). The results of this study suggest the need to consider changes in both public policy and forensic practice.

Footnotes

Author’s Note:

Sharon Kelley, Rachel Kahn, and James Mundt contributed to this study. All opinions expressed in this article are my own and not necessarily those of the Sand Ridge Evaluation Unit or the Wisconsin Department of Health Services.

Richard W. Elwood is a forensic evaluator at the Sand Ridge Evaluation Unit. He earned his PhD in clinical psychology at the University of Washington. He currently focuses on predicting the recidivism risk of sex offenders. His work in this area has appeared in Sexual Abuse and the International Journal of Offender Therapy and Comparative Criminology.

References

Altman, D. G. (1991). Practical statistics for medical research. London, England: Chapman & Hall.

American Psychological Association. (2011). Specialty guidelines for forensic psychologists. Washington, DC: Author. Retrieved from http://www.apadivisions.org/division-41/about/specialty/guidelines.pdf

American Psychological Association. (2014). Standards for educational and psychological testing. Washington, DC: Author.

Bengtson, & Långström (2008). Unguided clinical and actuarial assessment of re-offending risk: A direct comparison with sex offenders in Denmark. Sexual Abuse: A Journal of Research and Treatment, 19, 135-153.

Boccaccini, M. T., Murrie, D. C., & Turner, D. B. (2014). Jurors’ views on the value and objectivity of mental health experts testifying in sexually violent predator trials. Behavioral Sciences & the Law, 32, 483-495. doi:10.1002/bsl.2129

Boccaccini, M. T., Turner, D. B., Murrie, D. C., Henderson, C. E., & Chevalier (2013). Do scores from risk measures matter to jurors? Psychology, Public Policy, and Law, 19, 259-269. doi:10.1037/a0031354

Byrt, Bishop, & Carlin, J. B. (1993). Bias, prevalence and kappa. Journal of Clinical Epidemiology, 46, 423-429.

Chevalier, C. S., Boccaccini, M. T., Murrie, D. C., & Varela, J. G. (2015). Static-99R reporting practices in sexually violent predator cases: Does norm selection reflect adversarial allegiance? Law and Human Behavior, 39, 209-218. doi:10.1037/lhb0000114

Cicchetti, D. V., & Feinstein, A. R. (1990). High agreement but low kappa: II. Resolving the paradoxes. Journal of Clinical Epidemiology, 43, 551-558.

Cicchetti, D. V., & Sparrow, S. S. (1981). Developing criteria for establishing interrater reliability of specific items: Application to assessment of adaptive behavior. American Journal of Mental Deficiency, 86, 127-137.

Cohen, J. A. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46. doi:10.1177/001316446002000104

Crawford, J. R., Garthwaite, P. H., & Betkowska (2009). Post_Test_Probabilities.exe [Software]. Retrieved from http://homepages.abdn.ac.uk/j.crawford/pages/dept/BayesPTP.htm

Cunningham (2009). More than just the K coefficient: A program to fully characterize inter-rater reliability between two raters. In Proceedings of the SAS global forum 2009: Statistics and Data Analysis. Retrieved from http://support.sas.com/resources/papers/proceedings09/242-2009.pdf

DeMatteo, Murphy, Galloway, & Krauss, D. A. (2015). A national survey of United States sexually violent person legislation: Policy, procedures, and practice. International Journal of Forensic Mental Health, 14, 245-266. doi:10.1080/14999013.2015.1110847

Elwood, R. W. (2016). Defining probability in sex offender risk assessment. The International Journal of Offender Therapy and Comparative Criminology, 60, 1928-1941. doi:10.1177/0306624X15587912

Elwood, R. W. (2018). Calculating probability in sex offender risk assessment. The International Journal of Offender Therapy and Comparative Criminology, 62, 1262-1280. doi:10.1177/0306624X16677784

Elwood, R. W., Kelley, S. M., & Mundt (2017). The 2015 Static-99R: Alternative recidivism tables for high-risk offenders. The International Journal of Offender Therapy and Comparative Criminology, 61, 1593-1605. doi:10.1177/0306624X15623803

Feinstein, A. R., & Cicchetti, D. V. (1990). High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 43, 543-549.

Gowensmith, W. N., Murrie, D. C., & Boccaccini, M. T. (2012). Field reliability of competence to stand trial opinions: How often do evaluators agree, and what do judges decide when evaluators disagree? Law and Human Behavior, 36, 130-139. doi:10.1037/h0093958

Gowensmith, W. N., Murrie, D. C., & Boccaccini, M. T. (2013). How reliable are forensic evaluations of legal sanity? Law and Human Behavior, 37, 98-106. doi:10.1037/lhb0000001

Guarnera, L. A., & Murrie, D. C. (2017). Field reliability of competency and sanity opinions: A systematic review and meta-analysis. Psychological Assessment, 29, 795-818. doi:10.1037/pas0000388

Guarnera, L. A., Murrie, D. C., & Boccaccini, M. T. (2017). Why do forensic experts disagree? Sources of unreliability and bias in forensic psychology evaluations. Translational Issues in Psychological Science, 3, 143-152. doi:10.1037/tps0000114

Hanson, R. K., Lunetta, A. L., Phenix, Neeley, & Epperson (2014). The field validity of Static-99/R sex offender risk assessment tool in California. Journal of Threat Assessment and Management, 1, 102-117. doi:10.1037/tam0000014

Hanson, R. K., Thornton, Helmus, L. M., & Babchishin, K. M. (2016). What sexual recidivism rates are associated with Static-99R and Static-2002R scores? Sexual Abuse: A Journal of Research and Treatment, 28, 218-252. doi:10.1177/1079063215574710

Hare, R. D. (2003). Manual for the revised psychopathy checklist (2nd ed.). Toronto, Ontario, Canada: Multi-Health Systems.

Hawes, S. W., Boccaccini, M. T., & Murrie, D. C. (2012). Psychopathy and the combination of psychopathy and sexual deviance as predictors of sexual recidivism: Meta-analytic findings using the Psychopathy Checklist–Revised. Psychological Assessment, 25, 233-243.

Helmus, Hanson, R. K., Thornton, Babchishin, K. M., & Harris, A. J. R. (2012). Absolute recidivism rates predicted by Static-99R and Static-2002R sex offender risk assessment tools vary across samples: A meta-analysis. Criminal Justice and Behavior, 39, 1148-1171. doi:10.1177/0093854812443648

Hilton, N. Z., Scurich, & Helmus, L. M. (2015). Communicating the risk of violent and offending behavior: Review and introduction to this special issue. Behavioral Sciences & the Law, 33, 1-18.

Hilton, N. Z., & Simmons, J. L. (2001). The influence of actuarial risk assessment in clinical judgments and tribunal decisions about mentally disordered offenders in maximum security. Law and Human Behavior, 25, 393-408. doi:0147-7307/01/0800-0393

Jones, M. P. (2014, August 26). The Texas counties: From most liberal to most conservative (Weblog, James A. Baker III Institute for Public Policy). Retrieved from https://blog.chron.com/bakerblog/2014/08/the-texas-counties-from-most-liberal-to-most-conservative/#26223101=0

Kelley, S. M., Ambroziak, Thornton, & Barahal, R. M. (2018). How do professionals assess sexual recidivism risk? An updated survey of practices. Sexual Abuse. Advance online publication. doi:10.1177/1079063218800474

Knighton, J. C., Murrie, D. C., Boccaccini, M. T., & Turner, D. B. (2014). How likely is “likely to reoffend” in sex offender civil commitment trials? Law and Human Behavior, 38, 293-304. doi:10.1037/lhb0000079

Krauss, D. A., McCabe, & Lieberman, J. D. (2012). Dangerously misunderstood: Representative jurors’ reactions to expert testimony on future dangerousness in a sexually violent predator trial. Psychology, Public Policy, and Law, 18, 18-49. doi:10.1037/a0024550

Kwartner, Lyons, P. M., & Boccaccini, M. T. (2006). Judges’ risk communication preferences in risk for future violence cases. International Journal of Forensic Mental Health, 5, 185-194. doi:10.1080/14999013.2006.10471242

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174.

Levenson, J. S., Brannon, Y. N., Fortney, & Baker (2007). Public perceptions about sex offenders and community protection policies. Analyses of Social Issues and Public Policy, 7, 137-161. doi:10.1111/j.1530-2415.2007.00119.x

Murrie, D. C., & Boccaccini, M. T. (2015). Adversarial allegiance among expert witnesses. Annual Review of Law and Social Science, 11, 37-55. doi:10.1146/annurev-lawsocsci-120814-121714

Murrie, D. C., Boccaccini, M. T., Guarnera, L. A., & Rufino, K. A. (2013). Are forensic experts biased by the side that retained them? Psychological Science, 24, 1889-1897. doi:10.1177/0956797613481812

Murrie, D. C., Boccaccini, M. T., Turner, Meeks, Woods, & Tussey (2009). Rater (dis)agreement on risk assessment measures in sexually violent predator proceedings: Evidence of adversarial allegiance in forensic evaluation? Psychology, Public Policy, and Law, 15, 19-53. doi:10.1037/a0014897

Murrie, D. C., & Warren, J. I. (2005). Clinician variation in rates of legal sanity opinions: Implications for self-monitoring. Professional Psychology: Research and Practice, 36, 519-524. doi:10.1037/0735-7028.36.5.519

Neal, T. M. S., & Brodsky, S. L. (2016). Forensic psychologists’ perception of bias and potential correction strategies in forensic mental health evaluations. Psychology, Public Policy, and Law, 22, 58-76. doi:10.1037/law0000077

Neal, T. M. S., & Grisso (2014). Assessment practices and expert judgment methods in forensic psychology and psychiatry: An international snapshot. Criminal Justice and Behavior, 41, 1406-1421. doi:10.1177/0093854814548449

Olver, M. E., Beggs-Christofferson, S. M., Grace, R. M., & Wong, S. C. P. (2014). Incorporating change information into sexual offender risk assessments using the Violence Risk Scale–Sexual Offender Version. Sexual Abuse: A Journal of Research and Treatment, 26, 472-499. doi:10.1177/1079063213502679

Phenix, Helmus, & Hanson, R. K. (2012). Static-99R & Static-2002R evaluators’ workbook. Retrieved from http://www.static99.org/pdfdocs/Static-99RandStatic-2002R_EvaluatorsWorkbook2012-07-26.pdf

Phenix, Helmus, L.-M., & Hanson, R. K. (2015). Static-99R & Static-2002R evaluators’ workbook. Retrieved from http://www.static99.org/pdfdocs/Static-99RandStatic-2002R_EvaluatorsWorkbook-Jan2015.pdf

Reeves, S. G., Ogloff, J. R. P., & Simmons (2017). The predictive validity of the Static-99, Static-99R, and Static-2002/R: Which one to use? Sexual Abuse, 30, 887-907. doi:10.1177/1079063217712216

Rufino, K. A., Boccaccini, M. T., Hawes, & Murrie, D. C. (2012). When experts disagreed, who was correct? A comparison of PCL-R scores from independent raters and opposing forensic experts. Law and Human Behavior, 36, 527-537. doi:10.1037/h0093988

Saad (2018, February 6). Conservative-leaning states drop from 44 to 39. Retrieved from https://news.gallup.com/poll/226730/conservative-leaning-states-drop.aspx

Scurich (2018). The case against categorical risk estimates. Behavioral Sciences & the Law, 36, 554-564.

Scurich, & Krauss, D. A. (2013). The effect of adjusted actuarial risk assessment on mock-jurors’ decisions in a sexual predator commitment proceeding. Jurimetrics, 53, 395-413.

Scurich, & Krauss, D. A. (2014). The presumption of dangerousness in sexually violent predator commitment proceedings. Law, Probability & Risk, 13, 91-104. doi:10.1093/lpr/mgt015

State v. Smalley. (2007). WI App 219. Retrieved from https://www.wicourts.gov/ca/opinion/DisplayDocument.html?content=html&seqNo=30318

Turner, D. B., Boccaccini, M. T., Murrie, D. C., & Harris, P. B. (2015). Jurors report that risk measure scores matter in sexually violent predator trials, but that other factors matter more. Behavioral Sciences & the Law, 33, 56-73. doi:10.1002/bsl.2154

Varela, J. G., Boccaccini, M. T., Cuervo, V. A., Murrie, D. C., & Clark, J. W. (2014). Same score, different message: Perceptions of offender risk depend on Static-99R risk communication format. Law and Human Behavior, 38, 418-427. doi:10.1037/lhb0000073

Vyas, & Hozain, A. E. (2014). Clinical peer review in the United States: History, legal development and subsequent abuse. World Journal of Gastroenterology, 20(21), 6357-6363. doi:10.3748/wjg.v20.i21.6357

Whitaker (2017). Civil commitment of sexually violent persons (Informational Paper 52). Madison, WI: Wisconsin Legislative Fiscal Bureau. Retrieved from https://docs.legis.wisconsin.gov/misc/lfb/informational_papers/january_2017/0052_civil_commitment_of_sexually_violent_persons_informational_paper_52.pdf

Wisconsin State Legislature. (2019, March 11). Chapter 980: Sexually violent person commitments. Retrieved from http://docs.legis.wisconsin.gov/statutes/statutes/980.pdf

Zapf, P. A., Hubbard, K. L., Cooper, V. G., Wheeles, M. C., & Ronan, K. A. (2004). Have the courts abdicated their responsibility for determination of competency to stand trial to clinicians? Journal of Forensic Psychology Practice, 4, 27-44. doi:10.1300/J158v04n01_02