Treatment Evaluation in Forensic Psychiatry. Which One Should Be Used: The Clinical Judgment or the Instrument-based Assessment of Change?

Abstract

In forensic psychiatry, it is common practice to use an unstructured clinical judgment for treatment evaluation. From risk assessment studies, it is known that the unstructured clinical judgment is unreliable and the use of instruments is recommended. This paper aims to explore the clinical judgment of change compared to the calculated change using the Instrument for Forensic Treatment Evaluation (IFTE) in relation to changes in inpatient violence. This study shows that the clinical judgment is much more positive about patients’ behavioral changes than the calculated change, and that the calculated change is more in accordance with the change in the occurrence of inpatient violence, suggesting that the calculated change reflects reality more closely than the unstructured clinical judgment. Therefore, it is advisable to use the IFTE as a base to make a structured professional judgment of the treatment evaluation of a forensic psychiatric patient.

Keywords

IFTE, clinical judgment, calculated change, forensic treatment evaluation, inpatient violence

Introduction

Treatment of forensic psychiatric patients is most effective in preventing recidivism when the three principles of the Risk-Need-Responsivity (RNR) model are applied (Andrews & Bonta, 2010; Andrews et al., 1990, 2006; Polaschek, 2012). The Risk principle argues that treatment programs must match a patient’s risk level in terms of duration and intensity of the treatment: high-risk offenders need longer and more intensive treatment than low-risk offenders (Papalia et al., 2019). According to the Need principle, treatment programs must focus on a patient’s specific dynamic criminogenic needs, which contribute to an increased risk of recidivism. Finally, the Responsivity principle states that treatment programs must match the learning ability, motivation, and strengths of the offender and that the treatment used must be evidence-based (Skeem et al., 2015).

The assessment of an offender’s personal risk level and needs was, until the mid-1970s, a matter of subjective judgments by clinicians. Clinicians’ own insights, intuition, professional opinion, confidence, training, and experience guided the assessment (Miller et al., 2015). This was referred to as the first generation of risk assessment (Andrews et al., 2006). Spengler et al. (2009) showed in a meta-analysis that this unstructured clinical judgment frequently led to inaccurate evaluations of the risk of recidivism. The lack of rules, transparency, replicability, consistency, scoring integrity, and accuracy led to criticism of the clinical approach (Harris & Rice, 2007). For instance, with the unstructured clinical judgment important risk factors were overlooked or not considered, too much attention was paid to irrelevant factors, or insufficient weight was assigned to relevant risk factors (Dawes et al., 1989). Therefore, structured (actuarial) risk assessment tools were developed and introduced to tackle the limitations of unstructured clinical judgments, both in the context of legal decision-making and in forensic psychiatric treatment. These instruments are referred to as the second generation of risk assessment (Ǣgisdóttir et al., 2006; Baird & Stocks, 2013; Cooper et al., 2008). In a meta-analysis of 136 studies in which actuarial predictions were compared with unstructured clinical predictions concerning risk of recidivism, actuarial predictions were found to be more accurate than unstructured clinical predictions in almost half (47%) of the studies (Grove et al., 2000). No differences in predictive accuracy between both approaches were found in about 47% of the studies, and in a small minority of studies (6%) the unstructured prediction was slightly more accurate. On average, the actuarial prediction of future violence was more accurate than the unstructured clinical prediction by an approximately 10% increase in hit rate (Ǣgisdóttir et al., 2006).

A shortcoming of actuarial predictions was that only historical or static factors were assessed, which could not be influenced by treatment or time. Therefore, dynamic risk factors or dynamic criminogenic needs were added to the instruments (Douglas & Skeem, 2005). The use of risk assessment instruments consisting of dynamic risk factors in combination with static factors led to structured professional judgments of future risk of recidivism: the third generation of risk assessment instruments (Andrews et al., 2006). After evaluating and weighing all risk factors and considering the base rate of recidivism, social factors and a patient’s individual environmental factors, a final risk level was determined (Bonta & Andrews, 2007). By systematically assessing dynamic criminogenic needs, treatment can be focused on the needs that require it, and the effects of treatment can be evaluated (Belfrage & Douglas, 2002; Olver & Wong, 2011). Letting the outcomes of the systematic assessments of criminogenic needs guide the treatment is referred to as the fourth generation of risk assessment (Andrews et al., 2006).

Besides predicting the future risk of violence, multidisciplinary teams of clinicians must decide at certain time points in treatment whether a patient has shown sufficient progress, meaning a decrease in risk factors and an increase in protective factors, to move on to a next step in the treatment process, such as transfer to a less secure ward or unsupervised leave (Wilson et al., 2013). In inpatient forensic psychiatry, treatment teams and individual professionals often make treatment decisions based on their own subjective assessments, without the support of systematically collected data that might be available (Bosker et al., 2013; Day et al., 2017).

This is also a problem in general mental health care, where clinicians often do not use available routine outcome monitoring (ROM) data for treatment evaluation (Tasma et al., 2017; Zimmerman & McGlinchey, 2008). In general mental health care, studies are available on the limitations of unstructured clinical judgment of treatment progress (Bell & Mellor, 2009; Lilienfeld et al., 2013). The most serious limitations are the lack of reliability, transparency, and replicability of unstructured clinical judgments. Clinicians using unstructured judgments fail to observe deterioration or report improvement where there is none (Hannan et al., 2005; Lilienfeld et al., 2014). These inaccurate evaluations can have a negative effect on the patient-professional working alliance and on the well-being of the patient. Several reasons have been reported for biases of unstructured clinical judgments. Clinicians may focus on a limited number of factors and/or on irrelevant data (Lockhart & Satya-Murti, 2017; Waller, 2009), overestimate the value of their experience (Hannan et al., 2005), or receive limited or no feedback on their judgments (Dawes et al., 1989). Therefore, using validated routine outcome monitoring tools is highly recommended to support and improve the accuracy of clinical decisions (Hansen et al., 2002; Lilienfeld et al., 2014; Waller & Turner, 2016).

In forensic psychiatry, an inaccurate positive or negative evaluation of change in risk and protective factors can have profound consequences for the patient, fellow patients, personnel, or society. For example, clinicians who wrongly decide that a forensic patient has changed positively, such as an incorrect judgment of a decrease in offence related risk factors, will grant the patient more responsibilities and more freedom than the patient can manage. This in turn can contribute to an increased risk of inpatient violence and/or violence after treatment. Inpatient violence is a serious problem within forensic psychiatry (Dack et al., 2013; Schuringa et al., 2018), associated with recidivism after discharge (Daffern et al., 2007), and negatively associated with treatment adherence (Jeandarme et al., 2019).

In sum, for predicting the risk of violence there is consensus in forensic psychiatry about the benefits of the use of structured assessment instruments. In general mental health care, there is the growing realization that ROM-instruments are beneficial for treatment evaluation purposes. However, in forensic psychiatry little attention has been given to structured treatment evaluations. The Instrument for Forensic Treatment Evaluation (IFTE; Schuringa et al., 2014) is especially designed for multidisciplinary treatment evaluation purposes in forensic psychiatry. Earlier studies have shown that the IFTE can be used to predict short-term and longer-term inpatient violence for forensic patients in high security institutions (Schuringa et al., 2016, 2019; van der Veeken et al., 2016, 2018).

This paper aims to compare the structured and unstructured judgment of change during forensic psychiatric treatment. Firstly, this study explores whether there is a difference between the treatment evaluation based on the average change in team scores on the IFTE (hereinafter calculated change [CalCh]) and the unstructured clinical judgment of change of the main clinician (ClinJCh). Secondly, this study compares CalCh and ClinJCh in relation to the change in occurrence of inpatient violence over the same period.

Methods

Setting

The study is set at Forensic Psychiatric Centre (FPC) Dr. S. van Mesdag, a maximum-security institution in the Netherlands for mentally disordered male offenders hospitalized under the Dutch entrustment act (tbs-order). A tbs-order is a “provision in the Dutch criminal code that allows for a period of treatment following a prison sentence for mentally disordered offenders.” (van Marle, 2002, p. 83). A tbs-order treatment is not considered an additional punishment as such, but a measure to protect society. Every 1 or 2 years, a tbs-order must be evaluated by a court based on the information on treatment progress provided by the clinicians. The judge decides whether to prolong the order based on this information and the assessed risk of recidivism.

The IFTE data in this study were extracted from the Routine Outcome Monitoring (ROM) system of FPC Dr. S. van Mesdag. The period covered is from October 2016 until April 2019. The inclusion criteria for this study were that at least three team members completed IFTEs at two sequential measurement time points 4 to 8 months apart, and that the main clinician answered the clinical judgment question about whether the patient had changed at the second measurement.

Instrument

The IFTE (Schuringa et al., 2014) consists of all 14 clinical (dynamic) items of the Dutch risk assessment instrument HKT-R (Historical, Clinical, Future – Revised; Spreen et al., 2014), three items inspired by the Atascadero Skills Profile (Vess, 2001), and five items designed in collaboration with clinicians of the institution. All 22 items describe observable behaviors and are divided into three factors: Protective behavior, Problematic behavior, and Resocialization skills (see Table 1).

Table 1.

Factors and Item Descriptions of the IFTE.

Protective behavior	Problematic behavior	Resocialization skills
Problem insight^a	Impulsive behavior^a	Balanced daytime activities^c
Cooperation with the treatment^a	Antisocial behavior^a	Work skills^a
Take responsibility for the crime(s)^a	Hostile behavior^a	Social skills^a
Coping skills^a	Sexually deviant behavior^c	Self-care^a
Medication use^c	Manipulative behavior^c	Financial skills^c
Skills to prevent drug and alcohol use^b	Compliance to rules and conditions^a
Skills to prevent physically aggressive behavior^b	Antisocial associates^a
Skills to prevent sexually deviant behavior^b	Psychotic symptoms^a
	Drugs use^a

^a From HKT-R.

^b Inspired by ASP.

^c Designed with clinicians.

Distinctive features of the IFTE are the 17-point rating scale, which contributes to the enhanced sensitivity for measuring change (Serin et al., 2013), and its multidisciplinary use. The IFTE is completed independently by all members of a multidisciplinary treatment team before the treatment evaluation meeting, every 6 months. The scoring takes about 10 minutes. Before filling out the IFTE, the main clinician gives his/her clinical judgment of whether the behavior of a patient has changed by answering the question: “Has the patient changed in this last period?” A 13-point scale with four anchor points is used: 0 = “worsened,” 1 = “no change,” 2 = “a little improved” and 3 = “a lot improved.” Main clinicians in this institution are the coordinators of the treatment and are mostly (clinical) psychologists. The information is displayed in a treatment evaluation report, which shows the average team score per item and per factor, a measurement of change (Spreen et al., 2010), and a team agreement index per item (Gower & Legendre, 1986). The agreement index (0.00 is no agreement, 1.00 is total agreement) displays whether the behavior is consistently observed in different situations by different therapists. The three factors of the IFTE show moderate to good inter-rater reliability (Cronbach’s alphas range from .50 to .92) and test-retest reliability (alphas range from .57 to .92), good internal consistency (range from .81 to .90), and modest to good concurrent validity. The factor Problematic behavior has good predictive validity for drug use (Cohen’s d = 1.47), and for inpatient violence with different diagnostic target groups (AUC = .77, CI: [0.70–0.85]) (Schuringa et al., 2014, 2016, 2018; van der Veeken et al., 2016).

Independent Variables

The three factors of the IFTE are used as independent variables (see Table 1). The primary treatment goal of the patient is also used as an independent variable. This variable is determined by taking the primary treatment goal from the treatment evaluation report of the second measurement and translating this goal into a corresponding IFTE item. In this way, each patient’s personal and current criminogenic need is operationalized.

Outcome Measure

Inpatient violence is the outcome measure and is defined as any behavior that intentionally could or did physically harm a person or animal, and/or a form of aggression that is extremely intimidating or threatening (Troquete et al., 2013). Inpatient violence is determined per measurement by scoring the presence (1) or absence (0) of violent acts reported in the treatment evaluation report. The reporting of violence was too poor to differentiate the severity and frequency of the violent acts. The change in violence is computed as the difference in the presence or absence of violent acts between both measurements, resulting in three categories: less violence, no change, and more violence. No change means there is either violence or no violence at both measurements.
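The three-way coding of change in violence described above can be sketched as follows. This is a minimal illustration only: the function name and the example pairs are hypothetical and are not taken from the study's data or software.

```python
def violence_change(violent_t1: bool, violent_t2: bool) -> str:
    """Classify change in violence from presence/absence at two measurements."""
    if violent_t1 and not violent_t2:
        return "less violence"
    if not violent_t1 and violent_t2:
        return "more violence"
    # Violence at both measurements, or at neither, counts as no change.
    return "no change"


# Hypothetical example pairs (measurement 1, measurement 2).
examples = [(True, False), (False, True), (True, True), (False, False)]
labels = [violence_change(t1, t2) for t1, t2 in examples]
```

Note that "no change" groups together patients who were violent at both measurements and patients who were violent at neither, exactly as the definition above states.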

Calculated Change (CalCh)

The CalCh is computed as the difference between the average team scores, including the score of the main clinician, of two sequential measurements on the IFTE. The reliable change index (RCI) is applied to express the degree of change in observed behavior between the two measurements (Jacobson & Truax, 1991). The RCI is an index to determine whether a change in a patient is statistically reliable. In this study three categories of change are defined: the behavior of a patient can be improved (RCI ≥ 1.96), not changed (−1.96 < RCI < 1.96) or worsened (RCI ≤ −1.96). The CalCh is calculated for the three factors and the IFTE item representing the primary treatment goal.
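As an illustration of this categorization, the sketch below uses the standard Jacobson and Truax (1991) formulation, in which the difference between two scores is divided by the standard error of the difference. The text does not report the standard deviation or reliability values used, so the numbers here are hypothetical, as are the function names and the `higher_is_better` switch (an assumption about score direction, which may differ per IFTE factor).

```python
import math


def reliable_change_index(score_t1: float, score_t2: float,
                          sd: float, reliability: float) -> float:
    """RCI = (x2 - x1) / S_diff, with S_diff = sqrt(2 * SE^2), SE = SD * sqrt(1 - r)."""
    se_measurement = sd * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0 * se_measurement ** 2)
    return (score_t2 - score_t1) / s_diff


def categorize_change(rci: float, higher_is_better: bool = True,
                      cutoff: float = 1.96) -> str:
    """Map an RCI onto the three categories used in this study."""
    # Assumption: flip the sign when lower scores mean better behavior.
    signed = rci if higher_is_better else -rci
    if signed >= cutoff:
        return "improved"
    if signed <= -cutoff:
        return "worsened"
    return "not changed"


# Hypothetical team averages on a 17-point scale, with made-up SD and reliability.
rci = reliable_change_index(8.0, 11.0, sd=3.0, reliability=0.9)
category = categorize_change(rci)
```

The cutoff of 1.96 corresponds to a two-sided 5% level, so changes smaller than this are treated as indistinguishable from measurement error.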

Clinical Judgment of Change (ClinJCh)

The clinical judgment of change is determined by categorizing the 13-point scale, filled out at the second measurement by the main clinician in answer to the question “Has the patient changed?”, into three categories: worsened (0–2), not changed (3–5), and improved (6–13).

Statistical Analysis

To investigate the correspondence between the ClinJCh of the main clinician and the CalCh of the observed behavior by the team on the three IFTE factors and the primary treatment goal, the data are displayed in cross tables and percentages of corresponding judgments are calculated. McNemar tests are used to determine whether there is a systematic difference between the CalCh and ClinJCh.

The frequency of correspondence of the ClinJCh with change in violence is compared to the correspondence of CalCh with changes in violence. This results in a 2 × 2 table and a McNemar test is used to determine the difference between the agreements of CalCh and ClinJCh with change in violence.
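A McNemar test on a 2 × 2 table depends only on the two discordant cells. The sketch below is a plain-Python illustration, not the authors' analysis code; the function names are hypothetical. With the discordant counts from Table 4 (62 and 2) and a continuity correction it gives 54.39, and the ratio of the discordant counts gives 31.00, which agree with the values reported in the Results. Whether the authors applied the correction is not stated in the text and is an assumption here.

```python
import math


def mcnemar_chi2(b: int, c: int, continuity_correction: bool = False) -> float:
    """McNemar chi-square (1 df) from the two discordant cell counts b and c."""
    if b + c == 0:
        raise ValueError("McNemar test is undefined without discordant pairs")
    diff = abs(b - c) - (1 if continuity_correction else 0)
    return diff ** 2 / (b + c)


def chi2_1df_p_value(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Table 4 discordant cells: CalCh equal but ClinJCh unequal = 62,
# CalCh unequal but ClinJCh equal = 2.
table4_chi2 = mcnemar_chi2(62, 2, continuity_correction=True)
table4_p = chi2_1df_p_value(table4_chi2)
table4_odds_ratio = 62 / 2
```

For small discordant counts an exact binomial version of the test is usually preferred over the chi-square approximation.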

Results

Sample

The sample for this study consisted of 119 men, with an average age of 36.7 years at intake (SD = 9.3; range 19–70), and a mean duration in the hospital of 42.1 months at measurement 1 (SD = 35.6; range 0–190). Thirty-nine percent of the patients had a main diagnosis of schizophrenia spectrum disorder, and 29% had a personality disorder, of whom 13 patients had an antisocial personality disorder and six had a borderline personality disorder. Seventeen percent of the patients had a neurodevelopmental disorder (e.g., ADHD or autism spectrum disorder), 6% were diagnosed with a paraphilic disorder, 5% with a drug-related diagnosis, and 4% with another diagnosis (DSM-IV-TR; Diagnostic and statistical manual of mental disorders [4th ed., text rev.], American Psychiatric Association, 2000). The patients had been convicted of theft (1%), medium violence (11%), theft accompanied by violence (10%), severe violence (23%), a sexual crime (19%), manslaughter (16%), arson (10%), and murder (10%). There were 25 different main clinicians, who evaluated on average 4.8 patients each (SD = 3.5; range 1–14).

Statistical Analysis

Table 2 shows the results of ClinJCh and CalCh for the three factors. For the factor Protective behavior in 34% of the judgments (N = 41), the calculated change and clinician judgment of change matched. Of the 78 patients for whom no agreement was found between CalCh and ClinJCh of the factor Protective behavior, judgments about the direction of behavioral change differed significantly. In 70 cases, the ClinJCh of the main clinicians was more positive than the CalCh of the team. In only eight cases was the CalCh more positive than the ClinJCh (χ² (1) = 49.28, p < .001).

Table 2.

Cross Table of the ClinJCh and CalCh on the Three Factors.

		Calculated change protective behavior				Calculated change problematic behavior				Calculated change resocialization skills
		Worsened	Stable	Improved	Total	Worsened	Stable	Improved	Total	Worsened	Stable	Improved	Total
Clinical judgment of change	Worsened	3	7	0	10	1	9	0	10	1	9	0	10
	Stable	6	33	1	40	1	39	0	40	5	33	2	40
	Improved	0	64	5	69	0	67	2	69	2	65	2	69
Total		9	104	6	119	2	115	2	119	8	107	4	119

The ClinJCh was also significantly more positive about the behavioral change on the factor Problematic behavior (χ² (1) = 45.21, p < .001; see Table 2). In 68 of the 77 cases (88%) with disagreements, the clinical judgment reported a more positive treatment development than the calculated change. In 35% of the cases, the ClinJCh was similar to the CalCh (see diagonal in Table 2). Concerning the factor Resocialization skills, also a significant difference in agreement between the clinical judgment and calculated change was observed (χ² (1) = 49.28, p < .001; see Table 2). Like the two other IFTE factors, the ClinJCh of the main clinicians was more positive about the progress of the treatment. Of the 83 disagreements, 72 (87%) were evaluated more positively by the main clinicians. Agreement was found in 30% of the cases (see diagonal in Table 2). Regarding the direction of the progress of the individualized IFTE treatment goals, the unstructured clinical judgment corresponded in 32% of the cases with the calculated change (see Table 3). The ClinJCh was significantly more positive than the CalCh (χ² (1) = 44.83, p < .001).

Table 3.

Cross Table of ClinJCh and CalCh on Treatment Goal.

		Calculated change treatment goal			
		Worsened	Stable	Improved	Total
Clinical judgment of change	Worsened	1	9	0	10
	Stable	5	31	4	40
	Improved	5	56	6	69
Total		11	96	10	119

In summary, in 30% to 35% of the cases, the clinical judgment of the main clinician and the calculated change of the team agreed about the direction of patient’s behavioral change. And in 81% to 90% of the cases where ClinJCh and CalCh disagreed, the clinical judgment of the main clinicians was more positive about the progress of the patient than the calculated change of the observed behavior by the whole team.
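The agreement percentages and the direction of disagreement can be read off a 3 × 3 cross table as sketched below, using the Protective behavior counts from Table 2. The function name and the returned keys are hypothetical. Rows are the clinical judgment and columns the calculated change, both ordered worsened, stable, improved.

```python
from typing import Dict, List


def summarize_change_crosstab(table: List[List[int]]) -> Dict[str, float]:
    """Summarize agreement and direction of disagreement in a square cross table."""
    size = len(table)
    total = sum(sum(row) for row in table)
    agree = sum(table[i][i] for i in range(size))
    # Below the diagonal: the clinical judgment (row) is more positive than the calculated change.
    clinical_more_positive = sum(
        table[i][j] for i in range(size) for j in range(size) if i > j)
    # Above the diagonal: the calculated change (column) is more positive.
    calculated_more_positive = sum(
        table[i][j] for i in range(size) for j in range(size) if j > i)
    return {
        "total": total,
        "agree": agree,
        "agreement_pct": 100.0 * agree / total,
        "clinical_more_positive": clinical_more_positive,
        "calculated_more_positive": calculated_more_positive,
    }


# Protective behavior block of Table 2.
protective = [[3, 7, 0], [6, 33, 1], [0, 64, 5]]
summary = summarize_change_crosstab(protective)
```

For Protective behavior this reproduces the 41 agreements (34%), the 70 cases in which the clinical judgment was more positive, and the 8 cases in which the calculated change was more positive.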

The second question of this study aimed to gain insight whether the clinical judgment or the calculated change corresponded most with the actual behavioral change of inpatient violence. The number of patients committing violence at the first measurement was 39 (33%) and at the second measurement it was 36 patients (31%). Twenty-five patients (21%) changed in violent behavior, either from non-violent to violent (N = 11) or from violent to non-violent (N = 14). In total 50 patients (42%) showed violence at one or both measurements. Table 4 shows that there is a significant difference in proportion of ClinJCh of the factor Problematic behavior being equal to the change in violence (26% [31/118] of the cases) and the proportion of CalCh being equal to the change in violence (77% [91/118] of the cases; χ² (1) = 54.39, p < .001, OR = 31.00 [95% CI: 8.23–261.66]).

Table 4.

Comparison of CalCh with Change in Violence to ClinJCh with Change in Violence.

Problematic behavior		Clinical judgment of change (vs. change in violence)		
		Equal	Unequal	Total
Calculated change (vs. change in violence)	Equal	29	62	91
	Unequal	2	25	27
Total		31	87	118

Discussion

This study investigated the agreement between the unstructured clinical judgment by the main clinician (ClinJCh) and the calculated change (CalCh) by the multidisciplinary team of the behavioral change of forensic psychiatric patients. Agreements between the calculated change and clinical judgment of change for the three IFTE factors (Protective behavior, Problematic behavior, and Resocialization skills), and an individualized treatment goal were studied. This study also compared the correspondence in agreement of the two methods of judgment of change with actual change in occurrence of violence. The results showed that the clinical judgment of change only matched the calculated change on the factors of the IFTE and the treatment goal in about 30% to 35% of the cases. The clinical judgment assessed significantly more positive change in behavior than the calculated change.

Since the main goal of treatment in a forensic psychiatric center is the reduction of risk of violence (Andrews et al., 1990; van Marle, 2002), factors strongly associated with that risk should be the focus of treatment, and therefore the focus of change as judged by the main clinician. One would expect a significant relation between the clinical judgment of change and the calculated change on the factor Problematic behavior. The factor Problematic behavior of the IFTE consists of well-known risk factors (Andrews & Bonta, 2010; Andrews et al., 2006) and has shown good predictive validity for inpatient violence (Schuringa et al., 2016; van der Veeken et al., 2016). However, in this study, the clinical judgment of change was much more positive about the change on Problematic behavior than the calculated change. A reason for this could be the cross-sectional sample of this study, in which treatment duration ranged from 0 to 190 months. The mean duration of a tbs-order treatment is approximately 8 years (Nagtegaal et al., 2011). The most problematic behavior is expected at the beginning of treatment, while at the end of treatment the focus would be on resocialization skills. It is possible that the focus of the clinical judgment had also shifted from Problematic behavior to Protective behavior or Resocialization skills in accordance with the progress in treatment a patient had made. In this study, however, there was also little agreement between clinical judgment of change and calculated change on Protective behavior (34%) or Resocialization skills (30%). If the main clinician did not have one of the three factors of the IFTE in mind when answering the question of whether the patient had changed, then maybe he/she was focused on the individual treatment goal of the patient. However, when this study focused on the specific individual treatment goal of a patient as reported in treatment evaluations, little agreement was found there as well between clinical judgment of change and calculated change (32%).

This study also explored which judgment of change, the unstructured or the structured one, was more in accordance with behavioral changes. This study showed that the calculated change on the factor Problematic behavior was much more in agreement with actual change in occurrence of violence than the clinical judgment. This strongly indicates that the calculated change was a more accurate representation of changes in violent behavior than the clinical judgment of change, and that the clinical judgment was thus too positive about change. This is in line with studies in general mental health which concluded that clinicians often fail to detect deterioration and over-report improvements (Hannan et al., 2005; Lilienfeld et al., 2014). In forensic psychiatry, this overly positive judgment of change of a patient can result in very adverse outcomes, such as violence. If the main clinician wrongly judges a patient’s progress as positive, he/she may adjust the risk management plan accordingly, which can result in too many responsibilities and too much freedom for the patient. This mismatch can overtax the patient’s coping skills, which in turn can lead to an increased risk of violence. Treatment evaluations based solely on clinical judgments are therefore not recommended.

Results from this study suggest that in forensic psychiatric treatment evaluation, more emphasis should be placed on the results based on structured observations made by the multidisciplinary team. Whichever topic was studied (problematic behavior, protective behavior, resocialization skills, or individual treatment goals), the main clinician was much more positive about the change than the instrument-based team scores. This does not mean that the clinical judgment is not useful, since there was no perfect accordance between calculated change and change in violent behavior, but treatment decisions should not be based solely on the clinical judgment.

In an earlier study (Schuringa et al., 2018), the factor Problematic behavior could be used to classify patients into a high-risk group for short-term violence and a low-risk group, but in this high-risk group about 50% of the patients committed violence. A similar percentage is found for the HKT-R (Spreen et al., 2014). Although instruments are undoubtedly useful in forensic psychiatric treatment, they are not perfect, and one should not rely solely on instruments. A clinical judgment based on instrument-based data is recommended, to reduce the various biases of clinical judgments (Bell & Mellor, 2009; Lilienfeld et al., 2013) and at the same time overcome the limitations that come with data-driven decisions, such as unrelated or insufficient data (Chin-Yee & Upshur, 2018; Lockhart & Satya-Murti, 2017). Just as with the third-generation risk assessments, ROM instruments should be used as a basis to make a structured professional judgment about the treatment progress, or in other words, to use the best of both worlds.

In the RNR model, the use of instruments to determine Risk and Needs is already widespread practice, but less attention has been given to the use of instruments to determine Responsivity to the treatment (Duwe & Kim, 2018). This study showed that using an instrument instead of solely the clinical judgment to establish changes in behavior, and thus structurally monitoring the responsivity of the patient to the treatment, is just as beneficial as using instruments for assessing risk and needs. Compliance with the principles of the RNR model, although widely accepted as beneficial, is not as common as one would expect (Bonta et al., 2008). If the assessment of responsivity is also performed in a more structured manner, this could help practitioners comply with the RNR model more often.

Strengths and Limitations

A limitation, but also a strength, of this study is its naturalistic design of data collection. For several reasons, such as time management problems or priority issues, one must deal with missing measurements when using data from everyday practice, but with regular and long-lasting measurements most patients are still represented in the data for this study. The sample used in this study is heterogeneous in diagnosis, age, treatment duration, and committed crimes, which could have an impact on the outcome measure violence. However, in an earlier study (Schuringa et al., 2018) these variables did not influence the predictive power of the factor Problematic behavior, so they probably would also have little to no effect in this study.

The clinical judgment question, “has someone changed?”, did not specify in which behavior a given patient had changed. It is possible that the main clinician was thinking about behavior in which the patient made substantial changes but which is less relevant for risk assessment purposes, for example his cleaning and eating habits. For future research, the question should be more specific, for instance: “Did the patient change on his main treatment goal?” and “Is the patient at risk for violence in the near future?”

The outcome measure inpatient violence is scored as either present or absent because the severity and frequency of violence could not be determined, owing to missing and/or incomplete reporting of violence in the treatment evaluation reports. Comparing changes in the frequency and severity of violence could yield more fine-grained results. Better reporting of this adverse outcome is therefore advisable. In addition, this study was performed in a single institution. Although it is one of the largest forensic psychiatric treatment institutions of the Netherlands and has a diagnostically diverse population, generalizations to other institutions should be made with care.

Conclusion

This study showed that the unstructured clinical judgment of the main clinician is usually more positive about a change a given patient has demonstrated than the calculated change based on the IFTE. The calculated change on the factor Problematic behavior was more in line with actual behavioral change of occurrence of violence than clinical judgment, but not perfectly. Therefore, using the structured assessment of the IFTE as a base, in combination with the clinical judgment for decision making in forensic psychiatric treatment evaluation is recommended. Our advice is: Use the best of both worlds.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Erwin Schuringa

References

Ǣgisdóttir, White, M. J., Spengler, P. M., Maugherman, A. S., Anderson, L. A., Cook, R. S., Nichols, C. N., Lampropoulos, G. K., Walker, B. S., Cohen, & Rush, J. D. (2006). The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction. The Counseling Psychologist, 34(3), 341–382. https://doi.org/10.1177/0011000005285875

American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Author.

Andrews, D. A., & Bonta (2010). Rehabilitating criminal justice policy and practice. Psychology, Public Policy, and Law, 16(1), 39–55. https://doi.org/10.1037/a0018362

Andrews, D. A., Bonta, & Hoge, R. D. (1990). Classification for effective rehabilitation: Rediscovering psychology. Criminal Justice and Behavior, 17(1), 19–52. https://doi.org/10.1177/0093854890017001004

Andrews, D. A., Bonta, & Wormith, J. S. (2006). The recent past and near future of risk and/or need assessment. Crime & Delinquency, 52(1), 7–27. https://doi.org/10.1177/0011128705281756

Baird, & Stocks (2013). Risk assessment and management: Forensic methods, human results. Advances in Psychiatric Treatment, 19(5), 358–365. https://doi.org/10.1192/apt.bp.111.009407

Belfrage, & Douglas, K. S. (2002). Treatment effects on forensic psychiatric patients measured with the HCR-20 violence risk assessment scheme. International Journal of Forensic Mental Health, 1(1), 25–36. https://doi.org/10.1080/14999013.2002.10471158

Bell, & Mellor (2009). Clinical judgments: Research and practice. Australian Psychologist, 44(2), 112–121. https://doi.org/10.1080/00050060802550023

Bonta, & Andrews, D. A. (2007). Risk-need-responsivity model for offender assessment and rehabilitation. Public Safety Canada.

Bonta, Rugge, Scott, T-L, Bourgon, & Yessine, A. K. (2008). Exploring the black box of community supervision. Journal of Offender Rehabilitation, 47(3), 248–270. https://doi.org/10.1080/10509670802134085

Bosker, Witteman, & Hermanns (2013). Agreement about intervention plans by probation officers. Criminal Justice and Behavior, 40(5), 569–581. https://doi.org/10.1177/0093854812464220

Chin-Yee, & Upshur (2018). Clinical judgment in the era of big data and predictive analytics. Journal of Evaluation in Clinical Practice, 24(3), 638–645. https://doi.org/10.1111/jep.12852

Cooper, B. S., Griesel, & Yuille, J. C. (2008). Clinical-forensic risk assessment: The past and current state of affairs. Journal of Forensic Psychology Practice, 7(4), 1–63. https://doi.org/10.1300/J158v07n04_01

14.

Dack

Ross

Papadopoulos

Stewart

Bowers

(2013). A review and meta-analysis of the patient factors associated with psychiatric in-patient aggression. Acta Psychiatrica Scandinavica, 127(4), 255–268. https://doi.org/10.1111/acps.12053

15.

Daffern

Jones

Howels

Shine

Mikton

Tunbridge

(2007). Editorial refining the definition of offence paralleling behavior. Criminal Behaviour and Mental Health, 17(5), 265–273. https://doi.org/10.1002/cbm.671

16.

Dawes

R. M.

Faust

Meehl

P. E.

(1989). Clinical versus actuarial judgment. Science, 243(4899), 1668–1674.

17.

Day

M. D.

Wilson

H. A.

Bodwin

Monson

C. M.

(2017). Change in Level of Service Inventory-Ontario Revised (LSI-OR) risk scores over time: An examination of overall growth curves and subscale-dependent growth curves. International Journal of Offender Therapy and Comparative Criminology, 61(14), 1606–1622. https://doi.org/10.1177/0306624X15623016

18.

Douglas

K. S.

Skeem

J. L.

(2005). Violence risk assessment: Getting specific about being dynamic. Psychology, Public Policy, and Law, 11(3), 347–383. https://doi.org/10.1037/1076-8971.11.3.347

19.

Duwe

Kim

(2018). The neglected “R” in the risk-needs-responsivity model: A new approach for assessing responsivity to correctional interventions. Justice Evaluation Journal, 1(2), 130–150. https://doi.org/10.1080/24751979.2018.1502622

20.

Gower

J. C.

Legendre

(1986). Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification, 3, 5–48. https://doi.org/10.1007/BF01896809

21.

Grove

W. M.

Zald

D. H.

Lebow

B. S.

Snitz

B. E.

Nelson

(2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12(1), 19–30. https://doi.org/10.1037//1040-3590.12.1.19

22.

Hannan

Lamberts

M. J.

Harmon

Nielsen

S. L.

Smart

D. W.

Shimokawa

Sutton

S. W.

(2005). A lab test and algorithms for identifying clients at risk for treatment failure. Journal of Clinical Psychology, 61(2), 155–163. https://doi.org/10.1002/jclp.20108

23.

Hansen

N. B.

Labert

M. J.

Forman

E. M.

(2002). The psychotherapy dose-response effect and its implications for treatment delivery services. Clinical Psychology Science Practice, 9(3), 329–343. https://doi.org/10.1093/clipsy.9.3.329

24.

Harris

G. T.

Rice

M. E.

(2007). Characterizing the value of actuarial violence risk assessments. Criminal Justice and Behavior, 34(12), 1638–1658. https://doi.org/10.1177/0093854807307029

25.

Jacobson

N. S.

Truax

(1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59(1), 12–19. https://doi.org/10.1037/0022-006X.59.1.12

26.

Jeandarme

Wittouck

Vander Laenen

Pouls

Oei

T. I.

Bogaerts

(2019). Risk factors associated with inpatient violence during medium security treatment. Journal of Interpersonal Violence, 34(17), 3711–3736. https://doi.org/10.1177/0886260516670884

27.

Lilienfeld

S. O.

Ritschel

L. A.

Lynn

S. J.

Cautin

R. L.

Latzman

R. D.

(2013). Why many clinical psychologists are resistant to evidence-based practice: Root causes and constructive remedies. Clinical Psychology Review, 33(7), 883–900. https://doi.org/10.1016/j.cpr.2012.09.008

28.

Lilienfeld

S. O.

Ritschel

L. A.

Lynn

S. J.

Cautin

R. L.

Latzman

R. D.

(2014). Why ineffective psychotherapies appear to work: A taxonomy of causes of spurious therapeutic effectiveness. Perspectives on Psychological Science, 9(4), 355–387. https://doi.org/10.1177/1745691614535216

29.

Lockhart

J. J.

Satya-Murti

(2017). Diagnosing crime and diagnosing disease: Bias reduction strategies in the forensic and clinical sciences. Journal of Forensic Sciences, 62(6), 1534–1541. https://doi.org/10.1111/1556-4029.13453

30.

Miller

D. J.

Spengler

E. S.

Spengler

P. M.

(2015). A meta-analysis of confidence and judgment accuracy in clinical decision making. Journal of Counseling Psychology, 62(4), 553–567. https://doi.org/10.1037/cou0000105

31.

Nagtegaal

M. H.

van der Horst

R. P.

Schonberger

H. J. M.

(2011). Inzicht in de verblijfsduur van tbs-gestelden. [Insight in duration of tbs-order persons]. Boom Juridische uitgevers.

32.

Olver

M. E.

Wong

S. C. P.

(2011). A comparison of static and dynamic assessment of sexual offender risk and need in a treatment context. Criminal Justice and Behavior, 38(2), 113–126. https://doi.org/10.1177/0093854810389534

33.

Papalia

Spivak

Daffern

Ogloff

J. R. P.

(2019). A meta-analytic review of the efficacy of psychological treatments for violent offenders in correctional and forensic mental health settings. Clinical Psychology, Science and Practice, 26(2), 1–28. https://doi.org/10.1111/cpsp.12282

34.

Polaschek

D. L. L.

(2012). An appraisal of the risk-need-responsivity (RNR) model of offender rehabilitation and its application in correctional treatment. Legal and Criminological Psychology, 17(1), 1–17. https://doi.org/10.1016/j.brat.2008.10.018

35.

Schuringa

Heininga

V. E.

Spreen

Bogaerts

(2016). Concurrent and predictive validity of the Instrument for Forensic Treatment Evaluation. International Journal of Offender Therapy and Comparative Criminology, 62(5), 1281–1299. https://doi.org/10.1177/0306624X16676100

36.

Schuringa

Spreen

Bogaerts

(2014). Inter-rater and test-retest reliability, internal consistency, and factorial structure of the Instrument for Forensic Treatment Evaluation. Journal of Forensic Psychology Practice, 14(2), 127–144. https://doi.org/10.1080/15228932.2014.897536

37.

Schuringa

Spreen

Bogaerts

(2018). Voorspellen van intramuraal geweld op korte termijn met het Instrument voor Forensische Behandel Evaluatie (IFBE), ROM-instrument in de tbs voor verschillende doelgroepen. [Predicting short term inpatient violence with the Instrument for Forensic Treatment Evaluation (IFTE), ROM-instrument in the tbs for different target groups.]. Tijdschrift voor Psychiatrie, 60(10), 662–671.

38.

Schuringa

Spreen

Bogaerts

(2019). Inpatient violence in forensic psychiatry: Does change in dynamic risk indicators of the IFTE help predict short term inpatient violence? International Journal of Law and Psychiatry, 66, 1–7. https://doi.org/10.1016/j.ijlp.2019.05.002

39.

Serin

R. C.

Lloyd

C. D.

Helmus

Derkzen

D. M.

Luong

(2013) Does intra-individual change predict offender recidivism? Searching for the holy grail in assessing offender change. Aggression and Violent Behavior, 18(1), 32–53. https://doi.org/10.1016/j.avb.2012.09.002

40.

Skeem

J. L.

Steadman

H. J.

Manchak

S. M.

(2015). Applicability of the Risk-Need-Responsivity model to persons with mental illness involved in the criminal justice system. Psychiatric Services, 66(9), 916–922. https://doi.org/10.1176/appi.ps.201400448

41.

Spengler

P. M.

White

M. J.

Ǣgisdóttir

Maugherman

A. S.

Anderson

L. A.

Cook

R. S.

Nichols

C. N.

Lampropoulos

G. K.

Walker

B. S.

Cohen

G. R.

Rush

J. D.

(2009). The meta-analysis of clinical judgment project. Effects of experience on judgment accuracy. The Counseling Psychologist, 37(3), 350–399. https://doi.org/10.1177/0011000006295149

42.

Spreen

Brand

ter Horst

Bogaerts

(2014). Handleiding HKT-R. [Manual of the HKT-R]. Stichting FPC Dr. S. van Mesdag.

43.

Spreen

Timmerman

M.E.

ter Horst

Schuringa

(2010). Formalizing clinical decisions in individual treatments: Some first steps. Journal of Forensic Psychology Practice, 10, 285–299. https://doi.org/10.1080/15228932.2010.481233

44.

Tasma

Liemburg

E. J.

Knegtering

Delespaul

P. A. E. G.

Boonstra

Castelein

(2017). Exploring the use of Routine Outcome Monitoring in the treatment of patients with a psychotic disorder. European Psychiatry, 42, 89–94. https://doi.org/10.1016/j.eurpsy.2016.12.008

45.

Troquete

N. A. C.

Van den Brink

R. H. S.

Beintema

Mulder

van Os

T. W. D. P.

Schoevers

R. A.

Wiersma

(2013). Risk assessment and shared care planning in out-patient forensic psychiatry: Cluster randomised controlled trial. British Journal of Psychiatry, 202(5), 365–371. https://doi.org/10.1192/bjp.bp.112.113043

46.

van der Veeken

F. C. A.

Lucieer

Bogaerts

. (2016). Routine outcome monitoring and clinical decision-making in forensic psychiatry based on the Instrument for Forensic Treatment Evaluation. PLoS One, 11(8), e0160787. https://doi.org/10.1371/journal.pone.0160787

47.

van der Veeken

F. C. A.

Lucieer

Bogaerts

. (2018). Forensic psychiatric treatment evaluation: The clinical evaluation of treatment progress with repeated forensic routine outcome monitoring measures. International Journal of Law and Psychiatry, 57, 9–16. https://doi.org/10.1016/j.ijlp.2017.12.002

48.

van Marle

H. J. C

. (2002). The Dutch Entrustment Act (TBS): Its principles and innovations. International Journal of Forensic Mental Health, 1(1), 83–92. https://doi.org/10.1080/14999013.2002.10471163

49.

Vess

(2001). Development and implementation of a functional skills measure for forensic psychiatric inpatients. Journal of Forensic Psychiatry, 12(3), 592–609. https://doi.org/10.1080/09585180110092001

50.

Waller

(2009). Evidence-based treatment and therapist drift. Behaviour Research and Therapy, 47(2), 119–127. https://doi.org/10.1016/j.brat.2008.10.018

51.

Waller

Turner

(2016). Therapist drift redux: Why well-meaning clinicians fail to deliver evidence-based therapy, and how to get back on track. Behaviour Research and Therapy, 77, 129–137. https://doi.org/10.1016/j.brat.2015.12.005

52.

Wilson

C. M.

Desmarais

S. L.

Nicholls

T. L.

Hart

S. D.

Brink

(2013). Predictive validity of dynamic factors: Assessing violence risk in forensic psychiatric inpatients. Law and Human Behavior, 37(6), 377–388. https://doi.org/10.1037lhb0000025

53.

Zimmerman

McGlinchey

J. B.

(2008). Why don’t psychiatrists use scales to measure outcome when treating depressed patients? Journal of Clinical Psychiatry, 69(12), 1916–1919. https://doi.org/10.4088/jcp.v69n1209