Abstract
In forensic psychiatry, it is common practice to use unstructured clinical judgment for treatment evaluation. Risk assessment studies have shown that unstructured clinical judgment is unreliable, and the use of instruments is recommended. This paper explores the clinical judgment of change compared with the calculated change using the Instrument for Forensic Treatment Evaluation (IFTE) in relation to changes in inpatient violence. The study shows that the clinical judgment is much more positive about patients’ behavioral changes than the calculated change, and that the calculated change is more in accordance with the change in the occurrence of inpatient violence, suggesting that the calculated change reflects reality more closely than the unstructured clinical judgment. Therefore, it is advisable to use the IFTE as a basis for a structured professional judgment in the treatment evaluation of a forensic psychiatric patient.
Introduction
Treatment of forensic psychiatric patients is most effective in preventing recidivism when the three principles of the Risk-Need-Responsivity (RNR) model are applied (Andrews & Bonta, 2010; Andrews et al., 1990, 2006; Polaschek, 2012). The Risk principle holds that treatment programs must match a patient’s risk level in duration and intensity: high-risk offenders need longer and more intensive treatment than low-risk offenders (Papalia et al., 2019). According to the Need principle, treatment programs must focus on a patient’s specific dynamic criminogenic needs, which contribute to an increased risk of recidivism. Finally, the Responsivity principle states that treatment programs must match the learning ability, motivation, and strengths of the offender and that the treatment used must be evidence-based (Skeem et al., 2015).
The assessment of an offender’s personal risk level and needs was, until the mid-1970s, a matter of subjective judgment by clinicians. Their own insights, intuition, professional opinion, confidence, training, and experience were leading in the assessment (Miller et al., 2015). This was referred to as the first generation of risk assessment (Andrews et al., 2006). Spengler et al. (2009) showed in a meta-analysis that this unstructured clinical judgment frequently led to inaccurate evaluations of the risk of recidivism. The lack of rules, transparency, replicability, consistency, scoring integrity, and accuracy led to criticism of the clinical approach (Harris & Rice, 2007). For instance, with unstructured clinical judgment, important risk factors were overlooked or not considered, too much attention was paid to irrelevant factors, or insufficient weight was assigned to relevant risk factors (Dawes et al., 1989). Therefore, structured (actuarial) risk assessment tools were developed and introduced to tackle the limitations of unstructured clinical judgments, both in the context of legal decision-making and in forensic psychiatric treatment. These instruments are referred to as the second generation of risk assessment (Ægisdóttir et al., 2006; Baird & Stocks, 2013; Cooper et al., 2008). In a meta-analysis of 136 studies in which actuarial predictions were compared with unstructured clinical predictions concerning risk of recidivism, actuarial predictions were found to be more accurate than unstructured clinical predictions in almost half (47%) of the studies (Grove et al., 2000). No differences in predictive accuracy between the two approaches were found in about 47% of the studies, and in a small minority of studies (6%), the unstructured prediction was slightly more accurate. On average, the actuarial prediction of future violence was more accurate than the unstructured clinical prediction by an approximately 10% increase in hit rate (Ægisdóttir et al., 2006).
A shortcoming of actuarial predictions was that only historical or static factors were assessed, which could not be influenced by treatment or time. Therefore, dynamic risk factors or dynamic criminogenic needs were added to the instruments (Douglas & Skeem, 2005). The use of risk assessment instruments that combine dynamic risk factors with static factors led to structured professional judgments of future risk of recidivism: the third generation of risk assessment instruments (Andrews et al., 2006). After evaluating and weighing all risk factors and considering the base rate of recidivism, social factors, and a patient’s individual environmental factors, a final risk level was determined (Bonta & Andrews, 2007). By systematically assessing dynamic criminogenic needs, treatment can be focused on the needs that require it, and the effects of treatment can be evaluated (Belfrage & Douglas, 2002; Olver & Wong, 2011). Letting the outcomes of the systematic assessments of criminogenic needs guide the treatment is referred to as the fourth generation of risk assessment (Andrews et al., 2006).
Besides predicting future risk of violence, multidisciplinary teams of clinicians must decide at certain time points in treatment whether a patient has shown sufficient progress, meaning a decrease in risk factors and an increase in protective factors, to advance in the treatment process, for example through transfer to a less secure ward or unsupervised leave (Wilson et al., 2013). In inpatient forensic psychiatry, treatment teams and individual professionals often make treatment decisions based on their own subjective assessments without the support of systematically collected data that might be available (Bosker et al., 2013; Day et al., 2017).
This is also a problem in general mental health care, where clinicians also often do not use available routine outcome monitoring (ROM) data for treatment evaluation (Tasma et al., 2017; Zimmerman & McGlinchey, 2008). In general mental health care, studies are available on the limitations of unstructured clinical judgment of treatment progress (Bell & Mellor, 2009; Lilienfeld et al., 2013). The most serious ones are the lack of reliability, transparency, and replicability of unstructured clinical judgments. Clinicians using unstructured judgments fail to observe deterioration or report improvement while there is none (Hannan et al., 2005; Lilienfeld et al., 2014). These inaccurate evaluations can have a negative effect on the patient-professional working alliance and can have negative effects on the well-being of the patient. Several reasons have been reported for biases of unstructured clinical judgments. Clinicians may focus on a limited number of factors and/or on irrelevant data (Lockhart & Satya-Murti, 2017; Waller, 2009), overestimate the value of their experience (Hannan et al., 2005), or receive limited or no feedback on their judgments (Dawes et al., 1989). Therefore, using validated routine outcome monitoring tools is highly recommended to support and improve the accuracy of clinical decisions (Hansen et al., 2002; Lilienfeld et al., 2014; Waller & Turner, 2016).
In forensic psychiatry, an inaccurate positive or negative evaluation of change in risk and protective factors can have profound consequences for the patient, fellow patients, personnel, or society. For example, clinicians who wrongly decide that a forensic patient has changed positively, such as an incorrect judgment of a decrease in offence related risk factors, will grant the patient more responsibilities and more freedom than the patient can manage. This in turn can contribute to an increased risk of inpatient violence and/or violence after treatment. Inpatient violence is a serious problem within forensic psychiatry (Dack et al., 2013; Schuringa et al., 2018), associated with recidivism after discharge (Daffern et al., 2007), and negatively associated with treatment adherence (Jeandarme et al., 2019).
In sum, for predicting the risk of violence there is consensus in forensic psychiatry about the benefits of the use of structured assessment instruments. In general mental health care, there is the growing realization that ROM-instruments are beneficial for treatment evaluation purposes. However, in forensic psychiatry little attention has been given to structured treatment evaluations. The Instrument for Forensic Treatment Evaluation (IFTE; Schuringa et al., 2014) is especially designed for multidisciplinary treatment evaluation purposes in forensic psychiatry. Earlier studies have shown that the IFTE can be used to predict short-term and longer-term inpatient violence for forensic patients in high security institutions (Schuringa et al., 2016, 2019; van der Veeken et al., 2016, 2018).
This paper aims to compare the structured and unstructured judgment of change during forensic psychiatric treatment. Firstly, this study explores whether there is a difference between the treatment evaluation based on the average change in team scores on the IFTE (hereinafter calculated change [CalCh]) and the unstructured clinical judgment of change of the main clinician (ClinJCh). Secondly, this study compares CalCh and ClinJCh in relation to the change in occurrence of inpatient violence over the same period.
Methods
Setting
The study is set at Forensic Psychiatric Centre (FPC) Dr. S. van Mesdag, a maximum-security institution in the Netherlands for mentally disordered male offenders hospitalized under the Dutch entrustment act (tbs-order). A tbs-order is a “provision in the Dutch criminal code that allows for a period of treatment following a prison sentence for mentally disordered offenders” (van Marle, 2002, p. 83). Treatment under a tbs-order is not considered an additional punishment as such, but a measure to protect society. Every 1 or 2 years, a tbs-order must be evaluated by a court based on the information about treatment progress provided by the clinicians. The judge decides whether to prolong the order based on this information and the assessed risk of recidivism.
The IFTE data in this study are extracted from the Routine Outcome Monitoring (ROM) system of FPC Dr. S. van Mesdag. The period covered is from October 2016 until April 2019. The inclusion criteria for this study are that at least three team members completed the IFTE at two sequential measurement time points 4 to 8 months apart, and that the main clinician answered the clinical judgment question about whether the patient had changed at the second measurement.
Instrument
The IFTE (Schuringa et al., 2014) consists of all 14 clinical (dynamic) items of the Dutch risk assessment instrument HKT-R (Historical, Clinical, Future – Revised; Spreen et al., 2014), three items inspired by the Atascadero Skills Profile (Vess, 2001), and five items designed in collaboration with clinicians of the institution. All 22 items describe observable behaviors and are divided into three factors: Protective behavior, Problematic behavior, and Resocialization skills (see Table 1).
Table 1. Factors and Item Descriptions of the IFTE. Note. Item sources: from HKT-R; inspired by ASP; designed with clinicians.
Distinctive features of the IFTE are its 17-point rating scale, which contributes to enhanced sensitivity for measuring change (Serin et al., 2013), and its multidisciplinary use. The IFTE is completed independently by all members of a multidisciplinary treatment team before the treatment evaluation meeting, every 6 months. The scoring takes about 10 minutes. Before filling out the IFTE, the main clinician gives his/her clinical judgment of whether the behavior of a patient has changed by answering the question: “Has the patient changed in this last period?” A 13-point scale with four anchor points is used: 0 = “worsened,” 1 = “no change,” 2 = “a little improved” and 3 = “a lot improved.” Main clinicians in this institution are the coordinators of the treatment and are mostly (clinical) psychologists. The information is displayed in a treatment evaluation report, which shows the average team score per item and per factor, a measurement of change (Spreen et al., 2010), and a team agreement index per item (Gower & Legendre, 1986). The agreement index (0.00 is no agreement, 1.00 is total agreement) indicates whether the behavior is consistently observed in different situations by different therapists. The three factors of the IFTE show moderate to good inter-rater reliability (Cronbach’s alphas range from .50 to .92), test-retest reliability (alphas range from .57 to .92), good internal consistency (range from .81 to .90), and modest to good concurrent validity. The factor Problematic behavior has good predictive validity for drug use (Cohen’s d = 1.47), and for inpatient violence with different diagnostic target groups (AUC = .77, CI: [0.70–0.85]) (Schuringa et al., 2014, 2016, 2018; van der Veeken et al., 2016).
Independent Variables
The three factors of the IFTE are used as independent variables (see Table 1). The primary treatment goal of the patient is also used as an independent variable. This variable is determined by taking the primary treatment goal from the treatment evaluation report of the second measurement and translating this goal into a corresponding IFTE item. In this way, each patient’s personal and current criminogenic need is operationalized.
Outcome Measure
Inpatient violence is the outcome measure and is defined as any behavior that intentionally could or did physically harm a person or animal, and/or a form of aggression that is extremely intimidating or threatening (Troquete et al., 2013). Inpatient violence is determined per measurement by scoring the presence (1) or absence (0) of violent acts reported in the treatment evaluation report. The reporting of violence was too poor to differentiate the severity and frequency of the violent acts. The change in violence is computed as the difference in the presence or absence of violent acts between the two measurements, resulting in three categories: less violence, no change, and more violence. No change means there is either violence or no violence at both measurements.
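The three-category coding described above can be illustrated with a minimal sketch. The function name `violence_change` and the boolean inputs are hypothetical and are not taken from the study's own data pipeline.

```python
def violence_change(violent_t1: bool, violent_t2: bool) -> str:
    """Classify the change in inpatient violence between two measurements.

    Inputs are the presence (True) or absence (False) of violent acts
    at measurement 1 and measurement 2.
    """
    if violent_t1 and not violent_t2:
        return "less violence"
    if not violent_t1 and violent_t2:
        return "more violence"
    # Violence at both measurements, or at neither, counts as no change.
    return "no change"


print(violence_change(True, False))   # less violence
print(violence_change(True, True))    # no change
print(violence_change(False, True))   # more violence
```

Note that under this coding a patient who is violent at both measurements falls in the same category as a patient who is violent at neither.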
Calculated Change (CalCh)
The CalCh is computed as the difference between the average team scores, including the score of the main clinician, of two sequential measurements on the IFTE. The reliable change index (RCI) is applied to express the degree of change in observed behavior between the two measurements (Jacobson & Truax, 1991). The RCI indicates whether the change of a patient is statistically reliable. In this study three categories of change are defined: the behavior of a patient can be improved (RCI ≥ 1.96), not changed (−1.96 < RCI < 1.96) or worsened (RCI ≤ −1.96). The CalCh is calculated for the three factors and the IFTE item representing the primary treatment goal.
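As an illustration of the classification above, the following sketch uses the standard Jacobson and Truax formulation, in which the difference score is divided by the standard error of the difference. The baseline standard deviation, the reliability coefficient, and the example scores are hypothetical values chosen for illustration; the exact parameters used in this study are not stated here. The sketch also assumes that scores are oriented so that a higher score means improvement; for a factor such as Problematic behavior the sign would need to be reversed.

```python
import math


def reliable_change_index(score_t1: float, score_t2: float,
                          sd_baseline: float, reliability: float) -> float:
    """RCI = (x2 - x1) / S_diff, with S_diff = sqrt(2) * SE
    and SE = SD * sqrt(1 - reliability) (Jacobson & Truax, 1991)."""
    standard_error = sd_baseline * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0) * standard_error
    return (score_t2 - score_t1) / s_diff


def classify_change(rci: float, threshold: float = 1.96) -> str:
    """Map an RCI onto the three categories used in the text."""
    if rci >= threshold:
        return "improved"
    if rci <= -threshold:
        return "worsened"
    return "not changed"


# Hypothetical average team scores on one factor at two measurements.
rci = reliable_change_index(8.0, 11.0, sd_baseline=3.0, reliability=0.90)
print(round(rci, 2), classify_change(rci))  # 2.24 improved
```

With these assumed values, a 3-point increase is classified as reliable improvement, whereas a 2-point increase (RCI of about 1.49) would fall in the "not changed" category.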
Clinical Judgment of Change (ClinJCh)
The clinical judgment of change is determined by categorizing the score on the 13-point scale for the question “Has the patient changed?”, filled out at the second measurement by the main clinician, into three categories: worsened (0–2), not changed (3–5), improved (6–13).
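The categorization above amounts to a simple binning rule, sketched below. The cut-offs are those given in the text; the function name and the range check are illustrative assumptions.

```python
def categorize_clinical_judgment(score: int) -> str:
    """Bin the main clinician's scale score into three categories,
    using the cut-offs stated in the text (0-2, 3-5, 6-13)."""
    if not 0 <= score <= 13:
        raise ValueError("score must lie between 0 and 13")
    if score <= 2:
        return "worsened"
    if score <= 5:
        return "not changed"
    return "improved"


print([categorize_clinical_judgment(s) for s in (1, 4, 9)])
# ['worsened', 'not changed', 'improved']
```

The bins are not equally wide: the "improved" category covers more scale points than the other two combined.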
Statistical Analysis
To investigate the correspondence between the ClinJCh of the main clinician and the CalCh of the behavior observed by the team on the three IFTE factors and the primary treatment goal, the data are displayed in cross tables and percentages of corresponding judgments are calculated. McNemar tests are used to determine whether there is a systematic difference between the CalCh and ClinJCh.
The frequency of correspondence of the ClinJCh with change in violence is compared to the correspondence of CalCh with changes in violence. This results in a 2 × 2 table and a McNemar test is used to determine the difference between the agreements of CalCh and ClinJCh with change in violence.
Results
Sample
The sample for this study consisted of 119 men, with an average age of 36.7 years at intake (SD = 9.3; range 19–70), and a mean duration in the hospital of 42.1 months at measurement 1 (SD = 35.6; range 0–190). Thirty-nine percent of the patients had a main diagnosis of schizophrenia spectrum disorder, and 29% had a personality disorder, of whom 13 patients had an antisocial personality disorder and six had a borderline personality disorder. Seventeen percent of the patients had a neurodevelopmental disorder (e.g., ADHD or autism spectrum disorder), 6% were diagnosed with a paraphilic disorder, 5% with a drug-related diagnosis and 4% with another diagnosis (DSM-IV-TR; Diagnostic and statistical manual of mental disorders [4th ed., text rev.], American Psychiatric Association, 2000). The patients had been convicted of theft (1%), medium violence (11%), theft accompanied by violence (10%), severe violence (23%), a sexual crime (19%), manslaughter (16%), arson (10%), or murder (10%). There were 25 different main clinicians, who each evaluated on average 4.8 patients (SD = 3.5; range 1–14).
Statistical Analysis
Table 2 shows the results of ClinJCh and CalCh for the three factors. For the factor Protective behavior in 34% of the judgments (N = 41), the calculated change and clinician judgment of change matched. Of the 78 patients for whom no agreement was found between CalCh and ClinJCh of the factor Protective behavior, judgments about the direction of behavioral change differed significantly. In 70 cases, the ClinJCh of the main clinicians was more positive than the CalCh of the team. In only eight cases was the CalCh more positive than the ClinJCh (χ2 (1) = 49.28, p < .001).
Cross Table of the ClinJCh and CalCh on the Three Factors.
The ClinJCh was also significantly more positive about the behavioral change on the factor Problematic behavior (χ2 (1) = 45.21, p < .001; see Table 2). In 68 of the 77 cases (88%) with disagreements, the clinical judgment reported a more positive treatment development than the calculated change. In 35% of the cases, the ClinJCh was similar to the CalCh (see diagonal in Table 2). Concerning the factor Resocialization skills, also a significant difference in agreement between the clinical judgment and calculated change was observed (χ2 (1) = 49.28, p < .001; see Table 2). Like the two other IFTE factors, the ClinJCh of the main clinicians was more positive about the progress of the treatment. Of the 83 disagreements, 72 (87%) were evaluated more positively by the main clinicians. Agreement was found in 30% of the cases (see diagonal in Table 2). Regarding the direction of the progress of the individualized IFTE treatment goals, the unstructured clinical judgment corresponded in 32% of the cases with the calculated change (see Table 3). The ClinJCh was significantly more positive than the CalCh (χ2 (1) = 44.83, p < .001).
Cross Table of ClinJCh and CalCh on Treatment Goal.
In summary, in 30% to 35% of the cases, the clinical judgment of the main clinician and the calculated change of the team agreed about the direction of the patient’s behavioral change. In 81% to 90% of the cases where ClinJCh and CalCh disagreed, the clinical judgment of the main clinician was more positive about the progress of the patient than the calculated change in the behavior observed by the whole team.
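The McNemar statistics for the factors Protective behavior and Problematic behavior can be reproduced from the discordant counts given in the text (70 versus 8, and 68 versus 77 − 68 = 9). The sketch below assumes the uncorrected form of the test, which matches these two reported values; the function names are illustrative and the p-value uses the closed form for a chi-square distribution with one degree of freedom.

```python
import math


def mcnemar_chi2(b: int, c: int) -> float:
    """Uncorrected McNemar statistic from the two discordant counts."""
    return (b - c) ** 2 / (b + c)


def chi2_df1_p_value(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# b = clinical judgment more positive, c = calculated change more positive.
protective = mcnemar_chi2(70, 8)
problematic = mcnemar_chi2(68, 9)

print(round(protective, 2), chi2_df1_p_value(protective) < 0.001)    # 49.28 True
print(round(problematic, 2), chi2_df1_p_value(problematic) < 0.001)  # 45.21 True
```

Only the discordant pairs enter the statistic, which is why the test addresses the direction of disagreement rather than the overall level of agreement.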
The second question of this study was whether the clinical judgment or the calculated change corresponded more closely with the actual change in inpatient violence. The number of patients committing violence at the first measurement was 39 (33%) and at the second measurement it was 36 patients (31%). Twenty-five patients (21%) changed in violent behavior, either from non-violent to violent (N = 11) or from violent to non-violent (N = 14). In total, 50 patients (42%) showed violence at one or both measurements. Table 4 shows that there is a significant difference between the proportion of cases in which the ClinJCh on the factor Problematic behavior was equal to the change in violence (26% [31/118] of the cases) and the proportion in which the CalCh was equal to the change in violence (77% [91/118] of the cases; χ2 (1) = 54.39, p < .001, OR = 31.00 [95% CI: 8.23–261.66]).
Comparison of CalCh with Change in Violence to ClinJCh with Change in Violence.
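The reported test statistic can be checked against the reported proportions with a short sketch. The discordant counts are not given in the text; they are inferred here under two assumptions: that the 31 and 91 matches come from the same 118 paired cases, and that the reported odds ratio is the ratio of the two discordant counts. The sketch further assumes that a continuity correction was applied, since that is the form that reproduces the reported value. All function names are illustrative.

```python
def infer_discordant_pairs(matches_a: int, matches_b: int,
                           odds_ratio: float) -> tuple[float, float]:
    """Solve b - c = matches_a - matches_b and b / c = odds_ratio.

    b = cases where only method A matched the outcome,
    c = cases where only method B matched the outcome.
    """
    difference = matches_a - matches_b
    c = difference / (odds_ratio - 1.0)
    b = odds_ratio * c
    return b, c


def mcnemar_chi2_corrected(b: float, c: float) -> float:
    """McNemar statistic with continuity correction."""
    return (abs(b - c) - 1.0) ** 2 / (b + c)


# A = CalCh (91 matches), B = ClinJCh (31 matches), reported OR = 31.00.
b, c = infer_discordant_pairs(91, 31, 31.0)
print(b, c)                                    # 62.0 2.0
print(round(mcnemar_chi2_corrected(b, c), 2))  # 54.39
```

Under these assumptions, the calculated change alone matched the change in violence in 62 cases, against 2 cases in which only the clinical judgment matched.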
Discussion
This study investigated the agreement between the unstructured clinical judgment by the main clinician (ClinJCh) and the calculated change (CalCh) by the multidisciplinary team of the behavioral change of forensic psychiatric patients. Agreements between the calculated change and clinical judgment of change for the three IFTE factors (Protective behavior, Problematic behavior, and Resocialization skills), and an individualized treatment goal were studied. This study also compared the correspondence in agreement of the two methods of judgment of change with actual change in occurrence of violence. The results showed that the clinical judgment of change only matched the calculated change on the factors of the IFTE and the treatment goal in about 30% to 35% of the cases. The clinical judgment assessed significantly more positive change in behavior than the calculated change.
Since the main goal of treatment in a forensic psychiatric center is the reduction of risk of violence (Andrews et al., 1990; van Marle, 2002), factors strongly associated with that risk should be the focus of treatment, and therefore the focus of change as judged by the main clinician. One would expect a significant relation between the clinical judgment of change and the calculated change on the factor Problematic behavior. The factor Problematic behavior of the IFTE consists of well-known risk factors (Andrews & Bonta, 2010; Andrews et al., 2006) and has shown good predictive validity for inpatient violence (Schuringa et al., 2016; van der Veeken et al., 2016). However, in this study, the clinical judgment of change was much more positive about the change on Problematic behavior than the calculated change. A reason for this could be the cross-sectional sample of this study, in which treatment duration ranged from 0 to 190 months. The mean duration of a tbs-order treatment is approximately 8 years (Nagtegaal et al., 2011). The most problematic behavior is expected at the beginning of treatment, while at the end of treatment the focus would be on resocialization skills. It is possible that the focus of the clinical judgment had also shifted from Problematic behavior to Protective behavior or Resocialization skills in accordance with the progress a patient had made in treatment. In this study, however, there was also little agreement between clinical judgment of change and calculated change on Protective behavior (34%) or Resocialization skills (30%). If the main clinician did not have one of the three factors of the IFTE in mind when answering the question of whether the patient had changed, then he/she may have been focused on the individual treatment goal of the patient. However, when this study focused on the specific individual treatment goal of a patient as reported in treatment evaluations, little agreement was found there as well between clinical judgment of change and calculated change (32%).
This study also explored which judgment of change, the unstructured or the structured one, was more in accordance with behavioral changes. The calculated change on the factor Problematic behavior was much more in agreement with the actual change in occurrence of violence than the clinical judgment. This strongly indicates that the calculated change was a more accurate representation of changes in violent behavior than the clinical judgment of change, and that the clinical judgment was thus too positive about change. This is in line with studies in general mental health care which concluded that clinicians often fail to detect deterioration and over-report improvements (Hannan et al., 2005; Lilienfeld et al., 2014). In forensic psychiatry, an overly positive judgment of a patient’s change can result in very adverse outcomes, such as violence. If the main clinician wrongly judges a patient’s progress as positive, he/she may adjust the risk management plan accordingly, which can result in too many responsibilities and too much freedom for the patient. This mismatch can overtax the patient’s coping skills, which can then lead to an increased risk of violence. Treatment evaluations based solely on clinical judgments are therefore not recommended.
Results from this study suggest that in forensic psychiatric treatment evaluation, more emphasis should be placed on results based on structured observations made by the multidisciplinary team. No matter which topic was studied (problematic behavior, protective behavior, resocialization skills, or individual treatment goals), the main clinician was much more positive about the change than the instrument-based team scores. This does not mean that the clinical judgment is not useful, since there was no perfect accordance between calculated change and change in violent behavior, but treatment decisions should not be made based solely on the clinical judgment.
In an earlier study (Schuringa et al., 2018), the factor Problematic behavior could be used to classify patients into a high-risk group for short-term violence and a low-risk group, but in this high-risk group only about 50% of the patients committed violence. A similar percentage is found for the HKT-R (Spreen et al., 2014). Although instruments are undoubtedly useful in forensic psychiatric treatment, they are not perfect, and one should not rely solely on instruments. A clinical judgment based on instrument-based data is recommended, to reduce various biases of clinical judgments (Bell & Mellor, 2009; Lilienfeld et al., 2013) and at the same time overcome the limitations that come with data-driven decisions, such as unrelated or insufficient data (Chin-Yee & Upshur, 2018; Lockhart & Satya-Murti, 2017). Just as with third-generation risk assessments, ROM instruments should be used as a basis for a structured professional judgment about treatment progress, or in other words, to use the best of both worlds.
In the RNR-model, the use of instruments to determine Risk and Needs is already widespread practice, but less attention has been given to the use of instruments to determine Responsivity to the treatment (Duwe & Kim, 2018). This study showed that using an instrument, instead of solely the clinical judgment, to establish changes in behavior, and thus structurally monitoring the responsivity of the patient to the treatment, is just as beneficial as using instruments for assessing risk and needs. Compliance with the principles of the RNR-model, although widely accepted as beneficial, is not as common as one would expect (Bonta et al., 2008). If the assessment of responsivity is also performed in a more structured manner, this might help clinicians comply with the RNR-model more often.
Strengths and Limitations
A limitation, but also a strength, of this study is its naturalistic design of data collection. For several reasons, such as time management problems or priority issues, one must deal with missing measurements when using data from everyday practice, but with regular and long-lasting measurements most patients are still represented in the data for this study. The sample used in this study is heterogeneous in diagnosis, age, treatment duration, and committed crimes, which could have an impact on the outcome measure violence. However, in an earlier study (Schuringa et al., 2018) these variables did not influence the predictive power of the factor Problematic behavior, so they would probably also have little to no effect in this study.
The clinical judgment question, “has someone changed?”, did not specify in which behavior a given patient had changed. It is possible that the main clinician was thinking about behavior in which the patient had made substantial changes but which is less relevant for risk assessment purposes, for example, his cleaning and eating habits. For future research, the question should be more specific, for instance: “Did the patient change on his main treatment goal?” and “Is the patient at risk for violence in the near future?”
The outcome measure inpatient violence was scored as either present or absent because the severity and frequency of violence could not be determined, owing to missing and/or incomplete reporting of violence in the treatment evaluation reports. Comparing changes in the frequency and severity of violence could yield more fine-grained results. Better reporting of this adverse outcome is therefore advisable. In addition, this study was performed in a single institution. Although it is one of the largest forensic psychiatric treatment institutions in the Netherlands and has a diagnostically diverse population, generalizations to other institutions should be made with care.
Conclusion
This study showed that the unstructured clinical judgment of the main clinician is usually more positive about the change a given patient has demonstrated than the calculated change based on the IFTE. The calculated change on the factor Problematic behavior was more in line with the actual change in occurrence of violence than the clinical judgment, but not perfectly. Therefore, using the structured assessment of the IFTE as a basis, in combination with the clinical judgment, is recommended for decision making in forensic psychiatric treatment evaluation. Our advice is: Use the best of both worlds.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
