Abstract
Although the Juvenile Sex Offender Assessment Protocol–II (J-SOAP-II) and the Structured Assessment of Violence Risk in Youth (SAVRY) include an emphasis on dynamic, or modifiable, factors, there has been little research on dynamic changes on these tools. To help address this gap, we compared admission and discharge scores of 163 adolescents who attended a residential, cognitive-behavioral treatment program for sexual offending. Based on reliable change indices, one half of youth showed a reliable decrease on the J-SOAP-II Dynamic Risk Total Score and one third of youth showed a reliable decrease on the SAVRY Dynamic Risk Total Score. Contrary to expectations, decreases in risk factors and increases in protective factors did not predict reduced sexual, violent nonsexual, or any reoffending. In addition, no associations were found between scores on the Psychopathy Checklist: Youth Version and levels of change. Overall, the J-SOAP-II and the SAVRY hold promise in measuring change, but further research is needed.
Although adolescents who sexually offend are sometimes assumed to indefinitely pose a high risk to the public, adolescents’ risk can change substantially over time. Some youth show reduced risk and desistance from offending as a result of effective interventions (Letourneau, Henggeler, et al., 2013; Worling, Litteljohn, & Bookalam, 2010) or developmental maturation (Moffitt, 1993; Monahan, Steinberg, Cauffman, & Mulvey, 2013; Sweeten, Piquero, & Steinberg, 2013). Other youth continue to offend, engaging in increasingly severe forms of offending as they age (Loeber, Farrington, Stouthamer-Loeber, & White, 2008).
Given that risk may fluctuate, many widely used adolescent risk assessment tools, such as the Juvenile Sex Offender Assessment Protocol–II (J-SOAP-II; Prentky & Righthand, 2003) and the Structured Assessment of Violence Risk in Youth (SAVRY; Borum, Bartel, & Forth, 2006), were developed with an emphasis on dynamic risk factors (Vincent, Terry, & Maney, 2009). Dynamic risk factors are modifiable factors (e.g., anger management difficulties, limited parental supervision) that may change as a result of intervention, development, or life events. In contrast, historical factors, such as past offending, cannot be undone once they have occurred.
Despite risk assessment tools’ stated emphasis on dynamic factors, little research has been conducted on changes in risk. Thus, we examined the ability of the J-SOAP-II and the SAVRY to measure reliable change over the course of treatment (i.e., from admission to discharge) and whether adolescents who improved were less likely to reoffend. We also tested whether adolescents with psychopathic features showed lower levels of improvement during treatment than other adolescents.
Use of the J-SOAP-II and the SAVRY to Measure Change
The J-SOAP-II and the SAVRY are among the most widely used adolescent risk assessment tools (McGrath, Cumming, Burchard, Zeoli, & Ellerby, 2010; Viljoen, McLachlan, & Vincent, 2010). The J-SOAP-II was designed to assess risk for sexual and nonsexual reoffending among adolescents who have sexually offended (Prentky & Righthand, 2003). Although the SAVRY is not designed specifically for adolescents who have sexually offended (Borum et al., 2006), it may be relevant to this population as adolescents who have committed sexual offenses have some similarities to adolescents who have committed nonsexual offenses (e.g., antisocial attitudes and traits; Seto & Lalumière, 2010). Furthermore, adolescents who have committed sexual offenses are more likely to reoffend with nonsexual crimes (e.g., assaults, property crimes) than with sexual crimes (Caldwell, 2010), suggesting that tools like the SAVRY may be useful.
The J-SOAP-II and the SAVRY both include dynamic factors. On the J-SOAP-II, approximately half of the items are purported to be dynamic factors (i.e., 12/28 items = 43%), including all of the items on the Intervention scale (e.g., empathy) and the Community Stability/Adjustment scale (e.g., management of sexual urges). On the SAVRY, approximately two thirds of the items are dynamic (i.e., 20/30 items = 67%), including items on the Social/Contextual section (e.g., peer delinquency), the Individual/Clinical section (e.g., anger management problems), and the Protective Factors section (e.g., prosocial involvement).
To date, numerous studies have been conducted to examine the predictive validity of the J-SOAP-II and the SAVRY. Across individual studies, the predictive validity of the J-SOAP-II is mixed (Hempel, Buck, Cima, & van Marle, 2013; Viljoen, Mordell, & Beneteau, 2012). However, when aggregated across studies, the J-SOAP-II total scores show a moderate ability to predict sexual and nonsexual reoffending (weighted area under the curve [AUC] = .67 and .66, respectively; Viljoen, Mordell, & Beneteau, 2012). Similarly, the SAVRY Risk Total Scores show moderate effect sizes in predictions of violent and general reoffending (weighted r = .30 and .32, respectively; Olver, Stockdale, & Wormith, 2009; see also Singh, Grann, & Fazel, 2011).
The dynamic sections on the J-SOAP-II and the SAVRY have also been found to predict reoffending (Guy, 2008; Viljoen, Mordell, & Beneteau, 2012; Vincent, Chapman, & Cook, 2011). However, we know very little about the ability of the J-SOAP-II and the SAVRY to measure changes in reoffense risk. This is because in the vast majority of studies, researchers have administered tools at a single time point. In one of the few studies to examine change, youth showed greater improvements on the J-SOAP-II when the treatment dose was moderate than when it was low or high (Rehfuss et al., 2013). In addition, in a conference presentation, Hilterman (2014) found different trajectories of change on the SAVRY, with some youth increasing and other youth decreasing in risk.
Given that in these two studies researchers focused mainly on the process of change rather than the ability of the J-SOAP-II and the SAVRY to measure change, a couple of key questions remain. First, can the J-SOAP-II and the SAVRY be used to reliably measure change? That is, do raters show adequate interrater reliability in assessing change? Second, when can we conclude that a reliable change has occurred on these tools? For instance, if an adolescent scores a couple of points lower on the J-SOAP-II or the SAVRY, this could simply be due to measurement error, as no tool has perfect reliability. To examine this, we used reliable change indices (RCIs) to estimate reliable or true change after taking into account measurement error (Jacobson & Truax, 1991). Although many scholars recommend increased use of RCIs (Duff, 2012; Marsden et al., 2011; Stein, Luppa, Brähler, König, & Riedel-Heller, 2010; Wise, 2004), as of yet, few studies have been conducted examining RCIs in the context of violence risk assessment (i.e., Draycott, Kirkpatrick, & Askari, 2012; Kroner & Yessine, 2013; Viljoen, Beneteau, et al., 2012).
Changes in Risk Ratings and Reoffending
If the J-SOAP-II and the SAVRY are able to adequately capture changes in risk, one might expect that decreases in risk scores during treatment are predictive of lower rates of reoffending. Although this research question has not yet been explored in adolescent samples, a number of researchers have tested associations between changes in scores on risk assessment tools and reoffending in adult offenders. In one of the first studies, Olver, Wong, Nicholaichuk, and Gordon (2007) found small inverse associations between changes in scores on the Violence Risk Scale–Sexual Offender Version (VRS-SO; Wong, Olver, Nicholaichuk, & Gordon, 2003) and reoffending. This association did not reach significance in the full sample (r = −.09) or for the low-risk group (r = .01), but was significant for high-risk offenders (r = −.15, p < .05). In other words, high-risk offenders who showed greater reductions in risk scores were less likely to reoffend (see also Olver, Nicholaichuk, Kingston, & Wong, 2014). For low-risk offenders, the level of improvement may not matter as much as it does for high-risk offenders because low-risk offenders are already relatively unlikely to reoffend.
In several additional studies with the VRS-SO and other tools (e.g., VRS [Wong & Gordon, 2006]; Level of Service Inventory–Revised [Andrews & Bonta, 1995]), researchers have also reported small inverse correlations between change scores and reoffending (Beggs & Grace, 2011; Labrecque, Smith, Lovins, & Latessa, 2014; Lewis, Olver, & Wong, 2013; Olver et al., 2014; Vose, Lowenkamp, Smith, & Cullen, 2009; Vose, Smith, & Cullen, 2013). In other studies, however, reductions in risk factors have not translated into reduced reoffending (Barnett, Wakeling, Mandeville-Norden, & Rakestrow, 2012, 2013; Bowen, Gilchrist, & Beech, 2008; Goodman-Delahunty & O’Brien, 2014; Kroner & Yessine, 2013; Serin, Gobeil, & Preston, 2009; Woessner & Schwedler, 2014).
In part, these nonsignificant findings could be due to methodological issues, such as small sample sizes. Beyond this, the inconsistent results could suggest that some tools do a better job than others at capturing changes related to reoffense risk. To add to this research, the current study is the first to focus on the relationship between changes on adolescent risk assessment tools and reoffending. Furthermore, it is one of the few studies in which changes in protective factors were examined.
Psychopathic Features and Changes in Risk Ratings
A final area of focus in the present study is the relationship between change scores and psychopathic features. Psychopathy is a set of traits that is characterized by callousness toward others, limited capacity to experience emotions, and impulsiveness (Skeem, Polaschek, Patrick, & Lilienfeld, 2011). Youth with psychopathic features show higher rates of offending than do other youth (Edens, Campbell, & Weir, 2007). Furthermore, they often show limited treatment compliance (Falkenbach, Poythress, & Heide, 2003; O’Neill, Lidz, & Heilbrun, 2003) and a diminished response to treatment (Manders, Deković, Asscher, van der Laan, & Prins, 2013; O’Neill et al., 2003).
That said, treatment appears to be more effective for these youth than incarceration (Caldwell, Skeem, Salekin, & Van Rybroek, 2006). Also, youth with psychopathic features appear to respond positively to certain forms of treatment (Salekin, Worley, & Grimes, 2010). Multisystemic Therapy, for instance, is associated with significant decreases in parent-reported psychopathic features (Butler, Baruch, Hickey, & Fonagy, 2011). Functional Family Therapy has been found to result in improved behavioral, emotional, and social adjustment in youth with callous-unemotional features (White, Frick, Lawing, & Bauer, 2012). In another study, adolescents with psychopathic features responded to a brief 12 session intervention that focused on motivational and cognitive-behavioral elements (Salekin, Tippey, & Allen, 2012).
Typically, researchers have measured the success of treatment via an examination of reoffense rates or changes in symptoms of psychopathy or conduct disorder (e.g., Butler et al., 2011; White et al., 2012). However, risk assessment tools may also provide a useful indicator of treatment-related improvement (see Olver, Lewis, & Wong, 2013). Thus, in the present study, we examined whether adolescents with psychopathic features demonstrate smaller reductions in risk factors and smaller gains in protective factors during treatment than other adolescents.
Present Study
Although the J-SOAP-II and the SAVRY were designed to measure change, there is, as of yet, little research on their ability to do so. Thus, we examined the interrater reliability of ratings of change on the J-SOAP-II and the SAVRY, the proportion of youth who showed reliable change in J-SOAP-II and SAVRY risk scores during residential cognitive-behavioral treatment (CBT), and whether improvements (i.e., reduced risk scores and increased protective scores) were associated with lower reoffense rates. In addition, we examined whether youth with psychopathic features were less likely to show treatment-related improvements.
It was predicted that the J-SOAP-II and the SAVRY would show adequate reliability for measuring change. Given that CBT is associated with significant reductions in sexual reoffending when compared with treatment as usual (OR = 0.59; Reitzel & Carbonell, 2006), it was hypothesized that adolescents attending the program would show reductions in risk scores and increases in protective factors from admission to discharge. Consistent with adult studies, it was expected that, after controlling for risk level, adolescents who improved would be less likely to commit sexual and nonsexual reoffenses. Finally, it was hypothesized that adolescents low in psychopathic features would show more improvement than those high in these features.
Method
Participants
Potential participants included all of the 169 male adolescents who were discharged between January 1993 and December 2004 from a nonsecure residential sex offender treatment program in a medium-sized, Midwestern American city, namely, the Whitehall Psychiatric Residential Program in Lincoln, Nebraska. To be included in the present study, youth had to have remained in the program for a sufficiently long period of time (i.e., 30 days or more) that they had an opportunity to show change on the risk assessment tools. Six youth were omitted because they were discharged less than 30 days after admission, resulting in a sample size of 163.
The mean age of the youth at the time of admission was 15.39 years (SD = 1.50). Although a large majority of the youth were non-Hispanic Caucasian (82.8%, n = 135), a small proportion were African American (8.6%, n = 14), Hispanic (4.9%, n = 8), American Indian or Alaskan Native (1.2%, n = 2), or biracial (2.5%, n = 4). The length of time youth spent in the program ranged from 31 days to 4.07 years; the mean number of years in the program was 1.13 (SD = 0.67). Youth had committed a variety of sexually abusive behaviors (i.e., index offenses) that led to treatment, including genital penetration (36.8%, n = 60), anal penetration (35.0%, n = 57), oral–genital contact (48.5%, n = 79), fondling (62.0%, n = 101), and exhibitionism (14.1%, n = 23).
In most cases, youth had at least one index offense victim who was three or more years younger than the youth (86.7%, n = 137). Approximately half of the youth had index offenses against female-only victims (46.3%, n = 74), 24.4% (n = 39) had male-only victims, and 29.4% (n = 47) had both female and male victims. Many of the victims were related to the offender (71.9%, n = 115). Approximately half of the youth had committed prior sexual offenses (50.9%, n = 83) or were charged or convicted for nonsexual offenses (51.0%, n = 80).
This sample of youth has been included in previous research on risk and protective factors (Elkovitch, Viljoen, Scalora, & Ullman, 2008; Latzman, Viljoen, Scalora, & Ullman, 2011; Spice, Viljoen, Latzman, Scalora, & Ullman, 2013; Viljoen, Elkovitch, Scalora, & Ullman, 2009; Viljoen et al., 2008). However, the current study has a different focus (i.e., dynamic change) and does not include any analyses that are redundant with prior work.
Procedure
Ethics approval was obtained from the University of Nebraska–Lincoln, Simon Fraser University, and the research site. This study had a quasi-prospective design with the risk assessments being made in the context of research rather than clinical practice. Three trained research assistants rated the J-SOAP-II and the SAVRY for each youth based on archival file information. Youth’s admission ratings on these tools were made using the file information available at admission, and youth’s discharge ratings were made using the file information available at discharge. After all J-SOAP-II and SAVRY ratings were completed, two different research assistants separately completed ratings on the Hare Psychopathy Checklist: Youth Version (PCL:YV; Forth, Kosson, & Hare, 2003). Rating the PCL:YV separately minimized the possibility that PCL:YV ratings might influence assessments of how much a youth’s risk and protective factors changed.
Research assistants followed the rating guidelines in the manuals for the J-SOAP-II, the SAVRY, and the PCL:YV; no adaptations or changes were made to any rating criteria. In coding the measures, research assistants were blind to youths’ subsequent charges and convictions. All research assistants were PhD students in clinical forensic psychology, had completed graduate coursework on risk assessment, and had been employed in clinical practicum positions with offenders. Prior to commencing coding, raters underwent didactic training, received readings, and completed five practice cases with the study measures using case files.
Given that the youth in this sample had, on average, spent approximately 1 year in the residential treatment program, the file information available to code the study measures was comprehensive. On average, files were over 600 pages in length and included psychiatric assessments, psychological assessments, nursing records, medical examination information, social work reports, teacher assessments, school records, treatment plans and records, progress notes, physician orders, arrest records, and other materials. Raters coded the quality of each file on a scale of 1 to 10, with 1 being extremely poor in quality and 10 being extremely good in quality. The modal quality rating was generally good (Mode = 8.00, Mdn = 7.00, M = 7.29, SD = 1.31), with only five files receiving scores of five or less. As the files generally contained the necessary information for coding tools, missing data were scarce; no youth were missing data on the J-SOAP-II, and only one youth was missing information for items on the SAVRY (this case was prorated for the one missing item at admission and four missing items at discharge). To examine interrater reliability of the risk assessment tools, a random sample of files (22.7%, n = 37) was selected and separately coded by a second rater. As described in the “Results” section, interrater reliability was generally good to excellent.
Approximately 1.37 years after completing the coding of the study measures, youths’ postdischarge juvenile justice and adult criminal records were obtained through statewide law enforcement and probation. Records were available for all participants in our sample. The average length of the postdischarge follow-up period was 8.07 years (SD = 3.50), but ranged from 2.18 years to 13.56 years as youth were discharged at different dates. During the follow-up period, 7.4% of youth were arrested for sexual reoffenses (n = 12), 12.9% for violent nonsexual reoffenses such as assault (n = 21), and 46.0% for any reoffense (n = 75). This latter category included property offenses, violent nonsexual offenses, sexual offenses, and miscellaneous offenses (e.g., mischief) but did not include traffic offenses (e.g., speeding tickets). A reoffense was defined as an arrest rather than conviction, as sexual offenses are sometimes reduced to nonsexual offenses through plea bargains (Letourneau, Armstrong, Bandyopadhyay, & Sinha, 2013). To ensure a consistent and transparent reporting of methodology and results, this manuscript adheres to the Risk Assessment Guidelines for the Evaluation of Efficacy (RAGEE) Statement (Singh, Yang, Mulvey, & the RAGEE Group, 2015), a 50-item reporting checklist.
Description of Treatment Program
The Whitehall Psychiatric Residential Treatment Program is a specialized, community-based residential program that provides treatment to youth adjudicated for a sexual offense. To be admitted into the program, youth had to meet the following admission criteria: between 13 and 17 years of age, intellectual and adaptive functioning at least at the borderline level, adjudicated of a sexual offense and mandated to receive treatment, and demonstrated self-control that would allow functioning in an open, unlocked treatment program. The program is staffed by a multidisciplinary team including a licensed clinical psychologist, a psychiatrist, master’s level mental health clinicians, nurses, occupational therapists, recreational therapists, and bachelor’s level direct care staff.
At intake, all youth undergo a comprehensive psychological evaluation, and throughout treatment, both youth and treatment staff complete ongoing assessments of behavioral and emotional symptomatology. Treatment plans are individualized to meet each youth’s strengths and treatment needs, although they tend to focus on similar themes: insight and accountability for past offenses, problem-solving skills, skill building and promotion of positive relationships, development of relapse prevention plans, enhancing awareness of victim impact, reduction of psychopathology, and educational success. These areas are addressed via several modalities, including individual, group, and family therapy, as well as school-based interventions and recreational and occupational therapy. Youth attended individual therapy two to five times a week, as well as a relapse prevention group (three times a week), occupational therapy, and recreational therapy. Depending on their needs, youth also attended trauma-focused, coping skills, and relationship skills groups, all with a CBT, skill-building orientation.
The Whitehall Program is an unlocked program. The daytime level of supervision is comparable with that of a day treatment program, and youth have more community contact than youth in a traditional secure or correctional facility. For instance, depending on their progress in the program, youth can go home with their family for a weekend or go on a community outing (e.g., out for lunch with their family). Also, during the time period captured by the current study, some youth in the program were attending public schools and/or church in the community.
Measures
J-SOAP-II
The J-SOAP-II (Prentky & Righthand, 2003) is a 28-item checklist designed to aid in assessing risk for sexual violence and general delinquency. It is intended for use with adolescents, aged 12 to 18, who have a history of sexually coercive behavior. In the present study, we focused on the Intervention and Community Stability/Adjustment scales, as the J-SOAP-II authors conceptualize these scales as dynamic. These scales contain seven and five items, respectively, which are rated on a 3-point scale (absent, possibly present, clearly present) and are summed to create a Dynamic Risk Total Score. The J-SOAP-II does not have cutoff scores or yield probability estimates.
In the J-SOAP-II manual, the authors advise omitting the Community Stability/Adjustment scale if a youth is “incarcerated in a correctional facility or a secure residential treatment program” (p. 25). However, this scale can be rated for youth in nonsecure residential settings (Prentky et al., 2010). The residential treatment program in this study was nonsecure and unlocked. For instance, youth in the program had numerous outings in the general community, such as home visits and attendance at school and church. Thus, similar to Prentky et al. (2010), we rated this scale for the youth in our study.
A meta-analysis indicated that the J-SOAP-II’s Intervention and Community Stability/Adjustment scales significantly predicted sexual and nonsexual reoffending with small to moderate effect sizes (Viljoen, Mordell, & Beneteau, 2012). In prior studies, researchers have found the Intervention scale to have good to excellent interrater reliability and the Community Stability/Adjustment scale to have fair to excellent interrater reliability (e.g., Aebi, Plattner, Steinhausen, & Bessler, 2011; Caldwell, Ziemke, & Vitacco, 2008; Martinez, Flores, & Rosenfeld, 2007; Rajlic & Gretton, 2010). In the present study, internal consistency was adequate (α > .77; see Table 1) except for the Community Stability/Adjustment scale at admission (α = .60).
Table 1. Interrater Reliability and Internal Consistency of J-SOAP-II and SAVRY Scores.
Note. The column titled Change refers to the interrater reliability of the change scores (i.e., Change score = Score at Admission − Score at Discharge for risk scales, and Score at Discharge − Score at Admission for the SAVRY Protective Factors section). RCIs are rounded to the nearest integer, as scale scores are whole numbers rather than decimals. J-SOAP-II = Juvenile Sex Offender Assessment Protocol–II; SAVRY = Structured Assessment of Violence Risk in Youth; CI = confidence interval; RCI_α = reliable change index based on internal consistency; RCI_IRR = reliable change index based on interrater reliability coefficient.
SAVRY
The SAVRY (Borum et al., 2006) is a 30-item checklist that was designed to assess violence risk in male and female adolescents. The SAVRY is based on a structured professional judgment (SPJ) model and does not have cutoff scores. In the present study, we focused on the Social/Contextual, Individual/Clinical, and Protective Factors sections, as the SAVRY authors conceptualize these sections as dynamic. The Social/Contextual and Individual/Clinical sections contain six and eight items, respectively, which are rated on a 3-point scale (with ratings of low, moderate, or high risk). The Protective Factors section contains six items, which are rated dichotomously (present or absent). Consistent with other research on the SAVRY (e.g., Lodewijks, de Ruiter, & Doreleijers, 2010), we summed items to form scores for each section and created a Dynamic Risk Total Score by summing scores on the Social/Contextual and Individual/Clinical sections.
In a meta-analysis examining the predictive validity of SPJ-based risk assessment tools, Guy (2008) reported that the SAVRY Social/Contextual, Individual/Clinical, and Protective Factors sections significantly predicted physical and sexual violence and nonviolent reoffending, with weighted AUC scores ranging from .64 to .75 (see also Lodewijks et al., 2010; Vincent, Guy, Gershenson, & McCabe, 2012). Researchers have found these sections to have good to excellent interrater reliability (e.g., intraclass correlation coefficients or ICC > .80; Lodewijks, Doreleijers, de Ruiter, & Borum, 2008). Internal consistency was generally acceptable in the present study but was low for the Social/Contextual and Protective Factors sections (α < .70; see Table 1).
PCL:YV
The PCL:YV (Forth et al., 2003) is a 20-item rating scale designed to measure psychopathic traits. This measure was adapted for adolescents from the PCL-Revised (PCL-R; Hare, 1991, 2003). Each PCL:YV item is rated on a 3-point scale (i.e., 0, 1, 2), with higher scores indicating a larger number of psychopathy-related traits. Consistent with the PCL:YV manual, items were summed to form a PCL:YV Total Score and scores on four facets. The Interpersonal facet includes four items (e.g., grandiose sense of self-worth), the Affective facet includes five items (e.g., callous/lacking empathy), the Behavioral facet includes five items (e.g., irresponsibility), and the Antisocial facet includes five items (e.g., early behavior problems).
In prior studies, researchers have found that the PCL:YV is a valid and reliable measure for assessing psychopathic features (Edens et al., 2007; Salekin, Leistico, Neumann, DiCicco, & Duros, 2004). Furthermore, the PCL:YV was found to be a significant predictor of some forms of reoffending in a sample of adolescents who sexually offended (Gretton, McBride, Hare, O’Shaughnessy, & Kumka, 2001). Similar to other studies (Forth et al., 2003; O’Neill et al., 2003), internal consistency in the present study was acceptable for the total score (α = .80). However, it was modest for the facet scores (α = .59, .64, .51, and .68 for Interpersonal, Affective, Behavioral, and Antisocial facets, respectively), possibly due to the small number of items in each facet (i.e., 4 or 5 items each; see Cortina, 1993). Consistent with previous research (e.g., Spain, Douglas, Poythress, & Epstein, 2004), the interrater reliability of the PCL:YV total and facet scores generally fell in the excellent range (ICC for single raters, absolute agreement, two-way random effects model = .89 for total score, and .83, .89, .68, and .85 for Interpersonal, Affective, Behavioral, and Antisocial facets, respectively, based on a random sample of 25 cases from the present study). The mean PCL:YV score was 17.25 (SD = 6.00). This is consistent with prior research with samples of adolescents in residential treatment programs (e.g., Marshall, Egan, English, & Jones, 2006) but slightly lower than other samples of youth who have sexually offended (e.g., Gretton et al., 2001).
Data Analysis
To examine interrater reliability of ratings of change on the J-SOAP-II and the SAVRY, ICCs were calculated on a random sample of 37 cases (22.7%). We used a random effects model for single raters and examined absolute agreement rather than general consistency (McGraw & Wong, 1996). ICCs are commonly classified in the following manner (Cicchetti & Sparrow, 1981; Shrout & Fleiss, 1979): poor (<.40), fair (.40-.59), good (.60-.74), and excellent (≥.75).
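The classification bands above can be expressed as a small lookup. The following is a minimal sketch rather than part of the study's analysis code: the function name `classify_icc` is hypothetical, and treating a value of exactly .40 as fair (and values falling between the printed bands, such as .595, as belonging to the lower band) is an assumption made only for illustration.

```python
def classify_icc(icc: float) -> str:
    """Map an intraclass correlation to the descriptive bands cited in the text.

    Assumption: a value of exactly .40 is treated as "fair", and each band
    runs up to (but not including) the lower bound of the next band.
    """
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Illustrative values only
for value in (0.24, 0.46, 0.66, 0.82):
    print(value, classify_icc(value))
```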
To examine level of change in J-SOAP-II and SAVRY scores from admission to discharge, repeated-measures MANOVAs were conducted using the Dynamic Risk Total Scores and scale scores. Magnitude of change was interpreted based on Cohen’s d for repeated measures, where .20 corresponds to a small effect size, .50 to a medium effect size, and .80 to a large effect size (Cohen, 1988). To determine the proportion of adolescents who showed reliable increases or decreases in scores, we calculated RCIs (95% confidence intervals) with the Jacobson and Truax (1991) formula. 1 The RCI takes into account measurement error by calculating whether an individual showed more change than would be expected based on chance or error alone. Whereas group analyses such as t tests can sometimes mask individual changes (e.g., if an equal proportion of youth increase and decrease, these effects could cancel each other out, resulting in a nonsignificant t value), RCIs provide individual-oriented analyses by examining the proportion of individuals who show reliable increases, reliable decreases, and no reliable change in scores. Although reliable change can be calculated with various forms of reliability (Evans, Margison, & Barkham, 1998; C. Evans, personal communication, September 18, 2014), in this study, RCIs were calculated based on interrater reliability ratings (i.e., ICCs at admission). This is because interrater reliability is a critical form of reliability for risk assessment tools, as risk assessment tools require rater judgment. For comparison, we also calculated RCIs based on internal consistency.
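As an illustration of the Jacobson and Truax (1991) approach, the sketch below computes the standard error of the difference from a scale's standard deviation and a reliability coefficient, and then classifies an individual's change against a 95% band (z = 1.96). It is a sketch under stated assumptions, not the study's own code: the function names are hypothetical, the choice of which standard deviation to use (here, the SD of admission scores) is an assumption, and the numeric inputs are invented for demonstration rather than taken from the study.

```python
import math


def standard_error_of_difference(sd: float, reliability: float) -> float:
    """S_diff = sqrt(2 * SE**2), with SE = sd * sqrt(1 - reliability)."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    se = sd * math.sqrt(1.0 - reliability)
    return math.sqrt(2.0 * se ** 2)


def min_reliable_change(sd: float, reliability: float, z: float = 1.96) -> float:
    """Threshold an absolute score difference must exceed to count as reliable."""
    return z * standard_error_of_difference(sd, reliability)


def classify_change(admission: float, discharge: float,
                    sd: float, reliability: float, z: float = 1.96) -> str:
    """Classify one person's change using RCI = (discharge - admission) / S_diff."""
    rci = (discharge - admission) / standard_error_of_difference(sd, reliability)
    if rci < -z:
        return "reliable decrease"
    if rci > z:
        return "reliable increase"
    return "no reliable change"


# Hypothetical inputs (sd and reliability are NOT values from the study)
print(round(min_reliable_change(sd=4.0, reliability=0.75), 2))
print(classify_change(admission=20, discharge=14, sd=4.0, reliability=0.75))
print(classify_change(admission=20, discharge=15, sd=4.0, reliability=0.75))
```

The reliability argument can be either an interrater coefficient (e.g., an ICC) or an internal consistency coefficient (e.g., α), which mirrors the two variants of the RCI described in the text. Because scale scores are whole numbers, the threshold would in practice be rounded to an integer before use.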
To analyze the association between change and reoffending, change scores were calculated for each scale as follows: Change Scores for Risk Scales = Score at Admission − Score at Discharge, and Change Score for Protective scale = Score at Discharge − Score at Admission. Thus, higher change scores indicated greater improvements. Consistent with research on adult tools (e.g., Olver et al., 2014), we examined zero-order correlations between change scores and reoffending and then conducted three sets of partial correlations controlling for (a) static risk level (i.e., J-SOAP-II Static scale for J-SOAP-II analyses and SAVRY Historical Factors section for SAVRY analyses), (b) admission score on the respective scale, and (c) treatment length. To determine whether the presence of reliable change added incrementally to the prediction of reoffending relative to static risk, a series of logistic regression analyses were conducted. Given the modest base rates for sexual and violent nonsexual reoffending, penalized likelihood regression was conducted to reduce the risk of bias in the estimation of the odds ratio (Heinze, 2006). Although penalized likelihood methods may be applied to Cox regression, logistic regression remained the preferred method of analysis as the exact dates of reoffense could not be ascertained for all youth (n = 6). 2 Finally, receiver operating characteristic (ROC) curve analyses were conducted to generate the AUC values for J-SOAP-II and SAVRY scores at admission and discharge (Hanley & McNeil, 1982). Comparative analyses between the admission and discharge AUC values were conducted using the method developed by DeLong, DeLong, and Clarke-Pearson (1988).
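The sign convention for change scores is easy to reverse by accident, so a minimal sketch may help. The function name `change_score` and the example values are hypothetical; the only thing taken from the text is the direction of subtraction, which makes a higher change score mean greater improvement on both risk and protective scales.

```python
def change_score(admission: float, discharge: float, protective: bool = False) -> float:
    """Return a change score in which higher values indicate more improvement.

    Risk scales:       admission - discharge (a drop in risk is positive).
    Protective scale:  discharge - admission (a gain in protection is positive).
    """
    if protective:
        return discharge - admission
    return admission - discharge


# Hypothetical scores for one youth (not data from the study)
print(change_score(admission=14, discharge=9))                  # risk scale
print(change_score(admission=2, discharge=4, protective=True))  # protective scale
```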
To analyze the association between psychopathy scores and reoffending, zero-order correlations were calculated between PCL:YV total and facet scores and reoffending outcomes. Furthermore, given that associations between psychopathic features and change scores may not be notable unless youth reach a certain threshold of these features, we compared youth scoring high, moderate, and low on the PCL:YV. Cutoffs were selected based on quartiles; youth who scored at 25th percentile or lower (i.e., ≤13) were classified as low, those who scored between the 25th and 75th percentiles were classified as moderate (i.e., 14-21), and those who scored at the 75th percentile or higher (i.e., ≥22) were classified as high.
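The quartile-based grouping described above amounts to a simple lookup. The sketch below uses the cutoffs reported in the text (≤13, 14-21, ≥22); the function name is hypothetical, and the handling of non-integer (e.g., prorated) scores between 13 and 14 or between 21 and 22 is an assumption, because the text reports integer cutoffs only.

```python
def pclyv_group(total_score):
    """Assign a PCL:YV total score to the low, moderate, or high group."""
    if total_score <= 13:
        return "low"
    if total_score >= 22:
        return "high"
    # Assumption: any score strictly between 13 and 22 is treated as moderate.
    return "moderate"
```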
Analyses were generally conducted in IBM SPSS, Version 19. However, AUCs and penalized likelihood regression were computed in R (Heinze & Ploner, 2004; Robin et al., 2011), which offers greater capabilities for these analyses (e.g., R provides the DeLong et al., 1988, test; see R Core Team, 2014). The significance level for all analyses was set at p < .05, and family-wise corrections were made where applicable.
Results
Reliability of Ratings of Change
For the J-SOAP-II, ICCs for change scores were good to excellent for each scale (.64-.82), indicating that change on the J-SOAP-II can be measured with adequate interrater reliability (see Table 1). On the SAVRY, ICCs for change scores were good for the Individual/Clinical section and Dynamic Risk Total Score (.71 and .66, respectively), but fair for the Social/Contextual section (.46) and poor for the Protective Factors section (.24). As shown in Table 1, interrater reliability at discharge was generally higher than at admission.
Level of Change From Pre- to Post-Treatment
Based on a repeated-measures MANOVA, significant multivariate effects were found across the within-subjects time points (i.e., admission and discharge) for both the J-SOAP-II dynamic risk scales (Pillai’s Trace [V] = 0.65, F(2, 161) = 147.60, p < .001) and SAVRY dynamic risk scales (Pillai’s Trace [V] = 0.63, F(3, 160) = 88.87, p < .001). 3 Univariate analyses (Table 2) revealed significant decreases from admission to discharge on each of the risk scales, with large repeated-measures Cohen’s d effect sizes (>.80) for five of the six risk scales (the exception being the Social/Contextual section of the SAVRY, which produced a moderate effect size). Furthermore, scores on the SAVRY Protective Factors section significantly increased from admission to discharge; however, the magnitude of the difference was small. Stability coefficients ranged from .62 to .75 (Table 2).
Table 2. Stability and Change in J-SOAP-II and SAVRY Scores.
Note. drm = repeated-measures Cohen’s d; r = stability coefficient. F tests were adjusted for their respective family-wise error rate and are significant at p < .001 level. Dynamic Risk Total Scores were not included in the MANOVA due to multicollinearity with the scales. J-SOAP-II = Juvenile Sex Offender Assessment Protocol–II; SAVRY = Structured Assessment of Violence Risk in Youth.
p < .001 (two-tailed test).
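For readers unfamiliar with a repeated-measures Cohen’s d, one common formulation divides the mean difference by the standard deviation of the difference scores and then corrects for the correlation between the two time points. The sketch below implements that formulation; whether the study used exactly this variant is an assumption, and the function name and example values are hypothetical.

```python
import math


def cohens_d_rm(mean_admission, mean_discharge, sd_admission, sd_discharge, r):
    """Repeated-measures Cohen's d (one common formulation; assumed, not confirmed).

    r is the correlation between admission and discharge scores
    (the stability coefficient).
    """
    mean_diff = mean_admission - mean_discharge
    # Standard deviation of the difference scores, derived from both SDs and r.
    sd_diff = math.sqrt(
        sd_admission ** 2 + sd_discharge ** 2 - 2 * r * sd_admission * sd_discharge
    )
    # The sqrt(2 * (1 - r)) term corrects for the dependence between time points.
    return (mean_diff / sd_diff) * math.sqrt(2 * (1 - r))
```

When the two standard deviations are equal, this expression reduces to the mean difference divided by that common standard deviation, so it can be read against the same .20, .50, and .80 benchmarks.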
In general, RCI values classified a sizable number of youth as having exhibited a reliable change between admission and discharge. When RCIs were calculated based on interrater reliability, youth needed to show a change of at least 8 points on the J-SOAP-II and the SAVRY Dynamic Risk Total Scores for change to be classified as reliable (see Table 1). When RCIs were calculated based on internal consistency, a narrower scope of change was needed to classify it as reliable change, as the J-SOAP-II’s and the SAVRY’s alphas were typically higher than their ICCs.
Approximately one half of youth showed reliable decreases on the J-SOAP-II Dynamic Risk Total Scores (see Table 3). Somewhat fewer youth (approximately one third) showed reliable change on the SAVRY Dynamic Risk Total Score. On the section and scale scores, a relatively high proportion of youth showed reliable decreases on the J-SOAP-II Intervention scale and the SAVRY Individual/Clinical section (38.7%-50.3%), whereas rates were more modest for the J-SOAP-II Community Stability/Adjustment scale and the SAVRY Social/Contextual sections (6.7%-19.0%). Although there were no reliable increases in risk factors, a sizable proportion of youth did not meet the threshold for reliable change regardless of direction (≥42.3% per scale). On the SAVRY Protective Factors section, only 8.0% of youth displayed reliable change.
Table 3. Proportion of Youth Showing Reliable Change in J-SOAP-II and SAVRY Scores.
Note. J-SOAP-II = Juvenile Sex Offender Assessment Protocol–II; SAVRY = Structured Assessment of Violence Risk in Youth; RCIα = reliable change index based on internal consistency; RCIIRR = reliable change index based on interrater reliability coefficient.
Changes in Risk Ratings and Reoffending
Prior to the main analyses, point-biserial correlations and AUC values were calculated for the admission and discharge scores for the J-SOAP-II and the SAVRY with the reoffense outcomes to determine whether discharge scores were more predictive of reoffending than admission scores (Tables 4 and 5, respectively). Associations between the dynamic risk and protective scores and reoffending were modest, with only a single AUC value being considered moderate in size (i.e., AUC ≥ .64; Rice & Harris, 2005). Contrary to expectations, several of the admission scores were stronger predictors of reoffending when compared with their respective discharge scores. However, none of these differences achieved statistical significance using the comparative methods developed by DeLong et al. (1988).
Table 4. Relationship Between J-SOAP-II Scores and Reoffending: Point-Biserial Correlations and AUCs.
Note. J-SOAP-II = Juvenile Sex Offender Assessment Protocol–II; AUC = area under the curve; CI = confidence interval.
p < .05 (two-tailed test).
Table 5. Relationship Between SAVRY Scores and Reoffending: Point-Biserial Correlations and AUCs.
Note. SAVRY = Structured Assessment of Violence Risk in Youth; AUC = area under the curve; CI = confidence interval.
For ease of interpretation, scores on the Protective Factors were reversed for the AUC analyses such that higher scores represent a deficit in protective factors.
p < .05 (two-tailed test).
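The AUC values reported in these tables can be read as the probability that a randomly selected youth who reoffended had a higher score than a randomly selected youth who did not. The sketch below computes the AUC directly from that definition, with ties counted as one half. It is an illustration only: the function name is hypothetical, and it does not implement the confidence intervals or the DeLong et al. (1988) comparison used in the study.

```python
def auc(scores, reoffended):
    """Area under the ROC curve via pairwise comparison of the two groups.

    scores: risk scores (higher = greater assessed risk).
    reoffended: matching sequence of 0/1 (or False/True) outcomes.
    """
    positives = [s for s, y in zip(scores, reoffended) if y]
    negatives = [s for s, y in zip(scores, reoffended) if not y]
    if not positives or not negatives:
        raise ValueError("Both outcome groups must contain at least one case.")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # reoffender scored higher: concordant pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))
```

An AUC of .50 corresponds to chance-level discrimination, which is why reversing the Protective Factors scores keeps all values on the same footing.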
Next, correlational analyses were conducted to examine whether decreased risk factors and increased protective factors predicted lower rates of reoffending (Table 6). The correlations were modest, and none reached statistical significance, even after controlling for static risk level, admission scores on the respective scale, and treatment length. To examine whether change scores predict reoffending over shorter periods of time (as compared with our average follow-up of 8.07 years), post hoc correlational analyses were conducted using fixed follow-up periods of 1 and 2 years. 4 For this analysis, we controlled for risk level and scores at admission. Given that base rates for specific types of reoffending were low at the 1- and 2-year follow-ups (i.e., 2.5% to 3.1%), these analyses focused on the outcome of any reoffending, which had base rates of 9.9% and 14.9% at 1 and 2 years, respectively. Again, none of the partial correlation coefficients between change scores and reoffending reached significance.
Table 6. Correlations Between Change Scores (i.e., Improvement) and Reoffending.
Note. All point-biserial and partial correlation coefficients are nonsignificant. Positive change scores indicate greater improvement. Thus, if youth who showed high improvement were less likely to reoffend, a significant inverse correlation between change and reoffending would be expected. J-SOAP-II = Juvenile Sex Offender Assessment Protocol–II; SAVRY = Structured Assessment of Violence Risk in Youth.
Although associations with change scores were nonsignificant, we next tested whether reoffending might be inversely associated with reliable change (i.e., change that met the threshold to conclude it was reliable rather than measurement error). These results are presented in Tables 7 and 8. Maximum likelihood logistic regression analyses were conducted for the outcome of any reoffending. Penalized logistic regression, conducted in R, was used for sexual and violent nonsexual reoffending, as base rates were modest for these outcomes (i.e., 7.4% and 12.9%, respectively; see King & Zeng, 2001). These analyses controlled for static risk level in Step 1. Overall, model fit was poor, and reliable change failed to significantly predict reoffending, with two exceptions. First, youth who showed reliable decreases on the J-SOAP-II Intervention scale were less likely to sexually reoffend (OR = 0.14, p = .013; see Table 7). Second, and in contrast, youth who showed reliable decreases on the J-SOAP-II Community Stability/Adjustment scale were at increased likelihood for sexual reoffending (OR = 6.58, p = .022; see Table 7) and any reoffending (OR = 3.06, p = .021; see Table 8). Overall, the presence of reliable change failed to add significant incremental validity relative to static risk level for the majority of the analyses, the only exceptions being two analyses with the J-SOAP-II: Δχ2(2) = 6.88, p = .032, for any reoffending, and Δχ2(2) = 7.64, p = .022, for sexual reoffending.
Table 7. Penalized Logistic Regression Analyses of the Association Between Reliable Change (i.e., Improvements) and Sexual and Violent Nonsexual Reoffending.
Note. CI = confidence interval; J-SOAP-II = Juvenile Sex Offender Assessment Protocol–II; SAVRY = Structured Assessment of Violence Risk in Youth.
p < .05 (two-tailed test).
Table 8. Logistic Regression Analyses of the Association Between Reliable Change (i.e., Improvements) and Any Reoffending.
Note. CI = confidence interval; J-SOAP-II = Juvenile Sex Offender Assessment Protocol–II; SAVRY = Structured Assessment of Violence Risk in Youth.
*p < .05. **p < .01 (two-tailed test).
Psychopathic Features and Changes in Risk Ratings
None of the PCL:YV total or facet scores were significantly correlated with change, although the correlations were in the anticipated direction (i.e., inverse correlations; Table 9). Similarly, when a MANOVA was conducted (see Table 10), the multivariate effect of PCL:YV group on change failed to reach significance, and none of the univariate effects were significant (family-wise error rates were controlled using the Bonferroni correction, that is, p ≤ .05/5 = .010). We reran the analyses using different cutoff scores for psychopathy (i.e., low = scores ≤15, moderate = scores of 16 to 24, high = scores ≥25), again finding no significant differences. Finally, as evidenced by chi-square analyses (see last column in Table 10), there were no significant associations between PCL:YV group and rates of reliable change on the dynamic scales.
Table 9. Point-Biserial Correlations Between Change Scores (i.e., Improvement) and PCL:YV Scores.
Note. All zero-order correlation coefficients are nonsignificant. Positive change scores indicate greater improvement. Thus, if youth with high PCL:YV scores showed less improvement, a significant inverse correlation between change and PCL:YV scores would be expected. PCL:YV = Psychopathy Checklist:Youth Version; J-SOAP-II = Juvenile Sex Offender Assessment Protocol–II; SAVRY = Structured Assessment of Violence Risk in Youth.
Table 10. Comparing Change Scores by PCL:YV Low (n = 44), Moderate (n = 80), and High (n = 39) Groups.
Note. Dynamic Risk Total Scores were not included in the MANOVA due to multicollinearity with the scales. PCL:YV = Psychopathy Checklist:Youth Version; RCIIRR = reliable change index based on interrater reliability coefficient; J-SOAP-II = Juvenile Sex Offender Assessment Protocol–II; SAVRY = Structured Assessment of Violence Risk in Youth; MANOVA = multivariate analyses of variance.
Discussion
Adolescent risk assessment tools, such as the J-SOAP-II and the SAVRY, include an emphasis on dynamic factors. However, as of yet, little research has been conducted on dynamic changes in these factors. To help address this gap, we compared admission and discharge scores on the J-SOAP-II and the SAVRY in a sample of 163 adolescents who had participated in a residential CBT treatment program for adolescents who had sexually offended.
Primary Findings
Adolescents showed substantial changes in their risk ratings from admission to discharge. On the J-SOAP-II, effect sizes for change at an overall group level were large. In addition, one half of youth showed a reliable decrease on the J-SOAP-II Dynamic Risk Total Score. Although the treatment program was a specialized program targeted at sex offending, youth in the program also showed moderate reductions in general risk factors for violence on the SAVRY. Specifically, one third of youth showed a reliable decrease on the SAVRY Dynamic Risk Total Score.
Changes in SAVRY Protective Factors were modest in comparison, with only 8% of youth showing a reliable increase in protective factors. This could be because treatment programs for sexual offenders generally focus on risk reduction rather than strengths promotion (Ward, 2002; Ward & Brown, 2004). Alternatively, the protective factors section of the SAVRY may be less dynamic in nature. For instance, the SAVRY protective factor, resilient personality traits, is defined to include “above-average intellectual ability” (Borum et al., 2006, p. 54), which is difficult to modify. Another possibility is that the Protective Factors section is less sensitive to detecting change than the risk scales because it rates items dichotomously (present or absent) rather than on a 3-point scale. Finally, because the Protective Factors section had modest reliability (α = .58, ICC = .68), a higher change score was required to conclude that a change was reliable.
Despite the significant changes in youth’s risk ratings from admission to discharge, risk ratings at discharge were no more accurate in predicting reoffending than risk ratings at admission. Although many youth showed improvement over the course of treatment, this generally did not directly translate into reductions in reoffending. One exception to this was that reliable decreases in risk factors on the J-SOAP-II Intervention scale significantly predicted lower rates of sexual reoffending.
The general failure to find associations between change scores and reoffending could indicate that the J-SOAP-II and the SAVRY Dynamic scales may not be tapping into all of the relevant dynamic factors. Not only were change scores nonpredictive, but the admission and discharge scores on the dynamic scales also did not significantly predict reoffending in this sample, although they have shown adequate predictive validity in other studies (Guy, 2008; Viljoen, Mordell, & Beneteau, 2012). Beyond the possibility that these findings reflect on the tools themselves, these null results may be due to a number of equally or more plausible explanations, such as methodological limitations (e.g., the reliance on official records to measure reoffending) or challenges in sustaining treatment effects.
In particular, if adolescents’ risk is changeable it may not make sense to presume that decreases in risk would predict reduced reoffending 8 years later, as youth may have experienced many changes in risk and protective factors during this time (e.g., gains in impulse control with maturation or increased antisocial attitudes with cumulative exposure to antisocial lifestyles). However, in the current study, change scores did not predict reoffending at 1-year and 2-year fixed follow-ups either. In future research, researchers should test shorter time intervals (e.g., 6-month follow-ups) to determine whether the relevance of change may expire at an even earlier date. The period of transition from residential programs to home environments may be a period of particular fluidity in risk; youth may not necessarily maintain treatment gains as they transition from residential treatment to the community (Nickerson, Colby, Brooks, Rickert, & Salamone, 2007). In particular, given that some adolescents’ home environments may be characterized by high levels of conflict and limited supervision (Burns, Hoagwood, & Mrazek, 1999), risk scores may increase after youth return home.
Surprisingly, in the present study, reliable decreases in risk factors on the J-SOAP-II Community Stability/Adjustment scale were associated with higher rates of sexual and any reoffending. This finding is difficult to explain, especially as this scale includes well-established risk factors such as management of anger, management of sexual urges, stability in school, and evidence of positive support systems. However, there are three potential explanations for this: (a) Youth showing decreased risk in this domain may have been subject to increased monitoring, leading to a higher likelihood of detection; (b) youth who were perceived to have high levels of stability and community adjustment were provided with less supervision upon discharge, increasing their opportunities to reoffend; or (c) youth showed decreased risk in this domain because the treatment program provided a high degree of structure and supervision (e.g., an on-site school). Youth who responded well to this structure may have been vulnerable to relapse (e.g., reoffending) when discharged back into unstructured home environments. Finally, although the manual states it is acceptable to use this scale with youth in a nonsecure residential setting (Prentky & Righthand, 2003, pp. 25, 26), and this has been done in prior research (Prentky et al., 2010), it is possible that discharge ratings provide an unrealistically high estimate of a youth’s capacities in these areas. Thus, it is important not only to assess youth during the treatment program but also to reassess them after they return to their home environment.
Whereas in previous studies researchers have found that youth with psychopathic features are less responsive to treatment than other youth (Manders et al., 2013; O’Neill et al., 2003), no significant differences emerged in the present study. Youth high in psychopathic features appeared to show similar decreases in risk factors and increases in protective factors as other youth. This may be because the residential CBT treatment program that the youth received was appropriate for youth with psychopathic features; there is some evidence that youth with psychopathic features respond quite favorably to some intensive, residential interventions (Caldwell et al., 2006), and cognitive-behavioral approaches (Salekin et al., 2012). Another possibility is that reduced response to treatment is only seen in youth with very high levels of psychopathic features, whereas most youth in our sample had mid-range scores on the PCL:YV (M = 17.25). Finally, most studies have examined changes in features of psychopathy and conduct disorder as treatment outcomes, whereas the current study focused on risk and protective factors. Thus, it may be that risk and protective factors are more dynamic than psychopathic features. If this is the case, it may be useful to target risk and protective factors in treatment for youth with psychopathic features rather than solely focusing on the reduction of psychopathic features themselves (see Wong & Hare, 2005).
Study Limitations
In interpreting these study results, several caveats are important. First, similar to other studies on dynamic change (e.g., Olver et al., 2007), the J-SOAP-II and the SAVRY were rated based on file information. Although file- and interview-based ratings are strongly correlated (Gretton et al., 2001), it is possible that demand characteristics impacted ratings (e.g., raters may have rated discharge risk scores lower than warranted). Similar to other risk assessment studies (e.g., Douglas, Ogloff, Nicholls, & Grant, 1999), we coded some files that predated the development of the tools to ensure an adequate sample size and a sufficiently long follow-up period. This means that the files did not necessarily contain specific information that mapped exactly onto the J-SOAP-II and SAVRY factors. Nonetheless, the files were comprehensive in nature, raters judged most files to be high quality, and there was very little missing data (i.e., only one youth had any missing items on the J-SOAP-II and the SAVRY).
Second, similar to other studies (see Viljoen, Mordell, & Beneteau, 2012, for a summary), official records were used to measure reoffending. This approach may fail to detect some sexual offenses (Fleming, Jory, & Burton, 2002). Thus, future research should assess reoffending through multiple methods (e.g., youth and parent self-report, treatment records).
Third, the treatment program examined in this study has not previously been researched. Thus, if the results had indicated that youth did not change, this would have been difficult to interpret; such a finding could have meant that the treatment program was ineffective and/or that the tools were not adequately sensitive to change. As it turned out, youth showed significant improvements over the course of treatment. However, without a control group it is not possible to determine whether changes in risk scores occurred as a result of treatment and/or other mechanisms (e.g., maturation, regression to the mean).
Fourth, this study focused on the J-SOAP-II scales and the SAVRY sections that the authors conceptualize as dynamic. Historical factors should be examined in future work, given the possibility that some of these factors may change over time (e.g., a youth can engage in additional acts of violence or experience maltreatment).
Fifth, although the overall sample size was 163, interrater reliability data were collected for a relatively small subset of these youth (22.7%, n = 37); this limits our ability to make firm conclusions about the interrater reliability of change scores. Finally, we did not record information on where youth were residing prior to admission. It is possible that a small number of youth were residing in locked settings prior to admission; staff at the treatment program indicated that such cases would be rare. Also, raters were instructed to follow the J-SOAP-II manual, which states that if a youth was recently in a correctional facility or a secure residential treatment program for longer than 6 months, he must have been in the community for “at least 3 months” to rate the Community Stability/Adjustment scale (Prentky & Righthand, 2003, p. 25).
Implications
Results of this study have several implications for research and practice. In particular, the finding that adolescents’ risk showed substantial change over the course of treatment reinforces that clinicians should reassess risk regularly. Further research should clarify the optimal interval for reassessment. At the present time, experts recommend reassessing risk at least every 6 months and at periods of significant change, such as if a youth acquires a new charge or is released from a custodial facility (Vincent, Guy, & Grisso, 2012).
In addition, given that the J-SOAP-II and the SAVRY detected relatively high rates of change and generally showed adequate interrater reliability in measuring change, these tools hold promise as measures of changes in risk and protective factors. However, to determine whether certain approaches are more sensitive to change than others, researchers should compare these and other approaches for measuring change (e.g., the VRS–Youth Version [Wong, Lewis, Stockdale, & Gordon, 2004-2011], the Estimate of Risk of Adolescent Sex Offense Recidivism [ERASOR; Worling & Curwen, 2001], the Short-Term Assessment of Risk and Treatability–Adolescent Version [Viljoen, Nicholls, Cruise, Desmarais, & Webster, 2014]). Rather than focusing on the predictive validity of change scores (and conceptualizing absence of change as another risk factor), researchers should also examine the extent to which measuring change can guide refinements to treatment plans.
Finally, the results of this study indicate that clinicians and researchers should use caution in interpreting change. If a youth’s score changes by a couple of points on a tool, it does not mean that he or she showed meaningful change, as all tools have a certain degree of imprecision. Indeed, we found that a youth’s score on the J-SOAP-II and the SAVRY Dynamic Risk Total Scores had to have increased or decreased 8 points to conclude that a youth had shown reliable change (after taking into account imperfect interrater reliability). To guide the interpretation of changes in risk, test developers and researchers could provide RCIs or other empirically derived guidelines. Other types of clinical measures, such as treatment outcome measures (Lambert et al., 1996) and neuropsychological tests (Strauss, Sherman, & Spreen, 2006), provide this type of information.
Conclusion
Adolescent risk assessment research consistently shows the importance of attending to dynamic changes in risk. Remarkably, however, the present study is one of the few studies to examine this issue. Based on the results, adolescents’ risk is indeed dynamic. Contrary to expectations, however, high improvement was generally not associated with lower rates of reoffending. Although this could suggest that the tools are not capturing all relevant changes, a number of equally plausible reasons exist including the fact that change cannot be assumed to be a static entity (i.e., adolescents who show decreases in risk factors during treatment may not necessarily maintain these improvements indefinitely). Further research is needed to clarify the potential value of risk assessment tools in measuring change. Studies that prospectively assess adolescents during and following treatment would be of particular benefit.
Footnotes
Acknowledgements
The authors thank Chris Evans for providing helpful information regarding reliable change indices.
Authors’ Note
The findings and conclusions in this manuscript are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention or the Woods Charitable Fund.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study is funded by the Woods Charitable Fund.
