Abstract
The current study evaluated the predictive validity of Juvenile Sex Offender Assessment Protocol–II (J-SOAP-II) scores with regard to sexual and nonsexual recidivism in adulthood. Participants included 166 juveniles who had previously sexually offended and were followed into adulthood for an average of 10.75 years. Results of area under the receiver operating characteristic curve (AUC) analyses supported the predictive validity of the J-SOAP-II Total Score, Scale 1, and Static Score in regard to adult sexual recidivism. All J-SOAP-II scores except Scale 1 predicted adult nonsexual violent recidivism, and Scale 2, Scale 3, and Dynamic scores predicted adult nonsexual nonviolent recidivism. Implications for future research on the assessment of risk factors and treatment needs for adolescents who commit sexual offenses are discussed.
Assessment of risk factors and treatment needs for adolescents who have sexually offended is essential for reducing the incidence of sexual recidivism by informing relevant and effective interventions. The utility of such assessment is often facilitated by using objective and reliable risk assessment scores with acceptable predictive validity. Predictive validity refers to the extent to which assessment tool scores are related to outcome(s) of interest, such as recidivism rates (Righthand, Vincent, & Huff, 2017). To determine whether a risk assessment tool is useful, it is necessary to evaluate its predictive validity in diverse samples and across multiple investigations by researchers not involved in the development of the tool (Vincent, Guy, & Grisso, 2012). Although studies provide increasing support for some of these tools, predictive validity findings have often been contradictory, and effect sizes tend to be moderate (e.g., Viljoen, Mordell, & Beneteau, 2012). Examples include research on the Estimate of Risk of Adolescent Sex Offender Recidivism (ERASOR 2.0; Worling & Curwen, 2001), the Juvenile Sexual Offense Recidivism Risk Assessment Tool–II (J-SORRAT-II; Epperson, Ralston, Fowers, DeWitt, & Gore, 2006), and the Juvenile Sex Offender Assessment Protocol–II (J-SOAP-II; Prentky & Righthand, 2003).
The focus of the current study is the J-SOAP-II. Studies of sexual recidivism using the J-SOAP-II follow youths over varying periods of time (see Table 1), but only one past study (Ralston & Epperson, 2013) specifically explored the relation between youths’ J-SOAP-II scores and adult sexual offending. Although most juveniles do not appear to continue sex offending once identified or adjudicated for a sexual offense (e.g., Caldwell, 2016; Finkelhor, Ormrod, & Chaffin, 2009), some do persist in adulthood (e.g., Abel, Mittelman, & Becker, 1985; Knight & Prentky, 1993). Thus, it is essential to learn more about assessing risk and needs in juveniles who persist in sex offending into adulthood. What is more, nonsexual recidivism is an important outcome necessitating empirical inquiry, because most of these youths reoffend with a nonsexual offense if they reoffend at all (Caldwell, 2016). The current study examines the predictive validity of J-SOAP-II scores with regard to sexual and nonsexual recidivism in adulthood.
Table 1. J-SOAP-II ROC Analyses and Sexual Recidivism.
Notes. M follow-up period included when reported in original study. Italicized values are for AUC estimates associated with the Total, Static, and Dynamic scores. AUC estimates in bold indicate significant effects (*p < .05, **p ≤ .01, ***p < .001) or effects deemed strong, large, or significant by study authors when exact p values were not reported. AUC statistics of .71 and above typically indicate large effect sizes, with medium effect sizes characterized by AUC values between .64 and .71 (Rice & Harris, 2005). Wijetunga et al. (2018) report analyses conducted with subsamples from Martinez et al. (2015). See text for discussion of results from Viljoen et al. (2017), which reports on data from Viljoen et al. (2008), and Caldwell, Ziemke, and Vitacco (2008) and Parks and Bard (2006), which report predictive validity findings using Cox analyses. J-SOAP-II = Juvenile Sex Offender Assessment Protocol–II; ROC = receiver operating characteristic; AUC = area under the receiver operating characteristic curve.
Total score calculated without Scale 4, as the J-SOAP-II directions require when youths are in secure settings and lack the community access typical of youths their age (AUC = .64, also nonsignificant, when Scale 4 was included).
Scale 4 calculated for staff-secure residential or incarcerated correctional sample.
Effect noted as significant according to confidence intervals although p > .05.
Recidivism by Adolescents Who Sexually Offend
Adolescents who commit sexual offenses pose particular challenges to juvenile justice systems, mental health providers, and social service agencies. Sometimes these youths are called “juvenile sex offenders,” but this legal categorization may be misleading in that it implies a degree of homogeneity and stability that is not accurate (Chaffin, 2008). Researchers consistently document that adolescents who come to the attention of authorities because of inappropriate or illegal sexual behavior are heterogeneous with respect to social and psychological characteristics, risks for future sexual or nonsexual illegal behaviors, and individual intervention needs (Fanniff & Kolko, 2012; McCuish, Lussier, & Corrado, 2015; van Wijk & Boonmann, 2017).
One relevant area of empirical inquiry is evaluating assessment tools designed to identify risks associated with recidivism and corresponding intervention needs. With respect to sexual recidivism, most adolescents who sexually offend will not have new reports or charges for sexual offenses even when followed up over long periods of time. Reviews and meta-analyses indicate sexual recidivism rates range from 3% to 15% (e.g., Caldwell, 2010, 2016; Finkelhor et al., 2009; McCann & Lussier, 2008; Viljoen et al., 2012). Reports of higher sexual recidivism rates are outliers and likely reflect unusual sample characteristics such as high levels of aggression (Rubenstein, Yeager, Goldstein, & Lewis, 1993), older age (Långström & Grann, 2000), or a less restrictive definition of recidivism (e.g., using child welfare reports; Prentky et al., 2010). Recidivism rates are also influenced by the appropriateness and quality of treatment received, with lower rates associated with effective interventions (Hanson, Bourgon, Helmus, & Hodgson, 2009; Worling, Littlejohn, & Bookalam, 2010).
Despite low base rates of sexual recidivism among adolescents with sexual offenses, sexual abuse by adolescents is a serious societal problem. More than one third of all sexual offenses against minors known to the police are committed by juveniles (Finkelhor et al., 2009). Sexual abuse has serious consequences for victims, families, and communities (e.g., Centers for Disease Control and Prevention, 2015). Youths who engage in the abusive behavior may also experience grave outcomes such as long-term public registration and notification and related iatrogenic problems (Chaffin, 2008; Letourneau & Armstrong, 2008; Zimring, 2015).
In addition, many youths who offend sexually engage in future nonsexual criminal behavior. Rates of nonsexual recidivism are considerably higher than sexual recidivism rates in youths who offend sexually (e.g., Caldwell, 2010; McCann & Lussier, 2008; Viljoen et al., 2012). In a recent meta-analysis, Caldwell (2016) reported that the nonsexual recidivism rate among adolescents who sexually offend was 41%. Although several risk assessment studies have examined nonsexual recidivism following juvenile sexual offending (e.g., Martinez, Rosenfeld, Cruise, & Martin, 2015; Viljoen et al., 2008), this outcome has received less attention than sexual recidivism. Given the significant consequences of recidivism of any kind and the relatively high base rates of nonsexual recidivism, the importance of assessing risk for both sexual and nonsexual recidivism in adolescents who have offended sexually is clear.
It is important to consider the implications of assessing risk for sexual and nonsexual recidivism over the long term. On one hand, it is acknowledged that the adolescent developmental context is dynamic and that findings from adolescent risk assessment may have a short shelf life (i.e., some youths may present with significant risk factors that dissipate over time). Yet some youths who offend sexually do go on to reoffend. Thus, both the short- and the long-term predictive validity of adolescent risk assessment tools, even into adulthood, are necessary but understudied areas of empirical inquiry. The current study addresses this gap by evaluating the predictive validity of J-SOAP-II scores regarding sexual and nonsexual recidivism in adulthood.
J-SOAP-II
The utilization of adolescent risk and needs assessment tools has grown rapidly in recent decades. Several tools have demonstrated predictive validity across multiple empirical investigations, including the ERASOR 2.0 (Worling & Curwen, 2001), the J-SORRAT-II (Epperson et al., 2006), and the J-SOAP-II (Prentky & Righthand, 2003). Of these tools, the J-SOAP-II is one of the most widely used (McGrath, Cumming, Burchard, Zeoli, & Ellerby, 2010). The J-SOAP-II originated from research conducted in the mid- to late 1990s (Prentky, Harris, Frizzell, & Righthand, 2000; Righthand, Prentky, Hecker, Carpenter, & Nangle, 2000). It is an empirically informed assessment guide designed to facilitate the systematic review of risk factors associated with sexual and nonsexual offending among boys aged 12 to 18 with adjudications for (or a credible history of) coercive sexual offenses. The J-SOAP-II is intended for use as part of a comprehensive evaluation and may help identify intervention targets relevant to case management and treatment planning (Prentky & Righthand, 2003).
The 28 items of this evaluator-completed checklist are rated on a 3-point scale (ranging from 0 to 2) and are organized into four scales: (1) Sexual Drive/Sexual Preoccupation, (2) Impulsive/Antisocial Behavior, (3) Clinical Intervention, and (4) Community Stability/Adjustment. Scales 1 and 2 assess static or historic risk factors, and Scales 3 and 4 assess dynamic variables. Individual items can be summed to form individual scale scores, a Static Scale score (Scales 1 and 2), a Dynamic Scale score (Scales 3 and 4), and a Total Score. Percentages can be calculated to reflect the proportion of risk represented by each scale.
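The summing rules described above can be sketched in a few lines. This is a minimal illustration, not the scoring procedure from the J-SOAP-II manual: the item allocation per scale (8, 8, 7, and 5 items) is an assumption to be checked against the manual, and "percentage" is taken here to mean the obtained score divided by the maximum obtainable score. Scale 4 may be left unrated, as the J-SOAP-II directions require for youths without typical community access; in that case it simply drops out of the Dynamic and Total composites.

```python
from typing import Dict, List, Optional

# ASSUMED item allocation (8 + 8 + 7 + 5 = 28); verify against the J-SOAP-II manual.
SCALE_ITEMS = {1: 8, 2: 8, 3: 7, 4: 5}
MAX_RATING = 2  # items are rated 0, 1, or 2
COMPOSITES = {"Static": (1, 2), "Dynamic": (3, 4), "Total": (1, 2, 3, 4)}


def score_jsoap(ratings: Dict[int, Optional[List[int]]]) -> Dict[str, float]:
    """Sum item ratings into scale, Static, Dynamic, and Total scores plus percentages.

    Scales 1-3 are required; Scale 4 may be None (or absent) when it is not rated,
    e.g. for youths who have not lived in the community long enough.
    """
    raw: Dict[str, int] = {}
    max_possible: Dict[str, int] = {}
    for scale, n_items in SCALE_ITEMS.items():
        items = ratings.get(scale)
        if items is None:
            if scale != 4:
                raise ValueError(f"Scale {scale} must be rated")
            continue  # omitted Scale 4 drops out of Dynamic and Total
        if len(items) != n_items or any(r not in (0, 1, 2) for r in items):
            raise ValueError(f"Scale {scale}: expected {n_items} ratings of 0-2")
        raw[f"Scale {scale}"] = sum(items)
        max_possible[f"Scale {scale}"] = n_items * MAX_RATING

    # Composites sum whichever constituent scales were actually rated.
    for name, scales in COMPOSITES.items():
        parts = [f"Scale {s}" for s in scales if f"Scale {s}" in raw]
        raw[name] = sum(raw[p] for p in parts)
        max_possible[name] = sum(max_possible[p] for p in parts)

    result: Dict[str, float] = dict(raw)
    for name, value in raw.items():
        # Percentage of the maximum obtainable score, so totals computed with
        # and without Scale 4 stay on a comparable 0-100 metric.
        result[f"{name} %"] = 100.0 * value / max_possible[name]
    return result
```

For example, a youth rated 1 on every item receives a Total of 28 (50%) when all four scales are rated, and a Total of 23 (still 50%) when Scale 4 is omitted. Expressing scores as percentages is what keeps those two cases comparable.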
Many studies have documented J-SOAP-II scores to have adequate or higher interrater reliability (e.g., Martinez et al., 2015; Rajlic & Gretton, 2010). However, variability across studies does exist and may be attributable to multiple factors including form and type of reliability statistic employed, rater training, rater drift, non-English translations of the tool, and quality of information upon which ratings are based (e.g., records, youth interviews). For a discussion of the challenges involved in establishing interrater reliability consistently, see Fanniff and Letourneau (2012) and Hecker (2014).
To date, 12 studies have published predictive validity findings pertaining to J-SOAP-II and sexual recidivism using completely unique samples. Some of these studies also evaluated the predictive validity of J-SOAP-II scores regarding nonsexual recidivism. The following section briefly reviews the existing predictive validity literature for the J-SOAP-II, with more detailed predictive validity results for sexual and nonsexual recidivism contained in Tables 1 and 2, respectively.
Table 2. J-SOAP-II ROC Analyses and Nonsexual Recidivism.
Notes. M follow-up period included when reported in original study. Italicized values are for AUC estimates associated with the Total, Static, and Dynamic scores. AUC estimates in bold indicate significant effects (*p < .05, **p ≤ .01, ***p ≤ .001) or effects deemed strong, large, or significant by study authors when p values were not reported. AUC statistics of .71 and above typically indicate large effect sizes, with medium effect sizes characterized by AUC values between .64 and .71 (Rice & Harris, 2005). See text for discussion of results from Viljoen et al. (2017), which reports on data from Viljoen et al. (2008), and Parks and Bard (2006), which reports predictive validity findings using Cox analyses. J-SOAP-II = Juvenile Sex Offender Assessment Protocol–II; ROC = receiver operating characteristic; AUC = area under the receiver operating characteristic curve.
Total score calculated without Scale 4, as the J-SOAP-II directions require when youths are in secure settings and lack the community access typical of youths their age (AUC = .60, p < .05, when Scale 4 was included).
Scale 4 calculated for staff-secure residential or incarcerated correctional sample.
Predictive Validity of the J-SOAP-II
As with similar juvenile risk assessment tools, findings from individual predictive validity studies of the J-SOAP-II are somewhat mixed for sexual recidivism (see Table 1). Righthand and colleagues (2017) noted that multiple methodological challenges likely contribute to the mixed results, including low base rates (e.g., Szmukler & Rose, 2013), heterogeneity among adolescents who sexually offend (e.g., Seto & Lalumière, 2010), small and varied samples (e.g., outpatient, residential, or correctional settings; urban vs. rural environments; different countries), varying age groups, over- and/or underrepresentation of particular racial and ethnic groups, treatment status, incomplete or inadequate data (e.g., case files with relevant data missing), and varying lengths of follow-up period. Moreover, statistical procedures used in predictive validity analyses have limitations. For example, areas under the receiver operating characteristic curve (AUCs) are susceptible to sampling restrictions (e.g., by index offense and/or sentence type; Howard, 2017), which are common among studies reporting nonsignificant AUC results.
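Because so much of this literature rests on the AUC, it may help to recall what the statistic measures: the probability that a randomly chosen recidivist received a higher score than a randomly chosen nonrecidivist, with ties counted as one half (the Mann–Whitney formulation). The sketch below computes it directly from that definition and applies the Rice and Harris (2005) cut-offs quoted in the table notes; the scores are hypothetical, and the "below medium" label for values under .64 is our own placeholder rather than a category from the text.

```python
from itertools import product
from typing import Sequence


def auc_mann_whitney(recidivist_scores: Sequence[float],
                     nonrecidivist_scores: Sequence[float]) -> float:
    """AUC as P(recidivist score > nonrecidivist score), counting ties as 1/2."""
    if not recidivist_scores or not nonrecidivist_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    # Compare every recidivist with every nonrecidivist.
    for r, n in product(recidivist_scores, nonrecidivist_scores):
        if r > n:
            wins += 1.0
        elif r == n:
            wins += 0.5
    return wins / (len(recidivist_scores) * len(nonrecidivist_scores))


def effect_size_label(auc: float) -> str:
    """Cut-offs as cited from Rice and Harris (2005) in the table notes."""
    if auc >= 0.71:
        return "large"
    if auc >= 0.64:
        return "medium"
    return "below medium"  # placeholder label, not from the source
```

For example, `auc_mann_whitney([3, 5], [1, 3, 4])` returns 0.75, which the cut-offs label "large." Because the statistic depends only on how the two groups' scores are ordered, a sample restricted to a narrow band of scores leaves more ties and near-ties, which tends to pull estimates toward .50.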
Viljoen and colleagues’ recent meta-analysis (2012) sheds some light on the current state of the literature. Of nine studies included in the meta-analysis that evaluated the J-SOAP-II Total Score, the average AUC for sexual recidivism was .67 (95% confidence interval [CI] = [.59, .75]). Each of the J-SOAP-II scales also significantly predicted sexual recidivism (number of studies evaluating individual scales ranged from 8 to 13): Scale 1—Sex Drive/Preoccupation (AUC = .61; 95% CI = [.53, .69]), Scale 2—Antisocial/Impulsive (AUC = .63; 95% CI = [.58, .69]), Scale 3—Intervention (AUC = .60; 95% CI = [.54, .66]), and Scale 4—Community Stability/Adjustment (AUC = .70; 95% CI = [.60, .80]).
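A pooled AUC and its confidence interval come from weighting individual study estimates. As a rough illustration only, the sketch below back-calculates each study's standard error from a symmetric 95% CI and combines estimates with fixed-effect inverse-variance weights. The text does not describe Viljoen et al.'s (2012) weighting model, which may well have been random-effects (yielding wider intervals when studies disagree), and the input values are hypothetical.

```python
import math
from typing import List, Tuple

Z95 = 1.959963984540054  # two-sided 95% normal quantile


def pool_fixed_effect(
    studies: List[Tuple[float, float, float]]
) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance pooling of (auc, ci_low, ci_high) triples."""
    if not studies:
        raise ValueError("need at least one study")
    weights, weighted = [], []
    for auc, low, high in studies:
        se = (high - low) / (2 * Z95)  # assumes a symmetric Wald-type 95% CI
        w = 1.0 / se ** 2  # more precise studies get more weight
        weights.append(w)
        weighted.append(w * auc)
    pooled = sum(weighted) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled - Z95 * pooled_se, pooled + Z95 * pooled_se
```

Pooling two hypothetical studies that each report AUC = .67 [.59, .75] returns the same point estimate with a narrower interval, which is the basic reason meta-analytic estimates are more precise than single studies.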
Since the meta-analysis, four additional studies on the predictive validity of the J-SOAP-II scores for sexual recidivism have been published, and the pattern of mixed results has persisted. In Ralston and Epperson’s (2013) study of 636 adjudicated youths (M age = 15.2 years), support emerged for the predictive validity of Scales 1 and 2 scores as well as the Static scale (Scales 1 and 2 combined) score in regard to juvenile sexual recidivism (Scales 3 and 4 were not evaluated). The predictive validity of these scores in relation to adult sexual recidivism was lower.
Martinez and colleagues (2015) explored the utility of Scales 3 and 4 for assessing post-discharge behavior with 156 youths (M age = 17.4 years) in residential or correctional settings, noting that the J-SOAP-II manual recommends omitting Scale 4 in such circumstances. Scale 4 alone did not reach statistical significance (AUC = .65; 95% CI = [.50, .79], p = .08). However, they found support for the predictive validity of Scale 3 as well as the Dynamic scale (Scales 3 and 4 combined).
Further analyses with the Martinez et al. (2015) sample by Wijetunga, Martinez, Rosenfeld, and Cruise (2018) found more consistent results across scales (see Table 1). Importantly, Wijetunga et al. (2018) considered the potential roles of age at discharge and of a pattern of sexualized behavior. Interestingly, although Viljoen et al. (2008) found stronger predictive validity of J-SOAP-II scores for older adolescents (age 16-18 at admission), Wijetunga et al. (2018) found that J-SOAP-II scores demonstrated better predictive validity for younger youths (age 14-17 at discharge) than for older youths (age 17-19 at discharge). Specifically, the Total score and Scale 2, Scale 3, and Scale 4 scores each significantly predicted sexual recidivism for youths aged 14 to 17 years at discharge.
Viljoen and colleagues’ (2017) study used the sample of 163 residential youths from their 2008 study to explore reliable change scores on Scales 3 and 4. As noted in Table 1, the 2008 study generally did not support the predictive validity of the J-SOAP-II Total or scale scores (except for the Total Score with older adolescents post-discharge). The researchers also reported that Scale 1 scores were associated with sexual aggression during treatment. In the 2017 study, the authors found that youths who had reliable decreases on Scale 3 were less likely to reoffend sexually. However, those with reliable decreases on Scale 4 were more likely to reoffend, both sexually and in general. This counterintuitive finding may reflect the use of Scale 4 with youths in a residential treatment center, who typically would not have the community access common for others their age; ratings therefore would not be consistent with the intent of the Community Stability/Adjustment Scale. Nevertheless, Viljoen et al. (2017) concluded that the J-SOAP-II holds promise in measuring change but that further research is needed. In all, evidence converges to generally support the predictive validity of the J-SOAP-II Total score as well as the scale scores for sexual recidivism, although to different degrees and with some variation across studies, reflecting a need for additional research.
Regarding nonsexual recidivism, evidence has accumulated for the predictive validity of the J-SOAP-II Total Score (see Table 2). Five of eight studies found significant relations between the Total Score and some type of nonsexual recidivism (e.g., nonviolent or violent). Specifically, three studies found a relation between the Total Score and any type of nonsexual offending (Chu, Ng, Fong, & Teoh, 2011; Martinez et al., 2015; Rajlic & Gretton, 2010), one study found a relation with nonsexual violent offending (Aebi, Plattner, Steinhausen, & Bessler, 2011), and another found a relation with serious nonsexual aggression post-discharge (Viljoen et al., 2008; age 16-18 only).
Support for Scale 1 regarding nonsexual recidivism risk is lacking; this finding is not surprising, given that most individuals who commit crimes do not offend sexually. In fact, the absence of a significant association between Scale 1 and nonsexual offending may provide support for the divergent validity of this scale. Nine of 10 studies found nonsignificant relations. As an exception, Ralston and Epperson’s (2013) results did support the predictive validity of Scale 1 for nonsexual violent juvenile recidivism, but only when sexually violent offenses were included. Interestingly, Martinez et al. (2015) found that lower scores on Scale 1 were associated with higher rates of nonsexual recidivism. Stronger support has emerged for the predictive validity of Scale 2 scores, with eight of 10 studies in this review finding a positive association with some form of nonsexual recidivism (cf. Powers-Sawyer & Miner, 2009; Ralston & Epperson, 2013). Across studies, the predictive validity of Static scale scores (which combine Scales 1 and 2) is mixed, with two of four studies finding support (Aebi et al., 2011; Ralston & Epperson, 2013).
Support for Scale 3 scores and nonsexual recidivism is mixed. Of the eight studies reviewed, four found predictive validity (Caldwell, Ziemke, & Vitacco, 2008; Martinez et al., 2015; Prentky et al., 2010; Rajlic & Gretton, 2010), but others did not (Aebi et al., 2011; Chu et al., 2011; Parks & Bard, 2006; Viljoen et al., 2008). Regarding Scale 4 scores and nonsexual recidivism, three of five studies support the scale’s predictive validity (Martinez et al., 2015; Rajlic & Gretton, 2010; Viljoen et al., 2008, during treatment only), and two did not (Aebi et al., 2011; Chu et al., 2011). Two of three studies support the predictive validity of the Dynamic scale scores (Aebi et al., 2011; Martinez et al., 2015). Together, findings suggest the clearest support for the predictive validity of the Total and Scale 2 scores with regard to nonsexual recidivism.
While the predictive validity findings for sexual and nonsexual recidivism are mixed across studies, findings from the meta-analysis (Viljoen et al., 2012) and the individual J-SOAP-II predictive validity data are informative. Like all research, J-SOAP-II predictive validity studies must be interpreted in the context of their methodological limitations, sample diversity, low base rate for sexual recidivism (e.g., Caldwell, 2016), and challenges inherent in research attempting to assess the risk of criminal behavior in general and risk of sexual recidivism in particular. As noted previously, many of these challenges (e.g., variation in samples, follow-up periods) may converge to influence the mixed pattern of results (Righthand et al., 2017).
The Current Study
The popularity of the J-SOAP-II in clinical practice (McGrath et al., 2010) and the mixed predictive validity findings to date highlight the importance of advancing our understanding of the predictive validity of J-SOAP-II scores. If the J-SOAP-II is to be useful for assessing risk factors and treatment needs for adolescents who have sexually offended, it must be sufficiently validated. One understudied area of inquiry is the predictive validity of the J-SOAP-II into adulthood. An adolescent risk and needs assessment tool that is able to identify risk factors associated with adult sexual and/or nonsexual offenses arguably may be discerning stable, persistent characteristics that require intervention to mitigate risk. Limited information exists as to whether adolescent tools are effective in this regard, and the items, scales, and/or properties of such measures that may be most adept at doing so have not yet been clearly identified. The current study represents an important step for understanding whether adolescent risk and needs assessment tools such as the J-SOAP-II have utility for such purposes. It is hoped that the results of this inquiry spur future research directed toward these goals.
The current study was designed to evaluate the long-term predictive validity of J-SOAP-II scores and focuses on recidivism in adulthood. As the majority of J-SOAP-II studies include follow-up periods in the range of 2 to 6 years, the follow-up period for the current study was intentionally selected to be longer for the purposes of evaluating persistent recidivism risk. Records spanning at least 7.6 years were examined, and the average length of the follow-up period was 10.75 years (range: 7.6-15.5 years). As such, the entire sample was followed into their adult years, and this study provides a longer average follow-up period than reported by most of the existing J-SOAP-II predictive validity studies. As a notable exception, Ralston and Epperson (2013) do report adult sexual recidivism findings up to 34 years of age. However, the average follow-up period was not reported, and predictive validity findings are available for only a subset of J-SOAP-II scores. The current study reports predictive validity results from all four J-SOAP-II scales in relation to adult sexual recidivism.
In addition, despite the much higher occurrence of nonsexual recidivism (compared with sexual recidivism) in adolescents who sexually offend, many studies have not examined the predictive validity of J-SOAP-II scores in regard to nonsexual recidivism. Furthermore, no identified studies have been designed to specifically evaluate the predictive validity of J-SOAP-II scores for adult nonsexual recidivism. This information is important to obtain and accumulate across studies to more fully understand the risk and needs of these juveniles. As such, a second goal of the investigation was to evaluate the predictive validity of J-SOAP-II scores by examining the association between scale scores and adult nonsexual offending.
Method
This study was approved by the Institutional Review Board of the University of Maine. Permission to access official criminal justice records was granted by the Maine Department of Corrections. As data collection was archival, informed consent was not necessary. The confidentiality of participants’ identifying information was maintained.
Sample and Participant Selection
Files of all 188 consecutive juvenile evaluation court referrals to the Maine State Forensic Services between January 1995 and December 2002 were reviewed. Youths were referred because they were charged with sexual offenses that occurred prior to the age of 18. In accordance with the J-SOAP-II manual, a sexual offense was defined as a coercive sexual offense (Prentky & Righthand, 2003) and limited to contact offenses.
A majority of the youths in the sample were referred for presentence evaluations. In these cases, youths had been adjudicated for a sexual offense, and the purpose of the evaluation was to assist the court with disposition (n = 95; 57.2% of sample). Some youths (n = 68; 41%) were referred for preadjudicatory evaluations intended to assist with identifying treatment and placement needs; all were subsequently adjudicated for the sexual offense charge. Three youths (1.8%) were referred for evaluations concerning possible waiver to adult court; each was subsequently adjudicated for a sexual offense.
Six females were excluded from analysis. Additional cases were excluded due to unclear adjudication outcome (n = 10), excessive amounts of missing data (i.e., incomplete records precluded coding across multiple relevant areas; n = 4), and dismissed charges (n = 1); the final sample included 166 youths. Following the J-SOAP-II scoring rules, seven cases were not rated on Scale 4, the Community Stability/Adjustment scale, because the youths had not lived in the community a sufficient amount of time during the 6 months prior to the evaluation. In addition, one youth recidivated sexually as a juvenile, and 22 youths recidivated with nonsexual offenses as juveniles. Given the focus of the current study on reoffending in adulthood (i.e., past the age of 18), juvenile recidivism cases were excluded from the relevant analyses.
Records indicated the sample was primarily Caucasian (n = 158, 95.2%), with a small number of Hispanic youths (n = 4, 2.4%), African American youths (n = 1, 0.6%), and youths identified as other races/ethnicities (n = 3, 1.8%). All youths were under 18 at the time of their offenses and considered juveniles. In accordance with state law, young adults are maintained in the juvenile justice system until age 21 if their offenses occurred while they were juveniles, unless there is a legal waiver to adult court. Thus, age at the time of court order ranged from 12 to 20 years (M = 15.43 years, SD = 3.20).
Procedure
In line with the purpose of the J-SOAP-II, sexual recidivism was defined for this study as any charge of a sexually abusive, hands-on, or coercive behavior (e.g., an attempted hands-on offense) occurring after the date of the court order for the forensic evaluation. The date of court order was used as the start of the follow-up period in lieu of the date of evaluation, as the latter was not a consistently reliable data point across all youths’ files (e.g., some evaluators did not date their reports, some evaluations spanned months). A charge, as opposed to conviction, was used as the definition of sexual recidivism so as to include cases in which a sexual crime might be plea bargained down to a nonsexual offense or charged but not prosecuted. Violent nonsexual offenses included crimes that involved direct contact with a victim (e.g., assault, robbery, and terrorizing). Nonviolent nonsexual offenses included theft, breaking and entering, receiving stolen property, and drug trafficking. Drug or alcohol possession or conduct offenses, such as failure to appear in court or violating conditions of probation, were not included in this definition because these data were not readily available.
Three trained raters assisted with the project, including one postdoctoral researcher, one graduate student in clinical psychology, and one advanced undergraduate psychology student. Prior to rating any cases, raters received a coding dictionary and completed detailed training (e.g., scoring requirements for individual J-SOAP-II items) conducted by the second author. Each rater completed a minimum of five practice cases using actual case files. Cases used to evaluate reliability were coded by each rater, and reliability ratings were conducted at various points throughout the study. Periodic checks addressed rater drift, and all ratings were reviewed and discrepancies resolved by the second author. Raters were blind to whether the cases scored became recidivists.
Raters reviewed the Maine State Forensic Service files to gather demographic information, characteristics about victims and index offenses, offense history, mental health, and other relevant information. All files contained an evaluation of the youths conducted by a licensed psychologist or psychiatrist with specialized training in forensic evaluations of youths who had committed sexual offenses. After reviewing the case file, the raters independently scored J-SOAP-II items and other case-related variables.
Three data sources were used to identify instances of recidivism: Maine State Bureau of Investigations records, the Maine Department of Corrections Information System (CORIS), and microfiche of preelectronic probation officer records. The follow-up period began on the date of the court order for the Maine State Forensic Services evaluation and ended on March 31, 2010. The juveniles were followed for an average of 10.75 years (128.6 months), with a range of 7.6 to 15.5 years (92.0-186.0 months).
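The outcome coding described in this section can be summarized as a small routine: count only charges filed after the court-order date and no later than March 31, 2010, keep only charges at age 18 or older, ignore offense types that were not coded (e.g., probation violations), and assign each participant the most serious category observed. The sexual > violent nonsexual > nonviolent nonsexual ordering mirrors the Table 3 notes, where sexual recidivists are excluded from the nonsexual violent analysis and sexual and violent recidivists from the nonviolent analysis. The `Charge` record, the category strings, and the day-based month conversion are hypothetical choices for illustration, not the coding dictionary used in the study.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

FOLLOW_UP_END = date(2010, 3, 31)  # end of follow-up stated in the text

# Hypothetical category labels, ordered by seriousness (higher = more serious).
RANK = {"sexual": 3, "violent nonsexual": 2, "nonviolent nonsexual": 1}


@dataclass
class Charge:
    charge_date: date
    age_at_charge: float
    category: str  # a key of RANK, or e.g. "excluded" for uncoded offense types


def follow_up_months(court_order: date, end: date = FOLLOW_UP_END) -> float:
    """Months at risk from the court-order date to the end of follow-up."""
    return (end - court_order).days / 365.25 * 12


def adult_outcome(charges: List[Charge], court_order: date,
                  end: date = FOLLOW_UP_END) -> str:
    """Most serious adult charge within the follow-up window, or 'none'."""
    best = "none"
    for c in charges:
        in_window = court_order < c.charge_date <= end
        if not in_window or c.age_at_charge < 18 or c.category not in RANK:
            continue  # outside window, juvenile charge, or uncoded offense type
        if best == "none" or RANK[c.category] > RANK[best]:
            best = c.category
    return best
```

This sketch only classifies adult outcomes. The separate exclusion of youths who recidivated as juveniles from the relevant analyses would be applied before the AUCs are computed.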
Results
Interrater reliability was assessed by calculating intraclass correlation coefficients (ICCs) for a subset of the sample (approximately 20% of files, n = 36). Two-way random, single-measures, absolute agreement ICCs were calculated because each subject was rated by each arbitrarily selected rater, reliability was calculated for a single measurement, and raters were expected to converge on a single score. The ICCs were as follows: Scale 1 (Sexual Drive/Preoccupation) ICC = .78, Scale 2 (Antisocial Behavior/Impulsivity) ICC = .90, Scale 3 (Intervention) ICC = .64, Scale 4 (Community Stability/Adjustment) ICC = .54, Static scale ICC = .90, Dynamic scale ICC = .58, and Total score ICC = .78.
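The two-way random, single-measures, absolute-agreement ICC described above corresponds to Shrout and Fleiss's ICC(2,1) (ICC(A,1) in McGraw and Wong's notation). It is computed from a subjects × raters ANOVA: between-subject variance is compared against residual error plus systematic differences between raters. The sketch below assumes a complete table in which every rater scores every subject. The data are a small illustrative table, not study data, and the code is a didactic stand-in for the statistical package actually used.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """Two-way random, single-measures, absolute-agreement ICC (ICC(2,1)).

    `ratings` is an n_subjects x k_raters table; every rater scores every subject.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 subjects and >= 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition total variation into subjects, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Absolute agreement: rater mean differences (ms_cols) count against reliability.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
```

For `example` (six subjects, four raters), `icc_2_1` returns about .29. Because these raters differ markedly in their average level, the absolute-agreement form is much lower than the consistency form, which drops the rater term and gives about .71 for the same data. This illustrates why the choice of ICC form should match the expectation that raters converge on a single score.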
During the follow-up period, six (3.7%) individuals were charged with new contact sexual crimes, 39 (23.8%) had nonsexual violent charges, and 82 (50%) had nonviolent nonsexual charges. In all, 91 (55.5%) were charged with some type of offense during the follow-up period. The average time to a new adult charge was 69 months for sexual offenses (SD = 22.49, range = 36-97), 111.94 months for violent nonsexual offenses (SD = 39.07; range = 1-186), and 87.71 months for nonviolent nonsexual offenses (SD = 47.50; range = 1-186). The average age at new charges was 21.80 for sexual recidivism (SD = 2.17; range = 18-24), 21.30 for violent nonsexual recidivism (SD = 2.38; range = 18-27), and 20.61 for nonviolent nonsexual recidivism (SD = 2.85; range = 18-25). The average age at the end of the follow-up period was 26.58 (SD = 2.70; range = 20-33).
The AUC was used to evaluate the predictive validity of J-SOAP-II scores for new sexual, nonviolent nonsexual, and violent nonsexual charges. Table 3 summarizes the AUC values associated with the receiver operating characteristic (ROC) curves for the J-SOAP-II Total Score and the scale scores for the three types of offenses (sexual, nonsexual violent, nonsexual nonviolent). The J-SOAP-II Total Score (AUC = .76; 95% CI = [.56, .97]; p < .05), Scale 1 (AUC = .77; 95% CI = [.55, .99]; p < .05), and the Static Score (AUC = .79; 95% CI = [.66, .91]; p < .05) were each significantly associated with new sexual charges. Nonsignificant results emerged for Scales 2, 3, and 4 and the Dynamic Score.
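The AUC values above have a simple rank interpretation: an AUC is the probability that a randomly selected recidivist scored higher on a given J-SOAP-II score than a randomly selected nonrecidivist, with ties counted as one half (equivalently, the Mann-Whitney U statistic divided by the number of recidivist-nonrecidivist pairs). An AUC of .50 therefore means ranking no better than chance. A minimal sketch follows; the function name `rank_auc` and the toy data are hypothetical, and it computes only the point estimate, not the confidence intervals or p values reported in the study.

```python
def rank_auc(scores, reoffended):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    scores: risk scores, one per youth (higher = higher assessed risk).
    reoffended: booleans of the same length, True for a new charge.
    """
    cases = [s for s, y in zip(scores, reoffended) if y]
    controls = [s for s, y in zip(scores, reoffended) if not y]
    if not cases or not controls:
        raise ValueError("need at least one recidivist and one nonrecidivist")

    credit = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                credit += 1.0   # recidivist correctly ranked higher
            elif c == d:
                credit += 0.5   # tie: half credit
    return credit / (len(cases) * len(controls))


# Hypothetical toy data: 2 recidivists among 6 youths.
scores = [3, 8, 5, 10, 2, 7]
reoffended = [False, True, False, False, False, True]
print(rank_auc(scores, reoffended))  # 6 of 8 pairs correctly ranked: 0.75
```

The double loop is quadratic in sample size but mirrors the definition directly; for samples of this size that cost is negligible.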
Table 3. Predictive Validity Findings for Adult Sexual and Nonsexual Recidivists.
Note. Given considerations relevant to balancing false-positive and false-negative rates, the number of tests conducted, and conventions of the existing literature, the alpha level for the current study was set at .05. AUC = area under the receiver operating characteristic curve; CI = confidence interval; J-SOAP-II = Juvenile Sex Offender Assessment Protocol–II (Prentky & Righthand, 2003).
Sexual recidivism: n = 6.
Nonsexual violent recidivism: n = 39 (sexual recidivists were excluded from the nonsexual violent recidivism analysis).
Nonsexual nonviolent recidivism: n = 82 (sexual and violent recidivists were both excluded from the nonsexual nonviolent recidivism analysis).
*p < .05. **p < .01. ***p < .001.
Only Scale 1 was unrelated to nonsexual violent charges during the follow-up period. Specifically, the J-SOAP-II Total Score (AUC = .68; 95% CI = [.58, .77]; p < .01), Scale 2 (AUC = .68; 95% CI = [.59, .77]; p < .001), Scale 3 (AUC = .66; 95% CI = [.57, .75]; p < .01), and Scale 4 (AUC = .66; 95% CI = [.55, .76]; p < .01) were each significantly associated with nonsexual violent charges, as were the Static Score (AUC = .64; 95% CI = [.55, .73]; p < .01) and the Dynamic Score (AUC = .70; 95% CI = [.61, .80]; p < .001). Regarding nonsexual nonviolent charges, Scale 2 (AUC = .63; 95% CI = [.54, .71]; p < .05), Scale 3 (AUC = .60; 95% CI = [.51, .69]; p < .05), and the Dynamic Score (AUC = .60; 95% CI = [.51, .68]; p < .05) each demonstrated significant predictive validity. Support did not emerge for the predictive validity of Scales 1 or 4 or of the Total or Static Scores regarding nonsexual nonviolent charges.
Discussion
The J-SOAP-II is a frequently used and extensively researched juvenile risk assessment tool. Including the current investigation, there are 16 published studies of the predictive validity of J-SOAP-II scores, as well as a meta-analysis. Notably, all but two were conducted independently of the J-SOAP-II developers. Because this research was conducted in diverse settings with varied populations, it afforded ample opportunity for divergent findings to emerge, yet it also reveals consistent trends. When used as intended, it appears that the J-SOAP-II may provide useful information relevant to juvenile sexual and nonsexual recidivism risks and intervention needs (Prentky & Righthand, 2003). Importantly, results of the current study underscore the possibility that the J-SOAP-II may also be effective in identifying factors associated with risk of adult reoffending. Information about factors associated with adult recidivism may illuminate pathways of risk and help foster resilience, thereby helping to reduce the potential for adult recidivism.
In the current study, of the four scales, Scale 1 (Sexual Drive/Preoccupation) scores were most strongly associated with new adult sexual offense charges. As noted, support for the predictive validity of Scale 1 has been mixed. The current results are consistent with several past studies supporting the predictive validity of Scale 1 scores for sexual recidivism (e.g., Chu et al., 2011; Prentky et al., 2010; Rajlic & Gretton, 2010; Viljoen et al., 2008, in treatment only; see also Viljoen et al., 2012), including one study that specifically evaluated predictive validity into adulthood (Ralston & Epperson, 2013). Factors unique to the studies that did not support Scale 1 may account for their null findings. For example, Aebi and colleagues (2011) included youths aged 10 to 18 convicted of pornography possession or sexual harassment, whereas the J-SOAP-II was designed for youths aged 12 to 18 with coercive sexual behavior.
The finding that Scale 1 was unrelated to adult nonsexual recidivism adds to a consistent literature documenting this nonsignificant effect across a variety of follow-up period lengths. Of the studies that have tested this relation, only Ralston and Epperson (2013) found a relation between Scale 1 and later violent offending, but this association was not significant when youths with sexually violent offenses were excluded from analyses. The nonsignificant association between Scale 1 and nonsexual recidivism may, in fact, lend support for the discriminant validity of this scale. The current study suggests that Scale 1 contributes information to juvenile sex offense risk assessment that is absent from general delinquency risk assessment measures, as it does not appear to measure a general tendency toward violence but rather factors specific to sexual reoffending.
An opposite pattern was observed for Scale 2 (Antisocial/Impulsive) scores, which were strongly related to nonsexual recidivism. These results are at least partially consistent with past research providing evidence for the predictive validity of Scale 2 regarding nonsexual recidivism (see Table 2). However, in contrast to many studies of juvenile sexual recidivism (see Viljoen et al., 2012) and a comparable long-term study of adult sexual recidivism (e.g., Ralston & Epperson, 2013), Scale 2 did not predict adult sexual recidivism in the current study. It may be that unique factors present in the small number of sexual recidivists in this study contributed to this null finding. Given the few existing studies on the topic, additional research evaluating the validity of Scale 2 for adult sexual recidivism is needed to clarify this effect.
It is perhaps not surprising that no significant associations of Scale 3 (Intervention) or Scale 4 (Community Stability/Adjustment) with adult sexual recidivism emerged. While some studies have found support for these scales regarding juvenile recidivism (see Viljoen et al., 2012), it is important to consider that these dynamic scales rate youth on variables that can be expected to change over time, and studies including longer follow-up periods have not specifically investigated the predictive validity of these scales into adulthood (e.g., Ralston & Epperson, 2013). Furthermore, as these variables are typically intervention targets, they may be less reliable predictors of adult sexual recidivism when they are malleable and/or responsive to intervention. Future studies on the impact of intervention on J-SOAP-II scores could clarify this possibility.
Although past research on the predictive validity of Scale 3 and Scale 4 scores for nonsexual recidivism has been mixed, both scales were strongly associated with adult nonsexual violent recidivism in the current study. Overall, however, relatively few studies address the predictive validity of scores from adolescent risk assessment tools for adult nonsexual recidivism risk. It will be important for future research to further investigate this link for the subpopulation(s) of adolescents who offend sexually and who go on to reoffend nonsexually as adults. Such a program of research may be able to identify reliable factors associated with persistent criminality and thereby better delineate trajectories and possible interventions that may facilitate alternative, healthier pathways.
A discussion of findings from the sum scores (Total, Static, Dynamic) is also warranted. As in most past studies, the current results showed significant associations of the Total Score with sexual and nonsexual violent recidivism; thus, it appears to have utility in this regard. Considering the observed results for the Static Score, however, it appears that Scale 1 may drive the significant effect for sexual recidivism, and Scale 2 may drive the effect for nonsexual recidivism. Together with results from past research, the current results suggest that individual use of Scales 1 and 2 may be more informative than use of the Static Score when investigating risks for sexual and nonsexual violent recidivism, respectively. Similarly, although Scales 3 and 4 (and the Dynamic Score) were associated with nonsexual violent recidivism, the significant effect for the Dynamic Score may have been driven primarily by Scale 3, reflecting a need for intervention.
Despite the contributions of the current study, several limitations and areas for future research inquiry are noted. The current study involved a retrospective research design and included a sample that was relatively small and comprised only youths referred to the Maine State Forensic Service for forensic evaluations by juvenile courts. Thus, youths deferred from the juvenile justice system were not included. Information about the youths was limited to what was available in case files and official databases, and the quality and amount of information varied (e.g., no information was available about whether youths remained in the community following court-ordered evaluations or required residential or correctional placements; thus, information about opportunity to reoffend was lacking). When needed information was absent or unclear, J-SOAP-II items were scored as instructed in the manual (lower ratings were assigned), potentially underestimating risk. When the amount of missing information precluded effective coding, cases were dropped altogether, further limiting the sample size. Relatedly, the interrater reliability estimates in the current study, while acceptable and consistent with past studies (e.g., Aebi et al., 2011; Martinez et al., 2007), may have been somewhat attenuated by the quality of the records available and may consequently have attenuated predictive validity estimates. This underscores the importance of future research that utilizes client/collateral interviews and records that detail relevant intervention needs and that provides sufficient information regarding community adjustment, stability, and supports.
The small sample also likely contributed to the low rate of juvenile recidivism in this study and to our inability to compare the predictive validity of J-SOAP-II scores for youths who reoffended as adults with that for youths who reoffended as juveniles. Such evaluations are needed. As an example of this important direction for future work, Ralston and Epperson (2013) found that the J-SOAP Static Scale predicted adult recidivism only in their full sample and failed to predict adult recidivism when juvenile-only recidivists were excluded. Interestingly, none of the risk assessment tools evaluated in that study performed as well in assessing adult sexual recidivism as in assessing juvenile sexual recidivism. Ralston and Epperson (2013) noted that their findings “confirm the greater difficulty in making longer term predictions into adulthood on the basis of adolescent behavior” (p. 914).
Recidivism rates in this study were comparable to those reported in past research (e.g., Caldwell, 2016); however, larger samples of recidivists and more comprehensive data would permit more complex and informative AUC analyses (Mossman, 2013), such as analyses examining individual factors (e.g., opportunity to offend) that may shed light on equivocal past findings from small recidivist samples. In addition, although ROC analysis is a popular approach for assessing predictive validity, future research could employ other analytical techniques to address some limitations of this approach (see Szmukler & Rose, 2013), including sensitivity to risk score distribution (Howard, 2017) and high true negative rates (Caldwell, 2016). At the same time, it is perhaps worthwhile for researchers to consider whether correctly identifying low-scoring nonrecidivists, in addition to those who reoffend, is itself an acceptable goal of AUC analyses.
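One reason larger recidivist samples matter is that the precision of an AUC estimate depends heavily on the number of recidivists, not only on total sample size. The sketch below uses the Hanley-McNeil approximation to the standard error of an AUC to make this concrete. It is a swapped-in, commonly used approximation, not necessarily the method behind the confidence intervals reported in this study; the function names and the group sizes in the example are hypothetical.

```python
import math


def hanley_mcneil_se(auc, n_cases, n_controls):
    """Approximate standard error of an AUC (Hanley-McNeil formula)."""
    # Under the formula's exponential-distribution assumption:
    q1 = auc / (2 - auc)            # P(two random cases both outrank one control)
    q2 = 2 * auc ** 2 / (1 + auc)   # P(one case outranks two random controls)
    variance = (
        auc * (1 - auc)
        + (n_cases - 1) * (q1 - auc ** 2)
        + (n_controls - 1) * (q2 - auc ** 2)
    ) / (n_cases * n_controls)
    return math.sqrt(variance)


def wald_ci(auc, n_cases, n_controls, z=1.96):
    """Normal-approximation 95% CI, clipped to the [0, 1] range of an AUC."""
    se = hanley_mcneil_se(auc, n_cases, n_controls)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Same AUC and same total sample (166), different numbers of recidivists.
for n_cases in (6, 39):
    low, high = wald_ci(0.76, n_cases, 166 - n_cases)
    print(n_cases, round(low, 2), round(high, 2))
```

With six recidivists the approximate interval spans roughly ±.23 around the point estimate, versus roughly ±.09 with 39 recidivists, which is in line with the wide intervals reported for sexual recidivism in the Results.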
Although typology research is limited, a number of studies suggest that risk and needs may vary according to subtypes of youths who offend sexually. For example, Rajlic and Gretton (2010) found support for the predictive validity of J-SOAP-II scores in an aggregate sample of juveniles who offended sexually (and stronger effects for youths with no record of nonsexual delinquency), but null findings emerged for those who also engaged in other forms of delinquency. Parks and Bard (2006) found the J-SOAP-II more effective in identifying risk and needs of youths who abused child and peer or adult victims, rather than one or the other. Furthermore, as noted, Viljoen et al. (2008) found better predictive validity of J-SOAP-II scores in older versus younger adolescents, although Wijetunga et al. (2018) found stronger evidence with younger teens. Future research should test these relations as a priori hypotheses. Also, the impact of ethnic, racial, and cultural factors on the predictive validity of J-SOAP-II scores (and scores from other adolescent risk assessment tools) warrants further exploration (for discussion of the impacts of context and design on analyses, see Epperson & Ralston, 2016).
What is more, the predictive validity of these scores is multifaceted. Whether scores are associated with juvenile recidivism is of course of interest. Perhaps equally compelling is the notion that scores with long-term predictive validity are necessary to identify factors associated with future sexual offending by those who persist and, then, to intervene earlier with those who evidence such factors. Yet risk assessment tools cannot assess all aspects of risk or other relevant treatment needs, such as enhancing protective and treatment responsivity factors (Andrews & Bonta, 2010; Hanson et al., 2009). Thus, adequately validated risk assessment tools have their place as one component of idiographic assessment of risk and protective factors designed to identify individual treatment needs and inform effective triage and treatment decisions that may reduce recidivism risk and promote healthy development.
With such rehabilitative and community safety goals in mind, studies investigating the utility of the J-SOAP-II to promote a tailored approach to treatment are sorely needed. In their 2008 study, Viljoen and colleagues found Scale 1 associated with sexual misconduct during treatment. They also found Scale 2 associated with aggressive behavior while in treatment, but neither scale was related to long-term recidivism. Perhaps the absence of an association with long-term recidivism was the result of intensive risk-reduction interventions provided at the residential treatment program. Thus, for example, research studies investigating the utility of the J-SOAP-II historical subscales (Scales 1 and 2) for facilitating a more individualized approach to treatment—and thereby reduced recidivism—are required. Although Scales 1 and 2 comprise historical items, they are dynamic constructs. The extent to which these Static Scale scores continue to be risk-relevant may change over time, and this likelihood must be evaluated as well.
Finally, prospective studies that use the J-SOAP-II as described in the manual are needed. The manual recommends the J-SOAP-II be employed as but one component of a comprehensive risk and needs evaluation and encourages users to gather self-report and multiple sources of information that ideally provide convergent validity. The authors note that, because adolescents are in a rapid phase of development, reassessments (i.e., readministering the J-SOAP-II, or at least Scales 3 and 4) should be conducted every 6 months, or more often if risk-relevant events have occurred. A prospective study that investigates the J-SOAP-II in this way would allow practitioners and researchers to gather necessary information and evaluate assessments in real time, even assessing or at least controlling for the effects of potential confounds, such as the type of treatment and its intensity or dosage, and community versus residential placement. Such studies also could investigate the J-SOAP-II’s utility in assessing fluctuations in risk and guiding treatment plans over the short and long term.
Despite the low base rate of sexual recidivism, sexual abuse is a terrible and harmful crime, and the risks for and impacts of nonsexual recidivism are also clear and significant. The current study adds to the professional literature dedicated to conducting high-quality assessments for identifying risk-relevant factors and treatment needs among youths who have sexually offended. Such research is essential for developing effective interventions that reduce the incidence of sexual and nonsexual offending and facilitate safe communities and positive, healthy development.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
