Abstract
The predictive validity of the Juvenile Sexual Offense Recidivism Risk Assessment Tool–II (JSORRAT-II) was evaluated using an exhaustive sample of 11- to 17-year-old male juveniles who offended sexually (JSOs) between 2000 and 2006 in Iowa (n = 529). The validity of the tool in predicting juvenile sexual recidivism was significant (area under the receiver operating characteristic curve [AUC] = .70, 99% confidence interval [CI] = [.60, .81], d = 0.70). Non-significant predictive validity coefficients were observed for the prediction of non-sexual forms of recidivism. Additional analyses were undertaken to test hypotheses about the tool’s performance with various subsamples. The age of the JSO at the time of the index sexual offense and time at risk outside secure facility placements interacted significantly with JSORRAT-II scores to predict juvenile sexual recidivism. The implications of these findings for practice and research on the validation of risk assessment tools are discussed.
Risk assessment with juveniles who have offended sexually (JSOs) is necessary given the impact of sexual offending and sexual recidivism on the victims, victims’ families, society, and the juveniles who commit those crimes (Prescott, 2004). Such assessments, when accurate, have the potential to inform a number of decisions, such as placement, programming, treatment, and supervision (Epperson & Ralston, 2015; Epperson, Ralston, Fowers, DeWitt, & Gore, 2006). Given limited budgets and human resources, risk assessment data can be used to allocate resources to those JSOs who most need more restrictive placements, supervision, and treatment. Effective and efficient allocation of resources has the potential to improve community safety by limiting the opportunity for the highest risk JSOs to reoffend and/or by giving them appropriate treatments that effectively reduce their risk. In addition, accurate risk assessment has the potential to benefit very low risk JSOs by segregating them from higher risk JSOs, so as to avoid contagion effects. Furthermore, the ability to identify very low risk JSOs has the potential to reduce the unnecessary imposition of adult statutes with long-term, or even lifetime, consequences on those JSOs.
Risk assessment with adults and juveniles who have sexually offended tends to focus on dynamic risk factors, static risk indicators, or some combination of dynamic and static variables (Epperson & Ralston, in press). Dynamic risk factors are personal characteristics (e.g., deviant sexual attitudes) that causally relate to the likelihood an individual will commit an additional sexual offense (Hanson & Harris, 2000; Hanson, Harris, Scott, & Helmus, 2007; Ward & Gannon, 2006). These factors are dynamic because they can change over time through learning, maturation, or intervention (e.g., treatment). The degree to which individual factors are changeable ranges from relative stability on one end of the spectrum (e.g., impulse control deficits) to more acute or rapidly changing on the other (e.g., intoxication). Examples of tools focused on dynamic risk factors include the STABLE and ACUTE (Hanson et al., 2007) for adults who sexually offend and the dynamic risk scales of the Juvenile Sex Offender Assessment Protocol–II (J-SOAP-II; Prentky & Righthand, 2003).
There are advantages and disadvantages to constructing a risk assessment tool focused on dynamic risk factors. Probably the biggest advantage is that these types of assessments can be used to identify when risk has been reduced through treatment or maturation or to signal when an offense might be imminent. The biggest disadvantage is limitations in the ability to accurately measure dynamic risk factors and adequately capture their complex interactions (Epperson & Ralston, in press). Furthermore, these types of tools often require interview data, increasing vulnerability to response bias, manipulation of self-presentation, and lower inter-rater reliability.
Static risk indicators, however, are often the focus of actuarial risk assessment tools, such as the Static-99 (Hanson & Thornton, 2000) and the Minnesota Sex Offender Screening Tool–Revised (MnSOST-R; Epperson et al., 2004). Risk indicators fall into one of two categories. First, some indicators are behavioral markers that reflect historical manifestations of dynamic risk factors and their complex interactions (Epperson & Ralston, in press). The empirical link between these indicators and sexual recidivism presumably reflects the relative stability of personality, including underlying dynamic risk factors. For example, having three prior contact sexual offenses against three different victims likely indicates an individual with a more “virulent” underlying dynamic risk structure than a person who only has one non-contact sexual offense against a single victim. Typically, the more behavioral markers that indicate the historical expression of the dynamic risk process, the more likely that individual is to have future expressions of that process (e.g., new sexual offenses). Second, some risk indicators might reflect socio-cultural or developmental factors that causally contribute to the development of dynamic risk factors. For example, negative attachment relationships, sustained exposure to violence at an early age, extensive personal abuse victimization, and few or no prosocial peer relationships may lead to distorted sexual attitudes, callousness, narcissism, sexual preoccupation, and so on. In summary, research supports the link between risk indicators and future sexual offending, and a better ability to assess dynamic risk factors may explain the observed relationships between risk indicators and sexual recidivism.
Like assessments focusing on dynamic risk factors, tools that primarily utilize static risk indicators have advantages and disadvantages. A major advantage is that many static risk indicators can be objectively and reliably identified through a normal case file review (e.g., number of victims). Because most such tools do not require an interview, they are far less susceptible to response bias or impression management efforts on the part of the individual who offended. Non-reliance on interview data also opens up the possibility that such tools can be scored by individuals who do not possess advanced training in clinical assessment.
The same reliance on static risk indicators also creates some disadvantages. Although such tools can meaningfully inform decisions about the magnitude of treatment needed (intensity/duration), those assessments cannot focus treatment on specific psychological needs in the way that a dynamic tool may be able to do. Similarly, tools based on static risk indicators are not responsive to reductions in risk through learning, maturation, or treatment. A reduction in the dynamic risk factors will not be reflected in changes in scores to items such as number of prior sexual offenses. Relatedly, caution should be exercised when assigning risk labels that are based on the scores of such tools. Research-informed expiration dates should accompany all risk label descriptions to avoid the possible negative impacts of such labels (e.g., stigma, lost opportunity).
A host of risk assessment tools have been developed over the past couple of decades for adults who have offended sexually. However, relatively few options exist for JSOs. Of the tools for JSOs, most are empirically guided and combine static indicators with dynamic risk factors. Examples include the Estimate of Risk of Adolescent Sexual Offense Recidivism (ERASOR; Worling & Curwen, 2001), the Juvenile Risk Assessment Scale (JRAS; New Jersey Attorney General’s Office, 2006), and the J-SOAP-II (Prentky & Righthand, 2003). Although the predictive validity of some adult tools is well established, far less research has explored the predictive accuracy of juvenile tools.
The lone JSO-specific risk assessment focusing exclusively on static risk indicators is the Juvenile Sexual Offense Recidivism Risk Assessment Tool–II (JSORRAT-II; Epperson & Ralston, 2015; Epperson et al., 2006). The JSORRAT-II was developed empirically using variables extracted from archival juvenile justice records for all JSOs who were adjudicated guilty of a sexual offense from 1990 through 1992 in Utah (hereafter referred to as the “development sample”). In that development sample, the JSORRAT-II performed well, significantly exceeding chance-level prediction for sexual recidivism (new officially charged sexual offense) prior to age 18 (area under the receiver operating characteristic curve [AUC] = .89, 95% confidence interval [CI] = [.85, .92]). This level of performance in the development sample warranted cross-validation research.
Two cross-validation studies have been published. In the first, Epperson and Ralston (2015) scored the JSORRAT-II on another exhaustive sample of male JSOs from Utah (n = 566) who had a sexual offense in either 1996 or 1997 (hereafter referred to as the “Utah cross-validation sample”). This cross-validation sample was geographically identical and demographically parallel to the JSORRAT-II development sample. The cross-validation attempt was successful with an observed AUC of .65 (95% CI = [.59, .72]) when predicting juvenile sexual recidivism (new official charge). Additional analyses indicated that overall predictive accuracy was not meaningfully affected by age at offense, severity of the index sexual offense, or missing data.
The only other published JSORRAT-II cross-validation attempt was conducted with a sample of 169 male JSOs from a residential treatment facility (Viljoen et al., 2008). In addition to the JSORRAT-II, the J-SOAP-II and the Structured Assessment of Violence Risk in Youth (SAVRY) were scored. The researchers defined sexual recidivism more broadly than in the previous two JSORRAT-II studies, including all reports of sexual aggression from law enforcement, probation, or treatment reports. Also unlike the previous two studies, the JSORRAT-II did not significantly predict sexual aggression during treatment (AUC = .59, ns) or after discharge (AUC = .53, ns). Similarly, neither the J-SOAP-II nor SAVRY predicted sexual aggression. Some caution is needed when interpreting those results, however, because of potential problems associated with smaller samples drawn from a single, residential treatment facility.
The predictive validity of the JSORRAT-II also was examined in a recent meta-analysis (Viljoen, Mordell, & Beneteau, 2012). Included in their analyses were the three studies mentioned above, as well as two studies presented at national-level conferences and unpublished data by two additional researchers. Of note, one of the conference presentations represented analyses conducted on a subset of the sample used for the study described in this article. When the researchers included the JSORRAT-II development sample, the aggregate AUC was .64 (95% CI = [.54, .74]). When that study was excluded from the analysis, the AUC dropped to .61 (no CI reported) but remained significant. However, the interpretation of this value is somewhat uncertain, given that the authors also observed significant heterogeneity across the samples.
The present study represents another attempt to cross-validate the JSORRAT-II. Previous studies were limited in a couple of ways. First, the two published studies by Epperson and colleagues used JSOs restricted to Utah, which limited the generalizability of those findings to different geographic areas and jurisdictions. Second, the Viljoen and colleagues (2008) study was potentially limited by using a small sample of convenience. The present study attempted to account for those limitations by scoring the JSORRAT-II on a new, exhaustive sample of male JSOs adjudicated for a sexual offense in Iowa from 2000 through 2006. We hypothesized that the JSORRAT-II would predict juvenile sexual recidivism above chance levels with this new sample. Recidivism data for non-sexual offenses also were collected, and we hypothesized, consistent with the findings of Epperson and Ralston (2015), that the JSORRAT-II would not significantly predict those forms of recidivism. Finally, data on placement in secure facilities after the index sexual offense were collected to evaluate the effect of time at risk on the predictive validity coefficients, and we hypothesized that time at risk would moderate the predictive accuracy of the JSORRAT-II.
Method
Sample
Juvenile judicial and youth corrections case files representing 529 male JSOs were obtained from the Iowa Juvenile Courts and Division of Criminal and Juvenile Justice Planning. These files represented an exhaustive sample of all juveniles adjudicated for a sexual offense in the state of Iowa between 2000 and 2006. Juveniles ranged in age from 11.0 to 17.8 years, with the average age of juveniles at the time of their index sexual offense intake being 15.0 years (SD = 1.4). Of note, though the JSORRAT-II scoring manual restricts the use of the tool to those JSOs ages 12 through 17, the JSORRAT-II development and Utah cross-validation samples used by Epperson and colleagues also included JSOs who were 11 years old at the time of their index sexual offense if those JSOs were adjudicated on or after their 12th birthdate (Epperson & Ralston, 2015; Epperson et al., 2006). To be consistent with those studies, we also included 11-year-olds who met that criterion. A total of 1.3% of the sample were age 11, 7.6% were age 12, 13.6% were age 13, 25.7% were age 14, 27.2% were age 15, 15.7% were age 16, and 8.9% were age 17. The majority of the sample was identified as Caucasian/White (79.6%), and the remainder of the sample was identified as African American/Black (8.7%), Hispanic/Latino (4.5%), Native American (1.1%), Asian American (0.6%), multiethnic or Other (3.1%), and unspecified (2.5%).
Materials and Procedures
Case files
All case files were copied and then transported from the eight Iowa Judicial District offices to the researchers. The content of each case file varied somewhat, but all contained a record of the JSO’s criminal involvement in the juvenile justice system up to and including their index sexual offense. The index sexual offense was defined as the first adjudicated sexual offense on or after January 1, 2000. Records of criminal involvement typically included arrest, investigation, court, and jurisdictional review reports. From these records, information could be gathered about past and current sexual offenses, including information about events leading up to the sexual offense, the nature of the offense, and information about victims. Additional information found in most files included probation or caseworker reports, psychological evaluations, education reports, and Department of Human Services reports. These additional files provided information about educational history, social functioning, substance use and abuse, mental health issues, treatment history, caregiving stability, and histories of being the victim of abuse and neglect.
JSORRAT-II
The JSORRAT-II is a 12-item, actuarial risk assessment tool for male JSOs, ages 12 to 18, which has received some initial empirical support for predicting juvenile sexual recidivism (i.e., a new sexual offense officially charged prior to age 18). As mentioned above, predictive validity evidence has been mixed so far, with studies by the authors of the JSORRAT-II finding moderate to good predictive accuracy (Epperson & Ralston, 2015; Epperson et al., 2006) and a study by other researchers finding null results (Viljoen et al., 2008). For the present study, all items on the JSORRAT-II were directly scored according to the scoring manual, with no modifications for any item.
Data extraction
Data were extracted to a coding form that included the JSORRAT-II variables and several research variables found to be significantly associated with sexual recidivism in the JSORRAT-II development study (Epperson & Ralston, 2015; Epperson et al., 2006). Eleven graduate and undergraduate research assistants, with no knowledge of the recidivism status of the JSOs, were trained over the course of several meetings on the procedures for extracting information from the case files. During the training meetings, assistants were introduced to and trained on how to use the coding form. All research assistants then extracted data from the same set of practice cases. These practice cases were actual cases from the sample. After completing these cases, all coders met with the researchers to discuss the cases, any discrepancies in scoring, and any other questions pertaining to scoring the cases. This process was repeated until all coders consistently scored several cases in a row.
In addition to this primary training, all coders met with at least one of the authors weekly throughout the course of data collection to help prevent coder drift. During these sessions, the coders and the researchers reviewed discrepancies in the scoring of reliability cases, key scoring issues, and any additional questions pertaining to the coding of information from the cases.
Reliability cases
Approximately once every other week during the data extraction period, each coder was instructed to score one of 50 reliability cases. The coders were instructed not to discuss these cases with other coders and to place their coding forms in a separate secure location where the other coders would not have access to their responses. These coding form responses were used to assess inter-rater reliability. Each research assistant coded an average of 9.1 reliability cases across the study.
Data entry
Each coding form was double-entered into an SPSS database to assess and correct for data entry error. Once all forms had been entered, the researcher analyzed the entries for inconsistencies. Upon finding inconsistencies, the original coding form was consulted for the appropriate entry, and the database was corrected.
Recidivism and secure facility placement data
After all information had been extracted from the case files, sexual and violent recidivism data were obtained through a search of the Iowa Justice Warehouse, an electronic database of all official police or court contacts, court decisions, and judicially mandated placements. Sexual recidivism was defined as a new formal charge for a sexual offense by statute prior to age 18. The definition included contact (e.g., sexual abuse) and non-contact (e.g., indecent exposure) sexual offense charges, as well as attempted sexual offenses when that attempt was clearly documented in the charge description (e.g., “assault with intent for sexual abuse”). Sexually recidivating offenses were coded as sexually violent when the statute for the charged offense included the terms “force,” “threaten to use force,” “weapon,” “cause injury,” or “substantial risk of death or serious injury.” Finally, recidivating offenses were classified as non-sexual violence when the charging offense was defined as violent by statute (e.g., attempted murder, simple assault, disorderly conduct involving fighting or violent behavior). This definition included threats of violence (e.g., intimidation with dangerous weapon) and harassment involving a threat to commit a forcible felony. Because no police or investigation reports were included in the Iowa Justice Warehouse database, it was possible that some sexually motivated recidivating offenses were missed due to prosecutorial discretion, such as charging an offense as a non-sexual assault despite the offending behavior including sexual elements.
The average age of juveniles at the time of their intake for their index sexual offense was 15.0 (SD = 1.4), indicating that JSOs were at risk, on average, for 3.0 years before turning age 18. This at-risk time also included, for some, periods when JSOs were in secure facilities (e.g., detention facility, inpatient treatment facility) that might have limited their opportunities to reoffend. To account for the time at risk outside of secure facilities, data from all judicially mandated placements were obtained from a search of the Iowa Justice Warehouse. Those data included the type of placement (e.g., unsecured outpatient treatment, secure inpatient treatment, secure detention), as well as dates of entry into and release from the placements. These data were used to calculate true time at risk outside of secure facilities, so as to explore the potential impact of reduced threat (through secure placement) on the predictive accuracy of the JSORRAT-II.
Data Analyses
Inter-rater reliability
Inter-rater reliability was calculated in two ways. As mentioned above, 50 reliability cases were scored by dyads selected from 11 coders. Ideally for this type of research, a singular intra-class correlation coefficient (ICC) for absolute agreement using a two-way mixed effect model would be used; however, the calculation of that statistic requires all raters to rate all cases. That was not the case in this study. Consequently, the data needed to be structured in such a way that data from one coder in the dyad for each case were specified in one column, and the data from the other coder from that same dyad were specified in the second column. However, coders could appear in either column, depending upon the case, and could appear multiple times in both columns because of the number of cases coded (M = 9.1). In this arrangement, rater effects could not be estimated directly, and hence, the resulting ICC for absolute agreement using a two-way mixed model is likely to be too liberal an estimate. Alternatively, a one-way random-effects ICC calculated on the same data is likely to be too conservative an estimate because it does not separately account for the effects of raters, the interaction between raters and cases, and random error (Shrout & Fleiss, 1979). Given this, we opted for the calculation of the conservative estimate (one-way random) of reliability. In addition, to help contextualize the ICCs, we also provide the percent agreement by item and total score.
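As an illustration of the conservative estimate described above, the following minimal sketch computes a one-way random-effects, single-rating ICC (the ICC(1,1) form in Shrout & Fleiss, 1979) together with percent exact agreement for dyad-coded cases. The function names and the toy scores are hypothetical and are not the study data; the sketch also assumes every case was scored by the same number of coders.

```python
from statistics import mean


def icc_one_way(ratings):
    """One-way random-effects, single-rating ICC (Shrout & Fleiss ICC(1,1)).

    ratings: list of per-case tuples; each tuple holds the k scores given
    to that case. Which coder produced which score is ignored, which is why
    this form suits designs where coders rotate across cases.
    """
    n = len(ratings)          # number of cases
    k = len(ratings[0])       # ratings per case (2 for dyads)
    grand = mean(score for case in ratings for score in case)
    case_means = [mean(case) for case in ratings]
    # Between-cases mean square
    bms = k * sum((m - grand) ** 2 for m in case_means) / (n - 1)
    # Within-cases mean square (rater and error variance are pooled here)
    wms = sum(
        (score - m) ** 2
        for case, m in zip(ratings, case_means)
        for score in case
    ) / (n * (k - 1))
    return (bms - wms) / (bms + (k - 1) * wms)


def percent_agreement(ratings):
    """Percentage of cases on which all coders gave the identical score."""
    exact = sum(1 for case in ratings if len(set(case)) == 1)
    return 100.0 * exact / len(ratings)


# Hypothetical total scores from three dyad-coded cases (illustration only)
toy_pairs = [(1, 2), (3, 3), (5, 4)]
toy_icc = icc_one_way(toy_pairs)
toy_agreement = percent_agreement(toy_pairs)
```

Because the within-cases mean square pools rater differences with random error, any systematic leniency or severity among coders lowers this coefficient, which is the sense in which the one-way model is the more conservative choice.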
Predictive validity analyses
Overall predictive accuracy for the JSORRAT-II was assessed utilizing the AUC statistic. This type of analysis was chosen because of the possibility of utilizing different cut scores to aid different types of decisions. Because the AUC statistic assesses the accuracy across all possible cut scores, it is the most appropriate statistic for assessing the overall accuracy (Mossman, 2013; Quinsey, Harris, Rice, & Cormier, 1998). Also, unlike correlation coefficients, the AUC statistic is largely independent of base rates, making it comparable across samples and studies (Quinsey et al., 1998). AUC values were transformed to d effect size values and are reported alongside each AUC and 99% CI to help contextualize those values. These d statistics were calculated using Equation 6 in Mossman’s (2013) paper. Finally, logistic regression analysis was used to assess the relative contributions of age at the time of the index sexual offense, time at risk, and JSORRAT-II total scores in the prediction of sexual recidivism.
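To make the AUC concrete, the sketch below computes it as the probability that a randomly chosen recidivist scores higher than a randomly chosen non-recidivist, counting ties as one half (the Mann-Whitney formulation). It also shows one commonly used AUC-to-d conversion that assumes equal-variance normal score distributions in both groups. That conversion is an assumption made for illustration; it is not claimed to be Equation 6 in Mossman (2013), and it need not reproduce the d values reported in this article. All names and toy scores are hypothetical.

```python
from statistics import NormalDist


def auc(recidivist_scores, non_recidivist_scores):
    """AUC as P(recidivist score > non-recidivist score), ties counted as 0.5."""
    wins = 0.0
    for r in recidivist_scores:
        for n in non_recidivist_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(recidivist_scores) * len(non_recidivist_scores))


def auc_to_d(auc_value):
    """Convert AUC to d under an ASSUMED equal-variance binormal model.

    d = sqrt(2) * inverse_normal_cdf(AUC). Other conversions exist and
    can give somewhat different values.
    """
    return 2 ** 0.5 * NormalDist().inv_cdf(auc_value)


# Hypothetical total scores (illustration only, not the study data)
toy_recidivists = [6, 4, 5, 3]
toy_non_recidivists = [1, 3, 2, 4, 0, 5]
toy_auc = auc(toy_recidivists, toy_non_recidivists)
toy_d = auc_to_d(toy_auc)
```

Because every recidivist/non-recidivist pair is compared regardless of where a cut score might fall, the statistic summarizes accuracy across all possible cut scores, and an AUC of .50 corresponds to chance-level discrimination (d = 0 under the assumed conversion).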
Results
Recidivism Status
A total of 34 (6.4%) JSOs had a recidivating sexual offense prior to age 18. The time between the index adjudication and the sexual recidivating offense ranged from 6 days to 4.8 years, with a median of 4.8 months (M = 9.2, SD = 12.2). The rate of sexual recidivism in this sample was significantly lower than the rates observed in the exhaustive samples used to develop the JSORRAT-II, 13.2%, χ2(1) = 13.85, p < .01 (Epperson & Ralston, 2015; Epperson et al., 2006), and to initially cross-validate it in Utah, 12.4%, χ2(1) = 11.68, p < .05 (Epperson & Ralston, 2015).
Although sexual recidivism is an important criterion to predict, the public’s concern is not limited to sexual reoffense. The public is also concerned about violent recidivism. Consistent with that argument, we also examined violent offending as a criterion. A total of 82 (15.5%) JSOs had a new charge for a violent offense by the time they turned 18. However, for 19 of these individuals, their violent recidivating offense was a sexually violent offense. Thus, only 63 (11.9%) of JSOs had a new non-sexual violent offense, and 3 had both non-sexually violent and sexually violent recidivating offenses. A total of 178 (33.6%) of the JSOs in the sample had a new juvenile offense of some kind after the adjudication of their index sexual offense. Thus, 81 JSOs (15.3%) were charged with non-sexual, non-violent criminal offenses.
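The counts and percentages in the two preceding paragraphs can be checked with simple arithmetic, as in the sketch below. The counts are taken from the text; the variable names are hypothetical. The last step shows one reading under which the 81 non-sexual, non-violent cases reconcile with the other counts, namely that the 19 sexually violent recidivists are the only sexual recidivists already counted among the 82 violent recidivists. That reading is an assumption and is not stated in the text.

```python
N_TOTAL = 529  # exhaustive Iowa sample

# Counts reported in the text
counts = {
    "sexual": 34,
    "violent (any)": 82,
    "non-sexual violent": 63,
    "any new offense": 178,
    "non-sexual, non-violent": 81,
}

# Percentage of the full sample, rounded to one decimal as in the article
rates = {label: round(100 * n / N_TOTAL, 1) for label, n in counts.items()}

# Violent recidivists whose violent offense was a sexually violent offense
sexually_violent_only = 19
non_sexual_violent = counts["violent (any)"] - sexually_violent_only

# ASSUMED reconciliation: sexual recidivists not already counted as violent
sexual_not_violent = counts["sexual"] - sexually_violent_only
non_sexual_non_violent = (
    counts["any new offense"] - counts["violent (any)"] - sexual_not_violent
)

for label, pct in rates.items():
    print(f"{label}: {counts[label]} of {N_TOTAL} ({pct}%)")
```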
Reliability Analyses
The 11 research assistants extracted data on 50 reliability cases across the entire data extraction period. The percent agreement and the ICCs using a one-way random-effects model were calculated on rater-pairs’ JSORRAT-II total scores (see Table 1). Overall, total scores were identical in 24 of 50 (48.0%) cases. An additional 20 (40.0%) of cases were different by a single point, and 6 (12.0%) were different by two points. No cases’ total scores were different by three or more points. Reliability of scoring the tool was observed to be strong, despite the conservative one-way random model (ICC = .97, 95% CI = [.94, .98]). Although the reliability estimate observed was very strong, this level of reliability might be higher than one would expect in real-world settings, given the level of training and oversight the research assistants received over the course of the study.
Table 1. Intra-Class Correlation Coefficients for Each JSORRAT-II Item and Total Score.
Note. JSORRAT-II = Juvenile Sexual Offense Recidivism Risk Assessment Tool–II; ICC = intra-class correlation coefficient; SO = Sexual Offense.
The reliability of scoring was somewhat more variable at the individual item level (see Table 1). Percent agreement ranged from 82.0% to 100%, and ICCs ranged from .67 to 1.00. The majority of items exhibited acceptable reliability, with all but two items meeting or exceeding 90% agreement and all but four exceeding an ICC of .90 when using the more conservative one-way random-effects model. Most problematic was the item representing the use of deception or grooming during the commission of a prior sexual offense (Item 6; 82% agreement, ICC = .67). The relative unreliability of this item makes some sense when one considers that it requires scorers to interpret the motivation behind actions leading up to the commission of the offense (e.g., was play behavior by the JSO initiated to build a relationship for the purpose of later offending?).
Predictive Validity Analyses
After double entry errors were resolved, scores were generated for each individual JSORRAT-II item and the total score. Total scores ranged from 0 to 13 in the full sample, with a mean score of 3.3 (SD = 2.3). That mean was not significantly different from the mean observed in the JSORRAT-II development sample, M = 3.6, t(1126.4) = 1.76, p > .05, or the Utah cross-validation sample, M = 3.6, t(1016.8) = 1.61, p > .05, the other two exhaustive samples used to test the predictive validity of the JSORRAT-II, but it was significantly lower than the mean observed by Viljoen and colleagues (2008), M = 6.2, t(213.5) = 9.82, p < .05. Also, similar to both previous exhaustive samples (Epperson et al., 2006; Epperson & Ralston, 2015), scores for the present sample were positively skewed, with 75.2% of the sample scoring between 0 and 4.
Juvenile sexual recidivists scored significantly higher (M = 4.8, SD = 2.2) than non-recidivists (M = 3.2, SD = 2.2) on the JSORRAT-II, t(37.7) = 3.96, p < .001, d = 0.70. JSORRAT-II scores also correlated with recidivism status (rB = .170, p < .001). Logistic regression analysis with JSORRAT-II scores predicting juvenile sexual recidivism status also provided an adequate fit to the data, χ2(1) = 13.27, p < .001, Hosmer and Lemeshow χ2(6) = 3.29, p > .05.
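The fractional degrees of freedom reported above are consistent with an unequal-variances (Welch) t test. The sketch below applies the Welch formulas to the rounded summary statistics given in the text, taking the group sizes as 34 recidivists and 529 − 34 = 495 non-recidivists. Because the inputs are rounded, the resulting t and d only approximate the reported values; the function names are hypothetical, and the pooled-SD form of d is an assumption about how the effect size was defined.

```python
from math import sqrt


def welch_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite df from summary statistics."""
    v1 = sd1 ** 2 / n1   # squared standard error, group 1
    v2 = sd2 ** 2 / n2   # squared standard error, group 2
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_sd = sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    return (m1 - m2) / pooled_sd


# Rounded values from the text: recidivists first, then non-recidivists
t_stat, df = welch_from_summary(4.8, 2.2, 34, 3.2, 2.2, 495)
d = cohens_d(4.8, 2.2, 34, 3.2, 2.2, 495)
print(f"t({df:.1f}) = {t_stat:.2f}, d = {d:.2f}")
```

With these rounded inputs the degrees of freedom come out at about 37.7, matching the reported value, while t and d land slightly above the reported 3.96 and 0.70, as would be expected if the published statistics were computed from unrounded means and standard deviations.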
The AUC statistic was calculated for JSORRAT-II scores predicting juvenile sexual recidivism status. The 99% CI was used to account for the possible inflation in Type I error due to the number of AUCs calculated. The resultant AUC was significant (AUC = .70, 99% CI = [.60, .81], d = 0.70) and represented a moderate level of predictive accuracy with this sample (see Table 2). The AUC for JSORRAT-II scores in predicting violent juvenile recidivism was .62 (99% CI = [.54, .70], d = 0.38). Although significant, this value represents a substantial loss of predictive accuracy, and given the possibility that the accuracy was driven largely by the inclusion of sexually violent recidivists in the violent recidivism criterion, those recidivists whose only violent offense was sexual in nature were removed and the analysis was repeated. The resultant AUC was not significant (AUC = .57, 99% CI = [.48, .66]), indicating that the previous significant finding was driven not by the JSORRAT-II’s ability to predict violent recidivism in this sample but by its ability to predict sexual recidivism. The JSORRAT-II also did not significantly predict general recidivism (AUC = .54, 99% CI = [.48, .61]).
Table 2. Sample Characteristics and Predictive Validity Indices by Type of Recidivism Criterion.
Note. AUC = area under the receiver operating characteristic curve; ns = non-significant.
Age at time of index offense
Although Epperson and Ralston (2015) did not find evidence that age at the time of JSOs’ index sexual offense affected the tool’s predictive accuracy in either the development or Utah cross-validation samples, they did not test the effect of age as an independent predictor of recidivism risk or whether age moderated the predictive accuracy of the JSORRAT-II. Offenders who are younger at the time of their index adjudication have longer to reoffend before they turn age 18, which leaves open the possibility that the JSORRAT-II might perform better for those who are younger and, thus, have more time to reoffend. As mentioned above, the mean age of JSOs in this sample was 15.0 years (SD = 1.4). In addition, 1.3% of the sample were age 11, 7.6% were age 12, 13.6% were age 13, 25.7% were age 14, 27.2% were age 15, 15.7% were age 16, and 8.9% were age 17.
The impact of age at index offense and JSORRAT-II total scores in predicting recidivism status was assessed using logistic regression analysis. In the first block of the equation, age and scores were entered simultaneously, and an interaction factor was entered into the second block. The first block significantly predicted recidivism status, χ2(2) = 13.47, p < .001, Hosmer and Lemeshow χ2(8) = 6.78, p > .05; however, only the coefficient for the total score was significant (Wald χ2 = 14.05, p < .001). When the interaction was added to the equation, the step, χ2(1) = 4.55, p = .03, and model were significant, χ2(3) = 18.01, p < .001, Hosmer and Lemeshow χ2(8) = 9.01, p > .05. In addition, the coefficients for JSORRAT-II total scores (Wald χ2 = 5.92, p = .02) and the interaction were significant (Wald χ2 = 14.05, p = .03). This result seems to indicate that age at the time of the index sexual offense moderates predictive accuracy of the JSORRAT-II.
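The step test for the interaction block can be illustrated with the reported model statistics. In a hierarchical logistic regression, the step chi-square is the difference between the model chi-squares of the full and reduced models, with degrees of freedom equal to the number of added terms (here, one). The sketch below uses the closed-form upper-tail probability for a chi-square variable with 1 degree of freedom; the variable and function names are hypothetical.

```python
from math import erfc, sqrt


def chi2_sf_df1(x):
    """Upper-tail p value for a chi-square statistic with 1 df.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)), because X is the square of a
    standard normal variable.
    """
    return erfc(sqrt(x / 2.0))


# Model chi-squares reported in the text
block1_chi2 = 13.47   # age + JSORRAT-II total score, df = 2
full_chi2 = 18.01     # block 1 + interaction term, df = 3

# Step chi-square implied by the two model statistics (df = 3 - 2 = 1)
step_from_models = full_chi2 - block1_chi2

# p value for the step statistic as reported in the text (4.55)
p_step = chi2_sf_df1(4.55)
print(f"step chi2 = {step_from_models:.2f}, p = {p_step:.3f}")
```

The difference between the two model statistics (4.54) agrees with the reported step value of 4.55 to within rounding, and the corresponding p value is approximately .03.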
To investigate the nature of that interaction further, three age-based subsamples were created and AUCs were calculated. We chose not to examine performance for individual years because the small numbers of JSOs at some ages might unduly reduce the power of the statistical test. As can be seen in Table 3, the JSORRAT-II performed well for JSOs in the 11- through 13-year-old and 14- through 15-year-old age groups, with AUCs in excess of 0.70. However, the AUC for the 16- through 17-year-old age group failed to reach significance. This latter finding is potentially an artifact of the shorter follow-up period, given the final cutoff for the recidivism period was age 18. It might be possible that the AUC would rise with a longer follow-up. Furthermore, for some JSOs the follow-up period might be compromised by the actual time the JSO was at risk outside a secure facility.
Table 3. Sample Characteristics and Predictive Validity Indices by Age Subsample.
Note. AUC = area under the receiver operating characteristic curve; ns = non-significant.
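As an illustration of the AUC statistic reported for each age subsample, the following minimal sketch computes an AUC as the proportion of recidivist and non-recidivist pairs in which the recidivist has the higher score, with ties counted as one half. The function name and the scores are hypothetical and are not study data; the software and procedure used for the published AUCs are not described here.

```python
def auc(scores_recidivists, scores_nonrecidivists):
    """Probability that a randomly chosen recidivist outscores a randomly
    chosen non-recidivist, counting tied scores as one half."""
    wins = 0.0
    for r in scores_recidivists:
        for n in scores_nonrecidivists:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(scores_recidivists) * len(scores_nonrecidivists))


# Hypothetical total scores, for illustration only (not study data).
recidivists = [7, 5, 9, 4]
non_recidivists = [2, 3, 5, 1, 4, 0]

example_auc = auc(recidivists, non_recidivists)
print(round(example_auc, 3))
```

An AUC of .50 corresponds to chance-level discrimination, and values closer to 1.0 indicate that recidivists tend to score higher than non-recidivists.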
Time at risk
Age at the time of the index offense does not account for other constraints that might limit the opportunity to reoffend (i.e., threat to the community). Thus, age is not a perfect proxy for time at risk. To account for those constraints more directly, a time-at-risk variable was calculated from the data collected on judicially mandated placements between the index sexual offense adjudication and the JSO’s 18th birthdate. Specifically, time at risk was defined as the time between the index adjudication and the 18th birthdate minus the time spent in some form of secure facility. Examples of secure facilities include inpatient treatment facilities, detention centers, and secure group homes. For recidivists, to avoid counting secure facility placements resulting from their recidivating offense against their time at risk, those placements were subtracted from the total time spent in a secure facility.
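The definition above can be sketched in a few lines. This is an illustrative reading of the definition, not the authors’ procedure: the function name, the tuple layout for placements, the use of days as the unit, and the assumption that placements do not overlap are all hypothetical choices made for the example.

```python
from datetime import date


def time_at_risk_days(index_adjudication, eighteenth_birthdate, placements):
    """Days between index adjudication and the 18th birthdate, minus days in
    secure facilities.

    placements: iterable of (start, end, from_recidivating_offense) tuples.
    Placements that resulted from the recidivating offense are not counted
    against time at risk. Placements are assumed not to overlap (assumption).
    """
    window_days = (eighteenth_birthdate - index_adjudication).days
    secure_days = 0
    for start, end, from_recidivating_offense in placements:
        if from_recidivating_offense:
            continue  # excluded from the secure-facility total
        # Clip each placement to the follow-up window.
        start = max(start, index_adjudication)
        end = min(end, eighteenth_birthdate)
        if end > start:
            secure_days += (end - start).days
    return window_days - secure_days


# Hypothetical case, for illustration only (not study data).
example_days = time_at_risk_days(
    index_adjudication=date(2003, 1, 1),
    eighteenth_birthdate=date(2005, 1, 1),
    placements=[
        (date(2003, 3, 1), date(2003, 9, 1), False),   # counted
        (date(2004, 6, 1), date(2004, 12, 1), True),   # follows the recidivating offense
    ],
)
print(example_days)
```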
Table 4 presents the rates of JSOs receiving secure facility placements and descriptive statistics for those placements. As observed in the table, non-recidivists had proportionally more secure facility placements after the index offense, and, on average, they were placed for longer durations, t(38.0) = 3.18, p < .05. The difference in placement patterns is even more pronounced when restricting the analysis to those non-recidivists scoring 7 or higher on the JSORRAT-II, t(70.8) = 4.06, p < .05. If the JSORRAT-II adequately predicts sexual recidivism, it is JSOs in that score range who are most likely to reoffend; yet the majority (68.3%) received secure facility placements that were significantly longer than those of recidivists. The impact of that differential secure facility placement pattern could be to reduce the opportunity to reoffend and, thus, to confound the predicted criterion. In other words, with greater opportunity, more of the youth with high scores might have committed additional sexual offenses.
Table 4. Percentage and Time Spent in Secure Facility Placement by Recidivism Status.
Logistic regression analysis was used to test the effect of time at risk on the predictive accuracy of the JSORRAT-II. In the first block, JSORRAT-II total scores and time at risk were entered simultaneously, followed by an interaction term in the second block. The first block significantly predicted recidivism status, χ2(2) = 21.12, p < .001, Hosmer and Lemeshow χ2(8) = 12.56, p > .05, and the coefficients for both time at risk (Wald χ2 = 7.97, p = .005) and total scores (Wald χ2 = 15.50, p < .001) were significant. When the interaction was added to the equation, the step was not significant, χ2(1) = 2.55, p > .05, but the model remained significant, χ2(3) = 23.65, p < .001, Hosmer and Lemeshow χ2(8) = 6.53, p > .05. In addition, none of the coefficients remained significant.
Time at risk depends, in part, on two factors. First, all else being equal, the length of time in a secure placement reduces the amount of time the JSO is a threat to the community. Second, the JSO’s age at placement matters: a 1-year placement for a young JSO (e.g., age 12) will have a different impact on time at risk than that same 1-year placement for an older JSO (e.g., age 17) when there is a set ending time (here, age 18) for the follow-up period. Given that, the time-at-risk variable used in the previous analysis might not be the most appropriate method to investigate the interaction between JSORRAT-II scores and the opportunity to recidivate.
To account for these problems, logistic regression analysis was used to investigate the interactions among age at index, time at risk, and JSORRAT-II scores. In that analysis, all three variables were entered into the first block of the equation. In the second block, the two-way interaction terms were entered, and in the third block, the three-way interaction term for age by time by scores was entered. The first block, χ2(3) = 27.68, p < .001, Hosmer and Lemeshow χ2(8) = 7.05, p > .05, and the third block, χ2(1) = 9.50, p = .002, Hosmer and Lemeshow χ2(8) = 9.28, p > .05, were significant, whereas the second block was marginal, χ2(2) = 5.83, p = .054, Hosmer and Lemeshow χ2(8) = 4.08, p > .05. The final model was also significant, χ2(6) = 43.01, p < .001, Hosmer and Lemeshow χ2(8) = 9.28, p > .05. All single-variable and interaction coefficients were significant in the final model, with the exception of the time-at-risk coefficient, which was marginal (Wald χ2 = 3.40, p = .065). This pattern of results indicates that the predictive validity of the JSORRAT-II in this sample depends, in part, on both the age of the JSO at the index offense and the time spent outside secure facility placements.
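The arithmetic behind the block statistics reported above can be checked with a short sketch. It relies on two standard facts: in a hierarchical (blockwise) logistic regression the block chi-square values and their degrees of freedom sum to those of the final model, and the upper-tail probability of a chi-square statistic has a simple closed form for 1 and 2 degrees of freedom. The helper name is hypothetical, and only those two cases are handled.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail p value of a chi-square statistic (closed forms, df = 1 or 2)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch handles only df = 1 or df = 2")


# (chi-square, df) for blocks 1, 2, and 3 as reported in the text.
blocks = [(27.68, 3), (5.83, 2), (9.50, 1)]

model_chi2 = sum(chi2 for chi2, _ in blocks)   # expected to match the final model
model_df = sum(df for _, df in blocks)

p_block2 = chi2_upper_tail(5.83, 2)
p_block3 = chi2_upper_tail(9.50, 1)

print(round(model_chi2, 2), model_df)
print(round(p_block2, 3), round(p_block3, 3))
```

The sums reproduce the final model statistic, χ2(6) = 43.01, and the computed p values round to the .054 and .002 quoted for the second and third blocks.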
To illustrate this effect, AUCs were calculated for three subsamples. In each, JSOs were required to be at risk for a minimum of 3 months. From there, the subsamples were distinguished by the age of the JSO at their index offense: 11 through 16 years old, 11 through 15 years old, and 11 through 14 years old. As observed in Table 5, AUC values improved after accounting for the three variables, with the highest accuracy observed with the subsample who were at risk for at least 3 months and who were aged 11 through 14 years old at the time of their index sexual offense. For that group, the AUC value was .78 (99% CI = [.66, .90], d = 1.08).
Table 5. Predictive Validity Indices by Age, Time at Risk, and Recidivism Status.
Note. AUC = area under the receiver operating characteristic curve.
Finally, predicted probabilities for individual JSORRAT-II scores were generated by regressing sexual recidivism on JSORRAT-II scores for these subsamples and then plotting those probabilities by JSORRAT-II score. Those probabilities are plotted in Figure 1 alongside the predicted probabilities for the full sample. As observed in that figure, accounting for time at risk and age at offense had the greatest impact on the predicted probabilities of those JSOs with the highest scores. Whereas the predicted probabilities of low scorers were nearly identical across subsamples, the probabilities diverge at higher score values. Together, these results seem to suggest that age and time at risk have their largest impact on those scoring in the higher ranges.

Figure 1. Predicted probability of sexual recidivism by JSORRAT-II scores and by age subsample with requirement of at least 3 months at risk outside a secure facility.
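For readers unfamiliar with how predicted probabilities are obtained from a logistic regression of recidivism on total score, the following sketch applies the logistic function to each possible score. The intercept and slope are hypothetical placeholders chosen only to produce a plausible curve; they are not the fitted coefficients from this study, which are not reported here.

```python
import math


def predicted_probability(score, intercept, slope):
    """Logistic-regression predicted probability for a given total score."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


# Hypothetical coefficients, for illustration only (not fitted to study data).
intercept, slope = -4.0, 0.3

# Scores 0 through 13, the range observed in the present sample.
curve = {score: predicted_probability(score, intercept, slope) for score in range(0, 14)}

for score, prob in curve.items():
    print(score, round(prob, 3))
```

Because the logistic function is nonlinear, a difference in slope between subsamples produces little visible change at low scores and a much larger divergence at high scores, which is the general shape of the pattern described for the figure.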
Discussion
The primary purpose of the present study was to assess the predictive validity of the JSORRAT-II with a new, exhaustive sample that was geographically and jurisdictionally different from the samples used to develop and initially cross-validate the tool. In this regard, the JSORRAT-II performed moderately well overall, with a statistically significant AUC of .70 and an associated moderate effect size (d = 0.70). Because this sample was geographically distinct, the results lend some support to the generalizability of the tool’s predictive validity outside Utah, where the JSORRAT-II was developed and initially cross-validated.
Although a significant AUC was found for the prediction of juvenile sexual recidivism, the JSORRAT-II total scores were not a significant predictor of non-sexual forms of recidivism. The tool did not significantly predict general recidivism (e.g., theft), and the AUC for predicting violent recidivism became non-significant when JSOs whose only violent recidivating offense was sexual in nature were removed from the analysis. Given this finding, which is similar to that found by Epperson and Ralston (2015) with their cross-validation sample in Utah, the JSORRAT-II has established good discriminant validity as a predictor of juvenile sexual recidivism.
The JSORRAT-II also performed better when predicting sexual recidivism for younger JSOs. In fact, the tool did not significantly predict recidivism status for JSOs who were aged 16 through 17 years old at the time of their index adjudication in this sample. It is possible that the JSORRAT-II does not do well for JSOs who are older. Yet, it is also possible that this latter finding is an artifact of the shorter follow-up period in this study, given that the final cutoff for the recidivism period was age 18. There is some support for this latter hypothesis, in that age and time at risk interacted with JSORRAT-II scores in the prediction of juvenile sexual recidivism. Furthermore, the biggest impact of age and time at risk appeared to be at the highest score ranges. Older JSOs scoring in the higher ranges of the tool who also received secure placements had limited or no time “at risk” to reoffend before age 18. Although this is positive from a community safety point of view, the reduced time at risk (i.e., reduced threat) confounds the risk criterion. Potentially, with a longer follow-up to account for longer time at risk, AUCs for those JSOs might reach significance. However, because the at-risk period for this study ended at age 18, future studies are needed to investigate that hypothesis.
As a whole, the JSORRAT-II performed moderately well in the present sample. As illustrated in Figure 1, the predicted probabilities for each score when using the full sample ranged from just greater than 0.0 for scores of 0 to just above 0.4 for a score of 13, the highest score observed with this sample. Greater separation in predicted probabilities was only observed after accounting for age and time at risk. Given this observation, the AUC for the full sample seems to be driven only in part by the tool’s ability to identify recidivists at higher score ranges. The significant AUCs are also explained by the tool’s ability to identify non-recidivists in the lower score ranges. For example, the distribution of scores in the present sample was positively skewed, with a mean JSORRAT-II score of 3.3 (SD = 2.3) for the entire sample and three quarters of the sample scoring between 0 and 4. That same 75% of the sample also recidivated below the full-sample base rate (6.4%). Specifically, 4.5% of those scoring 0 through 4 recidivated sexually, a pattern very similar to that observed with the JSORRAT-II development and Utah cross-validation samples (Epperson & Ralston, 2015; Epperson et al., 2006). Given this, it seems that a substantial part of the JSORRAT-II’s predictive validity in this sample is drawn from its ability to predict desistance, as opposed to persistence. Furthermore, the ability to identify a large proportion of JSOs with a relatively low rate of recidivism has important implications for decision making. The 75% who scored 4 or less on the JSORRAT-II may not require intervention beyond detection and a relatively brief psycho-educational program, and the resources saved with this group could be allocated to the other approximately 25% of the sample with a more significant risk of future sexual offending.
In addition, the pattern described above demonstrates the potential of the JSORRAT-II to identify low risk adolescents who should be segregated from higher risk adolescents and who might not need potentially stigmatizing and limiting labels derived from adult registration and notification policies. The net results of being able to identify desisters adequately might better allow for those JSOs to reintegrate prosocially into their communities and society.
Finally, the reliability of scoring the JSORRAT-II was quite high for the total score. This ICC value is also consistent with the ICC reported by Epperson and Ralston (2015), despite using a more conservative method for calculation. However, the inter-rater reliability for one item was more problematic. Item 6 focuses on the use of deception or grooming to gain access or compliance by the victim and, unlike all other items on the JSORRAT-II, there are fewer concrete behavioral markers for this item. Instead, for most JSOs, scorers are required to interpret the motivation behind sets of behaviors. Given that, the lower ICC makes some sense, particularly when there are incomplete descriptions of such behaviors in the case file. Despite some difficulty with this item, the remaining JSORRAT-II items and the total score were scored with a generally high degree of inter-rater reliability.
Strengths, Limitations, and Future Research
The present sample represented all male JSOs adjudicated for a sexual offense from 2000 to 2006 from an entire state. As such, all JSOs, regardless of severity of index sexual offense, prior offending history, prior treatment placement, or post-adjudication placement, were represented in the sample exactly as they were represented in the state of Iowa between those two dates. This type of sample has advantages over samples of convenience, such as those drawn from specific treatment programs or placement settings, because the use of exhaustive samples reduces the possibility of restricted range problems and allows for generalizability to the broader class of JSOs, as opposed to a narrower segment meeting the inclusion criteria for a given program or placement.
Yet, some concerns about generalizability still exist. In the present sample, 79.6% of JSOs were described in their case file material as non-Hispanic White. In contrast, the 2000 census reported that non-Hispanic Whites accounted for 69.1% of the U.S. population (U.S. Census Bureau, 2001). Also, this sample was composed entirely of JSOs from Iowa, and all were male. As a consequence, one cannot assume generalizability of the results from this sample to those in other geographic areas with different racial or ethnic compositions or to female JSOs in Iowa or elsewhere.
A second strength of the study pertains to the extensive training and oversight the research assistants received during the course of the entire study. Assistants received several hours of didactic training, paired practice with several real cases followed by discussion and additional training, and weekly meetings to discuss scoring issues and to prevent coder drift. The thorough training and oversight resulted in a high degree of inter-rater reliability observed across 50 reliability cases. That level of training and oversight is not often, if ever, found in real-world situations, and consequently, the level of inter-rater reliability observed in actual practice is likely to be lower. However, Epperson and Ralston (2006) reported results from an unpublished study of seven state evaluators who performed assessments in Utah following a one-day training workshop. After receiving the training, they all scored the same 17 cases, and the singular ICC for absolute agreement was .91. Given that the JSORRAT-II is relatively easy to score with appropriate training and given the results from the unpublished study, the possibility of high “real-world” inter-rater reliabilities exists.
As mentioned previously, risk assessments focus on static risk indicators, dynamic risk factors, or some combination of the two. The JSORRAT-II focuses exclusively on static risk indicators, and tools like the JSORRAT-II cannot track when risk has been reduced, such as through successful completion of treatment. In this sample, a substantial number of JSOs received treatment as a part of their sentence for their index sexual offense. However, the outcome of that treatment could not be ascertained with the methods of data collection used in this study. Thus, the present study could not address the interaction between initial predicted probability of recidivism, given a JSO’s JSORRAT-II score, and subsequent successful treatment completion.
Analyses revealed interesting interactions between age at index offense, time at risk, and JSORRAT-II total scores in predicting juvenile sexual recidivism. Overall, results indicated that the JSORRAT-II predicted juvenile sexual recidivism better for younger JSOs with longer periods at risk. Necessarily, these analyses were based on smaller subsamples, so it will be important to replicate these findings with other samples.
The final limitation is not unique to this study. Sexual offenses, in general, are underreported, and the results from recidivism studies are often underestimates of the true rates of recidivism because of the nature of these types of offenses (Hanson & Bussiere, 1998). Because of this, the results of this study must be interpreted knowing that not all first-time sexual offenders were detected by justice authorities initially and not all recidivists were detected after entering the justice system for their index sexual offense.
Given the strengths and limitations of this study, several future studies seem warranted. First, the JSORRAT-II has now been cross-validated successfully in two different states. However, to assess the generalizability of that predictive validity, studies must be conducted in other states that have different geographic locations, judicial practices concerning JSOs, and racial and ethnic compositions.
Second, the predictive validity of the JSORRAT-II was assessed only for juvenile sexual recidivism. Epperson and colleagues (2006), using the JSORRAT-II development sample, found that the tool did less well in predicting adult sexual recidivism. However, those results need replication before concluding that the JSORRAT-II is or is not appropriate for such types of longer term decisions. It is possible to follow the JSOs in this study for longer times at risk, into adulthood. Consequently, we will seek to assess the performance of the JSORRAT-II for longer term predictions of risk in a future study. Although we are not optimistic about the ability of the tool to make accurate longer term predictions, this is an empirical question.
Third, and relatedly, we advanced a hypothesis that the JSORRAT-II did not perform well with older JSOs because of their limited time at risk, relative to younger JSOs. Future studies with longer follow-ups should be able to test that hypothesis. Such studies might also assess the predictive validity for different times at risk (e.g., 2 years, 5 years, 10 years) to determine the “shelf-life” of predictions made by the JSORRAT-II.
Fourth, Letourneau, Armstrong, Bandyopadhyay, and Sinha (2012; Letourneau, Bandyopadhyay, Sinha, & Armstrong, 2009) found initial evidence for the impact of the application of adult sexual offender laws and policies to JSOs on prosecutorial decision making. Specifically, they found that after the introduction of sexual offender registration and notification laws in South Carolina, prosecutors were less likely to move forward with charges for sexual offenses committed by juveniles, and for those cases that did move forward, there was an increased probability of a guilty finding. If that is the case, more recent samples might also differ from earlier samples in the proportion of JSOs who have prior non-adjudicated sexual offense histories, some of which might be documented through official sources. The JSORRAT-II does not include information from documented but uncharged sexual offenses in the scoring of the first six, offense-related items. Consequently, scores might be deflated in newer samples by excluding the information from those offenses that in earlier times would have resulted in an official charge. Researchers should investigate the impact of including information from documented but uncharged sexual offenses in the scoring of the first six items and the impact of those scoring changes on the predictive validity of the JSORRAT-II.
Conclusions and Recommendations
The JSORRAT-II has now been successfully cross-validated in two states using large, exhaustive samples. For those states (Utah and Iowa), the tool, along with a psychological and needs assessment, can productively inform a range of decisions such as placement, programming, and treatment decisions. Its use to inform similar decisions outside of states where it has been validated should be considered experimental at this time. Additional, planned studies will help clarify the predictive accuracy of the JSORRAT-II in other jurisdictions and potentially expand its usefulness. All uses of the JSORRAT-II, experimental or otherwise, require that the tool be scored accurately and reliably. High levels of reliability were achieved through the procedures used in our laboratory. However, fieldworkers also demonstrated high reliability following a daylong scoring workshop.
The JSORRAT-II has not been validated as a predictor of adult behavior, so all risk assessments based on the JSORRAT-II necessarily expire no later than the JSO’s 18th birthdate. Accordingly, it would not be appropriate or informative to use JSORRAT-II scores to justify the longer term consequences of some contemporary sexual offender laws. Contrarily, the aggregated research data in the three state-wide samples seem to support arguments for exempting juveniles, or at least the vast majority of juveniles, from laws with longer term consequences.
Footnotes
Acknowledgements
This research was completed in collaboration with the Iowa Juvenile Court and the Department of Human Rights, Division of Criminal and Juvenile Justice Planning. Special thanks to Gary Niles, Tom Southard, and Laura Roeder-Grubb for coordinating the identification and transportation of case file information and the collection of recidivism data.
Authors’ Note
Douglas L. Epperson and Christopher A. Ralston developed the Juvenile Sexual Offense Recidivism Risk Assessment Tool–II (JSORRAT-II), the tool examined in this study. The JSORRAT-II is in the public domain and use is free. Douglas L. Epperson is now at the College of Liberal Arts, California Polytechnic University, San Luis Obispo.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by funding from the Iowa Department of Human Services.
