Abstract
This study builds on a long-standing debate focusing on whether structured professional judgment (SPJ) or empirically based methods of risk estimation are more valid and reliable measures of future behavior by comparing three different measures of risk. Data were collected from the Structured Assessment of Violence Risk in Youth administered to a sample of 177 adjudicated juvenile offenders prior to being placed on probation. Three measures of risk were examined: an empirically derived measure of risk using latent class analysis, a violence risk based on SPJ, and a nonviolent delinquency risk based on SPJ. The ability of each measure to predict probation-related outcomes and recidivism was also addressed. Results provide moderate support for the continued use of the SPJ framework and highlight the need for future research regarding risk assessment procedures in juvenile justice settings.
Juvenile justice-focused risk assessment instruments have undergone significant evolution and, today, a number of standardized screening and assessment instruments have been developed (Catchpole & Gretton, 2003; Otto & Douglas, 2010; Welsh, Schmidt, McKinnon, Chattha, & Meyers, 2008). Based on Andrews, Bonta, and Hoge’s (1990) principle of risk-needs-responsivity, risk assessment instruments are currently used in the juvenile justice system to identify who needs treatment (risk), what needs should be targeted (needs), and what treatment strategies should be employed (responsivity). The ultimate goal of these instruments is to assist juvenile justice decision makers in identifying which youth may be a threat to public safety and which youth are most likely to benefit from intervention (Vincent, Guy, Fusco, & Gershenson, 2011).
Many dispositional decisions and intervention plans are based either directly or indirectly on information gathered through the risk assessment process. Therefore, understanding the most effective method of risk assessment is critical to ensuring that juvenile justice agencies are meeting the risk-needs-responsivity principles (Andrews & Bonta, 2010). Given the potential iatrogenic effects of juvenile justice intervention noted in previous studies (Gatti, Tremblay, & Vitaro, 2009; Lowenkamp & Latessa, 2005; Petrosino, Turpin-Petrosino, & Buehler, 2003), overestimation of risk could lead to negative outcomes for low-risk offenders who are mandated to participate in treatment programs that are not necessary. On the other hand, underestimation could result in potential threats to community safety and increased societal costs due to future crime and violence that could have been prevented. Thus, accurately estimating risk is a key element to accomplishing the overall goals of the juvenile justice system—rehabilitation of juvenile offenders and protecting public safety.
A long-standing debate exists regarding the most effective method for assessing risk among offenders involved in the criminal justice system (Ægisdóttir et al., 2006; Andrews, Bonta, & Wormith, 2006). This debate focuses on whether the structured professional judgment (SPJ) framework, which allows practitioners to estimate risk using their judgment based on relevant information gathered during administration of the assessment instrument, or empirically based techniques, which rely on a numerical formula, is a more reliable and valid method for predicting future behavior. This study builds on this debate by comparing SPJ estimations of risk to empirically derived estimations using the Structured Assessment of Violence Risk in Youth (SAVRY; Borum, Bartel, & Forth, 2003).
Risk Assessment Approaches
In the 1980s and early 1990s, a juvenile’s potential for violence was often based solely on unstructured professional judgment where dangerousness was perceived as a dichotomous construct that was either present or absent within a given individual (Borum, 2000). These “assessments” tended to be highly subjective and unreliable (Pedersen, Rasmussen, & Elsass, 2010) and, as a result, new procedures were developed to assist professionals in the assessment of risk for reoffending.
In an effort to eliminate the subjective nature of unstructured clinical judgment, actuarial assessments were developed to predict future risk based on an empirically derived risk score. Traditionally, empirically based assessments mainly consisted of static risk factors that viewed “dangerousness” as a stable dispositional construct. A great deal of research has focused on determining whether clinical judgment or an actuarial formula is a more accurate predictor of antisocial behavior. In general, empirically based measures have been shown to outperform unstructured clinical judgment among samples of adult offenders (Ægisdóttir et al., 2006; Grove, Zald, Lebow, Snitz, & Nelson, 2000). However, a number of criticisms of actuarial assessments have been noted in the literature. These criticisms include (a) an inability to assist in case-specific intervention decisions (i.e., needs assessment and responsivity strategies; Shlonsky & Wagner, 2005), (b) insensitivity to a changing environment and protective factors (Borum, 2000; Litwack, 2001), and (c) a focus on aggregation (i.e., behavior of a group) and not on the unique characteristics of each individual offender (Ansbro, 2010; Wandall, 2006). Given the array of risk and protective factors related to criminal offending in adolescence, these criticisms seem particularly relevant to actuarial assessment strategies used in the juvenile justice system. Thus, empirically based methods may not be able to effectively capture the constellation of factors that determine “risk” and “need” among juvenile offenders.
Based on these criticisms, risk assessment instruments based on SPJ have been developed in recent years for use in the juvenile justice system (Borum, 2003; Vincent, 2006). These assessments of risk try to blend the advantages of both the actuarial and clinical judgment approaches. Specifically, SPJ ratings are based on defined guidelines that assist the professional in making the final judgment of risk on the basis of static and dynamic risk factors that have been shown to be empirically related to recidivism, their relevance to the individual case, and the intervention program (Pedersen et al., 2010). Thus, SPJ methods attempt to take into account the interaction between factors that increase and those that decrease the likelihood of reoffending during adolescence.
In general, meta-analytic studies have suggested that actuarial measures and SPJs of risk demonstrate similar, yet moderate, levels of accuracy in predicting future behavior (Guy, 2008; Yang, Wong, & Coid, 2010). In regard to juvenile offending, the SPJ framework has been shown to be a valuable tool when estimating future behavior (Borum, Lodewijks, Bartel, & Forth, 2010; Pedersen et al., 2010), particularly when it comes to predicting violence (Hoge, 2002; Lodewijks, Doreleijers, & de Ruiter, 2008). The ability to capture both “risk” and “need” and to assist in case management planning by accounting for the constellation of risk and protective factors that is unique to each youth’s developmental trajectory is also considered a benefit of the SPJ framework. As a result, this framework seems to have great utility in estimating risk for adolescents (Schwalbe, 2008; Vincent, Chapman, & Cook, 2011). According to Pedersen, Rasmussen, and Elsass (2010, p. 75), the SPJ framework “… is today considered good practice in the aim of identifying, treating, and managing violence risk.”
However, one important aspect of the SPJ framework that is not well understood is how practitioners use the information gathered during the assessment process to estimate risk (Schwalbe, 2008). Although this framework is based on structured guidelines meant to assist the administrator in making an accurate judgment, research has documented that human service professionals do not always choose to use assessment tools as intended (Krysik & LeCroy, 2002; Lyle & Graham, 2000). At the same time, research suggests considerable variability in decisions made by juvenile justice agents, including bias toward minority youth (Leiber, Bishop, & Chamlin, 2011; Pope & Leiber, 2005). Of particular concern to the current study is how discretion extends to the ways in which information about risk and protective factors is collected and used to form judgments about the probability of reoffending. Thus, what remains unclear is how practitioners integrate subjective impressions with information gathered based on item responses.
Therefore, the overall goal of the current study is to compare estimations of risk based on the SPJ framework with estimations of risk based on a statistical analysis of item responses. Recent advances in mixture modeling techniques, particularly latent class analysis, allow for the estimation of “classes” or categories of risk based on observed response patterns in the data. This technique is based on the assumption that patterns in the data can be accounted for by underlying groupings, or latent classes, of individuals (McCutcheon, 1987; Muthén, 2002). From a practical standpoint, latent class analysis could provide jurisdictions with a tool for identifying different categories of risk, based on qualitative (i.e., relevant risk factors) or quantitative differences (i.e., level of risk present), using the individual SAVRY items. The different “risk” categories may then be differentially linked to the available intervention services and levels of risk for reoffending. These preidentified categories could help probation officers assign individual youth into a particular “risk group” based on his or her item responses.
The utilization of item responses and mixture modeling techniques permits the comparison of empirically derived risk estimates to SPJ ratings to assess the variability in assigned risk level and the accuracy of these different procedures in predicting future behavior. Thus, comparing risk estimates based on latent class analysis to estimates based on SPJ will provide valuable information regarding the degree to which relying on probation officers’ judgment, instead of empirical formulas, influences an individual’s assigned risk of reoffending and which methodology provides a more accurate estimation of future behavior.
The SAVRY
Risk ratings in the current study were obtained using the SAVRY. The SAVRY was developed for use by professionals who conduct assessments or make intervention and supervision decisions concerning youth (Borum et al., 2003). The SAVRY includes items measuring historical, social/contextual, and individual-level risk factors as well as protective factors that have been found to be empirically related to violence and delinquency (see Appendix for a list of SAVRY items). The administrator of the instrument considers each individual item, the applicability of each item to the specific adolescent’s circumstances, and then estimates a final summary risk rating (SRR; Borum et al., 2003). The reliability of the SAVRY’s method for assessing risk has been supported by a number of empirical studies (Catchpole & Gretton, 2003; Lodewijks, de Ruiter, & Doreleijers, 2010).
The SAVRY SRR has also been found to have strong predictive validity for general and violent recidivism (Penney, Lee, & Moretti, 2010; Vincent, Guy, Fusco, et al., 2011; Welsh et al., 2008). Borum, Lodewijks, Bartel, and Forth (2010) reviewed 15 empirical studies testing the predictive validity of the SAVRY for violent offending. Area under the curve (AUC) estimates averaged .74–.80 across the studies. However, most studies examining the predictive validity of the SAVRY have relied on samples of incarcerated adolescents, which typically consist of serious offenders who are either chronic or violent offenders. At the same time, the use of the SAVRY by probation personnel when making decisions about supervision level, treatment referrals, and developing a case management plan has increased in recent years. Yet, adolescents placed on probation are typically adjudicated for less serious or nonviolent offenses when incarceration is not considered necessary for rehabilitation or community safety. Thus, one shortcoming of the existing body of research supporting the predictive validity of the SAVRY is a lack of studies based on adjudicated juvenile offenders placed on probation. This study overcomes this limitation by examining the predictive validity of the SAVRY among a sample of juvenile offenders placed on probation.
Additionally, the ability to predict reoffending is only one aspect of the predictive validity of the SAVRY. Less is known about its predictive validity for youth’s outcomes while they are under juvenile justice system supervision. Due to an increased focus on identifying delinquency risk and need early in the juvenile justice process, the SAVRY also serves as a critical tool at postadjudication, predisposition to assist juvenile probation officers in making decisions regarding probation supervision level, therapeutic referral needs, and an appropriate length of time on probation. As a result, it is important to understand how well the SAVRY performs at predicting juvenile justice system outcomes, including the ultimate outcome of probation and the length of time the youth was involved in the system. Therefore, an additional goal of the current study is to compare the differences in system-related outcomes, as well as recidivism, across an empirically derived risk estimate and estimations based on SPJ (i.e., SRRs) to identify which risk estimation procedure is a more accurate estimation of behavior.
Current Study
The current study was conducted as part of a larger reform effort, funded by the John D. and Catherine T. MacArthur Foundation, to enhance evidence-based assessment practices in the juvenile justice system in Louisiana. Local probation officers in one parish (i.e., county) in Louisiana were trained to administer the SAVRY during postadjudication, predisposition planning. As part of the standard training, administrators of the SAVRY rated all items on the SAVRY, and, using the structured professional judgment framework, determined an overall violence risk rating. However, during training, some users of the instrument expressed a need for two SRRs: nonviolent delinquency SRR and violence SRR. That is, probation officers were concerned that many of the factors related to probation and intervention outcomes were not related to aggressive and violent behavior. Although research tends to demonstrate that adolescents who are violent are also more likely to engage in nonviolent delinquent behavior (Farrington, 1998; Loeber & Farrington, 1998), this is not always the case, especially for youth who are involved in the juvenile justice system for nonserious or nonviolent offenses (Bartel, 2008). Thus, a second SRR was developed specifically for this project: the nonviolent delinquency SRR (Bartel, 2008). Therefore, the current study compares three risk estimations—a newly developed SRR focusing on nonviolent behaviors, the standard violence SRR used in previous studies, and an empirically derived risk measure based on statistical analyses of responses to SAVRY items.
This study adds to the current body of literature on risk assessment in the juvenile justice system in two important ways. First, this study examines consistency across different risk estimation procedures using information gathered from the SAVRY items. Specifically, we assess consistency (or disparity) across individuals’ assigned ratings of risk for future violence based on SPJ, risk for future nonviolent delinquency based on SPJ, and an empirically derived measure of risk based on item responses. Second, we examine differences in system-related outcomes and recidivism across the different estimation approaches. Overall, this study adds to the literature by providing important information regarding the accuracy of SPJs in estimating the risk of future behavior.
Method
Participants
The purpose of data collection was to evaluate the implementation of the SAVRY in one parish’s local probation department located in southeastern Louisiana. 1 The final sample includes 177 youth who were administered a postadjudication, predisposition SAVRY and were released from probation in June 2009 through September 2010. Table 1 describes the characteristics of the full sample. Roughly three fourths of the sample was male and 72% was Black. The average age of the sample was 16 (SD = 1.4). Thirty-six percent of the sample was on probation for a misdemeanor, 32% for a felony, and 32% for a status offense.
Table 1. Description of Sample (n = 177).
Note. SRR = summary risk rating; SAVRY = Structured Assessment of Violence Risk in Youth.
a. There was one case with missing offense information.
Procedures
The data collection process occurred over a 6-month period from September 2010 through February 2011. Information was collected from multiple sources. First, as a standard procedure of the juvenile probation department, all probation officers are required to fill out a two-page data form when a youth is released from probation. This form tracks information pertaining to each probation case including demographic information, adjudicated offenses, SAVRY results, and probation outcome information. On a routine basis, these forms are entered into a database for administrative purposes. Next, a systematic coding process was developed for review of the probation paper files. The goal of this phase of data collection was to obtain information that was missing from the two-page data form. The coders collected information on youth’s adjudicated offense/offenses, ratings for each individual SAVRY item and SRRs, and adjudicated offense information noted in the court documents. Last, probation department personnel were granted access to the local sheriff’s office database to collect data on all new arrests within 6 months following release from supervision.
Measures
SAVRY
As described above, the SAVRY was designed for use as a guide to assessing risk of future violence and delinquency and to aid in probation case management and intervention planning for adolescents aged 12–18 (Borum et al., 2003). The SAVRY is administered by trained probation officers. 2 The full assessment consists of 24 items measuring risk factors across three domains including historical risk factors (10 items), social/contextual risk factors (6 items), individual risk factors (8 items), and 6 additional items measuring protective factors (see Appendix for a list of SAVRY items). Responses to each of the historical, individual, and social/contextual items are rated as low (= 0), moderate (= 1), or high (= 2) and protective items are rated as present (= 1) or absent (= 0). Items within each domain were summed to create four indices representing each of the SAVRY domains: historical risk domain, individual risk domain, social/contextual risk domain, and protective factors (Cronbach’s α ranged from .64 to .82, bivariate correlations ranged from .52 to .76). These four indices were saved and used to estimate the empirically derived measure of risk.
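The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the item ratings are hypothetical, the function names `domain_index` and `cronbach_alpha` are introduced here for illustration, and the alpha formula is the standard textbook one, which is assumed to match what the authors computed.

```python
from statistics import pvariance

# Hypothetical ratings for three youth on the 10 historical items.
# Risk items are coded 0 (low), 1 (moderate), or 2 (high), as in the text.
historical_items = [
    [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    [1, 2, 1, 1, 2, 1, 0, 1, 2, 1],
    [2, 2, 2, 1, 2, 2, 1, 2, 2, 2],
]


def domain_index(item_rows):
    """Sum each youth's item ratings into a single domain score."""
    return [sum(row) for row in item_rows]


def cronbach_alpha(item_rows):
    """Standard Cronbach's alpha: k/(k-1) * (1 - sum(item var) / var(total))."""
    k = len(item_rows[0])
    columns = list(zip(*item_rows))  # one tuple of ratings per item
    item_variance_sum = sum(pvariance(col) for col in columns)
    total_variance = pvariance(domain_index(item_rows))
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


print(domain_index(historical_items))
print(round(cronbach_alpha(historical_items), 2))
```

The same two functions would apply unchanged to the social/contextual, individual, and protective domains, with protective items coded 0 or 1.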
Probation officers’ judgment of risk for violence (i.e., violence SRR) and nonviolent delinquency (i.e., nonviolent delinquency SRR) was determined by the probation officer’s professional judgment based on responses to each of the 30 items included in the SAVRY. Violence is defined by the SAVRY authors as “an act of battery or physical violence that is sufficiently severe to cause injury to another person or persons, regardless of whether injury actually occurs; any forcible act of sexual assault; or a threat made with a weapon” (Borum et al., 2003, p. 15). General nonviolent delinquency is defined as “nonviolent criminal behavior which may include theft, breaking and entering, auto theft, mischief, vandalism, drug trafficking, or fraud” (Bartel, 2008). 3 Each SRR has three categories: low risk, moderate risk, and high risk. Of the 177 participants, 45% were rated low risk for nonviolent delinquency, 23% were rated moderate risk for nonviolent delinquency, and 32% were rated high risk for nonviolent delinquency. For the violence SRR, 34% of the sample was rated low risk, 35% was rated moderate risk, and 31% was rated high risk. The correlation between the two SRRs was .79.
Probation Outcomes
Due to the low number of youth in several categories, the different reasons for probation release were collapsed into two general release types: probation completion and revocation (i.e., revocation/incarceration, transferred to adult court). Length of probation term represents the number of months the youth was actually on probation and was calculated by the number of months between the probation start date and probation end date recorded on the two-page data form. As described in Table 1, 69% of the youth in the sample successfully completed probation and 31% were released from probation unsuccessfully. 4 The average length of time on probation for participants was 15 months (SD = 10.9).
Recidivism
Recidivism was broadly defined as at least one new arrest within 6 months following release from supervision. Arrest information was based on officially recorded arrests entered into the local sheriff’s database. The 6-month period included any arrest for a delinquent offense, status offense, and/or arrests processed by the adult criminal justice system. Six-month re-arrest
Data Analysis
The analyses proceeded in several steps. The first step was the specification of a series of latent class models using the four SAVRY indices—historical risk domain, individual risk domain, contextual/social risk domain, and the protective factors. Analyses were conducted using Mplus 6.0 (Muthén & Muthén, 2010). Latent class analysis extracts latent “classes” or categories based on identified patterns in observed items. A number of measures that assess the overall fit and classification quality are used in this process. Lower values on the Bayesian Information Criterion (BIC), which is based on the log-likelihood value of the fitted model, suggest a better fitting model (Nylund, Asparouhov, & Muthén, 2007). The Lo–Mendell–Rubin (LMR) and bootstrapped likelihood ratio (BLR) tests compare a “k” class model to a “k − 1” class model (e.g., four classes vs. three). Lower observed probability values support the model with an additional class (Lo, Mendell, & Rubin, 2001; Nylund et al., 2007). Entropy, which measures the quality of classification based on observed responses, ranges from “0” to “1,” with values closer to “1” suggesting strong classification (Vermunt & Magidson, 2003). The classification table based on class probabilities for the most likely class membership is also an important indicator of the appropriate number of classes. High diagonal values and low off-diagonal values indicate good classification. Finally, the substantive meaning of the characteristics of the latent classes should also be considered when making a decision on the best-fitting model.
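Two of the fit indices named above can be illustrated with a short sketch. The BIC formula is standard. The entropy function uses the normalized (relative) entropy commonly reported by mixture-modeling software; that it matches the exact statistic produced by the authors' software is an assumption. The function names and the example posterior probabilities are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC = -2 * logL + p * ln(n); lower values suggest a better-fitting model."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def relative_entropy(posteriors):
    """Normalized entropy: 1 - sum(-p * ln p) / (n * ln K).

    `posteriors` holds one row per person and one posterior class
    probability per latent class. Values near 1 indicate clear classification.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = -sum(
        p * math.log(p) for row in posteriors for p in row if p > 0
    )
    return 1.0 - uncertainty / (n * math.log(k))


# Hypothetical posterior probabilities for four youth in a three-class model.
example_posteriors = [
    [0.95, 0.04, 0.01],
    [0.10, 0.85, 0.05],
    [0.02, 0.08, 0.90],
    [0.60, 0.35, 0.05],
]
print(round(relative_entropy(example_posteriors), 2))
```

Comparing `bic` values across models with two to five classes, alongside entropy, mirrors the selection logic described in the text; the likelihood ratio tests require the fitted models themselves and are not sketched here.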
To accomplish the second and third objectives of this study, the posterior probabilities of most likely class membership and a variable that contains the most likely class membership were saved for use throughout the rest of the analytic process. Differences across individuals’ risk level based on most likely latent class membership and the nonviolent delinquency and violence SRRs were assessed using bivariate analyses. Differences in reason for probation release, average length of time on probation, and 6-month re-arrest rates were also measured using χ2 and mean comparison bivariate analyses. Receiver–operating characteristic curves were used to estimate the accuracy of the different risk measures for predicting reason for probation release and 6-month re-arrest rates. In particular, AUC values represent the probability that a youth who is randomly selected from the group of youth who exhibited a negative outcome (e.g., revocation) will have a higher risk level than a randomly selected youth from the group who did not exhibit the poor outcome (e.g., completed probation; Vincent, Guy, Gershenson, & McCabe, 2012). AUC values range from 0 (perfect negative prediction) to 1.0 (perfect positive prediction; Douglas, Yeomans, & Boer, 2005). A value of .50 indicates prediction no better than chance.
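The probabilistic reading of the AUC given above can be computed directly by comparing every youth with a poor outcome against every youth with a good outcome. The sketch below assumes risk levels coded 0 (low), 1 (moderate), and 2 (high) and counts tied pairs as one half, a common convention for ordinal predictors; the data and the function name `auc` are hypothetical.

```python
def auc(risk_poor_outcome, risk_good_outcome):
    """Probability that a randomly chosen youth with a poor outcome has a
    higher risk level than a randomly chosen youth with a good outcome.
    Tied pairs contribute one half."""
    wins = 0.0
    for poor in risk_poor_outcome:
        for good in risk_good_outcome:
            if poor > good:
                wins += 1.0
            elif poor == good:
                wins += 0.5
    return wins / (len(risk_poor_outcome) * len(risk_good_outcome))


# Hypothetical risk levels (0 = low, 1 = moderate, 2 = high).
revoked = [2, 2, 1, 2, 1]
completed = [0, 0, 1, 0, 2, 0, 1]
print(round(auc(revoked, completed), 2))
```

This pairwise count is equivalent to the area under the receiver operating characteristic curve, which is why a three-level rating can still yield an AUC.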
Results
Latent Class Analyses
The initial process involved the specification of latent class models that ranged from two to five classes (see Table 2). Both the three- and four-class models had comparatively low BIC values and acceptable results for the LMR and BLR tests. The three-class model showed improvement in the quality of classification through a higher entropy value and average latent class probabilities for most likely class membership. Given the similarities in “categories” of risk outlined by the SAVRY’s SPJ framework and the three-class model, the substantive meaning of the three-class model compared with the four-class model was also considered. The three-class model was selected as the best-fitting model. The internal structure of the three-class model is presented in Table 3.
Table 2. Model Summary From Latent Class Analyses.
Note. BIC = Bayesian Information Criterion; LMR = Lo–Mendell–Rubin; BLRT = bootstrapped likelihood ratio test. The values in bold represent the parameters of the model that was selected as the best fit.
Table 3. Three-Class Model From Latent Class Analyses.
*p < .01. **p < .001.
The first class makes up 55% of the overall sample. Relative to the other groups, youth in this group had considerably lower scores on the historical, individual, and contextual domains and the highest levels of protective factors. Thus, this class represents a “low-risk” group. The means for this group also revealed lower average levels of historical, individual, and contextual risk factors and a higher average number of protective factors, compared to the sample as a whole. The second class, which comprises 27% of the sample, represents a “moderate-risk” group in terms of the average scores across the four SAVRY domains. The third class identified in the latent class analysis captured 18% of the sample. Relative to the others, this class is a “high-risk” group. These youth revealed the highest average levels of historical, individual, and contextual risk factors and the lowest average number of protective factors. As can be seen, the best-fitting model identified a model of risk that is similar, both quantitatively and qualitatively, to the prespecified categories outlined by the authors of the SAVRY for use during SPJ estimation of risk.
Level of Correspondence Across Methods for Determining Risk
Bivariate correlations revealed a correlation of .65 for the violence SRR and latent class risk classification and .72 for the nonviolent delinquency SRR and the latent class risk classification. Therefore, the next step in the analysis involved comparing the three-class model to the violence and nonviolent delinquency SRR to assess consistency (or disparity) in the different types of risk estimation across individuals included in the sample. Results are presented in Table 4. The association among the latent class categories and both the SRR for violence risk, χ2(4) = 84.75, p < .001, γ = .84, and the SRR for nonviolent delinquency risk, χ2(4) = 105.61, p < .001, γ = .89, was significant. Overall, the SPJ and latent class frameworks showed consistency in estimating risk level, particularly for lower risk youth. However, important differences in ratings of moderate versus high risk were also revealed. For example, 61% of the youth rated moderate risk for violence using the violence SRR were grouped into Class 1 (low risk); and 32% of youth rated moderate risk for violence were grouped into Class 2 (moderate risk) of the LCA model. Of the youth rated high risk for violence, just under half were grouped into Class 3 (high risk). Similar results were found across the nonviolent delinquency SRR and latent classes. Of youth rated moderate risk for nonviolent delinquency with the SRR, over half were grouped into Class 1 (low risk) and 44% were grouped into Class 2 (moderate risk). Fifty-two percent of the youth rated high risk for nonviolent delinquency were grouped into Class 3 (high risk), while 38% were grouped into Class 2 (moderate risk) and 11% were grouped into Class 1 (low risk). In general, ratings based on the SPJ framework were more likely to result in a higher risk rating compared to ratings based on the empirically derived measure of risk.
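The two association statistics reported above, the Pearson χ2 and Goodman and Kruskal's γ, can both be computed from a cross-tabulation of SRR level by latent class. The sketch below uses a hypothetical 3 × 3 table of counts, not the study's data, and the function names are introduced here for illustration.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D), from concordant and discordant pairs
    in a table whose rows and columns are both ordered low to high."""
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):  # only rows below cell (i, j)
                for m in range(n_cols):
                    if m > j:
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts: rows = SRR (low, moderate, high),
# columns = latent class (low, moderate, high).
counts = [
    [20, 3, 0],
    [10, 8, 2],
    [2, 7, 9],
]
stat, df = chi_square(counts)
print(df, round(goodman_kruskal_gamma(counts), 2))
```

A 3 × 3 table yields 4 degrees of freedom, matching the χ2(4) tests reported for both SRRs.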
Table 4. Bivariate Associations Among SAVRY SRRs and Three-Class Model (n = 177).
Note. SAVRY = Structured Assessment of Violence Risk in Youth; SPJ = structured professional judgment. SRR refers to the SAVRY summary risk rating, which is based on the SPJ framework. Row percentages are reported above. For example, of the youth rated low risk for violence based on the SPJ framework, 88.3% were placed in the low-risk class and 11.7% were placed in the moderate-risk class.
Comparative Predictive Validity of Risk Assessment Methods
The disparity found across individuals’ assigned risk score suggests that important differences in the utility of these methodological approaches may also exist. Therefore, the final step in the analysis focused on examining the relationship between the three measures of risk and system-related outcomes including reason for probation release, length of time on probation, and re-arrest within 6 months following release from supervision. All three measures of risk revealed significant differences across probation release reason in the expected direction (see Table 5). Low-risk youth were significantly more likely to complete probation successfully and high-risk youth were significantly more likely to unsuccessfully complete probation (i.e., revocation). However, the nonviolent delinquency SRR was the only risk estimate that revealed a statistically significant relationship to all three outcome measures. The average length of time on probation revealed a significant and positive, linear relationship with nonviolent delinquency SRR, F(2) = 3.18, p < .05, η2 = .04, and violence SRR, F(2) = 4.27, p < .05, η2 = .05. In addition, there was a significant association between nonviolent delinquency SRR and 6-month re-arrest, χ2 = 8.01, p < .05.
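The η2 effect sizes reported above express the share of variance in months on probation that is accounted for by risk level. A minimal sketch of that calculation follows, using hypothetical months-on-probation values and a hypothetical function name; it illustrates the standard one-way formula (between-group sum of squares divided by total sum of squares) and is not a reanalysis of the study data.

```python
from statistics import mean


def eta_squared(groups):
    """Eta squared = SS_between / SS_total for a one-way comparison of means."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    ss_total = sum((value - grand_mean) ** 2 for value in all_values)
    ss_between = sum(
        len(group) * (mean(group) - grand_mean) ** 2 for group in groups
    )
    return ss_between / ss_total


# Hypothetical months on probation for low-, moderate-, and high-risk youth.
months_by_risk = [
    [6, 9, 12],
    [12, 15, 18],
    [18, 21, 24],
]
print(eta_squared(months_by_risk))
```

Values of .04 and .05, as reported in the text, would indicate that risk level accounts for a small share of the variation in time on probation.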
Table 5. Bivariate Association Among SAVRY Risk Levels and Probation Outcomes.
Note. SAVRY = Structured Assessment of Violence Risk in Youth; SD = standard deviation; SRR = summary risk rating. The total sample (n = 177) is included in the months of probation comparison.
a. Four cases were missing probation release reason (n = 173).
b. Fifteen cases were missing arrest information (n = 162).
AUC estimates for reason for probation release were as follows: latent class model AUC = .73; violence SRR AUC = .79; and nonviolent delinquency SRR AUC = .79. For 6-month re-arrest, the AUC values were somewhat lower: .57 for the latent class model, .58 for the violence SRR, and .62 for the nonviolent delinquency SRR. Taken together, these results suggest that the nonviolent delinquency SRR may be a slightly more useful measure of system-related outcomes for youth placed on probation. However, only minor differences in the proportion of youth who had their probation revoked and the proportion of youth who recidivated were revealed across the three risk estimates (see Table 5). At the same time, the AUC values for reason for probation release and re-arrest were similar and suggested that all three risk measures used in the current study revealed moderate levels of predictive ability.
Discussion
The goal of this study was to compare an empirically derived measure of risk to ratings based on SPJs using information obtained from the SAVRY. Differences across individuals’ assigned risk levels as well as the association among each of the risk estimates and probation outcomes and recidivism were examined. Latent class analysis identified a three-class model representing low, moderate, and high risk, which was used as the empirically derived estimation of risk. In general, a moderate level of consistency across the risk estimation procedures was revealed. The majority of youth rated low risk for violence or nonviolent delinquency based on SPJs were also grouped into the “low-risk” latent class (∼88%). Disparity across measures was observed, especially for youth rated moderate or high risk for violence or nonviolent delinquency. The empirically derived method of estimating risk tended to place youth into a lower risk class compared to SPJ ratings made by the probation officer administering the SAVRY.
Based on these results, juvenile probationers in our study who exhibited at least a moderate level of risk for future violent or nonviolent delinquent behavior were more likely to receive a higher risk rating under SPJ than under the empirically derived method. One possible explanation for this finding is related to the methodological differences in the procedures used to estimate risk. The empirically derived estimation of risk included in this study was based on SAVRY item responses only and assigned equal weight to each item. The SPJ framework, on the other hand, allows probation officers to consider each individual item as well as the applicability of each item to the adolescent’s circumstances. Key considerations for obtaining structured ratings from individual items are the frequency, recency, and severity of each item. As such, structured professional risk ratings reflect probation officers’ judgment about the context of each risk factor. The SPJ framework also allows administrators to consider additional factors not measured by the SAVRY items, including adjudicated offense level, current legal status, and any additional information pertinent to the adolescent’s surroundings. As a result, the higher risk levels that resulted from SPJs may be directly related to the ability of the rater to consider additional factors and contexts that have a high probability of influencing future behavior but are not measured directly by SAVRY items. Based on this explanation, the SPJ framework seems to be performing as originally intended: it allows administrators, based on their professional judgment and empirically established criteria, to consider the unique constellation of risk and protective factors present in each offender’s circumstances. In other words, this framework allows the administrator to consider the “totality of the circumstances” unique to each offender, which may be leading to a higher estimation of risk for future violent or delinquent behavior.
On the other hand, it could also be argued that the SPJ framework has the potential to lead to inflated estimations of risk due to the use of discretion in the weight given to the SAVRY items as well as the consideration of factors not measured by the SAVRY items. A wealth of empirical research has highlighted racial and gender biases in juvenile justice decision making (Leiber, Bishop, & Chamlin, 2011; Pope & Leiber, 2005). Based on this body of research, it is possible that similar decision-making practices occur during risk assessment. This argument may be more applicable to the nonviolent delinquency ratings, given their recent development and the lack of empirical evidence regarding their inter-rater reliability. As a result, additional analyses were conducted to assess the differences across demographic characteristics (full results are available upon request). The empirically derived measure of risk was the only risk estimate to show statistically significant differences in risk level across gender, χ²(2) = 6.31, p = .04, Cramér’s V = .19. Black youth were somewhat more likely to be rated high risk with both the violence SRR, χ²(2) = 8.53, p = .014, Cramér’s V = .22, and the latent class measure of risk, χ²(2) = 8.44, p = .02, Cramér’s V = .22, but no significant racial differences were noted across the nonviolent delinquency SRR categories, χ²(2) = 4.39, p = .11, Cramér’s V = .16. Thus, the findings of our study do not provide much support for the argument that SPJs would lead to biased estimates based on demographic characteristics.
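The chi-square statistics reported above can be related to their p values and effect sizes with a short calculation. The sketch below rests on two assumptions that are not stated in this passage: that each test used the full sample of 177 youth, and that each demographic variable had two categories crossed with three risk levels, so that min(rows − 1, columns − 1) = 1. For 2 degrees of freedom the upper-tail probability of the chi-square distribution has the closed form exp(−χ²/2), so no statistical library is needed. The function names are illustrative only.

```python
import math

N = 177  # assumption: full sample size from the abstract applies to every test
K = 1    # assumption: min(rows - 1, cols - 1) for a 2 x 3 table


def chi2_p_value_df2(chi2):
    """Upper-tail p value of a chi-square statistic with exactly 2 df."""
    return math.exp(-chi2 / 2.0)


def cramers_v(chi2, n=N, k=K):
    """Cramér's V effect size: sqrt(chi2 / (n * k))."""
    return math.sqrt(chi2 / (n * k))


# Chi-square statistics as reported in the text.
reported_chi2 = {
    "latent class x gender": 6.31,
    "violence SRR x race": 8.53,
    "latent class x race": 8.44,
    "nonviolent delinquency SRR x race": 4.39,
}

results = {
    label: (chi2_p_value_df2(chi2), cramers_v(chi2))
    for label, chi2 in reported_chi2.items()
}

for label, (p, v) in results.items():
    print(f"{label}: p = {p:.3f}, V = {v:.2f}")
```

Under these assumptions the computed Cramér’s V values round to .19, .22, .22, and .16, matching the reported effect sizes, and χ²(2) = 8.53 corresponds to a p value of roughly .014. Small discrepancies at the second decimal place may reflect rounding of the published statistics.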
Given the disparity in the estimations of risk across individuals, determining which estimation procedure is a more effective tool for estimating future behavior is critical. The nonviolent delinquency SRR was the only risk estimate that revealed a significant and positive association with all three outcome measures. Thus, our results provide a measure of support for the continued use of the nonviolent delinquency SRR to predict both probation-related outcomes and re-arrest rates. This finding likely reflects the fact that most adolescents placed on probation are adjudicated for nonserious or nonviolent offenses (Sickmund, Sladky, & Kang, 2011). Research also indicates that nonserious or nonviolent offenders do not typically increase the severity of their behavior while involved in the system and are less likely to recidivate once released from supervision (Barrett, Katsiyannis, & Zhang, 2010; Cottle, Lee, & Heilbrun, 2001; Dembo et al., 1998). Thus, for first-time or nonviolent offenders, the nonviolent delinquency SRR may be a more appropriate tool for estimating the risk of reoffending compared to empirically based estimates of risk or structured professional estimates of future violence.
However, similar to previous studies examining the predictive validity of risk assessment instruments (Corrado, Vincent, Hart, & Cohen, 2004; Vincent et al., 2011), the measures of association used in the current analyses suggested that the strength of the association between the nonviolent delinquency SRR and future behavior (i.e., recidivism and probation outcomes) was moderate at best. At the same time, although the nonviolent delinquency SRR was the only risk measure to reveal statistically significant associations with all three outcome measures, only minor differences in the distribution of the outcome measures across risk levels were revealed across the three estimates. For example, 46% of youth rated high risk for nonviolent delinquency had their probation revoked, compared to 43% of youth rated high risk for violence and 45% of youth placed into the high-risk class of the LCA model. Similarly, 59% of youth rated high risk for delinquency were re-arrested within 6 months, compared to 55% of youth rated high risk for violence and 61% of youth placed into the high-risk category of the LCA model. Furthermore, the empirically derived measure of risk placed a smaller number of youth into the high-risk group, while revealing a similar proportion of youth who exhibited one or more negative outcomes. Given the potential negative consequences of overestimation of risk, including stigmatization of offenders and mandating treatment that is not necessary, the incidence of false positives when relying on the nonviolent delinquency SRR is also a concern. Thus, additional studies should focus on identifying differences in the predictive accuracy across risk estimation procedures as well as the pragmatic differences when used in juvenile justice settings.
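The concern about false positives can be made concrete by taking the complement of each percentage reported above, that is, the share of youth rated or classified as high risk who did not show the outcome during follow-up. The sketch below uses only the percentages given in the text. The complement is a rough descriptive stand-in: it is not a formal false-positive rate (which would be computed among youth without the outcome), and a youth who was not re-arrested within 6 months may still reoffend later. Variable and function names are illustrative.

```python
# Percentage of high-risk youth WITH each outcome, as reported in the text.
observed_pct = {
    "probation revoked": {
        "nonviolent delinquency SRR": 46,
        "violence SRR": 43,
        "latent class model": 45,
    },
    "re-arrested within 6 months": {
        "nonviolent delinquency SRR": 59,
        "violence SRR": 55,
        "latent class model": 61,
    },
}


def pct_without_outcome(pct_with_outcome):
    """Complement of an observed percentage (share of high-risk youth without the outcome)."""
    return 100 - pct_with_outcome


complement_pct = {
    outcome: {measure: pct_without_outcome(p) for measure, p in by_measure.items()}
    for outcome, by_measure in observed_pct.items()
}

for outcome, by_measure in complement_pct.items():
    for measure, pct in by_measure.items():
        print(f"{outcome} | {measure}: {pct}% of high-risk youth without the outcome")
```

On these figures, between 54% and 57% of youth in the high-risk groups did not have their probation revoked, and between 39% and 45% were not re-arrested within 6 months, regardless of which risk estimate was used.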
Before any definitive conclusions regarding the nonviolent delinquency SRR can be made, additional testing of its inter-rater reliability and predictive validity is critical. In particular, field tests of probation officers’ delinquency ratings, similar to Vincent, Guy, Fusco, and Gershenson’s (2011) examination of the violence SRR, are needed. Specifically, these authors compared juvenile probation officers’ SAVRY ratings to ratings by trained research assistants for 80 adjudicated offenders and found “excellent” intra-class correlations (ICCs) for the violence SRR and each of the domain total scores (ICCs ranged from .67 to .86). Future research should also examine variations in the reliability of the SAVRY across probation officers. Specifically, two important questions regarding the inter-rater reliability of the SAVRY remain unclear: how practitioners apply their subjective judgment when estimating risk for future behavior, and to what degree the “most reliable” SAVRY administrators differ in their estimation of risk for future violent and nonviolent delinquent behavior compared to probation officers who estimate risk less consistently.
Another important question to answer is to what extent the nonviolent delinquency SRR offers additional information to juvenile justice practitioners, above and beyond what is already offered by the violence SRR. Given that roughly two thirds of the sample was placed on probation for either a misdemeanor or status offense, this study provides preliminary evidence that the nonviolent delinquency SRR may be a more accurate measure of risk among nonserious or nonviolent juvenile offenders placed on probation. Future studies should also compare risk assessment instruments designed for empirical prediction with SPJ-based estimates of violent and nonviolent risk in their ability to predict both system-related outcomes and reoffending across different types of juvenile justice populations, including lower risk youth placed on probation as well as higher risk youth placed in detention and residential facilities. It is possible that the violence SRR or an empirically based measure of risk may be more effective in predicting future behavior for more serious offenders, particularly violent adolescents, while the nonviolent delinquency SRR may serve as a more effective tool for youth placed on community supervision.
A few limitations to the current study should be noted. First, this study was based on one jurisdiction that recently implemented the SAVRY. At the time of data collection, probation officers were just starting to become familiar with the instrument and departmental policies related to the SAVRY were newly developed. Thus, the reliability and validity of the SRRs may change over more extended periods of use in the juvenile justice system. In addition, due to the length of the original project, the time frame for recidivism was limited to 6 months after release from custody. A longer time period, such as one or more years following release, is more desirable. More detailed arrest information would also provide important information regarding the predictive validity of the delinquency SRR across different types of offenses, particularly violent and nonviolent offenses.
Within the context of these limitations, this study provides a unique contribution to the existing debate regarding the most effective method for assessing risk among juvenile offenders. Our study provides support for the continued use of risk assessment instruments based on SPJ. In general, risk estimates based on SPJ tended to result in a somewhat higher level of risk, which is most likely due to the ability of the professional to consider the unique circumstances of each adolescent offender, a characteristic that makes this method a more effective tool for targeting the needs of juvenile offenders. Additionally, the nonviolent delinquency SRR was the only risk estimate that was significantly related to both system-related outcomes and recidivism. If our findings are replicated on different samples of juvenile probationers, focusing on nonviolent delinquency risk determined by SPJ may prove useful for making appropriate decisions for less serious or nonviolent offenders placed on probation.
Acknowledgments
The authors are grateful to our funding agency for its support. However, the research results reported and the views expressed in the article do not necessarily imply any policy or research endorsement by the agency. We would also like to thank Gina Vincent for her consultation during the data collection and analysis phase of this project and to recognize the essential contributions of the Jefferson Parish Department of Juvenile Services' Probation Department.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: Preparation of this manuscript was supported by Grant #11-98149-000-USP funded by the John D. and Catherine T. MacArthur Foundation.
