Abstract
Risk tools containing dynamic (potentially changeable) factors are routinely used to evaluate the recidivism risk of justice-involved individuals. Although frequent reassessments are recommended, there is little research on how the predictive accuracy of dynamic risk assessments changes over time. This study examined the extent to which predictive accuracy decreases over time for the ACUTE-2007 and the STABLE-2007 sexual recidivism risk tools. We used two independent samples of men on community supervision (Study 1: N = 795; Study 2: N = 4,221). For all outcomes (sexual, violent, and any recidivism [including technical violations]), reassessments improved predictive accuracy, with the largest effects found for the most recent assessment (i.e., the assessment closest in time prior to the recidivism event). Based on these results, we recommend that ACUTE-2007 assessments occur at least every 30 days and that STABLE-2007 assessments occur every 6 months or after significant life changes (e.g., successful completion of treatment).
Risk assessment is a central component of effective correctional interventions. Correctional officers use risk assessment tools to identify individuals at high risk for reoffending. These tools also help indicate which individuals may require more intensive interventions (the risk principle) to manage their criminogenic needs and reduce the likelihood of reoffending (the need principle; Andrews & Bonta, 2010; Bonta & Andrews, 2017; Hanson et al., 2015). A wide variety of structured risk assessment tools have been developed to classify the risk levels of individuals adjudicated for different types of offenses (e.g., violent, sexual, and general offenses; Bourgon et al., 2018; Kelley et al., 2020; Neal & Grisso, 2014). These structured tools show broadly similar accuracy in predicting reoffending (Campbell et al., 2009; Hanson & Morton-Bourgon, 2009; Tully et al., 2013; Yang et al., 2010).
Recidivism risk assessment tools include factors that can be classified as static or dynamic (Bonta & Andrews, 2017; Hanson, 1998). Risk scores based on static risk factors, which are fixed features of individuals such as demographics or criminal history, can inform estimated recidivism risk and intervention strategies (Hanson et al., 2017). Static risk tools, however, are poorly equipped to assess reductions or increases in risk-relevant factors (Hanson, 1998; Harris & Hanson, 2010). Dynamic risk tools, in contrast, include risk factors that are amenable to change or intervention. Scoring dynamic tools may require more professional expertise and time than scoring simple static risk tools; however, dynamic risk tools have greater potential for supporting inferences concerning treatment needs and case formulation (Polaschek & Yesberg, 2018; Wong et al., 2009). The items measured within a dynamic assessment can help identify characteristics of an individual that are amenable to change and that, when targeted, can reduce the individual’s overall risk of reoffending.
Calling something a dynamic risk factor does not make it so. Kraemer and colleagues (1997) recommended classifying risk factors according to their evidence base. The general term risk factor describes a characteristic for which individual differences are associated with differences in the relative frequency of outcomes in prospective studies. For a characteristic to be a variable risk factor, there must be evidence that the characteristic changes and that these changes track the likelihood of the outcome. Kraemer and colleagues reserved the term causal risk factor for variable risk factors that have been deliberately changed in experimental studies. By the standards of Kraemer et al.’s framework, the widespread use of the term dynamic risk factor in correctional rehabilitation is premature. Characteristics have routinely been asserted to be dynamic risk factors (or criminogenic needs) before they had the necessary scientific credentials (“putatively dynamic”; Douglas & Skeem, 2005).
There is widespread agreement that the risk to reoffend changes over time (Blumstein & Nakamura, 2009; Hanson, 2018; Harris & Rice, 2007); however, less is known about the extent to which these risk reductions are related to changes in dynamic risk factors. Recent studies have found that reassessment with dynamic risk assessment tools improves the prediction of criminal recidivism (Babchishin & Hanson, 2020; Cohen et al., 2016; de Vries Robbé et al., 2015; Howard & Dixon, 2013; Labrecque et al., 2014; Lloyd et al., 2020; for an exception, see Viljoen et al., 2017). Specifically, reassessment has been found to add incremental predictive validity to initial risk assessments, and the most proximal risk assessments have predicted reoffending best. Promising results have also been found for dynamic risk tools designed specifically to predict sexual recidivism (Babchishin & Hanson, 2020; van den Berg et al., 2018). Given that the most recent risk score predicts reoffending better than the initial risk score (e.g., Babchishin & Hanson, 2020; Hanson et al., 2021; Lloyd et al., 2020), it follows that the predictive accuracy of a dynamic assessment degrades as time passes after it is completed (i.e., predictive accuracy decays because of risk-relevant change).
Hanson and Harris (2000) further divided dynamic risk factors into stable and acute. Stable dynamic risk factors are durable, changing infrequently over months to years (Hanson & Harris, 2000; Serin et al., 2019). Examples of stable dynamic risk factors include emotion regulation, impulse control, problem-solving, and work ethic (Andrews & Bonta, 2010; Polaschek & Yesberg, 2018; Zamble & Quinsey, 1997). Acute dynamic risk factors, in contrast, are those that change quickly, over minutes to days (e.g., access to victims and substance abuse). In Quinsey’s model, acute risk factors do not inform long-term recidivism potential—they only indicate when reoffending is most likely (Quinsey et al., 2006; Zamble & Quinsey, 1997). Subsequent research, however, has found that the average level of acute risk factors incrementally improves recidivism prediction over static/stable baseline assessments (Hanson et al., 2007; Lloyd et al., 2020). Consequently, acute factors are not just unfortunate events; they should also be considered current expressions of enduring risk-relevant propensities (Mann et al., 2010).
Dynamic Predictions of ACUTE-2007 and STABLE-2007
ACUTE-2007 (Brankley et al., 2019; Hanson et al., 2007) and STABLE-2007 (Fernandez et al., 2014; Hanson et al., 2007, 2015) are two dynamic tools designed to assess the likelihood of sexual recidivism. Both ACUTE-2007 and STABLE-2007 are widely used by correctional officers, forensic evaluators, and mental health practitioners (Bourgon et al., 2018; Hill & Demetrioff, 2019; Kelley et al., 2020; Neal & Grisso, 2014). STABLE-2007 includes 13 dynamic items assessing factors such as atypical sexual interests, emotional identification with children, and relationship stability. ACUTE-2007 evaluates imminent indicators of risk, such as preoccupation with sexual fantasies and victim access (Harris & Hanson, 2010). The chronic propensities associated with elevated ACUTE-2007 scores are considered to be essentially the same as those assessed by STABLE-2007. These criminogenic propensities can be broadly grouped into sex crime–specific factors (atypical sexual interests, emotional congruence with children, and low sexual self-regulation) and general criminality (antisocial peers, hostility, impulsivity, and opposition to supervision; Brouillette-Alarie & Hanson, 2015).
Research on the predictive accuracy of ACUTE-2007 and STABLE-2007 has focused on scores from the first risk assessments conducted. These studies have found that the initial assessment of these dynamic risk tools predicted recidivism and still discriminated between recidivists and nonrecidivists up to 5 years later (Brankley et al., 2021; Hanson et al., 2007, 2015; Nitsche et al., 2022). Furthermore, the first dynamic risk assessment scores of these risk tools incrementally contributed to predictive accuracy after accounting for static scores (e.g., Static-99/R; Babchishin & Hanson, 2020; Brankley et al., 2021; Hanson et al., 2007; Helmus et al., 2021; Nitsche et al., 2022). Consistent with the findings of other dynamic change studies (e.g., Hanson et al., 2021; Lloyd et al., 2020), reassessment with ACUTE-2007 improves predictive accuracy (Babchishin & Hanson, 2020). Specifically, the most proximal scores of the ACUTE-2007 risk tool predicted sexual recidivism better than the first assessment scores (Babchishin & Hanson, 2020).
Risk tool developers recommend regular reassessment of ACUTE-2007 (e.g., at each meeting with supervisees) and STABLE-2007 (e.g., every 6–12 months), and reassessment is currently practiced in community corrections (Brankley et al., 2019; Fernandez et al., 2014; Hanson et al., 2007). There is, however, little empirical evidence on how often dynamic risk should be reassessed for optimal predictive accuracy. The current study seeks to ascertain the extent to which the predictive accuracy of these measures changes after specific periods have elapsed. Within Kraemer et al.’s (1997) framework, evidence that changes in a characteristic track changes in the likelihood of the outcome is a core requirement for that characteristic to be considered a variable risk factor and, ultimately, a causal risk factor.
Cox Regression With Time-Dependent Covariates for Dynamic Predictions
Although there is no conventional approach to evaluating dynamic predictions, one common method is Cox regression survival analysis with time-dependent covariates (Altman & de Stavola, 1994; Singer & Willett, 2003). This method has several advantages for examining whether including dynamic scores improves prediction. First, Cox regression survival analysis accommodates incomplete (censored) follow-up (Singer & Willett, 2003). In longitudinal data analyses, the follow-up time for each individual typically varies because of different start times (i.e., the date the individual was released into the community in a given study) and end times (e.g., the date the individual reoffended, died, or left the study). Second, Cox regression does not limit the number of assessments and allows unequal intervals between assessments. In contrast, pre–post change analyses, which are standard in studies of institutional treatment, use only two assessments with similar intervals before and after treatment. Although such analyses are common, they make it difficult to distinguish true change from measurement error (e.g., regression to the mean; Singer & Willett, 2003).
Cox regression allows for time-dependent covariates, such as dynamic risk scores. Time-dependent covariates, however, require models that impute the expected score at the time of the outcome. The need for models forces data analysts to consider how risk scores change, which, in turn, produces different models that can be tested against each other. For example, Cox regression can compare the predictive accuracy of the following ways of defining dynamic risk scores: (a) use only the score from the first assessment (a “static” one-time assessment of potentially dynamic factors), (b) update the value of the predictor with each new assessment (the fully dynamic model), or (c) use scores from different time periods prior to the recidivism event (e.g., 30 and 180 days). This last type of analysis (using a range of periods) can identify the extent to which there is decay in predictive accuracy as assessments get more distal from the recidivism event.
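The difference between options (a) and (b) can be illustrated with a minimal Python sketch. It is not the software used in this study: it assumes data already arranged as rows covering the interval (start, stop] during which a given score was in effect, evaluates the Breslow form of the Cox partial log-likelihood for a single covariate at a chosen coefficient, and uses three hypothetical individuals. An actual analysis would estimate the coefficient by maximizing this function (e.g., with the R survival package).

```python
import math


def partial_loglik(rows, beta):
    """Breslow partial log-likelihood for a single time-dependent covariate.

    Each row covers the interval (start, stop] during which `score` was the
    person's current risk score; `event` is 1 if recidivism occurred at `stop`.
    """
    loglik = 0.0
    for row in rows:
        if not row["event"]:
            continue
        t = row["stop"]
        # Risk set: all rows whose interval contains t, i.e., each person still
        # under observation at t, represented by the score in effect at t.
        risk_set = [r for r in rows if r["start"] < t <= r["stop"]]
        denominator = sum(math.exp(beta * r["score"]) for r in risk_set)
        loglik += beta * row["score"] - math.log(denominator)
    return loglik


# Hypothetical data for three people (days since first assessment).
dynamic_rows = [
    {"start": 0, "stop": 10, "event": 0, "score": 1},   # person 1, baseline
    {"start": 10, "stop": 20, "event": 1, "score": 3},  # person 1, reassessed; reoffends
    {"start": 0, "stop": 25, "event": 0, "score": 2},   # person 2, censored at day 25
    {"start": 0, "stop": 15, "event": 1, "score": 0},   # person 3, reoffends at day 15
]

# Option (a): person 1 keeps the baseline score of 1 for the whole follow-up.
baseline_rows = [dict(r, score=1) if i < 2 else r for i, r in enumerate(dynamic_rows)]

for beta in (0.0, 0.5, 1.0):
    print(beta, partial_loglik(baseline_rows, beta), partial_loglik(dynamic_rows, beta))
```

Because each person contributes only the row whose interval contains the event time, the fully dynamic version uses person 1’s reassessed score of 3 when that person reoffends at day 20, whereas the baseline version uses the score of 1.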
Current Study
The purpose of this study was to explore the extent to which the predictive accuracy of two common dynamic sexual recidivism risk assessment tools (ACUTE-2007 and STABLE-2007) changes over time. Specifically, we compared the predictive accuracy of these risk tools in a fully dynamic model and when the assessments were conducted within 30, 45, 60, 120, or 180 days prior to recidivism.
Hypotheses
A recent study (Babchishin & Hanson, 2020) found that the most recent ACUTE-2007 assessments predicted recidivism better than the first assessment and were also more accurate than other methods of predicting recidivism (e.g., examining the highest or lowest risk score). Based on these findings, we hypothesized that the predictive validity of the ACUTE-2007 would diminish over time. As well, given the results from other similar studies (Lloyd et al., 2020; Stone et al., 2021), we hypothesized that the predictive validity of the STABLE-2007 would diminish over time but that it would diminish at a slower rate than the acute dynamic tool.
General Method
Overview
The present research included two independent samples: Study 1 used the developmental sample (i.e., the Dynamic Supervision Project [Hanson et al., 2007]) for the ACUTE-2007 and STABLE-2007 tools; Study 2 used an administrative, field validity sample, which was built for day-to-day supervision of clients from British Columbia (BC) Corrections. Sample descriptions are provided in the respective Participant sections. In the following section, we report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study (Simmons et al., 2012).
Measures
ACUTE-2007
The ACUTE-2007 is an empirically derived risk tool used to assess and track rapid changes in sexual reoffending risk by evaluating acute dynamic risk factors in adult males who were charged with or convicted of a sexually motivated offense (Hanson et al., 2007). The ACUTE-2007 has seven items (e.g., victim access, sexual preoccupation, and substance abuse). The items are considered to represent current expressions of chronic, risk-relevant factors (Fernandez et al., 2014), and the total score is calculated by summing all item scores (range 0 to 21; higher scores indicate higher acute dynamic risk). Research suggests that the ACUTE-2007 measures one latent factor and that the measurement model is invariant across time (Babchishin & Hanson, 2020). Reassessment of the ACUTE-2007 at each scheduled meeting with supervisees is recommended. Scoring the ACUTE-2007 adds 5 to 10 minutes to a routine supervisory session (Hanson et al., 2007).
In the development sample (i.e., the Dynamic Supervision Project, which overlaps with Study 1’s sample), the intraclass correlation (ICC) for individual ACUTE items at first assessment ranged from .64 to .95, with a median of .90 (k = 75; Hanson et al., 2007). Furthermore, the ACUTE-2007 was previously found to predict sexual, violent, and any recidivism and to add predictive accuracy above that of static risk tools (Hanson et al., 2007; Nitsche et al., 2022). In a recent study, the most recent ACUTE-2007 score and the average of all previous ACUTE-2007 scores were each more predictive of recidivism than the first ACUTE-2007 score or the most extreme ACUTE-2007 score (lowest or highest; Babchishin & Hanson, 2020). Because the ACUTE-2007 has only seven items, total scores in this study were calculated only for individuals with no missing items.
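The scoring rule just described (sum of seven items, no total when any item is missing) can be expressed as a short Python sketch. The per-item bound of 0 to 3 is an assumption inferred from the 0 to 21 total range; the manual’s exact response options are not reproduced here.

```python
def acute_total(items):
    """Sum the seven ACUTE-2007 item scores; return None if any item is missing.

    The 0-3 per-item bound is inferred from the 0-21 total range and is an
    assumption of this sketch, not a statement of the manual's response options.
    """
    if len(items) != 7:
        raise ValueError("ACUTE-2007 has seven items")
    if any(item is None for item in items):
        return None  # no prorating: incomplete assessments get no total
    if any(not 0 <= item <= 3 for item in items):
        raise ValueError("item score outside the assumed 0-3 range")
    return sum(items)


print(acute_total([0, 1, 3, 0, 0, 1, 3]))     # 8
print(acute_total([0, None, 3, 0, 0, 1, 3]))  # None
```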
STABLE-2007
STABLE-2007 was designed to measure stable dynamic risk factors for adult males who were charged with or convicted of a sexually motivated offense (Fernandez et al., 2014; Hanson et al., 2007). STABLE-2007 is one of the most widely used measures of dynamic risk for sexual recidivism (Kelley et al., 2020; McGrath et al., 2010). STABLE-2007, for example, is used by probation officers in England, Ireland, and Wales to identify risk-relevant issues for their case reports and improve their confidence and consistency in decision-making (McNaughton Nicholls et al., 2010; Walker & O’Rourke, 2013).
STABLE-2007 has 13 items (e.g., cooperation with supervision, deviant sexual interests, emotional identification with children, impulsivity), and the total score is calculated by summing all item scores (range 0 to 26, or 0 to 24 for individuals who did not offend against a child younger than 14 years). Higher scores indicate higher stable dynamic risk. STABLE-2007 is scored by trained evaluators (e.g., parole officers and psychologists) based on an interview, a review of available file information, and, if possible, consultation with collateral informants (e.g., a spouse). The interview usually takes 90 to 120 minutes, although the time decreases with experience and prior knowledge of the case (Fernandez et al., 2014). Reassessment of the STABLE-2007 every 6 to 12 months is recommended (Fernandez et al., 2014).
In the development sample (i.e., the Dynamic Supervision Project, which overlaps with Study 1’s sample), the ICC for individual STABLE items at the first assessment ranged from .66 to .92, with a median of .93 (k = 74; Hanson et al., 2007). A meta-analysis of 21 studies (n = 6,955) from Canada, the United States, the United Kingdom, and Austria concluded that considering the STABLE-2007 along with the Static-99R (a static, actuarial risk tool) significantly improves the prediction of sexual, violent, and any recidivism (Brankley et al., 2021). In this study, as recommended by the STABLE-2007 user guidance manual, scores were calculated only for individuals with no more than one item with missing information (e.g., the emotional identification with children item; Fernandez et al., 2014).
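The STABLE-2007 total-score and missing-data rules can be sketched in Python as follows. Several details are assumptions: each item is taken to range from 0 to 2 (inferred from the 0 to 26 range over 13 items); the item dropped for individuals without a victim younger than 14 is taken to be emotional identification with children, placed at a hypothetical list position (`EIC_INDEX`); and a single missing item is simply left out of the sum, because whether the manual prorates is not stated here.

```python
EIC_INDEX = 2  # hypothetical position of the emotional identification with children item


def stable_total(items, child_victim_under_14):
    """Sum 13 STABLE-2007 item scores (each assumed 0-2) under the missing-data rule.

    For individuals without a victim younger than 14, the emotional
    identification with children item is dropped (assumed), so the maximum is 24.
    Returns None when more than one remaining item is missing.
    """
    if len(items) != 13:
        raise ValueError("STABLE-2007 has 13 items")
    scored = list(items)
    if not child_victim_under_14:
        del scored[EIC_INDEX]
    if sum(item is None for item in scored) > 1:
        return None
    present = [item for item in scored if item is not None]
    if any(not 0 <= item <= 2 for item in present):
        raise ValueError("item score outside the assumed 0-2 range")
    # Assumption: a single missing item is left out of the sum (no prorating).
    return sum(present)


print(stable_total([2] * 13, child_victim_under_14=False))  # 24
```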
Recidivism Definitions
We examined three recidivism outcomes: sexual recidivism, violent recidivism, and any recidivism (including technical violations). In Study 1, sexual recidivism was defined as any sexually motivated crime (contact or noncontact) after release, regardless of whether the name of the charge or conviction was explicitly sexual (e.g., a break-and-enter conviction in which the nature of the crime showed that the individual was motivated to commit a sexual assault). The Study 1 definition of sexual recidivism also included sexual breaches, defined as official sanctions for sexually motivated violations of the conditions of community supervision (e.g., being in the company of children contrary to a supervision condition; see Hanson et al., 2007, for more details). In Study 2, sexual recidivism was identified only when the name of the charge or conviction (contact or noncontact sexual offenses) was explicitly sexual (e.g., sexual assault; see Helmus et al., 2021, for more details).
Violent recidivism was defined as any crime that involved a confrontation with the victim, including contact sexual offenses but excluding noncontact sexual offenses and sexually motivated breaches. Any recidivism was defined as any sexual, violent, or nonviolent crime, as well as any technical violation (e.g., violation of conditional release), regardless of sexual motivation. Thus, the “any recidivism” category subsumed the sexual and violent categories and added nonviolent offenses and technical violations.
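The nesting of the three outcome definitions can be summarized in a small Python sketch. It assumes each new offense has already been coded into one of six hypothetical category labels; the coding itself (e.g., judging whether a nonsexually named offense was sexually motivated) is outside the sketch. The `count_sexual_breaches` flag is likewise illustrative: `True` mirrors the Study 1 definition of sexual recidivism and `False` mirrors Study 2.

```python
SEXUAL = {"sexual_contact", "sexual_noncontact", "sexual_breach"}
VIOLENT = {"sexual_contact", "nonsexual_violent"}
CATEGORIES = SEXUAL | VIOLENT | {"nonviolent", "technical"}


def recidivism_flags(category, count_sexual_breaches=True):
    """Map one hypothetical offense category to the three nested outcomes.

    count_sexual_breaches=True mirrors Study 1; False mirrors Study 2, where
    only explicitly sexual charges counted as sexual recidivism.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    sexual = category in SEXUAL and (count_sexual_breaches or category != "sexual_breach")
    # Violent recidivism includes contact, but not noncontact, sexual offenses;
    # every coded category counts toward any recidivism.
    return {"sexual": sexual, "violent": category in VIOLENT, "any": True}


print(recidivism_flags("sexual_noncontact"))
print(recidivism_flags("sexual_breach", count_sexual_breaches=False))
```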
Procedure
The data sets were constructed in a “person-period” format for discrete-time survival analysis (Singer & Willett, 2003). That is, each individual had a separate record (i.e., a new row) for each period beginning with a new risk assessment. Each individual could therefore have multiple risk assessment records (in multiple rows), ordered chronologically from the initial (baseline) assessment onward. This format accommodates periods of varying length, with the date of each subsequent assessment marking the end of the previous period.
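A minimal Python sketch of this layout for one individual is shown below. The field names, the day counts, and the rule that follow-up must end after the last assessment are assumptions for illustration; days are counted from the first assessment, and only the last row can carry the recidivism event.

```python
def to_person_period(person_id, assessments, end_day, reoffended):
    """Build one row per assessment period for a single individual.

    assessments: (day, score) pairs, day counted from the first assessment.
    Each period ends at the next assessment; the last ends at end_day, and only
    the last row can carry the recidivism event.
    """
    ordered = sorted(assessments)
    if end_day <= ordered[-1][0]:
        raise ValueError("follow-up must end after the last assessment")
    rows = []
    for idx, (day, score) in enumerate(ordered):
        is_last = idx == len(ordered) - 1
        stop = end_day if is_last else ordered[idx + 1][0]
        rows.append({"id": person_id, "start": day, "stop": stop,
                     "score": score, "event": int(reoffended and is_last)})
    return rows


# Hypothetical individual: assessed on days 0, 15, and 40; reoffends on day 60.
for row in to_person_period(1, [(0, 2), (15, 4), (40, 3)], end_day=60, reoffended=True):
    print(row)
```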
Furthermore, we organized the risk scores into fixed timeframes projected forward (i.e., 30, 45, 60, 120, and 180 days). First, each timeframe was projected forward from an individual’s first assessment, and any assessments scored within this first timeframe were replaced by the first assessment score. For example, assume that an individual scored 2 at the first assessment after release and 4 at a second assessment 15 days later. With the 30-day timeframe, the score of 4 at the second assessment is replaced by the score of 2 from the first assessment.
Next, the first assessment after the end of the previous timeframe was selected to start the next timeframe, and any assessments scored within that timeframe were replaced by the first score of the timeframe. The same process was repeated until the last assessment. In the person-period format, outcome information (e.g., recidivism events) appears in the last row for each individual. Because the timeframes differed in length across models, more recidivism events occurred within the longer timeframes.
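The carry-forward step can be sketched in Python as follows. The boundary rule is an assumption of this sketch: a new timeframe opens at the first assessment on or after the end of the previous one (i.e., at least `window_days` after the assessment that opened it). Days are counted from the first assessment, and the later assessments in the second call are hypothetical.

```python
def project_scores(assessments, window_days):
    """Carry each timeframe's opening score forward over a fixed window.

    assessments: (day, score) pairs. A window opens at the first assessment;
    later assessments inside the window take the opening score, and the first
    assessment on or after the window's end opens the next window.
    """
    ordered = sorted(assessments)
    window_start, window_score = ordered[0]
    projected = []
    for day, score in ordered:
        if day >= window_start + window_days:
            window_start, window_score = day, score
        projected.append((day, window_score))
    return projected


# The example from the text: score 2 at day 0, score 4 at day 15, 30-day timeframe.
print(project_scores([(0, 2), (15, 4)], 30))  # [(0, 2), (15, 2)]
print(project_scores([(0, 2), (15, 4), (31, 5), (50, 1), (62, 3)], 30))
# [(0, 2), (15, 2), (31, 5), (50, 5), (62, 3)]
```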
Plan of Analysis
Harrell’s C index
Harrell’s C index was used to compare predictive accuracy (discrimination) across the different timeframes because it estimates the probability that, in a randomly selected pair of individuals, the individual with the higher risk score will reoffend before the other (Harrell et al., 1996). Harrell’s C is calculated from survival data and does not require fixed follow-up times. Harrell’s C can vary between 0 and 1, with .50 indicating the level of prediction expected by chance. Given its similarity to the area under the curve (AUC), similar interpretations of effect size magnitudes apply (i.e., values of .56, .64, and .71 indicate small, moderate, and large effects, respectively; Helmus & Babchishin, 2017; Rice & Harris, 2005). Harrell’s C was computed with the survConcordance function of the R “survival” package (Version 3.1-11; Therneau, 2020) in R (Version 4.0.0; R Core Team, 2013). Although the C values provide some guidance concerning the relative predictive accuracy of the different models, they cannot be directly compared using standard statistical tests because the comparisons would involve nonnested models, each with a different number of recidivists (Volinsky & Raftery, 2000).
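The pairwise logic behind Harrell’s C can be sketched in Python for the simple case of one fixed score per person. This is a simplified illustration, not the survConcordance implementation: pairs tied on follow-up time are skipped, and tie handling may differ from the R function. The data are hypothetical.

```python
from itertools import combinations


def harrell_c(times, events, scores):
    """Harrell's concordance index for right-censored data (simplified sketch).

    A pair is usable when the member with the shorter follow-up had the event;
    it is concordant when that member also has the higher score. Score ties
    count as half-concordant; pairs tied on time are skipped.
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        a, b = (i, j) if times[i] < times[j] else (j, i)  # a fails or is censored first
        if not events[a]:
            continue  # earlier member was censored: order of failure unknown
        usable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5
    return concordant / usable


# Hypothetical follow-up times (days), recidivism flags, and risk scores.
print(harrell_c([5, 10, 15, 20], [1, 1, 0, 1], [8, 6, 7, 2]))  # 0.8
```

Here five pairs are usable (the censored person at day 15 cannot anchor a pair), and four of them are ordered as the scores predict.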
Cox Regression With Time-Dependent Covariates
A series of Cox regressions with time-dependent covariates was conducted to examine the extent to which different ways of integrating multiple assessments improve prediction (Singer & Willett, 2003). Three models were tested: (a) initial baseline scores (first assessment since release), (b) reassessment scores updated every 30 days or every 180 days, and (c) fully dynamic scores as assessed in the field within the timeframes. For example, consider an individual who was assessed on March 1 and again on March 14 and who was known to have reoffended on March 20. In the 30-day analysis, the risk scores used for Cox regression would be those from the March 1 assessment. For the fully dynamic model, the risk scores would be those from March 14. If the individual had instead reoffended on April 15 (32 days after the last assessment), the individual would be considered a nonrecidivist in the 30-day analysis and a recidivist in the 180-day analysis.
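The worked example can be reproduced with a short Python sketch. The year (2003) and the scores (2 and 4) are hypothetical, and treating an offense exactly at the horizon as counted (an inclusive boundary) is an assumption; `score_used` and `counted_as_recidivist` are illustrative helpers, not functions from the study.

```python
from datetime import date


def score_used(assessments, offense, window_days=None):
    """Score entering the model at the offense date.

    window_days=None gives the fully dynamic model (latest prior assessment);
    otherwise assessments inside a window take the score that opened it.
    """
    prior = sorted(a for a in assessments if a[0] <= offense)
    if window_days is None:
        return prior[-1][1]
    start, score = prior[0]
    for day, value in prior[1:]:
        if (day - start).days >= window_days:
            start, score = day, value  # first assessment past the window opens a new one
    return score


def counted_as_recidivist(last_assessment, offense, horizon_days):
    """Count the offense only if it falls within horizon_days of the last assessment."""
    return 0 <= (offense - last_assessment).days <= horizon_days


history = [(date(2003, 3, 1), 2), (date(2003, 3, 14), 4)]
print(score_used(history, date(2003, 3, 20), window_days=30))            # 2 (March 1 score)
print(score_used(history, date(2003, 3, 20)))                            # 4 (March 14 score)
print(counted_as_recidivist(date(2003, 3, 14), date(2003, 4, 15), 30))   # False (32 days)
print(counted_as_recidivist(date(2003, 3, 14), date(2003, 4, 15), 180))  # True
```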
Because the models are nonnested, comparing them requires fit indices. One of the most commonly used is the Bayesian Information Criterion (BIC; Raftery, 1995; Volinsky & Raftery, 2000). The BIC starts with the discrepancy between observed and predicted values (as indexed by -2 times the log-likelihood [-2LL]) and then adds a penalty proportional to the number of parameters: BIC = -2LL + k × ln(n), where k is the number of parameters and n is the number of recidivists (Raftery, 1995; Volinsky & Raftery, 2000). Smaller BIC values indicate better fitting models. BIC differences of 0–2, 2–6, 6–10, and 10 or more represent “weak,” “positive,” “strong,” and “very strong” evidence, respectively, for the better fitting model (Gordon, 2012).
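The BIC formula and the evidence labels can be sketched in Python. The log-likelihoods below are hypothetical, and assigning a difference that falls exactly on a band boundary (e.g., 2.0) to the higher band is an assumption. When two models share k and n, the penalty cancels and the BIC difference reduces to twice the difference in log-likelihoods.

```python
import math


def bic(loglik, n_params, n_events):
    """BIC = -2LL + k * ln(n), with n taken as the number of recidivists."""
    return -2.0 * loglik + n_params * math.log(n_events)


def evidence_label(delta_bic):
    """Label an absolute BIC difference; boundary values go to the higher band (assumed)."""
    d = abs(delta_bic)
    if d < 2:
        return "weak"
    if d < 6:
        return "positive"
    if d < 10:
        return "strong"
    return "very strong"


# Hypothetical log-likelihoods for two one-parameter models fit to the same 80 recidivists.
delta = bic(-512.3, 1, 80) - bic(-509.1, 1, 80)
print(round(delta, 1), evidence_label(delta))  # 6.4 strong
```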
Study 1
Participants
This study included 795 individuals from a study of community supervision outcomes known as the Dynamic Supervision Project (Hanson et al., 2007, 2015), which was used to develop the ACUTE-2007 and STABLE-2007 tools. The original Dynamic Supervision Project included approximately 1,000 individuals who had committed sexual offenses, from Canada and two U.S. states (Iowa and Alaska). Only the Canadian samples were included in this study because there was no reliable recidivism information from the U.S. states. Women (n = 6) were also excluded, given that their recidivism rates would be expected to differ from those of men and that these tools have been validated only on men. All individuals in this study were adult males starting a period of community supervision (probation or parole) between 2001 and 2005 following a conviction for a sexual offense, with follow-up until 2011. On average, the individuals were 39 years old (SD = 13.5, range 18 to 84 years). Study 1’s sample also overlapped with that of a recent study (Babchishin & Hanson, 2020), although the sample sizes differed slightly. Specifically, the Babchishin and Hanson (2020) study included individuals who had two or more assessments (N = 632), whereas Study 1 included anyone who had at least one assessment (N = 795).
Approximately 20% of the individuals self-identified as being of Indigenous heritage, and 5% had previously been diagnosed as developmentally delayed (low intellectual functioning). In this sample, 72% (566/789) had no prior charges or convictions for a sex offense. About 45% of the sample had victimized children who were 12 or younger, 33% had victimized adults who were 18 or older, and 12% had committed noncontact sexual offenses.
Recidivism and Measures
Information concerning new offenses was gathered from reviews of provincial and national criminal records, as well as from supervising officers, local police jurisdictions, and searches of newspaper databases. The average length of follow-up was 6.5 years (SD = 2.6, Mdn = 7.5, range 0.1 to 10.1 years). The sample had a total of 6,656 ACUTE-2007 and 1,243 STABLE-2007 assessments. The average number of ACUTE-2007 assessments per individual was 9.1 (SD = 9.0, Mdn = 6.0, range 1 to 69), and the average number of STABLE-2007 assessments per individual was 1.6 (SD = 0.9, Mdn = 1.0, range 1 to 5; Table 1). The average time between assessments was 35 days (SD = 38, Mdn = 28) for the ACUTE-2007 and 242 days (SD = 100, Mdn = 203) for the STABLE-2007 (Table 1).
Descriptive Information for the Samples
Note. SD = Standard deviation.
Average scores of the first assessment.
Results and Discussion
ACUTE-2007
The first analysis examined whether reassessment improved prediction (Table 2). For these analyses, recidivism was counted only if it occurred within 180 days of the last assessment; an individual who reoffended more than 180 days after the last assessment was counted as a nonrecidivist. Three models were tested with this data structure: (a) baseline (i.e., the first assessment as a “static” variable), (b) the fully dynamic model, in which the scores were updated with each new assessment, and (c) the 180-day model, in which the scores were updated only once during each 180-day period. The fully dynamic model fit the data best, followed by the 180-day model, for all recidivism outcomes (sexual, violent, and any recidivism [including technical violations]). The baseline assessment was consistently the weakest predictor of recidivism, although it was still statistically significant. In Study 1, the differences between the fully dynamic model and the 180-day model for the ACUTE-2007 were positive but not strong (ΔBIC = 4.4, 5.0, and 5.2 for sexual, violent, and any recidivism [including technical violations], respectively).
Predictive Accuracy of ACUTE-2007 and STABLE-2007 Total Scores With Three Different Models [180-Day Model]
Note. C: Harrell’s concordance index. †: Reference group (the best fitting model). ΔBIC: BIC difference, with 0 to 2 “weak,” 2 to 6 “positive,” 6 to 10 “strong,” and 10+ “very strong” evidence for a better model fit. Bolded values represent statistically significant predictors (p < .05). CI = confidence interval; BIC = Bayesian Information Criterion.
As a second examination of the value of reassessment, the data set was reorganized such that recidivism was only counted if it occurred within 30 days of the last assessment. The same three models were then tested: (a) baseline (first) assessment, (b) fully dynamic, and (c) scores updated every 30 days. As can be seen in Table 3, the baseline model was again consistently the worst fit to the data. There were only small differences between the fully dynamic and the 30-day model. In Study 1, the 30-day model was best for all recidivism outcomes. The BIC differences, however, tended to be small, particularly for sexual recidivism and violent recidivism (ΔBIC < 3.0). Nevertheless, there was sufficient evidence that reassessment improved prediction to justify examining patterns of decay in predictive accuracy for different time frames.
Predictive Accuracy of ACUTE-2007 and STABLE-2007 Total Scores With Three Different Models [30-Day Model]
Note. C: Harrell’s concordance index. †: Reference group (the best fitting model). ΔBIC: BIC difference, with 0–2 “weak,” 2–6 “positive,” 6–10 “strong,” and 10+ “very strong” evidence for a better model fit. Bolded values represent statistically significant predictors (p < .05). CI = confidence interval; BIC = Bayesian Information Criterion.
The 30-day and dynamic models were identical because there were no reassessments within the 30-day timeframes.
The next set of ACUTE-2007 analyses (Table 4) compared the predictive accuracy of risk scores over time periods ranging from 30 to 180 days. For each time period, recidivism events that occurred after the time limit were not considered. Consequently, the number of recidivism events varied across analyses, being largest for the 180-day period and smallest for the 30-day period. In Study 1, there was a consistent pattern: ACUTE-2007 scores with shorter projected timeframes (30 and 45 days) had higher predictive accuracy than those with 120-day or 180-day projections. For example, for any recidivism (Table 4; Study 1), the C values were .73 and .72 for 30 and 45 days, respectively, compared with .69 at 120 days and .66 at 180 days. For ACUTE-2007 in Study 1, the rank-order correlation (Kendall’s tau) between the C values and the length of the timeframe was τ = -.467 (expected ranks [1–5] nested within outcomes).
Predictive Accuracy of ACUTE-2007 and STABLE-2007 Total Scores With Different Time Projections (Study 1)
Note. C: Harrell’s concordance index. Bolded represents statistically significant predictors (p < .05). CI = confidence interval.
a Total of 6,604 ACUTE-2007 assessments for sexual recidivism, 6,656 assessments for violent recidivism, and 6,238 assessments for any recidivism. Kendall’s tau between recency and C values = -.467 for ACUTE-2007.
b Total of 1,239 STABLE-2007 assessments for sexual recidivism, 1,243 assessments for violent recidivism, and 1,216 assessments for any recidivism. Kendall’s tau between recency and C values = -.933 for STABLE-2007.
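One reading of a Kendall’s tau with ranks “nested within outcomes” is that pairs of timeframes are compared only within the same recidivism outcome and the concordant and discordant counts are then pooled. The Python sketch below implements that reading as tau-a (an assumption; the exact variant used is not stated). Under this reading, five timeframes and three outcomes give 30 within-outcome pairs, so tau can take only multiples of 1/30; the reported values of -.467 and -.933 correspond to -14/30 and -28/30. The example uses the four any-recidivism C values quoted in the text; the 60-day value is not quoted, so rank 3 is omitted.

```python
from itertools import combinations


def nested_kendall_tau(groups):
    """Kendall's tau-a computed over pairs formed only within each group.

    groups maps an outcome name to (ranks, c_values); pairs are never formed
    across outcomes, and pairs tied on either variable count as neither
    concordant nor discordant.
    """
    concordant = discordant = pairs = 0
    for ranks, values in groups.values():
        for i, j in combinations(range(len(ranks)), 2):
            pairs += 1
            product = (ranks[i] - ranks[j]) * (values[i] - values[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / pairs


# Any-recidivism C values quoted in the text for 30, 45, 120, and 180 days.
print(nested_kendall_tau({"any": ([1, 2, 4, 5], [0.73, 0.72, 0.69, 0.66])}))  # -1.0
```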
Consistent with similar analyses of another version of this data set (Babchishin & Hanson, 2020), the current analyses found strong evidence that reassessment with the ACUTE-2007 improves prediction over the initial baseline assessment. Current user recommendations are to score ACUTE-2007 after each meeting with supervisees during community supervision, but not more than once a week (Brankley et al., 2019). In Study 1, most assessments were scored approximately 1 month apart (Mdn = 28 days), a frequency of contact consistent with supervisory practices in Canada at that time. The current analyses found the highest predictive accuracy for assessments 30 or 45 days prior to the recidivism event, with no advantage for the fully dynamic model over the 30-day projections. This lack of difference is expected: with an average gap of approximately 30 days between assessments, the fully dynamic model and the 30-day projections would use largely the same scores. Overall, the results support current recommendations to rescore the ACUTE-2007 every 30 days. Although decay was evident between the fully dynamic model and the 180-day projections, the 45-day projections had effect sizes similar to those of the 30-day projections.
STABLE-2007
Direct comparisons between different dynamic models were not possible in Study 1, given that most cases were assessed only once. It was possible, however, to examine how the predictive accuracy of the STABLE-2007 changed based on proximity to the recidivism events. As can be seen in Table 4, the closer the assessment was to the recidivism event, the greater the predictive accuracy. There was a strong monotonic relationship between the rank order of the follow-up time and the size of Harrell’s C (Kendall tau = -.933, with expected ranks nested within recidivism types). The C values for the short timeframes (30 and 45 days) were large (.79 to .90), exceeding the AUC values usually observed for recidivism risk tools (which are typically in the .70 range). The results from Study 1 generally support the value of reassessing the STABLE-2007. Direct comparisons between different timeframes, however, require more assessments, which were available in Study 2.
Study 2
Participants
Study 2 included 4,221 adult males who received provincial sentences (i.e., less than 2 years) for a sexual offense and were supervised in the community between 2005 and 2013 by BC Corrections (see Helmus et al., 2021, for more details). Follow-up extended to 2013. Like those in Study 1, the individuals in Study 2 were, on average, 40 years old (SD = 13.7, ranging from 18 to 90 years), and 73% (3,057/4,166) had no prior charges or convictions for a sex offense. Of the total sample, 63% were White, 22% were of Indigenous heritage, and 15% were from other ethnocultural groups (e.g., Black, East Asian, and Hispanic).
Recidivism and Measures
The recidivism information included all charges and convictions within the province of BC up to June 4, 2013. Charges occurring outside BC would not have been included in this study. The average length of follow-up was 4.6 years (SD = 2.5, Mdn = 4.5, ranging from 0.1 to 8.5 years). The ACUTE-2007 and STABLE-2007 assessments were completed by probation officers between December 13, 2004, and June 4, 2013. During the follow-up period, the sample had a total of 56,091 ACUTE-2007 assessments; the average number of assessments per individual was 13.7 (SD = 11.1; Mdn = 11.0; ranging from 1 to 85; Table 1). The sample had a total of 11,101 STABLE-2007 assessments, and the average number of assessments per individual was 2.6 (SD = 1.8; Mdn = 2.0; ranging from 1 to 12; Table 1). The average time between assessments was 40 days (SD = 76, Mdn = 29) for the ACUTE-2007 and 245 days (SD = 213, Mdn = 190) for the STABLE-2007 (Table 1).
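Summary statistics such as the average time between assessments can be computed by pooling the gaps between consecutive assessments. The sketch below assumes that gaps are pooled across individuals; the text does not say whether gaps were first averaged within individuals. The function name and dates are hypothetical. A few long gaps pull the mean upward, so a mean well above the median, as for the ACUTE-2007 here (40 vs. 29 days), indicates a right-skewed distribution. In that case the median is the better guide to the typical reassessment interval.

```python
from datetime import date
from statistics import mean, median, stdev
from typing import Dict, List


def assessment_gaps(dates_by_person: Dict[str, List[date]]) -> List[int]:
    """Days between consecutive assessments, pooled across individuals."""
    gaps: List[int] = []
    for dates in dates_by_person.values():
        ordered = sorted(dates)
        # A person with a single assessment contributes no gaps.
        gaps.extend((later - earlier).days for earlier, later in zip(ordered, ordered[1:]))
    return gaps


# Hypothetical assessment dates, illustration only.
dates = {
    "A": [date(2010, 1, 1), date(2010, 1, 29), date(2010, 3, 1)],
    "B": [date(2011, 5, 1)],
    "C": [date(2012, 1, 1), date(2012, 2, 15)],
}
gaps = assessment_gaps(dates)
summary = {"mean": mean(gaps), "sd": stdev(gaps), "median": median(gaps)}
```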
Results and Discussion
ACUTE-2007
Like Study 1, Study 2 found strong evidence that reassessment improves predictions. As can be seen in Table 2, the 180-day reassessments were better than the baseline assessments, and the best fitting models were fully dynamic scores (i.e., updated with each new assessment). All differences between the dynamic and the baseline model were very strong (ΔBIC of 13.6 to 110.2), as were most of the differences between the dynamic model and the 180-day reassessments (ΔBIC of 9.1 to 84.1).
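The ΔBIC values compare models fitted to the same outcome. The following is a minimal sketch that assumes the standard Schwarz form, BIC = k ln(n) - 2 ln L, where smaller values are better. When the baseline and dynamic models each contain a single risk-score term and share n, the penalty cancels and ΔBIC reduces to twice the difference in log-likelihoods. For Cox models, n is sometimes taken as the number of events rather than the number of rows, and which convention the authors used is not stated here. On the commonly used scale for BIC differences, values above 10 are usually read as very strong evidence. The function names and log-likelihoods are hypothetical.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Schwarz BIC; smaller values indicate better fit after the complexity penalty."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def delta_bic(ll_reference: float, ll_candidate: float,
              k_reference: int, k_candidate: int, n_obs: int) -> float:
    """BIC(reference) - BIC(candidate); positive values favor the candidate."""
    return bic(ll_reference, k_reference, n_obs) - bic(ll_candidate, k_candidate, n_obs)


# Hypothetical log partial likelihoods for a baseline vs. a fully dynamic model,
# each with one risk-score coefficient and the same number of events.
d = delta_bic(ll_reference=-1000.0, ll_candidate=-990.0,
              k_reference=1, k_candidate=1, n_obs=200)
# Equal penalties cancel: d is 2 * (-990.0 - -1000.0) = 20, up to rounding.
```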
When the data set was reorganized for 30-day projections (i.e., recidivism occurring more than 30 days after the most recent assessment was not counted; Table 3), the dynamic model was still meaningfully better than the baseline model (ΔBIC of 8.0–66.8). There was little difference, however, between the fully dynamic ACUTE-2007 and the 30-day reassessments for sexual or violent recidivism (ΔBIC < 3). For any recidivism, there was strong evidence (ΔBIC = 7.8) that the fully dynamic model fit the data better than the 30-day reassessments.
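The baseline, fully dynamic, and 30-day projection models differ in which score is attached to each stretch of follow-up. The sketch below shows one plausible way to build (start, stop] intervals for a single individual. It assumes that a projection model drops any follow-up more than `window` days after the most recent assessment, so that, for example, recidivism 31 days after the last assessment is not counted. The function name, argument layout, and data are hypothetical. Rows in this form could then be passed to a Cox model that accepts start-stop data (e.g., lifelines’ `CoxTimeVaryingFitter`), but that step is not shown.

```python
import math
from typing import List, Optional, Tuple

# (start_day, stop_day, score, event) for one individual
Interval = Tuple[float, float, float, bool]


def build_intervals(assessments: List[Tuple[float, float]], end_day: float,
                    event_day: Optional[float], mode: str,
                    window: float = 30.0) -> List[Interval]:
    """assessments: (day, score) pairs sorted by day; the first is the baseline.
    mode: "baseline", "dynamic", or "projection"."""
    # Follow-up stops at the first recidivism event or at the end of observation.
    stop_all = event_day if event_day is not None else end_day
    if mode == "baseline":
        day0, score0 = assessments[0]
        return [(day0, stop_all, score0, event_day is not None)]
    if mode not in ("dynamic", "projection"):
        raise ValueError(f"unknown mode: {mode}")
    rows: List[Interval] = []
    for k, (day, score) in enumerate(assessments):
        if day >= stop_all:
            break  # assessments on or after the event carry no predictive information
        next_day = assessments[k + 1][0] if k + 1 < len(assessments) else math.inf
        stop = min(next_day, stop_all)
        if mode == "projection":
            stop = min(stop, day + window)  # the score only projects `window` days ahead
        event = event_day is not None and stop == event_day
        rows.append((day, stop, score, event))
    return rows


# Hypothetical person: assessed on days 0, 28, and 100; recidivism on day 75.
person = [(0.0, 2.0), (28.0, 4.0), (100.0, 1.0)]
dynamic = build_intervals(person, end_day=200.0, event_day=75.0, mode="dynamic")
projected = build_intervals(person, end_day=200.0, event_day=75.0, mode="projection")
```

Here the event on day 75 falls 47 days after the most recent assessment, so it is counted in the fully dynamic rows but not in the 30-day projection rows.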
As can be seen in Table 5, the ACUTE-2007 models with the shortest projected timeframes showed larger C values than those with longer projected timeframes. The same pattern applied to all three recidivism outcomes (sexual, violent, and any recidivism [including technical violations]). The correlation between the rank order of the follow-up time and the size of Harrell’s C was large (Kendall tau = -.533, with expected ranks [1–5] nested within recidivism types).
Table 5. Predictive Accuracy of ACUTE-2007 and STABLE-2007 Total Scores With Different Time Projections (Study 2)
Note. C: Harrell’s concordance index. Bold values indicate statistically significant predictors (p < .05). CI = confidence interval.
a Total of 56,091 ACUTE-2007 assessments for sexual recidivism, 52,716 assessments for violent recidivism, and 46,479 assessments for any recidivism. Kendall’s tau between recency and C values = -.533 for ACUTE-2007. b Total of 11,101 STABLE-2007 assessments for sexual recidivism, 10,372 assessments for violent recidivism, and 9,300 assessments for any recidivism. Kendall’s tau between recency and C values = -.800 for STABLE-2007.
Consistent with Study 1, there was strong evidence that ACUTE-2007 reassessments improve prediction. In terms of decreases in predictive accuracy over time, the most recent assessments were the most accurate; however, there was little difference between the fully dynamic model and the 30-day projections for sexual and violent recidivism (although the results favored the fully dynamic model). For any recidivism, there was strong support for the fully dynamic model over the 30-day reassessments. Although there should be little empirical difference between the dynamic model and the 30-day reassessments (the median gap between assessments was 29 days), it is possible that acute variables have a different relationship to general recidivism than to sexual or violent recidivism (see section “General Discussion”).
STABLE-2007
The models with fully dynamic STABLE-2007 scores were better fitting models for sexual, violent, and any recidivism than the baseline models (ΔBIC = 7.78 to 17.60; Table 2). Compared with the 180-day reassessments, the fully dynamic STABLE-2007 scores performed similarly for sexual recidivism (ΔBIC = 1.71), were a somewhat better fit for violent recidivism (ΔBIC = 5.75), and were a much better fit for any recidivism (ΔBIC = 14.47; Table 2).
When considering 30-day projections (Table 3), 30-day reassessments were a somewhat better fit than the baseline model for sexual, violent, and any recidivism (ΔBIC of 4.4, 5.0, and 5.8, respectively). These comparisons were similar in magnitude to those between the baseline model and the 180-day model (ΔBIC of 6.1, 11.5, and 3.13, respectively; Table 2). The fully dynamic model was identical to the 30-day model because no cases had more than one STABLE-2007 assessment in the 30 days prior to a recidivism event.
Consistent with the findings for the ACUTE-2007, the STABLE-2007 models with the shortest follow-up timeframes (30 and 45 days) had the highest predictive accuracy for sexual, violent, and any recidivism (Table 5). There was a strong correlation between predictive accuracy (Harrell’s C) and the time from the assessment to the recidivism event: accuracy diminished as the follow-up timeframes became longer (Kendall tau = -.800, with expected ranks nested within recidivism types).
General Discussion
To implement effective interventions that reduce the likelihood of reoffending among those on community supervision, it is crucial that dynamic risk factors (i.e., criminogenic needs) be accurately assessed (Andrews & Bonta, 2010; Andrews & Dowden, 2006; Andrews et al., 1990). Many forensic evaluators working with individuals serving sentences in the community (e.g., parole and probation officers) currently use dynamic risk assessment tools and regularly reassess the risk of reoffending during routine supervision practice. The ACUTE-2007 and STABLE-2007 are the most commonly used dynamic risk tools for this purpose (Bourgon et al., 2018; Hill & Demetrioff, 2019; Kelley et al., 2020; Neal & Grisso, 2014). The extent to which reassessment is associated with greater predictive accuracy, however, had not previously been examined. Using novel statistical analyses and two independent samples, this study found that predictive accuracy increased the closer the assessment was to the recidivism event. Nevertheless, the baseline assessments remained predictive (with moderately large effects) in all analyses. There was no time limit (within the range studied) at which the assessments failed to predict recidivism. Consequently, decisions concerning the ideal reassessment interval need to balance the increased accuracy of recent assessments against the costs and administrative burden of frequent, repeated assessments.
There are some important strengths of this study. First, the value of regularly reassessing dynamic risk tools was supported in two independent field samples of individuals who were serving part of their sentence under community supervision (i.e., the findings are not specific to a single cohort). Furthermore, the assessment results could have real consequences for the individuals assessed (e.g., increased intensity of supervision, home visits). Given this, both studies qualify as field validity studies (Edens & Boccaccini, 2017), which supports the applied use of the findings in other criminal justice field settings, particularly within Canada. Second, some previous research has focused on whether score changes between two time points (e.g., pre- and post-treatment) predict reoffending (e.g., de Vries Robbé et al., 2015; Olver et al., 2007; Vose et al., 2013) or whether reassessment scores projecting varying time periods predict reoffending better than the initial (baseline) scores (Viljoen et al., 2017). This study added to this research by evaluating whether predictive accuracy declined across different timeframes, using models that more closely resembled the typical contact schedules outlined in the case management plans of correctional staff. Finally, this study compared, both directly and indirectly, the patterns of declining predictive accuracy between an acute dynamic risk tool (ACUTE-2007) and a stable dynamic risk tool (STABLE-2007). This study contributes to our understanding of these commonly used dynamic risk tools and clarifies their capacity to assess the likelihood of reoffending for individuals on community supervision at varying assessment intervals.
Overall, we found that the ACUTE-2007 and STABLE-2007 predicted sexual, violent, and any recidivism in both samples. Higher predictive accuracy was observed in the developmental sample (Study 1) than in the administrative sample (Study 2). This difference was anticipated, given the greater breadth and depth of information on which the Study 1 dynamic risk assessments were based (e.g., comprehensive recidivism information, well-trained correctional officers, more comprehensive evaluations; see the Dynamic Supervision Project; Hanson et al., 2007, 2015).
Across the samples, this study showed that the dynamic versions of the ACUTE-2007 and STABLE-2007 predicted sexual, violent, and any recidivism better than the first assessments with those tools. In other words, reassessment with dynamic risk tools can improve the prediction of recidivism risk. The current findings are consistent with a growing body of recent research supporting the reassessment of dynamic risk assessment tools for general and sexual recidivism (e.g., Babchishin & Hanson, 2020; Hanson et al., 2021; Lloyd et al., 2020).
As hypothesized, we also found consistent patterns of declining predictive accuracy for the ACUTE-2007 and STABLE-2007 as the projected timeframes of risk scores became longer, particularly for the STABLE-2007 (average Kendall’s tau of -.87 for STABLE-2007 vs. -.50 for ACUTE-2007). Specifically, Harrell’s C values fell as the projected timeframes became longer (30–180 days) for all three types of recidivism assessed (i.e., sexual, violent, and any recidivism). The direct comparisons of different models also indicated that the closer the assessment was to the recidivism event, the greater the predictive accuracy. Reassessing the ACUTE-2007 every 30 days and every 180 days both yielded better predictions of recidivism than the first assessments; however, the evidence was stronger for the scores assessed every 30 days (i.e., larger ΔBIC relative to the first assessments).
For both samples, there was stronger evidence for the value of reassessing the ACUTE-2007 than the STABLE-2007. Reassessing the ACUTE-2007 every 30 days improved the prediction of sexual, violent, and any recidivism compared with the baseline scores. Although reassessing the ACUTE-2007 more often than every 30 days did not improve the prediction of sexual and violent recidivism, the two models (30-day vs. fully dynamic) were likely based on similar data, given that the average time between ACUTE-2007 assessments was about 30 days.
Direct comparisons indicated that reassessment of the STABLE-2007 every 180 days improved the prediction for sexual, violent, and any recidivism compared with the baseline scores. As well, there was a strong relationship between the recency of the STABLE-2007 (180 to 30 days) and its predictive accuracy. The direct comparison between the baseline and 30-day assessments also favored the 30-day assessments, but not by much. Consequently, the study provided only weak evidence concerning the pattern of decay in the predictive accuracy of STABLE-2007 scores. Nevertheless, the overall pattern is consistent with the predictive accuracy of the ACUTE-2007 scores declining more quickly and more obviously than the STABLE-2007 scores. This pattern would be expected given that ACUTE-2007 is intended to assess rapidly changing features, whereas stable dynamic factors are conceptualized as relatively enduring qualities.
The overall findings support the regular reassessment of the ACUTE-2007 and STABLE-2007 to improve their predictive accuracy and to better inform frontline community supervision officers. Specifically, given the ease of rescoring the ACUTE-2007 (5–10 minutes), the current recommendation to rescore it at every meeting, or at least every 30 days, seems reasonable. For the STABLE-2007, however, updating scores requires more extensive effort (e.g., an interview and a review of file information). Consequently, the decision about when to update the STABLE-2007 needs to balance the increased predictive accuracy of more recent assessments against the cost of new assessments. The current recommendation is either 6 or 12 months. The current results support the shorter of these two options (6 months), or reassessment after major changes in risk-relevant characteristics (e.g., successfully completing treatment).
Within Kraemer et al.’s (1997) framework, the current results support the STABLE-2007 and ACUTE-2007 as measures of variable risk factors. Previous research has demonstrated that single assessments with these risk tools predict recidivism (e.g., Hanson et al., 2007; Helmus et al., 2021; Nitsche et al., 2022) and that scores show intraindividual change (Babchishin & Hanson, 2020). The current results support Kraemer et al.’s additional criterion that changes in risk scale scores are associated with changes in the likelihood of the outcome. It has not yet been established that these are causal risk factors. Within correctional rehabilitation, dynamic risk factors are considered treatment and supervision targets. In other words, reducing the levels of dynamic risk factors such as substance abuse and sexual preoccupation is assumed to reduce the likelihood of recidivism. Kraemer et al. restrict the term causal risk factor, however, to variable risk factors that can be experimentally manipulated. Theirs is a limited view of causality (e.g., see Pearl, 2022; Rubin, 1974). In general, inferences concerning causality are supported by coherent, empirically supported causal models (see Aalen et al., 2008, Chapter 9).
There are multiple plausible explanations for why later assessments may be more informative than earlier assessments, and they are not mutually exclusive. Individuals may have changed on the latent constructs measured by these risk tools, in which case the most recent assessment provides the most up-to-date information. It is also possible that there has been no true score change; instead, raters may be getting better at scoring the tools, such that more recent assessments are more accurate than previous ones (increased reliability). Improved application of the risk tools could come from increased practice with these tools, increased familiarity with the individual assessed, or both.
Limitations
The 95% confidence intervals of Harrell’s C values for each of the timeframes were quite broad because of the small number of sexual recidivism events. Consequently, the confidence intervals overlapped, reflecting low statistical power (i.e., a heightened risk of Type II error), even in Study 2, which had 4,221 participants. It was also not possible to directly compare Harrell’s C values across the models because the sample sizes varied.
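The width of a confidence interval for Harrell’s C depends mainly on the number of events rather than the number of participants. A minimal percentile-bootstrap sketch illustrates this. It assumes that rows are resampled independently; with repeated assessments per person, resampling whole individuals would be more appropriate. The confidence-interval method used in the study is not stated here, and the function names and data are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Row = Tuple[float, float, bool]  # (score, time, event)


def harrell_c(rows: Sequence[Row]) -> Optional[float]:
    """Harrell's C; returns None when no pair is comparable."""
    concordant, comparable = 0.0, 0
    for s_i, t_i, e_i in rows:
        if not e_i:
            continue  # only an observed event can anchor a comparable pair
        for s_j, t_j, _ in rows:
            if t_j > t_i:  # j was still event-free when i recidivated
                comparable += 1
                if s_i > s_j:
                    concordant += 1.0
                elif s_i == s_j:
                    concordant += 0.5  # tied scores count as half-concordant
    return concordant / comparable if comparable else None


def bootstrap_ci(rows: Sequence[Row], n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap interval for Harrell's C."""
    rng = random.Random(seed)
    estimates: List[float] = []
    for _ in range(n_boot):
        resample = [rng.choice(rows) for _ in rows]  # sample rows with replacement
        c = harrell_c(resample)
        if c is not None:  # skip resamples without any comparable pair
            estimates.append(c)
    if not estimates:
        raise ValueError("no resample contained a comparable pair")
    estimates.sort()
    lower = estimates[int(alpha / 2 * (len(estimates) - 1))]
    upper = estimates[int((1 - alpha / 2) * (len(estimates) - 1))]
    return lower, upper


# Hypothetical data: 40 rows but only 4 events, illustration only.
rows = [(float(i % 7), float(10 + 3 * i), i % 10 == 0) for i in range(40)]
low, high = bootstrap_ci(rows)
```

Rerunning the sketch with more events in the same number of rows typically narrows the interval. This is the sense in which the small number of sexual recidivism events limited power here.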
In Study 1, most individuals on community supervision had only one STABLE-2007 assessment (at baseline) over the follow-up period; therefore, it was not possible to conduct the three-model comparisons (i.e., baseline vs. 180 days vs. fully dynamic). Consequently, the direct model comparisons for the STABLE-2007 could only be drawn from the administrative sample (Study 2).
Previously, the ACUTE-2007 was found to assess the same underlying constructs across follow-up time (measurement invariance over time; Babchishin & Hanson, 2020), but the measurement invariance of the STABLE-2007 has not yet been tested. Determining whether changes in dynamic scores reflect true change, as opposed to measurement error (Asparouhov & Muthén, 2009), is an important topic for future research.
Low reliability impedes the predictive ability of risk tools; thus, the lower predictive accuracy of older assessments might be attributable to lower reliability of the initial assessments. Interrater reliability of the ACUTE-2007 and STABLE-2007 total scores was, however, not available across the reassessment events, so we could not investigate whether variations in interrater reliability over time contributed to the lower predictive accuracy. Nevertheless, dynamic tools, including the ACUTE-2007, have been found to be measurement invariant (Babchishin & Hanson, 2020; Miner et al., 2023). In other words, studies have found that change in dynamic tool scores reflects change in the latent constructs assessed by the risk tool rather than measurement error. As such, the different pattern of predictive accuracy is most likely attributable to individual change rather than to later assessments being more reliable.
Individuals who commit different types of sexual crimes (e.g., rapists vs. child molesters) might show different rates of decay in the predictive accuracy of dynamic risk tools; we were unable to test this in the current study. Some practical limitations also exist when implementing more frequent assessments in correctional and forensic practice. Repeated assessments add cost and administrative burden that may not be feasible for all correctional agencies.
Implications for Research
More studies with a greater number of STABLE-2007 reassessments are needed to replicate the results of Study 2. In particular, research could profitably examine whether predictive accuracy declines more slowly for the STABLE-2007 than for the ACUTE-2007. Although the current results could be interpreted as supporting this hypothesis, the pattern of results was not entirely consistent.
Several studies have found that dynamic risk tools like the STABLE-2007 and the ACUTE-2007 incrementally predict sexual recidivism above and beyond static risk tools such as the Static-99R (Brankley et al., 2021; Hanson et al., 2007; Helmus et al., 2021). Some researchers have provided estimated recidivism rates for combined static and dynamic risk assessments (Static-99R and STABLE-2007; Brankley et al., 2017). When static and dynamic risk assessments are combined, the risk level from the static risk tool has typically been adjusted using the first STABLE-2007 assessment (Brankley et al., 2017, 2019); this study suggests that adjusting the Static-99R using the most recent assessment may provide a more accurate assessment than using the first assessment.
This study only examined discrimination (or relative risk), not calibration (the match between expected and observed values; Helmus & Babchishin, 2017). Although several studies (including this study) have found that later assessments improve relative risk (discrimination) compared with earlier assessments, little is known about how reassessments should inform absolute risk estimates. Given that the recidivism risk is expected to decline the longer individuals remain offense-free in the community (Hanson, 2018; Hanson et al., 2018), it is likely that the relationship between dynamic risk factors and absolute recidivism risk also changes over time.
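Calibration, unlike discrimination, compares predicted and observed recidivism rates. One common summary is the ratio of expected to observed events (E/O). Here, E is the sum of predicted probabilities and O is the observed count, and a value above 1 means that the tool overestimates recidivism. The sketch below uses the common approximation that the standard error of ln(E/O) is √(1/O). The function name and data are hypothetical, and this is not an analysis reported in the study.

```python
import math
from typing import Sequence, Tuple


def expected_observed(predicted: Sequence[float],
                      recidivated: Sequence[bool],
                      z: float = 1.96) -> Tuple[float, float, float]:
    """Return (E/O, lower, upper) with an approximate 95% CI on the log scale."""
    if len(predicted) != len(recidivated):
        raise ValueError("predicted and recidivated must have the same length")
    expected = sum(predicted)    # E: sum of predicted probabilities
    observed = sum(recidivated)  # O: number of observed recidivists
    if observed == 0:
        raise ValueError("E/O is undefined when no recidivism is observed")
    ratio = expected / observed
    half_width = z * math.sqrt(1.0 / observed)  # approximate SE of ln(E/O), times z
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


# Hypothetical predicted probabilities and outcomes, illustration only.
ratio, lower, upper = expected_observed([0.25, 0.25, 0.5, 0.5, 0.5],
                                        [False, False, True, False, False])
# E = 2.0 and O = 1, so the tool overpredicts in this toy sample (E/O = 2.0).
```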
Conclusion
The current research suggests that the predictive accuracy of dynamic risk assessments decreases over time. As such, regular reassessment with dynamic risk tools can assist psychologists and correctional officers in evaluating an individual’s risk. Acute dynamic tools appear to benefit from more frequent reassessment than stable dynamic tools. Intervention plans should be updated according to the reassessment results for the most effective management of criminogenic needs, such as psychological problems (e.g., emotional collapse and sexual preoccupation) and/or situational changes (e.g., victim access and loss of employment).
Acknowledgements
The authors thank Leigh Greiner (BC Corrections) for providing access to this data set and L. Maaike Helmus for merging the administrative data sets and saving us considerable work in the process. The authors thank Amel Loza-Fanous for her feedback on an earlier version of this article.
Authors’ Note
The opinions, findings, conclusions, and recommendations expressed are those of the authors and do not necessarily reflect those of their institutions. Part of the work was done while K. M. Babchishin and K. P. Mularczyk were employed at Public Safety Canada.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: R. Karl Hanson is a co-author and a certified trainer for STABLE-2007 and ACUTE-2007. The Government of Canada holds the copyright for these measures, and none of the authors receives royalties from these measures.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
