Abstract
Research on recidivism prediction has made important advances, but the same cannot be said of research assessing the relationship between changes in risk, over time or after treatment, and subsequent reoffending. In realistic criminal justice settings, data linking changes in risk to recidivism are often fraught with problems: missing data, irregular intervals between repeat risk assessments, and individual differences in characteristics such as age and risk level. Traditional statistical methods such as repeated-measures ANCOVA are not suited to analyzing data with these features. We present four types of statistical modeling techniques that can accommodate such noisy data: conventional regression, conditional regression, two-stage, and joint models. The two-stage models combine a multilevel growth model with a conventional regression model. The joint models are structural equation models. Two example data sets are used to illustrate the application of these methods.
The technology of offender risk assessment has advanced rapidly in recent years, especially with the development of tools that are sensitive to change. The application of statistical methods and models that can capture these advances, however, has lagged well behind. In this article, we illustrate the use of several well-established statistical approaches that are not yet widely used in criminal justice research. Such approaches are well suited to the research questions and data analyses pertaining to a newer generation of risk assessment tools.
The Research Context
Risk assessment tools have evolved from using solely static or historical predictors to using changeable or dynamic predictors, in part to assess changes in risk over time or with treatment. Tools that include dynamic factors have been developed to assess risk for general offenders, such as the Level of Service/Case Management Inventory (LS-CMI; Andrews, Bonta, & Wormith, 2004) and the Violence Risk Scale (VRS; Wong & Gordon, 1999-2000); for sexual offenders, such as the VRS–Sexual Offender version (VRS-SO; Wong, Olver, Nicholaichuk, & Gordon, 2003) and the STABLE-2007 and ACUTE-2007 (Hanson, Harris, Scott, & Helmus, 2007); and for specific contexts, such as the Dynamic Risk Assessment for Offender Re-Entry (DRAOR; Serin, 2015), among others.
Psychological treatment and other effective interventions are important contributors to offender change over time. A large body of research shows that treatment can reduce recidivism (Andrews & Bonta, 2010; Andrews, Bonta, & Wormith, 2011), but few studies link the changes themselves, whether in treatment or simply over time, to recidivism. An extensive recent review of studies linking intra-individual treatment change to subsequent recidivism found limited positive results (Serin, Lloyd, Helmus, Derkzen, & Luong, 2013), although several well-established dynamic factors, such as criminal attitudes, beliefs, and associates, did show the expected associations with recidivism.
One possible explanation for some null findings is the use of statistical models unsuited to measurements taken at multiple time points. For example, ANOVA and ANCOVA are better suited to main-effect analyses than to assessing direct relationships between risk change and recidivism (Serin et al., 2013). Notwithstanding these statistical challenges, some recent studies, most of them published after the review by Serin et al., have reported that within-treatment risk reductions assessed over two or more time points were linked to reductions in violent and sexual reoffending (Olver, Lewis, & Wong, 2013; Olver, Nicholaichuk, Kingston, & Wong, 2014; Olver, Wong, Nicholaichuk, & Gordon, 2007). In these lines of research, treatment participants showed, on average, approximately half a standard deviation of risk change from pre- to posttreatment. In turn, a one-point decrease in posttreatment score has been associated with about a 10% to 15% reduction in the odds of violent or sexual recidivism, holding baseline risk constant.
Gathering suitable data for these types of studies is clinically and statistically challenging. The risk levels of treatment participants (or the dynamic variables thought to measure risk levels) have to be assessed at least twice, usually pre- and posttreatment, and offenders then need to be tracked over an extended follow-up period, usually in the community. The tools used need to be sensitive to risk change. There need to be forces at work for change to occur, such as participation in effective treatment or less precisely defined processes such as those associated with maturation. Finally, as we noted, suitable statistical methods are needed to determine whether risk, risk change, and reconviction are linked as predicted; these methods are the focus of the present study.
Statistical analysis in correctional change studies is unusually challenging for several reasons. Risk assessments often occur at different time intervals for different individuals because of many operational factors beyond the researcher's control. Missing data due to staffing issues, participant attrition, and invalid assessments are inevitable. Differences in offenders' baseline risk level, age, ethnicity, sentence length, follow-up time, and other variables that may themselves predict outcome have to be considered. Study designs may be prospective or retrospective. Outcome variables range from arrests, charges, returns to custody, parole revocations, and court adjudications to reconvictions for various types of offenses.
In addition to the shortcomings already noted, conventional repeated-measures ANOVA and ANCOVA are limited in their ability to handle data with missing values. Most existing studies that have successfully linked risk change to long-term outcomes have used Cox proportional hazards regression to examine associations between changes in dynamic scores and recidivism after several years of follow-up (Brown, St. Amand, & Zamble, 2009; Howard & Dixon, 2013; Olver et al., 2007). For example, Olver et al. (2007) collected pre- and posttreatment VRS-SO scores on 321 sex offenders who were followed for a mean of 10 years. Change scores were entered directly into the Cox regression model to assess their predictive efficacy for recidivism. This analysis, however, is limited to risk measured at two waves: the difference in risk between the two time points is computed, and this change score is then used as a covariate to predict recidivism.
Cox regression can also be used to analyze data with covariates that change over time. Brown et al. (2009), in a sample of 136 prisoners, assessed their dynamic measure at three time points (prerelease, 1 month, and 3 months) during a 10-month follow-up. They treated the dynamic risk measure, repeated at three waves, as a time-dependent covariate and reported predictive accuracy for the dynamic change score. More recently, Howard and Dixon (2013) investigated the predictive accuracy of the dynamic component of the Offender Assessment System (OASys) Violence Predictor (OVP) in routine assessments of 196,493 prisoners in the U.K. prison system who were followed up for a mean of 27.1 months. The number of reassessments varied from one to 12 over a 48-month period. They also applied a Cox regression model in which the OVP dynamic scores, assessed multiple times, were treated as a time-dependent covariate, and they reported significant associations between the change scores and the likelihood of violent recidivism. The studies by Brown et al. (2009) and Howard and Dixon (2013) demonstrate that treating dynamic change scores as a time-dependent covariate in a Cox regression model is an effective way to handle multiwave measures assessed at regular (Brown et al.) or irregular (Howard and Dixon) intervals when evaluating links between change scores and recidivism.
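A time-dependent Cox model of this kind needs each offender's follow-up split into intervals during which the dynamic score is constant. The Python sketch below is a hypothetical helper, not the procedure used by Brown et al. (2009) or Howard and Dixon (2013); it builds the counting-process (start, stop] layout under the stated assumptions that the first assessment is at time 0, assessment times are distinct, and each score stays in force until the next reassessment. Cox routines that accept start/stop data, such as `Surv(start, stop, event)` in R's survival package, take rows of this form.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    person_id: str
    start: float  # interval opens (exclusive), years since start of follow-up
    stop: float   # interval closes (inclusive)
    score: float  # risk score in force during the interval (most recent assessment)
    event: int    # 1 if the follow-up ends in a reoffence at `stop`, else 0


def to_counting_process(person_id, assessments, follow_up, reoffended):
    """Split one offender's follow-up into (start, stop] intervals.

    assessments: (time, score) pairs; times in years, the first at 0.0 and
                 all distinct (assumptions of this sketch).
    follow_up:   time of the reoffence or of censoring.
    reoffended:  True if follow-up ended with a reoffence.
    """
    # Assessments made at or after the end of follow-up cannot affect the hazard.
    obs = sorted((t, s) for t, s in assessments if t < follow_up)
    rows = []
    for k, (t, score) in enumerate(obs):
        is_last = k == len(obs) - 1
        stop = follow_up if is_last else obs[k + 1][0]
        rows.append(Interval(person_id, t, stop, score, int(reoffended and is_last)))
    return rows


# Hypothetical offender: assessed at release, 0.5 and 1.2 years later; reoffended at 2.0.
for row in to_counting_process("A01", [(0.0, 48.0), (0.5, 44.0), (1.2, 41.0)], 2.0, True):
    print(row)
```

Each row contributes to the risk sets only during its own interval, so a reoffence is linked to the score that was current when it occurred.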
Examples of More Advanced Statistical Models
In this article, however, we examine alternative ways to address the challenges of these longitudinal correctional data sets, namely two-stage and joint models (JM), built on three types of advanced statistical models: (a) generalized linear models (GLM), (b) multilevel models (MLM), also called hierarchical linear models (HLM), and (c) structural equation models (SEM). GLM (McCullagh & Nelder, 1989) extends classical regression by unifying linear and nonlinear regression models within a single computational framework for many types of outcomes, for instance, linear regression for continuous scales, logistic regression for dichotomous outcomes, and Poisson regression for count data. GLM is a powerful tool for handling different types of outcomes, and it is available in all major statistical software programs, such as SPSS, SAS, and Stata. GLM has been widely applied in epidemiology and public health (Kirkbride et al., 2006; Yang, Wong, & Coid, 2013), social medicine (King et al., 2013; Yang & Coid, 2007), and medical and psychological investigations (Stroman, 2006; Yang, Coid, & Pan, 2005; Yang, Wong, & Coid, 2010). MLM (Goldstein, 1986), also referred to as HLM (Raudenbush & Bryk, 2002) or mixed models (Singer, 1998), was developed to fit dependent data with random effects arising from a nested, or multilevel, clustering structure. Repeated-measures data are a typical example of a two-level structure: the measurements at multiple time points are Level 1 units nested within individuals, the Level 2 units. The GLM element has been implemented in MLM algorithms for different types of outcomes. Missing data are treated as latent variables and handled automatically in MLM estimation procedures (Yang & Goldstein, 1996). Specialized software such as MLwiN (Rasbash, Goldstein, Browne, Yang, & Woodhouse, 2000) and HLM (Raudenbush & Bryk, 2002) has been developed to fit MLMs within the GLM framework.
Most general-purpose statistical programs, such as SAS, Stata, and SPSS, have built-in procedures for fitting basic MLMs within a GLM framework. Finally, SEM combines elements of traditional factor analysis, regression models, and simultaneous equations into a framework with two fundamental parts: measurement and structural models (Joreskog & Sorbom, 1993; Muthén, 2004). It can handle imperfect data, such as data with missing values or with unobserved (latent) variables, and it can model hypothesized causal relationships over time. Mplus (Muthén & Muthén, 2006) is among the most widely used programs for fitting SEM.
During the last decade, GLM, MLM, and SEM have become commonly used tools in life-course analysis: (a) in etiological studies linking early life events with serious later life illnesses, such as cancer and cardiovascular diseases (Hall, Yee, & Thomas, 2002); (b) in human growth studies that use childhood weight and height growth to predict adult body size (Li, Manor, & Power, 2004); and (c) in political studies that examine relationships between voters' past political attitudes and their future voting behaviors (Yang, Goldstein, & Heath, 2000). Life-course analyses using these models are well illustrated and established in epidemiology and public health (De Stavola et al., 2006). However, there is very little literature on the application of these methods to correctional risk assessment and risk change evaluation.
Study Aims
We introduce statistical models commonly used in epidemiological life-course analysis to address some of the methodological issues in risk assessment and risk change research outlined above. The models' concepts, the interpretation of their parameters, their respective pros and cons, and the fitting of the models using software such as SPSS (SPSS, 2008), Stata (Stata, 2009), and Mplus (Muthén & Muthén, 2006) are illustrated using two example data sets collected in the field (Examples A and B). We aim to present these concepts so that readers gain a thorough understanding of the models and of their real-life applications. Potential applications of these models to other clinical psychological and correctional evaluations are also discussed.
Example Data Sets
Examples A and B were drawn from published and unpublished sources, respectively. The main difference between them is that Example A was assessed at only two time points, pretreatment (t1) and posttreatment (t2), whereas Example B contains scores for up to 11 time points at irregular intervals.
Example A
The first example is a sample of 150 male federal offenders serving custodial sentences of 2 years or more at the Regional Psychiatric Centre (RPC) in Saskatoon, Canada, an accredited mental health facility of the Correctional Service of Canada. Offenders were transferred to the RPC for assessment and treatment to address their violence risk and other mental health issues (for details, see Lewis, Olver, & Wong, 2013). All participated in the Aggressive Behaviour Control (ABC) program, a 6- to 8-month cognitive-behavioral treatment program designed to reduce the risk of violence and antisocial behavior. Follow-up studies have shown that completion of the ABC program is associated with reductions in violent and nonviolent criminal convictions (Di Placido, Simon, Witte, Gu, & Wong, 2006; Olver, Lewis, & Wong, 2013; Wong & Gordon, 2013; Wong, Gordon, & Gu, 2007). The offenders' risk levels were assessed with an empirically validated risk assessment tool, the VRS (Wong & Gordon, 2006). Scores were based on a careful and thorough review of comprehensive clinical and institutional files by raters who were not involved in treatment and were blind to recidivism outcomes. The assessments were completed at two time points, pre- and posttreatment, to determine changes, if any, in estimated risk for future violence. The same rater did both the pre- and posttreatment ratings, but all posttreatment information was withheld from raters when the pretreatment ratings were done, to avoid scoring contamination (Lewis et al., 2013). Most of those assessed with the VRS were judged to be at medium to high risk of violent reconviction (Wong & Gordon, 2006).
The VRS has six static or historical and 20 dynamic or changeable predictors. The dynamic predictors (e.g., criminal attitudes, cognitive distortion, substance abuse) can be used to evaluate changes such as those associated with treatment participation (Lewis et al., 2013; Wong & Gordon, 2006). Higher VRS scores indicate a higher likelihood of recidivism. The men were followed posttreatment in the community for a mean of 4.9 years (SD = 2.6 years, range = 0.07-9.3 years) during which 47.3% reoffended violently.
The mean VRS dynamic scores at t1 and t2 were 46.7 (SD = 6.9, range = 20.0-56.0, n = 150) and 42.2 (SD = 7.0, range = 19.5-54.0, n = 150), respectively. Both score distributions were approximately normal. Ages at the t1 and t2 assessments varied, with means of 30.8 years (SD = 7.2, range = 18.3-52.4) and 32.8 years (SD = 7.4, range = 21.0-56.2), respectively. Most offenders showed a decrease in risk (i.e., a lower score) between t1 and t2; some showed no change, and a few showed an increase (see Figure 1a). The time between t1 and t2 and the amount of change, or lack thereof, were more clearly observable when individuals' risk scores were plotted against age centered at the first assessment (see Figure 1b). Age at release in years (M = 32.6, SD = 7.6, range = 18.8-54.1) and total follow-up in years (M = 4.9, SD = 2.6, range = 0.07-9.3) were significantly correlated with reoffending (r = −.21, p = .012, and r = .47, p < .001, respectively).

Individual Plots of VRS Dynamic Scores for Example A by Age at the Pretreatment (t1) and Posttreatment (t2) Assessments, Without (a) and With (b) Centering on Age at the t1 Assessment, Which Converts Age at Assessment to Time Elapsed Since the Initial Assessment
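The centering used in panel (b) simply subtracts each offender's age at the first assessment from all of that offender's assessment ages, so every trajectory starts at time 0. A minimal Python sketch follows; the offender ids, ages, and scores are hypothetical, not Example A data.

```python
def center_on_first_assessment(records):
    """Replace age at each assessment with years elapsed since the first assessment.

    records: dict mapping an offender id to a list of (age_in_years, score) pairs.
    Returns a dict of the same shape, with each list sorted by age.
    """
    centered = {}
    for pid, visits in records.items():
        baseline_age = min(age for age, _ in visits)
        centered[pid] = [(age - baseline_age, score) for age, score in sorted(visits)]
    return centered


# Hypothetical offenders: (age, VRS-style dynamic score) at t1 and t2.
records = {"A01": [(30.8, 46.0), (32.8, 42.0)], "A02": [(24.5, 51.0), (25.1, 50.5)]}
for pid, visits in center_on_first_assessment(records).items():
    print(pid, [(round(t, 2), s) for t, s in visits])
```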
Example B
Example B consists of 782 U.K. prisoners (670 men and 112 women) with a mean age of 33.6 years (SD = 10.7, range = 18.1-74.7) at the index offense, each with at least three risk assessments (an initial assessment plus at least two repeat assessments). The data, together with reconviction information, were obtained with permission from the Justice Statistics Analytical Services Unit (JSAS) of the Ministry of Justice in the United Kingdom. Reconviction information was obtained from the U.K. Police National Computer (PNC) database.
Prisoners in the United Kingdom with sentences of 12 months or more are required to undergo multiple risk and need assessments administered by the National Offender Management Service (NOMS) using the computer-based OASys, a tool with both static and dynamic items that is designed to assess risk and need and is sensitive to change over repeated assessments. It has been in use since 2002. A brief outline of the OASys system of assessment, its dynamic factors, and the calculation of OASys scores is provided in Appendix A.
The present study used a composite OASys score including both static and dynamic items (more details can be found elsewhere; for example, Howard, 2006; Howard & Dixon, 2012; Howard & Seaton, 2008). Assessments in Example B were administered shortly before and during offenders’ custodial or community sentences. OASys assessments can be re-administered when intervention or management activities reach a milestone, such as when treatment terminates, prior to parole applications, or simply to provide a routine update on the offender’s status.
Development of the OASys was based, in part, on the Risk, Need, and Responsivity principles (Andrews & Bonta, 2010), so that the results of the assessments can inform staff in sentence planning, prison management, and community supervision, with the overall objective of reducing reconvictions. Specifically, OASys assessments were designed to identify offenders' risks and needs, such as past and present offense characteristics, and criminogenic needs as indicated by the dynamic factors (see Appendix A). A higher OASys score indicates a higher risk of recidivism. However, no information was available in this data set on the length of time served in prison or in the community. Importantly, the objective of using Example B is to demonstrate the statistical modeling of repeated measures taken at irregular time intervals with missing data, rather than to definitively evaluate the efficacy of OASys as a risk or risk-change tool.
Table 1 shows the sample's demographic information with respect to age, OASys scores, and the number of OASys assessments. The median intervals between assessments were irregular, ranging from 0.23 to 1.03 years, with shorter intervals associated with more frequent assessments. Mean OASys scores tended to decline over time, except among the handful of offenders with 10 or more assessments. These may have been higher-risk offenders serving longer sentences who were being monitored very closely.
Descriptive Statistics for t1 to t11 Assessment Occasions for Example B (n = 782)
Note. OASys = Offender Assessment System.
Figure 2a shows OASys scores as a function of age, and Figure 2b shows them as a function of age centered at the first OASys assessment, to display the pattern of scores over time.¹ The scores in Figure 2a appear to decline much faster, as evident from the steepness of the slopes, than the VRS scores in Example A (Figure 1a). Once the scores are plotted against the time elapsed since the first assessment (in years; Figure 2b), the decline appears less steep, with many individuals showing little discernible slope.

Individual Plots of OASys Scores for Example B That Are Arranged and Can Be Interpreted in the Same Manner as Example A
The mean time at risk for Example B was 4.6 years (SD = 1.7, range = 1.3-4.9). Of the total sample of 782, 187 (23.9%) recidivated violently. Compared with nonrecidivists, recidivists were significantly younger at the end of follow-up (nonrecidivists vs. recidivists: M = 36.5 years, SD = 11.1 vs. M = 31.5 years, SD = 8.5; p < .001, d = 0.54, 95% confidence interval [CI] = [0.38, 0.71]) and had significantly higher initial OASys scores (M = 42.2, SD = 24.2 vs. M = 60.9, SD = 20.3; d = 0.80, 95% CI = [0.63, 0.97]), but they did not differ significantly in time at risk in months (M = 31.5, SD = 11.5 vs. M = 33.3, SD = 11.8; t(780) = 1.82, p = .07, d = 0.15, 95% CI = [−0.01, 0.32]) or in sex (χ²(1) = 1.95, p = .17, odds ratio [OR] = 1.43, 95% CI = [0.82, 1.94]).
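Effect sizes such as the d = 0.80 for initial OASys scores can be recomputed from the reported summary statistics. The Python sketch below assumes the pooled-SD form of Cohen's d and the common large-sample standard error of d (the formulas used in the original analysis are not stated here); with 595 nonrecidivists and 187 recidivists it reproduces d = 0.80 and the 95% CI [0.63, 0.97].

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (b - a) using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var)


def d_confidence_interval(d, n_a, n_b, z=1.96):
    """Approximate 95% CI from the usual large-sample standard error of d."""
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d**2 / (2 * (n_a + n_b)))
    return d - z * se, d + z * se


n_non, n_rec = 782 - 187, 187  # nonrecidivists, violent recidivists (Example B)
d = cohens_d(42.2, 24.2, n_non, 60.9, 20.3, n_rec)  # initial OASys scores
low, high = d_confidence_interval(d, n_non, n_rec)
print(f"d = {d:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")  # d = 0.80, 95% CI = [0.63, 0.97]
```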
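In both Examples A and B, assessments were repeated to determine initial risk estimates and possible changes in risk over time. In Example A, each individual's risk scores were obtained at two fixed time points, pretreatment (t1) and posttreatment (t2).
In contrast, in Example B, risk assessment scores were obtained on up to 11 occasions, at irregular intervals that differed both within and across individuals.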
Risk changes can be assessed easily by repeated-measures ANOVA if all individuals are assessed at the same time points, no data are missing, and the effects of the time interval on risk can be ignored. In practice, offender data are often collected at irregular time points, and the effects of the passage of time have to be taken into account. To control for the latter, a model that incorporates time as a covariate is preferred. Such models for repeated measures, which use a function of time to capture change, are widely used in child growth and development studies; hence, they are called "growth models" (Yang & Goldstein, 1996). The most important analyses with offenders, however, link risk changes to the distal reoffending outcome. In the following sections, we describe different models and their respective strengths and limitations when used to analyze Examples A and B.
The Models and Their Applications
Conventional Regression Models (CRMs)
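For Example A, regression models within the GLM framework can be used to directly estimate the association between the change variable, $x_{2j} - x_{1j}$ (the posttreatment minus the pretreatment VRS dynamic score of individual j), and the reconviction outcome $y_j$, controlling for a covariate $z_j$ (Model 1):

$$g(y_j) = \beta_0 + \beta_1 (x_{2j} - x_{1j}) + \beta_2 z_j \tag{1}$$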
where g( ) is a link function through which the relationship between the dependent variable, or outcome, y and the independent variables can be linearized and estimated by maximum likelihood (ML); the appropriate link depends on the type of dependent variable y. Equation 1 typically represents a linear regression model for a continuous y, such as a risk score. For a dichotomous y, such as the presence or absence of any violent reconviction, g( ) is the logit function, and Equation 1 becomes a logistic regression. If the outcome y is a count of reoffending incidents, g( ) is the logarithmic function, and Equation 1 becomes a Poisson regression. The parameter β0 is the intercept, the value of the linear predictor when all independent variables equal zero. Typically, it is the overall mean if the dependent variable is a risk score, the log odds of the baseline recidivism rate if the dependent variable indicates the presence or absence of recidivism, or the log of the baseline incidence rate if the dependent variable is the number of offenses. The parameter β1 estimates the mean effect of the VRS dynamic change score on reoffending. A positive β1 indicates a positive association between the risk change score and the reconviction outcome. The parameter β2 estimates the association between the covariate and the reconviction outcome.
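To make the role of g( ) concrete, the short Python sketch below pairs each link mentioned above with its inverse and converts a linear predictor back to a predicted mean on the outcome's own scale. The coefficient values are hypothetical; in practice they come from the ML fit.

```python
import math

# Each link g maps the mean of y onto the scale of the linear predictor
# eta = beta0 + beta1*x1 + ...; its inverse maps eta back to the mean of y.
LINKS = {
    # linear regression: continuous y (e.g., a risk score)
    "identity": (lambda mu: mu, lambda eta: eta),
    # logistic regression: probability of a dichotomous y (e.g., any violent reconviction)
    "logit": (lambda p: math.log(p / (1.0 - p)), lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    # Poisson regression: expected count y (e.g., number of reoffences)
    "log": (math.log, math.exp),
}


def predicted_mean(link, beta0, beta1, x):
    """Mean of y implied by a one-predictor GLM under the given link."""
    _, inverse = LINKS[link]
    return inverse(beta0 + beta1 * x)


# Hypothetical coefficients: the same linear predictor read on three scales.
for name in LINKS:
    print(name, round(predicted_mean(name, -0.5, 0.1, 2.0), 3))
```

The same linear predictor therefore means a score, a probability, or a rate depending on the link, which is why β1 is read as a mean difference, a log odds ratio, or a log rate ratio, respectively.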
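An alternative model is to separate the risk scores at the two time points, entering the pretreatment score $x_{1j}$ and the posttreatment score $x_{2j}$ as separate independent variables (Model 2):

$$g(y_j) = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \beta_3 z_j \tag{2}$$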
Conceptually, Model 1 quantifies the effect of the change in score between the two time points on the outcome (the dependent variable) while controlling for possible confounders, whereas Model 2 quantifies the effect of the t2 score on the outcome, controlling for the t1 score and other possible confounders.
Any statistical software can provide straightforward estimates of both Models 1 and 2. For the dichotomous reconviction outcome in Example A, a logistic model with a logit link g( ) can be fitted using the GENLIN command in SPSS. Alternatively, the LOGISTIC REGRESSION command in SPSS gives identical estimates. The GENLIN and LOGISTIC REGRESSION command syntax is presented in Table 2 together with the model estimates for Example A.
Estimates of Conventional and Conditional Logistic Regression Models in Predicting Violent Reconviction for Example A (n = 150)
Note. The letters y, x, and z refer to variable names. Variable y is coded 0 for no violent reconviction and 1 for violent reconviction. All other covariates are continuous scores. CI = confidence interval; VRS = Violence Risk Scale; Δ = change score; PCL = Psychopathy Checklist.
Model 1 indicates a marginal association between the change in VRS dynamic score and the probability of later reconviction in the community: for individuals with a one-point decrease in VRS dynamic score from the first to the second assessment, the odds of committing a violent offense are 1.13 times lower (i.e., about 11.5% lower, since 1/1.13 ≈ 0.885) than for those with no risk change.² This finding is independent of differences in follow-up time, PCL-R factor scores, and age, but it is not statistically significant (p = .085). Similarly, Model 2 indicates that if VRS dynamic scores at the first assessment were the same for all individuals (i.e., controlling for baseline risk level), the odds of a subsequent violent offense for those with a score one point lower at the second assessment are 1.16 times lower (about 14% lower, since 1/1.16 ≈ 0.862; p = .052, marginal significance) than for those with no change, after adjustment for confounding variables. The logistic regression results from Models 1 and 2 are very similar to those of the Cox regression models in the original research (Lewis et al., 2013) in terms of the conclusions drawn, but not in terms of the model parameters: the Cox model expresses effects as hazard ratios over variable follow-up periods, whereas the logistic model used here yields odds ratios for a fixed follow-up period only.
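The arithmetic that links a reported odds ratio to a percentage change in the odds can be checked with a few lines of Python. The sketch assumes, as the interpretation above does, that 1.13 and 1.16 are odds ratios per one-point increase in the relevant score, so a one-point decrease multiplies the odds by their reciprocal.

```python
import math


def odds_multiplier_for_decrease(odds_ratio_per_point):
    """Factor applied to the odds when the score falls by one point."""
    return 1.0 / odds_ratio_per_point


def percent_lower_odds(odds_ratio_per_point):
    """Percentage reduction in the odds for a one-point decrease."""
    return 100.0 * (1.0 - odds_multiplier_for_decrease(odds_ratio_per_point))


for label, odds_ratio in [("Model 1", 1.13), ("Model 2", 1.16)]:
    beta = math.log(odds_ratio)  # logistic coefficient per one-point increase
    print(f"{label}: beta = {beta:.3f}, odds x {odds_multiplier_for_decrease(odds_ratio):.3f} "
          f"per one-point decrease ({percent_lower_odds(odds_ratio):.1f}% lower)")
```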
Both models can estimate the direct association between risk change and outcome, but unlike Model 1, Model 2 can account for the possible effect of the baseline risk level. Although both models are easy to fit and interpret, they assume fixed time points for measuring risk change and so can handle data from only two time points, as in Example A.
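Expanding Model 1 shows why. Suppose Model 2 is written with its own coefficients, $g(y_j) = \gamma_0 + \gamma_1 x_{1j} + \gamma_2 x_{2j} + \gamma_3 z_j$ (the γ notation is introduced here only for this comparison), and the change score is taken as the posttreatment minus the pretreatment score, $x_{2j} - x_{1j}$ (the sign convention assumed here). Then Model 1 can be rewritten as

```latex
$$
\begin{aligned}
g(y_j) &= \beta_0 + \beta_1 (x_{2j} - x_{1j}) + \beta_2 z_j \\
       &= \beta_0 + (-\beta_1)\, x_{1j} + \beta_1\, x_{2j} + \beta_2 z_j ,
\end{aligned}
$$
```

which is Model 2 under the constraint $\gamma_1 = -\gamma_2$, that is, $\gamma_1 + \gamma_2 = 0$. Model 1 therefore assumes that baseline risk carries no information about the outcome beyond what the change score conveys; Model 2 leaves $\gamma_1 + \gamma_2$ free, and this sum is the baseline-risk coefficient of the conditional model discussed below.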
For more than two risk scores, for example, at three time points, Model 2 can be extended to

$$g(y_j) = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \beta_3 x_{3j} + \beta_4 z_j$$
In addition, neither Model 1 nor Model 2 can be used for Example B, in which assessments occurred at irregular intervals within individuals and at inconsistent times across individuals. In sum, although Models 1 and 2 can link risk changes to subsequent outcomes, they have serious limitations when there are multiple assessments occurring at irregular intervals for different individuals. Specifically, Model 1 can be used for measurements at two fixed time points, whereas Model 2 can also be used for more than two fixed time points. However, Model 2 becomes clumsy, inefficient, and difficult to interpret with many time points, and it cannot model risk or change over time if offenders were assessed at irregular intervals (see Table 2). In such cases, an alternative analytical strategy is needed.
Conditional Models
A conditional model (CM) is a reparameterization of Model 2, obtained by regrouping its terms so that the parameters describe baseline risk and risk change. A CM not only models causal relationships (as defined by Kraemer et al., 1997) between risk changes and outcome but can also accommodate two or more change assessments over several fixed time periods. The model originated in child growth and development research (Goldstein, 2011), where multiple measures of weight and height were fitted by statistical models to describe children's growth patterns and differences in these patterns (Yang & Leung, 1994). Such models have been widely used in life-course epidemiology (De Stavola et al., 2006), a context akin to the examples in this article.
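For data with two fixed time points, the conventional GLM Model 2 can be rewritten as the CM, Model 3, by replacing the posttreatment score with the change score:

$$g(y_j) = \beta_0 + (\beta_1 + \beta_2)\, x_{1j} + \beta_2 (x_{2j} - x_{1j}) + \beta_3 z_j \tag{3}$$

For data with three fixed time points, it becomes Model 4:

$$g(y_j) = \beta_0 + (\beta_1 + \beta_2 + \beta_3)\, x_{1j} + (\beta_2 + \beta_3)(x_{2j} - x_{1j}) + \beta_3 (x_{3j} - x_{2j}) + \beta_4 z_j \tag{4}$$

For data with any number h of fixed time points, it becomes Model 5:

$$g(y_j) = \alpha_0 + \alpha_1 x_{1j} + \alpha_2 (x_{2j} - x_{1j}) + \cdots + \alpha_h (x_{hj} - x_{(h-1)j}) + \alpha_{h+1} z_j$$

or

$$g(y_j) = \alpha_0 + \alpha_1 x_{1j} + \sum_{k=2}^{h} \alpha_k\, \Delta x_{kj} + \alpha_{h+1} z_j, \qquad \Delta x_{kj} = x_{kj} - x_{(k-1)j}, \qquad \alpha_k = \sum_{l=k}^{h} \beta_l \ (k = 1, \ldots, h) \tag{5}$$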
The change of notation for the regression parameters from β to α in Model 5 prepares for the next two types of models, in which two sets of model parameters are required. These CMs can effectively estimate both the association of the initial measure, or baseline risk level, with the outcome and the associations of the subsequent risk changes with the outcome, adjusting for confounding variables.
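The mapping from conventional to conditional coefficients can be stated generally: if the conventional model has coefficients β1, ..., βh on the scores x1, ..., xh, then regrouping terms gives the conditional coefficient α1 = β1 + ... + βh on x1 and αk = βk + ... + βh on each change xk − x(k−1). The Python sketch below implements that regrouping and checks that both forms give the same linear predictor. The two-point check uses the Table 2 values quoted in the text, with −0.104 assigned to x1 and 0.145 to x2; that assignment is an assumption, inferred because exp(0.145) ≈ 1.16 matches the Model 2 odds ratio for the second assessment.

```python
def conventional_to_conditional(betas):
    """Map coefficients beta_1..beta_h on scores x_1..x_h (conventional model)
    to alpha_1 on x_1 and alpha_k on the change x_k - x_(k-1) (conditional model).

    alpha_k = beta_k + beta_(k+1) + ... + beta_h
    """
    return [sum(betas[k:]) for k in range(len(betas))]


def predictor_conventional(betas, scores):
    """Score part of the linear predictor with the scores entered directly."""
    return sum(b * x for b, x in zip(betas, scores))


def predictor_conditional(alphas, scores):
    """Score part of the linear predictor with baseline plus successive changes."""
    terms = [scores[0]] + [scores[k] - scores[k - 1] for k in range(1, len(scores))]
    return sum(a * t for a, t in zip(alphas, terms))


# Two-point case with the Table 2 values (assignment to x_1 and x_2 assumed).
print(conventional_to_conditional([-0.104, 0.145]))  # baseline ~0.041, change 0.145

# Three hypothetical coefficients and scores: both parameterizations agree.
betas, scores = [0.3, -0.2, 0.5], [50.0, 46.0, 43.5]
print(predictor_conventional(betas, scores),
      predictor_conditional(conventional_to_conditional(betas), scores))
```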
Using both the GENLIN and LOGISTIC REGRESSION commands in SPSS, we obtained the estimates of Models 1 to 3 for Example A shown in Table 2. As expected, Models 2 and 3 gave identical estimates for the effects of risk change and the confounding variables. The baseline risk-level estimate of 0.041 in the CM is the sum of β1 and β2 in the conventional model (0.145 + [−0.104]). Like the conventional model, the CM assumes fixed or consistent time intervals, which do not hold for Example B. These models also require complete data from each individual; individuals with missing data at some time points are listwise deleted by the GLM estimation procedure unless missing values are imputed before the models are fitted. Thus, Models 1 to 3 can be fitted to Example A but not to Example B, because of the data problems already noted; alternative models, explained below, are needed.
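The identical estimates from Models 2 and 3 are expected rather than coincidental: Model 3 enters the pretreatment score and the change score (x2 − x1) instead of x1 and x2, which is only a reparameterization, so both models reach the same maximum likelihood. The Python sketch below, using only the standard library, fits both models to simulated data (hypothetical coefficients and sample size, not Example A) by Newton-Raphson, a standard ML algorithm for logistic regression. It stands in for the SPSS procedures and is not their implementation.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum likelihood logistic regression by Newton-Raphson.

    X: rows of predictors, each starting with 1.0 for the intercept; y: 0/1 outcomes.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p                     # gradient of the log-likelihood
        info = [[0.0] * p for _ in range(p)]  # information matrix X'WX
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                score[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Simulated, hypothetical data: centered pre (x1) and post (x2) scores, covariate z.
rng = random.Random(2024)
data = []
for _ in range(1000):
    x1 = rng.gauss(0.0, 7.0)
    x2 = x1 - 4.5 + rng.gauss(0.0, 3.0)
    z = rng.gauss(0.0, 1.0)
    eta = -0.2 - 0.10 * x1 + 0.15 * x2 + 0.3 * z
    data.append((x1, x2, z, 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0))

y = [d[3] for d in data]
model2 = fit_logistic([(1.0, x1, x2, z) for x1, x2, z, _ in data], y)       # scores
model3 = fit_logistic([(1.0, x1, x2 - x1, z) for x1, x2, z, _ in data], y)  # baseline + change
print("Model 2 (x1, x2):", [round(b, 4) for b in model2])
print("Model 3 (x1, x2 - x1):", [round(b, 4) for b in model3])
print("beta1 + beta2 from Model 2:", round(model2[1] + model2[2], 4))
```

The coefficient on x1 in Model 3 equals the sum of the x1 and x2 coefficients in Model 2, and the change coefficient equals the Model 2 coefficient on x2, mirroring the 0.041 = 0.145 + (−0.104) relationship reported for Example A.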
Two-Stage Models (TSM)
A TSM can address the data analytic issues discussed above and can be used to assess associations between risk change and recidivism in both examples. At Stage 1, a change model (a so-called growth model) for repeated assessments of risk over time is fitted to identify change patterns, which are estimated as latent variables. Growth models for repeated measures in the MLM framework have been described by Goldstein (2011) and others (Raudenbush & Bryk, 2002). At Stage 2, a regression model is fitted to link these latent change-pattern variables to the distal outcome, adjusting for confounding variables.
Stage 1: Growth Models for Change Patterns
Growth models have several key advantages. They can accommodate individuals with different numbers of assessments (time points), provided each individual has more than one. The intervals between assessments can vary both across and within individuals. Missing data are also allowed if they are missing at random (see Little, 2002, for more information). The data in Examples A and B have a two-level structure for repeated assessments of risk level (score) X, measured on occasions $i = 1, \ldots, r_j$ for each individual $j = 1, \ldots, n$. The measurement occasions i form Level 1 and are nested within individuals j, which form Level 2 (Figure 3).

General Two-Level Structures of Repeated Measures Data.
It is the rule rather than the exception that, after an intake assessment, most offenders are reassessed at inconsistent times during their sentences, whether or not they undertake treatment. The growth model can use the exact assessment time $t_{ij}$ (for example, after treatment), operationalized as the time elapsed since the initial assessment based on age or calendar date (day/month/year), to build a function of time for the repeatedly measured risk scores. Because the growth model is most efficient for data with more than two time points, Example B, with its multiple time points, is used for illustration in the following sections.
The repeated risk assessment scores over time become the response variable in the Stage 1 analysis. For risk scores measured over time, a linear function of the time of each assessment can be fitted by Model 6, in which the intercept α0 and the linear slope α1 are allowed to vary across individuals through individual-level random effects:

$$X_{ij} = (\alpha_0 + u_{0j}) + (\alpha_1 + u_{1j})\, t_{ij} + e_{0ij} \tag{6}$$
Growth Model 6 consists of fixed and random parts. The parameters α0 and α1 form the fixed part (intercept and slope), analogous to the parameters of any CRM. Random terms u0j and u1j are attached to these fixed parameters, respectively, allowing the intercept and slope to vary among individuals. Model 6 estimates an overall linear function for the whole sample through its fixed part,

$$\hat{X}_{ij} = \alpha_0 + \alpha_1 t_{ij},$$

and a line for each individual, $(\alpha_0 + u_{0j}) + (\alpha_1 + u_{1j})\, t_{ij}$.
In Model 6, Xij is the risk score at the ith assessment of the jth individual; tij is the time of the ith assessment (usually the individual's age); u0j and u1j are the individual-level random effects for the intercept and slope, respectively; and e0ij are the occasion-level residuals, with zero mean and constant variance $\sigma^2_{e0}$.
Model 6 thus effectively fits n linear regressions simultaneously, one per individual, each with its own intercept and slope. The variability of intercepts and slopes among individuals is captured by the variance-covariance matrix

$$\Omega_u = \begin{pmatrix} \sigma^2_{u0} & \\ \sigma_{u01} & \sigma^2_{u1} \end{pmatrix}$$
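Model 6 pools the n individual regressions through its variance components. As a rough first look at how much intercepts and slopes vary, the unpooled counterpart, one ordinary least squares line per individual, can be sketched in plain Python. This is explicitly not the mixed model: it applies no shrinkage and drops anyone whose assessments all fall at the same time. The (case_id, t, x) input format and the example values are hypothetical.

```python
from collections import defaultdict


def per_individual_ols(rows):
    """Fit x = a_j + b_j * t separately for each individual (no pooling).

    rows: iterable of (case_id, t, x) tuples in long format.
    Returns {case_id: (intercept, slope)}; individuals whose assessment
    times are all identical are skipped because no slope can be estimated.
    """
    by_case = defaultdict(list)
    for case_id, t, x in rows:
        by_case[case_id].append((t, x))

    fits = {}
    for case_id, pts in by_case.items():
        n = len(pts)
        t_bar = sum(t for t, _ in pts) / n
        x_bar = sum(x for _, x in pts) / n
        s_tt = sum((t - t_bar) ** 2 for t, _ in pts)
        if s_tt == 0:
            continue  # single assessment time: slope not identified
        s_tx = sum((t - t_bar) * (x - x_bar) for t, x in pts)
        slope = s_tx / s_tt
        fits[case_id] = (x_bar - slope * t_bar, slope)
    return fits


# Hypothetical long-format records; t is age centered at 18 years.
rows = [(1, 0.0, 70.0), (1, 2.0, 68.0), (1, 4.0, 66.0),
        (2, 10.0, 55.0), (2, 13.0, 49.0),
        (3, 5.0, 60.0)]
fits = per_individual_ols(rows)
```

Here individual 1 gets intercept 70 and slope −1, individual 2 gets intercept 75 and slope −2, and individual 3 (one assessment) is skipped; the growth model would instead borrow strength from the whole sample for such cases.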
If the observed pattern of change over time is not linear, other time functions, such as polynomial, spline, or fractional polynomial functions, can be fitted. Such models are more specialized in application and interpretation and are beyond the scope of this discussion, which uses only the basic linear growth model because it fits Example B well. Interested readers are referred to Singer and Willett (2003).
Before growth Model 6 can be fitted, raw data arranged in wide format (one row per offender) must be rearranged into long format (one row per assessment), as shown in Figure 4, so that assessment occasions are nested within individuals, matching the two-level structure.

Figure 4. Data Structure From Case Records to Nested Records for Example B.
This change of data layout can be performed in all major statistical packages. In Stata/SE 11 (Stata, 2009), the reshape command performs it; the commands are described in Appendix B.
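For readers working outside Stata, the same wide-to-long step can be sketched in plain Python. The field names (case_id, t1 to tK, x1 to xK) mirror the variables in the Stata reshape command in Appendix B; treating None or NaN as a missed assessment and keeping only individuals with at least two remaining assessments are assumptions of this sketch.

```python
import math


def _missing(value):
    """Treat None or NaN as a missing assessment."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def wide_to_long(records, n_occasions, min_occasions=2):
    """Reshape wide records (one row per offender) into long rows (one per assessment).

    Each record is a dict with 'case_id', 't1'..'tK' (assessment times) and
    'x1'..'xK' (risk scores). Occasions with a missing time or score are
    dropped, so individuals can contribute different numbers of rows.
    Individuals left with fewer than min_occasions rows are excluded.
    """
    long_rows = []
    for rec in records:
        for occ in range(1, n_occasions + 1):
            t, x = rec.get(f"t{occ}"), rec.get(f"x{occ}")
            if _missing(t) or _missing(x):
                continue
            long_rows.append({"case_id": rec["case_id"], "time": occ, "t": t, "x": x})

    counts = {}
    for row in long_rows:
        counts[row["case_id"]] = counts.get(row["case_id"], 0) + 1
    return [row for row in long_rows if counts[row["case_id"]] >= min_occasions]


# Hypothetical wide records with irregular, partly missing assessments.
wide = [
    {"case_id": 1, "t1": 20.0, "x1": 70, "t2": 21.5, "x2": 68, "t3": None, "x3": None},
    {"case_id": 2, "t1": 35.0, "x1": 55, "t2": float("nan"), "x2": 54, "t3": 37.2, "x3": 50},
    {"case_id": 3, "t1": 19.0, "x1": 80},
]
long_rows = wide_to_long(wide, n_occasions=3)
```

Individual 2 keeps occasions 1 and 3 (occasion 2 has no time), and individual 3, with a single assessment, is dropped, leaving four long-format rows.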
Individuals with more than one assessment occasion are retained in the new data format. The dimensions of the data matrix excluding the case identifier are now indicated as
Growth models can be fitted with SAS, Stata, MLwiN, R, HLM, and many other packages. Detailed reviews of the capacity of statistical packages for fitting MLM, including growth models, are available on the MLM software review website (http://www.bristol.ac.uk/cmm/learning/mmsoftware/). In the present study, Stata/SE 16 was used for the two-stage analysis of Example B. The xtmixed command fits the growth model and outputs the individual change patterns as random effects or latent variables.
By default, the model parameters are estimated by restricted maximum likelihood (REML). The model includes random intercepts and random linear slopes. If the data indicate a nonlinear relationship between age and risk score, quadratic or cubic terms in time t can be added to the model.
Stage 2: Conventional GLM Model
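Having fitted growth Model 6, we obtain for each individual the new variables $z_{1j} = \hat{\alpha}_0 + \hat{u}_{0j}$ and $z_{2j} = \hat{\alpha}_1 + \hat{u}_{1j}$, the individual's estimated intercept and slope, which serve as covariates in Stage 2.
If growth Model 6 has more than two parameters with random effects (for instance, K regression coefficients, including the intercept, each with a random effect), the growth patterns are summarized in K new variables of the same linear form, $z_{kj} = \hat{\alpha}_{k-1} + \hat{u}_{k-1,j}$ for $k = 1, \ldots, K$.
If the outcome is binary (reoffending or not), the Stage 2 model is a logistic regression:

$$\operatorname{logit} \Pr(y_j = 1) = \beta_0 + \beta_1 z_{1j} + \beta_2 z_{2j} + \sum_{k=3}^{6} \beta_k z_{kj} \qquad \text{(Model 7)}$$

where $z_{3j}, \ldots, z_{6j}$ are the individual-level covariates (in Example B, follow-up time, gender, age at end of follow-up, and OGRS).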
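The Stage 2 covariates and the Model 7 probability can be sketched as follows. The fixed-effect values 69.0 and −0.71 are the growth-model estimates reported for Example B; the individual's random effects and every β value used here are hypothetical and carry no substantive meaning. In practice the random effects would come from the fitted growth model (e.g., the predict step with the reffects option in Appendix B).

```python
import math


def latent_covariates(alpha_hat, u_hat_j):
    """z_kj = alpha_(k-1) + u_(k-1),j: fixed effect plus the individual's predicted random effect."""
    if len(alpha_hat) != len(u_hat_j):
        raise ValueError("need one random effect per fixed coefficient")
    return [a + u for a, u in zip(alpha_hat, u_hat_j)]


def reconviction_probability(beta, z):
    """Inverse logit of beta_0 + beta_1*z_1 + ... + beta_K*z_K (the form of Model 7)."""
    if len(beta) != len(z) + 1:
        raise ValueError("beta must hold an intercept plus one coefficient per covariate")
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], z))
    # Numerically stable logistic function for large |eta|
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


# Fixed effects from the text; random effects (5.0, 0.2) are hypothetical.
z1, z2 = latent_covariates([69.0, -0.71], [5.0, 0.2])
```

This individual's latent covariates are z1 = 74.0 (intercept) and z2 = −0.51 (slope); adding z3 to z6 and a full β vector gives the predicted probability of reconviction.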
Table 3. Estimates by Two-Stage and Joint Models for Example B (n = 782)
Note. Age at assessment was centered at 18 years. OGRS = offender group reconviction scores; OASys = Offender Assessment System.
β1 estimates the effect of latent covariate z1 (the individual's mean level on the repeated assessments); β2 estimates the effect of latent covariate z2 (the individual's linear slope across the repeated assessments).
Fitting growth Model 6 and logistic Model 7 separately in two stages produced the estimates for Example B shown in Table 3. The growth model estimates suggest that, in this sample, the mean OASys risk score at t0 (when offenders first started serving their sentences) was 69.0 and declined by 0.71 points, on average, for each additional year of age (p < .001). There was significant variation in the mean OASys score among individuals by
The Stage 2 logistic model estimates, presented in the bottom half of Table 3, suggest that a higher baseline OASys level was significantly associated with a higher risk of reconviction (p < .001), but that changes in OASys scores over time were not significantly associated with reconviction risk, after adjusting for follow-up time, gender, age at end of follow-up, and the offender group reconviction scores (OGRS; Copas & Marshall, 1998), used here as a static measure of reconviction risk.
As expected, longer follow-up time and higher OGRS were each independently associated with a significantly higher likelihood of reconviction (p = .026 and p = .008, respectively), and older age with a significantly lower likelihood (p < .001). The program syntax for fitting the model is given in Table 3.
One potential methodological problem with two-stage modeling, however, is the use of random effects estimated in the first stage as covariates in the second-stage model. Random effects, or latent variables, are weighted posterior means that carry estimation error and are shrunk toward the grand mean (Longford, 1993; Snijders & Bosker, 1999). Shrinkage can be large when the intraclass correlation (ICC), here the correlation between repeated measures of risk scores, is small. Shrinkage can bias the estimated latent variables and their estimation errors, and this bias can carry over into the second-stage model, biasing the estimated effects of change on recidivism. SEM is designed to solve this problem by fitting two or more models simultaneously. The detailed theory and computational aspects are available elsewhere (Muthén, 2004). The use of SEM for Example B, compared with the TSM, is illustrated in the following section.
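The degree of shrinkage in the latent variables can be judged from the ICC, which takes values between 0 and 1 (Snijders & Bosker, 1999): a smaller ICC indicates more shrinkage, and a larger ICC less shrinkage. Taking the random intercept u0j of each individual in two-level growth Model 6 as an example, the optimal estimate combines two sources of information, the mean of Level 2 unit j (the jth individual), $\bar{X}_{.j}$, and the overall mean, $\hat{\alpha}_0$, weighted by the reliability $\lambda_j$ of the individual mean. In the simpler random-intercept case,

$$\hat{\alpha}_0 + \hat{u}_{0j} = \lambda_j \bar{X}_{.j} + (1 - \lambda_j)\,\hat{\alpha}_0, \qquad \lambda_j = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e0}/r_j} = \frac{r_j \rho}{1 + (r_j - 1)\rho},$$

where ρ is the ICC.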
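The shrinkage weight can be checked numerically. This sketch assumes the random-intercept formulas of Snijders and Bosker (1999), λj = σ²u0/(σ²u0 + σ²e0/rj), equivalently rjρ/[1 + (rj − 1)ρ] with ρ the ICC, and ignores the random slope, so for Model 6 it is only an approximation. The variance values are the starting values listed in Appendix B, used purely for illustration, not the fitted estimates.

```python
def shrinkage_factor(var_u0, var_e0, r_j):
    """Reliability weight lambda_j = var_u0 / (var_u0 + var_e0 / r_j).

    Random-intercept case only; r_j is the number of assessments of individual j.
    """
    return var_u0 / (var_u0 + var_e0 / r_j)


def shrinkage_from_icc(icc, r_j):
    """Equivalent form lambda_j = r_j * icc / (1 + (r_j - 1) * icc)."""
    return r_j * icc / (1.0 + (r_j - 1) * icc)


def eb_intercept(mean_j, grand_mean, lam):
    """Shrunken estimate: lambda_j * individual mean + (1 - lambda_j) * grand mean."""
    return lam * mean_j + (1.0 - lam) * grand_mean


# Illustration only: Appendix B starting values stand in for the two variances.
var_u0, var_e0 = 1538.0, 56.0
icc = var_u0 / (var_u0 + var_e0)
lambdas = {r: shrinkage_factor(var_u0, var_e0, r) for r in (1, 2, 5, 11)}
```

More assessments per individual (larger rj) or a larger ICC push λj toward 1, so the estimate stays close to the individual's own mean; with these illustrative values λj rises from about .965 for one assessment to about .997 for eleven.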
JM: SEM
By definition, a typical SEM has two parts: a measurement model (M) and an equation, structural, or regression model (E; Markus, 2007; Muthén, 2004). The M part usually identifies latent variables (e.g., individual change) through factor analysis. The identified latent variables are then used as covariates of a defined outcome in the E part, the regression model. This definition of SEM matches the purpose of two-stage modeling in this context. Estimating the two models jointly in one procedure yields unbiased estimates of the parameters of interest efficiently.
Using Example B, we conducted a TSM analysis as described above. First, a growth model for OASys scores over time was fitted using Model 6. Second, the relationship between the estimated change scores and the distal reconviction outcome, adjusted for baseline risk score, age, gender, and follow-up time, was fitted by Model 7. More efficiently, Models 6 and 7 can be combined into a single simultaneous analysis, the JM shown in Figure 5.

Figure 5. The Joint Model of Example B.
At the measurement level, two latent variables, intercept (Int) and slope (Slp), are estimated from each individual's repeated OASys scores. The intercept reflects the mean of each individual's OASys scores, and the slope is the rate of linear change in each individual's OASys score. If necessary, other latent variables can be introduced to capture quadratic or cubic change over time. At the individual level, the reoffending outcome is then regressed on these latent variables together with covariates (e.g., OGRS, follow-up time, age, and gender in Example B).
The software MPlus 6 (Muthén & Muthén, 2006) was used to fit the JM, and its estimates are presented in Table 3 alongside those of the TSM. The key parts of the Mplus input were the VARIABLE command (with its USEVARIABLES, TSCORES, and CATEGORICAL options), the ANALYSIS command, and the MODEL command. VARIABLE declares all 28 variables in the data set, and USEVARIABLES specifies which of them are used in the models. TSCORES identifies t1 to t11, the ages at up to 11 assessments, as individually varying times of observation, and CATEGORICAL defines the outcome y as a discrete (rather than continuous) variable. ANALYSIS sets the model type to random effects (TYPE = RANDOM) and the estimator to ML. Finally, MODEL specifies the two parts of the JM (see Appendix B).
The two models in Table 3 produced similar estimates for most parameters and led to the same conclusion about the statistical significance of every parameter for Example B. In sum, the OASys risk level decreased as individuals' ages increased (
A Summary of Data Requirements for Generalized Linear Models in Assessing Associations Between Changes in Risk and Subsequent Reconvictions
Discussion
As public safety takes on increasing importance for criminal justice agencies, forensic risk assessment has moved from simply assessing risk and predicting recidivism toward reducing risk through treatment and risk management. The ability to update assessments to reflect change, whether from the simple passage of time or from intervention, and to link that change to policy-relevant outcomes is central to how parole boards and other decision makers are expected to meet public safety requirements.
A number of dynamic risk assessment tools are designed to measure risk-related change, but research examining whether change predicts subsequent reductions in the odds of reoffending still needs further development. The methods currently used to measure change in this domain are one source of problems (Olver, Beggs Christofferson, & Wong, 2015; Serin et al., 2013) and warrant further investigation. Cox regression with time-dependent covariates, used to estimate the association between risk change and subsequent reoffending (Brown et al., 2009; Howard & Dixon, 2013), is an effective alternative for addressing such research questions. One advantage of the approach taken here is that the GLM framework can examine several kinds of reconviction outcome for each individual (e.g., density or frequency of recidivism), which is not possible with Cox models.
Another important contributor to the paucity of evidence comes from challenges inherent in typical data sets and the lack of application of more complex statistical models to address these challenges in our field. Irregular data collection intervals, missing data, intrusion of significant life events, and other potential confounding variables (e.g., treatment attrition, drop-out during follow up, reconviction during the study period, data being collected by a variety of different correctional personnel) are frustratingly common. The methodologies to handle these “noisy” real-life data are well developed in the areas of biostatistics and epidemiology, but they are not often used in forensic psychology to address risk assessment and change issues. We introduced these methodologies and illustrated their applications with two data sets. Example A is more akin to a well-designed quasi-experimental investigation with relatively “clean” data: Observations were collected at about the same time for all participants and all were exposed to the same treatment, and data were collected by one investigator with little of it missing. By contrast, Example B is “noisier” with variable numbers of assessments and a myriad of missing data points collected during “real-time” administration of a tool used both in custody and the community on a day-to-day basis. These applications yield data sets that require a new level of statistical complexity in their analyses.
We introduced four types of models that may be useful for studies assessing change over two or more time points and examining potential pathways from risk management or intervention over time to future offending or desistance: CRMs, CMs, TSMs, and JMs. Because they are structured within the GLM framework, the four models should accommodate most data characteristics commonly found in this field, as illustrated with Examples A and B. Both examples had the distal outcome of any violent reconviction; thus, Part 2 of the TSM and JM used a logistic regression model for the presence of recidivism, adjusted for follow-up time. For the number of reconvictions (i.e., frequency), Poisson regression models can be fitted instead to the incidence of recidivism, with each individual's total follow-up years as the denominator (exposure).
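With follow-up years as the denominator, the Poisson model includes log follow-up time as an offset, so exp(b1) is an incidence rate ratio. The following is a minimal pure-Python Newton-Raphson sketch with a single covariate; the data are hypothetical, and in practice a GLM routine with an exposure or offset option (e.g., the exposure() option of Stata's poisson command) would be used.

```python
import math


def fit_poisson_offset(y, x, years, iterations=25):
    """Newton-Raphson fit of log E[y_j] = log(years_j) + b0 + b1 * x_j.

    y: reconviction counts; x: a single covariate; years: follow-up time.
    Minimal sketch: fixed iteration count, no standard errors.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for y_j, x_j, t_j in zip(y, x, years):
            mu = t_j * math.exp(b0 + b1 * x_j)  # expected count given exposure
            g0 += y_j - mu                      # score for b0
            g1 += (y_j - mu) * x_j              # score for b1
            h00 += mu                           # Fisher information entries
            h01 += mu * x_j
            h11 += mu * x_j * x_j
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += I^{-1} * score, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: x = 1 marks a comparison group.
y = [1, 0, 2, 3, 2, 4]
x = [0, 0, 0, 1, 1, 1]
years = [2.0, 3.0, 5.0, 2.0, 4.0, 3.0]
b0, b1 = fit_poisson_offset(y, x, years)
rate_ratio = math.exp(b1)
```

With one binary covariate the fit reproduces the group rates exactly (3 reconvictions in 10 years and 9 in 9 years), so exp(b0) = 0.3 and the rate ratio is about 3.33, a quick sanity check for any implementation.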
As with most things, one model does not fit all. When each individual has two repeated measures of risk scores at Times 1 and 2 (Example A) prior to collecting the outcome measure, both CRM and CM can be used to estimate the direct association between risk change and outcome. With more than two repeated measures, the CRM becomes inefficient, and the results more difficult to interpret, whereas the CM can handle more than two repeated measures. Thus, CM is more flexible, and the results are more interpretable; however, the CM cannot handle cases with repeated risk measures taken at irregular time intervals, or with missing measurements. In these cases, TSM can be used.
The TSM analysis consists of two parts: the growth model and the regression model. The growth model is designed to handle repeated measures, such as risk scores assessed over time during treatment, with data missing at random (e.g., offenders missed appointments because of illness) and irregular time intervals between measures (e.g., assessors conduct assessments at irregular times). The regression model then links patterns of change in the risk tool scores over time to the criterion measures (e.g., presence of reconviction). However, the change patterns, estimated as latent variables, can be biased for several reasons, including shrinkage. To address potential bias, the JM is the better choice, although its methodology and computation are more complex than those of the TSM. When the latent variables show little shrinkage (i.e., the ICC is close to unity and the estimated u0j is close to the raw mean of the jth individual), and therefore little or no bias, either the TSM or the JM can be used.
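The model-choice guidance in the two paragraphs above can be summarized as a small decision function. It is a hypothetical helper, not part of any package; the 0.95 cut-off for "little shrinkage" is an assumption chosen for illustration, because the article gives no numeric threshold.

```python
# Assumed cut-off for "little shrinkage"; not taken from the article.
LITTLE_SHRINKAGE = 0.95


def suggest_models(n_measures, regular_intervals, complete_data, min_lambda=None):
    """Hypothetical helper encoding the model-choice guidance in the Discussion.

    Returns candidate models, preferred first: CRM, CM, TSM, JM.
    min_lambda: smallest shrinkage factor lambda_j across individuals, if known.
    """
    if n_measures < 2:
        raise ValueError("assessing change needs at least two repeated measures")
    if regular_intervals and complete_data:
        # Two measures: CRM or CM; more than two: the CM is the efficient option.
        return ["CRM", "CM"] if n_measures == 2 else ["CM"]
    # Irregular intervals or missing measures call for a growth model.
    if min_lambda is not None and min_lambda >= LITTLE_SHRINKAGE:
        return ["TSM", "JM"]  # latent variables barely shrunk: either works
    return ["JM", "TSM"]  # JM preferred to avoid shrinkage bias
```

Example A (two assessments, regular timing, little missing data) maps to the CRM or CM, and Example B (irregular assessments with λj of .984 or more) to the TSM or JM, in line with the analyses reported here.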
In Example A, risk was measured at two time points with few missing data. Offenders had varying follow-up times, and there were a number of potential confounders, such as baseline risk level, age, and ethnicity. Example A was analyzed with GLM Models 1 and 2, which produced results comparable with those obtained previously using the Cox regression model (Lewis et al., 2013). This replication supports the comparability of Cox regression and GLM Models 1 and 2 for these data.
Data for Example B were more complex. The TSM and JM were applied and yielded similar estimates of the effects of the covariates and of the two latent variables on the recidivism outcome. From the TSM, we calculated the ICC-based shrinkage factor λj for each of the 782 individuals and obtained values ranging from .984 to .996, indicating little shrinkage in the estimated random intercepts u0j. Because there is no simple method to examine shrinkage or bias in the estimated random slopes u1j, we assumed that bias in the TSM random slopes was negligible, given the similar estimates from the JM.
For Example B, after controlling for several covariates in the analyses, no causal relationship (as defined by Kraemer et al., 1997) was found between the OASys change scores, assessed in a realistic correctional context, and violent recidivism. To the best of our knowledge, very few studies have assessed changes in OASys scores and reconviction. One implication of the present results is that changes in OASys scores may not be a valid indication of changes in the risk of violent reconviction, and further research is needed to replicate our findings, especially given the noted limitations of the Example B data set. Our findings, consistent with those reported by Howard and Dixon (2012), may reflect (a) a deficiency in the OASys tool itself (e.g., the lack of a mechanism to systematically rate and capture change), (b) some issue external to the tool (e.g., the lack of an effective change agent to reduce reoffending risk, or problems in the implementation of the tool), or (c) some idiosyncratic characteristics of the data set. Regardless, the purpose of presenting the results was not to compare and draw conclusions about the tool, but to illustrate the utility of various methodologies for assessing change measured under various conditions.
The methodology presented in Example B has a high degree of flexibility and can be used to analyze field-generated data that are often fraught with the problems discussed earlier. In contrast, Example A illustrates different approaches to analyzing a “cleaner” data set. It is hoped that the data analytic options presented in this article will be useful additions to the toolboxes of researchers interested in exploring the association between change in risk measurements (or other domains) and reoffending.
There are probably many reasons why the empirical evidence base has been slow to grow in some areas. We are not the first in recent years to suggest that forensic and criminal justice change evaluations would benefit greatly from multivariate statistical modeling techniques (Meehan & Stuart, 2007; Walters, 2007). Much could also be learned through cross-disciplinary fertilization, by adopting methods used to solve similar problems in other disciplines (e.g., epidemiology). However, these suggestions appear to have had a limited effect on the published literature. One solution may be to create more opportunities for researchers to attend statistical training workshops at correctional and forensic psychology gatherings, like those offered at the conferences of other organizations such as the Association for Psychological Science. Easy-to-follow “how to” guides or explanatory papers could also help increase the use of statistical techniques not typically used in a particular field. With effective learning and in-service training approaches, such cross-disciplinary fertilization opens up new perspectives on traditional approaches without reinventing the wheel: an efficient way of moving a field forward.
Footnotes
Appendix A
Appendix B
Commands for Models
| Function | Commands | Note |
|---|---|---|
| Stata | | |
| Transform data as shown in Figure 4 | `reshape long t x, i(case_id) j(time)` | The command stacks t1, t2, t3, . . . and x1, x2, x3, . . . into two columns (vectors) named t and x, with the assessment occasion stored in time. |
| Fit Model 6 | `xtmixed x t \|\| case_id: t, covariance(un) variance`<br>`predict u1 u0, reffects`<br>`gen z1 = _b[_cons] + u0`<br>`gen z2 = _b[t] + u1` | The first line fits two-level growth Model 6, in which the dependent variable is the risk assessment score x and the independent variable is age at assessment t. The effect of age t is allowed to vary among individuals identified by case_id, with an unstructured covariance matrix for the random intercept and slope. The second line saves the predicted random effects; Stata orders them as the random slope on t followed by the random intercept, so they are named u1 and u0, respectively. The last two lines generate the two latent variables z1 (intercept) and z2 (slope). |
| MPlus | | |
| Fit Models 6 and 7 jointly | `MODEL:`<br>`i s \| x1-x11 AT t1-t11;`<br>`i*1538 s*0.8;`<br>`i WITH s*-18;`<br>`x1-x11*56 (eq_v);`<br>`y ON i s z3 z4 z5 z6;` | i is the latent variable for intercepts and s for slopes. The first four lines after MODEL: fit the growth model (Part 1). The starting values of 1,538, 0.8, −18, and 56 are assigned to the variance of the intercepts, the variance of the slopes, the intercept-slope covariance, and the within-individual measurement error variance, respectively; they were obtained from the two-stage analysis. The label (eq_v) constrains the residual variances of x1 to x11 to be equal. The final line defines the logistic regression model (Part 2), regressing the categorical outcome y on the six independent variables described in Model 7. |
Note. The choice of the three statistical packages used to introduce these models is based on our experience and knowledge that most quantitative researchers in the field are familiar with SPSS (see syntax in Table 2) for regression analysis and with MPlus for factor analysis. We organized the technical parts so that interested researchers can simply use the SPSS code to fit the conditional models with little difficulty. For the growth model, we introduced Stata because it has many more functions and is more powerful than SPSS; Stata is also easy to use and widely used in social, medical, and public health research. Neither SPSS nor Stata can handle the joint model, and, based on our experience, MPlus is the best option for this purpose.
Acknowledgements
We wish to thank the Justice Statistics Analytical Services Unit (JSAS) of the Ministry of Justice in the United Kingdom for providing OASys data for Example B and Philip Howard of JSAS for advice on reconviction outcomes in the OASys data.
