Abstract
The Generic Program Performance Measure (GPPM) was developed to assess the progress and performance of offenders participating in correctional programs. Program facilitators use the GPPM to systematically rate offenders’ skill development, attitude change, motivation level, and program participation. The present study examined the psychometric properties of the GPPM using a total sample of 3,815 offenders who were assessed on the tool at pre-treatment, and 2,120 who were assessed pre- and post-treatment and subsequently released. Results indicated that the measure was sensitive to significant treatment gain in all program areas, for men and women, and for Aboriginal and non-Aboriginal participants. Internal consistency was excellent. Interrater reliability was acceptable. Importantly, offenders who were rated as not demonstrating treatment gain were more likely to return to custody than those who achieved treatment gain. The GPPM is a reliable and valid measure of progress across a range of correctional programs, and it is not resource intensive.
Evidence accumulated over the last two decades supports the use of well-designed and implemented correctional programs in facilitating successful community reintegration of offenders from a broad variety of backgrounds (Andrews & Bonta, 2010; Usher & Stewart, 2014; Wilson, Bouffard, & MacKenzie, 2005). While the ultimate goal of programming is to reduce recidivism, correctional interventions should also have clearly defined, measurable, intermediate objectives that allow interveners and researchers to evaluate participant progress and overall program performance (McGuire, 2000). Valid and reliable tools are required for such evaluations.
The importance of measuring offender progress in correctional programs may seem obvious; however, few empirical standardized tools have been developed and validated for this purpose. Relying solely on offenders’ self-reported motivation to change or perceived level of benefit gained from program participation provides only one part of the picture. Mere participation in a correctional program does not provide adequate evidence of treatment impact. Factors such as knowledge of program content, skills acquisition, development of therapeutic alliance, and understanding of personal risk factors are important indicators of treatment gain and should be assessed by program staff. Given the importance of effective correctional programming to criminal justice goals, it is essential that correctional organizations have a means of measuring offender response to programs, both to guide continued individual treatment focus and to establish which aspects of the program are the primary engines of positive outcomes (Serin, Lloyd, Helmus, Derkzen, & Luong, 2013).
The Correctional Service of Canada (CSC) is the national correctional system responsible for the supervision of offenders sentenced by the courts to terms over 2 years. The Generic Program Performance Measure (GPPM) is a rating scale used at CSC to measure offender progress in all types of correctional programming against key objectives. Correctional programs in CSC are cognitive-behavioral in orientation and are carefully designed and implemented to adhere to the effective correction principles (Risk, Need, Responsivity) described by Canadian researchers Don Andrews and James Bonta (Andrews & Bonta, 2010). The menu of CSC correctional programs in which the GPPM is a key measurement tool includes living skills programs, sex offender programs, violence prevention programs, substance abuse programs, domestic family violence programs, women’s programs, and Aboriginal-specific programs. These programs are of moderate to high intensity; all are minimally 26 sessions (52 hr) in length, with high-intensity programs over 100 hr. Program delivery staff are hired based on their qualities as change agents and must be trained and approved as qualified to deliver each program they facilitate. Quality of program delivery, accuracy of the assessment procedures, and the thoroughness of the final program report are monitored by regional trainers and on-site managers. Offenders are referred to the correctional programs only if they meet specific referral criteria. They are assessed pre-treatment and their performance in the program is monitored and evaluated by program facilitators. A final report completed by program staff contains an analysis of the offenders’ performance and a recommendation stating whether they have achieved a successful completion. Successful completion of correctional programs that are a required component of the offenders’ correctional plan is a key consideration of decision makers reviewing offenders’ readiness for discretionary release to the community.
The GPPM is intended to be a means of measuring common goals shared across all correctional programs; it is in this sense, then, a generic measure. Used nationally in CSC since 2005, the GPPM allows program facilitators to rate offender motivation, knowledge acquisition, attitude change, and skill development in a standardized manner, and provides a means of delivering structured feedback to offenders on their program performance. Given the widespread use of this measure within CSC, it is important to validate its psychometric properties and examine its value in predicting key correctional outcomes.
The development of the GPPM was based on the concept of Goal Attainment Scaling (GAS). First developed by Kiresuk and Sherman (1968) for use in a community mental health center, GAS is now widely used across many different human service interventions, including corrections, to measure participant progress on treatment goals (Kiresuk, 2003). The GAS method allows program administrators to measure goals specific to the treatment being offered as well as the degree of change incurred by each participant. Theoretically, those with the highest levels of goal attainment will have internalized the treatment material to a greater degree and ultimately perform best on the related outcome indicators.
A number of studies have used GAS to measure treatment change with offender populations, most commonly in the context of sex offender programs. For example, Stirpe, Wilson, and Long (2001) measured treatment gain in a sample of 48 sex offenders using a version of the GAS specific to sex offender interventions. This version of the measure was based on earlier work completed by Todd Hogue (1994). All participants had taken part in a cognitive-behavioral intervention for sex offenders. Groups were created according to level of risk. Results indicated that all groups showed treatment gains on the GAS. Interrater reliability was reported to be good (r = .81). The same version of the GAS was also used to measure motivational change over the course of a community-based sex offender treatment program (Barrett, Wilson, & Long, 2003). In this study, 101 Canadian, federally-sentenced male sex offenders were rated using GAS procedures pre- and post-treatment. Motivation levels were found to be significantly higher at the end of the treatment program. Unfortunately, no associated outcome measures of recidivism were assessed in either study.
Although versions of GAS have been applied in correctional settings, there is debate as to whether intraindividual changes can predict key correctional outcomes such as criminal recidivism, or indeed, whether existing instruments measuring change on dynamic factors are good predictors of correctional outcomes. Theoretically, gains achieved through participation in correctional programming that targets changeable risk factors associated with criminality should result in reductions in recidivism, the ultimate goal of correctional interventions. Conversely, one would conclude that participants who demonstrate no change, or whose change scores indicate reduced evidence of skills or poorer attitudes post-treatment, do not benefit from the intervention. High-quality research that has empirically established the link between reduction in dynamic risk factors with treatment and reductions in recidivism is rare. It is unclear whether goal attainment scales are sensitive or reliable enough to detect clinically meaningful treatment change (Barnett, Wakeling, Mandeville-Norden, & Rakestrow, 2013; Nunes, Babchishin, & Cortoni, 2011) and whether the change measured is related to recidivism. Some recent studies, principally in the area of sex offender research, have examined the properties of goal attainment scales.
Beggs and Grace (2011), for example, examined whether positive treatment change could predict reduced recidivism in a sample of sex offenders using the Standard Goal Attainment Scaling for Sex Offenders (SGAS). Interrater reliability of the SGAS was found to be good (r = .88). Change scores on the SGAS were strongly correlated with change scores on the Violence Risk Scale–Sexual Offender Version (r = .69), providing concurrent validity for the SGAS. Furthermore, the total SGAS post-treatment score was found to be negatively correlated with recidivism (r = −.21), although this relationship was no longer significant after controlling for risk levels. Importantly, change scores on some of the scales in the measure (social inadequacy, anger/hostility, and pro-offending attitudes) predicted sexual recidivism after an average 12-year follow-up. Although not strictly speaking a goal attainment scale, Olver, Wong, Nicholaichuk, and Gordon’s (2007) study examining change scores on the dynamic risk factors on the Violence Risk Scale–Sexual Offender Version is of interest. Their methodology involved retrospectively coding file information of sex offenders treated in an institutional sex offender program. They found that the total dynamic change score was related to sexual recidivism, even when static risk or pre-treatment scores were controlled. Using the same data set as Olver et al. (2007), Beggs and Grace (2011) found that both total dynamic change score and sexual deviancy change scores on the Violence Risk Scale–Sexual Offender Version were predictive of sexual recidivism and that prosocial change was negatively related to sexual recidivism.
Outside of the area of sex offender research, Olver and his colleagues (Lewis, Olver, & Wong, 2013; Olver, Lewis, & Wong, 2013) have also examined the relationship of treatment change as measured by the dynamic items on the Violence Risk Scale (Wong & Gordon, 2000) to violent recidivism among high-risk program participants. In this study, researchers used direct assessments of clients, rather than coding archival data. Based on their results, the authors concluded that even among very high-risk offenders, risk-related treatment changes can be predictive of reductions in violent recidivism even after controlling for confounds and that reliable measurements of therapeutic change may be informative about treatment outcome in a high-risk violent offender group (Olver et al., 2013, p. 160).
The purpose of this study is to evaluate the GPPM on a number of criteria. First, the study will establish whether the scale has an acceptable level of reliability through examination of the internal consistency of the measure as a whole, and of the three subscales. Second, we will determine whether the tool is sensitive to the measurement of statistically significant change on the identified goals of various correctional programs. Third, the study will examine the extent to which the dynamic factors measured by the GPPM are related to program dropout (pre-treatment scores) and recidivism after release. Finally, the study examines whether the pattern of change scores on the GPPM is related to offender outcomes. We hypothesize that offenders who begin the program with deficits and demonstrate treatment gain as measured by the GPPM will have lower recidivism rates than those who begin with deficits and do not improve.
Method
Participants
The total study sample consisted of 3,815 Canadian federal offenders who participated in nationally recognized correctional programs between November 1, 2005, and October 31, 2007, for whom GPPM data were available. Of these 3,815 offenders, 3,222 completed their correctional program and had post-program GPPM scores available, and of these, 2,120 had been released, permitting an analysis of the relationship of GPPM results to recidivism measures. Nearly 85% of the initial sample completed their correctional program, 14% dropped out, and 1.7% were unable to complete their program due to administrative reasons such as transfer to another institution, or release. Mean age for the sample was 32.74 years (SD = 9.90) and ranged from 17 to 81. The majority of the sample were male (96.7%); 68% were Caucasian and 20% Aboriginal; the remaining self-identified as from another racial group. Forty-four percent of the sample was serving sentences for violent offenses (e.g., homicide, robbery, sexual offenses, assault) and 56% for non-violent offenses (e.g., property, drugs). Ninety-one percent of offenders were rated moderate or high need, and 96% were rated moderate or high risk (based on the criminal history static factor analysis and the dynamic factor analysis portions of the Offender Intake Assessment [OIA] results, described later). Seventy-five percent of the sample was serving sentences of 2 to 4 years. Offenders included in the study participated in a range of national correctional programs: family violence (10%), living skills (30%), substance abuse (50%), and sex offender and violent offender combined (10%). Most (78%) of the offenders had participated in a moderate intensity program.
Measures/Material
Generic Program Performance Measure
The GPPM (Stewart, 2005) was developed within CSC as a means to efficiently assess offender performance in correctional programs (see Table 2 for a list of the items and for a full description of the items and the scoring rubric see Usher & Stewart, 2011). It was designed to be a dynamic measure of treatment progress on goals common across all correctional program areas. Development of the scale was based on the concept of Goal Attainment Scaling (GAS; Kiresuk & Sherman, 1968). The measure consists of 17 items (5 of which are only completed post-program), which are rated on a 5-point scale ranging from −2 to +2. Ratings are completed by the program facilitator pre- and post-program, and are based on observable characteristics and behaviors. The scale was developed with input from national managers from several correctional program areas: sex offender, violence prevention, living skills, family violence prevention, Aboriginal and women’s specific programs, and substance abuse. In discussions during the development of the scale, the managers agreed on objectives that were common to all the program areas. Treatment targets of all the programs were established based on criminogenic needs identified in the effective correction literature; for example, procriminal attitudes and associates, and problems with self-regulation. Ratings on each item in the scale are anchored by descriptors of the behaviors or attitudes that were developed with the agreement of the correctional program specialists. Facilitators using the GPPM are trained by national trainers on a standard mandatory training package to optimize interrater reliability. Pre-program results, and for some program areas, mid-program results, are completed and reviewed with participants as a means of providing feedback on their areas of strength and pointing out areas that should be a focus for improvement in the remaining sessions.
The GPPM consists of three subscales: Performance, Responsivity, and Effort. The Performance scale assesses skills, attitudes, and knowledge relevant to the goals of correctional programs. The pre-program assessment is completed by the third session, is based on observations of program participation to that point, and is supplemented by pre-program interviews and assessment results. Each item on the Performance scale taps the extent of the participants’ knowledge of a key program area (e.g., “knows a range of self-management skills”) and an item on the same content taps the extent to which they are able to apply this knowledge (e.g., “demonstrates use of range of self-management skills”). The Responsivity scale assesses factors such as motivation, learning ability, and relationship with facilitators that could affect participants’ progress in treatment (e.g., “motivation to change behavior”). Facilitators are instructed to attend to these factors and address them if possible, to maximize the participant’s potential for a positive outcome. Finally, the Effort scale evaluates the participant’s effort to learn and practice the content of the program during the course of treatment (e.g., “completes required assigned work”) and is only assessed post-program. On the Effort scale, facilitators assess the extent to which offenders attend sessions, complete homework, and participate actively in group activities.
A score for each scale is calculated by averaging the ratings for each item in the scale. A final overall score is derived by summing and averaging the post-score ratings for the Performance and Effort subscales only. An offender’s final program report is guided by these post-program scores. Given that one of the items on the Responsivity scale is not subject to change (i.e., ability to learn the material), the Responsivity scale is not used in the final overall program score calculation although statements on the results are included in the final program report. A mean overall final score of −2 to −1 indicates poor effort and performance. If offenders with mean overall scores in the −2 to −1 range complete their program, the facilitators’ training guidelines advise that the final report should probably not indicate a successful completion. An overall score of −1 to less than 0 indicates that some effort was demonstrated and some skill and knowledge development obtained. An overall score from 0 to +2 indicates a satisfactory to excellent level of effort, knowledge, and skills development. Guidelines recommend that a rating in this range would correspondingly result in an evaluation of a successful program completion on the final program report.
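The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the official CSC scoring procedure: the function names are hypothetical, the overall score is assumed to pool the Performance and Effort item ratings before averaging (the text could also be read as averaging the two subscale means), and a score of exactly −1 is assigned to the lowest band because the published ranges overlap at that value.

```python
from statistics import mean

VALID_RATINGS = {-2, -1, 0, 1, 2}  # 5-point scale described in the text


def scale_score(ratings):
    """Subscale score: the mean of the item ratings on that subscale."""
    if not ratings or any(r not in VALID_RATINGS for r in ratings):
        raise ValueError("ratings must be a non-empty list of integers from -2 to +2")
    return mean(ratings)


def overall_score(performance_post, effort_post):
    """Overall final score from post-program Performance and Effort ratings.

    Assumption: item ratings from both subscales are pooled, then averaged.
    Responsivity ratings are deliberately excluded, as in the text.
    """
    return scale_score(list(performance_post) + list(effort_post))


def interpret(overall):
    """Map an overall score to the interpretive bands given in the text.

    Assumption: exactly -1 falls in the lowest band (the ranges overlap there).
    """
    if overall <= -1:
        return "poor effort and performance"
    if overall < 0:
        return "some effort, some skill and knowledge development"
    return "satisfactory to excellent"


# Hypothetical ratings for one participant (not study data).
example = overall_score([1, 0, 1, 2, 0, 1], [1, 1, 2, 1, 1])
print(example, "->", interpret(example))
```

Taking rating lists per subscale, rather than item numbers, avoids assuming which of Items 6 to 17 belong to Performance and which to Responsivity.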
The GPPM is completed by program facilitators pre- and post-program for all offenders who participate in all categories of correctional programs. If the program staff have access to good information on the offender, the tool can be completed and scored in 15 min. Once completed, the GPPM scores are maintained in the Offender Management System (OMS), an electronic database containing the records on all federally sentenced offenders in CSC. At the time of this study, the mean number of programs completed by program participants prior to release was 1. In cases where offenders participated in more than one program during this time period, only GPPM scores from their first program were included.
A previous study of the GPPM using the same data set as this study examined interrater reliability on a sample of 21 program participants (Vandermey, 2009). In each case, two facilitators (both of whom were co-facilitating the same correctional program) independently rated participants in their groups. Intraclass correlations were used as a measure of reliability and were found to be good for the post-program scores at .88. Intraclass reliability correlations above .70 are considered to be sufficient (Nunnally & Bernstein, 1994).
Demographic, Criminogenic, and Recidivism Data
Demographic information such as age, gender, and ethnicity for this sample was obtained from the OMS. Risk ratings were drawn from the OIA which is a comprehensive evaluation conducted on all incoming offenders in CSC. The Dynamic Factors Identification and Analysis (DFIA) component of the OIA assesses a wide variety of dynamic risk factors or needs grouped into seven domains, with each domain consisting of multiple indicators. The structured dynamic risk assessment conducted by trained parole officers working at intake yields ratings for each domain, as well as a three-point overall level of dynamic risk (need) of low, medium, or high. The Static Factors Assessment (SFA) provides comprehensive information pertaining to the criminal history of each offender. The reviews of the components of the criminal history yield a structured professional judgment of low, medium, or high static risk. Research has demonstrated the good psychometric properties of the OIA assessment and its strong relationship to correctional outcomes (Brown & Motiuk, 2005). Typically, only offenders rated at least moderate or high on both the risk and need assessments meet the referral criteria to be enrolled in CSC correctional programs.
Recidivism data were obtained from the Canadian Police Information Centre (CPIC). Three categories of recidivism were used as variables in this study: (a) returns to custody for any reason (including technical parole violations and new offenses), (b) returns to custody for a new offense, and (c) returns to custody for a new violent offense. Violent offenses were defined as homicide, manslaughter, sexual offenses, assault, and robbery.
Procedure/Analytic Approach
Treatment Change
Within-subject t tests were used to compare mean pre-program scores and mean post-program scores on the GPPM.
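For readers unfamiliar with the within-subject (paired) t test, a minimal sketch follows. It assumes differences are taken as pre-program minus post-program, so that improvement yields a negative t; the function name and the scores are hypothetical and are not study data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, df) for a paired t test on pre minus post differences."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("pre and post must be equal-length with at least 2 pairs")
    diffs = [a - b for a, b in zip(pre, post)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Hypothetical subscale scores for five participants.
pre_scores = [-1.0, -0.5, 0.0, -1.5, -0.5]
post_scores = [0.5, 0.5, 1.0, 0.0, 1.0]
t, df = paired_t(pre_scores, post_scores)
print(f"t({df}) = {t:.2f}")
```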
Reliability
Internal consistency of the GPPM was assessed using Cronbach’s alpha. Interitem correlations, item-total correlations, and the adjusted alpha if deleted were calculated, based on classical test theory.
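As an illustration of these statistics, the standard formula for Cronbach’s alpha, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), can be sketched as below. The function names and the ratings are hypothetical; this is not the software used in the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item ratings per respondent."""
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need at least 2 items and equal-length rows")
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_deleted(rows, j):
    """Alpha recomputed with item j (0-based index) removed from the scale."""
    return cronbach_alpha([r[:j] + r[j + 1:] for r in rows])


# Hypothetical ratings: four respondents, three items on the -2..+2 scale.
ratings = [[-1, -1, 0], [0, 0, 0], [1, 1, 2], [2, 1, 2]]
print(round(cronbach_alpha(ratings), 3))
print([round(alpha_if_deleted(ratings, j), 3) for j in range(3)])
```

An item whose removal raises alpha noticeably would be a candidate for review, which is the logic behind the "alpha if deleted" statistic.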
Exploratory Factor Analysis
Factor structure of the GPPM was analyzed using principal component analysis. This was selected for this study as the GPPM was still in the development phase. No previous research has examined the component structure of the GPPM, nor its psychometric properties. As such, this portion of the study is considered exploratory.
Validity
Predictive validity was first assessed by correlating GPPM scores with the outcome in question (i.e., program dropout and returns to custody). Next, the predictive ability of the GPPM was assessed using Receiver Operating Characteristic (ROC) analysis. These analyses generate area under the curve (AUC) statistics that summarize the trade-off between true-positive and false-positive rates across all possible cut-off scores (Rice & Harris, 1995). An AUC of .50 indicates chance-level prediction and 1.00 denotes perfect predictive accuracy; values below .50 indicate prediction in the wrong direction. The AUC statistic equals the probability that a score drawn at random from one sample or population (e.g., recidivists’ scores) is higher than that drawn at random from a second sample or population (e.g., non-recidivists’ scores; Rice & Harris, 2005). AUC analysis is generally recommended as the preferred measure of predictive accuracy in forensic psychology (Mossman, 1994; Rice & Harris, 2005; Swets, Dawes, & Monahan, 2000).
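The probabilistic definition of the AUC given above can be computed directly by comparing every pair of scores across the two groups. The sketch below is illustrative only: the function name and scores are hypothetical, ties are counted as half (a common convention that the text does not specify), and GPPM scores are negated so that higher values indicate greater expected risk, since higher GPPM ratings are associated with fewer failures.

```python
def auc(outcome_scores, comparison_scores):
    """Probability that a random outcome-group score exceeds a random
    comparison-group score, counting ties as one half."""
    if not outcome_scores or not comparison_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in outcome_scores:
        for n in comparison_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(outcome_scores) * len(comparison_scores))


# Hypothetical pre-program overall scores (not study data).
dropouts = [-1.5, -1.0, -0.5, 0.5]
completers = [-0.5, 0.0, 0.5, 1.0, 1.5]

# Negate so that a higher value means a lower GPPM score (higher expected risk).
result = auc([-s for s in dropouts], [-s for s in completers])
print(result)
```

The pairwise loop is quadratic in sample size; for samples in the thousands a rank-based computation would be preferable, but the loop makes the definition explicit.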
The relationship between pre- and post-program scores and recidivism was also examined using offender person-years analysis. Offenders were grouped according to their pre-program scores on the GPPM and their degree of improvement on their post-program scores. Four groups were created: (a) offenders who began their correctional program with a negative overall pre-program GPPM score and whose GPPM score remained negative post-program; (b) offenders with a negative pre-program score and a positive post-program score; (c) offenders with a positive pre-program score and a positive post-program score; and (d) offenders with a positive pre-program score and negative post-program score. Rates of return to custody were subsequently calculated for each group using a person-year analysis to control for time-at-risk. A person-year calculation is an appropriate measure of failure rate when the length of observation time differs among individuals in a sample (Ibrahim, Alexander, Shy, Farr, & Horner, 2000). It is also more precise than simply calculating the proportion of failures over a given amount of time, as it ensures that the failure rate remains constant. Rates of returns to custody were calculated based on 100 offender person-years, which is equivalent to 100 offenders followed for 1 year. Rate ratios were used to test for significant differences between groups on rates of return to custody. The rate ratio is an appropriate method of comparing two groups of person-time rates and determines the precise relationship between the two rates. A rate ratio equal to 1 indicates no difference between rates.
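The person-year rate and rate ratio described above reduce to simple arithmetic, sketched here. The counts are invented for illustration, the function names are hypothetical, and the confidence interval uses the standard large-sample formula on the log scale, which is an assumption: the text does not state how the rate ratios were tested.

```python
from math import exp, sqrt


def rate_per_100_person_years(events, person_years):
    """Failure rate expressed per 100 offender person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 100.0 * events / person_years


def rate_ratio(events_a, py_a, events_b, py_b, z=1.96):
    """Rate ratio of group A to group B with an approximate 95% CI.

    Assumption: Wald interval on the log scale, SE = sqrt(1/a + 1/b),
    where a and b are the event counts in each group.
    """
    if min(events_a, events_b) <= 0:
        raise ValueError("each group needs at least one event")
    rr = (events_a / py_a) / (events_b / py_b)
    se = sqrt(1 / events_a + 1 / events_b)
    return rr, rr * exp(-z * se), rr * exp(z * se)


# Invented example: 30 returns over 120 person-years vs. 20 over 160.
print(rate_per_100_person_years(30, 120.0))  # rate for group A
print(rate_per_100_person_years(20, 160.0))  # rate for group B
rr, lower, upper = rate_ratio(30, 120.0, 20, 160.0)
print(f"RR = {rr:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Under this approach, an interval that excludes 1 would indicate a significant difference between the two rates.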
Results
Treatment Change
Treatment change was assessed by comparing pre-program and post-program scores for the Performance and Responsivity scales of the GPPM. The Effort scale is only assessed post-program. Analyses of treatment change were completed by gender, ethnicity, and program type. Only those offenders who completed a correctional program are included in these analyses. Significant treatment gain of large effect was found for the Performance (t = −79.38, p < .0001, n = 2,891, d = 2.95) and Responsivity scales (t = −49.93, p < .0001, n = 2,891, d = 1.86) for men; for women (Performance scale, t = −14.17, p < .0001, n = 85, d = 3.07; Responsivity scale, t = −7.67, p < .0001, n = 85, d = 1.66); and for Aboriginal offenders (Performance scale, t = −38.21, p < .0001, n = 559, d = 3.12; Responsivity scale, t = −23.19, p < .0001, n = 559, d = 1.90). Although the strength of the gains differed across program domains, significant gains were noted for all types of programs at both the moderate and high level of intensity. All individual items in the scales assessed pre- and post-program demonstrated significant positive change.
Reliability
Internal consistency of the GPPM was examined using Cronbach’s alpha for the total score and for each subscale. Results are presented in Table 1. For the pre-program scores, the mean item-total correlation was r = .71, which demonstrated a strong association between individual items and total scores. Cronbach’s alpha on the pre-program items points to a high level of internal consistency. Item-total correlations for individual items ranged from .59 to .79 and none fell below the generally accepted threshold of .30.
Reliability Statistics on the GPPM and Its Subscales (N = 3,223)
Note. GPPM = Generic Program Performance Measure; α = Cronbach’s alpha. r = item-total correlation.
The same analysis was conducted for the Performance and Responsivity subscales of the pre-program GPPM items. For the Performance scale, the mean item-total correlation and Cronbach’s alpha were considered to be high. None of the items improved the internal consistency when removed from the scale. Interitem correlations ranged from .49 to .80. For the Responsivity scale, interitem correlations ranged from .44 to .70.
The reliability of the post-program items of the GPPM was also investigated and also demonstrated strong internal consistency as illustrated in Table 1. Cronbach’s alpha was not increased substantially through the deletion of any of the post-program items. Interitem correlations for individual items ranged from .26 to .73. Likewise, results for the subscales post-program provide strong indications of internal consistency. Interitem correlations for individual items calculated by scale for the post-program range from .46 to .81.
Factor Structure
To assess the factor structure of the GPPM, an exploratory factor analysis was performed using principal component analysis. Only the 17 post-program item scores were used for this analysis. Pre-program scores were not used for the factor analysis as not all items are scored pre-program, and successful program completion status is determined primarily through post-program scores. Oblique rotation was applied, as the factors were assumed to be correlated given the high interitem correlations. Results are presented in Table 2. Factor analysis of the GPPM revealed two factors with eigenvalues greater than 1. Together, these factors accounted for 67.4% of the variance in the GPPM scores (Factor 1 = 60.8%; Factor 2 = 6.6%). Items 1 to 4 (punctuality, participation, completes assignments, attentive to program), which represent effort, loaded on Factor 2. The remaining items loaded on Factor 1, with the exception of Item 6 (interpersonal relationship with facilitators), which loaded equally on both factors. The correlation between Factors 1 and 2 was .55.
Factor Loadings for the Post-Program Items of the GPPM (N = 3,223)
Note. Factor loadings > .40 are in boldface. GPPM = Generic Program Performance Measure.
Aboriginal Ancestry
Factor analyses were conducted separately for offenders of Aboriginal ancestry. No significant differences were found between Aboriginal and non-Aboriginal offenders, indicating that the GPPM appears to perform similarly for both groups. Separate analyses of factor structure were not performed by gender, as the sample of women was too small to support meaningful conclusions.
Predictive Validity
Predictive validity of the GPPM was assessed by comparing pre-program scores to program dropout, and post-program scores as well as program change scores to recidivism on release.
Relationship to Dropout
To determine whether pre-program scores were related to program dropout, individual pre-program items were correlated with dropout along with subscales and overall score. Point-biserial correlation was used, as it is most appropriate for comparing continuous and dichotomous variables. Offenders who did not complete their program for administrative reasons (e.g., they had met their release dates or were being transferred for reasons unrelated to their behavior while incarcerated) were excluded from this analysis. All items were significantly associated with dropout with correlations ranging from −.13 to −.24 (p < .0001), indicating a weak to moderate inverse relationship between pre-program score and dropout. As pre-program scores increased, likelihood of dropout decreased. Items most strongly related to dropout generally tapped factors associated with offender motivation, the extent of antisocial orientation, and relationship with facilitators (including agreement on treatment goals). Notably, correlation coefficients were larger for the GPPM scales than for any one individual item, ranging in magnitude from .24 to .26 (see Table 3).
Predictive Ability of GPPM Pre-Program Items for Dropout: Correlations and AUCs (N = 3,222)
Note. Effort scale is not included as it is not measured pre-program. GPPM = Generic Program Performance Measure; AUC = Area Under Curve; CI = confidence interval.
p < .0001.
To further explore the relationship between pre-program score and dropout, an examination of the predictive ability of pre-program GPPM scores was undertaken. Corresponding areas under the receiver operating characteristic curve (AUC) for each GPPM scale are presented in Table 3. Acceptable levels of predictive validity were found for each scale with the overall score showing the greatest degree of prediction, AUC = .71, 95% CI = [.69, .74].
Relationship to Recidivism
Post-program GPPM scores were compared with returns to custody to determine whether the measure could predict recidivism. Of the 3,222 participants who completed their correctional program, 2,120 had been released into the community prior to the study end and were thus available for follow-up. The average time-at-risk (i.e., time spent in the community after release) was 2.48 years (SD = 1.57). Individual GPPM post-program items, along with subscales and overall score, were correlated with returns to custody for any reason (which includes technical violations and new offenses). Point-biserial correlation was again used. Only offenders who were released from custody were included in this analysis. Significant correlations ranged from −.04 to −.14, indicating a significant weak inverse relationship between post-program score and returns to custody.
To further explore the relationship between post-program scores and return to custody for any reason, the predictive ability of post-program GPPM scores was examined. Areas under the receiver operating characteristic curve (AUCs) for each subscale are presented in Table 4. The predictive ability of the GPPM with respect to returns to custody for any reason was found to be low. Item 12 (prosocial orientation) showed the greatest degree of prediction; however, with an AUC of .58 (95% CI = [.56, .60]), predictive ability for this item was not much greater than chance. When the analysis was limited to returns to custody for a new offense, correlations were nonsignificant, and predictive ability was again found to be no better than chance.
Predictive Ability of Post-Program GPPM Scores for Returns to Custody for Any Reason: Correlations and AUCs (N = 2,213)
Note. GPPM = Generic Program Performance Measure; AUC = Area Under Curve; CI = confidence interval.
p < .001.
Although individual GPPM scores demonstrate only a weak relationship to success on release, the relationship between outcome and change in attitude and skill level as measured by the GPPM is the more important question to explore. To examine the relationship between pre- to post-program change and recidivism, the sample was separated into four groups based on their overall pre-program and overall final post-program GPPM scores: (a) offenders who began their correctional program with a negative overall pre-program GPPM score and whose GPPM score remained negative post-program; (b) offenders with a negative pre-program score and a positive post-program score; (c) offenders with a positive pre-program score and a positive post-program score; and (d) offenders with a positive pre-program score and a negative post-program score. Note that overall scores were calculated based on Items 6 to 17 only, as Items 1 to 5 are not administered pre-program.
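The four-way grouping described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure actually used: the overall score is taken as the sum of Items 6 to 17 (a mean would have the same sign), and a score of exactly zero is treated as positive, a boundary case the text does not address. The names `overall_score` and `change_group` are hypothetical.

```python
def overall_score(item_ratings):
    """Overall score from a dict mapping item number to rating.

    Only Items 6 to 17 are used, since Items 1 to 5 are not rated
    pre-program. Summing is an assumption; the sign matches a mean.
    """
    return sum(item_ratings[i] for i in range(6, 18))


def change_group(pre_overall, post_overall):
    """Assign the pre/post grouping label (a) to (d).

    Assumption: a score of exactly zero counts as positive.
    """
    pre_positive = pre_overall >= 0
    post_positive = post_overall >= 0
    if not pre_positive and not post_positive:
        return "a"  # negative pre, negative post
    if not pre_positive and post_positive:
        return "b"  # negative pre, positive post
    if pre_positive and post_positive:
        return "c"  # positive pre, positive post
    return "d"      # positive pre, negative post


# Hypothetical ratings for one participant (not study data).
pre = {i: -1 for i in range(6, 18)}
post = {i: 1 for i in range(1, 18)}
group = change_group(overall_score(pre), overall_score(post))
```

Because the grouping depends only on the sign of each overall score, it captures whether a participant crossed from negative to positive, not how large the change was.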
Based on the aforementioned groupings, rates of return to custody were calculated using an offender person-year analysis. This calculation was completed for returns to custody for any reason (which includes technical violations and new offenses), returns to custody for new offenses, and returns to custody for new violent offenses. As outlined in Table 5, offenders with the highest rates of return to custody were those with negative overall total post-program scores. In other words, offenders who did not improve their GPPM scores, or whose scores actually decreased, demonstrated the poorest outcomes on release.
Rates of Return to Custody According to GPPM Pre- and Post-Score Grouping
Note. GPPM = Generic Program Performance Measure; OPY = offender person-year.
Rates denoted are significantly different from each other at p < .05 using a rate ratio comparison. No significant differences were found across groups for returns for a new offense or returns for violent offense.
While there appears to be a general trend of lower rates of recidivism as the post-program score increases, only two significant differences between groups were found for returns for any reason. Offenders who began with a negative GPPM score and ended with a negative GPPM score were significantly more likely to return to custody than those who began with a positive GPPM score and ended with a positive GPPM score (rate ratio = 1.46, 95% CI = [1.21, 1.78]) and than those who began with a negative score and ended with a positive score (rate ratio = 1.33, 95% CI = [1.13, 1.56]). In other words, offenders with positive GPPM scores post-program were less likely to return to custody than offenders with negative GPPM scores both pre- and post-program. For rates of returns for new offenses and for new violent offenses, no significant differences were found between groups.
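To make the rate ratio comparison concrete, the sketch below computes rates per offender person-year and a rate ratio with a 95% confidence interval. It assumes the common Wald interval on the log scale, with standard error sqrt(1/events_a + 1/events_b); the text does not state which interval method was used, and the function name `rate_ratio` and the counts are hypothetical.

```python
import math


def rate_ratio(events_a, years_a, events_b, years_b, z=1.96):
    """Rate ratio (group A over group B) with a Wald confidence interval.

    events_*: number of returns to custody in each group.
    years_*: total person-years at risk in each group.
    Assumption: log-scale Wald interval; other methods exist.
    """
    if min(events_a, events_b) <= 0 or min(years_a, years_b) <= 0:
        raise ValueError("events and person-years must be positive")
    rate_a = events_a / years_a  # returns per offender person-year
    rate_b = events_b / years_b
    ratio = rate_a / rate_b
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts: 100 returns over 200 person-years versus
# 50 returns over 200 person-years (not study data).
ratio, lower, upper = rate_ratio(100, 200, 50, 200)
```

Under this approach, two rates differ significantly at p < .05 when the interval excludes 1, which is how intervals such as [1.21, 1.78] are interpreted.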
Thus, although GPPM scores alone are only a weak predictor of recidivism, offenders who do not demonstrate any improvement on the GPPM are more likely to return to custody than are other offenders.
Discussion
The principal aim of this research was to evaluate the psychometric properties of the GPPM and validate it as a tool for measuring participants’ correctional program performance. The GPPM as a whole, as well as each of the subscales, demonstrated excellent internal consistency, indicating strong evidence for the reliability of this measure. This finding is consistent with previous research on Goal Attainment Scaling (GAS), which has overwhelmingly found this type of measure to be reliable, both with respect to internal consistency and interrater reliability (Cardillo & Smith, 1994; Stirpe et al., 2001; Vandermey, 2009).
Significant treatment change was detected post-program for men and women offenders, for Aboriginal and non-Aboriginal offenders, and in all types of correctional programs. These results indicate that the GPPM is a dynamic measure that is able to detect treatment change in program participants across a range of program areas. While it could be argued that facilitators invested in the progress of participants in their programs may have produced biased ratings, thereby inflating post-program scores, it is unlikely that this potential bias accounted for the significant positive change in GPPM scores. A previous study had shown good interrater reliability of the GPPM (Vandermey, 2009). Furthermore, there were a number of offenders whose post-program scores actually decreased after participating in a correctional program (n = 207). Poorer performance ratings post-treatment may be associated with actual declines related to changed circumstances or other experiences that caused the offenders’ attitude toward treatment to deteriorate. Alternatively, for some offenders, facilitators may have made more detailed observations of the extent of their deficits with further exposure to their behavior during program sessions.
Exploratory factor analysis revealed that a two-factor structure may be more appropriate than the original three-factor design of this measure. The items originally designated as the Effort scale were found to form a single scale, while the remaining items loaded onto a different scale. These results suggest that items in the Performance and Responsivity scales could be collapsed into a single scale, as they appear to be tapping into the same dimension. The Responsivity scale was originally conceived as being a separate scale because one aspect of responsivity, namely, participants’ learning ability, reflects a trait not susceptible to change through programming. However, as demonstrated in this study, the overall Responsivity scale showed significant treatment change from pre- to post-program. It can therefore be concluded that all other items comprising the scale tap aspects of responsivity such as motivation, relationship with the facilitator, and establishment of prosocial goals, which are changeable and can be targeted through intervention.
Evidence for the predictive validity of the GPPM was found with respect to program dropout. Low pre-program scores were significantly related to dropout. This finding has important implications for offender engagement, as program facilitators could devote additional resources to those offenders most at risk of dropping out of their correctional programs. Previous research suggests that motivation, prosocial attitudes, and other responsivity factors are good predictors of correctional program dropout (Nunes, Cortoni, & Serin, 2010), and several of the items on the GPPM tap these dimensions.
Finally, pre- and post-treatment GPPM scores only weakly predicted recidivism. This weak relationship is not surprising, as the measure was not designed as a risk assessment tool per se. Importantly, however, participants who did not demonstrate positive change as measured on the GPPM were more likely to return to custody than those who did. In other words, offenders who began their program with low GPPM scores and finished with low GPPM scores returned to custody at significantly higher rates than offenders who demonstrated positive treatment gain. The finding that offenders who demonstrated no treatment gains were more likely to return to custody supports the conclusion that the GPPM is capable of detecting meaningful treatment change. These findings highlight the importance of developing valid and reliable tools for measuring treatment change, as it appears that measured positive change on the key objectives represented in the GPPM was associated with a modest reduction in returns to custody. Accurately measuring treatment change is a considerable challenge, as evidenced by the paucity of research in the literature. Recent research, however, is beginning to provide evidence that positive post-treatment change, particularly if the measures assess relevant dynamic treatment targets as prescribed in the effective corrections literature, is related to reductions in recidivism (Kroner & Yessine, 2013; Serin et al., 2013).
Conclusion
This study supports the use of the GPPM as a measure of treatment gain in correctional programs. Pre-program scores can provide useful information on which offenders are at increased risk of program dropout. Offenders who receive low scores both pre- and post-program should be targeted for additional support and supervision as they were found to have higher rates of returns to custody than offenders who demonstrated treatment gains.
There are a number of limitations to this study. One is the lack of a non-treatment comparison group. Because the GPPM is not administered to offenders who do not participate in programs, it was not possible to assess differences in change ratings between program participants and offenders who met the referral criteria but chose not to participate. It would be useful to establish whether short-term changes on dynamic risk factors included in the GPPM occur in the absence of participation in structured interventions. Second, offender ratings assume perfect observer accuracy (Babchishin, 2013), whereas measurement error could affect the degree to which differences in pre- and post-ratings reflect true offender change. Potential sources of such error include insufficient information to accurately rate the item (especially pre-treatment), unclear items or indicators anchoring the ratings for the items, and limited training of the observers. Applying recent developments in structural equation modeling (Morin, Marsh, & Nagengast, 2013) to the analysis of correctional results could provide insight into the extent to which offender change on the GPPM reflects measurement invariance (i.e., the observed change is not attributable to measurement error). In the present study, the scores’ potential to reflect measurement invariance may have been increased by the requirement that the raters undergo standardized training and by previous research indicating that interrater reliability is satisfactory. Finally, the concurrent validity of the measure was not examined. Future research should determine the extent to which the results of the GPPM are related to other established measures of skills development and attitude change. The GPPM relies solely on staff ratings. Multimethod supplementary assessments that include offender self-report could establish the measure’s agreement with other valid tools that assess key correctional program goals.
Footnotes
Authors’ Note:
We thank the men and women program delivery officers of the Correctional Service of Canada (CSC) who produced and compiled these data, and their managers who ensured the data integrity. The work was completed as part of a research project funded by the CSC. The views and opinions expressed in this article are those of the authors and do not necessarily reflect the policies and perspectives of the CSC.
