Abstract
This study examines the impact of offenders’ psychological and demographic attributes and their offense history on the effectiveness of Reasoning and Rehabilitation, a cognitive–behavioral intervention. Differential effects were examined for a sample of 940 male parolees randomly assigned to either experimental or comparison conditions. The study used survival analysis to test interactions between treatment and age, race, social class, risk, marital status, prearrest employment status, education, prior violence, interpersonal maturity level, personality, reading level, and IQ. For the entire sample, the difference in recidivism rates (returns to prison over a follow-up of up to 33 months) was not statistically significant. The analysis of differential effects, however, uncovered five interaction effects. Treated parolees who were high risk, aged 28 to 32 years, assessed as dependent on the Jesness Inventory (JI), or White evidenced lower recidivism rates than their comparison-group counterparts. Treated parolees assessed as highly anxious (JI) evidenced a higher recidivism rate than their comparison-group counterparts.
Introduction
The Georgia Cognitive Skills Experiment (GCSE) was implemented by the Georgia Board of Pardons and Paroles (GBPP) as part of a larger commitment to evidence-based practice (MacKenzie, 2000). In the course of implementing a well-tested program, Reasoning and Rehabilitation (R&R; Ross & Fabiano, 1985), two phases of the GCSE used a randomized controlled experiment to compare outcomes for offenders who participated in R&R with those who did not. The central question of the Phase I study concerned the overall effectiveness of R&R (Van Voorhis, Spruance, Johnson-Listwan, Ritchie, & Seabrook, 2004). Phase II, the present study, expanded on this inquiry to determine whether program effects varied by individual attributes pertaining to age, race, social class, risk, marital status, prearrest employment status, education, prior violence, interpersonal maturity level, personality, reading level, and IQ. 1
In assessing the differential effects of a cognitive skills program, the GCSE joined a small handful of studies seeking to determine whether a specific program modality worked better for some types of offenders than for others. Such research has been advocated for decades (Andrews, Bonta, & Hoge, 1990; Gendreau & Ross, 1987; MacKenzie, 1989; Palmer, 1978, 2002; Van Voorhis, 1994), but very few studies have actually produced the findings needed to identify the psychological and other individual attributes that separate successful program participants from unsuccessful ones (see, however, Andrews & Kiessling, 1980; Palmer, 1974, 2002; Robinson, 1995; and Warren, 1983). Even fewer studies have examined these effects for cognitive–behavioral programs.
Differential Effectiveness in Correctional Interventions
Although cognitive–behavioral treatments (CBTs) are generally regarded as better suited to offender populations than other psychological treatment modalities (e.g., Rogerian person-centered or psychodynamic approaches), sources note that they should be adapted to specific offender characteristics, such as gender, culture, race, psychological attributes (e.g., personality, emotions, cognitive abilities, intellectual functioning), and other individual considerations (Andrews & Bonta, 2010). Accommodating these potential barriers to treatment has come to be termed the specific responsivity principle (Andrews & Bonta, 2010; Andrews, Bonta, & Hoge, 1990; Gendreau, 1996).
The notion of specific responsivity is not new to correctional treatment, however. During the 1960s and 1970s, the terms differential treatment and matching (e.g., Palmer, 1974, 2002; Reitsma-Street & Leschied, 1988; Warren, 1971, 1983) were used to recommend individualized approaches for specific types of offenders. In the broader psychotherapeutic literature, specific responsivity falls under the notion of prescriptive interventions (Goldstein, Glick, Irwin, Pask-McCartney, & Rubama, 1989). Although it is hardly a new recommendation for social service programs, specific responsivity is seldom incorporated into correctional practice or research (Bonta, 1995; Van Voorhis, 1997).
This neglect creates a problem for research and for prevailing understandings of the effectiveness of correctional interventions: it can mask the effects of specific interventions. Masking occurs when a controlled study shows no treatment effect even though the tested intervention was effective with some types of offenders and not with others; the overall findings are attenuated because the effects for the successes are cancelled out by the effects for the failures. Masked effects, which could also be termed interaction effects, have been noted among restitution programs as moderated by personal maturity (Heide, 1982) or moral development (Van Voorhis, 1985); community treatment programs for youth (moderated by personality and cognitive development; Neto, 1972; Palmer, 1974; Reitsma-Street & Leschied, 1988); Guided Group Interaction (moderated by personality; Empey & Lubek, 1971); and a number of juvenile interventions with respect to race (Leiber & Mawhorr, 1995; Lochman, Coie, Underwood, & Terry, 1993; Nangle, Erdley, Carpenter, & Newman, 2002). Perhaps the most consistently observed differential findings concern offender risk. Among intensive programs for offenders, research repeatedly notes that treatment effects are much stronger for high-risk offenders than for low-risk offenders (Andrews & Dowden, 2006; Andrews, Zinger, Hoge, et al., 1990; Brusman Lovins, Lowenkamp, Latessa, & Smith, 2007; Lipsey, 2009; Lowenkamp, Latessa, & Holsinger, 2006; Van Voorhis, Groot, & Ritchie, 2010).
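The masking argument can be made concrete with a small arithmetic sketch. The subgroup sizes and recidivism rates below are invented purely for illustration (they are not GCSE data): treatment lowers recidivism for one offender type and raises it by the same amount for another, so the pooled comparison shows no effect at all.

```python
# Hypothetical illustration of a masked (interaction) effect.
# All subgroup sizes and recidivism rates are invented for illustration only.

def pooled_rate(groups):
    """Overall recidivism rate for a list of (n, rate) subgroups."""
    total_n = sum(n for n, _ in groups)
    return sum(n * rate for n, rate in groups) / total_n

# Two equally sized offender types; treatment helps type A and harms type B.
treated = [(100, 0.30), (100, 0.50)]      # (type A, type B)
comparison = [(100, 0.40), (100, 0.40)]

overall_effect = pooled_rate(treated) - pooled_rate(comparison)
subgroup_effects = [round(t[1] - c[1], 3) for t, c in zip(treated, comparison)]

print(round(overall_effect, 3))  # 0.0: no overall treatment effect
print(subgroup_effects)          # [-0.1, 0.1]: opposite effects within types
```

A test of the treatment-by-type interaction would detect the opposing subgroup effects that the pooled comparison hides.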
For the most part, the literature on responsivity and differential effectiveness stops at the point of detecting such effects. The next step, employing such knowledge to design and empirically test programs that accommodate those differences, has been largely overlooked with the exception of a small cadre of programs for youth assessed into prescribed interventions during the 1960s and 1970s (see Palmer, 2002). Many of these fell out of favor due to their intensive assessment and resource requirements. Perhaps a more straightforward approach to accommodating individual differences might be seen in Andrews and Bonta’s (2010) recommendations regarding specific responsivity considerations for the three basic offender personality styles they considered to be most relevant to offender success—psychopathy, immaturity, and emotional distress. For example, the authors suggested that individuals with limited cognitive or interpersonal skill sets (including low levels of interpersonal maturity or IQ) should not be assigned to modes of service that are verbally and interpersonally complex, requiring self-reflection, sensitivity and empathy to others, complex vocabulary, or abstract reasoning. Similarly, highly anxious individuals are not helped by interventions noted for strong and confrontational interpersonal exchanges. Finally, those with antisocial and manipulative personalities need much structure, clear rules, and rewards and consequences. The authors also advocated for addressing age differences through developmentally appropriate services (Andrews & Bonta, 2010).
Even more straightforward, of course, are instances where offenders may simply be screened out of an inappropriate intervention on the basis of assessed risk, IQ, or another relevant attribute. Such recommendations appear in many of the directives for CBTs but often get lost in the confusion of implementing change on a large scale. In fact, there is no shortage of process evaluation studies that find significant proportions of the participants to be inappropriate for the intervention being tested (Van Voorhis, Cullen, & Applegate, 1995). Indeed, studies that disaggregate appropriate from inappropriate clients typically do find the forecasted differential outcomes (see Andrews, Bonta, & Hoge, 1990; Andrews, Zinger, Hoge, et al., 1990; Lipsey, 2009; Palmer, 2002).
Another approach to accommodating the differential needs of offenders is seen in work with mentally disordered offenders (MDOs). A number of studies found improvements in intermediate outcomes, such as social problem solving and coping skills (Clarke, Cullen, Walwyn, & Fahy, 2010; Tapp, Fellowes, Wallis, Blud, & Moore, 2009), and in arrest rates for both nonviolent and violent crimes (Ashford, Wong, & Sternbach, 2008). These studies, however, are small pilot studies. Moreover, several authors limit their findings to program completers, raising concerns about the comparability of the comparison and experimental groups. Nevertheless, modifications to the R&R curriculum to better accommodate the needs of MDOs represent an interesting development that is relevant to the issue of differential effectiveness and responsivity (see Young & Ross, 2007). Preliminary tests of the program, referred to as R&R2M, showed some improvements in intermediate outcomes. Again, however, the outcomes pertained to a small group of program completers (Young, Chick, & Gudjonsson, 2010).
Cognitive–Behavioral Interventions for Offenders
Notwithstanding their inattention to specific responsivity, a number of systematic reviews of the CBT research over the last 20 years have portrayed these programs in a favorable light. These meta-analyses reported reductions in recidivism as high as 50% (Hollin, 1999; Landenberger & Lipsey, 2005; Lipsey, Chapman, & Landenberger, 2001; McGuire, 2000), but more typically, and in the most recent meta-analyses, CBTs achieved reductions in recidivism that ranged from 2.6% to 26% (Aos, Miller, & Drake, 2006; Drake, Aos, & Miller, 2009; Lipsey, 2009; Lipsey et al., 2001; Lipsey, Landenberger, & Wilson, 2007; Pearson, Lipton, Cleland, & Yee, 2002; Tong & Farrington, 2007; D. B. Wilson, Bouffard, & MacKenzie, 2005). Many of these studies found that CBT programs were effective with a variety of offender populations including substance abusers (Aos et al., 2006; D. B. Wilson et al., 2005), sex offenders (Aos et al., 2006; Hall, 1995; Hanson et al., 2000), and women offenders (Andrews & Dowden, 1999). The effectiveness of CBT was not found to be dependent upon whether the program was given in institutional or community settings (Lipsey et al., 2001; Tong & Farrington, 2007) or whether the program recipients were juveniles or adults (Lipsey et al., 2001; Pearson et al., 2002).
Several CBT meta-analyses specifically examined R&R. Most of these showed that R&R was at least as effective as other types of CBT (Aos et al., 2006; Lipsey et al., 2001; MacKenzie, 2006; Pearson et al., 2002; Tong & Farrington, 2006) and that R&R significantly reduced recidivism by 8% to 15% (Pearson et al., 2002; Tong & Farrington, 2006; D. B. Wilson et al., 2005). Furthermore, R&R was demonstrated to be robust across institutional versus community settings, earlier versus more recent evaluations, voluntary versus nonvoluntary participants, and large- versus small-scale studies (Tong & Farrington, 2006). However, D. B. Wilson and colleagues (2005) warned that the overall effect sizes may have been favorably skewed by the very strong results for early, demonstration programs where evaluators were closely involved in delivering the programs and monitoring their fidelity. Several larger evaluations of statewide or nationwide implementation efforts have not found significant treatment effects (Robinson, 1995; Van Voorhis et al., 2004). For example, Phase I of the GCSE reported a treatment effect of only 4%.
Differential effects have also been found in evaluations of R&R: Johnson and Hunter (1992) and Robinson (1995) found interaction effects for age, and Robinson (1995) also found that R&R results were moderated by ethnicity. Finally, Ross, Fabiano, and Ross (1989) have long noted that R&R is not appropriate for low-functioning individuals, those with IQ scores lower than 80. The risk principle, the admonition that intensive correctional interventions are more appropriate for high-risk offenders than for low-risk offenders (Andrews & Bonta, 2010; Andrews, Bonta, & Hoge, 1990; Brusman Lovins et al., 2007; Lipsey, 2009; Lowenkamp et al., 2006; Van Voorhis et al., 2010), is especially pertinent to CBTs (Landenberger & Lipsey, 2005). However, one meta-analysis specific to R&R reported the program to be effective with both high- and low-risk offenders (Tong & Farrington, 2006).
The Georgia Cognitive Skills Program
As noted above, the GCSE evaluated the state’s unmodified implementation of R&R (Ross & Fabiano, 1985), a cognitive skills program that sought to modify offenders’ impulsive, egocentric, illogical, and rigid thinking patterns. In doing so, R&R shared clinical dimensions common to the larger field of cognitive–behavioral psychology, currently considered to be the dominant therapeutic paradigm in mental health (Dodson & Khatri, 2000). Specific objectives of the R&R program included improving offenders’ interpersonal problem-solving, consequential thinking, means–end reasoning, social perspective-taking, critical and abstract reasoning, and creative thinking (Ross et al., 1989). The program was structured into 35 lessons that covered seven key components: (a) problem solving, (b) creative thinking, (c) social skills, (d) management of emotions, (e) negotiation skills, (f) values enhancement, and (g) critical reasoning. The class sessions built upon each other so that new skills were presented along with opportunities to practice previously acquired skills. Activities included role-playing, thinking games, homework assignments, and group discussions. Against this background, the present exploratory study seeks to identify individual attributes that moderate the effectiveness of this cognitive–behavioral intervention. These include cognitive maturity (I-level), personality type, age, race, IQ, risk, prior violence, reading level, socioeconomic status (SES), marital status, employment, and education. As noted hereinbefore, 6 of these 12 attributes (cognitive maturity, personality, age, race, IQ, and risk) have been shown to moderate effectiveness in other treatment studies. On the basis of that research, we might hypothesize that poorer outcomes would be achieved by those with lower levels of cognitive maturity or neurotic (high-anxiety) personality attributes, and by young, minority, low-functioning (low IQ), or low-risk offenders.
However, only the findings regarding risk (Landenberger & Lipsey, 2005), IQ (Ross et al., 1989), and age (Johnson & Hunter, 1992; Robinson, 1995) apply specifically to cognitive–behavioral programs. As with studies of other correctional interventions, research guiding attempts to accommodate specific responsivity or prescriptive programming is scarce. Thus, the need for exploratory research is apparent.
Method
From July 1998 to April 2000, 25 parole districts across Georgia participated in Phase II of the GCSE. Across these districts, 44 treatment groups were run, each with its own experimental and comparison group. Selection criteria for the study pool sought high-risk offenders with IQ scores (Culture Fair; Cattell & Cattell, 1973) above 80, no history of sexual offending, at least 16 months remaining on their parole term, and no substance abuse problems so severe as to interfere with the offender’s ability to attend or comprehend the program. With the exception of the IQ scores, these screening decisions were not aided by formal assessments. 2
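The selection criteria above amount to a conjunction of five conditions. The sketch below restates them as a screening predicate; the record layout and field names are hypothetical (they are not GBPP or OTIS fields), and, as the text notes, only the IQ criterion was actually backed by a formal assessment.

```python
from dataclasses import dataclass


@dataclass
class Parolee:
    # Hypothetical record layout; field names are illustrative only.
    high_risk: bool
    iq: int                       # standardized Culture Fair IQ score
    sex_offense_history: bool
    months_remaining: int         # months left on the parole term
    severe_substance_abuse: bool  # severe enough to block attendance/comprehension


def eligible(p: Parolee) -> bool:
    """Apply the Phase II study-pool selection criteria described in the text."""
    return (
        p.high_risk
        and p.iq > 80
        and not p.sex_offense_history
        and p.months_remaining >= 16
        and not p.severe_substance_abuse
    )


print(eligible(Parolee(True, 95, False, 20, False)))  # True
print(eligible(Parolee(True, 80, False, 20, False)))  # False: IQ must exceed 80
```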
Research Design
The lists of eligible parolees were then sent to the central offices in Atlanta, where parolees were randomly assigned to the experimental or comparison group. Randomization was blocked by time period and parole district: within each district and time period, subjects were randomly assigned to the two conditions. Therefore, experimental (E) and comparison (C) cases within each district were subject to similar parole supervision policies and officers. However, a randomized block design that also blocked on categories of the test variables listed above was not possible given the presence of 12 such variables. Some would assert that the resulting post hoc subgroup analysis increases the chance of Type I errors and unknown biases (Ariel & Farrington, 2010). It will be seen, however, that the equivalence between the experimental and comparison groups within categories of the test variables is strong (see Table 1). In addition, a multivariate approach was used to control for sources of nonequivalence attributable to available variables. The design followed an intent-to-treat (ITT) approach; noncompleters were retained in the experimental group.
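The design described above randomizes within district-by-period blocks, and the Type I error concern refers to running many subgroup tests. The sketch below illustrates both ideas under stated assumptions: the text does not say how cases were allocated within a block, so shuffling and then alternating E/C (roughly 1:1 allocation) is an assumption, and the familywise calculation treats the tests as independent, which the 12 moderator tests in this study are not.

```python
import random
from collections import defaultdict


def blocked_assignment(cases, seed=None):
    """Randomly assign cases to 'E' or 'C' separately within each (district, period) block.

    `cases` is a list of (case_id, district, period) tuples. Within each block the
    case ids are shuffled and then labeled alternately, so E and C sizes differ by
    at most one per block (an assumed 1:1 allocation, not the study's documented rule).
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for case_id, district, period in cases:
        blocks[(district, period)].append(case_id)
    assignment = {}
    for ids in blocks.values():
        rng.shuffle(ids)
        for i, case_id in enumerate(ids):
            assignment[case_id] = "E" if i % 2 == 0 else "C"
    return assignment


def familywise_error(alpha, k):
    """Probability of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k


# Twelve moderator tests at alpha = .05, if they were independent:
print(round(familywise_error(0.05, 12), 2))  # 0.46
```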
Table 1. Comparison of the Experimental and Control Group Parolees Across Background Demographic and Social Characteristics Collected at Prison Intake (n = 940)
Note. Category used as the reference category in the survival analysis.
The comparison group members also participated in treatment programs. In an earlier survey of Phase I participants, more than 70% of experimental and comparison group members were enrolled in substance abuse programs; fewer (approximately 20%) participated in employment and educational programs. Participation levels were similar for the experimental and comparison groups. At the time, R&R was GBPP’s only cognitive–behavioral intervention, and it was available only to experimental participants. Simply put, the state wished to determine whether R&R could improve on the effectiveness of its current treatment approach.
Sample Attrition Issues
A total of 963 parolees were randomly assigned to either the treatment group (n = 470) or the comparison group (n = 493), after excluding 23 experimental parolees who inexplicably did not begin the program. Unfortunately, this number dropped to 459 experimental cases and 481 comparison cases (n = 940, total) when the project encountered problems obtaining data on all demographic, prior history, and outcome measures. The number of cases available for a comparison of recidivism rates between the experimental and comparison groups was therefore higher (n = 963) than the number available for analyses of differential effects, which required measures of offender characteristics (n = 940). The 23 cases omitted from the latter analyses were missing data on most of the independent measures other than group (E/C) membership. For the remaining 940 cases, data were missing for measures of employment status (1.0%), social class (2.7%), education (0.4%), marital status (0.6%), and personality type (14.0%). These missing values were replaced through a multivariate (logistic regression) procedure that imputed missing values, a procedure asserted to be appropriate when the proportion of missing cases does not exceed 20% (Sijtsma & van der Ark, 2003).
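The imputation step can be sketched as follows for a single 0/1 variable: fit a logistic regression on the complete cases, then fill each missing value with the predicted class. This is a minimal single-imputation sketch under assumptions the text does not state (the predictor set, the 0.5 classification threshold, and plain gradient-descent fitting are all illustrative choices). A four-category variable such as personality type would need a multinomial model, and multiple imputation is a common alternative to the deterministic fill shown here. The 20% ceiling mirrors the guideline cited in the text.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def missing_share(rows, var):
    """Proportion of rows whose value for `var` is missing (None)."""
    return sum(r[var] is None for r in rows) / len(rows)


def impute_binary(rows, target, predictors, max_missing=0.20):
    """Fill missing 0/1 values of `target` from a model fit on the complete rows."""
    if missing_share(rows, target) > max_missing:
        raise ValueError(f"more than {max_missing:.0%} of '{target}' is missing")
    complete = [r for r in rows if r[target] is not None]
    w = fit_logistic([[r[p] for p in predictors] for r in complete],
                     [r[target] for r in complete])
    for r in rows:
        if r[target] is None:
            z = w[0] + sum(wj * r[p] for wj, p in zip(w[1:], predictors))
            r[target] = int(sigmoid(z) >= 0.5)  # assumed 0.5 threshold
    return rows
```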
Attrition during the program is a problem for many correctional programs, and this study was no exception. However, the 190 experimental participants (40.4% of the experimental group) who did not complete the program remained in the experimental group for these analyses. An attrition analysis (logistic regression) regressed a dichotomous program completion measure on measures of age, race, social class, risk, history of violence, IQ, reading level, personality, cognitive maturity, and marital status. Significant (p < .05) predictors of failing to complete the program were being middle class, medium-to-high risk, highly anxious, and aged 23 to 27.
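Keeping noncompleters in the experimental group is the intent-to-treat principle. The sketch below contrasts an ITT estimate with a completer-only estimate. The records are invented for illustration, and the record keys (`group`, `completed`, `returned`) are hypothetical. Only the 190-of-470 attrition figure comes from the text. Because attrition was related to risk and anxiety, a completer-only comparison would no longer compare randomly equivalent groups.

```python
def recidivism_rate(records):
    """Share of records with a return to prison during follow-up."""
    return sum(r["returned"] for r in records) / len(records)


def compare_estimates(records):
    """Intent-to-treat vs. completer-only recidivism rates for the experimental group."""
    experimental = [r for r in records if r["group"] == "E"]
    comparison = [r for r in records if r["group"] == "C"]
    completers = [r for r in experimental if r["completed"]]
    return {
        "E_itt": recidivism_rate(experimental),  # every randomized E case counts
        "E_completers": recidivism_rate(completers),
        "C": recidivism_rate(comparison),
    }


# Attrition rate reported in the text: 190 noncompleters among 470 experimental cases.
attrition_pct = round(100 * 190 / 470, 1)  # 40.4

# Hypothetical records (invented): 4 completers, 2 noncompleters, 4 comparison cases.
records = (
    [{"group": "E", "completed": True, "returned": r} for r in (1, 0, 0, 0)]
    + [{"group": "E", "completed": False, "returned": 1} for _ in range(2)]
    + [{"group": "C", "completed": False, "returned": r} for r in (1, 1, 0, 0)]
)
estimates = compare_estimates(records)
# E_itt = 0.5 and C = 0.5 (no effect), but E_completers = 0.25 suggests one.
```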
Program Fidelity
The facilitators’ handbook (Ross et al., 1989), in addition to intensive training of facilitators (coaches), attempted to foster program fidelity or strong adherence to the program design and key components of the cognitive–behavioral tradition. Fifty Georgia parole officers were trained by the program creators, Elizabeth Fabiano and Frank Porporino. Classes were held in meeting rooms of district parole offices and the majority were co-facilitated.
Program fidelity was monitored primarily through observer assessments conducted by trained administrative staff from the GBPP central office. Observation forms rated classroom atmosphere, facilitators’ (coaches’) organization and professional demeanor, classroom structure, and skill in dealing with difficult participants. An observer visited each program twice during the course of the study and provided feedback to the cognitive coaches on these and other matters pertaining to program integrity. The observers generally gave facilitators high ratings on levels of class participation, respect for participants, and skill in identifying the cognitive processes of clients. The facilitators’ maintenance of class organization and structure appeared to vary across classes, but the aggregate ratings were nevertheless quite high. Moreover, observer ratings, which were fairly high on the first visit, improved further by the second observation, following the observers’ feedback to program facilitators.
Data on program fidelity were also available from coaches’ observation forms and from participants. These data were used primarily for research purposes. Participants rated classes on 21 items that formed three scales: coaches’ adherence to social learning methods, relevance of skills, and group climate. Participant ratings were generally high across all scales, although adherence to social learning methods and relevance of the learned skills were rated somewhat higher than group climate. Coaches rated classroom participants at five points during each R&R program on level of participation, level of enjoyment, level of understanding, use of previously taught skills, and group atmosphere. Responses on these dimensions (averaged across the five evaluations) were favorable for approximately 75.0% of the participants (range across scales, 70.4%-76.8%). The proportion of participants reported to be using previously taught skills increased over the program, from 71.0% at Time 1 to 94.3% at Time 5.
Notwithstanding generally favorable class ratings, concerns were raised about the 40.4% attrition rate (discussed above), class size, and limited attention to program screening criteria. Class sizes ranged from 6 to 28 (M = 12; median = 10) on the first day of the program and from 2 to 20 on the 30th day. Further details are offered in the process report for this study (Spruance, Van Voorhis, Johnson-Listwan, Pealer, & Seabrook, 2002). In this article, the fidelity measures are used only to describe the program. In a subsequent article, we incorporate them into a separate study that empirically examines within-group differences for the experimental group.
Sample
Table 1 describes the 940 study participants who are included in the multivariate analyses. The study participants were predominantly non-White (70.2%). Their average age was 32.0 years. All were male. At the time of their arrests, 54.0% were employed on a full-time basis. At prison intake, 41.0% were rated as middle class, and 63.8% were single (never married). The majority of the parolees (66.5%) had less than a high school education. Notwithstanding the GBPP directives to exclude parolees with low IQ scores, 6.1% of the participants scored 80 or below on the Culture Fair Test (Cattell & Cattell, 1973). Most of the sample (71.5%) read at or above the fifth-grade level.
JI (Jesness, 1996) results placed 61.3% of the parolees at I-level 2 or 3, indicating that cognitive and interpersonal maturity was atypically low, even for a criminal population. JI personality diagnoses classified the parolees as (a) aggressive, 30.2% (i.e., antisocial values, internalized criminal lifestyle); (b) neurotic, 15.5% (i.e., high anxiety); (c) dependent, 29.5% (i.e., easily led, immature); and (d) situational, 24.8% (i.e., prosocial value system, with crime associated with adversity, poor coping skills, and perhaps substance abuse).
Offender risk and prior offense attributes are shown in Table 2. Most participants (79.0%) scored medium-to-high risk on the risk scale developed for this study. All but two participants had a prior felony on record, and 58.6% had served at least one prior prison term. At least one prior conviction for a violent offense characterized 412 participants (43.8%). Prior convictions for property offenses, drug sales, and drug possession were found for 71.8%, 44.5%, and 64.8% of the sample, respectively. Finally, 32 participants (3.4%) had committed at least one prior sex crime, even though the screening criteria attempted to exclude such individuals.
Table 2. Comparison of the Experimental and Control Group Parolees Across Criminal History Characteristics Collected at Prison Intake (n = 940)
Note. *p < .05.
Generally, percent distributions were similar for the experimental and comparison groups. There were no significant differences between the comparison and experimental groups on any of the background measures in Table 1. Among the criminal history measures in Table 2, however, a modest difference in the number of prior property offenses was statistically significant (p ≤ .05). Overall, the randomization procedures appear to have been adhered to. However, screening of participants into the study pool was not optimal, as revealed by the presence of sex offenders, low-risk offenders, and those with lower IQ scores. 3
Data Sources and Measures
At the time of the present study, GBPP and the Georgia Department of Corrections (GDOC) maintained most of their offender data on the Georgia Offender Tracking Information System (OTIS). Data obtained during prison intake and subsequently entered into OTIS provided measures of the social and demographic background characteristics reported above. OTIS also furnished data pertaining to criminal histories (e.g., prior incarcerations, felonies, and violent offenses) along with reading (Wide-Range Achievement Test [WRAT]) and IQ (Culture Fair; Cattell & Cattell, 1973) scores. Independent measures included the focal group assignment variable (experimental vs. comparison), as well as individual characteristics that were used as covariates. These individual attributes consisted of age, race, social class, marital status, prearrest employment status, prior violence, education, reading level, IQ scores, risk, interpersonal maturity level, and personality.
Age
Age was not found to be linearly associated with recidivism. It was therefore represented by a series of dummy variables depicting the participant’s age at the beginning of participation in the cognitive–behavioral program: (a) 18-22, (b) 23-27, (c) 28-32, (d) 33-37, and (e) 38+. Although age is not a frequent subject of this research, at least two studies involving R&R found age effects. Robinson (1995) found the program to be least effective among offenders younger than 25 or older than 39. Similarly, Johnson and Hunter (1992) reported the program to be least effective among offenders younger than 30.
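Representing a nonlinear predictor by category dummies can be sketched as follows. The five age bands come from the text; which band served as the reference category is not stated in this section, so the default reference of 18-22 is a hypothetical choice. The sketch also assumes ages are recorded in whole years.

```python
# Age bands from the text: (low, high, label); high=None means open-ended.
AGE_BINS = [(18, 22, "18-22"), (23, 27, "23-27"), (28, 32, "28-32"),
            (33, 37, "33-37"), (38, None, "38+")]


def age_group(age):
    """Return the age-band label for an age in whole years."""
    for low, high, label in AGE_BINS:
        if age >= low and (high is None or age <= high):
            return label
    raise ValueError(f"age {age} is below the youngest band")


def age_dummies(age, reference="18-22"):
    """One 0/1 indicator per band, omitting the (assumed) reference band."""
    group = age_group(age)
    return {label: int(group == label) for _, _, label in AGE_BINS if label != reference}


print(age_dummies(30))  # {'23-27': 0, '28-32': 1, '33-37': 0, '38+': 0}
```

Each band gets its own coefficient, so the model can capture, for example, a treatment effect concentrated in the 28-32 group without assuming a straight-line relation to age.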
Race
Race was a dichotomous measure of “White” and “non-White” (non-White = 1). The non-White group was predominantly African American; only two participants were Native American and one was Hispanic. Explorations of whether race moderates the effects of correctional interventions are uncommon, and results across studies are equivocal. For example, although S. J. Wilson, Lipsey, and Soydan (2003) noted few studies on the topic, they reported no significant differences between Whites and minorities; earlier, however, Garrett’s (1985) meta-analysis of residential treatment programs reported some unspecified racial interaction effects. Most studies of racial effects involve juvenile samples, and some find African American youth achieving considerably smaller treatment gains than Whites (e.g., Hudley & Graham, 1993; Leiber & Mawhorr, 1995; Lochman et al., 1993), but one found African American youth to have more favorable outcomes (Gordon, Moriarty, & Grant, 2000).
Socioeconomic Status
Participants’ socioeconomic status was ascertained by prison intake counselors. GDOC personnel did not use an established index for this purpose but were guided by the following criteria: (a) welfare (receiving some form of public assistance at the time of arrest, regardless of other income), (b) occasionally employed, (c) minimum standard (annual income meets the government’s Minimum Standard of Living), (d) middle class (living on more than the Minimum Standard of Living and having some resources, such as property, savings, or investments), and (e) other. The measure was collapsed into a dummy variable indicating whether participants were middle class.
Marital Status
Marital status was coded by GDOC into seven categories and then collapsed for multivariate analysis into a dichotomous variable indicating married or nonmarried status. Although a number of studies report that for men, marriage appears to be a source of criminal desistance (see Horney, Osgood, & Marshall, 1995; Laub & Sampson, 1993, 2003), there is less evidence to suggest that marriage interacts with treatment condition.
Employment and Education
Employment and education measures reflected parolees’ self-reported characteristics at prison intake. Employment status was originally recorded in eight categories: employed full-time, employed part-time, unemployed for less than 6 months, unemployed for 6 months or more, never had a job but was capable of working, student, incapable of working, and other. For purposes of analyses, the measure was collapsed into five categories (employed full-time, employed part-time, unemployed for less than 6 months, unemployed for 6 months or more, and other), represented by four dummy variables. An education measure noted the highest degree or certificate of educational attainment, or the highest grade completed if the participant had not finished high school. The measure was collapsed into a dummy variable indicating whether the participant had completed high school or earned a GED.
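Collapsing eight intake categories into five analytic categories and then into four dummies (one category held out as the reference) can be sketched as below. The text lists both sets of categories but does not spell out how the last four intake codes were grouped or which category was the reference; folding them into "other" and using full-time employment as the reference are assumptions, and the short labels are hypothetical.

```python
# Assumed mapping from the eight intake categories to the five analytic ones.
COLLAPSE = {
    "employed full-time": "full-time",
    "employed part-time": "part-time",
    "unemployed < 6 months": "unemployed <6m",
    "unemployed >= 6 months": "unemployed 6m+",
    "never had a job, capable of working": "other",  # assumption
    "student": "other",                              # assumption
    "incapable of working": "other",                 # assumption
    "other": "other",
}
ANALYTIC = ["full-time", "part-time", "unemployed <6m", "unemployed 6m+", "other"]


def employment_dummies(intake_code, reference="full-time"):
    """Four 0/1 indicators for the five analytic categories (reference omitted)."""
    collapsed = COLLAPSE[intake_code]
    return {cat: int(collapsed == cat) for cat in ANALYTIC if cat != reference}


print(employment_dummies("student"))
# {'part-time': 0, 'unemployed <6m': 0, 'unemployed 6m+': 0, 'other': 1}
```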
Intelligence
The Culture Fair Intelligence Test (Cattell & Cattell, 1973) yielded raw scores that were converted to standardized IQ scores. These were then dichotomized at one standard deviation below the mean; thus, low IQ was a score of 85 or lower. The Culture Fair Intelligence Test is a short pencil-and-paper assessment designed to avoid biases attributable to reading skills or ethnocentric vocabulary. Reading levels were measured by the WRAT (Reid, 1986; Reynolds, 1986). Scores are reported by grade level and number of months in a grade. For purposes of these multivariate analyses, reading was dichotomized at the fifth-grade reading level. The cut-points for both the intelligence and the reading scores reflect the proficiency levels recommended for entry into the R&R program (Ross et al., 1989).
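The two cut-points can be written out as a short sketch. It assumes an IQ metric with mean 100 and SD 15, the values implied by the text's cut-off of 85 (one SD below the mean), and it assumes the WRAT grade equivalent is expressed as a decimal grade (e.g., 4.9 for fourth grade, ninth month). The raw-to-standard conversion itself relies on test norms and is not reproduced here.

```python
IQ_MEAN, IQ_SD = 100, 15                # assumed metric, consistent with the 85 cut
LOW_IQ_CUT = IQ_MEAN - 1.0 * IQ_SD      # one SD below the mean = 85


def low_iq(iq_score):
    """1 if the standardized IQ is at or below the cut (85 or lower), else 0."""
    return int(iq_score <= LOW_IQ_CUT)


def reads_at_fifth_grade(grade_equivalent):
    """1 if the WRAT grade equivalent (e.g., 4.9, 5.0) is fifth grade or higher."""
    return int(grade_equivalent >= 5.0)


print(LOW_IQ_CUT, low_iq(85), low_iq(86))                       # 85.0 1 0
print(reads_at_fifth_grade(4.9), reads_at_fifth_grade(5.0))     # 0 1
```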
Risk Scores
As noted earlier, previous research on cognitive interventions often finds intensive programs to be more effective for high-risk offenders than for low-risk offenders. However, the risk effect was not found in previous studies of R&R among parolees (Robinson, 1995; Van Voorhis et al., 2004) or other correctional populations (Tong & Farrington, 2006). Because risk scores assessed by the GBPP were unavailable for 29.4% of all participants, a risk score was constructed that approximated the Salient Factor Score (SFS) of the U.S. Parole Commission (Hoffman, 1994), using static variables available through OTIS. This required modifying three of the SFS factors: (a) history of substance abuse was substituted for the SFS heroin/opiate dependence measure, (b) the SFS measure of a recent commitment-free period of 3 years or more was not available and was omitted from our scale, and (c) measures of juvenile offenses were not available on OTIS, so all measures of prior convictions pertained to adult records. We therefore used the existing data to construct the following seven items: (a) prior adult convictions (0-3 pts), (b) prior adult incarcerations (0-2 pts), (c) age at first commitment (0-2 pts), (d) offense type (0-1 pts), (e) prior parole results (0-1 pts), (f) drug use (0-1 pts), and (g) employment (0-1 pts), for a possible total of 11 points. The scale constructed for this study showed construct and predictive validity. It was significantly and strongly related to the GBPP score among those who had a risk score on record (γ = .81, p ≤ .001) and to offender recidivism (returns to prison) over the course of the follow-up period (r = −.35, p ≤ .001). The scale was dichotomized into low/medium risk (0-6 on the scale) and high risk (7-11 on the scale; high risk = 1). Similar researcher-constructed measures have been used by others (e.g., Brusman Lovins et al., 2007; Lowenkamp & Latessa, 2004; Van Voorhis et al., 2004).
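The additive structure of the researcher-constructed scale can be sketched as follows. The item maxima and the 7-point high-risk cut come from the text. The item keys are hypothetical names, and the rules for converting raw OTIS fields into item points are not described here, so the function takes the item points as given and only checks their ranges.

```python
# Maximum points per item, as listed in the text (possible total = 11).
ITEM_MAX = {
    "prior_adult_convictions": 3,     # hypothetical key names
    "prior_adult_incarcerations": 2,
    "age_at_first_commitment": 2,
    "offense_type": 1,
    "prior_parole_results": 1,
    "drug_use": 1,
    "employment": 1,
}
HIGH_RISK_CUT = 7  # 0-6 = low/medium, 7-11 = high


def risk_score(points):
    """Sum the seven item scores after checking each lies within its allowed range."""
    total = 0
    for item, max_pts in ITEM_MAX.items():
        pts = points[item]
        if not 0 <= pts <= max_pts:
            raise ValueError(f"{item} must be between 0 and {max_pts}, got {pts}")
        total += pts
    return total


def high_risk(points):
    """Dichotomized risk: 1 for scores of 7-11, 0 for scores of 0-6."""
    return int(risk_score(points) >= HIGH_RISK_CUT)


example = {"prior_adult_convictions": 3, "prior_adult_incarcerations": 2,
           "age_at_first_commitment": 1, "offense_type": 0,
           "prior_parole_results": 1, "drug_use": 0, "employment": 0}
print(risk_score(example), high_risk(example))  # 7 1
```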
Psychological Classifications
Psychological classifications of offenders into three levels of Interpersonal Maturity and four personality types (Van Voorhis, 1994; Warren, 1971) were obtained from the JI (Jesness, 1996). GBPP staff administered the 155-item, true/false, paper-and-pencil inventory to the experimental and comparison groups at the beginning of the Cognitive Skills Program. Study participants were characterized on two psychological dimensions of the JI: (a) interpersonal maturity and (b) personality subtype. Building on the theoretically derived I-level system developed by Sullivan, Grant, and Grant (1957) and later by Warren (1971), the JI placed offenders into one of three levels of Interpersonal Maturity. I-levels 2 through 4 describe how individuals perceive the emotions and motivations of others and provide insight into the complexity of their thought processes. The authors of the I-level typology also identified 13 personality subtypes among juvenile offenders (Warren, 1971); the JI furnished scales measuring nine of these subtypes. Research among adults has since found that the nine subtypes can be further collapsed into four personality types: aggressive, neurotic (high anxiety), dependent, and situational (Van Voorhis, 1994). Descriptions of each of the levels and personality types are provided in Figure 1. Each of the four personality types was represented as a dummy variable in the analyses, and I-level was collapsed into a single dummy variable (I4 = 1; I2 and I3 = 0). Although I-level data were available for all participants, double scoring and missing items affected the computerized scoring of the personality scales, so that personality measures were initially available for only 86.0% (n = 808) of the 940 parole participants. Scoring rules were developed to complete missing items for inventories with fewer than four missing items. 3 Seventy-five percent of the affected inventories (n = 99) were corrected in this manner.
When this was not possible, missing personality-type values for the remaining inventories were imputed through multivariate procedures.
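The dummy coding described above might look like the following sketch. Using the situational type as the omitted reference category follows the table notes in the Results section. The eligibility check encodes only the "fewer than four missing items" threshold, because the completion rules themselves are not described here. Function names are hypothetical.

```python
PERSONALITY_TYPES = ("aggressive", "neurotic", "dependent", "situational")


def ji_dummies(i_level: int, personality: str, reference: str = "situational") -> dict[str, int]:
    """Code JI results as 0/1 indicators.

    I-level: I4 = 1, I2 and I3 = 0.
    Personality: one indicator per type, omitting the reference category.
    """
    if i_level not in (2, 3, 4):
        raise ValueError(f"unexpected I-level: {i_level}")
    if personality not in PERSONALITY_TYPES:
        raise ValueError(f"unexpected personality type: {personality}")
    row = {"i4": 1 if i_level == 4 else 0}
    for ptype in PERSONALITY_TYPES:
        if ptype != reference:
            row[ptype] = 1 if personality == ptype else 0
    return row


def eligible_for_item_completion(n_missing: int) -> bool:
    """True when an inventory has between one and three missing items."""
    return 1 <= n_missing <= 3


if __name__ == "__main__":
    print(ji_dummies(3, "neurotic"))
    print(eligible_for_item_completion(2), eligible_for_item_completion(4))
```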

Figure 1. Summary of Interpersonal Maturity Levels and Personality Types (Harris, 1988; Jesness, 1988; Warren, 1969)
Outcome Data Sources
The recidivism measure was an indicator of returns to prison, extracted from OTIS. These data were obtained for all 940 participants, covering the period from the beginning of each participant's involvement in the study to the end of our data collection period. Because the GBPP conducted the cognitive classes at different times throughout the evaluation, and staff extracted prison readmission data at a single point in time (October 2001), the length of available follow-up varies: Parolees in early groups have longer follow-up periods than those who entered the study later. Specifically, data on returns to prison were available for all experimental and comparison group members for up to 12 months after the program end dates. Data for subsequent time periods (15, 18, 21, 24, 27, 30, and 33 months) characterize smaller proportions of the sample; at the 33-month follow-up period, return-to-prison data were available for only 122 parolees, or 12.2% of the sample. As discussed below, event history analysis was used to accommodate these differing follow-up periods. Thus, the follow-up measures indicated, for each of 12 time periods, whether a participant returned to prison (y/n): the period during which the participant was in the program, followed by 11 three-month follow-up periods. Data were also available on intermediate outcomes (offender attitudes), technical violations, employment, and arrests. For ease of presentation, we focus only on the returns-to-prison measure, which followed offenders for a longer period of time than the other measures.
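Discrete-time event history analysis is usually run on a person-period file, in which each participant contributes one row for every time period observed. The sketch below shows that expansion. It assumes period 0 is the in-program period and periods 1 through 11 are the 3-month follow-up intervals; the field names are hypothetical.

```python
def person_periods(person_id: int, last_period: int, returned: bool) -> list[dict[str, int]]:
    """Expand one participant into person-period rows.

    last_period: last period observed (0 = in program, 1-11 = 3-month follow-ups).
    returned: True if the return to prison occurred in last_period;
              False if observation simply ended there (censored).
    """
    if not 0 <= last_period <= 11:
        raise ValueError("last_period must be between 0 and 11")
    rows = []
    for period in range(last_period + 1):
        event = 1 if returned and period == last_period else 0
        rows.append({"id": person_id, "period": period, "event": event})
    return rows


if __name__ == "__main__":
    # Hypothetical parolee returned to prison in the third follow-up period.
    for row in person_periods(101, 3, True):
        print(row)
    # Hypothetical parolee observed through all 33 months without a return.
    print(len(person_periods(102, 11, False)))
```

Participants who entered the study later simply contribute fewer rows. This is how the method accommodates unequal follow-up without discarding anyone.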
Data Analysis
The effects of the cognitive program on the likelihood of readmission to prison were estimated with discrete-time event history analysis (Allison, 1984; Willett & Singer, 1993). This procedure estimates the probability of a return to prison while accounting for the different lengths of time that participants were observed and thus “at risk” for the event to occur. The curves shown in Figures 2 through 7 represent the predicted cumulative failure rates (the complement of the cumulative survival function, 1 − S(t)) for each group by time (months) from the beginning of the R&R program. Cases left the risk set when the participant was returned to prison and were censored when the available follow-up window expired.
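To illustrate how period-specific hazards turn into the cumulative failure curves described above, the sketch assumes a logit-link discrete-time hazard with one intercept per period plus a single treatment coefficient. It then computes F(t) = 1 − S(t), where S(t) is the product of (1 − h_j) over all periods up to t. The coefficient values are illustrative placeholders, not the estimates in Table 3.

```python
import math


def hazard(alpha_t: float, beta: float, treated: int) -> float:
    """Logit-link probability of returning to prison in one period,
    given no return in any earlier period."""
    return 1.0 / (1.0 + math.exp(-(alpha_t + beta * treated)))


def cumulative_failure(alphas: list[float], beta: float, treated: int) -> list[float]:
    """Cumulative failure F(t) = 1 - S(t) after each period."""
    survival = 1.0
    curve = []
    for alpha_t in alphas:
        survival *= 1.0 - hazard(alpha_t, beta, treated)
        curve.append(1.0 - survival)
    return curve


if __name__ == "__main__":
    alphas = [-3.0] + [-2.5] * 11  # placeholder intercepts: in-program + 11 follow-ups
    beta = -0.15                    # placeholder treatment coefficient
    comparison = cumulative_failure(alphas, beta, treated=0)
    experimental = cumulative_failure(alphas, beta, treated=1)
    print(f"comparison: {comparison[-1]:.3f}, experimental: {experimental[-1]:.3f}")
```

An interaction model would add terms such as `beta_int * treated * high_risk` inside the linear predictor. The curve calculation itself stays the same.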

Figure 2. Returns to Prison by Experimental Condition and Time (Main Effects Model; n = 940)
Figure 3. Returns to Prison by Experimental Condition and Time, Interaction Between Risk Level and Experimental Condition (n = 940)
Figure 4. Returns to Prison by Experimental Condition and Time, Interaction Between Neurotic Personality Type (Jesness Inventory) and Experimental Condition (n = 940)
Figure 5. Returns to Prison by Experimental Condition and Time, Interaction Between Dependent Personality Type (Jesness Inventory) and Experimental Condition (n = 940)
Figure 6. Returns to Prison by Experimental Condition and Time, Interaction Between Race and Experimental Condition (n = 940)
Figure 7. Returns to Prison by Experimental Condition and Time, Interaction Between Age and Experimental Condition (n = 940)
The analyses first compared the overall results for participants in the cognitive and comparison groups. We then examined differential treatment effects, that is, whether some types of participants benefited more from the cognitive program than others. As noted above, the individual covariates were age, race, social class, risk, marital status, prearrest employment status, education, prior violence, interpersonal maturity level, personality, reading level, and IQ. Interaction terms (individual characteristic × treatment condition [e/c]) were computed and entered into the event history analyses, along with the corresponding noninteractive (additive) terms, to partial out the effects of the individual-level characteristics. The significance of each interaction term was assessed with a decrement-to-chi-square test. Interaction models showing a significant improvement (p ≤ .05) over the more parsimonious additive model were taken to indicate that the attribute moderated the effects of the cognitive program. Interactions significant at p < .10 are also discussed for purposes of illustration. The decision to report results between p = .05 and p = .10, rather than adhere strictly to the p ≤ .05 standard, reflected concern for the statistical power of some of the tests and seemed appropriate to the exploratory nature of the study. Even though the sample was quite large, the analysis produced some small subgroups in which substantial differences did not reach significance at p ≤ .05.
Concern for the statistical power of each model also led us to test single interaction terms rather than all possible interactions simultaneously. For factors with more than two categories, each category was tested separately for its interaction with the treatment condition. For clarity of presentation, however, only interactions whose decrement-to-chi-square tests reached p ≤ .10 are shown in Figures 3 through 7.
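The decrement-to-chi-square comparison can be sketched as a likelihood-ratio test between the additive model and the same model plus one interaction term. Because interaction terms were added one at a time, the sketch assumes one degree of freedom. With one degree of freedom, the chi-square upper-tail probability equals erfc(sqrt(x / 2)), which avoids needing a statistics library. The log-likelihood values in the usage example are hypothetical.

```python
import math


def decrement_to_chi_square(loglik_additive: float, loglik_interaction: float) -> tuple[float, float]:
    """Compare an additive model with the same model plus one interaction term.

    Returns the chi-square statistic (the drop in -2 log-likelihood) and its
    p-value on 1 degree of freedom.
    """
    statistic = max(0.0, 2.0 * (loglik_interaction - loglik_additive))
    p_value = math.erfc(math.sqrt(statistic / 2.0))  # chi-square(1) upper tail
    return statistic, p_value


def classify(p_value: float) -> str:
    """Apply the reporting thresholds described in the text."""
    if p_value <= 0.05:
        return "significant moderator"
    if p_value <= 0.10:
        return "approaching significance (illustrative)"
    return "not reported"


if __name__ == "__main__":
    stat, p = decrement_to_chi_square(loglik_additive=-1502.4, loglik_interaction=-1499.1)
    print(f"chi-square = {stat:.2f}, p = {p:.3f}, {classify(p)}")
```

Testing several interaction terms jointly would need a chi-square distribution with more degrees of freedom, for example from a statistics package.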
Results
Overall Treatment Effects
Results of the event history analysis for parolees randomly assigned to the experimental and comparison groups are shown in Table 3 and Figure 2. This main effects model, the primary test in most program evaluations, indicates the overall effect of the program. As can be seen in Figure 2, the estimated recidivism rate for the comparison group was only slightly higher than that for the experimental group, indicating that the program did not meaningfully reduce the recidivism of its participants. This pattern held throughout all 12 time periods. By the end of the 33-month period, the predicted return-to-prison rates for the experimental and comparison groups were 40.4% and 44.1%, respectively. These rates derive from the logistic regression equation shown in Table 3. The 3.7-percentage-point difference between the two groups at 33 months following program completion was not statistically significant. These results were similar to those found for Phase I of the GCSE (see Van Voorhis et al., 2004). Results on the other follow-up measures also showed nonsignificant differences between the experimental and comparison groups. 5
Table 3. Returns to Prison by Experimental Condition and Time (Main Effects Model; n = 940)
Note. Omitted variables (reference categories) are (a) during-program time period, (b) age 38 and older, (c) employment other, (d) situational.
*p ≤ .10. **p ≤ .05. ***p ≤ .01. ****p ≤ .001.
Differential Effects
Logistic regression equations testing whether individual attributes moderated the effects of the R&R program are shown in Table 4. No significant interactions were found for social class, marital status, prearrest employment status, education, reading level, IQ, prior violence, or interpersonal maturity level. However, decrement-to-chi-square tests were statistically significant (p ≤ .05) for the interaction of risk with experimental condition and for the interaction of neurotic personality type with experimental condition. In addition, the interactions of experimental condition with age (p = .07), race (p = .07), and dependent personality type (p = .08) approached significance. Recidivism curves for these findings are shown in Figures 3 through 7 and interpreted below.
Table 4. Returns to Prison by Experimental Condition and Time and Interaction Effects (n = 940)
Note. Omitted variables (reference categories) are (a) during-program time period, (b) age 38 and older, (c) employment other, (d) situational.
*p ≤ .10. **p ≤ .05. ***p ≤ .01. ****p ≤ .001.
Risk by Experimental Condition
Figure 3 shows that treatment effects were significantly more favorable for high-risk parolees than for medium- and low-risk parolees. By 33 months, high-risk experimental participants showed an estimated recidivism rate of 58.2%, compared with 68.5% for high-risk comparison group members, an estimated treatment effect of 10.3 percentage points. In contrast, medium- and low-risk experimental participants had a slightly higher estimated recidivism rate (30.5%) than their counterparts in the comparison group (26.4%).
Personality by Experimental Condition
The personality interaction models examined whether aggressive, neurotic, dependent, and situational personality types (JI) responded differently to R&R. To maximize statistical power, each personality type was tested in a separate model. A strong interaction effect (p = .002) was detected for the neurotic personality type by experimental condition. As shown in Figure 4, neurotic experimental group participants had a considerably higher estimated recidivism rate (63.9%) than neurotic comparison group members (38.9%).
The model for the dependent personality type by treatment condition approached significance according to the decrement-to-chi-square test shown in Table 4. The dependent experimental group achieved an estimated treatment gain of 14.2 percentage points over the dependent comparison group (see Figure 5), although this interaction was not significant at the .05 level.
Two additional interactions, for race and age, are shown for purposes of illustration in Figures 6 and 7. As shown in Table 4, these decrement-to-chi-square tests also approached statistical significance, both with p = .07.
Race by Experimental Condition
White participants assigned to the cognitive skills class returned to prison at lower estimated rates than White participants in the comparison group. By the end of the follow-up period, an estimated 48.9% of White comparison group members but only 35.4% of White experimental group members had returned to prison. Among non-White participants, however, the estimated recidivism rates for the experimental and comparison groups were nearly identical (43.6% and 43.0%, respectively).
Age by Experimental Condition
Treatment effects also varied with participants' age. Treatment gains were greater for participants between the ages of 28 and 32 (a 14.1-percentage-point difference). Interactions between treatment condition and the other age groups were not significant.
Discussion
In sum, the Reasoning and Rehabilitation program did not favorably affect recidivism outcomes in this setting. Differences between participants randomly assigned to the R&R group and the comparison group were negligible (3.7 percentage points). These results were disappointing and contrary to the favorable outcomes generally seen for offender cognitive–behavioral programs. However, they resemble results for other large-scale interventions implemented statewide rather than at a small demonstration site (see D. B. Wilson et al., 2005). There are nevertheless opportunities to learn from these findings by identifying factors that differentiate successful participants from unsuccessful ones.
The results of the main effects model masked favorable results for certain subgroups of participants. The estimated treatment effect was 10.3 percentage points for high-risk offenders, 8.6 for nonneurotic/low-anxiety offenders, 14.2 for dependent parolees, 13.5 for Whites, and 14.1 for parolees between the ages of 28 and 32 years.
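These treatment effects are differences between predicted 33-month return-to-prison rates (comparison minus experimental), so a positive value favors R&R and a negative value indicates harm. The sketch below recomputes them from the predicted rates reported in the Results section for the groups whose experimental and comparison rates are both given. The dictionary layout is only for illustration.

```python
# Predicted 33-month return-to-prison rates (%) reported in the Results:
# (experimental, comparison)
REPORTED_RATES = {
    "all participants": (40.4, 44.1),
    "high risk": (58.2, 68.5),
    "medium/low risk": (30.5, 26.4),
    "neurotic": (63.9, 38.9),
    "White": (35.4, 48.9),
    "non-White": (43.6, 43.0),
}


def treatment_effect(experimental: float, comparison: float) -> float:
    """Percentage-point reduction in recidivism; negative values mean harm."""
    return round(comparison - experimental, 1)


if __name__ == "__main__":
    for group, (exp_rate, comp_rate) in REPORTED_RATES.items():
        print(f"{group:>18}: {treatment_effect(exp_rate, comp_rate):+.1f} points")
```

The negative values for the medium/low-risk and neurotic groups show how these subgroups offset the gains for others in the main effects model.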
This is not to assert that the program was a success, however. Even the results for the most amenable participants were not commensurate with the treatment effects observed in some other evaluations of R&R and similar programs. To put these findings in perspective, the effects for the most amenable subgroups in our study were only in the range of the 14.0% reduction reported for all participants in a recent meta-analysis of R&R (Tong & Farrington, 2007). It is likely, then, that both the overall effect size and the more favorable findings for specific subgroups of offenders were also affected by low completion rates and variations in program quality across sites (Van Voorhis et al., 2004).
We note, for example, that the program had a high attrition rate, which is not unusual for community-based programs. Moreover, two of the factors predictive of noncompletion (age and high anxiety) were similar to those that interacted adversely with program status (experimental or comparison) to produce unfavorable outcomes. This pattern is not unique to the present study. An emerging literature on program attrition often finds that the predictors of noncompletion are the same factors that predict recidivism overall: lower levels of education, unemployment, younger age, high risk, low motivation to change, and psychopathy (for a review, see Cullen, Soria, Clarke, Dean, & Fahy, 2011). Like the responsivity research, this literature recommends closer attention to the attributes related to program attrition, and it suggests that program noncompletion is part of an overall pattern of failure.
The intent-to-treat design requires that noncompleters remain in the experimental group for the analysis, so program attrition remains one of several possible explanations for the program's failure. The effect of attrition and of other program factors will be examined in a follow-up study of the experimental group alone, focusing on within-group programmatic differences and their impact on overall program effectiveness.
We also note that the sample size, though large, did not permit the study of more saturated models that would allow a determination of the comparative importance of each interaction. Sample size was also somewhat problematic given the partitioning of participants into subgroups. This may explain why the tests for race, dependent personality, and age reached significance only at p < .10 rather than p < .05, in spite of fairly large differences. The alternative, a randomized block design rather than a completely randomized design (Ariel & Farrington, 2010), might have ensured adequate subgroup sizes but would have proven extremely unwieldy given the number of moderators tested and the exploratory nature of some of the tests.
We also make no claim to have exhausted all possible individual factors that could have differentiated successful participants from unsuccessful ones. Although we tested many possible interactions, other factors such as offender motivation, social support, and additional psychological attributes may also have attenuated the program's main effects.
On the other hand, several of the detected interaction effects mirror findings from other studies and, as such, are unlikely to be random occurrences. This is especially true of the findings for high-risk parolees. In fact, the risk effect has been replicated across many evaluations of intensive programs such as R&R. For example, one of the most recent meta-analyses of juvenile interventions found that serving high-risk offenders was the program characteristic most strongly associated with reductions in recidivism (Lipsey, 2009). The risk effect matters not only because it identifies high-risk offenders as highly amenable to intensive correctional programs, but also because of its warning about low-risk offenders: Low-risk offenders often have worse outcomes when placed in an intensive intervention than when not (Andrews, Bonta, & Hoge, 1990; Andrews & Dowden, 2006; Brusman Lovins et al., 2007; Lipsey, 2009; Lowenkamp et al., 2006; Van Voorhis et al., 2010). This very important finding does not appear to be attracting sufficient attention from criminal justice officials, policy makers, and practitioners.
The second interaction effect indicated that parolees assessed as neurotic by the JI were harmed by the R&R program. This group of highly anxious offenders is seldom examined in corrections research, but when it is, it tends to emerge as a fairly distinct group of offenders. In previous research, for example, these individuals have shown (a) more stressful adjustments to prison (Van Voorhis, 1994); (b) higher long-term recidivism rates than offenders assigned to other personality types, even when controlling for risk (Listwan, Sperber, Spruance, & Van Voorhis, 2004; Listwan, Van Voorhis, & Ritchey, 2007); and (c) more acknowledgment of and guilt for their child molestation charges than other child molesters (Sperber, 2004). Even so, it is difficult to determine whether the present findings are attributable to the inherent design of R&R or to the manner in which the program was delivered. Social learning and cognitive–behavioral approaches are not passive methods of treatment. They stress active client participation, role-playing by clients and facilitators, and directed discussions of real-life situations in the clients' lives. Thus, even the best of these programs require participants to call attention to themselves and demonstrate new skills in front of an audience. In addition, cognitive–behavioral curricula are widely touted as programs that can be facilitated by nonclinical personnel, but the task of accommodating client anxiety may exceed the skill sets of lay facilitators. Indeed, the skills needed to build client trust in a group setting, form therapeutic relationships, and deal with anxiety are not the primary focus of current curricula for training correctional staff. It is also possible that some unmeasured program attribute (e.g., overly confrontational facilitators or group members) made matters worse for these offenders.
Two findings that approached significance are also worthy of note. Although significant only at p ≤ .10, the higher rates of success for dependent personality types and younger adults may reflect the level of skills targeted by the program. Offender cognitive skills programs target very basic thinking skills; teaching the elemental steps of problem solving, for example, may be most appropriate for less-skilled and younger offenders. This is the third study to find age effects with R&R, and the finding is partially consistent with Robinson's (1995) finding that the program was not effective with offenders above the age of 30.
Finally, although the finding did not reach the .05 level of significance, the program was more successful with White than with non-White participants. Unfortunately, the literature on the moderating influence of race on treatment effectiveness is not extensive. For a variety of reasons, program evaluations seldom disaggregate findings by race, so the literature on offender-based cognitive–behavioral interventions offers little help in interpreting this finding. However, a strong literature on multicultural counseling (American Psychological Association, 2003; Arredondo et al., 1996; Parrott, 1997; Shearer & King, 2004; Sue, Arredondo, & McDavis, 1992) faults prevailing therapeutic approaches for not accommodating client cultures. It is possible, then, that current cognitive–behavioral interventions are grounded in preconceived and culturally dependent notions of lifestyle, norms, and worldview that are not relevant to specific racial and cultural groups receiving treatment. This finding clearly warrants examination in future studies of these approaches.
The policy implications of these findings depend partly on the moderator in question. The implications of the risk effect, for example, seem fairly straightforward: A large body of replicated research recommends that low-risk offenders simply be screened out of interventions designed for high-risk offenders (Bonta, 2009). It is clearly counterproductive to impose a program requirement that places low-risk offenders in continued association with high-risk offenders and interferes with alternative prosocial outlets such as employment and time with prosocial families.
However, the implications for personality, race, and age seem more complex than a simple recommendation for screening these offenders out of the program. We would certainly benefit from additional studies that replicated similar findings. Additional research might also help identify programmatic factors (e.g., staff skills, cultural focus, developmental focus) that rendered some offenders less amenable than others. Focus groups and further empirical study of programmatic qualities and their impact on the outcomes for these offender groups are warranted. Such findings could then guide program modifications that could better accommodate key individual differences. Attention to the specific responsivity principle might then result in modifications to class exercises, visual presentations, or staff relationship skills.
Footnotes
Authors’ Note:
This study was funded by the Office of Justice Programs for the National Institute of Justice (Grant 98-CE-VX-0013) and by the Georgia Board of Pardons and Paroles. The report reflects conclusions drawn by the authors and not those of the Georgia Board of Pardons and Paroles or the National Institute of Justice.
