Abstract
Although a range of opinions about the impact of incarceration on later offending have been articulated, there have been very few studies of sufficient methodological quality to allow the effect to be examined empirically. Drawing on a sample of 5,500 male offenders from 1 of 10 regions in the United Kingdom, propensity score matching was used to balance the preexisting differences between two groups of offenders: those who had been incarcerated for their index offense and those who had received community orders involving supervision. Both methods of balancing the group differences (matching/stratification) suggested that 1 year after release, offenders who had been incarcerated were significantly more likely to have committed another (proven) offense. These offenders also tended to commit more offenses and started reoffending earlier than those supervised in the community. Moreover, offenders who had originally been incarcerated were much more likely to be reincarcerated. In line with other emerging evidence, it was concluded that incarceration tends to slightly increase rather than decrease the chances of future offending. Limitations of the research are considered and directions for future research are explored.
Introduction
Preventing further crime is one of the main aims of sentencing in most jurisdictions (Tonry, 2009). While countries differ on the extent to which they seek to do this through rehabilitation, the increased use of custody across most of the world (e.g., Walmsley, 2009) suggests that there is a widespread belief in the value of imprisonment through general or specific deterrence, or through incapacitation. As a result, there are estimated to be around 10 million people in prison worldwide, amounting to 145 people for every 100,000 of the world’s population (Walmsley, 2009). Considering the worldwide prevalence of imprisonment and its increased use in many countries over the last two decades (e.g., Apel & Nagin, 2010; Ministry of Justice, 2010), it is essential to evaluate how far imprisonment does have a crime reductive value especially in relation to other sentencing options.
The behavioral response to the threat of prison (i.e., the general deterrent effect) is notoriously difficult to ascertain (e.g., Apel & Nagin, 2010; Durlauf & Nagin, 2010). This is because of the technical shortcomings of much of the early research in this area (as noted by Moxon, 1998; Nagin, 1998) and because of the ambiguous directionality (or what Apel & Nagin, 2010, refer to as the “simultaneity problem”) between crime rates and imprisonment rates. Moreover, it can be misleading to evaluate the impact of the severity of a sanction (i.e., prison as a disposal for a given offense) without considering aspects of certainty such as the likelihood of detection and conviction for that offense (Helland & Tabarrok, 2007). Despite these methodological challenges, however, in their recent comprehensive review of general deterrence, Apel and Nagin (2010) concluded that there was “little evidence that increases in the severity of punishment yield general deterrent effects that are sufficiently large to justify their social and economic costs” (p. 430). Other researchers echo these sentiments (see, for example, Marsh, Fox, & Hedderman, 2009), suggesting that there is limited evidence to support the widespread use of prison as a general deterrent.
It could be argued that incarceration eliminates offending (at least that toward the general public) while the offender is in prison. However, the study of the incapacitative impact of prison is also beset by methodological difficulties (Reuter & Bushway, 2007). Both individual-level studies (which estimate the frequency of an individual’s offending prevented by incarceration) and aggregate approaches (which compare crime rates with incarceration rates) 1 lack agreed and objective measures (e.g., Moxon, 1998), and both approaches assume that those incarcerated would have continued to offend (at various levels) had they been free to do so.
Given the methodological difficulties of accurately assessing the impact of both general deterrence and incapacitation, assessing the value of prison as a sentencing option to prevent crime rests heavily on how the experience of prison influences later reoffending. If the experience of prison has the desired impact and decreases later reoffending, this might be because of a specific deterrent effect. That is, the experience of incarceration was so unpleasant that it increases an offender’s perceived risk of future punishment for offending. 2 If prison has an undesirable impact and increases later reoffending, this could be because of the absence of a specific deterrent effect, with the experience of incarceration being not sufficiently unpleasant, so an offender’s perceived risk of future punishment for reoffending is reduced (e.g., Nagin, Cullen, & Jonson, 2009). Incarceration could also increase the likelihood of later offending through the severing of prosocial bonds such as the loss of a partner and loss of employment that decrease an individual’s stake in conformity (Hirschi, 1969; Laub & Sampson, 2003). Alternatively, prison might lead to the acquisition of antisocial ties through the introduction to criminal techniques or potential co-offenders (e.g., Roxell, 2011) and/or the reinforcement of an offending identity caused by being a prisoner (Lemert, 1967). In addition, structural impediments on release (e.g., inability to obtain employment, appropriate housing) could also have a significant negative impact (e.g., Chiricos et al., 2003). 3
Methodological Challenges
Arguably, the method that would provide the most confidence in determining the impact of custody on later reoffending would be to undertake randomized controlled trials (RCTs) in which a large number of individuals who had committed offenses were allocated either to a period of incarceration or not. In theory, this method would equate those given custodial sentences with those not given custodial sentences on all known and unknown variables. If carefully conducted, any difference in later measures of reoffending could then be attributed to imprisonment.
There are numerous practical and ethical considerations in undertaking even a “typical” RCT (i.e., when allocating a specific treatment suspected of being beneficial; for example, Farrington & Jolliffe, 2002) and these issues would be multiplied when the intervention is the loss of liberty. There is little question that it would be ethically problematic to allocate someone to prison based on the flip of a coin. However, some researchers have been able to overcome these issues and have made a case for the use of RCTs in this context (e.g., Villettaz, Killias, & Zoder, 2006).
There have been only two experimental studies of the impact of custody on reoffending using adult samples. 4 Bergman (1976) evaluated the impact of prison using a group of mostly male adult second felony offenders (n = 109) in Michigan, who were randomly allocated to either an innovative probation program or prison. The results suggested that 12 months after release, those allocated to probation had a significantly lower prevalence of reoffending (14%) compared with the prison group (33%).
Similarly, Killias, Aebi, and Ribeaud (2000) reported on an experiment in Switzerland in which 123 adult offenders were randomly allocated to either serving 14 days in prison or community service. 5 Twenty-four months later, there was a slightly negative but nonsignificant impact of prison on the prevalence of convictions (21.4% community service vs. 25.4% prison), arrests (33.3% community service vs. 38.5% prison), and the frequency of convictions (0.39 community service vs. 0.64 prison). An additional follow-up of the original sample 11 years later found that 58% of those sentenced to prison had a conviction after release compared with 53% of those in the control group (Killias, Gillieron, Villard, & Poglia, 2010). This difference was found to be nonsignificant, but the standardized mean difference of d = 0.15 suggests that the effect of prison was not negligible.
The results of these RCTs suggest that prison either had no effect at all, or a small undesirable effect in which reoffending increased. However, these studies involved a degree of trade-off, maximizing their internal validity at the expense of their generalizability. That is, specific features of these studies (e.g., short lengths of incarceration, less serious offenders, low numbers) are likely to have encouraged courts to comply with the randomization, but arguably make them less useful in determining the impact of custody more widely.
This introduces the main challenge to this area of research, which is that there is a clear and proven relationship between the characteristics of the individual who presents in court (and subsequently receives custody) and his or her likelihood of reoffending (see, for example, Gottfredson & Gottfredson, 1988; Spohn, 2007; Spohn & Holleran, 2002). Research has shown that males, ethnic minorities, those with longer criminal histories and those with more serious offenses are more likely to be sent to prison and also more likely to reoffend (Spohn & Holleran, 2002).
One method of attempting to control for the fact that many of the features that predict sentencing decisions predict reoffending is to match those who received custody with those who did not on these relevant demographic and offense features. In such quasi-experimental studies, the reoffending of these two groups (made similar on the relevant characteristics) can be compared to assess the potential impact of custody. For example, Petersilia, Turner, and Peterson (1986) compared the prevalence of reoffending between a group of individuals sent to prison (n = 511) with a group given probation (n = 511) in Alameda and Los Angeles counties. The two groups were matched on a number of features, including gender, conviction offense, previous conviction history (using a three-level summary score), and age. The results suggested that those who were imprisoned were significantly more likely to be charged with a new offense and were more likely to be reimprisoned.
There have been a number of these so-called “exact matching studies” that have attempted to equate individuals sent to custody with a comparable noncustodial group on a variable-by-variable basis (for review, see Nagin et al., 2009). This type of matching has been described as a data-hungry process, as it requires information about a large number of individuals to develop a comparison group that is similar on even the most basic explanatory variables such as age, age at first offense, and current offense type. As a result, “exact” matching is rarely exact, and it is often clear that it has not been completely successful in creating a comparable counterfactual group (as noted by Nagin et al., 2009). This makes it very challenging to interpret the results of these studies when evaluating the overall impact of prison, both because of the bias introduced by inadequate matching in individual studies and because different studies control for different variables to different degrees of specificity.
Propensity score matching (PSM) is a statistical technique that has been used only sparingly in criminology to date (e.g., Apel & Sweeten, 2010), but is particularly useful in the current context. This is because it is a method that uses relevant background information to develop a conditional probability that an individual will be in one condition rather than in another, or in this case incarcerated rather than given a community order (e.g., Luellen, Shadish, & Clark, 2005). The benefit of this method is that, if correctly specified, individuals balanced on this conditional probability will be equivalent on all measured covariates. This increases the likelihood that any difference between the two groups in later reoffending might be attributable to the main remaining difference between them: whether they were sentenced to custody or not.
There have been a small number of studies that have used PSM to examine the impact of custody on later reoffending (e.g., Loughran et al., 2009; Weisburd, Waring, & Chayet, 1995; Wermink, Blokland, Nieuwbeerta, Nagin, & Tollenaar, 2010). Loughran et al. (2009) used data from the Pathways to Desistance study (Mulvey et al., 2004) to examine the impact of a custodial sentence 6 (compared with probation) on later rearrest and self-reported offending for a group (n = 921) of 14- to 17-year-old (mostly) felony offenders. A large number of covariates (66) were used to model the probability of receiving a custodial sentence. These included demographic, familial, peer, legal, psychological, mental health, substance abuse, psychosocial maturity, and prior adjustment. After balancing on the available covariates, the results showed those who received a custodial sentence were about equally likely as those who had received probation to self-report offenses, but were slightly (but not significantly) more likely to be arrested.
Wermink et al. (2010) used demographic and criminal history variables in the development of a propensity score to examine the impact of custody compared with community service in the Netherlands. Variables in the model included age, gender, index offense type, and previous criminal history. Individuals who had been sentenced to their first incarceration (n = 2,123) were then matched to those sentenced to community service (n = 2,123). Eight years later, those incarcerated were convicted of significantly more property and violent offenses than those who had received community service.
Similarly, Weisburd et al. (1995) investigated the impact of prison on white-collar offenders (n = 742). The probability of going to prison was modeled using criminal history (e.g., prior record, offense), social (e.g., gender, age), act-related (e.g., type and number of victims), and actor-related variables (e.g., cooperation with prosecution). Weisburd and colleagues created three subgroups based on the predicted probability of going to prison (low, medium, and high probability), and investigated the number of new recorded offenses for those who had gone to prison compared with those who had not, within each subgroup 10 years after release. The results suggested that there was no significant difference in the proportion of reoffenses between the two groups; however, the effect sizes (d = 0.10-0.12) suggested that those incarcerated might have been slightly more likely to reoffend.
Recently, the Ministry of Justice (2011) in England used PSM to compare the reoffending of a large sample (n = 24,978) of those who received various levels of probation supervision (community orders and suspended sentence orders) with those who received custodial sentences of up to 12 months in England and Wales. The results suggested that those who had been sentenced to custody had a prevalence of reoffending of between 6.5% and 7.4% higher 12 months after release than those who received probation. 7 Although this is an informative finding, it is important to note that there is no statutory requirement for those who receive periods of custody of under 12 months to receive any support from probation services after release. This means that this comparison was not simply between custody and no custody, but between a short-term period of incarceration (and no assistance on release) and those who received varying levels and different forms of support in the community from the probation service. It is conceivable that the difference in prevalence of reoffending in this instance could be attributable to successful probation support rather than (or in addition to) a negative impact of incarceration.
In recent years, three systematic reviews have attempted to summarize the impact of the experience of custody on later reoffending (Gendreau, Goggin, & Cullen, 1999; Nagin et al., 2009; Villettaz et al., 2006). Although these studies differed slightly in their approach (e.g., inclusion criteria), all three highlighted the limited number of studies of sufficient methodological quality that were available to contribute to the debate.
The Current Study
The major research question to be addressed in this research is as follows: “What is the impact of prison on official measures of reoffending of adult males?” Given the previously reviewed evidence on this topic, it would be hypothesized that prison will slightly (but perhaps not significantly) increase later measures of reoffending. In the process of addressing this central research question, this study will attempt to add important information to the wider debate about the impact of custody on later reoffending by addressing some of the specific limitations in this literature that were highlighted by Nagin et al. (2009). First, this study is not based on “old” data. The current sample was sentenced or released between 2005 and 2008, which means most of the cases were sentenced under the provisions of the current Criminal Justice Act (CJA) 2003. 8 The data are therefore “policy relevant.” Moreover, this is one of the few studies that have investigated the impact of custody on later reoffending using non-U.S. data. England provides a useful contrast to the emerging findings of U.S. studies as its sentencing policy is generally not as harsh as that in the United States but it has become increasingly punitive over the last two decades (e.g., Tonry, 2010).
Another benefit of this study was that it was based on a sample drawn from a probation caseload. This means that the entire sample was subject to an approximately equal level of support, supervision, and monitoring, creating equivalence between those who had been released from custody and those who were serving community orders. Therefore, the added value of being supervised on probation could be distinguished from any negative effects of incarceration.
The Data
The sample comprised male offenders aged 18 to 50 being supervised either on a community order or after release from custody in five neighboring probation areas in 1 of 10 administrative regions in England and Wales between 2005 and 2008. Each case was followed up for 1 year after commencement. Data were drawn from the Police National Computer (PNC) database, which included information about past criminal activity (e.g., age at first offense, number of previous offenses, number of previous court appearances, number of previous custodial sentences) and the current offense that led to this instance of supervision (e.g., offense type, disposal for current offense). Moreover, information about the number, timing, and type of reoffenses was available up to 12 months after the commencement of supervision.
Unfortunately, information about the actual length of incarceration was not available. However, as these data were for individuals on a probation caseload, it can be assumed that the minimum length of sentence (as given by the court) was at least 1 year. Probation support is regularly provided to those released from custody who received a sentence of 12 months or more, and not to those who received less than this tariff. Therefore, these data probably contain more serious offenders and offenses than those in comparable studies.
Reoffending by the Sample
Of the 5,500 individuals in the sample, 2,155 (39.2%) had committed a reoffense within 12 months of either commencing their community order or after release from prison. Moreover, the frequency of reoffending during the 12-month follow-up period was found to be an average of 3.3 offenses per offender (SD = 3.1) and the average time to reoffense was 135.4 days (SD = 105.0). However, the difference in these measures of reoffending between those who received community orders and those released from custody was stark. Those released from custody had a significantly higher prevalence of reoffending (53.3% compared with 32.8%, p < .001; d = 0.46) and frequency of reoffending (4.0 [SD = 3.6] compared with 2.8 [SD = 2.5], p < .0001; d = 0.41). Those released from custody were also quicker to reoffend in that the time from release to the date of the next offense for this group was 122.5 days (SD = 101.2) compared with 144.8 days (SD = 106.8, p < .0001; d = 0.21) for those given a community order.
In addition to measures of statistical significance (chi-square and t tests), the standardized mean difference (d) was used to assess the difference between those released from custody and the community order group on the measures of reoffending. There are a number of methods of interpreting effect sizes, but a useful way of making the magnitude of an effect size practically meaningful is to convert it to a difference in proportions. To do this, the effect size (d) is roughly halved (see Lipsey & Wilson, 2001). Therefore, the effect of custody compared with community orders on the measures of reoffending (but not controlling for preexisting differences between the groups) ranged from an 11% difference in proportions (time to reoffense; d = 0.21) to 23% (prevalence of reoffending up to 12 months; d = 0.46).
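The halving rule described above can be illustrated with a minimal sketch. The function name is hypothetical and the conversion is only the rule of thumb stated in the text, applied to the three unadjusted effect sizes reported in the previous section.

```python
def d_to_proportion_difference(d: float) -> float:
    """Rule-of-thumb conversion: a standardized mean difference d
    corresponds to a difference in proportions of roughly d / 2."""
    return d / 2.0


# Unadjusted effect sizes reported in the text (custody vs. community order).
reported_effect_sizes = {
    "time to reoffense": 0.21,
    "frequency of reoffending": 0.41,
    "prevalence of reoffending": 0.46,
}

for measure, d in reported_effect_sizes.items():
    percentage_points = 100 * d_to_proportion_difference(d)
    print(f"{measure}: d = {d:.2f} -> roughly {percentage_points:.1f} percentage points")
```

Because the conversion is approximate, the resulting figures are best read as rough magnitudes rather than exact differences in reoffending rates.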
Several previous studies have demonstrated that those who receive custody have different demographic and criminal history profiles than those sentenced to community orders (e.g., Wermink et al., 2010). For this reason, the simple comparison of measures of reoffending of those released from custody compared with those who received community orders tells us little about the actual impact of custody.
In this study, the 3,793 individuals from the community order group differed from the 1,707 in the custody group on key demographic features. For example, those released from custody were significantly younger (M = 27.4 years) than those in the community order group (M = 29.4 years, t = 8.2, p < .0001; d = 0.24), and a higher proportion of those classified as being of Black ethnicity were in the custody group (8.1%) compared with the community order group (6.1%; χ2 = 7.5, p < .006), but there were no other race differences. The proportion of those released from custody varied within the five areas covered in this study. Those in one area (Area 2) were significantly more likely to have been released from custody (χ2 = 14.0, p < .0001) and those in another (Area 3) were significantly less likely (χ2 = 32.8, p < .0001).
Table 1 provides the criminal history information for the two groups. Overall, and in line with previous research (e.g., Gottfredson & Gottfredson, 1988), those sentenced to custody were clearly at a higher risk of reoffending. For example, those sentenced to custody had significantly earlier ages of onset, greater number of previous offenses, greater number of conviction occasions, and greater number of previous custodial sentences. The Copas score is a logarithmic measure of the speed of criminal convictions over a criminal career, 9 and the Offender Group Reconviction Score (OGRS) is a static risk assessment device that predicts the probability of reoffending (from 0 to 100) 1 year after release (Howard, Francis, Soothill, & Humphreys, 2009).
Table 1. Criminal History of the Full Sample.
Note: OGRS = Offender Group Reconviction Score.
*p < .05. **p < .001. ***p < .0001.
The bottom of Table 1 shows the breakdown in the prevalence of the type of index offense that resulted in either a custodial sentence or community order. Those who received a community order were more likely to have committed criminal damage, drink driving offenses, fraud, other motoring offenses, and public order offenses. Perhaps surprisingly, they were also more likely to have committed violence. 10 Those who received custodial sentences were more likely to have committed domestic and other burglary, drug import/export/supply, robbery, and sexual offenses against children.
Developing the Propensity Score
The propensity score was developed using the available demographic, criminal history, and offense characteristics. Table 2 shows the variables that were considered for inclusion in the model. The final model included all variables with p values below .25, in keeping with Rosenbaum’s (2002) caution against using statistical significance to select predictors for PSM. 11
Table 2. Variables Considered for Inclusion in the Model.
*p < .05. **p < .001. ***p < .0001.
The final model had a likelihood ratio chi-square (LRCS) of 1,741.5 (p < .0001) and a Nagelkerke R2 = .39. The model correctly classified 78.5% of the sample, and the area under the curve (AUC) of the receiver operating characteristic was high (AUC = 0.83). 12 However, these criteria are less relevant with PSM, where the fit of the model is evaluated by the extent to which balance on the covariates has been achieved (Luellen, Shadish, & Clark, 2005).
Table 2 shows that, similar to propensity scores developed in previous research (e.g., Wermink et al., 2010), most individual characteristics (e.g., age at first offense, number of previous offenses) significantly predicted group membership. 13 The Exp(B) can be interpreted in the same way as an odds ratio, 14 so a 1-year increase in age at first offense decreased the odds of being in the custody group by about 9% to 10% (Exp(B) = 0.91; 1/0.91 ≈ 1.10), controlling for all other measured variables. Information about the independent impact of the index offense was assessed with violence as the reference category. This means that, for example, those who were convicted of criminal damage had about half the odds of being sentenced to custody, Exp(B) = 0.54, compared with those convicted of violence.
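As a worked illustration of reading Exp(B) as an odds ratio, the sketch below uses the two values mentioned in the text (0.91 for age at first offense, as implied by “1/0.91,” and 0.54 for criminal damage). The function names and the baseline probability of custody are hypothetical and serve only to show that an odds ratio acts on odds rather than directly on probabilities.

```python
def percent_change_in_odds(exp_b: float) -> float:
    """Percentage change in the odds associated with a one-unit increase."""
    return (exp_b - 1.0) * 100.0


def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Probability obtained after multiplying the baseline odds by odds_ratio."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


age_at_first_offense_exp_b = 0.91   # implied by "(1/0.91)" in the text
criminal_damage_exp_b = 0.54        # reported in the text

print(f"Change in odds per year of age at first offense: "
      f"{percent_change_in_odds(age_at_first_offense_exp_b):.0f}%")
print(f"Inverse odds ratio: {1 / age_at_first_offense_exp_b:.2f}")

# Hypothetical baseline: a 30% chance of custody for an index offense of violence.
hypothetical_baseline = 0.30
print(f"Implied probability for criminal damage: "
      f"{apply_odds_ratio(hypothetical_baseline, criminal_damage_exp_b):.2f}")
```

Under this assumed baseline, halving the odds does not halve the probability, which is why the coefficients are more safely described in terms of odds.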
Once the propensity score was developed, two methods of balancing the covariates for those in the custody group with the community group were used: nearest-neighbor matching and stratification.
Nearest-Neighbor Matching
Nearest-neighbor one-to-one matching was used to match one offender from the custodial group to one offender in the community order group. This type of matching was selected to decrease potential bias, with the knowledge that one-to-one matching increases variance (Apel & Sweeten, 2010). Overall, 1,162 matched pairs were identified.
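The general logic of one-to-one nearest-neighbor matching can be sketched as follows. This is an illustrative greedy implementation, not the procedure used in this study: the text does not state whether matching was done with or without replacement or whether a caliper was applied, so both choices here (no replacement, optional caliper) are assumptions, and the identifiers and scores are invented.

```python
from typing import Dict, List, Optional, Tuple


def nearest_neighbor_match(
    treated: Dict[str, float],
    control: Dict[str, float],
    caliper: Optional[float] = None,
) -> List[Tuple[str, str]]:
    """Greedy 1:1 matching on the propensity score, without replacement.

    Each treated case is paired with the closest unused control case.
    If a caliper is given, pairs further apart than the caliper are dropped.
    """
    available = dict(control)
    pairs: List[Tuple[str, str]] = []
    # Process treated cases from the highest score down (an arbitrary but common order).
    for treated_id, treated_score in sorted(
        treated.items(), key=lambda item: item[1], reverse=True
    ):
        if not available:
            break
        control_id = min(available, key=lambda cid: abs(available[cid] - treated_score))
        distance = abs(available[control_id] - treated_score)
        if caliper is not None and distance > caliper:
            continue  # no acceptable match for this treated case
        pairs.append((treated_id, control_id))
        del available[control_id]  # each control case is used at most once
    return pairs


# Hypothetical propensity scores (probability of receiving custody).
custody = {"T1": 0.82, "T2": 0.40, "T3": 0.95}
community = {"C1": 0.38, "C2": 0.80, "C3": 0.10, "C4": 0.45}

matched_pairs = nearest_neighbor_match(custody, community, caliper=0.05)
print(matched_pairs)
```

In this toy example the case with the highest score has no community-order counterpart within the caliper and is left unmatched, which is one way a matched sample can end up smaller than the original custody group.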
It was clear that the matching was successful in reducing the preexisting differences between the two groups in demographic features. For example, those in the matched community order group were no longer significantly older than the custodial group (M = 28.3 compared with M = 28.1, ns; d = 0.02) or more likely to be from a specific ethnic background. Moreover, the community and custodial groups were about equally represented across the five areas of the region. The maximum standardized difference between the two groups on age, ethnic background, and area was d = 0.05, which is approximately equivalent to a 2.5% difference. This suggests that the two groups were balanced on these demographic features, as these values are well below the generally accepted d = 0.20 cutoff level (Hahs-Vaughn & Onwuegbuzie, 2006).
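A balance check of this kind can be sketched for a single continuous covariate. The pooled-standard-deviation formula below is one common definition of the standardized mean difference and is assumed here rather than taken from the study; the ages are invented, and only the 0.20 cutoff comes from the text.

```python
import math
import statistics
from typing import Sequence


def standardized_mean_difference(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt(
        (statistics.variance(group_a) + statistics.variance(group_b)) / 2.0
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def is_balanced(d: float, cutoff: float = 0.20) -> bool:
    """Apply the d = 0.20 cutoff mentioned in the text."""
    return abs(d) < cutoff


# Hypothetical ages for matched custody and community order cases.
custody_ages = [24, 31, 27, 35, 22, 29]
community_ages = [25, 30, 28, 34, 23, 30]

d_age = standardized_mean_difference(custody_ages, community_ages)
print(f"d = {d_age:.2f}, balanced: {is_balanced(d_age)}")
```

The same function would be applied to each covariate in turn, with the largest absolute value indicating how well the matching has worked overall.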
Table 3 shows that the process of matching was successful at minimizing the substantial preexisting differences in important offender and offense characteristics between those who received community orders and those who had been released from custody. There were no significant differences between the two groups, and the maximum standardized mean difference was d = 0.12 for both an index offense of robbery and a sexual offense. Importantly, for the purposes of this research, the OGRS score, a validated measure of future reoffending, was essentially identical for both groups.
Table 3. Criminal History of the Matched Sample.
Note: OGRS = Offender Group Reconviction Score.
*p < .05. **p < .001. ***p < .0001.
Stratification
As a method of balancing the nonequivalent groups, Rosenbaum and Rubin (1984) suggested matching (as above) and stratification. Stratification involves dividing the sample into five strata of equal size, so that members of the treatment and control groups have similar propensity scores within each stratum (e.g., Luellen et al., 2005). In this study (see Table 4), Strata 1 included the 1,082 individuals who had the lowest probability of receiving custody, whereas Strata 5 included the 1,081 individuals who had the highest. It can be seen that only 2.5% (43/1,689) of those in the custody group were in the lowest stratum compared with 27.9% (1,039/3,720) of the community group.
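The stratification step can be illustrated with a short sketch that ranks cases by propensity score and splits them into five equally sized groups. The rank-based assignment rule is an assumption (the text does not describe how ties or uneven group sizes were handled), and the scores and group labels are invented.

```python
from collections import Counter
from typing import List, Sequence


def assign_strata(scores: Sequence[float], n_strata: int = 5) -> List[int]:
    """Assign each case to a stratum (1 = lowest scores) of near-equal size."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, index in enumerate(order):
        strata[index] = rank * n_strata // len(scores) + 1
    return strata


# Hypothetical propensity scores and sentence received.
scores = [0.05, 0.12, 0.20, 0.28, 0.35, 0.47, 0.55, 0.68, 0.80, 0.93]
groups = ["community", "community", "community", "custody", "community",
          "community", "custody", "community", "custody", "custody"]

strata = assign_strata(scores)
stratum_sizes = Counter(strata)
cross_tab = Counter(zip(strata, groups))

for stratum in sorted(stratum_sizes):
    print(stratum,
          "custody:", cross_tab[(stratum, "custody")],
          "community:", cross_tab[(stratum, "community")])

# Shares reported in the text for the lowest stratum.
print(f"{43 / 1689:.1%} of the custody group, {1039 / 3720:.1%} of the community group")
```

Comparisons of reoffending are then made within each stratum, so that custody and community cases with similar estimated probabilities of custody are compared with one another.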
Table 4. Comparison of Strata With Number in Custody and Community Groups.
The balance of the stratification was evaluated by reviewing the relationships of the covariates to custody versus community order within each stratum. Using logistic or linear regression (depending on the scale of the variable), each of the covariates was used as the dependent variable, and four of the five dummy-coded strata as well as a variable indicating custody or community order were used as independent variables. In a randomized experiment, 5% of the comparisons would be expected to be significant by chance alone (at the p < .05 level); thus, one can assume that the propensity score model reaches relative balance when 5% or less of the results are statistically significant (Hahs-Vaughn & Onwuegbuzie, 2006). After reviewing the regressions, the indicator of custody versus community order was significant in only one regression: the one predicting an index offense of robbery.
Results
Table 5 shows the comparison between the two groups when the propensity score was balanced by one-to-one nearest-neighbor matching. It was clear that those who were released from custody were significantly more likely to reoffend. For example, of the 1,162 individuals with custodial sentences, 594 (51.1%) were convicted within 12 months after release compared with 517 (44.5%) of those who received a community order (χ2 = 10.2, p < .001; d = 0.14). In addition, those who were released from custody were much more likely to be reincarcerated for their subsequent offense. Over 65% of those who initially received custody were sentenced to another period of incarceration within 12 months of being released, compared with 33.7% of those who had originally received a community order.
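The prevalence comparison for the matched sample can be reproduced from the counts given above. The sketch below computes an uncorrected Pearson chi-square for a 2 × 2 table; whether the reported statistic was calculated in exactly this way is an assumption, and the function name is hypothetical.

```python
def pearson_chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


matched_pairs_n = 1162
custody_reoffended = 594
community_reoffended = 517

chi_square = pearson_chi_square_2x2(
    custody_reoffended, matched_pairs_n - custody_reoffended,
    community_reoffended, matched_pairs_n - community_reoffended,
)

print(f"Custody: {custody_reoffended / matched_pairs_n:.1%} reoffended")
print(f"Community order: {community_reoffended / matched_pairs_n:.1%} reoffended")
print(f"Chi-square (df = 1): {chi_square:.1f}")
```

A continuity-corrected version of the test would give a somewhat smaller statistic, so the choice of correction should be stated when results of this kind are compared across studies.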
Table 5. Measures of Reoffending for the Matched Sample.
*p < .05. **p < .001. ***p < .0001.
The frequency of reoffending was scaled up (based on an estimate of time served 15 ) for those who received a custodial sentence during the follow-up period to attempt to account for the different “at-risk” periods. On average, those released from custody committed significantly more offenses (4.8 compared with 3.5, t = 5.7, p < .0001; d = 0.32) 16 and committed their first offense earlier (121 days after release compared with 140 days, t = 3.0, p < .003; d = 0.18) than those on a community order.
The bottom of Table 5 shows the distribution of the types of offenses that were committed by the two groups over the 12-month follow-up period. For example, 123 of the 1,604 (7.7%) further offenses committed by those in the community order group were for absconding or bail offenses compared with 267 out of the 2,428 (11.0%) further offenses committed by those in the custody group (χ2 = 11.8, p < .0001; d = 0.21). Those in the custody group were also significantly more likely to commit theft offenses. In contrast, the reoffenses of those in the community order group were more likely to be drug offenses (both import/export/supply and possession) and public order offenses. Based on the effect sizes (d = 0.38 and 0.28), there was some indication that those in the custody group might have had a higher proportion of sexual and robbery offenses; however, these results were not statistically significant (possibly because of the low number of offenses in these categories).
Figure 1 shows the prevalence of reoffending (solid line) and incarceration (broken line) for both those in the custody and community order groups across the five strata. For example, 15.1% (157/1,039) of those in the community order group in Strata 1 (lowest likelihood of receiving prison) had a reoffense compared with 37.2% (16/43) of those in the custody group in Strata 1. Overall, those in the community order group had a lower prevalence of reoffending for all strata and this was statistically significant for all but Strata 5 (those with the highest likelihood of being released from custody). The standardized mean differences of the prevalence of reoffending between those in the community group compared with those in the custody group ranged from d = 0.66 (Strata 1) to d = 0.04 (Strata 5). Averaging the scores of the five strata, the overall effect on the prevalence of reoffending was found to be d = 0.16, suggesting that release from custody was associated with an increase in the likelihood of reoffending of about 8%.
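To show how a standardized mean difference can be derived from two proportions, the sketch below applies the logit conversion (the log odds ratio multiplied by √3/π) to the Strata 1 counts given above. The text does not state which formula was used to obtain d, so this conversion is an assumption; it is offered because it happens to reproduce the value reported for Strata 1.

```python
import math


def logit_d(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Standardized mean difference from two proportions via the logit method:
    d = ln(odds ratio) * sqrt(3) / pi."""
    odds_a = events_a / (n_a - events_a)
    odds_b = events_b / (n_b - events_b)
    return math.log(odds_a / odds_b) * math.sqrt(3) / math.pi


# Strata 1 counts from the text: 16 of 43 custody cases and
# 157 of 1,039 community order cases had a reoffense.
d_stratum_1 = logit_d(16, 43, 157, 1039)

print(f"Strata 1: d = {d_stratum_1:.2f}")
print(f"Approximate difference in proportions (d / 2): {d_stratum_1 / 2:.2f}")
```

The small number of custody cases in Strata 1 means that this estimate rests on only 16 reoffenders, so it is likely to be less stable than the estimates for the higher strata.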

Figure 1. Prevalence of reoffending and incarceration by strata.
Those released from custody were much more likely to be incarcerated again when compared with the community order group (broken line in Figure 1). This difference was statistically significant for each stratum and the effect sizes ranged from d = 1.15 (Strata 1) to d = 0.70 (Strata 4). Averaging the scores of the five strata, the overall effect on incarceration was found to be d = 0.80 (p < .0001), suggesting that release from custody was associated with an increase in the likelihood of incarceration of about 40%.
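As an illustration of how effect sizes of this kind can be derived from prevalence figures, the following minimal sketch converts two proportions into a standardized mean difference via the log odds ratio, d = ln(OR) × √3/π. This is an assumption: the text does not state which conversion was used, although the Strata 1 reoffending counts reported above (16/43 vs. 157/1,039) do reproduce the reported d = 0.66 under it. The reported percentages (e.g., d = 0.16 and about 8%; d = 0.80 and about 40%) are likewise consistent with the rule of thumb that d/2 approximates a difference in proportions, which the sketch also encodes as an assumption. The function names are hypothetical.

```python
import math


def odds(events, total):
    """Odds of the event given a count of events out of a total."""
    return events / (total - events)


def d_from_counts(events_a, total_a, events_b, total_b):
    """Logit-based standardized mean difference: d = ln(OR) * sqrt(3) / pi.

    Assumed conversion; the article does not name the formula it used.
    """
    odds_ratio = odds(events_a, total_a) / odds(events_b, total_b)
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


def approx_percentage_points(d):
    """Rule of thumb (assumption): d / 2 read as a difference in proportions."""
    return 100 * d / 2


# Strata 1 reoffending counts as reported: custody 16/43, community order 157/1,039.
d_stratum1 = d_from_counts(16, 43, 157, 1039)
print(round(d_stratum1, 2))            # 0.66
print(approx_percentage_points(0.16))  # about 8
```

The logit conversion treats the binary outcome as a dichotomized continuous variable with a logistic distribution, which is one common choice; other conversions (e.g., probit or arcsine based) would give somewhat different values.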
Figure 2 shows the frequency of reoffending (solid line) for those in the two groups by strata (on the left-hand axis). For example, those in the community order group in Strata 1 had an average of 2.4 further offenses per offender, whereas those in the custody group had an average of 5.0 (t = 3.1, p < .0001). Those released from custody had a significantly higher number of offenses per offender at each of the five strata, with effect sizes ranging from d = 0.81 (Strata 1) to d = 0.23 (Strata 5). The overall mean effect size for frequency of further offending was found to be d = 0.40 (p < .0001), suggesting that release from prison was associated with a 20% increase in the number of reoffenses per offender.

Figure 2. Frequency and time to reoffense by strata.
Figure 2 also shows the time to reoffense (broken line, right-hand axis) for the two groups. For example, in Strata 1, those in the community group averaged 154 days before reoffense compared with 171 days for those in the custody group (t = 0.6, ns). With the exception of Strata 1, those released from custody had a faster time to reoffense than those in the community group, but this difference was only significant for Strata 2 and Strata 3. The average effect size of the difference across the five strata was d = 0.19 (p < .001), suggesting that being released from custody was associated with a slight decrease in the time to reoffense of about 10%.
Sensitivity Analysis
Unlike RCTs, PSM can only balance individuals on measured covariates, and the data for this study did not contain information about a host of factors (e.g., drug or alcohol addiction, socioeconomic status, family relationships) that, had the custody and community order groups been balanced on them, would have increased confidence in the results. To determine the potential impact of this hidden bias, Rosenbaum’s bounds method (Γ) was used (Keele, 2010). In an RCT, randomization ensures that Γ = 1.0, which is equivalent to no hidden bias. In a quasiexperimental study, if Γ = 2.0 and two people were identical on the matched covariates, then one might actually be twice as likely as the other to receive the treatment (in this case, custody) because of differences in unobserved covariates (Keele, 2010). Although the actual value of Γ is unknown, the bounds method evaluates several candidate values to determine at what magnitude the conclusions of the study change.
In this study, at Γ = 1.1, which is equivalent to hidden bias that would increase by 10% the odds of one individual of a matched pair having received custody, the critical p value was found to be <.0002. This suggested that the original results would be robust at this level. However, at Γ = 1.3, the critical p value was found to be <.12. Therefore, if there was an unmeasured covariate that independently (i.e., not correlated with any other variables currently included in the propensity score) increased the odds of one of the matched pair receiving custody by 30%, this would call into question the robustness of the results of this study. By way of comparison with the Exp(B) values in Table 3, the level of hidden bias would need to have an equivalent effect to the variable of Asian Ethnicity, Exp(B) = 1.3. This level of Γ is similar to that of another study, which used official records to create a propensity score (Wermink et al., 2010).
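To make the logic of the bounds method concrete, the following is a minimal sketch of Rosenbaum bounds for matched pairs with a binary outcome (the sensitivity version of McNemar's test). It is not the procedure used in this study, whose implementation details are not given here. Under hidden bias of magnitude Γ, the probability that the treated member of a discordant pair is the one with the outcome is at most Γ/(1 + Γ), so the worst-case one-sided p value is an upper binomial tail evaluated at that probability. The function names and the counts (200 discordant pairs, 120 in which the custody member reoffended) are hypothetical and chosen only for illustration.

```python
from math import comb


def binom_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rosenbaum_upper_p(discordant_pairs, treated_worse, gamma):
    """Worst-case one-sided p value under hidden bias of size gamma.

    discordant_pairs: matched pairs in which exactly one member had the outcome.
    treated_worse: discordant pairs in which the treated member had it.
    """
    p_plus = gamma / (1.0 + gamma)  # largest chance the treated member is the one
    return binom_upper_tail(discordant_pairs, treated_worse, p_plus)


# Hypothetical counts, not taken from the study.
for gamma in (1.0, 1.1, 1.3, 1.5):
    p_value = rosenbaum_upper_p(200, 120, gamma)
    print(f"gamma = {gamma:.1f}: upper-bound p = {p_value:.4f}")
```

At Γ = 1.0 the bound reduces to the ordinary exact McNemar p value; as Γ grows the bound rises, and the value of Γ at which it crosses the chosen significance level indicates how much hidden bias the conclusion could tolerate.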
Discussion
The results of this study were quite clear in suggesting that prison did not achieve one of its primary objectives: the prevention of crime through the reduction of reoffending of those incarcerated. In fact, when controlling for the available variables (using multiple methods to obtain balance), prison was associated with a small but significant increase in the proportion of people reoffending (7%-8%), the number of reoffenses committed (16%-20%), and a substantial increase in the proportion of individuals being incarcerated (36%-40%). This finding is strikingly similar to others found in England (Ministry of Justice, 2011), the United States (Loughran et al., 2009; Weisburd et al., 1995), the Netherlands (Wermink et al., 2010), and Switzerland (Killias et al., 2010).
The notable increase in the probability of (re)incarceration might reflect the higher number of reoffenses of those originally incarcerated, the decreased time to first reoffense of this group, or, potentially, the greater seriousness of the types of reoffenses. The observed increase in reincarceration fits particularly well with the body of literature that has highlighted the “revolving doors” of prisons, with offenders typically spending repeated stints in custody (Clear, 2007; Comack, 2008).
The results of this research, however, must be interpreted cautiously and with the weaknesses acknowledged. One limitation was that only official data were available. Official records of offending are known to be an imperfect measure of actual offending behavior because they are affected by decisions to report, record, prosecute, and convict (Lloyd, Mair, & Hough, 1994). In addition, official data are very limited in their ability to provide explanations for the results. Prison was associated with an increase in reoffending but this study cannot contribute much to the debate on why this was the case.
However, other research provides an indication about the more likely mechanisms by which prison might be having this negative impact. While there is little evidence that prison acts as an introduction service for potential co-offenders (Roxell, 2011), the stigmatizing effects of labeling and the resulting structural impediments in terms of obtaining employment and adequate housing might be key mediators of the relationship between incarceration and reoffending (e.g., Chiricos, Barrick, Bales, & Bontrager, 2007; Farrington, 1977; Laub & Sampson, 2003; Murray, Blokland, Farrington, & Theobald, 2012). For example, using data from 870 individuals from the Rochester Youth Development Study, Bernburg, Krohn, and Rivera (2006) found that the labeling effect of self-reported involvement in the juvenile justice system was associated with significantly greater levels of subsequent offending. Likewise, Chiricos et al. (2007) found that adjudication as a “felon” significantly increased recidivism among a very large sample of men and women in Florida who were being sentenced to probation. In both of these studies, the interaction with the justice system that produced the label (e.g., self-reported involvement/adjudication as a felon) was less severe than incarceration, suggesting the labeling effect of being a “prisoner” may have a more profound negative impact.
Much like the subsequent evolution that followed the original development of the “What Works?” paradigm, it will be important for future research to consider not only why prison does not appear to reduce reoffending but also for whom this is true and under what circumstances. This could include individual dispositions (e.g., levels of impulsivity, variation in stakes in conformity), interpretations of the experience (e.g., perceived fairness of incarceration), and features of the experience (e.g., length of incarceration, distance of prison to individual’s local community). The current results (Figures 1 and 2) suggested that the disparity in the prevalence of reoffending, frequency of reoffenses, and prevalence of (re)incarceration between those who received custody and those who received community orders varied depending on the predicted likelihood of receiving custody. That is, being incarcerated had a particularly negative impact for those who had characteristics (e.g., a less prolific criminal history) that made them less likely candidates to actually receive incarceration.
In future research, there should be increasing efforts to include more information in propensity score models when estimating the impact of prison, especially as it is very unlikely that the political and practical constraints can easily be overcome to allow for adequate RCTs to be undertaken. Moreover, given the emerging evidence that prison is harmful, randomly allocating someone to prison might be considered increasingly unethical. It might be possible to improve propensity score models by using data from assessment tools such as Level of Service Inventory–Revised (LSI-R) or Offender Assessment System (OASys; Andrews & Bonta, 1995; Debidin, 2009), which are commonly administered to offenders on probation, but linkage of these data has proved challenging (e.g., Hedderman & Jolliffe, 2010). Moreover, it would be useful to undertake prospective longitudinal studies where individuals are followed up from the point of conviction throughout their prison sentence or community order to a significant period of time after release. By collecting a range of individual- and system-level data, as well as information about self and official offending at regular intervals, a better picture of the mediators and moderators of the impact of prison on reoffending could become evident.
Conclusion
The results of this research add to the growing evidence base, which suggests that the experience of prison can be criminogenic (e.g., Ministry of Justice, 2011; Wermink et al., 2010). However, record numbers of individuals continue to be sentenced to prison in the United States, the United Kingdom, and around much of the rest of the world (e.g., Walmsley, 2009). Despite the current difficult economic climate, few Western governments seem willing to confront penal populism. It is to be hoped that the consistency of the small but growing body of research, which clearly demonstrates that prison increases rather than decreases reoffending in like-for-like cases, may provide the evidence base to demonstrate that calling for less prison cannot always be equated with being soft on crime and failing to care about victims.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
