Abstract
A vast body of research underlies the ascendancy of criminogenic risk assessment, which was developed to predict recidivism. It is unclear, however, whether the empirical evidence supports its expansion across the criminal legal system. This meta-review thus attempts to answer the following questions: 1) How well does criminogenic risk assessment differentiate people who are at high risk of recidivism from those at low risk of recidivism? 2) How well do researchers’ conclusions about (1) match the empirical evidence? 3) Does the empirical evidence support the theory, policy, and practice recommendations that researchers make based on their conclusions? A systematic literature search identified 39 meta-analyses and systematic reviews that met inclusion criteria. Findings from these meta-analyses and systematic reviews are summarized and synthesized, and their interpretations are critically assessed. We find that criminogenic risk assessment’s predictive performance is based on inappropriate statistics, and that conclusions about the evidence are inconsistent and often overstated. Three thematic areas of inferential overreach are identified: contestable inferences from criminalization to criminality, from prediction to explanation, and from prediction to intervention. We conclude by exploring possible reasons for the mismatch between proponents’ conclusions and the evidence, and discuss implications for policy and practice.
Introduction
Over the past 25 years, actuarial risk assessment of criminogenic risk factors has become an “evidence-based” policy and practice in the criminal legal system, strongly promoted within expert circles of policymakers, researchers, and practitioners (National Institute of Corrections, 2010). Criminogenic risk assessment can be defined as (1) the use of statistical methods to predict an individual’s legal system outcomes and categorize them accordingly, purportedly to (2) manage carceral populations through efficient and effective allocation of supervision resources and, ideally, to reduce individuals’ risk through appropriate rehabilitative and social services.
The first part of this definition is about quantifying certain individual characteristics associated with, and often thought to be generative of, illegal behavior. Four of these individual characteristics (a history of antisocial behavior, antisocial personality pattern, antisocial attitudes and cognitions, and antisocial associates) have been consistently associated with recidivism, violence, and other legal system outcomes in almost any sample of people involved in the criminal legal system (Dowden and Andrews, 1999; Gendreau et al., 1996; Lipsey and Derzon, 1998). The second part of the definition is about intervening on manipulable aspects of these predictors such as attitudes, cognitions, elements of personality, and other “criminogenic” targets. Such efforts can modestly reduce recidivism rates (Andrews et al., 1990; Andrews and Dowden, 2006).
A vast body of research underlies the ascendancy of criminogenic risk assessment. As a result of its apparent success, it is moving from the back-end of the criminal legal system, where it was developed to assess the risk of recidivism, to the front-end of the system, in pre-trial processing, sentencing, and policing (Gottfredson and Moriarty, 2006; Lowenkamp and Whetzel, 2009; Storey et al., 2014; Trujillo and Ross, 2008).
The relative success of this approach to risk assessment has been interpreted as evidence that it taps into the causes of “criminal behavior” more generally, and that targeting these factors can therefore also reduce illegal behavior and correctional supervision rates overall. Indeed, an explanatory framework emerged around “the Big Four” antisocial criminogenic risk factors as fundamental to the roots of crime itself, and a model for organizing and applying this knowledge—the risk-need-responsivity model of correctional assessment and rehabilitative programming—is widely accepted and promoted (Andrews and Bonta, 2010; Bonta and Andrews, 2017; James, 2018; Serin and Lowenkamp, 2015).
Yet, with the field’s embrace and promotion of criminogenic risk assessment and the risk-need-responsivity model, its advocates make expansive claims about what it can achieve. Some proponents even argue that risk assessment should characterize the proper function of the criminal legal system itself. For example, Andrews and Bonta (2010) suggest that the prediction of illegal behavior is a central activity of the criminal legal system, because “from it stems community safety, prevention, treatment, ethics, and justice.” In addition to reducing recidivism rates, proponents suggest that the framework might be able to improve sentencing procedures, facilitate jail diversion, reduce prison populations, help scale down mass incarceration without jeopardizing public safety, and ultimately, prevent crime altogether (Andrews et al., 2011; Clement et al., 2011; Monahan and Skeem, 2016).
The present meta-review interrogates the plausibility of such claims by attempting to answer the following questions:
1) How well does criminogenic risk assessment differentiate people who are at high risk of recidivism from those at low risk of recidivism?
2) How well do researchers’ conclusions about (1) match the empirical evidence?
3) Does the empirical evidence support the theory, policy, and practice recommendations that researchers make based on their conclusions?
To date, scores of meta-analyses and systematic reviews have attempted to answer the first question, by synthesizing vast amounts of research on the predictive utility and validity of criminogenic risk factors and particular risk assessment instruments. These reviews typically conclude that the evidence supports the continued use and expansion of criminogenic risk assessment. Concurrently, many critics have written about the scientific, cultural, and political forces that brought risk assessment to the forefront in the era of mass incarceration (e.g., Feeley and Simon, 1992; Garland, 2003), and on the ways in which risk may be gendered and racialized (Hannah-Moffat, 1999, 2004). However, these critiques have not always engaged directly with the empirical evidence thought to support criminogenic risk assessment, instead challenging the framework’s premises outright.
This division of academic labor means that researchers who largely accept the premises of criminogenic risk assessment have tended to oversee empirical research, its translation to policy and practice, and assessments of its effectiveness. Critics, in turn, have tended to question or dismiss the entire endeavor without directly engaging the empirical evidence on which proponents base their claims. The present study bridges these worlds, approaching the empirical basis of criminogenic risk assessment from a theoretical perspective more skeptical than that of many of its current proponents.
Our purpose, in sum, is to evaluate whether what the field says about criminogenic risk assessment is consistent with what the evidence says about criminogenic risk assessment. We do this by conducting a meta-review of 39 meta-analyses and systematic reviews of the predictive performance of criminogenic risk factors, with a focus on history of antisocial behavior, antisocial attitudes and cognitions, antisocial personality, and antisocial peers. Our goal is to provide a bird’s eye view of not only the empirical evidence surrounding criminogenic risk assessment, but also how the field understands and interprets that knowledge. This entails that we engage with the literature’s quantitative data and methods, but also that we excavate its tacit theoretical and political assumptions.
A premise of our approach is that the way researchers mobilize concepts, language, and methods to make claims about evidence and practice can reveal hidden ontological and epistemological assumptions, and even contradictions. This is consequential if the widespread acceptance and expansion of criminogenic risk assessment is predicated on the misinterpretation or misuse of the concepts, terms, and methods associated with it. This, in turn, can have a real impact on people’s lives, if scores generated from risk assessments restrict people’s freedom or determine their access to health treatment or other services.
Moreover, we focus primarily on the empirical basis of criminogenic risk assessment, and the field’s interpretation of it, rather than the merits of the risk-need-responsivity model, because the former is prerequisite for certain aspects of the latter. Indeed, the originators of the model acknowledge that criminogenic risk assessment was developed based on a “radical empirical approach to building theoretical understanding” (Andrews and Bonta, 2010: 132). Although they admit that this approach might be confused with “dustbowl empiricism” (Andrews and Bonta, 2010: 133), they argue that it nonetheless “lead[s] to a deeper theoretical appreciation of criminal conduct” and is “practically useful in decreasing the human and social costs of crime” (Andrews and Bonta, 2010: 133). Moreover, while the most recent iteration of the Risk-Need-Responsivity model de-emphasizes prior distinctions between risk factors based on the antisociality construct and others (Bonta and Andrews, 2017), the influence of this psychopathological conceptualization of crime and criminality—as something that emerges from within deviant or abnormal individuals, versus a social relation—looms large, as we shall see below. This meta-review analyzes, assesses, and critiques this logic.
Methods
To answer the three questions posed above, we conducted a systematic literature search and review to identify meta-analyses and systematic reviews that examined the predictive utility of criminogenic risk factors. (We will subsequently refer to the meta-analyses and systematic reviews as “reviews,” while we will refer to the primary studies and data sources that constituted those reviews as “primary studies.”) The details of our methods follow.
Inclusion criteria
Reviews were included if they were published in English-language journals between 1990 and 2020, focused on a legal system outcome (e.g., recidivism or arrest), and focused on male subjects. We excluded studies of criminogenic risk assessment among women for several interrelated reasons. Sex does not appear to moderate associations between criminogenic risk factors and criminal legal system outcomes (Singh and Fazel, 2010). Yet criminogenic risk assessment was “…derived from statistical analyses of aggregate male correctional population data and…based on male-derived theories of crime” (Hannah-Moffat, 2009: 211), and thus while it may appear to be “gender neutral,” it may nonetheless fail to be gender-responsive (Hannah-Moffat, 2009, 2013). More recent efforts to incorporate gender-informed variables into the criminogenic risk framework, however, may merely reproduce gender-normative stereotypes and “neutralize gender politics and decontextualize women’s experiences” (Hannah-Moffat, 2010: 201). While these issues are critical, they are beyond the scope of the present review.
Search strategy
See the online supplement for search databases and terms. Search results were downloaded into a reference management system, de-duplicated, and titles and meta-data were screened to isolate meta-analyses and systematic reviews. Titles and abstracts of retained reviews were screened based on inclusion criteria to obtain a final sample.
Data extraction and analysis
Meta-data were compiled from the final sample of reviews. Citation information was obtained from Web of Science and Google Scholar. Select characteristics of reviews were tabulated. To answer the first question of this meta-review, we extracted and synthesized quantitative results and researchers’ conclusions and interpretations. To answer the second question, each author of the present meta-review independently rated review conclusions, to determine whether reviews deemed the evidence for the predictive utility of criminogenic risk assessment to be strong, moderate, or weak. Our inter-rater reliability, estimated with Cohen’s kappa, was 0.84, p < 0.01. Ratings reflect consensus scores reached after discussing disagreements. To answer the third question, we make claims based on a close reading of the reviews, from which we identify and examine recurring issues with the concepts, language, and methods mobilized by researchers in this body of work.
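For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen’s kappa can be computed for two raters. The ratings are hypothetical and are not the ratings analyzed in this meta-review; the function name `cohens_kappa` is ours, introduced only for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items with categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical ratings of ten review conclusions (illustrative only).
rater_1 = ["strong", "strong", "moderate", "weak", "moderate",
           "strong", "weak", "moderate", "strong", "moderate"]
rater_2 = ["strong", "strong", "moderate", "weak", "moderate",
           "moderate", "weak", "moderate", "strong", "moderate"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

Kappa discounts the raw proportion of agreement by the agreement expected if both raters assigned labels independently at their own marginal rates, which is why it is generally lower than simple percent agreement.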
Results
Supplemental Figure 1 is a diagram of the flow of information through the meta-review process. The initial search yielded 12,952 records. Articles were retained if their titles or abstracts contained the terms meta-analysis or review. This reduced the number of records to 561. Titles and abstracts of these 561 reviews were read to determine whether they met inclusion criteria. The vast majority were excluded because they did not include a criminal legal system outcome. Thirty-nine meta-analyses or systematic reviews were retained for complete analysis.
Select review characteristics
Table 1 provides a description of retained reviews, and Supplemental Table 1 presents selected information from each, including disaggregated data from Table 1.
Table 1. Meta-description of included meta-analyses and systematic reviews.
Note: Percentages are of the 39 studies included in this meta-review unless otherwise noted.
*Percentage of the 7,553 total citations.
†Some studies counted in multiple categories, e.g., they reported the LSI and PCL.
Table 1 shows that the 39 reviews, two-thirds of which were meta-analyses, were published in 25 unique sources. Criminal Justice and Behavior and Law and Human Behavior published the most reviews (7 and 4, respectively). The vast majority of reviews were peer-reviewed (N = 36, or 92.3%). Those that were not peer-reviewed appeared in books or government-sponsored publications.
Collectively, reviews have been cited 7,553 times by other journals, according to Web of Science or Google Scholar. While the plurality of reviews has been cited between one and 20 times, 52.1% of the total citations can be attributed to five high-impact reviews. The plurality of reviews were published between 2011 and 2020.
Samples from primary studies in 84.5% of reviews were drawn from people who were involved with the criminal legal system (either adult or juvenile “offenders”). The outcome investigated by nearly all reviews was recidivism. However, definitions of this construct were heterogeneous: types of recidivism often were not distinguished (i.e., re-arrest, re-conviction, and technical violations were considered the same outcome), or a definition was not provided.
Supplemental Table 1 shows that primary studies from the reviews cover a half-century, from 1965 to 2020, and sample sizes (of combined participants from primary studies) ranged from roughly 2,400 to nearly 140,000, though many reviews did not report this information.
Thirty-three of the 39 meta-analyses and systematic reviews were available in the Web of Science database, which made it possible to conduct a bibliometric analysis of their complete reference lists. The results of this analysis are presented in the second column of Table 1, which shows the top 10 cited references and top 10 cited first authors. Andrews (91 citations) and Bonta (33 citations), the creators and owners of the Level of Services Inventory, and their students or frequent co-authors (e.g., Dowden, 11 citations and Gendreau, 31 citations) were among the top-cited authors and were authors of the top-cited references.
How well does criminogenic risk assessment differentiate people who are at high risk of recidivism from those at low risk of recidivism?
Table 2 presents meta-analytic effect size estimates and other predictive performance indicators from the sample of reviews for the four “antisocial” criminogenic risk factors for recidivism. Most reviews reported findings in terms of either weighted point-biserial correlation coefficients or Cohen’s d statistics, both of which were typically referred to as “effect sizes.”
Table 2. Meta-analytic effect sizes and other performance indicators for criminogenic risk factors and general recidivism.
Note: LSI: Level of Services Inventory; PCL: Psychopathy Checklist. Factor 1 represents callous/unemotional/narcissistic traits. Factor 2 represents antisocial characteristics, anger/aggression, and impulsivity.
For studies that reported correlation coefficients, the range of mean effect size estimates for history of antisocial behavior was 0.06 – 0.35, for antisocial attitudes 0.16 – 0.2, for antisocial personality 0.18 – 0.31, and for antisocial peers 0.18 – 0.27. The range of estimates for demographic characteristics such as sex, racialized group membership, and education/employment status was 0.05 – 0.26. The magnitude of a point-biserial correlation is difficult to interpret because it depends on both the coefficient itself and the prevalence of the outcome (an issue we will discuss below). However, a heuristic is that coefficients of 0.1, 0.3, and 0.5 are small, medium, and large, respectively (Rice and Harris, 2005). Thus, reviews tended to find small to medium effect sizes.
Also in Table 2, for studies that reported weighted mean Cohen’s d, the range of estimates for history of antisocial behavior was 0.32 – 0.57, for antisocial attitudes 0.23 – 0.51, for antisocial personality 0.42 – 0.6, and for antisocial peers 0.39 – 0.41. For demographic characteristics, the range was 0.16 – 0.44. Cohen’s d is easier to interpret, as it does not depend on the prevalence of the outcome. It can be interpreted as the difference between two group means expressed in standard deviation units. Cohen’s heuristic for small, medium, and large effects is 0.2, 0.5, and 0.8, respectively (Rice and Harris, 2005). Reviews reporting Cohen’s d thus tended to find small to medium effect sizes.
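To make the standardized mean difference concrete, the sketch below computes Cohen’s d with a pooled standard deviation for two groups. The risk scores are hypothetical and chosen only for illustration; they are not drawn from any review or primary study, and the function name `cohens_d` is ours.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_1, group_2):
    """Difference in group means divided by the pooled standard deviation."""
    n1, n2 = len(group_1), len(group_2)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_variance = (
        (n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)
    ) / (n1 + n2 - 2)
    return (mean(group_1) - mean(group_2)) / sqrt(pooled_variance)


# Hypothetical risk scores (illustrative only).
scores_recidivated = [20, 24, 28, 22, 26]
scores_not_recidivated = [18, 22, 26, 20, 24]

d = cohens_d(scores_recidivated, scores_not_recidivated)
print(f"Cohen's d = {d:.2f}")
```

Because the statistic is built from group means and within-group spread only, the proportion of the sample that recidivated does not enter the calculation, which is the sense in which it does not depend on outcome prevalence.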
Other meta-analyses reported weighted mean estimates for particular instruments overall. Table 2 shows that the correlation coefficient effect size estimates for the Level of Services Inventory ranged from 0.06 – 0.6, and for the Psychopathy Checklist, 0.26 – 0.28. Factor 2 of the Psychopathy Checklist, which measures antisocial characteristics, anger/aggression, and impulsivity, had a stronger effect size (0.29 – 0.32) than Factor 1, which measures callous, unemotional, and narcissistic traits (0.15 – 0.18).
A small number of meta-analyses calculated the mean area under the Receiver Operating Characteristic curve (ROC-AUC). This statistic represents the probability that a randomly chosen individual who has recidivated would be ranked as having higher criminogenic risk than a randomly chosen individual who has not recidivated. Schwalbe (2007) calculated an ROC-AUC of 0.64 from a meta-analysis of 28 different risk assessment instrument validation studies. Whittington and colleagues (2013) found a mean ROC-AUC of 0.69 from 65 studies. In a meta-analysis of 23 samples using the Level of Services Inventory and the Psychopathy Checklist, Fazel and colleagues (2012) found a mean ROC-AUC for recidivism of 0.66, a sensitivity of 0.4 (the probability that someone was assessed as high-risk given that they recidivated), a specificity of 0.8 (the probability that someone was assessed as low-risk given that they did not recidivate), a positive predictive value of 0.52 (the probability that someone will recidivate given that they were assessed as high-risk), and a negative predictive value of 0.76 (the probability that someone will not recidivate given that they were assessed as low-risk).
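The definitions in the preceding paragraph can be sketched in a few lines. The example below uses hypothetical counts and scores, not data from any of the reviews: the counts were chosen only so that sensitivity is 0.4 and specificity is 0.8, and the resulting predictive values follow from the assumed number of recidivists rather than from Fazel and colleagues’ data. The function names are ours.

```python
def classification_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 table.

    tp: assessed high-risk and recidivated    fp: assessed high-risk, did not
    fn: assessed low-risk and recidivated     tn: assessed low-risk, did not
    """
    return {
        "sensitivity": tp / (tp + fn),  # P(high-risk | recidivated)
        "specificity": tn / (tn + fp),  # P(low-risk | did not recidivate)
        "ppv": tp / (tp + fp),          # P(recidivated | high-risk)
        "npv": tn / (tn + fn),          # P(did not recidivate | low-risk)
    }


def roc_auc(scores_recidivated, scores_not_recidivated):
    """Share of (recidivist, non-recidivist) pairs in which the recidivist
    has the higher risk score; tied pairs count as one half."""
    wins = 0.0
    for s_pos in scores_recidivated:
        for s_neg in scores_not_recidivated:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_recidivated) * len(scores_not_recidivated))


# Hypothetical cohort of 1,000 people, 350 of whom recidivate (illustrative only).
metrics = classification_metrics(tp=140, fp=130, fn=210, tn=520)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")

# Hypothetical risk scores (illustrative only).
auc = roc_auc([3, 5, 7, 8], [2, 4, 5, 6])
print(f"ROC-AUC = {auc:.2f}")
```

Sensitivity and specificity condition on the outcome, whereas the predictive values condition on the assessment, so only the latter two describe what a given risk classification implies about a person’s eventual outcome.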
Eighteen of the reviews, or roughly 46%, tested for heterogeneity in meta-analytic results as a function of study characteristics such as sample composition (male/female, white/racialized group), study design (cross-sectional, longitudinal), source of risk assessment coding (interview/files), publication status (published/unpublished), etc. In general, these reviews found moderate to high degrees of heterogeneity that were attributable to the above characteristics. Seven reviews, or roughly 18%, discussed the quality of their primary studies. Four of these considered study design to be a proxy for quality, and as a result two included only prospective, longitudinal designs (Bonta et al., 1998, 2014). Two assessed whether design moderated meta-analytic results. One of these found that design had no effect on results (Andrews and Dowden, 2006), and one found that prospective studies were more likely to obtain statistically significant results than cross-sectional studies (Whittington et al., 2013). One study found that coder-rated quality of the outcome variable was positively associated with effect size (Lipsey and Derzon, 1998). Eight reviews mentioned publication bias, and six (15%) tested for it and found that the likelihood of publication bias was low. This is consistent with Singh and Fazel’s (2010) meta-review, which found that only a quarter of reviews assessed for publication bias, a bias that likely skews results in favor of positive, significant findings.
How well do conclusions about criminogenic risk assessment’s performance match the empirical evidence?
Supplemental Table 2 paraphrases the primary conclusions of the reviews. Roughly 37% of the reviews concluded that evidence for predictive performance was strong, 37% concluded it was moderate, 13% concluded it was weak or that results should be interpreted cautiously, and 13% did not draw explicit conclusions.
Thus, while over a third of the reviews judged the predictive performance of criminogenic risk assessment to be weak to moderate, over a third of the reviews deemed it to be strong. All but one meta-analysis drew these conclusions based on point-biserial correlations, Cohen’s d, or ROC-AUC. The vast majority relied on the former two statistics, which do not quantify predictive performance.
Measures of “effect” versus measures of prediction/classification. Most reviews used the language of “effect size” in describing point-biserial correlations or Cohen’s d. This confuses and conflates the language and goals of causal inference with the language and goals of prediction. Moreover, there are a number of major, well-understood problems with the use of point-biserial correlations and Cohen’s d even as measures of effect, including their dependence on the marginal distribution of the independent variable, arbitrary features of study design, and sampling variability (e.g., Cumming, 2013, 2014; Greenland et al., 1986).
But one issue in particular warrants further examination: the point-biserial correlation coefficient depends on the prevalence of the outcome, which was frequently not reported in the reviews or the primary studies that constituted them. Of greater concern is that a large number of reviews made conversions among correlation coefficients, Cohen’s d, and ROC-AUC, in order to implement meta-analytic procedures, using methods for this conversion that are sensitive to outcome prevalence. However, these reviews rarely reported the outcome prevalence estimates used in conversions or acknowledged that commonly cited tabular conversion charts assume an outcome prevalence of 50%. Using a 50% prevalence, or base rate, can overestimate the correlation coefficient if the true base rates are lower or higher. This is relevant because a study of nearly 68,000 people released from prisons in 2005, randomly sampled to represent the roughly 401,000 people released from prisons that year in 30 states, found that average recidivism rates are appreciably higher than 50% (Alper et al., 2018). The proportion of people who were re-arrested within three, six, and nine years of release was 68%, 79%, and 83% respectively (Alper et al., 2018).
Supplemental Figure 2 demonstrates the instability of point-biserial correlations converted from Cohen’s d, as a function of outcome prevalence and the magnitude of d. This plot was developed using the standard conversion formula from Rice and Harris (2005). For various magnitudes of Cohen’s d (curved lines), an outcome prevalence (x-axis) of 50% results in the maximum point-biserial r (y-axis). As outcome prevalence decreases or increases from 50%, the point-biserial r decreases. The potential for serious bias revealed in this figure—that the true magnitudes of correlations are likely lower than reported in the reviews—has been comprehensively discussed in the psychology literature (McGrath and Meyer, 2006).
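The pattern described for Supplemental Figure 2 can be reproduced with a short sketch. It assumes the commonly cited conversion r = d / sqrt(d² + 1/(pq)), where p is the outcome base rate and q = 1 − p; we take this to be the formula referred to above, but that is an assumption on our part. The value d = 0.5 is a hypothetical input, and the base rates of 0.68, 0.79, and 0.83 are the re-arrest proportions cited from Alper and colleagues (2018).

```python
from math import sqrt


def d_to_point_biserial(d, base_rate):
    """Convert Cohen's d to a point-biserial r for a given outcome base rate.

    Assumes the conversion r = d / sqrt(d^2 + 1 / (p * q)), with p the base
    rate and q = 1 - p.
    """
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must lie strictly between 0 and 1")
    p, q = base_rate, 1.0 - base_rate
    return d / sqrt(d * d + 1.0 / (p * q))


# Hypothetical d of 0.5, evaluated at a 50% base rate and at the three
# re-arrest proportions cited in the text.
for base_rate in (0.50, 0.68, 0.79, 0.83):
    r = d_to_point_biserial(0.5, base_rate)
    print(f"base rate {base_rate:.2f}: r = {r:.3f}")
```

Under this formula the product pq is largest at p = 0.5, so any conversion that silently assumes a 50% base rate yields the largest r that a given d can produce.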
Even if point-biserial correlation coefficients and Cohen’s d were described and interpreted not as effects, but purely for prediction, they do not convey some important information relevant to answering the first, technical question of this meta-review, about how well criminogenic risk assessment differentiates people who are at high risk of recidivism from those at low risk of recidivism. Only one meta-analysis (Fazel et al., 2012) presented measures that provide this information: sensitivity, specificity, positive predictive value, and negative predictive value. This review found that criminogenic risk assessments were better at identifying people at low risk for recidivism than people at high risk for recidivism, i.e., negative predictive values were high. They argued, however, that positive predictive values were unacceptably low: only 52% of individuals judged to be moderate to high risk went on to commit any offense (virtually equivalent to flipping a coin).
Furthermore, one of the meta-analyses reviewed here found that the Receiver Operating Characteristic curve was defined incorrectly in 27.8% of studies, that the Area Under the Curve statistic was defined in only 34% of studies, and that, when it was defined, the definition was incorrect 37.5% of the time (Singh et al., 2013). Of greater concern, the estimated Area Under the Curve values were interpreted in only one-third of the studies, and were interpreted accurately in only 12.5% of these.
Thus, while empirical indicators provide relatively consistent magnitudes for the association between criminogenic risk factors and recidivism, the most commonly used statistics do not directly answer the first question regarding criminogenic risk assessment’s ability to distinguish people at high vs. low risk of recidivism. And because the most common statistic—the point-biserial correlation coefficient—is unstable relative to outcome prevalence, even those measures were likely inflated: of the 17 reviews that presented correlation coefficients, only three explicitly stated that they collected information about outcome prevalence from their primary studies. Five others mentioned the issue of sensitivity to outcome prevalence, but did not state whether they had information on true base rates from primary studies or made assumptions about outcome prevalence. The one meta-analysis that reported positive and negative predictive values found that risk assessments were good at correctly identifying people at low risk of recidivism, but virtually no better than chance at identifying people at high risk of recidivism. The technical performance of criminogenic risk assessment has thus been interpreted inconsistently, and arguably inappropriately, by the framework’s proponents.
Does the empirical evidence support the theory, policy, and practice recommendations that researchers make based on their conclusions?
In this section, we analyze how the reviews talk about risk assessment and illegal behavior more broadly, and assess whether they make inferences that are supported by the data. Three themes are identified: contestable inferences from criminalization to criminality, contestable inferences from prediction to explanation, and contestable inferences from prediction to intervention.
The heterogeneity of recidivism definitions reflects the heterogeneity among risk assessment instruments used to predict recidivism. In their review, Desmarais and colleagues (2016) found that of 19 risk assessment instruments validated in U.S. correctional settings, 31% of validation studies defined recidivism as a new arrest, 13% as re-conviction, 10% as reincarceration, and 4% as technical violations. Importantly, the definition of recidivism influences the predictive performance of risk assessment instruments. For example, the Level of Services Inventory was found to be a valid predictor of recidivism in roughly half as many studies when the definition was re-arrest versus reincarceration (Vose et al., 2008).
Only two of the meta-analyses and systematic reviews acknowledged the difference between exposure to the criminal legal system and illegal behavior. The remainder of the reviews took for granted that legal system outcomes were the result of agential behaviors that emerged from within deviant individuals (e.g., Bonta et al., 2014).
Recidivism can be the result of an individual’s own behaviors, the proclivities of their supervision officer, or institutional policies and customs, and the causal mechanisms for recidivism are not uniform across these scenarios. For example, impulsivity may be one of many mechanisms for committing a new robbery, but family or employment problems may be the mechanism for missing a mandated treatment session. And the disposition of a community corrections officer might supersede both of these mechanisms in some circumstances.
As Schwalbe (2008) notes in his review, none of this is important if the goal of criminogenic risk assessment is purely prediction:
“As statistical prediction devices, actuarial risk assessments do not assume an underlying causal process related to recidivism. Rather, they count risk factors irrespective of the specific factors that may or may not be present for an individual case.” (pp. 1368–1369)
An analogous problem arises with criminogenic predictor constructs, which also conflate illegal behavior with exposure to the criminal legal system. Only two reviews recognized the conceptual and empirical distance between illegal behavior and exposure to the criminal legal system, both within the context of racialized disparities. In the first, Wilson and Gutierrez (2013) compared the predictive ability of the Level of Services Inventory among Aboriginal versus non-Aboriginal “offenders” in Canada, and found effect modification of Aboriginal status and risk score: high-risk Aboriginals and non-Aboriginals had the same probability of recidivism, but low-risk Aboriginals had a higher probability of recidivism than low-risk non-Aboriginals. The authors characterized this finding as an “underclassification” of low-scoring Aboriginals. But a more critical interpretation is that low-risk Aboriginals were subject to a lower threshold of policing, arrest, and sentencing, i.e., they were victims of racialized discrimination. Similarly, in a review of studies that compared risk assessments for ethnic minority and white offenders in the United Kingdom, Raynor and Lewis (2011) found that ethnic minorities consistently had significantly lower risk scores, but received the same sentences as higher-risk white offenders. The authors attributed this finding to racialized discrimination in the British criminal legal system.
Findings such as these reveal that because crime is viewed as emerging from within deviant or abnormal individuals, criminogenic risk assessments struggle to account for distortions in the purported “signal” of individual differences that are in fact due to socio-structural “noise.” In fact, whether or not a person will be re-arrested or re-convicted is influenced by factors that have nothing to do with their criminogenic risk profiles, such as the way the criminal legal system targets their racialized social position.
Indeed, criminogenic risk assessment avoids altogether basic questions about which behaviors are considered crimes and whether behaviors that are deemed criminal are treated differentially across time, space, and groups of people. Story (2016: 10) clarifies this difference between criminality and criminalization:
“While criminality is understood to be a state of objective deviance located in the individual, to be criminalized is to be subjectified as well as subjugated by the coercions of law enforcement and the criminal justice system, both of which are highly malleable relative to changes in laws, policy, and institutional dictates….”
Instead, most reviews implied that the question Why do some people engage in illegal behavior more than others? is the same as the question Why does the criminal legal system target some people more than others? This conflation was sometimes made rather consciously: The risk principle of case classification relates not to the retributive or deterrent aspects of justice but to the objective of reduced reoffending through rehabilitative programs. Let justice be done and let the just penalty be set, the just obligations be established, and the just decisions be made. The risk principle of human service becomes relevant when, in that just context, interest extends to public protection through the delivery of human services. (Andrews and Dowden, 2006: 90)
GPCSL [General Personality and Cognitive Social Learning theory] proposes that…
The general findings of the current meta-analysis are consistent with broad social psychological perspectives of…
The Big Four and Central Eight underpin a general personality and cognitive social learning theory of…
The LSI was developed from a general personality and social psychological perspective of crime (Andrews & Bonta, 2003), embodied in the Big Four…
The problem with conflating the predictors, let alone causal explanations, for the onset of illegal behavior or exposure to the legal system with causal explanations for recidivism has long been recognized (e.g., asymmetric causation, Uggen and Piliavin, 1998). Yet, few reviews dealt directly with the implications of generalizing from their legal system sampling frames to individuals not involved in the system, and thus made the extension from recidivism to “crime” or onset of illegal behavior without clear intention or justification. One exception is a thoughtful explanation in Cottle and colleagues (Cottle et al., 2001), regarding why their meta-analysis would focus only on recidivism and not initial offending: It is not feasible to make meaningful assumptions about predictors of reoffending behavior based on predictors found to be associated with first-time delinquency.… …[S]tudies examining recidivism risk factors typically are based on more homogenous samples of adolescents already identified as delinquent. Therefore, variables significantly associated with reoffending behavior in juveniles are not necessarily useful in initially distinguishing between adolescents who will or will not become delinquents.
The importance of these dynamic risk factors is that…
Changes in dynamic factors achieved through treatment that are subsequently linked to reductions in recidivism are known as criminogenic needs. (Dowden and Brown, 2002: 243)
Moreover…
Although the prediction of adult criminal recidivism is important and interesting, some have argued (Douglas & Kropp, 2002), and we concur, that…
This theory argues that interventions should…
Discussion
We know a great deal about which individual-level factors are associated with recidivism. However, 1) criminogenic risk assessment does a poor to modest job differentiating among people at high versus low risk, 2) its predictive performance is often misinterpreted and overstated, and 3) many inferences drawn from its empirical evidence base are not supported by the data. Our findings suggest that we know comparatively little about criminogenic risk assessment’s actual predictive performance, in terms of false positives, false negatives, and other metrics derived from these measures. We know even less about how, and to what effect, decisions about sensitivity, specificity, and positive and negative predictive values are implemented and evaluated in the field; we know only that these metrics are poorly understood by researchers and practitioners in the rare cases in which they are considered at all.
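To make the metrics named above concrete, the following minimal sketch computes sensitivity, specificity, and positive and negative predictive values from a two-by-two cross-classification of risk category against observed recidivism. The class name, function name, and all counts are hypothetical and invented for illustration; they are not drawn from any review or instrument discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cross-classification of risk category against observed recidivism."""
    true_pos: int   # classified high risk, recidivated
    false_pos: int  # classified high risk, did not recidivate
    false_neg: int  # classified low risk, recidivated
    true_neg: int   # classified low risk, did not recidivate


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return the four standard metrics derived from the counts."""
    return {
        # Share of people who recidivated that were classified high risk.
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        # Share of people who did not recidivate that were classified low risk.
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        # Share of people classified high risk who recidivated.
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        # Share of people classified low risk who did not recidivate.
        "npv": c.true_neg / (c.true_neg + c.false_neg),
    }


# Hypothetical counts, for illustration only.
counts = ConfusionCounts(true_pos=60, false_pos=80, false_neg=40, true_neg=320)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these invented counts, the positive predictive value is about 0.43: more than half of the people classified as high risk (80 of 140) did not recidivate, even though specificity is 0.80. This is the kind of information that a single summary statistic of predictive performance does not convey.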
The slippage identified in the preceding sections suggests that the state of evidence does not warrant claims that criminogenic risk assessment’s “theoretical and empirical base…should be disseminated widely for purposes of enhanced crime prevention throughout the criminal legal system and beyond….” (Andrews et al., 2011, emphasis added). Existing evidence does not speak to its efficacy beyond tertiary prevention. In order for such claims to be evidence-based, the methodological, definitional, and inferential problems discussed above must be systematically addressed. A complete causal model that elaborates the structural- and individual-level antecedents, confounders, and mediators of criminogenic risk factors must be subjected to explicit hypothesis testing in appropriate samples.
One reason this has not already happened may be the radical empirical approach that forms the foundation of criminogenic risk assessment. That is to say, because the theory was developed to fit the data, rather than proposed a priori and subjected to empirical confirmation, competing explanations were not subjected to rigorous hypothesis testing. Other reasons may include prior theoretical commitments and a lack of attention to sample construction and comparison groups. For example, Andrews and Bonta (2010: 79, 93) have argued that it is a “myth” that the “roots of crime are buried deep in structural inequality.” They go on to cite the results of many of the meta-analyses reviewed here, arguing that social factors such as socioeconomic status are demonstrably weaker predictors of recidivism than criminogenic risk factors. Yet this does not appear to be the case: of the nine studies that provided estimates for so-called “demographic” risk factors, roughly 56% found “effect sizes” equal to or greater than those for the criminogenic risk factors. Table 2 shows that demographic risk did not perform much worse (and sometimes performed better) than antisocial characteristics in their association with recidivism. This is notable because we would not expect a factor like socioeconomic status to be strongly associated with anything in a sample where it does not vary appreciably, and the vast majority of people targeted by mass criminalization and mass incarceration are low-income.
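The point about limited variation can be illustrated with a small simulation of range restriction. This is a sketch under stated assumptions, not an analysis of any data from the reviews: the "status" score, the continuous "outcome," the noise level, and the cutoff of 0.2 are all invented, and the random seed is fixed only for reproducibility.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed, for reproducibility only

# Simulated population: the outcome depends linearly on status, plus noise.
status = [rng.uniform(0.0, 1.0) for _ in range(5000)]
outcome = [s + rng.gauss(0.0, 0.3) for s in status]

# Correlation in the full simulated population.
r_full = pearson_r(status, outcome)

# Correlation in a sample restricted to the bottom fifth of the status range,
# a stand-in for a sample in which status barely varies.
restricted = [(s, o) for s, o in zip(status, outcome) if s < 0.2]
r_restricted = pearson_r(
    [s for s, _ in restricted],
    [o for _, o in restricted],
)

print(f"full range:       r = {r_full:.2f}")
print(f"restricted range: r = {r_restricted:.2f}")
```

The underlying relationship between status and outcome is identical in both samples; only the variation in status differs. The correlation in the restricted sample is nonetheless much smaller, which is why a weak observed association in a homogeneous sample is not strong evidence that a factor is unimportant.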
What might explain the mismatch between the empirical evidence and proponents’ conclusions about it?
Above we have suggested that many researchers seem to overstate the predictive utility of criminogenic risk assessment in relation to the empirical evidence on which they base their claims. One possible explanation for this mismatch is that the authors of these more optimistic reviews may not be neutral arbiters of the studies they examine—both because they are often also the authors of the studies they review, and because they have financial interests in the instruments on which these studies are based. To explore this hypothesis, we conducted a post-hoc bibliometric analysis of all references cited in our sample of reviews with R package Bibliometrix (Aria and Cuccurullo, 2017), as well as a co-citation network analysis of the reviews and their analyzed studies, using R package igraph (Csardi and Nepusz, 2006).
For 35 of the 39 meta-analyses and systematic reviews, authors indicated which references were analyzed as part of review procedures, or provided lists of these primary studies in appendices or supplemental materials. We created a directed network of the relationships between the reviews and their primary studies. Supplemental Figure 3 displays this network in two layouts, with red nodes representing reviews that judged the predictive utility of criminogenic risk factors to be strong, blue nodes representing reviews that judged it to be weak, and grey nodes representing analyzed studies. The size of the grey nodes is proportional to the number of reviews that cite them.
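The structure of such a network can be sketched in a few lines. The example below uses plain Python rather than the igraph package used for the analysis, and every review name, study name, and judgment in it is hypothetical; it shows only how directed review-to-study edges yield the node colors and the node sizes described above.

```python
from collections import Counter

# Hypothetical directed edges: (review, primary study it analyzed).
edges = [
    ("review_A", "study_1"),
    ("review_A", "study_2"),
    ("review_B", "study_2"),
    ("review_B", "study_3"),
    ("review_C", "study_2"),
    ("review_C", "study_3"),
]

# Hypothetical judgments of predictive utility for each review.
judgment = {"review_A": "strong", "review_B": "weak", "review_C": "weak"}
judgment_color = {"strong": "red", "weak": "blue"}

# In-degree of a study = number of distinct reviews that analyzed it.
# set(edges) guards against counting a duplicated edge twice.
in_degree = Counter(study for _, study in set(edges))

# Reviews are colored by judgment; analyzed studies are grey.
node_color = {review: judgment_color[j] for review, j in judgment.items()}
node_color.update({study: "grey" for study in in_degree})

# Grey node size is proportional to in-degree.
node_size = dict(in_degree)

print(node_size)
```

Here `study_2` would be drawn largest because all three hypothetical reviews analyzed it.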
These networks suggest that there are two distinct clusters of reviews, each of which tends to cite a group of primary studies that the other cluster mostly ignores, although there is some overlap. Moreover, each cluster tends to correspond to a different ideological position about the performance of criminogenic risk assessment: those reviews that deem the predictive utility of criminogenic risk factors to be strong tend to co-cite a similar body of studies that is distinct from the studies cited by the reviews that deem the predictive utility of criminogenic risk factors to be weak.
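One simple way to quantify how much two reviews draw on the same primary studies is the Jaccard index of their sets of analyzed studies. The sketch below is an illustration only: the Jaccard index is a swapped-in measure chosen for simplicity, not necessarily the one used in the co-citation analysis, and the reviews and study identifiers are hypothetical.

```python
from itertools import combinations

# Hypothetical sets of primary studies analyzed by each review.
cited = {
    "review_A": {"s1", "s2", "s3"},
    "review_B": {"s2", "s3", "s4"},
    "review_C": {"s5", "s6"},
    "review_D": {"s3", "s5", "s6", "s7"},
}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)


# Pairwise overlap between every pair of reviews.
overlap = {
    (r1, r2): jaccard(cited[r1], cited[r2])
    for r1, r2 in combinations(sorted(cited), 2)
}

for pair, value in overlap.items():
    print(pair, round(value, 2))
```

In this invented example, reviews A and B overlap heavily with each other (0.5), as do C and D (0.5), while overlap across the two pairs is low or zero. High overlap within groups and low overlap between them is the pattern that two distinct clusters with some shared studies would produce.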
What characterizes the cluster of reviews that are most bullish about the predictive utility of criminogenic risk assessment? One key feature of this cluster of reviews is the involvement of the developers of a particular risk instrument, or their students and frequent collaborators. Andrews, Bonta, Dowden, Gendreau, and Wormith were authors on 73% of the reviews that judged predictive performance to be strong. Three of the five most-cited reviews (overall) included combinations of the Level of Service Inventory’s creators or their students or co-authors. When we restrict the bibliometric sample to the reviews that involve these authors, we find that 17 of the top 20 primary studies cited in those reviews were authored or co-authored by Andrews, Bonta, Dowden, or Gendreau. This degree of self-citation suggests a rather insular field that is largely self-refereed. Furthermore, Andrews, Bonta, and Wormith have a proprietary interest in the Level of Service Inventory and receive royalties on sales of the instrument from its publisher, Multi-Health Systems. Conflicts of interest such as this were disclosed in only two of the nine reviews involving these authors.
Implications for policy and practice
In theory, risk assessment in the criminal legal system might productively be used to focus resources on the people most in need of support and social institutions most in need of change. But it is difficult to imagine how it might live up to this promise without radical changes, from its conceptual underpinnings to its development, implementation, and evaluation. At the very least, as the public begins to take greater notice of criminogenic risk assessment, often opposing it on ethical as well as scientific grounds (Angwin et al., 2016; Barry-Jester et al., 2015; Smith, 2016), it is incumbent upon researchers to be clear about its scientific versus political content. This is because the perceived empirical superiority of criminogenic risk assessment lends the appearance of scientific objectivity to the selection and prioritization of risk factors, their scoring and weighting, and their tuning and revision, belying the political and value-laden decisions inherent in all data generating and modeling endeavors (O’Neil, 2016).
One way to address the theoretical and empirical overreach demonstrated above might be to democratize and de-privatize criminogenic risk assessment. This would entail: (1) making criminogenic risk assessment instruments open source and free; (2) providing open access to scoring, coding, and statistical modeling procedures; (3) providing open access to de-identified calibration and validation data; and (4) requiring jurisdictions to collect data on, and report, false positives and false negatives.
There should be no profit motive (or paywall blocking access) to the design, dissemination, and evaluation of risk assessments used to make claims about public safety, deprive people of freedom, enable or remove their access to limited treatment and social service resources, or otherwise limit or expand their life chances. In addition to transparency in the constitutive components of risk, the way in which these items are prioritized, weighted, and scored should be public and reproducible. Like certain data stored in the National Archive of Criminal Justice Data, deidentified data collected by jurisdictions using criminogenic risk assessments should be publicly available, with proper privacy protections. Jurisdictions that use criminogenic risk assessments should be required to collect data on and report sensitivity, specificity, positive predictive values, and negative predictive values on a regular basis. While the calibration of these performance measures of course has technical components, the moral and political dimensions of misclassification should be subject to the same public dialogue that informs other jurisprudential and penal norms.
Limitations
The present meta-review is limited in the following ways: First, it is of course possible that there was human error in implementing systematized procedures for screening reviews and extracting data. However, our procedures were designed to minimize this risk. Second, the primary aim of this meta-review was not to quantify a synthesis of findings across reviews, but rather to conduct critical, narrative analysis. Thus, despite being firmly grounded in quantitative methods, this review reflects the subjectivities, inherent biases, conceptual orientation, and political and normative perspectives of the authors. Its findings should thus be understood in that context. Finally, this meta-review is constrained by the methodological deficits of its constituent reviews.
Conclusion
As criminogenic risk assessment expands while the criminal legal system slowly inches toward the precipice of reform, it is essential that we are clear about what the evidence does and does not say, in order to resist the hubris of overreach and to prevent the production or reproduction of harmful, unintended consequences. Targeted, strategic, and theory-driven research on the mechanisms of prediction and successful interventions, both individual and structural, is paramount as the field moves forward.
Supplemental Material
Supplemental material, sj-pdf-1-pun-10.1177_14624745211025751 for Criminogenic risk assessment: A meta-review and critical analysis by Seth J Prins and Adam Reich in Punishment & Society
Footnotes
Acknowledgments
The authors thank Drs. Sharon Schwartz, Bruce Link, and Lisa Bates for invaluable comments on earlier drafts of this manuscript. SJP also thanks Jennifer Skeem for their participation on his dissertation committee. This work was supported by the National Institute of Mental Health (T32-MH-13043) and National Institute on Drug Abuse (T32-DA-37801 and K01-DA045955).
Authors’ note
References for all meta-analyses and systematic reviews analyzed in this meta-review are available in the online supplement.
