Editors’ Introduction

Abstract

As the 21st century unfolds, there is an increasing consensus among scholars, practitioners, and policy makers that crime prevention and crime control practices and policies should be rooted as much as possible in scientific research (Farrington et al. 2006). In an evidence-based crime policy model, the source of scientific evidence is empirical research in the form of evaluations of programs, practices, and policies. Not all evaluation designs are equal, however; some are considered more scientifically valid than others. Higher quality evaluation designs (i.e., those with higher validity, especially internal validity) are favored in determining “what works” in crime and justice interventions (Sherman et al. 2006; Weisburd 2003). Randomized experiments and high-quality quasi-experimental research designs provide the strongest tests of the effectiveness of crime and justice programs and policy interventions (Nagin and Weisburd 2013). Some experimental criminologists suggest that well-designed quasi-experiments can produce results equivalent to those produced by randomized experiments. For instance, Berk et al. (2010) reported that the results from a randomized controlled trial (RCT) evaluating parole and probation supervision among low-risk offenders were effectively identical to the results from a regression discontinuity quasi-experiment (see also Shadish, Clark, and Steiner 2008).

Experimental criminologists promote evidence-based crime policy models in which program evaluations are ranked according to methodological rigor. For instance, in 1996, the U.S. Congress commissioned the Department of Criminology and Criminal Justice at the University of Maryland to provide an independent, scientifically rigorous assessment of more than US$4 billion worth of federally sponsored crime prevention programs. Sherman and his colleagues (1997) reviewed scientific evaluations of programs intended to prevent crime in seven settings: families, schools, communities, labor markets, places (e.g., urban centers, homes), police, and courts/corrections. Evaluations were rated on the Scientific Methods Scale, which ranked studies from 1 (weakest) to 5 (strongest) on overall internal validity. Properly implemented randomized experiments were rated highest on the scale and observational studies lowest. High-quality quasi-experimental designs, namely those comparing multiple units with and without the program while controlling for other factors, or those using nonequivalent comparison groups with only minor evident differences, were ranked second highest in terms of internal validity. The scale was one of the first attempts in crime and justice to rank studies scientifically and to communicate scientific quality more effectively to policy makers, practitioners, the media, and the general public. The findings of the original 1997 report were updated several years later (Sherman et al. 2006).

Very few of the studies reviewed in the “Maryland Report” used randomized experimental designs. A subsequent analysis of the Maryland Report that focused on nonschool-based crime prevention studies reported that only 15% (42 of the 308 eligible studies) used randomized experimental designs (Weisburd, Lum, and Petrosino 2001). High-quality quasi-experimental designs represented only 9% (28) of eligible studies. Most of the evaluations in that analysis relied on weaker quasi-experimental designs or correlational studies (76%, 234 of the 308 eligible studies). Moreover, Weisburd, Lum, and Petrosino (2001) observed that designs with lower internal validity were more likely than more rigorous designs to report results favoring the treatment and less likely to report harmful treatment effects (see also Welsh et al. 2011). This suggests that research design has a systematic effect on outcomes in criminal justice studies, with weaker evaluations overstating positive program effects.

Experimental criminologists have been successful in increasing the number of RCTs over the last several decades. In their examination of randomized experiments on crime and justice, Farrington and Welsh (2006) found that the number of experiments with at least 100 participants more than doubled, from 37 in the period 1957–1981 to 85 in the period 1982–2004. More recently, Braga and his colleagues (2013) found that the number of randomized field experiments in policing increased dramatically, from only two RCTs completed in the 1970s to 63 RCTs by 2011. This growth was especially pronounced in the 1990s and 2000s, when 54 policing experiments (about 86% of the 63 RCTs) were completed. While similar reviews of higher quality quasi-experimental designs in crime and justice have not been completed, there is some indirect evidence that the use of more rigorous quasi-experimental designs is also increasing. For instance, Braga and Weisburd (in press) recently noted that the quality of focused deterrence quasi-experimental evaluations has improved over the last decade, with more studies using statistical matching to create more equivalent comparison groups.

As the evidence-based crime policy movement gains further momentum and more rigorous evaluation designs continue to proliferate, criminologists will need to add new evaluation “tools” to their methodological toolbox. For this special issue of Evaluation Review, we sent out a call for articles that advance cutting-edge experimental and quasi-experimental evaluation techniques in criminology and criminal justice. We were particularly interested in articles focused on improving the measurement of outcomes in randomized experiments, applying statistical matching techniques and regression discontinuity designs in quasi-experimental evaluations, and developing other advances in controlled evaluation methodology and statistical modeling. We feel fortunate to have six strong peer-reviewed articles accepted for this issue that cover a range of topics in evaluation methodology. Three articles examine important methodological issues in randomized experimental designs and three articles present applications of more rigorous quasi-experimental designs.

Richard Berk and his colleagues examine the analysis of randomized field experiments using linear regression models that include an indicator variable for treatment along with covariates, an approach intended to increase the precision of estimated treatment effects. While this approach has been recommended by popular textbooks and applied in several criminal justice experiments, the use of covariates in the analysis of randomized field experiments can yield biased estimates, and the problems can persist even in large samples. Building on the recent work of Lin (2013), Berk et al. develop an alternative formulation of the Neyman causal model that offers a remedy and permits proper generalizations to well-defined populations. They then provide a practical estimator and valid standard errors for this approach and illustrate the estimator’s use with data from a recent probation and parole randomized experiment.
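To make the idea of regression adjustment concrete, the sketch below implements the interacted adjustment that Lin (2013) analyzed, for the simple case of one covariate: each arm gets its own regression slope, and each arm's mean outcome is shifted to the pooled covariate mean before the two are differenced. This is numerically the treatment coefficient from regressing y on T, the centered covariate, and their interaction. It is not Berk et al.'s reformulated estimator, and the standard error here is a simple Neyman-style approximation built from within-arm residuals rather than the Huber-White (sandwich) standard error Lin recommends. The data-generating process and all function names are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope of y on x (simple regression)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def lin_adjusted_effect(y, treat, x):
    """Covariate-adjusted average treatment effect, one covariate.

    Equivalent to the T coefficient in y ~ T + (x - xbar) + T:(x - xbar).
    The returned SE is an approximation from within-arm residual variances,
    not a sandwich standard error.
    """
    xbar = statistics.fmean(x)  # pooled covariate mean
    arms = {}
    for arm in (0, 1):
        xs = [xi for xi, t in zip(x, treat) if t == arm]
        ys = [yi for yi, t in zip(y, treat) if t == arm]
        b = ols_slope(xs, ys)  # arm-specific slope (the "interaction")
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        adjusted_mean = my + b * (xbar - mx)  # arm's fitted value at xbar
        resid = [yi - (my + b * (xi - mx)) for xi, yi in zip(xs, ys)]
        resid_var = sum(r * r for r in resid) / (len(xs) - 2)
        arms[arm] = (adjusted_mean, resid_var, len(xs))
    (m1, v1, n1), (m0, v0, n0) = arms[1], arms[0]
    return m1 - m0, (v1 / n1 + v0 / n0) ** 0.5


def simulate(n=2000, true_effect=-0.5, seed=1):
    """Hypothetical experiment whose effect varies with the covariate."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    treat = [0] * (n // 2) + [1] * (n - n // 2)
    rng.shuffle(treat)  # complete random assignment
    y = [2.0 + 1.5 * xi + t * (true_effect + 0.8 * xi) + rng.gauss(0, 1)
         for xi, t in zip(x, treat)]
    return y, treat, x


if __name__ == "__main__":
    y, treat, x = simulate()
    tau, se = lin_adjusted_effect(y, treat, x)
    print(f"adjusted effect = {tau:.3f} (approx. SE {se:.3f})")
```

Fitting separate slopes per arm is what distinguishes this from the common single-slope covariance adjustment; under heterogeneous effects like those simulated here, Lin showed that the interacted version does not lose asymptotic precision relative to the unadjusted difference in means.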

Emma Antrobus and her colleagues tackle the problem of nonresponse bias in RCTs that involve survey responses. They examine two types of nonresponse bias problems that can arise when implementing randomized field experiments. First, they consider the general problem of low response rates and its implications for tests of statistical significance. Second, they examine the specific problem of differential response rates between treatment and control conditions. These two types of nonresponse bias are explored in the context of the Queensland Community Engagement Trial, a randomized field experiment that used survey responses to test the impact of procedural justice policing on citizen attitudes toward police (Mazerolle et al. 2013). Their results suggest that the Queensland experimental outcomes were robust to nonresponse bias, and the authors consider the implications of their findings for other randomized field experiments with survey-based outcome measures.
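One simple first check for the second problem, differential response, is a two-proportion z-test comparing response rates in the treatment and control conditions. The sketch below is a generic illustration, not the analysis Antrobus et al. report, and the counts in the usage example are hypothetical rather than Queensland trial figures. A nonsignificant difference in rates does not by itself rule out bias, since respondents in the two arms could still differ in composition.

```python
import math


def differential_response_test(resp_t, n_t, resp_c, n_c):
    """Two-sided two-proportion z-test for treatment vs. control response rates.

    resp_t / resp_c: number of completed surveys in each arm
    n_t / n_c:       number of people surveyed in each arm
    """
    p_t, p_c = resp_t / n_t, resp_c / n_c
    pooled = (resp_t + resp_c) / (n_t + n_c)  # rate under H0: equal response
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_t, p_c, z, p_value


if __name__ == "__main__":
    # Hypothetical: 300 of 1,000 treated and 250 of 1,000 controls responded.
    p_t, p_c, z, p = differential_response_test(300, 1000, 250, 1000)
    print(f"response rates {p_t:.2f} vs {p_c:.2f}, z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts, the five-point gap in response rates is statistically significant (z is about 2.5), which would prompt closer checks of whether respondents in the two arms remain comparable.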

Most place-based policing RCTs involve relatively small numbers of cases. In these instances, block randomized place-based trials maximize the equivalence of treatment and control groups and improve statistical power (Weisburd and Gill 2013). Joshua Hinkle et al. consider the lesser-known methodological problem that the number of events within each place is sometimes too small to support statistically powerful tests. Specifically, random noise around small baseline crime rates in micro-place units of analysis in smaller cities makes detecting a treatment effect difficult. Hinkle and his colleagues use data collected from a randomized field experiment that tested hot spots policing in three California cities to illustrate the low base rate problem in small N place experiments. They use a series of analyses to examine the “noise” resulting from low base rates of crime in smaller cities and provide suggestions for future evaluations to overcome these limitations.
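The low base rate problem can be illustrated with a small Monte Carlo power simulation. The sketch below assumes hypothetical values (20 places per arm, Poisson-distributed crime counts, a 20% treatment reduction) and uses an unblocked comparison of mean counts with a normal-approximation test; it is not the design or analysis used by Hinkle et al. Holding the proportional effect fixed, it estimates how often the test detects the reduction when places average 2 crimes versus 20 crimes.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """Poisson variate via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulated_power(base_rate, reduction=0.2, places_per_arm=20,
                    reps=1000, seed=7):
    """Share of simulated trials in which |z| > 1.96 for the difference in means."""
    rng = random.Random(seed)
    treated_rate = base_rate * (1 - reduction)
    rejections = 0
    for _ in range(reps):
        control = [poisson_draw(base_rate, rng) for _ in range(places_per_arm)]
        treated = [poisson_draw(treated_rate, rng) for _ in range(places_per_arm)]
        se = math.sqrt(statistics.variance(control) / places_per_arm
                       + statistics.variance(treated) / places_per_arm)
        if se == 0:
            continue  # no variation at all: the test cannot reject
        z = (statistics.fmean(treated) - statistics.fmean(control)) / se
        if abs(z) > 1.96:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for rate in (2, 20):
        print(f"mean crimes per place = {rate}: power ~ {simulated_power(rate):.2f}")
```

The same 20% reduction is much harder to detect at the lower base rate because Poisson noise is large relative to the expected difference in counts, which is the intuition behind the low base rate problem.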

Sarah Jalbert and William Rhodes examine regression discontinuity as a rigorous quasi-experimental design for evaluating crime prevention and criminal justice programs and policies. Although seldom used in criminal justice, regression discontinuity is a distinctive design that exploits “natural experiments,” which are far more common than many criminology researchers currently appreciate. Regression discontinuity designs take advantage of the fact that eligibility is often determined by a cutoff: some localities barely miss the cutoff for state assistance, and some people only barely qualify for public support. To the extent that the “barely missed” and “barely qualified” are roughly identical except for their treatment status, these designs can approximate true experiments (Berk et al. 2010). The article provides key details on the use of regression discontinuity, discusses common challenges facing the design, reviews the substantive applications and situations within criminology for which the design seems particularly well suited, and discusses some of the examples that currently exist in the field.
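The “barely missed” versus “barely qualified” comparison can be sketched as a sharp regression discontinuity estimate: fit a separate straight line on each side of the cutoff, using only observations within a bandwidth, and take the difference between the two fitted values at the cutoff. This minimal sketch assumes a uniform kernel, a fixed bandwidth of 0.5, and simulated data with a known jump of 1.0; it omits data-driven bandwidth selection and standard errors, and it is not the procedure described by Jalbert and Rhodes.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept and slope of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return my - slope * mx, slope


def rd_estimate(x, y, cutoff=0.0, bandwidth=0.5):
    """Sharp RD: local linear fits on each side, uniform kernel.

    The running variable is centered at the cutoff, so each intercept is the
    fitted outcome exactly at the cutoff.
    """
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y)
            if cutoff - bandwidth <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y)
             if cutoff <= xi <= cutoff + bandwidth]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left  # estimated jump at the cutoff


def simulate_rd(n=4000, jump=1.0, seed=11):
    """Hypothetical data: treatment assigned when the running variable is >= 0."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(n)]
    y = [2.0 + 0.8 * xi + 0.3 * xi ** 2 + jump * (xi >= 0) + rng.gauss(0, 0.5)
         for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate_rd()
    print(f"estimated discontinuity = {rd_estimate(x, y):.3f}")
```

Narrowing the bandwidth makes the “roughly identical except for treatment” assumption more plausible but leaves fewer observations, so in practice the choice of bandwidth trades bias against variance.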

The final two articles present examples of the use of propensity score-matching techniques to develop balanced treatment and comparison groups in quasi-experimental evaluations of high-profile criminal justice programs. Propensity score-matching techniques attempt to create equivalent treatment and comparison groups by summarizing relevant pretreatment characteristics of each subject into a single-index variable (the propensity score) and then matching subjects in the untreated comparison pool to subjects in the treatment group based on values of this index (Rosenbaum and Rubin 1983, 1985). Christy Visher and Pamela Lattimore use propensity scores to develop statistically matched comparison groups in their federally funded evaluation of Serious and Violent Offender Reentry Initiative (SVORI) programs in 12 sites serving adult males. Their evaluation suggests that the SVORI programs were generally not well implemented; using an “intent-to-treat” approach, their statistical analysis revealed small recidivism reductions that were not statistically significant.
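The two steps described above, estimating a single-index propensity score and then matching on it, can be sketched as follows. The assumptions are deliberately simple: two hypothetical covariates, a logistic propensity model fit by plain gradient ascent, and one-to-one nearest-neighbor matching with replacement, which estimates the average effect on the treated. Neither the covariates nor the matching scheme reflect the specifications used by Visher and Lattimore, and all names are illustrative.

```python
import math
import random
import statistics


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(X, t, iters=300, lr=1.0):
    """Logistic regression (intercept + covariates) by gradient ascent."""
    k = len(X[0]) + 1
    beta = [0.0] * k
    n = len(X)
    for _ in range(iters):
        grad = [0.0] * k
        for xi, ti in zip(X, t):
            row = [1.0, *xi]
            err = ti - sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j in range(k):
                grad[j] += err * row[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Estimated probability of treatment: the single-index summary."""
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            for xi in X]


def matched_att(y, t, ps):
    """1:1 nearest-neighbor matching on the propensity score, with replacement."""
    controls = [(p, yi) for p, yi, ti in zip(ps, y, t) if ti == 0]
    diffs = []
    for p, yi, ti in zip(ps, y, t):
        if ti == 1:
            _, y_match = min(controls, key=lambda c: abs(c[0] - p))
            diffs.append(yi - y_match)
    return statistics.fmean(diffs)


def simulate(n=1000, effect=1.0, seed=3):
    """Hypothetical data where treatment uptake depends on the covariates."""
    rng = random.Random(seed)
    X, t, y = [], [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        ti = 1 if rng.random() < sigmoid(-0.5 + 1.0 * x1 + 0.5 * x2) else 0
        X.append((x1, x2))
        t.append(ti)
        y.append(1.0 + 2.0 * x1 + 1.0 * x2 + effect * ti + rng.gauss(0, 1))
    return X, t, y


if __name__ == "__main__":
    X, t, y = simulate()
    ps = propensity_scores(X, fit_logit(X, t))
    naive = (statistics.fmean([v for v, ti in zip(y, t) if ti])
             - statistics.fmean([v for v, ti in zip(y, t) if not ti]))
    print(f"naive difference = {naive:.2f}, matched estimate = {matched_att(y, t, ps):.2f}")
```

In this simulation the naive treated-versus-untreated difference is badly inflated because treated subjects have higher covariate values, while matching on the estimated score recovers an estimate near the true effect of 1.0. Matching can only adjust for characteristics that enter the score; unmeasured differences remain a threat.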

Anthony Braga, Robert Apel, and Brandon Welsh use a quasi-experimental design to evaluate the gun violence reduction effects of the Operation Ceasefire focused deterrence strategy on directly treated and vicariously treated gangs in Boston. Propensity score-matching techniques were used to identify balanced comparison gangs for the vicariously treated gangs. The authors find statistically significant reductions in total shootings by both directly treated and vicariously treated gangs, suggesting that focused deterrence strategies do indeed generate spillover deterrent effects for gangs that do not directly experience the full Ceasefire intervention.
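Whether matched comparison units are actually “balanced” is commonly checked with the standardized mean difference: the difference in covariate means between groups divided by a pooled standard deviation. The sketch below applies it to made-up pre-period shooting counts for a few hypothetical gangs; these numbers are not from the Boston evaluation. Rules of thumb vary, but absolute values below 0.1 or 0.25 are often cited as indicating adequate balance.

```python
import math
import statistics


def standardized_difference(treated, comparison):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((statistics.variance(treated)
                           + statistics.variance(comparison)) / 2)
    return (statistics.fmean(treated) - statistics.fmean(comparison)) / pooled_sd


if __name__ == "__main__":
    # Hypothetical pre-intervention shooting counts per gang.
    treated = [4, 6, 5, 7, 3]
    all_candidates = [1, 2, 1, 3, 2, 0]  # unmatched comparison pool
    matched = [4, 5, 6, 6, 3]            # comparison gangs after matching
    print(f"before matching: {standardized_difference(treated, all_candidates):.2f}")
    print(f"after matching:  {standardized_difference(treated, matched):.2f}")
```

Unlike a significance test, the standardized difference does not depend on sample size, which makes it better suited to the small numbers of units (such as gangs) typical of place- and group-based evaluations.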

We think that together these articles add important new knowledge to evaluation research in crime and justice. They provide insights into both key problems in experimental research and key opportunities in the area of quasi-experimental studies.

References

Berk, Barnes, Ahlman, and Kurtz. 2010. “When Second Best is Good Enough: A Comparison between a True Experiment and a Regression Discontinuity Quasi-experiment.” Journal of Experimental Criminology 6:159–90.

Braga, A. A., and D. L. Weisburd. In press. “Must We Settle for Less Rigorous Evaluations in Large Area-based Crime Prevention Programs? Lessons from a Campbell Review of Focused Deterrence.” Unpublished manuscript.

Braga, A. A., B. C. Welsh, A. V. Papachristos, Schnell, and Grossman. 2013. “The Growth of Randomized Experiments in Policing: The Vital Few and the Salience of Mentoring.” Journal of Experimental Criminology. doi:10.1007/s11292-013-9183-2.

Farrington, D. P., Gottfredson, L. W. Sherman, and B. C. Welsh. 2006. “The Maryland Scientific Methods Scale.” In Evidence-based Crime Prevention. Rev. ed., edited by L. W. Sherman, D. P. Farrington, B. C. Welsh, and D. L. MacKenzie, 13–21. New York: Routledge.

Farrington, D. P., and B. C. Welsh. 2006. “A Half Century of Randomized Experiments on Crime and Justice.” Crime and Justice 34:55–132.

Lin. 2013. “Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman’s Critique.” Annals of Applied Statistics 7:295–318.

Mazerolle, Antrobus, Bennett, and Tyler. 2013. “Shaping Citizen Perceptions of Police Legitimacy: A Randomized Field Trial of Procedural Justice.” Criminology 51:33–64.

Nagin and Weisburd. 2013. “Evidence and Public Policy: The Example of Evaluation Research in Policing.” Criminology and Public Policy 12:651–79.

Rosenbaum and Rubin. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70:41–55.

Rosenbaum and Rubin. 1985. “Constructing a Control Group Using Multivariate Matched Sampling Methods that Incorporate the Propensity Score.” American Statistician 39:33–38.

Shadish, W. R., M. H. Clark, and P. M. Steiner. 2008. “Can Nonrandomized Experiments Yield Accurate Answers? A Randomized Experiment Comparing Random to Nonrandom Assignment.” Journal of the American Statistical Association 103:1334–43.

Sherman, D. P. Farrington, B. C. Welsh, and D. L. MacKenzie, eds. 2006. Evidence-based Crime Prevention. Rev. ed. New York: Routledge.

Sherman, Gottfredson, D. L. MacKenzie, Eck, Reuter, and Bushway, eds. 1997. Preventing Crime: What Works, What Doesn’t, What’s Promising. Washington, DC: U.S. Department of Justice, National Institute of Justice.

Weisburd. 2003. “Ethical Practice and Evaluation of Interventions in Crime and Justice: The Moral Imperative for Randomized Trials.” Evaluation Review 27:336–54.

Weisburd and Gill. 2013. “Block Randomized Trials at Places: Rethinking the Limitations of Small N Experiments.” Journal of Quantitative Criminology. doi:10.1007/s10940-013-9196-z.

Weisburd, Lum, and Petrosino. 2001. “Does Research Design Affect Study Outcomes in Criminal Justice?” Annals of the American Academy of Political and Social Science 578:50–70.

Welsh, B. C., Peel, D. P. Farrington, Elffers, and A. A. Braga. 2011. “Research Design Influence on Study Outcomes in Crime and Justice: A Partial Replication with Public Area Surveillance.” Journal of Experimental Criminology 7:183–98.