Abstract
We propose that the causes of the effectiveness (or ineffectiveness) of police interventions can be better understood with an increased focus on the measurement of treatment implementation and outputs, as opposed to the more common “black box” conceptualizations of police interventions and outcome-only evaluations used in most experimental studies. We present findings from a randomized, experimental evaluation of broken windows policing at hotspots in three California cities. Our analysis suggests that variation in the treatment delivered to target street segments within and between cities limited the ability of the study to detect potential treatment impacts and was due in part to the failure of the police agencies to take ownership of the science of the intervention.
There is now ample evidence that increased police patrol directed at hotspots is more effective in reducing crime and disorder than random, undirected (routine preventive) patrol (Braga, Papachristos, & Hureau, 2014; Weisburd & Eck, 2004). However, two issues remain less clear. First, we know little about the causes of effectiveness, or conversely ineffectiveness. Aside from increased police presence (dosage and frequency) or the use of problem-solving approaches, which specific police actions are most effective in reducing crime and disorder and under what conditions? Second, what actions should police leaders take to “create and maintain the causes” of successful strategies (Sherman et al., 2014, p. 96)?
In this article, we address two obstacles that have impeded the progress being made in understanding both of these issues: (a) the research and evaluation methodologies being used to study police interventions and (b) the lack of “ownership of police science” in police agencies (Neyroud & Weisburd, 2014; Weisburd & Neyroud, 2011). We suggest that by addressing the first obstacle, police academics and researchers can better ascertain the extent to which the second may have affected the effectiveness of an evaluated strategy. In turn, an increased focus on eliminating these obstacles is necessary to increase our knowledge about both how police strategies are effective or ineffective and how police can implement interventions in ways that are likely to have an impact.
We begin by briefly discussing a common shortcoming of police evaluation research. We then present findings from a randomized, experimental evaluation of broken windows policing at hotspots (Weisburd, Hinkle, Famega, & Ready, 2011) to illustrate how the research methodology for evaluations of police interventions is important in assessing whether the police really took ownership of an intervention.
Getting Inside the “Black Box” in Policing Experiments
In 1998, Lawrence Sherman advocated for evidence-based policing: the principle that police practices should be based on scientific evidence about what works best. He proposed that more randomized experiments examining the outcomes of police work (e.g., crime rates, repeat victimization, etc.) were needed, as well as ongoing outcomes research about the results achieved by applying (or ignoring) basic research in practice (Sherman, 1998). Today, randomized experimental designs are generally accepted as the best method for generating valid evidence about the effectiveness of criminal justice programs, practices, and strategies (Weisburd & Hinkle, 2012). The statistical benefit of randomly assigning subjects or places to treatment and control groups is that it removes the threat of confounding: apart from the intervention, everything affecting the outcomes of interest should vary randomly across the two groups (Boruch, McSweeny, & Soderstrom, 1978; Campbell & Stanley, 1963; Riecken et al., 1974; Weisburd, 2003).
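As a minimal illustration of the mechanism, the sketch below assigns places to treatment and control by shuffling them with a seeded random number generator. It is not presented as the assignment procedure used in the study discussed here; the function name `assign_places`, the segment IDs, and the seed are hypothetical. Because group membership depends only on the shuffle, no characteristic of a segment can influence which group it joins.

```python
import random

def assign_places(place_ids, seed=None):
    """Randomly split places into treatment and control by shuffling.

    With an odd number of places, the control group receives the extra one.
    """
    rng = random.Random(seed)          # seeded so the assignment is reproducible
    ids = list(place_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"treatment": sorted(ids[:half]), "control": sorted(ids[half:])}

# Hypothetical segment IDs; 110 mirrors the study's sample size only.
segments = [f"SEG-{i:03d}" for i in range(1, 111)]
groups = assign_places(segments, seed=2008)
print(len(groups["treatment"]), len(groups["control"]))  # 55 55
```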
However, a challenge for both experimental and non-experimental designs is ensuring that the treatment is delivered as designed in terms of dosage (e.g., total time of police presence), frequency (e.g., number of times police come and go), and outputs (the achievement of specific treatment activities; see Sherman et al., 2014). Clearly, police failure to take ownership of the science of an intervention poses a significant threat to the internal validity of evaluations. This may manifest through implementation failure or treatment decay. Key personnel may be shifted, or not shifted as anticipated, and personnel involved in treatment delivery may resist or lose interest in the intervention (Copes & Vieraitis, 2005). A more complex threat is treatment variability across subjects or areas (e.g., beats, hotspots, etc.). Police interventions frequently prescribe a combination of multiple tactics and activities, and the personnel administering the treatment may employ these tactics to varying degrees across areas, or not at all in some areas. The potential for treatment variability increases when multiple sites (e.g., more than one police department) are used to evaluate an intervention. Other fields have an extensive literature recognizing the critical importance of assessing, monitoring, and enhancing treatment fidelity (e.g., health behavior intervention research; see Bellg et al., 2004), and have even developed a measure to evaluate treatment fidelity (for health behavior interventions, see Borrelli et al., 2005). Despite this, treatment fidelity has been somewhat overlooked in policing research.
Treatment decay and variability are difficult to adjust for statistically, even in experimental evaluations, as using multivariate models to account for variation in treatment reintroduces the possibility of omitting confounding variables and thus forfeits the benefits gained from randomization. Such models can be used as sensitivity analyses, as will be done in the current study (see also Hinkle, Weisburd, Famega, & Ready, 2014), but much caution is needed if attempting to inform policy from such analyses. In the best-case scenario, a watered-down intervention will still have some impact on the outcome measures and suggest the strategy is promising. But if no treatment effect is found, little is gained if there is no way to determine whether this is because the designed treatment was ineffective (theory failure) or because the treatment was not implemented as designed (implementation failure; Lipsey, Petrie, Weisburd, & Gottfredson, 2006). Therefore, while randomized experiments are important for examining the impact of police interventions on outcomes, evidence-based policing also requires evaluation of treatment implementation and outputs in all research designs to avoid erroneously concluding that a strategy did not work (Weisburd, Lum, & Yang, 2003). Despite this, researchers evaluating police interventions have frequently conceptualized the intervention as a “black box,” with little examination of exactly what the police treatment entailed (Lipsey et al., 2006) or the outputs that were delivered.
Even randomized policing experiments with perfectly implemented interventions can only tell us whether the overall strategy had an impact on the outcome measures when the treatment is conceptualized as a black box in analyses. As others have noted, criminology has largely moved past wanting to know only whether a strategy works in reducing crime (Pawson & Tilley, 1997; Rosenbaum, 1988); the more productive focus is on how interventions do or do not work. That is, research needs to focus on the actions police actually took, rather than assuming that the intervention went as planned and fit the style of policing being evaluated. This can be a complex question to address, as innovations in policing over the last several decades (e.g., community, problem-oriented, broken windows, zero-tolerance, and pulling levers policing) have not been simple, easily defined strategies with straightforward instructions for implementation. On the contrary, several have eluded concise, consistent definitions and are more accurately described as philosophies than strategies; as a result, many different types of police activities have been evaluated under the same black box umbrellas.
Whether necessitated by vague treatment design, lack of output measures, or manuscript word limits, the analyses presented are often simple models with variables for treatment, any blocking factors (for block-randomized studies) and crime or disorder as the outcome. Although methodologically appropriate (since nothing else needs to be controlled for in randomized experiments as outlined earlier), with complex innovations that prescribe multiple police actions, it is difficult to assess how the outcomes were produced (see Boruch, 2010; Lipsey et al., 2006) without data on the outputs delivered. This in turn makes it difficult for police leaders to assess what actions need to be taken to create and maintain the causes of successful strategies. External validity is also limited when treatment is conceptualized as a black box in design or analyses. The generalizability of results cannot be reasonably ascertained without detailed explanations of treatment and measures of outputs. Such a description is essential if a program is to be replicated at other sites or implemented more broadly (Lipsey et al., 2006).
The Current Study
The current study uses data from an experimental evaluation of broken windows policing at hotspots (Weisburd et al., 2011, 2012) to illustrate the importance of examining detailed data on treatment implementation and outputs. The experiment was conducted for approximately 6 months in three southern California cities, all located between 55 and 65 miles east of Los Angeles. Based on population estimates from the 2008 Uniform Crime Report, City A is the largest of the cities, with an estimated population of 172,543 over 50 square miles. At the time of the experiment, its police department employed 218 sworn officers. City B’s population of 70,730 is spread over 36 square miles, with a police force of 78 sworn officers. The smallest of the cities, City C, covers 16 square miles with a population of 51,177 and a police force of 74 sworn officers. According to the 2010 Census, the majority of residents in both City A and City C are Hispanic or Latino (69% and 71%, respectively), while City B’s population is only 30% Hispanic or Latino (73% White). Median household incomes for 2008 to 2012 varied considerably across the three cities: $41,496 in City C, $54,994 in City A, and $66,901 in City B (United States Census Bureau, 2010).
Selection of Random Assignment Hotspots
Preintervention Disorder Calls for Service on Study Street Segments.
Treatment
A central concern was to develop a police intervention that was “true” to the notion of broken windows policing as developed by Wilson and Kelling in 1982. They suggested that broken windows policing should prevent crime through the following mechanism: police-led reductions in disorder should reduce fear of crime and community withdrawal among residents and, in time, help restore community controls. This should leave the community less vulnerable to crime, as an orderly community is less inviting to criminals than a disorderly area, where offenders may perceive that their chances of arrest are lower. Conceptualizing a police intervention true to the broken windows thesis was more difficult than for many other policing strategies, as Wilson and Kelling never provided detailed guidelines regarding what broken windows policing should look like in practice. Past studies of broken windows policing have used misdemeanor arrests (see, e.g., Kelling & Sousa, 2001) or misdemeanor convictions (see Worrall, 2002) as proxies for police activity in relation to the broken windows theory. However, Wilson and Kelling (1982) devoted considerable attention to the argument that police must work with people in the community to negotiate consensus on behavior, and they did not suggest issuing citations and making arrests for every law violation as the means to achieve that result. In particular, Wilson and Kelling suggested that police should talk to people they find engaged in activities such as public drinking about why the behavior is not allowed, and rely on warnings (or ask individuals to confine the behavior to a back alley out of sight, etc.) rather than on law enforcement for every minor crime or disorder they encounter.
In a later work with Catherine Coles, Kelling further illustrated that his conception of broken windows policing was not equivalent to zero tolerance policing, stating that, in fact, the ideas presented in “Broken Windows” were antithetical to the use of “streetsweeping” tactics targeted on “undesirables”; rather, they advocated close collaboration between the police and citizens, including street people, in the development of neighborhood standards. Moreover, neighborhood rules were to be enforced for the most part through non-arrest approaches—education, persuasion, counseling, and ordering—so that arrest would only be resorted to when other approaches failed. (Kelling & Coles, 1996, pp. 22–23)
Dosage and Frequency
The research and police intervention teams agreed upon a treatment dosage of 180 minutes of broken windows style policing at each of the 55 treatment segments for each week of the approximately 30-week experiment. The research team emphasized in the initial training session, and in multiple refresher trainings conducted during the intervention period, that officers should make an effort to visit each segment multiple times each week, for short periods of time, and especially at times of the day or night when the likelihood of observing social disorder would be higher.
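The agreed dosage translates into a substantial weekly commitment across the treatment segments. The short calculation below derives it from the figures in the text, treating the experiment as exactly 30 weeks (an assumption, since the text says approximately 30). The last lines show why dosage alone does not fix frequency: the same 180 minutes can arrive as a few long visits or many short ones.

```python
# Planned figures taken from the text; the aggregates below are derived
# arithmetic, not reported study results.
MINUTES_PER_SEGMENT_PER_WEEK = 180
TREATMENT_SEGMENTS = 55
WEEKS = 30  # assumed exact; the text says "approximately 30"

weekly_minutes = MINUTES_PER_SEGMENT_PER_WEEK * TREATMENT_SEGMENTS
weekly_hours = weekly_minutes / 60
planned_total_hours = weekly_hours * WEEKS
print(weekly_minutes, weekly_hours, planned_total_hours)  # 9900 165.0 4950.0

# The same weekly dosage per segment, split into visits of different lengths.
for visit_length in (15, 60):
    visits = MINUTES_PER_SEGMENT_PER_WEEK // visit_length
    print(f"{visits} visits of {visit_length} min")  # 12 visits of 15 min, then 3 visits of 60 min
```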
To monitor the dosage, frequency, and nature of the treatment being delivered, officers were instructed to complete a log form 3 for each visit to a treatment segment, regardless of how long they were present on the street segment, and even if they did not observe any disorders or take any actions beyond police presence. The log forms recorded the following: the date of the visit, name of the street segment, city, officer ID number(s), the time of officer arrival and departure from the segment, social and physical disorders observed by the officer, and actions taken by the officer. Officers in two-officer units were required to submit only one log form listing both officer IDs, to prevent duplication and artificial inflation of the intervention dosage. The log forms were scheduled to be picked up from the departments on a biweekly basis, and the data were subsequently entered into a database so that the research team could provide departments with regular summaries of dosage levels for each segment.
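A structured record makes these log-form fields easy to check. The sketch below uses a hypothetical schema (the class name, field names, street names, and date and time formats are assumptions, not the study's database). It computes minutes present from arrival and departure times, treating a departure earlier than arrival as a visit that ran past midnight, and flags forms that share a date, segment, and times, which may indicate a two-officer unit that filed twice.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class VisitLog:
    """One log form = one visit to one treatment segment (hypothetical schema)."""
    date: str           # assumed "YYYY-MM-DD"
    segment: str
    city: str
    officer_ids: tuple  # both IDs for a two-officer unit, on a single form
    arrival: str        # assumed 24-hour "HH:MM"
    departure: str

    def minutes_present(self) -> int:
        fmt = "%Y-%m-%d %H:%M"
        start = datetime.strptime(f"{self.date} {self.arrival}", fmt)
        end = datetime.strptime(f"{self.date} {self.departure}", fmt)
        if end < start:  # assume the visit ran past midnight
            end += timedelta(days=1)
        return int((end - start).total_seconds() // 60)

def possible_duplicates(logs):
    """Return groups of forms sharing date, segment, and times (to be reviewed)."""
    groups = defaultdict(list)
    for log in logs:
        groups[(log.date, log.segment, log.arrival, log.departure)].append(log)
    return [g for g in groups.values() if len(g) > 1]

# Hypothetical forms: the last two look like one unit filing separately.
logs = [
    VisitLog("2008-06-02", "Main St 100 blk", "City B", ("B17", "B22"), "21:40", "21:55"),
    VisitLog("2008-06-02", "Oak Ave 300 blk", "City B", ("B17",), "23:50", "00:10"),
    VisitLog("2008-06-02", "Oak Ave 300 blk", "City B", ("B22",), "23:50", "00:10"),
]
print([log.minutes_present() for log in logs])  # [15, 20, 20]
print(len(possible_duplicates(logs)))           # 1
```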
Results
It is important to note that the original study (Weisburd et al., 2011, 2012) was designed to test the impact of broken windows policing at hotspots on individual-level outcomes such as fear of crime, police legitimacy, and perceptions of collective efficacy, crime, and social and physical disorder. Although statistical power for these outcomes was very high, 4 the results indicated that the treatment did not have a significant effect on them. The lone exception was perceptions of physical disorder, which modestly increased in the treatment areas. Examining the impact of the intervention on the street-segment level outcomes of crime and disorder was not a specific goal of the original study design. The mean change in disorder calls for service (CFS) from pre- to postintervention showed declines in both the treatment and control segments, 5 but the ANOVA results (presented in the technical report by Weisburd et al., 2012) did not indicate a statistically significant impact of the intervention on disorder CFS. A follow-up study (Hinkle et al., 2014) supported the argument that this was due to a lack of statistical power for street-level outcomes, given the relatively low base rates of disorder at the study segments (see Table 1). 6
As discussed in Hinkle et al. (2014), low base rates and limited statistical power make it challenging to detect treatment effects when evaluating microplace police interventions in smaller cities. Although 110 hotspots is not a small sample, it took three cities to yield enough street segments with 10 or more disorder CFS and 3 or more Uniform Crime Reporting Part 1 crime CFS in the year before the experiment began. If additional cities could have been recruited for the experiment, there would have been more hotspot street segments, and likely a higher preintervention base rate of calls for service, yielding more power. However, multisite evaluations also create additional challenges for researchers once studies are underway, particularly in treatment implementation and variability across sites, which may also wash out potential treatment impacts. 7 Below, we examine treatment implementation and output measures to assess the extent to which police created and maintained the conditions prescribed, and whether they took ownership of the experiment.
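The eligibility rule can be written as a simple filter over prior-year call counts. Only the two thresholds come from the text; the `Segment` record, its field names, and the sample segments are hypothetical. Counting eligible segments per city makes the point concrete: the thresholds, not a sample-size target, determine how many hotspots a given city can contribute.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Segment:
    """Prior-year call counts for one street segment (hypothetical record)."""
    segment_id: str
    city: str
    disorder_cfs: int  # disorder calls for service
    part1_cfs: int     # UCR Part 1 crime calls for service

def eligible(segments, min_disorder=10, min_part1=3):
    """Keep segments meeting BOTH preintervention thresholds."""
    return [s for s in segments
            if s.disorder_cfs >= min_disorder and s.part1_cfs >= min_part1]

candidates = [
    Segment("A-101", "City A", 14, 5),
    Segment("A-102", "City A", 9, 7),   # too few disorder calls
    Segment("B-201", "City B", 12, 2),  # too few Part 1 calls
    Segment("C-301", "City C", 10, 3),  # exactly at both thresholds
]
hotspots = eligible(candidates)
print([s.segment_id for s in hotspots])  # ['A-101', 'C-301']
print(Counter(s.city for s in hotspots))
```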
Examining the Delivered Dosage and Frequency of Visits and Activity
The log forms submitted by officers for each visit to treatment segments were used to calculate the treatment dosage and frequency variables, and the results are discussed here. 8
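Once visit durations are available, the dosage and frequency measures are simple aggregations. The sketch below assumes, hypothetically, that each form has already been reduced to a (segment, week, minutes) tuple; the function name and the sample data are illustrative. The example shows two segments that both meet the 180-minute target in a week, one through three hour-long visits and one through twelve 15-minute visits, which is why dosage and frequency need to be reported separately.

```python
from collections import defaultdict

TARGET_MINUTES = 180  # designed weekly dosage per treatment segment

def weekly_summary(visits):
    """visits: iterable of (segment, week, minutes) tuples, one per log form.

    Returns {(segment, week): row} with dosage (total minutes), frequency
    (number of visits), average minutes per visit, and whether the target was met.
    Segment-weeks with no forms do not appear and should be read as zero dosage.
    """
    cells = defaultdict(list)
    for segment, week, minutes in visits:
        cells[(segment, week)].append(minutes)
    return {
        key: {
            "dosage": sum(mins),
            "frequency": len(mins),
            "avg_minutes": sum(mins) / len(mins),
            "met_target": sum(mins) >= TARGET_MINUTES,
        }
        for key, mins in cells.items()
    }

# Hypothetical week: both segments reach 180 minutes, delivered very differently.
visits = [("S1", 1, 60)] * 3 + [("S2", 1, 15)] * 12
for key, row in sorted(weekly_summary(visits).items()):
    print(key, row)
```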
Figure 1 illustrates that in two of the three cities, the designed dosage of 180 minutes of broken windows style policing per treatment segment per week was met or exceeded through most of the intervention period, although all three cities initially had problems meeting the dosage goal. This was due to the police agencies underestimating the number of officers that would need to be assigned to the experiment and failing to allocate the agreed-upon resources. During the planning stage of the study, each of the three agencies agreed to form a Broken Windows Policing Unit composed of 8 to 12 officers, and to assign a supervisor responsible for overseeing members of the unit and managing their operations. City B did not achieve the minimum agreed-upon staffing until Week 4, City C until Week 8, and City A until Week 9. Although the designed dosage was collectively met or exceeded after the initial weeks, only City B was able to maintain (and usually exceed) the minimum personnel allocation throughout the rest of the intervention period, and even City B had to supplement the unit officers with patrol officers to achieve this.
Implementation of Police Intervention.
Dosage—Number of Hours of Police Presence on Treatment Segments.
F = 168.222, p < .001.
Multiple indicators suggest that neither police supervisors nor officers in City A took ownership of the experiment. During Weeks 5 to 7, there was almost complete treatment decay, with either one officer or no officers delivering treatment. During Weeks 9 to 17, the department assigned patrol officers to the experiment to supplement the original unit officers and increase dosage. A member of the research team conducted four additional training sessions during patrol shift briefings. 9 During this 9-week period, the number of officers participating in the experiment ranged from 16 to 27, yet the total dosage at treatment hotspots was lower than during weeks when as few as six officers were participating in the experiment. Due to an internal dispute over resources, by Week 18 all officers except two of the original unit officers and five patrol officers had been removed from the experiment.
Frequency—Number of Police Visits to Treatment Segments.
F = 309.57, p < .001.
Dosage—Average Minutes per Police Visit to Treatment Segments.
F = 97.323, p < .001.
The average minutes per police visit to treatment segments are displayed in Table 4. Repeated training sessions stressed that officers should make an effort to visit each segment multiple times each week, for short periods of time, yet in City A, officers spent on average 54 minutes per visit to a treatment segment. This is much longer than what both prior research and common sense would suggest is effective in terms of the chance of observing social disorder, as activity on a street cools down upon the arrival of a squad car and remains subdued for a good while after it leaves. Koper (1995) found that the ideal dosage for police presence to deter disorderly and criminal behavior was 10 to 15 minutes, and that longer presences had diminishing effects (see also Telep, Mitchell, & Weisburd, 2014). With these data alone, it is not clear whether there was a failure to implement treatment as designed or whether visits were long because officers had several disorder problems to address during them; this is why measures of outputs are also important (and are discussed later).
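An average can hide very different visit patterns. The sketch below summarizes a list of visit durations against the upper end of the 10 to 15 minute range attributed to Koper (1995); the function name and the sample durations are hypothetical, not study data. In the example, two long visits pull the mean to nearly 50 minutes even though the median visit lasts 15, so the distribution of visit lengths says more about adherence to the design than the mean alone.

```python
from statistics import mean, median

KOPER_MAX_MINUTES = 15  # upper end of the 10-15 minute range cited in the text

def duration_profile(minutes_per_visit):
    """Mean, median, and share of visits longer than the Koper upper bound."""
    n = len(minutes_per_visit)
    over = sum(1 for m in minutes_per_visit if m > KOPER_MAX_MINUTES)
    return {
        "mean": mean(minutes_per_visit),
        "median": median(minutes_per_visit),
        "share_over_15": over / n,
    }

# Hypothetical durations (minutes) for five visits to one segment.
print(duration_profile([10, 12, 15, 120, 90]))
# {'mean': 49.4, 'median': 15, 'share_over_15': 0.4}
```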
Examining the Outputs
Data from the officer log forms were also used to examine the nature of the treatment that was delivered. Officers were instructed to record both the physical and social disorders they observed during each visit to a street segment, and to indicate on the log form the number of specific actions they took to address the disorders. The data collected on physical disorders are not discussed here; rather, we focus on the treatment of social disorders, as police had a wider array of options for dealing with social problems. Log forms included a list of 16 social disorders 10 along with an “other” category, and also provided a checklist of actions the officers took to address these problems. 11 Officers were asked to indicate the number of times each action was taken to address a specific type of disorder (e.g., an officer may have conducted two stop-and-frisks (N = 2), made an arrest (N = 1), and issued a citation (N = 1) in the course of addressing one type of disorder). However, officers did not record the number of incidents of each type of disorder during a visit. For example, officers in City C observed traffic disorder(s) during 427 visits to treatment segments, yet they recorded talking to 649 “suspects” in the process of addressing traffic disorders (conducting traffic stops; see Appendix 1). Therefore, multiple incidents of traffic disorder were observed during some of those 427 visits.
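The counting problem described above can be seen in a small tally. The sketch below assumes, hypothetically, that each log form is stored as a mapping from disorder type to action counts (the keys and sample forms are illustrative). It counts the visits in which each disorder was recorded and sums the actions taken against it; a ratio of actions to visits above 1 shows that visits, not incidents, are the unit the forms capture, so the number of incidents per visit cannot be recovered exactly.

```python
from collections import Counter, defaultdict

def tally(forms):
    """forms: one dict per log form, mapping disorder type -> {action: count}.

    Returns (visits in which each disorder was recorded, total actions per disorder).
    """
    visits_with = Counter()
    actions = defaultdict(Counter)
    for form in forms:
        for disorder, acts in form.items():
            visits_with[disorder] += 1
            actions[disorder].update(acts)  # Counter.update adds the counts
    return visits_with, actions

# Hypothetical forms: three visits with traffic disorder, five suspects contacted.
forms = [
    {"traffic": {"talked_to_suspect": 2}},
    {"traffic": {"talked_to_suspect": 1, "citation": 1}},
    {"traffic": {"talked_to_suspect": 2}, "public_drinking": {"warning": 1}},
]
visits_with, actions = tally(forms)
print(visits_with["traffic"], actions["traffic"]["talked_to_suspect"])  # 3 5

# City C figures from the text: suspects contacted per visit with traffic disorder.
print(round(649 / 427, 2))  # 1.52
```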
Overall Activity on Treatment Street Segments by City.
Five Most Prevalent Social Disorders Addressed by City.
Five Most Frequent Actions Taken by Police by City.
Outcome Measures
To explore the possibility that variation in treatment dosage and outputs (the specific actions taken by police to address disorder) within and across cities may have masked the ability of the study to detect treatment impacts, and keeping in mind the low-base rates of crime and disorder CFS on the street segments, we used Poisson regression 13 to examine the effects of treatment variation on postintervention disorder CFS for the 30 segments with the highest preintervention CFS.14,15 We used two measures to examine this: the total number of hours of police presence on each segment and the total number of actions taken (by police) at each segment. 16 This allows us to look at two ways the treatment varied across segments and whether these variations potentially affected the ability of the experimental analyses to detect impacts on disorder. Moreover, it allows for examining whether any potential impact was actually due to the mechanisms of broken windows policing (the specific police actions taken to reduce disorder, see Weisburd, Hinkle, Braga, & Wooditch, 2015), or another mechanism such as deterrence from simply more police presence on the segments. It is important to note at the outset that this analysis cannot be used to argue the effectiveness or ineffectiveness of the treatment. As noted earlier, performing simple observational analyses on experimental data loses all benefits of randomization and cannot rule out potential confounding factors. This is simply an exploratory approach to further examine whether variation in treatment dosage and activity may have further limited the ability of the original study to detect a treatment effect in street segment level analyses of disorder and crime CFS.
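The mechanics of a Poisson model can be sketched without a statistics package. The hypothetical function `poisson_fit` below maximizes the Poisson log-likelihood for log E[y] = b0 + b1*x by Newton-Raphson and returns the slope's standard error from the information matrix. It uses a single predictor only; any additional covariates in the study's models are not specified in this section, and the hours and counts in the usage lines are made up.

```python
import math

def poisson_fit(x, y, iters=50, tol=1e-10):
    """Fit log E[y] = b0 + b1*x by Newton-Raphson; return (b0, b1, se_b1)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step = inverse(information) @ score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (-h01 * g0 + h00 * g1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    se_b1 = math.sqrt(h00 / det)  # from the information matrix at the last iteration
    return b0, b1, se_b1

# Hypothetical segments: hours of police presence and postintervention disorder CFS.
hours = [20, 35, 50, 60, 80, 95]
post_cfs = [14, 12, 11, 9, 8, 6]
b0, b1, se = poisson_fit(hours, post_cfs)
print(f"B = {b1:.4f} (SE {se:.4f}); % per hour = {100 * (math.exp(b1) - 1):.2f}")
```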
Poisson Regression: Impact of Total Number of Hours of Police Presence on Disorder CFS.
% = percent change in expected count for unit increase in X; % Std X = percent change in expected count for SD increase in X; SD of X = standard deviation of X; N = 30; Pseudo R² = .358.
*p ≤ .05. **p ≤ .01. ***p ≤ .001.
Poisson Regression: Impact of Total Number of Actions Taken to Address Disorder on Disorder CFS.
% = percent change in expected count for unit increase in X; % Std X = percent change in expected count for SD increase in X; SD of X = standard deviation of X; N = 30; Pseudo R² = .327.
*p ≤ .05. **p ≤ .01. ***p ≤ .001.
These findings show that the total number of actions police took against social disorder was significantly associated with postintervention disorder CFS (B = −0.007, p = .002). This suggests the intervention was more likely to have an impact on segments where police took more specific actions to address social disorder. However, the effect size is again modest, with each additional action associated with an expected 0.7% decline in disorder CFS. Nonetheless, the findings again support the argument that the variability in treatment across the study street segments and cities could account in part for the failure of the experimental analyses to find impacts on disorder. As with the time measure, these results should not be interpreted as evidence of treatment impact, but simply as evidence that treatment variation may have affected the ability of the original study to test impacts on street-segment disorder or crime outcomes.
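The percent-change figures in the table notes follow from exponentiating the coefficient. The helper below (its name is hypothetical) reproduces the 0.7% figure from B = −0.007; passing the standard deviation of X as `delta_x` gives the "% Std X" column. The SD of 100 actions in the last line is an arbitrary illustration, not the study's value.

```python
import math

def pct_change(b, delta_x=1.0):
    """Percent change in expected count for a delta_x increase in X (log link)."""
    return 100.0 * (math.exp(b * delta_x) - 1.0)

B_ACTIONS = -0.007  # coefficient reported in the text for total actions taken

print(round(pct_change(B_ACTIONS), 1))         # -0.7  (per additional action)
print(round(pct_change(B_ACTIONS, 100.0), 1))  # -50.3 (hypothetical SD of 100 actions)
```

For small coefficients the percent change is close to 100 times B, which is why −0.007 reads directly as roughly a 0.7% decline.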
Discussion
The overall broken windows policing intervention did not have a significant impact on crime and disorder calls for service in the experimental findings (Weisburd et al., 2012). This is likely due in part to the research design, which focused on citizens’ perceptions rather than on crime and disorder outcomes. Indeed, with the low base rates of crime and disorder CFS, statistical power for examining impacts on these outcomes was quite low. Specifically, the intervention needed to produce crime declines of over 50% to reach the .80 power threshold (see Hinkle et al., 2014). This of course reinforces the importance of careful statistical power analyses before a study begins. This is not a criticism of the original study, as its authors noted at the outset the modest statistical power for examining these outcomes (in contrast to the main outcomes of the study). But these analyses suggest that traditional estimates of statistical power would in any event have been overstated.
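To make the link between base rates, treatment variability, and power concrete, the sketch below estimates power by simulation. It is a simplified illustration under stated assumptions, not a reproduction of the Hinkle et al. (2014) analysis: calls for service are drawn as Poisson counts with a hypothetical base rate, the test is a Wald test on the log ratio of treatment to control totals, and the hypothetical parameter `share_treated` represents the fraction of treatment segments that actually received the intended treatment. Real CFS counts are usually overdispersed, so a Poisson simulation like this one will tend to overstate power.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulated_power(base_rate, decline, n_per_group=55, share_treated=1.0,
                    sims=400, seed=1):
    """Share of simulated experiments in which a two-sided Wald test on the
    log ratio of treatment to control CFS totals rejects at alpha = .05.

    base_rate     -- hypothetical mean CFS per segment under control conditions
    decline       -- proportional reduction on effectively treated segments
    share_treated -- hypothetical fraction of treatment segments that actually
                     received the intended treatment (the rest behave like controls)
    """
    rng = random.Random(seed)
    n_effective = round(n_per_group * share_treated)
    rejections = 0
    for _ in range(sims):
        control = sum(rpois(base_rate, rng) for _ in range(n_per_group))
        treated = sum(rpois(base_rate * (1 - decline), rng) for _ in range(n_effective))
        treated += sum(rpois(base_rate, rng) for _ in range(n_per_group - n_effective))
        if control == 0 or treated == 0:
            continue  # log ratio undefined; counted as a non-rejection
        z = math.log(treated / control) / math.sqrt(1 / treated + 1 / control)
        if abs(z) > 1.96:
            rejections += 1
    return rejections / sims

# Same hypothetical effect, full versus diluted delivery.
for share in (1.0, 0.5):
    print(share, simulated_power(base_rate=2.0, decline=0.3, share_treated=share))
```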
Our analyses of dosage, frequency, and output measures reveal that in all three cities, police presence and visits to hotspots increased during the experimental period (arguably beyond the level these locations would generally receive), and outputs consistent with our operationalization of broken windows policing were delivered. In all three cities, informal actions like talking to suspects or issuing warnings were more common police actions than citations or arrests. However, these data also show variation in treatment between sites, and suggest that, to varying degrees, police failed to take ownership of the science of the experiment (Neyroud & Weisburd, 2014; Weisburd & Neyroud, 2011). In one city, there was both a period of treatment decay and much weaker implementation of all aspects of treatment overall. In all three cities, there was deviation from the treatment design in the desired frequency of visits to hotspots. After the experiment concluded, the research team conducted focus groups with officers who had participated in the experiment. Their responses to several questions provide additional evidence of a lack of ownership.
When asked, “How was Broken Windows Policing carried out in the field? What happened during a typical shift?” officers from all departments made statements suggesting that the deviation from the designed frequency of visits to hotspots resulted from their commanding officers being more concerned with simply meeting the treatment dosage of 180 minutes per segment per week than with adhering to the way in which this dosage was designed to be delivered. Some supervisors “scheduled” officers to spend time at hotspots:
The way we had it set up was you just go to a segment for an hour block. They modified my patrol . . . working half of my shift on normal patrol and half of it I was assigned to segments for hour blocks. (City C officer)
We never sat a solid hour. I know other units in the police department did, like Sgt [name removed] and his unit—he made them sit there for an hour. We didn’t do that. I know we were supposed to, but we just couldn’t. (City B officer on night shift, emphasis added)
When I was there in a black and white police car there was actually no criminal activity going on the entire time I was there. It’s obvious, if a police car is going to be there for two hours or an hour nothing is going to be happening in that particular hundred block. (City A officer, emphasis added)
I think the feeling is that you could do in 10 minutes what some of us wasted an hour in doing . . . You could take a car, put a dummy in it and sit it on the block and get the same results. (City A officer)
A lot of us don’t think we collected that data how you think we should have. And I’ll be frank, we use a term called “pencil f*ck” and it means if you asked for six hours I may have actually put in four and a quarter. (Participating officer, department withheld)
I’ll give an example, we go on (specific treatment segment). I go up there, and you know, with just your presence there people really aren’t doing a whole lot. If you want to find something you really got to search for it. If nothing was going on when I got there I’d pretty much just do traffic stops or ped [sic] checks. (City C officer)
Arguably the most significant challenge facing evaluation researchers in policing is getting the agencies to take ownership of the science necessary to produce valid evidence on program effectiveness. When police fail to take ownership, implementation tends to be shallow at best. This clearly creates a challenging environment for producing reliable evidence on what works in policing. Over the last few decades, evaluations of police agencies’ attempts to implement new innovations have revealed that police agencies are notoriously resistant to change (Weisburd & Braga, 2006). Police have limited resources and are unlikely to fully deliver programs in which they see little utility (Sherman et al., 2014). A strategy that is of interest to police scholars may nonetheless be a hard sell to police leaders and the line officers who are asked to carry it out in the field. Yet, only recently has a discussion commenced about the need for police agencies to step up their use and ownership of science in order for policing to become an arena of evidence-based policies (Neyroud & Weisburd, 2014; Weisburd & Neyroud, 2011).
Many of the changes that are necessary for this to succeed must be borne by police agencies (i.e., changes in police education and training, leadership priorities, and measures of agency success). All of these need to reflect that police take science seriously (Neyroud & Weisburd, 2014; Weisburd & Neyroud, 2011), and until they do, it is unlikely that evidence-based policing will become the norm rather than the exception in police agencies across the country. However, police scholars and researchers have also contributed to the current state of affairs. Police research has been instrumental in the tremendous progress made over the last 40 years in our understanding of what works. But its emphasis on outcomes has also restricted our understanding of how innovations work (see Pawson & Tilley, 1997, 2009; Weisburd et al., 2015). As Weisburd and Neyroud (2011) have noted, “most police practices are not systematically evaluated, and we still know too little about what works and under what conditions in policing” (National Research Council, 2004; Weisburd & Eck, 2004). There have also been few efforts to develop management strategies to guide police in implementing and sustaining innovations in ways that will have impact (although see Sherman et al., 2014).
Several decades ago, Goldstein (1979) proposed that police had become susceptible to “the ‘means over ends’ syndrome, placing more emphasis in their improvement efforts on organization and operating methods than on the substantive outcome of their work” (p. 236). In the quest to build knowledge on the causes of effectiveness, it is important that policing scholars not succumb to an “ends-over-means” syndrome, in which more emphasis is placed on examining changes in outcome measures than on improving research design, evaluation, and the dissemination of findings, since such a syndrome would limit the substantive outcomes of their own work. If evaluation research is not directly designed to assess the extent to which police created and maintained the conditions prescribed, then regardless of whether an intervention is found to be effective, we are no closer to identifying the causes of effectiveness or ineffectiveness.
In order for police researchers to make progress here, we propose that first, research and evaluation methodologies for police innovations must be better able to address whether interventions that do not produce expected outcomes are due to research design, implementation failure or variability and police lack of ownership, or treatment ineffectiveness. The use of randomized controlled trials in policing evaluations is a good start, but there are still implementation issues that need to be addressed. Intensive efforts in measuring and monitoring interventions throughout study periods, and regular meetings with the involved agencies are necessary to achieve and maintain proper treatment. This is especially important and challenging in multisite evaluations. Second, research and evaluation methodologies need to focus more on identifying the specific police actions or outputs of treatments that are effective, so that police agencies can make more efficient use of their limited resources and implement innovations in ways that are most likely to have impact. Third, researchers must follow the advice of Weisburd and Neyroud (2011; see also, Neyroud & Weisburd, 2014) and pursue and design studies that have the best chances of getting the involved agencies invested so that they take ownership of the science.
Conclusion
City A: Social Disorders and Actions Taken
City B: Social Disorders and Actions Taken
City C: Social Disorders and Actions Taken
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported in part by National Institute of Justice (grant number 2007-IJ-CX-0047).
