Abstract
A cornerstone of broken windows theory concerns public perceptions of disorder and crime, and citizens’ “panic response” to the onset of disorder in their neighborhoods. Assuming that this dynamic exists lends support to a significant expansion of police operations from traditional crime control to order maintenance. More specifically, advocates of the theory presume that citizens view disorder and crime as two distinct constructs, and further that the former triggers the latter. Broken windows theory was quite popular during the 1990s and early 2000s, a period in which order maintenance or quality of life policing reached its apex of popularity. Findings from recent studies on public perceptions of disorder and crime, however, have called into serious question the cognitive distinction between crime and disorder. Using data collected from a random telephone survey of residents of the Houston metropolitan area, we follow this line of research and test the hypothesized dynamics underlying broken windows theory. Our principal findings suggest that neither a one-factor model (convergent) nor a two-factor model (discriminant) fits the empirical data when an appropriate concept validation process is carried out. Implications are drawn for broken windows theory, and some specific recommendations for future research are made at the end of the study.
Introduction
A cornerstone of broken windows theory concerns citizens’ perceptions of their environments and how they respond to disorder and crime taking place in their own neighborhoods. Disorder and crime occurrences comprise the two essential anchors of the theory, and the starting point is the public perception of a marked increase in disorder, particularly social disorder (Skogan, 2008). Assuming that disorder does give rise to fear for personal safety and leads to crime in turn, there is legitimate cause for the police to expand their scope of operations from conventional crime control to order maintenance/quality of life policing aimed at “fixing” broken windows to prevent these negative dynamics from taking hold in the first place. Since the early 1980s, police departments across the nation have adopted a variety of programs such as zero tolerance policing and quality of life policing in efforts to monitor and control disorder (Greene, 2000).
The concept of disorder is absolutely fundamental to broken windows theory. It is argued that signs of disorder, either of a social or physical nature, can set off the chain of events that lead to higher crime rates in urban neighborhoods (Kubrin, 2008, p. 204). Since social disorder is typically measured by the rate of occurrence of minor offenses of the law, Wilson and Kelling (1982) used misdemeanant offenses as an indicator of social disorder and used felony crimes to represent crime. They argued that disorder and crime reflect two distinct, separate concepts in the cognitions of citizens (also see, Kelling & Coles, 1996; Kelling & Sousa, 2001). Broken windows theory was widely embraced among law enforcement professionals during the late 1980s and the 1990s (Bratton, 1998; Maple, 2000).
In more recent years, however, the hypothesized relationship between disorder and crime has been seriously challenged by scholars. For example, after reanalyzing Skogan’s (1990) data, Harcourt (2001) found that the reported effect of disorder on crime in the Skogan study was largely due to the influence of a single outlier neighborhood. When that one neighborhood was removed from the analysis, the disorder-crime link evaporated (also see, Harcourt & Ludwig, 2006). Similarly, based on findings from the videotaping and systematic rating of visible physical features of 23,000 street segments in Chicago, Sampson and Raudenbush (1999, p. 637) noted that after controlling for these built environment neighborhood characteristics the relationship between disorder (misdemeanant offenses) and crime (felonies) “vanished.”
Gau and Pratt (2008, p. 170) insightfully noted that “one simple question that goes to the very core of broken windows theory and policing, which seems to have been lost in the shuffle of academic and practitioner debate, is whether citizens perceive disorder and crime as two separate phenomena.” There is good reason to believe that the occurrence of disorder and public perceptions of disorder and crime are not always in alignment, let alone connected in the sequential manner broadly assumed (Gau & Pratt, 2008; Piquero, 1999). Currently, the limited literature on this key issue offers a mixed picture of contradictory evidence (Armstrong & Katz, 2010; Gau & Pratt, 2008; Worrall, 2006a).
The purpose of this study is to extend and deepen this line of research by examining public perceptions of neighborhood-level disorder and crime in a systematic way. This article addresses one specific question: Do public perceptions of disorder and crime reflect a unitary cognitive construct or two distinct constructs? Methodologically, this is an issue of the discriminant validity of key concepts at the core of broken windows theory. Discriminant validity concerns the extent to which a latent construct can be viewed as distinct from other related latent constructs (Farrell, 2010, p. 63). Using public survey data and GIS technology, we employ confirmatory factor analysis (CFA) to explore the dimensionality of citizen perceptions of disorder and crime.
Literature Review
Broken Windows Theory
Broken windows theory has several distinctive characteristics that account for its major impact on the literature of criminology and American policing. First, the theory draws attention to the nefarious impact on citizens’ lives of remediable disorder (e.g., minor offenses), as opposed to the difficulty of deterring serious crime. Skogan (2008, p. 195) noted that “Although they did not give it that name, they focused on what we now call social disorder. Their original list included public gambling, public drinking and urination, street prostitution. . . .” This was a relatively innovative idea given that prior criminological research on fear of crime was premised on the belief that the primary cause of fear of crime among citizens was the incidence of major crime, in particular violent crime (e.g., Brooks, 1974; Poveda, 1972; the Presidential Commission, 1967). Broken windows theory redirected law and order discussions toward the belief that, as long as no legitimate governmental party intervenes in disorder incidents, the downward cycle of decline documented by Zimbardo will generally occur and serious criminality will follow.
In addition to this redirection of attention to disorder, the theory also brought to light the importance of how public perceptions are formed. It was argued that social disorder incidents in a neighborhood tend to have two significant impacts on individual residents. First, residents’ perceptions of disorder will be elevated, signaling impending danger in the neighborhood if past disorder continues unaddressed. Second, it was argued that in addition to becoming more fearful, citizens often become less trusting of others, less willing to interact with each other, less inclined to assist their neighbors in distress, and less inclined to intervene on behalf of the community (Wilson & Kelling, 1982). This combination of reactions to unaddressed disorder leads to an erosion of the sense of collective efficacy among residents and the rise of a sense of futility and social withdrawal. Advocates of community policing were among the many people across the country who considered broken windows theory a convenient framework for building neighborhood-led crime prevention programs.
Test of Broken Windows Theory
Ever since the publication of the Wilson and Kelling broken windows classic formulation, disorder phenomena have become an important focus of research on outcomes attributable to community policing (Greene, 2000). Over the course of three decades two principal approaches to the testing of broken windows theory have emerged. The first, and the most frequently adopted approach, is to build on a rather practical question—namely, do empirical tests of outcomes associated with the active policing of disorder serve as evidence of an effective crime control strategy? There have been a number of noteworthy studies conducted during the 1990s on the efficacy of this disorder-focused crime reduction strategy (Silverman, 1999; for a review, see Greene, 2000). 1
The second, less frequently taken approach focuses on the identification of social and physical disorder in neighborhoods and tests the connection between disorder and public perceptions of disorder and crime. These well-crafted studies involve the use of systematic field observations to document the levels of social and physical disorder present in urban neighborhoods (e.g., Braga & Bond, 2008). For example, based on the videotaping and systematic rating of more than 23,000 street segments in Chicago, Sampson and Raudenbush (1999) were able to construct reliable scales of social disorder, physical disorder, and crime and found that disorder and crime might actually be part of the same cognitive construct; they observe in this regard that “rather than conceive of disorder as a direct cause of crime, we view many elements of disorder as part and parcel of crime itself” (Sampson & Raudenbush, 1999, p. 608). Similarly, Ross and Mirowsky (2001) found that elements of neighborhood disorder such as vandalism, drugs, and youth gangs were generally found in disadvantaged neighborhoods and were associated with a breakdown of social control. A rich body of literature has arisen regarding the documentation of the levels of disorder in a neighborhood (Perkins & Taylor, 1996; Sampson & Raudenbush, 1999, 2004; Skogan, 1990). However, the research on public perceptions of disorder and crime has been far more limited.
A review of this limited body of research identified three studies that have attempted to test the conceptual relationship proposed by broken windows theory or the incivility thesis (i.e., Armstrong & Katz, 2010; Gau & Pratt, 2008; Worrall, 2006a). 2 Using a citizen survey conducted in Eastern Washington, Gau and Pratt (2008, p. 178) attempted to validate the theoretical concepts that distinguish between disorder and crime. They reported that the confirmatory factor loadings of a set of disorder-related and crime-related survey items supported a one-factor approach as opposed to a two-factor solution. The one-factor conclusion resulted from a high correlation between disorder and crime in the CFA. Gau and Pratt note in this regard: “The correlation between the crime and disorder factor was 0.92. A between-factor correlation of 0.85 or greater indicates that the factors cannot be justifiably separated (Gomez et al., 2005).” Somewhere in between the broken windows view and the Gau and Pratt position is the work of John Worrall, based on a study of crime and survey data from 12 cities. Worrall (2006a) examined the discriminant validity of a number of perceptual dimensions, including perceived crime, social disorder, physical disorder, and fear of crime, based on public surveys conducted in 12 different cities. The results of a confirmatory factor analysis of a two-factor solution suggested that the loadings of social disorder items were quite high, but contrary to expectations two of the loadings of perceived crime items were negative. The CFA of the one-factor solution did not perform any better due to poor loadings of three out of six perceived crime items (Worrall, 2006a, p. 377). In addition, the fit indices for both models were unsatisfactory (e.g., χ2/df = 21.78 and 18.95 respectively). 3 Worrall (2006a) concluded on the basis of his analysis that the relationship between disorder and crime is captured by neither a one-factor solution nor a two-factor solution. Recently, using a random sample collected from Mesa, Arizona, Armstrong and Katz (2010) replicated Worrall’s study and found that the conceptual relationship between public perceptions of disorder/incivilities and perceptions of crime failed to find empirical support. The results were similar to the findings of Worrall in that neither the unmodified single-factor model nor the modified two-factor model fit the data well, owing to low values of the adjusted goodness-of-fit index (AGFI) (smaller than 0.90). In reviewing the recent empirical testing literature on broken windows theory, Kubrin (2008, p. 203) opined rather harshly: “Over the past several years, social science has not been kind to broken windows theory.”
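The between-factor correlation rule quoted above can be written as a one-line check. The sketch below is illustrative only: the function name and its default argument are hypothetical labels, and the 0.85 cutoff is the one Gau and Pratt attribute to Gomez et al. (2005).

```python
def factors_separable(between_factor_corr: float, cutoff: float = 0.85) -> bool:
    """Return True when two latent factors can be treated as distinct.

    Rule of thumb quoted in the text: a between-factor correlation of
    0.85 or greater means the factors cannot be justifiably separated.
    """
    return abs(between_factor_corr) < cutoff


# Correlation reported by Gau and Pratt (2008) for the crime and disorder factors.
print(factors_separable(0.92))  # False: the two factors are not separable
```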
Rationales for the Current Study
A thorough review of the three studies discussed above reveals three primary limitations that warrant the current study. First, no research on public perceptions of social disorder and crime has been conducted in a large urban environment, the setting Wilson and Kelling highlighted in their article. Similarly, no study has explored the conceptual relationship between disorder and crime in suburban neighborhoods, an important setting that has been overlooked by the extant research on this topic. For example, Worrall (2006a) used secondary data from 12 cities sponsored by the COPS Office in the 1990s but pooled them in the analysis. The site of Armstrong and Katz’s (2010) study was a medium-sized city, Mesa, Arizona, while the Gau and Pratt (2008) survey was conducted in an essentially rural area in the eastern part of Washington State. In our view, Wilson and Kelling (1982) highlighted social disorder in large cities, and none of the three studies has specifically examined public perceptions of social disorder and crime in a large urban-suburban setting.
Second, the methodology employed in the three studies varied widely regarding validation issues and the measurement model. 4 In his analysis of the secondary data, Worrall (2006a) performed the CFA based on the loadings of the manifest variables without any modification. Not surprisingly, the measurement model did not fit the data and, therefore, failed to support either the one-factor or the two-factor approach to the relationship between public perceptions of disorder and crime. Armstrong and Katz (2010) noted that part of their study was a replication of Worrall’s, assessing the conceptual relationship between disorder and crime among respondents. However, the fit statistics for the initial, unmodified models in the two studies were strikingly far apart. The CMIN/DF reported by Worrall (2006a) was 21.78 for the one-factor model and 18.95 for the two-factor model. In sharp contrast, the CMIN/DF values of the initial models reported by Armstrong and Katz (2010) were 2.02 for the one-factor model and 1.56 for the two-factor model. The CMIN/DF values reported by Worrall (2006a) are thus more than 10 times those reported by Armstrong and Katz (2010), suggesting fundamental differences between the two samples. More specifically, the SEM measurement model fits the respective samples of the two studies differently. CMIN/DF is a primary absolute fit index, and a value of around 2 is often considered an excellent fit regardless of sample size (Barrett, 2007), while other scholars consider such a fit impossible when the sample size is large (Hu & Bentler, 1999; Miles & Shevlin, 2007; Wheaton et al., 1977).
Third, modification of the measurement model is often permitted in order to improve model fit. A commonly used technique is to allow the error terms of manifest variables to be correlated if they load on the same factor. This is because the error variation among variables tapping into the same theoretical construct is not random but systematic. Researchers often follow the suggestions of modification indices when making such adjustments. This practice is, to a large extent, statistically driven and shares traits with exploratory rather than confirmatory factor analysis: “Once modifications have been completed, one must realize that the analysis has moved from confirmatory to exploratory” (Schreiber et al., 2006, p. 330). This means that the modified model with significantly improved fit indices is not the theoretical model initially hypothesized. If CFA involves testing the relationship among constructs, validation of the modified model is crucial (Lei & Wu, 2007). A general approach to validation is to randomly split the data into two subsamples, one for testing and the other for validation. The primary purpose is to assess whether the modified measurement model fits the validation sample. Of the three studies discussed above, only Gau and Pratt’s study performed the validation process, and it found that the one-factor approach is superior to the two-factor approach. In contrast, Worrall (2006a) did not modify the measurement model and found that neither approach works, while Armstrong and Katz (2010) did not validate their modified measurement model.
Given the three limitations of previous studies, we believe that the analysis reported here can make two important contributions to the yet rather scant empirical research available on broken windows theory. Our first contribution is to test the theory in two closely related environments: a large city and its suburban area. In fact, the theory is largely derived from observations and research findings reported in neighborhoods in urban and suburban communities, especially large U.S. cities. Wilson and Kelling (1982) cited several case studies on how citizens perceive disorder and crime phenomena in places such as Newark, New Jersey, and New York City (also see, Kelling & Coles, 1996). An important point is that their discussion also extended to the character of public perceptions of disorder in suburban areas located near a big city such as the widely known “broken window” process observed in Philip Zimbardo’s experiment carried out in East Palo Alto in the San Francisco Bay Area. The previous studies investigating our question of interest used data derived from citizen surveys conducted in smaller communities in Eastern Washington State (Gau & Pratt, 2008) and in urban areas across the country (Worrall, 2006a) and a medium-sized city (Armstrong & Katz, 2010). Unlike those prior studies, the data used for this study are based on citizens residing in Houston, the fourth largest city in the U.S., and in the suburban communities in the Houston zip codes with a population of 3 million. In our view, the examination of disorder and crime perceptions at the neighborhood level in a combination of the large central city and suburban settings offers a good opportunity to test the conceptual framework of broken windows theory in a way that has not been done heretofore. 
Though the specific questionnaire items tapping into disorder and crime perceptions in Houston differ slightly from the previous three studies, the core measures of crime and disorder and citizen perceptions of those phenomena so crucial to broken windows theory are sufficiently alike to permit direct comparison of results with these three studies.
The second contribution of our study is to use cross-validation to estimate how well the modified model fits a new sample. As discussed earlier, Gau and Pratt’s study used a split sample to evaluate the fit indices of the modified measurement model. Armstrong and Katz (2010) did not employ a validation sample to examine the fit between the modified measurement model and the empirical data. We strongly believe that validation is important in the testing of theory if the measurement model is modified. Accordingly, we randomly split the data into two samples. In addition, it is important to note that our study uses geo-mapping to visualize the geo-coded sample distribution on a map. Previous studies of public opinion generally describe their samples as randomly selected, implying a distribution of respondents throughout the area of interest, but the actual geographic distribution of respondents is largely unknown. The current study is the first in the research on public perceptions of disorder and crime to map each of the 1,850 respondents’ residences based on geo-coded information.
Methods
Data
The data for this study were obtained from a random-sample landline telephone survey of the residents aged 18 years and older who resided in the Houston metropolitan area in 2010. The sampling frame was the landline telephone directory in the spatial boundary of Houston zip codes. According to the US Postal Service, Houston, TX covers 178 zip codes including unique (single high-volume address such as governmental agencies, universities, businesses, or buildings), Post Office Box only, and Standard (home and business mail delivery addresses). For the sampling frame of this study, the six unique zip codes and 76 P. O. Box zip codes were identified and excluded because they simply represent a point of mail delivery and do not correspond to geographic areas. The 96 standard zip codes (77002–77051 and 77053–77099) for which a clear geographic shape can be drawn were utilized for the selection of the telephone directory. These 96 zip codes represent the metropolitan area of Houston that covers the city of Houston and the neighboring suburban areas in Harris County. 5
Random digit dialing and computer-assisted telephone interviewing (CATI) technology were used to carry out the survey. Houston respondents residing in Spanish-only speaking households were provided with the opportunity to complete the interview in Spanish. The response rate for the survey was 37%, and interviews were conducted between January 1 and January 15 of 2010; a total of 1,850 residents in the metropolitan area of Houston were interviewed over the phone by trained interviewers. The survey documented the respondent’s residence with X and Y coordinates collected at the time of the survey, information which enables us to incorporate the geo-coded data and survey-based perceptual and attitudinal data into a single dataset using the GIS software ArcGIS. The geo-coded sample distribution of the 1,850 respondents is reported in a map shown in Figure 1. The survey respondents came from most areas of the Houston zip codes except the airport (i.e., George Bush Intercontinental Airport in the North), lakes (i.e., Lake Houston in the Northeast), and forested and recreational areas (i.e., in the West). In addition, the geo-coded sample distribution shows that the respondents were relatively spread across the city and suburban areas.

Figure 1. Geo-coded sample distribution in the Houston zip code map.
A comparison of key demographic variables was conducted to evaluate the representativeness of the sample (see Appendix A). According to the 2010 U.S. Census, the population of the Houston zip code area aged 18 and older was 2,120,231. Among them, 50.5% were female, 33.3% were 50 years old or older, and 26.2% were White. Our 2010 survey data revealed a similar gender distribution (i.e., 53.0% female and 47.0% male respondents). However, due to the problem of noncoverage in landline telephone surveys, 6 our sample reflects a higher percentage of older respondents (53.9%) compared with the 2010 Census data. The racial composition of the sample is over-representative of Whites (41.0%) and under-representative of Hispanics (33.2%). Similar to the methodology employed in the Gau and Pratt (2008) study, the 1,850 respondents in the 2010 survey data are randomly allocated to a testing sample and a replication sample, with data from 925 respondents in each subsample.
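The random allocation described above can be sketched in a few lines. This is a minimal illustration, not the procedure actually used: the function name, the seed value, and the use of consecutive integers as respondent identifiers are all assumptions made for the example.

```python
import random


def split_sample(respondent_ids, seed=2010):
    """Randomly allocate respondents to two equal-sized halves.

    The seed is an arbitrary illustrative value, not one reported in the study.
    """
    ids = list(respondent_ids)
    rng = random.Random(seed)  # local generator keeps the split reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical identifiers 1..1850 stand in for the survey records.
testing, replication = split_sample(range(1, 1851))
print(len(testing), len(replication))  # 925 925
```

Fixing the seed makes the allocation reproducible, so the same testing and replication samples can be recovered when the analysis is rerun.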
Measures
A total of 11 manifest variables are used to test models of the discriminant and convergent validity of the disorder and crime aspects of broken windows theory. For each of the disorder and crime perception variables, respondents were asked to rate the level of problem severity in their immediate neighborhoods, using a scale ranging from 1 = no problem to 4 = serious problem. Four variables are employed to tap into the latent construct of crime; the crime perception measure in this study is derived principally from the Part I crimes reported annually by the Federal Bureau of Investigation, which are broadly considered to represent violent crime. These crimes are all felony offenses and clearly differ from the misdemeanors featured in the survey questionnaire, such as drug use, vandalism, and public drunkenness. Respondents were asked to assess a number of crime problems in their neighborhood, including: (1) rape; (2) people being robbed (robbery); (3) violent physical attacks; and (4) breaking and entering to steal personal property (burglary).
Disorder or incivilities have been conceptualized along two rather distinct dimensions in the research literature, namely social disorder and physical disorder (for reviews see, Armstrong & Katz, 2010; Taylor, 1999, 2001; Worrall, 2006a). In their empirical study on the effects of the problem-oriented policing initiative in Lowell, Massachusetts, Braga and Bond (2008) differentiated social disorder such as public drunkenness, drug sales, and loitering from physical disorder such as street segments with trash, damaged structures, and unkempt vacant lots (also see, Braga et al., 1999; Piquero, 1999). A review of the research literature reveals that social disorder is typically conceived of and measured as the occurrence of minor criminal offenses, while physical disorder is largely a reflection of how well public spaces and private properties in a neighborhood/physical environment are maintained (Braga & Bond, 2008; Corman & Mocan, 2005; Kane, 2006; Kelling & Sousa, 2001; Worrall, 2006a). Social disorder, such as misdemeanant incidents, is commonly used to test broken windows theory in policing research (e.g., Corman & Mocan, 2005; Kane, 2006; Kelling & Sousa, 2001; Worrall, 2006b). Given the greater salience of social disorder for broken windows theory phenomena, we make use of social disorder in our analysis. Thus, social disorder is a latent construct reflecting a set of perceptions of several types of disorder and is derived from previous studies (Cao et al., 1996; Gau & Pratt, 2008; Scheider et al., 2003). Respondents were asked to assess the level of severity of a number of disorder problems in their neighborhood, including: (1) people openly selling drugs; (2) drunk drivers on the road; (3) people drinking to excess in public; (4) groups of teenagers hanging out and harassing people; (5) the presence of youth gangs; (6) people using illegal drugs; and (7) vandalism. 7
Analytical Strategy and Statistical Assessment of Survey-based Evidence
Confirmatory factor analysis (CFA) is used to examine the relationships among observed variables and underlying theoretical constructs. CFA has been employed to verify theoretical constructs, including those of broken windows theory (Armstrong & Katz, 2010; Gau & Pratt, 2008; Worrall, 2006a). An obvious advantage of CFA is the number of model fit indices that offer insightful information; in comparison, the statistical results from exploratory factor analysis (EFA) are rather limited, including only model significance, modification indices, and covariance among factors. A variety of absolute and relative (or incremental) indices derived from CFA are consulted in this study to assess model fit. The absolute fit indices include the χ2 statistic, the likelihood ratio statistic used to test whether a given model provides an acceptable fit to the relevant observed data. There are no definitive cutoff values for these indices. Recommended criteria for the chi-square/df ratio (χ2/df) range from as low as 2.0 (Tabachnick & Fidell, 2007), to 3 (Kline, 2005), to 4 (Hu & Bentler, 1999), to 5 (Wheaton et al., 1977). At the center of the argument is the fact that the chi-square value is very sensitive to sample size and rejects virtually any model derived from a large sample, regardless of the issue of multivariate non-normality (Bentler, 2007; Miles & Shevlin, 2007). For example, in their analysis of a sample of 932 cases, Wheaton et al. (1977, p. 99) found the following: “For our sample size, we judge a ratio of around 5 or less as beginning to be reasonable, based on our experience in inspecting the sizes of residuals which accompany varying χ2 values.” Another absolute fit indicator, “one of the most informative fit indices” (Diamantopoulos & Siguaw, 2000, p. 85), is the root mean square error of approximation (RMSEA), which takes the error of population approximation and degrees of freedom into account and characterizes the lack of fit of the hypothesized model to the population covariance matrix. The cutoff points of RMSEA have been reduced considerably in the past 20 years (Hooper et al., 2008). In the early 1990s, an RMSEA value in the range of 0.05 to 0.10 was considered an indication of a fair model fit, and values above 0.10 were judged to be a poor fit (Hooper et al., 2008; MacCallum et al., 1996). In more recent studies, the cutoff points for a good fit have been reduced to below 0.07 (Steiger, 2007) and below 0.06 (Hu & Bentler, 1999).
Incremental fit indices, or approximate fit indices, are a group of measures that do not use the chi-square in its raw form but rather compare the chi-square value to that of a baseline model, the null or independence model (Hooper et al., 2008). “The indices effectively say ‘How well is my model doing, compared with the worst model that there is?’” (Miles & Shevlin, 2007, p. 870). Two of these indices are commonly used in structural equation modeling (SEM). The comparative fit index (CFI) assesses “the fit of a user-specified solution in relation to a more restricted, nested baseline model,” in which the “covariances among all input indicators are fixed to zero,” positing no relationship among variables (Brown, 2006, p. 84). The CFI coefficient ranges from 0 to 1.00, with values greater than 0.95 indicating a reasonably good fit between the hypothesized model and the empirical data (Hu & Bentler, 1999). The TLI (Tucker Lewis Index or Non-normed Fit Index) is another measure suggested by Jöreskog and Sörbom (1989) for assessing a model’s overall fit based on a ratio of the squared sum of discrepancies to the observed variances. A TLI value around 0.95 or larger generally indicates a good fit (Hu & Bentler, 1999; Loehlin, 1992; Schumacker & Lomax, 2004). A review of the relevant literature establishes the criteria used to accept or reject a CFA model in this study (χ2/df ratio below 5, RMSEA below 0.07, CFI and TLI around or above 0.95) (Bentler, 2007; Hu & Bentler, 1999; Miles & Shevlin, 2007; Steiger, 2007; Wheaton et al., 1977). Mplus version 6 is used to conduct the CFA using the data from the Houston survey conducted in 2010.
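The acceptance criteria listed above can be applied mechanically once the fit indices are in hand. The following sketch is one possible encoding, not part of the analysis itself: the function names are hypothetical, “around or above 0.95” is read strictly as a minimum of 0.95, and the input values in the usage lines are made up for illustration rather than taken from the study.

```python
def assess_fit(chi2_df, rmsea, cfi=None, tli=None):
    """Check each supplied fit index against the cutoffs adopted in the text.

    Returns a dict mapping index name to True (criterion met) or False.
    Indices passed as None are skipped.
    """
    checks = {
        "chi2_df": chi2_df < 5.0,  # chi-square/df ratio below 5
        "rmsea": rmsea < 0.07,     # RMSEA below 0.07
    }
    # "Around or above 0.95" is read strictly as >= 0.95 (an assumption).
    if cfi is not None:
        checks["cfi"] = cfi >= 0.95
    if tli is not None:
        checks["tli"] = tli >= 0.95
    return checks


def acceptable(checks):
    """A model is retained only if every supplied criterion is met."""
    return all(checks.values())


# Made-up values for illustration only; they are not results from the study.
example = assess_fit(chi2_df=3.2, rmsea=0.05, cfi=0.97, tli=0.94)
print(example)
print(acceptable(example))
```

Requiring every criterion to hold is a conservative reading; a looser rule would need to be justified separately.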
Findings
The first step in SEM is to determine if the specified model is overparameterized; a degrees of freedom calculation provides an important check on the appropriateness of model specification (Rigdon, 1994). A frequently used approach in this regard is to assess overall model identification with the formula df = m × (m + 1)/2 − 2 × m − ζ × (ζ − 1)/2 − g − b (Kline, 2005; Rigdon, 1994). The first term, m × (m + 1)/2, represents the total number of elements in the variance-covariance matrix to be analyzed and thus the maximum number of degrees of freedom. The second term, 2 × m, represents the number of parameters to be estimated in the matrix of loadings. The third term, ζ × (ζ − 1)/2, represents the free, off-diagonal covariances of the constructs, and g and b are the numbers of free terms in parameter matrices. In the model tested here, there are 11 observed variables, so 11 × (11 + 1)/2 yields a total of 66 unique elements in the matrix. The degrees of freedom for the two-factor modified model and the one-factor modified model are 37 and 35, respectively. Therefore, the model is appropriately overidentified and the estimation of the model can proceed. 8
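The degrees of freedom formula can be checked with a short calculation. The sketch below is illustrative: the function name is hypothetical, and the `extra_free` argument is an assumption added so that further freed parameters (for example, correlated error terms in a modified model) can be counted.

```python
def cfa_degrees_of_freedom(m, n_factors, g=0, b=0, extra_free=0):
    """Degrees of freedom for a CFA model using the formula in the text.

    m          number of observed (manifest) variables
    n_factors  number of latent constructs (zeta in the formula)
    g, b       free terms in the remaining parameter matrices
    extra_free hypothetical count of any additional freed parameters
               (an assumption of this sketch, not a term in the formula)
    """
    unique_elements = m * (m + 1) // 2                     # first term
    loading_term = 2 * m                                   # second term
    factor_covariances = n_factors * (n_factors - 1) // 2  # third term
    return (unique_elements - loading_term
            - factor_covariances - g - b - extra_free)


print(cfa_degrees_of_freedom(11, 2))  # 43 for an unmodified two-factor model
print(cfa_degrees_of_freedom(11, 1))  # 44 for an unmodified one-factor model
```

With 11 indicators the formula gives 43 and 44 degrees of freedom for the unmodified two-factor and one-factor models. The reported values of 37 and 35 for the modified models are 6 and 9 lower, which would be consistent with 6 and 9 additional free parameters; which parameters were freed is not stated in this section.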
Baseline for One-Factor and Two-Factor Unmodified Models
An initial analysis of the two-factor model of public perceptions of disorder and crime in the 2010 testing sample is shown in Table 1. The standardized loadings of the manifest variables on both the crime and the disorder factors were high and statistically significant. For example, the lowest loading on the crime factor was 0.650 (breaking and entering to steal personal property, burglary) and the highest was 0.740 (violent physical attacks). Similarly, the loadings of the observed variables on the disorder factor ranged from 0.673 for the lowest item, “drunk drivers,” to 0.801 for the highest item, “people using drugs.” The correlation between the crime factor and the disorder factor was .80, below the recommended cutoff criterion of .85 (e.g., Gau & Pratt, 2008; Gomez et al., 2005). However, the model fit indices suggested that the unmodified model fit the 2010 testing sample poorly. The χ2/df ratio was 11.197, well above the critical value of around 5, and the RMSEA was 0.105. In addition, the value of TLI was below 0.90.
Table 1. Item Loadings and Fit Indices for the Two-Factor Model.
The one-factor model shows that all 11 manifest variables loaded significantly on the overarching construct of crime and disorder, with the lowest loading being 0.546 (breaking and entering to steal personal property, burglary) and the highest loading being 0.776 (people using drugs), as reported in Table 2. Similar to the two-factor model, the fit indices show that the unmodified one-factor model fits the data poorly, with χ2/df ratio = 15.908, RMSEA = 0.127, and both CFI and TLI below the 0.95 cutoff point. These findings suggest that both the one-factor model and the two-factor model should be rejected. Our findings are similar to the results reported by Gau and Pratt (2008), who found that both unmodified models failed to meet the thresholds for fit indices of either the absolute or the incremental variety.
Table 2. Item Loadings and Fit Indices for the One-Factor Model.
Model Modification and Replications
The next step of the analysis attempts to improve the fit between the theoretical model and the empirical data. A common way of accomplishing this is to examine the modification indices, which suggest specific changes that would improve the overall model fit, usually by lowering the χ2 value (Albright & Park, 2009, p. 23). Bentler (2007) noted that model modification is an effective process in SEM for arriving at a possibly acceptable model. Most of the suggestions from the modification indices concern the correlation of error terms between pairs of observed variables. It is justifiable to permit error terms to correlate when two manifest indicators are very similar in content and likely tap into the same theoretical concept (Gau & Pratt, 2008, p. 179). This is particularly true regarding public perceptions of crime and disorder because the observed indicators of either the crime construct or the disorder construct are highly correlated (e.g., Wilson & Kelling, 1982). The correlation matrix of the testing sample is reported in Appendix B.
Following the suggestions derived from the modification indices, we permitted the flagged error terms to correlate in both the two-factor model and the one-factor model in the 2010 testing sample. The primary purpose of this step in the analysis is to see whether the two “conflicting” models can be modified so that their absolute and incremental indices indicate an acceptable fit to the data. For the two-factor model, six pairs of error terms listed in the modification indices were allowed to correlate before the model fit the data well, as shown in Table 3. 9 Similar to the findings reported by Gau and Pratt (2008), the items in these six pairs are significantly correlated and seem to tap the same theoretical construct from different angles, for example, “people openly selling drugs” and “people using illegal drugs,” or “drunk drivers on the road” and “people drinking to excess in public.” The χ2 value dropped significantly, by 323.67, from 481.464 in the baseline model to 157.796 in the modified model (Δdf = 6), yielding a χ2/df ratio of 4.265. In addition, the other model fit indices suggest a good fit: RMSEA below 0.06 (Hu & Bentler, 1999), CFI at 0.975 (above 0.95), and TLI at 0.963 (Table 3). A primary reason Gau and Pratt (2008) chose the one-factor model was the high correlation between crime and disorder (r = .92), suggesting that two distinct factors cannot be reasonably separated. The correlation between the two factors in the modified two-factor model is .841, and the shared variance between the two factors is 0.71 (0.841²). Farrell (2010) suggested that when the correlation between the two latent factors ranges from .8 to .9, researchers should try to collapse the two factors into one since discriminant validity becomes a legitimate concern.
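The two calculations in this paragraph, the χ2 difference between the baseline and modified two-factor models and the shared variance implied by the factor correlation, can be reproduced with a short Python sketch. It assumes the reported values are ordinary maximum likelihood χ2 statistics from nested models (a scaled difference test would be needed for robust estimators), and the helper name is hypothetical. The closed-form tail probability used here is valid only for an even number of degrees of freedom, which holds for Δdf = 6.

```python
import math


def chi_square_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


chi2_baseline, df_baseline = 481.464, 43
chi2_modified, df_modified = 157.796, 37

delta_chi2 = chi2_baseline - chi2_modified     # about 323.668
delta_df = df_baseline - df_modified           # 6
p_value = chi_square_sf_even_df(delta_chi2, delta_df)

factor_correlation = 0.841
shared_variance = factor_correlation ** 2      # about 0.707
```

The difference is far beyond the .05 critical value of about 12.59 for 6 degrees of freedom, so the improvement in fit is statistically significant under these assumptions.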
Table 3. Fit Indices for Original, Modified, and Cross-Validation Two-Factor Models.
Table 4. Fit Indices for Original, Modified, and Cross-Validation One-Factor Models.
At this point, we checked the estimate of average variance extracted (AVE), the amount of variation that a latent construct is able to explain in the observed variables (Farrell, 2010). 10 Fornell and Larcker (1981) developed a method for assessing the discriminant validity of two or more factors. If the AVE of each construct is less than the shared variance between the constructs, discriminant validity is not supported (also see Farrell, 2010). The results of the AVE estimation indicate that the AVE value was 0.46 for the latent variable of crime and 0.50 for the disorder construct. Both values are smaller than the shared variance between crime and disorder (0.71), so discriminant validity is not supported. The next step in our analysis is to examine the convergent validity of a one-factor model.
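The Fornell and Larcker comparison described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, AVE is computed in its common form as the mean of the squared standardized loadings, and the three loadings passed to `average_variance_extracted` are made-up values used only to show the calculation. The AVE values and factor correlation in the final check are the ones reported in the text.

```python
def average_variance_extracted(std_loadings):
    """Mean of squared standardized loadings for one latent construct."""
    return sum(loading ** 2 for loading in std_loadings) / len(std_loadings)


def discriminant_validity_supported(ave_a, ave_b, factor_correlation):
    """Fornell-Larcker check: both AVEs must exceed the squared factor correlation."""
    shared_variance = factor_correlation ** 2
    return ave_a > shared_variance and ave_b > shared_variance


# Hypothetical loadings, for illustration only
example_ave = average_variance_extracted([0.6, 0.7, 0.8])

# Values reported for the modified two-factor model
supported = discriminant_validity_supported(0.46, 0.50, 0.841)
```

With the reported values, `supported` is `False`, matching the conclusion that the crime and disorder constructs are not empirically distinct in the modified two-factor model.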
The results of the one-factor model after modification are shown in Table 4. Following the suggestions derived from the modification indices, we allowed nine pairs of error terms to be correlated. These nine pairs include five of the six pairs reported in the modified two-factor model; the exception is the pair “groups of teenagers hanging out and harassing people” and “youth gangs are present.” 11 Both absolute and incremental indices suggest that the modified one-factor model fits the data well, with χ2/df ratio = 4.178, RMSEA = 0.059, CFI = 0.977, and TLI = 0.964. A comparison of the fit indices of the one-factor model and the two-factor model reveals that they are very similar, but the two-factor model is a bit more parsimonious, retaining two more degrees of freedom than the one-factor model (37 vs. 35). These findings led us to run validation checks on both models to determine whether either model passes the validation test in the 2010 replication sample.
The absolute fit indices for the 2010 replication sample show that the modified two-factor model failed to meet the validation criteria (see Table 3). The χ2/df ratio for the 2010 replication sample was 6.589, which is higher than the upper cutoff of around 5.0 (Wheaton et al., 1977). In addition, the RMSEA for the replication sample, at 0.078, exceeds the rather generous cutoff of 0.07 (Steiger, 2007). With respect to the validation of the one-factor model, the results indicate that it does not fit any better than the two-factor model. The χ2/df ratio was high for the 2010 replication sample (8.770). In addition, the RMSEA for the 2010 replication sample was 0.092, well above the cutoff point of 0.07. Though the CFI and TLI values for these two replications are close to or above 0.95, the χ2/df ratio and RMSEA are particularly important indices in SEM for judging whether the theoretical models fit the covariance matrix of the observed data (Barrett, 2007; Bentler, 2007; Hu & Bentler, 1999; Steiger, 2007). Based on these findings, we conclude that neither the two-factor model nor the one-factor model passed the construct replication tests.
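The acceptance rule applied in this section can be expressed as a small Python sketch. The cutoffs are the ones stated for this study (χ2/df below 5, RMSEA below 0.07, CFI and TLI at about 0.95 or above); treating “around 0.95” as a strict 0.95 threshold is a simplifying assumption, and the function and dictionary names are illustrative. CFI and TLI are optional because exact replication values are not given in the text.

```python
CUTOFFS = {"chi2_df_max": 5.0, "rmsea_max": 0.07, "cfi_min": 0.95, "tli_min": 0.95}


def failed_criteria(chi2_df, rmsea, cfi=None, tli=None):
    """Return the names of the fit criteria a model fails (empty list = acceptable)."""
    failed = []
    if chi2_df >= CUTOFFS["chi2_df_max"]:
        failed.append("chi2/df")
    if rmsea >= CUTOFFS["rmsea_max"]:
        failed.append("RMSEA")
    if cfi is not None and cfi < CUTOFFS["cfi_min"]:
        failed.append("CFI")
    if tli is not None and tli < CUTOFFS["tli_min"]:
        failed.append("TLI")
    return failed


# Values reported in the text
one_factor_testing = failed_criteria(4.178, 0.059, cfi=0.977, tli=0.964)
two_factor_replication = failed_criteria(6.589, 0.078)
one_factor_replication = failed_criteria(8.770, 0.092)
```

Under these cutoffs the modified one-factor model passes in the testing sample, while both modified models fail the χ2/df and RMSEA criteria in the replication sample.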
Additional Model Considerations
Given that both models failed to demonstrate either convergent or discriminant validity, the next step is to check for possible cross-loadings of the observed variables. Farrell (2010) suggested that if items cross-load on more than one latent variable, removal of the “offending” item(s) should improve discriminant validity. None of the three studies that tested the validity of broken windows theory examined the cross-loading issue (Armstrong & Katz, 2010; Gau & Pratt, 2008; Worrall, 2006a). In our view, there is a good reason for them not to do so because most of the items included in public surveys of crime and social disorder have been well established and used for a long time (e.g., Gau & Pratt, 2008; McGarrell et al., 1997; Perkins & Taylor, 1996; Taylor, 1999; Worrall, 2006a).
The model modification indices (M.I.) revealed that the disorder item “people openly selling drugs” could also load on the latent construct of crime (M.I. = 69.812). It seems reasonable to argue that the seriousness of open drug sales varies considerably, from misdemeanant offense to felony offense, and a considerable number of offenders are even prosecuted for breaking federal law (Nunn et al., 2006). A review of the literature reveals that the item on street drug sales is one of the core measures of social disorder, and both field observation research and public opinion surveys have routinely employed this measure to represent social disorder. Sampson and Raudenbush (1999) used seven items as direct evidence of the presence of social disorder in the videotapes recorded by researchers, including selling drugs, drinking alcohol in public, and peer groups with gang indicators. Similarly, Braga and Bond (2008) investigated the effect of police control of physical and social disorder on serious crime in Lowell, MA, and the presence of drug sellers in the street was one of their four measures of social incivilities/disorder. Moreover, public surveys of social disorder have routinely included items measuring drug sales and drug use in a neighborhood. For example, in their study on public perceptions of neighborhood disorder in Baltimore, Perkins and Taylor (1996, p. 78) developed four survey items to tap into the concept of perceived social disorder, including “people selling illegal drugs” and “groups of teenagers hanging out in the street.” Finally, Worrall (2006a) analyzed the data collected in 12 U.S. cities by the Bureau of Justice Statistics and used drug sales as one of the items to measure social disorder.
Given the importance of drug sales in the measurement of social disorder, and the reality that public perceptions of drug sales may lie somewhere between disorder and crime, we decided to examine two versions of the two-factor model, one without and one with the item “people openly selling drugs,” and to replicate this analysis in the 2010 replication sample. 12 The first part of Table 5 reports the findings from the two-factor model without the item “people openly selling drugs.” All the fit indices are satisfactory, suggesting the model fits the data well. For example, the χ2/df ratio was 2.689, well below the upper cutoff value of around 5 (a drop of 77.12 in χ2 value relative to the two-factor modified model), and the RMSEA was 0.043, below the recommended limit of 0.07 (Steiger, 2007). Both the CFI and the TLI were above 0.95. In the 2010 replication sample, the χ2/df ratio increased considerably (to 5.548). The value of RMSEA was 0.07, while CFI and TLI were both around 0.95 or above. However, the shared variance between crime and disorder did not drop significantly (from 0.648 to 0.598), and the AVE tests suggested that discriminant validity remains a serious issue (see Appendix C for the AVE values).
Table 5. Fit Indices for Two-Factor Model Without the Item “People Openly Selling Drugs” and Its Cross-Validations.
At this point, we also tested the model in which the item “people openly selling drugs” is cross-loaded on both factors. The findings from this cross-loading model are reported in the first half of Table 6. Interestingly, the results are very similar to those of the model run without that item. For example, the two-factor cross-loading model fit well, with χ2/df ratio = 2.877, RMSEA = 0.045, and CFI = 0.986, but the shared variance remains high at 0.648, surpassing the AVE values of both factors (Appendix C). The results for the replication sample show a χ2/df ratio of 4.923 and a slight decline in the shared variance between the two factors, to 0.594 (Appendix C).
Table 6. Fit Indices for Two-Factor Model With Cross-Loading of the Item “People Openly Selling Drugs” and Its Cross-Validations.
Discussion and Conclusion
The purpose of this study was to examine a core assumption of broken windows theory. The two-factor model and the one-factor model were used to test if the theoretical model derived from the theory fit data collected to specifically test the broken windows theory conception of a predictable movement from social disorder to fear of crime to crime incidence. Wilson and Kelling (1982) hypothesized that public perceptions of disorder and crime represent two independent latent constructs, although they offered no empirical evidence to support this assumption. Since then, findings from empirical field research (e.g., Sampson & Raudenbush, 1999) and public opinion surveys of neighborhood safety have cast serious doubt on the discriminant validity of the two-factor conceptualization which lies at the core of broken windows theory.
Based on the findings of this study conducted in the metropolitan Houston area, there is no evidence to support the core tenet of broken windows theory regarding disorder and crime. Furthermore, the results reported here indicated that neither the one-factor nor the two-factor models tested passed the CFA validation tests run. At this point, two important observations deserve to be highlighted.
Our first noteworthy observation pertains to the empirical findings regarding the disorder-crime link. Using citizen survey data collected from small cities and towns in the largely rural Eastern Washington area, Gau and Pratt (2008) found that a one-factor model combining disorder and crime perceptions is a better choice because the correlation between disorder and crime was very high among their survey respondents (.92). From a statistical point of view, this suggests that the shared variance between the two was at 0.846 and it is impossible to separate one from the other (Farrell, 2010). One possibility is that residents living in small cities and towns fail to discriminate between disorder and crime because there is relatively little crime occurrence in these communities. In contrast, the results reported by Worrall (2006a) based on survey data from 12 large and medium-sized cities showed that the baseline model for the one-factor approach fit the data poorly (χ2/df ratio = 35 and RMSEA = 0.11). Similarly, the two-factor approach failed to discriminate disorder and crime as well in that same dataset (χ2/df ratio = 21.78 and RMSEA = 0.14). These results led to this overall conclusion: “But just because theorists have pushed for the separation of crime and incivility indicators does not mean both concepts are distinct” (Worrall, 2006a, p. 379; also see: Armstrong & Katz, 2010).
In the current study, the citizen perception data were collected from a large metropolitan area which featured a large number of respondents from both central city neighborhoods and suburban neighborhoods. Our findings also showed that neither the one-factor model nor the two-factor model fits the empirical data acceptably. Besides the high correlation that led us to reject the two-factor model (also see Gau & Pratt, 2008), the cross-loading issue was also investigated in this study and brought additional doubt to the core tenet of a clear distinction in citizen cognitions between disorder phenomena and crime phenomena.
Our second noteworthy observation concerns the hypothesized fear of crime and social withdrawal dynamics associated with a sequential link between disorder and crime perceptions (Wilson & Kelling, 1982). It is argued that citizen perceptions of neighborhood-level disorder cause residents to become fearful and to alter their routine activities of life in a way that leads to more crime occurrence (Wilson & Kelling, 1982). This is another core tenet of broken windows theory which leads to prescriptions as to what the police can do to prevent such a downward spiral of neighborhood decay. Broken windows theory calls for the proactive intervention of American police in disorder amelioration at a time when the conventional crime control model was being seriously challenged from virtually all quarters (Cordner, 1997; Greene, 2000). The impact of widespread police adoption of broken windows theory was that the task environment of American police substantially expanded beyond the traditional focus on serious crime (Cordner, 1997) to the active suppression of misdemeanant offenses which American police had long ignored (Walker, 1984). For example, Harcourt (2001, p. 2) noted that police misdemeanant arrests increased dramatically from prior levels by as much as 85,000 per year in New York City during the period 1994 to 1998.
While broken windows theory advocates shifting police resources to order maintenance policing and downplaying the crime control function, there is no evidence that American police have actually deemphasized their conventional crime control activities while taking on new order maintenance challenges. Bittner (1972) observed some 40 years ago that the core mission of the police in American society is crime control. Reflecting that core element of American policing, Kraska (2001) found that American law enforcement agencies across the nation were aggressively expanding the militarization of local law enforcement through the use of SWAT teams despite the fact that FBI index crime rates declined significantly during the same period (the second half of the 1990s). A set of longitudinal studies on the priorities of local police agencies in U.S. cities consistently showed that crime, particularly serious crime, continued to be the top organizational priority of American police administrators during the 1990s, at the very time when the popularity of broken windows theory reached its peak (Zhao et al., 2001, 2003).
The advocates of broken windows theory argued that citizens cognitively distinguish between disorder and crime incidents, and furthermore change their behavior when they perceive social disorder to be on the increase (Wilson & Kelling, 1982). Wilson and Kelling (1982) did not offer any empirical evidence in support of their theory in their now-classic article. Their theory was developed based on interpretations they made of several studies deemed relevant, such as the foot patrol experiment in Newark, the automobile vandalization project carried out by Stanford social psychologist Philip Zimbardo, results from public surveys collected in Portland, Oregon, a study of the Guardian Angels in New York City, and research on public perceptions of police services conducted by Elinor Ostrom and her associates in Indianapolis. Questions regarding the validity of the core tenets of the theory began to emerge in the late 1990s (e.g., Sampson & Raudenbush, 1999), with some researchers questioning the effect of disorder on crime occurrence at the neighborhood level (e.g., Harcourt, 2001). Scholars who have investigated the connection between public perceptions of disorder and crime using SEM have reported findings that were seriously at odds with the theory.
This study contributes to our understanding of the limitations of broken windows theory, but it does have some noteworthy limitations. First, physical disorder is not included in this study. Though social disorder is the primary focus of broken windows theory (Skogan, 2008), physical disorder is also an important part of the theory. Worrall (2006a) examined the theoretical constructs of crime, social disorder, and physical disorder and decided not to explore the possibility of a higher-order factor that may include all three factors. Future research should, however, include physical disorder and explore the possible solution of a higher-order factor with an appropriate dataset featuring neighborhood-level data.
Second, it seems reasonable to speculate that citizen perceptions of crime and disorder vary by the level of disorder present across neighborhoods. Wilson and Kelling (1982) noted that public perceptions of disorder are more pronounced when there is a rapid increase in disorder phenomena in a neighborhood. Relatedly, Klinger (1997) argued that police behaviors differ noticeably according to the characteristics of neighborhoods; in high-crime neighborhoods, police officers regularly ignore misdemeanant offenses and focus on serious crime incidents, while they enforce zero-tolerance policies in more affluent neighborhoods. Do citizens who live in high-crime areas view disorder differently from their counterparts in low-crime neighborhoods? That is, do residents living in a prosperous and crime-free neighborhood tend to put social disorder and crime into the same category? We did not investigate this issue in this study but hope future research will explore this and related questions.
Footnotes
Appendix
Average Variance Extracted (AVE) Values for the Two-Factor Models (Testing Sample).
| Model | AVE for Crime Construct | AVE for Disorder Construct | Shared Variance between Crime and Disorder |
|---|---|---|---|
| Modified two-factor model | 0.46 | 0.50 | 0.71 |
| Two-factor model without cross-loading | 0.46 | 0.51 | 0.65 |
| Two-factor model with cross-loading | 0.41 | 0.45 | 0.65 |
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
