Abstract
Estimating the racial and ethnic proportions within a constantly changing population of drivers is difficult. Commonly called benchmarks, these estimates are the basis upon which researchers determine the potential for racial profiling. Most benchmarks do not consider the effect driving frequency has on exposure to routine police supervision. Furthermore, racial profiling research has a tendency to focus on overrepresentation at the expense of underrepresentation. This research demonstrates the use of a benchmark based on vehicle collision data. Benchmarks based on these data are more valid and reliable and enable researchers to consider driving frequency and the potential for disengagement.
Introduction
Our inability to devise a universally acceptable method for measuring racial and ethnic proportions within an ever-changing driving population remains one of the most controversial methodological challenges in racial profiling research. Commonly called benchmarking, this measurement challenge is essential to our ability to determine whether law enforcement programs (e.g., traffic stops) might be devoting too much attention to citizens of certain racial and ethnic groups.
The importance of a valid and reliable benchmark is not limited to the ruminations of research methodologists. Racial profiling studies based on poorly constructed benchmarks cause political and public relations problems and sometimes result in ill-fated litigation. We offer the following as an example.
During the summer of 2010, the State of Missouri released the 10th iteration of its Annual Vehicle Stops Report. Because this report was based on police stop data collected in 2009, it is referred to as the 2009 report. Almost immediately, attention focused on the disparity index, arguably the most controversial statistic in the report. For each racial or ethnic group, the disparity index is calculated by dividing the proportion of individuals stopped (reported by the policing agency) by the proportion of individuals available to be stopped (based on the benchmark estimate of the driving population). If these proportions are equal, then the result will be 1, indicating no disparity for that racial or ethnic group. If the disparity index is greater than 1, then the attorney general concludes that the racial or ethnic group is overrepresented in stops. Alternatively, if the disparity index is less than 1, then the attorney general concludes that the racial or ethnic group is underrepresented in stops.
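The arithmetic behind the disparity index can be sketched in a few lines of Python. This is an illustrative sketch only: the function name disparity_index and the counts in the example are hypothetical and are not taken from the Missouri report.

```python
def disparity_index(group_stops, total_stops, group_population, total_population):
    """Share of stops involving a group divided by that group's share of
    the benchmark (estimated driving) population."""
    if total_stops <= 0 or total_population <= 0 or group_population <= 0:
        raise ValueError("stop and population counts must be positive")
    stop_share = group_stops / total_stops
    benchmark_share = group_population / total_population
    return stop_share / benchmark_share


# Hypothetical counts, for illustration only (not from any report).
index = disparity_index(group_stops=300, total_stops=1000,
                        group_population=2000, total_population=10000)
print(round(index, 2))  # stop share 0.30 / benchmark share 0.20
```

A value above 1 would be read as overrepresentation and a value below 1 as underrepresentation. The result depends entirely on the benchmark share in the denominator, which is why the choice of benchmark matters so much.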
The 2009 report indicates that overall “the disparity indexes for African-American drivers have increased [in] each of the last five years” (Missouri Attorney General, 2010, p. viii). With one exception (2004), this disparity index has increased each year for the past decade with respect to African American drivers. The attorney general interprets this to mean that African American drivers are increasingly more likely to be stopped by police officers throughout the state of Missouri.
To estimate the population of individuals available to be stopped in each jurisdiction, the Missouri attorney general uses the most current estimate of the residential population over the age of 16 years. In doing so, the attorney general makes no attempt to adjust these estimates to account for actual driving patterns in and around population centers. 1 In short, Missouri’s attorney general assumes that the population of individuals available to be stopped by the officers in any community is limited to the individuals who actually reside in that community. This creates a problem for some communities, particularly those that are populated overwhelmingly by one racial or ethnic group, located near or adjacent to a large urban area, and responsible for policing a significant transportation system, such as an interstate highway that introduces a large amount of transient traffic 2 into their community. Here is an example.
In an article appearing in the St. Louis American 3 on June 2, 2010, the City of Ladue (a suburb of St. Louis) was identified as one of the “10 worst places to drive black [sic] in Missouri” (St. Louis American, 2010, p. 2). The 2009 disparity index for African American drivers in the City of Ladue is 17.11, the highest in the state. Attorney General Chris Koster observes that “it was more than 1,700 percent more likely a black [sic] driver will be stopped in Ladue based upon the African-American population of Ladue” (pp. 2–3).
As one might imagine, the response to this was quick and loud, particularly from advocacy groups like the National Association for the Advancement of Colored People (NAACP). And, it should be. For a decade, the disparity index in Ladue, MO, has been consistently either the highest or within the top 10 highest in the state. Taken literally, the numbers suggest that in 2009, the 63 Black residents of Ladue, MO, were stopped nearly 18 times each because according to the attorney general, only individuals who actually reside in Ladue are subject to being stopped by the Ladue Police Department.
Ladue is a small, wealthy, suburban community populated nearly exclusively by White residents next to a large urban center that is populated principally by Black residents. The accusation that the Ladue Police Department targets Black drivers in order to deter those people from entering their quiet leafy streets is an easy one to make. But, such an accusation based on the attorney general’s disparity index alone would be wrong. Interstate highways 170 and 64/40 and state highways 67 and 340 either course through or abut the city. All four highways are within Ladue’s city limits and are actively patrolled by the Ladue Police Department. Collectively, according to the Missouri Department of Transportation, these four major transportation systems handle over 300,000 vehicles each day, a number of people representing 48 times the population of Ladue, assuming of course that each vehicle contains only one driver. It is a safe bet that the majority of these drivers are not counted in the residential population of Ladue.
It would seem that the exclusive use of a single benchmarking method for every community, regardless of its particular enforcement context, would be rather irresponsible. In the case of Ladue, it would appear that the use of the residential population to estimate the driving population would be inappropriate, given the large number of transient drivers who enter but do not reside in the community. Of more concern is the potential that this benchmarking method would fail to reveal an actual pattern of racial disparity in stops or disengagement.
The purpose of this article is to demonstrate the viability of a benchmarking strategy based on vehicle collision data. Such benchmarks have been used for many years by traffic engineers and insurance companies in order to determine the relative risk (RR) of collision involvement among various classes of drivers and types of vehicles. This strategy would be beneficial to communities like Ladue that are populated overwhelmingly by one racial or ethnic group, located near or adjacent to large urban areas, or responsible for policing a significant transportation system that introduces a large amount of transient traffic into their community. In addition, benchmarks based on vehicle collision data enable researchers to account for driving frequency and to measure the potential for disengagement.
Literature Review
Information on police stops is rather meaningless unless it can be compared against a valid measure of individuals, by race and ethnicity, who are exposed to police supervision, or more accurately, subject to law enforcement attention. Without a reliable estimate of the driving population and its racial/ethnic proportions, conclusions of racial prejudice are premature (Engel, Calnon, & Bernard, 2002), accusations of racial profiling may not be successfully proven (Statistical Assessment Service, 1999), and policy changes may result in inappropriate corrective measures (McMahon, Garner, Davis, & Kraus, 2003). According to MacDonald (2003), “Until someone devises an adequately sophisticated benchmark that takes into account population patterns on the roads, degrees of law breaking, police deployment patterns, and the nuances of police decision making, stop data are as meaningless as they are politically explosive” (p. 22).
According to Walker (2003), an effective benchmark must meet three basic criteria. First, the benchmark must be scientifically credible. It should be methodologically sound and able to withstand the rigors of peer review. Second, the benchmark should have practical utility. It should provide insight into the findings and illustrate a solution to the problem. And third, the benchmark must have political credibility. Racial profiling research never occurs in a vacuum. The benchmark must be recognized as valid by the stakeholders. To these criteria, we add a fourth. The benchmarking strategy used should take into account the measurement challenges existing within the research context.
The benchmarks used by racial profiling researchers can be organized into two broad categories. External benchmarks rely on various proxies like the residential population, field observations, or vehicle collision records to estimate the racial proportions of individuals available to be stopped by the police. The most commonly used external benchmarking strategy is based on the residential population. Internal benchmarks rely on the comparison of traffic stops conducted by similarly situated police officers. This benchmarking strategy is based on the assumption that similarly situated officers (e.g., same shift and beat or general assignment) should perform similarly.
Within each of these categories is an extensive array of variations that include using subpopulations (e.g., licensed drivers), sophisticated imputation procedures to account for migration, crime rates, citizen surveys, deployment differences, and many others. In a few cases, researchers have developed single benchmarks from multiple (population) sources (Cordner, Williams, & Velasco, 2002; Cordner, Williams, & Zuniga, 2000). To improve precision, a few researchers have attempted to differentiate between the violator and non-violator subpopulations within each racial and ethnic classification of drivers (Lamberth, 1994; Lange, Blackman, & Johnson, 2001). And three researchers have conducted racial profiling analyses in the same communities using competing benchmarks (Engel & Calnon, 2004a, 2004b; Withrow, 2004). The following is a broad overview of the most commonly used benchmarking strategies.
Residential Population
The residential population is the most frequently used benchmark in racial profiling research. Some researchers use the entire residential population to enable an analysis of police contacts with nondrivers (Withrow, 2002, 2003) and others prefer to use only the residents that are likely to have a driver’s license (Smith & Petrocelli, 2001). Two research projects rely on the use of sophisticated spatial weighting techniques to account for migration (Rojek, Rosenfeld, & Decker, 2004) or traffic patterns within the research site (Novak, 2004).
Regardless of the population estimate used or how it might be adjusted to fit the researcher’s needs, the validity of residential population-based benchmarks as a measure of the drivers at risk of being stopped depends on the following assumptions. First, one must assume that the racial/ethnic proportions within the residential population are equal to the racial/ethnic proportions within the driving population. Second, one must assume that patrol resources are equally distributed throughout the research area. Third, one must assume that all police officers are equally attentive and apply enforcement criteria consistently.
Benchmarks based on the residential population have a few practical advantages. First, residential population estimates are readily available and inexpensive. Second, in most cases, residential population estimates can be disaggregated into smaller units. Third, the residential population provides an estimate of the entire population including juveniles and other nondrivers.
Methodologically, benchmarks based on the residential population have numerous disadvantages. First, residential population-based benchmarks do not reflect the transient driving population. To overcome this, researchers have used various spatial weighting factors and mapping software to account for differences between the residential and driving populations (Miller, 2000; Novak, 2004; Rojek et al., 2004).
Second, benchmarks based on the residential population do not account for differential rates of exposure to police supervision. Herein, there are important methodological concerns. Because policing resources are not equally distributed throughout a community, residents are subjected to differential levels of routine police supervision (Withrow, 2004). Residential population-based benchmarks do not account for factors that influence driving frequency. Finally, there are also differences between racial and ethnic groups with respect to vehicle ownership and the use of public transportation (Engel & Calnon, 2004a; Langan, Greenfeld, Smith, Durose, & Levin, 2001).
Third, residential population-based benchmarks cannot identify the population of drivers at higher risk of being stopped (Walker, 2001). Although scant, there is some evidence that the propensity to exceed the speed limit may vary with respect to the race of the driver (Lamberth, 1994, 1997; Lange et al., 2001; Zingraff, Smith, & Tomaskovic-Devey, 2000).
Fourth, benchmarks based on population estimates are regularly criticized because of their inability to accurately measure all residents and the haphazard way they differentiate between racial and ethnic categories (Withrow, 2006, 2011). Undocumented immigrants, overwhelmingly Hispanic, are often overlooked in the census data. In spite of recent ambitious initiatives, an accurate measure of undocumented immigrants remains rather elusive (Navarro, 2003).
Field Observation
The first racial profiling studies were conducted by John Lamberth in 1994 and 1997. These studies use a field observation benchmark (see Lamberth, 1994, 1997 and Police Foundation, 2003). A field observation benchmark requires the systematic observation of actual drivers at randomly selected locations and times. In most cases, observers attempt to classify drivers with respect to race, ethnicity, and approximate age. More sophisticated studies attempt to determine the percentage of drivers within each racial or ethnic classification that violate the traffic (speed) law (Lamberth, 1994; Lange et al., 2001; Zingraff et al., 2000). To be sure that an adequate sample size will be collected, researchers typically select high volume intersections or stretches of roadway. Field observers at stationary sites (intersections) observe single lanes of traffic and alternate between lanes during an observation period. On interstate highways or rural locations, field observers use rolling surveys.
Field observation-based benchmarks have some methodological advantages. First, field observation-based benchmarks may account for the transient driving population. Field observation benchmarks record information from the individuals who actually use a roadway rather than from the individuals who might use a roadway because they live near it. Similarly, because field observation benchmarks do not require a proximal residential population, they are particularly useful for highly transient and rural research contexts.
Second, field observation benchmarks provide the researcher with an opportunity to identify differential offending patterns with respect to race. In addition to classifying drivers with respect to their race or ethnicity, field observers can indicate whether the driver is (at the time of the observation) violating the traffic law (Lamberth, 1994; Lange et al., 2001; Zingraff et al., 2000).
There are considerable practical and methodological disadvantages to a benchmarking strategy dependent upon field observations. First, the costs, in terms of money and time, associated with collecting a field observation-based benchmark can be substantial (Engel & Calnon, 2004b).
Second, there is a legitimate concern that the data upon which field observation benchmarks are based may be both unreliable and invalid (Alpert, Smith, & Dunham, 2004; Engel & Calnon, 2004b; Greenwald, 2001). It may be difficult for observers to differentiate between Hispanic and White or Black and Hispanic drivers. Routine interrater reliability tests, such as those conducted by Lamberth during the Kansas statewide racial profiling study (Police Foundation, 2003), can assess the reliability of the sample. The validity of a field observation measure is also questionable and may be adversely affected by darkness, speed, tinted windows, and traffic volume. To address these issues, one research team (Lange et al., 2001) installed digital cameras on the overpasses of an interstate highway and photographed drivers.
Third, adequate sample size is critical to a field observation benchmark. Often researchers will choose high volume intersections or stretches of highway to ensure they get an acceptable sample size in the least amount of time. Selecting observation sites in this way may bias the sample and adversely affect the validity of the benchmark. Furthermore, it would not be possible to merge a series of field observations collected throughout a city into a single estimate of the driving population for the entire community.
Fourth, the researcher’s conclusion of whether or not racial profiling is occurring is limited to an analysis (e.g., odds ratio) occurring within the observation site(s). The final location of the stop may not be the same location where the violation took place, or more importantly, the place where the motorist came to the attention of the officer. Additionally, for safety reasons, officers are reluctant to initiate traffic stops at or near the high volume intersections typically chosen as observation sites (Withrow, 2004).
Vehicle Collision Data
Traffic engineers have used vehicle collision records for more than 70 years to estimate the qualitative features of roadway users and the differential risk factors among drivers (Alpert et al., 2004). Using vehicle collision data to estimate the racial composition of actual roadway users was first introduced to the racial profiling research agenda by the Washington State Patrol in 2001 and later by Alpert, Smith, and Dunham in 2004. 4
Thorpe (1967) proposed a method, commonly referred to as an induced or quasi-induced exposure analysis, for estimating the relative exposure of various classes of drivers and types of vehicles to accident involvement (Haight, 1970). This process determines “the relative likelihood of driver involvement in an accident as the ratio of the number of involvements to the exposure” (Stamatiadis & Deacon, 1997, p. 37). Carr (1969) abandoned Thorpe’s initial assumption that at-fault and not-at-fault drivers in two-vehicle accidents are essentially the same population after developing a technique for reliably identifying the at-fault drivers in two-vehicle accidents. Carr’s RR statistic is computed by dividing the frequency of accident occurrence for selected types (e.g., males) of at-fault drivers by the frequency of accident occurrence for the same types of not-at-fault drivers. Later, Lyles, Stamatiadis, and Lighthizer (1991) proposed an involvement ratio (IR) statistic. The IR statistic is calculated by dividing the marginal proportion of at-fault drivers by the marginal proportion of not-at-fault drivers for various types of drivers or classes of vehicles. If the proportion of at-fault male drivers is higher than the proportion of not-at-fault male drivers involved in two-vehicle accidents, then the researcher concludes that males are more likely to be involved in accidents.
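As a rough illustration of the involvement ratio described above, the following Python sketch divides each driver type's share of at-fault drivers by its share of not-at-fault drivers. The function name involvement_ratios and the sample records are hypothetical. The sketch assumes fault has already been assigned in each two-vehicle collision, and it is not a reproduction of the procedure published by Lyles, Stamatiadis, and Lighthizer (1991).

```python
from collections import Counter


def involvement_ratios(at_fault, not_at_fault):
    """Return {driver_type: IR}, where IR is the proportion of at-fault
    drivers of that type divided by the proportion of not-at-fault
    drivers of that type."""
    fault_counts = Counter(at_fault)
    no_fault_counts = Counter(not_at_fault)
    n_fault = sum(fault_counts.values())
    n_no_fault = sum(no_fault_counts.values())
    if n_fault == 0 or n_no_fault == 0:
        raise ValueError("both driver lists must be non-empty")
    ratios = {}
    # Types that never appear among not-at-fault drivers are skipped,
    # because their ratio would require dividing by zero.
    for driver_type, no_fault_n in no_fault_counts.items():
        fault_share = fault_counts[driver_type] / n_fault
        no_fault_share = no_fault_n / n_no_fault
        ratios[driver_type] = fault_share / no_fault_share
    return ratios


# Hypothetical records from 100 two-vehicle collisions.
at_fault = ["male"] * 60 + ["female"] * 40
not_at_fault = ["male"] * 50 + ["female"] * 50
for driver_type, ratio in involvement_ratios(at_fault, not_at_fault).items():
    print(driver_type, round(ratio, 2))
```

In this made-up example, the ratio for male drivers exceeds 1 because they make up a larger share of at-fault drivers than of not-at-fault drivers.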
The application of this benchmark in racial profiling research depends on whether vehicle collision records contain the race or ethnicity of the drivers. If so, researchers can compare the proportional representation of each group in the collision records to the actual stop data provided by the police department.
A benchmarking strategy based on vehicle collision data has numerous practical and methodological advantages. First, vehicle collision data are normally easy and inexpensive to obtain. Differentiating between the at-fault and not-at-fault drivers is usually easy if officers are required to distinguish between them on the reporting forms. If not, the researcher can obtain access to the officers’ narrative descriptions of the collisions to differentiate between the drivers. It may also be possible for the researcher to verify the race or ethnicity of the drivers if the state driver licensing databases include this information. Unfortunately, most states do not collect such data.
Second, benchmarks based on vehicle collision data are particularly useful for racial profiling evaluations because they account for driving frequency and differential rates of exposure to police supervision among various classes of drivers. Individuals who drive more are more likely to be involved in a collision. Likewise, they are also more likely to be observed by a police officer and stopped for a traffic violation.
Third, vehicle collision data account for drivers who are not included in the residential population estimates but who actually drive in a community. This advantage is particularly important in urban areas that are surrounded by suburbs and in communities with large transportation systems.
Fourth, vehicle collision data-based benchmarks are more flexible than the other types of benchmarks. If the collision data are sufficiently detailed, then the researcher can develop a benchmark for an entire city or just a portion of it. Because rural and interstate highways often do not have an associated resident population, these types of benchmarks are particularly useful in these locations.
There are a few practical and methodological limitations to the use of traffic collision data for creating a racial profiling benchmark. First, there is a lack of empirical studies that can verify the accuracy of vehicle collision records as an approximate measure of the driving population (DeYoung, Peck, & Helander, 1997; Lyles, Stamatiadis, & Lighthizer, 1991; Stamatiadis & Deacon, 1997). Additional research utilizing detailed driver trip logs and/or field observations is necessary to further evaluate the validity of this benchmark.
Second, determining who is at fault in a collision may not be easy in all cases and seldom is one driver totally at fault. The legal distinction of fault, used by a lawyer, may be different than the theoretical definition of fault used by a social science researcher (Withrow, 2004).
Internal Benchmarks
Internal benchmarks are based on the assumption that similarly situated police officers will perform similarly. More specifically, similarly situated officers, for example, those who work the same beat, shift, and general enforcement assignment, should report the same levels of productivity because they all are exposed to the same enforcement context. Officers who fail to perform at levels similar to their similarly situated peers are considered anomalies and warrant additional administrative attention (Walker, 2003).
Internal benchmarks are commonly used by police administrators to compare various measures of officer productivity and behavior. The number of stops, arrests, citations issued, and even miles driven per citation issued are common measures of officer productivity. Internal benchmarks are sometimes used to monitor officer behavior for the purpose of identifying potentially errant conduct. Police officers who are the subject of minor but relatively frequent citizen complaints often become the subject of more serious internal affairs inquiries. In this regard, internal benchmarking systems are used as early warning systems that allow police administrators to intervene prior to serious breaches of conduct.
The use of internal benchmarks for racial profiling research was first proposed by Walker (2003). Within this research context, internal benchmarks compare the races and ethnicities of the individuals stopped by each officer with the races and ethnicities of the individuals stopped by similarly situated officers. When an officer reports stopping a comparatively higher percentage of individuals from a particular racial or ethnic group than similarly situated peers, that officer may warrant additional administrative attention.
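One way such a peer comparison might be sketched in Python follows. The function name flag_officers, the officer identifiers, the group labels, and especially the 10-percentage-point margin are assumptions made for illustration. The literature discussed here does not prescribe a particular threshold, and a flag would indicate only that further review may be warranted.

```python
def flag_officers(stops_by_officer, group, margin=0.10):
    """stops_by_officer maps officer id -> {group label: stop count} for a
    set of similarly situated officers. An officer is flagged when the share
    of his or her stops involving `group` exceeds the pooled share among all
    other officers by more than `margin` (an arbitrary, assumed threshold)."""
    flagged = {}
    for officer, counts in stops_by_officer.items():
        peers = [c for o, c in stops_by_officer.items() if o != officer]
        own_total = sum(counts.values())
        peer_total = sum(sum(c.values()) for c in peers)
        if own_total == 0 or peer_total == 0:
            continue  # no basis for comparison
        own_share = counts.get(group, 0) / own_total
        peer_share = sum(c.get(group, 0) for c in peers) / peer_total
        if own_share - peer_share > margin:
            flagged[officer] = (own_share, peer_share)
    return flagged


# Hypothetical stop counts for three officers on the same beat and shift.
stops = {
    "officer_1": {"group_x": 20, "group_y": 80},
    "officer_2": {"group_x": 22, "group_y": 78},
    "officer_3": {"group_x": 45, "group_y": 55},
}
print(flag_officers(stops, "group_x"))
```

Because each officer is compared against the pooled stops of the others, the sketch shares the limitation noted for internal benchmarks generally: if most officers behave the same way, no one stands out.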
Internal benchmarks have numerous advantages. First, unlike external benchmarks, internal benchmarks do not introduce measurement error into a racial profiling study. The measurement error, inherent in all external benchmarks, often causes stakeholders and consumers to question the validity of a research finding (Withrow & Dailey, 2014).
Second, internal benchmarks measure what actually happens within an enforcement context. High profile events occurring within an enforcement context that change how police officers are deployed can be accounted for in internal benchmarks (Withrow, Dailey, & Jackson, 2009).
Third, internal benchmarks are highly effective at identifying individual officers who are exhibiting behaviors consistent with racial profiling. Data collected at the aggregate level cannot be used to explain individual officer behavior, lest the researcher be guilty of an ecological fallacy. Because the unit of analysis of a racial profiling study using an internal benchmark is the individual officer, it is possible to identify individual officers who might appear to be targeting drivers of a particular race or ethnicity.
Finally, it is more difficult for officers to game the system when an internal benchmark is used. Two forms of officer misbehavior are possible. Balancing occurs when officers amend their stop reporting forms, so that the racial and/or ethnic proportions of individuals stopped closely approximate that of the benchmark. Ghosting occurs when the police report that minority drivers are White so as to reduce the proportion of minority drivers actually stopped (Lamberth, 1994). Such behaviors are only possible if the benchmark is known prior to the collection of stop data.
Internal benchmarks are not a panacea. There are important disadvantages. First, it is often difficult, and in some cases impossible, to operationalize “similarly situated.” In urban settings, police officers are initially assigned beats; however, they often find it necessary to work across beat lines. In addition, officers may be temporarily deployed to perform different enforcement tasks within the context of a single shift that are dissimilar to the tasks performed by other officers working within the same beat. In rural settings, it is often impossible to find enough similarly situated officers upon which to develop a benchmark.
Second, an internal benchmark would not be useful in situations wherein racist behavior is either systemic or rampant. Fortunately, such instances are rare. However, if all (or even most) of the police officers in a department are inappropriately targeting drivers of a particular race and/or ethnicity, then it would be exceedingly difficult to identify a singular errant officer. Table 1 summarizes the advantages and disadvantages of the most commonly used benchmarks discussed in this section.
Table 1. Summary of Advantages and Disadvantages of Commonly Used Benchmarks in Racial Profiling Research.
The Research Challenge
The City of San Marcos, Texas (population 44,894), presents an interesting challenge for a racial profiling study. This challenge lies in the development of a valid and reliable benchmark to estimate the racial and ethnic proportions within the city’s driving population. For several reasons, a benchmark based on the residential population is not appropriate. First, from 2000 to 2010, the population of San Marcos grew 29.3%, and the community is still growing. San Marcos is the county seat of Hays County, one of the fastest growing counties in the nation (El Nasser, 2010). Interestingly, it appears that the racial and ethnic demographics within the community changed only slightly from 2000 to 2010. During this time frame, the proportion of the population reporting itself as African American or Black grew from 5.1% to 5.4% and the proportion of the population reporting itself as Hispanic grew from 37.4% to 37.8% (U.S. Bureau of the Census, 2011). The apparent stability of the racial and ethnic proportions within the residential population might suggest that a benchmark based on the residential population would be reliable. The issue here is that during the years between 2000 and 2010, the Census Bureau’s annual estimates of the population in San Marcos were highly unreliable. Beginning in 2004, the annual population estimates put San Marcos on track to exceed 50,000 residents by 2010 and actually estimated a population in excess of 50,000 residents as far back as 2008. This suggests an inability to accurately estimate the residential population of this growing community during the years between the decennial censuses.
Second, San Marcos is home to Texas State University, the fourth largest university in the state. From the 1999 fall semester to the 2011 fall semester, the student population at Texas State grew from 21,765 to 33,572 students, about 54%. In addition, the racial and ethnic demographics within this student population changed dramatically. The percentage of African American students grew from 4.7% to 6.5% and the percentage of Hispanic students grew from 18.1% to 24.4% (Texas State University, Office of Institutional Research, 2011).
Third, Interstate Highway 35 bisects the community from north to south and three intermediate state highways, ranch roads, or farm-to-market roads enter San Marcos from the east and west. Of these transportation systems, Interstate Highway 35 has the most effect on the driving population in San Marcos. Interstate Highway 35 stretches from Laredo, TX, to Duluth, MN. Each day, 100,000 vehicles travel through San Marcos on Interstate Highway 35. Some of these vehicles are likely driven by San Marcos residents, but it is a safe assumption that the majority of the drivers on this highway are not counted among the residents of the community.
Fourth, San Marcos is home to two major outlet shopping malls, Tanger and San Marcos Premium Outlets. Together, in 2010, these shopping malls were the fifth ranked tourist attraction in Texas, followed (in order) by the Alamo, Galveston Island, the Hill Country (west of Austin), and the State Capitol (Economic Development and Tourism, Office of the Governor, 2010). The most recent estimates suggest that 10 million individuals visit these large shopping facilities each year. Here again, some of these visitors are likely residents of San Marcos, but it is more likely that most are not.
Finally, San Marcos is located roughly halfway along the 90-mile stretch of Interstate Highway 35 between Austin and San Antonio. Austin, with a population of 790,390, is the 14th largest city in the nation and is bordered on the north by Williamson County, which, like Hays County, wherein San Marcos is located, is one of the top five fastest growing counties in the nation (El Nasser, 2010). San Antonio, with a population of 1,327,407, is the seventh largest city in the nation.
All of these factors produce important challenges to the validity and reliability of benchmarks based on the residential population as a means of estimating San Marcos’ driving population. First, the residential population in and around San Marcos is unpredictable. Residential population growth in the area has an almost immediate effect on traffic patterns but cannot be accurately estimated in a timely manner, that is, between decennial censuses. Second, the driving population in San Marcos is highly transient. Traffic associated with Interstate 35, the outlet malls, and Texas State University consists mainly of individuals who do not actually reside in the community. Third, population demographics appear to be changing. The growth in racial and ethnic diversity among the student population at Texas State University is not measured by the residential census of San Marcos.
The Stop Data
We were provided a data set representing individuals stopped and cited by the San Marcos Police Department from January 1, 2003 to December 31, 2009, a 7-year period. This data set contains the date, race/ethnicity of the driver (White, Hispanic, Black, Native American, or Asian), gender, date of birth, and age of the drivers involved in 33,179 stops that resulted in the issuance of a citation.
These data were collected in compliance with a state law requiring police departments to collect data on drivers that are stopped and issued a citation. Limiting stop data to those drivers who are issued a citation is a potential concern, which we discuss later.
Drivers were classified into racial and ethnic categories by the officers who issued the citations. The racial and ethnic categories (White, Black, Hispanic, Native American, or Asian) are mutually exclusive. Unlike the traditional method of measuring race and ethnicity (i.e., U.S. Bureau of the Census), Hispanic drivers are not also classified as members of the White, Black, Asian, or Native American race. It is assumed that the police officers based this classification on visual evidence rather than asking the drivers their racial or ethnic group preference. The age and gender of each driver were collected from the driver’s license provided to the officer during the stop (see Table 2).
Summary of Stops Resulting in Citations Issued Each Year by Race/Ethnicity, Age, and Gender of the Drivers, 2003–2009.
Note. n = 33,179.
We use 7 years of traffic stop data in order to capture the variation in the number of stops resulting in a citation occurring at about the same time as the student population began to grow and change at Texas State University. As previously mentioned, growth during this period was substantial in terms of both the number of students and the racial and ethnic composition of the student population. Subsequently, this growth had an effect on the amount of traffic throughout San Marcos. From 2003 to 2007, the number of stops resulting in a citation remained relatively constant. Then, in 2008 and 2009, the number of stops resulting in a citation increased dramatically. Initially, we predicted an increase in the percentage of Hispanic drivers stopped and issued citations because of the dramatic increase in the percentage of Hispanic students enrolled at Texas State University. These data do not support this prediction. Overall, we find the percentages of drivers by race or ethnicity, age, and gender to be relatively consistent throughout the 7-year period (see Table 2).
The Vehicle Collision Data
We were provided with a data set containing information on single, two-vehicle, and multiple-vehicle collisions investigated by the San Marcos Police Department from January 1, 2003, to December 31, 2009, a 7-year period. The data set contains the date of the accident, race/ethnicity (White, Hispanic, Black, Asian, and Other), gender, date of birth, age, and fault classification (at fault and not at fault) for drivers involved in 17,438 traffic collisions.
Although this may appear to be the totality of vehicle collisions occurring within the city, there are a few exceptions or omissions. First, vehicle collisions with apparent dollar damage below US$1,000 and not resulting in injury or death are neither required to be reported to the state nor investigated by a police officer. Given the expense of car repairs, this may not exclude many collisions. Second, not all reportable collisions must be investigated by a police officer. Texas law allows the parties involved in a collision meeting the reporting criteria the option of completing a Driver’s Crash Report. These collisions, many of which occur on private property, will not be investigated by a San Marcos police officer and are not a part of the data set. Third, collisions occurring wholly within the Texas State University campus may not be investigated by the San Marcos Police Department. Instead, these are likely investigated by the Texas State University Police Department. We anticipate this to be a very small number because driving on campus is largely prohibited. The vast majority of vehicle collisions involving Texas State University students, faculty, and staff occur off campus and thereby are within the jurisdiction of the San Marcos Police Department. Fourth, it is possible that a small number of collisions occurring on Interstate Highway 35 will be investigated by the Texas Highway Patrol. These would be collisions involving fatalities or multiple vehicles that result in the closure of the interstate that bisects San Marcos. Because the San Marcos Police Department is the primary agency with jurisdiction, it is likely this represents a very small number of collisions.
Drivers were classified into racial and ethnic categories by the officers who investigated the collision. The categories (White, Black, Hispanic, Asian, or Other) are mutually exclusive. Unlike the traditional manner of measuring race and ethnicity (i.e., U.S. Bureau of the Census), Hispanic drivers are not also classified as White, Black, Asian, or Other race. It is assumed that the police officers based this classification on visual evidence rather than asking the drivers their racial or ethnic group preference. Officers classified the drivers by gender and age based on the information provided to them by the drivers (i.e., from their driver’s licenses) at the scene of the collision. Drivers were classified into fault categories (at fault or not at fault) by the officers who investigated the collision. This classification is based in practice on the information available to the officer at the scene. This information comes from driver, witness and passenger statements, the available physical evidence, the juxtaposition of the vehicles following the collision, and other factors. As a matter of practice, at-fault drivers are designated as Driver #1 and not-at-fault drivers are designated as Driver #2 on the State of Texas Peace Officer’s Crash Report by the investigating officer.
In a strict sense, this classification should not be associated with legal fault for two reasons. First, drivers classified as at fault by police officers may subsequently be determined to be blameless as a result of a legal ruling. Alternatively, drivers classified as not at fault by police officers may subsequently be determined to be blameworthy as a result of a legal ruling. Second, fault may not be absolute and wholly reside in one party involved in the collision. The current classification system does not allow for drivers who are “partially at fault” or “partially contributing” to a motor vehicle collision.
Initially, we were concerned that the inclusion of single vehicle collision reports might inflate the proportion of at-fault drivers or that the inclusion of multiple vehicle collisions might inflate the proportion of not-at-fault drivers. We learned this is not necessarily the case. The State of Texas Peace Officer’s Crash Report reporting standards do not require the assignment of fault or no fault. Under certain conditions, a driver involved in a single vehicle collision may be reported as not at fault. Examples of this include damage to a legally parked unattended vehicle, damage caused by a legally executed evasive maneuver, or damage caused by a flying object or an “act of God.” A single vehicle collision that is the result of a vehicle maintenance issue (e.g., blown tire) would result in the driver being reported as at fault because drivers are ultimately responsible for maintaining their vehicles in a safe condition. In multiple car collisions, a police officer may indicate multiple drivers at fault if the evidence at the scene supports the designation.
We chose to use 7 years of vehicle collision data in order to correspond to our stop data. Our initial review of these data reveals consistency throughout the period with respect to the race, ethnicity, age, and gender of the drivers involved in vehicle collisions (see Table 3).
Summary of All Vehicle Collisions Each Year by Race/Ethnicity, Age, and Gender of All Drivers Involved, 2003–2009.
Note. n = 17,438.
Operationalizing a Multiple-Level Benchmark
The analysis of racial profiling data is relatively simple. Most analysts present a table comparing the percentages of individuals that are available to be stopped (the benchmark) with the percentages of individuals that are actually stopped (the stop data) for each racial and ethnic group. Other analysts go one step further and calculate an odds ratio by dividing the percentages of individuals actually stopped by the percentages of individuals available to be stopped for each racial and ethnic group. Odds ratios less than 1 suggest an underrepresentation and odds ratios greater than 1 suggest an overrepresentation of individuals from a particular racial or ethnic group in police stops or other law enforcement outcomes. In some cases, the analyst will develop a χ2 model. This statistical technique compares the frequencies expected (i.e., the percentages of individuals by race and ethnicity within the benchmark) with the frequencies observed (i.e., the percentages of individuals by race and ethnicity within the stop data). This technique produces a statistic that can be tested for significance to determine whether any differences are likely due to chance. A statistically significant χ2 statistic suggests that any over- and/or underrepresentation are not likely due to chance. To determine where the differences are, the analyst must evaluate a contingency table.
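The two calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by any particular analyst: the function names and the counts are hypothetical and are not drawn from the San Marcos data. The ratio follows the definition given in the text (percentage stopped divided by percentage available to be stopped), and the χ2 statistic is the usual goodness-of-fit sum over groups, compared here against the tabled critical value for 3 degrees of freedom at the .05 level.

```python
def representation_ratios(stop_counts, benchmark_counts):
    """Share of stops divided by share of the benchmark, per group."""
    total_stops = sum(stop_counts.values())
    total_benchmark = sum(benchmark_counts.values())
    return {
        group: (stop_counts[group] / total_stops)
        / (benchmark_counts[group] / total_benchmark)
        for group in stop_counts
    }


def chi_square_statistic(stop_counts, benchmark_counts):
    """Goodness-of-fit statistic: observed stops vs. stops expected from the benchmark."""
    total_stops = sum(stop_counts.values())
    total_benchmark = sum(benchmark_counts.values())
    statistic = 0.0
    for group, observed in stop_counts.items():
        expected = total_stops * benchmark_counts[group] / total_benchmark
        statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts, for illustration only.
stops = {"White": 500, "Hispanic": 350, "Black": 120, "Asian": 30}
benchmark = {"White": 520, "Hispanic": 360, "Black": 90, "Asian": 30}

ratios = representation_ratios(stops, benchmark)
statistic = chi_square_statistic(stops, benchmark)

# Critical value of the chi-square distribution, df = 3, alpha = .05.
CRITICAL_VALUE_DF3 = 7.815
significant = statistic > CRITICAL_VALUE_DF3
```

With these invented counts the statistic exceeds the critical value, but the statistic alone does not say which group drives the result; that still requires inspecting the per-group ratios or the contingency table.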
For two reasons, these sorts of analyses limit our understanding of how racial and ethnic groups are affected by police officer decision making. First, how much overrepresentation is enough to justify an allegation of actual disparity? Does any odds ratio exceeding 1 indicate overrepresentation? Lamberth proposes that any odds ratio between 1.5 and 2.0 is sufficient evidence of overrepresentation and an odds ratio exceeding 2.0 is evidence of active targeting by race or ethnicity (Police Foundation, 2003). Unfortunately, there are no scientific or methodological bases for these analytical thresholds. Furthermore, Lamberth asserts that racial profiling is implied when minorities are stopped at disproportionately higher rates than they are represented in the benchmark (Lamberth, 1994; Police Foundation, 2003). So, by this definition, any odds ratio exceeding 1 should result in a finding of racial profiling. Given the measurement error that is inherent in most benchmarks, it is unlikely this would be considered a reasonable basis for a finding of racial profiling.
Second, the percentages of individuals representing the racial and ethnic categories within the benchmark and within the population of individuals stopped must sum to 100%. This means that if one racial or ethnic group is overrepresented, then at least one other must be commensurately underrepresented. Historically, racial profiling research has focused on overrepresentation at the expense of underrepresentation. The underrepresentation of a single racial and ethnic group in police stops, however, may even contribute to the statistical significance of the χ2 statistic. As a result, in some studies, it is difficult to definitively know whether the problem is overrepresentation, which may be caused by profiling, or underrepresentation, which may be caused by disengagement. We argue that both profiling and disengagement are equally threatening to the effectiveness and credibility of the policing function.
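The sum-to-100% constraint can be written out explicitly. In the sketch below, the symbols are our own notation rather than the study's: s_i is the share of stops for group i, b_i is the share of the benchmark for group i (assumed to be greater than zero), and r_i is the resulting ratio.

```latex
\[
  r_i = \frac{s_i}{b_i}, \qquad \sum_i s_i = \sum_i b_i = 1
  \quad\Longrightarrow\quad
  \sum_i b_i \, r_i = \sum_i s_i = 1 .
\]
```

The benchmark-weighted average of the ratios is therefore always exactly 1, so a ratio above 1 for one group forces a ratio below 1 for at least one other group.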
These issues can be addressed by capitalizing on the qualitative differences between the drivers within the vehicle collision data set. This data set is the broadest measure of actual drivers available. Dividing the percentages of drivers that are stopped and cited by the percentages of all drivers who are involved in a vehicle collision produces an odds ratio for each racial and ethnic group. We resist the urge to use these odds ratios as the basis of our decision on whether the police are either targeting or purposively ignoring individuals based on their race or ethnicity. Instead, we refer to these odds ratios as reference thresholds and intend to use them only as a preliminary indication of either over- or underrepresentation.
Next, we focus on the not-at-fault portion of the vehicle collision data set. We propose these drivers, based on the driving behaviors that selected them into this portion of the data set, are the least likely to violate the traffic law. Using the same stop data, we will calculate odds ratios using the not-at-fault drivers as our benchmark. We refer to these odds ratios as the overrepresentation thresholds. The overrepresentation thresholds, in conjunction with the reference thresholds, will be used to determine if any racial or ethnic group is overrepresented in stops. For any racial or ethnic group with a reference threshold greater than 1 and an overrepresentation threshold greater than its reference threshold, we will allege that this group is overrepresented in stops.
Finally, we turn our attention to the at-fault portion within the vehicle collision data set. We propose these drivers, based on the driving behaviors that selected them into this portion of the data set, are the most likely drivers to violate the traffic law. Using the same stop data, we will calculate odds ratios using the at-fault drivers as our benchmark. We refer to these odds ratios as the underrepresentation thresholds. The underrepresentation threshold, in conjunction with the reference thresholds, will be used to determine if any racial or ethnic group is underrepresented in stops. For any racial or ethnic group with a reference threshold less than 1 and an underrepresentation threshold that is less than its reference threshold, we will allege that this group is underrepresented in stops.
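The decision rules described in the preceding three paragraphs can be summarized as a short Python sketch. The function names and the example shares are hypothetical; only the two rules themselves (reference threshold above 1 with a higher overrepresentation threshold, or reference threshold below 1 with a lower underrepresentation threshold) come from the text.

```python
def thresholds(stop_share, all_share, not_at_fault_share, at_fault_share):
    """Return (reference, overrepresentation, underrepresentation) thresholds.

    Each threshold is the group's share of stops divided by its share of
    the corresponding benchmark (all, not-at-fault, or at-fault drivers).
    """
    return (
        stop_share / all_share,
        stop_share / not_at_fault_share,
        stop_share / at_fault_share,
    )


def classify(reference, over_threshold, under_threshold):
    """Apply the two-threshold rules to one group in one period."""
    if reference > 1 and over_threshold > reference:
        return "overrepresented"
    if reference < 1 and under_threshold < reference:
        return "underrepresented"
    return "no finding"


# Hypothetical shares, for illustration only.
group_a = classify(*thresholds(0.14, 0.10, 0.08, 0.12))    # both "over" conditions met
group_b = classify(*thresholds(0.02, 0.03, 0.025, 0.035))  # both "under" conditions met
group_c = classify(*thresholds(0.45, 0.44, 0.47, 0.41))    # reference > 1, second test fails
```

A group whose reference threshold is exactly 1, or that clears only one of its two thresholds, falls into the "no finding" category under this sketch.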
Findings
We begin by calculating the reference thresholds for each racial and ethnic category and for each year using the entire vehicle collision data set as our benchmark. To simplify the analysis, we deleted the Native American category in the stop data set and the Other category in the vehicle collisions data set. In addition to being incongruent with respect to racial category, the percentages of stops and collisions for these groups are comparatively minuscule.
As mentioned previously, we use the reference thresholds as preliminary evidence of over- and underrepresentation. We find preliminary evidence that Hispanic drivers may be overrepresented in stops in 4 of the 7 years (2003, 2004, 2005, and 2008) and may be underrepresented in stops in 3 of the 7 years (2006, 2007, and 2009) and overall; Black drivers may be overrepresented in stops in each and every year and overall; White drivers may be overrepresented in stops in 4 of the 7 years (2006, 2007, 2008, and 2009) and overall and may be underrepresented in stops in 3 of the 7 years (2003, 2004, and 2005); and Asian drivers may be overrepresented in 1 of the 7 years (2004) and may be underrepresented in 4 of the 7 years (2003, 2005, 2006, and 2007) and overall. Asian drivers are at parity in 2 of the 7 years (2008 and 2009; see Table 4).
Odds Ratios (Reference Thresholds) Calculated Using the Entire Vehicle Collision Data Set as the Benchmark by Racial/Ethnic Group and Year, 2003–2009.
Next, we use the at-fault drivers within the vehicle collision data set to calculate underrepresentation thresholds for each racial and ethnic group. Recall we propose that the at-fault drivers are the most likely drivers within the data set to violate the traffic law. We interpret the underrepresentation thresholds in conjunction with their reference thresholds to determine if any group is underrepresented in stops. For any racial or ethnic group with a reference threshold that is less than 1 and an underrepresentation threshold that is lower than its reference threshold, we will allege that this group is underrepresented in stops. Using these analytical criteria, we find evidence of underrepresentation among Hispanic drivers overall, White drivers in 2 of the 7 years (2003 and 2005), and Asian drivers in 4 of the 7 years (2003, 2005, 2006, and 2007) and overall (see Table 5).
Odds Ratios (Underrepresentation Thresholds) Calculated Using the At-Fault Drivers in the Vehicle Collision Data Set by Racial/Ethnic Group and Year, 2003–2009.
Note. Boldface values indicate cases where the reference threshold is less than 1 and the underrepresentation threshold is less than the reference threshold, see Table 4.
Finally, we use the not-at-fault drivers within the vehicle collision data set to calculate overrepresentation thresholds for each racial and ethnic group. Recall we propose that the not-at-fault drivers are the least likely drivers within the data set to violate the traffic law. We interpret the overrepresentation thresholds in conjunction with their reference thresholds to determine if any group is overrepresented in stops. For any racial or ethnic group with a reference threshold greater than 1 and an overrepresentation threshold that is greater than its reference threshold, we will allege that this group is overrepresented in stops. Using these analytical criteria, we find evidence of overrepresentation among Hispanic drivers in 3 of the 7 years (2003, 2004, and 2008), Black drivers in 4 of the 7 years (2005, 2006, 2007, and 2008), White drivers in 2 of the 7 years (2006 and 2007) and overall, and Asian drivers in 1 of the 7 years (2009; see Table 6).
Odds Ratios (Overrepresentation Thresholds) Calculated Using the Not-At-Fault Drivers in the Vehicle Collision Data Set by Racial/Ethnic Group and Year, 2003–2009.
Note. Boldface values indicate cases where the reference threshold is greater than 1 and the overrepresentation threshold is greater than the reference threshold.
In summary, our analyses reveal that during the 2003 through 2009 period, Black drivers are the most frequently overrepresented group, followed closely by Hispanic drivers. Asian drivers are the most frequently underrepresented group in stops. We note these findings are generally consistent with more than a decade and a half of active research in the racial profiling research agenda (Withrow & Dailey, 2014; see Table 7).
Summary of Years During Which Racial or Ethnic Groups Are Over- and Underrepresented in Police Stops, 2003–2009.
These findings raise the question: What are the demographic differences between the at-fault and not-at-fault drivers in the vehicle collision data set? Using the entire vehicle collision data set, we find that Hispanic, Black, and Asian drivers appear to be more likely to be the at-fault drivers in vehicle collisions. Younger drivers are more likely to be at fault. Overall, male drivers are more likely to be involved in a vehicle collision and more likely to be the at-fault driver (see Table 8).
Vehicle Collision Data by Fault Category and Race, Age, and Gender (2003–2009).
To evaluate further the independent effects of race, ethnicity, age, and gender on vehicle collisions, we conducted a logistic regression. Fault category (0 = not at fault and 1 = at fault) is the dependent variable. We created a binary nominal variable for race/ethnicity by collapsing Hispanic, Black, and Asian drivers into a single attribute (coded as 1) and White drivers into the other attribute (coded as 0). Gender was coded as 0 for female drivers and 1 for male drivers. Age remained a scale variable. Our analysis reveals that the coefficient for the gender variable is not statistically significant. However, the coefficients for the race/ethnicity and age variables are statistically significant (p ≤ .05). The coefficient for the race/ethnicity variable (.418, p < .001) appears to have the most influence on the fault category and suggests that minority drivers are more likely to be at fault. The coefficient for the age variable (−.007, p < .001) appears to have the least influence on the fault category yet suggests that younger drivers are more likely to be at fault.
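Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. The short Python sketch below applies that standard conversion to the two significant coefficients reported above; the variable names are ours, and the 10-year comparison is an illustrative choice rather than a quantity reported in the study.

```python
import math

# Coefficients as reported in the text (log-odds of being the at-fault driver).
COEFFICIENTS = {
    "race_ethnicity": 0.418,  # 1 = Hispanic, Black, or Asian; 0 = White
    "age": -0.007,            # per additional year of age
}

# Odds ratio for a one-unit change in each predictor.
odds_ratios = {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}

# The age coefficient is per year, so a larger age gap compounds:
# a driver 10 years older has exp(10 * beta_age) times the odds.
ten_year_age_odds_ratio = math.exp(10 * COEFFICIENTS["age"])
```

Under this reading, the odds of being the at-fault driver are roughly 1.52 times higher for drivers coded as minority than for White drivers, and each additional year of age multiplies the odds by about 0.993 (about 0.93 across a 10-year age difference). Because one predictor is binary and the other is measured in years, the raw coefficients are not on a common scale.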
Discussion
Vehicle collision data sets that include single, two-vehicle, and multi-vehicle collisions are the most valid and reliable measures of actual roadway users available in a community. The validity of benchmarks based on these data is enhanced because vehicle collision data sets to some degree account for driving frequency. The inability of a benchmark to consider driving frequency has long been a problem for racial profiling researchers. The most widely used benchmarks are based on the residential population. In a benchmark based on the residential population, a White retiree who only occasionally drives counts the same as a White delivery driver who drives extensively throughout a community every day. In reality, the delivery driver is much more likely to be involved in a vehicle collision. Similarly, the delivery driver is more likely to be observed violating the traffic law by a police officer (and therefore stopped) even if his propensity to violate the traffic law is equal to that of the retiree. In short, we assert that a benchmark based on vehicle collision data gains validity from the consistently positive correlations between driving frequency, collision involvement, and exposure to routine police supervision.
The reliability of vehicle collision data sets is evidenced by the consistency in the number of vehicle collisions from 1 year to the next. In our vehicle collision data set, the average (mean) number of vehicle collisions per year is 2,491. The number of vehicle collisions ranged from a low of 2,356 (2009) to a high of 2,592 (2004) for a total range of only 236 collisions (see again Table 3). Interestingly, this number remained relatively constant even though the student population at Texas State University and the overall residential population in the larger community were dramatically increasing. Further demonstrating the consistency of this measure, the percentages of individuals by race and ethnicity represented in the vehicle collision data set and its subsets (at fault and not at fault) remained relatively constant from 2003 to 2009 (see again Tables 4 through 6).
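The consistency figures above follow directly from the totals reported in the text, as the following Python sketch shows. The variable names are ours; the relative spread (range divided by mean) is an added illustration of scale and is not a statistic reported in the study.

```python
TOTAL_COLLISIONS = 17_438  # n for the vehicle collision data set, 2003-2009
YEARS = 7
LOWEST_YEAR = 2_356        # 2009
HIGHEST_YEAR = 2_592       # 2004

mean_per_year = TOTAL_COLLISIONS / YEARS          # about 2,491 collisions per year
collision_range = HIGHEST_YEAR - LOWEST_YEAR      # 236 collisions
relative_spread = collision_range / mean_per_year # range as a fraction of the mean
```

The full range between the lowest and highest years is therefore a little under one tenth of the annual mean.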
Reliability is an important quality for a benchmark in racial profiling research. Relative consistency in benchmarks from 1 year to the next enables a researcher to determine how, or if, ad hoc changes in policing policy or practice affect different racial or ethnic groups. For example, the implementation of a long-term directed patrol strategy designed to interdict juvenile gang activity might result in notable changes in the percentage of stops involving individuals from particular racial or ethnic groups. This is particularly likely when membership in juvenile gangs is highly segregated by race or ethnicity.
A reliable benchmark was particularly important in this study. While the number of vehicle collisions remained constant from 2003 to 2009, the number of stops resulting in citations increased dramatically in 2008 and 2009. It does not appear that any particular racial or ethnic group was more or less at risk of being stopped in these years. However, this should cause one to question whether the increase in stops resulting in citations caused a reduction in vehicle collisions. It appears not.
It is important to note here that we reject the notion that any benchmark should be based on a random sample of potential drivers. A random sample requires random selection, meaning that all drivers would have an equal and non-zero chance of being selected into the sample. In this case, all drivers would have an equal and non-zero chance of being involved in a vehicle collision. This assumes that factors known to increase or decrease the probability of being involved in a collision do not exist. We disagree. Collision involvement is affected by numerous factors like time of day, location, the presence of other drivers, the number of miles driven, age, gender, experience, and many other factors, all of which diminish randomness and instill a systematic bias into the “sample selection.” In addition, benchmarks are compared against police stop data. Here again, the process by which an individual is selected for a traffic stop is not at all random.
We are reluctant to use the reference threshold as a sole indicator of racial disparity because our benchmark does not consider the effects of patrol officer allocation. In general practice, patrol officers are allocated on the basis of the demand for their services. Normally, demand is determined by crime rates or citizen calls for service. This results in a large proportion of patrol resources being allocated to high crime neighborhoods. In addition to other similarities, like poverty and density, these neighborhoods tend to be populated principally by racial and ethnic group minorities. This results in racial and ethnic minorities being subjected to higher levels of routine police supervision and consequently being more likely to be stopped. Instead, we use the reference odds ratio as a de facto baseline that establishes the basis for our subsequent analyses using the not-at-fault and at-fault portions of the vehicle collision data set. We consider this prudent given the serious nature of a racial profiling allegation.
The not-at-fault drivers are, comparatively, the safest drivers in the vehicle collision data set. Alternatively, the at-fault drivers are, comparatively, the most dangerous drivers in the vehicle collision data set. These assertions are well founded in decades of research by traffic engineers and insurance company actuaries. Using the benchmarks based upon the not-at-fault and at-fault drivers in conjunction with the benchmark created from the entire vehicle collision data set provides some important analytical opportunities. First, the conservative nature of this analytical strategy adds credibility to the findings. For example, to be overrepresented, the drivers from a single racial or ethnic group must be overrepresented among all drivers and overrepresented among the safest drivers. Or, to be underrepresented, the drivers from a single racial or ethnic group must be underrepresented among all drivers and among the most dangerous drivers. It takes quite a bit of difference to overcome two thresholds.
Second, our benchmark is able to consider the possibility that the members of some racial and ethnic groups are justifiably stopped in higher proportions because they are more likely to violate the traffic law. Our analysis did in fact find that minority drivers are more likely to be the at-fault drivers in vehicle collisions. Even so, in order to be overrepresented in stops, the percentage of stops involving the members of a particular racial or ethnic group must exceed the percentage of that group among all drivers (a reference threshold greater than 1) and must exceed its percentage among the not-at-fault drivers by an even wider margin (an overrepresentation threshold greater than the reference threshold). We consider this a relatively difficult threshold to meet and only possible when the drivers of this racial or ethnic group are truly overrepresented in stops.
It is important to note the limitations of this research. First, our police stops only include stops that resulted in the issuance of a citation. In our view, these stops are less likely to be racially motivated because they are more likely subject to subsequent review. Because a citation has been issued, and hence a legal case initiated, the defendant has a venue and the motivation to question the legality of the stop. This more readily provides the driver an opportunity to question the potential for racial animus within the officer’s motivation. Stops that do not result in a citation are among the most likely enforcement events to be racially motivated because they are more difficult to subject to a subsequent review. In these cases, the drivers must initiate a separate legal case, which can be costly, time consuming, and difficult to win. Knowing this, a police officer so predisposed could specifically target drivers from a particular racial or ethnic group and remain undetected.
Second, it is likely that people drive with different levels of safety consciousness even within the same journey. One may drive recklessly in the familiar areas near one’s home and then more cautiously in unfamiliar areas. To the extent this is true, it causes us to question the conceptualization of the over- and underrepresentation thresholds. For example, the underrepresentation benchmark is based on the at-fault drivers, which we interpret to be the least safe drivers within the driving population. However, it is possible that these drivers are not habitually unsafe. It may very well be true that some at-fault drivers are extremely cautious most of the time and that their inclusion into the at-fault portion of the vehicle collision data set was the result of a momentary and rare occurrence of poor judgment.
Third, not all jurisdictions collect demographic data on drivers involved in vehicle collisions. Where such data are not collected, vehicle collision records by themselves would not likely be viable as the basis of a valid and reliable measure of the driving population by race and ethnicity. In these jurisdictions, it may be possible to randomly select a sample of collisions in a stratified manner. Using other data sources (e.g., driver licensing records) or even contacting the sampled drivers individually, it may be possible to construct a viable benchmark similar to the one used in this research.
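A stratified random selection of collision reports might be drawn along the following lines. This is only a sketch under stated assumptions: the record layout, the choice of collision year as the stratum, the 10% sampling fraction, and the function name are all hypothetical, and a real study would choose strata suited to the jurisdiction.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_key, fraction, seed=0):
    """Draw the same fraction of records, at random, from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    strata = defaultdict(list)
    for record in records:
        strata[record[stratum_key]].append(record)

    sample = []
    for stratum in sorted(strata):
        members = strata[stratum]
        # Take at least one record so that no stratum is left out.
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical collision reports: 100 from 2003 and 50 from 2004.
collisions = [
    {"report_id": i, "year": 2003 if i < 100 else 2004} for i in range(150)
]
sample = stratified_sample(collisions, "year", fraction=0.10)
sampled_2003 = sum(1 for record in sample if record["year"] == 2003)
sampled_2004 = sum(1 for record in sample if record["year"] == 2004)
```

The sampled report numbers could then be matched against licensing records, or the drivers contacted, to recover the demographic fields the collision reports lack.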
Fourth, the designation of fault at the scene of a collision is preliminary and may even be incomplete. A police officer’s decision on which driver is at fault may be subsequently overturned by a court. Furthermore, it is rather rare for one driver to be totally at fault in a vehicle collision. Often all drivers share some percentage of the fault.
Conclusion
The purpose of this research is to evaluate the viability of a benchmark based on a vehicle collision data set in racial profiling research. We find vehicle collision records are an effective method to estimate the driving population as well as many of its important qualitative features. This benchmarking method is particularly useful in locations, like San Marcos, that experience substantial change in their residential populations and/or include a large proportion of transient drivers.
We assert that benchmarks based on vehicle collision data are the most valid and reliable estimates of the driving population available. The validity of benchmarks is enhanced because, unlike other measures, vehicle collision data inadvertently consider driving frequency. The reliability of benchmarks based on vehicle collision data is evidenced by the consistency in the number of collisions from 1 year to the next. Furthermore, it appears that even when the overall number of collisions changes, the percentage of collisions within racial and ethnic groups remains relatively constant. This stability provides police administrators with a means to determine whether changes in police systems and practices had adverse effects on racial or ethnic groups.
Using the at-fault and not-at-fault drivers within the vehicle collision data set offers several clear advantages. First, differentiating between the drivers who are more likely (i.e., the at-fault drivers) or less likely (i.e., the not-at-fault drivers) to be stopped and cited for a traffic violation provides for a stronger analytical conclusion than what might be possible using a single point benchmark. The analytical strategy used in this research avoids the necessity to argue how much overrepresentation is enough to justify an allegation of disparity or indicate the potential for racial profiling. Second, using the analytical strategy demonstrated in this research enables us, at least in part, to account for relative differences, if any, in the driving performance between racial and ethnic groups. Finally, this analytical strategy enables us to evaluate the potential for racial profiling as well as disengagement. Both are serious threats to the effectiveness of a policing program.
We recognize that no benchmark, by itself, can provide definitive proof of racial profiling. A comparison of police stop data against any benchmark can only indicate the possibility of racial profiling. Stop data do not normally measure whether the police officer knew the race or ethnicity of the driver prior to the decision to initiate a traffic stop. Unless and until we are able to establish that the officer knew the driver’s race or ethnicity and acted with racial animus, no stop data/benchmark comparison will provide sufficient evidence of racial profiling. We propose that any stop data/benchmark comparison, including ours, only be used as the initial indicator of racial profiling. Evidence of over- or underrepresentation from the initial comparison may not be enough to sustain an allegation of racial profiling. It is, however, sufficient justification for a more thorough review of stop data, including the analyses of post-stop behaviors involving searches and arrests.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
