Abstract
Road crashes in metropolitan areas are challenging to prevent because they stem from the interactions of drivers and other system users in intricate built environments. Recent theories indicate that features of the built environment may induce unsafe driving by shaping users’ expectations and behaviors. The availability of street view imagery and methods of scene parsing create new possibilities for understanding how features of the built environment influence crash incidence. Most previous crash research using street imagery has relied on manual processing. In this paper, we apply automatically machine-parsed street view imagery, together with self-explaining roads theory, to examine how the street space visible to drivers influences crash frequency, using data from Columbus, Ohio, USA. While controlling for road network and area characteristics, we model the association of individual street elements with crash frequency. We then conduct a cluster analysis to define four types of street spaces, which are used in a subsequent model. We find that an Open Road type of metropolitan street space, characterized by more visible sky, roadway, and signage, is associated with the greatest increase in crashes, and that the majority of these spaces are on arterial or collector road segments. We theorize that the visual similarity of this type of street space to highways promotes faster, less careful driving, which, combined with the mixed land uses along these roads, makes them the least safe. This points to the importance of traffic calming on such roads in high-activity areas, and to the need to differentiate non-highway environments from highways to promote careful driving.
Introduction
Urban road crashes derive from the interactions of drivers and other road users in complex built environments. The theory of self-explaining roads suggests that drivers’ expectations and behaviors are shaped by road design features (Charlton et al., 2010). Research on the role of environmental design features has been limited by a lack of comprehensive data on the characteristics of street spaces and of scalable methods for analyzing such data. Our current knowledge is strongest on how adjacent land uses influence crash frequency by creating conflicts of use, and on how effectively specific street designs reduce conflicts in high-crash areas. Less is understood about how the complete set of street space characteristics visible to drivers influences urban road safety outcomes, including non-road and natural elements such as buildings and sky, whose visibility is directly influenced by road design. To address this gap, this paper analyzes segmented street view imagery to examine the relationship of street space elements with crash frequency. Specifically, we ask two questions. First, how do specific elements of street spaces visible to drivers influence crash frequency on road segments? Second, can these street spaces be categorized into types on the basis of these elements to better understand their influence on crash frequency?
We conduct this study using data from Columbus, Ohio, USA, where severe and fatal crashes have been a persistent problem. The number of these crashes in Columbus, excluding highways, has been steady in recent years, ranging between 353 and 370 per year from 2016 to 2019 according to a Vision Zero Action Plan (Vision Zero Columbus, 2021a). Furthermore, of the 225 traffic fatalities that occurred between 2015 and 2019, nearly one third were pedestrians (Vision Zero Columbus, 2021b). After reviewing the literature and theory of road crashes, we introduce our data sources and describe our methods of image segmentation, crash frequency modeling, and cluster analysis. We then present results from three negative binomial regression models of crash frequency. A base model shows how road network, land use, and area characteristics affect crash frequency on Columbus road segments. We augment this base model by adding independent variables for the most prevalent individual objects identified by our street view image segmentation. A third model replaces these individual object variables with street space types created through a cluster analysis. Based on these results, we argue that metropolitan non-highway roads whose street spaces resemble those of highways are more dangerous, because this resemblance encourages higher-speed driving in locations where conditions are more complex, including where pedestrians are present. We also recommend a greater role for segmented street image data in traffic safety practice.
Background
Road crashes and the built environment
The built environment and adjacent neighborhoods of roadways have previously been studied in relation to crash frequency and severity, especially for pedestrian crashes. In terms of crash frequency, transit stops, commercial land use, and intersections have been associated with more crashes. Dai and Jaworski (2016) found that the presence of transit stops on road segments and the percentage of the adjacent population that uses transit were related to increased pedestrian crashes. In a community-level analysis, Ouyang and Bejleri (2014) similarly found that more transit stops, as well as mixed land uses, were associated with increased pedestrian crashes. Zonal speed limits on roadways and the density of 4-way intersections reduced the incidence of pedestrian crashes in a Seattle study that considered non-linear effects (Ding et al., 2018). Multiple studies that included all types of vehicular crashes found commercial land uses as well as intersection density associated with increased crash incidence (Huang et al., 2018; Dumbaugh et al., 2020).
Concerning crash severity, areas with transit access and pedestrian connectivity, such as urban center districts, tended to have less severe pedestrian crash outcomes (Clifton et al., 2009). Hanson et al. (2013) used Google Street View imagery to catalogue street safety infrastructure elements at the locations of pedestrian crashes, and found that buffer areas and sidewalks were associated with decreased crash severity. Among intersection crashes, those involving pedestrians, cyclists, left turns, or certain configurations of divided roads were found to have higher severity (Abdel-Aty and Keller, 2005). Xie et al. (2019) studied how urban land conversions in China, such as to mixed residential and commercial uses, created mixed traffic flows and were associated with increases in the number of severe traffic crashes.
The influence of the built environment on crash frequency has also been explained through theory related to street design. The concept of self-explaining roads refers to an observed tendency of drivers to be guided by road design, which users perceive as indicating a particular type of road or the appropriate way to drive (Theeuwes, 2021). Interventions to establish clearer types of self-explaining roads effectively reduced speeds and reinforced residents’ perceptions of differences in the appropriate speed (Charlton et al., 2010). The theory of safe systems incorporates this concept, further arguing that because the built environment creates scripts that shape driver errors, the context of driver errors should be considered foremost (Dumbaugh et al., 2020). In an application of this theory, Saha et al. (2020) found that four-corner intersections and big-box stores contribute to increased injury or fatal crashes. In practice, the Vision Zero policy framework implements such a systems perspective on road safety, seeking solutions that accommodate human fault rather than attempting to correct it (Belin et al., 2012). Finally, the mixture of elements in the driver’s view may influence psychological state or behavior; for example, greenery has been associated with improved mental state (Jiang et al., 2021) and street trees with reduced speeds (Naderi et al., 2008).
Street imagery and urban analysis
The systematic use of photography to document street spaces and support urban inquiry predates computing and digital imagery. Jacob Riis (1890) used photographs to bring the street spaces of the poor to the attention of the wider population as part of his studies of tenement life. In the early 20th century, members of the Chicago School of sociology similarly used photography as part of research on urban neighborhoods (Lindner, 2019). In photographing street scenes throughout three US cities, Kevin Lynch (1960) considered not only what each image represented, but also what its objects collectively mean and how residents perceive them to navigate cities. And in 1970, William Whyte used time-lapse photography to better understand how people used public spaces in New York City (Whyte, 1980). Yet such early uses of imagery were limited in scope because researchers had to choose and manually photograph each location.
The capture of digital multi-perspective photographs from moving vehicles made it possible to catalog street scenes far more comprehensively (Anguelov et al., 2010; Roman et al., 2004). Street view imagery has informed various transportation and urban research, especially for classification (Biljecki and Ito, 2021). Gong et al. (2019) used street view imagery of Beijing, China, to broadly classify street spaces as dominated by one or two elements, such as buildings, trees, and sky, or as composed of a balance of those elements. Other classification efforts used street view images to estimate neighborhood tree shading (Li et al., 2018), building functions (Kang et al., 2018), and cycling route viability (Vanwolleghem et al., 2014). Ye et al. (2019) incorporated the evaluations of designers into a machine learning algorithm to extract a measure of visual quality from street view images. To create an index of urban bikeability, Ito and Biljecki (2021) automatically extracted indicators from street view images in two cities using computer vision techniques, with results that outperformed other measures. Moving beyond classification to correlational research, Lu (2019) used street view images around Hong Kong housing estates to show the positive influence of greenery on physical activity. Processed street view imagery also provided a measurement of the perceived physical environment for a model of neighborhood opioid overdose risk in Columbus, Ohio (Li et al., 2022). In relation to road safety, street view imagery has been used to augment crash analysis, either through manual data processing that identifies built environment features (Hanson et al., 2013) or to support researchers’ interpretations of results (Hu et al., 2020). It has also been used to verify cycling crash victims’ recollections of the built environment (Cicchino et al., 2020).
Computer-processed street view data, as opposed to manually processed data, have also been used to evaluate perceived crash risk at intersections (Kwon and Cho, 2020).
This paper contributes to the literature by developing and applying computer-processed street view image data and classifications to an analysis of crash frequency, within existing theoretical frameworks of crash risk. Specifically, the proportions of elements in street view images represent the views visible to drivers on road segments, giving insight into the functioning of self-explaining roads beyond the design elements customarily used in crash modeling. These elements include open sky and nearby trees or buildings, as well as smaller objects such as trash cans or fences, for which data do not always exist, yet which may influence drivers’ choices regarding speed and attention. Furthermore, by combining these elements into groups, we gain insight into how they form collective environment types. We theorize that individual elements and environment types that signal dense, complex areas where non-drivers may be immediately present reduce crash frequency by increasing driver attention and lowering speeds, while those that signal driver-oriented environments are associated with more crashes. The most dangerous areas, then, should be those that are complex and dense with non-drivers present but lack the visible elements that communicate these conditions to drivers. Identifying specific street space elements and types that influence crash frequency can thus point to ways of differentiating such street spaces to create clearer self-explaining roads within a safe systems theoretical framework.
Data
In recent years, the rapid development of map services has made massive numbers of geo-tagged images publicly available, which can help audit streetscape features throughout a city at a large urban scale. This study collected street view images from the Google Street View images API (Google, 2020). We obtained a street network vector layer from OpenStreetMap (Haklay and Weber, 2008) and extracted street view images along the street network at a fixed interval of 100 m. To ensure that the surrounding physical features of the street space were fully captured, we collected four images for each location, with camera compass headings of 0, 90, 180, and 270 degrees and a horizontal field of view of 90 degrees. In total, we collected 241,179 street view images in the City of Columbus, covering the diversity of the built environment that drivers inhabit as they traverse the region. More than 70% of these images were photographed in 2019, and only 4.5% are older than 2015, the oldest dating from 2007.
Overview of the image feature extraction workflow.
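The sampling scheme described above can be sketched as follows. This is a minimal illustration, not the code used in this study: it assumes street centerlines are already in a projected coordinate system measured in metres, and the names `sample_points` and `image_requests` are hypothetical. It only generates the (x, y, heading, field of view) combinations that would be requested and does not call the Street View API.

```python
import math

HEADINGS = (0, 90, 180, 270)  # compass headings of the four images, in degrees
FOV = 90                      # horizontal field of view, in degrees


def sample_points(polyline, interval=100.0):
    """Return points spaced `interval` metres apart along a polyline.

    Assumes projected (x, y) coordinates in metres. The first vertex is always
    sampled; a trailing piece shorter than `interval` gets no extra point.
    """
    points = [tuple(polyline[0])]
    carry = 0.0  # distance walked since the last sampled point
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        pos = interval - carry  # offset of the next sample along this piece
        while pos <= seg_len:
            t = pos / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            pos += interval
        carry = (carry + seg_len) % interval
    return points


def image_requests(polyline, interval=100.0):
    """One (x, y, heading, fov) tuple per image: four images per sample point."""
    return [(x, y, h, FOV)
            for (x, y) in sample_points(polyline, interval)
            for h in HEADINGS]
```

For a straight 350 m street, this yields samples at 0, 100, 200, and 300 m, and therefore 16 image requests. Because four 90-degree views at 90-degree heading steps tile the full 360 degrees, every sample location is covered without gaps or overlap.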
Crash data are collected by the Ohio Department of Public Safety from police departments and made available by the Ohio Department of Transportation (ODOT). We focus only on crashes involving verified injury or fatality, known as “KAB” crashes after the severity coding of the National Safety Council (1990). These exclude crashes that resulted in property damage only or whose injury outcomes were unclear. We obtained data for all such crashes in Franklin County, Ohio, from 2018 and 2019, totaling 10,167 crashes, of which 7,539 occurred in Columbus. Of these, 1.7% were fatal, 10.2% were incapacitating, and the remainder resulted in non-incapacitating injury. Additionally, 9% were pedestrian crashes and 3% involved cyclists. Road network and design data, including functional classifications and segment characteristics, were also acquired from ODOT. We also identify adjacent land uses for road segments, including points of interest such as retail outlets, obtained from the City of Columbus open data portal, as well as bus stop locations, obtained from the Central Ohio Transit Authority. Finally, we include population density and median household income for Census tracts, based on 2014–2018 American Community Survey (ACS) 5-year estimates. Road segments were merged with ACS data for the Census tracts in which they were contained, and were split where they crossed tract boundaries. Our final crash database consisted of 5997 road segments, to which we aggregated crashes, bus stops, and points of interest.
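The KAB filtering and the aggregation of crashes to road segments can be illustrated with a short sketch. It assumes each crash record has already been matched to a road segment and carries a KABCO severity code (K fatal, A incapacitating, B non-incapacitating, C possible injury, O property damage only); the record layout and the name `crash_counts` are hypothetical. Segments without any KAB crash are kept with a count of zero, because they are part of the modeled outcome.

```python
from collections import Counter

KAB = {"K", "A", "B"}  # fatal, incapacitating, and non-incapacitating injury


def crash_counts(crashes, segment_ids):
    """Count KAB crashes per road segment, keeping zero-crash segments.

    `crashes` is an iterable of (segment_id, severity_code) pairs; C (possible
    injury) and O (property damage only) records are dropped.
    """
    counts = Counter(seg for seg, severity in crashes if severity in KAB)
    return {seg: counts.get(seg, 0) for seg in segment_ids}
```

The same counting pattern would apply to bus stops and points of interest once each has been assigned to a segment.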
Methods
Image segmentation of street view images
In this study, we used the Pyramid Scene Parsing Network (PSPNet) (Zhou et al., 2017), a pre-trained deep convolutional neural network model, to segment the street view imagery into visual objects. Figure 1 illustrates how such neural network models work given input imagery. First, a pre-trained convolutional neural network is applied to obtain a feature map. Second, a pyramid pooling module extracts sub-region representations at different scales: using a four-level pyramid, the pooling kernels cover the whole image, half of it, and smaller portions, followed by upsampling and concatenation layers that form the final feature representation. This representation fuses local and global context information. Finally, the representation is fed into a convolution layer to obtain the final prediction map. Compared with other semantic segmentation algorithms such as Deep Convolutional Neural Networks or Fully Convolutional Networks (Dubey et al., 2016), PSPNet takes both pixel-level prediction and global context into account, and thus achieves better performance. PSPNet can partition street view imagery into up to 150 object categories (e.g., sky, sidewalk, and tree), and the model reached state-of-the-art performance, with a mean intersection-over-union (mIoU) of 85.4% on PASCAL Visual Object Classes and 80.2% on Cityscapes (Zhao et al., 2012); it is efficient and accurate in street view image segmentation (Zhang et al., 2018; Kang et al., 2020). For each street view image, we segmented the imagery into visual objects and converted the pixel count of each category into a percentage of the image (see Figure 1), so that objects that appear larger have higher percentages. We matched street view imagery to road segments by choosing the set of images at the location closest to the center point of each segment. The mean percentage for each object category across the four directional images was then calculated as input to the regression analysis.
Example street space clusters showing the four views.
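The conversion from segmentation output to model inputs can be sketched with NumPy. This assumes each segmented image is an integer label map whose values index the 150 object categories as 0 to 149; the helper names are hypothetical, and the nearest-location matching uses straight-line distance in projected coordinates as a simplifying assumption.

```python
import numpy as np

N_CLASSES = 150  # object categories of the scene-parsing model (assumed 0-indexed)


def class_percentages(label_map, n_classes=N_CLASSES):
    """Percentage of pixels assigned to each category in one segmented image."""
    counts = np.bincount(np.asarray(label_map).ravel(), minlength=n_classes)
    return 100.0 * counts / counts.sum()


def location_profile(label_maps, n_classes=N_CLASSES):
    """Mean category percentages across the four directional images of a location."""
    return np.mean([class_percentages(m, n_classes) for m in label_maps], axis=0)


def nearest_location(center, locations):
    """Index of the sampled image location closest to a road segment's center point."""
    diffs = np.asarray(locations, dtype=float) - np.asarray(center, dtype=float)
    return int(np.argmin(np.hypot(diffs[:, 0], diffs[:, 1])))
```

A segment's model inputs would then be the `location_profile` of the four label maps at `nearest_location(segment_center, sample_locations)` (both arguments hypothetical), restricted to the 25 most prevalent categories.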
Descriptive statistics for base model variables and most prominent street view elements on Columbus road segments 2018–2019 (n = 5997).
Negative binomial regression modeling of crash counts
We compare the results of three negative binomial regression models of the number of KAB crashes per road segment. Poisson regression is a standard model for count data; however, it assumes that the variance equals the mean, and when the counts are overdispersed (variance exceeding the mean), a negative binomial model is preferred (Gardner et al., 1995). We calculated the Pearson chi-square dispersion statistic using the R package msme (Hilbe and Robinson, 2018), which indicated overdispersion, and therefore chose the negative binomial alternative. We also considered a zero-inflated negative binomial model, because 3531 of the 5997 complete road segments in our dataset had zero crashes. A zero-inflated approach would model these zero-crash segments separately through a logistic process, treating some zeros as excess zeros arising from a distinct process. The use of zero inflation is controversial in crash research. While Lord et al. (2005) consider low exposure a better explanation for zero crashes than the notion that some areas are inherently safe through a distinct process, Pew et al. (2020) argue that zero-inflated models may be used where they are shown to be useful for research objectives, especially as any model is an approximation. In our study, the standard and zero-inflated negative binomial models produced similar results, and we opted for the simpler one-step negative binomial model.
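The overdispersion check can be illustrated without R. The sketch below is not the msme implementation: it fits a log-link Poisson GLM by iteratively reweighted least squares (IRLS) in NumPy and divides the Pearson chi-square statistic by the residual degrees of freedom. Values near 1 are consistent with the Poisson assumption; values well above 1 indicate overdispersion. The function names are hypothetical.

```python
import numpy as np


def fit_poisson(X, y, n_iter=100, tol=1e-10):
    """Fit a log-link Poisson GLM by IRLS; column 0 of X must be the intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only solution
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu   # working response for the log link
        XtW = X.T * mu            # Poisson working weights equal mu
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


def pearson_dispersion(X, y):
    """Pearson chi-square of the Poisson fit divided by residual df (n - p)."""
    mu = np.exp(X @ fit_poisson(X, y))
    chi2 = np.sum((y - mu) ** 2 / mu)
    return chi2 / (len(y) - X.shape[1])
```

On simulated counts whose variance is μ + μ² (a negative binomial with α = 1), this statistic comes out well above 1, while on simulated Poisson counts it stays close to 1.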
Of the three models compared, the first is a base model (“Base”) that includes as independent variables only road network, land use, and area characteristics, excluding any street view image variables. Its independent variables are segment length, functional class (to indicate relative volume and speed), number of lanes, median, shoulder width, number of bus stops, number of points of interest, population density, and median household income (Table 1). We also include an interaction term between points of interest and population density. We expect this model to explain the variation in crash counts on road segments in the City of Columbus effectively, largely because the literature has previously identified these factors as important.
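The interaction term and the functional-class indicators can be made concrete with a small design-matrix builder. This is a sketch under stated assumptions: the field names are hypothetical, only a subset of the base-model variables is shown, and “local” is assumed as the reference functional class because the text does not name one. Because the model is log-linear, the incidence rate ratio for one additional point of interest is exp(b_poi + b_int × density), so the effect of points of interest varies with population density.

```python
import numpy as np


def design_matrix(segments, fclass_levels, reference="local"):
    """Build a partial base-model design matrix from per-segment records.

    Field names are hypothetical. Columns: intercept, segment length, points of
    interest (POI), population density, the POI x density interaction, and one
    0/1 indicator per functional class other than `reference`.
    """
    others = [lvl for lvl in fclass_levels if lvl != reference]
    rows = []
    for s in segments:
        row = [1.0, s["length"], s["poi"], s["pop_density"], s["poi"] * s["pop_density"]]
        row += [1.0 if s["fclass"] == lvl else 0.0 for lvl in others]
        rows.append(row)
    names = (["intercept", "length", "poi", "pop_density", "poi:pop_density"]
             + [f"fclass[{lvl}]" for lvl in others])
    return np.array(rows, dtype=float), names


def poi_irr(b_poi, b_interaction, pop_density):
    """IRR for one extra POI at a given density: exp(b_poi + b_interaction * density)."""
    return float(np.exp(b_poi + b_interaction * pop_density))
```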
Our other models add to this base model either individual street space elements or clustered street space elements, based on our segmentation of street view imagery. The first alternate model (“Model 1”) adds as independent variables the 25 most prevalent objects on the road segments in our analysis, as shown in Table 1. The second alternate model (“Model 2”) adds independent variables indicating street space cluster. The methods used to identify these clusters from image segmentation results are described in the next subsection. We expect Model 1 to be most effective at explaining crashes because it includes detailed street space elements; however, we expect the cluster-based Model 2 to retain some of that effectiveness. We measure model fit using McFadden’s R2 (McFadden, 1973), and convert each coefficient to an incidence rate ratio (IRR) by exponentiation, so that (IRR − 1) × 100 can be read as the percentage change in expected crashes per unit increase in the variable.
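McFadden’s R2 compares the log-likelihood of the fitted model with that of an intercept-only model: R2 = 1 − ln L(model) / ln L(null). A minimal sketch, assuming the common NB2 parameterization (variance μ + αμ²) and hypothetical function names, computes the negative binomial log-likelihood and the resulting R2; in practice both log-likelihoods would come from the fitted models rather than being computed by hand.

```python
import math


def nb2_loglik(y, mu, alpha):
    """Log-likelihood of counts y under NB2 with means mu and dispersion alpha > 0."""
    r = 1.0 / alpha  # size parameter (called theta in R's MASS::glm.nb)
    ll = 0.0
    for yi, mi in zip(y, mu):
        ll += (math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
               + r * math.log(r / (r + mi)) + yi * math.log(mi / (r + mi)))
    return ll


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared: 1 - lnL(model) / lnL(intercept only)."""
    return 1.0 - ll_model / ll_null
```

Because both log-likelihoods are negative and the fitted model can only raise the likelihood, the ratio lies between 0 and 1, and small values such as those reported here are common for crash-count models.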
Cluster analysis of street space elements
We use k-means cluster analysis to identify four types of street spaces in the City of Columbus. Our approach is similar to that of Gong et al. (2019), but focuses on road segments and fits the resulting clusters into regression models. K-means is one of the most widely used clustering approaches in GIScience applications (Bação et al., 2005; Li and Xie, 2018), partly because its computation time grows linearly with the number of observations. It assigns each observation to one of k clusters, namely the one whose mean (centroid) is closest. We used two methods to guide the choice of the number of clusters. The elbow method indicated that four or five clusters would be optimal, while the silhouette method indicated that two clusters were optimal, with four clusters second best. Based on this guidance, combined with our knowledge of the road network and city, we generated four clusters from the 25 most prevalent object categories on our road segments. Before incorporating these clusters into our modeling, we first describe the characteristics of each cluster, including its cross-tabulation with functional class, and show a map of the distribution of clusters within the City of Columbus.
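The clustering step and the two diagnostics for choosing k can be sketched in NumPy. This sketch swaps in k-means++ seeding with several restarts (a common default, not necessarily what was used here) and computes the elbow curve from within-cluster sums of squares (inertia) and the mean silhouette width. `X` would be the segment-by-object matrix of mean pixel percentages for the 25 categories; the function names are hypothetical, and the data are assumed to contain at least k distinct rows.

```python
import numpy as np


def _sq_dists(X, C):
    """Squared Euclidean distances between rows of X and rows of C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, n_init=10, n_iter=300, seed=0):
    """Lloyd's algorithm with k-means++ seeding; returns (labels, centroids, inertia)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # k-means++: each new seed is drawn with probability proportional to D^2
        C = X[[rng.integers(len(X))]]
        while len(C) < k:
            d2 = _sq_dists(X, C).min(axis=1)
            C = np.vstack([C, X[rng.choice(len(X), p=d2 / d2.sum())]])
        for _ in range(n_iter):
            labels = _sq_dists(X, C).argmin(axis=1)
            new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                              for j in range(k)])
            if np.allclose(new_C, C):
                break
            C = new_C
        d2 = _sq_dists(X, C)
        labels = d2.argmin(axis=1)
        inertia = d2[np.arange(len(X)), labels].sum()
        if best is None or inertia < best[2]:
            best = (labels, C, inertia)
    return best


def silhouette(X, labels):
    """Mean silhouette width; needs at least two clusters and O(n^2) memory."""
    sq = (X ** 2).sum(axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # singleton clusters get a silhouette of 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)  # self-distance is ~0
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return s.mean()
```

An elbow plot is then `[kmeans(X, k)[2] for k in range(1, 8)]`, and the silhouette criterion compares `silhouette(X, kmeans(X, k)[0])` across k. As in this study, the two criteria need not agree, which leaves room for domain judgment.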
Selected road segment characteristics by street space cluster (n=5997).
In terms of road functional classification, local roads make up the majority of segments in every cluster except the Open Road cluster, which contains nearly all the highways (23% of its segments) but whose largest share is arterial roads (44%). This combination suggests that highways and many of the city’s arterial roads have similar street spaces as measured by image segmentation. The Leafy Residential cluster has the lowest proportion of non-local roads, only 14%. In terms of land use, the Built-Up Urban cluster has far more points of interest and bus stops per road segment, on average, than the other clusters. And while this evidence of activity corresponds to a high mean number of injury and fatal crashes per segment, it is the Open Road cluster that has the highest mean number of injury or fatal crashes per road segment. The map in Figure 3 supports this overall assessment, showing that the Built-Up Urban cluster occupies most of central Columbus and other dense commercial areas. It also shows that the Open Road cluster includes not just highways but also several major arterial roads that the Columbus Vision Zero initiative identified as high-injury areas (Vision Zero Columbus, 2021b).
Distribution of road segments by street space cluster in Columbus (n = 5997).
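The row percentages and per-cluster means discussed above come from a cross-tabulation of cluster by functional class and from averaging crash counts within clusters. A minimal plain-Python sketch, assuming one (cluster, value) pair per road segment; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def crosstab_shares(pairs):
    """Row percentages of functional class within each cluster.

    `pairs` is an iterable of (cluster, functional_class), one per segment.
    """
    table = defaultdict(Counter)
    for cluster, fclass in pairs:
        table[cluster][fclass] += 1
    return {c: {f: 100.0 * n / sum(cnt.values()) for f, n in cnt.items()}
            for c, cnt in table.items()}


def mean_by_cluster(records):
    """Mean of a per-segment value (e.g. KAB crash count) within each cluster."""
    sums, counts = defaultdict(float), Counter()
    for cluster, value in records:
        sums[cluster] += value
        counts[cluster] += 1
    return {c: sums[c] / counts[c] for c in counts}
```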
Results
Base model and Model 1 results
Negative binomial regression models on number of crashes per segment (n=5952).
2018-19 injury or fatal crashes; ***p<0.01, **p<0.05, *p<0.10
Model 1, which included the 25 most prevalent street space elements on our road segments, shows an improved model fit, with a McFadden’s R2 of 0.076 compared to 0.070 for the base model. However, only two of these elements showed a significant association with crash frequency. The first is the visibility of signage, which is associated with more KAB crashes: each additional percentage point of signage, averaged across the four images for a road segment, is associated with a 23% increase in the expected number of crashes. The second is the visibility of trashcans, which is associated with fewer injury or fatal crashes: each additional percentage point of trashcans, averaged across the four images, is associated with a 40% decrease in the expected number of crashes. Finally, the road network, land use, and area characteristic variables all retained their significance and direction of association, with some reduction in effect sizes, especially for the functional class categories.
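The percentage interpretations follow from exponentiating the coefficients: with an IRR of exp(β), a Δ-unit increase multiplies expected crashes by exp(βΔ). The sketch below back-derives coefficients from the reported 23% and 40% figures, an assumption made only for illustration (the fitted coefficients are in the regression table), and shows that effects compound multiplicatively rather than additively. The function name is hypothetical.

```python
import math


def pct_change(beta, delta=1.0):
    """Percentage change in expected crashes for a `delta`-unit increase in a predictor."""
    return 100.0 * (math.exp(beta * delta) - 1.0)


# Coefficients implied by the reported IRRs (illustrative back-calculation)
beta_signage = math.log(1.23)   # +23% per percentage point of visible signage
beta_trashcan = math.log(0.60)  # -40% per percentage point of visible trashcans

for name, beta in [("signage", beta_signage), ("trashcans", beta_trashcan)]:
    print(f"{name}: +1 point {pct_change(beta):+.1f}%, +2 points {pct_change(beta, 2):+.1f}%")
```

Under these figures, two additional percentage points of visible signage correspond to 1.23² ≈ 1.51, an increase of about 51% rather than 46%, and two points of trashcans to 0.60² = 0.36, a 64% decrease rather than 80%.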
Model 2 results
Results from Model 2, which contains all the base model variables plus dummy variables indicating street space cluster, are shown in the final columns of Table 3. Once again, the road network, land use, and area characteristic variables all retained their significance and direction from the base model, with slightly smaller effect sizes. All of the street space cluster variables show a significant association with crash counts relative to the Open Residential reference group. The Leafy Residential cluster, which tended to have less road space, more trees, and more trashcans (Table 2), is associated with a 12.4% decrease in the number of KAB crashes compared with the reference group. The Built-Up Urban cluster is associated with a 21.7% increase in injury and fatal crashes on road segments compared with the reference group. The largest association is for the Open Road cluster, characterized by more visible sky, road, and signage (Table 2), with a 48% increase in crash frequency compared with the reference Open Residential cluster. Notably, these associations hold while controlling for the base model variables on road network and land use characteristics. The McFadden’s R2 of 0.072, compared with 0.070 for the base model, indicates that including street space clusters slightly improves the explanation of variation in injury and fatal crash counts across road segments in the City of Columbus.
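Because every cluster coefficient is estimated against the same Open Residential reference, the reported percentages can be re-expressed against any other cluster by dividing IRRs, which is equivalent to exp(β_a − β_b). The sketch below applies this to the point estimates reported in this section; it says nothing about whether a given pairwise difference is statistically significant, which would require the coefficient covariance matrix. The names are hypothetical.

```python
# Point-estimate IRRs relative to the Open Residential reference group (Model 2)
IRR_VS_OPEN_RESIDENTIAL = {
    "Open Residential": 1.0,
    "Leafy Residential": 1.0 - 0.124,
    "Built-Up Urban": 1.0 + 0.217,
    "Open Road": 1.0 + 0.48,
}


def relative_irr(a, b, irrs=IRR_VS_OPEN_RESIDENTIAL):
    """IRR of cluster `a` relative to cluster `b`, i.e. exp(beta_a - beta_b)."""
    return irrs[a] / irrs[b]


for other in ("Built-Up Urban", "Leafy Residential"):
    print(f"Open Road vs {other}: {100 * (relative_irr('Open Road', other) - 1):+.1f}%")
```

With these figures, Open Road segments would be expected to have roughly 22% more KAB crashes than Built-Up Urban segments and about 69% more than Leafy Residential segments, holding the base-model variables constant.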
Discussion
The results from our three models show that segmented street view data, applied both as individual elements and as clusters representing types of street space, improved the modeling of crash frequency. We found that only two individual street space elements were associated with crashes: signage and trashcans. The amount of visible signage on a road segment was associated with higher crash frequency. Signs on or adjacent to the roadway would largely be either traffic signs directing drivers through complex situations or advertisements aimed at drivers. We therefore consider the signage variable a visible indicator of complex, car-oriented street spaces. Conversely, trashcans were associated with fewer crashes per segment. We consider the trashcan element a visible indicator to drivers of more pedestrian-oriented spaces. Our cluster analysis allowed us to relate these individual elements to the broader road network by creating a typology. We defined two residential clusters, one urban cluster, and one roadway cluster. A key finding from our third model, which included these clusters while controlling for road and area characteristics, is that the Open Road cluster had by far the strongest association of any cluster with higher crash counts. Furthermore, this cluster contains both highways and more arterial roads than any other cluster, showing that, from a driver’s perspective, highways and most arterial roads in Columbus are more similar than different.
Crosstabulation of street space clusters and functional class.
A policy response to this finding would be to take greater steps to differentiate the appearance of these non-highway roads from highways, especially in busy areas and near transit. Local policymakers should consider what it would take to convert high-injury arterials or collectors with Open Road type street spaces so that they would fit into another cluster, such as one of the residential clusters or the Built-Up Urban cluster. Such a conversion would require narrowing roadways while adding either more nearby greenery (trees or grass) or more nearby buildings and sidewalks. While some may be concerned about the resulting loss of vehicle capacity and speed on such roads, these marginal reductions should be weighed against the numerous injuries and deaths that the current capacity and speeds entail in busy areas. A more balanced approach to street and road design and performance metrics would account for this human cost.
Limitations and future directions
There are important limitations to the research presented in this paper. The first concerns the quality of the street view data and the accuracy of its segmentation. Examples include noise, such as mislabeled or empty images, and misidentified objects, such as “mountains” in flat Columbus. While we took steps to minimize these issues, such as excluding non-street locations and using only the 25 most prominent identified objects, which excludes most cases of misidentification, this potential for bias should be borne in mind. A second limitation relates to our use of segmented street view data in the models presented in this paper. The primary goal of this research was to use street view imagery to consider the context and visual perspective of the driver while controlling for road network elements such as volume and speed (through functional class), number of lanes, the existence of medians, and shoulder width. However, these controls are not exhaustive, which creates uncertainty in the interpretation of our findings. For example, lacking a separate control variable, our visible “signage” variable simultaneously represents the actual presence of signage, likely related to nearby complex interchanges, and the visibility of that signage to drivers. Because of this, our model cannot conclude with certainty that driver perspective is a mechanism behind the association of signage with crashes, nor can that mechanism be ruled out. Indeed, this challenge to the usefulness of individual street view elements in crash modeling is one reason we applied cluster analysis to make use of elements in combination.
Yet these limitations also shape future directions for research into the application of street view imagery to road safety. We suggest two directions for future efforts. The first is to further explore the potential of machine-parsed street view data to augment inventories of road design and road-adjacent land uses. Street view imagery has previously been used to create such inventories for specific studies through a manual process (Hanson et al., 2013). If an automatic process could be applied with reliable results, as promising early efforts suggest (Ito and Biljecki, 2021), it would be a useful resource for departments of transportation and planning. The second, more closely related to the focus of this paper, is to use street view imagery to explore how different types of built environments perceived by drivers influence road safety outcomes. For this effort, we hope our paper has provided a beginning by examining the relationship of four types of street spaces with injury and fatal crash frequency.
Conclusions
This paper compared a set of negative binomial models to show two different applications of machine-parsed street view data to understanding injury and fatal crash frequency on road segments in Columbus, Ohio, USA. The first approach included independent variables for the percentages of the most prominent individual street space elements visible on road segments, finding that visible signage was associated with more crashes, while visible trashcans were associated with fewer crashes. The second approach used cluster analysis of individual street space elements to define four types of street spaces visible to drivers. Including those clusters in the modeling of crash frequency showed that the Open Road cluster, characterized by sky, roadway, and signage, was most strongly associated with increased crashes. Yet this Open Road cluster was also shown to contain roads of all functional classes, not just highways. We theorize, based on our findings, that as self-explaining roads (Theeuwes, 2021), these highway-like arterial, collector, and local roads may encourage unsafe speeds and reduced attention on metropolitan roads that contain a complex mixture of activities. Achieving the goal of Vision Zero, as the City of Columbus and others have set out to do, would benefit from considering how high-crash road segments present themselves to drivers when deciding where and how to apply design interventions. We have also shown that street view data can support and inform crash frequency modeling, by showing how both particular objects and clusters based on object groupings are related to crash frequency. It is our hope that departments of transportation, as well as cities engaged in safety programs such as Vision Zero, can in time incorporate street view data into their own transportation safety analyses, whether through partnerships with universities, private sector products, or government initiatives.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
