Abstract

There is an old saying that a picture is worth a thousand words, but what if the picture or image is geographical in nature? With popular sites like Flickr and Instagram, and high-quality cameras and GPS in smartphones, a proliferation of georeferenced photographs has become available. Unsurprisingly, they have been the subject of much research interest. In addition to geographical coordinates, they have date and time stamps, which allow for a variety of rich spatio-temporal analyses. Such applications are now in evidence in Environment and Planning B, for example, to characterize scenic areas (Seresinhe et al., 2018) or to measure the size of crowds (Botta et al., 2020).
Another interesting parallel development has been street view imagery, starting with the launch of Google’s Street View initiative in 2007, in which large numbers of streets have been photographed using car-mounted cameras. Embedded within other Google products such as Google Earth and Google Maps, Street View lets users zoom into street locations and see panoramic photographs that cover large swaths of traversable pathways. From 2012, users could upload their own panoramic photographs to Google Street View, and from 2013 onwards businesses could add panoramic photographs of the inside of their shops and cafes. Baidu Maps, a Chinese provider of online mapping, has also offered street view images in China since 2013. Since then, more street view crowdsourcing initiatives have emerged, the two largest being KartaView (previously named OpenStreetView and OpenStreetCam) and Mapillary. The quality of their images is not as crisp or clear as that of Google Street View because they are taken with mobile phone cameras, but they are low cost to deploy as no dedicated car or driver is required. Reflecting the commercial value of street view imagery, both have now been commercialized: KartaView is currently owned by GrabTaxi Holdings Pte. Ltd, a Southeast Asian technology company whose services include ride hailing and food delivery, while Mapillary was acquired in 2020 by Meta (i.e., Facebook). The latter currently has over 1.5 billion images covering over 10 million kilometers (Quack, 2021). These include image datasets uploaded by the Vermont Department of Transportation (5 million images collected over 5 years) and the Arizona Department of Transportation (4.7 million) (Mapillary, 2018; Yoong, 2018).
Now widely accessible through APIs (Application Programming Interfaces), these sources of street level imagery are providing a wealth of photographic data that can be used for research purposes, e.g., to estimate the socioeconomic characteristics of cities (Gebru et al., 2017). We have also seen increasing numbers of papers being published in Environment and Planning B that use street view imagery to explore different aspects of cities, for example, to assess the quality of urban spaces (Li et al., 2021; Ye et al., 2019), and we expect this trend to continue.
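To make the idea of API access concrete, the following is a minimal sketch of how a request for street level images might be assembled, assuming Google's Street View Static API and its documented `location`, `size`, `heading`, `pitch`, `fov` and `key` parameters. The helper names, the default values, the placeholder key and the coordinates (roughly downtown Buffalo, New York) are illustrative assumptions. The code only builds URLs and makes no network call.

```python
from urllib.parse import urlencode

# Endpoint of Google's Street View Static API.
STREET_VIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"


def build_street_view_url(lat, lon, api_key, heading=0, pitch=0, fov=90,
                          size=(640, 640)):
    """Return the request URL for a single street level image.

    This helper is hypothetical; it only formats query parameters.
    """
    params = {
        "location": f"{lat:.6f},{lon:.6f}",  # latitude,longitude in degrees
        "size": f"{size[0]}x{size[1]}",      # width x height in pixels
        "heading": heading % 360,            # compass direction of the camera
        "pitch": pitch,                      # up/down camera angle
        "fov": fov,                          # horizontal field of view
        "key": api_key,                      # placeholder; a real key is required
    }
    return f"{STREET_VIEW_ENDPOINT}?{urlencode(params)}"


def panorama_urls(lat, lon, api_key, n_views=4):
    """Return URLs for n_views evenly spaced headings around one location."""
    step = 360 / n_views
    return [
        build_street_view_url(lat, lon, api_key, heading=round(i * step))
        for i in range(n_views)
    ]


if __name__ == "__main__":
    for url in panorama_urls(42.8864, -78.8784, "YOUR_API_KEY"):
        print(url)
```

Requesting several headings per location is one simple way to approximate a panorama from fixed-view images; other providers such as Mapillary expose different endpoints and parameters that are not shown here.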
In terms of urban planning, we foresee street level imagery playing a role in at least three potentially interesting applications in the future. The first is the detection of urban change. Since 2014, Google Street View has allowed users to step back in time by providing access to a time series of images where available. Figure 1 shows how images taken at the same location at two different points in time can demonstrate urban change. In this example, an image dated October 2020 shows newly built apartments on what was waste land in August 2007 (shown in the inset). Leveraging street level imagery over time can help detect and document urban processes such as gentrification and urban regeneration. One could also envisage cities actively acquiring street level imagery on a continuous basis as a new tool for monitoring urban change.
Figure 1. Exploring urban change in Buffalo, New York, with Google Street View in October 2020 and the same location in 2007 (inset). Source: Google Maps.
Another promising area is the use of computer vision and machine learning to gain insights from the wealth of imagery available and to turn the data into information. The study by Gebru et al. (2017) used computer vision algorithms to determine the make and type of car from Google Street View imagery, which was then used to develop models for predicting socioeconomic characteristics and voting behavior. For training models or simply extracting information from images, one can use the YOLO (You Only Look Once) deep neural network model (Ultralytics, 2021), which was employed by Ajayakumar et al. (2021) in mapping informal settlements. Computer vision and machine learning could also be used to automatically build databases of points of interest (POIs) or city assets such as locations of fire hydrants, stop signs, street addresses, etc. This type of technology has considerable potential in less developed countries where POI and other urban databases are lacking, as it reduces the need to input the data manually.
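As a rough illustration of how detections might be turned into a POI database, the sketch below assumes that an object detector (such as YOLO) has already been run on geotagged images and that each detection has been reduced to a label, a coordinate and a confidence score. The `Detection` record, the thresholds and the greedy merging rule are all hypothetical choices made for this example, not a method taken from the studies cited above. Note also that the coordinate is that of the camera, so the resulting positions are approximate.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical record: one detected object in one geotagged image."""
    label: str
    lat: float          # latitude of the camera, not of the object itself
    lon: float          # longitude of the camera
    confidence: float   # detector score between 0 and 1


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points."""
    radius = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def build_poi_table(detections, min_confidence=0.5, merge_radius_m=15.0):
    """Greedily merge nearby detections of the same label into single POIs."""
    pois = []
    # Process the most confident detections first so they seed the POIs.
    for det in sorted(detections, key=lambda d: -d.confidence):
        if det.confidence < min_confidence:
            continue  # discard weak detections
        for poi in pois:
            same_label = poi["label"] == det.label
            if same_label and haversine_m(
                    poi["lat"], poi["lon"], det.lat, det.lon) <= merge_radius_m:
                # Update the running mean position of the existing POI.
                n = poi["n_sightings"]
                poi["lat"] = (poi["lat"] * n + det.lat) / (n + 1)
                poi["lon"] = (poi["lon"] * n + det.lon) / (n + 1)
                poi["n_sightings"] = n + 1
                break
        else:
            # No existing POI is close enough, so start a new one.
            pois.append({"label": det.label, "lat": det.lat,
                         "lon": det.lon, "n_sightings": 1})
    return pois


if __name__ == "__main__":
    sample = [
        Detection("fire hydrant", 42.88640, -78.87840, 0.91),
        Detection("fire hydrant", 42.88644, -78.87840, 0.84),
        Detection("stop sign", 42.88700, -78.87900, 0.77),
        Detection("fire hydrant", 42.89000, -78.87000, 0.30),
    ]
    for poi in build_poi_table(sample):
        print(poi)
```

Counting sightings per POI gives a simple indication of how well supported each entry is; a production workflow would likely also estimate the object's true position from the camera heading and multiple viewpoints.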
The final application, which to some extent builds upon the previous one, is in the area of augmented reality (AR) and navigation. In such applications, the user, say of Google Maps “Live View”, can see the route between A and B not only displayed as a 2D map, which we are already used to from navigation devices in our vehicles, but also as directions overlaid on the actual environment. This not only gives a sense of place but also provides an enhanced navigation experience. This can be especially useful for those who might have issues with using more traditional 2D maps, but smart city applications could also benefit from these innovations in ways that are yet to be uncovered. Such technology relies on street view images and on the device detecting where you are from such images (e.g., using street signs, buildings, landmarks). Moreover, as users can also upload their own images to Google, this again increases the number of places being mapped and hence where AR applications will become available in the future (Perez, 2020). Coupled with machine learning, buildings and logos can be automatically identified. If we take Google (2022) and their Cloud Vision API as an example, one can upload images of a specific place or landmark (e.g., the Empire State Building) and Google can automatically identify it.
More recently, 360° car-mounted cameras have become more affordable and can be bought from sites like Amazon for a few hundred dollars or less. This opens up many possibilities for more crowdsourced or volunteered street view imagery as well as tools that cities may employ to document urban space in ways that complement their more traditional spatial data infrastructures. But the critical question is: will this volunteered or crowdsourced street view imagery become as widespread as mapping contributions to OpenStreetMap (OSM)? Maybe the key is that street view imagery and OSM need better integration so that contributors to OSM can experience the benefits of street view imagery, which might then incentivize them to contribute panoramic images. For example, both KartaView and Mapillary can be used in OSM for editing features such as turn restrictions and road names, so this integration is already beginning.
Regardless, there will undoubtedly be a proliferation of street view imagery becoming available from many different sources in the future. O’Keefe et al. (2019) noted that it only takes 10 New York City taxi drivers to cover the equivalent of 33% of Manhattan in a day. If autonomous cars with built-in cameras become more ubiquitous, and fleets of taxis and minicabs were incentivized to participate in such a data collection exercise, imagine how much of the world could be mapped, and how frequently. These data could then be further mined and processed to study many different urban issues, e.g., to look for changes over time in near real time, or these data could become integrated into smart city applications through AR. For sure, the potential for urban planning and scientific research is immense, and we have only witnessed the start of a new era of urban characterization through street view imagery.
