Abstract
Understanding the interplay between urban form, traffic volume, and air quality is significant for urban planning and environmental sustainability. However, limited progress has been made in bringing effective urban planning strategies to help control traffic demand and resulting air pollutants. Therefore, this study aims to investigate the interrelation between urban form, traffic volume, and air quality with a spatiotemporal stratified method. The method extracts and preprocesses traffic volume data in spatial (polluted and unpolluted zones) and temporal (periods in holidays and workdays) dimensions. Three decision tree models (random forest, random tree, M5 model tree) and two comparison models (multiple linear regression, artificial neural network) are used to examine the relationships. The final results show that the spatiotemporal stratification approach effectively reveals the interrelations, and the random forest model outperforms the other models. Specifically, highly aggregated roads and industrial areas are more associated with traffic volume in polluted zones. The dominance of waterway and vegetation shows a strong association with traffic volume in unpolluted zones. The degree of association also varies significantly between workdays and holidays. Our spatiotemporal stratified approach reveals heterogeneous relationships between urban form, traffic volume, and air quality and provides insightful references on sustainable urban development.
Introduction
In 1950, 25% of the world’s population lived in urban areas, and this share is expected to rise to 60% by 2025 (United Nations, 2015). Increasing urbanization may result in higher levels of harmful emissions and deteriorated air quality (Copsey, 2016). Vehicle emissions have become the dominant source of air pollutants. Transportation in metropolitan areas was reported to be the most significant source of contamination (Zhang and Batterman, 2013), and motor vehicles accounted for 30–50% of carbonaceous particle emissions (Kim et al., 2017).
Many studies have investigated the driving forces of urban traffic. One of the most apparent is human mobility, which has been further divided into the social, spatial, and socio-spatial dimensions (Kim et al., 2017). For instance, spatial configurations and arrangements of urban functions are closely related to human mobility in the socio-spatial dimension. Researchers believe that observed mobility patterns result from complicated decision-making processes by which people try to reach destinations within certain spatiotemporal and institutional constraints (Breuste et al., 2013).
To this end, the study of urban form has attracted enormous attention. It concerns the spatial configurations and arrangements of different urban elements, such as land uses, population distribution, transportation networks, and other infrastructure that may dramatically affect human activities (Bereitschaft and Debbage, 2013; Ewing et al., 2003). Urban form influences human travel behavior and traffic distribution by imposing spatial restrictions, such as the connectivity of the road network and the spatial configuration and arrangement of transport infrastructure (Self, 1995).
In sum, urban form has a significant impact on human mobility patterns and traffic formation and consequently influences air pollution. However, few studies applied urban morphology theory in air quality modeling until the end of the 20th century, partly due to the lack of scientific investigations focusing on environmental health issues. Since then, many researchers have explored relationships between urban form metrics and air quality, especially transport-related emissions. For example, Bechle et al. (2011) investigated the relationship between NO2 concentrations and urban form in 83 cities worldwide and found that cities with lower population density or built-up density had less air pollution. However, their model relied on a linear assumption, whereas Tian et al. (2020) found the relationship between urban form and air quality to be nonlinear.
To improve urban air quality through effective urban designing strategies, polycentric urban development has increasingly attracted researchers’ interests. It refers to multiple proximately located urban centers that were formerly functioning independently to become integrated through a broad range of processes (Liu et al., 2018a). For example, She et al. (2017) studied the correlation between six pollutants and six landscape metrics. They concluded that polycentric development was an effective strategy to minimize air pollution instead of monocentric development. In addition, urban fragmentation, which indicates large patches’ transformation into smaller ones that tend to be isolated from the original (De Montis et al., 2017), is another phenomenon associated with air quality. For example, Liu et al. (2018b) found that the level of urban fragmentation significantly influences emission patterns. It also supported that a continuous and sparsely populated city can reduce overall air pollution. Ou et al. (2013) concluded that less fragmentation, more compact development, and a higher degree of coupling between urban form and traffic organization could reduce carbon emissions.
Apart from urban form, meteorological factors also play an essential role in the relationship between air quality and traffic volume. For example, wind can affect the horizontal transport as well as the vertical mixing and dispersion of air pollutants (Seaman, 2000). Precipitation affects air pollution through the adsorption and collision of raindrops and the reduction of dust and fugitive dust previously suspended in the atmosphere (Li et al., 2015). The increased air temperature caused by lower-density, decentralized patterns of urbanization is also closely related to the increased concentration of inner-city air pollution (Breuste et al., 2013). Therefore, we add meteorological factors as control variables when exploring the interrelation between urban form, traffic volume, and air quality.
To sum up, previous studies have rarely examined how the relationships between urban form, traffic volume, and air pollution play out and how they vary across spatiotemporal dimensions. Furthermore, prior studies classified urban form with a simple dichotomous scheme, urban vs. nonurban areas. Such oversimplified schemes cannot represent the urban form in sufficient detail. In response to this gap, this study considers the urban form for each land use separately. Given that traffic volume varies in space and time, we extracted and preprocessed traffic data for different regions (polluted and unpolluted zones) and different time periods (workdays and holidays). Three decision tree models (i.e. random forest (RF), random tree (RT), M5 model tree (M5)) and two comparison models (i.e. multiple linear regression (MLR), artificial neural network (ANN)) are constructed to investigate the interrelationships among relevant surrogate variables and metrics.
Research area and data
Area of case study
The research area, the city of Atlanta, USA, is the central city of the Atlanta metropolitan area, located in the northwestern part of Georgia. It is the most populous city in Georgia and lies mostly in Fulton County, with a small portion in DeKalb County. As for urban form, Atlanta is a sprawling city with a network of freeways spreading from the center, and the automobile has become the primary mode of transportation in the region (Fodor, 2007). The heavy reliance on cars has resulted in massive traffic flow and frequent congestion, particularly during peak hours. There are 30 traffic monitoring sites, 5 Environmental Protection Agency (EPA) stations, and 4 meteorological stations around Atlanta city. Figure 1 shows those locations against the geographic background of buildings, vegetation, waterways, and industrial land-use types.

Figure 1. Traffic sites, air quality stations, and climate stations around Atlanta city against the background of land-use distribution.
Air pollution data
Air pollution data for 2019 were obtained from the EPA. We used air quality index (AQI) readings from 28 EPA stations to quantify air quality (Tian et al., 2019). The overall AQI and individual pollutant IAQI values are calculated with equation (1) (Mintz, 2009).
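For reference, the standard EPA piecewise-linear form of a pollutant sub-index is given below. It is assumed here, not confirmed by this text, to correspond to equation (1); the symbol names follow common EPA usage rather than the authors' notation.

```latex
\begin{equation}
I_p = \frac{I_{Hi} - I_{Lo}}{BP_{Hi} - BP_{Lo}}\,\left(C_p - BP_{Lo}\right) + I_{Lo},
\qquad
\mathrm{AQI} = \max_{p} I_p
\end{equation}
```

Here C_p is the measured concentration of pollutant p, BP_Lo and BP_Hi are the concentration breakpoints that bracket C_p, and I_Lo and I_Hi are the index values assigned to those breakpoints. The overall AQI is the largest of the individual sub-indices.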
According to EPA specifications on AQI categorization and health implications, the air quality is moderate when the AQI is greater than 50, and it is unhealthy for sensitive groups when the AQI is greater than 100 (EPA, 2012). Therefore, at any given time, we classify urban areas as polluted or unpolluted zones according to whether the AQI is above or below 50. Note that the classification is dynamic because air pollution concentrations change over time.
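The classification rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the PM2.5 breakpoints are illustrative values assumed from the 2012 EPA table and truncated to three categories, and treating an AQI of exactly 50 as unpolluted is an assumption.

```python
# Illustrative 24-hour PM2.5 breakpoints (ug/m3) mapped to AQI ranges.
# Assumed values for demonstration; only the first three categories are listed.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
]


def sub_index(conc, breakpoints=PM25_BREAKPOINTS):
    """Piecewise-linear sub-index for one pollutant concentration.

    The concentration is expected to be truncated to one decimal place
    beforehand, so that it falls inside one of the tabulated intervals.
    """
    for bp_lo, bp_hi, i_lo, i_hi in breakpoints:
        if bp_lo <= conc <= bp_hi:
            return round((i_hi - i_lo) / (bp_hi - bp_lo) * (conc - bp_lo) + i_lo)
    raise ValueError("concentration outside the tabulated breakpoints")


def classify_zone(aqi, threshold=50):
    """Label a location as polluted when its AQI exceeds the threshold."""
    return "polluted" if aqi > threshold else "unpolluted"
```

For example, `classify_zone(sub_index(20.0))` labels a location with a PM2.5 concentration of 20.0 as polluted, because its sub-index falls in the 51–100 range.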
Traffic volume data
To monitor the human mobility pattern, we use traffic volume data from continuous count stations (CCS) downloaded from the Georgia Department of Transportation. The CCSs are permanent traffic counting stations that collect vehicle volume throughout the day. Traffic sites are assigned to the polluted or unpolluted zone in which they are located on each day and are counted separately. Traffic volume in polluted zones is collected and aggregated each day from all traffic sites located in those zones, and likewise for unpolluted zones. Given the dynamic nature of traffic, the data of six time periods (i.e. 6 am–10 pm, 6 am–12 pm, 7 am–7 pm, 12 am–12 pm, AM peak hour, and PM peak hour) are collected and analyzed. To further explore the temporal differences, we classify the days into holidays and workdays and analyze the traffic volume for each type of day separately.
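The daily aggregation by zone type can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the site identifiers, and the function name are hypothetical, and the mean is used as the aggregate because the research design describes the zone volume as the daily average over sites.

```python
from collections import defaultdict
from statistics import mean


def aggregate_daily_volume(records):
    """Average site volumes per (day, zone type).

    records: iterable of (day, zone_type, site_id, volume) tuples, where
    zone_type is the zone the site falls in on that day.
    """
    grouped = defaultdict(list)
    for day, zone_type, _site_id, volume in records:
        grouped[(day, zone_type)].append(volume)
    return {key: mean(volumes) for key, volumes in grouped.items()}


# Hypothetical records for one day and one time period.
example = [
    ("2019-03-18", "polluted", "site_01", 100),
    ("2019-03-18", "polluted", "site_02", 300),
    ("2019-03-18", "unpolluted", "site_03", 50),
]
daily = aggregate_daily_volume(example)
```

In practice the same aggregation would be repeated for each of the six time periods.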
Land use data
The USGS land-use and land-cover Level I classification (Biau and Scornet, 2016) is adopted, including categories of roads, buildings, waterways, and vegetation. The road data were downloaded from OpenStreetMap, while the other land-use data were obtained from the Atlanta Department of City Planning GIS. In particular, industrial buildings were treated as a separate type because industrial emissions, including coal combustion, oil combustion, and power generation, are among the dominant sources of organic aerosols in urban areas (Wu et al., 2018). All other building types are grouped as buildings in this study. Although traffic emission is the primary source of air pollution in Atlanta, industrial emission is also a notable source. It has contributed more than 10% of particulates in Atlanta since 1968 (EPA, 1969), and its magnitude is larger than that of residential emissions (Wu et al., 2018). Therefore, this study chooses to focus on industrial buildings, as few studies have investigated the effect of industrial land use (Lunetta et al., 2004).
Meteorological factors
In this study, the meteorological factors are used as control variables. Historical meteorological data are obtained from the Iowa Environmental Mesonet (IEM) website. We collected air temperature (°F), relative humidity (%), wind speed (mph), and precipitation (mm) at each meteorological station. The IEM records weather observations from sensors throughout the world at five-minute intervals. All meteorological factors are aggregated to a daily scale from 1 January 2019 to 31 December 2019.
Summary of data
Based on the daily AQI values, all areas in the city are polluted on 58 days and unpolluted on 200 days. The remaining 107 days have both polluted and unpolluted zones. Taking each zone as a sample record, 1732 samples are recognized as polluted zones and 7806 samples as unpolluted zones across all traffic sites. The study then partitions the 365 days into workdays and holidays. A total of 143 holidays are extracted, including all federal holidays and weekends in 2019. Daily data of all factors are recorded at each sample location, including 4 meteorological factors (wind speed, relative humidity, precipitation, and temperature) and 25 urban form metrics (five indices for each of five land-use types), as well as traffic volume. The only exception is that traffic volume is obtained for six time periods each day (7 am–7 pm, 6 am–10 pm, 6 am–12 am, 12 am–12 am, AM peak hour, and PM peak hour). These data sets are used to construct all models. Supplemental Table 2 summarizes the descriptive statistics of daily traffic volume and meteorological factors in both polluted and unpolluted zones for each traffic site throughout the year. Supplemental Table 3 summarizes the daily urban form metrics.
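The workday and holiday partition can be sketched as below. This is an illustrative helper only: the function names are hypothetical, and the set of non-weekend holidays is left as a caller-supplied parameter because the exact list used in the study is not given here.

```python
from datetime import date, timedelta


def day_type(d, extra_holidays=frozenset()):
    """Return 'holiday' for weekends and listed dates, otherwise 'workday'."""
    if d.weekday() >= 5 or d in extra_holidays:  # 5 = Saturday, 6 = Sunday
        return "holiday"
    return "workday"


def partition_year(year, extra_holidays=frozenset()):
    """Split every day of a year into holiday and workday lists."""
    parts = {"holiday": [], "workday": []}
    d = date(year, 1, 1)
    while d.year == year:
        parts[day_type(d, extra_holidays)].append(d)
        d += timedelta(days=1)
    return parts


# Example with a single listed holiday (Independence Day 2019).
parts_2019 = partition_year(2019, extra_holidays={date(2019, 7, 4)})
```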
Research design
The flowchart and methods
The flowchart of the research design is shown in Figure 2. First, we identified polluted and unpolluted zones for each day in 2019 based on the daily AQI values estimated from EPA stations by Kriging interpolation. Kriging is a classical spatial estimation technique suitable for variables that change smoothly over space (Zhu et al., 2018). In this study, these zones are the basic spatial units in which urban form metrics are computed. The spatial coverage of a zone is a contiguous urban area in which the AQI values at all locations are larger (polluted zone) or smaller (unpolluted zone) than a threshold value. This study takes 50 as the threshold value because it is the EPA’s standard value to determine the pollution category (EPA, 2012). Spatial modeling is performed in each zone. The urban form metrics of each land-use type and the meteorological factors are used as independent variables, and the corresponding traffic volume is the dependent variable. Traffic volume data are likewise aggregated spatially in polluted and unpolluted zones for each day and temporally by workdays and holidays. The traffic volume in each polluted zone is the daily average volume of all traffic sites within the zone, and the same procedure applies to unpolluted zones. If a polluted or unpolluted zone does not contain any traffic sites, we extract traffic information from the nearest site. Next, three decision tree algorithms, namely RF, RT, and M5, and two comparison models, namely ANN and MLR, are used to model the relationships among the variables. Before running each algorithm, a scheme-specific attribute selection is performed to select highly relevant attributes. Finally, the model performances are compared through 10-fold cross-validation.
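The zone definition (contiguous areas on the same side of the AQI threshold) can be illustrated with a small flood-fill labeling routine. This is a sketch under assumptions: the interpolated AQI surface is represented as a plain list-of-lists grid, contiguity uses the 4-neighbour rule, and the Kriging step itself is not shown. The function name is hypothetical.

```python
from collections import deque


def label_zones(aqi_grid, threshold=50):
    """Label 4-connected regions of cells on the same side of the threshold.

    Returns (labels, zone_types): a grid of integer zone ids starting at 1,
    and a dict mapping each id to 'polluted' or 'unpolluted'.
    """
    rows, cols = len(aqi_grid), len(aqi_grid[0])
    labels = [[0] * cols for _ in range(rows)]
    zone_types = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue  # cell already belongs to a zone
            next_label += 1
            polluted = aqi_grid[r][c] > threshold
            zone_types[next_label] = "polluted" if polluted else "unpolluted"
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and (aqi_grid[ny][nx] > threshold) == polluted):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels, zone_types
```

Each labeled zone would then serve as the spatial unit for computing urban form metrics and averaging traffic volume.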

Figure 2. The flowchart of the study.
This study describes the urban form as the spatial configurations and arrangements of different land-use types. The relationships between air quality, traffic volume, and urban form might be heterogeneous in space and time. They may vary across different types of urban areas, during different periods of the day, or by type of day (workdays vs. holidays). Thus, the study applies three different approaches to construct the models and compares their results. The first approach uses all daily data without stratification to construct a general model with all algorithms. The second is the spatial stratification approach. It separates all sample data into two groups, those in polluted zones and those in unpolluted zones. A model is constructed separately for each type of zone with all algorithms. The third approach is the spatiotemporal stratification. With this approach, the spatially stratified groups are further classified by the type of day, namely workdays and holidays, and models are then constructed for each stratum. Because of the innate heterogeneity of the relationships, the study designs these three groups of experiments (i.e. no stratification, spatial stratification, spatiotemporal stratification) to examine the interplay between urban form, traffic volume, and air quality at different spatiotemporal granularities and configurations. This helps reveal the hidden relationships and find the most suitable model.
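The three experimental designs differ only in how the samples are grouped before modeling, which can be sketched as follows. The field names `zone` and `day_type`, the `DESIGNS` mapping, and the function name are hypothetical illustrations.

```python
from collections import defaultdict

# Grouping keys for the three designs described above.
DESIGNS = {
    "no_stratification": (),
    "spatial": ("zone",),
    "spatiotemporal": ("zone", "day_type"),
}


def stratify(samples, keys):
    """Group sample dicts by the given keys; an empty key tuple yields one group."""
    strata = defaultdict(list)
    for sample in samples:
        strata[tuple(sample[k] for k in keys)].append(sample)
    return dict(strata)


# Hypothetical samples; a separate model would be fitted on each stratum.
samples = [
    {"zone": "polluted", "day_type": "workday", "volume": 900},
    {"zone": "polluted", "day_type": "holiday", "volume": 600},
    {"zone": "unpolluted", "day_type": "workday", "volume": 500},
]
strata_by_design = {name: stratify(samples, keys) for name, keys in DESIGNS.items()}
```

With complete data, the spatial design yields two strata and the spatiotemporal design yields four.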
Data processing and descriptive analysis
Preliminary feature selection
Many methods (e.g. linear regression models, instance-based learning, and decision trees) are susceptible to irrelevant attributes because they fragment the instance space and increase the required number of training instances exponentially. Therefore, before the model construction process, it is necessary to select the most relevant and mutually independent attributes from the abovementioned pool of candidate variables. In this study, we used the scheme-specific selection to search for the attributes by a greedy stepwise (both forward and backward) algorithm. The 10-fold cross-validation was used to evaluate the accuracy and to make the final decision. The method evaluates the worth of a subset of attributes by considering each feature’s predictive ability and the degree of redundancy between them. Subsets of features that are highly correlated with the dependent variable while having low intercorrelation are preferred (Hall, 1999).
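The subset evaluation described above (high correlation with the dependent variable, low intercorrelation among features) matches the correlation-based merit of Hall (1999), merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the subset size, r_cf the mean feature–target correlation, and r_ff the mean feature–feature correlation. The following is a minimal sketch assuming Pearson correlation and a forward-only greedy search; the study itself used a bidirectional greedy stepwise search in Weka, and all function names here are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def cfs_merit(subset, features, target):
    """Correlation-based merit of a feature subset (higher is better)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def greedy_forward(features, target):
    """Add the feature that most improves the merit until nothing improves it."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        merit, name = max(
            (cfs_merit(selected + [f], features, target), f) for f in remaining
        )
        if merit <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best
```

A redundant copy of an already selected feature does not raise the merit, so it is not added, which is the behavior the redundancy term is designed to produce.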
Traffic volume
Figure 3 summarizes the traffic volume for all periods. On the one hand, traffic volumes during workdays are higher than during holidays in polluted zones. However, such a difference is not evident for unpolluted zones. On the other hand, traffic volume during workdays is higher in polluted zones than in unpolluted zones, but the volume is similar during holidays in both types of zones. Therefore, the spatiotemporal partition of traffic volume reveals potential differences in the relationships between air quality, traffic volume, and urban form. Moreover, the peak volume in the afternoon is overall higher than that in the morning, which indicates that trips are more concentrated in the afternoon. Peak volume usually leads to traffic congestion and worse air quality.

Figure 3. Traffic volume in polluted zones for (a) holidays and (b) workdays, and in unpolluted zones for (c) holidays and (d) workdays.
Urban form metrics
Previous studies did not consider each type of land use separately when calculating urban form metrics to analyze their relationships with air quality (Liu et al., 2017, 2018b). This study considers more nuanced information on urban morphology. We treat five land-use types individually (roads, buildings, waterways, vegetation, and industry) and calculate their corresponding metrics in Fragstats. The selected metrics include total class area (CA), patch density (PD), largest patch index (LPI), landscape shape index (LSI), and aggregation index (AI). Specifically, CA is the sum of all polluted zones in the research area, and it changes daily according to the AQI values monitored at the stations. PD (Irwin and Bockstael, 2007) indicates the degree of fragmentation or interspersion. Computationally, for a land-use type i (LUi), PDi is proportional to the number of LUi patches divided by the total area of the zone. LPI (Bereitschaft and Debbage, 2013) is the percentage of the total zone area occupied by the largest patch. LSI (Bereitschaft and Debbage, 2013) measures shape complexity. Computationally, it is the ratio between the total perimeter of LUi patches and the minimum total perimeter possible for a maximally aggregated class of LUi. AI (He et al., 2000) measures the degree of aggregation or clumping and only considers like adjacencies. Based on previous studies of how urbanization affects eco-environmental issues (Buyantuyev and Wu, 2010; Li et al., 2013; Wu et al., 2011), these five metrics can effectively represent the spatial configuration, agglomeration, and interspersion of urban land-use patterns. A summary of each metric is shown in Supplemental Table 1. These metrics are used to describe the morphological characteristics of both polluted and unpolluted areas.
This study focuses on the selected urban form metrics, as shown in Table 3. Figure 4 displays the selected metrics (AI, LPI, LSI, and PD) for polluted and unpolluted zones on both workdays and holidays. All selected urban form metrics follow a trend similar to that of the traffic volume. For example, AI–waterway, AI–building, and AI–road all peak around 18 March and drop immediately afterwards, which is similar to the pattern in Figure 3(a).
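To make the metric definitions concrete, the sketch below computes class area, PD, LPI, and AI for one land-use class on a small raster. It is an illustration rather than a reimplementation of Fragstats: patches use the 4-neighbour rule, areas are counted in cells with no unit scaling, the AI denominator follows the maximum like-adjacency rule of He et al. (2000) as recalled here, and the function names are hypothetical.

```python
from math import isqrt


def patch_sizes(grid, cls):
    """Sizes (in cells) of 4-connected patches of class `cls`."""
    rows, cols = len(grid), len(grid[0])
    seen, sizes = set(), []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != cls or (r, c) in seen:
                continue
            stack, size = [(r, c)], 0
            seen.add((r, c))
            while stack:  # depth-first flood fill over one patch
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == cls and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            sizes.append(size)
    return sizes


def class_metrics(grid, cls):
    """Class area, patch density, largest patch index, and aggregation index."""
    rows, cols = len(grid), len(grid[0])
    total = rows * cols
    sizes = patch_sizes(grid, cls)
    area = sum(sizes)
    # Like adjacencies, each shared edge counted once.
    g = sum(1 for r in range(rows) for c in range(cols - 1)
            if grid[r][c] == cls and grid[r][c + 1] == cls)
    g += sum(1 for r in range(rows - 1) for c in range(cols)
             if grid[r][c] == cls and grid[r + 1][c] == cls)
    # Maximum like adjacencies for `area` cells in the most compact shape.
    n = isqrt(area)
    m = area - n * n
    if m == 0:
        max_g = 2 * n * (n - 1)
    elif m <= n:
        max_g = 2 * n * (n - 1) + 2 * m - 1
    else:
        max_g = 2 * n * (n - 1) + 2 * m - 2
    return {
        "CA": area,
        "PD": len(sizes) / total,
        "LPI": 100 * max(sizes) / total if sizes else 0.0,
        "AI": 100 * g / max_g if max_g else 0.0,
    }
```

On a 4 × 4 grid holding one 2 × 2 patch and one isolated cell of the class, this gives PD = 0.125 patches per cell, LPI = 25, and AI = 80.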

Figure 4. Selected urban form metrics in polluted and unpolluted zones for both holidays and workdays. (a) Polluted zones during holidays; (b) polluted zones during workdays; (c) unpolluted zones during holidays; (d) unpolluted zones during workdays.
As for the difference between holidays and workdays, LPI–vegetation and LPI–road fluctuate more during workdays than during holidays, but only in polluted zones. The greater variance in polluted zones during workdays indicates more diverse trip destinations and a broader range of human mobility than on holidays, suggesting that people living in polluted zones may prefer to stay at home during holidays. Comparing the two zone types, almost all metrics fluctuate more drastically in polluted zones than in unpolluted zones. The comparison indicates that the spatial coverage of unpolluted zones is more stable throughout the year than that of polluted zones, which is consistent with the AQI data showing that the majority of days (200) are entirely unpolluted. This points to a spatial inequality issue: higher-income people residing outside of the city center enjoy better air quality. For the AI and LSI in the polluted zones, industry and road have the highest values, respectively, throughout the year. This implies that highly aggregated roads and industrial areas dominate the polluted zones and are likely the strongest indicators of traffic volume, which is consistent with the model results reported in a later section.
Model construction and comparison
Decision tree models
The decision tree is a classic machine learning method. It predicts the outcomes of new cases from historical data (training data) using a tree structure. In a decision tree, each internal node is a test on a feature (or attribute), each branch corresponds to an outcome of the test, and each leaf node is an output. The final tree includes all the paths from the root to the leaves, which can be used to predict classification or regression outputs. The tree structure examines every property of the predictors to make final decisions with strong generalization ability. It can effectively model complicated relationships between dependent and independent variables (Witten et al., 2016). Few studies have compared the performance of different decision tree models under the same circumstances for air quality research. Thus, three decision tree models (RF, RT, and M5) were applied, and their performances were compared in this study.
The RF is an ensemble of decision trees, each trained on a bootstrap sample of the data. At each node, the algorithm first selects a random feature subset and then splits the node (Supplemental Figure 1) using the best attribute in that subset according to a splitting criterion (e.g. entropy) (Breiman, 2001). The number of features in the subset controls the degree of randomness, which makes RF usually outperform ordinary bagging algorithms. The final classification and regression results are obtained by voting and averaging over all trees, respectively (Witten et al., 2016), which can avoid overfitting issues to a large extent (Breiman, 2001).
Like the RF, the RT is a supervised learning method that generates many individual learners and employs a bagging method to construct a random set of data for building a decision tree (Witten et al., 2016). One difference is that the RT only considers a certain number of random attributes when making decisions (Witten et al., 2016).
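The two sources of randomness in a random forest (bootstrap sampling and random feature subsets) and the averaging of regression outputs can be sketched with a deliberately simplified ensemble. The assumptions are significant: each learner is a one-split regression stump rather than a full tree, the feature subset is drawn once per learner rather than at every node, and all function names are hypothetical. This is not the Weka implementation used in the study.

```python
import random
from statistics import mean


def fit_stump(X, y, feature_ids):
    """Best single split (lowest squared error) over the candidate features."""
    best = None
    for j in feature_ids:
        for t in sorted({row[j] for row in X})[:-1]:  # keep both sides non-empty
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            ml, mr = mean(left), mean(right)
            sse = (sum((v - ml) ** 2 for v in left)
                   + sum((v - mr) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # candidate features are constant: predict the mean
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train stumps on bootstrap samples with random feature subsets."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Regression output: average of the individual learners' predictions."""
    return mean([ml if row[j] <= t else mr for j, t, ml, mr in forest])
```

For classification, the averaging in `predict` would be replaced by a majority vote over the learners.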
The M5 was proposed by Quinlan (1992). It is built by a binary division (Supplemental Figure 1) process with linear regression functions to model the relationship between the dependent and independent variables (Rahimikhoob et al., 2013). Two steps are involved in the process of generating an M5 tree. The first step splits the data into subsets to create a decision tree structure. The second step prunes the tree, and the pruned sub-trees are replaced with linear regression functions to overcome the overfitting or poor generalization issues (Quinlan, 1992). The M5 algorithm calculates the standard deviation of the subset values as a measure of error and estimates the expected reduction of error after testing each attribute at that node (Rahimikhoob et al., 2013). The main advantage of the M5 algorithm lies in its comprehensive and simple form of regression.
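The splitting criterion described above is commonly written as the standard deviation reduction, SDR = sd(T) − Σ_i (|T_i| / |T|) · sd(T_i), where T is the set of target values reaching a node and T_i are the subsets produced by a candidate split. The sketch below searches for the binary split on one attribute that maximizes the SDR. It assumes the population standard deviation and hypothetical function names, and omits the pruning and leaf-regression steps.

```python
from statistics import pstdev


def sdr(values, subsets):
    """Standard deviation reduction achieved by splitting values into subsets."""
    n = len(values)
    return pstdev(values) - sum(len(s) / n * pstdev(s) for s in subsets)


def best_split(x, y):
    """Return (gain, threshold) of the binary split on x with the largest SDR.

    The threshold is None when no split reduces the standard deviation.
    """
    pairs = sorted(zip(x, y))
    best = (0.0, None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical attribute values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        gain = sdr(y, [left, right])
        if gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best
```

For x = [1, 2, 3, 4] and y = [10, 10, 20, 20], the best split is at 2.5 and removes all of the variation, giving an SDR of 5.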
All three algorithms can be applied for both quantitative and qualitative data, and they can be used to solve classification or regression type of problems. However, they have different tree structures and use different ways to divide sample spaces (Rahimikhoob et al., 2013; Yaseen et al., 2018). The open-source software Weka is used to construct and test the three decision tree models in this study. The inputs are the CSV files containing all traffic volume and other independent variables. We partitioned all the data into two subsets. One is for training purposes and the other is for testing purposes.
Comparison models
To compare the results with the three decision tree models, we also applied MLR, which is commonly used in previous studies, and the ANN, which is another type of machine learning model. Supplemental Figure 2 shows the construction process for ANN.
Results and discussions
Model results
No stratification
Table 1 shows that all models perform unsatisfactorily, with a small R (correlation coefficient) and a large root mean squared error (RMSE). The MLR model’s performance is similar to that of the three decision tree models, while the ANN performs the worst. The very low R values (<0.3) indicate that the interplay between urban form, traffic volume, and air quality cannot be well captured by any of the unstratified models. Furthermore, no urban form factors were selected by any of the models. Therefore, we cannot examine the relationship between urban form and traffic volume with the unstratified models.
Table 1. Summary of model results without stratification.
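The two evaluation measures and the 10-fold partition can be sketched as follows. This is a minimal illustration with hypothetical function names: folds are assigned by index stride without shuffling, and the model fitting that would sit between the train and test steps is not shown.

```python
from math import sqrt


def correlation_coefficient(observed, predicted):
    """Pearson correlation (R) between observed and predicted values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    sop = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    soo = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sop / sqrt(soo * spp) if soo and spp else 0.0


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                / len(observed))


def k_fold_indices(n, k=10):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

A model would be fitted on each `train` list and scored on the matching `test` list, with R and RMSE then computed over the pooled held-out predictions.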
RF: random forest; RT: random tree; MLR: multiple linear regression; ANN: artificial neural network; RMSE: root mean square error.
Spatial stratification
Table 2 shows the model results when traffic volume is divided into polluted and unpolluted zones for all traffic sites. The results are much better than those in Table 1. The RF model overall outperforms the RT and M5 models, with medium R (0.5–0.8) and lower RMSE, especially for the polluted zones. This is reasonable because the RF model can better handle missing values than the RT and M5 models (Witten et al., 2016). The result is also compatible with a previous study (Yu et al., 2016). The RT model is inferior to the RF model in performance. The reason is probably that RT learning uses only one global setting of the ridge value for all leaves, and thus it may oversimplify the optimization procedure and induce unreasonably balanced trees (Grama and Rusu, 2017; Liaw, 2013). The M5 results suggest that the relationship may not be well modeled by a linear split. The MLR performs the worst of all models in “Polluted Zones”. For the “Unpolluted Zones”, the MLR performs a little better than the RT and M5, but still worse than the RF except for the periods “6 am–12 pm” and “Peak am”. The ANN model performs a little better than the MLR but still worse than the RF, especially for the “Polluted Zones”. It performs about as well as the other models for the “Unpolluted Zones”. In sum, the RF model still outperforms the other models when spatial stratification is applied.
Table 2. Summary of model results for polluted and unpolluted zones.
RF: random forest; RT: random tree; MLR: multiple linear regression; ANN: artificial neural network; RMSE: root mean square error; LPI: largest patch index; AI: aggregation index; PD: patch density; LSI: landscape shape index.
When comparing with different time periods of traffic volume, we find that the RF shows similar results for all periods in polluted zones (R = 0.75) and slightly different for unpolluted zones. RT performs better during periods from 6 am to 10 pm in polluted zones (R = 0.67). The M5 gives the best results during periods from 7 am to 7 pm and the afternoon peak hour for polluted zones (R = 0.59). Overall, all models perform better for the polluted zones than for the unpolluted zones for traffic volumes in all periods, demonstrating that the urban form metrics have more pronounced effects on traffic volume in polluted zones than in unpolluted zones.
As for the selected attributes, wind speed, temperature, LPI–waterway, AI–waterway, AI–road, LSI–vegetation, PD–industry, and LPI–industry are the most influential factors in polluted zones, which indicates that the dominance and clustered distribution of waterways, roads, and industry show a strong correlation (R > 0.7) with traffic volume. On the one hand, areas with highly aggregated roads and industrial land may have a higher demand for transportation, which leads to higher traffic volume. On the other hand, areas with a high density of vegetation and waterways are also attractive. For instance, 64% of citizens live within a 10-min walk of a park or lake in Atlanta (Saporta, 2013). For unpolluted zones, relative humidity, LPI–waterway, and LPI–vegetation play a more critical role than other attributes. However, it may be inappropriate to draw any further conclusions at this stage given the relatively weaker correlation (R < 0.6).
Spatiotemporal stratification
In this stratification scheme, we construct separate models for each spatiotemporal stratum based on a spatial partition (polluted and unpolluted zones) and a temporal partition (holidays and workdays). The periods of traffic volume are further classified into holidays and workdays. Table 3 shows the model results for holidays and workdays, each for both polluted and unpolluted zones. The RF model still performs the best. Besides, polluted zones still have higher R (0.5–0.8) than unpolluted zones for both holidays and workdays. All models perform best during holidays in polluted zones and during workdays in unpolluted zones.
Table 3. Summary of model results for spatiotemporally stratified data.
RF: random forest; RT: random tree; MLR: multiple linear regression; ANN: artificial neural network; RMSE: root mean square error; LPI: largest patch index; AI: aggregation index; PD: patch density; LSI: landscape shape index.
Overall, the spatiotemporal stratification approach provides the best results for revealing such interrelations. In Table 1, all models perform unsatisfactorily and no urban form factors are selected by any model. In Table 2, we find a strong correlation for polluted zones and a modest correlation for unpolluted zones, but the internal differences within each zone type remain hidden. In Table 3, such differences are revealed. For example, the model performances for unpolluted zones during workdays are much better than those for unpolluted zones during holidays, a difference that is masked in Table 2. Furthermore, all models show better results for holidays than for workdays in polluted zones, but the situation is reversed for unpolluted zones.
For selected features in polluted zones, several features remain unchanged compared with Table 2. Still, the feature LPI–building has been chosen by most models for holidays, suggesting that the dominance of buildings is associated with the traffic volume during holidays. It reflects the socio-spatial dimension of human mobility. People prefer to get together during holidays, and thus places with concentrated buildings (e.g. churches, restaurants) could attract more people, and the traffic volume increases consequently. The situation is similar for PD–building in unpolluted zones during workdays. Besides, AI–waterway, LPI–waterway, and LPI–vegetation show a higher correlation (R > 0.8) with traffic volume in unpolluted zones during workdays. This suggests that people's trip purposes and route selection priorities vary in space. For instance, for medium- and high-income people who live in unpolluted zones, such as the northern part of Atlanta city (Jang and Yao, 2014), the level of comfort and landscape attractiveness of a trip route might be of high priority. The dominance and aggregation of waterways and vegetation usually imply greenspaces in the city, which would be attractive for travelers.
Discussions
Many previous studies have analyzed how urban form could affect air quality by categorizing urban morphology (Shi et al., 2018), estimating traffic emissions (Peters and Jones, 2003), quantifying energy consumption (Yang and Li, 2011), or analyzing effects on urban climates, such as O3 formation (Clark et al., 2011), urban heat island effects (Breuste et al., 2013), and photochemical pollution (Fallmann et al., 2014). However, few studies have explored the interplay between urban form, traffic, and air quality, considered land use and urban form simultaneously, and compared the results from multiple nonlinear models.
Our experiments show, first, that the interrelationships become progressively clearer from Table 1 to Table 3; the spatiotemporal stratification approach reveals them best. Second, traffic volume generally shows higher correlations with urban form in polluted zones than in unpolluted zones. In terms of specific attributes, highly aggregated roads and industrial areas (e.g. AI–industry, AI–building, LPI–road, LPI–industry) are closely correlated with traffic volume in polluted zones because of the high transportation demand of areas such as central business districts and populated residential communities. The dominance of waterways and vegetation (e.g. AI–waterway, AI–vegetation, LPI–waterway, LPI–vegetation) also shows a high correlation with traffic volume in unpolluted zones because of the location preferences of human activities, which indicates that urban vegetation and waterways play a significant role in alleviating air pollution even where traffic volumes are high. Therefore, adding more greenspaces and waterfronts is useful for improving the living and working environment, especially in regions with high residential and employment density. The degree of association also differs between holidays and workdays, suggesting synergistic effects of work schedules and urban form on traffic volume.
With regard to model performance, the RF model outperforms all other models with higher R and lower RMSE.
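To make the evaluation logic concrete, the following is a minimal sketch of how R and RMSE could be computed separately for each spatiotemporal stratum. It is not the code used in the study: the record fields (`zone`, `day_type`, `observed`, `predicted`) and all numeric values are hypothetical and serve only as an illustration.

```python
import math
from collections import defaultdict


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error between observations and predictions."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def metrics_by_stratum(records):
    """Group records by (zone, day_type) and score each group on its own."""
    groups = defaultdict(lambda: ([], []))
    for rec in records:
        obs, pred = groups[(rec["zone"], rec["day_type"])]
        obs.append(rec["observed"])
        pred.append(rec["predicted"])
    return {
        key: {"R": pearson_r(obs, pred), "RMSE": rmse(obs, pred)}
        for key, (obs, pred) in groups.items()
    }


# Hypothetical traffic volumes, invented for illustration only.
records = [
    {"zone": "polluted", "day_type": "workday", "observed": 10.0, "predicted": 12.0},
    {"zone": "polluted", "day_type": "workday", "observed": 20.0, "predicted": 19.0},
    {"zone": "polluted", "day_type": "workday", "observed": 30.0, "predicted": 33.0},
    {"zone": "unpolluted", "day_type": "holiday", "observed": 5.0, "predicted": 5.0},
    {"zone": "unpolluted", "day_type": "holiday", "observed": 7.0, "predicted": 8.0},
    {"zone": "unpolluted", "day_type": "holiday", "observed": 9.0, "predicted": 8.0},
]

results = metrics_by_stratum(records)
for stratum, scores in sorted(results.items()):
    print(stratum, scores)
```

Scoring each stratum separately is what allows differences such as workday versus holiday performance to surface; pooling all records into one score would average them away.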
Conclusions
Machine learning models are well suited to modeling complicated relationships. Compared with traditional models, they do not impose a linear assumption. For example, decision tree models provide a comprehensive analysis of outcomes and identify the possible consequences of a decision by tracing each path. Moreover, they can indicate feature importance, which is hidden in a black-box modeling process such as the ANN (Witten et al., 2016). In this study, we construct three decision tree models and two comparison models to understand the interplay between urban form, traffic volume, and air quality and to explore how urban form affects human mobility patterns across different spatiotemporal partitions. The results show that RF simulates the relationship best, with higher R and lower RMSE. By experimenting with three different schemes, the study finds that spatiotemporal stratification is the best approach. The association between traffic volume and urban form metrics is generally higher in polluted zones than in unpolluted zones and stronger during workdays than holidays. Considering specific urban form features, highly aggregated roads and industrial areas are closely correlated with traffic volume in polluted zones, and the dominance of waterways and vegetation also shows a high correlation with traffic volume in unpolluted zones. The research reveals interesting patterns in the impact of urban form on traffic behavior in different spatial and temporal scenarios. Such information can help urban planners and practitioners make informed decisions. Although these findings are informative and useful, several limitations need to be considered carefully and can be addressed in future research.
Data availability
Changes in the background concentration of air pollutants need to be considered; future research could address this by collecting field air quality data (e.g. with portable sensors). It would also be useful to combine traffic volume data for more fine-grained analysis. In addition, the air quality data are site-based. Fixed-site monitoring is usually applicable where air pollutants are routinely released or where the release occurs over a period long enough to establish a monitoring system (Covello and Merkhoher, 2013). Nonetheless, site-based measurements may still introduce errors during extrapolation. Therefore, it could be helpful to use satellite-derived data to simulate the distribution of air pollutants (e.g. NO2 (Bechle et al., 2011; Li et al., 2018) and PM2.5 (Wu et al., 2018)). Finally, the traffic volume data were estimated from original data collected in six fixed periods. Future research would benefit from field traffic volume data with finer temporal granularity and larger spatial coverage.
Meteorological factors
In this study, we considered wind speed, temperature, relative humidity, and precipitation. However, we did not use wind direction data because an average wind direction on a daily scale would be less meaningful. Further research could therefore use hourly data to improve temporal granularity, given that wind speed and wind direction improve model performance more for traffic-related air pollutants (e.g. NO2, NO, and PM2.5) than for other pollutants (Contreras and Ferri, 2016). In addition, O3, which is not directly related to vehicle emissions, was another defining parameter of AQI in Atlanta in this study. This indicates that polluted urban areas defined by AQI may not perfectly reflect traffic volume.
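One common reason a daily average of wind direction carries little meaning is that direction is a circular quantity. The sketch below is an illustration under that assumption, not part of the study's method: it contrasts a plain arithmetic mean with a vector (circular) mean and reports the mean resultant length, which is close to 1 when directions agree and close to 0 when they are scattered. The function names and sample readings are hypothetical.

```python
import math


def arithmetic_mean(angles_deg):
    """Plain average of the angles; ignores the wrap-around at 360 degrees."""
    return sum(angles_deg) / len(angles_deg)


def vector_mean_direction(angles_deg):
    """Return (mean direction in degrees, mean resultant length in [0, 1]).

    Each reading is treated as a unit vector. A resultant length near 0
    means the readings cancel out and the mean direction is uninformative.
    """
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    direction = math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0
    resultant = math.hypot(sin_sum, cos_sum) / len(angles_deg)
    return direction, resultant


# Hypothetical readings: two near-northerly winds on either side of 0 degrees.
steady = [350.0, 10.0]
# Hypothetical readings: wind turning through all four quadrants in one day.
variable = [0.0, 90.0, 180.0, 270.0]

print(arithmetic_mean(steady))          # misleading: points roughly south
print(vector_mean_direction(steady))    # direction near north, length near 1
print(vector_mean_direction(variable))  # length near 0: no dominant direction
```

For the two near-northerly readings, the arithmetic mean is 180 degrees, the opposite of the actual wind. For the scattered readings, even the vector mean has a resultant length close to zero, which is consistent with preferring hourly data over a single daily value.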
Model construction
We stratified traffic volume along spatiotemporal dimensions; further research could also stratify by other factors, such as race, gender, and trip purpose. Other potential factors, such as socioeconomic variables and different building types, could be added as well. How these variables play out similarly or differently on a micro-spatiotemporal scale deserves further exploration. Moreover, we used a single AQI threshold of 50 to divide the research area, given that only eight samples throughout the year have an AQI larger than 100. This is appropriate for cities with relatively low levels of air pollution. In future studies, however, it would be better to select multiple thresholds, determined by a computational or experimental method, to further divide the region. Finally, more advanced methods, such as deep learning models, could be applied to further explore the interplay between urban form, traffic volume, and air quality.
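As a rough illustration of how the single threshold could be generalized, the sketch below assigns an AQI value to one of several zones using a sorted list of thresholds. The second threshold of 100, the zone labels, and the convention that a value equal to a threshold falls in the lower zone are all assumptions made for this example; the study itself only uses the threshold of 50.

```python
from bisect import bisect_left


def classify_aqi(aqi, thresholds=(50, 100),
                 labels=("unpolluted", "moderately polluted", "heavily polluted")):
    """Map an AQI value to a zone label.

    `thresholds` must be sorted in ascending order. A value equal to a
    threshold is assigned to the lower zone (an assumption of this sketch).
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    # bisect_left counts how many thresholds are strictly below `aqi`.
    return labels[bisect_left(thresholds, aqi)]


# The single-threshold case corresponds to a two-zone division.
print(classify_aqi(60, thresholds=(50,), labels=("unpolluted", "polluted")))

# Hypothetical three-zone division with an additional threshold at 100.
for value in (42, 50, 51, 100, 120):
    print(value, classify_aqi(value))
```

Keeping the thresholds as a parameter means that values chosen by a computational or experimental method can be substituted without changing the classification logic.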
Supplemental Material
Supplemental material, sj-pdf-1-epb-10.1177_2399808321995822 for Urban form, traffic volume, and air quality: A spatiotemporal stratified approach by Ye Tian and Xiaobai Yao in EPB: Urban Analytics and City Science
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
