Assessing impacts of the built environment on mobility: A joint choice model of travel mode and duration

Abstract

This paper introduces a joint choice model for travel mode and duration to quantify the mobility impacts of urban design changes to the built environment. The model is formulated as a Random Forest classifier that predicts the mode-duration probabilities of a given trip. A novel series of predictor features is proposed that measures urban form, demographics, and service densities on different scales of the transportation network. Through a sensitivity analysis and a proof-of-concept case study, we find that a dense, mixed-use environment with good coverage of a multi-modal mobility network can significantly promote active transportation and public transit use. However, we also find that ultra-dense, centralized developments can lead to increased travel time and increased vehicle use in the urban periphery. Our modeling and analysis method provides a simplified and effective way to assess urban design and planning scenarios from different mobility perspectives, and it facilitates data-driven, mobility-aware urban design and planning that can help identify better solutions more quickly.

Keywords

urban mobility planning, urban design, machine learning, travel behavior

Introduction

The global urban population is projected to grow significantly by 2050 (United Nations, 2018), which indicates that existing urban mobility problems like congestion and pollution could be further aggravated in the coming thirty years. It is important for future cities to reduce vehicle reliance and promote alternative transport solutions such as walking, biking, and public transit (Rupprecht Consult, 2019). For example, the California Transportation Plan 2050 envisions reducing total vehicle distance traveled by up to 27% and increasing trips by public transit and active modes by 10% to mitigate congestion, protect the environment, and enhance the quality of life.

Well-informed, mobility-aware decision-making in spatial planning and urban design is imperative to realizing these goals. Built environment factors, such as the placement of urban services, population density allocation, and network design, have a fundamental impact on human travel behavior, such as mode choice and travel distance (De Vos et al., 2021; Ding et al., 2017; Litman, 2022; Wang and Zhou, 2017). Although these factors are found to have modest significance on their own, typically just a few percent of total travel choices, they are synergistic and therefore have a significant combined effect on aggregate travel behavior (Ewing and Cervero, 2010). Urban infrastructure and spatial patterns are much more costly to change once established than policies like parking pricing or gasoline taxes. Hence, there is a consensus that good spatial planning and urban design are a reliable, long-term, and cost-effective approach to alleviating mobility problems (Ma et al., 2018).

To achieve mobility-aware decision-making in urban design, it is necessary to quantify the built environment’s influence on travel choice behavior. The main approach is to link built environment variables and the travel choice probabilities through statistical models. The commonly used models include discrete choice models (Ding et al., 2018; Yin et al., 2022) and regression models (Berhie and Haq, 2017; Lee et al., 2016). The most analyzed travel choice is the mode choice. In terms of built environment variables, most studies used density, diversity, and design indicators, the so-called 3Ds (Cervero and Kockelman, 1997). Density variables can be population, employment, or activity densities. Diversity variables can be measurements of job-housing ratios or the mix of land use types. Design variables can be descriptions of network characteristics, such as connectivity or street segment length, or street quality, including tree canopy or reported pleasantness. Other variables that are often accounted for in these studies are variables related to self-selection, such as sociodemographic variables and attitudinal variables (Aston et al., 2021).

The existing approach can be developed further. Firstly, there is a growing interest in switching from traditional statistical models to data-driven machine-learning models. Studies demonstrate that machine learning methods are more robust, flexible, and generalizable for mobility modeling because they are purely driven by data and do not introduce predefined utility functions or model structures (e.g., linear or log-linear) (Drchal et al., 2019; Zhang and Zhao, 2022). Evidence also shows that machine-learning classification models have overall higher predictive power than traditional discrete choice models (Cheng et al., 2019; Zhao et al., 2020). Secondly, there is potential for analyzing more integrated mobility metrics rather than just the modal split. This can be realized using a joint choice model. For example, previous studies have investigated joint choices of travel mode and travel distance (Ding et al., 2014; Vega and Reynolds-Feighan, 2009). Beyond these studies, this type of model has received limited attention to date. Thirdly, there is room to expand the definition of the built environment variables. The current definitions of the 3D variables are usually restricted to a small spatial range or buffer zone (e.g., density within a walkable distance of 1.5 kilometers or a 15-minute walk) (Berhie and Haq, 2017; Gan et al., 2021). These variables may be sufficient for modeling walking preferences. However, in a multi-modal system, different travel modes can be sensitive to different spatial ranges. It can be beneficial to include built environment variables measured on different scales of the transportation network to comprehensively reflect the trade-offs of multi-modal choices. Yet there is limited research examining how built environment variables vary across measurement scales.

This paper aims to build on previous studies and propose a data-driven joint choice model of travel mode and duration with a novel set of built environment variables. The joint choice model is formulated as a machine learning classifier that predicts the mode-duration probabilities of a given trip. The novel built environment variables are a series of accessibility (ACC) features that measure the total number of jobs of different types within different mode-duration levels from the starting location. As a proof-of-concept example, we train the model on travel survey data and use it to predict the aggregate mode-duration distributions in the City of Los Angeles (LA City) at the Census Block Group (CBG) level. A sensitivity analysis is conducted to evaluate the marginal effects of the built environment features. A case study is used to showcase how the trained model can help answer what-if questions about built environment scenarios.

Data processing and feature engineering

Travel features

The travel data are sourced from the 2017 National Household Travel Survey Add-On data for California (NHTS) (Caltrans, 2017), which contain 81474 trip records within LA City made by 19311 unique individuals after data cleaning. Table 1 defines the discretized levels of the mode-duration choice set. Each choice combines a mode (m) and a corresponding duration level (d), such as walking for 0–5 minutes. The mode refers to the dominant travel mode of each trip. The trip duration is self-reported by recording the start time and the end time. The duration levels are defined based on the 20%, 40%, 60%, and 80% quantiles of the durations of all trips by the mode in the NHTS data.

Table 1.

Mode-duration choice set.

Mode m^a	Duration level d (minutes), given as min-max (median)
	d₁	d₂	d₃	d₄	d₅
Walk (10.1%)	0–5 (5)	5–7 (7)	7–10 (10)	10–15 (15)	>15 (26)
Bike^b (1.6%)	0–8 (5)	8–15 (10)	—	15–30 (22)	>30 (45)
Vehicle (86.4%)	0–8 (5)	8–13 (10)	13–20 (15)	20–30 (30)	>30 (45)
Transit^c (1.9%)	0–25 (7)	25–35 (30)	35–50 (45)	50–70 (60)	>70 (95)

^aTrips that are taken by modes other than walking, biking, vehicle, and transit are dropped (2.5% of all trip records before dropping).

^bThe bike mode has the same 40% and 60% quantile duration (i.e., both are 15 minutes) thus it has one less duration level.

^cThe duration of the transit trip is derived by subtracting the self-reported waiting time from the total travel duration.
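The quantile-based discretization behind Table 1 can be illustrated with a minimal sketch. It assumes the cut points are the 20/40/60/80% quantiles of one mode's trip durations and that identical cut points merge into a single edge, which is how the bike mode ends up with one fewer level. The function names, the inclusive upper boundaries, and the consecutive renumbering of levels are assumptions made for illustration, not details taken from the paper.

```python
from statistics import quantiles

def duration_level_edges(durations):
    """Return the sorted, de-duplicated 20/40/60/80% quantile cut points."""
    # quantiles(n=5) returns the four cut points that split the data into fifths.
    cuts = quantiles(durations, n=5, method="inclusive")
    # Identical cut points (as reported for biking) collapse into one edge.
    return sorted(set(cuts))

def assign_level(duration, edges):
    """Return a 1-based level index; the last level is open-ended.

    Assumption: a duration equal to an edge belongs to the lower level.
    """
    for k, edge in enumerate(edges, start=1):
        if duration <= edge:
            return k
    return len(edges) + 1
```

Note that this sketch numbers the remaining levels consecutively, whereas Table 1 keeps the labels d₁, d₂, d₄, d₅ for the bike mode.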

Table 2 describes the features of the trip and the traveler. The trips are geo-coded by specifying the Census Block Group (CBG) of the origin location. The travelers are identified by the cluster membership regarding eight traveler clusters. These are clusters representing distinct travel patterns of individuals across the US. The derivation and the validation of these traveler clusters have been extensively discussed in a previous paper (Yang et al., 2023).

Table 2.

Description of travel features.

Abbr.	Feature Name	Type	Description/Levels
Trip features
—	Origin CBG	Numerical	The FIPS^a code of the CBG in which the trip started
SSN	Season	Categorical	0 = winter (Dec, Jan, Feb); 1 = spring (Mar, Apr, May); 2 = summer (Jun, Jul, Aug); 3 = fall (Sep, Oct, Nov)
WKD	Weekend	Categorical	0 = weekday; 1 = weekend
TOD	Time of day	Categorical	0 = early morning (12am–6am); 1 = morning (6am–11am); 2 = noon (11am–3pm); 3 = afternoon (3pm–5pm); 4 = evening (5pm–9pm); 5 = night (9pm–12am)
SAC	Starting activity	Categorical	0 = from other; 1 = from home; 2 = from work; 3 = from education; 4 = from shop/errands; 5 = from recreation; 6 = from food service
DAC	Destination activity	Categorical	0 = to other; 1 = to home; 2 = to work; 3 = to education; 4 = to shop/errands; 5 = to recreation; 6 = to food service
Traveler features
CLS	Traveler cluster	Categorical	1 = cluster 1; …; 8 = cluster 8

^aFIPS stands for the Federal Information Processing Standard codes which uniquely identify the CBG.

Built environment features

The built environment features, as shown in Table 3, include the population density (POP) and a series of accessibility (ACC) features. The POP is sourced from the American Community Survey 5-year Estimates (US Census Bureau, 2018). The job count information for deriving ACC features is sourced from the LEHD Origin-Destination Employment Statistics (LODES) (US Census Bureau, 2017). The LODES data record the number of different jobs at the level of census blocks. The jobs are categorized by the NAICS code, a federal standard for classifying businesses and industries. All features in Table 3 are matched to the trip data in Table 2 by the trip origin CBG.

Table 3.

Description of built environment features.

Abbr.	Feature Names	Type	Description
—	CBG	Numerical	The FIPS code of the CBG
POP	Population density	Numerical	No. residential population per square kilometer
ACC	Accessibility features
ACC(RT/ER)	Accessible retail/errands service density by mode-duration levels	Numerical	No. retail/errands (RT/ER) service jobs (NAICS sector 42, 44–45, 48–49, 53, 81) by mode-duration levels
ACC(REC)	Accessible recreational service density by mode-duration levels	Numerical	No. recreational (REC) service jobs (NAICS sector 71) by mode-duration levels.
ACC(FOD)	Accessible food service density by mode-duration levels	Numerical	No. food (FOD) service jobs (NAICS sector 72) by mode-duration levels
ACC(BS/AD)	Accessible business/administrative service density by mode-duration levels	Numerical	No. business/administrative (BS/AD) service jobs (NAICS sector 56, 92) by mode-duration levels
ACC(MAN)	Accessible manufacturing service density by mode-duration levels	Numerical	No. manufacturing (MAN) service jobs (NAICS sector 11, 21–23, 31–33) by mode-duration levels
ACC(PR/TE)	Accessible professional/technical service density by mode-duration levels	Numerical	No. professional/technical (PR/TE) service jobs (NAICS sector 51–52,54–55, 62) by mode-duration levels
ACC(EDU)	Accessible educational service density by mode-duration levels	Numerical	No. educational (EDU) service jobs (NAICS sector 61) by mode-duration levels

The ACC features of a CBG are derived by counting the jobs of each category that fall within each mode-duration level from the CBG. Each ACC feature is denoted as ACC(X)_m_d (e.g., ACC(EDU)_walk_0-5min), where X is the category abbreviation listed in Table 3. Equation (1) defines the ACC features

$$ACC(X)_{m, d} = \sum_{i:\, d_{m, i} = d} N_{X, i} \qquad (1)$$

where $N_{X, i}$ is the number of jobs of category X within CBG i, and $d_{m, i}$ denotes the duration level between the analyzed CBG and CBG i by mode m. There are no ACC features for the longest duration level (d₅) because that level has no upper bound on duration, so the number of jobs within d₅ is unbounded for any mode.
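Equation (1) can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: the duration level from the analyzed CBG to every other CBG is already known for one mode, each destination CBG has exactly one level (the multi-point handling of large CBGs is not covered), and the function name and input layout are hypothetical.

```python
from collections import defaultdict

def acc_features(duration_level_to, jobs_by_cbg, open_level=5):
    """Sum jobs per (category, level) over destination CBGs, as in equation (1).

    duration_level_to: {cbg_id: level} for one origin CBG and one mode.
    jobs_by_cbg: {cbg_id: {category: job_count}}.
    """
    acc = defaultdict(float)
    for cbg, level in duration_level_to.items():
        if level == open_level:
            continue  # the open-ended longest level gets no ACC feature
        for category, count in jobs_by_cbg.get(cbg, {}).items():
            acc[(category, level)] += count
    return dict(acc)
```

Calling the function once per mode and prefixing the keys with the mode name would give one feature per category, mode, and duration level, matching the ACC(X)_m_d naming.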

The duration level between the CBGs is derived through a mode-specific routing process. Figure 1 illustrates the accessible CBGs within different walking duration levels from block group A. Note that most CBGs use the geometric center as the router point, while large ones like block group B ($\geq$ 1 square kilometer) use multiple equally spaced points (at an interval of 900 meters) on the boundary as the router points. As a result, B falls in both d₂ and d₃ due to its multiple router points.

Figure 1.

(a) CBG router points. (b) (c) Accessible CBG within different walking duration levels from A.

We develop a custom program written in C# for deriving ACC features for any given location across the U.S. The mode-specific routing process is conducted using Itinero (Itinero, 2019), an open-source routing package based on the .NET framework. The walking, biking, and vehicle trips are routed on the street network, sourced from OpenStreetMap (OpenStreetMap, 2017), with pre-defined mode-specific travel speeds. The transit routing is based on a separate transit network, including lines and stops of subways and buses, sourced from TransitFeeds (TransitFeeds, 2015). The transit routing allows switching between different lines and accounts for the walking time to, from, and between stops.
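To make the idea of routing with a fixed mode-specific speed concrete, the following sketch computes shortest-path durations on a toy street graph with Dijkstra's algorithm. It is not the authors' C# program and does not use the Itinero API; the graph layout, the function name, and any speed value passed in are assumptions for illustration only.

```python
import heapq

def travel_minutes(graph, origin, speed_m_per_min):
    """Dijkstra shortest-path durations (minutes) from origin to reachable nodes.

    graph: {node: [(neighbor, length_in_meters), ...]}.
    speed_m_per_min: assumed constant travel speed for one mode.
    """
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        minutes, node = heapq.heappop(queue)
        if minutes > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, length_m in graph.get(node, []):
            candidate = minutes + length_m / speed_m_per_min
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best
```

The resulting durations could then be binned into the duration levels of Table 1 for the corresponding mode.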

Modeling and analysis methodology

Mode-duration choice model

For any given trip with travel features and built environment features, a choice model is trained to predict its mode-duration choice probabilities, denoted as MD in equation (2)

$$MD = [p_{m_{1}, d_{1}}, \dots, p_{m_{i}, d_{j}}] \qquad (2)$$

where $p_{m_{i}, d_{j}}$ is the probability of choosing mode $m_{i}$ and duration level $d_{j}$ ($\sum p_{m_{i}, d_{j}} = 1$).
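One way to picture the joint choice is as a single class label per mode-duration pair. The sketch below enumerates the choice set implied by Table 1, which gives 19 classes because the bike mode has no d₃ level. The variable names and the validity check are hypothetical helpers, not part of the paper.

```python
# Duration levels available per mode, following Table 1 (bike has no d3).
LEVELS_BY_MODE = {
    "walk": [1, 2, 3, 4, 5],
    "bike": [1, 2, 4, 5],
    "vehicle": [1, 2, 3, 4, 5],
    "transit": [1, 2, 3, 4, 5],
}

# One class label per (mode, level) pair, in a fixed order.
CHOICES = [(m, d) for m, levels in LEVELS_BY_MODE.items() for d in levels]
CLASS_INDEX = {choice: i for i, choice in enumerate(CHOICES)}

def is_valid_md(md, tol=1e-9):
    """Check that an MD vector is a probability distribution over CHOICES."""
    return (
        len(md) == len(CHOICES)
        and all(p >= 0 for p in md)
        and abs(sum(md) - 1.0) <= tol
    )
```

Any classifier that outputs class probabilities over these labels yields an MD vector in the sense of equation (2).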

For training, each numerical feature is scaled into the range of zero to one, and each categorical feature is one-hot encoded (i.e., encoded into multiple zero-one features, one per discretized level). This process yields 141 final predictor features for the choice model. The classification model is trained through a five-fold cross-validation process in which the dataset is split into a different 80% training set and 20% testing set in each of the five iterations. The aggregate-level goodness-of-fit is measured by the L1 Norm as defined in equation (3)

$$\text{L1 Norm} = \sum_{m, d} | \bar{p}_{m, d} - \hat{p}_{m, d} | \qquad (3)$$

where $\bar{p}_{m, d}$ is the predicted probability of the choice (m, d) averaged over all data samples, and $\hat{p}_{m, d}$ is the observed percentage of the choice in the dataset. A lower L1 Norm indicates a higher accuracy in predicting the aggregate distribution of choices. Multiple classification algorithms that can output probabilities, including Random Forest, AdaBoost, Naïve Bayes, Neural Network, and K-Nearest Neighbor, are tested to select the best-performing algorithm.
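The preprocessing steps and the L1 Norm of equation (3) can be sketched as follows. This is an illustrative sketch rather than the authors' pipeline: the helper names are hypothetical, class labels are assumed to be integer indices, and the predicted probabilities are assumed to be averaged over all samples before comparison with the observed shares.

```python
def min_max_scale(values):
    """Scale numerical values into [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(value, levels):
    """Encode a categorical value as zero-one indicators over its levels."""
    return [1 if value == level else 0 for level in levels]

def l1_norm(predicted_probs, observed_labels, n_classes):
    """Equation (3): sum over classes of |mean predicted prob - observed share|."""
    n = len(observed_labels)
    mean_pred = [sum(row[c] for row in predicted_probs) / n for c in range(n_classes)]
    observed = [observed_labels.count(c) / n for c in range(n_classes)]
    return sum(abs(p - o) for p, o in zip(mean_pred, observed))
```

Because both vectors are probability distributions, the measure ranges from 0 (identical aggregate distributions) to 2 (completely disjoint distributions).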

Analysis metrics

With the individual-level MD per trip predicted by the choice model, the aggregate CBG-level MD per trip, denoted as $\tilde{MD}$, can be calculated as in equation (4)

$$\tilde{MD} = \sum_{k} MD_{k} \cdot C_{k} \qquad (4)$$

where $MD_{k}$ refers to the predicted MD for traveler cluster k, and $C_{k}$ is the percentage of cluster k in the total population of the CBG. The derived $\tilde{MD}$ takes the same format as MD (equation (2)), where each element is denoted as $\tilde{p}_{m_{i}, d_{j}}$.
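Equation (4) is a weighted sum of per-cluster probability vectors. A minimal sketch, assuming the cluster shares are given as fractions that sum to one and using a hypothetical function name:

```python
def aggregate_md(md_by_cluster, cluster_share):
    """Equation (4): cluster-share-weighted sum of per-cluster MD vectors.

    md_by_cluster: {cluster_id: [probabilities in a fixed choice order]}.
    cluster_share: {cluster_id: share of the CBG population, as a fraction}.
    """
    n = len(next(iter(md_by_cluster.values())))
    total = [0.0] * n
    for k, md in md_by_cluster.items():
        for i, p in enumerate(md):
            total[i] += cluster_share[k] * p
    return total
```

If every per-cluster vector sums to one and the shares sum to one, the aggregate vector also sums to one.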

Two aggregate-level mobility metrics can be derived. The first metric is the aggregate mode percentage (i.e., the modal split, with $\sum_{m} M_{m} = 1$), denoted as $M_{m}$ in equation (5)

$$M_{m} = \sum_{j} \tilde{p}_{m, d_{j}} \qquad (5)$$

The second metric is the estimated duration per trip by mode m, denoted as $D_{m}$ in equation (6)

$$D_{m} = \sum_{j} E[d_{j}] \times \frac{\tilde{p}_{m, d_{j}}}{M_{m}} \qquad (6)$$

where $E[d_{j}]$ refers to the expected travel duration of level $d_{j}$ for mode m. In this paper, we use the median durations in Table 1 as $E[d_{j}]$. An increased $D_{m}$ indicates more long-duration trips and fewer short-duration trips by that mode in the analyzed CBG.
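Equations (5) and (6) can be computed directly from the aggregate probabilities. The sketch below assumes the probabilities are stored in a dictionary keyed by (mode, level) and that the expected duration per level is supplied by the caller, for instance the medians in Table 1. The function names and data layout are hypothetical.

```python
def modal_split(md_tilde):
    """Equation (5): M_m, the sum of aggregate probabilities over a mode's levels.

    md_tilde: {(mode, level): probability}.
    """
    split = {}
    for (mode, _level), p in md_tilde.items():
        split[mode] = split.get(mode, 0.0) + p
    return split

def duration_per_trip(md_tilde, expected_minutes):
    """Equation (6): D_m, the expected duration conditional on choosing mode m."""
    split = modal_split(md_tilde)
    result = {}
    for (mode, level), p in md_tilde.items():
        if split[mode] > 0:  # skip modes with zero probability mass
            share_within_mode = p / split[mode]
            result[mode] = result.get(mode, 0.0) + expected_minutes[(mode, level)] * share_within_mode
    return result
```

Dividing by $M_{m}$ renormalizes the probabilities within each mode, so $D_{m}$ reflects the duration distribution among trips of that mode only and is not affected by how popular the mode is.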

Sensitivity analysis

The sensitivity analysis examines the effect of built environment features on the predicted MD by holding all other features constant while changing only POP and ACC. Table 4 shows the feature combinations designed for this analysis. Each possible feature combination yields a separate MD result. The results are plotted, analyzed, and compared through graphs.

Table 4.

Feature combinations for sensitivity analysis.

SSN	WKD	TOD	SAC	DAC	POP and all ACC features
Summer	Weekday	Morning	From home	To work	Change from 10% quantile to 90% quantile in the training data
				To shop/errands
				To recreation
				To food service
			From work	To work
				To shop/errands
				To recreation
				To food service

*The traveler cluster (CLS) feature is not included in the table because predictions are conducted separately for each cluster and then combined into a weighted average result, weighted by the cluster percentage in the entire training data.
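The design in Table 4 amounts to a grid over starting activity, destination activity, and the quantile at which POP and all ACC features are set, with a cluster-weighted average taken for each grid point. The sketch below illustrates this under assumptions: the 10-percentage-point quantile step, the `predict` callable, and the dictionary keys are all hypothetical and stand in for the trained classifier and its feature encoding.

```python
from itertools import product

FIXED = {"SSN": "summer", "WKD": "weekday", "TOD": "morning"}
STARTS = ["from home", "from work"]
DESTINATIONS = ["to work", "to shop/errands", "to recreation", "to food service"]
# 10% ... 90% quantiles; the step size is an assumption for illustration.
QUANTILES = [q / 100 for q in range(10, 100, 10)]

def sensitivity_grid():
    """Yield one feature combination per (start, destination, quantile)."""
    for sac, dac, q in product(STARTS, DESTINATIONS, QUANTILES):
        yield {**FIXED, "SAC": sac, "DAC": dac, "built_env_quantile": q}

def cluster_weighted(predict, combo, cluster_share):
    """Average predict(combo, cluster) over clusters, weighted by cluster share."""
    parts = [(share, predict(combo, k)) for k, share in cluster_share.items()]
    n = len(parts[0][1])
    return [sum(share * md[i] for share, md in parts) for i in range(n)]
```

Each of the eight start-destination pairs then produces one curve of modal split against the built environment quantile.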

Case study

A case study is conducted in LA City to show how the model can help answer what-if questions about built environment scenarios. We first evaluate the current mobility environment using the metrics of $M_{m}$ and $D_{m}$ . Then, we test three hypothetical urban design scenarios and quantify their mobility impact based on changes in the metrics (denoted as $Δ M_{m}$ and $Δ D_{m}$ ). Figure 2 shows the spatial context of the case study and the location of a new high-density development away from the existing city center and a new transit line that cuts across the Valley from East to West. The exact choice of location is random and our case studies are hypothetical and thus not based on practical reasoning or real-world references.

Figure 2.

The spatial context of the LA City. (a) Different levels of geographic boundaries, with the densified neighborhood highlighted in red. (b) Existing transit network, with the added rail line highlighted in red.

Table 5 specifies the tested scenarios in detail. Scenario 1 and Scenario 2 densify the randomly picked neighborhood by increasing all types of job counts within it. Scenario 3 adds the new rail line on top of Scenario 2. To save computation time, we model only one type of trip as an example, namely commuter trips (i.e., trips from home to work) on summer weekday mornings.

Table 5.

Description of hypothetical scenarios in the case study.

	No. jobs of each CBG within the densified neighborhood
	RT/ER	REC	FOD	BS/AD	MAN	PR/TE	EDU	Add rail line
Current (mean)	89.6	14.7	35.6	23.1	42.4	137.5	24.3	no
Scenario 1^a	334	38	157	85	103	378	102	no
Scenario 2^b	2457	322	826	1141	1351	3387	688	no
Scenario 3	2457	322	826	1141	1351	3387	688	yes

^aScenario 1 assumes the job counts in each densified CBG are equal to the 90% quantile of all CBGs in LA City.

^bScenario 2 and Scenario 3 assume the 99% quantile.
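A densification scenario can be expressed as an overwrite of job counts in the selected CBGs, after which the ACC features and the metrics are recomputed and compared with the baseline. The sketch below shows only the overwrite and the delta computation. The function names and data layout are hypothetical; the job counts are the Scenario 1 row of Table 5.

```python
# Scenario 1 job counts per densified CBG, taken from Table 5.
SCENARIO_1_JOBS = {
    "RT/ER": 334, "REC": 38, "FOD": 157, "BS/AD": 85,
    "MAN": 103, "PR/TE": 378, "EDU": 102,
}

def apply_densification(jobs_by_cbg, densified_cbgs, scenario_jobs):
    """Return a copy of jobs_by_cbg with scenario job counts in the densified CBGs."""
    updated = {cbg: dict(jobs) for cbg, jobs in jobs_by_cbg.items()}
    for cbg in densified_cbgs:
        updated[cbg] = dict(scenario_jobs)
    return updated

def metric_delta(baseline, scenario):
    """Per-mode change (scenario minus baseline) for a metric such as M_m or D_m."""
    return {mode: scenario[mode] - baseline[mode] for mode in baseline}
```

Because ACC features are sums over reachable CBGs, changing the jobs in one neighborhood alters the features of every CBG that can reach it within some mode-duration level, which is why effects can appear far from the densified area.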

Result

Choice model training result

The average L1 Norms in the cross-validation process are shown in Table 6. The Random Forest classification algorithm is found to have the best predictive performance with the lowest L1 Norm of 0.1007.

Table 6.

Average L1 Norms in the five-fold cross-validation.

	Random Forest	AdaBoost	Naïve Bayes	Neural Network	K-Nearest Neighbor
Average L1 Norm	0.1007	1.0495	0.3034	0.1071	0.1244

The feature importance ranking of the Random Forest classifier is used to reveal the most important features that determine the mode-duration choice of a trip. Figure 3 shows the top quarter of the features. The activity types of shop/errands (SAC_4 and DAC_4) and work (SAC_2 and DAC_2) are among the highest-ranking features. Meanwhile, none of the time features (i.e., season, weekend, time of day) enters the top-ranking list. A possible explanation is that the activity features may already have a notion of time embedded, which reduces the importance of the time features. For instance, trips from home to work may happen mostly in the morning and on weekdays.

Figure 3.

The top 35 features in the feature importance rank. Feature names are represented by the abbreviations and the discretized levels as shown in Tables 2 and 3.

Within the ACC features, we find that the accessibilities for vehicle and transit duration levels are overall more impactful than the ones measured by walking duration levels. The most impactful ACC feature is the food service job density within a 0 to 8 minute driving duration (ACC(FOD)_vehicle_0-8min). This provides evidence that using only a walkable distance as the buffer zone for deriving built environment variables, as in most previous studies, may not be sufficient for modeling multi-modal choices.

Sensitivity analysis result

The predicted MD results for all feature combinations in the sensitivity analysis are presented as graphs in Figure 4. The line graphs display how the modal split ($M_{m}$) changes as the built environment features (POP and ACC) increase. Each line graph is paired with a stacked bar chart showing the duration distributions at the minimum and maximum built environment feature values analyzed in the line graph.

Figure 4.

Sensitivity analysis result. (a) (c) (e) (g) (i) (k) (m) (p) are line graphs showing how modal split (y-axis) changes as the POP and ACC (x-axis) increase. (b) (d) (f) (h) (j) (l) (n) (q) are stacked bar charts showing the duration distributions regarding the minimum and the maximum POP and ACC.

Based on the line graphs, the mode percentages do not change significantly until the POP and ACC features reach around 50% quantile of the data. The non-linear shape of the fitted curve is partially due to the non-linear increase on the x-axis (i.e., the quantiles of the POP and ACC do not increase linearly). But it also reveals that the built environment does not have a significant impact on mobility until it reaches a certain density (the quantile values of POP and ACC on the x-axis are specified in Supplementary Table S1 in the Supplementary Material). Another finding is that the magnitude of changes in modal split varies by mode. For example, the increase in the biking mode is noticeably slower and smaller than other modes.

An interesting finding from the stacked bar charts concerns the vehicle trip duration distribution (j) (l). Although most of the short-duration vehicle trips are replaced by alternative modes, the long-duration vehicle trips (e.g., trips of d₅) show the opposite trend and increase when the environment is densified. This is potentially due to congestion. In other words, it is likely that vehicle trips of the same or even shorter distance take more time in dense, congested urban areas, leading to an increased probability of long-duration driving trips.

Case study result

Figure 5 shows the results of mode percentages $M_{m}$ in the case study. The first column shows $M_{m}$ in the current environment. The last three columns show $Δ M_{m}$ before and after implementing the scenarios. In the current environment, West and Central LA are more walkable overall (a) and less reliant on vehicles (i) due to their higher level of density and accessibility. Central has the highest percentage of transit trips (m) since it has the best accessibility to the transit network (Figure 2(b)). West has a slightly higher percentage of biking trips (e), but the overall spatial heterogeneity is less noticeable than for other modes.

Figure 5.

Results of mode percentages $M_{m}$ in the current environment (first column) and the changes $Δ M_{m}$ before and after implementing the scenarios (last three columns).

In the three scenarios, areas around the densified neighborhood in the Valley show different levels of decreased vehicle mode percentage and increased use of non-vehicle modes. An unexpected finding is the opposite change in the Central region, namely decreased $M_{walk}$ and increased $M_{vehicle}$, especially in Scenario 2 and Scenario 3 (c) (d) (k) (l). In the model, this happens because, for Central, activity densities become considerably higher within the driving range than within the walkable range, which drives up vehicle utilization. Another interpretation is that the densified Valley becomes a new city center that attracts trips from all surrounding areas. For areas that are not within walkable or bikeable distance, such as Central, this can cause decreased walking and biking trips and increased vehicle and transit trips.

Figure 6 shows the results of $D_{m}$ and their changes $Δ D_{m}$. The possible reasons for a decreased $D_{m}$ include new destinations within shorter distances or faster routes to original destinations. Conversely, the possible reasons for an increased $D_{m}$ include more destinations within longer distances or slower speeds for original trips. An interesting finding is that, as shown in (k) (l), $D_{vehicle}$ has mostly increased for all influenced areas.

Figure 6.

Results of duration per trip $D_{m}$ and the changes $Δ D_{m}$ before and after implementing the scenarios.

Lastly, Table 7 summarizes statistics about all $Δ M_{m}$ and $Δ D_{m}$. For the modal split, the maximum $Δ M_{walk}$ is +22.47% and the maximum (absolute) $Δ M_{vehicle}$ is −25.88%, which is a considerable modal shift for one CBG. The transit mode has a less noticeable $Δ M_{transit}$ with a maximum value of +4.66%. Considering that only 1.9% of trips in the original travel survey used the transit mode (Table 1), the magnitude of $Δ M_{transit}$ is also significant. Regarding the estimated duration per trip, the largest change happens in $Δ D_{transit}$, which has a maximum value of +20 minutes. Note that for both metrics, the mean deltas are relatively small because they are averaged out by the results of areas that are less influenced by the scenarios. Overall, since the case study only analyzes the morning commute trips, these quantified changes can be significant if extrapolated to all types of trips and to a time span of an entire day or year.

Table 7.

Mean and maximum $Δ M_{m}$ and $Δ D_{m}$ of all CBGs that are influenced by the scenarios (i.e., all CBGs with non-zero $Δ M_{m}$ ).

		Walk		Bike		Vehicle		Transit
		mean	max	mean	max	mean	max	mean	max
$Δ M_{m}$ (%)	Scenario 1	+0.31	+5.60	+0.02	+0.91	−0.38	−6.22	+0.04	+0.75
	Scenario 2	+1.52	+21.09	−0.05	+1.20	−1.88	−23.90	+0.41	+3.33
	Scenario 3	+1.50	+22.47	−0.05	+1.20	−1.89	−25.88	+0.45	+4.66
$Δ D_{m}$ (minute)	Scenario 1	+0.04	+0.90	+0.04	+2.49	+0.04	+1.28	−0.08	+12.25
	Scenario 2	+0.23	+3.30	+0.07	+7.67	+0.83	+5.80	−0.29	+19.12
	Scenario 3	+0.20	+3.00	−0.02	+7.67	+0.74	+6.22	−0.50	+20.00

Discussion

This paper introduces a data-driven joint choice model of travel mode and duration for quantifying the mobility impacts of changes in the built environment. We introduce a novel set of ACC features that are comprehensive yet straightforward to use in parametrizing the density distribution of urban services on different scales of the transportation network. Various types of urban design interventions can be easily converted into changes in ACC features. For example, network changes such as adding new transit lines can increase the ACC of transit in nearby neighborhoods. Program allocations such as new commercial districts can lead to changes in the ACC of retail, shopping, and food services in nearby neighborhoods. Further, the ACC features allow the model to capture the mobility impacts not only within walkable distances but also in distant areas. This leads to some interesting findings in the case study, which suggest that the mobility impacts of certain built environment scenarios may cover a rather wide spatial range.

Another contribution of our method is to model the travel mode-duration as a joint choice and solve it using a machine learning classifier. This approach significantly simplifies the predictive architecture and makes it easier to evaluate more integrated mobility metrics. Not only can the modal split be directly computed from the choice probabilities, but the travel duration by mode can also be approximated by taking expectations over the duration levels. The latter metric is otherwise challenging to compute and typically requires more complex simulation approaches. Previous studies of joint choice models have only analyzed mode-distance choices with coarsely discretized levels (Ding et al., 2014; Vega and Reynolds-Feighan, 2009), and none of them used learning-based models. Trip distance in travel survey data is usually post-processed based on the shortest path between the origin and the destination regardless of travel mode. Therefore, duration can be a more reliable measurement of trip length, especially for modes like public transit that are prone to significant deviations from the shortest path.

Through a sensitivity analysis and a proof-of-concept case study, we find that a dense, mixed-use environment with good coverage of the multi-modal mobility network can significantly promote active transportation and public transit use. However, we also find that an ultra-dense centralized development could lead to increased travel time and increased vehicle use in some surrounding areas. These results and findings facilitate a deeper understanding of accessibility-oriented development (Aston and Levinson, 2021) and the importance of the mixed-use neighborhood. Our model rewards a higher level of accessibility to different urban services and employment opportunities. In contrast, neither increasing the POP feature alone nor increasing the ACC features of any single industry could notably change the prediction outcome. Our model also captures that urban density is a complex factor that comes with both benefits and trade-offs in mobility performance. Numerous researchers (Ewing and Cervero, 2010; Yang et al., 2021) have linked urban density with travel behavior and sustainable mobility. Recent planning strategies, such as the “15-Minute City,” advocate for a well-distributed “optimal density” that allows for sustainable and smart growth (Moreno et al., 2021). However, hardly any existing research has offered an approach to quantitatively define, measure, and analyze the “optimal density.” The two distinctively defined metrics in this paper, $M_{m}$ and $D_{m}$, provide opportunities to advance the understanding of this concept by allowing various perspectives in quantifying the benefits and drawbacks of density and use-mix configurations in cities.

Our approach is limited by potential urban mobility factors that are not captured by the POP and ACC features, such as microclimate, green space, and transit schedules. These factors can be transformed into new features and added to the training data. They can also be factored into the existing ACC features. For example, parks and green spaces can be used to adjust the number of jobs in the recreational category (REC) to better capture leisure activity densities. Our study is also limited by the survey region, precision level, frequency, and timing of the NHTS data. The trained choice model is only valid for the surveyed region (e.g., LA City in this paper). The spatial resolution of the analysis (e.g., CBG) is constrained by the precision level of the reported trip locations. Potential behavioral changes in the population, such as the decline in transit ridership caused by the COVID-19 pandemic (Liu et al., 2020), can only be captured and analyzed if new survey data are collected every few years. However, the NHTS is the only data source that provides comprehensive daily travel data nationwide and is therefore a good basis for others to reproduce the model for their regions of interest. Other popular data sources, including passively generated travel data from GPS or social media, can, in theory, be used to replace or augment the NHTS, but these datasets often fall short of collecting crucial information such as socio-demographics, activity, and mode.

Conclusion

We propose a joint choice model of travel mode and duration for quantifying the impacts of the built environment on mobility. The model is formulated as a Random Forest classifier that uses a novel set of ACC features as predictors. Our approach is mainly characterized by its simple architecture and generalizability. Its data-driven nature makes the method deployable in any city around the globe, provided that local data are available for deriving the training data. The analysis metrics facilitate a fast and effective mobility assessment of built environment scenarios from different perspectives, namely aggregate mode split and average travel durations. Overall, the results and method in this paper can support a better understanding and execution of mobility-aware urban design and planning, which can potentially lead to substantial economic, environmental, and societal gains.

Supplemental Material

Supplemental Material for Assessing impacts of the built environment on mobility: A joint choice model of travel mode and duration by Yang Yang, Samitha Samaranayake and Timur Dogan in Environment and Planning B: Urban Analytics and City Science

Footnotes

Acknowledgements

We thank the California Department of Transportation (Caltrans) for providing geocoded NHTS travel survey data.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by The Center for Transportation, Environment, and Community Health (CTECH) and Federal Government CFDA #: 20.701 (69A3551747119).

Yang Yang is a Ph.D. student in Systems Engineering at Cornell University and a research assistant in the Environmental Systems Lab. Yang holds a master’s in architectural design (M.S.AAD) with distinction from Cornell University, a master’s in architecture, and a bachelor’s in engineering with distinction from Tongji University in Shanghai. Her primary research interest is in data-driven urban analytics, urban mobility simulation, parametric urban design, and computational design workflow and software development.

Samitha Samaranayake is an Assistant Professor in the School of Civil and Environmental Engineering and a Graduate Field Faculty in the School of Operations Research and Information Engineering, the Center for Applied Math, and the Systems Engineering Program at Cornell. He holds a Ph.D. in Systems Engineering from the University of California, Berkeley, Master’s degrees in Management Science and Engineering from Stanford and Electrical Engineering and Computer Science from MIT, and a Bachelor’s in Computer Science from MIT. His primary research interest is in mathematical modeling and algorithm design for large-scale transportation network problems.

Timur Dogan is an Associate Professor in the Department of Architecture, the director of the Environmental Systems Lab, a field member of the Department of City and Regional Planning, the School of Civil and Environmental Engineering, and the Systems Engineering program at Cornell, as well as a faculty fellow at the Cornell Atkinson Center for Sustainability. Dogan holds a Ph.D. from MIT, a master’s in design studies from Harvard GSD, and a Dipl. Ing. in architecture with distinction from the Technical University Darmstadt. His primary research interest is in daylighting, energy modeling, passive climate control strategies, and performance-driven design workflows on both urban and architectural scales.

References

Aston, Currie, Delbosc, et al. (2021) Exploring built environment impacts on transit use – an updated meta-analysis. Transport Reviews 41: 73–96. DOI: 10.1080/01441647.2020.1806941

Aston and Levinson (2021) Accessibility-oriented planning: why and how to make the switch. Institute of Transportation Engineers Journal 91: 25–29.

Berhie and Haq (2017) Land use and transport mode choices: space syntax analysis of American cities. Enquiry Journal Architecture Research 14: 1–22. DOI: 10.17831/enq:arcc.v14i1.429

Caltrans (2017) 2017 National Household Travel Survey California Add-on [WWW Document]. https://nhts.dot.ca.gov/

Cervero and Kockelman (1997) Travel demand and the 3Ds: density, diversity, and design. Transportation Research Part D: Transport and Environment 2: 199–219. DOI: 10.1016/S1361-9209(97)00009-6

Cheng, Chen, De Vos, et al. (2019) Applying a random forest method approach to model travel mode choice behavior. Travel Behaviour and Society 14: 1–10. DOI: 10.1016/j.tbs.2018.09.002

De Vos, Cheng, Kamruzzaman, et al. (2021) The indirect effect of the built environment on travel mode choice: a focus on recent movers. Journal of Transport Geography 91: 102983. DOI: 10.1016/j.jtrangeo.2021.102983

Ding, Wang, Liu, et al. (2017) Exploring the influence of built environment on travel mode choice considering the mediating effects of car ownership and travel distance. Transportation Research Part A: Policy and Practice 100: 65–80. DOI: 10.1016/j.tra.2017.04.008

Ding, Wang, Tang, et al. (2018) Joint analysis of the spatial impacts of built environment on car ownership and travel mode choice. Transportation Research Part D: Transport and Environment, Special Issue on Traffic Modeling for Low-Emission Transport 60: 28–40. DOI: 10.1016/j.trd.2016.08.004

Ding, Xie, Wang, et al. (2014) Modeling the joint choice decisions on urban shopping destination and travel-to-shop mode: a comparative study of different structures. Discrete Dynamics in Nature and Society 2014: e492307. DOI: 10.1155/2014/492307

Drchal, Čertický and Jakob (2019) Data-driven activity scheduler for agent-based mobility models. Transportation Research Part C: Emerging Technologies 98: 370–390. DOI: 10.1016/j.trc.2018.12.002

Ewing and Cervero (2010) Travel and the built environment: a meta-analysis. Journal of the American Planning Association 76: 265–294. DOI: 10.1080/01944361003766766

Gan, Yang, Zeng, et al. (2021) Associations between built environment, perceived walkability/bikeability and metro transfer patterns. Transportation Research Part A: Policy and Practice 153: 171–187. DOI: 10.1016/j.tra.2021.09.007

Itinero (2019) Itinero [WWW Document]. https://www.itinero.tech/#

Lee, Jeong and Kim (2016) Impact of individual traits, urban form, and urban character on selecting cars as transportation mode using the hierarchical generalized linear model. Journal of Asian Architecture and Building Engineering 15: 223–230. DOI: 10.3130/jaabe.15.223

Litman, Todd; Victoria Transport Policy Institute (2022) Land Use Impacts on Transport: How Land Use Factors Affect Travel Behavior. TRID Database. Available at: https://trid.trb.org/view.aspx?id=1157840.

Liu, Miller and Scheff (2020) The impacts of COVID-19 pandemic on public transit demand in the United States. PLOS ONE 15: e0242476. DOI: 10.1371/journal.pone.0242476

Ma, Yang, Ding, et al. (2018) Joint analysis of the commuting departure time and travel mode choice: role of the built environment. Journal of Advanced Transportation 2018: e4540832. DOI: 10.1155/2018/4540832

Moreno, Allam, Chabaud, et al. (2021) Introducing the “15-minute city”: sustainability, resilience and place identity in future post-pandemic cities. Smart Cities 4: 93–111. DOI: 10.3390/smartcities4010006

OpenStreetMap (2017) Planet dump [WWW Document]. https://planet.osm.org, https://www.openstreetmap.org

Rupprecht Consult (2019) Guidelines for Developing and Implementing a Sustainable Urban Mobility Plan. Second Edition. Cologne: Rupprecht Consult.

TransitFeeds (2015) OpenMobilityData [WWW Document]. https://transitfeeds.com/

United Nations (2018) World Urbanization Prospects: The 2018 Revision. Geneva, Switzerland: United Nations.

US Census Bureau (2017) LEHD Origin-Destination Employment Statistics (LODES) [WWW Document]. https://lehd.ces.census.gov/data/

US Census Bureau (2018) 2018 American Community Survey (ACS) 5-year Estimates [WWW Document]. https://data.census.gov/cedsci. Suitland, MD: US Census Bureau.

Vega and Reynolds-Feighan (2009) A methodological framework for the study of residential location and travel-to-work mode choice under central and suburban employment destination patterns. Transportation Research Part A: Policy and Practice 43: 401–419. DOI: 10.1016/j.tra.2008.11.011

Wang and Zhou (2017) The built environment and travel behavior in urban China: a literature review. Transportation Research Part D: Transport and Environment, Land use and Transportation in China 52: 574–585. DOI: 10.1016/j.trd.2016.10.031

Yang, Kral, et al. (2021) Urban design attributes and resilience: COVID-19 evidence from New York City. Buildings and Cities 2: 618–636. DOI: 10.5334/bc.130

Yang, Samaranayake and Dogan (2023) A clustering-based approach to quantifying socio-demographic impacts on urban mobility patterns. Environment and Planning B: Urban Analytics and City Science.

Yin, Wang, Shao, et al. (2022) Exploring the relationship between built environment and commuting mode choice: longitudinal evidence from China. International Journal of Environmental Research and Public Health 19: 14149. DOI: 10.3390/ijerph192114149

Zhang and Zhao (2022) Machine learning approach for spatial modeling of ridesourcing demand. Journal of Transport Geography 100: 103310. DOI: 10.1016/j.jtrangeo.2022.103310

Zhao, Yan, et al. (2020) Prediction and behavioral analysis of travel mode choice: a comparison of machine learning and logit models. Travel Behaviour and Society 20: 22–35. DOI: 10.1016/j.tbs.2020.02.003
