Abstract
Studies on the jobs–housing balance and self-containment of employment have mainly focused on observed journey-to-work trips using travel survey data. This study examines the relationship between the jobs–housing balance and the self-containment of employment using mobile phone location data in Shenzhen, a megacity in southern China. Individual-level journey-to-work trips are explored based on mobile phone location big data. Self-containment of employment in the suburban districts is higher than that in the central districts. The effect of the jobs–housing balance on self-containment of employment is examined at a 2 km grid level. Jobs–housing balance policies positively affect the self-containment of employment in the suburban districts, but their effect is limited in the central districts. Two extreme commuting spectrum measures are used to analyze self-containment of employment in different journey-to-work scenarios with the same jobs–housing distribution. Workers are disaggregated into secondary-sector and tertiary-sector workers according to job types. The self-containment of employment is found to be mainly affected by the local jobs–housing balance for secondary-sector workers and by the regional, city-level job distribution for tertiary-sector workers. The extreme scenarios of commuting behavior obtained with the commuting spectrum method provide benchmarks that help to better understand the observed self-containment of employment.
Introduction
The spatial distribution of the residences and workplaces of workers determines the demand for journey-to-work trips. The jobs–housing balance, excess commuting (EC) (Ma and Banister, 2007), and the jobs–housing mismatch have been utilized to estimate and examine commuting efficiency from different perspectives. However, only a limited number of researchers have examined urban commuting from the perspective of the self-containment of employment (SCE) (Cervero, 1995; Yigitcanlar et al., 2007). SCE refers to the percentage of workers who work locally out of the total number of workers residing in a particular zone (Cervero, 1989). An appropriate jobs–housing ratio could help achieve a better jobs–housing balance, and the extent to which such a balance is realized could be measured by SCE (Cervero, 1995; Ewing et al., 2001). Many studies have shown that high SCE encourages non-motorized travel modes that are environmentally friendly (Ewing et al., 2001) and reduces vehicle miles traveled (Greenwald, 2006).
The jobs–housing balance can be achieved by putting residences closer to workplaces and vice versa, and is usually measured by the jobs–housing ratio, which is the number of jobs divided by the number of residents. Using data from 26 US cities, Horner and Murray (2002) found that commuting distances are closely correlated with the jobs–housing ratio at the city level. In contrast, Giuliano and Small (1993) used journey-to-work data in Los Angeles and concluded that jobs–housing balance policies exert only a minor influence on commuting distances. An empirical analysis of the correlation between the jobs–housing ratio and SCE in 23 San Francisco cities showed that the jobs–housing ratio and SCE have minimal association at the city level (Cervero, 1996). A significant relationship exists between the jobs–housing ratio and commuting, but only when jobs and housing are mismatched at the regional level (Peng, 1997). Previous studies on the effect of the jobs–housing balance on the SCE were mostly based on travel survey data, and the spatial analysis unit was the traffic analysis zone (TAZ), which varies in size and shape. Travel survey data are not only prone to errors because of their reliance on respondent reporting, but are also limited by relatively small sample sizes. Mobile phone location big data offer individual-level work trip data with finer location and time information and larger samples, and can therefore provide more detailed information on commuting patterns and the SCE than TAZ-based travel surveys.
Previous studies on the effect of the jobs–housing balance on the SCE have mainly focused on observed journey-to-work trips and have not considered other journey-to-work possibilities that arise when workers’ commuting behavior changes. With a given distribution of workers and jobs, journey-to-work trips present numerous possibilities, and the SCE varies accordingly. This study employs a commuting spectrum method to consider different journey-to-work possibilities when commuting behavior changes (Yang and Ferreira, 2008). The two extreme scenarios of the commuting spectrum provide benchmarks that help to better understand the observed SCE and its relationship with the jobs–housing balance. They are employed in this study, which examines the effect of the jobs–housing balance on the SCE in Shenzhen, a megacity in southern China, using mobile phone location data.
Exploring journey-to-work trips from mobile phone location data
In traditional transport studies, journey-to-work patterns are usually derived from travel survey data (Wang and Cheng, 2001), which are not only prone to errors because of their reliance on respondent reporting but also limited and partial because of the relatively small sample size (Chen et al., 2011). Massive spatio-temporal mobility data, such as smart card data, mobile phone location data, and floating car data, are now available (Batty, 2012). Smart card data have been used to analyze journey-to-work patterns (Zhou and Long, 2016). However, journey-to-work patterns derived from smart card data are limited to those who commute by bus or subway. Mobile phone location big data have given new momentum to the exploration of journey-to-work patterns on a large scale (Batty et al., 2012). Gonzalez et al. (2008) studied the trajectories of mobile phone users and found that the mobility patterns of residents present considerable spatio-temporal similarities. The geographic mapping of mobile phone usage at different times of the day represents the intensity of urban activities and their evolution through space and time (Ratti et al., 2006). These studies have shown that the daily activities of residents present similar patterns. Mobile phone location data can therefore be utilized to explore journey-to-work patterns, which show a considerable degree of spatio-temporal regularity on weekdays.
The commuting spectrum method
The key acronyms used in this paper are listed in Appendix 1.
Any jobs–housing distribution might be associated with a spectrum of commuting trips depending on how workers choose their residences and workplaces. According to the double-constrained gravity model (Wilson, 1970), the journey-to-work flows from zone i to zone j are calculated as follows:
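For reference, the doubly constrained gravity model is commonly written in the form below. The notation is an assumption adopted here for illustration and may differ from the authors' symbols: $T_{ij}$ is the journey-to-work flow from zone $i$ to zone $j$, $W_i$ the number of workers residing in zone $i$, $J_j$ the number of jobs in zone $j$, $c_{ij}$ the commuting cost, $\beta$ the distance decay parameter, and $A_i$ and $B_j$ the balancing factors.

```latex
T_{ij} = A_i W_i B_j J_j \exp(-\beta c_{ij}), \qquad
A_i = \Big[ \sum_j B_j J_j \exp(-\beta c_{ij}) \Big]^{-1}, \qquad
B_j = \Big[ \sum_i A_i W_i \exp(-\beta c_{ij}) \Big]^{-1}
```

The balancing factors ensure that the flows leaving each zone sum to its resident workers and the flows arriving in each zone sum to its jobs.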
Observed commuting trips represent only one possibility of the commuting spectrum with any jobs–housing distribution. The minimum required commuting (MRC) and proportionally matched commuting (PMC) are two extreme measures of the commuting spectrum.
The MRC is derived from the scenario in which the total commuting cost is at its minimum at the city level. Linear programming (LP) was proposed by White (1988) to calculate the MRC.
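A standard way to write this LP as a transportation problem is sketched below. The symbols are assumptions for illustration: $T_{ij}$ is the flow from zone $i$ to zone $j$, $c_{ij}$ the commuting cost, $W_i$ the workers residing in zone $i$, $J_j$ the jobs in zone $j$, and $N$ the total number of workers.

```latex
\mathrm{MRC} = \min_{T} \; \frac{1}{N} \sum_i \sum_j c_{ij} T_{ij}
\quad \text{subject to} \quad
\sum_j T_{ij} = W_i \;\; \forall i, \qquad
\sum_i T_{ij} = J_j \;\; \forall j, \qquad
T_{ij} \ge 0
```

The formulation presumes that the total number of workers equals the total number of jobs, $N = \sum_i W_i = \sum_j J_j$.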
While MRC represents the lower bound of the commuting spectrum, PMC represents the upper bound (Yang and Ferreira, 2008). After computing the journey-to-work flows among zones, the PMC value at the city level can be computed as follows (Charron, 2007):
Study area and data processing
In this research, the study area is the main built-up area in Shenzhen, excluding the southeastern parts. The central areas in Shenzhen include Nanshan, Futian and Luohu, and the suburbs include Bao’an and Longgang (Figure 1). The central areas are characterized as commercial and residential development, whereas the suburbs are mostly for industrial and residential development. Mobile phone location data on Friday, 23 March 2012, were collected by the largest mobile phone operator in Shenzhen for identifying the residences and workplaces of workers. Building information data were obtained from the Urban Planning Land and Resource Commission of Shenzhen for identifying workers’ residences and workplaces, and their job types.
Figure 1. Study area.
The number of mobile phone users in this dataset is greater than 12.4 million, which represents a large sample of the entire population of around 15 million. The phone numbers of the mobile phone users were made anonymous to protect the privacy of the mobile phone users. The anonymous ID of the mobile phone users, date, time, and coordinates of the mobile phone towers that provided the phone service were actively recorded at approximately 1 hour intervals as long as the phone was active. The mobile phone location data in this study were different from call detail records (CDRs) used in other studies (Gonzalez et al., 2008; Ratti et al., 2006) because the latter were passively recorded only when calls or messages were made. Therefore, the mobile phone location data used in the current study are more representative samples of mobile phone users, unlike the possibly biased CDRs that depend on mobile phone usage. There were 5908 mobile phone towers in the entire city. The location of a mobile phone user was estimated according to the coverage area of the nearest mobile phone tower with an average area of 0.28 square kilometers (standard deviation = 0.58 square kilometers).
Using the analytical framework shown in Figure 2, the residences and workplaces of the workers were identified based on their activity places and building information data. The “place-starting time-duration” model was used to identify the residences and workplaces of the workers (Long and Thill, 2015). The place represents the tower coverage area where the mobile phone user stayed for some activity. The starting time is the time at which the user arrived at that location and began the activity. The duration represents how long the user stayed in that location undertaking the activity. According to the daily activity habits of Shenzhen residents, mobile phone location records from 12 am (midnight) to 6 am were selected to identify individuals’ residences, on the condition that they stayed there for not less than 4 hours in total. Mobile phone location records from 8 am to 12 pm (noon) and from 2 pm to 6 pm were selected to identify the workplaces of mobile phone users, on the condition that they stayed there for not less than 5 hours in total. The residences of the workers had to be in residential areas, and the workplaces had to be in areas of employment. In this manner, mobile phone location records that were associated with recreation activities during office hours, or that reflected users who stayed at home all day, were eliminated.
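The identification rule described above can be illustrated with a minimal sketch. It assumes that each record is an `(hour, tower_id)` pair and that, because records arrive at roughly 1 hour intervals, each record stands for about one hour of stay. The function names are hypothetical, and the check against building information data (residential versus employment areas) is omitted.

```python
from collections import Counter

# Time windows (hour of day, 24 h clock) taken from the text.
HOME_HOURS = set(range(0, 6))                        # midnight to 6 am
WORK_HOURS = set(range(8, 12)) | set(range(14, 18))  # 8 am to noon, 2 pm to 6 pm


def dominant_tower(records, hours, min_hours):
    """Return the tower with the longest total stay within `hours`,
    or None if that stay is shorter than `min_hours`."""
    stay = Counter(tower for hour, tower in records if hour in hours)
    if not stay:
        return None
    tower, total = stay.most_common(1)[0]
    return tower if total >= min_hours else None


def home_and_work(records):
    """Identify (home_tower, work_tower) for one user's daily records."""
    home = dominant_tower(records, HOME_HOURS, 4)  # at least 4 hours at night
    work = dominant_tower(records, WORK_HOURS, 5)  # at least 5 working hours
    # Assumption: a workplace in the same tower area as the home is discarded,
    # so that users who stay at home all day are not counted as workers.
    if work is not None and work == home:
        work = None
    return home, work
```

A user who spends the night at tower `"A"` and the working hours at tower `"C"` would be returned as `("A", "C")`, whereas a user who never leaves tower `"A"` would have no workplace assigned.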
Figure 2. Identification of mobile phone users’ residences and workplaces.
The residences of 7.15 million mobile phone users were identified from mobile phone location data. The number of residents and workers was obtained from the Shenzhen Statistical Yearbook 2013 at the district level (Shenzhen Bureau of Statistics, 2013). The number of residents at the 2 km grid level was estimated with building information data according to the proportion of residential floor area. The distribution of residents identified from the mobile phone location data was highly correlated with that from the statistical data (R-squared = 0.81, p-value < 0.001). The residences and workplaces of 2,386,998 workers were identified. Workers whose residences and workplaces were within the same tower service area and those whose cumulative activity duration during working hours was below the threshold were eliminated, so the number of workers in this study was smaller than the actual number of workers. Similarly, the number of workers at the 2 km grid level was estimated with building information data according to the proportion of floor area used for employment. As with the residents, the distribution of workers identified from the mobile phone location data correlated well with that from the statistical data at the 2 km grid level (R-squared = 0.69, p-value < 0.001).
The mobile phone location data were aggregated to a grid size of 2 km. The average non-motorized travel time is 17 minutes according to Shenzhen travel survey data from 2010 (Urban Planning Land and Resource Commission of Shenzhen Municipality, 2011). The average commuting distance by non-motorized modes of travel is approximately 1.5–3 km. Thus, workers tend to commute by walking or cycling when their commuting distances are within 2 km. Increasing the SCE at the 2 km grid level will encourage workers to select non-motorized modes of travel; therefore, a 2 km grid was chosen for the study.
Model formulations
MRC with a disaggregated linear programming model
The disaggregated LP model used four matrices: journey-to-work flow (FLOW), commuting cost (COST), spatial distribution of workers (WORK), and spatial distribution of jobs (JOB). The COST matrix was measured as the shortest network distance between zonal centroids in the road network. Intra-zonal commuting distances, which are the diagonal values of the COST matrix, were calculated using the following equation (Frost et al., 1998):
The MRC was computed by using the LP model proposed by White (1988), in which the total commuting cost is at its minimum in the expected scenario. This model assumes that workers and jobs are homogeneous. However, this assumption does not accurately reflect real-world scenarios. An LP model with undifferentiated workers may yield misleading results because of the heterogeneity of workers and job types. For example, such a model may erroneously match a secondary-sector worker to a tertiary-sector job. Few studies have disaggregated workers according to job type. For instance, O'Kelly and Lee (2005) disaggregated workers according to their job types to analyze EC. Thus, an LP model disaggregated according to job types was developed to obtain the MRC for each job type. The objectives for the different job types were optimized independently to ensure that the total commuting cost is at its minimum. The disaggregated LP model included two steps. First, workers were matched to local jobs within their zone according to job type. Then, the remaining workers were assigned to matching jobs in other zones to obtain the MRC for each job type.
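The first of the two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: workers and jobs are given as per-zone counts for each job type, the function names are hypothetical, and the second step (assigning the remaining workers with an LP solver) is left out.

```python
def match_locally(workers, jobs):
    """Step 1 for one job type: match workers to jobs in their own zone.

    `workers[i]` and `jobs[i]` are counts for zone i.
    Returns (local_flows, remaining_workers, remaining_jobs).
    """
    local = [min(w, j) for w, j in zip(workers, jobs)]
    remaining_workers = [w - m for w, m in zip(workers, local)]
    remaining_jobs = [j - m for j, m in zip(jobs, local)]
    return local, remaining_workers, remaining_jobs


def step_one(workers_by_type, jobs_by_type):
    """Apply the local matching independently to every job type."""
    return {
        job_type: match_locally(workers_by_type[job_type], jobs_by_type[job_type])
        for job_type in workers_by_type
    }
```

The remaining workers and jobs returned for each job type would then form the supply and demand of the transportation problem solved in the second step.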
PMC with a disaggregated model
The PMC is based on the assumption that the possibility of a worker residing in zone i and working in zone j is proportional to the share of the matched job market in zone j. The journey-to-work flow between zone i and zone j can be calculated as follows:
The PMC value for workers with job type k at the city level can be computed as follows:
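The proportional matching rule described above can be sketched for a single job type as follows. The sketch assumes that the flow from zone i to zone j equals the workers residing in zone i multiplied by zone j's share of all matched jobs, and that the PMC value is the flow-weighted average commuting cost. The function names and the list-based data layout are hypothetical.

```python
def pmc_flows(workers, jobs):
    """Flows T[i][j] = workers[i] * jobs[j] / total jobs, for one job type."""
    total_jobs = sum(jobs)
    return [[w * j / total_jobs for j in jobs] for w in workers]


def pmc_value(workers, jobs, cost):
    """Average commuting cost per worker under proportional matching.

    `cost[i][j]` is the commuting cost from zone i to zone j.
    """
    flows = pmc_flows(workers, jobs)
    total_cost = sum(
        flows[i][j] * cost[i][j]
        for i in range(len(workers))
        for j in range(len(jobs))
    )
    return total_cost / sum(workers)
```

With two zones holding 100 and 300 workers, 200 jobs each, an intra-zonal cost of 1 and an inter-zonal cost of 5, half of each zone's workers stay local and the PMC value is 3.0.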
The jobs–housing balance and self-containment of employment in Shenzhen
Self-containment of employment
Although knowing each worker’s job type was impossible, the distribution of secondary-sector and tertiary-sector jobs was approximated from the building floor area at the 2 km grid level. Grid cells where more than 80% of the employment floor area was in industrial use were identified as industrial grid cells, whereas those where more than 80% of the employment floor area was in commercial use were identified as commercial grid cells. There were 145 industrial and commercial grid cells in total; 77 mixed-type grid cells were excluded. Workers whose journey-to-work trips ended at industrial and commercial grid cells were considered secondary-sector and tertiary-sector workers, respectively. A total of 816,823 workers whose journey-to-work trips ended in mixed-type grid cells were excluded. The secondary-sector and tertiary-sector workers identified from mobile phone location data accounted for 65.8% of all workers, indicating the large sample of such workers in Shenzhen. The SCE in industrial and commercial grid cells was calculated as the percentage of secondary-sector and tertiary-sector workers working within their residential grid cells out of the total number of secondary-sector and tertiary-sector workers residing in the grid cells (Figure 3).
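The classification rule and the SCE calculation can be illustrated with a short sketch. It assumes that the employment floor area of a grid cell is the sum of its industrial and commercial floor areas and that each journey-to-work trip is a `(home_cell, work_cell)` pair. Both function names are hypothetical.

```python
from collections import Counter


def classify_cell(industrial_area, commercial_area, threshold=0.8):
    """Label a grid cell by the dominant use of its employment floor area."""
    total = industrial_area + commercial_area
    if total <= 0:
        return "none"
    if industrial_area / total > threshold:
        return "industrial"
    if commercial_area / total > threshold:
        return "commercial"
    return "mixed"


def self_containment(trips):
    """SCE per grid cell: share of resident workers who also work there."""
    residents = Counter()
    local = Counter()
    for home_cell, work_cell in trips:
        residents[home_cell] += 1
        if home_cell == work_cell:
            local[home_cell] += 1
    return {cell: local[cell] / n for cell, n in residents.items()}
```

A cell whose employment floor area is exactly 80% industrial is labelled mixed in this sketch, because the text requires more than 80%.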
Figure 3. Self-containment of employment (SCE) at the 2 km grid level.
The average SCE in industrial grid cells was 78.5%, significantly higher than the 59.4% in commercial grid cells (t = 4.31, p-value < 0.001). It was hypothesized that the difference was partly attributable to the Jobs_Workers Ratio (JWR). The JWR was calculated as the number of secondary-sector jobs divided by the number of secondary-sector workers residing in each industrial grid cell, and as the number of tertiary-sector jobs divided by the number of tertiary-sector workers residing in each commercial grid cell. Figure 4 shows the JWR at the 2 km grid level; its distribution is skewed to the right (skewness = 1.841).
Figure 4. Jobs_Workers Ratio (JWR) at the 2 km grid level.
An ordinary least squares (OLS) regression was performed to test the effect of the JWR on the SCE. The independent variable was the natural logarithm of the JWR, which positively affected the SCE in industrial grid cells (coefficient = 0.487, R-squared = 0.165, p-value < 0.001). The SCE tended to increase as the JWR increased, as shown in Figure 5. Secondary-sector workers tended to work locally if more jobs were provided. Average housing prices in the suburbs (12.4 thousand RMB) were lower than those in the central areas (29.7 thousand RMB) (t = −10.0, p-value < 0.001) according to housing price data obtained from the SOFANG website, the leading real estate internet portal in China, which can be accessed at http://sz.fang.com/. Workers in the suburbs were therefore more likely to reside near their workplaces than those in the central areas. In addition, some manufacturing firms provided housing for workers. As a result, a higher JWR was effective in encouraging workers to work locally. The SCE increased as the JWR increased up to a certain point. As the JWR increased further, its effect on the SCE became less significant. The SCE thus demonstrated a diminishing rate of change as the JWR moved from job-poor to job-rich grid cells. A high JWR was necessary for a high SCE: the JWR was relatively high for grid cells with a high SCE. However, a high JWR alone did not guarantee a high SCE. When the JWR reached a certain point, its effect on the SCE became smaller because of the jobs–housing mismatch and the deficit in available housing.
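The regression of the SCE on the natural logarithm of the JWR can be sketched with the closed-form OLS estimator for a single regressor. This is an illustrative sketch only: the function name is hypothetical, no significance test is computed, and the input lists stand for per-grid-cell values that are not reproduced here.

```python
import math


def ols_log_fit(jwr, sce):
    """Fit sce = intercept + slope * ln(jwr); return (slope, intercept, r2)."""
    x = [math.log(v) for v in jwr]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(sce) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, sce))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in SCE explained by ln(JWR).
    ss_tot = sum((yi - mean_y) ** 2 for yi in sce)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, sce))
    return slope, intercept, 1 - ss_res / ss_tot
```

Because the regressor is logarithmic, a constant slope implies that each doubling of the JWR adds the same amount to the SCE, which is consistent with the diminishing effect described in the text.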
Figure 5. Relationship between the Jobs_Workers Ratio (JWR) and Self-Containment of Employment (SCE) in industrial grid cells.
The JWR did not significantly affect the SCE in commercial grid cells because tertiary-sector jobs are more specialized than secondary-sector jobs. Tertiary-sector workers usually need to search over a larger area to find suitable employment, so a similar number of tertiary-sector workers and jobs did not guarantee a high SCE in a particular zone. In addition, areas with a higher JWR had relatively higher housing prices, forcing some tertiary-sector workers whose workplaces were fixed to live far from their workplaces to reduce housing costs. Efficient transportation infrastructure in the central areas also allowed workers to reside farther from their workplaces along transit corridors to save on housing costs.
The commuting spectrum from mobile phone location data
Comparing the actual commuting with the minimum and maximum plausible amount of commuting.
The
The PMC mainly depends on the regional spatial job distribution. The PMC value for an entire city decreases as job distribution changes from a concentrated to a dispersed distribution (Yang and Ferreira, 2008). The PMC value for a zone is calculated as the average commuting cost weighted by journey-to-work flows from this zone to other zones. The classic negative exponential distance decay function of job distribution with a uniform grid is as follows (Wang, 2001):
The
In equation (1), a commuting spectrum with different journey-to-work scenarios can be generated for a given jobs–housing distribution if
In an actual scenario, the
Excess commuting is widely used to describe the deviation between the ACC and the MRC in different cities (Hamilton and Röell, 1982). The EC can be calculated as follows:
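Excess commuting is conventionally expressed as the share of the actual commuting cost (ACC) that exceeds the MRC. The form below is that conventional definition, stated here as an assumption about the equation referred to in the text.

```latex
\mathrm{EC} = \frac{\mathrm{ACC} - \mathrm{MRC}}{\mathrm{ACC}} \times 100\%
```

An EC of 0% means that observed commuting already equals the minimum required by the jobs–housing distribution.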
According to equation (10), the EC of secondary-sector workers was 30.1%, whereas that of tertiary-sector workers was 31.8%, suggesting that secondary-sector workers had a higher commuting efficiency than tertiary-sector workers.
The commuting potential utilized (COM_u) describes the extent to which the commuting potential was utilized by the workers, with the MRC and PMC acting as the lower and upper bounds. COM_u can be calculated by using equation (11):
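A minimal sketch of this measure is given below. It assumes the usual form in which the excess of the actual commuting cost (ACC) over the MRC is divided by the range between the PMC and the MRC. The function name and argument names are hypothetical.

```python
def commuting_potential_utilized(acc, mrc, pmc):
    """Share of the commuting range [MRC, PMC] that observed commuting uses.

    Returns 0.0 when ACC equals the lower bound (MRC) and 1.0 when it
    equals the upper bound (PMC).
    """
    if pmc <= mrc:
        raise ValueError("PMC must exceed MRC for the range to be defined")
    return (acc - mrc) / (pmc - mrc)
```

For example, with an MRC of 2 km, a PMC of 12 km and an ACC of 4.5 km, a quarter of the commuting potential would be utilized.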
A lower value of the COM_u corresponds to more efficient commuting behavior. The COM_u was lower for secondary-sector workers (3.6%) than for tertiary-sector workers (23.9%). Hence, secondary-sector workers had more efficient commuting behavior than tertiary-sector workers in terms of the commuting cost.
Relationship between observed SCE and expected SCE
The MRC and the PMC are extreme measures of the commuting spectrum. From the workers’ perspectives, the SCE reflects their likelihood of working locally or commuting to other grid cells. When workers tend to work locally or choose their nearest matched jobs, the commuting cost is the MRC and the SCE is at its maximum. When workers randomly choose among matched jobs while disregarding the commuting cost, the commuting cost is the PMC and the SCE is at its minimum.
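The two bounds on the SCE can be illustrated for a single job type with the sketch below. It rests on two assumptions that follow the scenarios described above but are not spelled out in the text: in the MRC scenario as many workers as possible are matched to jobs in their own grid cell, and in the PMC scenario the share of workers who stay local equals the cell's share of all matched jobs. The function names are hypothetical.

```python
def expected_sce(workers, jobs):
    """Upper bound per cell: locally matched workers over resident workers."""
    return [min(w, j) / w if w > 0 else None for w, j in zip(workers, jobs)]


def unexpected_sce(jobs):
    """Lower bound per cell: the cell's share of all jobs of this type."""
    total_jobs = sum(jobs)
    return [j / total_jobs for j in jobs]
```

With 100 and 50 resident workers and 60 and 90 jobs in two cells, the upper bounds are 0.6 and 1.0 and the lower bounds are 0.4 and 0.6, so a job-rich cell has both a higher ceiling and a higher floor.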
The disaggregated LP model produced journey-to-work flows for each type of workers in the expected scenario. Figure 6 depicts the expected SCE (Ex_SCE).
Figure 6. Expected Self-Containment of Employment (Ex_SCE) in industrial and commercial grid cells.
Although the
Figure 7 shows the spatial variations of Re_SCE at the 2 km grid level. The average Re_SCE in industrial grid cells was 79.9%, whereas that in commercial grid cells was 60.8%. Industrial grid cells had a significantly higher average Re_SCE than commercial grid cells (t = 4.81, p-value < 0.001), indicating that, with the Ex_SCE as the benchmark, the former was more self-contained than the latter.
Figure 7. Relative Self-Containment of Employment (Re_SCE) at the 2 km grid level.
The unexpected SCE (
The UnEx_SCE in each grid cell was determined by the distribution of jobs, as shown in Figure 8. Secondary-sector jobs were more dispersed, whereas tertiary-sector jobs were more concentrated. As a result, the UnEx_SCE of secondary-sector workers was lower than that of tertiary-sector workers.
Figure 8. Unexpected Self-Containment of Employment (UnEx_SCE) in industrial and commercial grid cells.
The Ex_SCE and the UnEx_SCE, which were mainly affected by the local jobs–housing balance and the regional job distribution, respectively, were the upper and lower bounds of the Ob_SCE. In industrial grid cells, an OLS model showed that the Ob_SCE was positively correlated with the Ex_SCE (coefficient = 1.21, R-squared = 29.3%, p-value < 0.001) but was not correlated with the UnEx_SCE. In commercial grid cells, the Ob_SCE was positively correlated with the UnEx_SCE (coefficient = 1.81, R-squared = 17.5%, p-value < 0.05) but was not correlated with the Ex_SCE. Consequently, the SCE of secondary-sector workers was mainly affected by the local jobs–housing balance, whereas that of tertiary-sector workers was mainly affected by the regional job distribution.
The self-containment potential utilized (SCE_u) describes the extent to which the SCE potential was utilized by the workers, with the Ex_SCE and the UnEx_SCE acting as the upper and lower bounds. SCE_u can be calculated by using equation (14):
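By analogy with the commuting potential utilized, and given the stated upper and lower bounds, the measure can be written in the form below. This form is an assumption inferred from the description; the authors' exact notation is not reproduced here.

```latex
\mathrm{SCE}_u = \frac{\mathrm{Ob\_SCE} - \mathrm{UnEx\_SCE}}{\mathrm{Ex\_SCE} - \mathrm{UnEx\_SCE}} \times 100\%
```

Under this form, a value of 100% means that the observed SCE reaches the expected SCE, and a value of 0% means that it is no higher than the unexpected SCE.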
A higher value of the SCE_u corresponds to a more efficient commuting behavior. The SCE_u reflects the extent to which the
Conclusions
This study utilized mobile phone location big data to examine the effects of the jobs–housing balance on the SCE. Compared with travel survey data, mobile phone location data have a finer spatial resolution and a larger sample size. With mobile phone location big data, this study was able to analyze the SCE at a grid size of 2 km, a scale that will encourage workers to select non-motorized modes of travel if jobs are available. The mobile phone location data also enabled the disaggregation of workers according to job types by matching the location of the workers with building information data. The model can be extended to the disaggregation of workers based on other factors in addition to job types, such as income level or educational attainment, if detailed disaggregated data are available.
The commuting spectrum method conceptualizes the SCE in different journey-to-work scenarios with a given distribution of workers and jobs. The MRC and PMC scenarios characterize the two extremes of the commuting spectrum, and the expected and unexpected SCE in these scenarios provide the upper and lower bounds for the observed SCE. With the expected and unexpected SCE as benchmarks, secondary-sector workers were found to be more self-contained than tertiary-sector workers. The empirical analysis showed that the SCE is mainly affected by the local jobs–housing balance for secondary-sector workers and by the regional, city-level job distribution for tertiary-sector workers. Both the local jobs–housing balance and the regional, city-level job distribution must be considered in formulating policies that aim to increase the SCE and thereby reduce commuting distances and the associated energy consumption.
The study shows that the jobs–housing balance is a necessary but insufficient condition for a high SCE. The effect of the jobs–housing balance on the SCE also depends on the types of industries and the location of the zone. The jobs–housing balance positively affects the SCE for secondary-sector workers in the suburban districts of Bao’an and Longgang, where industries are mainly located because of lower land values, but its effect on the SCE is limited for tertiary-sector workers in the central districts of Nanshan, Futian, and Luohu, where housing prices are high. Secondary-sector workers with low skills tend to work in nearby factories in the suburban districts where the secondary industries are located, because housing prices are lower in the industrial areas of the suburban districts than in the commercial areas of the central districts. Furthermore, some industries provide worker housing near their factories. For tertiary-sector workers, however, the jobs–housing balance has a limited effect on the SCE because workers holding tertiary-sector jobs in the central districts can choose to live in the suburban districts, where house prices are more affordable. Increasing the jobs–housing ratio in commercial areas in the central districts may therefore not lead to a higher SCE. To increase the SCE in these areas, in addition to a better jobs–housing balance, housing subsidies may need to be provided to attract tertiary-sector workers to live in the commercial areas, which have higher house prices. This study is based on a Chinese city, Shenzhen, with many secondary industries in the suburbs, a context that differs from that of Western cities. It supplements the existing theory on the jobs–housing balance and the SCE, which is primarily based on Western cities.
One limitation of this study is that one-day mobile phone location data may be biased in the identification of the residences and workplaces of workers, since workers with residences and workplaces within the same cell tower area were excluded. More reliable results would be obtained if one-week data were available. Another limitation is that the disaggregation of workers based on building information data may not be sufficiently accurate. The study focuses on grid cells with primarily secondary-sector or tertiary-sector workers, with other workers excluded, which might bias the results. It would be better if actual employment data could be obtained so that workers could be disaggregated more accurately and in more detail.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study is supported by Natural Science Foundation of China (NSFC 41471378, 41671387), Shenzhen Scientific Research and Development Funding Program (CXZZS20150504141623042) and Faculty of Architecture Research Output Prize Award, Chan To-Haan Endowed Professorship Fund, and Distinguished Research Achievement Award of the University of Hong Kong.
