Abstract
The Massachusetts Bay Transportation Authority (MBTA) Bus Network Redesign project included a data-driven, customer-focused approach to creating the high-frequency core of the network. This approach used location-based services data to create an objective, repeatable process for developing this network based on the agency’s priorities. Centering equity and designed to limit human biases, the process algorithmically generated 100,000 possible high-frequency bus networks and scored them based on how much total demand and demand by low-income and minority populations was served by high-quality transit. The result was used to identify the high-frequency core of the bus network that approached the optimal for demand and equity while meeting resource constraints. The approach consisted of organizing the demand data, determining busable streets, generating potential bus corridors, combining those corridors into many sets, and scoring the sets based on how well they serve the region’s travel demand. The process emphasized equity by heavily weighting demand from low-income and minority populations in building and evaluating corridors and networks. Further, by using location-based services data, the approach was focused on where people were traveling, rather than traditional approaches that look to connect concentrations of trip generators and attractors without knowledge of where people are actually going. The result was a core, high-frequency bus network that has informed the decisions and design process of MBTA’s future bus system.
The Massachusetts Department of Transportation (MassDOT) and the Massachusetts Bay Transportation Authority (MBTA) along with Cambridge Systematics and Arup undertook a data-driven approach to developing a proposed high-frequency route network as part of the Bus Network Redesign (BNRD) project. This approach was developed to create an objective, repeatable process for analyzing travel data across the Greater Boston region and integrating this with the service planning process. The result was a set of high-frequency networks based only on travel data by different sociodemographic groups, which has the potential to reduce bias and reduce human influence in the network creation process. The high-frequency corridors delivered from this process were then considered as part of the service planning process and other analytical metrics to develop the proposed draft network map. The high-frequency network developed as part of BNRD was tasked to provide buses running every 15 min or better, 20 h/day, 7 days/week.
Originally built from the routes taken by early 20th century trolley lines, the MBTA bus network needed to adapt to the recent rapid growth of the service area and to changes in the distribution of jobs and population, and to bridge gaps in equity. The redesign effort aimed to create a modern bus system that addressed these concerns while leveraging rich origin–destination (O-D) location-based services (LBS) data collected from cell phones that considered all travel regardless of mode.
The algorithms at the core of this process began by prioritizing these LBS data comprising approximately 90 million trips made in the MBTA service area (collected from StreetLight Data, an LBS data provider) during October 2019. These algorithms then utilized these trip data to infer optimal O-D connections in the service area, linking them to form 92,000 potential high-frequency corridors that were analyzed and sampled to form 100,000 networks that were evaluated based on an objective function that prioritized equity. The best of these networks were used as direct input for MBTA’s future bus system.
The processes were developed to ensure repeatability with future data sources and to center racial and income equity from initial exploration through to the final design. As a process built from demand data, it is a tool that can be used to create the high-frequency core of a bus network in a region that focuses on serving travel equitably while building ridership.
Methods
Defining the Data Universe
The BNRD’s High-Frequency Corridor development process began with defining the geographic and data universe for the redesign, representing the extents of the algorithms in prioritizing demand and generating corridors and networks.
The geographic universe was defined by the MBTA service area, divided into 831 unique spatial polygons, each roughly 0.5 mi² in area, representing one or more census block groups. These bus analysis zones (BAZs) (Figure 1) were created for this project and served as the main geographic reference for aggregating the origins and destinations of trips and prioritizing connections. All trip data used in the redesign were at the BAZ level. The size was selected to approximate a walkable area from a bus stop, such that any bus stop within a BAZ could serve the entire BAZ.

Figure 1. Sample bus analysis zones (BAZs).
The data universe was created using StreetLight LBS data at the BAZ level. The original dataset procured by MassDOT had about 180,000 BAZ-to-BAZ O-D connections representing nearly 90 million trips during October 2019 including all travel modes (transit, vehicle, pedestrian, and bicycle trips). These data were aggregated and cleaned so that each row of data represented bidirectional travel volumes between two BAZs at a weekly timescale. The data were then processed to include a set of variables necessary for analysis, including:
Total weekly trips by direction: Two variables were created to store trip counts by direction for each unique connection ID. One column held the total trip count from origin BAZ to destination BAZ and the other the count from destination BAZ to origin BAZ. These variables illustrated the flow of trips between BAZs. Each connection in this dataset included total trips as well as trips taken by individuals who are racial or ethnic minorities or who are from low-income households. These equity determinations were made by correcting the StreetLight data with census data to match MBTA’s Title VI Policy population definitions. This method of identifying equity trips captures all the travel made by low-income and minority populations, regardless of the starting/ending point of a trip.
Supplementary data: Once the trip data were cleaned, supplementary data were attached for each unique BAZ connection ID. This included a neighboring BAZ flag (1 if the BAZ connection was adjacent and 0 if not), BAZ-to-BAZ centroid straight-line distance in miles, name labels for the origin and destination BAZs, and a flag indicating any overlap with rapid transit. The last of these indicated whether a connection could be made as an MBTA rapid transit one-seat ride, a commuter rail one-seat ride, a rapid transit two-seat ride, or on a combination of rapid transit and commuter rail.
After adding these variables, strategic filters were applied to these data to create a smaller universe of O-D pairs known as “serviceable connections” that identified the demand that a network of high-frequency corridors would aim to serve. The filters (Figure 2) included physical attributes and a volume threshold. Physical attribute filters ensured that connections were not redundant with one-seat rapid transit (trips that could be completed on rapid transit with no transfers) and that they were within a reasonable distance to be served by a bus (determined to be 10 mi or less). Trips within a single BAZ or between two adjacent BAZs were filtered because they were deemed walkable/bikeable. The basic volume filter ensured that connections served could support at least one bus a day throughout the week in both directions. This equated to 560 weekly trips per pair, assuming a bus has a capacity of 40 persons with all-week service in both directions: (40 [bus capacity] × 1 [number of buses] × 7 [days a week service] × 2 [directions] = 560).

Figure 2. Filters used to create “serviceable connections.”
After all filters were applied, the universe of serviceable connections contained 13,329 BAZ-to-BAZ connections representing approximately 24 million trips from the original dataset. A breakdown of the effects of each filter is shown in Table 1.
Table 1. Data Filtering Process
Note: O-D = origin–destination; BAZ = bus analysis zone.
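The filters described above can be sketched in a few lines of Python. This is only an illustration, not the study's implementation: the `Connection` record and its field names are hypothetical, and the sketch assumes the 560-trip threshold is applied to the bidirectional weekly total and is inclusive.

```python
from dataclasses import dataclass

# Values taken from the text above.
BUS_CAPACITY = 40        # persons per bus
BUSES_PER_DAY = 1
DAYS_PER_WEEK = 7
DIRECTIONS = 2
MIN_WEEKLY_TRIPS = BUS_CAPACITY * BUSES_PER_DAY * DAYS_PER_WEEK * DIRECTIONS
MAX_DISTANCE_MI = 10.0


@dataclass
class Connection:
    """Hypothetical record for one BAZ-to-BAZ O-D connection."""
    origin: str
    destination: str
    weekly_trips: int             # bidirectional weekly total (assumed)
    distance_mi: float            # centroid-to-centroid straight-line distance
    adjacent: bool                # neighboring-BAZ flag
    one_seat_rapid_transit: bool  # served by a one-seat rapid transit ride


def is_serviceable(c: Connection) -> bool:
    """Apply the physical-attribute filters, then the volume filter."""
    if c.origin == c.destination or c.adjacent:
        return False              # deemed walkable/bikeable
    if c.one_seat_rapid_transit:
        return False              # redundant with rapid transit
    if c.distance_mi > MAX_DISTANCE_MI:
        return False              # too long to be served by a bus
    return c.weekly_trips >= MIN_WEEKLY_TRIPS
```

Applying `is_serviceable` to every row of the O-D table would yield the reduced universe of serviceable connections.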
Determining Busable Roadways
The Boston-area roadway network is complex, and includes many streets that buses cannot navigate owing to street width, grade, or other constraints. Additionally, many adjacent BAZs are not connected to each other by busable roadways because of barriers such as water features or railroad corridors. This step cleaned the geographic data to identify busable pathways to get from each origin BAZ to each destination BAZ.
There were two inputs for this analysis: a “Busable Streets” shapefile created by MassDOT and a shapefile of the existing MBTA bus routes (1). The Busable Streets layer was filtered to remove any roads that buses could not navigate in winter owing to steep grades. It was also assumed that any street with a current bus route on it was busable.
These two shapefiles were then combined to create a routable network of busable streets that was ultimately used to map corridors onto streets (see the section titled “BAZ Corridors to Routed Paths”). A script was developed in FME (Feature Manipulation Engine) that combined the layers into a complete network of busable streets. The result was a network of roads that a bus can traverse, although it did not account for one-way streets or unusual traffic situations.
This file was then connected with the BAZ shapefile to identify whether or not adjacent BAZs were connected by busable streets, and these connections were tagged to create a busable adjacencies data file. A busable adjacency occurs when there is a busable street across the edge of two adjacent BAZ boundaries, which is necessary for a bus to travel between two adjacent BAZs. The process of finding adjacent BAZs with busable streets across boundaries was automated with an FME script owing to the complex nature and size of the input datasets. The output was a list of busable adjacencies that was transformed into a web of all possible BAZ-to-BAZ connections (Figure 3).

Figure 3. Web of busable adjacencies.
Generating Corridors
Once the busable adjacencies were identified, the next step was to create and evaluate potential high-frequency bus corridors for each connection in the dataset. For the purposes of this process, a corridor was defined as an ordered string of BAZs linking an origin and destination BAZ. Routes, on the other hand, are corridors routed to roads with more precise distances, resource levels, and terminals. In a dense network like the Boston region, there are many potential paths that could connect each O-D pair, and no guarantee that the shortest or most direct path will actually make the best bus route. This is because the demand served by a corridor is not only the demand between the origin and destination BAZs, but also includes the demand between intermediate BAZ pairs. Some diversion from a straight line may actually be beneficial in bus routing.
This issue was addressed by creating many corridors between each BAZ pair where possible, and identifying the options that most efficiently met MassDOT’s/MBTA’s goals for further consideration in network building.
K Shortest Path
Because the single shortest path between two end points is not necessarily the best path for a bus corridor, multiple paths were created for each connection that could be evaluated against the goals of the BNRD. This was done using a “k shortest path” methodology that produced 20 possible strings of BAZs for each of the nearly 700,000 connections possible in the network.
The k shortest path methodology used in this process was based on Yen’s algorithm (2), which relies on a series of loops to calculate the shortest path, store it, and then calculate all other potential shortest paths with a limitation of k, in this case, 20. The process modified the algorithm to recognize a path already taken, making sure that each subsequent path was unique.
Using Yen’s algorithm, an embedded “for-loop” navigated a matrix of all 700,000 BAZ-to-BAZ connections in the study area and generated 20 pathways between each origin and destination built on the busable-connections web identified in the previous step (see the section titled “Determining Busable Roadways”). The output for this process was stored in a dataset that had a source BAZ, an end BAZ, a k indicator tag (1 to 20), and a list of pathway BAZs used to get from origin to destination, defining the corridor itself. The corridor generation process developed a universe of about 14 million corridors between the 700,000 BAZ-to-BAZ pairs in the geographic universe.
The method relied on the igraph package (3) in R to convert adjacencies to a network graph that could be navigated to create these potential pathways.
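As an illustration of the idea, the following is a minimal Python sketch of Yen's algorithm on an unweighted adjacency web. It is not the study's R/igraph code. It assumes path length is measured in BAZ hops and uses a breadth-first search for each shortest-path call; the function names and the `adj` dictionary layout are hypothetical.

```python
from collections import deque


def shortest_path(adj, source, target, banned_nodes=frozenset(),
                  banned_edges=frozenset()):
    """Breadth-first shortest path by hop count; returns None if unreachable."""
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt in prev or nxt in banned_nodes or (node, nxt) in banned_edges:
                continue
            prev[nxt] = node
            queue.append(nxt)
    return None


def k_shortest_paths(adj, source, target, k):
    """Yen's algorithm: up to k loop-free paths, shortest first."""
    first = shortest_path(adj, source, target)
    if first is None:
        return []
    accepted, candidates = [first], []
    while len(accepted) < k:
        last = accepted[-1]
        for i in range(len(last) - 1):
            spur_node, root = last[i], last[: i + 1]
            # Block the next edge of every accepted path sharing this root,
            # so the spur search is forced onto a new branch.
            banned_edges = {(p[i], p[i + 1]) for p in accepted
                            if p[: i + 1] == root and len(p) > i + 1}
            banned_nodes = set(root[:-1])  # keep the path loop-free
            spur = shortest_path(adj, spur_node, target,
                                 banned_nodes, banned_edges)
            if spur is not None:
                candidate = root[:-1] + spur
                if candidate not in accepted and candidate not in candidates:
                    candidates.append(candidate)
        if not candidates:
            break  # fewer than k distinct paths exist
        candidates.sort(key=lambda p: (len(p), p))
        accepted.append(candidates.pop(0))
    return accepted
```

Calling `k_shortest_paths(adj, origin, destination, 20)` for every BAZ pair would correspond to the looped corridor generation described above, with each returned list of BAZs forming one candidate corridor.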
Cleaning Corridor Data
The corridor creation process generated about 14 million corridors representing 20 paths between every BAZ pair in the network. Using this many corridors as inputs to a network building process would be infeasible from a time and resources perspective. Further, since the best high-frequency networks would likely be composed of the best high-frequency corridors, this set of 14 million corridors was filtered and prioritized to result in a more manageable set of corridors to use in an efficient network building process.
The cleaning process removed the following:
Corridors that were inverses of another corridor, to make sure origin and destination BAZ pathways were not duplicated. For example, if a corridor from BAZ 1 to BAZ 2 was already present, the corridor from BAZ 2 to 1 that uses the same sequence of BAZs in its path was filtered from the dataset.
Corridors that did not reach their destination. Only corridors that were complete (i.e., that navigated a full pathway from an origin BAZ to a destination BAZ) were represented in the data.
This cleaning process, while simple, removed half the corridors from the dataset, largely resulting from inverse removal, bringing the total number down to under 7 million corridors.
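A minimal Python sketch of this cleaning step follows. It assumes each corridor is stored as an (origin, destination, path) tuple with comparable BAZ identifiers; the function name is hypothetical. A corridor and its inverse share one canonical key, so only the first one encountered is kept.

```python
def drop_inverse_and_incomplete(corridors):
    """Keep complete corridors, dropping any whose reversed path was already kept.

    corridors: list of (origin, destination, path) tuples, where path is an
    ordered list of BAZ ids (hypothetical layout).
    """
    seen = set()
    kept = []
    for origin, destination, path in corridors:
        # Incomplete: the path does not run from the origin to the destination.
        if not path or path[0] != origin or path[-1] != destination:
            continue
        forward = tuple(path)
        # The same key is produced for a path and for its inverse.
        key = min(forward, forward[::-1])
        if key in seen:
            continue
        seen.add(key)
        kept.append((origin, destination, path))
    return kept
```

Note that this sketch also drops exact duplicates, which the text does not mention explicitly.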
After cleaning, data were attached to the corridors, including:
Trip volume: Both from the serviceable connections (discussed in section titled “Defining the Data Universe”) and from the full StreetLight dataset (to ensure all trips were captured), including separate volumes for minority and low-income populations. Volume, in this case, represented all travel captured by all BAZ pairs along the corridor.
Distance: Routed distance in miles (based on roadway distance).
Number of observations: The number of BAZs in each corridor’s pathway.
The next step in this process filtered the list of nearly 7 million corridors to further reduce the universe of potential corridors to the most promising. The filters used at this phase included the following:
Length/physical characteristics: This filter ensured that each corridor was an appropriate length and shape for bus routes:
○ The corridor length was limited to a routed distance between 2 and 15 mi;
○ The corridor circuity was limited to a value of less than 2. The circuity was calculated for each corridor by dividing the BAZ centroid-to-centroid distance of each pathway by the straight-line distance between the centroids of the origin and destination BAZs; and
○ The corridors were also limited to those that had at least 3 BAZs in their pathway.
Rapid transit overlap: This filter removed corridors that were significantly overlapped by any individual rapid transit line in the MBTA system. Any corridor with greater than 75% overlap was removed from the final dataset.
Significant volume (K filter): For each O-D connection, this filter identified the two corridors that most successfully achieved the service objectives previously defined by MassDOT/MBTA. To quantify the objectives of serving total, minority, and low-income demand, with more weight on the latter two, a new objective function was created, which quantified how well each corridor met those service objectives, as defined in Equation 1. The weights were developed for this study with the intention of emphasizing equity within the process.
All demand values in this calculation draw on the cleaned serviceable connections demand dataset. Out of the 20 corridors developed for each O-D connection, which comprised the full dataset of 14 million corridors, the two corridors with the highest demand* value were retained. This filter identified not only highly traveled corridors, but also corridors with greater value to low-income and minority populations. This resulted in a maximum of two corridors for each O-D connection.
Volume/mile: Because demand varies significantly across the region and resources are always limited, an additional filter was applied to prioritize the most resource-efficient corridors. The process calculated the trips served per mile, and retained the top third of corridors by total volume/mile from the serviceable connections demand dataset.
Table 2 shows the impact of each filter in the process.
Table 2. Impact of Each Filter in the Corridor Cleaning Process
Note: BAZ = bus analysis zone.
The final set of 92,008 corridors was carried forward for network building, approximately 0.7% of the 14 million corridors originally created (about 1.3% of the roughly 7 million that remained after cleaning). This process does have the limitation of potentially removing “useful candidate” corridors, but given the resource limitations, these filters do create a large universe in which to create networks that cover a significant portion of demand (see the results section).
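The filter sequence can be sketched in Python as follows. The thresholds come from the text, but several things are assumptions made for illustration: demand* is taken to be a weighted sum, the weight values are placeholders (the study's actual weights are not reproduced here), and the dictionary keys are hypothetical.

```python
from collections import defaultdict

# Hypothetical weights: equity demand counts more than total demand, but the
# study's actual values are not reproduced here.
W_TOTAL, W_MINORITY, W_LOW_INCOME = 1.0, 2.0, 2.0


def demand_star(c):
    """Equity-weighted demand for one corridor (assumed weighted-sum form)."""
    return (W_TOTAL * c["total"] + W_MINORITY * c["minority"]
            + W_LOW_INCOME * c["low_income"])


def passes_physical_filters(c):
    """Length, circuity, minimum BAZ count, and rapid transit overlap."""
    circuity = c["path_mi"] / c["straight_mi"]
    return (2 <= c["routed_mi"] <= 15
            and circuity < 2
            and len(c["path"]) >= 3
            and c["rt_overlap"] <= 0.75)


def top_two_per_od(corridors):
    """K filter: keep the two corridors with the highest demand* per O-D pair."""
    by_od = defaultdict(list)
    for c in corridors:
        by_od[(c["origin"], c["destination"])].append(c)
    kept = []
    for group in by_od.values():
        kept.extend(sorted(group, key=demand_star, reverse=True)[:2])
    return kept


def top_third_by_volume_per_mile(corridors):
    """Keep the most resource-efficient third by total trips per routed mile."""
    ranked = sorted(corridors, key=lambda c: c["total"] / c["routed_mi"],
                    reverse=True)
    return ranked[: max(1, len(corridors) // 3)]


def filter_corridors(corridors):
    shaped = [c for c in corridors if passes_physical_filters(c)]
    return top_third_by_volume_per_mile(top_two_per_od(shaped))
```

Because demand* weights equity trips more heavily, a corridor with fewer total trips can outrank a busier one in the K filter, which is the intended effect of the objective function.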
BAZ Corridors to Routed Paths
At this point, all corridors followed the web of connections linking BAZ centroids to one another based on the busable BAZ adjacencies. The analysis of busable adjacencies only identified whether there was a busable roadway connection between two BAZs; it did not identify the preferred roadway for making that connection (when multiple options existed). The next step in the process was to map each of the corridors from the BAZ pathways to specific roadways. This process was automated using FME.
Routing in FME requires a network topology, which is a cohesive line network and not a set of polylines. A routable network topology was built from the busable streets network so that it could be used in identifying the best routes for buses to take to get from BAZ to BAZ. By including attributes such as average speed for each roadway segment based on the roadway type, certain roads can be prioritized in corridor routing. Polylines of the roadway network were input into Topology Builder, an FME transformer that creates a topology that can be used for future routing analysis by outputting a network of edges.
To route corridors onto the roadway network topology, the process used the shortest path tool in FME to link a series of points along each corridor. A standard application of the shortest path tool would only find the shortest path between the origin and destination BAZs, and not follow the BAZ path identified for each corridor. For this application, the process had to identify points in each BAZ that had to be served and use these points to create a polyline representing each corridor by linking multiple shortest path connections. For each corridor, the series of required points included the two end points, and at least one point within each of the BAZs along the way. An example of how these points were identified by the automated process, for corridor A–C–D–E–B, includes the following steps:
Identify the point on the roadway topology that is closest to the centroid of the starting BAZ (BAZ A in this case) using the FME software;
Identify the point on the busable streets network in BAZ C closest to the center point in BAZ A;
Identify the point on the busable streets network in BAZ D closest to the identified point in BAZ C;
Identify the closest point in BAZ E to the identified point in BAZ D; and
The line ends at the point on the roadway network closest to the centroid of BAZ B.
This automated process was looped for each of the 92,000 potential corridors. Once the noded polylines were created for each corridor, an on-street routing from the origin point to the destination point was developed using the FME shortest path tool. The shortest path algorithm was set to identify the shortest path based on the average travel times calculated for each link in the network.
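The waypoint selection steps listed above can be sketched in Python as follows. This is not the FME workflow itself. It assumes planar coordinates, represents each BAZ's busable streets as a list of candidate points, and chains each intermediate BAZ from the previously identified point; all names are hypothetical.

```python
import math


def nearest(points, target):
    """Return the candidate point closest to the target (planar distance)."""
    return min(points, key=lambda p: math.dist(p, target))


def corridor_waypoints(path, centroids, street_points):
    """Pick one busable-street point per BAZ along a corridor.

    path: ordered BAZ ids with at least two entries, e.g. ["A", "C", "D", "E", "B"]
    centroids: BAZ id -> (x, y) centroid
    street_points: BAZ id -> list of (x, y) points on busable streets
    """
    first, last = path[0], path[-1]
    # Start at the street point nearest the origin centroid.
    waypoints = [nearest(street_points[first], centroids[first])]
    # Each intermediate BAZ contributes the point nearest the previous waypoint.
    for baz in path[1:-1]:
        waypoints.append(nearest(street_points[baz], waypoints[-1]))
    # End at the street point nearest the destination centroid.
    waypoints.append(nearest(street_points[last], centroids[last]))
    return waypoints
```

The resulting ordered waypoints would then be joined by successive shortest-path searches on the street network to produce the routed corridor.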
The output from this process is a set of 92,000 routed paths that represent each corridor’s route along the actual busable road network. Finalizing bus routing still requires manual review and design from service planners to identify considerations other than shortest paths, such as important locations to serve and the distribution of service within BAZs.
Defining Alternatives and Sampling Networks
Selecting the corridors from the list of 92,000 routed corridors that perform best individually would not necessarily result in a network that equitably and efficiently provides high-frequency service across Greater Boston. Therefore, the next step is to move from individual corridors to sets of corridors that comprise high-frequency networks. Potential networks were evaluated to identify the set of corridors that served the most demand* while fitting within a resource constraint. The number of possible combinations that could be created from the list of routed corridors is too high to evaluate every possible combination and guarantee that the truly optimal network is identified. Thus, a sampling process was used to create many networks with the objective of finding a relatively small number of high-scoring networks that would approximate the optimal solution.
Alternatives Definitions
For the purposes of the BNRD, MassDOT and MBTA wanted to consider alternatives that devoted different levels of resources to the high-frequency bus routes. An alternative was defined as a network fitting a resource level. Three alternatives were created based on dedicating different shares of total peak bus-hour resources to high-frequency service: high (80%), medium (60%), and low (40%). The network alternatives were designed to be “nested,” such that each of the high-frequency corridors included in the low alternative would be included in the medium alternative, and all the high-frequency corridors included in the medium alternative would be included in the high alternative.
Sampling Networks
To create potential networks, a sampling process was implemented that selected from the list of routed corridors and grouped them to create many possible networks quickly. The process incorporated several variables and assumptions, including:
Resource constraints: For each routed corridor, an estimate of the resources needed to provide high-frequency service during the peak period was calculated. These resources were summed and tracked to stay within each alternative’s total resource constraint. The resources for a corridor were calculated as shown in Equation 2.
Most of the assumptions were held constant for all corridors, as follows:
○ # of directions = 2
○ Buses/h = 10 per direction
○ Hours in peak period = 2
○ Layover factor = 1.233
○ Average speed = 9.7 mph
Corridor weights: Because it is not possible to test and evaluate every combination of the 92,000 identified corridors as a network, each corridor was weighted so that corridors that were better at achieving system design objectives were more likely to be incorporated into the tested networks. Corridor weights were based primarily on the demand* variable to emphasize equity and serving demand during this process. Corridors with a connection to a BAZ with a rapid transit station were given a 60% weight increase, in an effort to account for the additional demand that a route might serve with bus to rapid transit two-seat rides.
Overlap among corridors: While building networks, the overlap between corridors was calculated to avoid creating networks that inefficiently used resources by serving the same demand using multiple corridors. Once a corridor was included in a network, no corridors that served more than 66% of the same BAZs were allowed to be added to that network. This filter allowed for expansive networks while still allowing certain core locations to be served by multiple corridors.
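The resource and overlap checks can be illustrated with a short Python sketch. The constants are those listed above, but the way they are combined is only one plausible reading and should be treated as an assumption: one-way running time (distance divided by speed) is inflated by the layover factor and multiplied by directions, buses per hour, and peak hours to give peak bus-hours. The overlap share is likewise assumed to be measured relative to the candidate corridor's BAZs.

```python
# Constants from the list of assumptions above.
DIRECTIONS = 2
BUSES_PER_HOUR = 10        # per direction
PEAK_HOURS = 2
LAYOVER_FACTOR = 1.233
AVG_SPEED_MPH = 9.7


def peak_bus_hours(routed_distance_mi):
    """Estimated peak bus-hours for one corridor (assumed formula)."""
    one_way_hours = routed_distance_mi / AVG_SPEED_MPH
    return (one_way_hours * LAYOVER_FACTOR * DIRECTIONS
            * BUSES_PER_HOUR * PEAK_HOURS)


def shared_share(candidate_path, existing_path):
    """Share of the candidate's BAZs that the existing corridor also serves."""
    candidate = set(candidate_path)
    return len(candidate & set(existing_path)) / len(candidate)


def overlaps_too_much(candidate_path, network_paths, limit=0.66):
    """True if the candidate shares more than `limit` of its BAZs with any
    corridor already in the network."""
    return any(shared_share(candidate_path, existing) > limit
               for existing in network_paths)
```

Under this reading, the required resources scale linearly with routed distance, so long corridors consume the resource budget much faster than short ones.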
Sampling Loop
For the sampling process, a “while-loop” was developed to perform a random weighted sampling (without replacement) of corridors while checking for overlap between each new corridor and all corridors in a sampled network. A network was deemed to be complete when the resource constraint for an alternative was met. Once the loop finished one network, it was saved and the loop moved on to creating another network until it sampled a user-defined number of networks.
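A minimal Python sketch of such a sampling loop is shown below. It is an illustration rather than the study's code: the corridor dictionaries and their keys are hypothetical, the overlap share is measured relative to the candidate corridor, and a network is treated as complete as soon as a drawn corridor would exceed the resource budget.

```python
import random


def sample_network(corridors, budget, rng, overlap_limit=0.66):
    """Build one network by weighted random sampling without replacement.

    corridors: list of dicts with 'id', 'path', 'weight', and 'cost' keys.
    """
    pool = list(corridors)
    network, used = [], 0.0
    while pool:
        # Weighted draw; popping the pick makes it sampling without replacement.
        pick = rng.choices(range(len(pool)),
                           weights=[c["weight"] for c in pool], k=1)[0]
        candidate = pool.pop(pick)
        bazs = set(candidate["path"])
        # Skip corridors that duplicate too much of one already in the network.
        if any(len(bazs & set(c["path"])) / len(bazs) > overlap_limit
               for c in network):
            continue
        # Assumed stopping rule: the network is complete once the next
        # corridor no longer fits within the resource constraint.
        if used + candidate["cost"] > budget:
            break
        network.append(candidate)
        used += candidate["cost"]
    return network


def sample_networks(corridors, budget, n_networks, seed=0):
    """Repeat the loop to produce a user-defined number of networks."""
    rng = random.Random(seed)
    return [sample_network(corridors, budget, rng) for _ in range(n_networks)]
```

Fixing the seed makes a run reproducible, which supports the repeatability goal of the process; a different seed yields a different set of sampled networks.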
The more networks that are created, the more likely the process is to create a network that approaches the performance of the optimal network. To balance the desire to create many networks with time and computing constraints during the planning process, 100,000 networks were created across all three alternatives, using three sampling processes. Each alternative was a subset of the higher resource alternative (e.g., the 40% alternative was a subset of the 60% alternative). This required the 80% alternatives to be run first, the 40% alternatives next (to create the core), and the 60% alternatives last.
The 80% alternative was run first and was able to sample from all 92,000 corridors, producing a top-performing network of 40 corridors from the cleaned universe. The 40% alternative was run second and was only able to sample from the 40 corridors included in the 80% alternative, since all alternatives were to be subsets of one another. The 40% alternative produced a top-performing network of 20 corridors. The 60% alternative was run last, holding the 20 corridors in the 40% alternative fixed while sampling the rest from the 40 corridors included in the 80% alternative.
Making each alternative a subset of the higher resource alternative gave MBTA a sense of which corridors to hold constant with lower-resource levels, with the ability to identify and add additional corridors as additional resources become available.
Scoring and Evaluating Networks
To evaluate the large number of potential networks sampled for each alternative, a scoring methodology was developed for this study based on four types of trips served by the combined rapid transit and high-frequency bus network. The score for each network was defined using Equation 3,
where
bus one-seat rides = demand* trips that could be served with a one-seat ride on the sampled bus network;
bus to bus two-seat rides = demand* trips that could be served with a two-seat ride on the sampled bus network;
bus to rapid transit two-seat rides = demand* trips that could be served with a two-seat ride combining the sampled bus network and the existing rapid transit network; and
rapid transit to rapid transit two-seat rides = demand* trips that could be served with a two-seat ride on the existing rapid transit network.
Trips that could be served multiple ways (e.g., with both a bus one-seat ride and a bus to rapid transit two-seat ride) were credited to the higher category in this hierarchy.
Weights for each of these trip categories were determined based on service design priorities. They were selected to give the highest importance to providing as many trips as possible with one-seat bus rides, followed by trips that were served by two-seat rides, including trips that fed the rapid transit network. Any trips that could be served by a one-seat rapid transit ride were not counted to limit bus service that was redundant with the rapid transit network.
All networks were scored using demand* from the serviceable connections dataset, which emphasized equity in scoring networks. The scoring process outputs a spreadsheet, with each row representing one network with the score and the total demand in each of the trip types (and a “not served” category) as columns.
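The scoring hierarchy can be sketched in Python as follows. This is a simplified illustration under several assumptions: the score is taken to be a weighted sum of demand* across the trip categories, the weight values are placeholders (the study's values are not reproduced here), a transfer is considered possible wherever two services share a BAZ, and all function and key names are hypothetical.

```python
# Hypothetical category weights, ordered from highest to lowest priority.
WEIGHTS = {"bus_one_seat": 1.0, "bus_bus": 0.5, "bus_rt": 0.5,
           "rt_rt": 0.25, "not_served": 0.0}


def classify(o, d, corridor_sets, rt_lines, rt_two_seat_pairs):
    """Return the highest category in the hierarchy that serves the pair."""
    with_o = [s for s in corridor_sets if o in s]
    with_d = [s for s in corridor_sets if d in s]
    if any(d in s for s in with_o):
        return "bus_one_seat"
    if any(a & b for a in with_o for b in with_d):
        return "bus_bus"      # two corridors that meet in a shared BAZ
    for line in rt_lines.values():
        if (d in line and any(s & line for s in with_o)) or \
           (o in line and any(s & line for s in with_d)):
            return "bus_rt"   # a corridor feeding a rapid transit line
    if frozenset((o, d)) in rt_two_seat_pairs:
        return "rt_rt"
    return "not_served"


def score_network(corridor_paths, demand, rt_lines, rt_two_seat_pairs):
    """Score one network.

    corridor_paths: list of BAZ-id lists for the sampled corridors
    demand: (origin, destination) -> demand* for each serviceable connection
    rt_lines: line name -> set of BAZs with a station on that line
    rt_two_seat_pairs: set of frozenset pairs served by rapid transit transfers
    """
    corridor_sets = [set(p) for p in corridor_paths]
    totals = dict.fromkeys(WEIGHTS, 0.0)
    for (o, d), value in demand.items():
        category = classify(o, d, corridor_sets, rt_lines, rt_two_seat_pairs)
        totals[category] += value
    score = sum(WEIGHTS[k] * v for k, v in totals.items())
    return score, totals
```

The returned `totals` dictionary corresponds to one row of the scoring output described above, with demand split across the trip types and a "not served" category.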
Results
This process produced 100,000 possible networks, including 20,000 networks for the 80% alternative (sampled from 92,000 potential corridors). Once a single 80% network was selected, 40,000 networks were produced for the 40% resource alternative (sampled from the 40 potential corridors in the 80% network), and a single 40% network was selected. Finally, 40,000 networks were produced for the 60% alternative (sampled from the 40 potential corridors of the 80% network with the 20 corridors of the 40% network held constant). Sampling was structured in this way to ensure that the selected networks were nested within one another.
Summary Statistics of 100,000 Networks Across Alternatives
The distribution of the scores of each of the generated networks is shown in Figure 4. The 80% alternative shows the widest range of scores because it had the largest pool of corridors to sample from (92,000 compared with 40). The range was so large that most of the 60% alternative networks outscored most of the 80% alternative networks. As shown, the vast majority of the 40% and 60% alternative networks scored within a relatively tight range, because they had a limited number of corridors to sample from.

Figure 4. Distribution of scores for the generated networks of each alternative.
The 20 top-scoring networks for each of the three alternatives (80%, 60%, and 40% resource levels) were added to a tool, along with a custom network builder (developed in R Shiny) that allowed users to choose corridors from these networks to create custom-built networks that satisfied resource constraints in each alternative.
The performance of the top 20 networks in each alternative is presented in Figure 5. These results show that the few top-performing networks for each alternative scored higher than the rest and were probably approaching the performance of an optimal network. This was most true for the 40% and 60% alternatives, for which the process created a higher portion of the potential networks; in future applications it may be beneficial to sample a larger number of networks for the alternative with the most options. The diminishing returns associated with allocating more and more resources to high-frequency service are clearly visible in this figure, as only the very highest performing networks in the 80% alternative scored better than the networks in the 60% alternative, despite a 33% increase in resources between the two.

Figure 5. Distribution of scores for the top 20 generated networks for each alternative.
Getting to the Final Network
At this point, the automated data-driven process had created multiple alternatives for a core, high-frequency bus network for Greater Boston that was objectively built to meet the goals of MassDOT/MBTA. With equity and efficiency at the center, this data-driven process helped to identify connections that the current network does not provide for the people who most rely on the bus system.
MBTA service planners generated a final network for the 80%, 60%, and 40% peak hour resource alternatives, by selecting corridors present in the generated network alternatives and recombining them based on discussions of tradeoffs (Figure 6). The performance of the custom-built networks was higher than those created by the automated process, underscoring the need for qualitative analysis to supplement or correct the limited considerations of a network sampling approach. Additionally, input from MBTA service planners was necessary to contextualize networks to infrastructure and specific amenities (hospitals, elder care centers, etc.), limiting a purely quantitative approach. The relatively large zonal geographies represented by a centroid contributed to the need for planners to contextualize trips.

Figure 6. High-frequency network process results.
Based on the analysis outlined above, MassDOT and MBTA decided that an increase in high-frequency service was the best choice for a redesigned network, and elected to move forward with a 60% alternative structured around 33 high-frequency corridors—a substantial increase from the roughly 40% of current service on high-frequency corridors.
From this point, MBTA service planning staff led the design of a completely revised bus network that included not only high-frequency service, but lower-frequency services that met the full range of coverage, equity, and service-level goals for the region. Although more manual, this service design process accounted for things that the automated process could not, such as difficult turning movements, terminal locations, garage space and other resources, labor requirements, and specific needs and trip generators such as schools.
This led to a proposed draft BNRD network, which was released for public comment during spring and summer 2022. A final network incorporating public feedback was adopted in fall 2022, and is planned for a phased implementation starting in 2023.
Discussion
The process developed in this study is unique because it leverages powerful LBS data to objectively design a high-frequency bus network, developing a new use case for big data in transportation planning. By using rich O-D data comprising millions of trips occurring on all modes, the process was inherently focused on all potential ridership markets, instead of focusing only on current transit demand. Because of the scale of the dataset and the region, the automated process was able to test many alternatives against a set of custom-built demand-based performance metrics, going beyond what is possible through a traditional, manual service planning process. The process integrated a complex data-oriented method with equity from inception to final design, addressing the transportation needs of the most vulnerable in MBTA’s service area, while providing an objective counterpoint to paradigm paralysis that might exist in a process that starts with existing bus route structures.
Additionally, the process was designed to be readily repeatable and customizable, allowing for flexibility and reanalysis as travel patterns continue to change. The process and results can be easily updated with new LBS data for the Boston region in the future (for example, with 2023 LBS data to see how post-COVID travel patterns have changed demand). As priorities change, the process could also be adapted to fit different goals, because the functions for every step are fully customizable. The process framework could also be transferred to other regions.
Of particular interest are the metrics developed during this process to analyze the performance of individual corridors and the network as a whole. These demand-focused metrics provide a way of comparing alternatives relative to the demand that they serve, quantifying the potential markets covered and the opportunities that they provide in an aggregate way. These types of metrics could easily be applied to a wide array of project types, from system redesigns to corridor studies. Because of the easily reproducible algorithms created through this process and the accessibility of LBS data, the effort required for such projects would also be reduced.
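As an illustration of what a demand-focused metric of this kind might look like, the following is a minimal Python sketch rather than the scoring function used in the study. It assumes that an O-D pair counts as served when both its origin and destination zones are served by the alternative, and it uses hypothetical equity weights (`EQUITY_WEIGHTS`); the names `ODPair`, `weighted_demand`, and `score_alternative` are invented for this example.

```python
from dataclasses import dataclass

# Hypothetical weights: the study weighted low-income and minority demand
# heavily, but the actual values are not given in this section.
EQUITY_WEIGHTS = {"all": 1.0, "low_income": 2.0, "minority": 2.0}


@dataclass
class ODPair:
    origin: str
    destination: str
    trips: dict  # trips by group, e.g. {"all": 100, "low_income": 30}


def weighted_demand(pair, weights=EQUITY_WEIGHTS):
    """Equity-weighted trips for one O-D pair (missing groups count as 0)."""
    return sum(w * pair.trips.get(group, 0.0) for group, w in weights.items())


def score_alternative(served_zones, od_pairs, weights=EQUITY_WEIGHTS):
    """Share of equity-weighted demand with both trip ends in served zones.

    Assumption for this sketch: a pair is 'served' only if both its origin
    and destination zones are in `served_zones`.
    """
    total = sum(weighted_demand(p, weights) for p in od_pairs)
    served = sum(
        weighted_demand(p, weights)
        for p in od_pairs
        if p.origin in served_zones and p.destination in served_zones
    )
    return served / total if total else 0.0
```

Because the score is a share of weighted demand, it can be compared across alternatives of different sizes, whether the alternative is a full network or a single corridor.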
In the future, this process could be used to identify not only networks that are closer to optimal, but also an efficient level of resources to allocate to high-frequency service. This could be done by testing a wider range of resource alternatives and identifying inflection points in the demand served per route-mile and other metrics.
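One simple way to look for such an inflection point is to compare the marginal gain in score per added unit of resources between consecutive alternatives. The sketch below is illustrative only: the function names and the `min_gain` threshold are assumptions made for this example, and it assumes each alternative has a distinct resource level.

```python
def marginal_gains(alternatives):
    """Score gained per added unit of resources between consecutive levels.

    `alternatives` is a list of (resource_level, best_score) pairs with
    distinct resource levels; the result pairs each higher level with the
    gain achieved by moving up to it.
    """
    pts = sorted(alternatives)
    return [
        (r1, (s1 - s0) / (r1 - r0))
        for (r0, s0), (r1, s1) in zip(pts, pts[1:])
    ]


def efficient_level(alternatives, min_gain):
    """Highest resource level reached before the marginal gain drops
    below `min_gain` (a hypothetical, analyst-chosen threshold)."""
    pts = sorted(alternatives)
    best = pts[0][0]
    for level, gain in marginal_gains(pts):
        if gain < min_gain:
            break
        best = level
    return best
```

For example, with made-up scores of 50, 70, and 72 for the 40%, 60%, and 80% alternatives, `efficient_level([(40, 50), (60, 70), (80, 72)], min_gain=0.5)` returns 60. These numbers are illustrative and are not results from the study.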
This process helped to develop solutions to several difficult data and analytical issues that may also have wider applications, for example:
Linking disaggregate LBS data at the O-D level into corridor and network level aggregations;
Approximating optimization of corridor and network options by creating many good sets, instead of all possible sets; and
Automating the process of moving between zone-based routing and roadway-specific routing for buses.
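The second item, approximating an optimum by sampling many good sets rather than enumerating all of them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's algorithm: corridors are described by a hypothetical `(cost, demand)` pair, a network's score is simply the sum of its corridors' demand (overlap between corridors is ignored), and each sample is built by adding corridors in random order while they fit within the resource budget.

```python
import heapq
import random


def sample_networks(corridors, budget, n_samples, top_k, seed=0):
    """Randomly build feasible corridor sets and keep the best-scoring ones.

    corridors: dict mapping corridor name -> (cost, demand); hypothetical
               inputs for this sketch.
    Returns up to `top_k` (score, corridor_names) tuples, best first.
    """
    rng = random.Random(seed)  # seeded so the sampling is repeatable
    names = sorted(corridors)
    scores = {}  # frozenset of names -> score; also removes duplicate samples
    for _ in range(n_samples):
        order = names[:]
        rng.shuffle(order)
        chosen, spent = [], 0
        for name in order:
            cost = corridors[name][0]
            if spent + cost <= budget:  # add the corridor only if it fits
                chosen.append(name)
                spent += cost
        network = frozenset(chosen)
        scores[network] = sum(corridors[n][1] for n in network)
    ranked = ((score, tuple(sorted(net))) for net, score in scores.items())
    return heapq.nlargest(top_k, ranked)
```

Sampling in this way trades a guarantee of optimality for tractability: the number of possible corridor sets grows combinatorially, while the sample size can be chosen to fit the available computing resources.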
Conclusions
A transit system that effectively meets the needs of its riders must reflect where riders want to go and when they want to go there, especially those who have historically experienced the consequences of disinvestment. The process outlined in this paper can transform LBS data and socioeconomic statistics into tools and metrics for service planning that are systematic and reproducible and that invite examination of assumptions. However, these analyses are primarily planning-level tools that reveal crucial themes, and do not take the place of a service design process that accounts for more qualitative needs such as layover facilities, capacity at garages and other facilities, special trip generators, and accessibility at the bus stop level. Furthermore, because the process is still in development, the final stage of design requires human intervention to ensure an implementable network. This was true in Greater Boston, where MBTA service planners played a major role in making decisions about the political and operational feasibility of the automated networks because the process could not take these qualitative considerations into account. Although some of those elements could be integrated into future versions of this tool, and additional computing resources could lead to more accurate results, service planning still requires the level of qualitative information that human service designers are best able to consider and address. Qualitative analysis of powerful but difficult-to-measure issues (accessibility in the built environment, local attitudes and views about transit affordability, idiosyncratic roadway characteristics, and others) must be incorporated.
The process described in this paper provides a quantitative foundation for generating bus networks at a massive scale, with both process and results focused on equity. This first application in Greater Boston has served as a proof of concept, showing that the process can identify new connections and high-frequency bus networks that serve as the core framework for systemwide service design. The customizable parameters make this process transferable across geographies and repeatable over time, as travel patterns and priorities evolve. These features mean that the process could also be used to identify more incremental service changes through regular updating of the base LBS data and validating the results against ridership. The process could be improved to increase sample sizes, reduce computing loads, and make network creation more accessible through an all-in-one app interface. As implementation of MBTA’s new bus network proceeds, it will be important to track how the corridors identified by this process perform in relation to ridership over time, to further validate the parameters and functions used.
Footnotes
Acknowledgements
The authors acknowledge the contributions of MassDOT and MBTA staff in encouraging and enabling this work through the overall Bus Network Redesign planning effort. In particular, this includes staff from the MassDOT Office of Transportation Planning, MassDOT/MBTA Office of Performance Management and Innovation, MBTA Service Planning, and MBTA leadership. The authors would especially like to acknowledge Caroline Vanasse, Anna Gartsman, Julianna Horiuchi, Melissa Dullea, Robert Guptill, Kathryne Benesch, Wes Edwards, Scott Hamwey, and Christof Spieler.
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: D. Baumgartner, V. Chachra, M. Ciborowski, D. Leven, A. Zimmer; data collection: A. Zimmer, V. Chachra, Z. Temco; analysis and interpretation of results: D. Baumgartner, V. Chachra, M. Ciborowski, Z. Temco, D. Leven, A. Liu Pathak, A. Zimmer; draft manuscript preparation: V. Chachra, D. Baumgartner, Z. Temco, A. Liu Pathak, D. Johnson. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The preparation of this report has been financed in part through funding from the Federal Highway Administration and Federal Transit Administration, U.S. Department of Transportation, under the State Planning and Research Program, Section 505 (or Metropolitan Planning Program, Section 104[f]) of Title 23, U.S. Code.
Data Accessibility Statement
Location-based services data in their raw form are not available owing to license agreements with StreetLight Data. Aggregated LBS data can be found on the MBTA’s Bus Network Redesign website (https://www.mbta.com/projects/bus-network-redesign/update/how-we-used-data-design-equitable-bus-network). Initial LBS cleaning scripts are available on MBTA’s GitHub. Other data, including the Busable Streets geographic information system network, or other scripts may be available via Public Records Request.
The contents of this report do not necessarily reflect the official views or policy of the U.S. Department of Transportation.
