Abstract
Major events are a significant source of traffic congestion, especially in large metropolitan areas. This paper presents a case study of football games played at the Los Angeles Memorial Coliseum, a venue near downtown Los Angeles, California, with a capacity of about 80,000. Two teams play home games at the Coliseum: the Los Angeles Rams and the University of Southern California (USC) Trojans. These events take place in an area with a high level of recurrent congestion. The traffic impacts of game days are analyzed by comparing game day traffic with traffic on control days on both the highway and arterial systems, using speed records from in-road detectors. Two sets of models are estimated to test relationships between game attributes and traffic performance: traditional regression models controlling for spatial and temporal correlation, and random forest (RF), a machine learning method. RF performs better because it accommodates complex non-linear relationships among variables. The results show that Rams and USC impacts differ. Rams fans arrive in a more concentrated time interval closer to game start and, therefore, have a greater impact on the major approach routes than USC fans. The greatest highway impacts are around nearby freeway-to-freeway interchanges, whereas arterial traffic is more consistently affected by distance from the venue. This case study provides a basis for better management of major planned events.
Major planned events are a significant source of traffic congestion, especially in large metropolitan areas. Planned events may include sports events, concerts, or short-term repair/construction closures. This paper presents a study of major events to develop strategies and policies to reduce their impacts. Football games played at the Los Angeles Memorial Coliseum were used as the case study. The venue is near downtown Los Angeles and has a capacity of about 80,000. Two teams play home games at the Coliseum: the Los Angeles Rams and the University of Southern California (USC) Trojans. These events take place in an area that has a high level of recurrent congestion.
This research includes a comprehensive analysis of highway, arterial, transit, and parking demand; this paper presents the part of that analysis that addresses highway and arterial traffic impacts. Game day impacts are examined by comparing game day traffic with traffic on control days, with speed and volume records from in-road detectors as the main data source. Two sets of models are estimated to test relationships between game attributes and traffic performance: traditional regression models controlling for spatial and temporal correlation, and random forest (RF), a machine learning method. RF performs better because it accommodates complex non-linear relationships among variables. The results show that Rams and USC impacts differ. Rams fans arrive in a more concentrated time interval closer to game start and, therefore, have a greater impact on the major approach routes than USC fans. The greatest highway impacts are around nearby freeway-to-freeway interchanges, whereas arterial traffic is more consistently affected by distance from the venue. This case study provides a basis for better management of major planned events.
Major events, whether planned or unplanned, are sources of non-recurrent congestion. Non-recurrent congestion is estimated to account for more than half of all congestion. The largest share is from incidents, followed by weather, work zones, and special events (1). Work zones and special events are planned (known in advance) and, therefore, can potentially be managed more effectively.
Research on the impacts of special events is limited. Wang et al. (2) developed a classification system for special events and a conceptual framework for their impacts. Impacts of special events are short term and spatially concentrated around the event site. Kwon et al. (3) studied one highway corridor (I-880 in the San Francisco Bay Area) using detector data and estimated that sporting events account for about 4.5% of all daily congestion. Seeherman and Anderson (4) also used highway detector data to study baseball game impacts in San Francisco and Anaheim and suggested that games add about 1,000 vehicles to the afternoon commute. Others have used simulation (5) or vehicle GPS data (6). These studies suggest that special event-related traffic is short in duration and concentrated in both time (near event start and end) and space (near the venue) (7).
Existing research generally focuses only on the impact of events on road traffic near venues (3–6, 8). This study extends major events research in four ways. First, it examines the impact of event-generated traffic on both the local transportation system (arterial roads in the immediate venue zone) and the regional system (highway corridors leading to the venue). Second, it explores the impact of several event-related factors on pre-game traffic and on the travel behavior of attendees. Third, it develops an automatic, classification-based approach to identify the impact region and affected time period of events. Finally, it uses a modeling approach intended to generate traffic predictions with high temporal and spatial resolution. Improved traffic prediction allows transportation planners and event organizers to develop targeted strategies to mitigate event-related congestion.
Methods
A quasi-experimental design is used, comparing game days with a set of control days. The study focuses only on weekend football games at the Coliseum: weekday games would likely have different traffic impacts, and there are too few of them to allow a separate analysis. It includes all weekend Coliseum football games from 2016 to 2018 for which data were available, together with a set of comparable control days. Both USC and Rams games took place during these years. Traffic impacts along major corridors and on arterials near the Coliseum are examined.
Attendance, time of game start, time relative to game start, and distance from the venue are expected to be related to traffic performance at a given time and detector location (4, 9). Performance is measured as the difference in speed between a game day and a control day. Games starting earlier in the day are expected to have less effect than games later in the day, because the regular daily traffic is lighter earlier in the day. Most arrivals are expected to take place a few hours before the start of a game; therefore, the demand on the system will move toward the venue as time gets closer to start time. Attendance is the best indicator of the added demand generated by the event.
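As a minimal illustration of this performance measure, the sketch below computes the speed difference for each detector and time slot. It assumes a hypothetical data layout (dictionaries keyed by detector ID and minutes relative to game start) and assumes that several control days are averaged into a single baseline. It uses the control − game sign convention of the descriptive figures, so positive values mean slower traffic on the game day. The function name `speed_difference` is illustrative, not the authors' code.

```python
from statistics import mean


def speed_difference(game, controls):
    """Speed difference (control minus game) for each (detector, time slot).

    game:     dict mapping (detector_id, minutes_to_start) -> speed in mph
    controls: list of dicts with the same key structure, one per control day
    Control days lacking a key are ignored; keys absent from every control
    day are skipped. Positive values mean slower traffic on the game day.
    """
    diffs = {}
    for key, game_speed in game.items():
        control_speeds = [day[key] for day in controls if key in day]
        if control_speeds:
            # Baseline = mean over the available control days (an assumption)
            diffs[key] = mean(control_speeds) - game_speed
    return diffs


# Hypothetical example: one detector, two time slots before kickoff
game = {("D1", -60): 45.0, ("D1", -15): 30.0}
controls = [{("D1", -60): 55.0, ("D1", -15): 58.0},
            {("D1", -60): 53.0, ("D1", -15): 60.0}]
print(speed_difference(game, controls))  # {('D1', -60): 9.0, ('D1', -15): 29.0}
```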
The detector data have a high degree of spatial and temporal correlation (10): the best predictor of the value at a given sensor at time t is its value at time t−1, and, similarly, the best predictor at a given location is the value at nearby locations. These correlations must be controlled for (11, 12). Two sets of models are estimated. The first is an ordinary least squares (OLS) regression with spatial and temporal lags:
where
The second set of models uses a machine learning method based on the RF algorithm. RF is an “ensemble learning” method that generates many decision trees and aggregates their regression results (13). It is a popular machine learning algorithm that is widely used in prediction and optimization. It accommodates many types of non-linearity and is often used to predict traffic (14, 15). Unlike many other machine learning methods, RF allows the prediction strength of each variable to be measured and the relationships between one or more independent variables and the dependent variable to be illustrated.
Figure 1 illustrates the structure of an RF regression model. A series of random (bootstrap) samples is drawn, and a decision tree is fitted to each sample. The main parameters of RF are the number of trees (samples) and the number of variables randomly selected as candidates at each split. The final prediction is the average of the predictions of the individual trees. Prediction strength is measured by applying each tree to the data withheld from its sample (the out-of-bag observations).

Structure of a random forest regression.
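The bootstrap-and-average structure described above can be sketched in standard-library Python. This is a deliberately simplified random forest, an assumption made for brevity: each tree is a single-split regression stump (thresholds at midpoints between observed values), so drawing a random feature subset per tree is the same as drawing it per split, whereas production implementations grow deep trees. Out-of-bag rows, those not drawn into a tree's bootstrap sample, supply the withheld data used to measure prediction strength. All function names are hypothetical; the paper does not specify the software used.

```python
import random
from statistics import mean


def fit_stump(rows, ys, features):
    """Fit a one-split regression tree: choose the (feature, threshold) among
    `features` that minimizes the summed squared error of the two leaf means."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2                      # candidate threshold: midpoint
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # no valid split (constant feature): predict the sample mean
        m = mean(ys)
        return lambda row: m
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm


def fit_forest(rows, ys, n_trees=100, n_features=1, seed=0):
    """Grow trees on bootstrap samples, each restricted to a random feature
    subset. Also returns each tree's out-of-bag (withheld) row indices."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    trees, oob = [], []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap: sample with replacement
        feats = rng.sample(range(p), n_features)      # random subset of variables
        trees.append(fit_stump([rows[i] for i in idx], [ys[i] for i in idx], feats))
        oob.append(set(range(n)) - set(idx))
    return trees, oob


def predict(trees, row):
    """Forest prediction = average of the individual tree predictions."""
    return mean(tree(row) for tree in trees)


def oob_rmse(trees, oob, rows, ys):
    """Prediction error on rows each tree did not see during fitting."""
    sq_errs = []
    for i, (row, y) in enumerate(zip(rows, ys)):
        preds = [tree(row) for tree, held_out in zip(trees, oob) if i in held_out]
        if preds:
            sq_errs.append((mean(preds) - y) ** 2)
    return mean(sq_errs) ** 0.5
```

With `n_features` smaller than the number of columns, individual trees see different variables, which decorrelates them; averaging many such trees is what reduces variance relative to a single tree.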
Study Area and Data
The study area surrounds the Los Angeles Memorial Coliseum, home to the USC Trojans college football team and, since 2016, the Los Angeles Rams professional football team. The Coliseum is located in Exposition Park, and two interstate highways, I-110 and I-10, provide regional access. Figure 2 shows a map of the study area with the closest on- and off-ramp locations identified. The local road network is dense and is part of the Los Angeles Automated Traffic Surveillance and Control system, so it is heavily instrumented, and traffic signals can be managed in real time.

Map of study area (16).
All football game days from 2016 to 2018 for which detector data were available were selected as “treatment” days. A set of “control” non-game days, as similar as possible to game days, was also selected: for example, the same day of the week, one week before or after a game day, no unusual weather, and no major events at nearby facilities. The final sample consisted of 29 game days (19 Rams and 10 USC) and 39 control days; most of the missing data were in 2017. Data were also obtained on game start time, opponent, score, and attendance. Control day observations are aligned with game days by game start time.
Detector data on speed, volume, and occupancy were drawn from the Archived Data Management System (ADMS) at USC, which collects near real-time traffic data from detectors on highways and arterial roads (17). ADMS provides directional volume, speed, and occupancy for each highway segment at 30-s intervals, and midblock speed and volume for each arterial segment at 1-min intervals. All data were aggregated to 15-min intervals.
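A minimal sketch of the 15-min aggregation step, assuming readings arrive as hypothetical (detector, timestamp, speed, volume) tuples: each reading is assigned to the 15-min bin that contains it, speeds are averaged, and volumes are summed. Using an unweighted mean speed is an assumption; a volume-weighted mean is a common alternative.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_15min(records):
    """Aggregate raw detector readings into 15-min bins.

    records: iterable of (detector_id, timestamp, speed_mph, volume) tuples,
             where timestamp is a datetime.
    Returns {(detector_id, bin_start): (mean_speed, total_volume)}.
    """
    acc = defaultdict(lambda: [0.0, 0, 0.0])  # [speed sum, reading count, volume sum]
    for det, ts, speed, volume in records:
        # Floor the timestamp to the start of its 15-min bin
        bin_start = ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)
        a = acc[(det, bin_start)]
        a[0] += speed
        a[1] += 1
        a[2] += volume
    return {key: (s / n, v) for key, (s, n, v) in acc.items()}


# Hypothetical 30-s highway readings spanning two 15-min bins
readings = [
    ("H1", datetime(2017, 9, 10, 12, 0, 30), 60.0, 5),
    ("H1", datetime(2017, 9, 10, 12, 14, 30), 50.0, 7),
    ("H1", datetime(2017, 9, 10, 12, 15, 0), 40.0, 3),
]
print(aggregate_15min(readings))
```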
Preliminary tests were conducted to define the spatial and temporal boundaries of game day impacts. Highway effects were not observed more than 10 mi from the venue, arterial effects were observed only within a few miles, and effects could be observed up to 6 h before game start. The selected boundaries are therefore 6 h before game start, 10 mi for highways, and 5 mi for arterials. The area of analysis includes 100 highway detectors and 4,017 arterial detectors, with a total of 34,548 observations for highways and 1,685,995 for arterials.
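The study boundaries above can be expressed as a simple filter on each observation. This sketch assumes hypothetical road-type labels and measures time as minutes before game start (positive before kickoff, with kickoff itself included).

```python
# Boundaries from the preliminary tests: 10 mi (highway), 5 mi (arterial), 6 h
MAX_DISTANCE_MI = {"highway": 10.0, "arterial": 5.0}
MAX_MINUTES_BEFORE_START = 6 * 60


def in_study_window(road_type, distance_mi, minutes_before_start):
    """True if an observation falls inside the analysis boundaries.

    road_type: "highway" or "arterial" (hypothetical labels; others raise KeyError)
    minutes_before_start: minutes before kickoff (0 = kickoff, positive = earlier)
    """
    return (distance_mi <= MAX_DISTANCE_MI[road_type]
            and 0 <= minutes_before_start <= MAX_MINUTES_BEFORE_START)
```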
Results
Before the models were estimated, some descriptive analysis was conducted. Highway effects were expected to differ by corridor because attendees come from different parts of the region. Rams and USC game effects were also expected to differ because the two teams draw from different markets. The highway analysis therefore focuses on specific corridors, with separate models estimated for each corridor and each set of games. Figure 3 shows the four main highway corridors approaching the Coliseum; the directional labels are relative to the location of the Coliseum. The two main highways approaching from the east are combined, as both converge just east of downtown Los Angeles.

Highway corridors for analysis.
Selected Results, Descriptive Analysis
The descriptive analysis reveals the main spatio-temporal patterns that later become evident in the model results. Because of space limitations, only a few examples are presented here. Figure 4, a and b, gives spatio-temporal maps of pre-game traffic speed for Rams games (panel a) and USC games (panel b) on highway corridor I-110 S. The upper three-dimensional graphs show distance from the Coliseum on the x-axis, time to game start on the y-axis, and speed difference (control − game) on the z-axis; the warmer the color, the greater the difference. The lower graphs show the same data in two dimensions, with distance on the x-axis and time on the y-axis. The figure shows that arrival patterns differ between Rams and USC games, and that the largest speed differences do not occur closest to the Coliseum. Arrivals of Rams patrons are temporally concentrated within 2 h of game start. USC patrons begin arriving up to 6 h before game start, and their arrivals are less temporally concentrated. The different arrival patterns likely reflect different pre-game activities: the USC campus hosts extensive tailgating before USC games but is closed for Rams games. However, the location of the greatest speed difference is the same for both groups, about 5 mi from the Coliseum, which is the location of a major freeway-to-freeway interchange. Finally, Figure 4, a and b, illustrates the extreme non-linearity of the relationships of both distance and time with speed difference.

Pre-game traffic speed patterns on highway corridor I-110 S: (a) Rams games and (b) University of Southern California games.
Figure 5, a and b, gives the same information for the arterial system. The difference between Rams and USC arrival patterns observed on the highway corridors is also evident here. However, the spatial and temporal patterns are much smoother. In general, speed difference declines with both distance and time to game start, as expected.

Pre-game traffic speed patterns on arterials: (a) Rams games and (b) University of Southern California games.
Model Data and Variables
The dependent variable is the pre-game traffic speed difference between game days and baseline control days as follows:
where
where
Summary Statistics, Speed Difference, Highway Corridors, and Arterials
Note: SD = standard deviation; Min. = minimum; Max. = maximum; USC = University of Southern California.
Accounting for Autocorrelation
As noted, traffic data have strong spatial and temporal autocorrelation that must be controlled for in the models. Spatial correlation was tested using Moran’s I for each corridor and year; the statistic was positive and significant in 9 of 12 cases. To control for spatial correlation, an inverse distance weighting function was used to generate a weighted speed for each detector:
where
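As a sketch under stated assumptions, the code below builds row-standardized inverse distance weights (power 1, every other detector treated as a neighbor, planar coordinates), uses them to compute a spatial lag (the weighted average of the other detectors' values), and computes the global Moran's I statistic with the same weights. The power, neighbor set, and function names are assumptions rather than the authors' specification, and the significance test for Moran's I is not shown.

```python
import math


def idw_weights(coords, power=1.0):
    """Row-standardized inverse distance weights.

    w_ij = d_ij**-power / sum_k d_ik**-power for j != i, and w_ii = 0.
    Assumes at least two detectors and no two at identical coordinates.
    """
    n = len(coords)
    weights = []
    for i in range(n):
        row = [0.0 if i == j else math.dist(coords[i], coords[j]) ** -power
               for j in range(n)]
        total = sum(row)
        weights.append([w / total for w in row])
    return weights


def spatial_lag(values, weights):
    """Weighted average of the other detectors' values, for each detector."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    m = sum(values) / n
    z = [v - m for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den
```

Positive values of Moran's I indicate that nearby detectors have similar values, which is the pattern the spatial lag is meant to absorb.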
Temporal autocorrelation was tested with the partial autocorrelation function (PACF) for 20 successive 15-min lags. The PACF shows that the first lag is highly significant but subsequent lags are not. The models therefore include a 15-min lag variable.
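The PACF test and the lag variable can be sketched with the Durbin-Levinson recursion, which converts sample autocorrelations into partial autocorrelations. The ±1.96/√n significance band is the usual large-sample approximation; whether the authors used this band is not stated, and the function names are illustrative.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / c0 for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion.
    Index k of the result holds the PACF at lag k (index 0 is 1.0)."""
    r = acf(x, max_lag)
    out, phi_prev = [1.0], []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the AR(k) coefficients from the AR(k-1) ones
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def significant_lags(x, max_lag):
    """Lags whose PACF lies outside the approximate 95% band +/-1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(len(x))
    return [k for k, v in enumerate(pacf(x, max_lag)) if k > 0 and abs(v) > bound]


def add_lag(series, lag=1):
    """Temporal lag variable (lag >= 1); with 15-min bins, lag=1 is the 15-min lag."""
    return [None] * lag + series[:-lag]
```

For a series dominated by first-order dependence, the PACF drops to near zero after lag 1, which is the pattern that justifies including a single 15-min lag.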
Model 1: OLS with Spatial and Temporal Lags
Tables 2 and 3 give regression results for Rams and USC games, respectively, with models for each highway corridor and for arterials. Coefficients for the month and year fixed effects are not shown. The spatial and temporal lag coefficients are highly significant and account for most of the variance explained by the models (stepwise results not shown). Controlling for spatio-temporal correlation, distance to the Coliseum has the expected sign in all but one case but is not always significant. The greater significance of this coefficient in the arterials model is consistent with the smoother spatial pattern observed in Figure 5. Coefficients for time to kickoff are mixed and mostly not significant, even though Rams and USC games were modeled separately. Distance to the nearest freeway-to-freeway interchange is significant and of the expected sign in three of four cases for Rams games, but in only one case for USC games. Coefficients of the remaining variables are not significant for Rams games; for USC games, they are significant only in the arterials model. It may seem surprising that attendance is not significant, since more attendees would intuitively be expected to cause more intense traffic problems. However, these effects may be highly concentrated around the venue; in the larger system, a difference of a few thousand attendees has little effect. In summary, event-related factors (e.g., distance to venue, time to kickoff, and attendance) are largely swamped by the correlation effects. This does not mean that these factors are unimportant; it means that once the correlations in the data are controlled for, the game day specific factors have a modest effect.
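To make the specification concrete, the following is a minimal OLS sketch that returns coefficients and t-statistics, the quantities reported in Tables 2 and 3. It is illustrative only: the regressor columns (for example, temporal lag, spatial lag, distance to the Coliseum, time to kickoff) and the inclusion of an intercept are assumptions, fixed effects would enter as dummy columns, and in practice a statistics library would be used.

```python
import math


def ols(X, y):
    """OLS with an intercept. X is a list of rows (one value per regressor).

    Returns (coefficients, t_statistics), intercept first. Regressors must not
    be perfectly collinear, or the matrix inversion below divides by zero.
    """
    rows = [[1.0] + list(r) for r in X]            # prepend the intercept column
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]

    # Invert X'X by Gauss-Jordan elimination with partial pivoting
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(xtx)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        pivot = a[col][col]
        a[col] = [v / pivot for v in a[col]]
        for r in range(p):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    inv = [row[p:] for row in a]

    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)   # residual variance
    t_stats = [b / math.sqrt(sigma2 * inv[i][i]) if sigma2 > 0 else math.inf
               for i, b in enumerate(beta)]
    return beta, t_stats
```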
Ordinary Least Squares Results: Rams
Note: Fixed-effects coefficients not shown; t-statistics in parentheses. Distance to nearest interchange not applicable in arterial model. na = not applicable.
*p < 0.05, **p < 0.01, ***p < 0.001.
Ordinary Least Squares Results: University of Southern California
Note: Fixed-effects coefficients not shown; t-statistics in parentheses. Distance to nearest interchange not applicable in arterial model. na = not applicable.
*p < 0.05, **p < 0.01, ***p < 0.001.
Model 2: RF Regressions
RF regressions allow much more flexibility in the form of the relationships between variables. The same set of variables is used in the RF estimations. RF does not provide coefficient values, but rather a measure of each variable's contribution to prediction accuracy. It does, however, provide R² and root mean square error (RMSE), so the OLS and RF results can be compared, as in Table 4. The RF model performs better in all cases. RF not only accommodates many different variable transformations, it also allows for interactions among the independent variables, that is, for the effect of one variable to depend on the values of others.
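For reference, the two fit statistics compared in Table 4 are computed as follows. Whether the reported values are in-sample, out-of-bag, or hold-out figures is not specified in this section, so the sketch shows only the formulas.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    m = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```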
Comparison of OLS and RF Results
Note: OLS = ordinary least squares; RF = random forest; RMSE = root mean square error; USC = University of Southern California.
Table 5 shows the six most important covariates identified by the RF algorithm for each highway corridor and arterial road model. The ranking of variables is quite consistent across models. In all cases, the temporal and spatial lags are the top-ranked variables, and in terms of relative contribution they have much stronger effects than any of the others (results not shown). Distance to Coliseum and time to kickoff appear in all models, and number of attendees appears in all but one. Distance to the nearest interchange appears in all highway models. These results indicate that once spatio-temporal lags and complex non-linearities are controlled for, football games have significant and predictable impacts on the surrounding highway and arterial systems.
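This section does not state which RF importance measure underlies Table 5. One common choice, shown here as an assumption, is permutation importance: shuffle one feature column and measure how much the mean squared error increases. `predict` stands for any fitted model's prediction function, and all names are hypothetical.

```python
import random


def permutation_importance(predict, rows, ys, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is randomly shuffled.

    predict: callable taking one row (a tuple) and returning a prediction
    rows:    list of tuples, one value per feature
    """
    rng = random.Random(seed)

    def mse(data):
        return sum((predict(r) - y) ** 2 for r, y in zip(data, ys)) / len(ys)

    base = mse(rows)
    scores = []
    for f in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            col = [r[f] for r in rows]
            rng.shuffle(col)  # break the link between feature f and the response
            shuffled = [r[:f] + (v,) + r[f + 1:] for r, v in zip(rows, col)]
            increases.append(mse(shuffled) - base)
        scores.append(sum(increases) / n_repeats)
    return scores
```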
Top Six Covariates for Highway and Arterial Road Speed Difference Prediction
RF allows the generation of partial dependence plots to illustrate relationships between speed difference and event-related variables. A partial dependence plot shows the marginal effect of one or two independent variables on the predicted response, averaged over the values of all other variables in the model. One example, which links back to Figure 4, a and b, is presented here. Figure 6 gives a two-dimensional plot of the relationships of distance to Coliseum (x-axis) and time to kickoff (y-axis) with speed difference, controlling for the influence of the other variables. Blue tones indicate little effect; orange and yellow indicate more effect.
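A two-variable partial dependence surface of this kind can be computed by fixing the two features of interest at each grid point and averaging the model's predictions over the observed rows. In this sketch, `predict` is any fitted model (for example, a random forest), and the feature indices standing for distance to Coliseum and time to kickoff are hypothetical.

```python
def partial_dependence_2d(predict, rows, f1, grid1, f2, grid2):
    """Average prediction with features f1 and f2 fixed at each grid pair,
    averaged over the observed values of all other features.

    Returns a nested list: surface[i][j] corresponds to (grid1[i], grid2[j]).
    """
    surface = []
    for a in grid1:
        line = []
        for b in grid2:
            total = 0.0
            for r in rows:
                x = list(r)
                x[f1], x[f2] = a, b   # override the two features of interest
                total += predict(x)
            line.append(total / len(rows))
        surface.append(line)
    return surface
```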

Relationship of distance to Coliseum and time to kickoff with speed difference on highway corridors: (a) Rams games and (b) University of Southern California games.
The plots in Figure 6 are quite similar to those of Figure 4. They show that Rams attendee arrivals are more concentrated in time and occur mostly within 2–3 h of game start, whereas USC attendee arrivals are earlier and less concentrated. The plots also confirm that the added demand of weekend football games has a greater impact on nearby freeway-to-freeway bottlenecks than on the highway segments closest to the Coliseum. Finally, the plots suggest that game attendees who drive to the Coliseum do not use the closest highway exit. The close correspondence between these model-based plots and the descriptive maps suggests that the RF model would be an effective predictor of future game impacts.
Discussion
Our results show that weekend football games at the Los Angeles Memorial Coliseum do affect the local traffic system, but game day traffic patterns are complex. The linear models show that most of the difference in travel speed between game days and non-game days is explained by spatial and temporal autocorrelation. Coefficients of game day-related variables are often not significant, likely because no simple variable transformation can capture the extreme non-linearity of the data. The RF algorithm accommodates complex non-linearities and interactions among variables, which allows the estimation of models with better fit and predictive ability. In the RF estimation, event-related covariates (e.g., distance to Coliseum, time to kickoff, and attendance) play a significant role in prediction.
Pre-game traffic patterns on highways and arterial roads are quite different. A non-linear relationship is observed between traffic speed difference and distance to Coliseum and time to kickoff on highway corridors, but distance to Coliseum has a linear relationship for arterials. The non-linear relationship on highways can be explained by more complex traffic conditions over the distance of the corridors included in the study. In contrast, within a few miles of the Coliseum, arterials are more clearly affected by the event.
Traffic patterns on Rams and USC game days also differ because of the arrival behavior of attendees. Rams attendees tend to arrive in a more temporally concentrated pattern and, therefore, generate a greater peak impact, whereas USC attendees arrive earlier and across a broader time period. This difference appears to be due mainly to the availability of pre-game activities.
The RF results show that the greatest impacts on freeways tend to be around existing interchange bottlenecks rather than closest to the Coliseum: the extra demand caused by football games adds to existing bottlenecks. Examples include the I-110/I-105 interchange in the I-110 S corridor and the four-level interchange of I-110/US-101 in the I-110 N corridor. There is some evidence that game attendees may choose to exit at more distant locations in response to strict game day traffic management, such as road closures, parking restrictions, and detours. For arterials, the impact of game-induced traffic is limited to within 2 mi of the Coliseum.
The approach in this study has several strengths. First, the RF algorithm performs better than the conventional spatio-temporal regression approach. Its flexibility accommodates non-linear relationships and complex interactions between independent variables. This makes it a better prediction tool; the estimated model should be able to predict the impacts of future games with reasonable accuracy.
Second, unlike many other machine learning techniques, RF need not be used as a black box. It not only improves prediction accuracy but also provides an importance estimate for each predictor variable. Variable importance measures help pinpoint which variables contribute most to reducing prediction error and thus make variable selection more efficient than with many other algorithms. For transportation planners, improving prediction accuracy is not enough; it is also important to explain which factors affect traffic patterns so that mitigation measures can be developed.
Conclusions
The purpose of this research was to provide guidance for better managing the impacts of planned special events. Both conventional statistical and machine learning tools were used to gain the best possible understanding of how Coliseum football games affect the surrounding transportation system. The results have important implications for local transportation planning. Rams and USC attendees show markedly different travel behavior on game days, so strategies to smooth traffic need to differ. Rams attendees might be encouraged to arrive earlier through pre-game activities or preferential parking, while USC volumes might be spread further through parking policies. Freeway-to-freeway interchanges, which are typically serious bottlenecks in Los Angeles, are trouble spots even when located miles from the venue. This suggests that traffic management strategies should extend beyond the immediate venue area. In all cases, travelers could benefit from information on anticipated traffic at the major bottlenecks as well as in the local area. The next stage of this research is to generate recommendations for better management of Coliseum events. Although this study is limited to Coliseum football games, the results are likely to be generalizable to other venues and other major entertainment events.
Acknowledgements
The authors are grateful for the assistance of graduate students Alanna Coombes and Sean Soni. The following contributed data used in this research: USC, Los Angeles City Department of Transportation, Caltrans District 7, and LA Metro.
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: G. Giuliano and Y. Lu; data collection: Y. Lu; analysis and interpretation of results: G. Giuliano and Y. Lu; draft manuscript preparation: G. Giuliano and Y. Lu. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Los Angeles County Metropolitan Transportation Authority (LA Metro) under the Los Angeles County Service Authority for Freeway Emergencies program, contract PS36665000. Additional support was provided by the Sol Price School of Public Policy, University of Southern California, Los Angeles, CA.
Data Accessibility
Data sources are described in the paper. The main data sources are part of the Archived Data Management System and owned by LA Metro. Access is by request to LA Metro.
