Abstract
This paper discusses a machine-learning traffic signal control method. A full-scale corridor is analyzed and the transferability of using a model pre-trained on a single intersection is examined. Two controller designs are explored, a simple two-phase design and a full ring-and-barrier style controller. The full ring-and-barrier controller adapts many of the key features present in traditional controllers, such as protected-permissive left turns, so that they can be used in the reinforcement learning (RL) paradigm. This study is the first to propose a method that uses deep reinforcement learning (DRL) to implement a full ring-and-barrier style controller. The study also examines the feasibility of using transfer learning to pre-train a model on a single intersection and then fine-tune it for application in a complete environment. Training is done on a simple four lane intersection and the pre-trained model is then transferred for fine-tuning to six controllers operating on a corridor modeled with field data obtained for University Avenue in Waterloo, Ontario, Canada. The performance of the fully trained model is then compared with the existing signal plans in relation to the average delay and average queue length. Application of the ring-and-barrier design to this corridor was found to reduce delays by at least 5% and average queue lengths at intersections by 27%.
In recent years, significant advancements in machine learning have resulted in the development of highly intelligent algorithms. In fact, many tasks that were previously thought to be restricted to human intelligence and cognition can now be performed much more efficiently by computers. One key example of this is pattern recognition, which has led to computers exceeding the performance of humans in complex strategy games ( 1 – 3 ). These advancements have been made possible through the use of deep neural network models, and new applications for these techniques continue to be found each day. Recently, some developments have been made applying these techniques to signalized intersections, but many current implementations are limited in their scope and applicability ( 4 – 6 ).
As a repetitive decision-making process, traffic control lends itself well to reinforcement learning (RL) and deep reinforcement learning (DRL). These techniques work by allowing an agent to explore their world in an unsupervised fashion and learn from experience ( 7 ). Applications of RL and DRL are typically framed as Markov decision processes (MDP), which model the agent-world interaction in relation to states (S) and actions (A). When an agent chooses a particular action, its choice influences the world, and the world transitions according to a stochastic process to a new state (S’). A key assumption of MDPs is that the transition to successor states is independent of past actions and states. While previous actions and states may have brought about the current state, the agent’s past decisions do not affect the optimality of its choices in the current state. To motivate agents during learning, rewards or penalties are often generated from the state the agent is in according to a specified reward function. RL seeks to map out the expected rewards from choosing particular actions in a given state ( 7 ).
One of the most successful techniques used to teach agents an optimal strategy is called Q-learning. Q-learning operates by using an action-value function (the Q-function) that predicts the expected reward of choosing a particular action when faced with a specific state. The value considers the long-term consequences by discounting the expected value of future rewards according to some rate, and the optimal policy is one that chooses actions to maximize this reward. Traditional approaches to learning are divided into two methods: on-policy and off-policy. In off-policy learning, the agent learns an optimal policy regardless of the current policy being used to choose actions, though some policies may not lead to the actual optimal policy (or may take a lot of training to do so). Off-policy methods typically focus on exploring the possible states and actions as thoroughly as possible, using strategies such as ε-greedy training or softmax action selection ( 8 ). On-policy methods update the Q-function with the expected value for continuing to choose actions according to the current policy. On-policy approaches can be useful where the optimal policy is not the goal of the algorithm or when comparisons between a particular policy and the optimal one are desired.
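As a minimal sketch of the ε-greedy exploration strategy mentioned above, the following Python function picks a random action with probability ε and the highest-valued action otherwise. The function name, the list-based Q-values, and the parameter names are illustrative assumptions and are not taken from the authors' implementation.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return an action index: random with probability epsilon, else greedy.

    q_values is a list of estimated Q-values, one per action (hypothetical layout).
    """
    if rng.random() < epsilon:
        # Explore: choose any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With ε set to 0 the function always exploits; larger values of ε trade short-term reward for broader coverage of the state-action space.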
In practice, learning the Q-function can be complicated if the range of possible states and actions is vast, so the Q-function is typically approximated using techniques such as linear approximation or neural networks ( 9 ). When RL is used with deep Q-networks (DQN), a deep neural network is often used in tandem with other techniques commonly applied in deep learning, such as convolution and pooling.
A neural network is made of large groups of neurons that generate output based on their input, weights, activation function, and any bias terms ( 10 ). Input may be from numeric representations of the input (e.g., color mapping of a picture, positions of vehicles) or it may be from other neurons. Neurons are grouped into layers, and neurons within the same layer are typically connected to the same input and output sources and have the same activation function. When training a model, the weights that govern the output of individual neurons are typically adjusted based on the error between the prediction of the model and the actual value. These errors are propagated backwards through the model using the well-known backpropagation algorithm that determines the contribution of each individual weight to the error ( 11 ).
In models with complex inputs and multiple actions, the number of neurons required to process the input can be very high, which can lead to difficulty in training. This problem is further exacerbated in deep neural networks where many layers are typically used. To reduce this effect, convolutional layers and pooling layers are used to filter and identify important features. In a convolutional layer, a smaller collection of weights progressively scans and receives input from a smaller area of the total input. The advantage of this approach is easier identification of the relevant features of a particular situation ( 12 ).
Transfer learning is an important tool in machine learning that can increase the applicability of a model. In transfer learning, a dataset is used to train the model initially. This dataset may be very large but may not cover all application cases. If the model needs to be applied to a domain outside the initial task and dataset, then transfer learning can be used to reduce the amount of data required to achieve the same level of performance ( 13 ). In image classification problems, this is the idea that a model can be pre-trained on a dataset of labeled images, and then fine-tuned for application on other unrelated images. In traffic, this is akin to training the model on a specific intersection configuration and driver behavior regime and applying it on another ( 14 ).
This paper discusses the application of a DRL algorithm to control a network of traffic signals. In particular, the idea of transfer learning is evaluated by first pre-training the model on a simple single intersection and then applying it to multiple intersections in a corridor. This paper also explores the performance of deep learning relative to traditional methods on a corridor of intersections.
Literature Review
Machine Learning
Artificial neural networks contain nodes that emulate biological neurons. Each node has a set of weights w0, w1, … wn that receives input from other nodes (including the input layer). Each neuron also has an activation function that converts this input into the neuron's output, such as the rectified linear unit (ReLU):

$$f(z) = \max(0, z)$$

where z is the aggregate sum of the input received by the neuron. Qualitatively, the function truncates negative inputs (outputting zero) and is linear for positive inputs, and has shown better performance than linear functions. Another common activation is the softmax function, especially on the final layer, as it can be used to generate a probability distribution to sample actions from ( 15 ).
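The two activations discussed above can be sketched in a few lines of Python: a rectifier that truncates negative inputs (commonly called ReLU) and a softmax that turns a list of scores into a probability distribution. This is an illustrative sketch using only the standard library; the function names are assumptions, not the authors' code.

```python
import math


def relu(z):
    """Rectifier: outputs zero for negative input, and z itself otherwise."""
    return max(0.0, z)


def softmax(scores):
    """Convert raw scores into probabilities that sum to one."""
    # Subtracting the maximum keeps exp() from overflowing for large scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Sampling an action from the softmax output gives higher-scoring actions a proportionally higher chance of selection.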
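Weights in the individual layers are updated using backpropagation, which uses stochastic gradient descent to update the weights based on the following Equation ( 11 ), written here in its standard form:

$$w_i \leftarrow w_i - \eta \frac{\partial E}{\partial w_i}$$

where $w_i$ is an individual weight, $\eta$ is the learning rate, and $E$ is the error between the model's prediction and the target value.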
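In DQN, the algorithm uses RL to learn the Q-function. The problems suited to RL must be framed as MDPs ( 17 ). These processes all have four basic elements: a set of states $S_t$ the agent can be in at time t; a set of possible actions $A_t$ they can take at that time; a transition function that defines how the system changes state from time t to time t + 1; and rewards $R(S_t, A_t, S_{t+1})$ that can be determined based on the states and actions that measure the agent’s performance ( 17 , 18 ). If the agent cannot observe the true state $S_t$ and instead is limited by its observations $O_t$, then the situation is called a partially observable Markov decision process (POMDP) ( 19 , 20 ). In RL problems, rewards are usually assigned at every time step (though they may be sparse), and the agent’s goal is to learn an optimal policy $\pi^*$ that maximizes the expected discounted sum of rewards, which in standard notation is

$$\pi^* = \arg\max_{\pi} \; \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R(S_t, A_t, S_{t+1}) \,\middle|\, \pi\right]$$

where $\gamma \in [0, 1)$ is the discount rate. The value function of a state under the optimal policy satisfies

$$V^*(S) = \max_{A} \sum_{S'} P(S' \mid S, A)\left[R(S, A, S') + \gamma V^*(S')\right]$$

where $P(S' \mid S, A)$ is the transition function and S’ is any successor state to S. When discussing Q-learning and DQN, typically the goal is to learn the optimal policy through the action-value function $Q(S, A)$, which estimates the expected discounted reward of taking action A in state S. Q-learning estimates this function with the update (Equation 6)

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\left[R(S_t, A_t, S_{t+1}) + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t)\right]$$

which is an iterative update that adds to the previous value of Q an error calculated from the expectations of the best successor state, scaled by the learning rate $\alpha$.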
In well-defined problems, Q-learning will converge to an optimal policy (if the agent is able to visit each possible combination of states and actions enough times) ( 7 ). However, in many situations, the state-action space is very large and complex, and approximations of the Q-function provide better performance. In DQN, this approximation is achieved using a neural network that estimates the value of Q that would be provided by Equation 6 and is trained using the same error ( 3 ).
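The iterative Q-learning update described above can be illustrated with a small tabular sketch. This is the standard textbook form of the update and not the DQN approximation used in the study; the state labels, the learning rate `alpha`, and the discount rate `gamma` are hypothetical values chosen for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the temporal-difference error."""
    # Value of the best action available in the successor state.
    best_next = max(q[(next_state, b)] for b in actions)
    # Error between the bootstrapped target and the current estimate.
    td_error = reward + gamma * best_next - q[(state, action)]
    # Move the estimate a fraction alpha of the way toward the target.
    q[(state, action)] += alpha * td_error
    return td_error


# Usage: an empty table (all zeros) and a single observed transition.
q = defaultdict(float)
err = q_update(q, state="queue_long", action=1, reward=1.0,
               next_state="queue_short", actions=[0, 1])
```

In DQN the table `q` is replaced by a neural network, and the same error term is used as the training loss.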
Machine Learning and Traffic Signal Control
In recent years, many different approaches to machine learning and traffic signal control have been explored by researchers. One of the earliest works to use ordinary Q-learning was done by Abdulhai et al. ( 21 ). Their agent was provided with queue lengths on the four approaches, elapsed phase time, and a reward based on the total delay. The agent was trained using a
Recently, some studies have also begun to consider deep-learning-based implementations. For example, Casas et al. proposed a deep deterministic policy gradient (DDPG) algorithm that used an actor-critic structure with two separate outputs ( 6 ). In the actor-critic design, the critic provides estimates of the Q-value while the actor chooses actions. Their algorithm was designed to function with loop detectors and used a “speed score” calculated based on their output as both the input for the agent and as the basis for the rewarding system. Chu et al. also proposed a scalable multi-agent RL agent based on Advantage Actor Critic (A2C) ( 5 ). Their algorithm used the cumulative delay of the first vehicle together with the number of approaching vehicles as the state input. This design allowed compatibility with existing loop detectors. Li et al. used a DQN structure to control traffic signals on eight approaches ( 24 ). Their model had a three-layer structure that used sigmoidal activation functions and was evaluated in a simulation environment. They compared their approach with traditional Q-learning and found improvements in the agent’s performance. Tan et al. proposed a cooperative learning framework and used transfer learning to scale up their model to many intersections ( 25 ). Their implementation, however, was limited to two-phase signal controllers. It is important to note that currently no existing products use deep learning techniques to re-time signals; however, some research has also been done on proposing systems to integrate new machine learning techniques with existing signal re-timing processes through engineer-in-the-loop approaches ( 26 ).
While some research has already been done exploring the potential of deep learning, current research is still in the early stages. In particular, few studies have been done exploring the use of deep learning in complex multi-agent environments, and many proposed systems are limited to isolated intersections with two-phase movements. Furthermore, few studies address practical issues surrounding the use of machine learning on intersections, including the issue of the transferability of pre-trained models to other intersection arrangements. This paper’s study is designed to fill in this gap and explores how much additional fine-tuning would be required for a model trained on a simple intersection for application to a multi-intersection corridor. This paper also proposes a novel approach to adapting the ring-and-barrier paradigm commonly used by traditional traffic controllers for use in RL configurations.
Model Design
This paper builds on previous work conducted by the authors exploring the use of deep learning in traffic signal control ( 4 ). A model previously developed by the authors for single intersections is used and evaluated on a multi-intersection corridor.
Model Design
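The basic structure of the model is adapted from work presented by Google DeepMind that showed high success when applied to play computer games ( 3 ). The model design uses experience replay that stores past experiences e in a queue

$$D = \{e_1, e_2, \ldots, e_N\}, \qquad e_t = (S_t, A_t, R_t, S_{t+1})$$

where each experience $e_t$ records the state, the action taken, the reward received, and the successor state at time t, and N is the capacity of the queue.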
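A minimal sketch of an experience replay queue of the kind described above is given below. It assumes each experience is a (state, action, reward, next state, terminal flag) tuple, as is common in DQN implementations; the class name, the tuple layout, and the capacity are illustrative assumptions rather than details taken from the authors' code.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity queue of past experiences with uniform random sampling."""

    def __init__(self, capacity):
        # A deque with maxlen silently discards the oldest entry when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        """Store one transition (hypothetical tuple layout)."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        """Draw a batch of distinct stored experiences uniformly at random."""
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly from the queue breaks the correlation between consecutive simulation steps, which is the main reason replay is used when training the network.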
State Definition
States are defined as described by Muresan et al. ( 4 ). This is adapted from the work done by Mnih et al. for a traffic system ( 3 ). Four frames representing the transition of the state space across the past four time stamps are used to build the basic components of the input. Vectors with queue length information (
Action Definition
There are many different ways to define the actions an agent can take with respect to traffic signal control. The most straightforward method is called phase switching, and describes the situation where agents directly control the phasing decisions similar to how pre-timed signals switch phases after fixed time intervals. In phase switching, the agent has near full control over when to switch the signal to the next phase, but no control over the phase order. In contrast, an agent can be given full control over the phase structure itself or can be restricted to making incremental adjustments to specific timing plan values (such as the split time, cycle time, or offset time) similar to systems like SCOOT ( 27 ). Phase switching provides additional flexibility to respond to fast-changing environments while still allowing engineers to specify which phases should or should not occur.
This paper examines two model structures. In the first, a transferability study is conducted using a two-phase signal based on the authors’ previous work ( 4 ). Action filtering is used to enforce ring-barrier rules on the controller’s decision as well as other rules such as minimum greens and maximum greens. Restrictions on maximum and minimum greens can also be used to enforce other requirements such as pedestrian signal timing. The agent continuously decides at each timestep
Finally, a novel extension is proposed that emulates the basic operation of a ring-barrier controller with some modifications to reduce the number of actions required and fit the paradigm of RL and action selection. In the proposed method, phases are grouped into two groups, “optional” and “mandatory,” and five actions are defined for the controller to choose from that correspond to advancing to the next phase or skipping the next optional phase on a two-ring controller. Figure 1 highlights the actions available to the controller for a typical protected-permissive design with a callable protected left-turn phase. Actions 1, 2, and 3 correspond to advancing to the next phase in sequence for Ring 1, 2, or both, respectively. For Actions 1 and 2, if the phase change would create an incompatibility with the other ring, that ring’s phase is advanced to the next compatible phase within the barrier group as highlighted in Figure 1. In the figure, Phases 1, 3, 5, and 7 are protected left-turn phases and are defined as “callable” and do not need to be served if no vehicles are waiting. Therefore, if the controller is currently serving Phases 2 and 6 (WB, EB) and the controller selects Action 1 (top left scenario of Figure 1), then Ring 1 will advance from EB to NBL and Ring 2 will advance from WB permissive (Phase 2) to the next compatible phase (NB) skipping the optional protected SBL phase to give priority to the NB movement. In contrast, if the controller is currently serving Phases 1 and 5 (EBL and WBL) and Action 1 is selected (bottom left scenario of Figure 1), then the controller will only advance Ring 1 from WBL to EB, as EB is compatible with EBL. Certain combinations of phases will prevent certain actions from being chosen. For example, if the controller is timing Phases 4 and 7 (priority for all NB movements), its only legal actions are the “do nothing” Action 0 or Action 1 to advance Ring 1 from NBL to SB. It cannot choose Action 2 to advance Ring 2 as the barrier after Phase 4 requires Phase 8 to be timing.

Figure 1. Actions and action filtering for a ring-barrier controller with two rings.
Action 4 is a special action that allows the controller to skip all optional phases and move to a mandatory phase. This is a simplification from what would typically occur in a traditional ring-and-barrier controller, as phases are normally “called” by detectors and can be skipped if no call is present but can be forced to serve through virtual calls to the controller (recall). In this sense, Action 4 can be understood as instructing the controller to ignore all calls from vehicles and skip to the next phase that has a recall option enabled. This simplification is suitable for most configurations but may not be appropriate in certain configurations where multiple sequential phases could be skipped.
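The behavior of Action 1 described above (advance Ring 1, and move Ring 2 to a compatible phase if needed) can be sketched as follows. The sketch assumes a conventional dual-ring layout in which Ring 1 holds Phases 1 to 4, Ring 2 holds Phases 5 to 8, the barrier separates Phases {1, 2, 5, 6} from {3, 4, 7, 8}, and the odd-numbered phases are the optional (callable) left turns. This layout and all names are assumptions for illustration; the exact phase assignments are defined by Figure 1 and may differ.

```python
# Assumed conventional dual-ring layout (hypothetical, not read from Figure 1).
RING1 = [1, 2, 3, 4]
RING2 = [5, 6, 7, 8]
BARRIER_GROUP = {1: "A", 2: "A", 5: "A", 6: "A",
                 3: "B", 4: "B", 7: "B", 8: "B"}
OPTIONAL = {1, 3, 5, 7}  # callable protected left-turn phases


def next_in_ring(ring, phase):
    """Return the phase that follows `phase` in its ring, wrapping around."""
    return ring[(ring.index(phase) + 1) % len(ring)]


def compatible(p1, p2):
    """Two phases may time together only if they sit in the same barrier group."""
    return BARRIER_GROUP[p1] == BARRIER_GROUP[p2]


def next_mandatory_compatible(ring, phase, other_phase):
    """Advance past optional phases to the next mandatory phase compatible with other_phase."""
    candidate = next_in_ring(ring, phase)
    while candidate in OPTIONAL or not compatible(candidate, other_phase):
        candidate = next_in_ring(ring, candidate)
    return candidate


def advance_ring1(p1, p2):
    """Sketch of Action 1: advance Ring 1 and repair Ring 2 only if it becomes incompatible."""
    new_p1 = next_in_ring(RING1, p1)
    if compatible(new_p1, p2):
        return new_p1, p2
    return new_p1, next_mandatory_compatible(RING2, p2, new_p1)
```

Under these assumptions, `advance_ring1(2, 6)` moves the controller to Phases 3 and 8 (the optional Phase 7 is skipped), while `advance_ring1(1, 5)` only advances Ring 1 to Phase 2, mirroring the two scenarios described for Action 1.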
Reward Definition
While many performance measures can be considered when defining a system that quantifies “optimal performance,” fundamentally the goal of traffic systems is to maximize the throughput of an intersection. In this case, the simplest reward function is one that provides a fixed reward for each vehicle discharged from the approach. Since rewards are provided at each timestep, the maximum reward is achieved when the controller discharges vehicles immediately (future rewards are discounted, so there is no incentive to delay vehicles). To improve training speed, an additional penalty is also defined that penalizes the agent for causing vehicles to stop and be delayed. The reward function can be formally defined as follows:
where
Reward clipping is a process that normalizes the reward values, and has been shown to stabilize learning and produce better results ( 28 ). In this study, reward values are clipped to
where
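As an illustration of the reward structure described in this section, the sketch below combines a fixed reward per discharged vehicle with a penalty per vehicle forced to stop, and then clips the result. The coefficient values and the clipping range of [-1, 1] are placeholder assumptions chosen only to make the example concrete; they are not the values used in the study.

```python
def step_reward(discharged, newly_stopped, discharge_reward=1.0, stop_penalty=0.5):
    """Reward for one timestep: credit for discharged vehicles, penalty for stops.

    Both coefficients are hypothetical placeholders.
    """
    return discharge_reward * discharged - stop_penalty * newly_stopped


def clip_reward(value, low=-1.0, high=1.0):
    """Clip a reward into [low, high]; the default range is an assumption."""
    return max(low, min(high, value))
```

For example, discharging three vehicles while stopping two gives a raw reward of 2.0 under these placeholder coefficients, which the assumed range clips to 1.0.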
Neural Network Design
A deep convolutional neural network (CNN) modeled after Google DeepMind’s proposed structure is used in the model ( 3 ). The structure of this network is shown in Figure 2. The model has three convolutional layers and takes an input of 84 × 84 × 4, which holds the last four states. The first convolutional layer has 32 filters of 8 × 8 with a stride of 4, the second has 64 filters of 4 × 4 with a stride of 2, and the final has 64 filters with a stride of 1. The output of the last convolutional layer is fed into a fully connected layer with 512 units and is then reduced to give Q values for each possible action. This model uses experience replay to sample 10,000 past experiences that are stored in a queue.

Figure 2. Convolutional neural network (CNN) network structure.
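To make the layer dimensions above concrete, the following sketch computes the spatial size of each convolutional output for an 84 × 84 input. It assumes no padding and assumes a 3 × 3 kernel for the final layer, which is the size used in the DeepMind architecture the network is modeled after but is not stated explicitly in the text.

```python
def conv_output_size(input_size, kernel, stride):
    """Spatial output size of a convolution with no padding."""
    return (input_size - kernel) // stride + 1


# (filters, kernel, stride) per layer; the 3x3 kernel of the last layer is an assumption.
layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

size = 84
shapes = []
for filters, kernel, stride in layers:
    size = conv_output_size(size, kernel, stride)
    shapes.append((size, size, filters))

# Number of values passed to the 512-unit fully connected layer.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
```

Under these assumptions the feature maps shrink from 84 to 20, then 9, then 7 cells per side, so 7 × 7 × 64 = 3,136 values feed the fully connected layer.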
Simulation Network
This study makes use of two simulation scenarios to design and test the model outlined. The intersection is coded in SUMO, which is an open-source microsimulation platform ( 9 ). A single intersection is used to develop and train the model (Figure 3 left), and the model is then applied and fine-tuned on six intersections (Figure 3 right) of a corridor modeled after University Avenue in Waterloo, ON. On the single intersection, a ramping volume scenario is used to expose the controller to varying levels of demand on both the north-south and east-west roadways. The volume changes linearly over 3 h and starts at 2200 vph on the east-west approaches and 0 vph on the north-south approaches and ends at 0 vph on the east-west corridor and 2200 vph on the north-south corridor. The corridor network has two major intersections with similar demands on all approaches, and four minor intersections where University Avenue has higher demand. Volumes on the corridor are derived from field observations from the AM peak period, and signal timing plans are pre-timed with actuated protected left turns at some intersections. The average degree of saturation (x) and peak 15-min degree of saturation for the intersections of this corridor are summarized below in Table 1.

Figure 3. SUMO model of the individual intersection (left) and corridor (right).
Table 1. Average and Maximum Total Intersection Volume for Corridor Scenario
Estimated by using the maximum total 15-min volume from the data.
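The ramping volume scenario used on the single intersection can be expressed as a linear interpolation over the 3 h training period. The sketch below uses the volumes stated in the text (2200 vph and 0 vph); the function and parameter names are illustrative assumptions.

```python
def ramp_volumes(t_seconds, duration_s=3 * 3600, peak_vph=2200.0):
    """Return (east-west, north-south) volumes in vph at time t of the ramp."""
    # Fraction of the ramp completed, limited to the range [0, 1].
    frac = min(max(t_seconds / duration_s, 0.0), 1.0)
    east_west = peak_vph * (1.0 - frac)   # falls from 2200 vph to 0 vph
    north_south = peak_vph * frac         # rises from 0 vph to 2200 vph
    return east_west, north_south
```

At the midpoint of the scenario (after 1.5 h) both roadways carry 1100 vph, so the controller sees every mix of demand between the two extremes during a single training episode.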
Model Training and Transfer Learning Process
Model training was undertaken in two stages for the transfer learning scenarios tested. In the first phase, the model is pre-trained on the isolated intersection layout for 1 million simulation frames. This duration was chosen by examining the performance of the model and determining where performance gains begin to level off.
During training, the simulation’s volumes are modeled as a Poisson process by converting the average volume to a randomized headway. At any time, if the model creates a situation where more than 150 vehicles are queued and waiting, a terminal state is triggered, and the model is re-started. This prevents the model from spending too much time in congested situations caused by its sub-optimal control decisions. Actions must be selected at each timestep. At each timestep, the selected action is implemented, and the consequences of the action are observed in the subsequent timestep.
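Converting an average volume into randomized headways, as described above, corresponds to drawing exponentially distributed gaps between vehicles, since arrivals in a Poisson process are separated by exponential inter-arrival times. The sketch below is one way this could be done with the Python standard library; it is not the authors' implementation, and the function name is an assumption.

```python
import random


def sample_headway(volume_vph, rng=random):
    """Draw one headway in seconds for a Poisson arrival process at volume_vph."""
    if volume_vph <= 0:
        # No demand: the next vehicle never arrives.
        return float("inf")
    rate_per_second = volume_vph / 3600.0
    # Exponential inter-arrival time with mean 3600 / volume seconds.
    return rng.expovariate(rate_per_second)
```

For a volume of 1800 vph the mean headway is 2 s, although individual gaps vary widely, which exposes the controller to both platoons and lulls.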
A
In the second phase, the individual model’s state after 1 million simulation frames is then transferred to the model on the evaluation network, giving each intersection the same starting model. These models are then fine-tuned simultaneously using a shorter training process that capitalizes on the model’s pre-existing learned patterns. During fine-tuning,
Results
Following the completion of the training process, the model is evaluated for a period of 25 simulated peak periods. For the transfer learning scenarios, evaluation is conducted after the model re-converges from the fine-tuning.
Two-Phase Signal: Benefits of Transfer Learning
Transfer learning is evaluated by comparing training speed, in relation to the network’s performance, between fine-tuning the pre-trained model and training an empty model on all six intersections simultaneously. For simplicity, and to reduce training time, this portion of the analysis was conducted using a two-phase signal. As part of the modification, turning volumes were removed and Synchro was used to estimate new signal timing plans for each intersection, with a fixed cycle length of 60 s, to use as a baseline comparison. Offsets used by the field timing plan were maintained. A minimum green of 6 s and a maximum green of 100 s were used to restrict which actions the controller could choose. No coordinated cycle length was imposed on any of the DRL controllers. All signals were coded to use 5 s of amber and 2 s of all red. Terminal states are triggered if any intersection has more than 150 vehicles waiting.
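The minimum and maximum green restrictions above act as a filter on the actions available to the two-phase controller. The sketch below shows one way such a filter could work, using the 6 s and 100 s limits from the text. The action encoding (0 to hold the current phase, 1 to switch) and the function names are assumptions for illustration.

```python
MIN_GREEN_S = 6     # minimum green from the text
MAX_GREEN_S = 100   # maximum green from the text


def legal_actions(elapsed_green_s):
    """Return the allowed actions: 0 = hold current phase, 1 = switch (assumed encoding)."""
    if elapsed_green_s < MIN_GREEN_S:
        return [0]          # too early to switch
    if elapsed_green_s >= MAX_GREEN_S:
        return [1]          # the phase must end
    return [0, 1]


def filtered_greedy(q_values, elapsed_green_s):
    """Pick the highest-valued action among those the filter allows."""
    allowed = legal_actions(elapsed_green_s)
    return max(allowed, key=lambda a: q_values[a])
```

Filtering after the network produces its Q-values keeps the learned model unchanged while guaranteeing that timing constraints, such as those needed for pedestrians, are never violated.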
Average cumulative delay and total counts of stopped vehicles for all vehicles on the network were collected using SUMO’s data collection tool. The data collection tool used in SUMO does not record delay until the vehicle exits the simulation, thus some inaccuracies arise if terminal states are triggered because of excess queues (only occurs before convergence). These results are summarized in Figure 4 below for both the full training and the fine-tuning portion of the transfer learning scenarios. In addition, average cumulative delay values for vehicles travelling on specific key routes are also tabulated and shown in Figure 4.

Figure 4. Average delays (top), queues (middle), and per route delays (bottom) for vehicles in the two-phase controller.
While the fine-tuning process creates congestion and delays on the network, at their worst these delays are half the delays caused by training the model from scratch. This suggests that the model is able to exploit some of the learned patterns from the pre-training process. It should be noted that the model applied during the transfer learning did not create terminal states because of high queue lengths. The use of transfer learning allows the model to reach convergence in about half the time it takes to train a complete model from initial conditions. In relation to computation time, on average approximately 500,000 simulation frames can be generated per 24 h period on the hardware evaluated (Intel i7-4790), and during training each intersection’s model uses 2 GB of RAM. Training speed is bottlenecked by data extraction from SUMO, and speed is reduced during highly congested periods.
In relation to the delay, when compared with the Synchro plan, the DRL model provides higher delays on the corridor, but lower delays on the side streets. However, this optimization led to similar delays overall but average queue lengths at intersections were lower. On average for all vehicles the Synchro plan produced delays of 53 s and at any given time approximately 64 vehicles were waiting at a stoplight. In contrast, the DRL model produces average delays of 54 s and an average of 26 queued vehicles at any given time after convergence in the transfer learning scenario and 53 s and 25 vehicles queued when the full model is re-trained.
Full Ring Barrier Controller: Transfer Learning
The transfer learning approach discussed previously was then used to train and fine-tune a model using the full ring-barrier controller with five actions. This controller was trained on the same volume scenario used to train the two-phase signal, but with 10% less volume and with 5% left-turning and 5% right-turning traffic. A total of 3 million simulation frames were required to achieve convergence. Comparison was done to the currently implemented traffic signal timings and counts were obtained from the Region of Waterloo and are based on field studies conducted during the AM peak period. Signal plans were modified to increase their amber time to 5 s (from 4 s) after the through movements had timed green as amber times of 4 s were found to cause lock-ups and priority issues with turning movements in SUMO because of inadequate clearing time. The existing signal controllers have actuated protected left-turn phases at all intersections except Hazel St, including green extension and a maximum green time for the protected phase. This design was replicated in SUMO using the built-in signal controller.
After training a single intersection model, the same transfer learning process discussed previously was applied to evaluate the model on the corridor. For simplicity, the same ring-and-barrier model design and structure discussed in the previous sections were used for all intersections (even though the intersection at Hazel does not have protected left phasing, the DRL controller was permitted to use it). The results are highlighted in Figures 5 and 6 which show the overall performance for the simulated period and the performance on a per 5-minute basis respectively.

Figure 5. Average delays (top left), queues (top right), and per route delays (bottom) for the full ring barrier design.
Training time increases substantially when compared with the case with a two-phase controller. A total of nearly 5 million frames of simulation were required before convergence was reached (3 million for the single intersection, and 2 million for fine-tuning), and training took 10 days on an Intel i7-4790.
In relation to delay, the fully trained model provided better performance than the field implemented times at all intersections. In particular, the field times currently in use do not provide adequate service to the left turns at Albert and Westmount during the peak hour period, and queue-spillback blocking the main street occurs for part of the simulation. This spillback is better handled by the DRL model. This is clearly visible in Figure 6 where the spike in demand on the minor streets leads to higher delays in the field plan that are mitigated by the DRL model. On average under the field plan, vehicles experience a total delay of 83 s and at any given time an average total of 91 vehicles are waiting behind a stoplight. In contrast, in the DRL model vehicles experience an average delay of 78 s and at any given time an average of 66 vehicles are waiting behind a stoplight. It should be noted that, for some time periods during the peak hour, the queues overflow such that vehicles can no longer enter the simulation. This delay is not accounted for, as only in-network travel-time is considered in the delay calculation. Thus, the delay under the field conditions is higher than what is reported here. Based on these numbers, the DRL model improves on the field timing plan’s performance by at least 5% in relation to average delay.

Figure 6. Average delays for minor (left) and corridor streets (right) by time interval for last 10 runs.
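The percentage improvements reported for the corridor follow directly from the average delay and queue values stated above. The short check below recomputes them from those reported numbers (83 s and 78 s of delay, 91 and 66 queued vehicles for the ring-and-barrier design, and 64 and 26 queued vehicles for the two-phase transfer learning case); the helper function name is illustrative.

```python
def percent_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


# Full ring-and-barrier design versus the field timing plan.
delay_reduction = percent_reduction(83, 78)
queue_reduction = percent_reduction(91, 66)

# Two-phase design (transfer learning) versus the Synchro plan.
two_phase_queue_reduction = percent_reduction(64, 26)
```

The delay reduction works out to about 6.0%, consistent with the stated improvement of at least 5%, and the queue reductions come to about 27.5% and 59.4%, respectively.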
Conclusion
The performance of RL is ultimately tied to the rewarding system used. In the proposed framework, a rewarding system based on queue lengths and discharging vehicles was used and compared against traditional methods and the existing signal timing plan. In general, this resulted in a controller that sought to lower queue lengths as much as possible, and when compared with traditional actuated control queues were lower by 60% in the two-phase design and by 27% in the full ring-barrier design.
Although the DRL controller reduced the average amounts of queued vehicles in the system significantly, only modest reductions in delay were observed. For the two-phase design, performance was similar between the DRL controller and an optimized Synchro-developed timing plan. In the full ring-barrier design, delays were reduced by approximately 5% when compared with currently implemented field timings. However, these reductions did not take into account queue-spillback on some side streets that prevented vehicles from entering the simulation during the peak hour period. The DRL controller was able to react to oversaturated conditions on side streets and prevent queue-spillback.
Limitations and Future Research
The proposed platform discussed here is one of the first examinations of DRL to replicate a fully featured ring-and-barrier style controller. While the proposed controller accommodates many of the important aspects of a traditional controller, some simplifications were made to adapt the design to better suit the paradigm that RL operates in. Future research will explore other approaches to action filtering.
The full ring-and-barrier design increases the action count in many situations from two actions to five. This results in higher training times. Initial training on the single intersection required approximately 3× as much experience in the full ring-and-barrier implementation, and just over 2× the amount of experience in fine-tuning. In relation to real-world time, the full ring-and-barrier design required 10 days of computation time to train to convergence compared with approximately 3 days for the two-phase design. The proposed platform does not use any GPU acceleration, and training was done only using the CPU, with the major bottleneck being the interface between the simulation platform and the controller.
This research focused on a feasibility assessment of using transfer learning to reduce the amount of training required, and demonstrates the potential for DRL to be used as an intelligent traffic controller. While some aspects are explored, many additional questions need further exploration. In particular, this study does not test varying levels of demand or compare the performance of other adaptive systems. Additionally, while some elements of coordination are discussed, this research did not focus on evaluating green progression along the corridor, and, in some cases, the proposed model provided higher delays on the evaluated corridor to maintain better levels of service on the side streets. Furthermore, in the platform evaluated, all signals were controlled by the DRL system, and no co-operation with other coordinated intersections was studied. Future research will further explore the effectiveness of DRL to work in wider coordinated systems as well as focus on improving the platform to better consider the actions of adjacent intersections.
Footnotes
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: M. Muresan, G. Pan, L. Fu; data collection: M. Muresan; analysis and interpretation of results: M. Muresan, G. Pan, L. Fu; draft manuscript preparation: M. Muresan, G. Pan, L. Fu. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded in part by the National Science and Engineering Council (Canada) (PGS-D award, reference #503970), Mitacs Canada (Globalink Research Award [Application #IT09579]), and Ontario Research Fund—Research Excellence (ORF-RE) (Intelligent Systems for Sustainable Urban Mobility [ISSUM] project).
