Abstract
Traffic speed prediction is a crucial task of the intelligent traffic system. However, due to the highly nonlinear temporal patterns and non-static spatial dependence of traffic data, timely and accurate traffic forecasting remains a challenge. The existing methods usually use a static adjacency matrix to model spatial dependence while ignoring the spatial dynamic characteristics of the road network.Meanwhile, the dynamic influence of different time steps on the prediction target is ignored. Thus, we propose a dynamic multi-graph convolution recurrent neural network (DMGCRNN), which models the dynamic correlations of road networks over time based on various information of road network. Dynamic correlation is an essential factor for accurate traffic prediction, because it reflects the change of the traffic conditions in real-time. In this model, we design a dynamic graph construction method, which utilizes the local temporal and spatial characteristics of each road segment to construct dynamic graphs. Then, a dynamic multi-graph convolution fusion module is proposed, which considers the dynamic characteristics of spatial correlations and global information to model the dynamic trend of spatial dependence. Moreover, by combing the global context information, temporal attention is provided to capture the dynamic temporal dependence among different time steps. The experimental results from two real-world traffic datasets demonstrate that our method outperforms the state-of-the-art baselines.
Introduction
In recent years, the intelligent transportation system (ITS) has been a research hotspot, among which traffic prediction based on given historical traffic data is one of the essential tasks in ITS. Accurate and reliable traffic prediction can reduce traffic congestion and pollutant emissions, help people plan their travel routes and save travel time, which is essential for the development of smart cities [1]. However, as shown in Fig. 1, accurate traffic predictions become particularly challenging due to the complex spatio-temporal correlations among different sensors in the road network.

Topology of road network and complex spatio-temporal correlations between sensors. Adjacent sensors 1,2 are not always highly correlated, and distant sensors 1,4 can also influence each other. Meanwhile, the traffic condition at time step t
Traffic prediction, which can be considered a typical spatio-temporal problem, has never stopped being studied in recent decades. Recent research has applied deep learning to traffic forecasting. In order to capture spatial correlations, convolutional neural networks (CNN), graph neural networks (GCN) and attention are widely used. At first, some methods divided the road network into grids and used CNN to model spatial correlation, but this destroyed the original topology of the road network [2–4]. Recent research has formulated the road networks as graphs, using GCN or attention to model spatial dependence. Some methods use the distance of the road network as a measure of spatial correlations [5–7]. But these methods neglect many useful spatial relationships hidden in traffic data. Other methods propose adaptive adjacency matrices to capture hidden spatial dependence [8–10]. Recurrent neural networks (RNN), CNN and attention are widely used in temporal correlations modeling [5, 10–12].
However, the existing models based on deep learning still need to be improved for traffic prediction, which mainly facing the following challenges. (i) The dynamic characteristics of spatial dependence of road networks. Many studies still regard the impacts between sensors as static, although the interactions between two sensors may be different at different times. As shown in Fig. 2, on the METR-LA dataset, the speed patterns of the two sensors 15, 17 on April 5,2012 are similar in the morning peak hours (from 6:00 to 10:00). Traffic congestion at the sensor 17 (speed drops first) quickly affected sensor 15 on the same road section for a period of time. But in the evening peak hours (from 13:00 to 20:00), the traffic patterns of the two sensors are different. (ii) Dynamic nonlinear temporal dependence. Because of complex traffic patterns, such as accidents, special events and periodic patterns, different time steps have different effects on target prediction.

Example of dynamic spatial correlations.
In this paper, we propose a Dynamic Multi-Graph Convolution Recurrent Neural Network for traffic speed forecasting. In order to model the dynamic characteristics of spatial dependence, we design a dynamic graph construction method, which uses the adjacent temporal features and spatial features of nodes at different time steps simultaneously to obtain more abundant information to model the dynamic spatial correlation between sections. Meanwhile, the dynamic and global spatial dependencies are captured by a dynamic multi-graph convolution fusion module, which takes into account both global spatial information and dynamic spatial information. In order to model the dynamic temporal dependence among different time steps, we employ the attention mechanism, and assign different weights to different historical time steps based on the global context information of historical moments, so as to simulate the influence of different time steps.
In summary, this paper has the following contributions:
We propose a novel dynamic spatial dependence learning method, which utilizes the temporal characteristics and spatial structure information of nodes to generate dynamic graphs. It helps to capture the dynamic trend of spatial correlation more comprehensively.
We design a dynamic multi-graph convolution fusion module, which uses dynamic graph and global graph to capture global changes and local dynamic characteristics in spatial correlation. By using the global and dynamic information the performance of the traffic prediction model can be further improved.
We evaluate our model on two real-world traffic datasets, and experimental results show that our model outperforms other state-of-the-art methods in predicting traffic speed, which verifies the effectiveness of our model.
In the past decades, traffic forecasting has always been an important research topic.
The early statistical analysis methods include historical average (HA), autoregressive moving average(ARMA) [13], Autoregressive Integrated Moving Average (ARIMA) [14], vector autoregression (VAR) [15]. However, these methods assume that the data are stationary and univariate, so they have limited accuracy in forecasting traffic data.
Because of the complexity of data, machine learning model is widely employed. Including K-nearest neighbor(KNN) [16], support vector regression (SVR) [17], support vector machine (SVM) [18], and Kalman filtering (KF) [19]. However, these methods fail to model the highly nonlinear spatial-temporal correlations. Then they also need more feature engineering.
In recent years, with the development of deep learning, deep neural network has been widely used in image recognition [20, 21], traffic signal control [22, 23], etc., and has made remarkable achievements. Therefore, many researchers apply deep learning to traffic prediction to model the complex spatio-temporal correlation of traffic data.
Among deep learning methods, previously, recurrent neural networks, such as long short term memory (LSTM), gated recurrent unit(GRU), could model time series for traffic prediction [24–26]. Yu et al. [26] combined the LSTM network with the SAE to predict the traffic flow in complex situations, such as congestion. But these methods ignored the spatial dependence of road networks. In order to capture spatial dependence, some studies proposed to divide urban areas into regular grids and use CNN to model spatial correlations [2, 28]. However, these methods fail to effectively reflect the complex topology of the road network.
In recent years, scholars have found that it is more reasonable to regard road networks as a graph. Methods based on Graph Neural Network(GNN) have been widely developed and obtained satisfactory performance. Li et al. [5] proposed Diffusion Convolutional Recurrent Neural Network (DCRNN), which considers traffic flow as a bidirectional diffusion process on a graph to achieve traffic prediction. Yu et al. [7] proposed the Spatio-Temporal Graph Convolutional Networks(ST-GCN), which combines ChebNet and 1D temporal convolution. Lin et al. [29] proposed a REST framework that combines edge inference networks with GCN. Attention mechanism has also been widely adopted in traffic forecasting [12, 31]. Guo et al. [30] by considering and extracting periodic data and using attention to achieve traffic prediction. However, due to the pre-defined adjacency matrix limiting the model performance, the self-adaptive adjacency matrix is proposed to preserve the hidden spatial relationship between nodes [8, 32]. Wu et al. [8] proposed a parameterized approach to learning adjacency matrix and constructing graph structure adaptively. In addition to, Guo et al. [33] proposed a hierarchical structure, which divides the road network into micro and macro traffic graphs. A meta-learning strategy is employed to extract spatial dependencies between traffic nodes [34, 35]. The spatial-temporal graph be proposed to capture spatial correlations and temporal correlations simultaneously [36–38].
Although the above methods build the adjacency matrix to overcome the limitation of the pre-defined matrix, it is essentially a static matrix due to lack of dynamic adjustment. Han et al. [10] proposed a tucker decomposition method to learn the time-specific spatial dependencies of road segments during a day. Zhao et al. [39] proposed a dynamic adjustment module based on channel attention mechanism to obtain the dynamically adjusted spatial-temporal feature data. Sun et al. [40] constructed a dual hypergraph of traffic flow graph by a dual transformation to capture the correlation of the edges in the traffic flow graph. Although the above methods consider the dynamic characteristics of traffic data, they are not refined enough in constructing dynamic graphs. The method proposed by Liang et al. only learned the spatial dependence of different road sections in a day to roughly represent the daily spatial dependence changes. Meanwhile, the above methods ignore the global characteristics of traffic data. We argue that spatial dependency has global attributes and dynamic characteristics that changes with time. Therefore, we construct elaborate dynamic graph at each time step to model the spatio-temporal correlations.
Preliminaries
We denote a road network as a graph G = (V, E, A). Here V = {v1, v2, . . . , v N }is a set of node, where each node v i represent a traffic sensor on traffic road networks for collecting traffic data, and |V| = N to denote the number of nodes in a graph. E = {ei,j|v i ∈ V, v j ∈ V, i, j = 1, 2, . . . , n} is set of edge. ei,j ∈ E indicates that there is a connection between road sensors v i and v j . A ∈ RN×N is the adjacency matrix with weights Ai,j represents the proximity between node v i and v j , which is measured by the distance of the road network.
The traffic observation of the traffic network at time step t i is represented as traffic feature X t i ∈ RN×F, where F is the number of features of each node (e.g., traffic volume, traffic speed, etc.).
Problem Definition: Given a road network G and the observations of historical T time steps Xt1:t
T
= {X
t
1
, . . . , X
t
T
} ∈ RN×T×F at N nodes. Our aim is to learn a function
In this section, we will describe the dynamic multi-graph convolution recurrent neural network (DMGCRNN) we proposed in detail. DMGCRNN aims at extracting and integrate useful static and dynamic spatio-temporal dependent information from a given traffic network, so as to achieve timely and accurate traffic forecast. The DMGCRNN consists of two modules: graph construction and spatio-temporal correlation modeling. The graph construction module includes global graph construction and dynamic graph construction. In the graph construction module, not only the global spatial dependence between nodes is considered, but also the dynamic spatial dependence among nodes is studied. The spatio-temporal correlation modeling includes dynamic multi-graph convolution gated recurrent unit(DMGC-GRU)and temporal attention module. The DMGC-GRU makes fully use of information of spatial dependence and temporal dependence. The attention mechanism is employed to capture global context information. The above two parts combine to capture dynamic spatio-temporal dependence using graphs learned from graph construction. Figure 3 shows the overall framework of our proposed DMGCRNN model.

The architecture of DMGCRNN, where the black arrow represents traffic data and the blue arrow represents graph structure data.
The traffic network in the real-world is complicated and constantly changing. In order to capture the complex and dynamic spatial correlations, we construct global and dynamic graphs to learn spatial dependence with a broader view. The global graph reflects the long-term fixed global static spatial dependence among different nodes in the road network, which the dynamic graph reflects the dynamic correlations between nodes over time. We specify the global graph and dynamic graph as follows.
Global graph construction
In order to capture the global spatial correlations of the road network, we adopt a parametrized method to learn the global graph. The simple method is to assign a learnable N × N parameter tensor to directly represent global graph, where N is the number of nodes, but it will lead to high computational cost and space complexity. Moreover, random initialization brings convergence difficulties and numerical instability. To solve the above problems, firstly, we decompose the pre-defined adjacency matrix A into two smaller matrices by singular value decomposition as
Because of the dynamic characteristic of traffic data and the complex road network, the spatial dependence between segments is constantly changing. Therefore, we consider both spatial and temporal information when modeling dynamic graph. Dynamic graph construction includes three parts: feature sampling, feature aggregation, and dynamic graph generation. As shown in Fig. 4.

Dynamic graph construction. Taking the dynamic graph construction process at t i time step as an example, including three parts: feature sampling, feature aggregation and dynamic graph generation.
The graph at each time step t i is called the instantaneous dynamic graph as the encoder input. In the decoder, the input of the decoder is the final output of the encoder. If the predicted output of each unit in the decoder is used as the traffic data at that time to calculate the dynamic graph, it may bring errors, and errors will accumulate continuously. Therefore, the average aggregation of the learned dynamic graph sequence A d is used as the continuous-time dynamic graph A ct .
This module employs an encoder and decoder architecture to capture the dynamic spatio-temporal dependence, including two parts: DMGC-GRU and temporal attention. The DMGC-GRU is the unit in the encoder and decoder, and a dynamic multi-graph convolution fusion module (DMGC) is designed in DMGC-GRU to capture the global and dynamic spatial correlation of the road network simultaneously. The attention mechanism combines global context information to model the dependencies on different time steps.
Dynamic multi-graph convolution gated recurrent unit

Detailed information of DMGC-GRU. (a) dynamic multi-graph convolution gated recurrent unit (DMGC-GRU). (b) dynamic multi-graph convolution fusion(DMGC).
In order to better express the directivity of the road network, we adopt bidirectional diffusion graph convolution denoted as follows.
After the diffusion graphs convolution results from different aspects are obtained, the gated mechanism is employed to fuse the spatial representations at global and dynamic aspects adaptively:
where σ (·) is the Sigmoid activation function, and ⊙ represents the Hadamard product. U, V, b are learnable parameters. H gout , H dout represents the result of diffusion graph convolution of global and dynamic graph, respectively.
The DMGC of the above is denoted as
We replace the linear transformation in GRU with DMGC to capture rich spatio-temporal dependence. As shown in Fig. 5(a), the dynamic multi-graph convolution gated recurrent unit (DMGC-GRU) is defined as follows:
For time-series data, different time steps have different influences on predicted targets. In order to model this relationships, we use attention to assign different attention weights to the hidden states of the encoder. It is provided to the decoder to predict the future steps.
We compute a variable-length alignment vector a
t
i
(t
j
) by using the hidden state H
t
j
of the encoder and the hidden state H
t
i
of the decoder.
In the multi-step prediction model, we employ the sequence-to-sequence architecture and scheduled sampling strategy. Scheduled sampling is an effective curriculum learning strategy that can mitigate the discrepancy between training and inference phases. During the training period, we input the historical time series into the encoder, and initialize the decoder with the final state of the encoder or ground truth. The decoder obtains previous ground truth in a decay probability for prediction until it drops to 0, where the decay probability is calculated by decay steps cl _ decay _ steps. Therefore, the model can learn from the correct answers, and eventually give its own prediction. In the testing step, forecasting results replace the ground truth, and the probability of using the ground truth is reset as 0. During training, the goal is to minimize the mean absolute error(MAE) between predicted values and ground truths.
In this section, experimental results and detailed analysis on two real-world data sets will be presented.
Datasets
We evaluate the performance of our model on two real-world datasets, METR-LA and PEMS-BAY, which are publicly available and widely used in traffic prediction research. The details of these two datasets are as follows:
Table 1 shows the statistics of these datasets.
Benchmark dataset statistics
Benchmark dataset statistics
There are three categories of methods to predict traffic speed: statistical analysis method, machine learning method, and deep learning method. We selected the most representative or state-of-the-art methods in each of these three categories to compare with our model and evaluate the performance of our model. Specific baseline methods are as follows.
We use the original parameters from their original paper for all baseline methods.
Experiment settings
In the experiments, we apply three widely used metrics in traffic state prediction tasks to evaluate the performance of our model: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). MAE represents the average absolute error between the real and the predicted traffic data. RMSE represents the squared error, which magnifies the prediction error, is more sensitive to outlier data, and can highlight the error value that has great influence. MAPE measures the relative magnitude of deviations by percentage. These are defined as follows.
We have implemented our DMGCRNN model based on the Pytorch framework, and conducted experiments on a computer with one Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz and an NVIDIA Quadro P2000 GPU card. The early stopping strategy was used to avoid over-fitting, that is, we stop training when the predicted results have not improved for 30 consecutive epochs. We used the Adam optimizer to train our model, and the batch size and learning rate are set to 64 and 0.001, respectively. The hidden layer state d is set to 64. The decay steps cl _ decay _ steps is set to 4000. In the experiment, the main hypeparameters were tuned. For all datasets, set a 5-minute time step, and use the z-score method to normalize. The 70% of data is used for training, 10% are used for validation while the remaining 20% for testing. We considered each traffic sensor as a vertex, and calculated the road network distance between sensors as the pre-defined adjacency matrix A using the thresholded Gaussian kernel. In each dataset, we use historical traffic observations of 12-time step (1 hour) to predict the traffic speed of the next 12-time step (1 hour).
Prediction performance comparison
Table 2 shows the predicted performance of our method and all baselines for 15 minutes (3 horizons), 30 minutes (6 horizons), and 60 minutes (12 horizons) on two public datasets, METR-LA and PEMS-BAY. Our method outperforms results in both short-term and long-term predictions, which proves the effectiveness of our model.
As shown in the Table 2, traditional statistical methods and machine learning methods are not performing well. The reason is that statistical methods and machine learning methods cannot extract the temporal and spatial correlation of traffic data. Compared with traditional statistical methods and machine learning methods, the deep learning method has achieved better performance. The performance of LSTM is improved compared with traditional statistical methods and machine learning methods, which show the effectiveness of long short term memory network in modeling time series data. However, compared with deep learning methods (STGCN, DCRNN) which considers the spatial and temporal correlation at the same time, the performance of LSTM is poor, because LSTM ignores the spatial correlation of traffic data. This shows that the spatial topological information of traffic data is important for traffic prediction.
Prediction performance of DMGCRNN and other methods. The best results are marked in bold
Prediction performance of DMGCRNN and other methods. The best results are marked in bold
Moreover, in the methods that consider spatio-temporal dependence based on deep learning, the performance of STGCN and DCRNN is slightly worse. STGCN and DCRNN capture spatial dependencies based on predefined graph structures, which may ignore some crucial implicit dependencies in traffic data. Graph WaveNet, MTGNN, and AutoSTG all achieved good prediction performance. These methods might benefit by generating a novel adjacency matrix, which could effectively utilize the valuable but potential spatial correlation of historical traffic data. GMAN leverages node2vec algorithm to generate node structural information and performing self-attention mechanisms which achieves competitive performance in the 60 minute prediction. But the short-term prediction performance is mediocre, which might owe to the self-attention can not capture local spatial correlations.
However, these works ignore the dynamic of spatial dependence and fail to model the dynamic spatial dependence of traffic data. ADSTGCN, DDSTGCN and our method all capture the dynamic spatial correlations by constructing dynamic graphs, and have achieved excellent experimental results. ADSTGCN used a channel attention mechanism to mine the dynamic spatial correlation, which achieves good long-term predictive capability, but the short-term prediction results are unsatisfactory. DDSTGCN captures the dynamic spatial-temporal feature of the edges by transforming the traffic flow graph into its dual hypergraph, so as to obtain great predictive capability. However, the complexity of generating dual hypergraph is too high, and the global change characteristics of traffic data are ignored. Our DMGCRNN considers the global and dynamic characteristics of spatial correlations of traffic data at the same time, and further puts forward a fine model of the dynamic change of spatial correlation, which achieves the best prediction performance.
In order to compare the prediction effects of different methods more intuitively, we compared the traffic forecasting results of our method DMGCRNN with the spatio-temporal methods (DCRNN, Graph WaveNet, DDSTGCN), and focused on comparing the forecasting differences of different methods in the morning peak hours and the evening peak hours when the traffic data changes greatly, as shown in Fig. 6. From the part marked in black dot-dashed boxes, it is easy to observe that DMGCRNN can capture the dynamic changes of peak traffic more accurately than other methods. This also means that our method is better at handling complex and changeable traffic data due to our more accurate modeling of spatial-temporal dependency.

The Speed prediction results of a day of different methods on METR-LA.
In real-world traffic scenarios, traffic data may be missing or incorrect due to sensor failures or errors. To evaluate the model’s fault tolerance, we set the data fault-ratio at 10% to 90%, using zero-valued noise instead of traffic data to simulate real-world conditions on the METR-LA dataset.
As shown in Fig. 7, with the increase of noise ratio, the performance metrics of various models will decrease to varying degrees. The performance of STGCN becomes worse more quickly. This indicates that the robustness of the STGCN model is poor. Compared with STGCN, LSTM achieves better performance, which further proves its robustness in modeling time series data. In addition, the fault tolerance of DDSTGCN model ranks second, because it takes into account the dynamic characteristics of spatial dependence. Compared with other methods, our method DMGCRNN has the best fault tolerance capability. This indicates that DMGCRNN is effective and robust in the case of noisy data, and it can learn the spatio-temporal correlations from sparse and noisy real-world datasets more efficiently.

Fault-tolerance comparison.
In order to better investigate the contribution of each component in the model, we designed four variants. DMGCRNN-NG refers to removing the global graph on the basis of the original model to evaluate the validity of the global graph. DMGCRNN-ND, in this variant, we don’t use the dynamic graph of learning. DMGCRNN-NAtt, this variant has no temporal attention mechanism. And the DMGCRNN-NF variant does not use gated fusion mechanism, but simply adds the convolution results of global graph and dynamic graph.
The result is shown in Fig. 8, where MAE, RMSE, and MAPE for each variant are evaluated on two publicly available real-world datasets. It is observed that the dynamic graph, global graph, temporal attention and gated fusion all contribute to improve the performance of model. Comparing the results of the four variants, we find that the DMGCRNN-ND has the worst performance, which indicates that it is necessary to model the dynamic spatial correlations of traffic data for traffic prediction. The DMGCRNN is always superior to its variants, indicating the effectiveness of each component and the validity of our model.

Ablation study.
In order to evaluate the influence of hyper parameters on the model DMGCRNN, we evaluated the hyper parameters of DMGCRNN on the dataset METR-LA. The estimated hyperparameters include the dimension of hidden layer state d and the decay steps of scheduled sampling cl _ decay _ steps. The dimension of hidden layer state setting range is 32 to 128, and the decay steps setting range is 3000 to 5000.
We performed five experiments on the METR-LA, and reported the mean MAE on the test dataset as the experimental results. The result of the experimental is shown in Fig. 9. We observe that the performance of the model increases with the increase of hidden layer state dimension. However, when the dimension of the hidden layer state is too high, the model performance will decrease due to overfitting. The optimal value of the state of the hidden layer is stabilized at about 64. We observed that the decay steps of scheduled sampling work best at 4000. When the value of the decay steps is larger than the optimal value, the performance of the model will decrease.

Parameter study.
In this paper, we propose a dynamic multi-graph convolution recurrent neural network (DMGCRNN) to predict traffic speed. In the spatial dimension, we offer a dynamic graph construction method, which uses the aggregated spatio-temporal information and prior knowledge to construct dynamic graph and global graph. The former reflects the dynamic characteristics of spatial dependence with time, while the latter can describe the long-term fixed spatial dependence of the traffic network. Then DMGC is designed to capture spatial correlations by modeling these graphs, and use a gated to fuse information adaptively. In the temporal dimension, we use GRU to capture the temporal dependence. Temporal attention is also utilized to model the dynamic effects between different time steps. Finally, experimental results on two real-world datasets show that the performance of DMGCRNN is better than other models, which proves its effectiveness of capturing dynamic spatio-temporal correlations. In future work, we will consider more external factors, such as weather, POI, accidents, etc., to further improve the accuracy of the prediction. Additionally, we plan to apply our method to other areas of spatial-temporal prediction, such as air quality prediction, to help relevant authorities make efficient decisions.
