Abstract
Traffic flow prediction, an important part of intelligent transportation systems, can improve transportation efficiency. In recent years, prediction methods based on graph convolutional recurrent neural networks have been widely used for traffic flow prediction. However, in real application scenarios the spatial dependence of graph signals changes over time, so a filter built on a fixed graph displacement operator cannot accurately predict the traffic flow at the current moment. To improve the accuracy of traffic flow prediction, a two-layer graph convolutional recurrent neural network based on a dynamic graph displacement operator is proposed. In the proposed framework, the first layer, a static graph convolutional recurrent neural network, generates a sequence of wave vectors for the graph displacement operator. These wave vectors are passed through a deconvolutional neural network to obtain a sequence of dynamic graph displacement operators, and the second layer, a dynamic graph convolutional recurrent neural network, then predicts the traffic flow at the next moment. The model is evaluated on the METR-LA and PEMS-BAY datasets. Experimental results demonstrate that our model outperforms the baseline models.
Introduction
Traffic flow prediction plays an important role in intelligent transportation. Accurate traffic forecasts support urban traffic control and vehicle route planning [1, 2], thereby greatly reducing the risk of traffic congestion and improving the overall efficiency of urban traffic [3, 4, 5]. Many efforts have been made to improve the accuracy of traffic predictions. Autoregressive models [6], k-nearest neighbors [7], and support vector machines [8] were used initially. With the development of deep neural networks, recurrent neural networks (RNNs) [9, 10, 11] and convolutional neural networks (CNNs) [12, 13, 14] have been applied to prediction. RNNs use hidden-state vectors to capture the temporal correlation of traffic data [15, 16, 17], while CNNs learn spatial features of traffic data through convolution kernels. Researchers convert the feature matrix of traffic data into 2D images [18] or 3D matrices [19, 20] and then apply one or more convolution operations to learn spatial features. Other researchers have combined CNNs and RNNs: a one-dimensional convolution is applied to the graph signal at each moment, and the result is fed into an RNN to obtain the prediction [21]. However, traditional CNNs aggregate information from neighboring pixels with fixed-size kernels, which makes them unsuitable for unstructured traffic networks. The graph convolutional network (GCN) [22] builds filters from the graph Laplacian matrix to filter graph signals and captures their spatial dependencies well [23, 24]. Consequently, combining RNNs with GCNs has become the main approach to traffic flow prediction [25]. For example, [26] replaced the fully connected mappings in recurrent neural networks with graph convolutions. This accounts for both the structural and the temporal properties of graph signals and greatly improved the prediction performance of the model.
However, existing RNN-GCN frameworks have limitations that still reduce their effectiveness in traffic prediction. A GCN builds its filters from a fixed Laplacian matrix, but in real applications the relationships between intersections change rapidly over time. For example, if a traffic accident or congestion occurs on a road section, the relationship between the two intersections at either end of that section weakens. As a result, the exact Laplacian matrix of the graph is time-varying and often intractable. To capture changes in the graph displacement operator, Simonovsky [27] uses a filter-generating network to dynamically update the aggregation weights of each vertex, thereby updating the network connections dynamically. Cucurull [28, 29, 30, 31] uses an attention mechanism to compute the attention weight of each intersection at the next moment, changing the input graph signal at each step. Zhou [32, 33, 34, 35] designed an adaptive graph displacement operator to better capture changes in the graph displacement operator. Perozzi [36] uses K-order adjacency matrices to extract node attribute matrices and aggregates information from multi-order neighbors. Diao [37] uses Tucker decomposition to build a tensor decomposition layer that splits feature tensors into short-term and global tensors, and then uses matrix estimators to obtain dynamic Laplacian matrices. Yang [29, 38] uses a self-attention mechanism to compute the attention coefficient of each node at each moment and then obtains the final result through multiple convolution kernels. Wang [39] uses a clustering algorithm to convert traffic flow data into dynamic hypergraphs at different time scales and then performs multiple temporal and spatial convolutions on these hypergraphs.
However, these methods do not update the graph displacement operator at every moment of the prediction process, which limits their suitability for practical applications. To capture the change at each moment, Ji [40] reconstructs the traffic network at each moment through learnable node embeddings. Rui [41, 42] concatenates learnable node embeddings with the graph signals of the nodes at each moment to calculate the weight coefficient of each edge, and then aggregates neighbor information. These methods improve the accuracy of traffic flow prediction but ignore the continuity of traffic network changes over a period of time. Ignoring this continuity biases the result of the graph convolution, which in turn reduces the final prediction accuracy.
To address the above issues, a two-layer dynamic graph convolutional recurrent neural network (TDGC-GRU) is proposed to improve the accuracy of traffic flow prediction. The hidden-state vectors of the first recurrent layer are defined as wave vectors and are injected into the propagation of the second layer, a dynamic graph convolutional neural network, which realizes dynamic sequential graph convolution learning. The contributions of this paper are summarized as follows:
The first layer, a static graph convolutional recurrent neural network, is proposed to capture the continuous variation of the graph displacement operator. To achieve real-time learning with low complexity, a deconvolutional neural network is used to learn the mapping between the wave vector and the wave graph displacement operator. A K-hop dynamic diffusion convolution based on the dynamic directed graph displacement operator is designed to filter the graph signal at each moment. The model is evaluated on the METR-LA and PEMS-BAY datasets, and experimental results show that it achieves higher prediction accuracy than the baseline models.
The remainder of this paper is organized as follows: In Section 2, the background and objectives of the traffic flow prediction problem are introduced. In Section 3, the details of the two-layer dynamic graph convolutional recurrent neural network are presented. In Section 4, the TDGC-GRU model and baseline models are evaluated on the METR-LA and PEMS-BAY datasets. Section 5 concludes the work.

Schematic diagram of traffic flow prediction.
Directed graph of road network structure
Assume that a set of IoT devices is deployed in a given road network segment, each capable of sensing vehicles within its sensing range and transmitting the measurements. Let
if
The traffic speed measured by each sensor

Architecture of TDGC-GRU.
Graph convolutional neural networks on dynamic directed graphs
Traditional graph neural networks use Laplacian matrices defined on undirected graphs; diffusion convolution [26] was later proposed to predict traffic speeds. Diffusion convolution extends graph convolutional networks (GCNs) to directed graphs but is computationally expensive, so [29] proposed a simplified directed graph neural network to learn the directed diffusion structure of road traffic networks. In this paper, that network filters the graph signal with the static graph displacement operator, and its output is the input to the recurrent neural network that generates the sequence of wave vectors of the graph displacement operator. In the spatial domain, this filtering aggregates the information of each node's K-order out-neighbors and in-neighbors. The choice of K is important and affects the prediction performance of the model: too large a K leads to over-smoothing, whereas too small a K prevents the structural information of the nodes from being fully learned. The effect of K is reported in the experiments. In the frequency domain, the same operation can be understood as an adaptive polynomial filter built on the static graph displacement operator. The graph convolution output thus combines the attribute information of the current graph signal with the static spatial structure, injecting sufficient information about the current moment into the sequence of wave vectors generated by the first recurrent layer. Specifically, the K-step diffusion convolution operation
where
When the fluctuating graph displacement operator
where
In order to obtain the sequence wave graph displacement operator

Deconvolution process of METR-LA data.
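The deconvolution used in this subsection is a transposed convolution, and its "insert zeros, then convolve" interpretation can be checked on a small one-dimensional case. The sketch below, assuming NumPy and single-channel signals, compares the scatter-add definition of a stride-s transposed convolution with the zero-insertion view (place s − 1 zeros between input elements, then run an ordinary full convolution), and adds a helper that evaluates the output-size formula given in the PyTorch ConvTranspose documentation. The function names are hypothetical, and the example does not reproduce the paper's actual layer settings.

```python
import numpy as np


def transposed_conv1d_scatter(x, w, stride):
    """Reference definition: each input element adds a scaled copy of the kernel."""
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    k = len(w)
    out = np.zeros((len(x) - 1) * stride + k)
    for i, xi in enumerate(x):
        out[i * stride : i * stride + k] += xi * w
    return out


def transposed_conv1d_zero_insert(x, w, stride):
    """Same operation computed as 'insert zeros between inputs, then ordinary convolution'."""
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    # Place the inputs `stride` positions apart, leaving (stride - 1) zeros between them.
    up = np.zeros((len(x) - 1) * stride + 1)
    up[::stride] = x
    # A 'full' convolution implicitly pads k - 1 zeros on both sides.
    return np.convolve(up, w, mode="full")


def conv_transpose_out_size(n_in, kernel, stride=1, padding=0, output_padding=0, dilation=1):
    """Output length along one spatial dimension (formula from the PyTorch ConvTranspose docs)."""
    return (n_in - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


if __name__ == "__main__":
    x, w = [1.0, 2.0], [1.0, 1.0, 1.0]
    print(transposed_conv1d_scatter(x, w, stride=2))       # [1. 1. 3. 2. 2.]
    print(transposed_conv1d_zero_insert(x, w, stride=2))   # [1. 1. 3. 2. 2.]
    print(conv_transpose_out_size(2, kernel=3, stride=2))  # 5
```

With stride 2, the two inputs are spread out to [1, 0, 2] before the kernel is applied, which is how a short, dense input is expanded into a longer output; the same mechanism, applied in two dimensions with several layers, is what lets a low-dimensional wave vector be expanded into a large wave matrix.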
The input to the first recurrent layer is the sequence of graph signals convolved with the static graph displacement operator
where
General parameter settings of the deconvolutional neural network (parameter format: Layer (): (in channels, out channels, kernel size, stride, padding, output padding, bias)).
A deconvolutional neural network with the above parameter settings is used to obtain a high-dimensional sparse wave matrix from a low-dimensional dense wave vector. The deconvolution enlarges the input feature map by inserting zeros between its pixels and then performing an ordinary convolution. The
After propagation through the first recurrent layer and the deconvolutional neural network, the sequence of dynamic graph displacement operators
where
Calculation of the Output of the TDGC-GRU Layer.
In this section, the computational time complexity of TDGC-GRU is analyzed. The time complexity of the static directed graph convolution of the first-layer GRU is
Discussion
The preceding sections introduced the details of the proposed TDGC-GRU model; this section discusses the motivation behind its design and some of its details. The motivation is to obtain a time-series dynamic graph displacement operator that enables accurate graph convolution of the graph signal at the current moment, thereby improving the final prediction accuracy of the recurrent neural network. To keep the graph convolution valid, the dynamic graph displacement operator is split into two parts: the weighted adjacency matrix of the directed graph of the static road topology, and the current wave graph displacement operator. To obtain the time-series wave graph displacement operator, another recurrent unit is added before the recurrent unit used for prediction. The recurrent unit used for prediction is called the second-level recurrent unit, and the one that produces the wave graph displacement operator is called the first-level recurrent unit. The hidden-state vector of the first-level recurrent unit is called the wave feature vector. The dimension of the wave graph displacement operator is large (
Two different graph convolution operations are applied to the inputs of the two recurrent layers, for the following reasons. The graph convolution at the input of the first recurrent layer uses the weighted adjacency matrix of the directed graph of the static road topology. Its purpose is to fuse the static road topology with the current graph signal, and its output is fed into the recurrent network to generate the wave vector for the next moment. The simplified directed graph neural network proposed in [29] is used to build a suitable polynomial filter on the static graph displacement operator for this fusion. The graph convolution at the input of the second recurrent layer uses the dynamic graph displacement operator, which acts as an adaptive graph displacement operator and therefore already provides an adaptive filtering effect, so the graph neural network does not need to generate another filter. Instead, the receptive field of each node is expanded by taking the k-th powers of this adaptive operator, and the information aggregated from the different receptive fields is concatenated.
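The two input filters discussed above can be contrasted in a short sketch, assuming NumPy, a single time step, and a graph signal X of shape (N, F). The first function is a generic bidirectional random-walk diffusion filter, used here only as a stand-in for the simplified directed filter of [29], whose exact form is not reproduced; the scalar coefficients theta_out and theta_in are hypothetical placeholders for learned weights. The second part forms a dynamic operator as the static adjacency matrix plus a wave operator, scaled by a hypothetical damping factor alpha, and concatenates the signals aggregated over its first K powers.

```python
import numpy as np


def random_walk(A):
    """Row-normalise a weighted adjacency matrix (D^-1 A); zero-degree rows stay zero."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1, keepdims=True)
    return np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)


def static_directed_diffusion(X, A, theta_out, theta_in):
    """First-layer filter on the fixed operator A: sum_k (a_k P_out^k + b_k P_in^k) X.

    theta_out / theta_in hold K + 1 scalar coefficients (hypothetical stand-ins for
    learned parameters); hop k = 0 is the signal itself.
    """
    X = np.asarray(X, dtype=float)
    P_out = random_walk(A)                 # average over out-neighbours (edges i -> j)
    P_in = random_walk(np.asarray(A).T)    # average over in-neighbours (edges j -> i)
    Z = np.zeros_like(X)
    X_out, X_in = X, X
    for a_k, b_k in zip(theta_out, theta_in):
        Z += a_k * X_out + b_k * X_in
        X_out, X_in = P_out @ X_out, P_in @ X_in  # move one more hop in each direction
    return Z


def dynamic_operator(A, wave, alpha=0.1):
    """A_t = A + alpha * wave_t; alpha is a hypothetical factor that damps the wave operator."""
    return np.asarray(A, dtype=float) + alpha * np.asarray(wave, dtype=float)


def dynamic_khop_concat(X, A_t, K):
    """Concatenate [X, A_t X, ..., A_t^K X] along the feature axis -> shape (N, (K + 1) F)."""
    hops = [np.asarray(X, dtype=float)]
    for _ in range(K):
        hops.append(A_t @ hops[-1])  # each power widens the receptive field by one hop
    return np.concatenate(hops, axis=1)


if __name__ == "__main__":
    A = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])  # directed path 0 -> 1 -> 2
    X = np.array([[1.0], [2.0], [3.0]])
    print(static_directed_diffusion(X, A, theta_out=[1, 1], theta_in=[0, 0]).ravel())  # [3. 5. 3.]
    A_t = dynamic_operator(A, wave=np.zeros((3, 3)))
    print(dynamic_khop_concat(X, A_t, K=2))
```

The design difference mirrors the text: the static branch needs learned filter coefficients because its operator never changes, whereas the dynamic branch leaves the weighting to the time-varying operator and only stacks its hop-wise aggregates side by side.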
In addition, studies of adversarial attacks on graph neural networks [43, 44, 45] show that subtle changes in node attributes and graph topology can greatly affect the accuracy of the results. Therefore, the spatial graph convolution and the temporal modeling of the graph signal are kept separate, and
Experiments
Datasets
All methods are evaluated on two datasets of traffic information collected by loop detectors on highways in Los Angeles County and the California Bay Area.
METR-LA: This dataset contains traffic information collected by loop detectors on highways in Los Angeles County; data recorded by 207 sensors at 5-minute intervals are used. It covers 119 days, from March 1, 2012 to June 27, 2012.
PEMS-BAY: This dataset contains traffic information collected by the Performance Measurement System (PeMS) of the California Department of Transportation (Caltrans); data recorded by 325 sensors at 5-minute intervals are used. It covers 183 days, from January 1, 2017 to June 30, 2017.
The TDGC-GRU model is compared with the following baseline models:
Fc-GRU: A recurrent neural network built from fully connected GRU units, with a hidden-state vector of size N.
Conv-GRU: A 1-dimensional convolution is applied to the graph signal before it is input to the GRU unit (hidden-state size N), with kernel size
Dc-GRU [29]: A simplified directed graph diffusion convolution is applied to the graph signal before it is input to the GRU unit (hidden-state size N). Here, K
Diff-GRU [26]: The matrix multiplications in the gate computations of the GRU unit are replaced by simplified directed graph diffusion convolutions. Here, K
TGC-LSTM [33]: Before the graph signal is input to the LSTM unit (hidden-state size N), the traffic graph convolution (TGC) operation defined by its authors is applied to the state vector at the previous moment and the graph signal at the current moment. The weighted adjacency matrix defined in Subsection 2.1 is used, and the free-flow reachability matrix defined in the original paper is omitted. Here, K-hop
The model is implemented in PyTorch 1.10.1 with Python 3.8. Training runs for 150 epochs with an initial learning rate of 1e-04, which is reduced to 1e-05 after the 100th epoch. The optimizer is RMSProp, with the decay rate
Evaluation metrics
The model is evaluated using three common metrics: 1) Root Mean Square Error (RMSE); 2) Mean Absolute Error (MAE); 3) Mean Absolute Percentage Error (MAPE).
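For reference, the three metrics can be written as a short NumPy sketch. The toy example also illustrates a point made in the results discussion: two prediction sets with the same MAE and MAPE can have different RMSE when the error is concentrated in a single extreme value. How the original evaluation treats missing (zero) readings is not stated here; this sketch assumes that MAPE is computed only over non-zero ground-truth values.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error: squaring weights large errors more heavily."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; zero ground-truth entries are skipped (assumption)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mask = y_true != 0
    return float(np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100)


if __name__ == "__main__":
    truth = [60, 60, 60, 60]   # toy speeds in mph
    spread = [61, 59, 61, 59]  # a 1 mph error at every step
    spike = [60, 60, 60, 64]   # the same total error, concentrated in one step
    for name, pred in [("spread", spread), ("spike", spike)]:
        print(name, mae(truth, pred), rmse(truth, pred), round(mape(truth, pred), 3))
    # MAE (1.0) and MAPE (1.667) match; RMSE is 1.0 for "spread" and 2.0 for "spike"
```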
where
Table 1 shows the experimental results of the different models on the two datasets, compared using the three metrics introduced above. On METR-LA, the proposed model outperforms all baseline models on MAE and MAPE and is second only to Diff-GRU on RMSE. MAE and MAPE are less susceptible to extreme values, whereas RMSE squares each error, which amplifies large errors and makes it more sensitive to extreme outliers. This suggests that Diff-GRU tends to overfit sudden fluctuations in the traffic data, whereas the predictions of our model balance the influence of normal and extreme values and perform well on all three metrics. The comparison of actual prediction results later in this section illustrates the difference between Diff-GRU and TDGC-GRU and the advantage of TDGC-GRU. Among the other baselines, Dc-GRU also performs well, illustrating the effectiveness of the simplified directed graph diffusion convolution [29] for graph signal filtering; since TDGC-GRU outperforms Dc-GRU, the prediction model with a dynamic graph displacement operator is better than the one with a static graph displacement operator. The poor performance of TGC-LSTM may be because the model was originally proposed for undirected graphs and because the free-flow reachability matrix defined in the original paper is omitted here. On PEMS-BAY, the proposed model outperforms all baseline models on all three metrics. Diff-GRU also performs very well, with an RMSE close to that of TDGC-GRU, reflecting its ability to fit extreme values. Notably, Fc-GRU performs well on PEMS-BAY, indicating that a traditional recurrent neural network also has some ability to fit traffic data. The other baselines perform similarly to how they do on METR-LA.
Comparison of different models.
Figure 4 shows the convergence of the different models on the two datasets. On METR-LA, the MAE and MAPE curves behave similarly, since both metrics are less susceptible to extreme values. Except for Diff-GRU, all models enter a slow-convergence state after 30 epochs; despite slight fluctuations, their overall trends and final values stabilize as the learning rate decreases. Diff-GRU starts with a large error, converges slowly, and enters a slow-convergence state only after 50 epochs. In terms of the final MAE and MAPE values, TDGC-GRU outperforms the baseline models; in terms of the number of epochs needed to converge, it is faster than Diff-GRU and comparable to the other baselines. RMSE converges differently from MAE and MAPE. Diff-GRU enters a slow-convergence state after 100 epochs and reaches the smallest RMSE. Our model is second only to Diff-GRU, entering a slow-convergence state after 30 epochs with a final RMSE better than those of the other baselines. Because RMSE is more sensitive to extreme outliers, the fact that Diff-GRU keeps reducing its RMSE between epochs 40 and 100 while its MAE and MAPE have already converged indicates that it tends to overfit extreme values. On PEMS-BAY, all models enter a slow-convergence state after the 20th epoch. The fluctuations of all curves between epochs 20 and 100 may be due to the large learning rate; after the 100th epoch, as the learning rate decreases, the curves settle at their final values. Diff-GRU starts with a large error, but its error declines rapidly. Between epochs 20 and 100, its three metrics follow the same downward trend, indicating that it does not overfit on PEMS-BAY and that its predictions change relatively smoothly overall. In terms of the number of epochs needed to converge, our model is faster than Diff-GRU and comparable to the other baselines. In terms of the final values, our model outperforms all baseline models, indicating the effectiveness of the proposed model.

Model convergence curve. (a)–(c) METR-LA. (d)–(f) PEMS-BAY.

The effect of graph convolution parameters on the model. (a) K-hop
Figure 5 shows the effect of the parameter K of the static directed graph convolution
As can be seen in Figure 5b, in both datasets, the model performs best when K-hop

The change and impact of the sequence fluctuation graph displacement operator on METR-LA. The horizontal and vertical axes index a subset of the sensors. (a)–(c) The sequence fluctuation graph displacement operator at three adjacent moments. (d) Part of the static graph displacement operator. (e) Part of the fluctuating graph displacement operator. (f) Part of the dynamic graph displacement operator.
Figure 6 shows how the sequence fluctuation graph displacement operator changes during prediction on the METR-LA dataset and how it affects the static graph displacement operator. Fifty sensors are selected for visualization. Figures 6a–6c show the variation of the sequence fluctuation graph displacement operator: red indicates a strengthened connection between the corresponding two sensors, blue a weakened connection, and a darker color a stronger effect. The distribution of colors shows that the strengthened and weakened regions change significantly over time. This is consistent with the design of the wave graph displacement operator, which is meant to capture the change of the graph displacement operator at each moment so that the graph convolution remains accurate. To ensure that the dynamic graph displacement operator remains usable, the influence of the wave graph displacement operator on the static graph displacement operator is weakened, and
Prediction results visualization
Figure 7 visualizes the actual predictions of our TDGC-GRU model and two baselines, Diff-GRU and TGC-LSTM, on the METR-LA and PEMS-BAY datasets; the dates and sensors shown were selected at random. The real METR-LA data fluctuate strongly: there are large drops in speed at 04:00, 13:00, and 19:00, while at other times the speed stays around 60–70 mph. On METR-LA, the Diff-GRU predictions are clearly more sensitive to sudden fluctuations: they track the large drops at 04:00, 13:00, and 19:00 accurately but lose accuracy on the typical values at other times. The TGC-LSTM predictions capture the fluctuations and the average level to some extent, but the overall fit is poor. TDGC-GRU predicts both the fluctuations and the average level well. It partially captures the sudden large drops at 04:00 and 13:00, though less sensitively, while preserving accuracy on the typical values in other periods, so its overall prediction performance is better than that of the two baselines.
The real PEMS-BAY data fluctuate less, with no sudden large changes. From 05:00 to 06:00, the speed rises slightly from 65 mph to 70 mph; from 15:00 to 18:00, it gradually falls from 60 mph to about 10 mph; and from 18:00 to 20:00, it gradually recovers from 10 mph to about 65 mph. At other times, the speed on the segment fluctuates slightly around 65 mph. On PEMS-BAY, the three models achieve similar accuracy except between 15:00 and 20:00. TGC-LSTM and TDGC-GRU predict the average level in the 20:00–24:00 period better than Diff-GRU. During the gradual decrease in speed from 15:00 to 18:00, TDGC-GRU captures the small fluctuations from 16:00 to 18:00 and the trough of 10 mph at 18:00 more accurately than the two baselines. Overall, the TDGC-GRU predictions are the most accurate.

The prediction results on two randomly selected days. (a) Sensor ID: 773906 in METR-LA on 2012.3.4. (b) Sensor ID: 402120 in PEMS-BAY on 2017.1.4.
To improve the accuracy of network-wide traffic flow prediction, this paper proposes a two-layer dynamic graph convolutional recurrent neural network (TDGC-GRU), which consists of a static graph convolutional recurrent neural network, a dynamic graph convolutional recurrent neural network, and a deconvolutional neural network. Unlike previous graph convolutional recurrent neural networks based on static graph displacement operators, TDGC-GRU performs dynamic sequential graph convolution learning. The model is evaluated on the METR-LA and PEMS-BAY datasets. The experimental results show that it balances the influence of extreme and average values on the predictions and achieves higher prediction accuracy than the baseline models.
Acknowledgments
This research was supported in part by National Natural Science Foundation of China under Grant nos. 61602290 and 61902229, Natural Science Basic Research Plan in Shaanxi Province of China under Grant nos. 2022JM-329 and 2020JM-288, Fundamental Research Funds for the Central Universities under Grant nos. GK202103090 and GK202103084, Natural Science Foundation of Gansu Province under Grant no. 21JR7RA282, Education Department of Gansu Province: Industrial Support Plan Project under Grant no. 2022CYZC-38, and Foundation of Guizhou Provincial Key Laboratory of Public Big Data under Grant no. 2018BDKFJJ004.
