Two-layer dynamic graph convolutional recurrent neural network for traffic flow prediction

Abstract

Traffic flow prediction is an important part of intelligent transportation systems and can improve transportation efficiency. In recent years, prediction methods based on graph convolutional recurrent neural networks have been widely used for traffic flow prediction. However, in real application scenarios the spatial dependence of graph signals changes over time, and a filter built on a fixed graph displacement operator cannot accurately predict the traffic flow at the current moment. To improve prediction accuracy, a two-layer graph convolutional recurrent neural network based on a dynamic graph displacement operator is proposed. The first layer, a static graph convolutional recurrent neural network, generates the sequence of wave vectors of the graph displacement operator. These wave vectors are passed through a deconvolutional neural network to obtain the sequence of dynamic graph displacement operators, and the second layer, a dynamic graph convolutional recurrent neural network, then predicts the traffic flow at the next moment. The model is evaluated on the METR-LA and PEMS-BAY datasets. Experimental results show that the model outperforms the baseline models on MAE and MAPE on both datasets and on RMSE on PEMS-BAY.

Keywords

Traffic flow prediction, graph convolution, deep neural network

1. Introduction

Traffic flow prediction plays an important role in intelligent transportation. Accurate traffic forecasts support urban traffic control and vehicle route planning [1, 2], which reduces the risk of congestion and improves the overall efficiency of city traffic [3, 4, 5]. Many efforts have been made to improve prediction accuracy. Autoregressive models [6], K-nearest neighbors [7], and support vector machines [8] were used initially. With the development of deep neural networks, recurrent neural networks (RNNs) [9, 10, 11] and convolutional neural networks (CNNs) [12, 13, 14] were adopted. An RNN uses the state vector of its hidden layer to capture the time-series correlation of traffic data [15, 16, 17]. A CNN learns spatial features in traffic data through its convolution kernels. Researchers convert the feature matrix composed of traffic data into 2D images [18] or 3D matrices [19, 20] and then apply one or more convolutions to learn the spatial features. Other researchers have combined CNNs and RNNs: a one-dimensional convolution is applied to the graph signal at each moment, and the result is fed to an RNN to obtain the prediction [21]. However, traditional CNNs aggregate pixel information with fixed-size convolution kernels, which is not suitable for unstructured traffic networks. The graph convolutional neural network (GCN) [22] constructs filters based on the Laplacian matrix to filter graph signals and captures their spatial dependencies well [23, 24]. The combination of RNN and GCN has therefore become the main approach to traffic flow prediction [25]. Some researchers [26] replaced the fully connected maps in recurrent neural networks with graph convolutional maps. This takes into account both the structural and the temporal properties of the graph signals, which greatly improved the prediction efficiency of the model.

However, existing RNN-GCN frameworks have limitations that still make them inefficient for traffic prediction. GCN constructs its filters from a fixed Laplacian matrix, but in real applications the relationship between intersections changes rapidly over time. For example, if a traffic accident or congestion occurs on a road section, the relationship between the two intersections of that section weakens. As a result, the exact Laplacian matrix of the graph is time-varying and often intractable. To capture the change in the graph displacement operator, Simonovsky [27] uses a filter-generating network to dynamically update the aggregation weight of each vertex and thereby the network connections. Cucurull [28, 29, 30, 31] uses the attention mechanism to calculate the attention weight of each intersection at the next moment and changes the input graph signal each time. Zhou [32, 33, 34, 35] designed an adaptive graph displacement operator to better capture the changes in the graph displacement operator. Perozzi [36] uses the K-order adjacency matrix to extract the attribute matrix of each node and aggregates the information of multi-order neighbors. Diao [37] uses Tucker decomposition to construct a tensor decomposition layer that decomposes feature tensors into short-term tensors and global tensors, and then uses matrix estimators to obtain dynamic Laplacian matrices. Yang [29, 38] uses the self-attention mechanism to calculate the attention coefficient of each node at each moment, and then obtains the final result through multiple convolution kernels. Wang [39] uses a clustering algorithm to convert traffic flow data into dynamic hypergraphs at different time scales, and then performs multiple temporal and spatial convolutions on these hypergraphs. However, these methods do not change the graph displacement operator at each moment of the prediction process, so they are not fully suitable for practical applications. To capture the change at each moment, Ji [40] reconstructs the traffic network at each moment through learnable node embeddings. Rui [41, 42] concatenates the learnable node embedding with the graph signal of each node at each moment to calculate the weight coefficient of each edge, and then aggregates the neighbor information. These methods improve the accuracy of traffic flow prediction but ignore the continuity of traffic network changes over a period of time. If this continuity is ignored, the convolution result of the graph signal is biased, which affects the final prediction accuracy.

To address the above issues, a two-layer dynamic graph convolutional recurrent neural network (TDGC-GRU) is proposed to improve the accuracy of traffic flow prediction. The state vectors of the first-layer recurrent neural network are defined as wave vectors. They are fed into the propagation of the second-layer dynamic graph convolutional neural network, which realizes dynamic sequence graph convolution learning. The contributions of this paper are summarized as follows:

– The first layer of static graph convolutional recurrent neural network is proposed to capture the continuous variation of the graph displacement operator.
– In order to achieve real-time learning with low complexity, a deconvolutional neural network is used to capture the mapping law between the wave vector and the wave graph displacement operator.
– The K-hop order dynamic diffusion convolution operation of the graph signal based on the dynamic directed graph displacement operator is designed to filter the graph signal at each moment.
– The model is evaluated using the METR-LA and PEMS-BAY datasets. Experimental results show that our model achieves improved prediction accuracy compared to other baseline models.

The remainder of this paper is organized as follows: In Section 2, the background and objectives of the traffic flow prediction problem are introduced. In Section 3, the details of the two-layer dynamic graph convolutional recurrent neural network are presented. In Section 4, the TDGC-GRU model and baseline models are evaluated on the METR-LA and PEMS-BAY datasets. Section 5 concludes the work.

Figure 1.
Schematic diagram of traffic flow prediction.

2. Problem formulation

2.1. Directed graph of road network structure

It is assumed that a set of IoT devices is located in a given road network segment and that each device can sense and report the vehicles within its sensing range. Let $V = \{v_{1}, \dots, v_{i}, \dots, v_{N}\}$ denote the set of such nodes, and let $E$ denote the set of edges connecting the nodes in $V$. The directed graph of the road network can then be expressed as $G = (V, E, A)$, where $A \in R^{N * N}$ is the weighted adjacency matrix of the directed graph. To reflect the proximity-based influence between connected sensor nodes, the weighted adjacency matrix is built with the radial basis function (RBF) kernel following [26]. Specifically,

\begin{aligned} A_{i, j} & = \exp \left( - \frac{\mathrm{dist} (v_{i}, v_{j})^{2}}{σ^{2}} \right) \end{aligned}

(1)

If $\exp (\cdot) < κ$, then $A_{i, j} = 0$. $A_{i, j}$ represents the edge weight between sensor $v_{i}$ and sensor $v_{j}$, and $\mathrm{dist} (v_{i}, v_{j})$ represents the distance between the two nodes. $σ$ is the standard deviation of all pairwise distances and $κ$ is the threshold. For elements along the main diagonal, $A_{i, i} = 1$ since each node has a distance of 0 from itself. The weighted adjacency matrix $A$ is a typical graph displacement operator. A graph displacement operator is defined as a matrix $S \in R^{N * N}$ (where $N$ is the number of nodes in the graph) that can take non-zero values only at diagonal and edge coordinates, with all other positions being zero. Given a graph signal $x \in R^{N * 1}$, $S \cdot x$ describes a transformation that acts on the first-order subgraph of each node.
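The construction in Eq. (1) can be illustrated with a minimal NumPy sketch. The function name `rbf_adjacency`, the default threshold `kappa=0.1`, and the choice to compute $σ$ over every entry of the distance matrix (diagonal included) are assumptions made for illustration, not settings stated in the paper.

```python
import numpy as np

def rbf_adjacency(dist, kappa=0.1):
    """Weighted adjacency matrix from pairwise distances, following Eq. (1).

    dist  : (N, N) array, dist[i, j] is the distance from sensor i to sensor j
    kappa : sparsity threshold (0.1 is an illustrative value only)
    """
    dist = np.asarray(dist, dtype=float)
    sigma = dist.std()                         # assumed: std over all matrix entries
    adj = np.exp(-(dist ** 2) / (sigma ** 2))  # RBF kernel
    adj[adj < kappa] = 0.0                     # drop weak links below the threshold
    np.fill_diagonal(adj, 1.0)                 # dist(v_i, v_i) = 0, so A_ii = 1
    return adj

# Directed road distances need not be symmetric, so A need not be either.
dist = np.array([[0.0, 1.0, 5.0],
                 [2.0, 0.0, 1.0],
                 [5.0, 1.0, 0.0]])
adj = rbf_adjacency(dist)
```

In this toy example the far pair (distance 5) falls below the threshold and is zeroed, while the asymmetric pair (distances 1 and 2) yields two different edge weights.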

2.2. Traffic eigenvectors and flow prediction

The traffic speed measured by sensor $v_{i}$ at time t is denoted as $x_{t}^{i}$. The time series of the traffic speed collected by this sensor is then $x^{i} = (x_{1}^{i}, \dots, x_{t}^{i}, \dots, x_{T}^{i})$, and the time series vectors of all N sensors are stacked and expressed as $X_{T}^{N} = (x^{1}, x^{2}, \dots, x^{N})^{T}$, where $X_{T}^{N} \in R^{N * T}$. $X_{T}^{N}$ can also be expressed as $X_{T}^{N} = (x_{1}, x_{2}, \dots, x_{T})$, where $x_{t} \in R^{N}$ represents the observations of the N sensors at time t. Given the structure of the road sensor network G and the historical traffic speed matrix $X_{T}^{N}$ observed from all sensors, the goal of traffic speed prediction is to learn a function $f$ such that $f (X_{T}^{N}, G) = x_{T + 1}^{N}$, and to use this function to predict the traffic speed ${\hat{x}}_{T + 1}^{N}$ at all intersections at the next moment.
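As a rough illustration of this formulation, the sketch below slices a long speed matrix into pairs of a history window $X_{T}^{N}$ and the next observation $x_{T + 1}$. The helper name `make_windows` and the stride of one time step between samples are assumptions; the paper does not describe how its training samples are cut.

```python
import numpy as np

def make_windows(series, history=10):
    """Cut an (N, T_total) speed matrix into supervised samples.

    Returns
    inputs  : (S, N, history) array, each slice is one X_T^N
    targets : (S, N) array, each row is the matching x_{T+1}
    """
    _, total = series.shape
    starts = range(total - history)
    inputs = np.stack([series[:, s:s + history] for s in starts])
    targets = np.stack([series[:, s + history] for s in starts])
    return inputs, targets

# Two sensors, twelve time steps, history length 10 -> two samples.
series = np.arange(24, dtype=float).reshape(2, 12)
inputs, targets = make_windows(series, history=10)
```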

Figure 2.

Architecture of TDGC-GRU.

3. Methodology

3.1. Graph convolutional neural networks on dynamic directed graphs

Traditional graph neural networks use Laplacian matrices based on undirected graphs. Diffusion convolution [26] was later proposed to predict traffic speeds. Diffusion convolution extends graph convolutional networks (GCN) to directed graphs, but it is computationally expensive. [29] proposed a simplified directed graph neural network to learn the directed diffusion structure of road traffic networks. In this paper, that network is used to filter the graph signal based on the static graph displacement operator, and the result is the input of the recurrent neural network that generates the wave vectors of the sequence graph displacement operator. From the perspective of the spatial domain, the operation aggregates the K-order neighbor information along the out-degree and in-degree directions of each node. The setting of K is important and affects the prediction performance of the model: too large a K leads to over-smoothing, while too small a K means that the structural information of the nodes is not fully learned. The effect of the value of K is shown in the experimental part. From the perspective of the frequency domain, the operation is a polynomial adaptive filter designed on the static graph displacement operator. The result of the graph convolution integrates the attribute information and the static spatial structure information of the graph signal at the current moment, and injects sufficient information about the current moment into the sequence wave vector of the graph displacement operator generated by the first-layer recurrent neural network. Specifically, the K-step diffusion convolution operation $f_{s} (x | θ, A, K)$ on the graph signal $x_{t}$ is defined as:

\begin{aligned} x_{t} (K) & = f_{s} (x_{t} | θ, A, K) \\ & = \sum_{k = 0}^{K - 1} \left( θ_{k, 1} (D_{o}^{- 1} A)^{k} + θ_{k, 2} (D_{I}^{- 1} A^{T})^{k} \right) x_{t} \end{aligned}

(2)

where $θ \in R^{K * 2}$ are trainable parameters, $A \in R^{N * N}$ represents the static graph displacement operator (the weighted adjacency matrix of the road network), $D_{o} = \mathrm{diag} (A I)$ is the out-degree diagonal matrix of A, and $D_{I} = \mathrm{diag} (A^{T} I)$ is the in-degree diagonal matrix of A.
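A minimal NumPy sketch of Eq. (2) is given here to make the two random-walk terms concrete. It assumes that $I$ in $\mathrm{diag} (A I)$ denotes the all-ones vector, so that the degree matrices hold row and column sums, and that every node has non-zero degree (which holds when $A_{i, i} = 1$). The function name `static_diffusion_conv` is hypothetical.

```python
import numpy as np

def static_diffusion_conv(x, adj, theta):
    """K-step diffusion convolution of Eq. (2).

    x     : (N,) graph signal x_t
    adj   : (N, N) static weighted adjacency matrix A
    theta : (K, 2) coefficients; column 0 weights the out-degree walk,
            column 1 weights the in-degree walk
    """
    ones = np.ones(adj.shape[0])                  # assumed meaning of I in diag(A I)
    p_out = adj / (adj @ ones)[:, None]           # D_o^{-1} A
    p_in = adj.T / (adj.T @ ones)[:, None]        # D_I^{-1} A^T
    out = np.zeros_like(x, dtype=float)
    walk_out = x.astype(float)                    # zeroth power leaves x unchanged
    walk_in = x.astype(float)
    for k in range(theta.shape[0]):
        out += theta[k, 0] * walk_out + theta[k, 1] * walk_in
        walk_out = p_out @ walk_out               # next power of D_o^{-1} A applied to x
        walk_in = p_in @ walk_in                  # next power of D_I^{-1} A^T applied to x
    return out

adj = np.array([[1.0, 0.5],
                [0.0, 1.0]])
x = np.array([1.0, 3.0])
first_hop = static_diffusion_conv(x, adj, np.array([[0.0, 0.0], [1.0, 0.0]]))
```

Accumulating the powers by repeated matrix-vector products avoids forming $(D_{o}^{- 1} A)^{k}$ explicitly, which keeps the cost proportional to the number of non-zero entries when $A$ is stored sparsely.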

Once the fluctuating graph displacement operator $A_{t}$ at time t has been obtained from the deconvolutional neural network, it is combined with the static graph displacement operator A to obtain the dynamic graph displacement operator ${\hat{A}}_{t}$. The directed graph convolutional neural network based on the dynamic graph displacement operator then takes a new form. Specifically, the K-hop order dynamic diffusion convolution operation $f_{D} (x | θ, {\hat{A}}_{t}, k)$ for the graph signal $x_{t}$ is defined as:

\begin{aligned} {\hat{A}}_{t} & = A + α A_{t} \end{aligned}

(3)

\begin{aligned} x_{t}^{k} & = f_{D} (x_{t} | θ, {\hat{A}}_{t}, k) \\ & = \left( θ_{k, 1} {(D_{o}^{- 1} {\hat{A}}_{t})}^{k} + θ_{k, 2} {(D_{I}^{- 1} {\hat{A}}_{t}^{T})}^{k} \right) x_{t} \end{aligned}

(4)

\begin{aligned} X_{t}^{K_{hop}} & = [x_{t}^{1}, x_{t}^{2}, \dots, x_{t}^{K_{hop}}] \end{aligned}

(5)

where $θ \in R^{K_{hop} * 2}$ are trainable parameters, $α \in (0, 1)$ is an adjustable coefficient, $D_{o} = \mathrm{diag} ({\hat{A}}_{t} I)$ is the out-degree diagonal matrix of ${\hat{A}}_{t}$, $D_{I} = \mathrm{diag} ({\hat{A}}_{t}^{T} I)$ is the in-degree diagonal matrix of ${\hat{A}}_{t}$, $X_{t}^{K_{hop}} \in R^{N * K_{hop}}$ contains the graph convolution features of each hop order up to $K_{hop}$, A represents the static graph displacement operator, and $A_{t}$ represents the graph displacement operator that fluctuates at time t. From a spatial perspective, this graph convolution uses an adaptive graph displacement operator to dynamically aggregate the K-hop-order neighbor information of the graph signal. From the frequency domain perspective, the graph convolution corresponds to an adaptive filtering effect. The aggregated information of each order of neighbors is concatenated to obtain the graph convolution result $X_{t}^{K_{hop}}$ based on the dynamic graph displacement operator.
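The following NumPy sketch walks through Eqs. (3)-(5) for one time step. As before, $I$ is assumed to be the all-ones vector, the function name `dynamic_diffusion_conv` is hypothetical, and the code assumes the degrees of ${\hat{A}}_{t}$ stay positive, which is plausible for a small $α$ but is not guaranteed by the equations themselves.

```python
import numpy as np

def dynamic_diffusion_conv(x, adj, adj_wave, theta, alpha=0.01):
    """Per-hop features on the dynamic operator, Eqs. (3)-(5).

    x        : (N,) graph signal x_t
    adj      : (N, N) static operator A
    adj_wave : (N, N) fluctuating operator A_t
    theta    : (K_hop, 2) coefficients, row k-1 holds theta_{k,1}, theta_{k,2}
    returns  : (N, K_hop) matrix whose columns are x_t^1 ... x_t^{K_hop}
    """
    adj_hat = adj + alpha * adj_wave                        # Eq. (3)
    p_out = adj_hat / adj_hat.sum(axis=1, keepdims=True)    # D_o^{-1} A_hat
    p_in = adj_hat.T / adj_hat.sum(axis=0)[:, None]         # D_I^{-1} A_hat^T
    walk_out = x.astype(float)
    walk_in = x.astype(float)
    feats = []
    for k in range(theta.shape[0]):                         # hop orders 1..K_hop
        walk_out = p_out @ walk_out
        walk_in = p_in @ walk_in
        feats.append(theta[k, 0] * walk_out + theta[k, 1] * walk_in)  # Eq. (4)
    return np.stack(feats, axis=1)                          # Eq. (5)

adj = np.array([[1.0, 0.5, 0.0],
                [0.0, 1.0, 0.5],
                [0.5, 0.0, 1.0]])
wave = np.zeros((3, 3))
feats = dynamic_diffusion_conv(np.ones(3), adj, wave, np.full((3, 2), 0.5))
```

Unlike Eq. (2), the hop features are kept as separate columns rather than summed, so the second recurrent layer receives one feature per hop order.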

3.2. Recurrent and deconvolutional propagation of wave vectors

To obtain the sequence of wave graph displacement operators $A_{t}$, the first-layer recurrent neural network is used. The state vector of this first-layer recurrent neural network is defined as the wave vector. The wave vectors are one-dimensional vectors that describe the dynamics of the graph displacement operator. To reduce computational complexity, the wave feature vector is set to $F_{t} \in R^{p} (p ≪ N^{2})$, and a deconvolutional neural network is used to map the wave feature vector to the wave graph displacement operator.

Figure 3.

Deconvolution process of METR-LA data.

The input of the first-layer recurrent neural network consists of the convolution result $x_{T}^{N} (K)$ of the sequence graph signal based on the static graph displacement operator and the initial fluctuation vector $F_{0} \in R^{p} (p ≪ N^{2})$, which is randomly generated from the standard normal distribution. During propagation, the recurrent neural network updates the wave vector according to the convolution result $x_{t} (K)$ of the graph signal at the current moment, thereby generating the wave feature vector for the next moment. The convolution result $x_{t} (K)$ contains the attribute information of the graph signal at the current moment and the topology information of the static road network. Combined with the wave vector information from the previous steps of the sequence, the wave vector at the next moment can be predicted. The specific propagation process of the first-layer recurrent neural network is as follows:

\begin{aligned} z_{t} & = σ (W_{z f} \cdot [F_{t - 1}; x_{t} (K)] + b_{z f}) \end{aligned}

(6)

\begin{aligned} r_{t} & = σ (W_{r f} \cdot [F_{t - 1}; x_{t} (K)] + b_{r f}) \end{aligned}

(7)

\begin{aligned} c_{t} & = \tanh (W_{c f} \cdot [r_{t} ⊙ F_{t - 1}; x_{t} (K)] + b_{c f}) \end{aligned}

(8)

\begin{aligned} F_{t} & = (1 - z_{t}) ⊙ F_{t - 1} + z_{t} ⊙ c_{t} \end{aligned}

(9)

where $W_{z f}, W_{r f}, W_{c f} \in R^{p * (p + N)}$ and $b_{z f}, b_{r f}, b_{c f} \in R^{p}$ are the weight matrices and bias vectors of the update gate $z_{t}$, the reset gate $r_{t}$ and the candidate output gate $c_{t}$, respectively. The operator [;] concatenates two tensors along the same dimension. $σ ()$ is the sigmoid activation function, tanh() is the hyperbolic tangent function, and $⊙$ represents element-wise multiplication. Propagation through the first-layer recurrent neural network yields the sequence of wave feature vectors. These are input to the deconvolutional neural network to obtain the sequence of wave graph displacement operators $A_{t}$. The same deconvolutional neural network is used for the fluctuation vector at every moment, which greatly reduces the computational load and complexity of the model. The specific propagation process of the deconvolutional neural network is as follows:
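One step of the recurrence in Eqs. (6)-(9) can be sketched in NumPy as shown here. The function name `wave_gru_step` and the dictionary layout of the parameters are illustrative assumptions; only the arithmetic follows the equations.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def wave_gru_step(f_prev, x_conv, params):
    """One update of the wave vector, Eqs. (6)-(9).

    f_prev : (p,) previous wave vector F_{t-1}
    x_conv : (N,) static graph convolution result x_t(K)
    params : dict with W_z, W_r, W_c of shape (p, p + N) and b_z, b_r, b_c of shape (p,)
    """
    fx = np.concatenate([f_prev, x_conv])                      # [F_{t-1}; x_t(K)]
    z = sigmoid(params["W_z"] @ fx + params["b_z"])            # update gate, Eq. (6)
    r = sigmoid(params["W_r"] @ fx + params["b_r"])            # reset gate, Eq. (7)
    rfx = np.concatenate([r * f_prev, x_conv])                 # [r_t * F_{t-1}; x_t(K)]
    c = np.tanh(params["W_c"] @ rfx + params["b_c"])           # candidate, Eq. (8)
    return (1.0 - z) * f_prev + z * c                          # new wave vector, Eq. (9)

p, n = 4, 3
zero_params = {name: np.zeros((p, p + n)) for name in ("W_z", "W_r", "W_c")}
zero_params.update({name: np.zeros(p) for name in ("b_z", "b_r", "b_c")})
f_prev = np.array([1.0, -2.0, 0.5, 0.0])
f_next = wave_gru_step(f_prev, np.ones(n), zero_params)
```

With all weights and biases set to zero, both gates evaluate to 0.5 and the candidate to 0, so the step simply halves the previous wave vector, which is a convenient sanity check.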

\begin{aligned} A_{t + 1} & = \mathrm{Conv\_transpose} (F_{t}) \end{aligned}

(10)

$\mathrm{Conv\_transpose} (\cdot)$ represents the propagation of the deconvolutional neural network, and $A_{t + 1}$ is the graph displacement operator that fluctuates at time $t + 1$. In particular, $A_{1} \in O^{N * N}$, that is, $A_{1}$ is the $N * N$ zero matrix. The output of the deconvolutional neural network differs between datasets. Assuming that the $p$-dimensional wave vector $F_{t}$ can be reshaped to $(c_{in}, 2^{k}, 2^{k})$ and that a wave adjacency matrix of shape $(1, N, N)$ is required, a general procedure for setting the parameters of the deconvolutional neural network is given in Algorithm 1.

Algorithm 1

General settings of the parameters of the deconvolutional neural network. Each layer is written as Layer(): (in channels, out channels, kernel size, stride, padding, output padding, bias).

Input: $k$, $N$, $c_{total}$, $c_{in}$
Initialize: $Channels = c_{total}$, $I = 1$
While (true):
    If ($2^{k + 1} > N$):
        Layer($\log_{2} I + 1$): ($Channels / I$, $Channels / Channels$, 4, 2, $⌈ \frac{2^{k + 1} + 2 - N}{2} ⌉$, $N + 2 \, padding - 2^{k + 1} - 2$, False)
        Break
    Layer($\log_{2} I + 1$):
        If ($I == 1$): ($c_{in}$, $Channels$, $4 I$, 2, $2 I - 1$, 0, False)
        Else: ($2 \, Channels / I$, $Channels / I$, $4 I$, 2, $2 I - 1$, 0, False)
    If ($2^{k + 1} == N$):
        Break
    $I * = 2$
    $k + = 1$
Output: Layer(1), $\dots$, Layer($\log_{2} I + 1$)

The deconvolutional neural network and the above parameter setting process are used to obtain a high-dimensional sparse wave matrix from a low-dimensional dense wave vector. A deconvolution layer enlarges the input feature map by inserting zeros between its pixels and then performing a convolution. The $c_{total}$ parameter is the number of output channels of the first layer of the deconvolutional neural network, so that the information in the wave vector is captured through multiple channels. As more deconvolution layers are stacked, the output feature map grows, and the convolution kernel is enlarged accordingly, which expands the receptive field of the kernel and aggregates more information from surrounding nodes. The last deconvolution layer aggregates the information of all channels and reduces the receptive field of the convolution kernel to increase the difference between the points of the output feature map. The final output feature map is constrained to the interval [ $-$ 1, 1] by the Tanh activation function and is added to the static graph displacement operator to update the graph displacement operator. In the loss function, the Frobenius norm of the fluctuation matrix generated at each moment is used to ensure its sparsity. In addition, the checkerboard effect of the deconvolutional neural network is weakened to a certain extent by the updating of the convolution kernel parameters through the stacked deconvolution layers and the propagation of the subsequent networks, and by the fact that the kernel size is divisible by the stride in the parameter settings.
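To check that the settings of Algorithm 1 produce an $N * N$ output, the procedure can be sketched in plain Python together with the standard transposed-convolution size formula (dilation 1). The function names are hypothetical. One deliberate deviation is an assumption of this sketch: the final layer takes its input channel count from the previous layer's output so that the layers chain, whereas the listing writes $Channels / I$ for that entry.

```python
import math

def deconv_layer_settings(k, n, c_total, c_in):
    """Layer tuples (in_ch, out_ch, kernel, stride, padding, output_padding, bias)."""
    layers = []
    channels, i = c_total, 1
    prev_out = c_in
    while True:
        if 2 ** (k + 1) > n:
            # Final layer: shrink the doubled size 2^(k+1) down to exactly n.
            padding = math.ceil((2 ** (k + 1) + 2 - n) / 2)
            output_padding = n + 2 * padding - 2 ** (k + 1) - 2
            # Assumption: in-channels follow the previous layer so the stack chains.
            layers.append((prev_out, 1, 4, 2, padding, output_padding, False))
            break
        in_ch = c_in if i == 1 else 2 * channels // i
        out_ch = channels // i
        layers.append((in_ch, out_ch, 4 * i, 2, 2 * i - 1, 0, False))
        prev_out = out_ch
        if 2 ** (k + 1) == n:
            break  # exact power of two: stop as in the listing
        i *= 2
        k += 1
    return layers

def output_size(size, kernel, stride, padding, output_padding):
    """Spatial output size of a transposed convolution with dilation 1."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Settings reported for METR-LA: wave vector reshaped to (4, 16, 16), N = 207, c_total = 64.
layers = deconv_layer_settings(k=4, n=207, c_total=64, c_in=4)
size = 16
sizes = []
for _, _, kernel, stride, padding, output_padding, _ in layers:
    size = output_size(size, kernel, stride, padding, output_padding)
    sizes.append(size)
```

Each intermediate layer doubles the feature map (16, 32, 64, 128 in this example), and the last layer trims the result to 207.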

3.3. Dynamic graph convolutional recurrent neural networks

After propagation through the first-layer recurrent neural network and the deconvolutional neural network, the sequence of dynamic graph displacement operators ${\hat{A}}_{T}$ is obtained, and the dynamic diffusion convolution operation in Eq. (4) is used to obtain the sequence of dynamic graph convolution results $X_{T}^{K_{hop}}$. The convolution result at each moment is flattened into a one-dimensional vector and input to the second-layer recurrent neural network to capture the temporal correlation of the sequence graph signal $X_{T}^{K_{hop}}$. The specific propagation process is as follows:

\begin{aligned} z_{t} & = σ (W_{z} \cdot [h_{t - 1}; X_{t}^{K_{hop}}] + b_{z}) \end{aligned}

(11)

\begin{aligned} r_{t} & = σ (W_{r} \cdot [h_{t - 1}; X_{t}^{K_{hop}}] + b_{r}) \end{aligned}

(12)

\begin{aligned} c_{t} & = \tanh (W_{c} \cdot [r_{t} ⊙ h_{t - 1}; X_{t}^{K_{hop}}] + b_{c}) \end{aligned}

(13)

\begin{aligned} h_{t} & = (1 - z_{t}) ⊙ h_{t - 1} + z_{t} ⊙ c_{t} \end{aligned}

(14)

where $W_{z}, W_{r}, W_{c} \in R^{N * (N + N * K_{hop})}$ and $b_{z}, b_{r}, b_{c} \in R^{N}$ are the weight matrices and biases of the update gate $z_{t}$, the reset gate $r_{t}$ and the candidate output gate $c_{t}$, respectively. $σ ()$ is the sigmoid activation function, tanh() is the hyperbolic tangent function, and $⊙$ represents element-wise multiplication. At the last time step T, the hidden state vector $h_{T}$ is the output of TDGC-GRU, i.e. the predicted value ${\hat{X}}_{T + 1} = h_{T}$. The loss function is defined as follows:

\begin{aligned} Loss & = L ({\hat{X}}_{T + 1}, X_{T + 1}) + β \sum_{t = 1}^{T} ∥ A_{t} ∥_{F} \end{aligned}

(15)

$L$ is the mean square error (MSE) function, $∥ \cdot ∥_{F}$ is the Frobenius norm of a matrix, which is used to ensure the sparsity of the wave graph displacement operator output by the deconvolutional neural network, and $β \in (0, 1)$ is the adjustable regularization factor.
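Eq. (15) can be written out as a short NumPy sketch. The function name `tdgc_loss` is hypothetical, and the example values are chosen only so that the result is easy to verify by hand.

```python
import numpy as np

def tdgc_loss(pred, target, wave_ops, beta=1e-5):
    """MSE plus beta times the sum of Frobenius norms of the wave operators, Eq. (15).

    pred, target : (N,) predicted and true speeds at time T + 1
    wave_ops     : iterable of (N, N) matrices A_1 ... A_T
    """
    mse = np.mean((np.asarray(pred) - np.asarray(target)) ** 2)
    reg = sum(np.linalg.norm(a, ord="fro") for a in wave_ops)
    return mse + beta * reg

# MSE = ((0)^2 + (2)^2) / 2 = 2, Frobenius norm of [[3, 4], [0, 0]] = 5.
loss = tdgc_loss([1.0, 2.0], [1.0, 4.0], [np.array([[3.0, 4.0], [0.0, 0.0]])], beta=0.1)
```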

Algorithm 2

Calculation of the output of the TDGC-GRU layer.

Input: $X_{T}^{N}$, $X_{T + 1}$, $A$
Parameters: $θ$ in Eq. (2) and Eq. (4); $W$, $b$ in Eqs (6)–(9) and Eqs (11)–(14); parameters in Eq. (10)
Initialize: $h_{0} \in O^{N}$, $F_{0} \in R^{p}$, $A_{1} \in O^{N * N}$
for $t = 1$ to $T$:
    $x_{t} (K) = f_{s} (x_{t} | θ, A, K)$
    $F_{t} = GRU_{1} (x_{t} (K), F_{t - 1})$
    $A_{t + 1} = \mathrm{Conv\_transpose} (F_{t})$
    ${\hat{A}}_{t} = A + α A_{t}$
    $x_{t}^{k} = f_{D} (x_{t} | θ, {\hat{A}}_{t}, k)$, $k = 1, \dots, K_{hop}$
    $X_{t}^{K_{hop}} = [x_{t}^{1}, x_{t}^{2}, \dots, x_{t}^{K_{hop}}]$
    $h_{t} = GRU_{2} (X_{t}^{K_{hop}}, h_{t - 1})$
end for
Output: $h_{T}$

3.4. Complexity analysis

In this section, the computational time complexity of TDGC-GRU is analyzed. The time complexity of the static directed graph convolution that feeds the first-layer GRU is $O (K | E |)$. The time complexity of the first-layer GRU is $O (p^{2} + p N)$. The time complexity of each layer of the deconvolutional neural network is $O (m^{2} * k^{2} * c_{in} * c_{out})$, where m is the size of the output feature map, k is the size of the convolution kernel, $c_{in}$ is the number of input channels, and $c_{out}$ is the number of output channels. The time complexity of the K-hop order dynamic diffusion convolution that feeds the second-layer GRU is $O (N^{2})$. The time complexity of the second-layer GRU is $O (N^{2} + N^{2} * K_{hop})$.

3.5. Discussion

The details of the proposed TDGC-GRU model have been introduced above. In this section, the motivation for the model design and some details of the model are discussed. The motivation for designing TDGC-GRU is to obtain a time-series dynamic graph displacement operator that allows accurate graph convolution of the graph signal at the current moment, thereby improving the final prediction accuracy of the recurrent neural network. To ensure the validity of the graph convolution, the dynamic graph displacement operator is divided into two parts. One is the weighted adjacency matrix of the directed graph of the static road topology, and the other is the current wave graph displacement operator. To obtain the time-series dynamic wave graph displacement operator, another recurrent unit is added before the recurrent unit used for prediction. The recurrent unit for prediction is called the second-layer recurrent unit, and the recurrent unit that produces the wave graph displacement operator is called the first-layer recurrent unit. The hidden state vector of the first-layer recurrent unit is called the wave feature vector. The dimension of the wave graph displacement operator is large ($A_{t}$ is an $N * N$ matrix). If the wave matrix were directly flattened into a one-dimensional vector and used as the wave vector of the first-layer recurrent unit, the computational load would be prohibitive. The initial wave vector is therefore set to $F_{0} \in R^{p} (p ≪ N^{2})$ (randomly generated from the standard normal distribution), and the wave graph displacement operator at the current moment is then output by the deconvolutional neural network. Since the deconvolutional neural network captures the mapping from the wave vector to the wave matrix, the same deconvolutional neural network is used for the wave vector at every moment, which greatly reduces the number of parameters.

The two recurrent layers use different graph convolution operations on their inputs, for the following reasons. The input of the first-layer recurrent network uses the weighted adjacency matrix of the directed graph of the static road topology. The purpose of this graph convolution is to fuse the static road topology information with the current graph signal information. Its result is fed into the recurrent network to generate the wave vector for the next moment. The simplified directed graph neural network proposed by [29] is used to generate a suitable polynomial filter based on the static graph displacement operator, which fuses the graph signal information and the road graph information. The graph convolution at the input of the second-layer recurrent network uses a dynamic graph displacement operator, which is equivalent to an adaptive graph displacement operator and corresponds to an adaptive filtering effect, so the graph neural network does not need to generate another filter. Instead, the receptive field of each node is expanded based on this adaptive graph displacement operator (by taking its k-th power), and the information aggregated from the different receptive fields is concatenated.

Additionally, adversarial attacks on graph neural networks [43, 44, 45] point out that subtle changes in node attributes and graph topology can greatly affect the accuracy of results. So the spatial graph convolution and time-series capture of the graph signal are separated, and $α$ in Eq. (3) is set to be small. In this way, the availability of graph convolution results can be ensured while capturing the dynamic changes of the graph displacement operator.

4. Experiments

4.1. Datasets

All methods are evaluated by using two datasets containing traffic information collected from loop detectors on highways in Los Angeles County and the California Bay Area.

– METR-LA: This dataset contains traffic information collected from loop detectors on highways in Los Angeles County. Data recorded by 207 sensors at 5-minute intervals are selected. The dataset contains 119 days of data collected from March 1, 2012 to June 27, 2012.
– PEMS-BAY: This dataset contains traffic information collected by the California Transportation Agencies (CalTrans) Performance Measurement System (PeMS). Data recorded by 325 sensors at 5-minute intervals are selected. The dataset contains 183 days of data collected from January 1, 2017 to June 30, 2017.

4.2. Baseline model

The TDGC-GRU model is compared with the following baseline models:

– Fc-GRU: A recurrent neural network using fully connected GRU recurrent units, where the hidden state vector is of size N.
– Conv-GRU: Before the graph signal is input to the GRU recurrent unit (the hidden state vector is of size N), a 1-dimensional convolution is performed with kernel size $=$ 5, stride $=$ 2, input dim $=$ 1, output dim $=$ 2, and the ReLU activation function.
– Dc-GRU [29]: Before the graph signal is input to the GRU recurrent unit (the hidden state vector is of size N), a simplified directed graph diffusion convolution is performed, with K $=$ 3.
– Diff-GRU [26]: The matrix multiplication in the gate computation of the GRU recurrent unit is replaced by a simplified directed graph diffusion convolution, with K $=$ 3.
– TGC-LSTM [33]: Before the graph signal is input to the LSTM recurrent unit (the hidden state vector is of size N), the TGC operation (traffic graph convolution) defined by the authors of [33] is performed on the state vector at the previous moment and the graph signal at the current moment. The weighted adjacency matrix defined in Subsection 2.1 is used, and the free-flow reachability matrix defined in the original paper is omitted. The settings are K-hop $=$ 3 and $λ_{1} = λ_{2} = 0.01$.

4.3. Experimental setup

The model is implemented in PyTorch 1.10.1 with Python 3.8. The number of epochs is set to 150. The initial learning rate is 1e-04, and after the 100th epoch it is reduced to 1e-05. The optimizer is RMSProp with decay rate $ρ$ set to 0.99 and $δ$ set to 1e-08 (to avoid a zero denominator during optimization), and the batch size of each parameter update is 10. The diffusion parameters of the static and dynamic graph convolutions are set to 5 and 3, respectively. The length of the historical time series is set to 10, the coefficient $α$ of the fluctuating weighted adjacency matrix is set to 0.01, and the regularization parameter $β$ is set to 1e-05. The dimension of the fluctuation feature vector is set to 1024, and the vector is reshaped to (4, 16, 16) before being input to the deconvolutional neural network. The deconvolutional neural network uses the ReLU activation function and batch normalization (BN), $c_{total}$ is set to 64, and the output layer uses the Tanh activation function, following the parameter setting procedure in Algorithm 1.
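For reference, the reported settings can be collected in a small configuration object, sketched here in plain Python. The class name `TrainConfig`, the field names, and the helper `learning_rate` are hypothetical; only the values come from the text above, and the step schedule assumes epochs are counted from 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    epochs: int = 150
    batch_size: int = 10
    lr_initial: float = 1e-4
    lr_final: float = 1e-5
    lr_switch_epoch: int = 100       # last epoch that still uses lr_initial
    rmsprop_decay: float = 0.99      # rho
    rmsprop_eps: float = 1e-8        # delta
    k_static: int = 5                # K in Eq. (2)
    k_hop_dynamic: int = 3           # K_hop in Eqs. (4)-(5)
    history_length: int = 10         # T
    alpha: float = 0.01              # weight of the wave operator, Eq. (3)
    beta: float = 1e-5               # regularization factor, Eq. (15)
    wave_dim: int = 1024             # p
    wave_shape: tuple = (4, 16, 16)  # (c_in, 2^k, 2^k) with k = 4
    c_total: int = 64

def learning_rate(cfg, epoch):
    """Step schedule: lr_initial up to and including lr_switch_epoch, lr_final after."""
    return cfg.lr_initial if epoch <= cfg.lr_switch_epoch else cfg.lr_final

cfg = TrainConfig()
```

The reshape is consistent with the stated dimension, since 4 * 16 * 16 = 1024.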

4.4. Evaluation index

The model is evaluated by using three general metrics: 1) Root Mean Square Error (RMSE); 2) Mean Absolute Error (MAE); 3) Mean Absolute Percent Error (MAPE).

\begin{aligned} RMSE (x, \hat{x}) & = \sqrt{\frac{1}{M} \sum_{t = 1}^{M} {(x_{t} - {\hat{x}}_{t})}^{2}} \end{aligned}

(16)

\begin{aligned} MAE (x, \hat{x}) & = \frac{1}{M} \sum_{t = 1}^{M} | x_{t} - {\hat{x}}_{t} | \end{aligned}

(17)

\begin{aligned} MAPE (x, \hat{x}) & = \frac{1}{M} \sum_{t = 1}^{M} \left| \frac{x_{t} - {\hat{x}}_{t}}{x_{t}} \right| * 100 \% \end{aligned}

(18)

where $x_{t}$ and ${\hat{x}}_{t}$ represent the true labels and the predicted results, and M is the number of test samples. For all three metrics, a lower value indicates a better predictive model.
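The three metrics in Eqs. (16)-(18) can be computed with a few lines of plain Python, sketched here. The function names are hypothetical, and the MAPE helper assumes that samples with a true value of zero have already been removed, since the ratio is undefined for them; the paper does not say how such samples are handled.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error, Eq. (16)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (17)."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, Eq. (18); assumes no zero labels."""
    return sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true) * 100.0

y_true = [10.0, 20.0]
y_pred = [12.0, 16.0]
scores = (rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```

For these two samples the absolute errors are 2 and 4, so the MAE is 3, the RMSE is the square root of 10, and the MAPE is 20%.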

4.5. Experimental results

Table 1 shows the experimental results of the different models on the two datasets, compared using the three metrics introduced in the previous section. On the METR-LA dataset, our proposed model outperforms all baseline models on MAE and MAPE and is second only to the Diff-GRU model on RMSE. MAE and MAPE are less susceptible to extreme values, whereas RMSE squares the error, which amplifies large prediction errors and makes it more sensitive to outliers. This suggests that the Diff-GRU model tends to overfit to sudden fluctuations in the traffic data, while our proposed model balances the influence of normal and extreme values and performs well on all three metrics. The comparison of actual prediction results in Section 4.9 illustrates the difference between the Diff-GRU and TDGC-GRU models. Among the other baseline models, the Dc-GRU model also performs well, illustrating the effectiveness of the simplified directed graph diffusion convolution [29] for graph signal filtering. Comparing it with the TDGC-GRU model shows that prediction with the dynamic graph displacement operator is better than prediction with the static graph displacement operator. The poor performance of the TGC-LSTM model may be because the model was originally proposed for undirected graphs, and because the free-flow reachability matrix defined in the original paper is omitted here.

On the PEMS-BAY dataset, our proposed model outperforms all baseline models on all three metrics. The Diff-GRU model also performs well: its RMSE is close to that of the TDGC-GRU model, so it retains its ability to fit extreme values. Notably, the Fc-GRU model performs well on the PEMS-BAY dataset, indicating that a traditional recurrent neural network also has some ability to fit traffic data. The performance of the other baseline models is similar to that on the METR-LA dataset.
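The different sensitivity of the three metrics to extreme values can be illustrated with a small numerical sketch. It assumes the standard definitions of MAE, MAPE and RMSE; the speed values and the two "models" below are hypothetical and are not taken from the experiments.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (assumes no zero targets)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: squaring amplifies large errors."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical speeds (mph): nine ordinary readings and one sudden drop.
y_true = [65.0] * 9 + [40.0]
# Hypothetical model A: accurate on ordinary readings, misses the drop.
pred_a = [64.0] * 9 + [60.0]
# Hypothetical model B: tracks the drop, less accurate on ordinary readings.
pred_b = [60.0] * 9 + [42.0]

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, round(mae(y_true, pred), 2),
          round(mape(y_true, pred), 2), round(rmse(y_true, pred), 2))
```

In this toy case model A has the lower MAE and MAPE while model B has the lower RMSE, because the single missed drop dominates the squared error. This mirrors the pattern discussed above without reproducing the actual experiment.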

Table 1
Comparison of different models.

	METR-LA			PEMS-BAY
Model	MAE	MAPE	RMSE	MAE	MAPE	RMSE
Fc-GRU	3.68	6.39%	6.54	2.39	5.14%	3.60
Conv-GRU	3.51	6.28%	6.73	2.93	6.19%	4.29
Dc-GRU	3.46	6.14%	6.50	2.70	5.85%	3.89
Diff-GRU	3.36	5.96%	5.14	2.58	5.17%	3.54
TGC-LSTM	3.66	6.35%	6.69	2.52	5.25%	3.83
TDGC-GRU	3.20	5.91%	6.38	2.34	5.04%	3.51
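To make the size of the gaps in Table 1 concrete, the following sketch computes the relative reduction of each metric achieved by TDGC-GRU with respect to the best baseline. The numbers are copied from Table 1; the helper name `relative_improvement` is introduced here for illustration only.

```python
# Values copied from Table 1: (MAE, MAPE in %, RMSE) for each dataset.
TABLE_1 = {
    "Fc-GRU":   {"METR-LA": (3.68, 6.39, 6.54), "PEMS-BAY": (2.39, 5.14, 3.60)},
    "Conv-GRU": {"METR-LA": (3.51, 6.28, 6.73), "PEMS-BAY": (2.93, 6.19, 4.29)},
    "Dc-GRU":   {"METR-LA": (3.46, 6.14, 6.50), "PEMS-BAY": (2.70, 5.85, 3.89)},
    "Diff-GRU": {"METR-LA": (3.36, 5.96, 5.14), "PEMS-BAY": (2.58, 5.17, 3.54)},
    "TGC-LSTM": {"METR-LA": (3.66, 6.35, 6.69), "PEMS-BAY": (2.52, 5.25, 3.83)},
    "TDGC-GRU": {"METR-LA": (3.20, 5.91, 6.38), "PEMS-BAY": (2.34, 5.04, 3.51)},
}
METRICS = ("MAE", "MAPE", "RMSE")

def relative_improvement(dataset, proposed="TDGC-GRU"):
    """Percent reduction of each metric versus the best baseline.

    Positive values mean the proposed model is better (lower error).
    """
    result = {}
    for i, metric in enumerate(METRICS):
        best_name, best_val = min(
            ((name, row[dataset][i]) for name, row in TABLE_1.items()
             if name != proposed),
            key=lambda item: item[1],
        )
        ours = TABLE_1[proposed][dataset][i]
        result[metric] = (best_name, 100.0 * (best_val - ours) / best_val)
    return result

for dataset in ("METR-LA", "PEMS-BAY"):
    for metric, (baseline, gain) in relative_improvement(dataset).items():
        print(f"{dataset} {metric}: {gain:+.2f}% vs {baseline}")
```

On METR-LA this gives a reduction of roughly 4.8% in MAE relative to Diff-GRU, while the RMSE is roughly 24% higher than that of Diff-GRU. On PEMS-BAY all three reductions are positive.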

4.6. Model convergence curve

Figure 4 shows the convergence of the different models on the two datasets. On the METR-LA dataset, the MAE and MAPE curves converge similarly because both metrics are less susceptible to extreme values. Except for the Diff-GRU model, all models enter a slow-convergence state after 30 epochs. Although there are slight fluctuations, the overall trend and the final convergence value stabilize as the learning rate continues to decrease. The Diff-GRU model starts with a large initial error, converges slowly, and enters a slow-convergence state only after 50 epochs. In terms of the converged MAE and MAPE values, our proposed model TDGC-GRU outperforms the baseline models. In terms of the number of epochs needed to converge, our model outperforms the Diff-GRU model and is in line with the other baseline models. The convergence of RMSE differs from that of MAE and MAPE. The Diff-GRU model enters a slow-convergence state after 100 epochs and reaches the smallest RMSE. Our model is second only to the Diff-GRU model: it enters a slow-convergence state after 30 epochs, and its converged value is better than those of the other baseline models. RMSE is more sensitive to extreme values. The Diff-GRU model keeps reducing its RMSE between epochs 40 and 100 while its MAE and MAPE values have essentially converged, indicating that the model is prone to overfitting to extreme values.

On the PEMS-BAY dataset, all models enter a slow-convergence state after the 20th epoch. The fluctuation of the convergence curves between epochs 20 and 100 may be due to the large learning rate. After the 100th epoch, as the learning rate decreases, the curves settle at their convergence values. The initial error of the Diff-GRU model is large, but its error declines rapidly. Between epochs 20 and 100, the three metrics of the Diff-GRU model follow the same downward trend, indicating that the model does not overfit on the PEMS-BAY dataset and that the predicted data changes relatively smoothly overall. In terms of the number of epochs needed to converge, our model outperforms the Diff-GRU model and is in line with the other baseline models. In terms of the final convergence values, our model outperforms all baseline models, indicating the effectiveness of our proposed model.

Figure 4.

Model convergence curve. (a)–(c) METR-LA. (d)–(f) PEMS-BAY.

Figure 5.

The effect of graph convolution parameters on the model. (a) K-hop $=$ 5. (b) K $=$ 3.

4.7. The effect of graph convolution parameters

Figure 5 shows the effect of the parameter K of the static directed graph convolution $f_{s} (x_{t} | θ, A, K)$ and the parameter K-hop of the dynamic diffusion convolution $f_{D} (x_{t} | θ, {\hat{A}}_{t}, k)$ on the performance of the model. As can be seen from Figure 5a, on both datasets the model achieves the best results when K $=$ 3, and its performance degrades as K increases further. From the perspective of the frequency domain, a larger K makes the polynomial filter more complex and more prone to overfitting, which lowers prediction accuracy. From the perspective of the spatial domain, a larger K enlarges the receptive field of the convolution, which makes the predictions of different nodes more alike and thereby reduces prediction accuracy.

As can be seen in Figure 5b, on both datasets the model performs best when K-hop $=$ 5, and its performance degrades as K-hop increases further. From the perspective of the spatial domain, a larger K-hop enlarges the receptive field of the convolution. After the convolution results of the different receptive fields are concatenated, the dimension of the final convolution result becomes too high, so it is more difficult for the model to capture the effective information in $X_{t}^{K_{hop}}$, which leads to a decrease in the final prediction accuracy. Based on the influence of the parameters on the model, K $=$ 3 and K-hop $=$ 5 are chosen, so that the prediction result is optimal.
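The two spatial-domain effects described above, a growing receptive field and a growing concatenated feature dimension, can be sketched on a toy graph. This is a generic illustration of hop-wise propagation and concatenation, not the exact filters $f_{s}$ and $f_{D}$ of the model. The six-node chain graph and the helper names are assumptions made for the example.

```python
def matvec(op, x):
    """Multiply a dense operator (list of rows) by a graph signal."""
    return [sum(w * v for w, v in zip(row, x)) for row in op]

def k_hop_features(op, x, k_hop):
    """Concatenate x, op x, op^2 x, ..., op^k_hop x for every node.

    Returns one row per node with k_hop + 1 entries, so the feature
    dimension grows linearly with k_hop.
    """
    feats = [x]
    for _ in range(k_hop):
        feats.append(matvec(op, feats[-1]))
    return [list(col) for col in zip(*feats)]

def receptive_field(op, node, hops):
    """Nodes whose signal can reach `node` within `hops` propagation steps."""
    reached = {node}
    for _ in range(hops):
        reached |= {j for i in reached for j, w in enumerate(op[i]) if w != 0}
    return reached

# Hypothetical directed chain 0 -> 1 -> ... -> 5: node i aggregates from i - 1.
n = 6
chain = [[1.0 if j == i - 1 else 0.0 for j in range(n)] for i in range(n)]
signal = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # unit impulse at node 0

features = k_hop_features(chain, signal, k_hop=5)
print(len(features[0]))                       # feature dimension per node
for hops in (1, 3, 5):
    print(hops, sorted(receptive_field(chain, 5, hops)))
```

With five hops, the receptive field of the last node already covers the whole toy graph, and every node carries six concatenated values instead of one.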

Figure 6.

The change and impact of the sequence fluctuation graph displacement operator on METR-LA. The horizontal and vertical coordinates represent a subset of the sensor numbers. (a)–(c) The sequence fluctuation graph displacement operator at three adjacent moments. (d) Part of the static graph displacement operator. (e) Part of the fluctuation graph displacement operator. (f) Part of the dynamic graph displacement operator.

4.8. Fluctuation graph displacement operator visualization

Figure 6 shows how the sequence fluctuation graph displacement operator changes, and how it affects the static graph displacement operator, during prediction on the METR-LA dataset. Fifty sensors are selected for visualization. Figures 6a–6c show the sequence fluctuation graph displacement operator at three adjacent moments. Red indicates that the connection between the corresponding two sensors is strengthened, blue indicates that it is weakened, and the darker the color, the stronger the effect. The color distribution shows that the strengthened and weakened regions change significantly over time. This is in line with our design of the wave (fluctuation) graph displacement operator, which sensitively captures the change of the graph displacement operator at each moment so that the graph convolution is accurate. To keep the dynamic graph displacement operator usable, the influence of the wave graph displacement operator on the static graph displacement operator is weakened by setting $α = 0.01$. Figure 6f shows the effect of the wave graph displacement operator (Figure 6e) on the static graph displacement operator (Figure 6d). When the wave graph displacement operator (values between $-0.01$ and $0.01$) is added to the static graph displacement operator (values between 0 and 1), the resulting dynamic graph displacement operator changes very little. This preserves the usability of the dynamic graph displacement operator while adding fluctuation information. Although the values of the wave graph displacement operator are small, they still have an important impact on the final convolution result because the graph displacement operator matrix is very large.
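The last point can be checked with a minimal sketch. It assumes that the dynamic operator is formed element-wise as the static operator plus $α$ times a fluctuation term bounded in $[-1, 1]$, which matches the value ranges quoted above but is not necessarily the exact construction used in the model. The random matrices and speed values are hypothetical.

```python
import random

ALPHA = 0.01  # scaling factor stated in the text

def dynamic_operator(static_op, fluctuation, alpha=ALPHA):
    """Element-wise static_op + alpha * fluctuation (assumed combination rule)."""
    return [[s + alpha * f for s, f in zip(row_s, row_f)]
            for row_s, row_f in zip(static_op, fluctuation)]

def matvec(op, x):
    """Multiply a dense operator (list of rows) by a graph signal."""
    return [sum(w * v for w, v in zip(row, x)) for row in op]

random.seed(0)
n = 50  # fifty sensors, as in the visualization
static_op = [[random.random() for _ in range(n)] for _ in range(n)]       # in [0, 1)
fluctuation = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
dynamic_op = dynamic_operator(static_op, fluctuation)

# Largest change of any single operator entry: bounded by ALPHA.
max_entry_change = max(abs(d - s)
                       for row_d, row_s in zip(dynamic_op, static_op)
                       for d, s in zip(row_d, row_s))

# Change of the aggregated signal: many small entry changes add up per node.
x = [random.uniform(20.0, 70.0) for _ in range(n)]  # hypothetical speeds (mph)
shift = [abs(d - s) for d, s in zip(matvec(dynamic_op, x), matvec(static_op, x))]

print(max_entry_change, max(shift))
```

Each entry moves by at most 0.01, yet the aggregated value at a node sums the contributions of all its neighbors, so the shift in the convolution result is typically much larger than the change of any single entry.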

4.9. Prediction results visualization

Figure 7 visualizes the real data and the prediction results of our model TDGC-GRU and two baseline models, Diff-GRU and TGC-LSTM, on the METR-LA and PEMS-BAY datasets. The dates and intersections shown are selected randomly. The real data of the METR-LA dataset fluctuates greatly: there are large drops at 04:00, 13:00 and 19:00, while at other times the speed stays at around 60–70 mph. In the prediction results on the METR-LA dataset, the Diff-GRU model is clearly more sensitive to sudden fluctuations in the data. It is accurate for the large drops at 04:00, 13:00 and 19:00, but loses accuracy for the typical values at most other times. The TGC-LSTM model captures the fluctuations and the typical values to some extent, but its overall effect is not good. The TDGC-GRU model predicts both the fluctuations and the typical values well. It responds to the sudden large fluctuations at 04:00 and 13:00, though not with particular sensitivity, which preserves its prediction accuracy for the typical values in the other time periods. Its overall prediction performance is better than that of the two baseline models.

In the real data of the PEMS-BAY dataset, the fluctuation is not large and there is no sudden large fluctuation. From 05:00 to 06:00, the speed increases slightly from 65 mph to 70 mph. From 15:00 to 18:00, the speed gradually decreases from 60 mph to about 10 mph, and from 18:00 to 20:00 it gradually increases from 10 mph to about 65 mph. At other times, the speed fluctuates slightly around 65 mph. In the prediction results on the PEMS-BAY dataset, the three models have similar accuracy except for the period from 15:00 to 20:00. For the typical values in the 20:00–24:00 period, the TGC-LSTM and TDGC-GRU models are more accurate than the Diff-GRU model. During the gradual decrease of speed from 15:00 to 18:00, the TDGC-GRU model captures the small fluctuations from 16:00 to 18:00 and the trough of 10 mph at 18:00 more accurately, outperforming the two baseline models. Overall, our TDGC-GRU model is more accurate.

Figure 7.

The prediction results on two randomly selected days. (a) Sensor ID: 773906 in METR-LA on 2012.3.4. (b) Sensor ID: 402120 in PEMS-BAY on 2017.1.4.

5. Conclusion

In this paper, to improve the accuracy of network-wide traffic flow prediction, a Two-layer Dynamic Graph Convolutional Recurrent Neural Network (TDGC-GRU) is proposed, which consists of a static graph convolutional recurrent neural network, a dynamic graph convolutional recurrent neural network and a deconvolutional neural network. In contrast to previous graph convolutional recurrent neural networks based on static graph displacement operators, it implements dynamic sequential graph convolution learning. The model is evaluated on the METR-LA and PEMS-BAY datasets. The experimental results show that our model balances the influence of extreme values and average values on the prediction results, and achieves an improvement in prediction accuracy compared with other baseline models.

Acknowledgments

This research was supported in part by National Natural Science Foundation of China under Grant nos. 61602290 and 61902229, Natural Science Basic Research Plan in Shaanxi Province of China under Grant nos. 2022JM-329 and 2020JM-288, Fundamental Research Funds for the Central Universities under Grant nos. GK202103090 and GK202103084, Natural Science Foundation of Gansu Province under Grant no. 21JR7RA282, Education Department of Gansu Province: Industrial Support Plan Project under Grant no. 2022CYZC-38, and Foundation of Guizhou Provincial Key Laboratory of Public Big Data under Grant no. 2018BDKFJJ004.

References

1. Gilmore J.F., Abe, Neural network models for traffic control and congestion prediction, I V H S Journal 2(3) (1995), 231–252. doi: 10.1080/10248079508903828.

2. Deng, Demiryurek, Shahabi, Ravada, Towards Fast and Accurate Solutions to Vehicle Routing in a Large-Scale and Dynamic Environment, in: Advances in Spatial and Temporal Databases, Springer International Publishing, 2015, pp. 119–136. doi: 10.1007/978-3-319-22363-6_7.

3. Asghari, Deng, Shahabi, Demiryurek, Price-aware real-time ride-sharing at scale, in: Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM, 2016. doi: 10.1145/2996913.2996974.

4. Shi, Ding, Errapotu S.M., Yue, Zhou, Pan, Deep Q-Network-Based Route Scheduling for TNC Vehicles With Passengers’ Location Differential Privacy, IEEE Internet of Things Journal 6(5) (2019), 7681–7692. doi: 10.1109/jiot.2019.2902815.

5. Jabbarpour M.R., Zarrabi, Khokhar R.H., Shamshirband, Choo K.-K.R., Applications of computational intelligence in vehicle traffic congestion problem: a survey, Soft Computing 22(7) (2017), 2299–2320. doi: 10.1007/s00500-017-2492-z.

6. Williams B.M., Hoel L.A., Modeling and Forecasting Vehicular Traffic Flow as a Seasonal ARIMA Process: Theoretical Basis and Empirical Results, Journal of Transportation Engineering 129(6) (2003), 664–672. doi: 10.1061/(asce)0733-947x(2003)129:6(664).

7. Cai, Wang, Chen, Ding, Sun, A spatiotemporal correlative k-nearest neighbor model for short-term traffic multistep forecasting, Transportation Research Part C: Emerging Technologies 62 (2016), 21–34. doi: 10.1016/j.trc.2015.11.002.

8. Chen, Liang C.-Y., Hong W.-C., D.-X., Forecasting holiday daily tourist flow based on seasonal support vector regression with adaptive genetic algorithm, Applied Soft Computing 26 (2015), 435–443. doi: 10.1016/j.asoc.2014.10.022.

9. Tan, Short-term traffic flow forecasting with spatial-temporal correlation in a hybrid deep learning framework, arXiv, 2016. doi: 10.48550/ARXIV.1612.01022.

10. Hochreiter, Schmidhuber, Long Short-Term Memory, Neural Computation 9(8) (1997), 1735–1780. doi: 10.1162/neco.1997.9.8.1735.

11. Chung, Gulcehre, Cho, Bengio, Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, arXiv, 2014. doi: 10.48550/ARXIV.1412.3555.

12. Dai, Wang, Learning Traffic as Images: A Deep Convolutional Neural Network for Large-Scale Transportation Network Speed Prediction, Sensors 17(4) (2017), 818. doi: 10.3390/s17040818.

13. Zhang, Zheng, DNN-based prediction model for spatio-temporal data, in: Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM, 2016. doi: 10.1145/2996913.2997016.

14. Zhang, Zheng, Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction, Proceedings of the AAAI Conference on Artificial Intelligence 31(1) (2017). doi: 10.1609/aaai.v31i1.10735.

15. Tao, Wang, Long short-term memory neural network for traffic speed prediction using remote microwave sensor data, Transportation Research Part C: Emerging Technologies 54 (2015), 187–197. doi: 10.1016/j.trc.2015.03.014.

16. Cui, Wang, Deep Bidirectional and Unidirectional LSTM Recurrent Neural Network for Network-wide Traffic Speed Prediction, arXiv, 2018. doi: 10.48550/ARXIV.1801.02143.

17. Shahabi, Demiryurek, Liu, Deep Learning: A Generic Approach for Extreme Condition Traffic Forecasting, in: Proceedings of the 2017 SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics, 2017, pp. 777–785. doi: 10.1137/1.9781611974973.87.

18. Wang, Spatiotemporal Recurrent Convolutional Networks for Traffic Prediction in Transportation Networks, Sensors 17(7) (2017), 1501. doi: 10.3390/s17071501.

19. Chen, Teo S.G., Chen, Zou, Yang, Vijay R.C., Feng, Zeng, Exploiting Spatio-Temporal Correlations with Multiple 3D Convolutional Neural Networks for Citywide Vehicle Flow Prediction, in: 2018 IEEE International Conference on Data Mining (ICDM), IEEE, 2018. doi: 10.1109/icdm.2018.00107.

20. Guo, Lin, Chen, Wan, Deep Spatial-Temporal 3D Convolutional Neural Networks for Traffic Data Forecasting, IEEE Transactions on Intelligent Transportation Systems 20(10) (2019), 3913–3926. doi: 10.1109/tits.2019.2906365.

21. Zheng, Lin, Feng, Chen, A Hybrid Deep Learning Model With Attention-Based Conv-LSTM Networks for Short-Term Traffic Flow Prediction, IEEE Transactions on Intelligent Transportation Systems 22(11) (2021), 6910–6920. doi: 10.1109/tits.2020.2997352.

22. Kipf T.N., Welling, Semi-Supervised Classification with Graph Convolutional Networks, arXiv, 2016. doi: 10.48550/ARXIV.1609.02907.

23. Henaff, Bruna, LeCun, Deep Convolutional Networks on Graph-Structured Data, arXiv, 2015. doi: 10.48550/ARXIV.1506.05163.

24. Yin, Zhu, Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting, in: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence Organization, 2018. doi: 10.24963/ijcai.2018/505.

25. Zhao, Song, Zhang, Liu, Wang, Lin, Deng, T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction, IEEE Transactions on Intelligent Transportation Systems 21(9) (2020), 3848–3858. doi: 10.1109/tits.2019.2935152.

26. Shahabi, Liu, Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting, arXiv, 2017. doi: 10.48550/ARXIV.1707.01926.

27. Simonovsky, Komodakis, Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2017. doi: 10.1109/cvpr.2017.11.

28. Veličković, Cucurull, Casanova, Romero, Liò, Bengio, Graph Attention Networks, arXiv, 2017. doi: 10.48550/ARXIV.1710.10903.

29. Zhou, Yang, Zhang, Trajcevski, Zhong, Khokhar, Reinforced Spatiotemporal Attentive Graph Neural Networks for Traffic Forecasting, IEEE Internet of Things Journal 7(7) (2020), 6414–6428. doi: 10.1109/jiot.2020.2974494.

30. Guo, Lin, Feng, Song, Wan, Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting, Proceedings of the AAAI Conference on Artificial Intelligence 33(01) (2019), 922–929. doi: 10.1609/aaai.v33i01.3301922.

31. Zheng, Fan, Wang, GMAN: A Graph Multi-Attention Network for Traffic Prediction, Proceedings of the AAAI Conference on Artificial Intelligence 34(1) (2020), 1234–1241. doi: 10.1609/aaai.v34i01.5477.

32. Zhou, Graph Convolution: A High-Order and Adaptive Approach, arXiv, 2017. doi: 10.48550/ARXIV.1706.09916.

33. Cui, Henrickson, Wang, Traffic Graph Convolutional Recurrent Neural Network: A Deep Learning Framework for Network-Scale Traffic Learning and Forecasting, IEEE Transactions on Intelligent Transportation Systems 21(11) (2020), 4883–4894. doi: 10.1109/tits.2019.2950416.

34. Pan, Long, Jiang, Zhang, Graph WaveNet for Deep Spatial-Temporal Graph Modeling (2019). doi: 10.48550/ARXIV.1906.00121.

35. Pan, Long, Jiang, Chang, Zhang, Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2020. doi: 10.1145/3394486.3403118.

36. Abu-El-Haija, Perozzi, Kapoor, Alipourfard, Lerman, Harutyunyan, Steeg G.V., Galstyan, MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing, arXiv, 2019. doi: 10.48550/ARXIV.1905.00067.

37. Diao, Wang, Zhang, Liu, Xie, Dynamic Spatial-Temporal Graph Convolutional Neural Networks for Traffic Forecasting, Proceedings of the AAAI Conference on Artificial Intelligence 33 (2019), 890–897. doi: 10.1609/aaai.v33i01.3301890.

38. Yang, Liu, Zhao, Space Meets Time: Local Spacetime Neural Network For Traffic Flow Forecasting, arXiv, 2021. doi: 10.48550/ARXIV.2109.05225.

39. Wang, Zhang, Wei, Piao, Yin, Metro Passenger Flow Prediction via Dynamic Hypergraph Convolution Networks, IEEE Transactions on Intelligent Transportation Systems 22(12) (2021), 7891–7903. doi: 10.1109/tits.2021.3072743.

40. Lei, Self-Supervised Spatiotemporal Graph Neural Networks With Self-Distillation for Traffic Prediction, IEEE Transactions on Intelligent Transportation Systems (2022), 1–14. doi: 10.1109/tits.2022.3219626.

41. Xue, Zhao, Han, An Embedding-Driven Multi-Hop Spatio-Temporal Attention Network for Traffic Prediction, IEEE Transactions on Intelligent Transportation Systems (2022), 1–16. doi: 10.1109/tits.2022.3220915.

42. Pan, Liang, Wang, Zheng, Zhang, Urban Traffic Prediction from Spatio-Temporal Data Using Deep Meta Learning, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM, 2019. doi: 10.1145/3292500.3330884.

43. Dai, Tian, Huang, Wang, Zhu, Song, Adversarial Attack on Graph Structured Data, arXiv, 2018. doi: 10.48550/ARXIV.1806.02371.

44. Zügner, Akbarnejad, Günnemann, Adversarial Attacks on Neural Networks for Graph Data, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery, ACM, 2018. doi: 10.1145/3219819.3220078.

45. Zügner, Günnemann, Certifiable Robustness and Robust Training for Graph Convolutional Networks, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery, ACM, 2019. doi: 10.1145/3292500.3330905.
