FedCGAT: A federated demand prediction method for shared bicycles based on contrastive graph attention mechanism

Abstract

Demand prediction for shared bicycles based on historical trip data helps bicycle management organizations plan the scheduling of bicycles. However, current prediction methods have the following problems: (1) bicycle station trip data hides temporal and spatial information that is crucial for demand prediction, and traditional methods cannot use that information effectively; (2) each bicycle organization manages its own region, and local region data cannot accurately predict the demand of the whole region. Because trip data is privacy-sensitive, organizations cannot share raw data directly, which makes federated multi-participant analysis a challenge. To address these issues, we propose a federated learning framework for demand prediction of shared bicycles (FedCGAT). Firstly, we propose a spatio-temporal graph neural network based on an attention mechanism for feature modeling of data. Meanwhile, we propose a graph data augmentation method to eliminate noisy data and capture spatial correlations. Then, we construct an auxiliary task based on contrastive learning to assist model training, which helps the model learn the data features fully. Finally, we conduct experiments on two real-world bicycle datasets. The experiments demonstrate that FedCGAT achieves high prediction accuracy while preserving data privacy. Compared to the best-performing baseline model on both datasets, our model achieves reductions in MAPE values of 1.67% and 1.94%, respectively.

Keywords

Federated learning; demand prediction; shared bicycles; graph attention mechanism; contrastive learning

1 Introduction

Data-driven Intelligent Transport Systems (ITS) aim to enhance people’s lives through various methods of data collection and processing [1, 2]. Bicycle Sharing Systems (BSS) represent a significant component of ITS and have been demonstrated to alleviate traffic congestion and improve last-mile connectivity with public transport [3]. With the advancement of mobile communication networks and the widespread adoption of mobile devices [4], shared bicycle usage has become more accessible, leading to a proliferation of organizations offering related services. Bike management organizations can improve bike scheduling and reduce resource wastage by collecting user journey data for bike demand forecasting. Nevertheless, because each organization holds only limited data, prediction accuracy often suffers. Moreover, accurate demand prediction on a large scale often requires data sharing among multiple organizations. Since trip data contains sensitive user information, organizations are obligated to restrict the sharing of such data. Hence, a key challenge lies in conducting joint multi-organization demand forecasting for bike sharing while ensuring the protection of user privacy.

In the early stages, bike-sharing demand prediction was approached as a time series problem, and researchers primarily investigated it using statistical modeling methods [5, 6]. To tackle more intricate traffic data, deep learning-based traffic prediction methods have advanced significantly in recent years. Researchers first tried methods based on Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN) [13–15], but their accuracy is often poor because they cannot effectively capture spatial information in complex data topologies. As an advanced deep learning technique, the Graph Convolutional Network (GCN) effectively addresses the difficulty of capturing the spatial information of traffic topology [7]. GCN-based methods have been used effectively in the field of traffic prediction [16–20]. Researchers have considered different realistic factors and demonstrated the applicability of spatio-temporal graph neural networks to traffic prediction.

While GCN-based models can be effectively utilized for traffic prediction, existing methods still exhibit certain limitations. These limitations include the potential presence of noise in the graph topology, which is often overlooked by current approaches, and the insufficient consideration of deep spatio-temporal information. Consequently, these factors impact the accuracy of these methods. Additionally, existing methods tend to heavily rely on centralized training with large-scale data. However, in Bicycle Sharing Systems (BSS), extensive public travel data are typically collected by various providers, including government organizations and private companies. These trip data often contain personal information, raising potential privacy concerns in data exchange. Consequently, many organizations opt to store data locally to avoid sharing, posing a challenge for collaborative multi-organization efforts to train more effective predictive models.

To address the aforementioned issues, we propose a framework called the federated spatio-temporal contrastive graph attention network (FedCGAT) for demand forecasting in shared bicycle systems. Specifically, we introduce the Federated Learning (FL) technique to facilitate collaborative training among multiple organizations, enabling them to jointly train an effective demand prediction model without sharing data. We validate the effectiveness of the framework across different settings, including variations in the number of clients and data distribution methods. To ensure accurate traffic demand prediction while safeguarding user privacy, the framework integrates our proposed Contrastive Graph Attention Mechanism (CGAT) method. In CGAT, we introduce a personalized graph augmentation method to address the noise problem in traffic data. Additionally, by combining spatio-temporal attention mechanisms with contrastive learning, CGAT achieves deep spatio-temporal information mining. In this configuration, FedCGAT enables effective joint forecasting of demand for shared bikes without compromising privacy.

The main contributions of this paper are as follows:

We propose the FedCGAT framework, which safeguards the privacy of data from each bicycle management organization while allowing multiple organizations to collaboratively train more accurate demand prediction models.

We propose a demand prediction method based on the Contrastive Graph Attention Mechanism (CGAT), integrated into FedCGAT. This method utilizes personalized graph augmentation to eliminate noise and combines contrastive learning methods with attention mechanisms to focus on deep spatio-temporal information, enabling each client to effectively capture data features.

We validate the performance of the proposed algorithm on two real-world datasets. Through experiments conducted in diverse scenarios, we demonstrate that FedCGAT achieves satisfactory shared bike demand prediction results while ensuring data privacy protection.

The rest of this paper is organized as follows. Section 2 mainly introduces the related work. Section 3 describes the demand prediction problem of shared bicycles. Section 4 describes the CGAT algorithm’s principle and the FedCGAT framework’s design. Section 5 focuses on verifying the performance of the CGAT algorithm and FedCGAT framework. Section 6 concludes the paper.

2 Related work

2.1 Time series prediction methods for shared bicycle demand

Accurate travel demand prediction is a crucial way to ensure the rational operation of urban transportation systems. Researchers mainly used machine learning and time series methods in the early days. For example, Lin et al. [8] proposed a stacked model SMVP based on XGBoost for predicting traffic changes in public bicycles. Xu et al. [9] proposed a machine learning model based on hybrid edge computing, which predicts the bicycle demand at each station by constructing a regression tree.

Due to the large amount of traffic demand data and variable characteristics, traditional time series methods can no longer meet prediction accuracy requirements. In recent years, deep learning has been applied in transportation demand prediction due to its ability to handle complex nonlinear relationships better. To model complex time dependencies, Chen et al. [10] predicted each bicycle station’s real-time rental and return demand based on different architectures of RNN. Wang et al. [11] combined Long Short Term Memory Network (LSTM) and Gated Recurrent Unit (GRU) to predict the number of short-term available bicycles at docking stations. Jiang et al. [12] also proposed a bi-directional GRU method based on point-trajectory classification to infer traffic patterns from GPS trajectory data. However, these methods only focus on temporal dependence and ignore the spatial correlation between traffic sequences.

Some recent studies have converted traffic regions into a two-dimensional grid. These methods used CNN to capture spatial correlations between neighboring regions and used RNN to model spatio-temporal dependencies simultaneously. For example, Zhang et al. [13] proposed ST-Resnet, which added external factors such as weather and holiday events for spatio-temporal feature modeling. Yao et al. [14] proposed DMVST-Net, which used a CNN to handle spatial dependencies and an LSTM to model temporal relationships. Li et al. [15] also proposed the STMN network to predict bicycle usage; this method extracted spatial features via CNN and captured correlations in the time series using LSTM.

However, the grid-based structure cannot effectively capture spatial information about station topology, and CNN cannot effectively analyze non-Euclidean structure data. When the grid granularity is inappropriate, the problem of having no stations in the grid or dividing several stations into the same grid occurs. The station-based zoning approach is more reasonable and effective than the grid-based zoning approach.

2.2 Traffic prediction methods based on GCN

The traffic region is viewed as a topological map in station-based traffic prediction. Since GCN can extract the traffic information of a region more efficiently, researchers are turning to study prediction methods based on GCN models. Yao et al. [16] proposed STGCN, which combines spectral-domain GCN with gated convolution to express the problem on a graph and build the model using the complete convolution structure, which leads to faster training and fewer parameters. Zhao et al. [17] proposed T-GCN, which combines GCN and GRU for capturing spatial and temporal dependencies, respectively. Li et al. [18] proposed DCRNN, which extends GCN to directed graphs by replacing matrix multiplication in GRU with diffusion convolution. In a recent study, Guo et al. [19] proposed a new spatio-temporal graph convolutional model to develop a spatio-temporal attention mechanism to learn dynamic spatio-temporal correlations of traffic data. Song et al. [20] paid attention to the spatio-temporal data heterogeneity and designed several model modules based on different periods.

Traffic demand forecasting based on GCN methods can capture spatio-temporal correlations relatively effectively. However, most existing methods require centralized collection of a large amount of data followed by model training, whereas in practical applications different participants hold their own traffic data. Since trip data contains sensitive private information and trade secrets, the participants cannot share the raw data, which makes it challenging for multiple participants to collaborate on training robust models.

2.3 Privacy protection in intelligent transport system

In ITS, most traffic demand forecasting methods are data-driven, and data exchange between users and organizations may reveal private information. Privacy in ITS has been a concern for researchers [24]. To protect privacy in ITS, Zhou et al. [25] proposed a privacy-preserving method for measuring the traffic flow of cyber-physical road systems, which uses Maximum Likelihood Estimation (MLE) to make predictions. Sucasa et al. [26] proposed an autonomous privacy-preserving authentication method where vehicles can hide their information using pseudonyms. Ogundoyin et al. [27] proposed a secure ITS motion analysis scheme allowing participants to generate key pairs. However, these methods rely only on encryption to protect data security and solve the privacy protection problem only partially. Their computational efficiency is also very low when dealing with huge amounts of data.

Google first proposed Federated Learning (FL) as a decentralized machine learning paradigm [28] that can adapt well to big data scenarios. This method replaces the exchange of raw data by sharing the global model across clients. These clients can use local data to collaboratively train a global model with high generalizability. Unlike traditional decentralized machine learning approaches, FL protects sensitive data privacy and reduces communication overhead, and thus is rapidly gaining popularity in the research community. Some data processing methods have also started to integrate with federated learning. In ITS, FL has good applicability. The methods in ITS are also instructive for the optimization of federated learning [29]. Xu et al. proposed a Hessian Regularized Spatiotemporal Low Rank (HRST-LR) algorithm to solve the problem of missing traffic data [30]. Some methods have been innovatively improved based on federated learning. Xu et al. [31] built a federated parallel data platform to support dynamic data analysis and decision making. Zhang et al. [32] introduced an adjacency matrix preservation method based on differential privacy. Chen et al. [33] designed a blockchain-based federated learning framework that ensures the trustworthiness of the federated computational process. Yuan et al. [34] proposed a federated deep learning method based on a spatio-temporal long short-term network (FedSTN) and designed three network modules to predict traffic flow. Zhao et al. [35] proposed a client selection method based on reinforcement learning to select trustworthy clients adaptively for global model fusion. Chen et al. [36] proposed a malicious client identification method based on client logit features to improve the security of federated learning.

In summary, considering the problems and shortcomings of the existing demand prediction methods for bike sharing, we combine federated learning to conduct a study on privacy-preserving demand prediction for bike sharing.

3 Preliminaries

In this section, we first describe the shared bicycle demand problem for a single organization. Then, we describe the federated demand prediction problem between multiple organizations.

3.1 Demand prediction for shared bicycles in transport network

Assume that city B contains N disjoint regional stations and that each user trip runs from a departure station to a return station. The bicycle station distribution graph can be regarded as an undirected graph $G = (V, E, A)$, where V denotes the stations available for bicycle parking and E denotes the edges connecting pairs of stations. $A \in R^{N \times N}$ is the adjacency matrix of the station graph, where N is the number of stations in V. For two different stations v_i and v_j, A_ij encodes their connection: A_ij = 1 if the two stations are directly connected, and A_ij = 0 otherwise.

Bike-sharing demand prediction is a time-series prediction problem. We define the demand of the n-th station at the t-th time step as $x_{t}^{n}$, where n ∈ [1, N]. The demand sequence of the n-th station over the previous m time steps is defined as $X_{t - m : t}^{n} = \{x_{t - m}^{n}, x_{t - m + 1}^{n}, \dots, x_{t}^{n}\}$. The demand of all stations at the t-th step is defined as $X_{t} = \{x_{t}^{1}, x_{t}^{2}, \dots, x_{t}^{N}\}$. We aim to learn a function F(·) by training to accurately estimate the demand at all stations at the future time step t + 1, which is defined as $X_{t + 1} = \{x_{t + 1}^{1}, x_{t + 1}^{2}, \dots, x_{t + 1}^{N}\}$. Thus, the first problem to be solved is shown in Equation (1). $X_{t + 1} = F (\{X_{t - m}, X_{t - m + 1}, \dots, X_{t}\})$ (1) That is, we need to predict the bicycle demand at each station at time t + 1 based on the history of the previous m moments.
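As an illustration of how training samples for Equation (1) could be assembled, the following minimal sketch slices a demand history into input windows and next-step targets. The function name `make_windows` and the toy data are hypothetical and not part of the original method. Following the indices in Equation (1), each window spans X_{t-m} through X_t, so it holds m + 1 steps.

```python
def make_windows(series, m):
    """Split a demand history into (input window, next-step target) pairs.

    series: list of per-step demand vectors X_0, X_1, ..., each of length N.
    m: look-back parameter; each input holds the steps X_{t-m}, ..., X_t.
    """
    samples = []
    for t in range(m, len(series) - 1):
        window = series[t - m : t + 1]   # X_{t-m}, ..., X_t
        target = series[t + 1]           # X_{t+1}
        samples.append((window, target))
    return samples


# Toy history (made-up numbers): 5 time steps, N = 2 stations.
history = [[3, 1], [4, 0], [6, 2], [5, 3], [7, 1]]
pairs = make_windows(history, m=2)
```

With this toy history and m = 2, two samples are produced: the first uses steps 0 to 2 to predict step 3, and the second uses steps 1 to 3 to predict step 4.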

3.2 Federated demand prediction for shared bicycles

In the previous section, we defined the demand forecasting problem for a single organization. This section will focus on the federated demand forecasting problem for multiple organizations. We can use a federated learning approach to solve the joint demand forecasting problem. Each organization is treated as a client in a federated learning scenario. All clients train a global model collaboratively without sharing local raw data. We define the federated learning scenario as follows.

Let C = {C₁, C₂, . . . , C_L} denote the set of clients participating in the federated learning task, where L is the number of clients. Different organizations own the user trip data in different regions. The private data owned by client C_i is defined as D_i. The trip data of all clients are represented as D_g = D₁ ∪ D₂ ∪ ⋯ ∪ D_L, reflecting the transportation demand of the whole city at a certain time. We define the local model trained by the i-th client as M_i, so the set of client models is M = {M₁, M₂, ⋯ , M_L}.

We define a server trusted by all clients to collect the models uploaded by all clients in each round of communication. Then, the server fuses the global model with a specific model aggregation algorithm. Finally, the server distributes the global model to each client. Since the federated learning approach only uses the models of each client and does not use the raw data of each client, it can solve the privacy problem of the user’s data.

This paper uses the FedAvg algorithm [28] as the global model aggregation algorithm. The algorithm fuses the client models by weighted averaging, in which the server calculates the fusion weight of each client according to its number of samples. The server then uses these weights to fuse all client model parameters into the global model parameters. In the whole federation process, each client C_i only performs local training to learn a local model M_i, and the global model M_g is obtained after the joint training. We express the process as Equation (2). $M_{g}^{j} = FusionModel (\{M_{1}^{j}, M_{2}^{j}, \dots, M_{L}^{j}\})$ (2) The FusionModel(·) function represents the fusion of the clients’ local models into a global model, and j represents the current communication round. The second problem we need to solve is therefore the federated prediction problem of fusing the local models trained by each client.

4 Methodology

In this paper, we focus on two problems: the problem of demand for shared bicycles in a single organization and the problem of federated demand forecasting among multiple organizations. To solve these two problems, we propose the CGAT algorithm to solve the single-organization demand prediction and the FedCGAT framework to solve the federated demand prediction problem. In the subsequent sections, we will introduce the CGAT algorithm and the FedCGAT framework.

4.1 Bicycle sharing demand forecasting model CGAT

This section describes the design of the CGAT prediction algorithm and each of its modules. The overall architecture of the model is shown in Fig. 1.

Fig. 1. The overall architecture of CGAT.

4.1.1 ST-Block spatio-temporal feature building block

To model the spatio-temporal features of the data, we propose a spatio-temporal information building block ST-Block with an attention mechanism, as shown in Fig. 1(1). Inspired by [16], we design a four-layer structure, which includes the TGC temporal information coding layer, the SGC spatial information coding layer, and the TA temporal attention module. The hierarchical structure is denoted as TGC→SGC→TA→TGC. The structure uses alternating convolution to mine spatio-temporal correlations, which can capture features from both spatial and temporal domains. We also add a TA temporal attention layer to mine deep information.

Specifically, in the temporal information encoding part, the TGC layer uses a convolutional structure with a gating mechanism to capture the temporal variability of the demand, containing a causal convolution and a Gated Linear Unit (GLU). Taking as input a demand sequence of m time slices $X = \{X_{t - m}, X_{t - m + 1}, \dots, X_{t}\}$, it outputs the time-dimension embedding of each node as in Equation (3). $P = TGC (X, A)$ (3)

Where $P = \{P_{t - m}, P_{t - m + 1}, \dots, P_{t}\}$ represents the generated embedding matrix. $P_{t} = \{p_{t}^{1}, p_{t}^{2}, \dots, p_{t}^{N}\}$ represents the embedding information at time step t, and N represents the number of nodes in the input graph. At time step t, $p_{t}^{n}$ represents the embedding of node n after the temporal convolution operation. $A$ represents the adjacency matrix of the current demand graph.
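To make the combination of causal convolution and GLU concrete, here is a minimal NumPy sketch of one gated temporal convolution. It is an illustrative assumption rather than the authors' implementation: the function name, the tensor layout (time, stations, channels), and the zero left-padding are all hypothetical, and the sketch does not use the adjacency matrix A that appears in Equation (3).

```python
import numpy as np


def gated_temporal_conv(x, w, b):
    """Causal 1-D convolution over time followed by a GLU gate.

    x: (T, N, C_in) demand features; w: (K, C_in, 2 * C_out); b: (2 * C_out,).
    Returns (T, N, C_out); output step t only sees steps t-K+1, ..., t.
    """
    K = w.shape[0]
    T = x.shape[0]
    # Left-pad with zeros so the output keeps length T and stays causal.
    pad = np.zeros((K - 1,) + x.shape[1:])
    xp = np.concatenate([pad, x], axis=0)
    out = np.zeros((T, x.shape[1], w.shape[2]))
    for k in range(K):
        # Tap k multiplies the input that lies K-1-k steps in the past.
        out += xp[k : k + T] @ w[k]
    out += b
    a, g = np.split(out, 2, axis=-1)
    return a * (1.0 / (1.0 + np.exp(-g)))   # GLU: a * sigmoid(g)


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3, 2))      # T = 6 steps, N = 3 stations, C_in = 2
w = rng.normal(size=(3, 2, 8))      # K = 3 taps, 2 * C_out = 8
b = np.zeros(8)
p = gated_temporal_conv(x, w, b)    # shape (6, 3, 4)

# Causality check: perturbing the last step must not change earlier outputs.
x_changed = x.copy()
x_changed[-1] += 1.0
p_changed = gated_temporal_conv(x_changed, w, b)
```

The causality check at the end shows the property that matters for forecasting: an output at step t never depends on inputs later than t.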

The middle layer (SGC) is a spatial graph convolution layer based on Chebyshev polynomials. It bridges the upper and lower temporal layers and achieves fast spatial-state propagation through graph convolution. For the embedding at time step t, the SGC module fuses spatial-domain information as in Equation (4). $Z_{t} = SGC (P_{t}, A)$ (4)

The purpose of the SGC function is to add a spatial embedding to the temporal embedding, and Z_t is the spatio-temporal embedding of all stations at the moment t.
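The text does not spell out the Chebyshev formulation, so the sketch below uses the common one: a normalized Laplacian rescaled to the interval [-1, 1] and the recurrence T_k = 2 L T_{k-1} - T_{k-2}. The function name, the argument shapes, and the polynomial order K are assumptions made for illustration only.

```python
import numpy as np


def cheb_graph_conv(p, adj, theta):
    """Chebyshev graph convolution for one time step.

    p: (N, C_in) node embeddings; adj: (N, N) symmetric adjacency matrix;
    theta: (K, C_in, C_out), one weight matrix per polynomial order.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    # Normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(lap).max()
    lap_scaled = 2.0 * lap / lam_max - np.eye(n)   # spectrum mapped to [-1, 1]

    t_prev, t_curr = np.eye(n), lap_scaled          # T_0 and T_1
    out = t_prev @ p @ theta[0]
    for k in range(1, theta.shape[0]):
        out += t_curr @ p @ theta[k]
        # Chebyshev recurrence: T_{k+1} = 2 * L_scaled * T_k - T_{k-1}.
        t_prev, t_curr = t_curr, 2.0 * lap_scaled @ t_curr - t_prev
    return out


# Toy path graph with 3 stations: 0 - 1 - 2.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
p_t = rng.normal(size=(3, 2))
theta = rng.normal(size=(3, 2, 4))
z_t = cheb_graph_conv(p_t, adj, theta)   # shape (3, 4)
```

With K = 1 the layer reduces to a per-node linear map. Each additional order lets a node aggregate information from stations one hop further away.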

Demand conditions in different time slices are correlated. We use an attention mechanism to capture these temporal dynamics and adaptively assign different importance to the data, formally expressed as Equations (5) and (6). $Z^{Att} = \delta ((Z_{m} W_{1}) (Z_{m} W_{2})^{T}) W_{3}$ (5) $\delta (\Gamma)_{i, j} = \frac{\exp (\Gamma_{i, j})}{\sum_{j' = 1}^{m} \exp (\Gamma_{i, j'})}$ (6)

$Z^{Att}$ denotes the feature representation after the temporal attention mechanism. Specifically, $Z_{m} = \{Z_{t - m}, Z_{t - m + 1}, \dots, Z_{t}\}$ represents the input sequence, and W₁, W₂, and W₃ are learnable parameter matrices. The δ(·) operation normalizes the attention score matrix with the softmax function, as shown in Equation (6), where i and j index different time steps.
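The following NumPy sketch evaluates Equations (5) and (6) term by term. The shapes are assumptions, since the text does not state them: each time step is flattened to one row of Z, W₁ and W₂ project to a common width, and W₃ has one row per time step so that the product in Equation (5) is well defined.

```python
import numpy as np


def softmax_rows(scores):
    """Row-wise softmax, Eq. (6); the max is subtracted for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def temporal_attention(z, w1, w2, w3):
    """z: (T, D), one flattened embedding per time step.

    w1, w2: (D, D_k); w3: (T, D_out). Shapes are illustrative assumptions.
    """
    scores = (z @ w1) @ (z @ w2).T    # (T, T) pairwise time-step scores
    weights = softmax_rows(scores)     # each row sums to 1
    return weights @ w3, weights       # Eq. (5)


rng = np.random.default_rng(0)
z = rng.normal(size=(5, 6))            # T = 5 time steps, D = 6 features
w1 = rng.normal(size=(6, 4))
w2 = rng.normal(size=(6, 4))
w3 = rng.normal(size=(5, 3))
z_att, weights = temporal_attention(z, w1, w2, w3)
```

Row i of `weights` states how strongly time step i attends to every other time step, which is the adaptive importance described above.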

4.1.2 A graph augmentation approach for shared bicycle data

There is spatial information hidden in the topology map of bike-sharing stations. On the one hand, two spatially distant stations may have similar demand patterns, and these spatial correlations cannot be obtained directly from the raw data. On the other hand, bike sharing is more suitable for people’s short trips than taxis, and each trip tends not to cross many stations. Therefore, the number of trips between two neighboring stations will be larger, which may contain potential demand correlations. However, if the difference in demand patterns between two neighboring stations is too large, it produces noisy data. To capture the potential demand correlations of nodes while counteracting the noise perturbation, we design a graph augmentation method for bike-sharing data, as shown in Fig. 1(2). We will accomplish this function in three steps described in detail below.

(1) Obtain the node demand at time t: In this step, our goal is to obtain the demand of the node. We define γ_n as the overall demand of node n over m time steps and the spatio-temporal graph as $G = (V, E, A)$. The embedding sequence $P_{t - m : t}^{n} = \{p_{t - m}^{n}, \dots, p_{t}^{n}\}$ is the output of the n-th node from the TGC layer, and the embedding sequence is aggregated to obtain the overall embedding as shown in Equation (7). $\gamma_{n} = \sum_{i = t - m}^{t} {p_{i}^{n}}^{T} \cdot \omega_{0} \cdot p_{i}^{n}$ (7)

Where ω₀ is a parameter vector. γ_n reflects the overall demand pattern of node n over m time steps.

(2) Calculate the similarity of nodes: In this step, we calculate the correlation between nodes. We use the embedding value from the previous step as the feature of each node and then apply an improved cosine similarity to calculate the correlation between two nodes. For any two nodes j and k in the spatio-temporal graph, the correlation is calculated using Equation (8). $R_{j, k} = cos (γ_{j}, γ_{k}) + α$ (8)

Where α is an adjustment parameter that reflects the potential correlation of demand between two neighboring stations: for two neighboring nodes, α > 0; otherwise, α = 0. R denotes the demand correlation of two nodes. For two nodes j and k in the graph, the larger the value of R_j,k, the higher the demand correlation between the two nodes.

(3) Graph structure optimization: Once we have calculated the correlation coefficient value of the two nodes, we modify the graph structure based on this value. If the two nodes are not adjacent and the correlation is high (R_j,k > 0), edge e_j,k is added between the nodes. If the two nodes are neighboring and the demand correlation is still small (R_j,k ≤ 0) after adding the α parameter adjustment, the edge e_j,k ∈ E of the node is masked.

With the above graph augmentation method, the CGAT model not only focuses on the unique data distribution characteristics of the bicycle sharing data but also eliminates the noise effect of useless edges and captures the spatial dependency between nodes. The augmented graph can be expressed as $\hat{G} = (V, \hat{E}, \hat{A})$ . $\hat{E}$ is the adjusted set of edges and $\hat{A}$ represents the new adjacency matrix.
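Steps (2) and (3) can be sketched as follows. This is a minimal illustration under stated assumptions: `gamma` holds one demand-summary vector per node, α is a single positive constant applied only to adjacent pairs, and self-loops are excluded. None of these choices is specified in the text, and the toy values are made up.

```python
import numpy as np


def augment_graph(gamma, adj, alpha):
    """Add and mask edges based on the correlation R of Eq. (8).

    gamma: (N, D) per-node demand summaries; adj: (N, N) symmetric 0/1 matrix;
    alpha: positive bonus added only for pairs that are already neighbours.
    """
    norms = np.linalg.norm(gamma, axis=1, keepdims=True)
    unit = gamma / np.maximum(norms, 1e-12)
    r = unit @ unit.T + alpha * adj          # cosine similarity plus alpha
    new_adj = adj.copy()
    new_adj[(adj == 0) & (r > 0)] = 1        # add edge: correlated non-neighbours
    new_adj[(adj == 1) & (r <= 0)] = 0       # mask edge: uncorrelated neighbours
    np.fill_diagonal(new_adj, 0)             # assumption: no self-loops
    return new_adj, r


# Toy example with 4 stations; edges 0-2 and 2-3 exist initially.
gamma = np.array([[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0], [-1.0, 0.2]])
adj = np.array([[0, 0, 1, 0],
                [0, 0, 0, 0],
                [1, 0, 0, 1],
                [0, 0, 1, 0]])
new_adj, r = augment_graph(gamma, adj, alpha=0.3)
```

In this toy case, stations 0 and 1 gain an edge because their demand patterns are nearly parallel, the edge 0-2 is masked because the patterns are opposite even after adding α, and the edge 2-3 is kept.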

4.1.3 Spatio-temporal feature capture based on contrastive learning

Contrastive learning is an essential part of self-supervised learning, and in [37], the authors used self-supervised learning methods to predict traffic flow. Inspired by [37], to fully allow the model to learn spatio-temporal features, we construct a contrastive learning task to assist in model training in this section. We will accomplish this function in the following three steps. The process is shown in Fig. 1(3).

(1) Get the node-level embedding information: The embeddings obtained for the original data and the augmented data after the second ST-Block are $H$ and $\tilde{H}$ , respectively. The embedding information of the n-th node at t time step is denoted as $h_{t}^{n}$ . The aggregated node-level embedding information can be obtained as Equation (9) by fusing the augmented embedding with the original embedding. $s_{t}^{n} = φ_{1} ⊙ h_{t}^{n} + φ_{2} ⊙ {\tilde{h}}_{t}^{n}$ (9)

Where φ₁ and φ₂ are parameter matrices, and ⊙ denotes the Hadamard product.

(2) Get the graph-level embedding information: the graph-level embedding u_t for the current time step is obtained by averaging $s_{t}^{n}$ as shown in Equation (10). $u_{t} = ψ (\frac{1}{N} \sum_{n = 1}^{N} s_{t}^{n})$ (10)

Where ψ is the sigmoid function.

(3) Constructing sample pairs and computing the loss function: For each client, node-level and graph-level embedding under the same time step t are selected as positive sample pairs to achieve a holistic focus on spatial demand patterns. The local and global embedding under different time steps t and t^* are selected as negative sample pairs to capture the heterogeneity of demand at different time steps.

The contrastive learning assistance task for demand prediction of shared bicycles requires optimizing the following cross-entropy loss expressed in Equation (11).

$L_{c} = - \left( \sum_{n = 1}^{N} \log \psi ({s_{t}^{n}}^{T} W_{4} u_{t}) + \sum_{n = 1}^{N} \log (1 - \psi ({s_{t^{*}}^{n}}^{T} W_{4} u_{t})) \right)$ (11)

Where W₄ is a learnable parameter matrix, and t^* denotes an arbitrary time step different from t.
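Equations (9) to (11) can be combined into one small routine. The sketch below is an assumption-laden illustration: φ₁ and φ₂ are taken as element-wise weight vectors, the negative samples come from a single other time step t*, and a small epsilon guards the logarithms. All names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def contrastive_loss(h_t, h_aug_t, h_other, h_aug_other, phi1, phi2, w4):
    """Cross-entropy contrastive loss of Eq. (11).

    h_t, h_aug_t: (N, D) original and augmented embeddings at step t.
    h_other, h_aug_other: (N, D) embeddings at a different step t*.
    phi1, phi2: (D,) element-wise fusion weights; w4: (D, D).
    """
    s_t = phi1 * h_t + phi2 * h_aug_t                # Eq. (9), node level at t
    s_other = phi1 * h_other + phi2 * h_aug_other    # node level at t*
    u_t = sigmoid(s_t.mean(axis=0))                  # Eq. (10), graph level at t
    pos = sigmoid(s_t @ w4 @ u_t)                    # (N,) positive-pair scores
    neg = sigmoid(s_other @ w4 @ u_t)                # (N,) negative-pair scores
    eps = 1e-12                                      # avoids log(0)
    return -(np.log(pos + eps).sum() + np.log(1.0 - neg + eps).sum())


rng = np.random.default_rng(0)
n, d = 4, 3
h_t, h_aug_t = rng.normal(size=(n, d)), rng.normal(size=(n, d))
h_other, h_aug_other = rng.normal(size=(n, d)), rng.normal(size=(n, d))
phi1, phi2 = np.full(d, 0.5), np.full(d, 0.5)
w4 = rng.normal(size=(d, d))
l_c = contrastive_loss(h_t, h_aug_t, h_other, h_aug_other, phi1, phi2, w4)
```

The loss decreases when node embeddings at step t score high against the graph summary of the same step and node embeddings from t* score low against it.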

4.1.4 Demand prediction module

We have extracted spatio-temporal features, optimized graph structure, and obtained a contrast loss in the first three stages. In this stage, we aim to use this information to predict the demand for bicycles. Fig. 1(4) shows the demand prediction module.

We aim to output the demand for bicycles at a given moment through a predictive model. We define the problem as the Equation (12). ${\hat{X}}_{t + 1} = W_{fc} H + b$ (12)

The ${\hat{X}}_{t + 1}$ is the demand prediction value for all nodes at time step t + 1, and $H$ denotes the embedding information of the nodes. W_fc is the weight matrix that maps the hidden output in the final TGC layer to the predicted output, and b is the bias parameter.

We have obtained the embedding information $H$ from the final TGC layer (in the first stage). Therefore, our goal is to learn the optimized weight matrix W_fc and bias b. We compare the predicted and actual values and optimize the model parameters using the calculated loss. We define this loss as $L_{total}$.

We use a weighted sum of two losses to calculate the total loss. As shown in Equation (13), we define the loss of the prediction task as $L_{p}$. $L_{p} = \sum_{n = 1}^{N} | {\hat{x}}_{t + 1}^{n} - x_{t + 1}^{n} |$ (13)

Where ${\hat{x}}_{t + 1}^{n}$ denotes the predicted value of the model for the n-th node at time step t + 1 and $x_{t + 1}^{n}$ represents the ground truth of demand. We have obtained the loss for the contrastive learning task ( $L_{c}$ ) for each client in the previous phase. The total loss is then the weighted sum of the losses of the two tasks, as shown in Equation (14). $L_{total} = β L_{p} + (1 - β) L_{c}$ (14)

β is a hyperparameter that adjusts the weights of the two losses. The model is trained via the backpropagation algorithm. Finally, we obtain the optimal parameters of the weight matrix by minimizing the loss.
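As a small worked example of Equations (13) and (14), the sketch below computes the absolute-error prediction loss and blends it with a given contrastive loss. The numbers and the value of β are made up for illustration.

```python
def total_loss(pred, truth, contrastive, beta):
    """Weighted sum of prediction loss and contrastive loss.

    pred, truth: per-station predicted and observed demand at step t + 1.
    contrastive: the value of L_c; beta: weight in [0, 1].
    """
    l_p = sum(abs(p - x) for p, x in zip(pred, truth))   # Eq. (13)
    return beta * l_p + (1.0 - beta) * contrastive       # Eq. (14)


# Three stations: absolute errors are 1, 0, and 2, so L_p = 3.
loss = total_loss([4.0, 2.0, 7.0], [5.0, 2.0, 5.0], contrastive=0.8, beta=0.75)
```

Here L_total = 0.75 · 3 + 0.25 · 0.8 = 2.45. A β close to 1 makes training focus on the prediction error, while a smaller β gives the auxiliary contrastive task more influence.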

4.2 Federated learning based spatio-temporal demand forecasting model for shared bicycles FedCGAT

In this section, we integrate the CGAT algorithm into federated learning and propose a privacy-preserving demand prediction method for shared bicycles (FedCGAT). We use the FedAvg algorithm to fuse the global model on the server. The client only needs to upload the model parameters to the server without exposing the local private data. This way, multiple bike-sharing organizations can jointly train and obtain a predictive model with good results while avoiding privacy leak issues.

The overall architecture of FedCGAT is shown in Fig. 2. Given a set of clients C = {C₁, C₂, . . . , C_L}, the global trip data is represented as D_g = D₁ ∪ D₂ ∪ . . . ∪ D_L, and the data private to each client is D_i. The number of samples of each client’s data D_i is denoted as S_i.

Fig. 2. The overall architecture of FedCGAT.

For every communication round of federated learning, the client uses the most recent global model from the server to initialize the local model. Then, each client uses the local data D_i to train the local model. We define the i-th client model at the j-th round as $M_{i}^{(j)}$ .

After finishing local model training, each client uploads its model $M_{i}^{(j)}$ to the server. Then, the server aggregates the received model parameters to get a new global model. The process is shown in Equation (15). $M_{g}^{(j)} = \sum_{i = 1}^{L} M_{i}^{(j)} \cdot \frac{S_{i}}{S_{g}}$ (15) $S_{g} = \sum_{i = 1}^{L} S_{i}$ (16)

Where S_i is the number of data samples used by a client, and S_g is the sum of data samples of all clients, as shown in Equation (16). Specifically, as in Fig. 2 and Algorithm 1, the working process of FedCGAT includes the following four phases.

Model distribution phase: The server sends the global model M^(j) to each client C_i via broadcast, with j indicating the global round it is currently in. Before the first round begins, the server will initialize a model M⁽⁰⁾.

Local model training phase: Each client C_i uses the global model distributed by the server to initialize the local model M_i. Then, each client uses local data D_i to train local model $M_{i}^{(j)}$ .

Model upload phase: Each client C_i uploads the trained model parameters $M_{i}^{(j)}$ to the server.

Model aggregation phase: The server collects all client model parameters $\{M_{1}^{(j)}, \dots, M_{L}^{(j)}\}$ at the j-th communication round. Then, the server fuses them into the new global model $M_{g}^{(j)}$ using the aggregation algorithm (FedAvg).

Algorithm 1 FedCGAT

Input: Clients C = {C₁, C₂, . . . , C_L}, whose private data D_i is not uploaded; the number of rounds J; the local training configuration.

Output: The trained FedCGAT model parameters M_g;

1: Initialize global model parameters M⁽⁰⁾

2: for round j = 0, 1, . . . , J − 1 do

3: Broadcast the current global model M^(j) to each client

4: for each client C_i ∈ C do

5: Initialize local model M_i for the current round ← M^(j)

6: $M_{i}^{(j)}$ ← Optimize M_i by minimizing loss function

7: end for

8: Aggregate local model parameters to get the new global model $M_{g}^{(j)} = \sum_{i = 1}^{L} M_{i}^{(j)} \cdot \frac{S_{i}}{S_{g}}$

9: end for

Repeat the above steps in a federated task until all the communication rounds are finished. In the next section, we will evaluate the performance of FedCGAT for demand prediction of shared bicycles through specific experiments.
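The round structure of Algorithm 1 and the weighted aggregation in Equations (15) and (16) can be sketched in plain Python as follows. This is an illustrative sketch only, not the implementation used in this paper: the model is reduced to a flat list of floats, and `local_train`, `run_federated`, and the one-parameter linear model y = w · x are hypothetical stand-ins for the CGAT local training step.

```python
def fedavg(client_params, client_sizes):
    """Equation (15): average client parameters weighted by S_i / S_g."""
    s_g = sum(client_sizes)  # Equation (16)
    dim = len(client_params[0])
    return [
        sum(p[k] * s / s_g for p, s in zip(client_params, client_sizes))
        for k in range(dim)
    ]


def local_train(params, data, lr=0.1, epochs=10):
    """Toy local update (assumption): fit y = w * x by gradient descent."""
    w = params[0]
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return [w]


def run_federated(client_data, rounds=30, init=(0.0,)):
    """Distribute, train locally, upload, aggregate; repeated for each round."""
    global_params = list(init)              # M^(0)
    sizes = [len(d) for d in client_data]   # S_i for each client
    for _ in range(rounds):
        # Each client starts from the current global model and trains on D_i only.
        local = [local_train(list(global_params), d) for d in client_data]
        # Only parameters reach the server; the raw data never leaves the client.
        global_params = fedavg(local, sizes)
    return global_params
```

For instance, `fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])` weights the second client three times as heavily as the first, because it holds three times as many samples.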

5 Experiments

We use two real-world datasets and conduct a three-part experiment in this section. We first validate the performance of the CGAT algorithm and compare it with eight baselines. Then, we design different scenarios in a federated environment and verify the performance of the FedCGAT algorithm under different data distributions. Finally, we verify the performance of each module of the CGAT and FedCGAT algorithms through ablation experiments.

5.1 System configuration

5.1.1 Dataset description

The experiments are conducted based on two real-world bicycle-sharing trip datasets. The details are described below.

BikeNYC14: this dataset is taken from the 2014 New York City Bicycle System [13] for the period 1 April to 30 September and contains more than 6,800 records of shared bike trips.

BikeNYC16: this dataset is taken from the Citi Bike System in New York [38] from 1 July 2016 to 29 August 2016 and contains more than 2,600,000 records of shared bike trips.

Bike share trip data includes trip duration, start station IDs, end station IDs, and time information. The detailed statistics of the dataset are summarized in Table 1.

Table 1
Dataset description

Datasets	BikeNYC14	BikeNYC16
Time interval	60 min	30 min
Time range	4/1/2014-9/30/2014	7/1/2016-8/29/2016
Graph nodes	128	200
Adjacency matrix	(16,8)	(10,20)
Number of orders	6.8k+	2.6m+

5.1.2 Experimental environment

The experiments are conducted on a workstation with an Intel(R) Core(TM) i7-10700 CPU @ 3.80 GHz, 64 GB of RAM, and a GeForce RTX 3090 GPU. The algorithms are implemented with PyTorch 1.10.0 and CUDA 11.3, and we have also developed a federated learning simulation program for the federated learning experiments.

5.1.3 Metrics

The performance of each algorithm is verified using two metrics, Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE), defined in Equations (17) and (18).

$MAE = \frac{1}{N} \sum_{i = 1}^{N} \left| Y_{i} - \hat{Y}_{i} \right|$ (17)

$MAPE = \frac{1}{N} \sum_{i = 1}^{N} \left| \frac{Y_{i} - \hat{Y}_{i}}{Y_{i}} \right| \times 100\%$ (18)

where Y_i and $\hat{Y}_{i}$ denote the real and predicted bicycle demand of the i-th node, respectively, and N is the number of samples. Smaller values of both metrics indicate better prediction performance.
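As a minimal sketch of Equations (17) and (18), the two metrics can be computed in plain Python as shown below. The function names are illustrative. The handling of zero true demand is an assumption: Equation (18) is undefined when Y_i = 0, the text does not state how such samples are treated, and this sketch simply raises an error.

```python
def mae(y_true, y_pred):
    """Equation (17): mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Equation (18): mean absolute percentage error, in percent."""
    if any(y == 0 for y in y_true):
        # Assumption: zero-demand samples are rejected rather than masked.
        raise ValueError("MAPE is undefined when a true value is zero")
    total = sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)
```

With true demands `[10, 20]` and predictions `[15, 10]`, the absolute errors are 5 and 10, so the MAE is 7.5; both relative errors are 0.5, so the MAPE is 50%.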

5.2 Demand forecasting for bike sharing based on CGAT

5.2.1 Experimental configuration

In this section, we utilize CGAT for shared bike demand prediction. The experiments employ spatio-temporal convolutional kernels of size 3 for ST-Blocks. The datasets are split into training, validation, and testing sets in an 8:1:1 ratio. During training, we use the Adam optimizer for 100 epochs with a batch size of 64 and a learning rate of 0.001. Similar experimental configurations are adopted for the baseline algorithms, which will be detailed in the following section.
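The configuration above can be collected in one place, as in the following sketch. The `TrainConfig` class and `split_dataset` function are hypothetical names. The contiguous, order-preserving split is an assumption, since the text gives only the 8:1:1 ratio and not how samples are assigned to each subset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    kernel_size: int = 3          # spatio-temporal kernel size in ST-Blocks
    epochs: int = 100
    batch_size: int = 64
    learning_rate: float = 1e-3   # used with the Adam optimizer
    split: tuple = (8, 1, 1)      # training : validation : testing


def split_dataset(samples, ratios=(8, 1, 1)):
    """Split samples into contiguous training, validation, and testing parts."""
    total = sum(ratios)
    n = len(samples)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]  # the remainder goes to the testing set
    return train, val, test
```

For 1,000 samples, this yields 800 training, 100 validation, and 100 testing samples.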

5.2.2 Baselines

The CGAT algorithm is compared to the following eight baseline methods:

ARIMA [6]: It is a classical temporal prediction model.

ST-ResNet [13]: It uses convolution-based and deep residual networks to model spatial and temporal attributes, respectively.

STGCN [16]: It is a neural network structure consisting of spatio-temporal blocks, applying pure convolutional layers to extract spatio-temporal information from graph-structured data simultaneously.

ASTGCN [19]: It designs a multi-temporal multiplexing network structure that integrates attention mechanisms with spatio-temporal convolution.

STSGCN [20]: It constructs local spatio-temporal graphs that connect individual spatial graphs of neighboring time steps into one graph to extract spatiotemporal correlations.

MSSTN [21]: It proposes a multi-scale spatio-temporal network for trajectory prediction using complex local and non-local correlations in the trajectory flow.

SDGCN [22]: It uses a state-sharing Hidden Markov Model to capture traffic flow patterns from sparse trajectory data, and proposes a semantic-aware dynamic graph convolutional network for traffic prediction.

ST-CGCN [23]: It constructs a distance matrix, a data correlation matrix, and a comfort measurement matrix, and proposes a method based on spatio-temporal complex graph convolutional networks.

These baselines can be classified into two categories according to the methods used. ARIMA only models the temporal dimension, while all other baseline methods model both the temporal and spatial dimensions.

5.3 Result analysis

Following the experimental configuration, the CGAT algorithm is validated on two datasets and compared with eight baseline models, and the experimental results are shown in Table 2.

Table 2
Performance comparison of CGAT with different baselines

Datasets	Method	MAE	MAPE(%)
BikeNYC14	ARIMA	11.36	35.21
	ST-ResNet	5.91	28.62
	STGCN	5.54	27.81
	ASTGCN	6.12	26.47
	STSGCN	5.95	27.35
	MSSTN	5.77	26.28
	SDGCN	5.49	25.82
	ST-CGCN	5.75	25.87
	CGAT(ours)	5.34	24.15
BikeNYC16	ARIMA	9.05	28.34
	ST-ResNet	5.61	29.84
	STGCN	4.94	26.51
	ASTGCN	5.32	24.97
	STSGCN	5.01	25.86
	MSSTN	5.12	24.42
	SDGCN	4.91	22.87
	ST-CGCN	4.98	23.19
	CGAT(ours)	4.78	20.93

From the experimental results, ARIMA, which relies only on temporal features, is the least effective overall and has the highest MAE on both datasets. ST-ResNet considers spatial features on a grid structure, but it lags behind most of the GCN-based methods. The GCN-based methods generally achieve better prediction results, indicating that GCN has an advantage in capturing spatial correlation. Among the GCN-based baseline methods, SDGCN achieves the best prediction performance, probably because its dynamic perceptual network focuses on more spatial information.

Compared with the baseline algorithms, our proposed CGAT method achieves better prediction results. Unlike other methods, CGAT reduces the effect of noise through graph data augmentation, and makes the graph network focus on deeper spatio-temporal features through contrastive learning and the attention mechanism. In the experiments based on the BikeNYC14 dataset, CGAT achieves MAE and MAPE values of 5.34 and 24.15%, respectively, which are reduced by 0.15 and 1.67% relative to SDGCN.

In the experiment based on the BikeNYC16 dataset, the MAE and MAPE metrics are reduced by 0.13 and 1.94%, respectively. This shows that CGAT is able to better capture the data features in the bike-sharing prediction scenarios and achieve accurate predictions.
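The reported reductions are absolute differences between the SDGCN and CGAT rows of Table 2. The following sketch recomputes them and, for comparison, also derives the relative MAPE reduction, which is not reported in the text. The `TABLE2` dictionary and the `reductions` function are illustrative names; the values are copied from Table 2.

```python
# (MAE, MAPE %) values copied from Table 2
TABLE2 = {
    "BikeNYC14": {"SDGCN": (5.49, 25.82), "CGAT": (5.34, 24.15)},
    "BikeNYC16": {"SDGCN": (4.91, 22.87), "CGAT": (4.78, 20.93)},
}


def reductions(dataset):
    """Return the reductions of CGAT relative to SDGCN on one dataset."""
    base_mae, base_mape = TABLE2[dataset]["SDGCN"]
    mae, mape = TABLE2[dataset]["CGAT"]
    return {
        "mae_abs": round(base_mae - mae, 2),
        # Absolute difference in percentage points, as quoted in the text.
        "mape_abs": round(base_mape - mape, 2),
        # Relative reduction as a percentage of the SDGCN value (derived here).
        "mape_rel": round(100 * (base_mape - mape) / base_mape, 2),
    }
```

On BikeNYC14, for example, the absolute MAPE reduction of 1.67 corresponds to a relative reduction of about 6.47% of the SDGCN value.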

5.4 Multi-organization joint bike-sharing demand forecast based on FedCGAT

5.4.1 Experimental configuration

In this section, we employ the FedCGAT framework for collaborative shared bike demand prediction across multiple organizations. The number of clients participating in federated learning is denoted by K, with three different settings (K = 4, 6, 8). For each set of experiments, we set the number of global communication rounds to 30, considering each completion of model aggregation as the end of one round. During each global communication round, each client conducts local training for 10 epochs using the same training configuration as in Section 5.2.1. The experiments in this section are conducted using the BikeNYC16 dataset, where the training set is proportionally partitioned among the clients, and the Mean Absolute Percentage Error (MAPE) is used to evaluate the experimental results.

5.4.2 Baselines

To assess the efficacy of FedCGAT in multi-organization collaborative demand forecasting, we choose two federated learning-based predictive algorithms as baseline models.

FedGRU [40]: It is a gated recurrent unit neural network algorithm based on federated learning for traffic flow prediction.

FedTDP [41]: It combines federated learning with GCN methods for vehicle demand forecasting.

5.4.3 Data distribution scenario description

In federated learning, the manner in which data is partitioned can significantly impact the performance of the global model. To simulate realistic multi-organization collaborative training scenarios, we establish different data distribution setups for experiments. As shown in Table 3, for each client number setting, we set up three data distribution scenarios, including an independent and identically distributed (IID) scenario and two non-independent identically distributed (Non-IID) scenarios. In the IID scenario, we distribute data evenly to each client. In non-IID scenarios, data distribution is relatively uneven among clients. For example, when the number of clients is 4, we set the data distribution Scenario-1 to 6:2:1:1 and Scenario-2 to 4:3:2:1.

Table 3
Data distribution with different number of clients

Client numbers	Scenario-1	Scenario-2
K= 4	6:2:1:1	4:3:2:1
K= 6	3:3:1:1:1:1	2:2:2:2:1:1
K= 8	3:1:1:1:1:1:1:1	2:1.5:1.5:1:1:1:1:1
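The ratios in Table 3 can be turned into per-client sample counts as in the following sketch. The function name `partition_sizes` and the cumulative-rounding scheme are assumptions made for illustration; the text specifies only the ratios, not how the simulation program assigns individual samples to clients.

```python
def partition_sizes(n_samples, ratios):
    """Split n_samples across clients in proportion to the given ratios."""
    total = sum(ratios)
    bounds = [0]
    acc = 0.0
    for r in ratios:
        acc += r
        # Rounding cumulative boundaries keeps the sizes summing to n_samples.
        bounds.append(round(n_samples * acc / total))
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]
```

For 1,000 training samples and K = 4, Scenario-1 (6:2:1:1) gives the four clients 600, 200, 100, and 100 samples, while the IID scenario corresponds to equal ratios such as 1:1:1:1.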

5.4.4 Results and discussion

The experimental results are presented in Table 4 and Fig. 3. It is evident that across experiments with varying client number configurations, the performance under IID distribution consistently outperforms that of the two Non-IID scenarios. This performance is attributed to the relatively favorable performance of each client’s local model under IID distribution, thereby resulting in high accuracy of the aggregated global model. In Non-IID distribution, Scenario-2 exhibits improved performance to some extent compared to Scenario-1, indicating that the uniformity of data distribution impacts the effectiveness of the global model.

Fig. 3

Comparison of model performance under different data distribution scenarios.

Unlike the baseline algorithms, FedCGAT empowers each client to effectively focus on the deep spatio-temporal characteristics of their respective data, thereby ensuring prediction accuracy. As illustrated in Fig. 3, our FedCGAT consistently outperforms the two baseline algorithms across various client number settings, with reductions in the MAPE metric ranging from 2.2% to 8.0%. Specifically, for K = 4, FedCGAT achieves a MAPE value of 22.94%, surpassing the majority of baseline algorithms described in Section 5.3. This demonstrates that our approach can achieve high prediction accuracy in multi-organization collaborative demand forecasting while preserving user privacy.

As shown in Table 4, the effectiveness of all algorithms diminishes as the number of clients increases. In the experiment with K = 8 under the IID distribution, for instance, the MAPE value of FedGRU reaches 32.18%, failing to meet prediction requirements, whereas our FedCGAT achieves 25.41%. This decline in effectiveness can be attributed to the diminishing amount of valid data available for each client to train a valuable model as the number of clients increases. Additionally, the likelihood of errors during model uploading also escalates. In future work, we will investigate how to mitigate the impact of these issues on the experimental results.

Table 4
Comparison of FedCGAT performance in different scenarios (MAPE, %)

Client numbers	Method	Scenario-1	Scenario-2	IID
K= 4	FedGRU	29.81	29.31	28.72
	FedTDP	26.02	25.81	25.13
	FedCGAT	23.87	23.21	22.94
K= 6	FedGRU	32.45	31.21	30.55
	FedTDP	27.59	27.50	26.88
	FedCGAT	25.25	24.76	24.08
K= 8	FedGRU	34.05	32.86	32.18
	FedTDP	29.39	28.74	28.25
	FedCGAT	26.05	25.79	25.41

5.5 Ablation analysis

5.5.1 Effect of different modules on the performance of CGAT

In the ablation experiments, we perform experiments on the CGAT algorithm and the FedCGAT framework separately, and we verify the impact of each module in the algorithm by eliminating the key modules. The CGAT algorithm consists of two main modules: the prediction module based on the graph-contrast attentional mechanism (V1) and the graph-data enhancement module (V2). We eliminate these two modules separately to perform ablation experiments. At the same time, we also eliminate these two modules in the FedCGAT framework separately to perform the experiments. A detailed description of each version of the algorithm is given below.

FedCGAT: A demand forecasting method for shared bikes based on federated spatio-temporal graphical attention networks in Section 4.2.

FedCGAT-V1: This variant removes the temporal attention mechanism module of the ST-Block and experiments with it in a federated scenario.

FedCGAT-V2: This variant removes the graph augmentation approach for bike-sharing data and experiments in a federated scenario.

CGAT: The plain CGAT model without federated learning is described in Section 4.1.

CGAT-V1: This variant removes the temporal attention mechanism module of ST-Block and uses all data in one client experiment.

CGAT-V2: This variant removes the graph augmentation approach for the bike-sharing data and uses all the data in a single client experiment.

The experimental results are shown in Table 5. On both datasets, FedCGAT and CGAT consistently outperform their respective variants. In addition, the metrics of CGAT are always better than those of FedCGAT. The non-independent and identically distributed (Non-IID) nature of the data across clients decreases global model accuracy, so the global model of FedCGAT is less accurate than the model trained on the raw data. Despite this accuracy degradation, our proposed framework does not require sharing raw data among the participants, which is essential for each participant. A comparison of the MAPE metrics shows that the V2 variant is generally worse than the V1 variant, which means that the graph data augmentation module (removed in V2) has a greater impact on model performance. In summary, the designed sub-modules positively impact performance and are effective for modeling in federated scenarios.

Table 5
Comparison of ablation tests

Dataset	Variant	MAE	MAPE(%)
BikeNYC14	FedCGAT	5.56	25.78
	FedCGAT-V1	5.80	26.75
	FedCGAT-V2	5.87	27.72
	CGAT	5.34	24.15
	CGAT-V1	5.42	25.91
	CGAT-V2	5.59	26.13
BikeNYC16	FedCGAT	4.96	22.94
	FedCGAT-V1	5.19	25.07
	FedCGAT-V2	5.28	25.76
	CGAT	4.78	20.93
	CGAT-V1	4.95	22.52
	CGAT-V2	5.10	23.23

5.5.2 Model calculation quantity analysis

In this section, we use three metrics to evaluate the computational cost of the model [39]: the number of model parameters (Params), the number of floating-point operations (FLOPs), and the time required for each training epoch.

Params refers to the number of parameters that need to be trained, which measures the size of the model and reflects its space complexity. FLOPs denotes the number of floating-point operations required by the model, which measures the computational complexity of an algorithm. The smaller the value, the lower the computational complexity. In addition, the time required for each training epoch also reflects the resources required for model training. By comparing the three metrics, the computational cost and practicality of the model can be evaluated.
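To illustrate how Params and FLOPs differ, the following sketch counts both for a single fully connected layer. It is not how the values in Table 6 were obtained; the function names are hypothetical, and counting one multiplication and one addition per weight is one common convention (some tools count a multiply-accumulate as a single operation).

```python
def dense_params(n_in, n_out, bias=True):
    """Trainable parameters of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)


def dense_flops(n_in, n_out):
    """Floating-point operations for one input sample (assumed convention)."""
    # One multiplication and one addition per weight.
    return 2 * n_in * n_out


def to_millions(count):
    """Express a raw count in millions (M), the unit used in Table 6."""
    return count / 1e6
```

A layer mapping 128 inputs to 64 outputs has 8,256 parameters and, under this convention, needs 16,384 operations per sample. Params depends only on the layer shapes, whereas FLOPs also grows with the size of the input being processed.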

Table 6 shows the computational cost of CGAT and its variants, compared with SDGCN, the best-performing baseline algorithm. The computational cost of the CGAT variants is lower than that of CGAT, because removing a module also removes some computation steps from the model. Compared with SDGCN, the FLOPs value of CGAT is reduced by 94.21M, indicating a lower computational cost. Meanwhile, combined with the model prediction results shown in Fig. 4, the MAPE value of CGAT is reduced by 1.94% compared with SDGCN, indicating that our model achieves better prediction results while occupying fewer resources.

Table 6
Comparison of model complexity

Variant	Params (M)	FLOPs (M)	Training time (s/epoch)
CGAT	1.841	429.251	6.82
CGAT-V1	1.466	384.568	6.23
CGAT-V2	1.195	310.284	5.89
SDGCN	1.968	523.461	8.36

Fig. 4

Training time and MAPE value comparison.

6 Conclusions

In this paper, we propose the FedCGAT framework, which enables collaborative data analysis across multiple organizations while protecting data privacy. At the same time, the framework can also allow the organizations in each region to realize more accurate demand predictions. To address the problem of shared bicycle demand prediction in a single organization, we propose a contrastive graph attention (CGAT) method. We first extract spatio-temporal features of the bicycle data using spatio-temporal convolution and graph attention mechanisms. Then, we design a graph data augmentation method to eliminate noisy data and further capture spatial correlations. Finally, we use an auxiliary task based on contrastive learning to help model training, which can fully learn spatio-temporal features. We conduct experiments on two real datasets and simulate different data distributions in the federated scenario. The experiment shows that the FedCGAT framework can maintain good prediction performance while protecting data privacy.

In our future work, we will optimize the performance of the FedCGAT algorithm in federated scenarios involving a large number of clients. Concurrently, we will investigate methods to ensure the performance of demand prediction models in the presence of missing traffic data. Additionally, since the issue of data integrity in distributed scenarios has been scarcely addressed in existing research, we will strive for innovation in this regard.

Acknowledgment

This research was supported by the National Natural Science Foundation of China (No. 62072469).

References

1. Muthuramalingam, Bharathi, Rakesh Kumar, Gayathri, Sathiyaraj and Balamurugan, IoT based intelligent transportation system (IoT-ITS) for global perspective: A case study, Internet of Things and Big Data Analytics for Smart Generation (2019), 279–300.
2. Chen, Lin, He, Polat, Alhudhaif and Alenezi, Consistency- and dependence-guided knowledge distillation for object detection in remote sensing images, Expert Systems with Applications 229 (2023), 120519.
3. Litman and Burwell, Issues in sustainable transportation, International Journal of Global Environmental Issues 6(4) (2006), 331–347.
4. Zhang, Lin, Pan and Xu, Crftl: cache reallocation-based page-level flash translation layer for smartphones, IEEE Transactions on Consumer Electronics (2023).
5. Liu and Guan, A summary of traffic flow forecasting methods, Journal of Highway and Transportation Research and Development 21(3) (2004), 82–85.
6. Williams B.M. and Hoel L.A., Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results, Journal of Transportation Engineering 129(6) (2003), 664–672.
7. Kipf T.N. and Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907 (2016).
8. Lin, Jiang, Fan and Wang, A stacking model for variation prediction of public bicycle traffic flow, Journal of Intelligent & Fuzzy Systems 22(3) (2018), 911–933.
9. […], Han, Qi, Du, Lin and Shu, A hybrid machine learning model for demand prediction of edge-computing-based bike-sharing system using Internet of Things, IEEE Internet of Things Journal 7(8) (2020), 7345–7356.
10. Chen P.C., Hsieh H.Y., Su K.W., Sigalingging X.K., Chen Y.R. and Leu J.S., Predicting station level demand in a bikesharing system using recurrent neural networks, IET Intelligent Transport Systems 14(6) (2020), 554–561.
11. Wang and Kim, Short-term prediction for bike-sharing service using machine learning, Transportation Research Procedia 34 (2018), 171–178.
12. Jiang, de Souza E.N., Pesaranghader, Hu, Silver D.L. and Matwin, Trajectorynet: An embedded gps trajectory representation for point-based classification using recurrent neural networks, arXiv preprint arXiv:1705.02636 (2017).
13. Zhang, Zheng and Qi, Deep spatio-temporal residual networks for citywide crowd flows prediction, Proceedings of the AAAI Conference on Artificial Intelligence 31(1) (2017).
14. Yao, Wu, Ke, Tang, Jia, Lu, Gong, Ye and Li, Deep multi-view spatial-temporal network for taxi demand prediction, Proceedings of the AAAI Conference on Artificial Intelligence 32(1) (2018).
15. […], Xu, Chen, Wang, Zhang and Shi, Short-term forecast of bicycle usage in bike sharing systems: a spatial-temporal memory network, IEEE Transactions on Intelligent Transportation Systems 23(8) (2021), 10923–10934.
16. […], Yin and Zhu, Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting, arXiv preprint arXiv:1709.04875 (2017).
17. Zhao, Song, Zhang, Liu, Wang, Lin, Deng and Li, T-gcn: A temporal graph convolutional network for traffic prediction, IEEE Transactions on Intelligent Transportation Systems 21(9) (2019), 3848–3858.
18. […], Yu, Shahabi and Liu, Diffusion convolutional recurrent neural network: Data-driven traffic forecasting, arXiv preprint arXiv:1707.01926 (2017).
19. Guo, Lin, Feng, Song and Wan, Attention based spatial-temporal graph convolutional networks for traffic flow forecasting, Proceedings of the AAAI Conference on Artificial Intelligence 33(01) (2019), 922–929.
20. Song, Lin, Guo and Wan, Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting, Proceedings of the AAAI Conference on Artificial Intelligence 34(01) (2020), 914–921.
21. Song, Bai, Fan, Deng and Jiang, MSSTN: a multi-scale spatio-temporal network for traffic flow prediction, International Journal of Machine Learning and Cybernetics (2024), 1–15.
22. Liang, Kintak, Ning, Tiwari, Nowaczyk and Kumar, Semantics-aware dynamic graph convolutional network for traffic flow forecasting, IEEE Transactions on Vehicular Technology (2023).
23. Bao, Huang, Shen, Cao, Ding, Shi and Shi, Spatial-temporal complex graph convolution network for traffic flow prediction, Engineering Applications of Artificial Intelligence 121 (2023), 106044.
24. Zhong, Lin and He, Dynamic multi-scale topological representation for enhancing network intrusion detection, Computers and Security 135 (2023), 103516.
25. Zhou, Mo, Xiao, Chen and Yin, Privacy-preserving transportation traffic measurement in intelligent cyber-physical road systems, IEEE Transactions on Vehicular Technology 65(5) (2015), 3749–3759.
26. Sucasas, Mantas, Saghezchi F.B., Radwan and Rodriguez, An autonomous privacy-preserving authentication scheme for intelligent transportation systems, Computers and Security 60 (2016), 193–205.
27. Ogundoyin S.O., An anonymous and privacy-preserving scheme for efficient traffic movement analysis in intelligent transportation system, Security and Privacy 1(6) (2018), e50.
28. McMahan, Moore, Ramage, Hampson and y Arcas B.A., Communication-efficient learning of deep networks from decentralized data, Artificial Intelligence and Statistics. PMLR (2017), 1273–1282.
29. Chen, Lin, Liu, Yang, Zhang and Xu, NT-DPTC: a non-negative temporal dimension preserved tensor completion model for missing traffic data imputation, Information Sciences 653 (2024), 119797.
30. […], Lin, Luo and Xu, HRST-LR: A Hessian Regularization Spatio-Temporal Low Rank Algorithm for Traffic Data Imputation, IEEE Transactions on Intelligent Transportation Systems (2023).
31. Chen, Zhang, Xu, Zeng, Lu, Zhao, Chen and Wang, A Federated Parallel Data Platform for Trustworthy AI, 2021 IEEE 1st International Conference on Digital Twins and Parallel Intelligence (2021), 344–347.
32. Zhang, Zhang, James J.Q. and Yu, FASTGNN: A topological information protected federated learning approach for traffic speed forecasting, IEEE Transactions on Industrial Informatics 17(12) (2021), 8464–8474.
33. Chen, Zhao, Tao, Wang, Qiao, Zeng and Tan C.W., A Credible and Fair Federated Learning Framework Based on Blockchain, IEEE Transactions on Artificial Intelligence 1 (2024), 1–15.
34. Yuan, Chen, Yang, Zhang, Yang, Han and Taherkordi, Fedstn: Graph representation driven federated learning for edge computing enabled urban traffic flow prediction, IEEE Transactions on Intelligent Transportation Systems (2022).
35. Chen, Zhang, Dong, Qiao, Huang, Wang, Nie, Hou and Tan, FedDRL: A Trustworthy Federated Learning Model Fusion Method Based on Staged Reinforcement Learning, Computing and Informatics 43(1) (2024), 1–37.
36. Chen, Zhang, Dong, Zhao, Zeng, Qiao, Zhu and Tan C.W., FedTKD: A Trustworthy Heterogeneous Federated Learning Based on Adaptive Knowledge Distillation, Entropy 26(1) (2024), 96.
37. […], Wang, Huang, Wu, Xu, Wu, Zhang and Zheng, Spatio-temporal self-supervised learning for traffic flow prediction, Proceedings of the AAAI Conference on Artificial Intelligence 37(4) (2023), 4356–4364.
38. Yao, Tang, Wei, Zheng and Li, Revisiting spatial-temporal similarity: a deep learning framework for traffic prediction, Proceedings of the AAAI Conference on Artificial Intelligence 33(01) (2019), 5668–5675.
39. Ramachandran, Parmar, Vaswani, Bello, Levskaya and Shlens, Stand-alone self-attention in vision models, Conference and Workshop on Neural Information Processing Systems 32 (2019), 1–13.
40. Liu, James J.Q., Kang, Niyato and Zhang, Privacy-preserving traffic flow prediction: A federated learning approach, IEEE Internet of Things Journal 7(8) (2020), 7751–7763.
41. […], Ye, Hu, Liu, Cao, Yang, Shen, Angeloudis, Parada and Wu, A Federated Learning-Based Framework for Ride-Sourcing Traffic Demand Prediction, IEEE Transactions on Vehicular Technology (2023).
