Abstract
With the development of the Intelligent Transportation System (ITS), distributed sensors of many kinds (including GPS, radar, and infrared sensors) must process massive data and make decisions in emergencies. Federated learning is a new distributed machine learning paradigm in which system heterogeneity makes fairness difficult to design for. This paper designs a system heterogeneous fair federated learning algorithm (SHFF). SHFF introduces the equipment influence factor
Introduction
The Intelligent Transportation System (ITS) [1, 2] involves many types of equipment (such as global positioning systems, radar, and infrared sensors) and massive amounts of private data, which raises network delay, energy consumption, and data security problems in the distributed system. Federated learning (FL) [3, 4] is a new distributed machine learning paradigm that performs collaborative learning without sharing data, thereby protecting users’ privacy. FL opens a new way to analyze data in intelligent transportation systems.
Recently, a growing number of researchers have studied FL algorithms with respect to accuracy, fairness, and convergence time. Fairness means that the performance on each piece of equipment should be as close as possible while the average accuracy remains essentially unchanged, so that the global model fits all devices as well as possible. Balakrishnan et al. [5] developed an FL optimization algorithm based on importance sampling and a ranking algorithm. Their experiments show that, compared with FedAvg [6] and FedProx [7], the model significantly shortens the overall training time without losing performance. Liu et al. [8] proposed an FL-based imitation learning framework for cloud robot systems with heterogeneous sensor data. Their research shows that FL can improve the efficiency and accuracy of robot imitation learning by drawing on other robots’ knowledge. Yu et al. [9] proposed a revenue-sharing scheme for FL motivators (FLI), which dynamically allocates data owners’ budgets through context-aware methods. To ensure fairness between users and robustness against malicious adversaries, Hu et al. [10] formulated federated learning as a multi-objective optimization problem and proposed a new algorithm, FedMGDA+, which is guaranteed to converge to a Pareto stationary solution. Li et al. [11] proposed a scalable method, q-FedAvg, to address FL fairness under statistical heterogeneity.
The above algorithms address the fairness of federated learning only under statistical heterogeneity and do not consider fairness under system heterogeneity. Li et al. [12] pointed out that FL equipment differs in hardware, network connectivity, and battery power (i.e., system heterogeneity). As a result, the equipment contributes unequally to the optimization objective, which risks a global model that is difficult to fit to all equipment. Methods that handle system heterogeneity in a principled way have received little attention. This paper therefore addresses these problems.
In this paper, we propose a system heterogeneous fair federated learning algorithm (SHFF). The algorithm introduces the equipment influence factor
Problem definition
System model
Sensor-based vehicle type recognition [13, 14] is one of the key technologies of ITS. In this paper, the SHFF algorithm addresses the fair distribution of vehicle recognition accuracy across equipment. The basic architecture of the system is shown in Fig. 1, including a server (S) and
Symbol description
Federated learning trains on the data generated on local equipment and communicates with the server regularly to achieve global optimization. System heterogeneity among equipment (i.e., differences in the number of samples, training rounds per unit time, accuracy of local training, etc.) leads to differences between local equipment models. If only the q-FFL optimization objective is minimized, without considering system heterogeneity [12], the global model struggles to fit all equipment. Therefore, we design the equipment influence factor
The equipment influence factor
During the training process, the equipment uploads the influence factor
Equipment classification
System model.
SHFF improves fairness between equipment by dynamically applying influence factors to equipment with different intelligence levels. The lower the intelligence of a piece of equipment, the smaller its influence on the global model in the optimization goal. A larger equipment influence factor therefore raises the share of that equipment in the global optimization objective. The influence factors change continually during training, which adjusts the objective dynamically and increases the influence of low-intelligence equipment. Provided that the accuracy of higher-intelligence equipment and the global average accuracy do not fluctuate much, the algorithm improves the accuracy of lower-intelligence equipment, reduces the variance between equipment, and thereby enhances fairness.
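As a rough illustration of the reweighting idea described above, the following sketch evaluates a q-FFL-style objective [11] in which each device's loss term is additionally scaled by an influence factor. The function name, the `influence` argument, and the multiplicative placement of the factor are assumptions made for illustration only; they are not taken from the paper's own objective, and the numbers are made up.

```python
def weighted_q_objective(losses, sample_weights, influence, q):
    """q-FFL-style objective with a hypothetical per-device influence factor.

    losses:         local losses F_k(w), one per device (non-negative)
    sample_weights: p_k, e.g. the fraction of all samples held by device k
    influence:      assumed multiplicative influence factor per device
    q:              fairness parameter (q = 0 gives a plain weighted average)
    """
    total = 0.0
    for F_k, p_k, a_k in zip(losses, sample_weights, influence):
        total += a_k * p_k * F_k ** (q + 1) / (q + 1)
    return total

# Two devices; the second has the higher loss (lower "intelligence").
losses = [0.2, 0.9]
p = [0.5, 0.5]
uniform = weighted_q_objective(losses, p, [1.0, 1.0], q=1)
boosted = weighted_q_objective(losses, p, [1.0, 2.0], q=1)
```

Doubling the factor of the high-loss device raises its share of the objective, so a minimizer is pushed harder toward reducing that device's loss.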
Equation (2) gives the optimization objective, where
In this paper, a larger equipment influence factor raises the weight of low-intelligence equipment in the optimization target. Low-intelligence equipment thus dominates the global training model, which strengthens its local accuracy so that the model fits all the equipment. The definition of
System heterogeneity parameters (such as local training accuracy, sample number, and local training rounds per unit time) are negative indicators: the larger their value, the smaller the influence factor should be. Taking the accuracy of local training as an example, the higher the local training accuracy, the higher the intelligence of the equipment and the smaller the influence factor it requires. The main parameters (
Similarly, the number of samples and the number of local training rounds per unit time are negative indicators.
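One common way to turn negative indicators into a score is inverse min-max normalization, sketched below. This is not the paper's definition of the influence factor; the function names, the equal-weight averaging of the three indicators, and the example values are all assumptions chosen only to show the "larger indicator, smaller factor" behaviour.

```python
def inverse_minmax(values):
    """Map a negative indicator to [0, 1]: the smallest value scores 1, the largest 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All devices are equal on this indicator, so none is favoured.
        return [1.0] * len(values)
    return [(hi - v) / (hi - lo) for v in values]

def influence_scores(indicators):
    """Average the inverse min-max scores of several negative indicators.

    indicators: dict mapping an indicator name to a list of per-device values.
    """
    per_indicator = [inverse_minmax(v) for v in indicators.values()]
    n_devices = len(per_indicator[0])
    return [
        sum(scores[i] for scores in per_indicator) / len(per_indicator)
        for i in range(n_devices)
    ]

# Made-up values for three devices.
scores = influence_scores({
    "local_accuracy": [0.9, 0.6, 0.75],
    "num_samples": [1200, 400, 800],
    "local_rounds": [10, 7, 10],
})
```

Here the second device, which is lowest on every indicator, receives the largest score, and the first device, which is highest on every indicator, receives the smallest.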
We use
SHFF algorithm
The SHFF algorithm first initializes the number of equipment, iterations, samples, the initial accuracy, and other parameters. We then propose an initial equipment influence factor
The flowchart of the SHFF algorithm is shown in Fig. 2.
The flowchart of the SHFF algorithm.
The global fairness parameters
Training a series of target families with different
The step size, which is selected by grid search, is inversely proportional to the Lipschitz constant of the function's gradient. However, with the change of
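For context, the sketch below shows one server aggregation step of q-FedAvg as we understand it from Li et al. [11], where the Lipschitz constant is approximated by the inverse of the step size tuned for q = 0. It does not include the SHFF influence factor. The function name, the plain-list model representation, and the small loss floor are assumptions for illustration.

```python
def q_fedavg_step(w, local_models, losses, q, lr):
    """One q-FedAvg server update (sketch, without any influence factor).

    w:            current global model as a list of floats
    local_models: models returned by the selected devices after local training
    losses:       local losses F_k(w) evaluated at the current global model
    q:            fairness parameter
    lr:           step size tuned for q = 0; L = 1 / lr estimates the Lipschitz constant
    """
    L = 1.0 / lr
    numerator = [0.0] * len(w)
    denominator = 0.0
    for w_k, F_k in zip(local_models, losses):
        F_k = max(F_k, 1e-10)  # assumed floor so F_k ** (q - 1) stays finite
        delta_w = [L * (a - b) for a, b in zip(w, w_k)]
        sq_norm = sum(d * d for d in delta_w)
        for i, d in enumerate(delta_w):
            numerator[i] += (F_k ** q) * d
        denominator += q * (F_k ** (q - 1)) * sq_norm + L * (F_k ** q)
    return [a - n / denominator for a, n in zip(w, numerator)]

w0 = [1.0, 1.0]
models = [[0.0, 1.0], [1.0, 0.0]]
plain = q_fedavg_step(w0, models, losses=[0.5, 0.5], q=0, lr=0.1)
fair = q_fedavg_step(w0, models, losses=[0.1, 1.0], q=1, lr=0.1)
```

With q = 0 the update reduces to a plain average of the local models. With q > 0 the device with the larger loss pulls the global model further toward its own local model, and the effective step size shrinks automatically instead of being re-tuned for each q.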
Data set introduction
The data set used in this experiment is the vehicle data set from the DARPA/IXOs SensIT project [23]. The data set consists of acoustic, seismic, and infrared sensor data collected by a distributed network of 23 sensors. Each sensor is modeled as equipment with
Isomorphic system
Figure 3 compares the accuracy distributions of the SHFF, FedAvg (uniform sampling), and q-FedAvg algorithms when no system heterogeneity is applied. The horizontal axis shows the accuracy, and the vertical axis shows the number of equipment that achieve that accuracy. The accuracy distribution of SHFF is more concentrated than those of FedAvg and q-FedAvg, which indicates that SHFF yields a fairer accuracy distribution than the other two algorithms.
Table 3 reports the average accuracy, the accuracy of the Worst 10% of equipment, the accuracy of the Best 10% of equipment, and the variance for SHFF, FedAvg, and q-FedAvg over 5 random shuffles of the data set, without applying system heterogeneity. The global final average accuracy and the average accuracy of the Best 10% of equipment fluctuate little, while the average accuracy of the Worst 10% of equipment and the variance improve significantly relative to FedAvg and q-FedAvg. Compared with q-FedAvg, SHFF improves the average accuracy on the Worst 10% of equipment by 4.7% and reduces the variance by 35%.
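The four statistics reported in the table can be computed from per-equipment test accuracies roughly as follows. This is a minimal sketch: the function name, the rounding rule for the size of the 10% tail, and the use of the population variance are assumptions, and the accuracies in the example are made up rather than taken from the experiments.

```python
import statistics

def fairness_summary(accuracies, tail=0.10):
    """Summarise per-equipment accuracies with the four fairness statistics.

    accuracies: one test accuracy per piece of equipment
    tail:       fraction of equipment counted as "Worst" and "Best"
    """
    acc = sorted(accuracies)
    k = max(1, round(len(acc) * tail))  # at least one device in each tail
    return {
        "average": statistics.fmean(acc),
        "worst": statistics.fmean(acc[:k]),
        "best": statistics.fmean(acc[-k:]),
        "variance": statistics.pvariance(acc),
    }

# Made-up accuracies for ten devices.
summary = fairness_summary([0.50, 0.60, 0.70, 0.80, 0.90,
                            0.55, 0.65, 0.75, 0.85, 0.95])
```

A fairer algorithm in the sense used here keeps `average` and `best` roughly unchanged while raising `worst` and lowering `variance`.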
SHFF, FedAvg, and q-FedAvg data statistics (isomorphic system)
Data statistics under four heterogeneous systems (epo_7–epo_10)
Accuracy distribution of SHFF and FedAvg (isomorphic system).
Figure 4 compares the fairness distributions of the SHFF, FedAvg, and q-FedAvg algorithms when system heterogeneity is applied. In this paper, system heterogeneity is introduced by varying the number of local training rounds. Figure 4a–d shows the fairness distributions for local training rounds per unit time from epo_7 to epo_10. In each case, the accuracy distribution of SHFF is more concentrated than those of FedAvg and q-FedAvg.
Accuracy distribution of SHFF, FedAvg, and q-FedAvg (heterogeneous system).
Table 4 reports the average accuracy, the accuracy of the Worst 10% of equipment, the accuracy of the Best 10% of equipment, and the variance for SHFF, FedAvg, and q-FedAvg over 5 random shuffles of the data set when system heterogeneity is applied. The accuracy of the Best 10% of equipment and the global final average accuracy fluctuate little. The accuracy of the Worst 10% of equipment improves significantly, and the variance between equipment drops dramatically. As the heterogeneity of the system gradually increases, the improvement in the accuracy of the Worst 10% of equipment and the reduction in variance become more pronounced. When the local training round is set to epo_10, SHFF improves the average accuracy of the Worst 10% of equipment by 45% and reduces the variance by 76% compared with FedAvg. Compared with q-FedAvg, it improves the average accuracy of the Worst 10% of equipment by 26% and reduces the variance by 61%.
The intelligent transportation system must process the massive data it generates and make timely decisions in emergencies. Federated learning is a new distributed machine learning paradigm, but designing for fairness under system heterogeneity is difficult. In this paper, we design a system heterogeneous fair federated learning algorithm (SHFF) to improve fairness in federated learning. The experimental results show that SHFF better promotes fairness between heterogeneous equipment. In the future, we will focus on differences in network resources in heterogeneous systems, such as optimizing the network transmission performance of federated learning.
Acknowledgments
The authors acknowledge the support of the project Research on Key Technologies of Intelligent Manufacturing Management Based on Digital Twin Technology, a key R&D project in the industrial field of Jilin Province (20200401076GX) (2020.1.1-2022.12.31).
