Abstract
With the rapid development of social networks and the widespread adoption of intelligent mobile terminals, network anomaly detection is becoming increasingly important. Edge nodes store large amounts of local network connection data and audit data, which can be used to analyze abnormal network behavior. As network communication becomes more pervasive, the volume of connection and related data collected by each network terminal keeps growing, and machine learning has become a common way to classify the features of such big data. To address the problems of excessive data volume and long response times in network anomaly detection, we propose a trust-based federated learning anomaly detection algorithm, TFCNN. Edge nodes train models on their local data and upload the learned parameters to a central node. Based on the training performance of each edge node, we assign different weights that match the processing capacity of each terminal, which yields faster convergence and better attack classification accuracy. Users' private information is processed only locally and is never uploaded to the central server, which reduces the risk of information disclosure. Finally, we compare the basic federated learning model with the TFCNN algorithm on the KDD Cup 99 and MNIST datasets. The experimental results show that TFCNN improves both accuracy and communication efficiency.
Introduction
With the rapid development of network technology and applications, user data from shopping websites, social networks, and real-time communications has grown exponentially. Many network data mining applications rely on advanced machine learning to achieve high-performance data analysis and processing. However, traditional machine learning methods are usually deployed on a central server or in the cloud, where the collected data are trained uniformly and centrally. This approach consumes a large amount of network bandwidth, and when the central server is overloaded, data confidentiality may be compromised during transmission. Given growing data volumes, limited network bandwidth, and data privacy concerns, sending all data to a remote cloud is both impractical and unnecessary. Moreover, uploading user data to a central server carries potentially serious risks. Processing decentralized big data while protecting user privacy has therefore become a hot research direction in data mining [13].
Owing to its decentralized nature, federated learning can make fuller use of devices in the Internet of Things (IoT). The storage capacity and computing power of IoT terminal devices have improved greatly, so some machine learning training tasks can be offloaded to network terminals, allowing multiple edge nodes and the central cloud to cooperate on big data analysis tasks. In 2016, Google proposed a federated data modeling scheme for Android phone users [1]. This scheme enables each Android user to upload the relevant parameters computed on the local phone to the Android cloud, and Google built a federated model over users who share the same data feature dimensions. Multi-task learning based on federated learning reduces the communication overhead of the original distributed multi-task learning and is more robust. In [4, 8], users' private data are partitioned across different clients to reduce the risk of user information being leaked by the central server; the central server updates the global training model by aggregating the gradients of the local nodes.
Machine learning models generally improve as the amount and diversity of training data increase. Most machine learning algorithms are designed for classification, which can be used to separate different kinds of network behavior. Machine learning algorithms also improve their performance by learning from data, which suits today's diverse and complex network environments, and introducing them into network intrusion detection systems makes those systems more intelligent and efficient. However, in many fields, especially medicine, the law does not allow personal data to be shared. Combining machine learning with a federated learning structure therefore helps protect private data while improving the accuracy of the detection model.
This paper uses the limited computing and communication resources at the edge to achieve effective model convergence and detect anomalies quickly. We adopt a typical edge computing architecture in which the edge nodes connect to the remote central cloud through gateways and routers. Each node stores its collected network data and audit logs locally; instead of uploading raw data to the central node, the nodes train a shared machine learning model using federated learning. Each edge node trains the model locally and independently, and the central server optimizes and merges the global model. Because the data owned by any single edge device is limited, a participant that trains alone easily falls into a local optimum. Using the models learned by other participants to refine the local model parameters helps each participant escape local optima and explore other parameter values, producing a more accurate model.
This paper proposes an adaptive aggregation algorithm that ties the central node's aggregation to the trust weight of each edge node in order to use resources most effectively. The algorithm builds on gradient descent-based joint (federated) learning, which applies generally to a variety of machine learning models. Its learning process includes a local update step, in which each edge node performs gradient descent to adjust its (local) model parameters and minimize the loss function defined on its own dataset. It also includes a global aggregation step, in which the central node aggregates the model parameters uploaded by the edge nodes and returns the aggregated values to each node to start the next iteration [9]. As the number of edge terminals grows, communication between nodes and fast convergence of the overall model become increasingly important. We use edge node weights to reduce the number of iterations and improve global model performance.
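The two steps can be written compactly as below. The notation is our own assumption rather than the exact formulation of [9]: N edge nodes, local datasets D_i, a per-sample loss \ell, learning rate \eta, and the data-size-weighted average used by basic federated learning. A trust-based scheme would replace the coefficients |D_i|/|D| with per-node trust weights that sum to one.

```latex
\begin{aligned}
F_i(w) &= \frac{1}{|D_i|} \sum_{(x, y) \in D_i} \ell(w; x, y)
  && \text{(local loss on edge node } i\text{)} \\
w_i(t) &= w(t-1) - \eta \, \nabla F_i\big(w(t-1)\big)
  && \text{(local update)} \\
w(t) &= \sum_{i=1}^{N} \frac{|D_i|}{|D|} \, w_i(t), \qquad |D| = \sum_{i=1}^{N} |D_i|
  && \text{(global aggregation)}
\end{aligned}
```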
The next section reviews related work that provides the theoretical foundations of our approach. In Section 3, we propose the trust-based federated learning model. The section after that evaluates our algorithm on the KDD Cup 99 and MNIST datasets.
Related work
Network anomaly detection is an important means of identifying network anomalies and determining whether a system has been compromised or has become part of a botnet. Existing methods fall into four main types: classification-based, statistics-based, clustering-based, and information theory-based methods.
There have been a number of anomaly detection methods based on distance measures. Ref. [11] proposed a simple sampling-based distance anomaly detection algorithm that outperforms state-of-the-art techniques in both efficiency and effectiveness. Zhang [15] defined a novel "Local Distance-based Outlier Factor" (LDOF) to measure the outlier-ness of objects in scattered datasets. Pang [10] introduced the concept of Least Similar Nearest Neighbours (LeSiNN) and used it to detect anomalies directly. However, traditional distance-based anomaly measures have a major limitation on high-dimensional data due to the curse of dimensionality. Deep distance-based anomaly detection techniques project data onto a low-dimensional space before applying distance measures, so they can largely overcome this limitation. Li [6] proposed a traffic anomaly detection model based on Multi-Head Attention (MHA) that takes into account the inherent correlations of traffic generated by industrial control systems (ICSs).
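To make the distance-based idea concrete, the sketch below scores each point by its average distance to its nearest neighbour in several random subsamples. It is loosely in the spirit of the sampling-based method [11] and LeSiNN [10], but it is a simplified illustration of our own, not either published algorithm; the function names, subsample size, and number of rounds are assumptions.

```python
import math
import random
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sampled_nn_scores(data: List[Sequence[float]], sample_size: int,
                      n_rounds: int = 10, seed: int = 0) -> List[float]:
    """Score each point by its mean nearest-neighbour distance over random subsamples.

    In every round a random subsample of `sample_size` points is drawn; a point's
    score for that round is its distance to the closest *other* point in the
    subsample. Averaging over rounds gives isolated points large scores.
    """
    if not 2 <= sample_size <= len(data):
        raise ValueError("sample_size must be between 2 and len(data)")
    rng = random.Random(seed)
    scores = [0.0] * len(data)
    for _ in range(n_rounds):
        sample = rng.sample(range(len(data)), sample_size)
        for i, point in enumerate(data):
            scores[i] += min(euclidean(point, data[j]) for j in sample if j != i)
    return [s / n_rounds for s in scores]
```

For example, ten points spread along [0, 0.9] plus one point at 10.0 give the distant point by far the highest score. Subsampling keeps the cost per round proportional to the subsample size rather than to the full dataset.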
Federated learning is a machine learning framework that enables effective data use and machine learning modeling while protecting user privacy and data security. It uses multiple computing nodes to collaboratively train the same model while keeping data secure and personal data private. Federated learning includes two processes: model training and model inference. During model training, model-related information can be exchanged between the edge and the center, but the raw data never enter the communication, which protects the privacy of local data. In addition, federated learning can be combined with a variety of machine learning algorithms to address the lack of labeled data in intrusion detection, greatly improving the efficiency of model training and the accuracy of model recognition [16]. Ref. [3] focused on OS-ELM (Online Sequential Extreme Learning Machine) and combined it with an autoencoder for anomaly detection. Li [5] proposed CreditCoin, a novel blockchain-based privacy-preserving incentive announcement network built on an efficient anonymous vehicular announcement aggregation protocol. Wang [14] analyzed the risk of individual permissions and of groups of collaborative permissions in the Android security mechanism. Liu [7] proposed a communication-efficient on-device federated learning (FL)-based deep anomaly detection framework for sensing time-series data. Ref. [12] transmitted only gradients above a threshold to improve communication efficiency, but a uniform threshold is difficult to set in practical systems. Dryden [2] used a fixed ratio of positive and negative gradients to select the gradients to aggregate, improving efficiency. However, these algorithms are difficult to apply in real systems.
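Threshold-based gradient transmission, the idea attributed to [12] above, can be sketched as follows. This is an illustration under assumptions, not the exact method of [12]: entries below the threshold are kept in a local residual and added to the next round's gradient, and the function and parameter names are hypothetical. The single fixed `threshold` is exactly the parameter the text notes is hard to choose in practice.

```python
from typing import Dict, List, Tuple


def sparsify_with_residual(grad: List[float], residual: List[float],
                           threshold: float) -> Tuple[Dict[int, float], List[float]]:
    """Transmit only gradient entries whose accumulated magnitude reaches `threshold`.

    The local gradient is first added to the residual left over from earlier
    rounds. Entries at or above the threshold are sent (as index -> value) and
    cleared from the residual; smaller entries stay in the residual so they can
    be sent once they have grown large enough.
    """
    acc = [g + r for g, r in zip(grad, residual)]
    sent = {i: v for i, v in enumerate(acc) if abs(v) >= threshold}
    new_residual = [0.0 if i in sent else v for i, v in enumerate(acc)]
    return sent, new_residual
```

Only the dictionary `sent` would cross the network, so communication shrinks with the fraction of entries that pass the threshold.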
Hence, we take node weights into account and reward nodes that trained well in the previous round, which both speeds up convergence and improves accuracy.
In this paper, we combine the advantages of distance measures and federated learning to improve anomaly detection efficiency.
Trust-based federated learning model
Edge node data processing model
We use a mature neural network architecture to classify network features locally. For local processing at the edge nodes, we build a convolutional neural network (CNN) model that extracts the most effective feature representation from the raw data. The network consists of convolutional, pooling, and fully connected layers: two convolutional and pooling layers are stacked alternately and followed by fully connected layers, as shown in Fig. 1.

TFCNN framework.
In the convolutional layer, the network connection and system audit data are transformed into feature maps so that data features can be extracted efficiently. The weights and bias values of the convolutional layer are initialized randomly. The convolution is computed with a sliding window according to formula (1).
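Assuming formula (1) is the standard sliding-window convolution, a single-channel version with one kernel can be sketched as below. The sketch assumes no padding and, like most deep learning frameworks, does not flip the kernel (so it is technically cross-correlation); the function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(x: Matrix, kernel: Matrix, bias: float = 0.0, stride: int = 1) -> Matrix:
    """Valid (unpadded) 2-D convolution of one input channel with one kernel.

    Each output cell is the sum of element-wise products between the kernel and
    the window of `x` it currently covers, plus the bias; the window then slides
    by `stride` positions.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(x) - kh) // stride + 1
    out_w = (len(x[0]) - kw) // stride + 1
    out: Matrix = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            r0, c0 = r * stride, c * stride  # top-left corner of the current window
            acc = bias
            for i in range(kh):
                for j in range(kw):
                    acc += x[r0 + i][c0 + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out
```

For instance, a 3 × 3 input with the 2 × 2 kernel [[1, 0], [0, 1]] and bias 1 yields a 2 × 2 feature map, where each cell adds two diagonal input values and the bias.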
In the pooling layer, downsampling is performed to reduce the size of the feature maps; otherwise, too many feature map parameters would hinder the extraction of higher-level features. Downsampling is generally computed in one of two ways: max pooling or average pooling.
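Both pooling variants can be sketched with one function. The 2 × 2 window, stride 2, and absence of padding are assumptions chosen for illustration, and the names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Reduction applied to each pooling window.
REDUCERS: Dict[str, Callable[[Sequence[float]], float]] = {"max": max, "avg": _mean}


def pool2d(x: Matrix, size: int = 2, stride: int = 2, mode: str = "max") -> Matrix:
    """Downsample a feature map by max or average pooling over size x size windows (no padding)."""
    reduce = REDUCERS[mode]  # raises KeyError for an unsupported mode
    out_h = (len(x) - size) // stride + 1
    out_w = (len(x[0]) - size) // stride + 1
    return [
        [reduce([x[r * stride + i][c * stride + j] for i in range(size) for j in range(size)])
         for c in range(out_w)]
        for r in range(out_h)
    ]
```

With a 2 × 2 window and stride 2, a 4 × 4 feature map shrinks to 2 × 2, a fourfold reduction in the number of values passed to the next layer.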
In the fully connected layer, the data features extracted by the preceding convolutional and pooling layers are classified. This layer is similar to a hidden layer in a feed-forward neural network: the features are no longer treated according to their spatial topology but are flattened into a vector and passed through an activation function.
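The flatten, dense, and activation steps can be sketched as follows. The text does not specify the activation or output layer, so ReLU for the hidden activation and softmax for the class probabilities are assumptions, as are the function names.

```python
import math
from typing import List


def flatten(feature_maps: List[List[List[float]]]) -> List[float]:
    """Unroll a stack of 2-D feature maps (channel, row, col) into one vector."""
    return [v for fm in feature_maps for row in fm for v in row]


def relu(v: List[float]) -> List[float]:
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in v]


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: y_k = sum_j W[k][j] * x[j] + b[k], one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z: List[float]) -> List[float]:
    """Convert class scores to probabilities (shifted by the max for numerical stability)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```

A typical forward pass would be `softmax(dense(relu(flatten(maps)), W, b))`, with the predicted class being the index of the largest probability.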
During network communication, the traffic and audit data received by nodes at different locations vary widely, and some nodes receive so little network data that anomalies are difficult to detect. A federated learning model under the edge computing framework can break down data silos and simultaneously process the heterogeneous network data received by different device nodes, achieving distributed joint learning. At the same time, if individual device nodes go offline, data from the other end devices still provide information that keeps the overall system running; prior work [7] indicates that a small amount of missing gradients does not affect the overall effectiveness of the system. Owing to the architecture of federated learning, the central node does not need to collect all the data from each terminal, which protects the privacy of terminal data.
In this paper, we propose a TFCNN model based on the federated learning framework. The edge nodes train CNN models on their local data and send the parameters of the local models to the central node, which aggregates them using each edge node's trust value and returns the corresponding learning rate to each edge node. Federated learning can thus be applied in edge computing architectures to improve communication efficiency and protect user privacy. The federated learning training process designed in this paper is shown in Fig. 2.

TFCNN flow chart.
1. Each terminal builds an initialized TFCNN model.
2. Each terminal trains the model locally.
3. Each terminal uploads the gradients from its local CNN training to the central node.
4. The central node aggregates the gradients using each edge node's weight.
5. The central node returns the learning parameters to each edge node.
6. Each edge node continues model training with the new learning parameters.
7. Steps 2–6 are repeated until the model converges.
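A toy simulation can make the round structure of these seven steps concrete. It is a sketch under simplifying assumptions, not the TFCNN implementation: each terminal holds a few (x, y) pairs for a one-parameter linear model instead of KDD records and a CNN, the central node applies the gradient step itself, and aggregation defaults to a plain mean. A weighted rule can be supplied through the hypothetical `aggregate` parameter.

```python
from typing import Callable, List, Sequence, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held by one terminal


def local_gradient(w: float, data: Dataset) -> float:
    """Gradient of the mean loss 0.5 * (w * x - y)**2 over one terminal's data."""
    return sum((w * x - y) * x for x, y in data) / len(data)


def mean_aggregate(grads: List[float]) -> float:
    """Basic federated learning: the global gradient is the plain mean of the uploads."""
    return sum(grads) / len(grads)


def run_federated(datasets: Sequence[Dataset], lr: float = 0.1, tol: float = 1e-8,
                  max_rounds: int = 1000,
                  aggregate: Callable[[List[float]], float] = mean_aggregate) -> Tuple[float, int]:
    """Simulate the round structure of steps 1-7 for a one-parameter model y = w * x."""
    w = 0.0  # step 1: every terminal starts from the same initialized model
    for rnd in range(1, max_rounds + 1):
        # steps 2-3: each terminal computes a gradient on its own data and uploads only that
        grads = [local_gradient(w, d) for d in datasets]
        g = aggregate(grads)       # step 4: the central node aggregates the gradients
        w_new = w - lr * g         # step 5: updated parameters are returned to the terminals
        if abs(w_new - w) < tol:   # step 7: stop once the model has converged
            return w_new, rnd
        w = w_new                  # step 6: terminals continue training from the new parameters
    return w, max_rounds
```

With three terminals whose data all follow y = 2x, the shared weight converges to about 2.0 even though no terminal ever uploads its raw (x, y) pairs.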
During communication between the edge nodes and the central node, traditional federated learning computes the learning parameters by averaging the gradients of all edge nodes and uses this mean gradient to train the overall model. This method provides no incentive for well-performing gradients, and when gradient variance is large, a single poorly performing edge node can reduce the overall training accuracy.
In this paper, an incentive mechanism is introduced into the parameter aggregation process of federated learning: edge nodes whose gradients perform well receive larger weights. The weight of each edge node's uploaded gradients is determined by its performance in the previous round of model training.
In the TFCNN algorithm, edge nodes collect local network and statistical data, train on them, and upload the resulting gradient parameters; the central node then aggregates these gradients using weights derived from each edge node's training performance in the previous round. The specific algorithm is shown in Table 1.
In the local training part, each edge node trains a local CNN model on the data it collects and sends the resulting gradient parameters to the central server. In the next round, the edge nodes download the global aggregated parameters from the previous round and continue training their local models.
In the global aggregation part, the central node calculates the trust value of each edge node and aggregates the local parameters using these node weights. It then sends the global parameters back to the edge nodes for local training.
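The trust-weighted aggregation step can be sketched as follows. The concrete trust formula is an assumption of this illustration: each node's weight is taken to be its share of a non-negative performance score from the previous round (for example, local validation accuracy). The function names are hypothetical.

```python
from typing import List, Sequence


def trust_weights(prev_scores: Sequence[float]) -> List[float]:
    """Hypothetical trust rule: weight each node by its share of last round's performance.

    `prev_scores` holds a non-negative performance score per edge node from the
    previous round. Better-performing nodes get proportionally larger weights,
    and the weights sum to one.
    """
    total = sum(prev_scores)
    if total <= 0:
        # No usable performance information: fall back to plain averaging.
        return [1.0 / len(prev_scores)] * len(prev_scores)
    return [s / total for s in prev_scores]


def weighted_aggregate(grads: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Element-wise weighted sum of the gradient vectors uploaded by the edge nodes."""
    dim = len(grads[0])
    return [sum(w * g[k] for w, g in zip(weights, grads)) for k in range(dim)]
```

Suppose the third of three nodes scored 0.2 in the previous round (the others 0.9) and now uploads a divergent gradient [-8.0, 10.0] while the others upload [1.0, 1.0]. Plain averaging gives [-2.0, 4.0], whereas the trust weights [0.45, 0.45, 0.1] give roughly [0.1, 1.9], which limits the damage a single poorly performing node can do.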
TFCNN algorithm
To verify the effectiveness of the TFCNN algorithm, we compare its anomaly detection accuracy and data processing efficiency with those of the federated averaging algorithm on the KDD Cup 99 and MNIST datasets.
Experimental data set and feature analysis
The KDD Cup 99 dataset was used in the 1999 data mining and knowledge discovery competition organized by ACM (Association for Computing Machinery) SIGKDD (Special Interest Group on Knowledge Discovery and Data Mining). It contains network connection and system audit data collected over a 9-week period from a simulated U.S. Air Force local area network built by Lincoln Laboratory, covering various user types, network traffic, and attack methods. Because the simulated environment is close to a real network environment, the dataset is widely used to evaluate intrusion detection algorithms. To make the evaluation more realistic, the test set contains some attack types that do not appear in the training set, which better validates an algorithm's detection performance.
Each network connection record in this dataset includes 41 fixed feature attributes and 1 label, as shown in Fig. 3. The records fall into five main categories, listed in Table 2: Normal, DOS, R2L, U2R, and PROBING. The network anomalies in this dataset fall into the following four types.
DOS (denial of service): denial-of-service attacks, such as back, land, neptune, pod, smurf, and teardrop.
R2L (unauthorized access from a remote machine to a local machine): unauthorized access from a remote host, e.g., ftp_write, guess_passwd, imap, multihop, phf, spy, warezclient, and warezmaster.
U2R (unauthorized access to local superuser privileges by a local unprivileged user): unauthorized escalation to local superuser privileges.
PROBING (surveillance and probing): port surveillance or scanning, e.g., ipsweep, nmap, portsweep, and satan.
In each record of this dataset, the first 41 items are network feature attributes and the last item is a label indicating whether the connection is normal. The first 9 features are basic TCP connection attributes, such as duration, protocol type, and number of bytes transferred. Items 10 to 22 are content features of TCP connections, such as the number of failed login attempts and the number of access control files. Unlike DoS attacks, U2R and R2L attacks are usually carried in the packet payload, and individual packets look similar to those of normal connections, so content features of TCP connections are needed to detect them. Items 23 to 31 are time-based traffic statistics, which reflect the relationship between the current connection and previous connections and capture temporal correlations in network intrusions. They are divided into "same host" and "same service" features, computed over connections to the same destination host and to the same service, respectively. Items 32 to 41 are host-based traffic statistics. Because the time-based features cover only connections in the past two seconds, they cannot reflect port scans that last longer than two seconds; the host-based features therefore compute statistics over the 100 connections preceding the current one, for example, the number of connections among the previous 100 that have the same destination host and the same service as the current connection.
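A small helper following this layout may be useful when preparing the data. It assumes the raw comma-separated file format, with the label as the 42nd field ending in a period (for example "normal." or "smurf."); the group names are our own labels for the four feature ranges described above.

```python
from typing import Dict, List, Tuple

# 1-based, inclusive feature index ranges as described in the text.
FEATURE_GROUPS: Dict[str, Tuple[int, int]] = {
    "basic": (1, 9),         # basic TCP connection attributes (duration, protocol type, bytes, ...)
    "content": (10, 22),     # content features (failed logins, access control files, ...)
    "time_based": (23, 31),  # "same host" / "same service" statistics over the past two seconds
    "host_based": (32, 41),  # statistics over the previous 100 connections
}


def split_record(line: str) -> Tuple[Dict[str, List[str]], str]:
    """Split one comma-separated KDD Cup 99 record into feature groups and its label."""
    fields = line.strip().split(",")
    if len(fields) != 42:
        raise ValueError(f"expected 42 fields, got {len(fields)}")
    groups = {name: fields[lo - 1:hi] for name, (lo, hi) in FEATURE_GROUPS.items()}
    label = fields[41].rstrip(".")  # drop the trailing period of the raw label
    return groups, label
```

The four groups together cover all 41 feature positions (9 + 13 + 9 + 10), so no feature is lost when a record is split.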

KDD CUP 99 data features.
Attack type
The features above include both discrete and continuous values, and different features differ greatly in magnitude, so features with small magnitudes would have little influence on the model. Therefore, we preprocess the dataset before the experiments.
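The text does not say which preprocessing was applied. One common choice, assumed here purely for illustration, is one-hot encoding for discrete features such as protocol_type and min-max normalization for continuous features, so that every feature lies on a comparable [0, 1] scale. The function names are hypothetical.

```python
from typing import List, Sequence


def one_hot(values: Sequence[str]) -> List[List[float]]:
    """Encode a discrete column (e.g. protocol_type) as one-hot vectors; categories are sorted."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [[1.0 if index[v] == i else 0.0 for i in range(len(categories))] for v in values]


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale a continuous column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```

In practice the category list and the min/max values would be computed on the training data and reused for the test data, so that both are transformed identically.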
We use the MNIST dataset to further test the feature classification performance of the TFCNN algorithm. The dataset comes from the National Institute of Standards and Technology (NIST). Its training set consists of digits handwritten by 250 different people, 50% of whom were high school students and 50% Census Bureau employees; the test set contains handwritten digits in the same proportions.
The experiments were run on a computer with 16 GB RAM and an Intel Core i7-9700 3.2 GHz processor under CentOS; the code was written in Python, and the framework was built on the PyTorch platform.
In this paper, the model is evaluated using accuracy (AC), which is calculated as shown in Equation (7).
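Assuming Equation (7) is the standard definition of accuracy (correct predictions divided by all predictions), it can be computed as below. The binary confusion-matrix form is the usual textbook version and is included only as an illustration; the function names are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def accuracy_from_confusion(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary form: AC = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)
```

The first form also covers the multi-class setting (Normal, DOS, R2L, U2R, PROBING, or the ten MNIST digits), since it only compares predicted and true labels.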
Experiment 1
This experiment compares the efficiency of a model trained on multiple edge nodes with that of a single-node model. It was conducted on the KDD Cup 99 dataset with a single node and with 3 edge nodes. The single-node model has 2 hidden layers: the first with 20 convolutional kernels of size 5 × 3 and the second with 50 convolutional kernels of size 5 × 5. All convolutional kernels have a stride of 1, each convolutional layer is followed by a pooling layer, and all pooling layers have a stride of 2.
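The feature-map sizes produced by this configuration follow from the usual output-size rule, floor((n - k) / s) + 1 per axis. The sketch below traces them through the two convolution and pooling stages. The pooling window size and the KDD input shape are not stated, so the 2 × 2 window, zero padding, and the hypothetical 28 × 28 input (the MNIST image size) are assumptions.

```python
from typing import Tuple


def out_size(n: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded convolution or pooling along one axis."""
    return (n - kernel) // stride + 1


def feature_shape(h: int, w: int, pool: int = 2) -> Tuple[int, int, int]:
    """Trace (channels, height, width) through the two conv + pool stages described above.

    Conv1: 20 kernels of 5 x 3, stride 1; Conv2: 50 kernels of 5 x 5, stride 1;
    each followed by pooling with stride 2 and an assumed `pool` x `pool` window.
    """
    h, w = out_size(h, 5, 1), out_size(w, 3, 1)        # conv1 -> 20 channels
    h, w = out_size(h, pool, 2), out_size(w, pool, 2)  # pool1
    h, w = out_size(h, 5, 1), out_size(w, 5, 1)        # conv2 -> 50 channels
    h, w = out_size(h, pool, 2), out_size(w, pool, 2)  # pool2
    return 50, h, w
```

For a 28 × 28 input this gives 50 feature maps of 4 × 4, i.e. 800 values entering the fully connected layers.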
We compare the performance of a CNN trained only on local data with that of a CNN under the federated learning architecture on datasets of different sizes. Fig. 4 shows that the detection accuracy of every algorithm decreases as the dataset shrinks. However, for a dataset of the same size, the federated learning model is more accurate than the CNN trained only on local data. Because the TFCNN algorithm takes the performance of each edge node into account, its accuracy is higher than that of the basic FL algorithm. In multi-class classification, some edge nodes lack certain types of data, so a model trained only on local data has a very low AC value or cannot be trained at all. The federated learning model overcomes this lack of data types through global parameter aggregation and achieves better classification. Therefore, the CNN model under the federated learning framework detects network intrusions better, and the TFCNN algorithm with its trust mechanism achieves better performance still.

The experiment with a single node.
Building on Experiment 1, this experiment examines the relationship between the number of edge nodes and the model training time. Experiment 2 used the same model parameters as Experiment 1 and was trained with 3, 5, 10, and 20 edge nodes. It focuses on whether increasing the number of edge nodes reduces the overall training time of the federated learning model. The dataset is divided equally among the nodes, and we compare the running time needed for the model to reach 99% accuracy. The results are shown in Fig. 5.
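An equal division of the dataset among n nodes can be sketched as contiguous shards whose sizes differ by at most one sample. Whether the data were shuffled before splitting is not stated; shuffling first, so that each node sees a mix of classes, is an assumption left to the caller, and the function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_equally(samples: Sequence[T], n_nodes: int) -> List[List[T]]:
    """Divide a dataset into n_nodes contiguous shards whose sizes differ by at most one."""
    base, extra = divmod(len(samples), n_nodes)
    shards: List[List[T]] = []
    start = 0
    for k in range(n_nodes):
        size = base + (1 if k < extra else 0)  # the first `extra` shards take one extra sample
        shards.append(list(samples[start:start + size]))
        start += size
    return shards
```

Calling it with 3, 5, 10, and 20 nodes reproduces the four partitionings used in this experiment, with each node's share shrinking as the node count grows.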
Fig. 5 shows that the training time needed to reach 99% accuracy gradually decreases as the number of nodes increases. However, the decrease is not linear: the rate at which training time falls diminishes as more nodes are added. This is because, as the number of edge nodes grows, the network environment becomes more complex, communication conditions worsen, and distributing and aggregating model parameters takes longer, which slows overall training.

Comparison of training time with different number of edge nodes.

Comparative experiment on MNIST dataset with 5 edge nodes.
This experiment compares the accuracy of the TFCNN algorithm with that of the federated averaging algorithm (Basic FL), in which the global model is updated by taking the mean of all node parameters. In Experiment 3, we run a classification test on the MNIST dataset and compare TFCNN and Basic FL with 5 and 10 edge nodes. Fig. 6 shows that TFCNN reaches 99% accuracy in fewer than 50 iterations, whereas Basic FL finally converges to 95% accuracy after 300 iterations. When the number of edge nodes increases to 10, the accuracy remains above 99%, as shown in Fig. 7. Moreover, TFCNN still maintains 98% accuracy with fewer than 50 training iterations, and already does so at 30 iterations.
Comparing the two experiments shows that the accuracy of Basic FL improves as the number of edge nodes increases, whereas TFCNN, which considers edge node weights, does not depend on the number of edge nodes. This indicates that weight calculation based on the trust mechanism can improve the accuracy of the federated learning model.

Comparative experiment on MNIST dataset with 10 edge nodes.
The rapid development of networks has increased the cost of mining massive web data. Because network data volume is growing exponentially, traditional centralized machine learning consumes large amounts of storage and computational resources. The popularity of smart terminals makes it possible to offload web data analysis to edge nodes and reduce the computing pressure on the central server. Response time and attack classification are important research topics in network anomaly detection.
To address these problems, this paper proposed a trust-based federated learning anomaly detection algorithm that weights edge nodes by their computed performance. In the federated learning structure, edge nodes train local data models and upload gradient parameters to the central node. When aggregating the global parameters, the central server takes into account each edge node's weight, derived from its model training performance in the previous round, which accelerates convergence, reduces the number of training iterations, and improves anomaly detection accuracy.
This paper analyzed the performance of TFCNN on the KDD Cup 99 and MNIST datasets. Experiments on datasets of different sizes show that the federated learning architecture achieves better detection accuracy than centralized machine learning when data are sparse. Compared with the Basic FL algorithm, TFCNN converges faster and achieves higher accuracy within the same number of iterations, which effectively speeds up model training and improves model accuracy.
We proposed a federated learning model based on a trust mechanism that can train a model on global data without sharing private data. However, because edge nodes differ in processing capacity, nodes with low capacity can prolong global model training. In future work, we will focus on asynchronous federated learning to alleviate this node imbalance, allowing each edge node participating in federated learning to make full use of its computing and storage capacity.
