An efficient intrusion detection method using federated transfer learning and support vector machine with privacy-preserving

Abstract

In recent decades, network security for organizations and individuals has become increasingly important, and intrusion detection systems play a key role in protecting it. To improve detection performance, various machine learning techniques have been widely applied and have achieved encouraging results. However, these methods produce reliable results only when enough well-labeled training data are available and the training and test data come from the same distribution. In practice, the limited labeled data generated by a single organization are not enough to train a reliable model, and the data collected by different organizations rarely follow the same distribution. In addition, organizations protect their privacy and data security by keeping their data in isolated data islands. Therefore, this paper proposes an efficient intrusion detection method using federated transfer learning and support vector machine with privacy-preserving (FETLSVMP). FETLSVMP aggregates the data distributed across organizations through federated learning, then uses transfer learning and support vector machines to build a personalized model for each organization. Specifically, FETLSVMP first builds a transfer support vector machine model to address the differences in data distribution among organizations; then, under the federated learning mechanism, the model is trained without sharing the training data of any organization, which protects data privacy; finally, the intrusion detection model is obtained. Experiments on NSL-KDD, KDD CUP99 and ISCX2012 show that the proposed method achieves better and more robust detection performance, especially for small samples and emerging intrusion behaviors, while protecting data privacy.

Keywords

Federated learning, transfer learning, support vector machine, intrusion detection

1. Introduction

The network has not only become the foundation of society and modern life, but also stores a large amount of data related to people’s private information and national security. Nowadays, computer networks and the Internet are fundamental components of our society, having made great contributions to the economy and changed people’s work and lifestyle [1]. Attacks on the network are increasing at an alarming rate. If the network is invaded or attacked, normal activities and national security are threatened. Therefore, network security has become increasingly important, and cybersecurity has drawn the attention of a growing number of people [2, 3, 4]. Researchers have proposed and implemented many measures to protect the network from intrusion and attack, such as firewalls, digital signatures and Intrusion Detection Systems (IDS) [5]. As an emerging security defense technology, IDS [6, 7] improves the reliability and security of a system by detecting and responding to various malicious behaviors, actively protects the network from illegal external attacks, and has become an important technical means of protecting cyberspace against network attacks and intrusions.

In recent years, with the rapid development of machine learning, deep learning and artificial intelligence, their application to intrusion detection has become a research hotspot in network security [8]. Li et al. [9] proposed an active transfer learning algorithm based on support vector machine (SVM) that combines the advantages of transfer learning and active learning; experiments show the effectiveness of the algorithm. Cheng et al. [10] proposed a basic Extreme Learning Machine (ELM) method based on random features and a kernel-based ELM classification method, which outperform SVM in classification, training and testing speed, and detection accuracy. Singh et al. [11] proposed an intrusion detection technique based on an online sequential extreme learning machine (OS-ELM), which uses alpha analysis to reduce time complexity and discards irrelevant features through feature selection based on filtering, correlation and consistency. Wang et al. [12] proposed a kernel-based extreme learning machine (KELM) with supervised learning capabilities to shorten the training cycle. Abdulla et al. [13] proposed a new ensemble construction method, which creates classifier ensembles with higher accuracy in intrusion detection.

Although these methods work well in intrusion detection and reduce the false alarm rate, they still face some problems: (1) the labeled intrusion detection data generated by a single organization are limited in volume and diversity, while sufficient, high-quality data are required because they directly affect the model [14]; (2) the data generated by each organization follow different distributions, but machine learning achieves good results only when the data are independent and identically distributed, so directly aggregating the data of different organizations leads to poor detection performance [15]; (3) in reality, data usually exist as isolated islands: although each organization holds a wealth of data, the data are stored independently and cannot be pooled centrally to collect and share user data, so machine learning models trained only on the independent data of individual departments cannot reach a global optimum [16]; (4) data privacy and security receive more and more attention: each organization has a strict privacy policy to protect its own data, and exchanging data without clear user approval is forbidden [17].

Recently, federated learning [18] has become one of the most promising directions in machine learning. Its purpose is to conduct collaborative training without sharing private data. It does not aggregate the training data for centralized computation; instead, it transmits encrypted gradient-related data and uses multi-source data to collaboratively train the same model [19]. It thus allows traditional machine learning models to achieve better training results while ensuring data security and privacy, with the advantages of distributed collaboration, good scalability, strong privacy protection and low cost. Federated learning has even been described as the last mile of artificial intelligence [17]. Since federated learning was proposed, related research has been carried out in areas such as edge computing [20, 21], wearable devices [22], privacy protection [24, 44], mobile keyboard prediction [25] and intrusion detection [26]. Another machine learning method, distributed machine learning (DML) [45, 46, 47], has much in common with federated learning; for example, both use decentralized data sets and distributed model training. Many researchers regard federated learning as a special form of DML [48, 49, 50], or as the next development of DML. However, compared with DML, federated learning has significant advantages in data decentralization, overcoming data islands and privacy protection. In this paper, building on these advantages, FETLSVMP, based on federated transfer learning and SVM, is proposed to address data islands, scarce labeled samples, data privacy protection and personalization in intrusion detection.
FETLSVMP uses federated learning [27] and homomorphic encryption [28] to build a powerful SVM model by aggregating data from independent institutions while protecting data privacy. It aggregates the data distributed across organizations through federated learning, and then uses transfer learning and SVM to build a personalized model suited to each organization: first, a transfer SVM model is constructed to address the differences in data distribution among organizations; then, under the federated learning mechanism, the model is trained without sharing the training data of any institution, which protects the data privacy of each institution; finally, an intrusion detection model is obtained.

Our contributions are highlighted as follows:

To the best of our knowledge, we are the first to apply federated transfer learning and SVM to intrusion detection, in the form of FETLSVMP. It aggregates intrusion detection data from different organizations without sacrificing data privacy and security, and at the same time, through knowledge transfer, obtains a strong learning model that captures the individual behaviors of each organization.

The experimental results show excellent performance: compared with traditional machine learning methods, FETLSVMP achieves more than 99% on the three classes with more samples (Normal, Probe and DoS) and more than 70% on the two classes with fewer samples (R2L and U2R), which is significantly better than the best benchmark algorithm. Therefore, FETLSVMP improves detection accuracy, especially for small samples and new intrusion behaviors, while also protecting privacy.

The rest of the paper is arranged as follows: Section 2 reviews the related works of federated transfer learning and SVM; in Section 3, an intrusion detection algorithm based on federated transfer learning and SVM is proposed; in Section 4, the effectiveness of the algorithm is verified on NSL-KDD, KDD CUP99 and ISCX2012; Section 5 summarizes the main work of this paper.

2. Related works

2.1 Federated transfer learning

Federated learning was first proposed by McMahan et al. [18] in 2016 and was used by Google to train machine learning models on mobile phones distributed around the world. Traditional machine learning algorithms require large amounts of high-quality data to be collected from various institutions and trained centrally on a cloud server. Federated learning instead allows each user to train the model on the local machine and upload the encrypted model to the server for aggregation; a global learning model is finally obtained through multiple iterations. This learning method protects the privacy of users and avoids the uncontrollable data flows and sensitive data leakage that data aggregation can cause. The process of federated learning is shown in Fig. 1.

Figure 1.

Process of Federated Learning.

It can be seen from Fig. 1 that the learning process of federated learning is as follows:

The organization downloads the global model $m_{t}$ from the central server;

Organization $k$ trains on its local data to obtain the local model $m_{t,k}$ (the local model updated in the $t$-th iteration by the $k$-th organization);

Each organization uploads the locally updated model to the central server;

The central server performs a weighted aggregation operation after receiving each model to obtain the global model $m_{t}$ (update the global model at the $t$ -th iteration).

In order to ensure data privacy, federated learning only allows all remote devices to exchange model gradients with a central server. In this process, each distributed device uses local data to train its own model, and then uploads the local model to the central server. After aggregating all the collected models, the server returns the new global model to each device.
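The four steps above can be sketched as a single training round. This is a minimal illustration, not the authors' implementation: the flat-list model representation, the function names `aggregate`, `federated_round` and `toy_train`, and the choice to weight each organization by its local sample count are all assumptions made for the example, and encryption of the uploaded models is omitted.

```python
from typing import Callable, List, Sequence

Model = List[float]  # assumption: a model is a flat list of parameters


def aggregate(local_models: Sequence[Model], weights: Sequence[float]) -> Model:
    """Step 4: weighted average of the uploaded local models."""
    total = float(sum(weights))
    dim = len(local_models[0])
    return [
        sum(w * m[d] for m, w in zip(local_models, weights)) / total
        for d in range(dim)
    ]


def federated_round(global_model: Model,
                    client_data: Sequence[list],
                    train_fn: Callable[[Model, list], Model]) -> Model:
    """One iteration t: download, local training, upload, aggregation."""
    local_models = []
    for data in client_data:
        downloaded = list(global_model)                   # step 1: copy of m_t
        local_models.append(train_fn(downloaded, data))   # steps 2-3: train, upload
    # Assumption: each organization is weighted by its number of local samples.
    sizes = [len(data) for data in client_data]
    return aggregate(local_models, sizes)


def toy_train(model: Model, data: list) -> Model:
    """Hypothetical local trainer: move the parameter halfway to the local mean."""
    mean = sum(data) / len(data)
    return [model[0] + 0.5 * (mean - model[0])]


# Two organizations holding 2 and 1 samples respectively.
new_global = federated_round([0.0], [[1.0, 1.0], [4.0]], toy_train)
print(new_global)
```

Only model parameters cross the organization boundary in this sketch; the raw samples in `client_data` are read solely inside `train_fn`, which stands in for training on the local machine.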

According to how samples and features are distributed across data sets, federated learning can be divided into three categories: horizontal federated learning, vertical federated learning, and federated transfer learning [27]. Horizontal federated learning suits situations where the features of the two data sets overlap a lot but the users overlap little; vertical federated learning suits situations where the features overlap very little but the users overlap a lot. Federated transfer learning [29] differs from both. It is used when both the users and the features of the two data sets rarely overlap. Instead of segmenting the data, it uses transfer learning [30] (the transfer of knowledge from an existing field to a related new field) to overcome the lack of data or labels, and is often used to handle different feature spaces and scarce labeled samples. Combining federated learning with transfer learning therefore yields a mechanism that both protects privacy and supports joint modeling. This mechanism has been well received in industry and is especially useful across companies or organizations in different domains. Recently, more and more researchers have begun to pay attention to this field [31, 32]. This paper exploits the advantages of federated transfer learning for intrusion detection and proposes a federated transfer learning algorithm designed specifically for it.

2.2 Support vector machine

SVM was formally published by Vapnik [33] in 1995 and is based on the VC dimension theory of statistical learning and the principle of structural risk minimization. The learning strategy of SVM is margin maximization, which can be formalized as a convex quadratic programming problem, so training an SVM amounts to solving a convex quadratic program. The basic SVM assumes that the training samples are linearly separable in the sample space or feature space; in practice, however, linearly separable data are rare. To relax this assumption, SVM is allowed to make errors on some samples by introducing a “soft margin”. Typical algorithms derived from SVM include [34, 35, 36, 37, 38].

Given training dataset $D=\{(x_{1},y_{1}),(x_{2},y_{2}),\ldots,(x_{n},y_{n})\}$ , $y_{i}\in\{-1,1\}$ . $n$ is the number of samples. The soft-margin optimization model of SVM is Eq. (1):

$\displaystyle\left\{\begin{array}{l}\min\limits_{w,b,\xi}\frac{1}{2}\|w\|^{2}+C\sum\limits_{i=1}^{n}\xi_{i}\\ \mathrm{s.t.}\ y_{i}(w^{T}x_{i}+b)\geqslant 1-\xi_{i},\ i=1,2,\ldots,n\\ \xi_{i}\geqslant 0,\ i=1,2,\ldots,n\end{array}\right.$ (1)

In Eq. (1), $\xi=\{\xi_{1},\xi_{2},\ldots,\xi_{n}\}$ are the “slack variables”, which measure how far each sample violates the margin of the optimal hyperplane; $w$ is the normal vector of the separating hyperplane; $C>0$ is a constant penalty factor, generally determined by the specific application: the larger its value, the greater the penalty for misclassification.
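To make the role of the slack variables concrete, the following sketch evaluates the objective of Eq. (1) for a fixed hyperplane. It relies on the standard observation that, for given $w$ and $b$, the smallest feasible slack is $\xi_{i}=\max(0,1-y_{i}(w^{T}x_{i}+b))$. The function names and the toy data are illustrative assumptions, not part of the original method.

```python
from typing import Sequence


def decision(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear score w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def slack(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    """Smallest xi >= 0 that satisfies y (w^T x + b) >= 1 - xi."""
    return max(0.0, 1.0 - y * decision(w, b, x))


def soft_margin_objective(w, b, X, Y, C: float) -> float:
    """Value of 0.5 * ||w||^2 + C * sum(xi_i) from Eq. (1)."""
    reg = 0.5 * sum(wi * wi for wi in w)
    return reg + C * sum(slack(w, b, x, y) for x, y in zip(X, Y))


# Toy data (hypothetical): the second sample lies inside the margin.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
Y = [1, 1, -1]
w, b = [1.0, 0.0], 0.0

slacks = [slack(w, b, x, y) for x, y in zip(X, Y)]
obj = soft_margin_objective(w, b, X, Y, C=10.0)
print(slacks, obj)
```

Only the sample inside the margin contributes a non-zero slack, and increasing `C` scales its contribution to the objective, which is the sense in which a larger penalty factor punishes violations more heavily.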

The soft margin SVM algorithm handles linearly inseparable data through the slack variables, but it requires the training and testing samples to follow the same distribution. Moreover, when the target domain has only a small number of training samples, the soft margin SVM alone is not sufficient to obtain an accurate model. In this situation, transferring knowledge from a similar domain with sufficient training samples to the target domain can not only accelerate the learning task in the target domain, but also alleviate the loss of accuracy caused by the lack of training samples. Special attention must be paid to negative transfer: once it occurs, the classifier obtained with the similar domain knowledge may perform worse than one trained without it. This is one of the problems that this paper focuses on.

3. The proposed FETLSVMP algorithm

3.1 Definition of problems

Given data $\{D_{1},\ldots,D_{K}\}$ from $K$ different organizations (users), the feature spaces of the data differ from each other. Let $M_{\textit{ALL}}$ denote a conventional machine learning model trained on the pooled data of all organizations. In this paper, the federated transfer learning model $M_{\textit{FED}}$ is trained collaboratively on all the data, while no organization discloses its data to the others. For convenience, a binary classification problem is considered, with class label set $Y=\{-1,1\}$; the multi-class problem can be handled by extension from the binary case. Let Acc denote accuracy; the goal of learning is then to ensure that the accuracy of the federated model $M_{\textit{FED}}$ is close to or better than that of the traditional model $M_{\textit{ALL}}$:

$\displaystyle\textit{Acc}_{\textit{FED}}-\textit{Acc}_{\textit{ALL}}>\psi$

where $\psi$ is a very small non-negative real number.

3.2 Framework of FETLSVMP

Figure 2.

Framework of FETLSVMP.

FETLSVMP aims to achieve accurate detection of malicious network intrusions without sacrificing the privacy and security of data. Without loss of generality, we assume that there are 3 organizations (users) and 1 server, and more organizations can be expanded according to actual conditions. Figure 2 gives an overview of the framework.

The framework mainly includes four procedures: first, the cloud model on the server is trained on the public data set; then, the cloud model is distributed to all organizations, and each organization trains its own model on its own data; subsequently, the model of each organization is uploaded to the cloud server, and a new cloud model is obtained through model aggregation; finally, each organization uses the cloud model and data together with its local data to build a personalized model. In the last step, because the data distribution on the server differs greatly from that of each organization, transfer learning is used to adapt to the distribution difference and obtain a model better suited to each organization (as shown in Fig. 3, organization $i$ model $m_{i}$ has become a new personalized model $m_{i}$ ). It is worth noting that all parameters are shared under homomorphic encryption, so the process does not involve any data leakage [28].

Figure 3.

Process of transfer knowledge.

It can be seen from Fig. 3 that domain knowledge plays an important role in the whole process. First, the cloud model SVM is trained on the knowledge of the source domain $D_{C}$; then the target model TLSVM is constructed by combining this SVM with the target domain $D_{i}$; finally, a new global model FETLSVMP is obtained. Part (1) of Section 3.3 describes the knowledge transfer process in more detail. Because the new model FETLSVMP is built on top of the cloud model SVM, it does not need to be trained from scratch, which shortens the training time and improves the training effect. The knowledge in domain $D_{C}$ is therefore very important for the target model.

The learning process of FETLSVMP involves model establishment and parameter sharing. After the cloud model is established, it can be applied directly to the organizations. In practice, however, the samples on the server and the data of the organizations follow very different probability distributions. Traditional intrusion detection models therefore fail at personalization, whereas transfer learning can adapt to the distribution differences and achieve it. In addition, because of the data privacy and security concerns of the institutions, their models cannot be easily and continuously updated.

3.3 Implementation of FETLSVMP

(1) Construction of transfer learning model

Federated learning solves the problem of data islands between institutions. The data of all organizations can therefore be used to build a cloud model, and each organization can then use the cloud model directly. However, because the data distributions of the organizations differ from that of the cloud data, the model does not perform well for specific users; that is, it cannot capture their personalized features. This paper uses transfer learning to build a personalized model for each user (organization), as shown in Fig. 3. In this way, starting from the acquired cloud model parameters, transfer learning is performed for each user to learn a personalized model.

The server acts as the source domain $D_{S}=\{(x_{s_{j}},y_{s_{j}}),j=1,\ldots,n_{s}\}$, where $n_{s}$ is the number of samples, $y_{s_{j}}$ is the class label of $x_{s_{j}}$, and the marginal and conditional distributions are $P_{S}(x_{s_{j}})$ and $P_{S}(y_{s_{j}}|x_{s_{j}})$. Similarly, $D_{T}=\{(x_{t_{j}},y_{t_{j}}),j=1,\ldots,n_{t}\}$ represents the target domain, where $n_{t}$ is the number of samples and the marginal and conditional distributions are $P_{T}(x_{t_{j}})$ and $P_{T}(y_{t_{j}}|x_{t_{j}})$. In transfer learning, the source and target domains are related but different.

According to the principles of structural risk minimization and domain negative similarity minimization, the objective function constructed based on SVM is shown in Eq. (2):

$\displaystyle\begin{array}{l}\min\limits_{w_{t},b_{t},\xi}\frac{1}{2}\|w_{t}\|^{2}+C_{t}\sum\limits_{j=1}^{n_{t}}\xi_{j}^{t}+\frac{\lambda}{2}\|w_{s_{1}}-w_{t}\|^{2}+\frac{1}{2}(b_{s_{1}}-b_{t})^{2}+n_{s}C_{s}v^{\prime}(f_{t},P_{S},P_{T})\\ \mathrm{s.t.}\ (w_{s_{1}}^{T}x_{t_{j}}+b_{s_{1}})(w_{t}^{T}x_{t_{j}}+b_{t})\geqslant 1-\xi_{j}^{t},\ \xi_{j}^{t}\geqslant 0,\ j=1,2,\ldots,n_{t}\end{array}$ (2)

In Eq. (2), $w_{t}$ and $b_{t}$ are the target classifier parameters to be solved for, $w_{s_{1}}$ and $b_{s_{1}}$ are the known classifier parameters of the source domain, $\xi_{j}^{t}$ is the slack variable, and $C_{t}$ and $C_{s}$ are the balance parameters of the target and source domains. $n_{t}$ and $n_{s}$ are the numbers of samples in the target and source domains, $v^{\prime}$ is the negative similarity measure function, $f_{t}$ is the target domain decision function, $\lambda$ is the trade-off parameter, and $P_{S}$ and $P_{T}$ are the distributions of the source and target domains, respectively. The first term $\left(\frac{1}{2}\|w_{t}\|^{2}+C_{t}\sum\limits_{j=1}^{n_{t}}\xi_{j}^{t}\right)$ represents the knowledge in the target domain; the second term $\left(\frac{\lambda}{2}\|w_{s_{1}}-w_{t}\|^{2}+\frac{1}{2}(b_{s_{1}}-b_{t})^{2}\right)$ represents the distance between the source and target classifiers, which is the knowledge that can be obtained from the source domain to some extent; the third term $(n_{s}C_{s}v^{\prime}(f_{t},P_{S},P_{T}))$ represents the similarity between the domains, as measured by the negative similarity function. The formal description of $v^{\prime}(f_{t},P_{S},P_{T})$ is shown in Eq. (3):

$\displaystyle v^{\prime}(f,P_{S},P_{T})=\frac{1}{n_{s}}\sum\limits_{i=n_{t}+1}^{n_{t}+n_{s}}I(y_{i}f(x_{i})<0)$ (3)

Because $v^{\prime}$ in Eq. (3) is non-convex and difficult to optimize, an easy-to-optimize surrogate loss function is used instead. When a source domain sample $x$ is misclassified, $yf(x)<0$; the greater the absolute value of $yf(x)$, the farther the sample is from the decision boundary and the greater the empirical risk. Accordingly, $v^{\prime}$ is replaced with the following function:

$\displaystyle V(f_{t},P_{S},P_{T})=\frac{1}{n_{s}}\sum\limits_{i=n_{t}+1}^{n_{t}+n_{s}}\max(0,-y_{i}f_{t}(x_{i}))$ (4)
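The difference between the indicator-based measure of Eq. (3) and its surrogate in Eq. (4) can be seen on a small example. The sketch below is illustrative only: the one-dimensional classifier `f_t` and the four labeled source samples are hypothetical values chosen for the demonstration.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, int]  # (x, y) with y in {-1, +1}


def zero_one_measure(f: Callable[[float], float], source: List[Sample]) -> float:
    """Eq. (3): fraction of source samples that f misclassifies (non-convex in f)."""
    return sum(1 for x, y in source if y * f(x) < 0) / len(source)


def surrogate_measure(f: Callable[[float], float], source: List[Sample]) -> float:
    """Eq. (4): mean of max(0, -y f(x)), which grows with the distance of a
    misclassified sample from the decision boundary."""
    return sum(max(0.0, -y * f(x)) for x, y in source) / len(source)


def f_t(x: float) -> float:
    """Hypothetical one-dimensional target classifier."""
    return 2.0 * x - 1.0


# Hypothetical source samples: three of the four are misclassified by f_t.
source = [(2.0, 1), (0.0, 1), (-1.0, 1), (1.0, -1)]

v_prime = zero_one_measure(f_t, source)
v_surrogate = surrogate_measure(f_t, source)
print(v_prime, v_surrogate)
```

The indicator counts each of the three errors equally, while the surrogate charges the sample at $x=-1$ three times as much as the other two because it lies farther on the wrong side of the boundary.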

Thus, Eq. (2) can be transformed into Eq. (5):

$\displaystyle\begin{array}{l}\min\limits_{w_{t},b_{t},\xi}\frac{1}{2}\|w_{t}\|^{2}+C_{t}\sum\limits_{j=1}^{n_{t}}\xi_{j}^{t}+\frac{\lambda}{2}\|w_{s_{1}}-w_{t}\|^{2}+\frac{1}{2}(b_{s_{1}}-b_{t})^{2}+C_{s}\sum\limits_{i=n_{t}+1}^{n_{t}+n_{s}}\max(0,-y_{i}f_{t}(x_{i}))\\ \mathrm{s.t.}\ (w_{s_{1}}^{T}x_{t_{j}}+b_{s_{1}})(w_{t}^{T}x_{t_{j}}+b_{t})\geqslant 1-\xi_{j}^{t},\ \xi_{j}^{t}\geqslant 0,\ j=1,2,\ldots,n_{t}\end{array}$ (5)

Introducing a slack variable $\xi_{i}^{s}$ for each source domain sample further transforms Eq. (5) into Eq. (6):

$\displaystyle\begin{array}{l}\min\limits_{w_{t},b_{t},\xi}\frac{1}{2}\|w_{t}\|^{2}+C_{t}\sum\limits_{j=1}^{n_{t}}\xi_{j}^{t}+\frac{\lambda}{2}\|w_{s_{1}}-w_{t}\|^{2}+\frac{1}{2}(b_{s_{1}}-b_{t})^{2}+C_{s}\sum\limits_{i=n_{t}+1}^{n_{t}+n_{s}}\xi_{i}^{s}\\ \mathrm{s.t.}\ (w_{s_{1}}^{T}x_{t_{j}}+b_{s_{1}})(w_{t}^{T}x_{t_{j}}+b_{t})\geqslant 1-\xi_{j}^{t},\\ \xi_{j}^{t}\geqslant 0,\ j=1,2,\ldots,n_{t}\\ \xi_{i}^{s}\geqslant 0,\ i=n_{t}+1,n_{t}+2,\ldots,n_{t}+n_{s}\end{array}$ (6)

The solution to Eq. (6) can be obtained from the dual problem given in Theorem 3.1.

Theorem 3.1. For the optimization problem in Eq. (6), duality theory can be used to transform it into the following quadratic programming problem:

$\displaystyle\begin{array}{l}\min\limits_{\beta}\frac{1}{2}\beta^{T}H\beta-e^{T}\beta\\ \mathrm{s.t.}\ 0\leqslant\beta_{i}\leqslant C_{s}+C_{t},\ i=1,2,\ldots,n_{t}+n_{s}\\ \beta=[\alpha^{s},\alpha^{t}]^{T}\\ e=\frac{1}{1+\lambda}\sum\limits_{i=1}^{n_{t}}\left(f_{i}^{s_{1}}w_{s_{1}}^{T}x_{t_{i}}-f_{i}^{s_{1}}b_{s_{1}}\right)\\ f_{i}^{s_{1}}=w_{s_{1}}^{T}x_{t_{i}}+b_{s_{1}}\\ H=[k_{ij}]_{n_{t}\times n_{t}}\\ k_{ij}=1+\frac{\lambda}{1+\lambda}x_{t_{i}}^{T}x_{t_{j}}\end{array}$ (7)

Proof: The Lagrange function corresponding to Eq. (6) is as follows:

$\displaystyle\begin{array}{l}L(w_{t},b_{t},\xi,\alpha,r,q)=\frac{1}{2}\|w_{t}\|^{2}+C_{t}\sum\limits_{j=1}^{n_{t}}\xi_{j}^{t}+\frac{\lambda}{2}\|w_{t}-w_{s_{1}}\|^{2}+\frac{1}{2}(b_{t}-b_{s_{1}})^{2}+C_{s}\sum\limits_{i=n_{t}+1}^{n_{t}+n_{s}}\xi_{i}^{s}\\ \quad-\sum\limits_{j=1}^{n_{t}}r_{j}\xi_{j}^{t}-\sum\limits_{i=n_{t}+1}^{n_{t}+n_{s}}q_{i}\xi_{i}^{s}-\sum\limits_{j=1}^{n_{t}}\alpha_{j}\left(f_{j}^{s_{1}}(w_{t}^{T}x_{t_{j}}+b_{t})+\xi_{j}^{t}\right)\end{array}$ (8)

where $\alpha=(\alpha_{1},\ldots,\alpha_{n_{t}})$, $r=(r_{1},\ldots,r_{n_{t}})$ and $q=(q_{1},\ldots,q_{n_{s}})$ are Lagrange multipliers with $\alpha_{j}>0,r_{j}>0,q_{i}>0$. Taking the partial derivatives of $L$ with respect to $w_{t}$, $b_{t}$ and $\xi$ and setting them to zero gives:

$\displaystyle\frac{\partial L}{\partial w_{t}}=w_{t}+\lambda(w_{t}-w_{s_{1}})-\sum\limits_{j=1}^{n_{t}}\alpha_{j}f_{j}^{s_{1}}x_{t_{j}}=0\Rightarrow w_{t}=\frac{1}{1+\lambda}\left(\sum\limits_{j=1}^{n_{t}}\alpha_{j}f_{j}^{s_{1}}x_{t_{j}}+\lambda w_{s_{1}}\right)$ (9)

$\displaystyle\frac{\partial L}{\partial b_{t}}=(b_{t}-b_{s_{1}})-\sum\limits_{j=1}^{n_{t}}\alpha_{j}f_{j}^{s_{1}}=0\Rightarrow b_{t}=b_{s_{1}}+\sum\limits_{j=1}^{n_{t}}\alpha_{j}f_{j}^{s_{1}}$ (10)

$\displaystyle\frac{\partial L}{\partial\xi_{j}^{t}}=C_{t}-\alpha_{j}-r_{j}=0\Rightarrow C_{t}=\alpha_{j}+r_{j},\ j=1,2,\ldots,n_{t}$ (11)

Substituting Eqs (9) $\sim$ (11) into Eq. (8) gives:

$\displaystyle\begin{array}{l}L=\frac{1}{2}\|w_{t}\|^{2}+\sum\limits_{j=1}^{n_{t}}(\alpha_{j}+r_{j})\xi_{j}^{t}+\frac{\lambda}{2}(w_{t}-w_{s_{1}})^{T}(w_{t}-w_{s_{1}})+\frac{1}{2}\left(\sum\limits_{j=1}^{n_{t}}\alpha_{j}f_{j}^{s_{1}}\right)^{2}\\ \quad-\sum\limits_{i=n_{t}+1}^{n_{t}+n_{s}}q_{i}\xi_{i}^{s}-\sum\limits_{j=1}^{n_{t}}\alpha_{j}f_{j}^{s_{1}}w_{t}^{T}x_{t_{j}}-\sum\limits_{j=1}^{n_{t}}\alpha_{j}f_{j}^{s_{1}}b_{t}\end{array}$ (12)

Simplifying Eq. (12) yields Eq. (13):

$\displaystyle L=\frac{1}{2}\sum\limits_{j=1}^{n_{t}}\sum\limits_{i=1}^{n_{t}}\alpha_{i}\alpha_{j}f_{i}^{s_{1}}f_{j}^{s_{1}}\left(1+\frac{\lambda}{1+\lambda}x_{t_{i}}^{T}x_{t_{j}}\right)+\frac{1}{1+\lambda}\sum\limits_{j=1}^{n_{t}}\left(f_{j}^{s_{1}}w_{s_{1}}^{T}x_{t_{j}}-f_{j}^{s_{1}}b_{s_{1}}\right)\alpha_{j}$ (13)

Simplifying Eq. (13) yields the quadratic programming problem of Eq. (7).

Theorem 3.2. Eq. (7) is a convex quadratic programming problem.

Proof: To prove that the quadratic programming problem is convex, it suffices to show that the matrix ${\rm{\bf H}}$ is positive semidefinite. Here ${\rm{\bf H}}=[k_{ij}]_{n_{t}\times n_{t}}$ with $k_{ij}=1+\frac{\lambda}{1+\lambda}x_{t_{i}}^{T}x_{t_{j}}$, so ${\rm{\bf H}}$ is the sum of the all-ones matrix and $\frac{\lambda}{1+\lambda}$ times the Gram matrix $[x_{t_{i}}^{T}x_{t_{j}}]$. Both are real symmetric positive semidefinite matrices, and $\frac{\lambda}{1+\lambda}\geqslant 0$ for $\lambda\geqslant 0$, so ${\rm{\bf H}}$ is positive semidefinite. Theorem 3.2 is proved.

Theorem 3.3. The solution of Eq. (7) is the global optimal solution.

Proof: According to Theorem 3.2, Eq. (7) is a convex quadratic programming problem, so any solution that satisfies the KKT conditions is a global optimal solution. Theorem 3.3 is proved.

Eq. (7) is solved to obtain the optimal solution $\alpha=\alpha^{\ast}$, and the optimal values of the target classifier parameters $w_{t}$ and $b_{t}$ are then obtained as follows:

$\displaystyle w_{t}=\frac{1}{1+\lambda}\sum\limits_{j=1}^{n_{t}}\alpha_{j}f_{j}^{s_{1}}x_{t_{j}}+\frac{\lambda}{1+\lambda}w_{s_{1}}$ (14)

$\displaystyle b_{t}=b_{s_{1}}+\sum\limits_{j=1}^{n_{t}}\alpha_{j}f_{j}^{s_{1}}$ (15)

The objective decision function can be obtained from Eqs (14) and (15):

$\displaystyle f_{t}=w_{t}f({\rm{\bf x}}_{t})+b_{t}$ (16)

It can be seen from Eqs (14) and (15) that the target domain classifier parameters use the model knowledge from the source domain; that is, the knowledge in the source domain is transferred into the target domain, where it effectively improves the performance of the target classifier. This means that the server can transfer its knowledge to different users, helping each user obtain a personalized learning model.
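As a hedged illustration of how the source knowledge enters the target classifier, the sketch below recovers $w_{t}$ and $b_{t}$ from a given dual solution using the stationarity conditions in Eqs. (9) and (10). The function names, the linear decision rule $\mathrm{sign}(w_{t}^{T}x+b_{t})$ and the numerical values of `alpha` are assumptions for the example; in practice `alpha` would come from a quadratic programming solver applied to the dual problem.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def target_parameters(alpha: Sequence[float],
                      w_s: Sequence[float], b_s: float,
                      X_t: Sequence[Sequence[float]],
                      lam: float) -> Tuple[List[float], float]:
    """Recover (w_t, b_t) from the dual variables via Eqs. (9) and (10)."""
    # f_j^{s1}: score of the known source classifier on each target sample.
    f = [dot(w_s, x) + b_s for x in X_t]
    dim = len(w_s)
    w_t = [
        (sum(a * fj * x[d] for a, fj, x in zip(alpha, f, X_t)) + lam * w_s[d])
        / (1.0 + lam)
        for d in range(dim)
    ]
    b_t = b_s + sum(a * fj for a, fj in zip(alpha, f))
    return w_t, b_t


def predict(w_t: Sequence[float], b_t: float, x: Sequence[float]) -> int:
    """Assumed linear decision rule for the personalized model."""
    return 1 if dot(w_t, x) + b_t >= 0 else -1


# Hypothetical source model, target samples and dual solution.
w_s, b_s = [1.0, 0.0], 0.0
X_t = [[1.0, 1.0], [-1.0, 1.0]]

w_zero, b_zero = target_parameters([0.0, 0.0], w_s, b_s, X_t, lam=1.0)
w_t, b_t = target_parameters([0.5, 0.0], w_s, b_s, X_t, lam=1.0)
print(w_zero, b_zero, w_t, b_t)
```

With all dual variables at zero the target model is just a shrunken copy of the source model, $\frac{\lambda}{1+\lambda}w_{s_{1}}$; non-zero entries of `alpha` then shift it toward the target samples, which is the personalization effect described above.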

(2) Federated learning process

FETLSVMP uses federated learning to implement encryption model training and sharing. The learning process of federated learning is mainly composed of two key parts: cloud model learning and user model learning. The basic learning model is SVM. $f_{S}$ represents the server model to be learned, and the objective function of learning is as in Eq. (17):

$\displaystyle\arg\min\limits_{\Theta}L=\sum\limits_{i=1}^{n}l(y_{i},f_{S}(x_{i}))$ (17)

Here, $l(\cdot)$ is the loss function of the network, $\Theta$ denotes the network parameters to be learned, and $(x_{i},y_{i})_{i=1}^{n}$ are the $n$ samples from the server data.

After obtaining the cloud model $f_{S}$ , it will be distributed to all organizations. As can be seen from Fig. 3, the organization’s “wall” prohibits the direct sharing of information. This process uses the existing homomorphic encryption technology [28] to avoid information leakage. Through federated learning, user data can be aggregated without affecting privacy and security.

The learning model of $k$ -th organization is expressed as Eq. (18):

$\displaystyle\arg\min\limits_{\Theta}L_{t}=\sum\limits_{i=1}^{n_{t}}l(y_{t_{i}},f_{k}(x_{t_{i}}))$ (18)

In all organizational models, $f_{k}$ is based on a shared cloud model and uploaded to the server for aggregation. In federated learning, good performance can be obtained by sharing initialization and averaging models. Therefore, this paper uses the model average to align the user model, and averages $K$ user models in each round of training for cloud model update. Note that here we are only averaging the user model, and the updated cloud model is expressed as:

$\displaystyle f^{\prime}_{S}(w)=\frac{1}{K}\sum\limits_{k=1}^{K}f_{k}(w)$ (19)

Here, $w$ denotes the parameters of the network and $f_{k}$ represents the learning model of the $k$-th organization. After enough iterations, the updated server model has better generalization capability. New users can subsequently participate in the next round of server model training, so FETLSVMP is also capable of incremental learning.

3.4 Training FETLSVMP

Based on Section 3.3, the learning process of FETLSVMP is given in Algorithm 1. The algorithm can continue to work with newly emerging organizational data, updating the user models and the cloud model at the same time when new data arrive. Therefore, the longer FETLSVMP is used, the more personalized the model of each organization becomes, and the better the intrusion detection performance.

Algorithm 1: Training process of FETLSVMP algorithm
Input: The training data sets $\{D_{1},\ldots,D_{N}\}$ from $N$ organizations, the class label set $Y=\{-1,1\}$, and a basic classifier SVM.
Step1. Use Eq. (17) to construct the initial cloud model $f_{S}$ on the public data set;
Step2. Distribute $f_{S}$ to all organizations;
Step3. Use Eq. (18) to train the model;
Step4. Use homomorphic encryption to upload all user models to the server, and implement model aggregation through Eq. (19) to obtain the updated cloud model $f^{\prime}_{S}$;
Step5. Distribute $f^{\prime}_{S}$ to all organizations to perform the transfer learning process, and obtain a personalized model $f_{t}$ through Eq. (16);
Step6. Repeat the above process for user data that continues to appear.
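
One round of Algorithm 1 (Steps 2 to 4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each organization is assumed to hold a linear SVM trained by subgradient descent on the hinge loss, the homomorphic encryption of Step 4 and the transfer step of Eq. (16) are omitted, and the names `local_train`, `federated_average`, `federated_round` and `predict`, as well as the learning rate and regularization values, are hypothetical.

```python
def local_train(w, data, lr=0.1, reg=0.01, epochs=20):
    """Step 3 (sketch): refine the cloud weights on one organization's data.

    Assumption: a linear SVM fitted by subgradient descent on the
    L2-regularized hinge loss; `data` is a list of (features, label)
    pairs with label in {-1, +1}.
    """
    w = list(w)  # copy so the shared cloud weights are not modified
    for _ in range(epochs):
        for x, y in data:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            for j in range(len(w)):
                hinge_grad = -y * x[j] if margin < 1 else 0.0
                w[j] -= lr * (reg * w[j] + hinge_grad)
    return w


def federated_average(models):
    """Aggregation (sketch): element-wise mean of the K user models."""
    k = len(models)
    return [sum(m[j] for m in models) / k for j in range(len(models[0]))]


def federated_round(cloud_w, org_datasets):
    """Steps 2-4: distribute, train locally, aggregate (encryption omitted)."""
    local_models = [local_train(cloud_w, d) for d in org_datasets]
    return federated_average(local_models)


def predict(w, x):
    """Classify x as +1 or -1 from the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


# Toy data for two organizations; each feature vector is [value, bias term].
org_a = [([2.0, 1.0], 1), ([-2.0, 1.0], -1)]
org_b = [([1.5, 1.0], 1), ([-1.0, 1.0], -1)]

cloud = [0.0, 0.0]
for _ in range(3):  # only weights are exchanged, never org_a / org_b
    cloud = federated_round(cloud, [org_a, org_b])
```

The raw records stay inside each organization; only the weight vectors reach the aggregation step, which is the property the federated mechanism relies on.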

4. Experimental results

4.1 Experimental setting and evaluation criteria

All experiments in this paper are performed on a PC with an Intel Core (TM) processor at 3.6 GHz, 8 GB of RAM, and the Windows 10 operating system. To verify the effectiveness and generalization performance of the proposed algorithm FETLSVMP in intrusion detection, three intrusion detection data sets, NSL-KDD, KDD CUP99 and ISCX2012, are used as experimental data sets. The benchmark algorithms selected in the experiment are ELM [41], ACTrAdaBoost [42] and NBSVM [43], and the results in Table 4 [51] are added for comparison.

The 10-fold cross-validation method is a standard method for evaluating machine learning algorithms, so this paper uses it to evaluate the proposed intrusion detection model. Specifically, the original data set is randomly partitioned into 10 mutually exclusive subsets of equal size. In each run, nine subsets are used to train the intrusion detection model, and the remaining one is used to test it. Repeating this process 10 times gives each subset an equal chance of being used for training and for testing. The performance of the proposed detection model is then obtained by averaging the results on the test subsets, and the average over all ten repetitions is used as the final comparison result.
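
The partitioning described above can be sketched as follows. This is an illustrative sketch only: the function names, the fixed random seed and the `train_and_score` callback are hypothetical and stand in for whatever model is being evaluated.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k mutually exclusive folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, train_and_score, k=10, seed=0):
    """Average the test score over k runs, each holding out one fold.

    `train_and_score(train_idx, test_idx)` is a hypothetical callback that
    trains a model on train_idx and returns its score on test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # The nine remaining folds form the training set for this run.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return sum(scores) / k
```

When the number of samples is a multiple of `k`, all folds have equal size; otherwise the fold sizes differ by at most one.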

Commonly used evaluation indicators for detection include precision, detection rate, accuracy, false positive rate and miss rate. Accuracy reflects the proportion of correctly classified samples to the total number of samples; the larger, the better. Precision reflects the proportion of true positive samples to the total number of samples classified as positive; the larger, the better. The detection rate reflects the proportion of positive samples classified as positive among all positive samples. Precision and detection rate are a pair of conflicting indicators: the higher the precision, the fewer the false positives, and the higher the detection rate, the fewer the false negatives, but raising one usually lowers the other. In intrusion detection, the false positive rate refers to the proportion of negative samples misclassified as positive among all negative samples. The smaller the value, the better; a high false positive rate leads to a “crying wolf” effect, in which frequent false alarms cause real alerts to be ignored. The miss rate is the proportion of positive samples misclassified as negative among all positive samples; the smaller, the better.

The formal description of precision rate, detection rate, accuracy rate, false positive rate and miss rate is as follows:

$\displaystyle\text{Precision: }CR=\frac{TP}{TP+FP}\times 100\%$

$\displaystyle\text{Detection Rate: }DR=\frac{TP}{TP+FN}\times 100\%$

$\displaystyle\text{Accuracy: }\textit{ACC}=\frac{TP+TN}{TN+FP+FN+TP}\times 100\%$

$\displaystyle\text{False Positive Rate: }FR=\frac{FP}{TN+FP}\times 100\%$

$\displaystyle\text{Miss Rate: }MR=\frac{FN}{TP+FN}\times 100\%$

Here, $TP$ represents the number of positive samples that are correctly classified as positive samples, $FP$ represents the number of negative samples that are incorrectly classified as positive samples, $TN$ represents the number of negative samples that are correctly classified as negative samples, and $FN$ represents the number of positive samples that are incorrectly classified as negative samples.
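
As a worked illustration of these five formulas, the short sketch below computes them from the four confusion counts. The function name and the example counts are hypothetical and are not taken from the experiments.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return the five indicators defined above, as percentages."""
    return {
        "precision": 100.0 * tp / (tp + fp),
        "detection_rate": 100.0 * tp / (tp + fn),
        "accuracy": 100.0 * (tp + tn) / (tp + tn + fp + fn),
        "false_positive_rate": 100.0 * fp / (tn + fp),
        "miss_rate": 100.0 * fn / (tp + fn),
    }


# Hypothetical counts for 100 attack samples and 100 normal samples.
m = detection_metrics(tp=90, fp=5, tn=95, fn=10)
```

By construction, the detection rate and the miss rate always sum to 100%, since both are normalized by $TP+FN$.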

In this paper, the average accuracy rate, false positive rate and miss rate obtained by the 10-fold cross-validation method are used as the overall evaluation indicators to verify the effectiveness and accuracy of the algorithm.

4.2 Dataset

This section describes the NSL-KDD, KDD CUP99 and ISCX2012 data sets and their preprocessing.

a. Dataset

ISCX2012 dataset

Researchers have noticed that the attack types considered in the KDD CUP99 intrusion detection data set are now out of date. In 2012, the Information Security Centre of Excellence (ISCX) of the University of New Brunswick (UNB) released an intrusion detection data set named ISCX2012 [32]. This data set contains seven days of original network traffic data, including normal traffic and the four intrusion types listed in Table 1: BFSSH, Infiltration, HttpDoS and DDoS. In the experiment, 2% of the data is selected from the training data set; the labels of most of these samples are removed to form the source domain data set, the remaining labeled samples form the target data set, and the two together constitute the training data set. Similarly, 1% of the data is taken from the test data set as the test data set.

Table 1
Distribution of attack types in ISCX2012 dataset

Dataset	Training dataset		Testing dataset
	Count	Percentage	Count	Percentage
Normal	890,726	97.27%	593,811	97.27%
BFSSH	4,179	0.46%	2,785	0.46%
Infiltration	6,027	0.66%	4,017	0.66%
HttpDoS	2,090	0.23%	1,392	0.23%
DDos	12,673	1.38%	8,448	1.38%
Total	915,695		610,453

KDD CUP99

KDD CUP99 is a widely used competition data set for intrusion detection provided by the Lincoln Laboratory of the Massachusetts Institute of Technology. It is among the most influential and credible intrusion detection data sets in academia [37]. The data set has $5\times 10^{6}$ records, and each record has 41 characteristic attributes and 1 class identifier. It contains about 38 attack types, of which 21 appear in the training data set and another 17 unknown attack types appear only in the test data set. The purpose of this design is to test the generalization ability of the classifier model. The ability to detect unknown attack types is also one of the important indicators for evaluating classifiers in intrusion detection applications.

So far, the version most used by researchers is the 10% KDD CUP99 data set (including a training data set and a test data set), which is a 10% sample of the full KDD CUP99 data set; this data set is also used in this paper. The 10% data set contains one Normal class and 4 major network attack types: DOS, Probing, U2R and R2L. In the two 10% data sets, the four attack types contain different numbers of attack behaviors. Table 2 details 22 attack behaviors in the training data set and 39 attack behaviors in the test data set; normal data is also counted as one type of attack behavior in the table.

Table 2

KDD 99 dataset

Dataset	10% testing dataset	10% training dataset
Normal	Normal	Normal
DOS	back, land, neptune, pod, smurf, teardrop, apache2, mailbomb, udpstorm, processtable	back, land, neptune, pod, smurf, teardrop
Probing	ipsweep, nmap, portsweep, satan, saint, mscan	ipsweep, nmap, portsweep, satan
R2L	ftp_write, guess_password, imap, multihop, phf, spy, warezmaster, warezclient, named, xsnoop, xlock, sendmail, worm, snmpgetattack, snmpguess	ftp_write, guess_password, imap, multihop, phf, spy, warezmaster
U2R	buffer_overflow, loadmodule, perl, rootkit, httptunnel, ps, sqlattack, xterm	buffer_overflow, loadmodule, perl, rootkit

Table 3

Distribution of attack types in KDD 99 dataset

Dataset	Training dataset		Testing dataset
	Number	Proportion (%)	Number	Proportion (%)
Normal	97278	19.69	60593	19.48
Probe	4107	0.83	4166	1.34
DOS	391458	79.24	229853	73.90
U2R	52	0.01	228	0.073
R2L	1126	0.22	16189	5.20

To test whether an intrusion detection algorithm can recognize new attack behaviors after learning from the training data set, the test data set contains attack behaviors that do not appear in the training data set (Table 2). In Table 3, the proportion of Normal is basically the same in the two 10% data sets, but the proportions of the four attack types differ significantly; because U2R and R2L account for very small proportions, most current detection algorithms have difficulty detecting these two types of attacks.

NSL-KDD

NSL-KDD [38] is an improved version of the KDD CUP99 data set: duplicate records are removed, records are assigned different classification difficulty levels, and the class sizes are more balanced, so that it can serve as an effective benchmark data set for evaluating the detection ability of a model. The NSL-KDD data set includes 4 sub-data sets: KDDTrain$+$, KDDTrain$+$_20Percent, KDDTest$+$ and KDDTest-21. This paper uses KDDTrain$+$ for training and KDDTest$+$ for testing. The data set contains 4 anomaly types, which are subdivided into 39 attack types, of which 17 unknown attack types appear in the test set. Each record includes 41 features and 1 category identifier. Among the 41 features, there are 9 basic TCP connection features, 13 TCP connection content features, 9 time-based network traffic statistics features, and 10 host-based network traffic statistics features. The details of the NSL-KDD data set are shown in Table 4.

Table 4

Distribution of attack types in NSL-KDD dataset

Dataset	KDDTrain $+$		KDDTest $+$
	Number	Proportion (%)	Number	Proportion (%)
Normal	67345	53.46	9711	43.08
Probe	11655	9.25	2421	10.74
DOS	45926	36.46	7458	33.08
U2R	52	0.04	200	0.89
R2L	995	0.79	2754	12.22

b. Data preprocessing

The intrusion detection data sets contain non-numerical data, and the numerical features differ in scale, so the non-numerical data must be converted into numerical data and all features brought to a common scale. Therefore, the data preprocessing operation includes two steps: character type digitization and data normalization.

Character type digitization

The ISCX2012, NSL-KDD and KDD CUP99 data sets are processed in the same way. In each record, the symbolic features are converted into numerical data using 1-to-N encoding. Taking KDD CUP99 as an example, the character-type features to convert are the 3 network connection types, 70 network service types, 23 attack types (including the normal type Normal) and 11 network connection states. The converted forms of the 11 network connection states are shown in Table 5; the other character types are handled similarly.
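
The digitization step can be illustrated with a small sketch. It assumes that codes are assigned in sorted order of the symbols and start at 0, which reproduces the mapping listed in Table 5; the helper names `build_encoding` and `encode_column` are hypothetical.

```python
def build_encoding(values, start=0):
    """Map each distinct symbol to an integer code, in sorted order."""
    return {symbol: code for code, symbol in enumerate(sorted(set(values)), start)}


def encode_column(column, mapping):
    """Replace every symbol in a column by its integer code."""
    return [mapping[v] for v in column]


# The 11 connection-state symbols as they are spelled in Table 5.
states = ["OTH", "REJ", "RSTO", "RSTOSO", "RSTR",
          "S0", "S1", "S2", "S3", "SF", "SH"]
state_codes = build_encoding(states)
encoded = encode_column(["SF", "S0", "REJ", "SF"], state_codes)
```

The same two helpers apply unchanged to the other character-type features, such as the service and attack type columns.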

Data cleaning

Real data sets are highly susceptible to noise, missing values and inconsistent data, because they are very large and mostly come from multiple heterogeneous data sources. Low-quality data leads to low-quality results, so the data must be cleaned.

Table 5

Network connection type conversion

Character	Numerical
OTH	0
REJ	1
RSTO	2
RSTOSO	3
RSTR	4
S0	5
S1	6
S2	7
S3	8
SF	9
SH	10

Duplicate values are identified and eliminated. Missing values are filled by statistical methods: for numerical data, the mean is used; for categorical data, the mode (the most frequent category) is used. Noisy data is handled with the binning (“box division”) method, a simple and commonly used preprocessing method in which the final value is determined by examining adjacent data. A “box” is a sub-interval of the attribute's value range; an attribute value that falls within a sub-interval is said to be put into the box represented by that sub-interval. In this paper, the minimum entropy method is used to assign the data to be processed (a column of attribute values) to boxes, after which the data in each box is examined and smoothed separately. When binning is adopted, the two main questions to settle are how to divide the boxes and how to smooth the data in each box.
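
The cleaning operations above can be sketched as follows. Note the assumptions: the paper divides boxes with the minimum entropy method and does not state the smoothing rule, whereas this sketch substitutes equal-width boxes and smoothing by box means purely for illustration. All function names are hypothetical, and `None` is assumed to mark a missing value.

```python
from collections import Counter


def drop_duplicates(rows):
    """Remove repeated records while keeping the first occurrence."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_with_mean(column):
    """Numerical feature: replace missing values by the column mean."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]


def fill_with_mode(column):
    """Categorical feature: replace missing values by the most frequent one."""
    mode = Counter(v for v in column if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


def smooth_by_bin_means(column, n_bins):
    """Assumption: equal-width boxes stand in for the minimum entropy split."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    bin_of = [min(int((v - lo) / width), n_bins - 1) for v in column]
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for b, v in zip(bin_of, column):
        sums[b] += v
        counts[b] += 1
    # Each value is replaced by the mean of the box it fell into.
    return [sums[b] / counts[b] for b in bin_of]
```

Replacing the equal-width rule with an entropy-based split would only change how `bin_of` is computed; the smoothing loop would stay the same.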

Table 6

Average accuracy rate, false positive rate and miss rate on NSL-KDD dataset (%)

Algorithm	DoS			Normal			Probe			R2L			U2R
	ACC	FR	MR	ACC	FR	MR	ACC	FR	MR	ACC	FR	MR	ACC	FR	MR
FETLSVMP	99.99	2.12	1.81	99.76	2.97	2.28	99.73	3.25	3.21	90.12	5.15	6.96	70.56	5.98	6.85
ACTrAdaBoost	98.26	2.28	2.21	98.46	3.19	2.33	98.27	3.56	3.38	73.38	9.66	9.02	55.65	9.28	10.27
NB-SVM	97.86	2.38	2.65	97.88	3.55	2.54	97.38	3.88	3.49	69.11	9.87	9.26	43.28	9.35	10.64
ELM	97.23	2.46	3.02	97.46	4.17	3.37	97.12	4.15	4.37	68.17	10.24	9.97	32.34	10.24	11.02
KNN Cubic [50]	96.85	2.56	3.26	96.87	4.52	3.78	96.44	4.38	4.58	61.86	10.39	10.26	28.89	10.54	11.68
SVM-Linear [50]	97.12	2.49	3.15	97.28	4.28	3.65	96.59	4.26	4.42	65.34	10.18	10.11	30.65	10.36	11.35

Table 7

Average accuracy rate, false positive rate and miss rate on KDD 99 dataset (%)

Algorithm	DoS			Normal			Probe			R2L			U2R
	ACC	FR	MR	ACC	FR	MR	ACC	FR	MR	ACC	FR	MR	ACC	FR	MR
FETLSVMP	99.65	2.32	2.15	99.98	3.11	2.28	99.53	3.49	3.26	88.48	5.23	6.11	70.35	5.29	6.89
ACTrAdaBoost	98.54	2.46	2.28	99.15	3.22	2.43	99.18	3.74	3.44	73.76	8.97	8.64	48.29	8.68	9.38
NB-SVM	97.43	2.51	2.67	98.65	3.58	2.59	98.85	3.85	3.58	69.24	9.23	9.11	47.37	8.85	9.86
ELM	96.65	2.53	2.98	97.47	4.11	3.47	97.65	4.26	4.43	67.34	11.23	10.12	31.87	10.35	11.13
KNN Cubic [50]	95.52	3.23	3.45	96.85	4.37	4.12	97.14	4.85	4.85	65.26	12.63	11.56	28.65	11.85	12.05
SVM-Linear [50]	96.86	2.98	3.16	97.56	4.25	3.62	97.43	4.49	4.67	66.12	12.18	11.28	31.22	11.26	11.64

Data normalization

After digitization, the continuous feature attributes in the data set are measured on different scales, which strongly affects the calculation of distances between data points and, in turn, the accuracy of the results. Normalization eliminates these differences between features. For discrete features, min-max normalization is adopted, which maps the values into [0, 1]; for continuous features, the Z-score method is used, as shown in Eqs (20) and (21), respectively.

$\displaystyle x_{mn}=\frac{x-x_{\min}}{x_{\max}-x_{\min}}$ (20)

$\displaystyle x_{ze}=\frac{x-x_{av}}{\sigma}$ (21)

Here, $x$ is the original sample data, $x_{\max}$ is the maximum value, $x_{\min}$ is the minimum value, $x_{av}$ is the average value, $\sigma$ is the standard deviation, and $x_{mn}$ and $x_{ze}$ are the normalized results of the original data.
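
Eqs (20) and (21) translate directly into code. In the sketch below, the function names are hypothetical, the population standard deviation is assumed for $\sigma$ (the text does not say which estimator is used), and a constant column is mapped to zeros to avoid division by zero.

```python
import math


def min_max(column):
    """Eq. (20): rescale a feature column into [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def z_score(column):
    """Eq. (21): centre on the mean and divide by the standard deviation."""
    mean = sum(column) / len(column)
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]
```

Unlike min-max normalization, the Z-score output is not confined to a fixed interval; it has zero mean and unit variance instead.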

4.3 Experiments results and analysis

In this section, the experimental results of the algorithms on the NSL-KDD, KDD CUP99 and ISCX2012 data sets are analyzed to verify the effectiveness of the algorithm proposed in this paper. In addition, the influence of the adjustable parameter $\lambda$ on the results of FETLSVMP is discussed and analyzed.

Table 8
Average accuracy rate, false positive rate and miss rate on ISCX2012 dataset (%)

Algorithm	Normal			Infiltrating			HttpDoS			DDoS			BFSSH
	ACC	FR	MR	ACC	FR	MR	ACC	FR	MR	ACC	FR	MR	ACC	FR	MR
FETLSVMP	99.86	2.25	1.25	98.65	2.68	1.96	92.25	3.54	3.34	98.64	5.52	3.75	96.69	2.59	1.12
ACTrAdaBoost	98.45	2.86	1.87	95.86	2.89	2.18	89.56	3.87	4.12	97.52	5.86	3.98	94.65	2.83	1.29
NB-SVM	97.86	2.92	2.11	94.75	3.28	2.64	88.36	4.15	4.21	96.44	6.12	4.26	93.65	3.21	1.64
ELM	94.43	3.25	2.54	90.76	3.76	2.85	82.33	4.89	4.23	91.25	6.23	4.65	89.38	3.26	1.87
KNN Cubic [50]	92.18	3.75	3.64	89.75	4.28	3.32	80.85	5.37	4.62	89.45	6.64	5.11	87.63	3.76	2.88
SVM-Linear [50]	93.97	3.56	2.95	90.11	4.17	3.28	81.54	5.26	4.31	90.26	6.56	4.88	88.28	3.64	2.45

Tables 6–8 report the average accuracy, false positive rate and miss rate of the algorithms on the NSL-KDD, KDD CUP99 and ISCX2012 data sets. The following conclusions can be drawn:

Sufficient available training samples are the basis for training a high-accuracy classifier. The intrusion detection data sets NSL-KDD and KDD CUP99 contain a large number of samples of three types: Normal, Probe and DOS. All algorithms achieve high accuracy on these three types, exceeding 95%. Likewise, on the ISCX2012 data set, the accuracy of all algorithms on the plentiful Normal, Infiltration and DDoS samples exceeds 80%.

On the intrusion detection data sets NSL-KDD and KDD CUP99, for the attack types U2R and R2L, which have few samples, traditional intrusion detection algorithms lack enough data to train a high-accuracy detection model and therefore achieve low accuracy on these two types. FETLSVMP and ACTrAdaBoost are transfer learning algorithms that use knowledge from a large number of well-labeled intrusion detection samples to train detectors for U2R and R2L, so their detection of these types improves. Tables 6 and 7 show that FETLSVMP and ACTrAdaBoost achieve higher accuracy on U2R and R2L, especially on R2L: their accuracy is above 70% on R2L and above 46% on U2R. On the ISCX2012 data set in Table 8, the transfer learning algorithms FETLSVMP and ACTrAdaBoost are more accurate than ELM, NB-SVM, KNN Cubic and SVM-Linear on the attack types with fewer samples, HttpDoS and BFSSH. Therefore, FETLSVMP significantly improves the detection of attack types that contain few samples, such as U2R and R2L.

In terms of false positive rate, Tables 6 and 7 show that on NSL-KDD and KDD CUP99 no algorithm exceeds 5% for the three types Normal, Probe and DOS. Among them, FETLSVMP has the lowest false positive rate, below 4% in every case. For U2R and R2L, the non-transfer benchmark algorithms perform poorly: on the NSL-KDD data set their false positive rates are close to or above 10% for R2L and above 9% for U2R, and on the KDD CUP99 data set they approach or exceed 9%. FETLSVMP performs considerably better, staying below 6% on both data sets. On the ISCX2012 data set, Table 8 shows that FETLSVMP has the lowest false positive rate for every traffic type.

In terms of miss rate, Tables 6–8 show that FETLSVMP has the lowest value among all compared algorithms for each of the nine attack behaviors.

The experimental results show that on the KDD CUP99 and NSL-KDD data sets, the accuracy of FETLSVMP on all five types is higher than that of the benchmark algorithms, and the accuracy on the attack types with few samples is also significantly improved; on the ISCX2012 data set, the accuracy of FETLSVMP on the 5 types is likewise better than that of the benchmark algorithms.

Therefore, FETLSVMP improves the detection rate of all 9 attack behaviors, especially R2L, whose samples are sparse. No attack behavior suffers from an excessively low detection rate, and the detection rates no longer differ widely across types, which effectively alleviates the class imbalance problem of attack type detection in machine learning algorithms. FETLSVMP also shows clear advantages in false positive rate and miss rate. In other words, FETLSVMP achieves the best classification accuracy for all intrusions. This is because federated learning can indirectly use more of the information in distributed data to train better models, and transfer learning makes the model more consistent with the characteristics of each organization; compared with traditional methods (ELM, NB-SVM, KNN Cubic and SVM-Linear), this greatly improves the recognition results. In short, the experiments demonstrate the effectiveness of FETLSVMP, and the proposed algorithm also protects the privacy of data, which the other algorithms do not. The experimental results on the three data sets further show that the FETLSVMP algorithm has good generalization performance.

Table 9

Average training time on NSL-KDD, KDD CUP99 and ISCX2012 (s)

Algorithm	NSL-KDD	KDD CUP99	ISCX2012
FETLSVMP	6.56	13.76	7.12
ACTrAdaBoost	12.24	50.12	14.52
NB-SVM	8.87	48.36	12.76
ELM	4.23	12.45	5.87
KNN Cubic [50]	6.75	16.86	7.64
SVM-Linear [50]	3.88	12.78	4.97

Table 9 shows the average training time of the algorithms on the three intrusion detection data sets. Unlike the transfer learning algorithms ACTrAdaBoost and FETLSVMP, the non-transfer learning algorithms ELM, NB-SVM, KNN Cubic and SVM-Linear have no transfer learning process and do not need to process additional data, which is why ELM and SVM-Linear have the shortest training times in Table 9. FETLSVMP needs to transfer the knowledge in the auxiliary data to help build the target learning task, so its training time is slightly higher than theirs, while remaining well below that of ACTrAdaBoost.

Eq. (11) shows that the objective function of FETLSVMP includes the parameter $\lambda$, whose value affects the learning effect, so its sensitivity needs to be analyzed. On the NSL-KDD, KDD CUP99 and ISCX2012 data sets, $\lambda$ is varied over [0, 1] and the resulting change in the classification performance of FETLSVMP is recorded. The average accuracy is shown in Figs 4–6, which illustrate the influence of the parameter on the learning effect of FETLSVMP.

Figure 4. Sensitivity analysis of parameter $\lambda$ on NSL-KDD.

Figure 5. Sensitivity analysis of parameter $\lambda$ on KDD CUP99.

Figure 6. Sensitivity analysis of parameter $\lambda$ on ISCX2012.

From Figs 4–6, the following conclusions are obtained:

The value of $\lambda$ is determined using the parameter grid search method provided in [39], and the experimental results on the real data sets are recorded at the same time. The results show that the classification performance of FETLSVMP differs significantly as $\lambda$ takes different values within a certain range; the closer the domain relationship, the greater the value and the higher the accuracy. Therefore, the algorithm is sensitive to the regularization parameter $\lambda$ within a certain range, and the value at which the classification performance is best can be obtained for different cross-domain tasks.
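
A generic grid search over $\lambda$ in [0, 1] can be sketched as below. This does not reproduce the specific procedure of [39]; `grid_search` and its `evaluate` callback are hypothetical, and `toy_accuracy` is a made-up stand-in curve, not experimental data. In practice `evaluate` would train FETLSVMP with the given $\lambda$ and return its cross-validated accuracy.

```python
def grid_search(candidates, evaluate):
    """Return the candidate with the highest score, plus all recorded scores."""
    scores = {lam: evaluate(lam) for lam in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Candidate values 0.0, 0.1, ..., 1.0 for the parameter lambda.
lambda_grid = [i / 10 for i in range(11)]


def toy_accuracy(lam):
    """Made-up score curve peaking at 0.3, used only to exercise the search."""
    return 1.0 - (lam - 0.3) ** 2


best_lambda, curve = grid_search(lambda_grid, toy_accuracy)
```

The recorded `curve` is the kind of accuracy-versus-$\lambda$ series that a sensitivity plot would display.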

5. Conclusions

In this paper, FETLSVMP, an intrusion detection method based on federated transfer learning and support vector machines, is proposed. FETLSVMP aggregates data from different organizations without compromising privacy and security, adapts to each user through knowledge transfer, and learns a personalized model. FETLSVMP provides an approach for future research on intrusion detection. The effectiveness of the algorithm is evaluated experimentally. The results show that FETLSVMP effectively addresses the low detection accuracy for small-sample and emerging attack behaviors, as well as the privacy protection and data island problems, and improves the detection effect. In future work, the transfer learning process will further consider measuring the differences in the conditional probability distributions of each organization's data and improving training efficiency.

References

1. and Lu, An effective intrusion detection approach using SVM with naïve Bayes feature embedding, Computers & Security 103(3) (2020), 102158.

2. Wang, Zhang et al., A novel intrusion detection system based on an optimal hybrid kernel extreme learning machine, Knowledge-Based Systems 195 (2020), 105648.

3. Wang H.W. and Wang S.S., An effective intrusion detection framework based on SVM with feature augmentation, Knowl.-Based Syst 136 (2017), 130–139.

4. Buczak A.L. and Guven, A survey of data mining and machine learning methods for cyber security intrusion detection, IEEE Commun. Surv. Tutor 18 (2016), 1153–1176.

5. Yahalom, Steren, Nameri, Roytman, Porgador and Elovici, Improving the effectiveness of intrusion detection systems for hierarchical data, Knowl.-Based Syst 168 (2019), 59–69.

6. Benmessahel, Xie and Chellal, A new evolutionary neural networks based on intrusion detection systems using multiverse optimization, Applied Intelligence 48(C) (2018), 2315–2327.

7. Wang C.R., R.F., Lee S.J. et al., Network intrusion detection using equality constrained-optimization-based extreme learning machines, Knowledge-Based Systems 147(MAY) (2018), 68–80.

8. Buczak and Guven, A survey of data mining and machine learning methods for cyber security intrusion detection, IEEE Communications Surveys & Tutorials 18(2) (2017), 1153–1176.

9. Al-Turaiki and Altwaijry, A convolutional neural network for improved anomaly-based network intrusion detection, Big Data 9(3) (2021), 233–252.

10. Cheng, Tay W.P. and Huang, Extreme learning machines for intrusion detection, in: The 2012 International Joint Conference on Neural Networks, 2012, pp. 1–8.

11. Singh, Kumar and Singla R.K., An intrusion detection system using network traffic profiling and online sequential extreme learning machine, Expert Systems with Applications 42(22) (2015), 8609–8624.

12. Wang, Zeng, Liu et al., Deep belief network integrating improved kernel-based extreme learning machine for network intrusion detection, IEEE Access 99 (2021), 16062–16091.

13. Aburomman A.A. and Reaz M.B.I., A novel SVM-kNN-PSO ensemble method for intrusion detection system, Applied Soft Computing 38(C) (2016), 360–372.

14. and Xue, Transfer naive bayes algorithm with group probabilities, Applied Intelligence 50(1) (2020), 61–73.

15. Xue and Gao, Multi-source deep transfer neural networks algorithm, Sensors 19(19) (2019).

16. McMahan H.B., Moore, Ramage et al., Communication-efficient learning of deep networks from decentralized data, Artificial Intelligence and Statistics, 2017, 1273–1282.

17. Yang, Federated learning: The last on kilometer of artificial intelligence, CAAI Transactions on Intelligent Systems 15(1) (2020), 183–186.

18. Konečný, McMahan H.B., Ramage et al., Federated Optimization: Distributed Machine Learning for On-Device Intelligence, arXiv 1610.02527, 2016, 1–38.

19. Konečný, McMahan H.B., F.X. et al., Federated Learning: Strategies for Improving Communication Efficiency, arXiv 1610.05492, 2016, 1–10.

20. Wang, Tuor, Salonidis et al., Adaptive federated learning in resource constrained edge computing systems, IEEE Journal on Selected Areas in Communications 37(6) (2019), 1205–1221.

21. Huang, Dai et al., Differentially private asynchronous federated learning for mobile edge computing in urban informatics, IEEE Transactions on Industrial Informatics PP(99) (2019), 1–1.

22. Chen, Qin, Wang et al., FedHealth: A federated transfer learning framework for wearable healthcare, IEEE Transactions on Intelligent Systems PP(99) (2020), 1–1.

23. Huang, Dai, Maharjan and Zhang, Differentially private asynchronous federated learning for mobile edge computing in urban informatics, IEEE Transactions on Industrial Informatics 16(3) (2020), 2134–2143.

24. Liu et al., Adaptive privacy-preserving federated learning, Peer-to-Peer Networking and Applications 13(5) (2020).

25. Hard, Rao, Mathews et al., Federated Learning for Mobile Keyboard Prediction, arXiv 1811.03604, 2018.

26. Nguyen T.D., Marchal, Miettinen et al., DÏoT: A Federated Self-learning Anomaly Detection System for IoT, in: 2019 IEEE 39th International Conference on Distributed Computing Systems, 2019.

27. Yang, Liu, Chen et al., Federated machine learning: Concept and applications, ACM Transactions on Intelligent Systems and Technology 10(2) (2019), 1–19.

28. Zhang, Xie, Bai et al., A survey on federated learning, Knowledge-Based Systems 216(1) (2021), 106775.

29. Liu, Kang, Xing, Chen and Yang, A secure federated transfer learning framework, IEEE Intell. Syst 35(4) (2020), 70–82.

30. Pan S.J. and Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering 22(10) (2010), 1345–1359.

31. Liang, Liu, Chen et al., Federated Transfer Reinforcement Learning for Autonomous Driving, arXiv, 2019.

32. Sharma, Xing, Liu and Kang, Secure and Efficient Federated Transfer Learning, in: 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 2569–2576.

33. Cortes and Vapnik V.N., Support vector networks, Machine Learning 20(3) (1995), 273–297.

34. Suzuki and Shouno, Support vector machine histogram: New analysis and architecture design method of deep convolutional neural network, Neural Processing Letters 2017(4) (2017), 1–16.

35. Suykens J.A.K. and Lukas, Sparse least squares support vector machine classifiers, Neural Processing Letters 9(3) (2020), 293–300.

36. Suykens J.A.K. and Vandewalle, Least squares support vector machine classifiers, Neural Processing Letters 9(3) (1999), 293–300.

37. Yang, Zhou and Jiang, Least squares support vector machine with parametric margin for binary classification, Journal of Intelligent & Fuzzy Systems 30(5) (2016), 2897–2904.

38. Maheswari R.V., Subburaj, Vigneshwaran et al., Non linear support vector machine based partial discharge patterns recognition using fractal features, Journal of Intelligent & Fuzzy Systems Applications in Engineering & Technology 27(5) (2014), 2649–2664.

39. and Xue, Research on transfer learning algorithm based on support vector machine, Journal of Intelligent & Fuzzy Systems 38(4) (2020), 4091–4106.

40. Dhanabal and Shantharajah S.P., A study on NSL-KDD dataset for intrusion detection system based on classification algorithms, International Journal of Advanced Research in Computer and Communication Engineering 4(6) (2015), 446–452.

41. Huang G.B., Chen and Siew C.K., Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans Neural Netw 17(4) (2006), 879–892.

42. and Xue, An intrusion detection method based on active transfer learning, Intelligent Data Analysis 24(2) (2020), 363–383.

43. and Lu, An effective intrusion detection approach using SVM with naïve Bayes feature embedding, Computers & Security 103(3) (2020), 102158.

44. Papadopoulos, Abramson, Hall A.J., Pitropakis and Buchanan W.J., Privacy and trust redefined in federated machine learning, Machine Learning and Knowledge Extraction 3(2) (2021), 333–356.

45. Verbraeken, Wolting, Katzy, Kloppenburg, Verbelen and Rellermeyer J.S., A survey on distributed machine learning, ACM Computing Surveys (CSUR) 53(2) (2020), 1–33.

46. Prodromidis, Chan and Stolfo, Meta-learning in distributed data mining systems: Issues and approaches, Advances in Distributed and Parallel Knowledge Discovery 3 (2020), 81–114.

47. Hall, Bowyer, Kegelmeyer, Moore and Chao C.M., Distributed learning on very large data sets, in: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000, pp. 79–84.

48. T.P. and Phuong T.T., Privacy-preserving deep learning via weight transmission, IEEE Transactions on Information Forensics and Security (99) (2019), 1–1.

49. Tian, Sahu A.K., Zaheer et al., Federated Optimization for Heterogeneous Networks, CoRR abs/2303.08322, 2023.

50. Wen et al., Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection, CoRR abs/2303.08322, 2023.

51. Kilincer I.F., Ertam and Sengur, Machine Learning Methods for Cyber Security Intrusion Detection: Datasets and Comparative Study, Computer Networks, 2021, 107840.
