Incremental semi-supervised kernel construction with self-organizing incremental neural network and application in intrusion detection

Abstract

Semi-supervised learning (SSL) problems are often solved by graph-based algorithms, semi-definite programming and similar methods. These methods usually have high space complexity and are therefore not efficient for network intrusion detection systems. Apart from the space complexity challenge, a network intrusion detection system should also be able to handle drift in the distribution of the data flow. A common solution to this concept drift problem is SSL. In this paper, an incremental SSL training framework is proposed that combines the low space complexity of topology learning with SSL for network intrusion detection. First, the unsupervised self-organizing incremental neural network is extended to process labeled and unlabeled information incrementally. Second, a kernel function is constructed from the training results of the previous step. Finally, a kernel machine is trained with the constructed kernel function. The proposed method reduces the space complexity of SSL to a magnitude similar to that of supervised learning. The experiments are carried out on the NSL-KDD dataset, and the results show that the proposed method outperforms mainstream methods such as Transductive Support Vector Machine and Label Propagation.

Keywords

Metric learning, nonlinear embedding, self-organizing incremental neural network, semi-supervised learning

1 Introduction

The network intrusion detection problem has been a prevailing research topic in the last decade. The intrusion detection task is special in at least three ways. First, the data volume in the network is large and is still growing with the expansion of the Internet. Second, the distribution of the data is constantly changing, which requires better generalization abilities from the machine learning algorithms used in intrusion detection. Third, the classification problem in network intrusion detection is nonlinear. Taken together, these properties pose several machine learning challenges at once: nonlinear separation, incremental learning and semi-supervised learning.

Based on the availability of labeled data, intrusion detection solutions fall into four major groups: unsupervised, supervised, semi-supervised and one-class classification based intrusion detection. Unsupervised detection methods attempt to discover undesired patterns without any labeled information and are usually implemented with unsupervised clustering methods. Supervised detection employs traditional supervised classification methods to learn models on the labeled dataset. Semi-supervised detection is based largely on semi-supervised learning (SSL), which employs information from both labeled and unlabeled data to train classifiers. In one-class classification based intrusion detection, labeled data is available but contains no intrusion samples. There are also hybrid implementations that combine supervised learning and one-class learning [19, 32]. However, IDSes that combine SSL with other machine learning settings have not been fully explored.

While supervised learning based IDSes are often limited by the quality of training datasets, one class classification based intrusion detection is widely used to discover unknown intrusions. One such example is implemented with one class support vector machines (OCSVM) [21]. The drawback of OCSVM is that normal data is assumed to be distributed within a spherical area, which might not always be true. Besides, it is incapable of incorporating information from unlabeled data. Therefore its performance depends largely on a sufficient training dataset. Other methods that utilize the one class modeling principle, such as [14], have not seriously considered the insufficiency of normal samples.

Unsupervised and semi-supervised detection are widely used in IDSes. Unsupervised detection relies on the clustering assumption [2], under which data points in one cluster are assumed to belong to the same class. This assumption may not hold and may cause overfitting problems. SSL methods include unlabeled data in training in order to increase classification accuracy. One of the drawbacks of SSL based detection is that it is only feasible when the data is distributed according to certain assumptions. For example, the widely used Transductive Support Vector Machine (TSVM) [12] assumes that samples belonging to different classes are separated by a low density area in a high dimensional space. Graph based SSL such as [24] suffers from low space and time efficiency. Fuzzy set based intrusion detection [5] and entropy based methods [10] remedy the previous drawbacks. Their own drawback is that they are not online and thus cannot adapt to concept drift.

Online learning methods can learn without storing data, and can be divided into categories according to how they handle non-linear separation. In the well known Support Vector Machine [6], kernel functions are employed. Another solution is lazy learning, i.e. representing the data with samples in the same space as the input. Competitive learning neural networks are common implementations of online lazy learning. There are two kinds of classic competitive learning neural networks, namely the Self-Organizing Map (SOM) [20] and Growing Neural Gas (GNG) [7]. The main drawback of SOM is its compromised ability to learn complex topology without a proper predefined graph. GNG can learn arbitrary topology structures; its weakness is the endless growth of its neuron population, which makes it less suitable for large data streams. In the Self-Organizing Incremental Neural Network (SOINN) [8] and enhanced SOINN (ESOINN) [9], arbitrary topology structures can be learned without suffering from endless growth of neurons.

A number of studies integrate SSL and online learning for intrusion detection [22], most of which are based on SOM [11, 31], where SOM is used as a clustering method and the cluster centers are then labeled to construct a nearest neighbor classifier. The weak point of SOM based SSL is that the labeling of cluster centers takes place after the online training and is potentially offline. Moreover, the limitation of SOM in learning complex topology structures is inherited. Semi-supervised GNG [23] is proposed for strictly online construction of nearest neighbor classifiers. The drawback of GNG based SSL is that the endless neuron growth of GNG is not remedied. Online SSL based on ESOINN is proposed in [17, 26], in which the number of samples in the learning result is reduced. However, the drawback of this method is that it works only on convex shapes.

In this paper, we propose a semi-supervised incremental method that combines incremental learning, nonlinearity modeling and SSL. It is able to consolidate information from labeled and unlabeled data to increase classification accuracy, and to update the learned model on new data to increase efficiency. Our key contributions are listed below.

A knowledge reuse framework for ESOINN is proposed in the form of kernel function construction.

An incremental semi-supervised kernel function construction method based on ESOINN is proposed to enhance the generalization abilities of SVMs.

Combining the proposed framework and SVM, the algorithm performs SSL with the same space efficiency as a supervised SVM.

An extension of ESOINN for processing large volumes of labeled and unlabeled data, called mixture SOINN (MSOINN), is proposed. Moreover, MSOINN can process both the labeled and the unlabeled data incrementally.

Experiments are carried out on the NSL-KDD [27] dataset. Comparison results on the NSL-KDD dataset show that the proposed method has better generalization ability than OCSVM, Label Propagation and TSVM etc.

The rest of the paper is organized as follows. Section 2 gives an introduction to the embedding problem and self-organizing incremental neural networks. Section 3 presents the framework of our proposed method. Section 4 reports the simulation results, and Section 5 gives the conclusion and future perspectives.

2 The embedding problem and topology learning neural networks

2.1 A brief introduction to linear embedding

Let P = {p_ij} be a matrix of pairwise distances, where p_ij is the distance between items i and j. In [13, 29] a linear embedding is introduced to find the positions (also referred to as weights) of the data items given the pairwise distance matrix. Denote the weights as a matrix X, where each column is the weight of an item. First, the inner product matrix H = X^TX is calculated according to [29] as $(H)_{ij} = - \frac{| p_{ij} |^{2}}{2} + \frac{\sum_{m = 1}^{l} | p_{mj} |^{2}}{2 l} + \frac{\sum_{n = 1}^{l} | p_{in} |^{2}}{2 l} - \frac{\sum_{m, n = 1}^{l} | p_{mn} |^{2}}{2 l^{2}}$ (1) where l is the dimension of the space after embedding. Then the eigendecomposition H = UDU^T = X^TX is performed, where D is a diagonal matrix whose diagonal holds the eigenvalues sorted by decreasing magnitude. The embedding is then obtained as X = D^0.5U^T. When there are negative eigenvalues, the embedding falls in a pseudo-Euclidean space as X = (MD)^0.5U^T, where the n × n matrix $M = diag (I_{n^{+}}, - I_{n^{-}})$ with n = n⁺ + n⁻, and the pair (n⁺, n⁻) is the signature of the pseudo-Euclidean space. The distance metric is altered accordingly as [29] $δ_{ij} = \sqrt{{(X_{i} - X_{j})}^{T} M (X_{i} - X_{j})}$ (2)

For a new data item, the embedding can be calculated in two steps. First, calculate [29] $h_{i} = - \frac{| p_{i} |^{2}}{2} + \frac{\sum_{m = 1}^{l} | p_{m} |^{2} + p_{i}}{2 l} - \frac{\sum_{m = 1}^{l} | p_{m} |^{2}}{2 l^{2}}$ (3)

Then the embedding is accomplished as $X = (MD)^{- 0.5} U H$ (4) where H = [h₁, h₂, ⋯ , h_i, ⋯]^T.
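As an illustration of equations (1) and (2), the following minimal NumPy sketch embeds items from a pairwise distance matrix. It assumes that the sums in equation (1) run over all n items, writes the double centering in matrix form, and drops numerically zero eigenvalues. The function names `embed_from_distances` and `pseudo_distance` and the tolerance `tol` are hypothetical choices made for this example, not part of the original method.

```python
import numpy as np

def embed_from_distances(P, tol=1e-9):
    """Embed n items from an n x n distance matrix P.

    Returns X (one column per item), the retained eigenvalues d,
    their eigenvectors U (as columns) and the signature vector m.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # Double centering of the squared distances: matrix form of equation (1),
    # assuming the sums run over all n items.
    J = np.eye(n) - np.ones((n, n)) / n
    H = -0.5 * J @ (P ** 2) @ J
    d, U = np.linalg.eigh(H)
    # Sort by decreasing magnitude and drop numerically zero eigenvalues.
    order = np.argsort(-np.abs(d))
    d, U = d[order], U[:, order]
    keep = np.abs(d) > tol * np.abs(d).max()
    d, U = d[keep], U[:, keep]
    m = np.sign(d)                           # diagonal of M (+1 or -1)
    X = np.sqrt(np.abs(d))[:, None] * U.T    # X = (MD)^0.5 U^T
    return X, d, U, m

def pseudo_distance(X, m, i, j):
    """Distance of equation (2) between embedded items i and j."""
    diff = X[:, i] - X[:, j]
    # Clamp at zero to guard against rounding noise.
    return np.sqrt(max(float(diff @ (m * diff)), 0.0))
```

For distances that come from points in a Euclidean space, all retained eigenvalues are positive, M is the identity, and `pseudo_distance` reproduces the input distances.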

2.2 The single layered SOINN

Given a data set {X} with data points X (1) , X (2) , . . ., $X (i) \in R^{d}$ , the learning task of GNG [7] and single layered SOINN [26] is to represent the data, after a single pass over the dataset, by neurons i with weights $W_{i} \in R^{d}$ . The topology structure of the dataset is preserved in the graph formed by the edges connecting the neurons.

The single layered SOINN [26] learning algorithm is comprised of three steps. First, when a new data item is “fired” into the network, a decision step determines whether a new neuron must be inserted into the set W for incremental learning of the topology representation. Second, the weight vectors of the neurons are updated by the self-organizing learning principle for vector quantization. Third, the topology structure is refined by label propagation and the clustering results are computed.

In step 1, each neuron controls a spherical area in the Euclidean space. The center of the sphere is the neuron itself, and the radius is the Euclidean distance to its farthest topological neighbor or, if it has no topological neighbor, the smallest distance to the other neurons. When a new sample V arrives and is controlled by neither its nearest neighbor W_s nor its second nearest neighbor W_t in W, a new neuron is added with weight V, and the weight vector is added to W.
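The insertion decision of step 1 can be sketched as follows. This is a minimal illustration under assumptions: the "nor" in the text is read as requiring the sample to lie outside the areas of both winners, the neuron weights are stored as rows of an array, and the names `similarity_threshold`, `needs_new_neuron` and the `neighbors` dictionary are hypothetical.

```python
import numpy as np

def similarity_threshold(i, W, neighbors):
    """Radius of the area controlled by neuron i.

    W is an (n, d) array of weights; neighbors maps a neuron index
    to the set of indices it shares an edge with.
    """
    dists = np.linalg.norm(W - W[i], axis=1)
    if neighbors.get(i):
        # Distance to the farthest topological neighbor.
        return max(dists[k] for k in neighbors[i])
    # No topological neighbor: smallest distance to any other neuron.
    return np.delete(dists, i).min()

def needs_new_neuron(V, W, neighbors):
    """Return (insert, s, t) for sample V, where s and t index the
    nearest and second nearest neurons."""
    dists = np.linalg.norm(W - V, axis=1)
    s, t = (int(k) for k in np.argsort(dists)[:2])
    outside_s = dists[s] > similarity_threshold(s, W, neighbors)
    outside_t = dists[t] > similarity_threshold(t, W, neighbors)
    # Assumption: a neuron is inserted only if V is outside both areas.
    return bool(outside_s and outside_t), s, t
```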

In step 2, an edge is first added to link W_s and W_t if one does not already exist. Then, if no new neuron was inserted in step 1 (that is, the input vector V is controlled by the winner neurons), W_s and its topological neighbors are updated towards the new sample as $W_{s} = W_{s} + \frac{1}{c_{s}} (V - W_{s})$ (5) $W_{k} = W_{k} + \frac{φ}{c_{s}} (V - W_{k}), \forall k \in neighbor (s)$ (6) where c_s is the number of times the neuron with weight W_s has been the winner neuron. φ is the learning step; in [26] it is a constant, φ = 0.01.
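Equations (5) and (6) amount to a running-mean style update, which the short sketch below illustrates. It assumes that the winner's count is incremented before the update so that a freshly inserted neuron with a count of zero does not cause a division by zero; the function name `update_winner` and its argument layout are hypothetical.

```python
import numpy as np

def update_winner(V, W, counts, s, neighbor_ids, phi=0.01):
    """Move winner s and its topological neighbors towards sample V.

    W is an (n, d) float array modified in place; counts holds the
    number of wins per neuron.
    """
    counts[s] += 1                         # assumption: count first, then update
    W[s] += (V - W[s]) / counts[s]         # equation (5)
    for k in neighbor_ids:
        W[k] += phi / counts[s] * (V - W[k])   # equation (6)
    return W
```

With a count of one the winner jumps directly onto the sample, and later samples move it by progressively smaller steps.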

Step 3 is run with a probability of 1/λ. It generates and refines the clustering result by assuming the convexity of the data. Since the objective of our framework is not clustering under a convexity assumption, the final step of the original ESOINN algorithm is not adopted in our framework.

3 Semi-supervised kernel construction and intrusion detection

We present the semi-supervised intrusion detection framework in the following four steps. The framework is also illustrated in Fig. 1.

Train the MSOINN neural network on the mixture of labeled and unlabeled data to obtain the weights of neurons and the edges representing topology structures.

Run flow algorithms on the Euclidean graph from the result of neural network training to gain similarity measures between the neurons.

Embed the neurons with the similarity measures calculated in the previous step.

Construct the kernel function for calculation of similarity matrix of labeled samples.

The key idea of the proposed framework is to transform the learned SOINN results into a kernel function. Through this process, the ESOINN results can be reused to enhance the generalization abilities of kernel machines.

3.1 Semi-supervised extension to the single layered SOINN

In this section, we propose inverse competitive learning, which takes labeled information into account to learn a more general data representation than ESOINN. Compared with the existing implementation [1], it can learn stable SSL representations even when given ill-labeled information, by incorporating self-organized learning into conflict processing.

The intuition behind competitive and inverse competitive learning is that when the label of W_s (i.e. the nearest neighbor to the input sample V) is not the same as V’s, the winner weight vector and its topological neighbors are moved in an inverse direction from the current sample. The implementation details are as follows.

The proposed MSOINN is comprised of three steps, as is the ESOINN algorithm. Step 1 is the same as in the unsupervised ESOINN, while most of the improvements are in step 2. In step 2, there are three situations according to the labels of the new sample V and of W_s and W_t.

W_s is labeled and labels of W_s and V do not conflict.

W_s is labeled and the labels of W_s and V conflict.

W_s is not labeled.

For situation 1, W_s and its neighbors are moved towards V with the magnitude of $\frac{1}{c_{s}} (V - W_{s})$ . The updating of age and c is the same as in ESOINN in this situation.

For situation 2, the presence of V is designed to affect both W_s and W_t. To ensure that samples falling into the ‘gap’ area are not weighted more heavily than other samples, two weight values are defined as follows. $w_{1} = \frac{| | V - W_{t} | |^{ζ}}{| | V - W_{s} | |^{ζ} + | | V - W_{t} | |^{ζ}}$ (7) $w_{2} = \frac{| | V - W_{s} | |^{ζ}}{| | V - W_{s} | |^{ζ} + | | V - W_{t} | |^{ζ}}$ (8)

If V = W_s = W_t, then w₁ = 0 and w₂ = 0. ζ ≥ 1 is the SSL parameter. After the weights w₁ and w₂ are calculated, W_s and W_t are updated with the magnitudes $\frac{w_{1}}{c_{s}} (V - W_{s})$ and $\frac{w_{2}}{c_{t}} (V - W_{t})$ respectively. The reason for this weighting strategy is that information from a sample V closer to the winner node is considered more credible. After that, c and age are updated according to the weights as c_s = c_s + w₁, c_t = c_t + w₂, age_s,k = age_s,k + w₁ (k ∈ η_s), age_t,k = age_t,k + w₂ (k ∈ η_t).
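The weighting of equations (7) and (8) can be sketched as follows. The sketch assumes positive counts c_s and c_t, returns the step vectors exactly as written in the text and leaves the direction in which they are applied to the caller; the function names `conflict_weights` and `conflict_steps` are hypothetical.

```python
import numpy as np

def conflict_weights(V, Ws, Wt, zeta=1.0):
    """Weights w1, w2 of equations (7) and (8)."""
    ds = np.linalg.norm(V - Ws) ** zeta
    dt = np.linalg.norm(V - Wt) ** zeta
    total = ds + dt
    if total == 0.0:
        # V coincides with both winners: both weights are zero.
        return 0.0, 0.0
    return dt / total, ds / total

def conflict_steps(V, Ws, Wt, c_s, c_t, zeta=1.0):
    """Step vectors for W_s and W_t and the updated counts.

    Assumes c_s > 0 and c_t > 0.
    """
    w1, w2 = conflict_weights(V, Ws, Wt, zeta)
    step_s = w1 / c_s * (V - Ws)
    step_t = w2 / c_t * (V - Wt)
    # Counts grow by the fractional weights rather than by one.
    return step_s, step_t, c_s + w1, c_t + w2
```

Because w₁ and w₂ sum to one whenever they are defined, a conflicting sample contributes a total count of one, shared between the two winners.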

For situation 3, the winner neuron is labeled with the new sample’s label, label conflicts between the winner neuron and its neighbors are checked, and the edges causing conflicts are removed. The rest of situation 3 is the same as situation 1.

3.2 Embedding neurons with graph similarity

In this section we elaborate our method of embedding neurons with graph similarity. Embedding data items with graph similarity is not new; it is used, for example, in Isomap [28]. The drawback of the graph similarity used in Isomap is that the graph is constructed from all the data items in the input space, and graph algorithms are often slow. In our case, only the neurons are embedded with graph similarity.

In order to calculate the graph similarity through flow network algorithms, weights of edges should first be assigned. We use the following choice of weight for an edge from the topology learning results $d_{ij} = \frac{1}{1 + | | W_{i} - W_{j} | |^{2}}$ (9)

This is a logistic function with the squared distance as the input. Other kernel functions, such as the popular RBF function, can be chosen as well, depending on the specific application settings. After that, the cut value between points can be calculated using algorithms such as Edmonds-Karp [4]. The results are stored as a matrix C = {c_ij}, where c_ij is the cut value between neurons i and j. Since a larger cut value means higher similarity, we empirically chose a transfer function that converts cut values to distances as $P_{g} = \frac{1}{1 + C}$ (10)
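The path from edge weights to graph distances can be sketched as below. This is a minimal illustration under assumptions: each undirected edge is modelled as two directed arcs with the capacity of equation (9), the cut value between two neurons is obtained as the maximum flow between them with a plain Edmonds-Karp loop, equation (10) is applied element-wise, and the diagonal of P_g is set to zero so that each neuron has zero distance to itself. The function names are hypothetical.

```python
from collections import deque
import numpy as np

def edge_capacities(W, edges):
    """Capacity matrix from equation (9) for the given edge list."""
    n = len(W)
    cap = np.zeros((n, n))
    for i, j in edges:
        cap[i, j] = cap[j, i] = 1.0 / (1.0 + np.sum((W[i] - W[j]) ** 2))
    return cap

def max_flow(cap, source, sink):
    """Edmonds-Karp: augment along shortest paths found by BFS."""
    n = cap.shape[0]
    residual = cap.astype(float).copy()
    flow = 0.0
    while True:
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u, v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return flow                    # no augmenting path is left
        bottleneck, v = float("inf"), sink
        while v != source:                 # smallest residual on the path
            bottleneck = min(bottleneck, residual[parent[v], v])
            v = parent[v]
        v = sink
        while v != source:                 # push the flow along the path
            residual[parent[v], v] -= bottleneck
            residual[v, parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

def cut_distance_matrix(W, edges):
    """Cut matrix C and distance matrix P_g of equation (10)."""
    cap = edge_capacities(W, edges)
    n = cap.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            C[i, j] = C[j, i] = max_flow(cap, i, j)
    P_g = 1.0 / (1.0 + C)
    np.fill_diagonal(P_g, 0.0)             # assumption: zero self-distance
    return C, P_g
```

Computing all pairs this way costs one maximum flow per neuron pair, which stays affordable only because the graph contains neurons rather than all data items.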

The embedding of neurons can be carried out by the procedure introduced in Section 2.1. That is, first calculate the inner product matrix H_g, then the eigendecomposition H_g = UDU^T = X^TX, and then the embedding X = (MD)^0.5U^T.

3.3 Kernel function construction from MSOINN

According to [13, 29], given a distance vector P to the neurons, the embedding can be calculated in the following two steps. The inner product vector H is calculated using equation (3). Then the embedding is obtained as $X = (MD)^{- 0.5} U H$ (11)

In order to embed an item from the input data space with the above procedure, the similarities between the input point and the neurons must be calculated first. In order to stress the locality properties of Voronoi regions [7, 20], we choose the similarity measure specified by the following equation $a_{i} = exp (β | | W - W_{i} | |) - 1$ (12) where β is a positive real number, W is the weight of the input pattern, and a_i is the ith element of the vector A, which holds the similarity measures between the current input pattern and all neurons. The measure is illustrated in Fig. 2. This choice of similarity measure also means that the nearest neighbor search is avoided and, most importantly, that the result of the metric learning and embedding is smooth. Denote S = {s_i}, where s_i is the normalized similarity between neuron i and the input. Then the embedding is carried out as follows. $s_{i} = \frac{\frac{1}{a_{i}}}{\sum_{n} \frac{1}{a_{n}}}$ (13) $P = P_{g} S$ (14)
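Equations (12) to (14) can be sketched in a few lines of NumPy. The sketch assumes that neuron weights are stored as rows of an array and that P_g is the symmetric neuron distance matrix. When the input coincides with a neuron, a_i is zero and equation (13) is undefined; the sketch then gives that neuron all of the weight, which is an assumption of this example. The function name `distances_to_neurons` is hypothetical.

```python
import numpy as np

def distances_to_neurons(W_in, neurons, P_g, beta=1.0):
    """Graph distances from an input pattern to all neurons.

    Returns the distance vector P of equation (14) and the
    weight vector S of equation (13).
    """
    # Equation (12): zero for a coinciding neuron, growing with distance.
    a = np.exp(beta * np.linalg.norm(neurons - W_in, axis=1)) - 1.0
    zero = a == 0.0
    if zero.any():
        # Assumption: the coinciding neuron receives all of the weight.
        s = np.zeros_like(a)
        s[np.argmax(zero)] = 1.0
    else:
        inv = 1.0 / a
        s = inv / inv.sum()                # equation (13)
    return P_g @ s, s                      # equation (14)
```

Every neuron contributes to P, with nearby neurons dominating, so no explicit nearest neighbor search is required and the result varies smoothly with the input.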

Set P_i = 1 and ∀j ≠ i, P_j = 0 if S_i is equal to 0. After that, the inner product vector H can be calculated from P, and the embedding of the input pattern can be carried out with the linear embedding procedure described in equation (3). However, if a kernel function is constructed to define the similarity between data items, the explicit embedding results are not necessary. The kernel function is constructed as $Δ_{ij} = (MD)^{- 0.5} U (H_{i} - H_{j})$ (15) $k (X_{i}, X_{j}) = \sqrt{Δ_{ij}^{T} M Δ_{ij}}$ (16)

This kernel function can be used to train kernel machines. In our work we chose the SVM because of its ability to find the optimal separating hyperplane.

3.4 Stability of the kernel space

It is possible for the kernel function in equation (16) to generate an indefinite matrix. Though SVM has shown good performance in Minkowski space, a semi-definite kernel matrix is often required to ensure the stability of training kernel machines. The “reflection” technique in [13] is employed to solve this problem. First, during training, a reflected inner product matrix is produced as $\bar{H} = UMDU^{T}$ , which is semi-definite because all the signatures (i.e. the diagonal entries of MD) are non-negative. Then any data item X is projected into the semi-definite kernel space by $\bar{X} = MX$ , and it is shown in [13] that training a kernel machine for linear classification in the reflected space is equivalent to training it in the original space.

As a result, the kernel function is constructed as $k (X_{i}, X_{j}) = | | X_{i} - X_{j} | | = | | (MD)^{- 0.5} U (H_{i} - H_{j}) | |$ (17)
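A minimal sketch of equation (17) follows. It assumes that the eigenvectors are stored as columns of U, so the projection uses the transpose, that the reflection replaces each eigenvalue by its absolute value, and that numerically zero eigenvalues are discarded. The factory name `make_kernel` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def make_kernel(d, U, tol=1e-9):
    """Build k(h_i, h_j) from the eigenvalues d and eigenvectors U
    (columns) of the neuron inner product matrix."""
    d = np.asarray(d, dtype=float)
    keep = np.abs(d) > tol * np.abs(d).max()
    # Reflected scaling: |D|^-0.5 takes the place of (MD)^-0.5.
    scale = 1.0 / np.sqrt(np.abs(d[keep]))
    Ut = U[:, keep].T

    def k(h_i, h_j):
        # Norm of the difference between the two embedded points.
        return float(np.linalg.norm(scale * (Ut @ (h_i - h_j))))

    return k
```

When h_i and h_j are columns of a positive semi-definite inner product matrix, the value returned equals the Euclidean distance between the corresponding embedded items, which gives a quick sanity check for an implementation.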

The complete algorithm for metric learning and embedding is given in Algorithm 1.

Algorithm 1 Kernel Construction with MSOINN

Input: Sequence {S}, max_age, λ, dimension l

Output: Kernel k (·)

1: Initialize set N with the first 2 samples from {S}

2: While {S} is not empty do

3: Draw one input vector as V from {S}.

4: Find winner W_s and second winner W_t from neuron set N.

5: Calculate whether to insert new neuron by Delaunay triangulation principle.

6: if Need to insert neuron then

7: Add a neuron k with weight V to N, and set winning times c_k = 0

8: else

9: if Input and W_s label conflicts then

10: Perform negative competitive learning.

11: Increase the winning times as c_s (t) = c_s (t - 1) + ω.

12: else

13: Perform positive competitive learning.

14: Increase the winning times as c_s (t) = c_s (t - 1) +1.

15: end if

16: Update age of edges as ${age}_{j, s} (t) = η_{j, s} + {age}_{j, s} (t - 1)$

17: for all edge age_m,n > max_age do

18: Remove the edge connecting m and n.

19: end for

20: end if

21: if number of input samples is a multiple of λ then

22: Remove neurons with fewer than 1 neighbor.

23: end if

24: end while

25: Calculate the embeddings of neurons with equation (1) with parameter l.

26: Output the kernel function defined by equation (17)
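The housekeeping in steps 16 to 22 of Algorithm 1 can be sketched with plain Python containers. The sketch assumes that edges are stored in a dictionary keyed by `frozenset({i, j})`, that only edges incident to the winner s are aged, and that the age increment is passed in by the caller (1 for an ordinary update or a fractional weight in the conflict case). The names `age_and_prune_edges` and `remove_isolated` are hypothetical.

```python
def age_and_prune_edges(ages, s, increment, max_age):
    """Age the edges incident to winner s, then drop edges whose
    age exceeds max_age (steps 16 to 19)."""
    for edge in ages:
        if s in edge:
            ages[edge] += increment
    return {edge: age for edge, age in ages.items() if age <= max_age}

def remove_isolated(neuron_ids, ages):
    """Keep only neurons that still have at least one edge (step 22)."""
    connected = set().union(*ages.keys())
    return [i for i in neuron_ids if i in connected]
```

In a full training loop, `remove_isolated` would be called once every λ input samples, after the edge ages have been updated.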

4 Experiments

The experiments are carried out on the NSL-KDD [27] dataset. The dataset is comprised of three parts, namely the training dataset, the test dataset and a ‘test21’ dataset which contains a small number of samples that are hard to classify. The proposed method is implemented with numpy [3]. The methods used for comparison include two supervised learning methods, one one-class learning method and two SSL methods. The SSL method TSVM is implemented with Svmlight [16] and trained with an RBF kernel. Label propagation and Support Vector Machine are implemented with Sklearn [25]. The supervised methods, SMO and Adaboost, are run with Weka [15]. Sequential Minimal Optimization (SMO) and Adaboost are chosen because of their high efficiency and because they are already used in intrusion detection research. The supervised methods provide baselines for the false alarm rate. Label propagation and TSVM are the mainstream SSL methods. Besides, OCSVM is another baseline method which is often used in outlier detection applications including intrusion detection. The parameters are selected on the training dataset with cross validation and grid search. The experiments are carried out on a computer with 8G RAM and a 64bit OS. The comparison results are listed in Tables 1 and 2.

On the test dataset (in Table 1), where unlabeled data is abundant, the proposed method shows a significant advantage over the other methods (17.9% better than TSVM in detection rate), which indicates better generalization ability when large amounts of unlabeled data are available. Compared with the other baseline method, OCSVM, the proposed method shows better stability when the labeled information is not sufficient. In particular, when the labeled dataset size is less than 251, the OCSVM method is unacceptable because of its high false alarm rate. This is because, when the labeled dataset is not sufficient, the assumption of OCSVM that normal samples reside in a spherical area does not hold, while the proposed method makes no such assumption about the dataset. Label propagation failed the experiment when the dataset is large due to its high space complexity, whereas the proposed method is constant in space complexity. The average training time of the proposed method is below 15 seconds, while TSVM usually requires several hours to finish training. In summary, the experiments on the NSL-KDD test dataset show the better generalization ability and the improved space complexity of the proposed method.

On the test21 dataset, the overall trend is similar but more chaotic. This is because the distribution of the data is skewed. One piece of evidence is that OCSVM can reach a false alarm rate as low as 14.9% with a labeled dataset of only 2 samples. This means that the normal data is highly concentrated in a small area. Despite this skewed distribution, the proposed method is able to outperform the existing supervised and SSL methods. In summary, the experiments on the NSL-KDD dataset show that the proposed method can learn accurate classifiers under concept drift.

To demonstrate the necessity of incorporating dimensionality reduction techniques as described in the proposed framework, the framework is also compared with MSOINN alone. The proposed framework, which combines the semi-supervised MSOINN and linear embedding, shows an improvement over MSOINN in most cases. This is because of the effect of dimensionality reduction in the kernel construction process. In summary, the proposed method is a space-efficient alternative to TSVM. In the area of intrusion detection, the proposed method provides low space complexity and requires only a small number of labeled samples.

5 Conclusion

In this paper, a semi-supervised learning method based on incremental kernel construction is proposed. First, an SSL extension to ESOINN is proposed to process the mixture of labeled and unlabeled data. Second, a knowledge reuse framework is proposed to construct a kernel function utilizing the knowledge stored in the trained neural network. Finally, we use the kernel function to train SVMs and apply them to intrusion detection. The proposed method shows a significant advantage over supervised learning methods and TSVM, and it is more stable than OCSVM when the labeled dataset is small. Though not tested by experiments, the proposed framework can be used in multi-class SSL problems. Besides, the proposed framework should work with other topology learning algorithms and applications. For example, it can be used in combination with GNG, provided that semi-supervised extensions are made. These algorithmic and application extensions are to be explored in future work.

Acknowledgment

This work was partly supported by the National Natural Science Foundation of China (No. 61301148 and No. 61272061), the Fundamental Research Funds for the Central Universities of China (No. 531107040263, 531107040276), and the Hunan Natural Science Foundation of China.

References

1. Beyer and Cimiano, Online semi-supervised growing neural gas, International Journal of Neural Systems 22(5) (2012), 1250023.

2. Chapelle, Schölkopf, Zien, et al., Semi-supervised learning, volume 2, MIT Press, Cambridge, 2006.

3. NumPy Developers, NumPy, SciPy Developers, 2013.

4. Edmonds and Karp R.M., Theoretical improvements in algorithmic efficiency for network flow problems, Journal of the ACM (JACM) 19(2) (1972), 248–264.

5. Elhag, Fernández, Bawakid, Alshomrani and Herrera, On the combination of genetic fuzzy systems and pairwise learning for improving detection rates on intrusion detection systems, Expert Systems with Applications 42(1) (2015), 193–202.

6. Ertekin, Bottou and Giles C.L., Nonconvex online support vector machines, Pattern Analysis and Machine Intelligence, IEEE Transactions on 33(2) (2011), 368–381.

7. Fritzke, et al., A growing neural gas network learns topologies, Advances in Neural Information Processing Systems 7 (1995), 625–632.

8. Furao and Hasegawa, An incremental network for on-line unsupervised classification and topology learning, Neural Networks 19(1) (2006), 90–106.

9. Furao, Ogura and Hasegawa, An enhanced self-organizing incremental neural network for online unsupervised learning, Neural Networks 20(8) (2007), 893–903.

10. Gautam and Nair, Entropy variation and J48 algorithm based intrusion detection system for cloud computing, American Society of Civil Engineers 7(1) (2007), 10089–10093.

11. Ghosh, Roy and Ghosh, Semisupervised change detection using modified self-organizing feature map neural network, Applied Soft Computing 15 (2014), 1–20.

12. Görnitz, Kloft M.M., Rieck and Brefeld, Toward supervised anomaly detection, Journal of Artificial Intelligence Research (2013).

13. Graepel, Herbrich, Bollmann-Sdorra and Obermayer, Classification on pairwise proximity data, Advances in Neural Information Processing Systems, 1999, pp. 438–444.

14. Gupta K.K., Nath and Kotagiri, Layered approach using conditional random fields for intrusion detection, Dependable and Secure Computing, IEEE Transactions on 7(1) (2010), 35–49.

15. Hall, Frank, Holmes, Pfahringer, Reutemann and Witten I.H., The weka data mining software: An update, ACM SIGKDD Explorations Newsletter 11(1) (2009), 10–18.

16. Joachims, Svm-light support vector machine, 2002. URL=http://svmlight.joachims.org, 2009.

17. Kamiya, Ishii, Furao and Hasegawa, An online semi-supervised clustering algorithm based on a self-organizing incremental neural network, In: Neural Networks, 2007-IJCNN 2007, International Joint Conference on, 2007, pp. 1061–1066. IEEE.

18. Günev Kayacik, Heywood, et al., On the capability of an SOM based intrusion detection system, In: Neural Networks, 2003 Proceedings of the International Joint Conference on, volume 3, 2003, pp. 1808–1813. IEEE.

19. Kim, Lee and Kim, A novel hybrid intrusion detection method integrating anomaly detection with misuse detection, Expert Systems with Applications 41(4) (2014), 1690–1700.

20. Kohonen, The self-organizing map, Neurocomputing 21(1) (1998), 1–6.

21. [...], Phung, Nguyen and Venkatesh, Fast one-class support vector machine for novelty detection, In: Advances in Knowledge Discovery and Data Mining, 2015, pp. 189–200. Springer.

22. [...], Yang and Li, Performance analysis and optimization for spmv on gpu using probabilistic modeling, Parallel and Distributed Systems, IEEE Transactions on 26(1) (2015), 196–205.

23. Maximo V.R., Quiles M.G. and Nascimento M.C.V., A consensus-based semi-supervised growing neural gas, In: Neural Networks (IJCNN), 2014 International Joint Conference on, 2014, pp. 2019–2026. IEEE.

24. Mingqiang, Hui and Qian, A graph-based clustering algorithm for anomaly intrusion detection, Computer Science & Education (ICCSE), 2012 7th International Conference on, 2012, pp. 1311–1314. IEEE.

25. Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, Blondel, Prettenhofer, Weiss, Dubourg, Vanderplas, Passos, Cournapeau, Brucher, Perrot and Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011), 2825–2830.

26. Shen, Yu, Sakurai and Hasegawa, An incremental online semi-supervised active learning algorithm based on self-organizing incremental neural network, Neural Computing and Applications 20(7) (2011), 1061–1074.

27. Tavallaee, Bagheri, Lu and Ghorbani A.-A., A detailed analysis of the kdd cup 99 data set, In: Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defence Applications 2009 (2009).

28. Tenenbaum J.B., De Silva and Langford J.C., A global geometric framework for nonlinear dimensionality reduction, Science 290(5500) (2000), 2319–2323.

29. Torgerson W.S., Theory and methods of scaling, 1958.

30. Truong T.K., Li and Xu, Chemical reaction optimization with greedy strategy for the 0–1 knapsack problem, Applied Soft Computing 13(4) (2013), 1774–1780.

31. Wei, Chen, Guo, Jing and Tao, SOM-based intrusion detection for SCADA systems, Electronics and Electrical Engineering (2015), 57.

32. [...], Li, He and Zhang, A hybrid chemical reaction optimization scheme for task scheduling on heterogeneous computing systems, 2014.