Abstract
An intrusion detection system (IDS) can reduce the losses caused by intrusions and protect users’ information security. The effectiveness of an IDS depends on the performance of the algorithm used to identify intrusions. Traditional machine learning algorithms handle intrusion data poorly because such data are high-dimensional, nonlinear, and imbalanced. Therefore, this paper proposes an
Introduction
With the development of network technology, the rapid growth of network scale brings people not only convenience but also risks and challenges. Hackers invade user systems by looking for security loopholes in the network in order to steal private data, occupy server resources, and obtain illegal benefits, which causes serious harm to users. Long et al. [1] described the most common website vulnerabilities and proposed an effective network vulnerability detection algorithm. However, vulnerability detection is a passive defense strategy; to deal with network attacks more effectively, it needs to be combined with active intrusion detection technology. Network intrusion detection is one of the most critical and widely used system security strategies. By learning a large number of intrusion characteristics and performing real-time detection, an IDS can monitor abnormal traffic, analyze various network protocols and user behaviors, and detect intrusion attacks accurately, thereby avoiding losses [2]. In recent years, with the growing demand for the Internet, the amount of data transmitted over the network has increased exponentially. The contradiction between massive data processing and insufficient detection technology is the main problem facing network security today [3]. Therefore, strengthening research on intrusion detection technology is of great significance for preventing information security problems.
At present, machine learning is the most popular approach to intrusion detection. Tong et al. [4] analyzed machine learning algorithms for botnet DDoS attack detection and concluded that unsupervised learning (USML) can effectively distinguish botnet traffic from normal network traffic, which is of great significance to computer security and related fields. Krishnan et al. [5] proposed an improved region-based intrusion detection system that identifies malicious nodes in the presence of false error reports, improves the packet delivery rate, and reduces the false alarm rate. Lu et al. [6] proposed an intrusion detection algorithm combining principal component analysis and k-means clustering, which obtained a higher accuracy and a lower false alarm rate. Lin et al. [7] proposed an algorithm based on cluster centers and nearest neighbors to extract representative features for detecting normal connections and attack behaviors, which improved the classification accuracy. Jiang et al. [8] proposed a two-level hybrid algorithm in which a filter feature selection method reduces the feature dimensionality, a hypergraph is introduced to rescreen the obtained feature subsets, and random forest and improved k-means algorithms are used for classification. Chen et al. [9] proposed RF-XGB, a model based on random forest and extreme gradient boosting trees, which maintains accuracy while effectively reducing the training time. Elhefnawy et al. [10] proposed a hybrid nested genetic fuzzy algorithm (HNGFA) that also considers feature selection. These two algorithms greatly improve the classification accuracy of small classes. Yao et al. [11] proposed HMLD, a framework based on hybrid multi-level data mining, to solve the problem of data imbalance in multi-classification. The KDDCup99 data set was used to verify the performance of HMLD, and the accuracy reached 96.70%, a considerable improvement over the algorithms proposed in the same period. Meryem et al. [12] proposed a hybrid method combining marking and classification, which improves the accuracy of intrusion detection, reduces the false alarm rate, and achieves a good detection rate for unknown attacks.
Deep learning is well suited to varied data because it can not only learn from given features but also automatically extract features from data to accomplish a classification task [13]. With the increasing complexity of the network environment, traditional machine learning has shown its limitations, and the application of deep learning to cyberspace security has gradually attracted the attention of scholars at home and abroad [14]. Deep learning is a new field of machine learning research. It combines low-level features to discover distributed feature representations of data, resulting in more abstract high-level representations of attribute categories or features [15]. It builds neural networks that imitate the way the human brain analyzes and learns.
Some researchers have applied deep learning to intrusion detection and achieved good results, demonstrating the feasibility of deep learning in network security. Zhou et al. [16] proposed a network intrusion detection method based on AutoEncoder and ResNet, which performed well in accuracy, true positive rate, and false negative rate. To improve the detection of unknown attacks, Sohi et al. [17] proposed an algorithm that uses a recurrent neural network (RNN) to discover complex attack patterns and generate similar data; the algorithm has strong generalization ability. Srikanthyadav et al. [18] proposed a deep learning algorithm for network intrusion detection using a deep AutoEncoder, optimized the training model with the Adam optimization method, and achieved high accuracy on the NSL-KDD and CICIDS 2017 data sets in the TensorFlow framework. Feng et al. [19] proposed an intrusion detection model based on a feedforward neural network to reduce model complexity; the training time is reduced while the classification efficiency is maintained. Chakravarthi et al. [20] proposed an efficient intrusion detection algorithm, AE, based on deep automatic coding. Applying the AE algorithm helps SVM and dense NN models reduce the false negative rate in anomaly detection and attack classification.
With the advancement of deep learning theory and the improvement of numerical computing equipment, the convolutional neural network (CNN), a deep learning model, has been widely used in many fields such as computer vision and natural language processing. Compared with traditional machine learning algorithms, it has a higher accuracy rate and a lower false alarm rate in most cases [21]. In the era of big data, intrusion detection data are more complex and diverse, and increasingly show class imbalance, high dimensionality, and nonlinearity [20]. The huge amount of calculation stretches traditional machine learning algorithms beyond their capacity [14]. Parameter sharing and local connections allow the CNN model to maintain strong learning and recognition ability when facing high-dimensional nonlinear data [22].
To solve the problems in the field of intrusion detection and improve detection performance, this paper proposes an intrusion detection algorithm based on an image enhancement convolutional neural network. The contributions of the paper are summarized below.
(1) To solve the problem of data imbalance and achieve a higher accuracy rate in application, this paper proposes generating new samples to increase the number of samples in small classes. In this way, a balanced data set is established for training the intrusion detection model.
(2) To make efficient use of the advantages of CNN in image processing, a method of converting one-dimensional data samples into two-dimensional images is proposed. Using this method, the original one-dimensional features of the intrusion detection data set are converted into two-dimensional gray images.
(3) To address the high dimensionality and complexity of existing data sets, the CNN model is used to extract the data features of the converted image samples. After training, a model with good classification performance is obtained.
The rest of this paper is organized as follows. Section 2 describes some of the related technologies. Section 3 details the proposed algorithm ID-IE-CNN. In Section 4, the performance of algorithm ID-IE-CNN is verified by experiments with KDDCup99 data set. Finally, the conclusions are summarized in Section 5.
Related technologies
Data preprocessing
In order to adapt to the ID-IE-CNN algorithm proposed in this paper, the data set is specially preprocessed, and the process is shown in Fig. 1.

Data processing.
In the collected data, numerical and character data often coexist. Inconsistent data types increase the difficulty of data processing and classification, and can even make it impossible to train the model or test the classification. Therefore, the data types are unified during preprocessing. Since the CNN model can only process numerical data, the character data are converted into numerical data. The conversion process is shown in Fig. 1.
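As a minimal sketch of the character-to-numeric conversion, the snippet below maps each distinct string in a column to an integer code. The coding scheme (integers assigned in order of first appearance) and the helper name `encode_column` are assumptions made for illustration; the exact scheme used in Fig. 1 is not reproduced in the text.

```python
def encode_column(values):
    """Map each distinct string to an integer code, in order of first appearance.

    Hypothetical helper: the paper does not state which coding scheme it uses.
    """
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)  # next unused integer
        encoded.append(codes[v])
    return encoded, codes


# Illustrative column of character data (values chosen for the example).
protocols = ["tcp", "udp", "tcp", "icmp", "udp"]
encoded, mapping = encode_column(protocols)
```

The returned mapping should be stored so that the same codes are applied to the validation and test data.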
Image enhancement
Due to the class imbalance in the intrusion detection data set, some classes contain very few samples. Class imbalance is one of the key factors that restrict classification accuracy: if a class has too few samples, the information that can be learned from it is quite limited. Therefore, an oversampling method is first used to enhance the amount of sample information, and the one-dimensional intrusion detection data are then converted into two-dimensional image data. The oversampling steps are as follows.
(a) Select a sample $x_i$ from a small class and use the Euclidean distance to find its k nearest neighbors, denoted $x_{ij}$, $j \in \{1, \ldots, k\}$.
(b) Randomly select one of the neighboring samples $x_{ij}$ and use a random number $\varsigma \in (0, 1)$ to synthesize a new sample $x_{new}$.
(c) Repeat step (b) to obtain enough samples of the small class to balance the data set.
The CNN algorithm has high accuracy in image classification. To suit the ID-IE-CNN algorithm, the processed data set must be transformed from one-dimensional data into two-dimensional image data. To preserve the integrity of the data features, this paper proposes a data-filling method that expands the features of the data samples and converts the expanded data into two-dimensional image data, as shown in Fig. 2.
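The oversampling steps described above can be sketched as follows. The synthesis rule is not reproduced in the text, so this sketch assumes the usual SMOTE-style linear interpolation, x_new = x_i + ς (x_ij − x_i); the function names and the toy data are hypothetical.

```python
import math
import random


def k_nearest(sample, others, k):
    """Return the k samples in `others` closest to `sample` (Euclidean distance)."""
    return sorted(others, key=lambda o: math.dist(sample, o))[:k]


def synthesize(minority, k, n_new, rng):
    """Create n_new synthetic minority samples.

    Assumption: x_new = x_i + s * (x_ij - x_i), the SMOTE-style interpolation.
    """
    new_samples = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x_i = minority[i]
        others = minority[:i] + minority[i + 1:]
        x_ij = rng.choice(k_nearest(x_i, others, k))  # one random neighbour
        s = rng.random()  # drawn from [0, 1); the text specifies (0, 1)
        new_samples.append([a + s * (b - a) for a, b in zip(x_i, x_ij)])
    return new_samples


rng = random.Random(0)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy small class
synthetic = synthesize(minority, k=2, n_new=6, rng=rng)
```

Because every synthetic sample lies on the segment between two real minority samples, the new points stay inside the region already covered by the small class.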

Data conversion.

Traditional neural network structure.
While retaining all the information of the original data set, this method expands the dimensions of the sample features with values drawn from a random normal distribution, so that the effective information of the original data set is preserved to the greatest extent. The data conversion algorithm is as follows.
(a) Select a sample x from the data set X.
(b) Use the random normal distribution function to expand x from n columns to n + m columns.
(c) Reshape x into a p*q matrix, where p*q = n + m.
(d) Repeat steps (a), (b), and (c) until all samples in X have been traversed.
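The conversion of a single sample might look like the sketch below. It assumes that the m filler values come from a standard normal distribution (mean 0, standard deviation 1), that they are appended after the original features, and that the matrix is filled row by row; none of these details is stated in the text, and `to_image` is a hypothetical name.

```python
import random


def to_image(x, m, p, q, rng):
    """Pad x with m normally distributed values, then reshape it to p rows by q columns."""
    if p * q != len(x) + m:
        raise ValueError("p * q must equal n + m")
    padded = list(x) + [rng.gauss(0.0, 1.0) for _ in range(m)]  # n + m values
    return [padded[r * q:(r + 1) * q] for r in range(p)]  # row-major reshape


rng = random.Random(42)
sample = [float(v) for v in range(41)]  # stand-in for one 41-feature record
image = to_image(sample, m=1, p=6, q=7, rng=rng)
```

With n = 41 and m = 1 the result is a 6*7 matrix whose first 41 entries are the original features, so no original information is lost.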
The CNN model is an improvement on the traditional neural network and has made outstanding contributions to the field of image recognition. A traditional neural network has only an input layer, a hidden layer, and an output layer, with numerous weights and complex calculations, and its input features are selected manually. The structure of the traditional neural network is shown in Fig. 3.
Although the CNN model is still a hierarchical network, the function and form of the layers have been improved to solve the problems of the large amount of calculation and the loss of structural information of the artificial neural network. In the face of massive and complex data sets, CNN has unique advantages. CNN model consists of data input layer, convolutional calculation layer, excitation layer, pooling layer, fully connected layer and output layer [23], as shown in Fig. 4. It does not need to select features manually, and can reduce the number of free parameters in the network through local receptive field, convolution kernel weight sharing and down-sampling technology of time or space, so as to reduce the complexity of feature extraction and quantitative reconstruction, and effectively deal with high-dimensional and non-linear input data.

Structure of CNN model.
(1) Data input layer
Based on the principle of retaining as much of the original information as possible and adding as few irrelevant features as possible, when the number of features n in the original data set is odd, the data set is expanded to the even dimension n + 1, and the data conversion method is used to convert the data set into p*q-dimensional image features for the input of the CNN model, where p*q = n + 1.
(2) Convolutional layer
The recognition process of CNN is like the human perception of external images through a local receptive field. For an image, each part of the neuron perceives the image locally, and converges the local perception to a higher level to obtain the global visual field information. Therefore, CNN can effectively handle high-dimensional nonlinear data input. The convolutional layer is the core part of the CNN model. The main feature is the sharing of local links and convolution kernel weights, which can greatly reduce the complexity of the network model. The main function of the convolutional layer is to use multiple convolution kernels to extract features through convolution operations to abstract higher-level features. The convolution operation refers to the inner product of the partial image and the convolution kernel matrix, as shown in Formula (2) [24].
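To illustrate the inner product between a local image patch and the kernel, here is a minimal sketch of a single-channel convolution with zero padding that keeps the output the same size as the input. It is not the paper's Formula (2), which is not reproduced here: the bias term is omitted, and the kernel is applied without flipping, as deep learning frameworks usually do.

```python
def conv2d_same(image, kernel):
    """Slide `kernel` over `image`; zero padding keeps the output size equal to the input."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    top, left = (kh - 1) // 2, (kw - 1) // 2  # padding above and to the left
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - top, c + j - left
                    if 0 <= rr < h and 0 <= cc < w:  # outside the image counts as zero
                        total += image[rr][cc] * kernel[i][j]
            out[r][c] = total  # inner product of the local patch and the kernel
    return out


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
same = conv2d_same(image, identity)  # reproduces the input
summed = conv2d_same(image, box)     # sums each 3*3 neighbourhood
```

The same nine kernel weights are reused at every position, which is the weight sharing that keeps the number of parameters small.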
(3) Activation function
The activation function is essential for an artificial neural network to learn complex, nonlinear functions. Without an activation function, a neural network can only solve linearly separable problems and cannot handle complex nonlinear problems. The activation function strengthens the nonlinear expressive ability of the network model and allows it to fit complex features well. Commonly used activation functions include Sigmoid, Tanh, and ReLU [23]. The activation functions are shown in Fig. 5.

Activation functions.
The Tanh function shows the best performance in the ID-IE-CNN algorithm. Tanh is the hyperbolic tangent function, given in Formula (3) [25].
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{3}$$
The effect comparison of Sigmoid, Tanh, and ReLU is shown in Fig. 6. In most CNN models, ReLU is generally selected as the activation function [21]. However, ReLU treats eigenvalues less than zero as zero, which will increase the sparsity of the network and lose some information of the original data set. In fact, the more available information, the better learning effect. And the Tanh function can help the CNN model fit more sample information and improve the classification accuracy. Therefore, the Tanh function is used as the activation function in ID-IE-CNN algorithm. The comparative experiment also shows that Tanh makes the classification accuracy better.

The loss rate of activation functions with different epochs.
In Fig. 6, the abscissa (epoch) represents the number of model iterations, and the ordinate represents the loss rate of the model on the validation set. Fig. 6 shows that the model converges fastest and reaches the smallest loss rate with the Tanh function.
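The difference between the three activation functions on negative inputs can be checked with a few lines of code. This is only an illustration of the standard definitions, not the comparative experiment behind Fig. 6.

```python
import math


def sigmoid(x):
    """Logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent, output in (-1, 1); keeps the sign of the input."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def relu(x):
    """Rectified linear unit; every negative input becomes zero."""
    return max(0.0, x)


inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
relu_out = [relu(x) for x in inputs]
tanh_out = [tanh(x) for x in inputs]
```

For the two negative inputs ReLU returns zero, whereas Tanh returns distinct negative values, so the information carried by negative feature values is still passed on to later layers.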
(4) Pooling layer
Convolution abstracts higher-level features but does not reduce the data dimensions. If classification is performed directly after the convolution operation, overfitting is likely and the classification performance suffers. Therefore, a pooling layer is needed. The pooling layer retains the main features of the data sample while reducing the dimension; it compresses the feature maps and parameters, reduces the computation of the network model, helps avoid overfitting, and enhances the fault tolerance of the model. The commonly used pooling operations in CNN are max pooling and average pooling. Suppose the size of the pooling window is kernel*kernel; the expression of the pooling layer is shown in Formula (4) [26].
Max pooling selects the largest element of the modified feature map within the defined pooling window and uses it as the output of the pooling layer, as shown in Fig. 7.

Maximum pooling.
Average pooling calculates the average value of the modified feature map within the defined pooling window and uses it as the output of the pooling layer, as shown in Fig. 8.

Average pooling.
As shown in Figs. 7 and 8, after the convolved data pass through the pooling layer, the original 4*4 feature map is compressed into a 2*2 feature map, which achieves the purpose of dimension reduction.
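Both pooling operations can be sketched with one small function. The 4*4 feature map below contains made-up values, and the window is assumed to move in non-overlapping steps (stride equal to the window size), which the text does not state explicitly.

```python
def pool2d(fmap, size, mode="max"):
    """Non-overlapping pooling with a size*size window (stride equal to size)."""
    agg = max if mode == "max" else (lambda window: sum(window) / len(window))
    out = []
    for r in range(0, len(fmap) - size + 1, size):
        row = []
        for c in range(0, len(fmap[0]) - size + 1, size):
            window = [fmap[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(agg(window))
        out.append(row)
    return out


fmap = [[1, 3, 2, 4],
        [5, 7, 6, 8],
        [9, 2, 1, 0],
        [3, 4, 5, 6]]
max_pooled = pool2d(fmap, 2, "max")  # [[7, 8], [9, 6]]
avg_pooled = pool2d(fmap, 2, "avg")  # [[4.0, 5.0], [4.5, 3.0]]
```

Each 2*2 window collapses to one value, so the number of entries passed to the next layer drops from 16 to 4.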
(5) Fully connected layer
The fully connected layer in the CNN model is similar to the hidden layer in the traditional neural network. After the pooling operation, each neuron in the feature map is connected to all neurons in the fully connected layer, and the local information with the distinguishing nature of the classes in the previous pooling layer is integrated, which can reduce the impact of location on classification and improve the robustness of the model. The process of fully connected layer can be expressed by Formula (5) [27].
The Dropout method is introduced after the fully connected layer, which discards some neurons according to a certain probability, reduces the model overfitting, and improves the generalization ability of the model.
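A fully connected layer followed by Dropout can be sketched as below. Formula (5) is not reproduced in the text, so the sketch assumes the usual form y_j = tanh(Σ_i x_i w_ij + b_j); it also assumes "inverted" dropout, where the kept activations are rescaled during training. The function names and the toy weights are hypothetical.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer; weights[j] holds the weights feeding output neuron j."""
    return [math.tanh(sum(xi * w for xi, w in zip(x, col)) + b)
            for col, b in zip(weights, bias)]


def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(x)  # at inference time nothing is dropped
    keep = 1.0 - rate
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]


rng = random.Random(1)
hidden = dense([1.0, 2.0], weights=[[1.0, 0.0], [0.0, 0.0]], bias=[0.0, 1.0])
dropped = dropout([1.0] * 1000, rate=0.5, rng=rng)
kept_at_test_time = dropout(hidden, rate=0.5, rng=rng, training=False)
```

Because a different random subset of neurons is silenced in every training step, the network cannot rely on any single neuron, which is what limits overfitting.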
(6) Classification prediction
The Softmax classifier is generally used in CNN classification. After the fully connected layer, the scores $z_j \in (-\infty, +\infty)$ of the classes can be obtained. The idea of the Softmax classifier is to map $z_j$ to the $(0, +\infty)$ interval and then normalize it to the $(0, 1)$ interval to obtain the probability of each class. Compared with a score, a probability is more intuitive in classification, so this paper chooses Softmax as the classifier. The Softmax classifier is given in Formula (6) [28].
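The two stages of Softmax, exponentiation followed by normalization, can be sketched as follows. Subtracting the maximum score before exponentiating is a common numerical safeguard added here; it does not change the result, and the example scores are made up.

```python
import math


def softmax(scores):
    """Map raw class scores to probabilities that sum to 1."""
    m = max(scores)                             # shift for numerical stability
    exps = [math.exp(z - m) for z in scores]    # every value is now positive
    total = sum(exps)
    return [e / total for e in exps]            # normalise into (0, 1)


scores = [2.0, 1.0, 0.1, -1.0, 0.0]  # hypothetical scores for 5 classes
probs = softmax(scores)
predicted = probs.index(max(probs))
```

The predicted class is the one with the largest probability, which is always the class with the largest raw score.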
The overall framework of the ID-IE-CNN algorithm is shown in Fig. 9. First, the data set is preprocessed: feature values are converted from character type to numeric type, samples are enhanced by oversampling, and the one-dimensional data are converted into two-dimensional image data. Then the training samples are input into the CNN model, and the Dropout layer is introduced to optimize the model. After training, the validation set is used to verify the accuracy and loss rate. Finally, the trained model is used to classify the test set, and the classification accuracy, false negative rate, and precision are output.

Overall framework of ID-IE-CNN algorithm.
As shown in Fig. 9, the CNN model uses Tanh function as the activation function, and the whole process includes two convolutions, two pooling, and two fully connected layers. The steps are as follows, Convolutional layer ⟶ Pooling layer for feature extraction ⟶ Convolutional layer ⟶ Pooling layer ⟶ Fully connected layer ⟶ Fully connected layer ⟶ Dropout layer. The execution process of ID-IE-CNN is shown in Algorithm 1.
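To make the layer sequence concrete, the sketch below traces how the shape of one sample changes through the pipeline. The numbers of filters, the pooling size, and the number of hidden nodes are hypothetical placeholders, since the values of Table 1 are not reproduced here; pooling is also assumed to round up at the edges, in the manner of "SAME" padding.

```python
import math


def conv_same(shape, filters):
    """Convolution with SAME padding and stride 1 keeps height and width."""
    h, w, _ = shape
    return (h, w, filters)


def pool(shape, size):
    """Pooling with stride equal to size; ceil() assumes SAME-style edge handling."""
    h, w, c = shape
    return (math.ceil(h / size), math.ceil(w / size), c)


def trace(input_shape, filters1, filters2, pool_size, hidden_units, num_classes):
    """Return (layer name, output shape) pairs in the order given in the text."""
    steps = [("input", input_shape)]
    s = conv_same(input_shape, filters1)
    steps.append(("conv1", s))
    s = pool(s, pool_size)
    steps.append(("pool1", s))
    s = conv_same(s, filters2)
    steps.append(("conv2", s))
    s = pool(s, pool_size)
    steps.append(("pool2", s))
    steps.append(("flatten", (s[0] * s[1] * s[2],)))
    steps.append(("fc1", (hidden_units,)))
    steps.append(("fc2", (num_classes,)))
    steps.append(("dropout", (num_classes,)))  # dropout never changes the shape
    return steps


# All hyperparameters below are placeholders, not the paper's Table 1 values.
steps = dict(trace((6, 7, 1), filters1=32, filters2=64, pool_size=2,
                   hidden_units=128, num_classes=5))
```

With these placeholder values the 6*7 image shrinks to 2*2 after the second pooling layer, so the first fully connected layer receives 256 inputs.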
Algorithm 1. The proposed ID-IE-CNN algorithm
Experiments are performed on a PC with a 3.2 GHz Intel(R) Core(TM) i7 processor and 16 GB of memory, running the Windows 10 operating system. The programs are written in PyCharm with the TensorFlow deep learning framework. The KDDCup99 data set is used to evaluate the proposed algorithm on binary classification and multi-classification.
Experimental data preprocessing
This paper uses the KDDCup99 data set [29] to verify the effectiveness of the ID-IE-CNN algorithm. The data set contains 41 features and a label; 3 of the features and the label are of character type. The data set contains a normal class “Normal” and 38 small attack classes, which belong to 4 big attack classes, namely “Probe”, “U2R”, “R2L”, and “Dos”. “Probe” means surveillance and probing; “Dos” means denial of service; “R2L” means unauthorized access from a remote machine to a local machine; “U2R” means unauthorized access to local superuser privileges by a local unprivileged user. In binary classification, the labels are encoded as 0 and 1, representing the attack label and the normal label, respectively [24]; in multi-classification, the normal class and the 4 big attack classes are encoded as 0, 1, 2, 3, and 4. The correspondence is 0∼Dos, 1∼Normal, 2∼Probe, 3∼R2L, and 4∼U2R.
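The label encoding described above can be sketched with two lookup tables. Only a few of the small attack classes are listed, as an illustrative subset; the dictionary and function names are hypothetical, and stripping the trailing dot that the raw KDDCup99 labels carry is an assumption about the input format.

```python
# Illustrative subset of the small attack classes and their big classes.
ATTACK_TO_CLASS = {
    "normal": "Normal",
    "smurf": "Dos", "neptune": "Dos",
    "ipsweep": "Probe", "portsweep": "Probe",
    "guess_passwd": "R2L", "warezclient": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R",
}

# Codes as given in the text: 0~Dos, 1~Normal, 2~Probe, 3~R2L, 4~U2R.
MULTI_CODE = {"Dos": 0, "Normal": 1, "Probe": 2, "R2L": 3, "U2R": 4}


def encode_multi(label):
    """Five-class code of a raw label such as 'smurf.'."""
    return MULTI_CODE[ATTACK_TO_CLASS[label.rstrip(".")]]


def encode_binary(label):
    """Binary code: 1 for normal traffic, 0 for any attack."""
    return 1 if label.rstrip(".") == "normal" else 0


raw = ["normal.", "smurf.", "ipsweep.", "guess_passwd.", "rootkit."]
multi = [encode_multi(r) for r in raw]
binary = [encode_binary(r) for r in raw]
```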
Character data numeralization
Because of the convolution operation, the CNN model can only deal with numerical data. To ensure the validity of the input data, the features “Protocols”, “Protocol_type” and “Flag” are converted into numerical data. The normal class and the 4 big attack classes are labeled as 5 classes, namely “Normal”, “Probe”, “U2R”, “R2L”, and “Dos”.
Image enhancement
The class imbalance problem in the KDDCup99 data set is serious: the samples of classes U2R and R2L account for less than 1% of the data, so the model easily overfits the training set during learning. Therefore, sample enhancement with the oversampling method should be performed before training.
The original data set has 41 features. In order to ensure the integrity of the data characteristics, the data filling method proposed in this paper is used to extend the data features to 42 dimensions, and then, the sample set is converted into 6*7 dimensional image features for the input of the CNN model.
Parameter set of CNN model
After the preprocessing phase, the two-dimensional image data are used as the input of the CNN model. To match the dimensions of the input matrix, the padding in the convolutional layers is set to “SAME”. The encoded label vectors are converted into a binary matrix. To learn more detailed features and lose less of the important information, this paper starts with a small convolution kernel and pooling matrix. For the fully connected layers, 1024 and 512 nodes and a dropout value of 0.1 were selected initially, but the performance was not as good as expected. Following references [30, 31], this paper reduces the number of nodes and adjusts the parameters over multiple experiments. The resulting optimal parameters are shown in Table 1.
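The conversion of encoded labels into a binary matrix is read here as one-hot encoding, which is an assumption, since the text does not name the scheme. A minimal sketch with a hypothetical helper:

```python
def one_hot(labels, num_classes):
    """Turn integer class codes into rows with a single 1 at the class index."""
    return [[1 if k == y else 0 for k in range(num_classes)] for y in labels]


codes = [0, 3, 4]                      # made-up class codes for three samples
matrix = one_hot(codes, num_classes=5)
```

Each row then has the same length as the Softmax output, so the predicted probabilities and the true label can be compared element by element in the loss.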
Parameter set of CNN model
Binary classification and multi-classification experiments are performed on the KDDCup99 data set. In the last fully connected layer, the number of output neurons is 2 for binary classification and 5 for multi-classification; the other parameters are the same.
In general classification problems, the commonly used evaluation metrics include accuracy, precision, underreporting, Recall, F1-score, AUC, etc. In the research of intrusion detection, the most important thing is to understand whether the model predicts the unknown data accurately and whether there is underreporting. Therefore, intrusion detection researchers usually use accuracy and false negative rate as metrics to evaluate model performance.
To fully evaluate the performance of the proposed model, this paper chooses accuracy (ACC), false negative rate (FNR), precision rate (PR), Recall and F1-score as metrics to verify the effectiveness of the ID-IE-CNN algorithm. The metrics are based on the confusion matrix shown in Table 2.
Confusion Matrix
The ACC, PR, FNR, Recall and F1-score are calculated as follows.
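The formulas themselves are not reproduced in this text. As a sketch, the standard definitions in terms of the four confusion-matrix counts are used below, with TP, TN, FP and FN denoting true positives, true negatives, false positives and false negatives; which class the paper treats as positive is not stated, and the counts in the example are made up.

```python
def metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics (assumed to match the paper's definitions)."""
    acc = (tp + tn) / (tp + tn + fp + fn)   # share of all samples classified correctly
    pr = tp / (tp + fp)                     # precision
    recall = tp / (tp + fn)                 # share of real positives that are found
    fnr = fn / (tp + fn)                    # share of real positives that are missed
    f1 = 2 * pr * recall / (pr + recall)    # harmonic mean of precision and recall
    return {"ACC": acc, "PR": pr, "Recall": recall, "FNR": fnr, "F1": f1}


result = metrics(tp=90, tn=95, fp=5, fn=10)  # hypothetical counts
```

Recall and FNR always sum to 1, so a low false negative rate and a high recall express the same property of the detector.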
The accuracy and loss rate for training
The accuracy and the loss rate during the training process are shown in Fig. 10 and Fig. 11, respectively.

The accuracy in the training process.

The loss rate in the training process.
Figure 10 shows the relationship between the accuracy and the epoch. The abscissa “epoch” is the number of training iterations, and the ordinate is the accuracy of the validation set. It can be seen from the figure that the accuracy is higher and the convergence speed gets faster as the epoch increases, and it begins to converge when the epoch is about 30, which means 30 iterations.
Figure 11 shows the relationship between the loss rate and the epoch. It can be seen from Fig. 11 that during training the loss rate gradually decreases as the epoch increases, and it also begins to converge when the epoch is about 30. The loss rate thus falls as the accuracy rises, which supports the rationality and feasibility of the ID-IE-CNN algorithm.
To verify the applicability of the ID-IE-CNN algorithm in the binary classification experiment, it is compared with the CNN algorithm [32], the RNN algorithm [33], the GR-CNN algorithm [32], the PCA-RNN algorithm [34] and the IGWO-BP algorithm [35]. The results are shown in Table 3.
Comparison results of the binary classification (%)
It can be seen from Table 3 that, among the deep learning algorithms, the accuracies of the RNN algorithm and the PCA-RNN algorithm are 83.28% and 87.7%, respectively, which are lower than those of the CNN algorithm, the GR-CNN algorithm and the ID-IE-CNN algorithm. This shows that the CNN model is the best suited to the analysis of intrusion detection data. For the ID-IE-CNN algorithm proposed in this paper, the classification accuracy is 99.73%, the false negative rate is 0.29%, and the precision is 99.7%, which are better than those of the CNN algorithm, the GR-CNN algorithm and the IGWO-BP algorithm. This demonstrates the superiority of the ID-IE-CNN algorithm in intrusion detection classification.
To verify the applicability of the ID-IE-CNN algorithm in the multi-classification experiment, it is compared with the CNN algorithm [36], the DBN-XGBDT algorithm [37], the LFKPCA-DWELM algorithm [38], the DAE + DNN algorithm [39] and the CNN-Focal algorithm [40]. The results are shown in Table 4.
Comparison results of the multi-classification experiment (%)
It can be seen from Table 4 that the accuracy and precision of the ID-IE-CNN algorithm are greatly improved compared to the other five deep learning algorithms, reaching 99.3% and 99.19% respectively. The false negative rate dropped significantly to 0.2%, and the F1-score was the best, which indicates that the ID-IE-CNN algorithm has the best performance. In order to further prove that the accuracy of the ID-IE-CNN algorithm is not affected by the class imbalance, Table 5 details the accuracy of each class.
Comparison of classification accuracy (%)
Table 5 lists the accuracy of each class and shows the comparison with other algorithms. Among the normal class and the four attack classes, the ID-IE-CNN algorithm achieves the best accuracy, especially in the U2R and R2L classes, which have fewer samples. The Normal, Probe and Dos classes have quite large numbers of samples, which provide relatively adequate information for learning, so all the algorithms are close to each other in accuracy on these classes. For U2R and R2L, the numbers of samples are quite small and the information provided is limited. In the ID-IE-CNN algorithm, however, the samples of U2R and R2L are enhanced to balance the classes of the data set, and the algorithm shows an obvious accuracy advantage in these two classes.
The structure of intrusion detection data is complex, and its scale is growing exponentially. Traditional machine learning algorithms cannot cope with its class imbalance, high dimensionality, and nonlinearity. To this end, this paper proposes the ID-IE-CNN algorithm, which uses oversampling and image enhancement technology to increase the capacity of small classes and overcome the class imbalance in intrusion detection data. The application of the CNN model further addresses the problems of high dimensionality and nonlinearity. This paper uses the KDDCup99 data set to evaluate the performance of the ID-IE-CNN algorithm through binary classification and multi-classification experiments. The experimental results show that the algorithm improves accuracy and precision compared with the other algorithms and achieves a lower false negative rate. Its advantages are especially clear in the classification of small classes, which verifies the effectiveness of the ID-IE-CNN algorithm proposed in this paper.
However, to adapt to the complex and changeable Internet environment, the algorithm proposed in this paper needs further improvement. First, only a single technique is used to balance the data, and we will work on better techniques for balancing data sets and reducing the negative effects of imbalance. Then, we will study fuzzy logic methods in depth, drawing on [41–44] and others, and apply them to noise and outlier data points to further enhance the ability of intrusion detection.
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant Nos. 61807028, 61802332, and 61772449, the Youth Foundation of Hebei Educational Committee of China under Grant No. QN2021145, and the Natural Science Foundation of Hebei Province of China under Grant No. F2019203120. The authors are grateful for the valuable comments and suggestions of the reviewers.
