Abstract
Emerging technologies increase the exposure of networks to a wide range of attacks, which creates a need for efficient methods of detecting and classifying network attacks. This article proposes an LSTM neural network combined with XGBoost for predicting and classifying network attacks, with the NSL-KDD dataset used to train the LSTM. First, unnecessary and noisy data are removed from the dataset, and feature selection is used to choose the subset of most informative features. The proposed system is then trained on this reduced data, and the training parameters are tuned to maximize its performance. The results are compared with those of existing machine learning and deep learning algorithms (SVM, LR, RF, DNN, and CNN) using accuracy, F1 score, recall, and precision as performance metrics. The proposed model outperforms the other algorithms because it is trained on only the most important features, which reduces training time and overfitting and thereby increases the model's effectiveness.
Introduction
Rapid technological advances such as the Internet of Things (IoT), cloud computing, and NB-IoT have given rise to new risks and eroded users' trust, and many end-users have therefore begun to adopt cyber security measures. Infected computers that are connected to other infected PCs may form a botnet, allowing hackers to commit a variety of crimes such as DDoS attacks, spam, ransomware, and stealthy data theft [1]. Hence, the importance of data confidentiality is increasing day by day. When an intruder gains access to a system in order to use or steal its information, for example through phishing, packet sniffing, or information stealing, the attack is classified as an intrusion. Attackers seek either to weaken network security or to compromise system confidentiality when they try to steal money or obtain important data through harmful actions. Examples of cyber-attacks include data hacking, denial of service, phishing, viruses, and theft [2].
With the rise of blockchain-based wireless networks, wireless devices with inadequate security setups are easily attacked and exploited to launch large-scale LDDoS attacks. In early September 2016, a DDoS attack using the Mirai malware targeted the cloud service provider OVH, with roughly 145k hijacked cameras delivering 1.1 Tbps at peak. According to CNCERT, there were 39 botnets with a scale above 100k in 2020 [3]. Intrusion detection technologies are currently split into two categories: network-based and host-based intrusion detection. Network-based detection focuses on analyzing network traffic to evaluate the behavior of network attacks in real time, whereas host-based detection focuses on collecting, auditing, and performing forensics on host intrusion traces. Researchers have used network intrusion detection technologies to detect anomalies and have proposed a range of anomaly detection and analysis techniques [4].
Deep convolutional neural networks (DCNNs), an advanced machine learning technology of rising popularity, have been extensively applied in speech recognition, computer vision, and natural language processing. DCNNs offer automated feature extraction, the ability to model highly nonlinear systems, and flexibility in architecture design [5]. In current networks, NIDS functionality has been implemented with a hybrid model that combines deep and shallow learning techniques and is capable of analyzing broad categories of network data. Integrating deep and shallow learning methods makes the most of both while lowering the system overhead [6].
An LSTM-based deep learning technique has been used for packet-level classification in IDSs; it discriminates the semantic meaning of malicious data in each packet, which makes it competitive with flow-based DL techniques while significantly lowering flow processing time [7]. As architectures and infrastructure evolve, traffic characteristics change, and a conventional neural network, because of its basic structure, cannot extract relevant information from the domain knowledge of traffic data. Intrusion detection approaches typically reduce the traffic information and classify using the reduced features alone, so complex features and the temporal relations between network connections are handled simplistically or disregarded. As a result, such intrusion recognition algorithms drop some traffic data and can identify intrusions only from limited feature information [8].
Feature selection removes unwanted and redundant attributes to reduce the attribute dimension, which enhances the accuracy and efficiency of classification; it also removes noisy data and reduces the overfitting problem in machine learning. An attribute subset search technique locates a subset of attributes within a search space consisting of all combinations of the data features, a task closely connected to pattern recognition problems [9].
A model for detecting phishing attacks has been proposed that is based on optimal attribute selection and a neural network and that obtains high-quality features from phishing URLs and related websites. If sensitive features are selected incorrectly, the underlying classifier will be unable to recognize phishing websites accurately. Meanwhile, insignificant or irrelevant features cause learning algorithms to overfit [10].
The major contributions of this work are as follows. An efficient deep learning technique combined with a feature selection technique is proposed to detect and classify network attack types. The extreme gradient boosting (XGBoost) technique is used for feature selection. The LSTM neural network is used to detect and classify the attacks in the network.
The remainder of this paper is organized as follows: Section 2 provides a summary of the related work. Section 3 describes the methodology in detail. The results are discussed in Section 4, and Section 5 concludes the paper with a summary and suggestions for future work.
Literature review
Sarumi et al. [11] compared two systems for detecting intrusions: the first, Apriori, employs the association rule methodology from data mining, and the second, Support Vector Machine (SVM), is a machine learning methodology. The Network Security Laboratory Knowledge Discovery and Data Mining (NSL-KDD) dataset and the University of New South Wales NB 2015 (UNSW-NB15) dataset were used to assess the effectiveness of the two systems.
Sarker et al. [12] introduced a machine-learning-based security model, the Intrusion Detection Tree ("IntruDTree"). IntruDTree first ranks the security features by relevance and then builds a tree-based intrusion detection model from the significant features. This technique is not only accurate in predicting unseen test cases but also reduces the computational cost of the classifier by lowering the dimensionality of the data.
Reddy and Shyam [13] proposed a Deep Belief Network (DBN) attack detection approach in which the activation function and weights are refined using the Median Fitness focused Sea Lion Optimization method (MFSLnO). When the DBN identifies a malicious node, control passes to a lightweight bait technique that consistently mitigates the most prevalent malicious nodes while maintaining normal relations.
Gamage and Samarabandu [14] presented a taxonomy of deep learning models for anomaly detection and summarized the research articles on the subject. Using two legacy datasets (KDD 99 and NSL-KDD) and two current datasets (CIC-IDS2017 and CIC-IDS2018), they trained and assessed four key deep learning techniques for intrusion classification: autoencoder, deep belief network, feed-forward neural network, and long short-term memory network.
Venkata and Akkalakshmi [15] examined intrusion detection in networks using deep learning and machine learning strategies. They used the NSL-KDD dataset, an upgraded version of the KDD'99 dataset, and built and trained different classification models that categorize network attack types using supervised deep learning and machine learning approaches.
Alazab et al. [16] looked at developing adaptable and effective IDS to identify and categorize unpredictable and random cyber-attacks using a deep neural network (DNN). Because of the constant variation in network activity and the quick attack growth, it is required to analyze numerous datasets that have been created throughout time using static and dynamic methodologies.
Pan et al. [17] made three contributions to research on automated intrusion detection methods. First, the Robust Software Modeling Tool (RSMT) is used to assess the viability of semi-supervised/unsupervised methods for online intrusion detection. They also explained how RSMT uses a pre-trained autoencoder network to encode and reconstruct the call graph.
Liu and Lang [18] developed a taxonomy of IDSs that uses data objects as the primary dimension for classifying and summarizing the literature on machine learning-based and deep learning-based IDSs. This kind of categorization is well suited to cyber security experts. The survey first defines the notion of IDSs and their classification, and then discusses the machine learning methods, metrics, and benchmark datasets that are often employed in IDSs.
Shenfield et al. [19] established a method based on artificial neural networks for detecting malicious network traffic that may be used in deep packet inspection-based intrusion detection. The approach was evaluated in experiments using a variety of common benign network traffic data (dynamic link library files, images, and a selection of other miscellaneous files such as music files, logs, and word processing documents).
Elsherif [20] addressed intrusion detection by developing a descriptive model, built from several deep Recurrent Neural Network (RNN) models, that generalizes the available information to find both seen and unseen threats. This generalization is based on the RNN's ability to define typical behavior and the variance that is considered normal.
The UNSW-NB15 intrusion detection dataset was examined by Kasongo and Sun [21] which is utilized for training and testing the models. Furthermore, they used the XGBoost algorithm to provide a feature reduction strategy that is filter-based. Then, by using the smaller feature space, the following ML techniques were implemented: Support Vector Machine, Artificial Neural Network, k-Nearest-Neighbor, Decision Tree, and Logistic Regression.
Ayo et al. [22] created a NIDS, a deep learning approach that uses rule-based hybrid attribute selection for enhancing efficiency. The design is divided into three stages as rule assessment, hybrid feature selection, and detection. For improving experimentation and comparison, many search algorithms and attribute evaluators were integrated for feature selection.
The XGBoost–DNN model was introduced by Devan and Khare [23]; it employs the XGBoost algorithm for attribute selection and a deep neural network (DNN) for classifying network attacks. Normalization, feature selection, and classification are the three steps of the presented technique. During DNN training, the Adam optimizer is used to optimize the learning rate, and a softmax classifier is used to categorize network intrusions.
Ergen and Kozat [24] used gradient-based and quadratic programming-based training strategies that are highly effective for jointly training and optimizing the parameters of the LSTM structure and of the OC-SVM (or SVDD) algorithm. They modified the original objective criteria of the OC-SVM and SVDD algorithms so that gradient-based training could be applied, and showed that the modified criteria converge to the original ones.
Thapa and Duraipandian [25] proposed a dynamic, artificial intelligence-based approach whose fundamental idea is a malicious traffic classification scheme built on Long Short-Term Memory. Both an experimental setup and an experimental validation are used to assess the effectiveness of the suggested model.
A unique Long Short-Term Memory (LSTM) system for anomaly identification was introduced by Tanksale [26] which will detect abnormalities in real-time using lesser resources. They published the findings of a unique prediction approach used to choose the best parameters of the LSTM network. The prediction process and abnormality recognition techniques are validated using real-world data from automobiles.
Khan et al. [27] proposed a new NIDS architecture based on a deep convolutional neural network that uses network spectrogram images created with the short-time Fourier transform. They used the CIC-IDS2017 dataset to assess the efficiency of the suggested solution. Compared with previous deep learning (DL) methods, the experimental results showed a 2.5% to 4% improvement in correctly detecting intrusions and a 4.3% to 6.7% reduction in the false alarm rate (FAR) in a binary classification setting. The authors also tested its effectiveness in a seven-class classification setting, reaching nearly 98.75% accuracy, an improvement of 0.56% to 3.72% over existing DL approaches.
Aslan and Yilmaz [28] suggested a new framework for malware analysis that optimizes the combination of two large pre-trained network models. The framework has four primary stages: data collection, deep neural network design, training of the suggested neural network architecture, and assessment of the trained deep neural network. The Malimg, Microsoft BIG 2015, and Malevis datasets were used to validate the developed technique. When evaluated on the Malimg dataset, the suggested method achieved 97.78% accuracy, outperforming most ML-based malware detection methods.
Doriguzzi-Corin et al. [29] developed LUCID, a practical and lightweight deep learning DDoS detection system that uses Convolutional Neural Networks (CNNs) to classify traffic flows as normal or anomalous. They presented four major contributions: (1) a novel use of a CNN to identify DDoS traffic with low processing overhead, (2) a dataset-agnostic preprocessing method to generate traffic observations for online attack detection, (3) an activation analysis to explain LUCID's DDoS classification, and (4) an empirical investigation of the solution on a resource-constrained hardware platform. On the most recent datasets, LUCID matches the detection accuracy of existing frameworks while delivering a 40x decrease in processing time.
Su et al. [30] proposed the BAT traffic anomaly detection model to address the issues of low intrusion detection accuracy and feature engineering. The BAT model combines BLSTM (Bidirectional Long Short-Term Memory) with an attention mechanism. According to the reported results, the BAT-MC model delivers a high level of accuracy on the NSL-KDD dataset. On the KDDTest+ set, the BAT-MC model outperforms CNN and RNN by 4.12% and 2.96%, respectively. On the KDDTest-21 set, the accuracy of the BAT-MC model is 4.75% higher than that of CNN and 7.1% higher than that of RNN.
Khan et al. (2019) [31] proposed a two-stage deep learning (TSDL) model for effective network intrusion detection, centered on a stacked auto-encoder with a softmax classifier. The model has two decision stages: the first stage uses a probability score to categorize network traffic as normal or anomalous, and this score is then used as an additional feature in the final decision stage to identify the normal state and the various types of attacks. Comparative simulation results show that the proposed model exceeds existing techniques, with detection rates of up to 99.996% for the KDD99 dataset and 89.134% for the UNSW-NB15 dataset, respectively. The authors believe that their model has the potential to become a future benchmark for deep learning and network security research.
Proposed methodology
This section describes the approach for detecting and classifying network attacks. The NSL-KDD dataset is used in this work to train the classification system. The XGBoost technique is used to select the optimal features, and the LSTM model is used to classify the attack types. The architecture of the proposed system is shown in Fig. 1.

The architecture of the proposed system.

Flowchart of XGBoost feature selection.
In the process flow of the proposed work, the dataset first undergoes attribute selection to remove redundant, irrelevant, and noisy data. After the features are selected, the records are divided into training and testing sets. The proposed LSTM is built with the training data and validated with the testing data for the detection and classification of network anomalies.
This study uses the NSL-KDD dataset [23], a revised version of the KDD Cup 99 dataset that was proposed as a solution to several intrinsic faults of the KDD'99 data set. 80% of the dataset is used for training the model, and the remaining 20% is used for testing.

Selected features with a feature importance score.
The NSL-KDD dataset was selected as a conventional, standardized benchmark for the experimental assessment and for demonstrating the improvement in the IDS. The dataset contains 41 attributes, which are presented in Table 1; the number of features is reduced using the feature selection methodology.
Features of NSL-KDD dataset
The attack classes in the dataset are DoS, Probe, R2L, and U2R, each of which contains several attack types, as shown in Table 2.
Attack classes in the NSL-KDD dataset
Feature selection reduces the redundant dimensions of the dataset. Irrelevant and noisy attributes are removed by keeping the features to which the XGBoost technique assigns high importance scores. The advantages of feature selection are that the classification model learns better and the computational complexity is reduced.
XGBoost assigns an importance score to every feature, indicating how significant that attribute is to the model being trained. Each new tree is built along the gradient direction relative to the current baseline. Gradient boosting thus builds the boosted trees in a way that yields feature scores showing the relevance of each feature to the training model. The more frequently a feature is used to make key decisions in the boosted trees, the higher its score.
A set of decision trees forms the predictive model shown in Equation (1), in which m is the number of decision trees, f_k is the prediction of the k-th decision tree, and x_i is the feature vector of the i-th data point. A loss function must be specified before the model is trained. The loss function for binary classification is presented as Equation (2).
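Equations (1) and (2) are not reproduced in this text. A hedged reconstruction, assuming the standard additive tree ensemble and the binary logistic (log) loss commonly used with XGBoost, would read as follows. The symbols \hat{y}_i (raw prediction), y_i (true label in {0, 1}), n (number of samples), and \sigma (sigmoid), as well as the 1/n scaling, are introduced here for illustration and are not taken from the original.

```latex
\hat{y}_i = \sum_{k=1}^{m} f_k(x_i) \tag{1}

L = -\frac{1}{n} \sum_{i=1}^{n} \Big[ y_i \log \sigma(\hat{y}_i)
      + (1 - y_i) \log \big( 1 - \sigma(\hat{y}_i) \big) \Big],
\qquad \sigma(z) = \frac{1}{1 + e^{-z}} \tag{2}
```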
The feature selection using XGBoost is given in Algorithm 1.
Another key element of the XGBoost approach is regularization, as given by Equation (3)
in which T is the number of leaves and w_j^2 is the squared score of the j-th leaf.
The objective of the model is expressed as in Equation (4)
where L is the training loss, which measures how well the system predicts, and Ω is a regularization function that controls model complexity and prevents overfitting. To optimize the objective function, XGBoost uses gradient information of the loss up to second order. Hence, the objective function is expressed as in Equation (5)
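Equations (3) to (5) are likewise missing from this text. A sketch of what they would look like under the standard XGBoost formulation is given below. The regularization coefficients \gamma and \lambda, the per-sample loss \ell, and the gradient statistics g_i and h_i are assumptions introduced for illustration; the original may use different notation or scaling.

```latex
\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2} \tag{3}

\mathrm{Obj} = L + \sum_{k=1}^{m} \Omega(f_k) \tag{4}

\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n} \Big[ g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^{2} \Big] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, \ell\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^{2} \ell\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \big(\hat{y}_i^{(t-1)}\big)^{2}} \tag{5}
```

Here f_t denotes the tree added at boosting round t, and constant terms that do not depend on f_t are dropped from Equation (5).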
The importance score of every input variable makes it possible to evaluate the significance of any subset of attributes.
To obtain multiple observations, the whole process is repeated with different values of the Num_trees parameter. The most important features across these runs are used to identify the key features, as illustrated in Algorithm 1. Num_trees is the number of estimators, and n is the total number of features. The dataset has n = 41 features, and four values of Num_trees were used: 100, 500, 800, and 1000.
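Since Algorithm 1 is not reproduced in this text, the following is only a minimal sketch of one way to combine importance scores from several Num_trees settings. It assumes the scores are averaged across runs and the k highest-scoring features are kept; the averaging rule, the function name `select_top_features`, and the numeric scores are all hypothetical. In practice the scores could come from, for example, the `feature_importances_` attribute of an `xgboost.XGBClassifier` fitted with each `n_estimators` value.

```python
from collections import defaultdict


def select_top_features(importance_runs, k):
    """Average each feature's importance over all runs and keep the k best.

    importance_runs maps a Num_trees value to a {feature: score} dict.
    A feature missing from a run is treated as having score 0 in that run.
    """
    totals = defaultdict(float)
    for scores in importance_runs.values():
        for feature, score in scores.items():
            totals[feature] += score
    n_runs = len(importance_runs)
    mean_scores = {f: s / n_runs for f, s in totals.items()}
    # Sort by descending mean score; break ties alphabetically for determinism.
    ranked = sorted(mean_scores, key=lambda f: (-mean_scores[f], f))
    return ranked[:k], mean_scores


# Hypothetical importance scores (not results from the paper) for four
# NSL-KDD feature names, keyed by the Num_trees values used in the text.
importance_runs = {
    100: {"src_bytes": 0.40, "dst_bytes": 0.30, "duration": 0.20, "land": 0.10},
    500: {"src_bytes": 0.38, "dst_bytes": 0.32, "duration": 0.22, "land": 0.08},
    800: {"src_bytes": 0.42, "dst_bytes": 0.28, "duration": 0.21, "land": 0.09},
    1000: {"src_bytes": 0.41, "dst_bytes": 0.31, "duration": 0.19, "land": 0.09},
}

selected, mean_scores = select_top_features(importance_runs, k=2)
print(selected)
```

With the full dataset, the same call with k = 20 over all 41 features would correspond to the reduced feature set described in the text.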
To find the ideal tree structure, the greedy method may be used: all values of each attribute are visited, and the attribute and split value with the maximum gain are picked for node splitting. The most informative features split the samples with the greatest information gain, allowing the tree to converge more quickly.
The basic idea is to use the percentile technique for discovering a large number of candidate locations that might become dividing points, and then choose the best splitting point among them. This approach can greatly reduce computational costs. The gain after the split is shown in Equation (6)
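Equation (6) is not reproduced in this text. As an illustration of percentile-based split finding, the sketch below assumes the standard XGBoost split gain, Gain = 1/2 [G_L^2/(H_L + λ) + G_R^2/(H_R + λ) - (G_L + G_R)^2/(H_L + H_R + λ)] - γ, where G and H are sums of first- and second-order gradients on each side of the split. The function names, the way candidate thresholds are picked, and the toy gradient values are hypothetical.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Split gain from summed first-order (g) and second-order (h) gradients."""
    def score(g, h):
        return g * g / (h + lam)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


def best_split(values, grads, hess, n_candidates=4, lam=1.0, gamma=0.0):
    """Evaluate percentile-based candidate thresholds; return (gain, threshold)."""
    ordered = sorted(values)
    n = len(ordered)
    # Candidate thresholds at evenly spaced ranks instead of every distinct value.
    candidates = sorted(
        {ordered[(q * (n - 1)) // n_candidates] for q in range(1, n_candidates)}
    )
    best = (float("-inf"), None)
    for thr in candidates:
        left = [i for i, v in enumerate(values) if v <= thr]
        right = [i for i, v in enumerate(values) if v > thr]
        if not left or not right:
            continue  # a split must leave samples on both sides
        gain = split_gain(
            sum(grads[i] for i in left), sum(hess[i] for i in left),
            sum(grads[i] for i in right), sum(hess[i] for i in right),
            lam, gamma,
        )
        if gain > best[0]:
            best = (gain, thr)
    return best


# Toy data: the gradient sign flips between the values 4 and 5.
values = [1, 2, 3, 4, 5, 6, 7, 8]
grads = [-1, -1, -1, -1, 1, 1, 1, 1]
hess = [1] * 8
gain, threshold = best_split(values, grads, hess)
print(gain, threshold)
```

Only three thresholds are scored here instead of seven possible split points, which is the source of the cost saving mentioned above.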
Applying the XGBoost algorithm to the NSL-KDD dataset yielded 20 selected features.
Long Short-Term Memory (LSTM) is a deep learning approach based on Recurrent Neural Networks (RNNs); it was created because conventional RNNs do not perform well at long-term learning. The design of LSTM, a prominent time series forecasting model, allows it to learn and recall long-term dependencies over arbitrary intervals, so it handles data with long-term dependencies very well. It is regarded as an effective strategy for analyzing data or events that are related to one another, particularly in temporal sequence.
LSTM was primarily motivated by, and built to address, the vanishing-gradient problem that traditional RNNs suffer from when dealing with long-term dependencies. The LSTM model is built and trained using the training data. The LSTM comprises three kinds of layers: the input layer, the hidden layer, and the output layer, each of which incorporates a large number of neurons known as units. A hyper-parameter tuning strategy is used to determine the number of hidden layers. Information is passed from one layer to the next.
Rectified linear units (ReLU) are used as the non-linear activation function in each hidden layer to reduce the vanishing error gradient problem. ReLU has also been shown to accelerate the training process. The Adam optimizer is used as the optimization algorithm. The key innovation of LSTM is the memory cell, which functions as an accumulator of state information. The LSTM architecture is made up of memory blocks, each of which has three gates: a forget gate f_t, an input gate i_t, and an output gate o_t. The mathematical expressions for the LSTM are as follows. The input gate accepts the previous time step's output h_{t-1} and the current input x_t and produces two values:
Forget gate:
Cell state:
Output gate:
Finally, output of LSTM unit is:
In the above equations, the weight matrix W and bias vector b of each gate are learnable parameters, σ(x) represents the sigmoid function, and * represents the element-wise product. The input gate specifies how much of the input should be retained, the forget gate determines how much prior information is discarded, and the output gate determines how much information is passed to the next stage. In a multi-layer LSTM, the layers are stacked so that the output of a lower layer is fed as the input of the layer above it. In the final layer, the outputs of all hidden units are joined to a fully connected layer to accomplish binary classification.
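The gate equations themselves are missing from this text. The sketch below shows a single LSTM time step under the standard formulation, which is assumed here rather than taken from the original: sigmoid gates, a tanh candidate cell state, and gates computed from the concatenation of h_{t-1} and x_t. The parameter names (`W_i`, `b_i`, and so on) and the helper functions are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W v + b for a weight matrix W (list of rows) and a vector v."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state h_t and cell state c_t."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    i_t = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]    # input gate
    f_t = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]    # forget gate
    g_t = [math.tanh(a) for a in affine(params["W_c"], params["b_c"], z)]  # candidate state
    o_t = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]    # output gate
    # Cell state: keep part of the old state and add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # Hidden state: gated, squashed cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Tiny example: 2 input features, 1 hidden unit, all parameters set to zero,
# so every gate evaluates to sigmoid(0) = 0.5 and the candidate state is 0.
zero_params = {name: [[0.0, 0.0, 0.0]] for name in ("W_i", "W_f", "W_c", "W_o")}
zero_params.update({name: [0.0] for name in ("b_i", "b_f", "b_c", "b_o")})
h_t, c_t = lstm_step([0.3, -0.2], [0.0], [1.0], zero_params)
print(h_t, c_t)
```

In this toy setting the forget gate halves the previous cell state, which makes the accumulator role of the memory cell easy to see.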
The LSTM model consists of an input layer, hidden layers, a fully connected layer, a softmax layer, and an output layer. The input layer has 20 nodes, corresponding to the 20 attributes used to identify network anomalies. Two hidden layers are employed in this case, and each hidden layer contains a number of hidden units. Each hidden layer has its own nonlinear activation function. ReLU gave improved results and sped up the training process. The main motivation for using ReLU is to alleviate the vanishing and exploding gradient problems.
A fully connected layer follows the network, with two nodes, one for each class. The network includes a softmax layer and an output layer to predict class labels. Softmax classifies the output of the fully connected layer as 0 or 1. The classification probability is computed in the softmax layer, and the output classification layer divides the testing data into two categories: normal and anomalous. Adaptive Moment Estimation (Adam) is used during training to optimize the internal parameters of the LSTM. The Adam optimizer allows the model to learn rapidly. The initial learning rate is 0.01, the mini-batch size is 32, and the number of epochs is 200. The LSTM network's weights and biases are initialized at random.
Adam optimizer for optimization of the model
The learning rate used in the LSTM is one of the hyperparameters that affect training. To reduce errors, it is vital to choose the neural network design and the network parameters carefully, because the network's performance is closely affected by these choices. To address this issue, the Adam optimizer (adaptive moment estimation) is used. It adapts the learning rate using estimates of the mean (first moment) and variance (second moment) of the gradients.
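To make the role of the gradient mean and variance concrete, the following is a minimal sketch of one Adam update step in plain Python. The learning rate of 0.01 matches the initial value stated in the text; the values `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` are the commonly used defaults and are assumed here, since the text does not state them. The function name `adam_step` is hypothetical.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to the parameter list theta at step t (t >= 1)."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    theta = [
        p - lr * mh / (math.sqrt(vh) + eps)
        for p, mh, vh in zip(theta, m_hat, v_hat)
    ]
    return theta, m, v


# Toy problem: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, v = [1.0], [0.0], [0.0]
theta, m, v = adam_step(theta, [2.0 * theta[0]], m, v, t=1)
first_step = theta[0]
for t in range(2, 501):
    theta, m, v = adam_step(theta, [2.0 * theta[0]], m, v, t=t)
print(first_step, theta[0])
```

Because the update divides the averaged gradient by the square root of the averaged squared gradient, the effective step size is scaled per parameter rather than fixed by hand.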
Classification
The training data is fed to the network through the input layer. The number of input neurons equals the number of attributes in the dataset that is fed into the neural network. The D inputs in the input layer may therefore be written as in Equation (7)
The hidden layer uses random weights w_i and a bias b_j to transform the input X. The hidden layer inputs are therefore represented as in Equation (8)
where j indexes the hidden neurons of the network, j = 1, 2, 3, ..., k.
To classify the predictions into classes, the softmax layer in Equation (9) is used. During training, the cross-entropy error is minimized.
where y_ij is the predicted distribution and p_i is the target distribution.
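As Equation (9) is not reproduced in this text, the sketch below illustrates the softmax layer and the cross-entropy error in their standard form, which is assumed here. The function names and the example logits are hypothetical; the two outputs stand for the normal and anomalous classes.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution over classes."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(p * math.log(max(y, eps)) for p, y in zip(target, predicted))


# Hypothetical outputs of the fully connected layer for one sample.
probs = softmax([2.0, 0.0])
# One-hot target: the sample belongs to the first class.
loss = cross_entropy([1.0, 0.0], probs)
print(probs, loss)
```

The predicted class is the index of the largest probability, and the loss shrinks toward zero as that probability approaches one for the correct class.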
The experimental results of the proposed model are presented in this section. For network anomaly detection, an LSTM with XGBoost feature selection is proposed; the LSTM uses a softmax regression unit to classify network traffic as normal or abnormal. The suggested model is validated using the NSL-KDD dataset. The LSTM is constructed with the reduced set of features as input nodes, two hidden layers, and two classification nodes in the output layer. During model training, the learning rate is vital for maintaining convergence.
In this study, the Adam adaptive learning rate algorithm is applied because it converges rapidly and efficiently. It thereby addresses difficulties that other optimization methods have, such as decaying learning rates, slow convergence, or high variance in parameter updates, which cause the loss function to fluctuate. The results of the suggested model have been compared with those of existing algorithms such as Convolutional Neural Network (CNN), Deep Neural Network (DNN), Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR).
The hyperparameters used while training the model are given in Table 3. The Adam optimization technique combines momentum with the Adagrad optimization method, so the learning rate is adapted throughout the training process without the decay having to be set manually.
List of best hyperparameters of LSTM
For reducing data to a set of ideal features, the feature selection method is utilized. By deleting unnecessary attributes from the provided dataset, the feature selection reduces training time. Once the ideal features have been determined using the XGB methodology, they may be used to efficiently train and evaluate the model.
Number of hidden layers
Performance is analyzed as a function of the number of hidden layers, since adding hidden layers helps increase the accuracy of the model on complex tasks. For simple tasks, one hidden layer is enough; for complex tasks, more hidden layers are used to boost performance. The performance of the network is evaluated with 1 to 4 hidden layers, as shown in Table 4.
Metrics values under different numbers of hidden layers
Metrics values under different numbers of hidden units
As the number of hidden layers is varied, the accuracy, precision, recall, and F1 score are recorded. The number of hidden layers is set by considering the best values of these four metrics. The accuracy and precision of the LSTM model are highest when the network has two layers, as shown in Fig. 4. Thus, the network with two hidden layers is best for this work.

Metrics values under a different number of layers.
In this study, different numbers of hidden neurons are tried in each layer, and the performance of the LSTM classifier is evaluated for each setting. As illustrated in Fig. 5, the suggested approach provides the maximum accuracy and precision when each layer has 200 units.

Metric values under different number of hidden units.
The highest accuracy, 96.1%, is obtained with 200 units, whereas 100 units give an accuracy of 93.7% and 300 units give 87.49%. The performance of the model is thus considerably enhanced by using 200 units in the fully connected layer.
The dataset is partitioned into training and testing sets of 80% and 20%, respectively. The LSTM system is built using the training samples and then validated using the testing data. For ten-fold cross-validation, the dataset is partitioned into ten equal subsets. A single subset is used as validation data, whereas the remaining nine subsets are used as training data. This procedure is performed 10 times, with each subset serving as validation data exactly once. The developed model's performance is compared with that of various systems such as Convolutional Neural Network (CNN), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Deep Neural Network (DNN). The following evaluation criteria are used to assess the proposed methodology: F1 score, accuracy, recall, and precision. True positive (TP): intruder behavior that is correctly classified as an attack. True negative (TN): normal behavior that is correctly classified as normal. False positive (FP): normal behavior that is misclassified as an attack. False negative (FN): attack behavior that is misclassified as normal.
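The ten-fold cross-validation procedure described above can be sketched as an index generator. This is an illustration only: the function name `k_fold_indices`, the shuffling, the fixed seed, and the sample count of 100 are assumptions, not details given in the text.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Hypothetical dataset of 100 samples split into ten folds.
splits = list(k_fold_indices(100, k=10))
for fold_number, (train, validation) in enumerate(splits, start=1):
    print(fold_number, len(train), len(validation))
```

Each sample appears in the validation set of exactly one fold and in the training set of the other nine, which is the property the procedure relies on.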
Accuracy
A criterion is the only factor to be assessed while analyzing the performance of DL algorithms. This is partly related to its simplicity and ease of implementation. Accuracy is defined as the proportion of accurately classified instances divided by the total number of instances.
Whereas TP, the true positives represent the positive instances predicted as positive whereas TN, true negatives represent the negative instances predicted as negative. Simultaneously FP, the false positives represents the negative instances predicted as positive whereas FN, the false negatives represent the negative instances predicted as positive.
Precision and recall are frequently reported together because they are closely related. Precision is the ratio of correctly predicted attacks to all samples predicted as attacks. Recall is the ratio of correctly identified attack samples to all attack samples. The formulas for precision and recall are

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$
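The two ratios described above can be sketched in Python as follows. The function names and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here rather than something stated in the text.

```python
def precision(tp, fp):
    """Correctly predicted attacks over all samples predicted as attacks."""
    predicted_attacks = tp + fp
    # Assumed convention: report 0.0 when nothing was predicted as an attack.
    return tp / predicted_attacks if predicted_attacks else 0.0


def recall(tp, fn):
    """Correctly identified attacks over all actual attack samples."""
    actual_attacks = tp + fn
    # Assumed convention: report 0.0 when there are no attack samples.
    return tp / actual_attacks if actual_attacks else 0.0


# Hypothetical counts: 8 attacks caught, 2 false alarms, 8 attacks missed.
print(precision(tp=8, fp=2))  # 0.8
print(recall(tp=8, fn=8))     # 0.5
```

In this toy case most alarms are genuine (high precision) while half of the attacks go undetected (lower recall), which is why the two values are read together.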
The F1 score is the harmonic mean of precision and recall, and it is a statistical measure for analyzing a system’s correctness. It is used to gauge the quality of the classification. The F1 score is defined as

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
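A short sketch of the harmonic mean, compared with a plain arithmetic mean, is given below. The function name and the example values are assumptions chosen for illustration, and the zero-denominator convention is likewise assumed.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Assumed convention: F1 is 0.0 when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical classifier with high precision but low recall.
p, r = 0.9, 0.1
arithmetic_mean = (p + r) / 2  # 0.5, which hides the weak recall
harmonic_mean = f1_score(p, r)  # approximately 0.18
print(arithmetic_mean, harmonic_mean)
```

The harmonic mean stays close to the smaller of the two inputs, so a model cannot obtain a high F1 score by doing well on only one of precision or recall.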
The F1 score is a common statistic because it summarizes both precision and recall in a single value, which is useful when an algorithm has high precision but low recall or vice versa. Although this metric is often used to give a more balanced view of precision and recall, it should be used with caution as an evaluation tool, because it is a weighted average of the two [32]. Table 6 compares the accuracy of the proposed LSTM with that of other techniques: SVM, RF, LR, CNN, and DNN.
Table 6. Accuracy of various algorithms compared with the proposed LSTM
Compared with the other methods, the proposed LSTM system achieves higher accuracy, and feature selection with XGBoost reduces overfitting by removing redundant features. Figure 6 shows that the proposed LSTM model with feature selection outperforms the other models with an accuracy of 98.99%; without feature selection its accuracy is 97.32%. By comparison, DNN achieved an accuracy of 97.87%, CNN 97.9%, LR 95.12%, RF 93.54%, and SVM 96.6%.

Fig. 6. Accuracy of various algorithms compared with the proposed LSTM.
Table 7 shows the performance of the LSTM classifier with and without feature selection using XGBoost. Without feature selection, the highest precision, recall, and F1 score obtained by the proposed LSTM are 87%, 81%, and 81%, respectively. Feature selection improves the performance: with it, the highest precision, recall, and F1 score obtained by the proposed LSTM are 91%, 87%, and 85%. This precision is higher than that of the existing models: SVM (82%), RF (85%), LR (76%), CNN (89%), and DNN (87%).
Table 7. Precision, recall, and F1 score values of various algorithms compared with the proposed LSTM
Fig. 7, Fig. 8, and Fig. 9 show that the precision, recall, and F1 score of the framework are 0.91, 0.87, and 0.85, respectively. These values are attributed to the feature selection technique, which removes redundant, noisy, and unwanted data from the dataset. This mitigates overfitting of the learning model, increases the learning rate, and decreases the training time and computational complexity of the model.

Fig. 7. Precision of various algorithms compared with the proposed LSTM.

Fig. 8. Recall of various algorithms compared with the proposed LSTM.

Fig. 9. F1 score of various algorithms compared with the proposed LSTM.
In this work, an LSTM with XGBoost feature selection has been proposed for efficient classification of network anomalies, since growing technologies introduce network risks that lead to a variety of attacks. To address these attacks, the LSTM is trained on the most important features of the dataset, with irrelevant, noisy, and unwanted data removed. The NSL-KDD dataset was used for this work, and the suggested approach was compared with machine learning and deep learning techniques such as CNN, DNN, LR, RF, and SVM. The results were assessed using accuracy, F1 score, recall, and precision. The proposed system obtains the highest accuracy, 98.99%, and the precision, recall, and F1 score of the framework are 0.91, 0.87, and 0.85, respectively. Feature selection removes noisy and unwanted data, which mitigates overfitting of the learning model, increases the learning rate, and decreases the training time and computational complexity of the model.
Acknowledgments
The author wishes to thank the editors and reviewers for their hard work. We acknowledge DST-File No. 368, DST-FIST (SR/FIST/College-235/2014 dated 21 Nov 2014) for financial support and DBT-STAR-College-Scheme-ref.no: BT/HRD/11/09/2018 for providing infrastructure support.
