Abstract
Anomaly-based detection is concerned with recognizing the uncommon: catching unusual activity and finding the action behind it. It has a wide range of critical applications, from banking security to the sciences, medical systems, and marketing. Anomaly-based detection built on Machine Learning techniques is a form of artificial intelligence. With the ever-expanding volume and new kinds of data, for example sensor data from an enormous number of IoT devices and network flow data from cloud computing, there is growing interest in making more decisions automatically by means of AI and ML applications. In anomaly detection, however, many applications are driven simply by the interest in detection itself. In this paper, the Machine Learning (ML) techniques SVM and Isolation Forest are evaluated, and the proposed Deep Learning (DL) model, DA-LSTM (Deep Auto-Encoder LSTM), is adopted for preprocessing log data and for anomaly-based detection with improved detection performance. An enhanced LSTM (long short-term memory) model, whose parameters are optimized with a genetic algorithm (GA), is used to better recognize anomalies in log data that has been filtered by a Deep Auto-Encoder (DA). The deep neural network models convert unstructured log data into training-ready features suitable for log classification in anomaly detection. The models are assessed on two benchmark datasets, the OpenStack logs and the CIDDS-001 intrusion detection OpenStack server dataset. The results show that the DA-LSTM model performs better than other notable ML techniques.
We further investigated the performance of the ML and DL models using well-known metrics, specifically the F-measure, Accuracy, Recall, and Precision. The experimental results show that the Isolation Forest and Support Vector Machine classifiers achieve roughly 81% and 79% accuracy, respectively, on the CIDDS-001 OpenStack server dataset, while the proposed DA-LSTM classifier achieves around 99.1% accuracy, an improvement over the familiar ML algorithms. Further, the DA-LSTM results on the OpenStack log dataset show better anomaly detection compared with other notable machine learning models.
Keywords
Introduction
Anomaly-based identification and classification together make a practical pair for anomaly detection use cases. The anomaly-based detection model is used first, in an interpretation stage, to help digital forensic experts and cloud computing security experts make sense of what is happening and what they have to look for. The detection model spots anomalies, and a classifier model is then set up to assign new data inputs to the classification groups that have already been recognized. Once the classifier is ready, the detection model is updated to treat these new data inputs as normal, and the procedure is repeated. This process is outlined in Fig. 1 as one approach to using anomaly-based detection.

Anomaly-based detection when uncertainty is the input data.
Anomaly-based detection is a well-known process, similar to classification techniques, and yet interest in adopting it has grown in recent times. New approaches now make it possible to perform it accurately in real-time settings; considerably more reliable and advanced strategies are in development and offer wide research scope within Artificial Intelligence approaches, namely machine learning and deep learning models (Bhattacharyya et al. 2013 [4]).
Obtaining a large amount of precisely labeled data that represents a wide range of behavior is very expensive. Feature extraction and labeling are normally performed manually by experienced subject matter experts, which takes significant time and effort. In most cases, a dataset of examples labeled as normal behavior is easier to acquire than a dataset of abnormal behavior that covers a wide range of anomaly types. In addition, anomalous data is very dynamic in the deployment environment: a new variety of malicious data may require a corresponding label in the training data. By contrast, unsupervised anomaly-based detection methods require no prior training data and are therefore simpler to use. These methods take the unprocessed dataset as input and look for intrusion attempts among the available data. The identified intrusion data can then be used to train misuse-based or supervised techniques. The unsupervised training methods for anomaly-based detection are outlined in the subsequent sections. Often several strategies are used in combination.
Clustering methods can learn and recognize anomalous data without detailed descriptions of the respective classes or anomaly categories supplied by engineers. Anomaly detection based on clustering therefore does not require any labeled training data. Clustering techniques are widely applied to computer network anomaly-based detection. A network intrusion detection technique that combines clustering and classification is outlined in Fig. 2.

Clustering-based anomaly detection.
Clustering is applied to the available data, and based on specified rules some of the resulting groups are designated as abnormal categories and the others as the normal category. This yields a distinct profile for each of the specified classes. The group profiles, rather than separately prepared training data, are then used by the classification techniques. In the testing phase, supervised classification methods group the network traffic data as anomalous or normal.
Probability-related computing methods are suitable for computer network anomaly-based detection because exact models often cannot be found. These methods are generally thought of as encompassing techniques such as genetic algorithms, artificial neural networks, fuzzy computing, bio-inspired ant colony techniques, and artificial immune-based techniques. Depending on the category of classifier algorithm, various machine learning techniques can be used for anomaly-based detection (Naseer et al. 2018 [30]).
Isolation forest
The Isolation Forest "isolates" observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature (Gao et al. 2019 [9]). Since recursive partitioning can be represented by a tree structure, the number of splits required to isolate a sample is equivalent to the path length from the root node to the terminating node. This path length, averaged over a forest of such random trees, is a measure of normality and serves as the decision function. Random partitioning produces noticeably shorter paths for anomalies. Consequently, when a forest of random trees collectively produces shorter path lengths for particular samples, those samples are highly likely to be anomalies. A data point is then fed into each tree of the trained forest, and the anomaly score is defined as:
The anomaly score is calculated for each tree and averaged across the trees to obtain the final anomaly score of the whole forest for a given data point. The technique works better than other models at identifying anomalies in a small dataset. Various anomaly detection strategies exist (e.g., SVM, Random Forest); this algorithm is distinctive in that it isolates the anomalies rather than profiling normal instances (Liu et al. 2008 [11]).
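As an illustration of the scoring step, the following is a minimal sketch of the standard formulation from Liu et al. 2008, s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the path length of x averaged over the trees and c(n) normalizes by the average path length for a subsample of n points. It assumes the per-tree path lengths have already been computed; the function names and the example values are hypothetical and are not taken from the experiments in this paper.

```python
import math

EULER_GAMMA = 0.5772156649  # used to approximate the harmonic number H(i)


def average_path_length(n: int) -> float:
    """c(n): expected path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # H(n - 1) is approximated by ln(n - 1) + gamma
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(path_lengths, n: int) -> float:
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); values near 1 suggest an anomaly."""
    mean_depth = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_depth / average_path_length(n))


# Hypothetical path lengths of one point in three trees, subsample size 256.
print(anomaly_score([2, 3, 2], 256))     # short paths give a score closer to 1
print(anomaly_score([11, 12, 10], 256))  # long paths give a score closer to 0
```

A point whose average path length equals c(n) receives a score of exactly 0.5, which is why scores well above 0.5 are usually read as anomalous.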
Support vector machine (SVM)
The second classifier used in this investigation is the Support Vector Machine (SVM), a family of generalized linear classifiers (Liu et al. 2019 [16]). The main advantage of SVMs is that, once model training is finished, prediction can be fast. Additionally, in some real-time cases, SVMs have the best classification performance. A disadvantage of SVMs is that model training can be costly when there is a very large number of training samples. The fundamental idea of the SVM is to derive a hyperplane that maximizes the separating margin between two classes, the positive class and the negative class. With the standard SVM, the decision surface can deviate from the optimal position in the feature space; when it is mapped back to the input space, this results in a more strongly nonlinear decision boundary. The standard SVM is therefore sensitive to noisy data, which leads to poor generalization.
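The hyperplane and margin can be made concrete with a small sketch. It assumes a linear SVM whose weight vector w and bias b have already been learned; the function names and the example weights are hypothetical. The decision value is w·x + b, the margin width is 2 / ||w||, and the hinge loss shows how points inside the margin or on the wrong side are penalized during training.

```python
import math


def decision_value(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Return +1 for the positive class and -1 for the negative class."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the two margin hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def hinge_loss(w, b, x, y):
    """Zero when x is classified correctly outside the margin, positive otherwise."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))


w, b = [3.0, 4.0], -5.0           # hypothetical learned parameters
print(predict(w, b, [3.0, 4.0]))  # point on the positive side
print(margin_width(w))            # 2 / ||w||
```

Because prediction is a single dot product, inference stays cheap even when training was expensive, which matches the trade-off described above.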
ANN (Artificial Neural Network) approaches
The development of ANNs has been inspired from the outset by the recognition that the human brain processes information in an entirely different way from conventional digital computing (Erfani et al. 2016 [7]). ANNs are established models for different applications, for example data clustering, feature extraction, and recognition of various anomaly patterns in computer networks (Lee et al. 2018 [15]).
The approach of Aberkane et al. 2019 [26] trains autonomously to learn new types of attacks quickly using tailored reinforcement techniques. Their methodology uses learning feedback to add a signature pattern when a new type of attack is encountered. Alnafessah and Casale 2020 [2] used stacked artificial neural networks to identify anomalies. The neural network layers are trained on data that spans the whole normal data space and can identify new attack patterns in the process. Jieming et al. 2019 [12] report an industry-ready process for detecting recorded and dynamic attack patterns in network data flows using unsupervised artificial neural networks.
Because Artificial Neural Networks are adaptable, they can be trained and tested incrementally using specific algorithms. Methods based on stacked artificial neural networks are much more efficient than single-level artificial neural network systems (Savaridassan et al. 2020 [27]).
Recurrent neural networks (RNNs), or sequence models, are notable techniques in DL neural network modelling. RNNs have been used to recognize images and text messages with high performance (Sheikhan et al. 2012 [28]). However, RNNs have difficulty capturing the long-term dependencies in tasks with sequential connections because of the vanishing gradient (Kim et al. 2016 [14]). The long short-term memory (LSTM) model, a variant sequence model illustrated in Fig. 3, is designed to capture long-term dependencies. The essential objective of the LSTM is to overcome the vanishing gradient, so that the optimization algorithm can learn the network weights without the long-term dependency problem. This experimental setup uses the LSTM to identify anomalies.

Long short-term memory (LSTM) model stacked on one another.
The effectiveness of IDS techniques was considered in (Lu et al. 2018 [29]). The CIDDS-001 OpenStack flow-based dataset was investigated using various ML algorithms, including KNN, decision trees, and clustering techniques such as k-means. Additionally, the dataset has been assessed alongside other datasets using Deep Learning (DL) models to categorize the attacks precisely. In recent times, DL-based approaches have performed better than benchmarked conventional machine learning (ML) techniques.
This paper is organized as follows. Section 2 describes the literature review. Section 3 presents the methodology employed in this work. Section 4 describes the data and computation environment. Section 5 outlines the results obtained and the discussion. Finally, Section 6 concludes the paper.
Aberkane et al. 2019 [26] worked on anomaly-based detection systems using Long Short-Term Memory models. Their model uses the flow-based CIDDS dataset, and the authors achieved an accuracy of 85.5%. Vinaykumar et al. 2019 [25] experimented with a scalable integrated detection system named AlertNet, which combines network IDS and host IDS and can be employed to identify and raise alarms for probable internet attacks. The well-known benchmark network flow datasets used are Kyoto, NSL-KDD, CICIDS-2017, UNSW-NB15, and KDD CUP99. In their assessment, the authors concluded that Deep Neural Networks performed better than conventional classifiers.
Tao et al. 2018 [19] outlined an intrusion detection model based on a Genetic Algorithm (GA) and SVM. The core idea is to use the GA for feature selection, feature weighting, and optimization of the SVM parameters. Their work applies a two-step GA-based tuning of the SVM: first choosing the feature subset, and then fine-tuning the SVM configuration values on the chosen inputs.
Alex et al. 2018 [1] proposed a deep network model for anomaly assessment in intrusion detection frameworks, based on an artificial neural network structure. Their proposed network model achieved an accuracy of 98% and an AUC of 0.98 under repeated ten-fold cross-validation. To identify and group the attacks, an online exploit and vulnerability repository, exploitdb, was used.
Alsughayyir et al. 2019 [3] worked on Deep Learning (DL) techniques based on auto-encoders for multiclass classification. Min-max scaling is used for normalization. The KDD dataset was adopted for model training and assessment. The system outperformed conventional techniques, with an overall accuracy of 91.28% in testing.
Das and Morris 2017 [22] experimented with a model using four types of classifiers: J48, Naive Bayes, Random Forest, and OneR. The authors used a MODBUS dataset gathered from a gas pipeline system for assessment, with tenfold cross-validation. The J48 algorithm performed better than the others, with 0.995 AUC, 0.992 precision, and 0.992 recall.
Darkaie and Tavoli 2019 [18] proposed an approach based on an MLP neural network. Backpropagation (BP) is used to train the network and reduce the error associated with the weights. The KDD99 dataset is used for model training and evaluation, and PCA is adopted for feature reduction. The technique achieved an accuracy of 91%.
Patgiri et al. 2018 [24] built a working model that uses SVM and Random Forest techniques. Recursive feature elimination is used for feature extraction and selection with the support vector machine. The NSL-KDD dataset is used for model training and evaluation, and the authors assessed the model with cross-validation. Random Forest (RF) performed better than the support vector machine before feature selection; after feature selection, the support vector machine shows better results than Random Forest.
Ludwig et al. 2019 [17] proposed an ANN ensemble technique for attack classification. The ensemble comprises an auto-encoder, a learning system, a deep neural network, and a deep belief network. The KDD dataset is used for model training and evaluation. The authors report 93% accuracy, 92% F-measure, and 92% recall.
Yin et al. 2017 [6] designed a working system based on an RNN and compared it with conventional machine learning classifiers. The NSL-KDD dataset is used for model training and assessment. Their results show that the recurrent neural network IDS performs better than conventional classification methods.
Yang and Wang 2019 [10] implemented an ANN model using an improved CNN. Model training consists of forward and back propagation. The NSL-KDD CUP dataset was used for the evaluation. The model was compared with three other techniques: Recurrent Neural Network, ANN, and Deep Belief Network. It achieved 95.36% accuracy, better than the other three models.
Lotfallahtabrizi et al. 2018 [20] introduced a HIDS that uses ANNs. A feed-forward neural network with two intermediate layers is designed. The authors used an intelligent devices dataset for the assessment. The model achieved a false alarm rate (FAR) of 0.011 and a detection rate (DR) of 0.974.
Li et al. 2019 [32] implemented a detection system using two RNNs (LSTM and GRU) and BLS with its extensions. The authors assessed the performance of the proposed models on a BGP dataset and on NSL-KDD. On the BGP dataset, they achieved accuracy and F-measure of 90% to 95%; on the NSL-KDD dataset, performance is between 80% and 85%.
Pokhrel et al. 2019 [21] built a model using Naïve Bayes (NB) and SVM. The model is analysed on data from two organizations. It achieved 95% to 96% accuracy, 98% to 99% recall, 92% to 93% precision, and ROC curve areas of 0.9313 and 0.9518 for the two tested organizations, respectively.
AI and ML strategies have long been used to handle diverse problems. The algorithms are able to learn and recognize patterns without additional programming (Xin et al. 2018 [31]). However, the performance of ML methods depends heavily on the data representation, and it often takes skilled expertise to derive a suitable representation from raw data. DL approaches address this by finding appropriate representations themselves, and they have shown the capability to outperform ML strategies in numerous applications with remarkable results.
In this research, the proposed Deep Auto-Encoder Long Short-Term Memory (DA-GA-LSTM) model for anomaly-based detection is implemented, with its parameters optimized using a genetic algorithm (GA), and its performance is assessed on the benchmark intrusion detection dataset CIDDS-001.
Proposed methodology
The major contributions of this work are as follows. The DA-GA-LSTM (Deep Auto-Encoder using Genetic Algorithm with LSTM) model is proposed for classification and anomaly-based detection. The proposed DA-GA-LSTM model is assessed on two benchmark datasets, and the results are compared with notable machine learning and deep learning models.
The DA-GA-LSTM model has two phases, a Deep Auto-Encoder (DA) and an LSTM. First, the available data is partitioned into two sets for DA training: one containing the positively labeled data (normal class) and the other containing the negatively labeled data (abnormal class). Each DA is a separate neural network with up to three hidden layers and is trained on data from a single category, without explicit labels. The two outputs are then combined into a distinctly labeled dataset, and a genetic algorithm is applied to find the optimal LSTM parameters for detecting anomalies. The DA-GA-LSTM design, with two Deep Auto-Encoder networks and an LSTM model for anomaly-based detection and classification, is shown in Fig. 4.

The architecture of the DA-GA-LSTM model consists of two DA networks and anomaly-based detection with optimized LSTM model.
Depending on the category of classifier algorithm, various machine learning techniques can be used for anomaly-based detection. The Deep Auto-Encoder, Genetic Algorithm (GA), and LSTM architectures adopted in the proposed work are explained in the following sections.
Deep auto-encoder architecture
The Deep Auto-Encoder (DA) is a deep learning technique consisting of a multi-layer feed-forward network with the same number of input and output neurons. The objective of the DA is to produce a reduced representation and to decrease the errors in the data. The model is trained with backpropagation according to the computed loss.
An auto-encoder with multiple hidden layers is known as a Deep Auto-Encoder (DA). Having numerous encoder and decoder layers enables a DA to model complex data distributions. Figure 5 depicts the structure of the auto-encoder, which consists of an input layer, two consecutive intermediate layers, and a final output layer. The hidden encoder vector is computed from Xn, where Xn denotes an unlabeled dataset.

Deep Auto-encoder structure with an input layer, an output layer, and three intermediate layers.
where g is the encoding function, M1 is the weight matrix of the encoder, and bv1 is the bias vector.
The decoding process is given by
where f is the decoding function, M1 is the weight matrix of the encoder, and bv1 is the bias vector.
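To make the encode and decode steps concrete, the following is a minimal single-layer sketch in plain Python. It assumes the common form h = g(M1·x + bv1) for the encoder and x' = f(M2·h + bv2) for the decoder, with the sigmoid used for both g and f. The separate decoder parameters M2 and bv2, the sigmoid choice, and all function names are assumptions made for illustration; the text above names only M1 and bv1.

```python
import math


def affine(matrix, vector, bias):
    """Compute matrix . vector + bias for a list-of-rows matrix."""
    return [sum(m * v for m, v in zip(row, vector)) + b
            for row, b in zip(matrix, bias)]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def encode(x, M1, bv1):
    """h = g(M1 x + bv1): map the input to the hidden (reduced) representation."""
    return sigmoid(affine(M1, x, bv1))


def decode(h, M2, bv2):
    """x' = f(M2 h + bv2): reconstruct the input from the hidden representation."""
    return sigmoid(affine(M2, h, bv2))


def reconstruction_error(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# Hypothetical 3 -> 2 -> 3 auto-encoder with small fixed weights.
M1, bv1 = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]], [0.0, 0.0]
M2, bv2 = [[0.5, -0.5], [0.2, 0.1], [-0.3, 0.4]], [0.0, 0.0, 0.0]
x = [0.2, 0.7, 0.1]
x_hat = decode(encode(x, M1, bv1), M2, bv2)
print(reconstruction_error(x, x_hat))
```

Training adjusts the weights by backpropagation so that the reconstruction error decreases; a deep auto-encoder stacks several such encode and decode layers.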
GA (Genetic Algorithm) is a heuristic search and optimization technique inspired by the process of natural selection. It is widely used to find optimal or near-optimal solutions to optimization problems with a large parameter space. It mimics the evolution of species, relying on biologically inspired operations such as crossover. Moreover, because it does not use derivatives, it can be applied to both continuous and discrete optimization problems.
Figure 6 outlines the flow of the complete genetic algorithm procedure. The initial solutions (the population) are generated randomly. The solutions are then assessed by a chosen fitness function, followed by crossover and then mutation. This procedure is iterated a fixed number of times (called generations in genetic terminology). At the end, the solution with the highest fitness score is chosen as the best available parameter set.
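The loop described above can be sketched in plain Python as follows. This is an illustrative sketch only, not the implementation used in this work: the function name run_ga, the one-point crossover, the bit-flip mutation, the elitism step, and all default rates are assumptions chosen for brevity. It assumes a non-negative fitness that is to be maximized and an even population size.

```python
import random


def run_ga(fitness, n_bits, pop_size=20, generations=30,
           cx_prob=0.7, mut_prob=0.05, seed=0):
    """Evolve binary chromosomes and return the best one plus its fitness history."""
    rng = random.Random(seed)
    # Random initial population of binary chromosomes.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        # Roulette-wheel selection; the epsilon keeps the weights valid if all scores are 0.
        weights = [s + 1e-9 for s in scores]
        parents = rng.choices(population, weights=weights, k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            if rng.random() < cx_prob:  # one-point crossover
                cut = rng.randint(1, n_bits - 1)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            children += [a, b]
        for child in children:          # bit-flip mutation
            for i in range(n_bits):
                if rng.random() < mut_prob:
                    child[i] ^= 1
        children[0] = best[:]           # elitism: never lose the best solution
        population = children
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


# Toy fitness: count the 1-bits in the chromosome.
best, history = run_ga(sum, n_bits=12)
print(best, history[-1])
```

Because the best solution is copied unchanged into each new generation, the best fitness recorded in history never decreases from one generation to the next.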

Process flow diagram of traditional GA.
The vanilla LSTM is a type of recurrent network that is used effectively for sequential data problems (Kim et al. 2016 [11]). It is built from memory blocks containing cells that store data and are connected recurrently. These cells address the vanishing gradient problem in RNNs. Each LSTM block contains self-connected cells together with a forget gate, an output gate, and an input gate. The gates are intended to retain state for longer than feed-forward neural networks can, improving the performance of the system. A block in an LSTM model contains recurrently connected cells, as shown in Fig. 7.

LSTM block outlined by forgetting gate, input gate, block input, output gate, tangent activation functions.
The formulas specify the forward pass of a vanilla LSTM implementation:
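A minimal sketch of one forward step is given below for a single scalar cell, following the commonly used formulation: block input z = tanh(Wz·x + Rz·h_prev + bz), gates i, f, o = sigmoid(W·x + R·h_prev + b), cell state c = i·z + f·c_prev, and output h = o·tanh(c). Peephole connections are omitted, and the parameter names (Wz, Rz, bz, and so on) are assumptions made for illustration rather than the notation of this paper.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h_prev, c_prev, p):
    """One forward step of a scalar LSTM cell; p maps parameter names to floats."""
    z = math.tanh(p["Wz"] * x + p["Rz"] * h_prev + p["bz"])  # block input
    i = sigmoid(p["Wi"] * x + p["Ri"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["Wf"] * x + p["Rf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["Wo"] * x + p["Ro"] * h_prev + p["bo"])    # output gate
    c = i * z + f * c_prev                                   # new cell state
    h = o * math.tanh(c)                                     # block output
    return h, c


# Run a short hypothetical sequence through the cell.
params = {"Wz": 0.5, "Rz": 0.1, "bz": 0.0, "Wi": 0.4, "Ri": 0.2, "bi": 0.0,
          "Wf": 0.3, "Rf": 0.1, "bf": 1.0, "Wo": 0.6, "Ro": 0.2, "bo": 0.0}
h, c = 0.0, 0.0
for x in [0.1, 0.5, -0.3]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The additive update of the cell state c is what lets gradients flow across many time steps, which is how the block mitigates the vanishing gradient.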
The process flow of the DA-GA-LSTM model, outlined in Fig. 8, starts with a preprocessing module in which the Deep Auto-Encoder (DA) produces the labeled data. The resulting dataset is then used to train the long short-term memory module, which is optimized with the Genetic Algorithm (GA) to automatically discover the ideal window size and number of units for anomaly detection in the OpenStack log dataset.

Process flow diagram of the DA-GA-LSTM model.
The OpenStack netflow dataset is partitioned into two sets, positive (normal) and negative (abnormal), which are used to train the Deep Auto-Encoders. The Deep Auto-Encoder results are then consolidated and passed to the optimized LSTM network for anomaly classification from the log dataset. The Genetic Algorithm with LSTM (GA-LSTM) starts from a randomly selected population and uses an evaluation function together with three essential operations to find the ideal parameters. The three operations are described below.
Implementing a Genetic Algorithm requires two preconditions: 1) a representation of a solution, that is, a definition of a chromosome, and 2) a suitable fitness function to assess the generated solutions. In this case, an array of binary values is the genetic representation of a solution (see Fig. 9), and the model's Root-Mean-Square Error (RMSE) on the validation set acts as the fitness value. The three fundamental operations of a Genetic Algorithm are selection, crossover, and mutation, applied heuristically.

A solution structure as per genetic representation.
The Genetic Algorithm (GA) is used to find the optimal window size and number of units in the LSTM-based RNN, following the flow shown in Fig. 10. The DEAP Python package is adopted for the Genetic Algorithm. The basic idea is to use the algorithm to discover the optimal parameters with the help of two helper methods. The first method segments the data into X, Y pairs for model training. The second method does three things: a) it decodes the GA solution to obtain the window size and the number of units; b) it prepares the dataset using the window size found by the GA and splits it into training and validation data; and c) it trains the LSTM model with these parameters, computes the RMSE (root mean square error) on the validation data, and returns it to the GA as the fitness score of the current solution. A Bernoulli distribution is used for random initialization, ordered crossover for recombination, shuffle mutation for mutation, and roulette wheel selection for selection. Finally, the optimal parameters are used to train the model on the full training set and to test it on the held-out test data.
The LSTM model is then used for anomaly-based detection. This neural network has only one hidden layer. First, the combined dataset is shuffled and split into training and test data, with 15% for model learning and 85% for the accuracy test. The training data is further split into two groups, 80% for model training and 20% for validation of the trained model. In the implementation with Keras [13], an embedding layer converts each input element to a vector. An LSTM hidden layer of size 100 is used, with softmax in the final layer to classify the data into two labels. The Adam optimizer is adopted, and categorical cross-entropy is used to estimate the loss.
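The two helper methods can be sketched in plain Python as follows. This is an illustration under stated assumptions and not the code used in this work: the split of the chromosome into six bits for the window size and the remaining bits for the number of units, the function names, and the toy series are all hypothetical, and the LSTM training step that would produce the RMSE fitness is left out.

```python
def bits_to_int(bits):
    """Interpret a list of 0/1 values as an unsigned binary number."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def decode_chromosome(bits, window_bits=6):
    """Split a GA solution into (window_size, num_units); the bit layout is assumed."""
    window_size = bits_to_int(bits[:window_bits])
    num_units = bits_to_int(bits[window_bits:])
    return window_size, num_units


def make_xy_pairs(series, window_size):
    """Slide a window over the series: X holds the window, Y the value that follows it."""
    count = len(series) - window_size
    X = [series[i:i + window_size] for i in range(count)]
    Y = [series[i + window_size] for i in range(count)]
    return X, Y


def split_train_validation(X, Y, train_fraction=0.8):
    """Keep the first part for training and the rest for validation."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], Y[:cut]), (X[cut:], Y[cut:])


window_size, num_units = decode_chromosome([0, 0, 0, 0, 1, 0, 0, 1, 1, 0])
X, Y = make_xy_pairs(list(range(12)), window_size)
(train_X, train_Y), (val_X, val_Y) = split_train_validation(X, Y)
print(window_size, num_units, len(train_X), len(val_X))
```

A fitness function built on these helpers would need to reject or penalize solutions that decode to a window size or number of units of zero, since no model can be trained for them.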

The flow of the proposed GA-LSTM methodology for optimization.
The output of the GA-LSTM network is evaluated by computing the root mean squared error (RMSE).
RMSE is defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2} \qquad (6)$$
In expression (6), n denotes the number of samples, y′ is the value predicted by the model, and y is the actual value.
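As a small worked example, the RMSE can be computed directly from paired actual and predicted values. The function name and the sample values below are hypothetical and serve only to illustrate the calculation.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual values y and predictions y'."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    squared = [(y - y_pred) ** 2 for y, y_pred in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


# Errors of 3 and 4 give sqrt((9 + 16) / 2).
print(rmse([0.0, 0.0], [3.0, 4.0]))
```

Since a lower RMSE indicates a better fit, a GA that maximizes fitness would need the RMSE negated or inverted, while a minimizing GA can use it directly.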
Open Stack server benchmark dataset CIDDS-001
The CIDDS-001 dataset consists of network flow data in two distinct parts, organized by the location where the traffic was captured (Ring et al. 2017 [23]). In this work, the network flows captured in the OpenStack environment are used to assess the models. The first part of the data is used for model training to learn the features, while the second part is used for model evaluation. The attributes of the benchmark dataset are largely the standard NetFlow attributes, outlined in Table 1. For privacy reasons, the creators anonymized the first three bytes of public addresses, as well as the specific addresses of the external server and the DNS. The flow-based dataset was captured from the OpenStack server over a period of about a month. During the first fourteen days, a total of four distinct known attacks were carried out at irregular intervals alongside normal behavior. The executed attacks and the number of attempts of each attack type are presented in Table 2. Table 3 lists the number of instances and the flow types captured in the OpenStack cloud environment.
CIDDS-001 dataset features with respect to NetFlow
Attacks executed in the OpenStack environment
CIDDS-001 dataset classes of OpenStack network flow
Keras is a high-level neural network application programming interface written in Python that runs with TensorFlow as the backend; it is used here to implement DL models such as the LSTM. The investigation searches for the parameters that achieve the best performance of the anomaly detection system. In the training phase, the LSTM model uses an input layer of ten neurons matching the data features, a hidden layer of six neurons, and an output layer of five neurons. The number of training iterations is set to 200 epochs. The network weights used in this experimental research range between 0 and 0.05. In the learning phase, the loss function is used to assess the weights within the neural network. In this experiment, the logarithmic loss function, named “categorical cross-entropy” in Keras, is adopted, as the study addresses a multi-class classification problem.
The loss function is important for evaluating the difference between the actual and predicted values. The fit() method is called with the training data, and its parameters specify the batch size and the number of epochs. Once the model has been trained on the complete data, its loss value and accuracy can be assessed as measures of performance on the data.
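To show what the loss measures, the following plain-Python sketch computes a softmax output and the categorical cross-entropy for a single sample with five classes, matching the five output neurons described above. It is a hand-written illustration, not a call into Keras; the function names, the epsilon used to guard the logarithm, and the example scores are assumptions.

```python
import math


def softmax(logits):
    """Turn raw output scores into class probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for one sample: -sum(t * log(p)) over the classes."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


# Hypothetical scores for five classes; the true class is the second one.
probs = softmax([0.2, 2.5, 0.1, -1.0, 0.3])
print(categorical_cross_entropy([0, 1, 0, 0, 0], probs))
```

The loss is small when the probability assigned to the true class is close to 1 and grows without bound as that probability approaches 0.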
The proposed GA-DA-LSTM model and machine learning models are assessed utilizing the two benchmark datasets, in particular to OpenStack raw log and CIDDS-001 flow-based OpenStack server dataset. The proposed models resulted in better outcomes with the two well-known datasets.
The associated four measures are utilized to assess the performance: precision, F-measure accuracy and recall. The accuracy metric defined as a portion of input data predicted accurately.
The more common measurements used in this analysis are described below. The first is Sensitivity or Recall, also referred to as the True Positive Rate (TPR). It measures the proportion of instances of the positive category (in this case, abnormal events) that are correctly identified as such:
Recall = TP / (TP + FN)
The next measurement is Precision, also called the Positive Predictive Value (PPV). It is the proportion of the predicted instances of a class that truly belong to that class:
Precision = TP / (TP + FP)
Accuracy: the proportion of correctly classified records out of the total number of records in the given dataset. A high accuracy indicates that the trained system performs well. The metric is defined as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
F1-Measure/F1-Score: the harmonic mean of Recall and Precision. A high score indicates that the ML system performs well. The F1-Score lies in [0, 1] and is defined as follows:
F1 = 2 × Precision × Recall / (Precision + Recall)
TP - number of correct predictions in the positive class (traffic with anomaly)
TN - number of correct predictions in the negative class (normal traffic)
FP - number of instances predicted as anomalies that are not actually abnormal
FN - number of anomaly instances that are not identified
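The four counts above are enough to compute every metric used in this evaluation. The sketch below shows one way to do so; the function name and the example counts are hypothetical and are not taken from the experiments. The false positive rate (FPR) is included because it is discussed with the CIDDS-001 results.

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute standard detection metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # TPR / sensitivity
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # PPV
    accuracy = (tp + tn) / total if total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0         # false positive rate
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "fpr": fpr,
    }


# Hypothetical counts for illustration only.
metrics = detection_metrics(tp=90, tn=880, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With a heavily imbalanced dataset such as log data, accuracy alone can look high even when many anomalies are missed, which is why recall, precision, and the F1-Score are reported alongside it.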
OpenStack is a cloud-based IaaS platform that manages large pools of compute, storage, and networking resources, together with identity management, throughout a datacenter. The OpenStack server log dataset was produced on CloudLab, a flexible scientific infrastructure for research on cloud computing. Both the positive logs (normal) and the negative logs (abnormal), which contain the injected anomalies, are labeled, making the dataset suitable for research in anomaly-based detection.
For the OpenStack raw log dataset, a positive category of 136,074 log records (normal) and a negative category of 17,434 log records (abnormal) are obtained with the Deep Auto-encoder technique. From the dataset, 18,420 messages are used for model training and 4,605 for model validation, and the remaining 130,481 log records are used for the model test stage. With the proposed DA-GA-LSTM technique, an average training accuracy of 98.5% and an average validation accuracy of 98.6% were obtained. The average training loss is 0.05. The testing accuracy is 92.5%. The precision is 95.4% for negative logs and 96.7% for positive logs, and the recall is 94.4% and 95.3% for negative and positive logs, respectively. The resulting F-measure is 94.9% for negative logs and 96% for positive logs.
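As a quick consistency check, the reported F-measure values can be recomputed from the reported precision and recall using the harmonic-mean definition. The sketch below uses only the percentages stated above; the helper name is an assumption for illustration.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) as reported for the OpenStack raw log dataset.
reported = {
    "negative (abnormal) logs": (95.4, 94.4),
    "positive (normal) logs": (96.7, 95.3),
}

recomputed = {label: f_measure(p, r) for label, (p, r) in reported.items()}
for label, value in recomputed.items():
    print(f"{label}: F-measure = {value:.1f}%")
```

The recomputed values round to 94.9% and 96.0%, in agreement with the reported F-measure scores.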
Table 4 summarizes the results on the OpenStack cloud raw log dataset for the DA-GA-LSTM and the other neural network classifier models. For this dataset, the table lists the average training accuracy, average training loss, average validation accuracy, testing accuracy, precision, recall, and F-measure.
Performance metrics of three models under OpenStack data set
Using existing benchmark datasets saves data-collection time and improves research productivity. The CIDDS-001 dataset comprises 13 features. The network attacks are sorted into five classes, as shown in Fig. 11. The CIDDS-001 flow-based dataset is one of the most widely used publicly available datasets for anomaly-based detection frameworks [5].

Attacks in CIDDS-001 dataset.
The CIDDS-001 dataset is used to evaluate the proposed DA-GA-LSTM approach. It is a flow-based dataset created for the assessment of anomaly-based detection. The network traffic was captured in an OpenStack environment together with an external server. The raw dataset contains 1 positive (normal) class along with 92 negative (abnormal) classes. It includes 159373 records, each with 13 features: Dest Port, Src Port, Dest IP, Src IP, Proto, Duration, Date first seen, Bytes, Flags, Packets, AttackID, AttackType, and Class. The records in the dataset are grouped into five classes: suspicious, attacker, unknown, victim, and normal, as shown in Table 5. Since the input to the LSTM must be a numerical matrix, all non-numeric features, such as the protocol (Proto), the flags (Flags), and the source and destination IP addresses (Src IP, Dest IP), must be converted into numeric form.
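The conversion of non-numeric features can be sketched as follows. The source does not state which encoding was used, so the integer label encoding for Proto and the IP addresses, the per-character binary encoding of Flags, and the sample flow records are all assumptions made for illustration.

```python
def build_vocab(values):
    """Map each distinct categorical value to a stable integer index."""
    return {value: index for index, value in enumerate(sorted(set(values)))}


def encode_flags(flags, width=6):
    """Turn a flag string such as '.AP.SF' into a 0/1 vector ('.' = not set)."""
    padded = flags.strip().ljust(width, ".")[:width]
    return [0 if ch == "." else 1 for ch in padded]


# Hypothetical flow records using a subset of the CIDDS-001 feature names.
flows = [
    {"Proto": "TCP", "Src IP": "192.168.100.5", "Dest IP": "192.168.220.16",
     "Flags": ".AP.SF", "Bytes": 2195},
    {"Proto": "UDP", "Src IP": "192.168.220.16", "Dest IP": "192.168.100.5",
     "Flags": "......", "Bytes": 120},
    {"Proto": "TCP", "Src IP": "192.168.100.5", "Dest IP": "192.168.220.16",
     "Flags": "....S.", "Bytes": 66},
]

proto_vocab = build_vocab(f["Proto"] for f in flows)
ip_vocab = build_vocab(ip for f in flows for ip in (f["Src IP"], f["Dest IP"]))


def encode(flow):
    """Convert one flow record into a flat list of floats."""
    row = [
        proto_vocab[flow["Proto"]],
        ip_vocab[flow["Src IP"]],
        ip_vocab[flow["Dest IP"]],
        *encode_flags(flow["Flags"]),
        flow["Bytes"],
    ]
    return [float(x) for x in row]


matrix = [encode(f) for f in flows]
for row in matrix:
    print(row)
```

Integer label encoding keeps the matrix compact but imposes an arbitrary ordering on categories; one-hot encoding is a common alternative when the number of distinct values is small.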
CIDDS-001 classes of records with attack and normal classes
The classification performance of the LSTM model was assessed using the accuracy metric. The accuracy plot in Fig. 12 shows that the model learns well on the OpenStack dataset and reaches acceptable performance. Figure 13 indicates that the proposed system maintains stable and comparable performance on both the training and the testing data.

Accuracy of DA-GA-LSTM model in the training and testing phase with 100 epochs.

Loss of DA-GA-LSTM model in the training and testing phase with 100 epochs.
Although the FPR of the proposed DA-GA-LSTM model is somewhat higher than that of the other machine learning models, its recall, accuracy, and precision are the best among the compared algorithms, as shown in Table 6.
The Proposed model comparison with other Algorithms under CIDDS-001 dataset
In this experimental approach, an LSTM neural network model based on genetic optimization is proposed, in which a genetic algorithm searches for the optimal parameter settings of the LSTM network. The first phase of the model uses a Deep Auto-encoder (DA) to extract the important features from the available data space, and the second phase performs anomaly-based detection with the optimized LSTM network.
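The genetic search over LSTM parameters can be illustrated with a minimal sketch. The search space, the operators (tournament selection, uniform crossover, random-reset mutation, elitism), and the fitness function are all assumptions: the source does not list the tuned hyperparameters, and the fitness here is a cheap stand-in for the validation accuracy that a real run would obtain by training an LSTM for each candidate.

```python
import random

# Hypothetical search space; not taken from the source.
SEARCH_SPACE = {
    "hidden_units": [4, 6, 8, 16, 32],
    "batch_size": [16, 32, 64, 128],
    "learning_rate": [0.0001, 0.001, 0.01, 0.1],
}


def fitness(genome):
    """Stand-in for validation accuracy; a real run would train an LSTM here."""
    target = {"hidden_units": 6, "batch_size": 32, "learning_rate": 0.001}
    return sum(genome[k] == target[k] for k in target) / len(target)


def random_genome(rng):
    return {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent with equal odds."""
    return {key: (a[key] if rng.random() < 0.5 else b[key]) for key in SEARCH_SPACE}


def mutate(genome, rng, rate=0.2):
    """Random-reset mutation: resample a gene with probability `rate`."""
    return {
        key: (rng.choice(SEARCH_SPACE[key]) if rng.random() < rate else value)
        for key, value in genome.items()
    }


def tournament(population, rng, k=3):
    """Pick the fittest of k randomly sampled individuals."""
    return max(rng.sample(population, k), key=fitness)


def evolve(generations=20, pop_size=12, seed=0):
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(population, key=fitness)
        history.append(fitness(elite))
        children = [elite]  # elitism: the best genome always survives
        while len(children) < pop_size:
            parent_a = tournament(population, rng)
            parent_b = tournament(population, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print("best genome:", best)
print("best fitness per generation:", history)
```

Because the best genome is carried over unchanged, the best fitness never decreases from one generation to the next. Replacing the stand-in fitness with a function that trains and validates the LSTM would turn this into an actual, though far more expensive, hyperparameter search.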
The proposed system gives strong performance on the two OpenStack datasets, with a genetic algorithm used in training the model. This suggests that the system could be used for tasks other than log-based anomaly detection. Future work can consider the impact of hyperparameter tuning on the weights between the LSTM network’s intermediate nodes.
Although numerous techniques and frameworks for intrusion detection based on log analysis have been developed in the last decade, a number of well-known research issues and challenges remain. Selecting the best performance metrics for assessing a network intrusion detection system (NIDS) is a widely recognized open problem in anomaly-based detection. In assessing an anomaly-based detection system, the four most significant characteristics to measure are performance, completeness, accuracy, and quality of the data.
In future experimental research, the framework is to be applied in comparative experiments with further optimized Deep Learning techniques, namely BLSTM, DBN, and GAN networks, on benchmark datasets such as CIC-IDS 2017, CIC-IDS 2018, and CIDDS-002. The approach should learn features from the raw log parameter values rather than from derived features, so that the deep neural techniques can automatically learn the essential features of the data and realize the full potential of the neural models.
