Hybrid machine learning approach based intrusion detection in cloud: A metaheuristic assisted model

Abstract

Cloud computing provides a range of cost-effective, on-demand services to users and has therefore become a dominant IT service model. However, its open and distributed architecture makes it highly vulnerable to attackers, and the security and privacy of cloud users have become a major hurdle. The most prevalent approach for detecting attacks on the cloud is the Intrusion Detection System (IDS). Contemporary IDS deployed in traditional Internet or Intranet contexts do not offer scalability or autonomous self-adaptation. Furthermore, they lack determinism, making them inappropriate for cloud-based settings. This necessitates the development of new cloud-based IDS capable of fulfilling an organization’s security standards. Therefore, in this research work, we introduce a new IDS model for the cloud environment. The work comprises four major phases: data pre-processing, optimal clustering, feature selection, and attack detection. Initially, the collected raw data are pre-processed to enhance their quality. Then, the pre-processed data are segmented with a newly introduced K-means clustering model, in which the centroids are optimally selected by a new hybrid optimization model referred to as Spider Monkey Updated with Sealion Optimization (SMSLO), a conceptual hybridization of the standard SeaLion Optimization (SLnO) and Spider Monkey Optimization (SMO). At the end of segmentation, two clusters (attack data and non-attack data) are formed. Because the data in both clusters are high-dimensional, we reduce their dimensionality by applying the Principal Component Analysis (PCA) algorithm. Subsequently, the dimensionality-reduced features pass into the attack detection phase.
The attack detection phase is modeled with an optimized Deep Belief Network (DBN), which identifies the type of attack (e.g., DoS, Botnet, DDoS) that has intruded into the network. Since the DBN makes the final detections, it ought to be less prone to errors. We reduce detection errors, measured by the Mean Square Error (MSE) of the DBN, by fine-tuning its weights using the new hybrid optimization model (SMSLO). Finally, the results acquired from the proposed work (DBN $+$ SMSLO) are validated.

Keywords

Cloud computing; attack type detection; improved K-means clustering; Deep Belief Network; SMSLO model

1. Introduction

Digital devices, including smartphones and tablets, have become increasingly ubiquitous in human life over the past decades [34, 33, 64]. These smart devices are typically wireless, allowing users to access their data (in multimedia format) and applications from anywhere at any time via the internet [59, 57, 58, 61]. The rapid increase in global internet usage requires a new way to manage the size, variety, and availability of data, namely cloud computing [54, 41, 40]. Cloud computing is a rapidly growing technology that enables users to access dependable, on-demand, and scalable resources at any time while incurring lower infrastructure costs. In today’s technological environment, cloud computing [65, 53, 15, 47] is gaining considerable attention among organizations. Furthermore, this technology distributes computing resources so that users can easily access “servers, storage, databases, networking, software, analytics, and intelligence” through the internet, and it maintains the network-attached hardware of all application services through a web application [20, 23]. Companies like Amazon and Google have their own clouds and have moved their operations onto them [28, 36, 39]. Moreover, during the COVID-19 crisis [21, 8], cloud computing enabled coordination, connectivity, and crucial internet services.

Computer networking is a term describing access to networking resources from a centralized third-party provider using Wide Area Networking (WAN) or Internet-based access technologies. Cloud computing refers to the provision of computational resources on demand via a computer network. Intrusion detection in cloud computing is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusions, defined as attempts to compromise confidentiality, integrity, or availability, or to bypass the security mechanisms of a computer or network. Despite the rise in cloud adoption, cyber security continues to be a major bottleneck that decreases network performance and, as a result, affects trustworthiness [28, 43, 44]. Information security, computer security, and network security are all sub-domains of cloud computing security [25, 34]. Users have become concerned about data security because much of the government and private sectors rely on the cloud as the only source of faster data transmission. “According to a recent report by insight, most organizations are moving to the cloud for efficiency, but many still have security concerns” [46, 50, 52, 54, 27]. Some distinctive security issues with cloud computing are misconfiguration, external sharing of information, insecure interfaces, hijacking of accounts, lack of visibility, unauthorized access, malicious insiders, cyberattacks, spoofing, tampering, repudiation, information disclosure, elevation of privilege, embedded malicious code, protocol manipulation, and so on [56]. Furthermore, an attacker may gain unauthorized access to computers and install malware such as a Trojan horse on them. DDoS attacks, data theft, and unauthorized access are among the most prevalent attacks [54, 22, 64]. The majority of traditional security technologies identify threats based on a database of prior malware instances [36, 39, 42].

Today, organizations are looking for a solution that would allow them to send notifications to their employees, proactively alerting them to the presence of a vulnerability on their website, portal, or application [64]. The system is updated at several layers of the application environment, including hardware, network, and applications, at various levels of the stack [20, 23, 27, 28]. Identifying the attack nodes in the system is the best strategy to protect data against theft, leakage, and elimination [45, 14, 17, 19]. Although cloud customers may now deploy “off-the-shelf Intrusion Detection Systems (IDSs)” on tenant networks and virtual instances, these technologies are constrained in terms of coverage and cannot detect specific changes in the cloud hosting environment. The most popular IDS detection methods rely on signature-based threats as well as user behavior [40, 49]. Contemporary IDS deployed in traditional Internet or Intranet contexts do not offer scalability or autonomous self-adaptation. Furthermore, they lack determinism, making them inappropriate for cloud-based settings. This necessitates the development of new cloud-based IDS capable of fulfilling the company’s security standards. As the architecture of cloud computing is distinct from existing computing technologies like Grid computing, the deployment of presently available IDS and Prevention Systems (PS) cannot meet the appropriate degree of security and reliability.

Early on, a virtual machine-based IDS managed by a remote controller was introduced [51]. A virtual machine-based IDS for DDoS was also presented in [2]. However, both methods necessitate a separate instance of the IDS for each virtual machine, and they can detect only well-known attacks. To overcome this demerit, a Snort-based IDS was presented [16], but this approach can also detect only known threats because it relies heavily on Snort. Furthermore, the idea of mobile agents was introduced [6] for IDS to keep the external virtual machine safe. The disadvantage of this strategy is its high overhead. In [32], a mutual agent-based method was presented to detect DDoS, but it incurs considerable computing expenses for transmitting warning information. Recently, fuzzy clustering and Artificial Bee Colony (ABC) with Artificial Neural Network (ANN) were presented in [7] for IDS. This technique reduces the error, but the hybrid combination is too costly. Furthermore, an optimized Neural Network (NN) with Improved Particle Swarm Optimization (IPSO) was deployed in IDS for new feature selection [1]. However, this method still needs a deep learning model to enhance performance. The abbreviations used in this paper are listed in Table 1.

To attain an effective attack detection model, deep learning algorithms combined with optimization concepts are a promising choice. The major contributions of this research work are:

• Introduces a new improved K-means clustering model to segment the pre-processed data into attack and non-attack clusters. In the K-means model, the centroids are optimally tuned via the new Spider Monkey Updated with Sealion Optimization (SMSLO) model.
• Introduces a new weight-optimized Deep Belief Network (DBN) to reduce detection errors. The weights of the DBN are fine-tuned by the new SMSLO model.
• Hybridizes the SLnO and SMO algorithms into the proposed SMSLO model to improve convergence and produce better solutions.

The rest of this paper is organized as follows: Section 2 reviews the literature on attack detection models in the cloud computing environment. Section 3 gives an overview of the proposed IDS model in the cloud. Section 4 describes the pre-processing and improved K-means-based segmentation phases, together with the dimensionality reduction and attack detection models. The results acquired with the proposed model are discussed in Section 6. Finally, this paper is concluded in Section 7.

Table 1
Nomenclature

Abbreviation Description

AaaS Application as a Service

ABC Artificial Bee Colony

AE Auto Encoder

ANN Artificial Neural Network

CNN Convolutional Neural Network

CRESOM Convolution Recursively Enhanced Self-Organizing Map

CS Classifier System

CD Contrastive Divergence

DBN Deep Belief Network

DDoS Distributed Denial of Service

DL Deep Learning

DNN Deep Neural Network

DRNN Deep Recurrent Neural Network

DoS Denial of Service

DT Decision Trees

FDR False Discovery Rate

FNR False Negative Rate

FPR False Positive Rate

FRC Fraudulent Resource Consumption

FSOMDM Fuzzy Self-Organizing Maps-based DDOS Mitigation

GRU Gated Recurrent Unit

ICRPU Intensive Care Request Processing Unit

IDS Intrusion Detection System

IPSO Improved Particle Swarm Optimization

LEDEM Learning-Driven Detection Mitigation System

LP Learning Percentage

LSTM Long Short-Term Memory

MCC Matthews Correlation Coefficient

MLP Multi Layer Perceptron

MSE Mean Square Error

NIDS Network Intrusion Detection System

NN Neural Network

NPV Negative Predictive Value

PCA Principal Component Analysis

PS Prevention System

RBM Restricted Boltzmann Machine

SCAE Stacked Contractive Auto Encoder

SD Standard Deviation

SDN Software-Defined Network

SDNMS Software Defined Networking-based Mitigation Scheme

SLnO SeaLion Optimization

SMO Spider Monkey Optimization

SMSLO Spider Monkey Updated Sealion Optimization

SVM Support Vector Machine

TCP Transmission Control Protocol

WAN Wide Area Networking

2. Literature review

In 2020, Bhardwaj et al. [3] suggested a new method for detecting DDoS cyberattacks in the cloud. The suggested architecture combined a stacked “sparse AE” for feature extraction with a DNN for classification. Furthermore, the parameters of the AE and DNN were fine-tuned with the aid of an appropriate tuning model. As a result, the suggested model dealt with the problem of overfitting and resulted in a low reconstruction error.

In 2017, Sahi et al. [5] built a classifier system for identifying and combating DDoS Transmission Control Protocol (TCP) flood attacks in public clouds. The suggested Classifier System (CS_DDoS) categorized the arriving packets in order to secure the stored records, and it decided on the existence or absence of an attack based on the classified outcomes. The data packets were evaluated for attacks in the detection phase, and the packets found to be malicious were denied access to cloud services in the prevention phase. The CS_DDoS model was substantially accurate but time-consuming.

In 2020, Wang et al. [63] have developed a novel method for identifying intrusions into the cloud model. The resilient low-dimensional features were extracted automatically using an effective SCAE approach. An SVM classification approach was used to classify the collected characteristics. The SCAE $+$ SVM strategy was evaluated on the “KDD Cup 99 and NSL-KDD” dataset, and the results showed that it outperformed extant models in terms of detection performance.

In 2020, Ravi and Shalinie [38] have suggested a unique security strategy for dealing with DDoS attacks produced by rogue IoT servers. The authors reduced the DDoS assault on IoT servers by incorporating the SDN architecture into the cloud. The DDoS was also identified and mitigated using a LEDEM. The suggested model’s simulation results showed a better accuracy rate in detecting DDoS attacks.

In 2019, Pillutla and Arjunan [24] developed an FSOMDM methodology, in which the neurons in a standard NN were updated using fuzzy rules rather than the usual Kohonen neural network model. The authors used the software-oriented traffic inspection property in their suggested technique to identify and enable DDoS attacks. The effectiveness investigation of the suggested work revealed greater categorization precision.

In 2018, Bharot et al. [37] have constructed a DDoS attack detection and mitigation model with the assistance of the ICRPU and a feature selection mechanism. After evaluating the traffic using the “Hellinger distance function”, the packets were categorized as genuine request groups or DDoS based on the derived properties. The legitimate requests were then transferred to the Normal Request Processing Unit, while the DDoS demands were sent to the ICRPU. As a result, the suggested work was found to have the highest detection rate and the lowest false alarm rate.

In 2019, Bhushan and Gupta [29] have developed a novel methodology for detecting and mitigating the FRC attack in cloud-based services based on network traffic analysis. The authors used a real-world benchmark to assess the suggested methodology, and as a result, the proposed work had a reduced overhead and greater accuracy.

In 2020, Harikrishna and Amuthan [45] designed the “Convolution Recursively Enhanced Self-Organizing Map and Software-Defined Networking-based Mitigation Scheme (CRESOM-SDNMS)” to prevent DDoS attacks in the cloud computing environment. To overcome the vector quantization challenges, the authors used a better initialization technique and improved topology preservation in the SOM-based classification process. According to simulated trials, the proposed technique reduced the FPR during DDoS mitigation.

In 2020, Kim et al. [26] introduced a CNN-based IDS for DoS attacks. Using two datasets, they created a Deep Learning (DL)-based detection model for Denial of Service (DoS) attacks. Their model is built on a Convolutional Neural Network (CNN), and they used the CNN-based approach to conduct binary and multiclass categorization. This approach attained 99% or more in binary and multiclass categorization for KDD and an average accuracy of 91.5% on the CSE-CIS-IDS 2018 dataset.

In 2020, Boukhalfa et al. [4] established a novel concept for a Network Intrusion Detection System (NIDS) relying on Long Short-Term Memory (LSTM) to recognize threats and build a long-term memory of them, in order to prevent current attacks that are similar to existing ones while also having a single means to block intrusions. Based on the findings of their detection trials, the accuracy reached 99.98% and 99.93% for two-class and multiclass categorization, respectively.

In 2020, Tang et al. [55] introduced an RNN with Gated Recurrent Unit (GRU) based IDS for SDN environments, known as the Deep Recurrent Neural Network (DRNN). In this method, the input was treated as a time series by the RNN. On the NSL-KDD and CICIDS2017 datasets, the method achieved 89% multiclass detection accuracy. However, the SDN architecture is based on flows, whereas NSL-KDD is not a flow-based dataset.

In 2019, Fontaine et al. [18] have used machine learning methods to achieve a simplified cloud security system. This contributes to a more specific application that employs Decision Trees (DT) and NN as classifiers, which are trained using data gathered by cloud apps. Such methods are used to evaluate web application logs from a variety of servers. These logs are combined into a single format to make feature extraction easier. The NN has the greatest accuracy of 98.47%, making it feasible to detect attacks on various web services in cloud settings quickly and accurately.

In 2019, Aboueata et al. [35] investigated the applicability of two well-established machine learning techniques, ANN and SVM, in detecting intrusions or unusual behavior in the cloud. They also used feature engineering to determine the best set of features for achieving the highest accuracy with the least amount of training time and complexity. Their goal was to shorten training time by picking the best set of features while maintaining accuracy. Finally, they compared their findings with previous works. Table 2 lists the characteristics and problems of existing cloud-based threat detection approaches.

Table 2
Reviews on conventional Intrusion Detection System in cloud environment

Types	Adopted methodology	Features	Challenges
Deep learning-based detection	Optimized AE $+$ DNN [3]	$\surd$ Decreased detection time $\surd$ Enhanced detection precision $\surd$ Less computational complexity	$\times$ F1-score and accuracy are both lower $\times$ Data that was noisy, voluminous, and had a high dimensionality was not handled properly
	CNN [26]	$\surd$ Provides a good accuracy rate for both datasets	$\times$ Only detect DoS attacks
	LSTM [4]	$\surd$ Enhanced accuracy rate at multiple classes	$\times$ The system contains less memory bandwidth
	GRU-RNN [55]	$\surd$ Attains good precision, recall, and F1 score	$\times$ Need consideration on optimization methods to enhance the accuracy and minimize the overhead
Machine learning-based detection	SCAE $+$ SVM [63]	$\surd$ Detection performance is improved $\surd$ Makes the controller’s bottleneck smaller	$\times$ Lower reliability
	DT $+$ NN [18]	$\surd$ High accuracy in the feature selection test	$\times$ DT frequently led to data overfitting
	ANN $+$ SVM [35]	$\surd$ Reduced training time and complexity	$\times$ Need consideration on cross-validation to detect the exact attack type
Flood attack detection	CS_DDoS system [5]	$\surd$ Lower bandwidth $\surd$ Suitable for several attackers	$\times$ Tedious $\times$ Need to improve the security of records
	CRESOM-SDNMS [45]	$\surd$ Reliable detection model $\surd$ Precision, recall, and F-measure are higher	$\times$ Testing and training accuracy is lower $\times$ Lower classification accuracy
	LEDEM [38]	$\surd$ Higher detection accuracy $\surd$ Lower computational cost $\surd$ Lower bandwidth usage	$\times$ Show lower performance with an increase in data rate
	FSOMDM [24]	$\surd$ Enhanced detection precision $\surd$ Reduced bandwidth consumption $\surd$ Flexible in controlling malicious data traffic flow	$\times$ Higher computational complexity $\times$ Need to decrease the false positive (%)
	ICRPU [37]	$\surd$ Provides best detection rate, accuracy	$\times$ Higher FAR $\times$ Attack detection rate is lower
FRC attack detection	T-distribution based flow-confidence technique [29]	$\surd$ Minimal computational overhead $\surd$ Higher recall and precision	$\times$ Lower attack flow detection $\times$ Lower confidence level

Figure 1. The architecture of the proposed work.

3. Proposed intrusion detection model in cloud computing: An overview

3.1 Architectural description

Cloud computing is becoming the favored choice of IT firms since it provides consumers with flexible, pay-per-use services. However, because its open and distributed architecture is vulnerable to attackers, privacy and security are big obstacles to its success. The most prevalent approach for detecting attacks on the cloud is the Intrusion Detection System (IDS). In this research work, a novel IDS is developed in four key phases: data pre-processing, optimal clustering, feature selection, and attack detection. Figure 1 shows the architecture of the proposed work. In our research, we consider the public cloud with the Application as a Service (AaaS) model. Initially, the collected raw data are pre-processed to make them easier for the machine learning models to use. Then, the pre-processed data are clustered with the newly proposed improved K-means technique. At the end of clustering, two clusters are formed: (a) normal and (b) attacks. Since the data in both clusters are high-dimensional, they suffer from the curse of dimensionality. Therefore, we apply the PCA algorithm to reduce the dimensionality of the data in the clusters. Then, the attack detector (a deep learning network) is trained with the dimensionality-reduced data $D^{\dim}$ . In this research work, we introduce a new optimized Deep Belief Network as the attack detector. We optimize the weights of the DBN using a new optimization model. The newly introduced SMSLO model is formulated by hybridizing the concepts of the standard SeaLion Optimization (SLnO) and Spider Monkey Optimization (SMO). The outcome of the SMSLO-based weight-optimized DBN model indicates the type of attack taking place in the cloud computing environment.

4. Description of proposed model

4.1 Pre-processing

Pre-processing is the fundamental step for transforming the raw input data $D^{\textit{inp}}$ into an efficient and useful format. The collected data $D^{\textit{inp}}$ contain many missing and irrelevant values, so we pre-process them by removing the null values and then normalizing the data to the range $-$ 1 to 1. The pre-processed data are represented as $D^{\textit{pre}}$ . $D^{\textit{pre}}$ is then segmented using a new “Improved K-means clustering”.
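The pre-processing step can be illustrated with a minimal Python sketch. The text does not state which normalization is used, so per-column min-max scaling to the range $-$1 to 1 is assumed here; the function name `preprocess` and the toy records are hypothetical.

```python
from typing import List, Optional

Row = List[Optional[float]]


def preprocess(rows: List[Row]) -> List[List[float]]:
    """Drop records containing nulls, then scale each column to [-1, 1].

    Min-max scaling is an assumption; the paper only states the target range.
    """
    # Remove every record that has at least one missing (None) value.
    clean = [list(r) for r in rows if all(v is not None for v in r)]
    if not clean:
        return []
    n_cols = len(clean[0])
    lo = [min(r[j] for r in clean) for j in range(n_cols)]
    hi = [max(r[j] for r in clean) for j in range(n_cols)]
    scaled = []
    for r in clean:
        out = []
        for j, v in enumerate(r):
            span = hi[j] - lo[j]
            # A constant column carries no information; map it to 0.
            out.append(0.0 if span == 0 else 2.0 * (v - lo[j]) / span - 1.0)
        scaled.append(out)
    return scaled


# Hypothetical raw records D_inp; the second record has a missing value.
d_inp = [[0.0, 10.0], [5.0, None], [10.0, 30.0], [5.0, 20.0]]
d_pre = preprocess(d_inp)
```

In this toy run the incomplete record is dropped and the remaining three records are mapped so that each column spans exactly $-$1 to 1.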

4.2 Improved k-means clustering

K-means clustering is a renowned unsupervised learning method that fits the data points of $D^{\textit{pre}}$ into clusters. K-means is faster than hierarchical clustering and can produce tighter clusters, especially if the clusters are globular. On the other hand, K-means usually performs poorly on very high-dimensional data. So, we have introduced a new improved K-means model, wherein the key components of K-means (i.e., the cluster centroids) are selected optimally via a new hybrid optimization model referred to as SMSLO. The method finds $K$ cluster centers (centroids), evaluates the Euclidean distance between each data item and each cluster center, and then assigns each item to the closest cluster. In this research, two clusters ($K=2$) are formed: (a) attacks and (b) non-attacks. When all of the data items have been divided into these two clusters, the first stage is completed, and the mean of each cluster is recalculated. This procedure is repeated until the termination condition is met. Let the mean of cluster $C_{i}$ be denoted by $p_{i}$ ; the criterion function is then given by Eq. (1).

$\displaystyle E=\sum\limits_{i=1}^{K}\sum\limits_{p\in C_{i}}\left|p-p_{i}\right|^{2}$ (1)

Here, $E$ is the total squared error of all the objects in the database. The Euclidean distance is the usual feature distance; it is used to find the nearest cluster center for each data object. The Euclidean distance $d(p_{i},q_{i})$ between one vector $p=(p_{1},p_{2},\ldots,p_{n})$ and another vector $q=(q_{1},q_{2},\ldots,q_{n})$ is given in Eq. (2).

$\displaystyle d(p_{i},q_{i})=\left[\sum\limits_{i=1}^{n}(p_{i}-q_{i})^{2}\right]^{\frac{1}{2}}$ (2)

Input: count of expected clusters, $K$ and a database including data objects $D^{\textit{seg}}=\{d_{1},d_{2},\ldots d_{n}\}$ .

Output: sequence of $K=$ 2 clusters.

Step 1:

$K$ data objects are selected as the initial cluster centroids. Our contribution lies in this stage: instead of choosing the cluster centroids randomly, we select them optimally using a new hybrid optimization model. Therefore, the data can be grouped more effectively and precisely into attack and non-attack clusters.

Step 2:

Repeat Steps 3 and 4.

Step 3:

Compute the distance between every data object $d_{i}(1\leq i\leq n)$ and all $K$ cluster centers $c_{q}(1\leq q\leq K)$ , and then assign the data object to the closest cluster.

Step 4:

Recompute the cluster center for all clusters $q(1\leq q\leq K)$ .

Step 5:

Continue until no alteration is made in the cluster centers. The segmented attack and non-attack data clusters are together represented as $D^{\textit{seg}}$ . Since the clustered data are high-dimensional, they suffer from the “curse of dimensionality” problem, so we employ the PCA model to alleviate this issue.

Algorithm 1 shows the pseudo-code of the improved K-means clustering method.

Algorithm 1: Improved K-means clustering method
Input: Database including $n$ data objects $D^{\textit{seg}}=\{d_{1},d_{2},\ldots d_{n}\}$ , count of expected clusters $K$
Output: sequence of $K=$ 2 clusters
Initialize cluster centroids with $K$ data objects
Optimally select the cluster centroids by SMSLO model
Repeat
Evaluate the distance between every data object $d_{i}$ and entire $K$ cluster centers
Assign data object $d_{i}$ to the nearest cluster centroid
Re-evaluate the cluster center for all clusters
Until no change in the cluster centroids
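Algorithm 1 can be sketched in Python as follows. The SMSLO optimizer is not specified in this section, so the sketch substitutes a simple farthest-point heuristic, `placeholder_centroids`, which is a hypothetical stand-in and not the authors' method; the function names and the toy data are likewise illustrative assumptions. The assignment and update steps follow Eqs (1) and (2).

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(p: Point, q: Point) -> float:
    """Eq. (2): Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def placeholder_centroids(data: List[Point], k: int) -> List[List[float]]:
    """Hypothetical stand-in for SMSLO-based centroid selection.

    Starts from the first object and greedily adds the object that is
    farthest from the centroids chosen so far.
    """
    centroids = [list(data[0])]
    while len(centroids) < k:
        far = max(data, key=lambda d: min(euclidean(d, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def improved_kmeans(
    data: List[Point], k: int = 2, max_iter: int = 100
) -> Tuple[List[List[float]], List[int]]:
    centroids = placeholder_centroids(data, k)
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Assign every object to its nearest centroid.
        labels = [
            min(range(k), key=lambda q: euclidean(d, centroids[q])) for d in data
        ]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for q in range(k):
            members = [d for d, lab in zip(data, labels) if lab == q]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(centroids[q])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # no change: converged
            break
        centroids = new_centroids
    return centroids, labels


def total_squared_error(
    data: List[Point], centroids: List[List[float]], labels: List[int]
) -> float:
    """Eq. (1): sum of squared distances of objects to their cluster mean."""
    return sum(euclidean(d, centroids[lab]) ** 2 for d, lab in zip(data, labels))


# Toy pre-processed data with two well-separated groups.
d_pre = [[-1.0, -0.9], [-0.8, -1.0], [-0.9, -0.8],
         [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
centroids, labels = improved_kmeans(d_pre, k=2)
error = total_squared_error(d_pre, centroids, labels)
```

Keeping the centroid selection in its own function means an optimizer such as SMSLO could be swapped in without touching the assignment and update loop.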

4.3 Dimensionality reduction with principal component analysis

The $D^{\textit{seg}}$ obtained from the segmentation phase is given to PCA for dimensionality reduction. PCA is a common method that uses an orthogonal transformation to convert a set of possibly correlated variables into a smaller number of uncorrelated variables called principal components. The goal of PCA is to reduce the large dimensionality of the data space (observed variables) to the much smaller dimensionality of the feature space. The statistical quantities considered in PCA are the mean, Standard Deviation (SD), covariance, and the Eigen values and Eigen vectors of a matrix.

(a) Mean: It is defined as “the average of the values of the variables throughout the whole distribution.” It is also known as the central tendency. Equation (3) gives the mean of the $A$ observations $D_{1}^{\textit{seg}},D_{2}^{\textit{seg}},\cdots,D_{A}^{\textit{seg}}$ of a random variable, where $A$ denotes the number of observations.

$\displaystyle\overline{D^{\textit{seg}}}=\frac{1}{A}\sum\limits_{g=1}^{A}{D_{g}^{\textit{seg}}}$ (3)

(b) Standard Deviation: It is used to determine the degree of scattering of the data. It is calculated as the square root of the average squared distance between each data point and the mean. Equation (4) shows the mathematical expression for SD, with the mean represented as $\overline{D^{\textit{seg}}}$ .

$\displaystyle SD=\sqrt{\frac{1}{A}\sum\limits_{g=1}^{A}{({D_{g}^{\textit{seg}}-\overline{D^{\textit{seg}}}})^{2}}}$ (4)

(c) Covariance: It is determined between two dimensions and measures how the two dimensions vary from their means with respect to each other. Covariance is represented mathematically by Eq. (5).

$\displaystyle\textit{Cov}(D^{\textit{seg}},G)=\frac{\sum\limits_{g=1}^{A}{({D_{g}^{\textit{seg}}-\overline{D^{\textit{seg}}}})}({G_{g}-\overline{G}})}{A}$ (5)

(d) Eigen values & Eigen vectors of a matrix: A matrix is a rectangular array containing numbers, symbols, or expressions, and each individual item in the array is referred to as an element. Let $B$ be an ${n\times n}$ matrix; Eq. (6) is the equation defining an Eigen value of $B$ , where the scalar $\lambda$ is the Eigen value and $[{D^{\textit{seg}}}]$ is the corresponding Eigen vector. Moreover, for a real symmetric matrix such as a covariance matrix, the Eigen values are real and the Eigen vectors corresponding to distinct Eigen values are orthogonal.

$\displaystyle[B][{D^{\textit{seg}}}]=\lambda[{D^{\textit{seg}}}]$ (6)

Finally, the dimensionality reduced data acquired from PCA is denoted as $D^{\dim}$ , which is used to train the attack detector (DBN) in the detection phase.
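The quantities in Eqs (3)–(6) can be tied together with a small pure-Python sketch. It computes the covariance matrix (dividing by $A$, as in Eq. (5)), finds only the leading Eigen pair by power iteration, and projects the data onto that single component. Power iteration is a simplification chosen for this illustration, not necessarily what the authors used; a full PCA would normally rely on a linear algebra library. All function names and the toy data are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def column_means(x: Matrix) -> List[float]:
    """Eq. (3) applied to every column."""
    a = len(x)
    return [sum(col) / a for col in zip(*x)]


def covariance_matrix(x: Matrix) -> Matrix:
    """Eq. (5) applied to every pair of columns (divides by A)."""
    a = len(x)
    mu = column_means(x)
    centered = [[v - m for v, m in zip(row, mu)] for row in x]
    n = len(mu)
    return [
        [sum(r[i] * r[j] for r in centered) / a for j in range(n)]
        for i in range(n)
    ]


def leading_eigenpair(b: Matrix, iters: int = 500) -> Tuple[float, List[float]]:
    """Power iteration: returns (lambda, v) with B v close to lambda v, Eq. (6).

    The non-uniform start vector reduces, but does not eliminate, the chance
    of starting orthogonal to the leading Eigen vector.
    """
    n = len(b)
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    # Rayleigh quotient of the unit vector v gives the Eigen value estimate.
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def project(x: Matrix, v: List[float]) -> List[float]:
    """Project mean-centered rows onto the component v."""
    mu = column_means(x)
    return [sum((val - m) * c for val, m, c in zip(row, mu, v)) for row in x]


# Toy segmented data with two perfectly correlated features.
d_seg = [[2.0, 1.0], [4.0, 3.0], [6.0, 5.0], [8.0, 7.0]]
cov = covariance_matrix(d_seg)
lam, v = leading_eigenpair(cov)
d_dim = project(d_seg, v)
```

For this toy input every covariance entry is 5, so the leading Eigen value is 10 and the one-dimensional projection retains all of the variance.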

4.4 Attack detection with deep belief network

The DBN [65] framework, which was first developed in 1986, is a well-known intelligent method. Figure 2 shows the attack detection model with DBN. It typically has several layers consisting of visible and hidden neurons. Visible and hidden neurons are connected to each other; however, no connections exist among hidden neurons or among visible neurons. The connection between visible and hidden neurons is exclusive and symmetric. The Boltzmann network’s neurons provide probabilistic output. The output $\overline{PO}$ in Eq. (8) is based on the probability function $\overline{P_{q}}(\zeta)$ in Eq. (7), where the pseudo-temperature is denoted by the symbol $t^{p}$ . Equation (9) gives the deterministic limit of this stochastic model as $t^{p}$ approaches zero. The feature extraction process in the DBN model is carried out by a collection of RBM layers, while the classification process is carried out by an MLP.

$\displaystyle\overline{P_{q}}(\zeta)=\frac{1}{1+e^{\frac{-\zeta}{t^{p}}}}$ (7)

$\displaystyle\overline{PO}=\left\{\begin{array}{ll}1&\text{with }1-\overline{P_{q}}(\zeta)\\ 0&\text{with }\overline{P_{q}}(\zeta)\end{array}\right.$ (8)

$\displaystyle\lim\limits_{t^{p}\to 0^{+}}\overline{P_{q}}(\zeta)=\lim\limits_{t^{p}\to 0^{+}}\frac{1}{1+e^{\frac{-\zeta}{t^{p}}}}=\left\{\begin{array}{ll}0&\text{for }\zeta<0\\ \frac{1}{2}&\text{for }\zeta=0\\ 1&\text{for }\zeta>0\end{array}\right.$ (9)
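The probability function of Eq. (7) and its limiting behavior in Eq. (9) can be checked numerically with a short sketch. The function name `firing_probability` and the default $t^{p}=1$ are assumptions made for illustration; the two-branch form is only a numerically stable way of evaluating the same sigmoid.

```python
import math


def firing_probability(zeta: float, t_p: float = 1.0) -> float:
    """Eq. (7): 1 / (1 + exp(-zeta / t_p)), evaluated in a stable form."""
    z = zeta / t_p
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite as exp(z) / (1 + exp(z)) to avoid overflow.
    e = math.exp(z)
    return e / (1.0 + e)


# As the pseudo-temperature shrinks, the sigmoid approaches a step function.
for t_p in (1.0, 0.1, 0.001):
    print(t_p, firing_probability(-1.0, t_p), firing_probability(1.0, t_p))
```

At $\zeta=0$ the probability is exactly one half for any positive pseudo-temperature, which matches the middle case of Eq. (9).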

The “Boltzmann machine energy function” for the binary state $bi$ of the neurons is given in Eqs (10) and (11), where $L_{a,l}$ indicates the weight between neurons $a$ and $l$, and $\theta_{a}$ indicates the bias of neuron $a$.

$\displaystyle EN(bi)=-\sum\limits_{a<l}{bi_{a}}L_{a,l}\,bi_{l}-\sum\limits_{a}{\theta_{a}}bi_{a}$ (10)

$\displaystyle\Delta EN(bi_{a})=\sum\limits_{l}{bi_{l}}L_{a,l}+\theta_{a}$ (11)

Equations (12)–(14) give the energy of a joint configuration of visible and hidden neurons $({x,y})$ and the corresponding energy differences. The binary state of visible unit $a$ is denoted by $x_{a}$, the binary state of hidden unit $l$ is denoted by $y_{l}$, and $k_{a}$ and $C_{l}$ are the biases of the visible and hidden units, respectively.

$\displaystyle EN({x,y})=-\sum\limits_{({a,l})}{L_{a,l}}x_{a}y_{l}-\sum\limits_{a}{k_{a}}x_{a}-\sum\limits_{l}{C_{l}}y_{l}$ (12)

$\displaystyle\Delta EN({x_{a},\vec{y}})=\sum\limits_{l}{L_{a,l}}y_{l}+k_{a}$ (13)

$\displaystyle\Delta EN({\vec{x},y_{l}})=\sum\limits_{a}{L_{a,l}}x_{a}+C_{l}$ (14)

During learning, the RBM encodes the probability distribution of the incoming data in its weight parameters. The weights obtained through RBM training are those that maximize the probability assigned to the training data, as given by Eq. (15).

$\displaystyle\hat{L}_{(\hat{z})}=\mathop{\max}\limits_{\hat{L}}\mathop{\prod}% \limits_{\vec{x}\in N}c(\vec{x})$ (15)

The probability that the RBM assigns to a pair of visible and hidden vectors $(\vec{x},\vec{y})$ is stated in Eq. (16), where $PR^{F}$ represents the partition function provided in Eq. (17).

$\displaystyle c(\vec{x},\vec{y})=\frac{1}{PR^{F}}e^{-EN(\vec{x},\vec{y})}$ (16)

$\displaystyle PR^{F}=\sum\limits_{\vec{x},\vec{y}}{e^{-EN(\vec{x},\vec{y})}}$ (17)
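For a very small RBM, Eqs (12), (16) and (17) can be evaluated by brute-force enumeration, which makes the role of the partition function concrete. The sketch below reads the last term of Eq. (12) as a sum of $C_{l}y_{l}$ over the hidden units; the parameter values are hypothetical, and enumeration is only feasible for a handful of neurons.

```python
import itertools
import math


def energy(x, y, L, k, C):
    """Eq. (12): energy of a joint visible/hidden configuration."""
    pair = sum(L[a][l] * x[a] * y[l]
               for a in range(len(x)) for l in range(len(y)))
    visible = sum(k[a] * x[a] for a in range(len(x)))
    hidden = sum(C[l] * y[l] for l in range(len(y)))
    return -pair - visible - hidden


def partition(L, k, C):
    """Eq. (17): sum of exp(-EN) over every binary (x, y) configuration."""
    return sum(math.exp(-energy(x, y, L, k, C))
               for x in itertools.product([0, 1], repeat=len(k))
               for y in itertools.product([0, 1], repeat=len(C)))


def joint_probability(x, y, L, k, C):
    """Eq. (16): probability of the pair (x, y)."""
    return math.exp(-energy(x, y, L, k, C)) / partition(L, k, C)


# Hypothetical parameters for an RBM with 2 visible and 2 hidden units.
L = [[0.5, -0.2], [0.1, 0.3]]
k = [0.0, 0.1]
C = [-0.1, 0.2]
total = sum(joint_probability(x, y, L, k, C)
            for x in itertools.product([0, 1], repeat=2)
            for y in itertools.product([0, 1], repeat=2))
```

Because $PR^{F}$ normalizes over all configurations, `total` sums to one up to rounding; the exponential growth of this enumeration is the reason training relies on approximations such as contrastive divergence.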

The attack detection performance depends on the accuracy of the DBN. The error function “MSE” is computed in the DBN to measure the error incurred during the training process. Mathematically, the MSE is given in Eq. (18).

$\displaystyle\textit{MSE}=\frac{1}{n}\sum\limits_{i=1}^{n}{(\textit{Act}_{i}-\textit{Pre}_{i})^{2}}$ (18)

Here, $\textit{Act}_{i}$ is the actual output (i.e. the actual attack) for the $i^{\text{th}}$ sample, $\textit{Pre}_{i}$ is the corresponding predicted output (i.e. the outcome predicted by the proposed work), and $n$ is the number of samples. This error should be as low as possible for the detection model to be accurate in detecting attacks during the testing phase. We attempt to lessen this error (MSE) by employing the new SMSLO hybrid optimization model. The objective function, or fitness function, of this work is shown in Eq. (19).

$\displaystyle\textit{Obj}=\min(\textit{MSE})$ (19)
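The objective of Eq. (19) can be illustrated with a short sketch that scores candidate solutions by their error and keeps the one with the smallest value. It takes MSE in its conventional sense, the mean of the squared differences between actual and predicted outputs; the candidate names and label vectors are hypothetical.

```python
def mse(actual, predicted):
    """Mean of squared differences between actual and predicted outputs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical labels (1 = attack, 0 = normal) and the predictions that two
# candidate weight settings would produce.
actual = [1, 0, 1, 1]
candidates = {"w1": [1, 1, 1, 0], "w2": [1, 0, 1, 0]}

# Eq. (19): the optimizer prefers the candidate with the minimum MSE.
best = min(candidates, key=lambda name: mse(actual, candidates[name]))
```

In the proposed model the candidates would be DBN weight vectors generated by the optimizer rather than fixed prediction lists.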

The DBN model makes use of the Contrastive Divergence (CD) learning paradigm, whose phases are shown below.

Step 1:

Choose a training sample $x$ and assign it to the visible neurons.

Step 2:

Determine the activation probabilities of the hidden neurons $c_{y}$ from the product of the visible vector $x$ and the weight matrix $\hat{L}$ as $c_{y}=\sigma(x\cdot\hat{L})$ using Eq. (20), where $\sigma$ is the activation function.

$\displaystyle c(y_{l}\to 1|\vec{x})=\sigma\left({C_{l}+\sum\limits_{a}{x_{a}L_{a,l}}}\right)$ (20)

Step 3:

Sample the hidden states $y$ from the probabilities $c_{y}$.

Step 4:

Determine the outer product of the vectors $x$ and $c_{y}$, which is the positive gradient $\phi^{+}=x\cdot c_{y}^{T}$, where $T$ denotes the transpose.

Step 5:

Use Eq. (21) to reconstruct the visible states $x^{\prime}$ from the hidden states $y$. Then resample the hidden states $y^{\prime}$ from the reconstruction $x^{\prime}$.

$\displaystyle c(x_{a}\to 1|\vec{y})=\sigma\left({k_{a}+\sum\limits_{l}{y_{l}L_{a,l}}}\right)$ (21)

Step 6:

Calculate the negative gradient as the outer product of $x^{\prime}$ and $y^{\prime}$, $\phi^{-}=x^{\prime}\cdot y^{\prime T}$.

Step 7:

Compute the weight update as given by Eq. (22), in which $\eta$ denotes the learning rate.

$\displaystyle\Delta\hat{L}=\eta(\phi^{+}-\phi^{-})$ (22)

Step 8:

Apply the update to obtain the new weights, as defined by Eq. (23).

$\displaystyle L^{\prime}_{a,l}=\Delta L_{a,l}+L_{a,l}$ (23)

The output of the DBN identifies the type of attack (DoS, Botnet, or DDoS) that has intruded into the cloud computing environment.
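The eight CD steps can be condensed into a single update routine for one training sample. The following is a minimal sketch for a binary RBM with a logistic activation, written with plain lists; it uses the hidden probabilities (rather than sampled states) in the negative phase, which is a common variant and an assumption here, and it leaves the biases unchanged because only the weight update of Eqs (22) and (23) is described above.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cd1_step(x, L, k, C, eta, rng):
    """One contrastive-divergence (CD-1) weight update; returns a new matrix."""
    n_vis, n_hid = len(k), len(C)
    # Steps 2-3: hidden probabilities (Eq. 20) and sampled hidden states.
    c_y = [sigmoid(C[l] + sum(x[a] * L[a][l] for a in range(n_vis)))
           for l in range(n_hid)]
    y = [1 if rng.random() < p else 0 for p in c_y]
    # Step 5: reconstruct the visible states (Eq. 21) and re-evaluate hidden units.
    c_x = [sigmoid(k[a] + sum(y[l] * L[a][l] for l in range(n_hid)))
           for a in range(n_vis)]
    x_rec = [1 if rng.random() < p else 0 for p in c_x]
    y_rec = [sigmoid(C[l] + sum(x_rec[a] * L[a][l] for a in range(n_vis)))
             for l in range(n_hid)]
    # Steps 4 and 6-8: positive and negative outer products, Eqs (22) and (23).
    return [[L[a][l] + eta * (x[a] * c_y[l] - x_rec[a] * y_rec[l])
             for l in range(n_hid)] for a in range(n_vis)]


# Hypothetical toy RBM: 3 visible units, 2 hidden units, zero initial weights.
rng = random.Random(0)
L0 = [[0.0, 0.0] for _ in range(3)]
new_L = cd1_step([1, 0, 1], L0, [0.0] * 3, [0.0] * 2, 0.1, rng)
```

In the proposed model the weights are additionally fine-tuned by SMSLO, so a routine of this kind would only provide the gradient-based part of training.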

Figure 2.

Attack detector (DBN).

4.4.1 Proposed spider monkey updated sealion optimization model

The inputs to the proposed SMSLO model are the weights of the DBN and the centroids of the K-means model. The solution encoding is shown in Fig. 3.

Figure 3.

Solution encoding.

The SLnO method is a well-known optimization technique inspired by the hunting behavior of sea lions, which are noted for their quick mobility, excellent vision, and outstanding hunting ability. It is capable of finding the best solutions with faster convergence. The SMO is a metaheuristic technique informed by the foraging behavior of spider monkeys, which is organized around a fission-fusion social structure. The SMO searches for global solutions while resisting premature convergence and entrapment in local optima. Hybridizing optimization models has been reported to improve the convergence of the solution further [13, 10, 41, 11, 9, 30, 48]. Therefore, we have hybridized these two optimization models (SMO and SLnO) and named the resulting technique SMSLO. Updating the solutions with the formulated SMSLO model is intended to yield global solutions that accurately portray the type of attack taking place in the cloud environment. The steps followed in the SMSLO are described below:

Step 1:

The search agent’s population pop (sea lion and spider monkey) is initialized. In addition, the global leader, perturbation rate prate, and global leader limit are initialized.

Step 2:

The population is evaluated. The $b^{\text{th}}$ spider monkey in $z^{\text{th}}$ dimension is initialized as per Eq. (24).

$\displaystyle SP_{bz}=SP_{\min z}+UD(0,1)\ast(SP_{\max z}-SP_{\min z})$ (24)

Here, $SP_{\min z}$ and $SP_{\max z}$ are the lower and upper bounds of the search space in the $z^{\text{th}}$ dimension, and $UD(0,1)$ is a uniformly distributed random number within the range (0, 1).
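The initialization of Eq. (24) can be sketched as follows, reading the random factor as scaling the width of the search range in each dimension (the usual SMO form). The function name, population size and bounds are hypothetical.

```python
import random


def init_population(pop_size, lower, upper, rng):
    """Eq. (24): place each agent uniformly at random inside the bounds."""
    dim = len(lower)
    return [[lower[z] + rng.random() * (upper[z] - lower[z]) for z in range(dim)]
            for _ in range(pop_size)]


# Hypothetical 3-dimensional search space with different bounds per dimension.
rng = random.Random(42)
lower, upper = [-1.0, 0.0, 10.0], [1.0, 5.0, 20.0]
population = init_population(8, lower, upper, rng)
```

Each row of `population` is one candidate solution, for example a flattened vector of DBN weights or K-means centroids under the solution encoding.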

Step 3:

The local, as well as the global limits, are determined.

Step 4:

Using the local leader phase, the position of the search agent is updated.

The Local Leader Phase (LLP) is the most crucial phase of SMO, in which the spider monkeys update their positions. Here, the fitness of the search agent is determined using Eq. (19). If the new position of the search agent is better than its previous one, the search agent moves to the new position. In this phase, we introduce a new update expression in place of the existing one. The newly formulated position update is shown in Eq. (25).

$\displaystyle SP(\textit{new})_{bz}=SP_{Hz}+\alpha\ast(SP_{H.y}-LW_{H,z})$ (25)

Here, $SP(\textit{new})_{bz}$ is the new position of the search agent, $LW_{H,z}$ is the worst position, $\alpha$ is a constant, and $SP_{K.b}$ is the position of the solution in the $z^{\text{th}}$ dimension.

Step 5:

In the global leader phase, the position of the search agent is updated using the SLnO’s attacking phase rather than the standard SMO update.

The solutions are updated on the basis of the selection probability. The fitness $\textit{Fit}_{b}$ of the search agent is calculated from the objective function defined in Eq. (19).

On the basis of the roulette wheel selection, compute the selection probability $\textit{prb}_{b}$ as per Eq. (26).

$\displaystyle\textit{prb}_{b}=\frac{\textit{Fit}_{b}}{\sum\limits_{b=1}^{N}{% \textit{Fit}_{b}}}$ (26)
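Eq. (26) and the roulette wheel draw can be sketched as below. The sketch assumes that $\textit{Fit}_{b}$ is a non-negative score for which larger is better; how the MSE of Eq. (19) is mapped to such a score is not specified in the text, so the example fitness values are hypothetical.

```python
import random


def selection_probabilities(fitness):
    """Eq. (26): each agent's share of the total fitness."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("total fitness must be positive")
    return [f / total for f in fitness]


def roulette_pick(probs, rng):
    """Return the index of the agent whose wheel segment contains a random draw."""
    r, cumulative = rng.random(), 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return index
    return len(probs) - 1  # guard against rounding in the cumulative sum


probs = selection_probabilities([1.0, 1.0, 2.0])
picked = roulette_pick(probs, random.Random(7))
```

Agents with a larger share of the total fitness occupy a wider segment of the wheel and are therefore selected for the global leader update more often.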

The new position of the search agent is computed with the SLnO’s attacking phase rather than the global update of SMO. The newly formulated update is given in Eq. (27).

$\displaystyle SP(\textit{new})_{bz}=[{\textit{Dist}_{bz}-SP_{bz}}]\cdot\cos(2\pi v)+\textit{Dist}_{bz}$ (27)

Here, $\textit{Dist}_{bz}$ is the distance between the $b^{\text{th}}$ global leader’s position and its target in the $z^{\text{th}}$ dimension. In addition, $U(-1,1)$ denotes a uniformly distributed random number within the range ($-$1, 1), and $v$ is an arbitrary integer.
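A literal, per-dimension reading of Eq. (27) is sketched below. Two points are assumptions: the garbled cosine term is read as $\cos(2\pi v)$, and $v$ is passed in as a real number, because an integer $v$ would make the cosine equal to 1 in every update (the original SLnO draws this value from the range ($-$1, 1)). The function name and the example vectors are hypothetical.

```python
import math


def slno_attack_update(sp, dist, v):
    """Eq. (27) applied to every dimension z of one search agent."""
    c = math.cos(2.0 * math.pi * v)
    return [(d - s) * c + d for s, d in zip(sp, dist)]


sp = [1.0, 2.0]      # current position SP_bz
dist = [3.0, 5.0]    # Dist_bz term of Eq. (27)
at_zero = slno_attack_update(sp, dist, 0.0)      # cosine = 1
at_quarter = slno_attack_update(sp, dist, 0.25)  # cosine ~ 0
```

With $v=0$ the agent is reflected through the $\textit{Dist}$ term, and with $v=0.25$ it lands essentially on it, which shows how the cosine factor controls the spiral-like approach of the attacking phase.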

Step 6:

Next, the global leader learning phase is applied.

In this phase, the best solution is determined from the solutions acquired so far. Once the best position is found, it is designated the “global leader of the swarm”, and the “Global Limit Count (GLC)” is set to 0. If, on the other hand, the global leader’s position is not updated, the GLC is increased by 1.

Step 7:

Next, the local leader learning phase is applied.

Greedy selection among the group members is used to update the position of the local leader. If the local leader’s position is not updated, the “Local Limit Count (LLC)” is increased by 1.

Step 8:

With the “local leader decision phase”, the search agent’s position is updated.

“If any local leader fails to reorganize to a specific boundary, known as the Local Leader Limit, then all members of that group must update their positions either by random initialization or by utilizing the global leader’s experience. This is updated as per Eq. (28)”.

$\displaystyle SP(\textit{new})_{bz}=SP_{\min z}+UD(0,1)\ast(\textit{Glob}_{y}-SP_{bz})+U(-1,1)\ast(SP_{\textit{rand}.y}-\textit{Loc}_{bz})$ (28)

Step 9:

In the global leader decision phase, decide between fission and fusion.

“If the global leader does not reorganize to a specific borderline known as the Global leader limit, the swarm is divided into smaller groups or fused into a single unified group. GLL is the parameter that determines whether or not there is any premature convergence”.

Step 10:

If the termination condition is not met, return to Step 4; otherwise, report the global leader position as the best solution.

Step 11:

Terminate

Algorithm 2 shows the pseudocode of the proposed SMSLO model.

Algorithm 2: Proposed SMSLO model
Initialize prate, the local leader limit, and the global leader limit; initialize the population by Eq. (24)
Calculate the fitness of each search agent by Eq. (19)
Select the global leader and the local leaders
While the termination criterion has not been met do
    Update the search agent positions using the local leader phase as per Eq. (25)
    Apply the greedy selection process based on the fitness values
    Evaluate the selection probability as per Eq. (26)
    Update the search agent positions using the global leader phase with the aid of SLnO’s attacking phase as per Eq. (27)
    Apply the greedy selection again for all group members and update the local and global leaders
    cycle $=$ cycle $+$ 1
    If a local leader has not been updated within the local leader limit, update its group as per Eq. (28)
    If the global leader has not been updated within the global leader limit, apply the fission or fusion decision of Step 9
end while (return the global leader position as the best solution)
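The overall flow of Algorithm 2 can be sketched as a short, heavily simplified routine. Everything beyond the control flow is an assumption made for illustration: a single group is used (no fission or fusion), the local leader phase moves each agent away from the current worst agent as one possible reading of Eq. (25), the selection score is taken as $1/(1+\text{cost})$ for a non-negative cost, $\textit{Dist}$ in Eq. (27) is read as the leader position, and a partial random re-initialization stands in for Steps 8 and 9.

```python
import math
import random


def smslo_sketch(objective, lower, upper, pop_size=20, iters=100,
                 alpha=0.5, global_limit=10, seed=0):
    """Simplified single-group SMSLO-style minimiser (illustrative only)."""
    rng = random.Random(seed)
    dim = len(lower)

    def clip(p):
        return [min(max(p[z], lower[z]), upper[z]) for z in range(dim)]

    def random_point():  # Eq. (24)
        return [lower[z] + rng.random() * (upper[z] - lower[z]) for z in range(dim)]

    def greedy(b, cand):  # keep the candidate only if it lowers the cost
        c = objective(cand)
        if c < cost[b]:
            pop[b], cost[b] = cand, c

    pop = [random_point() for _ in range(pop_size)]
    cost = [objective(p) for p in pop]  # Eq. (19): lower is better
    best = min(range(pop_size), key=lambda i: cost[i])
    leader, leader_cost, stall = pop[best][:], cost[best], 0

    for _ in range(iters):
        worst = pop[max(range(pop_size), key=lambda i: cost[i])][:]
        for b in range(pop_size):
            # Local leader phase: assumed reading of Eq. (25).
            greedy(b, clip([pop[b][z] + alpha * (pop[b][z] - worst[z])
                            for z in range(dim)]))
        # Eq. (26) on the assumed score 1 / (1 + cost); roulette wheel draw.
        fit = [1.0 / (1.0 + c) for c in cost]
        for b in rng.choices(range(pop_size), weights=fit, k=pop_size):
            v = rng.uniform(-1.0, 1.0)
            # Global leader phase via Eq. (27), Dist read as the leader position.
            greedy(b, clip([(leader[z] - pop[b][z]) * math.cos(2 * math.pi * v)
                            + leader[z] for z in range(dim)]))
        best = min(range(pop_size), key=lambda i: cost[i])
        if cost[best] < leader_cost:  # global leader learning
            leader, leader_cost, stall = pop[best][:], cost[best], 0
        else:
            stall += 1
        if stall > global_limit:  # stand-in for Steps 8 and 9
            for b in range(pop_size // 2, pop_size):
                pop[b] = random_point()
                cost[b] = objective(pop[b])
            stall = 0
    return leader, leader_cost


def sphere(p):
    return sum(x * x for x in p)


best_pos, best_val = smslo_sketch(sphere, [-5.0, -5.0], [5.0, 5.0])
```

In the proposed framework the `objective` argument would wrap DBN training error or clustering quality instead of the toy sphere function used here.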

5. Results and discussion

5.1 Simulation procedure

The proposed framework was implemented in Python. The proposed model (DBN $+$ SMSLO) is tested with data collected from Amazon AWS, traffic from ML, and CIC flow meter. Hence, we utilized the CSE-CIC-IDS 2018 dataset (http://www.unb.ca/cic/datasets/ids-2018.html) and took the Brute-force, DoS, DDoS, and Botnet attacks for evaluation. The analysis is carried out by varying the learning percentage (LP) over 60 (40% of the data used for testing), 70 (30% of the data used for testing), 80 (20% of the data used for testing), and 90 (10% of the data used for testing). The DBN $+$ SMSLO is compared with existing works such as DNN [3], LSTM [4], DRNN [55], CNN [26], SVM [31], DBN, DBN $+$ WOA, DBN $+$ MFO, DBN $+$ SLnO and DBN $+$ SMO.
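The learning-percentage protocol can be sketched as a simple shuffled split, assuming that LP denotes the share of records used for training and that the remainder is held out for testing. The function name and the stand-in records are hypothetical.

```python
import random


def split_by_learning_percentage(records, lp, seed=0):
    """Shuffle the records and use lp percent of them for training."""
    if not 0 < lp < 100:
        raise ValueError("lp must lie strictly between 0 and 100")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * lp // 100
    return shuffled[:cut], shuffled[cut:]


records = list(range(100))  # stand-in for pre-processed flow records
for lp in (60, 70, 80, 90):
    train, test = split_by_learning_percentage(records, lp)
    print(lp, len(train), len(test))
```

Fixing the seed keeps the split reproducible, so that every compared model sees the same training and testing records at a given LP.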

5.2 Performance evaluation

The performance of the proposed work in identifying the type of intrusion within the cloud computing environment is analyzed in this section. The evaluation is done by varying the LP over 60, 70, 80, and 90. The positive measures, namely “accuracy, specificity, sensitivity, and precision”, should remain high for the most favourable results. The error measures, or negative measures, are FPR, FNR, and FDR, which need to be as low as possible. The F-measure, MCC, and NPV are additional indicators that exhibit the superiority of the DBN $+$ SMSLO.
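All of the listed measures follow from the four confusion-matrix counts. The sketch below uses the standard definitions; the counts in the example are hypothetical and are not results from this work.

```python
import math


def detection_metrics(tp, tn, fp, fn):
    """Standard measures derived from true/false positive/negative counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        "mcc": (tp * tn - fp * fn) / mcc_den,
        "npv": tn / (tn + fn),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
        "fdr": fp / (fp + tp),
    }


# Hypothetical counts for one attack class.
m = detection_metrics(tp=81, tn=95, fp=5, fn=19)
```

The error measures are the complements of the positive ones (FPR of specificity, FNR of sensitivity, FDR of precision), which is why they fall as the positive measures rise.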

Figure 4.

Performance of adopted method over extant models for positive measures.

Figure 5.

Performance of proposed method over traditional models for negative measures.

Figure 6.

Performance of proposed method over traditional models for other measures.

5.2.1 Analysis with respect to positive measures

Figure 4 depicts the results obtained in terms of positive measures. The accuracy of the DBN $+$ SMSLO, which is the most important parameter, is the highest at every LP. At LP $=$ 90, the DBN $+$ SMSLO achieved a maximal accuracy of 93%, which is 19.06%, 8.53%, 19.92%, 17.51%, 17.25%, 9.98%, 7.31%, 8.9%, 8.39%, and 13.22% better than the existing works DNN, LSTM, DRNN, CNN, SVM, DBN, DBN $+$ WOA, DBN $+$ MFO, DBN $+$ SLnO, and DBN $+$ SMO, respectively. This evaluation alone indicates that the DBN $+$ SMSLO is well suited to identifying the type of attack that has intruded into the cloud environment. In addition, the sensitivity of the DBN $+$ SMSLO is higher than that of all the existing models at every LP. At LP $=$ 60, the sensitivity of the DBN $+$ SMSLO is 0.82, while the sensitivities recorded by the existing works are DNN $=$ 0.46, LSTM $=$ 0.62, DRNN $=$ 0.52, CNN $=$ 0.31, SVM $=$ 0.42, DBN $=$ 0.62, DBN $+$ WOA $=$ 0.6, DBN $+$ MFO $=$ 0.65, DBN $+$ SLnO $=$ 0.63 and DBN $+$ SMO $=$ 0.62. The specificity and precision of the DBN $+$ SMSLO are also the highest. At LP $=$ 80, the precision of the DBN $+$ SMSLO is 26.62%, 43.63%, 24.86%, 66.15%, 47.43%, 23.28%, 22.05%, 23.68%, 26.05%, and 15.89% better than the DNN, LSTM, DRNN, CNN, SVM, DBN, DBN $+$ WOA, DBN $+$ MFO, DBN $+$ SLnO and DBN $+$ SMO models, respectively. The evaluation shows that the DBN $+$ SMSLO achieved the highest performance on the positive measures, and so it is suggested as a suitable approach for attack-type detection in the cloud environment.

5.2.2 Analysis with respect to negative measures

The results acquired under negative measures such as FNR, FDR, and FPR by the DBN $+$ SMSLO and the existing models are shown in Fig. 5. The FDR of the DBN $+$ SMSLO is the lowest, staying approximately below 0.2. At LP $=$ 70, the FDR of the DBN $+$ SMSLO is 69.17%, 51.95%, 36.21%, 38.33%, 67.54%, 48.61%, 65.74%, 42.19%, 56.98%, and 50.93% better than the existing works DNN, LSTM, DRNN, CNN, SVM, DBN, DBN $+$ WOA, DBN $+$ MFO, DBN $+$ SLnO and DBN $+$ SMO, respectively. In addition, the FNR of the proposed work at LP $=$ 80 is 0.19, which is lower than DNN $=$ 0.43, LSTM $=$ 0.54, DRNN $=$ 0.39, CNN $=$ 0.73, SVM $=$ 0.57, DBN $=$ 0.37, DBN $+$ WOA $=$ 0.43, DBN $+$ MFO $=$ 0.38, DBN $+$ SLnO $=$ 0.32 and DBN $+$ SMO $=$ 0.39. Further, the FPR of the proposed work is the lowest and stays below 0.04 at every LP. At LP $=$ 60, the FPR of the proposed work is 65.63%, 51.63%, 61.46%, 73.21%, 67.89%, 48.67%, 51.63%, 41.97%, 31.09%, and 45.94% better than the existing works DNN, LSTM, DRNN, CNN, SVM, DBN, DBN $+$ WOA, DBN $+$ MFO, DBN $+$ SLnO, and DBN $+$ SMO, respectively. Thus, the evaluation shows that the DBN $+$ SMSLO achieved the lowest values for the error measures.

5.2.3 Analysis on other measures

The F1-score, MCC, and NPV are additional indicators that exhibit the superiority of the DBN $+$ SMSLO. The outcomes acquired with these measures are shown in Fig. 6. The F1-score, MCC, and NPV are found to be higher with the DBN $+$ SMSLO at every LP. Thus, the overall evaluation indicates that the DBN $+$ SMSLO is well suited to detecting the type of attack in the cloud environment.

5.3 Overall performance evaluation

The overall performance of the proposed work is tabulated in Table 3. The overall sensitivity of the proposed work is 0.812, which is 51.12%, 24.75%, 13.18%, 26.11%, 47.2%, 23%, 23%, 25.97%, 19.5% and 21.9% better than the existing works DNN, LSTM, DRNN, CNN, SVM, DBN, DBN $+$ WOA, DBN $+$ MFO, DBN $+$ SLnO and DBN $+$ SMO, respectively. In addition, the DBN $+$ SMSLO achieved the maximal specificity of 95.2%. The overall accuracy of the DBN $+$ SMSLO is 0.924667, which is better than DNN $=$ 0.758, LSTM $=$ 0.845, DRNN $=$ 0.882, CNN $=$ 0.6, SVM $=$ 0.771, DBN $=$ 0.843, DBN $+$ WOA $=$ 0.848, DBN $+$ MFO $=$ 0.803, DBN $+$ SLnO $=$ 0.833 and DBN $+$ SMO $=$ 0.858. The NPV of the DBN $+$ SMSLO is 0.953, which is 10.91%, 5.25%, 2.83%, 21.3%, 10.5%, 15.5%, 5.02%, 6.5%, 10.7% and 12.5% better than the existing works DNN, LSTM, DRNN, CNN, SVM, DBN, DBN $+$ WOA, DBN $+$ MFO, DBN $+$ SLnO and DBN $+$ SMO, respectively. In addition, the DBN $+$ SMSLO has achieved the lowest error measures. The FPR of the DBN $+$ SMSLO is 0.047, which is the lowest value when compared to DNN $=$ 0.151, LSTM $=$ 0.097, DRNN $=$ 0.074, CNN $=$ 0.633, SVM $=$ 0.143, DBN $=$ 0.085, DBN $+$ WOA $=$ 0.089, DBN $+$ MFO $=$ 0.095, DBN $+$ SLnO $=$ 0.079 and DBN $+$ SMO $=$ 0.067. The overall evaluation shows that the DBN $+$ SMSLO achieved the best values among the compared models, which makes it well suited to detecting the types of attacks in the cloud environment.

Table 3
Overall performance of the proposed method over traditional models

Measures	DNN [3]	LSTM [4]	DRNN [55]	CNN [26]	SVM [17]	DBN	DBN $+$ WOA	DBN $+$ MFO	DBN $+$ SLnO	DBN $+$ SMO	DBN $+$ SMSLO
Sensitivity	0.40	0.61	0.71	0.60	0.43	0.62	0.62	0.60	0.65	0.63	0.81
Specificity	0.85	0.90	0.93	0.75	0.86	0.85	0.91	0.89	0.90	0.83	0.95
Accuracy	0.76	0.85	0.88	0.60	0.77	0.84	0.85	0.80	0.83	0.86	0.93
Precision	0.40	0.61	0.71	0.65	0.43	0.63	0.62	0.60	0.68	0.62	0.81
F-measure	0.40	0.61	0.71	0.61	0.43	0.62	0.52	0.63	0.68	0.62	0.81
MCC	0.25	0.51	0.63	0.63	0.29	0.55	0.53	0.52	0.50	0.57	0.77
NPV	0.85	0.90	0.93	0.75	0.86	0.81	0.91	0.89	0.85	0.83	0.95
FPR	0.15	0.10	0.07	0.63	0.14	0.09	0.09	0.10	0.08	0.07	0.05
FNR	0.60	0.39	0.30	0.30	0.57	0.44	0.38	0.32	0.39	0.37	0.19
FDR	0.60	0.39	0.30	0.30	0.57	0.36	0.54	0.32	0.44	0.38	0.19
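The "percent better" figures quoted for the positive measures appear consistent with a gain expressed relative to the proposed model's own score. This is an inference from the reported numbers, not a formula stated in the text, and the function name below is hypothetical.

```python
def relative_gain(proposed, baseline):
    """Improvement of the proposed score over a baseline, in percent of the proposed score."""
    return 100.0 * (proposed - baseline) / proposed


# Overall sensitivity of DBN + SMSLO (0.812) against the rounded LSTM value in Table 3 (0.61).
gain_vs_lstm = relative_gain(0.812, 0.61)
print(round(gain_vs_lstm, 2))
```

The result is close to the 24.75% quoted for LSTM; the small gap is what would be expected if the quoted figure was computed from unrounded scores.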

5.4 Discussion

The measures analyzed in the results section show the efficiency of the proposed SMSLO model. In this work, the SMSLO optimization method is deployed to enhance accuracy and to minimize overhead [26, 4, 55]. The results show that our proposed method attains good accuracy when compared to [3, 45, 26, 4, 55]. Moreover, the SMSLO method also provides better F${}_{1}$-score and error measures than [3, 24]. Many existing works have been introduced for intrusion detection in the cloud. However, they share common problems such as low accuracy, high overhead, and unsuitability for all attack types [26, 4, 55].

In contrast, our proposed model detects five types of attacks in the CSE-CIC-IDS 2018 dataset. In the future, our work can be extended to detect more attacks and to use cross-validation to improve the detection performance.

6. Conclusion

In this research work, a new IDS model for the cloud environment was introduced with four phases: “data pre-processing, optimal clustering, feature selection, and attack detection phase”. Initially, the collected raw data were pre-processed to enhance the quality of the data. Then, the pre-processed data were segmented with the newly introduced improved K-means clustering model, wherein the centroids were selected optimally with the SMSLO model. At the end of segmentation, two clusters (attack data and non-attack data) were formed. The data in both clusters were dimensionally reduced with PCA, which relieves the data of the “curse of dimensionality” issue. Subsequently, the dimensionality-reduced features were passed into the attack detection phase. The attack-type detection phase was modeled with the optimized DBN, whose weights were fine-tuned using the SMSLO model. The overall accuracy of the DBN $+$ SMSLO is 0.925, which is better than DNN $=$ 0.758, LSTM $=$ 0.845, DRNN $=$ 0.882, CNN $=$ 0.6, SVM $=$ 0.771, DBN $=$ 0.843, DBN $+$ WOA $=$ 0.848, DBN $+$ MFO $=$ 0.803, DBN $+$ SLnO $=$ 0.833 and DBN $+$ SMO $=$ 0.858. The overall evaluation shows that the DBN $+$ SMSLO achieved the best values among the compared models, which makes it well suited to detecting the types of attacks in the cloud environment.

Author’s Bios

Dr. V. Murali Mohan is working as an Assistant Professor in the Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, A.P., India. He received his doctorate from KLEF, India, and his Master's in CSE from JNTU Hyderabad, T.S., India. He has 14* years of experience in both the academic and corporate sectors. Cloud computing is his major research area. He is currently working on machine learning approaches in the fields of cloud computing and IoT. He has published 10* papers in various international journals and conferences, holds 4* patents, and has published a book and book chapters.
Balajee R M is working as an Assistant Professor at KL University, Guntur, Andhra Pradesh, India. He has published 19 international journal papers to date. He published a book on “Mobile Computing” for the Anna University syllabus in 2016. One copyright was registered in his name with the Government of India in 2018 for an ERP system (maintaining the cash flow of an educational institution). He achieved a Guinness and Asia Book of Records entry by organizing a “Handicapped Awareness Program” in 2018.
Hiren K Mewada obtained his MTech and PhD degrees from Sardar Vallabhbhai National Institute of Technology, Surat, Gujarat, India. Presently he is an Assistant Research Professor at Prince Mohammad Bin Fahd University, Kingdom of Saudi Arabia. Previously, he served as an associate professor at Charotar University of Science and Technology, Gujarat, India. He has more than 20 years of teaching experience. His current areas of interest are computer vision, signal processing, machine learning and embedded system design. He has published more than 60 research papers and completed several funded research projects. He is the co-author of one book and has published five book chapters. He is a member of IETE and ISTE.
B R Rajakumar received his B.E. in Electronics and Communication Engineering from Cape Institute of Technology, affiliated to Anna University, Chennai, and he is pursuing his master's degree in Applied Electronics at Rohini College of Engineering and Technology, affiliated to Anna University, Chennai. His research interests are soft computing, natural computing, and artificial intelligence.

Binu D received his B.E. degree in Electronics and Communication Engineering from St-Xavier’s Catholic College of Engineering, affiliated to Anna University, Chennai, and his M.E. in Applied Electronics from Tamizhan College of Engineering and Technology, affiliated to Anna University, Chennai. His current research includes data mining, image processing, and communication systems. He has published many research papers in various international journals, and he has about 10 years of experience in the field of research.

References

Thirumalairaj

and Jeyakarthic

, An intelligent feature selection with optimal neural network based network intrusion detection system for cloud environment, International Journal of Engineering and Advanced Technology (IJEAT) 9(3) (2020).

Bakshi

and Dujodwala

Y.B.

, Securing cloud from ddos attacks using intrusion detection system in virtual machine, in: 2010 Second International Conference on Communication Software and Networks, IEEE, 2010, pp. 260–264.

Bhardwaj

Mangat

and Vig

, Hyperband tuned deep neural network with well posed stacked sparse autoencoder for detection of DDoS attacks in cloud, IEEE Access 8 (2020), 181916–181929. doi: 10.1109/ACCESS.2020.3028690.

Boukhalfa

Abdellaoui

Hmina

and Chaoui

, LSTM deep learning method for network intrusion detection system, International Journal of Electrical and Computer Engineering (2088–8708) 10(3) (2020 Jun 15).

Sahi

Lai

and Diykh

, An efficient DDoS TCP flood attack detection and prevention system in a cloud environment, IEEE Access 5 (2017), 6036–6048. doi: 10.1109/ACCESS.2017.2688460.

Dastjerdi

A.V.

Bakar

K.A.

and Tabatabaei

S.G.

, Distributed intrusion detection in clouds using mobile agents, in: 2009 Third International Conference on Advanced Engineering Computing and Applications in Sciences, IEEE, 2009 Oct 11, pp. 175–180.

Hajimirzaei

and Navimipour

N.J.

, Intrusion detection for cloud computing using neural networks and artificial bee colony optimization algorithm, ICT Express 5(1) (2019), 56–59.

Kuqi

Elezaj

Millaku

Dreshaj

and Hung

N.T.

, The impact of COVID-19 (SARS-CoV-2) in tourism industry: evidence of Kosovo during Q1, Q2 and Q3 period of (2020), Journal of Sustainable Finance and Investment, 2021, 1–2.

Rajakumar

B.R.

and Aloysius George, A New Adaptive Mutation Technique for Genetic Algorithm, in: Proceedings of IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Coimbatore, India, 2012 December 18–20, pp. 1–7. doi: 10.1109/ICCIC.2012.6510293.

10.

Rajakumar

B.R.

, Optimization using lion algorithm: A biological inspiration from lion’s social behavior, Evolutionary Intelligence, Special Issue on Nature inspired algorithms for high performance computing in computer vision, 11(1–2) (2018), 31–52. doi: 10.1007/s12065-018-0168-y.

11.

Rajakumar

B.R.

, Static and adaptive mutation techniques for genetic algorithm: A systematic comparative analysis, International Journal of Computational Science and Engineering 8(2) (2013), 180–193. doi: 10.1504/IJCSE.2013.053087.

12.

Rajakumar

B.R.

, The Lion’s Algorithm: A New Nature Inspired Search Algorithm, Procedia Technology-2nd International Conference on Communication, Computing and Security, 6 (2012), 126–135. doi: 10.1016/j.protcy.2012.10.016 (Elsevier).

13.

Brammya and Angelin Deepa

, Job sceduling in cloud environment using lion algorithm, Journal of Networking and Communication Systems 2(1) (2019), 1–14.

14.

Anglano

Gaeta

and Grangetto

, Securing coding-based cloud storage against pollution attacks, IEEE Transactions on Parallel and Distributed Systems 28(5) (2017), 1457–1469.

15.

Ashok Kumar

and Vimala

, Load balancing in cloud environment exploiting hybridization of chicken swarm and enhanced raven roosting optimization algorithm, Multimedia Research 3(1) (2020), 45–55.

16.

Mazzariello

Bifulco

and Canonico

, Integrating a network ids into an open source cloud computing environment, in: 2010 Sixth International Conference on Information Assurance and Security, IEEE, 2010, pp. 265–270.

17.

Lee

E.K.

Viswanathan

and Pompili

, Model-based thermal anomaly detection in cloud datacenters using thermal imaging, IEEE Transactions on Cloud Computing 6(2) (2018), 330–343.

18.

Fontaine

Kappler

Shahid

and De Poorter

, Log-based intrusion detection for cloud web applications using machine learning, in: International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, Springer, Cham, 2019, pp. 197–210.

19.

S.X.

Zhang

and Li

, Neural networks-aided insider attack detection for the average consensus algorithm, IEEE Access, 8 (2020), 51871–51883.

20.

Somani

Gaur

M.S.

Sanghi

Conti

and Rajarajan

, Scale inside-out: Rapid mitigation of cloud DDoS attacks, IEEE Transactions on Dependable and Secure Computing 15(6) (2018), 959–973.

21.

Bojja

G.R.

Ofori

Liu

and Ambati

L.S.

, Early public outlook on the coronavirus disease (COVID-19): A social media study.

22.

Chen

Meng

Shan

and Bhargava

B.K.

, A Novel Low-Rate Denial of Service Attack Detection Approach in ZigBee Wireless Sensor Network by Combining Hilbert-Huang Transformation and Trust Evaluation, IEEE Access 7 (2019), 32853–32866. doi: 10.1109/ACCESS.2019.2903816.

23.

Ding

Yang

J.Y.

and Xiong

, Bayes-based ARP attack detection algorithm for cloud centers, Tsinghua Science and Technology 21(1) (2016), 17–28.

24.

Pillutla

and Arjunan

, Fuzzy self organizing maps-based DDoS mitigation mechanism for software defined networking in cloud computing, Journal of Ambient Intelligence and Humanized Computing 10 (2019), 1547–1559.

25.

Devagnanam

and Elango

N.M.

, Optimal resource allocation of cluster using hybrid grey wolf and cuckoo search algorithm in cloud computing, Journal of Networking and Communication Systems 3(1) (2020), 31–40.

26.

Kim

Shim

and Choi

, CNN-based network intrusion detection against denial-of-service attacks, Electronics 9(6) (2020), 916.

27.

Garre

J.T.M.

Pérez

M.G.

and Ruiz-Martínez

, A novel Machine Learning-based approach for the detection of SSH botnet infection, Future Generation Computer Systems, 2020.

28.

Virupakshar

K.B.

Asundi

and Narayan

D.G.

, Distributed Denial of Service (DDoS) Attacks Detection System for OpenStack-based Private Cloud, Procedia Computer Science, 2020.

29.

Bhushan

and Gupta

B.B.

, Network flow analysis for detection and mitigation of Fraudulent Resource Consumption (FRC) attacks in multimedia cloud computing, Multimedia Tools and Applications 78 (2019), 4267–4298.

30.

Gayathri Devi

K.S.

, Hybrid genetic algorithm and particle swarm optimization algorithm for optimal power flow in power system, journal of computational mechanics, Power System and Control 2(2) (2019), 31–37.

31.

Joseph

and Mukesh

, Detection of malware attacks on virtual machines for a self-heal approach in cloud computing using VM snapshots, Journal Of Communications Software And Systems 14(3) (2018).

32.

Sanjay Ram

, Secure cloud computing based on mutual intrusion detection system, International Journal of Computer Application 1(2) (2012), 57–67.

33.

Mohan Murali

and Satyanarayana

K.V.V.

, The Contemporary Affirmation of Taxonomy and Recent Literature on Workflow Scheduling and Management in Cloud Computing, Global Journal of Computer Science and Technology, 2016.

34.

Mohan

V.M.

and Satyanarayana

K.V.V.

, Efficient task scheduling strategy towards QOS aware optimal resource utilization in cloud computing, journal of theoretical and applied information technology, Journal of Theoretical and Applied Information Technology 80(1) (2015).

35.

Aboueata

Alrasbi

Erbad

Kassler

and Bhamare

, Supervised machine learning techniques for efficient network intrusion detection, in: 2019 28th International Conference on Computer Communication and Networks (ICCCN), IEEE, 2019 Jul 29, pp. 1–8.

36.

Agrawal

and Tapaswi

, Defense mechanisms against DDoS attacks in a cloud computing environment: State-of-the-art and research challenges, IEEE Communications Surveys and Tutorials 21(4) (2019), 3769–3795, Fourthquarter.

37.

Bharot

Verma

Sharma

and Suraparaju

, Distributed denial-of-service attack detection and mitigation using feature selection and intensive care request processing unit, Arabian Journal for Science and Engineering 43 (2018), 959–967.

38.

Ravi

and Shalinie

S.M.

, Learning-driven detection and mitigation of DDoS attack in IoT via SDN-Cloud architecture, IEEE Internet of Things Journal 7(4) (2020), 3559–3570. doi: 10.1109/JIOT.2020.2973176.

39.

Ravi

and Shalinie

S.M.

, Learning-driven detection and mitigation of DDoS attack in IoT via SDN-Cloud architecture, IEEE Internet of Things Journal 7(4) (2020), 3559–3570.

40. Veeraiah and Krishna B.T., Intrusion detection based on piecewise fuzzy c-means clustering and fuzzy naive Bayes rule, Multimedia Research 1(1) (2018), 27–32.

41. Ninu Preetha N.S., Brammya, Ramya, Praveena, Binu and Rajakumar B.R., Grey wolf optimisation-based feature selection and classification for facial emotion recognition, IET Biometrics 7(5) (2018), 490–499. doi: 10.1049/iet-bmt.2017.0160.

42. Alkadi, Moustafa and Turnbull, A review of intrusion detection and blockchain applications in the cloud: Approaches, challenges and solutions, IEEE Access 8 (2020), 104893–104917.

43. Alkadi, Moustafa, Turnbull and Choo K.R., A Deep Blockchain Framework-enabled Collaborative Intrusion Detection for Protecting IoT and Cloud Networks, IEEE Internet of Things Journal.

44. Alkadi, Moustafa, Turnbull and Choo K.R., Mixture localization-based outliers models for securing data migration in cloud centers, IEEE Access 7 (2019), 114607–114618.

45. Harikrishna and Amuthan, SDN-based DDoS attack mitigation scheme using convolution recursively enhanced self organizing maps, Sādhanā 45 (2020).

46. Mishra, Varadharajan, Pilli E.S. and Tupakula, VMGuard: A VMI-Based Security Architecture for Intrusion Detection in Cloud Environment, IEEE Transactions on Cloud Computing 8(3) (2020), 957–971.

47. Thomas and Rangachar M.J.S., Hybrid optimization based DBN for face recognition using low-resolution images, Multimedia Research 1(1) (2018), 33–43.

48. Roy R.G., Rescheduling based congestion management method using hybrid Grey Wolf optimization-grasshopper optimization algorithm in power system, Journal of Computational Mechanics, Power System and Control 2(1) (2019), 9–18.

49. Anand, Intrusion detection system for wireless mesh networks via improved whale optimization, Journal of Networking and Communication Systems 3(4) (2020).

50. Dong, Abbas and Jain, A survey on distributed denial of service (DDoS) attacks in SDN and cloud computing environments, IEEE Access 7 (2019), 80813–80828.

51. Roschke, Cheng and Meinel, An extensible and virtualization-compatible IDS management architecture, in: 2009 Fifth International Conference on Information Assurance and Security, Vol. 2, 2009, pp. 130–134.

52. Shamshirband and Antonio Pescapè M.F., Computational intelligence intrusion detection techniques in mobile cloud computing environments: Review, taxonomy, and open research issues, Journal of Information Security and Applications, 2020.

53. Sindhu, Reddy and Shyam G.K., A machine learning based attack detection and mitigation using a secure SaaS framework, Journal of King Saud University – Computer and Information Sciences, 2020.

54. Phan T.V. and Park, Efficient distributed denial-of-service attack defense in SDN-based cloud, IEEE Access 7 (2019), 18701–18714.

55. Tang T.A., McLernon, Mhamdi, Zaidi S.A. and Ghogho, Intrusion detection in SDN-based networks: Deep recurrent neural network approach, in: Deep Learning Applications for Cyber Security, Springer, Cham, 2019, pp. 175–195.

56. Tabrizchi and Rafsanjani M.K., A survey on security challenges in cloud computing: Issues, threats, and solutions, The Journal of Supercomputing 76(2) (2020), 9493–9532.

57. Murali Mohan and Satyanarayana K.V.V., Multi-Objective Optimization of Composing Tasks from Distributed Workflows in Cloud Computing Networks, in: Proceedings of the Third International Conference on Computational Intelligence and Informatics (ICCII), Advances in Intelligent Systems and Computing, Vol. 1090 (2018), ISSN 2194-5357, ISSN 2194-5365 (electronic).

58. Murali Mohan and Satyanarayana K.V.V., Resource planning and allocation in distributed cloud networks using voids in scheduled intervals, International Journal of Recent Technology and Engineering (IJRTE) 8(2S8) (2019), ISSN 2277-3878.

59. Murali Mohan and Satyanarayana K.V.V., Task and resource pairing by batch scheduling in cloud, Journal of Advanced Research in Dynamical and Control Systems 10 (2018), ISSN 1943-023X.

60. Netaji V.K. and Bhole G.P., Optimal container resource allocation using hybrid SA-MFO algorithm in cloud architecture, Multimedia Research 3(1) (2020), 11–20.

61. Lakshmi Lalitha, S. Hrushikesava Raju Vijaya Krishna and V. Murali Mohan, Customized Smart Object Detection: Statistics of detected objects using IoT, presented at ICAIS (2021).

62. Mohan V.M. and Satyanarayana K.V.V., Application level resource scheduling with optimal schedule interval filling (RS-OSIF) for distributed cloud computing environments, International Journal of Applied Engineering Research 12(24) (2017), 15746–15753.

63. Wang, Shan, Qin and Wang, Cloud Intrusion Detection Method Based on Stacked Contractive Auto-Encoder and Support Vector Machine, IEEE Transactions on Cloud Computing. doi: 10.1109/TCC.2020.3001017.

64. Zhijun, Wenjing, Liang and Meng, Low-rate DoS attacks, detection, defense, and challenges: A survey, IEEE Access 8 (2020), 43920–43943. doi: 10.1109/ACCESS.2020.2976609.

65. Tian, Luo, Qiu and Guizani, A distributed deep learning system for web attack detection on edge devices, IEEE Transactions on Industrial Informatics 16(3) (2020), 1963–1971.
