Abstract
Cloud computing provides various cost-effective, on-demand services to users, and so it is emerging as a major trend in the IT service model. However, its open and distributed architecture makes it highly vulnerable to attackers, and the security and privacy of cloud users have become a major hurdle. The most prevalent approach for detecting attacks on the cloud is the Intrusion Detection System (IDS). Contemporary IDSs deployed in traditional Internet or Intranet contexts lack scalability and autonomous self-adaptation. Furthermore, they lack determinism, making them inappropriate for cloud-based settings. This necessitates the development of new cloud-based IDSs capable of meeting the firm’s security standards. Therefore, in this research work, we introduce a new IDS model for the cloud environment. Our work consists of four major phases: “data pre-processing, optimal clustering, feature selection, and attack detection phase”. Initially, the collected raw data are pre-processed to enhance their quality. Then, the pre-processed data are segmented with the newly introduced improved K-means clustering model, whose centroids are selected optimally by a new hybrid optimization model referred to as Spider Monkey Updated with Sealion Optimization (SMSLO), a conceptual hybridization of the standard SeaLion Optimization (SLnO) and Spider Monkey Optimization (SMO). At the end of segmentation, two clusters (attack data and non-attack data) are formed. Since the data in both clusters are high-dimensional, we reduce their dimensionality with the “Principal Component Analysis (PCA)” algorithm. Subsequently, these dimensionality-reduced features pass into the attack detection phase.
The attack detection phase is modeled with an optimized Deep Belief Network (DBN), which identifies the type of attack (DoS, Botnet, or DDoS) that has intruded into the network. Since the DBN makes the final detections, it ought to be as free of errors as possible. We reduce the detection error, measured as the Mean Square Error (MSE) of the DBN, by fine-tuning its weights with the new hybrid optimization model (SMSLO). Finally, the result acquired from the proposed work (DBN
Keywords
Introduction
Digital devices, including smartphones and tablets, have become increasingly ubiquitous in human life over the decades [34, 33, 64]. These smart devices are typically wireless, allowing users to access their data (in multimedia format) and applications from anywhere at any time via the internet [59, 57, 58, 61]. The rapid increase in global internet usage requires a new way to manage the size, variety, and availability of data, and cloud computing provides it [54, 41, 40]. Cloud computing is a rapidly growing technology that enables users to access dependable, on-demand, and scalable resources at any time while incurring lower infrastructure costs. In today’s technological environment, cloud computing [65, 53, 15, 47] is gaining huge attention among organizations. The technology delivers computing resources so that users can easily access “servers, storage, databases, networking, software, analytics, and intelligence” through the internet, while the network-attached hardware behind all application services is maintained through a web application [20, 23]. Companies such as Amazon and Google operate their own clouds and have moved their operations onto them [28, 36, 39]. Moreover, during the COVID-19 crisis [21, 8], cloud computing enabled coordination, connectivity, and crucial internet services.
Cloud networking is a term describing access to networking resources from a centralized third-party provider using Wide Area Networking (WAN) or Internet-based access technologies. Cloud computing refers to the provision of computational resources on demand via a computer network. Intrusion detection in cloud computing is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusions, defined as attempts to compromise confidentiality, integrity, or availability, or to bypass the security mechanisms of a computer or network. Despite the rise in cloud adoption, cyber security remains a major bottleneck that decreases network performance and, as a result, affects trustworthiness [28, 43, 44]. Cloud computing security is a sub-domain of information security, computer security, and network security [25, 34]. Users are increasingly concerned about data security, as most government and private organizations now rely on the cloud for fast data transmission. “According to a recent report by insight, most organizations are moving to the cloud for efficiency, but many still have security concerns” [46, 50, 52, 54, 27]. Distinctive security issues with cloud computing include misconfiguration, external sharing of information, insecure interfaces, account hijacking, lack of visibility, unauthorized access, malicious insiders, cyberattacks, spoofing, tampering, repudiation, information disclosure, elevation of privilege, embedded malicious code, and protocol manipulation [56]. An attacker may also gain unauthorized access to computers and install malware such as a Trojan horse on them. DDoS assaults, data theft, and unauthorized access are among the most prevalent attacks [54, 22, 64]. Most traditional security technologies identify threats by matching against a database of prior malware instances [36, 39, 42].
Today, organizations are looking for solutions that proactively notify their employees of vulnerabilities in their website, portal, or application [64]. Such a system is updated at several layers of the application stack, including hardware, network, and applications [20, 23, 27, 28]. Identifying the attack nodes in the system is an effective strategy for protecting data against theft, leakage, and deletion [45, 14, 17, 19]. Although cloud customers may now deploy “off-the-shelf Intrusion Detection Systems (IDSs)” on tenant networks and virtual instances, these technologies have limited coverage and cannot detect specific changes in the cloud hosting environment. The most popular IDS detection methods rely on attack signatures and on user behavior [40, 49]. Contemporary IDSs deployed in traditional Internet or Intranet contexts lack scalability and autonomous self-adaptation. Furthermore, they lack determinism, making them inappropriate for cloud-based settings. This necessitates the development of new cloud-based IDSs capable of meeting a company’s security standards. Because the architecture of cloud computing differs from existing computing technologies such as Grid computing, currently available IDSs and Prevention Systems (PS) cannot provide the required degree of security and reliability.
An early virtual machine-based IDS, managed by a remote controller, was introduced in [51]. In [2], a virtual machine-based IDS for DDoS was presented. However, both methods require a separate IDS instance for each virtual machine, and they can detect only well-known attacks. To overcome this drawback, a Snort-based IDS was presented [16], but because it relies heavily on Snort, it too can detect only known threats. Mobile agents were also introduced for IDS [6] to keep external virtual machines safe; the disadvantage of this strategy is its high overhead. In [32], a mutual agent-based method was presented to detect DDoS attacks, but it incurs considerable computing costs for transmitting warning information. Recently, fuzzy clustering and Artificial Bee Colony (ABC) with an Artificial Neural Network (ANN) were presented in [7] for IDS. This technique reduces the error, but the hybrid combination is too costly. Furthermore, an optimized Neural Network (NN) with Improved Particle Swarm Optimization (IPSO) was deployed in an IDS for feature selection [1]. However, this method still needs a deep learning model to improve performance. The abbreviations used in this paper are listed in Table 1.
Deep learning algorithms combined with optimization techniques have proven to be strong attack detection models. The major contributions of this research work are:
Introduces a new improved K-means clustering model to segment the pre-processed data into attack and non-attack clusters, in which the centroids are optimally tuned via the new Spider Monkey Updated with Sealion Optimization (SMSLO) model.
Introduces a new weight-optimized Deep Belief Network (DBN) to reduce detection errors; the DBN weights are fine-tuned by the SMSLO model.
Hybridizes the SLnO and SMO algorithms into the proposed SMSLO model to improve convergence and produce better solutions.
The rest of this paper is organized as follows. Section 2 reviews the literature on attack detection models in the “cloud computing environment”. Section 3 gives an overview of the proposed IDS model for the cloud. Section 4 describes the pre-processing and improved K-means-based segmentation phase. Section 5 presents the dimensionality reduction and attack detection model. The results acquired with the proposed model are discussed in Section 6. Finally, Section 7 concludes the paper.
Nomenclature
In 2020, Bhardwaj et al. [3] proposed a new method for detecting DDoS cyberattacks in the cloud. The architecture combines a stacked “sparse AE” for feature extraction with a DNN for classification, and the parameters of both the AE and the DNN are fine-tuned with an appropriate tuning model. As a result, the model avoided overfitting and achieved a low reconstruction error.
In 2017, Sahi et al. [5] built a classifier system for identifying and countering DDoS Transmission Control Protocol (TCP) flood attacks in public clouds. The proposed Classifier System for DDoS (CS DDoS) classified incoming packets to protect the stored records, deciding on the presence or absence of an attack from the classification outcomes. In the detection phase, data packets are evaluated for attacks; in the prevention phase, packets found to be malicious are denied access to cloud services. The CS DDoS model achieved substantial accuracy but had high time complexity.
In 2020, Wang et al. [63] developed a novel method for identifying intrusions in the cloud. Robust low-dimensional features were extracted automatically using an effective SCAE approach, and an SVM classifier was used to classify the extracted features. The SCAE
In 2020, Ravi and Shalinie [38] proposed a security strategy for dealing with DDoS attacks produced by rogue IoT servers. The authors reduced DDoS assaults on IoT servers by incorporating the SDN architecture into the cloud, and DDoS attacks were identified and mitigated using LEDEM. The simulation results showed higher accuracy in detecting DDoS attacks.
In 2019, Pillutla and Arjunan [24] developed the FSOMDM methodology, in which the neurons of a standard NN were updated using fuzzy rules rather than the usual Kohonen neural network model. The authors used the software-oriented traffic inspection property in their technique to identify and mitigate DDoS attacks. An evaluation of the proposed work revealed higher classification precision.
In 2018, Bharot et al. [37] constructed a DDoS attack detection and mitigation model with the assistance of the ICRPU and a feature selection mechanism. After evaluating the traffic using the “Hellinger distance function”, the packets were categorized as genuine request groups or DDoS based on the derived properties. The legitimate requests were then transferred to the Normal Request Processing Unit, while the DDoS requests were sent to the ICRPU. As a result, the proposed work achieved the highest detection rate and the lowest false alarm rate.
In 2019, Bhushan and Gupta [29] developed a novel methodology for detecting and mitigating the FRC attack in cloud-based services based on network traffic analysis. The authors evaluated the methodology on a real-world benchmark, and it showed reduced overhead and higher accuracy.
In 2020, Harikrishna and Amuthan [45] designed the “Convolution Recursively Enhanced Self-Organizing Map and Software-Defined Networking-based Mitigation Scheme (CRESOM-SDNMS)” to prevent DDoS attacks in the cloud computing environment. To overcome vector quantization challenges, the authors used a better initialization technique and improved topology preservation in the SOM-based classification process. In simulated trials, the proposed technique reduced the FPR during DDoS mitigation.
In 2020, Kim et al. [26] introduced a CNN-based IDS for DoS attacks. They built a Deep Learning (DL)-based model for detecting Denial of Service (DoS) assaults and evaluated it on two datasets. The model is built on a Convolutional Neural Network (CNN) and performs both binary and multiclass classification. The approach attains 99% or higher accuracy in binary and multiclass classification on KDD and an average accuracy of 91.5% on the CSE-CIC-IDS 2018 dataset.
In 2020, Boukhalfa et al. [4] proposed a Network Intrusion Detection System (NIDS) based on Long Short-Term Memory (LSTM) that recognizes threats and builds a long-term memory of them, so that new assaults similar to known ones can be prevented while providing a single means of blocking intrusions. In their detection trials, the accuracy reached 99.98% and 99.93% for two-class and multiclass classification, respectively.
In 2020, Tang et al. [55] introduced an RNN with Gated Recurrent Units (GRU) for intrusion detection in SDN environments, known as the Deep Recurrent Neural Network (DRNN). In this method, the RNN treats the input as a time series. On the NSL-KDD and CICIDS2017 datasets, it achieved 89% multiclass detection accuracy. However, the SDN architecture is flow-based, whereas NSL-KDD is not a flow-based dataset.
In 2019, Fontaine et al. [18] used machine learning methods to build a simplified cloud security system. The approach employs Decision Trees (DT) and NNs as classifiers, trained on data gathered from cloud applications, to evaluate web application logs from a variety of servers. These logs are combined into a single format to ease feature extraction. The NN achieved the highest accuracy, 98.47%, making it feasible to detect attacks on various web services in cloud settings quickly and accurately.
In 2019, Aboueata et al. [35] investigated the applicability of two well-established machine learning techniques, ANN and SVM, for detecting intrusions or unusual behavior in the cloud. They also used feature engineering to determine the set of features that achieves the highest accuracy with the least training time and complexity; their goal was to shorten training time by selecting the best features while maintaining accuracy. Finally, they compared their findings with previous works. Table 2 lists the characteristics and problems of cloud-based threat detection approaches.
Reviews on conventional Intrusion Detection System in cloud environment
The architecture of the proposed work.
Architectural description
Cloud computing is becoming the favored choice of many IT firms because it provides consumers with flexible, pay-per-use services. However, its open and distributed architecture leaves it vulnerable to attackers, so privacy and security are big obstacles to its success. The most prevalent approach for detecting attacks on the cloud is the Intrusion Detection System (IDS). In this research work, a novel IDS is developed in four key phases: “data pre-processing, optimal clustering, feature selection, and attack detection phase”. Figure 1 shows the architecture of the proposed work. We consider a public cloud with the Application as a Service (AaaS) model. Initially, the collected raw data are pre-processed to make them more readily usable by the machine learning models. Then, the pre-processed data are clustered with the newly proposed improved K-means technique. At the end of clustering, two clusters are formed: (a) normal and (b) attacks. Since the data in both the normal and attack clusters are high-dimensional, they suffer from the curse of dimensionality. Therefore, we apply the PCA algorithm to reduce the dimensionality of the data in the clusters. Then, with these dimensionality-reduced data
Description of proposed model
Pre-processing
Pre-processing is the fundamental step for transforming the raw input data
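The specific pre-processing operations are not recoverable from this text. As a minimal sketch, assuming the raw data are numeric flow features that may contain infinities or missing values (common in CSE-CIC-IDS 2018 exports), a typical cleaning and min-max scaling step could look like this; the function name `preprocess` and the median-imputation choice are our assumptions, not the authors' procedure.

```python
import numpy as np

def preprocess(X):
    """Clean and min-max scale a numeric feature matrix (rows = flow records).

    Assumed steps (not taken from the paper): treat +/-inf as missing,
    impute missing values with the column median, then scale each column to [0, 1].
    """
    X = np.asarray(X, dtype=float).copy()
    X[~np.isfinite(X)] = np.nan                          # inf/-inf become missing values
    medians = np.nanmedian(X, axis=0)                    # per-column median of valid entries
    medians = np.where(np.isnan(medians), 0.0, medians)  # an all-missing column falls back to 0
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]                        # impute each missing entry
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # constant columns map to 0
    return (X - col_min) / span
```

Scaling to [0, 1] keeps features with large raw ranges (byte counts, durations) from dominating the Euclidean distances used by K-means.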
Improved k-means clustering
K-means clustering is a well-known unsupervised learning method that fits data points of
calculates the total squared error of all the objects in the database. The Euclidean distance is the usual distance measure between feature vectors; it determines the nearest cluster center for each data object. Equation (1) gives the Euclidean distance
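For reference, the standard forms of the Euclidean distance and the total squared error (SSE) that K-means minimizes are given below. The symbols are ours and may differ from the paper's Eq. (1): $\mathbf{x}_i$ is a data object with $D$ features, $\mathbf{c}_j$ is the center of cluster $C_j$, and $K$ is the number of clusters ($K = 2$ here: attack and non-attack).

```latex
\begin{aligned}
d(\mathbf{x}_i, \mathbf{c}_j) &= \sqrt{\sum_{m=1}^{D} \left(x_{im} - c_{jm}\right)^2} \\
\mathrm{SSE} &= \sum_{j=1}^{K} \sum_{\mathbf{x}_i \in C_j} d(\mathbf{x}_i, \mathbf{c}_j)^2
\end{aligned}
```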
Step 1: Initialize the cluster centroids with …
Step 2: Compute the distance between every data object and each cluster center, and assign each object to the nearest center.
Step 3: Recompute the cluster center of every cluster.
Step 4: Repeat Steps 2 and 3 until the cluster centers no longer change.
The segmented attack and non-attack data clusters are together represented as
Algorithm 1 shows the pseudo-code of the improved K-means clustering method.
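The clustering loop can be sketched as standard Lloyd iterations that start from externally supplied centroids. This is a minimal illustration, assuming the SMSLO-optimized centroids are simply passed in as `init_centroids`; the function and parameter names are hypothetical and this is not the authors' Algorithm 1.

```python
import numpy as np

def kmeans(X, init_centroids, max_iter=100, tol=1e-9):
    """Lloyd's k-means starting from given centroids.

    In the paper the initial centroids come from SMSLO; here they are just an
    argument (hypothetical interface). Returns (labels, centroids, sse).
    """
    X = np.asarray(X, dtype=float)
    C = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(max_iter):
        # Euclidean distance from every object to every centroid, shape (n, K)
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        labels = d.argmin(axis=1)                 # assign each object to its nearest center
        new_C = C.copy()
        for j in range(C.shape[0]):
            members = X[labels == j]
            if len(members):                      # an empty cluster keeps its old center
                new_C[j] = members.mean(axis=0)   # recompute the cluster center
        shift = np.abs(new_C - C).max()
        C = new_C
        if shift <= tol:                          # centers no longer change
            break
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    sse = float(((X - C[labels]) ** 2).sum())     # total squared error of the final partition
    return labels, C, sse
```

Passing the centroids in, rather than drawing them at random, is what lets an outer optimizer such as SMSLO search over initializations and score each one, for example by the returned SSE.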
The attained
Finally, the dimensionality-reduced data acquired from PCA are denoted as
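PCA itself is standard, so it can be sketched directly: center the data, take the singular value decomposition, and project onto the leading right-singular vectors. This minimal version assumes PCA is applied to each cluster's data matrix separately. The number of retained components is not stated in the surviving text, so `n_components` is left as a parameter.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components (PCA via SVD).

    Returns (projected data, components, explained-variance ratio).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                    # center each feature
    # Rows of Vt are principal directions, sorted by decreasing singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained_var = (S ** 2) / (len(X) - 1)          # variance along each direction
    ratio = explained_var[:n_components] / explained_var.sum()
    return Xc @ components.T, components, ratio
```

The explained-variance ratio is one common way to pick `n_components`, for instance keeping enough components to cover a chosen fraction of the variance.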
The DBN [65] framework, built by stacking Restricted Boltzmann Machines (RBMs), whose origins date to 1986, is a well-known intelligent method. Figure 2 shows the attack detection model with DBN. It typically stacks several layers, each consisting of visible and hidden neurons. Visible and hidden neurons are fully connected; however, there are no connections among hidden neurons and none among visible neurons. The connections between visible and hidden neurons are symmetric (undirected). The neurons of the Boltzmann network produce probabilistic outputs. The output
The “Boltzmann machine energy function” for the generation of the binary state
Equations (12)–(14) express the energy as a function of the joint configuration of hidden and visible neurons
RBM learning embeds the probability distribution of the input data into the weights. The distributed probabilities can be obtained through RBM training, and the resulting weight allocation is given by Eq. (15).
The probability distributed RBM model for the hidden and visible vectors pair
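For reference, the standard binary RBM formulation is summarized below. Equations (12)–(17) are not recoverable from this text, so the notation here is ours and may differ from the paper's: $v_i$ and $h_j$ are visible and hidden units, $a_i$ and $b_j$ their biases, $w_{ij}$ the symmetric weights, and $Z$ the partition function.

```latex
\begin{aligned}
E(\mathbf{v}, \mathbf{h}) &= -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i}\sum_{j} v_i\, w_{ij}\, h_j \\
P(\mathbf{v}, \mathbf{h}) &= \frac{e^{-E(\mathbf{v}, \mathbf{h})}}{Z}, \qquad Z = \sum_{\mathbf{v}, \mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})} \\
P(h_j = 1 \mid \mathbf{v}) &= \sigma\Big(b_j + \sum_{i} v_i w_{ij}\Big), \qquad
P(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_{j} w_{ij} h_j\Big), \qquad
\sigma(x) = \frac{1}{1 + e^{-x}}
\end{aligned}
```

Because there are no hidden-hidden or visible-visible connections, the conditionals factorize over units. This is what makes the sampling steps of contrastive divergence cheap.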
The attack detection performance depends on the accuracy of the DBN. The MSE error function is computed to quantify the error of the DBN during training. Mathematically, the MSE is given in Eq. (18).
Here, Act denotes the actual output (i.e., the actual attack) and
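A minimal sketch of the MSE objective, assuming the actual attack labels are one-hot encoded and compared with the DBN's output activations. It averages over all samples and output units; the paper's Eq. (18) may normalize differently.

```python
import numpy as np

def mse(actual, predicted):
    """Mean squared error between actual labels (e.g. one-hot attack classes) and DBN outputs."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))  # average over samples and output units
```

This value is also a natural cost for the optimizer: a candidate weight vector with lower MSE is a better solution.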
The DBN model makes use of the Contrastive Divergence (CD) learning paradigm, whose phases are shown below.
Step 1: Choose … and determine the probability of the hidden neurons.
Step 2: Examine the … and determine the outer product of the vectors (the positive gradient).
Step 3: Use Eq. (21) to analyze the reconstruction of …, and calculate the negative gradient (outer products).
Step 4: Describe the weight update as given by Eq. (22), in which
Step 5: The weight updated with the new values is defined by Eq. (23).
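The contrastive divergence steps above follow the usual CD-1 pattern. The sketch below shows one mini-batch update for a binary RBM. It assumes the common practice of sampling the hidden states once and using probabilities for the reconstruction; the names `cd1_update`, `W`, `a`, `b`, and `lr` are ours, and the paper's Eqs. (19)–(23) may differ in detail.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.1, rng=None):
    """One CD-1 step for a binary RBM on a mini-batch v0 (batch x n_visible).

    W: (n_visible, n_hidden) weights, a: visible biases, b: hidden biases.
    Returns the updated (W, a, b).
    """
    rng = np.random.default_rng() if rng is None else rng
    ph0 = sigmoid(v0 @ W + b)                         # P(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample binary hidden states
    pv1 = sigmoid(h0 @ W.T + a)                       # reconstruction P(v = 1 | h0)
    ph1 = sigmoid(pv1 @ W + b)                        # P(h = 1 | reconstruction)
    n = v0.shape[0]
    pos = v0.T @ ph0 / n                              # positive gradient (data outer products)
    neg = pv1.T @ ph1 / n                             # negative gradient (reconstruction outer products)
    W = W + lr * (pos - neg)
    a = a + lr * (v0 - pv1).mean(axis=0)
    b = b + lr * (ph0 - ph1).mean(axis=0)
    return W, a, b
```

In the proposed pipeline this pre-training step would be followed by SMSLO fine-tuning of the weights against the MSE.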
The DBN output indicates the type of attack (DoS, Botnet, or DDoS) that has intruded into the cloud computing environment.
Attack detector (DBN).
The inputs to the proposed SMSLO model are the DBN weights and the K-means centroids. The solution encoding is shown in Fig. 3.
Solution encoding.
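Fig. 3 itself is not recoverable, so the exact layout of a solution is unknown. As a minimal sketch, assuming each search agent is a single flat vector holding the DBN weight matrices followed by the K-means centroids, encoding and decoding could look like this. The layout and function names are our assumptions; the authors may also optimize the two parts in separate runs.

```python
import numpy as np

def encode(weights, centroids):
    """Flatten DBN weight matrices and K-means centroids into one solution vector."""
    parts = [np.ravel(w) for w in weights] + [np.ravel(centroids)]
    return np.concatenate(parts)

def decode(vector, weight_shapes, centroid_shape):
    """Inverse of encode: split a solution vector back into matrices."""
    out, pos = [], 0
    for shape in weight_shapes:
        size = int(np.prod(shape))
        out.append(vector[pos:pos + size].reshape(shape))
        pos += size
    size = int(np.prod(centroid_shape))
    centroids = vector[pos:pos + size].reshape(centroid_shape)
    return out, centroids
```

A flat encoding lets the same position-update rules (SMO and SLnO moves) operate on weights and centroids without special cases.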
The SLnO method is a well-known optimization technique inspired by sea lion hunting behavior. Sea lions have many appealing characteristics, including quick mobility, excellent vision, and outstanding hunting ability, and SLnO can find good solutions with fast convergence. The SMO, in turn, is a metaheuristic inspired by the foraging behavior of spider monkeys, which is based on their fission-fusion social structure. The SMO finds global solutions without getting trapped in premature convergence or local optima. Hybridizing optimization models can further improve the convergence of the solution [13, 10, 41, 11, 9, 30, 48]. Therefore, we hybridize these two well-known optimization models (SMO and SLnO) to obtain better solutions, and we name the new technique SMSLO. By updating the solutions with SMSLO, we aim to obtain global solutions that allow the type of attack taking place in the cloud environment to be identified accurately. The steps followed in SMSLO are described below:
The population of search agents, pop (sea lions and spider monkeys), is initialized. The global leader, perturbation rate prate, and global leader limit are also initialized, and the population is evaluated. The
The local and global limits are determined. The position of each search agent is then updated in the local leader phase. The Local Leader Phase (LLP) is the most crucial phase of SMO, in which the spider monkeys update their positions. The fitness of the search agent is determined using Eq. (19). If the new position of the search agent is better than its old one, the search agent moves to the new position. In this phase, we introduce a new update expression in place of the existing one; the newly formulated position update is shown in Eq. (25).
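The proposed Eq. (25) is not recoverable from this text. For orientation, the sketch below shows the standard SMO local leader phase that it replaces: each dimension is perturbed with probability 1 − pr toward the local leader and relative to a random group member, followed by greedy selection. It assumes minimization of a cost such as the MSE; the names are ours.

```python
import numpy as np

def local_leader_phase(group, cost_fn, local_leader, pr, rng):
    """Standard SMO local-leader-phase update with greedy selection (minimization).

    group: (n, d) positions of one group's spider monkeys (n >= 2).
    """
    group = group.copy()
    n, d = group.shape
    for i in range(n):
        candidate = group[i].copy()
        r = rng.choice([k for k in range(n) if k != i])  # random other member of the group
        for j in range(d):
            if rng.random() >= pr:                       # perturb this dimension
                candidate[j] = (group[i, j]
                                + rng.random() * (local_leader[j] - group[i, j])
                                + rng.uniform(-1, 1) * (group[r, j] - group[i, j]))
        if cost_fn(candidate) < cost_fn(group[i]):       # greedy selection: keep improvements only
            group[i] = candidate
    return group
```

Greedy selection guarantees that no agent's cost gets worse in this phase, which the test below checks on a simple sphere function.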
In the global leader phase, the position of the search agent is updated using SLnO’s attacking phase instead of the standard SMO update. Solutions are updated on the basis of their selection probability. Based on the objective function defined in Eq. (18), the fitness … Using roulette wheel selection, compute the selection probability
The new position of the search agent is computed with SLnO’s attacking phase instead of SMO’s global update; this new formulation is given in Eq. (27).
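The exact Eqs. (26) and (27) are not recoverable. The following sketch only illustrates the two ingredients named in the text. The first is a roulette-wheel selection probability, assuming the MSE cost is converted to fitness as 1 / (1 + MSE). The second is a move based on SLnO's circle-updating attack as we read the original SLnO formulation, X(t+1) = |P − X(t)| · cos(2πm) + P with m drawn uniformly from [−1, 1] and P the best solution found so far. The function names are hypothetical.

```python
import numpy as np

def selection_probabilities(costs):
    """Roulette-wheel probabilities for minimization, using fitness = 1 / (1 + cost)."""
    fit = 1.0 / (1.0 + np.asarray(costs, dtype=float))
    return fit / fit.sum()

def slno_attack(position, best, rng):
    """SLnO circle-updating attack: move around the best solution (the 'prey')."""
    m = rng.uniform(-1.0, 1.0, size=position.shape)
    return np.abs(best - position) * np.cos(2 * np.pi * m) + best

def global_leader_phase(pop, costs, best, rng):
    """Apply the SLnO attack to each agent selected by the roulette wheel."""
    prob = selection_probabilities(costs)
    new_pop = pop.copy()
    for i in range(len(pop)):
        if rng.random() < prob[i]:      # fitter agents are selected more often
            new_pop[i] = slno_attack(pop[i], best, rng)
    return new_pop
```

Since |cos(2πm)| ≤ 1, each coordinate of the moved agent stays within its original distance of the best solution. This is the exploitation behavior the attacking phase is meant to contribute.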
Next, in the global leader learning phase, the best solution among those acquired so far is determined, and its position is designated the “global leader of the swarm”. In this case, the “Global Limit Count (GLC)” … Then, in the local leader learning phase, the position of the local leader is updated by greedy selection among the group members. When the local leader’s position is not updated, the “Local Limit Count (LLC)” is incremented by 1. In the “local leader decision phase”, the search agent’s position is updated: “If any local leader fails to reorganize to a specific boundary, known as the Local Leader Limit, then all members of that group must update their positions either by random initialization or by utilizing the global leader’s experience. This is updated as per Eq. (28)”.
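Equation (28) is not recoverable here. The sketch below follows the standard SMO local leader decision rule: when a group's LLC exceeds the Local Leader Limit, each dimension of each member is either re-initialized at random within the bounds (with probability 1 − pr) or moved toward the global leader and away from the local leader. Clipping to the bounds and the scalar `lb`/`ub` are our choices; the names are ours.

```python
import numpy as np

def local_leader_decision(group, global_leader, local_leader, pr, lb, ub, rng):
    """Standard SMO local leader decision: re-position a stagnant group.

    Call only when the group's Local Limit Count exceeds the Local Leader Limit.
    """
    group = group.copy()
    n, d = group.shape
    for i in range(n):
        for j in range(d):
            if rng.random() >= pr:
                group[i, j] = lb + rng.random() * (ub - lb)   # random re-initialization
            else:
                group[i, j] = (group[i, j]
                               + rng.random() * (global_leader[j] - group[i, j])
                               + rng.random() * (group[i, j] - local_leader[j]))
    return np.clip(group, lb, ub)                            # keep positions inside the search space
```

After this update the caller would reset the group's LLC to 0.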
In the global leader decision phase, decide on fission or fusion: “If the global leader does not reorganize to a specific borderline known as the Global leader limit, the swarm is divided into smaller groups or fused into a single unified group. GLL is the parameter that determines whether or not there is any premature convergence”. If the termination condition is met, go to step 5 and announce the global leader position as the best solution. Terminate
Algorithm 2 shows the pseudocode of the proposed SMSLO model.
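The fission-fusion rule quoted in the global leader decision phase can be sketched with the standard SMO logic. When the GLC exceeds the GLL, the swarm is split into one more group until a maximum group count is reached; after that, all groups are fused back into one. The names `max_groups` and `split_indices` are ours, and Algorithm 2 may differ in detail.

```python
import numpy as np

def global_leader_decision(glc, gll, n_groups, max_groups):
    """Standard SMO fission-fusion rule; returns (new_glc, new_n_groups)."""
    if glc <= gll:
        return glc, n_groups       # global leader still improving: no change
    if n_groups < max_groups:
        return 0, n_groups + 1     # fission: split the swarm into one more group
    return 0, 1                    # fusion: merge all groups into a single group

def split_indices(pop_size, n_groups):
    """Partition agent indices into n_groups nearly equal groups."""
    return np.array_split(np.arange(pop_size), n_groups)
```

Usage: after each iteration, `glc, n_groups = global_leader_decision(glc, gll, n_groups, max_groups)` followed by `groups = split_indices(len(pop), n_groups)` regroups the agents.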
Simulation procedure
The proposed framework was implemented in Python. The proposed model (DBN
Performance evaluation
This section analyzes the performance of the proposed work in identifying the type of intruder within the cloud computing environment. The evaluation is carried out by varying the LP over 50, 60, 70, and 80. The positive measures, “accuracy, specificity, sensitivity, and precision”, should be as high as possible for the most favourable results. The error (negative) measures are FAR, FRR, ERR, and FDR, which need to be as low as possible. The F-measure, MCC, and NPV are additional indicators that demonstrate the superiority of DBN+SMSLO.
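For reference, the measures listed above follow from confusion-matrix counts. The sketch below computes them for the binary (attack vs. normal) case. For the multi-class attack types they would typically be computed one-vs-rest per class and averaged, which is our assumption. FAR and FRR are often taken to equal the false positive and false negative rates respectively, which is also an assumption here.

```python
import math

def detection_metrics(tp, fp, tn, fn):
    """Common detection metrics from binary confusion-matrix counts."""
    sens = tp / (tp + fn)                  # sensitivity (recall, true positive rate)
    spec = tn / (tn + fp)                  # specificity (true negative rate)
    prec = tp / (tp + fp)                  # precision
    npv = tn / (tn + fn)                   # negative predictive value
    acc = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * prec * sens / (prec + sens)
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": acc, "sensitivity": sens, "specificity": spec,
        "precision": prec, "npv": npv, "f1": f1, "mcc": mcc,
        "fpr": 1 - spec,                   # false positive rate (often reported as FAR)
        "fnr": 1 - sens,                   # false negative rate (often reported as FRR)
        "fdr": 1 - prec,                   # false discovery rate
    }
```

Because the positive and negative measures are complements (for example, FNR = 1 − sensitivity), improving one family necessarily improves the other. This is why both tend to move together across the LP settings.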
Performance of adopted method over extant models for positive measures.
Performance of proposed method over traditional models for negative measures.
Performance of proposed method over traditional models for other measures.
Figure 4 depicts the results obtained in terms of positive measures. The DBN
Analysis with respect to negative measures
The results acquired under negative measures like FNR, FDR, and FPR by the DBN
Analysis on other measures
The F1-score, MCC, and NPV are additional value-added indicators that exhibit the supremacy of the DBN
Overall performance evaluation
The overall performance of the proposed work is tabulated in Table 3. The overall sensitivity of the proposed work is 0.812, which is 51.12%, 24.75%, 13.18%, 26.11%, 47.2%, 23%, 23%, 25.97%, 19.5% and 21.9% better than the existing works like DNN, LSTM, DRNN, CNN, SVM, DBN, DBN
Overall performance of the proposed method over traditional models
The measures analyzed in the results section demonstrate the efficiency of the proposed SMSLO model. In this work, the SMSLO optimization method is deployed to improve accuracy and to minimize the overhead [26, 4, 55]. The results show that our proposed method attains good accuracy compared with [3, 45, 26, 4, 55]. Moreover, the SMSLO method also provides good F
In contrast, our proposed model detects five types of attacks in the CSE-CIC-IDS 2018 dataset. In the future, our work can be extended to detect more attacks and to use cross-validation to improve detection performance.
Conclusion
In this research work, a new IDS model for the cloud environment was introduced with four phases: “data pre-processing, optimal clustering, feature selection, and attack detection phase”. Initially, the collected raw data were pre-processed to enhance their quality. Then, the pre-processed data were segmented with the newly introduced improved K-means clustering model, whose centroids were selected optimally with the SMSLO model. At the end of segmentation, two clusters (attack data and non-attack data) were formed. The data in both clusters were then reduced in dimensionality with PCA, which mitigates the “curse of dimensionality”. Subsequently, these dimensionality-reduced features passed into the attack detection phase. The attack-type detection phase is modeled with the optimized DBN, whose weights are fine-tuned using the SMSLO model. The overall accuracy of the DBN
