An intelligent hybrid model for cyber attack classification with selected feature set

Abstract

Cyber security is evolving into a severe problem in almost all sectors of cyberspace because the number of security breaches keeps increasing. Numerous zero-day attacks occur continuously as the number of protocols grows. Almost all of these attacks are small variants of previously known cyber attacks. Moreover, even advanced approaches such as Machine Learning (ML) have difficulty identifying these small mutants of attacks over time. Recently, Deep Learning (DL) has been utilized for multiple applications in cybersecurity. Using DL to identify cyber attacks might provide a mechanism that is resilient to novel attacks or tiny mutations. Therefore, a novel cyber attack classification model named DCNN-Bi-LSTM-ICS is proposed in this work. The proposed DCNN-Bi-LSTM-ICS has five working stages. First, in the data acquisition stage, the input data (benchmark datasets) for attack classification are collected. These raw data are pre-processed in the second stage, where an improved class-imbalance processing step based on the Improved Synthetic Minority Oversampling Technique (ISMOTE) is applied. In the third stage, improved holo-entropy-based features are extracted along with the conventional mutual information and statistical features. To choose the appropriate features from those extracted, an Improved Chi-Square (ICS) process is developed in the fourth stage. In the final classification stage, a hybrid classification model that combines a Deep Convolutional Neural Network (DCNN) and a Bi-directional Long Short-Term Memory (Bi-LSTM) network is developed. The outcomes show that the proposed DCNN-Bi-LSTM-ICS can offer outstanding performance in the cyber attack classification task.

Keywords

Machine Learning (ML); Deep Learning (DL); ISMOTE; improved holoentropy; Improved Chi-Square (ICS) processing; Deep Convolutional Neural Network (DCNN)

1. Introduction

The extensive use of the Internet in multiple areas has motivated scientists and researchers to leverage intelligent models that can assist users. In addition to providing efficient computation, such models can also sustain the quality of service over the network. Some conventional approaches are less efficient and more time-consuming. To lower the harmful effects of cyber threats, it is essential to develop an effective attack detection system. Cyber security is the collection of security mechanisms and technologies created to secure networks, data, and programs from malicious activities such as theft, data modification, destruction over the network or Internet, and unauthorized access [1, 2, 3]. The two main concerns of cybersecurity are the security of the network and the protection of the host. Recently, numerous areas such as Wireless Sensor Networks (WSNs), cloud computing, and the IoT have been protected in this way. To provide network or system security, multiple security measures such as firewalls, antivirus software, and Intrusion Detection Systems (IDSs) are available. However, Internet services are still disrupted by cyber threats every day. For this reason, researchers continue to contribute to the design of security systems [4, 5].

Some popular cyber attacks are denial of service (DoS), distributed DoS (DDoS), remote-to-local, user-to-root, adversarial, probing, botnet, poisoning and evasion, zero-day, spamming, and phishing attacks. Attack detection approaches fall into three categories: misuse-based, anomaly-based, and hybrid detection [6, 7, 8].

Misuse-based detection scans traffic against pre-stored attack signatures and is frequently utilized for detecting known attacks. It detects only known attacks, with minimal false alarms, and requires the signatures and attack rules in the database to be updated for each specific attack. The anomaly-based technique recognizes both unknown and known attacks. It captures the behavior of the host machine and the network and finds anomalies [9, 10, 11]. Since it is capable of identifying zero-day attacks, it is considered an important method. This approach has multiple advantages. The most important one is the customization of activity profiles, which leaves attackers unsure about what they can do to enter and remain unnoticed. However, it also has the drawbacks of a higher false alarm rate and misclassification, such as treating a legitimate action as an anomaly [12, 13]. The third category is hybrid detection, which fuses misuse-based and anomaly-based detection. In the detection phase, it supports higher performance and a lower false alarm rate, but it is not efficient for newer attacks. To attain efficient computation and rapid processing of complex network data, Neural Networks (NNs) are frequently utilized [14, 15]. In this regard, deep network models play a major role in attack detection by learning the most important attacker information. Considering that, an intelligent model for cyber attack classification named DCNN-Bi-LSTM-ICS is proposed in this work, with the major contributions given below.

•
Proposing an intelligent detection model that uses ISMOTE to solve the class imbalance problem of the input data.
•
Retrieving improved holo-entropy-based features along with the mutual information and statistical features.
•
Proposing an effective feature selection process that chooses the appropriate features via improved Chi-square processing.

The paper is arranged as follows: Section 2 reviews recent publications on cyber attack detection and classification, Section 3 describes the methodology of the proposed intelligent model for cyber attack classification, Section 4 presents the implementation outcomes, and Section 5 concludes the work, followed by the references.
2. Literature survey

A few recent publications about cyber attack detection and classification have been explained below.

In 2022, Shifang Dai et al. [6] focused on fault detection for networked systems with deception attacks. A fault detection filter (FDF) was first presented as a residual generator for the timely detection of randomly occurring fault signals, taking the deception attacks and network delay into account.

In 2019, Zyad Elkhadir and Benattou Mohammed [16] addressed Intrusion Detection Systems (IDSs). Unfortunately, dimensionality issues occur in these tools, which lead to poor resource utilization along with higher time complexity.

In 2021, Sudhakar Sengan et al. [17] examined false-data cyber attacks on data integrity in the physical layers of smart grids. As the first contribution, the developed True Data Integrity (TDI) provided an attack exposure metric through an Agent-Based Model (ABM). Afterwards, the research focused on decentralizing the Data Integrity Security of the system with an agent-based approach.

In 2021, Prabhat Kumar et al. [18] presented an intelligent cyber attack detection system for IoT networks using a novel hybrid feature-reduced approach. By the utilization of the correlation coefficient, the feature ranking was initially performed. A single optimized feature set was obtained by combining the features via a properly designed mechanism.

In 2021, Abdulrahman Saad Alqahtani et al. [19] suggested a new framework for distributed blind intrusion detection by modeling sensor measurements as a graph signal and using the statistical features of the graph signal for intrusion detection.

In 2021, Ahmad Ali AlZubi et al. [7] proposed the cognitive machine learning-assisted Attack Detection Framework (CML-ADF) to share healthcare data securely. The Healthcare Cyber-Physical System was more effective in distributing the gathered data to cloud storage. ML approaches were used to predict and process cyber-attack behavior.

In 2020, Ankang Ju et al. [20] analyzed the current Cyber Kill Chain models and heterogeneous data sources for APT detection. Then MCKC (Modified Cyber Kill Chain model) was proposed that could be used for standardized correlation analysis. The sub-chains were organized as a recursive structure by MCKC and various kill chain penetration processes in a similar attack scenario were linked well.

In 2020, Benamar Bouyeddou et al. [8] presented an anomaly detection mechanism using the Kullback-Leibler distance (KLD) to detect DoS and DDoS flooding attacks, including transmission control protocol (TCP) SYN flood, UDP flood, and ICMP-based attacks.

3. Methodology

With the growth of newly connected devices, the Internet has recently increased in both complexity and size. Due to the growing network complexity, networked systems are affected by diverse security threats and intrusions. Such threats can be overcome by creating an accurate IDS. For a network, an IDS can be considered a second line of defense and a key component of network security. Existing IDSs suffer from many issues, such as a high number of false positive alerts, the inability to detect zero-day attacks, and high time consumption for attack detection. Consequently, this work proposes DCNN-Bi-LSTM-ICS, an Intelligent Hybrid Model for Cyber Attack Classification. It includes five main working stages:

•
Data acquisition: Data collection is the initial step. In this work, benchmark datasets are considered.
•
Preprocessing: The acquired raw data are pre-processed by improved data imbalance processing using ISMOTE, to solve the class imbalance problem.
•
Feature extraction: From the pre-processed data, improved entropy (holoentropy) based features are extracted along with mutual information and statistical (maximum, minimum, standard deviation, mean, and median) features.
•
Feature selection: Subsequently, the appropriate features are chosen from the extracted features via improved Chi-square processing.
•
Classification: For attack classification, a hybrid classification algorithm combining DCNN and Bi-LSTM is employed in our work. Figure 1 shows the overall representation of the proposed DCNN-Bi-LSTM-ICS.

Figure 1.
Overall representation of the proposed DCNN-Bi-LSTM-ICS model.

3.1 Data acquisition

The first step of our cyber attack classification model is data acquisition, that is, gathering the data from related resources. Our work makes use of two benchmark datasets: the NSL-KDD dataset and the CSE-CIC-IDS2018 dataset. Following the data acquisition, the data pre-processing is conducted, as described below.

3.2 Pre-processing

The obtained raw data $DA_{R}$ are subjected to data pre-processing, where $DA_{R}=DA_{R1},DA_{R2},\ldots,DA_{RN}$. In our work, improved data balancing processing is conducted in this pre-processing stage to solve the class imbalance problem in the input data. A frequently utilized method for solving the class imbalance issue is data sampling. In this method, a balanced dataset is created by adjusting the number of samples in the majority class, which makes up nearly all of an imbalanced dataset, and in the minority class, which makes up only a tiny part. Most classifiers prefer well-balanced data. SMOTE is a pre-processing technique for addressing the class imbalance problem of a dataset, and it is based on the idea of the k-nearest neighbor (kNN) algorithm [21]. The SMOTE process consists of the following steps.

•
For every minority sample, identify the k-nearest neighbors.
•
From the k-nearest neighbors, choose a sample at random.
•
New sample $=$ original sample $+$ difference $*$ gap, where the difference is taken between the chosen neighbor and the original sample and the gap is a random number in (0, 1).
•
Add the new samples to the minority class. The result is the new dataset.
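The SMOTE steps listed above can be illustrated with a minimal sketch that uses only the Python standard library. The function name `smote`, its parameters, and the choice to draw the base sample at random (rather than looping over every minority sample) are assumptions made for illustration; this is not the implementation used in this work.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats), at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself)
        others = [s for s in minority if s is not base]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random number in [0, 1)
        # new sample = original sample + gap * (neighbour - original sample)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours, which is why plain SMOTE never looks at the majority class.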

SMOTE has drawbacks because it oversamples blindly: the generation of minority samples fails to account for the distribution of the majority-class samples. This leads to unnecessary minority samples being generated around the positive instances, which might aggravate the problems caused by borderline and noisy samples during learning.

For better classification, almost all classification algorithms attempt to learn from pure samples and to make the borderline of each class as definite as feasible. Synthetic examples that are distant from the borderline are easier to categorize than those close to it, which creates a great learning challenge for several classifiers. Considering this fact, an Improved SMOTE (ISMOTE) is proposed in this pre-processing stage; it defines the borderline clearly and also creates pure synthetic samples. The proposed improved data balancing process based on ISMOTE includes two stages, described below.

Stage 1: Initially, for generating the synthetic instance, the SMOTE algorithm is applied as in Eq. (1).

$\displaystyle N_{\textit{new}}=\left[\frac{(r_{\textit{maj}}^{2}+z_{\min}^{2}-2r_{\textit{maj}}z_{\min})+(r_{\textit{maj}}-z_{\min})}{z_{\min}}\right]$ (1)

Here, $N_{\textit{new}}$ denotes the number of newly generated initial synthetic instances, $r_{\textit{maj}}$ the number of majority class samples, and $z_{\min}$ the number of minority class samples.
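As a small worked example of Eq. (1), the sketch below evaluates the number of initial synthetic instances. Reading the square brackets as a floor operation is an assumption, as is the hypothetical function name; note that the first term of the numerator equals $(r_{\textit{maj}}-z_{\min})^{2}$.

```python
def initial_synthetic_count(r_maj, z_min):
    """Number of initial synthetic instances following Eq. (1).

    Assumption: the square brackets in Eq. (1) are read as a floor.
    """
    diff = r_maj - z_min
    # (r_maj^2 + z_min^2 - 2 r_maj z_min) is (r_maj - z_min)^2
    return (diff * diff + diff) // z_min
```

For instance, with 100 majority and 20 minority samples the numerator is $80^{2}+80=6480$, so the sketch returns 324.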

Afterwards, the Minkowski distance between the new synthetic instances and the original minority and majority class samples is calculated.

Stage 2: In the second stage, we eliminate the synthetic samples created by SMOTE that are closer to the majority class than to the minority class, as well as the synthetic instances nearest to the borderline. The procedure is listed below.

•
Consider $\hat{A}=\{\hat{A}_{1},\hat{A}_{2},\ldots,\hat{A}_{N_{\textit{new}}}\}$ as the set of new synthetic instances. Also, $SS_{m}=\{SS_{m1},SS_{m2},\ldots,SS_{mz_{\min}}\}$ and $SS_{a}=\{SS_{a1},SS_{a2},\ldots,SS_{ar_{\textit{maj}}}\}$ are the sets of minority and majority samples, respectively. Here, $j\in[1,M]$ indexes the $M$ attributes of each instance $\hat{A}_{i}$.
•
Compute the improved Euclidean distance between $\hat{A}_{i}$ and $SS_{mk}$, along with the improved Euclidean distance between $\hat{A}_{i}$ and $SS_{al}$, to decide acceptance or rejection. For $i$ from 1 to $N_{\textit{new}}$, these distances are computed as follows:

Let the two points be $\alpha$ and $\beta$, with coordinates $\alpha_{q}$ and $\beta_{q}$; the conventional Euclidean distance $d$ in $p$ dimensions is expressed in Eq. (2).

$\displaystyle d(\alpha,\beta)=\sqrt{\sum\limits_{q=1}^{p}(\alpha_{q}-\beta_{q})^{2}}$ (2)

To provide a more precise estimation of the Euclidean distance, we have utilized an improved formula. The improved Euclidean distance between $\hat{A}_{i}$ and $SS_{mk}$ is given in Eq. (3).

$\displaystyle DD_{\min}(\hat{A}_{i},SS_{mk})=\left\{\frac{\sqrt{\sum\limits_{i=1}^{n}(|\hat{A}_{i}|-|SS_{mk}|)^{2}}}{\min\left(\sqrt{\sum\limits_{i=1}^{n}\hat{A}_{i}},\sqrt{\sum\limits_{i=1}^{n}SS_{mk}}\right)+\left[\sum\limits_{i=1}^{n}(|\hat{A}_{i}|-|SS_{mk}|)\ast\delta\right]}\right\}$ (3)

where $\delta=0.6$.

The improved Euclidean distance between $\hat{A}_{i}$ and $SS_{al}$ can be expressed as in Eq. (4).

$\displaystyle DD_{\textit{maj}}(\hat{A}_{i},SS_{al})=\left\{\frac{\sqrt{\sum\limits_{i=1}^{n}(|\hat{A}_{i}|-|SS_{al}|)^{2}}}{\min\left(\sqrt{\sum\limits_{i=1}^{n}\hat{A}_{i}},\sqrt{\sum\limits_{i=1}^{n}SS_{al}}\right)+\left[\sum\limits_{i=1}^{n}(|\hat{A}_{i}|-|SS_{al}|)\ast\delta\right]}\right\}$ (4)

From Eqs (3) and (4), we obtain two arrays $A_{\min}$ and $A_{\textit{maj}}$, which are defined as follows:

$\displaystyle A_{\min}=(DD_{\min}(\hat{A}_{i},SS_{m1}),\ldots,DD_{\min}(\hat{A}_{i},SS_{mz_{\min}}))$ (5)

$\displaystyle A_{\textit{maj}}=(DD_{\textit{maj}}(\hat{A}_{i},SS_{a1}),\ldots,DD_{\textit{maj}}(\hat{A}_{i},SS_{ar_{\textit{maj}}}))$ (6)

Then, from $A_{\min}$ choose the minimum value $\textit{Mini}(A_{\min})$, and from $A_{\textit{maj}}$ choose the minimum value $\textit{Mini}(A_{\textit{maj}})$.

$\displaystyle\textit{Mini}(A_{\min})\leqslant\textit{Mini}(A_{\textit{maj}})\ (\textit{Accepted})$

$\displaystyle\textit{Mini}(A_{\min})>\textit{Mini}(A_{\textit{maj}})\ (\textit{Rejected})$

If $\textit{Mini}(A_{\min})$ is greater than $\textit{Mini}(A_{\textit{maj}})$, the new synthetic sample is rejected; otherwise it is accepted.
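The acceptance rule can be sketched as follows. The function `filter_synthetic` and its `dist` parameter are hypothetical names; for simplicity the sketch defaults to the conventional Euclidean distance of Eq. (2), and a callable implementing the improved distance could be passed in its place.

```python
import math


def filter_synthetic(synthetic, minority, majority, dist=math.dist):
    """Keep synthetic samples whose nearest minority sample is at least as
    close as their nearest majority sample.

    Assumption: dist defaults to the plain Euclidean distance of Eq. (2).
    """
    accepted = []
    for sample in synthetic:
        mini_min = min(dist(sample, m) for m in minority)  # Mini(A_min)
        mini_maj = min(dist(sample, a) for a in majority)  # Mini(A_maj)
        if mini_min <= mini_maj:  # accepted; otherwise rejected
            accepted.append(sample)
    return accepted
```

A synthetic point that lands next to a majority sample is dropped, which is the behaviour that plain SMOTE lacks.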

With these accepted synthetic instances, the noise can be eliminated by the following process. Consider $\widehat{SS}=\{\widehat{SS_{1}},\widehat{SS_{2}},\ldots,\widehat{SS_{n}}\}$ as the new synthetic minority samples obtained from the above steps. Afterwards, evaluate the distance between $\widehat{SS_{i}}$ and every original minority sample in $SS_{m}$, denoted $\textit{Min}_{\textit{Rapt}}(\widehat{SS_{i}},SS_{m})$, as given in Eq. (7).

$\displaystyle\textit{Min}_{\textit{Rapt}}(\widehat{SS_{i}},SS_{m})=\sum\limits_{k=1}^{z_{\min}}\sum\limits_{j=1}^{M}\sqrt{\left(\widehat{SS_{i}^{(j)}}-SS_{mk}^{(j)}\right)^{2}}$ (7)

The rapprochement of a sample with all minority samples is denoted $\textit{Min}_{\textit{Rapt}}(\widehat{SS_{i}},SS_{m})$. Summing it over all synthetic samples gives $Lo$, as per Eq. (8).

$\displaystyle Lo=\sum\limits_{i=1}^{n}\textit{Min}_{\textit{Rapt}}(\widehat{SS_{i}},SS_{m})$ (8)

Similarly, the distance between $\widehat{SS_{i}}$ and every original majority sample in $SS_{a}$, denoted $\textit{Maj}_{\textit{Rapt}}(\widehat{SS_{i}},SS_{a})$, is described in Eq. (9).

$\displaystyle\textit{Maj}_{\textit{Rapt}}(\widehat{SS_{i}},SS_{a})=\sum\limits_{l=1}^{r_{\textit{maj}}}\sum\limits_{j=1}^{M}\sqrt{\left(\widehat{SS_{i}^{(j)}}-SS_{al}^{(j)}\right)^{2}}$ (9)

The rapprochement of a sample with all majority samples is denoted $\textit{Maj}_{\textit{Rapt}}(\widehat{SS_{i}},SS_{a})$. Summing it over all synthetic samples gives $H$, as per Eq. (10).

$\displaystyle H=\sum\limits_{i=1}^{n}\textit{Maj}_{\textit{Rapt}}(\widehat{SS_{i}},SS_{a})$ (10)

Then, the half of the synthetic samples with the lowest distance between $SS_{a}$ and $\widehat{SS_{i}}$ is eliminated to obtain the balanced data $DA_{\textit{Bal}}$, where $DA_{\textit{Bal}}=DA_{\textit{Bal}1},DA_{\textit{Bal}2},\ldots,DA_{\textit{Bal}N}$. Afterwards, this balanced data $DA_{\textit{Bal}}$ is subjected to feature extraction, as described below.
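The noise-elimination step can be sketched as below. Since $\sqrt{(x-y)^{2}}=|x-y|$, the rapprochement of Eqs (7) and (9) reduces to a sum of absolute attribute differences. The function names are hypothetical, and rounding the number of dropped samples down for odd counts is an assumption.

```python
def rapprochement(sample, group):
    """Sum over all group members and attributes of sqrt((x - y)^2) = |x - y|."""
    return sum(abs(x - y) for member in group for x, y in zip(sample, member))


def drop_half_nearest_majority(synthetic, majority):
    """Remove the half of the synthetic samples that lies closest to the
    majority class (assumption: the count is rounded down for odd sizes)."""
    ranked = sorted(synthetic, key=lambda s: rapprochement(s, majority))
    return ranked[len(ranked) // 2:]
```

The remaining samples are the ones farthest from the majority class and are returned in order of increasing distance.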
3.3 Feature extraction

The balanced data $DA_{\textit{Bal}}$ is subjected to feature extraction, in which relevant information or features are extracted from $DA_{\textit{Bal}}$. Improved entropy (holoentropy) features, along with mutual information-based and statistical features, are retrieved in our work and are explained below.

3.3.1 Improved entropy (holoentropy) based features

Entropy is a measure of the disorder or randomness of a system. For detecting outliers, entropy measurements alone are not sufficient; to obtain better outlier candidates, the total correlation is also essential. Holoentropy therefore combines entropy and total correlation to provide proper outcomes for outlier detection [22]. The holoentropy is the sum of the entropy and the total correlation of a random vector. The conventional holoentropy $\textit{HLE}(R)$ is expressed in Eq. (11).

$\displaystyle\textit{HLE}(R)=HE(R)+CE(R)$ (11)

Here, $CE(R)$ denotes the total correlation, $HE(R)$ the total entropy, and $R$ the random vector.

Our work utilizes an improved holo-entropy function to handle disordered data based on the correlation among the attributes. The improved holoentropy is given by Eq. (12), where $\omega_{x}(b)$ is the weighted tansig function and $b=1,\ldots,N$.

$\displaystyle\textit{HLE}(R)=\sum\omega_{x}(b)\times HE(R)+CE(R)\ast E_{b}$ (12)

Here, $E_{b}$ denotes the $b$-th entropy value. For the total entropy $HE(R)$, the improved holo-entropy uses the Deng entropy, and for the total correlation $CE(R)$ it uses the Pearson correlation. The resulting improved holoentropy is utilized as a feature for effective cyber attack classification. The Deng entropy measures the uncertainty of a basic probability assignment (BPA). It is a generalization of the Shannon entropy and is better suited to mass functions. The formula for the Deng entropy is provided in Eq. (13).

$\displaystyle ED(e)=-\sum\limits_{G\subseteq X}e(G)\log_{2}\frac{e(G)}{2^{|G|}-1}$ (13)

Here, $e$ is the mass function defined on the frame of discernment $X$, $G$ is a focal element of the mass function, and $|G|$ is the cardinality of $G$. The Pearson correlation coefficient (PCC) measures the linear correlation between two data sets. Its expression is given in Eq. (14).

$\displaystyle\textit{Cor}=\frac{\sum(z_{in}-\bar{z})(g_{in}-\bar{g})}{\sqrt{\sum(z_{in}-\bar{z})^{2}\sum(g_{in}-\bar{g})^{2}}}$ (14)

Here, Cor denotes the correlation coefficient, $z_{in}$ is the value of variable $z$ in a sample, and $\bar{z}$ is its mean. Similarly, $g_{in}$ is the value of variable $g$ in a sample, and $\bar{g}$ is its mean. The extracted improved entropy-based features are denoted $\textit{HLE}(R)$.
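The two building blocks of the improved holoentropy, the Deng entropy and the Pearson correlation coefficient, can be sketched with the standard library alone. Representing the mass function as a dictionary from focal elements (frozensets) to masses is an assumption made for illustration, as are the function names; how the two quantities are weighted and combined is left to Eq. (12).

```python
import math


def deng_entropy(mass):
    """Deng entropy of a mass function.

    mass: dict mapping each focal element G (a frozenset) to its mass e(G).
    """
    return -sum(m * math.log2(m / (2 ** len(g) - 1))
                for g, m in mass.items() if m > 0)


def pearson(z, g):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_z, mean_g = sum(z) / len(z), sum(g) / len(g)
    num = sum((a - mean_z) * (b - mean_g) for a, b in zip(z, g))
    den = math.sqrt(sum((a - mean_z) ** 2 for a in z)
                    * sum((b - mean_g) ** 2 for b in g))
    return num / den
```

When every focal element is a singleton, $2^{|G|}-1=1$ and the Deng entropy coincides with the Shannon entropy.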

3.3.2 Mutual information (MI)

MI measures the dependence between variables (here a term $TK$ and a category $CI$) [23]. The term $TK$ and the category $CI$ are independent if $MI(TK,CI)=0$. The expression for MI is given in Eq. (15). Here, $MI$ denotes the mutual information-based feature.

$\displaystyle MI(TK,CI)=\log\frac{P(TK,CI)}{P(TK)\ast P(CI)}$ (15)
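A minimal sketch of the MI feature follows. It assumes the common pointwise form $\log\frac{P(TK,CI)}{P(TK)\,P(CI)}$, which is zero exactly when the term and the category are independent, as the text requires; the function name and its probability arguments are hypothetical.

```python
import math


def mutual_information(p_joint, p_term, p_category):
    """Pointwise mutual information between a term and a category.

    Assumption: MI = log(P(TK, CI) / (P(TK) * P(CI))), natural logarithm.
    """
    return math.log(p_joint / (p_term * p_category))
```

A positive value means the term and the category co-occur more often than independence would predict, and a negative value means they co-occur less often.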

3.3.3 Statistical features

Statistical features such as mean, standard deviation, median, maximum as well as minimum are extracted to offer a better intrusion detection model. These statistical features are given below.

a) Mean

All numbers in the pre-processed data sets are averaged to get the mean value that is numerically represented in Eq. (16) [24].

$\displaystyle\bar{\lambda}=\frac{\sum\lambda}{\textit{num}}$ (16)

Here, $\bar{\lambda}$ denotes the mean of the set of $\lambda$ values, $\sum\lambda$ the sum of all $\lambda$ values, and num the number of $\lambda$ values.

b) Median

In a sorted dataset, the middle value is considered the median [24].

If $no$ is odd, the median is the value at position $\left(\frac{no+1}{2}\right)$.

If $no$ is even, the median is the average of the values at positions $\left(\frac{no}{2}\right)$ and $\left(\frac{no}{2}+1\right)$.

c) Standard deviation (STD)

The standard deviation is the square root of the variance and is symbolized as $\sigma$. Its expression is given in Eq. (17) [24].

$\displaystyle\sigma=\sqrt{\frac{\sum{({pv-\mu})^{2}}}{ps}}$ (17)

Here, $\sigma$ denotes the population standard deviation, $ps$ the population size, $pv$ each value from the population, and $\mu$ the population mean.

d) Minimum

The minimum is the data value that is $\leqslant$ all other values in the dataset [25].

e) Maximum

The maximum is the data value that is $\geqslant$ all other values in the dataset [25].

The extracted statistical features are symbolized as $FE_{ST}=\{\bar{\lambda},\textit{Median},\sigma,\textit{Min},\textit{Max}\}$.

The extracted features are denoted $\textit{EXT}_{FE}=\{\textit{HLE}(R),MI,FE_{ST}\}$ and are passed to the feature selection process.
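The five statistical features can be computed with the Python `statistics` module, as in the sketch below. The function name and the dictionary layout are assumptions; `pstdev` is used because Eq. (17) divides by the population size.

```python
import statistics


def statistical_features(values):
    """Mean, median, population standard deviation, minimum and maximum."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),  # divides by the population size
        "min": min(values),
        "max": max(values),
    }
```

For an even number of values, `statistics.median` averages the two middle values of the sorted data.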

3.4 Feature selection via Improved Chi-square process

After the features are extracted, a feature selection process based on improved Chi-square processing takes place. Feature selection is crucial for reducing large data and can enhance the classification process. Noisy and irrelevant data are removed by selecting a representative subset of the data, which reduces the complexity of the classification process.

Chi-square is used in statistics to test the independence of two events. Given the data of two variables, we can obtain the expected count $EX$ and the observed count $OB$. Chi-square measures how these counts deviate from each other. Its formula is provided in Eq. (18) [26].

$\displaystyle CH^{2}=\sum_{v}\frac{(OB_{v}-EX_{v})^{2}}{EX_{v}}$ (18)

Here, $OB_{v}$ denotes the observed value and $EX_{v}$ the expected value. The Chi-square method shows very good results but still suffers from some limitations. For instance, when the top 20 attributes are selected using Chi-square, the number of these attributes per class tends to vary. Considering that, we have introduced improved Chi-square processing for feature selection. The steps used in our improved Chi-square-based feature selection are given below.

Calculate improved Chi-square for all features.

In this step, to compute the improved Chi-square, we have utilized the following Eq. (19), instead of Eq. (18).

$\displaystyle CH^{2}=\frac{\sum\left(\sqrt{OB_{v}^{2}-2OB_{v}EX_{v}+EX_{v}^{2}}\right)}{|1+e^{-(OB_{v}-EX_{v})^{2}}|}$ (19)

Rank the features in descending order based on entropy values.

A high entropy value indicates more information and a low entropy value indicates less information.

Select features with the highest entropy values and then find the percentage for setting the threshold.

Scores higher than the percentage are selected to be a part of the final set of features and scores lower than the percentage are excluded.
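The scoring and selection steps can be sketched as follows. Two assumptions are made and should be read as such: the improved score is evaluated cell by cell (each $|OB_{v}-EX_{v}|$ term, which equals the square root in the numerator, is divided by its own denominator), and the threshold is expressed through a hypothetical `keep_fraction` parameter giving the share of top-ranked features to retain.

```python
import math


def chi_square(observed, expected):
    """Conventional Chi-square statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def improved_chi_square(observed, expected):
    """Cell-by-cell reading of the improved score (an assumption).

    sqrt(OB^2 - 2*OB*EX + EX^2) is |OB - EX|.
    """
    return sum(abs(o - e) / (1 + math.exp(-(o - e) ** 2))
               for o, e in zip(observed, expected))


def select_features(scores, keep_fraction=0.5):
    """Rank features by score (descending) and keep the top share."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return ranked[:n_keep]
```

For example, `select_features({"a": 3.0, "b": 1.0, "c": 2.0, "d": 0.5})` keeps the two highest-scoring features, `"a"` and `"c"`.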

The selected features are then sent to the classification process.

3.5 Classification process

The classification process is carried out based on the selected features. This is the final stage of our DCNN-Bi-LSTM-ICS model. A hybrid classification model that combines DCNN and Bi-LSTM is utilized for this process and takes the selected features as input.

3.5.1 Attack classification using DCNN

The DCNN is one of the most commonly utilized NNs. It is developed from traditional NNs and can be used for data, images, or video. A CNN with many convolutional layers is termed a Deep CNN. CNNs can be utilized for numerous applications because they learn features from the data automatically. The CNN used here has the following layer arrangement: convolutional, max pooling, convolutional $+$ ReLU, max pooling, convolutional $+$ ReLU, max pooling, convolutional $+$ ReLU, max pooling, convolutional $+$ ReLU, max pooling, convolutional $+$ ReLU, max pooling, dropout layer, flatten layer, and fully-connected or output layer [27].

a) Convolutional layer: A convolutional layer includes several feature maps. As expressed in Eq. (20), the feature map $FM_{J}^{L}$ at the $L$th layer is evaluated by convolving the feature maps $FM_{I}^{L-1}$ of the previous layer with the learnable kernels $K_{IJ}^{L}$, adding an additive bias $bs_{J}^{L}$, and applying an activation function $f$.

$\displaystyle FM_{J}^{L}=f\left(\sum\nolimits_{I\in Q_{J}}FM_{I}^{L-1}*K_{IJ}^{L}+bs_{J}^{L}\right),L\geqslant 2;$ (20)

Here, the first layer $FM_{J}^{1}$ is the input data, the activation function $f$ is generally chosen to be the logistic (sigmoid) function, and $Q_{J}$ represents a selection of input maps.

b) Pooling layer: A pooling layer produces down-sampled versions of the input feature maps, as in Eq. (21).

$\displaystyle FM_{J}^{L}=f(\psi_{J}^{L}\textit{pool}(FM_{J}^{L-1})+bs_{J}^{L}),L\geqslant 2.$ (21)

Here, $\psi_{J}^{L}$ denotes the multiplicative bias and $bs_{J}^{L}$ the additive bias. Also, pool (.) indicates a pooling operation that commonly evaluates aggregated statistics of the input map, such as its max or mean value. Following the last max-pooling layer, there is a dropout layer with a dropout rate of 0.2, followed by the flatten layer and the fully connected or output layer.
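To make the convolution and pooling operations concrete, the sketch below applies them to a single one-dimensional feature map in plain Python. It is a simplification under stated assumptions: only one input map is used (so the sum over $Q_{J}$ has a single term), the kernel is applied without flipping (cross-correlation, as in most deep learning libraries), and the function names and default parameters are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv_layer(feature_map, kernel, bias=0.0, activation=sigmoid):
    """One output feature map from one input map: f(FM * K + bs)."""
    width = len(kernel)
    return [
        activation(sum(feature_map[i + j] * kernel[j] for j in range(width)) + bias)
        for i in range(len(feature_map) - width + 1)
    ]


def pool_layer(feature_map, size=2, psi=1.0, bias=0.0, activation=lambda x: x):
    """Max pooling over non-overlapping windows: f(psi * pool(FM) + bs)."""
    pooled = [max(feature_map[i:i + size])
              for i in range(0, len(feature_map) - size + 1, size)]
    return [activation(psi * p + bias) for p in pooled]
```

With an identity activation, `conv_layer([1, 2, 3, 4], [1, 0, -1], activation=lambda x: x)` gives `[-2.0, -2.0]`, and `pool_layer([1, 3, 2, 5, 4])` keeps the maximum of each complete window of two values, `[3.0, 5.0]`.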

Figure 2.

DCNN’s architecture.

Following the feature learning, the classification process is conducted. The final stage of the CNN architecture utilizes a classification layer such as softmax to provide the classification output, which is normal or anomaly for dataset 1 (NSL-KDD), and Benign, DDoS Attack-HOIC, or DDoS Attack-LOIC-UDP for dataset 2 (CSE-CIC-IDS2018). The architecture of the DCNN is presented in Fig. 2.

3.5.2 Bi-LSTM for attack classification

In parallel, the features chosen by the feature selection process are also given to the Bi-LSTM classifier as input. LSTM is well suited to processing sequence data. It also addresses the problems of exploding and vanishing gradients during long-sequence processing. The LSTM includes three layers: an input layer, a hidden layer, and an output layer. In addition, the LSTM has three gating units and memory units [28].

During the forward propagation of the LSTM, each gate can be evaluated from Eqs (22)–(27).

$\displaystyle\tilde{C}_{T}=\tanh(Wt_{Yc}Y_{T}+Wt_{hc}h_{T-1}+BI_{c})$ (22)

$\displaystyle C_{T}=F_{T}C_{T-1}+\textit{inp}_{T}\tilde{C}_{T}$ (23)

$\displaystyle\textit{inp}_{T}=\sigma d(Wt_{\textit{Yinp}}Y_{T}+Wt_{\textit{hinp}}h_{T-1}+Wt_{\textit{cinp}}C_{T-1}+BI_{\textit{inp}})$ (24)

$\displaystyle F_{T}=\sigma d(Wt_{YF}Y_{T}+Wt_{hF}h_{T-1}+Wt_{cF}C_{T-1}+BI_{F})$ (25)

$\displaystyle O_{T}=\sigma d(Wt_{YO}Y_{T}+Wt_{hO}h_{T-1}+Wt_{cO}C_{T-1}+BI_{O})$ (26)

$\displaystyle h_{T}=O_{T}\cdot\tanh(C_{T})$ (27)

Here, $\sigma d$ is the sigmoid activation function and tanh is the hyperbolic tangent function. The weights and biases of the corresponding gates are $Wt_{Yc}$ , $Wt_{hc}$ , $Wt_{\textit{Yinp}}$ , $Wt_{\textit{hinp}}$ , $Wt_{\textit{cinp}}$ , $Wt_{YF}$ , $Wt_{hF}$ , $Wt_{cF}$ , $Wt_{YO}$ , $Wt_{hO}$ , $Wt_{cO}$ , $BI_{c}$ , $BI_{\textit{inp}}$ , $BI_{F}$ , and $BI_{O}$ .
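A single forward step of the LSTM equations can be sketched with scalar states, which keeps the gate arithmetic visible. The scalar simplification, the dictionary keys (which mirror the weight and bias subscripts), and the function name are assumptions made for illustration; a real layer would use weight matrices and vector states.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(y_t, h_prev, c_prev, w, b):
    """One LSTM forward step with scalar input, hidden state and cell state.

    w and b are dicts keyed by the subscripts of the weights and biases.
    """
    inp = sigmoid(w["Yinp"] * y_t + w["hinp"] * h_prev + w["cinp"] * c_prev + b["inp"])
    forget = sigmoid(w["YF"] * y_t + w["hF"] * h_prev + w["cF"] * c_prev + b["F"])
    c_tilde = math.tanh(w["Yc"] * y_t + w["hc"] * h_prev + b["c"])
    c_t = forget * c_prev + inp * c_tilde  # new memory cell state
    out = sigmoid(w["YO"] * y_t + w["hO"] * h_prev + w["cO"] * c_prev + b["O"])
    h_t = out * math.tanh(c_t)  # new hidden state
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the cell state is simply halved at each step.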

In a Bi-LSTM, there are two LSTM layers running in opposite directions, named the forward and backward propagation layers. Both layers are connected to the input and output layers and perform the time-order and reverse-order sequence calculations, respectively, to obtain the outputs of the forward and backward hidden layers at every moment. Afterwards, the final output is obtained by merging the corresponding outputs of the forward and backward hidden layers at each moment. Its architecture is displayed in Fig. 3.

Figure 3.

Architecture of Bi-LSTM.

Its specific numerical expression is given in Eqs (28)–(30).

$\displaystyle h_{T}=F(wt_{1}Y_{T}+wt_{2}h_{T-1})$ (28)

$\displaystyle h^{\prime}_{T}=F(wt_{3}Y_{T}+wt_{5}h^{\prime}_{T+1})$ (29)

$\displaystyle O_{T}=\eta(wt_{4}h_{T}+wt_{6}h^{\prime}_{T})$ (30)

At time $T$, the outputs of the forward and backward propagation layers are denoted $h_{T}$ and ${h}^{\prime}_{T}$. The weight matrices from the input to the forward and backward propagation layers are $wt_{1}$ and $wt_{3}$. The weight matrices from the forward and backward propagation layers to themselves (self-propagation) are $wt_{2}$ and $wt_{5}$. Similarly, the weight matrices from the forward and backward propagation layers to the output layer are $wt_{4}$ and $wt_{6}$. The function $\eta$ splices the forward and backward propagation outcomes. The output values at the final output layer are denoted $O_{T}$ and contain the attack classification results.
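The bidirectional computation can be sketched with scalar states as below. The sketch assumes the standard bidirectional recurrence, in which the forward layer reads the sequence in time order and the backward layer reads it in reverse order, and it uses plain recurrent units instead of full LSTM cells to stay short. The function name, the integer-keyed weight dictionary, and the default choices for $F$ and $\eta$ are hypothetical.

```python
import math


def bidirectional_pass(inputs, wt, f=math.tanh, eta=lambda x: x):
    """Forward and backward recurrent passes merged at each time step.

    wt: dict with keys 1..6 holding scalar weights wt_1 .. wt_6.
    """
    n = len(inputs)
    forward, h = [], 0.0
    for y in inputs:  # time order
        h = f(wt[1] * y + wt[2] * h)
        forward.append(h)
    backward, h = [0.0] * n, 0.0
    for t in range(n - 1, -1, -1):  # reverse order
        h = f(wt[3] * inputs[t] + wt[5] * h)
        backward[t] = h
    # merge the two hidden states at every time step
    return [eta(wt[4] * hf + wt[6] * hb) for hf, hb in zip(forward, backward)]
```

With identity activations and all weights equal to 1, the input `[1, 2, 3]` yields forward states `[1, 3, 6]`, backward states `[6, 5, 3]`, and outputs `[7, 8, 9]`, so every output depends on the whole sequence.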

Finally, to get accurate classification results, the outputs obtained from both classifiers are averaged. The final result is normal or anomaly for dataset 1 (NSL-KDD), and Benign, DDoS Attack-HOIC, or DDoS Attack-LOIC-UDP for dataset 2 (CSE-CIC-IDS2018).
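The averaging of the two classifier outputs can be sketched as follows. It assumes that each classifier returns one score per class in the same class order; the function name and arguments are hypothetical.

```python
def average_predictions(p_dcnn, p_bilstm, labels):
    """Average per-class scores of the two classifiers and pick the best label.

    Assumption: both score lists follow the same class order as labels.
    """
    averaged = [(a + b) / 2 for a, b in zip(p_dcnn, p_bilstm)]
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return labels[best], averaged
```

For example, if the DCNN scores are `[0.6, 0.4]` and the Bi-LSTM scores are `[0.2, 0.8]` for the labels `["normal", "anomaly"]`, the averaged scores favour `"anomaly"`.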

Figure 4.

Evaluation of the DCNN-Bi-LSTM-ICS approach for dataset 1 in relation to current methods with regards to (a) accuracy, (b) sensitivity, (c) precision, (d) specificity.

Figure 5.

Evaluation of the DCNN-Bi-LSTM-ICS approach for dataset 2 in relation to current methods with regards to (a) accuracy, (b) sensitivity, (c) precision, (d) specificity.

Figure 6.

Comparing the new technique’s performance to current techniques for (a) FNR, (b) FPR, and (c) FDR for dataset 1.

Figure 7.

Comparing the new technique’s performance to current techniques for (a) FNR, (b) FPR, and (c) FDR for dataset 2.

4. Results and discussion

4.1 Experimental setup

The DCNN-Bi-LSTM-ICS model for cyber-attack classification was implemented in Python on a machine (device name DESKTOP-344S3TO) with an 11th Gen Intel(R) Core(TM) i5-1135G7 processor @ 2.40 GHz (2.42 GHz), 16.0 GB of installed RAM (15.7 GB usable), and a 64-bit operating system on an x64-based processor. The model is evaluated on the NSL-KDD [29] and CSE-CIC-IDS2018 [30] datasets, referred to as dataset 1 and dataset 2, respectively. It is compared with conventional methods including SqueezeNet, GoogLeNet, DCNN, Bi-LSTM, CNN [31] and LSTM [32]. The final results of the DCNN-Bi-LSTM-ICS model have been validated for both datasets.

4.2 Dataset description

Dataset 1: Compared to the original KDD data collection, the NSL-KDD data set has a number of benefits. Duplicate entries are removed from the train set, so classifiers are not biased towards more frequent records. The suggested test sets contain no duplicate records either, so the performance of the learners is not biased by methods that have better detection rates on frequent records. The number of records selected from each difficulty-level group is inversely related to the share of that group in the original KDD data set, which gives a wider range of classification rates across different machine learning algorithms. The training and test sets contain a reasonable number of records, so evaluation results are consistent and comparable across different studies.

Dataset 2: The academic intrusion detection dataset CSE-CIC-IDS2018, authored by Drs. Iman Sharafaldin, Arash Habibi Lashkari, and Ali Ghorbani, was made available by the Canadian Institute for Cybersecurity. It is an upgrade of CIC-IDS2017 and has a much higher representation of 'Infiltration' traffic. After being created on AWS, the dataset was cleaned into parquet files to make sure all data types were set appropriately. The clean version has no missing or duplicate entries and comes with baseline classification scores obtained using straightforward models.

4.3 Comparative analysis

The DCNN-Bi-LSTM-ICS model is compared with conventional methods such as SqueezeNet, GoogLeNet, DCNN, Bi-LSTM, CNN, and LSTM for detecting cyberattacks. Both positive and negative measures are employed for the comparison. The positive measures are accuracy, sensitivity, specificity, and precision; the negative measures are the false negative rate (FNR), false positive rate (FPR), and false discovery rate (FDR). The analysis also includes the negative predictive value (NPV), F-measure, and Matthews correlation coefficient (MCC). The ability of the DCNN-Bi-LSTM-ICS model to recognize and classify cyberattacks is assessed with respect to each of these measures.
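For reference, all of these measures can be derived from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical and are not taken from the experiments.

```python
import math


def metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)            # positive predictive value
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "NPV": npv,
        "F-measure": f_measure,
        "MCC": mcc,
        "FNR": fn / (fn + tp),            # equals 1 - sensitivity
        "FPR": fp / (fp + tn),            # equals 1 - specificity
        "FDR": fp / (fp + tp),            # equals 1 - precision
    }


# Hypothetical counts, for illustration only.
m = metrics(tp=90, fp=5, tn=95, fn=10)
```

Because each negative measure is the complement of a positive one, a model with high sensitivity, specificity, and precision necessarily has low FNR, FPR, and FDR.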

4.3.1 Analyzing positive indicators for dataset 1 and dataset 2

For datasets 1 and 2, the sensitivity, specificity, accuracy, and precision of the proposed model are evaluated to assess it as an intelligent cyberattack detection model. Figure 4 presents the analysis of the DCNN-Bi-LSTM-ICS model for dataset 1 and compares it with SqueezeNet, GoogLeNet, DCNN, Bi-LSTM, CNN, and LSTM. As seen in Fig. 4a, the accuracy of the DCNN-Bi-LSTM-ICS model is higher than that of the conventional techniques. The detection procedure of the DCNN-Bi-LSTM-ICS model makes use of both DCNN and Bi-LSTM, which improves its prediction accuracy. With accuracy rates of 93%, 94%, 95%, and 96% at learning percentages of 60, 70, 80, and 90, respectively, the DCNN-Bi-LSTM-ICS model surpasses all conventional approaches. It achieves 97% precision at a learning percentage of 90, while the other strategies produce a precision of less than 95%. At a learning percentage of 80, the sensitivity and specificity of the DCNN-Bi-LSTM-ICS model are both 95%, which is 5% higher than the conventional techniques. Overall, the DCNN-Bi-LSTM-ICS model performs better than all traditional techniques in cyber-attack detection for dataset 1.

The proposed model is also compared with the conventional approaches for dataset 2, and, as for dataset 1, it outperforms them. A visual depiction of the comparison for dataset 2 is shown in Fig. 5. At a learning percentage (LP) of 60, the DCNN-Bi-LSTM-ICS model achieves 90% accuracy, whereas the other methods only reach 89%. At an LP of 80, the DCNN-Bi-LSTM-ICS method obtains 89% sensitivity, outperforming the traditional methods. At an LP of 90, it achieves 93% precision, compared with 91% for the earlier models. The model benefits from the improved handling of class imbalance at the pre-processing step and from the feature extraction and selection processes. Ultimately, the hybridization of DCNN and Bi-LSTM results in a more accurate prediction, and the DCNN-Bi-LSTM-ICS model therefore performs better than the previous models.

4.3.2 Analyzing negative indicators for dataset 1 and dataset 2

Apart from the positive measures, the possibility of errors in the DCNN-Bi-LSTM-ICS model for cyber-attack classification is investigated through negative metrics. Figures 6 and 7 display the error values of the DCNN-Bi-LSTM-ICS model for cyberattack categorization. As seen in Fig. 6, the error of the DCNN-Bi-LSTM-ICS model for dataset 1 is very low: its FNR, FPR, and FDR values are lower than those of the existing methods. The suggested method selects the features required for classification by using improved chi-square processing, which improves the prediction accuracy compared to the traditional models. Figure 6 shows that at LP 60, the FNR, FPR, and FDR of the DCNN-Bi-LSTM-ICS model are 0.060, 0.066, and 0.064, respectively. At a learning percentage of 90, the errors decrease progressively to 0.04 for FNR, 0.02 for FPR, and 0.02 for FDR. Figure 6 also shows that the existing methods have larger errors at all learning percentages from 60 to 90.

Figure 7 presents the error analysis of the DCNN-Bi-LSTM-ICS model against conventional approaches for dataset 2. As shown in Fig. 7a–c, the DCNN-Bi-LSTM-ICS model has lower error values for dataset 2 as well. Specifically, at a learning percentage of 90, the FNR of the DCNN-Bi-LSTM-ICS model is 0.04, whereas the prior techniques yield FNR values greater than 0.06. At a learning percentage of 80, the DCNN-Bi-LSTM-ICS model yields FPR and FDR values of 0.04, a 5% reduction in error compared to the standard procedures. The analysis of datasets 1 and 2 thus shows that the DCNN-Bi-LSTM-ICS model makes fewer errors than the previous methods. Consequently, the DCNN-Bi-LSTM-ICS model exhibits improved performance and reduced error in cyber-attack classification.

4.3.3 Analyzing some additional indicators for dataset 1 and dataset 2

We also use NPV, F-measure, and MCC to analyze the cyber-attack categorization performance of the DCNN-Bi-LSTM-ICS model. For datasets 1 and 2, the proposed model is compared with conventional techniques such as SqueezeNet, GoogLeNet, DCNN, Bi-LSTM, CNN, and LSTM. Figures 8 and 9 present the results graphically. The NPV, F-measure, and MCC scores of the DCNN-Bi-LSTM-ICS model for dataset 1 are displayed in Fig. 8. At a learning percentage of 90, the suggested model produces an NPV of 95%, an F-measure of 96%, and an MCC of 92% for dataset 1. Figure 8a demonstrates that the NPV of the DCNN-Bi-LSTM-ICS model ranges from 93% to 95% for learning percentages of 60 to 90, whereas the competing techniques have lower NPVs. The DCNN-Bi-LSTM-ICS model outperforms the conventional methods because it combines DCNN and Bi-LSTM for cyber-attack categorization.

Figure 8.

Evaluation of the DCNN-Bi-LSTM-ICS technique with current schemes for dataset 1 (a) NPV, (b) F-measure, and (c) MCC.

Figure 9 presents a performance comparison between the proposed strategy and the existing methodologies in terms of NPV, F-measure, and MCC for dataset 2. The negative predictive value (NPV) of the DCNN-Bi-LSTM-ICS model for dataset 2 is 96% at a learning percentage of 90, surpassing both its NPV for dataset 1 and the NPVs of the conventional techniques. The DCNN-Bi-LSTM-ICS model also has a higher MCC and F-measure than the conventional approaches for dataset 2, with an MCC of 84% and an F-measure of 89% at a learning percentage of 80. On both F-measure and MCC, the DCNN-Bi-LSTM-ICS model performs 5% better than the current methods. At a learning percentage of 90, the DCNN-Bi-LSTM-ICS model for dataset 2 has an NPV of 96%, an F-measure of 93%, and an MCC of 89%. In terms of cyberattack classification, the DCNN-Bi-LSTM-ICS model outperforms the earlier techniques at all LPs evaluated.

Table 1

Ablation study of the DCNN-Bi-LSTM-ICS method for dataset 1

	DCNN-Bi-LSTM-ICS	Proposed with traditional SMOTE	Proposed with traditional holoentropy	Proposed with traditional Chi-Square
Accuracy	96.32%	92.40%	92.31%	92.28%
Precision	97.52%	88.61%	88.47%	88.43%
Sensitivity	95.32%	88.59%	88.46%	88.41%
F-measure	96.41%	88.60%	88.47%	88.42%
Specificity	97.39%	94.30%	94.24%	94.21%
MCC	92.65%	82.90%	82.70%	82.63%
NPV	95.08%	94.29%	94.23%	94.20%
FDR	2.48%	11.39%	11.53%	11.57%
FNR	4.68%	11.41%	11.54%	11.59%
FPR	2.61%	5.70%	5.76%	5.79%

Figure 9.

Performance comparison of the DCNN-Bi-LSTM-ICS technique with current schemes for dataset 2’s (a) NPV, (b) F-measure, and (c) MCC.

4.4 Performance analysis

The contribution of each improved component of the proposed scheme, namely improved chi-square processing, improved holoentropy, and improved data imbalance processing, is evaluated through an ablation study. The ablation results for datasets 1 and 2 are given in Tables 1 and 2.

4.4.1 Ablation study for datasets 1 and 2

The DCNN-Bi-LSTM-ICS model performs better than the variants built using traditional methods. In the DCNN-Bi-LSTM-ICS model, an enhanced data balancing approach is used for pre-processing, and mutual information and statistical features are obtained from the pre-processed data together with features from the improved holoentropy-based extraction method. A hybrid DCNN and Bi-LSTM classifier is then used to classify cyberattacks precisely. The ablation analysis of the proposed method for datasets 1 and 2 is presented in Tables 1 and 2. Table 1 shows that the DCNN-Bi-LSTM-ICS method outperforms the variants using standard SMOTE, conventional holoentropy, and conventional Chi-Square. The DCNN-Bi-LSTM-ICS model achieves about 96% accuracy, roughly 4% higher than these variants. Its FNR is only 4.68%, whereas the variants with standard SMOTE, conventional holoentropy, and conventional Chi-Square have FNR values about 7% higher. The precision of the DCNN-Bi-LSTM-ICS model for dataset 1 is about 97%, roughly 9% higher than that of the conventional variants (about 88%).

Table 2 displays the results of the ablation study of the DCNN-Bi-LSTM-ICS method for dataset 2. The DCNN-Bi-LSTM-ICS technique achieves about 95% accuracy, which is 3% higher than the variants using standard SMOTE, holoentropy, and Chi-Square. It offers accurate cyber-attack categorization with 93% precision, compared to about 88% precision for all variants using conventional methodologies. The error of the model for dataset 2 is similarly low: the FPR, FDR, and FNR of the DCNN-Bi-LSTM-ICS model are 3.35%, 6.70%, and 6.75%, respectively, all much lower than those of the conventional variants. For both datasets, the ablation study shows that the DCNN-Bi-LSTM-ICS model performs better in cyber-attack classification than the variants built on conventional components.
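The gaps quoted above can be recomputed directly from the tabulated values. The short sketch below does this for three rows of Table 1 (dataset 1); the dictionary layout and the helper name are illustrative choices, while the numbers are copied from the table.

```python
# Rows of Table 1 (values in %); "full" is the DCNN-Bi-LSTM-ICS model and the
# other keys are the variants with one traditional component swapped in.
table1 = {
    "Accuracy": {"full": 96.32, "SMOTE": 92.40, "holoentropy": 92.31, "Chi-Square": 92.28},
    "Precision": {"full": 97.52, "SMOTE": 88.61, "holoentropy": 88.47, "Chi-Square": 88.43},
    "FNR": {"full": 4.68, "SMOTE": 11.41, "holoentropy": 11.54, "Chi-Square": 11.59},
}


def gaps(row):
    """Difference in percentage points between the full model and each variant."""
    return {name: round(row["full"] - value, 2)
            for name, value in row.items() if name != "full"}


gap_table = {metric: gaps(row) for metric, row in table1.items()}
for metric, row in gap_table.items():
    print(metric, row)
```

A positive gap for accuracy and precision, and a negative gap for FNR, both indicate that the full model is better than the ablated variant.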

Table 2
Ablation analysis of the DCNN-Bi-LSTM-ICS model for dataset 2

	DCNN-Bi-LSTM-ICS	Proposed with traditional SMOTE	Proposed with traditional holoentropy	Proposed with traditional Chi-Square
Accuracy	95.52%	92.46%	92.22%	92.53%
Precision	93.30%	88.69%	88.34%	88.80%
Sensitivity	93.25%	88.68%	88.32%	88.78%
F-measure	93.28%	88.69%	88.33%	88.79%
Specificity	96.65%	94.35%	94.17%	94.40%
MCC	89.91%	83.03%	82.50%	83.19%
NPV	96.62%	94.34%	94.16%	94.39%
FDR	6.70%	11.31%	11.66%	11.20%
FNR	6.75%	11.32%	11.68%	11.22%
FPR	3.35%	5.65%	5.83%	5.60%

4.5 Validation using K-fold for datasets 1 and 2

K-fold cross-validation is a machine learning technique that splits a dataset into K folds, each of which is used for both training and validation. In each training cycle of the DCNN-Bi-LSTM-ICS model (for example, with $K=4$), K-1 folds are used for training and the remaining fold is used for validation. K iterations are carried out, using a different fold as the validation set each time, and the performance metric is then averaged over the iterations for a more robust estimate. The K-fold cross-validation of the DCNN-Bi-LSTM-ICS model is evaluated based on accuracy and is shown in Figs 10 and 11, where it is compared with the accuracy of conventional approaches such as SqueezeNet, GoogLeNet, DCNN, Bi-LSTM, CNN and LSTM. The X-axis represents the number of folds considered for evaluation. From Fig. 10, the DCNN-Bi-LSTM-ICS model attains above 93% accuracy for dataset 1, whereas all other methods remain below 91%. Its accuracy is nearly the same for the different numbers of folds, $K=2, 3, 4, 5$. For dataset 2, the DCNN-Bi-LSTM-ICS model attains 94% accuracy at the 5th fold and about 93% at the 2nd, 3rd, and 4th folds. Except for GoogLeNet, all conventional approaches achieve an accuracy of 89% that remains constant across all folds. It is evident from the K-fold cross-validation of datasets 1 and 2 that the DCNN-Bi-LSTM-ICS model performs better than the other methods.
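The K-fold procedure described above can be sketched in a few lines. This is a generic illustration, not the code used in the experiments: the splitting is contiguous (in practice the data are usually shuffled or stratified first), and `dummy_scorer` is a hypothetical placeholder for training and scoring a real model.

```python
def k_fold_indices(n_samples, k):
    """Split sample indices 0..n_samples-1 into k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)   # spread the remainder
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(n_samples, k, train_and_score):
    """Use each fold once for validation and average the k scores."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, val_idx))
    return sum(scores) / k


def dummy_scorer(train_idx, val_idx):
    """Placeholder: a real scorer would fit on train_idx and evaluate on val_idx."""
    return len(train_idx) / (len(train_idx) + len(val_idx))


mean_score = cross_validate(n_samples=10, k=4, train_and_score=dummy_scorer)
```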

Figure 10.

K-fold cross-validation for dataset 1.

Figure 11.

K-fold cross-validation for dataset 2.

4.6 AUC & ROC Analysis for dataset 1 and dataset 2

Figures 12 and 13 display the AUC and ROC analysis of the DCNN-Bi-LSTM-ICS model compared to traditional methodologies for datasets 1 and 2. The ROC curve relates the true positive rate (TPR) to the false positive rate (FPR) of the cyberattack categorization, and the AUC is the area under this curve. Figure 12 indicates that the DCNN-Bi-LSTM-ICS model for dataset 1 outperforms other standard models such as SqueezeNet, GoogLeNet, DCNN, Bi-LSTM, CNN, and LSTM. The ash-coloured line indicates the DCNN-Bi-LSTM-ICS model. The suggested model achieves the maximum TPR of 0.95, whereas the other traditional techniques obtain TPR values smaller than 0.8. The AUC and ROC analysis of the DCNN-Bi-LSTM-ICS model for dataset 2 is shown in Fig. 13, along with a comparison with the previous methods. For dataset 2, the DCNN-Bi-LSTM-ICS method again achieves the highest TPR value, 0.83, whilst the other traditional techniques only manage 0.82. With the lower TPR of the conventional models, more cyberattacks can be misclassified. Thus, the suggested approach has a better chance of correctly classifying cyberattacks.
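As an illustration of how such a curve is obtained, the sketch below sweeps the decision threshold over a set of scores, records the (FPR, TPR) pairs, and integrates them with the trapezoidal rule. The function names, labels, and scores are hypothetical examples and do not reproduce the curves in Figs 12 and 13.

```python
def roc_curve_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Consume every sample tied at this score before emitting a point.
        while i < len(order) and scores[order[i]] == threshold:
            if y_true[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical labels (1 = attack) and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_curve_points(y_true, scores)
area = auc(points)
```

An AUC of 1.0 corresponds to a perfect ranking of attacks above normal traffic, and 0.5 to a random ranking.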

Figure 12.

AUC & ROC analysis of DCNN-Bi-LSTM-ICS approach over traditional models for dataset 1.

Figure 13.

AUC & ROC evaluation of DCNN-Bi-LSTM-ICS approach over traditional models for dataset 2.

4.7 Non-parametric analysis

Non-parametric tests are statistical hypothesis-testing procedures that do not assume any particular frequency distribution for the variables being evaluated. They are typically used when the data are skewed, and their analysis is displayed in Table 3. Being called non-parametric does not mean that these models are parameter-free; rather, neither the parameters nor their number are fixed in advance. These models are hence also referred to as distribution-free models. To determine whether the data pass the normality test, the conventional 0.05 cut-off (as applied in Prism) is used: if the $P$ value is greater than 0.05, the data pass the test; if it is less than 0.05, they do not. A $t$-test can be used to ascertain whether a single group deviates from a predetermined value (a one-sample $t$-test), whether two groups differ from each other (an independent two-sample $t$-test), or whether paired measures show a significant difference (a paired, or dependent-samples, $t$-test). The Wilcoxon test has two variants, the rank-sum test and the signed-rank test; the signed-rank test compares two matched sets of data to determine whether they differ in a statistically significant way. For the Friedman test, the null hypothesis is that all k related variables are drawn from the same population. A z-test is used in hypothesis testing to assess the statistical significance of a finding or association; specifically, it examines the null hypothesis that two means are equal.
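To make the signed-rank variant concrete, the following sketch computes the Wilcoxon signed-rank statistic for paired samples with a two-sided normal approximation. It is an illustration only: the approximation omits tie and continuity corrections and is rough for small samples, and the paired accuracy values are hypothetical rather than taken from Table 3.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(a, b):
    """Return (W, z, p) for paired samples a and b (normal approximation)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]   # drop zero differences
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided p-value
    return w, z, p


# Hypothetical paired accuracies (%) of two models over eight runs.
model_a = [93, 94, 95, 96, 94, 95, 93, 96]
model_b = [89, 91, 90, 92, 93, 90, 91, 91]
w, z, p = wilcoxon_signed_rank(model_a, model_b)
```

With the 0.05 cut-off, a $P$ value below 0.05 would lead to rejecting the null hypothesis that the two paired samples come from the same distribution.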

Table 3
Analysis of non-parametric test

Dataset 1
	SqueezeNet	AlexNet	GoogLeNet	DCNN	Bi-LSTM	CNN	LSTM	DCNN-Bi-LSTM-ICS
P-test	1.53E-07	5.86E-09	3.83E-31	3.68E-09	1.00E-07	4.22E-09	8.49E-09	2.81E-06
T-test	5.250459	5.822557	11.62043	5.899778	5.327497	5.877035	5.760111	4.685287
Friedman test	5.39E-100	9.87E-108	3.60E-230	5.31E-109	3.64E-101	1.44E-108	2.11E-106	5.15E-89
Wilcoxon test	1.22E-51	1.59E-55	2.20E-151	3.67E-56	3.15E-52	6.05E-56	7.40E-55	4.00E-46
Z-score test	1.81E-52	1.73E-56	1.08E-155	3.77E-57	4.46E-53	6.34E-57	8.49E-56	8.86E-47
Dataset 2
	SqueezeNet	AlexNet	GoogLeNet	DCNN	Bi-LSTM	CNN	LSTM	DCNN-Bi-LSTM-ICS
P-test	0.656193	0.596135	0.625854	0.471068	0.581521	0.625854	0.596135	0.848675
T-test	0.445191	0.529985	0.487588	0.720774	0.551184	0.487588	0.529985	0.190815
Friedman test	0.615606	0.508068	0.551535	0.318366	0.483414	0.563056	0.460946	0.751098
Wilcoxon test	0.486099	0.410573	0.440473	0.284692	0.393896	0.448525	0.378837	0.592654
Z-score test	0.486125	0.41059	0.440493	0.284684	0.39391	0.448547	0.378849	0.592683

5. Conclusion

Due to recent progress in communication and information technologies, ever larger amounts of corporate and sensitive data are exchanged continuously, which makes them vulnerable to attacks. Intrusion detection systems (IDS), which detect malicious activities, are among the most crucial security mechanisms. Considering this, a novel DCNN-Bi-LSTM-ICS model with five working stages was proposed in this work for cyber attack classification. First, the data from the datasets were taken as input. Next, improved class imbalance processing based on ISMOTE was conducted to pre-process the input data. In the third stage, improved holo-entropy-based, mutual information, and statistical features were extracted. For proper feature selection, ICS was developed. Finally, a hybrid classifier that processes the data through DCNN and Bi-LSTM classifiers in parallel was proposed for accurate classification, and the results of the two classifiers were averaged. The outcomes showed that the performance of DCNN-Bi-LSTM-ICS was superior to that of the other approaches.

References

1. Kumar, Sinha. A robust intelligent zero-day cyber-attack detection technique. Complex & Intelligent Systems. 2021 Oct; 7(5): 2211-34.

2. Alguliyev, Imamverdiyev, Sukhostat. Hybrid DeepGCL model for cyber-attacks detection on cyber-physical systems. Neural Computing and Applications. 2021 Aug; 33(16): 10211-26.

3. Kumar, Sahayakingsly, Udayakumar. Analysis of intrusion detection in cyber attacks using DEEP learning neural networks. Peer-to-Peer Networking and Applications. 2021 Jul; 14(4): 2565-84.

4. Haghnegahdar, Wang. A whale optimization algorithm-trained artificial neural network for smart grid cyber intrusion detection. Neural Computing and Applications. 2020 Jul; 32(13): 9427-41.

5. Zhang, Yang, Hang. Cyber-attack detection for autonomous driving using vehicle dynamic state estimation. Automotive Innovation. 2021 Aug; 4: 262-73.

6. Dai, Zha, Liu, Xie, Tian. Fault detection filter design for networked systems with cyber attacks. Applied Mathematics and Computation. 2022 Jan 1; 412: 126593.

7. AlZubi, Al-Maitah, Alarifi. Cyber-attack detection in healthcare using cyber-physical system and machine learning techniques. Soft Computing. 2021 Sep; 25(18): 12319-32.

8. Bouyeddou, Harrou, Kadri, Sun. Detecting network cyber-attacks using an integrated statistical approach. Cluster Computing. 2021 Jun; 24: 1435-53.

9. Zaib, Bashir, Qureshi, Kausar, Rizwan, Jeon. Deep learning based cyber bullying early detection using distributed denial of service flow. Multimedia Systems. 2022 Dec 1: 1-20.

10. Kang, Liu, Zhu, Zhao, Liu. Coordinated cyber-physical attacks based on different attack strategies for cascading failure analysis in smart grids. Wireless Networks. 2021 Aug 18: 1-6.

11. Luh, Temper, Tjoa, Schrittwieser, Janicke. PenQuest: A gamified attacker/defender meta model for cyber security assessment and education. Journal of Computer Virology and Hacking Techniques. 2020 Mar; 16: 19-61.

12. Lou, Jiang, Xiao, Yan. Cyber intrusion detection through association rule mining on multi-source logs. Applied Intelligence. 2021 Jun; 51: 4043-57.

13. Wang, Liu. Deducing cascading failures caused by cyberattacks based on attack gains and cost principle in cyber-physical power systems. Journal of Modern Power Systems and Clean Energy. 2019 Nov; 7(6): 1450-60.

14. Palleti, Adepu, Mishra, Mathur. Cascading effects of cyber-attacks on interconnected critical infrastructure. Cybersecurity. 2021 Dec; 4: 1-9.

15. Cvitić, Peraković, Periša, Jurcut. Methodology for detecting cyber intrusions in e-learning systems during COVID-19 pandemic. Mobile Networks and Applications. 2023 Feb; 28(1): 231-42.

16. Elkhadir, Mohammed. A cyber network attack detection based on GM Median Nearest Neighbors LDA. Computers & Security. 2019 Sep 1; 86: 63-74.

17. Sengan, Subramaniyaswamy, Indragandhi, Velayutham, Ravi. Detection of false data cyber-attacks for the assessment of security in smart grid using deep learning. Computers & Electrical Engineering. 2021 Jul 1; 93: 107211.

18. Kumar, Gupta, Tripathi. Toward design of an intelligent cyber attack detection system using hybrid feature reduced approach for iot networks. Arabian Journal for Science and Engineering. 2021 Apr; 46(4): 3749-78.

19. Alqahtani, Abuhasel, Alquraish. A novel decentralized analytical methodology for cyber physical networks attack detection. Wireless Personal Communications. 2022 Nov; 127(2): 1705-16.

20. Guo. MCKC: A modified cyber kill chain model for cognitive APTs analysis within Enterprise multimedia network. Multimedia Tools and Applications. 2020 Oct; 79(39): 29923-49.

21. Hussein, Yohannese, Bashir. A-SMOTE: A new preprocessing approach for highly imbalanced datasets by improving SMOTE. International Journal of Computational Intelligence Systems. 2019 Jan; 12(2): 1412-22.

22. Munagala, Kodati. Enhanced holoentropy-based encoding via whale optimization for highly efficient video coding. The Visual Computer. 2021 Aug; 37(8): 2173-94.

23. Bahassine, Madani, Al-Sarem, Kissi. Feature selection using an improved Chi-square for Arabic text classification. Journal of King Saud University-Computer and Information Sciences. 2020 Feb 1; 32(2): 225-31.

24. https://medium.com/analytics-vidhya/statistics-mean-median-mode-variance-standard-deviation-47fab926465a.

25. https://www.thoughtco.com/what-are-the-maximum-and-minimum-3126236.

26. https://towardsdatascience.com/chi-square-test-for-feature-selection-in-machine-learning-206b1f0b8223.

27. Yan, Chen, Shyu, Chen. Deep learning for imbalanced multimedia data classification. In 2015 IEEE International Symposium on Multimedia (ISM). IEEE. 2015 Dec 14. pp. 483-488.

28. Chen, Liu, Wang. A RUL prediction method of small sample equipment based on DCNN-BiLSTM and domain adaptation. Mathematics. 2022 Mar 23; 10(7): 1022.

29. https://www.kaggle.com/datasets/hassan06/nslkdd.

30. https://www.kaggle.com/datasets/dhoogla/csecicids2018.

31. Oliveira, Praça, Maia, Sousa. Intelligent cyber attack detection and classification for network-based intrusion detection systems. Applied Sciences. 2021 Feb 13; 11(4): 1674.

32. Lin, Wang, Lin, Tsai. Behaviour classification of cyber attacks using convolutional neural networks. J. Comput. Sci. 2021 Feb 1; 32(1): 65-82.
