Abstract
The Internet of Things (IoT) has significantly influenced day-to-day life and large industrial systems. It offers a platform for information systems to integrate effectively with network servers. At the same time, cyber threats are becoming critical, especially for IoT servers, so a strong strategy must be in place to protect the network from multiple attacks. An intrusion detection system (IDS) is crucial for detecting malicious behaviors that degrade network performance. An IDS uses a detection method to monitor network activity and alert IoT users. This paper proposes a novel IDS for IoT that performs dimensionality reduction (DR) with log-sigmoid kernel principal component analysis (LSK-PCA) and classification with an activation updated deep feed-forward neural network (AU-DFFNN). Initially, the input data is taken from the NSL-KDD dataset and pre-processed. Afterwards, attribute extraction is carried out, followed by feature selection based on the Fisher-Yates Adapted Golden Eagle Optimizer (FY-GEO). Then, DR of the selected features is performed using the LSK-PCA model. Finally, the reduced dataset is given as input to the classifier, which labels the data as attack or normal. The experimental analysis uses performance metrics including precision (PR), recall (RC), f-score (FS), accuracy (AC), false alarm rate (FAR) and computational time (CT). The results show that the proposed work detects intrusions more effectively than state-of-the-art techniques.
Keywords
Introduction
Because of rapidly growing network communication technologies, the Internet will link everything from every place [1]. The IoT is the next stage in the evolution of the Internet, in which each entity is given the capability to access the Internet [2]. The IoT is a network of physical devices with sensors, software, and interoperability that can interact with other network-connected devices [3]. Based on the world internet statistics report, the growth of the Internet between 2000 and 2019 reached 1,114 per cent, with over two quintillion bytes of data produced daily [4]. In the era of information and communication technologies, the IoT increasingly communicates with different systems [5]. IoT links everyday devices to the Internet, permitting a wide range of disparate goals to be met. Mobile phones, sensors, actuators, and other devices can all communicate and collaborate by employing a single addressing model [6]. IoT applications are manifold and touch nearly every part of daily life, including healthcare, the military, and agriculture [7]. IoT is also used in applications such as microgrids for renewable energy systems and autonomous vehicular and surveillance systems [8]. IoT sensors are also connected to the network via mobile devices and gain access to highly confidential data. As a result, the attack surface expands, as does the possibility of an attack [9]. More IoT-specific security tools are needed, and systems like IDS could be employed to fulfill that need [10].
The IDS is a first-line network protection strategy that monitors the network and protects it against irregular activity and intrusion hazards [11]. The IDS is an essential defence mechanism consisting of network monitors that detect incoming attacks and alert users to them [12]. An IDS protects terminal users and service providers, so as risks on the Internet grow, the methods employed in IDS need to be updated [13]. In terms of intrusion detection (ID), IDS outperform traditional firewalls. The critical parameters of an IDS are confidentiality, integrity, and availability, all of which are undermined when information is compromised [14]. IDS techniques follow two approaches: anomaly detection and misuse detection. Misuse-based IDS (MIDS), also known as signature-based IDS, analyze incoming network traffic using existing signatures. Anomaly-based IDS (AIDS), on the other hand, employ a classification method to identify malicious behaviour in the network [15]. During the last decade, much research has gone into machine learning (ML) and artificial intelligence-based IDSs. Various ML algorithms were investigated, including neural networks and their newer deep learning (DL) versions, support vector machine (SVM), decision tree (DT), k-nearest neighbor (KNN), and naive Bayes [16]. Because of the large number of features in IoT network traffic, ML models take longer to build, which degrades IDS performance [17]. So, there is a need for a system that identifies threats quickly and accurately to diminish network problems. To this end, this paper proposes a novel DR and classification-based IDS for IoT using the LSK-PCA and AU-DFFNN algorithms. The list of mathematical symbols and their description is tabulated in Table 1. The research objectives of the proposed IDS are as follows:
Mathematical symbols and their description
To present an efficient pre-processing and optimal feature selection approach using the FY-GEO model to reduce the training time and improve the classification AC of the IDS.
To propose an AU-DFFNN classification technique for identifying multi-class attacks in IoT networks.
To present a novel dimensionality reduction approach using the LSK-PCA algorithm to avoid overfitting issues and misleading classification results in ID.
The rest of the paper is organized as follows: section 2 presents the literature survey related to the proposed IDS framework, section 3 gives a detailed analysis of the proposed IDS system, section 4 presents the results and discussion of the proposed methodology, and section 5 concludes the paper with future scope.
Wisam Elmasry et al. [18] presented a double particle swarm optimization (PSO) based DL technique to perform ID in an IoT network. The double PSO was used to select the optimal feature subsets in the dataset which enhances the classification AC. The algorithm was tested with the two ID datasets, NSL-KDD and CICIDS2017, to evaluate its performance against existing frameworks. Using double PSO, the system improved the AC by 4 to 6% and decreased the error rate by 1 to 5% compared to existing approaches. Tongtong tu et al. [19] suggested a bidirectional long short-term memory and attention framework named BAT to recognize network intrusions in IoT. In BAT, an attention mechanism was included to find the critical features for attack classification, and the multiple convolution layers were used to figure out the local features of the dataset. Finally, the SoftMax classifier was used for ID. The work was tested with the NSL-KDD dataset and showed superior performance compared to existing works. Xianwei Gao et al. [20] recommended an adaptive ensemble learning model for detecting intrusions in the network. The system designed a multi-tree classification scheme for ID by adjusting the training data proportion and setting up multiple DTs. An adaptive ensemble technique was also developed by including random forest, DT, KNN, and deep neural network as base classifiers to improve detection AC. The suggested approaches were tested on the NSL-KDD; the AC attained by the multi-tree approach was 84.2%, whereas the ensemble framework attained an AC of 85.2%.
Unal Cavusoglu [21] presented a hybrid IDS that utilized several ML and feature selection techniques to offer better performance in attack classification. Initially, the dataset was preprocessed, and its size was reduced using a feature selection approach. Finally, a layered architecture was used to decide on a better ML algorithm for ID. The work was tested on the NSL-KDD dataset, and the system achieved a higher AC and a lower false positive rate. Mangayarkarasi Ramaiah et al. [22] suggested a random forest technique to detect several intrusions in the network. The classifier came with an optimized architecture to improve the detection AC. The work was tested on the KDDCUP99 dataset, and the results revealed the best classifier performance amongst existing related schemes. Chia Ming Hsu et al. [23] presented a DL model, named CNN-LSTM, that combined a convolutional neural network (CNN) with long short-term memory (LSTM) to identify the several attacks involved in the network. The combined approach was tested on the NSL-KDD dataset, and the technique attained better results than the existing related works.
Ai-min Yang et al. [24] designed an optimal neural network, namely Levenberg-Marquardt backpropagation, for performing ID in IoT. The weight values of the neural network were optimized using the Levenberg-Marquardt model. The dataset used was KDD CUP 99, and the best outcomes were attained by continuous training. The outcomes showed that the presented model outperformed the conventional BP neural network with respect to detection rate and FAR. Pankaj Kumar Keserwani et al. [25] presented a mixed optimization-based random forest classifier for detecting attacks in IoT networks. Grey wolf optimization and PSO were used to choose the most significant network features from the dataset. The features selected by the hybrid optimization model were fed into a random forest classifier to detect the network attacks. The system was evaluated on the NSL-KDD dataset, and its outcomes were compared to other similar approaches. The proposed model was found to be more accurate than the compared models. Vikash Kumar et al. [26] developed an IDS, called unified IDS (UIDS), for the IoT platform that protected the IoT network from four different kinds of attacks: exploit, DoS, probe, and generic. After pre-processing the collected dataset (UNSW-NB15), the behaviour of the different classes (attack types) and feature selection were investigated. Then the UIDS model was trained on the selected features of the UNSW-NB15 dataset. The analysis showed that the model outperformed existing approaches regarding attack detection rate.
Azam Davahli et al. [27] presented a lightweight ML approach as an ID mechanism for IoT. The model was based on the GA-GWO, which hybridized genetic algorithm (GA) with grey wolf optimizer (GWO). The hybrid algorithm reduced the dimensionality of the enormous network traffic by choosing essential traffic features. The outcomes demonstrated that the presented GA-GWO improved the performance of the IDS regarding computational costs, AC and FAR when tested on the AWID (aegean wi-fi intrusion dataset). An IDS based on convolutional neural networks was recommended by Idriss Idrissi et al. [28] to detect Botnet attacks in IoT. The system normalized the raw dataset to optimize the classifier’s performance. The model was tested on the Bot-IoT dataset regarding AC and processing time and achieved better performance than some existing techniques. Daming Li et al. [29] presented a deep migration learning-based IoT feature extraction and ID models for a smart city. The ID process included data acquisition, data analysis, and alarm response. According to the experimental results, the developed algorithm attained a lower detection time as well as a higher detection rate.
Weiwei Jiang [30] reviewed several graph-based methodologies, such as graph attention and graph attention convolution networks, for modeling different network topologies, including wired, wireless and software-defined networks. The drawbacks of each graph-based technique and the corresponding solutions were also presented in the study. Xiaokang Zhou et al. [31] presented a graph neural network (GNN) based IDS for IoT networks. The system generated adversarial examples using the saliency map technique to identify and modify the dataset's important feature attributes. The vulnerable nodes in the network were identified using a hierarchical node selection algorithm and a random walk with a restart mechanism. By considering the nodes' structural features and the loss changes of the network, malicious nodes were identified in the IoT network. The evaluation was done using the UNSW-SOSR2019 dataset, and the outcomes showed the technique's advantage over existing related schemes. Wai Weng Lo et al. [32] presented a GNN-based IDS for IoT networks. In this approach, the training and evaluation data were flow records, which can be represented in a graph format. The system used E-GraphSAGE, which captured the topological information and edge features of the graph in IoT networks. The presented approach was evaluated on four benchmark datasets, and the results showed that it outperformed other related schemes for ID in IoT. Several ML and DL approaches have been developed with optimal feature selection algorithms to improve IDS performance. However, they still face the following limitations. Because of the daily increase in cyber-attacks, IDS classifiers struggle to determine which event patterns are unusual and malicious. Several methods yield higher FARs due to the selection of irrelevant features and the difficulty of recognizing malicious traffic that resembles regular traffic.
Because of the large number of features with redundant information, most existing techniques have higher misclassification rates. Some of the existing works did not use preprocessing, feature selection or DR mechanisms, which increases the detection burden on the classifier and leads to false detections. If the dataset is not appropriately processed with these mechanisms, the training time of the classifier increases, and the classifier is prone to overfitting. The optimal feature selection algorithms used in existing techniques suffer from higher CT. In addition, most of the optimization techniques in existing works were used in their traditional form, without the improvements or modifications needed to overcome their drawbacks. They therefore still suffer from poor population diversity, slow convergence, and entrapment in local optima. These problems in the optimization algorithms lead to irrelevant feature selection and, as a result, inaccurate classification in ID. Tuning of hyperparameters is important in any ML or DL model to obtain optimal classification results. If the parameters are not tuned properly, the predictions are sub-optimal and the loss function is not minimized, which means the model makes more errors (a higher false alarm rate) in prediction. Most of the existing frameworks did not use any kind of optimization to tune their parameters. The existing ML and DL classification techniques for ID used traditional activation functions, such as sigmoid, which lead to vanishing gradients, overfitting, and under-fitting issues. The techniques also do not focus selectively on the valuable parts of the input sequence; hence, they still need to learn the associations between them.
Due to these drawbacks, the classification of intrusions in IoT leads to lower AC, higher error rate and non-optimal prediction.
As a result, this paper proposes a novel DR and classification technique based IDS for IoT using the LSK-PCA and AU-DFFNN algorithms. The presented model overcomes numerous drawbacks of previously developed methods, such as poor detection of rare attacks, many features with irrelevant information, misclassification errors in attack detection, and time overhead. The proposed framework employs the FY-GEO algorithm to select the necessary attributes and to tune the network for attack classification, and an LSK-PCA model for DR to reduce memory requirements and processing time. Additionally, the proposed framework employs a DL-based AU-DFFNN classifier to classify intrusions accurately and efficiently, with a low misclassification rate, based on the selected and dimensionality-reduced features. Preprocessing operations such as redundant data removal, numeralization, and normalization are also performed to handle the heterogeneous data generated by different IoT sensor devices.
Proposed framework
This paper develops an IDS based on DL techniques, using LSK-PCA and AU-DFFNN to improve security. The input dataset is first imported, and pre-processing is performed before attribute extraction, attribute selection, and classification. The dimensionality of the selected attributes is reduced before the classification process to improve classification AC and reduce training time. The structural design of the proposed IDS is shown in Fig. 1 below.

Structural design of the proposed IDS framework.
To begin, the input data used to train the proposed system is gathered from the NSL-KDD dataset. This standard benchmark dataset for IDS comprises 41 features with two labels (normal and attack). After data collection, the dataset is pre-processed to improve the performance of the proposed IDS. The results and discussion section gives a detailed explanation of the collected dataset.
Pre-processing
The input dataset often contains discrepancies, especially when it is large-scale and high-dimensional. Hence, pre-processing is the initial step to remove such discrepancies and transform the data into a form suitable for further processing. The pre-processing steps carried out in the proposed technique are redundant data removal, numeralization, and normalization, which are detailed below.
Redundant data removal
This step removes duplicate records that appear multiple times in the dataset. It is performed to reduce the time required to classify intrusions.
Numeralization
The dataset may contain both the character and numeric data of the network. In order to enable the DL process, the character features present in the dataset are converted to numeric values. This conversion can be mathematically modeled as,
Where $ƛ_i$ stands for the numeralized result, and $D_i$ ($i = 1, 2, \ldots, N$) represents the features in the dataset.
Data normalization is a data mining process that transforms the values of a dataset onto a standard scale. This is important because many machine learning algorithms are sensitive to the scale of the input features and perform better when the data is normalized. To scale the dataset values, this research uses the min-max normalization method, in which the values of a feature are scaled to a range between 0 and 1. This is achieved by subtracting the feature's minimum value from each value and dividing the result by the feature's range. The min-max normalization expression is as follows:
Where $D$ is the attribute data, and $\max(D)$ and $\min(D)$ are the maximum and minimum absolute values of $D$, respectively. $\mu$ and $\mu^*$ denote the old and new value of each entry in the data. $new_{\min}(D)$ and $new_{\max}(D)$ denote the minimum and maximum values of the range (i.e., the boundary values of the required range), respectively. Finally, the set of normalized inputs ($ƛ_i$) is defined as,
In Equation (3), N is the number of elements in the dataset.
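The three pre-processing steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the integer coding scheme used for numeralization (codes assigned in order of first appearance), and the handling of constant columns are all assumptions. The scaling step follows the usual min-max form, (value - min) / (max - min) * (new_max - new_min) + new_min, which matches the symbols defined above.

```python
def remove_duplicates(records):
    """Drop repeated records while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec)
        if key not in seen:
            seen.add(key)
            unique.append(list(rec))
    return unique


def numeralize(records):
    """Replace each distinct string in a column with an integer code.

    Assumption: codes are assigned per column in order of first appearance.
    """
    codes = {}  # column index -> {string value: integer code}
    out = []
    for rec in records:
        row = []
        for col, value in enumerate(rec):
            if isinstance(value, str):
                col_codes = codes.setdefault(col, {})
                value = col_codes.setdefault(value, len(col_codes))
            row.append(float(value))
        out.append(row)
    return out


def min_max_normalize(records, new_min=0.0, new_max=1.0):
    """Scale every column to the range [new_min, new_max]."""
    columns = list(zip(*records))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = []
    for rec in records:
        row = []
        for value, lo, hi in zip(rec, lows, highs):
            if hi == lo:
                # Assumption: a constant column is mapped to new_min.
                row.append(new_min)
            else:
                row.append((value - lo) / (hi - lo) * (new_max - new_min) + new_min)
        scaled.append(row)
    return scaled


def preprocess(records):
    """Run the three steps in the order given in the text."""
    return min_max_normalize(numeralize(remove_duplicates(records)))
```

For example, `preprocess([[0, "tcp", 100], [0, "tcp", 100], [2, "udp", 300]])` drops the repeated first record, codes the protocol strings as numbers, and rescales each column to [0, 1].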
Attribute extraction is the process of extracting all the features present in the data that are required for classification. The input dataset contains a wide range of attributes such as duration, protocol type, service, flag, source bytes, destination bytes, logged-in, etc., which are extracted for analysis.
Attribute selection with FY-GEO
The Golden Eagle Optimizer (GEO) is a nature-inspired, swarm-based metaheuristic optimization model that mimics the cruising and hunting behaviour of golden eagles along a spiral trajectory. During every flight, golden eagles are influenced by the desire to hunt and to cruise. Because they fly in a spiral, the prey is usually on one side of the eagle. This enables them to watch the prey and nearby boulders and bushes to determine the best angle of attack. Meanwhile, they survey other areas to see if they can find better food. After attribute extraction, the proposed FY-GEO algorithm is used to select only the most important features so that the IDS detects attacks efficiently. Feature selection is needed because the extracted attributes may contain irrelevant, redundant, or noisy features that can be removed without losing information or affecting the prediction model's AC. Due to the high dimensionality of the problem, golden eagles may become trapped in locally optimal solutions, slowing the convergence of the overall process. To address this issue, a chaos mechanism and the Fisher-Yates (FY) technique are integrated into the original GEO algorithm: the chaos mechanism generates the initial population, and the FY technique is used to calculate the attack and cruise coefficients during position updating. The combination of the chaos mechanism and FY in GEO is known as FY-GEO. It prevents the algorithm from being trapped in local optima and from converging slowly, and it maintains population diversity by balancing exploration and exploitation. The steps involved in FY-GEO are described in greater detail below.
Here, the initial population of golden eagles (the extracted features $x_j$) is initialized in the $d$-dimensional search space $x_j^d$ using the chaos mechanism. Chaos is a deterministic system that shows seemingly random behavior and is sensitive to initial conditions. Following its own laws, a chaotic sequence can traverse all states in a particular range without repetition. For this reason, chaos search often outperforms other search mechanisms on optimization problems. Among the various chaos search mechanisms, the proposed work uses the one-dimensional logistic chaotic map to generate the initial population and speed up the convergence of GEO. The chaotic logistic map can be formulated as,
In the aforesaid equation, $\alpha$ defines the chaos constant, and $\ell_d(j)$ represents the $j$-th chaos variable in the $d$-th dimension.
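The chaotic initialization can be illustrated with the standard logistic map, in which each new value equals alpha * value * (1 - value). The sketch below is an assumption-laden illustration, not the paper's code: the function name `logistic_population`, the default `alpha=4.0` (the value commonly used for fully chaotic behaviour), and the random starting point per dimension are all hypothetical choices.

```python
import random


def logistic_population(n_eagles, n_dims, alpha=4.0, seed=0):
    """Generate an initial population in [0, 1] with a logistic chaotic map.

    For every dimension d, the chaos variable is iterated once per eagle j,
    so population[j][d] holds the j-th chaos variable of dimension d.
    """
    rng = random.Random(seed)
    population = [[0.0] * n_dims for _ in range(n_eagles)]
    for d in range(n_dims):
        # Assumption: each dimension starts from its own random point in (0, 1).
        value = rng.uniform(0.01, 0.99)
        for j in range(n_eagles):
            value = alpha * value * (1.0 - value)  # logistic map update
            population[j][d] = value
    return population
```

Each entry could then be thresholded or ranked to decide whether the corresponding feature is selected; that mapping is left out here because the text does not specify it.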
After population initialization, the fitness of the individuals is computed based on the classifier’s AC. It is expressed as follows:
Where $F(x_j)$ refers to the accuracy (AC) of the individual in each iteration. The AC is computed by dividing the number of correct predictions by the total number of samples. The correct predictions are the true positive ($T_p$) and true negative ($T_n$) samples, whereas $F_p$ and $F_n$ indicate the false positive and false negative samples of the classifier. The individuals attaining higher fitness are chosen as the best in the current iteration for selecting weights and biases. A higher fitness indicates that the eagle's position is closer to the prey.
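For reference, the accuracy-based fitness described above corresponds to the standard form below. This is a reconstruction from the stated definitions of $T_p$, $T_n$, $F_p$ and $F_n$, not the original typeset equation.

```latex
F(x_j) = AC = \frac{T_p + T_n}{T_p + T_n + F_p + F_n}
```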
In search of food, eagles are attracted to the target and cruise together. Once appropriate eagles are identified, the spiral motion of the population is carried out. This is a vital step in GEO. Each eagle has the memory capacity to remember the best prey location it has visited so far. At each iteration, each eagle $j$ arbitrarily chooses the prey ($\beta$) of another eagle $j'$ and circles around the best location visited so far.
Every golden eagle must select a prey to undertake the attack and cruise processes in each iteration. The prey is represented in GEO as the best solution discovered so far by the flock of golden eagles. During each iteration, each search agent selects a target prey from the flock's collective memory, that is, it attacks one of the memory positions held by another search agent. Each golden eagle's attack and cruise vectors are then calculated relative to the selected prey. The memory is then updated if the new location is better than the previous one.
At this point, the exploitation process begins with a prey attack. The attack can be represented by a vector that begins at the golden eagle's current location and ends at the location of the prey in the eagle's memory. Equation (6) below estimates the attack vector ($\xi_j$) of a golden eagle ($j$),
In Equation (8),
The attack vector determines the cruise. A perpendicular vector to the attack vector is a tangent vector. The tangent hyperplane includes the cruise vector. Also, it is identified as the linear speed of the golden eagle while attacking prey. The scalar model of a hyperplane is mathematically denoted by,
Where $\rho$ specifies the equation of the hyperplane in a $d$-dimensional search space, and $\psi_{j'}, \psi_j \in \xi_j(\psi_1, \psi_2, \ldots, \psi_m)$, where $\xi_j$ is the attack vector.
The golden eagles move to a new position according to the attack and cruise vectors. Thus, the step vector ($\partial x_j$) of each eagle is formulated as,
In Equation (11), $\kappa_1$ and $\kappa_2$ are random numbers between 0 and 1, and $\eta(\xi_j)$ and $\mu(\tau_j)$ are the attack and cruise coefficients, calculated using the FY technique, which control the attack and cruise behavior.
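The attack, cruise and step vectors can be sketched as follows. The code follows the general structure of the original GEO (attack vector towards the selected prey, a cruise vector lying in the hyperplane perpendicular to it, and a step built from the two normalized vectors), so it is an illustrative assumption rather than the exact FY-GEO update. In particular, the coefficients `eta` and `mu` are passed in as plain numbers because their FY-based computation is not reproduced in the text, and all function names are hypothetical.

```python
import math
import random


def attack_vector(eagle, prey):
    """Vector from the eagle's current position to the selected prey."""
    return [p - x for x, p in zip(eagle, prey)]


def cruise_vector(eagle, attack, rng):
    """Random vector perpendicular to the attack vector.

    A random point is drawn, then one coordinate is solved so the point lies
    on the hyperplane through the eagle whose normal is the attack vector.
    """
    candidates = [k for k, a in enumerate(attack) if abs(a) > 1e-12]
    if not candidates:  # eagle already sits on the prey
        return [0.0] * len(eagle)
    k = rng.choice(candidates)  # coordinate that will be solved for
    point = [rng.random() for _ in eagle]
    rhs = sum(a * x for a, x in zip(attack, eagle))
    rest = sum(a * c for i, (a, c) in enumerate(zip(attack, point)) if i != k)
    point[k] = (rhs - rest) / attack[k]
    return [c - x for c, x in zip(point, eagle)]


def norm(vector):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in vector))


def step_vector(eagle, prey, eta, mu, rng):
    """Combine the normalized attack and cruise vectors into one step."""
    xi = attack_vector(eagle, prey)
    tau = cruise_vector(eagle, xi, rng)
    kappa1, kappa2 = rng.random(), rng.random()
    n_xi, n_tau = norm(xi), norm(tau)
    step = []
    for a, c in zip(xi, tau):
        s = 0.0
        if n_xi > 0:
            s += kappa1 * eta * a / n_xi
        if n_tau > 0:
            s += kappa2 * mu * c / n_tau
        step.append(s)
    return step
```

The new position is then obtained by adding the step to the current position, for example `[x + s for x, s in zip(eagle, step)]`.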
First, the Fisher-Yates algorithm chooses a random golden eagle ($\Theta$) from the initial population ($x_j$). Then, counting from the lower end of the population, the $\Theta$-th eagle is replaced with the one from the current iteration. The process continues until all the eagles in the population have been replaced, and the result is used for the computation of the coefficients. The shuffled population for coefficient computation is denoted $y_{ij}$. Afterward, the position is updated as,
Here, $t$ refers to the current iteration number.
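The Fisher-Yates shuffling step used to build the shuffled population can be sketched with the classic in-place variant of the algorithm. The direction of traversal and the function name are assumptions; the text only states that a random eagle is chosen and swapped until every position has been processed.

```python
import random


def fisher_yates_shuffle(population, rng=None):
    """Return a uniformly shuffled copy of the population (classic Fisher-Yates)."""
    if rng is None:
        rng = random.Random()
    shuffled = list(population)
    for i in range(len(shuffled) - 1, 0, -1):
        theta = rng.randint(0, i)  # random index in [0, i], both ends included
        shuffled[i], shuffled[theta] = shuffled[theta], shuffled[i]
    return shuffled
```

Because the input list is copied first, the original population order is preserved for the position update while the shuffled copy is used for the coefficients.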
The attack and cruise coefficients from $y_{ij}$ control the transition from the state of exploration to the state of exploitation. The coefficients $\eta(\xi_j)$ and $\mu(\tau_j)$ can be calculated using equation (10),
In the aforementioned equation,
The features selected from the dataset are still high-dimensional, so they are mapped to lower-dimensional data to reduce the burden on the classifier. The technique used in the proposed work for DR is LSK-PCA. In general, principal component analysis (PCA) is a statistical technique that projects higher-dimensional data onto a lower-dimensional subspace by removing highly correlated and redundant features. The directions spanning this subspace are called principal components, and each is oriented towards maximum variance. In PCA, the correlation between data is evaluated through a covariance calculation. However, the covariance calculation only measures the relationship between the data and does not measure its strength, although strength analysis is useful for reducing the dimension. To overcome this drawback, this paper proposes a Log Sigmoid Kernel (LSK) computation in place of the covariance estimation. This kernel estimation implicitly maps the data into a lower dimensional space. Hence, the proposed technique is named LSK-PCA.
In the above equations, $\Im^*(k)$ denotes the standardized data, $m(\Im(k))$ the mean value of the data, and $sd(\Im(k))$ the standard deviation.
In (14), $lsk(\Im^*(k))$ denotes the kernel computation of the normalized input, $\Im^*(i)$ and $\Im^*(j)$ indicate the $i$-th and $j$-th entries of the data set ($k$), and $\gamma$ represents the kernel parameter, which is a constant.
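The first two stages of LSK-PCA, standardization and kernel computation, can be sketched as below. The standardization follows the definitions given above (subtract the mean, divide by the standard deviation). The kernel, however, is an assumption: the exact log-sigmoid kernel formula is not reproduced in the text, so the sketch uses the logarithm of the sigmoid of the scaled inner product, `log(sigmoid(gamma * <x_i, x_j>))`, purely for illustration. The default `gamma=0.1` and all function names are hypothetical.

```python
import math


def standardize(data):
    """Z-score every column: (value - mean) / standard deviation."""
    columns = list(zip(*data))
    means = [sum(c) / len(c) for c in columns]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
        for c, m in zip(columns, means)
    ]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in data
    ]


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def lsk_matrix(std_data, gamma=0.1):
    """ASSUMED kernel: log-sigmoid of the gamma-scaled inner product."""
    n = len(std_data)
    return [
        [
            log_sigmoid(gamma * sum(a * b for a, b in zip(std_data[i], std_data[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]
```

In a kernel PCA pipeline, the resulting matrix would take the place of the covariance matrix: its leading eigenvectors define the reduced representation that is passed to the classifier.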
Classification of the reduced feature set is performed with AU-DFFNN after DR. A feed-forward neural network (FFNN) is a neural network architecture with three kinds of layers: an input layer, several hidden layers, and an output layer. The FFNN computes the weight values of neurons in the hidden layers to produce the output. The features reduced in the previous phase are given as input to the neural network, which treats every input as a neuron and processes the data in the hidden and output layers. In a conventional FFNN, the sigmoid function activates the hidden neurons, while the output layer uses a linear transfer function. A unit, however, will never activate if its gradients reach zero, which can result in dead neurons. To address these issues, the number of hidden layers is increased and an activation function (AF), namely the Leaky Single Peaked Triangle linear unit (LSPTLU), is used. This activation is modeled by adjusting the upper bound response threshold of the Leaky ReLU (LReLU) based on biological neuron response characteristics, while also accounting for the possibility of a vanishing gradient. In addition, the weights and biases of the network are tuned using FY-GEO, the same optimizer used in the feature selection phase, to improve the prediction accuracy. Furthermore, the cross-entropy loss function is used to measure the classification loss. The DFFNN with the updated activation and tuned parameters is therefore called AU-DFFNN. The structure of the proposed classifier is shown in Fig. 2, and the classification procedure is discussed below,

Structure of AU-DFFNN.
In this equation, $w_{k\nu}$ is the optimized weight value between the $k$-th input neuron and the $\nu$-th hidden neuron, $b_{\nu}$ is the hidden layer bias value, and $\iota(z_{\nu})$ is the LSPTLU activation function modeled by,
Where, a is a random number in the interval [-1, 1].
Here, $w_{k\varsigma}$ is the optimized weight value between the $k$-th input neuron and the $\varsigma$-th hidden neuron, and $b_{\varsigma}$ is the hidden layer bias value,
In the above equation, $w_{\varsigma l}$ is the optimized weight value between the $\varsigma$-th hidden neuron and the $l$-th output neuron.
In (21), $T_k$ denotes the number of training samples, $O_l$ is the target label for the training samples, and $w_k$ denotes the optimized weights of the AU-DFFNN network. As a result, the classifier separates the normal data from the different types of attack data.
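The forward pass and loss computation described above can be sketched as follows. Two substitutions are made and should be read as assumptions: the plain Leaky ReLU stands in for LSPTLU, because the LSPTLU formula is not reproduced in the text, and a softmax output is used so that the cross-entropy loss is well defined. The weights are supplied by the caller; in the paper they would come from FY-GEO tuning, which is not shown here.

```python
import math


def leaky_relu(z, slope=0.01):
    """Stand-in activation (NOT the paper's LSPTLU)."""
    return z if z > 0 else slope * z


def dense(inputs, weights, biases):
    """weights[k][v] connects input neuron k to neuron v of the next layer."""
    return [
        sum(x * weights[k][v] for k, x in enumerate(inputs)) + biases[v]
        for v in range(len(biases))
    ]


def softmax(logits):
    """Convert output scores to class probabilities (assumed output stage)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(inputs, layers):
    """layers is a list of (weights, biases); the last pair is the output layer."""
    activations = inputs
    for weights, biases in layers[:-1]:
        activations = [leaky_relu(z) for z in dense(activations, weights, biases)]
    weights, biases = layers[-1]
    return softmax(dense(activations, weights, biases))


def cross_entropy(probabilities, target_index):
    """Loss for one sample whose true class is target_index."""
    return -math.log(max(probabilities[target_index], 1e-12))
```

Averaging `cross_entropy` over all training samples gives the quantity an optimizer would minimize when tuning the weights and biases.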
This section discusses the results of the proposed technique and analyses its performance against existing schemes. The proposed work is implemented in Python on an Intel(R) Core i7 CPU @ 3.4 GHz with 32 GB of memory running 64-bit Windows. The data used to train and test the presented system is gathered from NSL-KDD [33]. NSL-KDD is intended to address some of the weaknesses of the KDD99 dataset: it removes duplicate and redundant records, provides sufficient records at diverse difficulty levels, and has an appropriate number of records in both the training and testing sets. The NSL-KDD dataset is widely used in intrusion detection and related fields. It includes 41 labeled features (nine nominal and 32 continuous attributes) and four kinds of attacks. Figure 3 depicts the features used in the NSL-KDD to identify malicious activity in IoT. The training and testing sets contain a reasonable number of records, which permits assessments on the whole set instead of just a small sample. As a result, the evaluation outcomes of different investigations are consistent and comparable. The four attack classes in NSL-KDD are DoS, U2R, R2L, and probing.

Comparison of proposed and existing approaches for binary classes.
Features of NSL-KDD
Attack types and number of instances used for NSL-KDD
In the proposed IDS framework, feature selection and classification are primary techniques developed to improve the AC and reduce the CT of IDS. So, a novel FY-GEO and AU-DFFNN are employed. The performance of these techniques is detailed as follows,
This section compares the results of the proposed classifier (AU-DFFNN) with the existing classification techniques, namely, FFNN, CNN, logistic regression (LR), and SVM. The comparison is made in terms of evaluation metrics such as PR, RC, FM, AC, FAR, and CT. Figure 3 shows the results of the proposed and existing models for binary classes (normal and attack) of data.
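The evaluation metrics listed above can all be derived from the four confusion-matrix counts. The sketch below uses the common textbook definitions; the paper's own metric equations are not shown, so treating FAR as the false positive rate, Fp / (Fp + Tn), is an assumption, as is the function name.

```python
def ids_metrics(tp, tn, fp, fn):
    """Compute PR, RC, F-measure, AC and FAR from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Assumption: FAR is the share of normal traffic flagged as an attack.
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "PR": precision,
        "RC": recall,
        "FM": f_measure,
        "AC": accuracy,
        "FAR": far,
    }
```

For instance, with 90 true positives, 95 true negatives, 5 false positives and 10 false negatives, the accuracy is 185 / 200 and the FAR is 5 / 100.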
Next, the techniques are compared on multi-class attack data using the same performance metrics, tabulated in Table 4. Figure 5 shows that the proposed method achieves higher results for PR, RC, FM and AC. It attains an AC of 99.92%, whereas the existing SVM, DT, CNN, and FFNN obtained ACs of 89.22%, 92.68%, 96.54%, and 98.12%, respectively, all lower than the proposed method. Likewise, the results of the proposed AU-DFFNN for binary class (normal and attack) recognition are higher than those of the existing methods regarding PR, RC, and FM, which shows the efficiency of the proposed approach. The error rate of the classifiers was also compared in terms of FAR. The existing classification techniques SVM, DT, CNN, and FFNN attain FARs of 0.43%, 0.35%, 0.23%, and 0.17%, respectively, whereas the proposed AU-DFFNN attains a FAR of 0.09%. This lower FAR shows that the proposed AU-DFFNN has the lowest error rate of all the compared classifiers when performing attack detection.
Results of classifiers for multi attack classes
Comparative analysis of proposed and existing state of art techniques
Table 4 shows the results of the classifiers for multi-class attack detection, indicating that the proposed AU-DFFNN attains higher PR, RC, FM, and AC values than all the other classifiers. It attains an AC of 99.76% for DoS, 98.56% for probe, 99.87% for U2R, and 99.54% for R2L, which are higher than those of the existing frameworks. Likewise, when comparing the FAR (Fig. 4) of the proposed and existing frameworks, the proposed method achieves a lower value for all classes of attacks. The FAR achieved by the proposed method is 0.10% for DoS, 0.12% for probe, 0.11% for U2R, and 0.12% for R2L, which are lower than those of SVM, DT, CNN, and FFNN. These lower FAR values for all attack classes show that the proposed method detects attacks more accurately and with fewer errors. The existing SVM shows the lowest performance of all the techniques used in attack detection. This is because SVM uses hyperplane parameters to tune the classifier’s weights and biases, which did not perform well for multi-class attack detection. Compared to SVM, the DT, CNN, and FFNN classifiers perform well on multi-class data because they consider the dataset’s factors. Overall, however, the proposed method attains the highest results because it uses efficient feature selection and DR schemes in the classification process, which help the proposed AU-DFFNN attain higher classification results without compromising its CT.
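For readers who want to reproduce per-class FAR values of this kind, the sketch below computes a one-vs-rest FAR for each attack class from true and predicted labels. The one-vs-rest convention, the helper name `per_class_far`, and the toy label lists are assumptions made for illustration; the text does not state the exact per-class definition that was used.

```python
def per_class_far(y_true, y_pred, classes):
    """One-vs-rest false alarm rate per class: FP / (FP + TN).

    For class c, a false positive is a record that is not c but is predicted
    as c; a true negative is a record that is not c and is not predicted as c.
    """
    far = {}
    for c in classes:
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t != c and p != c)
        far[c] = fp / (fp + tn) if (fp + tn) else 0.0
    return far


# Toy labels for illustration only (not drawn from NSL-KDD).
y_true = ["DoS", "DoS", "Probe", "U2R", "R2L", "Normal", "Normal", "Probe"]
y_pred = ["DoS", "Probe", "Probe", "U2R", "Normal", "Normal", "DoS", "Probe"]
far = per_class_far(y_true, y_pred, ["DoS", "Probe", "U2R", "R2L"])
for attack, value in far.items():
    print(f"{attack}: {value:.4f}")
```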

FAR of the classifiers for multi-class attack types.

Computational time scrutiny.
Figure 5 reports the time required to complete the ID classification, in milliseconds (ms), and compares the proposed method with existing classifiers such as SVM, DT, CNN, and FFNN. The existing techniques need more time than the proposed system to complete execution. The proposed AU-DFFNN takes only 2828 ms to perform ID, the lowest value among the compared techniques. This is due to the proper selection of features and the adoption of efficient preprocessing steps. By contrast, SVM requires the maximum computation time of 7125 ms, and the time for the remaining techniques ranges from 4491 ms to 6206 ms. Thus, the proposed system is more secure and faster than the state-of-the-art techniques.
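A computational-time figure of this kind can be obtained with a simple wall-clock measurement around the classification step. The sketch below shows one possible way to do this in Python; the helper `time_inference_ms`, the stand-in classifier `dummy_predict`, and the sample data are hypothetical, and the measurement protocol actually used in the experiments is not specified in the text.

```python
import time


def time_inference_ms(predict, samples, repeats=5):
    """Return the best wall-clock time (in ms) to classify all samples.

    Taking the minimum over several repeats reduces noise from other
    processes running on the same machine.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for x in samples:
            predict(x)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


def dummy_predict(x):
    """Hypothetical stand-in for a trained classifier (1 = attack, 0 = normal)."""
    return int(sum(x) > 1.0)


samples = [[0.1 * i, 0.2 * i] for i in range(1000)]
ct_ms = time_inference_ms(dummy_predict, samples)
print(f"CT: {ct_ms:.3f} ms")
```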
Finally, the proposed method is compared (Table 5) with the existing published works surveyed in the related works section that use the NSL-KDD dataset for attack detection. According to the results, the PR, RC, FM, and AC detection rates obtained by the proposed IDS are higher than those obtained by the existing schemes. Some of the existing works attain satisfactory performance, but several shortfalls prevent them from achieving higher performance in ID. The existing methods use ML and DL algorithms for ID without any improvement or modification of their algorithmic process. For example, models such as DNN, LSTM-RNN, ensemble learning, stacking ensemble models, and LSTM-based RNN did not use any feature selection or reduction technique for ID, which burdens the classifier with redundant and irrelevant features. As a result, without an optimal feature engineering mechanism, the classifier’s training time and computational resources increase, and so does the chance of making decisions based on noisy and redundant data. In addition, tuning of hyperparameters is essential in any ML or DL model to obtain optimal classification results. If the parameters are not correctly tuned, the model produces sub-optimal predictions: the loss function is not minimized, so the model makes more errors (a higher false alarm rate) in prediction. Most of the existing frameworks did not use any optimization to tune their parameters. Only a few of them, namely optimized deep neural networks and GWO-PSO-RF, used optimization in their network for better performance. However, they also suffer from vanishing gradients in intrusion classification because of the traditional use of sigmoid activation in backpropagation training, which makes it difficult for the network to learn and tune parameters in earlier layers. In addition, the existing schemes did not present a comprehensive analysis, which casts doubt on their prediction results for ID.
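The vanishing-gradient effect of sigmoid activation can be seen numerically. The derivative of the sigmoid never exceeds 0.25, so the activation-derivative factor that backpropagation accumulates across n sigmoid layers is at most 0.25^n. The sketch below illustrates only this factor and deliberately ignores the weight matrices; the function names are hypothetical, and it does not represent the activation function proposed in this work.

```python
import math


def sigmoid(z):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum value is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def gradient_scale_bound(num_layers, z=0.0):
    """Product of sigmoid derivatives across num_layers layers.

    With z = 0 this is the best case (0.25 per layer); weights are ignored.
    """
    return sigmoid_grad(z) ** num_layers


bounds = {n: gradient_scale_bound(n) for n in (1, 5, 10)}
for n, b in bounds.items():
    print(f"{n} layers: activation factor <= {b:.3e}")
```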
These drawbacks in existing works motivate us to develop an efficient mechanism for ID in IoT by incorporating novel feature selection and reduction mechanisms. By using an effective optimization algorithm as the feature selection mechanism, the proposed method identifies the most relevant features in the dataset for ID, leading to higher classification AC and lower training time or CT. The modifications, namely the incorporation of chaos and Fisher-Yates into GEO, result in an optimal global solution of features for classification. Additionally, incorporating novel DR and parameter tuning in classification prevents overfitting and underfitting and yields a lower memory requirement, lower CT, and higher prediction AC because of less misleading data. Finally, using a novel activation function in DFFNN avoids the vanishing gradient problem and improves the classifier’s efficiency. In addition, a comprehensive analysis is performed in this paper to assess the ability of the proposed and existing ID models on the NSL-KDD dataset. In this way, the proposed method achieves better results with a lower error rate than existing approaches when tested on the benchmark NSL-KDD dataset for ID in IoT networks. The existing models failed to attain this level of performance due to poor feature learning and algorithmic design. As a result, the proposed work can better differentiate between different attacks and further classify network abnormality.
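For reference, the basic Fisher-Yates shuffle that the FY-GEO modification builds on can be sketched as follows. This shows only the classical shuffle applied to the 41 NSL-KDD feature indices; how the shuffle is combined with chaos and the Golden Eagle Optimizer update rules is not reproduced here, and the function name and seed are hypothetical.

```python
import random


def fisher_yates_shuffle(items, rng):
    """Return a uniformly random permutation of items (classical Fisher-Yates).

    Walks from the last position down to the second, swapping each element
    with a randomly chosen element at or before it.
    """
    a = list(items)  # copy so the input is left unchanged
    for i in range(len(a) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        a[i], a[j] = a[j], a[i]
    return a


rng = random.Random(0)            # fixed seed, chosen only for repeatability
feature_indices = list(range(41))  # one index per NSL-KDD feature
shuffled = fisher_yates_shuffle(feature_indices, rng)
print(shuffled[:10])
```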
This paper proposes efficient methods for DR and IDS using the FY-GEO and AU-DFFNN algorithms. First, an improved FY-GEO algorithm is proposed to select the best feature subset based on feature correlation. Then the dimensionality of the selected features is reduced using LSK-PCA. The classifier based on AU-DFFNN is then introduced to build the classification model. Finally, the proposed method’s performance is compared with that of existing techniques to assess its efficiency. The evaluation metrics used for the analysis are PR, RC, FM, AC, FAR, and CT. The experimental results are promising, with a classification AC of 99.87% and a precision of 99.23%, along with a low computation time (2828 ms). The proposed method also achieves a low FAR for binary and multi-class attack types. Considering these metrics, the proposed method obtains effective results and efficiently detects intrusions in network data. In the future, the proposed method will be extended with enriched security techniques for attack detection, authentication, and cryptographic encryption.
