Isolating botnet attacks using Bootstrap Aggregating Surflex-PSIM Classifier in IoT

Abstract

In an Internet of Things (IoT) environment, any object equipped with sensor nodes and other electronic devices can take part in communication over a wireless network, which makes this environment highly vulnerable to botnet attacks. However, detecting botnet attacks remains challenging because of their structurally repetitive nature, their dissimilar and non-linear activities, and their ability to stay invisible by deleting their history. Although existing mechanisms act proactively against botnet attacks, they fail to capture the frequent abnormal activities of botnet attackers, which would require frequent monitoring. Moreover, as the number of devices in the IoT environment grows, existing mechanisms miss more botnets because of their functional complexity. To overcome these issues in detecting botnet attacks, this work proposes a Bootstrap Aggregating Surflex-PSIM Classifier. It gathers data from several sensor nodes, which are then preprocessed using a Linear Random Euler complex-valued Filter (LRECF). The linearized data are then passed to a training phase based on a Random Poisson Forest (RPF), which accurately predicts, in less time, the botnets that create Distributed Denial of Service (DDoS) and spam attacks. After training, similar botnets are clustered using Surflex-PSIM, which isolates the botnet-attacked clusters based on an automatically trained characteristic pocket value. With the aid of the proposed classifier, botnets are thus detected and isolated with high accuracy in reduced time, which ensures system reliability and improves system performance.

Keywords

Random Poisson forest; Linear Random Euler complex-valued filter; PSIM (protein similarity)

1 Introduction

With the recent rapid development of the IoT, there has been increasing interest in understanding emerging cyber threats in IoT. IoT devices are extremely vulnerable and attractive to attackers because of their highly heterogeneous components, naive security configurations, and weak encryption verification [1]. IoT nodes have limited resources and use dedicated, diversified communication protocols, and some of these differences weaken their ability to protect themselves. IoT connects smart things, such as intelligent devices and sensors, to the Internet [2]. The data collected by smart things are sent to a central cloud-based service that processes all the gathered data and shares them with users [3].

Botnet attacks are considered one of the biggest challenges that security researchers and analysts face today on an international scale. The term bot derives from robot: a bot acts according to a program or script written by the botmaster [4]. Botnets continue to be a significant source of large-scale attacks on the Internet, with recent increases in the volume of attack traffic [5]. A typical botnet platform is composed of command and control (C&C) servers, an infection vector to create new bots, and the bots (compromised hosts) [6]. Botnets are used for a variety of purposes, including distributed denial of service (DDoS) attacks, spamming, and phishing attacks [7].

Existing botnet detection techniques use two different methodologies to identify the attacks caused by botnets, namely honeynet-based detection and intrusion detection systems. Once a botnet is detected, impact-reduction techniques, such as proactive and reactive defense techniques, are used to cure the infections it has caused [8]. It is important to detect botnets at their initial stages and to prevent the attacks they cause in order to protect the cloud network.

Analyzing the behavior of botnets in network devices helps to predict the nature of their attacks on the network [9]. Behavior-based monitoring of such data is classified into two types, namely active analysis and passive analysis. Active analysis detects possible malware and deactivates it; however, the botmaster may directly or indirectly become aware of the detection activity. Honeypots and honeynets are the common active analysis methods used for botnet detection [10].

Honeypots are intentionally vulnerable systems deployed to detect intrusions and attacks in the Internet of Things environment. Based on their emulation capacity, honeypots are classified into two types, namely high-interaction and low-interaction. A high-interaction honeypot simulates almost all features of a real operating system [11]. A low-interaction honeypot simulates only the key features of the real operating system and responds only to known ports and protocols.

A high-interaction honeypot allows attackers to gain full control of the operating system, whereas a low-interaction honeypot prevents them from gaining full control because of the limited features it offers [12]. Honeypots capture information such as bot signatures for content-based detection, details of the botnet C&C mechanisms and servers, unknown security holes that enable bots to penetrate the network, the tools and techniques that attackers use, and the attackers' motivation [13]. Nepenthes is a low-interaction honeypot that emulates known vulnerabilities in order to collect malware binaries. Honeypot-based botnet detection generates a report on the detected botnet to provide a better understanding of its consequences [14].

Although honeypots provide better knowledge of botnet characteristics and technology, infections caused by botnets are not always detected. Passive analysis monitors the traffic created by the botnet without corrupting or changing the messages, mainly analyzing the secondary effects of the botnet traffic. Its advantage is that botmasters are unable to perceive the detection activity, but it is less accurate in detecting botnet attacks than active analysis [15]. Deep learning is a subfield of machine learning that studies artificial neural networks and similar machine learning algorithms with more than one hidden layer.

IoT is a distinctive network with a large number of applications, in which traffic and privacy concerns can arise and the degradation of a single system can bring down the whole structure. Hackers likewise intrude into the system by placing botnets that degrade its operation. It is therefore essential to detect botnets accurately and to build a framework that removes them and secures the system by disabling the specific malware in which each botnet is placed. By ensuring security through efficient botnet detection and isolation, system accuracy can be improved considerably, and the large volume of IoT data can be accessed without intrusion. However, in high-dimensional data systems botnets are hard to eliminate accurately, which leads to inaccurate detection, and delays in detecting botnets further degrade the system.

Hence, to address the issues in botnet detection, it is necessary to gather and analyze the relative behavior of botnets and the root causes of DDoS attacks, spam attacks, and similar attacks. Accordingly, there is a need for a detection system that tackles these issues while assuring reliability. In the proposed work, a novel Bootstrap Aggregating Surflex-PSIM Classifier is therefore introduced. It uses LRECF and RPF to preprocess and train on the data gathered from various sensors, followed by clustering and botnet isolation with the Surflex-PSIM classifier. The system thus provides reliable and accurate botnet detection with minimal delay, which reduces the time complexity issues faced by existing detection systems.

The major contribution of our work includes:

Proposing a Bootstrap Aggregating Surflex-PSIM Classifier to detect botnets accurately with reduced time complexity.

Preprocessing the gathered data with LRECF.

Training on the linearized data using RPF.

Isolating botnets with the Surflex-PSIM classifier.

Ensuring system reliability with high prediction/detection accuracy.

The rest of the paper is organized as follows: Section 2 reviews related research; Section 3 explains the proposed methodology and its execution; Section 4 discusses the results obtained with the proposed methodology and briefly compares them with conventional techniques; and Section 5 concludes the paper.

2 Related research

Yair Meidan, Michael Bohadana and Yuval Elovici [16] proposed a novel network-based anomaly detection method that extracts behavior snapshots of the network and uses deep autoencoders to detect anomalous network traffic originating from compromised IoT devices. It relies on a deep autoencoder for each device, trained on statistical features extracted from benign traffic data. When the autoencoder is applied to new data from an IoT device, detected anomalies may indicate that the device has been compromised.

Vitor Hugo Bezerra et al. [17] presented a host-based approach to detect botnets in IoT devices, named IoTDS (Internet of Things Detection System). It relies on one-class classifiers, which model only legitimate device behavior in order to detect deviations, avoiding a manual labelling process. The solution is underpinned by a novel agent-manager architecture based on HTTPS, which prevents the IoT device from being overloaded by training activities. However, it is less resilient.

Yair Meidan, Michael Bohadana and Asaf Shabtai [18] applied machine learning algorithms to network traffic data to accurately identify IoT devices connected to a network. To train and evaluate the classifier, they collected and labeled network traffic data from nine distinct IoT devices as well as PCs and smartphones. Using supervised learning, they trained a multi-stage meta-classifier: in the first stage, the classifier distinguishes between traffic generated by IoT and non-IoT devices; in the second stage, each IoT device is associated with a specific IoT device class.

Sajad Homayoun, Marzieh Ahmadzadeh and Sattar Hashemi [19] proposed a deep learning-based botnet traffic analyzer called Botnet Traffic Shark (BoTShark). BoTShark uses only network transactions and does not rely on deep packet inspection, thereby avoiding inherent limitations such as the inability to handle encrypted payloads. This also allows the system to detect correlations between the original features and to extract new features in every layer of an autoencoder or a convolutional neural network (CNN) in a cascading manner. In addition, it uses a softmax classifier as the predictor to detect malicious traffic efficiently.

Farooq Shaikh, Elias Bou-Harb and Jorge Crichigno [20] presented a model that uses machine learning (ML) to identify unwanted IoT devices in enterprises. Specifically, IP header data from darknet traffic were collected for analysis, and multiple supervised ML algorithms were then evaluated to classify these Layer 3 headers. The results showed that Random Forest and Gradient Boosting achieved high recall and precision, while Naive Bayes performed worst.

Kirubavathi Venkatesh and Anitha Nadarajan [21] detected the SpyEye and Zeus botnets with an adaptive learning rate multilayer feed-forward neural network. In this work, classifiers such as decision tree, random forest, and radial basis function were discussed and compared with the adaptive learning rate neural network.

Kamaldeep Singh et al. [22] built a random forest-based decision tree model to detect botnets in peer-to-peer networks. Although the method performed well in detecting botnets, it failed to detect botnets with low-frequency communication once a certain threshold was exceeded.

Thus, [16] enables effective detection of malicious activities among nodes but fails to obtain information from large traffic data sets; [17, 18] achieve high recognition and classification ratios, yet their time complexity is very high; [19] fails to deal with large data sets; [20] suffers from time complexity issues; and [22] does not detect botnets once a certain threshold is exceeded. Because these issues make IoT systems less reliable, it is indispensable to develop a new technique in the field of IoT that detects and isolates botnets accurately and efficiently, so as to enhance the reliability of the IoT network.

3 Isolation of botnet attacks using PSIM based on the random Poisson forest model

Botnet attacks are carried out by a group of compromised nodes. To detect the compromised nodes accurately, this work proposes a Bootstrap Aggregating Surflex-PSIM Classifier that traces and clusters the botnet attack. The overall architecture of the IoT network under botnet attack is shown in Fig. 1.

Fig.1

Overall architecture of the IoT network with botnet attack.

Initially, the data to be stored are gathered from the sensor network; they include both linear and non-linear data. Effective preprocessing techniques are required to remove unwanted data. In the proposed system, the Linear Random Euler complex-valued filter (LRECF) linearizes the dataset using Euler distance-valued filtering and prevents the botnet features from being lost. The preprocessed, linearized data set is then trained with the Random Poisson forest algorithm, which applies the general technique of bootstrap aggregation: it repeatedly draws random samples with replacement from the training set over a given number of rounds. Based on the trained data, similar botnets are then clustered by means of Surflex-PSIM, which isolates botnet attacks as clusters based on an automatically trained characteristic pocket value derived from the Surflex characteristics of the attacks. Clustering remains accurate even for large input datasets, so the time-consuming analysis of individual botnets for removal can be avoided, yielding accurate and fast botnet detection. The process flow of the proposed work is shown in Fig. 3.

3.1 Data gathering phase

IoT-based applications rely on context-aware computing, and data are gathered using sensor nodes $S = (s_{1}, s_{2}, \dots, s_{n})$. The collected resources are denoted $G = \{iot_{1}, iot_{2}, iot_{3}, \dots, iot_{n}\}$. Because the data are collected directly from the sensor nodes, they also include raw data, so data preprocessing techniques are required to remove unwanted data. Since sensor nodes deliver real-time information, linear filtering is adopted to preprocess the data.

3.1.1 Sensors used in Internet of Things (IoT)

As the importance of IoT rises day by day, so does that of its sensors. A sensor is designed to measure an external physical stimulus and to record, indicate, or respond to it in a form that can be read by a user or another device. The most commonly used sensors in the Internet of Things are described as follows:

Temperature sensors measure the temperature or heat of a given medium. Contact-type temperature sensors require physical contact with the object, while non-contact types do not, since they detect liquids or gases that emit radiant energy, such as a spike in heat or temperature. Commercially available highly sensitive semiconductor sensors can monitor and display slight variations in temperature. Such a sensor allows a central system to monitor the gathered data remotely. However, to gain Internet access, the temperature sensor must be connected to a local network. If Wi-Fi is used, outside attackers have an entry point; once they are in, they can make the sensor send any data they wish.

Proximity sensors are well suited to detecting motion and are used to avoid obstacles when navigating crowded places or complex routes, which also makes them a good choice for map building. They are widely used in security, safety, and efficiency applications.

Pressure sensors measure the pressure exerted by a gas or liquid, converting the physical force into an electrical signal. They can also be used to measure other variables, such as speed and altitude. Barometers and pressure gauges are common examples in IoT systems. Barometers are helpful in weather forecasting because they give accurate measurements of ambient air pressure, while pressure gauges are mostly used at industrial sites because they are well suited to monitoring pressure in closed environments. Pressure sensors suit IoT devices well because they can be used in many areas, such as touch-screen devices, biomedical devices, automotive systems, and manufacturing.

Optical sensors detect electromagnetic energy such as light. They rely on the photoelectric effect and can emit, receive, and convert light energy into an electrical signal. The IoT interface of a fiber-optic sensor is connected to the Internet and can collect related information to monitor different parameters, so the sensor acts as a major physical device in an IoT system.

All of the sensors mentioned above are subject to vulnerabilities exploited by botnets such as Mirai, Persirai, BrickerBot, and HideNSeek. Consequently, the initial propagation process tends to be revealed only in hindsight, after an existing infection has been identified. Researchers have therefore modelled several approaches, based on their understanding of the technology and their experience with historical attacks, to predict propagation dynamics and to explore influential factors.

3.2 Removing complex valued variable

Because the IoT environment relies on context-aware computing and botnets carry out various activities, the data sensed by the sensor nodes contain complex-valued variables. To remove non-linear data, the whole obtained dataset is transformed with respect to the origin of the time axis; the non-linearity that arises during this conversion is rectified using the Linear Random Euler complex-valued filter (LRECF), shown in Fig. 2. The filter linearizes the dataset using Euler distance-valued filtering and prevents the botnet features from being lost.

Fig.2

Complex valued linear filtering.

Fig.3

Process flow of the proposed system.

A complex-valued variable C is defined as $C = C_{R} + i C_{I}$ (1)

where C = S(G), $C_{R}$ and $C_{I}$ are the real and imaginary parts of C, and $i = \sqrt{-1}$ is the imaginary unit. The probability density function of a complex-valued random variable is defined by the joint probability density function of its real and imaginary parts: $p(C) = p(C_{R}, C_{I})$ (2)

The expectation of the complex-valued random variable is defined as $E (C) = E (C_{R}) + iE (C_{I})$ (3)

A complex-valued random variable is said to have zero mean when the expectations of its real and imaginary parts are zero: $E(C_{R}) = E(C_{I}) = 0$ (4)

In the filtering system, pairs of samples $u^{s}$ and $v^{s}$ from S, where S = E(c), are given for training, and the error is $e^{s} = v^{s} - y^{s}$, where $y^{s}$ denotes the estimated (filter) output. The cost function used for filtering is the mean square error (MSE) $E(e^{s} (e^{s})^{*})$. The weight vectors of the learning system are updated by minimizing the MSE with the complex gradient descent method. Initially, the cost function is 0. When the second sample enters the learning model, the cost function is calculated as the MSE $E(e^{s} (e^{s})^{*})$ and the weights are updated using Equation (5); the cost function is then calculated for every subsequent instance, and the weights w are updated accordingly. Here l denotes the current state of the complex learning system and η the learning rate (step size). $w(l + 1) = w(l) + \eta E(e^{s}(l) u^{s})$ (5)
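Equation (5) has the shape of the complex least-mean-squares (CLMS) rule. The Python below is a minimal sketch, not the authors' LRECF: it adapts a linear complex filter by stochastic gradient descent on the instantaneous squared error, which stands in for the expectation in Equation (5), and, following the standard CLMS derivation, it multiplies the error by the conjugated input. The function name `clms_filter`, the step size `eta`, the number of taps, and the test system `h` are illustrative assumptions.

```python
import random


def clms_filter(inputs, desired, n_taps, eta=0.05):
    """Complex LMS: adapt weights w so that y(l) = sum_k w[k] * u_k(l) tracks v(l).

    Returns the final weights and the error sequence e(l) = v(l) - y(l).
    """
    w = [0j] * n_taps
    errors = []
    for l in range(n_taps - 1, len(inputs)):
        u = [inputs[l - k] for k in range(n_taps)]   # tap vector, newest sample first
        y = sum(wk * uk for wk, uk in zip(w, u))     # filter output y(l)
        e = desired[l] - y                           # error e(l)
        # Gradient step on |e|^2 = e * conj(e); the input enters conjugated.
        w = [wk + eta * e * uk.conjugate() for wk, uk in zip(w, u)]
        errors.append(e)
    return w, errors


# Usage: identify a hypothetical 2-tap complex system h from noise-free data.
rng = random.Random(0)
h = [0.5 + 0.5j, -0.3j]
x = [complex(rng.gauss(0, 0.7), rng.gauss(0, 0.7)) for _ in range(2000)]
d = [h[0] * x[l] + (h[1] * x[l - 1] if l > 0 else 0) for l in range(len(x))]
w, errs = clms_filter(x, d, n_taps=2)
```

With noise-free data and a small step size, the weights converge to `h` and the error decays towards zero; a larger `eta` speeds up convergence but can make the update unstable.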

The probability density function of the complex-valued error is given as $p(e) = p(e_{R}, e_{I})$ (6)

The entropy of the complex-valued error is defined as $H(e) = H(e_{R}, e_{I}) = -E\{\log p(e_{R}, e_{I})\}$ (7)

Using Equation (7), the data $(r_{1}, \dots, r_{n})$ with the lowest error entropy are selected and passed to the training phase.
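The paper does not say how the joint density in Equation (7) is estimated. As a minimal sketch, the Python below assumes a 2-D histogram of the real and imaginary parts with a fixed bin width, so that entropies of different records are comparable, and keeps the k records with the lowest estimate. `error_entropy`, `select_low_entropy`, `bin_width`, and the sample error lists are hypothetical.

```python
import math
from collections import Counter


def error_entropy(errors, bin_width=0.1):
    """Plug-in estimate of H(e) = -E[log p(e_R, e_I)] from a 2-D histogram.

    The joint density of the real and imaginary parts is approximated by
    counts in square bins of side `bin_width` (an assumed tuning parameter).
    """
    counts = Counter((math.floor(e.real / bin_width), math.floor(e.imag / bin_width))
                     for e in errors)
    n = len(errors)
    area = bin_width ** 2
    # (c / n) / area approximates the density inside each occupied bin.
    return -sum((c / n) * math.log((c / n) / area) for c in counts.values())


def select_low_entropy(error_sets, k):
    """Return the keys of the k records whose error entropy is lowest."""
    ranked = sorted(error_sets, key=lambda key: error_entropy(error_sets[key]))
    return ranked[:k]


# Hypothetical filter errors for two records: one tightly concentrated, one spread out.
tight = [complex(0.01 * i, -0.01 * i) for i in range(10)]
spread = [complex(i, i) for i in range(10)]
chosen = select_low_entropy({"r1": spread, "r2": tight}, k=1)
```

Here `chosen` holds the concentrated record `"r2"`, since its errors fall into fewer bins and therefore have lower estimated entropy.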

3.3 Bootstrap aggregating surflex-PSIM classifier

The linearized data derived from the filter are then passed to the classifier, which has a training phase and a testing phase. The training phase uses the Random Poisson forest model, which builds a large set of decision trees and combines them to obtain an accurate prediction. In a decision tree, each internal node represents a test on an attribute, and each branch represents an outcome of that test. A node without children is called a leaf node, and every leaf node represents a class label. The main strength of this model is that, rather than searching for the best feature when splitting a node, it searches for the best feature among a random subset of features. This produces wide diversity among the trees, which generally results in a better model. The conventional random forest algorithm is quick to train but slow to predict, since a large number of decision trees slows the model down. To speed up the whole random forest process, a Poisson distribution function is adopted. The testing phase uses P-SIM, which isolates botnet attacks as clusters based on an automatically trained characteristic pocket value derived from the Surflex characteristics of the attacks. The steps of the bootstrap aggregating Surflex-PSIM classifier are depicted in Fig. 4.
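The random-feature-subset idea can be illustrated with a single split search. This sketch assumes Gini impurity as the split criterion and a `max_features` parameter for the subset size; both are common random forest choices rather than details given in the paper, and the function names and toy data are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split_random_subset(X, y, max_features, rng):
    """Search only a random subset of features for the best (feature, threshold).

    Returns (feature_index, threshold, weighted_gini), or None if no valid split exists.
    """
    n_features = len(X[0])
    candidates = rng.sample(range(n_features), max_features)
    best = None
    for f in candidates:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# Toy data: feature 0 separates the classes perfectly, feature 1 does not.
X = [[1, 5], [2, 3], [8, 4], [9, 6]]
y = [0, 0, 1, 1]
best = best_split_random_subset(X, y, max_features=2, rng=random.Random(0))
```

With `max_features` equal to the number of features, the search sees every feature and finds the pure split on feature 0 at threshold 2; smaller values make each tree see a different random subset, which is where the diversity between trees comes from.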

Fig.4

Random classifier.

3.3.1 Random training model based on Poisson distribution

The Random Poisson forest counts the number of events and the times at which these events occur in a given time interval, and thereby achieves a better prediction rate during the training phase. The pseudocode for training, which applies the general technique of bootstrap aggregating, is given in Section 3.4, and the training process is shown in Fig. 5. Initially, the linearized data set $R = \{r_{1}, \dots, r_{n}\}$ obtained via Equation (7), with responses $S = \{s_{1}, s_{2}, \dots, s_{n}\}$, is subjected to bagging, which repeatedly draws random samples with replacement from the training set, a number of times set by the Poisson distribution. The linear regression for the trained data set is defined as $S = b_{0} + b_{1} r_{1} + b_{2} r_{2} + \dots + b_{n} r_{n}$ (8)

Fig.5

Training process with random Poisson forest.

where b denotes the regression coefficients, S the trained data set, and r the input variables. To predict the unseen samples in the data set, Poisson regression is applied, which speeds up the prediction process: $\log[E(S)] = b_{0} + b_{1} r_{1} + b_{2} r_{2} + \dots + b_{n} r_{n} + \log(t)$ (9)

where log(t) is the offset term, needed because Poisson regression counts events over a fixed observation time, and t is the observed time period. If S follows a Poisson distribution, the probability of observing i events over the time period is $p(S = i) = \frac{\lambda^{i} e^{-\lambda}}{i!}$ (10)

where λ is the expected value (mean) of S and e is the base of the natural exponential.
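Equations (9) and (10) can be checked numerically. The sketch below computes λ = E(S) from the log-linear model with the offset log(t) and then the Poisson probabilities; the coefficients, inputs, and observation time are hypothetical values chosen only for illustration.

```python
import math


def poisson_mean(b, r, t):
    """Eq. (9): lambda = E(S) = exp(b0 + b1*r1 + ... + bn*rn + log t)."""
    linear = b[0] + sum(bj * rj for bj, rj in zip(b[1:], r))
    return math.exp(linear + math.log(t))


def poisson_pmf(i, lam):
    """Eq. (10): p(S = i) = lam**i * exp(-lam) / i!."""
    return lam ** i * math.exp(-lam) / math.factorial(i)


# Hypothetical coefficients b0..b2, inputs r1..r2 and observation time t.
b = [0.2, 0.5, -0.1]
r = [1.0, 2.0]
lam = poisson_mean(b, r, t=5.0)
probs = [poisson_pmf(i, lam) for i in range(60)]
```

Because the offset enters additively on the log scale, doubling t doubles λ. The probabilities sum to 1 (up to rounding), and their mean equals λ, which is a quick consistency check of Equation (10).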

The predictions of the individual regression trees are then averaged: $F' = \frac{1}{T} \sum_{t=1}^{T} F_{t}(E(S))$ (11)

where F′ is the prediction for the unseen samples, T is the number of individual regression trees averaged, and E(S) is the Poisson-model expectation of the trained data set, which reduces the training time.
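The paper does not spell out how the Poisson distribution enters the bagging step. One well-known option, shown here as an assumption, is the Poisson bootstrap used in online bagging: each training point appears in a tree's sample k times with k ~ Poisson(1), which approximates sampling with replacement. The sketch uses one-feature regression stumps as stand-in trees and averages them as in Equation (11). `poisson_bagging`, `fit_stump`, `poisson_draw`, and the toy data are hypothetical.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's method: sample k ~ Poisson(lam) (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def fit_stump(xs, ys, ws):
    """Weighted regression stump on one feature; returns a predictor function."""
    best = None
    for t in sorted(set(xs)):
        left = [(y, w) for x, y, w in zip(xs, ys, ws) if x <= t and w > 0]
        right = [(y, w) for x, y, w in zip(xs, ys, ws) if x > t and w > 0]
        if not left or not right:
            continue
        ml = sum(y * w for y, w in left) / sum(w for _, w in left)
        mr = sum(y * w for y, w in right) / sum(w for _, w in right)
        sse = (sum(w * (y - ml) ** 2 for y, w in left)
               + sum(w * (y - mr) ** 2 for y, w in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # degenerate resample: fall back to the weighted mean
        mean = sum(y * w for y, w in zip(ys, ws)) / max(sum(ws), 1)
        return lambda x: mean
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr


def poisson_bagging(xs, ys, n_trees, seed=0):
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        ws = [poisson_draw(1.0, rng) for _ in xs]  # bootstrap counts ~ Poisson(1)
        trees.append(fit_stump(xs, ys, ws))
    # Eq. (11): average the predictions of the individual trees.
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


xs = list(range(20))
ys = [0.0 if x < 10 else 10.0 for x in xs]
model = poisson_bagging(xs, ys, n_trees=50)
```

Drawing one Poisson count per point lets each tree's sample be generated independently (even from a stream), which is the practical appeal of this variant over classical resampling.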

3.4 Pseudo code for random Poisson forest

Input: training sample S, base classifier F, number of iterations I

Output: final classifier F′

Initialization: set the sample weights weight(r_j) and the weightage value m

For i = 1 to I

S_i ← sample drawn from S according to the Poisson distribution

F_i ← classifier F trained on S_i, with error $e_{i} = \frac{1}{m} \sum_{r_{j} \in S_{i} : F_{i}(r_{j}) \neq y_{j}} weight(r_{j})$

$\beta_{i} = \frac{e_{i}}{1 - e_{i}}$

$weight(r_{j}) = weight(r_{j}) \, \beta_{i}$ for every $r_{j}$ with $F_{i}(r_{j}) = y_{j}$

End for

$F'(x) = \arg\max_{y} \sum_{i : F_{i}(x) = y} \log(1/\beta_{i})$
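Read as code, the loop above resembles AdaBoost.M1 (error e_i, factor β_i = e_i/(1 - e_i), weights of correctly classified samples multiplied by β_i, final vote weighted by log(1/β_i)) combined with Poisson resampling. The sketch below is one possible reading under stated assumptions: each round draws count(r_j) ~ Poisson(m · weight(r_j)) with normalized weights, fits a decision stump, computes e_i over the full training set as in standard AdaBoost.M1, and discards (and redraws) rounds with e_i ≥ 0.5. All names and the toy data are hypothetical.

```python
import math
import random
from collections import defaultdict


def poisson_draw(lam, rng):
    """Knuth's method for k ~ Poisson(lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def fit_stump_classifier(X, y, counts):
    """Choose (feature, threshold, left_label, right_label) with fewest count-weighted errors."""
    labels = sorted(set(y))
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for a in labels:
                for b in labels:
                    err = sum(c for row, lab, c in zip(X, y, counts)
                              if (a if row[f] <= t else b) != lab)
                    if best is None or err < best[0]:
                        best = (err, f, t, a, b)
    _, f, t, a, b = best
    return lambda row: a if row[f] <= t else b


def poisson_boost(X, y, iterations, seed=0):
    rng = random.Random(seed)
    m = len(X)
    weight = [1.0 / m] * m                    # normalized sample weights
    ensemble = []                             # (F_i, beta_i) pairs
    for _ in range(iterations):
        # S_i: Poisson counts with mean m * weight(r_j) (assumed resampling scheme).
        counts = [poisson_draw(m * w, rng) for w in weight]
        F_i = fit_stump_classifier(X, y, counts)
        wrong = [F_i(row) != lab for row, lab in zip(X, y)]
        e_i = sum(w for w, bad in zip(weight, wrong) if bad)
        if e_i >= 0.5:                        # no better than chance: discard round
            continue
        e_i = max(e_i, 1e-10)                 # avoid log(1/0) after a perfect round
        beta = e_i / (1.0 - e_i)
        # Shrink the weights of correctly classified samples, then renormalize.
        weight = [w if bad else w * beta for w, bad in zip(weight, wrong)]
        total = sum(weight)
        weight = [w / total for w in weight]
        ensemble.append((F_i, beta))

    def predict(row):
        votes = defaultdict(float)
        for F_i, beta in ensemble:
            votes[F_i(row)] += math.log(1.0 / beta)  # vote weight log(1/beta_i)
        if not votes:
            raise ValueError("no weak learner was accepted")
        return max(votes, key=votes.get)

    return predict


X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 1, 1, 1]
model = poisson_boost(X, y, iterations=10, seed=0)
```

Redrawing instead of stopping when e_i ≥ 0.5 is a common choice for boosting by resampling, since a bad round is often caused by an unlucky sample rather than by the data.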

3.4.1 Mass clustering based on P-SIM clustering

After training, similar botnets are clustered using Surflex-PSIM, which exploits their repetitive structure and isolates botnet attacks as clusters based on the automatically trained characteristic pocket value.

In the testing phase, the output of the training phase from Equation (11) is used for clustering based on the similarity of botnet behavior. The main idea of this approach is to cluster similar types of botnets among the authenticated smart objects involved in the IoT-based network. The following formulae are used to measure the similarity of various botnets. The overall PSIM clustering process is given by the pseudocode in Section 3.5 and illustrated in Fig. 6. $C_{u, v} = \{ M_{u, v} \mid (\forall M_{u', v'} \in C_{u, v}) \wedge (M_{u', v'} \neq M_{u, v}) \}$ (12) $(M_{u', v'} \neq M_{u, v}) \Rightarrow (u' \not\subset u) \vee (v' \not\subset v), \quad |M_{u, v}| \geq L$ (13)

Fig.6

Flow diagram for clustering of botnets.

Let U and V be two network parameter sequences belonging to the IoT family F, and let u and v be identical subsequences of U and V, respectively. $M_{u,v}$ represents the matched subsequence of Surflex characteristics shared by u and v, and L is the minimum length that this match must have. $C_{u,v}$ is the set of matched parameter values $M_{u,v}$ used by the similarity function.

The matching set $C_{u,v}$ includes all matched subsequences of maximum length between the sequences u and v, and ⊄ indicates that one type of botnet is not included in another cluster. Every matched parameter value must satisfy $|M_{u,v}| \geq L$, since each $M_{u,v}$ in $C_{u,v}$ is an extension of matched parameters of length L; all matched network parameter values of length L can therefore be gathered in linear time. A weightage value is then assigned to each matched parameter value to distinguish it from those of other authenticated users: $W(M) = \sum_{i=1}^{|M|} T[M[i], M[i]]$ (14)

where M[i] is the i-th botnet of the matched parameter value M, T is the substitution matrix, and T[M[i], M[i]] is the weightage value of that botnet. For the pair of parameter values U and V, the matching score $S_{u,v}$ is defined as $S_{u,v} = \frac{\sum_{M \in C_{u,v}} W(M)}{\max(|u|, |v|)}$ (15)

Let $S_{max}$ be the matching score of the largest network parameter value in the IoT-supported network. The maximum matching score is defined by $S_{max} = \{ S_{u,v} : |u| = \max\{ |v| : V \subset F \} \}$ (16)

Finally, the similarity between the two parameters U and V is obtained by dividing the matching score by this maximum value, and botnets are clustered based on the resulting similarity measure.
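The scoring and normalization steps can be sketched as follows, assuming that each matched element contributes its self-substitution score T[x, x] and taking S_max as the largest score among the compared pairs, which is a simplified reading of Equation (16). The feature tokens (`SYN`, `GET`, `ACK`), the substitution matrix `T`, the matching sets, and the node pairs are hypothetical.

```python
def match_weight(match, T):
    """Eq. (14): W(M) = sum over matched elements x of T[x, x]."""
    return sum(T[(x, x)] for x in match)


def matching_score(C_uv, len_u, len_v, T):
    """Eq. (15): sum of W(M) over the matching set, divided by max(|u|, |v|)."""
    return sum(match_weight(M, T) for M in C_uv) / max(len_u, len_v)


# Hypothetical substitution matrix (diagonal entries only) over traffic tokens.
T = {("SYN", "SYN"): 2.0, ("GET", "GET"): 1.0, ("ACK", "ACK"): 1.0}

# Hypothetical matching sets C_{u,v} with sequence lengths |u| and |v|.
pairs = {
    ("n1", "n2"): ([["SYN", "GET"], ["ACK", "GET"]], 6, 5),
    ("n1", "n3"): ([["ACK", "ACK"]], 6, 8),
}
scores = {pair: matching_score(C, lu, lv, T) for pair, (C, lu, lv) in pairs.items()}
s_max = max(scores.values())
# Similarity = matching score divided by the maximum score.
sims = {pair: s / s_max for pair, s in scores.items()}
```

Pairs whose normalized similarity is close to 1 would then be grouped into the same botnet cluster; the cut-off for "close" is a design choice the text leaves open.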

3.5 Pseudo code for P-SIM clustering

Input: parameter sequences u and v, minimum match length L

Output: matching set C

M: current run of matched parameter values (initially empty); C: matching set (initially empty)

For i = 0 to max(|u|, |v|) - 1

k = 0, j = i

While (k < |u| and j < |v|)

If (u[k] = v[j]) then add the botnet u[k] to M

Else: if (|M| ⩾ L) add M to C; empty M

Increment k, increment j

End while

If (|M| ⩾ L) add M to C

Empty M

k = i, j = 0

While (k < |u| and j < |v|)

If (u[k] = v[j]) then add the botnet u[k] to M

Else: if (|M| ⩾ L) add M to C; empty M

Increment k, increment j

End while

If (|M| ⩾ L) add M to C

Empty M

End for
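The diagonal scan in the pseudocode above can be written as runnable Python. This sketch keeps runs of length at least L, as required by $|M_{u,v}| \geq L$, and scans each diagonal of the u-v comparison grid exactly once (the main diagonal is not visited twice), so the total work is proportional to |u|·|v|. `matched_runs` and the example sequences are hypothetical.

```python
def matched_runs(u, v, L):
    """Collect maximal runs where u[k] == v[j] along every diagonal of the
    u-v comparison grid, keeping runs of length >= L (the matching set C)."""
    C = []

    def scan(k, j):
        run = []                          # current matched run M
        while k < len(u) and j < len(v):
            if u[k] == v[j]:
                run.append(u[k])
            else:
                if len(run) >= L:
                    C.append(run)
                run = []
            k += 1
            j += 1
        if len(run) >= L:                 # run reaching the end of a sequence
            C.append(run)

    for i in range(len(v)):               # diagonals starting at (u[0], v[i])
        scan(0, i)
    for i in range(1, len(u)):            # diagonals starting at (u[i], v[0])
        scan(i, 0)
    return C


C = matched_runs("ABCDXAB", "ZABCDY", L=2)
```

Here `C` contains the run `ABCD` (u shifted against v by one position) and the shorter run `AB` found on another diagonal.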

By clustering botnet attacks according to the similarity values computed from the training-phase data set, all kinds of attacks can be captured and eliminated, which enhances the reliability of the network in the IoT environment. Because the proposed classifier includes the random Poisson forest, which counts the number of events and the times at which they occur in a given time interval, it achieves a better prediction rate during the training phase; after training, similar botnets are clustered using Surflex-PSIM, which isolates botnet attacks as clusters based on the automatically trained characteristic pocket value derived from the Surflex characteristics of the attacks.

4 Results

A botnet setup that simulates the behavior of existing real-world botnets was created in our SSE lab network. Seven systems were used for the experiments, namely one botmaster, five zombies, and one command and control (C&C) server, as shown in Fig. 7. The physical topology is a star, and the Ethernet links are 100BASE-TX. TCP connection features are extracted from the network traffic by sampling at 5-second intervals. Bot traces were collected for 2 hours a day, five days a week. Similarly, normal web traffic was collected from the National Knowledge Network, with a bandwidth of 100 Mbps. The working mechanism is as follows:

Fig.7

System model.

The botmaster sends a Trojan or backdoor to the zombies using an email spamming technique. Through the backdoor, the botmaster installs HTTP bot binaries on these machines. Each victim machine then periodically communicates with the C&C server and follows the botmaster's instructions without the user's knowledge. In this work, bots that cause DDoS and spam attacks are used.
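The setup description states only that TCP connection features are sampled at 5-second intervals. As an illustrative sketch, the Python below buckets connection records into 5-second windows and computes three per-window counts; the record layout, the IP addresses, and the chosen features are assumptions, not the features used in the experiments.

```python
from collections import defaultdict


def window_features(connections, interval=5.0):
    """Aggregate TCP connection records into fixed windows of `interval` seconds.

    Each record is (timestamp_s, src_ip, dst_ip, n_bytes), a hypothetical layout.
    Returns {window_index: {"conns": ..., "dsts": ..., "bytes": ...}}.
    """
    buckets = defaultdict(list)
    for ts, src, dst, n_bytes in connections:
        buckets[int(ts // interval)].append((src, dst, n_bytes))
    features = {}
    for w, recs in sorted(buckets.items()):
        features[w] = {
            "conns": len(recs),                          # connections in the window
            "dsts": len({dst for _, dst, _ in recs}),    # distinct destinations
            "bytes": sum(b for _, _, b in recs),         # total bytes exchanged
        }
    return features


conns = [
    (0.5, "10.0.0.2", "10.0.0.9", 400),
    (3.2, "10.0.0.3", "10.0.0.9", 380),
    (7.9, "10.0.0.2", "10.0.0.7", 1200),
]
features = window_features(conns)
```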

The performance of the proposed system is evaluated based on the clustering ratio for different types of botnet attacks. In a botnet attack, a group of attackers acts together with the aim of disrupting the whole network. Two types of botnet attacks are considered here: distributed denial of service (DDoS) attacks and spam attacks. A denial of service (DoS) attack is a cyberattack in which the perpetrator tries to make a machine or network resource unavailable to its intended users by temporarily or indefinitely disrupting the services of a host connected to the Internet; in a DDoS attack, the disruptive traffic comes from many sources.

Email spam is the electronic form of junk mail: unsolicited messages, often advertising, sent to large numbers of recipients. Spam is a serious security concern because it can be used to deliver Trojan horses, viruses, worms, spyware, and targeted phishing attacks. The main difference between a general attack and a botnet attack is that in a general attack one or two attackers carry out different operations to disturb the normal flow of the network, whereas in a botnet attack a group of attackers with the same intention carry out the same operation to destroy the reliability of the network completely.

4.1 Implementation

The proposed system for the IoT-based network is implemented in Python, as shown in Fig. 8.

Fig.8

Shows the IoT based network.

4.1.1 Packet delivery ratio

The Packet Delivery Ratio (PDR) is computed from the numbers of received and generated packets recorded in the trace file. In general, PDR is defined as the ratio of the packets received by the destination to the packets generated by the source. Figure 9 shows the packet delivery ratio during the normal period and during the botnet attack period.
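The PDR definition reduces to a single ratio. The sketch reports it as a percentage, which is an assumption about the scale used in Table 1, and the packet counts are hypothetical.

```python
def packet_delivery_ratio(received, generated):
    """PDR (%) = packets received by the destination / packets generated by the source * 100."""
    if generated == 0:
        raise ValueError("no packets were generated")
    return 100.0 * received / generated


pdr = packet_delivery_ratio(received=940, generated=1000)
```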

Fig.9

Shows the ratio of packet delivery during normal and attack period.

4.1.2 Packet lost ratio

Packet loss occurs when one or more packets traversing a computer network fail to reach their destination. It is measured as the percentage of packets lost with respect to packets sent. Figure 10 shows the packet loss ratio of the IoT-based network during the normal period and during the botnet attack period.
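Following the definition above, the loss ratio is the share of sent packets that never arrive, expressed as a percentage; the packet counts in this sketch are hypothetical.

```python
def packet_loss_ratio(sent, received):
    """Packet loss (%) = (packets sent - packets received) / packets sent * 100."""
    if sent == 0:
        raise ValueError("no packets were sent")
    return 100.0 * (sent - received) / sent


loss = packet_loss_ratio(sent=1000, received=980)
```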

Fig.10

Shows the packet loss ratio of the network during normal flow and attack.

4.1.3 Throughput

In data transmission, network throughput is the amount of data successfully transferred from a source node to a destination node in a specified time period, typically measured in bits per second (bps), for example megabits per second (Mbps) or gigabits per second (Gbps). Figure 11 shows the throughput achieved during the normal period and during the botnet attack period. The log details of each smart object in the network are given in Table 1.
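As a worked example of the definition, throughput is the number of delivered bits divided by the elapsed time; the sketch converts bytes to bits and reports Mbps, with hypothetical byte counts and durations.

```python
def throughput_mbps(bytes_delivered, seconds):
    """Throughput in Mbps: delivered bits divided by elapsed seconds, scaled by 1e6."""
    return bytes_delivered * 8 / seconds / 1e6


tp = throughput_mbps(bytes_delivered=12_500_000, seconds=2.0)
```

Here 12.5 MB delivered in 2 s corresponds to 100 Mbit in 2 s, that is, 50 Mbps.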

Fig.11

Network throughput during the normal and attack periods.

Table 1

Log details of each smart object in the network

Node	IP address	Arrival time (sec)	Packet delivery ratio	Packet loss	Throughput
n1	151.142.255.1	2.256	88.025	2.2835	56.895
n2	151.142.255.2	1.267	93.211	1.4756	54.742
n3	151.142.255.3	8.278	94.723	1.8629	55.315
n4	151.142.255.4	1.289	94.601	1.8687	56.889
n5	151.142.255.5	1.314	94.783	1.4756	53.895
n6	151.142.255.6	5.311	89.5404	1.8629	56.888
n7	151.142.255.7	4.322	93.031	2.1905	53.895
n8	151.142.255.8	3.333	95.5216	2.1905	51.2
n9	151.142.255.9	2.344	88.0122	1.4756	60.235
n10	151.142.255.10	1.355	90.5028	2.1905	53.895
n11	151.142.255.11	1.366	92.9934	1.8629	51.2
n12	151.142.255.12	9.377	95.484	2.1905	51.2
n13	151.142.255.13	8.388	87.9746	2.2835	51.2
n14	151.142.255.14	7.399	91.4652	1.9597	51.895
n15	151.142.255.15	6.441	92.9558	1.8629	50.96
n16	151.142.255.16	5.421	89.937	1.8629	53.895
n17	151.142.255.17	4.432	90.4276	1.57717	56.889
n18	151.142.255.18	3.443	92.9182	1.9598	56.888
n19	151.142.255.19	2.454	95.4088	1.8629	51.221
n20	151.142.255.20	1.465	89.8994	1.9598	56.38

4.2 Clustering of the distributed denial of service botnet

In this type of botnet attack, a group of attackers continuously sends requests for a resource to the same destination address over a specified period, so that an authenticated user cannot access that resource during that time. The proposed algorithm clusters these nodes based on the similarity of their packet sending time, destination address and the resource they repeatedly request, and it also calculates the distance between the source nodes and the destination node in order to group the attackers efficiently. The nodes clustered under the DDoS botnet are listed in Table 2 and shown in Fig. 12.
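The similarity criterion described above (same destination, same resource, near-identical sending time) can be sketched as a simple grouping step. This is not the Surflex-PSIM algorithm: it is a hypothetical illustration that uses exact matching on destination and resource plus a coarse time bucket, and it omits the source-to-destination distance term. The bucket width `time_bucket_s` and the threshold `min_sources` are assumed values.

```python
from collections import defaultdict


def group_ddos_candidates(records, time_bucket_s=0.05, min_sources=5):
    """Group requests for the same resource at the same destination sent at about the same time.

    records: iterable of (node, src_ip, send_time_s, dst_ip, resource) tuples.
    Returns {(dst_ip, resource, time_bucket): [nodes]} for every group that
    contains at least `min_sources` distinct source IP addresses.
    """
    groups = defaultdict(dict)  # key -> {src_ip: node}, insertion-ordered
    for node, src_ip, send_time_s, dst_ip, resource in records:
        bucket = int(send_time_s // time_bucket_s)  # coarse "same time" similarity
        groups[(dst_ip, resource, bucket)][src_ip] = node
    return {
        key: list(members.values())
        for key, members in groups.items()
        if len(members) >= min_sources
    }


# Rows n1-n10 of Table 2, plus one hypothetical benign request (n11) for contrast.
records = [(f"n{i}", f"151.142.255.{i}", 0.214, "151.142.250.11", "file-1") for i in range(1, 11)]
records.append(("n11", "151.142.255.11", 3.7, "151.142.250.11", "file-1"))

clusters = group_ddos_candidates(records)
for (dst, resource, _), nodes in clusters.items():
    print(dst, resource, nodes)
```

The ten Table 2 nodes fall into one group, while the lone benign request stays below `min_sources` and is not reported.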

Fig.12

Clustering of the botnet nodes that launch the distributed DoS attack.

Table 2

List of nodes clustered under the DDoS botnet attack

Node	Source IP address	Packet sending time (sec)	Destination IP address	Resource
n1	151.142.255.1	0.214	151.142.250.11	file-1
n2	151.142.255.2	0.214	151.142.250.11	file-1
n3	151.142.255.3	0.214	151.142.250.11	file-1
n4	151.142.255.4	0.214	151.142.250.11	file-1
n5	151.142.255.5	0.214	151.142.250.11	file-1
n6	151.142.255.6	0.214	151.142.250.11	file-1
n7	151.142.255.7	0.214	151.142.250.11	file-1
n8	151.142.255.8	0.214	151.142.250.11	file-1
n9	151.142.255.9	0.214	151.142.250.11	file-1
n10	151.142.255.10	0.214	151.142.250.11	file-1

In existing systems, hierarchical clustering has been used to group the attackers' devices in the IoT-based network. The main drawback of hierarchical clustering is that once a decision to merge two clusters has been made, it cannot be undone. In this work, a mixture model is used for clustering instead; because it combines distance-based and similarity-based measures, it achieves a higher clustering ratio than existing techniques.
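To illustrate why a mixture model does not suffer from irreversible merges, the sketch below fits a one-dimensional Gaussian mixture with expectation-maximisation (EM). In every iteration each point's soft membership is recomputed from scratch, so an early assignment can still change later. The paper does not specify its mixture model or features; the Gaussian components, the single feature (requests per second) and all data values here are hypothetical.

```python
import math


def fit_gmm_1d(xs, init_means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture model with expectation-maximisation (EM).

    Returns (weights, means, variances, resp) where resp[i][j] is the
    responsibility (soft membership) of component j for point xs[i].
    """
    n, k = len(xs), len(init_means)
    means = list(init_means)
    weights = [1.0 / k] * k
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, min_var)] * k
    resp = []
    for _ in range(n_iter):
        # E-step: recompute every point's soft membership from scratch, so an
        # early assignment can still change in a later iteration.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against all densities underflowing
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixture weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var
            )
    return weights, means, variances, resp


# Hypothetical per-node feature: requests per second to the same destination.
rates = [1.0, 1.2, 0.8, 1.1, 0.9, 9.8, 10.1, 10.0, 9.9, 10.2]
weights, means, variances, resp = fit_gmm_1d(rates, init_means=[2.0, 8.0])
labels = [max(range(len(r)), key=lambda j: r[j]) for r in resp]
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

A production system would more likely use a library implementation and several features; the hand-written EM loop is kept here only to make the reassignment step visible.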

4.3 Clustering of the spam botnet

Here the botnet sends emails that are delivered to the spam box instead of the inbox of the mail application. The proposed system clusters this type of botnet based on that behavior, that is, mail being delivered to the spam box rather than to the inbox. Existing techniques, being based on hierarchical clustering, cope poorly with clusters of different sizes and irregular shapes and need to break large clusters apart. In this work, mixture-based clustering is incorporated so that clusters of varied shapes can be handled; the result is shown in Fig. 13 and described in Table 3.
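The behavior-based criterion above can be sketched as: flag a sender whose mail to a destination mostly lands in the spam folder, then group flagged senders by destination. This is a hypothetical illustration, not the paper's method; it assumes per-message delivery folders are available (for example from mail-server logs), and the thresholds `min_spam_fraction` and `min_cluster_size` are assumed values.

```python
from collections import defaultdict


def cluster_spam_senders(deliveries, min_spam_fraction=0.8, min_cluster_size=3):
    """Group sender nodes whose mail mostly lands in the spam folder.

    deliveries: iterable of (node, dst_ip, folder) tuples, folder in {"inbox", "spam"}.
    Returns {dst_ip: [nodes]} for destinations targeted by at least
    `min_cluster_size` suspicious senders.
    """
    counts = defaultdict(lambda: [0, 0])  # (node, dst_ip) -> [spam count, total count]
    for node, dst_ip, folder in deliveries:
        c = counts[(node, dst_ip)]
        c[1] += 1
        if folder == "spam":
            c[0] += 1
    clusters = defaultdict(list)
    for (node, dst_ip), (spam, total) in counts.items():
        if spam / total >= min_spam_fraction:
            clusters[dst_ip].append(node)
    return {dst: nodes for dst, nodes in clusters.items() if len(nodes) >= min_cluster_size}


# Nodes n11-n20 of Table 3 each send five messages that land in spam;
# n2 is a hypothetical benign sender whose mail reaches the inbox.
deliveries = [(f"n{i}", "151.142.255.1", "spam") for i in range(11, 21) for _ in range(5)]
deliveries += [("n2", "151.142.255.1", "inbox")] * 5
print(cluster_spam_senders(deliveries))
```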

Fig.13

Clustering of the spam botnet.

Table 3

List of nodes clustered under the spam botnet attack

Node	Source IP address	Packet sending time (sec)	Destination IP address	Spam box
N11	151.142.255.11	0.214	151.142.255.1	mail
N12	151.142.255.12	0.214	151.142.255.1	mail
N13	151.142.255.13	0.214	151.142.255.1	mail
N14	151.142.255.14	0.214	151.142.255.1	mail
N15	151.142.255.15	0.214	151.142.255.1	mail
N16	151.142.255.16	0.214	151.142.255.1	mail
N17	151.142.255.17	0.214	151.142.255.1	mail
N18	151.142.255.18	0.214	151.142.255.1	mail
N19	151.142.255.19	0.214	151.142.255.1	mail
N20	151.142.255.20	0.214	151.142.255.1	mail

4.4 Comparison of proposed system with existing techniques

In this section, the proposed system is compared with existing classifiers, namely Decision Tree, Random Forest and RBF, as well as with IoTDS [17] and BoTShark [19]. The evaluation uses four parameters: precision, recall, F-measure and accuracy.

Table 4 gives the performance of the botnet detection system using various classifiers with the same set of TCP features, together with the performance of the proposed detection system in tackling DDoS and spam attacks. The table shows that the proposed work achieves higher accuracy than the other methods.

Table 4
Performance of the existing classifiers and the proposed system [21]

Classifiers	Precision	Recall	F-Measure	Accuracy (%)
IoTDS [17]	0.968	0.931	0.949	96.5333
BoTShark [19]	0.968	0.934	0.95	96.667
Proposed	0.961	0.986	0.976	99.04
Decision Tree	0.968	0.931	0.949	96.5333
Random Forest [22]	0.968	0.934	0.95	96.667
RBF	0.976	0.927	0.95	96.5333
Proposed	0.961	0.986	0.976	99.04

The performance metrics, namely precision, recall, F-measure and accuracy, are used to analyze how effectively the proposed system detects botnet attacks. Precision reflects how many of the samples flagged as botnet attacks are actual attacks, recall reflects how many botnets are correctly detected, and accuracy is the ratio of the number of correctly detected botnets to the total number of botnets.

The formulas used to calculate precision, recall, F-measure and accuracy are given below, and their graphical comparisons are shown in Figs. 14–17.

Fig.14

Comparison graph for precision.

Fig.15

Comparison graph for recall.

Fig.16

Comparison graph for F-measure.

Fig.17

Comparison graph for accuracy.

4.4.1 Precision

Precision measures the fraction of test data detected as attacks that actually belong to the attack classes: $Precision = \frac{TP}{TP + FP}$ (17) where TP is the number of true positives and FP is the number of false positives.

The proposed system, which adopts a Poisson distribution to model the number of events occurring in a given time interval, achieves a precision of 0.961. This is slightly lower than the precision of Decision Tree (0.968), Random Forest (0.968) and RBF (0.976) in Table 4, while its recall and F-measure are higher.

4.4.2 Recall

Recall measures the fraction of the attack class that was correctly detected: $Recall = \frac{TP}{TP + FN}$ (18)

where TP is the number of true positives and FN is the number of false negatives.
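Eqs. (17) and (18) can be computed directly from confusion-matrix counts. The sketch below uses hypothetical counts chosen only to make the arithmetic easy to follow; they are not results from the paper.

```python
def precision(tp, fp):
    """Eq. (17): fraction of samples flagged as attacks that really are attacks."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Eq. (18): fraction of actual attacks that were detected."""
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical counts: 90 attacks flagged correctly, 10 benign samples
# flagged by mistake, 30 attacks missed.
print(precision(tp=90, fp=10), recall(tp=90, fn=30))  # 0.9 0.75
```

Returning 0.0 when the denominator is zero is a convention chosen for this sketch; Eqs. (17) and (18) themselves are undefined in that case.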

The proposed system achieves a better recall of 0.986, whereas Decision Tree, Random Forest and RBF achieve 0.931, 0.934 and 0.927, respectively. Because the proposed system uses similarity-based clustering, it separates the individual events correctly.

4.4.3 F-Measure

The F-measure expresses a test's accuracy as the balance (harmonic mean) between precision and recall: $F\text{-}measure = \frac{2 \cdot P \cdot R}{P + R}$ (19) where P is the precision and R is the recall.
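Eq. (19) can be checked against the precision and recall values reported in Table 4. The helper name `f_measure` is illustrative.

```python
def f_measure(p, r):
    """Eq. (19): harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if p + r else 0.0


print(round(f_measure(0.968, 0.931), 3))  # 0.949 (Decision Tree row of Table 4)
print(round(f_measure(0.961, 0.986), 3))  # 0.973 (proposed: P = 0.961, R = 0.986)
```

For the Decision Tree row, Eq. (19) reproduces the tabulated 0.949. For the proposed system's P = 0.961 and R = 0.986, it gives approximately 0.973.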

Because the proposed system adopts a random Poisson distribution in the training phase, it captures the rare events that occur in the IoT environment. Hence it achieves a better F-measure of 0.976, whereas Decision Tree, Random Forest and RBF achieve 0.949, 0.95 and 0.95, respectively.

4.4.4 Accuracy

Accuracy is defined as the ratio of the number of correctly classified botnet attacks to the total number of botnet attacks: $Accuracy = \frac{I_{cB}}{TB}$ (20) where $I_{cB}$ is the number of correctly identified botnet attacks and $TB$ is the total number of botnet attacks.
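Eq. (20) as written is a simple ratio. Since Table 4 reports accuracy as a percentage, the sketch below multiplies by 100. The counts in the example are hypothetical and are not taken from the experiments.

```python
def accuracy_percent(correctly_identified, total_botnet_attacks):
    """Eq. (20): correctly identified botnet attacks / total botnet attacks, as a percentage."""
    if total_botnet_attacks <= 0:
        raise ValueError("total_botnet_attacks must be positive")
    return 100.0 * correctly_identified / total_botnet_attacks


# Hypothetical counts: 198 of 200 botnet attacks correctly identified.
print(accuracy_percent(correctly_identified=198, total_botnet_attacks=200))  # 99.0
```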

Because the proposed system employs a Poisson distribution, which captures rare events in a given time interval, and Surflex-PSIM, which isolates botnet attacks as clusters based on the automatically trained characteristic pocket value derived from the surflex characteristics of the attacks, it achieves a higher accuracy of 99.04%, whereas Decision Tree, Random Forest and RBF achieve 96.5333%, 96.667% and 96.5333%, respectively.

Table 5 and Fig. 18 show the time taken by the prior classifiers and the proposed system to detect and isolate botnets. Compared with the prior methods, namely Decision Tree, Random Forest and RBF, the proposed Surflex-PSIM classifier requires only 61 s and 57 s to detect DDoS and spam attacks, respectively, which is 17 s and 15 s less than the fastest baseline, RBF (78 s and 72 s). This comparison shows that the proposed work reduces the detection time relative to the prior techniques.

Fig.18

Time taken by classifiers to detect botnet.

Table 5

Time taken by classifiers to detect botnet [21]

Classifiers	Botnet	Time (s)
Decision Tree	DDoS	98
Random Forest	DDoS	87
RBF	DDoS	78
Proposed	DDoS	61
Decision Tree	Spam	95
Random Forest	Spam	90
RBF	Spam	72
Proposed	Spam	57
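The relative savings in Table 5 can be made explicit by computing the percentage reduction in detection time of the proposed system against each baseline. The times are copied from Table 5. The dictionary layout and the helper name are illustrative.

```python
# Detection times (s) from Table 5.
DETECTION_TIME = {
    "DDoS": {"Decision Tree": 98, "Random Forest": 87, "RBF": 78, "Proposed": 61},
    "Spam": {"Decision Tree": 95, "Random Forest": 90, "RBF": 72, "Proposed": 57},
}


def reduction_vs_baselines(times, proposed="Proposed"):
    """Percentage reduction in detection time of `proposed` relative to each baseline."""
    t_prop = times[proposed]
    return {
        name: round(100.0 * (t - t_prop) / t, 1)
        for name, t in times.items()
        if name != proposed
    }


for attack, times in DETECTION_TIME.items():
    print(attack, reduction_vs_baselines(times))
```

Against the fastest baseline (RBF), the reduction is about 21.8% for DDoS and 20.8% for spam. Against Decision Tree, it is about 37.8% and 40.0%.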

Thus, the overall results and comparisons discussed in this section show that the proposed work detects and isolates botnets more effectively than the other existing methods, with higher detection accuracy and shorter detection time.

5 Conclusion

The rapid growth of IoT-oriented technologies has brought IoT-based botnet attacks and, with them, increased security threats. The results obtained for the proposed system show better performance than existing systems. The proposed system achieves a precision of 0.961 and a recall of 0.986. It achieves a high F-measure of 0.976 and a high detection accuracy of 99.04%. Thus, the proposed Bootstrap Aggregating Surflex-PSIM Classifier clusters each type of botnet attack, such as distributed denial of service and spam botnet attacks, and maintains the reliability and quality of service of IoT applications. As future work, we plan to use more IoT botnet attack datasets to analyze the proposed approach and to conduct comprehensive comparisons for IoT botnet attack detection.

References

1. Yan, Huang, Luo, Gong and Yu F.R., A multi-level DDoS mitigation framework for the industrial Internet of things, IEEE Communications Magazine 56(2) (2018), 30–36.

2. Yeo, Koo, Yoon, Hwang, Ryu, Song and Park, Flow-based malware detection using convolutional neural network, In Information Networking (ICOIN), 2018 International Conference on (2018), 910–913.

3. Park S.W., Park, Bong, Shin, Lee, Choi and Yoo H.J., An energy-efficient and scalable deep learning/inference processor with tetra-parallel MIMD architecture for big data applications, IEEE Transactions on Biomedical Circuits and Systems 9(6) (2015), 838–848.

4. Jerkins J.A., Motivating a market or regulatory solution to IoT insecurity with the Mirai botnet code, In Computing and Communication Workshop and Conference (CCWC), 2017 IEEE 7th Annual (2017), 1–5.

5. Kolias, Kambourakis, Stavrou and Voas, DDoS in the IoT: Mirai and other botnets, Computer 50(7) (2017), 80–84.

6. Prokofiev A.O., Smirnova Y.S. and Surov V.A., A method to detect Internet of Things botnets, In Young Researchers in Electrical and Electronic Engineering (EIConRus), 2018 IEEE Conference of Russian (2018), 105–108.

7. Smith-Perrone and Sims, Securing cloud, SDN and large data network environments from emerging DDoS attacks, In Cloud Computing, Data Science & Engineering-Confluence, 2017 7th International Conference on (2017), 466–469.

8. Rodriguez-Gomez R.A., Macia-Fernandez and Garcia-Teodoro, Survey and taxonomy of botnet research through life-cycle, ACM Computing Surveys 45(4) (2013).

9. Seenivasan and Shanthi, Categories of Botnet: A Survey, World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering 8(9) (2014), 1689–1692.

10. Jiang and Shao, Detecting P2P botnets by discovering flow dependency in C&C traffic, Peer-to-Peer Networking and Applications 7(4) (2014), 320–331.

11. Al-Jarrah O.Y., Alhussein, Yoo P.D., Muhaidat, Taha and Kim, Data Randomization and Cluster-Based Partitioning for Botnet Intrusion Detection, IEEE Transactions on Cybernetics 46(8) (2016), 1796–1806.

12. Liu C-Y., Peng C-H. and Lin I-C., A survey of botnet architecture and botnet detection techniques, International Journal of Network Security 16(2) (2014), 81–89.

13. Chen, Gong, Yu and Yang, An adaptive push-styled command and control mechanism in mobile botnets, Wuhan University Journal of Natural Sciences 18(5) (2013), 427–434.

14. Thangapandiyan and RubeshAnand P.M., A secure and reputation based recommendation framework for cloud services, In IEEE International Conference on Computational Intelligence and Computing Research, ICCIC 2016, Chennai, India (2016), 1–4.

15. Wang T-Z., Wang H-M., Liu and Shi P-C., Some critical problems of botnets, Chinese Journal of Computers 35(6) (2012).

16. Meidan, Bohadana, Mathov, Mirsky, Breitenbacher, Shabtai and Elovici, N-BaIoT: Network-based Detection of IoT Botnet Attacks Using Deep Autoencoders, arXiv preprint arXiv:1805.03409 (2018).

17. Bezerra V.H., da Costa V.G.T., Barbon Junior, Miani R.S. and Zarpelão B.B., IoTDS: A One-Class Classification Approach to Detect Botnets in Internet of Things Devices, Sensors 19(14) (2019), 3188.

18. Meidan, Bohadana, Shabtai, David Guarnizo, Ochoa, Ole Tippenhauer and Elovici, ProfilIoT: A machine learning approach for IoT device identification based on network traffic analysis, In Proceedings of the Symposium on Applied Computing (2017), 506–509.

19. Homayoun, Ahmadzadeh, Hashemi, Dehghantanha and Khayami, BoTShark: A deep learning approach for botnet traffic detection, Cyber Threat Intelligence (2018), 137–153.

20. Shaikh, Bou-Harb, Crichigno and Ghani, A Machine Learning Model for Classifying Unsolicited IoT Devices by Observing Network Telescopes.

21. Venkatesh G.K. and Nadarajan R.A., HTTP botnet detection using adaptive learning rate multilayer feed-forward neural network, In IFIP International Workshop on Information Security Theory and Practice (2012), 38–48.

22. Singh, Guntuku S.C., Thakur and Hota, Big data analytics framework for peer-to-peer botnet detection using random forests, Information Sciences 278 (2014), 488–497.