An intelligent flow-based and signature-based IDS for SDNs using ensemble feature selection and a multi-layer machine learning-based classifier

Abstract

Software-defined networking is a new paradigm that overcomes problems associated with traditional network architecture by separating the control logic from data plane devices. It also enhances performance by providing a highly programmable interface that adapts to dynamic changes in network policies. As software-defined networking controllers are prone to single-point failures, providing security is one of the biggest challenges in this framework. This paper provides an intrusion detection mechanism in both the control plane and the data plane to secure the controller and the forwarding devices respectively. In the control plane, we deploy a flow-based intrusion detection system that inspects every new incoming flow towards the controller. In the data plane, we deploy a signature-based intrusion detection system that uses port mirroring to inspect traffic between OpenFlow switches and detect malicious activity. Our flow-based system relies on a trained, multi-layer machine learning-based classifier, while our signature-based system relies on rule-based classification using the Snort intrusion detection system. The ensemble feature selection technique we adopt in the flow-based system identifies the prominent features and speeds up the classification process. By working simultaneously in both the control plane and the data plane, the proposed work provides a high level of security in the software-defined networking environment.

Keywords

Software-defined networking (SDN), machine learning (ML), intrusion detection system (IDS), feature selection, flow-based IDS

1 Introduction

Computer backbone networks generally consist of numerous switches and routers controlled by a network administrator. The network administrator needs to configure various network policies and update them regularly to adapt to dynamic changes in network traffic. Traditional network architecture has difficulty handling advanced and dynamic changes in configuration, owing to a lack of automated update mechanisms. To overcome problems in traditional networks and improve the quality of service, a new paradigm termed software-defined networking (SDN) has emerged [1]. SDN centralizes the network control logic. A key characteristic of SDN is the logical separation of the control plane and data (infrastructure) plane. The control plane decides how to handle network data traffic by analysing abstract views of the network. The data plane handles network traffic based on the decisions made by the controller. The data plane comprises network devices like routers, switches, and access points. The controller in the control plane acts as the core of the network architecture and is easily programmable. Routing mechanisms and forwarding decisions are implemented in the centralized SDN controller. The logical separation of the two planes is made possible by a distinct interface between the forwarding devices and the application program interface (API) of the controller.

The OpenFlow (OF) protocol [1, 2] is a standard API (application programming interface) commonly used in research and academia. An OpenFlow switch is similar to an Ethernet switch and contains one or more flow tables. A flow table is a set of packet handling rules to manage network traffic. For every flow, rules are defined by the centralized controller in the flow table through the OpenFlow protocol. In an SDN context, a flow represents a sequence of packets from source to destination. The controller creates new rules, changes existing rules, or removes certain rules from the flow table by analyzing the overall network. The control logic is programmable through software applications running on the application plane, alongside other applications like load balancing, firewalls, QoS (quality of service), and network monitoring. Open-source communities have taken on the responsibility of designing SDN controller software [1, 3] such as NOX, POX, RYU, Floodlight, and OpenDaylight. Such a logical separation between the forwarding devices and the controller helps deploy dynamic features and provides great flexibility in the network environment.

SDNs, which provide greater flexibility and simplicity than traditional networks, face challenges that are to be addressed in terms of reliability, scalability, and controller placement, as well as security issues like denial-of-service and man-in-the-middle attacks, along with vulnerability scans. This paper focuses on SDN security vulnerabilities. Since the SDN controller is prone to single-point failures, providing the centralized controller adequate security is a major concern. An intrusion detection system (IDS) [4] safeguards the network against all kinds of malicious activity. Integrating an IDS module into an SDN environment potentially secures the network by alerting and detecting unauthorized activity in the network. The IDS, using machine learning techniques, helps detect different kinds of known and unknown attacks with high accuracy [4, 5].

In this paper, we propose a multi-layer machine learning classifier to classify traffic instances in a flow-based IDS. We use an ensemble-based feature selection mechanism to identify essential features from the traffic. We also propose a signature-based IDS to monitor traffic in the SDN data plane. To the best of our knowledge, the proposed work differs from existing research in providing IDS-based security measures for both the control plane and the data plane.

The major contributions of this research work are:

Proposing a flow-based IDS in the control plane and a signature-based IDS in the data plane.

Designing a flow-based IDS module using machine learning techniques for both feature selection and classification, and training the same using the standard NSL-KDD (Knowledge Data Discovery) dataset.

Creating a signature-based IDS using a rule-based mechanism called Snort to inspect traffic instances in the SDN data plane.

The paper is organized as follows: Section 2 discusses the background and related work. Section 3 describes the flow-based IDS and signature-based IDS. Section 4 explains the evaluation metrics and performance analysis, and Section 5 concludes the paper.

2 Background and related work

Intrusion detection systems play a vital role in traditional networks in detecting malicious activity. There are two kinds of IDS: signature-based and anomaly-based. A signature-based IDS helps detect known attacks by comparing new data with a previously stored database of malicious patterns, whereas an anomaly-based IDS helps detect unknown and new attacks by comparing new data against a model of normal activity. It is crucial to design an anomaly-based IDS to protect the network from all kinds of attacks. An IDS module can be constructed using statistical-based, knowledge-based and machine learning-based mechanisms [5, 6]. Machine learning techniques help identify and analyze complex patterns in real-time traffic and produce quick predictions. By successfully integrating the anomaly-based IDS module into the SDN control plane [7, 8], we secure the centralized controller from attacks. Since the SDN deals with huge volumes of traffic flow, it is necessary to identify its essential features so that malicious activity can be detected quickly. Identifying essential features in the raw dataset helps reduce data dimension and improve the detection accuracy of attacks [9].

2.1 The need for a feature selection technique

An intrusion detection system must deal with large volumes of real-time network traffic to detect and protect the network from malicious activity. It is unnecessary to consider every feature in the dataset to maximize performance. When the number of features increases, the computation cost increases correspondingly. The feature selection technique helps reduce the computational cost of the IDS module by eliminating irrelevant and redundant features from the traffic generated. The feature selection process comprises three steps: subset generation, evaluation and validation [10]. (i) The subset generation module selects subsets of features from the entire feature set. (ii) The subset evaluation module examines the selected subset of features and evaluates them using ranking, correlation, entropy, and accuracy. (iii) Finally, the selected subset of features is considered for validation in a real-time or simulated environment.

Feature selection methods are classified into three types: filter, wrapper and hybrid [10]. The filter method selects the best subset of features based on the general characteristics of the dataset. To evaluate the performance of the selected features, an independent machine learning-based classifier is used. The wrapper method uses a classifier as part of the feature selection process. Although computationally expensive and slow, it generally performs better than the filter method. The hybrid method uses both the filter and wrapper methods for better performance. Researchers have used a wide range of feature selection techniques to identify the best subset of features in both traditional and SDN environments. The outcome of each work depends on factors such as the nature of the dataset as well as the feature selection, detection, and validation techniques.

In [11], a support vector machine (SVM) was used as a classifier. Essential features in the raw dataset were identified by applying the genetic algorithm (GA) and principal component analysis (PCA). Experiments were run on the KDD CUP’99 dataset and the results compared. Their findings showed that minimizing the number of features helps maximize detection accuracy. In [12], the authors proposed a machine learning-based intrusion detection system using feature selection techniques like the linear correlation coefficient (LCC) and cuttlefish algorithm (CFA). A decision tree algorithm was used as a classifier. Their experimental results with the KDD CUP’99 dataset showed that the proposed technique identified essential features quickly, with a prediction accuracy of 95.03% and a low false alarm rate of 1.65%.

An anomaly-based IDS was advanced in [13], using feature selection analysis and a hybrid classifier model. Key features were selected using the vote scheme and information gain (IG), and hybrid classifiers constructed using the J48, random tree, REPTree, Adaboost and Naive Bayes. The NSL-KDD dataset was experimented with and the results compared with single classifiers like the support vector machine (SVM), Naive Bayes and J48. Their findings showed that identifying essential features and using hybrid classifiers help reduce the false alarm rate and maximize detection accuracy. In [14], feature selection techniques like the IG, gain ratio, correlation-based feature selection, and chi-square test were utilized to select essential features from the NSL-KDD dataset. Their findings showed that the random forest classifier with the gain ratio feature selection technique provides the highest accuracy and hastens the IDS process in the SDN environment.

In [15], feature selection techniques like principal component analysis (PCA) and the genetic algorithm (GA) were investigated to identify the optimal feature subset for accurate traffic classification in the SDN. The authors of [16] used several supervised machine learning techniques as classifiers and the PCA algorithm for feature selection over the NSL-KDD dataset. Their results revealed that the decision tree with the PCA technique provides the highest accuracy and minimum execution time by identifying essential features. Improved information gain (IIG) was proposed in [17] to select essential features. Experiments were run on the KDD Cup’99 dataset using features selected by the IIG, the IG and other techniques. The features selected by the IIG provided 96.801% accuracy and a false alarm rate of 1.02%. They found that the feature selection technique helps reduce dataset dimensions and make faster predictions.

In [18], an efficient mechanism was proposed to detect distributed denial-of-service (DDoS) attacks using machine learning-based techniques. Information gain was used for feature selection and the chi-square statistic for feature ranking. The C4.5 and Naive Bayes techniques were used as detection mechanisms. Only 9 of 41 features were used, based on the ranking, and 99.8% accuracy was achieved, with a 0.3% false alarm rate. An enhanced support vector decision function (ESVDF) was proposed in [19] to select key features based on the forward feature ranking algorithm. Neural networks and a support vector machine were used as classifiers over the KDD CUP’99 dataset. Their findings showed that the proposed work identifies essential features satisfactorily and offers excellent performance. An efficient IDS mechanism was proposed in [20] using a genetic algorithm (GA) as a classifier and a correlation-based feature selection technique, sequential floating selection, the PCA, the IG, and a negative selection approach to pick essential features. Their experimental results showed that sequential floating selection with the GA provides the best accuracy and sensitivity rates.

The literature survey makes it clear that feature selection and the use of multiple machine learning-based classifiers play a critical role in an intrusion detection model. Using a raw dataset in the classification model leads to poor classification accuracy, owing to redundancy in the dataset. Similarly, not every available feature in the input data is needed to categorize the classes, and using all of them increases the complexity of training. Such a problem calls for a feature selection technique that identifies essential features from a high-dimensional feature space.

Hence, in this paper, we propose a flow-based IDS using two mechanisms.

Ensemble-based feature selection techniques to select the best set of features, and

A multi-layer machine learning classifier to classify attacks and normal instances with high accuracy.

3 The proposed architecture

In this section, we discuss our proposed flow-based and signature-based intrusion detection mechanisms for detecting malicious activity in the SDN environment. In addition, we discuss how our system secures both the centralized controller and the forwarding devices in the data plane by analyzing traffic features.

3.1 System design

We aim to provide a feasible network-based intrusion detection system to secure the network from real-time attacks by applying a machine learning technique with high accuracy and a low false alarm rate. We propose an effective mechanism which integrates a flow-based IDS in the control plane and a signature-based IDS in the data plane to detect network intrusions.

The two key elements in the SDN environment are the centralized controller in the control plane and the forwarding devices in the data plane. The switches in the data plane forward packets based on decisions made by the controller. An OpenFlow-based switch contains flow tables with a set of rules to handle incoming packets. Each entry in the flow table has three fields: a matching rule field, an action field, and a counter field. When a new packet reaches an OpenFlow-based switch, the switch starts the lookup process to find a matching rule in the flow table. If a match occurs, the switch follows the action defined in the flow table and updates the packet statistics in the counter field. If no matching rule is available (a table_miss), the switch sends the packet to the controller in a packet_in message. The controller defines the set of rules for the particular flow and installs it in the flow table of the corresponding switch. We use the flow-based forwarding mechanism to collect information on traffic flow statistics from the controller and direct it to the flow-based IDS module. Using the machine learning-based classifier, the module analyzes the network traffic data, detects malicious activity, and alerts the controller. This module does not add overhead to the SDN environment because flow collection is already part of the SDN framework.
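The lookup behaviour described above can be illustrated with a minimal Python sketch. It is a simplified model for exposition only, not the OpenFlow specification or any controller's API: the names `FlowEntry`, `FlowTable`, `handle_packet` and `controller_decide` are hypothetical, and matching is reduced to exact comparison of header fields held in a dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class FlowEntry:
    match: Dict[str, str]   # matching rule field, e.g. {"dst_ip": "10.0.0.2"}
    action: str             # action field, e.g. "output:2" or "drop"
    packet_count: int = 0   # counter field


class FlowTable:
    def __init__(self) -> None:
        self.entries: List[FlowEntry] = []

    def install(self, entry: FlowEntry) -> None:
        self.entries.append(entry)

    def lookup(self, packet: Dict[str, str]) -> Optional[str]:
        """Return the action of the first matching entry, or None on a table miss."""
        for entry in self.entries:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                entry.packet_count += 1  # update statistics in the counter field
                return entry.action
        return None


def handle_packet(table: FlowTable,
                  packet: Dict[str, str],
                  controller_decide: Callable[[Dict[str, str]], FlowEntry]) -> Optional[str]:
    """Switch-side handling: use the table, fall back to the controller on a miss."""
    action = table.lookup(packet)
    if action is None:
        # Table miss: the controller defines a rule and installs it in the switch.
        table.install(controller_decide(packet))
        action = table.lookup(packet)
    return action
```

In this sketch the point at which `controller_decide` is called is also where a flow-based IDS could be consulted before a rule is installed.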

Once the flow rule is installed in the flow table of the switch, there is no security mechanism to monitor the traffic flow installed in the SDN data plane. If a forwarding device in the data plane is compromised by an attacker [21], flow rules can be modified or dropped, traffic misrouted to the victims [22], and irrelevant requests sent to the controller [22, 23], causing the controller to restrict legitimate incoming flows. To address this problem, we introduce a signature-based IDS in the data plane. Using the concept of port mirroring [24], we monitor traffic between switches by forwarding it to the signature-based IDS module. This module uses a rule-based mechanism to detect abnormal activities in traffic and alert the controller. A system diagram of our proposed work is depicted in Fig. 1.

Fig. 1

System diagram.

3.2 The flow-based IDS in the control plane

The flow-based IDS module takes the traffic information extracted from the controller and passes it to the machine learning-based classifier to detect malicious activity. When a table_miss occurs in the OpenFlow switch, the switch forwards the packet to the controller for further action.

The proposed flow-based IDS module collects traffic statistics from the controller and extracts from them the essential features that were selected when the IDS module was trained. The resulting data is forwarded to the trained machine learning-based classifier for inspecting and detecting attacks. If an attack is detected, the IDS module alerts the controller to drop the particular flow; otherwise, the SDN controller defines new flow rules based on an abstract view of the network. Thus, this module helps detect real-time attacks by analyzing all the new flows sent from the OpenFlow switches to the controller. The collection of flow statistics does not affect the performance of the controller, as every new flow already reaches the controller for further action. The working function of the flow-based IDS is depicted in Fig. 2.

Fig. 2

Working function of the flow-based IDS.

To analyze and detect malicious activity in real-time traffic based on the flow information, we build a multi-layer machine learning-based classifier module. By identifying important features in every flow, the module detects intrusions in traffic samples. The classifier uses the support vector machine (SVM) in Layer-I, and the Naive Bayes and C4.5 decision tree in Layer-II. Based on the results of Layer-I, the Naive Bayes classifier inspects all instances classified as normal by the Layer-I SVM. Similarly, the C4.5 decision tree classifier inspects all instances classified as attacks by the Layer-I SVM. This mechanism helps improve classification accuracy and reduce the false alarm rate. Since we identify essential features using the ensemble feature selection mechanism, the processing and classification complexity are also low. A system diagram of the ensemble-based feature selection process and multi-layer machine learning-based classifier is depicted in Fig. 3.
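The routing between the two layers can be sketched as follows. This is only an illustration of the control flow: the three classifiers are passed in as plain callables (hypothetical stand-ins for a trained SVM, Naive Bayes and C4.5 model), and the sketch assumes that the Layer-II output is taken as the final label, which the text does not state explicitly.

```python
from typing import Callable, Sequence

# A classifier here is any callable mapping a feature vector to "normal" or "attack".
Classifier = Callable[[Sequence[float]], str]


def multilayer_predict(x: Sequence[float],
                       layer1_svm: Classifier,
                       layer2_nb: Classifier,
                       layer2_c45: Classifier) -> str:
    """Route an instance through Layer-I, then through the matching Layer-II model."""
    first_opinion = layer1_svm(x)
    if first_opinion == "normal":
        # Instances Layer-I calls normal are re-inspected by Naive Bayes,
        # which can catch attacks the SVM missed.
        return layer2_nb(x)
    # Instances Layer-I calls attack are re-inspected by the C4.5 tree,
    # which can clear false alarms raised by the SVM.
    return layer2_c45(x)
```

Under this assumption, the second layer acts as a check on each branch of the first: one model targets missed attacks, the other targets false alarms.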

Fig. 3

A system diagram of the ensemble-based feature selection and multi-layer ML classifier.

3.2.1 Dataset selection

Identifying the correct dataset plays a vital role in all classification models [9, 25]. Our aim is to select a dataset with data that is free of errors and redundancies. The KDD CUP’99 and NSL-KDD [5, 25] are commonly used datasets in the machine learning-based intrusion detection environment. The KDD CUP’99 dataset contains masses of data with redundant elements, which increases the computation cost and biases the results of the machine learning algorithm in the IDS module. Most researchers [11, 20] have only used 10 to 20% of the KDD CUP’99 dataset for the training and testing phases. Detection results may change dramatically when a random set of data is chosen. To overcome this problem, the NSL-KDD dataset was derived from the KDD CUP’99 by removing redundant and erroneous data, so that the entire dataset can be used for training and testing. The NSL-KDD dataset contains a total of 41 features with attack and normal patterns. Attack patterns in the dataset are classified into four categories: denial-of-service (DoS), user-to-root (U2R), remote-to-local (R2L), and probe attacks. The testing dataset contains 22,544 network traffic samples and the training dataset 125,973. A DoS attack prevents a legitimate user from accessing the system’s resources by overloading it with unnecessary requests. In a U2R attack, a normal user in the network tries to gain root access. In an R2L attack, an attacker tries to gain unauthorized access to a local user’s machine. In a probe attack, an attacker tries to retrieve sensitive information from a victim by scanning a local user’s machine without authorization. Taking these advantages into consideration, we use the NSL-KDD dataset to train and test our proposed flow-based IDS module. The different types of attacks showcased in the dataset are tabulated in Table 1.

Table 1
A description of the NSL-KDD dataset

Category	Total no. of training records	Total no. of testing records	No. of attacks in training	No. of new attacks in testing	Total attacks
Probe	11656 (9.11%)	2421 (11%)	4	2	6
DoS	45927 (36.46%)	7458 (33%)	6	5	11
U2R	52 (0.04%)	200 (0.9%)	4	3	7
R2L	995 (0.825%)	2754 (12.1%)	8	7	15
Normal	67343 (53%)	9711 (43%)	–	–	–

3.2.2 Data preprocessing

It is necessary to preprocess and normalize the data so that it is compatible with the machine learning-based classifier and the classifier performs accurately. The raw NSL-KDD dataset contains discrete, continuous and symbolic values, each with a different range. Therefore, preprocessing and normalization are essential for better accuracy. The preprocessing step maps symbolic values to numerical values. Protocols like ICMP (Internet Control Message Protocol), TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) are mapped to the numeric values 1, 2, and 3 respectively. In the NSL-KDD dataset, 39 types of attack patterns are presented under four attack categories, as shown in Table 1. Each record is mapped to one of five classes: normal (0), probe (1), DoS (2), U2R (3), and R2L (4). Similarly, symbolic values for features such as service (71) and flag (11) are mapped to numeric values from 0 to N-1. Next, we scale the values into a linear range for both higher-end and lower-end values. Features like duration [0, 58239], count, and srv_count [0–511] fall within a normal integer range, while others like src_bytes and dst_bytes range into the billions. Normal integer values are scaled into 0.0 to 1.0, and long integer values are scaled into 0.0 to 9.14 by applying logarithmic scaling. Finally, all feature values lie in one of three ranges: Boolean (0 or 1), 0.0 to 1.0, or 0.0 to 9.14.
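The preprocessing steps described above can be sketched in Python as follows. The protocol and class codes are the ones given in the text; everything else is an assumption made for illustration: the helper names are hypothetical, symbolic values are numbered in sorted order, the linear scaling is plain min-max scaling, and the logarithm is taken as base 10 with an offset of 1 (a base-10 logarithm of a value around 1.38 billion is about 9.14, which is consistent with the stated range, but the text does not specify the base or offset).

```python
import math
from typing import Dict, Iterable

# Codes stated in the text.
PROTOCOL_MAP = {"icmp": 1, "tcp": 2, "udp": 3}
CLASS_MAP = {"normal": 0, "probe": 1, "dos": 2, "u2r": 3, "r2l": 4}


def encode_symbolic(values: Iterable[str]) -> Dict[str, int]:
    """Map each distinct symbolic value (e.g. service, flag) to an integer 0..N-1.

    Sorted order is an assumption; any fixed ordering would serve.
    """
    return {value: index for index, value in enumerate(sorted(set(values)))}


def min_max_scale(x: float, lo: float, hi: float) -> float:
    """Scale a normal-range integer feature (e.g. count in [0, 511]) into [0.0, 1.0]."""
    return (x - lo) / (hi - lo) if hi > lo else 0.0


def log_scale(x: float) -> float:
    """Scale a long-range feature (e.g. src_bytes) logarithmically.

    Assumption: log10(x + 1), so that 0 maps to 0.0.
    """
    return math.log10(x + 1)
```

For example, `min_max_scale(511, 0, 511)` yields 1.0 and `log_scale(0)` yields 0.0, so both scaled ranges start at zero.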

3.2.3 Feature selection process

The complete array of features in the dataset does not need to be used to elicit the best performance in the IDS. In the NSL-KDD dataset, the 41 available features are grouped into four: intrinsic (1–9), content-based (10–22), time-based (23–31) and host-based (32–41). Intrinsic features hold basic information on packets. Content-based features hold the payload and information on the original packets. Time-based features hold information on traffic, based on the time duration. Host-based features hold information on the number of connections to the same or different hosts in the network.

Identifying prominent features that are best suited to an SDN environment helps provide better accuracy and reduced training time. Selecting random features in the dataset degrades the performance of the IDS module [9, 10]. We select four filter-based feature mechanisms and one best-first feature mechanism to identify prominent features from the dataset, because the filter method works independently of the classification algorithm and is also less expensive than the wrapper method. The filter method is a statistical-based mechanism that uses the concept of feature ranking to identify the best features. It is also easy to use the filter method in a scenario with a large volume of data. In our proposed feature selection mechanism, we use an ensemble-based feature selection technique (EFS). The proposed mechanism identifies the best features by combining the output of information gain (IG), sequential feature selection (SFS), correlation feature selection (CFS), chi-square test and mutual information (MI). Each technique delivers a certain set of selected features as output, from which a subset of the total of 41 features is identified. From the identified subset of features, we select the best set of features by applying the threshold value (H = 3); that is, a particular feature selected by three or more feature selection techniques is identified as a best feature. We select the best set of features by applying this process repeatedly over all the subsets of features. A flow diagram of the proposed ensemble-based feature selection mechanism is depicted in Fig. 4.

Fig. 4

A flow diagram of the ensemble-based feature selection process.

3.2.3.1 Information gain

Information gain (IG) [26, 27] is an entropy-based feature selection technique built on information theory and used to identify the key subset of features. If the removal of feature f_i affects prediction power, the feature is said to be relevant; that is, f_i contains relevant information about the dataset for prediction. Relevance is assessed by evaluating the information gain of each feature against the target class and ranking the features in decreasing order. The information gain for a feature is calculated as follows.

Let F represent a set of features (f₁, f₂, f₃, ..., f_i) and G the target class (g₁, g₂, ..., g_i). The information gain of features, F, against class labels, G, is calculated using Equation (1):

$IG (F, G) = E (F) - E (F | G)$ (1)

We calculate E(F), the entropy of F, and E(F|G), the entropy of F after observing G, through Equations (2) and (3):

$E (F) = - \sum_{i = 1}^{n} p (f_{i}) \log_{2} (p (f_{i}))$ (2)

$E (F | G) = - \sum_{j = 1}^{n} p (g_{j}) \sum_{i = 1}^{n} p (f_{i} | g_{j}) \log_{2} (p (f_{i} | g_{j}))$ (3)

where p(f_i) is the probability of feature value f_i, p(g_j) the probability of class g_j, and p(f_i|g_j) the probability of f_i given g_j.

Calculating the IG for each feature in the dataset, we rank the features in decreasing order. It is usual for features with a high IG value to possess the most relevant information for data classification.
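As a minimal sketch, the entropy-based ranking can be written in Python for discrete (or discretized) feature values. The function names are hypothetical, probabilities are estimated as relative frequencies, and continuous features would need to be binned first, which is an assumption not covered in the text.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """E(F) = -sum_i p(f_i) * log2 p(f_i), with p estimated by relative frequency."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(f_values: Sequence[Hashable],
                        g_values: Sequence[Hashable]) -> float:
    """E(F | G) = sum_j p(g_j) * E(F | G = g_j)."""
    n = len(g_values)
    groups: Dict[Hashable, List[Hashable]] = {}
    for f, g in zip(f_values, g_values):
        groups.setdefault(g, []).append(f)
    return sum((len(fs) / n) * entropy(fs) for fs in groups.values())


def information_gain(f_values: Sequence[Hashable],
                     g_values: Sequence[Hashable]) -> float:
    """IG(F, G) = E(F) - E(F | G)."""
    return entropy(f_values) - conditional_entropy(f_values, g_values)


def rank_by_information_gain(columns: Dict[str, Sequence[Hashable]],
                             labels: Sequence[Hashable]) -> List[str]:
    """Return feature names sorted by decreasing information gain."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A feature whose values coincide perfectly with the class labels obtains the maximum gain, whereas a feature distributed identically within every class obtains a gain of zero.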

3.2.3.2 Sequential feature selection

Sequential feature selection [28] belongs to the family of greedy search algorithms, and selects a subset of K features from the original set of D features, K < D. The sequential selection method works step by step, selecting a prominent set of features by adding a good feature or removing a bad feature at each step. We select a subset of features using both sequential forward selection (SFS) and sequential backward selection (SBS).

The sequential forward selection (SFS) technique starts with an empty feature set ø and, at each step, adds the feature that yields the highest criterion value. Let input F be the whole set of features {f₁,f₂,f₃, ... ,f_i} in the dataset and output S the selected subset of features. Also, let the number of features selected in S be k, where k is a predefined value such that k < i. Here, we add a feature, s+, into the feature subset, S_k, where s+ is the feature that most enhances the criterion function, that is, the feature associated with the best classifier performance when added to S_k.

Similarly, the sequential backward selection technique starts with the complete feature set, {f₁,f₂,f₃, ... ,f_i}, and selects a set of optimal features by removing, at each step, the feature that contributes least to the criterion value. Let input F be the whole set of features in the dataset and output S the selected subset of features, where the number of features selected in S is the predefined value, k < i. Here, we remove a feature, s^-, from our feature subset, S_k, where s^- is the feature whose removal leaves the criterion function highest, that is, the feature whose removal is associated with the best classifier performance. The algorithm for the SFS and SBS is as follows:

Forward Feature Selection:

Input: F = {f₁,f₂,f₃, ... ,f_i}, p = 6

Output: S_k = {s_j | j = 1,2 ... k; s_j ∈ F}, k ∈ {0,1,2 ... i}

Step 1: S₀ = ø, k = 0

Step 2: f⁺ = arg max J(S_k + f), where f ∈ F – S_k

S_(k+1) = S_k + f⁺

k = k + 1

Step 3: Repeat step 2 until (k == p)

Backward Feature Selection:

Input: F = {f₁,f₂,f₃, ... ,f_i}, target subset size p

Output: S_k = {s_j | j = 1,2 ... k; s_j ∈ F}, k ∈ {0,1,2 ... i}

Step 1: S_i = F, k = i

Step 2: f⁻ = arg max J(S_k – f), where f ∈ S_k

S_(k–1) = S_k – f⁻

k = k – 1

Step 3: Repeat step 2 until (k == p)
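A minimal Python sketch of the two greedy procedures is given below. The criterion function J is passed in as a callable; in practice it would typically be a classifier's validation score on the candidate subset, but that choice, the function names, and the alphabetical tie-breaking are assumptions made for illustration.

```python
from typing import Callable, FrozenSet, Iterable

# J maps a candidate feature subset to a score (higher is better).
Criterion = Callable[[FrozenSet[str]], float]


def sequential_forward_selection(features: Iterable[str],
                                 J: Criterion,
                                 p: int) -> FrozenSet[str]:
    """Start from the empty set and greedily add features until p are selected."""
    selected: FrozenSet[str] = frozenset()
    remaining = set(features)
    while len(selected) < p and remaining:
        # f+ = arg max J(S_k + f) over features not yet selected.
        best = max(sorted(remaining), key=lambda f: J(selected | {f}))
        selected = selected | {best}
        remaining.remove(best)
    return selected


def sequential_backward_selection(features: Iterable[str],
                                  J: Criterion,
                                  p: int) -> FrozenSet[str]:
    """Start from the full set and greedily remove features until p remain."""
    selected: FrozenSet[str] = frozenset(features)
    while len(selected) > p:
        # f- = arg max J(S_k - f): the feature whose removal hurts J the least.
        worst = max(sorted(selected), key=lambda f: J(selected - {f}))
        selected = selected - {worst}
    return selected
```

Because each step is only locally optimal, the two procedures can return different subsets for the same J and p when features interact; they coincide when J is additive over features.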

3.2.3.3 Correlation-based feature selection (CFS)

CFS [20, 29] is a simple filter-based feature selection technique that selects a subset of features by calculating the correlation between each feature and the target class. Features with a high correlation value are selected to constitute a prominent subset of features. The correlation function for the optimal subset of features is calculated using Equation (4):

$C_{S} = \frac{i \cdot i_{fc}}{\sqrt{i + i (i - 1) \, i_{ff}}}$ (4)

where C_S is the heuristic merit of a subset S containing i features, i_fc the mean feature-class correlation, and i_ff the mean feature-feature inter-correlation. This function ranks all the feature subsets in the entire search space. Based on the rank, an optimal subset of features is selected.
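The merit in Equation (4) is straightforward to evaluate once the two mean correlations are known. The sketch below only computes the merit; the function and parameter names are hypothetical, and how the correlations themselves are estimated is left open.

```python
import math


def cfs_merit(i: int, mean_fc: float, mean_ff: float) -> float:
    """Heuristic merit C_S of a subset with i features (Equation 4).

    mean_fc: mean feature-class correlation (i_fc in the text)
    mean_ff: mean feature-feature inter-correlation (i_ff in the text)
    """
    return (i * mean_fc) / math.sqrt(i + i * (i - 1) * mean_ff)
```

The denominator shows why redundant features are penalized: for four features with mean_fc = 0.5, the merit is 1.0 when the features are mutually uncorrelated (mean_ff = 0) but only 0.5 when they are fully inter-correlated (mean_ff = 1), which is the same merit as a single such feature.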

3.2.3.4 Chi-square test

The chi-square test [30] selects the best features by calculating the chi-square score of every feature in the dataset against the target class. The method calculates the score by testing the independence between the feature and the target class. A higher score indicates a stronger dependency between the feature and the target class. The chi-square score is computed using Equation (5),

$\chi^{2} (f, l_{i}) = \frac{N {[P (f, l_{i}) \, P (f^{'}, l_{i}^{'}) - P (f, l_{i}^{'}) \, P (f^{'}, l_{i})]}^{2}}{P (f) \, P (f^{'}) \, P (l_{i}) \, P (l_{i}^{'})}$ (5)

where N represents the size of the full dataset, f the presence of a feature, f' the absence of a feature, l_i the target label, l_i' any label other than the target, P(f,l_i) the probability of feature f occurring with label l_i, P(f',l_i) the probability of feature f not occurring with target label l_i, P(f) the probability of a particular feature occurring in the dataset, P(f') the probability of a particular feature not occurring in the dataset, and P(l_i) and P(l_i') the probability that a record is or is not labelled with target label l_i.
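Equation (5) can be evaluated directly from the four cell counts of a 2×2 table (feature present or absent against target label or other label). The sketch below is illustrative: the function and argument names are hypothetical, probabilities are estimated as relative frequencies, and returning 0.0 when a marginal probability is zero is an assumption made to avoid division by zero.

```python
def chi_square_score(n_f_l: int, n_f_notl: int,
                     n_notf_l: int, n_notf_notl: int) -> float:
    """Chi-square score of a binary feature against a target label (Equation 5).

    n_f_l:       records where the feature is present and the label is l_i
    n_f_notl:    feature present, label is not l_i
    n_notf_l:    feature absent, label is l_i
    n_notf_notl: feature absent, label is not l_i
    """
    n = n_f_l + n_f_notl + n_notf_l + n_notf_notl
    if n == 0:
        return 0.0
    # Joint probabilities estimated as relative frequencies.
    p_f_l, p_f_notl = n_f_l / n, n_f_notl / n
    p_notf_l, p_notf_notl = n_notf_l / n, n_notf_notl / n
    # Marginal probabilities.
    p_f, p_notf = p_f_l + p_f_notl, p_notf_l + p_notf_notl
    p_l, p_notl = p_f_l + p_notf_l, p_f_notl + p_notf_notl
    denominator = p_f * p_notf * p_l * p_notl
    if denominator == 0:
        return 0.0  # assumption: a degenerate table carries no evidence
    return n * (p_f_l * p_notf_notl - p_f_notl * p_notf_l) ** 2 / denominator
```

With 25 records in every cell the feature and the label are independent and the score is zero; the score grows as the counts concentrate on the diagonal.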

3.2.3.5 Mutual information (MI)

In information theory, mutual information evaluates the dependency between two variables [26, 31]. The mutual information between two variables, A and B, is the amount of information on B supplied by A or vice versa. If A and B are independent, the mutual information between them is 0; the stronger the dependency, the larger the value. We select features from the entire dataset by ranking the MI value against the target class.

$MI (C, F) = E (C) - E (C | F)$ (6)

The mutual information between the target class and a feature is calculated using Equation (6), where C represents the target class, F the feature, E(C) the entropy of the target class and E(C|F) the conditional entropy.

This function computes how much information is shared between a particular feature and the target class. Features with a high MI value are considered to comprise a prominent set of features.
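Equation (6) can be sketched for discrete features as follows. The helper names and the toy samples are assumptions for illustration only; the feature values merely imitate the style of NSL-KDD fields.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy E(.) of a sequence of discrete values, in bits."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, feature):
    """E(C | F): entropy of the target within each feature value, weighted by frequency."""
    n = len(target)
    total = 0.0
    for value, count in Counter(feature).items():
        subset = [t for t, f in zip(target, feature) if f == value]
        total += (count / n) * entropy(subset)
    return total


def mutual_information(target, feature):
    """MI(C, F) = E(C) - E(C | F), as in Equation (6)."""
    return entropy(target) - conditional_entropy(target, feature)


labels = ["normal", "normal", "attack", "attack"]
informative = ["SF", "SF", "S0", "S0"]         # determines the label completely
uninformative = ["tcp", "udp", "tcp", "udp"]   # tells nothing about the label

print(mutual_information(labels, informative))    # 1.0 bit
print(mutual_information(labels, uninformative))  # 0.0 bits
```

Ranking features by this value and keeping the top ones corresponds to the selection step described above.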

3.2.3.6 Ensemble feature selection process

Using five different feature selection mechanisms, we identify five different best sets of features from the entire feature set. The final best set of features is selected by a simple ranking mechanism using a predefined threshold value (H = 3). Once the best sets of selected features are combined, a counter (C_F) is used to count the number of times a particular feature is selected. If a particular feature is chosen by three or more techniques, it is taken into the final best set of features (BS_F). Our proposed ensemble feature selection algorithm is given in Section 3.2.3.7.

3.2.3.7 Ensemble feature selection algorithm (EFS)

Input: Let F = {f₁,f₂,f₃, ... ,f_i} be the features in the NSL-KDD dataset and C = {c₁,c₂} the target class labels (normal, anomaly)

Output: Overall best set of features BS_F = {f₁,f₂,f₃, ... , f_i}

Initialization

S_Best = φ // Set of the best features

Best_(IG), Best_(SFS), Best_(CFS), Best_(MI), Best_(CHI) = φ

BS_F = φ

H = 3

Begin

For each f_i in F

Compute Best_(IG), Best_(SFS), Best_(CFS), Best_(MI), Best_(CHI)

End For

S_Best ← Best_(IG) + Best_(SFS) + Best_(CFS) + Best_(MI) + Best_(CHI)

For each f_i in S_Best

Compute C_F

End For

For each f_i in S_Best

If C_F ≥ H

BS_F ← BS_F ∪ {f_i}

End If

End For

End

where Best_(IG), Best_(SFS), Best_(CFS), Best_(MI), and Best_(CHI) represent the best features selected by information gain, sequential feature selection, correlation feature selection, mutual information and the chi-square test respectively, BS_F the final best set of selected features, and C_F the counter of each feature.

In the NSL-KDD dataset, there are 41 features. By applying five different feature selection mechanisms, we choose five different sets of the best features. The selected features and the search methods are presented in Table 2. After combining all the selected features, the final best set of features, BS_F, is selected using the counter function (C_F) based on the threshold value (H = 3). The number of times a particular feature is selected by different techniques is tabulated in Table 3, and the resulting final set is listed in Table 4. Table 5 features a description of the selected feature set (BS_F).

Table 2

Different sets of selected features using the feature selection technique

Feature selection method	Selected features	Search method
Information Gain (IG)	F5,F3,F6,F4,F30,F29,F33,F34,F35,F38,F12,F39,F25	Ranker
Sequential Feature Selection	F4,F5,F6,F12,F23,F26,F30	Best First
Correlation Feature Selection	F29,F33,F34,F12,F39,F38,F25,F4,F26,F23,F32,F3	Ranker
Chi-Squared test	F5,F6,F3,F4,F33,F35,F34,F40,F41,F23,F30,F29,F12,F27,F28	Ranker
Mutual Information	F6,F5,F4,F41,F28,F12,F27,F30,F3,F40,F29,F34,F35,F33,F37	Ranker

Table 3

Features and their counter value

Selected features	Count (C_F)
F4,F12	5
F5,F3,F6,F30,F29,F33,F34	4
F35,F23	3
F38,F39,F25,F26,F40,F41,F27,F28	2
F32,F37	1

Table 4

The final best selected features (BS_F)

Method	Selected features (BS_F)
Ensemble Feature Selection (EFS)	F3,F4,F5,F6,F12,F23,F29,F30,F33,F34,F35

Table 5

Selected feature number, name and description

Selected feature (BS_F)	Name	Description	Nature of the feature
F3	Service	Network service used by the connection, e.g. http, ftp or telnet	Intrinsic
F4	Flag	Status of the connection (normal or error)	Intrinsic
F5	Src_bytes	Number of data bytes sent from source to destination	Intrinsic
F6	Dst_bytes	Number of data bytes sent from destination to source	Intrinsic
F12	Logged_in	1 if successfully logged in; 0 otherwise	Content-based
F23	Count	Number of connections to the same host as the current connection in the past two seconds	Time-based
F29	Same_srv_rate	Percentage of connections to the same service	Time-based
F30	Diff_srv_rate	Percentage of connections to different services	Time-based
F33	Dst_host_srv_count	Service count for the destination host	Host-based
F34	Dst_host_same_srv_rate	Same service rate for the destination host	Host-based
F35	Dst_host_diff_srv_rate	Different service rate for the destination host	Host-based
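The voting step of the EFS algorithm can be sketched directly from the per-method selections in Table 2. The dictionary keys and function name are assumptions made for this illustration; the feature lists are transcribed from Table 2 and the threshold is H = 3.

```python
from collections import Counter

# Features chosen by each technique, transcribed from Table 2.
SELECTED = {
    "IG": "F5 F3 F6 F4 F30 F29 F33 F34 F35 F38 F12 F39 F25".split(),
    "SFS": "F4 F5 F6 F12 F23 F26 F30".split(),
    "CFS": "F29 F33 F34 F12 F39 F38 F25 F4 F26 F23 F32 F3".split(),
    "CHI": "F5 F6 F3 F4 F33 F35 F34 F40 F41 F23 F30 F29 F12 F27 F28".split(),
    "MI": "F6 F5 F4 F41 F28 F12 F27 F30 F3 F40 F29 F34 F35 F33 F37".split(),
}
H = 3  # a feature must be chosen by at least H techniques


def ensemble_select(selected, threshold):
    """Return (BS_F, C_F): the features voted for by >= threshold techniques, and all counts."""
    counts = Counter(f for features in selected.values() for f in set(features))
    chosen = [f for f, c in counts.items() if c >= threshold]
    return sorted(chosen, key=lambda name: int(name[1:])), counts


bs_f, c_f = ensemble_select(SELECTED, H)
print(bs_f)
# ['F3', 'F4', 'F5', 'F6', 'F12', 'F23', 'F29', 'F30', 'F33', 'F34', 'F35']
```

The result reproduces the 11 features of Table 4. Lowering H would admit features chosen by only two techniques, at the cost of a larger feature set.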

3.2.4 The proposed machine learning-based classifier

By applying a feature selection process, we reduce the dimensionality of the feature set from 41 features to 11 essential features. We now train the machine learning-based classifier on this best set of features for enhanced performance. In this section, we discuss our proposed machine learning-based classifier mechanism.

3.2.4.1 Support vector machine (SVM) classifier

The SVM [32, 33] is a supervised machine learning model, primarily used for classification and regression problems. This technique maps the training instances into an n-dimensional feature space and functions by calculating the optimal hyperplane between normal and attack instances in order to classify future instances. Every data point in the SVM is considered a vector in the n-dimensional feature space. Consider that all the data points in the n-dimensional space belong either to class X or class Y. Each training point, a_i, is labelled by b_i, based on Equation (7). $b_{i} = \begin{cases} 1 & \text{if } a_{i} \in \text{Class } X \\ -1 & \text{if } a_{i} \in \text{Class } Y \end{cases}$ (7)

The training instances can be represented as in Equation (8). $D = \{(a_{i}, b_{i}) \mid i = 1, 2, 3, \dots, n\}$ (8)

Data instances with label 1 and –1 are referred to as positive and negative points, that is, normal and attack classes. Hyperplanes help separate normal and attack instances. The aim of this algorithm is to choose an optimal hyperplane that separates attack and normal classes with the largest possible margin. The margin of the hyperplane is defined by the sum of the distances from the hyperplane to the closest attack and normal points.

The new data point, x, is classified using Equation (9), $f (x) = \text{sign} (w \cdot x - b) = \text{sign} \left(\sum_{i = 1}^{N} α_{i} y_{i} (x_{i} \cdot x) - b\right)$ (9)

where α_i is the Lagrangian multiplier of each data point, and x_i and y_i denote the training points and their labels, written a_i and b_i in Equation (8).

The importance of each data point is reflected by α_i, and support vectors are identified as the data points closest to the hyperplane (α > 0). For the remaining data points, α = 0. The data points that lie closest to the hyperplane represent the classifier and act as support vectors. Figure 5 represents the SVM with two classes of data points, attack (–1) and normal (+1), with two boundaries or margins. Support vectors (α > 0) are represented by circled values (1, –1) and non-support vectors (α = 0) by uncircled values (1, –1). Changes in support vectors affect the margin, whereas adding or removing non-support vectors does not.

Fig. 5

SVM support vectors, non-support vectors and classes.

To compute the optimal hyperplane, H, consider the closest positive and negative training points, the support vectors, which lie on the hyperplanes H1 and H2, respectively. The hyperplanes and the resulting constraints are written as Equations (10) and (11), $\begin{matrix} H \leftarrow w \cdot x - b = 0 \\ H1 \leftarrow w \cdot x - b = 1 \\ H2 \leftarrow w \cdot x - b = - 1 \end{matrix}$ (10) $\begin{matrix} w \cdot x_{i} - b ⩾ 1 & \text{for } y_{i} = 1 \\ w \cdot x_{i} - b ⩽ - 1 & \text{for } y_{i} = - 1 \end{matrix}$ (11) where w is normal to H, b determines the offset of H from the origin (the distance is |b|/‖w‖), and H1, H and H2 are parallel.
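The decision rule can be sketched with a hand-made two-dimensional example that uses the w·x – b convention of Equations (10) and (11). The support vectors, multipliers and offset below are hypothetical values chosen so that the two support vectors lie exactly on H1 and H2; they are not taken from the trained model.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(x, support_vectors, labels, alphas, b):
    """sum_i alpha_i * y_i * (x_i . x) - b, the quantity whose sign classifies x."""
    return sum(a * y * dot(sv, x) for sv, y, a in zip(support_vectors, labels, alphas)) - b


def classify(x, support_vectors, labels, alphas, b):
    """Return +1 (normal) or -1 (attack)."""
    return 1 if decision_value(x, support_vectors, labels, alphas, b) >= 0 else -1


# Hypothetical model: one support vector per class.
SUPPORT_VECTORS = [(2.0, 2.0), (0.0, 0.0)]
LABELS = [1, -1]       # +1 = normal, -1 = attack
ALPHAS = [0.25, 0.25]  # non-zero only for support vectors
B = 1.0

model = (SUPPORT_VECTORS, LABELS, ALPHAS, B)
print(decision_value((2.0, 2.0), *model))  # 1.0, on H1
print(decision_value((0.0, 0.0), *model))  # -1.0, on H2
print(classify((3.0, 3.0), *model), classify((0.0, 1.0), *model))  # 1 -1
```

Only points with non-zero α contribute to the sum, which is why removing non-support vectors leaves the classifier unchanged.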

3.2.4.2 Naïve Bayes classifier

A Naïve Bayes classifier is another commonly used classification technique based on the concept of Bayes’ theorem [34, 35]. It is a probabilistic model which predicts the class membership probabilities by assuming independence between the predictors. This model works well with large datasets and is also easy to build.

Let TRD = {tr₁,tr₂,tr₃, ... ,tr_n}, where tr_i represents a data sample in the training dataset (TRD). Each data sample has values for the selected features. Using our proposed feature selection mechanism, we select m = 11 essential features, M = {f₁,f₂,f₃, ... ,f_m}. The training dataset contains two class labels, attack and normal, represented as C = {c₁,c₂} and indexed by k. Let TED = {te₁,te₂,te₃, ... ,te_n}, where te_i represents a data sample in the testing dataset (TED). The trained Naïve Bayes classifier assigns te_i to c₁ or c₂, whichever has the higher posterior probability.

The Naïve Bayes function is computed by Equations (12) and (13), $P (c_{k} | M) = \frac{P (M | c_{k}) P (c_{k})}{P (M)}$ (12) $P (M | c_{k}) = \prod_{i = 1}^{m} P (f_{i} | c_{k})$ (13) where P(c_k|M) is the posterior probability of the target class c_k given the selected set of features (M), P(c_k) and P(M) are the prior probabilities of the target class and the predictors, and P(M|c_k) is the probability of the features given the target class.

Our proposed work uses the Naïve Bayes classifier in the second layer. This layer handles all data instances classified as normal by the Layer-I SVM classifier. In all, 53% of the data instances in the training dataset and 43% in the testing dataset comprise normal traffic. So the Naïve Bayes classifier in Layer-II is required to process a large number of instances, which, fortunately, it does well [33, 34]. Thus, this module helps catch attack instances that the Layer-I SVM has wrongly classified as normal.
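Equations (12) and (13) can be sketched for categorical features by counting. The add-one (Laplace) smoothing, the function names and the tiny two-feature dataset are assumptions of this sketch and are not described in the paper.

```python
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Collect the counts needed to estimate P(c_k) and P(f_i | c_k)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value -> count
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for sample, label in zip(samples, labels):
        for i, value in enumerate(sample):
            value_counts[(label, i)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def posterior(sample, model):
    """P(c_k | M) for every class, normalised by the evidence P(M)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(c_k)
        for i, value in enumerate(sample):  # product of Equation (13)
            # Add-one smoothing (an assumption) avoids zero probabilities.
            score *= (value_counts[(label, i)][value] + 1) / (n_label + len(feature_values[i]))
        scores[label] = score
    evidence = sum(scores.values())  # P(M)
    return {label: score / evidence for label, score in scores.items()}


# Hypothetical samples of (Flag, Logged_in).
TRAIN = [("SF", 1), ("SF", 1), ("SF", 0), ("S0", 0), ("S0", 0), ("REJ", 0)]
TRAIN_LABELS = ["normal", "normal", "normal", "attack", "attack", "attack"]

nb_model = train_naive_bayes(TRAIN, TRAIN_LABELS)
p_normal_like = posterior(("SF", 1), nb_model)
p_attack_like = posterior(("S0", 0), nb_model)
print(max(p_normal_like, key=p_normal_like.get), max(p_attack_like, key=p_attack_like.get))
```

The class with the highest posterior is the prediction, matching the decision rule described for te_i.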

3.2.4.3 The C4.5 decision tree

In Layer-II, the C4.5 is used to inspect and classify the attack data classified by the Layer-I SVM. In our previous work [35, 36], the C4.5 technique was analyzed and used to detect intrusions in the SDN environment. Based on the results, we have established that the C4.5 works well in terms of both classification accuracy and false alarm rate. The C4.5 classifier inspects attack instances from the SVM and classifies the nature of the attack, categorizing each as a probe, DoS, U2R or R2L.

Ross Quinlan introduced the C4.5 technique, based on the concept of the decision tree. The decision tree [37] is a classification approach that classifies the given input data instances based on the feature values. The task is to select the best feature that correctly partitions the data instances into target classes. The C4.5 uses the information gain ratio as a splitting factor. Information gain is calculated using entropy, which is a measure of the randomness or impurity of data instances. The feature with the highest information gain ratio, that is, the one that reduces entropy the most, is selected as the best feature to split the data instances. Information gain for a feature is calculated using Equations (1), (2) and (3). The C4.5 constructs a decision tree by taking the training input data, selecting the best feature as the partition factor, and recursively applying the same to each sub-tree. The leaf nodes of the C4.5 tree represent the target classes. To classify a data instance, it starts at the root node and follows the sub-tree, based on the results of each test, until it reaches a leaf node. The target label of the leaf node is the result of the classification.

Given that the essential features for detecting the attack pattern have already been identified by the feature selection mechanism, the computation required at each node is very low, which reduces the processing and time complexity. This module helps filter out instances that the Layer-I SVM has falsely detected as attacks. The proposed algorithm for the flow-based IDS is given in Section 3.2.4.5.
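The splitting criterion can be sketched with the standard definition of the gain ratio, information gain divided by the split information (the entropy of the feature itself). The function names and the toy data are assumptions for illustration; the paper's own Equations (1) to (3) are not reproduced here.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy of a sequence of discrete values, in bits."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, target):
    """Information gain of splitting on `feature`, normalised by its split information."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    information_gain = entropy(target) - remainder
    split_info = entropy(feature)
    return information_gain / split_info if split_info > 0 else 0.0


target = ["attack"] * 4 + ["normal"] * 4
flag = ["S0"] * 4 + ["SF"] * 4  # two values, separates the classes perfectly
record_id = list(range(8))      # unique per instance, also "separates" perfectly

print(gain_ratio(flag, target))       # 1.0
print(gain_ratio(record_id, target))  # about 0.333
```

Both features have the same information gain, but the gain ratio penalises the many-valued identifier, so a C4.5-style split would prefer the two-valued feature.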

3.2.4.5 Multi-layer machine learning classifier algorithm

Input: Generated traffic instances T = {t₁,t₂, ... ,t_n}

Output: Attack or normal classification C = {C_attack, C_normal}

Initialization: FS = φ, L1_SVM = φ, L2_NB = φ, L2_C45 = φ

Begin

For each t_i in T

Extract essential features (f_i)

FS ← f_i

End For

For each t_i in T

Forward t_i to the L1_SVM

Process and classify t_i into C_attack or C_normal

If t_i == C_attack

Forward t_i to the L2_C45

Process and classify t_i into C_attack or C_normal

If t_i == C_attack

Alert the controller to drop the flow

Else

Flow rules will be developed

End If

Else

Forward t_i to the L2_NB

Process and classify t_i into C_attack or C_normal

If t_i == C_attack

Alert the controller to drop the flow

Else

Flow rules will be developed

End If

End If

End For

End

where T = {t₁,t₂, ... ,t_n} represents traffic instances, C = {C_attack, C_normal} the target class labels, FS the features {f₁,f₂, ... ,f_i}, L1_SVM the Layer-I SVM classifier, L2_NB the Layer-II Naïve Bayes classifier, and L2_C45 the Layer-II C4.5 classifier.
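The routing logic of the algorithm can be sketched with the three classifiers passed in as plain callables. The threshold-based stand-ins below are hypothetical placeholders for the trained SVM, C4.5 and Naïve Bayes models, and the action names are assumptions of this sketch.

```python
from typing import Callable, Sequence

Classifier = Callable[[Sequence[float]], str]  # returns "attack" or "normal"


def classify_flow(features: Sequence[float],
                  l1_svm: Classifier,
                  l2_nb: Classifier,
                  l2_c45: Classifier) -> str:
    """Route a flow through Layer-I, then the matching Layer-II classifier.

    Returns "drop" when Layer-II confirms an attack, otherwise "install_flow_rule".
    """
    first = l1_svm(features)
    # Attack verdicts are re-checked by C4.5, normal verdicts by Naive Bayes.
    second = l2_c45(features) if first == "attack" else l2_nb(features)
    return "drop" if second == "attack" else "install_flow_rule"


# Hypothetical stand-ins operating on (Count, Same_srv_rate).
def fake_svm(f):
    return "attack" if f[0] > 100 else "normal"


def fake_c45(f):
    return "attack" if f[1] > 0.9 else "normal"


def fake_nb(f):
    return "attack" if f[0] > 80 and f[1] > 0.95 else "normal"


for flow in [(300, 0.99), (150, 0.20), (90, 0.99), (5, 0.50)]:
    print(flow, classify_flow(flow, fake_svm, fake_nb, fake_c45))
```

The second and third flows show the two corrections Layer-II can make: clearing a false alarm raised by Layer-I, and catching an attack that Layer-I let through.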

3.3 Signature-based IDS module in the data plane

Once an OpenFlow switch receives incoming packets, the switch table lookup process starts. If no match occurs in the table, the switch redirects the request to the controller for new flow rules. If a match is available, the switch already has a flow rule for the received packet and forwards it accordingly. New flows received in the controller are inspected by the flow-based IDS using the ML-based classifiers. However, for flows defined earlier, there is no mechanism in place to monitor suspicious activity in the data plane. To address this problem, we introduce the signature-based Snort IDS in the data plane. The Snort IDS [24] is a lightweight and open-source IDS that detects abnormal activity by employing rule sets. In traditional networks, Snort is the mechanism commonly used to inspect and detect abnormal activity in the network [38]. OpenFlow switches mirror all the received traffic towards the signature-based IDS module. By applying rule sets, the IDS module monitors network traffic and checks for malicious activity. In the event of an attack, the signature-based IDS module alerts the controller to drop the particular flow.

To mirror the network traffic received in all switches towards the IDS module, our proposed work uses the concept of port mirroring, which duplicates every packet of the switch (in/out) and sends it to the IDS module. Once an intrusion occurs, rule sets in the IDS module determine the action to be carried out. Every rule contains two parts, a header and options. The rule header defines the source and destination IP addresses, port numbers, netmask, and protocol affected by the rule. The rule options define which part of an IP packet is to be inspected for compliance with a specific rule. When a rule matches, the IDS module notifies the controller so that the particular flow can be dropped, following which the controller removes the flow from the flow table of the corresponding switch. The working function of the signature-based IDS module is depicted in Fig. 6.

Fig. 6

Working function of the signature-based IDS in the data plane.

4 Evaluation metrics and performance analysis

The performance of our proposed work is evaluated in terms of classification accuracy, false alarm rate (FAR), precision, recall and F-score. To compute these values, metrics such as true positive (TP), true negative (TN), false positive (FP) and false negative (FN) from the confusion matrix are considered. True positive is the measure of the total number of normal instances classified as normal. True negative is the measure of the total number of attack instances classified as attacks. False positive is the measure of the total number of attack instances classified as normal. False negative is the measure of the total number of normal instances classified as attacks.

Accuracy, as given in Equation (14), is the proportion of correctly classified attack and normal instances in the given dataset. $Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$ (14)

Precision, as expressed in Equation (15), is the ratio of correctly classified normal instances to all the instances classified as normal. $Precision = \frac{TP}{TP + FP}$ (15)

Recall is the ratio of correctly classified normal instances to all the instances that actually belong to the normal class. It is calculated using Equation (16). $Recall = \frac{TP}{TP + FN}$ (16)

In case of an uneven class distribution, the F-score is used to identify the balance between precision and recall. As seen in Equation (17), it is the harmonic mean of precision and recall. $F\text{-}Score = 2 * \frac{Precision * Recall}{Precision + Recall}$ (17)
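Equations (14) to (17) translate directly into code. The function names and the confusion-matrix counts are illustrative assumptions; the final check uses the precision and recall reported for the 41-feature SVM in Table 6.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, Equation (17)."""
    return 2 * precision * recall / (precision + recall)


def evaluation_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall and F-score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f_score": f_score(precision, recall),
    }


# Hypothetical counts, with "positive" meaning a normal instance as defined above.
metrics = evaluation_metrics(tp=90, tn=80, fp=10, fn=20)
print(metrics)

# Table 6, SVM with 41 features: precision 0.883 and recall 0.935.
print(round(f_score(0.883, 0.935), 3))  # 0.908
```

The recomputed F-score agrees with the value listed in Table 6 for that row.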

4.1 A performance analysis of the proposed work using the NSL-KDD dataset

Our proposed multi-layer machine learning-based classifier is trained and tested with the NSL-KDD test and training dataset. The details of the dataset are described in Section 3.2.1. To evaluate the efficiency of the proposed two-layer technique, we have compared our proposed multi-layer classifier with single classifiers like the SVM, Naive Bayes and C4.5 decision tree. To underscore the importance of the feature selection process, we have compared our ensemble feature selection with a full 41-feature dataset.

From Tables 6–9, it is clear that the identification of essential features in the dataset helps enhance classification accuracy and reduce the time taken to train and test the model.

Table 6
A performance comparison of the ensemble feature selection technique over the training dataset

Classifier	Number of features	Precision	Recall	F-Score	Accuracy
SVM	41	0.883	0.935	0.908	90.42%
	11	0.899	0.951	0.924	92.12%
Naïve Bayes	41	0.93	0.926	0.928	92.33%
	11	0.942	0.931	0.936	93.12%
C4.5	41	0.927	0.913	0.92	91.36%
	11	0.942	0.934	0.938	93.31%

Table 7

A performance comparison of the ensemble feature selection technique over the testing dataset

Classifier	Number of features	Precision	Recall	F-Score	Accuracy
SVM	41	0.901	0.9	0.9	91.41%
	11	0.937	0.914	0.926	93.50%
Naïve Bayes	41	0.929	0.903	0.916	92.66%
	11	0.942	0.93	0.936	94.41%
C4.5	41	0.924	0.927	0.925	93.54%
	11	0.949	0.94	0.944	95.16%

Table 8

A performance comparison based on the time taken to build the model

Classifier	Number of features	Training time	Testing time
SVM	41	25.78	14.63
	11	14.23	8.72
Naïve Bayes	41	10.62	4.34
	11	7.83	3.64
C4.5	41	20.57	12.35
	11	9.63	6.45

Table 9

A performance measure of the Layer-I and Layer-II ML classifiers

Layer	Classifier	Precision	Recall	F-Score	Accuracy
Layer-I	SVM	0.937	0.914	0.926	93.50%
Layer-II	Naïve Bayes and C4.5	0.989	0.971	0.979	98.2%

To improve the results significantly, we have proposed a multi-layer ML classifier mechanism that helps achieve high classification accuracy by evaluating the data in two different layers. In Layer-I, the SVM classifier classifies the data instances into two classes, attack or normal. A normal instance is forwarded to the Layer-II Naïve Bayes classifier and an attack instance to the Layer-II C4.5 classifier for further inspection. The proposed multi-layer processing approach achieves the highest classification accuracy of 98.2%. The performance of the multi-layer classifier over a single classifier is depicted in Table 10. Figure 7 represents the confusion matrix for the Layer-I and Layer-II classifiers. Figure 7 indicates that the Layer-II classifiers help reduce the false alarm rate by processing attack and normal instances separately and in parallel.

Table 10

A performance measure of the multi-layer ML classifier over the training and testing datasets

Classifier	Dataset	Precision	Recall	F-Score	Accuracy	Testing time	Training time
Multi-layer classifier (SVM + Naïve Bayes + C4.5, 11 features)	Training	0.969	0.977	0.973	97.11%	20.13	12.48
	Testing	0.989	0.971	0.979	98.2%

Fig. 7

Confusion matrix for the Layer-I and Layer-II classifiers (NSL-KDD dataset).

Our experimental results show that the proposed multi-layer ML-classifier mechanism has achieved better classification accuracy and a lower false alarm rate than the single ML-classifier. Ensemble feature selection has also played a pivotal role in our proposed work by identifying essential features in the initial stage, which reduces the time complexity. The time taken to build the proposed model is also reasonable, given that the Layer-II classifiers process the data in parallel for the final classification of data instances as attacks or normal. The trained multi-layer ML-classifier now processes the real-time flow-based information to detect malicious activity.

4.2 Experimental setup for the flow-based and signature-based IDS in the SDN

We have used the Mininet simulation tool [39] to execute our proposed SDN flow-based IDS and signature-based IDS. It is a standard simulation tool which creates an SDN-based environment using virtual OpenFlow switches, links, controllers and hosts. To control and monitor our network, we have used the Python-based controller POX [1, 3]. The OpenFlow switches in the data plane and the POX controller in the control plane communicate through the OpenFlow protocol version 3.0. To monitor network activity in the data plane, we have used Snort version 2.0.6 [40] as the signature-based IDS. It continuously monitors traffic between all the OpenFlow switches and alerts the controller in case of any suspicious activity. In the control plane, the flow statistics of the new incoming flows are directed to the flow-based IDS module from the POX controller. Using the flow statistics, the trained ensemble feature selection-based multi-layer machine learning classifier analyses the data instances and reports to the controller in the event of intrusions or malicious activity. The working function of both the flow-based and signature-based IDS is explained in detail in Sections 3.2 and 3.3.

We have created a network topology with three OpenFlow switches (S1, S2, S3), 12 hosts (H1–H12), and one POX controller (C1), as depicted in Fig. 8.

Fig. 8

Network topology of the proposed system.

We have generated both attack and normal traffic towards the hosts using Scapy [41], a Python-based packet manipulation tool that helps generate, sniff and modify network packets. It is superior to other packet manipulation tools like hping3, arping, Nmap, and arpspoof [42]. Given that Scapy sends packets, matches requests and replies, modifies packets, and transmits invalid frames, it is easy to generate different kinds of attacks, along with normal packets. By applying port mirroring, we collect the traffic information transmitted between the three switches. We have used Wireshark to monitor traffic information in the data plane and control plane [43]. Attacks generated using Scapy include ARP cache poisoning, port scanning and port sweeping, flooding attacks like SYN flood, UDP flood and ICMP flood, and spoofing attacks like ARP and DNS spoofing. These attacks focus on the forwarding devices in the data plane and the controller in the control plane. For example, flooding attacks stop the victim system from processing legitimate requests by flooding the system with a large number of packets from spoofed IP addresses; spoofing attacks gain access to the local user using spoofed addresses in the network and target the controller with unnecessary flow requests; scanning attacks locate active ports in the network and identify service vulnerability by tracking the victim system. ARP cache poisoning prevents a legitimate user from joining a gateway by poisoning the ARP cache. ICMP flood attacks send an enormous number of echo requests with the victim’s IP address, which results in every host responding to the said requests. In total, we have generated 45,089 attack packets and 72,451 normal packets using Scapy.

Table 11 lists the different kinds of packets generated by Scapy. Table 12 shows the performance evaluation of our proposed work in two layers. In Layer-I, the classification accuracy is 94.8%. By splitting attack and normal instances between different classifiers (C4.5 and Naïve Bayes) in Layer-II, classification accuracy is enhanced to 97.7%. The false alarm rate for the proposed work is comparatively low. In all, 2250 instances (1.91%) in Layer-I and 465 instances (0.39%) in Layer-II are falsely classified as attack instances. Our experimental results clearly show that identifying essential features and processing the data using multiple classifiers improves classification accuracy. Figure 9 depicts the confusion matrix for the Layer-I and Layer-II classifiers using Scapy-generated data instances. It indicates that the false alarm rate is reduced in Layer-II, compared to Layer-I.

Table 11

Different attack and normal instances generated by Scapy for the flow-based IDS

Nature of packets	Total instances
ICMP flood	8942
ARP Poisoning	4986
ARP Spoofing	3259
TCP SYN flood	4872
UDP Flood	6289
HTTP flood	2145
TCP ACK FIN, ACK RST	2561
Other Attacks	12035
Normal Packets	72451

Table 12

A performance measure of the proposed model over the Scapy-generated dataset

Layer	Classifier	Precision	Recall	F-Score	Accuracy
Layer-I	SVM	0.969	0.948	0.958	94.8%
Layer-II	Naïve Bayes and C4.5	0.994	0.974	0.982	97.7%

Fig. 9

Confusion matrix for the Layer-I and Layer-II classifiers (Scapy-generated traffic).

To evaluate the performance of the signature-based IDS, we have generated 22,248 attack packets and 27,452 normal packets from eight host systems randomly towards the four victim systems, either on the same switch or a different one. The packets are reflected in the signature-based IDS module by applying the port mirroring technique. Snort IDS rules evaluate the nature of the packet, based on the predefined rule set, and alert the controller in case of suspicious activity. The attack instances generated are in the form of UDP, SYN, ICMP and HTTP flood attacks. This is because, by attacking or spoofing host systems, attackers gain access as local users and try to control the controller by despatching unnecessary flow requests. Our experimental results show that the signature-based IDS using Snort achieves an overall accuracy of 95.26%. Misclassified instances are moderately high in ICMP (5.77%) and UDP flood (6.26%) attacks, and relatively low in TCP SYN (3.29%) and HTTP flood (3.66%) attacks. UDP and ICMP flood attacks are mainly defended using predefined threshold values in Snort rules. This may, however, occasionally cause legitimate traffic to be detected as attack instances.
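The side effect of threshold-based flood detection can be sketched with a simple per-source packet counter over a sliding time window. This is an illustration of the general idea only, not Snort's implementation; the class name, the limit of five packets per second and the addresses are all hypothetical.

```python
from collections import deque


class RateThreshold:
    """Flag a source once it sends more than `limit` packets within `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.seen = {}  # source address -> timestamps of its recent packets

    def observe(self, source: str, timestamp: float) -> bool:
        stamps = self.seen.setdefault(source, deque())
        stamps.append(timestamp)
        # Forget packets that have fallen out of the time window.
        while timestamp - stamps[0] >= self.window:
            stamps.popleft()
        return len(stamps) > self.limit


detector = RateThreshold(limit=5, window=1.0)

# A flood: ten echo requests within 0.1 s.
flood = [detector.observe("10.0.0.1", i * 0.01) for i in range(10)]
# Slow legitimate traffic: one packet per second.
slow = [detector.observe("10.0.0.2", float(i)) for i in range(10)]
# A short legitimate burst: six packets within 0.5 s.
burst = [detector.observe("10.0.0.3", i * 0.1) for i in range(6)]

print(any(flood), any(slow), any(burst))  # True False True
```

The burst is flagged even though it is legitimate, which is the kind of misclassification a fixed threshold can produce for ICMP and UDP traffic.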

Table 13 presents the different types of attacks generated by Scapy for the signature-based IDS module. Table 14 represents accuracy for different kinds of flooding attacks, using the signature-based IDS.

Table 13

Different attacks and normal instances generated by Scapy for the signature-based IDS

Nature of packets	Total instances
ICMP flood	8942
TCP SYN flood	4872
UDP Flood	6289
HTTP flood	2145
Normal Packets	27452

Table 14

A performance analysis of the signature-based IDS

Attack types	Accuracy	Overall accuracy
ICMP flood	94.23%	95.26%
TCP SYN flood	96.71%
UDP Flood	93.74%
HTTP flood	96.34%

Table 15

A performance comparison of the proposed work

Reference	Type of classifier	Feature selection	Algorithms used	Dataset	Network	Accuracy
[44]	Multi-level Hybrid	Random selection (6 Features)	KNN, Extreme Learning Machine	NSL-KDD	SDN-IDS	84.29%
[45]	Single classifier	Information gain (14 Features)	SVM	A generated dataset	SDN-IDS	95%
[46]	Hybrid	Genetic algorithm (14 features)	Genetic Algorithm and Decision tree	KDD-CUP’99 and Generated dataset using hping3	SDN-DDoS	98.22%
[36]	Two level	Not Used	C4.5 and Entropy	A Scapy generated dataset	SDN-DDoS	95.05%
[47]	Two layer multi-class	Information gain (10 Features)	SVM, C5.0	KDD-CUP’99	Traditional-IDS	93.32%
[48]	Hybrid	Not used	SVM, Self-organized Map (SOM)	A Scapy generated dataset	SDN-DDoS	96.77%
[49]	Hybrid	Genetic Algorithm (11 features)	Genetic Algorithm and SVM	KDD-CUP’99	Traditional-IDS	97.3%
Proposed Work	Multi-layer	Ensemble Feature Selection (11 Features)	SVM, Naïve Bayes, C4.5	NSL-KDD and the Scapy generated dataset	SDN-IDS	97.7%

4.3 A comparative analysis of the proposed work

We have compared our proposed work with related work proposed for both the SDN and traditional networks, and analysed the results in terms of the type of classifier, feature selection mechanism, classifier algorithm, and type of dataset used, which are tabulated in Table 15. Most researchers have used the KDD-CUP’99 and NSL-KDD datasets. The performance of the classifier depends greatly on the identification of essential features, for the random selection of features results in poor classification accuracy [44]. Hybrid and multi-layer classifiers work well compared to single classifiers [36, 45–49]. In [36, 48], the focus was only on DDoS attacks, using a generated dataset and hping3. The comparison makes it clear that identifying essential features and using multiple classifiers help achieve the best accuracy. Using a multi-layer classifier and an ensemble feature selection technique, we have achieved 97.7% accuracy. Additionally, we have proposed the signature-based IDS using Snort to monitor and detect suspicious activity in the data plane.

5 Conclusion and future work

In this work, we have proposed a flow-based IDS in the control plane and a signature-based IDS in the data plane to analyse and detect malicious activity. Together, the two IDSs help secure both the control plane and data plane effectively in an SDN environment. Monitoring the traffic in the control plane and data plane separately also reduces the performance overhead on the controller. In the flow-based IDS, we used ensemble-based feature selection and a multi-layer machine learning classifier to analyse and classify attack and normal instances. The NSL-KDD dataset was used to train our proposed multi-layer ML-based classifier. The signature-based IDS used the rule-based Snort IDS to inspect and classify attack and normal instances. To evaluate the performance of our proposed work, attack and normal instances were generated using Scapy. The findings of the proposed work show that identifying essential features and using multiple classifiers improves processing time and classification accuracy, with a low false alarm rate. Our experimental results show that the proposed work outperformed existing approaches with 97.7% and 95.26% accuracy in the flow-based IDS and signature-based IDS, respectively. Insider attackers with access to confidential information on network resources degrade SDN performance by sending undesirable connection requests to the controller or other legitimate devices in the network. Consequently, it is necessary to detect insider attackers in the data plane as early as possible. In future, we plan to implement a host-based IDS in the SDN data plane for the early detection of insider attacks using machine learning techniques.

References

Kreutz

, Ramos

F.M.

, Verissimo

P.E.

, Rothenberg

C.E.

, Azodolmolky

and Uhlig

, Software-defined networking: A comprehensive survey, Proceedings of the IEEE 103(1) (2014), 14–76.

Rotsos

, Sarrar

, Uhlig

, Sherwood

and Moore

A.W.

, OFLOPS: An open framework for Open Flow switch evaluation. In International Conference on Passive and Active Network Measurement (pp. 85–95). Springer, Berlin, Heidelberg. (2012).

Latah

and Toker

, Application of artificial intelligence to software defined networking: A survey, Indian Journal of Science and Technology 9(44) (2016), 1–7.

Sultana

, Chilamkurti

, Peng

and Alhadad

, Survey on SDN based network intrusion detection system using machine learning approaches, Peer-to-Peer Networking and Applications 12(2) (2019), 493–501.

Hindy

, Brosset

, Bayne

, Seeam

, Tachtatzis

, Atkinson

and Bellekens

, Ataxonomy and survey of intrusion detection system design techniques, network threats and datasets, arXiv preprint arXiv:1806.03517. (2018).

Tsai

C.F.

, Hsu

Y.F.

, Lin

C.Y.

and Lin

W.Y.

, Intrusion detection by machine learning: A review, Expert Systems with Applications 36(10) (2009), 11994–12000.

Karakus

and Durresi

, A survey: Control plane scalability issues and approaches in software-defined networking (SDN), Computer Networks 112 (2017), 279–293.

Xie

, Guo

, Hu

, Qu

and Lv

, Control plane of software defined networks: A survey, Computer Communications 67 (2015), 1–10.

Varma

P.R.K.

, Kumari

V.V.

and Kumar

S.S.

, A survey of feature selection techniques in intrusion detection system: A soft computing perspective. In Progress in Computing, Analytics and Networking (pp. 785–793). Springer, Singapore. (2018).

10. Miao and Niu, A survey on feature selection, Procedia Computer Science 91 (2016), 919–926.

11. Ahmad, Hussain, Alghamdi and Alelaiwi, Enhancing SVM performance in intrusion detection using optimal feature subset selection based on genetic principal components, Neural Computing and Applications 24(7-8) (2014), 1671–1682.

12. Mohammadi, Mirvaziri, Ghazizadeh-Ahsaee and Karimipour, Cyber intrusion detection by combined feature selection algorithm, Journal of Information Security and Applications 44 (2019), 80–88.

13. Aljawarneh, Aldwairi and Yassein M.B., Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model, Journal of Computational Science 25 (2018), 152–160.

14. Dey S.K., Uddin M.R. and Rahman M.M., Performance Analysis of SDN-Based Intrusion Detection Model with Feature Selection Approach. In Proceedings of International Joint Conference on Computational Intelligence (pp. 483–494). Springer, Singapore. (2020).

15. Da Silva A.S., Machado C.C., Bisol R.V., Granville L.Z. and Schaeffer-Filho, Identification and selection of flow features for accurate traffic classification in SDN. In 2015 IEEE 14th International Symposium on Network Computing and Applications (pp. 134–141). IEEE. (2015).

16. Latah and Toker, Towards an efficient anomaly-based intrusion detection for software-defined networks, IET Networks 7(6) (2018), 453–459.

17. Xian, Peiyu, Wei and Xuezhi, An algorithm application in intrusion forensics based on improved information gain. In 2011 3rd Symposium on Web Society (pp. 100–104). IEEE. (2011).

18. Wang and Gombault, Efficient detection of DDoS attacks with important attributes. In 2008 Third International Conference on Risks and Security of Internet and Systems (pp. 61–67). IEEE. (2008).

19. Zaman and Karray, Features selection for intrusion detection systems based on support vector machines. In 2009 6th IEEE Consumer Communications and Networking Conference (pp. 1–8). IEEE. (2009).

20. Aziz A.S.A., Azar A.T., Salama M.A., Hassanien A.E. and Hanafy S.E.O., Genetic algorithm with different feature selection techniques for anomaly detectors generation. In 2013 Federated Conference on Computer Science and Information Systems (pp. 769–774). IEEE. (2013).

21. Shaghaghi, Kaafar M.A., Buyya and Jha, Software-Defined Network (SDN) Data Plane Security: Issues, Solutions, and Future Directions. In Handbook of Computer Networks and Cyber Security (pp. 341–387). Springer, Cham. (2020).

22. Parashar, Poonia and Satish, A Survey of Attacks and their Mitigations in Software Defined Networks. In 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (pp. 1–8). IEEE. (2019).

23. Celesova, Val’ko, Grezo and Helebrandt, Enhancing security of SDN focusing on control plane and data plane. In 2019 7th International Symposium on Digital Forensics and Security (ISDFS) (pp. 1–6). IEEE. (2019).

24. Manso, Moura and Serrão, SDN-based intrusion detection system for early detection and mitigation of DDoS attacks, Information 10(3) (2019), 106.

25. Revathi and Malathi, A detailed analysis on NSL-KDD dataset using various machine learning techniques for intrusion detection, International Journal of Engineering Research & Technology (IJERT) 2(12) (2013), 1848–1853.

26. Alhaj T.A., Siraj M.M., Zainal, Elshoush H.T. and Elhaj, Feature selection using information gain for improved structural-based alert correlation, PloS One 11(11) (2016).

27. Ullah and Mahmoud Q.H., A filter-based feature selection model for anomaly-based intrusion detection systems. In 2017 IEEE International Conference on Big Data (Big Data) (pp. 2151–2159). IEEE. (2017).

28. Lee, Park and Lee, Feature Selection Algorithm for Intrusions Detection System using Sequential Forward Search and Random Forest Classifier, KSII Transactions on Internet & Information Systems 11(10) (2017).

29. Nguyen, Franke and Petrovic, Improving effectiveness of intrusion detection by correlation feature selection. In 2010 International Conference on Availability, Reliability and Security (pp. 17–24). IEEE. (2010).

30. Osanaiye, Cai, Choo K.K.R., Dehghantanha, Xu and Dlodlo, Ensemble-based multi-filter feature selection method for DDoS detection in cloud computing, EURASIP Journal on Wireless Communications and Networking 2016(1) (2016), 130.

31. Amiri, Yousefi M.R., Lucas, Shakery and Yazdani, Mutual information-based feature selection for intrusion detection systems, Journal of Network and Computer Applications 34(4) (2011), 1184–1199.

32. Golmah, An efficient hybrid intrusion detection system based on C5.0 and SVM, International Journal of Database Theory and Application 7(2) (2014), 59–70.

33. Deepa, Sudar K.M. and Deepalakshmi, Design of Ensemble Learning Methods for DDoS Detection in SDN Environment. In 2019 International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN) (pp. 1–6). IEEE. (2019).

34. Mukherjee and Sharma, Intrusion detection using naive Bayes classifier with feature reduction, Procedia Technology 4 (2012), 119–128.

35. Sudar K.M. and Deepalakshmi, Comparative study on IDS using machine learning approaches for software defined networks, International Journal of Intelligent Enterprise 7(1–3) (2020), 15–27.

36. Muthamil Sudar and Deepalakshmi, A two level security mechanism to detect a DDoS flooding attack in software-defined networks using entropy-based and C4.5 technique, Journal of High Speed Networks (Preprint), (2020), 1–22.

37. Ingre, Yadav and Soni A.K., Decision tree based intrusion detection system for NSL-KDD dataset. In International Conference on Information and Communication Technology for Intelligent Systems (pp. 207–218). Springer, Cham. (2017).

38. Ombase P.M., Kulkarni N.P., Bagade S.T. and Mhaisgawali A.V., DoS attack mitigation using rule based and anomaly based techniques in software defined networking. In 2017 International Conference on Inventive Computing and Informatics (ICICI) (pp. 469–475). IEEE. (2017).

39. Mininet Team, Mininet: An Instant Virtual Network on your Laptop (or other PC). Available online: http://mininet.org/ (accessed on 12 April 2020).

40. Cisco, Snort: Network Intrusion Detection & Prevention System. Available online: https://www.snort.org/ (accessed on 12 April 2020).

41. Scapy: Packet Crafting Tool. Available online: https://scapy.net/ (accessed on 12 April 2020).

42. Biondi, Network packet manipulation with Scapy. (2007).

43. Wireshark: Network Packet Analyzer tool. Available online: https://www.wireshark.org/ (accessed on 12 April 2020).

44. Latah and Toker, An efficient flow-based multi-level hybrid intrusion detection system for software-defined networks. arXiv preprint arXiv:1806.03875. (2018).

45. Boero, Marchese and Zappatore, Support vector machine meets software defined networking in ids domain. In 2017 29th International Teletraffic Congress (ITC 29) (Vol. 3, pp. 25–30). IEEE. (2017).

46. Preamthaisong, Auyporntrakool, Aimtongkham, Sriwuttisap and So-In, Enhanced DDoS Detection using Hybrid Genetic Algorithm and Decision Tree for SDN. In 2019 16th International Joint Conference on Computer Science and Software Engineering (JCSSE) (pp. 152–157). IEEE. (2019).

47. Yuan, Huo and Hogrefe, Two layers multi-class detection method for network intrusion detection system. In 2017 IEEE Symposium on Computers and Communications (ISCC) (pp. 767–772). IEEE. (2017).

48. Deepa, Sudar K.M. and Deepalakshmi, Detection of DDoS attack on SDN control plane using Hybrid Machine Learning Techniques. In 2018 International Conference on Smart Systems and Inventive Technology (ICSSIT) (pp. 299–303). IEEE. (2018).

49. Sarvari, Muda, Ahmad and Barati, GA and SVM algorithms for selection of hybrid feature in intrusion detection systems, Network 1 (2015), 2.