Enhancement of email spam detection using improved deep learning algorithms for cyber security

Abstract

Email remains an essential part of daily life and a primary means of communication on the internet. A persistent challenge is that spam emails consume a large amount of storage space and bandwidth. A further shortcoming of state-of-the-art spam filtering methods, the misclassification of genuine emails as spam (false positives), is a growing problem for the internet world. The literature provides various algorithms for email spam classification, which differ in the classification technique used. This paper aims to develop a novel spam detection model for improved cybersecurity. The proposed model involves several phases: dataset acquisition, feature extraction, optimal feature selection, and detection. Initially, benchmark email datasets comprising both text and image data are collected. Next, feature extraction is performed using two sets of features: text features and visual features. For the text features, Term Frequency-Inverse Document Frequency (TF-IDF) is extracted. For the visual features, the color correlogram and Gray-Level Co-occurrence Matrix (GLCM) are determined. Since the extracted feature vector is long, optimal feature selection is carried out by a new meta-heuristic algorithm called the Fitness Oriented Levy Improvement-based Dragonfly Algorithm (FLI-DA). Once the optimal features are selected, detection is performed by a hybrid learning technique composed of two deep learning approaches, the Recurrent Neural Network (RNN) and the Convolutional Neural Network (CNN). To improve on existing deep learning approaches, the number of hidden neurons of the RNN and CNN is optimized by the same FLI-DA. Finally, the optimized hybrid CNN and RNN classifies the data into spam and ham. The experimental outcomes show the ability of the proposed method to perform spam email classification based on improved deep learning.

Keywords

Email spam detection; improved deep learning; optimal feature selection; text and visual features; Fitness Oriented Levy Improvement-based Dragonfly Algorithm; Recurrent Neural Network; Convolutional Neural Network

Nomenclature

GWO

Grey Wolf Optimization

GLCM

Gray-Level Co-occurrence Matrix

FPR

False Positive Rate

WCFS

Water Cycle Feature Selection

FLI-DA

Fitness Oriented Levy Improvement-based Dragonfly Algorithm

ANN

Artificial Neural Network

NSA

Negative Selection Algorithm

RNN

Recurrent Neural Network

ALO

Ant Lion Optimization

CNN

Convolutional Neural Network

NPV

Negative Predictive Value

PSO

Particle Swarm Optimization

LOF

Local Outlier Factor

DA

Dragonfly Algorithm

RWN

Random Weight Network

GA

Genetic Algorithm

SVM

Support Vector Machine

TF-IDF

Term Frequency-Inverse Document Frequency

FNR

False Negative Rate

CNSA-FFO

Combined Clustered NSA and Fruit Fly Optimization

KNN

K-Nearest Neighbour

MCC

Matthews Correlation Coefficient

DT

Decision Tree

HKSVM

Hybrid Kernel based Support Vector Machine

LSTM

Long Short-Term Memory

NN

Neural Network

WOA

Whale Optimization Algorithm

DWT

Discrete Wavelet Transform

NB

Naive Bayes

FDR

False Discovery Rate

GRU

Gated Recurrent Unit

DNN

Deep Neural Network

1. Introduction

Nowadays, cyber crimes are largely regarded as “borderless white collar” crimes that affect internet users such as governments, organizations, and individuals [23]. Organizations and individuals use electronic mail (email) widely, and it is important for many kinds of group communication. Most email users are tired of receiving spam in their inboxes every day, and spam email has become one of the most costly and fastest-growing problems associated with the internet. Spam emails are mostly commercial and contain attractive links that appear to lead to prominent websites but mostly lead to malicious sites [37]. Machine learning plays a vital role in cyber security [26]. Spam is a worldwide problem for email services. It consists of unwanted, unsolicited emails sent without a specific intended recipient, for purposes ranging from marketing to scams and fraud [4,13]. In 2009, nearly 97% of received or sent emails were categorized as spam. Hence, email classification has received increasing attention. Currently, there is an ongoing contest between spammers and spam detection tools, since each side searches for new ways of revealing the other's presence [3,31].

Spam is defined as a junk or unwanted message sent to an internet user's inbox. Spam is a severe threat to the internet and society [2]. Spam messages expose internet users to security risks as well as to improper and illegal content. Additionally, valuable resources such as productivity, bandwidth, and storage are wasted by spam messages [17]. Hence, the demand for automatic email spam filtering is growing [15]. Even though practitioners and researchers make a constant effort to build accurate spam detection systems, internet users still receive numerous spam messages daily. Spammers send large volumes of messages at almost no cost using botnets, malware, and spam campaigns [17,22]. Therefore, a spam detection system must be used to detect the fraudulent and unsolicited emails that undermine the enormous benefits of email.

Spam detection distinguishes spam from non-spam emails, which helps keep spam out of the user's inbox [21,34]. It is the initial step in the email filtering process and prevents junk mail from reaching users' inboxes. The wide availability of bulk mailing tools has rapidly increased the volume of spam email. Spam problems are mostly handled by spam filters [38]. These filters recognize spam emails by analyzing their contents and various additional information [16]. Early spam filters were built on blacklists of recognized spammers, keyword filtering, and a set of user-defined rules [29,39]. This is known as the knowledge engineering approach. However, such methods must be updated and maintained continuously, which makes them time-consuming and ineffective. Behavioral characterization is an efficient way of pattern matching for detecting malware [35].

The major contributions of this paper are listed below.

To improve the email spam detection using the hybrid deep learning algorithms consisting of CNN and RNN by optimizing the hidden neurons for improved cyber security.

To perform feature extraction using two sets of features: TF-IDF is extracted as the text feature, and the color correlogram and GLCM are determined as the visual features.

To introduce a new meta-heuristic algorithm called FLI-DA for improving the optimal feature selection and deep learning process.

To reveal the capability of the developed technique using various performance measures for the spam email classification on the basis of improved deep learning.

The paper is organized as follows: Section 1 introduces email spam detection for improved cybersecurity. Section 2 surveys the literature related to email spam detection. The proposed model for email spam detection using text and image datasets is described in Section 3. Section 4 describes the text features and visual features adopted for the proposed email spam detection. The FLI-DA for optimal feature selection and classification is provided in Section 5. Section 6 explains the text and visual feature classification using the hybrid CNN and RNN. Section 7 presents the results and discussion. Finally, Section 8 concludes the paper.

2. Literature survey

2.1. Related works

In 2019, Rawashdeh et al. [1] used a novel technique composed of enhancement, comparison quality, assessment, induction, and groundwork. The recommended spam classification was tested using seven datasets, with training and validation carried out by cross-validation. Feature selection was performed by a meta-heuristic algorithm, the WCFS, combined with simulated annealing in three styles of hybridization. The outcomes showed that the SVM classifier achieved a better F-measure. Additionally, the feature count was reduced to below 50% using simulated annealing and the interleaved water cycle.

In 2018, Naem et al. [30] suggested a novel predictive technique for handling the spam email problem based on boosting and ALO, known as ALO-Boosting. In this context, the developed technique was used in the different search regions to alter the original positions of the individuals. On the basis of the boosting algorithm, the optimum feature subset was obtained for enhanced classification. Further, the boosting classifier describes a group of models that convert weak learners into strong learners. The presented model detected the optimum features with the least number of selected features. Based on the boosting classifier, spam email classification attained higher precision.

In 2015, Idris et al. [19] introduced an email detection model based on an improvement of the NSA. The random detector generation of the NSA was improved by applying PSO, so that detectors were generated by the model during the NSA's random detector generation stage. In the combined NSA-PSO, LOF was used as the fitness function for detector generation. The procedure terminated once the anticipated spam coverage was attained. The techniques were also implemented and evaluated. The results showed that the presented model performed best when compared with the conventional models.

In 2019, Faris et al. [14] developed an intelligent detection model for email spam detection using the RWN and the GA. The detection procedure identified the relevant features through an automatic recognition capacity. The developed method was analyzed in a sequence of tests on three email corpora. The outcomes showed that the presented method achieved high precision, recall, and accuracy.

In 2019, Olatunji [33] developed an SVM-oriented method for detecting spam. The best performance was attained with the optimal parameters by paying more attention to the parameter search. The tests were carried out on both the training and the testing datasets. The outputs revealed that the developed technique was superior to the state-of-the-art techniques. However, this method is not suitable for large datasets.

In 2019, Chikh et al. [7] introduced a novel method of email detection based on an improved NSA known as CNSA-FFO. The efficiency of the conventional NSA was enhanced by combining the original NSA with the FFO and K-means clustering. The results showed that the presented model exceeded the conventional NSA-PSO in positive prediction, complexity, and accuracy.

In 2019, Kumaresan et al. [24] presented a spam classification technique using HKSVM and S-Cuckoo. In the first step, features were collected from both the images and the text of the emails. The textual features used term frequency (TF). The image features were described by the wavelet moment and the correlogram. The optimum features were selected from the hybrid feature set using the hybrid algorithm known as S-Cuckoo search. In the final step, the recommended classifier performed the classification with an SVM that merged three different kernel functions. The results showed that the recommended model obtained high accuracy.

In 2019, Shuaib et al. [40] proposed using the meta-heuristic optimization technique known as WOA for choosing the salient features in the email corpus. The rotation forest technique then uses the features selected by WOA to classify emails as spam or non-spam.

In 2021, Noorizadeh et al. [32] developed the design and implementation of a cyber-physical industrial control system testbed. The Tennessee Eastman process was replicated on a PC, and the closed-loop controllers were implemented in a Siemens PLC. Using a man-in-the-middle structure, false data injection cyber attacks were injected into the developed testbed. This method is used for providing privacy for the users. Several cyber-attack detection algorithms were implemented in real time on the testbed, and their performance was compared. However, this method makes the system very slow.

In 2020, Samira et al. [12] developed a hybrid approach to spam filtering based on the neural network model Paragraph Vector-Distributed Memory (PV-DM), which was used for building a compact representation of each email. The methodology presented a comprehensive filter for categorizing emails. PV-DM and TF-IDF were combined to assign a dual representation vector to each message. This method was resistant to differences in message cohesion and the language system. However, it ignored the relationship within the languages.

2.2. Review

Although several detection models exist for identifying spam emails, challenges remain in handling them, so a new methodology is needed to detect spam emails effectively. Some of the features and challenges of the existing methods are listed in Table 1. Among them, SVM [1] achieved high classification performance, and simulated annealing [1] was employed for optimizing the outcomes and assessing the spam detection model. Yet, the cost function of simulated annealing is highly expensive. ALO [30] alters the original positions of the individuals in the different search regions, and it performs well in detecting optimum features with the least number of selected features while achieving higher precision. However, its performance needs to be improved when it is used with existing classifiers. NSA-PSO [19] achieved the best accuracy compared with the benchmark NSA model, and it employs LOF as the fitness function for generating detectors. Still, parallel hybridization needs to be implemented for detecting email spam in the network. GA [14] can find the significant features of spam emails, and it optimizes the configuration of its core classifiers. However, it still needs to be examined on imbalanced datasets. SVM [33] performs well and is flexible and reliable, but it is not suitable for large datasets. CNSA-FFO [7] performs well in terms of accuracy and computational complexity, and it can detect email spam. However, it has lower convergence precision. In HKSVM [24], every kernel function is suitable for some tasks, and it defines the arrangement of a high-dimensional space. Yet, its performance needs to be enhanced. The rotation forest algorithm with WOA [40] has provided a great enhancement, and WOA can escape from local optima. Still, it needs to be tested on large datasets. The cyberattack detection method [32] provides privacy for the users, but it makes the system slower. PV-DM [12] is resistant to differences in message cohesion and the language system, but it ignores the relationship within the languages. Hence, the above-mentioned shortcomings might help upcoming researchers develop new models for detecting spam emails effectively.

Table 1
Features and challenges of conventional email spam detection methods for cybersecurity

Author [citation]	Methodology	Features	Challenges
Rawashdeh et al. [1]	Simulated Annealing algorithm and SVM	∙ By using SVM, the classification performance acquired was high. ∙ Simulated annealing was employed for optimizing the outcomes and assessing the spam detection model.	∙ The cost function of simulated annealing is highly expensive.
Naem et al. [30]	ALO	∙ It is used for altering the original location of the people present in the different searching regions. ∙ It performs well in detecting optimum features with the least number of features selected and acquiring more precision.	∙ Need to improve the performance when it is used with existing classifiers.
Idris et al. [19]	NSA-PSO	∙ It acquired the best accuracy when compared over benchmark NSA model. ∙ It employs LOF as the fitness function for generating the detector.	∙ Parallel hybridization needs to be implemented for detecting email spam in the network.
Faris et al. [14]	GA	∙ It can find the significant features of spam emails. ∙ It optimizes the configuration of its core classifiers.	∙ We need to examine imbalanced datasets.
Olatunji [33]	SVM	∙ The performance is high. ∙ It is flexible and reliable.	∙ It does not perform well when the noise is over.
Chikh et al. [7]	CNSA-FFO	∙ It has high performance regarding accuracy and computational complexity. ∙ It can detect email spam.	∙ It has less convergence precision.
Kumaresan et al. [24]	HKSVM	∙ Every kernel function is suitable for some tasks. ∙ It defines the arrangement of huge dimensional space.	∙ Performance needs to be enhanced.
Shuaib et al. [40]	Rotation forest algorithm and WOA	∙ It has provided an outstanding enhancement. ∙ WOA has the ability to shun away the local optima.	∙ Need to experiment on huge datasets.
Noorizadeh et al. [32]	Cyber attack detection algorithm	∙ It provides privacy to the users.	∙ This method makes the system very slow.
Samira et al. [12]	Paragraph Vector-Distributed Memory	∙ It was resistant to differences in message cohesion and the language system.	∙ This method ignores the relationship within the languages.

3. Proposed model for email spam detection using text and image datasets

3.1. Developed email spam detection model

Email is a major, economical, and reliable means of communication. In recent years, the increase in email users has resulted in an increase in spam emails. Automatic classification techniques separate spam from ham emails using text mining. Researchers have introduced various deep learning and machine learning strategies such as case-based reasoning, K-NN, NB, NN, SVM, DNN, and artificial immune systems. However, these techniques cannot completely handle the problem owing to the constantly growing complexity of spamming software. The architecture of the proposed email spam detection for improved cybersecurity is displayed in Fig. 1.

Fig. 1.

The proposed architecture of email spam detection for improved cybersecurity.

The proposed model comprises four phases: dataset acquisition, feature extraction, optimal feature selection, and detection. In the first phase, benchmark email datasets are gathered. Four datasets are used, of which three are text datasets and one is an image dataset. Once the datasets are collected, they are subjected to feature extraction, which produces two sets of features: text features and visual features. The text features are extracted from the text datasets, and the visual features are extracted from the image dataset. The text features capture the frequency of spam words using TF-IDF, while the GLCM and color correlogram are used for the visual features. Initially, the dataset is divided into a training set and a testing set. As the extracted feature vector tends to be long, it undergoes the third phase, optimal feature selection, which is performed by a new meta-heuristic algorithm known as FLI-DA. Here, 20 features are selected optimally. Once the optimal features are chosen, the final phase of detection takes place. Detection is done using two deep learning approaches, RNN and CNN. To enhance the performance of the traditional deep learning approaches, the number of hidden neurons of both the RNN and CNN is optimized using the same proposed FLI-DA. The optimized hybrid learning technique with CNN and RNN classifies the data into spam and ham.

3.2. Dataset description

The benchmark spam email datasets used here are publicly available. Four datasets are collected: the first three are text datasets, and the last is an image dataset.

Dataset 1: The first text dataset is the Ling-spam dataset, obtained from [8]. It consists of 2893 spam and non-spam messages, of which 2412 are legitimate messages and 481 are spam messages. These messages mostly concern linguistic interests such as software discussions, research opportunities, and job postings. The header information was removed.

Dataset 2: The second text dataset is the spam mails dataset, obtained from [9]. It includes the enron1 folder, which contains spam and ham subfolders, each of which holds emails.

Dataset 3: The third text dataset is named as the spam-or-ham-email classification. It is gathered from the link [10]. It consists of 71,325 datasets and 2371 tasks.

Dataset 4: The fourth dataset is the image spam dataset, obtained from [11]. Its images were collected from the mailboxes of two real users.

4. Text features and visual features adopted for proposed email spam detection

4.1. Text feature extraction

The text features are extracted from the text dataset. Here, TF-IDF is used to extract the frequency count of spam words from the text dataset. TF-IDF [43] is a numerical statistic that reflects the significance of a word to a document in a corpus or cluster. It is commonly employed as a weighting factor in user modelling, text mining, and information retrieval. The TF-IDF value increases in proportion to the number of times a word appears in the document and is offset by the frequency of the word in the corpus, which compensates for the fact that some words are simply more common in general. TF-IDF is the product of TF and IDF. It is simple to compute, and it makes the comparison between two documents easier. Moreover, TF-IDF can be used to return the documents that are related to a query. The formula for TF is described in Eq. (1), which represents the frequency with which a characteristic appears in a single document. $\begin{matrix} (1) & {TF}_{j, i} = \frac{M_{j, i}}{\sum_{l = 0}^{m} M_{l, i}} \end{matrix}$

Here, the denominator represents the total word count in the document $e_{i}$, and $M_{j, i}$ represents the number of occurrences of the $j$th word in the document $e_{i}$. The formula for IDF is shown in Eq. (2). $\begin{matrix} (2) & IDF = \log \left( \frac{| E |}{| \{ e_{i} \in E : u_{j} \in e_{i} \} |} \right) \end{matrix}$

In the above equation, $| E |$ denotes the total number of documents in the corpus, and $| \{ e_{i} \in E : u_{j} \in e_{i} \} |$ denotes the number of documents in the corpus that contain the characteristic $u_{j}$. Finally, the formulation of TF-IDF is shown in Eq. (3). $\begin{matrix} (3) & TF - IDF = TF * IDF = \frac{M_{j, i}}{\sum_{l = 0}^{m} M_{l, i}} * \log \left( \frac{| E |}{| \{ e_{i} \in E : u_{j} \in e_{i} \} |} \right) \end{matrix}$

As the number of corpus documents that contain a characteristic increases, its IDF decreases, so a characteristic that is spread over many documents has poor distinguishing capability. When every document in the corpus contains a characteristic, its weight is 0. To avoid a zero denominator, most practical applications add 1 to the denominator. TF-IDF is mostly used to weigh the importance of a word against its raw frequency. Words with a high frequency across the corpus are not necessarily significant, and common words carry little meaning, whereas low-frequency words often have better discriminating power, although this emphasis on low-frequency words lacks a theoretical basis. Some words are evenly distributed across document categories, while others occur mainly in specific document types. Hence, the latter are more suitable for describing document characteristics.

The extracted textual features are denoted by $F_{nt}^{text}$ , in which $nt = 1, 2, \dots, NT$ and $NT$ denotes the number of textual features extracted by the TF-IDF. Here, the total number of textual features extracted is 600.
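Eqs. (1)-(3) can be illustrated with a minimal, dependency-free sketch. The whitespace tokenizer, the function name `tf_idf`, the `smooth` flag, and the three-message corpus are assumptions made for illustration; they are not taken from the proposed implementation.

```python
import math
from collections import Counter


def tf_idf(documents, smooth=False):
    """Return one {word: weight} dict per document, following Eqs. (1)-(3)."""
    # Assumption: lower-casing and whitespace splitting stand in for real tokenization.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)  # |E|
    # Document frequency: number of documents e_i that contain the word u_j.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)  # denominator of Eq. (1)
        doc_weights = {}
        for word, count in counts.items():
            tf = count / total                              # Eq. (1)
            # smooth=True adds 1 to the denominator, as mentioned in the text.
            denom = doc_freq[word] + (1 if smooth else 0)
            idf = math.log(n_docs / denom)                  # Eq. (2)
            doc_weights[word] = tf * idf                    # Eq. (3)
        weights.append(doc_weights)
    return weights


# Hypothetical toy corpus, not drawn from the datasets used in the paper.
corpus = [
    "win free prize now",
    "meeting agenda for monday",
    "free meeting room now",
]
scores = tf_idf(corpus)
for word, weight in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(f"{word}: {weight:.4f}")
```

In this toy corpus, "win" occurs in one document only and therefore receives a larger weight than "free", which occurs in two. A word that occurs in every document gets weight 0 when `smooth` is off, matching the remark above.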

4.2. Visual feature extraction

The visual features are extracted from the image dataset, which uses GLCM and color correlogram features in the proposed email spam detection.

GLCM: It is used to extract the image features. GLCM [36] is a popular technique for extracting second-order statistical texture features. It is derived from the statistical distribution of combinations of intensity values at specified positions relative to each other in the image. Statistics are categorized into “first, second, and higher-order” on the basis of the number of intensity points (pixels) in each combination. Higher-order statistics are possible in theory, but their implementation is difficult owing to the computational complexity. Texture features carry information about the structural arrangement of surfaces and their relationships. The GLCM has the advantage of characterizing the structure of the images by counting how often pairs of pixels with specific values occur. Its implementation is also simple, and it has been shown to give very good results. Some of the texture-oriented features are described below.

Energy: It is also known as ‘angular second moment’ or ‘uniformity’. It returns the sum of the squared elements of the GLCM matrix. Its value ranges from low for non-homogeneous regions to high for homogeneous regions. Thus, the more frequently the same pixel pairs repeat in the image, the higher the energy. $\begin{matrix} (4) & Energy = \sum_{i z, j z = 0}^{N z - 1} {(P q_{i z j z})}^{2} \end{matrix}$

Entropy: Entropy measures the randomness present in the image. As a result, a homogeneous image yields a lower entropy value. $\begin{matrix} (5) & Entropy = \sum_{i z, j z = 0}^{N z - 1} - ln (P q_{i z j z}) P q_{i z j z} \end{matrix}$

Contrast: It measures the intensity contrast between a pixel and its neighbour over the image. $\begin{matrix} (6) & Contrast = \sum_{i z, j z = 0}^{N z - 1} P q_{i z j z} {(i z - j z)}^{2} \end{matrix}$

Correlation: It represents the gray tone linear dependencies present in an image. It describes the correlation of pixel with its neighbour. $\begin{matrix} (7) & Correlation = \sum_{i z, j z = 0}^{N z - 1} P q_{i z j z} \frac{(i z - μ) (j z - μ)}{σ^{2}} \end{matrix}$

Homogeneity: It represents the pixel similarity. For a perfectly homogeneous image, whose GLCM matrix is diagonal, the value is 1. The value decreases as the image texture varies more. $\begin{matrix} (8) & Homogeneity = \sum_{i z, j z = 0}^{N z - 1} \frac{P q_{i z j z}}{1 + {(i z - j z)}^{2}} \end{matrix}$

Here, $P q_{i z j z}$ represents the $(i z, j z)$ th element of normalized GLCM matrix and μ represents the mean of GLCM matrix, and it is computed using Eq. (9). $\begin{matrix} (9) & μ = \sum_{i z, j z = 0}^{N z - 1} i z P q_{i z j z} \end{matrix}$

The term $σ^{2}$ represents the variance of the intensities of all the pixels. It is computed as in Eq. (10). $\begin{matrix} (10) & σ^{2} = \sum_{i z, j z = 0}^{N z - 1} P q_{i z j z} {(i z - μ)}^{2} \end{matrix}$

Here, the term $N z$ represents the count of gray levels present in an image.
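The texture features in Eqs. (4)-(10) can be sketched in plain Python as follows. The pixel offset (one step to the right), the symmetric counting of pixel pairs, and the function names `glcm` and `glcm_features` are assumptions for illustration; the source does not state which offsets or normalization the proposed model uses. Symmetric counting is chosen here because it makes the row and column marginals equal, which is what allows a single μ and σ² in Eq. (7).

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric co-occurrence matrix for the offset (dx, dy).

    `image` is a list of rows of integer gray levels in [0, levels).
    Assumes the image contains at least one pixel pair for the offset.
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Texture features of a normalized GLCM `p`, following Eqs. (4)-(10)."""
    n = len(p)  # Nz, the number of gray levels
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu = sum(i * v for i, _, v in cells)                 # Eq. (9)
    var = sum(v * (i - mu) ** 2 for i, _, v in cells)    # Eq. (10)
    cov = sum(v * (i - mu) * (j - mu) for i, j, v in cells)
    return {
        "energy": sum(v ** 2 for _, _, v in cells),                       # Eq. (4)
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),   # Eq. (5)
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),           # Eq. (6)
        "correlation": cov / var if var > 0 else 0.0,                     # Eq. (7)
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),  # Eq. (8)
    }


# Hypothetical 2 x 2 checkerboard with two gray levels.
checker = [[0, 1], [1, 0]]
features = glcm_features(glcm(checker, levels=2))
print(features)
```

A constant image gives energy 1 and homogeneity 1, whereas the checkerboard, in which every horizontal neighbour differs, gives a lower energy and a correlation of -1. Terms with zero probability are skipped in the entropy sum, and the correlation is set to 0 when the variance is 0; both are conventions adopted for this sketch.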

Color Correlogram: Generally, color histograms serve as feature vectors for images. A color histogram captures the global color distribution of an image. It is easy to compute and is insensitive to small changes in viewing position. However, it contains no spatial information, so it is prone to false positives, a weakness that is especially pronounced for very large databases. The histogram is also not robust to large appearance changes. These limitations are addressed by a more capable color feature known as the color correlogram, which describes the global distribution of the local spatial correlations of colors. It provides a strong representation of textures with a small dimensionality. Here, it is used to extract the visual characteristics of both ham and spam images. The color correlogram can tolerate large changes in the appearance of a scene. The auto color correlogram operates as follows.

Initially, the input image is color-quantized to 64 colors (4 levels for every channel). That is, the colors present in the input ham or spam image are quantized into $me = 64$ color values $D_{1}, D_{2}, D_{3}, \dots, D_{64}$ .

On the basis of the distance l, a 3 × 3 neighbourhood is considered for every pixel. The distance is chosen as in the original color correlogram, i.e., $l = 1, 3, 4$ and 7. The value of the center pixel is compared with the values of all 8 pixels in the neighbourhood. The probability obtained from the pixel counts describes the local spatial distribution of the pixels. For a pixel $q_{1}$ of color $D_{j}$ in the image $Im g$ , the probability that another pixel $q_{2}$ at a distance l away from $q_{1}$ also has color $D_{j}$ is represented as in Eq. (11).

$\begin{matrix} (11) & γ_{D_{j}}^{l} (Im g) = \Pr \left[ | q_{1} - q_{2} | = l, q_{2} \in {Im g}_{D_{j}} \mid q_{1} \in {Im g}_{D_{j}} \right] \end{matrix}$

This probability is computed over all the pixels in the image. It produces a 64-dimensional feature vector that is used as a visual feature vector.
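The auto color correlogram described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names `quantize_64` and `auto_correlogram` are hypothetical, the image is assumed to be an H × W × 3 `uint8` RGB array, and the neighbourhood at distance l is approximated by the 8 pixels lying l steps away along the horizontal, vertical, and diagonal directions.

```python
import numpy as np

def quantize_64(img):
    """Map an H x W x 3 uint8 RGB image to color indices 0..63 (4 levels per channel)."""
    levels = (np.asarray(img, dtype=np.uint8) // 64).astype(np.int64)  # each channel -> 0..3
    return levels[..., 0] * 16 + levels[..., 1] * 4 + levels[..., 2]

def auto_correlogram(img, distance=1, n_colors=64):
    """Estimate, for every color D_j, the probability that a pixel at the given
    distance from a pixel of color D_j also has color D_j (cf. Eq. (11))."""
    q = quantize_64(img)
    h, w = q.shape
    same = np.zeros(n_colors)   # neighbour pairs sharing the centre color
    total = np.zeros(n_colors)  # all neighbour pairs examined per centre color
    d = distance
    offsets = [(-d, -d), (-d, 0), (-d, d), (0, -d), (0, d), (d, -d), (d, 0), (d, d)]
    for dy, dx in offsets:
        # Region of centre pixels whose shifted neighbour stays inside the image.
        y0, y1 = max(0, -dy), min(h, h - dy)
        x0, x1 = max(0, -dx), min(w, w - dx)
        if y1 <= y0 or x1 <= x0:
            continue
        centre = q[y0:y1, x0:x1]
        neighbour = q[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
        total += np.bincount(centre.ravel(), minlength=n_colors)
        same += np.bincount(centre[centre == neighbour], minlength=n_colors)
    # Colors that never occur keep a probability of zero.
    return np.divide(same, total, out=np.zeros(n_colors), where=total > 0)

# Usage: a single-color image yields probability 1 for that color only.
flat = np.full((8, 8, 3), 200, dtype=np.uint8)
features = auto_correlogram(flat, distance=1)
```

The sketch returns one 64-dimensional vector per distance; how the values for the different distances are combined into the final 64-dimensional feature is not specified here and is left open.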

The total number of visual features attained is 3537. Here, the GLCM and color correlogram features are concatenated. Thus, the final visual features are represented as $F_{n f}^{visual} = \{GLCM, \text{color correlogram}\}$ , in which $n f = 1, 2, \dots, NF$ and $NF$ represents the number of visual features extracted by the GLCM and color correlogram.

4.3. Optimal feature selection

Feature selection reduces the number of inputs for further analysis and processing and retains the most relevant ones. Here, optimal feature selection is performed with a new meta-heuristic algorithm known as FLI-DA. The total number of textual features attained is 600. The optimally selected textual features are represented by $F_{n t *}^{text *}$ , where $n t * = 1, 2, \dots, NT *$ and $NT *$ , the number of optimally selected textual features, is taken as 20. Similarly, the total number of visual features attained is 3537. The optimally chosen image features are denoted by $F_{n f *}^{visual *}$ , in which $n f * = 1, 2, \dots, NF *$ and $NF *$ , the number of optimally selected visual features, equals 20. Tables 2 and 3 list the features before and after feature selection, respectively, in which ${ML}_{1}, {ML}_{2}, \dots, {ML}_{F}$ represent the mails considered in the dataset.

Table 2
Features before feature selection

Mails	Visual features	Textual features
ML₁	$F_{1 (1)}^{visual}$	$F_{2 (1)}^{visual}$	$\dots$	$F_{3537 (1)}^{visual}$	$F_{1 (1)}^{text}$	$F_{2 (1)}^{text}$	$\dots$	$F_{600 (1)}^{text}$
ML₂	$F_{1 (2)}^{visual}$	$F_{2 (2)}^{visual}$	$\dots$	$F_{3537 (2)}^{visual}$	$F_{1 (2)}^{text}$	$F_{2 (2)}^{text}$	$\dots$	$F_{600 (2)}^{text}$
ML₃	$F_{1 (3)}^{visual}$	$F_{2 (3)}^{visual}$	$\dots$	$F_{3537 (3)}^{visual}$	$F_{1 (3)}^{text}$	$F_{2 (3)}^{text}$	$\dots$	$F_{600 (3)}^{text}$
$\dots$	$\dots$	$\dots$	$\dots$	$\dots$	$\dots$	$\dots$	$\dots$	$\dots$
${ML}_{F}$	$F_{1 (F)}^{visual}$	$F_{2 (F)}^{visual}$	$\dots$	$F_{3537 (F)}^{visual}$	$F_{1 (F)}^{text}$	$F_{2 (F)}^{text}$	$\dots$	$F_{600 (F)}^{text}$

Table 3

Features after feature selection

Mails	Visual features				Textual features
ML₁	$F_{1 (1)}^{visual *}$	$F_{2 (1)}^{visual *}$	$\dots$	$F_{20 (1)}^{visual *}$	$F_{1 (1)}^{text *}$	$F_{2 (1)}^{text *}$	$\dots$	$F_{20 (1)}^{text *}$
ML₂	$F_{1 (2)}^{visual *}$	$F_{2 (2)}^{visual *}$	$\dots$	$F_{20 (2)}^{visual *}$	$F_{1 (2)}^{text *}$	$F_{2 (2)}^{text *}$	$\dots$	$F_{20 (2)}^{text *}$
ML₃	$F_{1 (3)}^{visual *}$	$F_{2 (3)}^{visual *}$	$\dots$	$F_{20 (3)}^{visual *}$	$F_{1 (3)}^{text *}$	$F_{2 (3)}^{text *}$	$\dots$	$F_{20 (3)}^{text *}$
$\dots$	$\dots$	$\dots$	$\dots$	$\dots$	$\dots$	$\dots$	$\dots$	$\dots$
${ML}_{F}$	$F_{1 (F)}^{visual *}$	$F_{2 (F)}^{visual *}$	$\dots$	$F_{20 (F)}^{visual *}$	$F_{1 (F)}^{text *}$	$F_{2 (F)}^{text *}$	$\dots$	$F_{20 (F)}^{text *}$

5. Fitness Oriented Levy Improvement-based Dragonfly Algorithm for optimal feature selection and classification

5.1. Objective function

The major objective of the presented email spam detection for improved cybersecurity is to maximize detection accuracy. Here, the optimization of hidden neurons of the CNN and RNN as well as the optimal feature selection, is done by the proposed FLI-DA. The RNN and CNN are used to detect spam emails. The objective function is described as in Eq. (12). $\begin{matrix} (12) & Ob = arg max_{{F_{nd}, H_{CNN}, H_{RNN}}} (Accy) \end{matrix}$

In the above equation $Ob$ represents the objective function, $F_{nd}$ represents the features to be selected optimally, where $nd = 1, 2, \dots, ND$ , in which $ND$ represents the total number of features from text or image. The term $H_{CNN}$ represents the hidden neurons of the CNN, $H_{RNN}$ represents the hidden neurons of the RNN, and $Accy$ represents the accuracy. The mathematical representation of the accuracy is described in Eq. (13). $\begin{matrix} (13) & Accy = \frac{TRPO + TRNE}{TRPO + TRNE + FAPO + FANE} \end{matrix}$

Here, $TRPO$ denotes the true positive, $TRNE$ denotes the true negative, $FAPO$ denotes the false positive, and $FANE$ denotes the false negative, respectively.

5.2. Encoding of solution

The solution encoding of the proposed email spam detection model is displayed in Fig. 2. Here, both the selected features and the numbers of hidden neurons of the CNN and RNN are optimized using the FLI-DA. The bounding limit of the attributes lies in the range [1, length of the feature vector]. Similarly, the bounding limit of the hidden neurons of both RNN and CNN lies in the range [5, 255]. The bounding limits are the minimum and maximum values that each solution (chromosome) variable may take. Every variable must stay within its bounding limit during optimization, which ensures performance. The limits are user-defined, because the inputs of the optimization algorithm are chosen randomly within these ranges. In Fig. 2, $F_{1}, F_{2}, \dots F_{ND}$ represent the features, and $ND$ , the total number of features to be optimally selected, is set to 20. Evaluating the quality of an encoded solution is very time-consuming, and the encoding alone cannot guarantee the feasibility of a solution.
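One way to read this encoding is as a vector holding $ND = 20$ feature indices followed by the two hidden-neuron counts. The sketch below decodes such a vector while enforcing the stated bounds. It is an illustration only: the function name `decode_solution`, the continuous-valued input, and the rounding and clipping strategy are assumptions, and repeated feature indices are not handled.

```python
import numpy as np

def decode_solution(solution, n_features, n_selected=20, h_min=5, h_max=255):
    """Split a candidate solution into feature indices and hidden-neuron counts.

    solution   : sequence of length n_selected + 2 (features, H_CNN, H_RNN)
    n_features : length of the feature vector (upper bound of a feature index)
    """
    sol = np.asarray(solution, dtype=float)
    if sol.size != n_selected + 2:
        raise ValueError("solution must hold n_selected feature indices plus two neuron counts")
    # Feature indices are bounded by [1, n_features] (1-based, as in the text).
    feature_idx = np.clip(np.rint(sol[:n_selected]), 1, n_features).astype(int)
    # Hidden-neuron counts are bounded by [h_min, h_max] = [5, 255].
    h_cnn, h_rnn = np.clip(np.rint(sol[n_selected:]), h_min, h_max).astype(int)
    return feature_idx, int(h_cnn), int(h_rnn)

# Usage with the 600 textual features: out-of-range values are pulled back to the bounds.
candidate = [0.2, 600.7] + [10.0] * 18 + [3.0, 300.0]
idx, h_cnn, h_rnn = decode_solution(candidate, n_features=600)
```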

Fig. 2.

Solution encoding.

5.3. Conventional DA

The motivation of DA [20] arises from the dynamic and static swarming behaviours of dragonflies. These swarming behaviours are analogous to the exploitation and exploration phases. In the exploration phase, the dragonflies hunt other flying prey. During the exploitation phase, a large number of dragonflies migrate in one direction over long distances. In both cases, the individuals of the swarm are attracted towards food and distracted away from enemies. Equation (14) describes the separation $U_{k}$ of the kth dragonfly from its neighbours as below. $\begin{matrix} (14) & U_{k} = \sum_{l = 1}^{P^{'}} (Z^{'} - Z_{l}^{'}) \end{matrix}$

Here, the count of neighbouring individuals is denoted by $P^{'}$ , the position of the lth neighbouring individual is denoted by $Z_{l}^{'}$ , and the location of the present individual is denoted by $Z^{'}$ . Equation (15) computes the alignment. $\begin{matrix} (15) & C_{k} = \frac{\sum_{l = 1}^{P^{'}} X_{l}^{'}}{P^{'}} \end{matrix}$

In the above equation, the velocity of the lth neighbouring individual is denoted by $X_{l}^{'}$ . Equation (16) computes the cohesion. $\begin{matrix} (16) & E_{k} = \frac{\sum_{l = 1}^{P^{'}} Z_{l}^{'}}{P^{'}} - Z^{'} \end{matrix}$

Equation (17) computes an attraction towards a food source. $\begin{matrix} (17) & H_{k} = Food - Z^{'} \end{matrix}$

Here, the position of the food source is denoted by $Food$ . Equation (18) computes the distraction towards an enemy. $\begin{matrix} (18) & G_{k} = Enemy + Z^{'} \end{matrix}$

In the above equation, the enemy’s position is denoted by $Enemy$ . The two vectors used for simulating the movements and updating the location of the dragonflies are the position ( $Z^{'}$ ) and step ( $Δ Z^{'}$ ). Equation (19) represents the step vector displaying the direction of the movement of the dragonflies. $\begin{matrix} (19) & Δ Z_{v + 1}^{'} = (u^{'} U_{k} + c^{'} C_{k} + e^{'} E_{k} + h^{'} H_{k} + g^{'} G_{k}) + δ \cdot Δ Z_{v}^{'} \end{matrix}$

Here, the iteration counter is denoted by v, the inertia weight is denoted by δ, the enemy’s position of the kth individual is denoted by $G_{k}$ , the enemy factor is denoted by $g^{'}$ , the food source of the kth individual is denoted by $H_{k}$ , the food factor is denoted by $h^{'}$ , the cohesion of the kth individual is denoted by $E_{k}$ , the cohesion weight is denoted by $e^{'}$ , the alignment of the kth individual is denoted by $C_{k}$ , the alignment weight is denoted by $c^{'}$ , the separation of the kth individual is denoted by $U_{k}$ , and the separation weight is denoted by $u^{'}$ . These factors lead to various exploitative and explorative behaviours during the process of optimization. Once the step vector is computed, Eq. (20) computes the position vectors as below. $\begin{matrix} (20) & Z_{v + 1}^{'} = Z_{v}^{'} + Δ Z_{v + 1}^{'} \end{matrix}$

Equation (21) to Eq. (24) are used to update the position of the dragonflies with a Levy flight as below. $\begin{array}{ll} (21) & Z_{v + 1}^{'} = Z_{v}^{'} + Levy (f) \times Z_{v}^{'} \\ (22) & Levy (f) = 0.01 \times \frac{{ra}_{1} \times Φ}{| {ra}_{2} |^{\frac{1}{ξ}}} \\ (23) & Φ = {(\frac{Γ (1 + ξ) \times sin (\frac{π ξ}{2})}{Γ (\frac{1 + ξ}{2}) \times ξ \times 2^{(\frac{ξ - 1}{2})}})}^{\frac{1}{ξ}} \\ (24) & Γ (z) = (z - 1)! \end{array}$

In the above equations, the constant is denoted by ξ, the two random numbers in the interval [0, 1] are denoted by ${ra}_{1}$ and ${ra}_{2}$ , the dimension of the position vectors is denoted by f, and the current iteration is denoted by v. To update the $Z^{'}$ and $Δ Z^{'}$ vectors, the neighbourhood of every dragonfly is determined by computing the Euclidean distance between all the dragonflies and choosing $P^{'}$ of them. This position update continues until the stopping criterion is reached. The pseudo-code of the traditional DA is given in Algorithm 1.
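The step-vector and position updates of Eqs. (14) to (20) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `da_step`, the argument layout, and passing the neighbours as a list of indices are assumptions, and every term keeps the sign exactly as it is written in the corresponding equation.

```python
import numpy as np

def da_step(Z, dZ, k, neighbours, food, enemy, weights, inertia):
    """One update of dragonfly k.

    Z, dZ      : (population, dimension) arrays of positions and step vectors
    neighbours : indices of the P' neighbouring individuals of dragonfly k
    weights    : (u, c, e, h, g) = separation, alignment, cohesion, food, enemy factors
    inertia    : inertia weight (delta)
    """
    u, c, e, h, g = weights
    Zk = Z[k]
    Zn = Z[neighbours]
    U = np.sum(Zk - Zn, axis=0)             # Eq. (14) separation
    C = np.mean(dZ[neighbours], axis=0)     # Eq. (15) alignment (mean neighbour velocity)
    E = np.mean(Zn, axis=0) - Zk            # Eq. (16) cohesion
    H = food - Zk                           # Eq. (17) attraction towards food
    G = enemy + Zk                          # Eq. (18) distraction from the enemy
    new_dZ = (u * U + c * C + e * E + h * H + g * G) + inertia * dZ[k]  # Eq. (19)
    new_Z = Zk + new_dZ                     # Eq. (20)
    return new_Z, new_dZ

# Usage: with only the food factor active, the dragonfly jumps onto the food source.
Z = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
dZ = np.zeros_like(Z)
new_Z, new_dZ = da_step(Z, dZ, k=0, neighbours=[1, 2],
                        food=np.array([1.0, 1.0]), enemy=np.array([-1.0, -1.0]),
                        weights=(0.0, 0.0, 0.0, 1.0, 0.0), inertia=0.0)
```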

Algorithm 1

Conventional DA [20]

5.4. Proposed FLI-DA

The proposed FLI-DA performs both the optimal feature selection and the hidden neuron optimization of the CNN and RNN. The CNN finds the important features without any human intervention. It performs identification and prediction, and it gives an effective dense network. Its accuracy generally improves as more data become available. The RNN retains information over time and can therefore predict time series easily. It can process inputs of any length, and the model size does not grow even if the input size is larger. Optimization algorithms have attracted great attention among researchers and are used for handling various engineering-related problems. Using optimization principles, decision-making systems as well as expert systems are built. Optimization algorithms are mostly judged by the resulting classification and prediction performance. The DA [20] is inspired by the dynamic and static swarming behaviours of dragonflies, which correspond to the exploitation and exploration of the search space.

In the conventional DA, the random walk is improved using the Levy flight as in Eq. (21). In the proposed FLI-DA, the Levy flight is replaced by a new update formula that is selected according to the fitness function ${fit}_{k}$ . If ${fit}_{k} > mean ({fit}_{k})$ , the distance to the enemy is computed as $V = abs (Z^{'} - Enemy)$ , and the position vector is updated as in Eq. (25). $\begin{matrix} (25) & Z_{v + 1}^{'} = Z^{'} + b V \end{matrix}$

Otherwise, if ${fit}_{k} ⩽ mean ({fit}_{k})$ , the distance to the food source is computed as $W = abs (Z^{'} - Food)$ , and the position vector is updated as in Eq. (26). $\begin{matrix} (26) & Z_{v + 1}^{'} = Z^{'} + b W \end{matrix}$

In the above equations, b represents a random number, and it lies in between the value of −1 to 1. The pseudo-code of the proposed FLI-DA is displayed in Algorithm 2, and the flowchart of the proposed FLI-DA is displayed in Fig. 3.
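The fitness-oriented update of Eqs. (25) and (26) can be sketched as below. This is an illustrative reading rather than the authors' code: the function name `fli_da_update` is hypothetical, the mean fitness is assumed to be taken over the whole population, and one random number b is assumed to be drawn per dragonfly.

```python
import numpy as np

def fli_da_update(Z, fitness, food, enemy, rng=None):
    """Apply Eq. (25) to dragonflies whose fitness exceeds the mean, Eq. (26) otherwise.

    Z       : (population, dimension) array of positions
    fitness : (population,) array of fitness values fit_k
    """
    rng = np.random.default_rng() if rng is None else rng
    Z = np.asarray(Z, dtype=float)
    fitness = np.asarray(fitness, dtype=float)
    b = rng.uniform(-1.0, 1.0, size=(Z.shape[0], 1))   # random number in [-1, 1)
    V = np.abs(Z - enemy)                               # distance to the enemy
    W = np.abs(Z - food)                                # distance to the food source
    above_mean = (fitness > fitness.mean())[:, None]
    return Z + b * np.where(above_mean, V, W)

# Usage: each move is bounded by the distance to the enemy or to the food source.
Z = np.array([[0.0, 0.0], [2.0, 2.0]])
fit = np.array([1.0, 3.0])
new_Z = fli_da_update(Z, fit, food=np.array([1.0, 1.0]), enemy=np.array([5.0, 5.0]),
                      rng=np.random.default_rng(0))
```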

Algorithm 2

Proposed FLI-DA

Fig. 3.

Flowchart of the proposed FLI-DA.

6. Text and visual feature classification using Hybrid CNN and RNN

Visual and textual features are used for the classification of the document image. Textual features efficiently capture emotional semantics based on similarity in words. Also, the textual analysis extracts the meaningful values from the texts related to the image under test. The visual features deliver a great description of their content.

6.1. Convolutional neural network

CNN [44] consists of two parts. In part 1, the deep features present in the raw data are produced by the pooling and convolution operations. Next, in part 2, for the classification purpose, the features are joined to an MLP. Every layer is described as given below.

Input layer: It is composed of $M \times l$ neurons, in which the length of every univariate series is represented by M and the variate count of input time series is represented by l. The training images are received by using the input layer for the neural network model design.

Convolutional layer: Convolution filters are applied to the time series of the preceding layer. The term $t f$ describes the nonlinear transformation function. If the preceding layer is composed of l-variate time series and M describes the length of every univariate, then the convolution operation retrieves $f n$ time series and the length of every univariate is $⌊ \frac{M - 1}{c s} + 1 ⌋$ , in which the rounding down is described by $⌊ \cdot ⌋$ . The convolution layer extracts the essential features while decreasing the number of parameters. It reduces the issue of over-fitting and improves generalization.

Pooling layer: A feature map is divided into M equal-length segments, and each segment is then represented by its maximum or average value. The pooling layer reduces the amount of data while preserving the useful information. A pooling layer is present in each hidden layer.

Feature layer: After various pooling and convolution operations, the original time series is denoted by a series of feature maps.

Output layer: The output layer is composed of $m r$ neurons that relate to $m r$ classes of time series. The linear output layer is most frequently used in function-fitting issues.

CNN training: The CNN is trained using a training example sequence $((y_{1}, z_{1}), (y_{2}, z_{2}), \dots, (y_{M sample}, z_{M sample}))$ having $y_{u} \in ℜ^{M \times l}$ , $z_{u} \in ℜ^{m r}$ for $1 ⩽ u ⩽ M sample$ . The input is the univariate or multivariate time series $y_{u}$ , and the target output is the vector $z_{u}$ . The network is trained on the basis of the below steps:

Step 1: The bias and weights are initialized with small random numbers. An activation function $af$ and a learning rate η are chosen; Eq. (27) gives the sigmoid activation function. $\begin{matrix} (27) & af (y) = sigmoid (y) = \frac{1}{1 + e^{- y}} \end{matrix}$

Step 2: From the training set, select a training sample randomly.

Step 3: The output of every layer is computed. Equation (28) describes the output of the convolutional layer. $\begin{matrix} (28) & {CL}_{s} (u) = af (\sum_{j = 1}^{k} \sum_{i = 1}^{l} y (j + c s (u - 1), i) w t_{s} (j, i) + b s (s)) \end{matrix}$

Here, the bias and weight of the sth convolution filter are represented by $b s (s)$ and $w t_{s} \in ℜ^{k \times l}$ , the uth component of the sth feature map is denoted by ${CL}_{s} (u)$ , the convolution stride is represented by $c s$ , and the output or the input time series of the preceding layer is represented by $y \in ℜ^{M \times l}$ . Equation (29) describes the output of the pooling layer. $\begin{matrix} (29) & {PL}_{s} (u) = h ({CL}_{s} ((u - 1) k p + 1), {CL}_{s} ((u - 1) k p + 2), \dots, {CL}_{s} (u k p)) \end{matrix}$

In the above equation, the pooling strategy is denoted by h. It does not alter the feature map counts. Equation (30) describes the output of the output layer. $\begin{matrix} (30) & OL (i) = af (\sum_{j = 1}^{N} x (j) w t_{af} (j, i) + b s_{af} (i)), i = 1, 2, \dots, m r \end{matrix}$

Here, the connection weights among the output and the feature layer are denoted by $w t_{af} \in ℜ^{N \times m r}$ , the bias of the output layer is denoted by $b s_{af}$ , and the final feature map present in the feature layer is represented by x. Hence, Eq. (31) describes the mean-square error as below. $\begin{matrix} (31) & Err = \frac{1}{2} \sum_{l = 1}^{m r} f {(l)}^{2} = \frac{1}{2} \sum_{l = 1}^{m r} {(OL (l) - z (l))}^{2} \end{matrix}$

Step 4: The gradient descent method updates the bias and weights as in Eq. (32). $\begin{matrix} (32) & q = q - η \frac{\partial Err}{\partial q} \end{matrix}$

Here, q is used to denote $b s_{af}$ , $b s$ , $w t_{af}$ , or $w t_{s}$ and it represents the parameter value.

Step 5: Select another training sample and move to step 3 till the complete samples are trained in the training set.

Step 6: The iteration count is increased. The algorithm stops once the iteration count equals the maximum value. In the else case, move to step 2.
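The forward computations of Step 3, Eqs. (28) to (30), can be sketched in NumPy as follows. The sketch is illustrative and rests on assumptions: the function names are hypothetical, the filter length is k as in Eq. (28), and no padding is used, so the number of valid filter positions is ⌊(M − k)/cs⌋ + 1. The gradient updates of Step 4 are not shown.

```python
import numpy as np

def sigmoid(y):
    """Activation function of Eq. (27)."""
    return 1.0 / (1.0 + np.exp(-y))

def conv_layer(y, filters, biases, stride):
    """Eq. (28). y: (M, l) input series; filters: (fn, k, l); biases: (fn,)."""
    M, _ = y.shape
    fn, k, _ = filters.shape
    n_out = (M - k) // stride + 1                 # valid positions without padding
    out = np.empty((fn, n_out))
    for s in range(fn):                           # s-th convolution filter
        for u in range(n_out):                    # u-th component of the feature map
            window = y[u * stride:u * stride + k, :]
            out[s, u] = sigmoid(np.sum(window * filters[s]) + biases[s])
    return out

def pool_layer(cl, kp, strategy=np.max):
    """Eq. (29). Summarize non-overlapping segments of length kp in each feature map."""
    fn, n = cl.shape
    n_seg = n // kp
    segments = cl[:, :n_seg * kp].reshape(fn, n_seg, kp)
    return strategy(segments, axis=2)             # number of feature maps is unchanged

def output_layer(x, wt, bs):
    """Eq. (30). x: (N,) final feature map; wt: (N, mr); bs: (mr,)."""
    return sigmoid(x @ wt + bs)

# Usage with all-zero weights: every activation equals sigmoid(0) = 0.5.
y = np.zeros((10, 2))
cl = conv_layer(y, np.zeros((3, 4, 2)), np.zeros(3), stride=2)
pl = pool_layer(cl, kp=2)
ol = output_layer(pl.ravel(), np.zeros((6, 2)), np.zeros(2))
```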

6.2. Recurrent neural network

RNN [25] is one of the divisions of ANN. The GRU is a special kind of LSTM that simplifies the recurrent unit. The GRU merges the forget and input gates into a single update gate b, and the current state is obtained through linear interpolation. Its major benefits are easier training and fewer parameters. The input $(z_{k s}, j_{k s - 1})$ consists of the input features of the $k s$ th image slice and the earlier hidden state. Equation (33) and Eq. (34) describe the update gate b and the reset gate t, respectively. $\begin{array}{ll} (33) & b_{k s} = σ (Y^{z b} z_{k s} + Y^{j b} j_{k s - 1}) \\ (34) & t_{k s} = σ (Y^{z t} z_{k s} + Y^{j t} j_{k s - 1}) \end{array}$

Here, the logistic sigmoid function is denoted by σ, and the corresponding weight matrices are denoted by $Y^{z b}$ , $Y^{j b}$ , $Y^{z t}$ and $Y^{j t}$ . Equation (35) calculates the candidate state of the hidden unit. $\begin{matrix} (35) & {\tilde{j}}_{k s} = tanh (Y^{z j} z_{k s} + Y^{j j} (j_{k s - 1} Θ t_{k s})) \end{matrix}$

In the above equation, the element-wise multiplication is denoted by Θ. The $k s$ th hidden activation state $j_{k s}$ of GRU represents a linear interpolation among the candidate state ${\tilde{j}}_{k s}$ and the earlier state $j_{k s - 1}$ as in Eq. (36). $\begin{matrix} (36) & j_{k s} = (1 - b_{k s}) Θ {\tilde{j}}_{k s} + b_{k s} Θ j_{k s - 1} \end{matrix}$
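A single GRU step following Eqs. (33) to (36) can be sketched as below. This is a minimal illustration: the function name `gru_step` and the dictionary of weight matrices are assumptions, bias terms are omitted because the equations do not include them, and the candidate state uses the hyperbolic tangent, as in the standard GRU.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(z, j_prev, Y):
    """One GRU update.

    z      : input features of the current slice
    j_prev : earlier hidden state
    Y      : dict of weight matrices with keys 'zb', 'jb', 'zt', 'jt', 'zj', 'jj'
    """
    b = sigmoid(Y["zb"] @ z + Y["jb"] @ j_prev)               # Eq. (33) update gate
    t = sigmoid(Y["zt"] @ z + Y["jt"] @ j_prev)               # Eq. (34) reset gate
    j_cand = np.tanh(Y["zj"] @ z + Y["jj"] @ (j_prev * t))    # Eq. (35) candidate state
    return (1.0 - b) * j_cand + b * j_prev                    # Eq. (36) interpolation

# Usage with all-zero weights: both gates equal 0.5 and the candidate state is 0,
# so the new state is half of the earlier state.
n_in, n_hid = 3, 2
Y = {key: np.zeros((n_hid, n_in if key[0] == "z" else n_hid))
     for key in ("zb", "jb", "zt", "jt", "zj", "jj")}
j_new = gru_step(np.ones(n_in), np.array([1.0, -2.0]), Y)
```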

6.3. Hybridization of two classifiers

Once the optimal features are selected, they are passed to the detection phase, which consists of a hybrid classifier with RNN and CNN. FLI-DA is used while training the hybrid classifier, so that the hidden neurons of the RNN and CNN are selected optimally. Finally, the outcomes of the RNN and the CNN are combined by a bitwise AND operation to decide whether the data is ham or spam. The hybrid classifier is diagrammatically represented in Fig. 4.
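The final fusion step can be sketched as follows. The function name `fuse_and` is hypothetical, and the label convention (1 for spam, 0 for ham) is an assumption made only for this illustration.

```python
import numpy as np

def fuse_and(cnn_pred, rnn_pred):
    """Combine binary predictions of the two classifiers with a bitwise AND.

    With 1 = spam and 0 = ham (assumed), an email is flagged as spam only
    when both the CNN and the RNN flag it.
    """
    cnn = np.asarray(cnn_pred, dtype=np.int64)
    rnn = np.asarray(rnn_pred, dtype=np.int64)
    return np.bitwise_and(cnn, rnn)

fused = fuse_and([1, 1, 0, 0], [1, 0, 1, 0])
```

Under this convention the AND rule favours ham whenever the two classifiers disagree, which tends to limit false positives at the cost of missing some spam.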

Fig. 4.

Hybrid classifier to perform email spam detection.

7. Results and discussions

7.1. Experimental setup

The proposed email spam detection model was implemented in Python 3.7 on a system with 8 GB of RAM, and the results were analyzed. The population size was taken as 10, and the maximum number of iterations was 25. Four datasets were used, of which the first three contained text and the last contained images. The performance of the proposed FLI-DA-CRNN was compared with several optimization algorithms like PSO-CRNN [42], GWO-CRNN [28], WOA-CRNN [27], and DA-CRNN [20], deep learning models such as CNN [44], RNN [25], and CRNN [25,44], and machine learning models like DT [41], KNN [18], SVM [6], and NN [5]. The comparison was made in terms of Type I measures such as “accuracy, sensitivity, specificity, precision, NPV, F1 Score, and MCC”, and Type II measures such as “FPR, FNR, and FDR” to determine the superiority of the proposed method. From the analysis, the learning percentage varies from 0.4% to 0.8%. The training percentage varies from 35% to 85%. Here, when 35% of the data is used for training, the remaining 65% is used for validation. Similarly, when 40% of the data is used for training, the remaining 60% is used for validation.

7.2. Experimental parameters

The experimental parameters are shown in Table 4.

Table 4
Parameter settings

Methods	Experimental parameters
PSO	$c_{1} = 2$ ; $c_{2} = 2$ ; $w_{max} = 0.9$ ; $w_{min} = 0.1$
WOA	Leader-score = ‘inf’
DA	Food-Fitness = ‘inf’; Enemy-Fitness = ‘inf’

7.3. Performance metrics

The performance is analyzed using ten metrics as described below.

Accuracy: It is clearly described in Eq. (13).

Sensitivity: It is defined as “the number of true positives, which are recognized exactly”. $\begin{matrix} (37) & Sens = \frac{TRPO}{TRPO + FANE} \end{matrix}$

Specificity: It is defined as “the number of true negatives, which are determined precisely”. $\begin{matrix} (38) & Spec = \frac{TRNE}{TRNE + FAPO} \end{matrix}$

Precision: It is defined as “the ratio of positive observations that are predicted exactly to the total number of observations that are positively predicted”. $\begin{matrix} (39) & Pr ec = \frac{TRPO}{TRPO + FAPO} \end{matrix}$

FPR: It is defined as “the ratio of the count of false-positive predictions to the entire count of negative predictions”. $\begin{matrix} (40) & FPR = \frac{FAPO}{FAPO + TRNE} \end{matrix}$

FNR: It is defined as “the proportion of positives which yield negative test outcomes with the test”. $\begin{matrix} (41) & FNR = \frac{FANE}{FANE + TRPO} \end{matrix}$

NPV: It is defined as the “probability that subjects with a negative screening test truly don’t have the disease”. $\begin{matrix} (42) & NPV = \frac{TRNE}{TRNE + FANE} \end{matrix}$

FDR: It is defined as “the number of false positives in all rejected hypotheses”. $\begin{matrix} (43) & FDR = \frac{FAPO}{FAPO + TRPO} \end{matrix}$

F1 Score: It is defined as “harmonic mean between precision and recall. It is used as a statistical measure to rate performance”. $\begin{matrix} (44) & F1Score = \frac{2 ∙ Sens ∙ Pr ec}{Pr ec + Sens} \end{matrix}$

MCC: It is defined as a “correlation coefficient computed by four values”. $\begin{matrix} (45) & MCC = \frac{TRPO \times TRNE - FAPO \times FANE}{\sqrt{(TRPO + FAPO) (TRPO + FANE) (TRNE + FAPO) (TRNE + FANE)}} \end{matrix}$
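The ten measures can be computed from the four confusion-matrix counts as sketched below, using the standard definitions of each measure. The function name `confusion_metrics` and the example counts are hypothetical and are not taken from the experiments; all denominators are assumed to be non-zero.

```python
import math

def confusion_metrics(trpo, trne, fapo, fane):
    """Type I and Type II measures from true/false positive/negative counts."""
    sens = trpo / (trpo + fane)
    prec = trpo / (trpo + fapo)
    mcc_den = math.sqrt((trpo + fapo) * (trpo + fane) * (trne + fapo) * (trne + fane))
    return {
        "accuracy": (trpo + trne) / (trpo + trne + fapo + fane),   # Eq. (13)
        "sensitivity": sens,                                        # Eq. (37)
        "specificity": trne / (trne + fapo),                        # Eq. (38)
        "precision": prec,                                          # Eq. (39)
        "fpr": fapo / (fapo + trne),                                # Eq. (40)
        "fnr": fane / (fane + trpo),                                # Eq. (41)
        "npv": trne / (trne + fane),                                # Eq. (42)
        "fdr": fapo / (fapo + trpo),                                # Eq. (43)
        "f1": 2 * sens * prec / (sens + prec),                      # Eq. (44)
        "mcc": (trpo * trne - fapo * fane) / mcc_den,               # Eq. (45)
    }

# Illustrative counts only (not experimental data).
m = confusion_metrics(trpo=90, trne=80, fapo=10, fane=20)
```

With these definitions, sensitivity and FNR sum to one, as do specificity and FPR, and precision and FDR.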

7.4. Accuracy of email spam detection using optimized models

Fig. 5 shows the accuracy of email spam detection using the proposed and conventional optimization models at different learning percentages for the four datasets. The proposed FLI-DA-CRNN achieves better accuracy than the other optimization models. From Fig. 5(a), for dataset 1 at a learning percentage of 85%, the accuracy of the proposed FLI-DA-CRNN is 6.08% better than PSO-CRNN, 4.52% better than GWO-CRNN, 6.56% better than WOA-CRNN, and 3.93% better than DA-CRNN. In Fig. 5(b), for dataset 2 at a learning percentage of 75%, it is 3.42% better than PSO-CRNN, 2.95% better than GWO-CRNN, 2.02% better than WOA-CRNN, and 1.45% better than DA-CRNN. In Fig. 5(c), for dataset 3 at a learning percentage of 85%, it is 4.64% better than PSO-CRNN, 2.67% better than GWO-CRNN, 3.24% better than WOA-CRNN, and 5.36% better than DA-CRNN. In Fig. 5(d), for dataset 4 at a learning percentage of 85%, it is 3.2% better than PSO-CRNN, 2.03% better than GWO-CRNN, 3.32% better than WOA-CRNN, and 2.15% better than DA-CRNN. Hence, the proposed FLI-DA-CRNN performs better email spam detection than the existing optimization models considered.

Fig. 5.

Accuracy of email spam detection using proposed and conventional optimization models by varying learning percentages for (a) Dataset 1, (b) Dataset 2, (c) Dataset 3, and (d) Dataset 4.

7.5. Accuracy of email spam detection using deep learning models

Fig. 6 shows the accuracy of email spam detection using the deep learning models for the four datasets at various learning percentages. For all four datasets, the accuracy of the proposed FLI-DA-CRNN is the highest at every learning percentage. In Fig. 6(a), for dataset 1 at a learning percentage of 85%, the accuracy of the proposed FLI-DA-CRNN is 23.29% better than CNN, 21.62% better than RNN, and 4.65% better than CRNN. In Fig. 6(b), for dataset 2 at a learning percentage of 75%, it is 23.29% better than CNN, 28.57% better than RNN, and 2.27% better than CRNN. In Fig. 6(c), for dataset 3 at a learning percentage of 85%, it is 28.57% better than CNN, 25% better than RNN, and 5.88% better than CRNN. In Fig. 6(d), for dataset 4 at a learning percentage of 85%, it is 28.57% better than CNN, 32.35% better than RNN, and 2.27% better than CRNN. Thus, email spam is detected more accurately by the proposed FLI-DA-CRNN.

Fig. 6.

Accuracy of email spam detection using deep learning models by varying learning percentages for (a) Dataset 1, (b) Dataset 2, (c) Dataset 3, and (d) Dataset 4.

7.6. Accuracy of email spam detection using various machine learning models

Fig. 7 shows the accuracy of email spam detection using the proposed and conventional machine learning models at various learning percentages for the four datasets. The proposed model gives higher accuracy for all the datasets at the various learning percentages. In Fig. 7(a), for dataset 1 at a learning percentage of 75%, the accuracy of the proposed FLI-DA-CRNN is 6.02% better than DT, 7.32% better than KNN, 8.64% better than SVM, and 4.76% better than NN. In Fig. 7(b), for dataset 2 at a learning percentage of 85%, it is 9.76% better than DT, 8.43% better than KNN, 9.76% better than SVM, and 12.5% better than NN. In Fig. 7(c), for dataset 3 at a learning percentage of 75%, it is 7.14% better than DT, 9.76% better than KNN, 8.43% better than SVM, and 9.76% better than NN. In Fig. 7(d), for dataset 4 at a learning percentage of 75%, it is 3.61% better than DT, 7.5% better than KNN, 6.17% better than SVM, and 4.88% better than NN. These results demonstrate that the proposed FLI-DA-CRNN provides superior email spam detection compared with various machine learning models.

Fig. 7.

Accuracy of email spam detection using proposed and conventional machine learning models by varying learning percentages for (a) Dataset 1, (b) Dataset 2, (c) Dataset 3, and (d) Dataset 4.

7.7. Overall performance analysis

The overall performance analysis of email spam detection using the proposed and conventional optimization models, deep learning models, and machine learning models is listed in Table 5, Table 6, and Table 7. The analysis demonstrates that the Type I measures increase and the Type II measures decrease with the proposed method, which denotes the superiority of the presented technique. From Table 5, for dataset 1, the accuracy of the proposed FLI-DA-CRNN is 6.06% better than PSO-CRNN, 4.70% better than GWO-CRNN, 6.52% better than WOA-CRNN, and 3.81% better than DA-CRNN. From Table 6, for dataset 2, the accuracy of the proposed FLI-DA-CRNN is 17.38% better than CNN, 18.83% better than RNN, and 2.26% better than CRNN. In Table 7, for dataset 3, the accuracy of the proposed FLI-DA-CRNN is 14.93% better than DT, 12.24% better than KNN, 10.32% better than SVM, and 12.24% better than NN. Similarly, for dataset 4, it is 8.54% better than DT, 11.48% better than KNN, 7.37% better than SVM, and 7.87% better than NN. The outcomes indicate that the proposed FLI-DA-CRNN performs better spam detection than the traditional optimization models, deep learning models, and machine learning models.
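The percentages quoted for dataset 1 are consistent with a gain measured relative to each baseline accuracy, that is, 100 × (proposed − baseline) / baseline. The short check below recomputes them from the accuracies reported in Table 5; the function name `relative_improvement` is hypothetical.

```python
def relative_improvement(proposed, baseline):
    """Percentage by which `proposed` exceeds `baseline`, relative to the baseline."""
    return 100.0 * (proposed - baseline) / baseline

# Dataset 1 accuracies as reported in Table 5.
baselines = {
    "PSO-CRNN": 0.871698,
    "GWO-CRNN": 0.883019,
    "WOA-CRNN": 0.867925,
    "DA-CRNN": 0.890566,
}
proposed_accuracy = 0.924528  # FLI-DA-CRNN

gains = {name: relative_improvement(proposed_accuracy, acc)
         for name, acc in baselines.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.2f}%")
```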

Table 5
Overall analysis on email spam detection using proposed and conventional optimization models for four datasets

Dataset 1
Approaches	FPR	MCC	FDR	Sensitivity	Accuracy	F1 Score	Precision	Specificity	FNR	NPV
PSO-CRNN [42]	0.071429	0.626216	0.012438	0.863043	0.871698	0.921114	0.987562	0.928571	0.136957	0.928571
GWO-CRNN [28]	0.114286	0.629133	0.019324	0.882609	0.883019	0.929062	0.980676	0.885714	0.117391	0.885714
WOA-CRNN [27]	0.157143	0.581566	0.026699	0.871739	0.867925	0.919725	0.973301	0.842857	0.128261	0.842857
DA-CRNN [20]	0.114286	0.64441	0.019139	0.891304	0.890566	0.933941	0.980861	0.885714	0.108696	0.885714
FLI-DA-CRNN	0.042857	0.747414	0.007042	0.919565	0.924528	0.954853	0.992958	0.957143	0.080435	0.957143
Dataset 2
Approaches	F1 Score	Precision	MCC	FNR	Specificity	Sensitivity	NPV	FPR	Accuracy	FDR
PSO-CRNN [42]	0.757506	0.700855	0.680474	0.175879	0.893939	0.824121	0.893939	0.106061	0.877765	0.299145
GWO-CRNN [28]	0.772627	0.688976	0.702338	0.120603	0.880303	0.879397	0.880303	0.119697	0.880093	0.311024
WOA-CRNN [27]	0.785872	0.700787	0.720477	0.105528	0.884848	0.894472	0.884848	0.115152	0.887078	0.299213
DA-CRNN [20]	0.781857	0.685606	0.716701	0.090452	0.874242	0.909548	0.874242	0.125758	0.882421	0.314394
FLI-DA-CRNN	0.8	0.723577	0.738618	0.105528	0.89697	0.894472	0.89697	0.10303	0.896391	0.276423
Dataset 3
Approaches	F1 Score	Precision	FPR	MCC	Specificity	NPV	FDR	Sensitivity	FNR	Accuracy
PSO-CRNN [42]	0.684211	0.565217	0.112045	0.63871	0.887955	0.887955	0.434783	0.866667	0.133333	0.884892
GWO-CRNN [28]	0.723684	0.597826	0.103641	0.688142	0.896359	0.896359	0.402174	0.916667	0.083333	0.899281
WOA-CRNN [27]	0.694444	0.595238	0.095238	0.645893	0.904762	0.904762	0.404762	0.833333	0.166667	0.894484
DA-CRNN [20]	0.675159	0.546392	0.123249	0.631409	0.876751	0.876751	0.453608	0.883333	0.116667	0.877698
FLI-DA-CRNN	0.783784	0.659091	0.084034	0.75919	0.915966	0.915966	0.340909	0.966667	0.033333	0.923261
Dataset 4
Approaches	Sensitivity	FNR	NPV	FDR	Precision	FPR	Accuracy	MCC	Specificity	F1 Score
PSO-CRNN [42]	0.859223	0.140777	0.880492	0.277551	0.722449	0.119508	0.874839	0.702803	0.880492	0.784922
GWO-CRNN [28]	0.864078	0.135922	0.891037	0.258333	0.741667	0.108963	0.883871	0.721473	0.891037	0.798206
WOA-CRNN [27]	0.907767	0.092233	0.86116	0.296992	0.703008	0.13884	0.873548	0.715443	0.86116	0.792373
DA-CRNN [20]	0.883495	0.116505	0.88225	0.269076	0.730924	0.11775	0.882581	0.724405	0.88225	0.8
FLI-DA-CRNN	0.898058	0.101942	0.903339	0.229167	0.770833	0.096661	0.901935	0.765694	0.903339	0.829596

Table 6

Overall analysis on email spam detection using deep learning models for four datasets

Dataset 1
Approaches	FNR	MCC	Sensitivity	Precision	NPV	FDR	Accuracy	FPR	Specificity	F1 Score
CNN [44]	0.234783	0.394398	0.765217	0.956522	0.771429	0.043478	0.766038	0.228571	0.771429	0.850242
RNN [25]	0.232609	0.406494	0.767391	0.959239	0.785714	0.040761	0.769811	0.214286	0.785714	0.852657
CRNN [25,44]	0.145652	0.544647	0.854348	0.97037	0.828571	0.02963	0.850943	0.171429	0.828571	0.908671
FLI-DA-CRNN	0.080435	0.747414	0.919565	0.992958	0.957143	0.007042	0.924528	0.042857	0.957143	0.954853
Dataset 2
Approaches	FDR	MCC	Accuracy	FPR	F1 Score	NPV	Sensitivity	Specificity	FNR	Precision
CNN [44]	0.506667	0.454361	0.763679	0.230303	0.593186	0.769697	0.743719	0.769697	0.256281	0.493333
RNN [25]	0.517964	0.473339	0.754366	0.262121	0.604128	0.737879	0.809045	0.737879	0.190955	0.482036
CRNN [25,44]	0.319066	0.695776	0.876601	0.124242	0.767544	0.875758	0.879397	0.875758	0.120603	0.680934
FLI-DA-CRNN	0.276423	0.738618	0.896391	0.10303	0.8	0.89697	0.894472	0.89697	0.105528	0.723577
Dataset 3
Approaches	MCC	FDR	NPV	Specificity	Sensitivity	FNR	F1 Score	Accuracy	Precision	FPR
CNN [44]	0.368654	0.676056	0.731092	0.731092	0.766667	0.233333	0.455446	0.736211	0.323944	0.268908
RNN [25]	0.379163	0.661654	0.753501	0.753501	0.75	0.25	0.466321	0.752998	0.338346	0.246499
CRNN [25,44]	0.586903	0.460674	0.885154	0.885154	0.8	0.2	0.644295	0.872902	0.539326	0.114846
FLI-DA-CRNN	0.75919	0.340909	0.915966	0.915966	0.966667	0.033333	0.783784	0.923261	0.659091	0.084034
Dataset 4
Approaches	NPV	F1 Score	Precision	Accuracy	FPR	MCC	Sensitivity	FDR	FNR	Specificity
CNN [44]	0.720562	0.604207	0.498423	0.732903	0.279438	0.438075	0.76699	0.501577	0.23301	0.720562
RNN [25]	0.720562	0.588008	0.488746	0.725161	0.279438	0.413162	0.737864	0.511254	0.262136	0.720562
CRNN [25,44]	0.880492	0.81128	0.733333	0.887742	0.119508	0.741117	0.907767	0.266667	0.092233	0.880492
FLI-DA-CRNN	0.903339	0.829596	0.770833	0.901935	0.096661	0.765694	0.898058	0.229167	0.101942	0.903339

Table 7

Overall analysis on email spam detection using proposed and conventional machine learning models for four datasets

Dataset 1
Approaches	Precision	Accuracy	FPR	MCC	Sensitivity	FNR	F1 Score	NPV	FDR	Specificity
DT [41]	0.971576	0.820755	0.157143	0.50363	0.817391	0.182609	0.887839	0.842857	0.028424	0.842857
KNN [18]	0.973958	0.818868	0.142857	0.507904	0.813043	0.186957	0.886256	0.857143	0.026042	0.857143
SVM [6]	0.96134	0.807547	0.214286	0.456078	0.81087	0.18913	0.879717	0.785714	0.03866	0.785714
NN [5]	0.981481	0.818868	0.1	0.528915	0.806522	0.193478	0.885442	0.9	0.018519	0.9
FLI-DA-CRNN	0.992958	0.924528	0.042857	0.747414	0.919565	0.080435	0.954853	0.957143	0.007042	0.957143
Dataset 2
Approaches	Specificity	FPR	Accuracy	NPV	MCC	F1 Score	Precision	FNR	Sensitivity	FDR
DT [41]	0.819697	0.180303	0.82305	0.819697	0.585882	0.68595	0.582456	0.165829	0.834171	0.417544
KNN [18]	0.819697	0.180303	0.826542	0.819697	0.597824	0.694045	0.586806	0.150754	0.849246	0.413194
SVM [6]	0.824242	0.175758	0.831199	0.824242	0.607431	0.701031	0.594406	0.145729	0.854271	0.405594
NN [5]	0.822727	0.177273	0.814901	0.822727	0.553686	0.663848	0.572993	0.211055	0.788945	0.427007
FLI-DA-CRNN	0.89697	0.10303	0.896391	0.89697	0.738618	0.8	0.723577	0.105528	0.894472	0.276423
Dataset 3
Approaches	FNR	MCC	Precision	Sensitivity	NPV	Accuracy	FDR	Specificity	F1 Score	FPR
DT [41]	0.2	0.470559	0.40678	0.8	0.803922	0.803357	0.59322	0.803922	0.539326	0.196078
KNN [18]	0.15	0.52315	0.439655	0.85	0.817927	0.822542	0.560345	0.817927	0.579545	0.182073
SVM [6]	0.15	0.545338	0.463636	0.85	0.834734	0.83693	0.536364	0.834734	0.6	0.165266
NN [5]	0.15	0.52315	0.439655	0.85	0.817927	0.822542	0.560345	0.817927	0.579545	0.182073
FLI-DA-CRNN	0.033333	0.75919	0.659091	0.966667	0.915966	0.923261	0.340909	0.915966	0.783784	0.084034
Dataset 4
Approaches	FDR	MCC	Specificity	FPR	F1 Score	NPV	Sensitivity	FNR	Precision	Accuracy
DT [41]	0.359551	0.614828	0.831283	0.168717	0.723044	0.831283	0.830097	0.169903	0.640449	0.830968
KNN [18]	0.393382	0.567318	0.811951	0.188049	0.690377	0.811951	0.800971	0.199029	0.606618	0.809032
SVM [6]	0.353571	0.648011	0.826011	0.173989	0.744856	0.826011	0.878641	0.121359	0.646429	0.84
NN [5]	0.34749	0.620176	0.841828	0.158172	0.726882	0.841828	0.820388	0.179612	0.65251	0.836129
FLI-DA-CRNN	0.229167	0.765694	0.903339	0.096661	0.829596	0.903339	0.898058	0.101942	0.770833	0.901935
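
As a reading aid for Tables 6 and 7, the following minimal sketch computes several of the tabulated metrics from confusion-matrix counts using their standard definitions. The function name `confusion_metrics` and the counts (58 true positives, 2 false negatives, 30 false positives, 327 true negatives) are assumptions for illustration: the counts are inferred from the rates reported for FLI-DA-CRNN on Dataset 3 and are not stated in the paper.

```python
import math


def confusion_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),           # true positive rate
        "specificity": tn / (tn + fp),           # true negative rate
        "precision": tp / (tp + fp),
        "fpr": fp / (fp + tn),                   # false positive rate
        "fnr": fn / (fn + tp),                   # false negative rate
        "fdr": fp / (fp + tp),                   # false discovery rate
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Hypothetical counts inferred from the Dataset 3 rates of FLI-DA-CRNN.
metrics = confusion_metrics(tp=58, fn=2, fp=30, tn=327)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

With these assumed counts, the computed values agree with the FLI-DA-CRNN row for Dataset 3 to the reported precision.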

7.8. Analysis of ensemble approaches

The overall analysis of the ensemble approaches for the four datasets is shown in Table 8. Three additional classifiers are introduced for this comparison: Adaboost, Voting_classifier, and LSTM_CNN. The Adaboost classifier integrates many classifiers to enhance the overall accuracy. Voting_classifier combines the outputs of several machine learning classifiers for classification and detection. LSTM_CNN is a hybrid model that has been used for detecting fake news.
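
The idea behind a voting classifier can be sketched in a few lines. The example below assumes hard (majority) voting over the labels produced by three base classifiers; the paper does not state which voting rule or base classifiers were used, so the function names and the sample predictions are purely illustrative.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most base classifiers (hard voting)."""
    return Counter(votes).most_common(1)[0][0]


def vote_all(per_classifier_predictions):
    """Combine predictions email by email.

    per_classifier_predictions holds one list of labels per base classifier;
    zip(*...) regroups them so that each tuple contains the votes for one email.
    """
    return [majority_vote(votes) for votes in zip(*per_classifier_predictions)]


# Hypothetical outputs of three base classifiers on four emails.
# An odd number of voters avoids ties for two-class labels.
base_outputs = [
    ["spam", "ham", "spam", "ham"],
    ["spam", "spam", "ham", "ham"],
    ["ham", "ham", "spam", "ham"],
]
combined = vote_all(base_outputs)
print(combined)
```

Each email receives the label that at least two of the three base classifiers agree on.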

Table 8
An overall analysis of the ensemble approach for four datasets

	Classifier
Metric	Adaboost	Voting_classifier	LSTM_CNN	Proposed FLI-DA-CRNN
Dataset 1
Accuracy	0.77736	0.77170	0.84057	0.92453
Sensitivity	0.75652	0.75870	0.89130	0.91957
Specificity	0.91429	0.85714	0.88571	0.95714
Precision	0.98305	0.97214	0.98086	0.99296
FPR	0.08571	0.14286	0.11429	0.04286
FNR	0.24348	0.24130	0.10870	0.08043
NPV	0.91429	0.85714	0.88571	0.95714
FDR	0.01695	0.02786	0.01914	0.00704
F1-Score	0.85504	0.85226	0.93394	0.95485
MCC	0.48225	0.44602	0.64441	0.74741
Dataset 2
Accuracy	0.79511	0.79627	0.83242	0.89639
Sensitivity	0.33668	0.25628	0.90955	0.89447
Specificity	0.93333	0.95909	0.87424	0.89697
Precision	0.60360	0.65385	0.68561	0.72358
FPR	0.06667	0.04091	0.12576	0.10303
FNR	0.66332	0.74372	0.09045	0.10553
NPV	0.93333	0.95909	0.87424	0.89697
FDR	0.39640	0.34615	0.31439	0.27642
F1-Score	0.43226	0.36823	0.78186	0.8
MCC	0.33961	0.31624	0.71670	0.73862
Dataset 3
Accuracy	0.85851	0.85851	0.82770	0.92326
Sensitivity	0.01667	0.01667	0.88333	0.96667
Specificity	1.0	1.0	0.87675	0.91597
Precision	1.0	1.0	0.54639	0.65909
FPR	0.0	0.0	0.12325	0.08403
FNR	0.98333	0.98333	0.11667	0.03333
NPV	1.0	1.0	0.87675	0.91597
FDR	0.0	0.0	0.45361	0.34091
F1-Score	0.03279	0.03279	0.67516	0.78378
MCC	0.11959	0.11959	0.63141	0.75919
Dataset 4
Accuracy	0.76774	0.77806	0.83258	0.90194
Sensitivity	0.32039	0.31068	0.88350	0.89806
Specificity	0.92970	0.94728	0.88225	0.90334
Precision	0.62264	0.68085	0.73092	0.77083
FPR	0.07030	0.05272	0.11775	0.09666
FNR	0.67961	0.68932	0.11650	0.10194
NPV	0.92970	0.94728	0.88225	0.90334
FDR	0.37736	0.31915	0.26908	0.22917
F1-Score	0.42308	0.42667	0.8	0.82960
MCC	0.32153	0.34906	0.724405	0.76570

7.9. Feature analysis and error analysis

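The error analysis of the proposed method against the conventional models is shown in Table 9. For every dataset, the proposed FLI-DA-CRNN yields the lowest error among the compared optimization algorithms and classifiers, which confirms that the proposed model outperforms the existing models for email spam detection.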

Table 9
Error analysis of proposed and existing works

Dataset 1
Error Analysis of Algorithms
PSO	GWO	WOA	DA	Proposed FLI-DA-CRNN
0.12830	0.11698	0.13206	0.10943	0.07547
Error Analysis of Classifiers: 1
CNN	RNN	CNN+RNN	Proposed FLI-DA-CRNN
0.23396	0.23019	0.14905	0.07547
Error Analysis of Classifiers: 2
DT	KNN	SVM	NN	Proposed FLI-DA-CRNN
0.17925	0.18113	0.19245	0.18113	0.07547
Dataset 2
Error Analysis of Algorithms
PSO	GWO	WOA	DA	Proposed FLI-DA-CRNN
0.12224	0.11991	0.11292	0.11758	0.10361
Error Analysis of Classifiers: 1
CNN	RNN	CNN+RNN	Proposed FLI-DA-CRNN
0.23632	0.24563	0.12340	0.10361
Error Analysis of Classifiers: 2
DT	KNN	SVM	NN	Proposed FLI-DA-CRNN
0.17695	0.17346	0.16880	0.18510	0.10361
Dataset 3
Error Analysis of Algorithms
PSO	GWO	WOA	DA	Proposed FLI-DA-CRNN
0.11511	0.10072	0.10552	0.12230	0.07674
Error Analysis of Classifiers: 1
CNN	RNN	CNN+RNN	Proposed FLI-DA-CRNN
0.26379	0.24700	0.12710	0.07674
Error Analysis of Classifiers: 2
DT	KNN	SVM	NN	Proposed FLI-DA-CRNN
0.19664	0.17746	0.16307	0.17746	0.07674
Dataset 4
Error Analysis of Algorithms
PSO	GWO	WOA	DA	Proposed FLI-DA-CRNN
0.12516	0.11613	0.12645	0.11742	0.09806
Error Analysis of Classifiers: 1
CNN	RNN	CNN+RNN	Proposed FLI-DA-CRNN
0.26710	0.27484	0.11226	0.09806
Error Analysis of Classifiers: 2
DT	KNN	SVM	NN	Proposed FLI-DA-CRNN
0.16903	0.19097	0.16000	0.16387	0.09806
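
The error values in Table 9 appear to coincide with one minus the corresponding accuracy in the earlier tables. This relation is inferred from the reported numbers and is not stated explicitly in the text, so the sketch below should be read as a consistency check under that assumption. It uses the Dataset 1 accuracies from Table 6; the helper name `error_rate` is illustrative.

```python
def error_rate(accuracy):
    """Misclassification rate, assumed here to be 1 - accuracy."""
    return 1.0 - accuracy


# Dataset 1 accuracies as reported in Table 6.
accuracy_dataset1 = {
    "CNN": 0.766038,
    "RNN": 0.769811,
    "CNN+RNN": 0.850943,
    "FLI-DA-CRNN": 0.924528,
}

errors_dataset1 = {name: error_rate(acc) for name, acc in accuracy_dataset1.items()}
for name, err in errors_dataset1.items():
    print(f"{name}: {err:.5f}")
```

The computed values agree with the "Error Analysis of Classifiers: 1" row for Dataset 1 to about four decimal places; small differences in the last digit are consistent with rounding or truncation of the tabulated figures.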

7.10. Analysis on execution time

The execution times for the algorithm analysis and the two classifier analyses are shown in Table 10, Table 11, and Table 12. In the algorithm analysis, the algorithmic times of PSO, GWO, WOA, DA, and the proposed FLI-DA were 450, 421, 398, 356, and 348 seconds, respectively. In classifier analysis 1, the classifier times of CNN and RNN were 450 and 421 seconds, the combined CNN+RNN took 398 seconds, and the proposed FLI-DA-CRNN took 348 seconds. In classifier analysis 2, the classifier times of DT, KNN, SVM, NN, and the proposed FLI-DA-CRNN were 192, 189, 186, 179, and 171 seconds, respectively.

Table 10
Execution time for algorithm analysis

Approaches	Algorithmic time (sec)
PSO	450
GWO	421
WOA	398
DA	356
Proposed FLI-DA	348

Table 11

Execution time for classifier analysis 1

Approaches	Classifier time (sec)
CNN	450
RNN	421
CNN+RNN	398
Proposed FLI-DA-CRNN	348

Table 12

Execution time for classifier analysis 2

Approaches	Classifier time (sec)
DT	192
KNN	189
SVM	186
NN	179
Proposed FLI-DA-CRNN	171

8. Conclusion

A novel spam detection model for improved cybersecurity has been developed in this paper. In the first step, benchmark email datasets containing both text and image data were collected. Next, feature extraction was performed using text features and visual features. For the text features, frequency-based measures of spam words, namely Term Frequency-Inverse Document Frequency (TF-IDF), were extracted. For the visual features, the color correlogram and Gray-Level Co-occurrence Matrix (GLCM) were determined. Optimal feature selection was then carried out to reduce the length of the extracted feature vector. Both the optimal feature selection and the optimization of the number of hidden neurons in the CNN and RNN were performed by the proposed Fitness Oriented Levy Improvement-based Dragonfly Algorithm (FLI-DA). In the final step, the optimized hybrid learning technique, composed of a Recurrent Neural Network (RNN) and a Convolutional Neural Network (CNN), classified the data into spam and ham. From the analysis on Dataset 3 (Table 7), the accuracy of the proposed FLI-DA-CRNN is 14.93% better than Decision Tree (DT), 12.24% better than K-Nearest Neighbour (KNN), 10.32% better than Support Vector Machine (SVM), and 12.24% better than Neural Network (NN). The outcomes show that the proposed FLI-DA-CRNN detects spam better than the other optimization models, deep learning models, and machine learning models. Hence, the experimental results demonstrate the superiority of the developed method in detecting email spam.
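
The percentage gains quoted above can be reproduced as relative accuracy improvements from the Dataset 3 rows of Table 7. The sketch below assumes the gain is computed as (proposed − baseline) / baseline × 100, a formula the text does not state explicitly; the function name `relative_gain` is illustrative.

```python
def relative_gain(proposed, baseline):
    """Relative improvement of `proposed` over `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0


# Dataset 3 accuracies as reported in Table 7.
proposed_accuracy = 0.923261
baseline_accuracy = {
    "DT": 0.803357,
    "KNN": 0.822542,
    "SVM": 0.83693,
    "NN": 0.822542,
}

gains = {
    name: round(relative_gain(proposed_accuracy, acc), 2)
    for name, acc in baseline_accuracy.items()
}
for name, gain in gains.items():
    print(f"FLI-DA-CRNN vs {name}: {gain:.2f}% better")
```

Under this assumption, the four gains match the figures quoted for DT, KNN, SVM, and NN.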

8.1. Future work

Even though the visual and textual features have many advantages, some results are still affected by misclassifications of the proposed classification model. These misclassifications will be addressed in the future by developing an innovative, higher-performing deep learning model. More image datasets will also be considered for experimentation.

References

1. Al-Rawashdeh, Mamat and N.H.B. Abd Rahim, Hybrid water cycle optimization algorithm with simulated annealing for spam E-mail detection, IEEE Access 7 (2019), 143721–143734. doi:10.1109/ACCESS.2019.2944089.

2. A.M. Al-Zoubi, Faris, M.A. Hassonah et al., Evolving support vector machines using whale optimization algorithm for spam profiles detection on online social networks in different lingual contexts, Knowledge-Based Systems 153 (2018), 91–104. doi:10.1016/j.knosys.2018.04.025.

3. Alsmadi and Alhami, Clustering and classification of email contents, J. King Saud Univ.-Comput. Inf. Sci. 27(1) (2015), 46–57.

4. W.A. Awad and S.M. ELseuofi, Machine Learning Methods for Spam E-Mail Classification.

5. Beck, Duong, Lebbah, Azzag and Cerin, A distributed approximate nearest neighbors algorithm for efficient large scale mean shift clustering, Journal of Parallel and Distributed Computing 134 (2019), 128–139. doi:10.1016/j.jpdc.2019.07.015.

6. G.M. Borkar, L.H. Patil, Dalgade and Hutke, A novel clustering approach and adaptive SVM classifier for intrusion detection in WSN: A data mining concept, Sustainable Computing: Informatics and Systems 23 (2019), 120–135.

7. Chikh and Chikhi, Clustered negative selection algorithm and fruit fly optimization for email spam detection, Journal of Ambient Intelligence and Humanized Computing 10 (2019), 143–152. doi:10.1007/s12652-017-0621-2.

8. Dataset 1 was available at: https://www.kaggle.com/mandygu/lingspam-dataset.

9. Dataset 2 was available at: https://www.kaggle.com/venky73/spam-mails-dataset.

10. Dataset 3 was available at: https://www.kaggle.com/balakishan77/spam-or-ham-email-classification.

11. Dataset 4 was available at: http://www.cs.jhu.edu/~smdredze/datasets/image_spam/.

12. Douzi, F.A. AlShahwan, Lemoudden and El Ouahidi, Hybrid email spam detection model using artificial intelligence, International Journal of Machine Learning and Computing 10(2) (2020), 316–322. doi:10.18178/ijmlc.2020.10.2.937.

13. Ezpeleta, Zurutuza and J.M.G. Hidalgo, A study of the personalization of spam content using Facebook public information, Logic Journal of the IGPL 25(1) (2017), 30–41. doi:10.1093/jigpal/jzw040.

14. Faris, A.M. Al-Zoubi, A.A. Heidari, Aljarah, Mafarja, M.A. Hassonah and Fujita, An intelligent system for spam detection and identification of the most relevant features based on evolutionary Random Weight Networks, Information Fusion 48 (2019), 67–83. doi:10.1016/j.inffus.2018.08.002.

15. Günal, Ergin, M.B. Gülmezoğlu and Ö.N. Gerek, On feature extraction for spam E-mail detection, in: International Workshop on Multimedia Content Representation, Classification and Security, 2006, pp. 635–642. doi:10.1007/11848035_84.

16. T.S. Guzella and W.M. Caminhas, A review of machine learning approaches to spam filtering, Expert Systems with Applications 36(7) (2009), 10206–10222. doi:10.1016/j.eswa.2009.02.037.

17. HamdanMohammad and AbuZitar, Application of genetic optimized artificial immune system and neural networks in spam detection, Applied Soft Computing 11(4) (2011), 3827–3845. doi:10.1016/j.asoc.2011.02.021.

18. Huang, Lin, Huang and Xing, A novel approach for precipitation forecast via improved K-nearest neighbor algorithm, Advanced Engineering Informatics 33 (2017), 89–95. doi:10.1016/j.aei.2017.05.003.

19. Idris, Selamat, N.T. Nguyen, Omatu, Krejcar, Kuca and Penhaker, A combined negative selection algorithm–particle swarm optimization for an email spam detection system, Engineering Applications of Artificial Intelligence 39 (2015), 33–44. doi:10.1016/j.engappai.2014.11.001.

20. Jafari and M.H. Bayati Chaleshtari, Using dragonfly algorithm for optimization of orthotropic infinite plates with a quasi-triangular cut-out, European Journal of Mechanics A/Solids 66 (2017), 1–14. doi:10.1016/j.euromechsol.2017.06.003.

21. Jain, Sharma and Agarwal, Spam detection in social media using convolutional and long short term memory neural network, Annals of Mathematics and Artificial Intelligence 85 (2019), 21–44. doi:10.1007/s10472-018-9612-z.

22. W.Z. Khan, M.K. Khan, F.T. Bin Muhaya, M.Y. Aalsalem and Chao, A comprehensive study of email spam botnet detection, IEEE Communications Surveys & Tutorials 17(4) (2015), 2271–2295. doi:10.1109/COMST.2015.2459015.

23. Krishnamurthy, Internet spam threats and email exploitation – A scuffle with inbox attack.

24. Kumaresan, Saravanakumar and Balamurugan, Visual and textual features based email spam classification using S-Cuckoo search and hybrid kernel support vector machine, Cluster Computing 22 (2019), 33–46. doi:10.1007/s10586-017-1615-8.

25. Li and Liu, A hybrid convolutional and recurrent neural network for hippocampus analysis in Alzheimer’s disease, Journal of Neuroscience Methods 323 (2019), 108–118. doi:10.1016/j.jneumeth.2019.05.006.

26. Malipatil, Maheshwari and M.B. Chandra, Area optimization of CMOS full adder design using 3T XOR, in: 2020 International Conference on Wireless Communications Signal Processing and Networking (WiSPNET), IEEE, 2020, pp. 192–194. doi:10.1109/WiSPNET48689.2020.9198627.

27. Mirjalili and Lewis, The whale optimization algorithm, Advances in Engineering Software 95 (2016), 51–67. doi:10.1016/j.advengsoft.2016.01.008.

28. Mirjalili, S.M. Mirjalili and Lewis, Grey wolf optimizer, Advances in Engineering Software 69 (2014), 46–61. doi:10.1016/j.advengsoft.2013.12.007.

29. Mujtaba, Shuib, R.G. Raj, Majeed and M.A. Al-Garadi, Email classification research trends: Review and open issues, IEEE Access 5 (2017), 9044–9064. doi:10.1109/ACCESS.2017.2702187.

30. A.A. Naem, N.I. Ghali and A.A. Saleh, Antlion optimization and boosting classifier for spam email detection, Future Computing and Informatics Journal 3(2) (2018), 436–442. doi:10.1016/j.fcij.2018.11.006.

31. N.K. Nagwani and Sharaff, SMS spam filtering and thread identification using bi-level text classification and clustering techniques, J. Inf. Sci. 43(1) (2017), 75–87. doi:10.1177/0165551515616310.

32. Noorizadeh, Shakerpour, Meskin, Unal and Khorasani, A cyber-security methodology for a cyber-physical industrial control system testbed, IEEE Access 9 (2021), 16239–16253. doi:10.1109/ACCESS.2021.3053135.

33. S.O. Olatunji, Improved email spam detection model based on support vector machines, Neural Computing and Applications 31 (2019), 691–699. doi:10.1007/s00521-017-3100-y.

34. Patidar, Singh and Singh, A novel technique of email classification for spam detection, International Journal of Applied Information Systems 5(10) (2013), 15–19. doi:10.5120/ijais13-450976.

35. Peng, Li, Zou and Wu, Behavioral malware detection in delay tolerant networks, IEEE Transactions on Parallel and Distributed Systems 25(1) (2014), 53–63. doi:10.1109/TPDS.2013.27.

36. Priyanka and Kumar, Feature extraction and selection of kidney ultrasound images using GLCM and PCA, International Conference on Computational Intelligence and Data Science (ICCIDS 2019) 167 (2020), 1722–1731.

37. Renuka and Hamsapriya, Email classification for spam detection using word stemming, Int J Comput Appl 5(5) (2010), 45–47.

38. Saha, DasGupta and S.K. Das, Spam mail detection using data mining: A comparative analysis, in: Smart Intelligent Computing and Applications, 2018, pp. 571–580.

39. Shen and Li, Leveraging social networks for effective spam filtering, IEEE Transactions on Computers 63(11) (2014), 2743–2759. doi:10.1109/TC.2013.152.

40. Shuaib, S.M. Abdulhamid, O.S. Adebayo, Osho, Idris, J.K. Alhassan and Rana, Whale optimization algorithm-based email spam feature selection method using rotation forest algorithm for classification, SN Applied Sciences 1 (2019), 390. doi:10.1007/s42452-019-0394-7.

41. Tsang, Kao, K.Y. Yip, W.-S. Ho and S.D. Lee, Decision trees for uncertain data, IEEE Transactions on Knowledge and Data Engineering 23(1) (2011), 64–78. doi:10.1109/TKDE.2009.175.

42. Wang, Tan and Liu, Particle swarm optimization algorithm: An overview, Soft Computing 22 (2018), 387–408. doi:10.1007/s00500-016-2474-6.

43. Yang, Research and realization of Internet public opinion analysis based on improved TF-IDF algorithm, in: 16th International Symposium on Distributed Computing and Applications to Business, Engineering and Science, 2017.

44. Zhao, Lu, Chen, Liu and Wu, Convolutional neural networks for time series classification, Journal of Systems Engineering and Electronics 28(1) (2015), 162–169. doi:10.21629/JSEE.2017.01.18.
