Cloud e-mail security: An accurate e-mail spam classification based on enhanced binary differential evolution (BDE) algorithm

Abstract

Advances in technology have brought new challenges and opportunities related to the dimensionality of feature spaces. A high-dimensional feature space is one of the most critical issues in e-mail classification because it degrades accuracy. Finding the subset of features that most influences the performance of e-mail spam classification has therefore become an important challenge. To address this problem, this paper proposes an intelligent approach, the Binary Differential Evolution Support Vector Machine (BDE-SVM). The proposed approach enhances the Binary Differential Evolution (BDE) algorithm by using the correlation coefficient as the fitness function to select a significant feature subset, which is then evaluated by an SVM classifier. To the best of our knowledge, the correlation coefficient has not previously been used as the fitness function in the differential evolution algorithm. The selected feature subset is used to identify the features that contribute most to the reliability of e-mail spam classification, and the enhanced BDE achieves high accuracy. The tests were conducted on the “Spambase” and “SpamAssassin” benchmark datasets to assess the feasibility of the proposed solution. Classification with the full feature set reached an accuracy of 93.55 percent, compared with 93.99 percent for the proposed BDE-SVM approach. The empirical findings also show that our method effectively reduces the number of features required while enhancing the reliability of e-mail spam classification.

Keywords

Feature selection; e-mail; e-mail classification; differential evolution (DE); support vector machine (SVM)

1 Introduction

E-mail is widely used as a platform for sharing information, and most of its problems arise from unwanted and unsolicited bulk e-mail messages known as spam e-mails [1, 2]. As the scale of the problem continues to increase with the rapid growth of e-mail users worldwide, meaningful work on identifying e-mail spam is needed to improve classification reliability [3]. E-mail spam classification is a supervised learning problem, and classification is critical to solving the spam problem. The high dimension of the feature space is a significant issue that can affect e-mail spam classification [4, 5]. Reducing the size of the data by removing irrelevant features reduces the size of the hypothesis space, allowing algorithms to run better and more efficiently [6]. Feature selection (FS) techniques are therefore essential for reducing the size of the feature space: a subset of features is selected from the full original set [7], and unnecessary and irrelevant features are eliminated [8]. Typically, the performance of a feature selection algorithm is assessed by comparing the performance of the classification algorithm before and after the feature subset is selected. A search algorithm, such as binary differential evolution (BDE), is also required to explore the feature space. In this paper, the suggested method (BDE-SVM), which uses an enhanced BDE with the correlation coefficient as the fitness function for feature selection and an SVM as the classifier, is tested. The aim is to improve the classification of e-mail spam by reducing the size of the feature space. The proposed BDE-SVM approach is designed to reduce the high dimension of the feature space and to increase the accuracy of e-mail spam classification based on the SVM algorithm. 
One contribution of this work is that, to the best of our knowledge, the correlation coefficient has not previously been used as a fitness function in the DE algorithm; here the combination is used as a feature selection approach. In addition, a comparative study is carried out between our proposed method and common approaches such as the support vector machine (SVM), particle swarm optimization (PSO), and the genetic algorithm (GA). The experimental results show that using BDE with the correlation coefficient as the fitness function for feature selection and SVM as the classifier yields higher classification accuracy than using SVM alone as the classifier. The rest of this paper is organized as follows: Section 2 discusses the related work; Section 3 presents the proposed improved solution and its structure; Section 4 introduces the proposed Differential Evolution (DE) with a correlation coefficient; Section 5 covers the implementation, the experimental results, and an in-depth comparison of our results with other methods; and the conclusions and recommendations close the paper.

2 Related work

This section reviews previous studies related to supervised machine learning techniques for extracting features from e-mail. The internet is an essential part of today’s life, and we spend much of our time on it. E-mail is a well-established technology used worldwide in many fields of education and industry for enterprise and private communication through the internet [10], owing to its low transmission cost, quick message delivery, and support for efficient communication [13, 14]. Various security applications, such as e-mail anti-spam, may be offered as cloud services. They can protect either the virtual infrastructure or the customer’s physical infrastructure and are designed as transparent services using the overlay network or as well-known endpoint services. Many intelligent solutions have already been created to reduce the misclassification rate of e-mail classifiers [9]. E-mail features, along with other features, are used in numerous research studies for spam e-mail detection [11, 12]. For the classification of spam e-mail, a number of machine learning-based methods have been proposed, such as content-based supervised learning, rule-based learning, semi-supervised learning, and unsupervised learning [11]. Researchers have examined different methods and technologies for improving the accuracy of spam classification and filtering systems [15, 16]. Different methods make it possible to automatically detect or classify and remove many of these spam e-mails, and one of the best-known techniques for binary classification is the SVM algorithm [17, 18]. E-mail spam classification is a supervised learning problem, and classification is an essential method of eliminating spam e-mails [19]. Recent research shows that e-mail classification is usually based on statistical theory and machine learning (ML) algorithms that distinguish between non-spam and spam e-mail [20, 21]. 
Classification problems have been extensively studied in machine learning (ML), data mining, and information retrieval across a variety of domains, such as e-mail spam classification [22, 23]. ML is considered a branch of artificial intelligence and is concerned with improving the techniques and methods that allow a computer to learn [24]. ML methods can extract knowledge from a given group of e-mails [9, 25]. The goal of ML is to improve the effectiveness of the e-mail classifier through experience, making better decisions and solving problems intelligently through the use of illustrative data [9, 26]. Addressing the high dimension of the feature space is therefore one of the most important tasks in identifying e-mail spam. Despite the many methods in the related work, feature selection is still an emerging area of research [27], and many researchers are looking for new methods to select feature subsets, enhance classification accuracy, and decrease execution time [1]. Furthermore, e-mail spam detection is becoming very important when dealing with high-dimensional data [28], and the main problem with the classification of e-mail spam is the high dimension of the feature space [29]. One of the fundamental reasons for selecting features is to solve the high-dimensional feature space problem (the curse of dimensionality). Another motivation is to identify unimportant or redundant features and to select an optimum subset of features that minimizes the predictive error of the classifiers [30]. To address the problem of e-mail spam classification, several researchers have worked to improve classification accuracy by reducing the number of features, eliminating irrelevant and redundant features, or selecting the appropriate features [1, 20]. 
Several search techniques have been applied to feature selection, but most of them suffer from convergence to local optima and/or high computational complexity. Therefore, a computationally cheap global search technique is required to create a good feature selection algorithm [31]. Differential Evolution (DE) is a population-based algorithm that can be seen as similar to the Genetic Algorithm (GA), since it uses operators such as crossover, mutation, and selection [32].

3 The proposed improved solution and its structure

Advanced solutions have had considerable success in solving complex real-world problems in recent times. An integrated system is valuable because each individual intelligent system has weaknesses, and the enhanced system is designed to compensate for these deficiencies. After the e-mail feature set is extracted, an optimization method based on DE that uses these features to identify the centroids for the classification of e-mail spam is proposed. For the e-mail spam problem, the objective function is formulated to maximize accuracy; that is, the number of messages correctly detected as spam (TP) and the number of messages correctly classified as not spam (TN) should both be maximized. In this paper, the Binary Differential Evolution (BDE) algorithm is proposed as a feature subset selection method that selects a subset of features while increasing the accuracy of e-mail spam classification based on the SVM learning algorithm. The BDE-SVM approach consists of two sub-approaches: the BDE approach, which uses the correlation coefficient as the fitness function for feature selection, and the SVM algorithm for e-mail spam classification. The term “Binary” refers to the chromosome configuration, which must be modulated into a binary-dimensional space. Each sub-approach in BDE-SVM acts as a separate approach and runs independently of the other. The BDE is trained to select the optimal (or near-optimal) feature subset. The outputs of the BDE are then passed directly to the second sub-approach, which uses the SVM learning algorithm as the e-mail spam classifier. The BDE-SVM methodology was designed to test the outcome of a trained method. Because of the large number of samples in the datasets, each dataset was split into 70 percent for training and 30 percent for testing. 
Figure 1 shows the general phase of our suggested approach.

Fig. 1

Framework of our approach.
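
As an illustration of the evaluation set-up described above, the following minimal sketch shows a 70/30 split and the accuracy objective built from TP and TN. The function names `split_70_30` and `accuracy` are hypothetical and are not taken from the paper; the shuffling step and the fixed seed are assumptions made only so the example is reproducible.

```python
import random


def split_70_30(samples, seed=0):
    """Shuffle a copy of `samples` and split it into 70% training / 30% testing.

    Assumption: the split is random; the paper does not say how it was drawn.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def accuracy(tp, tn, fp, fn):
    """Fraction of messages classified correctly: (TP + TN) / all messages."""
    return (tp + tn) / (tp + tn + fp + fn)


train, test = split_70_30(range(10))
print(len(train), len(test))
```

Maximizing `accuracy` is equivalent to maximizing TP + TN for a fixed test set, which is the objective stated in the text.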

4 Proposed differential evolution (DE) with a correlation coefficient

Evolutionary computation (EC) is a family of optimization methodologies based on the processes of evolution and the behavior of living organisms. In the literature, evolutionary algorithms (EAs) are often treated in the same way as EC, and both terms are used for this class of intelligent algorithms. An evolutionary algorithm stores ample data on functionality, storage space, and population information during the iterative search process. ML is a data-driven learner that aims to achieve higher prediction accuracy and can be used for knowledge extraction [31]. ML techniques can be combined with EC algorithms in a variety of ways. They can help EC algorithms search more effectively and efficiently, and they are employed to analyze and evaluate data in order to improve the search efficiency of EAs compared with the classical versions. In the literature, ML techniques have also played an important role in improving population initialization, in enhancing fitness evaluation and selection, and in population reproduction and variation. The reader can refer to the recent survey in [33] for more information about using ML in EC. In this study, the SVM is used as the classifier, while the BDE technique is used to pick the optimum feature subset because of the high performance it achieves. Differential Evolution (DE) is one of the EAs that can be used to select the optimum feature subset and can thereby improve classification accuracy. DE is used as the search technique because of its rapid convergence rate compared with other approaches. The control parameters of DE are the mutation scaling factor F ∈ [0, 2] and the crossover constant CR ∈ [0, 1], both of which are chosen by the practitioner, along with the population size NP, which must be at least 4. 
The choice of the DE control parameters F, CR, and NP can have a major impact on optimization performance. A large value of F leads to higher diversity in the generated population, while a lower value causes faster convergence [4]. The fitness function, which evaluates the performance of each chromosome, must be designed before the search for the optimal value begins. Because exhaustively searching the space of feature subsets is impractical, heuristic methods must be used to search it in a reasonable time. In this article, we use the correlation coefficient as a new fitness function to obtain the best chromosome, which helps to choose a feature subset. The values F = 0.5, CR = 0.9, and population size NP = 100 are used in this article. In addition, different numbers of iterations and runs are used to produce different results. The parameter values are detailed in Table 1 [34, 35]. Individuals are assessed on a generation-by-generation basis to determine their performance, and the best individual is selected to track evolutionary progress. The DE process steps are as follows:

Table 1
IDE control parameter values

Parameter name	Parameter value
CR	0.9
F	0.5
NP	100
No. of iterations	1000
No. of runs	20

4.1 Initialization

The population of generation G (denoted by X_i,G) comprises a collection of d-dimensional parameter vectors, where each vector in the population corresponds to a potential solution to the problem (a target vector). Initially, all individuals are generated randomly from a uniform probability distribution. Suppose we want to optimize a function and choose a population size (number of vectors in the population) NP. The parameter vectors have the form:

X_i,G where i = 1, 2, . . . NP, and G is the generation number.

First, identify an appropriate population of target vectors. Each target vector includes different design parameters. Lower and upper limits are defined for each parameter $(X_{i}^{l} \leq X_{i, G} \leq X_{i}^{u})$. The initial parameter values are then drawn uniformly at random from the intervals $(X_{i}^{l}, X_{i}^{u})$, independently for each parameter of each target vector.
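
The initialization step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `init_population` is hypothetical, and a single pair of bounds shared by all parameters is assumed for brevity.

```python
import random


def init_population(np_size, dim, lower, upper, seed=None):
    """Create NP real-valued vectors of length `dim`.

    Each parameter is drawn uniformly at random between `lower` and `upper`
    (assumption: the same bounds apply to every parameter).
    """
    rng = random.Random(seed)
    return [[rng.uniform(lower, upper) for _ in range(dim)]
            for _ in range(np_size)]


# NP = 100 as in Table 1; 57 is used here only as an example dimension.
population = init_population(100, 57, 0.0, 1.0, seed=1)
print(len(population), len(population[0]))
```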

4.2 Mutation

There are five different strategies in DE that can be used for mutation. We adopt the DE/rand/1/bin scheme, which is widely used in the DE literature and generally gives better diversity [36]. During the mutation step, the DE algorithm generates new vectors by adding the weighted difference between two vectors to a third vector. For each target vector X_i,G where i = 1, 2, . . . NP, the mutant vector of generation G + 1 (denoted by V_i,G+1) is generated as follows.

$V_{i, G + 1} = X_{r 1, G} + F (X_{r 2, G} - X_{r 3, G})$

where r₁, r₂, r₃ ∈ {1, 2, . . . , NP} are mutually distinct random indices, all different from i, and F ∈ [0, 2] is a scale factor that controls the amplification of the differential variation X_r2,G - X_r3,G.

Much research has focused on modifying the mutation equation to obtain the best member of the population, which may lead to faster convergence and improved performance. The main aim of the mutation operator is to introduce diversity into the population and thereby extend the effective area of the search space that the algorithm considers.
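
A minimal sketch of the DE/rand/1 mutation described above is given below. The function name `mutate_rand_1` and its arguments are hypothetical; the default F = 0.5 follows Table 1.

```python
import random


def mutate_rand_1(population, i, f=0.5, rng=random):
    """Return the mutant V_i = X_r1 + F * (X_r2 - X_r3).

    r1, r2, r3 are mutually distinct indices, all different from i,
    which is why the population must hold at least 4 vectors.
    """
    if len(population) < 4:
        raise ValueError("DE/rand/1 needs a population of at least 4")
    candidates = [k for k in range(len(population)) if k != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    x1, x2, x3 = population[r1], population[r2], population[r3]
    return [a + f * (b - c) for a, b, c in zip(x1, x2, x3)]


pop = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
mutant = mutate_rand_1(pop, 0, f=0.5, rng=random.Random(0))
print(len(mutant))
```

Note that the mutant is not clipped to the parameter bounds here; how out-of-range values are handled is not specified in the text.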

4.3 Crossover

DE uses a uniform crossover technique to increase the diversity of the perturbed parameter vectors. The trial vector U_i,G+1 is built from the elements of the mutant vector V_i,G+1 and the target vector X_i,G.

U_i,G+1 = (U_1i,G+1, U_2i,G+1, . . . , U_Di,G+1)

Where:

$U_{ji, G + 1} = \begin{cases} V_{ji, G + 1}, & \text{if } rand(j) \leq CR \text{ or } j = rn(i) \\ X_{ji, G}, & \text{if } rand(j) > CR \text{ and } j \neq rn(i) \end{cases}$

j = 1, 2, . . . , D

Here, rand(j) is the j^th evaluation of a uniform random number generator with outcome rand(j) ∈ [0, 1], and CR ∈ [0, 1] is the crossover probability, which controls the fraction of parameter values that are copied from the mutant vector. The randomly chosen index rn(i) ∈ {1, 2, . . . , D} ensures that U_i,G+1 gets at least one parameter from V_i,G+1. The main goal of the crossover is therefore to increase diversity in the population.
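
The binomial crossover rule can be sketched as below. The function name `binomial_crossover` is hypothetical; `forced` plays the role of rn(i), and the default CR = 0.9 follows Table 1.

```python
import random


def binomial_crossover(target, mutant, cr=0.9, rng=random):
    """Build the trial vector U_i from the target X_i and the mutant V_i.

    Component j comes from the mutant if rand(j) <= CR or j is the forced
    index; otherwise it is inherited from the target.
    """
    d = len(target)
    forced = rng.randrange(d)  # guarantees at least one mutant component
    return [mutant[j] if (rng.random() <= cr or j == forced) else target[j]
            for j in range(d)]


trial = binomial_crossover([0, 0, 0, 0, 0], [1, 1, 1, 1, 1],
                           cr=0.9, rng=random.Random(2))
print(trial)
```

With CR close to 1 the trial vector is almost a copy of the mutant; with CR close to 0 it differs from the target in little more than the forced position.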

4.4 Selection

In the selection operation, the trial vector at generation G + 1 (U_i,G+1) is compared with the target vector at generation G (X_i,G). For a maximization problem, if the fitness value of the trial vector is greater than or equal to that of the target vector, the trial vector replaces the target vector in the next generation (generation G + 1). Otherwise, the old value X_i,G is retained. X_i,G+1 is thus determined as follows:

$X_{i, G + 1} = \begin{cases} U_{i, G + 1}, & \text{if } f(U_{i, G + 1}) \geq f(X_{i, G}) \\ X_{i, G}, & \text{if } f(U_{i, G + 1}) < f(X_{i, G}) \end{cases}$

Where f (.) denotes the objective function of the given vector.
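
The greedy selection rule for a maximization problem amounts to a single comparison, as in the following sketch. The function name `select` is hypothetical, and `fitness` stands for any objective function f; the built-in `sum` is used below purely as a toy objective.

```python
def select(target, trial, fitness):
    """Keep the trial vector if it is at least as fit as the target."""
    return trial if fitness(trial) >= fitness(target) else target


# Toy objective: the sum of the components (illustration only).
print(select([0, 0], [1, 1], sum))
```

Ties go to the trial vector, matching the "greater than or equal" condition in the equation.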

Because of the large number of samples in the two datasets used for this analysis, each dataset was split into 70 percent for training and 30 percent for testing during the experiments. Since the appropriate size of the most reliable feature subset is unclear, the suggested method was applied to different feature set sizes, and it was run 20 times when searching for each subset size. The fitness function is an important measurement step in evolutionary computation (EC) techniques. The objective function is used to decide which individuals are fit enough to reach an optimum solution; that is, these techniques use it to determine which chromosome achieves the best solution and is therefore considered the most appropriate. Each chromosome has the chance to survive into the next generation of the population, and the correlation coefficient is used as the fitness function for each chromosome. Each chromosome (individual) thus has its own fitness, and a chromosome with a higher fitness value is more important for solving the problem. The contribution of this paper is the use of the correlation coefficient as the fitness function of the BDE algorithm to select the feature subset and to help the SVM classifier increase accuracy. We have modified the traditional DE algorithm from its classical form to enhance its convergence characteristics. In addition, we used a representation method for the search variables to evaluate the near-optimal number of features. The first step is to use the correlation coefficient (r) as the fitness function of BDE:

$r = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}} \sqrt{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}}$

where x_i are the feature values, y_i are the target (output) values, $\bar{x}$ is the feature mean, and $\bar{y}$ is the target mean. Many studies have used the correlation coefficient to calculate the significance of the features without considering the target (output) value. This paper uses the correlation coefficient to calculate the significance between the input (features) and the output (class label) and compares the resulting significance values with one another; the highest significance values indicate better fitness. Many researchers have used the DE algorithm with different fitness functions, but to our knowledge none has used the correlation coefficient as the fitness function. We therefore use the correlation coefficient as the objective function (fitness function) in BDE to improve its ability to select the optimal feature subset, which can enhance classification accuracy and decrease response time. In the BDE approach to feature selection, each chromosome is generated to control the activation and deactivation of its corresponding feature bits. The genes in the chromosome represent the required number of selected features. The individuals in this study were initialized by randomly assigning a real value between 0 and 1 to the variables [34]. To perform the feature selection process, this real code must be modulated into binary code. The reason is that a binary value can be used to mark features as selected or unselected, whereas a real value cannot directly control such activations. The BDE approach is configured to operate in binary mode: if gene = 1, the corresponding feature is active and is included in the final selection; otherwise, the corresponding feature is not considered and is excluded. 
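
To make the fitness computation concrete, the sketch below implements the correlation coefficient r between one feature column and the class labels. The text does not spell out how the per-feature correlations are combined into a single chromosome fitness, so `subset_fitness` is an assumption: it averages |r| over the active features. Both function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: r is undefined, treated as 0 here
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))


def subset_fitness(columns, labels, mask):
    """ASSUMED aggregation: mean |r| with the labels over features whose bit is 1."""
    selected = [col for col, bit in zip(columns, mask) if bit == 1]
    if not selected:
        return 0.0
    return sum(abs(pearson_r(col, labels)) for col in selected) / len(selected)


columns = [[1, 2, 3, 4], [1, 1, 1, 1]]  # two toy feature columns
labels = [0, 0, 1, 1]                   # toy class labels (1 = spam)
print(subset_fitness(columns, labels, [1, 0]))
```

Taking the absolute value treats strongly negative and strongly positive correlations as equally informative; whether the paper does the same is not stated.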
The core goal of this paper is to investigate the effect of modulation formulas on the extraction of optimally efficient features. This study employs a binary modulation formula presented by [37] as shown in the equation below:

${FS}_{i} = \begin{cases} 1, & \text{if } rand() \geq \exp(-|x_{i}|) \\ 0, & \text{if } rand() < \exp(-|x_{i}|) \end{cases}$

In the above equation, FS_i is the binary value corresponding to the real-valued gene x_i, rand() is a function that generates a real random number between 0 and 1, and exp(-|x_i|) is the exponential of the negated absolute value of the gene. If rand() is greater than or equal to exp(-|x_i|), then FS_i = 1; otherwise FS_i = 0.
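
The modulation rule translates directly into code. The sketch below is illustrative only; the function name `modulate` is hypothetical.

```python
import math
import random


def modulate(genes, rng=random):
    """Map real-valued genes to bits: FS_i = 1 if rand() >= exp(-|x_i|), else 0."""
    return [1 if rng.random() >= math.exp(-abs(x)) else 0 for x in genes]


bits = modulate([0.73, 0.92, 0.44], rng=random.Random(0))
print(len(bits))
```

Under this rule a gene is switched on with probability 1 - exp(-|x_i|), so genes with larger magnitude are more likely to activate their feature, and a gene of exactly 0 never does.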

After the value of gene x_i has been modulated into binary code, if FS_i = 1, the corresponding feature is selected as an optimal feature. Otherwise, if the bit is 0, the corresponding feature is not selected and is not considered an optimal feature. In the chromosome structure, the first bit refers to the first feature, the second bit to the second feature, and so on. Figure 2 explains the feature representation (chromosome structure).

Fig. 2

Chromosome structure-feature positions.

A chromosome is a series of genes whose values are binary. The number of possible solutions therefore cannot exceed 2ⁿ, where 2 refers to the binary states {0, 1} and n is the number of features. Within this bounded search space, the DE explores candidate solutions and assigns a fitness to each chromosome. Feature selection is performed by selecting the relevant features, whose bit is one, and eliminating the irrelevant and redundant features, whose bit is zero. To extract important features, the DE works at two levels: the iteration level and the run level. Each run contains a specific number of iterations; in our case, we set iterations = 1000 and runs = 20. On every run, the DE finds the best solution, represented in binary form, among all the iterations, using the fitness function to decide the best chromosome. The system then increments the run counter until the predefined number of runs is reached. In the end, we obtain one binary array per run. Summing these binary arrays position by position gives the importance of each feature.
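
The run-level aggregation can be sketched as a column-wise sum over the best binary chromosome of each run. The names `feature_votes` and `rank_features` are hypothetical, and the three masks below are made-up toy data, not results from the paper.

```python
def feature_votes(best_masks):
    """Sum the per-run best chromosomes position by position."""
    n = len(best_masks[0])
    return [sum(mask[j] for mask in best_masks) for j in range(n)]


def rank_features(best_masks):
    """Feature indices sorted by descending vote count (ties keep index order)."""
    votes = feature_votes(best_masks)
    return sorted(range(len(votes)), key=lambda j: votes[j], reverse=True)


# Toy example: best chromosome from each of 3 runs over 3 features.
masks = [[1, 0, 1], [1, 0, 0], [1, 1, 1]]
print(feature_votes(masks))   # [3, 1, 2]
print(rank_features(masks))   # [0, 2, 1]
```

With 20 runs, a feature's vote count ranges from 0 to 20; the top-k features by this count would form a candidate subset of size k.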

For a better understanding, consider the depiction of a chromosome in Fig. 3. Each gene is expected to hold a value between 0 and 1. Assigning a fitness value to the chromosome in this format is difficult; see “Row 1” in Fig. 3. A “modulation layer” is therefore needed to generate the corresponding binary chromosome. The initial values in Row 1 are real values such as [0.73, 0.92 . . .0.44], which are not useful in this random, real-valued format. They are modulated into a binary string that controls feature activation and deactivation, as shown in “Row 2” [0, 1 . . .1]. This modulation tells the system to generate a classification based only on the active features and to ignore the inactive ones. The binary modulation formula used to convert the genes’ real values into binary values was proposed by [38], as shown above.

Fig. 3

Chromosome value modulations.

5 Implementation, results, and discussion

Our experiments examined how much e-mail spam was classified before it reached the inbox. Our approach was tested on groups of feature subsets; we selected the best group obtained and compared it with the others. The features were divided into nine groups, and each group had a certain number of e-mails and features, which were refined for each category with each test comparison. We started with three optimal (or near-optimal) features in the first group and then added further optimal (or near-optimal) features to form each subsequent group, giving groups of 3, 5, 10, 15, 20, 25, 30, 35, and the full feature set (57 or 88 features, respectively). The features are organized in descending order for ease of comparison. Each feature value indicates its importance and effect on the e-mail. The purpose of this grouping is to study the classification accuracy according to the optimal (or near-optimal) features selected and to discard redundant and irrelevant features. Optimal (or near-optimal) features are selected to enter the second test comparison phase, whereas features that are redundant or irrelevant, or that scored close to 0, are ignored. Testing is applied again after feature subset selection. We found that classification accuracy increased relative to the first experiments, because it depends on the features derived from the e-mails: removing unimportant features increased classification accuracy, and vice versa. The similarity score was determined, and the datasets were then used for cross-checking.

5.1 Experimental results and discussions

The results of this paper are evaluated from two points of view. The first is the calculation of accuracy, and the second is an assessment of the level of correlation between our system’s results and others using a t-test. The t-test showed that these results were statistically significant. Figure 4 provides the gain charts for the SVM training and testing outcomes before the BDE-SVM method is applied. The best line in the gain chart is ($Best-SVM), and the result of SVM before improvement is ($S-SVM).

Fig. 4

Training and testing result of SVM before improvement.

As mentioned earlier, this paper carried out two kinds of experiments: BDE and SVM. Figures 5 and 6 show the results of using BDE as feature subset selection to select the optimal feature subset based on population size = 50, number of iterations = 1000, and number of runs = 20 for the “spambase” and “spamassassin” datasets, respectively.

Fig. 5

Example results in BDE using spambase dataset.

Fig. 6

Example results in BDE using spamassassin dataset.

Tables 2 and 3 provide the results of classification using the SVM learning algorithm after the number of features was reduced by the binary differential evolution (BDE) algorithm. The results were produced by selecting the optimal (or near-optimal) feature subset using BDE. When the optimal feature subset is selected by an evolutionary algorithm such as differential evolution, the fitness function is very important; in this paper, the correlation coefficient is used as the fitness function. Table 2 shows the classification results, the testing accuracy, and the execution time based on the SVM algorithm. In addition, Table 3 shows the false-positive rate, false-negative rate, precision, recall, and F-measure obtained using the SVM algorithm. With the SVM algorithm and the selected feature subset, the testing accuracy is 93.99% and the misclassification rate is 6.01%; the execution time is 53.09 seconds, the false-negative rate is 0.089, the false-positive rate is 0.042, the F-measure is 0.95, the precision is 0.94, and the recall is 0.96.

Table 2

Analysis of the training and testing result based on BDE-SVM

Classification	Rate (%)	Execution time (s)
Accuracy	93.99	53.09
Misclassification	6.01	53.09

Table 3

Analysis of the training and testing result based on BDE-SVM

False-Positive	False-Negative	Precision	Recall	F-measure
0.042	0.089	0.94	0.96	0.95
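
The F-measure in Table 3 can be checked against the reported precision and recall using the standard definition (the harmonic mean of the two). The helper names below are hypothetical, and the standard formulas are assumed to be the ones the authors used.

```python
def precision(tp, fp):
    """Share of messages flagged as spam that really are spam."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual spam messages that were flagged."""
    return tp / (tp + fn)


def f_measure(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


# Precision 0.94 and recall 0.96 as reported in Table 3.
print(round(f_measure(0.94, 0.96), 2))  # 0.95
```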

Table 2 shows that the classification accuracy improved for the datasets while the execution time was reduced. This accuracy indicates that most e-mails are successfully passed to the classifier for recognition with the lowest execution time, 53.09 seconds, and that the proposed approach improves the accuracy of e-mail spam classification.

Table 3 shows that the false-positive rate, precision, recall, and F-measure of the classified e-mails improved for the datasets. Figure 7 illustrates the gain charts for both the training and testing results after the optimal (or near-optimal) feature subset was selected, with SVM as the classifier and BDE as feature subset selection. The best line in the gain charts is ($Best-BDESVM), and the result of SVM after using BDE as feature subset selection is ($S-BDESVM).

Fig. 7

Training and testing of SVM after use of BDE.

Table 4 reports the accuracy, recall, F-measure, false-positive rate, and execution time of the different methods for comparison with our result. Table 4 thus compares the two kinds of experiments carried out in this paper: feature subset selection based on binary differential evolution (BDE) and classification based on SVM.

Table 4

Comparison of results for our approach

Approach	Accuracy %	Recall	F-measure	False positive	Execution time (s)
Full features with SVM	93.55	0.96	0.93	0.042	169.42
BDE-SVM	93.99	0.96	0.95	0.042	53.09
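
As a rough check on the figures in Table 4, and reading the values 169.42 and 53.09 as execution times in seconds (53.09 s is the execution time reported for BDE-SVM in Table 2), the relative time saving, the speed-up, and the accuracy gain work out approximately as follows:

$\frac{169.42 - 53.09}{169.42} = \frac{116.33}{169.42} \approx 0.687, \qquad \frac{169.42}{53.09} \approx 3.19, \qquad 93.99 - 93.55 = 0.44$

Under this reading, feature selection cuts execution time by roughly 69 percent (about a 3.2-fold speed-up), while accuracy rises by 0.44 percentage points.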

The former is responsible for obtaining the optimal (or near-optimal) feature subset from all features, while the latter is responsible for classification using the selected features for the e-mail spam classification problem. The SVM algorithm was used to test and evaluate the performance of the BDE method in selecting features for e-mail spam classification problems. The results demonstrate that the accuracy of the new approach (BDE-SVM) is better than that of the following techniques: SVM and GA. Many authors have used GA, for example, to improve the accuracy of SVM. Using feature subset selection based on binary differential evolution with the correlation coefficient as the fitness function, and SVM as the classifier, gives a better result in terms of accuracy and execution time, as described above. Table 4 compares the classification accuracy and the execution time obtained with the SVM algorithm under the different feature selection settings.

Table 4 also reports the false positive, precision, recall, and F-measure values obtained with the SVM algorithm. Based on the obtained results, the accuracy of the proposed BDE-SVM approach is 93.99%, which is better than using SVM only (93.55%). Figure 8 compares the accuracy of the previous result with that of the recent result; the y-axis shows the accuracy percentage of the proposed method relative to the other approaches. Figure 8 shows that the new BDE-SVM approach achieves better accuracy than the others.

Fig. 8

Line comparisons among our results.

Figure 9 compares the accuracy and execution time of our results with those of the others. In Fig. 9, the x-axis shows the accuracy and execution time of each method, and the y-axis shows the corresponding values. Figure 9 shows that the new approach has improved reliability and reduced execution time relative to the other methods.

Fig. 9

Comparison between our results and others.

From the above figures, this study supports two hypotheses. First, studying the importance of e-mail features produces a good classification: the features selected by the trained approach were used to tune the classification accuracy. Second, a robust feature selection approach can be developed for selecting these features.

5.2 Comparison with other methods

This section compares our proposed approach with methods from the literature that improve e-mail spam classification by reducing the number of features. The accuracy obtained by our approach was compared with the results of Yilmaz Kaya [39], Vinitha et al. [40], Fagbola et al. [22], Maldonado et al. [41], Alom et al. [42], and Parimala [43]. Alom et al. [42] proposed a deep learning model that combines two classifiers (a tweet text classifier and a metadata classifier). They ran the model on two different real-world datasets; it achieved an accuracy of 99.32% on the first dataset and 93.38% on the second. We selected these methods for comparison because they used the same datasets as ours. Comparing the outcomes of these approaches with our result, we find that our result is better than the values listed for them in Table 5.

Table 5 compares the results of our approach with those of others for e-mail spam classification. The proposed approach achieved the best accuracy score. Figure 10 presents the accuracy comparison between our proposed approach and the other methods. In Fig. 10, the x-axis lists the compared approaches, while the y-axis shows the accuracy percentage of the proposed approach and of the others using a column chart. Figure 10 shows that the new approach (BDE-SVM) achieves better accuracy than the others.

Table 5
The comparisons of accuracy among our results and others

Method	Accuracy %
GA-SVM	93.50
UTF-8 values	93.31
KP-SVM	93.52
DL	93.38
BDE-SVM	93.99

Fig. 10

Line comparison between our result and others.

In Figure 10, the x-axis lists the compared approaches, while the y-axis shows the accuracy percentage of the proposed approach and of the others using a line chart. Figure 10 shows that the new approach (BDE-SVM) is more accurate than the others.

In this article, a t-test is used to demonstrate the significance of our proposed method. Table 6 shows that the improvement obtained by our approach is statistically significant. It reports the results before and after optimization with the binary differential evolution subset selection approach (the accuracy of SVM and the accuracy of BDE-SVM), compared using the paired-samples t-test procedure. The paired-samples t-test compares the means of two variables that represent the same group at two different times, here before and after optimization. The two variables are (Accuracy1, Accuracy2), and Table 6 gives the statistics of their paired differences. The null hypothesis is that the mean difference between the two variables is zero; it is rejected when the calculated t statistic exceeds the critical value from the t-distribution table. The critical value was found to be less than the calculated t value (8.1), which means that there is a significant difference between the two methods. The 95% confidence interval of the mean difference (0.0035 to 0.0058) does not include zero, which also implies that the difference is significant. In addition, a low significance value for the t-test (usually less than 0.05) indicates a significant difference between the two variables; Table 6 reports a significance value of 0.0006 for the SVM and BDE-SVM pair.

We can therefore conclude that there is a significant difference between the results before and after optimization. We also ran two statistical significance tests (t-tests) comparing the results before and after applying the binary differential evolution approach and found that the results of our proposed approach were statistically significant. Using feature subset selection based on the improved BDE, with SVM as the classifier, can reduce execution time (computational complexity), improve the F-measure, and improve accuracy, delivering better results than the other methods.

Table 6

Testing of statistical significance using T-test based on Spambase dataset

Paired differences
Pair	Mean	Std. deviation	Std. error mean	95% confidence interval of the difference (lower)	95% confidence interval of the difference (upper)	t	Sig. (2-tailed)
Pair 1: SVM – BDE-SVM	0.0046	0.039	0.0006	0.0035	0.0058	8.1	0.0006
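
The quantities in Table 6 are those of a standard paired-samples t-test. The following minimal sketch shows how they are derived from two lists of paired accuracies. The function name `paired_t_test` and the example accuracies are hypothetical and are not the paper's data; the critical value is passed in by the caller because the Python standard library has no t-distribution (3.182 is the two-tailed 5% critical value for 3 degrees of freedom).

```python
import math
import statistics


def paired_t_test(before, after, critical_value):
    """Paired-samples t-test summary for two equally long lists of scores."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if std_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    std_error = std_diff / math.sqrt(n)
    t_stat = mean_diff / std_error
    margin = critical_value * std_error
    return {
        "mean": mean_diff,
        "std": std_diff,
        "std_error": std_error,
        "t": t_stat,
        "ci": (mean_diff - margin, mean_diff + margin),
        # Reject the null hypothesis (mean difference = 0) when |t| exceeds
        # the critical value for n - 1 degrees of freedom.
        "significant": abs(t_stat) > critical_value,
    }


# Hypothetical accuracies from four paired runs (not the paper's data).
svm_only = [0.90, 0.91, 0.92, 0.93]
bde_svm = [0.91, 0.93, 0.93, 0.95]
result = paired_t_test(svm_only, bde_svm, critical_value=3.182)
```

In this toy example the confidence interval lies entirely above zero, which is the same criterion the text applies to the interval reported in Table 6.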

6 Conclusion

This paper focuses on the selection of feature subsets; its contribution is the use of binary differential evolution (BDE) to select them. The challenge is, therefore, to provide an e-mail spam classification method that combines a feature subset with a classification algorithm such as SVM. Our suggested method is one solution for selecting feature subsets so as to reduce the high dimensionality and improve accuracy. The results obtained by the proposed approach make it one of the significant research solutions for e-mail spam classification. An e-mail spam classification system of this kind would help reduce the number of unsolicited messages. The implemented approach was evaluated and compared with some of the current e-mail spam classification techniques, and it offers increased detection reliability for e-mail spam classification. The quality of the proposed method was contrasted with other population-based selection strategies such as GA and PSO. The proposed BDE was shown to require less memory than the other approaches, which results in a reduction in execution time. In addition, when tested on an e-mail spam classification problem, the proposed approach outperformed both GA and PSO in terms of classification results, yielding an accuracy of 93.99%. Finally, given the huge number of e-mails to be classified, the difference in accuracy is considered very acceptable.

References

1. Suebsing and Hiransakolwong, A novel technique for feature subset selection based on cosine similarity, Applied Mathematical Sciences 6(133) (2012), 6627–6655.

2. Kim and Chen S.-S., Associative naive bayes classifier: Automated linking of gene ontology to medline documents, Pattern Recognition 42(9) (2009), 1777–1785.

3. Dada E.G., et al., Machine learning for email spam filtering: review, approaches and open research problems, Heliyon 5(6) (2019), e01802.

4. Rahnamayan, et al., Opposition-based differential evolution (ode) with variable jumping rate, IEEE (2007), 81–88.

5. Khan, et al., A comprehensive study of email spam botnet detection, IEEE Communications Surveys & Tutorials 17(4) (2015), 2271–2295.

6. Das, et al., Differential evolution: A survey of the state-of-the-art, IEEE Transactions on Evolutionary Computation 15(1) (2010), 4–31.

7. Rami, et al., Feature subset selection using differential evolution and a statistical repair mechanism, Expert Systems with Applications 38(9) (2011), 11515–11526.

8. Uysal, et al., A novel probabilistic feature selection method for text classification, Knowledge-Based Systems 36 (2012), 226–235.

9. Mohammad, et al., A lifelong spam emails classification model, Applied Computing and Informatics (2020), 226–235.

10. Zorarpacı, et al., A hybrid approach of differential evolution and artificial bee colony for feature selection, Expert Systems with Applications 62 (2016), 91–103.

11. Zamir, et al., A feature-centric spam email detection model using diverse supervised machine learning algorithms, The Electronic Library (2020).

12. Hameed and Sarab, Differential evolution detection models for sms spam, International Journal of Electrical & Computer Engineering 11(1) (2021).

13. Zhang, et al., Fuzzy clustering based on semantic body and its application in chinese spam filtering, JDCTA: International Journal of Digital Content Technology and its Applications (2011).

14. Jain, et al., A hybrid approach for spam filtering using local concentration based k-means clustering, 2014 5th International Conference-Confluence The Next Generation Information Technology Summit (Confluence), IEEE, (2014), 194–199.

15. Faris, et al., A hybrid approach based on particle swarm optimization and random forests for e-mail spam filtering, International Conference on Computational Collective Intelligence, Springer (2016), 498–508.

16. Mirza, et al., Evaluating efficiency of classifier for email spam detector using hybrid feature selection approaches, 2017 International Conference on Intelligent Computing and Control Systems (ICICCS), IEEE, (2017), 735–740.

17. Soranamageswari and Meena, Statistical feature extraction for classification of image spam using artificial neural networks, 2010 second international conference on machine learning and computing, IEEE, (2010), 101–105.

18. Yang, et al., Generating and applying a trained structured machine learning model for determining a semantic label for content of a transient segment of a communication, Google Patents (2020).

19. Makkar, et al., An efficient deep learning-based scheme for web spam detection in iot environment, Future Generation Computer Systems 108 (2020), 467–487.

20. Makkar, et al., Ga-based feature subset selection in a spam/non-spam detection system, 2012 International Conference on Computer and Communication Engineering (ICCCE), IEEE (2012), 675–679.

21. Rakse, et al., Spam classification using new kernel function in support vector machine, International Journal on Computer Science and Engineering 2(5) (2019).

22. Temitayo, et al., Hybrid ga-svm for efficient feature selection in e-mail classification, Computer Engineering and Intelligent Systems 3(3) (2012), 17–28.

23. Kadam, et al., Bagging based ensemble of support vector machines with improved elitist ga-svm features selection for cardiac arrhythmia classification, International Journal of Hybrid Intelligent Systems 16(1) (2020), 25–33.

24. Saad, et al., A survey of machine learning techniques for spam filtering, International Journal of Computer Science and Network Security (IJCSNS) 12(2) (2012), 66.

25. Aggarwal and Zhai C.C., A survey of text classification algorithms, in: Mining text data, Springer US (2012), 163–222.

26. Jakkula, et al., Tutorial on support vector machine (svm), School of EECS, Washington State University 37 (2006).

27. Jakkula, et al., Feature selection techniques for email spam classification: A survey, International Conference on Artificial Intelligence, Smart Grid and Smart City Applications, Springer, (2019), 925–935.

28. Shuaib, et al., Whale optimization algorithm-based email spam feature selection method using rotation forest algorithm for classification, SN Applied Sciences 1(5) (2019), 390.

29. Ni, et al., Support vector machine with manifold regularization and partially labeling privacy protection, Information Sciences 294 (2015), 390–407.

30. Khoshgoftaar, et al., A survey of stability analysis of feature subset selection techniques, 2013 IEEE 14th International Conference on Information Reuse & Integration (IRI), IEEE, (2013), 424–431.

31. Cervante, et al., Binary particle swarm optimisation for feature selection: A filter based approach, 2012 IEEE Congress on Evolutionary Computation, IEEE, (2012), 1–8.

32. Ajibade, et al., An heuristic feature selection algorithm to evaluate academic performance of students, 2019 IEEE 10th Control and System Graduate Research Colloquium (ICSGRC), IEEE, (2019), 110–114.

33. Zhang, et al., Evolutionary computation meets machine learning: A survey, IEEE Computational Intelligence Magazine 6(4) (2011), 68–75.

34. Storn, et al., Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces, Journal of Global Optimization 11(4) (1997), 341–359.

35. Ali, et al., Simplex differential evolution, Acta Polytechnica Hungarica 6(5) (2009), 95–115.

36. Qin, et al., Self-adaptive differential evolution algorithm for numerical optimization, 2005 IEEE congress on evolutionary computation, IEEE 2 (2005), 1785–1791.

37. Xingshi, et al., Feature selection with discrete binary differential evolution, 2009 international conference on artificial intelligence and computational intelligence, IEEE 4 (2009), 327–330.

38. He, et al., Feature selection with discrete binary differential evolution, 2009 international conference on artificial intelligence and computational intelligence, IEEE 4 (2009), 327–330.

39. Kaya, et al., A novel feature extraction approach in sms spam filtering for mobile communication: one-dimensional ternary patterns, Security and Communication Networks 9(17) (2016), 4680–4690.

40. Vinitha, et al., Mapreduce mrmr: Random forests-based email spam classification in distributed environment, Data Management, Analytics and Innovation, Springer, (2020), 241–253.

41. Maldonado, et al., Svm-based feature selection and classification for email filtering, Pattern recognition-applications and methods, Springer, (2013), 135–148.

42. Alom, et al., A deep learning model for twitter spam detection, Online Social Networks and Media 18 (2020), 100079.

43. Parimala, et al., A study of spam e-mail classification using feature selection package, Global Journal of Computer Science and Technology (2011).