Hybrid adapted fast correlation FCBF-support vector machine recursive feature elimination for feature selection

Abstract

This paper investigates feature selection methods based on hybrid architecture using feature selection algorithm called Adapted Fast Correlation Based Feature selection and Support Vector Machine Recursive Feature Elimination (AFCBF-SVMRFE). The AFCBF-SVMRFE has three stages and composed of SVMRFE embedded method with Correlation based Features Selection. The first stage is the relevance analysis, the second one is a redundancy analysis, and the third stage is a performance evaluation and features restoration stage. Experiments show that the proposed method tested on different classifiers: Support Vector Machine SVM and K nearest neighbors KNN provide a best accuracy on various dataset. The SVM classifier outperforms KNN classifier on these data. The AFCBF-SVMRFE outperforms FCBF multivariate filter, SVMRFE, Particle swarm optimization PSO and Artificial bees colony ABC.

Keywords

Feature selection adapted FCBF-SVMRFE complementary features redundant features restored features

1. Introduction

Feature selection (FS) selects an optimum subset of features from the huge features available. The FS algorithms can be divided in three main categories: filter, wrapper and embedded methods. The filter methods are independent of any classification algorithm and evaluate the usefulness of feature, or a set of features, through measures of distance, dependency, correlation on data. Feature ranking are applied for high dimensionality problems [1]. The best ranked features are selected as input of a classifier.

Multivariate feature selection methods measure the interaction between features and select the best subset of features [2, 3]. In contrast, univariate methods evaluate each feature individually. Furthermore, due to the computational cost of the search algorithms used in feature selection, feature ranking methods are widely used in practice [4].

Popular filter methods include Relief [5, 6] Correlation based feature selection [7], Minimum Redundancy Maximum Relevance [8], fast correlation based filters [9], T-score, which is based on t-statistics measuring correlations between input features and output class labels. The filters are fast, computationally simple, evaluated based on criteria like entropy, correlation, mutual information, probabilistic distance measures [10, 11]. The principal filters drawback is they do not interact with the classifier and the majority of proposed methods ignores features dependency which affect classification performance negatively when compared to other FS techniques [1].

The wrappers are computationally expensive [1] and find feature subset that achieves higher accuracies by embedding classifier into the feature selection phases. Popular wrapper and embedded methods include support vector machines recursive feature elimination [2] and its variants Random forest RFE [12]. Classifiers such as boosting algorithms, SVM are applied to rank the features and distinguish relevant samples for classification [2]. The embedded methods incorporate the classifier in a FS construction process, but less computational expensive than filters.

Various evolutionary computation (EC) techniques have proved their performance for FS such as Genetic algorithm (GA) [13, 14], Particle swarm optimization (PSO) [15, 16], Ant Colony Optimization (ACO) [17], Artificial Bee Colony [18] and Differential evolution (DE) [19].

Hybrid FS methods combine two or more FS approaches in multiples stages including filter-wrapper, wrapper-wrapper, or filter inside wrapper, wrapper inside filter. The wrapper could be an evolutionary or swarm algorithm. For high dimensional dataset, applying filter is required to reduce the computational cost, after that, wrapper is performed to enhance the quality of selected features previously.

Kai et al. [20] described multiple SVM-RFE approach, where the SVM was trained on multiple subsets of data and the genes were ranked using statistical analysis, and demonstrated its effectiveness in choosing genes in microarray data. From the experiments, it was found that SVM-RFE (Correlation) outperformed SVM-RFE (mRMR) in terms of classification accuracy.

The proposed FS method [21] hybridized artificial bee colony (ABC) and differential evolution (DE) algorithm and described a new binary neighborhood search mechanism. The experimental results show that the proposed hybrid method is better than ABC and DE.

Akadi et al. [22] designed a two stages FS algorithm by combining the minimum Redundancy and maximum Relevance (mRMR) filter approach and GA in gene selection. Firstly, the mRMR discards irrelevant genes. Secondly, GA selects the highest discriminative gene subset. The results indicate that the proposed method finds the smallest efficient gene subset compared to GA and mRMR.

Filter-wrapper feature selection algorithm is proposed in [23]; the filter measure encodes the position of each particle in particle swarm and the classifier evaluates the solution in the fitness function. The experiments validate that the proposed method slightly outperforms a binary PSO based filter method.

Jiang et al. [24] designed a novel approach measuring the correlation between a continuous and discrete feature. An efficient filter FS algorithm named efficient correlation measure based filter (ECBMF) removes irrelevant features. Comparative studies were carried out on twenty UCI dataset and show that the proposed algorithm outperforms the others FS methods in both accuracy and dimensionality reduction.

The proposed approach is based on embedded method SVMRFE incorporated in multivariate filter FCBF. The FCBF filter identifies the predominant features but unable to guarantee high accuracy; SVMRFE is an efficient FS method in term of accuracy and weak for discarding redundant features.

The present work combines both multivariate filter FCBF and embedded method SVMRFE for feature selection task to benefit from them. We propose a hybrid algorithm with FCBF and Support Vector Machine recursive feature elimination SVMRFE by searching relevant and complementary features and discarding noisy features.

The adapted FCBF-SVMRFE incorporates inside FCBF filter the embedded SVMRFE method to find complementary features. Firstly, filter is used at the first stage; the idea consists of taking every feature and removes all correlated features with it, then; the SVMRFE method is applied to evaluate the impact on the performance. If the results are better, we test the next feature and so on, if not, the features are restored and then avoid decreasing the performance. The main objectives addressed in this work are:

•
Obtain the highest accuracy by machine learning algorithm, identify the smallest subset of relevant features, discard all redundant features and modify SVMRFE algorithm to remove worst features instead of only one at each execution for original algorithm.
•
Increase the performance by finding complementary features and compare the hybrid proposed method with existing methods: GA, PSO, and SVMRFE.

This paper is organized as follows. Section 2 describes filter ranking methods FCBF, SVM-RFE methods and presents a related work on FS algorithms. Section 3 explains the proposed architecture. The numerical experiments are demonstrated on medical datasets from UCI database in Section 4. The performance of the proposed algorithm are validated and compared with existing approaches in Section 5. Finally, the results were discussed followed with conclusion in Section 6.
2. Background and methods

2.1 Fast correlation-based filter

Filters Ranking Methods assign a weight to each feature and rank it accordingly. Filter techniques assess the relevance of the features based only on the properties of the data. In general, a feature relevance score is computed and low scoring features are removed [4].

The fast correlated based filter method [9] is a multivariate algorithm (FCBF) based on relevance redundancy approach and was conceived with two main phases: FCBF starts by selecting a set of features characterized by high correlation with the class using symmetric uncertainty (SU) defined Eq. (1), then, sorts these values into a list. It applies heuristics that removes the redundant features and keep the features that are more relevant to the class. SU is the ratio between the information gain and the entropy of two features. FCBF is effective to remove redundant and irrelevant features. The experimental results [9] validate the efficiency of the FCBF method, faster than both Correlation Feature Selection (CFS) and ReliefF.

$symU(X,Y)=2\left[{\displaystyle\frac{IG(X\mid Y)}{H(X)+H(Y)}}\right]$ (1)

where $I G$ : Information Gain, $H(\cdot)$ is Entropy.

2.2 Support vector machine recursive feature elimination

Guyon et al. [2] proposed an embedded FS algorithm Support vector machine recursive feature elimination (SVM-RFE). SVM-RFE is a FS method that uses SVM weights as the ranking criterion of features. It has two versions: linear and nonlinear. The non linear SVMRFE uses a specific kernel strategy and is more appropriate for non linear decision function [25]. This method starts with all features, uses criteria from SVM models coefficients to assess features, and recursively removes features that have small values.

2.3 Related work

Yu and Liu [2] present a filter ranking method based on symmetrical uncertainty measure. The proposed algorithm searches the predominant features first; then, removes the redundant features from that set. In this paper, the authors show that feature relevance alone is insufficient for efficient feature selection of high-dimensional data. They define feature redundancy and propose to perform explicit redundancy analysis in feature selection.

Zhang et al. [25] propose a novel hybrid approach by correlation-based filters and Support Vector Machine Recursive Feature Elimination (SVM-RFE) method for robust feature selection. In the first stage, they incorporate correlation-based filters to identify predominant Features and Complementary Features, and generate multiple groups for robustness; in the second stage, they aggregate multiple groups with SVM-RFE into a compact feature subset for high classification accuracy. Experimental results on both UCI data sets and microarray data sets have proved the effectiveness of the proposed approach.

Djellali et al. [26] designed two hybrid FS methods: multivariate filter fast correlation based filter FCBF and PSO (FCBF-PSO) and FCBF with genetic algorithm (FCBF-GA), FCBF-PSO combines in two phases particles swarm PSO and In first stage, FCBF filter is applied and provides a subset of features (relevant and no redundant), in the second stage, PSO is performed with previous selected features and executed on five UCI datasets. The experiments show that FCBF-PSO and FCBF-GA (genetic algorithm replace PSO) are competitive.

Authors [27] designed a framework based on a genetic algorithm (GA) for feature subset selection that combines various existing feature selection methods. This approach can find small subsets of features that perform well. The experiments were conducted using three existing feature selection methods entropy, T-statistics, and SVM-recursive feature elimination (RFE) on three datasets. The experimental results demonstrate that this approach is a robust and effective approach to find subsets of features with higher classification accuracy and smaller size compared to each individual feature selection algorithm.

Ferreira et al. [28] constructed two feature selection ranking filters for high-dimensional datasets, with a relevance-redundancy approach working in unsupervised or supervised mode. The experimental evaluation validated the efficiency of this method. According to authors, as compared to wrapper approaches, the selected features of this approach may not be the best for any classifier.

Baretta et al. [6] proposed modified filter ranking ReliefF called sReliefF and sTurRF to identify relevant features. Simulation analysis demonstrated that in presence of gene interactions these methods efficiently scale down large scale datasets to improve the computational cost of data mining algorithms and extract optimal features from genetic datasets.

Freeman et al. [29] examined a number of filter-based feature subset evaluation measures with the aim of assessing their performance to specific classifiers. This work tests 16 filter measures with K-nearest neighbors KNN and SVM classifierson 20 real and 20 artificial datasets. The results indicate that KNN classifiers work well with subset based RELIEF, correlation feature selection, whereas Fisher’s interclass separability criterion and conditional mutual information maximization work better for SVM. For KNN classifiers, the best filter measures are Correlation based feature selection CFS, and subset RELIEF.

Tapia et al. [30] described a method for performing sparse and stable gene selection from a number of unstable, but low cost, SVM-RFE units. It was shown that the introduction of a consensus constraint with respect to variations in the policy of gene removal improves gene selection precision. This method is validated on three benchmark microarray datasets.

Piyushkumar et al. [31] investigated filter approaches to gene selection to improve computational complexity and stability. In backward elimination method, a modified logistic regression loss function selects relevant samples. Tscore ranks the genes. The stability and computational complexity are improved compared to SVM based methods without reducing the accuracy.

Moldonado and Weber [32] introduced a novel wrapper algorithm using support vector machines with kernel functions. This method performs a sequential backward elimination of features. This approach has been tested on four datasets, outperforms other filter and wrapper techniques and provides high classification performance.

Apolloni et al. [33] investigated two new hybrid FS algorithms, called respectively Binary Differential Evolution BDE Xrank and BDE XRankf. The information gain values ranks the features in a first stage; the second stage, the BDE based wrapper method performs a search for a best solution. All initial solutions are generated with a reduced number of features. Four classifiers (KNN, C4.5, Naïve Bayes, SVM) have been used on six high dimensional microarray data. This approach reduces the size by 90% and achieves highly results.

Cadenas et al. [34] proposed hybrid FS approach based on fuzzy random forest, it can work with both crisp and uncertain data and integrated a filter and wrapper method into a sequential search procedure with improved classification accuracy of the features selected. This method achieves good performance with smaller number of features.

Maka et al. [35] investigated various wrapper and filter techniques for reducing the feature dimension of pairwise scoring matrices and demonstrated that these two types of selection techniques are complementary to each other. Two fusion strategies used to combine the ranking criteria of filter and wrapper methods. It was shown that feature subsets selected by the fusion methods are superior to those selected by the individual on microarray dataset.

3. Proposed adapted fast correlation SVMRFE based feature selection method

The embedded Support vector machine recursive feature elimination (SVMRFE) algorithm is incorporated in the Fast correlation based feature selection FCBF as shown in Fig. 1. The main idea is to evaluate both symmetrical uncertainties for features then measures the accuracy with classifier. The comparison between two accuracies at iteration I and I-1 before removing these features and after, if the current accuracy decreases that means, even if these features are redundant, they are complementary and then we restore them. The key idea is to remove for each relevant feature j its redundant features $i$ , ( $i\neq j$ ) then apply SVMRFE to evaluate the impact on performance and modify the size of the subset of features depending on the resulted accuracy. Therefore, the hybrid methods eliminate more redundant features and measure the accuracy to validate the efficiency of resulted subset. The presented method provides better results than SVMRFE or filter alone.

Figure 1.

Flowchart of the proposed adapted feature selection.

3.1 Adapted FCBF-SVMRFE

The Adapted fast correlation based filter algorithm (Fig. 1) consists of three stages:

•
Stage 1

The first stage is the relevance analysis, aims at ordering the input variables depending on relevance scores which is computed as the symmetric uncertainty with the respect to the target output. This phase discards irrelevant variables, which have a ranking score below a predefined threshold.
•
Stage 2

The redundancy analysis phase selects predominant features from the relevant set obtained in the first stage. Each predominant feature has its redundant features removed.
•
Stage 3

The third stage apply SVMRFE algorithm to evaluate the impact of removed features on accuracy. However, if the accuracy at iteration I decrease in comparison to accuracy at iteration I-1, it indicates that these features are complementary and cannot be removed. Consequently, deleted features are restored in the list of selected features. This process is repeated (stage 2 and 3) for all predominant features.

To avoid high computational time due to excessive SVMRFE execution for each predominant feature, the number of redundant feature removed is counted. If this number is greater than ten 10% percent of total number of features, we apply SVMRFE. This choice ensures:

1)
Reduce the number of execution of SVMRFE algorithm.
2)
All complementary features obtained contribute to improve accuracy. However, we know that only a subset of these features improves the performance, the proposed solution is to perform a function SearchComplementary.

The adapted FCBF-SVMRFE tests the previous and current accuracy depending on the size of features: the restoration stage (if Cp $>$ limit) implies applying the classification process only when size is reached (variable limit). Performing SVMRFE for each predominant feature implies executing SVM classifier which is time consuming, that’s the reason we fix the limit size criterion, reducing by that the computational cost. Limit is ten percent of the feature size, then, SVMRFE is executed less than the number of predominant feature compared to the standard SVMRFE. However, this choice means that we cannot precise which feature is complementary when Oldaccuracy is greater than current accuracy. For that reason we perform the function SearchComplementary to select them by learning process.

Function [finalsetFeat] $=$ searchComplementary (InitialSubset)

It $=$ 0; maxiter $=$ 30; //choosen empirically

${\rm T}=$ between 15% to 40% are tested

Step 1 Select T% of features randomly: Scandidate

Iter $=$ iter $+$ 1;

Step 2 //Fusion a set of predominant features Union Scandidate.

SFeat $=$ SpredominantU Scandidate; Accuracy $=$ classifier(SFeat)

If (accuracy $>$ oldaccuracy)

Oldaccuracy $=$ accuracy; finalSetFeat $=$ Sfeat;

end

Step 3Repeat step 1 to 2 Until (iter $\leqslant$ Maxiter);

3.2 Adapted AFCBF-SVMRFE algorithm

Function [finalsetFeat] $=$ searchComplementary (InitialSubset)
It $=$ 0; maxiter $=$ 30; //choosen empirically
${\rm T}=$ between 15% to 40% are tested
Step 1 Select T% of features randomly: Scandidate
Iter $=$ iter $+$ 1;
Step 2 //Fusion a set of predominant features Union Scandidate.
SFeat $=$ SpredominantU Scandidate; Accuracy $=$ classifier(SFeat)
If (accuracy $>$ oldaccuracy)
Oldaccuracy $=$ accuracy; finalSetFeat $=$ Sfeat;
end
Step 3Repeat step 1 to 2 Until (iter $\leqslant$ Maxiter);

Input: S (F1, F2, F3,

\ldots

, FN, C) //a Training data

Comp

=

[]; restore

=

[]; rate

=

[0–5%]; Thresh //a predefined threshold

Output: Sbest //best subset of features

BEGIN

1 Cp

=

0; //Count the number of features

//Redundancy analysis

2 For

i=

1 to

N

do begin

3 Calculate SymU

(i,c)

for Fi with Eq. (1);

4 if (SymU

(i,c)

>

Thresh) append Fi to Slist;

End;

// Order Slist in descending SymU

(i,c)

value;

5 Ordlist

=

sort (Slist, ‘descend’); Fj

=

getFirstElement (Ordlist);

6 Repeat

7 Fi

=

getNextElement (Ordlist, Fj);

8 if (Fi

<>

NULL)

9 Repeat

10 if (SymU

(i,j)

\geqslant

SymU

(i,c)

)

11 Begin Comp

=

Comp U {Fi}; //complementary features

12 Remove Fi from Ordlist;

13 Cp

=

+

1//Count the number of features removed

End;

14 Fi

=

getFirstElement (Ordlist, Fi);

15 Until (Fi

==

NULL) // all redundant features to Fj are removed

16 Len

=

length (Ordlist); Limit

=

0.1

*

Len; // threshold

=

limit;

//Evaluation and restoration

17 If (Cp

>

limit) //over execution of SVMRFE for each feature is not recommended begin // FeatRemoved: list of removed features

18 [Accuracy, featRemoved]

=

SVMRFE(Ordlist, rate)

19 Cp

=

20 If (oldaccuracy

>

Accuracy)begin

21 Restore

=

Restore U Comp;

21 Bestrestore

=

SearchComplementary(Restore)

22 Sbest

=

Ordlist union Bestrestore

//get back complementary features //it means there are complementary feature end

23 Else Oldaccuracy

=

accuracy;

24 End

25 Fj

=

getNextElement (Ordlist, Fj);

26 Until (Fj

==

NULL);

END

The procedure SVMRFE is a FS method that uses SVM weights as the ranking criterion of features. It was proved that removing more than one feature at each execution is more accurate [2]. The previous resulted subset is used as input for SVMRFE method. It has three steps:

SVMRFE algorithm
Input Ordlist from FCBF. rate
Output Accuracy, featRemoved
FeatRemoved (with removed X% features),
X is estimated experimentally [0–5%].
1. Train a Support Vector Machines on the training set.
2. Order features using the weights of the resulting classifier.
3. Eliminate X percent of features with the smallest weight (lowest rank).
4. Repeat the process with the training set limited to the remaining features.

4. Experimental results

In this section, we first describe the experimental settings, we empirically evaluate the performance of the proposed feature selection method Adapted FCBF-SVMRFE on different classifiers: Support vector machine, K nearest neighbors (KNN). We also compare the proposed algorithms AFCBF-SVMRFE with five FS methods (Table 1).

Table 1
List of tested algorithms

Algorithm	Abreviation
Fast correlation based filter	FCBF
Support vector machine recursive feature elimination	SVMRFE
Standard FCBF-SVMRFE	FCBF-SVMRFE
Adapted FCBF-SVMRFE	AFCBF-SVMRFE
Particle swarm optimization	PSO
Artificial bees colony	ABC

ABC and PSO are employed as wrapper approach using accuracy of classifiers SVM, KNN respectively as fitness function: We employ ABC-KNN; ABC-SVM and PSO-KNN, PSO-SVM.

4.1 Databases

To assess the performance of the proposed feature selection algorithms, we choose the dataset from UCI [40] machine learning repository. The datasets properties are summarized in Table 2.

Table 2
Datasets used in the experimental study

Dataset	Number of features	Class	Number of samples
Hepatitis	19	2	80
Arrythmia	279	16	452
Colon tumor	2000	2	62
Lung cancer	56	3	27
DLBCL	4026	2	45
Prostate	12600	2	102
Leukemia	7129	2	38
WDBC	32	2	569

4.2 Classifier

The quality of the subsets of features obtained from the proposed method is validated using two machine learning algorithms: Knearest neighbors (KNN), Support Vector Machine (SVM) on eight UCI dataset.

Support Vector machine is a robust supervised learning method [36] and generalize well in different applications. For multiclass problem, one versus all classification technique is applied [37]. K nearest neighbors KNN [38] is an efficient algorithm that stores all cases and classifies new cases based on a majority vote of its neighbors (distance measure). KNN is used in pattern recognition and machine learning.

4.3 Parameters setting of FS algorithms

We describe the parameters setting of different FS algorithms: PSO, ABC, SVMRFE and hybrid FCBF-SVMRFE. All algorithms are implemented by using MATLAB 7.0. Each dataset is split into training and test set. Classification accuracies of AFCBF-SVMRFE, PSO, ABC, SVMRFE are measured by the standard k $=$ 10 fold cross validation methods. PSO and ABC are two stockastic methods providing different results at each run. Consequently, $k=$ 10 fold validation was repeated 20 times. The mean of accuracy and standard deviation are calculated to enhance the statistical significance of the classification accuracies. The parameters setting of algorithms are described in Table 3.

Table 3
Setting parameters

Algorithms	Setting parameters
ABC	Limit $=$ 100; maxIterations $=$ 50; NBsource $=$ 50; fitness function: classifier accuracy.
PSO	Population size $=$ 50; inertia weight $=$ 0.72; personal learning coefficient c1 $=$ 1.5; global learning coefficient c2 $=$ 1.5; max iteratio $n=$ 50; fitness function: classifier Accuracy .
SVMRFE	The original SVMRFE apply SVM classifier then rank the features and delete the worst feature at each execution. In our experiments, we eliminate more features ( $X=$ 5%) due to the size of data and time consumed with SM classifier.
FCBF-SVMRFE	Threshold $=$ 0; standard SVMRFE: one feature removed.
FCBF	Threshold $=$ 0;

4.4 Comparison with existing FS methods

Different existing methods were studied to compare the effectiveness of the proposed method. Five methods were applied: ABC, PSO, FCBF, SVMRFE, hybrid FCBF-SVMRFE.

Regarding PSO, the position of particle is encoded using binary digits. Each bit set to 1 or 0 represents selected or unselected feature. The particles are initialized randomly.

The hybrid two stages FCBF-SVMRFE method consists of applying FCBF filter method on all features then the resulted subset of features are used as input of SVMRFE embedded method. The threshold is set to zero and X% of features removed for SVMRFE at each execution.

4.5 Experiments on proposed FS methods

The results of numerical experiments are summarized in Table 4. SVM and KNN classifier are used to evaluate the accuracy on the selected feature subset.

Table 4
WDBC and hepartitis performance

Dataset	FS methods	SVM accuracy %	Numbers of features	KNN accuracy ( $K=$ 7), (#Features)
WDBC	AFCBF-SVMRFE	97.059 $\pm$ 0.00	25	72.83 $\pm$ 0.0
	ABC	95.30 $\pm$ 0.75	15	73.58 $\pm$ 0.23 (13)
	PSO	96.86 $\pm$ 0.64	17	92 $\pm$ 0.32 (12)
	FCBF	95.59 $\pm$ 0.00	6	65.44 $\pm$ 0.0
	SVMRFE	96.48 $\pm$ 0.00	8	75.00 $\pm$ 0.0
	FCBFSVMRFE	96.77 $\pm$ 0.00	5	70.30 $\pm$ 0.0
Hepatitis	AFCBFSVMRFE	100.00 $\pm$ 0.00	13	90.23 $\pm$ 0.0
	ABC	86.50 $\pm$ 0.39	12	93.75 $\pm$ 0.12 (4)
	PSO	83.33 $\pm$ 0.15	9	76.25 $\pm$ 0.62 (4)
	FCBF	100.00 $\pm$ 0.00	13	87.50 $\pm$ 0.0
	SVMRFE	83.75 $\pm$ 0.00	16	88.70 $\pm$ 0.0
	FCBFSVMRFE	87.50 $\pm$ 0.00	13	81.25 $\pm$ 0.0

The experiments on WDBC data (Table 4) show that Adapted AFCBFSVMRFE accomplishes the best SVM accuracy (97.059%) and 25 features in comparison with ABC (95.30% and 15 features), PSO (96.86%; 17 features).

As can be learned from hepatitis in Table 4, the hybrid AFCBF-SVMRFE algorithm and FCBF achieve best SVM classification accuracy with 100% and 13 features selected. PSO realizes less accuracy 83.33% with SVM than AFCBFSVMRFE but provides minimal number of features (9). ABC (86.50% and 12 features). Lung Cancer as shown in Table 5, PSO achieves (94.44% SVM accuracy, 27 features) and outperforms the proposed AFCBFSVMRFE with accuracy of 79.40% on SVM classifier and FCBF-SVMRFE algorithm (88.89%). The proposed method AFCBFSVMRFE is better than both FCBF and SVMRFE. PSO realizes the best accuracy for KNN ( $k=$ 7) on both lung cancer (70.65%) and arrhythmia (74.22%).

Table 5

Lung cancer and arrythmia

Dataset	FS methods	SVM accuracy %	Numbers of features	KNN $k=$ 7 accuracy (# features)
Lung cancer	AFCBFSVMRFE	79.40 $\pm$ 0.00	53	66.67 $\pm$ 0.00
	ABC	91.24 $\pm$ 0.24	31	70.12 $\pm$ 0.16 (42)
	PSO	94.44 $\pm$ 0.36	27	70.65 $\pm$ 0.26 (36)
	FCBF	75.00 $\pm$ 0.00	38	58.40 $\pm$ 0.00
	SVMRFE	77.62 $\pm$ 0.00	38	63.65 $\pm$ 0.00
	FCBFSVMRFE	88.89 $\pm$ 0.00	35	65.13 $\pm$ 0.00
Arrythmia	AFCBFSVMRFE	79.08 $\pm$ 0.00	237	70.11 $\pm$ 0.00
	ABC	83.45 $\pm$ 0.42	170	73.98 $\pm$ 0.17 (186)
	PSO	84.61 $\pm$ 0.67	159	74.22 $\pm$ 0.41 (147)
	FCBF	68.88 $\pm$ 0.00	210	66.76 $\pm$ 0.00
	SVMRFE	70.50 $\pm$ 0.00	210	67.70 $\pm$ 0.00
	FCBFSVMRFE	68.55 $\pm$ 0.00	171	69.00 $\pm$ 0.00

The results on Arrythmia shown in Table 5 demonstrate that PSO (84.61% and 159 features) outperforms the proposed AFCBFSVMRFE and ABC (83.45% and 170 features). However, AFCBFSVMRFE with SVM accuracy 79.08% and 237 features is more efficient than FCBFSVMRFE which realizes the accuracy of 68.55% with 171 features, FCBF (68.88% and 2010 features) and SVMRFE (70.50% and 210 features).

The lung cancer has 4 restored features with 79.4% SVM accuracy, 12 restored for arrhythmia (79.08%) and 455 features restored for colon cancer (94.44%) increasing accuracy.

Colon cancer as shown in Table 6, AFCBFSVMRFE performs well with accuracy of (94.44%) on SVM classifier better than ABC (88.81%), FCBF (83.30%), SVMRFE (81.25%) and 87.10% for FCBF-SVMRFE algorithm. In contrast, ABC achieves minimal number of features (981) compared to other methods. ABC accomplishes the best performance for KNN ( $k=$ 7) on both colon (70.12%; 821 features) and DLBCL (81.35%; 1574 features).

Table 6

Colon cancer and DLBCL performance

Dataset	FS methods	SVM accuracy %	Numbers of features	KNN accuracy $k=$ 7 (#features)
Colon	AFCBFSVMRFE	94.44 $\pm$ 0.00	1203	68.76 $\pm$ 0.00
	ABC	88.81 $\pm$ 0.14	981	70.12 $\pm$ 0.21 (821)
	PSO	82.44 $\pm$ 0.49	1023	66.70 $\pm$ 0.15 (1014)
	FCBF	83.30 $\pm$ 0.00	1057	66.70 $\pm$ 0.00
	SVMRFE	81.25 $\pm$ 0.00	1820	66.90 $\pm$ 0.00
	FCBFSVMRFE	87.10 $\pm$ 0.00	1047	66.70 $\pm$ 0.00
DLBCL	AFCBFSVMRFE	100.00 $\pm$ 0.00	2293	80.85 $\pm$ 0.00
	ABC	88.75 $\pm$ 0.72	1541	81.35 $\pm$ 0.82 (1574)
	PSO	90.43 $\pm$ 0.31	2068	79.60 $\pm$ 0.43 (674)
	FCBF	85.70 $\pm$ 0.00	2433	80.61 $\pm$ 0.00
	SVMRFE	91.30 $\pm$ 0.00	2738	71.60 $\pm$ 0.00
	FCBF-SVMRFE	100.00 $\pm$ 0.00	2312	80.20 $\pm$ 0.00

The results on DLBCL in Table 6 demonstrate that AFCBFSVMRFE outperforms all FS methods (100.00% accuracy and numbers of features 2293) on SVM classifier better than FCBF (85.70%), SVMRFE (91.30%) and FCBF-SVMRFE (100.00%) and ABC (88.75%). However, ABC realizes minimal number of features (1541).

As can be learned from Fig. 2, the minimal number of features is realized with adapted AFCBFSVMRFE on leukemia Fig. 2f and prostate Fig. 2h. The hybrid approach FCBFSVMRFE accomplishes 5 features on WDBC Fig. 2a where PSO succeed to reach the minimum subset in Fig. 2b on hepatitis (9); arrhythmia (159) Fig. 2d and DLBCL (2068) Fig. 2g. The best result is achieved with ABC on lung cancer Fig. 2c and colon Fig. 2e.

Figure 2.

Comparison of FS methods of number of features on different datasets on SVM classifier.

For leukemia data, the AFCBFSVMRFE method provides the best performance in SVM accuracy 100.00% and less numbers of features 366 as shown in Table 7. AFCBFSVMRFE achieves the best performance (85.70%) for KNN with $k=$ 7 on leukemia and prostate (87.00%).

Table 7

Leukemia and prostate performance

Dataset	FS methods	SVM accuracy %	Numbers of features	KNN accuracy $k=$ 7 (#features)
Leukemia	AFCBFSVMRFE	100.00 $\pm$ 0.00	366	85.70 $\pm$ 0.00
	ABC	96.78 $\pm$ 0.14	2540	71.35 $\pm$ 0.07 (2064)
	PSO	94.00 $\pm$ 0.21	3516	71.40 $\pm$ 0.64 (1270)
	FCBF	100.00 $\pm$ 0.00	3570	71.40 $\pm$ 0.00
	SVMRFE	94.70 $\pm$ 0.00	6416	71.40 $\pm$ 0.00
	FCBFSVMRFE	100.00 $\pm$ 0.00	3428	85.22 $\pm$ 0.00
Prostate	AFCBFSVMRFE	90.61 $\pm$ 0.00	468	87.00 $\pm$ 0.00
	PSO	91.13 $\pm$ 0.19	6274	76.13 $\pm$ 0.24 (6390)
	FCBF	90.00 $\pm$ 0.00	1189	65.26 $\pm$ 0.00
	SVMRFE	94.00 $\pm$ 0.00	5670	77.12 $\pm$ 0.00
	FCBFSVMRFE	86.30 $\pm$ 0.00	1142	76.00 $\pm$ 0.00
	ABC	88.92 $\pm$ 0.13	1324	74.16 $\pm$ 0.26 (3452)

The experiments carried out demonstrate that the adapted AFCBF-SVMRFE method with KNN classifier outperforms the feature selection methods: FCBF, SVMRFE and FCBF-SVMRFE for all datasets.

The SVM classifier performs well than KNN classifier for all FS methods including existing approaches and proposed method like shown in Tables 5–7.

5. Discussion

It has been found that the proposed method Adapted FCBF-SVMRFE is more accurate than the exiting FS methods (FCBF, SVMRFE, FCBF-SVMRFE, PSO and ABC) for classifying the hepatitis, WDBC, colon, DLBCL, leukemia and prostate.

The two stages FCBF-SVMRFE method is more efficient than FCBF filter method and embedded approach SVMRFE. The SVMRFE performs well than FCBF filter approach.

Restored features constitute a subset of features from the set of complementary features with random selection and classifier accuracy that are efficient for increasing the performance in term of accuracy and feature size reduction.

The complementary features when restored partially and well selected with classifier criterion is a good technique to restore the best features randomly. The proposed method is relevant with high size. However, some parameters influence the performance like the size of limit variable (chosen 10% of the feature size).

The adapted FCBF-SVMRFE method provides a list of restored features that are dependent of the size of redundant features. This number is critical; the reason is the restoration of features depends on the improvement of accuracy. If this number is high, it means that some features might be irrelevant. The tradeoff must be conducted between keeping complementary features and removing irrelevant features to guarantee the best choice of features subset.

6. Conclusion

In this paper a hybrid approach called adapted AFCBF-SVMRFE based on filter method FCBF and embedded SVMRFE was proposed. The SVMRFE is incorporated in FCBF filter. The experiments conducted validated the novel algorithm in term of SVM and KNN classification accuracies and number of features better than existing method.

Restored features from the set of complementary features with random selection and classifier accuracy provide an optimal subset of features. To avoid high computational cost, the proposed algorithm does not restore only complementary features but might restore irrelevant features. This drawback is acceptable because of the increase of the classifier accuracy. The future work intends to enhance the ability of AFCBFSVMRFE for features restoration.

References

Bolon-Canedo

Sanchez-Marono

Alonso-Betanzos

Benitez

Herrera

. A review of microarray datasets and applied feature selection methods. Information Sciences. 2014; 282: 111-135.

Guyon

Weston

Barnhill

. Gene selection for cancer classification using support vector machines. Machine Learning. 2002. 46(1-3), 389-422.

Yan

Zhang

. Feature selection and analysis on correlated gas sensor data with recursive feature elimination. Journal Sensors and Actuators B Chemical. 2015; 212: 353-363.

Fakhraei

Zadeh

Fotouhi

. Bias and stability of single variable classifiers for feature ranking and selection. Expert Systems with Applications. 2014; 41: 6945-6958.

Kononenko

. Estimating attributes: Analysis and extensions of relief. In machine learning 1994; ECML 94, Springer. 171-182.

Baretta

Santaniello

. Implementing reliefF filters to extract meaningful features from genetic life time datasets. Journal Biomedical Informatics. 2011; 44: 361-369.

Hall

. Correlation based feature selection for machine learning. Phd Thesis, The University of Waikato. 1999.

Ding

Peng

Long

. Feature selection based on mutual information: Criteria of max-dependency, max relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2005; 27(8).

Liu

. Feature selection for high-dimensional data: A fast correlation based filter solution. in: Proceedings of International Conference on Machine Learning. ICML. 2003; 856-863.

10.

Bolon

Sanchez

Betanzos

. A review of feature selection methods on synthetic data. Journal of Knowledge Information System. 2013; 34: 483-519.

11.

Yoon

Kim

. Mutual information-based SVM-RFE for diagnostic classification of digitized mammograms. Journal of Pattern Recognition Letters. 2009; 30: 1489-1495.

12.

Diaz

Uriarte

. Gene selection and classification of microarray data using random forest. BMC Bioinformatics. 2006; 7.

13.

Holland

. Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor. 1975.

14.

Shreem

Abdullah

Nazri

Alzaqebah

. Hybridizing reliefF MRMR filters and GA wrapper approaches for gene selection. J Theor Appl Inf Technol. 2012; 46(2): 1034-1039.

15.

Kennedy

Eberhart

. Particle swarm optimization. in: IEEE Intern Conf on Neural Networks. 1995; 4: 1942-1948.

16.

Yeoman

Xue

Zhang

. Particle swarm optimisation for feature selection: A size-controlled approach. Proceedings of the 13-th Australasian Data Mining Conference. Sydney, Australia. 2015.

17.

Dorigo

Maniezzo

Colorni

. Ant system: Optimization by a colony of cooperating agents. IEEE Trans on Systems, Man and Cybernetics. 1996; 26(1): 29-41.

18.

Karaboga

. An idea based on honey bee swarm for numerical optimization. Technical Report TR06, Erciyes University, Engineering Faculty. 2005.

19.

Price

Storn

Lampinen

. Differential evolution: A practical approach to global optimization. Springer, London 2005.

20.

Kai-Bo

Rajapakse

Wang

Azuaje

. Multiple SVM-RFE for gene selection in cancer classification with expression data. IEEE Trans Nanobioscience. 2005; 4(3): 228-234.

21.

Zorarpac

Selma

Özel

. A hybrid approach of differential evolution and artificial bee colony for feature selection. Journal Expert Systems with Applications. 2016; 62: 91-103.

22.

El Akadi

Amine

El Ouardighi

Aboutajdine

. A two-stage gene selection scheme utilizing MRMR filter and GA wrapper. Knowl Inf Syst. 2011; 26: 487-500.

23.

Esseghir

Goncalves

Slimani

. Adaptive particle swarm optimizer for feature selection. in: International Conference on Intelligent Data Engineering and Automated Learning. Springer Verlag, Berlin, Heidelberg. 2010; 226-233.

24.

Jiang

Wang

. Efficient feature selection based on correlation measure between continuous and discrete features. Journal of Information Processing Letters. 2016; 116: 203-215.

25.

Zhang

. A hybrid feature selection approach by correlation based filters and SVM-RFE. International Conference on Pattern Recognition. 2014; 3684-3689.

26.

Djellali

Zine

Azizi

. Two stages feature selection based on filter ranking methods and SVMRFE on medical applications. in: Modelling and Implementation of Complex Systems. Lecture Notes in Networks and Systems, Vol 1. Springer, Cham. 2016; doi: 10.1007/978-3-319-33410-3_20.

27.

Xue

Zhang

Bourgeois

. A genetic algorithm-based method for feature subset selection. Soft Comput. 2008; 12: 111-120. doi: 10.1007/s00500-007-01938.

28.

Ferreira

PAJ

Figueiredo

MAT

. Efficient feature selection filters for high-dimensional data. Pattern Recognition Letters. 2012; 33: 1794-1804.

29.

Freeman

Dana

Otman

. An evaluation of classifier-specific filter measure performance. Feature Selection Pattern Recognition. 2015; 48: 1812-1826.

30.

Tapia

Bulacio

Angelone

. Sparse and stable gene selection with consensus SVM-RFE. Pattern Recognition Letters. 2012; 33: 164-172.

31.

Piyushkumar

Jagath

Rajapakse

. Gene and sample selection using t-score with sample selection. Journal Bimedical Informatics. 2016; 59: 31-41.

32.

Maldonado

Weber

. A wrapper method for feature selection using support vector machines. Journal Information Sciences. 2009; 179: 2208-2217.

33.

Apolloni

Leguizamon

Alba

. Two hybrid wrapper-filter feature selection algorithms applied to high dimensional microarray experiments. Journal Soft Computing. 2016; 38: 922-932.

34.

Cadenas

Garrido

Martinez

. Feature subset selection filter – wrapper based on low quality data. Journal Expert Systems with Applications. 2013; 40: 6241-6252.

35.

Maka

Kung

. Fusion of feature selection methods for pairwise scoring SVM. Neurocomputing. 2008; 71: 3104-3113.

36.

Vapnick

. Statistical learning theory. John Wiley & Sons. 1998.

37.

Rifkin

Klautau

. In defense of one vs all classification. Journal of Machine Learning Research. 2004; 5: 101-141.

38.

Duda

Hart

. Pattern classification and scene analysis. Wiley. 1973.

39.

Alaydin

. Introduction to machine learning. The MIT Press, 2004.

40.

Machine Learning repository UCI: https://archive.ics.uci.edu/ml/datasets.html.

Hybrid adapted fast correlation FCBF-support vector machine recursive feature elimination for feature selection

Abstract

Keywords

1. Introduction

2.1 Fast correlation-based filter

2.3 Related work

3. Proposed adapted fast correlation SVMRFE based feature selection method

4. Experimental results

Table 1 List of tested algorithms

Table 2 Datasets used in the experimental study

4.3 Parameters setting of FS algorithms

Table 3 Setting parameters

4.5 Experiments on proposed FS methods

Table 4 WDBC and hepartitis performance

6. Conclusion

References

Table 1
List of tested algorithms

Table 2
Datasets used in the experimental study

Table 3
Setting parameters

Table 4
WDBC and hepartitis performance