A novel three layer particle swarm optimization for feature selection

Abstract

Feature selection (FS) is a vital data preprocessing task that aims to select a small subset of features while maintaining a high level of classification accuracy. FS is a challenging optimization problem due to its large search space and the existence of local optimal solutions. Particle swarm optimization (PSO) is a promising technique for selecting optimal feature subsets due to its rapid convergence speed and global search ability. However, PSO suffers from stagnation or premature convergence on complex FS problems. In this paper, a novel three layer PSO (TLPSO) is proposed for solving the FS problem. In TLPSO, the particles in the swarm are divided into three layers according to their evolution states, and particles in different layers are treated differently to fully exploit their potential. Instead of learning from historical best positions, TLPSO uses a random learning exemplar selection strategy to enrich the search behavior of the swarm and enhance population diversity. Further, a local search operator based on the Gaussian distribution is applied to the elite particles to improve the exploitation ability. As a result, TLPSO is able to keep a balance between population diversity and convergence speed. Extensive comparisons with seven state-of-the-art meta-heuristic based FS methods are conducted on 18 datasets. The experimental results demonstrate the competitive and reliable performance of TLPSO in terms of improving classification accuracy and reducing the number of features.

Keywords

Feature selection; Particle swarm optimization; Three layer structure; Random exemplar selection; Local search operator

1 Introduction

Nowadays, huge amounts of data are produced in the real world due to the rapid development of information technology and data storage devices [1]. Data analysis is challenging and often inaccurate because of the existence of irrelevant and redundant features. Feature selection (FS) is an indispensable preprocessing step in data mining tasks (e.g., classification), especially for large datasets [2]. FS aims to find the most informative features among all the features. FS can improve classification performance, speed up the learning process, and provide a better understanding of the data and its underlying characteristics.

Based on the evaluation criteria, FS methods can be divided into two categories: filter and wrapper [3, 4]. The wrapper approach employs a learning algorithm to evaluate the discriminatory power of feature subsets. In contrast, the filter approach is independent of any learning algorithm and evaluates features with statistical measures such as mutual information, information gain, and correlation [5–7]. Generally speaking, wrappers can produce better classification performance than filters because they directly use the classification accuracy to evaluate feature subsets. However, filters are much faster since they do not need to call the learning algorithm repeatedly during the search process.

FS can be considered a combinatorial optimization problem that aims to select the optimal feature subset. It is NP-hard, and its search space grows exponentially with the number of features: for a dataset with N features, there are $2^N$ candidate feature subsets. Therefore, an exhaustive search that examines all possible feature subsets is impractical in most cases. Heuristic search methods are less computationally expensive, but they are prone to falling into local optimal solutions [8]. Recently, many nature-inspired meta-heuristic algorithms have been used to select optimal feature subsets, including Genetic Algorithm (GA) [9, 10], Particle Swarm Optimization (PSO) [11], Ant Colony Optimization (ACO) [12], Differential Evolution (DE) [13], and Artificial Bee Colony (ABC) [14]. Meta-heuristic techniques introduce randomness into their search process to explore the entire search space and find good solutions within an acceptable time. They have been widely used in optimization problems that lack prior domain knowledge or a differentiable fitness function [15–18].
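As a quick illustration of this exponential growth, the Python snippet below computes $2^N$ for a few of the feature counts listed in Table 1 and reports how many decimal digits the count has. `num_subsets` is a hypothetical helper name used only for this illustration, and the count includes the empty subset.

```python
def num_subsets(n_features: int) -> int:
    """Number of candidate feature subsets for n_features features (2**N, empty subset included)."""
    return 2 ** n_features


# Feature counts of four datasets from Table 1.
for name, n in [("Breastcancer", 9), ("WDBC", 30), ("Sonar", 60), ("Leukemia", 7129)]:
    digits = len(str(num_subsets(n)))  # number of decimal digits of 2**N
    print(f"{name}: N = {n}, 2^N has {digits} digits")
```

Even at N = 30 the count already exceeds one billion (2^30 = 1,073,741,824), and for Leukemia the number of candidate subsets has 2,147 decimal digits, which is why exhaustive search is ruled out.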

Among all the population based meta-heuristic algorithms, PSO has attracted increasing attention from the FS community because of its ease of implementation and rapid convergence speed [19]. Although PSO based FS approaches have shown promising results, they still suffer from several shortcomings.

Most PSO based FS methods employ the fully informed topology. With this topology, information spreads very quickly through the swarm and the algorithm converges rapidly [20]. However, this topology may also lead to a rapid loss of diversity.

In most PSO variants, all the particles learn from the historical best positions, such as the global best (gbest), the neighborhood best position (nbest), or the personal best position (pbest). These historical best positions may stay the same for many iterations, and this learning strategy easily leads to premature convergence [21].

In high-dimensional FS problems, the huge search space is a great challenge for the search efficiency of PSO. Further, the existence of local optimal solutions also increases the chance of premature convergence [22].

To overcome the aforementioned problems and improve the performance of PSO on FS problems, this paper proposes a novel three layer PSO (TLPSO) for FS. In a swarm, particles are in different evolution states and have different potential for exploration and exploitation [23]. Therefore, different learning strategies should be applied to particles according to their own evolution states. In the proposed TLPSO, a three layer structure is employed instead of the fully connected topology. In each iteration, the particles are sorted according to their fitness values and then divided into three layers: elite, ordinary, and inferior. The elite particles aim at local exploitation to further enhance the quality of solutions. The inferior particles focus on exploring the entire search space and finding potential regions of good solutions. The ordinary particles keep a balance between global exploration and local exploitation.

Instead of learning from the historical best positions, TLPSO employs a random learning exemplar selection strategy to enhance population diversity. Particles in the ordinary and inferior layers randomly choose learning exemplars from the better layers of the current swarm. Hence, all the superior particles can serve as candidate learning exemplars. By using this random selection strategy, population diversity is likely to be enhanced. To improve the exploitation ability of the algorithm, a local search operator based on Gaussian sampling is applied to the elite particles. The main contributions of this paper are summarized as follows:

This paper proposes a PSO algorithm with a novel three layer structure instead of the classical fully informed topology. Different particles are treated differently to fully investigate their searching abilities. The three layer structure and the learning strategy in the TLPSO can better maintain population diversity and enrich the searching behavior of the swarm.

For the ordinary and inferior particles, a random learning exemplar selection strategy is proposed instead of learning from the historical best positions. Therefore, gbest does not dominate the search process of the entire swarm. This random selection strategy is beneficial for enhancing diversity and enables particles to explore the entire feature space.

For the elite particles, a local search operator based on the Gaussian distribution is utilized. It is able to improve the exploitation ability and the convergence speed of the algorithm.

The TLPSO is compared with seven well known meta-heuristic algorithm based FS methods. The experimental results show that TLPSO produces more accurate feature subsets in most of the cases and selects markedly fewer features than other approaches.

The rest of the paper is organized as follows. Section 2 introduces the background information and some related works. Section 3 describes the proposed TLPSO for FS. Section 4 presents extensive experiments to verify the effectiveness of the proposed approach. Finally, conclusions are drawn in Section 5.

2 Background information and related works

2.1 Particle swarm optimization

PSO is a population based optimization algorithm that imitates the social behavior of fish schooling or bird flocking [24]. Due to its simple and effective search mechanism, it has drawn a lot of attention. In the standard PSO, each particle is described by two vectors, namely, position and velocity. Each particle's position encodes a candidate solution to the problem, and the velocity determines the flying speed and direction of the particle. In the beginning, the particles' positions are randomly generated in the search space and the velocities are set to 0. The particles are evaluated with a fitness function. During the evolution process, each particle learns from the whole swarm's best experience and its own flying experience. For the ith particle in the swarm, its position is represented as $X_i = \{x_{i1}, x_{i2}, \cdots, x_{iD}\}$ and its velocity as $V_i = \{v_{i1}, v_{i2}, \cdots, v_{iD}\}$, where D is the dimension of the search space. PSO updates the velocity and position of each particle as follows:

$V_i(t+1) = w \times V_i(t) + c_1 \times r_1 \times (pbest_i - X_i(t)) + c_2 \times r_2 \times (gbest - X_i(t))$ (1)

$X_i(t+1) = X_i(t) + V_i(t+1)$ (2)

where $V_i(t)$ is the velocity of particle i at the tth iteration, $X_i(t)$ is its position vector, $pbest_i$ is its personal best position, and gbest is the best position found by the whole swarm. w is the inertia weight, which balances global search and local search. $c_1$ is the cognitive weight and $c_2$ is the social weight. $r_1$ and $r_2$ are two random numbers in [0,1].
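For concreteness, Equations (1) and (2) for a single particle can be sketched in plain Python as below. This is an illustrative sketch, not the authors' Matlab code: `pso_update` and `inertia_weight` are hypothetical names, $r_1$ and $r_2$ are drawn once per update exactly as Equation (1) is written (many implementations draw them per dimension instead), and the linearly decreasing inertia weight is only an assumed reading of the [0.9 0.4] range given for w in Table 2.

```python
import random


def pso_update(x, v, pbest, gbest, w, c1=2.0, c2=2.0, rng=random):
    """Standard PSO step for one particle, Equations (1) and (2).

    x, v, pbest and gbest are equal-length lists of floats.
    """
    # Assumption: r1 and r2 are scalars drawn once per update, as Eq. (1) is written;
    # many implementations draw them independently for every dimension.
    r1, r2 = rng.random(), rng.random()
    new_v = [w * vd + c1 * r1 * (pd - xd) + c2 * r2 * (gd - xd)
             for xd, vd, pd, gd in zip(x, v, pbest, gbest)]  # Eq. (1)
    new_x = [xd + nvd for xd, nvd in zip(x, new_v)]          # Eq. (2)
    return new_x, new_v


def inertia_weight(ite, maxite, w_start=0.9, w_end=0.4):
    """Hypothetical helper: inertia weight decreasing linearly from w_start to w_end."""
    return w_start - (w_start - w_end) * ite / maxite
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, while the default uses the module-level generator.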

2.2 Related works

Meta-heuristic algorithms have been widely used in various optimization problems. FS is a combinatorial optimization problem that aims to reduce the number of features while maintaining the classification accuracy. GA is considered the first meta-heuristic algorithm used to tackle FS problems. Chakraborty [25] proposed a FS method based on GA in which the feature subsets were evaluated with a fuzzy fitness function. Kabir et al. [9] proposed a new local search operator in GA to fine-tune its search ability on FS problems. Ma et al. [26] developed a tribe competition based GA for FS in which each tribe focused on exploring a specific part of the search space.

Hance et al. [14] proposed a FS method based on ABC with a novel neighborhood selection mechanism. Li et al. [27] proposed a DE based FS method in the filter framework and a novel mutation operator was introduced into DE. Zorarpaci et al. [28] combined ABC and DE to select optimal feature subsets. Kashef et al. [12] treated the features as the graph nodes which were fully connected to each other and ACO was used to select the nodes. Wan et al. [29] proposed a FS approach based on a modified binary coded ACO combined with GA. Shunmugapriya et al. [30] proposed a hybrid algorithm which combined the characteristics of ACO and ABC to select optimal feature subsets.

Research on employing meta-heuristics to tackle FS problems is ongoing, and many recent meta-heuristics have been applied to FS. Emary et al. [31] used two versions of the binary grey wolf optimizer (GWO) in the FS domain to maximize the classification accuracy and minimize the number of features. Abdel-Basset et al. [32] proposed a FS method based on GWO integrated with a two-phase mutation operator. Aladeemy et al. [33] employed the self-adaptive cohort intelligence (SACI) for FS and used three opposition based learning (OBL) strategies to enhance the search capacity of SACI. Faris et al. [34] introduced the salp swarm algorithm (SSA) to tackle FS problems and used a crossover operator to enhance the exploratory behavior of the algorithm. Eid [35] proposed a binary version of the Whale Optimization Algorithm (WOA) using the sigmoid function for FS. Hussien et al. [36] proposed two binary variants of WOA to find the best feature subset that contains the representative information of all the data. A hybrid algorithm based on SSA and chaos theory was proposed for FS in [37], in which the logistic map helps SSA find better feature subsets. Mafarja et al. [38] proposed two versions of the binary grasshopper optimization algorithm (GOA) to select optimal feature subsets in classification problems. Arora et al. [39] employed the butterfly optimization algorithm (BOA) to select optimal feature subsets. Since this paper mainly focuses on PSO based FS methods, more work on FS approaches based on other meta-heuristics can be found in [4, 40].

Various PSO based FS methods have been presented. Many works introduced novel operators into PSO to improve its performance on FS. Chuang et al. [41] proposed a catfish PSO for FS in which the worst particles in the swarm were replaced by catfish particles to improve the performance of PSO. Vieira et al. [42] proposed a modified PSO based FS method for mortality prediction of septic patients with several new operators, including local search and swarm best resetting. Xue et al. [43] proposed three new initialization strategies and three new pbest and gbest updating mechanisms in PSO to develop novel FS approaches. Moradi et al. [44] introduced a local search operator into the binary PSO to select a salient feature subset with less correlated features.

Some researchers combined PSO with other algorithms to enhance its performance. Nguyen et al. [45] introduced crossover and mutation operators into PSO to improve its search ability, and the proposed approach was used to select optimal feature subsets. Mistry et al. [46] proposed a FS method based on PSO embedded with a micro GA, and the FS method was used for facial emotion recognition. Other works have focused on the encoding scheme of PSO. Tran et al. [47] proposed a FS method called potential particle swarm optimization (PPSO), which employed a new representation scheme to reduce the search space. Engelbrecht et al. [48] utilized the set based PSO for FS. In the set based PSO, a particle's position and velocity are defined as mathematical sets. Tran et al. [11] proposed a variable-length PSO representation for FS, which was able to define a smaller search space and improve the performance of PSO.

Most of these algorithms retain the fully informed topology and the conventional strategy of selecting historical best positions as learning exemplars. To address premature convergence, some PSO variants with different learning strategies have been proposed. Qiu [49] proposed a FS method based on a multi-swarm PSO in which several small sub-swarms evolved independently and an elite learning strategy was used to promote information exchange among the sub-swarms. Kamyab et al. [50] utilized several local variants of PSO for FS, and these approaches improved the population diversity of PSO. Gu et al. [51] employed the competitive swarm optimizer (CSO) for FS. In CSO, two particles are compared with each other and the loser learns from the winner.

3 The proposed approach

Searching for the optimal feature subset is a challenging problem, especially in the wrapper framework. As the number of features increases, the FS problem becomes more and more difficult. Furthermore, FS can be considered a multi-modal problem in which a large number of local optimal solutions exist [52]. PSO has attracted increasing attention from the FS community due to its algorithmic simplicity and fast convergence speed. However, PSO still suffers from premature convergence on complex FS problems. In the classical PSO, particles move towards a single position under the guidance of gbest. The fast information exchange within the swarm leads to a quick loss of population diversity, so the FS method may easily stagnate in local optimal solutions. Therefore, it is important to preserve high population diversity to avoid local optimal solutions [53]. Besides, the optimizer also needs to locate optimal feature subsets within a limited time. This means that the optimizer in the FS model should keep a good balance between convergence speed and population diversity. The flowchart of the proposed TLPSO is shown in Fig. 1, and the details of the algorithm are introduced in the following subsections.

Fig. 1

Flowchart of the TLPSO.

3.1 Fitness function

In a wrapper based FS method, a classifier is used to calculate the classification accuracy of the feature subsets. Previous works have shown that using a relatively simple classification algorithm in a wrapper approach can yield a good (near-optimal) feature subset on complex classification problems [4]. Therefore, KNN (K-nearest neighbors) [54] is employed due to its simplicity and ease of implementation. KNN is a popular non-parametric method that classifies a new instance according to the labels of its K nearest training samples in the selected feature space. The main control parameter of KNN is the number of neighbors K. In this study, K is set to 5 to keep KNN efficient while reducing its sensitivity to noisy data.

FS aims to achieve higher classification accuracy with fewer features. Therefore, the fitness function considers both the classification accuracy and the number of selected features: $Fitness = \alpha \times E_R(F) + \beta \times \frac{num_{sel}}{D}$ (3) where $E_R(F)$ is the classification error rate of the corresponding feature subset F, $num_{sel}$ is the number of selected features, and D is the number of all features. α and β are two parameters that determine the relative importance of the error rate and the number of selected features. The fitness function is minimized, so lower values indicate better feature subsets. Since the main goal of FS is to improve the classification accuracy, α = 0.9 and β = 0.1 are used in this study [49].
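A minimal sketch of Equation (3) in Python is shown below. The function name `tlpso_fitness` is hypothetical, and the error rate is assumed to be computed elsewhere (for example by the cross-validated KNN described in Section 4.1); the two example subsets are made-up numbers chosen only to show the trade-off.

```python
def tlpso_fitness(error_rate, num_selected, num_features, alpha=0.9, beta=0.1):
    """Equation (3): alpha * E_R(F) + beta * num_sel / D (lower is better)."""
    if num_features <= 0:
        raise ValueError("num_features must be positive")
    return alpha * error_rate + beta * num_selected / num_features


# Two hypothetical subsets of a 30-feature dataset:
a = tlpso_fitness(0.10, 10, 30)  # 10% error, 10 features -> 0.09 + 0.0333...
b = tlpso_fitness(0.11, 3, 30)   # 11% error,  3 features -> 0.099 + 0.01
```

With α = 0.9 and β = 0.1, one extra percentage point of error costs 0.009, whereas dropping 7 of 30 features saves about 0.023, so the sparser subset `b` receives the better (lower) fitness.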

3.2 TLPSO

Particles in the swarm are naturally in different evolution states, and their potential for exploration and exploitation also differs. The main idea of TLPSO is to treat particles differently according to their evolution states. By using different learning strategies, the potential of different particles can be fully exploited. To promote population diversity, a random learning exemplar selection strategy is proposed instead of learning from historical best positions. A local search operator is employed to enhance the exploitation ability and improve the convergence speed. As a result, the algorithm is able to keep a balance between population diversity and convergence speed.

Suppose there are NP particles in the swarm. In each iteration, the particles are sorted in ascending order of their fitness values (i.e., from best to worst, since the fitness is minimized) and then grouped into three layers: elite, ordinary, and inferior. Each layer contains the same number of particles, i.e., NP/3. Because superior solutions are more likely to be found near the elite particles, the elite particles focus on local exploitation and try to further improve their fitness values. Besides, the elite particles, having better fitness values, are used to guide the search of the other particles. The inferior particles concentrate on global exploration and try to find more promising regions with high quality solutions. The ordinary particles keep a balance between exploration and exploitation. Figure 2 shows the main framework of the TLPSO.
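The sorting and layering step can be sketched as follows. The helper name `split_into_layers` is hypothetical, and so is the handling of a swarm size that is not divisible by 3 (the experiments use NP = 20): this sketch assumes that the leftover particles are assigned to the inferior layer.

```python
def split_into_layers(fitness):
    """Return (elite, ordinary, inferior) lists of particle indices.

    Particles are sorted in ascending order of fitness (best first, since
    Equation (3) is minimized). The first two layers get NP // 3 particles
    each; any remainder goes to the inferior layer (an assumption).
    """
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    size = len(fitness) // 3
    return order[:size], order[size:2 * size], order[2 * size:]


elite, ordinary, inferior = split_into_layers([0.5, 0.1, 0.9, 0.3, 0.7, 0.2])
print(elite, ordinary, inferior)  # [1, 5] [3, 0] [4, 2]
```

Sorting costs O(NP log NP) per iteration, which is the overhead discussed in Section 3.4.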

Fig. 2

General idea of TLPSO.

Based on the above analysis, the particles in different layers search for better positions in different ways. Particles in the inferior layer learn from the elite and ordinary particles, while the ordinary particles update their positions under the guidance of the elite particles. The particles in the ordinary and inferior layers update their velocities and positions as follows:

$V_{L_2}(t+1) = r_1 \times V_{L_2}(t) + r_2 \times (X_{L_{1,p}}(t) - X_{L_2}(t)) + \phi \times r_3 \times (X_{L_{1,q}}(t) - X_{L_2}(t))$ (4)

$V_{L_3}(t+1) = r_1 \times V_{L_3}(t) + r_2 \times (X_{L_1}(t) - X_{L_3}(t)) + \phi \times r_3 \times (X_{L_2}(t) - X_{L_3}(t))$ (5)

$X_{L_i}(t+1) = X_{L_i}(t) + V_{L_i}(t+1), \quad i = 2, 3$ (6)

where $V_{L_2}(t)$ and $V_{L_3}(t)$ represent the velocities of the ordinary and inferior particles, respectively. $r_1$, $r_2$, and $r_3$ are three random numbers in [0,1]. $\phi \in [0,1]$ is a parameter that controls the influence of the second learning exemplar; in this study, $\phi$ is set to 0.4. These parameter settings are suitable for the ordinary and inferior particles to search for better positions [18]. Equation (5) shows that each inferior particle is guided by two random particles, one chosen from each of the two better layers: $X_{L_1}(t)$ from the elite layer and $X_{L_2}(t)$ from the ordinary layer. These two superior particles guide the inferior particles to explore the entire search space and find potential regions of good solutions. Each ordinary particle randomly chooses two particles from the elite layer, as in Equation (4), where $X_{L_{1,p}}(t)$ has a better fitness value than $X_{L_{1,q}}(t)$.
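Equations (4) to (6) share one update form and differ only in how the two exemplars are chosen, so a sketch can use a single update function plus two exemplar pickers. All names here are hypothetical; the random numbers $r_1$, $r_2$, $r_3$ are assumed to be drawn once per particle (per-dimension draws are an equally plausible reading), and the clipping of positions to [0,1] from Section 3.3 is deliberately left out.

```python
import random


def update_lower_layer(x, v, ex1, ex2, phi=0.4, rng=random):
    """One velocity/position update following Equations (4)-(6).

    Ordinary particle (Eq. 4): ex1, ex2 are two elite particles, ex1 the fitter one.
    Inferior particle (Eq. 5): ex1 is an elite particle, ex2 an ordinary particle.
    """
    # Assumption: r1, r2, r3 are drawn once per particle, not once per dimension.
    r1, r2, r3 = rng.random(), rng.random(), rng.random()
    new_v = [r1 * vd + r2 * (ad - xd) + phi * r3 * (bd - xd)
             for xd, vd, ad, bd in zip(x, v, ex1, ex2)]
    new_x = [xd + nvd for xd, nvd in zip(x, new_v)]  # Eq. (6)
    return new_x, new_v


def ordinary_exemplars(elite, fitness, rng=random):
    """Two distinct random elite indices, returned as (p, q) with p at least as fit as q."""
    p, q = rng.sample(elite, 2)
    return (p, q) if fitness[p] <= fitness[q] else (q, p)


def inferior_exemplars(elite, ordinary, rng=random):
    """One random elite index and one random ordinary index."""
    return rng.choice(elite), rng.choice(ordinary)
```

Because the exemplars are drawn anew for every particle in every iteration, no single position dominates the search, which is the diversity argument made in the following paragraph.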

According to Equations (4) and (5), particles in the ordinary and inferior layers learn from better particles in the current swarm instead of from historical best positions. Therefore, all the superior particles are candidate learning exemplars. The randomness in the selection of learning exemplars can further improve population diversity, which is beneficial for complex FS problems. This kind of learning and imitating behavior realizes the concept of social learning [55].

Compared with the particles in the ordinary and inferior layers, the particles in the elite layer concentrate on local search. In some similar studies using a multi-layer structure, the superior particles entered the next iteration directly without updating their positions. In CSO [56], the winners entered the next iteration directly. In the level based learning swarm optimizer (LLSO) [23], the particles in the top level also went into the next iteration directly.

However, those superior particles possess valuable information about the global optimal solution and have strong local search abilities. According to the design scheme of TLPSO, the elite particles concentrate on local exploitation. Therefore, a local search operator based on the Gaussian sampling is performed on the elite particles to further improve their fitness values. For the elite particles in TLPSO, their positions are updated as follows: $X_{L_{1}} (t + 1) = N (\frac{X_{L_{1}} (t) + gbest}{2}, δ^{2})$ (7)

It can be seen from Equation (7) that each elite particle updates its position by sampling from a Gaussian distribution with mean $(X_{L_1}(t) + gbest)/2$ and variance $\delta^2$. Each elite particle therefore searches in the vicinity of the midpoint between itself and gbest. The standard deviation δ determines the search range around the midpoint, so an appropriate value of δ is important for the algorithm: with a larger δ, the elite particle searches a larger area. In this study, a time-decreasing δ is employed:

$\delta = 0.3 - 0.2 \times \frac{ite}{maxite}$ (8)

where ite is the current iteration and maxite is the maximum number of iterations.

According to Equation (8), the standard deviation decreases linearly from 0.3 to 0.1 during the evolution process. Figure 3 shows a simple 2-dimensional example in which points are randomly generated around (0,0) with standard deviations of 0.1 and 0.3. With a standard deviation of 0.3, the points cover a much larger area than with 0.1. By using the time-decreasing δ, the elite particles search relatively large areas initially and focus on more promising areas in the later stage of the algorithm.
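The elite update of Equations (7) and (8) can be sketched with the standard library's `random.gauss`. The function names are hypothetical, every dimension is assumed to be sampled independently with the same δ, and positions are not clipped here (Section 3.3 restricts them to [0,1]).

```python
import random


def sigma_schedule(ite, maxite):
    """Equation (8): delta decreases linearly from 0.3 towards 0.1 at ite = maxite."""
    return 0.3 - 0.2 * ite / maxite


def elite_local_search(x, gbest, ite, maxite, rng=random):
    """Equation (7): resample each dimension from N((x_d + gbest_d) / 2, delta^2)."""
    delta = sigma_schedule(ite, maxite)
    # Assumption: every dimension is sampled independently with the same delta.
    return [rng.gauss((xd + gd) / 2.0, delta) for xd, gd in zip(x, gbest)]


rng = random.Random(42)
early = elite_local_search([0.2, 0.9], [0.6, 0.7], ite=1, maxite=50, rng=rng)
late = elite_local_search([0.2, 0.9], [0.6, 0.7], ite=50, maxite=50, rng=rng)
```

Both calls sample around the midpoint (0.4, 0.8); the early call uses δ ≈ 0.296 and the late call δ = 0.1, which mirrors the wide-then-narrow search illustrated in Fig. 3.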

Fig. 3

A simple 2-dimensional illustration of different std.

This local search operator improves the local search ability and the convergence speed of the algorithm. Since the local search operator is applied only to the elite particles, it is unlikely to cause a loss of population diversity or premature convergence to a local optimum. Even if an elite particle falls into a local optimum after the local search operation, it will be reassigned to a lower layer once other particles achieve better fitness values, and it can then escape from the local optimum by learning from better particles.

The details of TLPSO are shown in Algorithm 1. Compared with the original PSO, TLPSO has several distinctive characteristics:

Instead of the fully connected topology, TLPSO divides the particles into three layers according to their fitness values and different learning strategies are adopted.

In most PSO variants, all the particles learn from the historical best positions which may stay the same for several iterations. In the TLPSO, the random exemplar selection strategy can preserve better population diversity and reduce the probability of falling into local optimal solutions.

The local search operator can further improve the exploitation ability of the elite particles. It can make good use of the elite particles and the gbest position.

The original PSO realizes the selection mechanism by updating each particle’s pbest. The TLPSO does not need to store the pbest positions and their corresponding fitness values. In the TLPSO, the selection is implicitly achieved in the sorting process.

Algorithm 1: Pseudocode of the TLPSO for FS

Input: NP: the number of particles; D: the number of all features; maxite: the maximum number of iterations.
Output: the optimal feature subset, its fitness value, and its classification accuracy on the test set.
Initialize the positions of the swarm randomly in [0,1] and set the velocities to 0;
Calculate the fitness values of the particles according to Equation (3);
Find the best particle in the swarm and store it as gbest, with its fitness value gbestfit;
ite = 1;
While ite <= maxite
    Sort all the particles in ascending order of their fitness values;
    Divide the particles into three layers: elite, ordinary, and inferior;
    For each elite particle
        Update its position according to Equation (7);
    End For
    For each ordinary particle
        Randomly choose two particles from the elite layer;
        Update its velocity and position according to Equations (4) and (6);
    End For
    For each inferior particle
        Randomly select one particle from each of the two higher layers;
        Update its velocity and position according to Equations (5) and (6);
    End For
    Calculate the fitness values of the swarm;
    Update gbest and gbestfit;
    ite = ite + 1;
End While
Build the classification model with the optimal feature subset;
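For readers who want to experiment, the loop of Algorithm 1 can be sketched in Python as below. This is a minimal, unofficial sketch (the paper's experiments were implemented in Matlab), and several details are assumptions made here: $r_1$, $r_2$, $r_3$ are drawn once per particle; leftover particles join the inferior layer when NP is not divisible by 3; exemplars are taken from a snapshot of the swarm at the start of each iteration; positions are clipped to [0,1] after every update. The function `toy_fitness` is a hypothetical stand-in for the KNN-based fitness of Equation (3).

```python
import random


def decode(x, threshold=0.5):
    """Binary mask: feature d is selected iff x_d > threshold (Section 3.3)."""
    return [1 if xd > threshold else 0 for xd in x]


def clip01(values):
    """Restrict position values to [0, 1]."""
    return [min(1.0, max(0.0, val)) for val in values]


def tlpso(fitness_fn, D, NP=21, maxite=50, phi=0.4, seed=0):
    """Sketch of Algorithm 1; fitness_fn maps a 0/1 mask to a value to minimize."""
    if NP < 6:
        raise ValueError("need at least two particles per layer")
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(D)] for _ in range(NP)]  # positions in [0, 1]
    V = [[0.0] * D for _ in range(NP)]                           # velocities start at 0
    fit = [fitness_fn(decode(x)) for x in X]                     # Equation (3)
    best = min(range(NP), key=fit.__getitem__)
    gbest, gbestfit = X[best][:], fit[best]
    size = NP // 3  # assumption: leftover particles join the inferior layer

    def move(i, old, a, c):
        """Equations (4)-(6) for one ordinary or inferior particle."""
        r1, r2, r3 = rng.random(), rng.random(), rng.random()
        V[i] = [r1 * vd + r2 * (ad - xd) + phi * r3 * (cd - xd)
                for xd, vd, ad, cd in zip(old[i], V[i], a, c)]
        X[i] = clip01([xd + vd for xd, vd in zip(old[i], V[i])])

    for ite in range(1, maxite + 1):
        order = sorted(range(NP), key=fit.__getitem__)  # ascending: best first
        elite, ordinary, inferior = order[:size], order[size:2 * size], order[2 * size:]
        old = [x[:] for x in X]  # snapshot: exemplars come from the current swarm
        sigma = 0.3 - 0.2 * ite / maxite  # Equation (8)
        for i in elite:  # Equation (7): Gaussian sampling around (x + gbest) / 2
            X[i] = clip01([rng.gauss((xd + gd) / 2.0, sigma)
                           for xd, gd in zip(old[i], gbest)])
        for i in ordinary:  # two elite exemplars, the fitter one first
            p, q = sorted(rng.sample(elite, 2), key=fit.__getitem__)
            move(i, old, old[p], old[q])
        for i in inferior:  # one elite and one ordinary exemplar
            move(i, old, old[rng.choice(elite)], old[rng.choice(ordinary)])
        fit = [fitness_fn(decode(x)) for x in X]
        best = min(range(NP), key=fit.__getitem__)
        if fit[best] < gbestfit:  # keep the best solution found so far
            gbest, gbestfit = X[best][:], fit[best]
    return decode(gbest), gbestfit


def toy_fitness(mask, relevant=(0, 1, 2, 3, 4)):
    """Hypothetical stand-in for Eq. (3): 'error' is the fraction of missed relevant features."""
    missed = sum(1 for d in relevant if mask[d] == 0)
    return 0.9 * missed / len(relevant) + 0.1 * sum(mask) / len(mask)


if __name__ == "__main__":
    mask, value = tlpso(toy_fitness, D=20, NP=21, maxite=30, seed=1)
    print(mask, round(value, 4))
```

Note that TLPSO keeps no pbest archive: only the current swarm, its fitness values, and gbest are stored, which matches the space argument in Section 3.4.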

3.3 TLPSO for FS

PSO was originally designed for continuous optimization problems: the velocity and the position in the standard PSO are both continuous. However, in FS problems, each position vector represents a candidate feature subset, and each dimension decides whether the corresponding feature is selected (1) or not (0). Therefore, a transformation function is needed to map the continuous search space to a binary one. Many previous PSO based FS methods employed the sigmoid transformation function [57]. Recently, however, several studies have indicated that continuous PSO with a simple decoding scheme performs better on FS problems [13, 49].

In this study, the position of each particle i is denoted as $X_i = \{x_{i1}, x_{i2}, \cdots, x_{iD}\}$, in which each dimension is a real number in [0,1] and corresponds to an original feature. A threshold, set to 0.5 in this study, decides whether a feature is selected: if $x_{id} > 0.5$, the dth feature is included in the feature subset; otherwise, it is not. In this way, TLPSO can be used to select feature subsets. Initially, the positions are randomly generated in [0,1] and the velocities are set to 0. During the search process, the positions are restricted to [0,1].
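The decoding scheme amounts to two small operations, sketched here in Python. `clip_position` and `decode_position` are hypothetical helper names; clamping is one straightforward way to restrict positions to [0,1], and the paper does not state which restriction mechanism it uses.

```python
def clip_position(x, low=0.0, high=1.0):
    """Restrict every dimension of a position vector to [low, high]."""
    return [min(high, max(low, xd)) for xd in x]


def decode_position(x, threshold=0.5):
    """Indices of selected features: feature d is selected iff x[d] > threshold."""
    return [d for d, xd in enumerate(x) if xd > threshold]


# Example: a position after a velocity update that left the [0, 1] range.
position = clip_position([1.2, -0.1, 0.6, 0.5])
print(position)                   # [1.0, 0.0, 0.6, 0.5]
print(decode_position(position))  # [0, 2]
```

A value exactly at the threshold (0.5 above) is not selected, because the rule uses a strict inequality.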

3.4 Time complexity analysis

Given the maximum number of iterations, the time complexity of a meta-heuristic is analyzed by counting the extra operations in each iteration, excluding the cost of fitness evaluation, which is problem dependent. From Algorithm 1, the proposed TLPSO takes O(NP × log(NP) + NP) to rank the swarm and divide it into three layers in each iteration. Updating the particles' positions and velocities takes O(NP × D), the same as in the classical PSO. Overall, TLPSO takes an extra O(NP × log(NP) + NP) per iteration compared with PSO. As for the space complexity, TLPSO does not need to store the pbest positions and their fitness values, which take O(NP × D) space in PSO. Therefore, TLPSO requires less memory than PSO, although both remain O(NP × D) overall.

4 Experimental results and analysis

4.1 Dataset

To examine the effectiveness of the proposed TLPSO, 16 UCI datasets and 2 high-dimensional microarray datasets are used in the experiments. These diverse datasets represent a wide variety of real-world classification problems and have been used in many FS studies. Table 1 gives a brief description of the datasets.

Table 1
Datasets

Dataset	#Features	#Instances	#Classes
Breastcancer	9	699	2
Glass	10	214	2
Heart	13	270	2
Wine	13	178	3
Australia	14	690	2
Zoo	16	101	6
Congress	16	435	2
Lymphography	18	148	3
Parkinson	22	195	2
Spect	22	267	2
WDBC	30	569	2
Ionosphere	34	351	2
Sonar	60	208	2
Musk1	166	476	2
Arrhythmia	279	452	16
LSVT	309	126	2
Colon	2000	62	2
Leukemia	7129	72	2

For each dataset, 70% of the instances are randomly chosen for training and the remaining 30% are used as the test set. During the training process, each feature subset is evaluated by 10-fold cross-validation with KNN. The training set is divided into 10 folds of equal size; each fold in turn is used to calculate the error rate while the other nine folds are used for training. The mean error rate over the 10 folds is taken as the classification error rate of the feature subset. Note that the cross-validation is an inner loop on the training data: the test set is kept hidden from the cross-validation process and is used only for the final evaluation. Since the Colon and Leukemia datasets contain relatively few instances, 5-fold cross-validation is used on these two datasets.
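The inner evaluation loop (KNN with K = 5 inside a k-fold cross-validation on the training data) can be sketched in plain Python. This is an illustrative sketch under assumptions the paper does not state: Euclidean distance, unstratified folds formed by shuffling the training indices, vote ties resolved in favor of the label whose first occurrence is nearest, and an error of 1.0 for an empty subset. `knn_predict` and `cv_error` are hypothetical helper names.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cv_error(X, y, selected, folds=10, k=5, seed=0):
    """Mean k-NN error rate over `folds`-fold cross-validation using only the `selected` features."""
    if not selected:
        return 1.0  # hypothetical convention for an empty feature subset
    rows = [[row[j] for j in selected] for row in X]
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    fold_errors = []
    for f in range(folds):
        test = idx[f::folds]  # near-equal folds (not stratified)
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        train_X = [rows[i] for i in train]
        train_y = [y[i] for i in train]
        wrong = sum(knn_predict(train_X, train_y, rows[t], k) != y[t] for t in test)
        fold_errors.append(wrong / len(test))
    return sum(fold_errors) / folds
```

The value returned by `cv_error` plays the role of $E_R(F)$ in Equation (3); for Colon and Leukemia, `folds=5` would be passed instead. Only the training portion of a dataset should be passed in, so that the test set stays hidden as described above.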

4.2 Comparative algorithms and parameter settings

The proposed TLPSO is compared with the original PSO [57] and six other state-of-the-art meta-heuristic based FS methods: GA [58], SSA [34], GWO [31], CSO [51], BOA [39], and WOA [59]. For all the approaches, the population size is set to 20 and the maximum number of iterations to 50. Other parameters are shown in Table 2. To obtain statistically meaningful results, each approach is run 30 times independently on each dataset. All the experiments are implemented in Matlab on the Windows operating system using a desktop computer with an Intel(R) Core(TM) i5-6500 CPU at 3.2 GHz and 8.00 GB of RAM.

Table 2
The parameter settings

Algorithm	Parameter	Value
PSO	c₁	2
	c₂	2
	w	[0.9 0.4]
GA	Crossover	0.8
	Mutation	0.1
SSA	Number of leaders	1
GWO	a	[2 0]
CSO	Social factor	0.1
BOA	a	0.1
BOA	c	[0.01 0.25]
WOA	a	[2 0]

4.3 Comparison with other meta-heuristic based approaches

This subsection compares the performance of TLPSO with the original PSO and six other well-known meta-heuristic based FS methods. The algorithms are compared quantitatively using the following metrics: the mean and standard deviation of the classification accuracy, the average number of selected features, the average fitness value on the training set, and the average computational time.

Table 3 outlines the means and standard deviations of the classification accuracies over the 30 independent runs. TLPSO achieves the highest mean classification accuracy in 10 out of 18 datasets, ranks 2nd in 5 datasets, and ranks 3rd in 3 datasets; in these 8 datasets, the gaps between TLPSO and the leaders are small. SSA outperforms the other methods in 3 datasets, followed by GWO, which produces the best classification accuracy in 2 datasets. PSO does not outperform the other methods in any dataset. In the five high-dimensional datasets (more than 100 features), TLPSO outperforms the other methods in 4. In terms of the mean rank over all 18 datasets, TLPSO ranks 1st with a value of 1.61, while CSO and BOA tie for 2nd with a value of 4.11. These results clearly demonstrate the strength of TLPSO in obtaining accurate feature subsets.
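The mean ranks can be reproduced by ranking the eight methods within each dataset and averaging over datasets. The sketch below assumes that tied methods share the best rank (e.g. 1, 2, 2, 4); with this convention, the accuracies in Table 3 give TLPSO a rank sum of 29 over the 18 datasets (29/18 ≈ 1.61), consistent with the reported value. The function names are hypothetical, and only two rows of Table 3 are used here as a worked example.

```python
import numpy as np

METHODS = ["PSO", "GA", "SSA", "GWO", "CSO", "BOA", "WOA", "TLPSO"]


def competition_ranks(scores, higher_is_better=True):
    """Rank = 1 + number of methods strictly better; tied methods share the best rank."""
    s = np.asarray(scores, dtype=float)
    if not higher_is_better:
        s = -s
    return np.array([1 + np.sum(s > value) for value in s])


def mean_ranks(table, higher_is_better=True):
    """Average the per-dataset ranks of each method (one row per dataset)."""
    ranks = np.array([competition_ranks(row, higher_is_better) for row in table])
    return ranks.mean(axis=0)


# Mean accuracies of two datasets from Table 3
rows = [
    [0.9727, 0.9662, 0.9719, 0.9667, 0.9681, 0.9667, 0.9705, 0.9738],  # Breastcancer
    [0.917, 0.9313, 0.9336, 0.9305, 0.9313, 0.9313, 0.9313, 0.9313],   # Congress
]
print(dict(zip(METHODS, mean_ranks(rows).tolist())))
```

For Table 4, where fewer features is better, the same helper would be called with `higher_is_better=False`.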

Table 3
Means and standard deviations of classification accuracies in each dataset

Dataset	PSO	GA	SSA	GWO	CSO	BOA	WOA	TLPSO
Breastcancer	0.9727	0.9662	0.9719	0.9667	0.9681	0.9667	0.9705	0.9738
	0.0093	0.0015	0.0076	0	0.0045	0	0.008	0.0075
Glass	0.6636	0.6415	0.6846	0.6338	0.6354	0.6354	0.6538	0.6815
	0.0707	0.0193	0.0529	0.0107	0.0299	0.0146	0.0477	0.058
Heart	0.8296	0.821	0.8272	0.8339	0.8247	0.8358	0.8173	0.8438
	0.0152	0.0298	0.0273	0.0319	0.0312	0.0377	0.0278	0.0193
Wine	0.9648	0.963	0.95	0.963	0.9704	0.9667	0.9463	0.9769
	0.0184	0.0195	0.0248	0.0147	0.0156	0.017	0.0238	0.0171
Australia	0.8359	0.8389	0.8264	0.826	0.8385	0.8351	0.8442	0.8435
	0.017	0.0164	0.0193	0.0232	0.0189	0.0219	0.0145	0.0175
Zoo	0.9204	0.9339	0.9226	0.9516	0.9355	0.9452	0.8903	0.9419
	0.0269	0.0224	0.0224	0.0222	0.0263	0.0306	0.0531	0.0269
Congress	0.917	0.9313	0.9336	0.9305	0.9313	0.9313	0.9313	0.9313
	0.0204	0	0.0072	0.0111	0	0.0036	0	0
Lymphography	0.7378	0.7667	0.7289	0.7689	0.7422	0.7822	0.7644	0.7755
	0.0266	0.0395	0.0542	0.0335	0.0459	0.0403	0.0447	0.042
Parkinson	0.8881	0.8881	0.9068	0.8898	0.9017	0.8983	0.8949	0.9051
	0.0164	0.0214	0.028	0.0268	0.0107	0.0309	0.0274	0.0272
Spect	0.7877	0.7778	0.7889	0.7926	0.7802	0.784	0.7926	0.7901
	0.0152	0.0267	0.0288	0.0113	0.0278	0.0145	0.014	0.0234
WDBC	0.9587	0.9602	0.9339	0.955	0.9585	0.9602	0.9591	0.9608
	0.0126	0.0132	0.0266	0.0163	0.0118	0.0106	0.0143	0.01
Ionosphere	0.8226	0.8575	0.8236	0.8316	0.8613	0.8264	0.8453	0.8698
	0.0198	0.0253	0.0295	0.033	0.0184	0.0205	0.0173	0.0203
Sonar	0.7926	0.8032	0.773	0.7992	0.8016	0.7905	0.7921	0.8159
	0.0326	0.0358	0.0389	0.0283	0.0431	0.0316	0.0264	0.0288
Musk1	0.82	0.8517	0.8042	0.8287	0.849	0.8182	0.8126	0.8434
	0.0273	0.0296	0.0151	0.0219	0.0428	0.0255	0.0221	0.0284
Arrhythmia	0.6691	0.6632	0.6669	0.6507	0.6713	0.6596	0.6551	0.6801
	0.0127	0.0133	0.0134	0.0241	0.0219	0.018	0.0231	0.014
LSVT	0.7316	0.7579	0.7553	0.7066	0.7368	0.7579	0.7553	0.7842
	0.0461	0.0666	0.0412	0.0477	0.0411	0.0688	0.0448	0.0659
Colon	0.7579	0.7526	0.7579	0.7684	0.7737	0.7632	0.7842	0.7921
	0.0368	0.0499	0.0444	0.0272	0.0254	0.0372	0.0461	0.0361
Leukemia	0.7045	0.6864	0.6591	0.6205	0.6727	0.7109	0.6682	0.7182
	0.1011	0.0725	0.085	0.0473	0.0852	0.0813	0.0431	0.0469
Mean rank	5.22	4.5	5.28	5.39	4.11	4.11	4.89	1.61
Final rank	6	4	7	8	2	2	5	1

Table 4 reports the average number of features selected by the eight algorithms in each dataset. According to Table 4, all algorithms substantially reduce the number of features. TLPSO selects the fewest features (including ties) in 14 datasets. Moreover, its advantage over the other methods is particularly large in the high-dimensional datasets. For example, in the LSVT dataset TLPSO selects 76.35 features on average, while the second best, CSO, selects 135.9. Note that TLPSO also achieves the best classification accuracy in the LSVT dataset.
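The size of this advantage can be expressed as the fraction of features TLPSO saves relative to the competitor that selects the fewest features. The helper below is a hypothetical name that simply applies this arithmetic to three rows of Table 4.

```python
def reduction_vs_best_competitor(ours, competitors):
    """Fraction of features saved relative to the competitor that selects the fewest features."""
    return 1.0 - ours / min(competitors)


# Average numbers of selected features from Table 4:
# (PSO, GA, SSA, GWO, CSO, BOA, WOA) and TLPSO
table4 = {
    "LSVT": ([143.7, 149.9, 149, 195.65, 135.9, 143.3, 153.9], 76.35),
    "Colon": ([963.7, 969.6, 959.9, 1244.4, 886.9, 939.5, 936.7], 369.05),
    "Leukemia": ([3485.4, 3434, 3469.5, 3660.8, 3186.6, 3416.9, 2550.9], 1123.7),
}
for name, (others, ours) in table4.items():
    print(f"{name}: {reduction_vs_best_competitor(ours, others):.1%} fewer features")
```

This gives roughly 44% for LSVT, 58% for Colon and 56% for Leukemia, i.e. in Colon and Leukemia TLPSO selects fewer than half as many features as the best competing method.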

Table 4

Average number of selected features in each dataset

Dataset	PSO	GA	SSA	GWO	CSO	BOA	WOA	TLPSO
Breastcancer	2.73	2	2.7	2	2.1	2.1	2.2	2.6
Glass	2.87	2.9	3.4	2.95	3	3	2.7	2.85
Heart	3.9	4.5	4.8	5.55	4.7	4.5	4.3	3.9
Wine	5.1	5.1	5.2	5.5	4.4	5.4	4.6	4.35
Australia	5.27	4	4.3	4.7	3.8	3.5	3	3.6
Zoo	7.73	7.4	8	8.95	8.2	7.7	8.3	7.15
Congress	5.53	2	4.2	3.8	2	3.1	2	2
Lymphography	6.3	6.8	6.9	8.8	7.3	7.1	9.4	5.8
Parkinson	7.9	6.9	10	10.3	6.8	9.1	9.3	5.7
Spect	9.9	5.7	8.6	11.5	6.6	10.1	11.1	5.95
WDBC	12.2	7.5	10.1	11	6.6	10	8.9	6.6
Ionosphere	7.9	6.9	12.8	13.25	7	11.1	9.7	5.4
Sonar	23.2	24.45	26.5	33.15	22.2	26.8	25.3	17.05
Musk1	78.47	74.1	77.2	94.7	68.8	75.3	71.1	47.8
Arrhythmia	124.4	127.6	130.8	165.5	119.4	129.8	120.3	67.2
LSVT	143.7	149.9	149	195.65	135.9	143.3	153.9	76.35
Colon	963.7	969.6	959.9	1244.4	886.9	939.5	936.7	369.05
Leukemia	3485.4	3434	3469.5	3660.8	3186.6	3416.9	2550.9	1123.7
Mean rank	5	3.61	6.11	7.11	3.17	4.72	4.35	1.39
Final rank	6	3	7	8	2	5	4	1

Combining the results of Tables 3 and 4, we can conclude that TLPSO achieves similar or even better classification accuracy with far fewer features than the other methods. For example, in the Musk1 dataset, GA achieves the best classification accuracy of 0.8517 with 74.1 features on average, while TLPSO obtains a mean accuracy of 0.8434 with only 47.8 features on average. In summary, these results suggest that TLPSO can effectively improve the average classification performance while reducing the number of features.

To test whether the differences between the classification accuracies of TLPSO and the other approaches are statistically significant, the Wilcoxon rank-sum test is performed at a significance level of 0.05. If the p-value is less than 0.05, the null hypothesis is rejected, meaning that there is a significant difference between the two approaches. Table 5 shows the results of the Wilcoxon rank-sum test, where ‘+’ or ‘–’ means that the classification performance of TLPSO is significantly better or worse than that of the comparative approach, and ‘=’ means that there is no significant difference between them. TLPSO achieves better or similar performance in all datasets compared with the other seven methods. Compared with PSO, GA, SSA, GWO, CSO, BOA, and WOA, TLPSO achieves significantly better classification performance in 12, 7, 9, 11, 7, 8, and 9 datasets, respectively.
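A minimal sketch of how each entry of Table 5 could be obtained from the 30 accuracies of two methods. It implements the two-sided Wilcoxon rank-sum test with the normal approximation and average ranks for ties (no tie correction of the variance), which should agree with the default behaviour of SciPy's `scipy.stats.ranksums`. The function names are hypothetical, and deciding between ‘+’ and ‘–’ by comparing the mean accuracies is our assumption.

```python
import math

import numpy as np


def average_ranks(a):
    """1-based ranks of a; tied values receive the average of their positions."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation); returns (z, p)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    ranks = average_ranks(np.concatenate([x, y]))
    w = ranks[:n1].sum()  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


def significance_label(acc_tlpso, acc_other, alpha=0.05):
    """'+' / '-' if TLPSO is significantly better / worse, '=' otherwise."""
    _, p = rank_sum_test(acc_tlpso, acc_other)
    if p >= alpha:
        return "="
    return "+" if np.mean(acc_tlpso) > np.mean(acc_other) else "-"
```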

Table 5

The results of the Wilcoxon rank-sum test

Dataset	PSO	GA	SSA	GWO	CSO	BOA	WOA
Breastcancer	=	+	=	+	+	+	=
Glass	=	+	=	+	+	+	+
Heart	+	+	+	=	+	=	+
Wine	+	+	+	+	=	+	+
Australia	=	=	+	+	+	+	=
Zoo	+	=	=	=	=	=	+
Congress	+	=	=	=	=	=	=
Lymphography	+	=	+	=	+	=	=
Parkinson	+	+	=	+	=	=	=
Spect	=	=	=	=	=	=	=
WDBC	+	=	+	+	+	=	+
Ionosphere	+	+	+	+	=	+	+
Sonar	+	=	+	=	=	+	+
Musk1	+	=	+	+	=	+	+
Arrhythmia	=	=	=	+	=	+	+
LSVT	+	=	=	+	+	=	=
Colon	+	+	=	=	=	=	=
Leukemia	=	=	+	+	=	=	=

Table 6 outlines the computational time of TLPSO and the other seven FS methods. The running times of the eight algorithms on the low-dimensional datasets are close. TLPSO has the shortest computational time on all five high-dimensional datasets. This is because TLPSO selects much smaller feature subsets on these datasets, which makes fitness evaluation more computationally efficient. Compared with the original PSO, although TLPSO needs to sort the particles in each iteration, it still takes less time than PSO on the majority of the datasets (13 of 18) because it selects fewer features. Overall, the proposed TLPSO improves the performance of PSO on FS problems without adding computational burden.
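The extra per-iteration work mentioned above is essentially one sort of the swarm by fitness. The sketch below shows such a split into elite, ordinary and inferior layers; the function name, the layer sizes and the convention that lower fitness is better are assumptions for illustration, since the layer sizes are not specified in this section. Sorting 20 particles costs O(N log N) comparisons, which is negligible next to the KNN cross-validation needed to evaluate the fitness of each particle.

```python
import numpy as np


def split_into_layers(fitness, n_elite, n_ordinary):
    """Return particle indices of the elite, ordinary and inferior layers.

    Assumes lower fitness is better; layer sizes are hypothetical inputs.
    """
    order = np.argsort(fitness, kind="stable")  # the O(N log N) sorting step
    elite = order[:n_elite]
    ordinary = order[n_elite:n_elite + n_ordinary]
    inferior = order[n_elite + n_ordinary:]
    return elite, ordinary, inferior


fitness = np.array([0.30, 0.10, 0.50, 0.20, 0.40, 0.60])
elite, ordinary, inferior = split_into_layers(fitness, n_elite=2, n_ordinary=2)
```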

Table 6

Average computational time

Dataset	PSO	GA	SSA	GWO	CSO	BOA	WOA	TLPSO
Breastcancer	43.59	45.35	49.31	47.64	54.54	55.99	42.98	48.2
Glass	7.47	4.28	4.96	7.82	4.32	4.39	4.08	4.49
Heart	13.62	11.35	10.66	13.59	10.42	13.49	8.31	9.98
Wine	7.56	4.86	5.55	7.01	4.5	6.31	3.95	3.95
Australia	34.63	53.31	47.18	31.42	47.38	34.69	34.63	47.42
Zoo	2.03	1.9	2.071	3.05	2.06	2.15	2.55	1.86
Congress	20.3	16.09	24.93	20.8	14.48	25.45	10.46	18.96
Lymphography	5.22	4.2677	5.15	5.88	5.21	5.76	5.81	3.88
Parkinson	9.05	9.9	10.3	11.98	8.97	9.78	11.44	7.41
Spect	16.17	12.16	16.49	13.3	11.33	16.52	13.04	12.59
WDBC	28.58	26.84	25.95	28.59	28.72	25.22	26.25	31.24
Ionosphere	12.83	21.6	12.32	12.77	19.13	11.47	11.8	21.41
Sonar	6.56	6.12	6.29	6.94	6.32	6.12	6.43	9.54
Musk1	49.3	42.42	46.7	51.54	46.2	43.11	48.93	37.98
Arrhythmia	57.76	58.83	59.99	72.25	62.74	60.23	85.88	52.64
LSVT	6.5	6.5	6.82	7.77	6.57	6.63	7.73	5.9
Colon	10.5	10.24	11.41	14.66	10.02	10.19	13.94	6.7
Leukemia	88.23	103.68	88.23	110.21	81.2	84.81	113.43	47.57
Mean rank	4.83	3.72	4.83	6.5	3.89	4.61	4.39	3.22
Final rank	6	2	6	8	3	5	4	1

Taken together, the comparison with other state-of-the-art algorithms shows that TLPSO offers the following benefits:

TLPSO provides the highest classification accuracy in most cases and shows stable performance across all 18 datasets.

TLPSO selects far fewer features than the other methods in high-dimensional datasets, which is crucial for feature selection problems.

Although TLPSO employs a novel structure and hybrid learning strategies, it remains computationally efficient compared with the other methods, and it takes less time than the other methods in high-dimensional datasets.

The good performance of TLPSO can be attributed to the following reasons:

The multi-layer structure of TLPSO is beneficial for preserving population diversity.

Different learning strategies are used to fully exploit the potential of particles in different layers.

The local search operator with a time-decreasing standard deviation improves the exploitation ability and accelerates the convergence of the algorithm.

TLPSO keeps a balance between global exploration and local exploitation owing to its hybrid learning strategy.

4.4 Analysis on the convergence curves

Figure 4 shows the convergence curves of TLPSO and the original PSO on four representative datasets (Glass, Ionosphere, Musk1, and Colon), which cover low-, medium-, and high-dimensional datasets. Each curve plots the fitness value averaged over the 30 independent runs. Figure 4 shows the convergence characteristics of the two algorithms and whether either algorithm falls into locally optimal solutions.
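Averaged curves of this kind can be produced along the following lines: for every run, take the best fitness found so far at each iteration, then average these records across runs. The sketch assumes lower fitness is better and that `fitness_history` holds the best fitness of the swarm at each iteration; both the function name and the array layout are hypothetical.

```python
import numpy as np


def mean_convergence_curve(fitness_history):
    """Average best-so-far fitness over runs.

    fitness_history: array of shape (runs, iterations) with the swarm's best
    fitness at each iteration; lower is assumed to be better.
    """
    history = np.asarray(fitness_history, dtype=float)
    best_so_far = np.minimum.accumulate(history, axis=1)
    return best_so_far.mean(axis=0)


# e.g. 30 runs x 50 iterations of recorded fitness values (random placeholders)
curve = mean_convergence_curve(np.random.default_rng(0).random((30, 50)))
```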

Fig. 4

Convergence curves of TLPSO and PSO for the Glass, Ionosphere, Musk1, and Colon datasets.

TLPSO outperforms PSO in terms of fitness value in three of the four cases, especially on the high-dimensional datasets, where PSO is likely to become trapped in locally optimal feature subsets. For example, in the Musk1 dataset, the convergence of PSO slows down at around the 16th iteration and it hardly improves the quality of the feature subsets afterwards, whereas TLPSO keeps finding better solutions throughout the evolution process. These results indicate that TLPSO can jump out of locally optimal solutions and explore the search space more thoroughly.

4.5 Discussion of the local search operator and the parameter δ

This subsection examines the impact of the local search operator and of the parameter δ. In TLPSO, the elite particles update their positions using a Gaussian distribution whose standard deviation δ decreases over time. To verify the effectiveness of the local search operator and to examine the effect of δ, four variants are compared: TLPSO1 (δ decreasing over time from 0.3 to 0.1, i.e. δ = [0.3 0.1]), TLPSO2 (δ = 0.3), TLPSO3 (δ = 0.1), and TLPSO4 (without the local search operator). Four representative datasets (Wine, Ionosphere, Sonar, and LSVT) are chosen for these experiments since they cover low-, medium-, and high-dimensional datasets. Table 7 presents the means and standard deviations of the classification accuracy and the average numbers of selected features.
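A minimal sketch of a Gaussian local search with a time-decreasing δ, as in TLPSO1. Assumptions: δ decreases linearly from 0.3 to 0.1 over the run, the elite particle's position is a continuous vector in [0, 1] that is perturbed dimension-wise with N(0, δ²) noise and clipped, and the rule for accepting the perturbed candidate is not shown. All function names are hypothetical. Setting `delta_start == delta_end` gives the fixed-δ variants TLPSO2 (0.3) and TLPSO3 (0.1).

```python
import numpy as np


def delta_schedule(t, max_iter, delta_start=0.3, delta_end=0.1):
    """Standard deviation decreasing linearly from delta_start to delta_end (assumed linear)."""
    return delta_start - (delta_start - delta_end) * t / max_iter


def gaussian_local_search(position, delta, rng):
    """Perturb a continuous position in [0, 1]^D with N(0, delta^2) noise (hypothetical encoding)."""
    candidate = position + rng.normal(0.0, delta, size=position.shape)
    return np.clip(candidate, 0.0, 1.0)


rng = np.random.default_rng(0)
elite_position = rng.random(10)  # hypothetical elite particle with 10 features
candidate = gaussian_local_search(elite_position, delta_schedule(t=10, max_iter=50), rng)
```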

Table 7
Comparison of different settings of the standard deviation δ

Variant	Mean accuracy	Std.	No. of features
Wine
TLPSO1	0.9769	0.0171	4.35
TLPSO2	0.9648	0.0216	4.2
TLPSO3	0.9685	0.02	5
TLPSO4	0.9639	0.0212	4.4
Ionosphere
TLPSO1	0.8698	0.0203	5.4
TLPSO2	0.8523	0.0276	4.85
TLPSO3	0.8594	0.0224	7
TLPSO4	0.8509	0.0193	7.9
Sonar
TLPSO1	0.8159	0.0228	17.05
TLPSO2	0.7989	0.0539	15.2
TLPSO3	0.8085	0.0372	21.27
TLPSO4	0.8	0.0388	25.07
LSVT
TLPSO1	0.7842	0.0659	76.35
TLPSO2	0.7948	0.0802	73.55
TLPSO3	0.7711	0.0527	109.3
TLPSO4	0.7632	0.043	124.7

TLPSO1 achieves the highest classification accuracy in three datasets and is second only to TLPSO2 in the LSVT dataset, where it also has a smaller standard deviation than TLPSO2. TLPSO2 shows the largest standard deviation in all four datasets, which means its performance is less stable than that of the other variants. TLPSO3 ranks 2nd in three datasets and 3rd in the LSVT dataset. TLPSO4 places 4th in three datasets and 3rd in the Sonar dataset. These results indicate that the local search operator helps improve classification accuracy, but that a large value of δ may lead to unstable performance. In terms of the number of features, TLPSO2 selects the fewest features in all four datasets, TLPSO1 places 2nd in all four with only small gaps to TLPSO2, and TLPSO4 selects more features than the other variants in three of the four datasets.

The experimental results confirm the effectiveness of the local search operator in improving classification accuracy and reducing the number of features. The time-decreasing δ achieves stable classification performance while substantially reducing the number of features.

5 Conclusions

In this research, a novel three layer PSO (TLPSO) is proposed for solving FS problems. In each iteration, the particles in the swarm are grouped into three layers according to their fitness values. Since different particles possess different exploration and exploitation abilities, different learning strategies are employed for particles in different layers. For the ordinary and inferior particles, a random learning exemplar selection strategy is used to promote population diversity. A local search operator based on the Gaussian distribution is employed to improve the exploitation ability of the elite particles. In this way, TLPSO keeps a balance between population diversity and convergence speed. The proposed approach is applied to FS problems in the wrapper framework. Seven state-of-the-art meta-heuristic based FS methods are compared with TLPSO on 18 datasets to assess the efficiency and effectiveness of the proposed approach. TLPSO produces the highest mean classification accuracy in 10 out of 18 datasets. In the Colon and Leukemia datasets, TLPSO selects fewer than half as many features as the best competing method. The convergence curves support the ability of TLPSO to jump out of locally optimal solutions and search for better solutions in the entire feature space. Based on these results and analyses, it can be concluded that the proposed TLPSO is an effective tool for solving FS problems.

For future research, we will study how to set the number of particles in each layer adaptively. Furthermore, it would be interesting to hybridize TLPSO with other meta-heuristic techniques to improve its searching ability.


Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant No. 61802207 and the NUPTSF under Grant No. NY220027.

References

1. , Li and Liu, Recent advances in feature selection and its applications, Knowledge and Information Systems 53(3) (2017), 551–577.

2. Dash and Liu, Feature selection for classification, Intelligent Data Analysis 1(3) (1997), 131–156.

3. Liu and Yu, Toward integrating feature selection algorithms for classification and clustering, IEEE Transactions on Knowledge and Data Engineering 17(4) (2005), 491–502.

4. Xue, Zhang, Browne, et al., A survey on evolutionary computation approaches to feature selection, IEEE Transactions on Evolutionary Computation 20(4) (2016), 606–626.

5. Liu, Wang, Zhao, et al., A new feature selection method based on a validity index of feature subset, Pattern Recognition Letters 92 (2017), 1–8.

6. Senawi, Wei and Billings S.A., A new maximum relevance-minimum multi-collinearity method for feature selection and ranking, Pattern Recognition 67 (2017), 47–61.

7. Peng, Long and Ding, Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8) (2005), 1226–1238.

8. Sheikhpour, Sarram M.A., Gharaghani, et al., A survey on semi-supervised feature selection methods, Pattern Recognition 64 (2017), 141–158.

9. Kabir, Shahjahan and Murase, A new local search based hybrid genetic algorithm for feature selection, Neurocomputing 74(17) (2011), 2914–2928.

10. Huang, Cai and Xu, A hybrid genetic algorithm for feature selection wrapper based on mutual information, Pattern Recognition Letters 28(13) (2007), 1825–1844.

11. Tran, Xue and Zhang, Variable-length particle swarm optimization for feature selection on high-dimensional classification, IEEE Transactions on Evolutionary Computation 23(3) (2019), 473–487.

12. Kashef and Nezamabadipour, An advanced ACO algorithm for feature subset selection, Neurocomputing 147 (2015), 271–279.

13. Qiu, A hybrid two-stage feature selection method based on differential evolution, Journal of Intelligent & Fuzzy Systems 39(3), 1–14.

14. Hancer, Xue, Karaboga, et al., A binary ABC algorithm based on advanced similarity scheme for feature selection, Applied Soft Computing 36 (2015), 334–348.

15. Assiri A.S., Hussien A.G. and Amin, Ant Lion Optimization: variants, hybrids, and applications, IEEE Access 8 (2020), 77746–77764.

16. Hussien A.G., Abdelazim, Amin and Mohamed A.E., A comprehensive review of moth-flame optimisation: variants, hybrids, and applications, Journal of Experimental & Theoretical Artificial Intelligence (2020), 1–21.

17. Sagayam and Jude, ABC algorithm based optimization of 1-D hidden Markov model for hand gesture recognition applications, Computers in Industry 99 (2018), 313–323.

18. Sagayam, Hemanth, Vasanth, et al., Optimization of a HMM-based Hand Gesture Recognition System Using a Hybrid Cuckoo Search Algorithm, Hybrid Metaheuristics for Image Analysis (2018), 87–114.

19. Subramaniyam, A Modified Approach for Face Recognition using PSO and ABC Optimization, International Journal of Innovative Technology and Exploring Engineering 8(7) (2019), 1571–1577.

20. Liu, Wei, Yuan, et al., Topology selection for particle swarm optimization, Information Sciences 363 (2016), 154–173.

21. Mahdavi, Metaheuristics in large-scale global continues optimization: A survey, Information Sciences 295 (2015), 407–428.

22. Yang, Chen, Li, et al., Multimodal estimation of distribution algorithms, IEEE Transactions on Systems Man and Cybernetics 47(3) (2017), 636–650.

23. Yang, Chen, Deng, et al., A level-based learning swarm optimizer for large-scale optimization, IEEE Transactions on Evolutionary Computation 22(4) (2018), 578–594.

24. Kennedy and Eberhart, Particle swarm optimization, In: Proceedings of ICNN’95 - International Conference on Neural Networks 4 (1995), 1942–1948.

25. Chakraborty, Genetic algorithm with fuzzy fitness function for feature selection, In: IEEE International Symposium on Industrial Electronics 1 (2002), 315–319.

26. and Xia, A tribe competition-based genetic algorithm for feature selection in pattern classification, Applied Soft Computing 58 (2017), 328–338.

27. , Dong and Sun, Binary differential evolution based on individual entropy for feature subset optimization, IEEE Access 7 (2019), 24109–24121.

28. Zorarpaci and Ozel, Hybrid approach of differential evolution and artificial bee colony for feature selection, Expert Systems with Applications 62 (2016), 91–103.

29. Wan, Wang, Ye, et al., A feature selection method based on modified binary coded ant colony optimization algorithm, Applied Soft Computing 49 (2016), 248–258.

30. Shunmugapriya and Kanmani, A hybrid algorithm using ant and bee colony optimization for feature selection and classification (AC-ABC hybrid), Swarm and Evolutionary Computation 36 (2017), 27–36.

31. Emary, Zawbaa and Hassanien, Binary grey wolf optimization approaches for feature selection, Neurocomputing 172(8) (2016), 371–381.

32. Abdel-Basset, El-Shahat, El-Henawy, et al., A new fusion of grey wolf optimizer algorithm with a two-phase mutation for feature selection, Expert Systems with Applications 139 (2019), 112824.

33. Aladeemy, Adwan, Booth, et al., New feature selection methods based on opposition-based learning and self-adaptive cohort intelligence for predicting patient no-shows, Applied Soft Computing 86 (2020), 105866.

34. Faris, Mafarja, Heidari, et al., An efficient binary salp swarm algorithm with crossover scheme for feature selection problems, Knowledge-Based Systems 154 (2018), 43–67.

35. Eid, Binary whale optimisation: an effective swarm algorithm for feature selection, International Journal of Metaheuristics 7(1) (2018), 67–79.

36. Hussien A.G., Oliva, Houssein E.H., et al., Binary Whale Optimization Algorithm for Dimensionality Reduction, Mathematics 8(10) (2020), 1821.

37. Sayed, Khoriba and Haggag, A novel chaotic salp swarm algorithm for global optimization and feature selection, Applied Intelligence 48(10) (2018), 3462–3481.

38. Mafarja, Aljarah, Faris, et al., Binary grasshopper optimisation algorithm approaches for feature selection problems, Expert Systems with Applications 117 (2019), 267–286.

39. Arora and Anand, Binary butterfly optimization approaches for feature selection, Expert Systems with Applications 116 (2019), 147–160.

40. Nguyen, Xue and Zhang, A survey on swarm intelligence approaches to feature selection in data mining, Swarm and Evolutionary Computation 54 (2020), 100663.

41. Chuang, Tsai and Yang, Improved binary particle swarm optimization using catfish effect for feature selection, Expert Systems with Applications 38(10) (2011), 12699–12707.

42. Vieira, Mendonca, Farinha, et al., Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients, Applied Soft Computing 13(8) (2013), 3494–3504.

43. Xue, Zhang and Browne, Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms, Applied Soft Computing 18 (2014), 261–276.

44. Moradi and Gholampour, A hybrid particle swarm optimization for feature subset selection by integrating a novel local search strategy, Applied Soft Computing 43 (2016), 117–130.

45. Nguyen, Xue, Andreae, et al., Particle swarm optimisation with genetic operators for feature selection, In: IEEE Congress on Evolutionary Computation (2017), 286–293.

46. Mistry, Zhang, Neoh, et al., A micro-GA embedded PSO feature selection approach to intelligent facial emotion recognition, Transactions on Cybernetics 47(6) (2017), 1496–1509.

47. Tran, Xue and Zhang, A new representation in PSO for discretization-based feature selection, and Cybernetics 48(6) (2018), 1733–1746.

48. Engelbrecht, Grobler and Langeveld, Set based particle swarm optimization for the feature selection problem, Engineering Applications of Artificial Intelligence 85 (2019), 324–336.

49. Qiu, A novel multi-swarm particle swarm optimization for feature selection, Genetic Programming and Evolvable Machines 20(4) (2019), 503–529.

50. Kamyab and Eftekhari, Feature selection using multimodal optimization techniques, Neurocomputing 171 (2016), 586–597.

51. , Cheng and Jin, Feature selection for high-dimensional classification using a competitive swarm optimizer, Soft Computing (2016), 1–12.

52. , Epitropakis, Deb, et al., Seeking multiple solutions: An updated survey on niching methods and their applications, IEEE Transactions on Evolutionary Computation 21(4) (2017), 518–538.

53. , Feng and Fan, A novel multi-swarm particle swarm optimization with dynamic learning strategy, Applied Soft Computing 61 (2017), 832–843.

54. Altman, An introduction to kernel and nearest-neighbor nonparametric regression, The American Statistician 46(3) (1992), 175–185.

55. Cheng and Jin, A social learning particle swarm optimization algorithm for scalable optimization, Information Sciences 291 (2015), 43–60.

56. Cheng and Jin, A competitive swarm optimizer for large scale optimization, IEEE Transactions on Systems Man and Cybernetics 45(2) (2015), 191–204.

57. Kennedy and Eberhart, A discrete binary version of the particle swarm algorithm, In: 1997 IEEE International Conference on Systems Man and Cybernetics 5 (1997), 4104–4108.

58. Holland, Genetic algorithms, Scholarpedia 7(12) (2012), 1482.

59. Mafarja and Mirjalili, Whale optimization approaches for wrapper feature selection, Applied Soft Computing 62 (2018), 441–453.
