Abstract
Cost-based feature selection is an important data preprocessing technique in classification problems. This paper focuses on a real case that the cost that may be associated with features is fuzzy number. First, a fuzzy transforming method is introduced to transform fuzzy cost-based feature selection problems into ones with interval number. Second, an effective feature selection algorithm based on interval multi-objective particle swarm optimization is proposed. In this algorithm, a risk coefficient that decision makers are willing to bear when delete any solution is used to update the archive. Also, an interval crowding distance measure is adopted to evaluate the distribution of non-dominated particles. Finally, feasibility of the presented algorithm is validated by simulation results. The results show that our algorithm is capable of generating excellent approximation of the true Pareto front.
Introduction
Feature selection (FS) is one of the most important issues in the field of data mining and pattern recognition [3, 10]. Its purpose is to select a subset of features from a larger of features, which can reduce the complexity of the classifier for a successful classification task [8]. Due to effective global search capability, recent years heuristic-based search methods have been used to solve FS, such as genetic algorithm [19], ant colony optimization [18], differential evolution [1, 27], etc.
However, those studies usually assume that the data are already stored in datasets and are available without charge. As we all know, data are not free in real-world applications. There are various costs, such as money, time or other resources that are needed to obtain feature values of objects [6, 15]. Although there have been a few attempts to deal with this cost-based FS problem [6, 25], all the existing approaches assume that the cost associated to any feature is precise. However, in many practical cases, it is not easy or even be impossible to exactly specify the cost for part features. On the contrary, decision makers can only give their fuzzy values, such as “about 0.9” or “high”. Because some coefficients become a fuzzy number rather than precise values, the existing algorithms are hardly applicable for solving fuzzy cost-based FS.
Particle swarm optimization (PSO) is a global search method inspired by the feeding behavior of birds or fish [9]. Now it has been used in the feature selection problem [4, 24], as well as in many other complicated problems [16, 28]. However, there is no work on fuzzy cost-based feature selection problems.
The overall goal of this paper is to solve the kind of feature selection problems by PSO. The main contributions are as follows: 1) a fuzzy transforming method is introduced, by which the fuzzy cost-based feature selection problems are converted into ones with interval number; 2) an effective feature selection algorithm based on multi-objective particle swarm optimization is proposed; 3) some effective operators, such as the interval probability dominance and the interval crowding distance measure, are used to improve the capability of PSO.
The proposed method
This paper uses a binary string to represent a solution for the feature selection problem.
For a set of data with D features, if a bit is 1, the corresponding feature
is chosen into the feature subset; otherwise, it is not. Then, the decision variable of this
problem can be described as follows:
Without loss of generality, we adopt a trapezoidal fuzzy number to express the cost associated with the i-th feature. Where a i and d i are the down corner points of the trapezoid, and b i and c i are the upper corner points of the trapezoid. The bigger the fuzzy number is, the higher the cost value of the corresponding feature. Thus, the cost values of all D features can be represented as follows:
For any feature subset that contains multiple features, this paper adopts the maximum value
among all the cost values of selected features to express the cost of the feature subset,
namely:
Next, we consider the classification error rate of a solution, i. e., the second objective function f2 (X). There are many effective classifiers, such as the Support Vector Machine [2, 23]. Due to easy implementation, this paper adopts the leave-one-out cross-validation (LOOCV) of the one nearest neighbor (1-NN) classifier to evaluate the classification error rate of a solution. This method has successfully been used in [14, 24].
Based on the above discussion, a fuzzy cost-based feature selection problem can be
transformed into an interval bi-objective optimization problem, as follows:
PSO adopts the simple speed-displacement model, regards each individual in the group as a
particle without size in the search space. Supposing the location of the i-th particle is
P
i
= (pi1,
pi2, …,
p
iD
), the velocity of this particle is
V
i
= (vi1,
vi2, …,
v
iD
), the optimal position that is found
by this particle by now (i.e., local leader) is
Lbest
i
= (lbesti1,
…, lbest
iD
), the optimal location find by the
swarm by now (i.e., global leader) is
Gbest = (gbest1,
gbest2, …,
gbest
D
). Then, the new position of this
particle is calculated as follows:
where t is the number of iterations, N is the size of the swarm, c1 and c2 are acceleration coefficients, r1 and r2 are random numbers between [0, 1]; w is the inertia weight.
Since the objective values of the particles are also not a precise vector, the traditional PSO are not applicable to the multi-objective feature selection problem with the interval coefficient. This paper proposes an improved PSO-based multi-objective feature selection algorithm.
Encoding of particle
This paper adopts the probability-based encoding strategy proposed in the previous work
[24]. In this strategy, an individual is
represented as a vector of probability:
Where the probability pi,j > 0.5 means that the j-th feature will be selected into the i-th feature subset.
In order to update the archive, we introduce a risk coefficient μ that decision makers are willing to bear when delete any solution. This risk coefficient determines the dominance degree among solutions saved in the archive. In each iteration, only new particles that are dominated with less than the probability of μ can get into the archive. Herein, the probability-based dominance relationship proposed in [13] is used to compare two solutions with interval objective values. Generally, the higher the value of μ is, the more the number of the solution satisfies the requirement of the decision maker; the smaller the value of μ is, the better is the convergence of the solutions obtained by an algorithm.
When the number of elements in the archive is more than its maximum capacity N, then elements with the worst distribution are deleted. In traditional multi-objective optimization, the crowding distance has been commonly used to evaluate the solutions’ distribution and, most of all, it is parameter-free. Aimed at a multi-objective optimization problem with interval fitness, in the previous work we proposed an improved crowding distance calculation method, which was called interval crowding distance [29].
Taking an element a
I
in the archive as an
example, its interval crowded distance can be expressed as:
Where represents the relative distance between interval and interval . represents the overlap degree between two interval numbers.
Based on the updated method above, on the one hand, when the particle P i (t + 1) is dominated by elements in the archive Ar (t + 1) with the probability μ> 0, we cannot assume that P i (t + 1) is poor, but that it is inferior to other elements with the probability μ. Therefore, P i (t + 1) should also be saved into the archive. On the other hand, due to the computational complexity of comparing probability-dominance relations, it is impractical to save those particles. In order to balance the two aspects, this paper introduces the tolerance coefficient of the decision maker μ.
The global leader of a particle, Gbest, is the best global position currently found by neighbors of this particle. In this paper we select the global leader of the particle from the external archive based on the distribution density of elements in the archive. Because the objective values of elements in the archive are interval vectors, the interval crowding distance measure [29] is used to measure their distribution density.
The local leader, Pbest, is the best position achieved by a particle
itself so far. Supposing that the old local leader of the particle
P
i
(t) is
Lbest
i
(t), the new
local leader is updated as follows:
where P i (t + 1) ≻ Lbest i (t) represents that P i (t + 1) dominates Lbest i (t); P i (t + 1) ||Lbest i (t) represents that P i (t + 1) and Lbest i (t) are non-dominated each other.
Implement of the proposed algorithm
Combining the above improved operators into PSO, Algorithm 1 shows the detailed steps of the proposed PSO-based multi-objective feature selection algorithm.
Comparison results and analysis
To validate the proposed PSO-based multi-objective feature selection algorithm, we compared it with two typical fuzzy multi-objective optimization algorithms: FPDMOPSO [17] and MFACPSO [20]. The three algorithms used the same parameter settings: the size of the particle = 30, the archive capacity = 30, two learning factors, c1 = c2 = 2.0, the maximum number of generation = 70, the inertia weight = 0.6. In addition, the value of μ in the proposed algorithm is set to 0.1. Several well-known real-world datasets, including Glass, Vehicle, WDBC and Ionosphere are selected. These datasets are available in the UCI Repository (Http://www.ics.uci.edu/~mlearn/MLRepository.htm).
Without loss of generality, we adopt the triangle function to describe the fuzzy cost of a feature. The vectors E1 and E2 show the midpoints and widths of those triangle fuzzy functions, respectively. Note that the cost vector of each dataset is composed of elements of the vectors E1 and E2, which begins at the first element, 0.52 and 0.05, until the last feature is assigned.
Furthermore, the two-set coverage (SC) metric [5] is employed to compare the dominance degree of the two algorithms. However, this paper replaces the traditional Pareto dominance relationship with the interval dominance probability in order to calculate the dominance degree between two solutions. To evaluate the distribution of solutions throughout the Pareto optimal set, the spacing metric (SP) [5] is adopted.
The three algorithms are applied to the four data sets, i.e., Glass, Vehicle, WDBC and Ionosphere. For each data set, each of these algorithms is running 20 times independently. Tables 1 and 2 show SC and SP values obtained by the three compared algorithms.
From Table 1, we can see that for the data set Glass with a small number of features, the proposed algorithm achieves similar SC results when compared to FPDMOPSO and MFACPSO. However, for the data sets Vehicle, WDBC and Ionosphere, the SC average obtained by the proposed algorithm is better than that of FPDMOPSO. Taking Vehicle as an example, over 38% of solutions obtained by FPDMOPSO are dominated by those of the proposed algorithm. Also, the proportion of MFACPSO dominated by the proposed algorithm is more than 19%. However, the proportions of the proposed algorithm dominated by FPDMOPSO and MFACPSO are both less than 3%. Hence, for the four data sets, the proposed algorithm shows the best convergence when compared with FPDMOPSO and MFACPSO.
From Table 2 we can see that for the data sets Glass, Vehicle and WDBC, all the Pareto solution sets obtained by the proposed algorithm have the best distribution when compared with FPDMOPSO and MFACPSO. For the data set Ionoshpere, the proposed algorithm also has the second best distribution, whose SP average is slightly bigger than that of FPDMOPSO. Overall, the proposed algorithm also has good distribution when compared with FPDMOPSO and MFACPSO.
Furthermore, Fig. 1 displays the Pareto fronts produced by the three compared algorithms. In each figure, the horizontal axis means the classification error rate, and the vertical axis represents the cost values of the feature subset. We can see from Fig. 1 that, for the dataset Vehicle, the proposed algorithm and FPDMOPSO are better than MFACPSO. Both the proposed algorithm and FPDMOPSO find the boundary point with 0 cost value. For the dataset WDBC, the proposed algorithm and FPDMOPSO both have good Pareto fronts, which dominate pat solutions of MFACPSO. For the dataset Ionosphere, compared with the proposed algorithm and FPDMOPSO, the solutions produced by MFACPSO are over-concentrated in the left part of the Pareto fronts (f2 < 0.07).
Conclusion
Aimed at the fuzzy cost-based feature selection problems, this paper proposed an improved multi-objective particle swarm algorithm. Applying the proposed algorithm to several data sets, and comparing with two fuzzy multi-objective particle swarm optimization algorithms, experimental results showed that the proposed algorithm has remarkable superiority on the convergence and the distribution.
Footnotes
Acknowledgments
This work was jointly supported by the National Natural Science Foundation of China (No. 61473299, 61473298, 61573361), Outstanding Innovation Team of China University of Mining and Technology (No. 2015QN003), and the Natural Science Foundation of Jiangsu Province (No. BK20130207).
