Abstract
Rough set reduction has been used as a momentous preprocessing tool for machine learning, pattern recognition, and big data analysis. It is well known that the traditional rough set theory can only handle features with categorical values. Therefore, a neighborhood rough set model is introduced to deal with numerical data sets. Classical greedy search strategies to neighborhood rough set reduction have often failed to achieve optimal reducts. Many researchers shift to swarm intelligence algorithms, such as particle swarmoptimization, ant colony optimization and fish swarm algorithm, giving a better solution but with a large cost of computational complexity. It is beneficial for exploring fast and effective feature reduction algorithms. In this paper, we firstly introduce a knowledge representation, named power set tree (PS-tree). It is an order tree enumerating all the subsets of a feature set. Each node of the PS-tree is a possible feature reduct. Furthermore, we develop a tree search framework for reduction question solving by the PS-tree. We present four tree search methods based on PS-tree, which are depth-first, breadth-first, uniform-cost and A* search methods. The effectiveness of these four proposed tree search methods are tested on some UCI data sets. Finally, we compare the A* search with traditional greedy search and swarm intelligence methods. The comparisons show that the selected features by A* search attain good reduction rates and simultaneously maintain the classification accuracy of whole features.
Introduction
Feature reduction is a quite useful data preprocessing technique, aiming to select an important feature subset from the original feature set by deleting some redundant information while persisting a high classification accuracy. It is also mentioned as a semantic-preserving dimension reduction [1], attribute reduction [2] and feature selection [3], implemented in many fields, such as pattern recognition [4, 5], big data [6], machine learning [7, 8], data mining [9, 10], etc. Feature reduction is a significant task to achieve the essence of noisy, irrelevant or misleading features before a pattern recognition process. By eliminating these inconsequential features, machine learning techniques such as face recognition [11] and image classification [12] can improve their time complexities and classification accuracies.
Rough set theory (RST), proposed by Pawlak [13], has been widely used in machine learning fields. It is an impactful feature selection tool to eliminate the noisy and redundant information and discover important data dependencies by a structural method. The classical reduction algorithms are established on the equivalence relation and only compatible for categorical data sets. They need to scatter the records when processing continuous numerical data, but this will result in losing of the inherent structure information, so the reducts of a numerical data set are strongly related with the methods of discretization. To overcome this drawback, many extensions of Pawlak rough set model and their corresponding definitions on feature reduction have been presented, including the fuzzy rough sets [14, 15], soft sets [16], tolerance approximate models [17, 18], similarity rough approximate models [19], dominance approximation relation models [20, 21], covering rough sets [22, 23], probabilistic rough sets [24], variable precision rough sets [25] and neighborhood rough sets [26–29]. Among all the extensions, neighborhood rough sets are widely used in many fields, such as medical diagnosis [30], automatic image annotation [31], outlier detection [32], etc. The neighborhood rough sets are suitable for both numerical and categorical data sets via a δ-neighborhood parameter, which will maintain the inherent structure information of data sets in real spaces.
Many feature reduction algorithms for neighborhood rough set model have been presented [27, 34]. The general algorithm of finding the minimal reduct is to select a reduct with minimal cardinality from all the possible reducts by an exhaustive tactic. However, finding the minimal reduct is proved to be an NP-hard problem. Since the time consumption of generating all reducts is an exponential growth, many heuristic search algorithms for neighborhood feature reduction are proposed. The heuristic method is a greedy search strategy. It is employing an incremental greedy heuristic information to select features. Neighborhood dependency is usually adopted as heuristic in the process of feature reduction. For example, Hu proposed a neighborhood reduction algorithm employing the neighborhood dependency as the guiding heuristic [27]. Liu focused on the time efficiency of neighborhood feature reduction, by the sorting of buckets to calculate the positive region of neighborhood rough sets [33]. We developed an entropy-based uncertainty measure for the neighborhood system, which is adopted as a heuristic information in reduction algorithm [34]. However, the greedy search strategy is often immersed in a local solution with a non-minimal feature reduct, since there is no guarantee of a perfectheuristic.
Consequently, many scholars have focused on the evolutionary computation for rough set feature reduction, which is a stochastic search strategy. Wrblewski proposed a genetic algorithm (GA) with a random search for finding the minimal reduct [35]. However, Wroblewski’s algorithm is a time-consuming cost. Elalami employed a genetic algorithm with the aim to effectively achieve a robust solution for finding an optimal relevant feature subset [36]. Chenet al. presented an efficient clustering algorithm based on RST and GA [37]. Jensen and Shen firstly introduced the ant colony optimization (ACO) to rough set reduction [1]. Wang et al. proposed a feature selection method with RST and particle swarm optimization (PSO) [38]. We proposed a feature reduction method with RST and fish swarm algorithm (FSA) [39]. The current RST-based feature reduction methods with evolutionary computation are mainly to deal with categorical data sets. Furthermore, evolutionary computation has the ability of global optimizing with achieving a more robust solution, but at the cost of large computational complexity.
The two key factors to feature reduction are representation and search. A knowledge representation called power set tree (PS-tree) [40] is introduced. It is an orderly tree structure, which nodes represent feature sets. An N-dimensional feature set has 2 N subsets. Each subset is mapped to a node of the PS-tree. The second central element of feature reduction is the notion of searching in the PS-tree. Therefore, we develop a tree search framework for finding a neighborhood feature reduct by traversing the PS-tree. The search framework includes four tree search methods based on the PS-tree, which are depth-first, breadth-first, uniform-cost and A* search methods. The depth-first and breadth-first are fixed-order methods resulting in poor time performance for their blind searches. The disadvantages of blind searches are overcome by best-first search methods, including uniform-cost and A* search. The uniform-cost is only using cost in the past as a guiding information, while the A* search is simultaneously considering cost and heuristic information for the searching process. The A* search is a prominent guided best-first search for the rough set reduction. Finally, the effectiveness of these four proposed tree search methods are tested on some UCI data sets under the PS-tree structure for neighborhood reduction question. The comparisons of the four search methods are also implemented and evaluation results show that the A* search attains good reduction effectiveness and time performance for the minimal reduction question solving under neighborhood system conditions. We also compare the A* search with traditional greedy search and swarm intelligence methods. The experimental results show that the selected features by A* search attain good reduction effectiveness and simultaneously maintain the classification accuracy of whole features.
The rest of our paper is organized in the follows. In Section 2, some concepts of neighborhood rough sets and feature reduction are presented. Section 3 introduces a tree structure named PS-tree, and discusses the PS-tree-based search framework. In Section 4, some experiments are carried out on standard data sets and the experimental results are analysed. Some conclusions and further works are proposed in Section 5.
Feature reduction based on neighborhood rough sets
The classical rough set feature reduction effectively operates categorical data sets. But for real-valued data sets, a discretization should be performed before further processing. However, the discretization will bring about information loss in the feature reduction. As for neighborhood rough set feature reduction, the discretization is discarded to avoid the information loss. This section will introduce several necessary concepts on a neighborhood rough set model and its reduction. Some further details can be found in references [27, 28].
A similarity relation and neighborhood classes
The theory of neighborhood rough sets is established on a similarity relation which uses a distance metric function. The similarity relation is generated by a neighborhood parameter for evaluating two objects.
The function f (x, y) is a distance measure. It satisfies the following conditions.
(1) f (x, y) ≥0: It is non-negative;
(2) f (x, y) =0: If and only if x = y;
(3) f (x, y) = f (y, x): It is symmetric;
(4) f (x, y) + f (y, z) ≥ f (x, z): It is a triangular inequality.
A distance function is a metric on two samples. Each sample has some features, including real values. As we know, three classic metrics are widely used, which are Manhattan, Euclidean and Chebychev distances. Let x and y are two samples with n-dimensional features. A general distance metric is defined by:
where v (x, a i ) is the value of sample x in the ith dimension, and p is a parameter. The different parameter values represent different distances. If p = 1, the distance metric is named Manhattan distance. If p = 2, the distance metric is named Euclidean distance. And if p =∞, the distance metric is named Chebychev distance. Since the neighborhood relation satisfies symmetry and reflexivity, it is a similarity relation. Specially, if the neighborhood parameter δ is equal to 0, the neighborhood relation is an equivalence relation. It is the case of being suitable for categorical data sets.
Let ND = (U, C ∪ D, δ) be a neighborhood decision system, the upper and lower approximations for the neighborhood decision system can be presented in the following.
And the two concepts of upper and lower approximations with respect to decision attribute D are defined in the following.
Similar to the definition in rough set [13], the neighborhood lower approximation of decision feature is an union of the neighborhood lower approximation of each decision class. It is denoted by POS B (D) δ , also named the neighborhood positive region of the decision. Then we can present the definition of dependency in neighborhood decision system similar to classic rough set model.
In classic rough set knowledge reduction, the dependency represents a proportion of discernible knowledge to the entire knowledge base. As to the neighborhood systems, the neighborhood dependency is a proportion of discernible neighborhood classes to the entire neighborhood class set. Obviously, the neighborhood dependency γ B (D) δ ∈ [0, 1].
(1) γ B (D) δ = γ C (D) δ ;
(2) ∀a ∈ B, γB-{a} (D) δ < γ B (D) δ .
Based on the above definition of neighborhood reduction, a reduction algorithm is proposed by Hu [3]. It starts from an empty as a candidate, and then choose a next best feature to the candidate set. The measure for judging a best feature is the neighborhood dependency. When the dependency is not changed after adding a new feature, the algorithm terminates.
Feature reduction is a problem solving, which can be handled by artificial intelligence (AI) methods. The focused concepts of AI methods are a state space and operators transforming one state to another. The one of key aspect of feature reduction is how to represent the problem. The another key aspect is to perform an intelligent search for finding the minimal reduct. In this article, we introduce a knowledge representation for the feature reduction question, named power set tree (PS-tree) [40]. It is an order tree to illustrate the state space. The PS-tree enumerates all the subsets of total feature set, which represent the possible solutions. We develop four search methodologies based on the PS-tree, and focus on efficiencies of cost search and heuristic search.
Definition of PS-tree
Let ND = (U, C ⋃ D, δ) be a neighborhood decision system, where C = {c1, c2, . . . , c
n
}. Suppose that P is a power set of C and PT is a tree structure of P. We say PT is a PS-tree for ND if it satisfies three properties in the follows.
The root of PT is a tuple <PE, PN>, where PE =∅ and PN = n. It means that the root has no elements and n children. Let PTC = {cn-PN+1, cn-PN+2, . . . , c
n
}, if PN = 0, then PTC =∅. Let PT1, PT2,..., PT
n
be the n children of root PT and m = card (PE) +1. PT1 has m elements and n - 1 children; PT2 has m elements and n - 2 children;...; PT
n
has m elements and 0 child. Elements of PT1 are expanded from PT by adding the 1st element in PTC; elements of PT2 are expanded from PT by adding the 2nd element in PTC;...; elements of PT
n
are expanded from PT by adding the nth element in PTC. Each child of the root is also a PS-tree.
An illustration of the PS-tree of a feature set {a, b, c, d} is shown in Fig. 1. The nodes of PS-tree list all the possible reducts of {a, b, c, d}. Each node includes two elements, which are the feature subset in the left part of tuple and the number of expanded children in the right part of tuple. The letter labeled on the top of tuple represents an adding feature in the next expanding step. Each node is a possible minimal reduct.
The PS-tree for {a,b,c,d}.
The PS-tree is a data structure for caching unordered elements of a power set. It is an effective tool for exploring new solutions to feature reduction question. During the test on feature reduction, we employ the PS-tree for finding best search strategies, which improve the efficiencies of the time and space complexities.
The two important elements of feature reduction are representation and searching method. The PS-tree can be used as a representation tool for storing state space. A node of the PS-tree maps to a possible reduct, while an arc of the PS-tree represents the transformation to a possible better reduct. The PS-tree is a data structure of state space. Each state is a possible reduct. The other important element of reduction problem solving is the searching method that can find a cheapest path in the PS-tree. The classic AI searching methods include depth-first, breadth-first, uniform-cost and A* search.
Depth-first search algorithm
Graph is a common data structure, while depth-first search is a generally used algorithm for traversing a graph. It is a blind search strategy for exploring a solution. The PS-tree-based feature reduction algorithm with depth-first search (also called PS-DFS) is listed in the follows.
Input: a neighborhood decision system ND = (U, C ⋃ D, δ).
Output: a feature reduct R.
(1) Let S . PE =∅, S . PN = |C|; Initial R = C, n = |C|; Put the start node S on a queue named OPEN.
(2) Until OPEN is empty or it reaches the halting time do
(2.1) Remove the first node N from OPEN and put it on a queue named CLOSED.
(2.2) If γN.PE (D) δ = = γ C (D) δ then
(2.2.1) If |N . PE| < |R|, then R = N . PE
(2.2.2) Goto step (2)
(2.3) If |N . PE| = = |R|-1, then goto step (2)
(2.4) If N . PN = =0 then goto step (2), else PTC = {Cn-N.PN+1, Cn-N.PN+2, . . . , C n }
(2.5) For i = 1 to N . PN
(2.5.1) Select the ith feature PTC i ∈ PTC, let N i . PE = N . PE ∪ {PTC i } and N i . PN = N . PN - 1
(2.5.2) Add N
i
in the
(3) Output R and the search path generated by the pointers associated with R.
The algorithm PS-DFS implements a depth-first search by putting the newly generated states at the front of the queue. In practice, the depth-first search method is usually foolish for its no heuristic. Assume that the reduct is {b, c}. The first found reduct is {a, b, c}, but it is a super reduct. An illustration of the depth-first search process is shown on Fig. 2.
A depth-first search.
Another blind search method is the breadth-first search, which seems a better search. In fact, the breadth-first search is a complete algorithm that guarantees to find the minimal reduct, as it expands all nodes. The PS-tree-based feature reduction algorithm with breadth-first search (also called PS-BFS) is listed in the follows.
Input: a neighborhood decision system ND = (U, C ⋃ D, δ).
Output: a feature reduct R.
(1) Let S . PE =∅, S . PN = |C|; Initial R = C, n = |C|; Put the start node S on a queue named OPEN.
(2) Until OPEN is empty or it reaches the halting time do
(2.1) Remove the first node N from OPEN and put it on a queue named CLOSED.
(2.2) If γN.PE (D) δ = = γ C (D) δ then
(2.2.1) If |N . PE| < |R|, then R = N . PE
(2.2.2) Goto step (2)
(2.3) If |N . PE| = = |R|-1, then goto step (2)
(2.4) If N . PN = =0 then goto step (2), else PTC = {Cn-N.PN+1, Cn-N.PN+2, . . . , C n }
(2.5) For i = 1 to N . PN
(2.5.1) Select the ith feature PTC i ∈ PTC, let N i . PE = N . PE ∪ {PTC i } and N i . PN = N . PN - 1
(2.5.2) Add N
i
in the
(3) Output R and the search path generated by the pointers associated with R.
The red route in Fig. 3. is a search path by the breadth-first search method starting at PS-tree root and going to the goal.

A breadth-first search.
Uniform-cost search is a graph search technique with traversing a graph or tree according to the past cost for guiding a route. The algorithm begins at the root to the goal with a minimum cost. We can label each node of PS-tree with a weight to represent a cost. The number of children of a PS-tree node is marked as an expanding weight. The level of a PS-tree is labeled as a layer weight. The root of PS-tree is the first level with marking n layer weight. The second level of PS-tree is marked as n-1 layer weight. The last lever of PS-tree is marked as 0 layer weight. The cost of a node in PS-tree is reversing the sum of expanding weight and layer weight. The cost is the value of 2n - expanding - layer. The green words on Fig. 4. illustrate the cost of each node in a PS-tree.

A PS-tree with uniform-cost.
The PS-tree-based feature reduction algorithm with uniform-cost search (also called PS-UCS) is listed in the follows.
Input: a neighborhood decision system ND = (U, C ⋃ D, δ).
Output: a feature reduct R.
(1) Let S . PE =∅, S . PN = |C|; Initial R = C, n = |C|, g (S) =0; Put the start node S on a queue named OPEN.
(2) Until OPEN is empty or it reaches the halting time do
(2.1) Remove the node N from OPEN whose g (N) is the smallest and put it on a queue named CLOSED.
(2.2) If γN.PE (D) δ = = γ C (D) δ then
(2.2.1) If |N . PE| < |R|, then R = N . PE
(2.2.2) Goto step (2)
(2.3) If |N . PE| = = |R|-1, then goto step (2)
(2.4) If N . PN = =0 then goto step (2), else PTC = {Cn-N.PN+1, Cn-N.PN+2, . . . , C n }
(2.5) For i = 1 to N . PN
(2.5.1) Select the ith feature PTC i ∈ PTC, let N i . PE = N . PE ∪ {PTC i } and N i . PN = N . PN - 1
(2.5.2) Compute g (N i ) =2n - |N i . PE| - N i . PN
(2.5.3) Add N i onto the the OPEN list and associate to N i a pointer back to N.
(3) Output R and the search path generated by the pointers associated with R.
The red route in Fig. 5. is a search path illustrating an uniform-cost search starting at PS-tree root and going to the goal.

An uniform-cost search.
The A* algorithm is one of famous graph search methods. It selects next node to be expanded according to the formula f (N) = g (N) + h (N), where the g (N) represents the total costs as traveling so far, and h (N) represents the heuristic information in the traveling process. f (N) is also called an evaluation function. When h (N) equals 0, it degenerates into an uniform-cost search. Therefore, it only considers the past cost to guide the search without involving heuristic information. When g (N) equals 0, it degenerates into a heuristic search. It searches for a best solution only considering the heuristic information and ignoring the past costs.
A PS-tree with cost and heuristic information is illustrated in Fig. 6. The green words in Fig. 6. are cost and heuristic information. The cost is reversing the sum of expanding weight and layer weight, while the heuristic information is the percentage of children features belonging to goal set to the whole children nodes. If there is no child, the heuristic information is the percentage of features belonging to goal set to itself node. Suppose that the goal set is {b, c}. As for a node < {b} , 2>, its children are < {b, c} , 1> and < {b, d} , 0>. In its children nodes, b, c and b are belonging to the goal set, while d is not belonging to the goal set. So, the heuristic information of node < {b} , 2> is measured by 3/4. With regard to childless node < {b, d} , 0>, b is belonging to the goal set, while d is not belonging to the goal set. So, its heuristic information is measured by 1/2. Particularly, the goal set is not existing in actual feature reduction experiments. Hence, the heuristic information is measured by the feature significance based on neighborhood dependency.

A PS-tree with cost and heuristic information.
The PS-tree-based feature reduction algorithm with A* search (also called PS-AS) is in the follows.
Input: a neighborhood decision system ND = (U, C ⋃ D, δ).
Output: a feature reduct R.
(1) Let S . PE =∅, S . PN = |C|; Initial R = C, n = |C|; Put the start node S on a queue named OPEN and compute f (S) = g (S) + h (S).
(2) Until OPEN is empty or it reaches the halting time do
(2.1) Remove the node N from OPEN whose f (N) is the smallest and put it on a queue named CLOSED.
(2.2) If γN.PE (D) δ = = γ C (D) δ then
(2.2.1) If |N . PE| < |R|, then R = N . PE
(2.2.2) Goto step (2)
(2.3) If |N . PE| = = |R|-1, then goto step (2)
(2.4) If N . PN = =0 then goto step (2), else PTC = {Cn-N.PN+1, Cn-N.PN+2, . . . , C n }
(2.5) For i = 1 to N . PN
(2.5.1) Select the ith feature PTC i ∈ PTC, let N i . PE = N . PE ∪ {PTC i } and N i . PN = N . PN - 1
(2.5.2) Compute g (N i ) =2n - |N i . PE| - N i . PN and compute h (N i ) using a heuristic estimate.
(2.5.3) Compute f (N i ) =0.5 * (g (N i )/2n) +0.5 * (1 - h (N i )).
(2.5.4) Add N i in the OPEN queue and associate to N i a pointer back to N.
(3) Output R and the search path generated by the pointers associated with R.
The red search route in Fig. 7. illustrates the A* search starting at PS-tree root and going to the goal. Moreover, the evaluation function is calculated by
A PS-tree based A* search.
f (N i ) = α * (g (N i )/2n) + (1 - α) * (1 - h (N i )),
where N i is a node of PS-tree, and n is the cardinality of full features set. Here, g is a cost of a past search and h is a heuristic information for a future search. α is a parameter corresponding to the importance of cost and heuristic information. In our experiments, we assume that the cost is same importance as the heuristic information and set α = 0.5.
As for depth-first search algorithm that is illustrated in Fig. 2, it travels 8 steps for finding the reduct. While in breadth-first search algorithm showed in Fig. 3, it also passes 8 steps. When we see Fig. 5 that represents the uniform-cost search, it reduces one step. But in Fig. 7 that is a search map of the A* search, it travels only 3 steps for find the solution. So, we can draw a result that the A* search is the best method than other search tactics.
As we know, the classical rough set model is effective for feature selection in categorical data sets. Moveover, the neighborhood rough set model is applicable to deal with numerical data sets. In this paper, we present a tree data structure for neighborhood rough set reduction and design four AI search methods based on the tree representation. In order to test the effectiveness of proposed AI search methods, we obtain some data sets from the machine learning data repository, University of California at Irvine. The data sets are listed in Table 1. On the one hand, we present running times of finding minimal reducts with different AI search methods including depth-first (PS-DFS), breadth-first (PS-BFS), uniform-cost (PS-UCS) and A* (PS-AS) algorithms on eight data sets. Considering the breadth-first search is a complete solution results to high time cost, these used data sets are small and medium scales. Aim to verifying an excellent ability of A* algorithm for finding the minimal reduct, we compare the running time of A* algorithm with that of swarm intelligence algorithms ACO [1] and FSA [39]. On the other hand, we defined a concept of reduction rate for evaluating the efficiency of reduction. And we also compare the A* algorithm with the conventional neighborhood rough set reduction algorithm [27] on reduction rate. Finally, we discuss the classification accuracy of selected features by A* search-based neighborhood feature reduction algorithm with different neighborhood parameter values. The values of experimental data are standardized to interval [0, 1]. All the algorithms employed in our experiments are using the Euclidean distance to granulate the data sets.
Data sets used in experiments
Data sets used in experiments
Experimental comparisons with different AI search methods
Experimental comparisons with different AI search methods
According to the experimental results, they show that the algorithms find the same minimal reducts in most data sets. As for bigger data sets such as Seeds, Segment and WDBC, they find different minimal reducts. Because minimal reducts are not unique, they are more quantitative existing in big data sets than in small data sets. It also can be seen that the running times of algorithms PS-DFS and PS-AS outperform that of other two algorithms in most situations. The running time of algorithm PS-DFS is shorter than that of algorithm PS-AS in smaller data sets. But for bigger data sets, the algorithm PS-AS has shorter running time than the algorithm PS-DFS. In a word, the performance of PS-AS is the best of four search algorithms.
Experimental comparisons with swarm intelligence methods
Experimental comparisons with swarm intelligence methods
The minimal reduct is the one which cardinality is smallest among all the reducts. From the experimental results, it can be seen that the cardinalities of selected features are not varied on a data set by different methods ACO, FSA and PS-AS. This means that the three feature selection methods can achieve the minimal reducts. The selected features are the same results for Ecoli and Glass. But for other six data sets, the selected features are different under different feature selection methods. Since the swarm intelligence algorithms search reducts from stochastic initial positions, they may find different minimal reducts. As for the comparisons of running time, the algorithm PS-AS is the best of the three methods. The running time of FSA is litter better than that of ACO for Ecoli, Fertility, Seeds and Segmentation. But for other four data sets, the running time of ACO is shorter than that of FSA. Since the ACO and FSA are both swarm intelligence algorithms, they have a close performance on the running time.
In the next experiments, we adopt a concept named reduction rate for evaluating the effectiveness of reduction, which is defined as follow:
where |R| is the number of selected features, also called the cardinality of R, |C| is the cardinality of all features. The reduction rate represents the degree of redundancy. The redundancy is lower, while the reduction rate is bigger.
In the reduction evaluation experiments, we will use comparable experiments to evaluate the reduction performance on Hu’s algorithm [27] and PS-AS with different values of neighborhood parameter on two data sets, which are Fertility and Wine. The value of neighborhood parameter is very important for neighborhood feature selection. It decides the granularity of data manipulation. Therefore, we take the varies of neighborhood parameter in the neighborhood feature reduction process. The value of neighborhood parameter is changed from 0.05 to 1 with step 0.05. We evaluate the reduction performance by calculating relevant reduction rate. To graphically illustrate the reduction performance with different values of neighborhood parameter, we take the neighborhood parameter δ as horizontal coordinate and the reduction rate as vertical coordinate. The experimental results are illustrated in Fig. 8.
Reduction rate with different neighborhood parameter values.
The neighborhood parameter is a granular threshold for numerical data. The granule is thinner as the neighborhood parameter is smaller. In another word, the granule is rougher as the neighborhood parameter is bigger. It can be observed from Fig. 8 that the reduction rates of both Hu’s algorithm and PS-AS are decreasing with the neighborhood parameter becoming bigger in most situations. That means the redundancy of features increases as the neighborhood parameter becoming bigger in most situations. Furthermore, we can see that the reduction rates of PS-AS are bigger than that of Hu’s algorithm in all data sets. That means the reduction performance of PS-AS is better than that of Hu’s algorithm.
In this subsection, we use the SVM machine learning algorithm to extract rules from the raw features and the selected features for rule negotiation in classification. We apply ten-fold cross validation to estimate the classification. We compare the classification accuracy of selected features by our proposed algorithm PS-AS with that of original whole features. As different feature subsets may produce different classification performance, we change the neighborhood parameter δ varying from 0.05 to 1 with step 0.05 to get different feature reduction subsets, and then evaluate the selected feature subsets by a SVM classify. To graphically illustrate the classification performance with different neighborhood parameters, we take the neighborhood parameter δ as horizontal coordinate and the classification accuracy as vertical coordinate. The experimental results are illustrated in Fig. 9.
Classification accuracy with different neighborhood parameter values.
In Fig. 9, we find that the classification accuracies of selected features can maintain the classification performance of whole original features in most situations. When the δ equals 0.65 in Fertility or 0.7 in Glass, it achieves a better performance by our algrithm. It shows that the neighborhood based feature selection is able to find some good features for classification and efficiently delete the redundant and irrelevant features from the original data. But we can also find that the classification performances are slightly weaker than the original data for most of the features are deleted in the reduced subspaces. We can specify a proper neighborhood parameter to avoid this problem.
The neighborhood rough set model and its reduct have been proved useful in many machine learning, data mining applications. As the big data processing has became the popular trend in machine learning and data mining, an efficient feature reduction algorithm for neighborhood rough sets is desired. The traditional technique often fails to find minimal feature reducts, since there is no perfect heuristic for the guarantee of optimality. When the minimal feature reduct is required, other search methods must be explored. This paper has introduced a representation PS-tree for neighborhood feature reduction. Furthermore, we propose four search methods based on the PS-tree for finding a minimal reduct. Our methods have overcome the disadvantage of conventional greedy searching approach to feature reduction. The best search algorithm of the four methods is A* search. The experimental results show that the A* search algorithm can improve the reduction rate and is more possible to find the minimal feature subset. The PS-tree is an order tree revealing the question space. Hence, we can develop some promising search strategies in the future. This will be applied to the problems of neighborhood rough set dimensionality reduction and feature selection with competitive results. But the four search methods are time consuming, we will study some pruning tactics for accelerating these searching processes in the future.
Footnotes
Acknowledgment
This work is supported by the National Natural Science Foundation of China (Nos. 61573297 and 61772120) and the Program for New Century Excellent Talents in Fujian Province University.
