A constraint score guided meta-heuristic searching to attribute reduction

Abstract

Essentially, attribute reduction can be regarded as a reduct-searching process that terminates once a pre-defined restriction is achieved. Presently, among a variety of searching strategies, meta-heuristic searching has been widely accepted. Nevertheless, it should be emphasized that the iterative procedures in most meta-heuristic algorithms rely heavily on the random generation of the initial population, and such generation is naturally associated with inferior stability and performance. Therefore, a constraint score guidance is proposed to precede meta-heuristic searching, and a novel framework for seeking out a reduct is developed. Firstly, for each attribute and each label in the data, an index called the local constraint score is calculated. Secondly, the qualified attributes are identified by those constraint scores and constitute the foundation of the initial population. Finally, meta-heuristic searching is employed to achieve the required restriction in attribute reduction. Note that most existing meta-heuristic searchings and popular measures (which evaluate the significance of attributes) can be embedded into our framework. Comprehensive experiments over 20 public datasets validate the effectiveness of our framework: it yields reducts with superior stability, and the derived reducts may further improve classification performance.

Keywords

Attribute reduction; constraint score; meta-heuristic searching; rough set

1 Introduction

With the rapid advancement of modern communication technologies, the scale and volume of data have brought huge challenges to data processing and problem solving in various real-world applications. Among those challenges, the curse of dimensionality is especially serious [6, 49]. Therefore, how to reduce the dimension of data without causing a significant information loss has become a critical task.

Importantly, attribute reduction [8, 58], a key topic in the field of rough set theory [38, 52], has been demonstrated to be a well-developed technology for dimension reduction, or the so-called feature selection in machine learning. Generally speaking, the objective of attribute reduction is to select a subset of qualified attributes from raw data and then construct the expected reduct based on a pre-defined restriction. Neighborhood rough set [23, 54], which employs a neighborhood relation to partition the universe, has a clear advantage in analyzing data with continuous or even mixed values. Therefore, as a common dimension reduction method, attribute reduction based on neighborhood rough set has received much attention. Since the neighborhood rough set model was proposed to reveal the uncertainty or inconsistency in data, uncertainty-based and inconsistency-based measures provide a fundamental basis and a broad perspective for exploring attribute reduction related studies, e.g., the definitions of restriction, the principle of selecting attributes, and so on.

Presently, two key factors have been mainly addressed in attribute reduction: (1) evaluation criterion; (2) searching process. For the first case, criteria based on supervised [21, 47], semi-supervised [13, 30] and even unsupervised [27, 33] measures have been respectively designed, which characterize the significance of attributes from different views and then successfully implement the evaluation. For the second case, given the expensive computational cost inherent in exhaustive searching, heuristic searching has been especially favored in recent research [35, 53].

Note that one of the widely accepted heuristic searchings in the problem solving of attribute reduction is greedy searching [7, 22]. In such a searching, the greedy principle is applied to select the optimal attribute in each iteration until the pre-defined restriction is achieved. Obviously, the performance of greedy searching is easily limited by a local optimum. That is why an efficient global searching is frequently required. Fortunately, meta-heuristic searching has attracted great attention recently due to its global searching ability [19, 24].

Without loss of generality, a meta-heuristic searching is an algorithm designed by considering human experience or natural law and is capable of identifying optimal or suboptimal solutions by taking a global view. The population generation mechanisms in meta-heuristic searching produce multiple solutions/individuals in each iteration. Up to now, various effective meta-heuristic searchings have been applied to the task of attribute reduction, e.g., the genetic algorithm [3, 11], the forest optimization algorithm [18, 48], the fish swarm algorithm [9, 32], the particle swarm optimization [40, 45], the ant colony optimization [12, 44] and the bat algorithm [1, 2].

As an effective approach to solve attribute reduction task, many improved algorithms based on meta-heuristic searching have been proposed. For instance, Zhou et al. [61] proposed a correlation-guided genetic algorithm, which checks the quality of the potential solutions to reduce the possibility of producing inferior solutions; Rashno et al. [41] developed a multi-objective particle swarm optimization method, which updates velocity and position of particles by particle and feature ranks to guide particles moving toward the best solutions; Ma et al. [34] presented a two-stage hybrid ant colony optimization, which uses the features’ inherent relevance attributes and the classification performance to guide the optimal feature subset searching.

Nevertheless, it should be stressed that one primary limitation of meta-heuristic searching is the low stability of the derived result. Low stability indicates that a searching algorithm will output a very different solution if data varies slightly, which will cause ambiguity and unreliability in the subsequent learning tasks. The reason can be mainly attributed to the randomly initialized generation of the population in meta-heuristic searching. The importance of stability is frequently overlooked in high-dimensional data processing [10, 37]. Furthermore, we also observe that though the random initialization mechanism can guarantee the diversity of individuals in the population to a certain extent, some individuals are inevitably equipped with poor fitness. It follows that the performance of the derived result will also be easily affected by those poor individuals.

Inspired by the above analyses, new initialization of the population has become necessary. In this research, we develop a new initialization mechanism guided by an effective index called constraint score [31, 41]. Following such novel initialization of the population, a general framework to perform meta-heuristic searching can then be further constructed. Different from the random initialization of the population in conventional meta-heuristic searching, the function of the constraint score is to identify attributes with better discrimination ability among samples. Immediately, those attributes are regarded as the foundation of producing the initial population.

The main contributions of this paper can then be summarized as follows.

A constraint score with the local view is used. The traditional constraint score is proposed by considering all samples in data, which ignores the discrimination ability in terms of samples with specific labels [51, 55]. Therefore, the conventional constraint score is replaced by a local constraint score in our strategy, which aims to comprehensively reveal the distinguishability of attributes for different labels.

A simple but effective plug-and-play framework is developed. The distinguishing feature of our study is to present a general framework, and most existing meta-heuristic searchings to attribute reduction can be embedded into it to improve their performance further.

The rest of this paper is organized as follows. In Section 2, we review basic concepts related to this study. In Section 3, we introduce some popular meta-heuristic searchings. In Section 4, a constraint-guided meta-heuristic searching framework is formally elaborated. Section 5 shows the performance evaluation of the proposed framework over extensive datasets. Finally, in Section 6, we draw conclusions and future plans.

2 Preliminaries

2.1 Neighborhood rough set

In general, training data can be represented by a decision system $D = \langle U, AT \cup \{d\} \rangle$. U = {x_i|1 ≤ i ≤ n} is the nonempty finite set of samples, which is frequently regarded as the universe in the literature; AT = {a_r|1 ≤ r ≤ m} is the nonempty finite set of condition attributes; d is a specific decision attribute which records the labels of samples in U. Specifically, the set of all distinct labels in $D$ is L = {l_p|1 ≤ p ≤ q} (q ≥ 2). Immediately, ∀x_i ∈ U, we know d (x_i) ∈ L where d (x_i) is the label of sample x_i. Similarly, ∀x_i ∈ U and ∀a_r ∈ AT, a_r (x_i) denotes the value of sample x_i over condition attribute a_r.

Obviously, by d, an indiscernibility relation can be formed over U: IND (d) = {(x_i, x_j) ∈ U × U|d (x_i) = d (x_j)}. Immediately, universe U is partitioned into several disjoint decision classes such that U/IND (d) = {X₁, X₂, ⋯ , X_q}. ∀X_p ∈ U/IND (d), X_p is referred to as the p-th decision class, i.e., the collection of samples which possess the p-th label. Especially, ∀x_i ∈ U, the decision class which contains x_i is denoted by [x_i]_d in the context of this study.

From the perspective of Granular Computing (GrC) [15, 28], information granulation plays a crucial role in various data analyses. The objective of information granulation is to granulate either samples in U or condition attributes in AT, which can effectively reduce the volume of data or reveal the inherent structure of data. Among various information granulation techniques, the neighborhood strategy is widely accepted because of not only its clear explanation but also its simple implementation. In addition, it should also be emphasized that the flexible radius used in the neighborhood also provides us a meaningful characterization of the multi-granularity structure of data [25, 29]. The general form of the neighborhood is then presented as follows.

Definition 1. [23] Given a decision system $D$, δ ≥ 0 is a radius, ∀A ⊆ AT, a neighborhood relation δ_A is

$δ_{A} = \{(x_{i}, x_{j}) \in U \times U \mid {dis}_{A} (x_{i}, x_{j}) \leq δ\},$ (1)

in which dis_A (x_i, x_j) is the distance between x_i and x_j over A. ∀x_i ∈ U, the neighborhood of x_i is

$δ_{A} (x_{i}) = \{x_{j} \in U \mid (x_{i}, x_{j}) \in δ_{A}\} .$ (2)
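As an illustration of Definition 1, the neighborhoods of all samples can be stored as one boolean matrix. The following is a minimal sketch, assuming Euclidean distance for dis_A (the definition leaves the distance open); the helper name `neighborhood_relation` and the toy data are hypothetical.

```python
import numpy as np

def neighborhood_relation(X, delta):
    """Boolean n x n matrix R with R[i, j] = True iff dis_A(x_i, x_j) <= delta.

    X holds only the columns of the attribute subset A. Euclidean distance
    is an assumption of this sketch, not a choice fixed by Definition 1.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]          # pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=2))       # pairwise distances
    return dist <= delta

# Toy data: four samples described by a single attribute.
X = np.array([[0.0], [0.1], [0.5], [0.9]])
R_small = neighborhood_relation(X, 0.15)
R_large = neighborhood_relation(X, 0.45)

# Row i of R is the neighborhood of x_i, written as an indicator vector.
print(np.flatnonzero(R_small[0]))   # indices of the neighbors of x_0
print(np.flatnonzero(R_large[1]))   # a larger radius can only add neighbors
```

Row `i` of the returned matrix plays the role of δ_A (x_i) in Equation (2), so set operations on neighborhoods become element-wise boolean operations.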

Following Definition 1, the set of all neighborhoods, i.e., {δ_A (x_i) | ∀ x_i ∈ U}, forms the result of information granulation over U based on the information provided by A. Furthermore, suppose that δ ≥ δ′; we then obtain $δ'_{A} \subseteq δ_{A}$ and $δ'_{A} (x_{i}) \subseteq δ_{A} (x_{i})$ for each x_i ∈ U. Such a case says that the multi-granularity structure induced by multiple radii is strictly nested. By the above definition of neighborhood, Hu et al. [21] have presented the following definitions of neighborhood lower and upper approximations which are expansions of Pawlak’s rough set [37].

Definition 2. Given a decision system $D$, δ ≥ 0 is a radius, ∀A ⊆ AT, the neighborhood lower and upper approximations of d related to A can be defined as

$\underline{δ_{A}} (d) = \bigcup_{p = 1}^{q} \underline{δ_{A}} (X_{p}),$ (3)

$\bar{δ_{A}} (d) = \bigcup_{p = 1}^{q} \bar{δ_{A}} (X_{p}),$ (4)

where for each X_p ∈ U/IND (d),

$\underline{δ_{A}} (X_{p}) = \{x_{i} \in U \mid δ_{A} (x_{i}) \subseteq X_{p}\},$ (5)

$\bar{δ_{A}} (X_{p}) = \{x_{i} \in U \mid δ_{A} (x_{i}) \cap X_{p} \neq \emptyset\} .$ (6)

2.2 Relative measures

Following Section 2.1, one of the important tasks is to sketch the power of characterizing decisions by condition attributes. In view of this, various measures have been explored with respect to different requirements. We mainly concentrate on the three following frequently used measures in the context of this study.

Definition 3. [21] Given a decision system $D$ , δ ≥ 0 is a radius, ∀A ⊆ AT, the dependency of d related to A is

$γ_{A} (d) = \frac{| \underline{δ_{A}} (d) |}{| U |},$ (7)

where |X| denotes the cardinality of the set X.

Dependency shown in Definition 3 is the percentage of those samples that certainly belong to one of the decision classes. A higher value of dependency indicates that the performance of characterizing decision classes by neighborhood is better. Obviously, 0 ≤ γ_A (d) ≤1 holds.
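The dependency of Equation (7) can be computed directly from a neighborhood relation. The sketch below assumes the relation is given as a reflexive boolean matrix `R` (row i marks the neighbors of x_i); the function name `dependency` and the toy inputs are hypothetical. Because every sample belongs to its own neighborhood, x_i lies in the lower approximation of d exactly when all of its neighbors share its label.

```python
import numpy as np

def dependency(R, d):
    """gamma_A(d): fraction of samples whose neighborhood stays in one decision class.

    R is assumed to be a reflexive boolean neighborhood matrix and d the
    label vector; both are illustrative inputs.
    """
    R = np.asarray(R, dtype=bool)
    d = np.asarray(d)
    same_label = d[:, None] == d[None, :]
    # x_i is in the lower approximation iff every neighbor has the label of x_i.
    in_lower = np.all(~R | same_label, axis=1)
    return in_lower.sum() / len(d)

# Toy relation: samples {0, 1} are mutual neighbors, and so are {2, 3}.
R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=bool)
d = np.array([0, 0, 0, 1])

# Samples 0 and 1 have pure neighborhoods; samples 2 and 3 do not.
print(dependency(R, d))  # 0.5
```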

Definition 4. [59] Given a decision system $D$, δ ≥ 0 is a radius, ∀A ⊆ AT, the condition entropy of d related to A is

$E_{A} (d) = - \frac{1}{| U |} \sum_{x_{i} \in U} | δ_{A} (x_{i}) \cap [x_{i}]_{d} | \log \frac{| δ_{A} (x_{i}) \cap [x_{i}]_{d} |}{| δ_{A} |} .$ (8)

Condition entropy shown in Definition 4 measures the discriminating ability of a condition attribute subset A for decision d. The smaller the value of such a condition entropy is, the more powerful the discriminating ability of A for d is. Obviously, 0 ≤ E_A (d) ≤ |U|/e holds.

Definition 5. [46] Given a decision system $D$ , δ ≥ 0 is a radius, ∀A ⊆ AT, the discrimination index of d related to A is

$H_{A} (d) = \log \frac{| δ_{A} |}{| δ_{A} \cap IND (d) |} .$ (9)

Discrimination index shown in Definition 5 characterizes the ability of a condition attribute subset A to distinguish samples with different labels. The smaller the value of discrimination index is, the greater the distinguishing ability of A is. Obviously, 0 ≤ H_A (d) ≤ log |U| holds.
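Equation (9) only needs two pair counts: the number of pairs in the neighborhood relation and the number of those pairs whose samples share a label. A minimal sketch follows, assuming a boolean relation matrix `R`, the natural logarithm (the base is not fixed in the definition), and a hypothetical function name.

```python
import math
import numpy as np

def discrimination_index(R, d):
    """H_A(d) = log(|delta_A| / |delta_A intersect IND(d)|), natural log assumed."""
    R = np.asarray(R, dtype=bool)
    d = np.asarray(d)
    ind_d = d[:, None] == d[None, :]        # IND(d) as a boolean matrix
    n_pairs = int(R.sum())                  # |delta_A|
    n_consistent = int((R & ind_d).sum())   # |delta_A intersect IND(d)|
    return math.log(n_pairs / n_consistent)

R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=bool)
d = np.array([0, 0, 0, 1])

# 8 pairs in the relation, 6 of them join samples with equal labels.
print(discrimination_index(R, d))  # log(8/6), about 0.2877
```

When every neighboring pair shares a label, the two counts coincide and the index is 0, which is the best case for this negative-preference measure.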

2.3 Constraint score

Obviously, the measures presented in section 2.2 can be used to measure the performance of attributes. Nevertheless, they do not take the labels of samples into account. In view of this, the constraint score proposed by Sun et al. [43] is presented as follows, which can quantitatively characterize the distinguishing ability among samples.

Definition 6. Given a decision system $D$ , by the labels of samples in U, a pairwise strong-link constraint is denoted as S = {(x_i, x_j) ∈ U²|d (x_i) = d (x_j)}, a pairwise cannot-link constraint is denoted as N = {(x_i, x_j) ∈ U²|d (x_i) ≠ d (x_j)}, ∀A ⊆ AT, the pairwise constraint score related to A is

$τ (A, d) = \frac{\sum_{(x_{i}, x_{j}) \in S} {dis}_{A} (x_{i}, x_{j})}{\sum_{(x_{i}, x_{j}) \in N} {dis}_{A} (x_{i}, x_{j})} .$ (10)

By Equation (10), a smaller value of τ (A, d) implies a higher ability of distinguishing samples by information provided by attributes in A. The reasons can be attributed to the following facts: (1) a smaller value of τ (A, d) can be obtained by a smaller value of ∑_{(x_i,x_j)∈S}dis_A (x_i, x_j) and a greater value of ∑_{(x_i,x_j)∈N}dis_A (x_i, x_j); (2) if samples in same decision class are as close as possible, then a smaller value of ∑_{(x_i,x_j)∈S}dis_A (x_i, x_j) is generated; if samples in different decision classes are as far as possible, then a greater value of ∑_{(x_i,x_j)∈N}dis_A (x_i, x_j) will be obtained.
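To make Equation (10) concrete, the sketch below computes τ (A, d) for a single attribute on invented toy data. Euclidean distance for dis_A and the name `constraint_score` are assumptions. Ordered pairs are counted in both sums, so each unordered pair appears twice in the numerator and in the denominator, which leaves the ratio unchanged.

```python
import numpy as np

def constraint_score(X, d):
    """tau(A, d): summed distances over same-label pairs divided by
    summed distances over different-label pairs (smaller is better).

    X holds only the columns of A. Euclidean distance is assumed, and the
    cannot-link distances are assumed not to sum to zero.
    """
    X = np.asarray(X, dtype=float)
    d = np.asarray(d)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    same = d[:, None] == d[None, :]          # pairs in S
    return dist[same].sum() / dist[~same].sum()

# Attribute 0 separates the two classes; attribute 1 does not.
X = np.array([[0.0, 0.3],
              [0.1, 0.9],
              [1.0, 0.2],
              [1.1, 0.8]])
d = np.array([0, 0, 1, 1])

score_a0 = constraint_score(X[:, [0]], d)
score_a1 = constraint_score(X[:, [1]], d)
print(score_a0 < score_a1)  # True: attribute 0 is the more discriminative one
```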

3 Meta-heuristic searching to attribute reduction

3.1 Attribute reduction

Similar to the case in conventional rough set, attribute reduction is also a key topic in exploiting neighborhood rough set. Without loss of generality, the objective of attribute reduction is to remove redundant or irrelevant attributes from raw data. Therefore, it can not only reduce the volume of data but also provide qualified attributes for subsequent learning tasks.

Up to now, note that various forms of attribute reduction have been explored through different measures. Nevertheless, most attribute reductions possess similar structures, i.e., a form of defining a minimal subset of attributes that satisfies a measure-based restriction. From this point of view, Yao et al. [57] have presented the following general definition of attribute reduction.

Definition 7. Given a decision system $D$, $C_{ρ}$ is a restriction related to a measure ρ, ∀A ⊆ AT, A is referred to as a $C_{ρ}$-reduct if and only if the following conditions hold:

A achieves the restriction $C_{ρ}$ ;

∀A′ ⊂ A, A′ does not achieve the restriction $C_{ρ}$ .

In Definition 7, the measure ρ can be calculated by dependency γ_A (d), condition entropy E_A (d) and discrimination index H_A (d) shown in Definitions 3, 4 and 5, respectively. Note that γ_A (d) is positive preference while both E_A (d) and H_A (d) are negative preference. We then know that constraint $C_{ρ}$ shown in Definition 7 can be explained by the following two different phases [4]: (1) preserve or increase a positive preference based measure; (2) preserve or decrease a negative preference based measure.
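The two conditions of Definition 7 can be checked by brute force on small attribute sets. The sketch below is only illustrative: it reads the restriction as "the measure on A is at least as good as on the full set AT", which is one common reading of "preserve" and is an assumption here, and it enumerates all proper subsets, which is exponential in |A|. The names `achieves`, `is_reduct` and the toy measure are hypothetical.

```python
from itertools import combinations

def achieves(measure, A, AT, positive=True):
    """Restriction C_rho, read here as: A is at least as good as AT (assumption)."""
    if positive:                       # e.g., dependency: higher is better
        return measure(A) >= measure(AT)
    return measure(A) <= measure(AT)   # e.g., condition entropy: lower is better

def is_reduct(measure, A, AT, positive=True):
    """Check both conditions of Definition 7 by enumerating proper subsets of A."""
    A = frozenset(A)
    if not achieves(measure, A, AT, positive):
        return False
    for size in range(len(A)):         # sizes 0 .. |A| - 1, i.e., proper subsets
        for sub in combinations(A, size):
            if achieves(measure, frozenset(sub), AT, positive):
                return False
    return True

# Toy positive-preference measure: attributes 0 and 1 together are sufficient.
def toy_measure(A):
    return 1.0 if {0, 1} <= set(A) else 0.0

AT = frozenset({0, 1, 2})
print(is_reduct(toy_measure, {0, 1}, AT))     # True
print(is_reduct(toy_measure, {0, 1, 2}, AT))  # False: {0, 1} already suffices
print(is_reduct(toy_measure, {0}, AT))        # False: restriction not achieved
```

The exhaustive check is what makes exact reduct search expensive, which motivates the heuristic strategies discussed next.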

Following Definition 7, how to seek out the corresponding reduct is then a challenge. Up to now, various searching strategies have been developed with respect to different perspectives, e.g., from the perspective of searching efficiency or searching objective. Among those strategies, the fitness function based searching is especially popular because of its lower complexity. Specifically, as a representative way of fitness function based searching, meta-heuristic searching has drawn much attention in recent years. Generally, a meta-heuristic searching can be regarded as the combination of random searching and heuristic searching. For example, the genetic algorithm, forest optimization algorithm, fish swarm algorithm and bat algorithm have been successfully introduced into the search for a reduct.

Notations used in the above algorithms are listed as follows.

At the beginning of these algorithms, a population consisting of k individuals is randomly positioned, which represents the random generation of initial solutions of reducts. For instance, the population is represented by $P = {P_{1}, P_{2}, \dots, P_{k}}$ , and each individual in $P$ is encoded as a binary vector with size |AT|, where 1 denotes the existence of one attribute in the reduct pool while 0 denotes that the corresponding attribute is absent. P_best is the obtained reduct which is the best one in the population.

The fitness function is used to measure the adaptability of individuals in the population. Given a solution, it should be evaluated by using fitness function, which reveals the power or performance of such a solution. A higher value of fitness function implies a better solution has been obtained. Generally, the fitness of the solution can be represented by various ways, e.g., the dependency, condition entropy, discrimination index and so on.

t is a variable that indicates the number of iterations in the procedure of the algorithm. $T$ denotes the maximum number of iterations to perform the algorithm.
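The binary encoding described above can be sketched as follows. This is a minimal illustration, assuming NumPy's random generator for the random initialization; the helper names `random_population` and `decode` are hypothetical.

```python
import numpy as np

def random_population(k, m, rng):
    """k random individuals, each a binary vector of length m = |AT|."""
    return rng.integers(0, 2, size=(k, m))

def decode(individual):
    """Indices of the attributes marked with 1, i.e., the candidate attribute subset."""
    return np.flatnonzero(individual).tolist()

rng = np.random.default_rng(0)
P = random_population(k=4, m=6, rng=rng)

for individual in P:
    # Each row is one candidate solution; decode() lists the selected attributes.
    print(individual, decode(individual))
```

A fitness function would then be evaluated on the decoded subset, for example with the dependency of Definition 3.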

3.2 Genetic algorithm

Genetic Algorithm (GA) is a widely known meta-heuristic searching mechanism. It is inspired by the evolution theory to simulate the natural evolution process [20, 62]. Following this principle, the best individuals are the ones most likely to survive for subsequent reproduction. A population derived by such a way is usually superior to that of the previous generation. Without loss of generality, the generation of a new population is conducted by following three specific operators.

Selection operator. Generally, the selection operator adopts the roulette wheel method. The basic thinking is to make each selected individual’s probability and fitness proportional, that is, a selection strategy based on the fitness ratio. We put these individuals into a line segment ranging from 0 to 1 in proportion, and the length of the line segment represents the selection probability of each individual. Randomly generate a number in the interval [0, 1] and the falling interval represents the selected individual.

Crossover operator. The crossover operator is the core of the genetic algorithm, which directly determines the algorithm’s performance. It is a mechanism in which a new generation is created by exchanging entries (values in binary vector) between two individuals (attribute subsets) based on the result of the previous iteration.

Mutation operator. In the mutation operator, one or more previous individuals (attribute subsets) are selected randomly based on mutation probability, and one or more attributes in those subsets are exchanged. In other words, an attribute is randomly added into or removed from an attribute subset. Therefore, new individuals can also be obtained.

Immediately, we know that in GA, the parent population generates a new offspring population for the next evolution by the above three operators. The following Algorithm 1 shows a detailed pseudo-code of GA.

Algorithm 1 Genetic Algorithm (GA)
Input: A decision system $D$, a restriction $C_{ρ}$, a number of iterations $T$.
Output: One reduct P_best.
1: Initialize the population $P = \{P_{1}, P_{2}, \dots, P_{k}\}$ by random generation;
2: Set crossover and mutation probability;
3: Repeat
4:   Generate new population by selection operator;
5:   Swap pairs of individuals by crossover operator;
6:   Invert certain positions of individuals by mutation operator;
7:   Rank the fitness values of individuals and identify the current best individual P_best;
8: Until $C_{ρ}$ is achieved or $t \geq T$
9: Return P_best.
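The three operators used in Algorithm 1 can be sketched as below. Several details are assumptions rather than choices stated in the text: fitness values are non-negative with higher being better, the crossover is single-point, and the mutation flips each position independently with a given rate. All function names are hypothetical.

```python
import numpy as np

def roulette_select(fitness, rng):
    """Pick one index with probability proportional to its fitness."""
    fitness = np.asarray(fitness, dtype=float)
    edges = np.cumsum(fitness / fitness.sum())   # right ends of the segments on [0, 1]
    idx = int(np.searchsorted(edges, rng.random(), side="right"))
    return min(idx, len(fitness) - 1)            # guard against rounding at the end

def crossover(p1, p2, rng):
    """Single-point crossover (assumed): swap the tails of two binary vectors."""
    point = int(rng.integers(1, len(p1)))        # cut point in 1 .. len - 1
    c1 = np.concatenate([p1[:point], p2[point:]])
    c2 = np.concatenate([p2[:point], p1[point:]])
    return c1, c2

def mutate(individual, rate, rng):
    """Flip each position independently with probability `rate`."""
    flips = rng.random(len(individual)) < rate
    return np.where(flips, 1 - individual, individual)

rng = np.random.default_rng(0)
parents = np.array([[1, 1, 1, 0, 0, 0],
                    [0, 0, 1, 1, 0, 1]])
chosen = roulette_select([0.2, 0.8], rng)
child_a, child_b = crossover(parents[0], parents[1], rng)
print(chosen, child_a, mutate(child_a, 0.1, rng))
```

A flipped position corresponds to adding an attribute to, or removing it from, the encoded attribute subset.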

3.3 Forest optimization algorithm

Forest optimization algorithm (FOA) is a meta-heuristic method motivated by the growth and development of trees in a forest [17]. In FOA, trees use the seeding strategy to implement propagation, which includes the transfer and deployment of the seeds of trees. Generally, there are two representative methods of seeding: (1) local seeding: seeds near the parent trees can grow smoothly; (2) global seeding: sometimes, by considering external factors such as animals, wind and water, seeds travel long distances and are placed in entirely new locations [36]. Based on the explanations of local and global seedings, FOA creates exploration and exploitation mechanisms, which can identify the most appropriate surroundings for tree evolution in the problem space. The detailed phases of FOA are elaborated as follows.

Initializing trees. Similar to other popular meta-heuristic algorithms, the forest is initialized by randomly generating trees. Firstly, each variable of each tree is initialized randomly with either 0 or 1. For instance, if a data possesses |AT| attributes, the size of each tree will be |AT|+1, where the last variable denotes the “Age” of such a tree. Particularly, the “Age” is set to be 0 in the initialization of the algorithm. Note that the local seeding in each iteration of the algorithm will increase the “Age” of all trees except newly generated ones in the local seeding stage.

Local seeding. This stage adds some neighbors of each tree with “Age” 0 into the forest. Some variables are selected randomly (the parameter “LSC” determines the number of the selected variables). Then the values of the selected variables are changed from 0 to 1 or vice versa. After performing the local seeding stage, the “Ages” of all trees, except newly generated ones, are increased by 1, respectively.

Population limiting. In this stage, two kinds of trees are removed from the forest to form the candidate population: (i) trees whose “Age” exceeds a parameter called “life time”; (ii) the extra trees that exceed the “area limit” parameter after sorting the trees by their fitness values. A pre-defined percentage of the candidate population is then used in the following global seeding.

Global seeding. To perform the global seeding in FOA, for each tree in the candidate population, some of the variables are selected randomly. The number of the selected variables is determined by a parameter called “GSC”. Immediately, the value of each selected variable will be changed from 0 to 1 or vice versa.

Updating the best tree. In this stage, after ranking the trees by their fitness values, the tree with the best fitness value will be identified as the best one and its “Age” will be set to 0.

Note that stages (2)-(5) are performed iteratively until the stop condition is satisfied, i.e., the objective of attribute reduction is satisfied. The following Algorithm 2 presents a detailed process of FOA for attribute reduction.

Algorithm 2 Forest Optimization Algorithm (FOA)
Input: A decision system $D$, a restriction $C_{ρ}$, a number of iterations $T$.
Output: One reduct P_best.
1: Initialize the forest $P = \{P_{1}, P_{2}, \dots, P_{k}\}$ by random generation;
2: Set “LSC”, “GSC”, transfer rate, area limit, life time;
3: Initialize “Age” of each tree as 0;
4: Repeat
5:   For i = 1 : “LSC” do
6:     Randomly choose a position of the selected tree;
7:     Change the value of such position from 0 to 1 or vice versa;
8:   End For
9:   Increase the “Age” of all trees by 1;
10:   Limit the size of population by area limit and life time;
11:   Global seeding of candidate population;
12:   Choose transfer rate percent of the candidate population;
13:   For “LSC” do
14:     Randomly choose “GSC” positions of the selected tree;
15:     Change the value of the position from 0 to 1 or vice versa;
16:   End For
17:   Update the best so far tree;
18:   Rank the trees and find the current P_best;
19:   Set the “Age” of the best tree to 0;
20: Until $C_{ρ}$ is achieved or $t \geq T$
21: Return P_best.
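The two seeding stages of FOA both amount to flipping randomly chosen positions of a binary tree. The sketch below is one possible reading, stated as an assumption: local seeding creates “LSC” neighbors that each differ from the parent in a single position, while global seeding flips “GSC” positions of one tree at once. The “Age” variable is left out, and the function names are hypothetical.

```python
import numpy as np

def local_seeding(tree, lsc, rng):
    """Return `lsc` neighbors of `tree`, each differing in one random position."""
    positions = rng.choice(len(tree), size=lsc, replace=False)
    neighbors = np.tile(tree, (lsc, 1))            # lsc copies of the parent
    neighbors[np.arange(lsc), positions] ^= 1      # flip one position per copy
    return neighbors

def global_seeding(tree, gsc, rng):
    """Return a copy of `tree` with `gsc` random positions flipped at once."""
    positions = rng.choice(len(tree), size=gsc, replace=False)
    seeded = tree.copy()
    seeded[positions] ^= 1
    return seeded

rng = np.random.default_rng(1)
tree = np.array([1, 0, 1, 0, 0, 1])                # attribute part of one tree
print(local_seeding(tree, lsc=2, rng=rng))         # two close neighbors
print(global_seeding(tree, gsc=3, rng=rng))        # one distant candidate
```

Small flips keep the search near a good tree (exploitation), whereas the larger global flips move candidates to new regions of the search space (exploration).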

3.4 Fish swarm algorithm

Fish swarm algorithm (FSA) is an optimization method that simulates the food-seeking behavior of fish [5, 60]. The Hamming distance between two fish indicates their relationship. Besides, there are some other parameters including the visual distance of fish, the maximum step length and the crowd factor. In FSA, all fish try to identify locations that satisfy their food requirements by using the following three distinct behaviors: preying, swarming and following.

Preying behavior. In FSA, preying is the fundamental biological behavior of looking for food. It can be simulated by random searching with a tendency toward food concentration. In such a procedure, each artificial fish in the initial population searches within its exploration range, called the visual, for a qualified position where food resources are richer. Generally speaking, we randomly give the fish a new position within its visual. If the fitness of the new position is better than that of the current one, then the fish moves to such a new position. To avoid excessive time consumption, note that if the condition is not satisfied after numerous tries, then a random position within the step range will be directly given to the fish.

Swarming behavior. In the natural environment, fish prefer to assemble in several swarms, which aims to minimize dangers. The common objectives of swarm include satisfying food intake requirements and attracting new population members. In FSA, the swarm consists of fish within the current fish’s visual. The central position of a swarm is then calculated by the arithmetic average of all swarm members’ positions over each dimension. If the swarm center possesses a better fitness value than the current positions of fish and the swarm center is not overly crowded, then fish will move from the current positions to the next positions toward the center. Otherwise, preying behavior is employed to identify the next position for fish.

Following behavior. Based on the consideration of visual, some fish may be able to find a greater amount of food than others can. Therefore, fish will naturally try to follow the best one to gain relatively more food and less crowding. In other words, in FSA, if a fish in the neighborhood can find a better fitness value by comparing with its current position, and if the swarm of the neighbor is not overly crowded, then it will move towards the position of this neighbor. It should be emphasized that the preying behavior commences if the following behavior is not able to determine the next position of a fish.

Based on the fish behaviors described above, Algorithm 3 shows the pseudocode of FSA to search for a reduct.

Algorithm 3 Fish Swarm Algorithm (FSA)
Input: A decision system $D$, a restriction $C_{ρ}$, a number of iterations $T$.
Output: One reduct P_best.
1: Initialize the population $P = \{P_{1}, P_{2}, \dots, P_{k}\}$ by random generation;
2: Define the length of moving step, the crowd factor, the trying number and the field of vision;
3: Repeat
4:   For ind = 1 : k do
5:     If the swarm of the center is not overly crowded then
6:       Get new position P_α by swarming;
7:     Else
8:       Get new position P_α by preying;
9:     End If
10:     If the swarm of the neighbor is not overly crowded then
11:       Get new position P_β by following;
12:     Else
13:       Get new position P_β by preying;
14:     End If
15:     Calculate the fitness values of both P_α and P_β;
16:     If the fitness value of P_α is superior to that of P_β then
17:       P_ind = P_α;
18:     Else
19:       P_ind = P_β;
20:     End If
21:     Rank the fitness values of all individuals and find the current best individual P_best;
22:   End For
23: Until $C_{ρ}$ is satisfied or $t \geq T$
24: Return P_best.
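The swarm used in the swarming behavior can be sketched with the Hamming distance on binary individuals. The following is a minimal illustration under stated assumptions: the swarm contains every fish within the visual of the current one (the current fish included), and the per-dimension arithmetic average is rounded back to a binary vector with a 0.5 threshold, a step the text does not specify. Function names are hypothetical.

```python
import numpy as np

def hamming(a, b):
    """Number of positions in which two binary individuals differ."""
    return int(np.sum(np.asarray(a) != np.asarray(b)))

def swarm_center(population, ind, visual):
    """Center of the swarm seen by fish `ind`, plus the swarm size.

    Rounding the average at 0.5 is an assumption of this sketch.
    """
    population = np.asarray(population)
    dists = np.sum(population != population[ind], axis=1)
    members = population[dists <= visual]          # fish within the visual
    mean = members.mean(axis=0)                    # average over each dimension
    return (mean >= 0.5).astype(int), len(members)

population = np.array([[1, 1, 0, 0],
                       [1, 1, 1, 0],
                       [1, 0, 0, 0],
                       [0, 0, 1, 1]])
center, size = swarm_center(population, ind=0, visual=1)
print(center, size)  # [1 1 0 0] 3
```

The swarm size can then be compared against the crowd factor to decide whether the center is overly crowded.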

3.5 Bat algorithm

Bat Algorithm (BA) is an optimization algorithm based on the echolocation behavior of bats. In the world of nature, bats identify their targets by varying the loudness and pulse rate. Bats emit a very loud sound pulse and then listen for the echo that bounces back from the surrounding objects. The correlation between the bounced and the reflected pulses determines the obstacle type. Their pulses vary in properties and can be correlated with their hunting strategies, depending on the species. In BA, bats adjust their positions by using the following mechanisms.

Update velocity, position and frequency. Firstly, some parameters such as the position P_ind, velocity V_ind, and frequency F_ind are initialized for each bat. The movement of the virtual bats is given by updating their velocity and position through the following equations:

$V_{ind} (t + 1) = V_{ind} (t) + (P_{ind} (t) - P_{best}) F_{ind},$ (11)

$P_{ind} (t + 1) = P_{ind} (t) + V_{ind} (t) + V_{ind} (t + 1),$ (12)

$F_{ind} = F_{\min} + (F_{\max} - F_{\min}) θ,$ (13)

in which F_ind is the updated frequency and θ is a random number in [0, 1].

Random flight. A strategy called random walk can be employed to improve the variability of the possible solutions. That is, one best bat is first selected among the current solutions, and then the random walk is applied to generate a new solution according to the following equation:

$p_{new} = p_{old} + ɛ ψ^{t},$ (14)

where ψ^t is the average loudness of all bats at time t, and ɛ is a random number in [-1, 1]. To select the optimal solutions from the given search space, it is necessary to allow the bat to walk through the entire search space.

Algorithm 4 shows the pseudocode of the bat algorithm to search for a reduct.

Algorithm 4 Bat Algorithm (BA)
Input: A decision system $D$, a restriction $C_{ρ}$, a number of iterations $T$.
Output: One reduct P_best.
1: Initialize the population $P = \{P_{1}, P_{2}, \dots, P_{k}\}$ by random generation;
2: Initialize pulse rates, frequency, velocity and loudness;
3: Repeat
4:   Generate new solutions by adjusting frequency, updating velocities and positions;
5:   If rand > pulse rates then
6:     Select a solution among the best solutions randomly;
7:     Generate a local solution around the selected best solution;
8:   End If
9:   Generate a new solution P_ind by random flight;
10:   If rand < loudness and the fitness value of P_best is superior to that of P_ind then
11:     Accept the new solution;
12:     Increase pulse rates and reduce loudness;
13:   End If
14:   Rank the bats and find the current P_best;
15: Until $C_{ρ}$ is satisfied or $t \geq T$
16: Return P_best.
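One movement step of a single bat, following Equations (11)-(13) as printed, can be sketched as below. Positions and velocities are treated as real-valued vectors; how a real-valued position is mapped back to a binary attribute vector is not described in the text and is left out. The function name `bat_step` is hypothetical, and θ is passed in explicitly so that the step is reproducible.

```python
import numpy as np

def bat_step(P, V, P_best, f_min, f_max, theta):
    """One update of a single bat, term by term as in Equations (11)-(13)."""
    F = f_min + (f_max - f_min) * theta      # Eq. (13): frequency
    V_new = V + (P - P_best) * F             # Eq. (11): velocity
    P_new = P + V + V_new                    # Eq. (12): position, as printed
    return P_new, V_new, F

P = np.array([1.0, 0.0])        # current position of the bat
V = np.array([0.0, 0.0])        # current velocity
P_best = np.array([0.0, 0.0])   # best position found so far

# In the algorithm, theta would be drawn uniformly from [0, 1].
P_new, V_new, F = bat_step(P, V, P_best, f_min=0.0, f_max=2.0, theta=0.5)
print(F, V_new, P_new)  # 1.0 [1. 0.] [2. 0.]
```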

4 The proposed algorithm

As addressed in Section 3.1, each individual in the population represents a reduct pool, and each position in the individual represents an attribute. If an attribute is identified and added into the corresponding reduct pool, the position of such an attribute in the individual will be set to 1; otherwise, it will be set to 0.

In a conventional meta-heuristic algorithm, the initial population is generated randomly. Such a random initialization strategy has the advantage of ensuring diversity in the population. However, some limitations emerge in the problem solving of attribute reduction, which can be illustrated as follows.

The performance of a meta-heuristic algorithm is especially influenced by the quality of the initial population. For example, though a random initialization strategy can ensure the diversity of individuals, the initial individuals may be equipped with poor fitness values due to the randomness mechanism. Consequently, those individuals, i.e., the candidate solutions or the so-called reduct pools, are modified constantly in the whole process of the meta-heuristic algorithm. From this point of view, an initial population with poor fitness values of individuals will inevitably influence the searching efficiency of an algorithm. Furthermore, it should also be emphasized that most common adjustments are realized by using the random addition or removal of attributes, e.g., the mutation operator, “LSC” parameter, “GSC” parameter, etc., which also limit the individuals’ movement within a certain range.

To fill the above gaps, in this research, a framework called constraint-guided meta-heuristic searching is developed for the problem solving of attribute reduction. In such a framework, the basic idea of the constraint score shown in Section 2.3 is employed. Generally, our method is inspired by the following facts.

The purpose of designing a constraint guidance is to present a new initialization strategy, which makes the initial population possess good quality. In other words, not only to ensure the diversity of individuals in population, but also to make the individuals in the initial population better.

The importance of attributes is evaluated by a local view instead of the global view of a constraint score. In other words, not only a new form of constraint score is defined, but also such a new form of constraint score is equipped with the distinguishing ability which is similar to that described in section 2.3.

From the discussions above, the main principle of our strategy is:

Before carrying out the meta-heuristic search procedure, a local constraint score which is related to each label in the data is defined. Such a local constraint score reflects the ability of an attribute to distinguish the samples belonging to a specific label from the remaining samples.

Definition 8. Given a decision system $D$, ∀l_p ∈ L, the l_p-related pairwise strong-link constraint is given by $S^{l_{p}} = \{(x_{i}, x_{j}) \in U^{2} \mid d (x_{i}) = d (x_{j}) = l_{p}\}$, and the l_p-related pairwise cannot-link constraint is given by $N^{l_{p}} = \{(x_{i}, x_{j}) \in U^{2} \mid d (x_{i}) = l_{p}, d (x_{j}) \neq l_{p}\}$. ∀a_r ∈ AT, the local pairwise constraint score related to label l_p over attribute a_r is given by $τ (a_{r}, l_{p}) = \frac{\sum_{(x_{i}, x_{j}) \in S^{l_{p}}} (a_{r} (x_{i}) - a_{r} (x_{j}))^{2}}{\sum_{(x_{i}, x_{j}) \in N^{l_{p}}} (a_{r} (x_{i}) - a_{r} (x_{j}))^{2}} .$ (15)

Similar to Equation (10), by Equation (15), a smaller value of τ (a_r, l_p) implies a greater ability of distinguishing samples related to label l_p over the attribute a_r. It is then straightforward to obtain a matrix M_τ = {τ (a_r, l_p) |1 ≤ r ≤ m, 1 ≤ p ≤ q}. Through such a constraint score based matrix, the discriminating ability of each attribute for each label can be measured. Following M_τ, for each label l_p, the attribute with the smallest value of local constraint score is identified and added into the reduct pool, because the smallest value of local constraint score indicates superior distinguishability. This process is regarded as the constraint guidance based initialization in this study.

Obviously, in our initialization strategy, the number of labels plays a key role, and it directly affects the quality of the initial population. From this point of view, a specific limit parameter called pseudo_threshold is adopted. It extends the number of labels in data, that is, a specified number of labels L′ = {l_p|1 ≤ p ≤ q′} (q′ ≥ 2) is generated by the pseudo-label strategy [54], which avoids the difficulty of insufficient labels in raw data.

Algorithm 5 Constraint score guided initialization of population
Input: A decision system $D$ , pseudo_threshold q′.
Output: One population P.
1: Initialize the population $P = \{P_{1} = \emptyset, P_{2} = \emptyset, \dots, P_{k} = \emptyset\}$ and a constraint set B = ∅;
2: By the number of pseudo_threshold q′, calculate the pseudo labels of samples in $D$;
3: For r = 1 : m do
4: For p = 1 : q′ do
5: Compute the τ (a_r, l_p);
6: End For
7: End For
8: For p = 1 : q′ do
9: Rank the attributes in AT by ascending order of τ (a_r, l_p);
10: Identify the attribute a_z = arg min {τ (a_r, l_p) | 1 ≤ r ≤ m} and add a_z into set B;
11: End For
12: For s = 1 : k do
13: P_s = B;
14: Randomly identify some attributes from AT - B and add them into P_s;
15: End For
16: Return $P$.

The above algorithm elaborates our initialization strategy: the constraint score matrix is used to identify qualified attributes, which are regarded as the foundation of the individuals in the population. In other words, the constraint score offers a preprocessing of the population before the search. Consequently, not only is the evaluation of attributes more comprehensive because the local view of the discrimination ability of attributes is taken into account, but the searching space of the population is also compressed. Note that in steps 12-15, a random mechanism is also employed, which further expands the individuals and thereby helps to maintain the diversity among them.
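Steps 8-15 of Algorithm 5 can be sketched as follows, assuming the score matrix has already been computed. The function name `init_population`, the parameter `n_extra` (how many random attributes are added to each individual, which Algorithm 5 leaves unspecified), the zero-based attribute indices and the score values in the usage lines are all hypothetical choices made for this illustration.

```python
import numpy as np

def init_population(scores, k, n_extra, rng):
    """Build k individuals from a score matrix.

    scores: array of shape (q', m) with scores[p, r] = tau(a_r, l_p).
    Returns (B, population); every individual is a sorted list of attribute indices.
    """
    q, m = scores.shape
    # Steps 8-11: for each label, keep the attribute with the smallest score.
    base = sorted({int(np.argmin(scores[p])) for p in range(q)})
    rest = [r for r in range(m) if r not in base]          # AT - B
    population = []
    for _ in range(k):
        # Steps 12-15: start from B, then add randomly chosen attributes from AT - B.
        n = min(n_extra, len(rest))
        extra = rng.choice(rest, size=n, replace=False) if n > 0 else []
        population.append(sorted(base + [int(r) for r in extra]))
    return base, population

# Usage with a made-up 2 x 4 score matrix (values are illustrative only).
rng = np.random.default_rng(0)
scores = np.array([[0.5, 0.1, 0.9, 0.3],
                   [0.2, 0.8, 0.05, 0.4]])
B, P = init_population(scores, k=3, n_extra=1, rng=rng)
```

Every individual shares the attributes in B, while the randomly added attributes are what differentiates the individuals from one another.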

The time complexity of Algorithm 5 mainly comprises three components: calculating the constraint matrix in steps 3-7 over each label and each attribute costs $O (| AT | | L^{'} |)$; ranking the attributes in steps 8-11 costs $O (| L^{'} | \log | L^{'} |)$; randomly identifying attributes in steps 12-15 costs $O (k)$, where k = |P|. Therefore, the overall time complexity of Algorithm 5 is $O (| AT | | L^{'} | + | L^{'} | \log | L^{'} | + | P |)$.

The following Fig. 1 visually illustrates the process of Algorithm 5.

Fig. 1

A general framework of population initialization guided by constraint score.

To facilitate the further understanding of our initialization strategy, a toy example is presented as follows.

Example 1. Given a decision system shown in Table 1, U = {x₁, x₂, x₃, x₄, x₅, x₆}, AT = {a₁, a₂, a₃, a₄, a₅, a₆}. The set of labels in Table 1 is recorded by d and L = {l₁, l₂}. If only the raw labels in data are used, we obtain U/IND (d) = {X₁, X₂} = {{x₁, x₂, x₃} , {x₄, x₅, x₆}}.

Table 1

An example of decision system

	a₁	a₂	a₃	a₄	a₅	a₆	L
x₁	0.3017	0.0019	0.3059	0.2288	0.1348	0.3653	l₁
x₂	0.1017	0.1251	0.0947	0.1921	0.2405	0.3604	l₁
x₃	0.0316	0.1323	0.0807	0.1088	0.3405	0.2503	l₁
x₄	0.3319	0.2519	0.3013	0.3217	0.2701	0.1148	l₂
x₅	0.3640	0.4183	0.3124	0.3861	0.4462	0.3295	l₂
x₆	0.4516	0.4421	0.3543	0.5486	0.3555	0.2150	l₂

Calculate the constraint matrix derived by each label and each attribute via Definition 8, then we can obtain $M_{τ} = \begin{array}{c|cccccc} & a_{1} & a_{2} & a_{3} & a_{4} & a_{5} & a_{6} \\ \hline l_{1} & 0.0710 & 0.1564 & 0.0278 & 0.2593 & 0.3929 & 0.7106 \\ l_{2} & 0.3636 & 0.0782 & 0.5665 & 0.0717 & 0.5361 & 0.2603 \end{array}$

Following the above result, for label l₁, we have τ (a₃, l₁) < τ (a₁, l₁) < τ (a₂, l₁) < τ (a₄, l₁) < τ (a₅, l₁) < τ (a₆, l₁). By Algorithm 5, attribute a₃ satisfies τ (a₃, l₁) = min {τ (a_r, l₁) |1 ≤ r ≤ 6}, and it should be added into set B. Similarly, for label l₂, attribute a₄ is identified and added into set B. We finally obtain B = {a₃, a₄}.
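The selection in Example 1 can be reproduced numerically from Table 1 with a short sketch of Equation (15). The function name `local_constraint_scores` and the integer coding of the labels are assumptions for illustration; the sums run over ordered pairs in U², as in Definition 8. The check is limited to the selected set B and to the two smallest scores.

```python
import numpy as np

# Attribute values of x1..x6 over a1..a6, copied from Table 1.
X = np.array([
    [0.3017, 0.0019, 0.3059, 0.2288, 0.1348, 0.3653],
    [0.1017, 0.1251, 0.0947, 0.1921, 0.2405, 0.3604],
    [0.0316, 0.1323, 0.0807, 0.1088, 0.3405, 0.2503],
    [0.3319, 0.2519, 0.3013, 0.3217, 0.2701, 0.1148],
    [0.3640, 0.4183, 0.3124, 0.3861, 0.4462, 0.3295],
    [0.4516, 0.4421, 0.3543, 0.5486, 0.3555, 0.2150],
])
y = np.array([1, 1, 1, 2, 2, 2])   # assumed integer coding of the two labels

def local_constraint_scores(X, y):
    """Return (labels, scores) with one row of scores per label, one column per attribute."""
    labels = np.unique(y)
    scores = np.empty((len(labels), X.shape[1]))
    for p, lab in enumerate(labels):
        inside, outside = X[y == lab], X[y != lab]
        # Numerator: ordered pairs (x_i, x_j) with d(x_i) = d(x_j) = l_p.
        same = ((inside[:, None, :] - inside[None, :, :]) ** 2).sum(axis=(0, 1))
        # Denominator: pairs with d(x_i) = l_p and d(x_j) != l_p.
        diff = ((inside[:, None, :] - outside[None, :, :]) ** 2).sum(axis=(0, 1))
        scores[p] = same / diff
    return labels, scores

labels, M = local_constraint_scores(X, y)
B = sorted({int(np.argmin(row)) for row in M})   # zero-based attribute indices
print([f"a{r + 1}" for r in B])                   # ['a3', 'a4']
```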

Following Algorithm 5 which initializes the population, the following Algorithm 6 is then designed to further expand the selection of qualified attributes until the restriction in attribute reduction is achieved.

Algorithm 6 Constraint score guided meta-heuristic searching to attribute reduction
Input: A decision system $D$ , pseudo_threshold q′.
Output: One reduct P_best.
1: Initialize the population by Algorithm 5;
2: Following the initialized population, seek out a reduct P_best through a specific meta-heuristic searching;
3: Return P_best.

The time complexity of Algorithm 6 mainly comprises two components: the cost of initializing the population by Algorithm 5, which has been calculated above, is $O (| AT | | L^{'} | + | L^{'} | \log | L^{'} | + | P |)$; the cost of meta-heuristic searching is $O (| T | (| P | + | P | \log | P |))$. Therefore, the overall time complexity of Algorithm 6 is $O (| AT | | L^{'} | + | L^{'} | \log | L^{'} | + (1 + | T | (1 + \log | P |)) | P |)$.
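The plug-in structure of Algorithm 6 can be sketched as below. The search step shown here is a deliberately simple single-attribute toggle, used only as a hypothetical stand-in for GA, FOA, FSA or BA; it is not one of the four algorithms of Section 3. The names `cg_search`, `fitness` and `constraint_met`, and the toy target in the usage lines, are assumptions for illustration.

```python
import random

def cg_search(population, n_attrs, fitness, constraint_met, max_iter, rng):
    """Skeleton of Algorithm 6 with a placeholder search in step 2.

    population: iterable of attribute-index collections (e.g. output of Algorithm 5).
    fitness: callable on a set of attribute indices; larger is better (assumed).
    """
    population = [set(p) for p in population]      # step 1: initialized population
    best = max(population, key=fitness)
    for _ in range(max_iter):                      # step 2: placeholder search loop
        if constraint_met(best):
            break
        for i, ind in enumerate(population):
            cand = set(ind)
            cand ^= {rng.randrange(n_attrs)}       # add or remove one random attribute
            if fitness(cand) >= fitness(ind):      # keep the candidate if not worse
                population[i] = cand
        best = max(population + [best], key=fitness)
    return best                                    # step 3

# Toy usage: the "constraint" is met once the hypothetical target set is found.
target = {0, 1, 2}
result = cg_search(
    population=[{0, 1}, {0, 1, 5}],
    n_attrs=6,
    fitness=lambda s: -len(s ^ target),
    constraint_met=lambda s: s == target,
    max_iter=500,
    rng=random.Random(0),
)
```

Because the initialization and the search are separate arguments, any of the four meta-heuristic searchings could in principle be substituted for the placeholder loop without touching the initialization.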

Note that if the meta-heuristic searching used in Algorithm 6 is the genetic algorithm, forest optimization algorithm, fish swarm algorithm or bat algorithm, then the corresponding searching is named “CG-GA”, “CG-FOA”, “CG-FSA” or “CG-BA”, respectively (“CG” is short for “Constraint Guide”).

5 Experimental analyses

5.1 Data descriptions

In this section, to validate the effectiveness of our framework, we evaluate it on 20 datasets. The details of these datasets are presented in Table 2. The experiments are implemented in Matlab R2017b. All the experiments have been carried out on a personal computer with Windows 10, Intel Core i5-8265U CPU (1.60 GHz) and 16.00 GB memory.

Table 2
Data description

ID	Datasets	# Samples	# Attributes	# Labels	Domain
1	Breast Cancer Wisconsin (Diagnostic)	569	30	2	Biology
2	CLL_SUB_111	111	11340	3	Computer
3	Connectionist Bench (Sonar, Mines vs. Rocks)	208	60	2	Physics
4	Dermatology	366	34	6	Biology
5	Forest Type Mapping	523	27	4	Geography
6	Leukemia	72	5327	3	Biology
7	Libras Movement	360	90	15	Astronomy
8	LSVT Voice Rehabilitation	126	256	10	Computer
9	Madelon	2600	500	2	Artificiality
10	MLL	72	12582	3	Biology
11	Musk (Version 1)	476	166	2	Medicine
12	ORL	400	1024	40	Biology
13	Parkinson Multiple Sound Recording	1208	27	2	Medicine
14	SPECTF Heart	267	44	2	Biology
15	SRBCT	83	2308	4	Biology
16	Statlog (Australian Credit Approval)	690	14	2	Financial
17	Statlog (Landsat Satellite)	6435	36	7	Geography
18	TOX_171	171	5748	4	Biology
19	Urban Land Cover	675	147	9	Geography
20	Waveform Database Generator (Version 1)	5000	21	3	Computer

5.2 Experiment configurations

In the experiments, 10-fold cross-validation is used to calculate reducts. This mechanism divides all samples into 10 disjoint groups. In each round of computation, one group of samples is chosen as the test data to perform classification, and the combination of the remaining samples is regarded as the training data to derive reducts by different algorithms. The above process is repeated 10 times, and the mean values of the compared metrics are recorded.
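The splitting scheme can be sketched as follows. The function name `ten_fold_indices`, the random shuffling before splitting and the fixed seed are assumptions for illustration; the text above only states that the samples are divided into 10 disjoint groups.

```python
import numpy as np

def ten_fold_indices(n_samples, w=10, seed=0):
    """Yield (train, test) index arrays for w-fold cross-validation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)       # shuffling is an assumption
    folds = np.array_split(perm, w)         # w disjoint groups U_1, ..., U_w
    for g in range(w):
        test = folds[g]                     # U_g is used for classification
        train = np.concatenate([folds[h] for h in range(w) if h != g])  # U - U_g
        yield train, test

# Usage with the sample count of dataset 1 in Table 2 (569 samples).
splits = list(ten_fold_indices(569))
```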

In this experiment, 10 different radii are used: 0.04, 0.08, ⋯, 0.40. To avoid adding too many redundant attributes into reducts, the length of each individual in the initial population is set to 10% × |AT| when the 4 meta-heuristic searchings shown in Section 3 are employed.

5.3 Comparisons of stabilities of reducts

In this section, the stabilities of reducts derived by different types of searching will be compared. Given a decision system $D$, suppose that the universe U is divided into w (10-fold cross-validation is used in this study, thus w = 10) disjoint groups of the same size, U₁, U₂, ⋯ , U_w. The stability of reduct is calculated by ${Sta}_{R} = \frac{2}{w \cdot (w - 1)} \sum_{g = 1}^{w - 1} \sum_{g' = g + 1}^{w} \frac{| A_{g} \cap A_{g'} |}{| A_{g} \cup A_{g'} |},$ (16)

in which A_g is the reduct derived over U - U_g.

Obviously, Sta_R ∈ [0, 1] holds. If A_g ∩ A_g′ = ∅ for every pair of reducts, then Sta_R achieves the minimal value 0, which means that the obtained reduct is completely unstable. If A_g = A_g′ for every pair of reducts, then Sta_R achieves the maximal value 1, which means that the obtained reduct is completely stable. The detailed comparisons are reported in Tables 3–5.
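Equation (16) is the average Jaccard similarity over all pairs of reducts, which can be sketched as below. The function name `reduct_stability` is hypothetical, and treating two empty reducts as identical (similarity 1) is an assumption, since Equation (16) does not define that case.

```python
from itertools import combinations

def reduct_stability(reducts):
    """Average pairwise Jaccard similarity of w reducts, as in Equation (16)."""
    reducts = [set(a) for a in reducts]
    w = len(reducts)
    total = 0.0
    for a, b in combinations(reducts, 2):      # all pairs with g < g'
        union = a | b
        # Assumption: two empty reducts count as fully similar.
        total += len(a & b) / len(union) if union else 1.0
    return 2.0 * total / (w * (w - 1))

# Three folds: pairwise similarities are 1/3, 1 and 1/3, so the mean is 5/9.
value = reduct_stability([{1, 2}, {2, 3}, {1, 2}])
```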

Table 3

Stabilities of reducts w.r.t. dependency

ID	CG-GA	GA	CG-FOA	FOA	CG-FSA	FSA	CG-BA	BA
1	0.7632	0.7483	0.7831	0.7627	0.7758	0.7956	0.8102	0.7817
2	0.4595	0.4172	0.4677	0.5105	0.4660	0.4416	0.5026	0.4364
3	0.0049	0.0027	0.0090	0.0026	0.0067	0.0027	0.0099	0.0027
4	0.2993	0.2286	0.3874	0.3778	0.3356	0.2355	0.4302	0.2298
5	0.6228	0.5974	0.6306	0.6349	0.6218	0.6073	0.6764	0.5822
6	0.0331	0.0032	0.0296	0.0033	0.0311	0.0029	0.0298	0.0033
7	0.1139	0.1034	0.2229	0.2112	0.1259	0.1000	0.3233	0.1065
8	0.1177	0.0567	0.1171	0.0629	0.1156	0.0568	0.1279	0.0573
9	0.1690	0.0580	0.1659	0.0515	0.1604	0.0517	0.1725	0.0581
10	0.0054	0.0023	0.0067	0.0022	0.0043	0.0020	0.0067	0.0029
11	0.1666	0.0694	0.2750	0.1602	0.1903	0.0706	0.1632	0.0744
12	0.1909	0.0042	0.0841	0.0047	0.1864	0.0038	0.1865	0.0049
13	0.7207	0.7049	0.7567	0.7293	0.7299	0.7021	0.7388	0.7048
14	0.1285	0.0972	0.2212	0.1641	0.1451	0.0938	0.1244	0.1012
15	0.3092	0.2843	0.3754	0.3701	0.3061	0.2827	0.2903	0.2818
16	0.1134	0.0033	0.0435	0.0036	0.1091	0.0032	0.1077	0.0020
17	0.3540	0.1038	0.2531	0.1248	0.3387	0.0965	0.3088	0.1320
18	0.0348	0.0044	0.0058	0.0019	0.0337	0.0032	0.0336	0.0031
19	0.1346	0.0707	0.2182	0.1340	0.1320	0.0737	0.1393	0.0772
20	0.6064	0.5290	0.5790	0.5928	0.6253	0.5166	0.5871	0.2686
AVG	0.2674	0.2045	0.2816	0.2452	0.2720	0.2071	0.2884	0.1955
	↑ 30.76%		↑ 14.85%		↑ 31.34%		↑ 47.52%

Table 4

Stabilities of reducts w.r.t. condition entropy

ID	CG-GA	GA	CG-FOA	FOA	CG-FSA	FSA	CG-BA	BA
1	0.8895	0.8877	0.8888	0.8944	0.9127	0.8983	0.8979	0.8917
2	0.6250	0.6039	0.6387	0.6398	0.6180	0.5812	0.6408	0.6120
3	0.0072	0.0020	0.0055	0.0023	0.0062	0.0034	0.0065	0.0031
4	0.4054	0.2891	0.3926	0.3124	0.4545	0.2754	0.4545	0.2754
5	0.7654	0.7509	0.7564	0.7484	0.7558	0.7259	0.7873	0.7436
6	0.0344	0.0023	0.0104	0.0042	0.0332	0.0036	0.0358	0.0031
7	0.3303	0.1934	0.2889	0.2739	0.3376	0.1945	0.3141	0.2047
8	0.1264	0.0593	0.1188	0.0550	0.1271	0.0543	0.1396	0.0572
9	0.1730	0.0570	0.1699	0.0557	0.1714	0.0527	0.1685	0.0549
10	0.0062	0.0030	0.0060	0.0031	0.0066	0.0020	0.0061	0.0025
11	0.1840	0.0839	0.1960	0.0777	0.1793	0.0848	0.1804	0.0934
12	0.2196	0.0099	0.2072	0.1715	0.2144	0.0065	0.2160	0.0056
13	0.8496	0.8371	0.8637	0.8426	0.8622	0.8376	0.8482	0.8310
14	0.1365	0.1193	0.1330	0.1098	0.1353	0.1249	0.1419	0.1151
15	0.3487	0.3356	0.3736	0.3561	0.3406	0.3306	0.3689	0.3459
16	0.0659	0.0015	0.0435	0.0056	0.0635	0.0024	0.0616	0.0020
17	0.8095	0.7836	0.8192	0.8127	0.8217	0.7825	0.8198	0.7933
18	0.0176	0.0031	0.0125	0.0017	0.0174	0.0025	0.0185	0.0024
19	0.1418	0.0796	0.1649	0.0831	0.1292	0.0887	0.1448	0.0826
20	0.7046	0.7055	0.7056	0.7117	0.7130	0.6665	0.7189	0.6905
AVG	0.3420	0.2904	0.3398	0.3081	0.3450	0.2859	0.3485	0.2905
	↑ 17.77%		↑ 10.29%		↑ 20.67%		↑ 19.97%

Table 5

Stabilities of reducts w.r.t. discrimination index

ID	CG-GA	GA	CG-FOA	FOA	CG-FSA	FSA	CG-BA	BA
1	0.7256	0.7312	0.7317	0.7029	0.7497	0.7313	0.7343	0.7393
2	0.5390	0.5106	0.5614	0.5520	0.1503	0.0736	0.5334	0.4984
3	0.0057	0.0018	0.0051	0.0024	0.0076	0.0024	0.0228	0.0000
4	0.4148	0.2838	0.3918	0.2855	0.4249	0.2905	0.4247	0.3017
5	0.6687	0.6532	0.6735	0.6507	0.6873	0.6357	0.6974	0.6518
6	0.0348	0.0035	0.0105	0.0018	0.0342	0.0025	0.0338	0.0027
7	0.3245	0.1235	0.2248	0.2075	0.3441	0.1280	0.3221	0.1203
8	0.1122	0.0582	0.1140	0.0549	0.1170	0.0575	0.1160	0.0646
9	0.1602	0.0545	0.1729	0.0522	0.1712	0.0545	0.1689	0.0538
10	0.0083	0.0023	0.0075	0.0023	0.0088	0.0024	0.0075	0.0028
11	0.1812	0.0916	0.1928	0.0805	0.1754	0.0832	0.1830	0.0877
12	0.2288	0.0067	0.2085	0.1731	0.2236	0.0085	0.2242	0.0081
13	0.7307	0.7184	0.7756	0.7806	0.7526	0.7288	0.7443	0.7457
14	0.1270	0.1100	0.1345	0.1013	0.1160	0.1090	0.1430	0.1155
15	0.3543	0.3493	0.3898	0.4014	0.1330	0.0776	0.1542	0.1190
16	0.0920	0.0028	0.0435	0.0040	0.0927	0.0040	0.0924	0.0023
17	0.2704	0.1824	0.3486	0.3031	0.2895	0.1829	0.7753	0.7382
18	0.0428	0.0033	0.0090	0.0019	0.0414	0.0032	0.0438	0.0022
19	0.1434	0.0855	0.1451	0.0798	0.1355	0.0844	0.1364	0.0873
20	0.7017	0.6822	0.6946	0.7057	0.6978	0.6669	0.4399	0.2774
AVG	0.2933	0.2327	0.2918	0.2572	0.2676	0.1964	0.2999	0.2309
	↑ 26.04%		↑ 13.45%		↑ 36.25%		↑ 29.88%

With a thorough investigation of Tables 3–5, it is not difficult to list the following items.

Compared with the reducts derived by the 4 conventional meta-heuristic searchings (GA, FOA, FSA, BA), if our constraint score guided initialization of population is used, then the corresponding revised searching generates reducts with superior stabilities. Taking data “LSVT Voice Rehabilitation (ID: 8)” as an example, if the measure of dependency shown in Equation (7) is used, then the stability related to “GA” is 0.0567 while that related to “CG-GA” is 0.1177; the stability related to “FOA” is 0.0629 while that related to “CG-FOA” is 0.1171. Such an observation demonstrates that our constraint score guided initialization offers a better initial population than the random initialization of conventional meta-heuristic searching.

No matter which measure is used to evaluate attributes and search reducts, our framework shows significant advantages. By a comparison between “GA” and “CG-GA”, in Table 3 (the measure is dependency defined in Equation (7)), the average stabilities related to “GA” and “CG-GA” are 0.2045 and 0.2674, respectively; in Table 4 (the measure is condition entropy defined in Equation (8)), the average stabilities related to “GA” and “CG-GA” are 0.2904 and 0.3420, respectively. Similar cases can also be observed in Table 5. Such an observation demonstrates that our constraint score guided initialization of population is independent of the measure used in the problem solving of attribute reduction.

In addition, it can be observed that our superiority is also significant over large-scale datasets. Taking data “Statlog (Landsat Satellite) (ID: 17)” as an example, if the measure of dependency shown in Equation (7) is used, then the stability related to “GA” is 0.1038 while that related to “CG-GA” is 0.3540; the stability related to “FOA” is 0.1248 while that related to “CG-FOA” is 0.2531.

From the discussions above, we can conclude that the constraint score guided initialization of population is not only effective in improving the stabilities of reducts derived by various meta-heuristic searchings, but also adaptive to searching reducts with respect to different measures, over both common and large-scale datasets. That is why our strategy is equipped with the characteristic of plug-and-play.
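The percentages in the last rows of Tables 3–5 appear to be relative improvements of the average stability of each guided searching over its conventional counterpart. A minimal sketch under that assumption, with the hypothetical helper `relative_improvement`, reproduces two of the reported values:

```python
def relative_improvement(cg_avg, base_avg):
    """Relative improvement in percent of the guided average over the baseline average."""
    return 100.0 * (cg_avg - base_avg) / base_avg

# Averages of CG-GA and GA copied from Tables 3 and 4.
table3 = relative_improvement(0.2674, 0.2045)
table4 = relative_improvement(0.3420, 0.2904)
print(f"{table3:.2f}%")   # 30.76%
print(f"{table4:.2f}%")   # 17.77%
```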

5.4 Comparisons of classification stabilities

In this section, the classification stabilities [51] related to different types of reduct will be compared. The classifiers called CART (Classification And Regression Tree) and KNN (K-Nearest Neighbor, K=3) are used to derive classification results over test samples.

Similar to the stability of reduct shown above, given a decision system $D$, supposing that the universe U is divided into w (10-fold cross-validation is used in this experiment, thus w = 10) disjoint groups of the same size, U₁, U₂, ⋯ , U_w, the classification stability is calculated by

${Sta}_{C} = \frac{2}{w \cdot (w - 1)} \sum_{g = 1}^{w - 1} \sum_{g' = g + 1}^{w} Agg (A_{g}, A_{g'}),$ (17) in which A_g is the reduct obtained over U - U_g, and Agg (A_g, A_g′) is defined as follows.

In Table 6, Pre_{A_g} (x_i) is the predicted label of test sample x_i by using reduct A_g, and ξ₁, ξ₂, ξ₃ and ξ₄ are the numbers of samples which satisfy the corresponding conditions shown in Table 6, respectively. Therefore, the agreement of classification results, i.e., Agg (A_g, A_g′), is given by Equation (18).

Table 6

Agreement and disagreement of classification results between A_g and A_g′

	Pre_{A_g} (x_i) = d (x_i)	Pre_{A_g} (x_i) ≠ d (x_i)
Pre_{A_g′} (x_i) = d (x_i)	ξ₁	ξ₂
Pre_{A_g′} (x_i) ≠ d (x_i)	ξ₃	ξ₄

$Agg (A_{g}, A_{g'}) = \frac{ξ_{1} + ξ_{4}}{ξ_{1} + ξ_{2} + ξ_{3} + ξ_{4}} .$ (18)
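Equations (17) and (18) can be sketched as follows. The function names `agreement` and `classification_stability` are hypothetical, and the sketch assumes that the predictions obtained with the different reducts are compared on one common list of test samples with known labels.

```python
from itertools import combinations

def agreement(pred_g, pred_h, truth):
    """Agg of Equation (18): share of samples on which both predictions are
    correct (xi_1) or both are wrong (xi_4)."""
    both_right = both_wrong = 0
    for a, b, d in zip(pred_g, pred_h, truth):
        if a == d and b == d:
            both_right += 1          # contributes to xi_1
        elif a != d and b != d:
            both_wrong += 1          # contributes to xi_4
    return (both_right + both_wrong) / len(truth)

def classification_stability(predictions, truth):
    """Sta_C of Equation (17): mean agreement over all pairs of reducts."""
    w = len(predictions)
    total = sum(agreement(p, q, truth) for p, q in combinations(predictions, 2))
    return 2.0 * total / (w * (w - 1))

# Toy usage with made-up labels: p1 is wrong on one sample, p2 on two.
truth = [0, 1, 1, 0]
p1 = [0, 1, 0, 0]
p2 = [0, 0, 0, 0]
pair_value = agreement(p1, p2, truth)                       # 3 of 4 samples agree
overall = classification_stability([p1, p2, p1], truth)
```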

The comparisons among classification stabilities based on 2 classifiers in terms of 3 measures are presented in Tables 7–9.

Table 7

Classification stabilities based on CART and KNN w.r.t. dependency

ID	CART								KNN
	CG-GA	GA	CG-FOA	FOA	CG-FSA	FSA	CG-BA	BA	CG-GA	GA	CG-FOA	FOA	CG-FSA	FSA	CG-BA	BA
1	0.8609	0.8425	0.8404	0.8409	0.8674	0.8404	0.8481	0.8328	0.9030	0.8843	0.8796	0.8880	0.9042	0.8790	0.9471	0.9288
2	0.9046	0.9018	0.9005	0.9047	0.9079	0.8991	0.9400	0.9311	0.9618	0.9598	0.9582	0.9609	0.9544	0.9493	0.9663	0.9647
3	0.5365	0.5191	0.5165	0.4965	0.5061	0.5209	0.5327	0.5209	0.5670	0.5391	0.5252	0.5365	0.5513	0.5278	0.5745	0.5564
4	0.8800	0.8252	0.8625	0.8652	0.8852	0.8036	0.8652	0.8477	0.8671	0.8216	0.8756	0.8838	0.8578	0.7899	0.8679	0.8532
5	0.8508	0.8365	0.8529	0.8494	0.8471	0.8475	0.8598	0.8208	0.9144	0.9156	0.9281	0.9212	0.9200	0.9219	0.9210	0.9075
6	0.5986	0.5471	0.6543	0.5700	0.6171	0.5814	0.6429	0.5714	0.6000	0.5486	0.6229	0.5971	0.6486	0.5771	0.6286	0.5843
7	0.5722	0.5642	0.5731	0.5672	0.5700	0.5667	0.6556	0.6133	0.7636	0.7636	0.7764	0.7758	0.7656	0.7639	0.8056	0.7814
8	0.7515	0.7000	0.7315	0.7131	0.7238	0.6777	0.7864	0.6904	0.6746	0.6669	0.6800	0.6762	0.6708	0.6592	0.6960	0.7016
9	0.6845	0.5082	0.6813	0.5065	0.6816	0.5082	0.7010	0.5116	0.6042	0.5063	0.6063	0.5048	0.6123	0.5062	0.6142	0.5076
10	0.6357	0.6257	0.6314	0.6243	0.6357	0.6471	0.6429	0.6457	0.7286	0.7114	0.7443	0.7143	0.7257	0.7243	0.7286	0.7371
11	0.6944	0.6871	0.7117	0.7035	0.7027	0.6996	0.6922	0.6794	0.7906	0.7719	0.8135	0.7900	0.7696	0.7646	0.7554	0.7373
12	0.6450	0.6483	0.6680	0.6453	0.6393	0.6375	0.6428	0.6380	0.7670	0.5930	0.6243	0.6023	0.7638	0.5968	0.7720	0.6025
13	0.6401	0.6479	0.6464	0.6539	0.6490	0.6461	0.6235	0.6192	0.7963	0.7991	0.8085	0.8066	0.8023	0.7942	0.7808	0.7841
14	0.6343	0.6390	0.6457	0.6076	0.6329	0.6129	0.6519	0.6376	0.7381	0.7276	0.7748	0.7500	0.7305	0.7171	0.7705	0.7648
15	0.7359	0.7063	0.7389	0.7433	0.7185	0.7126	0.7506	0.7328	0.7630	0.7567	0.7752	0.7689	0.7444	0.7707	0.8068	0.8034
16	0.6188	0.5188	0.7250	0.5450	0.6463	0.5425	0.6463	0.5250	0.6450	0.5813	0.5650	0.5775	0.6150	0.5625	0.6113	0.5663
17	0.8401	0.7966	0.8325	0.8073	0.8391	0.7876	0.8402	0.8080	0.8559	0.8319	0.8557	0.8409	0.8584	0.8143	0.8644	0.8521
18	0.5314	0.5366	0.5217	0.5029	0.5423	0.5303	0.5411	0.5046	0.5846	0.5777	0.5726	0.5743	0.5977	0.5823	0.5846	0.5651
19	0.7350	0.6944	0.7479	0.6979	0.7342	0.6815	0.7704	0.6917	0.7825	0.7393	0.8086	0.7499	0.7807	0.7440	0.7996	0.7501
20	0.7056	0.6662	0.6873	0.6876	0.7099	0.6734	0.7021	0.6332	0.7796	0.7353	0.7675	0.7713	0.7842	0.7399	0.7677	0.6712
AVG	0.7028	0.6706	0.7085	0.6766	0.7028	0.6708	0.7168	0.6728	0.7543	0.7216	0.7481	0.7345	0.7529	0.7193	0.7631	0.7310
	↑ 4.80%		↑ 4.71%		↑ 4.77%		↑ 6.54%		↑ 4.53%		↑ 1.85%		↑ 4.67%		↑ 4.39%

Table 8

Classification stabilities based on CART and KNN w.r.t. condition entropy

ID	CART								KNN
	CG-GA	GA	CG-FOA	FOA	CG-FSA	FSA	CG-BA	BA	CG-GA	GA	CG-FOA	FOA	CG-FSA	FSA	CG-BA	BA
1	0.8510	0.8413	0.8554	0.8425	0.8504	0.8394	0.8477	0.8399	0.9400	0.9283	0.9467	0.9335	0.9377	0.9243	0.9236	0.9267
2	0.8989	0.8982	0.9005	0.8974	0.9016	0.8979	0.9225	0.9202	0.9714	0.9733	0.9725	0.9737	0.9705	0.9635	0.9777	0.9786
3	0.5355	0.5236	0.5200	0.5091	0.5055	0.5091	0.5255	0.5164	0.6245	0.5882	0.6055	0.5791	0.5673	0.5509	0.5600	0.6091
4	0.8992	0.8466	0.8762	0.8079	0.9038	0.8312	0.9038	0.8312	0.8986	0.8600	0.8858	0.8214	0.9005	0.8378	0.9005	0.8378
5	0.8903	0.8790	0.8836	0.8699	0.8947	0.8743	0.8975	0.8823	0.9368	0.9320	0.9314	0.9223	0.9408	0.9244	0.9406	0.9381
6	0.7500	0.5971	0.5700	0.5643	0.7114	0.5829	0.7257	0.6043	0.7143	0.7114	0.6757	0.6829	0.7114	0.6600	0.6929	0.7100
7	0.6244	0.6250	0.6281	0.6233	0.6158	0.6231	0.6242	0.6147	0.8186	0.7950	0.8064	0.8033	0.8178	0.7906	0.8189	0.7953
8	0.7336	0.6776	0.7320	0.7464	0.7568	0.7144	0.7544	0.7040	0.7168	0.7312	0.7032	0.7360	0.7336	0.7328	0.6840	0.6968
9	0.6767	0.5096	0.6877	0.5102	0.6850	0.5152	0.6757	0.5072	0.6242	0.5073	0.6230	0.5028	0.6180	0.5052	0.6094	0.5041
10	0.6486	0.6186	0.6586	0.5800	0.6500	0.5900	0.6471	0.5843	0.6814	0.7014	0.7057	0.6457	0.7071	0.6857	0.6886	0.6714
11	0.6903	0.6897	0.6952	0.6722	0.6886	0.6880	0.6971	0.6806	0.7891	0.7808	0.7789	0.7678	0.7827	0.7722	0.8023	0.7728
12	0.5853	0.5910	0.6088	0.5908	0.5845	0.5978	0.5835	0.6035	0.7713	0.6080	0.7440	0.7108	0.7720	0.6050	0.7798	0.6068
13	0.6502	0.6476	0.6490	0.6556	0.6525	0.6519	0.6494	0.6606	0.8033	0.8069	0.8089	0.8038	0.8094	0.8074	0.8324	0.8302
14	0.6488	0.6390	0.6273	0.6351	0.6322	0.6420	0.6293	0.5980	0.7688	0.7507	0.7590	0.7624	0.7459	0.7741	0.6727	0.6600
15	0.7475	0.7411	0.7472	0.7389	0.7513	0.7498	0.7170	0.7275	0.7958	0.7906	0.8011	0.8042	0.7909	0.7860	0.7419	0.7230
16	0.7718	0.5341	0.5835	0.5176	0.7118	0.5012	0.7588	0.5106	0.6035	0.5341	0.5541	0.5388	0.6118	0.5471	0.5976	0.5482
17	0.8751	0.8665	0.8711	0.8699	0.8768	0.8670	0.8738	0.8652	0.9510	0.9481	0.9532	0.9531	0.9525	0.9478	0.9505	0.9486
18	0.5335	0.5153	0.5218	0.5094	0.5176	0.5147	0.5188	0.5206	0.5482	0.5806	0.5524	0.5347	0.5606	0.5553	0.5741	0.5465
19	0.7344	0.6936	0.7347	0.6898	0.7379	0.6902	0.7830	0.7222	0.8025	0.7433	0.7868	0.7345	0.7843	0.7319	0.8187	0.7963
20	0.7306	0.7065	0.7181	0.7201	0.7308	0.7077	0.7151	0.7077	0.8292	0.8048	0.8165	0.8223	0.8220	0.8024	0.8176	0.8095
AVG	0.7238	0.6821	0.7034	0.6775	0.7180	0.6794	0.7225	0.6801	0.7795	0.7538	0.7705	0.7517	0.7768	0.7452	0.7692	0.7455
	↑ 6.11%		↑ 3.82%		↑ 5.68%		↑ 6.23%		↑ 3.41%		↑ 2.50%		↑ 4.24%		↑ 3.18%

Table 9

Classification stabilities based on CART and KNN w.r.t. discrimination index

ID	CART								KNN
	CG-GA	GA	CG-FOA	FOA	CG-FSA	FSA	CG-BA	BA	CG-GA	GA	CG-FOA	FOA	CG-FSA	FSA	CG-BA	BA
1	0.8459	0.8432	0.8322	0.8259	0.8383	0.8367	0.8436	0.8326	0.9301	0.9272	0.9258	0.9226	0.9252	0.9232	0.9241	0.9248
2	0.9260	0.9295	0.9304	0.9291	0.9182	0.8761	0.9284	0.9272	0.9716	0.9677	0.9711	0.9716	0.9402	0.9028	0.9707	0.9677
3	0.5183	0.5217	0.5235	0.5235	0.5209	0.5148	0.5273	0.5027	0.5539	0.5487	0.5661	0.5948	0.5357	0.5313	0.8896	0.8542
4	0.9186	0.8608	0.8707	0.8003	0.8899	0.8452	0.8874	0.8674	0.9044	0.8501	0.8647	0.7879	0.8915	0.8211	0.9365	0.9338
5	0.9012	0.8890	0.8940	0.8808	0.9065	0.8856	0.9040	0.8823	0.9252	0.9304	0.9377	0.9310	0.9387	0.9265	0.7314	0.6614
6	0.7557	0.5929	0.5557	0.5629	0.7886	0.5886	0.7586	0.5629	0.7200	0.6814	0.6600	0.6886	0.7186	0.6729	0.8286	0.8103
7	0.6192	0.6133	0.6269	0.6297	0.6258	0.6181	0.6369	0.6094	0.8200	0.8183	0.8272	0.8261	0.8264	0.8103	0.7360	0.7160
8	0.6808	0.6664	0.6800	0.6560	0.6864	0.6488	0.6784	0.6648	0.7008	0.7112	0.6840	0.6984	0.7040	0.7160	0.6074	0.5044
9	0.6770	0.5014	0.6824	0.5078	0.6866	0.5127	0.6825	0.5108	0.6038	0.5029	0.6085	0.5038	0.6063	0.5094	0.7857	0.7804
10	0.5800	0.5600	0.5800	0.5786	0.5771	0.5886	0.5800	0.5843	0.6614	0.6786	0.6914	0.6743	0.6586	0.6157	0.7898	0.7830
11	0.6966	0.6996	0.6960	0.6754	0.7023	0.6886	0.6825	0.6869	0.7895	0.7832	0.7749	0.7665	0.7857	0.7741	0.7276	0.7181
12	0.6278	0.6045	0.6078	0.6078	0.6228	0.6318	0.6195	0.6143	0.7565	0.6170	0.7208	0.7198	0.7553	0.6030	0.8083	0.7977
13	0.6409	0.6308	0.6429	0.6384	0.6406	0.6411	0.6456	0.6425	0.7861	0.7785	0.7925	0.7893	0.7923	0.7807	0.6800	0.5500
14	0.6238	0.6190	0.6171	0.6119	0.6586	0.6243	0.6595	0.6410	0.7210	0.7262	0.7190	0.6981	0.7281	0.6905	0.5718	0.5765
15	0.7237	0.7133	0.7381	0.7022	0.7830	0.7611	0.7883	0.7713	0.7785	0.7863	0.7804	0.7748	0.8177	0.7872	0.8121	0.7671
16	0.7200	0.5000	0.5463	0.5225	0.7238	0.5163	0.7038	0.5138	0.6725	0.5450	0.6088	0.5913	0.6700	0.5463	0.7361	0.6861
17	0.7512	0.7429	0.8145	0.7703	0.8711	0.8699	0.8629	0.8547	0.8484	0.8474	0.9448	0.8862	0.9422	0.9419	0.5155	0.4900
18	0.5859	0.5371	0.5500	0.5259	0.5765	0.5341	0.5747	0.5247	0.5859	0.5565	0.5676	0.5847	0.5647	0.5571	0.6900	0.6486
19	0.7714	0.7366	0.7523	0.7305	0.7684	0.7225	0.7782	0.7065	0.8076	0.7724	0.7959	0.7723	0.8003	0.7609	0.7508	0.6145
20	0.7210	0.6957	0.7115	0.7092	0.7225	0.7038	0.6949	0.6486	0.8170	0.7953	0.8031	0.8091	0.8086	0.8003	0.9439	0.9415
AVG	0.7142	0.6729	0.6926	0.6694	0.7254	0.6804	0.7219	0.6774	0.7677	0.7412	0.7622	0.7496	0.7705	0.7335	0.7718	0.7363
	↑ 6.14%		↑ 3.47%		↑ 6.61%		↑ 6.57%		↑ 3.58%		↑ 1.68%		↑ 5.04%		↑ 4.82%

With a deep investigation of Tables 7–9, it is not difficult to list the following items.

If our constraint score guidance based initialization of population is introduced into “GA”, “FOA”, “FSA” and “BA”, the derived reducts provide better classification stabilities over most datasets. Taking “Urban Land Cover (ID: 19)” in Table 7 (the measure is dependency mentioned in Equation (7)) as an example, the classification stability related to “FSA” is 0.6815 while that related to “CG-FSA” is 0.7342; the classification stability related to “BA” is 0.6917 while that related to “CG-BA” is 0.7704. Such improvements of classification stabilities can be attributed to the increased stabilities of reducts shown in Tables 3–5.

From the perspective of the average value, it can also be observed that our framework shows significant advantages for all 3 measures mentioned in Equations (7)-(9), respectively. For example, if the measure “dependency” and the classifier CART are employed (see Table 7), the average classification stabilities related to “GA” and “CG-GA” are 0.6706 and 0.7028, respectively; if the measure “condition entropy” and the classifier CART are used (see Table 8), the average classification stabilities related to “GA” and “CG-GA” are 0.6821 and 0.7238, respectively.

Furthermore, no matter which measure is used to evaluate the importance of attributes in the process of searching reducts, it can also be observed that the improvements of classification stabilities based on the classifier CART are greater than those based on KNN. Taking Table 7 (the measure is “dependency”) as an example, if “BA” is compared with “CG-BA”, the average classification stability is increased by about 6.54% based on CART, but only by 4.39% based on KNN.

From discussions above, we can conclude that the constraint score guided initialization of population is effective in producing reducts which provide better stabilities of classification results. Additionally, similar to the case of stability of reduct, from the perspective of the classification stability, our framework is also adaptive to different measures.

5.5 Comparisons of classification accuracies

In this section, the classification accuracies related to different types of reduct will be compared. The comparisons among classification accuracies based on 2 classifiers in terms of 3 measures are presented in Tables 10–12.

Table 10
Classification accuracies based on CART and KNN w.r.t. dependency

ID	CART								KNN
	CG-GA	GA	CG-FOA	FOA	CG-FSA	FSA	CG-BA	BA	CG-GA	GA	CG-FOA	FOA	CG-FSA	FSA	CG-BA	BA
1	0.8459	0.8375	0.8307	0.8358	0.8514	0.8330	0.8409	0.8284	0.8451	0.8386	0.8286	0.8355	0.8462	0.8277	0.8404	0.8323
2	0.9202	0.9198	0.9177	0.9163	0.9226	0.9230	0.9340	0.9258	0.9677	0.9668	0.9653	0.9681	0.9621	0.9598	0.9572	0.9581
3	0.5122	0.5443	0.5400	0.5148	0.4939	0.5017	0.5745	0.5664	0.5078	0.5139	0.5261	0.5278	0.5087	0.4965	0.5609	0.5445
4	0.8912	0.8625	0.8945	0.8942	0.8915	0.8416	0.9036	0.8718	0.8803	0.8608	0.9090	0.9142	0.8679	0.8345	0.9107	0.8822
5	0.8265	0.8152	0.8246	0.8244	0.8252	0.8129	0.8121	0.7935	0.8312	0.8265	0.8400	0.8346	0.8300	0.8300	0.8392	0.8312
6	0.7086	0.6543	0.7457	0.6414	0.7343	0.6700	0.7500	0.6500	0.6700	0.6000	0.6771	0.6571	0.7171	0.6571	0.6857	0.6257
7	0.5567	0.5381	0.5767	0.5631	0.5714	0.5472	0.5283	0.5014	0.8053	0.7903	0.8089	0.8039	0.8044	0.7858	0.7269	0.7308
8	0.7585	0.7331	0.7569	0.7485	0.7485	0.7177	0.7512	0.7024	0.7285	0.7200	0.7431	0.7446	0.7292	0.7154	0.7104	0.7104
9	0.7192	0.5392	0.7216	0.5413	0.7232	0.5452	0.7359	0.5482	0.6488	0.5164	0.6510	0.5163	0.6570	0.5182	0.6663	0.5240
10	0.6871	0.6843	0.7014	0.6800	0.6900	0.6643	0.6543	0.6971	0.7386	0.6743	0.7271	0.7300	0.7157	0.7171	0.7471	0.6957
11	0.7535	0.7508	0.7606	0.7619	0.7710	0.7565	0.7497	0.7535	0.8035	0.7960	0.8171	0.8075	0.7892	0.8031	0.7874	0.7836
12	0.4170	0.2725	0.2743	0.2780	0.4203	0.2765	0.4225	0.2908	0.7638	0.4823	0.4603	0.4660	0.7665	0.4618	0.7703	0.5110
13	0.6224	0.6164	0.6264	0.6197	0.6234	0.6192	0.6169	0.6109	0.6549	0.6583	0.6628	0.6583	0.6563	0.6621	0.6544	0.6535
14	0.6671	0.6833	0.7052	0.6738	0.6829	0.6829	0.7057	0.6914	0.7738	0.7752	0.8005	0.7981	0.7657	0.7581	0.8162	0.8038
15	0.7315	0.7281	0.7341	0.7404	0.7267	0.7281	0.7804	0.7834	0.7496	0.7448	0.7567	0.7515	0.7381	0.7419	0.7974	0.8023
16	0.6938	0.5638	0.6338	0.5388	0.7088	0.5588	0.7100	0.5388	0.6513	0.5438	0.5975	0.5950	0.6438	0.5513	0.6600	0.5713
17	0.8182	0.7933	0.8166	0.7996	0.8196	0.7879	0.8224	0.8048	0.8320	0.8184	0.8348	0.8257	0.8347	0.8107	0.8423	0.8386
18	0.5617	0.4503	0.4891	0.4674	0.5543	0.4651	0.5411	0.4526	0.6509	0.6211	0.6251	0.6206	0.6669	0.6274	0.6440	0.6251
19	0.7341	0.6948	0.7424	0.7013	0.7401	0.6822	0.7533	0.6764	0.7493	0.7142	0.7647	0.7221	0.7452	0.7142	0.7713	0.7084
20	0.7294	0.7055	0.7253	0.7249	0.7272	0.7103	0.7095	0.6865	0.7683	0.7429	0.7701	0.7734	0.7670	0.7532	0.7447	0.7173
AVG	0.7077	0.6694	0.7009	0.6733	0.7113	0.6662	0.7148	0.6687	0.7510	0.7102	0.7383	0.7275	0.7506	0.7113	0.7566	0.7175
	↑ 5.72%		↑ 4.10%		↑ 6.77%		↑ 6.89%		↑ 5.74%		↑ 1.48%		↑ 5.53%		↑ 5.45%

Table 11

Classification accuracies based on CART and KNN w.r.t. condition entropy

ID	CART								KNN
	CG-GA	GA	CG-FOA	FOA	CG-FSA	FSA	CG-BA	BA	CG-GA	GA	CG-FOA	FOA	CG-FSA	FSA	CG-BA	BA
1	0.8419	0.8325	0.8400	0.8341	0.8409	0.8317	0.8332	0.8328	0.8658	0.8619	0.8238	0.8158	0.8661	0.8610	0.8546	0.8549
2	0.9158	0.9112	0.9126	0.9139	0.9165	0.9156	0.9100	0.9093	0.9558	0.9584	0.9584	0.9602	0.9591	0.9554	0.9602	0.9602
3	0.5309	0.5064	0.5427	0.5273	0.5427	0.5236	0.5418	0.5236	0.5327	0.5600	0.5245	0.5164	0.5400	0.5427	0.5300	0.5255
4	0.8967	0.8797	0.8915	0.8526	0.8995	0.8644	0.8995	0.8644	0.9025	0.8734	0.8912	0.8414	0.9011	0.8707	0.9011	0.8707
5	0.8884	0.8830	0.8848	0.8735	0.8916	0.8810	0.8215	0.8200	0.9004	0.8985	0.8962	0.8890	0.9063	0.8895	0.8379	0.8327
6	0.7557	0.5771	0.5871	0.5457	0.7443	0.5600	0.7657	0.6057	0.6857	0.5914	0.5843	0.5757	0.6729	0.5771	0.6557	0.5714
7	0.5667	0.5206	0.5417	0.5350	0.5503	0.5417	0.5533	0.5286	0.7494	0.7550	0.7614	0.7475	0.7408	0.7433	0.7556	0.7475
8	0.7480	0.7304	0.7336	0.7728	0.7776	0.7424	0.7264	0.7216	0.7856	0.7848	0.7720	0.8056	0.7888	0.7936	0.7136	0.7120
9	0.7240	0.5528	0.7312	0.5446	0.7307	0.5474	0.7193	0.5259	0.6620	0.5167	0.6649	0.5191	0.6530	0.5164	0.6462	0.5125
10	0.7357	0.6771	0.7686	0.6886	0.7300	0.7100	0.7186	0.6957	0.7200	0.7257	0.7629	0.6914	0.7629	0.7214	0.7357	0.7243
11	0.7644	0.7577	0.7583	0.7484	0.7554	0.7608	0.7657	0.7543	0.7886	0.8086	0.7794	0.7935	0.7760	0.8048	0.7943	0.8040
12	0.4803	0.3378	0.4030	0.3960	0.4843	0.3205	0.4833	0.3348	0.8258	0.5913	0.7020	0.7045	0.8225	0.5803	0.8265	0.5740
13	0.6181	0.6090	0.6163	0.6108	0.6149	0.6157	0.6443	0.6537	0.6688	0.6666	0.6700	0.6673	0.6696	0.6690	0.6770	0.6739
14	0.7254	0.7088	0.7073	0.7020	0.6927	0.7137	0.6746	0.6678	0.7976	0.7839	0.7922	0.7990	0.7854	0.8024	0.7151	0.7073
15	0.7626	0.7623	0.7551	0.7540	0.7608	0.7555	0.6958	0.7034	0.7551	0.7675	0.7657	0.7687	0.7626	0.7615	0.6725	0.6604
16	0.7506	0.4953	0.5965	0.4988	0.7235	0.5059	0.7329	0.5200	0.6894	0.5388	0.5835	0.5447	0.6859	0.5706	0.6776	0.5141
17	0.8607	0.8563	0.8605	0.8593	0.8609	0.8574	0.8596	0.8560	0.9053	0.9031	0.9070	0.9065	0.9060	0.9030	0.9049	0.9029
18	0.5129	0.4459	0.4347	0.4288	0.5129	0.4582	0.5141	0.4218	0.5759	0.5541	0.5718	0.5700	0.6071	0.5653	0.5982	0.5718
19	0.7527	0.7000	0.7462	0.6929	0.7487	0.6994	0.7668	0.6981	0.7889	0.7265	0.7748	0.7071	0.7809	0.7302	0.7510	0.7101
20	0.7460	0.7294	0.7391	0.7440	0.7413	0.7341	0.7284	0.7279	0.7798	0.7655	0.7772	0.7809	0.7740	0.7693	0.7730	0.7705
AVG	0.7289	0.6737	0.7025	0.6762	0.7260	0.6769	0.7177	0.6683	0.7668	0.7316	0.7482	0.7302	0.7680	0.7314	0.7490	0.7100
	↑ 8.19%		↑ 3.89%		↑ 7.25%		↑ 7.39%		↑ 4.81%		↑ 2.47%		↑ 5.00%		↑ 5.49%

Table 12

Classification accuracies based on CART and KNN w.r.t. discrimination index

ID	CART								KNN
	CG-GA	GA	CG-FOA	FOA	CG-FSA	FSA	CG-BA	BA	CG-GA	GA	CG-FOA	FOA	CG-FSA	FSA	CG-BA	BA
1	0.8138	0.8090	0.8067	0.8075	0.8067	0.8087	0.8103	0.8061	0.8126	0.8151	0.8096	0.8146	0.8129	0.8122	0.8110	0.8086
2	0.9293	0.9356	0.9335	0.9370	0.9368	0.9118	0.9349	0.9353	0.9618	0.9593	0.9605	0.9600	0.9465	0.9261	0.9621	0.9607
3	0.5330	0.5243	0.5374	0.5252	0.5243	0.5226	0.5027	0.4836	0.5026	0.5209	0.5278	0.5035	0.5504	0.5148	0.5373	0.5055
4	0.9455	0.9055	0.9126	0.8548	0.9318	0.8970	0.9290	0.9162	0.9384	0.9044	0.9044	0.8375	0.9252	0.8762	0.9271	0.9049
5	0.8621	0.8537	0.8504	0.8454	0.8594	0.8475	0.8610	0.8467	0.8694	0.8696	0.8723	0.8702	0.8750	0.8667	0.8712	0.8710
6	0.8414	0.6229	0.6057	0.5757	0.8600	0.5929	0.8357	0.5857	0.7014	0.5986	0.6186	0.6257	0.7271	0.6171	0.7414	0.6357
7	0.5306	0.5042	0.5131	0.5186	0.5267	0.4981	0.5344	0.5161	0.7042	0.7064	0.7056	0.7058	0.7100	0.7067	0.7136	0.7081
8	0.7168	0.7160	0.7384	0.7184	0.7272	0.7024	0.7392	0.7136	0.7576	0.7640	0.7456	0.7808	0.7792	0.7784	0.7952	0.7768
9	0.7265	0.5297	0.7318	0.5372	0.7322	0.5421	0.7267	0.5490	0.6492	0.5198	0.6472	0.5042	0.6488	0.5128	0.6497	0.5129
10	0.6686	0.6229	0.6814	0.6243	0.6729	0.6643	0.6543	0.6429	0.7214	0.7086	0.7357	0.7157	0.7214	0.7000	0.7643	0.7157
11	0.7589	0.7619	0.7615	0.7564	0.7667	0.7537	0.7598	0.7577	0.7905	0.8112	0.7718	0.7952	0.7859	0.8118	0.7924	0.8032
12	0.4043	0.3285	0.3798	0.3768	0.4053	0.3073	0.4058	0.3160	0.7953	0.6070	0.6855	0.7073	0.7988	0.5713	0.7955	0.6040
13	0.6616	0.6614	0.6560	0.6592	0.6573	0.6643	0.6598	0.6600	0.7018	0.7021	0.6970	0.7011	0.6981	0.6977	0.7024	0.6998
14	0.6462	0.6638	0.6552	0.6395	0.6762	0.6648	0.6643	0.6605	0.7286	0.7290	0.7390	0.7324	0.7333	0.7167	0.7395	0.7305
15	0.7389	0.7367	0.7459	0.7389	0.7332	0.7174	0.7302	0.7257	0.7756	0.7663	0.7656	0.7641	0.7430	0.7328	0.7532	0.7389
16	0.7238	0.5588	0.6250	0.5625	0.7225	0.5350	0.7175	0.5188	0.7375	0.5963	0.6400	0.6313	0.7363	0.6125	0.7163	0.5938
17	0.8031	0.7977	0.8544	0.8178	0.8542	0.8494	0.8543	0.8513	0.8854	0.8842	0.9590	0.9115	0.9023	0.9018	0.9047	0.9027
18	0.5076	0.4565	0.4565	0.4435	0.5029	0.4629	0.5024	0.4824	0.6018	0.5512	0.5824	0.6065	0.6076	0.6035	0.6012	0.5841
19	0.7681	0.7241	0.7564	0.7130	0.7702	0.7108	0.7683	0.6988	0.7776	0.7376	0.7735	0.7436	0.7804	0.7350	0.7787	0.7357
20	0.7391	0.7280	0.7364	0.7376	0.7408	0.7325	0.7200	0.6956	0.7808	0.7676	0.7759	0.7780	0.7768	0.7743	0.7434	0.7247
AVG	0.7160	0.6721	0.6969	0.6695	0.7204	0.6693	0.7155	0.6681	0.7597	0.7260	0.7458	0.7344	0.7630	0.7234	0.7650	0.7258
	↑ 6.53%		↑ 4.09%		↑ 7.63%		↑ 7.09%		↑ 4.64%		↑ 1.55%		↑ 5.47%		↑ 5.40%

A closer look at Tables 10–12 shows that the results on classification accuracies can be explained in much the same way as the results on classification stabilities in the previous subsection. In other words, the reducts derived by our framework also provide better classification accuracies over most datasets. Taking “Madelon (ID:9)” in Table 10 (the measure is “dependency”) as an example, if the classifier CART is employed, then the classification accuracy related to “GA” is 0.5392 while that related to “CG-GA” is 0.7192.
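
The statement that accuracies improve "over most datasets" can be checked directly against the tabulated values. The following minimal sketch copies the CART columns for "CG-GA" and "GA" from Table 10 (the measure is "dependency"), counts the datasets on which "CG-GA" is higher, and recomputes the AVG row. The variable names are illustrative only and are not part of the framework.

```python
# CART accuracies under the "dependency" measure, copied from Table 10 (IDs 1-20).
cg_ga = [0.8459, 0.9202, 0.5122, 0.8912, 0.8265, 0.7086, 0.5567, 0.7585,
         0.7192, 0.6871, 0.7535, 0.4170, 0.6224, 0.6671, 0.7315, 0.6938,
         0.8182, 0.5617, 0.7341, 0.7294]
ga = [0.8375, 0.9198, 0.5443, 0.8625, 0.8152, 0.6543, 0.5381, 0.7331,
      0.5392, 0.6843, 0.7508, 0.2725, 0.6164, 0.6833, 0.7281, 0.5638,
      0.7933, 0.4503, 0.6948, 0.7055]

# Number of datasets on which the guided variant is strictly more accurate.
wins = sum(c > g for c, g in zip(cg_ga, ga))

# Dataset IDs (1-based) on which the guided variant is less accurate.
lower_ids = [i + 1 for i, (c, g) in enumerate(zip(cg_ga, ga)) if c < g]

# Column means, which should agree with the AVG row of the table.
avg_cg_ga = sum(cg_ga) / len(cg_ga)
avg_ga = sum(ga) / len(ga)

print(f"CG-GA is higher on {wins} of {len(ga)} datasets; lower on IDs {lower_ids}")
print(f"AVG: CG-GA = {avg_cg_ga:.4f}, GA = {avg_ga:.4f}")
```

For this pair of columns, "CG-GA" is higher on 18 of the 20 datasets (the exceptions are IDs 3 and 14), and the recomputed means match the AVG row (0.7077 and 0.6694).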

Furthermore, it is interesting to note that the classification accuracies are improved more significantly over some high-dimensional data such as “Leukemia (ID:6)”, “Madelon (ID:9)”, “MLL (ID:10)”, etc. Taking “Leukemia (ID:6)” in Table 12 (the measure is “discrimination index”) as an example, if the classifier CART is employed, then the classification accuracy related to “FSA” is 0.5929 while that related to “CG-FSA” is 0.8600, a relative increase of about 45%.
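
The percentages quoted in this section appear to be relative gains over the unguided baseline, i.e., (guided − baseline) / baseline. This reading is inferred from the numbers rather than stated explicitly, so the sketch below should be taken as an assumption. It recomputes the ↑ row of Table 10 from the AVG row, together with the Leukemia example from Table 12; the helper `relative_gain` is a hypothetical name introduced for illustration.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Percentage change of `improved` relative to `baseline`."""
    return (improved - baseline) / baseline * 100.0


# AVG row of Table 10: (guided variant, baseline) for each algorithm and classifier.
table10_avg = {
    "CART": {"GA": (0.7077, 0.6694), "FOA": (0.7009, 0.6733),
             "FSA": (0.7113, 0.6662), "BA": (0.7148, 0.6687)},
    "KNN": {"GA": (0.7510, 0.7102), "FOA": (0.7383, 0.7275),
            "FSA": (0.7506, 0.7113), "BA": (0.7566, 0.7175)},
}

# Relative gain of each CG variant over its baseline, per classifier.
gains = {
    clf: {algo: relative_gain(base, guided) for algo, (guided, base) in pairs.items()}
    for clf, pairs in table10_avg.items()
}
for clf, per_algo in gains.items():
    for algo, gain in per_algo.items():
        print(f"{clf}: CG-{algo} vs {algo}: {gain:.2f}%")

# Leukemia (ID:6), Table 12, classifier CART: FSA = 0.5929, CG-FSA = 0.8600.
leukemia_gain = relative_gain(0.5929, 0.8600)
print(f"Leukemia, CART, CG-FSA vs FSA: {leukemia_gain:.2f}%")
```

Under this assumption the recomputed values agree with the ↑ row of Table 10 to two decimal places, and the Leukemia example gives roughly 45%.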

6 Conclusions and future work

In this paper, by considering the limitations of meta-heuristic searching in attribute reduction, a novel framework guided by constraint scores is developed. Constraint scores reveal potential attributes with better distinguishability, and these attributes are used to initialize the population for the subsequent meta-heuristic searching. Note that the constraint score we use is designed from a local view, so it can capture the distinguishing ability of attributes with respect to each label. Therefore, the evaluation of attributes by our local constraint score is more comprehensive than that by the conventional constraint score.

Moreover, it is worth pointing out that our method is a simple but effective two-stage search into which most meta-heuristic algorithms can be easily embedded. According to the extensive experimental comparisons, our framework is competitive in both the stability of the derived reduct and the related classification performance, i.e., classification stability and classification accuracy.
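
To make the two-stage idea concrete, the following is a minimal sketch of the first stage only, written under stated assumptions rather than as the exact procedure of this paper. It assumes that (a) the "local" score of an attribute for one label is the classical pairwise constraint score, with same-label pairs as must-link and pairs across the label boundary as cannot-link, so that lower scores indicate better distinguishability, and (b) the qualified attributes are the `top_k` best-scoring attributes per label. The function names `local_constraint_scores` and `seed_population` and the parameters `top_k` and `flip_prob` are hypothetical.

```python
import numpy as np


def local_constraint_scores(X, y):
    """Return an (n_labels, n_attributes) array of per-label scores (lower is better)."""
    labels = np.unique(y)
    scores = np.empty((labels.size, X.shape[1]))
    for row, label in enumerate(labels):
        inside, outside = X[y == label], X[y != label]
        # Sum of squared differences among samples sharing this label (must-link pairs).
        within = ((inside[:, None, :] - inside[None, :, :]) ** 2).sum(axis=(0, 1))
        # Sum of squared differences between this label and all others (cannot-link pairs).
        between = ((inside[:, None, :] - outside[None, :, :]) ** 2).sum(axis=(0, 1))
        scores[row] = within / np.maximum(between, 1e-12)
    return scores


def seed_population(X, y, pop_size=10, top_k=2, flip_prob=0.1, seed=0):
    """Build a binary initial population around the qualified attributes."""
    rng = np.random.default_rng(seed)
    scores = local_constraint_scores(X, y)
    # Qualified attributes: the top_k lowest-scoring attributes for each label.
    qualified = np.unique(np.argsort(scores, axis=1)[:, :top_k])
    base = np.zeros(X.shape[1], dtype=bool)
    base[qualified] = True
    # Each individual is the qualified mask with a few random bit flips for diversity.
    flips = rng.random((pop_size, X.shape[1])) < flip_prob
    population = base ^ flips
    population[0] = base  # keep one unperturbed copy of the qualified mask
    return population, qualified


# Toy data: attribute 0 separates the two labels, attributes 1-3 are pure noise.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
informative = y[:, None] + 0.1 * rng.standard_normal((40, 1))
noise = rng.standard_normal((40, 3))
X = np.hstack([informative, noise])

population, qualified = seed_population(X, y, pop_size=6, top_k=1)
print("qualified attributes:", qualified)
print(population.astype(int))
```

The resulting boolean matrix would then be handed to whichever meta-heuristic is chosen (e.g., GA, FOA, FSA or BA) in place of a purely random initial population; that second stage and the reduct restriction are not shown here.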

The following topics deserve further investigation: (1) how to take the redundancy among features into account by considering diversity, with the aim of obtaining a higher-quality initial population; (2) the proposed framework only addresses the initial population, so new strategies that consider the efficiency and effectiveness of the subsequent searching, e.g., pruning and boosting mechanisms, can be explored.

Acknowledgments

This work was supported by National Natural Science Foundation of China (Nos. 62076111, 62176107) and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (No. SJCX22_1902).

References

1. Akila and Allin, Christe, A wrapper based binary bat algorithm with greedy crossover for attribute selection, Expert Systems with Applications 187 (2022), 115828.

2. Alsalibi, Abualigah and Khader A.T., A novel bat algorithm with dynamic membrane structure for optimization problems, Applied Intelligence 51 (2021), 1992–2017.

3. Amini and Hu G.P., A two-layer feature selection method using Genetic Algorithm and Elastic Net, Expert Systems with Applications 166 (2021), 114072.

4. , Liu K.Y., Ju H.R., Xu S.P., Xu T.H. and Yang X.B., Triple-G: A new MGRS and attribute reduction, International Journal of Machine Learning and Cybernetics 13(2) (2022), 337–356.

5. Tirkolaee E.B., Goli and Weber G.-W., Fuzzy mathematical programming and self-adaptive artificial fish swarm algorithm for just-in-time energy-aware flow shop scheduling problem with outsourcing option, IEEE Transactions on Fuzzy Systems 28(11) (2020), 2772–2783.

6. Barembruch, Garivier and Moulines, On approximate maximum-likelihood methods for blind identification: How to cope with the curse of dimensionality, IEEE Transactions on Signal Processing 57(11) (2009), 4247–4259.

7. Binato, de Oliveira G.C. and de Araujo J.L., A greedy randomized adaptive search procedure for transmission expansion planning, IEEE Transactions on Power Systems 16(2) (2001), 247–253.

8. Chen D.G. and Yang Y.Y., Attribute reduction for heterogeneous data based on the combination of classical and fuzzy rough set models, IEEE Transactions on Fuzzy Systems 22(5) (2014), 1325–1334.

9. Chen Y.M., Zhu Q.X. and Xu H.R., Finding rough set reducts with fish swarm algorithm, Knowledge-Based Systems 81 (2015), 22–29.

10. Chlis N.-K., Bei E.S. and Zervakis M.E., Introducing a stable bootstrap validation framework for reliable genomic signature extraction, IEEE/ACM Transactions on Computational Biology and Bioinformatics 15(1) (2018), 181–190.

11. Cho H.-W., Kim S.B., Jeong M.K., Park, Ziegler T.R. and Jones D.P., Genetic algorithm-based feature selection in high resolution NMR spectra, Expert Systems with Applications 35(3) (2008), 967–975.

12. Dadaneh B.Z., Markid H.Y. and Zakerolhosseini, Unsupervised probabilistic feature selection using ant colony optimization, Expert Systems with Applications 53 (2016), 27–42.

13. Dai J.H., Hu Q.H., Zhang J.H., Hu and Zheng N.G., Attribute selection for partially labeled categorical data by rough set approach, IEEE Transactions on Cybernetics 47(9) (2017), 2460–2471.

14. Etzion and Ostergard P.R.J., Greedy and heuristic algorithms for codes and colorings, IEEE Transactions on Information Theory 44(1) (1998), 382–388.

15. Fang, Gao and Yao Y.Y., Granularity-driven sequential three-way decisions: A cost-sensitive approach to classification, Information Sciences 507 (2020), 644–664.

16. Fang and Min, Cost-sensitive approximate attribute reduction with three-way decisions, International Journal of Approximate Reasoning 104 (2019), 148–165.

17. Ghaemi and Feizi-Derakhshi M.-R., Forest optimization algorithm, Expert Systems with Applications 41(15) (2014), 6676–6687.

18. Ghaemi and Feizi-Derakhshi M.-R., Feature selection using forest optimization algorithm, Pattern Recognition 60 (2016), 121–129.

19. Hijazi N.M., Faris and Aljarah, A parallel metaheuristic approach for ensemble feature selection based on multi-core architectures, Expert Systems with Applications 182 (2021), 115290.

20. Holland J.H., Genetic algorithms and the optimal allocation of trials, SIAM Journal on Computing 2(2) (1973), 88–105.

21. Q.H., Pedrycz, Yu D.R. and Lang, Selecting discrete and continuous features based on neighborhood decision error minimization, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 40(1) (2010), 137–150.

22. Q.H., Xie Z.X. and Yu D.R., Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation, Pattern Recognition 40(12) (2007), 3509–3521.

23. Q.H., Yu D.R. and Xie Z.X., Neighborhood classifiers, Expert Systems with Applications 34(2) (2008), 866–876.

24. Huang C.W., Li Y.X. and Yao, A survey of automatic parameter tuning methods for metaheuristics, IEEE Transactions on Evolutionary Computation 24 (2020), 201–216.

25. Jiang Z.H., Yang X.B., Yu H.L., Liu, Wang P.X. and Qian Y.H., Accelerator for multi-granularity attribute reduction, Knowledge-Based Systems 177 (2019), 145–158.

26. F.C., Jin C.X., Zhang, Wang and Liu X.F., Attribute importance measurement method based on data coordination degree, Knowledge-Based Systems 192 (2020), 105359.

27. W.Y., Chen H.M., Li T.R., Wan J.H. and Sang B.B., Unsupervised feature selection via self-paced learning and low-redundant regularization, Knowledge-Based Systems 240 (2022), 108150.

28. Liu K.Y., Li T.R., Yang X.B., Yang, Liu and Zhang P.F., Granular cabin: An efficient solution to neighborhood learning in big data, Information Sciences 583 (2022), 189–201.

29. Liu K.Y., Yang X.B., Fujita, Liu, Yang and Qian Y.H., An efficient selector for multi-granularity attribute reduction, Information Sciences 505 (2019), 457–472.

30. Liu K.Y., Yang X.B., Yu H.L., Mi J.S., Wang P.X. and Chen X.J., Rough set based semi-supervised feature selection via ensemble selector, Knowledge-Based Systems 165 (2019), 282–296.

31. Liu M.X. and Zhang D.Q., Pairwise constraint-guided sparse learning for feature selection, IEEE Transactions on Cybernetics 46(1) (2016), 298–310.

32. Luan X.Y., Li Z.P. and Liu T.Z., A novel attribute reduction algorithm based on rough set and improved artificial fish swarm algorithm, Neurocomputing 174 (2016), 522–529.

33. Luo, Zheng, Li T.R., Chen H.M., Huang Y.Y. and Peng, Orthogonally constrained matrix factorization for robust unsupervised feature selection with local preserving, Information Sciences 586 (2022), 662–675.

34. W.P., Zhou X.B., Zhu, Li L.W. and Jiao L.C., A two-stage hybrid ant colony optimization for high-dimensional feature selection, Pattern Recognition 116 (2021), 107933.

35. Nawaz M.S., Nawaz M.Z., Hasan, Fournier-Viger and Sun, An evolutionary/heuristic-based proof searching framework for interactive theorem prover, Applied Soft Computing 104 (2021), 107200.

36. Nouri-Moghaddam, Ghazanfari and Fathian, A novel multi-objective forest optimization algorithm for wrapper feature selection, Expert Systems with Applications 175 (2021), 114737.

37. Park C.H. and Kim S.B., Sequential random k-nearest neighbor feature selection for high-dimensional data, Expert Systems with Applications 42(5) (2015), 2336–2342.

38. Pawlak, Rough sets, International Journal of Computer & Information Sciences 11 (1982), 341–356.

39. Pawlak, Wong S.K.M. and Ziarko, Rough sets: Probabilistic versus deterministic approach, International Journal of Man-Machine Studies 29(1) (1988), 81–95.

40. Y.P., Xu, Shang C.J., Ge X.L., Deng A.S. and Shen, Inconsistency guided robust attribute reduction, Information Sciences 580 (2021), 69–91.

41. Rashno, Shafipour and Fadaei, Particle ranking: An efficient method for multi-objective particle swarm optimization feature selection, Knowledge-Based Systems 245 (2022), 108640.

42. Song X.F., Zhang, Guo Y.N., Sun X.Y. and Wang Y.L., Variable-size cooperative coevolutionary particle swarm optimization for feature selection on high-dimensional data, IEEE Transactions on Evolutionary Computation 24(5) (2020), 882–895.

43. Sun and Zhang D.Q., Bagging Constraint Score for feature selection with pairwise constraints, Pattern Recognition 43(6) (2010), 2106–2118.

44. Tabakhi and Moradi, Relevance-redundancy feature selection based on ant colony optimization, Pattern Recognition 438 (2015), 2798–2811.

45. Tran, Xue and Zhang M.J., Variable-length particle swarm optimization for feature selection on high-dimensional classification, IEEE Transactions on Evolutionary Computation 23(3) (2019), 473–487.

46. Wang C.Z., Hu Q.H., Wang X.Z., Chen D.G., Qian Y.H. and Dong, Feature selection based on neighborhood discrimination index, IEEE Transactions on Neural Networks and Learning Systems 29(7) (2018), 2986–2999.

47. Wang C.Z., Huang, Shao M.W., Hu Q.H. and Chen D.G., Feature selection based on neighborhood self-information, IEEE Transactions on Cybernetics 50(9) (2020), 4031–4042.

48. Wang H.D. and Jin Y.C., A random forest-assisted evolutionary algorithm for data-driven constrained multiobjective combinatorial optimization of trauma systems, IEEE Transactions on Cybernetics 50(2) (2020), 536–549.

49. Wojtowytsch and E, Can shallow neural networks beat the curse of dimensionality? A mean field training perspective, IEEE Transactions on Artificial Intelligence 1(2) (2020), 121–129.

50. Xie, Zhang X.Y. and Zhang S.Y., Rough set theory and attribute reduction in interval-set information system, Journal of Intelligent & Fuzzy Systems 42(6) (2022), 4919–4929.

51. S.P., Yang X.B., Yu H.L., Yu D.J., Yang J.Y. and Tsang E.C.C., Multi-label learning with label-specific feature reduction, Knowledge-Based Systems 104 (2016), 52–61.

52. Yang, Hu B.Q. and Qiao J.S., Three-way decisions with rough membership functions in covering approximation space, Fundamenta Informaticae 165(2) (2019), 157–191.

53. Yang W.Y., Sun B.Y., Zhu Y.S. and Wu D.H., A secure heuristic semantic searching scheme with blockchain-based verification, Information Processing & Management 58(4) (2021), 102548.

54. Yang X.B., Liang S.C., Yu H.L., Gao and Qian Y.H., Pseudolabel neighborhood rough set: Measures and attribute reductions, International Journal of Approximate Reasoning 105 (2019), 112–129.

55. Yang X.B. and Yao Y.Y., Ensemble selector for attribute reduction, Applied Soft Computing 70 (2018), 1–11.

56. Yang X.S., A new metaheuristic bat-inspired algorithm, in: Nature Inspired Cooperative Strategies for Optimization (NICSO 2010), C. Cruz, J. Gonzalez, N. Krasnogor and G. Terraza, eds., Springer, Granada, 2010, pp. 65–74.

57. Yao Y.Y., Zhao and Wang, On Reduct Construction Algorithms, in: Transactions on Computational Science II, M.L. Tan, C.J.K. Wang, Y. Yao, Y. Wang and G., eds., Springer, Berlin, 2008, pp. 100–117.

58. Zhang X.Y. and Yao Y.Y., Tri-level attribute reduction in rough set theory, Expert Systems with Applications 190 (2022), 116187.

59. Zhang, Mei C.L., Chen D.G. and Li J.H., Feature selection in mixed data: A method using a novel fuzzy rough set-based information entropy, Pattern Recognition 56 (2016), 1–15.

60. Zhang Z.Q., Wang K.P., Zhu L.X. and Wang, A Pareto improved artificial fish swarm algorithm for solving a multi-objective fuzzy disassembly line balancing problem, Expert Systems with Applications 86 (2017), 165–176.

61. Zhou and Hua Z.S., A correlation guided genetic algorithm and its application to feature selection, Applied Soft Computing 123 (2022), 108964.

62. Zhou, Long Y.X., Zhang W.P., Pu Q.L., Wang, Nie and He, Adaptive Genetic algorithm-aided neural network with channel state information tensor decomposition for indoor localization, IEEE Transactions on Evolutionary Computation 25(5) (2021), 913–927.