Label-specific guidance for efficiently searching reduct

Abstract

In the era of big data, exploring attribute reduction (i.e., rough set-based feature selection) calls for efficient strategies that derive reducts and thereby reduce the dimensionality of data. To design such strategies, two fundamental perspectives of Granular Computing may be taken into account: breaking up the whole into pieces and gathering parts into a whole. From this point of view, a novel strategy named label-specific guidance is introduced into the process of searching reduct. Given a formal description of attribute reduction, we divide the corresponding constraint into several label-specific based constraints. A sequence of these label-specific based constraints is then obtained, and the reduct related to each label-specific based constraint can guide the computation of the reduct related to the subsequent one. This label-specific guidance runs through the whole process of searching reduct until the reduct over the whole universe is derived. Compared with five state-of-the-art algorithms over 20 data sets, the experimental results demonstrate that our proposed acceleration strategy not only significantly accelerates the process of searching reduct but also offers justifiable performance in the task of classification. This study suggests a new direction for the problem of quickly deriving reducts.

Keywords

Accelerator; attribute reduction; label-specific guidance; rough set

1 Introduction

Feature selection [1, 24], an essential mechanism for data pre-processing, has not only received extensive attention but has also been widely employed in various practical problems such as data sketching and rule generation. The core of feature selection is to select a subset of features that satisfies a given constraint while retaining a suitably high accuracy in representing the raw features [23].

Presently, the concept of attribute reduction [4, 37] rooted in the rough set theory [33] has been demonstrated to be of great value in reducing the dimensions of data with clear semantic explanations. For this reason, attribute reduction has been widely accepted as a substantial tool for realizing feature selection. Since attribute reduction is a rough set-based feature selection, its constraint is closely related to the measure derived from the rough set model. In other words, different rough set models or measures may give rise to different forms of attribute reduction, which is why attribute reduction is highly adaptable.

Following the above discussions, an open problem is how to search for the subset of conditional attributes that satisfies the given constraint in attribute reduction. Such a subset is referred to as a reduct in rough set theory. Up to now, the discernibility matrix [8, 41] and heuristic-based searching [11, 54] are two well-known approaches. Nevertheless, although the discernibility matrix can find all reducts, it is quite time-consuming. Compared with the discernibility matrix, heuristic-based searching has attracted more attention, since the heuristic information involved in the search allows it to quickly derive one reduct.

Among various heuristic-based searching strategies, forward greedy searching is especially popular, for two main reasons: 1) it can be easily improved because of its simple structure; 2) it can quickly select candidate conditional attributes for the reduct. However, it is worth noting that forward greedy searching is still laborious if the volume or the scale of the data is huge, especially in the era of big data. Therefore, many researchers have devoted themselves to designing acceleration methods for further improving the efficiency of forward greedy searching. For example, Qian et al. [35] have proposed the framework named positive approximation, which gradually reduces the volume of samples in the process of searching reduct; Liu et al. [26] have introduced a mechanism named bucket into the computation of distances among samples, which employs a hash function to avoid a large number of unnecessary computations; Chen et al. [3] have developed the concept of attribute group, which reduces the number of times candidate conditional attributes are evaluated; Rao et al. [38] have introduced the similarity/dissimilarity among conditional attributes into the process of searching reduct, so that two or more candidate conditional attributes can be selected in each iteration. Meanwhile, to explore the differences among existing attribute reduction methods, Li et al. [18] have introduced the relationships among consistencies into the comparison of attribute reduction methods in formal decision contexts [19, 21]; the results of this comparison can help users select an attribute reduction method appropriate for their requirements.

Through a careful review of previous studies, we observe that most existing acceleration methods of forward greedy searching focus on the sample or the attribute, whereas few of them make the best use of the information provided by the labels. On closer consideration of the role of the label, the following facts can be observed: 1) the label provides a local view for exploring attribute reduction related problems, i.e., the label-specific [43, 51] based reduct; 2) the differences among different label-specific based reducts can be quantitatively described, so that the inherent relationships among different labels may be characterized by these reducts; 3) different label-specific based reducts can be further fused to improve the performance of subsequent learning tasks.

To sum up, by considering the advantages of the label, a new strategy called label-specific guidance is proposed in this paper. On the one hand, although the labels in data have been employed by supervised approaches in previous research [3, 38], these approaches use the labels only from the perspective of the measure, and the hierarchy among the labels is rarely considered. Therefore, we take the labels into consideration to divide the whole data into several pieces. On the other hand, because of the hierarchy among the labels, a guidance mechanism may be constructed for choosing appropriate conditional attributes: for each piece of the data, the reduct related to the previous piece is employed to guide the computation of the reduct related to the subsequent piece. Ultimately, with this guidance, the reduct over the whole universe can be naturally obtained, and the process of searching reduct can be accelerated.

Therefore, the main contributions of our study are twofold. 1) The principles of breaking up the whole into pieces and gathering parts into a whole in Granular Computing [9, 47] are introduced into the process of forward greedy searching. The reduct related to the pieces can be employed to guide the generation of the reduct related to the whole, which can effectively reduce the time consumed in deriving reduct. 2) By using the information provided by the labels, a complex problem can be divided into several simple problems, which can greatly improve the efficiency of problem solving. In the context of this paper, the solutions of these simple problems are eventually combined into the solution of the original complex problem while keeping the distribution of the raw data unchanged.

The rest of this paper is organized as follows. Section 2 reviews the basic notions related to some measures and two general expressions of attribute reduction in the rough set model. Section 3 illustrates our proposed acceleration strategy, i.e., label-specific guidance for efficiently searching reduct, in detail. Section 4 presents comparative experimental results together with the corresponding analyses. Section 5 concludes the paper and discusses future perspectives.

2 Preliminaries

2.1 Measures

In the field of the rough set theory, a decision system can be given by a 3-tuple $D = \langle U, AT, L \rangle$, in which U is a nonempty finite set of samples, called the universe; AT is a nonempty finite set of conditional attributes; L is the label attribute describing the categories of samples, i.e., each x_i ∈ U has at least one label L (x_i). In the context of this paper, we mainly pay attention to the single-label situation, i.e., ∀x_i ∈ U, there exists exactly one L (x_i).

Generally speaking, one of the fundamental motivations of the rough set theory is to quantitatively characterize the uncertainty in data. Therefore, given a decision system $D$, the measure of uncertainty can be formally defined as a mapping such that

$$ρ : P (U) \times P (AT) \to \mathbb{R}, \qquad (1)$$

in which $P (U)$ and $P (AT)$ are the power sets of U and AT, respectively, and $\mathbb{R}$ is the set of all real numbers.

Following Eq. (1), two different perspectives can be employed to design various measures for characterizing uncertainty: 1) if the whole universe is employed, then ρ (U, AT) can be regarded as a measure based on the global perspective; 2) if a subset X ⊂ U is used, then ρ (X, AT) can be considered as a measure based on the local perspective [46].

It is not difficult to observe that the difference between the above two perspectives lies in the volume of samples. In the initial stage of exploring the rough set theory, the global perspective dominated the characterization of uncertainty. In recent years, because of the diversity of practical requirements in the era of big data, it has been demonstrated that the local perspective is superior to the global one [23]. Interestingly, the information provided by the label attribute can be naturally employed for conducting local perspective related studies.

Considering the label attribute in $D$, the universe can be partitioned into several disjoint classes; the set of these classes is denoted as ℙ, and | ℙ | is the number of labels. We suppose that | ℙ | > 1 throughout this paper, mainly because there always exist at least two labels in learning tasks. In what follows, ∀X_j ∈ ℙ, X_j is a maximal subset of U in which all samples possess the same label, and the local perspective of measuring the uncertainty can be expressed as ρ (X_j, AT). Obviously, ρ (X_j, AT) is closely related to a fixed label in data; thus ρ (X_j, AT) will also be referred to as the label-specific measure in this paper.
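The partition ℙ induced by the label attribute can be sketched in a few lines of Python. This is only an illustration: samples are identified by their indices, the single-label setting is assumed, and the function name `partition_by_label` and the toy labels are hypothetical.

```python
from collections import defaultdict

def partition_by_label(labels):
    """Group sample indices into the classes of P, one class per label value.

    Assumes the single-label setting: labels[i] is the unique label L(x_i),
    so every sample falls into exactly one class.
    """
    classes = defaultdict(list)
    for i, label in enumerate(labels):
        classes[label].append(i)
    return dict(classes)

P = partition_by_label(["a", "b", "a", "c", "b", "a"])   # hypothetical labels
assert len(P) > 1, "at least two labels are assumed"
print(P)   # {'a': [0, 2, 5], 'b': [1, 4], 'c': [3]}
```

Each value of the returned dictionary plays the role of one class X_j, so a label-specific measure ρ (X_j, AT) is simply a measure evaluated on one of these index lists.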

Presently, following the above two perspectives to characterize uncertainties, various measures have been proposed. Most of the measures can be divided into the following two categories.

The higher the value of the measure ρ, the lower the degree of uncertainty revealed. For example, approximation quality is such a type of measure. In this case, ρ (U, AT) denotes the approximation quality obtained over the whole universe, while ρ (X_j, AT) is the label-specific approximation quality obtained over a specific class X_j [45].

The lower the value of the measure ρ, the lower the degree of uncertainty revealed. For instance, conditional entropy is such a type of measure. In this situation, ρ (U, AT) denotes the conditional entropy acquired over the whole universe, while ρ (X_j, AT) is the label-specific conditional entropy acquired over a specific class X_j [40].

2.2 Attribute reduction

As one of the most important ways to realize feature selection, attribute reduction has witnessed great success in the development of both rough set theory and dimension reduction. Unlike many popular feature selection techniques, attribute reduction in the rough set theory is not only effective in removing irrelevant or redundant conditional attributes but also equipped with clear semantic explanations. It should be pointed out that these semantic explanations are closely related to the constraints defined in attribute reduction. For example, such constraints may require minimizing the delayed decision [7, 39] or misclassification costs [16, 32], preserving the regions of approximations related to various rough sets [31], improving the generalization performance of classifiers [34], and so on.

Through a comprehensive review of previous research, Yao [49] pointed out that most definitions of attribute reduction share a similar structure, and gave the following formal description of this structure.

Definition 1. Given a decision system $D$, let $C_{ρ (U, AT)}$ be a constraint defined by the measure ρ (U, AT) over the whole universe. ∀A ⊆ AT, A is referred to as a $C_{ρ (U, AT)}$-reduct if and only if:

A satisfies the constraint $C_{ρ (U, AT)}$ ;

∀A′ ⊂ A, A′ does not satisfy the constraint $C_{ρ (U, AT)}$ .

Definition 2. Given a decision system $D$, let $C_{ρ (X_{j}, AT)}$ be a constraint based on the label-specific measure ρ (X_j, AT), in which X_j ∈ ℙ. ∀A ⊆ AT, A is referred to as a $C_{ρ (X_{j}, AT)}$-reduct if and only if:

A satisfies the constraint $C_{ρ (X_{j}, AT)}$ ;

∀A′ ⊂ A, A′ does not satisfy the constraint $C_{ρ (X_{j}, AT)}$ .

Obviously, the difference between Def. 1 and Def. 2 lies in the forms of the constraint. The constraint $C_{ρ (U, AT)}$ is constructed based on the global perspective for measuring uncertainty, while the constraint $C_{ρ (X_{j}, AT)}$ is constructed based on the local perspective for measuring uncertainty. Since ρ (X_j, AT) is closely related to the class X_j determined by the label attribute, the reduct shown in Def. 2 is also referred to as the label-specific based reduct in various studies.
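To make the two conditions of Def. 1 and Def. 2 concrete, the following brute-force check is a minimal sketch, assuming that the constraint is available as a predicate on subsets of attributes. The names `is_reduct` and `satisfies` and the toy constraint are hypothetical. Enumerating all proper subsets is exponential in |A|, so the check only suits tiny examples; Algorithm 1 below instead tests the subsets A - {c}.

```python
from itertools import combinations

def is_reduct(A, satisfies):
    """Brute-force check of the two conditions in Def. 1 (or Def. 2).

    `satisfies` is a hypothetical predicate standing in for the constraint
    C_rho(U, AT) (or C_rho(X_j, AT)); it receives a frozenset of attributes.
    """
    A = frozenset(A)
    if not satisfies(A):                          # condition 1: A satisfies C
        return False
    for k in range(len(A)):                       # condition 2: no proper subset does
        for subset in combinations(sorted(A), k):
            if satisfies(frozenset(subset)):
                return False
    return True

# Hypothetical constraint: the subset must contain "a" and at least one of "b", "c"
def satisfies(subset):
    return "a" in subset and bool(subset & {"b", "c"})

print(is_reduct({"a", "b"}, satisfies))        # True
print(is_reduct({"a", "b", "c"}, satisfies))   # False: {"a", "b"} already satisfies C
```

Whether the constraint comes from ρ (U, AT) or from a label-specific ρ (X_j, AT) only changes the predicate, which mirrors the observation that Def. 1 and Def. 2 differ solely in the form of the constraint.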

The next question is how to derive a reduct. The following Algorithm 1 shows the detailed process of forward greedy searching, which has been widely employed for deriving reducts.

Algorithm 1. Forward greedy searching.

Inputs: Decision system $D$ , a constraint $C_{ρ (U, AT)}$ ;

Outputs: One $C_{ρ (U, AT)}$-reduct A.

1. A =∅;

2. Calculate ρ (U, AT);

3. Repeat

4. ∀a ∈ AT - A, calculate ρ (U, A ∪ {a});

5. Select a qualified conditional attribute b ∈ AT - A with a criterion based on {ρ (U, A ∪ {a}) : ∀ a ∈ AT - A};

6. A = A ∪ {b};

7. Calculate ρ (U, A);

8. Until $C_{ρ (U, AT)}$ is satisfied;

9. Repeat

10. ∀c ∈ A, compute ρ (U, A - {c});

11. If A - {c} satisfies $C_{ρ (U, AT)}$

12. A = A - {c};

13. End

14. Until A does not change or |A|=1;

15. Return A;

Algorithm 1 contains two main phases. The first phase adds appropriate conditional attributes into the potential reduct, and the second phase removes redundant conditional attributes from the potential reduct; these two phases correspond to the two conditions shown in Def. 1. Furthermore, the time complexity and the space complexity of Algorithm 1 are $O (| U |^{2} \cdot | AT |^{2})$ and $O (| AT |)$, respectively.

Although Algorithm 1 is designed to derive the reduct in Def. 1, it can be easily modified to derive the label-specific based reduct in Def. 2 by replacing the measure ρ (U, AT) with ρ (X_j, AT). In such a case, the space complexity is the same as that for Def. 1, but the corresponding time complexity becomes $O ((| U | \cdot | X_{j} |) \cdot | AT |^{2})$. This is mainly because the former is required to compute the similarities between samples over the whole universe before the measure ρ (U, AT) can be calculated, whereas the latter only needs to compute the similarities involving the samples in the specific class X_j, rather than all pairs of samples in the universe, to obtain the measure ρ (X_j, AT).
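Algorithm 1, together with the modification just described, can be sketched in Python as follows. The sketch rests on several assumptions that are not part of the text: a discrete decision table, Pawlak's classical approximation quality as a stand-in measure (the experiments in Section 4 use neighborhood rough sets instead), the constraint ρ (·, A) ≥ ρ (·, AT), and the attribute with the largest measure as the selection criterion. The parameter `samples` lets the same code compute ρ (U, ·) (all indices) or the label-specific ρ (X_j, ·). Function names and the toy table are hypothetical.

```python
from collections import defaultdict

def approximation_quality(data, labels, samples, attrs):
    """Classical approximation quality of `samples` w.r.t. attributes `attrs`.

    Equivalence classes are built over the whole universe; a sample counts as
    positive when every sample in its class carries the same label. Passing all
    indices yields rho(U, A); passing one class X_j yields rho(X_j, A).
    """
    label_sets = defaultdict(set)
    for i, row in enumerate(data):
        label_sets[tuple(row[a] for a in attrs)].add(labels[i])
    positive = sum(1 for i in samples
                   if len(label_sets[tuple(data[i][a] for a in attrs)]) == 1)
    return positive / len(samples)

def forward_greedy(data, labels, samples, all_attrs):
    """Sketch of Algorithm 1 with the constraint rho(samples, A) >= rho(samples, AT)."""
    rho = lambda attrs: approximation_quality(data, labels, samples, attrs)
    target = rho(all_attrs)                                   # step 2
    A = []                                                    # step 1
    while rho(A) < target:                                    # steps 3-8
        candidates = [a for a in all_attrs if a not in A]
        b = max(candidates, key=lambda a: rho(A + [a]))       # step 5: largest measure wins
        A.append(b)                                           # step 6
    changed = True
    while changed and len(A) > 1:                             # steps 9-14
        changed = False
        for c in list(A):
            rest = [a for a in A if a != c]
            if rho(rest) >= target:                           # A - {c} still satisfies C
                A, changed = rest, True
                break
    return A

# Hypothetical toy table: label = a0 OR a1, and a2 is the negation of a0
data = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]
labels = [0, 1, 1, 1]
print(forward_greedy(data, labels, list(range(4)), [0, 1, 2]))  # [0, 1]
```

The first loop is written as a `while` test rather than repeat-until; starting from ∅, the two forms behave identically whenever the empty set does not already satisfy the constraint.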

3 Label-specific guidance

Although forward greedy searching has been successfully applied to derive reducts, various methods for further improving its efficiency have also been explored [3, 38]. Broadly speaking, most of these methods can be divided into the following two perspectives.

Sample based perspective. In such a perspective, the efficiency of deriving reduct depends heavily on the volume of data. Therefore, with the increasing number of samples in data, the process of deriving reduct may be time-consuming.

From this point of view, by exploring the monotonic structure of the mapping ρ, Qian et al. [35] have designed a framework named positive approximation, which gradually reduces the number of samples used for evaluating candidate conditional attributes. Such a framework can be regarded as one of the representative accelerators for speeding up forward greedy searching. Furthermore, Liu et al. [26] have observed that computing the similarities among samples is costly; they therefore introduced a hash mapping for quickly calculating similarities, which also aims to derive reducts efficiently.

Attribute based perspective. In such a perspective, the efficiency of deriving reduct depends heavily on the dimensionality of data. Therefore, with the increasing number of conditional attributes, the process of deriving reduct may become inefficient in real-time tasks such as online feature selection [10, 53], streaming feature selection [27, 52], and so on.

In view of this, Chen et al. [3] have proposed a mechanism named attribute group. In such a mechanism, conditional attributes are partitioned into several groups, and only the conditional attributes outside the groups that already contain at least one attribute of the potential reduct need to be evaluated. Consequently, the number of evaluations of candidate conditional attributes is reduced, and the process of deriving reduct is further accelerated. Moreover, by considering the dissimilarity among conditional attributes, Rao et al. [38] have designed a new accelerator. Different from the attribute group, the dissimilarity based mechanism determines more than one appropriate conditional attribute in each iteration of selecting conditional attributes. It follows that the number of iterations required for constructing a reduct may be reduced, and the corresponding elapsed time can be saved.

From the above discussions, although both the sample and the attribute perspectives have been explored for speeding up the calculation of reducts, little attention has been paid to the information provided by the label.

It should be pointed out that the perspective of the label may bring some advantages [2, 42]. For example, the label provides a local view for reviewing attribute reduction related problems, so that the inherent relationships among labels can be clearly characterized by the selected conditional attributes; moreover, the label-specific based reducts [43, 51] derived over different labels can be further fused for improving the performance of subsequent learning tasks, e.g., the accuracy and the stability of classification [45].

Though the essential idea of label-specific reduction is valuable, most previous computations of different label-specific based reducts are carried out independently [5, 40], either serially or in parallel. Nevertheless, neither approach fully takes the overlaps among different label-specific based reducts into account, and thus computing power may be wasted. From this point of view, a new strategy called label-specific guidance is proposed in this paper for deriving reduct; the following Fig. 1 shows the details of our strategy.

Fig. 1

The process of label-specific guidance for searching reduct.

In Fig. 1, the detailed steps of label-specific guidance for searching reduct are as follows:

Obtain ℙ based on the label attribute L;

∀X_j ∈ ℙ, compute ρ (X_j, AT), and then sort the classes by these values to derive a sequence of disjoint classes X₁, X₂, ⋯, X_n;

For X₁, search the $C_{ρ (X_{1}, AT)}$-reduct A₁ by using Algorithm 1;

Let A₂ = A₁; if A₂ does not satisfy the constraint $C_{ρ (X_{2}, AT)}$, then select more appropriate conditional attributes from AT - A₂ and add them into A₂ until the constraint is satisfied, obtaining the $C_{ρ (X_{2}, AT)}$-reduct;

......

Let A_n = A_{n-1}; if A_n does not satisfy the constraint $C_{ρ (X_{n}, AT)}$, then select more appropriate conditional attributes from AT - A_n and add them into A_n until the constraint is satisfied, obtaining the $C_{ρ (X_{n}, AT)}$-reduct;

Let A = A_n; if A does not satisfy the constraint $C_{ρ (U, AT)}$, then select more appropriate conditional attributes from AT - A and add them into A until the constraint is satisfied; finally, remove the redundant conditional attributes to obtain the $C_{ρ (U, AT)}$-reduct.

The above steps divide the whole process of deriving the reduct over the universe into two main phases: 1) for the j-th label (1 ≤ j ≤ n), calculate the $C_{ρ (X_{j}, AT)}$-reduct; 2) calculate the $C_{ρ (U, AT)}$-reduct over U. Unlike previous accelerated searching strategies, in the first phase the $C_{ρ (X_{j}, AT)}$-reduct is recorded and regarded as the starting point for searching the subsequent $C_{ρ (X_{j + 1}, AT)}$-reduct; in the second phase, the $C_{ρ (X_{n}, AT)}$-reduct is recorded as well and serves as the starting point for searching the $C_{ρ (U, AT)}$-reduct. The fundamental idea of our strategy is thus that each previous reduct guides the search for the subsequent one, which may reduce the computational load.

To formally express our strategy, Algorithm 2 is designed as follows.

Algorithm 2. Label-specific guidance searching.

Inputs: Decision system $D$ , constraint $C_{ρ (U, AT)}$ ;

Outputs: One $C_{ρ (U, AT)}$ -reduct A.

1. A =∅;

2. Obtain ℙ based on the label attribute L;

3. ∀X_j ∈ ℙ, compute ρ (X_j, AT), and then sort them to derive a sequence of disjoint classes X₁, X₂, ⋯, X_n;

4. Search $C_{ρ (X_{1}, AT)}$-reduct A₁ by Algorithm 1;

5. For i = 2 to n

6. // i indicates the number of iterations

7. A_i = A_{i-1};

8. // A_{i-1} offers guidance for obtaining A_i

9. While A_i does not satisfy $C_{ρ (X_{i}, AT)}$

10. ∀a ∈ AT - A_i, evaluate a by calculating ρ (X_i, A_i ∪ {a});

11. Select a qualified conditional attribute b ∈ AT - A_i with a criterion based on {ρ (X_i, A_i ∪ {a}) : ∀a ∈ AT - A_i};

12. A_i = A_i ∪ {b};

13. Calculate ρ (X_i, A_i);

14. End

15. End

16. A = A_n;// A_n offers guidance for obtaining A

17. Calculate ρ (U, A);

18. While A does not satisfy $C_{ρ (U, AT)}$

19. // Add the required conditional attributes

20. ∀a ∈ AT - A, evaluate a by calculating ρ (U, A ∪ {a});

21. Select a qualified conditional attribute b ∈ AT - A with a criterion based on {ρ (U, A ∪ {a}) : ∀a ∈ AT - A};

22. A = A ∪ {b};

23. Calculate ρ (U, A);

24. End

25. Repeat

26. // Remove the redundant conditional attributes

27. ∀c ∈ A, compute ρ (U, A - {c});

28. If A - {c} satisfies $C_{ρ (U, AT)}$

29. A = A - {c};

30. End

31. Until A does not change or |A|=1;

32. Return A.

Following Algorithm 2, the label-specific guidance searching divides the computation of the $C_{ρ (U, AT)}$-reduct into several computations of $C_{ρ (X_{i}, AT)}$-reducts by using the information provided by the label. Moreover, each $C_{ρ (X_{i}, AT)}$-reduct guides the computation of the subsequent $C_{ρ (X_{i + 1}, AT)}$-reduct, until the $C_{ρ (X_{n}, AT)}$-reduct guides the computation of the $C_{ρ (U, AT)}$-reduct.
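The steps of Algorithm 2 can be sketched in Python as follows. This is an illustration under explicit assumptions, not the authors' implementation: a discrete decision table, classical approximation quality as a stand-in measure, the constraint ρ (·, A) ≥ ρ (·, AT), and an ascending sort of the classes by ρ (X_j, AT), since the sort order is not fixed in the text. Attributes are added only while the current constraint is violated, matching the textual steps accompanying Fig. 1, so a guiding reduct that already satisfies the next constraint is reused unchanged. The names `quality`, `grow`, `prune` and `label_specific_guidance` are hypothetical.

```python
from collections import defaultdict

def quality(data, labels, samples, attrs):
    """Classical approximation quality of `samples` (a stand-in measure)."""
    label_sets = defaultdict(set)
    for i, row in enumerate(data):
        label_sets[tuple(row[a] for a in attrs)].add(labels[i])
    return sum(1 for i in samples
               if len(label_sets[tuple(data[i][a] for a in attrs)]) == 1) / len(samples)

def grow(data, labels, samples, all_attrs, A):
    """Add attributes to A only while rho(samples, A) < rho(samples, AT)."""
    rho = lambda attrs: quality(data, labels, samples, attrs)
    target, A = rho(all_attrs), list(A)
    while rho(A) < target:
        b = max((a for a in all_attrs if a not in A), key=lambda a: rho(A + [a]))
        A.append(b)
    return A

def prune(data, labels, samples, all_attrs, A):
    """Drop single attributes whose removal keeps the constraint satisfied."""
    rho = lambda attrs: quality(data, labels, samples, attrs)
    target, A, changed = rho(all_attrs), list(A), True
    while changed and len(A) > 1:
        changed = False
        for c in list(A):
            rest = [a for a in A if a != c]
            if rho(rest) >= target:
                A, changed = rest, True
                break
    return A

def label_specific_guidance(data, labels, all_attrs):
    classes = defaultdict(list)                               # step 2: the partition P
    for i, y in enumerate(labels):
        classes[y].append(i)
    # Step 3: sort the classes by rho(X_j, AT); ascending order is an assumption.
    order = sorted(classes.values(), key=lambda X: quality(data, labels, X, all_attrs))
    first = order[0]
    A = prune(data, labels, first, all_attrs,
              grow(data, labels, first, all_attrs, []))       # step 4: Algorithm 1 on X_1
    for X in order[1:]:                                       # steps 5-15: A_{i-1} guides A_i
        A = grow(data, labels, X, all_attrs, A)
    U = list(range(len(data)))
    A = grow(data, labels, U, all_attrs, A)                   # steps 16-24: A_n guides A
    return prune(data, labels, U, all_attrs, A)               # steps 25-31

data = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0]]           # hypothetical toy table
labels = [0, 1, 1, 1]
print(label_specific_guidance(data, labels, [0, 1, 2]))       # [0, 1]
```

On this toy table the reduct found for the first class already satisfies the constraints of the second class and of U, so no further attribute is evaluated after the first stage, which is exactly the situation in which the guidance is expected to save work.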

Since the classes X₁, ⋯, X_n are disjoint and their union is U, we have $| X_{1} | + \dots + | X_{n} | = | U |$. Therefore, the time complexity of Algorithm 2 is

$O ((| X_{1} |^{2} + \dots + | X_{n} |^{2}) \cdot | AT |^{2} + | U |^{2} \cdot | AT |^{2}) = O ((| X_{1} |^{2} + \dots + | X_{n} |^{2} + (| X_{1} | + \dots + | X_{n} |)^{2}) \cdot | AT |^{2}) \leq O (2 (| X_{1} | + \dots + | X_{n} |)^{2} \cdot | AT |^{2}) = O (| U |^{2} \cdot | AT |^{2})$,

where the inequality follows from $| X_{1} |^{2} + \dots + | X_{n} |^{2} \leq (| X_{1} | + \dots + | X_{n} |)^{2}$.

The space complexity of Algorithm 2 is $O (| AT |)$.

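It follows that Algorithm 2 has the same space complexity as Algorithm 1 and a worst-case time complexity of the same order, i.e., $O (| U |^{2} \cdot | AT |^{2})$. The expected acceleration therefore comes from how the work is distributed rather than from a lower worst-case bound: in the label-specific stages, each candidate conditional attribute is evaluated over a single class X_i instead of the whole universe, and the final stage starts from A_n instead of ∅, so that fewer conditional attributes may need to be evaluated over U. Whether this translates into a practical speed-up is examined experimentally in Section 4.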
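The per-evaluation cost model behind this analysis can be illustrated numerically. The sketch assumes, as in the derivation, that evaluating a candidate conditional attribute over a set X costs |X|^2 similarity computations; the class sizes and the function name `similarity_counts` are hypothetical. Evaluating one candidate once on every class costs | X_{1} |^2 + ⋯ + | X_{n} |^2, which never exceeds the cost | U |^2 of a single evaluation over the universe.

```python
def similarity_counts(class_sizes):
    """Pairwise similarity computations for one candidate evaluation.

    Follows the |X|^2 cost model of the complexity analysis.
    Returns (cost of one evaluation on every class, cost of one evaluation on U).
    """
    universe = sum(class_sizes)                        # classes are disjoint and cover U
    label_specific = sum(s * s for s in class_sizes)   # one evaluation on each X_i
    global_stage = universe * universe                 # one evaluation on U
    return label_specific, global_stage

ls, gl = similarity_counts([200, 150, 100, 50])       # hypothetical class sizes
print(ls, gl, ls / gl)                                 # 75000 250000 0.3
```

With more balanced or more numerous classes the ratio shrinks further (for n equal classes it is 1/n), which suggests why data sets with many labels may benefit most from label-specific stages.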

4 Experimental analysis

4.1 Data sets

To demonstrate the effectiveness of our proposed strategy, 20 UCI data sets have been selected to conduct the experiments. The details of the data sets are shown in the following Table 1.

Table 1
The detailed description of data sets

ID	Data sets	#Samples	#Attributes	#Labels
1	Breast Cancer Wisconsin (Diagnostic)	569	30	2
2	Breast Tissue	106	9	6
3	Cloud	1024	10	39
4	Diabetic Retinopathy Debrecen Data Set	1151	19	2
5	Forest Fires	517	13	4
6	Glass Identification	214	10	6
7	Ionosphere	351	34	2
8	LSVT Voice Rehabilitation	126	257	2
9	Madelon	4400	500	2
10	MAGIC Gamma Telescope	19020	11	2
11	Musk (Version 1)	476	168	2
12	Page Blocks Classification	5473	10	5
13	QSAR Biodegradation	1055	41	2
14	Quality Assessment of Digital Colposcopies	287	69	2
15	Seeds	210	7	3
16	Statlog (Image Segmentation)	2310	19	7
17	Statlog (Vehicle Silhouettes)	946	18	4
18	Ultrasonic Flowmeter Diagnostics	540	173	4
19	Wireless Indoor Localization	2000	7	4
20	Yeast	1484	8	10

4.2 Experimental setup and configurations

All the experiments have been conducted on a personal computer with Windows 10, an Intel Core i7-10710U CPU (1.61 GHz), and 16 GB memory. The programming language is MATLAB R2016a.

In this section, the neighborhood rough set [12, 44] is used as the model for deriving reducts. Accordingly, 20 different radii, i.e., 0.01, 0.02, 0.03, ⋯, 0.2, which were recommended by Ref. [26], are used for constructing neighborhoods. Moreover, approximation quality [25] and conditional entropy [6, 50] are employed to define attribute reduction. Finally, for each radius, 5-fold cross-validation is employed to derive reducts.
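The neighborhood-based measures can be illustrated by the following sketch. It assumes Euclidean distance over attributes normalized to [0, 1], the neighborhood δ_A(x) = {y ∈ U : dist_A(x, y) ≤ δ}, approximation quality as the fraction of samples whose neighborhood contains only samples with the same label, and one common form of neighborhood conditional entropy, namely the average of -log(|δ_A(x) ∩ [x]_L| / |δ_A(x)|), where [x]_L is the class of x. The exact definitions in [25] and [6, 50] may differ. Restricting `samples` to a class X_j gives the label-specific versions. All names and the toy data are hypothetical.

```python
import math

def neighborhood(data, i, attrs, delta):
    """Indices of samples within Euclidean distance delta of x_i over `attrs`.

    The neighborhood is searched over the whole universe, so evaluating a
    class X_j costs |X_j| * |U| distance computations.
    """
    xi = data[i]
    return [j for j, xj in enumerate(data)
            if math.sqrt(sum((xi[a] - xj[a]) ** 2 for a in attrs)) <= delta]

def approximation_quality(data, labels, samples, attrs, delta):
    """Share of `samples` whose neighborhood is label-pure (lower approximation)."""
    pure = sum(1 for i in samples
               if all(labels[j] == labels[i] for j in neighborhood(data, i, attrs, delta)))
    return pure / len(samples)

def conditional_entropy(data, labels, samples, attrs, delta):
    """Average of -log(|neighborhood with same label| / |neighborhood|) over `samples`."""
    total = 0.0
    for i in samples:
        nb = neighborhood(data, i, attrs, delta)
        same = sum(1 for j in nb if labels[j] == labels[i])
        total -= math.log(same / len(nb))
    return total / len(samples)

# Hypothetical normalized data with one attribute and two labels
data = [[0.0], [0.05], [0.5], [0.55]]
labels = [0, 0, 1, 1]
U = list(range(len(data)))
radii = [round(0.01 * k, 2) for k in range(1, 21)]            # 0.01, 0.02, ..., 0.2
qualities = [approximation_quality(data, labels, U, [0], d) for d in radii]
for delta in (0.1, 1.0):
    print(delta, approximation_quality(data, labels, U, [0], delta),
          conditional_entropy(data, labels, U, [0], delta))
```

A larger radius merges samples of different labels into the same neighborhood, which lowers approximation quality and raises conditional entropy; this is consistent with the two categories of measures described in Section 2.1.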

To demonstrate the superiorities of our proposed acceleration strategy, the following five state-of-the-art algorithms have also been reproduced:

Forward Greedy Searching for Attribute Reduction (FGSAR);

Attribute Group for Attribute Reduction (AGAR) [3];

Positive Approximation for Attribute Reduction (PAAR) [35];

Dissimilarity Based Searching for Attribute Reduction (DBSAR) [38];

Multi-granularity Attribute Reduction (MGAR) [28].

4.3 Comparisons of the elapsed time

In this subsection, the elapsed time for deriving reducts w.r.t. the approximation quality and the conditional entropy by different algorithms will be compared. The detailed results are shown in the following Figs. 2 and 3, respectively.

Fig. 2

The elapsed time of deriving reducts w.r.t. the approximation quality.

Fig. 3

The elapsed time of deriving reducts w.r.t. the conditional entropy.

In Figs. 2 and 3, each subgraph contains six lines of different colors, corresponding to the elapsed time of deriving reducts by the six strategies. The purple line corresponds to the elapsed time of deriving reducts by our strategy. Notably, in most subgraphs of Fig. 3, the top of the ordinate axis is marked as greater than t (t denotes a critical value of time consumption), which indicates that the time consumption of the corresponding attribute reduction method is much greater than t.

From Figs. 2 and 3, it can be observed that the purple line is significantly lower than the lines of other colors. In other words, deriving reducts with our proposed searching strategy requires less time. Taking the “Forest Fires” (ID: 5) data set as an example, if the radius δ = 0.1 and the measure of approximation quality is used, then the elapsed times of generating reducts by FGSAR, AGAR, DBSAR, PAAR and MGAR are 5.4677, 3.7705, 3.1739, 2.5609 and 6.4418 seconds, respectively, whereas our strategy needs only 0.9129 seconds. A plausible explanation is that our strategy uses the label information to compress both the attribute space and the sample space, so that the efficiency of generating reducts can be significantly improved.
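The speed-up ratios for this single example can be computed directly from the quoted times. The sketch assumes that the speed-up ratio is the elapsed time of a compared algorithm divided by that of our strategy, which is consistent with values greater than 1 favoring our strategy; because it uses only the δ = 0.1 times quoted above, the results need not coincide with the entries of Table 2.

```python
# Elapsed times (seconds) on Forest Fires (ID: 5), radius 0.1, approximation quality
times = {"FGSAR": 5.4677, "AGAR": 3.7705, "DBSAR": 3.1739, "PAAR": 2.5609, "MGAR": 6.4418}
proposed = 0.9129

# Speed-up ratio (assumed definition): compared algorithm's time divided by ours
ratios = {name: t / proposed for name, t in times.items()}
for name, r in ratios.items():
    print(f"{name} vs. Proposed: {r:.4f}")
```

Even against PAAR, the fastest compared algorithm in this example, the ratio is close to 2.8, i.e., our strategy needs roughly a third of the time.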

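The values of the speed-up ratio, i.e., the elapsed time of a compared algorithm divided by that of our strategy, are presented in the following Tables 2 and 3; a value greater than 1 means that our strategy is faster.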

Table 2

The speed-up ratio for comparing elapsed time of deriving reduct w.r.t. approximation quality

ID	FGSAR vs. Proposed	AGAR vs. Proposed	DBSAR vs. Proposed	PAAR vs. Proposed	MGAR vs. Proposed
1	5.1413	3.8000	3.5012	2.0760	16.0454
2	7.6952	6.7957	5.4153	5.6049	17.3624
3	16.0021	12.5605	8.5238	13.2591	33.8319
4	8.3503	5.8160	5.1908	5.9926	6.2518
5	5.5348	3.6075	3.2755	2.6303	8.0565
6	6.2141	5.1152	4.0177	5.4329	11.3872
7	4.7182	3.5562	3.7090	2.7414	4.7182
8	5.6744	5.5167	7.2748	3.3459	7.1744
9	17.2627	13.2501	8.0646	9.8230	33.1231
10	4.1271	3.3364	2.6403	4.0327	23.9634
11	4.4783	3.9571	4.4025	1.8605	10.4923
12	9.6783	8.1735	6.1786	8.2595	12.2381
13	4.5111	3.0253	2.6391	1.6281	6.6250
14	23.1563	21.9971	24.8057	10.0757	3.9487
15	7.0727	6.0958	5.3467	5.3426	19.7530
16	12.4934	9.4240	6.6407	4.8859	11.2402
17	8.0104	5.5448	4.5713	4.8099	10.9117
18	7.1521	3.9455	3.9841	2.6870	5.6908
19	7.7546	6.5076	5.0224	5.4624	10.1835
20	13.2893	11.2626	7.9629	12.1324	19.6160
Average	9.1021	7.2519	6.1981	5.7611	13.4073

Table 3

The speed-up ratio for comparing elapsed time of deriving reduct w.r.t. conditional entropy

ID	FGSAR vs. Proposed	AGAR vs. Proposed	DBSAR vs. Proposed	PAAR vs. Proposed	MGAR vs. Proposed
1	3.0923	2.5163	3.0251	2.7653	42.8415
2	1.5027	2.0058	1.5816	1.6558	29.5401
3	2.7153	2.7492	2.1043	2.2517	36.2571
4	2.4518	1.9462	1.9216	2.3982	18.5564
5	3.6899	2.8382	2.9778	3.1635	32.5695
6	2.2922	2.6072	2.1164	2.4980	33.6711
7	1.7870	1.4637	2.0661	1.4462	15.1017
8	0.5810	0.8303	5.2172	0.3433	5.8015
9	2.5553	2.0126	15.9737	2.1130	12.0691
10	4.2693	4.1617	3.8502	5.8931	30.5430
11	3.0421	2.8711	5.7700	2.5092	12.7155
12	7.0811	7.6224	6.3218	9.3520	49.8523
13	3.1221	2.2114	2.3520	2.7878	27.1434
14	1.4813	1.8736	4.0050	1.2633	8.3363
15	1.6950	2.1361	1.8153	2.1108	30.8614
16	7.3755	6.1327	5.7809	6.8443	48.1897
17	4.5557	3.6922	3.8190	4.2847	37.8089
18	2.1194	1.4228	1.4797	1.7984	9.6596
19	4.3498	4.5949	4.2026	5.1417	49.0914
20	4.9155	5.0713	4.2705	5.3176	47.0784
Average	3.7037	3.4304	4.4178	3.7577	33.0049

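From Tables 2 and 3, most of the speed-up ratios are greater than 1. The only exceptions are found on the LSVT Voice Rehabilitation data set (ID: 8) w.r.t. conditional entropy, for which FGSAR, AGAR and PAAR are faster than our strategy (ratios 0.5810, 0.8303 and 0.3433). Overall, our proposed searching strategy therefore ranks first among the six algorithms in terms of time efficiency.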

Moreover, the average speed-up ratios reveal how the compared algorithms rank against each other. W.r.t. approximation quality, the average speed-up ratio of MGAR vs. Proposed (13.4073) is greater than those of FGSAR (9.1021), AGAR (7.2519), DBSAR (6.1981) and PAAR (5.7611) vs. Proposed; in other words, on average, Proposed is faster than PAAR, PAAR is faster than DBSAR, DBSAR is faster than AGAR, AGAR is faster than FGSAR, and FGSAR is faster than MGAR. W.r.t. conditional entropy, the ordering differs: the average ratios are 33.0049 (MGAR), 4.4178 (DBSAR), 3.7577 (PAAR), 3.7037 (FGSAR) and 3.4304 (AGAR), so AGAR is the fastest compared algorithm, while MGAR remains the slowest.
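These orderings can be reproduced from the Average rows of Tables 2 and 3. The sketch below treats Proposed as having ratio 1 by definition and sorts the algorithms by their average ratio; since an average of ratios is only a proxy for average elapsed time, the resulting ranking should be read as approximate. The function name `rank_fastest_first` is hypothetical.

```python
# Average speed-up ratios (compared algorithm vs. Proposed) from Tables 2 and 3
avg_quality = {"FGSAR": 9.1021, "AGAR": 7.2519, "DBSAR": 6.1981, "PAAR": 5.7611, "MGAR": 13.4073}
avg_entropy = {"FGSAR": 3.7037, "AGAR": 3.4304, "DBSAR": 4.4178, "PAAR": 3.7577, "MGAR": 33.0049}

def rank_fastest_first(avg_ratios):
    """Order algorithms by average time relative to Proposed (ratio 1 by definition)."""
    relative = dict(avg_ratios, Proposed=1.0)
    return sorted(relative, key=relative.get)

print(rank_fastest_first(avg_quality))
print(rank_fastest_first(avg_entropy))
```

The two printed lists differ in everything but their first and last entries, which is why the ordering of the compared algorithms should be stated separately for each measure.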

Besides the time consumptions and speed-up ratios, we further compare the six algorithms statistically using the Wilcoxon signed rank test, with the significance level set to 0.05. If the returned p-value is lower than 0.05, the two algorithms being compared perform significantly differently. Otherwise, no significant difference is detected. The detailed results are shown in the following Tables 4 and 5.

Table 4

The p-values for comparing elapsed time of deriving reduct w.r.t. approximation quality

ID	FGSAR V.S. Proposed	AGAR V.S. Proposed	DBSAR V.S. Proposed	PAAR V.S. Proposed	MGAR V.S. Proposed
1	6.7956E-08	6.7956E-08	6.7956E-08	1.5757E-06	8.0065E-09
2	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
3	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
4	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
5	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
6	9.1728E-08	6.7956E-08	7.8980E-08	6.7956E-08	8.0065E-09
7	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
8	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
9	6.7956E-08	6.7956E-08	7.9479E-07	1.0646E-07	8.0065E-09
10	7.9479E-07	1.0473E-06	1.0473E-06	9.1266E-07	8.0065E-09
11	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
12	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
13	7.8980E-08	1.2346E-07	2.9598E-07	2.1000E-03	8.0065E-09
14	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
15	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
16	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
17	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
18	6.7956E-08	6.7956E-08	6.7956E-08	1.2346E-07	8.0065E-09
19	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
20	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09

Table 5

The p-values for comparing elapsed time of deriving reduct w.r.t. conditional entropy

ID	FGSAR V.S. Proposed	AGAR V.S. Proposed	DBSAR V.S. Proposed	PAAR V.S. Proposed	MGAR V.S. Proposed
1	4.5390E-07	1.0473E-06	1.4309E-07	1.0473E-06	8.0065E-09
2	2.5960E-05	9.1728E-08	6.0148E-07	1.8030E-06	8.0065E-09
3	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
4	4.5390E-07	7.9479E-07	1.6571E-07	3.9388E-07	8.0065E-09
5	1.4309E-07	6.7956E-08	6.7956E-08	1.4309E-07	8.0065E-09
6	9.1728E-08	6.7956E-08	7.8980E-08	6.7956E-08	8.0065E-09
7	1.2941E-04	4.0000E-03	3.9874E-06	7.1000E-03	8.0065E-09
8	2.9249E-05	9.1266E-07	6.7956E-08	3.1000E-03	8.0065E-09
9	1.2505E-05	2.2220E-04	6.7956E-08	3.7499E-04	8.0065E-09
10	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
11	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
12	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
13	5.2269E-07	9.1728E-08	6.7956E-08	7.9479E-07	8.0065E-09
14	9.7480E-06	1.4309E-07	6.7956E-08	6.2200E-04	8.0065E-09
15	3.4156E-07	7.8980E-08	7.8980E-08	7.8980E-08	8.0065E-09
16	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
17	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
18	1.0473E-06	9.1266E-07	1.0473E-06	1.0473E-06	8.0065E-09
19	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09
20	6.7956E-08	6.7956E-08	6.7956E-08	6.7956E-08	8.0065E-09

In Tables 4 and 5, the p-value of MGAR v.s. Proposed is the same on all data sets because this value has reached the minimum in Matlab. A careful inspection of Tables 4 and 5 shows that all obtained p-values are well below 0.05; the largest is 7.1000E-03 in Table 5. This implies that our proposed searching strategy differs significantly from the other strategies in time consumption. Combined with what has been observed in Figs. 2 and 3, our strategy is clearly superior to the other accelerators in time efficiency on almost all data sets.
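To illustrate the decision rule, the following Python sketch implements a two-sided Wilcoxon signed-rank test for paired elapsed times and applies the 0.05 threshold used above. It is a simplified version that rests on several assumptions:

* It uses the large-sample normal approximation, without a tie correction of the variance and without a continuity correction.
* Pairs with zero difference are dropped.
* The paper's Matlab computation may use an exact or corrected variant, so this sketch is not expected to reproduce the p-values of Tables 4 and 5.

The function name and the timing data are hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Normal approximation without tie or continuity corrections; pairs with
    zero difference are discarded. Returns (W, p), W being the smaller of the
    positive and negative rank sums.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank |differences| in ascending order; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    return w, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


ALPHA = 0.05  # significance level used in the text
# Hypothetical elapsed times (seconds) of 20 paired runs; not taken from the paper.
t_other = [2.0 + 0.01 * k for k in range(20)]
t_proposed = [1.0 + 0.01 * k for k in range(20)]
w, p = wilcoxon_signed_rank(t_other, t_proposed)
print(w, p < ALPHA)  # 0 True
```

When one algorithm is slower in every paired run, all signed ranks point the same way, W is 0 and the p-value falls below ALPHA. As in the text, this is read as a significant difference in elapsed time.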

4.4 Comparisons of classification performances

In this subsection, we compare the classification accuracies of the reducts derived by the six algorithms. Since 20 radii have been employed to derive reducts, each result reported in this subsection is the maximal accuracy over the reducts obtained with these 20 radii. KNN and SVM classifiers are employed to compute the classification accuracies. The detailed results are shown in the following Tables 6-9.
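The evaluation step described above can be sketched in Python as follows. This is a minimal, hypothetical illustration. This section does not give the value of k, the validation scheme or the SVM settings, so the sketch assumes a 1-nearest-neighbour classifier with Euclidean distance on the reduct's attributes and a fixed hold-out split. The names `knn_accuracy` and `best_accuracy_over_radii` and the toy data are invented for the example. Only the "maximal accuracy over the reducts of all radii" step comes from the text.

```python
import math
from collections import Counter


def knn_accuracy(train_X, train_y, test_X, test_y, attrs, k=1):
    """Hold-out accuracy of k-NN using only the attribute indices in `attrs`."""
    def dist(a, b):
        # Euclidean distance restricted to the reduct's attributes.
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in attrs))

    correct = 0
    for x, y in zip(test_X, test_y):
        nearest = sorted(range(len(train_X)), key=lambda j: dist(x, train_X[j]))[:k]
        predicted = Counter(train_y[j] for j in nearest).most_common(1)[0][0]
        correct += predicted == y
    return correct / len(test_y)


def best_accuracy_over_radii(reducts_by_radius, data, k=1):
    """Score the reduct obtained for each radius and keep the best one."""
    scores = {r: knn_accuracy(*data, attrs=red, k=k) for r, red in reducts_by_radius.items()}
    best_radius = max(scores, key=scores.get)
    return best_radius, scores[best_radius]


# Toy data: attribute 0 separates the classes, attribute 1 is noise.
DATA = (
    [[0.0, 5.0], [0.1, 0.0], [1.0, 5.2], [0.9, 0.3]],  # training objects
    [0, 0, 1, 1],                                      # training labels
    [[0.02, 0.25], [0.97, 4.9]],                       # test objects
    [0, 1],                                            # test labels
)
# Hypothetical reducts returned for two neighborhood radii.
REDUCTS = {0.1: [0], 0.2: [1]}
print(best_accuracy_over_radii(REDUCTS, DATA))  # (0.1, 1.0)
```

Here the reduct for radius 0.1 keeps the informative attribute 0 and reaches accuracy 1.0. The reduct for radius 0.2 keeps only the noisy attribute 1 and misclassifies both test objects, so the reported value is the maximum, 1.0.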

Table 6
The classification accuracies based on approximation quality based reduct over KNN classifier

ID	FGSAR	AGAR	DBSAR	PAAR	MGAR	Proposed
1	0.9701	0.9683	0.9666	0.9701	0.9666	0.9701
2	0.7264	0.7268	0.7264	0.7264	0.7264	0.7359
3	0.2217	0.2217	0.2207	0.2217	0.2168	0.2217
4	0.6394	0.6307	0.6411	0.6394	0.6237	0.6411
5	0.8624	0.8681	0.8585	0.8624	0.8585	0.8700
6	0.7055	0.7011	0.7198	0.7055	0.6685	0.7244
7	0.9117	0.9117	0.8860	0.9117	0.8548	0.9118
8	0.8175	0.8178	0.8249	0.8175	0.7378	0.8409
9	0.6635	0.6477	0.7335	0.6635	0.6581	0.7335
10	0.8320	0.8320	0.8320	0.8320	0.8320	0.8320
11	0.8149	0.8193	0.7920	0.8149	0.7878	0.8193
12	0.9580	0.9578	0.9580	0.9578	0.9576	0.9580
13	0.8531	0.8521	0.8502	0.8531	0.8483	0.8531
14	0.7455	0.7525	0.7212	0.7455	0.6757	0.7557
15	0.9381	0.9381	0.9333	0.9381	0.9286	0.9476
16	0.9649	0.9645	0.9667	0.9649	0.9606	0.9654
17	0.7105	0.7105	0.7105	0.7105	0.7105	0.7105
18	0.8556	0.8556	0.8556	0.8556	0.8556	0.8556
19	0.9875	0.9865	0.9865	0.9875	0.9865	0.9875
20	0.5371	0.5371	0.5371	0.5371	0.5371	0.5371
Average	0.7858	0.7850	0.7860	0.7858	0.7696	0.7936

Table 7

The classification accuracies based on conditional entropy based reduct over KNN classifier

ID	FGSAR	AGAR	DBSAR	PAAR	MGAR	Proposed
1	0.9630	0.9631	0.9648	0.9630	0.9630	0.9648
2	0.6892	0.6892	0.6797	0.6892	0.6701	0.6983
3	0.2324	0.2324	0.2324	0.2344	0.2285	0.2324
4	0.6498	0.6420	0.6386	0.6498	0.6342	0.6498
5	0.8795	0.8758	0.8815	0.8795	0.8700	0.8853
6	0.6966	0.6875	0.7059	0.6966	0.6779	0.7107
7	0.8918	0.8889	0.9004	0.8918	0.8747	0.9060
8	0.8332	0.8332	0.8163	0.8332	0.8332	0.8489
9	0.7658	0.7504	0.7358	0.7658	0.7438	0.7888
10	0.8320	0.8320	0.8320	0.8320	0.8320	0.8320
11	0.7711	0.7710	0.7753	0.7711	0.7648	0.7920
12	0.9598	0.9598	0.9598	0.9598	0.9598	0.9598
13	0.8550	0.8578	0.8578	0.8550	0.8540	0.8578
14	0.7214	0.7214	0.7315	0.7214	0.7214	0.7386
15	0.9238	0.9238	0.9238	0.9238	0.9190	0.9286
16	0.9671	0.9649	0.9667	0.9671	0.9628	0.9675
17	0.7056	0.7044	0.7068	0.7056	0.7032	0.7068
18	0.8722	0.8722	0.8722	0.8722	0.8722	0.8722
19	0.9845	0.9845	0.9845	0.9845	0.9845	0.9845
20	0.5512	0.5512	0.5512	0.5512	0.5512	0.5512
Average	0.7872	0.7852	0.7858	0.7873	0.7809	0.7937

Table 8

The classification accuracies based on approximation quality based reduct over SVM classifier

ID	FGSAR	AGAR	DBSAR	PAAR	MGAR	Proposed
1	0.9772	0.9772	0.9772	0.9772	0.9754	0.9772
2	0.5766	0.5766	0.5766	0.5766	0.5766	0.5766
3	0.2148	0.2158	0.2148	0.2148	0.2148	0.2168
4	0.6742	0.6742	0.6733	0.6742	0.6725	0.6742
5	0.8929	0.8891	0.8891	0.8929	0.8891	0.8948
6	0.4857	0.4857	0.4998	0.4857	0.4857	0.4950
7	0.8832	0.8831	0.8548	0.8832	0.8434	0.8833
8	0.8332	0.8643	0.8246	0.8332	0.8409	0.8483
9	0.6085	0.6154	0.6142	0.6085	0.6108	0.6123
10	0.7913	0.7913	0.7913	0.7913	0.7913	0.7913
11	0.7416	0.7332	0.7080	0.7416	0.7226	0.6912
12	0.9320	0.9320	0.9318	0.9320	0.9318	0.9320
13	0.8569	0.8578	0.8588	0.8569	0.8569	0.8588
14	0.7525	0.7525	0.7525	0.7525	0.7525	0.7525
15	0.9381	0.9381	0.9381	0.9381	0.9286	0.9381
16	0.9255	0.9242	0.9251	0.9255	0.9212	0.9255
17	0.7329	0.7329	0.7329	0.7329	0.7305	0.7329
18	0.6611	0.6611	0.6611	0.6611	0.6611	0.6611
19	0.9810	0.9805	0.9810	0.9810	0.9800	0.9820
20	0.5384	0.5384	0.5384	0.5384	0.5384	0.5384
Average	0.7499	0.7512	0.7472	0.7499	0.7462	0.7491

Table 9

The classification accuracies based on conditional entropy based reduct over SVM classifier

ID	FGSAR	AGAR	DBSAR	PAAR	MGAR	Proposed
1	0.9771	0.9771	0.9771	0.9771	0.9771	0.9771
2	0.5372	0.5372	0.5372	0.5372	0.5372	0.5372
3	0.2129	0.2129	0.2129	0.2129	0.2129	0.2139
4	0.6864	0.6855	0.6864	0.6864	0.6820	0.6864
5	0.8968	0.8968	0.8968	0.8968	0.8968	0.8987
6	0.5234	0.5280	0.5234	0.5234	0.5234	0.5234
7	0.8492	0.8520	0.8519	0.8492	0.8321	0.8663
8	0.8566	0.8566	0.8563	0.8566	0.8566	0.8966
9	0.6192	0.6208	0.6196	0.6192	0.6138	0.6208
10	0.7913	0.7913	0.7913	0.7913	0.7913	0.7913
11	0.6701	0.6764	0.6869	0.6701	0.6659	0.6869
12	0.9329	0.9329	0.9329	0.9329	0.9329	0.9329
13	0.8531	0.8559	0.8559	0.8531	0.8540	0.8559
14	0.7524	0.7524	0.7524	0.7524	0.7524	0.7524
15	0.9429	0.9429	0.9381	0.9429	0.9381	0.9429
16	0.9281	0.9281	0.9281	0.9281	0.9277	0.9281
17	0.7234	0.7234	0.7246	0.7234	0.7234	0.7258
18	0.6833	0.6833	0.6833	0.6833	0.6778	0.6833
19	0.9805	0.9800	0.9825	0.9805	0.9800	0.9825
20	0.5505	0.5505	0.5498	0.5505	0.5498	0.5505
Average	0.7484	0.7492	0.7494	0.7484	0.7463	0.7526

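A careful inspection of Tables 6-9 shows that the reducts derived by our strategy generally do not lead to poorer classification accuracies than those derived by the other strategies. Proposed obtains the highest average accuracy in Tables 6, 7 and 9. In Table 8 (approximation quality, SVM), its average accuracy (0.7491) is slightly lower than those of AGAR (0.7512), FGSAR and PAAR (both 0.7499). It is still higher than those of DBSAR (0.7472) and MGAR (0.7462).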
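To see why Table 8 is the exception, count the data sets on which Proposed wins, ties or loses against a competitor. The Python sketch below does this for the Proposed and AGAR columns of Table 8. The win/tie/loss summary and the function name `win_tie_loss` are an illustrative choice, not part of the paper's protocol.

```python
# Table 8 (approximation quality based reduct, SVM): accuracies on data sets 1-20.
PROPOSED = [0.9772, 0.5766, 0.2168, 0.6742, 0.8948, 0.4950, 0.8833, 0.8483, 0.6123, 0.7913,
            0.6912, 0.9320, 0.8588, 0.7525, 0.9381, 0.9255, 0.7329, 0.6611, 0.9820, 0.5384]
AGAR = [0.9772, 0.5766, 0.2158, 0.6742, 0.8891, 0.4857, 0.8831, 0.8643, 0.6154, 0.7913,
        0.7332, 0.9320, 0.8578, 0.7525, 0.9381, 0.9242, 0.7329, 0.6611, 0.9805, 0.5384]


def win_tie_loss(ours, theirs):
    """Count data sets on which `ours` is higher than, equal to, or lower than `theirs`."""
    wins = sum(a > b for a, b in zip(ours, theirs))
    ties = sum(a == b for a, b in zip(ours, theirs))
    losses = sum(a < b for a, b in zip(ours, theirs))
    return wins, ties, losses


print(win_tie_loss(PROPOSED, AGAR))  # (7, 10, 3)
print(round(sum(PROPOSED) / len(PROPOSED), 4), round(sum(AGAR) / len(AGAR), 4))  # 0.7491 0.7512
```

Proposed wins on 7 data sets and loses on only 3. However, the loss on data set 11 (0.6912 v.s. 0.7332) is large enough to pull its average slightly below that of AGAR.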

5 Conclusions and future perspectives

In this paper, we developed a label-specific guidance strategy for accelerating the process of deriving reduct in rough set theory. Unlike previously popular accelerators, our strategy builds on two main ideas of Granular Computing:

1) The whole universe is divided into several disjoint groups to obtain local perspectives, which reflects the idea of breaking up the whole into pieces.

2) Guidance is introduced into the search for appropriate conditional attributes, which reflects the idea of gathering parts into a whole.

The experimental results indicate that this type of guidance strategy is of great value in improving the efficiency of deriving reduct.
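The two ideas above can be illustrated with a small Python sketch. This sketch is not the authors' algorithm and rests on several assumptions:

* It uses Pawlak's classical rough set on discrete data, whereas the experiments use neighborhood rough sets with radii.
* It takes approximation quality as the measure.
* It imposes one label-specific constraint per decision class X_j: the lower approximation of X_j under the selected attributes must match that under all attributes.

The attributes found for one label serve as the starting point (the "guidance") for the next label. Because lower approximations only grow as attributes are added, a constraint met for an earlier label stays met, so the final attribute set preserves the positive region. The sketch omits any redundancy-removal step. The function names are hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group object indices into equivalence classes by their values on `attrs`."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return blocks.values()


def lower_size(rows, attrs, target):
    """Size of the lower approximation of the object set `target` under `attrs`."""
    return sum(len(block) for block in partition(rows, attrs) if set(block) <= target)


def label_guided_reduct(rows, labels):
    """Greedy forward search that satisfies one label-specific constraint at a time.

    The attributes selected for earlier labels are kept and reused as the
    starting point for later labels. Returns the attribute list and, per
    label, the attributes that had to be added for it.
    """
    all_attrs = list(range(len(rows[0])))
    reduct, history = [], []
    for label in sorted(set(labels)):
        target = {i for i, y in enumerate(labels) if y == label}
        goal = lower_size(rows, all_attrs, target)
        added = []
        while lower_size(rows, reduct, target) < goal:
            candidates = [a for a in all_attrs if a not in reduct]
            # Add the attribute that enlarges this label's lower approximation most.
            best = max(candidates, key=lambda a: lower_size(rows, reduct + [a], target))
            reduct.append(best)
            added.append(best)
        history.append(added)
    return reduct, history


# Toy decision table: three conditional attributes, three labels.
ROWS = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 1), (0, 1, 1)]
LABELS = [0, 0, 1, 1, 2, 2]
print(label_guided_reduct(ROWS, LABELS))  # ([0, 2], [[0, 2], [], []])
```

In this toy table, the attributes selected for label 0 already satisfy the constraints of labels 1 and 2. The later labels therefore need no further search, which is the kind of saving the guidance idea aims at.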

The following topics deserve further investigation.

Other effective measures, such as the conditional discrimination index and the neighborhood decision error rate, may be employed to further verify the effectiveness of our acceleration strategy.

The acceleration strategy proposed in this paper is designed only for single-label data; the situation with multiple labels needs to be further explored.

Acknowledgment

This work is supported by the Natural Science Foundation of China (Nos. 62076111, 61906078, 62006099, 62006128), Natural Science Foundation of Jiangsu, China (No. BK20191457), Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province (No. CICIP2020004) and the Key Laboratory of Oceanographic Big Data Mining & Application of Zhejiang Province (No. OBDMA202002).

References

1. Cai, Luo J.W., Wang S.L. and Sheng, Feature selection in machine learning: a new perspective, Neurocomputing 300 (2018), 70–79.

2. Che X.Y., Chen D.G. and Mi J.S., A novel approach for learning label correlation with application to feature selection of multi-label data, Information Sciences 512 (2020), 795–812.

3. Chen, Liu K.Y., Song J.J., Fujita, Yang X.B. and Qian Y.H., Attribute group for attribute reduction, Information Sciences 535 (2020), 64–80.

4. Chen D.G., Yang Y.Y. and Dong, An incremental algorithm for attribute reduction with variable precision rough sets, Applied Soft Computing 45 (2016), 129–149.

5. Chen D.G. and Zhao S.Y., Local reduction of decision system with fuzzy rough sets, Fuzzy Sets and Systems 161 (2010), 1871–1883.

6. Dai J.H. and Tian H.W., Entropy measures and granularity measures for set-valued information systems, Information Sciences 240 (2013), 72–82.

7. Deng T.Q., Yang C.D. and Hu Q.H., Feature selection in decision systems based on conditional knowledge granularity, International Journal of Computational Intelligence Systems 4 (2011), 655–671.

8. Feng Q.R. and Zhou, Soft discernibility matrix and its applications in decision making, Applied Soft Computing 24 (2014), 749–756.

9. Gao S.X. and Zhao, Hierarchical classification with multi-path selection based on granular computing, Artificial Intelligence Review 54 (2021), 2067–2089.

10. Han, Tan Y.K., Zhu J.H., Guo, Chen and Wu Q.Y., Online feature selection of class imbalance via PA algorithm, Journal of Computer Science and Technology 31 (2016), 673–682.

11. Hoang V.D. and Jo K.H., Path planning for autonomous vehicle based on heuristic searching using online images, Vietnam Journal of Computer Science 2 (2015), 109–120.

12. Q.H., Yu D.R., Liu J.F. and Wu C.X., Neighborhood rough set based heterogeneous feature subset selection, Information Sciences 178 (2008), 3577–3594.

13. Jiang Z.H., Liu K.Y., Yang X.B., Yu H.L., Fujita and Qian Y.H., Accelerator for supervised neighborhood based attribute reduction, International Journal of Approximate Reasoning 119 (2020), 122–150.

14. Jiang Z.H., Yang X.B., Yu H.L., Liu, Wang P.X. and Qian Y.H., Accelerator for multi-granularity attribute reduction, Knowledge-Based Systems 177 (2019), 145–158.

15. H.R., Yang X.B., Song X.N. and Qi Y.S., Dynamic updating multigranulation fuzzy rough set: approximations and reducts, International Journal of Machine Learning and Cybernetics 6 (2014), 981–990.

16. H.R., Yang X.B., Yu H.L., Li T.J., Yu D.J. and Yang J.Y., Cost-sensitive rough set approach, Information Sciences 355-356 (2016), 282–298.

17. Kong Q.Z., Zhang X.W., Xu W.H. and Xie S.T., Attribute reducts of multi-granulation information system, Artificial Intelligence Review 53 (2020), 1353–1371.

18. J.H., Kumar C.A., Mei C.L. and Wang X.Z., Comparison of reduction in formal decision contexts, International Journal of Approximate Reasoning 80 (2017), 100–122.

19. J.H., Mei C.L. and Lv Y.J., Knowledge reduction in real decision formal contexts, Information Sciences 189 (2012), 191–207.

20. J.H., Mei C.L., Xu W.H. and Qian Y.H., Concept learning via granular computing: A cognitive viewpoint, Information Sciences 298 (2015), 447–467.

21. J.H., Mei C.L., Wang L.D. and Wang J.H., On inference rules in decision formal contexts, International Journal of Computational Intelligence Systems 8 (2015), 175–186.

22. , Si, Zhou G.J., Huang S.S. and Chen S.C., FREL: a stable feature selection algorithm, IEEE Transactions on Neural Networks and Learning Systems 26 (2014), 1388–1402.

23. F.C., Zhang and Jin C.X., Feature selection with partition differentiation entropy for large-scale data sets, Information Sciences 329 (2016), 690–700.

24. Liang J.Y., Wang, Dang C.Y. and Qian Y.H., A group incremental approach to feature selection applying rough set technique, IEEE Transactions on Knowledge and Data Engineering 26 (2014), 294–308.

25. Lin G.P., Liang J.Y. and Qian Y.H., Uncertainty measures for multigranulation approximation space, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 23 (2015), 443–457.

26. Liu, Huang W.L., Jiang Y.L. and Zeng Z.Y., Quick attribute reduct algorithm for neighborhood rough set model, Information Sciences 271 (2014), 65–81.

27. Liu J.H., Lin Y.J., Li Y.W., Weng and Wu S.X., Online multi-label streaming feature selection based on neighborhood rough set, Pattern Recognition 84 (2018), 273–287.

28. Liu K.Y., Yang X.B., Fujita, Liu, Yang and Qian Y.H., An efficient selector for multi-granularity attribute reduction, Information Sciences 505 (2019), 457–472.

29. Liu K.Y., Yang X.B., Yu H.L., Fujita, Chen X.J. and Liu, Supervised information granulation strategy for attribute reduction, International Journal of Machine Learning and Cybernetics 11 (2020), 2149–2163.

30. F.M., Ding M.W., Zhang T.F. and Cao, Compressed binary discernibility matrix based incremental attribute reduction algorithm for group dynamic data, Neurocomputing 344 (2019), 20–27.

31. J.S., Wu W.Z. and Zhang W.X., Approaches to knowledge reduction based on variable precision rough set model, Information Sciences 159 (2004), 255–272.

32. Min, He H.P., Qian Y.H. and Zhu, Test-cost-sensitive attribute reduction, Information Sciences 181 (2011), 4928–4942.

33. Pawlak and Skowron, Rough sets: some extensions, Information Sciences 177 (2007), 28–40.

34. Qian Y.H., Liang J.Y. and Dang C.Y., Consistency measure, inclusion degree and fuzzy measure in decision tables, Fuzzy Sets and Systems 159 (2008), 2353–2377.

35. Qian Y.H., Liang J.Y., Pedrycz and Dang C.Y., Positive approximation: an accelerator for attribute reduction in rough set theory, Artificial Intelligence 174 (2010), 597–618.

36. Qian Y.H., Liang X.Y. and Pedrycz, An efficient accelerator for attribute reduction from incomplete data in rough set framework, Pattern Recognition 44 (2011), 1658–1670.

37. Qin K.Y., Li and Pei, Attribute reduction and rule acquisition of formal decision context based on object (property) oriented concept lattices, International Journal of Machine Learning and Cybernetics 10 (2019), 2837–2850.

38. Rao X.S., Yang X.B., Yang, Chen X.J., Liu and Qian Y.H., Quickly calculating reduct: an attribute relationship based approach, Knowledge-Based Systems 200 (2020), Article 106014.

39. Song J.J., Tsang E.C.C., Chen D.G. and Yang X.B., Minimal decision cost reduct in fuzzy decision-theoretic rough set model, Knowledge-Based Systems 126 (2017), 104–112.

40. Wang Y.B., Chen X.J. and Dong, Attribute reduction via local conditional entropy, International Journal of Machine Learning and Cybernetics 10 (2019), 3619–3634.

41. Wei, Wu X.Y., Liang J.Y., Cui J.B. and Sun Y.J., Discernibility matrix based incremental attribute reduction for dynamic data, Knowledge-Based Systems 140 (2018), 142–157.

42. S.P., Ju H.R., Shang, Pedrycz, Yang X.B. and Li, Label distribution learning: a local collaborative mechanism, International Journal of Approximate Reasoning 121 (2020), 59–84.

43. S.P., Yang X.B., Yu H.L., Yu D.J., Yang J.Y. and Tsang E.C.C., Multi-label learning with label-specific feature reduction, Knowledge-Based Systems 104 (2016), 52–61.

44. Yang X.B., Liang S.C., Yu H.L., Gao and Qian Y.H., Pseudo-label neighborhood rough set: measures and attribute reductions, International Journal of Approximate Reasoning 105 (2019), 112–129.

45. Yang X.B. and Yao Y.Y., Ensemble selector for attribute reduction, Applied Soft Computing 70 (2018), 1–11.

46. Yang X.B., Zhang Y.Q. and Yang J.Y., Local and global measurements of MGRS rules, International Journal of Computational Intelligence Systems 5 (2012), 1010–1024.

47. Yao Y.Y., Three-way granular computing, rough sets, and formal concept analysis, International Journal of Approximate Reasoning 116 (2020), 106–125.

48. Yao Y.Y. and Zhang X.Y., Class-specific attribute reducts in rough set theory, Information Sciences 418-419 (2017), 601–618.

49. Yao Y.Y., Zhao and Wang, On reduct construction algorithms, Transactions on Computational Science II 5150 (2008), 100–117.

50. Zhang, Mei C.L., Chen D.G. and Li J.H., Feature selection in mixed data: a method using a novel fuzzy rough set-based information entropy, Pattern Recognition 56 (2016), 1–15.

51. Zhang M.L. and Wu, Lift: multi-label learning with label-specific features, IEEE Transactions on Pattern Analysis and Machine Intelligence 37 (2015), 107–120.

52. Zhou, Hu X.G., Li P.P. and Wu X.D., Online streaming feature selection using adapted neighborhood rough set, Information Sciences 481 (2019), 258–279.

53. Zhou, Hu X.G., Li P.P. and Wu X.D., Online feature selection for high-dimensional class-imbalanced data, Knowledge-Based Systems 136 (2017), 187–199.

54. Zou D.Q., Zhu Y.W., Xu S.H., Li, Jin and Ye H.K., Interpreting deep learning-based vulnerability detector predictions based on heuristic searching, ACM Transactions on Software Engineering and Methodology 30 (2021), 1–31.