Gaussian kernel fuzzy rough based attribute reduction: An acceleration approach

Abstract

Presently, the Gaussian kernel approach has been widely accepted for measuring the similarities among samples and then constructing various fuzzy rough sets. Notably, the kernel parameter plays a crucial role in deriving Gaussian kernel based similarities, because different parameters generate different scales of similarity. Consequently, different parameters may result in different fuzzy rough approximations and different reducts. To search for a parameterized reduct with better generalization performance, a naive approach is to repeat the computation of the reduct for each parameter, which is very time-consuming. To fill this gap, an acceleration approach is proposed which aims to reduce the elapsed time of searching reducts over different parameters. Its main mechanism is to take the variation of the parameters into account, so that the reduct under the current parameter is found on the basis of the reduct related to the previous parameter. The experimental results over 16 UCI data sets, obtained by testing different Gaussian kernel based fuzzy rough sets, demonstrate that, compared with the naive process, the proposed acceleration strategy significantly reduces the time consumption of finding reducts over different parameters, without leading to poorer classification performance or to significant variation in the length of the obtained reducts. This study provides technical support for quickly finding reducts of parameterized fuzzy rough sets.

Keywords

Acceleration strategy; attribute reduction; fuzzy dependency; fuzzy rough set; Gaussian kernel

1 Introduction

Attribute reduction [5–7, 45], a rough set based feature selection technology [4, 48], can be exploited in terms of various rough set models. Presently, with respect to different requirements, numerous generalizations of the classical rough set model have been proposed and explored [1, 62]. As one of the most popular extended rough set models, the fuzzy rough set has received wide attention in attribute reduction due to its effectiveness in analyzing and handling continuous and even mixed data [8, 51].

For fuzzy rough sets, it should be emphasized that different types of fuzzy relation may determine different fuzzy rough set models [41], so that different fuzzy rough approximations and the corresponding reducts may be generated. Consequently, how to acquire a suitable fuzzy relation has been an open problem in research on fuzzy rough sets [36, 59]. For this purpose, Hu et al. [13] proposed an effective strategy: introduce the Gaussian kernel function to measure the similarities between samples, and then derive the fuzzy relation over the universe. Notably, the chosen kernel parameter plays an essential role in deriving the Gaussian kernel based fuzzy relation [3, 25]. This is mainly because different kernel parameters produce different scales of Gaussian kernel based similarities, and hence different fuzzy rough approximations and the corresponding reducts may be induced.

A careful review of previous research reveals that most Gaussian kernel fuzzy rough based attribute reduction approaches are realized with one and only one parameter; they are referred to as single parameter based attribute reductions in this paper. In these approaches, given one fixed parameter, one and only one qualified reduct is returned. Nevertheless, single parameter based attribute reduction has some inherent limitations, which can be summarized as follows.

(1) Single parameter based attribute reduction may not adapt well to the granular world, which results in a lack of comprehensiveness [58]. For example, given one Gaussian kernel parameter, a fuzzy relation can be derived, and deriving such a fuzzy relation can be regarded as a process of information granulation [17, 61]. It follows that the considered kernel parameter actually corresponds to a level of granularity [28, 34]. A reduct derived at the granularity fixed by one parameter may no longer be a reduct at a slightly finer or coarser granularity caused by a slight variation of the parameter. It follows that multiple parameters based attribute reduction is required.

(2) Single parameter based attribute reduction may not be suitable for observing the variation tendency of the generalization performance of the attributes in the obtained reduct. For example, if a reduct is derived over one and only one parameter, then the generalization performances of the attributes in reducts obtained over multiple parameters cannot be compared. Therefore, multiple parameters based attribute reduction is also required.

To overcome the limitations mentioned above, it is necessary to develop a new way of thinking about Gaussian kernel fuzzy rough based attribute reduction: take multiple different parameters into account. To realize that, a naive approach can be directly designed by repeating the search for a single parameter based reduct once for each considered parameter. However, such a method is obviously time-consuming, especially when the number of considered parameters is large. In view of this, an acceleration approach for multiple different parameters based attribute reduction is proposed, which aims to decrease the elapsed time of searching reducts over multiple different parameters. The main mechanism of our acceleration strategy is to consider the variation of the used parameters, so that the search for the reduct related to the current parameter is guided by the reduct related to the previous parameter. Figure 1(a) shows the process of the naive approach to search multiple different parameters based reducts, while the framework of our proposed acceleration strategy is illustrated in Figure 1(b).

Fig.1

The processes of searching reducts based on the naive approach and the acceleration strategy.

Following Fig. 1, it is not difficult to observe that, unlike the naive approach, our process derives the reduct over the current parameter from the reduct related to the previous parameter instead of from the whole set of raw candidate attributes. For example, in Fig. 1(b), once the 1st parameter based reduct is obtained, the 2nd parameter based reduct is searched on the basis of it; that is, the search for the 2nd parameter related reduct does not begin with an empty set, but with the attributes in the 1st parameter based reduct. Moreover, the searching space of attributes is reduced in the framework of our acceleration strategy, whereas all attributes must be checked when computing reducts with the naive approach. It follows that our acceleration strategy may lower the elapsed time of searching multiple different parameters based reducts, and further improve the time efficiency of finding parameterized reducts.

The rest of this paper is organized as follows. Section 2 will review the basic notion related to various Gaussian kernel fuzzy rough set models and the corresponding attribute reduction. The acceleration strategy for multiple parameters based Gaussian kernel fuzzy rough attribute reduction will be presented in Section 3. Experiments will be shown in Section 4 as well as some related analyses. This paper will be concluded with conclusions and perspectives in Section 5.

2 Preliminary knowledge

2.1 Classical fuzzy rough set

Generally speaking, a decision system can be described as DS =< U, A, d >, in which U is the finite set of samples, also known as universe, A is the set of condition attributes, and d is the decision attribute. ∀x ∈ U, a (x) denotes the value of sample x over the attribute a ∈ A, while d (x) indicates the label of sample x. Following the decision attribute d, an equivalence relation over universe U can be obtained by d such that IND_d = {(x, y) ∈ U × U : d (x) = d (y)}. Immediately, a partition U/IND_d = {X₁, X₂, …, X_q} over universe U can be induced, in which X_p (X_p ∈ U/IND_d) is the p-th decision class consisting of samples with the same label.

In rough set theory [32, 33], the indiscernibility relation [35] can be regarded as the basic mechanism for granulating the universe, and the corresponding results of granulation can then be used to approximate the target concept, i.e., the decision class. However, if the considered data is continuous, then such a model may be powerless in handling and analysing the data, mainly because the indiscernibility relation is only suitable for dealing with categorical data. In view of this, various approaches have been proposed to expand the classical rough set [11, 60] with respect to different requirements. Presently, the fuzzy rough set has received wide attention and can be regarded as one of the effective rough set models for analyzing continuous data. An important reason is that the fuzzy rough set is constructed based on the similarity among samples, and it does not change the distribution of the raw data. As reported in Reference [13], the Gaussian kernel can be employed to measure the similarity among samples, and this approach has formed a bridge between rough sets and kernel methods.

Given a decision system DS =< U, A, d >, ∀B ⊆ A, the similarity between two arbitrary samples can be measured by the Gaussian kernel function such that $R_{B}^{σ} (x, y) = \exp \left(- \frac{∥ x - y ∥_{B}^{2}}{2 σ^{2}}\right)$ , in which ∥x - y∥_B is the Euclidean distance between samples x and y, i.e., $∥ x - y ∥_{B} = \sqrt{\sum_{a \in B} (a (x) - a (y))^{2}}$ , and σ is the Gaussian kernel parameter. Obviously, given B ⊆ A, ∀x, y ∈ U, different kernel parameters will generate different similarities between x and y. Moreover, the Gaussian kernel based fuzzy relation can be denoted as $R_{B}^{σ} = \{R_{B}^{σ} (x, y) : \forall x, y \in U\}$ . Immediately, the fuzzy rough set of X_p ∈ U/IND_d can be defined as follows.

Definition 1. [10] Given a decision system DS =< U, A, d >, ∀B ⊆ A and ∀x ∈ U, the fuzzy rough lower and upper approximations of X_p can be respectively denoted as the following membership functions $\underline{R_{B}^{σ}} X_{p} (x) = \inf_{y \in U - X_{p}} \{1 - R_{B}^{σ} (x, y)\},$ (1) $\overline{R_{B}^{σ}} X_{p} (x) = \sup_{y \in X_{p}} \{R_{B}^{σ} (x, y)\},$ (2) in which $R_{B}^{σ}$ is the fuzzy relation induced by Gaussian kernel and B.

Following Definition 1, it is not difficult to observe that the membership degree of x ∈ U belonging to fuzzy rough lower approximation of X_p is actually the fuzzy dissimilarity between x and its nearest sample from U - X_p. Similarly, the membership degree of x ∈ U belonging to fuzzy rough upper approximation of X_p is actually the fuzzy similarity between x and its nearest sample from X_p [10].
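As a minimal sketch of Definition 1, the following Python code evaluates the Gaussian kernel similarity and the two membership functions in Eqs. (1) and (2) by direct enumeration. The function names, the toy samples and the value σ = 0.5 are illustrative assumptions rather than part of the original formulation, and at least two decision classes are assumed so that U - X_p is nonempty.

```python
import math

def gaussian_similarity(x, y, attrs, sigma):
    """R_B^sigma(x, y): Gaussian kernel over the attribute indices in `attrs`."""
    sq_dist = sum((x[a] - y[a]) ** 2 for a in attrs)
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def lower_approx(x, samples, labels, target, attrs, sigma):
    """Eq. (1): smallest dissimilarity between x and samples outside class `target`."""
    return min(1.0 - gaussian_similarity(x, y, attrs, sigma)
               for y, lab in zip(samples, labels) if lab != target)

def upper_approx(x, samples, labels, target, attrs, sigma):
    """Eq. (2): largest similarity between x and samples inside class `target`."""
    return max(gaussian_similarity(x, y, attrs, sigma)
               for y, lab in zip(samples, labels) if lab == target)

# Hypothetical toy data: four samples, two attributes, two decision classes.
samples = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
labels = [1, 1, 2, 2]
attrs = [0, 1]

low = lower_approx(samples[0], samples, labels, 1, attrs, sigma=0.5)
up = upper_approx(samples[0], samples, labels, 1, attrs, sigma=0.5)
print(low, up)
```

Here the first sample belongs to class 1, so its upper membership is 1 (it is its own nearest neighbour in X_p), while its lower membership is determined by the nearest sample of class 2, namely (0.9, 1.0).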

2.2 Various fuzzy rough sets

Following Definition 1, the operators play an essential role in deriving fuzzy rough approximations: different operators result in different fuzzy rough approximations, and hence different fuzzy rough set models can be constructed. Therefore, Hu et al. [12] have not only analyzed the inherent limitations of the classical fuzzy rough set, but also investigated several operators, and further proposed the corresponding fuzzy rough set models. The detailed definitions with respect to these operators are shown as follows.

Definition 2. [12] Given a random variable X and the observed values of its t samples, denoted as x₁, x₂, …, x_t and assumed to satisfy x₁ < x₂ < … < x_t, the k-trimmed minimum of X is $x_{k + 1}$ ; the k-trimmed maximum of X is $x_{t - k}$ ; the k-mean minimum of X is $\frac{\sum_{i = 1}^{k} x_{i}}{k}$ ; the k-mean maximum of X is $\frac{\sum_{i = t - k + 1}^{t} x_{i}}{k}$ ; the k-median minimum of X is $median (x_{1}, x_{2}, …, x_{k})$ ; the k-median maximum of X is $median (x_{t - k + 1}, x_{t - k + 2}, …, x_{t})$ , in which median(·) is a function to calculate the median value. These operators can be respectively denoted as $\min_{k - trimmed} (X)$ , $\max_{k - trimmed} (X)$ , $\min_{k - mean} (X)$ , $\max_{k - mean} (X)$ , $\min_{k - median} (X)$ and $\max_{k - median} (X)$ .

Following the operators shown in Definition 2, Hu et al. [12] have proposed three fuzzy rough set models which are referred to as k-trimmed, k-mean and k-median fuzzy rough sets, respectively. The detailed definitions of fuzzy rough approximations with respect to these three models can be formulated as follows.

Definition 3. [12] Given a decision system DS =< U, A, d >, ∀B ⊆ A and ∀x ∈ U, the fuzzy rough lower and upper approximations of X_p with respect to different operators can be respectively denoted as the following membership functions ${\underline{R_{B}^{σ}}}_{k - trimmed} X_{p} (x) = \underset{y \in U - X_{p}}{\min_{k - trimmed}} \{1 - R_{B}^{σ} (x, y)\},$ (3) ${\overline{R_{B}^{σ}}}_{k - trimmed} X_{p} (x) = \underset{y \in X_{p}}{\max_{k - trimmed}} \{R_{B}^{σ} (x, y)\},$ (4) ${\underline{R_{B}^{σ}}}_{k - mean} X_{p} (x) = \underset{y \in U - X_{p}}{\min_{k - mean}} \{1 - R_{B}^{σ} (x, y)\},$ (5) ${\overline{R_{B}^{σ}}}_{k - mean} X_{p} (x) = \underset{y \in X_{p}}{\max_{k - mean}} \{R_{B}^{σ} (x, y)\},$ (6) ${\underline{R_{B}^{σ}}}_{k - median} X_{p} (x) = \underset{y \in U - X_{p}}{\min_{k - median}} \{1 - R_{B}^{σ} (x, y)\},$ (7) ${\overline{R_{B}^{σ}}}_{k - median} X_{p} (x) = \underset{y \in X_{p}}{\max_{k - median}} \{R_{B}^{σ} (x, y)\},$ (8) in which $R_{B}^{σ}$ is the fuzzy relation induced by Gaussian kernel and B.
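The six operators of Definition 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample values are hypothetical, and 1 ≤ k < t is assumed so that every index and slice is well defined. In Eqs. (3)-(8), the list passed to a "min" operator would hold the values 1 - R(x, y) for y ∈ U - X_p, and the list passed to a "max" operator would hold the values R(x, y) for y ∈ X_p.

```python
from statistics import mean, median

def k_trimmed_min(values, k):
    """x_{k+1} of the ascending sequence, i.e. the k smallest values are skipped."""
    return sorted(values)[k]

def k_trimmed_max(values, k):
    """x_{t-k} of the ascending sequence, i.e. the k largest values are skipped."""
    return sorted(values)[-(k + 1)]

def k_mean_min(values, k):
    """Mean of the k smallest values."""
    return mean(sorted(values)[:k])

def k_mean_max(values, k):
    """Mean of the k largest values."""
    return mean(sorted(values)[-k:])

def k_median_min(values, k):
    """Median of the k smallest values."""
    return median(sorted(values)[:k])

def k_median_max(values, k):
    """Median of the k largest values."""
    return median(sorted(values)[-k:])

# Hypothetical dissimilarities 1 - R(x, y) between x and five other samples.
dissimilarities = [0.5, 0.1, 0.9, 0.3, 0.7]
k = 2
print(k_trimmed_min(dissimilarities, k), k_mean_min(dissimilarities, k),
      k_median_min(dissimilarities, k))
```

Compared with the plain minimum, each of these operators depends on several of the nearest samples rather than on a single one.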

2.3 Fuzzy dependency

Definition 4.[10] Given a decision system DS =< U, A, d >, ∀B ⊆ A and based on the classical fuzzy rough set, the fuzzy positive region of d with respect to B is defined as ${POS}_{B}^{σ} (d) = ⋃_{p = 1}^{q} \underline{R_{B}^{σ}} X_{p} .$ (9)

The fuzzy positive region of decision attribute with respect to condition attribute set shown in Definition 4 is actually a fuzzy set, which is derived by the union of fuzzy rough lower approximations of decision classes. Additionally, the fuzzy positive regions with respect to different fuzzy rough sets can be obtained by replacing the counterparts in Equation (9) with fuzzy rough lower approximations based on corresponding operators.

Definition 5. [10] Given a decision system DS =< U, A, d >, ∀B ⊆ A and based on the classical fuzzy rough set, the fuzzy dependency of d with respect to B is defined as $γ_{B}^{σ} (d) = \frac{\sum_{x \in U} {POS}_{B}^{σ} (d) (x)}{| U |},$ (10) in which |X| is the cardinality of the set X, and ${POS}_{B}^{σ} (d) (x) = \sup \{\underline{R_{B}^{σ}} X_{p} (x) : \forall X_{p} \in U / {IND}_{d}\}$ .

The fuzzy dependency shown in Definition 5 reflects the average degree of the samples which certainly belong to decision classes. Obviously, $0 \leq γ_{B}^{σ} (d) \leq 1$ holds. Similarly, the fuzzy dependencies with respect to different fuzzy rough sets can be derived by replacing the counterparts in Equation (10) with fuzzy positive regions based on corresponding operators.
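Definitions 4 and 5 can be illustrated with the following sketch for the classical fuzzy rough set. The helper names and the toy data are assumptions introduced for illustration only, and at least two decision classes are assumed. The code follows Eq. (10) literally: for each sample it takes the supremum of the lower approximation memberships over all decision classes and then averages over U.

```python
import math

def similarity(x, y, attrs, sigma):
    """Gaussian kernel similarity over the attribute indices in `attrs`."""
    sq_dist = sum((x[a] - y[a]) ** 2 for a in attrs)
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def lower_approx(x, samples, labels, target, attrs, sigma):
    """Eq. (1) for the decision class with label `target`."""
    return min(1.0 - similarity(x, y, attrs, sigma)
               for y, lab in zip(samples, labels) if lab != target)

def fuzzy_dependency(samples, labels, attrs, sigma):
    """Eq. (10): average membership of the samples in the fuzzy positive region."""
    classes = sorted(set(labels))
    positive = [max(lower_approx(x, samples, labels, c, attrs, sigma) for c in classes)
                for x in samples]
    return sum(positive) / len(samples)

# Hypothetical toy data: four samples, two attributes, two decision classes.
samples = [(0.0, 0.3), (0.1, 0.9), (1.0, 0.4), (0.9, 0.8)]
labels = [1, 1, 2, 2]

gamma_none = fuzzy_dependency(samples, labels, [], sigma=0.5)
gamma_first = fuzzy_dependency(samples, labels, [0], sigma=0.5)
gamma_both = fuzzy_dependency(samples, labels, [0, 1], sigma=0.5)
print(gamma_none, gamma_first, gamma_both)
```

With an empty attribute set every similarity equals 1, so the dependency is 0; adding attributes can only enlarge the distances, so the dependency does not decrease.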

2.4 Attribute reduction in fuzzy rough set

Definition 6. Given a decision system DS =< U, A, d >, ∀B ⊆ A, B is regarded as a γ-reduct if and only if:

(1) $γ_{B}^{σ} (d) = γ_{A}^{σ} (d)$ ;

(2) ∀B′ ⊂ B, $γ_{B'}^{σ} (d) \neq γ_{B}^{σ} (d)$ .

Following Definition 6, it is not difficult to observe that the first condition aims to find all the relevant attributes, while the second condition guarantees that no redundant attributes are contained in the obtained reduct. Obviously, the fuzzy dependency is important in the attribute reduction shown in Definition 6, and the fuzzy dependency can be employed to design significance function, which is used for evaluating the significance of the candidate attribute. The definition of significance function can be formulated as follows.

Definition 7. Given a decision system DS =< U, A, d >, ∀B ⊂ A, and ∀a ∈ A - B, the significance of attribute a with respect to fuzzy dependency is ${Sig}^{σ} (a, B, d) = γ_{B \cup \{a\}}^{σ} (d) - γ_{B}^{σ} (d) .$ (11)

Sig^σ (a, B, d) reflects the variation of fuzzy dependency when attribute a is added into B; therefore, the higher the value of Sig^σ (a, B, d) is, the more important attribute a will be. Such a significance function can be employed in the forward greedy searching algorithm to compute a γ-reduct. The main steps of forward greedy searching for computing a γ-reduct can be described as follows.

(1) Potential reduct B is initialized to an empty set.

(2) Calculate the fuzzy dependency $γ_{A}^{σ} (d)$ in terms of all condition attributes A and the given kernel parameter σ.

(3) For each candidate attribute a ∈ A - B, evaluate its significance in terms of the value of Sig^σ (a, B, d).

(4) Select the most important candidate attribute, and add it into the potential reduct B.

(5) If the potential reduct B satisfies the first condition shown in Definition 6, then go to Step (6); otherwise, repeat Steps (3), (4) and (5).

(6) For each attribute c ∈ B, consider the attribute subset B - {c}; if this subset does not satisfy the second condition shown in Definition 6, then remove the attribute c from B.

(7) If no attribute can be removed from potential reduct B, then γ-reduct B is generated; otherwise, repeat Steps (6) and (7).

Based on the above steps, and to simplify our discussion, the detailed algorithm to compute γ-reduct based on classical fuzzy rough set can be described as follows, and the algorithms to compute γ-reduct with respect to different fuzzy rough set models can be analogized.

Algorithm 1. Forward greedy searching algorithm to compute γ-reduct

Inputs: DS =< U, A, d >, kernel parameter σ, threshold ɛ ∈ [0, 1);

Outputs: A γ-reduct B.

1. B← ∅;

2. Compute $γ_{A}^{σ} (d)$ ;

3. Do

1) ∀a ∈ A - B, compute Sig^σ (a, B, d);

2) Select b such that Sig^σ (b, B, d) = max {Sig^σ (a, B, d) : ∀ a ∈ A - B};

3) B ← B ∪ {b};

4) Compute $γ_{B}^{σ} (d)$ ;

While $γ_{A}^{σ} (d) - γ_{B}^{σ} (d) > ɛ \cdot γ_{A}^{σ} (d)$

4. Do

C ← B;

For each c ∈ B

Compute $γ_{B - \{c\}}^{σ} (d)$ ;

If $γ_{A}^{σ} (d) - γ_{B - \{c\}}^{σ} (d) \leq ɛ \cdot γ_{A}^{σ} (d)$

B ← B - {c};

End

While C ≠ B

5. Return B.

Following Algorithm 1, it should be noticed that the fuzzy dependency derived by an arbitrary attribute subset will not exceed that induced by the whole attribute set. This is mainly because the Gaussian kernel based similarity among samples does not increase as more attributes are used, so the membership degrees of samples belonging to the fuzzy rough lower approximations do not decrease, and neither does the value of fuzzy dependency. That is, the value of fuzzy dependency is monotonically non-decreasing with respect to the set of considered attributes. Consequently, the threshold ɛ in Algorithm 1 is employed to avoid the case that no attribute can be removed from the raw attribute set.

Moreover, it is not difficult to observe that all candidate attributes should be scanned in the process of searching reduct through using Algorithm 1, and the main time consumption of such algorithm lies in the evaluation of significance of candidate attribute. It follows that the time complexity of Algorithm 1 is O (|U|² · |A|²).
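Algorithm 1 can be sketched in Python as follows, for the classical fuzzy rough set only. The function names and the toy decision system are hypothetical, at least two decision classes are assumed, and ties between equally significant attributes are broken by attribute order, which Algorithm 1 does not specify. The dependency routine uses the fact that R(x, x) = 1, so the supremum in the positive region is always attained by the decision class of x itself.

```python
import math

def dependency(samples, labels, attrs, sigma):
    """Classical fuzzy dependency (Eq. 10); equals 0 for an empty attribute set."""
    def sim(x, y):
        return math.exp(-sum((x[a] - y[a]) ** 2 for a in attrs) / (2.0 * sigma ** 2))
    positive = [min(1.0 - sim(x, y) for y, ly in zip(samples, labels) if ly != lx)
                for x, lx in zip(samples, labels)]
    return sum(positive) / len(samples)

def forward_greedy_reduct(samples, labels, all_attrs, sigma, eps):
    gamma_all = dependency(samples, labels, all_attrs, sigma)

    def satisfied(attrs):
        # Relaxed first condition of Definition 6, as used in Algorithm 1.
        return gamma_all - dependency(samples, labels, attrs, sigma) <= eps * gamma_all

    reduct = []
    while True:  # Step 3: forward addition (do-while)
        candidates = [a for a in all_attrs if a not in reduct]
        # Maximizing the dependency of B ∪ {a} is equivalent to maximizing Sig(a, B, d).
        best = max(candidates,
                   key=lambda a: dependency(samples, labels, reduct + [a], sigma))
        reduct.append(best)
        if satisfied(reduct) or len(reduct) == len(all_attrs):
            break
    while True:  # Step 4: backward elimination of redundant attributes
        before = list(reduct)
        for c in before:
            trial = [a for a in reduct if a != c]
            if satisfied(trial):
                reduct = trial
        if reduct == before:
            break
    return reduct

# Hypothetical toy data: attribute 0 separates the classes, attribute 1 is constant.
samples = [(0.0, 0.5), (0.1, 0.5), (1.0, 0.5), (0.9, 0.5)]
labels = [1, 1, 2, 2]
toy_reduct = forward_greedy_reduct(samples, labels, [0, 1], sigma=0.2, eps=0.05)
print(toy_reduct)  # [0]
```

In this toy system the constant attribute has zero significance, so only attribute 0 is selected, and it cannot be removed afterwards because the empty set has dependency 0.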

3 Acceleration strategy for attribute reduction

3.1 Problem description

Following the discussions above, it is not difficult to observe that one and only one parameter has been considered in most previous research related to Gaussian kernel based fuzzy rough attribute reduction. However, as pointed out in the Introduction, attribute reduction based on one and only one parameter has some inherent limitations, which can be attributed to the following two points: 1) single parameter based attribute reduction may not adapt well to the granular world; 2) single parameter based attribute reduction may not be suitable for observing the variation tendency of the generalization performance of the derived reduct.

Consequently, to overcome the inherent limitations mentioned above, it is necessary to search reducts over a set of different parameters.

3.2 Naive approach for attribute reduction

Regarding the problem stated above, given a set of considered parameters, a naive approach can be directly designed: compute reducts repeatedly through using Algorithm 1 over a set of different parameters. Immediately, the main steps of naive approach for searching reducts over a set of considered parameters can be described as follows.

(1) The set of γ-reducts $𝔹$ over a set of kernel parameters T is initialized to an empty set.

(2) For each kernel parameter σ ∈ T, derive the reduct B with respect to σ through using Algorithm 1, and then add B into $𝔹$ .

(3) The set of γ-reducts $𝔹$ is obtained.

Based on the above steps, and taking the classical fuzzy rough set as an example to simplify our discussion, the detailed naive algorithm to compute γ-reducts over a set of parameters is shown as follows.

Algorithm 2. Naive algorithm to compute γ-reducts over a set of parameters

Inputs: DS =< U, A, d >, a set of kernel parameters T, threshold ɛ ∈ [0, 1);

Outputs: A set of γ-reducts $𝔹$ .

1. $𝔹 \leftarrow \emptyset$ ;

2. For each σ ∈ T

1) Compute reduct B based on σ through using Algorithm 1;

2) Add B into $𝔹$ ;

End

3. Return $𝔹$ .

Following Algorithm 2, it should be noticed that the output is a set of reducts instead of one and only one reduct. Moreover, Algorithm 2 is mainly performed by repeating the process of Algorithm 1. It follows that the time complexity of Algorithm 2 is O (n · |U|² · |A|²), where n is the number of kernel parameters in T.

3.3 Forward acceleration strategy for attribute reduction

Following Algorithm 2, it is easy to see that the naive approach for searching reducts based on multiple parameters is time-consuming, especially when the number of considered parameters is large. Consequently, how to accelerate the process of computing reducts over multiple parameters has become an open topic. For this purpose, an acceleration approach is designed as follows. Given a decision system DS, if two kernel parameters are considered such that σ₁ ≤ σ₂, then the main mechanism of our forward acceleration strategy is that the σ₂ based reduct B₂ is found on the basis of the σ₁ based reduct B₁. In detail, to search the reduct with respect to σ₂, we begin our search with the σ₁ based reduct instead of an empty set. Correspondingly, the main steps in our forward acceleration strategy can be summarized as follows.

(1) Compute the σ₁ based reduct B₁ through using Algorithm 1.

(2) If B₁ satisfies the constraint related to σ₂, then we have B₂ = B₁; otherwise, we select suitable attributes from A - B₁ and add them into B₁ until the constraint with respect to σ₂ is satisfied, which generates B₂.

By applying the above mechanism to the search for multiple parameters based reducts, and based on the classical fuzzy rough set, a forward acceleration algorithm to compute γ-reducts over a set of parameters can be designed as follows.

Algorithm 3. Forward acceleration algorithm to compute γ-reducts over a set of parameters

Inputs: DS =< U, A, d >, a set of kernel parameters T, threshold ɛ ∈ [0, 1);

Outputs: A set of γ-reducts $𝔹$ .

1. $𝔹 \leftarrow \emptyset$ ;

2. Sort kernel parameters in T such that T = {σ₁, σ₂, …, σ_n}, in which σ₁ ≤ σ₂ ≤ … ≤ σ_n;

3. Compute reduct B₁ based on σ₁ through using Algorithm 1, and then add B₁ into $𝔹$ ;

4. For s = 2 to n

1) B_s ← B_{s-1};

2) If B_s is the σ_s based reduct

Add B_s into $𝔹$ ;

s ← s + 1, and go to 1);

Else

Compute $γ_{A}^{σ_{s}} (d)$ , and go to 3);

End

3) Do

(1) ∀a ∈ A - B_s, compute Sig^{σ_s} (a, B_s, d);

(2) Select b such that Sig^{σ_s} (b, B_s, d) = max {Sig^{σ_s} (a, B_s, d) : ∀ a ∈ A - B_s};

(3) B_s ← B_s ∪ {b};

(4) Compute $γ_{B_{s}}^{σ_{s}} (d)$ ;

While $γ_{A}^{σ_{s}} (d) - γ_{B_{s}}^{σ_{s}} (d) > ɛ \cdot γ_{A}^{σ_{s}} (d)$

4) Do

C ← B_s;

For each c ∈ B_s

Compute $γ_{B_{s} - \{c\}}^{σ_{s}} (d)$ ;

If $γ_{A}^{σ_{s}} (d) - γ_{B_{s} - \{c\}}^{σ_{s}} (d) \leq ɛ \cdot γ_{A}^{σ_{s}} (d)$

B_s ← B_s - {c};

End

While C ≠ B_s

5) Add B_s into $𝔹$ ;

End

5. Return $𝔹$

Following Algorithm 3, it should be noticed that the first parameter based reduct is still derived by using Algorithm 1, so the time complexity of computing reduct B₁ is O (|U|² · |A|²). However, the time complexity of computing reduct B₂ is O (|U|² · |A - B₁|²). This is mainly because, in the worst case, B₁ does not satisfy the constraint with respect to parameter σ₂, and then the searching space of attributes is A - B₁ instead of A. Similarly, the time complexity of computing reduct B₃ is O (|U|² · |A - B₂|²); …; the time complexity of computing reduct B_n is $O (| U |^{2} \cdot | A - B_{n - 1} |^{2})$ . Consequently, the time complexity of Algorithm 3 is $O (| U |^{2} \cdot | A |^{2} + | U |^{2} \cdot \sum_{s = 1}^{n - 1} | A - B_{s} |^{2})$ . Since each reduct B_s is nonempty, |A - B_s| < |A| holds for every s, and it follows that $| U |^{2} \cdot | A |^{2} + | U |^{2} \cdot \sum_{s = 1}^{n - 1} | A - B_{s} |^{2} < n \cdot | U |^{2} \cdot | A |^{2}$ whenever n ≥ 2. From this point of view, compared with the naive approach, it is expected that the proposed forward acceleration strategy may contribute to lower elapsed time of searching reducts over a set of parameters.
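The contrast between Algorithm 2 and Algorithm 3 can be sketched as follows for the classical fuzzy rough set. All names and the toy data are hypothetical, at least two decision classes are assumed, and the test "B_s is the σ_s based reduct" in Step 2) is interpreted here, in line with the step description above, as "B_{s-1} satisfies the ɛ-constraint under σ_s". The forward loop is written as a plain while loop, which coincides with the do-while of Algorithm 1 whenever the dependency of A is positive.

```python
import math

def dependency(samples, labels, attrs, sigma):
    """Classical fuzzy dependency (Eq. 10); equals 0 for an empty attribute set."""
    def sim(x, y):
        return math.exp(-sum((x[a] - y[a]) ** 2 for a in attrs) / (2.0 * sigma ** 2))
    positive = [min(1.0 - sim(x, y) for y, ly in zip(samples, labels) if ly != lx)
                for x, lx in zip(samples, labels)]
    return sum(positive) / len(samples)

def meets_constraint(samples, labels, attrs, all_attrs, sigma, eps):
    """Relaxed first condition of Definition 6 (gamma_A is recomputed for brevity)."""
    gamma_all = dependency(samples, labels, all_attrs, sigma)
    return gamma_all - dependency(samples, labels, attrs, sigma) <= eps * gamma_all

def search(samples, labels, all_attrs, sigma, eps, start):
    """Forward addition beginning from `start`, then backward elimination."""
    def ok(attrs):
        return meets_constraint(samples, labels, attrs, all_attrs, sigma, eps)
    reduct = list(start)
    while not ok(reduct) and len(reduct) < len(all_attrs):
        candidates = [a for a in all_attrs if a not in reduct]  # only A - B is scanned
        reduct.append(max(candidates,
                          key=lambda a: dependency(samples, labels, reduct + [a], sigma)))
    changed = True
    while changed:
        changed = False
        for c in list(reduct):
            trial = [a for a in reduct if a != c]
            if ok(trial):
                reduct, changed = trial, True
    return reduct

def naive_reducts(samples, labels, all_attrs, sigmas, eps):
    """Algorithm 2: every search starts from the empty set."""
    return [search(samples, labels, all_attrs, s, eps, start=[]) for s in sigmas]

def accelerated_reducts(samples, labels, all_attrs, sigmas, eps):
    """Algorithm 3: parameters are sorted and each search starts from B_{s-1}."""
    reducts, previous = [], []
    for sigma in sorted(sigmas):
        if previous and meets_constraint(samples, labels, previous, all_attrs, sigma, eps):
            current = list(previous)  # B_{s-1} is reused unchanged
        else:
            current = search(samples, labels, all_attrs, sigma, eps, start=previous)
        reducts.append(current)
        previous = current
    return reducts

# Hypothetical toy decision system with three attributes and two classes.
samples = [(0.10, 0.20, 0.50), (0.15, 0.80, 0.52), (0.50, 0.25, 0.48),
           (0.55, 0.75, 0.51), (0.90, 0.22, 0.49), (0.95, 0.78, 0.50)]
labels = [1, 2, 1, 2, 1, 2]
sigmas = [0.2, 0.05, 0.1]
slow = naive_reducts(samples, labels, [0, 1, 2], sigmas, eps=0.05)
fast = accelerated_reducts(samples, labels, [0, 1, 2], sigmas, eps=0.05)
print(slow, fast)
```

Note that `accelerated_reducts` returns the reducts in ascending order of σ, whereas `naive_reducts` keeps the input order of the parameters.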

Additionally, to facilitate the understanding of our proposed acceleration strategy, an example is presented as follows.

Example 1. Let us consider the following example of data which includes 12 samples and 9 condition attributes. All samples are classified into three decision classes by d (see Table 1).

Table 1
A toy example of decision system

	a ₁	a ₂	a ₃	a ₄	a ₅	a ₆	a ₇	a ₈	a ₉	d
x ₁	0.1564	0.5061	0.1843	0.2004	0.0388	0.1743	0.1220	0.2331	0.1558	1
x ₂	0.0963	0.5444	0.5817	0.1008	0.0185	0.1523	0.1435	0.1146	0.1082	1
x ₃	0.1187	0.5989	0.6144	0.1463	0.0346	0.2083	0.1660	0.1646	0.1563	1
x ₄	0.1419	0.6297	0.3752	0.2244	0.0462	0.1874	0.1329	0.2581	0.1502	1
x ₅	0.0926	0.3169	0.2941	0.0468	0.0057	0.0859	0.0834	0.0629	0.0930	2
x ₆	0.0934	0.1493	0.2824	0.0670	0.0055	0.0612	0.0458	0.0971	0.0891	2
x ₇	0.0193	0.4637	0.3464	0.0185	0.0020	0.0558	0.0417	0.0389	0.0216	2
x ₈	0.0735	0.2815	0.1908	0.0431	0.0050	0.0800	0.0492	0.0679	0.0761	2
x ₉	0.0545	0.1004	0.1503	0.0491	0.0009	0.0096	0.0026	0.0806	0.0387	3
x ₁₀	0.0258	0.3320	0.1961	0.0171	0.0007	0.0217	0.0265	0.0420	0.0180	3
x ₁₁	0.1638	0.1483	0.1281	0.0777	0.0064	0.0628	0.0501	0.1073	0.1546	3
x ₁₂	0.0831	0.3781	0.2824	0.0542	0.0091	0.1246	0.0823	0.0730	0.0917	3

Suppose that the threshold ɛ is 0.05 and classical fuzzy rough set is used, we want to find γ-reducts over a set of kernel parameters such that T = {σ₁, σ₂, σ₃} = {0.01, 0.03, 0.05} by Algorithm 3. The main process of computing reducts over T is elaborated as follows.

(1) For kernel parameter σ₁, the corresponding reduct is initially derived by using Algorithm 1 such that B₁ = {a₁, a₂}.

(2) For kernel parameter σ₂, B₂ is initialized to B₁. Since $γ_{A}^{σ_{2}} (d) - γ_{B_{1}}^{σ_{2}} (d) = 0.9891 - 0.9341 = 0.0550 > ɛ \cdot γ_{A}^{σ_{2}} (d) = 0.05 \times 0.9891 = 0.0495$ , B₁ does not satisfy the σ₂ based constraint. Therefore, B₁ cannot be the σ₂ based reduct, and the remaining candidate attributes, i.e., a₃, a₄, a₅, a₆, a₇, a₈, a₉, are further evaluated. That is, the current searching space is A - B₁ instead of A. Based on the evaluations of these remaining candidate attributes, a₆ is selected and added into B₁, giving B₂ = B₁ ∪ {a₆} = {a₁, a₂, a₆}. Since B₂ satisfies the constraint with respect to σ₂, and no attribute can be removed from B₂, it follows that B₂ is the σ₂ based reduct.

(3) For kernel parameter σ₃, similarly, B₃ is initialized to B₂. Since $γ_{A}^{σ_{3}} (d) - γ_{B_{2}}^{σ_{3}} (d) = 0.9257 - 0.8130 = 0.1127 > ɛ \cdot γ_{A}^{σ_{3}} (d) = 0.05 \times 0.9257 = 0.0463$ , B₂ does not satisfy the σ₃ based constraint. Therefore, B₂ cannot be the σ₃ based reduct, and the remaining candidate attributes, i.e., a₃, a₄, a₅, a₇, a₈, a₉, are further evaluated. That is, the current searching space is A - B₂ instead of A. Based on the evaluations of these remaining candidate attributes, a₃ is selected and added into B₂, giving B₃ = B₂ ∪ {a₃} = {a₁, a₂, a₃, a₆}. Since B₃ satisfies the constraint with respect to σ₃, and no attribute can be removed from B₃, it follows that B₃ is the σ₃ based reduct.

Following Example 1, it is not difficult to observe that different from the naive approach, the searching space of attributes has been reduced in the framework of our forward acceleration strategy. For instance, if σ₂ based reduct is required, then all candidate attributes in A should be searched by using Algorithm 2, while only the candidate attributes in A - B₁ are searched in terms of Algorithm 3.
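The two threshold checks in Example 1 can be verified with a few lines of Python. The function name is a hypothetical helper; the dependency values are those reported in the example, not recomputed from Table 1.

```python
def violates_constraint(gamma_all, gamma_subset, eps):
    """True when the subset loses more than eps * gamma_all of the full dependency."""
    return gamma_all - gamma_subset > eps * gamma_all

eps = 0.05
# B1 evaluated under sigma_2: 0.9891 - 0.9341 = 0.0550 against 0.05 * 0.9891.
b1_fails_at_sigma2 = violates_constraint(0.9891, 0.9341, eps)
# B2 evaluated under sigma_3: 0.9257 - 0.8130 = 0.1127 against 0.05 * 0.9257.
b2_fails_at_sigma3 = violates_constraint(0.9257, 0.8130, eps)
print(b1_fails_at_sigma2, b2_fails_at_sigma3)
```

Both checks evaluate to True, which is why further attributes have to be added under σ₂ and σ₃ in the example.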

4 Experimental analysis

To demonstrate the effectiveness of our acceleration strategy, 16 UCI data sets have been employed to conduct the experiments, and the detailed description of these data sets is shown in Table 2.

Table 2
The detailed description of data sets

ID	Data sets	#Samples	#Attributes	#Decision classes
1	Amphetamines Consumption	1885	12	7
2	Breast Cancer Wisconsin (Diagnostic)	569	30	2
3	Breast Tissue	106	9	6
4	Diabetes	372	8	2
5	Forest Type Mapping	523	27	4
6	Libras Movement	360	90	15
7	Parkinsons	195	23	7
8	Parkinson Multiple Sound Recording Data	1040	26	2
9	Seeds	210	7	3
10	Sonar	208	60	2
11	Statlog (Heart)	270	13	2
12	Statlog (Vehicle Silhouettes)	846	18	4
13	Synthetic Control Chart Time Series	600	60	6
14	Waveform	5000	21	3
15	Wine	178	13	3
16	Wireless Indoor Localization	2000	7	4

In our experiments, a set of 30 different kernel parameters has been used such that T = {0.01, 0.02, …, 0.30}. As in Reference [12], the value of k has been set to 3 in the k-trimmed, k-mean and k-median fuzzy rough set models. The value of threshold ɛ in the algorithms to compute reducts has been set to 0.05. Moreover, 5-fold cross-validation has been employed. That is, each data set U is divided into 5 disjoint parts such that U = U₁ ∪ U₂ ∪ … ∪ U₅. In the first round of calculation, U₂ ∪ U₃ ∪ U₄ ∪ U₅ is regarded as the training set for computing reducts in terms of different parameters, and U₁ is regarded as the testing set for evaluating the classification performance of attributes in the obtained reducts; …; in the last round of calculation, U₁ ∪ U₂ ∪ U₃ ∪ U₄ is regarded as the training set for computing reducts in terms of different parameters, and U₅ is regarded as the testing set for evaluating the classification performance of attributes in the obtained reducts. Accordingly, the mean values of the corresponding experimental results from 5-fold cross-validation will be mainly compared.
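The 5-fold protocol described above can be sketched as follows. The function name, the random shuffling and the fixed seed are assumptions made for illustration; the text does not state how the five parts are formed (for example, whether the split is shuffled or stratified).

```python
import random

def k_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs for n_folds disjoint parts of U."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: parts are formed at random
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test

# Hypothetical usage on a data set with 12 samples.
splits = list(k_fold_splits(12))
for train, test in splits:
    # Reducts would be computed on `train` for every kernel parameter in T,
    # and the selected attributes would then be evaluated on `test`.
    print(len(train), len(test))
```

Each sample is used for testing exactly once, and the reported results would be the averages over the five rounds.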

Notably, in the following experiments, to simplify our discussion, NA-FRS, NA-FRS k-trimmed, NA-FRS k-mean and NA-FRS k-median denote the Naive Approach in terms of classical, k-trimmed, k-mean and k-median Fuzzy Rough Sets, respectively; FAS-FRS, FAS-FRS k-trimmed, FAS-FRS k-mean and FAS-FRS k-median denote the Forward Acceleration Strategy in terms of classical, k-trimmed, k-mean and k-median Fuzzy Rough Sets, respectively.

4.1 Comparisons of time consumption

In the following, the elapsed time of searching for reducts over a set of kernel parameters by the naive approach and by our acceleration strategy is compared. The detailed results are shown in Fig. 2.

With a careful investigation of Fig. 2, it is not difficult to observe the following.

Compared with the naive approach, our proposed acceleration approach significantly reduces the elapsed time of finding reducts over multiple different parameters, especially for the k-trimmed, k-mean and k-median fuzzy rough set models. Moreover, in most cases, the time consumption of finding a reduct tends to rise as the value of σ increases. This is mainly because a larger σ makes the constraint with respect to σ stricter, so more attributes may be required to satisfy it, and the time consumed in searching for the reduct increases accordingly.

Compared with the classical fuzzy rough set, searching for reducts based on the other three fuzzy rough set models generally takes more time. Take the data set “Forest Type Mapping” as an example: if σ = 0.25, then finding the corresponding reduct by NA-FRS costs 2.1067 seconds, while finding the corresponding reducts by NA-FRS k-trimmed, NA-FRS k-mean and NA-FRS k-median costs 10.8481 seconds, 11.2685 seconds and 11.5677 seconds, respectively. The main reason is that the operators used to derive fuzzy rough lower approximations in the latter three models are based on the k nearest samples instead of only the single nearest sample used in the classical fuzzy rough set model, so more elapsed time is required.

The time consumption of computing reduct over the first parameter based on naive approach is similar to that based on our forward acceleration strategy. Take the data set “Waveform” as an example, if σ = 0.01, then searching such reduct by NA-FRS costs 38.1567 seconds, while computing corresponding reduct in terms of FAS-FRS costs 39.9724 seconds. This is mainly because both the first parameter based reducts from the naive approach and our forward acceleration approach are generated by Algorithm 1.

Fig.2

Comparisons among time consumption (seconds) of finding reducts.

To further compare the time consumption of computing reducts between the two approaches from a statistical viewpoint, the Wilcoxon rank sum test [46] is employed. The purpose is to test the null hypothesis that the two approaches perform equally well in the elapsed time of searching for reducts over multiple parameters. The significance level is set to 0.05: if a returned p-value is less than 0.05, then the null hypothesis is rejected. The detailed p-values are shown in Table 3.
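For readers who wish to reproduce this kind of comparison, the following is a minimal sketch of a two-sided rank sum test based on the normal approximation. It is an illustration only: the function name `rank_sum_test` and the two timing lists are hypothetical, no continuity or tie correction of the variance is applied, and the statistical software actually used in our experiments may differ in such details.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test via the normal approximation.

    Returns (z, p). Tied values receive average ranks; the variance is
    not corrected for ties, which is a simplification of this sketch.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    mean = n1 * (n + 1) / 2
    sd = math.sqrt(n1 * n2 * (n + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical elapsed times (seconds) over 30 parameters, for illustration only.
naive_times = [10.0 + 0.1 * i for i in range(30)]
fast_times = [2.0 + 0.05 * i for i in range(30)]
z, p = rank_sum_test(fast_times, naive_times)
reject_null = p < 0.05
```

With SciPy available, `scipy.stats.ranksums` offers a comparable test and could be used instead of the hand-written function.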

Following Table 3, it is not difficult to observe that all the returned p-values are less than 0.05. Therefore, we can reject the null-hypothesis that such two approaches perform equally well in terms of time consumption of computing reducts over multiple different parameters. In other words, such result suggests that the elapsed time of computing reducts by using such two approaches is significantly different.

Table 3

p-values of Wilcoxon rank sum test w.r.t. time consumption of computing reducts

ID	NA-FRS & FAS-FRS	NA-FRS k-trimmed & FAS-FRS k-trimmed	NA-FRS k-mean & FAS-FRS k-mean	NA-FRS k-median & FAS-FRS k-median
1	3.34E-11	3.02E-11	3.02E-11	3.02E-11
2	3.02E-11	3.34E-11	3.02E-11	3.02E-11
3	3.02E-11	3.02E-11	3.34E-11	3.34E-11
4	3.02E-11	3.02E-11	3.02E-11	3.02E-11
5	3.02E-11	3.02E-11	3.02E-11	3.34E-11
6	3.34E-11	3.02E-11	3.34E-11	3.34E-11
7	3.34E-11	3.34E-11	3.34E-11	3.02E-11
8	3.34E-11	3.34E-11	3.34E-11	3.02E-11
9	3.34E-11	3.69E-11	3.02E-11	3.02E-11
10	3.02E-11	3.02E-11	3.02E-11	3.34E-11
11	3.02E-11	3.02E-11	3.02E-11	3.02E-11
12	3.02E-11	3.02E-11	3.69E-11	3.02E-11
13	3.02E-11	3.69E-11	3.02E-11	3.02E-11
14	3.34E-11	3.02E-11	3.02E-11	3.02E-11
15	3.02E-11	3.02E-11	3.02E-11	3.02E-11
16	3.34E-11	3.02E-11	3.34E-11	3.34E-11

Additionally, following the results shown in Fig. 2 and Table 3, it is not difficult to reveal that compared with the naive approach, our proposed acceleration strategy can significantly reduce the elapsed time of finding reducts over multiple different parameters.

4.2 Comparisons of classification performance

In the following, the classification performances of the attributes in the obtained reducts are compared. The KNN (the parameter in KNN is 3) and SVM (LIBSVM [2]) classifiers are used for testing the classification performance. Note that the maximal and mean values of classification accuracies over the set of parameters are mainly compared in this experiment. The detailed results are presented in Tables 4–7, where the higher values of classification accuracies are highlighted in bold, and the value in brackets is the kernel parameter corresponding to the maximal classification accuracy.
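The two summary statistics reported in Tables 4–7 can be obtained as in the following minimal sketch. The function name `summarize_accuracies` and the accuracy values are hypothetical, and the tie-breaking rule (the first parameter reaching the maximum is reported) is an assumption of this sketch rather than a statement about our implementation.

```python
def summarize_accuracies(acc_by_sigma):
    """Return (maximal accuracy, its kernel parameter, mean accuracy).

    `acc_by_sigma` maps each kernel parameter to the cross-validated
    accuracy of the reduct found under that parameter.
    """
    # On ties, max() keeps the first key in insertion order (an assumption here).
    best_sigma = max(acc_by_sigma, key=acc_by_sigma.get)
    mean_acc = sum(acc_by_sigma.values()) / len(acc_by_sigma)
    return acc_by_sigma[best_sigma], best_sigma, mean_acc


# Hypothetical accuracies for three parameters, for illustration only.
example = {0.01: 0.60, 0.02: 0.65, 0.03: 0.62}
best_acc, best_sigma, mean_acc = summarize_accuracies(example)
```

The pair `(best_acc, best_sigma)` corresponds to an entry of the form "accuracy(parameter)" in Tables 4 and 5, while `mean_acc` corresponds to an entry of Tables 6 and 7.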

With a deep investigation of Tables 4–7, it is not difficult to observe the following.

No matter which fuzzy rough set model is employed, the maximal values of classification accuracies obtained by using our proposed acceleration approach are similar to those induced by using the naive approach. Take the data set “Breast Tissue” (ID: 3) as an example, if KNN classifier is employed, then the maximal value of classification accuracies derived by NA-FRS is 0.6801, and the maximal value of classification accuracies derived by FAS-FRS is 0.6706; if SVM classifier is used, then the maximal value of classification accuracies derived by NA-FRS is 0.5952, and the maximal value of classification accuracies derived by FAS-FRS is 0.5952.

The mean values of classification accuracies induced by using our proposed acceleration approach are similar to those derived by using the naive approach. Take the data set “Waveform” (ID: 14) as an example, if KNN classifier is used, then the mean values of classification accuracies derived by NA-FRS, NA-FRS k-trimmed, NA-FRS k-mean and NA-FRS k-median are 0.7697, 0.7565, 0.7630 and 0.7595, respectively; the mean values of classification accuracies derived by FAS-FRS, FAS-FRS k-trimmed, FAS-FRS k-mean and FAS-FRS k-median are 0.7664, 0.7578, 0.7598 and 0.7608, respectively. Such result implies that compared with the naive approach, our proposed acceleration strategy may not result in the poorer classification performance of attributes in the obtained reducts.

Table 4
Maximal values of classification accuracies w.r.t. different reducts (KNN classifier)

ID	NA-FRS	FAS-FRS	NA-FRS k-trimmed	FAS-FRS k-trimmed	NA-FRS k-mean	FAS-FRS k-mean	NA-FRS k-median	FAS-FRS k-median
1	0.4440(0.15)	0.4509(0.15)	0.4509(0.12)	0.4546(0.15)	0.4435(0.18)	0.4509(0.16)	0.4573(0.12)	0.4515(0.12)
2	0.9719(0.30)	0.9719(0.30)	0.9736(0.11)	0.9719(0.29)	0.9701(0.30)	0.9684(0.29)	0.9701(0.25)	0.9701(0.26)
3	0.6801(0.07)	0.6706(0.13)	0.6991(0.09)	0.6991(0.07)	0.6706(0.06)	0.6706(0.05)	0.6706(0.06)	0.6706(0.06)
4	0.6910(0.13)	0.6910(0.14)	0.6936(0.21)	0.6936(0.21)	0.6962(0.02)	0.7015(0.02)	0.7070(0.02)	0.7070(0.02)
5	0.8585(0.18)	0.8547(0.26)	0.8604(0.30)	0.8528(0.24)	0.8585(0.30)	0.8547(0.30)	0.8528(0.23)	0.8528(0.23)
6	0.8222(0.25)	0.8083(0.25)	0.8111(0.24)	0.8083(0.29)	0.8083(0.30)	0.8111(0.26)	0.8167(0.16)	0.8139(0.24)
7	0.1128(0.01)	0.1128(0.01)	0.1385(0.02)	0.1333(0.01)	0.1282(0.01)	0.1282(0.01)	0.1282(0.01)	0.1282(0.01)
8	0.6920(0.08)	0.6945(0.06)	0.6887(0.08)	0.6896(0.07)	0.6920(0.08)	0.6879(0.15)	0.6962(0.07)	0.6929(0.07)
9	0.9381(0.10)	0.9429(0.10)	0.9381(0.15)	0.9333(0.14)	0.9381(0.11)	0.9429(0.12)	0.9381(0.13)	0.9429(0.11)
10	0.8372(0.30)	0.8274(0.15)	0.8369(0.26)	0.8418(0.30)	0.8462(0.29)	0.8367(0.27)	0.8370(0.27)	0.8466(0.20)
11	0.8407(0.10)	0.8370(0.11)	0.8037(0.15)	0.8370(0.19)	0.8296(0.10)	0.8185(0.21)	0.8370(0.14)	0.8185(0.18)
12	0.6998(0.26)	0.6951(0.17)	0.6986(0.25)	0.6998(0.30)	0.6986(0.30)	0.6962(0.19)	0.6974(0.26)	0.6974(0.27)
13	0.9267(0.29)	0.9333(0.30)	0.8917(0.30)	0.8950(0.30)	0.9267(0.30)	0.9217(0.30)	0.9117(0.30)	0.9300(0.30)
14	0.7988(0.25)	0.7988(0.25)	0.7988(0.29)	0.7988(0.29)	0.7988(0.27)	0.7988(0.27)	0.7974(0.27)	0.7974(0.27)
15	0.9833(0.20)	0.9833(0.20)	0.9776(0.30)	0.9833(0.26)	0.9776(0.25)	0.9776(0.23)	0.9776(0.23)	0.9776(0.25)
16	0.9855(0.10)	0.9855(0.10)	0.9860(0.13)	0.9860(0.13)	0.9860(0.11)	0.9860(0.11)	0.9860(0.11)	0.9860(0.11)

Table 5

Maximal values of classification accuracies w.r.t. different reducts (SVM classifier)

ID	NA-FRS	FAS-FRS	NA-FRS k-trimmed	FAS-FRS k-trimmed	NA-FRS k-mean	FAS-FRS k-mean	NA-FRS k-median	FAS-FRS k-median
1	0.5199(0.09)	0.5199(0.11)	0.5183(0.10)	0.5215(0.10)	0.5194(0.10)	0.5199(0.12)	0.5194(0.13)	0.5204(0.08)
2	0.9630(0.14)	0.9648(0.06)	0.9666(0.13)	0.9701(0.22)	0.9665(0.15)	0.9683(0.14)	0.9683(0.17)	0.9666(0.18)
3	0.5952(0.15)	0.5952(0.15)	0.5952(0.21)	0.5952(0.21)	0.5952(0.16)	0.5952(0.16)	0.5952(0.17)	0.5952(0.17)
4	0.7098(0.13)	0.7098(0.14)	0.7098(0.24)	0.7098(0.24)	0.7124(0.09)	0.7098(0.18)	0.7124(0.08)	0.7124(0.08)
5	0.8681(0.15)	0.8700(0.14)	0.8681(0.22)	0.8681(0.20)	0.8681(0.16)	0.8700(0.16)	0.8700(0.17)	0.8662(0.16)
6	0.4889(0.22)	0.4833(0.18)	0.4833(0.25)	0.4667(0.30)	0.4833(0.13)	0.4750(0.22)	0.4778(0.16)	0.4722(0.26)
7	0.2769(0.17)	0.2769(0.21)	0.2667(0.21)	0.2821(0.20)	0.2769(0.17)	0.2974(0.16)	0.2923(0.17)	0.2974(0.17)
8	0.6589(0.06)	0.6589(0.07)	0.6639(0.09)	0.6631(0.10)	0.6622(0.07)	0.6598(0.08)	0.6614(0.08)	0.6614(0.08)
9	0.9238(0.04)	0.9190(0.26)	0.9190(0.11)	0.9190(0.18)	0.9190(0.04)	0.9190(0.15)	0.9238(0.20)	0.9238(0.11)
10	0.7503(0.27)	0.7309(0.30)	0.7455(0.17)	0.7214(0.27)	0.7599(0.24)	0.7310(0.22)	0.7791(0.30)	0.7308(0.26)
11	0.8407(0.29)	0.8407(0.28)	0.8407(0.28)	0.8185(0.29)	0.8407(0.23)	0.8407(0.30)	0.8444(0.27)	0.8259(0.29)
12	0.6111(0.11)	0.6123(0.14)	0.6206(0.15)	0.6146(0.19)	0.6111(0.15)	0.6123(0.16)	0.6135(0.16)	0.6123(0.17)
13	0.9617(0.30)	0.9483(0.30)	0.9317(0.30)	0.9283(0.29)	0.9567(0.30)	0.9483(0.28)	0.9417(0.30)	0.9450(0.29)
14	0.8630(0.25)	0.8630(0.25)	0.8630(0.29)	0.8630(0.29)	0.8638(0.26)	0.8638(0.26)	0.8626(0.27)	0.8626(0.27)
15	0.9889(0.26)	0.9833(0.23)	0.9722(0.24)	0.9778(0.29)	0.9833(0.29)	0.9889(0.29)	0.9889(0.30)	0.9889(0.30)
16	0.9790(0.05)	0.9780(0.05)	0.9790(0.05)	0.9785(0.08)	0.9790(0.06)	0.9780(0.05)	0.9790(0.06)	0.9790(0.06)

Table 6

Mean values of classification accuracies w.r.t. different reducts (KNN classifier)

ID	NA-FRS	FAS-FRS	NA-FRS k-trimmed	FAS-FRS k-trimmed	NA-FRS k-mean	FAS-FRS k-mean	NA-FRS k-median	FAS-FRS k-median
1	0.4298	0.4361	0.4358	0.4391	0.4351	0.4355	0.4355	0.4398
2	0.9608	0.9577	0.9570	0.9531	0.9580	0.9586	0.9590	0.9567
3	0.6636	0.6608	0.6516	0.6548	0.6573	0.6567	0.6545	0.6535
4	0.6773	0.6787	0.6738	0.6700	0.6782	0.6778	0.6824	0.6828
5	0.8441	0.8394	0.8337	0.8268	0.8415	0.8417	0.8400	0.8354
6	0.7505	0.7368	0.7062	0.6882	0.7478	0.7406	0.7347	0.7159
7	0.0458	0.0468	0.0593	0.0569	0.0523	0.0470	0.0538	0.0497
8	0.6778	0.6775	0.6715	0.6739	0.6735	0.6729	0.6726	0.6729
9	0.9162	0.9176	0.9094	0.9138	0.9111	0.9106	0.9124	0.9102
10	0.7856	0.7937	0.7707	0.7706	0.7834	0.7938	0.7825	0.7644
11	0.7781	0.7783	0.7478	0.7523	0.7646	0.7728	0.7577	0.7577
12	0.6666	0.6649	0.6502	0.6476	0.6594	0.6578	0.6573	0.6553
13	0.8392	0.8363	0.7674	0.7379	0.8122	0.8097	0.7921	0.7821
14	0.7697	0.7664	0.7565	0.7578	0.7630	0.7598	0.7595	0.7608
15	0.9495	0.9583	0.9365	0.9428	0.9430	0.9531	0.9476	0.9514
16	0.9827	0.9825	0.9820	0.9820	0.9823	0.9824	0.9825	0.9825

Table 7

Mean values of classification accuracies w.r.t. different reducts (SVM classifier)

ID	NA-FRS	FAS-FRS	NA-FRS k-trimmed	FAS-FRS k-trimmed	NA-FRS k-mean	FAS-FRS k-mean	NA-FRS k-median	FAS-FRS k-median
1	0.5164	0.5162	0.5160	0.5171	0.5161	0.5166	0.5161	0.5167
2	0.9591	0.9610	0.9570	0.9571	0.9588	0.9588	0.9581	0.9553
3	0.5682	0.5711	0.5537	0.5597	0.5650	0.5663	0.5597	0.5638
4	0.7055	0.7080	0.7021	0.7016	0.7046	0.7046	0.7043	0.7041
5	0.8572	0.8549	0.8521	0.8504	0.8568	0.8535	0.8579	0.8524
6	0.4299	0.4244	0.4097	0.3925	0.4281	0.4106	0.4184	0.4039
7	0.2532	0.2588	0.2549	0.2509	0.2538	0.2571	0.2581	0.2552
8	0.6518	0.6514	0.6476	0.6465	0.6509	0.6510	0.6488	0.6492
9	0.9133	0.9127	0.9016	0.9030	0.9068	0.9111	0.9097	0.9125
10	0.7039	0.7054	0.7097	0.6897	0.7135	0.6853	0.7277	0.6726
11	0.8033	0.8012	0.7848	0.7699	0.8021	0.7953	0.7974	0.7819
12	0.5852	0.5836	0.5715	0.5758	0.5795	0.5794	0.5771	0.5795
13	0.8736	0.8622	0.8144	0.7852	0.8541	0.8542	0.8359	0.8246
14	0.8306	0.8279	0.8152	0.8165	0.8246	0.8200	0.8208	0.8199
15	0.9519	0.9555	0.9392	0.9385	0.9461	0.9526	0.9516	0.9494
16	0.9765	0.9765	0.9763	0.9759	0.9762	0.9762	0.9764	0.9763

Similarly, the Wilcoxon rank sum test [46] is employed to compare the naive approach and our acceleration strategy in terms of the classification performance of the attributes in the obtained reducts. The significance level is again set to 0.05. The detailed p-values are shown in Tables 8 and 9, where the poorer values are highlighted in italics.

Through a careful investigation of Tables 8 and 9, it is not difficult to observe that the derived p-values over most data sets are greater than 0.05. In view of this result, for both the KNN and SVM classifiers, the classification performances of the attributes in the reducts generated by the two approaches are not significantly different in most cases.

Table 8

p-values of Wilcoxon rank sum test w.r.t. classification performance (KNN classifier)

ID	NA-FRS & FAS-FRS	NA-FRS k-trimmed & FAS-FRS k-trimmed	NA-FRS k-mean & FAS-FRS k-mean	NA-FRS k-median & FAS-FRS k-median
1	0.0016	0.0772	0.0091	0.0156
2	0.2604	0.0427	0.8823	0.6305
3	0.2353	0.2291	0.8375	0.6347
4	0.9210	0.7422	0.9815	0.9818
5	0.2899	0.0270	0.7838	0.4056
6	0.4595	0.4288	0.7957	0.2836
7	0.9062	0.2548	0.4397	0.1555
8	0.9351	0.9823	0.4414	0.9941
9	0.6979	0.9029	0.4622	0.2849
10	0.5541	0.8360	0.3554	0.7787
11	0.8413	0.1594	0.0717	0.5540
12	0.1270	0.3988	0.3321	0.2362
13	0.8302	0.2770	0.9764	0.7116
14	0.9764	1.0000	0.6677	0.8822
15	0.0912	0.2594	0.1916	0.8230
16	0.9564	0.9543	0.9386	0.9659

Table 9

p-values of Wilcoxon rank sum test w.r.t. classification performance (SVM classifier)

ID	NA-FRS & FAS-FRS	NA-FRS k-trimmed & FAS-FRS k-trimmed	NA-FRS k-mean & FAS-FRS k-mean	NA-FRS k-median & FAS-FRS k-median
1	0.3063	0.0247	0.0684	0.0968
2	0.0483	0.9646	0.6555	0.2331
3	0.8787	0.4415	0.9558	0.8772
4	0.9407	0.8988	0.7862	0.9697
5	0.5715	0.5288	0.8467	0.1532
6	0.8941	0.1578	0.2364	0.2251
7	0.0767	0.2739	0.3616	0.2675
8	0.7040	0.4104	0.6511	0.7839
9	0.9758	0.9462	0.0503	0.1933
10	0.1956	0.0018	0.0005	0.0000
11	0.3732	0.0322	0.2729	0.0361
12	0.1973	0.3949	0.9882	0.7390
13	0.3182	0.2427	0.8302	0.6466
14	0.9057	0.9409	0.8765	0.8822
15	0.5286	0.5822	0.6517	0.5838
16	0.6615	0.3669	1.0000	0.6657

Moreover, from Tables 4–9, it is not difficult to conclude that, compared with the naive approach, the reducts generated by our acceleration strategy do not appear to lead to poorer classification performance.

4.3 Comparisons of length of reducts

In the following, the lengths of reducts, which are generated by using the naive approach and our acceleration strategy based on different fuzzy rough set models, will be compared. The detailed results are shown as follows.

With a careful investigation of the results shown in Fig. 3, it is not difficult to observe the following.

The lengths of the reducts generated with our acceleration strategy are similar to those generated with the naive approach over most data sets. This observation indicates that, compared with the naive approach, our acceleration strategy does not lead to a significant difference in the length of the obtained reducts.

As the value of the kernel parameter σ increases, the lengths of the reducts obtained with the different models show an increasing trend, whether the naive approach or our acceleration strategy is employed. That is, if σ increases, more attributes are selected to derive reducts that satisfy the intended constraint. The main reason is that the fuzzy similarity between samples increases with σ, so the membership degree of a sample to the fuzzy rough lower approximation decreases, and the corresponding fuzzy dependency decreases as well. Therefore, more attributes must be selected to find reducts that satisfy the predefined constraint.

The reducts obtained with the classical fuzzy rough set are longer than those obtained with the other three fuzzy rough sets over most data sets, whether the naive approach or our acceleration approach is employed. This result indicates that, compared with the classical fuzzy rough set, the other three fuzzy rough set models may generate reducts with fewer attributes.
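The claim in the second observation, that the fuzzy similarity between samples grows with σ, can be checked numerically with the following minimal sketch. It assumes the common Gaussian kernel form exp(−‖x − y‖² / (2σ²)); the definition given earlier in this paper takes precedence if it differs. The function name `gaussian_similarity` and the two sample vectors are hypothetical.

```python
import math


def gaussian_similarity(x, y, sigma):
    """Gaussian kernel similarity, assuming the form exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


# Kernel parameters 0.01, 0.02, ..., 0.30 and two hypothetical samples.
sigmas = [round(0.01 * i, 2) for i in range(1, 31)]
x, y = (0.20, 0.40), (0.25, 0.30)
similarities = [gaussian_similarity(x, y, s) for s in sigmas]

# For two distinct samples, the similarity increases strictly with sigma.
is_increasing = all(a < b for a, b in zip(similarities, similarities[1:]))
```

Since distinct samples become more similar as σ grows, samples from different decision classes are harder to separate, which is consistent with the decrease of fuzzy dependency described above.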

Fig.3

Comparisons of lengths among different reducts.

Similar to Sections 4.1 and 4.2, the Wilcoxon rank sum test [46] is selected for comparing the lengths of the reducts derived by the naive approach and our acceleration strategy. The detailed p-values are shown in Table 10, where the poorer values are highlighted in italics.

With the significance level set to 0.05, it follows from the results shown in Table 10 that most p-values are greater than 0.05. From this point of view, the two approaches do not differ significantly with respect to the length of the obtained reducts.

Moreover, based on the results shown in Fig. 3 and Table 10, it is not difficult to conclude that the lengths of the reducts derived by the naive approach and our acceleration strategy are similar, and the differences between the two approaches are not significant. That is, compared with the naive approach, our acceleration strategy does not lead to a significant variation in the length of the obtained reduct.

Table 10

p-values of Wilcoxon rank sum test w.r.t. length of reducts

ID	NA-FRS & FAS-FRS	NA-FRS k-trimmed & FAS-FRS k-trimmed	NA-FRS k-mean & FAS-FRS k-mean	NA-FRS k-median & FAS-FRS k-median
1	0.0436	0.0893	0.0181	0.0920
2	0.9293	0.9293	0.9941	0.9823
3	1.0000	0.9940	0.9937	1.0000
4	0.8104	1.0000	1.0000	1.0000
5	0.7897	0.7559	0.8128	0.8129
6	0.9234	0.9410	0.9646	0.9941
7	0.6255	0.8417	0.7729	0.7170
8	0.9881	0.9882	1.0000	0.9881
9	0.9644	0.9107	0.9286	0.9400
10	0.8475	0.8185	0.9882	0.9882
11	0.5832	0.0350	0.2514	0.5943
12	0.8927	0.5886	0.8118	0.7165
13	0.9117	1.0000	0.9528	0.9705
14	1.0000	1.0000	0.9823	0.9882
15	0.8765	0.8645	0.8590	0.8357
16	0.9563	0.9804	0.9932	0.9795

5 Conclusions and future perspectives

The Gaussian kernel is frequently employed to measure the similarities between samples, and different kernel parameters may result in different fuzzy approximations and corresponding reducts. Consequently, given a set of kernel parameters, finding a parameterized reduct with better generalization performance has become a key topic. For this purpose, a naive approach repeats the process of finding a reduct for each parameter. Nevertheless, such an approach is too time-consuming, especially when the number of considered parameters is large. To alleviate this problem, an acceleration approach is proposed to decrease the time consumption of searching for reducts over multiple parameters. The approach takes the variation of the kernel parameters into account: the reduct under the current parameter is searched on the basis of the reduct related to the previous parameter, which reduces both the search space of attributes and the total elapsed time of finding reducts over multiple different parameters. Moreover, the experimental results suggest that the proposed acceleration strategy not only significantly reduces the time consumption of finding reducts over multiple different parameters, but also does not appear to lead to poorer classification performance of the attributes in the obtained reducts.
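The difference between the two search schemes summarized above can be illustrated by the following sketch. It is not the algorithm of this paper: `greedy_reduct`, `toy_dependency`, the attribute weights and the stopping rule (stop once the dependency of the candidate reduct is within ɛ of that of the full attribute set) are all assumptions chosen so that the example is self-contained, and the toy measure merely mimics a dependency that decreases as σ grows.

```python
def toy_dependency(attrs, sigma, weights):
    """Hypothetical stand-in for fuzzy dependency: monotone in attrs, decreasing in sigma."""
    s = min(1.0, sum(weights[a] for a in attrs))
    return s ** (10 * sigma)


def greedy_reduct(attributes, dependency, sigma, eps=0.05, start=()):
    """Forward greedy search that may be warm-started from `start`.

    Returns the selected attributes and the number of candidate evaluations.
    """
    reduct = list(start)
    target = dependency(attributes, sigma)
    evaluations = 0
    while target - dependency(reduct, sigma) > eps:
        candidates = [a for a in attributes if a not in reduct]
        if not candidates:
            break
        # Add the attribute that raises the dependency the most.
        best = max(candidates, key=lambda a: dependency(reduct + [a], sigma))
        evaluations += len(candidates)
        reduct.append(best)
    return reduct, evaluations


weights = {"a": 0.5, "b": 0.3, "c": 0.16, "d": 0.03, "e": 0.01}  # hypothetical
attributes = list(weights)
dep = lambda attrs, sigma: toy_dependency(attrs, sigma, weights)

naive_cost, warm_cost = 0, 0
naive_reducts, warm_reducts = [], []
previous = []
for sigma in (0.1, 0.2, 0.3):
    # Naive scheme: restart from the empty set for every parameter.
    red, cost = greedy_reduct(attributes, dep, sigma)
    naive_reducts.append(red)
    naive_cost += cost
    # Accelerated scheme: start from the reduct of the previous parameter.
    previous, cost = greedy_reduct(attributes, dep, sigma, start=previous)
    warm_reducts.append(list(previous))
    warm_cost += cost
```

In this toy setting both schemes return the same reducts, while the warm-started scheme evaluates far fewer candidate attributes, which is the effect the acceleration strategy aims at.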

The following topics deserve our further investigations.

Only the forward acceleration strategy for attribute reduction is considered in this paper; the backward acceleration strategy will be taken into account in future work.

Only attribute reduction based on the measure of fuzzy dependency is considered in this paper; attribute reduction based on other measures, such as conditional entropy and mutual information, will be further explored.

An acceleration mechanism for attribute reduction over a single parameter will be introduced to further improve the time efficiency of our proposed framework.

Acknowledgments

This work is supported by the Natural Science Foundation of China (Nos. 61906078, 61572242), Key Laboratory of Data Science and Intelligence Application, Fujian Province University (No. D1901).

References

1. Aggarwal, Probabilistic fuzzy rough sets, Journal of Intelligent and Fuzzy Systems 29 (2015), 1901–1912.

2. Chang C.C., Lin C.J., LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2 (2011), 1–27.

3. Chen D.G., Hu Q.H., Yang Y.P., Parameterized attribute reduction with Gaussian kernel based fuzzy rough sets, Information Sciences 181 (2011), 5169–5179.

4. Chen H.M., Li T.R., Fan, Luo, Feature selection for imbalanced data based on neighborhood rough sets, Information Sciences 483 (2019), 1–20.

5. Chen, Song J.J., Liu K.Y., Lin Y.J., Yang X.B., Combined accelerator for attribute reduction: A sample perspective, Mathematical Problems in Engineering (2020), doi: 10.1155/2020/2350627.

6. Dai J.H., Rough set approach to incomplete numerical data, Information Sciences 241 (2013), 43–57.

7. Dai J.H., Tian H.W., Wang W.T., Liu, Decision rule mining using classification consistency rate, Knowledge-Based Systems 43 (2013), 95–102.

8. Dai J.H., Xu, Attribute selection based on information gain ratio in fuzzy rough set theory with application to tumor classification, Applied Soft Computing 13 (2013), 211–221.

9. Dubois, Prade, Rough fuzzy sets and fuzzy rough sets, International Journal of General Systems 17 (1990), 191–209.

10. Q.H., An, Yu D.R., Soft fuzzy rough sets for robust feature evaluation and selection, Information Sciences 180 (2010), 4384–4400.

11. Q.H., Yu D.R., Liu J.F., Wu C.X., Neighborhood rough set based heterogeneous feature subset selection, Information Sciences 178 (2008), 3577–3594.

12. Q.H., Zhang, An, Yu D.R., On robust fuzzy rough set models, IEEE Transactions on Fuzzy Systems 20 (2012), 636–651.

13. Q.H., Zhang, Chen D.G., Pedrycz, Yu D.R., Gaussian kernel based fuzzy rough sets: Model, uncertainty measures and applications, International Journal of Approximate Reasoning 51 (2010), 453–471.

14. Jia X.Y., Rao, Shang, Li T.J., Similarity-based attribute reduction in rough set theory: A clustering perspective, International Journal of Machine Learning and Cybernetics (2019), doi: 10.1007/s13042-019-00959-w.

15. Jiang Z.H., Liu K.Y., Yang X.B., Yu H.L., Fujita, Qian Y.H., Accelerator for supervised neighborhood based attribute reduction, International Journal of Approximate Reasoning 119 (2020), 122–150.

16. Jiang Z.H., Yang X.B., Yu H.L., Liu, Wang P.X., Qian Y.H., Accelerator for multi-granularity attribute reduction, Knowledge-Based Systems 177 (2019), 145–158.

17. H.R., Li H.X., Yang X.B., Zhou X.Z., Huang, Cost-sensitive rough set: A multi-granulation approach, Knowledge-Based Systems 123 (2017), 137–153.

18. H.R., Yang X.B., Song X.N., Qi Y.S., Dynamic updating multigranulation fuzzy rough set: Approximations and reducts, International Journal of Machine Learning and Cybernetics 5 (2014), 981–990.

19. H.R., Yang X.B., Yu H.L., Li T.J., Yu D.J., Yang J.Y., Cost-sensitive rough set approach, Information Sciences 355–356 (2016), 282–298.

20. Li D.Y., Zhai Y.H., Wang S.G., Zhang, A novel attribute reduction approach for multi-label data based on rough set theory, Information Sciences 367–368 (2016), 827–847.

21. J.H., Mei C.L., Lv Y.J., Knowledge reduction in decision formal contexts, Knowledge-Based Systems 24 (2011), 709–715.

22. J.H., Mei C.L., Wang J.H., Zhang, Rule-preserved object compression in formal decision contexts using concept lattices, Knowledge-Based Systems 71 (2014), 435–445.

23. J.H., Ren, Mei C.L., Qian Y.H., Yang X.B., A comparative study of multigranulation rough sets and concept lattices via rule acquisition, Knowledge-Based Systems 91 (2016), 152–164.

24. J.Z., Yang X.B., Song X.N., Li J.H., Wang P.X., Yu D.J., Neighborhood attribute reduction: A multi-criterion approach, International Journal of Machine Learning and Cybernetics 10 (2019), 731–742.

25. Y.W., Lin Y.J., Liu J.H., Weng, Shi Z.K., Wu S.X., Feature selection for multi-label learning based on kernelized fuzzy rough sets, Neurocomputing 318 (2018), 271–286.

26. Lin G.P., Liang J.Y., Qian Y.H., Li J.J., A fuzzy multigranulation decision-theoretic approach to multi-source fuzzy information systems, Knowledge-Based Systems 91 (2016), 102–113.

27. Lin Y.J., Li Y.W., Wang C.X., Chen J.K., Attribute reduction for multi-label learning with fuzzy rough set, Knowledge-Based Systems 152 (2018), 51–61.

28. Liu K.Y., Yang X.B., Fujita, Liu, Yang, Qian Y.H., An efficient selector for multi-granularity attribute reduction, Information Sciences 505 (2019), 457–472.

29. Liu K.Y., Yang X.B., Yu H.L., Fujita, Chen X.J., Liu, Supervised information granulation strategy for attribute reduction, International Journal of Machine Learning and Cybernetics (2020), doi: 10.1007/s13042-020-01107-5.

30. Liu K.Y., Yang X.B., Yu H.L., Mi J.S., Wang P.X., Chen X.J., Rough set based semi-supervised feature selection via ensemble selector, Knowledge-Based Systems 165 (2019), 282–296.

31. Min, Zhu, Attribute reduction of data with error ranges and test costs, Information Sciences 211 (2012), 48–67.

32. Pawlak, Rough set, International Journal of Computer and Information Sciences 11 (1982), 341–356.

33. Pawlak, Rough set theory and its applications to data analysis, Cybernetics and Systems 29 (1998), 661–688.

34. Qian Y.H., Cheng H.H., Wang J.T., Liang J.Y., Pedrycz, Dang C.Y., Grouping granular structures in human granulation intelligence, Information Sciences 382–383 (2017), 150–169.

35. Qian Y.H., Liang J.Y., Dang C.Y., Knowledge structure, Letters 24 (2003), 833–849.

36. Qian Y.H., Liang J.Y., Wei, Consistency-preserving attribute reduction in fuzzy rough set framework, International Journal of Machine Learning and Cybernetics 4 (2013), 287–299.

37. Qian Y.H., Wang, Cheng H.H., Liang J.Y., Dang C.Y., Fuzzy-rough feature selection accelerator, Fuzzy Sets and Systems 258 (2015), 61–78.

38. Sheeja T.K., Kuriakose A.S., A novel feature selection method using fuzzy rough sets, Computers in Industry 97 (2018), 111–116.

39. Song J.J., Yang X.B., Song X.N., Yu H.L., Yang J.Y., Hierarchies on fuzzy information granulations: A knowledge distance based lattice approach, Journal of Intelligent and Fuzzy Systems 27 (2014), 1107–1117.

40. Swiniarski R.W., Skowron, Rough set methods in feature selection and recognition, Pattern Recognition Letters 24 (2003), 833–849.

41. Wang C.Z., Huang, Shao M.W., Fan X.D., Fuzzy rough set-based attribute reduction using distance measures, Knowledge-Based Systems 164 (2019), 205–212.

42. Wang C.Z., Shao M.W., He, Qian Y.H., Qi L.Y., Feature subset selection based on fuzzy neighborhood rough sets, Knowledge-Based Systems 111 (2016), 173–179.

43. Wang C.Z., Shi Y.P., Fan X.D., Shao M.W., Attribute reduction based on k-nearest neighborhood rough sets, International Journal of Approximate Reasoning 106 (2019), 18–31.

44. Wang Y.B., Chen X.J., Dong, Attribute reduction via local conditional entropy, International Journal of Machine Learning and Cybernetics 10 (2019), 3619–3634.

45. Wei, Song, Liang J.Y., Wu X.Y., Accelerating incremental attribute reduction algorithm by compacting a decision table, International Journal of Machine Learning and Cybernetics 10 (2019), 2355–2373.

46. Wilcoxon, Individual comparisons by ranking methods, Biometrics Bulletin 1 (1945), 80–83.

47. S.P., Ju H.R., Shang, Pedrycz, Yang X.B., Li Chun, Label distribution learning: A local collaborative mechanism, International Journal of Approximate Reasoning 121 (2020), 59–84.

48. S.P., Yang X.B., Yu H.L., Yu D.J., Yang J.Y., Tsang E.C.C., Multi-label learning with label-specific feature reduction, Knowledge-Based Systems 104 (2016), 52–61.

49. W.H., Sun W.X., Liu Y.F., Zhang W.X., Fuzzy rough set models over two universes, International Journal of Machine Learning and Cybernetics 4 (2013), 631–645.

50. W.H., Wang Q.R., Zhang X.T., Multi-granulation rough sets based on tolerance relations, Soft Computing 17 (2013), 1241–1252.

51. W.H., Wang Q.R., Luo S.Q., Multi-granulation fuzzy rough sets, Journal of Intelligent and Fuzzy Systems 26 (2014), 1323–1340.

52. Yang, Hu B.Q., Communication between fuzzy information systems using fuzzy covering-based rough sets, International Journal of Approximate Reasoning 103 (2018), 414–436.

53. Yang, Hu B.Q., Fuzzy neighborhood operators and derived fuzzy coverings, Fuzzy Sets and Systems 370 (2019), 1–33.

54. Yang, Hu B.Q., On some types of fuzzy covering-based rough sets, Fuzzy Sets and Systems 312 (2017), 36–65.

55. Yang X.B., Chen Z.H., Dou H.L., Zhang, Yang J.Y., Neighborhood system based rough set: Models and attribute reductions, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 20 (2012), 399–419.

56. Yang X.B., Liang S.C., Yu H.L., Gao, Qian Y.H., Pseudo-label neighborhood rough set: Measures and attribute reductions, International Journal of Approximate Reasoning 105 (2019), 112–129.

57. Yang X.B., Qi Y.S., Song X.N., Yang J.Y., Test cost sensitive multigranulation rough set: Model and minimal cost selection, Information Sciences 250 (2013), 184–199.

58. Yang X.B., Yao Y.Y., Ensemble selector for attribute reduction, Applied Soft Computing 70 (2018), 1–11.

59. Zhang, Mei C.L., Chen D.G., Li J.H., Feature selection in mixed data: A method using a novel fuzzy rough set-based information entropy, Pattern Recognition 56 (2016), 1–15.

60. Zhang Q.H., Xia D.Y., Liu K.X., Wang G.Y., A general model of decision-theoretic three-way approximations of fuzzy sets based on a heuristic algorithm, Information Sciences 507 (2020), 522–539.

61. Zhang Q.H., Xu, Wang G.Y., Fuzzy equivalence relation and its multigranulation spaces, Information Sciences 346–347 (2016), 44–57.

62. Zhang Q.H., Zhang, Wang G.Y., The uncertainty of probabilistic rough sets in multi-granulation spaces, International Journal of Approximate Reasoning 77 (2016), 38–54.