Attribute reduction algorithm based on combined distance in clustering

Abstract

Attribute reduction is a widely used technique in data preprocessing that aims to remove redundant and irrelevant attributes. However, most attribute reduction models use only the importance of individual attributes as the basis for reduction, without considering the relationships between attributes or their impact on classification results. To overcome this shortcoming, this article first defines the distance between samples based on the number of combinations formed by pairwise comparison of the samples in the same sub-division. Second, from the point of view of clustering, following the principle that the distance between points within a cluster should be as small as possible and the distance between samples of different clusters should be as large as possible, the combined distance is used to define the importance of attributes. Finally, a new attribute reduction mechanism is proposed based on this importance measure. Extensive experiments are conducted to verify the performance of the proposed reduction algorithm. The results show that the data sets reduced by our algorithm have a prominent advantage in classification accuracy, that the algorithm can effectively reduce the dimensionality of high-dimensional data, and that it provides a new approach for the study of attribute reduction models.

Keywords

Rough sets, attribute reduction, clustering, combined distance

1 Introduction

Classification serves to express and understand different things more clearly: according to different classification standards, elements are divided into several categories. With the development of the Internet, the volume of information has grown exponentially. To quickly extract useful information from this massive amount of data, data sets must be preprocessed. Attribute reduction eliminates redundant and irrelevant attributes, which can speed up data analysis and improve its accuracy [30]. Usually, the best clustering result is taken as a reference standard for classification, and similar samples are assigned to the same cluster. In cluster analysis, if the relationship among clusters is measured by distance, then the larger the sample distance between clusters, the stronger the distinguishing ability, and vice versa. From the original intention of attribute reduction, redundant data not only wastes time and space resources during data analysis but also affects the accuracy of data classification. Therefore, attribute reduction is expected both to reduce the dimension of the data and to improve the classification accuracy of the data set.

As a powerful mathematical tool for dealing with inaccurate and incomplete data, rough sets [1–3] have drawn much attention from researchers and have been successfully applied in machine learning [4, 5], data mining [6–8], expert systems [9], fault diagnosis [10], and other fields. Attribute reduction [38–43] is one of the research hotspots of rough sets and a widely used technique in the data preprocessing stage. Its goal is to delete redundant conditional attributes and thereby reduce data dimensionality while maintaining the ability of the original decision system to distinguish among samples. More importantly, it reduces the consumption of space resources, speeds up the algorithm and improves the classification accuracy of data sets. To find effective attribute reduction methods, researchers have extended the classic rough set model and designed a variety of reduction models. For example, the positive region method uses equivalence class division [9–11, 23]. The discernibility matrix method uses the discernibility matrix to derive the discernibility function and then uses the disjunction of the discernibility function to find the reduction sets [12–14, 37]. The information entropy reduction model applies information entropy to measure the importance of attributes and eliminates redundant attributes in turn [15–17, 27]. The granular computing attribute reduction method searches for a subset of the conditional attributes with the coarsest granularity as the reduction core and, on this basis, calculates the importance of the remaining attributes to obtain the reduction set [18, 36]. The effectiveness of these reduction models has been verified through numerous experiments.

Although traditional reduction models differ in method, most of them use the indiscernibility (equivalence) relation between samples to calculate the attribute reduction set, and these reduction sets all depend on the decision classes induced by the equivalence relation. According to the findings in [23, 37], a reduction result obtained from the equivalence relation between the condition attributes and the decision attribute cannot fully reflect an improvement in the classification ability of the samples after the reduction. Consider a decision system DS = (U, C ∪ D), where U is a nonempty finite set of data objects with x, y ∈ U, C = {a, b, c, e} is the conditional attribute set, and D is the decision attribute. Suppose that x and y are classified according to the decision label so that x ∈ D_i, y ∈ D_j and i ≠ j, but after reduction we get [x] = [y]; in other words, x and y fall into the same equivalence class. For example, let x = (1, 2, 1, 0, 1) and y = (1, 2, 3, 1, 2), so that x ≠ y. If the reduction set is {a, b}, then x = (1, 2) and y = (1, 2), and x and y become equivalent. This inconsistency in the division of samples before and after reduction has a negative impact on the classification of the reduced data set. To avoid this problem, this paper proposes an attribute reduction method from the perspective of clustering.

To address the above problem, we propose an attribute reduction method based on combined distance in the context of clustering. First, all data objects are divided according to the decision attribute, and objects with the same decision value are called intra-class samples. The intra-class samples are further divided according to the conditional attributes, and the number of combinations formed by comparing the samples of the same sub-division pair by pair is used to define the intra-class similarity distance. Second, samples with different decision values are called inter-class samples; samples from two different classes are compared pair by pair, and the number of combinations obtained is used to define the inter-class sample distance. Third, to take both the inter-class distance and the intra-class distance into account, we calculate the distance with the formula intra_distance + λ * inter_distance and adjust the parameter λ to obtain the best clustering effect. Finally, unnecessary attributes are deleted to obtain the reduction set.

The rest of this paper is organized as follows. Some basic concepts of rough sets are introduced in Section 2. In Section 3, the intra-class and inter-class combination distances are defined, the relevant standards for measuring clustering effects are applied to define the importance of attributes, and reduction rules and an optimized reduction algorithm are designed. In Section 4, the classification accuracy of the reduction set and the correlation of the data set before and after reduction are analyzed through experiments. Section 5 summarizes the contributions of this article and the problems that remain to be solved.

2 Preliminary

In this section, a brief introduction is made on the basic concepts of equivalence, distinguishable relationship, indistinguishable relationship and approximate combination number in rough sets.

2.1 Basic concepts

Definition 1 [1]. An information system is a quadruple S = (U, Att, V, f), where U is a finite non-empty set of objects, Att is a finite non-empty set of attributes, V = ∪_{a ∈ Att} V_a with V_a the value set of attribute a, and f : U × Att → V is an information function with f_a (x) ∈ V_a for each a ∈ Att and x ∈ U. If Att = C ∪ D, where C is the set of conditional attributes and D is the decision attribute set, then S = (U, C ∪ D, V, f) is also called a decision system. Moreover, the information system (or decision table) can be simply written as IS = (U, C ∪ D). For each subset B ⊆ Att, an indiscernibility relation is defined as: $IND (B) = \{(x, y) \in U \times U \mid \forall a \in B, f_{a} (x) = f_{a} (y)\}$ (1)

Then, an equivalence class containing the object x is represented as: $[x]_{B} = \{y \in U \mid (x, y) \in IND (B)\}$ (2)

An equivalence class is a subset of samples with the same values on the attributes in B.
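
As a minimal sketch of formulas (1) and (2), the partition U/IND (B) can be obtained by grouping objects on their value tuple over B. The dictionary layout of the table, the toy data, and the function name `partition` are illustrative assumptions rather than part of the original formulation.

```python
from collections import defaultdict

def partition(table, attrs):
    """Group objects by their values on attrs, i.e. compute U/IND(attrs)."""
    blocks = defaultdict(list)
    for obj, row in table.items():
        key = tuple(row[a] for a in attrs)  # equal keys mean indiscernible objects
        blocks[key].append(obj)
    return [set(block) for block in blocks.values()]

# Hypothetical three-object table with attributes a and b.
toy = {
    "x1": {"a": 1, "b": 0},
    "x2": {"a": 1, "b": 1},
    "x3": {"a": 1, "b": 0},
}
classes_ab = partition(toy, ["a", "b"])  # x1 and x3 share one class
classes_a = partition(toy, ["a"])        # all three objects share one class
```

Adding attributes to B can only split classes further, which is why `classes_ab` is a refinement of `classes_a`.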

Definition 2 [1]. Given an information system IS = (U, C ∪ D), for every subset X ⊆ U and indiscernibility relation IND (B), the lower approximation set and the upper approximation set of X are defined by the basic sets of B as follows: $\underline{B} (X) = \cup \{Y_{i} \mid Y_{i} \in U / IND (B), Y_{i} \subseteq X\}$ (3) $\overline{B} (X) = \cup \{Y_{i} \mid Y_{i} \in U / IND (B), Y_{i} \cap X \neq \emptyset\}$ (4) where $\underline{B} (X)$ is the set of objects that belong to X with certainty, and $\overline{B} (X)$ is the set of objects that possibly belong to X. The universe U is partitioned into three disjoint regions by these two approximations: the positive region POS_B (X), the negative region NEG_B (X), and the boundary region BND_B (X), which is $\overline{B} (X)$ minus $\underline{B} (X)$. The three regions are defined as follows: $\begin{cases} {NEG}_{B} (X) = U - \overline{B} (X) \\ {BND}_{B} (X) = \overline{B} (X) - \underline{B} (X) \\ {POS}_{B} (X) = \underline{B} (X) \end{cases}$ (5)
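
The approximations and the three regions of Definition 2 can be sketched as follows, assuming the blocks of U/IND (B) are already available as Python sets. The universe, the blocks, and the target set X below are hypothetical.

```python
def lower_upper(blocks, target):
    """Return (lower, upper) approximations of target from the blocks of U/IND(B)."""
    lower, upper = set(), set()
    for block in blocks:
        if block <= target:   # block lies entirely inside X
            lower |= block
        if block & target:    # block overlaps X
            upper |= block
    return lower, upper

# Hypothetical universe of five objects partitioned into three blocks.
universe = {1, 2, 3, 4, 5}
blocks = [{1, 2}, {3, 4}, {5}]
X = {1, 2, 3}

lower, upper = lower_upper(blocks, X)
positive = lower             # POS_B(X)
boundary = upper - lower     # BND_B(X)
negative = universe - upper  # NEG_B(X)
```

Here the block {3, 4} only partly overlaps X, so it ends up in the boundary region.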

Definition 3 [2]. (Positive region reduction) Given an information system IS = (U, C ∪ D) and a subset of conditional attributes R ⊆ C, R is a positive region reduction of C with respect to D if R satisfies the following conditions:

POS_R (D) = POS_C (D)

for any a ∈ R, POS_{R - {a}} (D) ≠ POS_C (D)

In this definition, the purpose of positive region reduction is to find a minimal attribute subset that leaves the positive region unchanged.

3 Attribute reduction algorithm based on combination distance

Attribute reduction is mainly applied to delete redundant attributes. In classification, deleting redundant and irrelevant attributes not only improves classification accuracy but also saves time and space resources. Classification accuracy is therefore one of the important criteria for evaluating attribute reduction algorithms and for measuring the validity of a data set after reduction. Because the traditional attribute reduction process does not rely on the subsequent classifier, the classification effect of the reduced data set is often not ideal [21]. Therefore, we need to design a reduction model that combines the distribution of the data itself with the classification characteristics. In general, we can take the ideal outcome of clustering as the standard for classification: a class in classification analysis corresponds to a cluster in cluster analysis, and objects of the same class are assigned to the same cluster. Using the principle of clustering to reduce a data set can therefore improve both the validity of the data and the classification effect of the data set. Distance calculation is the key to cluster analysis. In the following sections, we discuss how the distance is calculated.

3.1 The Intra-class combination distance

The data set is divided according to the decision attribute, and samples with the same decision value are assigned to one class. The number of combinations is formed by the pairwise comparison of all objects within a class; the larger this value, the higher the sample similarity. In cluster analysis, samples with greater similarity are more easily assigned to the same class. The intra-class combination distance is defined as follows.

Definition 4 [20]. Given a decision information system IS = (U, C ∪ D), where C and D are the conditional and decision attributes, respectively. For x ∈ U, y ∈ U, the quantitative indiscernibility relation between two objects x and y under the subset of attributes R ⊆ C is defined as follows: ${Sr}_{R} (x, y) = \frac{| \{a \in R \mid f_{a} (x) = f_{a} (y)\} |}{| R |}$ (6), where |R| represents the cardinality of set R. Conversely, the degree of discernibility between x and y is defined as follows: ${Dr}_{R} (x, y) = \frac{| \{a \in R \mid f_{a} (x) \neq f_{a} (y)\} |}{| R |}$ (7)

The value of Sr_R (x, y) denotes the proportion of attributes in R on which the objects x and y take the same value; it follows that Dr_R (x, y) = 1 - Sr_R (x, y).
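
A short sketch of formulas (6) and (7) follows. The two objects and the function names `sr` and `dr` are assumptions chosen for illustration; exact fractions are used so that the identity Dr_R = 1 - Sr_R holds without rounding.

```python
from fractions import Fraction

def sr(x, y, attrs):
    """Fraction of attributes in attrs on which x and y agree (formula 6)."""
    same = sum(1 for a in attrs if x[a] == y[a])
    return Fraction(same, len(attrs))

def dr(x, y, attrs):
    """Fraction of attributes in attrs on which x and y differ (formula 7)."""
    return 1 - sr(x, y, attrs)

# Two hypothetical objects described by four attributes.
x = {"a": 1, "b": 0, "c": 1, "e": 1}
y = {"a": 0, "b": 1, "c": 1, "e": 1}
R = ["a", "b", "c", "e"]

similarity = sr(x, y, R)      # the objects agree on c and e only
dissimilarity = dr(x, y, R)
```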

Definition 5 [21]. In an information system IS = (U, C ∪ D), for decision class D_q ∈ D, its intra-class similarity with respect to the subset of attributes R ⊆ C is defined as follows: $IntraS (R, D_{q}) = \frac{2 \cdot \sum_{i = 1}^{| D_{q} | - 1} \sum_{j = i + 1}^{| D_{q} |} {Sr}_{R} (D_{qi}, D_{qj})}{| D_{q} | \cdot (| D_{q} | - 1)}$ (8) where D_qi and D_qj are two different objects of D_q, and |D_q| denotes the cardinality of D_q.

Definition 6. Given a decision information system IS = (U, C ∪ D), for decision class D_q ∈ D, let [D_q/a] = {X₁, X₂, ⋯ , X_m} be the division of the sample set D_q according to the condition attribute a. The number of combinations formed by the pairwise comparison of the samples in each sub-division is: $D_{q}^{a} = \sum_{i = 1}^{m} C_{| X_{i} |}^{2}$ (9)

where $C_{n}^{2} = n (n - 1) / 2$. According to Definition 1, all samples in sub-division X_i have the same value on attribute a. If these samples are compared in pairs, there are a total of $C_{| X_{i} |}^{2} = \frac{| X_{i} | (| X_{i} | - 1)}{2}$ pairs.
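
The count in formula (9) needs only the sizes of the sub-divisions, so one pass over the attribute values suffices. The sketch below assumes the values of one attribute over a class are given as a list; the data and the name `combination_count` are hypothetical.

```python
from collections import Counter
from math import comb

def combination_count(values):
    """Number of same-valued pairs: sum of C(|X_i|, 2) over the sub-divisions X_i."""
    return sum(comb(size, 2) for size in Counter(values).values())

# Hypothetical values of one attribute over a class of six samples.
values = [1, 1, 1, 0, 0, 2]
pairs = combination_count(values)  # C(3,2) + C(2,2) + C(1,2) = 3 + 1 + 0
```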

Definition 7. Given a decision information system IS = (U, C ∪ D), for decision class D_q ∈ D, its intra-class resemblance rate of combinations under the attribute set R ⊆ C is defined as follows: $IntraRr (R, D_{q}) = \frac{\frac{1}{| R |} \sum_{a \in R} D_{q}^{a}}{C_{| D_{q} |}^{2}}$ (10)

Property 1. Given a decision information system IS = (U, C ∪ D), for decision class D_q ∈ D and R ⊆ C, let [D_q/a] = {X₁, X₂, ⋯ , X_m} for each a ∈ R. Then Definition 7 is equivalent to Definition 5.

Proof: Let $[f_{a} (x) = f_{a} (y)]$ equal 1 if $f_{a} (x) = f_{a} (y)$ and 0 otherwise. According to Definition 4, we have ${Sr}_{R} (x, y) = \frac{| \{a \in R \mid f_{a} (x) = f_{a} (y)\} |}{| R |} = \frac{\sum_{a \in R} [f_{a} (x) = f_{a} (y)]}{| R |} .$

For a single attribute a ∈ R, we have ${Sr}_{a} (x, y) = [f_{a} (x) = f_{a} (y)]$, and therefore ${Sr}_{R} (x, y) = \frac{\sum_{a \in R} {Sr}_{a} (x, y)}{| R |} .$

Let x and y be two different objects in D_q; if f_a (x) = f_a (y), then Sr_a (x, y) = 1. To count all pairs of samples in D_q that satisfy f_a (x) = f_a (y), note that the number of pairwise comparisons among the elements of an equivalence class [x]_a is $C_{| [x]_{a} |}^{2}$. For [D_q/a] = {X₁, X₂, ⋯ , X_m}, there are therefore $\sum_{i = 1}^{m} C_{| X_{i} |}^{2}$ indiscernible pairs with respect to the conditional attribute a in D_q, that is, $D_{q}^{a} = \sum_{i = 1}^{m} C_{| X_{i} |}^{2} = \sum_{i = 1}^{| D_{q} | - 1} \sum_{j = i + 1}^{| D_{q} |} {Sr}_{a} (D_{qi}, D_{qj})$, where D_qi and D_qj are two different samples of D_q. Hence $\sum_{i = 1}^{| D_{q} | - 1} \sum_{j = i + 1}^{| D_{q} |} {Sr}_{R} (D_{qi}, D_{qj}) = \frac{1}{| R |} \sum_{i = 1}^{| D_{q} | - 1} \sum_{j = i + 1}^{| D_{q} |} \sum_{a \in R} {Sr}_{a} (D_{qi}, D_{qj}) = \frac{1}{| R |} \sum_{a \in R} D_{q}^{a}$.

Dividing both sides by $C_{| D_{q} |}^{2} = | D_{q} | (| D_{q} | - 1) / 2$ shows that formula (10) is equivalent to formula (8), from which we conclude that Definition 7 is equivalent to Definition 5.

Although Definition 5 is equivalent to Definition 7, the time complexity of evaluating Definition 5 is O (|D_q|²|R|), while that of Definition 7 is O (|D_q||R|). Since O (|D_q||R|) is lower than O (|D_q|²|R|), Definition 7 is an improvement over Definition 5 in computational cost.

Definition 8. Let IS = (U, C ∪ D). For decision class D_q ∈ D, its intra-class dissimilarity rate of combinations under the attribute set R ⊆ C is defined as follows: $IntraDr (R, D_{q}) = 1 - IntraRr (R, D_{q}) = 1 - \frac{\frac{1}{| R |} \sum_{a \in R} D_{q}^{a}}{C_{| D_{q} |}^{2}}$ (11)

Definition 7 measures the degree of similarity within the class D_q, and Definition 8 measures the degree of distinguishability within the class D_q. Since IntraS (R, D_q) represents the degree of similarity of D_q, the bigger the value, the higher the degree of similarity and the closer the samples in D_q. Conversely, 1 - IntraS (R, D_q) represents the degree of distinguishability of D_q: the larger the value, the higher the degree of dissimilarity and the looser the samples in D_q.

Example 1. The data are shown in Table 1. We have U = {x₁, x₂, ⋯ , x₁₀}, C = {a, b, c, e}, D₁ = {x₁, x₂, x₃}, D₂ = {x₄, x₅, x₆}, D₃ = {x₇, x₈, x₉, x₁₀}. Subdividing the samples of decision class D₁ according to each conditional attribute in C, we have [D₁/a] = {{x₁, x₃}, {x₂}}, [D₁/b] = {{x₁}, {x₂, x₃}}, [D₁/c] = {{x₁, x₂}, {x₃}}, [D₁/e] = {{x₁, x₂, x₃}}, so that $IntraRr (C, D_{1}) = \frac{\frac{1}{| C |} \sum_{a \in C} D_{1}^{a}}{C_{| D_{1} |}^{2}} = \frac{(1 + 1 + 1 + 3) / 4}{3} = \frac{1}{2}$. If the attribute b is removed from C, we have IntraRr ({a, c, e}, D₁) = 5/9 and IntraDr ({a, c, e}, D₁) = 1 - 5/9 = 4/9. Because the value of IntraRr ({a, c, e}, D₁) is bigger than IntraRr (C, D₁), the samples in division D₁ have a higher degree of intra-class tightness on attribute set {a, c, e} than on attribute set C.

Table 1

A decision table

U	a	b	c	e	D
x₁	1	0	1	1	1
x₂	0	1	1	1	1
x₃	1	1	0	1	1
x₄	1	1	0	0	2
x₅	0	1	1	1	2
x₆	0	0	1	1	2
x₇	1	0	1	0	3
x₈	0	0	0	1	3
x₉	1	1	1	1	3
x₁₀	0	1	0	1	3
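
The values in Example 1 can be recomputed from the rows of Table 1 in two ways: by pairwise comparison (Definition 5) and by counting same-valued pairs per attribute (Definition 7). The sketch below is one possible rendering under the assumption that each sample is stored as a dictionary; the function names `intra_s` and `intra_rr` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb

# Rows x1, x2, x3 of Table 1 (decision class D1), conditional attributes only.
D1 = [
    {"a": 1, "b": 0, "c": 1, "e": 1},
    {"a": 0, "b": 1, "c": 1, "e": 1},
    {"a": 1, "b": 1, "c": 0, "e": 1},
]

def intra_s(samples, attrs):
    """Definition 5: average pairwise agreement, O(|D_q|^2 |R|)."""
    total = Fraction(0)
    for x, y in combinations(samples, 2):
        total += Fraction(sum(x[a] == y[a] for a in attrs), len(attrs))
    return total / comb(len(samples), 2)

def intra_rr(samples, attrs):
    """Definition 7: count same-valued pairs per attribute, O(|D_q| |R|)."""
    pairs = sum(
        comb(n, 2)
        for a in attrs
        for n in Counter(s[a] for s in samples).values()
    )
    return Fraction(pairs, len(attrs) * comb(len(samples), 2))

full = intra_rr(D1, ["a", "b", "c", "e"])  # attribute set C
reduced = intra_rr(D1, ["a", "c", "e"])    # attribute b removed
```

Both functions return the same fraction on any attribute subset, while `intra_rr` never enumerates sample pairs, which is the cost advantage claimed for Definition 7.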

Definition 9. (Intra-class distance of the decision class) Given a decision information system IS = (U, C ∪ D), where TD = {D₁, D₂, ⋯ , D_m} is a partition of U, the intra-class combination distance of TD with respect to the subset of attributes R ⊆ C is defined as: $IntraDis (R, TD) = \begin{cases} \frac{1}{m} \sum_{i = 1}^{m} IntraRr (R, D_{i}), & \text{if the resemblance rate is calculated} \\ \frac{1}{m} \sum_{i = 1}^{m} IntraDr (R, D_{i}), & \text{if the distinctive rate is calculated} \end{cases}$ (12)
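
As a worked instance of formula (12) on Table 1 with R = C, the per-class resemblance rates can be evaluated by hand. The values for D₂ and D₃ are computed here from Definition 7 in the same way as for D₁ and are not stated in the original text, so they should be read as an illustration:

$IntraRr (C, D_{1}) = \frac{(1 + 1 + 1 + 3) / 4}{C_{3}^{2}} = \frac{1}{2}$, $IntraRr (C, D_{2}) = \frac{(1 + 1 + 1 + 1) / 4}{C_{3}^{2}} = \frac{1}{3}$, $IntraRr (C, D_{3}) = \frac{(2 + 2 + 2 + 3) / 4}{C_{4}^{2}} = \frac{3}{8}$.

Under the resemblance rate this gives $IntraDis (C, TD) = \frac{1}{3} (\frac{1}{2} + \frac{1}{3} + \frac{3}{8}) = \frac{29}{72} \approx 0.4028$, and under the distinctive rate $IntraDis (C, TD) = 1 - \frac{29}{72} = \frac{43}{72} \approx 0.5972$.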

3.2 The Inter-class combination distance

The inter-class distance measures the distance between samples of different classes. We merge the samples of two different classes into one data set and then divide them according to the conditional attributes, so that samples with the same value on the conditional attributes are collected in the same sub-division. The number of combinations is formed by comparing, pair by pair, the samples of different classes in the same sub-division. The smaller this value, the lower the degree of similarity.

Definition 10. Given a decision information system IS = (U, C ∪ D), for decision classes D_q, D_p ∈ D and R ⊆ C, let D_q/R = {X₁, X₂, ⋯ , X_m} and D_p/R = {Y₁, Y₂, ⋯ , Y_m′}. Assume that D_q/R and D_p/R have k sub-divisions with the same value on the attribute set R, indexed so that f (x, R) = f (y, R) for x ∈ X_i, y ∈ Y_i, 1 ⩽ i ⩽ k, and let $X_{i}^{'} = X_{i} \cup Y_{i}$. Then the inter-class resemblance rate of combinations is defined as follows: $InterRr (D_{R}^{q}, D_{R}^{p}) = \frac{\sum_{i = 1}^{k} (C_{| X_{i} | + | Y_{i} |}^{2} - C_{| X_{i} |}^{2} - C_{| Y_{i} |}^{2})}{C_{| D_{p} | + | D_{q} |}^{2} - C_{| D_{p} |}^{2} - C_{| D_{q} |}^{2}}$ (13)
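
Formula (13) can be simplified with an elementary identity, stated here as a supporting step. For non-negative integers x and y,

$C_{x + y}^{2} - C_{x}^{2} - C_{y}^{2} = \frac{(x + y) (x + y - 1) - x (x - 1) - y (y - 1)}{2} = \frac{2 x y}{2} = x y$,

which is the number of pairs formed by one sample from each group. Applying it to the numerator and the denominator of formula (13) gives the equivalent form

$InterRr (D_{R}^{q}, D_{R}^{p}) = \frac{\sum_{i = 1}^{k} | X_{i} | | Y_{i} |}{| D_{p} | | D_{q} |}$,

so the rate is the share of cross-class sample pairs that agree on every attribute of R.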

Property 2. Given a decision information system IS = (U, C ∪ D), for decision class D_q ∈ D, D_p ∈ D, A ⊆ B ⊆ C, we have $InterRr (D_{B}^{q}, D_{B}^{p}) ⩽ InterRr (D_{A}^{q}, D_{A}^{p})$ .

Proof: Let X_i ∈ D_q/A and Y_i ∈ D_p/A with f (X_i, A) = f (Y_i, A). Since A ⊆ B, each sub-division in D_q/B is a refinement of a certain sub-division in D_q/A, and likewise for D_p. Suppose X_i1, X_i2 ∈ D_q/B with X_i1 ∪ X_i2 = X_i, and Y_i1, Y_i2 ∈ D_p/B with Y_i1 ∪ Y_i2 = Y_i; then |X_i| = |X_i1| + |X_i2| and |Y_i| = |Y_i1| + |Y_i2|. Assume f (X_i1, B) = f (Y_i1, B) and f (X_i2, B) = f (Y_i2, B). Since $C_{x + y}^{2} - C_{x}^{2} - C_{y}^{2} = x y$, the denominator of formula (13) equals |D_p||D_q|, and according to Definition 10 the contribution of these sub-divisions is $InterRr (D_{B}^{q}, D_{B}^{p}) = \frac{(C_{| X_{i 1} | + | Y_{i 1} |}^{2} - C_{| X_{i 1} |}^{2} - C_{| Y_{i 1} |}^{2}) + (C_{| X_{i 2} | + | Y_{i 2} |}^{2} - C_{| X_{i 2} |}^{2} - C_{| Y_{i 2} |}^{2})}{| D_{p} | | D_{q} |} = \frac{| X_{i 1} | | Y_{i 1} | + | X_{i 2} | | Y_{i 2} |}{| D_{p} | | D_{q} |}$

and $InterRr (D_{A}^{q}, D_{A}^{p}) = \frac{C_{| X_{i} | + | Y_{i} |}^{2} - C_{| X_{i} |}^{2} - C_{| Y_{i} |}^{2}}{| D_{p} | | D_{q} |} = \frac{| X_{i} | | Y_{i} |}{| D_{p} | | D_{q} |} .$ Hence $InterRr (D_{A}^{q}, D_{A}^{p}) - InterRr (D_{B}^{q}, D_{B}^{p}) = \frac{| X_{i} | | Y_{i} | - | X_{i 1} | | Y_{i 1} | - | X_{i 2} | | Y_{i 2} |}{| D_{p} | | D_{q} |} .$

Since |X_i| = |X_i1| + |X_i2| and |Y_i| = |Y_i1| + |Y_i2|, we have $InterRr (D_{A}^{q}, D_{A}^{p}) - InterRr (D_{B}^{q}, D_{B}^{p}) = \frac{| X_{i 1} | | Y_{i 2} | + | X_{i 2} | | Y_{i 1} |}{| D_{p} | | D_{q} |} ⩾ 0$. □

Definition 11. Let IS = (U, C ∪ D), for decision class D_q, D_p ∈ D, its inter-class dissimilarity rate of combinations under the attribute set R ⊆ C is defined as follows: $InterDr (D_{R}^{q}, D_{R}^{p}) = 1 - InterRr (D_{R}^{q}, D_{R}^{p})$ (14)

Formula (13) indicates the degree of inter-class similarity, and formula (14) represents the degree of distinguishability. The bigger the value of $InterDr (D_{R}^{q}, D_{R}^{p})$, the farther apart the classes D_q and D_p, and the stronger the samples' distinguishing ability.

Definition 12. (Inter-class distance within the whole class) Given a decision information system IS = (U, C ∪ D), where TD = {D₁, D₂, ⋯ , D_m} is a partition of U, the inter-class distance of combinations of TD with respect to the subset of attributes R ⊆ C is defined as follows. If the resemblance rate is calculated: $SumInterRr (R, TD) = \frac{2 \sum_{i = 1}^{m - 1} \sum_{j = i + 1}^{m} InterRr (D_{R}^{i}, D_{R}^{j})}{m (m - 1)}$; if the distinctive rate is calculated: $SumInterDr (R, TD) = \frac{2 \sum_{i = 1}^{m - 1} \sum_{j = i + 1}^{m} InterDr (D_{R}^{i}, D_{R}^{j})}{m (m - 1)}$ (15)

Example 2. The data set is shown in Table 1. Let R = {b, c, e}; we have $InterRr (D_{R}^{1}, D_{R}^{2}) = \frac{2}{9}$, $InterDr (D_{R}^{1}, D_{R}^{2}) = \frac{7}{9}$, $InterRr (D_{R}^{1}, D_{R}^{3}) = \frac{1}{6}$, $InterDr (D_{R}^{1}, D_{R}^{3}) = 1 - \frac{1}{6} = \frac{5}{6}$, $InterRr (D_{R}^{2}, D_{R}^{3}) = \frac{1}{12}$, $InterDr (D_{R}^{2}, D_{R}^{3}) = \frac{11}{12}$. By calculating the resemblance rate, we have $SumInterRr (R, TD) = 2 [InterRr (D_{R}^{1}, D_{R}^{2}) + InterRr (D_{R}^{1}, D_{R}^{3}) + InterRr (D_{R}^{2}, D_{R}^{3})] / (3 \cdot 2) = \frac{17}{108}$; for the distinctive rate, we have $SumInterDr (R, TD) = 2 [InterDr (D_{R}^{1}, D_{R}^{2}) + InterDr (D_{R}^{1}, D_{R}^{3}) + InterDr (D_{R}^{2}, D_{R}^{3})] / (3 \cdot 2) = \frac{91}{108} .$
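
The figures in Example 2 can be checked with a short script. This is a sketch under two assumptions: Table 1 is encoded as tuples (a, b, c, e, D), and the numerator of formula (13) is evaluated as the sum of |X_i||Y_i|, which follows from $C_{x + y}^{2} - C_{x}^{2} - C_{y}^{2} = x y$. The names `signatures` and `inter_rr` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Table 1: (a, b, c, e, D) for x1..x10.
ROWS = [
    (1, 0, 1, 1, 1), (0, 1, 1, 1, 1), (1, 1, 0, 1, 1),
    (1, 1, 0, 0, 2), (0, 1, 1, 1, 2), (0, 0, 1, 1, 2),
    (1, 0, 1, 0, 3), (0, 0, 0, 1, 3), (1, 1, 1, 1, 3), (0, 1, 0, 1, 3),
]
COL = {"a": 0, "b": 1, "c": 2, "e": 3}

def signatures(label, attrs):
    """Count the samples of one decision class per value tuple on attrs."""
    return Counter(
        tuple(row[COL[a]] for a in attrs) for row in ROWS if row[4] == label
    )

def inter_rr(p, q, attrs):
    """Formula (13): cross-class pairs agreeing on attrs over all cross-class pairs."""
    sp, sq = signatures(p, attrs), signatures(q, attrs)
    matching = sum(n * sq[key] for key, n in sp.items())  # |X_i| * |Y_i|
    return Fraction(matching, sum(sp.values()) * sum(sq.values()))

R = ["b", "c", "e"]
pairwise = {(p, q): inter_rr(p, q, R) for p, q in combinations([1, 2, 3], 2)}
sum_inter_rr = sum(pairwise.values()) / len(pairwise)  # 2 * sum / (m (m - 1)), m = 3
sum_inter_dr = 1 - sum_inter_rr
```

Because InterDr = 1 - InterRr for every class pair, the averaged distinctive rate is simply one minus the averaged resemblance rate.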

3.3 Multiple combination distance calculating methods

Among the various clustering methods, some only consider the closeness between samples in the same cluster [45], and some only pay attention to the looseness between samples of different clusters [44]. This article considers both the inter-class distance and the intra-class distance to make the clustering more effective. The distance between clustering samples can be measured by similarity or by dissimilarity, and the samples compared can be intra-class or inter-class samples. Combining the choice of measure with the choice of comparison objects gives the following four distance calculation methods:

Case 1: Intra-class resemblance rate and inter-class resemblance rate. A desirable clustering has a high intra-class resemblance rate and a low inter-class resemblance rate; that is, the resemblance among samples of the same class should be large, and the resemblance among samples of different classes should be small. Generally speaking, the similarity of samples within a class is greater than that between classes. For the same λ₁, the bigger IntraRr (R, TD) and the smaller InterRr (R, TD), the better the clustering effect. So, we use the formula SIM_R1 = IntraDis (R, TD) - λ₁ * InterDis (R, TD) to measure the importance of the attribute set R: the bigger the value of SIM_R1, the more important R. This formula is marked as SIM_R1 = IntraR_InterR.

Case 2: Intra-class dissimilarity rate and inter-class dissimilarity rate. Another desirable clustering has weak intra-class dissimilarity and strong inter-class dissimilarity; that is, the dissimilarity among samples of the same class should be small, and that among samples of different classes should be large. Generally speaking, the dissimilarity of samples between classes is greater than that within a class. For the same λ₂, the bigger InterDr (R, TD) and the smaller IntraDr (R, TD), the better the clustering effect. The formula SIM_R2 = InterDis (R, TD) - λ₂ * IntraDis (R, TD) is used to measure the importance of R: the bigger SIM_R2, the more important R. As in the other cases, this formula is marked as SIM_R2 = IntraD_InterD.

Case 3: Intra-class resemblance rate and inter-class dissimilarity rate. A third desirable clustering has strong intra-class similarity and strong inter-class dissimilarity. The larger IntraRr (R, TD) and InterDr (R, TD), the better the clustering effect. We use the formula SIM_R3 = IntraDis (R, TD) + λ₃ * InterDis (R, TD) to measure the importance of R: the bigger SIM_R3, the more important R. This formula is marked as SIM_R3 = IntraR_InterD.

Case 4: Intra-class dissimilarity rate and inter-class resemblance rate. The last desirable clustering has weak intra-class dissimilarity and weak inter-class similarity. The smaller IntraDr (R, TD) and InterRr (R, TD), the better the clustering effect. So the formula SIM_R4 = IntraDis (R, TD) + λ₄ * InterDis (R, TD) is used to measure the importance of R: the smaller SIM_R4, the more important R. This formula is marked as SIM_R4 = IntraD_InterR.

For the above four cases, IntraDr or IntraRr is selected accordingly when calculating IntraDis to obtain the intra-class sample distance; similarly, InterDr or InterRr is selected when calculating InterDis to obtain the inter-class sample distance. The coefficient λ_i ∈ [0, +∞) can be adjusted according to the relative importance placed on the inter-class or intra-class distance.

Definition 13. Given a decision information system IS = (U, C ∪ D), a conditional attribute subset R ⊆ C is a resemblance-combination attribute reduction if and only if: $R = \begin{cases} \arg\max_{R \subseteq C} \{{SIM}_{R}\}, & \text{if } {SIM}_{R} = IntraR\_InterR \text{ or } {SIM}_{R} = IntraD\_InterD \text{ or } {SIM}_{R} = IntraR\_InterD \\ \arg\min_{R \subseteq C} \{{SIM}_{R}\}, & \text{if } {SIM}_{R} = IntraD\_InterR \end{cases}$ (16)

and $\begin{cases} \forall R^{'} \subset R, {SIM}_{R^{'}} < {SIM}_{R}, & \text{if } {SIM}_{R} = IntraR\_InterR \text{ or } {SIM}_{R} = IntraD\_InterD \text{ or } {SIM}_{R} = IntraR\_InterD \\ \forall R^{'} \subset R, {SIM}_{R^{'}} > {SIM}_{R}, & \text{if } {SIM}_{R} = IntraD\_InterR \end{cases}$ (17)

In Definition 13, the first condition guarantees that the reduced set R is the most significant one and yields the most ideal clustering effect. The second condition guarantees that the attribute set R is the smallest such reduction set.

Example 3. In Table 1, let SIM_R = IntraR_InterR, for which the larger the value of SIM_R, the better the clustering effect. Let λ = 0.085; we have SIM_C = 0.3996, and the maximal value SIM_{c,e} = 0.4156 is attained on the attribute set {c, e}, thus the attribute reduction is {c, e}. Let λ = 0.1; we have SIM_C = 0.3991, and the maximal value is SIM_{a,c,e} = 0.4111, thus the attribute reduction is {a, c, e}. Given SIM_R = IntraD_InterR, the smaller the value of SIM_R, the better the clustering effect. Let λ = 0.085; we have SIM_C = 0.6003, and the minimal value SIM_{c,e} = 0.5844 is attained on the attribute set {c, e}, thus the attribute reduction is {c, e}. Let λ = 0.1; we have SIM_C = 0.6009, and the minimal value is SIM_{a,c,e} = 0.5889, thus the attribute reduction is {a, c, e}. The other two cases can be deduced by analogy.
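
The SIM_C values of Example 3 for λ = 0.085 can be recomputed from Table 1 as sketched below. The encoding of the table and the helper names are assumptions, the inter-class rate is evaluated in the equivalent form Σ|X_i||Y_i| / (|D_p||D_q|), and only the SIM_C values are recomputed; the values reported for the reduced subsets are not checked here.

```python
from collections import Counter
from itertools import combinations
from math import comb

# Table 1: (a, b, c, e, D) for x1..x10.
ROWS = [
    (1, 0, 1, 1, 1), (0, 1, 1, 1, 1), (1, 1, 0, 1, 1),
    (1, 1, 0, 0, 2), (0, 1, 1, 1, 2), (0, 0, 1, 1, 2),
    (1, 0, 1, 0, 3), (0, 0, 0, 1, 3), (1, 1, 1, 1, 3), (0, 1, 0, 1, 3),
]
COL = {"a": 0, "b": 1, "c": 2, "e": 3}
LABELS = sorted({row[4] for row in ROWS})

def column(label, attr):
    """Values of one attribute over the samples of one decision class."""
    return [row[COL[attr]] for row in ROWS if row[4] == label]

def intra_rr(attrs):
    """IntraDis with the resemblance rate: mean of formula (10) over classes."""
    total = 0.0
    for label in LABELS:
        size = len(column(label, attrs[0]))
        pairs = sum(comb(n, 2) for a in attrs for n in Counter(column(label, a)).values())
        total += pairs / (len(attrs) * comb(size, 2))
    return total / len(LABELS)

def inter_rr(attrs):
    """InterDis with the resemblance rate: mean of formula (13) over class pairs."""
    sig = {
        label: Counter(tuple(r[COL[a]] for a in attrs) for r in ROWS if r[4] == label)
        for label in LABELS
    }
    rates = []
    for p, q in combinations(LABELS, 2):
        matching = sum(n * sig[q][key] for key, n in sig[p].items())
        rates.append(matching / (sum(sig[p].values()) * sum(sig[q].values())))
    return sum(rates) / len(rates)

C = ["a", "b", "c", "e"]
lam = 0.085
sim_case1 = intra_rr(C) - lam * inter_rr(C)        # IntraR_InterR
sim_case4 = (1 - intra_rr(C)) + lam * inter_rr(C)  # IntraD_InterR
```

Under these definitions the two scores sum to one for any attribute subset, which is consistent with the paired values reported in the example (0.3996 and 0.6003, 0.4156 and 0.5844).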

3.4 The attribute reduction algorithm

By using the combination distance to measure the degree of resemblance or dissimilarity of intra-class samples and inter-class samples, Definition 13 provides an ideal criterion for finding the optimal attribute reduction set. Although the inter-class sample distance is monotonic, the intra-class sample distance is not. Therefore, in order to get the optimal reduction, every conditional attribute must be examined.

When classifying and analyzing data, effective removal of redundant attributes helps improve the accuracy of data classification. In this paper, we therefore adopt a deletion strategy to remove meaningless attributes. First, all divisions of U/D are calculated. Second, a method for calculating the importance of attributes is chosen, and the amount of information of each attribute is computed according to SIM. For case 1, case 2 and case 3 in Section 3.3, the larger the value of SIM, the better the clustering effect. For r ∈ R, the larger the value of SIM_{R-r} - SIM_R, the higher the classification accuracy of the data set after deleting r, and hence the greater the interference of r. Let SIM_{R-a} = max_{r ∈ R} SIM_{R-r}. Third, if SIM_{R-a} > SIM_R, attribute a is deleted, R = R - a, and the previous steps are repeated; otherwise, the algorithm terminates. For case 4, the smaller the value of SIM, the better the clustering effect, so the smallest value is chosen each time: let SIM_{R-a} = min_{r ∈ R} SIM_{R-r}; as long as SIM_{R-a} < SIM_R, attribute a is deleted, R = R - a, and the loop continues; otherwise, the algorithm terminates.

For any attribute a ∈ C, its inner amount of information is defined as follows: $Information (a, C) = \begin{cases} {SIM}_{C - a} - {SIM}_{C}, & \text{case 1, 2, 3} \\ {SIM}_{C} - {SIM}_{C - a}, & \text{case 4} \end{cases}$ (18)

In cases 1, 2 and 3, the larger the value of Information (a, C), the less significant the attribute a. In case 4, the smaller the value of Information (a, C), the more significant the attribute a. The algorithm is described in Algorithm 1 (RRCB for short).

In RRCB, according to Definition 9, the time complexity of computing IntraDis (R, TD) is O (|U||C|). According to Definition 12, the time complexity of computing InterDis (R, TD) is O (2|U||C|). So, the time complexity of Step 2 is O (|U||C|) + O (2|U||C|).

In Step 4, the importance of each conditional attribute is calculated according to formula (18), and the complexity of computing SIM_{R-a} is O (|U||R| (|R| - 1)). If the final reduction set is R, Step 4 loops |C| - |R| times. Therefore, the time complexity of Step 4 is O (|U| (|C| (|C| + 1) (2|C| + 1) - |R| (|R| + 1) (2|R| + 1)) / 6) ≈ O (|U| (|C|³ - |R|³) / 6). In summary, the time complexity of RRCB is O (|U| (|C|³ - |R|³) / 6) + O (|U||C|) + O (2|U||C|) ≈ O (|U| (|C|³ - |R|³) / 6).

Algorithm 1 A resemblance rate approach based on clustering background (RRCB)
Input: A decision information system IS = (U, C ∪ D), coefficient λ and the method type
Output: An attribute reduction set R
Step 1. Calculate the partition U/D, C → R
Step 2. Calculate the amount of information SIM_C of C based on type
    switch(type) {
        case 1: SIM_C = IntraRr - λ₁InterRr;
        case 2: SIM_C = InterDr - λ₂IntraDr;
        case 3: SIM_C = IntraRr + λ₃InterDr;
        case 4: SIM_C = InterRr + λ₄IntraDr; }
Step 3. SIM_C → SIM_R
Step 4. While (1) do
    Step 4.1 temp = SIM_R
    Step 4.2 if type in {1, 2, 3} then
        for each a in R:
            calculate the non-importance of attribute a by Equation 18
        select the attribute a with the max SIM_R-a
        if SIM_R-a > temp then
            remove(a)
            R = R - a
            temp = SIM_R
        else
            return R
    else
        for each a in R:
            calculate the non-importance of attribute a by Equation 18
        select the attribute a with the min SIM_R-a
        if SIM_R-a < temp then
            remove(a)
            R = R - a
            temp = SIM_R
        else
            return R
end while
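The greedy deletion loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: `sim` is a hypothetical callable standing in for whichever of the four SIM definitions is chosen, `toy_sim` is an invented scoring function used only to exercise the loop, and stopping when a single attribute remains is an added safeguard that the algorithm text does not state.

```python
from typing import Callable, FrozenSet, Iterable


def rrcb_reduce(
    attributes: Iterable[str],
    sim: Callable[[FrozenSet[str]], float],
    larger_is_better: bool = True,
) -> FrozenSet[str]:
    """Greedy backward elimination in the spirit of Algorithm 1.

    sim: hypothetical function mapping an attribute subset to its SIM value.
    larger_is_better: True for cases 1-3, False for case 4.
    """
    reduct = frozenset(attributes)
    current = sim(reduct)
    sign = 1.0 if larger_is_better else -1.0  # lets one loop serve both branches

    while len(reduct) > 1:  # assumption: never delete the last attribute
        best_attr, best_val = None, None
        for a in sorted(reduct):  # fixed order makes ties deterministic
            val = sim(reduct - {a})
            if best_val is None or sign * val > sign * best_val:
                best_attr, best_val = a, val
        # Delete only when SIM strictly improves; otherwise stop.
        if sign * best_val > sign * current:
            reduct = reduct - {best_attr}
            current = best_val
        else:
            break
    return reduct


def toy_sim(attrs: FrozenSet[str]) -> float:
    """Invented score: rewards attributes 'a' and 'b', penalizes the rest."""
    useful = {"a", "b"}
    return len(useful & attrs) - 0.1 * len(attrs - useful)


print(sorted(rrcb_reduce({"a", "b", "c", "d"}, toy_sim)))
```

Multiplying by `sign` folds the "max" branch (cases 1 to 3) and the "min" branch (case 4) into a single comparison, which keeps the sketch short without changing the selection rule.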

4 Experiments analysis

To verify the effectiveness of the reduction algorithm RRCB against four other algorithms, we selected 21 data sets from the UCI repository [44]. The basic information of the data sets used for comparison is shown in Table 2, where |U| is the number of samples, |C| is the number of conditional attributes, and Class is the number of decision classes.

Table 2
UCI Machine Learning data sets

ID	Data set	|U|	|C|	Class
1	Zoo	101	16	7
2	Tic-tac-toe	958	9	2
3	Heart-statlog	270	13	2
4	Promoters	106	57	2
5	Hepatitis	155	19	2
6	Cancer	699	9	2
7	Heart-c	296	13	5
8	Handwritten	5620	64	10
9	Kr-vs-kp	3196	36	2
10	Audiology	226	70	2
11	Spectf	349	45	2
12	Spect heart	267	22	2
13	Splice	3190	61	3
14	Letters	20000	17	26
15	Vote	435	16	2
16	Mushroom	8124	22	2
17	Krkopt	28056	6	18
18	Shuttle	43500	9	7
19	Dermatology	366	33	6
20	Satimage	6435	36	6
21	Qsar	1055	42	2

4.1 Reduction result analysis

In the experiment, each data set is evaluated by 10-fold cross-validation, and the accuracy is recorded in the form "mean + standard deviation". The four other reduction algorithms are PRA, DMA, EA and KGA [48].

PRA denotes the positive region algorithm, DMA the discernibility matrix algorithm, EA the general entropy-based feature selection algorithm, and KGA the knowledge granularity algorithm. RRCB is the reduction algorithm proposed in this paper, based on the resemblance rate of combination. The setting of parameter λ varies with the method selected in the RRCB algorithm; the parameter settings are analyzed in more detail in Section 4.3. Although RRCB offers four different methods, their reduction results are highly similar, so we use inter-class resemblance and intra-class dissimilarity as the representative method to describe the reduction process. To evaluate classification on the reduced data, four common classifiers in WEKA are used: NaiveBayes (NB), SMO, J48 and RandomForest (RF). All continuous data are discretized and filtered with the WEKA tool using unsupervised methods, and the tool's default parameter settings are used otherwise.
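The "mean + standard deviation" reporting over 10 folds can be illustrated with a small standard-library sketch. The experiments themselves rely on WEKA; the code below is only an assumption-laden stand-in. `evaluate` is a hypothetical callable that trains on one index list and returns the accuracy on the other, the fold assignment by striding is one of several reasonable choices, and the population standard deviation is assumed because the text does not say which variant is reported.

```python
import random
import statistics
from typing import Callable, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle range(n_samples) and split it into k near-equal test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(
    n_samples: int,
    evaluate: Callable[[List[int], List[int]], float],
    k: int = 10,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (mean, std) of evaluate(train_idx, test_idx) over k folds."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Training set = every fold except the i-th one.
        train_idx = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        scores.append(evaluate(train_idx, test_idx))
    # Assumption: population standard deviation of the fold scores.
    return statistics.mean(scores), statistics.pstdev(scores)


# Usage with a dummy scorer (no classifier involved): for the 101 samples of
# the Zoo data set, every sample lands in exactly one test fold.
folds = k_fold_indices(101)
print(len(folds), sorted(len(f) for f in folds))
```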

The data sets in Table 2 are reduced with the different algorithms, and the reduction results are shown in Table 3. The accuracy comparisons are shown in Tables 4–7. Data in bold indicate that the corresponding method achieved the best result, and data in italics with a shaded background indicate the worst result. The Best and Worst rows count how many times each method achieved the best and worst classification accuracy, running time and reduction set size. In terms of reduction set size, the results of RRCB in Table 3 are not outstanding, with 10 Best and 7 Worst results. DMA, PRA, EA and KGA achieve 11, 5, 10 and 11 best results respectively, with 6, 11, 5 and 6 worst results. DMA and KGA therefore obtain the smallest reduction sets most often. RRCB does not perform as well in reduction set size, but it obtains competitive results in computing time and classification accuracy.

Table 3
Number of selected attributes with different attribute reduction algorithms

Data ID	RRCB		DMA		PRA		EA		KGA
	RS	Time/S	RS	Time/S	RS	Time/S	RS	Time/S	RS	Time/S
1	7	0.020	5	0.128	7	0.109	5	0.082	5	0.024
2	8	0.016	7	1.725	7	1.628	7	0.067	7	0.575
3	4	0.029	3	0.602	3	0.552	3	0.114	3	0.139
4	5	0.960	4	1.786	6	1.587	4	3.762	4	0.094
5	9	0.052	8	0.424	9	0.401	8	0.212	8	0.067
6	5	0.005	4	0.381	5	0.315	6	0.021	6	0.127
7	8	0.032	3	0.729	3	0.724	4	0.151	4	0.167
8	12	89.696	7	10956.619	22	8656.619	7	360.812	7	452.654
9	5	5.719	8	607.732	8	507.732	8	34.315	6	42.311
10	29	3.801	29	12.273	33	10.283	29	15.209	28	0.526
11	4	1.559	3	12.090	3	10.041	3	7.353	3	0.806
12	18	0.140	16	1.694	18	1.523	17	0.637	18	0.231
13	9	33.428	10	1784.118	10	1639.217	11	133.232	9	85.973
14	9	3.812	10	4831.528	12	2131.326	11	10.871	10	376.152
15	10	0.052	11	1.424	12	1.206	10	0.214	10	0.267
16	4	4.377	5	1202.723	5	1002.752	5	16.292	5	136.735
17	6	6.112	6	2522.502	6	1852.518	6	24.448	6	261.251
18	4	8.385	5	3862.466	5	3152.482	4	33.244	5	620.822
19	11	0.643	13	7.128	12	6.608	13	2.552	13	0.648
20	10	23.398	13	2582.408	12	2381.705	12	92.031	13	348.534
21	31	3.836	35	96.362	34	86.184	33	15.052	35	6.883
Best	10	19	11	0	5	0	10	0	11	2
Worst	7	0	6	19	11	0	5	2	6	0

Table 4

Comparison of the classification accuracies on reduced data sets with NB(%)

ID	RRCB	DMA	PRA	EA	KGA
1	92.0792 + 0.0001	84.1548 + 0.2010	85.2500 + 0.0301	84.1548 + 0.0210	84.1548 + 0.0102
2	72.4426 + 0.0011	71.6075 + 0.0004	71.2944 + 0.0010	72.4426 + 0.0021	71.6075 + 0.0032
3	83.3333 + 0.0106	59.6296 + 0.0090	59.5296 + 0.0041	71.8519 + 0.0240	59.6296 + 0.0001
4	93.2380 + 0.0003	65.3962 + 0.0250	57.5472 + 0.0017	61.5094 + 0.0064	67.3962 + 0.0030
5	85.8065 + 0.0025	80.0001 + 0.0010	81.2903 + 0.0037	81.2903 + 0.0015	80.0012 + 0.0042
6	77.7542 + 0.0002	68.4521 + 0.0118	68.7502 + 0.2010	61.2036 + 0.0301	70.2891 + 0.0410
7	81.5182 + 0.0029	62.3762 + 0.0102	64.2891 + 0.0051	75.2475 + 0.0120	72.8934 + 0.4010
8	94.2478 + 0.0237	92.4779 + 0.0420	92.0354 + 0.0035	83.6238 + 0.0104	90.2347 + 0.0320
9	82.3097 + 0.2091	62.6868 + 0.1046	66.7972 + 0.0136	78.7772 + 0.2019	62.3123 + 0.0429
10	92.2090 + 0.0429	88.2666 + 0.0211	89.1538 + 0.0501	90.1752 + 0.4102	90.7071 + 0.0012
11	66.9228 + 0.0150	75.2809 + 0.1042	74.9064 + 0.0052	66.2921 + 0.0310	75.2809 + 0.0310
12	78.6517 + 0.0420	77.1536 + 0.0023	77.5281 + 0.0162	78.2772 + 0.0053	77.5281 + 0.0121
13	93.8871 + 0.1032	75.0157 + 0.0511	75.0157 + 0.0591	94.7002 + 0.0032	74.8589 + 0.0013
14	59.5201 + 0.1093	63.2051 + 0.0254	65.9400 + 0.2010	65.9400 + 0.0311	59.6350 + 0.1029
15	93.1034 + 0.2015	92.6437 + 0.0083	87.3563 + 0.0630	91.2664 + 0.0243	91.2664 + 0.0018
16	100.000 + 0.0000	99.8495 + 0.0001	99.0214 + 0.0012	98.0342 + 0.0052	99.9204 + 0.0020
17	58.9573 + 0.3026	36.2531 + 0.0291	38.9302 + 0.0943	38.0029 + 0.1945	40.3892 + 0.0018
18	99.9982 + 0.0001	91.0934 + 0.0281	90.9382 + 0.0395	92.0975 + 0.1039	93.0946 + 0.0383
19	92.3123 + 0.0945	90.0193 + 0.0184	90.3021 + 0.0284	89.3452 + 0.3851	90.8459 + 0.0283
20	84.2671 + 0.0493	75.0385 + 0.0029	74.0925 + 0.0298	74.0925 + 0.2846	76.9375 + 0.3784
21	81.9340 + 0.0892	73.0962 + 0.0346	73.9043 + 0.1981	75.0394 + 0.0940	76.9092 + 0.0287
Best	19	1	1	0	0
Worst	1	7	5	6	2

Table 5

Comparison of the classification accuracies on reduced data sets with SMO(%)

ID	RRCB	DMA	PRA	EA	KGA
1	90.5021 + 0.0352	84.2310 + 0.0032	85.2347 + 0.0012	85.2347 + 0.0023	85.3389 + 0.0485
2	78.7065 + 0.0129	76.7223 + 0.0148	77.4530 + 0.0450	78.7056 + 0.0302	76.3201 + 0.0820
3	84.0741 + 0.0013	57.0370 + 0.0032	57.0370 + 0.0028	73.7037 + 0.0502	57.0372 + 0.0538
4	90.3396 + 0.0321	65.8491 + 0.0253	56.6038 + 0.0041	69.6226 + 0.0204	65.8491 + 0.0893
5	85.8065 + 0.0024	80.0110 + 0.0042	81.9335 + 0.0023	82.5806 + 0.0352	80.0110 + 0.0438
6	76.8800 + 0.0502	64.5328 + 0.0015	65.3468 + 0.0015	65.6253 + 0.0392	66.3258 + 0.0284
7	80.8581 + 0.0200	60.0662 + 0.0530	61.2390 + 0.0042	75.9076 + 0.0482	75.9076 + 0.0536
8	94.7628 + 0.4013	92.9204 + 0.0084	94.0921 + 0.0294	88.0531 + 0.0530	93.3628 + 0.0352
9	87.3492 + 0.0039	67.4021 + 0.0402	74.0391 + 0.0200	85.6762 + 0.0245	66.4769 + 0.0842
10	95.1052 + 0.0052	96.0756 + 0.0439	96.0756 + 0.0495	96.4956 + 0.0538	96.1827 + 0.0324
11	79.4007 + 0.0017	78.4050 + 0.0041	77.4356 + 0.0014	78.4050 + 0.0258	79.4007 + 0.0839
12	87.5196 + 0.0424	86.1432 + 0.0420	86.8914 + 0.0405	85.9678 + 0.0253	85.7678 + 0.0143
13	93.8558 + 0.0004	75.8307 + 0.0023	75.8307 + 0.0302	92.1097 + 0.0634	74.5141 + 0.0632
14	80.7822 + 0.0039	71.0821 + 0.0428	72.3428 + 0.0052	79.0853 + 0.0258	71.8550 + 0.0568
15	95.6422 + 0.0139	94.9425 + 0.0019	92.4138 + 0.0130	94.2831 + 0.0184	94.2828 + 0.0242
16	100.000 + 0.0000	99.6475 + 0.0022	98.1224 + 0.0051	98.6382 + 0.0155	99.7244 + 0.0622
17	65.9873 + 0.2021	41.2831 + 0.0192	40.0392 + 0.0442	41.0329 + 0.1742	42.3592 + 0.0718
18	99.9968 + 0.0201	92.0536 + 0.0101	91.9285 + 0.0591	91.0572 + 0.0173	92.0740 + 0.0283
19	93.5121 + 0.0241	90.1105 + 0.1163	89.3124 + 0.0187	90.3754 + 0.2854	90.5452 + 0.0253
20	86.2773 + 0.0396	76.0311 + 0.0509	73.1395 + 0.0220	75.0225 + 0.1844	75.6372 + 0.1786
21	82.9541 + 0.0262	73.5922 + 0.0547	74.9803 + 0.1482	74.0792 + 0.0741	75.8093 + 0.0607
Best	20	0	0	1	0
Worst	1	7	7	2	4

Table 6

Comparison of the classification accuracies on reduced data sets with J48(%)

ID	RRCB	DMA	PRA	EA	KGA
1	91.0891 + 0.0428	85.1485 + 0.0828	85.1485 + 0.0183	85.1485 + 0.0834	85.1485 + 0.0183
2	84.5511 + 0.0329	80.2714 + 0.0283	80.0626 + 0.0842	83.1716 + 0.0713	80.2714 + 0.0294
3	81.4858 + 0.0283	57.5185 + 0.0823	58.1572 + 0.0084	75.1852 + 0.0364	59.0124 + 0.0174
4	83.0189 + 0.0183	58.9057 + 0.0183	58.4096 + 0.0289	67.9623 + 0.0475	69.9057 + 0.0742
5	83.8710 + 0.0213	79.3548 + 0.0382	83.8710 + 0.0774	83.8710 + 0.0174	79.3548 + 0.0811
6	71.8750 + 0.0284	70.5000 + 0.0183	62.5001 + 0.0284	66.2511 + 0.0278	71.2318 + 0.0023
7	77.8877 + 0.0934	63.6964 + 0.0188	63.6964 + 0.0735	75.9076 + 0.0917	75.9076 + 0.0028
8	94.6908 + 0.0480	92.9204 + 0.0833	94.6906 + 0.0294	83.6238 + 0.0183	93.8053 + 0.0253
9	72.5021 + 0.0843	65.3203 + 0.0473	71.1922 + 0.0138	85.0890 + 0.0375	64.9644 + 0.0834
10	99.5588 + 0.0284	98.6245 + 0.0283	98.6245 + 0.0244	96.7722 + 0.0825	99.5307 + 0.0038
11	76.0300 + 0.0832	79.1007 + 0.0189	78.6517 + 0.0134	79.7753 + 0.0721	79.4007 + 0.0183
12	89.2659 + 0.0428	86.8914 + 0.0384	88.0150 + 0.0243	86.1423 + 0.0134	87.1511 + 0.0230
13	93.5423 + 0.0183	82.1003 + 0.0813	82.1003 + 0.0485	85.6426 + 0.0184	81.8182 + 0.0284
14	89.2314 + 0.0384	86.3027 + 0.0875	85.2103 + 0.0385	88.2750 + 0.0574	86.7350 + 0.0834
15	95.6332 + 0.0028	95.4023 + 0.0289	88.5057 + 0.0628	94.7126 + 0.0279	94.7126 + 0.0201
16	99.8900 + 0.5631	98.6475 + 0.0012	98.1224 + 0.0153	97.6382 + 0.0252	98.0244 + 0.1620
17	64.9373 + 0.1025	38.1838 + 0.0142	36.0342 + 0.0112	36.0221 + 0.1446	36.3693 + 0.0417
18	98.9562 + 0.0501	91.1526 + 0.0502	90.5287 + 0.0291	90.0376 + 0.0137	91.0640 + 0.0233
19	94.4151 + 0.0238	89.2204 + 0.0537	88.1125 + 0.0132	88.3450 + 0.2256	90.2456 + 0.0227
20	87.2441 + 0.0375	74.0611 + 0.0802	74.1793 + 0.0304	73.0262 + 0.1084	75.2376 + 0.1382
21	83.9641 + 0.0281	75.5222 + 0.0642	75.9860 + 0.1284	73.0637 + 0.0245	74.5093 + 0.0702
Best	19	0	0	1	1
Worst	1	1	8	8	3

Table 7

Comparison of the classification accuracies on reduced data sets with RF(%)

ID	RRCB	DMA	PRA	EA	KGA
1	94.0594 + 0.0188	85.1845 + 0.0819	85.1845 + 0.0181	85.1845 + 0.0184	85.1845 + 0.0182
2	92.5850 + 0.0834	83.9012 + 0.0842	82.6722 + 0.0893	91.8580 + 0.0884	83.9120 + 0.0392
3	78.8889 + 0.0883	57.4074 + 0.0485	57.4074 + 0.0421	69.6296 + 0.0187	68.3021 + 0.0282
4	88.0021 + 0.0181	68.3962 + 0.2859	71.3208 + 0.0298	75.5660 + 0.0084	83.3962 + 0.0184
5	84.5161 + 0.0289	82.5806 + 0.0821	82.5806 + 0.0284	83.8710 + 0.0841	82.5806 + 0.0856
6	72.5001 + 0.0281	64.3750 + 0.0829	59.3752 + 0.0189	53.7539 + 0.0849	60.3258 + 0.0189
7	76.5677 + 0.0287	55.1155 + 0.0182	55.1155 + 0.0018	70.9571 + 0.0827	70.7923 + 0.0870
8	95.1327 + 0.0483	94.2478 + 0.0284	95.1327 + 0.0287	87.6106 + 0.0187	94.6903 + 0.0793
9	81.2360 + 0.0039	71.9751 + 0.0538	79.6975 + 0.0018	87.1601 + 0.0174	70.8897 + 0.0849
10	100.000 + 0.0000	100.000 + 0.0185	98.3476 + 0.0188	100.000 + 0.0000	99.9678 + 0.0174
11	81.8092 + 0.0185	76.0300 + 0.0198	78.2772 + 0.0742	79.0260 + 0.0171	76.0300 + 0.0874
12	93.2584 + 0.0857	92.8839 + 0.0838	92.5094 + 0.0173	91.7063 + 0.0864	91.3858 + 0.0271
13	99.9231 + 0.0184	99.9687 + 0.0008	97.4052 + 0.0184	99.9687 + 0.0004	99.9373 + 0.0001
14	94.7892 + 0.0495	92.3102 + 0.0184	90.4369 + 0.0850	93.2900 + 0.0742	92.8450 + 0.0183
15	95.8621 + 0.0281	95.6332 + 0.0891	89.1954 + 0.0882	95.4023 + 0.0811	95.4023 + 0.0742
16	100.000 + 0.0000	99.2276 + 0.40212	98.1114 + 0.0550	98.5322 + 0.11553	98.6243 + 0.0521
17	63.3773 + 0.5022	40.2435 + 0.0219	42.1393 + 0.01426	40.0623 + 0.1448	41.3698 + 0.0211
18	97.8961 + 0.0309	90.0238 + 0.0802	90.3285 + 0.02941	92.0871 + 0.0103	91.0692 + 0.0782
19	94.5212 + 0.0423	91.2105 + 0.2265	88.5124 + 0.0782	91.2757 + 0.1852	89.2455 + 0.2256
20	84.5114 + 0.0692	72.0216 + 0.0206	70.1395 + 0.0220	69.0337 + 0.1243	68.5373 + 0.1687
21	85.5543 + 0.0463	72.5421 + 0.0843	74.2804 + 0.1470	74.0396 + 0.0245	76.3092 + 0.0305
Best	20	0	0	1	0
Worst	0	5	11	3	2

Among the 21 data sets, RRCB achieves the best running time on 19. On the data sets Handwritten, Mushroom and Satimage, RRCB requires (89.696, 4.377, 23.398) seconds, whereas DMA, which has the poorest time performance, needs (10956.619, 1202.723, 2582.408) seconds respectively. The advantage is especially clear when reducing data sets with many samples and few attributes. For example, on the data sets Letters, Krkopt and Shuttle, RRCB takes (3.812, 6.112, 8.385) seconds, while DMA takes (4831.528, 2522.502, 3862.466) seconds respectively. The main reason is that RRCB reuses the previous partition results when calculating the similarity distance, whereas DMA must build a discernibility matrix, which consumes a large amount of memory and slows the computation.

Similarly, the RRCB algorithm also performs very well in the classification accuracy of the reduced sets. We use the four classifiers NB, SMO, J48 and RF in the WEKA tool to classify the reduction sets obtained by the five reduction algorithms. On the NB and J48 classifiers, 19 of the reduced sets produced by RRCB obtain the highest accuracy, and on the SMO and RF classifiers, 20 best results are obtained; in some cases the accuracy is higher by more than 30 percentage points. For example, on the NB classifier, the Promoters data set reduced by RRCB obtains an accuracy of 93.2380, whereas the reduced sets of the other four algorithms obtain (65.3962, 57.5472, 61.5094, 67.3962) respectively. Similarly, with the SMO classifier on the same data set, RRCB achieves the best accuracy of 90.3396, far better than the results (65.8491, 56.6038, 69.6226, 65.8491) obtained by DMA, PRA, EA and KGA. On the Heart-statlog data set, RRCB reaches an accuracy of 81.4858 with J48, while the other four algorithms obtain (57.5185, 58.1572, 75.1852, 59.0124). Tables 4–7 show that our method yields better classification results, and thus higher data quality, than the other methods.

4.2 Correlation analysis

In this section, we analyze the correlation between the original data set and the reduced data set. The standard Euclidean distance is used to measure the similarity between samples, and the commonly used VDM [31] method is used to calculate the distance for non-numeric attributes: $VDM_{a} (x, y) = \sum_{i = 1}^{|U / D|} \left| \frac{m_{a, i} (x)}{m_{a} (x)} - \frac{m_{a, i} (y)}{m_{a} (y)} \right|^{2}$ (19)

where |U/D| is the number of classes of the sample set on the decision attribute, m_a (x) is the number of samples with value x on attribute a, and m_a,i (x) is the number of samples with value x on attribute a that belong to the i-th class of the decision attribute. The Euclidean distance between the objects x and y is then calculated as follows: $dis (x, y) = \sqrt{\sum_{i = 1}^{|C|} VDM_{C_{i}} (x, y)}$ (20)

Here |C| represents the number of conditional attributes and C_i represents the i-th conditional attribute. The calculation results are standardized with the function Sig, defined as follows: $Sig (x, y) = \frac{1}{1 + e^{- dis (x, y)}}$ (21)
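Equations (19) to (21) can be sketched in a few lines of Python. This is an illustrative reading of the formulas, not the authors' code: the helper names (`vdm_tables`, `vdm`, `distance`, `sig`) and the four-row toy data set are invented for the example, and every attribute value being compared is assumed to occur at least once in the data, so that m_a (x) is never zero.

```python
import math
from collections import Counter, defaultdict


def vdm_tables(column, labels):
    """Count m_a(v) and m_{a,i}(v) for one attribute column."""
    total = Counter(column)
    per_class = defaultdict(Counter)
    for value, cls in zip(column, labels):
        per_class[value][cls] += 1
    return total, per_class


def vdm(u, v, total, per_class, classes):
    """Equation (19): squared gaps between class-conditional frequencies."""
    return sum(
        (per_class[u][c] / total[u] - per_class[v][c] / total[v]) ** 2
        for c in classes
    )


def distance(x, y, tables, classes):
    """Equation (20): square root of the per-attribute VDM terms summed."""
    return math.sqrt(sum(
        vdm(xa, ya, total, per_class, classes)
        for xa, ya, (total, per_class) in zip(x, y, tables)
    ))


def sig(d):
    """Equation (21): logistic standardization of a distance."""
    return 1.0 / (1.0 + math.exp(-d))


# Toy data (invented): the first attribute determines the class, the second
# attribute carries no class information at all.
rows = [("red", "s"), ("red", "l"), ("blue", "s"), ("blue", "l")]
labels = ["y", "y", "n", "n"]
classes = sorted(set(labels))
tables = [vdm_tables([r[a] for r in rows], labels) for a in range(2)]

print(distance(rows[0], rows[2], tables, classes))  # differs on the informative attribute
print(distance(rows[0], rows[1], tables, classes))  # differs only on the uninformative one
```

Because distances are non-negative, `sig` maps them into the interval [0.5, 1), with identical objects mapped to 0.5.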

Because the reduction only removes redundant conditional attributes without reducing the number of samples, the number of samples in the data set is the same before and after the reduction. To analyze the correlation between the original data and the reduced data, we use the Pearson coefficient; the two are regarded as strongly correlated if the result exceeds 0.8. We use the Jarque-Bera (JB) test to assess the significance of the correlation results. The closer the result is to 0, the closer the data are to a normal distribution. The threshold is generally set to 0.05: if the significance result is less than 0.05, the original hypothesis is considered acceptable. For a data set with n samples, pairwise combination yields a total of $C_{n}^{2} = n (n - 1) / 2$ pairs, and the Euclidean distance is calculated for each pair of objects. Let X be the sequence of pair distances formed from the original data and Y the sequence formed from the reduced data; the Pearson coefficient is used to calculate the correlation between X and Y. It is defined as follows: $\rho_{X, Y} = \frac{cov (X, Y)}{\sigma_{X} \sigma_{Y}}$ (22)

where σ_X and σ_Y are the standard deviations of X and Y respectively, and cov (X, Y) is the covariance of X and Y. The JB test relies on the skewness Skew (X) and the kurtosis Kurt (X) to test normality: $\begin{matrix} Skew (X) = E \left[ \left( \frac{X - E (X)}{\sigma_{X}} \right)^{3} \right], \\ Kurt (X) = E \left[ \left( \frac{X - E (X)}{\sigma_{X}} \right)^{4} \right], \\ JB = \frac{C_{n}^{2}}{6} \left[ Skew (X)^{2} + \frac{(Kurt (X) - 3)^{2}}{4} \right] \end{matrix}$ (23)

where E (X) is the mean. To verify the consistency of the data sets before and after reduction, we conducted correlation and significance analysis on the original data and the reduced data. The results are shown in Table 8. For convenience, the data set reduced by the RRCB algorithm is referred to as RRCB-data, and the original data set as Raw-data. For all 21 data sets in Table 2, the correlation between RRCB-data and Raw-data is greater than 0.8, and for 13 of them it is greater than 0.9, indicating that RRCB-data and Raw-data maintain a high degree of similarity. To evaluate the reliability of these correlations, we carried out a significance test on the correlation between RRCB-data and Raw-data for each of the 21 data sets; the results are shown in the Revealing column of Table 8. All significance values are below the common threshold of 0.05, and those of 18 data sets are below 0.005, which supports the preceding correlation analysis. It can therefore be concluded that the reduced set obtained by the RRCB algorithm is strongly correlated with the original data set and largely retains its original information.
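The quantities in Equations (22) and (23) can be computed directly from two pair-distance sequences. The sketch below uses only the standard library and is an illustration under stated assumptions: population (divide-by-length) moments are used throughout, the function names are invented, and the length of the sequence plays the role of C_n^2 in Equation (23). It computes the JB statistic only, not a p-value.

```python
import math
from itertools import combinations


def pairwise(points, dist):
    """Distances over all unordered pairs: n(n - 1)/2 values for n points."""
    return [dist(p, q) for p, q in combinations(points, 2)]


def pearson(xs, ys):
    """Equation (22): covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


def jarque_bera(xs):
    """Equation (23): JB statistic from skewness and kurtosis."""
    n = len(xs)  # equals C_n^2 when xs is a pair-distance sequence
    m = sum(xs) / n
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    skew = sum(((x - m) / s) ** 3 for x in xs) / n
    kurt = sum(((x - m) / s) ** 4 for x in xs) / n
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


# Usage on invented 1-D points: five points give 5 * 4 / 2 = 10 pair distances.
raw = pairwise([0.0, 1.0, 3.0, 6.0, 10.0], lambda p, q: abs(p - q))
reduced = [2.0 * d for d in raw]  # a rescaled copy, so the correlation is perfect
print(len(raw), round(pearson(raw, reduced), 6))
```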

Table 8

Correlation and Significance analysis between the original and the reduced data

ID	Correlation	Revealing	ID	Correlation	Revealing
1	0.9803	0.0002	12	0.9691	0.0022
2	0.993	0.0015	13	0.9681	0.0008
3	0.8982	0.0002	14	0.8158	0.0004
4	0.8421	0.0003	15	0.993	0.0045
5	0.9872	0.0001	16	0.9923	0.0026
6	0.921	0.0057	17	0.9321	0.0035
7	0.8317	0.0017	18	0.8945	0.0055
8	0.9012	0.0033	19	0.9592	0.0024
9	0.8562	0.0015	20	0.9749	0.0015
10	0.9753	0.0021	21	0.8826	0.0089
11	0.8298	0.003

4.3 Parameter λ_i analysis

In cluster analysis, the best clustering effect usually requires that samples within the same cluster be close to each other and that samples in different clusters be far apart. In Section 3.1, we proposed four combined distance calculation methods, each with its own parameter λ_i for calculating the distance between samples. The proportion of intra-class distance to inter-class distance is adjusted by changing λ_i to achieve the best clustering effect. The reduction results obtained by RRCB are evaluated with the four classifiers, and the mean of their classification accuracies is used as the criterion for evaluating data validity. Since λ_i ∈ [0, + ∞), the weight of the intra-class distance increases as λ_i approaches 0, and the weight of the inter-class distance increases as λ_i grows.

Fig. 1 shows that, in all four cases, the accuracy of the reduced data set first rises as λ_i increases and then falls once λ_i passes a certain level. This indicates that, for reduction in the clustering context, the classification accuracy of the resulting data set is higher when both intra-class and inter-class distances are taken into account. When λ_i < 1, the classification accuracy of the reduced data set increases with λ_i. When λ_i > 1, the weight of the inter-class distance grows, and the classification accuracy flattens out with little change until λ_i reaches a certain value, at which point the accuracy drops abruptly. In this regime, the intra-class distance has little influence relative to the inter-class distance, so the accuracy becomes lower and lower.

Fig. 1

Influence of λ_i on the average classification accuracy of all classifiers over all data sets.
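The selection of λ_i by mean classifier accuracy can be sketched as a simple grid sweep. Everything specific here is an assumption made for illustration: `accuracies_for` is a hypothetical callable that would run the reduction with a given λ and return one accuracy per classifier, the candidate grid is arbitrary, and `toy_accuracies` is an invented curve that merely rises and then falls; it does not reproduce the values behind Fig. 1.

```python
import statistics
from typing import Callable, Sequence, Tuple


def sweep_lambda(
    candidates: Sequence[float],
    accuracies_for: Callable[[float], Sequence[float]],
) -> Tuple[float, float]:
    """Return (best_lambda, best_mean_accuracy) over the candidate values.

    accuracies_for: hypothetical function giving the per-classifier
    accuracies obtained after reducing the data with a given lambda.
    """
    scored = [(statistics.mean(accuracies_for(lam)), lam) for lam in candidates]
    best_mean, best_lam = max(scored, key=lambda t: t[0])  # first maximum wins ties
    return best_lam, best_mean


def toy_accuracies(lam: float) -> Sequence[float]:
    """Invented rise-then-fall curve, identical for four pretend classifiers."""
    return [80.0 - 10.0 * abs(lam - 1.0)] * 4


print(sweep_lambda([0.0, 0.5, 1.0, 2.0, 4.0], toy_accuracies))
```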

5 Conclusion and outlook

Attribute reduction is a popular technique in data preprocessing. Although the classic rough set attribute reduction models take various forms, most of them define the importance of attributes from the perspective of equivalence class partition or the indiscernibility relation, which leads to an inconsistent partition of the data set after reduction and affects its classification accuracy. To improve the classification accuracy of the reduced data set, we propose four methods for calculating the distance between samples, based on the similarity, dissimilarity, intra-class and inter-class relations between samples. We then define the importance of attributes according to these distance calculation methods and design the corresponding reduction methods. Because this approach takes into account the relationship between the equivalence class partition of the conditional attributes and the classification of the decision attribute, the classification accuracy of the reduced data set is greatly improved. To verify the effectiveness of the RRCB algorithm, the correlation of the data sets before and after the reduction is also compared, with a significance test as a supplement. The disadvantage of the algorithm is that it does not consider dynamic changes of the data set. When the samples of the data set change, the previous reduction results may no longer be applicable, and recalculating the reduction set costs a lot of time. In addition, the measurement of similarity and dissimilarity between samples needs further improvement. In future work, we therefore plan to introduce incremental learning methods into rough set attribute reduction while maintaining the classification accuracy, so that the algorithm can adapt to dynamic data sets.

Acknowledgments

This work was partially supported by the Natural Science Foundation of China (61836016), the key disciplines of Computer Science technology of Chaohu University (kj22zdxk01), the Provincial Natural Science Research Program of Higher Education Institutions of Anhui Province (KJ2021A1030), and the Key Subject Sub-projects of Chaohu University (ZDXK-201815).

Conflicts of interest

The authors declare that they have no conflicts of interest to report regarding the present study.

References

Pawlak

and Skowron

, Rudiments of rough set, Inf Sci 177 (2007), 3–27.

Pawlak

, Rough set, Int J Comput Inf Sci 11 (1982), 341–356.

Pawlak

, Rough set theory and its application to data analysis, Cybern Syst 29 (1998), 661–668.

Lingas

, Chen

and Miao

D.Q.

, Rough cluster quality index based on decision theory, IEEE Trans Knowledge Data Engage 21(7) (2009), 1014–1026.

Riza

L.S.

, Janusz

, Bergmeir

, et al., Implementing algorithms of rough set theory and fuzzy rough set theory in the r package “rough sets”, Inf Sci 287 (2014), 68–89.

Ananthanarayana

V.S.

, Narasimha

M.M.

and Subramanian

D.K.

, Tree structure for efficient data mining using rough sets, Pattern Recognition Lette 24 (2003), 851–862.

Yao

Y.Y.

and Zhao

, Attribute reduction in decision-theoretic rough set models, Inf Sci 178(17) (2008), 3356–3373.

Yao

Y.Y.

, Three-way decisions with probabilistic rough sets, Inf Sci 180(3) (2010), 341–353.

Q.H.

, Yu

D.R.

and Xie

Z.X.

, Neighborhood classifiers, Expert Syst Appl 34 (2008), 866–876.

10.

Duntsch

and Gediga

, Uncertainty measures of rough set prediction, Artificial Intelligence 106(1) (1998), 109–137.

11.

Feng

, Luo

, Fang

, et al., Approaches for attribute core and attribute reduction based on improved extended positive region, J Shandong Univ (Eng Sci) 47(1) (2012), 73–76. (in Chinese).

12.

Dong

, Sun

and Yang

Y.Y.

, Fast algorithm of attribute reduction for covering decision systems with minimal elements in discernibility matrix, Int J Mach Learn & Cyber 7 (2016), 297–310.

13.

Yao

Y.Y.

and Zhao

, Discernibility matrix simplification for constructing attribute reducts, Inf Sci 179(7) (2009), 867–882.

14.

Qian

, Miao

D.Q.

, Zhang

Z.H.

and Li

, Hybrid approach es to attribute reduction based on indiscernibility and discernibility relation, Int J Approx Reason 52(2) (2011), 212–230.

15.

Zhang

, Mei

C.L.

, Chen

D.G.

, et al., Active Incremental Feature Selection Using a Fuzzy-Rough-Set-Based Information Entropy, IEEE Transactions on Fuzzy Systems 28(5) (2020), 901–915.

16.

Zheng

K.F.

and Wang

X.J.

, Feature selection method with joint maximal information entropy between features and class, Pattern Recognition 77 (2018), 20–29.

17.

Yang

, Approximate Reduction Based on Conditional Information Entropy in Decision Tables, Acta Electronica Sinica 35(11) (2007), 2156–2160.

18.

Yao

Y.Y.

, Zhou

and Chen

Y.H.

, Interpreting low and high order rules: a granular computing approach, in: Proc. RSEISP, (2007), 371–380.

19.

Jing

Y.G.

, Li

T.R.

, Huang

and Zhang

, An incre mental attribute reduction approach based on knowledge granularity under the attribute generalization, Int J Approx Reason 76 (2016), 80–95.

20.

Zhao

, Yao

and Luo

, Data analysis based on dis cernibility and indiscernibility, Inf Sci 177(22) (2007), 4959–4976.

21.

Jia

X.Y.

, Rao

, Shang

and Li

T.J.

, Similarity-based at tribute reduction in rough set theory: a clustering perspec tive, International Journal of Machine Learning and Cybernetics 11 (2020), 1047–1060.

22.

Yin

L.Z.

and Jiang

Z.H.

, A fast attribute reduction algorithm based on a positive region sort ascending decision table, Symmerty-Basel 12 (2020), 1–18.

23.

, Zhao

S.Y.

, Wang

X.Z.

, et al., PARA: A posi tive-region based attribute reduction accelerator, Inf Sci 11(503) (2019), 533–550.

24.

Sowkuntla

and Prasad

P.S.V.S.S.

, MapReduce based paral lel fuzzy-rough attribute reduction using discernibility matrix, Applied Intelligence, (2021). https://doi.org/10.1007/S10489-021-02253-1

25.

Wei

and Wu

X.Y.

, Discernibility matrix based incre mental attribute reduction for dynamic data, Knowledge-Based Systems 140 (2018), 142–157.

26.

Wang

Y.B.

, Chen

X.J.

and Dong

, Attribute reduction via local conditional entropy, International Journal of Machine Learning and Cybernetics 10(12) (2019), 3619–3634.

27.

Liang

B.H.

, Wang

and Liu

, Attribute reduction based on improved information entropy, Journal of Intelligent & Fuzzy Systems 36(1) (2019), 709–718.

28.

Yang

, Zhong

, Lang

, et al., Granular Matrix: A New Approach for Granular Structure Reduction and Redundancy Evaluation, IEEE Transactions on Fuzzy Systems 28(12) (2020), 3133–3144.

29.

Gao

, Lai

Z.H.

, Zhou

, et al., Granular maximum decision entropy-based monotonic uncertainty measure for attribute reduction, Int J Approx Reason 104 (2019), 9–24.

30.

Thangavel

and Pethalakshmi

, Dimensionality reduction based on rough set theory: a review, Appl Soft Comput 9(1) (2009), 1–12.

31.

Stanfill

and Waltz

, Toward memory-based reasoning, Commun ACM 29(12) (1986), 1213–1228.

32.

Yang

B.R.

, Liu

Z.P.

and Yang

B.R.

, A Quick Attribute ReductionAlgorithm with Complexity of max(O (|C||U|) , O (|C|²|U/C|)), Chinese of Computer 29(3) (2006), 391–399.

33.

Jing

Y.G.

and Li

T.R.

, Reduction Algorithm of Positive Do main for Decision Table Based on Relationship Matrix, Computer Science 40(11) (2013), 261–264.

34.

Yang

and Sun

Z.H.

, Improvement of discernibility matrix and the computation of a core, Journal of Fudan University (Natural Science) 43(5) (2004), 865–868.

35.

Yao

Y.Y.

, Interpreting concept learning in cognitive infor mat ics and granular computing, IEEE Trans Syst Man Cybern 39(4) (2009), 855–866.

36.

Qian

Y.H.

, Liang

X.Y.

, Lin

G.P.

, et al., Local multi-granulation decision-theoretic rough sets, Int J Approx Reason 82 (2017), 119–137.

37.

D.Y.

and Chen

Z.J.

, A new discernibility matrix and the computation of a core, Acta Electronic Sinica 30(7) (2002), 1086–1088.

38.

Yao

Y.Y.

, Probabilistic rough set approximations, Int J Approx Reason 49 (2018), 255–271.

39.

W.W.

, Huang

Z.Q.

, Jia

X.Y.

and Cai

X.Y.

, Neighborhood based decision-theoretic rough set models, Int J Approx Reason 69 (2016), 1–17.

40.

Yue

X.D.

, Chen

Y.F.

, Miao

D.Q.

and Qian

, Tri-partition neighborhood covering reduction for robust classification, Int J Approx Reason 83 (2017), 371–384.

41.

, Li

L.S.

, Xu

and Yang

C.J.

, Quick general reduction algorithms for inconsistent decision tables, Int J Approx Reason 82 (2017), 56–80.

42.

Qian

, Liang

, Wang

, et al., Local rough set: a solution to rough data analysis in big data, Int J Approx Reason 97 (2018), 38–63.

43. Xie and Qin, A novel incremental attribute reduction approach for dynamic incomplete decision systems, Int J Approx Reason 93 (2018), 443–462.

44. UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/

45. Saha and Bandyopadhyay, Application of a new symmetry-based cluster validity index for satellite image segmentation, IEEE Geoscience and Remote Sensing Letters 5(2) (2008), 166–170.

46. Yang M.H., Research on Internal Validation and Algorithm for Categorical Data Clustering, University of Science and Technology Beijing (in Chinese).

47. Gao, Lai Z.H., Zhou, et al., Maximum decision entropy-based attribute reduction in decision-theoretic rough set model, Knowledge-Based Systems 143 (2018), 179–191.

48. Jing Y.G., Li T.R., Luo, et al., An incremental approach for attribute reduction based on knowledge granularity, Knowledge-Based Systems 104 (2016), 24–38.

Data ID	RRCB		DMA		PRA		EA		KGA
	RS	Time (s)	RS	Time (s)	RS	Time (s)	RS	Time (s)	RS	Time (s)
1	7	0.020	5	0.128	7	0.109	5	0.082	5	0.024
2	8	0.016	7	1.725	7	1.628	7	0.067	7	0.575
3	4	0.029	3	0.602	3	0.552	3	0.114	3	0.139
4	5	0.960	4	1.786	6	1.587	4	3.762	4	0.094
5	9	0.052	8	0.424	9	0.401	8	0.212	8	0.067
6	5	0.005	4	0.381	5	0.315	6	0.021	6	0.127
7	8	0.032	3	0.729	3	0.724	4	0.151	4	0.167
8	12	89.696	7	10956.619	22	8656.619	7	360.812	7	452.654
9	5	5.719	8	607.732	8	507.732	8	34.315	6	42.311
10	29	3.801	29	12.273	33	10.283	29	15.209	28	0.526
11	4	1.559	3	12.090	3	10.041	3	7.353	3	0.806
12	18	0.140	16	1.694	18	1.523	17	0.637	18	0.231
13	9	33.428	10	1784.118	10	1639.217	11	133.232	9	85.973
14	9	3.812	10	4831.528	12	2131.326	11	10.871	10	376.152
15	10	0.052	11	1.424	12	1.206	10	0.214	10	0.267
16	4	4.377	5	1202.723	5	1002.752	5	16.292	5	136.735
17	6	6.112	6	2522.502	6	1852.518	6	24.448	6	261.251
18	4	8.385	5	3862.466	5	3152.482	4	33.244	5	620.822
19	11	0.643	13	7.128	12	6.608	13	2.552	13	0.648
20	10	23.398	13	2582.408	12	2381.705	12	92.031	13	348.534
21	31	3.836	35	96.362	34	86.184	33	15.052	35	6.883
Best	10	19	11	0	5	0	10	0	11	2
Worst	7	0	6	19	11	0	5	2	6	0