Attribute reduction based on improved information entropy

Abstract

Choosing a measure of attribute significance is the most important step of an attribute reduction algorithm, and information entropy is one way of calculating the importance of attributes. Because the information view only takes the size of knowledge granularity into account, it does not measure the importance of attributes objectively and comprehensively. This paper therefore first puts forward the definition of approximate boundary precision based on the algebra view. It then proposes two concepts, relative information entropy and enhanced information entropy, based on the definition of relative fuzzy entropy; both have an obvious magnification effect. Two new methods of attribute reduction are then proposed by incorporating approximate boundary precision into relative information entropy and enhanced information entropy, so that the measurement of attribute importance is more objective and comprehensive. Finally, the classification accuracy of each algorithm is analyzed and compared using an SVM classifier and ten-fold cross-validation, and the influence of outliers on the algorithms is analyzed. The experimental analysis and comparison indicate that attribute reduction based on the improved entropy is feasible and effective.

Keywords

Attribute reduction, approximate boundary accuracy, relative information entropy, enhanced information entropy, blend entropy

1 Introduction

Rough set theory [1, 2] is a mathematical tool for handling imprecise, incomplete, and inconsistent information in fuzzy information systems. It has been successfully applied in a number of areas, including data mining, machine learning, rule generation, troubleshooting, and risk profiling [3–7].

Attribute reduction is one of the core topics in rough set research. It aims to remove redundant attributes and to represent the original data with fewer attributes while keeping the same classification ability. In recent years, researchers have proposed many measures of attribute importance, on which many heuristic attribute reduction algorithms are based. There are currently two views in rough set theory: the algebra view and the information view. The former is based on indiscernibility relations and studies the effect of an attribute on the certain classification subsets of the domain. It defines attribute importance with algebraic expressions, and its algorithms are mainly based on the positive region [8–10] and the discernibility matrix [11, 12]. The latter studies the effect of an attribute on the uncertain classification subsets of the domain from the viewpoint of information granules, and can measure the size of knowledge granularity effectively.

Information entropy was proposed by Shannon in 1948 as a measure of uncertainty [13]. The concept was later introduced into rough set theory by many researchers to measure uncertain information, and a number of attribute reduction algorithms based on information entropy emerged accordingly [14–20]. Literature [14] proposes an attribute reduction algorithm based on conditional information entropy. Literature [15] proposes approximate reduction based on conditional information entropy. Guided by the actual application, it removes some redundant attributes, which effectively enhances resistance to noise and interference and supplements the attribute reduction of decision tables based on conditional information entropy. Literature [16] proposes the concepts of amended conditional entropy and relative fuzzy entropy for decision systems, based on the uncertainty characteristics of fuzzy entropy.

Most existing entropy-based methods of attribute reduction calculate the importance of attributes through information theory alone and seldom combine it with the algebra view. Literature [14] proposes and proves that the algebra view is highly complementary to the attribute importance defined by information theory. The objectivity of the attribute importance measure directly influences the results of attribute reduction and the time consumed by the system. In order to find the most important attribute quickly and precisely, it is necessary to take all associated factors into consideration and to propose a more comprehensive attribute importance measure that combines the features of the algebra view and of information theory. In addition, the existing entropy-based attribute reduction algorithms have high time complexity and need to be improved.

In view of the above problem, literature [17] proposes approximate decision entropy, which combines the approximation quality under the algebra view with the conditional entropy under information theory to obtain an overall attribute importance measure. To make the measure more objective and complete, this paper proposes a new information entropy model with approximate boundary precision based on the algebra view. First, it calculates the effect of the positive region on boundary precision under the algebra view. This paper proposes and proves that approximate boundary precision and approximate precision have the same monotonicity, and that approximate boundary precision modifies approximate precision with magnification. Secondly, from the perspective of information theory, the concepts of relative information entropy and enhanced information entropy are proposed on the basis of the relative fuzzy entropy in literature [16]. This paper discusses the monotonicity and reasonability of the two information entropies, which have a more obvious magnification effect than the relative fuzzy entropy. A new reduction method is then proposed by combining the amended approximate boundary precision with the two kinds of entropy. Finally, an experimental analysis compares six algorithms of this kind in terms of reduction time and attribute reduction results.

This paper proposes two reduction algorithms based on approximate boundary precision information entropy: one using relative information entropy (ABIE) and one using enhanced information entropy (ABIE′). Both ABIE and ABIE′ are non-strictly monotonic during attribute reduction, which provides the basis for the rationality of the algorithms. The new model for measuring attribute importance also has a more obvious magnification effect than the relative fuzzy entropy of paper [16] under the same conditions, which provides an effective way to select the most important attribute quickly. The new information entropy model is more comprehensive because it fully considers the effect of both certain classification and uncertainty. The analysis of experimental data shows the feasibility and validity of the new model. In addition, the new model shows advantages in time performance and attribute reduction quality; its deficiency is that it is strongly influenced by the outlier rate, an observation that can serve as a reasonable reference.

2 Preliminary

Definition 1. Let DT = (U, C ∪ D, V, f) be a decision table, where U is a non-empty finite set of objects, C is the condition attribute set, D is the decision attribute set, and C ∩ D = ∅. Here V = ⋃_{a∈C∪D} V_a, where V_a is the value range of attribute a, and f : U × (C ∪ D) → V is an information function, i.e., f (x, a) ∈ V_a for any a ∈ C ∪ D and x ∈ U. For every subset B ⊆ C ∪ D, the indiscernibility relation IND (B) is defined as follows: $IND (B) = \{(x, y) \in U^{2} \mid \forall b \in B, f (x, b) = f (y, b)\}$ (1)

Obviously, IND (B) is an equivalence relation on U, and the partition of U that it induces is denoted by U/B. For x ∈ U, the equivalence class of x is defined as follows: $[x]_{B} = \{y \in U \mid \forall b \in B, f (x, b) = f (y, b)\}$ (2)
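As an illustration of Definition 1, the partition U/B can be computed by grouping objects on their values for the attributes in B. The following is a minimal Python sketch; the function name `partition`, the tuple-based row layout, and the toy table are assumptions made for this example and are not part of the paper.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Return U/B as a list of blocks of row indices.

    Two rows fall into the same block when they agree on every
    attribute index listed in attrs (the relation IND(B))."""
    blocks = defaultdict(list)
    for idx, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks[key].append(idx)
    return sorted(blocks.values())

# Hypothetical toy table: columns 0 and 1 are condition attributes,
# column 2 is the decision attribute.
rows = [
    ("sunny", "hot", "no"),
    ("sunny", "hot", "no"),
    ("rainy", "hot", "yes"),
    ("rainy", "cool", "yes"),
    ("sunny", "cool", "yes"),
]

print(partition(rows, [0]))     # [[0, 1, 4], [2, 3]]
print(partition(rows, [0, 1]))  # [[0, 1], [2], [3], [4]]
```

Adding an attribute to B can only split blocks, which is the refinement property used later in the monotonicity proofs.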

Definition 2. Given the decision table DT = (U, C ∪ D, V, f), for every subset B ⊆ C ∪ D and X ⊆ U, the upper approximation set, the lower approximation set, the positive region, the boundary region and the negative region of X can be defined from the basic sets of B as follows: $B (X)^{*} = \{x \in U \mid [x]_{B} \cap X \neq \emptyset\} = \cup \{[x]_{B} \in U / B \mid [x]_{B} \cap X \neq \emptyset\}$ (3) $B (X)_{*} = \{x \in U \mid [x]_{B} \subseteq X\} = \cup \{[x]_{B} \in U / B \mid [x]_{B} \subseteq X\}$ (4) ${POS}_{B} (X) = B (X)_{*}$ (5) $BND (X) = B (X)^{*} - B (X)_{*}$ (6) ${NEG}_{B} (X) = U - B (X)^{*}$ (7)
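The regions of Definition 2 can be read directly off the blocks of U/B. The sketch below is illustrative only; the helper name `approximations` and the example sets are assumptions, not taken from the paper.

```python
def approximations(blocks, X):
    """Return (lower, upper) approximations of X from the blocks of U/B."""
    X = set(X)
    lower, upper = set(), set()
    for block in blocks:
        b = set(block)
        if b <= X:      # block entirely inside X, eq. (4)
            lower |= b
        if b & X:       # block meets X, eq. (3)
            upper |= b
    return lower, upper

# Hypothetical example on U = {0, 1, 2, 3, 4}.
universe = {0, 1, 2, 3, 4}
blocks = [[0, 1, 4], [2, 3]]
X = {2, 3, 4}

lower, upper = approximations(blocks, X)
positive = lower              # eq. (5)
boundary = upper - lower      # eq. (6)
negative = universe - upper   # eq. (7)
print(lower, upper, boundary, negative)
```

Here the block {0, 1, 4} meets X without being contained in it, so it ends up in the boundary region.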

Definition 3. [19] Given the decision table DT = (U, C ∪ D, V, f), for any subset B ⊆ C ∪ D and X ⊆ U, the approximate precision of X with respect to IND (B) is defined as follows:

$α_{B} (X) = \frac{| B (X)_{*} |}{| B (X)^{*} |}$ (8)

Definition 4. Given the decision table DT = (U, C ∪ D, V, f), for any subset B ⊆ C ∪ D and X ⊆ U, the approximate boundary precision of X with respect to IND (B) is defined as follows:

$β_{B} (X) = \frac{| {POS}_{B} (X) |}{| BND (X) |}$ (9)

Proposition 1. Given the decision table DT = (U, C ∪ D, V, f), for any subset B ⊆ C ∪ D and X ⊆ U, the approximate precision α_B (X) has the same monotonicity as the approximate boundary precision β_B (X). Meanwhile, β_B (X) is greater than or equal to α_B (X).

Proof. By Definitions 3 and 4, we can get the result as follows,

$\begin{aligned} β_{B} (X) &= \frac{| {POS}_{B} (X) |}{| BND (X) |} = \frac{| B (X)_{*} |}{| B (X)^{*} | - | B (X)_{*} |} \\ &= \frac{\frac{| B (X)_{*} |}{| B (X)^{*} |}}{1 - \frac{| B (X)_{*} |}{| B (X)^{*} |}} = \frac{α_{B} (X)}{1 - α_{B} (X)} \end{aligned}$ (10)

As α_B (X) increases, 1 - α_B (X) decreases, so the value of $\frac{α_{B} (X)}{1 - α_{B} (X)}$ increases. Conversely, as α_B (X) decreases, 1 - α_B (X) increases, so the value of $\frac{α_{B} (X)}{1 - α_{B} (X)}$ decreases. Therefore, α_B (X) and β_B (X) have the same monotonicity.

By Definitions 2 and 3, we have 0 ≤ α_B (X) < 1 (the boundary region is assumed to be non-empty, so that β_B (X) is defined). The value of β_B (X) - α_B (X) is then $\frac{α_{B} (X)}{1 - α_{B} (X)} - α_{B} (X) = \frac{α_{B} (X)^{2}}{1 - α_{B} (X)}$. Because α_B (X)² is greater than or equal to zero and 1 - α_B (X) is greater than zero, β_B (X) is greater than or equal to α_B (X).

In summary, α_B (X) and β_B (X) have the same monotonicity. However, β_B (X) is not bounded above as α_B (X) approaches 1. To give the measure a unified range and confine it between 0 and 1, the amended approximate boundary precision is defined as follows.

Definition 5. Given the decision table DT = (U, C ∪ D, V, f), for any subset B ⊆ C ∪ D and X ⊆ U, the amended approximate boundary precision of X with respect to IND (B) is defined as follows: ${β^{'}}_{B} (X) = α_{B} (X) (2 - α_{B} (X))$ (11)

Obviously, $0 \leq {β^{'}}_{B} (X) \leq 1$.

Proposition 2. Given the decision table DT = (U, C ∪ D, V, f), for any subset B ⊆ C ∪ D and X ⊆ U, the approximate precision α_B (X) has the same monotonicity as the amended approximate boundary precision ${β^{'}}_{B} (X)$. Meanwhile, ${β^{'}}_{B} (X)$ is greater than or equal to α_B (X), and $0 \leq {β^{'}}_{B} (X) \leq 1$.

Proof. By Definition 5, ${β^{'}}_{B} (X) = 1 - {(α_{B} (X) - 1)}^{2}$, so ${β^{'}}_{B} (X)$ is a parabola in α_B (X) that opens downward with vertex (1, 1). Because 0 ≤ α_B (X) ≤ 1 lies to the left of the vertex, ${β^{'}}_{B} (X)$ and α_B (X) have the same monotonicity and $0 \leq {β^{'}}_{B} (X) \leq 1$. Moreover, $\begin{aligned} {β^{'}}_{B} (X) - α_{B} (X) &= α_{B} (X) (2 - α_{B} (X)) - α_{B} (X) \\ &= α_{B} (X) (1 - α_{B} (X)) \geq 0 \end{aligned}$

From the above, we can conclude that ${β^{'}}_{B} (X)$ is greater than or equal to α_B (X) and that $0 \leq {β^{'}}_{B} (X) \leq 1$.
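Propositions 1 and 2 can be checked numerically on a few values of the approximate precision. The sketch below uses exact rational arithmetic; the function names and the sample (|lower|, |upper|) pairs are assumptions chosen for illustration.

```python
from fractions import Fraction

def alpha(lower_size, upper_size):
    """Approximate precision, eq. (8)."""
    return Fraction(lower_size, upper_size)

def beta(a):
    """Approximate boundary precision via eq. (10); undefined when a == 1."""
    return a / (1 - a)

def beta_amended(a):
    """Amended approximate boundary precision, eq. (11)."""
    return a * (2 - a)

# Hypothetical (|lower|, |upper|) pairs with increasing precision.
pairs = [(0, 5), (1, 5), (2, 5), (4, 5)]
alphas = [alpha(lo, up) for lo, up in pairs]
betas = [beta(a) for a in alphas]
amended = [beta_amended(a) for a in alphas]

for a, b, b2 in zip(alphas, betas, amended):
    print(f"alpha={a}  beta={b}  beta'={b2}")
```

For α = 2/5 this gives β = 2/3 and β′ = 16/25: both exceed α, but only β′ stays within [0, 1] as α grows (β reaches 4 at α = 4/5).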

3 Improved information entropy based on approximate boundary precision

3.1 Relative information entropy and enhanced information entropy

Definition 6. [16] Given the decision table DT = (U, C ∪ D, V, f), let U be divided into m blocks U/D = {Y₁, Y₂, ⋯ , Y_m} under the decision attribute D, and into n blocks U/B = {X₁, X₂, ⋯ , X_n} under the condition attributes B ⊆ C. The amended conditional information entropy of D relative to the knowledge B is as follows.

$H (D_{B}) = - \sum_{j = 1}^{m} \sum_{i = 1}^{n} \left[ μ_{Y_{j}}^{B} (x_{i}) \ln μ_{Y_{j}}^{B} (x_{i}) + (1 - μ_{Y_{j}}^{B} (x_{i})) \ln (1 - μ_{Y_{j}}^{B} (x_{i})) \right]$ (12)

The relative fuzzy entropy E (D_B) is $\sum_{j = 1}^{m} \sum_{i = 1}^{n} μ_{Y_{j}}^{B} (x_{i}) (1 - μ_{Y_{j}}^{B} (x_{i}))$. The rough membership function under the knowledge B is $μ_{Y}^{B} (x) = \frac{| [x]_{B} \cap Y |}{| [x]_{B} |}$. The equivalent form of the relative fuzzy entropy is defined as follows: $E^{'} (D_{B}) = \sum_{j = 1}^{m} \sum_{i = 1}^{n} | X_{i} | \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} \left(1 - \frac{| X_{i} \cap Y_{j} |}{| X_{i} |}\right)$ (13)

The equivalent form of the amended conditional information entropy is as follows: $\begin{aligned} H^{'} (D_{B}) = - \sum_{i = 1}^{n} | X_{i} | \times \sum_{j = 1}^{m} \Big[ & \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} \ln \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} \\ & + \left(1 - \frac{| X_{i} \cap Y_{j} |}{| X_{i} |}\right) \ln \left(1 - \frac{| X_{i} \cap Y_{j} |}{| X_{i} |}\right) \Big] \end{aligned}$ (14)

Definition 7. Given the decision table DT = (U, C ∪ D, V, f), let U be divided into m sub-division U/D = {Y₁, Y₂, …, Y_m} under the decision attribute D, and n sub-division under the condition attribute B, such as U/B = {X₁, X₂, …, X_n}. The relative information entropy is proposed as follows,

$RIE (D_{B}) = \begin{cases} \sum_{j = 1}^{m} \sum_{i = 1}^{n} | X_{i} | \left[ 1 - \frac{\bar{P} (Y_{j} | X_{i})}{P (Y_{j} | X_{i})} \right], & \frac{1}{2} \leq P (Y_{j} | X_{i}) \leq 1 \\ \sum_{j = 1}^{m} \sum_{i = 1}^{n} | X_{i} | \left[ 1 - \frac{P (Y_{j} | X_{i})}{\bar{P} (Y_{j} | X_{i})} \right], & 0 \leq P (Y_{j} | X_{i}) < \frac{1}{2} \end{cases}$ (15)

Let P (Y_j|X_i) be $\frac{| X_{i} \cap Y_{j} |}{| X_{i} |}$ and $\bar{P} (Y_{j} | X_{i})$ be $1 - \frac{| X_{i} \cap Y_{j} |}{| X_{i} |}$; then the equivalent form of RIE (D_B) is as follows: (1) when $\frac{1}{2} \leq \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} \leq 1$, then $RIE (D_{B}) = \sum_{j = 1}^{m} \sum_{i = 1}^{n} | X_{i} | \left[ 1 - \left(1 - \frac{| X_{i} \cap Y_{j} |}{| X_{i} |}\right) \Big/ \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} \right]$ (16)

(2) when $0 \leq \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} < \frac{1}{2}$, then $RIE (D_{B}) = \sum_{j = 1}^{m} \sum_{i = 1}^{n} | X_{i} | \left[ 1 - \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} \Big/ \left(1 - \frac{| X_{i} \cap Y_{j} |}{| X_{i} |}\right) \right]$ (17)
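The two branches (16) and (17) can be evaluated term by term, choosing the branch from the value of |X_i ∩ Y_j| / |X_i| for each pair of blocks, as Algorithm 1 later does. The sketch below follows the formulas exactly as written; the names `rie_term` and `rie` and the example blocks are assumptions for illustration.

```python
from fractions import Fraction

def rie_term(p):
    """Bracketed term of eqs. (16)/(17) for p = |X_i ∩ Y_j| / |X_i|."""
    if p >= Fraction(1, 2):
        return 1 - (1 - p) / p      # branch (16)
    return 1 - p / (1 - p)          # branch (17)

def rie(cond_blocks, dec_blocks):
    """Relative information entropy RIE(D_B) summed over all block pairs."""
    total = Fraction(0)
    for X in cond_blocks:
        X = set(X)
        for Y in dec_blocks:
            p = Fraction(len(X & set(Y)), len(X))
            total += len(X) * rie_term(p)
    return total

# Hypothetical partitions of U = {0, 1, 2, 3, 4}.
cond_blocks = [[0, 1, 4], [2, 3]]   # U/B
dec_blocks = [[0, 1], [2, 3, 4]]    # U/D
print(rie(cond_blocks, dec_blocks))
```

At p = 1/2 both branches give 0, so the choice of branch at the boundary does not affect the value.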

Theorem 1. Given the decision table DT = (U, C ∪ D, V, f), let B_i ⊆ C and B_{i+1} ⊆ C. If the partition U/B_{i+1} is finer than U/B_i, then RIE (D_{B_{i+1}}) is less than or equal to RIE (D_{B_i}).

Proof. Let U/B_{i+1} = {X₁, X₂, ⋯ , X_n}, and let U/B_i = {X₁, X₂, ⋯ , X_{k-1}, X_{k+1}, ⋯ , X_{l-1}, X_{l+1}, ⋯ , X_n, X_k ∪ X_l} be the partition in which the new block X_k ∪ X_l is the combination of the blocks X_k and X_l. Let U/D = {Y₁, Y₂, ⋯ , Y_m} be the partition of U under D.

(1) If $\frac{| X_{i} \cap Y_{j} |}{| X_{i} |}$ is greater than or equal to 0 and less than $\frac{1}{2}$, then $\begin{aligned} RIE (D_{B_{i}}) = {} & RIE (D_{B_{i + 1}}) + | X_{k} \cup X_{l} | \sum_{j = 1}^{m} \left[ 1 - \frac{| (X_{k} \cup X_{l}) \cap Y_{j} |}{| X_{k} \cup X_{l} |} \Big/ \left(1 - \frac{| (X_{k} \cup X_{l}) \cap Y_{j} |}{| X_{k} \cup X_{l} |}\right) \right] \\ & - | X_{k} | \sum_{j = 1}^{m} \left[ 1 - \frac{| X_{k} \cap Y_{j} |}{| X_{k} |} \Big/ \left(1 - \frac{| X_{k} \cap Y_{j} |}{| X_{k} |}\right) \right] \\ & - | X_{l} | \sum_{j = 1}^{m} \left[ 1 - \frac{| X_{l} \cap Y_{j} |}{| X_{l} |} \Big/ \left(1 - \frac{| X_{l} \cap Y_{j} |}{| X_{l} |}\right) \right] \end{aligned}$ (18) $\begin{aligned} {RIE}_{Δ} = {} & RIE (D_{B_{i}}) - RIE (D_{B_{i + 1}}) \\ = {} & \sum_{j = 1}^{m} \Big[ | X_{k} \cup X_{l} | \left( 1 - \frac{| (X_{k} \cup X_{l}) \cap Y_{j} |}{| X_{k} \cup X_{l} |} \Big/ \left(1 - \frac{| (X_{k} \cup X_{l}) \cap Y_{j} |}{| X_{k} \cup X_{l} |}\right) \right) \\ & - | X_{k} | \left( 1 - \frac{| X_{k} \cap Y_{j} |}{| X_{k} |} \Big/ \left(1 - \frac{| X_{k} \cap Y_{j} |}{| X_{k} |}\right) \right) - | X_{l} | \left( 1 - \frac{| X_{l} \cap Y_{j} |}{| X_{l} |} \Big/ \left(1 - \frac{| X_{l} \cap Y_{j} |}{| X_{l} |}\right) \right) \Big] \\ = {} & \sum_{j = 1}^{m} \Big[ | X_{k} \cup X_{l} | \left( 1 - \frac{| (X_{k} \cup X_{l}) \cap Y_{j} |}{| X_{k} \cup X_{l} | - | (X_{k} \cup X_{l}) \cap Y_{j} |} \right) \\ & - | X_{k} | \left( 1 - \frac{| X_{k} \cap Y_{j} |}{| X_{k} | - | X_{k} \cap Y_{j} |} \right) - | X_{l} | \left( 1 - \frac{| X_{l} \cap Y_{j} |}{| X_{l} | - | X_{l} \cap Y_{j} |} \right) \Big] \end{aligned}$ (19)

Because X_k ∩ X_l = ∅, we have |X_k ∪ X_l| = |X_k| + |X_l| and | (X_k ∪ X_l) ∩ Y_j| = |X_k ∩ Y_j| + |X_l ∩ Y_j|. Let |X_k| be x, |X_l| be y, |X_k ∩ Y_j| be ax and |X_l ∩ Y_j| be by. Obviously, x and y are greater than 0, and a and b are greater than or equal to 0 and less than $\frac{1}{2}$. We obtain the following conclusion: $\begin{aligned} {RIE}_{Δ} &= \sum_{j = 1}^{m} \left[ - (x + y) \times \frac{ax + by}{x + y - ax - by} + x \times \frac{ax}{x - ax} + y \times \frac{by}{y - by} \right] \\ &= \sum_{j = 1}^{m} \frac{xy (a - b)^{2}}{(x + y - ax - by) (1 - a) (1 - b)} \geq 0 \end{aligned}$ (20)

(2) If $\frac{| X_{i} \cap Y_{j} |}{| X_{i} |}$ is greater than or equal to $\frac{1}{2}$ and less than or equal to 1, then $\begin{aligned} RIE (D_{B_{i + 1}}) &= \sum_{j = 1}^{m} \sum_{i = 1}^{n} | X_{i} | \left[ 1 - \left(1 - \frac{| X_{i} \cap Y_{j} |}{| X_{i} |}\right) \Big/ \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} \right] \\ &= \sum_{j = 1}^{m} \sum_{i = 1}^{n} | X_{i} | \left(2 - \frac{| X_{i} |}{| X_{i} \cap Y_{j} |}\right) \end{aligned}$ (21) $\begin{aligned} RIE (D_{B_{i}}) = {} & RIE (D_{B_{i + 1}}) + | X_{k} \cup X_{l} | \sum_{j = 1}^{m} \left(2 - \frac{| X_{k} \cup X_{l} |}{| (X_{k} \cup X_{l}) \cap Y_{j} |}\right) \\ & - | X_{k} | \sum_{j = 1}^{m} \left(2 - \frac{| X_{k} |}{| X_{k} \cap Y_{j} |}\right) - | X_{l} | \sum_{j = 1}^{m} \left(2 - \frac{| X_{l} |}{| X_{l} \cap Y_{j} |}\right) \end{aligned}$ (22) $\begin{aligned} {RIE}_{Δ} &= RIE (D_{B_{i}}) - RIE (D_{B_{i + 1}}) \\ &= \sum_{j = 1}^{m} \left[ \frac{| X_{k} |^{2}}{| X_{k} \cap Y_{j} |} + \frac{| X_{l} |^{2}}{| X_{l} \cap Y_{j} |} - \frac{| X_{k} \cup X_{l} |^{2}}{| (X_{k} \cup X_{l}) \cap Y_{j} |} \right] \end{aligned}$ (23)

Let |X_k| be x, |X_l| be y, |X_k ∩ Y_j| be ax and |X_l ∩ Y_j| be by. Obviously, x and y are greater than 0, and a and b are greater than or equal to $\frac{1}{2}$ and less than or equal to 1. We obtain the following conclusion: ${RIE}_{Δ} = \sum_{j = 1}^{m} \left[ \frac{x}{a} + \frac{y}{b} - \frac{(x + y)^{2}}{ax + by} \right] = \sum_{j = 1}^{m} \frac{xy (a - b)^{2}}{ab (ax + by)} \geq 0$ (24)

In conclusion, RIE (D_{B_{i+1}}) is less than or equal to RIE (D_{B_i}).

When the relative information entropy is calculated, the range of $\frac{| X_{i} \cap Y_{j} |}{| X_{i} |}$ must be checked for every term, which increases system overhead. An improved form of the relative information entropy that avoids this check is proposed as follows:

Definition 8. Given the decision table DT = (U, C ∪ D, V, f), let B ⊆ C be a set of condition attributes, let U/B = {X₁, X₂, ⋯ , X_n} be the partition of U induced by B, and let U/D = {Y₁, Y₂, ⋯ , Y_m} be the partition of U induced by D.

The enhanced information entropy is proposed as follows: ${RIE}^{'} (D_{B}) = \sum_{j = 1}^{m} \sum_{i = 1}^{n} | X_{i} | \left[ 1 - \frac{| X_{i} \cap Y_{j} |}{2 | X_{i} |} \Big/ \left(1 - \frac{| X_{i} \cap Y_{j} |}{2 | X_{i} |}\right) \right]$ (25)
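Because (25) uses a single expression for every value of |X_i ∩ Y_j| / |X_i|, no range check is needed. The sketch below evaluates it using the simplification 1 - (p/2)/(1 - p/2) = (2 - 2p)/(2 - p); the names `enhanced_term` and `rie_enhanced` and the example blocks are assumptions for illustration.

```python
from fractions import Fraction

def enhanced_term(p):
    """Bracketed term of eq. (25): 1 - (p/2) / (1 - p/2) = (2 - 2p) / (2 - p)."""
    return (2 - 2 * p) / (2 - p)

def rie_enhanced(cond_blocks, dec_blocks):
    """Enhanced information entropy RIE'(D_B), eq. (25)."""
    total = Fraction(0)
    for X in cond_blocks:
        X = set(X)
        for Y in dec_blocks:
            p = Fraction(len(X & set(Y)), len(X))
            total += len(X) * enhanced_term(p)
    return total

# Hypothetical partitions of U = {0, 1, 2, 3, 4}.
cond_blocks = [[0, 1, 4], [2, 3]]   # U/B
dec_blocks = [[0, 1], [2, 3, 4]]    # U/D
print(rie_enhanced(cond_blocks, dec_blocks))  # 59/10
```

The term decreases from 1 at p = 0 to 0 at p = 1, so a block that lies entirely inside Y_j contributes nothing for that class.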

Theorem 2. In the domain U, let B be the condition attribute set and D the decision attribute set. Let U/B_{i+1} be {X₁, X₂, …, X_n}, U/B_i be {X₁, X₂, …, X_{k-1}, X_{k+1}, …, X_{l-1}, X_{l+1}, …, X_n, X_k ∪ X_l} and U/D be {Y₁, Y₂, …, Y_m}; then RIE′ (D_{B_{i+1}}) is less than or equal to RIE′ (D_{B_i}).

Proof. By Definition 8, the following conclusion can be obtained: $\begin{aligned} {RIE}^{'} (D_{B_{i}}) = {} & {RIE}^{'} (D_{B_{i + 1}}) + | X_{k} \cup X_{l} | \sum_{j = 1}^{m} \left[ 1 - \frac{| (X_{k} \cup X_{l}) \cap Y_{j} |}{2 | X_{k} \cup X_{l} |} \Big/ \left(1 - \frac{| (X_{k} \cup X_{l}) \cap Y_{j} |}{2 | X_{k} \cup X_{l} |}\right) \right] \\ & - | X_{k} | \sum_{j = 1}^{m} \left[ 1 - \frac{| X_{k} \cap Y_{j} |}{2 | X_{k} |} \Big/ \left(1 - \frac{| X_{k} \cap Y_{j} |}{2 | X_{k} |}\right) \right] \\ & - | X_{l} | \sum_{j = 1}^{m} \left[ 1 - \frac{| X_{l} \cap Y_{j} |}{2 | X_{l} |} \Big/ \left(1 - \frac{| X_{l} \cap Y_{j} |}{2 | X_{l} |}\right) \right] \end{aligned}$ (26) $\begin{aligned} {RIE}_{Δ}^{'} = {} & {RIE}^{'} (D_{B_{i}}) - {RIE}^{'} (D_{B_{i + 1}}) \\ = {} & \sum_{j = 1}^{m} \Big[ | X_{k} | \left( \frac{| X_{k} \cap Y_{j} |}{2 | X_{k} |} \Big/ \left(1 - \frac{| X_{k} \cap Y_{j} |}{2 | X_{k} |}\right) \right) + | X_{l} | \left( \frac{| X_{l} \cap Y_{j} |}{2 | X_{l} |} \Big/ \left(1 - \frac{| X_{l} \cap Y_{j} |}{2 | X_{l} |}\right) \right) \\ & - | X_{k} \cup X_{l} | \left( \frac{| (X_{k} \cup X_{l}) \cap Y_{j} |}{2 | X_{k} \cup X_{l} |} \Big/ \left(1 - \frac{| (X_{k} \cup X_{l}) \cap Y_{j} |}{2 | X_{k} \cup X_{l} |}\right) \right) \Big] \end{aligned}$ (27)

Let |X_k| be x, |X_l| be y, |X_k ∩ Y_j| be ax and |X_l ∩ Y_j| be by. Obviously, x and y are greater than 0, and a and b are greater than or equal to 0 and less than or equal to 1. We obtain the following conclusion: $\begin{aligned} {RIE}_{Δ}^{'} &= \sum_{j = 1}^{m} \left( \frac{ax}{2 - a} + \frac{by}{2 - b} - \frac{(x + y) (ax + by)}{2 x + 2 y - ax - by} \right) \\ &= \sum_{j = 1}^{m} \frac{2 xy (a - b)^{2}}{(2 x + 2 y - ax - by) (2 - a) (2 - b)} \geq 0 \end{aligned}$ (28)

In conclusion, RIE′ (D_{B_{i+1}}) is less than or equal to RIE′ (D_{B_i}).
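The algebraic step in (28) can be sanity-checked with exact rational arithmetic over a small grid of values. This is only a numerical check under assumed sample values, not a replacement for the proof; the names `lhs` and `rhs` are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

def lhs(x, y, a, b):
    """Summand in the first line of eq. (28)."""
    return (a * x / (2 - a) + b * y / (2 - b)
            - (x + y) * (a * x + b * y) / (2 * x + 2 * y - a * x - b * y))

def rhs(x, y, a, b):
    """Summand in the second line of eq. (28)."""
    return (2 * x * y * (a - b) ** 2
            / ((2 * x + 2 * y - a * x - b * y) * (2 - a) * (2 - b)))

# a, b range over {0, 1/4, 1/2, 3/4, 1}; x, y over a few block sizes.
ratios = [Fraction(k, 4) for k in range(5)]
checked = 0
for x, y in product([1, 2, 5], repeat=2):
    for a, b in product(ratios, repeat=2):
        assert lhs(x, y, a, b) == rhs(x, y, a, b)
        assert rhs(x, y, a, b) >= 0
        checked += 1
print(checked, "cases agree")
```

The difference vanishes exactly when a = b, that is, when the two merged blocks have the same proportion of objects in Y_j.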

Proposition 3. Given the decision table DT = (U, C ∪ D, V, f) and any subset B ⊆ C, let U/B be {X₁, X₂, …, X_n} and U/D be {Y₁, Y₂, …, Y_m}. Then the enhanced information entropy RIE′ (D_B) is greater than or equal to the relative fuzzy entropy E (D_B) of literature [16].

Proof. By Definitions 6 and 8, we obtain the following: $\begin{aligned} {RIE}^{'} (D_{B}) - E (D_{B}) = {} & \sum_{j = 1}^{m} \sum_{i = 1}^{n} | X_{i} | \left[ 1 - \frac{| X_{i} \cap Y_{j} |}{2 | X_{i} |} \Big/ \left(1 - \frac{| X_{i} \cap Y_{j} |}{2 | X_{i} |}\right) \right] \\ & - \sum_{j = 1}^{m} \sum_{i = 1}^{n} | X_{i} | \cdot \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} \left(1 - \frac{| X_{i} \cap Y_{j} |}{| X_{i} |}\right) \\ = {} & \sum_{j = 1}^{m} \sum_{i = 1}^{n} | X_{i} | \left( \left[ 1 - \frac{| X_{i} \cap Y_{j} |}{2 | X_{i} |} \Big/ \left(1 - \frac{| X_{i} \cap Y_{j} |}{2 | X_{i} |}\right) \right] - \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} \left(1 - \frac{| X_{i} \cap Y_{j} |}{| X_{i} |}\right) \right) \end{aligned}$ (29)

Let $x = \frac{| X_{i} \cap Y_{j} |}{| X_{i} |}$, so that x is greater than or equal to 0 and less than or equal to 1. Each summand of (29), divided by |X_i|, then becomes $\frac{2 - 2 x}{2 - x} - x (1 - x) = \frac{(1 - x) [(x - 1)^{2} + 1]}{2 - x} \geq 0$ (30)

So RIE′ (D_B) is greater than or equal to E (D_B).
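The per-term inequality (30) behind Proposition 3 can be checked numerically. The sketch below compares the enhanced term with the relative fuzzy entropy term on a grid of x values; the function names and the grid are assumptions for illustration.

```python
from fractions import Fraction

def enhanced_term(x):
    """Per-block term of RIE' with x = |X_i ∩ Y_j| / |X_i|."""
    return (2 - 2 * x) / (2 - x)

def fuzzy_term(x):
    """Per-block term of the relative fuzzy entropy, x (1 - x)."""
    return x * (1 - x)

gaps = []
for k in range(11):
    x = Fraction(k, 10)
    gap = enhanced_term(x) - fuzzy_term(x)
    # closed form on the right-hand side of eq. (30)
    assert gap == (1 - x) * ((x - 1) ** 2 + 1) / (2 - x)
    gaps.append(gap)

print([str(g) for g in gaps])
```

The gap is largest at x = 0, where the fuzzy term is 0 and the enhanced term is 1, and it shrinks to 0 at x = 1.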

3.2 Improved information entropy

Definition 9. Let DT = (U, C ∪ D, V, f) be a decision table, for every subset B ⊆ C. Let U/B be {X₁, X₂, ⋯ , X_n}, U/D be {Y₁, Y₂, …, Y_m}.

The blended entropy combines approximate boundary precision with the relative information entropy of D relative to B, and is defined as follows: $ABIE (D | B) = \begin{cases} \sum_{j = 1}^{m} \sum_{i = 1}^{n} [{β^{'}}_{B} (Y_{j})] \cdot | X_{i} | \cdot \left[ 1 - \left(1 - \frac{| X_{i} \cap Y_{j} |}{| X_{i} |}\right) \Big/ \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} \right], & \frac{1}{2} \leq \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} \leq 1 \\ \sum_{j = 1}^{m} \sum_{i = 1}^{n} [{β^{'}}_{B} (Y_{j})] \cdot | X_{i} | \cdot \left[ 1 - \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} \Big/ \left(1 - \frac{| X_{i} \cap Y_{j} |}{| X_{i} |}\right) \right], & 0 \leq \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} < \frac{1}{2} \end{cases}$ (31)

The approximate boundary precision enhanced information entropy is defined as follows: ${ABIE}^{'} (D | B) = \sum_{j = 1}^{m} \sum_{i = 1}^{n} [1 - {β^{'}}_{B} (Y_{j})] \cdot | X_{i} | \cdot \left[ 1 - \frac{| X_{i} \cap Y_{j} |}{2 | X_{i} |} \Big/ \left(1 - \frac{| X_{i} \cap Y_{j} |}{2 | X_{i} |}\right) \right]$ (32)

${β^{'}}_{B} (Y_{j})$ is the amended boundary precision of Y_j as Definition 5 (1 ≤ i ≤ n, 1 ≤ j ≤ m).
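To make Definition 9 concrete, the following sketch evaluates ABIE′ (D|B) on small partitions. It assumes the weight 1 - β′ as written in (32) and takes β′ from Definition 5; the function names and example blocks are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def alpha(cond_blocks, Y):
    """Approximate precision of Y: |lower approximation| / |upper approximation|."""
    Y = set(Y)
    lower = sum(len(X) for X in cond_blocks if set(X) <= Y)
    upper = sum(len(X) for X in cond_blocks if set(X) & Y)
    return Fraction(lower, upper)

def abie_enhanced(cond_blocks, dec_blocks):
    """ABIE'(D|B) following eq. (32), with beta' = alpha (2 - alpha)."""
    total = Fraction(0)
    for Y in dec_blocks:
        a = alpha(cond_blocks, Y)
        weight = 1 - a * (2 - a)            # 1 - beta'_B(Y_j)
        for X in cond_blocks:
            p = Fraction(len(set(X) & set(Y)), len(X))
            total += weight * len(X) * (2 - 2 * p) / (2 - p)
    return total

dec_blocks = [[0, 1], [2, 3, 4]]            # hypothetical U/D
coarse = [[0, 1, 4], [2, 3]]                # hypothetical U/B
fine = [[0, 1], [2], [3], [4]]              # a refinement of coarse

print(abie_enhanced(coarse, dec_blocks))    # 1091/250
print(abie_enhanced(fine, dec_blocks))      # 0
```

In the refined partition every decision class is exactly definable, so each weight 1 - β′ is 0 and the measure vanishes.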

In the field of attribute reduction, there are algorithms based on the algebra view and algorithms based on the information view. Algorithms based on the algebra view can measure the completeness of knowledge effectively, but they are not ideal for measuring the size of knowledge. Conversely, the information view can measure the size of knowledge effectively, but it cannot measure the completeness of knowledge. Studying attribute reduction from the algebra view alone or from the information view alone is therefore one-sided. As can be seen from Definition 9, the information entropy enhanced by approximate boundary precision effectively combines the characteristics of the information view and the algebra view. Compared with the traditional information entropy model, the new model considers both the completeness and the size of knowledge, so the new information entropy can measure the uncertainty of the rough set comprehensively and objectively.

Theorem 3. Let DT = (U, C ∪ D, V, f) be a decision table. For every subset B of C and every attribute a ∈ C - B, the following conclusions hold: ${ABIE}^{'} (D | B) \geq {ABIE}^{'} (D | (B \cup a))$ (33) $ABIE (D | B) \geq ABIE (D | (B \cup a))$ (34)

Proof. The approximate boundary precision enhanced information entropy is made up of the approximate boundary precision and the enhanced information entropy, so the theorem is proved from the following two aspects:

(1) According to the definition of U/B and the partial ordering $\preceq$, we have $U / (B \cup a) \preceq U / B$. For every subset Y of U, the upper approximation cannot grow and the lower approximation cannot shrink when a is added, that is, $| B (Y)^{*} | \geq | (B \cup a) (Y)^{*} |$ and $| B (Y)_{*} | \leq | (B \cup a) (Y)_{*} |$. Hence $α_{B} (Y) = \frac{| B (Y)_{*} |}{| B (Y)^{*} |} \leq \frac{| (B \cup a) (Y)_{*} |}{| (B \cup a) (Y)^{*} |} = α_{B \cup a} (Y)$. By Definition 5 and Proposition 2, ${β^{'}}$ has the same monotonicity as α, so ${β^{'}}_{B} (Y) \leq {β^{'}}_{B \cup a} (Y)$.

So, it can be concluded that the value of ${β^{'}}_{B} (Y)$ is less than or equal to ${β^{'}}_{B \cup a} (Y)$.

(2) Let U/D be {Y₁, Y₂, ⋯ , Y_m}. By Theorem 2, we obtain the following: $\begin{aligned} & \sum_{j = 1}^{m} \sum_{X \in U / (B \cup a)} | X | \left[ 1 - \frac{| X \cap Y_{j} |}{2 | X |} \Big/ \left(1 - \frac{| X \cap Y_{j} |}{2 | X |}\right) \right] \\ \leq {} & \sum_{j = 1}^{m} \sum_{X \in U / B} | X | \left[ 1 - \frac{| X \cap Y_{j} |}{2 | X |} \Big/ \left(1 - \frac{| X \cap Y_{j} |}{2 | X |}\right) \right] \end{aligned}$

That is to say, RIE′ (D|B) is greater than or equal to RIE′ (D| (B ∪ a)). Similarly, by Theorem 1, we can get the result of RIE (D|B) ≥ RIE (D| (B ∪ a)).

In summary, Theorem 3 is correct.

4 The algorithm of attribute reduction

By Theorem 3, both ABIE′ (D|B) and ABIE (D|B) are non-strictly monotonic in B. Therefore, the approximate boundary precision enhanced information entropy (or relative information entropy) can be used to measure the importance of attributes, and a reduction set can be calculated on that basis.

Definition 10. Let DT = (U, C ∪ D, V, f) be a decision table and B ⊆ C. If ABIE′ (D|B) is equal to ABIE′ (D|C), and for every a ∈ B the value ABIE′ (D|B - {a}) is not equal to ABIE′ (D|C), then B is an attribute reduction set of C relative to D under the approximate boundary precision enhanced information entropy.

Definition 11. Let DT = (U, C ∪ D, V, f) be a decision table, B ⊆ C and ∀b ∈ C - B. The significance of relative information entropy of b in B is defined as

$Sig (b, B, D) = ABIE (D | B) - ABIE (D | B \cup \{b\})$ (35)

The significance of enhanced information entropy is defined as ${Sig}^{'} (b, B, D) = {ABIE}^{'} (D | B) - {ABIE}^{'} (D | B \cup \{b\})$ (36)
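The significance in (36) is simply the drop in ABIE′ when an attribute is added. The sketch below computes it on a toy decision table. It assumes the weight 1 - β′ of (32) and β′ from Definition 5; the function names, the row layout, and the table itself are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def partition(rows, attrs):
    """Blocks of row indices that agree on all attribute indices in attrs."""
    blocks = defaultdict(list)
    for idx, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(idx)
    return [set(b) for b in blocks.values()]

def abie_enhanced(rows, attrs, dec):
    """ABIE'(D|B) per eq. (32), with beta' = alpha (2 - alpha)."""
    cond, total = partition(rows, attrs), Fraction(0)
    for Y in partition(rows, [dec]):
        lower = sum(len(X) for X in cond if X <= Y)
        upper = sum(len(X) for X in cond if X & Y)
        a = Fraction(lower, upper)
        weight = 1 - a * (2 - a)
        for X in cond:
            p = Fraction(len(X & Y), len(X))
            total += weight * len(X) * (2 - 2 * p) / (2 - p)
    return total

def sig_enhanced(b, B, rows, dec):
    """Sig'(b, B, D) = ABIE'(D|B) - ABIE'(D|B ∪ {b}), eq. (36)."""
    return abie_enhanced(rows, B, dec) - abie_enhanced(rows, B + [b], dec)

# Hypothetical table: columns 0 and 1 are conditions, column 2 is the decision.
rows = [
    ("sunny", "hot", "no"),
    ("sunny", "hot", "no"),
    ("rainy", "hot", "yes"),
    ("rainy", "cool", "yes"),
    ("sunny", "cool", "yes"),
]
print(sig_enhanced(0, [], rows, 2))    # significance of attribute 0 alone
print(sig_enhanced(1, [0], rows, 2))   # significance of attribute 1 given {0}
```

In this toy table both significances are positive, so neither condition attribute is redundant.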

Computing the information entropy requires a large number of calculations of |X_i ∩ Y_j|, where X_i is a block of U/B and Y_j is a block of U/D. The code for computing X ∩ Y is as follows:

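// Intersection of X and Y by a single merge pass.
// Both inputs must list their elements in ascending order.
// Requires: using System;
char[] intersect(string X, string Y)
{
    char[] Z = new char[Math.Min(X.Length, Y.Length)];
    int i = 0, j = 0, k = 0;
    while (i < X.Length && j < Y.Length)
    {
        if (X[i] == Y[j]) { Z[k++] = X[i]; i++; j++; }  // common element
        else if (X[i] > Y[j]) j++;                       // advance the smaller side
        else i++;
    }
    Array.Resize(ref Z, k);  // keep only the k common elements
    return Z;
}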

In this way, the time complexity of computing the intersection of two sets stored in sorted order is reduced from |X||Y| to |X| + |Y|, where |X| is the cardinality of set X and |Y| is the cardinality of set Y.
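For readers who prefer a directly runnable form, the same merge-style intersection can be sketched in Python. It relies on both inputs being sorted in ascending order; the function name `intersect_sorted` and the sample lists are assumptions for this example.

```python
def intersect_sorted(xs, ys):
    """Intersect two ascending lists in O(len(xs) + len(ys)) comparisons."""
    out = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        if xs[i] == ys[j]:
            out.append(xs[i])   # common element: advance both cursors
            i += 1
            j += 1
        elif xs[i] > ys[j]:
            j += 1              # ys is behind: advance it
        else:
            i += 1              # xs is behind: advance it
    return out

# Hypothetical blocks given as sorted lists of object indices.
X = [1, 3, 5, 7, 9]
Y = [3, 4, 5, 8, 9, 10]
print(intersect_sorted(X, Y))   # [3, 5, 9]
```

Each loop iteration advances at least one cursor, which is where the |X| + |Y| bound comes from. If the inputs are not sorted, the result is not guaranteed to be the full intersection.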


Algorithm 1. The calculation of Approximation Boundary Precision Enhancement Information Entropy and Relative Information Entropy

Input: a decision table DT = (U, C ∪ D, V, f), a subset B ⊆ C, and the partitions U/B = {X₁, X₂, ⋯ , X_n} and U/D = {Y₁, Y₂, ⋯ , Y_m}.

Output: ABIE′ (D|B) and ABIE (D|B).

double ABIE′ = 0, ABIE = 0;

int type;

for (i from 1 to m) // m is the number of blocks of U/D
{
    int t1 = 0, t2 = 0, count = 0;
    double t3 = 0, t4 = 0;
    input type; // 1: enhanced information entropy, 2: relative information entropy
    for (j from 1 to n) // n is the number of blocks of U/B
    {
        count = count + |X_j|;
        temp = |intersect (X_j, Y_i)|;
        if (temp == |X_j|)
            t1 = t1 + |X_j|; // X_j belongs to the lower approximation of Y_i
        else if (temp == 0)
            t2 = t2 + |X_j|; // X_j lies outside the upper approximation of Y_i
        else
        {
            switch (type)
            {
            case 1: // compute enhanced information entropy
                $t3 = t3 + | X_{j} | \cdot \frac{2 (1 - temp / | X_{j} |)}{2 - temp / | X_{j} |}$;
                break;
            case 2: // compute relative information entropy
                if $(0 \leq \frac{temp}{| X_{j} |} < \frac{1}{2})$
                    $t4 = t4 + | X_{j} | \cdot \left[ 1 - \frac{temp}{| X_{j} |} \Big/ \left(1 - \frac{temp}{| X_{j} |}\right) \right]$;
                else
                    $t4 = t4 + | X_{j} | \cdot \left[ 1 - \left(1 - \frac{temp}{| X_{j} |}\right) \Big/ \frac{temp}{| X_{j} |} \right]$;
                break;
            }
        }
    }
    t2 = count - t2; // size of the upper approximation of Y_i
    $ABIE = ABIE + \left(1 - \frac{t1 / t2}{2 - t1 / t2}\right) \cdot t4$;
    ${ABIE}^{'} = {ABIE}^{'} + \left(1 - \frac{t1 / t2}{2 - t1 / t2}\right) \cdot t3$;
}

Algorithm 2. Information entropy reduction and relative information entropy reduction based on approximate boundary precision enhancement

Input: DT = (U, C, D, V, f);

Output: R.

Step 1. R = ∅ , C^′ = C;

Step 2. calculate U/C, U/D;

Step 3. calculate ABIE′ (D|C) and ABIE (D|C) ;

Step 4. while (1)

{ For every b_i ∈ C′ (1 ≤ i ≤ |C′|), calculate ${Sig}^{'} (b_{i}, R, D)$ and let ${Sig}^{'} (b_{k}, R, D) = \underset{b_{i} \in C^{'}}{Max} \, {Sig}^{'} (b_{i}, R, D)$ (for relative information entropy, calculate $Sig (b_{i}, R, D)$ and let $Sig (b_{k}, R, D) = \underset{b_{i} \in C^{'}}{Max} \, Sig (b_{i}, R, D)$). If several attributes attain the maximum importance, choose any one of them as b_k;

Let R = R ∪ {b_k} and C′ = C′ - {b_k};

if (ABIE′ (D|R) is equal to ABIE′ (D|C))

output the reduction R of enhanced information entropy and stop;

if (ABIE (D|R) is equal to ABIE (D|C))

output the reduction R of relative information entropy and stop. }

The main time cost of the algorithm is the calculation of information entropy. Whenever an attribute is added to the reduction set R, the approximate boundary precision information entropy must be recalculated. Because U/(B ∪ b) is calculated from U/B rather than from scratch, the time of this calculation is reduced to O (|U|). Since one of the most important attributes is added to the reduction set in every iteration, the worst-case time complexity of the algorithm is O (|C|²|U|).
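The greedy loop of Algorithm 2 can be sketched end to end for the enhanced entropy. The code below is a simplified illustration, not the authors' implementation: it recomputes every partition from scratch instead of refining U/B incrementally, so it does not achieve the O (|C|²|U|) bound. It also assumes the weight 1 - β′ of (32) with β′ from Definition 5. All names and the toy table are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def partition(rows, attrs):
    blocks = defaultdict(list)
    for idx, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(idx)
    return [set(b) for b in blocks.values()]

def abie_enhanced(rows, attrs, dec):
    """ABIE'(D|B) per eq. (32), with beta' = alpha (2 - alpha)."""
    cond, total = partition(rows, attrs), Fraction(0)
    for Y in partition(rows, [dec]):
        lower = sum(len(X) for X in cond if X <= Y)
        upper = sum(len(X) for X in cond if X & Y)
        a = Fraction(lower, upper)
        for X in cond:
            p = Fraction(len(X & Y), len(X))
            total += (1 - a * (2 - a)) * len(X) * (2 - 2 * p) / (2 - p)
    return total

def reduce_attributes(rows, cond_attrs, dec):
    """Greedily add the attribute with the largest Sig' until ABIE'(D|R) = ABIE'(D|C)."""
    target = abie_enhanced(rows, cond_attrs, dec)
    R, remaining = [], list(cond_attrs)
    while remaining and abie_enhanced(rows, R, dec) != target:
        current = abie_enhanced(rows, R, dec)
        # ties are broken by taking the first attribute that attains the maximum
        best = max(remaining,
                   key=lambda b: current - abie_enhanced(rows, R + [b], dec))
        R.append(best)
        remaining.remove(best)
    return R

# Hypothetical table: columns 0-2 are conditions (column 2 is constant),
# column 3 is the decision.
rows = [
    ("sunny", "hot", "x", "no"),
    ("sunny", "hot", "x", "no"),
    ("rainy", "hot", "x", "yes"),
    ("rainy", "cool", "x", "yes"),
    ("sunny", "cool", "x", "yes"),
]
print(reduce_attributes(rows, [0, 1, 2], 3))   # [0, 1]
```

The constant attribute never changes the partition, so its significance is 0 at every step and it is left out of the reduction.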

5 Experimental analysis

To verify the performance of the algorithms in this paper against other similar algorithms, we used VS2012 on Windows 7 as the development tool. The test machine has 4 GB of main memory and an Intel(R) Core(TM) i7-4712MQ processor (2 cores in all, each with a clock frequency of 2.3 GHz). The six data sets used in this paper were all downloaded from the UCI repository of machine learning databases: (1) the Ecoli data set, with 336 samples, 8 attributes and 8 classes; (2) the Chess King-Rook vs. King-Pawn (Chess) data set, with 3196 samples, 37 attributes and 2 classes; (3) the Tic-Tac-Toe endgame (Tic) data set, with 958 samples, 10 attributes and 2 classes; (4) the Mushroom (Mush) data set, with 8124 samples, 23 attributes and 2 classes; (5) the Zoo data set, with 101 samples, 17 attributes and 7 classes; (6) the Lymphography (Lymph) data set, with 148 samples, 19 attributes and 4 classes.


The proposed algorithms are compared with the following representative algorithms.

(1) Approximation decision entropy based method (ADEAR) [17]; (2) Relative fuzzy entropy based method (RFE) [16]; (3) Combination conditional entropy based method (CCE) [19]; (4) Conditional entropy based method (CE) [14].

The algorithms are compared on reduction results, classification accuracy and running time. Table 1 shows the reductions obtained by the different algorithms on the above six data sets, where RN is the cardinality of the minimum attribute reduction. The three algorithms CE, RFE and CCE compute attribute significance from information entropy alone, which degrades the reduction results. In contrast, ADEAR integrates approximate precision into information entropy, ABIE integrates approximate boundary precision into relative information entropy, and ABIE′ integrates approximate boundary precision into enhanced information entropy. These algorithms effectively combine the algebra view and the information view, and can therefore measure the importance of attributes comprehensively and objectively; they achieve better reduction results on the six data sets in Table 1. The running times of all the compared algorithms are also listed in Table 1.

Table 1
Comparison of reduction results (number of remaining attributes / running time (s))

DataSet	RN	CE RN₁	CE Time	RFE RN₂	RFE Time	CCE RN₃	CCE Time	ADEAR RN₄	ADEAR Time	ABIE RN₅	ABIE Time	ABIE′ RN₆	ABIE′ Time
Ecoli	5	6	2.63	6	2.25	6	0.145	5	0.036	5	0.038	5	0.029
Chess King-Rook vs. Pawn	29	30	279.12	30	127.95	29	18.67	29	1.65	29	1.69	29	1.58
Tic-Tac-Toe	8	8	8.73	8	6.76	8	0.36	8	0.101	8	0.103	8	0.089
Mushroom	4	5	300.22	5	166.92	5	24.88	5	1.80	5	1.88	4	1.67
Zoo	5	9	2.921	5	1.022	5	0.102	5	0.042	5	0.049	5	0.042
Lymphography	6	8	3.124	6	1.36	7	0.126	6	0.053	6	0.054	6	0.048

As shown in Table 1, the running time of ABIE is much lower than that of CE and RFE for the same reduction. ADEAR and ABIE both use efficient methods that save considerable system time, and their calculations of attribute importance are computationally equivalent, so there is no significant difference between them in time performance. ABIE′ has fewer judgment steps than ABIE. Moreover, by exploiting the magnification effect of enhanced information entropy, it finds the most important attributes quickly and speeds up the reduction, so its time performance is superior.

(2) To further compare the classification accuracy of ABIE′ with that of FABS [20], an algorithm based on information entropy and the positive domain, the reduction results of each algorithm are used as the input of an SVM classifier. To improve the classification accuracy of the SVM classifier, 25% of the data is used as the test set and 75% as the training set. The radial basis function is used as the kernel function, and the upper bound of the Lagrange multipliers and the radial basis parameters are adjusted. Ten-fold cross-validation is applied to evaluate the classification accuracy of all reductions. The results are shown in Fig. 1.
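The ten-fold evaluation protocol can be sketched as follows. This is only an illustrative outline of how samples are partitioned and scores averaged; it does not reproduce the SVM training of the experiments. The classifier is abstracted as a hypothetical callback `fit_and_score(train_idx, test_idx)` that returns an accuracy, and the function names are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k nearly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(n_samples, fit_and_score, k=10, seed=0):
    """Average the accuracy returned by fit_and_score over the k folds."""
    scores = [
        fit_and_score(train_idx, test_idx)
        for train_idx, test_idx in k_fold_indices(n_samples, k, seed)
    ]
    return sum(scores) / len(scores)
```

Each sample appears in exactly one test fold, so every object of the reduced decision table contributes once to the reported accuracy.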

Fig. 1. Comparison of classification accuracy.

According to the classification results in Fig. 1, the reduction results of the algorithm in this paper achieve relatively high classification accuracy on the Ecoli, Chess, Tic, Mush, Zoo and Lymph data sets. The classification accuracy of the CE reduction results is not ideal, because CE calculates attribute significance from information entropy alone. The reduction performance of FABS is relatively stable, but its classification accuracy is lower in most cases, because it considers only the distinguishing ability of positive-domain samples in the decision table and ignores negative-domain samples. In particular, when attribute importance is calculated with the positive region method, several condition attributes often have the same importance value. Since each calculation of attribute significance is relative, a randomly selected attribute affects the next attribute selection. By adopting the two entropy-magnifying methods of this paper and combining them with attribute importance based on approximate boundary accuracy in the algebra view, subtle differences among samples can be measured effectively. The algorithm in this paper therefore has an obvious advantage in subsequent classification work.

(3) To study the influence of outliers on the reduction, the Euclidean distance is used to calculate the spatial distance between sample objects, and the number of outliers is then determined using the Radial criterion. The numbers of outliers in the six data sets are as follows: Ecoli (4), Chess (188), Tic (0), Lymph (5), Zoo (2), Mush (156). If the rate of outliers is defined as the ratio of the number of outliers to the total number of samples, the rates for the data sets are as follows: Ecoli (11.9‰), Chess (58.8‰), Tic (0‰), Lymph (33.8‰), Zoo (19.8‰), Mush (19.2‰). Among the six data sets, the classification accuracy of ABIE′ is lower than that of FABS on Chess and Lymph. These two data sets have the highest rates of outliers, and the classification performance of ABIE′ is affected by the rate of outliers. By contrast, FABS is less affected by the rate of outliers, so its classification accuracy is higher than that of ABIE′ on these data sets. To improve the performance of ABIE′, it is necessary to identify the outliers and minimize their impact before attribute reduction.
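The outlier rates quoted above follow directly from the outlier counts and the sample sizes listed at the start of Section 5. A small Python check of this arithmetic, in which the dictionary and function names are illustrative only:

```python
# Sample sizes from the data set description in Section 5.
samples = {"Ecoli": 336, "Chess": 3196, "Tic": 958,
           "Mush": 8124, "Zoo": 101, "Lymph": 148}
# Outlier counts as reported in the text.
outliers = {"Ecoli": 4, "Chess": 188, "Tic": 0,
            "Mush": 156, "Zoo": 2, "Lymph": 5}


def outlier_rate_permille(name):
    """Rate of outliers = number of outliers / total samples, in per mille."""
    return 1000.0 * outliers[name] / samples[name]


rates = {name: round(outlier_rate_permille(name), 1) for name in samples}
# rates == {"Ecoli": 11.9, "Chess": 58.8, "Tic": 0.0,
#           "Mush": 19.2, "Zoo": 19.8, "Lymph": 33.8}
```

Chess and Lymph have the two highest rates, which matches the data sets on which ABIE′ falls behind FABS.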

6 Conclusion

Attribute reduction is one of the core topics of rough set theory. Most researchers design reduction algorithms from only the algebra view or only the information view. This paper proposes a method for calculating attribute importance that integrates the characteristics of both views and achieves higher classification accuracy. In addition, by reusing the result of U/B when calculating U/(B ∪ b), the time for calculating equivalence classes is greatly reduced and system resources are saved.

The next step is to consider how to reduce the computational cost while maintaining high classification accuracy, and to further improve the time performance of the algorithm by combining the computation with modern multi-core processors.

Acknowledgments

This work is supported by the visiting study project at home and abroad of Anhui Province of China (gxfx2017100), by the Natural Science Foundation of Anhui Province (1308085MF101), and by the natural science foundation of Anhui Province (KJZ2013Z231, KJ2016A502).

References

1. Pawlak and Skowron, Rough sets: Some extensions, Inform Sci 117(1) (2007), 28–40.

2. Pawlak, Rough sets, Int J Comput Inform Sci 11(5) (1982), 341–356.

3. Zhang and Wei, Rules acquisition and attribute reduction of ordered formal decision contexts, PR AI 29(11) (2016), 976–984.

4. C.C. Huang, Tzu-Liang Tseng, F.H. Jiang, et al., Rough set theory: A novel approach for extraction of robust decision rules based on incremental attributes, Ann Oper Res 216(1) (2014), 163–189.

5. Kim, Y.Y. Chu, J.Z. Watada, et al., A DNA-based algorithm for minimizing decision rules: A rough sets approach, IEEE Trans Nanobiosci 10(3) (2011), 139–151.

6. Tao, Jialin, Xudong and Houhe, Research of aircraft generator fault diagnostic decision based on attribute reduction in variable precision rough sets, Appl Res Comput 34(4) (2017), 1101–1104.

7. Ying and Gaopen, Coal mine safety risk prediction by RS-SVM combined model, J China Univ Mining Technol 46(2) (2017), 423–429.

8. Baohua and Shiyi, Fast attribute reduction algorithm based on row storage, Pattern Recogn Artif Intell 8(9) (2015), 795–801.

9. Bowen, Guohe, Weijiang, et al., Attribute reduction method based on enhanced positive region, Appl Res Comput 34(1) (2017), 107–109.

10. Taotao, Fumin and Tengfei, Incremental algorithm for attribute reduction based on positive region and discernibility element, Comput Eng 42(8) (2016), 183–187, 193.

11. Hao and Chao, Incremental updating algorithm for attribute reduction based on improved discernibility matrix, Comput Sci 42(6) (2015), 251–255.

12. Yu, Attribute reduction with rough set based on discernibility information tree, Control Decision 30(8) (2015), 1531–1536.

13. C.E. Shannon, The mathematical theory of communication, Bell Syst Tech J 27(3-4) (1948), 373–423.

14. Guoying, Hong and Dachun, Decision table reduction based on conditional information entropy, Chin J Comput 25(7) (2002), 759–776.

15. Ming, Approximate reduction based on conditional information entropy in decision tables, Acta Electron Sin 35(11) (2007), 2156–2160.

16. Qinghua and Yu, New attribute reduction on information entropy, J Front Comput Sci Technol 7(4) (2013), 359–367.

17. Feng, Shasha, Junwei, et al., Attribute reduction based on approximation decision entropy, Control Decision 30(1) (2015), 65–70.

18. J.R. Vergara and P.A. Estevez, A review of feature selection methods based on mutual information, Neural Comput Appl 24(1) (2014), 175–186.

19. Yuhua, Jiye, Pedrycz, et al., Positive approximation: An accelerator for attribute reduction in rough set theory, Artif Intell 174(9) (2010), 597–618.

20. Yu, Yintian and Chao, Fast algorithm for attribute reduction based on bucket sort, Control Decision 26(2) (2011), 207–212.