A granulated fuzzy rough set and its measures

Abstract

Rough set theory is an efficient mathematical tool for representing and reasoning about uncertainty in information systems. The measures of traditional rough sets apply to discrete-valued information systems but are not suitable for real-valued data sets. In this paper, we introduce a distance matrix to granulate real-valued data and propose a granulated fuzzy rough set model, which combines fuzziness and roughness in a rough set theoretical framework. By constructing a fuzzy similar relation in distance-matrix form, real-valued data sets can be dealt with directly. We also define some operations on the fuzzy relations and fuzzy granules. Furthermore, two kinds of measures of fuzzy granules are proposed: an information entropy measure and an information granularity measure. These measures are calculated through a novel representation based on a fuzzy granule matrix. As a result, uniform representations of fuzzy rough sets and their information measures are formed in this work.

Keywords

Granular computing, fuzzy rough sets, uncertainty measures, information entropy, information granularity

1 Introduction

The theory of rough sets [46], proposed by Pawlak in 1982, is a valid mathematical tool for dealing with imprecise, uncertain and large-scale data in information systems. The basic idea of rough sets is to use two approximation sets to describe a vague object set. The two approximation sets, named the upper and lower approximations, consist of knowledge granules induced by equivalence relations with respect to some attributes. The theory of rough sets has been widely used in many fields, such as feature selection [27], gene selection [29], attribute reduction [5], big data mining [23], approximate reasoning [17], granular computing [33], image segmentation [26] and classification [43].

To cope with the wide existence of vague data, many uncertainty measures have been proposed for rough sets. Pawlak [47] presented accuracy and roughness for evaluating a rough set in an information system, and also introduced approximation accuracy and approximation roughness for measuring a rough classification in a decision system. Pawlak's uncertainty measures are not fine-grained enough, since two different sets of equivalence classes may have the same accuracy or roughness. Therefore, many researchers have proposed other measures to overcome this deficiency from different perspectives, including information quality [25], information entropy [11], rough set entropy, approximate quality [20], knowledge granularity [12], etc. Since these measures aim to evaluate equivalence relations and equivalence classes, other authors have presented uncertainty measures and applications for neighborhood rough set models [18, 31, 42, 44], probabilistic rough set models [10, 35], covering rough set models [9, 39], cost-sensitive rough set models [15] and fuzzy rough set models [21, 28, 32].

Pawlak rough set theory is mainly applicable to information systems with discrete data. Widely existing continuous data must first be discretized. However, this preprocessing results in a loss of information, which reduces classification accuracy. The rough set model has therefore been extended to the fuzzy set domain by replacing the equivalence relation with a fuzzy equivalence relation. The usefulness of the fuzzy rough set model is evident from its applications in feature selection [1, 7, 40], gene selection [30], attribute reduction [8, 13, 37], hybrid data reduction [34], mining stock price [3, 41], rule extraction [2, 16], decision trees [22, 36], cancer classification [38] and image classification [4]. A fuzzy equivalence relation, which satisfies reflexivity, symmetry and transitivity, is similar to a general equivalence relation. In practice, there is a more general relation that satisfies only reflexivity and symmetry; it is called a similarity relation. Distances between objects can describe their similarity. In this paper, we propose a novel fuzzy similarity relation based on granulation by a distance matrix. The fuzzy similarity relation groups objects into classes with fuzzy boundaries that depend on their similarity, as determined by their distances. The fuzzy similar classes can be seen as fuzzy similar granules, which are close to the human decision process: we also tend to granulate data to simplify and memorize information in our cognition. Furthermore, a distance-based fuzzy rough set model is proposed by building upper and lower approximations with fuzzy similar granules. Since a similarity relation is not the same as an equivalence relation, the classical uncertainty measurement tools and methods are not applicable to fuzzy knowledge classification systems. Considering the characteristics of vague and fuzzy data, we use the fuzzy rough set model to granulate those data. Moreover, we study the uncertainty measures in the fuzzy rough set model. After introducing several measures in classical rough sets, we propose natural extensions of these uncertainty measures for information systems.

The remaining sections are structured as follows. Section 2 introduces rough sets and several of their measures. In Section 3, a granulated fuzzy rough set model is introduced. We then propose two uncertainty measures in this model, an information entropy-based measure and an information granularity-based measure, and prove some theorems associated with them. In Section 4, to help clarify the concepts of information entropy and information granularity, we carry out some experiments in decision systems. Finally, Section 5 concludes the paper with some discussions and remarks.

2 Preliminaries

In this section, some basic concepts of Pawlak rough sets are recalled, which can be found in [46] and [47]. These definitions mainly include the equivalence relation and the upper and lower approximations. We also introduce some uncertainty measures in Pawlak rough sets.

2.1 Pawlak rough sets

Information systems are formalized representations of practical application systems. Generally, an information system is expressed by a quadruple IS = (U, A, V, f), where U = {x ₁, x ₂,…, x _m} is a set of m samples; A = {a ₁,…, a _n} is a set of n attributes or features; V = ⋃ _a∈A V _a is the union of the value domains V _a of the attributes a ∈ A; and f : U × A → V is a mapping that gives the value f (x, a) of each sample x on each attribute a.

For any attribute subset P ⊆ A, an equivalence relation is represented by IND (P) in the following:

IND (P) = {(x, y) ∈ U × U| ∀ p ∈ P, f (x, p) = f (y, p)}.

IND (P) satisfies reflexivity, symmetry and transitivity. The partition of U induced by IND (P) is denoted by U/IND (P) or U/P. We define the partition by:

U/P = {[x _i] _P : x _i ∈ U},

where [x _i] _P represents the equivalence class containing the sample x _i. The elements of an equivalence class are equivalent to each other. Equivalence classes, which are also called information granules, elementary sets or blocks, are used to describe arbitrary subsets of U.

Definition 1. [46] Given an information system IS = (U, A, V, f) and an attribute subset P ⊆ A, for any sample subset X ⊆ U, the lower and upper approximations of X on attribute subset P are defined by:

P _* (X) = {x _i ∈ U| [x _i] _P ⊆ X},

P* (X) = {x _i ∈ U| [x _i] _P ∩ X ≠ ∅},

where [x _i] _P represents an equivalence class on P of a sample x _i.

The lower approximation consists of the samples whose equivalence classes are contained in X, while the upper approximation consists of the samples whose equivalence classes intersect X. The tuple <P _* (X), P* (X)> is called a rough set of X if the lower approximation is not equal to the upper approximation. If they are equal, the rough set degenerates into a crisp set.

Definition 2. [46] Given an information system IS = (U, A, V, f) and an attribute subset P ⊆ A, for any sample subset X ⊆ U, the positive region, negative region and boundary region of X on attribute subset P are defined by: $\begin{matrix} {POS}_{P} (X) = P_{*} (X), \\ {NEG}_{P} (X) = U - P^{*} (X), \\ {BND}_{P} (X) = P^{*} (X) - P_{*} (X) . \end{matrix}$

2.2 Uncertainty measures in Pawlak rough sets

In rough sets, uncertainty measures fall into two groups: algebraic measures and entropy-based measures. The algebraic measures include accuracy, approximation accuracy, roughness and approximation roughness. The entropy-based measures are mainly information entropy, conditional entropy and mutual information.

The accuracy measures the imprecision of a rough set by the ratio of the cardinality of the lower approximation to that of the upper approximation. The roughness is the complement of the accuracy, obtained by subtracting it from 1.

Definition 3. [47] Given an information system IS = (U, A, V, f), for any object subset X ⊆ U and any attribute subset P ⊆ A, the accuracy and roughness of X on attribute subset P are defined as follows: $\begin{matrix} α_{P} (X) = \frac{| P_{*} (X) |}{| P^{*} (X) |}, \\ ρ_{P} (X) = 1 - α_{P} (X) . \end{matrix}$
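As an illustration of Definitions 1 to 3, the following Python sketch computes the partition U/P, the two approximations, the three regions, and the accuracy and roughness of a target set. The table, the attribute names and the helper names (`partition`, `approximations`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict

def partition(universe, table, attrs):
    """Group objects that agree on every attribute in attrs (the classes of U/P)."""
    blocks = defaultdict(set)
    for x in universe:
        key = tuple(table[x][a] for a in attrs)
        blocks[key].add(x)
    return list(blocks.values())

def approximations(universe, table, attrs, target):
    """Return the (lower, upper) approximations of target with respect to attrs."""
    lower, upper = set(), set()
    for block in partition(universe, table, attrs):
        if block <= target:   # the class is contained in X
            lower |= block
        if block & target:    # the class intersects X
            upper |= block
    return lower, upper

# Hypothetical toy information system, not taken from the paper.
U = ["x1", "x2", "x3", "x4", "x5"]
table = {
    "x1": {"a": 0, "b": 1},
    "x2": {"a": 0, "b": 1},
    "x3": {"a": 1, "b": 0},
    "x4": {"a": 1, "b": 0},
    "x5": {"a": 1, "b": 1},
}
X = {"x1", "x2", "x3"}

lower, upper = approximations(U, table, ["a", "b"], X)
positive = lower               # POS_P(X)
negative = set(U) - upper      # NEG_P(X)
boundary = upper - lower       # BND_P(X)
accuracy = len(lower) / len(upper)
roughness = 1 - accuracy
```

In this toy system the classes are {x1, x2}, {x3, x4} and {x5}, so the lower approximation is {x1, x2}, the upper approximation is {x1, x2, x3, x4}, and the accuracy is 2/4 = 0.5.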

Definition 4. [47] For an information system IS = (U, A, V, f), if A = C ∪ D, where C is the set of conditional attributes and D is the decision attribute, we call it a decision system, denoted by DS = (U, C ∪ D, V, f).

Definition 5. [47] Let DS = (U, C ∪ D, V, f) be a decision system, and let U/D = {D ₁, D ₂,…, D _m} be the equivalence classes induced by the decision attribute D on the universe U. For any conditional attribute subset B ⊆ C, the approximation accuracy of U/D by B is defined as $α_{B} (U / D) = \frac{\sum_{D_{i} \in U / D} | B_{*} (D_{i}) |}{\sum_{D_{i} \in U / D} | B^{*} (D_{i}) |}$ .

The approximation roughness is obtained by subtracting the approximation accuracy from 1: $ρ_{B} (U / D) = 1 - α_{B} (U / D)$ .
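A minimal sketch of Definition 5, assuming the approximation accuracy is the ratio of the summed lower-approximation sizes to the summed upper-approximation sizes over all decision classes. The partitions below are hypothetical and given directly as lists of sets; `approximation_accuracy` is an illustrative name.

```python
def approximation_accuracy(cond_blocks, decision_classes):
    """Sum of |B_*(D_i)| divided by sum of |B^*(D_i)| over all decision classes."""
    lower_total = 0
    upper_total = 0
    for d in decision_classes:
        lower_total += sum(len(b) for b in cond_blocks if b <= d)
        upper_total += sum(len(b) for b in cond_blocks if b & d)
    return lower_total / upper_total

# Hypothetical partitions of U = {1, ..., 6}.
cond_blocks = [{1, 2}, {3, 4}, {5, 6}]       # U/B
decision_classes = [{1, 2, 3}, {4, 5, 6}]    # U/D

alpha = approximation_accuracy(cond_blocks, decision_classes)
rho = 1 - alpha
```

Each decision class here has a lower approximation of size 2 and an upper approximation of size 4, so the approximation accuracy is 4/8 = 0.5.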

The measures of accuracy and roughness are used for evaluating uncertainties of information systems, while the approximation accuracy and approximation roughness are used for evaluating uncertainties of decision systems. However, these measures are not fine-grained enough: Liang et al. [24] showed that two different partitions may yield the same accuracy or roughness.

In 1948, Shannon [6] originally proposed the information theory to measure the uncertainty of a channel transmission. Düntsch [19], Yao [45] and Miao [11] developed various entropy measures for evaluating uncertainties of attributes or features of information systems.

Definition 6. [11, 19] Suppose IS = (U, A, V, f) is an information system. For any subset P ⊆ A and U/P = {X ₁, X ₂,…, X _n}, the information entropy of P is defined as follows:

$H (P) = - \sum_{i = 1}^{n} p (X_{i}) \log p (X_{i}) = - \sum_{i = 1}^{n} \frac{| X_{i} |}{| U |} \log \frac{| X_{i} |}{| U |}$ , where $p (X_{i}) = \frac{| X_{i} |}{| U |}$ and | · | denotes the cardinality of a set.

Definition 7. [14, 45] Given an information system IS = (U, A, V, f), for any two subsets P, Q ⊆ A, U/P = {X ₁, X ₂,…, X _n} and U/Q = {Y ₁, Y ₂,…, Y _m}, the joint entropy of P and Q is defined by: $H (PQ) = - \sum_{i = 1}^{n} \sum_{j = 1}^{m} p (X_{i} Y_{j}) log p (X_{i} Y_{j}) = - \sum_{i = 1}^{n} \sum_{j = 1}^{m} \frac{| X_{i} \cap Y_{j} |}{| U |} log \frac{| X_{i} \cap Y_{j} |}{| U |}$ , where $p (X_{i} Y_{j}) = \frac{| X_{i} \cap Y_{j} |}{| U |}$ .

Definition 8. [14, 45] Suppose IS = (U, A, V, f) is an information system. For any subset P, Q ⊆ A, U/P = {X ₁, X ₂,…, X _n} and U/Q = {Y ₁, Y ₂,…, Y _m}, the conditional entropy of Q on attribute subset P is defined by:

$H (Q | P) = - \sum_{i = 1}^{n} p (X_{i}) \sum_{j = 1}^{m} p (Y_{j} | X_{i}) log p (Y_{j} | X_{i}) = - \sum_{i = 1}^{n} \frac{| X_{i} |}{| U |} \sum_{j = 1}^{m} \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} log \frac{| X_{i} \cap Y_{j} |}{| X_{i} |} = - \sum_{i = 1}^{n} \sum_{j = 1}^{m} \frac{| X_{i} \cap Y_{j} |}{| U |} log \frac{| X_{i} \cap Y_{j} |}{| X_{i} |}$ , where $p (Y_{j} | X_{i}) = \frac{| X_{i} \cap Y_{j} |}{| X_{i} |}$ .
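Definitions 6 to 8 can be sketched in a few lines of Python. The partitions are hypothetical, the natural logarithm is an assumption (the definitions do not fix a base), and empty intersections are skipped under the usual convention 0 log 0 = 0. The last lines check the standard identity H(Q|P) = H(PQ) - H(P) on this example.

```python
import math

def entropy(partition, n):
    """H(P): entropy of a partition of a universe with n objects."""
    return -sum(len(b) / n * math.log(len(b) / n) for b in partition)

def joint_entropy(part_p, part_q, n):
    """H(PQ): entropy of the pairwise intersections of the two partitions."""
    total = 0.0
    for xb in part_p:
        for yb in part_q:
            k = len(xb & yb)
            if k:  # empty intersections contribute nothing
                total -= k / n * math.log(k / n)
    return total

def conditional_entropy(part_q, part_p, n):
    """H(Q | P): entropy of Q inside each block of P, weighted by block size."""
    total = 0.0
    for xb in part_p:
        for yb in part_q:
            k = len(xb & yb)
            if k:
                total -= k / n * math.log(k / len(xb))
    return total

# Hypothetical partitions of U = {1, ..., 6}.
n = 6
P = [{1, 2}, {3, 4}, {5, 6}]
Q = [{1, 2, 3}, {4, 5, 6}]

h_p = entropy(P, n)
h_pq = joint_entropy(P, Q, n)
h_q_given_p = conditional_entropy(Q, P, n)
identity_gap = abs(h_q_given_p - (h_pq - h_p))
```

Only the block {3, 4} of P is split by Q, so the conditional entropy reduces to (1/3) log 2.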

3 A granulated fuzzy rough set and its measures

Traditional rough set theory is used to tackle categorical data in information systems. Numerical data must be discretized before a machine learning task, but this preprocessing leads to information loss. In what follows, a granulated fuzzy rough set model is presented to deal with numerical (real-valued) data in a knowledge representation system. The fuzzy similar relation is built from a distance matrix, and similar granules are then obtained by a granulating method. Furthermore, several uncertainty measurement tools are proposed to measure fuzzy relations or fuzzy granules at a finer level.

3.1 Fuzzy granulating with a distance matrix

The granulating idea of Pawlak rough sets is founded on equivalence classes and equivalence relations. In this section, we will introduce a distance matrix to describe a similarity between two objects. Then, we construct a fuzzy similar relation from the distance matrix.

Definition 9. Suppose IS = (U, A, V, f) is an information system. For an object x _i ∈ U and an attribute a _j ∈ A, the values of the information system are normalized by:

$v^{'} (x_{i}, a_{j}) = \frac{v (x_{i}, a_{j}) - \min (v (x_{k}, a_{j}))}{\max (v (x_{k}, a_{j})) - \min (v (x_{k}, a_{j}))}$ , where v (x _i, a _j) is the value of object x _i on attribute a _j, max (v (x _k, a _j)) is the maximal value on a _j, and min (v (x _k, a _j)) is the minimal value on a _j. For simplicity, all the values of the following information systems are normalized.
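The normalization of Definition 9 is a per-attribute min-max scaling. A minimal sketch follows; the raw column is hypothetical, and mapping a constant column to 0 is an assumption of this sketch, since the definition leaves that case (zero denominator) open.

```python
def min_max_normalize(column):
    """Scale the values of one attribute into [0, 1] as in Definition 9."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant attribute carries no information, map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

raw = [2.0, 4.0, 10.0]            # hypothetical raw values of one attribute
normalized = min_max_normalize(raw)
```

The smallest value maps to 0, the largest to 1, and 4.0 maps to (4 - 2) / (10 - 2) = 0.25.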

Definition 10. Given an information system IS = (U, A, V, f), for a subset B ⊆ A, a distance matrix of B, denoted by D (B), is defined by: $D (B) = (\begin{matrix} d_{11} & d_{12} & \cdots & d_{1 n} \\ d_{21} & d_{22} & \cdots & d_{2 n} \\ \vdots & \vdots & \ddots & \vdots \\ d_{n 1} & d_{n 2} & \cdots & d_{nn} \end{matrix})$ where n = |U| here denotes the number of objects and d _ij is the distance between two objects x _i and x _j on B (d _ij ∈ [0, 1] once the matrix is normalized as in Definition 13). A distance metric is a function mapping a pair of objects to a non-negative numerical value.

Definition 11. Suppose x _i and x _j are two objects in an information system IS = (U, A, V, f). For a subset B ⊆ A, the distance metric d _ij on B is generally calculated as follows:

$d_{ij} = (\sum_{k = 1}^{m} | f (x_{i}, a_{k}) - f (x_{j}, a_{k}) |^{s})^{1 / s}$ , where m = |B|, a _k ∈ B, and f (x _i, a _k) is the value of object x _i on attribute a _k. The parameter s selects the distance: s = 1 gives the Manhattan distance and s = 2 gives the Euclidean distance.

Example 1. Suppose IS = (U, A, V, f) is a normalized information system. It includes an object set U = {x ₁, x ₂, x ₃, x ₄} and an attribute set A = {a, b, c}, as shown in Table 1.

Table 1
An information system.

U	a	b	c
x ₁	0	1	1
x ₂	0.21	0.86	0
x ₃	0.49	0.1	0.26
x ₄	1	0	0.19

We use the Euclidean distance to granulate these objects in the information system. Three distance matrices of the attribute sets R = {a}, S = {a, b} and T = {a, b, c} are achieved as follows: $\begin{matrix} D (R) = (\begin{matrix} 0 & 0.21 & 0.49 & 1 \\ 0.21 & 0 & 0.28 & 0.79 \\ 0.49 & 0.28 & 0 & 0.51 \\ 1 & 0.79 & 0.51 & 0 \end{matrix}) \end{matrix}$ $\begin{matrix} D (S) = (\begin{matrix} 0 & 0.25 & 1.02 & 1.41 \\ 0.25 & 0 & 0.81 & 1.17 \\ 1.02 & 0.81 & 0 & 0.52 \\ 1.41 & 1.17 & 0.52 & 0 \end{matrix}) \end{matrix}$ $\begin{matrix} D (T) = (\begin{matrix} 0 & 1.03 & 1.26 & 1.63 \\ 1.03 & 0 & 0.85 & 1.18 \\ 1.26 & 0.85 & 0 & 0.53 \\ 1.63 & 1.18 & 0.53 & 0 \end{matrix}) \end{matrix}$
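The matrices of Example 1 can be recomputed with a short Python sketch of Definitions 10 and 11. The function names `minkowski` and `distance_matrix` are illustrative, and attributes are addressed by column index (0 for a, 1 for b, 2 for c), which is an assumption of this sketch.

```python
def minkowski(u, v, s=2):
    """Distance of Definition 11 between two value vectors (s=2: Euclidean)."""
    return sum(abs(a - b) ** s for a, b in zip(u, v)) ** (1.0 / s)

def distance_matrix(rows, cols, s=2):
    """Pairwise distances between objects, restricted to the attribute columns cols."""
    sub = [[row[c] for c in cols] for row in rows]
    return [[minkowski(p, q, s) for q in sub] for p in sub]

# Table 1: rows are x1..x4, columns are the attributes a, b, c.
table1 = [
    [0.0, 1.0, 1.0],
    [0.21, 0.86, 0.0],
    [0.49, 0.1, 0.26],
    [1.0, 0.0, 0.19],
]

D_R = distance_matrix(table1, [0])        # R = {a}
D_S = distance_matrix(table1, [0, 1])     # S = {a, b}
D_T = distance_matrix(table1, [0, 1, 2])  # T = {a, b, c}
```

Because each added attribute contributes a non-negative term to the sum, the entries can only grow from D (R) to D (S) to D (T), which is the monotonicity stated in Theorem 1.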

Definition 12. For two distance matrices D (P) and D (Q), suppose p _ij and q _ij are elements of D (P) and D (Q) respectively. If p _ij ≤ q _ij for all i and j, we say that the distance matrix D (P) is not greater than D (Q), denoted by D (P) ≤ D (Q).

Theorem 1. Suppose IS = (U, A, V, f) is an information system. For P, Q ⊆ A, D (P), D (Q) are two distance matrices on P, Q. If P ⊆ Q, then D (P) ≤ D (Q).

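Proof. Since P ⊆ Q, we can write Q = P ∪ R with R = Q - P, and order the attributes so that a ₁,…, a _m are those of P and a ₁,…, a _n are those of Q, where m = |P| ≤ n = |Q|. According to Definition 11, we have $p_{ij} = (\sum_{k = 1}^{m} | f (x_{i}, a_{k}) - f (x_{j}, a_{k}) |^{s})^{1 / s}$ and $q_{ij} = (\sum_{k = 1}^{n} | f (x_{i}, a_{k}) - f (x_{j}, a_{k}) |^{s})^{1 / s}$ . The sum in q _ij contains every term of the sum in p _ij plus n - m further non-negative terms, and the map t ↦ t^{1/s} is non-decreasing on non-negative t for s > 0. Therefore, p _ij ≤ q _ij for all i and j. By Definition 12, we have D (P) ≤ D (Q).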

Definition 13. Given an information system IS = (U, A, V, f), for B ⊆ A, let D (A) and D (B) be the distance matrices on A and B. The normalized matrix D′ (B) is given by:

D′ (B) = D (B) / max (D (A)), where max (D (A)) is a constant, namely the maximal element of D (A). For simplicity, all the distance matrices in what follows are normalized.

Example 2. From the Example 1, we know that the max (D (T)) is 1.63, so the above three distance matrices are normalized as follows: $\begin{matrix} D (R) = (\begin{matrix} 0 & 0.13 & 0.3 & 0.61 \\ 0.13 & 0 & 0.17 & 0.48 \\ 0.3 & 0.17 & 0 & 0.31 \\ 0.61 & 0.48 & 0.31 & 0 \end{matrix}) \end{matrix}$ $\begin{matrix} D (S) = (\begin{matrix} 0 & 0.15 & 0.63 & 0.7 \\ 0.15 & 0 & 0.5 & 0.72 \\ 0.63 & 0.5 & 0 & 0.32 \\ 0.7 & 0.72 & 0.32 & 0 \end{matrix}) \end{matrix}$ $\begin{matrix} D (T) = (\begin{matrix} 0 & 0.63 & 0.77 & 1 \\ 0.63 & 0 & 0.52 & 0.73 \\ 0.77 & 0.52 & 0 & 0.33 \\ 1 & 0.73 & 0.33 & 0 \end{matrix}) \end{matrix}$
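Definition 13 amounts to a single elementwise division. The sketch below applies it to the printed D (R) of Example 1 with the constant 1.63 from Example 2. `normalize` is an illustrative helper name, and the rounding to two decimals is only there to match the printed precision.

```python
def normalize(D, max_value):
    """Divide every entry of a distance matrix by the constant max(D(A))."""
    return [[d / max_value for d in row] for row in D]

# D(R) as printed in Example 1 (before normalization).
D_R_raw = [
    [0.0, 0.21, 0.49, 1.0],
    [0.21, 0.0, 0.28, 0.79],
    [0.49, 0.28, 0.0, 0.51],
    [1.0, 0.79, 0.51, 0.0],
]
MAX_D_T = 1.63  # maximal element of D(T), taken from Example 2

D_R_norm = [[round(v, 2) for v in row] for row in normalize(D_R_raw, MAX_D_T)]
```

The rounded result agrees entry by entry with the normalized D (R) printed in Example 2.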

Definition 14. Given an information system IS = (U, A, V, f), for a subset B ⊆ A, let D (B) be the normalized distance matrix on B, whose element d _ij represents the distance between two objects x _i and x _j on B. The fuzzy similar relation $\tilde{B}$ on B is represented by the matrix $M (\tilde{B})$ , which is defined by: $M (\tilde{B}) = (\begin{matrix} r_{11} & r_{12} & \cdots & r_{1 n} \\ r_{21} & r_{22} & \cdots & r_{2 n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n 1} & r_{n 2} & \cdots & r_{nn} \end{matrix})$ where r _ij = 1 - d _ij ∈ [0, 1] is the similarity value between two objects x _i and x _j on B.

The distance matrix describes how far apart two objects are, while the fuzzy similar relation matrix describes how similar they are, so the two matrices are directly related. Consequently, we construct a fuzzy similar relation matrix from a normalized distance matrix by the formula $M (\tilde{B}) = 1 - D (B)$ , applied elementwise. Since d _ii = 0 and d _ij = d _ji, $\tilde{B}$ is a fuzzy similar relation, that is, for all x, y ∈ U it satisfies: 1) Reflexivity: $\tilde{B} (x, x) = 1$ ; 2) Symmetry: $\tilde{B} (x, y) = \tilde{B} (y, x)$ .
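A minimal sketch of this construction, using the normalized D (R) printed in Example 2. The function name `similarity_from_distance` is illustrative; the two flags check reflexivity and symmetry directly on the resulting matrix.

```python
def similarity_from_distance(D):
    """Elementwise r_ij = 1 - d_ij for a normalized distance matrix."""
    return [[1.0 - d for d in row] for row in D]

# Normalized D(R) as printed in Example 2.
D_R = [
    [0.0, 0.13, 0.3, 0.61],
    [0.13, 0.0, 0.17, 0.48],
    [0.3, 0.17, 0.0, 0.31],
    [0.61, 0.48, 0.31, 0.0],
]

M_R = similarity_from_distance(D_R)
n = len(M_R)
is_reflexive = all(M_R[i][i] == 1.0 for i in range(n))
is_symmetric = all(M_R[i][j] == M_R[j][i] for i in range(n) for j in range(n))
```

Rounded to two decimals, the first row of `M_R` is 1, 0.87, 0.7, 0.39, matching the matrix of $\tilde{R}$ in Example 3.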

For two fuzzy similar relations $\tilde{P}$ and $\tilde{Q}$ , the complement, intersection, union and inclusion operations are defined, for all x, y ∈ U, by: 1) $\tilde{R} = \neg \tilde{P} \Leftrightarrow \tilde{R} (x, y) = 1 - \tilde{P} (x, y)$ , 2) $\tilde{R} = \tilde{P} \cap \tilde{Q} \Leftrightarrow \tilde{R} (x, y) = \min \{ \tilde{P} (x, y), \tilde{Q} (x, y) \}$ , 3) $\tilde{R} = \tilde{P} \cup \tilde{Q} \Leftrightarrow \tilde{R} (x, y) = \max \{ \tilde{P} (x, y), \tilde{Q} (x, y) \}$ , 4) $\tilde{P} \subseteq \tilde{Q} \Leftrightarrow \tilde{P} (x, y) \leq \tilde{Q} (x, y)$ .

These operations can be interpreted as extensions of fuzzy set-theoretic operations to fuzzy similar relations. Different distance metrics between objects may produce different fuzzy similar relations, and the above operations may be used to combine them. It can be easily verified that these operations satisfy the following properties: 1) Commutativity: $\tilde{P} \cap \tilde{Q} = \tilde{Q} \cap \tilde{P}, \tilde{P} \cup \tilde{Q} = \tilde{Q} \cup \tilde{P}$ ; 2) Associativity: $(\tilde{P} \cap \tilde{Q}) \cap \tilde{R} = \tilde{P} \cap (\tilde{Q} \cap \tilde{R})$ , $(\tilde{P} \cup \tilde{Q}) \cup \tilde{R} = \tilde{P} \cup (\tilde{Q} \cup \tilde{R})$ ; 3) De Morgan's laws: $\neg (\tilde{P} \cap \tilde{Q}) = \neg \tilde{P} \cup \neg \tilde{Q}$ , $\neg (\tilde{P} \cup \tilde{Q}) = \neg \tilde{P} \cap \neg \tilde{Q}$ ; 4) Double negation law: $\neg \neg \tilde{P} = \tilde{P}$ .
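The four operations and two of the listed properties can be checked numerically. The sketch below represents a relation as a list of rows; the two 2 × 2 matrices are hypothetical, with values chosen to be exactly representable in binary so that the equality checks are exact.

```python
def complement(P):
    """Elementwise 1 - P(x, y)."""
    return [[1.0 - p for p in row] for row in P]

def intersection(P, Q):
    """Elementwise minimum of two relation matrices."""
    return [[min(p, q) for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def union(P, Q):
    """Elementwise maximum of two relation matrices."""
    return [[max(p, q) for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def is_subset(P, Q):
    """True when P(x, y) <= Q(x, y) for every pair (x, y)."""
    return all(p <= q for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

# Hypothetical relation matrices on a two-object universe.
P = [[1.0, 0.25], [0.25, 1.0]]
Q = [[1.0, 0.5], [0.5, 1.0]]

de_morgan_holds = complement(intersection(P, Q)) == union(complement(P), complement(Q))
double_negation_holds = complement(complement(P)) == P
p_in_q = is_subset(P, Q)
```

Representing a relation by its matrix keeps every operation a single elementwise pass, which is also how the granule operations of the later definitions can be computed row by row.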

Theorem 2. Given an information system IS = (U, A, V, f), for P, Q ⊆ A, the $\tilde{P}$ , $\tilde{Q}$ are two fuzzy similar relations on P, Q. If P ⊆ Q, then $\tilde{P} \supseteq \tilde{Q}$ .

Proof. Since P ⊆ Q, Theorem 1 gives D (P) ≤ D (Q). By Definition 14, r _ij = 1 - d _ij, so $M (\tilde{P}) \geq M (\tilde{Q})$ elementwise, that is, $\tilde{P} (x, y) \geq \tilde{Q} (x, y)$ for all x, y ∈ U. Therefore, $\tilde{P} \supseteq \tilde{Q}$ .

An equivalence relation corresponds to a partition of the domain of an information system, and the partition generates a family of equivalence classes. A similar relation instead induces some intersecting blocks, which constitute a cover. Correspondingly, a fuzzy similar relation induces a fuzzy cover of the domain and some fuzzy similar classes. The fuzzy similar classes are called fuzzy similar granules.

Definition 15. Given an information system IS = (U, A, V, f), for a subset B ⊆ A, the $\tilde{B}$ is a fuzzy similar relation on B. A fuzzy cover of the domain induced by a fuzzy similar relation is defined by:

$\frac{U}{\tilde{B}} = \{ [x_{i}]_{\tilde{B}} \}_{i = 1}^{n}$ , where $[x_{i}]_{\tilde{B}} = (r_{i 1} / x_{1}) + (r_{i 2} / x_{2}) + \cdots + (r_{in} / x_{n})$ . Here $[x_{i}]_{\tilde{B}}$ is the fuzzy similar class of x _i, also called a fuzzy similar granule; r _ij is the similarity value between x _i and x _j; and "+" denotes the union of elements. The cardinality of a fuzzy similar granule $[x_{i}]_{\tilde{B}}$ is defined by:

$| [x_{i}]_{\tilde{B}} | = \sum_{j = 1}^{n} r_{ij}$ .

Example 3. From the Example 2, we can construct fuzzy similar relation matrices as follows: $\begin{matrix} M (\tilde{R}) = (\begin{matrix} 1 & 0.87 & 0.7 & 0.39 \\ 0.87 & 1 & 0.83 & 0.52 \\ 0.7 & 0.83 & 1 & 0.69 \\ 0.39 & 0.52 & 0.69 & 1 \end{matrix}) \end{matrix}$ $\begin{matrix} M (\tilde{S}) = (\begin{matrix} 1 & 0.85 & 0.37 & 0.3 \\ 0.85 & 1 & 0.5 & 0.28 \\ 0.37 & 0.5 & 1 & 0.68 \\ 0.3 & 0.28 & 0.68 & 1 \end{matrix}) \end{matrix}$ $\begin{matrix} M (\tilde{T}) = (\begin{matrix} 1 & 0.37 & 0.23 & 0 \\ 0.37 & 1 & 0.48 & 0.27 \\ 0.23 & 0.48 & 1 & 0.67 \\ 0 & 0.27 & 0.67 & 1 \end{matrix}) \end{matrix}$

The fuzzy similar granules induced by $\tilde{R}$ are then:

$[x_{1}]_{\tilde{R}} = {\frac{1}{x_{1}} + \frac{0.87}{x_{2}} + \frac{0.7}{x_{3}} + \frac{0.39}{x_{4}}}$ , $[x_{2}]_{\tilde{R}} = {\frac{0.87}{x_{1}} + \frac{1}{x_{2}} + \frac{0.83}{x_{3}} + \frac{0.52}{x_{4}}}$ , $[x_{3}]_{\tilde{R}} = {\frac{0.7}{x_{1}} + \frac{0.83}{x_{2}} + \frac{1}{x_{3}} + \frac{0.69}{x_{4}}}$ , $[x_{4}]_{\tilde{R}} = {\frac{0.39}{x_{1}} + \frac{0.52}{x_{2}} + \frac{0.69}{x_{3}} + \frac{1}{x_{4}}}$ .
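Since the granule of x _i is simply row i of the relation matrix, its cardinality is a row sum. A minimal sketch for the matrix of $\tilde{R}$ in Example 3 follows; the helper name `cardinality` is illustrative and the rounding only removes floating-point noise.

```python
# Fuzzy similar relation matrix of R from Example 3; row i is the granule of x_i.
M_R = [
    [1.0, 0.87, 0.7, 0.39],
    [0.87, 1.0, 0.83, 0.52],
    [0.7, 0.83, 1.0, 0.69],
    [0.39, 0.52, 0.69, 1.0],
]

def cardinality(granule):
    """Cardinality of a fuzzy similar granule: the sum of its membership values."""
    return sum(granule)

cards = [round(cardinality(row), 2) for row in M_R]
```

The four cardinalities are 2.96, 3.22, 3.22 and 2.6, the same values that appear as numerators in Example 5.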

Definition 16. Suppose $[x_{i}]_{\tilde{A}}$ and $[x_{i}]_{\tilde{B}}$ are two fuzzy similar granules. The complement, intersection, union and inclusion operations between these granules are defined by:

1) $\neg [x_{i}]_{\tilde{A}} = (1 - a_{i 1}) / x_{1} + (1 - a_{i 2}) / x_{2} + \cdots + (1 - a_{in}) / x_{n} = \sum_{j = 1}^{n} \frac{1 - a_{ij}}{x_{j}}$ ,

2) $[x_{i}]_{\tilde{A}} \cap [x_{i}]_{\tilde{B}} = \min (a_{i 1}, b_{i 1}) / x_{1} + \min (a_{i 2}, b_{i 2}) / x_{2} + \cdots + \min (a_{in}, b_{in}) / x_{n} = \sum_{j = 1}^{n} \frac{a_{ij} \land b_{ij}}{x_{j}}$ ,

3) $[x_{i}]_{\tilde{A}} \cup [x_{i}]_{\tilde{B}} = \max (a_{i 1}, b_{i 1}) / x_{1} + \max (a_{i 2}, b_{i 2}) / x_{2} + \cdots + \max (a_{in}, b_{in}) / x_{n} = \sum_{j = 1}^{n} \frac{a_{ij} \lor b_{ij}}{x_{j}}$ ,

4) $[x_{i}]_{\tilde{A}} \subseteq [x_{i}]_{\tilde{B}} \Leftrightarrow \forall j \in \{ 1, 2, \ldots, n \}, a_{ij} \leq b_{ij} \Leftrightarrow \sum_{j = 1}^{n} \frac{a_{ij}}{x_{j}} \leq \sum_{j = 1}^{n} \frac{b_{ij}}{x_{j}}$ ,

where a _ij is the similarity value between x _i and x _j on $\tilde{A}$ , b _ij is the similarity value between x _i and x _j on $\tilde{B}$ , and ∑ represents the union of elements.

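Definition 17. Let $π_{A} = \frac{U}{\tilde{A}}$ and $π_{B} = \frac{U}{\tilde{B}}$ be the fuzzy covers induced by two fuzzy similar relations $\tilde{A}$ and $\tilde{B}$ . We say π _A is thinner (finer) than π _B if it satisfies the following partial order: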

$π_{A} ⪯ π_{B} \Leftrightarrow \forall [x_{i}]_{\tilde{A}} \in π_{A}, \exists [x_{i}]_{\tilde{B}} : [x_{i}]_{\tilde{A}} \subseteq [x_{i}]_{\tilde{B}}$ .

Theorem 3. Suppose IS = (U, A, V, f) is an information system. For P, Q ⊆ A, let $\tilde{P}$ , $\tilde{Q}$ be two fuzzy similar relations on P, Q, we have

$Q \subseteq P \Rightarrow \frac{U}{\tilde{P}} ⪯ \frac{U}{\tilde{Q}}$ .

Proof. Since Q ⊆ P, Theorem 2 gives $\tilde{P} \subseteq \tilde{Q}$ , that is, p _ij ≤ q _ij for all i and j. By Definition 16, we obtain $[x_{i}]_{\tilde{P}} \subseteq [x_{i}]_{\tilde{Q}}$ for every x _i ∈ U. Therefore, by Definition 17, $\frac{U}{\tilde{P}} ⪯ \frac{U}{\tilde{Q}}$ .

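Definition 18. Given an information system IS = (U, A, V, f) and an attribute subset P ⊆ A, let $\tilde{P}$ be a fuzzy similar relation on P and $\tilde{X}$ be a fuzzy similar granule. The lower and upper approximation fuzzy granules of $\tilde{X}$ related to $\tilde{P}$ are defined as follows:

${\tilde{P}}_{*} (\tilde{X}) = \{ \min (\neg [x]_{\tilde{P}} \lor \tilde{X}), x \in U \}$ ,

${\tilde{P}}^{*} (\tilde{X}) = \{ \max ([x]_{\tilde{P}} \land \tilde{X}), x \in U \}$ ,

where ∨ and ∧ are taken elementwise (the maximum and minimum of the membership values) and the outer min and max run over all objects. In terms of membership values, with r _ij the similarity of x _i and x _j under $\tilde{P}$ , the object x _i belongs to the lower approximation with degree $\min_{j} \max (1 - r_{ij}, \tilde{X} (x_{j}))$ and to the upper approximation with degree $\max_{j} \min (r_{ij}, \tilde{X} (x_{j}))$ .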

The tuple $< {\tilde{P}}_{*} (\tilde{X}), {\tilde{P}}^{*} (\tilde{X}) >$ is called a rough fuzzy set or a rough fuzzy granule, if the lower approximation fuzzy granule is not equal to the upper approximation fuzzy granule.

Example 4. Suppose a fuzzy similar granule is $\tilde{X} = {\frac{1}{x_{1}} + \frac{0}{x_{2}} + \frac{1}{x_{3}} + \frac{1}{x_{4}}}$ , and a fuzzy similar relation matrix is $M (\tilde{R})$ in the Example 3. According to the definition of lower approximation fuzzy granule, we calculate

${\tilde{R}}_{*} (\tilde{X}) = \{ \min (\neg [x]_{\tilde{R}} \lor \tilde{X}), x \in U \} = \frac{r_{1}}{x_{1}} + \frac{r_{2}}{x_{2}} + \frac{r_{3}}{x_{3}} + \frac{r_{4}}{x_{4}}$ as follows:

r ₁ = min {0 ∨ 1, 0.13 ∨ 0, 0.3 ∨ 1, 0.61 ∨ 1} = min {1, 0.13, 1, 1} = 0.13;

r ₂ = min {0.13 ∨ 1, 0 ∨ 0, 0.17 ∨ 1, 0.48 ∨ 1} = min {1, 0, 1, 1} = 0;

r ₃ = min {0.3 ∨ 1, 0.17 ∨ 0, 0 ∨ 1, 0.31 ∨ 1} = min {1, 0.17, 1, 1} = 0.17;

r ₄ = min {0.61 ∨ 1, 0.48 ∨ 0, 0.31 ∨ 1, 0 ∨ 1} = min {1, 0.48, 1, 1} = 0.48.

Then, the upper approximation fuzzy granule is obtained from the formula ${\tilde{R}}^{*} (\tilde{X}) = \{ \max ([x]_{\tilde{R}} \land \tilde{X}), x \in U \} = \frac{s_{1}}{x_{1}} + \frac{s_{2}}{x_{2}} + \frac{s_{3}}{x_{3}} + \frac{s_{4}}{x_{4}}$ , which is calculated as follows:

s ₁ = max {1 ∧ 1, 0.87 ∧ 0, 0.7 ∧ 1, 0.39 ∧ 1} = max {1, 0, 0.7, 0.39} = 1;

s ₂ = max {0.87 ∧ 1, 1 ∧ 0, 0.83 ∧ 1, 0.52 ∧ 1} = max {0.87, 0, 0.83, 0.52} = 0.87;

s ₃ = max {0.7 ∧ 1, 0.83 ∧ 0, 1 ∧ 1, 0.69 ∧ 1} = max {0.7, 0, 1, 0.69} = 1;

s ₄ = max {0.39 ∧ 1, 0.52 ∧ 0, 0.69 ∧ 1, 1 ∧ 1} = max {0.39, 0, 0.69, 1} = 1.

Consequently, we have ${\tilde{R}}_{*} (\tilde{X}) = \frac{0.13}{x_{1}} + \frac{0}{x_{2}} + \frac{0.17}{x_{3}} + \frac{0.48}{x_{4}}$ and ${\tilde{R}}^{*} (\tilde{X}) = \frac{1}{x_{1}} + \frac{0.87}{x_{2}} + \frac{1}{x_{3}} + \frac{1}{x_{4}}$ .
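The computation of Example 4 can be reproduced with the following Python sketch, which reads the approximations row by row from the relation matrix. The function names are illustrative, and the final rounding only removes floating-point noise from 1 - r _ij.

```python
def lower_approximation(M, X):
    """Degree of x_i in the lower approximation: min over j of max(1 - r_ij, X_j)."""
    return [min(max(1.0 - r, x) for r, x in zip(row, X)) for row in M]

def upper_approximation(M, X):
    """Degree of x_i in the upper approximation: max over j of min(r_ij, X_j)."""
    return [max(min(r, x) for r, x in zip(row, X)) for row in M]

# Relation matrix of R from Example 3 and the granule X of Example 4.
M_R = [
    [1.0, 0.87, 0.7, 0.39],
    [0.87, 1.0, 0.83, 0.52],
    [0.7, 0.83, 1.0, 0.69],
    [0.39, 0.52, 0.69, 1.0],
]
X = [1.0, 0.0, 1.0, 1.0]

lower = [round(v, 2) for v in lower_approximation(M_R, X)]
upper = [round(v, 2) for v in upper_approximation(M_R, X)]
```

The result is `lower = [0.13, 0.0, 0.17, 0.48]` and `upper = [1.0, 0.87, 1.0, 1.0]`, in agreement with the values derived by hand above.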

Definition 19. Given an information system IS = (U, A, V, f) and an attribute subset P ⊆ A, let $\tilde{P}$ be a fuzzy similar relation on P and $\tilde{X}$ be a fuzzy similar granule. The positive, negative and boundary regions of $\tilde{X}$ related to $\tilde{P}$ are defined as follows:

${POS}_{\tilde{P}} (\tilde{X}) = {\tilde{P}}_{*} (\tilde{X})$ ,

${NEG}_{\tilde{P}} (\tilde{X}) = ⌝ {\tilde{P}}^{*} (\tilde{X})$ ,

${BND}_{\tilde{P}} (\tilde{X}) = {\tilde{P}}^{*} (\tilde{X}) - {\tilde{P}}_{*} (\tilde{X})$ .

3.2 Information entropy measures of fuzzy similar granules

Information entropy was originally proposed by Shannon [6]. It quantifies how much information an event carries: in general, the more uncertain or random the event is, the more information it contains. Information entropy thus expresses disorder or uncertainty, and in Pawlak rough sets it measures the uncertainty of an attribute set or an equivalence relation. In what follows, it is applied to evaluate fuzzy similar relations and fuzzy similar granules. Furthermore, we present several of its extended forms.

Definition 20. Given an information system IS = (U, A, V, f), for P ⊆ A, the $\tilde{P}$ is a fuzzy similar relation induced by P. The $[x_{i}]_{\tilde{P}}$ is a fuzzy similar granule generated by $\tilde{P}$ . The information entropy of $\tilde{P}$ is defined by:

$H (\tilde{P}) = - \frac{1}{| U |} \sum_{i = 1}^{| U |} log \frac{| [x_{i}]_{\tilde{P}} |}{| U |}$ .

Definition 21. Given an information system IS = (U, A, V, f), for P, Q ⊆ A, the $\tilde{P}, \tilde{Q}$ are two fuzzy similar relations induced by P, Q, respectively. The $[x_{i}]_{\tilde{P}}$ and $[x_{i}]_{\tilde{Q}}$ are fuzzy similar granules generated by $\tilde{P}$ and $\tilde{Q}$ . The joint entropy of $\tilde{P}$ and $\tilde{Q}$ is defined by:

$H (\tilde{P} \tilde{Q}) = - \frac{1}{| U |} \sum_{i = 1}^{| U |} log \frac{| [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} |}{| U |}$ .

Definition 22. Given an information system IS = (U, A, V, f), for P, Q ⊆ A, the $\tilde{P}, \tilde{Q}$ are two fuzzy similar relations induced by P, Q, respectively. The $[x_{i}]_{\tilde{P}}$ and $[x_{i}]_{\tilde{Q}}$ are fuzzy similar granules generated by $\tilde{P}$ and $\tilde{Q}$ . The conditional entropy of $\tilde{P}$ conditioned to $\tilde{Q}$ is defined by:

$H (\tilde{P} | \tilde{Q}) = - \frac{1}{| U |} \sum_{i = 1}^{| U |} log \frac{| [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} |}{| [x_{i}]_{\tilde{Q}} |}$ .

Theorem 4. $H (\tilde{P} | \tilde{Q}) = H (\tilde{P} \tilde{Q}) - H (\tilde{Q})$ .

Proof. From Definitions 20, 21 and 22, we have $H (\tilde{P} \tilde{Q}) - H (\tilde{Q}) = - \frac{1}{| U |} \sum_{i = 1}^{| U |} \log \frac{| [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} |}{| U |} - (- \frac{1}{| U |} \sum_{i = 1}^{| U |} \log \frac{| [x_{i}]_{\tilde{Q}} |}{| U |}) = - \frac{1}{| U |} \sum_{i = 1}^{| U |} (\log \frac{| [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} |}{| U |} - \log \frac{| [x_{i}]_{\tilde{Q}} |}{| U |}) = - \frac{1}{| U |} \sum_{i = 1}^{| U |} \log \frac{| [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} |}{| [x_{i}]_{\tilde{Q}} |} = H (\tilde{P} | \tilde{Q})$ . Therefore, the theorem is proved.
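Theorem 4 can also be checked numerically. The sketch below implements Definitions 20 to 22 on relation matrices and compares both sides of the identity for the matrices of $\tilde{R}$ and $\tilde{S}$ from Example 3. The base-10 logarithm is an assumption, chosen because it is consistent with the figures of Example 5; the identity itself holds for any base. All function names are illustrative.

```python
import math

def entropy(M):
    """Information entropy of a fuzzy similar relation matrix (Definition 20)."""
    n = len(M)
    return -sum(math.log10(sum(row) / n) for row in M) / n

def intersection_cardinalities(P, Q):
    """Cardinality of the intersection of the granules of x_i, for every row i."""
    return [sum(min(p, q) for p, q in zip(rp, rq)) for rp, rq in zip(P, Q)]

def joint_entropy(P, Q):
    """Joint entropy of two relation matrices (Definition 21)."""
    n = len(P)
    return -sum(math.log10(c / n) for c in intersection_cardinalities(P, Q)) / n

def conditional_entropy(P, Q):
    """Conditional entropy H(P | Q) (Definition 22)."""
    n = len(P)
    cards = intersection_cardinalities(P, Q)
    return -sum(math.log10(c / sum(rq)) for c, rq in zip(cards, Q)) / n

M_R = [
    [1.0, 0.87, 0.7, 0.39],
    [0.87, 1.0, 0.83, 0.52],
    [0.7, 0.83, 1.0, 0.69],
    [0.39, 0.52, 0.69, 1.0],
]
M_S = [
    [1.0, 0.85, 0.37, 0.3],
    [0.85, 1.0, 0.5, 0.28],
    [0.37, 0.5, 1.0, 0.68],
    [0.3, 0.28, 0.68, 1.0],
]

lhs = conditional_entropy(M_S, M_R)           # H(S | R)
rhs = joint_entropy(M_S, M_R) - entropy(M_R)  # H(S R) - H(R)
```

Both sides evaluate to about 0.08 for these matrices. In the other direction, `conditional_entropy(M_R, M_S)` is 0, because every entry of the matrix of $\tilde{S}$ is at most the corresponding entry of the matrix of $\tilde{R}$ .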

Theorem 5. Given an information system IS = (U, A, V, f), for P, Q ⊆ A, let $\tilde{P}, \tilde{Q}$ be the fuzzy similar relations induced by P and Q, respectively. If P ⊆ Q, then $H (\tilde{P}) \leq H (\tilde{Q})$ .

Proof. Since P ⊆ Q, Theorem 2 gives $\tilde{P} \supseteq \tilde{Q}$ . By Definition 16, we obtain $[x_{i}]_{\tilde{P}} \supseteq [x_{i}]_{\tilde{Q}}$ and hence $| [x_{i}]_{\tilde{P}} | \geq | [x_{i}]_{\tilde{Q}} |$ for every x _i ∈ U. Because the logarithm is increasing, $- \frac{1}{| U |} \sum_{i = 1}^{| U |} \log \frac{| [x_{i}]_{\tilde{P}} |}{| U |} \leq - \frac{1}{| U |} \sum_{i = 1}^{| U |} \log \frac{| [x_{i}]_{\tilde{Q}} |}{| U |}$ . By Definition 20, we obtain $H (\tilde{P}) \leq H (\tilde{Q})$ .

Example 5. Consider the two fuzzy similar relation matrices $M (\tilde{R})$ and $M (\tilde{S})$ of Example 3, where R ⊂ S. With logarithms taken to base 10, the information entropies of $\tilde{R}$ and $\tilde{S}$ are calculated by

$H (\tilde{R}) = - \frac{1}{4} (\log \frac{1 + 0.87 + 0.7 + 0.39}{4} + \log \frac{0.87 + 1 + 0.83 + 0.52}{4} + \log \frac{0.7 + 0.83 + 1 + 0.69}{4} + \log \frac{0.39 + 0.52 + 0.69 + 1}{4}) = - \frac{1}{4} (\log \frac{2.96}{4} + \log \frac{3.22}{4} + \log \frac{3.22}{4} + \log \frac{2.6}{4}) = - \frac{1}{4} (- 0.1308 - 0.0942 - 0.0942 - 0.1871) = 0.1266$ ,

$H (\tilde{S}) = - \frac{1}{4} (\log \frac{1 + 0.85 + 0.37 + 0.3}{4} + \log \frac{0.85 + 1 + 0.5 + 0.28}{4} + \log \frac{0.37 + 0.5 + 1 + 0.68}{4} + \log \frac{0.3 + 0.28 + 0.68 + 1}{4}) = - \frac{1}{4} (\log \frac{2.52}{4} + \log \frac{2.63}{4} + \log \frac{2.55}{4} + \log \frac{2.26}{4}) = - \frac{1}{4} (- 0.2007 - 0.1821 - 0.1955 - 0.2480) = 0.2066$ .

Therefore, $H (\tilde{R}) < H (\tilde{S})$ .
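The two entropies of Example 5 can be recomputed with a few lines of Python. The function name `fuzzy_entropy` is illustrative, and `math.log10` is used because base-10 logarithms reproduce the intermediate values printed in the example.

```python
import math

def fuzzy_entropy(M):
    """Definition 20: minus the mean of log10(|granule of x_i| / |U|) over all x_i."""
    n = len(M)
    return -sum(math.log10(sum(row) / n) for row in M) / n

# Relation matrices of R and S from Example 3.
M_R = [
    [1.0, 0.87, 0.7, 0.39],
    [0.87, 1.0, 0.83, 0.52],
    [0.7, 0.83, 1.0, 0.69],
    [0.39, 0.52, 0.69, 1.0],
]
M_S = [
    [1.0, 0.85, 0.37, 0.3],
    [0.85, 1.0, 0.5, 0.28],
    [0.37, 0.5, 1.0, 0.68],
    [0.3, 0.28, 0.68, 1.0],
]

H_R = fuzzy_entropy(M_R)
H_S = fuzzy_entropy(M_S)
```

`H_R` comes out at approximately 0.1266 and `H_S` at approximately 0.2066, so the entropy grows when the attribute set grows from R to S, as Theorem 5 predicts.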

Theorem 6. Given an information system IS = (U, A, V, f), the $\tilde{P}, \tilde{Q}, \tilde{R}$ are three fuzzy similar relations generated by P, Q, R ⊆ A, respectively. If P ⊆ Q, then $H (\tilde{P} | \tilde{R}) \leq H (\tilde{Q} | \tilde{R})$ .

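Proof. By Theorem 4, $H (\tilde{P} | \tilde{R}) = H (\tilde{P} \tilde{R}) - H (\tilde{R})$ and $H (\tilde{Q} | \tilde{R}) = H (\tilde{Q} \tilde{R}) - H (\tilde{R})$ . Since P ⊆ Q, Theorem 2 gives $\tilde{P} \supseteq \tilde{Q}$ , hence $| [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{R}} | \geq | [x_{i}]_{\tilde{Q}} \cap [x_{i}]_{\tilde{R}} |$ for every x _i ∈ U, and Definition 21 yields $H (\tilde{P} \tilde{R}) \leq H (\tilde{Q} \tilde{R})$ . Subtracting $H (\tilde{R})$ from both sides gives the result.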

Theorem 7. 1) $H (\tilde{P} \tilde{Q}) \geq \max \{ H (\tilde{P}), H (\tilde{Q}) \}$ . 2) $P \subseteq Q \Leftrightarrow H (\tilde{P} \tilde{Q}) = H (\tilde{Q})$ . 3) $P \subseteq Q \Leftrightarrow H (\tilde{P} | \tilde{Q}) = 0$ .

Proof. 1) From Definitions 15 and 16, we obtain $[x_{i}]_{\tilde{P}} = \{ (p_{i 1} / x_{1}) + (p_{i 2} / x_{2}) + \ldots + (p_{in} / x_{n}) \}$ and $[x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} = \{ \min (p_{i 1}, q_{i 1}) / x_{1} + \min (p_{i 2}, q_{i 2}) / x_{2} + \ldots + \min (p_{in}, q_{in}) / x_{n} \}$ , hence $| [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} | \leq | [x_{i}]_{\tilde{P}} |$ and $| [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} | \leq | [x_{i}]_{\tilde{Q}} |$ . According to Definitions 20 and 21, we have $H (\tilde{P} \tilde{Q}) \geq H (\tilde{P})$ and $H (\tilde{P} \tilde{Q}) \geq H (\tilde{Q})$ . Therefore, $H (\tilde{P} \tilde{Q}) \geq \max \{ H (\tilde{P}), H (\tilde{Q}) \}$ . 2) This follows directly from Definitions 20 and 21. 3) This follows from 2).
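Part 1) can be sanity-checked numerically. The sketch below rests on several assumptions: the joint entropy of Definition 21 is taken to have the same form as Definition 20 with $| [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} |$ in place of $| [x_{i}]_{\tilde{P}} |$ (as the proof suggests), logarithms are base 10, and the matrix `T` is purely hypothetical and not taken from the paper. Only `R` comes from Example 5.

```python
import math

# Relation matrix from Example 5.
R = [
    [1.00, 0.87, 0.70, 0.39],
    [0.87, 1.00, 0.83, 0.52],
    [0.70, 0.83, 1.00, 0.69],
    [0.39, 0.52, 0.69, 1.00],
]
# Hypothetical second relation (illustration only): symmetric, unit diagonal,
# and not comparable with R entry by entry.
T = [
    [1.00, 0.90, 0.20, 0.60],
    [0.90, 1.00, 0.40, 0.30],
    [0.20, 0.40, 1.00, 0.80],
    [0.60, 0.30, 0.80, 1.00],
]


def granule_cardinalities(matrix):
    """Cardinality of each fuzzy granule: the sum of its membership degrees."""
    return [sum(row) for row in matrix]


def joint_cardinalities(p, q):
    """Cardinality of each intersected granule, using min as the intersection."""
    return [
        sum(min(a, b) for a, b in zip(row_p, row_q))
        for row_p, row_q in zip(p, q)
    ]


def entropy(cards):
    """Assumed entropy form: -(1/|U|) * sum_i log10(card_i / |U|)."""
    n = len(cards)
    return -sum(math.log10(c / n) for c in cards) / n


h_r = entropy(granule_cardinalities(R))
h_t = entropy(granule_cardinalities(T))
h_rt = entropy(joint_cardinalities(R, T))
print(h_rt >= max(h_r, h_t))
```

Because the element-wise minimum can never exceed either argument, every intersected granule is no larger than the two granules it comes from, which is exactly what drives the inequality.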

Theorem 8. Let DS = (U, C ∪ D, V, f) be a decision system. For any conditional attribute subset B, let $\tilde{B}, \tilde{C}, \tilde{D}$ be three fuzzy similar relations generated by B, C, D, respectively. If B ⊆ C, then $H (\tilde{B} | \tilde{D}) \leq H (\tilde{C} | \tilde{D})$ .

Proof. This follows directly from Theorem 6.

3.3 Information granularity measures of fuzzy similar granules

Information granularity, also called knowledge granularity, was first proposed in [12]. However, that measure is not suitable for fuzzy data. In this subsection, a new information granularity is proposed to evaluate the uncertainty of a fuzzy similar relation or of fuzzy similar granules.

Definition 23. Given an information system IS = (U, A, V, f) and P ⊆ A, let $\tilde{P}$ be the fuzzy similar relation generated by P, and let $[x_{i}]_{\tilde{P}}$ be a fuzzy similar granule generated by $\tilde{P}$ . The information granularity of $\tilde{P}$ is defined by:

$G (\tilde{P}) = \sum_{i = 1}^{| U |} \frac{| [x_{i}]_{\tilde{P}} |}{| U |^{2}}$ .

Definition 24. Given an information system IS = (U, A, V, f) and P, Q ⊆ A, let $\tilde{P}, \tilde{Q}$ be the two fuzzy similar relations induced by P and Q, respectively, and let $[x_{i}]_{\tilde{P}}$ and $[x_{i}]_{\tilde{Q}}$ be the fuzzy similar granules generated by $\tilde{P}$ and $\tilde{Q}$ . The joint granularity of $\tilde{P}$ and $\tilde{Q}$ is defined by:

$G (\tilde{P} \tilde{Q}) = \sum_{i = 1}^{| U |} \frac{| [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} |}{| U |^{2}}$ .

Definition 25. Given an information system IS = (U, A, V, f) and P, Q ⊆ A, let $\tilde{P}, \tilde{Q}$ be the two fuzzy similar relations generated by P and Q, respectively, and let $[x_{i}]_{\tilde{P}}$ and $[x_{i}]_{\tilde{Q}}$ be the fuzzy similar granules generated by $\tilde{P}$ and $\tilde{Q}$ . The conditional granularity of $\tilde{P}$ conditioned on $\tilde{Q}$ is defined by:

$G (\tilde{P} | \tilde{Q}) = \frac{1}{| U |^{2}} \sum_{i = 1}^{| U |} (| [x_{i}]_{\tilde{Q}} | - | [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} |)$ .

Theorem 9. $G (\tilde{P} | \tilde{Q}) = G (\tilde{Q}) - G (\tilde{P} \tilde{Q})$ .

Proof. From the Definitions 23, 24 and 25, we know that $G (\tilde{Q}) - G (\tilde{P} \tilde{Q}) = \sum_{i = 1}^{| U |} \frac{| [x_{i}]_{\tilde{Q}} |}{| U |^{2}} - \sum_{i = 1}^{| U |} \frac{| [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} |}{| U |^{2}} = \frac{1}{| U |^{2}} \sum_{i = 1}^{| U |} (| [x_{i}]_{\tilde{Q}} | - | [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} |) = G (\tilde{P} | \tilde{Q})$ . Therefore, the theorem is proved.
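Definitions 23 to 25 and the identity of Theorem 9 translate almost directly into code. The following is a minimal sketch, not the authors' implementation: the function names are ours, granules are represented as matrix rows, and the two matrices are the ones written out in Example 5.

```python
import math

# Relation matrices as written out in Example 5.
R = [
    [1.00, 0.87, 0.70, 0.39],
    [0.87, 1.00, 0.83, 0.52],
    [0.70, 0.83, 1.00, 0.69],
    [0.39, 0.52, 0.69, 1.00],
]
S = [
    [1.00, 0.85, 0.37, 0.30],
    [0.85, 1.00, 0.50, 0.28],
    [0.37, 0.50, 1.00, 0.68],
    [0.30, 0.28, 0.68, 1.00],
]


def granularity(p):
    """Definition 23: sum of granule cardinalities divided by |U|^2."""
    n = len(p)
    return sum(sum(row) for row in p) / n**2


def joint_granularity(p, q):
    """Definition 24: same as above, on the min-intersected granules."""
    n = len(p)
    return sum(
        min(a, b) for row_p, row_q in zip(p, q) for a, b in zip(row_p, row_q)
    ) / n**2


def conditional_granularity(p, q):
    """Definition 25: G(P|Q), computed directly from the granules."""
    n = len(p)
    total = 0.0
    for row_p, row_q in zip(p, q):
        card_q = sum(row_q)
        card_joint = sum(min(a, b) for a, b in zip(row_p, row_q))
        total += card_q - card_joint
    return total / n**2


# Theorem 9: G(P|Q) = G(Q) - G(PQ), checked in both directions.
for p, q in ((R, S), (S, R)):
    lhs = conditional_granularity(p, q)
    rhs = granularity(q) - joint_granularity(p, q)
    assert math.isclose(lhs, rhs, abs_tol=1e-12)

g_r_given_s = conditional_granularity(R, S)
g_s_given_r = conditional_granularity(S, R)
print(round(g_r_given_s, 4), round(g_s_given_r, 4))  # 0.0 0.1275
```

Every entry of `R` is at least as large as the corresponding entry of `S`, so intersecting the two relations returns `S` itself. That is why the first value is zero while the second equals $G (\tilde{R}) - G (\tilde{S})$.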

Theorem 10. Given an information system IS = (U, A, V, f), let $\tilde{P}, \tilde{Q}$ be two fuzzy similar relations generated by P, Q ⊆ A, respectively. If P ⊆ Q, then $G (\tilde{P}) \geq G (\tilde{Q})$ .

Proof. Since P ⊆ Q, Theorem 2 gives $\tilde{P} \supseteq \tilde{Q}$ . From Definition 16, we obtain $[x_{i}]_{\tilde{P}} \supseteq [x_{i}]_{\tilde{Q}}$ and $| [x_{i}]_{\tilde{P}} | \geq | [x_{i}]_{\tilde{Q}} |$ . Hence, $\sum_{i = 1}^{| U |} \frac{| [x_{i}]_{\tilde{P}} |}{| U |^{2}} \geq \sum_{i = 1}^{| U |} \frac{| [x_{i}]_{\tilde{Q}} |}{| U |^{2}}$ . From Definition 23, we obtain $G (\tilde{P}) \geq G (\tilde{Q})$ .

Example 6. Consider the two fuzzy similar relation matrices $\tilde{R}$ and $\tilde{S}$ from Example 3, where R ⊂ S. The information granularity values of $\tilde{R}$ and $\tilde{S}$ are calculated as follows:

$G (\tilde{R}) = \frac{1}{4 \times 4} ((1 + 0.87 + 0.7 + 0.39) + (0.87 + 1 + 0.83 + 0.52) + (0.7 + 0.83 + 1 + 0.69) + (0.39 + 0.52 + 0.69 + 1))$

$= \frac{1}{16} (2.96 + 3.22 + 3.22 + 2.6) = 0.75$ ,

$G (\tilde{S}) = \frac{1}{4 \times 4} ((1 + 0.85 + 0.37 + 0.3) + (0.85 + 1 + 0.5 + 0.28) + (0.37 + 0.5 + 1 + 0.68) + (0.3 + 0.28 + 0.68 + 1))$

$= \frac{1}{16} (2.52 + 2.63 + 2.55 + 2.26) = 0.6225$ .

Therefore, $G (\tilde{R}) > G (\tilde{S})$ .
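The values in Example 6 can be reproduced as follows. This is a sketch with our own function name `granularity`; the matrices are copied from the row sums written out in the example. It also checks the entry-wise containment that Theorem 10 relies on.

```python
# Relation matrices, copied from the rows used in Example 6.
R = [
    [1.00, 0.87, 0.70, 0.39],
    [0.87, 1.00, 0.83, 0.52],
    [0.70, 0.83, 1.00, 0.69],
    [0.39, 0.52, 0.69, 1.00],
]
S = [
    [1.00, 0.85, 0.37, 0.30],
    [0.85, 1.00, 0.50, 0.28],
    [0.37, 0.50, 1.00, 0.68],
    [0.30, 0.28, 0.68, 1.00],
]


def granularity(matrix):
    """G = (1/|U|^2) * sum of all membership degrees in the relation matrix."""
    n = len(matrix)
    return sum(sum(row) for row in matrix) / (n * n)


# R contains S entry by entry, which is the premise used in Theorem 10.
r_contains_s = all(
    r >= s for row_r, row_s in zip(R, S) for r, s in zip(row_r, row_s)
)

g_r = granularity(R)
g_s = granularity(S)
print(r_contains_s, round(g_r, 4), round(g_s, 4))  # True 0.75 0.6225
```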

Theorem 11. Given an information system IS = (U, A, V, f), let $\tilde{P}, \tilde{Q}, \tilde{R}$ be three fuzzy similar relations generated by P, Q, R ⊆ A, respectively. If P ⊆ Q, then $G (\tilde{P} | \tilde{R}) \leq G (\tilde{Q} | \tilde{R})$ .

Proof. This follows directly from Definition 24 and Theorem 9.
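The proof is stated briefly; one possible way to spell out the steps, using only Theorem 2, Definition 24 and Theorem 9, is sketched here. It assumes that $\tilde{P} \supseteq \tilde{Q}$ means $p_{ij} \geq q_{ij}$ for all i, j, where $p_{ij}$ , $q_{ij}$ and $r_{ij}$ denote the entries of the three relation matrices, as in the proof of Theorem 12.

$P \subseteq Q \Rightarrow \tilde{P} \supseteq \tilde{Q} \Rightarrow \min (p_{ij}, r_{ij}) \geq \min (q_{ij}, r_{ij}) \Rightarrow | [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{R}} | \geq | [x_{i}]_{\tilde{Q}} \cap [x_{i}]_{\tilde{R}} | \Rightarrow G (\tilde{P} \tilde{R}) \geq G (\tilde{Q} \tilde{R})$ ,

$G (\tilde{P} | \tilde{R}) = G (\tilde{R}) - G (\tilde{P} \tilde{R}) \leq G (\tilde{R}) - G (\tilde{Q} \tilde{R}) = G (\tilde{Q} | \tilde{R})$ .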

Theorem 12. 1) $G (\tilde{P} \tilde{Q}) \leq \min \{ G (\tilde{P}), G (\tilde{Q}) \}$ . 2) $P \subseteq Q \Leftrightarrow G (\tilde{P} \tilde{Q}) = G (\tilde{Q})$ . 3) $P \subseteq Q \Leftrightarrow G (\tilde{P} | \tilde{Q}) = 0$ .

Proof. 1) From Definitions 15 and 16, we obtain $[x_{i}]_{\tilde{P}} = \{ (p_{i 1} / x_{1}) + (p_{i 2} / x_{2}) + \ldots + (p_{in} / x_{n}) \}$ and $[x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} = \{ \min (p_{i 1}, q_{i 1}) / x_{1} + \min (p_{i 2}, q_{i 2}) / x_{2} + \ldots + \min (p_{in}, q_{in}) / x_{n} \}$ , hence $| [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} | \leq | [x_{i}]_{\tilde{P}} |$ and $| [x_{i}]_{\tilde{P}} \cap [x_{i}]_{\tilde{Q}} | \leq | [x_{i}]_{\tilde{Q}} |$ . According to Definitions 23 and 24, we have $G (\tilde{P} \tilde{Q}) \leq G (\tilde{P})$ and $G (\tilde{P} \tilde{Q}) \leq G (\tilde{Q})$ . Therefore, $G (\tilde{P} \tilde{Q}) \leq \min \{ G (\tilde{P}), G (\tilde{Q}) \}$ . 2) This follows directly from Definitions 23 and 24. 3) This follows from 2).

Theorem 13. Let DS = (U, C ∪ D, V, f) be a decision system. For any conditional attribute subset B, let $\tilde{B}, \tilde{C}, \tilde{D}$ be three fuzzy similar relations generated by B, C, D, respectively. If B ⊆ C, then $G (\tilde{B} | \tilde{D}) \leq G (\tilde{C} | \tilde{D})$ .

Proof. This follows directly from Theorem 11.

4 Experiments

To demonstrate the advantages of the uncertainty measures proposed in this paper, experiments are conducted on two real-life data sets, Iris and Wine. Both data sets are available from the UCI Machine Learning Repository. During the experiments, the conditional attribute subset is grown from a single attribute to the full attribute set.

Three measures are compared: the information entropy of fuzzy granules, the information granularity of fuzzy granules, and the approximation accuracy. The results are shown in Fig. 1 and Fig. 2. Both figures show that the values of these measures increase as the number of selected attributes grows. Since these measures are inversely related to the uncertainty, the uncertainty decreases as more attributes are selected. In other words, the uncertainty decreases when more knowledge is available. This demonstrates the validity of the three measures in decision systems, and all three can be used to measure uncertainty. However, the approximation accuracy stays the same when the number of attributes increases from 3 to 4 on Iris. Similarly, it does not change when the number of attributes increases from 4 to 5 or from 7 to 13 on Wine. By comparison, the information entropy and the information granularity evaluate the uncertainty more accurately. The results show that these two measures provide more information for evaluating the uncertainty in decision systems.

Fig. 1. The results on the Iris data set.

Fig. 2. The results on the Wine data set.

5 Conclusions

The contribution of this work is twofold. First, we construct fuzzy similar relations and fuzzy similar granules by introducing a distance matrix on information systems, define some operations on these fuzzy similar relations and granules, and present a fuzzy rough set model that combines roughness and fuzziness in order to deal with information systems containing real-valued data. Second, this paper has focused on the development of uncertainty measures, tackling the problems of noisy and real-valued data as well as mixtures of discrete and continuous attributes in an information system. We have achieved this by proposing a fuzzy rough set model and several of its uncertainty measures. The original measures proposed by Pawlak are not suitable for real-valued information systems. Consequently, two uncertainty measures, information entropy and information granularity, are developed to handle the uncertainty of a fuzzy similar relation. Theoretical analyses and experimental results show that these measures are monotonic and valid. These measures will be used to evaluate the significance of attributes or features in an information system, and attribute reduction algorithms based on them will be developed in the future. They may potentially be applied to fields such as feature selection, gene selection and decision rule extraction.

Acknowledgment

This work is supported by the National Natural Science Foundation of China (Nos. 61573297 and 61672442), the Social Science Planning Project of Fujian Province (No. FJ2017C012), the Natural Science Foundation of Fujian Province (Nos. 2015J05015 and 2016J01325) and the Program for New Century Excellent Talents in Fujian Province University.

References

1. Zeng A.P., Li T.R., Liu, Zhang J.B. and Chen H.M., A fuzzy rough set approach for incremental feature selection on hybrid information systems, Fuzzy Sets and Systems 258 (2015), 39–60.

2. Huang, Wei D.K., Li H.X. and Zhuang Y.L., Using a rough set model to extract rules in dominance-based interval-valued intuitionistic fuzzy information systems, Information Sciences 221 (2013), 215–229.

3. Sun B.Q., Guo H.F., Karimi H.R., Ge Y.J. and Xiong, Prediction of stock index futures prices based on fuzzy sets and multivariate fuzzy time series, Neurocomputing 151(3) (2015), 1528–1536.

4. Affonso, Sassi R.J. and Barreiros R.M., Biological image classification using rough-fuzzy artificial neural network, Expert Systems with Applications 42(24) (2015), 9482–9488.

5. Huang C.C., Li J.H. and Dias S.M., Attribute significance, consistency measure and attribute reduction in formal concept analysis, Neural Network World 26(6) (2016), 607–623.

6. Shannon C.E., A mathematical theory of communication, Bell System Technical Journal 27 (1948), 379–656.

7. Wang C.Z., Shao M.W., He, Qian Y.H. and Qi Y.L., Feature subset selection based on fuzzy neighborhood rough sets, Knowledge-Based Systems 111 (2016), 173–179.

8. Chen D.G., Zhang, Zhao S.Y., Hu Q.H. and Zhu P.F., A novel algorithm for finding reducts with fuzzy rough sets, IEEE Transactions on Fuzzy Systems 20(2) (2012), 385–389.

9. Chen D.G., Zhang X.X. and Li W.L., On measurements of covering rough sets based on granules and evidence theory, Information Sciences 317 (2015), 329–348.

10. Liu, Li T.R. and Zhang J.B., Incremental updating approximations in probabilistic rough sets under the variation of attributes, Knowledge-Based Systems 73 (2015), 81–96.

11. Miao D.Q. and Wang, An information representation of the concepts and operations in rough set theory, Journal of Software 10(2) (1999), 113–116.

12. Miao D.Q. and Fan S.D., The calculation of knowledge granulation and its application, Systems Engineering Theory & Practice 22(1) (2002), 48–56.

13. Tsang E.C.C., Chen D.G., Yeung D.S., Wang X.Z. and Lee, Attributes reduction using fuzzy rough sets, IEEE Transactions on Fuzzy Systems 16(5) (2008), 1130–1141.

14. F.F., Miao D.Q. and Wei, Fuzzy-rough attribute reduction via mutual information with an application to cancer classification, Computers & Mathematics with Applications 57(6) (2009), 1010–1017.

15. Min and Zhu, Attribute reduction of data with error ranges and test costs, Information Sciences 211 (2012), 48–67.

16. Bai H.X., Ge, Wang J.F., Li D.Y., Liao Y.L. and Zheng X.Y., A method for extracting rules from spatial data based on rough fuzzy sets, Knowledge-Based Systems 57 (2014), 28–40.

17. Zhang H.Y. and Yang S.Y., Feature selection and approximate reasoning of large-scale set-valued decision tables based on dominance-based quantitative rough sets, Information Sciences 378 (2017), 328–347.

18. Zhao, Wang and Hu Q.H., Cost-sensitive feature selection based on adaptive neighborhood granularity with multilevel confidence, Information Sciences 366 (2016), 134–149.

19. Düntsch and Gediga, Uncertainty measures of rough set prediction, Artificial Intelligence 106(1) (1998), 109–137.

20. Dai J.H. and Xu, Approximations and uncertainty measures in incomplete information systems, Information Sciences 198 (2012), 62–80.

21. Dai J.H. and Xu, Attribute selection based on information gain ratio in fuzzy rough set theory with application to tumor classification, Applied Soft Computing 13(1) (2013), 211–221.

22. Zhai J.H., Fuzzy decision tree based on fuzzy-rough technique, Soft Computing 15(6) (2011), 1087–1096.

23. Qian, Miao D.Q., Zhang Z.H. and Yue X.D., Parallel attribute reduction algorithms using MapReduce, Information Sciences 279 (2014), 671–690.

24. Liang J.Y., Wang and Qian Y.H., A new measure of uncertainty based on knowledge granulation for rough sets, Information Sciences 179(4) (2009), 458–470.

25. Liang J.Y., Li and Qian Y.H., Distance: A more comprehensible perspective for measures in rough set theory, Knowledge-Based Systems 27(11) (2012), 126–136.

26. Mushrif M.M. and Ray A.K., Color image segmentation: Rough-set theoretic approach, Pattern Recognition Letters 29(4) (2008), 483–493.

27. Raza M.S. and Qamar, An incremental dependency calculation technique for feature selection using rough sets, Information Sciences 343 (2016), 41–65.

28. Maji and Garai, Fuzzy-rough simultaneous attribute selection and feature extraction algorithm, IEEE Transactions on Cybernetics 43(4) (2013), 1166–1177.

29. Maji and Pal S.K., Fuzzy-rough sets for information measures and selection of relevant genes from microarray data, IEEE Transactions on Systems Man & Cybernetics Part B 40(3) (2010), 741–751.

30. Maji and Pal S.K., Rough-fuzzy C-medoids algorithm and selection of bio-basis for amino acid sequence analysis, IEEE Transactions on Knowledge and Data Engineering 19(6) (2007), 859–872.

31. Q.H., Yu D.R. and Xie Z.X., Neighborhood classifiers, Expert Systems with Applications 34(2) (2008), 866–876.

32. Q.H., Yu D.R., Xie Z.X. and Liu J.F., Fuzzy probabilistic approximation spaces and their information measures, IEEE Transactions on Fuzzy Systems 14(2) (2006), 191–201.

33. Q.H., Mi J.S. and Chen D.G., Granular computing based machine learning in the era of big data, Information Sciences 378 (2017), 242–243.

34. Q.H., Xie Z.X. and Yu D.R., Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation, Pattern Recognition 40(12) (2007), 3509–3521.

35. Zhang Q.H., Zhang and Wang G.Y., The uncertainty of probabilistic rough sets in multi-granulation spaces, International Journal of Approximate Reasoning 77 (2016), 38–54.

36. Bhatt R.B. and Gopal, FRCT: Fuzzy-rough classification trees, Pattern Analysis & Applications 11(1) (2008), 73–88.

37. Jensen and Shen, Fuzzy-rough sets assisted attribute selection, IEEE Transactions on Fuzzy Systems 15(1) (2007), 73–89.

38. Maulik and Chakraborty, Fuzzy preference based feature selection and semisupervised SVM for cancer classification, IEEE Transactions on Nanobioscience 13(2) (2014), 152–160.

39. Zhu and Wang, Reduction and axiomization of covering generalized rough sets, Information Sciences 152 (2003), 217–230.

40. Zhang, Mei C.L., Chen D.G. and Li J.H., Feature selection in mixed data: A method using a novel fuzzy rough set-based information entropy, Pattern Recognition 56(1) (2016), 1–15.

41. Wang Y.F., Mining stock price using fuzzy rough set system, Expert Systems with Applications 24(1) (2003), 13–23.

42. Chen Y.M., Wu K.S., Chen X.H., Tang C.H. and Zhu Q.X., An entropy-based uncertainty measurement approach in neighborhood systems, Information Sciences 279 (2014), 239–250.

43. , Pedrycz and Miao D.Q., Neighborhood rough sets based multi-label classification for automatic image annotation, International Journal of Approximate Reasoning 54(9) (2013), 1373–1387.

44. Yao Y.Y., Relational interpretations of neighborhood operators and rough set approximation operators, Information Sciences 111 (1998), 239–259.

45. Yao Y.Y., Information-theoretic measures for knowledge discovery and data mining, in: Karmeshu (Ed.), Springer, Berlin, 2003, pp. 115–136.

46. Pawlak, Rough sets, International Journal of Computer Information Science 11 (1982), 341–356.

47. Pawlak, Rough sets: Theoretical aspects of reasoning about data, Kluwer Academic Publishers, Dordrecht, 1991.

Table 1. An information system.

| U | a | b | c |
|---|---|---|---|
| $x_{1}$ | 0 | 1 | 1 |
| $x_{2}$ | 0.21 | 0.86 | 0 |
| $x_{3}$ | 0.49 | 0.1 | 0.26 |
| $x_{4}$ | 1 | 0 | 0.19 |