Abstract
Rough set characterizes an uncertain set with two crisp boundaries, i.e., lower and upper-approximation sets, and it has successfully been applied into artificial intelligence fields. However, there is little discuss on how to establish a crisp set as approximation set of a target set instead of only giving out two boundary lines. Although many uncertain decision rules can be acquired from the information system based on lower-approximation set by keeping the positive region unchanged, the accuracy of these rules depends on the similarity between a target set and its lower-approximation set. In order to solve this problem, an approximation set model of rough set was proposed and applied into attribute reduction or uncertain image segmentation successfully in our previous research. In this paper, a kind of fuzzy similarity between a target set and its approximation set is presented based on Euclidean distance instead of the similarity based on cardinality of a finite set. Then many good properties of 0.5-approximation set of a rough set are presented, and the conclusion that 0.5-appoximation set is the best approximation set of a rough set in all definable sets is successfully proved with Euclidean similarity between two fuzzy sets. Finally, the change laws of fuzzy similarity between a target set and its 0.5-approximation set in different granularity spaces is analyzed in detail.
Introduction
Rough set theory proposed by Pawlak in 1982 [1] has been a valid tool to deal with uncertain information systems, and has been applied into artificial intelligence fields, such as data mining, troubleshooting, medical systems, pattern recognition, conflict analysis, decision analysis, feature selection [2–15] and so on. In order to handle with the complex information systems in real life, many extended rough set models were proposed, such as probability rough set model [16, 17], variable precision rough set model [18, 19], rough set models based on tolerance relation[20] and so on. Rough set model and its extended models focus on establishing two boundary lines of an uncertain set (or concept), and these two boundary lines are called upper-approximation set and lower-approximation set based on an indiscernibility relation (or equivalence relation) [20, 21]. A lot of research on knowledge acquisition based on lower-approximation set is presented gradually, and the common purpose of these methods is keeping the positive region of an uncertain set unchanged. Therefore, many approximate decision rules can be acquired with an accuracy standard if we acquire the decision rules based on lower-approximation set or upper-approximation set of a target set. Especially, in real uncertain information systems, rough set model and its extended models are widely applied into all kinds of knowledge discovery fields [22–24] because these models are easy to calculate.
However, both lower-approximation set and upper-approximation set of an uncertain target set are crisp sets, so to a certain extent, rough set theory is a kind of methodology that uses precise sets to characterize an uncertain set. Generally, the better the approximate degree between a target set and its approximation set is, the higher the precision of the knowledge acquired from information systems is. So, whether are there other approximation sets which are closer to the target set compared with upper-approximation set or lower-approximation set? This is our main research motivation on approximation set of a rough set in this paper. Though there are lots of literatures on extended rough set models, there is little research on how to establish a crisp approximation set of an uncertain set with the existing knowledge granules in a rough approximation space, and this approximation set probably is closer to the target set than lower-approximation set or upper-approximation set.
In order to establish a better approximation set of an uncertain target set in a rough approximation space, a new definition of approximation set of rough set [21] was proposed, the related properties were analyzed and related conclusions were presented, and more details can be referred to [21] and [25]. Those results are applied to the attribute reduction and incremental knowledge acquisition. Both theoretical analysis and experimental results show that the methods based on the approximation set of a rough set are more effective and more suitable for dealing with uncertain information systems [25, 26]. In the previous research, we established a kind of 0.5-approximation set of an uncertain target set, and this approximation set is based on the similarity between two crisp subsets. In [21], the value of similarity is a ratio value of |R0.5 (X) ∩ X|/|R0.5 (X) ∪ X|. With the further research, we find that this similarity exists some shortcomings, and they mainly embody in two aspects as follows. One is that we still can not draw the conclusion that 0.5-approximation set is the best approximation set to target set. The other one is that the formula |R0.5 (X) ∩ X|/|R0.5 (X) ∪ X| is not suitable for a fuzzy target set.
So, in this paper we will focus on how to establish a crisp set as approximation set of a target set by the fuzzy similarity between two fuzzy sets in order to improve the properties of the approximation set. For this purpose, a new similarity formula based on fuzzy similarity is proposed. This similarity formula is based on Euclidean distance between two fuzzy sets. Using this similarity formula, we successfully prove that 0.5-approximation set is the best approximation set of a target set. And the change laws of the fuzzy similarity between a target set and its 0.5-approximation set with changing knowledge granularities are analyzed in detail. These properties are consistent with human cognition habits.
The rest of this paper is organized as follows. Many preliminary concepts are reviewed briefly and many new concepts are defined in Section 2. In Section 3, the similarity formula based on Euclidean distance between two fuzzy sets will be proposed, the good properties are presented, and the conclusion that the 0.5-approximation set is the best approximation set of a target set is proved successfully. In Section 4, the change laws of fuzzy similarity between a target set and its 0.5-approximation set with the changing knowledge granularities are analyzed. Finally, the conclusions are drawn in Section 5.
Preliminaries
In order to better present this paper, the basic concepts or terms on rough set, fuzzy set, approximation set and similarity between two fuzzy sets will be reviewed briefly in this section.
And U/ind (R) = {[x] R |x ∈ U} is called a partition of U induced by ind (R), where [x] R denotes the equivalence class of x (x ∈ U). An indiscernibility relation is also called equivalence relation.
Obviously, lower-approximation set of X is a union of the equivalence classes which contain in X, and upper-approximation set of X is a union of the equivalence classes which have nonempty intersection sets with X.
For any subset X ⊆ U, if , X is called a definable set or crisp set. If , X is called a rough set. The pair (U, R) is called a rough approximation space. According to Definition 3, holds obviously. So, a rough set successfully describes an uncertain target set with two crisp boundary lines, in other words, rough set is a kind of method for describing an uncertain target set by two crisp sets without giving out a crisp approximation set of a target set.
is called a boundary region of X;
is called a negative region of X;
is called a positive region of X.
Obviously, if the boundary region is bigger, the uncertainty of a target set is bigger. In order to evaluate the uncertainty of an uncertain target set, there are several typical uncertainty measurements, such as accuracy [28], roughness [28] and fuzziness [29]. Pawlak proposed two numerical measurements for evaluating uncertainty of a target set, one is accuracy, and the other is roughness. The accuracy is equal to the ratio of the cardinalities of lower-approximation set and upper-approximation set of X as follows,
On the contrary, the roughness represents the degree of uncertainty of an uncertain target set in a rough approximation space, and it is calculated as follows,
Obviously, for any target set X, 0 ≤ α R (X) ≤1 and 0 ≤ ρ R (X) ≤1. If , the ρ R (X) =0 and α R (X) =1, that is to say, the roughness of a definable set in a rough approximation space is equal to 0. If , 0 < ρ R (X) ≤1 and 0 ≤ α R (X) <1, that is to say, the roughness of a rough set in a rough approximation space is greater than 0 and less than 1. In other words, the roughness ρ R (X) of a target set X is the ratio of the cardinalities of the boundary region and upper-approximation set of X. In order to overcome above limitations of roughness and accuracy, Liang, et al. [28] presented a new method for evaluating uncertainty of a rough set and an axiomatic definition of knowledge granularity for an information system was proposed too.
Although these measurements are effective, they all have their own limitations. Especially, when given two target sets X and Y in a rough approximation space U/R, if and , then we cannot distinguish these two rough sets by their roughness, accuracy and knowledge granularity respectively.
In [29], Wang, et al proposed a kind of uncertainty measurement of rough set as follows,
In different rough approximation spaces, this fuzziness formula can better characterize the change of uncertainty with changing knowledge granularities, and it better accords with human cognition habits [29]. For two rough sets X and Y, though and , their own fuzziness probably are different. So, the fuzziness formula (3) probably is better to characterize the uncertainty of an uncertain set. In this paper, we will establish a kind of approximation set based on the fuzzy set of a target set in a rough approximation space and analyze the similarity between a target set and its approximation set.
R λ (X) is called λ-approximation set of X in U/R.
R λ (X) is a crisp approximation set, and it is a union set of those equivalence classes which have at least λ% elements belonging to X. According to classical approximation calculation principle, i.e., omitting decimal fractions smaller than 0.5 and counting all others, we let the threshold λ = 0.5 and present0.5-approximation set of X, denoted by R0.5 (X) and it is shown as Fig. 1. In Fig. 1, the red line is the boundary line of 0.5-approximation set of X. Obviously, the region bounded by red line is closer to the target set X than lower-approximation set or upper-approximation set.
Let A and B be two subsets of domain. A kind of similarity between A and B was defined [21], and a similarity formula was presented as follows,
Here | • | represents a cardinality of a finite set. S (A, B) stands for the similarity between A and B. Depending on this similarity formula, the related theorems are proposed and proved successfully as follows.
Theorem 2 shows that the similarity between a target set X and its 0.5-approximation set is greater than that of between X and its lower-approximation set. And Theorem 3 shows that under some conditions the similarity between a target set X and its0.5-approximation set also is greater than that of between X and its upper-approximation set.
Although 0.5-approximation set exhibits many good properties based on the similarity formula (5), we still can not prove that 0.5-approximation set of a target set is the best approximation set in all definable sets in a rough approximation space. And in the previous research we find out a subinterval of [0, 1] in which the optimal approximation set can be obtained, and present an algorithm for searching optimal approximation set. Based on in-depth analysis and further research, the approximation set theories or methods can improved if we select a better similarity measurement, because the formula (5) is only suitable for crisp subsets, and it is not suitable for fuzzy sets (i.e., fuzzy target set). Therefore, in this paper a new similarity formula based on Euclidean distance between two fuzzy sets will be presented. Although there are many kinds of similarity formulas [30–35] from different literatures, the fuzzy similarity based on Euclidean distance is more reasonable, and it accords with human cognition habits.
d (A, B) = d (B, A), for any two fuzzy subsets A, B ∈ F (U); d (A, A) =0, for any fuzzy subset A ∈ F (U);
, for any subset A ∈ F (U), ∼A denotes the complementary set of A; For any A, B, C ∈ F (U), if A ⊆ B ⊆ C, then d (A, B) ≤ d (A, C) and d (B, C) ≤ d (A, C).
F (U) is the family of all fuzzy subsets on U, and P (U) is the power set of U. In order to easily understand the distance between two fuzzy sets in Definition 5, in this paper, let d (A, ∼ A) =1 for any fuzzy subset A ∈ F (U), then the distance of two fuzzy sets looks more standardized.
S (A, B) = S (B, A), for any two fuzzy subsets A, B ∈ F (U); S (A, A) =1, for any fuzzy subset A ∈ F (U); S (A, ∼ A) =0, for any crisp subset A ∈ P (U), ∼A denotes complementary set of A; For any A, B, C ∈ F (U), if A ⊆ B ⊆ C, then S (A, B) ≥ S (A, C) and S (B, C) ≥ S (A, C).
According to Definition 6, many fuzzy similarity formulas can be established. If the domain U is discrete, namely, U = {x1, x2, ⋯ , x
n
}. Supposing two fuzzy subsets A, B ∈ F (U) are expressed as follows,
Then in this paper, the fuzzy distance formula between A and B is defined as follows,
Obviously, this formula is based on classical Euclidean distance between two fuzzy sets, and it satisfies all conditions of the Definition 6, and the fuzzy similarity between A and B also is defined as follows,
The fuzzy set of X in U/R is shown as follows,
In a rough approximation space, whether does there exist a better approximation set of a target set X than R0.5 (X)? In other words, is the approximation set R0.5 (X) the best approximation set of X in U/R? Next, we will present the answers, and many good characters of 0.5-approximation set will be put forward in the following theories.
(1)When 0.5 < λ ≤ 1. Let R0.5 (X) = X i 1 ∪ X i 2 ∪ ⋯ ∪ X i k , where X i 1 , X i 2 , ⋯ , X i k ∈ U/R. According to Theorem 1, R λ (X) ⊆ R0.5 (X).
If R λ (X) = R0.5 (X), obviously holds.
If R
λ
(X) ⊂ R0.5 (X), in other words, there exists at least an element X
i
p
⊆ R0.5 (X) but X
i
p
⊄ R
λ
(X). For simplicity, supposing there only is an element X
i
p
⊆ R0.5 (X) but X
i
p
⊄ R
λ
(X) (the more complex cases can be transformed into this simple case, so we will not repeat them here) and let X
i
p
= X
l
= {xl1, xl2, ⋯ , x
ln
l
}. So, , we have,
So,
(2) If 0 ≤ λ < 0.5. The proof is similar to (1). Let R0.5 (X) = X i 1 ∪ X i 2 ∪ ⋯ ∪ X i k , where X i 1 , X i 2 , ⋯ , X i k ∈ U/R. According to the Theorem 1, R0.5 (X) ⊆ R λ (X) holds.
If R λ (X) = R0.5 (X), then holds.
If R0.5 (X) ⊂ R λ (X), there exist at least an element X i p ⊆ R λ (X) but X i p ⊄ R0.5 (X).
For simplicity, supposing there only exist an element X
i
p
⊆ R0.5 (X) but X
i
p
⊄ R
λ
(X) (the more complex case can be transformed into this simple case, so we do not repeat them here) and let X
i
p
= X
l
= {xl1, xl2, ⋯ , x
ln
l
}. So, , we have
So, in a similar way to (1), we easily know that the inequality holds.
In summary, according to (1) and (2) Theorem 4 is proved successfully.
In Theorem 4, when λ = 0, we have , and when λ = 1, we have .
(1) If 0.5 < λ ≤ 1. Let R0.5 (X) = X i 1 ∪ X i 2 ∪ ⋯ ∪ X i k , where X i 1 , X i 2 , ⋯ , X i k ∈ U/R. According to Theorem 1, R λ (X) ⊆ R0.5 (X).
If R λ (X) = R0.5 (X), then S (X, R0.5 (X)) = S (X, R λ (X)).
If R
λ
(X) ⊂ R0.5 (X), there exist at least an element X
i
p
⊆ R0.5 (X) but X
i
p
⊄ R
λ
(X). For simplicity, supposing there only exist X
i
p
⊆ R0.5 (X) but X
i
p
⊄ R
λ
(X) (the more complex case can be transformed into this simple case, so we do not repeat them here) and let X
i
p
= X
l
= {xl1, xl2, ⋯ , x
ln
l
}. So , that is to say, |X
l
∩ X| ≥ |X
l
| - |X
l
∩ X|. We have
So, the inequality holds.
Then,
(2) If 0 ≤ λ < 0.5. Let R0.5 (X) = X i 1 ∪ X i 2 ∪ ⋯ ∪ X i k , where X i 1 , X i 2 , ⋯ , X i k ∈ U/R. According to Theorem 1, R0.5 (X) ⊆ R λ (X).
If R λ (X) = R0.5 (X),S (X, R0.5 (X)) = S (X, R λ (X)) holds.
If R0.5 (X) ⊂ R
λ
(X), there exist at least an element X
i
p
⊆ R
λ
(X) but X
i
p
⊄ R0.5 (X). For simplicity, supposing there only exist X
i
p
⊆ R0.5 (X) but X
i
p
⊄ R
λ
(X) (the more complex case can be transformed into this simple case, so we do not repeat them here) and let X
i
p
= X
l
= {xl1, xl2, ⋯ , x
ln
l
}. So , that is to say, |X
l
∩ X| < |X
l
| - |X
l
∩ X|. We have
And,
So, the inequality holds.
In a similar way to (1), we easily know the inequality S (X, R0.5 (X)) ≥ S (X, R λ (X)) holds.
In summary, according to (1) and (2), Theorem 5 is proved completely.
In Theorem 5, as long as λ = 0, and when λ = 1. Theorem 5 shows that for any threshold λ ∈ [0, 1], the approximation set R0.5 (X) is the best approximation set of X, and R0.5 (X) is also called optimal approximation set of X. In a rough approximation space U/R, if a set is a union of many granules (i.e., equivalence classes), it must be a definable set (crisp set). For an uncertain target set X, is there a better crisp set than R0.5 (X)? The answer is presented in Theorem 6.
And , , that is to say,
So,
And because,
And , , ⋯, , that is to say,
So, we have,
Obviously,
So,
Theorem 6 is proved completely.
Theorem 6 shows given an uncertain target set, its 0.5-approximation set is the closest to the target set in all definable sets. In a sense, Theorem 5 is a special case of Theorem 6, but Theorem 5 shows that0.5-approximation set is the best approximation set of a target set from viewpoint of cut-set.
The change laws of similarity with changing knowledge granularities
With the changing knowledge granularities, researchers pay their attentions to how to obtain the change laws of the uncertainty of an uncertain target set in a rough approximation space [29, 36–39]. Next we will analyze the change laws of the fuzzy similarity between a target set X and its 0.5-approximation set in the changing knowledge spaces.
(1) If X1 ∩ X = φ, then Y1 ∩ X = φ and Y2 ∩ X = φ. Because X2 = Y3, X3 = Y4, ⋯, X m = Y t (t = m + 1), we have , then holds.
(2) If X1 ⊆ X, then Y1 ⊆ X and Y2 ⊆ X. Because X2 = Y3, X3 = Y4, ⋯ , X m = Y t (t = m + 1), we have , then .
(3) If, that is to say, . Let |X1| = a, |X1 ∩ X| = b, |Y1| = a1, |Y1 ∩ X| = b1, |Y2| = a2 and |Y2 ∩ X| = b2. Here a1 + a2 = a and b1 + b2 = b. Then,
And,
Obviously,
Because,
And
Given a function with two variables as follows,
Solving these equations, we know that f (a1, b1) can reach its maximum value when , and f (a1, b1) can reach its minimum value 0 when b1 = a1 = b. So, the inequality always holds. That is to say, the inequality always holds.
Finally, holds. Theorem 7 is successfully proved.
Theorem 7 shows that when the knowledge granules in (U, R′) are subdivided into many finer knowledge granules in (U, R″), the fuzzy similarity between X and is smaller than that of between X and . Next, we will analyze the change laws of fuzzy similarity between a target set and its 0.5-approximation set with changing knowledge granularities.
Next we prove this theorem in two cases as follows.
(1) If , then .
(i) If and , then and . Because X2 = Y3, X3 = Y4, ⋯ , X m = Y t , so , so .
(ii) If and , then and . We easily have,
Then, the inequality ∑x∈Y2 (μ X (x) - μR′0.5(X)) 2 > ∑x∈Y2 (μ X (x) - μR″0.5(X)) 2 holds.
So, we have,
(iii) If and , then and , in a similar way to (ii), the inequality holds.
According to (i), (ii) and (iii), we can have .
(2) If , then .
(i) If and , then and . Because X2 = Y3, X3 = Y4, ⋯ , X m = Y t , , then .
(ii) If and , then and . Then,
So, the inequality ∑x∈Y1 (μ X (x) - μR′0.5(X)) 2 > ∑x∈Y1 (μ X (x) - μR″0.5(X)) 2 holds.
So, we can have,
(iii) If , and , then and , in a similar way to (ii), the inequality holds.
According to (i), (ii) and (iii), the inequality
holds.
In summary, Theorem 8 is proved successfully according to above (1) and (2).
Figure 1 shows 0.5-approximation set (the region bounded by red line) of a target set in a coarse knowledge space, and Fig. 2 shows 0.5-approximation set in a fine knowledge space. Theorem 8 shows in a finer rough approximation space, the fuzzy similarity between a target set X and its 0.5-approximation set is bigger than that of in a coarser rough approximation space. In other words, with the increasing of the attributes, R0.5 (X) will gradually close to X. Theorem 8 can be illustrated by Figs. 1 and 2.
Conclusions
With the rapid development of data science, there emerge more and more uncertain data in artificial intelligence field, such as intelligence control, intelligence decision, internet of things, remote-sensing information, weather forecast, satellite cloud image and so on. How to deal with the massive uncertain data has become an important issue. There are many theories and methods for handing uncertain phenomenon, such as probability theory established by Kolmogorov in 1933 [40], fuzzy set theory proposed by Zadeh in 1965 [41, 42], rough set theory proposed by Pawlak in 1982 [1], and Could model proposed by Li in 1995 [43]. Because of simple calculation, rough set theory has become an important tool to deal with uncertain boundary problems, and is successfully applied into uncertain knowledge discovery field [44, 45], such as image segmentation, medical diagnosis, industrial control, business decisions and so on. However, rough set does not explicitly give out the method for precisely or approximately establishing a crisp set as an approximation set of an uncertain target set with existing knowledge granules. In this paper, a new approximation set of rough set is proposed based on fuzzy similarity. The conclusion that 0.5-approximation set of rough set is the best approximation set in all definable sets of rough approximation space is successfully proved. Then the change laws of fuzzy similarity between an uncertain target set and its approximation set in different granularity spaces is analyzed in detail. These results show 0.5-approximation set with fuzzy similarity has good properties. We hope this research can improve the development of uncertain artificial intelligence.
Footnotes
Acknowledgments
This work is supported by the National Natural Science Foundation of P.R. China (Nos. 61472056 and 61572091), and the Natural Science Foundation of Chongqing of P.R. China (No. CSTC2013jjb40003).
