Abstract
In this paper, we define a covering-based rough membership function for the tenth type of covering-based rough sets, which was examined by Tsang et al. We obtain some properties of this rough membership function and characterize the covering-based rough set approximation operators from the viewpoint of numerical values. An example in evidence-based medicine is presented to illustrate the practical application background of Pawlak’s rough membership function. Using this example, we show that the covering-based rough membership function is more realistic than Pawlak’s rough membership function in real-life applications.
Introduction
Rough set theory, originally proposed by Pawlak [14], has been acknowledged as a useful and powerful tool in data analysis, particularly for dealing with granularity and vagueness. As a mathematical tool for handling uncertain knowledge, it has also been successfully applied in pattern recognition, data mining, machine learning, and so on [6, 38]. A problem with Pawlak’s rough set theory is that a partition or equivalence relation is explicitly used in the definition of the lower and upper approximations. Such a partition or equivalence relation is restrictive for many applications because it can only deal with complete information systems [13, 37]. To address this issue, scholars have considered generalizations of rough set theory in order to deal with complex practical problems. One approach was to extend equivalence relations to tolerance relations [24, 33] and others [29, 43]. Another important approach was to relax the partition to a covering, which yields covering-based rough sets. In 1983, Zakowski first extended Pawlak’s rough set theory by using a covering of the universe rather than a partition [36]. A pair of lower and upper approximation operators is defined by a straightforward generalization of the Pawlak definition. The generalized approximation operators are no longer dual to each other with respect to set complement [20, 33]. This new model is often referred to as the first type of covering-based rough sets. Based on the mutual correspondence of the concepts of extension and intension, Bryniarski [47] and Bonikowski et al. [1] gave the second type of covering-based rough sets. The third type of covering-based rough sets was introduced in [26]. The applied background and attribute reducts of this type of covering-based rough sets were discussed by Tsang et al. [27].
The differences between the third type of covering-based rough sets and Pawlak’s rough sets, and the conditions on coverings under which the common properties of classical rough sets hold for the third type of covering-based rough sets, were studied in [45]. Zhang et al. provided the representation of relation-based rough sets for the third type of covering-based rough sets [39]. The characterizations of coverings for which the third type of covering-based upper approximation operator is a closure operator were investigated in [4, 46]. Moreover, Wang et al. systematically analyzed the relationships between generalized rough sets in pure reflexive neighborhood systems and the third type of covering-based rough sets [28]. In addition, Yang et al. constructed a new reduction theory, which redefined the approximation spaces and the reducts of covering-based rough sets, and applied it to the third type of covering-based rough sets [31].
The concept of rough membership functions plays an important role in rough set theory for measuring the uncertainty of a set in an information system [17]. The main idea of rough membership functions comes from Pawlak’s work [15], and was explicitly introduced by Pawlak and Skowron in [17]. Rough membership functions have received a lot of attention in recent years. For a finite universe, a rough membership function was computed by Pawlak et al. and used to present numerical characterizations of Pawlak’s rough set approximations [16]. Furthermore, the relations between fuzzy membership functions and rough membership functions were investigated by comparing the core and support of fuzzy set theory with the lower and upper approximations of rough set theory [32]. Based on the rough membership function, Yao revisited probabilistic rough set approximation operators. He also surveyed existing studies and gave some new results on the decision-theoretic rough set model [34]. Chakraborty used the notion of rough membership function to generalize the category-theoretic approach of Obtulowicz to Pawlak’s rough sets, and established a link between rough sets and L-fuzzy sets for some special lattices [2]. Pawlak and Skowron interpreted rough sets by constructing a membership function, a weak membership function or a strong membership function [17]. Greco et al. utilized the concepts of absolute and relative rough membership functions to present a parameterized rough set model, which generalizes the original definition of rough sets and variable precision rough sets [5]. In addition, the relative rough membership function is an instance of a class of measures known as Bayesian confirmation measures [3]. However, as pointed out by the authors of [10, 36], a partition induced by an equivalence relation may not provide a realistic view of the relationships among elements in real-world applications, although it is easy to analyze.
Instead, a covering of the universe might be considered as an alternative to provide a more realistic model of rough sets [1, 36]. Based on coverings of the universe, Yao and Zhang defined minimum, maximum and average rough membership functions, and studied their properties [35]. Furthermore, Intan and Mukaidono constructed minimum, maximum and average rough membership functions based on α-coverings of the universe, and examined their properties [8]. Xu and Zhang proposed new lower and upper approximations and constructed a covering-based rough membership function for them. They also defined a measure of roughness based on the covering-based rough membership function and discussed some significant applications of this measure [30]. From the covering-based rough membership function defined in [30], Shi and Gong constructed a similarity measure for covering rough sets, and established relationships between covering-based rough sets and Pawlak probabilistic rough sets [23]. Since the rough membership functions studied in the above papers are described only by a single binary relation or a single covering on a given universe, and so cannot be applied in some practical multigranulation settings, Lin et al. employed the maximal and minimal degrees of rough membership to characterize the uncertainty of covering-based multigranulation rough sets [10].
However, to the best of our knowledge, no researcher has paid attention to the rough membership function of the tenth type of covering-based rough sets mentioned in [45], or to the practical applications of Pawlak’s rough membership function in real life. In this paper, we use an example in evidence-based medicine to illustrate the practical application background of Pawlak’s rough membership function. Using this example, we also point out the limitations of Pawlak’s rough membership function in real-life applications and the necessity of constructing rough membership functions for covering-based rough sets. Then, we construct a covering-based rough membership function for the tenth type of covering-based rough sets. We not only present the theoretical background of the covering-based rough membership function, but also show that it is more realistic than Pawlak’s rough membership function in real-life applications.
The remainder of this paper is arranged as follows. In Section 2, after reviewing the concept of Pawlak’s rough membership function and the numerical characterizations of Pawlak’s rough set approximations, we present the theoretical background of Pawlak’s rough membership function. Then, we give an example in medical diagnosis to illustrate its practical background in real life. From this example, we also point out the limitations of Pawlak’s rough membership function in real-life applications. In Section 3, we introduce several fundamental concepts and basic facts needed in this paper. Section 4 is the core of this paper: we construct a covering-based rough membership function for the tenth type of covering-based rough sets, and reveal its numerical characterizations. In Section 5, after showing the theoretical background of the covering-based rough membership function, we use the example presented in Section 2 to explain why this covering-based rough membership function is more realistic than Pawlak’s rough membership function in practical applications. The paper is concluded in Section 6 with remarks on future work.
Pawlak’s rough sets
In this section, we first review the concept of Pawlak’s rough membership function and numerical characterizations of Pawlak’s rough set approximations. Then we present theoretical backgrounds of Pawlak’s rough membership function. Finally, we employ an example in medical diagnosis to illustrate the practical backgrounds and limitations of Pawlak’s rough membership function.
Pawlak’s rough sets are defined as follows [16]:
Let U be a finite set and R be an equivalence relation on U. R will generate a partition U/R on U, and a block of the partition U/R containing the element x will be denoted as [x]_R. ∀X ⊆ U, the lower, upper approximations and the boundary region of X are defined in the following way respectively:
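For reference, the standard forms of these three sets are written out below. The displayed formulas are not reproduced in this text, so the operator symbols \underline{R}, \overline{R} and BN_R are notational assumptions rather than the authors' own notation.

```latex
\underline{R}(X) = \{x \in U : [x]_R \subseteq X\}, \qquad
\overline{R}(X) = \{x \in U : [x]_R \cap X \neq \emptyset\}, \qquad
BN_R(X) = \overline{R}(X) \setminus \underline{R}(X).
```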
Definition of rough membership function
Pawlak’s rough membership function is a function μ_X^R : U → [0, 1], defined by μ_X^R(x) = |X ∩ [x]_R| / |[x]_R|, where x ∈ U, X ⊆ U and |X| denotes the cardinality of X [16].
The rough membership function expresses the conditional probability that x belongs to X given R, and can be interpreted as the degree to which x belongs to X in view of the information about x expressed by R [16].
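The ratio described above can be sketched in a few lines of Python. This is a minimal illustration on a hypothetical toy universe (the integers 1 to 6), not on the data of Table 1. The function names `block_of` and `rough_membership` are chosen here for illustration only.

```python
from fractions import Fraction


def block_of(x, partition):
    """Return the block [x]_R of the partition that contains x."""
    for block in partition:
        if x in block:
            return block
    raise ValueError(f"{x!r} is not covered by the partition")


def rough_membership(x, X, partition):
    """Compute |X ∩ [x]_R| / |[x]_R| as an exact fraction."""
    block = block_of(x, partition)
    return Fraction(len(block & X), len(block))


# Hypothetical toy data (an assumption, not taken from Table 1).
partition = [frozenset({1, 2, 3}), frozenset({4, 5}), frozenset({6})]
X = {1, 2, 4, 6}

print(rough_membership(1, X, partition))  # block {1, 2, 3} meets X in {1, 2}: 2/3
print(rough_membership(4, X, partition))  # block {4, 5} meets X in {4}: 1/2
print(rough_membership(6, X, partition))  # block {6} lies inside X: 1
```

Exact fractions are used instead of floats so that the boundary cases 0 and 1 can be tested by equality.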
Numerical characterizations
Pawlak’s rough sets can also be defined by the rough membership function instead of by approximations. That is, if μ_X^R is a rough membership function on U, then ∀X ⊆ U, the approximations and the boundary region of X can be defined as follows [16]:
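As a hedged sketch of this numerical characterization, the code below reads the three sets off the membership values (value 1 for the lower approximation, value greater than 0 for the upper approximation, values strictly between 0 and 1 for the boundary) and compares them with the sets computed directly from the blocks. The toy universe and all function names are assumptions made for illustration.

```python
from fractions import Fraction


def mu(x, X, partition):
    """Pawlak rough membership of x in X for a partition given as a list of sets."""
    block = next(b for b in partition if x in b)
    return Fraction(len(block & X), len(block))


def approximations_from_mu(U, X, partition):
    """Lower, upper and boundary sets read off from the membership values."""
    lower = {x for x in U if mu(x, X, partition) == 1}
    upper = {x for x in U if mu(x, X, partition) > 0}
    boundary = {x for x in U if 0 < mu(x, X, partition) < 1}
    return lower, upper, boundary


def approximations_from_blocks(U, X, partition):
    """The same sets computed from block inclusion and block overlap."""
    lower = {x for b in partition if b <= X for x in b}
    upper = {x for b in partition if b & X for x in b}
    return lower, upper, upper - lower


# Hypothetical toy data (an assumption, not taken from Table 1).
U = {1, 2, 3, 4, 5, 6}
partition = [{1, 2, 3}, {4, 5}, {6}]
X = {1, 2, 4, 6}

print(approximations_from_mu(U, X, partition))
print(approximations_from_blocks(U, X, partition))
```

On this toy input both routes give the lower approximation {6}, the upper approximation U, and the boundary {1, 2, 3, 4, 5}.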
Backgrounds of rough membership function
Theoretical backgrounds
The rough membership function may be interpreted as a special kind of fuzzy membership function. Under this interpretation, it is possible to establish the connection between Pawlak rough sets and fuzzy sets as follows [31]: ∀X ⊆ U,
Besides, the rough membership function, in contrast to a fuzzy membership function, has a probabilistic flavor. The relationship between probabilistic rough sets and Pawlak rough sets was established in [37] as follows: if the parameters are α = 1 and β = 0, then the probabilistic lower approximation and upper approximation degenerate into the lower approximation and upper approximation of Pawlak rough sets, respectively. That is, for any X ⊆ U,
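The degeneration can be checked numerically under one common formulation of probabilistic approximations, in which the lower approximation collects the objects with conditional probability at least α and the upper approximation collects those with conditional probability greater than β. The exact inequalities used in [37] are not reproduced in this text, so this formulation, the toy data and the function names are assumptions.

```python
from fractions import Fraction


def conditional_probability(x, X, partition):
    """P(X | [x]) on a finite universe with equally likely objects."""
    block = next(b for b in partition if x in b)
    return Fraction(len(block & X), len(block))


def probabilistic_approximations(U, X, partition, alpha, beta):
    """Assumed thresholds: lower uses >= alpha, upper uses > beta."""
    lower = {x for x in U if conditional_probability(x, X, partition) >= alpha}
    upper = {x for x in U if conditional_probability(x, X, partition) > beta}
    return lower, upper


# Hypothetical toy data (an assumption, not taken from Table 1).
U = {1, 2, 3, 4, 5, 6}
partition = [{1, 2, 3}, {4, 5}, {6}]
X = {1, 2, 4, 6}

# alpha = 1, beta = 0 gives back the Pawlak approximations.
pawlak_lower, pawlak_upper = probabilistic_approximations(U, X, partition, 1, 0)

# Relaxed thresholds enlarge the lower and shrink the upper approximation.
relaxed_lower, relaxed_upper = probabilistic_approximations(
    U, X, partition, Fraction(3, 5), Fraction(1, 2)
)
print(pawlak_lower, pawlak_upper)
print(relaxed_lower, relaxed_upper)
```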
Practical backgrounds
As mentioned in the Introduction, rough membership functions play an important role in Pawlak’s rough sets. However, to the best of our knowledge, no paper discusses the practical background of Pawlak’s rough membership function in real life. In the following, by analyzing an example of evidence-based medical diagnosis data, we explain how Pawlak’s rough membership function can be used to determine the initial treatment of patients, in order to help doctors make their subjective diagnoses. At the same time, we use this example to point out the limitations of Pawlak’s rough membership function in real-life applications, and to show the necessity of establishing covering-based rough membership functions. An evidence-based medical diagnosis database of a hospital is built from the information of patients who visited the hospital and whose diseases were diagnosed. The database consists of the patients’ symptoms and the finally diagnosed illnesses. An evidence-based medicine database cannot simply be regarded as an ‘if ... then’ system, since two patients with identical symptoms may finally be diagnosed with different diseases. The following simple example of an evidence-based medicine database is selected from a medical doctoral dissertation [7]. The example consists of the data of 30 patients: 26 patients (p1 - p26) whose diseases were identified according to their symptoms; 2 patients (p27 and p28) whose symptoms are clear but whose disease has not yet been identified; and 2 patients (p29 and p30) for whom part of the symptoms is still unclear. The detailed information is given in Table 1.
In Table 1, {cough, bloody sputum, chest pain, fever, shout breath, thin, feeling tired, anorexia, local diffusion, distant metastasis} is the set of condition attributes, {lung cancer} is the set of decision attributes, and C.L.C, P.L.C, T.B.L.B.C denote central lung cancer, peripheral lung cancer, thin bronchuses lung bubble cancer respectively. Moreover, each row can be seen as information about a specific patient, and 1 denotes yes, 0 denotes no. The patients p_i (i = 27, 28, 29, 30) are waiting for the hospital diagnosis, whereas the information of the patients p_i (1 ≤ i ≤ 26), which is taken from the diagnostic database of lung cancer cases in [7], determines the following nine decision rules:
(1) if (cough,1) and (bloody sputum,1) and (chest pain,1) and (fever,0) and (shout breath,1) and (thin,1) and (feeling tired,1) and (anorexia,1) and (local diffusion,1) and (distant metastasis,0), then (lung cancer, central lung cancer);
(2) if (cough,1) and (bloody sputum,1) and (chest pain,1) and (fever,1) and (shout breath,1) and (thin,1) and (feeling tired,1) and (anorexia,1) and (local diffusion,1) and (distant metastasis,1), then (lung cancer, central lung cancer);
(3) if (cough,1) and (bloody sputum,1) and (chest pain,1) and (fever,0) and (shout breath,1) and (thin,1) and (feeling tired,1) and (anorexia,1) and (local diffusion,0) and (distant metastasis,1), then (lung cancer, central lung cancer);
(4) if (cough,1) and (bloody sputum,1) and (chest pain,1) and (fever,0) and (shout breath,1) and (thin,1) and (feeling tired,1) and (anorexia,1) and (local diffusion,0) and (distant metastasis,1), then (lung cancer, peripheral lung cancer);
(5) if (cough,1) and (bloody sputum,1) and (chest pain,1) and (fever,1) and (shout breath,1) and (thin,1) and (feeling tired,1) and (anorexia,1) and (local diffusion,0) and (distant metastasis,1), then (lung cancer, peripheral lung cancer);
(6) if (cough,1) and (bloody sputum,0) and (chest pain,1) and (fever,0) and (shout breath,1) and (thin,0) and (feeling tired,0) and (anorexia,0) and (local diffusion,0) and (distant metastasis,0), then (lung cancer, peripheral lung cancer);
(7) if (cough,1) and (bloody sputum,0) and (chest pain,1) and (fever,0) and (shout breath,1) and (thin,0) and (feeling tired,0) and (anorexia,0) and (local diffusion,0) and (distant metastasis,0), then (lung cancer, thin bronchuses lung bubble cancer);
(8) if (cough,1) and (bloody sputum,0) and (chest pain,1) and (fever,0) and (shout breath,1) and (thin,0) and (feeling tired,0) and (anorexia,0) and (local diffusion,1) and (distant metastasis,0), then (lung cancer, thin bronchuses lung bubble cancer);
(9) if (cough,1) and (bloody sputum,0) and (chest pain,0) and (fever,0) and (shout breath,1) and (thin,0) and (feeling tired,0) and (anorexia,0) and (local diffusion,0) and (distant metastasis,0), then (lung cancer, thin bronchuses lung bubble cancer).
On further analysis of the decision rules induced from Table 1, we note that some rules are inconsistent: for example, the third and fourth rules have identical conditions but different decisions. As a result, patients p27 and p28 cannot be easily diagnosed by these rules. One approach to overcoming this problem is to use the following method, based on Pawlak’s rough membership function, to take the most frequent decision in the decision table.
An equivalence relation I(C) can be defined on U = {p_i : 1 ≤ i ≤ 28} by the set of condition attributes C as follows:
I(C) = {(x, y) ∈ U × U : f_a(x) = f_a(y), ∀a ∈ C}, where f_a(x) is the value of a on x ∈ U.
For C = {cough, bloody sputum, chest pain, fever, shout breath, thin, feeling tired, anorexia, local diffusion, distant metastasis}, denote C(x) = {y ∈ U : (x, y) ∈ I(C)}.
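The construction of the classes C(x) can be sketched as follows: objects are grouped by the tuple of their values on the chosen attributes. The four-row table below is hypothetical and much smaller than Table 1. The object names q1 to q4 and the function name are illustrative assumptions.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group objects whose values agree on every attribute in `attributes`."""
    classes = defaultdict(set)
    for obj, row in table.items():
        signature = tuple(row[a] for a in attributes)
        classes[signature].add(obj)
    return list(classes.values())


# Hypothetical decision-table fragment (an assumption, not Table 1).
table = {
    "q1": {"cough": 1, "fever": 0, "chest pain": 1},
    "q2": {"cough": 1, "fever": 0, "chest pain": 1},
    "q3": {"cough": 1, "fever": 1, "chest pain": 1},
    "q4": {"cough": 0, "fever": 0, "chest pain": 0},
}
attributes = ["cough", "fever", "chest pain"]

classes = indiscernibility_classes(table, attributes)
print(classes)  # q1 and q2 share a block; q3 and q4 are singletons
```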
By Table 1, it is easy to verify that
Thus, U/I(C) = {C(p1), C(p9), C(p10), C(p17), C(p18), C(p20), C(p26)} is a partition of U, and Fig. 1 presents its intuitive description.
Let 〈U, U/I〉 be the Pawlak approximation space and let X1 = {p_i : 1 ≤ i ≤ 10}, X2 = {p_i : 11 ≤ i ≤ 18} and X3 = {p_i : 19 ≤ i ≤ 26}. Figure 2 presents its intuitive description.
By Fig. 2, we can easily calculate the values of Pawlak’s rough membership function of p_i (i = 27, 28) belonging to X_j (j = 1, 2, 3) with respect to R = U/I(C) as follows:
We can make a preliminary judgement that p27 and p28 probably suffer from central lung cancer and peripheral lung cancer, respectively.
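The decision step just described, choosing the decision class with the largest membership degree, can be sketched as below. The partition, the class labels "A" and "B", and the objects q1 to q5 are hypothetical and do not reproduce the values obtained from Table 1. The function names are assumptions as well.

```python
from fractions import Fraction


def membership(x, X, partition):
    """Pawlak rough membership of x in X."""
    block = next(b for b in partition if x in b)
    return Fraction(len(block & X), len(block))


def preliminary_decision(x, decision_classes, partition):
    """Return the label with the largest membership degree and all degrees."""
    degrees = {
        label: membership(x, X, partition)
        for label, X in decision_classes.items()
    }
    best = max(degrees, key=degrees.get)  # ties go to the first label listed
    return best, degrees


# Hypothetical data: q5 is undiagnosed and shares a block with q1, q2, q3.
partition = [{"q1", "q2", "q3", "q5"}, {"q4"}]
decision_classes = {"A": {"q1", "q2"}, "B": {"q3", "q4"}}

label, degrees = preliminary_decision("q5", decision_classes, partition)
print(label, degrees)
```

Here q5 receives degree 1/2 for class A and 1/4 for class B, so A is the preliminary decision. The degrees need not sum to 1 because the undiagnosed object itself is counted in its block.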
Limitations of rough membership function
In Example 1, we demonstrated how to take the most frequent decision in the diagnosis of lung cancer by using Pawlak’s rough membership function when all the symptoms are clear. However, in some cases, patients are unable to describe all their symptoms clearly and the doctors’ clinical skills are not sufficient to clarify them either, so the clinical data about the patients’ symptoms are incomplete, as for patients p29 and p30 in Table 1. In such cases, unlike in Example 1, we cannot take the most frequent decision by means of Pawlak’s rough membership function. For example, on U = {p_i : 1 ≤ i ≤ 26 or i = 29, 30}, the set of condition attributes C induces a covering rather than a partition, because the blocks determined by C overlap. Take p29 for example: since the values of the condition attributes fever and local diffusion are unknown, each can be 0 or 1. If both values are 1, then by I(C), p9 and p29 are indiscernible, so {p9, p29} is a block determined by C. If both values are 0, then by I(C), p10, p11, p12, p13, p14, p15, p16 and p29 are indiscernible, so {p10, p11, p12, p13, p14, p15, p16, p29} is a block determined by C. These two blocks have the common element p29, and it follows that C induces a covering instead of a partition. The situation of p30 is similar. Thus, Pawlak’s rough membership function based on an equivalence relation cannot be used to take the most frequent decision for patients p29 and p30 in Table 1. To solve this problem, the approach presented in Section 2.3 should be improved by using a rough membership function based on a covering instead of Pawlak’s rough membership function based on an equivalence relation.
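The overlap argument for p29 can be imitated on a small hypothetical table: every way of completing the unknown values is tried, and each completion that matches some fully described object yields a block containing the incomplete object. The table, the object names q1, q2, q3, q9 and the marker "*" for unknown values are assumptions for illustration.

```python
from itertools import product


def blocks_for_incomplete(obj, table, attributes, domain=(0, 1), unknown="*"):
    """Collect one block per completion of obj's unknown values that matches some object."""
    row = table[obj]
    missing = [a for a in attributes if row[a] == unknown]
    blocks = []
    for values in product(domain, repeat=len(missing)):
        completed = {**row, **dict(zip(missing, values))}
        matches = {
            other
            for other, other_row in table.items()
            if other != obj
            and all(other_row[a] == completed[a] for a in attributes)
        }
        if matches:
            blocks.append(matches | {obj})
    return blocks


# Hypothetical fragment: q9 has two unknown attribute values.
table = {
    "q1": {"cough": 1, "fever": 1, "local diffusion": 1},
    "q2": {"cough": 1, "fever": 0, "local diffusion": 0},
    "q3": {"cough": 1, "fever": 0, "local diffusion": 0},
    "q9": {"cough": 1, "fever": "*", "local diffusion": "*"},
}
attributes = ["cough", "fever", "local diffusion"]

blocks = blocks_for_incomplete("q9", table, attributes)
print(blocks)  # two blocks, both containing q9
```

The two resulting blocks share q9, so the family of blocks is a covering of the toy universe but not a partition.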
Basic concepts
In this section, we present the basic concepts needed in this paper. To begin with, we list some definitions in probabilistic approaches to rough sets.
P(U) = 1, P(A ∪ B) = P(A) + P(B), where A, B ∈ 2^U and A ∩ B = ∅.
Then, we show some concepts about coverings to be used in this paper.
In the following discussion, unless stated to the contrary, the universe is considered to be finite, and it follows that coverings consist of a finite number of sets.
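For finite universes, the covering concepts can be checked mechanically. The sketch below tests whether a family of sets is a covering or a partition and computes the neighborhood N(x) as the intersection of all blocks containing x. This is a common definition of N(x) in the covering-based rough set literature and is assumed here, since the paper's own definition is not reproduced in this text. The toy family and function names are hypothetical.

```python
def is_covering(family, U):
    """A family of nonempty subsets of U whose union is U."""
    return all(K and K <= U for K in family) and set().union(*family) == U


def is_partition(family, U):
    """A covering whose blocks are pairwise disjoint."""
    return is_covering(family, U) and sum(len(K) for K in family) == len(U)


def neighborhood(x, family):
    """N(x): intersection of all blocks containing x (assumed definition)."""
    blocks = [set(K) for K in family if x in K]
    return set.intersection(*blocks)


# Hypothetical covering of a four-element universe.
U = {1, 2, 3, 4}
family = [{1, 2}, {2, 3}, {3, 4}]

print(is_covering(family, U), is_partition(family, U))
print({x: neighborhood(x, family) for x in sorted(U)})
```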
The following facts about are obvious:
Main results
In this section, we study the rough membership function based on . First, we construct a rough membership function which is based on topological structures of the . Then, we present numerical characterizations of the by means of the covering-based rough membership function.
Definitions of the and its rough membership function
We call CL the covering lower approximation operation and the tenth type of covering upper approximation operation.
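The two operators can be sketched as follows, under assumptions that should be read with care: the defining formulas are not reproduced in this text, so the code takes CL(X) to be the union of the covering blocks contained in X, and the upper approximation to be the union of the neighborhoods N(x) over x ∈ X. This is the reading suggested by the proofs in this section. The function names and toy covering are hypothetical.

```python
def neighborhood(x, covering):
    """N(x): intersection of all blocks containing x (assumed definition)."""
    blocks = [set(K) for K in covering if x in K]
    return set.intersection(*blocks)


def covering_lower(X, covering):
    """Assumed reading of CL(X): union of all blocks K with K ⊆ X."""
    return set().union(*[K for K in covering if set(K) <= X])


def neighborhood_upper(X, covering):
    """Assumed reading of the upper approximation: union of N(x) for x in X."""
    return set().union(*[neighborhood(x, covering) for x in X])


# Hypothetical covering of U = {1, 2, 3, 4}.
covering = [{1, 2}, {2, 3}, {3, 4}]

print(covering_lower({1, 2}, covering), neighborhood_upper({1, 2}, covering))
print(covering_lower({2, 4}, covering), neighborhood_upper({2, 4}, covering))
```

For X = {2, 4} no block fits inside X, so the lower approximation is empty, while the neighborhoods of 2 and 4 give the upper approximation {2, 3, 4}.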
If g x (y) =1, then , from the Definition of N (x), thus y ∈ N (x).
there exists x0 ∈ X such that y ∈ N (x0)
G_x(y) = 1
(2 ⇒ 3) This is easy to prove from Lemma 1.
(3 ⇒ 1) This is obvious from Lemma 1 and Definition 10.
Numerical characterizations
On the other hand, ∀y ∈ U, if , then for any x ∈ X, we have y ∉ N(x), by Definition 10, whence G_x(y) = 0. Therefore .
On the other hand, ∀y ∈ U, if , then there exists x0 ∈ X such that G_{x0}(y) = 1. From Lemma 2, we know y ∈ N(x0) and there exists such that x0, y ∈ K and K ⊆ X. So y ∈ CL(X).
On the other hand, if , from the Lemmas 2 and 4, it is easy to prove . Thus .
By combining Theorems 1 and 2, we can get the following result:
Theoretical backgrounds and practical applications of rough membership function on the tenth type of covering-based rough sets
In this section, we first discuss the relationships between covering-based probabilistic rough sets and the tenth type of covering-based rough sets, which provide the theoretical background of the covering-based rough membership function studied in this paper. Then we use the example in Section 2.4 to illustrate practical applications of this function.
Theoretical backgrounds
The covering-based probabilistic rough sets proposed in this paper degenerate into covering-based rough sets as follows.
In Definition 12, if the parameters α = 1, β = 0 and , then the lower approximation and the upper approximation degenerate into the lower approximation CL(X) and the upper approximation respectively in the covering approximation space. That is, for any X ⊆ U,
Practical applications
In Section 2.4, we pointed out that Pawlak’s rough membership function based on an equivalence relation cannot be used to take the most frequent decision for patients p29 and p30 in Table 1, because the data on their symptoms are incomplete. In the following, we show that this problem can be solved by using the covering-based rough membership function proposed in Section 4.
For the set of condition attributes C, a similarity relation can be defined on U:
or f_a(x) = * or f_a(y) = *}, where f_a(x) is the value of a on x ∈ U, and * indicates an unknown value.
Moreover, for C = { cough, bloody sputum, chest pain, fever, shout breath, thin, feeling tired, anorexia, local diffusion, distant metastasis }, we write .
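A similarity relation of this kind, in which an unknown value * matches any value, can be sketched as below. The first part of the defining formula is not reproduced in this text, so the code assumes the usual reading: two objects are similar when, on every attribute in C, their values are equal or at least one of them is unknown. The table and the names `similar` and `similarity_class` are hypothetical.

```python
def similar(row_x, row_y, attributes, unknown="*"):
    """True when the rows agree on every attribute, with `unknown` as a wildcard."""
    return all(
        row_x[a] == row_y[a] or row_x[a] == unknown or row_y[a] == unknown
        for a in attributes
    )


def similarity_class(x, table, attributes):
    """All objects similar to x, including x itself."""
    return {y for y, row in table.items() if similar(table[x], row, attributes)}


# Hypothetical fragment: q9 has unknown fever and local diffusion values.
table = {
    "q1": {"cough": 1, "fever": 1, "local diffusion": 1},
    "q2": {"cough": 1, "fever": 0, "local diffusion": 0},
    "q3": {"cough": 1, "fever": 0, "local diffusion": 0},
    "q9": {"cough": 1, "fever": "*", "local diffusion": "*"},
}
attributes = ["cough", "fever", "local diffusion"]

for obj in table:
    print(obj, sorted(similarity_class(obj, table, attributes)))
```

The similarity classes of q1 and q2 both contain q9, so the family of similarity classes covers the toy universe with overlaps instead of partitioning it.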
By Table 1, it is easy to verify that
Let X1 = {p_i : 1 ≤ i ≤ 10}, X2 = {p_i : 11 ≤ i ≤ 18} and X3 = {p_i : 19 ≤ i ≤ 26}. By Fig. 4 and Definition 12, we can obtain:
It is found that the degree of p29 belonging to X1 is only , and the degree of p29 belonging to X3 is 0, which means that p29 may not have central lung cancer or thin bronchuses lung bubble cancer with respect to the conditional attribute set C, although two conditional attributes are unknown for p29. The membership degree of p29 belonging to X2 is , which means that p29 may well have peripheral lung cancer. Although the accuracy of this decision should be further validated by clinical analysis until peripheral lung cancer has been confirmed, we can make a preliminary judgement that will help the doctors in making their final decisions. Similarly, according to the membership degrees of p30 belonging to X_i (1 ≤ i ≤ 3), we can make a preliminary decision that p30 may well have thin bronchuses lung bubble cancer.
Conclusion
The main contribution of this paper is to construct a rough membership function for the tenth type of covering-based rough sets and to discuss its properties and applications in depth. First, by using a practical example in evidence-based medicine, we show the applications of Pawlak’s rough membership function in real life and its limitations. Then, based on the topological structures of the tenth type of covering-based rough sets, which was examined by Tsang et al. in [27], we construct the corresponding covering-based rough membership function and present numerical characterizations of this type of rough set by means of this function. Furthermore, its theoretical background is discussed. Finally, we illustrate practical applications of this covering-based rough membership function in medical diagnosis.
Acknowledgments
This work is supported by the National Natural Science Foundation of China, Grant Nos. 11371130, 61273018 and by the Natural Science Foundation of Guangxi, Grant No. 2014GXNSFBA118015, 201204LX335, 2012YJZD20.
