Abstract
The decision-theoretic rough set, as a special case of probabilistic rough set, mainly adopts Bayesian decision procedure to achieve the thresholds from a given loss function. It provides a novel semantic interpretation for rough regions by utilizing three-way decision approach and has been widely applied in decision making. However, there is a limitation of classical decision-theoretic rough set that it lacks of ability to deal with hybrid data. Where the condition attributes are composed of multiple types, for instance, real-valued, set-valued, interval-valued, fuzzy-valued, intuitionistic fuzzy-valued attribute and so on. These complex data constitute a knowledge representation system named lattice-valued decision information system. In this talk, we develop a decision-theoretic rough set model in a lattice-valued decision information system to study these hybrid data. Then, some essential properties of this model are addressed and decision rules are investigated. Furthermore, we design two heuristic attribute reduction algorithms based on rough entropy and positive region preservation, respectively. Finally, a series of examples based on medical diagnosis are conducted to interpret decision rules and demonstrate these algorithms.
Keywords
Introduction
Rough set theory (RST) which was originated by Pawlak in the early 1980s, is an extension of classical set theory and could be regarded as a mathematical and soft computing tool to handle imprecision, vagueness and uncertainty in data analysis [23]. RST is built on the basis of a classification mechanism, it is classified by an equivalence relation in a specific universe and constitutes a partition of the universe. In the viewpoint of granular computing (GrC), an equivalence class can be viewed as a knowledge granule which be induced by an indiscernibility relation. The main innovation of RST is the use of some knowledge in knowledge base to approximate the inaccurate and uncertain knowledge by a pair of approximation operations. It has become a well-established theory for uncertainty management in a wide variety of applications related to feature selection [8, 9], uncertainty analysis [4], data modeling [29], information fusion [5, 38], knowledge discovery [45], dynamic updating [44], rule learning [13] and pattern recognition [31].
In recent decades, since there are no fault tolerance mechanisms between knowledge granules and concept set, several proposals of generalized quantitative rough set models were developed to resolve this limitation by using a graded set inclusion. The probabilistic rough set (PRS) introduces the probability uncertainty measure into RST [24], which forms the basis of mainstream quantitative models [1 , 46]. PRS offers measurability, generality, and flexibility and exhibits a series of concrete models which consist of the game-theoretic rough set [2], variable rough set [47], Bayesian rough set [32], parameter rough set [7], and decision-theoretic rough set (DTRS) [40]. They aim at modeling data relationships expressed in terms of frequency distribution rather than in terms of a full inclusion relation, which is used in the classical definition of rough set. DTRS systematically calculate the thresholds with respect to a set of loss functions based on Bayesian decision procedure. The physical meaning of the loss function can be interpreted according to practical notions of costs and risks [42]. It provides a novel semantic interpretation of the positive region, negative region and boundary region by applying the three-way decision theory and has been widely utilized into data mining [18, 21] and decision analysis [12, 17]. In order to solve the practical problems which from different backgrounds, a large number of generalized models are developed, for instance, multigranulation decision-theoretic rough set [27] and its expansion models [20, 28], neighborhood based decision-theoretic rough set [15], double-quantitative decision-theoretic rough set [14, 37], decision-theoretic rough fuzzy set [34] and decision-theoretic fuzzy rough set [6], decision-theoretic rough set in dynamic environments [3, 30].
Since the uncertainty of human cognitive and various random factors widely exist in practical data. A generalized information system is needed to depict the hybrid knowledge. The lattice-valued decision information system (LvDIS) which consist of real-valued attribute set, set-valued attribute set, interval-valued attribute set, fuzzy-valued attribute set and intuitionistic fuzzy-valued attribute set, it is better for describing hybrid data [36, 49]. It means that the condition attributes are composed of multiple types where the domain of all condition attributes are finite lattices [22]. Prior to this research, Zhang developed a novel notion of lattice-valued interval soft sets as a general frame of soft set model [48], Xu addressed some essential properties of lattice-valued information system based on rough set theory [36], and Zhang studied the approach of rules acquisition in lattice-valued information system with fuzzy decision [49].
However, to the best of our knowledge, there are numerous achievements about decision-theoretic rough set and its generalized models but few approaches can be directly applied to lattice-valued decision information system. Consequently, we try to develop a decision-theoretic rough set model in a lattice-valued decision information system. This paper defines a novel partial ordering relation with respect to multiple types of condition attributes and establishes a decision-theoretic rough set model based on a known loss function. Then, two heuristic attribute reduction algorithms based on rough entropy and positive region preservation are designed to reduce the redundant knowledge, respectively. Parallel with theoretical research, we conduct a series of verification examples to interpret decision rules and demonstrate the algorithms.
The remainder of this paper is organized as follows. Section 2 outlines some preliminary and necessary knowledge. Then, a decision-theoretic rough set model is established in a lattice-valued decision information system in Section 3. Furthermore, Section 4 represents two heuristic at-tribute reduction algorithms. At last, Section 5 concludes this investigation and identifies further study direction.
Preliminaries
In this section, we briefly introduce some necessary notions which consist of rough set, lattice-valued decision information system and decision-theoretic rough set. It should be noted that
Rough set
The RST is built based on an information system which is a quadruple S = (U, AT, V, f), where U is a finite non-empty set of objects, AT is a finite non-empty set of attributes, V is a set of attribute value and f is a mapping which from U to V, the f
a
(x) means the attribute value of x with respect to a. In RST, an equivalence relation is the foundation of classification mechanism and it can be defined with respect to A (where A ⊆ AT) as follows:
According to the indiscernibility relation R
A
, we can obtain an equivalence class containing x that [x]
A
= {y ∈ U : (x, y) ∈ R
A
}. In the view of GrC, equivalence classes are the basic building blocks for the representation and approximation of concept. Each equivalence class may be viewed as a granule consisting of indistinguishable elements. For any basic concept
Then, we can obtain the rough regions based on this pair of approximation operators.
Lattice-valued decision information system
In an information system, an attribute is a criterion if the domain of the attribute is ordered according to a decreasing or increasing preference. Thus, an indiscernibility relation should be replaced by a partially order relation. Let
An information system S = (U, AT, V, f) is a lattice-valued information system (LvIS) if for any a ∈ AT have (V
a
, ≽) is a finite lattice, where V
a
is the range of a and "≽" is the partial ordering relation on V
a
with respect to a. Furthermore, a LvIS is lattice-valued decision information system (LvDIS) if there is a decision attribute set D and D∩ AT = ∅, which is denoted as
In order to establish a fault tolerance mechanism between the equivalence classes and basic concept set, Pawlak and Skowron [25] suggested using a rough membership function to redefine the two approximations and the rough membership function μ
A
(X) is defined as follows:
In the Bayesian decision produce, a finite set of states can be written as Ω = {ω
1, ω
2, ⋯ , ω
s
}, and a finite set of r possible actions can be denoted by A = {a
1, a
2, ⋯ , a
r
}. Let P (ω
j
|
With respect to the membership of an object in X, we have a set of two states and a set of three actions for each state. The set of states is given by Ω = {X, X C } indicating that an element is in X or not in X, respectively. The set of actions with respect to a state is given by A = {a P , a B , a N }, where P, B and N represent the three actions in deciding x ∈ pos (X), deciding x ∈ bn (X), and deciding x ∈ neg (X), respectively. The loss function regarding the risk or cost of actions in different states is given by following way:
In Table 1, λ PP , λ NP and λ BP denote the losses incurred for taking actions a P , a N and a B when an object belongs to X, and λ PN , λ NN and λ BN denote the losses incurred for taking the same actions when the object does not belong to X, respectively. The expected loss R (a i | [x] R ) associated with taking the individual actions can be expressed in [29].
The loss function
When λ
PP
≤ λ
BP
< λ
NP
and λ
NN
≤ λ
BN
< λ
PN
, the Bayesian decision procedure leads to the following minimum-risk decision rules: If P (X| [x]
R
) ≥ γ and P (X| [x]
R
) ≥ α, decide pos (X); If P (X| [x]
R
) ≤ β and P (X| [x]
R
) ≤ γ, decide neg (X); If β ≤ P (X| [x]
R
) ≤ α, decide bn (X).
Where the parameters α, β and γ are defined as:
If a loss function further satisfies the condition that (λ
PN
- λ
BN
) (λ
NP
- λ
BP
) ≥ (λ
BN
- λ
NN
) (λ
BP
- λ
PP
), then we can get α ≥ γ ≥ β, then α > γ > β if α > β, thus, the DTRS has the decision rules: If P (X| [x]
R
) ≥ α, decide pos (X); If P (X| [x]
R
) ≤ β, decide neg (X); If β < P (X| [x]
R
) < α, decide bn (X).
Using these three decision rules, we can obtain the probabilistic approximations, namely, the lower and upper approximations of DTRS model as follows:
Furthermore, the positive region
In this section, we will discuss the decision-theoretic rough set in a lattice-valued decision information system. Since the attribute value set of a LvDIS which consist of real-valued, set-valued, interval-valued, fuzzy-valued and intuitionistic fuzzy-valued attribute set and so on. Therefore, we should investigate the partial ordering relation with respect to the hybrid values information system.
f (x, a) ≥ f (y, a), if a is a real-valued attribute;
f (x, a) ⊇ f (y, a), if a is a set-valued attribute;
f
+ (x, a) ≥ f
+ (y, a) and f
- (x, a) ≥ f
- (y, a), if a is a interval-valued attribute, where the f
- (x, a) and f
+ (x, a) are the left and right endpoint of x with respect to a, respectively;
f (x, a) ≥ f (y, a), if a is a fuzzy-valued attribute;
f
μ
(x, a) ≥ f
μ
(y, a) and f
ν
(x, a) ≤ f
ν
(y, a), if a is a intuitionistic fuzzy-valued attribute, where f
μ
(x, a) and f
ν
(x, a) are membership degree and non-membership degree of x with respect to a, respectively.
Based on the above descriptions, the partial ordering relation with respect to any A ⊆ AT can be denoted by following way.
where the (x, y) ∈ U × U and f (x, a) ≽ f (y, a) means that x is at least as good as y with respect to criterion a. According to Definition 3.1, the description of
Where A
R
, A
S
, A
I
, A
F
and A
IF
denotes real-valued, set-valued, interval-valued, fuzzy-valued and intuitionistic fuzzy-valued attribute set, respectively, and A = A
R
∪ A
S
∪ A
I
∪ A
F
∪ A
IF
. It is clear that the partial ordering relation
Typically, a decision rule in RST is exhibited as the form of Positive rule: if Negative rule: if Boundary rule: if
Given a rule
This measure focus on the quality of rule, the bigger the confidence, the higher reliability the rule is. It can be described as a condition probability
where the thresholds α and β are determined by a given loss function. Here, the positive region, negative region and boundary region of D
j
with respect to
According to Definition 3.4, it is not difficult to obtain the following descriptions of rough regions.
Based on the previous discussions, we will conduct an example to exhibit the process of decision-theoretic rough set modeling in a LvDIS. Here, it should be noted that the data in this case is constructed from [36].
A lattice-valued decision information system
The universe U is divided into two distinct parts by decision attribute D that π
D
= {D
1, D
2}, where D
1 = {x
1, x
3, x
6, x
8, x
9} and D
2 = {x
2, x
4, x
5, x
7, x
10}. Furthermore, we can obtain a binary relation matrix regards to attribute set AT based on Definition 3.2.
The matrix M = (m
ij
) |U|×|U|, where m
ij
means x
j
≽ x
i
with respect to
Then, we can obtain the confidence of rule
The confidence of remainder rules can be calculated in a same way. Here, the confidence of rule
Then, we can get that α = 0.7 and β = 0.5. Thus, we can achieve the lower and upper approximations of D
1 and D
2 with respect to
Thus, we can achieve the following decision rules based on three-way decision theory, which are presented as follows: These patients x
1, x
2, x
4, x
7, x
10 are sub-healthy with respect to AT under the given loss function. They need to take further treatment. The x
6 and x
8 are healthy with respect to AT in terms of the given loss function. They are healthy in the light of current indicators. These people x
3, x
5, x
9 cannot be diagnosed based on present information. A further diagnosis is need to them.
With the rapid development data science has given rise to a massive volume of freely available, user-generated data. The advent of Big Data has seen both the sources and volumes of data increase rapidly. Meanwhile, there are many unpredictable factors that affect the validity and authenticity of data. It is almost impossible for people to make sense of the overall picture in a short period of time. Consequently, an effective data filtering approach is necessary for data mining. Since the significance of attribute for classification is different and the rough set is based on a classification mechanism. Thus, deleting some attributes which with little influence on classification is crucial for reducing the data dimension. Attribute reduction is one the most significant issues in RST, which is a useful mathematical tool for data mining and has been widely utilized in numerous fields. In an intuitive viewpoint, an attribute reduction is a minimal subset of original attributes which induced rule sets with the same level of performance as the entire set of attributes, or lower but satisfied some certain requirements.
Attribute reduction based on rough entropy
RST is established on a classification mechanism which induced by an attribute set. Consequently, a refined attribute set is defined by requiring that the classification mechanism is unchanged. There are numerous of measures to measure the quality of classification [16, 26]. In this subsection, we will conduct the investigation based on the rough entropy of knowledge.
For a lattice-valued decision information system, an attribute set A ⊆ AT is a reduction of AT if the following conditions are satisfied: Jointly sufficient condition: Individually necessary condition:
Usually, the reduction of AT is marked as Red (AT). An attribute a is necessary in A if
It is clear that there are minimum and maximum values of the rough entropy. The minimum of Er (A) is 0 if for any x ∈ U have
Since computing all attribute reductions is an NP-hard problem [33], there are numerous of heuristic algorithms for searching one reduction have been investigated, for instance [10 , 50]. A heuristic algorithm usually includes two segments which consist of heuristics and search strategy. For the heuristics in an attribute reduction heuristic algorithm, the core of attribute set is usually adopted. With respect to the search strategies of a heuristic algorithm, there are two kinds of are considered which include directional and nondirectional search strategy. The first one strategy can be further categorized into deletion method, addition method and addition-deletion method [39]. The second one is usually utilized into evolutionary algorithms, swarm algorithms and other population-based meta-heuristic algorithms for optimization issues [15]. In the following discussion, we will utilize the addition-deletion method for the sake of simplicity. Therefore, two measures are needed to measure the significance of an attribute. The first measure is the absolute significance of a in A.
In particular, we can obtain that Sig
in
(a, A) = |U| · log2|U| - Er ({a}) if A = {a}, that means Er (∅) = |U| · log2|U|, namely, there are maximum uncertainty. According to Definition 4.2, the following propositions hold for Sig
in
(a, A): 0 ≤ Sig
in
(a, A) ≤ |U| · log
2|U|; The attribute a is necessary in A if and only if Sig
in
(a, A) >0; The core of A is Core (A) = {a ∈ A : Sig
in
(a, A) >0}.
Corresponding to the absolute significance measure Sig in (a, A), we need define a relative significance to measure the remaining attributes that AT ∖ A. Where AT ∖ A is the difference set of them that AT ∖ A = {a : a ∈ AT, a ∉ A}.
It is clear that Sig out (∅ , A) =0. For any a ∈ AT ∖ A, the greater the relative significance Sig out (a, A) means the more significant of a relative to A. Consequently, it is usually believed as a heuristic index for one attribute reduction search. Here, we will design a heuristic attribute reduction algorithm based on Sig out (a, A) (see Algorithm 1).
In order to verify the feasibility of this algorithm, we conduct an example based on Algorithm 1.
Therefore, for any a ∈ AT, the absolute significance Sig
in
(a, AT) are listed as follows:
Sig
in
(a
1, AT) =6.24 - 6.24 = 0.0,
Sig
in
(a
2, AT) =8.33 - 6.24 = 2.09,
Sig
in
(a
3, AT) =7.04 - 6.24 = 0.80,
Sig
in
(a
4, AT) =6.24 - 6.24 = 0.0,
Sig
in
(a
5, AT) =8.74 - 6.24 = 2.50 .
Based on these achievements, we can get that the core of AT is Core (AT) = {a
2, a
3, a
5}. Utilizing Algorithm 1, let Red (AT) = {a
2, a
3, a
5}, then the relationship matrix with respect to {a
2, a
3, a
5} is
Then, we can calculate the rough entropy of Red (AT) and take a comparison with Er (AT) based on the relationship matrix M
{a
2,a
3,a
5}. According to the relationship matrix M
{a
2,a
3,a
5}.
It is clear that Er ({a 2, a 3, a 5}) ≠ Er (AT). That means we need compute the relative significance for attributes in AT ∖ Red (AT). According to Definition 4.3, we can get the following results:
Since Sig
out
(a
1, Red (AT)) is equal to Sig
out
(a
4, Red (AT)), thus we can get the following results:
Consequently, both {a
1, a
2, a
3, a
5} and {a
2, a
3, a
4, a
5} may be attribute reduction. Then, we need to take a further verification. Here, we can get
Er ({a
1, a
2, a
3, a
5}) = Er (AT) ,
Er ({a
2, a
3, a
4, a
5}) = Er (AT) ,
which means that both {a
1, a
2, a
3, a
5} and {a
2, a
3, a
4, a
5} are attribute reductions of AT. Usually we can obtain one of the attribute reductions by utilizing Algorithm 1, but not all of attribute reductions. We can select arbitrary one of them as a candidate if the relative significance of multiply attributes is the same. Typically, it is chosen with respect to its order in the information system. Therefore, the algorithm is suitable for finding an attribute reduction rather than all attribute reductions in practice applications.
In DTRS model, the decision rules are related to rough regions. Especially, the positive region plays an extremely important role in the decision-making process. In the above discussion, a heuristic attribute reduction algorithm with respect to rough entropy is investigated. Here, we will study another attribute reduction approach based on the positive region preservation.
From the viewpoint of semantic, the objects in positive region can be "probably" classified into a "certain" decision class. Larger positive region usually be with smaller uncertain region, which means less ambiguous objects [15]. In order to measure the classification ability quantitatively, the approximation quality of decision attributes D with respect to attribute set A in DTRS model is denoted as follows:
Given a lattice-valued decision information system Jointly sufficient condition: Individually necessary condition:
In classical rough set model, there exists a monotonicity of the positive region with respect to the set of condition attributes, that is, for any A
1 ⊆ A
2 we have pos
A
1
(π
D
) ⊆ pos
A
2
(π
D
) that means γ
A
1
(π
D
) ≤ γ
A
2
(π
D
), where
Jointly sufficient condition: Individually necessary condition:
In Definition 4.4, the quantitative index
Where "Δ" is symmetric difference of two sets that XΔY = (X ∪ Y) - (X ∩ Y). It is clear that Sig
in
(a, A, D) ∈ [0, 1], where Sig
in
(a, A, D) =0 means
Sig
in
(a
1, AT, D) =0,
Sig
in
(a
2, AT, D) =0,
Sig
in
(a
3, AT, D) =0.43,
Sig
in
(a
4, AT, D) =0,
Sig
in
(a
5, AT, D) =0.29.
It is not difficult to get that
Furthermore, we need to verify that each attribute is necessary according to line 10 to line 14 of Algorithm 2. The verification achievements show that all attributes are necessary in the attribute concentration in which it is located. Then we can obtain that all {a 1, a 3, a 5}, {a 2, a 3, a 5} and {a 3, a 4, a 5} are attribute reductions based on positive region preservation. A heuristic attribute reduction algorithm can effectively search a reduction rather than all reductions. It should be noted that if we just need to find a reduction when the significances of attributes are identical. We can select the candidate according to the original order of the attributes. That means attribute set {a 1, a 3, a 5} should be considered in priority. Combined with Example 3.1, we can get the positive decision rule that patients {x 1, x 2, x 4, x 7, x 10} are sub-healthy with respect to knowledge {a 1, a 3, a 5} under α = 0.7 and β = 0.5. It shows that attributes a 2 and a 4 are redundant to obtain the same positive region. From the perspective of medical diagnostics, the results indicate that some detection indicators are not necessary for the diagnosis of certain diseases.
In this discussion, we presented two heuristic attribute reduction algorithms based on rough entropy and positive region preservation, respectively. According to the achievements of Example 4.1 and 4.2, we can obtain that using different algorithms may induce different results with respect to one data set. Actually, the essential purposes of these algorithms are different. Algorithm 1 is designed based on the influence of attributes in terms of classification, and rough entropy is an index to measure the quality of classification. It indicates that the objective of Algorithm 1 is classification preservation. However, the target of Algorithm 2 is positive rules preservation, that is maintain the positive region. Since the larger positive region usually be with smaller uncertain region, which means less ambiguous objects. Algorithm 2 does not focus on the classification, only concerns the positive rules. It is clear that the goal of Algorithm 1 is more rigorous than Algorithm 2, and the results of Algorithm 2 are included in the results of Algorithm 1. The experimental achievements also indicate these circumstances. Thus, we should choose an appropriate algorithm based on practical requirements in applications.
The uncertainty of human cognitive and numerous random factors is widespread in practical data. In order to more accurately depict these hybrid data that consist of multiple types of attributes, a lattice-valued decision information system was presented where the domain of all condition attributes are finite lattices. Based on this generalized decision information system, we define a novel partial ordering relation and establish a decision-theoretic rough set model to deal with the hybrid data which with various forms. With the advent of Big Data era has seen both the volumes and update rates of data increase rapidly, while the redundant information is increasing. Consequently, we design two heuristic attribute reduction algorithms based on rough entropy and positive region preservation, respectively. Meanwhile, several indexes which be utilized to measure the significance of attributes are defined in the process of attribute reduction. Algorithm 1 and Algorithm 2 can be applied to find one of the attribute reductions but not all of attribute reductions in practical issues. This paper establishes a basic decision-theoretic rough set model in lattice-valued decision information system and presents two heuristic attribute reduction algorithms to reduce the redundant knowledge based on different targets. However, there is still a lot of work that needs to be further considered. Therefore, we will try to develop a decision-theoretic rough set model in dynamic circumstance and design a novel attribute reduction algorithm to further improve the efficiency in future study.
Footnotes
Acknowledgments
We would like to express our thanks to the Editor-in-Chief, handling associate editor and anonymous referees for his/her valuable comments and constructive suggestions.
