Decision-theoretic rough set in lattice-valued decision information system

Abstract

The decision-theoretic rough set, a special case of the probabilistic rough set, adopts the Bayesian decision procedure to derive thresholds from a given loss function. It provides a novel semantic interpretation of rough regions through the three-way decision approach and has been widely applied in decision making. However, the classical decision-theoretic rough set cannot deal with hybrid data, in which the condition attributes are of multiple types, for instance, real-valued, set-valued, interval-valued, fuzzy-valued and intuitionistic fuzzy-valued attributes. Such complex data constitute a knowledge representation system called a lattice-valued decision information system. In this paper, we develop a decision-theoretic rough set model in a lattice-valued decision information system to study such hybrid data. Then, some essential properties of this model are addressed and decision rules are investigated. Furthermore, we design two heuristic attribute reduction algorithms based on rough entropy and positive region preservation, respectively. Finally, a series of examples based on medical diagnosis are presented to interpret the decision rules and demonstrate these algorithms.

Keywords

Attribute reduction; decision-theoretic rough set; lattice-valued decision information system; positive region preservation; rough entropy

1. Introduction

Rough set theory (RST), which was originated by Pawlak in the early 1980s, is an extension of classical set theory and can be regarded as a mathematical and soft computing tool for handling imprecision, vagueness and uncertainty in data analysis [23]. RST is built on a classification mechanism: objects are classified by an equivalence relation on a specific universe, and the resulting classes constitute a partition of the universe. From the viewpoint of granular computing (GrC), an equivalence class can be viewed as a knowledge granule induced by an indiscernibility relation. The main innovation of RST is the use of existing knowledge in a knowledge base to approximate inaccurate and uncertain knowledge by a pair of approximation operators. It has become a well-established theory for uncertainty management in a wide variety of applications related to feature selection [8, 9], uncertainty analysis [4], data modeling [29], information fusion [5, 38], knowledge discovery [45], dynamic updating [44], rule learning [13] and pattern recognition [31].

In recent decades, since the classical model has no fault tolerance mechanism between knowledge granules and the concept set, several generalized quantitative rough set models have been developed to resolve this limitation by using a graded set inclusion. The probabilistic rough set (PRS) introduces a probabilistic uncertainty measure into RST [24] and forms the basis of mainstream quantitative models [1, 46]. PRS offers measurability, generality and flexibility, and covers a series of concrete models, including the game-theoretic rough set [2], variable rough set [47], Bayesian rough set [32], parameter rough set [7], and decision-theoretic rough set (DTRS) [40]. They aim at modeling data relationships expressed in terms of frequency distributions rather than in terms of the full inclusion relation used in the classical definition of rough set. DTRS systematically calculates the thresholds from a set of loss functions based on the Bayesian decision procedure. The physical meaning of the loss function can be interpreted according to practical notions of costs and risks [42]. DTRS provides a novel semantic interpretation of the positive, negative and boundary regions by applying three-way decision theory and has been widely applied to data mining [18, 21] and decision analysis [12, 17]. To solve practical problems from different backgrounds, a large number of generalized models have been developed, for instance, the multigranulation decision-theoretic rough set [27] and its extensions [20, 28], the neighborhood-based decision-theoretic rough set [15], the double-quantitative decision-theoretic rough set [14, 37], the decision-theoretic rough fuzzy set [34], the decision-theoretic fuzzy rough set [6], and the decision-theoretic rough set in dynamic environments [3, 30].

Since uncertainty in human cognition and various random factors are widespread in practical data, a generalized information system is needed to describe hybrid knowledge. The lattice-valued decision information system (LvDIS), which consists of real-valued, set-valued, interval-valued, fuzzy-valued and intuitionistic fuzzy-valued attribute sets, is better suited to describing hybrid data [36, 49]. That is, the condition attributes are of multiple types and the domains of all condition attributes are finite lattices [22]. Prior to this research, Zhang developed a novel notion of lattice-valued interval soft sets as a general framework for the soft set model [48], Xu addressed some essential properties of the lattice-valued information system based on rough set theory [36], and Zhang studied rule acquisition in a lattice-valued information system with fuzzy decision [49].

However, to the best of our knowledge, although there are numerous results on the decision-theoretic rough set and its generalized models, few approaches can be directly applied to a lattice-valued decision information system. Consequently, we develop a decision-theoretic rough set model in a lattice-valued decision information system. This paper defines a novel partial ordering relation with respect to multiple types of condition attributes and establishes a decision-theoretic rough set model based on a known loss function. Then, two heuristic attribute reduction algorithms, based on rough entropy and positive region preservation respectively, are designed to remove redundant knowledge. In parallel with the theoretical research, we present a series of verification examples to interpret the decision rules and demonstrate the algorithms.

The remainder of this paper is organized as follows. Section 2 outlines some necessary preliminary knowledge. Then, a decision-theoretic rough set model is established in a lattice-valued decision information system in Section 3. Furthermore, Section 4 presents two heuristic attribute reduction algorithms. Finally, Section 5 concludes this investigation and identifies directions for further study.

2. Preliminaries

In this section, we briefly introduce some necessary notions, namely rough set, lattice-valued decision information system and decision-theoretic rough set. Throughout this paper, $P (U)$ denotes the power set of U, ∼X and X ^C both denote the complement of X, and |X| denotes the cardinality of the set X.

2.1. Rough set

RST is built on an information system, which is a quadruple S = (U, AT, V, f), where U is a finite non-empty set of objects, AT is a finite non-empty set of attributes, V is the set of attribute values and f is a mapping from U × AT to V; f _a (x) denotes the attribute value of x with respect to a. In RST, an equivalence relation is the foundation of the classification mechanism, and it can be defined with respect to A ⊆ AT as follows: $R_{A} = {(x, y) \in U \times U : f_{a} (x) = f_{a} (y), \forall a \in A} .$

According to the indiscernibility relation R _A, the equivalence class containing x is [x] _A = {y ∈ U : (x, y) ∈ R _A}. From the viewpoint of GrC, equivalence classes are the basic building blocks for the representation and approximation of concepts. Each equivalence class may be viewed as a granule consisting of indistinguishable elements. For any basic concept $X \in P (U)$ , one can characterize X by a pair of lower and upper approximations as follows: $\begin{matrix} {\underline{R}}_{A} (X) = {x \in U : [x]_{A} \subseteq X}, \\ {\bar{R}}_{A} (X) = {x \in U : [x]_{A} \cap X \neq \emptyset} . \end{matrix}$

Then, we can obtain the rough regions based on this pair of approximation operators.
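The two approximation operators can be illustrated with a minimal Python sketch. The function names `equivalence_classes` and `approximations` and the four-object table `toy` are hypothetical and serve only as an illustration; they are not part of the original model.

```python
from collections import defaultdict

def equivalence_classes(table, attrs):
    """Partition the universe by equality on the attributes in attrs."""
    blocks = defaultdict(set)
    for x, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(x)
    return list(blocks.values())

def approximations(table, attrs, concept):
    """Return the (lower, upper) approximations of concept with respect to attrs."""
    lower, upper = set(), set()
    for block in equivalence_classes(table, attrs):
        if block <= concept:   # [x]_A is contained in X
            lower |= block
        if block & concept:    # [x]_A intersects X
            upper |= block
    return lower, upper

# Hypothetical toy table: four objects, two symbolic attributes.
toy = {
    "x1": {"a": 0, "b": "low"},
    "x2": {"a": 0, "b": "low"},
    "x3": {"a": 1, "b": "low"},
    "x4": {"a": 1, "b": "high"},
}
lower, upper = approximations(toy, ["a", "b"], {"x1", "x3"})
print(sorted(lower), sorted(upper))  # ['x3'] ['x1', 'x2', 'x3']
```

Here x1 and x2 are indiscernible, so x1 enters only the upper approximation of the concept {x1, x3}.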

2.2. Lattice-valued decision information system

In an information system, an attribute is a criterion if its domain is ordered according to a decreasing or increasing preference. In that case, the indiscernibility relation should be replaced by a partial order relation. Let $L = (U, ≽)$ be a partially ordered set (poset). Then $L$ is called a lattice if any x, y ∈ U have both an infimum and a supremum, denoted by x ∧ y and x ∨ y, respectively. Furthermore, $L$ is a finite lattice if U is finite.

An information system S = (U, AT, V, f) is a lattice-valued information system (LvIS) if (V _a, ≽) is a finite lattice for every a ∈ AT, where V _a is the range of a and "≽" is the partial ordering relation on V _a. Furthermore, a LvIS is a lattice-valued decision information system (LvDIS) if there is a decision attribute set D with D ∩ AT = ∅, which is denoted as $LS = (U, AT \cup D, V, f)$ . An indiscernibility relation is clearly not applicable in a lattice-valued information system, so we introduce a partial ordering relation. For any A ⊆ AT, if the attributes in A are ordered according to an increasing or decreasing preference, then we can construct the dominance relation $R_{A}^{≽}$ with regard to A. In [36], the dominance relation with respect to A in a lattice-valued information system is defined as follows:

$R_{A}^{≽} = {(x, y) \in U \times U : x ≽_{A} y},$ (1) where x ≽ _A y means that x is at least as good as y with respect to every criterion a ∈ A, namely, x dominates y or y is dominated by x. The dominance class of any x ∈ U with respect to A is denoted as $[x]_{A}^{≽} = {y \in U : (y, x) \in R_{A}^{≽}}$ . It is the set of objects that dominate x in terms of A. Furthermore, the granular structure of U with respect to A is $U / R_{A}^{≽} = {[x_{1}]_{A}^{≽}, [x_{2}]_{A}^{≽}, \dots, [x_{n}]_{A}^{≽}}$ where n = |U|.

2.3. Decision-theoretic rough set

In order to establish a fault tolerance mechanism between the equivalence classes and basic concept set, Pawlak and Skowron [25] suggested using a rough membership function to redefine the two approximations and the rough membership function μ _A (X) is defined as follows: $μ_{A} (X) = \frac{| [x]_{A} \cap X |}{| [x]_{A} |} .$ (2)

In the Bayesian decision procedure, a finite set of s states can be written as Ω = {ω ₁, ω ₂, ⋯ , ω _s}, and a finite set of r possible actions can be denoted by A = {a ₁, a ₂, ⋯ , a _r}. Let P (ω _j|x) be the conditional probability of an object x being in state ω _j given that the object is described by x, and let λ (a _i|ω _j) denote the loss, or cost, of taking action a _i when the state is ω _j. The expected loss associated with taking action a _i is given by: $R (a_{i} | x) = \sum_{j = 1}^{s} λ (a_{i} | ω_{j}) P (ω_{j} | x) .$ (3)

With respect to the membership of an object in X, we have a set of two states and a set of three actions. The set of states is given by Ω = {X, X ^C}, indicating that an element is in X or not in X, respectively. The set of actions is given by A = {a _P, a _B, a _N}, where a _P, a _B and a _N represent the three actions of deciding x ∈ pos (X), deciding x ∈ bn (X), and deciding x ∈ neg (X), respectively. The loss function regarding the risk or cost of actions in different states is given in Table 1.

In Table 1, λ _PP, λ _NP and λ _BP denote the losses incurred for taking actions a _P, a _N and a _B when an object belongs to X, and λ _PN, λ _NN and λ _BN denote the losses incurred for taking the same actions when the object does not belong to X, respectively. The expected losses R (a _i| [x] _R) associated with taking the individual actions can be expressed as follows [29].

Table 1

The loss function

	X (P)	X ^C (N)
a _P	λ _PP	λ _PN
a _N	λ _NP	λ _NN
a _B	λ _BP	λ _BN

$\begin{matrix} R (a_{P} | [x]_{R}) = λ_{PP} P (X | [x]_{R}) + λ_{PN} P (X^{C} | [x]_{R}), \\ R (a_{N} | [x]_{R}) = λ_{NP} P (X | [x]_{R}) + λ_{NN} P (X^{C} | [x]_{R}), \\ R (a_{B} | [x]_{R}) = λ_{BP} P (X | [x]_{R}) + λ_{BN} P (X^{C} | [x]_{R}) . \end{matrix}$
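The minimum-risk choice among the three actions can be sketched directly from these expected losses. The following Python fragment is an illustration only: the helper `expected_losses` is a hypothetical name, and the loss values are those given later in Example 3.1.

```python
def expected_losses(p, loss):
    """R(a | [x]) for a in {P, N, B}, given p = P(X | [x]) and loss[(action, state)]."""
    return {a: loss[(a, "P")] * p + loss[(a, "N")] * (1 - p) for a in ("P", "N", "B")}

# Loss function of Example 3.1; keys are (action, state), state "P" = in X, "N" = not in X.
loss = {
    ("P", "P"): 0, ("P", "N"): 19,
    ("B", "P"): 7, ("B", "N"): 2,
    ("N", "P"): 9, ("N", "N"): 0,
}

best = []
for p in (0.8, 0.6, 0.2):
    risks = expected_losses(p, loss)
    best.append(min(risks, key=risks.get))  # action with the smallest expected loss
print(best)  # ['P', 'B', 'N']
```

For p = 0.8 the expected losses are 3.8, 7.2 and 6.0 for a _P, a _N and a _B, so accepting is the least risky action.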

When λ _PP ≤ λ _BP < λ _NP and λ _NN ≤ λ _BN < λ _PN, the Bayesian decision procedure leads to the following minimum-risk decision rules:

If P (X| [x] _R) ≥ γ and P (X| [x] _R) ≥ α, decide pos (X);

If P (X| [x] _R) ≤ β and P (X| [x] _R) ≤ γ, decide neg (X);

If β ≤ P (X| [x] _R) ≤ α, decide bn (X).

Where the parameters α, β and γ are defined as: $α = \frac{λ_{PN} - λ_{BN}}{(λ_{PN} - λ_{BN}) + (λ_{BP} - λ_{PP})},$ (4) $β = \frac{λ_{BN} - λ_{NN}}{(λ_{BN} - λ_{NN}) + (λ_{NP} - λ_{BP})},$ (5) $γ = \frac{λ_{PN} - λ_{NN}}{(λ_{PN} - λ_{NN}) + (λ_{NP} - λ_{PP})} .$ (6)
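As a brief sketch of where these expressions come from (a derivation added for illustration, under the loss conditions stated above), write p = P (X| [x] _R), so that P (X ^C| [x] _R) = 1 - p. Taking a _P is no riskier than taking a _B exactly when $R (a_{P} | [x]_{R}) \leq R (a_{B} | [x]_{R})$ , that is,

$\begin{matrix} λ_{PP} p + λ_{PN} (1 - p) \leq λ_{BP} p + λ_{BN} (1 - p) \\ \Longleftrightarrow (λ_{PN} - λ_{BN}) (1 - p) \leq (λ_{BP} - λ_{PP}) p \\ \Longleftrightarrow p \geq \frac{λ_{PN} - λ_{BN}}{(λ_{PN} - λ_{BN}) + (λ_{BP} - λ_{PP})} = α . \end{matrix}$

The last step divides by (λ _PN - λ _BN) + (λ _BP - λ _PP), which is positive because λ _PP ≤ λ _BP and λ _BN < λ _PN. Comparing a _N with a _B and a _P with a _N in the same way yields p ≤ β and p ≥ γ, respectively.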

If the loss function further satisfies the condition (λ _PN - λ _BN) (λ _NP - λ _BP) ≥ (λ _BN - λ _NN) (λ _BP - λ _PP), then α ≥ γ ≥ β. If, in addition, α > β, then α > γ > β, and the DTRS has the following decision rules:

If P (X| [x] _R) ≥ α, decide pos (X);

If P (X| [x] _R) ≤ β, decide neg (X);

If β < P (X| [x] _R) < α, decide bn (X).

Using these three decision rules, we can obtain the probabilistic approximations, namely, the lower and upper approximations of DTRS model as follows: $\begin{matrix} {\underline{R}}_{A}^{(α, β)} (X) = {x \in U : P (X | [x]_{A}) \geq α}, \\ {\bar{R}}_{A}^{(α, β)} (X) = {x \in U : P (X | [x]_{A}) > β} . \end{matrix}$

Furthermore, the positive region ${pos}_{A}^{(α, β)}$ , negative region ${neg}_{A}^{(α, β)}$ and boundary region ${bn}_{A}^{(α, β)}$ of X with respect to A and thresholds α, β can be achieved.
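The thresholds of Eqs. (4)-(6) are simple to evaluate. The sketch below uses exact fractions to avoid rounding; the function name `thresholds` is hypothetical, integer losses are assumed, and the values are taken from the loss function used later in Example 3.1.

```python
from fractions import Fraction

def thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Return (alpha, beta, gamma) as defined in Eqs. (4)-(6); integer losses assumed."""
    alpha = Fraction(l_pn - l_bn, (l_pn - l_bn) + (l_bp - l_pp))
    beta = Fraction(l_bn - l_nn, (l_bn - l_nn) + (l_np - l_bp))
    gamma = Fraction(l_pn - l_nn, (l_pn - l_nn) + (l_np - l_pp))
    return alpha, beta, gamma

alpha, beta, gamma = thresholds(l_pp=0, l_bp=7, l_np=9, l_pn=19, l_bn=2, l_nn=0)
print(alpha, beta, gamma)        # 17/24 1/2 19/28
print(alpha >= gamma >= beta)    # True
```

For this loss function (λ _PN - λ _BN) (λ _NP - λ _BP) = 34 and (λ _BN - λ _NN) (λ _BP - λ _PP) = 14, so the ordering α ≥ γ ≥ β is as expected.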

3. Decision-theoretic rough set in lattice-valued decision information system

In this section, we discuss the decision-theoretic rough set in a lattice-valued decision information system. Since the attribute value set of a LvDIS consists of real-valued, set-valued, interval-valued, fuzzy-valued and intuitionistic fuzzy-valued attributes and so on, we should investigate the partial ordering relation with respect to such a hybrid-valued information system.

Definition 3.1. Let $LS = (U, AT \cup D, V, f)$ be a lattice-valued decision information system. For any a ∈ AT, x is at least as good as y with respect to the criterion a if:

f (x, a) ≥ f (y, a), if a is a real-valued attribute;

f (x, a) ⊇ f (y, a), if a is a set-valued attribute;

f ⁺ (x, a) ≥ f ⁺ (y, a) and f ^- (x, a) ≥ f ^- (y, a), if a is an interval-valued attribute, where f ^- (x, a) and f ⁺ (x, a) are the left and right endpoints of the value of x with respect to a, respectively;

f (x, a) ≥ f (y, a), if a is a fuzzy-valued attribute;

f ^μ (x, a) ≥ f ^μ (y, a) and f ^ν (x, a) ≤ f ^ν (y, a), if a is an intuitionistic fuzzy-valued attribute, where f ^μ (x, a) and f ^ν (x, a) are the membership degree and non-membership degree of x with respect to a, respectively.

Based on the above descriptions, the partial ordering relation with respect to any A ⊆ AT can be defined as follows.

Definition 3.2. Let $LS = (U, AT \cup D, V, f)$ be a lattice-valued decision information system. A partial ordering relation with respect to any A ⊆ AT is defined as follows: $R_{A}^{≽} = {(x, y) : f (x, a) ≽ f (y, a), \forall a \in A} .$ (7)

where (x, y) ∈ U × U and f (x, a) ≽ f (y, a) means that x is at least as good as y with respect to criterion a. According to Definition 3.1, $R_{A}^{≽}$ can be written out by attribute type as follows: $R_{A}^{≽} = \left\{ \begin{matrix} (x, y) : f (x, a) \geq f (y, a), \forall a \in A^{R}; \\ (x, y) : f (x, a) \supseteq f (y, a), \forall a \in A^{S}; \\ (x, y) : f^{\pm} (x, a) \geq f^{\pm} (y, a), \forall a \in A^{I}; \\ (x, y) : f (x, a) \geq f (y, a), \forall a \in A^{F}; \\ (x, y) : f^{μ} (x, a) \geq f^{μ} (y, a), f^{ν} (x, a) \leq f^{ν} (y, a), \forall a \in A^{IF} . \end{matrix} \right.$

where A ^R, A ^S, A ^I, A ^F and A ^IF denote the real-valued, set-valued, interval-valued, fuzzy-valued and intuitionistic fuzzy-valued attribute sets, respectively, and A = A ^R ∪ A ^S ∪ A ^I ∪ A ^F ∪ A ^IF. It is clear that the partial ordering relation $R_{A}^{≽}$ is reflexive and transitive, but not symmetric. Analogously, the basic knowledge granule with respect to A can be denoted as $[x]_{A}^{≽} = {y \in U : (y, x) \in R_{A}^{≽}} .$ (8) Based on the basic knowledge granular structure, the approximations of any concept $X \in P (U)$ with respect to A ⊆ AT are defined as $\begin{matrix} \underline{R_{A}^{≽}} (X) = {x \in U : [x]_{A}^{≽} \subseteq X}, \\ \bar{R_{A}^{≽}} (X) = {x \in U : [x]_{A}^{≽} \cap X \neq \emptyset} . \end{matrix}$
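The type-dependent comparison of Definition 3.1 and the granule of Eq. (8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the encoding of values (tuples for intervals and intuitionistic fuzzy values, Python sets for set values), the names `dominates` and `dominance_class`, and the three-object table are all hypothetical.

```python
def dominates(x, y, kinds):
    """True if x is at least as good as y on every attribute in kinds (Definition 3.1)."""
    for a, kind in kinds.items():
        u, v = x[a], y[a]
        if kind in ("real", "fuzzy"):
            ok = u >= v
        elif kind == "set":
            ok = set(u) >= set(v)                 # superset test
        elif kind == "interval":
            ok = u[0] >= v[0] and u[1] >= v[1]    # both endpoints
        elif kind == "ifs":
            ok = u[0] >= v[0] and u[1] <= v[1]    # membership up, non-membership down
        else:
            raise ValueError(f"unknown attribute kind: {kind}")
        if not ok:
            return False
    return True

def dominance_class(name, table, kinds):
    """[x]_A: all objects y with (y, x) in the dominance relation."""
    return {y for y, row in table.items() if dominates(row, table[name], kinds)}

# Hypothetical hybrid table with one attribute of each type.
kinds = {"r": "real", "s": "set", "i": "interval", "f": "fuzzy", "q": "ifs"}
table = {
    "u1": {"r": 1, "s": {"a"}, "i": (0.1, 0.5), "f": 0.3, "q": (0.2, 0.7)},
    "u2": {"r": 2, "s": {"a", "b"}, "i": (0.2, 0.6), "f": 0.3, "q": (0.5, 0.4)},
    "u3": {"r": 3, "s": {"b"}, "i": (0.4, 0.9), "f": 0.9, "q": (0.9, 0.1)},
}
print(sorted(dominance_class("u1", table, kinds)))  # ['u1', 'u2']
```

Object u3 exceeds u1 on every numeric attribute but fails the set-valued test, since {b} does not contain {a}; a single failing criterion is enough to break dominance.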

Typically, a decision rule in RST takes the form $[x]_{A}^{≽} \to D_{j}$ , stating that an object with description $[x]_{A}^{≽}$ is in the decision class D _j. According to the definitions of the positive, negative and boundary regions, Yao introduced three types of decision rules [40]: positive rules, negative rules and boundary rules. Analogously, let π _D be the partition of U induced by the decision attributes. For any x ∈ U and D _j ∈ π _D, three kinds of rules can be induced, which are described as follows:

Positive rule: if $[x]_{A}^{≽} \subseteq {pos}_{A}^{(α, β)} (D_{j})$ , then a positive rule is induced, denoted as $[x]_{A}^{≽} \to_{P} D_{j}$ ;

Negative rule: if $[x]_{A}^{≽} \subseteq {neg}_{A}^{(α, β)} (D_{j})$ , then a negative rule is induced, denoted as $[x]_{A}^{≽} \to_{N} D_{j}$ ;

Boundary rule: if $[x]_{A}^{≽} \subseteq {bn}_{A}^{(α, β)} (D_{j})$ , then a boundary rule is induced, denoted as $[x]_{A}^{≽} \to_{B} D_{j}$ .

Given a rule $[x]_{A}^{≽} \to D_{j}$ , its confidence is the ratio of the number of objects in the knowledge granule $[x]_{A}^{≽}$ that are correctly classified into the decision class D _j to the cardinality of $[x]_{A}^{≽}$ , which is defined as follows.

Definition 3.3. Let $LS = (U, AT \cup D, V, f)$ be a lattice-valued decision information system, the confidence of rule $[x]_{A}^{≽} \to D_{j}$ with respect to A is defined as follows: $confidence ([x]_{A}^{≽} \to D_{j}) = \frac{| [x]_{A}^{≽} \cap D_{j} |}{| [x]_{A}^{≽} |} .$ (9)

This measure focuses on the quality of a rule: the higher the confidence, the more reliable the rule. It can be interpreted as the conditional probability $P (D_{j} | [x]_{A}^{≽})$ , which is compared directly with the thresholds α and β determined by a loss function. Therefore, we can define the decision-theoretic rough set in a lattice-valued decision information system based on this index.

Definition 3.4. Let $LS = (U, AT \cup D, V, f)$ be a lattice-valued decision information system. For any D _j ∈ π _D, A ⊆ AT and 0 < β < α ≤ 1, the lower and upper approximations of D _j with respect to $R_{A}^{≽}$ are defined as follows:

$\begin{matrix} {\underline{R_{A}^{≽}}}_{(α, β)} (D_{j}) = {x \in U : P (D_{j} | [x]_{A}^{≽}) \geq α}, \\ {\bar{R_{A}^{≽}}}_{(α, β)} (D_{j}) = {x \in U : P (D_{j} | [x]_{A}^{≽}) > β} . \end{matrix}$

where the thresholds α and β are determined by a given loss function. Here, the positive region, negative region and boundary region of D _j with respect to $R_{A}^{≽}$ are presented as follows.

${pos}_{A}^{(α, β)} (D_{j}) = {\underline{R_{A}^{≽}}}_{(α, β)} (D_{j}),$

${neg}_{A}^{(α, β)} (D_{j}) = U - {\bar{R_{A}^{≽}}}_{(α, β)} (D_{j}),$

${bn}_{A}^{(α, β)} (D_{j}) = {\bar{R_{A}^{≽}}}_{(α, β)} (D_{j}) - {\underline{R_{A}^{≽}}}_{(α, β)} (D_{j}) .$

According to Definition 3.4, it is not difficult to obtain the following descriptions of rough regions.

${pos}_{A}^{(α, β)} (D_{j}) = {x \in U : P (D_{j} | [x]_{A}^{≽}) \geq α},$

${neg}_{A}^{(α, β)} (D_{j}) = {x \in U : P (D_{j} | [x]_{A}^{≽}) \leq β},$

${bn}_{A}^{(α, β)} (D_{j}) = {x \in U : β < P (D_{j} | [x]_{A}^{≽}) < α} .$

Based on the previous discussion, we give an example to illustrate the process of decision-theoretic rough set modeling in a LvDIS. Note that the data in this example are constructed from [36].

Example 3.1. Let $LS = (U, AT \cup D, V, f)$ be a lattice-valued decision information system about medical diagnosis, where U is composed of 10 patients, the condition attributes AT = {a ₁, a ₂, a ₃, a ₄, a ₅} are five diagnostic indexes and D = {d} is a decision attribute set about diagnosis conclusion, where the "1" means healthy and "0" is sub-healthy. The detailed characteristics of data are exhibited in Table 2.

Table 2

A lattice-valued decision information system

U	a ₁	a ₂	a ₃	a ₄	a ₅	d
x ₁	2	{a}	[0.4, 0.7]	0.6	〈0.1, 0.8〉	1
x ₂	3	{a, b, c}	[0.6, 0.8]	0.8	〈0.1, 0.8〉	0
x ₃	2	{a}	[0.1, 0.6]	0.6	〈0.1, 0.8〉	1
x ₄	2	{a}	[0.8, 0.9]	0.6	〈0.4, 0.3〉	0
x ₅	1	{a, b, c}	[0.1, 0.6]	0.3	〈0.4, 0.3〉	0
x ₆	1	{a, b, c}	[0.6, 0.8]	0.3	〈1.0, 0.0〉	1
x ₇	3	{a, b, c}	[0.4, 0.7]	0.8	〈0.4, 0.3〉	0
x ₈	1	{a, b, c}	[0.8, 0.9]	0.3	〈1.0, 0.0〉	1
x ₉	1	{a}	[0.1, 0.6]	0.3	〈0.4, 0.3〉	1
x ₁₀	3	{a, b, c}	[0.4, 0.7]	0.6	〈1.0, 0.0〉	0

The universe U is divided into two distinct parts by the decision attribute set D, that is, π _D = {D ₁, D ₂}, where D ₁ = {x ₁, x ₃, x ₆, x ₈, x ₉} and D ₂ = {x ₂, x ₄, x ₅, x ₇, x ₁₀}. Furthermore, we can obtain a binary relation matrix with regard to the attribute set AT based on Definition 3.2. $M = (\begin{matrix} 1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{matrix}) .$

Here M = (m _ij) _|U|×|U|, where m _ij = 1 if x _j ≽ x _i with respect to $R_{AT}^{≽}$ and m _ij = 0 otherwise. Therefore, we can get the basic knowledge granules of this lattice-valued decision information system, which are listed as follows:

$[x_{1}]_{AT}^{≽} = {x_{1}, x_{2}, x_{4}, x_{7}, x_{10}}$ ,

$[x_{2}]_{AT}^{≽} = {x_{2}}$ ,

$[x_{3}]_{AT}^{≽} = {x_{1}, x_{2}, x_{3}, x_{4}, x_{7}, x_{10}}$ ,

$[x_{4}]_{AT}^{≽} = {x_{4}}$ ,

$[x_{5}]_{AT}^{≽} = {x_{5}, x_{6}, x_{7}, x_{8}, x_{10}}$ ,

$[x_{6}]_{AT}^{≽} = {x_{6}, x_{8}}$ ,

$[x_{7}]_{AT}^{≽} = {x_{7}}$ ,

$[x_{8}]_{AT}^{≽} = {x_{8}}$ ,

$[x_{9}]_{AT}^{≽} = {x_{4}, x_{5}, x_{6}, x_{7}, x_{8}, x_{9}, x_{10}}$ ,

$[x_{10}]_{AT}^{≽} = {x_{10}}$ .

Then, we can obtain the confidence of rule $[x_{i}]_{AT}^{≽} \to D_{j}$ for any x _i ∈ U and D _j ∈ π _D, for instance, $confidence ([x_{1}]_{AT}^{≽} \to D_{2}) = \frac{| [x_{1}]_{AT}^{≽} \cap D_{2} |}{| [x_{1}]_{AT}^{≽} |} = 0.8 .$

The confidences of the remaining rules can be calculated in the same way. The confidences of the rules $[x]_{AT}^{≽} \to D_{2}$ for x ₁, …, x ₁₀ are (0.80, 1.0, 0.67, 1.0, 0.6, 0.0, 1.0, 0.0, 0.57, 1.0). Suppose the loss function is given as follows: $\begin{matrix} λ_{PP} = 0, λ_{PN} = 19, \\ λ_{BP} = 7, λ_{BN} = 2, \\ λ_{NP} = 9, λ_{NN} = 0 . \end{matrix}$

Then, by Eqs. (4) and (5), we get α = 17/24 ≈ 0.7 and β = 0.5. Thus, we can obtain the lower and upper approximations of D ₁ and D ₂ with respect to $R_{AT}^{≽}$ under this loss function. For simplicity and without loss of generality, the case study in the following discussion is based on D ₂. $\begin{matrix} {\underline{R_{AT}^{≽}}}_{(0.7, 0.5)} (D_{2}) = {x_{1}, x_{2}, x_{4}, x_{7}, x_{10}}, \\ {\bar{R_{AT}^{≽}}}_{(0.7, 0.5)} (D_{2}) = {x_{1}, x_{2}, x_{3}, x_{4}, x_{5}, x_{7}, x_{9}, x_{10}} . \end{matrix}$ Moreover, we can obtain the rough regions as follows:

${pos}_{AT}^{(0.7, 0.5)} (D_{2}) = {x_{1}, x_{2}, x_{4}, x_{7}, x_{10}}$ ,

${neg}_{AT}^{(0.7, 0.5)} (D_{2}) = {x_{6}, x_{8}}$ ,

${bn}_{AT}^{(0.7, 0.5)} (D_{2}) = {x_{3}, x_{5}, x_{9}}$ .
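The confidences and the three regions of this example can be recomputed with a few lines of Python. The sketch below takes the dominance classes exactly as listed in Example 3.1 (integers 1 to 10 stand for x ₁ to x ₁₀) and the thresholds α = 17/24 and β = 1/2 that Eqs. (4) and (5) give for the stated loss function; the variable names are illustrative.

```python
from fractions import Fraction

# Dominance classes [x_i]_AT as listed in Example 3.1.
granules = {
    1: {1, 2, 4, 7, 10}, 2: {2}, 3: {1, 2, 3, 4, 7, 10}, 4: {4},
    5: {5, 6, 7, 8, 10}, 6: {6, 8}, 7: {7}, 8: {8},
    9: {4, 5, 6, 7, 8, 9, 10}, 10: {10},
}
D2 = {2, 4, 5, 7, 10}
alpha, beta = Fraction(17, 24), Fraction(1, 2)

# Confidence of [x_i] -> D2, i.e. the conditional probability P(D2 | [x_i]).
conf = {i: Fraction(len(g & D2), len(g)) for i, g in granules.items()}
pos = sorted(i for i, p in conf.items() if p >= alpha)
neg = sorted(i for i, p in conf.items() if p <= beta)
bn = sorted(i for i, p in conf.items() if beta < p < alpha)

print([round(float(conf[i]), 2) for i in range(1, 11)])
print(pos, neg, bn)  # [1, 2, 4, 7, 10] [6, 8] [3, 5, 9]
```

Using the exact value 17/24 instead of the rounded 0.7 gives the same regions, because no confidence lies between the two.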

Thus, based on three-way decision theory, we obtain the following decision rules:

The patients x ₁, x ₂, x ₄, x ₇, x ₁₀ are sub-healthy with respect to AT under the given loss function. They need further treatment.

The patients x ₆ and x ₈ are healthy with respect to AT under the given loss function, according to the current indicators.

The patients x ₃, x ₅, x ₉ cannot be diagnosed based on the present information. A further diagnosis is needed for them.

4. Attribute reduction

The rapid development of data science has given rise to a massive volume of freely available, user-generated data. The advent of Big Data has seen both the sources and the volumes of data increase rapidly. Meanwhile, many unpredictable factors affect the validity and authenticity of data, and it is almost impossible for people to make sense of the overall picture in a short period of time. Consequently, an effective data filtering approach is necessary for data mining. Since attributes differ in their significance for classification and the rough set is based on a classification mechanism, deleting attributes that have little influence on classification is crucial for reducing the data dimension. Attribute reduction is one of the most significant issues in RST; it is a useful mathematical tool for data mining and has been widely used in numerous fields. Intuitively, an attribute reduction is a minimal subset of the original attributes that induces rule sets with the same level of performance as the entire set of attributes, or with a lower level that still satisfies certain requirements.

4.1. Attribute reduction based on rough entropy

RST is established on a classification mechanism which is induced by an attribute set. Consequently, a reduced attribute set is defined by requiring that the classification remain unchanged. There are numerous measures of the quality of classification [16, 26]. In this subsection, we conduct the investigation based on the rough entropy of knowledge.

For a lattice-valued decision information system, an attribute set A ⊆ AT is a reduction of AT if the following conditions are satisfied:

Jointly sufficient condition: $R_{AT}^{≽} = R_{A}^{≽}$ ;

Individually necessary condition: $R_{A - {a}}^{≽} \neq R_{A}^{≽}$ , ∀a ∈ A.

Usually, a reduction of AT is denoted by Red (AT). An attribute a is necessary in A if $R_{A - {a}}^{≽} \neq R_{A}^{≽}$ . The set of all necessary attributes of AT is called the core of AT and is denoted by Core (AT). An information system may have more than one attribute reduction, but it has exactly one core. Therefore, a heuristic attribute reduction algorithm is usually designed to start from the core. Here, we define the rough entropy measure in a lattice-valued information system in a similar way.

Definition 4.1. Let $LS = (U, AT \cup D, V, f)$ be a lattice-valued decision information system, for any A ⊆ AT, the rough entropy of A is defined as follows: $Er (A) = \sum_{i = 1}^{| U |} \frac{| [x_{i}]_{A}^{≽} |}{| U |} \cdot \log_{2} | [x_{i}]_{A}^{≽} | .$ (10)
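Definition 4.1 translates almost literally into code. The following Python sketch (the function name `rough_entropy` and the four-object universe are hypothetical) evaluates Eq. (10) for the two extreme granular structures, the one in which every granule is a singleton and the one in which every granule is the whole universe.

```python
from math import log2

def rough_entropy(granules):
    """Er(A) of Definition 4.1; granules[i] is the class [x_i]_A, one entry per object."""
    n = len(granules)
    return sum(len(g) / n * log2(len(g)) for g in granules)

U = {"x1", "x2", "x3", "x4"}       # hypothetical universe
finest = [{x} for x in U]          # every class is a singleton
coarsest = [set(U) for _ in U]     # every class is the whole universe

print(rough_entropy(finest))       # 0.0
print(rough_entropy(coarsest))     # 8.0, i.e. |U| * log2 |U|
```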

It is clear that the rough entropy has a minimum and a maximum value. The minimum of Er (A) is 0, attained if $[x]_{A}^{≽} = {x}$ for every x ∈ U, and the maximum of Er (A) is |U| · log ₂|U|, attained if $[x]_{A}^{≽} = {x_{1}, x_{2}, \dots, x_{n}} = U$ for every x ∈ U. According to Definition 4.1, for a given information system the rough entropy of A depends only on the sizes of the basic knowledge granules, so it varies only with the classification. For any A, B ⊆ AT and x _i ∈ U, we have $[x_{i}]_{A}^{≽} \subseteq [x_{i}]_{B}^{≽}$ if A ⊇ B, namely, $R_{A}^{≽} \subseteq R_{B}^{≽}$ . That means Er (A) ≤ Er (B) if B ⊆ A. Therefore, the following proposition holds.

Proposition 4.1. An attribute set A is a reduction of AT, if Er (A) = Er (AT) and Er (A - {a}) ≠ Er (A) for any a ∈ A.

Proof. It can be directly proved by Definition 4.1. □

Since computing all attribute reductions is an NP-hard problem [33], numerous heuristic algorithms for finding one reduction have been investigated, for instance [10, 50]. A heuristic algorithm usually has two components: the heuristics and the search strategy. For the heuristics, the core of the attribute set is usually adopted. For the search strategy, two kinds are considered: directional and nondirectional search. The former can be further categorized into the deletion method, the addition method and the addition-deletion method [39]. The latter is usually used in evolutionary algorithms, swarm algorithms and other population-based meta-heuristic algorithms for optimization problems [15]. In the following discussion, we use the addition-deletion method for the sake of simplicity. Therefore, two measures are needed to quantify the significance of an attribute. The first measure is the absolute significance of a in A.

Definition 4.2. Let $LS = (U, AT \cup D, V, f)$ be a lattice-valued decision information system, for any A ⊆ AT and a ∈ A, the absolute significance of a in A is defined as follows: ${Sig}_{in} (a, A) = Er (A - {a}) - Er (A) .$ (11)

In particular, Sig _in (a, A) = |U| · log₂|U| - Er ({a}) if A = {a}, since Er (∅) = |U| · log₂|U|, that is, the uncertainty is maximal when no attribute is used. According to Definition 4.2, the following properties hold for Sig _in (a, A):

0 ≤ Sig _in (a, A) ≤ |U| · log ₂|U|;

The attribute a is necessary in A if and only if Sig _in (a, A) >0;

The core of A is Core (A) = {a ∈ A : Sig _in (a, A) >0}.

Corresponding to the absolute significance measure Sig _in (a, A), we need to define a relative significance to measure the remaining attributes in AT ∖ A, where AT ∖ A = {a : a ∈ AT, a ∉ A} is the set difference.

Definition 4.3. Let $LS = (U, AT \cup D, V, f)$ be a lattice-valued decision information system, for any A ⊆ AT and a ∈ AT ∖ A, the relative significance of a in AT ∖ A is defined as follows: ${Sig}_{out} (a, A) = Er (A) - Er (A \cup {a}) .$ (12)

It is clear that Sig _out (∅ , A) =0. For any a ∈ AT ∖ A, the greater the relative significance Sig _out (a, A), the more significant a is relative to A. Consequently, it is commonly used as a heuristic index in the search for an attribute reduction. Here, we design a heuristic attribute reduction algorithm based on Sig _out (a, A) (see Algorithm 1).

Algorithm 1 A heuristic attribute reduction algorithm based on rough entropy
Input: A LvDIS $LS = (U, AT \cup D, V, f)$ .
Output: A reduction Red (AT).
begin
  let Red (AT) = ∅; // initialization of the reduction
  for i = 1 to |AT| do
    if Sig _in (a _i, AT) > 0 then
      Red (AT) = Red (AT) ∪ {a _i}; // a _i belongs to the core
    end
  end
  while Er (Red (AT)) ≠ Er (AT) do
    $a^{'} = \underset{a \in AT ∖ Red (AT)}{argmax} {{Sig}_{out} (a, Red (AT))};$
    Red (AT) = Red (AT) ∪ {a′};
  end
  return Red (AT).
end
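The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: as a simplifying assumption it handles only real-valued attributes, the names `granules`, `er` and `reduct` are hypothetical, and the four-object table is a toy example (not Table 2) in which a3 duplicates a1.

```python
from math import log2

def granules(table, attrs):
    """Dominance classes [x]_A for a real-valued table: y is in [x]_A if y >= x on attrs."""
    return [
        {y for y, ry in table.items() if all(ry[a] >= rx[a] for a in attrs)}
        for rx in table.values()
    ]

def er(table, attrs):
    """Rough entropy Er(A) of Definition 4.1."""
    n = len(table)
    return sum(len(g) / n * log2(len(g)) for g in granules(table, attrs))

def reduct(table, all_attrs, tol=1e-12):
    """Start from the core (Sig_in > 0), then add the attribute with the largest Sig_out."""
    target = er(table, all_attrs)
    red = [a for a in all_attrs
           if er(table, [b for b in all_attrs if b != a]) - target > tol]
    while abs(er(table, red) - target) > tol:
        rest = [a for a in all_attrs if a not in red]
        # max() keeps the first maximal attribute, so ties are broken by attribute order
        best = max(rest, key=lambda a: er(table, red) - er(table, red + [a]))
        red.append(best)
    return red

# Hypothetical toy table; a3 carries the same information as a1.
toy = {
    "x1": {"a1": 1, "a2": 2, "a3": 1},
    "x2": {"a1": 2, "a2": 1, "a3": 2},
    "x3": {"a1": 2, "a2": 2, "a3": 2},
    "x4": {"a1": 1, "a2": 1, "a3": 1},
}
print(reduct(toy, ["a1", "a2", "a3"]))  # ['a2', 'a1']
```

In this toy table only a2 has positive absolute significance, so the core is {a2}; a1 and a3 then tie on relative significance and the first of them is added, which restores Er (AT) = 3.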

In order to verify the feasibility of this algorithm, we conduct an example based on Algorithm 1.

Example 4.1. (Continued from Example 3.1) For any a ∈ AT, we can obtain the absolute significance Sig _in (a, AT) by using Definition 4.1 and Definition 4.2. First, the rough entropy of AT is computed as follows.

$\begin{matrix} Er (AT) = & \frac{5}{10} \log_{2} 5 + \frac{1}{10} \log_{2} 1 + \frac{6}{10} \log_{2} 6 + \frac{1}{10} \log_{2} 1 \\ + \frac{5}{10} \log_{2} 5 + \frac{2}{10} \log_{2} 2 + \frac{2}{10} \log_{2} 2 \\ + \frac{1}{10} \log_{2} 1 + \frac{7}{10} \log_{2} 7 + \frac{1}{10} \log_{2} 1 \\ = & 6.24 . \end{matrix}$
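The arithmetic of this expansion can be checked with a short Python snippet, which simply sums the ten terms using the granule sizes that appear in the numerators above.

```python
from math import log2

# Granule sizes as they appear in the expansion of Er(AT) above.
sizes = [5, 1, 6, 1, 5, 2, 2, 1, 7, 1]
er_at = sum(s / len(sizes) * log2(s) for s in sizes)
print(round(er_at, 2))  # 6.24
```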

Therefore, for any a ∈ AT, the absolute significance Sig _in (a, AT) are listed as follows:

Sig _in (a ₁, AT) =6.24 - 6.24 = 0.0,

Sig _in (a ₂, AT) =8.33 - 6.24 = 2.09,

Sig _in (a ₃, AT) =7.04 - 6.24 = 0.80,

Sig _in (a ₄, AT) =6.24 - 6.24 = 0.0,

Sig _in (a ₅, AT) =8.74 - 6.24 = 2.50 .

Based on these results, the core of AT is Core (AT) = {a ₂, a ₃, a ₅}. Following Algorithm 1, let Red (AT) = {a ₂, a ₃, a ₅}; then the relationship matrix with respect to {a ₂, a ₃, a ₅} is $M_{{a_{2}, a_{3}, a_{5}}} = (\begin{matrix} 1 & 1 & 0 & 1 & 0 & 1 & 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \end{matrix}) .$

Then, based on the relationship matrix M _{{a ₂, a ₃, a ₅}}, we can calculate the rough entropy of Red (AT) and compare it with Er (AT): $\begin{aligned} Er (Red (AT)) = {} & Er (\{a_{2}, a_{3}, a_{5}\}) \\ = {} & 1.97 + 0.48 + 3.32 + 0.2 + 1.16 \\ & + 0.2 + 0.8 + 0 + 1.97 + 0.48 \\ = {} & 10.58 . \end{aligned}$
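The entropy above can be checked with a short Python sketch. It assumes that rough entropy is $Er (A) = \sum_{x \in U} \frac{| [x]_{A} |}{| U |} \log_{2} | [x]_{A} |$, which is how the terms in this example are formed, and that row i of the relationship matrix marks the class of x _i. The function name `rough_entropy` is illustrative and not part of the original text.

```python
import math

# Relationship matrix for {a2, a3, a5}, copied from Example 4.1.
M = [
    [1, 1, 0, 1, 0, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 0, 1, 0, 1],
]


def rough_entropy(matrix):
    """Sum of |[x]|/|U| * log2 |[x]| over all rows (assumed definition)."""
    n = len(matrix)
    total = 0.0
    for row in matrix:
        size = sum(row)  # cardinality of the class of this object
        if size > 0:
            total += size / n * math.log2(size)
    return total


er_red = rough_entropy(M)
print(round(er_red, 2))
```

Without intermediate rounding the sum is about 10.56; the value 10.58 in the text results from adding the terms after rounding each to two decimals.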

It is clear that Er ({a ₂, a ₃, a ₅}) ≠ Er (AT). That means we need to compute the relative significance of the attributes in AT ∖ Red (AT). According to Definition 4.3, we get the following results:

$\begin{aligned} {Sig}_{out} (a_{1}, Red (AT)) & = Er (Red (AT)) - Er (\{a_{2}, a_{3}, a_{5}\} \cup \{a_{1}\}) = 4.34 , \\ {Sig}_{out} (a_{4}, Red (AT)) & = Er (Red (AT)) - Er (\{a_{2}, a_{3}, a_{5}\} \cup \{a_{4}\}) = 4.34 . \end{aligned}$

Since Sig _out (a ₁, Red (AT)) is equal to Sig _out (a ₄, Red (AT)), both attributes attain the maximum: $\{a_{1}, a_{4}\} = \underset{a \in AT ∖ Red (AT)}{argmax} \, {Sig}_{out} (a, Red (AT)) .$

Consequently, both {a ₁, a ₂, a ₃, a ₅} and {a ₂, a ₃, a ₄, a ₅} are candidate attribute reductions, and further verification is needed. Here, we get

Er ({a ₁, a ₂, a ₃, a ₅}) = Er (AT) ,

Er ({a ₂, a ₃, a ₄, a ₅}) = Er (AT) ,

which means that both {a ₁, a ₂, a ₃, a ₅} and {a ₂, a ₃, a ₄, a ₅} are attribute reductions of AT. Algorithm 1 usually yields one of the attribute reductions, but not all of them. If the relative significance of multiple attributes is the same, any one of them can be selected as the candidate; typically, it is chosen according to its order in the information system. Therefore, the algorithm is suitable for finding one attribute reduction rather than all attribute reductions in practical applications.
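The run of Algorithm 1 in this example can be replayed with a small Python sketch. The lookup table `ER` holds only the rough-entropy values reported above, with Er (AT - {a}) recovered from the Sig _in lines. The function name `entropy_reduct` and the tie-breaking by original attribute order are illustrative assumptions, not the authors' implementation.

```python
import math

# Rough-entropy values reported in Example 4.1, keyed by attribute subset.
ER = {
    frozenset({"a1", "a2", "a3", "a4", "a5"}): 6.24,  # Er(AT)
    frozenset({"a2", "a3", "a4", "a5"}): 6.24,        # Er(AT - {a1})
    frozenset({"a1", "a3", "a4", "a5"}): 8.33,        # Er(AT - {a2})
    frozenset({"a1", "a2", "a4", "a5"}): 7.04,        # Er(AT - {a3})
    frozenset({"a1", "a2", "a3", "a5"}): 6.24,        # Er(AT - {a4})
    frozenset({"a1", "a2", "a3", "a4"}): 8.74,        # Er(AT - {a5})
    frozenset({"a2", "a3", "a5"}): 10.58,             # Er(Core(AT))
}


def entropy_reduct(attributes, er, tol=1e-9):
    """Core first (Sig_in > 0), then greedy addition by Sig_out."""
    full = frozenset(attributes)
    er_full = er(full)
    # Sig_in(a, AT) = Er(AT - {a}) - Er(AT)
    red = [a for a in attributes if er(full - {a}) - er_full > tol]
    while not math.isclose(er(frozenset(red)), er_full, abs_tol=tol):
        rest = [a for a in attributes if a not in red]
        er_red = er(frozenset(red))
        # Sig_out(a, Red) = Er(Red) - Er(Red ∪ {a}); ties go to the first attribute
        best = max(rest, key=lambda a: er_red - er(frozenset(red) | {a}))
        red.append(best)
    return red


reduct = entropy_reduct(["a1", "a2", "a3", "a4", "a5"], ER.__getitem__)
print(sorted(reduct))
```

With ties broken by attribute order, the sketch selects a ₁ and returns {a ₁, a ₂, a ₃, a ₅}. Choosing a ₄ instead would give the other reduction.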

4.2. Attribute reduction based on positive region

In the DTRS model, the decision rules are related to rough regions. In particular, the positive region plays an extremely important role in the decision-making process. In the above discussion, a heuristic attribute reduction algorithm based on rough entropy was investigated. Here, we study another attribute reduction approach based on positive region preservation.

From a semantic viewpoint, the objects in the positive region can be "probably" classified into a "certain" decision class. A larger positive region usually comes with a smaller uncertain region, which means fewer ambiguous objects [15]. To measure the classification ability quantitatively, the approximation quality of the decision attributes D with respect to attribute set A in the DTRS model is defined as follows: $γ_{A}^{(α, β)} (D) = \frac{| {pos}_{A}^{(α, β)} (π_{D}) |}{| U |},$ (13) where ${pos}_{A}^{(α, β)} (π_{D}) = ⋃_{D_{j} \in π_{D}} {pos}_{A}^{(α, β)} (D_{j})$ . A classical attribute reduction approach in the RST model and its extensions is based on positive region preservation. We define attribute reduction with respect to positive region preservation in a lattice-valued decision information system in a similar way.

Given a lattice-valued decision information system $LS = (U, AT \cup D, V, f)$ , for any β < α ≤ 1, an attribute set A ⊆ AT is a positive region preservation-based attribute reduction with respect to the decision attribute set D if the following conditions are satisfied:

(1) Jointly sufficient condition: ${pos}_{A}^{(α, β)} (π_{D}) = {pos}_{AT}^{(α, β)} (π_{D})$ ;

(2) Individually necessary condition: ${pos}_{A - \{a\}}^{(α, β)} (π_{D}) \neq {pos}_{A}^{(α, β)} (π_{D})$ , ∀a ∈ A.

In the classical rough set model, the positive region is monotonic with respect to the set of condition attributes, that is, for any A ₁ ⊆ A ₂ we have pos _{A ₁} (π _D) ⊆ pos _{A ₂} (π _D), which means γ _{A ₁} (π _D) ≤ γ _{A ₂} (π _D), where $γ_{A_{1}} (π_{D}) = γ_{A_{1}}^{(α, β)} (π_{D})$ when α = 1 and β = 0. However, this monotonicity does not always hold in DTRS models: given A ₁ ⊆ A ₂, we may have ${pos}_{A_{2}}^{(α, β)} (π_{D}) \subseteq {pos}_{A_{1}}^{(α, β)} (π_{D})$ . For condition (1), there may exist a subset A ⊆ AT such that ${pos}_{AT}^{(α, β)} (π_{D}) \subseteq {pos}_{A}^{(α, β)} (π_{D})$ . For condition (2), checking only the subsets A - {a} cannot guarantee that the reduction is minimal. Consequently, we need to define a novel attribute reduction approach based on positive region preservation by using $γ_{A}^{(α, β)} (π_{D})$ .
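As a toy illustration of this non-monotonicity, the Python sketch below computes a probabilistic positive region for a coarse and a finer granulation of the same universe. The data are hypothetical and not taken from Example 3.1. Ordinary equivalence classes are used instead of the dominance classes of the lattice-valued setting, purely to keep the example short; the function name `positive_region` is likewise an assumption.

```python
def positive_region(classes, decision_classes, alpha):
    """Objects x with P(D_j | [x]_A) >= alpha for some decision class D_j."""
    pos = set()
    for x, block in classes.items():
        for d in decision_classes:
            if len(block & d) / len(block) >= alpha:
                pos.add(x)
                break
    return pos


# Hypothetical universe and decision partition (not from the paper).
U = {1, 2, 3, 4, 5}
decision_classes = [{1, 2, 3, 4}, {5}]

# Coarse knowledge A1: a single block. Finer knowledge A2 (A1 ⊆ A2): two blocks.
coarse = {x: U for x in U}
fine = {x: ({1, 2, 3} if x <= 3 else {4, 5}) for x in U}

pos_coarse = positive_region(coarse, decision_classes, alpha=0.7)
pos_fine = positive_region(fine, decision_classes, alpha=0.7)
gamma_coarse = len(pos_coarse) / len(U)
gamma_fine = len(pos_fine) / len(U)
print(sorted(pos_coarse), gamma_coarse)
print(sorted(pos_fine), gamma_fine)
```

In the coarse case the single block reaches P = 0.8 ≥ α, so every object is accepted. After refinement the block {4, 5} has conditional probability 0.5 for both decision classes and leaves the positive region, so the finer attribute set has the smaller approximation quality.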

Definition 4.4. Let $LS = (U, AT \cup D, V, f)$ be a lattice-valued decision information system, for any β < α ≤ 1, an attribute set A ⊆ AT is a positive region based reduction with respect to decision attribute set D if it satisfies the following two conditions:

(1) Jointly sufficient condition: $γ_{A}^{(α, β)} (π_{D}) \geq γ_{AT}^{(α, β)} (π_{D})$ ;

(2) Individually necessary condition: $γ_{A - \{a\}}^{(α, β)} (π_{D}) < γ_{A}^{(α, β)} (π_{D})$ , ∀a ∈ A.

In Definition 4.4, the quantitative index $γ_{A}^{(α, β)} (π_{D})$ is utilized to obtain an attribute reduction. Compared with the classical positive region preservation-based attribute reduction approach, it can yield an attribute reduction with a larger positive region. As in Algorithm 1, we further need an index that measures the significance of each attribute, which determines the order of the condition attributes in the addition-deletion process. Here, the absolute significance of attribute a in attribute set A is defined as follows:

Definition 4.5. Let $LS = (U, AT \cup D, V, f)$ be a lattice-valued decision information system, for any 0 < β < α ≤ 1, A ⊆ AT and a ∈ A, the absolute significance of a in A with respect to decision attribute set D is denoted as follows: ${Sig}_{in} (a, A, D) = \frac{| {pos}_{A}^{(α, β)} (π_{D}) Δ {pos}_{A - \{a\}}^{(α, β)} (π_{D}) |}{| {pos}_{A}^{(α, β)} (π_{D}) |} .$

Here "Δ" denotes the symmetric difference of two sets, that is, XΔY = (X ∪ Y) - (X ∩ Y). It is clear that Sig _in (a, A, D) ∈ [0, 1], where Sig _in (a, A, D) =0 means ${pos}_{A}^{(α, β)} (π_{D}) = {pos}_{A - \{a\}}^{(α, β)} (π_{D})$ , namely, the attribute a is not necessary. For any a ∈ A, the greater the absolute significance Sig _in (a, A, D), the more significant a is in A with respect to D. According to this quantitative measure, we design a heuristic attribute reduction algorithm based on positive region preservation, which is exhibited in Algorithm 2.
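The measure in Definition 4.5 reduces to one set operation. The following Python sketch uses two hypothetical positive regions, chosen only for illustration; the function name `sig_in` and the error raised for an empty positive region are assumptions.

```python
def sig_in(pos_with, pos_without):
    """|pos_A Δ pos_{A-{a}}| / |pos_A| for two positive regions given as sets."""
    if not pos_with:
        raise ValueError("the positive region of A must be non-empty")
    # ^ is the symmetric difference: (X ∪ Y) - (X ∩ Y)
    return len(pos_with ^ pos_without) / len(pos_with)


# Hypothetical positive regions with and without attribute a.
pos_A = {1, 2, 3, 4}
pos_A_minus_a = {1, 2, 5}

changed = sig_in(pos_A, pos_A_minus_a)  # objects 3, 4 and 5 differ
unchanged = sig_in(pos_A, pos_A)        # identical regions
print(changed, unchanged)
```

Because the symmetric difference counts objects that enter the positive region as well as objects that leave it, the measure also reacts when removing an attribute enlarges the positive region.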

Algorithm 2 A heuristic attribute reduction algorithm based on positive region preservation
Input: A LvDIS $L S = (U, AT \cup D, V, f)$ .
Output: A reduction Red (AT).
begin
1: let Red (AT) =∅, C = AT; // initialization of the reduction and the candidate set
2: for each a ∈ AT do
3: compute Sig _in (a, AT, D) ;
4: end
5: while $γ_{Red (AT)}^{(α, β)} (π_{D}) < γ_{AT}^{(α, β)} (π_{D})$ do
6: $a^{'} = \underset{a \in C}{argmax} \, {Sig}_{in} (a, AT, D);$
7: C = C - {a′} ;
8: Red (AT) = Red (AT)∪ {a′} ;
9: end
10: for each a ∈ Red (AT) do
11: if ${pos}_{Red (AT) - \{a\}}^{(α, β)} (π_{D}) = {pos}_{Red (AT)}^{(α, β)} (π_{D})$ then
12: Red (AT) = Red (AT) - {a} ;
13: end
14: end
15: return Red (AT) .
end
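The control flow of Algorithm 2 can be sketched in Python as follows. The callable `pos`, which maps an attribute subset to its positive region, and the toy `cover` data are hypothetical. A separate candidate list is kept so that the full attribute set is never modified, and ties are broken by the original attribute order. This is a sketch under these assumptions, not the authors' implementation.

```python
def positive_region_reduct(attributes, pos, universe_size):
    """Greedy addition followed by a redundancy check, in the spirit of Algorithm 2.

    attributes: attribute names in their original order.
    pos: hypothetical callable, frozenset of attributes -> set of objects.
    """
    full = frozenset(attributes)
    pos_full = pos(full)
    gamma_full = len(pos_full) / universe_size

    def sig_in(a):
        # |pos_AT Δ pos_{AT-{a}}| / |pos_AT|, taken as 0 when pos_AT is empty
        if not pos_full:
            return 0.0
        return len(pos_full ^ pos(full - {a})) / len(pos_full)

    sig = {a: sig_in(a) for a in attributes}
    candidates = list(attributes)
    red = []
    # Add the most significant remaining attribute until gamma is reached.
    while candidates and len(pos(frozenset(red))) / universe_size < gamma_full:
        best = max(candidates, key=sig.get)  # ties: first in original order
        candidates.remove(best)
        red.append(best)
    # Drop attributes whose removal leaves the positive region unchanged.
    for a in list(red):
        rest = [b for b in red if b != a]
        if pos(frozenset(rest)) == pos(frozenset(red)):
            red = rest
    return red


# Toy positive regions (hypothetical): each attribute contributes some objects.
cover = {"a": {1, 2}, "b": {2, 3}, "c": {3}}


def toy_pos(attrs):
    return set().union(*(cover[a] for a in attrs))


reduct = positive_region_reduct(["a", "b", "c"], toy_pos, universe_size=4)
print(reduct)
```

On the toy data, attribute a is the only one with non-zero significance and is added first; b and c tie, so b is taken by order, after which the approximation quality of the full attribute set is reached and neither attribute can be dropped.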

Example 4.2. (Continued from Example 3.1) According to Definition 3.4, we have ${pos}_{AT}^{(α, β)} (π_{D}) = ⋃_{D_{j} \in π_{D}} \{x \in U : P (D_{j} | [x]_{AT}^{≽}) \geq α\}$ , that is to say, ${pos}_{AT}^{(0.7, 0.5)} (π_{D}) = {pos}_{AT}^{(0.7, 0.5)} (D_{1}) \cup {pos}_{AT}^{(0.7, 0.5)} (D_{2})$ in this example. Here, we obtain ${pos}_{AT}^{(0.7, 0.5)} (π_{D}) = \{x_{1}, x_{2}, x_{4}, x_{6}, x_{7}, x_{8}, x_{10}\},$ and $γ_{AT}^{(0.7, 0.5)} (π_{D}) = 0.7$ . For any a ∈ AT, the significance of a in AT with respect to the decision attribute set D can be obtained from Definition 4.5; the values are as follows:

Sig _in (a ₁, AT, D) =0,

Sig _in (a ₂, AT, D) =0,

Sig _in (a ₃, AT, D) =0.43,

Sig _in (a ₄, AT, D) =0,

Sig _in (a ₅, AT, D) =0.29.

It is not difficult to see that $γ_{\emptyset}^{(0.7, 0.5)} (π_{D}) < γ_{AT}^{(0.7, 0.5)} (π_{D})$ . Therefore, we need to find the attribute with maximum significance: $a_{3} = \underset{a \in AT}{argmax} \, {Sig}_{in} (a, AT, D),$ that is to say, Red (AT) = {a ₃}. Since $γ_{\{a_{3}\}}^{(0.7, 0.5)} (π_{D}) < γ_{AT}^{(0.7, 0.5)} (π_{D})$ , we repeat the above step and let Red (AT) = {a ₃, a ₅}. Analogously, $γ_{\{a_{3}, a_{5}\}}^{(0.7, 0.5)} (π_{D}) < γ_{AT}^{(0.7, 0.5)} (π_{D})$ , so the process must continue. Note that a ₁, a ₂ and a ₄ are all candidate attributes since Sig _in (a ₁, AT, D) = Sig _in (a ₂, AT, D) = Sig _in (a ₄, AT, D). That means the attribute sets {a ₁, a ₃, a ₅}, {a ₂, a ₃, a ₅} and {a ₃, a ₄, a ₅} may be attribute reductions. Consequently, we need to examine these candidates further.

$γ_{\{a_{1}, a_{3}, a_{5}\}}^{(0.7, 0.5)} (π_{D}) = γ_{AT}^{(0.7, 0.5)} (π_{D}),$

$γ_{\{a_{2}, a_{3}, a_{5}\}}^{(0.7, 0.5)} (π_{D}) = γ_{AT}^{(0.7, 0.5)} (π_{D}),$

$γ_{\{a_{3}, a_{4}, a_{5}\}}^{(0.7, 0.5)} (π_{D}) = γ_{AT}^{(0.7, 0.5)} (π_{D}) .$

Furthermore, we need to verify that each attribute is necessary according to lines 10 to 14 of Algorithm 2. The verification shows that every attribute is necessary in the attribute set to which it belongs. Hence {a ₁, a ₃, a ₅}, {a ₂, a ₃, a ₅} and {a ₃, a ₄, a ₅} are all attribute reductions based on positive region preservation. A heuristic attribute reduction algorithm can effectively find one reduction rather than all reductions. If only one reduction is needed and the significances of several attributes are identical, we can select the candidate according to the original order of the attributes. That means the attribute set {a ₁, a ₃, a ₅} is considered first. Combined with Example 3.1, we obtain the positive decision rule that patients {x ₁, x ₂, x ₄, x ₇, x ₁₀} are sub-healthy with respect to knowledge {a ₁, a ₃, a ₅} under α = 0.7 and β = 0.5. It shows that attributes a ₂ and a ₄ are redundant for obtaining the same positive region. From the perspective of medical diagnostics, the results indicate that some detection indicators are not necessary for the diagnosis of certain diseases.

In this section, we presented two heuristic attribute reduction algorithms based on rough entropy and positive region preservation, respectively. The results of Examples 4.1 and 4.2 show that different algorithms may produce different results on the same data set. In fact, the essential purposes of these algorithms are different. Algorithm 1 is designed around the influence of attributes on classification, and rough entropy is an index that measures the quality of classification. Hence the objective of Algorithm 1 is classification preservation. In contrast, the target of Algorithm 2 is positive rule preservation, that is, maintaining the positive region, since a larger positive region usually comes with a smaller uncertain region and therefore fewer ambiguous objects. Algorithm 2 does not focus on the classification and is concerned only with the positive rules. The goal of Algorithm 1 is more rigorous than that of Algorithm 2, and the results of Algorithm 2 are included in the results of Algorithm 1. The results of the two examples also reflect this. Thus, we should choose an appropriate algorithm based on practical requirements in applications.

5. Conclusion and further work

Uncertainty arising from human cognition and numerous random factors is widespread in practical data. To depict more accurately the hybrid data that consist of multiple types of attributes, a lattice-valued decision information system was presented, in which the domains of all condition attributes are finite lattices. Based on this generalized decision information system, we defined a novel partial ordering relation and established a decision-theoretic rough set model to deal with hybrid data of various forms. In the Big Data era, both the volume and the update rate of data increase rapidly, and redundant information increases as well. Consequently, we designed two heuristic attribute reduction algorithms based on rough entropy and positive region preservation, respectively. Meanwhile, several indexes that measure the significance of attributes were defined in the process of attribute reduction. Algorithm 1 and Algorithm 2 can be applied to find one of the attribute reductions, but not all of them, in practical problems. This paper establishes a basic decision-theoretic rough set model in a lattice-valued decision information system and presents two heuristic attribute reduction algorithms to reduce redundant knowledge based on different targets. However, much work remains to be considered. In future work, we will try to develop a decision-theoretic rough set model for dynamic circumstances and design a novel attribute reduction algorithm to further improve efficiency.

Acknowledgments

We would like to express our thanks to the Editor-in-Chief, the handling associate editor and the anonymous referees for their valuable comments and constructive suggestions.

References

1. Azam and J.T. Yao, Analyzing uncertainties of probabilistic rough set regions with game-theoretic rough sets, International Journal of Approximate Reasoning 55(1) (2014), 142–155.
2. Azam and J.T. Yao, Interpretation of equilibria in game-theoretic rough sets, Information Sciences 295(3-4) (2015), 586–599.
3. H.M. Chen, T.R. Li, Luo, S.J. Horng and G.Y. Wang, A decision-theoretic rough set approach for dynamic data mining, IEEE Transactions on Fuzzy Systems 23(6) (2015), 1958–1970.
4. Düntsch and Gediga, Uncertainty measures of rough set prediction, Artificial Intelligence 106(1) (1998), 109–137.
5. Feng, S.P. Zhang and J.S. Mi, The reduction and fusion of fuzzy covering systems based on the evidence theory, International Journal of Approximate Reasoning 53(1) (2012), 87–103.
6. Feng and J.S. Mi, Variable precision multigranulation decision-theoretic fuzzy rough sets, Knowledge-Based Systems 91 (2016), 93–101.
7. Greco, Matarazzo and Slowinski, Parameterized rough set model using rough membership and Bayesian confirmation measures, International Journal of Approximate Reasoning 49(2) (2008), 285–300.
8. Jeon, Kim and Jeong, Rough sets attributes reduction based expert system in interlaced video sequences, IEEE Transactions on Consumer Electronics 52(4) (2006), 1348–1355.
9. Jensen and Shen, Fuzzy-rough sets assisted attribute selection, IEEE Transactions on Fuzzy Systems 15(1) (2007), 73–89.
10. Jensen and Shen, Semantics-preserving dimensionality reduction: rough and fuzzy-rough-based approaches, IEEE Transactions on Knowledge & Data Engineering 16(12) (2004), 1457–1471.
11. L.J. Ke, Z.R. Feng and Z.G. Ren, An efficient ant colony optimization approach to attribute reduction in rough set theory, Pattern Recognition Letters 29(9) (2008), 1351–1357.
12. G.M. Lang, D.Q. Miao and M.J. Cai, Three-way decision approaches to conflict analysis using decision-theoretic rough set theory, Information Sciences 406 (2017), 185–207.
13. H.X. Li, M.H. Wang, X.Z. Zhou and J.B. Zhao, An interval set model for learning rules from incomplete information table, International Journal of Approximate Reasoning 53(1) (2012), 24–37.
14. W.T. Li and W.H. Xu, Double-quantitative decision-theoretic rough set, Information Sciences 316 (2015), 54–67.
15. W.W. Li, Z.Q. Huang, X.Y. Jia and X.Y. Cai, Neighborhood based decision-theoretic rough set models, International Journal of Approximate Reasoning 69 (2016), 1–17.
16. J.Y. Liang, Z.Z. Shi, D.Y. Li and M.J. Wierman, Information entropy, rough entropy and knowledge granulation in incomplete information systems, International Journal of General Systems 35(6) (2006), 641–654.
17. D.C. Liang, Pedrycz and Liu, Determining three-way decisions with decision-theoretic rough sets using a relative value approach, IEEE Transactions on Systems Man & Cybernetics Systems 47(8) (2017), 1785–1799.
18. D.C. Liang, Z.S. Xu and Liu, Three-way decisions based on decision-theoretic rough sets with dual hesitant fuzzy information, Information Sciences 396 (2017), 127–143.
19. Liu, T.R. Li and R.D. Ruan, Probabilistic model criteria with decision-theoretic rough sets, Information Sciences 181(17) (2011), 3709–3722.
20. C.H. Liu, Pedrycz and M.Z. Wang, Covering-based multigranulation decision-theoretic rough sets, Journal of Intelligent & Fuzzy Systems 32(1) (2017), 749–765.
21. Luo, T.R. Li, Yi and Fujita, Matrix approach to decision-theoretic rough sets for evolving data, Knowledge-Based Systems 99 (2016), 123–134.
22. H.Y. Pan, Y.M. Li and Y.Z. Cao, Lattice-valued simulations for quantitative transition systems, International Journal of Approximate Reasoning 56 (2015), 28–42.
23. Pawlak, Rough sets, International Journal of Computer & Information Sciences 11(5) (1982), 341–356.
24. Pawlak, S.K. Wong and Ziarko, Rough sets: Probabilistic versus deterministic approach, International Journal of Man-Machine Studies 29(1) (1988), 81–95.
25. Pawlak and Skowron, Rough membership functions, in: Advances in the Dempster-Shafer Theory of Evidence, John Wiley and Sons, New York, 1994, pp. 251–271.
26. Y.H. Qian, J.Y. Liang, W.Z. Wu and C.Y. Dang, Knowledge structure, knowledge granulation and knowledge distance in a knowledge base, International Journal of Approximate Reasoning 50 (2009), 174–188.
27. Y.H. Qian, Zhang, Y.L. Sang and J.Y. Liang, Multigranulation decision-theoretic rough sets, International Journal of Approximate Reasoning 55(1) (2014), 225–237.
28. Y.H. Qian, X.Y. Liang, G.P. Lin, Guo and J.Y. Liang, Local multigranulation decision-theoretic rough sets, International Journal of Approximate Reasoning 82 (2017), 119–137.
29. Rebolledo, Rough intervals-enhancing intervals for qualitative modeling of technical systems, Artificial Intelligence 170 (2006), 667–685.
30. Y.L. Sang, J.Y. Liang and Y.H. Qian, Decision-theoretic rough sets under dynamic granulation, Knowledge-Based Systems 91 (2016), 84–92.
31. Shen and Chouchoulas, A rough-fuzzy approach for generating classification rules, Pattern Recognition 35 (2002), 2425–2438.
32. Slezak and Ziarko, The investigation of the Bayesian rough set model, International Journal of Approximate Reasoning 40(1) (2005), 81–91.
33. Skowron and Rauszer, The discernibility matrices and functions in information systems, Theory & Decision Library 11 (1992), 331–362.
34. B.Z. Sun, W.M. Ma and H.Y. Zhao, Decision-theoretic rough fuzzy set model and application, Information Sciences 283 (2014), 180–196.
35. G.Y. Wang, X.A. Ma and Yu, Monotonic uncertainty measures for attribute reduction in probabilistic rough set model, International Journal of Approximate Reasoning 59(C) (2015), 41–67.
36. W.H. Xu, S.H. Liu and W.X. Zhang, Lattice-valued information systems based on dominance relation, International Journal of Machine Learning and Cybernetics 4 (2013), 245–257.
37. W.H. Xu and Y.T. Guo, Generalized multigranulation double-quantitative decision-theoretic rough set, Knowledge-Based Systems 105 (2016), 190–205.
38. W.H. Xu and J.H. Yu, A novel approach to information fusion in multi-source datasets: A granular computing viewpoint, Information Sciences 378 (2017), 410–423.
39. Y.Y. Yao, Zhao and Wang, On reduct construction algorithms, in: Proceedings of RSKT, 2006, pp. 297–304.
40. Y.Y. Yao, Decision-theoretic rough set models, in: Proceedings of RSKT’07, LNAI, vol. 4481, 2007, pp. 1–12.
41. Y.Y. Yao, Probabilistic rough set approximations, International Journal of Approximate Reasoning 49(2) (2008), 255–271.
42. Y.Y. Yao and Zhao, Attribute reduction in decision-theoretic rough set models, Information Sciences 178 (2008), 3356–3373.
43. Y.Y. Yao, Three-way decisions with probabilistic rough sets, Information Sciences 180(3) (2010), 341–353.
44. J.H. Yu, M.H. Chen and W.H. Xu, Dynamic computing rough approximations approach to time-evolving information granule interval-valued ordered information system, Applied Soft Computing 60 (2017), 18–29.
45. Zeng, Pan, Q.L. Zheng and Peng, Knowledge acquisition based on rough set theory and principal component analysis, IEEE Intelligent Systems 21(2) (2006), 78–85.
46. Ziarko, Probabilistic rough sets, in: Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, International Conference, RSFDGrC 2005, Regina, Canada, August 31 - September 3, 2005, Proceedings, 2005, pp. 283–293.
47. Ziarko, Variable precision rough set model, Journal of Computer and System Sciences 46(1) (1993), 39–59.
48. X.H. Zhang and W.S. Wang, Lattice-valued interval soft sets - A general frame of many soft set models, Journal of Intelligent & Fuzzy Systems 26 (2014), 1311–1321.
49. X.Y. Zhang, Wei and W.H. Xu, Attributes reduction and rules acquisition in an lattice-valued information system with fuzzy decision, International Journal of Machine Learning and Cybernetics 8 (2017), 135–147.
50. W.X. Zhang, J.S. Mi and W.Z. Wu, Approaches to knowledge reductions in inconsistent systems, International Journal of Intelligent Systems 18 (2003), 989–1000.