Abstract
An information system (IS) is a database that expresses relationships between objects and attributes. An IS with decision attributes is called a decision information system (DIS). An incomplete real-valued decision information system (IRVDIS) is a DIS based on incomplete real-valued data. This paper studies three-way decision (3WD) for incomplete real-valued data and its application. First, the distance between two objects with respect to the conditional attribute set of an IRVDIS is constructed. Next, the fuzzy T_cos-equivalence relation on the object set of an IRVDIS is obtained by means of the Gaussian kernel. After that, the decision-theoretic rough set model for an IRVDIS is presented, and a 3WD method is proposed based on this model. Finally, an application is given to illustrate the feasibility of the proposed method. It is worth mentioning that the levels of risk may be determined by thresholds that can be acquired directly from the risk preferences of different decision-makers, and that the decision rules for each decision class under different levels of risk are shown in tabular form.
Introduction
Research background and related works
Rough set theory (RST) was brought forward by Pawlak [14, 15]. RST is a tool for describing incomplete information and fuzzy information. It can analyze such information and find the potential relationships implicit in it, so as to reveal the underlying rules. A rough set studies the uncertainty of knowledge through the partition induced by an equivalence relation, and an uncertain concept is described by two definite sets, the upper approximation set and the lower approximation set. The “upper approximation” and “lower approximation” are used to describe the possibility and the certainty respectively, and the “boundary region” is used to describe the uncertainty. When it comes to decision problems, roughness (or precision) can be used to describe the performance of the extracted decision rules. As a data analysis and processing theory, RST has been widely and successfully applied in machine learning, information security, Internet of things, cloud computing, biological information processing, decision support and analysis, and other fields.
The approximation of classical rough sets is defined based on the qualitative relationship between concepts (that is, containment or non-empty intersection). It does not take into account the extent to which concepts intersect, and is therefore not applicable to many practical problems. In order to solve the problem that the Pawlak rough set model is too strict and lacks fault tolerance, various probabilistic rough set extension models have been proposed. Further research shows that three-way decisions (3WDs) are not limited to rough sets, but form a more general and effective decision and information processing model. In 1990, Yao et al. [30] proposed the decision rough set model, extending the 0.5-probability rough set model that was put forward by Pawlak et al. [17]. The main starting point of the decision rough set model is to use conditional probability to define the degree of intersection of concepts, and two thresholds to define the probabilistic upper and lower approximation sets.
Under the decision-theoretic rough set model, Yao [25–27, 29] put forward a three-way decision-making model. Since the 3WDs were put forward, they have attracted wide attention from scholars at home and abroad. A lot of progress has been made in the theory, models and applications of three-way decision making. For example, Aranda-Corral et al. [1] put forward a model of 3WDs for knowledge harnessing; Li et al. [28] applied 3WD to complex network analysis; Lang [4] investigated a general conflict analysis model based on 3WD; Luo et al. [5] researched 3WD under incomplete information in RST; Li et al. [10, 11] presented 3WD methods in a hybrid information system and in a hybrid information system with images, respectively, and applied these 3WD methods to medical diagnosis; Luo et al. [6] researched 3WD with incomplete information based on similarity and satisfiability; Li et al. [9] studied 3WD on two universes; Shen et al. [21] considered 3WD-based blocking reduction models in hierarchical classification; Zhang et al. [33] brought forward a novel sequential 3WDs model based on a penalty function; Zhang et al. [36] investigated 3WDs of rough vague sets from the perspective of fuzziness; Wang et al. [23] proposed a 3WD method based on the Gaussian kernel in a hybrid information system with images.
Motivation and inspiration
An information system (IS) is a database which expresses relationships between objects and attributes. RST is developed around an IS. An incomplete real-valued decision information system (IRVDIS) is a decision IS whose information values are real numbers, some of which are missing. So far, three-way decision (3WD) in an IRVDIS has not been studied. The purpose of this paper is to use the decision-theoretic rough set model in an IRVDIS to study 3WD.
The remaining sections of this article are organized as follows. In Section 2, we recall fuzzy relations, Pawlak’s rough sets and IRVDISs. In Section 3, we give the distance between two objects based on the conditional attribute set in an IRVDIS and construct the fuzzy T_cos-equivalence relations by using the Gaussian kernel. In Section 4, we propose information structures in an IRVDIS. In Section 5, we present a 3WD method based on the decision-theoretic rough set model in an IRVDIS. In Section 6, we put forward an application to show the feasibility of the proposed method. In Section 7, we conclude this paper.
Preliminaries
We first review some basic notions of fuzzy relations, Pawlak rough sets and IRVDISs.
Throughout this paper, U signifies a non-empty finite set, 2^U denotes the collection of all subsets of U, |S| means the cardinality of S ∈ 2^U and I denotes [0, 1].
Put
Fuzzy relations
A fuzzy set F in U is defined as a function F : U → I, where F (u) is said to be the membership degree of u to F.
In this paper, I^U denotes the set of all fuzzy sets in U. The cardinality of F ∈ I^U can be calculated with |F| = Σ_{u∈U} F(u).
If R ∈ I^{U×U}, then R is said to be a fuzzy relation on U.
Let R ∈ I^{U×U} and U = {u_1, u_2, …, u_n}. Then R may be represented by the matrix M(R) = (r_ij)_{n×n}, where r_ij = R(u_i, u_j).
R is said to be identity, if M(R) = E (here E means an identity matrix), and written as R = △; R is said to be universal, if ∀ i, j, r_ij = 1, and written as R = ω.
Let R ∈ I^{U×U}. ∀ u ∈ U, define
Obviously,
(1) Commutativity: T (m, n) = T (n, m) ;
(2) Associativity: T (T (m, n) , p) = T (m, T (n, p)) ;
(3) Monotonicity: m ⩽ p, n ⩽ q ⟹ T (m, n) ⩽ T (p, q) ;
(4) Boundary condition: T (m, 1) = m .
If T satisfies (1), (2), (3) and (4), then it is said to be a t-norm.
(1) Reflexivity: R (u, u) =1,
(2) Symmetry: R (u, v) = R (v, u) ,
(3) T-transitivity: T (R (u, v) , R (v, w)) ⩽ R (u, w) .
If R satisfies (1), (2) and (3), then it is said to be a T-fuzzy equivalence relation on U.
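The three conditions above can be checked mechanically on a finite universe. The following is a minimal sketch, not part of the original paper: it assumes the commonly used form T_cos(m, n) = max(mn − √(1 − m²)·√(1 − n²), 0) for the t-norm named T_cos in this paper, and the function names `t_cos` and `is_t_equivalence` as well as the matrices `R_ok` and `R_bad` are hypothetical.

```python
import math
from itertools import product


def t_cos(m: float, n: float) -> float:
    """Assumed standard form: T_cos(m, n) = max(m*n - sqrt(1-m^2)*sqrt(1-n^2), 0)."""
    # max(..., 0.0) inside the roots guards against tiny negative rounding errors.
    root_m = math.sqrt(max(1.0 - m * m, 0.0))
    root_n = math.sqrt(max(1.0 - n * n, 0.0))
    return max(m * n - root_m * root_n, 0.0)


def is_t_equivalence(R, t_norm, tol=1e-9):
    """Check reflexivity, symmetry and T-transitivity of a square matrix R."""
    idx = range(len(R))
    reflexive = all(abs(R[i][i] - 1.0) <= tol for i in idx)
    symmetric = all(abs(R[i][j] - R[j][i]) <= tol for i, j in product(idx, repeat=2))
    transitive = all(
        t_norm(R[i][j], R[j][k]) <= R[i][k] + tol
        for i, j, k in product(idx, repeat=3)
    )
    return reflexive and symmetric and transitive


# Hypothetical 3-object relations, used only for illustration.
R_ok = [[1.0, 0.9, 0.7], [0.9, 1.0, 0.9], [0.7, 0.9, 1.0]]
R_bad = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.9], [0.1, 0.9, 1.0]]

print(is_t_equivalence(R_ok, t_cos))   # T_cos-transitive
print(is_t_equivalence(R_ok, min))     # not min-transitive
print(is_t_equivalence(R_bad, t_cos))  # violates T_cos-transitivity
```

The same matrix `R_ok` passes the check for T_cos but fails it for the minimum t-norm, which illustrates that T-transitivity depends on the chosen t-norm.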
Pawlak rough sets
Suppose that R is an equivalence relation on U. Denote
Using ordinary inclusion measure,
From the above definition, since the equivalence relation is too strict, the objects in the universe whose membership function values are neither 0 nor 1 are all assigned to the boundary region. In practice, a more tolerant classification is often more meaningful. Consequently, Ziarko [31] first introduced an error-tolerance level with set inclusion [35] through a pair of thresholds α and β instead of 1 and 0, where 0 ≤ β < α ≤ 1. The variable precision rough set model is defined as
An IRVDIS
In this part, we recall the concept of an IRVDIS.
Suppose that (U, A) is an IS. Then (U, A) is said to be a decision IS, if A = C ∪ D, C is a conditional attribute set and D is a decision attribute set.
Given that (U, A) is an IS and P ⊆ A. Define
ind (P) = {(u, v) ∈ U × U : ∀ a ∈ P, a (u) = a (v)}.
Apparently, ind (P) is an equivalence relation on U and ind (P) = ⋂ a∈Pind ({a}) .
Given that (U, A) is an incomplete information system (IIS) and P ⊆ A. Define
sim (P) = {(u, v) ∈ U × U : ∀ a ∈ P, a (u) = a (v) or a (u) = * or a (v) = *}, where * is a missing value. Clearly, sim (P) is a tolerance relation on U and sim (P) = ⋂ a∈P sim ({a}) .
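The definition of sim (P) can be made concrete with a short sketch. The data below are hypothetical toy values (not taken from Table 1), `None` stands in for the missing value *, and the helper names `similar` and `sim` are chosen here only for illustration.

```python
MISSING = None  # stands in for the missing value '*' of the paper


def similar(x, y, attrs):
    """True if x and y agree on every attribute in attrs, treating '*' as a wildcard."""
    return all(
        x[a] is MISSING or y[a] is MISSING or x[a] == y[a]
        for a in attrs
    )


def sim(table, attrs):
    """sim(P) as a set of ordered pairs of object names."""
    return {
        (u, v)
        for u in table
        for v in table
        if similar(table[u], table[v], attrs)
    }


# Hypothetical toy data, not taken from Table 1.
table = {
    "u1": {"a1": 1.5, "a2": MISSING},
    "u2": {"a1": 1.5, "a2": 2.0},
    "u3": {"a1": 1.5, "a2": 3.0},
}
pairs = sim(table, ["a1", "a2"])
print(sorted(pairs))
```

In this toy table u1 is similar to both u2 and u3, while u2 and u3 are not similar to each other, so sim (P) is reflexive and symmetric but not transitive. This is why it is a tolerance relation rather than an equivalence relation.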
Let (U, A) be an IIS. ∀ a ∈ A, denote
An IRVDIS
Table 1 means an IRVDIS (U, C ∪ {d}), where
In this section, the fuzzy T_cos-equivalence relation in an IRVDIS is constructed.
The distance between two objects in an IRVDIS
The distance between two objects in an IRVDIS is presented as below.
Then
Thus
Calculate the distance d_C (u1, u3) between u1 and u3.
Since a3 is a real-valued attribute, by Definition 3.1, we have
Then
d_C (u1, u3)
The distance between a1 (u i ) and a1 (u j ) (i, j = 1, 2, …, 13)
The distance between a2 (u i ) and a2 (u j ) (i, j = 1, 2, …, 13)
The distance between a3 (u i ) and a3 (u j ) (i, j = 1, 2, …, 13)
The distance between a4 (u i ) and a4 (u j ) (i, j = 1, 2, …, 13)
The distance between a5 (u i ) and a5 (u j ) (i, j = 1, 2, …, 13)
The distance between a6 (u i ) and a6 (u j ) (i, j = 1, 2, …, 13)
The distance between a7 (u i ) and a7 (u j ) (i, j = 1, 2, …, 13)
By Definition 3.1, d_C in (U, C ∪ {d}) is shown as below.
In this part, the fuzzy T_cos-equivalence relation induced by an IRVDIS by means of the Gaussian kernel method is given.
Gaussian kernel
Evidently, G (u, v) satisfies:
(1) G (u, v) ∈ [0, 1];
(2) G (u, v) = G (v, u);
(3) G (u, u) =1.
Thus, the fuzzy T_cos-equivalence relation
An algorithm for deriving the fuzzy T_cos-equivalence relation
In Algorithm 1, we first find the distance between two information values under each attribute through a double loop according to Definition 3.1, then use these distances to calculate the distance between two objects based on C according to Definition 3.2, and finally, according to the specified threshold, use the distance between two objects to find their similarity in an IRVDIS. Therefore, the time complexity of Algorithm 1 depends on the numbers of objects and attributes of an IRVDIS. For data of size |C| × |U|, the complexity of computing the distances between information values under one attribute in an IRVDIS is O (|U| × (|U| − 1)), so the overall time complexity of Algorithm 1 is O (|C||U|^2).
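The last step of the procedure described above, turning object distances into similarities, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' Algorithm 1: the object distance d_C is taken as a given matrix (Definitions 3.1 and 3.2 are not reproduced here), the kernel is assumed to have the usual Gaussian form exp(−d² / (2δ²)), and the name `gaussian_relation` and the sample distances are hypothetical.

```python
import math


def gaussian_relation(dist, delta):
    """Map an object-distance matrix to similarities exp(-d^2 / (2 * delta^2))."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    n = len(dist)
    # Double loop over object pairs: O(|U|^2) once the distances are known.
    return [
        [math.exp(-dist[i][j] ** 2 / (2.0 * delta ** 2)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical distances between three objects, not taken from the paper's tables.
d_C = [
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.8],
    [1.0, 0.8, 0.0],
]
R = gaussian_relation(d_C, delta=0.5)
for row in R:
    print([round(x, 4) for x in row])
```

Because d_C (u, u) = 0 and d_C is symmetric, the resulting matrix has ones on the diagonal, is symmetric, and has all entries in (0, 1], matching properties (1) to (3) of G (u, v). A smaller δ makes the similarity decay faster with distance.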
Information structures in an IRVDIS
In this section, information structures in an IRVDIS are considered.
Why do we study information structures in an IRVDIS? The reason is that they contribute to knowledge discovery from this IRVDIS.
Given R ∈ I^{U×U}. Qian et al. [18] defined
A 3WD method based on Gaussian kernel in an IRVDIS
In this section, a 3WD method based on Gaussian kernel in an IRVDIS is proposed.
A summary of 3WD methods
In this portion, the knowledge of 3WD is reviewed.
∀ S ∈ 2^U, U can be divided into three regions:
Under the Bayesian risk decision criterion, POS (S) generates the positive rule: acceptance; NEG (S) generates the negative rule: rejection; BND (S) generates the boundary rule: delay. This is the 3WD method.
(1) P (U) =1 .
(2) If E∩ F = ∅, then P (E ∪ F) = P (E) + P (F) .
If P satisfies (1) and (2), then it is said to be a probability measure.
By a probability measure, we have
The decision-theoretic rough set model is derived from Bayesian theory. In this model, we use a set of two states Ω = {X, ¬ X} and a set of three actions to depict the decision process, where X and ¬ X mean that an element is in X and not in X, respectively. The set of three actions is {a_P, a_B, a_N}, and the losses of these actions in the two states are given by a 3 × 2 matrix.
In this matrix, λ_PP, λ_BP and λ_NP denote the losses caused by taking actions a_P, a_B and a_N, respectively, when an object belongs to X. Similarly, λ_PN, λ_BN and λ_NN denote the losses incurred by taking actions a_P, a_B and a_N when the object belongs to ¬X (N). The expected losses (conditional risks) of taking three different actions on objects in [u]_R are:
The Bayesian decision procedure gives the minimum-risk decision rule as below.
(P) If R (a_P | [u]_R) ⩽ R (a_B | [u]_R) and R (a_P | [u]_R) ⩽ R (a_N | [u]_R), then u ∈ POS (S);
(B) If R (a_B | [u]_R) ⩽ R (a_P | [u]_R) and R (a_B | [u]_R) ⩽ R (a_N | [u]_R), then u ∈ BND (S);
(N) If R (a_N | [u]_R) ⩽ R (a_P | [u]_R) and R (a_N | [u]_R) ⩽ R (a_B | [u]_R), then u ∈ NEG (S).
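The minimum-risk rules (P), (B) and (N) can be illustrated with a small sketch. It assumes the usual Bayesian form of the conditional risk, R (a | [u]) = λ_aP · P (X | [u]) + λ_aN · P (¬X | [u]) for a ∈ {P, B, N}; the loss values and the names `expected_losses` and `min_risk_action` are hypothetical and are not taken from the paper.

```python
def expected_losses(p, loss):
    """Conditional risk of each action, given p = P(X | [u]).

    loss maps two-letter keys such as "BP" (action B, state X) and
    "BN" (action B, state not-X) to non-negative numbers.
    """
    return {a: loss[a + "P"] * p + loss[a + "N"] * (1.0 - p) for a in "PBN"}


def min_risk_action(p, loss):
    """Return "P", "B" or "N", whichever has the smallest conditional risk."""
    risks = expected_losses(p, loss)
    # On ties, min() keeps the first candidate in the order P, B, N.
    return min("PBN", key=lambda a: risks[a])


# Hypothetical loss values satisfying PP <= BP < NP and NN <= BN < PN.
loss = {"PP": 0.0, "BP": 2.0, "NP": 6.0, "PN": 4.0, "BN": 1.0, "NN": 0.0}

for p in (0.8, 0.4, 0.1):
    print(p, min_risk_action(p, loss))
```

With these hypothetical losses, a high conditional probability leads to acceptance, a low one to rejection, and an intermediate one to the delayed (boundary) decision.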
Apparently, P (S| [u]) + P (¬ S| [u]) = 1. Liu et al. [8] found that the rule relies only on the conditional probability P (S| [u]) and the loss functions λ•• (• = P, B, N). Following Yao’s study [27], under the reasonable semantic interpretation of the loss functions given by λ_PP ≤ λ_BP < λ_NP and λ_NN ≤ λ_BN < λ_PN, the rule can be simplified so that it depends only on the conditional probability P (S| [u]) and the loss functions λ••. Then the minimum-risk decision rules (P), (B) and (N) can be rewritten as:
(P1) If P (S| [u]_R) ⩾ α, then u ∈ POS (S);
(B1) If β < P (S| [u]_R) < α, then u ∈ BND (S);
(N1) If P (S| [u]_R) ⩽ β, then u ∈ NEG (S),
where
Thus, the (α, β)-probabilistic positive region, (α, β)-probabilistic boundary region and (α, β)-probabilistic negative region of S ∈ 2^U that are given by rules (P1), (B1), and (N1) can be redefined as below.
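The threshold form of the rules can be sketched in code. The sketch assumes the threshold formulas commonly used in the decision-theoretic rough set literature, α = (λ_PN − λ_BN) / ((λ_PN − λ_BN) + (λ_BP − λ_PP)) and β = (λ_BN − λ_NN) / ((λ_BN − λ_NN) + (λ_NP − λ_BP)), and estimates P (S | [u]_R) by |S ∩ [u]_R| / |[u]_R|. The loss values, the partition and the function names are hypothetical.

```python
def thresholds(loss):
    """Assumed standard DTRS thresholds (alpha, beta) computed from the losses."""
    accept_gain = loss["PN"] - loss["BN"]
    alpha = accept_gain / (accept_gain + (loss["BP"] - loss["PP"]))
    reject_gain = loss["BN"] - loss["NN"]
    beta = reject_gain / (reject_gain + (loss["NP"] - loss["BP"]))
    return alpha, beta


def three_regions(classes, S, alpha, beta):
    """Split the universe into (POS, BND, NEG) of S using rules (P1), (B1), (N1)."""
    pos, bnd, neg = set(), set(), set()
    for block in classes:
        p = len(block & S) / len(block)  # estimate of P(S | [u]_R)
        if p >= alpha:
            pos.update(block)
        elif p <= beta:
            neg.update(block)
        else:
            bnd.update(block)
    return pos, bnd, neg


# Hypothetical losses and a hypothetical partition of U = {1, ..., 10}.
loss = {"PP": 0.0, "BP": 2.0, "NP": 6.0, "PN": 4.0, "BN": 1.0, "NN": 0.0}
alpha, beta = thresholds(loss)
classes = [{1, 2, 3, 4, 5}, {6, 7}, {8, 9, 10}]
S = {1, 2, 3, 4, 6}
print(alpha, beta)
print(three_regions(classes, S, alpha, beta))
```

Note that the ordering conditions on the losses alone do not guarantee β < α; an additional condition on the losses is needed for that, and the hypothetical values above were chosen so that it holds.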
The decision-theoretic rough set model in an IRVDIS
In this subsection, the decision-theoretic rough set model in an IRVDIS is proposed.
First, the inclusion measure between two fuzzy sets is recalled.
If E, F ∈ 2^U, then
The decision-theoretic rough set model in an IRVDIS is defined as below, where the conditional probability is replaced by the inclusion measure between two fuzzy sets.
Denote
Thus
A 3WD method based on Gaussian kernel in an IRVDIS
In this part, a 3WD method is proposed based on Gaussian kernel in an IRVDIS by using the decision-theoretic rough set model, and its algorithm is given.
Suppose that (U, C ∪ {d}) is an IRVDIS. Given that α, β is a pair of thresholds with 0 ≤ β < α ≤ 1, and δ ∈ (0, 1]. Based on the Bayesian decision procedure and Yao’s decision-theoretic study, the decision rule of D ∈ U/ind ({d}) can be written as below.
(P) : If
(N) : If
(B) : If
The detailed step-wise procedure as an algorithm of 3WD in an IRVDIS is given as below.
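A possible shape of such a procedure is sketched below. It is not the authors' algorithm: it assumes that the inclusion degree of a fuzzy granule S_R (u) in a crisp decision class D is the sum of memberships over D divided by the sum of all memberships, and that the rules compare this degree with α and β in the same way as (P1), (B1) and (N1). The relation matrix, the decision class and the function names are hypothetical.

```python
def inclusion(granule, D):
    """Assumed inclusion degree of a fuzzy granule in a crisp set D:
    (sum of memberships over D) / (sum of all memberships)."""
    total = sum(granule.values())  # positive, since R(u, u) = 1
    inside = sum(m for v, m in granule.items() if v in D)
    return inside / total


def three_way(objects, R, D, alpha, beta):
    """Assign each object to POS, BND or NEG of the decision class D."""
    regions = {"POS": set(), "BND": set(), "NEG": set()}
    for u, row in zip(objects, R):
        # Row of R for u, read as the fuzzy granule S_R(u).
        degree = inclusion(dict(zip(objects, row)), D)
        if degree >= alpha:
            regions["POS"].add(u)
        elif degree <= beta:
            regions["NEG"].add(u)
        else:
            regions["BND"].add(u)
    return regions


# Hypothetical fuzzy relation on three objects and a hypothetical decision class.
objects = ["u1", "u2", "u3"]
R = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
D = {"u1", "u2"}
print(three_way(objects, R, D, alpha=0.92, beta=0.25))
```

In practice the matrix R would come from the Gaussian kernel relation for a chosen δ, and the loop would be repeated for every decision class in U/ind ({d}).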
An application in auto diagnostic
In this section, an example of auto diagnostic is employed to clarify an application of the proposed method.
Auto diagnostic is the process of detecting the working ability or condition of an auto. In the following discussion, we study the problem of diagnosing low configuration (large fuel consumption and an incomplete traction control system) with three-way decision. The standards for judging a low-configuration auto depend on many factors (such as cylinders, displacement, horsepower, weight, acceleration, model year and origin). The IRVDIS in Table 1 is used to explain the process of auto diagnostic. In Table 1, U = {u1, u2, ⋯ , u13} represents thirteen autos; C = {a1, a2, ⋯ , a7} represents seven different auto configurations; d denotes the decision attribute. However, we may sometimes misjudge an auto’s condition if we rely only on these configurations, so a more accurate analysis is required. λ•• expresses the loss function when one takes a certain action in its corresponding state. Specifically, the set of two states is given by Ω = {X, ¬ X}, which indicates whether or not an auto has a low configuration (large fuel consumption and an incomplete traction control system). λ_PP, λ_BP and λ_NP signify the losses incurred for improving the configuration, further observation and not improving (which treats the auto as highly configured), respectively, when an auto is poorly configured. λ_PN, λ_BN and λ_NN express the losses incurred for taking the same actions when an auto is highly configured, respectively. The corresponding loss functions for each low-configuration auto are carefully examined by technicians.
First, the low-configuration auto’s condition is preliminarily judged by Table 1. U/ind ({d}) = {D1, D2, D3}, where D1 = {u1, u2, u3, u4, u5} is a set of autos that have high configurations, D2 = {u6, u7, u8, u9, u10} is a set of autos that may have large fuel consumption or an incomplete traction control system, and D3 = {u11, u12, u13} is a set of autos that have low configurations.
Pick
Then, by Example 4, the information structure of (U, C ∪ {d}) is obtained as below (i = 1, 2, 3):
S (C, δ_i) = (S_{R_i} (u1), S_{R_i} (u2), ⋯ , S_{R_i} (u13)), where S_{R_i} (u1), S_{R_i} (u2), ⋯ , S_{R_i} (u13) are thirteen fuzzy information granules.
Next, we compute I (S_{R_i} (u_j), D_k) (i = 1, 2, 3 ; j = 1, 2, ⋯ , 13 ; k = 1, 2, 3) (see Tables 9–11).
Computing I (S R 1 (u) , D i ) (i = 1, 2, 3)
Computing I (S R 2 (u) , D i ) (i = 1, 2, 3)
Computing I (S R 3 (u) , D i ) (i = 1, 2, 3)
Ultimately, we give the 3WD rules (see Tables 12–14).
The 3WD rules of D1
The 3WD rules of D2
The 3WD rules of D3
λ•• signifies the loss function when we take a certain action in its corresponding state.
In this paper, the loss functions for each low-configuration auto are assumed to be carefully estimated by technicians. Nevertheless, this process cannot be fully controlled, as it is subjective and uncertain. To make better decisions, different decision rules are compared when α and β change (0 ≤ β < α).
Below, instructions are given by three cases.
(1) Pick
Thus, the decision rule of D1 is proposed as below.
(P1) If
Thus, the decision rule of D2 is proposed as below.
(P2) If
Thus, the decision rule of D3 is proposed as below.
(P3) If
This paper has studied 3WD for incomplete real-valued data. The Gaussian kernel has been used to extract the fuzzy T_cos-equivalence relation on the object set of an IRVDIS. The decision-theoretic rough set model in an IRVDIS has been given. A 3WD method has been presented based on the Gaussian kernel in an IRVDIS by using this rough set model. An example of auto diagnostic has been employed to illustrate the proposed method’s feasibility. These results show that the proposed method helps in handling incomplete real-valued data. This study can be applied to the evaluation and measurement of intellectual property value. In future work, we will apply other rough set methods to study an IRVDIS.
Acknowledgments
This work was supported by National Social Science Fund’s Major Research Special Project (18VHQ013), China-ASEAN Institute for Innovation Governance and Intellectual Property Research (2019ZCY04), Collaborative Innovation Center for Integration of Terrestrial and Marine Economies (2019YB22), Natural Science Foundation of Guangxi (2018GXNSFAA294134), Guangxi Science and Technology Program (2017AD23056), and Funding of high-level innovation team and excellent scholar program of Guangxi universities (Document No. [2019] 52).
