Abstract
An information system, viewed as a database that represents relationships between objects and attributes, is an important mathematical model in the field of artificial intelligence. In this paper, hybrid data means boolean, categorical, real-valued and set-valued data together with missing data. A hybrid information system is an information system whose attribute values are hybrid data. This paper proposes a three-way decision method based on hybrid data. First, the distance between two objects based on the conditional attribute set in a given hybrid information system is developed, and a Gaussian kernel based on this distance is obtained. Then, the fuzzy T_cos-equivalence relation induced by this information system is obtained by using the Gaussian kernel. Next, the decision-theoretic rough set model in this hybrid information system is presented. Moreover, a three-way decision method is given by means of this decision-theoretic rough set model and the inclusion degree between two fuzzy sets. Finally, an example is employed to illustrate the feasibility of the proposed method, which may provide an effective method for hybrid data analysis in real applications.
Introduction
Rough set theory, proposed by Pawlak [23], is a valid tool for dealing with uncertain information. Based on rough set theory, an information system as a database that represents relationships between objects and attributes was presented by Pawlak [23, 24]. To date, rough sets have been widely used in many fields associated with information systems, such as uncertainty modeling [2, 30], reasoning with uncertainty [36], rule extraction [1, 33], and classification and feature selection [7, 32].
In recent decades, several extensions of the Pawlak rough set model have been presented according to various requirements, including the decision-theoretic rough set model [37], the Bayesian rough set model [37], the variable precision rough set model [31] and the probabilistic rough set model [40].
On the basis of the decision-theoretic rough set model, Yao [38, 43] proposed a decision-making model called the three-way decision model, which gives the positive, boundary and negative regions a sound semantic interpretation. More concretely, three-way decisions based on decision-theoretic rough sets use these three regions to produce three kinds of decision rules: the positive rule obtained from the positive region represents acceptance; the boundary rule obtained from the boundary region represents further observation, that is, a deferred decision; the negative rule obtained from the negative region represents rejection. Three-way decisions reflect the way humans think when solving decision problems and provide a reliable theoretical foundation for decision making.
Three-way decisions have received much attention from scholars, and many excellent research contributions have been made. For example, in view of the decision risk of the loss function, Li et al. [17] put forward corresponding three-way decision-making models; taking into account the multi-source information environment, Qian et al. [26] proposed multi-granulation three-way decisions; considering different uncertainty decision measures, Liu et al. presented a series of uncertain three-way decision models consisting of stochastic three-way decisions [15], interval three-way decisions [14] and fuzzy three-way decisions [16]; Hu et al. [3–6] studied three-way decision spaces; Yu et al. [42, 44] carried out a number of studies on three-way decision models with clustering analysis; Li et al. [18, 34] gave a three-way decision method in a hybrid information system with images and its application in medical diagnosis. These existing works have greatly promoted the development of three-way decisions.
In reality, there are many information systems in which the data are hybrid. To handle this situation, Zeng et al. [46] studied a fuzzy rough set approach for incremental feature selection on hybrid information systems. Up to now, three-way decisions in hybrid information systems have not been studied. In this paper, we investigate three-way decisions based on decision-theoretic rough sets in a hybrid information system.
The remaining part of this paper is organized as follows. In Section 2, we recall some concepts about fuzzy relations, Pawlak’s rough sets and hybrid information systems, and explain the basic ideas of Bayesian theory and three-way decision. In Section 3, we construct the distance between two objects based on the conditional attribute set in a hybrid information system and obtain the fuzzy T_cos-equivalence relation induced by this information system by means of the Gaussian kernel method. In Section 4, we propose a three-way decision method based on hybrid data. In Section 5, we give an example to show the feasibility of the proposed method. In Section 6, we introduce “Compliance with Ethical Standards”. In Section 7, we conclude this paper and highlight the prospects for potential future development in different fields.
Preliminaries
In this section, we review some notions about fuzzy sets, fuzzy relations, Pawlak rough sets and hybrid information systems, and outline the basic ideas of Bayesian theory and three-way decision.
Throughout this paper, U denotes a finite set called the universe, 2^U denotes the family of all subsets of U, and I denotes the unit interval [0, 1].
Put
Fuzzy sets and fuzzy relations
Fuzzy sets are extensions of ordinary sets. A fuzzy set A in U is defined as a function assigning to each element x of U a value A(x) ∈ I, and A(x) is called the membership degree of x to the fuzzy set A.
In this paper, I^U denotes the set of all fuzzy sets in U. The cardinality of A ∈ I^U can be calculated with ∣A∣ = Σ_{x ∈ U} A(x).
If R is a fuzzy set in U × U, then R is called a fuzzy relation on U. In this paper, I^{U×U} denotes the set of all fuzzy relations on U.
Let U = {x_1, x_2, ⋯, x_n} and R ∈ I^{U×U}. Then R may be represented by the matrix M(R) = (r_ij)_{n×n}, where r_ij = R(x_i, x_j).
If M(R) is a unit matrix, then R is said to be a fuzzy identity relation, written R = ▵; if r_ij = 1 for all i, j ≤ n, then R is said to be a fuzzy universal relation, written R = ω.
Let R ∈ I^{U×U}. For each x ∈ U, we define a fuzzy set S_R(x) by S_R(x)(y) = R(x, y) for each y ∈ U.
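A function T : I × I → I is called a triangular norm (t-norm) if, for all a, b, c, d ∈ I, it satisfies:
(1) Commutativity: T(a, b) = T(b, a),
(2) Associativity: T(T(a, b), c) = T(a, T(b, c)),
(3) Monotonicity: a ⩽ c, b ⩽ d ⇒ T(a, b) ⩽ T(c, d),
(4) Boundary condition: T(a, 1) = a.
Given a t-norm T, a fuzzy relation R on U is called a fuzzy T-equivalence relation if, for all x, y, z ∈ U, it satisfies:
(1) Reflexivity: R(x, x) = 1,
(2) Symmetry: R(x, y) = R(y, x),
(3) T-transitivity: T(R(x, y), R(y, z)) ⩽ R(x, z).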
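As an illustration only, the t-norm axioms and the three conditions of a fuzzy T-equivalence relation can be checked numerically on small examples. The sketch below is not part of the original method; the helper names `is_t_norm` and `is_t_equivalence`, the sample grid and the example matrix are assumptions chosen for illustration, and a grid check is a necessary test rather than a proof.

```python
from itertools import product


def t_min(a, b):
    """Minimum t-norm: T(a, b) = min(a, b)."""
    return min(a, b)


def t_prod(a, b):
    """Product t-norm: T(a, b) = a * b."""
    return a * b


def is_t_norm(t, grid, tol=1e-12):
    """Check the four t-norm axioms on a finite grid of values in [0, 1]."""
    for a, b, c in product(grid, repeat=3):
        if abs(t(a, b) - t(b, a)) > tol:              # (1) commutativity
            return False
        if abs(t(t(a, b), c) - t(a, t(b, c))) > tol:  # (2) associativity
            return False
        # (3) monotonicity in the first argument; together with
        # commutativity this gives monotonicity in both arguments
        if a <= b and t(a, c) > t(b, c) + tol:
            return False
    # (4) boundary condition T(a, 1) = a
    return all(abs(t(a, 1.0) - a) <= tol for a in grid)


def is_t_equivalence(r, t, tol=1e-12):
    """Check reflexivity, symmetry and T-transitivity of a relation matrix."""
    idx = range(len(r))
    reflexive = all(abs(r[i][i] - 1.0) <= tol for i in idx)
    symmetric = all(abs(r[i][j] - r[j][i]) <= tol for i in idx for j in idx)
    transitive = all(
        t(r[i][j], r[j][k]) <= r[i][k] + tol
        for i in idx for j in idx for k in idx
    )
    return reflexive and symmetric and transitive


# Hypothetical sample data for illustration
GRID = [i / 4 for i in range(5)]
R = [[1.0, 0.8, 0.4],
     [0.8, 1.0, 0.4],
     [0.4, 0.4, 1.0]]
```

With these definitions, `is_t_norm(t_min, GRID)` and `is_t_equivalence(R, t_min)` both hold, whereas the arithmetic mean fails the boundary condition and is therefore not a t-norm.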
Pawlak rough sets
In this section, we briefly recall some basic concepts about Pawlak rough sets.
Let R be an equivalence relation on U. Then R partitions the universe U into a family of disjoint subsets called equivalence classes. For an equivalence relation R, the equivalence class containing x is denoted by [x]_R = {y ∈ U : (x, y) ∈ R}.
Using the inclusion degree, the lower approximation, upper approximation and boundary region of X ∈ 2^U can be equivalently defined as
From the above definition, the objects whose membership function values are neither 0 nor 1 are assigned to the boundary region, since the equivalence relation is too strict. In practice, a more tolerant classification often makes more sense. Therefore, Ziarko [45] first proposed an error-tolerance level for set inclusion through a pair of thresholds α and β, used instead of 1 and 0, where 0 ≤ β < α ≤ 1. This leads to the concept of the variable precision rough set model, which is defined as follows:
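To make the role of the two thresholds concrete, the following minimal sketch computes approximations from an inclusion degree. It assumes the commonly used form in which a class [x]_R enters the lower approximation when its inclusion degree in X is at least α and enters the upper approximation when it exceeds β; the function names and the sample partition are hypothetical and not taken from the paper.

```python
def inclusion_degree(block, target):
    """Fraction of the equivalence class `block` that lies inside `target`."""
    return len(block & target) / len(block)


def approximations(partition, target, alpha=1.0, beta=0.0):
    """Return (lower, upper, boundary) of `target` for thresholds alpha, beta.

    With alpha = 1 and beta = 0 this reduces to the Pawlak approximations.
    """
    lower, upper = set(), set()
    for block in partition:
        degree = inclusion_degree(block, target)
        if degree >= alpha:   # class is (almost) contained in the target
            lower |= block
        if degree > beta:     # class overlaps the target enough
            upper |= block
    return lower, upper, upper - lower


# Hypothetical universe {1, ..., 6} split into three equivalence classes
PARTITION = [{1, 2}, {3, 4, 5}, {6}]
X = {1, 2, 3, 4}

pawlak = approximations(PARTITION, X)
relaxed = approximations(PARTITION, X, alpha=0.6, beta=0.2)
```

In this example the class {3, 4, 5} has inclusion degree 2/3, so it lies in the Pawlak boundary region but moves into the lower approximation once α is relaxed to 0.6.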
Hybrid information systems
In this section, we briefly recall some basic concepts about hybrid information systems.
If (U, A) is an information system, and A = C ∪ D where C is a conditional attribute set and D is a decision attribute set, then (U, A) is called a decision information system.
For any P ⊆ A and x ∈ U, denote
A hybrid information system (U, A)
In this subsection, we review the basic ideas of Bayesian theory and three-way decision.
(1) P (U) =1,
(2) If A∩ B = ∅, then P (A ∪ B) = P (A) + P (B) .
Suppose that the state set and the action set are denoted by Ω = {ω_1, ω_2, ⋯, ω_m} and A = {a_1, a_2, ⋯, a_n}, respectively. P(ω_j | [x]_R) represents the conditional probability of the object x being in the state ω_j, and λ(a_i | ω_j) expresses the cost of taking the action a_i under the state ω_j. Then the expected loss of taking the action a_i can be expressed as R(a_i | [x]_R) = Σ_{j=1}^{m} λ(a_i | ω_j) P(ω_j | [x]_R).
For each object x, one can calculate the risk R(a_i | [x]_R) of every action a_i and select the action with the minimum risk.
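The minimum-risk selection described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the cost matrix `LOSS` is hypothetical, `loss[i][j]` stands for λ(a_i | ω_j), and `posterior[j]` stands for P(ω_j | [x]_R).

```python
def expected_losses(loss, posterior):
    """Expected loss of each action: sum_j loss[i][j] * posterior[j]."""
    return [sum(l * p for l, p in zip(row, posterior)) for row in loss]


def min_risk_action(loss, posterior):
    """Return (index of the minimum-risk action, list of all risks)."""
    risks = expected_losses(loss, posterior)
    best = min(range(len(risks)), key=risks.__getitem__)
    return best, risks


# Hypothetical costs: rows are actions a_1, a_2, a_3; columns are two states
LOSS = [[0.0, 10.0],
        [2.0, 4.0],
        [8.0, 0.0]]

best, risks = min_risk_action(LOSS, [0.5, 0.5])
```

When both states are equally likely, the second action has the smallest expected loss here; when the first state is certain, the first action is chosen. Ties are resolved in favour of the action with the smaller index, which is a design choice of this sketch.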
Based on the lower and upper approximations of Pawlak’s rough sets, for any X ∈ 2^U, the positive region, boundary region and negative region of X can be defined as follows:
A positive rule generated by the positive region corresponds to an acceptance decision; a negative rule generated by the negative region corresponds to a rejection decision; a boundary rule generated by the boundary region corresponds to a deferred decision. This is what is called the three-way decision model.
In three-way decision, we use a set of two states and a set of three actions to describe the decision process. The state set is denoted by Ω = {X, ¬X}, indicating that an object is in X and not in X, respectively. The action set is A = {a_1, a_2, a_3}, where a_1, a_2 and a_3 indicate acceptance, deferment and rejection, respectively.
Let P(X | [x]) and P(¬X | [x]) be the conditional probabilities that the object x belongs to the states X and ¬X, respectively. For i = 1, 2, 3, λ(a_i | X) and λ(a_i | ¬X) denote the costs of taking the action a_i in the states X and ¬X, respectively. Then the expected loss of taking the action a_i is R(a_i | [x]) = λ(a_i | X) P(X | [x]) + λ(a_i | ¬X) P(¬X | [x]).
Through the Bayesian decision procedure, the decision rules under the minimum-cost criterion are as follows:
(P1) : x ∈ POS (X), if R (a1| [x]) ⩽ R (a2| [x]) and R (a1| [x])⩽ R (a3| [x]) ;
(B1) : x ∈ BND (X), if R (a2| [x]) ⩽ R (a1| [x]) and R (a2| [x])⩽ R (a3| [x]) ;
(N1) : x ∈ NEG (X), if R (a3| [x]) ⩽ R (a1| [x]) and R (a3| [x]) ⩽ R (a2| [x]) .
According to Yao’s study, the minimum-risk decision rules (P1), (B1), (N1) can be rewritten as:
(P1′) : x ∈ POS (X), if P (X| [x]) ⩾ α and P (X| [x]) ⩾ γ ;
(B1′) : x ∈ BND (X), if β ⩽ P (X| [x]) ⩽ α ;
(N1′) : x ∈ NEG (X), if P (X| [x]) ⩽ β and P (X| [x]) ⩽ γ ,
where the thresholds α, β and γ are determined by the loss function λ.
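The threshold formulas are not reproduced in this text. As a hedged sketch, the code below uses the threshold expressions from Yao's standard decision-theoretic rough set formulation, obtained by comparing the expected losses of the three actions pairwise. The parameter names (`l_pp` for λ(a_1 | X), `l_bp` for λ(a_2 | X), `l_np` for λ(a_3 | X), `l_pn` for λ(a_1 | ¬X), `l_bn` for λ(a_2 | ¬X), `l_nn` for λ(a_3 | ¬X)), the sample costs and the tie-breaking in `three_way` are assumptions made for illustration.

```python
def thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Thresholds (alpha, beta, gamma) from a 3 x 2 loss function.

    Assumes l_pp <= l_bp < l_np and l_nn <= l_bn < l_pn, so that every
    denominator is positive.
    """
    # R(a_1) <= R(a_2)  <=>  P(X | [x]) >= alpha
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    # R(a_2) <= R(a_3)  <=>  P(X | [x]) >= beta
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    # R(a_1) <= R(a_3)  <=>  P(X | [x]) >= gamma
    gamma = (l_pn - l_nn) / ((l_pn - l_nn) + (l_np - l_pp))
    return alpha, beta, gamma


def three_way(p, alpha, beta):
    """Classify by conditional probability p, assuming beta < alpha.

    Ties at the thresholds go to acceptance or rejection (an assumption).
    """
    if p >= alpha:
        return "POS"
    if p <= beta:
        return "NEG"
    return "BND"


# Hypothetical costs for illustration
alpha, beta, gamma = thresholds(l_pp=0.0, l_bp=2.0, l_np=8.0,
                                l_pn=10.0, l_bn=4.0, l_nn=0.0)
```

For these sample costs β < γ < α, in which case the conditions on γ in rules (P1′) and (N1′) are implied by the conditions on α and β, and only the pair (α, β) is needed to assign an object to a region.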
The fuzzy T_cos-equivalence relation induced by a hybrid information system
In this section, we give the fuzzy T_cos-equivalence relation induced by a hybrid information system by means of the Gaussian kernel method.
The distance between two objects based on the conditional attribute set
To construct the distance between two objects based on the conditional attribute set in a hybrid information system, a novel distance between two information values should be presented.
For A, B ∈ 2^U, denote A ⊕ B = (A ∪ B) − (A ∩ B).
Obviously, ∣A ⊕ B ∣ = ∣ A ∪ B ∣ - ∣ A ∩ B ∣ .
Since each boolean attribute can be viewed as a categorical attribute, the following proposition ensures that Definition 3.2 is reasonable.
If a(x) = a(y), we have [x]_a = [y]_a, and then [x]_a ∪ [y]_a = [x]_a ∩ [y]_a = [x]_a. Thus
□
According to the above five definitions, a new hybrid distance is defined as follows.
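The individual distance definitions are not reproduced in this text, so the following is only a sketch of how a hybrid distance over mixed attribute types might be assembled. Every per-type rule here is an assumption for illustration (0/1 mismatch for boolean and categorical values, range-normalised absolute difference for real values, symmetric difference over union for set values, and a fixed penalty when exactly one value is missing); the paper's own definitions may differ.

```python
import math

MISSING = None  # hypothetical marker for a missing information value


def attribute_distance(u, v, kind, value_range=1.0):
    """Distance in [0, 1] between two values of one attribute (assumed rules)."""
    if u is MISSING and v is MISSING:
        return 0.0  # assumption: two unknown values are not distinguished
    if u is MISSING or v is MISSING:
        return 1.0  # assumption: a single missing value is maximally distant
    if kind in ("boolean", "categorical"):
        return 0.0 if u == v else 1.0
    if kind == "real":
        return abs(u - v) / value_range
    if kind == "set":
        union = u | v
        # |A (+) B| / |A u B|, using |A (+) B| = |A u B| - |A n B|
        return len(u ^ v) / len(union) if union else 0.0
    raise ValueError(f"unknown attribute kind: {kind}")


def hybrid_distance(x, y, kinds, ranges):
    """Euclidean-style combination of the per-attribute distances."""
    return math.sqrt(sum(
        attribute_distance(a, b, k, r) ** 2
        for a, b, k, r in zip(x, y, kinds, ranges)
    ))


# Hypothetical objects described by four conditional attributes
KINDS = ("boolean", "categorical", "real", "set")
RANGES = (1.0, 1.0, 10.0, 1.0)
x = (True, "red", 2.0, {"a", "b"})
y = (False, "red", 7.0, {"b", "c"})
z = (MISSING, "red", 2.0, {"a", "b"})

d_xy = hybrid_distance(x, y, KINDS, RANGES)
```

Under these assumed rules the distance is symmetric and vanishes between an object and itself, which is what the Gaussian kernel construction in the next subsection needs for reflexivity and symmetry.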
The fuzzy T_cos-equivalence relation induced by a hybrid information system
The Gaussian kernel method is an important methodology in machine learning and pattern recognition. To make data linearly separable and simplify classification tasks, it maps data into a higher dimensional feature space [28]. Hu et al. [8, 9] found that there are some relationships between rough sets and the Gaussian kernel method, so the Gaussian kernel can be used to obtain fuzzy relations. In this section, we use the Gaussian kernel to extract a fuzzy T_cos-equivalence relation on the object set of a given hybrid information system.
Gaussian kernel
Obviously, G (x, y) satisfies:
(1) G (x, y) ∈ [0, 1];
(2) G (x, y) = G (y, x);
(3) G (x, x) =1.
Thus the fuzzy T_cos-equivalence relation
An algorithm for obtaining the fuzzy T_cos-equivalence relation is designed as follows.
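The algorithm itself is not reproduced in this text. As a minimal sketch under stated assumptions, the code below builds a kernel matrix from a pairwise distance matrix using the common Gaussian form G(x, y) = exp(−d(x, y)² / (2δ²)) and checks the resulting relation against the t-norm T_cos(a, b) = max{ab − √(1 − a²)·√(1 − b²), 0} as it is usually given in the kernel literature. Both formulas, the function names and the sample points are assumptions; the example uses Euclidean distances between points on the real line, for which T_cos-transitivity of the Gaussian kernel is a known result.

```python
import math


def gaussian_kernel_matrix(dist, delta):
    """Apply G = exp(-d^2 / (2 * delta^2)) entrywise to a distance matrix."""
    return [[math.exp(-(d * d) / (2.0 * delta * delta)) for d in row]
            for row in dist]


def t_cos(a, b):
    """T_cos(a, b) = max(a*b - sqrt(1 - a^2) * sqrt(1 - b^2), 0)."""
    sa = math.sqrt(max(1.0 - a * a, 0.0))
    sb = math.sqrt(max(1.0 - b * b, 0.0))
    return max(a * b - sa * sb, 0.0)


def is_t_cos_equivalence(g, tol=1e-9):
    """Check reflexivity, symmetry and T_cos-transitivity of a matrix."""
    idx = range(len(g))
    reflexive = all(abs(g[i][i] - 1.0) <= tol for i in idx)
    symmetric = all(abs(g[i][j] - g[j][i]) <= tol for i in idx for j in idx)
    transitive = all(
        t_cos(g[i][j], g[j][k]) <= g[i][k] + tol
        for i in idx for j in idx for k in idx
    )
    return reflexive and symmetric and transitive


# Hypothetical objects placed on the real line; distances are |p - q|
POINTS = [0.0, 0.3, 1.0, 2.5]
DIST = [[abs(p - q) for q in POINTS] for p in POINTS]
G = gaussian_kernel_matrix(DIST, delta=1.0)
```

The parameter δ controls how quickly similarity decays with distance: a small δ makes the relation close to the identity relation, while a large δ pushes all entries towards 1.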
A three-way decision method based on hybrid data
In this section, we give a three-way decision method based on hybrid data.
Decision-theoretic rough set models for a hybrid information system
In this subsection, we propose the decision-theoretic rough set model based on the fuzzy T_cos-equivalence relation induced by a given hybrid information system.
Based on the above decision-theoretic rough set model, for any X ∈ 2^U, the universe U can be divided into three regions, which are called the positive region, boundary region and negative region of X, denoted by
Obviously,
A three-way decision method in a hybrid information system
Under the Bayesian risk decision criterion, the positive rule generated by the positive region corresponds to a decision of acceptance; the negative rule generated by the negative region corresponds to a decision of rejection; the boundary rule generated by the boundary region corresponds to a decision of deferment. This is what is called the three-way decision method.
Based on the Bayesian decision procedure and Yao’s decision-theoretic study, the three-way decision rules based on inclusion degree can be written as follows:
(P) : if
(B) : if
(N) : if
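The conditions of rules (P), (B) and (N) are not reproduced in this text. The sketch below shows one way such rules might be evaluated, assuming that the inclusion degree of the fuzzy class S_R(x) in a crisp set X is the ratio Σ_{y ∈ X} R(x, y) / Σ_{y ∈ U} R(x, y), and that an object is accepted when this degree is at least α, rejected when it is at most β, and deferred otherwise. The relation matrix, the target set and the function names are hypothetical; the thresholds 0.8 and 0.15 are the values picked in the illustrative example of this paper.

```python
def fuzzy_inclusion_degree(r_row, in_x):
    """Share of the membership mass of S_R(x) that falls inside X."""
    total = sum(r_row)  # positive because R(x, x) = 1
    return sum(v for v, flag in zip(r_row, in_x) if flag) / total


def three_way_regions(r, x_set, alpha, beta):
    """Split object indices into (POS, BND, NEG) for thresholds alpha > beta."""
    n = len(r)
    in_x = [i in x_set for i in range(n)]
    pos, bnd, neg = set(), set(), set()
    for i in range(n):
        degree = fuzzy_inclusion_degree(r[i], in_x)
        if degree >= alpha:
            pos.add(i)      # rule (P): accept
        elif degree <= beta:
            neg.add(i)      # rule (N): reject
        else:
            bnd.add(i)      # rule (B): defer
    return pos, bnd, neg


# Hypothetical fuzzy relation on four objects and target set X = {0, 1}
R = [[1.0, 0.9, 0.1, 0.0],
     [0.9, 1.0, 0.5, 0.1],
     [0.1, 0.5, 1.0, 0.9],
     [0.0, 0.1, 0.9, 1.0]]

pos, bnd, neg = three_way_regions(R, {0, 1}, alpha=0.8, beta=0.15)
```

The three returned sets are pairwise disjoint and cover all objects, mirroring the partition of U into positive, boundary and negative regions.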
An illustrative example
In this section, we employ an example to illustrate the feasibility of the proposed method.
Below, we describe in detail the process of three-way decision in a hybrid information system (U, C ∪ {d}).
(1) Pick α = 0.8, β = 0.15 and
By using the Gaussian kernel, we obtain the Gaussian kernel matrix of (U, A) with respect to δ1.
Denote 
Denote
Conclusions
In this paper, the distance between two objects based on the conditional attribute set in a given hybrid information system has been developed. The Gaussian kernel based on this distance has been applied to obtain the fuzzy T_cos-equivalence relation. Through the inclusion degree between two fuzzy sets, the decision-theoretic rough set model in this hybrid information system has been presented, and a three-way decision method in this hybrid information system has been given. An example has been employed to illustrate the effectiveness of the proposed method. It suggests that the proposed method may be helpful for handling decision problems with hybrid data. In future work, we will apply other rough set models to study hybrid information systems.
Acknowledgments
The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper. This work is supported by National Social Science Foundation of China (16XJY015), Research Topic of Guangxi Philosophy and Social Science Planning (15BGL003), Natural Science Foundation of Guangxi (2018GXNSFAA294134), Guangxi Higher Education Institutions of China (Document No.[2019] 52).
