Abstract
The graded rough set and multi-granulation rough set are two significant generalized rough set models which be constructed on the indiscernibility relation. They solve the issues that the degree of overlap between the equivalence class and basic set in different view points of quantitative information. The purpose of this study is that research the good points of graded rough set in the multi-granulation environment which in different granules have different grades based on dominance relation. Three new types of multi-granulation with different grades rough set models are proposed, which include the optimistic, pessimistic and mean multi-granulation with different grades rough set. Then, their principal structure, basic properties and serval kinds of uncertainty measure methods are investigated as well. Furthermore, an experimental evaluation about urban investment is utilized to verify the proposed properties, which is valuable for applying these theories to deal with practical issues.
Keywords
Introduction
In 1980s, Pawlak first proposed the Rough set theory (RST) in [7]. It is a mathematical tool to deal with uncertainty in an information system. Compared with the probability theory, fuzzy set and evidence theory, it have their own advantages in the fields of medical diagnosis, knowledge discovery, image processing and so on.
It is built on the basis of the classification mechanism and classified as the equivalence relation in a specific universe, and the relation constitutes a partition of the discuss universe. The main idea of RST is the utilize a known knowledge in knowledge base to approximate the inaccurate and indeterminate knowledge. But, there is a severe limitation for Pawlak rough set. That there is no fault tolerance mechanisms between equivalence class and the basic set. Thus, people take considered the degree of overlap of the equivalence class and the basic set in the view of quantitative information. They thought the RST should be improved and expanded that the quantification of particular value are considered. So, Yao et al. investigated the relationship between rough set model and modal logics, the graded rough set (GRS) was proposed by utilizing the modal logics [20]. The absolute quantitative information about knowledge and concepts are described in this model, and expands the Pawlak model. Zhang have accomplished a lot of research studies on graded rough set and related works [27]. Measures | [x] R | - |X ∩ [x] R | and |X ∩ [x] R | reflect the absolute number of | [x] R | elements outside and inside X, and called external grade and internal grade, respectively. So, based on above absolute numbers and the means union of the elements which whose classes’s internal grade aboutX is greater than k; means union of the elements which whose classes’s external grade about X is at most k [28]. This nature number k is called the grade of GRS. Because we describe the lower and upper approximations in absolute quantitative information so the does not hold, in general. The GRS model which based on two discuss universes was proposed by Liu [5].
Furthermore, the attributes with preference-ordered domains (sometimes it’s named criteria) are not studied in the original approaches. In a lot of real applications, we have to faced the issues that one attribute play a crucial role in make decision. For this reason, Yao considered this kind problem that through looking foe the relationship between orderings of attribute values and the objects to mining ordering rules in [21]. For looking for the rules, the general information table be generalized to an information system with order that is ordered information system(OIS). In [2], through taking into account the ordering properties of criteria the RST was expanded by Greco et al. The expanded model also named dominance-based rough set model (DRSA). Yang investigated the RST approach and reductions based on dominance relation (DR) in incomplete ordered information system [23]. Xu have systematic studied the rough set in OIS [15]. The main innovation of these works is use a dominance relation replace the indiscernibility relation.
As a useful tool for information processing, based on Zadeh’s ‘information granularity’ [29] the Granular Computing (GrC) was proposed. The GrC is a term of methodologies, tools and techniques for making utilize of granules in the process of solving real problems. In recently decades, more and more scientists focus on the study theory and applications of GrC. In many fields it has been successfully applied such as knowledge discovery, concept formation and machining learning. Based on this view, the classical single-granulation Pawlakąŕs rough set have been extended to a multi-granulation rough set model by Qian et al. [8, 9]. And later, many researchers have extended the multi-granulation rough set. Xu et and Wang further investigated a fuzzy multi-granulation RST model in [16], a generalized multi-granulation RST approach [17] and a multi-granulation RST model in ordered information systems. The hierarchical structure properties of the multi-granulation RST and the multi-granulation RST in incomplete information system were investigated by Yang et al. in [24] and [25], respectively. Lin et al. presented a neighborhood-based multi-granulation RST [6]. Furthermore, the properties of multi-granulation RST and the topological structures were deep analyzed by She et al. in [10]. Recently, follow the Xu’s work Li et al. developed a further study of multi-granulation T-fuzzy rough set, relationships between multi-granulation and classical T-fuzzy rough set were studied carefully [3]. Xu researches the uncertainty measure of Atanassov’s intuitionistic fuzzy T equivalence information systems [18], and Fan investigates uncertainty measures in ordered information system based on approximation operators [1]. In recent years, processing large volumes of data has presented a challenging issue, particularly in data-redundant systems. Li et al. proposed a parallel CRF algorithm by combining and parallelizing the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) and Viterbi algorithms [4], Zhu demonstrated how graphic processing units (GPUs), powered by the compute unified device architecture (CUDA), can be used as an efficient computational platform to accelerate the MAFFT algorithm [30], Salah investigated the Lazy-Merge, a novel implementation of sequential in-place k-way merging algorithms, that can be utilized in their parallel counterparts [11], Xu developed an improved hybrid version of the chemical reaction optimization (CRO) method called HCRO (hybrid CRO) to solve the directed acyclic graph(DAG) DAG-based task scheduling problem [19], Xiao first elaborated a new and important query in the context of an uncertain database, namely uncertain top-(k,l) range (UTR) query, which retrieves l uncertain tuples that are expected to meet score range constraint [CR 1, CR 2] and have the maximum top-k probabilities but no less than a user-specified probability threshold q [14], Zhou researched skyline queries over uncertain data in distributed environments (DSUD query) [31], Truong proposed a new chemical reaction optimization with greedy strategy algorithm (CROG) to solve KP01 and explained the operator design and parameter turning methods for CROG [13], Tang et al. systematically designed a security-driven scheduling architecture that can dynamically measure the trust level of each node in the system by using differential equations [12].
The rest of this paper is organized by the following way. Some necessary and essential concepts are introduced in Section 2. In Section 3, three types multi-granulation with different grades rough set models are constructed based on DR and their properties are discussed. In Section 4, the uncertainties of the proposed models are measured. And an experimental evaluation about urban investment is utilized to verify the properties of the proposed in Section 5. The Section 6 are the conclusions and the further studies of this topic.
Preliminaries
In this section, a few basic notions about RST in OIS [2, 15], graded rough set [20, 28] and multi-granulation rough set [8, 9] are simply reviewed.
The rough set based on dominance relation
An order triple I = (U, AT, F) is an information system, let for a criterion a l ∈ AT and it’s domain is complete pre-ordered through an outranking relation ≽ a l , then x ≽ a l y implies that x is at least no less than y under the criterion a l . In other words, we can call that x dominates y. One can define x ≽ a l y by f l (x) ≥ f l (y) use the increasing preference, that is a l ∈ AT, x and y ∈ U. Give a subset A means A ⊆ AT have for all a l ∈ A, x ≽ y is equivalent to x ≽ a l y. There is a another mean that x dominates y under the all rules in A. In summary, an information system which induced by a dominance relation called ordered information system and it’s usually be denoted as I≽ = (U, AT, F).
Suppose there is an ordered information system that I≽ = (U, AT, F), given A ⊆ AT, the is a dominance relation in I≽, namely, , and is the set of dominance classes induced by a dominance relation , where is called dominance class containing x, and . For all X ⊆ U, A ⊆ AT, the upper and lower approximations are and , respectively. When , in this ordered information system one may call X is a rough set.
Graded rough set
The absolute quantitative information on basic concepts and knowledge granules are the main investigated topics of the graded rough set model. It is also a generalization of the classical rough set. If give a non-negative integer k and it is named ‘graded’.
Given an information system I = (U, AT, F), for any A ⊆ AT, X ⊆ U, k ∈
Multi-granulation rough set
We just introduce the models of multi-granulation rough set and the details can be found in references [2]. If there is an information system that I = (U, AT, f), and A j ⊆ AT, 1 ≤ j ≤ m, m is the number of the considered attribute sets and [x] A j = {y| (x, y) ∈ R A j }, R A i is an equivalent relation with respect to the attributes set A j . We can get the definitions of the optimistic upper and lower approximations of the set X ∈ U under the A j are , and , respectively. The pessimistic lower and upper approximations of a set X ∈ U about the A j ⊆ A, 1 ≤ j ≤ m can be similarly defined by following way. They are , and , respectively. Moreover, , (), one can say that X is the optimistic (or pessimistic) rough set with respect to multiple equivalence relations or multiple granulations. If not the X is the optimistic (or pessimistic) definable set about these multiple equivalence relations or multiple granulations. It’s similar to the Pawlak rough set, we can obtain other regions according to the upper and lower approximations, respectively.
Based on the definitions of optimistic and pessimistic multi-granulation rough set, the follow properties are established. They are , and , respectively. According to these two types of rough set models, we can see that the optimistic boundary region is smaller and the pessimistic boundary region is bigger than the classical rough set model. In some cases, it can deal with uncertain problems easily.
Multi-granulation with different grades rough set based on dominance relation
To study the good points of graded rough set in the multi-granulation environment based on OIS, three new types multi-granulation with different grades rough set models are proposed in this section. They are the optimistic, pessimistic and mean multi-granulation with different grades rough set, respectively.
The optimistic multi-granulation with different grades RST based on DR
We combine optimistic multi-granulation and graded rough set model which in different granules have different grades based on dominance relation.
We can also define the optimistic multi-granulation with different grades negative region, positive region, and boundary region of X in the ordered information system. ; ; ; ; .
In the graded rough set model the lower approximation is not totally included in upper approximation, it means that not holds all the time. So, we describe the boundary region through defined the lower and upper boundary region. The “ △ " is the symmetric difference operator of sets. So, the boundary region of X can de described as , too.
Thus, the theorem is proved. □
; ; ; ; ; ; ; .
(3) If X ⊆ Y that mens for any j ∈ {1, 2, …, m. So, . For any exist j s.t. so, . That is, , namely, .
(4) For any , then, for all j have . So, , hence, . Thus, .
(5) For any X, Y ⊆ U, X ∩ Y ⊆ X, Y, then, according property (3) the “⊆” can be proved. For . Based on above definition, we can get at last have one j s.t. and . So, it must have one j, such that . Thus, , means “⊇” is hold. Consequently, the prove is fulfilled.
The remaining (6), (7) and (8) can be similarly to prove. □
Especially, for ∀ k j = 0, j ∈ {1, 2, …, m}, then , . So, the multi-granulation with different grades rough set model is also an expansion of multi-granulation RST model.
It’s similarly to optimistic multi-granulation rough set model, we investigate pessimistic multi-granulation with different grades RST in OIS.
Moreover, , we say that X is a pessimistic rough set about these multiple grades. If not, one can say that X is pessimistic definable set with respect to multiple grades and multiple granulations in ordered information system. Based on the above approximation operators, the other rough regions can be defined like the optimistic case.
; ; ; ; ; ; ; .
Especially, for ∀ k j = 0, j = 1, 2, …, m, then , . So, the multi-granulation with different grades rough set is an expansion of the multi-granulation rough set.
To discuss the mean graded when there many grades in multi-granulation environment, we define a mean multi-granulation with different grades rough set in OIS.
Then, other regions can be obtained like previous.
; ; ; ; ; ; ; .
In this section, serval measures are utilized to calculate the uncertainty of these models which proposed in previous. The uncertainty of knowledge is caused by the boundary regions, in the view point of rough set approximation. The larger the boundary area is, the more uncertainly. The accuracy, roughness and approximation quality are studied in the next. It should be noted that the means the basic set is X, and the grade of j-th granule is k j . According to the uncertainty measure method in classical rough set, we can define the accuracy, roughness and approximation quality for each model as follows.
Where the | · | means the cardinality of a set. Similarly to this way another two models’ accuracy can be defined as = and , respectively.
For simplicity, we just give a generic definition for the remained uncertainty measure. They can be applied into each model in same way. Corresponding to accuracy is roughness,The first type roughness of set X with respect to is
Roughness measure is the well-known Marczewski-Steinhaus distance between the lower and upper approximations according to Yao [22]. Because some properties of expanded model have changed. We need define a new method to measure the accuracy, and the definition of second type accuracy as follows.
Similarly to the first type of roughness, the second type roughness for each model can be obtained as followed way that
Simplification the Equation (16) have . Especially, if k j = 0 for any j = 1, 2, …, m then, there are α I = α II and β I = β II . There are other uncertainty measure methods besides the two types of accuracy. We will give the approximation degree and approximation quality in the next.
In Equation (17), if the denominator X revised to U and other unchanged. It’s named approximation quality of set X with respect to . That is
Based on these definitions one can say that these types measure methods are basic and conventional uncertainty measures based on approximation set. In fact, Uncertainty is a very common and complicated phenomenon in real world. There are also other uncertainty measure methods like information entropy, condition entropy and Shannon entropy and so on. In our study, these measures are important numerical characterizations that quantify the imprecision of a rough set model that caused by their boundary regions. In this section, we study on a urban investment case based on our previous discussion that find the best solution from a few of given alternative investment options.
There’s an investment company is looking for a good city as investment project and there are ten projects can be selected. For more effective, the company employed serval experts test and scoring these city with five indexes. They are traffic, medical, education, tax and salary, respectively. Let I≽ is an ordered information system about city investment where the universe U = {x1, x2, …, x10} stands for ten cities, the set of condition attributes are AT = {Tra . , Med . , Edu . , Tax, Sal . }, and let A1 = {Med . , Edu . , Sal .}, A2 = {Tra . , Med . , Edu} are two granules. The score of city for each index have four levels (good ≽ fine ≽ poor ≽ bad) and the evaluation data as shown in Table 1.
Model demonstration
In order to facilitate understanding of these model, we exhibit a computational process for each rough set model. We first calculate the dominance classes for each object with respect to granule A1 and A2. Suppose X = {x1, x2, …, x5}, and k1 = 1, k2 = 2 based on the above dominance classes by the definitions we can get the follows results.
Based on the upper and lower approximations other rough regions of X can get as follow. They are , = {x8}, = {x4, x9}, = {x1, x2, x3, x5, x6, x7, x10}, and = {x1, x2, x3, x4, x5, x6, x7, x9, x10}.
The other rough regions of X can be got by the lower and upper approximations as. They are , =∅, = {x5, x6}, = {x3, x4, x8, x9, x10}, and = {x3, x4, x5, x6, x8, x9, x10}.
According the lower and upper approximations, the other rough regions of X can get as follow. They are , , =∅, = {x2, x3, x4, x5, x7, x8, x9, x10}, and = {x2, x3, x4, x5, x7, x8, x9, x10}.
It’s obviously that these three types results are not entirely consistent. Then, the uncertainties are not entirely consistent for them and servals kinds of uncertainty measure methods are necessary. Consequently, in different fields should select different model according the different requirements in practical application. Based on the calculated approximation sets, we can measure the uncertainty of the alternative projects to estimate the investment.
Uncertainty evolution
To evaluate the performance of the proposed uncertainty measure methods, we conduct a series of experiments to calculate these four uncertainty measures. The experimental evaluation is running on a personal computer with following hardware and software that Intel i3 - 3217U, Windows8 64 bit and Visual C ++ 6.0. For convenience, we still take k1 = 1 and k2 = 2, more granules can be obtained in a similarly way.
Suppose there are five cities should be invested, but there is no other factors which help the decision maker. He has to randomly select five cities to make investment that means |X|=5 (x i ∈ X is a city between x1, x2, …, x10 and not repeat any other cities in X). In our experiment, we randomly select five cites from the given alternatives, and uncertainty evolutions for them as shown in Table 2.
According to the Table 2 and Fig 1., one can see that the otherness of the first type accuracy between these models is very big. That is because the lower approximation set is not necessarily include in upper approximation set in graded rough set. It means this type accuracy has a smaller significant in experimental evolution. That’s reason for us why proposed the second type accuracy. The remained three uncertainty measure approaches can estimate the investment program better. From the Fig. 2., Fig. 3. and Fig. 4., we can get that the second project (x1,2,5,6,7) with higher accuracy, approximation degree and approximation quality for each model. Furthermore, it is easy to find that the the measure results of pessimistic model bigger than another two models almost all the time. But there is no determinate order relationship between them. According to these experimentations and investigations, we recommend the second program as best investment project to a decision maker. There is no doubt that there are more recommended programs, but it’s best in the given ten options.
Conclusion
The multi-granulation rough set and graded rough set are important expansions of classical rough set and have been applied into many fields. In our study, we integrate the good points of graded rough set and multi-granulation rough set theory in OIS. The main contribution of this paper is that we constructed three different types of multi-granulation with different grades rough set associated with granular computing, in which the upper and lower approximation operators are got by multiple dominance relations, respectively. We have discussed some properties of these three types RST models and four uncertainty measures are proposed for each RST model. They are two types of accuracy, approximation degree and approximation quality, respectively. Finally, we make a series of experiments about city investment. That can provide a valuable suggestion for decision makers in real world. This study extended classical rough set in viewpoint of GrC and meaningful compared with the generalization of RST. In our further work, we will research the reduction of these proposed models and solve the practical issues.
Footnotes
Acknowledgments
This work is supported by Natural Science Foundation of China (Nos. 61472463, 61402064), National Natural Science Foundation of CQ CSTC (Nos. cstc 2013jcyjA40051, cstc2015jcyjA40053), and Graduate Innovation Foundation of Chongqing University of Technology (No. YCX2014236), and Graduate Innovation Foundation of CQ (No. CYS15223).
