Abstract
The interval rough number rough sets model is a generalization of the classical rough sets model. Its lower approximation is defined by full inclusion, a condition that is too strict to tolerate noisy data and that increases the chance of a sample being assigned to the wrong class. To overcome this shortcoming, this paper proposes an interval rough number variable precision rough sets model, which combines interval rough number similarity with the concept of variable precision rough sets. The model introduces an error parameter and thereby improves the tolerance to noisy data. The related properties of the model are then proved. Moreover, we construct a maximal positive domain attribute reduction method based on the proposed model, which can process interval rough number data without discretization. Finally, numerical examples are given to verify the rationality of the model.
Introduction
Rough set theory, put forward by Z. Pawlak [1] in 1982, deals with uncertainty and inconsistency in information systems. The classical rough set classifies objects by an equivalence relation, and its lower approximation condition is too strict. Moreover, real data vary, so a single-valued information system often no longer matches the actual situation. Many scholars have therefore extended the rough set in different directions, such as the universe of discourse and the underlying relation. Research on the fuzzy rough set [2, 3], the variable precision rough set [4, 5] and other forms is particularly prominent, and these models are widely used in risk assessment [6, 7], data mining [8, 9] and other fields. In particular, the interval rough number (IRN) can express a degree of certainty within uncertain data, which makes it more appropriate for describing some practical problems. The construction and application of IRN rough set models has thus become a focus of research.
The IRN evolved from the interval rough variable proposed by Liu [10] in 2002, in which intervals replace the exact values in the upper and lower approximations of the classical rough set. Current work on the IRN concentrates on establishing and extending the related rough set models. Weng et al. [11] constructed a dominance relation rough set model based on the expectation-variance and area probability comparison method. Cheng et al. [12] proposed an IRN rough set model under a similarity relation by defining the similarity of IRNs. In the same year, Lv et al. [13] studied covering classification redundancy and attribute reduction in IRN information systems. He [14] proposed a covering rough set model for IRNs based on a compatibility relation. In the same year, Weng et al. [15] introduced a dominance threshold and defined a dominance relation rough set model based on the dominance degree. In all of these models, the lower approximation is defined by full inclusion, as in the classical rough set. This classification condition is too strict to handle problems in which "inclusion" and "belonging" hold only to a certain degree. Therefore, this paper establishes an IRN variable precision rough set model based on the similarity relation, which improves the flexibility of information processing and the adaptability to noisy data.
Attribute reduction, also called feature selection, is one of the important research directions of rough set theory. Attribute reduction methods fall mainly into two categories: those based on heuristic information [16] and those based on the discernibility matrix [17]. According to the analysis in [18], current attribute reduction algorithms have two problems. First, the relative positive domain does not necessarily shrink as attributes are removed; it may grow or stay unchanged, so a jump phenomenon can occur during attribute reduction. Second, the algorithm stops as soon as the relative positive domain of a new attribute subset is found to equal the original one, but the attribute subset obtained at that point may not be minimal, so reductions can be missed. Therefore, this paper adopts the idea of maximal positive domain attribute reduction based on the proposed model to obtain the best attribute reduction that satisfies the conditions. The method not only obtains the smallest reduction but also improves the computational efficiency of the reduction process, and it provides a reference for enriching the IRN theory and attribute reduction methods.
This paper is organized as follows. Section 2 recalls the basic concepts of the IRN information system. In Section 3, we propose the IRN variable precision rough sets model and discuss some of its properties. Section 4 proposes the maximal positive region attribute reduction based on the model and gives some numerical examples. Section 5 concludes our work.
Preliminaries
This section mainly reviews the theoretical knowledge of IRNs.
Obviously, according to Definition 2.4,
Let X and Y be non-empty subsets of a finite universe U. The measure c(X, Y) of the relative degree of misclassification of the set X with respect to the set Y is defined as:
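For reference, the usual form of this measure in the variable precision rough set literature is c(X, Y) = 1 - |X ∩ Y| / |X| when |X| > 0, and c(X, Y) = 0 when |X| = 0. The following is a minimal Python sketch under the assumption that the definition here takes that standard form; the function name `misclassification` and the sample sets are hypothetical and serve only as illustration.

```python
from fractions import Fraction


def misclassification(x: set, y: set) -> Fraction:
    """Relative degree of misclassification c(X, Y) of X with respect to Y.

    Assumed standard form: 1 - |X & Y| / |X| for non-empty X, and 0 otherwise.
    """
    if not x:
        return Fraction(0)
    return 1 - Fraction(len(x & y), len(x))


# Illustrative sets only, not data from the paper.
t = {"x1", "x2", "x3"}
print(misclassification(t, {"x1", "x2"}))  # one of three elements lies outside Y
print(misclassification(t, t))             # full inclusion gives zero error
```

Exact fractions are used instead of floats so that later comparisons against a threshold such as 1 - β are not affected by rounding.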
IRN variable precision rough sets model
In this section, the threshold β is introduced and the IRN variable precision rough sets model is proposed based on [12]. The properties of the model are also proved.
Interval rough number information table
According to Table 1 and Definition 2.3, we have S_{0.7}(x1, x2) = 0.3954, S_{0.7}(x1, x3) = 0.5031, S_{0.7}(x2, x3) = 0.5111 under attribute q1 and S_{0.7}(x1, x2) = 0.4969, S_{0.7}(x1, x3) = 0.7622, S_{0.7}(x2, x3) = 0.3740 under attribute q2.
According to Definition 2.4, we have
Similarly, we have
If the rough set based on the complete similarity relation in [12] is used instead, we have
(1)
(2)
(3)
(4)
(5)
(6)
(7)
If
(2) Since β ∈ (0.5, 1], then 1 - β ∈ [0, 0.5), for every
(3) Since X =∅, then
Similarly,
(4) For any
Similarly,
(5) Because X ∩ Y ⊆ X, X ∩ Y ⊆ Y, according to (4), we have
If X = {x1, x2}, Y = {x2, x3}, T_{a_i}(x) = {x1, x2, x3}, β = 0.5, we have
Similarly,
If X = {x1, x2}, Y = {x2}, T_{a_i}(x) = {x1, x2, x3}, β = 0.5, we have
(6) can be proved in the same way as (5).
(7) According to Definition 3.1, for any
Similarly, we have
Since 0.5 < β1 ≤ β2 ≤ 1, one has 0 ≤ 1 - β2 ≤ 1 - β1 < 0.5, then, for all
Interval rough number decision table
In Table 2, the decision classes are D1 = {x1, x2, x4} and D2 = {x3, x5, x6, x7}. Taking attribute q1 as an example, the similarity between objects under q1 is shown in Table 3.
Similarity between objects under q1
From Table 3 and Definition 2.4, we have
Similarly, we have
Then, we obtain
Because the threshold β ∈ (0.5, 1] is adjustable, it is worth examining several values of β. We take β = 0.6, 0.7, 0.8 and 1; the results are shown in Table 4:
Similarity between objects under q1
(2) Theorem 3.6 can be verified by analyzing Table 3. As β becomes larger, the lower approximation gradually becomes smaller and the upper approximation gradually becomes larger. That is, the proposed model can appropriately relax the strict condition of the lower approximation and allow for a certain error.
(3) In particular, when β = 1, the model coincides with the IRN rough set model in [12], so the model in [12] is a special case of the model in this paper.
(4) Because the restrictions are relaxed, decision-makers can adjust the error threshold according to the actual problem and process the data flexibly, which is one of the advantages of the new model.
According to Examples 3.2 and 3.7, the model improves the tolerance to noisy data and realizes knowledge acquisition at multiple granular levels, so the IRN variable precision rough sets model proposed in this section is more widely applicable than the model in [12].
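The effect of the threshold β can be illustrated with a minimal Python sketch. It assumes the usual variable precision form: an object x belongs to the β-lower approximation of X when c(T(x), X) ≤ 1 - β and to the β-upper approximation when c(T(x), X) < β, where T(x) is the similarity class of x and c is the relative degree of misclassification. The similarity classes below are invented toy data, not the values of the tables in this section, and all function names are hypothetical.

```python
from fractions import Fraction


def misclassification(x, y):
    """c(X, Y): share of X that falls outside Y (0 for empty X)."""
    if not x:
        return Fraction(0)
    return 1 - Fraction(len(x & y), len(x))


def lower_approx(classes, target, beta):
    """Objects whose similarity class misses `target` by at most 1 - beta."""
    return {obj for obj, cls in classes.items()
            if misclassification(cls, target) <= 1 - beta}


def upper_approx(classes, target, beta):
    """Objects whose similarity class misses `target` by less than beta."""
    return {obj for obj, cls in classes.items()
            if misclassification(cls, target) < beta}


# Invented similarity classes T(x) over a toy universe (assumption, not paper data).
classes = {
    "x1": {"x1", "x2"},
    "x2": {"x1", "x2", "x3"},
    "x3": {"x2", "x3", "x4"},
    "x4": {"x3", "x4", "x5", "x6"},
    "x5": {"x4", "x5"},
    "x6": {"x4", "x6"},
}
target = {"x1", "x2", "x3"}

for b in ("0.6", "0.7", "0.8", "1"):
    beta = Fraction(b)  # exact arithmetic avoids float rounding at the threshold
    print(b,
          sorted(lower_approx(classes, target, beta)),
          sorted(upper_approx(classes, target, beta)))
```

On this toy data the lower approximation shrinks from {x1, x2, x3} at β = 0.6 to {x1, x2} at β = 0.7, while the upper approximation grows from {x1, x2, x3} to {x1, x2, x3, x4} at β = 0.8. At β = 1 the lower approximation reduces to full inclusion of T(x) in X.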
This section obtains the best attribute reduction in the IRN information system based on the idea of a maximum positive domain, by removing attributes that are not important to the overall set of attributes.
For P ⊆ C, if POS(P, D, β) = POS(C, D, β) and every a ∈ P is necessary, then P is a reduction of C with respect to D, denoted RED(C, D, β). The attribute reduction is not unique, but the intersection of all reductions is unique. It is called the core and is defined as:
The attribute reduction algorithm in [18] is presented in Algorithm 1.
Now, in the IRN information system, we propose the attribute deletion reduction algorithm in Algorithm 2. Based on the idea of the maximal positive domain, the algorithm traverses the entire attribute set and tracks whether the relative positive domain, or its cardinality, becomes larger, so as to obtain the smallest reduction.
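The deletion idea can be sketched in Python as follows. This is only one reading of the attribute deletion procedure, not a transcription of Algorithm 2: an attribute is dropped whenever removing it leaves the relative positive region unchanged, and the scan restarts until no attribute can be dropped. The function `delete_reduction`, the callback `positive_region`, and the toy positive region below are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, Set


def delete_reduction(
    attributes: Iterable[str],
    positive_region: Callable[[FrozenSet[str]], Set[str]],
) -> Set[str]:
    """Backward elimination driven by the relative positive region.

    `positive_region(P)` is assumed to return POS(P, D, beta) for an
    attribute subset P; how it is computed is outside this sketch.
    """
    current = list(attributes)
    target = positive_region(frozenset(current))  # POS(C, D, beta)
    changed = True
    while changed:
        changed = False
        for a in current:
            candidate = frozenset(current) - {a}
            # Delete `a` only if the positive region stays the same.
            if candidate and positive_region(candidate) == target:
                current.remove(a)
                changed = True
                break  # rescan the reduced attribute set from the start
    return set(current)


# Toy positive region (assumption): it grows as soon as q2 or q3 is missing.
def toy_pos(attrs: FrozenSet[str]) -> Set[str]:
    return {"x1"} if {"q2", "q3"} <= attrs else {"x1", "x2"}


print(sorted(delete_reduction(["q1", "q2", "q3", "q4"], toy_pos)))
```

With the toy callback the sketch keeps q2 and q3 and discards q1 and q4. Each pass evaluates at most one positive region per remaining attribute, so for n attributes this sketch needs at most n(n + 1)/2 evaluations after the initial one.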
First, the algorithm iterates through the entire attribute set C and deletes each attribute in turn. The resulting relative positive regions are shown in Table 4:
The relative positive region does not change, so any attribute can be deleted. If q1 is deleted, then C1 = {q2, q3, q4}.
Then the algorithm traverses C1 and calculates the relative positive region after deleting each attribute in turn, as shown in Table 5:
C/q_i (i = 1, 2, 3, 4) Relative positive region
C1/q_i (i = 2, 3, 4) Relative positive region
If q2 or q3 is deleted, the relative positive region becomes larger, which shows that these two attributes affect the classification by the entire attribute set C, so q2 and q3 cannot be deleted. Thus, the best reduction for the entire attribute set is {q2, q3}.
Interval rough number decision table
C/q_i (i = 1, 2, 3, 4) Relative positive region
The relative positive domain is POS(C, D, β) = {x7}. When any one attribute is omitted from C, the relative positive region obtained is shown in Table 6:
In Table 7, deleting q1, q2 or q4 does not affect the overall decision classification, whereas deleting q3 makes the relative positive domain larger, which affects the overall classification result. Therefore, q3 must remain in the reduction and cannot be deleted. Next, we delete q1, which gives red = {q2, q3, q4}. Repeating the above steps yields the final minimum reduction {q3}.
(2) Because the algorithm traverses the entire attribute set and compares layer by layer, no jumping or omission occurs, so the reduction obtained is the best one.
(3) The algorithm improves on the exhaustive method in [18], which needs 2^n - 1 evaluations to find all maximal positive domain reduction sets, whereas the model in this paper calculates
Attribute reduction is one of the most important applications of rough sets. In this section, attribute reduction based on the idea of the maximal positive domain is applied to the IRN information system for the first time, which expands the application of IRN theory.
In this paper, we propose the IRN variable precision rough sets model based on the similarity relation in IRN decision information systems, which improves the classification ability in decision-making and increases the tolerance to noise. In addition, we study attribute reduction based on the idea of the maximal positive domain in IRN decision information systems. Combining the IRN with the variable precision rough set model extends the theoretical system of the IRN rough set model and enriches the types of rough set models. In future work, the attribute reduction algorithm under the model can be studied further to improve its efficiency, and the classification behavior of the model under multiple decision attributes can be explored.
