Abstract
As an important extension of classical rough sets, local rough set model can effectively process data with noise. How to effectively calculate three approximation regions, namely positive region, negative region and boundary region, is a crucial issue of local rough sets. Existing calculation methods for approximation regions are based on conditional probability, the time complexity is O (|X||U||C|). In order to improve the computational efficiency of three approximation regions of local rough sets, we propose a double-local conditional probability based fast calculation method. First, to improve the computational efficiency of equivalence class, we define the double-local equivalence class. Second, based on the double-local equivalence class, we define the double-local conditional probability. Finally, given the probability thresholds and a local equivalence class, the monotonicity of double-local conditional probability is proved, on this basis, a double-local conditional probability based fast calculation method for approximation regions of local rough sets is proposed, and the time complexity is O (MAX (|X|2|C|, |X||X C ||C|)). Experimental results based on 9 datasets from UCI demonstrate the effectiveness of the proposed method.
Keywords
Introduction
Pawlak’s rough set theory provides a powerful mathematical framework for analyzing and handling imprecise, inconsistent, and incomplete data [1–3]. It has been successfully applied in many fields, such as machine learning, pattern recognition, artificial intelligence, and data mining [4–8]. However, classical rough set theory is based on complete inclusion relation, which cannot effectively processe data with noise. In order to better meet the needs of practical applications, by introducing conditional probability and probabilistic thresholds, different kinds of probabilistic rough set models have been proposed [9–21], such as decision-theoretic rough sets [11–13], variable precision rough sets [14], Bayesian rough sets [15], local rough sets [16–19], double-quantitative rough sets [20] and game-theoretic rough sets [21]. They are insensitive to noise data and improve the applicability and generality of rough sets [22–35].
Concept approximation is a key point in probabilistic rough sets. A concept is a subset of the universe of objects. Given an undefinable concept, probabilistic rough sets use a pair of crisp approximation sets, called the lower approximation set and upper approximation set to express this concept approximately. Based on concept approximation, the universe is divided into three pairwise disjoint approximation regions, namely, the positive region, the negative region and the boundary region. Three approximation regions are the base of attribute reduction and rule extraction of probabilistic rough sets. When calculating the approximations of a given target concept, the probabilistic rough set models first calculate the equivalence classes of all objects in the universe. Therefore, such models are very time-consuming and even infeasible when processing large-scale datasets. In order to improve computational efficiency of probabilistic rough sets, Qian et al. [16] proposed local rough set model. As an extended model of the probabilistic rough set, compared to the rough set model, the local rough set can process data with noise. Different from existing probabilistic rough set models, when calculating the approximations of the target concept, it does not calculate the equivalence classes of all objects in the universe, but only the equivalence classes of objects in the concept, which can avoid the calculation of a large number of equivalence classes.
In the local rough set, given a universe U and a target concept X, The calculation process of existing local rough set algorithms [16–19] generally includes the following three steps: 1) Compare each object in the target concept with all objects in the universe to calculate the equivalent classes. The time complexity is O (|X||U||C|), where |C| is the number of attributes; 2) Compute the conditional probability of each object based on its equivalence class, and the time complexity is O (|X||U|); 3) Compare the conditional probability of each object with the given probability thresholds to calculate the approximations, and the time complexity is O (|X|). It can be seen that the calculation of equivalence classes and conditional probability are the two most time-consuming steps in existing researches. The reason for the high time complexity of step 1 is that, the calculation of the equivalence class of each object requires comparing the object with all objects in the universe, and the reason for the high time complexity of step 2 is that, the calculation of the conditional probability of each object requires calculating the intersection of the given target concept and the equivalent class. Therefore, when calculating the equivalent classes of objects, reducing the number of objects compared with them can improve the calculation efficiency, and when calculating conditional probability, avoiding intersection operation can also improve the calculation efficiency. Based on the above analysis, this paper proposes a fast local rough set approximation algorithm based on double-local conditional probability. The time complexity of the algorithm is O (MAX (|X|2|C|, |X||X C ||C|)), where |X C | is the number of objects in the complementary set of the target concept. The main contributions are as follows. Firstly, in order to improve the computational efficiency of equivalence class, based on the idea of divide and conquer, the double-local equivalence class is defined; Then, based on the double-local equivalence class, the definition of double-local conditional probability is given; Finally, given the probability thresholds and a local equivalence class, the monotonicity of double-local conditional probability is proved, on this basis, a double-local conditional probability based fast calculation method for approximation regions of local rough sets is proposed, which can effectively improve the computational efficiency of approximation regions by reducing the number of comparisons between objects. Experiments based on UCI datasets demonstrate the effectiveness of the proposed algorithm.
In order to intuitively show the innovations of this paper, we illustrate the variations of time complexity of approximate regions calculation of classical probabilistic rough sets, local rough sets and the proposed double-local conditional probability based fast calculation method for approximation regions of local rough sets in Fig. 1. In Fig. 1, the time complexity of the approximate regions calculation of classical probabilistic rough sets is O (|U||U||C|), where the first |U| is the number of objects to calculate the equivalent class, and the second |U| is the number of objects to be compared with each object that needs to calculate the equivalent class. The local rough sets reduce the time complexity of approximate region calculation to O (|X||U||C|) by changing the number of objects to calculate the equivalent class from |U| objects in the universe to |X| objects in the concept. The proposed double-local conditional probability based fast calculation method for approximation regions of local rough sets further reduces the time complexity of approximate region calculation to O (MAX (|X|2|C|, |X||X C ||C|)) by changing the number of objects to be compared with each object that need to calculate the equivalent class from |U| to |X| and |X C |.

Variations of time complexity of approximate regions calculation of probabilistic rough sets, local rough sets and double-local conditional probability based fast calculation method for approximation regions of local rough sets.
The remainder of the paper is organized as follows: Section 2 briefly reviews some related concepts of the local rough sets. In Section 3, a double-local conditional probability based fast calculation method for approximation regions of local rough sets is proposed. In Section 4, the effectiveness of the proposed method is demonstrated by experiments. Finally, we conclude our work and present future challenges in Section 5.
In this section, we briefly review some related concepts of the local rough sets.
From definition 2, it is obvious that the calculation of the equivalence class of each object requires comparing the object with all objects in the universe, namely, the calculation of equivalence class is based on all (global) objects in the universe.
Based on these two approximations, the universe can be divided into three disjoint approximation regions, which are the positive region
In this section, we introduce a double-local conditional probability based fast calculation method for approximation regions of local rough sets. First, using the strategy of “divide and conquer”, we define the double-local equivalence class, which divides the calculation of equivalence class into two parts.
Where, X
C
is the complement of X,
Based on definition 5, ∀x ∈ U, when calculating its equivalence class, we compare it with objects in X and X C respectively rather than with objects in U. Based on the double-local equivalence class, we define the double-local conditional probability as follows:
Definition 6 indicates that the double-local conditional probability is only related to
According to Definition 6, given
According to Theorem 1, it is not difficult to find that when
Based on the above discussions, when probabilistic thresholds (α, β) and
When calculating three approximation regions of the X-based double-local conditional probability local rough set according to definition 7, ∀x ∈ X, suppose
The fast algorithm for calculating approximation regions of X-based double-local conditional probability local rough set is shown in Algorithm 1:
In Algorithm 1, step 2 calculates
To sum up, based on the double-local conditional probability, when
According to Definition 6, given
According to Theorem 2, it is not difficult to find that when
Based on the above discussions, when probabilistic thresholds (α, β) and
When calculating three approximation regions of the X
C
-based double-local conditional probability local rough set according to definition 8, ∀x ∈ U, suppose
The fast algorithm for calculating approximation regions of X C -based double-local conditional probability local rough set is shown in Algorithm 2:
In Algorithm 2, step 2 calculates
Both algorithm 1 and algorithm 2 can quickly calculate three approximation regions of the double-local conditional probability local rough set. Although the time complexity of algorithm 1 and algorithm 2 is the same, their actual computing time is different. According to algorithm 1, ∀x ∈ X, we need to compare x with all objects in X to compute
In the following part, we use an example to illustrate the fast algorithm for calculating approximation regions of double-local conditional probability based local rough set (FADLLRS).
A decision information system
The fast algorithm for calculating approximation regions of double-local conditional probability based local rough set (FADLLRS) is used to calculate three approximation regions of Table 1. Because |X| < |X C |, algorithm 1 is selected. The detailed calculation process of algorithm 1 is described as follows:
For x1 ∈ X, first, we calculate the local equivalence class of x1 with respect to X,
For x2 ∈ X, first, we calculate the local equivalence class of x2 with respect to X,
For x3 ∈ X, x3 is equivalent to x2. Therefore, x3 is compared with all objects in U and x3 should be divided into the boundary region. Compared with the calculation according to definition 4, the proposed algorithm does not reduce the number of objects to be compared with x3.
To sum up, the calculation results of FADLLRS are:
In this section, 9 datasets from UCI machine learning data repository [36] are selected to verify the effectiveness of the fast algorithm for calculating approximation regions of double-local conditional probability based local rough set. The datasets are described in Table 2.
Description of the datasets
Description of the datasets
In these datasets, Blood Transfusion Service Center is a categorical dataset, and the others are continuous datasets. The continuous dataset is discretized into categorical ones using equal-frequency discretization method [37]. We compare our proposed algorithm with two methods, including the classical method for calculating approximation regions of the local rough set [16] and the method for calculating approximation regions of the local rough set based on double-local rough set [38]. To simplify the discussion, the proposed fast algorithm for calculating approximation regions of double-local conditional probability based local rough set is denoted as FADLLRS, the classical method for calculating approximation regions of the local rough set is denoted as CLRS, and the method for calculating approximation regions of the local rough set based on double-local rough set is denoted as DLLRS. We implement experiments on a PC with Windows 10, Intel (R) Core (TM) i5-11400 CPU 2.60 GHz and 16 GB memory. The programming language is Python on the platform of IDEA 2018.
We implement two groups of experiments. In the first group of experiments, given a pair of probability thresholds and a set of target concepts with different sizes, we compare the running time of FADLLRS, CLRS and DLLRS to show the impact of different sizes of the target concept on the running time. In the second group of experiments, given a target concept and a set of probability threshold pairs, we compare the running time of FADLLRS, CLRS and DLLRS to show the impact of different probability thresholds on the running time. We run each experiment 100 times and take the average result.
Given a pair of probability thresholds: α = 0.7, β = 0.3. To compare the running time of FADLLRS, CLRS and DLLRS with different sizes of the target concept, 10%, 20%, …, 90% of the objects in the universe are selected randomly as the target concept. The running time of FADLLRS, CLRS and DLLRS with different sizes of the target concept is shown in Fig. 2.

Running time of FADLLRS, CLRS and DLLRS with different sizes of the target concept.
In Fig. 2, the x-axis represents the size of the target concept, the y-axis represents the running time. Figure 1 shows that FADLLRS is generally superior to CLRS and DLLRS with different sizes of the target concept.With an increase of the size of the target concept, the running time of CLRS and DLLRS increases monotonously. Moreover, the running time of FADLLRS first increases and then decreases with an increase of the size of the target concept, which is a convex line in Fig. 2. The reason is analyzed as follows.
According to FADLLRS, when the size of the target concept is less than the size of the complementary set of target concept, FADLLRS executes algorithm 1 and the running time of algorithm 1 increases with the increasing size of the target concept. When the size of the target concept is more than the size of the complementary set of target concept, FADLLRS executes algorithm 2 and the running time of algorithm 2 decreases with the increasing size of the target concept. So the running time of FADLLRS first increases and then decreases with an increase of the size of the target concept.
In the following part, given a target concept and a set of probability threshold pairs, we compare the running time of FADLLRS, CLRS and DLLRS with different probability threshold pairs. When calculating the approximation regions by FADLLRS, ∀x ∈ U, assume that

Running time of FADLLRS, CLRS and DLLRS with different probability thresholds.
In Fig. 3, the x-axis represents the probability thresholds (α, β) and the y-axis represents the running time. Figure 3 shows that the running time of CLRS and DLLRS does not change with different probability thresholds. The running time of FADLLRS increases with an increase of α and a decrease of β. However, with different probability thresholds, the running time of the proposed FADLLRS is significantly less than that of CLRS and DLLRS on all datasets, which indicates that FADLLRS is still superior to CLRS and DLLRS with different probability thresholds.
The local rough set is an important extended model of the rough set theory. The calculation of approximation regions is a crucial issue of local rough sets. However, the time complexity of existing methods for calculating approximation regions of local rough sets is high. Therefore, this paper proposes a double-local conditional probability based fast calculation method. Firstly, based on the idea of “divide and conquer”, we propose the double-local equivalence class. Based on the double-local equivalence class, the definition of double-local conditional probability is given. Finally, a fast algorithm for calculating approximation regions of double-local conditional probability based local rough set is proposed. Experimental results show the effectiveness of the proposed algorithm. In future work, on the one hand, considering the dynamic changes in the information system, we will further study the incremental algorithm for calculating approximation regions of local rough set when the decision information system varies dynamically with objects, attributes, or attribute values. On the other hand, considering the flourishing development of multi-criteria classification [39, 40], We will study the application of fast algorithms in multi-criteria classification.
Footnotes
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 62076002, 61402005, 61972001), the Natural Science Foundation of Anhui Province, China (No. 2008085MF194, 1308085QF114, 1908085MF188), the Higher Education Natural Science Foundation of Anhui Province, China (No. KJ2013A015).
Conflict of Interest/Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Consent for publication
Manuscript is approved by all authors for publication.
Availability of data and materials
Data and materials will be made available on request.
Code availability
Code will be made available on request.
Authors’ contributions
Yi Xu: Conceptualization, Methodology, Supervision, Writing –original draft. Meng Zhou: Software, Data curation, Writing –review & editing.
