The attribute reduction algorithm based on parallel computing

Abstract

For a decision table, it is firstly proved that the effective pair is equivalent to improved discernibility matrix to guarantee that the attribute reduction based on discernibility matrix can be calculated by effective pair. To get the attribute reduction more quickly, a parallel radix sort model is proposed. Then a simple decision table and effective pair are obtained. Taking the length of distinguishable or indistinguishable elements string among the effective pair as heuristic information, an attribute reduction algorithm based on parallel Logical OR is proposed. Finally, the examples and experiments are shown to verify the effectiveness and feasibility of the proposed algorithm.

Keywords

Rough set attribute reduction effective pair parallel computing Logical OR multi-thread

1 Introduction

Rough set theory is a new mathematical tool to deal with fuzzy and uncertain knowledge without any precondition [1, 2], which can effectively analyze and deal with imprecise, incomplete information, etc., and can find hidden knowledge in large amount of data. At present, rough set theory has been successfully applied to such fields as machine learning, intelligent control, biological and medicine fields [3 –7], etc. Attribute reduction is one of the focus points in the research on rough set. It intends to eliminate redundancy and unimportant data by replacing excavated data set with fewer attribute without affecting the capacity of classification in information system. Minimum attribute reduction (MAR) problem in the context of rough set theory is an NP-hard nonlinearly constrained combinatorial optimization problem [8]. Until now, heuristic algorithm is said to be the best way to solve the MAR problem.

There are some common heuristic methods, such as information entropy about condition attribute, the granularity size of information, the frequency and dependence of attribute distinction in discernible matrix, etc.

Attribute reduction from large data is an expensive preprocessing step. To accelerate the process of attribute reduction, many kinds of techniques have been proposed in recent years [3–8 , 17]. For example, Literature [3] proposes an instance selection method based on discernibility matrix with the training set. However, it affects the classification accuracy. Literature [15] proposes a method based on the vertically partitioned binary discernibility matrix with essential part being transferred into the memory merely, and others stored on the external space. Literature [7] proposes an updating attribute reduction algorithm based on differential matrix, which the minimum attributes reduction set can be acquired quickly. But it consumes much time when the algorithm updates the set of discernibility element.

As these methods are still sequential and cannot process big data with a cluster. To get the optimal reduction or approximate result quickly, researchers focus on algorithm itself to improve the efficiency instead of considering how to divide the data set to adapt to processor processing. At present, there are still fewer researches on the algorithm of parallel processing of multiprocessor. Sun L L proposes a merge sort algorithm based on multi-thread in references 12, which uses multiple threads to improve the speed of execution of the algorithm. ZHANG J B proposes a reduced way of improving information entropy in references 9, which applies the MapReduce model to reduction to improve the efficiency of attributes reduction. XU F F raises a parallel attribute reduction algorithm based on information view to improve information matrix in references 10. However, considerable storage space is wasted in storing information matrix, and it doesn’t take advantage of multithread to achieve parallel computations. This paper proposes an effective concept of element pair that combines effective pairs with every condition attribute to form indistinguishable string and makes it the basis of selecting important attribute. Then it proposes an attribute reduction algorithm. In order to make use of multiprocessor, the algorithm can be realized by parallel multithread so that the speed can get approximate N times of serial algorithm and the total time complexity is O (|C||POS_C (D) ||U′|/ - (N - 1)). As experimental results show, the algorithm of attribute reduction with parallel multithread processing is effective and feasible. The parallel computational model based on multi threads can accelerate dramatically in other fields as well.

The rest of this article is organized as follows. Section 2 describes some related concepts of the Rough Set. Section 3 includes the elementary background introduction about simultaneous multithreading and its application to radix sorting. Section 4 proposes the algorithm about parallel attribute reduction based on logical OR. Section 5 presents the experimental analysis. The paper ends up with conclusions and future work in Section 6.

2 Preliminary

Definition 1. [11] Let DT = (U, A, V, f) be a decision table. U is a non-empty finite set. A = C ∪ D is a set consisting of attributes of objectives, and subsets C and D are called condition attribute set respectively and decision attribute set, C∩ D = ∅. A takes value in V and V ⋃ _a∈C∪DV_a. Meanwhile, Let f : U × A → V be an information function, i.e., for any a ∈ A and x ∈ U then f (x, a) ∈ V_a. For every subset P ⊆ A, a indiscernibility relation IND (P) is defined as follows,

$\begin{matrix} IND (P) & = & {(x, y) \in U \times U | \forall a \in P, \\ f (x, a) = f (y, a)} \end{matrix}$ (1)

Obviously, if IND (P) denotes as U/P, U/P is an equivalence relation. We assume U/P includes x, the equivalence relation x is defined as follows, $[x]_{P} = {y | \forall a \in P, f (x, a) = f (y, a)}$ (2)

Definition 2. [11] Given the decision table DT = (U, A, V, f), for every subset X ⊆ U and indiscernibility relation IND (P), the lower approximation set and the upper approximation set of X can be defined by the basic set of P as follows, $P^{-} (X) = \cup {Y_{i} | (Y_{i} \in U | IND (P), Y_{i} \cap X \neq \emptyset)},$ (3) $P_{-} (X) = \cup {Y_{i} | (Y_{i} \in U | IND (P), Y_{i} \subseteq X)} .$ (4)

Definition 3. [11] Given the decision table DT = (U, A, V, f), U is the domain of objectives. P is a subset of C. The positive region of D with respect to P is defined as follows, ${POS}_{P} (D) = ⋃_{X \in U / D} P_{-} (X)$ (5)

The negative region is defined as NEG_P (D) = U - POS_P (D).

Definition 4. [11] Given the decision table DT = (U′, C ∪ D, V, f), U′ is the domain of simplified decision table without repetitive objectives and $U^{'} = U_{POS}^{'} \cup U_{NEG}^{'}$ , and subsets $U_{POS}^{'}$ and $U_{NEG}^{'}$ are called the positive region and the negative respectively. The element m_ij of discernibility matrix M defined as follows,

$\begin{matrix} m_{ij} \\ = {\begin{matrix} {a | a \in C, x, y \in {POS}_{C} (D)^{'} \land f (x, a) \neq f (y, a) \\ \land f (x, D) \neq f (y, D) \land x \neq y} \\ {a | a \in C, x \in {POS}_{C} (D)^{'}, y \in {NEG}_{C} (D)^{'} \land \\ f (x, a) \neq f (y, a)} \\ \emptyset otherwise \end{matrix} \end{matrix}$ (6)

Definition 5. According to the improved discernibility matrix M in Definition 4, delete all the empty sets. The rest nonempty elements are distinguishable marks in terms of every two objectives on condition attribute. Suppose the effective pairs are formed through comparison between two different elements, the definition will be as follows,

$\begin{matrix} Effective_pairs \\ = {(x, y) | (x \in U_{POS}^{'} \land y \in U_{NEG}^{'}) \\ \lor (x, y \in U_{POS}^{'} \land f (x, D) \neq f (y, D))} \end{matrix}$ (7)

Theorem 1. The elements of improved discernibility matrix are equivalent to effective pairs.

Proof. (1) By definition 4 and 5, the element pair in M can certainly be found ineffective pair.

(2) Suppose, any c is subset of condition attribute set C and D is decision attribute. For every element pair (x,y) of effective pairs, x is subset of $U_{POS}^{'}$ and y is subset of $U_{NEG}^{'}$ , we can’t find f (x, c) ≠ f (y, c). As c is arbitrary, we can get the result of f (x, c) = f (y, c). The elements of effective pairs are from simplified decision table, different from each other. So the result of f (x, C) = f (y, C) is false, and the assumption is not the case. We can find an element pair (x, y) in M and f (x, c) ≠ f (y, c). Similarly, if x and y are all subset of $U_{POS}^{'}$ and f (x, D) ≠ f (y, D), we can find an element pair (x,y) of M and f (x, c) ≠ f (y, c).

In a word, M and effective pairs are equivalent.

Definition 6. For each element pair (x,y) of effective pairs, different numbers are given replace it. The value of x about C_i is different from y. If all the different numbers are connected, a distinguishable and indistinguishable order strings of attribute set C_i can be gained as follows: $\begin{matrix} Distinct_String (C_{i}) \\ = {(x, y) | f (x, C_{i}) \neq f (y, C_{i}), \\ (x, y) \in Effective_pairs} \end{matrix}$ (8) $\begin{matrix} InDistinct_String (C_{i}) \\ = {(x, y) | f (x, C_{i}) = f (y, C_{i}), \\ (x, y) \in Effective_pairs} \end{matrix}$ (9)

Proposition 1. Suppose P ⊆ C|U/P_i| = 1, then all the sample data of U/P_i is belong to positive region POS_P (D).

Proof. By the Definition 1, given two samples x₁, x₂ from U and f (x₁, P) = f (x₂, P), then x₁ and x₂ belong to the same division. x₁ and x₂ are being contained in positive region POS_P (D) if f (x₁, D) = f (x₂, D); otherwise they are parts of negative region NEG_P (D). There is only one sample data x in the ith division of U/P because of |U/P_i| = 1. Suppose x is included in negative region, we can find other sample data y and f (x, P) = f (y, P) and f (x, D) ≠ f (y, D). But f (x, P) is only one since |U/P_i| = 1. In conclusion, this assumption is invalid and x will be contained in positive region.

Proposition 2. The shorter the indistinguishable order string InDistinct _ String (C_i) is, the stronger its ability to distinguish will be.

Proof. By Definition 6, we know that in a given decision table, the corresponding Effective _ pairs are settled. Effective _ pairs is made up of Distinct _ String (C_i) and InDistinct _ String (C_i). Obviously, the longer the number string of Distinct _ String (C_i) is, the more distinguishable the samples will be on condition attribute C_i. The shorter the number string of InDistinct _ String (C_i) is, the longer the number string of Distinct _ String (C_i) is, so the stronger its distinguishing ability is.

3 Parallel radix sort

3.1 Comparison of parallel computing and serial computing

Suppose the task is made up of n parts, the running time of every part is T_i, it’ll take time $\sum_{i = 1}^{N} T_{i}$ to complete the task at least by using serial computing method. However, it’ll take time Max (T_i) (1 ≤ i ≤ N) by using parallel computing. The CPU had a single core in the past and thread was carried out through serial computing. In order to realize efficient parallel operation, it’s necessary to operate on multicore processors so as to make the best use of every core. If the program is operated on P- core system with every core running a thread, the best expected effect is T_parallel = T_serially/p. As the number p increases, the time spent on computing will be less and less. Due to thread switch and data transmission between threads, if the number of threads grows, the system overhead will increase, which will fail to get the optimal result.

3.2 The application of parallel computing to radix sort

There are lots of repetitive or inconsistent sample data in mined data set, which has great influence on the time and space of the digging results. So it’s advisable to delete theose useless data to reduce the redundancy. This paper uses radix sorting to sort the data set. The traditional radix sort took the serial sort into consideration, which was applicable to single-core processor. Nowadays, with the development of science and technology, modern computers operate on multi-core. Consequently, applying the characteristic of parallel processing to radix sort can make best use of multi-core process and quicken the speed of sorting. Using multi-thread can speed up the parallel result in multi-core system, but more threads don’t mean the faster speed. With the threads growing, the more system overhead will be increased because of more thread switches. An important factor that affects both the speed and performance of processors is the number of threads and processors. We can get optimal results, if the number of threads is equivalent to processors [12]. Suppose the machine has N processors, among which one processor is used to collect data, and the other N-1 processors are used to sort data to improve the efficiency, it is practical to sort by using the following model to make use of every processor to complete the program of sorting and reduction (see Fig. 1).

Doctor Xu Z Y used radix sorting to get simplified decision table in paper [13]. But it also has two main demerits:

It cannot delete the distinguishable elements effectively, and affect the efficiency of completing positive and negative region.

It is suitable only for serial computation, which cannot reach their full effect of multicore in multicore system.

The simplified decision table algorithm based on parallel radix sort (Algorithm 1)

Input: U, N, C; //U is original data set, N is the number of processors about CPU

Output: U′, POS_C (D) , NEG_C (D)

Step 1. Let U^" = U, P = C

Step 2. Select a condition attribute C_i in a serial fashion among the attribute set P, all the sample data is distributed equally to N-1 threads with the last one collecting data.

Step 3. The data is divided by different key words.

Step 4. Collect the every subdivision, if |U/P_i|=1 then POS_C (D) ∪ U/P_i and U^" = U^" - U/P_i

Step 5. if P ! =∅ or U^" ! =∅ then P = P - C_i continue Step 2.

Step 6. Scan all the subdivisions. If |U/P_i| ≥ 2 and, ∃x ∃ y f (x, C) = f (y, C) ∧ f (x, D) ≠ f (y, D)

then select the first element of U/P_i and put it into NEG_C (D)

else

put it into POS_C (D)

Let U′ = POS_C (D) ∪ NEG_C (D)

The time complexity of Step 1 is O(1). In Step 2, data of |U| is averagely partitioned and distributed to N-1 processors, and the complexity is O (|U|). In Step 3, all data is divided by different key words, and the time complexity is O (|U|/(N - 1)). In Step 4, collect all the subdivision, and the time complexity is O (|U/P_i|/(N - 1)). Step 5 polls to see whether all divisions of condition attribute are completed, and calculates O (|C|) times. So, the time complexity of Step 4 and Step 5 is O (|U/P_i||C|/(N - 1)) total. Step 6 collects all the first sample of division, and time spent is O (|U/C|).

In conclusion, the total time complexity is Max (O (|U|) , O (|U/P_i||C|/(N - 1))).

4 The attribute reduction algorithm based on parallel computing

4.1 The attribute reduction algorithm based on Logical OR (Algorithm 2)

Lots of attribute reduction algorithms are based on discernibility matrix [14, 15], which put emphasis on the number of discernibility elements. The more the discernibility elements are, the important the attribute is. If all elements are linked to make into a linked list, then the longest one is selected every time, leading to the waste of much more time. Researcher shows that greater importance can be attached to the information of indistinguishable elements. The fewer the pair of indistinguishable elements are, the stronger the individual discrimination becomes. In this way, attribute including the least indistinguishable elements is selected. And when an attribute is added to the core, intersection of them can be selected as the prerequisite for the next attribute, which will speed up the minimum reduction.

Proposition 3. Compare the indistinguishable queue Q1 with another queue Q2. If the elements of Q2 is indistinguishable, then the merged result is Q1 ∩ Q2, otherwise it is Q1 - Q2.

Proof. Suppose, there are two Boolean arrays B1 and B2, and the number of arrays B1 is the same as arrays B2. Element value “0” stands for indiscernible objects and “1” means discernible ones. If all Boolean “0” elements in B1 are combined into Q1 and all the same Boolean elements are combined into Q2, then the four situations of B1 [i] ||B2 [i] are as follows

if B1[i] = 0 and B2[i] = 0

then B1[i] ||B2[i] = 0;

if B1[i] = 0 and B2[i] = 1

then B1[i] ||B2[i] = 1;

if B1[i] = 1 and B2[i] = 0

then B1[i] ||B2[i] = 1;

if B1[i] = 1 and B2[i] = 1

then B1[i] ||B2[i] = 1;

According to the above four cases, if all elements of Q2 are “0”, the final result is Q1 ∩ Q2 after computing the result of B1||B2 and storing the indistinguishable elements.

If all elements of Q2 are “1”, which means these are distinguishable, the result is Q1 - Q2 after computing the result of B1||B2, and getting all the “0” elements which are included in Q1 and excluded from Q2, So.

The attribute reduction algorithm based on logical OR (Algorithm 2):

Input: simplified decision table U′, positive region POS_C (D) and negative region NEG_C (D)

Output: Reduction Set.

Step 1. Use parallel radix sort to compute simplified decision table, positive and negative region;

Step 2. All elements in simplified decision table are classified according to different decision attribute.

Compute all the effective pairs and number them consecutively according to the definition of Effective _ pairs.

Step 3. Let Core = ∅ , UnDistString = Effective _ pairs//UnDistString be indistinguishable string

Step 4. Calculate the Distinct _ String (C_i) and InDistinct _ String (C_i) of every condition attribute among the C - Core, then keep the shorter string and mark it as Shortest (C_i).

Step 5. Select the shortest Shortest (C_i) length corresponding attribute and C_k is incorporated into Core.

Step 6. if Shortest (C_k) is indistinguishable string then set UnDistString = UnDistString ∩ Shortest (C_k)

else

set UnDistString = UnDistString - Shortest (C_k)

Step 7. if UnDistString =∅ then

terminate the algorithm

else Step 4.

With the fast development of hardware, modern processors are almost operated on multicore. Algorithm 2 doesn’t take the running of multi-processor into account, which is to realize the full advantage of multi-core processors. Algorithm 2 can be managed in multi-thread and the following attribute Algorithm 3 can be got based on parallel computing.

4.2 Logical OR attribute algorithm based parallel computing (Algorithm 3)

Input and output is the same as algorithm.

Step 1 and Step 2 is the same as Algorithm 2

Step 3. Suppose a queue includes three members such as condition attribute C_i, the label of distinguishability or in distinguishability and the order list of distinguishable elements. Let Core =∅, UnDistString = Effective _ pairs

Step 4. Distribute the above effective pairs to N-1 threads equally, and compute the label about condition attribute C_i (“1” is distinguishable and “0” is indistinguishable). The Nth thread collects the serial numbers of element pairs. Store the queue of the fewer elements and mark distinguishability or not.

Step 5. Distribute the element pairs to N-1 threads averagely and compute the of Distinct _ String (C_i), InDistinct _ String (C_i) every condition attribute among the C-Core. Store the shortest string and mark it Shortest (C_i). Select an attribute corresponding to the Shortest (C_i) and incorporate it into Core.

Step 6. Distribute the elements of UnDistString to N-1 threads equally,

if Shortest (C_k) is indistinguishable string then

set UnDistString = UnDistString ∩ Shortest (C_k)

else

set UnDistString = UnDistString - Shortest (C_k)

Step 7. if UnDistString≠ ∅ then

go to Step 5.

else

terminate algorithm.

The time complexity analysis: as the Algorithm 3 uses multi-thread, which is based on Algorithm 2, the time complexity of Algorithm 3 is one-nth of Algorithm 2. Concrete analysis is as follows:

Step 1 time is needed to compute the simplified decision table and the time is Max (O (|U|) , O (|U/P_i||C|/(N - 1))).

Step 2 time is spent in computing the different categories on decision D about simplified table. It spends time O (|U′|) in scanning the simplified table one time, then spends time O (|POS_C (D) ||U′|) computing the effective pairs. The time complexity is O (|POS_C (D) ||U′|/(N - 1)) because of using N-1 processors to compute. Step 3 distributes O (|POS_C (D) ||U′|) elements to N – 1 threads, the time complexity is O (|POS_C (D) ||U′|/(N - 1)). Step 4 computes the marks of effective pairs on condition attribute C_i, every processor spends time O (|POS_C (D) ||U′|/(N - 1)). Step 5 compute indistinguishable element pairs about all condition attributes. This step spends time O (|POS_C (D) ||U′||C|/(N - 1)) because of |C| attributes. In the sixth step, the elements of the shortest UnDistString were distributed by N-1 processors, which spends time |UnDistString|/(N - 1). At most, UnDistString includes only half of effective elements, the time complexity is O (|POS_C (D) ||U′|/ - 2 (N - 1)). In the seventh step, we repeat the Step 6. The worst-case needs |C| times, the time complexity is O (|C||POS_C (D) ||U′|/ - 2 (N - 1)). In conclusion, the total time complexity is O (|C||POS_C (D) ||U′|/ - (N - 1)).

5 Example and experiment analysis

5.1 Example analysis

5.1.1 Parallel radix sort

Suppose, S is simplified decision table, Let S = (U, C, D, V, f). The data of S used in the example is outlined in Table 1.

Suppose, multiprocessor has four cores and each processor processes data via one thread, among which one core is used to collect data and the others to compute the sorting process. The data of Table 1 is distributed equally by three cores. Suppose 12 samples are distributed to thread1, thread2 and thread3 in due order. At first, the data is classified according to different values of attribute “a”,and thread4 is used to collect the data to get three classifications {1,2,5,6,7,9,10},{3,4,11} and {8,12}. Next, the result is distributed by three threads and classified according to different values of attribute “b”. The fourth thread collects these classifications as follow: {1,7},{2,6},{5,9,10},{3,11},{4},{8},{12}. Because every sub classification {4},{8} and {12} includes one element, merge the elements {4},{8} and {12} POS_C (D), then delete them from U^". By that analogy, we can get classification about other attributes. At last, we get simplified decision table which includes element {1,2,3,4,5,8,11,12,9}. NEG_C (D) includes the sample {9}, and others belong to POS_C (D).

5.1.2 Example of parallel attribute reduction algorithm

The data of simplified decision table is distributed by decision attribute D. The result is {{1,2},{3,4,5,11},{8,12},{9}}. As {9} is element of negative region, we use ‘*’ instead of decision value sample 9. In addition, every decision value of the negative region sample is marked with ‘*’, which stands for different values.

These effective pairs are as follow:

We distribute the elements from NO.1 to NO.9 to thread1, from NO.10 to NO.18 to thread2 and others to thread3, so that twenty-eight element pairs come into being. Compute the distinguishable and indistinguishable element pairs about condition attribute “a” of every thread with thread4 collecting the number corresponding to the effective element, then we can get five pairs of indistinguishable element whose numbers are {3,7,10,14,23}, and the remaining twenty-three pairs are distinguishable. In the same way, we can get eight indistinguishable pairs whose numbers are {1,4,5,13,15,20,23,24} and 20 distinguishable pairs on condition attribute “b”. Twelve indistinguishable element pairs are obtained as follows: {3, 8,9,11,12,13,15,16,18,19, 24, 25}, and the others are distinguishable on attribute “c”. On attribute “d”, we can get distinguishable pairs such as {1,7,8,14,15,,16,17,20,23,26,27,28} and the other sixteen pairs are indistinguishable.

Because the indistinguishable element pairs on attribute “a” are the least and the distinguishing ability is the strongest, attribute “a” is selected to get merged into Core and let UnDistString = {3, 7, 10, 14, 23}. The indistinguishable queue on attribute “b” is {23} about UnDistString. The indistinguishable queue on attribute “c” is {3,23}. The given element pairs are distinguishable on attribute “d”. If UnDistString and attribute “d” are merged into a queue, computing their indistinguishable queue is to compute UnDistString - Shortest (d). Also, we can find the element in UnDistString and not in Shortest (d). The merged indistinguishable queue includes {3, 10}. Because the indistinguishable element pairs are the least, the attribute “b” is selected and merged into Core. At this time, Core equals {a, b}. Similarly, we can get the final Core which includes {a, b, c}. Just then, the UnDistString is empty, and the algorithm ends.

5.2 Experiment analysis

5.2.1 Determination of the number of threads

In order to further verify the performance of the Algorithm 2 in this paper and other similar methods, hardware and software configuration used in experiment include Windows 7, VS2012, Intel^® Core™ i7-4712MQ, the CPU 2.3G, and 4 GB memory. In order to explore the multi-core and multi-thread’s impact on the system’s operating speed of procedures, we use stochastic function to generate one hundred thousand data, two hundred thousand data and five hundred thousand data. Sort above data with Algorithm 1, and we conduct five times to calculate the average. The results are shown in Figs. 2–4.

The experimental data is computed under the hardware condition of dual-core and quad core, and the number of threads is 2, 4, 8 and 16 respectively, then the data value will be 100000, 200000 and 500000.

The computing speed of quad-core processor is faster than dual-core processor. Comparing four computers with the same configuration running one task to two computers, it’s true that the more computers the better. Certainly the computing speed of quad-core is not the double of dual-core. For example,when we sort the one hundred thousand data, and the number of threads is 2, dual-core processor spends 21.5 seconds while quad-core processor needs 15.8 seconds. After dealing with data, the data is collected to switch thread. So, more system resource is wasted.

When thread number is the same as core number, the computing speed can get the best effects under the same condition, which can be reflected in Fig. 2–4.

For example, we sort one hundred thousand data, two hundred thousand and five hundred thousand data in turn. When the number of threads is 2, the performance period is the shortest. So we can infer that the computing speed is the fastest when there are four threads in 4-core processor system.

As the experiment is carried out with quad-core processor, based on the above, we use four threads to run the following reduction experiment.

5.2.2 Parallel reduction

In order to further verify the performance of Algorithm 2 in this paper and other similar methods or Algorithm 2 based on parallel computing, we use the same programming environment as above. In this paper, nine data sets are all downloaded from UCI repository of machine learning databases [16], which are outlined in Table 3. |U| is the number of samples, |C| is the number of attribute, |RN| is the number of smallest reduction. RA, R2 and R3 are the numbers of reduction attribute in paper [17], Algorithms 2 and 3 respectively in this paper. TA, T2 and T3 are the running time of the above three algorithms in due order.

We can get the following main conclusions from Table 3: (1) The reduction results of Algorithm 2 are similar to algorithm A under the same data set condition, which is very close to the minimum reduction. But the running time of Algorithm 2 is less than algorithm A. Because algorithm A selects the attribute with strongly distinguishing ability and most distinguishable elements, it enables system’s spending time to increase. But Algorithm 2 selects the indistinguishable element pairs. If an attribute is inco- rporated into reduction set, indistinguishable element pairs will be fewer and fewer, which will speed up the process of attribute reduction.

(2) Compare Algorithm 3 with Algorithm 2, the running time of algorithm2 is less than Algorithm 3 when there are not enough samples. Although Algorithm 3 computes through parallel multi-threading, thread switches increase the system’s time. Consequently, when there are not enough samples, not only can multi-core not display its advantage, but it reflects the shortcomings of multi-core. For example, Algorithm 2 takes 0.003 seconds to reduce small data set Zoo, but Algorithm 3 needs 0.005 seconds. At the same time, Algorithm 2 needs 0.008 seconds but Algorithm 3 takes 0.010 seconds to run monks reduction. When the quantity of experiment data is increased, the advantages of multi-threading based on multi-core processor become more and more obvious. For example, Algorithm 2 needs 27.31 seconds to reduce the big data set Mushroom, but Algorithm 3 takes 14.33 seconds under the same condition. The running speed of Algorithm 3 is increased nearly 1 time further. When we reduce the data set Poker, Algorithm 3 only needs 45.21 seconds but Algorithm 2 takes 140.41 seconds. The time of Algorithm 3 is close to one third of Algorithm 2.

6 Conclusion

Attribute reduction is one of important hot spots of data rough set. An algorithm of attribute reduction based on discernibility matrix and other related improved algorithms are effective reduction way. But the time complexity of most algorithms is O (|C|²|U|²) based on improved discernibility matrix. The paper proposes the definition of effective pair and puts forward the attribute reduction algorithm based on logical OR, which makes the time complexity reduced to O (|C||U|²). In order to fully display the performance of hardware, we incorporate multi-threading and parallel computing into reduction algorithm in this paper, leading the time complexity to go down to O (|C||U|²/(N - 1)). It can effectively save the running time of system, and the effect should be more obvious for big data set. The next step in this work is how to use parallel algorithm to reduce the dynamic update data set and adapt the modern large data reduction.

Footnotes

Acknowledgments

This work is supported by the Postdoctoral Science Foundation of China 2014M551565, by the natural science foundation of Anhui Province 1308085MF101, by the natural science foundation of Anhui Province, KJ2016A502.

References

Pawlak

and Skowron

, Rough sets: Some extensions, Information Sciences 177(1) (2007), 28–40.

Pawlak

and Skowron

, Rudiments of rough sets, Information Sciences 177(1) (2007), 3–27.

Wang

X.Z.

, Wang

T.T.

and Zhai

J.H.

, An attribute reduction algorithm based on instance selection, Journal of Computer Research and Development 49(11) (2012), 2305–2310.

Kim

, Chu

Y.Y.

, Watada

J.Z.

, et al., A DNA-based algorithm for minimizing decision rules: A rough sets approach, IEEE Trans Nanobiosci 10(3) (2011), 139.

Qian

, Liang

, Pedrycz

and Dang

, An efficient accelerator for attribute reduction from incomplete data in rough set framework, Pattern Recognition 44(8) (2011), 1658–1670.

, Pedrycz

, Yu

and Lang

, Selecting discreteand continuous features based on neighborhood decision error minimization, Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 40(1) (2010), 137–150.

Qian

, Liang

, Pedrycz

and Dang

, Positive approximation: An accelerator for attribute reduction in rough set theory, Artificial Intelligence 174(9-10) (2010), 597–618.

Wong

S.K.M.

and Ziarko

, On optimal decision rules in decision tables, Bulletin of the Polish Academy of Sciences 33(11-12) (1985), 693–696.

Zhang

J.B.

and Li

T.R.

, PLAR: Parallel large-scale attribute reduction on cloud systems, In: Proceedings of PDCAT, 2013, pp. 184–191.

10.

F.F.

, Wei

, Bi

Z.Q.

, etc., Research on fuzzy rough parallel reduction based on mutual information, Journal of Computational Information Systems 10(12) (2014), 5391–5401.

11.

Pawlak

, Rough sets, International Journal of Computer & Information Sciences 11(5) (1982), 341–356.

12.

Sun

L.L.

, Hou

X.P.

, Zhu

, etc. Merge sort algorithm design based on multi thread, Journal of Jilin University(Information Science Edition) 33(1) (2015), 105–110.

13.

Z.Y.

, Liu

Z.P.

, Yang

B.R.

and Song

, A quick attribute reduction algorithm with complexity Max (O (|C||U|) , O (|C|²|U/C|)), Chinese Journal of Computers 3(29) (2006), 391–399.

14.

Huang

G.S.

, Zeng

F.Z.

, Chen

G.Y.

, etc., An algorithm of attribute reduction based on HU’s discernibility matrix by distinguishing capability, Journal of Chinese Computer Systems 33(8) (2012), 1800–1804.

15.

Yang

C.J.

, Ge

and Li

L.S.

, Attribute reduction of vertically partitioned binary discernibility matrix, Control and Decision 28(4) (2013), 563–568.

16.

UCI Machine Learning Repository[DB/OL].[2007-12-20]. http//www.ics.uci.edu/~mlearn/MLRepository.html

17.

Chen

and Zhao

, A new attribute reduction method based on binary discernibility matrix, Computer Applications and Soft Ware 30(9) (2013), 123–127.