Abstract
For a decision table, it is firstly proved that the effective pair is equivalent to improved discernibility matrix to guarantee that the attribute reduction based on discernibility matrix can be calculated by effective pair. To get the attribute reduction more quickly, a parallel radix sort model is proposed. Then a simple decision table and effective pair are obtained. Taking the length of distinguishable or indistinguishable elements string among the effective pair as heuristic information, an attribute reduction algorithm based on parallel Logical OR is proposed. Finally, the examples and experiments are shown to verify the effectiveness and feasibility of the proposed algorithm.
Introduction
Rough set theory is a new mathematical tool to deal with fuzzy and uncertain knowledge without any precondition [1, 2], which can effectively analyze and deal with imprecise, incomplete information, etc., and can find hidden knowledge in large amount of data. At present, rough set theory has been successfully applied to such fields as machine learning, intelligent control, biological and medicine fields [3–7], etc. Attribute reduction is one of the focus points in the research on rough set. It intends to eliminate redundancy and unimportant data by replacing excavated data set with fewer attribute without affecting the capacity of classification in information system. Minimum attribute reduction (MAR) problem in the context of rough set theory is an NP-hard nonlinearly constrained combinatorial optimization problem [8]. Until now, heuristic algorithm is said to be the best way to solve the MAR problem.
There are some common heuristic methods, such as information entropy about condition attribute, the granularity size of information, the frequency and dependence of attribute distinction in discernible matrix, etc.
Attribute reduction from large data is an expensive preprocessing step. To accelerate the process of attribute reduction, many kinds of techniques have been proposed in recent years [3–8, 17]. For example, Literature [3] proposes an instance selection method based on discernibility matrix with the training set. However, it affects the classification accuracy. Literature [15] proposes a method based on the vertically partitioned binary discernibility matrix with essential part being transferred into the memory merely, and others stored on the external space. Literature [7] proposes an updating attribute reduction algorithm based on differential matrix, which the minimum attributes reduction set can be acquired quickly. But it consumes much time when the algorithm updates the set of discernibility element.
As these methods are still sequential and cannot process big data with a cluster. To get the optimal reduction or approximate result quickly, researchers focus on algorithm itself to improve the efficiency instead of considering how to divide the data set to adapt to processor processing. At present, there are still fewer researches on the algorithm of parallel processing of multiprocessor. Sun L L proposes a merge sort algorithm based on multi-thread in references 12, which uses multiple threads to improve the speed of execution of the algorithm. ZHANG J B proposes a reduced way of improving information entropy in references 9, which applies the MapReduce model to reduction to improve the efficiency of attributes reduction. XU F F raises a parallel attribute reduction algorithm based on information view to improve information matrix in references 10. However, considerable storage space is wasted in storing information matrix, and it doesn’t take advantage of multithread to achieve parallel computations. This paper proposes an effective concept of element pair that combines effective pairs with every condition attribute to form indistinguishable string and makes it the basis of selecting important attribute. Then it proposes an attribute reduction algorithm. In order to make use of multiprocessor, the algorithm can be realized by parallel multithread so that the speed can get approximate N times of serial algorithm and the total time complexity is O (|C||POS C (D) ||U′|/ - (N - 1)). As experimental results show, the algorithm of attribute reduction with parallel multithread processing is effective and feasible. The parallel computational model based on multi threads can accelerate dramatically in other fields as well.
The rest of this article is organized as follows. Section 2 describes some related concepts of the Rough Set. Section 3 includes the elementary background introduction about simultaneous multithreading and its application to radix sorting. Section 4 proposes the algorithm about parallel attribute reduction based on logical OR. Section 5 presents the experimental analysis. The paper ends up with conclusions and future work in Section 6.
Preliminary
Obviously, if IND (P) denotes as U/P, U/P is an equivalence relation. We assume U/P includes x, the equivalence relation x is defined as follows,
The negative region is defined as NEG P (D) = U - POS P (D).
(2) Suppose, any c is subset of condition attribute set C and D is decision attribute. For every element pair (x,y) of effective pairs, x is subset of and y is subset of , we can’t find f (x, c) ≠ f (y, c). As c is arbitrary, we can get the result of f (x, c) = f (y, c). The elements of effective pairs are from simplified decision table, different from each other. So the result of f (x, C) = f (y, C) is false, and the assumption is not the case. We can find an element pair (x, y) in M and f (x, c) ≠ f (y, c). Similarly, if x and y are all subset of and f (x, D) ≠ f (y, D), we can find an element pair (x,y) of M and f (x, c) ≠ f (y, c).
In a word, M and effective pairs are equivalent.
Comparison of parallel computing and serial computing
Suppose the task is made up of n parts, the running time of every part is T i , it’ll take time to complete the task at least by using serial computing method. However, it’ll take time Max (T i ) (1 ≤ i ≤ N) by using parallel computing. The CPU had a single core in the past and thread was carried out through serial computing. In order to realize efficient parallel operation, it’s necessary to operate on multicore processors so as to make the best use of every core. If the program is operated on P- core system with every core running a thread, the best expected effect is T parallel = T serially /p. As the number p increases, the time spent on computing will be less and less. Due to thread switch and data transmission between threads, if the number of threads grows, the system overhead will increase, which will fail to get the optimal result.
The application of parallel computing to radix sort
There are lots of repetitive or inconsistent sample data in mined data set, which has great influence on the time and space of the digging results. So it’s advisable to delete theose useless data to reduce the redundancy. This paper uses radix sorting to sort the data set. The traditional radix sort took the serial sort into consideration, which was applicable to single-core processor. Nowadays, with the development of science and technology, modern computers operate on multi-core. Consequently, applying the characteristic of parallel processing to radix sort can make best use of multi-core process and quicken the speed of sorting. Using multi-thread can speed up the parallel result in multi-core system, but more threads don’t mean the faster speed. With the threads growing, the more system overhead will be increased because of more thread switches. An important factor that affects both the speed and performance of processors is the number of threads and processors. We can get optimal results, if the number of threads is equivalent to processors [12]. Suppose the machine has N processors, among which one processor is used to collect data, and the other N-1 processors are used to sort data to improve the efficiency, it is practical to sort by using the following model to make use of every processor to complete the program of sorting and reduction (see Fig. 1).
Doctor Xu Z Y used radix sorting to get simplified decision table in paper [13]. But it also has two main demerits: It cannot delete the distinguishable elements effectively, and affect the efficiency of completing positive and negative region. It is suitable only for serial computation, which cannot reach their full effect of multicore in multicore system.
The simplified decision table algorithm based on parallel radix sort (Algorithm 1)
Input: U, N, C; //U is original data set, N is the number of processors about CPU
Output: U′, POS C (D) , NEG C (D)
then select the first element of U/P i and put it into NEG C (D)
else
put it into POS C (D)
Let U′ = POS C (D) ∪ NEG C (D)
The time complexity of Step 1 is O(1). In Step 2, data of |U| is averagely partitioned and distributed to N-1 processors, and the complexity is O (|U|). In Step 3, all data is divided by different key words, and the time complexity is O (|U|/(N - 1)). In Step 4, collect all the subdivision, and the time complexity is O (|U/P i |/(N - 1)). Step 5 polls to see whether all divisions of condition attribute are completed, and calculates O (|C|) times. So, the time complexity of Step 4 and Step 5 is O (|U/P i ||C|/(N - 1)) total. Step 6 collects all the first sample of division, and time spent is O (|U/C|).
In conclusion, the total time complexity is Max (O (|U|) , O (|U/P i ||C|/(N - 1))).
The attribute reduction algorithm based on parallel computing
The attribute reduction algorithm based on Logical OR (Algorithm 2)
Lots of attribute reduction algorithms are based on discernibility matrix [14, 15], which put emphasis on the number of discernibility elements. The more the discernibility elements are, the important the attribute is. If all elements are linked to make into a linked list, then the longest one is selected every time, leading to the waste of much more time. Researcher shows that greater importance can be attached to the information of indistinguishable elements. The fewer the pair of indistinguishable elements are, the stronger the individual discrimination becomes. In this way, attribute including the least indistinguishable elements is selected. And when an attribute is added to the core, intersection of them can be selected as the prerequisite for the next attribute, which will speed up the minimum reduction.
if B1[i] = 0 and B2[i] = 0
then B1[i] ||B2[i] = 0;
if B1[i] = 0 and B2[i] = 1
then B1[i] ||B2[i] = 1;
if B1[i] = 1 and B2[i] = 0
then B1[i] ||B2[i] = 1;
if B1[i] = 1 and B2[i] = 1
then B1[i] ||B2[i] = 1; According to the above four cases, if all elements of Q2 are “0”, the final result is Q1 ∩ Q2 after computing the result of B1||B2 and storing the indistinguishable elements. If all elements of Q2 are “1”, which means these are distinguishable, the result is Q1 - Q2 after computing the result of B1||B2, and getting all the “0” elements which are included in Q1 and excluded from Q2, So.
The attribute reduction algorithm based on logical OR (Algorithm 2):
Input: simplified decision table U′, positive region POS C (D) and negative region NEG C (D)
Output: Reduction Set.
Compute all the effective pairs and number them consecutively according to the definition of Effective _ pairs.
else
set UnDistString = UnDistString - Shortest (C k )
terminate the algorithm
else Step 4.
With the fast development of hardware, modern processors are almost operated on multicore. Algorithm 2 doesn’t take the running of multi-processor into account, which is to realize the full advantage of multi-core processors. Algorithm 2 can be managed in multi-thread and the following attribute Algorithm 3 can be got based on parallel computing.
Logical OR attribute algorithm based parallel computing (Algorithm 3)
Input and output is the same as algorithm.
Step 1 and Step 2 is the same as Algorithm 2
Step 3. Suppose a queue includes three members such as condition attribute C i , the label of distinguishability or in distinguishability and the order list of distinguishable elements. Let Core =∅, UnDistString = Effective _ pairs
Step 4. Distribute the above effective pairs to N-1 threads equally, and compute the label about condition attribute C i (“1” is distinguishable and “0” is indistinguishable). The Nth thread collects the serial numbers of element pairs. Store the queue of the fewer elements and mark distinguishability or not.
Step 5. Distribute the element pairs to N-1 threads averagely and compute the of Distinct _ String (C i ), InDistinct _ String (C i ) every condition attribute among the C-Core. Store the shortest string and mark it Shortest (C i ). Select an attribute corresponding to the Shortest (C i ) and incorporate it into Core.
Step 6. Distribute the elements of UnDistString to N-1 threads equally,
if Shortest (C k ) is indistinguishable string then
set UnDistString = UnDistString ∩ Shortest (C k )
else
set UnDistString = UnDistString - Shortest (C k )
Step 7. if UnDistString≠ ∅ then
go to Step 5.
else
terminate algorithm.
The time complexity analysis: as the Algorithm 3 uses multi-thread, which is based on Algorithm 2, the time complexity of Algorithm 3 is one-nth of Algorithm 2. Concrete analysis is as follows:
Step 1 time is needed to compute the simplified decision table and the time is Max (O (|U|) , O (|U/P i ||C|/(N - 1))).
Step 2 time is spent in computing the different categories on decision D about simplified table. It spends time O (|U′|) in scanning the simplified table one time, then spends time O (|POS C (D) ||U′|) computing the effective pairs. The time complexity is O (|POS C (D) ||U′|/(N - 1)) because of using N-1 processors to compute. Step 3 distributes O (|POS C (D) ||U′|) elements to N – 1 threads, the time complexity is O (|POS C (D) ||U′|/(N - 1)). Step 4 computes the marks of effective pairs on condition attribute C i , every processor spends time O (|POS C (D) ||U′|/(N - 1)). Step 5 compute indistinguishable element pairs about all condition attributes. This step spends time O (|POS C (D) ||U′||C|/(N - 1)) because of |C| attributes. In the sixth step, the elements of the shortest UnDistString were distributed by N-1 processors, which spends time |UnDistString|/(N - 1). At most, UnDistString includes only half of effective elements, the time complexity is O (|POS C (D) ||U′|/ - 2 (N - 1)). In the seventh step, we repeat the Step 6. The worst-case needs |C| times, the time complexity is O (|C||POS C (D) ||U′|/ - 2 (N - 1)). In conclusion, the total time complexity is O (|C||POS C (D) ||U′|/ - (N - 1)).
Example and experiment analysis
Example analysis
Parallel radix sort
Suppose, S is simplified decision table, Let S = (U, C, D, V, f). The data of S used in the example is outlined in Table 1.
Suppose, multiprocessor has four cores and each processor processes data via one thread, among which one core is used to collect data and the others to compute the sorting process. The data of Table 1 is distributed equally by three cores. Suppose 12 samples are distributed to thread1, thread2 and thread3 in due order. At first, the data is classified according to different values of attribute “a”,and thread4 is used to collect the data to get three classifications {1,2,5,6,7,9,10},{3,4,11} and {8,12}. Next, the result is distributed by three threads and classified according to different values of attribute “b”. The fourth thread collects these classifications as follow: {1,7},{2,6},{5,9,10},{3,11},{4},{8},{12}. Because every sub classification {4},{8} and {12} includes one element, merge the elements {4},{8} and {12} POS C (D), then delete them from U". By that analogy, we can get classification about other attributes. At last, we get simplified decision table which includes element {1,2,3,4,5,8,11,12,9}. NEG C (D) includes the sample {9}, and others belong to POS C (D).
Example of parallel attribute reduction algorithm
The data of simplified decision table is distributed by decision attribute D. The result is {{1,2},{3,4,5,11},{8,12},{9}}. As {9} is element of negative region, we use ‘*’ instead of decision value sample 9. In addition, every decision value of the negative region sample is marked with ‘*’, which stands for different values. These effective pairs are as follow: We distribute the elements from NO.1 to NO.9 to thread1, from NO.10 to NO.18 to thread2 and others to thread3, so that twenty-eight element pairs come into being. Compute the distinguishable and indistinguishable element pairs about condition attribute “a” of every thread with thread4 collecting the number corresponding to the effective element, then we can get five pairs of indistinguishable element whose numbers are {3,7,10,14,23}, and the remaining twenty-three pairs are distinguishable. In the same way, we can get eight indistinguishable pairs whose numbers are {1,4,5,13,15,20,23,24} and 20 distinguishable pairs on condition attribute “b”. Twelve indistinguishable element pairs are obtained as follows: {3, 8,9,11,12,13,15,16,18,19, 24, 25}, and the others are distinguishable on attribute “c”. On attribute “d”, we can get distinguishable pairs such as {1,7,8,14,15,,16,17,20,23,26,27,28} and the other sixteen pairs are indistinguishable. Because the indistinguishable element pairs on attribute “a” are the least and the distinguishing ability is the strongest, attribute “a” is selected to get merged into Core and let UnDistString = {3, 7, 10, 14, 23}. The indistinguishable queue on attribute “b” is {23} about UnDistString. The indistinguishable queue on attribute “c” is {3,23}. The given element pairs are distinguishable on attribute “d”. If UnDistString and attribute “d” are merged into a queue, computing their indistinguishable queue is to compute UnDistString - Shortest (d). Also, we can find the element in UnDistString and not in Shortest (d). The merged indistinguishable queue includes {3, 10}. Because the indistinguishable element pairs are the least, the attribute “b” is selected and merged into Core. At this time, Core equals {a, b}. Similarly, we can get the final Core which includes {a, b, c}. Just then, the UnDistString is empty, and the algorithm ends.
Experiment analysis
Determination of the number of threads
In order to further verify the performance of the Algorithm 2 in this paper and other similar methods, hardware and software configuration used in experiment include Windows 7, VS2012, Intel® Core™ i7-4712MQ, the CPU 2.3G, and 4 GB memory. In order to explore the multi-core and multi-thread’s impact on the system’s operating speed of procedures, we use stochastic function to generate one hundred thousand data, two hundred thousand data and five hundred thousand data. Sort above data with Algorithm 1, and we conduct five times to calculate the average. The results are shown in Figs. 2–4.
The experimental data is computed under the hardware condition of dual-core and quad core, and the number of threads is 2, 4, 8 and 16 respectively, then the data value will be 100000, 200000 and 500000. The computing speed of quad-core processor is faster than dual-core processor. Comparing four computers with the same configuration running one task to two computers, it’s true that the more computers the better. Certainly the computing speed of quad-core is not the double of dual-core. For example,when we sort the one hundred thousand data, and the number of threads is 2, dual-core processor spends 21.5 seconds while quad-core processor needs 15.8 seconds. After dealing with data, the data is collected to switch thread. So, more system resource is wasted. When thread number is the same as core number, the computing speed can get the best effects under the same condition, which can be reflected in Fig. 2–4.
For example, we sort one hundred thousand data, two hundred thousand and five hundred thousand data in turn. When the number of threads is 2, the performance period is the shortest. So we can infer that the computing speed is the fastest when there are four threads in 4-core processor system.
As the experiment is carried out with quad-core processor, based on the above, we use four threads to run the following reduction experiment.
Parallel reduction
In order to further verify the performance of Algorithm 2 in this paper and other similar methods or Algorithm 2 based on parallel computing, we use the same programming environment as above. In this paper, nine data sets are all downloaded from UCI repository of machine learning databases [16], which are outlined in Table 3. |U| is the number of samples, |C| is the number of attribute, |RN| is the number of smallest reduction. RA, R2 and R3 are the numbers of reduction attribute in paper [17], Algorithms 2 and 3 respectively in this paper. TA, T2 and T3 are the running time of the above three algorithms in due order.
We can get the following main conclusions from Table 3: (1) The reduction results of Algorithm 2 are similar to algorithm A under the same data set condition, which is very close to the minimum reduction. But the running time of Algorithm 2 is less than algorithm A. Because algorithm A selects the attribute with strongly distinguishing ability and most distinguishable elements, it enables system’s spending time to increase. But Algorithm 2 selects the indistinguishable element pairs. If an attribute is inco- rporated into reduction set, indistinguishable element pairs will be fewer and fewer, which will speed up the process of attribute reduction.
(2) Compare Algorithm 3 with Algorithm 2, the running time of algorithm2 is less than Algorithm 3 when there are not enough samples. Although Algorithm 3 computes through parallel multi-threading, thread switches increase the system’s time. Consequently, when there are not enough samples, not only can multi-core not display its advantage, but it reflects the shortcomings of multi-core. For example, Algorithm 2 takes 0.003 seconds to reduce small data set Zoo, but Algorithm 3 needs 0.005 seconds. At the same time, Algorithm 2 needs 0.008 seconds but Algorithm 3 takes 0.010 seconds to run monks reduction. When the quantity of experiment data is increased, the advantages of multi-threading based on multi-core processor become more and more obvious. For example, Algorithm 2 needs 27.31 seconds to reduce the big data set Mushroom, but Algorithm 3 takes 14.33 seconds under the same condition. The running speed of Algorithm 3 is increased nearly 1 time further. When we reduce the data set Poker, Algorithm 3 only needs 45.21 seconds but Algorithm 2 takes 140.41 seconds. The time of Algorithm 3 is close to one third of Algorithm 2.
Conclusion
Attribute reduction is one of important hot spots of data rough set. An algorithm of attribute reduction based on discernibility matrix and other related improved algorithms are effective reduction way. But the time complexity of most algorithms is O (|C|2|U|2) based on improved discernibility matrix. The paper proposes the definition of effective pair and puts forward the attribute reduction algorithm based on logical OR, which makes the time complexity reduced to O (|C||U|2). In order to fully display the performance of hardware, we incorporate multi-threading and parallel computing into reduction algorithm in this paper, leading the time complexity to go down to O (|C||U|2/(N - 1)). It can effectively save the running time of system, and the effect should be more obvious for big data set. The next step in this work is how to use parallel algorithm to reduce the dynamic update data set and adapt the modern large data reduction.
Footnotes
Acknowledgments
This work is supported by the Postdoctoral Science Foundation of China 2014M551565, by the natural science foundation of Anhui Province 1308085MF101, by the natural science foundation of Anhui Province, KJ2016A502.
