Abstract
Granular computing (GrC) is a frame computing paradigm that realizes the transformation between two granule spaces with different granularities. A comparative analysis of granular computing clustering is discussed in the paper. Firstly, a granule is defined as the form of vectors by the center and the granularity, especially, an atomic granule is induced by a point which has the granularity 0. Secondly, the join operator realizes the transformation from the granule space with smaller granularity to the granule space with lager granularity, and is used to form the granular computing clustering (GrCC) algorithms. Thirdly, the granular computing clustering algorithms are evaluated from the view of set, such as Global Consistency Error (GCE), Normalized Variation of Information (NVI), and Rand Index (RI). The superiority and feasibility of GrCC are compared with Kmeans and FCM by experiments on the benchmark data sets.
Introduction
Cluster analysis or clustering is the task of partitioning a set of objects into some subsets in a way that objects in the same subset (called a cluster) are more similar (in some sense or another) to each other than to those in other subsets (clusters). The main task of clustering includes exploration of data, and statistical data analysis, used in many fields, including machine learning [1], pattern recognition [2], image analysis [3], information retrieval [4], and bioinformatics [5].
Clustering is a popular method by which a set is partitioned into multiple subsets. Clustering can be considered the most important unsupervised learning problem, which deals with finding a structure in a collection of unlabeled data. K-means clustering and fuzz c-means (FCM) clustering are two important unsupervised algorithms. The K-means clustering algorithm is an iterative technique that is used to partition a data set into K clusters [6]. FCM is a data clustering technique wherein each data point belongs to a cluster to some degree that is specified by a membership grade. This technique was originally introduced by J. Bezdek in 1981 [7] as an improvement on earlier clustering methods. FCM provides a method that shows how to group data points that populate some multidimensional space into a specific number of different clusters. FCM algorithm incorporates spatial information into the membership function for clustering. The spatial function is the summation of the membership function in the neighborhood of each datum under consideration [8].
Granular computing (GrC) is an emerging computing paradigm of information processing. It concerns the processing of complex information entities called information granules, which arise in the process of data abstraction and derivation of knowledge from information. Generally speaking, information granules are collections of entities that usually originate at the numeric level and are arranged together due to their similarity, functional or physical adjacency.
In this paper, we present a framework of granular computing clustering algorithms (GrCC). Firstly, the granules are represented as the normal forms which denote the granule with different shapes for different distance parameters. Secondly, the operation ∨ and operation ∧ are introduced to realize the transformation between two granule spaces with different granularities. Thirdly, the threshold of granularity is used to control the operation between two granules. The GrCC is analyzed by Global Consistency Error (GCE), Normal Variation of Information (NVI), and Rand Index (RI) from the view of set.
The rest of this paper is presented as follows: Section 2 describes granular computing from the view of set. Granular computing clustering is described in Section 3. Section 4 evaluates the granular computing clustering results from the view of set. Section 5 analyses GrCC by the benchmark data sets selected from the references and websites. Our work contributions and further directions are summarized in Section 6.
Granular computing
Generally, granular computing (GrC) is a computing paradigm partitioning the set into some subsets. The GrC realizes the transformation between two granule spaces with different granularities by the control of granularity parameter ρ.
Related work and motivation
Due to the vast and rapid increase in data, data mining has become an increasingly important tool for the purpose of knowledge discovery in order to prevent the presence of rich data but poor knowledge. Data mining tasks can be undertaken in two ways, namely, manual walkthrough of data and use of machine learning approaches [9]. The purpose of machine learning is to find the clustering results corresponding to the manual walkthrough.
As a clustering method, GrC has been proposed and studied in many fields, including machine learning and data analysis [10–15].
Witold Pedrycz proposes a new concept of granular rule-based models whose rules assume a format, such as if G (A i ) then G (f i ), where G (.) s are granular generalizations of the numeric conditions and conclusions of the rules [10]. Granular computing with multiple granular layers is proposed as an emerging computing paradigm of information processing, which simulates the multi-granular intelligent thinking model of human brain [11]. Witold Pedrycz and his colleague elaborated on the fundamental hierarchically organized layers of processing supporting the development and interpretation of granular time series [12].
A framework is proposed for studying a particular class of set-theoretic approaches to granular computing. A granule is a subset of a universal set, a granular structure is a family of subsets of the universal set, and relationship between granules is given by the standard set-inclusion relation [13]. Vassilis G. Kaburlasos and his colleague analyzed the representation method of granule, the inclusion measure between two granules, and the operations between two granules from set theory and lattice computing [14]. They proposed an effective synergy of the Intervals’ Number k-nearest neighbor (INknn) classifier, that is a granular extension of the conventional knn classifier in the metric lattice of Intervals’ Numbers (INs), with the gravitational search algorithm (GSA) for stochastic search and optimization. Their proposed techniques are demonstrated, comparatively, by computer simulation experiments regarding an industrial dispensing application and the benchmark classification datasets [15, 16].
Inspired by people’s cognitive structure, we analyze the granular clustering algorithm based on distance from the view of set, and the granule is represented as the subset for the training set S. The operation and distance between two granules are formed to design GrCC.
In general, for two subsets of training set S, distance between two non-empty sets is the minimum of the distances between any two elements belonging two different subsets, i.e.
Considering the limitation of Euclidean distance between two subsets, we propose the granular computing clustering based on the novel distance which violates the properties of non-negativity, symmetry, and triangular inequality from the view of set. We analyze the proposed GrCC from Global Consistency Error (GCE), Normal Variation of Information (NVI), and Rand Index (RI).
In reality, the subset of data set S is regarded as a granule which has an irregular shape. In order to study granular computing, the granule is represented as regular shape, such as hyperdiamond, hypersphere, hypercube in N-dimensional space. These three shape granules can be represented as the following normal form.
Where C is the center of granule, r is the radius of granule which is generally induced by the distance between two points in N-dimensional space. Different distance form means the different shape of granule. Granularity is the size of granule, such as the radius of granule.
In Fig. 2, the normal form granule G = (0.1, 0.2, 0.5) is hyperdiamond granule shown in Fig. 2(a) in R2 space, whose center is (0.1, 0.2) and granularity is 0.5 induced by the L1-norm distance. G = (0.1, 0.2, 0.5) is hypershpere granule shown in Fig. 2(b) with center (0.1, 0.2) and granularity 0.5 induced by L2-norm distance. G = (0.1, 0.2, 0.5) is hypercube granule shown in Fig. 2(c) with center (0.1, 0.2) and granularity 0.5 induced by L∞-norm. From the Fig. 2, we can see different granules have different shapes even if they have the same forms of representations, such hyperdiamond granules, hypersphere granules, and hypercube granules in N-dimensional space.
Distance measure between granules
The distance between granules refers to the minimal distance between two points which belong to different granules.
For two hyperdiamond granules G1 = (C1, r1) and G1 = (C2, r2), the distance between two granules is defined as follows.
For two hypersphere granule G1 = (C1, r1) and G2 = (C2, r2), the distance between two hypersphere granules is defined as follows.
For hypercube granules G1 = (C1, r1) and G1 = (C2, r2), the distance between two hypercube granules is defined as follows.
Where C1 and C2 are two vectors which represent the centers of hypercube granules G1 and G2, r1 and r2 are granularities of hypercube granules G1 and G2. The distance between C1 and C2 is defined as the following L∞-norm.
According to the distance between two granules mentioned above, the distance between two granules is the arbitrary real number. There is margin between two granules when d > 0, there is an only common point between two granules when d = 0, and there is an overlap between two granules when d < 0. When d > 0, the greater d means the greater margin between two granules, and when d < 0, the greater d means the smaller overlap. Figure 3 shows the distance between two granules, including d < 0, d = 0, and d < 0.
The operations between two granules reflect the transformation between macroscopic and microcosmic of human cognitions. When a person want to observe the object more carefully, the object is partitioned into some suitable sub-objects, namely the universe is transformed into some parts in order to study the object in detail in the view of microscopic. Conversely, there is the same attributes of some objects, we regard the objects as a universe to simple the process in the view of macroscopic. The operations between two granules are designed to realize the transformation between macroscopic and microscopic. Set-based models of granular structures are special cases of lattice-based models, where the lattice join operation ∨ coincides with set union operation ∪ and lattice meet operation ∧ coincides with set intersection operation ∩.
Join operation ∨ and meet operation ∧ are used to realize the transformation between macroscopic and microcosmic. Operation ∨ unites the granules with small granularities to the granules with the large granularities. Inversely, Operation ∧ divides the granules with large granularities into the granules with small granularities. Join operation ∨ is designed as follows.
Any points are regarded as atomic granules which are indivisible, the join process is the key to obtain the larger granules compared with atomic granules. Likewise, the whole space is a granule with the maximal granularity, the meet process produces the smaller granules compared with original granules.
For two hyperdiamond granules G1 = (C1, R1) and G1 = (C2, R2) in the N-dimensional space, the join hyperdiamond granule is
The center C and the granularity R of G are computed as follows.
Firstly, the vector from C1 to C2 and vector from C2 to C1 are computed.
If C1 = C2, then C12 = 0 and C21 = 0.
If C1 ≠ C2, then C12 = (C2 - C1)/d (C1 - C2) and C21 = (C1 - C2)/d (C2 - C1).
Secondly, the crosspoints of G and G1 are P1 = C1 - C12R1 and P2 = C1 + C12R1. The crosspoints of G and G2 are Q1 = C2 - R2C21 and Q2 = C2 + R2C21.
Thirdly, the center C and granularity R of the join hyperdiamond granule G are computed by the centers and granularities of G1 and G2, and Algorithm 1 is designed to realise the computing process, in which the distance is computed by the formula (2). In algorithm 1, if we design the hypersphere granular computing, the distance is computed by the formula (3), if we design the hypercube granular computing, the distance is computed by the formula (4).
Figure 4 shows the join process between the hyperdiamond granule G1 = [0.2 0.15 0.1] and the hyperdiamond granule G2 = [0.1 0.2 0.1]. The crosspoints of hyperdiamond granule G1 and the line crossing vector C12 = [-0.6667, 0.3333] are P1 = [0.2667, 0.1167] and P2 = [0.1333, 0.1833]. The crosspoints of hyperdiamond granule G2 and the line crossing vector C21 = [0.333 - 0.6667] are Q1 = [0.0333, 0.23333] and Q2 = [0.1667, 0.1667]. According to algorithm1, the central vector and granularity of the join hyperdiamond granule G are C = [0.15, 0.175] and R = 0.175, namely G = [0.15 0.175 0.175].
For the data set S = {
Firstly, the samples are used to form the atomic granule. Secondly, the threshold of granularity is introduced to conditionally union the atomic granules by the aforementioned join operation, and the granule set is composed of all the join granules. Thirdly, if all atomic granules are included in the granules of GS, the join process is terminated, otherwise, the second process is continued. The GrCC process is described as follows.
Suppose the atomic granules induced by data set S are g1, g2, g3, g4, g5. The clustering process can be described as the following tree structure shown in Fig. 5, leafs denote the atomic granules, root denotes GS including its child nodes G1, G2, and g3. g1 is selected as the first granule, g2 is the nearest granule to g1, G1 is induced by join operation of child nodes g1 and g2 because the granularity of join granule of g1 and g2 is less than or equal to ρ. g3 is the nearest granule to G1, but the granularity of the join granule between G1 and g3 is greater than ρ, so g3 becomes the member of GS. g4 is the nearest granule to g3, the granularity of join granule g3 ∨ g4 is greater than ρ, so g3 is not be updated, and g4 is the member of GS. g5 is the nearest granule to g4, the granularity of g4 ∨ g5 is less than ρ, so g4 is updated as G2 = g4 ∨ g5. The whole process of obtaining GS is the bottle up process.
The GrCC framework is described as Algorithm 2.
We take the hypersphere granular clustering in 2D space for example, the data set including 6 sample is S = {x1, x2, x3, x4, x5, x6} = {(0, 0) , (0.14, 0.7) , (0.3, 0.38) , (0.46, 0.06) , (0.6, 0.76) , (0.76, 0.44)} which is composed of 6 vectors, and each vector is represented as an atomic hypersphere granule, 6 atomic hypersphere granules are g1 = (0, 0, 0), g2 = (0.14, 0.7, 0), g3 = (0.3, 0.38, 0), g4 = (0.46, 0.06, 0), g5 = (0.6, 0.76, 0), g6 = (0.76, 0.44, 0), which are shown in Fig. 6(a). If the threshold ρ is set to 0.3, the clustering process is described as follows.
For the selected sample x1, the corresponding atomic granule g1 is selected to form GS = {g1}, x1 is removed from S, and S = {x2, x3, x4, x5, x6}, the corresponding atomic granules are g2, g3, g4, g5, and g6. The distances between the selected granule g1 and the rest granules are
For the partitions
Global consistency error
D. Martin proposed several error measures to quantify the consistency between partitions of the same set [17, 18]. Let π and π′ be two partitions of data set S = {x1, x2, …, x
n
} consisting of n objects with N features. For a given datum x
i
, consider the classes that contain x
i
in π and π′, C (π, x
i
) and C (π′, x
i
) are denoted as the subsets contain x
i
in π and π′, respectively. Local Refinement Error (LRE) is then defined at point x
i
as:
Normalized variation of information
Work in [19] computes a measure of information content in each of the partitions. The proposed measure, termed the Variation of Information (VI), is a metric and is related to the conditional entropies between the class label distribution of the partitions. The VI is computed by the following steps.
Firstly, computing the entropies En (π) and En (π′) associated with partitions π and π′.
For π = {S1, S2, …, S S }
En (π) = - (P (1) log 10P (1) + … + P (s) log 10P (s))
where P (i) = |S i |/n, and for
En (π′) = - (P′ (1) log 10P′ (1) + … + P′ (t) log 10P′ (t))
Where , log 100 = 0.
Secondly, computing the mutual information between π and π′
Thirdly, computing the VI
The VI can be normalized by the following formula [20].
For i ≠ j, , P (i, j) =0.
For i = j, , P (i) = P (j) , P (i, j) = P (i).
So VI (π, π′) =0, and NVI (π, π′) =0.
Rand index
Rand Index (RI) was motivated by standard classification problems in which the result of a classification scheme has to be compared to a correct classification [21]. The most common performance measure for this problem calculates the fraction of correctly classified (respectively misclassified) elements to all elements. For RI, comparing two clusters is just a natural extension of this problem which has a corresponding extension of the performance measure, instead of counting single elements, The correct classified pairs are counted. Thus, RI is defined by
Experiments
In order to verify the superiority and feasibility of GrCC, we compared the proposed clustering with clustering algorithms by Kmeans and FCM, and all the experiments are performed in the same environment, such as Intel PIV PC with 2.8 GHz CPU and 2 GB memory, Microsoft Windows XP Professional and Matlab 7.0.
The performance includes global consistency error (GCE), normalized variant information (NVI), and Rand Index (RI) which measure the index value between partition by algorithms and partition by human. On the one hand, we compare the proposed algorithms, such as GrCC by the formula (2) (GrCC1), GrCC by the formula (3) (GrCC2), and GrCC by the formula (4) (GrCC3). On the other hand, we compare GrCC with the traditional clustering algorithms, such as FCM and Kmeans, GrCC is performed to obtain the granule set, and the size of granule set is regarded as the clustering numbers FCM and Kmeans.
For the selection of parameters, the parameter ρ of GrCC is selected from the large ρ to small ρ with the step length to realize the transformation from the granule space with large granularity to the granule space with small granularity. The numbers of cluster of FCM and Kmeans are set as the numbers of achieved granules by GrCC.
Firstly, the data set named spiral including 3 artificial clusters in [22] is used to verify the clustering feasibility of GrCC. The comparisons of GrCC1, GrCC2, and GrCC3 are listed in Table 1, the best results are in bold, and the value in brackets indicates the parameters to achieve the optimal performance of the algorithm. The curve of GCE of GrCC algorithms are shown in Fig. 7 with the decrease of granularity parameter from 4.5 to 1 with the step 0.01, from the figure we can see GrCC1 reaches the minimum GCE 0 firstly compared with GrCC2 and GrCC3. For NVI, GrCC3 is the best clustering algorithm because GrCC3 achieves the minimal NVI 0.1936 firstly compared with the minimal NVI 0.1939 by GrCC2 and the minimal NVI 0.2033 by GrCC1, the NVI by GrCC3 is less than NVI by GrCC2 and GrCC1 when ρ is less than 3.49. For RI, when ρ= 4.31, GrCC2 reaches the optimal value 0.6959 firstly which is greater than the maximal RI 0.6905 by GrCC1 and the maximal RI 0.6947 by GrCC3. For the data set spiral GrCC2 and GrCC3 are better than GrCC1 because GrCC2 achieves the best GCE 0 and the best RI 0.6959, and GrCC3 achieves the best GCE 0 and the best NIV 0.1936. We select GrCC2 to compare with the traditional clustering algorithms, such as FCM and Kmeans.
Secondly, we compare GrCC2 with FCM and Kmeans clustering algorithms in N-dimensional space by the selected data sets which can be found in website http://cs.joensuu.fi/sipu/datasets/. The performance is listed in Table 2, the best results are in bold, and the value in brackets indicates the parameters to achieve the optimal performance of the algorithm. From the table, we can see GrCC2 is better than FCM and Kmeans from GCE, NVI, and RI. For data sets Dim32, Dim64, Dim128, Dim256, GrCC2 achieved the same clustering results as human, and the clustering results of GrCC2 are different from human for data sets iris and sensor because there is a large margin between the clusters for data sets Dim32, Dim64, Dim128, and Dim256 compared with data sets iris and sensor.
The parameter ρ is sensitive to the margin for the data set. We discuss the influence of margin to the performance by the distance formula (3). According to the supervision information of data set, each artificial cluster is mapped into a hypersphere granule, the distance between two artificial clusters by formula (3) reflects the margin between two clusters. For the data set including n artificial clusters, we obtain n × n matrix D which is composed of the distance between arbitrary two artificial clusters and the diagonal elements are defined as infinity.
Where d (G
i
, G
j
) is the distance by formulas (2), (3), or (4), G
i
and G
j
are the corresponding granules which are induced by the ith artficial cluster and the jth artificial cluster. The minimum of D can be regarded as the margin M of data set, namely the margin M of data set is defined as follows.
When the margin is less than 0, the data set has the greater mutual information and greater NVI, namely the minor parameter ρ for GrCC and greater number (K) of cluster for FCM and Kmeans is selected to achieve the optimal GCE, NVI, and RI. For data set spiral, the minimal distance between two artificial clusters is –57.9495, GrCC1 achieves the minimal GCE 0, minimal NVI 0.2033, maximal RI 0.6905, and 45 hyperdiamond granules when ρ = 4.4, the number of clusters by GrCC1 is more than that of artificial clusters.
Conclusions and future directions
Evaluation indices, such GCE, NVI, and RI, are used to compared GrCC with conventional clustering algorithms FCM and Kmeans from the view of set. The evaluation methods are used to verify the clustering algorithms for the supervised clustering problems. The experimental results show that GrCC is better than FCM and Kmeans in the aspects of GCE, NVI, and RI. As a novel clustering algorithm, GrCC has some improvements, such as the GrCC achieved the better performance for the data sets whose margins are greater than 0. In order to obtain the satisfaction GCE, NVI, and RI, the GrCC induced the more cluster numbers compared with the artificial clustering.
For the future directions, as the method of information processing, granular computing reflects the degree that people understand the world. Human-centered information processing was initiated with the introduction of fuzzy sets. The mathematical theory of granular computing is still one of the research focus in granular computing, how to give deep understanding of human cognition from the viewpoint of granular computing is a future direction for granular computing researchers. In this paper, for the given supervised clustering problem, we only analyze the effect of margin on the performance of the clustering algorithms, and do not use margin to guide the selection of parameters, how to learn the granularity parameter ρ from the data set is another one future direction. The hypercube granule can be represented as the Cartesian product of intervals with the same interval lengths, we will study granular computing based on interval analysis in our future work.
Footnotes
Acknowledgments
This work was supported in part by the Natural Science Foundation of China (Grant Nos. 61170202, 61402393, 61501393).
