A comparative analysis of granular computing clustering from the view of set

Abstract

Granular computing (GrC) is a computing paradigm that realizes the transformation between two granule spaces with different granularities. This paper presents a comparative analysis of granular computing clustering. Firstly, a granule is defined as a vector composed of its center and its granularity; in particular, an atomic granule is induced by a single point and has granularity 0. Secondly, the join operator realizes the transformation from a granule space with smaller granularity to a granule space with larger granularity, and is used to form the granular computing clustering (GrCC) algorithms. Thirdly, the GrCC algorithms are evaluated from the view of set by Global Consistency Error (GCE), Normalized Variation of Information (NVI), and Rand Index (RI). The superiority and feasibility of GrCC are assessed by comparison with Kmeans and FCM in experiments on benchmark data sets.

Keywords

Granule space, granular computing, granular computing clustering

1 Introduction

Cluster analysis, or clustering, is the task of partitioning a set of objects into subsets such that objects in the same subset (called a cluster) are more similar (in some sense or another) to each other than to those in other subsets. Clustering is a main task of exploratory data analysis and statistical data analysis, and is used in many fields, including machine learning [1], pattern recognition [2], image analysis [3], information retrieval [4], and bioinformatics [5].

Clustering is a popular method by which a set is partitioned into multiple subsets. It can be considered the most important unsupervised learning problem, which deals with finding a structure in a collection of unlabeled data. K-means clustering and fuzzy c-means (FCM) clustering are two important unsupervised algorithms. The K-means clustering algorithm is an iterative technique that is used to partition a data set into K clusters [6]. FCM is a data clustering technique wherein each data point belongs to a cluster to some degree that is specified by a membership grade. This technique was originally introduced by J. Bezdek in 1981 [7] as an improvement on earlier clustering methods. FCM provides a method that shows how to group data points that populate some multidimensional space into a specific number of different clusters. The FCM algorithm incorporates spatial information into the membership function for clustering. The spatial function is the summation of the membership function in the neighborhood of each datum under consideration [8].

Granular computing (GrC) is an emerging computing paradigm of information processing. It concerns the processing of complex information entities called information granules, which arise in the process of data abstraction and derivation of knowledge from information. Generally speaking, information granules are collections of entities that usually originate at the numeric level and are arranged together due to their similarity, functional or physical adjacency.

In this paper, we present a framework of granular computing clustering (GrCC) algorithms. Firstly, the granules are represented in normal forms, which denote granules with different shapes for different distance parameters. Secondly, the operation ∨ and the operation ∧ are introduced to realize the transformation between two granule spaces with different granularities. Thirdly, the threshold of granularity is used to control the operation between two granules. GrCC is analyzed by Global Consistency Error (GCE), Normalized Variation of Information (NVI), and Rand Index (RI) from the view of set.

The rest of this paper is presented as follows: Section 2 describes granular computing from the view of set. Granular computing clustering is described in Section 3. Section 4 evaluates the granular computing clustering results from the view of set. Section 5 analyses GrCC by the benchmark data sets selected from the references and websites. Our work contributions and further directions are summarized in Section 6.

2 Granular computing

Generally, granular computing (GrC) is a computing paradigm partitioning the set into some subsets. The GrC realizes the transformation between two granule spaces with different granularities by the control of granularity parameter ρ.

2.1 Related work and motivation

Due to the vast and rapid increase in data, data mining has become an increasingly important tool for the purpose of knowledge discovery in order to prevent the presence of rich data but poor knowledge. Data mining tasks can be undertaken in two ways, namely, manual walkthrough of data and use of machine learning approaches [9]. The purpose of machine learning is to find the clustering results corresponding to the manual walkthrough.

As a clustering method, GrC has been proposed and studied in many fields, including machine learning and data analysis [10–15].

Witold Pedrycz proposed a new concept of granular rule-based models whose rules assume a format such as "if G (A_i) then G (f_i)", where the G (·) are granular generalizations of the numeric conditions and conclusions of the rules [10]. Granular computing with multiple granular layers was proposed as an emerging computing paradigm of information processing, which simulates the multi-granular intelligent thinking model of the human brain [11]. Witold Pedrycz and his colleagues elaborated on the fundamental hierarchically organized layers of processing supporting the development and interpretation of granular time series [12].

A framework is proposed for studying a particular class of set-theoretic approaches to granular computing. A granule is a subset of a universal set, a granular structure is a family of subsets of the universal set, and relationship between granules is given by the standard set-inclusion relation [13]. Vassilis G. Kaburlasos and his colleague analyzed the representation method of granule, the inclusion measure between two granules, and the operations between two granules from set theory and lattice computing [14]. They proposed an effective synergy of the Intervals’ Number k-nearest neighbor (INknn) classifier, that is a granular extension of the conventional knn classifier in the metric lattice of Intervals’ Numbers (INs), with the gravitational search algorithm (GSA) for stochastic search and optimization. Their proposed techniques are demonstrated, comparatively, by computer simulation experiments regarding an industrial dispensing application and the benchmark classification datasets [15, 16].

Inspired by people’s cognitive structure, we analyze the granular clustering algorithm based on distance from the view of set, and the granule is represented as the subset for the training set S. The operation and distance between two granules are formed to design GrCC.

In general, the distance between two non-empty subsets A and B of the training set S is the minimum of the distances between any two elements belonging to the two different subsets, i.e. $d (A, B) = \min_{x \in A, y \in B} d (x, y)$ (1) where d (x, y) is the Euclidean distance between two points, which satisfies non-negativity, symmetry, and the triangle inequality. Distance formula (1) is suitable when the intersection of set A and set B is empty. In Fig. 1, the sets A = {x₁, x₂, x₃, x₄, x₅} and B = {y₁, y₂, y₃, y₄, y₅, y₆} are denoted by balls A and B. In both Fig. 1(a) and Fig. 1(b), the distance between A and B is the distance between the points x₅ and y₆, which is greater than 0. If the distance between x₅ and y₆ is the same in Fig. 1(a) and Fig. 1(b), then d (A, B) is the same in both panels according to formula (1), although the two sets in Fig. 1(b) are obviously closer to each other than those in Fig. 1(a). Formula (1) therefore does not reflect the real distance between two sets, and we define a distance between two sets that are represented in the form of a hyperdiamond, hypersphere, or hypercube. Granular computing clustering is formed based on the defined distance measure.
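As a rough illustration of this limitation, the following Python sketch evaluates formula (1) on two hypothetical point sets. The coordinates and the helper name `set_distance` are assumptions made for illustration and are not taken from Fig. 1. Moving the points of B that do not belong to the closest pair leaves the result unchanged.

```python
import math

def set_distance(A, B):
    """Formula (1): minimum Euclidean distance over all pairs (x in A, y in B)."""
    return min(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
        for x in A
        for y in B
    )

# Hypothetical 2-D sets; only the closest pair (1, 0)-(2, 0) matters.
A = [(0.0, 0.0), (1.0, 0.0)]
B_compact = [(2.0, 0.0), (3.0, 0.0)]
B_spread = [(2.0, 0.0), (9.0, 9.0)]

d_compact = set_distance(A, B_compact)
d_spread = set_distance(A, B_spread)
print(d_compact, d_spread)  # both values are 1.0
```

The two calls return the same value even though `B_spread` covers a much larger region, which is the behaviour the granule-based distance is meant to avoid.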

Considering the limitation of the Euclidean distance between two subsets, we propose granular computing clustering based on a novel distance from the view of set. Unlike a metric, this distance does not satisfy non-negativity or the triangle inequality, although it remains symmetric. We analyze the proposed GrCC by Global Consistency Error (GCE), Normalized Variation of Information (NVI), and Rand Index (RI).

2.2 Representation of granule and granularity

In reality, a subset of the data set S is regarded as a granule with an irregular shape. In order to study granular computing, the granule is represented by a regular shape, such as a hyperdiamond, hypersphere, or hypercube in N-dimensional space. Granules of these three shapes can be represented in the following normal form. $G = (C, r)$

where C is the center of the granule and r is the radius of the granule, which is generally induced by the distance between two points in N-dimensional space. Different forms of distance give different shapes of granule. Granularity is the size of the granule, such as its radius.

In Fig. 2, the normal form granule G = (0.1, 0.2, 0.5) is the hyperdiamond granule shown in Fig. 2(a) in R² space, whose center is (0.1, 0.2) and whose granularity 0.5 is induced by the L₁-norm distance. G = (0.1, 0.2, 0.5) is the hypersphere granule shown in Fig. 2(b), with center (0.1, 0.2) and granularity 0.5 induced by the L₂-norm distance. G = (0.1, 0.2, 0.5) is the hypercube granule shown in Fig. 2(c), with center (0.1, 0.2) and granularity 0.5 induced by the L_∞-norm distance. From Fig. 2, we can see that granules have different shapes even if they have the same form of representation, such as hyperdiamond granules, hypersphere granules, and hypercube granules in N-dimensional space.
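The effect of the norm on the shape can be checked numerically. The sketch below is a minimal illustration that assumes a granule is stored as a tuple (center coordinates followed by the granularity); the function names `norm` and `contains` and the two test points are hypothetical.

```python
def norm(v, p):
    """L_p norm of a vector; p is 1, 2 or float('inf')."""
    if p == float("inf"):
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def contains(granule, point, p):
    """True if `point` lies inside the granule under the L_p norm."""
    *center, radius = granule
    diff = [a - b for a, b in zip(point, center)]
    return norm(diff, p) <= radius

G = (0.1, 0.2, 0.5)   # same normal form for all three shapes
q = (0.45, 0.45)      # offset (0.35, 0.25) from the center
corner = (0.5, 0.6)   # offset (0.4, 0.4) from the center

for name, p in [("hyperdiamond", 1), ("hypersphere", 2), ("hypercube", float("inf"))]:
    print(name, contains(G, q, p), contains(G, corner, p))
```

The point `q` lies outside the hyperdiamond but inside the hypersphere and the hypercube, and `corner` lies only inside the hypercube, so the same triple (0.1, 0.2, 0.5) describes three nested regions.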

2.3 Distance measure between granules

The distance between granules refers to the minimal distance between two points which belong to different granules.

For two hyperdiamond granules G₁ = (C₁, r₁) and G₂ = (C₂, r₂), the distance between the two granules is defined as follows. $d (G_{1}, G_{2}) = \| C_{1} - C_{2} \|_{1} - r_{1} - r_{2}$ (2) where C₁ = (x₁, x₂, …, x_N) and C₂ = (y₁, y₂, …, y_N) are the vectors that represent the centers of the hyperdiamond granules G₁ and G₂, and r₁ and r₂ are the granularities of G₁ and G₂. The distance between C₁ and C₂ is defined as the following L₁-norm. $\| C_{1} - C_{2} \|_{1} = | x_{1} - y_{1} | + | x_{2} - y_{2} | + \dots + | x_{N} - y_{N} |$

For two hypersphere granules G₁ = (C₁, r₁) and G₂ = (C₂, r₂), the distance between the two granules is defined as follows. $d (G_{1}, G_{2}) = \| C_{1} - C_{2} \|_{2} - r_{1} - r_{2}$ (3) where C₁ = (x₁, x₂, …, x_N) and C₂ = (y₁, y₂, …, y_N) are the vectors that represent the centers of the hypersphere granules G₁ and G₂, and r₁ and r₂ are the granularities of G₁ and G₂. The distance between C₁ and C₂ is defined as the following L₂-norm. $\| C_{1} - C_{2} \|_{2} = \sqrt{(x_{1} - y_{1})^{2} + (x_{2} - y_{2})^{2} + \dots + (x_{N} - y_{N})^{2}}$

For two hypercube granules G₁ = (C₁, r₁) and G₂ = (C₂, r₂), the distance between the two granules is defined as follows. $d (G_{1}, G_{2}) = \| C_{1} - C_{2} \|_{\infty} - r_{1} - r_{2}$ (4) where C₁ = (x₁, x₂, …, x_N) and C₂ = (y₁, y₂, …, y_N) are the vectors that represent the centers of the hypercube granules G₁ and G₂, and r₁ and r₂ are the granularities of G₁ and G₂. The distance between C₁ and C₂ is defined as the following L_∞-norm. $\| C_{1} - C_{2} \|_{\infty} = \max \{ | x_{1} - y_{1} |, | x_{2} - y_{2} |, \dots, | x_{N} - y_{N} | \}$

According to the definitions above, the distance between two granules can be any real number. There is a margin between the two granules when d > 0, the two granules touch at their boundaries when d = 0, and the two granules overlap when d < 0. When d > 0, a greater d means a greater margin between the two granules, and when d < 0, a greater d means a smaller overlap. Figure 3 shows the distance between two granules for the cases d > 0, d = 0, and d < 0.
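Formulas (2) to (4) differ only in the norm applied to the difference of the centers, so they can be sketched with one function. The following Python fragment is an illustrative sketch; the tuple layout (center coordinates followed by the granularity), the name `granule_distance`, and the example granules are assumptions.

```python
def granule_distance(g1, g2, p):
    """Distance between two granules: ||C1 - C2||_p - r1 - r2.

    p = 1 gives formula (2), p = 2 formula (3), p = float('inf') formula (4).
    """
    *c1, r1 = g1
    *c2, r2 = g2
    diff = [a - b for a, b in zip(c1, c2)]
    if p == float("inf"):
        center_dist = max(abs(x) for x in diff)
    else:
        center_dist = sum(abs(x) ** p for x in diff) ** (1.0 / p)
    return center_dist - r1 - r2

base = (0.0, 0.0, 1.0)
separated = (3.0, 0.0, 1.0)    # margin between the granules, d > 0
touching = (2.0, 0.0, 1.0)     # boundaries meet, d = 0
overlapping = (1.0, 0.0, 1.0)  # granules overlap, d < 0

for other in (separated, touching, overlapping):
    print([granule_distance(base, other, p) for p in (1, 2, float("inf"))])
```

Because the example centers differ in a single coordinate, the three norms agree here; for centers that differ in several coordinates the three distances generally differ.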

2.4 Operations between two granules

The operations between two granules reflect the transformation between the macroscopic and microscopic views of human cognition. When a person wants to observe an object more carefully, the object is partitioned into suitable sub-objects, namely the universe is divided into parts in order to study the object in detail from the microscopic view. Conversely, when some objects share the same attributes, we regard the objects as a whole to simplify the process from the macroscopic view. The operations between two granules are designed to realize the transformation between the macroscopic and the microscopic. Set-based models of granular structures are special cases of lattice-based models, where the lattice join operation ∨ coincides with the set union operation ∪ and the lattice meet operation ∧ coincides with the set intersection operation ∩.

Join operation ∨ and meet operation ∧ are used to realize the transformation between the macroscopic and the microscopic. Operation ∨ unites granules with small granularities into granules with large granularities. Inversely, operation ∧ divides granules with large granularities into granules with small granularities. Join operation ∨ is designed as follows.

Every point is regarded as an atomic granule, which is indivisible, and the join process is the key to obtaining larger granules from atomic granules. Likewise, the whole space is a granule with the maximal granularity, and the meet process produces smaller granules from the original granules.

For two hyperdiamond granules G₁ = (C₁, R₁) and G₂ = (C₂, R₂) in the N-dimensional space, the join hyperdiamond granule is $G = G_{1} \lor G_{2} = (C, R)$

The center C and the granularity R of G are computed as follows.

Firstly, the normalized vector from C₁ to C₂ and the normalized vector from C₂ to C₁ are computed.

If C₁ = C₂, then C₁₂ = 0 and C₂₁ = 0.

If C₁ ≠ C₂, then C₁₂ = (C₂ - C₁)/d (C₁, C₂) and C₂₁ = (C₁ - C₂)/d (C₂, C₁), where d (C₁, C₂) is the distance between the two centers.

Secondly, the crosspoints of G₁ and the line through C₁ along C₁₂ are P₁ = C₁ - C₁₂R₁ and P₂ = C₁ + C₁₂R₁. The crosspoints of G₂ and the line through C₂ along C₂₁ are Q₁ = C₂ - R₂C₂₁ and Q₂ = C₂ + R₂C₂₁.

Thirdly, the center C and granularity R of the join hyperdiamond granule G are computed from the centers and granularities of G₁ and G₂. Algorithm 1 realizes this computation, in which the distance is computed by the L₁-norm used in formula (2). For hypersphere granular computing, the distance in Algorithm 1 is computed by the L₂-norm used in formula (3), and for hypercube granular computing, by the L_∞-norm used in formula (4).

Algorithm 1. Computing C and R of the join granule G of G₁ and G₂

Input: G₁ = (C₁, R₁) and G₂ = (C₂, R₂)

Output: G = (C, R)

    if R₁ ≥ R₂
        if d (C₁, C₂) ≤ R₁ - R₂, C = C₁, R = R₁
        else C = (P₁ + Q₁)/2, R = d (P₁, Q₁)/2
        end
    else
        if d (C₁, C₂) ≤ R₂ - R₁, C = C₂, R = R₂
        else C = (P₁ + Q₁)/2, R = d (P₁, Q₁)/2
        end
    end

Figure 4 shows the join process between the hyperdiamond granule G₁ = [0.2, 0.15, 0.1] and the hyperdiamond granule G₂ = [0.1, 0.2, 0.1]. The crosspoints of the hyperdiamond granule G₁ and the line along the vector C₁₂ = [-0.6667, 0.3333] are P₁ = [0.2667, 0.1167] and P₂ = [0.1333, 0.1833]. The crosspoints of the hyperdiamond granule G₂ and the line along the vector C₂₁ = [0.6667, -0.3333] are Q₁ = [0.0333, 0.2333] and Q₂ = [0.1667, 0.1667]. According to Algorithm 1, the center and granularity of the join hyperdiamond granule G are C = [0.15, 0.175] and R = 0.175, namely G = [0.15, 0.175, 0.175].
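The join steps can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: the tuple layout (center coordinates followed by the granularity) and the names `center_distance` and `join` are assumptions. The two branches of Algorithm 1 are merged into a single containment test on |R₁ - R₂|, which also covers the case C₁ = C₂.

```python
def center_distance(c1, c2, p):
    """L_p distance between two center vectors; p is 1, 2 or float('inf')."""
    diff = [abs(a - b) for a, b in zip(c1, c2)]
    if p == float("inf"):
        return max(diff)
    return sum(x ** p for x in diff) ** (1.0 / p)

def join(g1, g2, p):
    """Join granule G1 v G2 following Algorithm 1 under the L_p norm."""
    *c1, r1 = g1
    *c2, r2 = g2
    d = center_distance(c1, c2, p)
    # One granule already contains the other (includes the case C1 == C2).
    if d <= abs(r1 - r2):
        return tuple(g1) if r1 >= r2 else tuple(g2)
    c12 = [(b - a) / d for a, b in zip(c1, c2)]        # C21 is -C12
    p1 = [a - u * r1 for a, u in zip(c1, c12)]         # P1 = C1 - C12 R1
    q1 = [b + u * r2 for b, u in zip(c2, c12)]         # Q1 = C2 - C21 R2
    center = [(a + b) / 2 for a, b in zip(p1, q1)]     # C = (P1 + Q1) / 2
    radius = center_distance(p1, q1, p) / 2            # R = d(P1, Q1) / 2
    return (*center, radius)

# Hyperdiamond example of Fig. 4 (L1 norm).
G = join((0.2, 0.15, 0.1), (0.1, 0.2, 0.1), 1)
print([round(v, 4) for v in G])  # compare with G = [0.15, 0.175, 0.175]
```

Passing `p=2` or `p=float("inf")` gives the hypersphere and hypercube variants described in the text.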

3 Granular computing clustering

For the data set S = {x_i | i = 1, 2, …, n} in N-dimensional space, we form the following granular computing clustering (GrCC) algorithms based on the aforementioned granular computing.

Firstly, the samples are used to form the atomic granules. Secondly, the threshold of granularity is introduced to conditionally unite the atomic granules by the aforementioned join operation, and the granule set GS is composed of all the join granules. Thirdly, if all atomic granules are included in the granules of GS, the join process is terminated; otherwise, the second step is continued. The GrCC process is described as follows.

Suppose the atomic granules induced by data set S are g₁, g₂, g₃, g₄, g₅. The clustering process can be described by the tree structure shown in Fig. 5, where the leaves denote the atomic granules and the root denotes GS, whose child nodes are G₁, G₂, and g₃. g₁ is selected as the first granule and g₂ is the nearest granule to g₁; G₁ is induced by the join operation of the child nodes g₁ and g₂ because the granularity of the join granule of g₁ and g₂ is less than or equal to ρ. g₃ is the nearest granule to G₁, but the granularity of the join granule of G₁ and g₃ is greater than ρ, so g₃ becomes a member of GS. g₄ is the nearest granule to g₃, and the granularity of the join granule g₃ ∨ g₄ is greater than ρ, so g₃ is not updated and g₄ becomes a member of GS. g₅ is the nearest granule to g₄, and the granularity of g₄ ∨ g₅ is less than ρ, so g₄ is updated as G₂ = g₄ ∨ g₅. The whole process of obtaining GS is a bottom-up process.

The GrCC framework is described as Algorithm 2.

Algorithm 2. GrCC

Input: Data set S, threshold ρ of granularity

Output: Granule set GS

    S1. For the first sample x₁ in S, form the corresponding atomic granule G₁, set GS = {G₁}, and remove x₁ from S.
    S2. If S is empty, the procedure is terminated.
    S3. Name the last granule of GS as G.
    S4. For j = 1 : |S|
    S5.     compute the distance d_j between G and G_j, where G_j is the atomic granule induced by the jth sample of S.
    S6. Find the minimal distance, id = arg min_j d_j.
    S7. If the granularity of G ∨ G_id is less than or equal to ρ, replace G in GS by G ∨ G_id; otherwise GS = GS ∪ {G_id}.
    S8. Remove x_id from S, go to S2.

We take hypersphere granular clustering in 2D space as an example. The data set S = {x₁, x₂, x₃, x₄, x₅, x₆} = {(0, 0), (0.14, 0.7), (0.3, 0.38), (0.46, 0.06), (0.6, 0.76), (0.76, 0.44)} is composed of 6 vectors, and each vector is represented as an atomic hypersphere granule. The 6 atomic hypersphere granules are g₁ = (0, 0, 0), g₂ = (0.14, 0.7, 0), g₃ = (0.3, 0.38, 0), g₄ = (0.46, 0.06, 0), g₅ = (0.6, 0.76, 0), g₆ = (0.76, 0.44, 0), which are shown in Fig. 6(a). If the threshold ρ is set to 0.3, the clustering process is described as follows.

For the selected sample x₁, the corresponding atomic granule g₁ is selected to form GS = {g₁}, and x₁ is removed from S, so S = {x₂, x₃, x₄, x₅, x₆} and the corresponding atomic granules are g₂, g₃, g₄, g₅, and g₆. The distances between the selected granule g₁ and the remaining granules are $d (g_{1}, g_{2}) = 0.7139, \; d (g_{1}, g_{3}) = 0.4841, \; d (g_{1}, g_{4}) = 0.4639, \; d (g_{1}, g_{5}) = 0.9683, \; d (g_{1}, g_{6}) = 0.8782$ The granule nearest to g₁ is g₄, and the join granule of g₁ and g₄ is $g_{1} \lor g_{4} = (0.23, 0.03, 0.2319)$ Its granularity 0.2319 is less than ρ, so the granule g₁ is replaced by g₁ ∨ g₄ and x₄ is removed from S, S = {x₂, x₃, x₅, x₆}.

The distances between the updated g₁ and the remaining granules are $d (g_{1}, g_{2}) = 0.4441, \; d (g_{1}, g_{3}) = 0.1250, \; d (g_{1}, g_{5}) = 0.5865, \; d (g_{1}, g_{6}) = 0.4381$ The nearest granule g₃ is selected to join with g₁, and the join granule is $g_{1} \lor g_{3} = (0.2423, 0.0913, 0.2944)$ Its granularity is less than ρ, so g₁ is replaced by g₁ ∨ g₃ and x₃ is removed from S, S = {x₂, x₅, x₆}, with the corresponding atomic granules g₂, g₅, g₆.

The distances between g₁ and the remaining granules are $d (g_{1}, g_{2}) = 0.3228, \; d (g_{1}, g_{5}) = 0.4640, \; d (g_{1}, g_{6}) = 0.3298$ The nearest granule to g₁ is g₂, and the join granule of g₁ and g₂ is $g_{1} \lor g_{2} = (0.2155, 0.2505, 0.4558)$ Its granularity is greater than ρ, so g₂ is moved into GS, GS = {g₁, g₂}, and x₂ is removed from S, S = {x₅, x₆}, with the corresponding granules g₅ and g₆.

The distances between g₂ and the remaining granules are $d (g_{2}, g_{5}) = 0.4639, \; d (g_{2}, g_{6}) = 0.6723$ The nearest granule to g₂ is g₅, and the join granule of g₂ and g₅ is $g_{2} \lor g_{5} = (0.37, 0.73, 0.2319)$ Its granularity is less than ρ, so g₂ is replaced by g₂ ∨ g₅ and x₅ is removed from S, S = {x₆}.

The join granule of g₂ and g₆ is $g_{2} \lor g_{6} = (0.4719, 0.6542, 0.3590)$ Its granularity is greater than ρ, so g₆ is added to GS, namely GS = {g₁, g₂, g₆}, and x₆ is removed from S. The join process is terminated because S is empty, and the center of each granule is a cluster center. The clustering process is shown in Fig. 6.
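The worked example can be reproduced with a short Python sketch of Algorithm 2 for hypersphere granules. The code is a minimal illustration under stated assumptions: granules are stored as tuples (center coordinates followed by the granularity), and the function names `center_dist`, `granule_dist`, `join`, and `grcc` are hypothetical.

```python
import math

def center_dist(g1, g2):
    """Euclidean distance between the centers of two granules."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(g1[:-1], g2[:-1])))

def granule_dist(g1, g2):
    """Formula (3): center distance minus both granularities."""
    return center_dist(g1, g2) - g1[-1] - g2[-1]

def join(g1, g2):
    """Join of two hypersphere granules following Algorithm 1."""
    d, r1, r2 = center_dist(g1, g2), g1[-1], g2[-1]
    if d <= abs(r1 - r2):                  # one granule contains the other
        return g1 if r1 >= r2 else g2
    c12 = [(b - a) / d for a, b in zip(g1[:-1], g2[:-1])]
    p1 = [a - u * r1 for a, u in zip(g1[:-1], c12)]   # P1 = C1 - C12 R1
    q1 = [b + u * r2 for b, u in zip(g2[:-1], c12)]   # Q1 = C2 - C21 R2
    center = tuple((a + b) / 2 for a, b in zip(p1, q1))
    # P1 and Q1 lie on the line through the centers, so d(P1, Q1) = d + R1 + R2.
    return center + ((d + r1 + r2) / 2,)

def grcc(samples, rho):
    """Algorithm 2: grow the last granule of GS while its granularity stays <= rho."""
    remaining = [tuple(x) + (0.0,) for x in samples]  # atomic granules
    gs = [remaining.pop(0)]                           # S1
    while remaining:                                  # S2
        g = gs[-1]                                    # S3
        idx = min(range(len(remaining)),
                  key=lambda j: granule_dist(g, remaining[j]))  # S4-S6
        candidate = join(g, remaining[idx])
        if candidate[-1] <= rho:                      # S7
            gs[-1] = candidate
        else:
            gs.append(remaining[idx])
        remaining.pop(idx)                            # S8
    return gs

S = [(0, 0), (0.14, 0.7), (0.3, 0.38), (0.46, 0.06), (0.6, 0.76), (0.76, 0.44)]
GS = grcc(S, 0.3)
for g in GS:
    print(tuple(round(v, 4) for v in g))
```

With ρ = 0.3 the sketch returns three granules, which can be compared with GS = {g₁, g₂, g₆} in the example above. The result depends on the order of the samples, because the first sample seeds the first granule.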

4 Evaluation of clustering from the view of set

Let $π = \{S_{1}, S_{2}, \dots, S_{s}\}$ and $π^{'} = \{S_{1}^{'}, S_{2}^{'}, \dots, S_{t}^{'}\}$ be the partitions of a data set S including n objects that are produced by two different clustering algorithms, so that $n = | S_{1} | + | S_{2} | + \dots + | S_{s} | = | S_{1}^{'} | + | S_{2}^{'} | + \dots + | S_{t}^{'} |$ where S_i ⊆ S and S′_j ⊆ S. We evaluate the partitions by the following methods from the view of set.

4.1 Global consistency error

D. Martin proposed several error measures to quantify the consistency between partitions of the same set [17, 18]. Let π and π′ be two partitions of the data set S = {x₁, x₂, …, x_n} consisting of n objects with N features. For a given datum x_i, let C (π, x_i) and C (π′, x_i) denote the subsets that contain x_i in π and π′, respectively. The Local Refinement Error (LRE) is then defined at point x_i as: $LRE (π, π^{'}, x_{i}) = | C (π, x_{i}) - C (π^{'}, x_{i}) | / | C (π, x_{i}) |$ where C (π, x_i) - C (π′, x_i) denotes the set difference between C (π, x_i) and C (π′, x_i). This error measure is not symmetric and encodes a measure of refinement in one direction only. There are two natural ways to combine the LRE at each point into a measure for the entire data set. Global Consistency Error (GCE) forces all local refinements to be in the same direction and is defined as: $GCE (π, π^{'}) = \frac{1}{n} \min \left\{ \sum_{i = 1}^{n} LRE (π, π^{'}, x_{i}), \sum_{i = 1}^{n} LRE (π^{'}, π, x_{i}) \right\}$

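Property 1. If for every S_i ∈ π there is a unique $S_{j}^{'} \in π^{'}$ satisfying $S_{i} \subseteq S_{j}^{'}$, then GCE (π, π′) = 0.

Proof. For every x ∈ S, there is a unique subset S_i ∈ π satisfying x ∈ S_i, and by assumption there is a unique $S_{j}^{'} \in π^{'}$ with $S_{i} \subseteq S_{j}^{'}$. Hence C (π, x) = S_i ⊆ S′_j = C (π′, x), namely C (π, x) - C (π′, x) = ∅ and LRE (π, π′, x) = 0. The first sum in the definition of GCE is therefore 0, and since every LRE is non-negative, GCE (π, π′) = 0.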
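The definition of GCE and Property 1 can be checked with a small Python sketch. The representation of a partition as a list of cluster labels (one label per object), the function name `gce`, and the example partitions are assumptions made for illustration.

```python
def gce(labels_a, labels_b):
    """Global Consistency Error between two partitions given as label lists."""
    n = len(labels_a)

    def cluster_of_each(labels):
        # Map every object index to the set of indices sharing its label.
        groups = {}
        for i, lab in enumerate(labels):
            groups.setdefault(lab, set()).add(i)
        return [groups[lab] for lab in labels]

    ca, cb = cluster_of_each(labels_a), cluster_of_each(labels_b)

    def total_lre(c1, c2):
        # Sum over all objects of |C(pi, x) - C(pi', x)| / |C(pi, x)|.
        return sum(len(c1[i] - c2[i]) / len(c1[i]) for i in range(n))

    return min(total_lre(ca, cb), total_lre(cb, ca)) / n

fine = [0, 0, 1, 1, 2, 2]     # every cluster of `fine` lies inside one of `coarse`
coarse = [0, 0, 0, 0, 1, 1]
print(gce(fine, coarse))       # 0.0, as stated by Property 1

a = [0, 0, 1, 1]
b = [0, 1, 1, 1]
print(gce(a, b))               # 0.25
```

For `a` and `b` the two directed sums are 1 and 4/3, so the minimum is 1 and GCE = 1/4.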

4.2 Normalized variation of information

Work in [19] computes a measure of information content in each of the partitions. The proposed measure, termed the Variation of Information (VI), is a metric and is related to the conditional entropies between the class label distribution of the partitions. The VI is computed by the following steps.

Firstly, compute the entropies En (π) and En (π′) associated with the partitions π and π′.

For π = {S₁, S₂, …, S_s}

En (π) = - (P (1) log₁₀ P (1) + … + P (s) log₁₀ P (s))

where P (i) = |S_i|/n, and for $π^{'} = \{S_{1}^{'}, S_{2}^{'}, \dots, S_{t}^{'}\}$

En (π′) = - (P′ (1) log₁₀ P′ (1) + … + P′ (t) log₁₀ P′ (t))

where $P^{'} (i) = | S_{i}^{'} | / n$, with the convention 0 log₁₀ 0 = 0.

Secondly, compute the mutual information between π and π′ $I (π, π^{'}) = \sum_{i = 1}^{s} \sum_{j = 1}^{t} P (i, j) \log_{10} \frac{P (i, j)}{P (i) P^{'} (j)}$ where $P (i, j) = | S_{i} \cap S_{j}^{'} | / n$.

Thirdly, computing the VI $VI (π, π^{'}) = En (π) + En (π^{'}) - 2 I (π, π^{'})$

The VI can be normalized by the following formula [20]. $NVI (π, π^{'}) = VI / {log}_{10} (s)$

Property 2. If π = π′, then NVI (π, π′) =0.

Proof. If π = π′, then s = t, $S_{i} = S_{i}^{'}$, and En (π) = En (π′).

For i ≠ j, $S_{i} \cap S_{j}^{'} = S_{i} \cap S_{j} = \emptyset$, so P (i, j) = 0.

For i = j, $S_{i} \cap S_{j}^{'} = S_{i} \cap S_{j} = S_{i} = S_{j}$, so P (i) = P′ (j) and P (i, j) = P (i). Therefore $I (π, π^{'}) = \sum_{i = 1}^{s} \sum_{j = 1}^{t} P (i, j) \log_{10} \frac{P (i, j)}{P (i) P^{'} (j)} = \sum_{j = 1}^{t} P (j) \log_{10} \frac{1}{P (j)} = - \sum_{j = 1}^{t} P (j) \log_{10} P (j) = En (π^{'})$

So VI (π, π′) = En (π) + En (π′) - 2 En (π′) = 0, and NVI (π, π′) = 0.
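The three steps and Property 2 can be sketched in Python as follows. The label-list representation of a partition and the function name `nvi` are assumptions; the normalization by log₁₀(s) follows the formula given above, so the sketch requires at least two clusters in the first partition.

```python
import math
from collections import Counter

def nvi(labels_a, labels_b):
    """Normalized Variation of Information, VI / log10(s), for two label lists."""
    n = len(labels_a)
    count_a = Counter(labels_a)                  # |S_i|
    count_b = Counter(labels_b)                  # |S'_j|
    count_ab = Counter(zip(labels_a, labels_b))  # |S_i intersect S'_j|, non-empty only

    def entropy(counts):
        return -sum(c / n * math.log10(c / n) for c in counts.values())

    # Empty intersections contribute nothing (convention 0 log 0 = 0).
    mutual = sum(
        c / n * math.log10((c / n) / ((count_a[i] / n) * (count_b[j] / n)))
        for (i, j), c in count_ab.items()
    )
    vi = entropy(count_a) + entropy(count_b) - 2 * mutual
    s = len(count_a)
    if s < 2:
        raise ValueError("log10(s) is zero when the first partition has one cluster")
    return vi / math.log10(s)

same = [0, 0, 1, 1, 2]
print(nvi(same, same))                   # approximately 0 (Property 2)
print(nvi([0, 0, 1, 1], [0, 1, 0, 1]))   # independent partitions
```

For the two independent partitions the mutual information is 0 and both entropies equal log₁₀ 2, so VI = 2 log₁₀ 2 and the normalized value is 2. This shows that dividing by log₁₀(s) does not bound the measure by 1.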

4.3 Rand index

Rand Index (RI) was motivated by standard classification problems in which the result of a classification scheme has to be compared to a correct classification [21]. The most common performance measure for this problem calculates the fraction of correctly classified (respectively misclassified) elements to all elements. Comparing two clusterings is a natural extension of this problem with a corresponding extension of the performance measure: instead of counting single elements, the correctly classified pairs are counted. Thus, RI is defined by $RI (π, π^{'}) = 2 (n_{11} + n_{00}) / (n (n - 1))$ where n₁₁ denotes the number of pairs that are in the same cluster under both π and π′, and n₀₀ denotes the number of pairs that are in different clusters under both π and π′. RI depends on both the number of clusters and the number of elements, and ranges from 0 to 1. Obviously, we have the following property for RI.

Property 3. If π = π′, then RI (π, π′) =1.
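A direct pair-counting sketch of RI in Python is given below. The label-list representation of a partition, the function name `rand_index`, and the example partitions are assumptions made for illustration.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """RI = 2 (n11 + n00) / (n (n - 1)) for two partitions given as label lists."""
    n = len(labels_a)
    agreeing_pairs = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        # Counts n11 (together in both) and n00 (apart in both).
        if same_a == same_b:
            agreeing_pairs += 1
    return 2 * agreeing_pairs / (n * (n - 1))

a = [0, 0, 1, 1]
b = [0, 1, 1, 1]
print(rand_index(a, a))  # 1.0, as stated by Property 3
print(rand_index(a, b))  # 0.5
```

In the second call three of the six pairs agree (one pair is together in both partitions and two pairs are apart in both), which gives RI = 0.5. The loop visits all n(n - 1)/2 pairs, so it is intended for small data sets.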

5 Experiments

In order to verify the superiority and feasibility of GrCC, we compared the proposed clustering algorithms with Kmeans and FCM. All the experiments were performed in the same environment, namely an Intel PIV PC with 2.8 GHz CPU and 2 GB memory, Microsoft Windows XP Professional, and Matlab 7.0.

The performance measures include Global Consistency Error (GCE), Normalized Variation of Information (NVI), and Rand Index (RI), which compare the partition produced by an algorithm with the partition given by humans. On the one hand, we compare the proposed algorithms, namely GrCC by formula (2) (GrCC1), GrCC by formula (3) (GrCC2), and GrCC by formula (4) (GrCC3). On the other hand, we compare GrCC with the traditional clustering algorithms FCM and Kmeans: GrCC is performed to obtain the granule set, and the size of the granule set is used as the number of clusters for FCM and Kmeans.

For the selection of parameters, the parameter ρ of GrCC is decreased from a large value to a small value with a fixed step length, which realizes the transformation from the granule space with large granularity to the granule space with small granularity. The numbers of clusters for FCM and Kmeans are set to the numbers of granules obtained by GrCC.

Firstly, the data set named spiral, which includes 3 artificial clusters [22], is used to verify the clustering feasibility of GrCC. The comparisons of GrCC1, GrCC2, and GrCC3 are listed in Table 1, where the best results are in bold and the value in brackets indicates the parameter that achieves the optimal performance of the algorithm. The curves of GCE of the GrCC algorithms are shown in Fig. 7 as the granularity parameter decreases from 4.5 to 1 with the step 0.01. From the figure we can see that GrCC1 reaches the minimum GCE 0 earlier than GrCC2 and GrCC3. For NVI, GrCC3 is the best clustering algorithm because GrCC3 achieves the minimal NVI 0.1936, compared with the minimal NVI 0.1939 by GrCC2 and the minimal NVI 0.2033 by GrCC1, and the NVI by GrCC3 is less than the NVI by GrCC2 and GrCC1 when ρ is less than 3.49. For RI, when ρ = 4.31, GrCC2 reaches the optimal value 0.6959, which is greater than the maximal RI 0.6905 by GrCC1 and the maximal RI 0.6947 by GrCC3. For the data set spiral, GrCC2 and GrCC3 are better than GrCC1 because GrCC2 achieves the best GCE 0 and the best RI 0.6959, and GrCC3 achieves the best GCE 0 and the best NVI 0.1936. We select GrCC2 to compare with the traditional clustering algorithms FCM and Kmeans.

Secondly, we compare GrCC2 with the FCM and Kmeans clustering algorithms in N-dimensional space on selected data sets, which can be found at http://cs.joensuu.fi/sipu/datasets/. The performance is listed in Table 2, where the best results are in bold and the value in brackets indicates the parameter that achieves the optimal performance of the algorithm. From the table, we can see that GrCC2 is better than FCM and Kmeans in terms of GCE, NVI, and RI. For the data sets Dim32, Dim64, Dim128, and Dim256, GrCC2 achieved the same clustering results as the human partition, whereas the clustering results of GrCC2 differ from the human partition for the data sets iris and sensor. The reason is that there is a large margin between the clusters of Dim32, Dim64, Dim128, and Dim256 compared with the data sets iris and sensor.

The parameter ρ is sensitive to the margin of the data set. We discuss the influence of the margin on the performance by the distance formula (3). According to the supervision information of the data set, each artificial cluster is mapped into a hypersphere granule, and the distance between two artificial clusters by formula (3) reflects the margin between the two clusters. For a data set including n artificial clusters, we obtain an n × n matrix D whose entries are the distances between any two artificial clusters, with the diagonal elements defined as infinity. $D_{ij} = \begin{cases} d (G_{i}, G_{j}) & i \neq j \\ \infty & i = j \end{cases}$

where d (G_i, G_j) is the distance by formula (2), (3), or (4), and G_i and G_j are the granules induced by the ith artificial cluster and the jth artificial cluster. The minimum of D can be regarded as the margin M of the data set, namely the margin M of the data set is defined as follows. $M = \min_{i, j} D_{ij}$
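The margin can be sketched in Python as follows, using formula (3). The granules in the example are hypothetical, and the sketch assumes that each artificial cluster has already been mapped to a hypersphere granule stored as a tuple (center coordinates followed by the granularity); the text does not specify how that mapping is computed.

```python
import math

def granule_distance(g1, g2):
    """Formula (3) for hypersphere granules (c_1, ..., c_N, r)."""
    center = math.sqrt(sum((a - b) ** 2 for a, b in zip(g1[:-1], g2[:-1])))
    return center - g1[-1] - g2[-1]

def margin(granules):
    """M = min over i, j of D[i][j], with the diagonal of D set to infinity."""
    n = len(granules)
    D = [
        [math.inf if i == j else granule_distance(granules[i], granules[j])
         for j in range(n)]
        for i in range(n)
    ]
    return min(min(row) for row in D)

# Hypothetical cluster granules in 2-D.
separated = [(0.0, 0.0, 1.0), (3.0, 0.0, 1.0), (0.0, 5.0, 1.5)]
overlapping = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

print(margin(separated))    # 1.0: the closest pair of clusters is 1 apart
print(margin(overlapping))  # -1.0: a negative margin indicates overlap
```

A positive M means that every pair of cluster granules is separated, while a negative M means that at least two of them overlap.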

When the margin is less than 0, the data set has greater mutual information and greater NVI, so a smaller parameter ρ for GrCC and a greater number of clusters (K) for FCM and Kmeans are selected to achieve the optimal GCE, NVI, and RI. For the data set spiral, the minimal distance between two artificial clusters is -57.9495. GrCC1 achieves the minimal GCE 0, the minimal NVI 0.2033, and the maximal RI 0.6905 with 45 hyperdiamond granules when ρ = 4.4, so the number of clusters obtained by GrCC1 is larger than the number of artificial clusters.

6 Conclusions and future directions

Evaluation indices, such as GCE, NVI, and RI, are used to compare GrCC with the conventional clustering algorithms FCM and Kmeans from the view of set. The evaluation methods are used to verify the clustering algorithms on supervised clustering problems. The experimental results show that GrCC is better than FCM and Kmeans in terms of GCE, NVI, and RI. As a novel clustering algorithm, GrCC brings some improvements; for example, it achieved better performance on the data sets whose margins are greater than 0. In order to obtain satisfactory GCE, NVI, and RI, GrCC produced more clusters than the artificial clustering.

As for future directions, granular computing, as a method of information processing, reflects the degree to which people understand the world. Human-centered information processing was initiated with the introduction of fuzzy sets. The mathematical theory of granular computing is still one of the research focuses in granular computing, and how to gain a deep understanding of human cognition from the viewpoint of granular computing is a future direction for granular computing researchers. In this paper, for the given supervised clustering problems, we only analyze the effect of the margin on the performance of the clustering algorithms and do not use the margin to guide the selection of parameters. How to learn the granularity parameter ρ from the data set is another future direction. The hypercube granule can be represented as the Cartesian product of intervals with the same interval length, and we will study granular computing based on interval analysis in our future work.

Acknowledgments

This work was supported in part by the Natural Science Foundation of China (Grant Nos. 61170202, 61402393, 61501393).

References

1. Zimek and Vreeken, The blind men and the elephant: On meeting the problem of multiple truths in data from clustering and pattern mining perspectives, Machine Learning 98(1-2) (2015), 121–155.

2. Zarinbal, Zarandi M.H.F. and Türksen I.B., Relative entropy collaborative fuzzy clustering method, Pattern Recognition 48(3) (2015), 933–940.

3. Chen and Liu, Clustering-based discriminant analysis for eye detection, IEEE Transactions on Image Processing 23(4) (2014), 1629–1638.

4. Peng and Liu, Clustering-based topical Web crawling for topic-specific information retrieval guided by incremental classifier, International Journal of Software Engineering and Knowledge Engineering 25(1) (2015), 147–168.

5. and Su, Identification of cell types from single-cell transcriptomes using a novel clustering method, Bioinformatics 31(12) (2015), 1974–1980.

6. Aloise, Deshpande, Hansen and Popat, NP-hardness of Euclidean sum-of-squares clustering, Machine Learning 75 (2009), 245–249.

7. Khalilia, Bezdek J.C., Popescu and Keller J.M., Improvements to the relational fuzzy c-means clustering algorithm, Pattern Recognition 47(12) (2014), 3920–3930.

8. Veenman C.J., Reinders M.J.T. and Backer, A maximum variance cluster algorithm, IEEE Transactions on Pattern Analysis and Machine Intelligence 24(9) (2002), 1273–1280.

9. Liu, Gegov A.E. and Cocea, Collaborative rule generation: An ensemble learning approach, Journal of Intelligent & Fuzzy Systems 30(4) (2016), 2277–2287.

10. Pedrycz, Granular fuzzy rule-based architectures: Pursuing analysis and design in the framework of granular computing, Intelligent Decision Technologies 9(4) (2015), 321–330.

11. Wang and Xu, Granular computing with multiple granular layers for brain big data processing, Brain Informatics 1(1-4) (2014), 1–10.

12. Pedrycz, Lu, Liu, Wang and Wang, Human-centric analysis and interpretation of time series: A perspective of granular computing, Soft Computing 18(12) (2014), 2397–2411.

13. Yao, Zhang, Miao and Xu, Set-theoretic approaches to granular computing, Fundamenta Informaticae 115(2-3), 247–264.

14. Graña, Lattice computing in hybrid intelligent systems, Proceeding of International Conference on Hybrid Intelligent Systems (HIS), Pune, India, 2012, pp. 1–5.

15. Jamshidi and Kaburlasos V.G., gsaINknn: A GSA optimized, lattice computing knn classifier, Engineering Applications of Artificial Intelligence 35 (2014), 277–285.

16. Kaburlasos V.G. and Pachidis T.P., A Lattice-Computing ensemble for reasoning based on formal fusion of disparate data types, and an industrial dispensing application, Information Fusion 16 (2014), 68–83.

17. Martin, An empirical approach to grouping and segmentation, Ph.D. dissertation, University of California, Berkeley, 2002.

18. Martin D.R., Fowlkes and Malik, Learning to detect natural image boundaries using local brightness, color, and texture cues, IEEE Transactions on Pattern Analysis and Machine Intelligence 26(5) (2004), 530–549.

19. Meilă, Comparing clusterings an information based distance, Journal of Multivariate Analysis 98(5) (2007), 873–895.

20. Dimitriadis S.I., Laskaris N.A., Del Rio-Portilla and Koudounis G.C., Characterizing dynamic functional connectivity across sleep stages from EEG, Brain Topography 22(2) (2009), 119–133.

21. Rand W.M., Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association 66(336) (1971), 846–850.

22. Chang and Yeung D.Y., Robust path-based spectral clustering, Pattern Recognition 41(1) (2008), 191–203.