A multiple hierarchical clustering ensemble algorithm to recognize clusters arbitrarily shaped

Abstract

As a research hotspot in ensemble learning, clustering ensemble obtains robust and highly accurate algorithms by integrating multiple base clustering algorithms. Most existing clustering ensemble algorithms take linear clustering algorithms as the base clusterings. Because clustering is a typical unsupervised learning technique, it is difficult to define the accuracy of its results properly, which makes it hard to significantly enhance the performance of the final algorithm. In this article, the AGglomerative NESting method is used to build base clusterings, and a strategy for integrating multiple AGglomerative NESting clusterings is proposed. The algorithm has three main steps: evaluating the credibility of labels, producing multiple base clusterings, and constructing the relation among clusters. The proposed algorithm builds on the original advantages of AGglomerative NESting and further compensates for its inability to identify arbitrarily shaped clusters. Comparing the clustering performance of the proposed algorithm with that of existing clustering algorithms on different datasets establishes its superiority.

Keywords

Clustering ensemble, non-convex clusters, AGglomerative NESting, local hypothesis

1. Introduction

Image processing, recommender systems, and data analysis all benefit from clustering analysis. It aims to divide data objects into classes with high intra-class similarity and low inter-class similarity. Numerous clustering methods have been designed from various perspectives to achieve this goal [33]. Han and Kamber [18] categorized these methods into partitioning, hierarchical, density-based, model-based, and grid-based methods.

This paper proposes a strategy for integrating Multiple AGglomerative NESting algorithms (MAGNES). Cluster ensemble [29, 17] is a common methodology that combines multiple base clusterings to improve the quality of the clustering results. However, the unsupervised nature of clustering makes it difficult to integrate several base algorithms when the dataset is unlabeled, so further improvements are still possible. According to the studies of Bai et al. [5, 4], three key elements influence the effect of a clustering ensemble:

•
Label credibility evaluation. When a single algorithm completes, part of the data may have the wrong clustering labels. If these points are never classified correctly, the validity of the final ensemble will be affected by the incorrect labels. How to correctly judge the credibility of the clustering labels is the key to improving the ensemble performance.
•
Inter-cluster dissimilarity of base clusters. In clustering integration, the base clustering algorithms should be distinct from each other to obtain more robust results. If the base clustering algorithms all have extremely similar outputs, the integration operation will be meaningless. Therefore, Bai et al. generated multiple different base clusterings by introducing the randomness of the fuzzy k-means [5] and k-means [4] algorithms.
•
Inter-cluster equivalence relations. Unlike classification algorithms that have independent labels, the same result may be reflected on different clustering labels. So it is necessary to determine which clustering labels have an equivalence relationship. A reasonable clustering relationship is a prerequisite for obtaining good clustering results. Since different base clusters generated by the same base clustering algorithm may also represent the same true classification, the inter-cluster equivalence relation should be the overall relation of all base clusters owned by different base algorithms.

To overcome these problems, Bai et al. [4] argued that a cluster center’s local space might be represented by that center. Based on this hypothesis, this paper uses AGglomerative NESting (AGNES) as the base clustering algorithm. By building a hypothetical center to construct the local space, this paper proposes the MAGNES algorithm, which yields good clustering results.

The structure of this paper is as follows. Section 2 introduces the related research progress in the field of cluster ensemble. Section 3 covers the ideas linked to granulation methods and presents the MAGNES ensemble structure. Section 4 presents the experimental assessment of the proposed ensemble structure and a summary. Section 5 concludes with recommendations for future improvements.
2. Related work

The two main steps in the ensemble technique are Generation and Consensus. Generation is the first step, in which a set of different base clusterings is generated. Among the methods for generating a base clustering set, two are more efficient. The first is a mechanism that combines several single algorithms with different parameters, different clustering algorithms, or different algorithms on several dataset subsets. The second is the “granulation” method based on the local hypothesis. Unlike the first method, the second introduces the concept of label credibility, which provides consistency, novelty, and stability.

Consensus is the second step, in which the base clusterings are combined into a result that should improve on any single clustering algorithm. Different consensus functions have been developed to derive efficient data partitions, and they can be divided into the following groups:

1)
Voting is the easiest and most straightforward method. For example, inspired by the research of Topchy [31], the Hungarian algorithm [22] represents the contingency matrix as a graph in order to relabel the data, so that all partitions use a globally consistent set of labels. The majority voting method can then be used to decide the data labels. Dudoit and Fridy [11] and Fischer and Buhmann [14] also used a similar voting pattern. Frossyniotis et al. [16], Dimitriadou et al. [9, 10], and Ayad and Kamel [3] developed and promoted an incremental voting model, which adds data sequentially to the underlying set to continuously correct the voting results. Boulis and Ostendorf [7] replaced the combinatorial strategy generated by the classifier integration task with the ‘Label Correspondence’ method of Label Correspondence Search (LCS).
2)
The feature-based strategy regards the cluster integration process as categorical data clustering. Iterative Voting Consensus (IVC) with $k$ -modes as a consensus function was suggested by Nguyen and Caruana [24]. Topchy et al. [30] presented a mixture model and used the EM algorithm to optimize the model by treating the model as a maximum likelihood estimation problem. Clustering Aggregation (AGG) [17] also tried to find the model that is most consistent with all models. Bradley and Fayyad [8] used the $k$ -means to refine the initial data for better performance.
3)
The pairwise similarity approach builds a similarity matrix between distinct data points to solve the problem [15]. Fred and Jain [15] constructed a ‘co-association (CO)’ matrix to return the final partition. Li et al. [21] devised a new hierarchical clustering method. Whereas many cluster ensemble methods concentrate on combining the results of crisp clusterings, the integration of fuzzy clustering algorithms is also a research hotspot. To obtain the final soft partition, Avogadri and Valentini [2] used fuzzy C-means to cluster the CO-like matrix by row.
4)
The graph-based method uses ideas from graph theory to complete the clustering task [33]. Yu et al. [32] transformed the correlation matrix into a graph G for further analysis. Strehl and Ghosh [29] proposed the cluster-based similarity partitioning algorithm (CSPA), the hypergraph partitioning algorithm (HGPA), and the meta clustering algorithm (MCLA). Similar to Yu’s method, CSPA also generates a similarity graph. HGPA creates a hypergraph in which the vertices represent data points and each cluster is treated as a hyperedge connecting its points. MCLA constructs an $r$ -partite graph in which each vertex represents one cluster and the edge weights are the similarities among clusters. Fern and Brodley [13] formed a bipartite graph with equal edge weights, in which partitioning the graph groups instance vertices and cluster vertices together.
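To make the pairwise similarity idea in item 3) concrete, the following is a minimal sketch of a co-association matrix, assuming each base clustering is given as a list of integer labels. The function name `co_association` and the toy labelings are illustrative assumptions, not taken from the cited works.

```python
def co_association(labelings):
    """Fraction of base clusterings in which each pair of points shares a cluster."""
    n = len(labelings[0])
    t = len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / t for c in row] for row in counts]

# Three hypothetical base clusterings of five points.
# Label values are arbitrary: only co-membership matters.
base = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
co = co_association(base)
print(co[0][1], co[0][3])  # points 0 and 1 always co-occur; points 0 and 3 never do
```

Because only co-membership is counted, no label alignment between base clusterings is needed, which is the main appeal of this family of methods.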

The literature review shows that the performance of the base clustering algorithm influences the above integration techniques. To measure the performance of several base clustering algorithms effectively and integrate them reasonably to improve the overall performance, Bai et al. [4] proposed an evaluation function that measures the credibility of the labels and a clustering ensemble algorithm whose base clusterings possess different local-credible labels. They then constructed the relationship graph for all clusters by analyzing the ‘indirect’ overlap of the local-credible spaces. Two factors are considered when measuring the ‘indirect’ overlap between the local-credible spaces of clusters: (1) the distance between the two centers, and (2) the density of the latent cluster’s local-credible space. Compared to linear clustering methods, AGNES has better clustering performance and wider applicability. Based on the theory of Bai et al., this paper constructs multiple base clusterings with different local spaces through AGNES and aggregates them into an integrated method with good clustering performance.
3. The improved cluster ensemble algorithm

Suppose $X=\{x_{i}\}^{N}_{i=1}$ is a dataset of $N$ objects and $\Pi=\{\pi_{h}\}^{T}_{h=1}$ represents $T$ base algorithms. $\pi_{h}=\{C_{hl}\}^{k_{h}}_{l=1}$ is the $h$ th base algorithm, where $k_{h}$ is its number of clusters and $C_{hl}$ is the $l$ th cluster of $\pi_{h}$ . When $x_{i}$ is assigned to cluster $C_{hl}$ , $\pi_{h}(x_{i})=l$ . $M=\{m_{h}\}^{T}_{h=1}$ is a set of label credibility matrices. $m_{h}=[\omega_{\textit{hli}}]_{1\leqslant{l}\leqslant{k_{h}},1\leqslant{i}\leqslant{N}}$ is the label credibility matrix of the $h$ th algorithm, and $\omega_{\textit{hli}}$ is the label credibility of $x_{i}$ with respect to cluster $C_{hl}$ . $K=\{k_{h}\}^{T}_{h=1}$ is the set of cluster numbers of all base clusterings. $v_{h}=\{v_{hl}\}^{k_{h}}_{l=1}$ denotes the center set of the $h$ th clustering, and $v_{hl}$ is the $l$ th cluster center. This paper uses the base algorithm set $\Pi$ and the cluster ensemble structure to obtain the final ensemble algorithm $\pi^{*}$ .

3.1 Label credibility function

Figure 1.

Clustering of AGNES.

AGNES is a bottom-up hierarchical clustering method [20]. The following are three common inter-cluster distance calculation formulas:

$\displaystyle\textit{Single link}:d_{sl}(C_{g},C_{h})=\min\limits_{x_{i}\in C_{g},x_{i}^{\prime}\in C_{h}}d_{x_{i},x_{i}^{\prime}}$

$\displaystyle\textit{Complete link}:d_{cl}(C_{g},C_{h})=\max\limits_{x_{i}\in C_{g},x_{i}^{\prime}\in C_{h}}d_{x_{i},x_{i}^{\prime}}$

$\displaystyle\textit{Average link}:d_{\textit{avg}}(C_{g},C_{h})={\frac{1}{n_{C_{g}}n_{C_{h}}}}\sum\limits_{x_{i}\in C_{g}}\sum\limits_{x_{i}^{\prime}\in C_{h}}d_{x_{i},x_{i}^{\prime}}$

where $C_{g},C_{h}$ are two different clusters, $d_{x_{i},x_{i}^{\prime}}$ is the Euclidean distance between point $x_{i}$ and point $x_{i}^{\prime}$ , and $n_{C_{g}}$ and $n_{C_{h}}$ are the sample sizes of $C_{g}$ and $C_{h}$ . In AGNES, the three distance equations above are commonly used as merging criteria. AGNES keeps merging the two closest clusters until the specified number of clusters is reached. However, because merging is irreversible, incorrect merges may occur with any of the above equations. Figure 1 depicts an example of AGNES clustering, in which cluster 1 clearly contains objects belonging to different real groups. Consequently, the clustering results obtained by AGNES do not appropriately reflect the true classification of the non-convex dataset.
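The three inter-cluster distances can be written down directly from their definitions. The sketch below assumes clusters are lists of coordinate tuples; the function names and the toy clusters are illustrative.

```python
import math

def single_link(cg, ch):
    """Smallest pairwise Euclidean distance between the two clusters."""
    return min(math.dist(a, b) for a in cg for b in ch)

def complete_link(cg, ch):
    """Largest pairwise Euclidean distance between the two clusters."""
    return max(math.dist(a, b) for a in cg for b in ch)

def average_link(cg, ch):
    """Mean of all pairwise Euclidean distances between the two clusters."""
    total = sum(math.dist(a, b) for a in cg for b in ch)
    return total / (len(cg) * len(ch))

# Two small hypothetical clusters on a line.
c_g = [(0.0, 0.0), (1.0, 0.0)]
c_h = [(3.0, 0.0), (5.0, 0.0)]
print(single_link(c_g, c_h), complete_link(c_g, c_h), average_link(c_g, c_h))
# prints: 2.0 5.0 3.5
```

The pairwise distances here are 3, 5, 2 and 4, so the three criteria already disagree on how far apart the same two clusters are, which is why the choice of linkage changes the merge order.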

According to Fig. 1, as the radius of the local space gradually shrinks, the real classification of the objects in the local space becomes more consistent. Moreover, as the space represented by a center shrinks, the true labels of the objects in it become more reliable. Thus, the label of an object is assumed to be accurate when the object falls into the local space of the cluster center that represents it. Define the space within a distance $\epsilon$ of the cluster center as the local-credible space with radius $\epsilon$ ; the label credibility function is then [4]:

$\displaystyle{\lambda}_{hl}(x_{i})=\left\{\begin{array}[]{rcl}1,&&{x_{i}}\in B(v_{hl}),\\ 0,&&\text{otherwise},\end{array}\right.$ (1)

where $B(v_{hl})=\{x_{j}\in X{\mid}d(x_{j},v_{hl})\leqslant\epsilon\}$ is the center $v_{hl}$ ’s local-credible space, which is $v_{hl}$ ’s $\epsilon$ -neighborhood. Center $v_{hl}$ in this paper is defined as below:

$\displaystyle v_{hl}=\mathop{\arg\min}\limits_{x_{i}\in C_{hl}}\sum\limits_{x_{j}\in C_{hl}\atop x_{j}\neq x_{i}}d_{x_{i},x_{j}}.$ (2)
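As a rough illustration of Eqs. (1) and (2), the sketch below reads the center as the cluster member with the smallest total distance to the other members (an arg min over the cluster, which is an interpretation of Eq. (2)) and tests membership in its $\epsilon$ -neighborhood. The function names and the sample points are assumptions made for the example.

```python
import math

def cluster_center(cluster):
    """Member of the cluster with the smallest summed distance to the others (Eq. (2))."""
    return min(cluster, key=lambda p: sum(math.dist(p, q) for q in cluster))

def label_credibility(x, center, eps):
    """Eq. (1): 1 if x lies in the eps-neighborhood of the center, else 0."""
    return 1 if math.dist(x, center) <= eps else 0

# Hypothetical cluster with one far-away member.
cluster = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 0.0)]
center = cluster_center(cluster)
credible = [label_credibility(p, center, eps=1.2) for p in cluster]
print(center, credible)
```

In this toy cluster the far-away point does not receive a credible label, so under the scheme described in the text it would be passed on to the next base clustering.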

3.2 Production of multiple base clusterings

Figure 2.

Operation of MAGNES algorithm. (a) The base algorithm 1. (b) The base algorithm 2. (c) The base algorithm 3. (d) The base algorithm 4. (e) The base algorithm 5. (f) The base algorithm 6.

After addressing credibility issues, this part discusses how to generate multiple AGNES clusterings based on different local-credible spaces.

Since the points within the neighborhood of a center are already considered reliable, they need not participate in the next AGNES run, which also ensures that the local-credible spaces created by each algorithm differ from one another. The parameter $\theta_{gi}$ indicates whether $x_{i}$ participates in the $g$ th base clustering. This iterative learning process can be represented by the following equation [4]:

$\displaystyle{\theta}_{gi}=\left\{\begin{array}[]{rcl}1,&&\text{if }\sum^{g-1}_{h=1}{\lambda}_{hl}(x_{i})=0,\\ 0,&&\text{otherwise}.\end{array}\right.$ (3)

This iterative learning procedure, called the MAGNES algorithm, is described as follows. Set $h=1$ , ${\theta}_{h}(x_{i})=1$ for $x_{i}\in X$ and $S=X$ . At each stage, the AGNES algorithm with label credibility is first used to cluster $S$ . This process begins with finding the center of each cluster via Eq. (2). The label credibility of Eq. (1) ensures that points outside the $\epsilon$ -neighborhood of a center are used in the next base clustering. These steps yield a set of label-credible clusters, each represented by its center. Then each point in $X-S$ is assigned the same label as its closest center, and $S$ is updated as $S=S-S_{c}$ , where $S_{c}$ is the set of points that receive a credible label in the $h$ th base algorithm. Let $h=h+1$ ; for $i$ in $[1,N]$ , if $x_{i}\in S$ , then $\theta_{h}(x_{i})=1$ , and otherwise $\theta_{h}(x_{i})=0$ . These steps continue until the number of remaining points in $S$ drops below the threshold ${k_{h}}^{2}$ or the number of base algorithms reaches the maximum threshold $T_{\max}$ .

MAGNES: Multiple AGglomerative NESting algorithm
Input: Dataset $X$ , Cluster number $K$ , $\epsilon$ , $T_{\max}$
Output: Credibility matrix $M$ , Center set $V$
For $i$ in $[1,N]$ , let iteration number $q=N$ , $\theta_{h}(x_{i})=1$ , $M$ = $\emptyset$ , $S=X$ , $C_{i}=\{x_{i}\}$ , $V=\emptyset$ , and $h=0$
while $h\leqslant T_{\max}$ do
    if $|S|<{k_{h}}^{2}$ then
        Break
    end if
    while $q>k_{h}$ do
        Find the two closest clusters $C_{i*}$ and $C_{j*}$
        Merge $C_{i*}$ and $C_{j*}$ : $C_{i*}=C_{i*}\bigcup C_{j*}$
        Update the distance matrix
        $q=q-1$
    end while
    $\pi_{h}(x_{i})=C$
    for $1\leqslant l\leqslant k_{h}$ do
        $v_{hl}=\mathop{\arg\min}\limits_{x_{i}\in C_{hl}}\sum\limits_{x_{j}\in C_{hl},x_{j}\neq x_{i}}\textit{dis}(x_{i},x_{j})$
        for $x_{i}\in S$ do
            if $\pi_{h}(x_{i})=l$ then
                $\omega_{\textit{hli}}=1$
            else
                $\omega_{\textit{hli}}=0$
            end if
        end for
    end for
    for each $x_{i}\in X-S$ do
        $\pi_{h}(x_{i})=\arg\min_{l=1}^{k_{h}}d_{x_{i},v_{hl}}$
    end for
    $S^{\prime}=\{x_{i}\in S\mid\lambda_{hl}(x_{i})=1\}$
    for $1\leqslant i\leqslant N$ do
        if $x_{i}\in S^{\prime}$ then
            $\theta_{h+1}(x_{i})=0$
        else
            $\theta_{h+1}(x_{i})=1$
        end if
    end for
    Update $M=M\bigcup\{m_{h}\}$ , $V=V\bigcup\{v_{h}\}$ , $S=S-S^{\prime}$ and $h=h+1$
end while

Algorithm 1 shows the above iterative process. The clustering process of the MAGNES algorithm is demonstrated on the Jain dataset [6]. Setting $\epsilon=0.19$ yields six base clusterings, whose generation is detailed in Fig. 2. Points that do not participate in the creation of a new base clustering are shown in gray. As can be seen in Fig. 2, the local spaces of these base clusters differ from one another, which is helpful for the subsequent ensemble operations.

The MAGNES method has a time complexity of $O(N\sum^{T}_{h=1}t_{h}k_{h})$ , in which $t_{h}$ is the number of AGNES iterations needed to generate the $h$ th base clustering. The algorithm outputs a set of label credibility matrices $M=\{m_{h}\}$ and a center set $V=\{v_{h}\}$ , where $h$ is in $[1,T]$ . Notably, if $T_{\max}$ is set too large, the end of the iteration depends entirely on the parameter $\epsilon$ . In other words, when $\epsilon$ is small, the number of base algorithms $T$ will be large, because a small $\epsilon$ means that only a few objects are considered to have a credible label. For this reason, a reasonable number of base algorithms is needed if the results are expected to represent the real structure of the data well. This parameter can therefore be set by the user based on experience.
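The iteration described in this subsection can be sketched as follows. This is a simplified illustration under several assumptions, not the authors' implementation: it uses a naive single-link AGNES, a fixed cluster number `k` for every base clustering, treats the members of a cluster within `eps` of its center as credible, and omits the step that labels the points of $X-S$ by their nearest center. All function names are hypothetical.

```python
import math

def agnes(points, k):
    """Naive single-link AGNES: merge the two closest clusters until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(p, q) for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
    return clusters

def medoid(cluster):
    """Cluster member with the smallest summed distance to the others."""
    return min(cluster, key=lambda p: sum(math.dist(p, q) for q in cluster))

def magnes(points, k, eps, t_max):
    """Return one list of (center, credible_points) pairs per base clustering."""
    remaining = list(points)
    base_clusterings = []
    # Stop when fewer than k^2 points remain or t_max clusterings exist.
    while len(base_clusterings) < t_max and len(remaining) >= k * k:
        result, credible = [], set()
        for cluster in agnes(remaining, k):
            center = medoid(cluster)
            inside = [p for p in cluster if math.dist(p, center) <= eps]
            result.append((center, inside))
            credible.update(inside)
        base_clusterings.append(result)
        # Points with a credible label do not take part in the next round.
        remaining = [p for p in remaining if p not in credible]
    return base_clusterings

# Two hypothetical, well separated groups of five points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0), (0.4, 0.0),
          (5.0, 0.0), (5.1, 0.0), (5.2, 0.0), (5.3, 0.0), (5.4, 0.0)]
base = magnes(points, k=2, eps=0.15, t_max=10)
print(len(base), [center for center, _ in base[0]])
```

Each round removes at least the centers themselves, so the loop always terminates. A smaller `eps` removes fewer points per round and therefore produces more base clusterings, which matches the remark about the relation between $\epsilon$ and $T$ .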

3.3 Construction of cluster relation

Figure 3.

(a) Define the latent cluster (b) Exploring similarity through latent clusters’ density.

A classification label denotes a specific classification. However, a cluster label represents only the grouping attributes of the data, so different clustering results cannot be compared. Therefore, the labels of all clusterings need to be aligned. Moreover, when identifying non-convex datasets, the weakness of AGNES leads to the possibility that different clusters generated by a base clustering may represent the same true classification. Therefore, it is particularly important to analyze clusters’ relationships.

Many measures have been proposed to explore the similarity between clusters [33]. Among them, the algorithms constructed by Strehl [29] and Zhou [34] judge similarity by the number of objects that different clusters have in common, i.e., their degree of overlap. Notably, such a measure cannot work when the two clusters share no items. To solve this dilemma and measure the degree of overlap reasonably, Iam-On [19] suggested a link-based measure. Although the above methods offer good ideas for calculating the inter-cluster similarity, none of them considers the effect of label credibility. In response, Bai et al. [4] suggested a new similarity measure that takes local-credible spaces into account to overcome this drawback.

Considering local-credible spaces, they measured the similarity of two clusters by computing the overlap between their local-credible spaces. Let $C_{hl}$ and $C_{pq}$ be two clusters with centers $v_{hl}$ and $v_{pq}$ . If the Euclidean distance between $v_{hl}$ and $v_{pq}$ is more than $2\epsilon$ , their local-credible spaces do not overlap. However, in local-credible theory, local spaces are often small, and different spaces generally share little or no overlap. For this reason, Bai [5] devised a technique involving a ‘latent cluster’ to measure the ‘indirect’ overlap of clusters. Assume that the center $v_{(hl,pq)}$ of a latent cluster $C_{(hl,pq)}$ is the midpoint of $v_{hl}$ and $v_{pq}$ , i.e., $v_{(hl,pq)}=\frac{v_{hl}+v_{pq}}{2}$ . If the Euclidean distance from $v_{(hl,pq)}$ to both $v_{hl}$ and $v_{pq}$ is no more than $2\epsilon$ , which means that cluster $C_{(hl,pq)}$ overlaps with both cluster $C_{hl}$ and cluster $C_{pq}$ , then $C_{hl}$ and $C_{pq}$ are considered to be indirectly overlapping, as shown in Fig. 3a.

As Fig. 3a illustrates, the smaller $d_{v_{hl},v_{pq}}$ is, the more clusters $C_{hl}$ and $C_{pq}$ overlap with cluster $C_{(hl,pq)}$ . Thus, the inter-cluster similarity is inversely proportional to $d_{v_{hl},v_{pq}}$ . Besides, it is meaningful to consider the local spatial density of latent clusters, which reflects whether the existence of a latent cluster is justified. In Fig. 3b, the centers of clusters A and C are equidistant from the center of cluster B, yet clusters A and B should belong to the same class while cluster C belongs to another. The main reason is that the latent cluster between cluster A and cluster B has a greater local spatial density than the one between cluster B and cluster C, i.e., the latent cluster $C_{(A,B)}$ is more plausible than $C_{(B,C)}$ . Based on the above analysis, two conclusions can be drawn: (1) the inter-cluster similarity is inversely proportional to $d_{v_{hl},v_{pq}}$ ; (2) the inter-cluster similarity increases with the density of the latent cluster, i.e., with the number of objects inside it. Thus, the similarity of two clusters is defined as [4]:

$\displaystyle{\delta}{(C_{hl},C_{pq})}=\left\{\begin{array}[]{lcl}\frac{\rho(B(v_{(hl,pq)}))}{d_{v_{hl},v_{pq}}},&&\text{if }d_{v_{hl},v_{pq}}\leqslant 4\epsilon,\\ 0,&&\text{otherwise},\end{array}\right.$ (4)

where $\rho(B(v_{(hl,pq)}))$ is the number of points in the $\epsilon$ -neighborhood of $v_{(hl,pq)}$ .
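Eq. (4) can be sketched in a few lines, assuming centers and data points are coordinate tuples. The function name `similarity` and the sample data are illustrative. Coincident centers are not covered by Eq. (4), so the sketch raises an error in that case, which is a choice made here rather than something stated in the text.

```python
import math

def similarity(v_a, v_b, points, eps):
    """Eq. (4): latent-cluster density divided by the distance between the centers."""
    d = math.dist(v_a, v_b)
    if d == 0:
        raise ValueError("Eq. (4) is undefined for coincident centers")
    if d > 4 * eps:
        return 0.0
    # Center of the latent cluster: midpoint of the two centers.
    mid = tuple((a + b) / 2 for a, b in zip(v_a, v_b))
    # rho: number of data points in the eps-neighborhood of the midpoint.
    density = sum(1 for p in points if math.dist(p, mid) <= eps)
    return density / d

# Hypothetical data on a line, with three candidate centers.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (12.0, 0.0)]
near = similarity((0.0, 0.0), (4.0, 0.0), points, eps=1.5)
far = similarity((4.0, 0.0), (12.0, 0.0), points, eps=1.5)
print(near, far)
```

The first pair of centers is 4 apart with three points around the midpoint, giving 3/4. The second pair is farther apart than $4\epsilon$ , so its similarity is zero regardless of density.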

Figure 4.

(a) Constructing the weight graph (b) Cutting the graph.

After the similarity is obtained, an undirected weighted graph is built to capture inter-cluster relationships. The graph is denoted as $G=<A,\Delta>$ , where $A$ is the vertex set. Every base cluster obtained from the MAGNES algorithm is represented by a vertex in $A$ . Since the objects in base clusters all fall into the neighborhoods of centers, $A$ can also be considered as the set of all centers obtained from the MAGNES algorithm. $\Delta$ is the set of edge weights connecting the vertices. This paper stipulates that the similarity of any two vertices (clusters) is also the weight of the corresponding edge, i.e., for $C_{i},C_{j}\in A$ , $\Delta_{i,j}=\delta(C_{i},C_{j})$ .

Once the weighted graph is obtained, the construction of cluster relations can be viewed as a graph cut problem with the following objective function [28]:

$\displaystyle\min\limits_{\Omega}\left[Q(\Omega)=\frac{1}{k}\sum\limits_{j=1}^{k}\frac{\sum\limits_{x\in A_{j},y\in{A-A_{j}}}\Delta_{x,y}}{\sum\limits_{x\in A_{j},z\in A}\Delta_{x,z}}\right],$ (5)

where $\Omega=\{A_{j}\}^{k}_{j=1}$ is a partition of the vertices and $A_{j}$ is the $j$ th subset of $A$ . By minimizing $Q$ , each subset includes highly similar vertices, while these vertices are very dissimilar to those in other subsets. The normalized spectral clustering (NSC) algorithm [23] is a suitable choice for solving this problem. For $x\in A$ and $j$ in $[1,k]$ , the sub-graph label to which the cluster $C_{x}$ belongs is defined as $L(C_{x})$ :

$\displaystyle L(C_{x})=j,\text{if }C_{x}\in A_{j}.$ (6)

The time complexity of the relation construction procedure is $O(N(\sum_{h=1}^{T}k_{h})^{2})$ [5]. Figure 4 depicts the process of structuring the above relationships. Fig. 4a illustrates the construction of the relationship graph for the 12 base clusters obtained in Fig. 3, and Fig. 4b shows the graph cut result obtained with the NSC algorithm.
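To see what the objective in Eq. (5) rewards, the sketch below only evaluates $Q(\Omega)$ for a given partition of a small weighted graph. It does not perform the minimization, which the text delegates to the NSC algorithm. The weight matrix and the function name are illustrative assumptions, and every subset is assumed to have a non-zero total incident weight.

```python
def cut_objective(weights, partition):
    """Eq. (5): mean over subsets of (weight leaving the subset) / (weight incident to it)."""
    vertices = range(len(weights))
    total = 0.0
    for subset in partition:
        inside = set(subset)
        cut = sum(weights[x][y] for x in inside for y in vertices if y not in inside)
        volume = sum(weights[x][z] for x in inside for z in vertices)
        total += cut / volume
    return total / len(partition)

# Hypothetical similarity graph over four base clusters:
# 0-1 and 2-3 are strongly linked, 0-2 and 1-3 weakly.
W = [
    [0, 5, 1, 0],
    [5, 0, 0, 1],
    [1, 0, 0, 5],
    [0, 1, 5, 0],
]
q_good = cut_objective(W, [[0, 1], [2, 3]])
q_bad = cut_objective(W, [[0, 2], [1, 3]])
print(q_good, q_bad)
```

Grouping the strongly linked vertices together cuts only the weak edges and gives the lower value of $Q$ , which is the partition the minimization is meant to find.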

3.4 Generate final clustering

After all cluster labels are fixed, for $i$ in $[1,N]$ and $j$ in $[1,k]$ , the label credibility matrix $M^{*}$ of the final clustering $\pi^{*}$ is calculated through the following equation:

$\displaystyle\omega^{*}_{ji}=\sum\limits_{hl\in A_{j}}\lambda_{hl}(x_{i})\omega_{\textit{hli}}.$ (7)

The final clustering is then obtained from $M^{*}$ by the following equation:

$\displaystyle\pi^{*}(x_{i})=\arg\max_{j=1}^{k}\omega^{*}_{ji}.$ (8)
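The aggregation in Eqs. (7) and (8) can be sketched as a vote, reading Eq. (7) as a sum over the base clusters assigned to sub-graph $j$ (an interpretation made for this example). Here `credible[c][i]` stands for the product $\lambda_{hl}(x_{i})\omega_{\textit{hli}}$ of base cluster `c` and point `i`, and `group[c]` for its sub-graph label $L$ from Eq. (6). Both names, the toy values, and the tie-breaking rule (lowest index wins) are assumptions.

```python
def final_labels(credible, group, k):
    """Sum credibilities per sub-graph (Eq. (7)) and take the arg max (Eq. (8))."""
    n = len(credible[0])
    labels = []
    for i in range(n):
        votes = [0] * k
        for c, row in enumerate(credible):
            votes[group[c]] += row[i]
        # Ties are broken in favor of the lowest sub-graph index.
        labels.append(votes.index(max(votes)))
    return labels

# Four hypothetical base clusters over five points, merged into two sub-graphs.
credible = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
]
group = [0, 0, 1, 1]
labels = final_labels(credible, group, k=2)
print(labels)  # prints: [0, 0, 0, 1, 1]
```

Each point is visited once per base cluster, so the cost grows with the number of points times the number of base clusters.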

The generation of the final clustering has a time complexity of $O(NT)$ . The above procedures together form an algorithm called Multiple AGglomerative NESting Clustering Ensemble (MAGNESCE), which is described in Algorithm 2. The time complexity of the MAGNESCE algorithm is $O(N\sum^{T}_{h=1}t_{h}k_{h}+N\sum_{h=1}^{T}k_{h}+N(\sum_{h=1}^{T}k_{h})^{2}+NT)$ .

MAGNESCE: Multiple AGglomerative NESting Clustering Ensemble algorithm
Input: Dataset $X$ , sub-graph number $k$ , Cluster number $K$ , $\epsilon$ , $T_{\max}$
Output: Final result $\pi^{*}$
$M$ obtained by Algorithm 1
for $i=1:N$ do
    for $h=1:\sum_{h=1}^{T}k_{h}$ do
        Calculate $\lambda_{hl}(x_{i})$ by Eq. (1)
    end for
end for
A = a set of cluster centers
for $i,j\in A$ do
    $\Delta_{i,j}=\delta(C_{i},C_{j})$ by Eq. (4)
end for
Obtain a relation graph $G=<A,\Delta>$ where $\Delta=\{\Delta_{i,j}\}_{i,j\in A}$
$\Omega$ is obtained by using the NSC algorithm
Use Eq. (6) to correct cluster labels
Use Eq. (8) to get the final result

4. Experiments

Table 1
Details of datasets: number of objects (N), data dimension (D), number of categories (K)

	Dataset name	N	D	K
Synthetic data	Ring	1,500	2	3
	Atom	800	3	2
	T4.8k	7,235	2	6
	Chainlink	1,000	3	2
	Flame	240	2	2
	Complex	3,031	2	9
	Agg	788	2	7
	Jain	373	2	2
Real data	Digits	5,620	63	10
	Wine	178	13	3
	Statlog	6,435	36	7
	Iris	150	4	3
	Breast	569	30	2

The MAGNESCE algorithm is applied to the above datasets, and its effectiveness is evaluated with two clustering evaluation metrics. The experiments were conducted in the Python programming environment. The hardware conditions are: OS: Windows Enterprise LTSC (x64); processor: Intel(R) Core(TM) i5-8400 CPU @ 2.80 GHz 2.81 GHz; RAM: 16 GB.

4.1 Details of datasets

Since real datasets usually have missing values, the algorithm performance will be affected if the missing values cannot be handled effectively. For a better comparison, five representative real datasets that have been well processed are selected in this section. The processing of missing values will be discussed further in a subsequent paper. The datasets used are detailed in Table 1. Figure 5 depicts the synthetic datasets. All datasets were obtained from [6, 1].

4.2 Evaluation criteria

Two commonly used external criteria, the adjusted Rand index (ARI) [26] and normalized mutual information (NMI) [25], are used in this paper as similarity measures to assess the merits of clustering results. Assume a dataset $X$ of $N$ objects; the overlap of the true partition $P=\{p_{1},p_{2},\ldots,p_{k^{\prime}}\}$ and the clustering result $C=\{c_{1},c_{2},\ldots,c_{k}\}$ is shown in Table 2. $n_{ij}$ is the number of points common to clusters $p_{j}$ and $c_{i}$ , which can be expressed as $n_{ij}=|c_{i}\cap p_{j}|$ , and $b_{i}$ and $d_{j}$ are the row and column sums of the table.

With values from Table 2, the ARI [26] is defined as

$\displaystyle\textit{ARI}=\frac{\sum_{ij}\binom{n_{ij}}{2}-[\sum_{i}\binom{b_{i}}{2}\sum_{j}\binom{d_{j}}{2}]/\binom{N}{2}}{\frac{1}{2}[\sum_{i}\binom{b_{i}}{2}+\sum_{j}\binom{d_{j}}{2}]-[\sum_{i}\binom{b_{i}}{2}\sum_{j}\binom{d_{j}}{2}]/\binom{N}{2}}.$

NMI [25] is calculated as

$\displaystyle\textit{NMI}=\frac{2\sum_{i}\sum_{j}n_{ij}\log\frac{n_{ij}N}{b_{i}d_{j}}}{-\sum_{i}b_{i}\log\frac{b_{i}}{N}-\sum_{j}d_{j}\log\frac{d_{j}}{N}}.$

Table 2
Contingency table of the two partitions

C $\backslash$ P	$p_{1}$	$p_{2}$	$\ldots$	$p_{k^{\prime}}$	Sums
$c_{1}$	$n_{11}$	$n_{12}$	$\ldots$	$n_{1k^{\prime}}$	$b_{1}$
$c_{2}$	$n_{21}$	$n_{22}$	$\ldots$	$n_{2k^{\prime}}$	$b_{2}$
$\vdots$	$\vdots$	$\vdots$	$\ddots$	$\vdots$	$\vdots$
$c_{k}$	$n_{k1}$	$n_{k2}$	$\ldots$	$n_{kk^{\prime}}$	$b_{k}$
Sums	$d_{1}$	$d_{2}$	$\ldots$	$d_{k^{\prime}}$

Figure 5.

Images of the synthetic data. (a) Ring. (b) Atom. (c) T4.8k. (d) Chainlink. (e) Flame. (f) Complex. (g) Agg. (h) Jain.

The more similar the clustering result and the true partition are, the higher the values of ARI and NMI will be.
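Both criteria can be computed directly from the contingency counts of Table 2. The following sketch follows the two formulas above term by term, assuming the partitions are given as equal-length label lists. The function names are illustrative, and the degenerate case where both partitions consist of a single cluster (zero denominators) is not handled.

```python
from collections import Counter
from math import comb, log

def _counts(pred, true):
    """Contingency counts n_ij plus the row sums b_i and column sums d_j."""
    return Counter(zip(pred, true)), Counter(pred), Counter(true)

def ari(pred, true):
    n_ij, b, d = _counts(pred, true)
    sum_ij = sum(comb(v, 2) for v in n_ij.values())
    sum_b = sum(comb(v, 2) for v in b.values())
    sum_d = sum(comb(v, 2) for v in d.values())
    expected = sum_b * sum_d / comb(len(pred), 2)
    return (sum_ij - expected) / (0.5 * (sum_b + sum_d) - expected)

def nmi(pred, true):
    n_ij, b, d = _counts(pred, true)
    n = len(pred)
    mutual = sum(v * log(v * n / (b[i] * d[j])) for (i, j), v in n_ij.items())
    h_b = -sum(v * log(v / n) for v in b.values())
    h_d = -sum(v * log(v / n) for v in d.values())
    return 2 * mutual / (h_b + h_d)

true = [0, 0, 0, 1, 1, 1]
same_up_to_renaming = [1, 1, 1, 0, 0, 0]
over_split = [0, 0, 1, 1, 2, 2]
print(ari(same_up_to_renaming, true), nmi(same_up_to_renaming, true))
print(ari(over_split, true), nmi(over_split, true))
```

A clustering that matches the true partition up to a renaming of labels scores 1 on both criteria, while the over-split clustering scores lower, which is the behavior the sentence above describes.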

4.3 Compared methods

To examine MAGNESCE, the following ensemble algorithms are run on fuzzy $k$ -means base clusterings, as in Bai [5]. All of these algorithms are openly available.

•
The co-association similarity matrix (CO) and the three link-based methods WCT, WTQ and CSM are pairwise similarity methods. To get the final solution, the average-link (AL) algorithm is used.
•
Algorithms like CSPA, HGPA, MCLA proposed by Strehl and Ghosh are the graph-based methods.
•
Feature-based methods contain the expectation maximization (EM) algorithm proposed by Topchy [30] and IVC algorithm proposed by Nguyen [24].
•
The multiple clustering algorithm FKMCE proposed by Bai [5].

Besides, to demonstrate the ability of MAGNESCE to identify non-convex divisions, three nonlinear algorithms still need to be taken into account: NSC (normalized spectral clustering algorithm) [23], DBSCAN (density-based spatial clustering of applications with noise) [12] and CFSFDP [27].

Table 3
ARI values of different methods

Methods	Synthetic datasets								Real datasets
	Ring	Atom	T4.8k	Chainlink	Flame	Complex	Agg	Jain	Digits	Wine	Statlog	Iris	Breast
CO-AL	0.1305	0.1456	0.5098	0.0927	0.4880	0.3726	0.6245	0.5853	0.6050	0.8471	0.5700	0.7302	0.7302
WCT-AL	0.1382	0.1456	0.4952	0.0927	0.4880	0.3635	0.7342	0.5853	0.6046	0.8471	0.5699	0.7302	0.7302
WTQ-AL	0.1389	0.1456	0.3326	0.0927	0.4880	0.3705	0.7081	0.5853	0.6049	0.8471	0.5699	0.7302	0.7302
CSM-AL	0.1448	0.1456	0.4956	0.0927	0.4880	0.4199	0.7192	0.5853	0.6146	0.8471	0.5699	0.7302	0.7302
CSPA	0.3163	0.0021	0.5010	0.0927	0.4312	0.3418	0.5365	0.2774	0.7573	0.7808	0.4329	0.6521	0.3414
HGPA	0.0004	0.0013	0.4012	0.0010	0.0038	0.1966	0.3621	0.0021	0.3750	0.1286	0.2619	0.1026	0.0007
MCLA	0.0004	0.1554	0.5018	0.0927	0.4880	0.3736	0.5778	0.5853	0.6935	0.8471	0.5127	0.7302	0.7302
EM	0.0302	0.2617	0.4775	0.0896	0.4164	0.3240	0.5682	0.5151	0.6205	0.7855	0.5074	0.6008	0.6328
IVC	0.3231	0.1178	0.4894	0.0927	0.3708	0.4097	0.5783	0.1288	0.6006	0.6875	0.4188	0.5970	0.0487
NSC	1.0000	1.0000	0.9260	1.0000	0.8382	0.9848	0.9045	1.0000	0.7536	0.9310	0.5308	0.7455	0.7493
DBSCAN	1.0000	0.3786	0.7780	0.4947	0.2270	0.8513	0.6294	0.2824	0.5052	0.3587	0.4319	0.5162	0.0478
CFSFDP	0.3227	0.4154	0.6098	0.6853	0.9337	0.8043	0.9898	0.6438	0.7584	0.7414	0.4963	0.7028	0.7305
FKMCE	1.0000	1.0000	0.9786	1.0000	0.9539	0.9891	0.9909	1.0000	0.8430	0.8834	0.6544	0.8296	0.7700
MAGNESCE	1.0000	1.0000	0.9998	1.0000	0.9833	1.0000	0.9971	1.0000	0.9101	0.8992	0.6115	0.9038	0.8175

Table 4
NMI values of different methods

Methods	Synthetic datasets								Real datasets
	Ring	Atom	T4.8k	Chainlink	Flame	Complex	Agg	Jain	Digits	Wine	Statlog	Iris	Breast
CO-AL	0.2112	0.2631	0.6601	0.0686	0.4420	0.6343	0.7522	0.5533	0.7307	0.8347	0.6322	0.7582	0.6231
WCT-AL	0.2162	0.2631	0.6546	0.0686	0.4420	0.6302	0.8291	0.5533	0.7305	0.8347	0.6321	0.7582	0.6231
WTQ-AL	0.2174	0.2631	0.5027	0.0686	0.4420	0.6370	0.8003	0.5533	0.7306	0.8347	0.6321	0.7582	0.6231
CSM-AL	0.2211	0.2631	0.6563	0.0686	0.4420	0.6630	0.7993	0.5533	0.7309	0.8347	0.6321	0.7582	0.6231
CSPA	0.3785	0.0024	0.6233	0.0686	0.4049	0.6071	0.7200	0.3631	0.7857	0.7771	0.5425	0.6803	0.2981
HGPA	0.0008	0.0000	0.5170	0.0000	0.0000	0.3656	0.4088	0.0000	0.4932	0.1705	0.3260	0.1609	0.0007
MCLA	0.0013	0.2713	0.6418	0.0686	0.4420	0.6334	0.7515	0.5533	0.7627	0.8347	0.5903	0.7582	0.6231
EM	0.1495	0.3404	0.6197	0.0663	0.3780	0.5730	0.7295	0.4869	0.7271	0.7980	0.5837	0.6727	0.5400
IVC	0.3813	0.1942	0.6342	0.0686	0.3360	0.6467	0.7303	0.1217	0.7208	0.7281	0.5256	0.6801	0.0415
NSC	1.0000	1.0000	0.9538	1.0000	0.7770	0.9853	0.9271	1.0000	0.8119	0.9016	0.6243	0.7980	0.6328
DBSCAN	1.0000	0.2773	0.7926	0.4828	0.2070	0.8719	0.6835	0.2561	0.7163	0.4451	0.5021	0.5904	0.0303
CFSFDP	0.3792	0.4592	0.7131	0.6544	0.8883	0.8451	0.9851	0.5960	0.8645	0.7528	0.5644	0.7277	0.6152
FKMCE	1.0000	1.0000	0.9840	1.0000	0.9028	0.9946	0.9869	1.0000	0.8919	0.8667	0.6774	0.8381	0.6667
MAGNESCE	1.0000	1.0000	0.9994	1.0000	0.9635	1.0000	0.9958	1.0000	0.9147	0.8782	0.6656	0.8851	0.7459

4.4 Experimental settings

Methods	Synthetic datasets	Real datasets
CO-AL	0.2112	0.2631	0.6601	0.0686	0.4420	0.6343	0.7522	0.5533	0.7307	0.8347	0.6322	0.7582	0.6231
WCT-AL	0.2162	0.2631	0.6546	0.0686	0.4420	0.6302	0.8291	0.5533	0.7305	0.8347	0.6321	0.7582	0.6231
WTQ-AL	0.2174	0.2631	0.5027	0.0686	0.4420	0.6370	0.8003	0.5533	0.7306	0.8347	0.6321	0.7582	0.6231
CSM-AL	0.2211	0.2631	0.6563	0.0686	0.4420	0.6630	0.7993	0.5533	0.7309	0.8347	0.6321	0.7582	0.6231
CSPA	0.3785	0.0024	0.6233	0.0686	0.4049	0.6071	0.7200	0.3631	0.7857	0.7771	0.5425	0.6803	0.2981
HGPA	0.0008	0.0000	0.5170	0.0000	0.0000	0.3656	0.4088	0.0000	0.4932	0.1705	0.3260	0.1609	0.0007
MCLA	0.0013	0.2713	0.6418	0.0686	0.4420	0.6334	0.7515	0.5533	0.7627	0.8347	0.5903	0.7582	0.6231
EM	0.1495	0.3404	0.6197	0.0663	0.3780	0.5730	0.7295	0.4869	0.7271	0.7980	0.5837	0.6727	0.5400
IVC	0.3813	0.1942	0.6342	0.0686	0.3360	0.6467	0.7303	0.1217	0.7208	0.7281	0.5256	0.6801	0.0415
NSC	1.0000	1.0000	0.9538	1.0000	0.7770	0.9853	0.9271	1.0000	0.8119	0.9016	0.6243	0.7980	0.6328
DBSCAN	1.0000	0.2773	0.7926	0.4828	0.2070	0.8719	0.6835	0.2561	0.7163	0.4451	0.5021	0.5904	0.0303
CFSFDP	0.3792	0.4592	0.7131	0.6544	0.8883	0.8451	0.9851	0.5960	0.8645	0.7528	0.5644	0.7277	0.6152
FKMCE	1.0000	1.0000	0.9840	1.0000	0.9028	0.9946	0.9869	1.0000	0.8919	0.8667	0.6774	0.8381	0.6667
MAGNESCE	1.0000	1.0000	0.9994	1.0000	0.9635	1.0000	0.9958	1.0000	0.9147	0.8782	0.6656	0.8851	0.7459

Before the comparison, the following parameters are set in advance:

1) MAGNESCE is based on the hierarchical clustering algorithm AGNES and uses average-link to calculate the inter-cluster distance. Let $k_{h}$ equal the true number of classes and set $T_{\max}=1000$, so that $\epsilon$ controls the number of base clusterings. For the normalized data $X$, the parameter $\epsilon$ is set by the clamping method. Define $\tilde{d}_{\textit{magnesce}}=\frac{1}{C_{N}^{2}}\sum\limits_{x_{j}\in X,x_{j}\neq x_{i}}d_{x_{i},x_{j}}$. First, search $\epsilon$ in the interval $(0,\tilde{d}_{\textit{magnesce}}]$ with step $\tilde{d}_{\textit{magnesce}}/10$. Then take the two $\epsilon$ values with the best evaluation indexes as the endpoints of the pinch interval, continue searching for the best $\epsilon$ within it, and keep narrowing the pinch interval until the best ARI and NMI values converge, which gives the optimal $\epsilon$.
2) DBSCAN, CFSFDP and FKMCE also require a suitable parameter $\epsilon$. To obtain it, let $\tilde{d}=\frac{1}{n}\sum_{i=1}^{n}\sqrt{{||x_{i}-\tilde{x}||}^{2}}$, where $\tilde{x}=\sum_{j=1}^{n}\frac{x_{j}}{n}$ is the mean of the data. Select $\epsilon$ from $[\tilde{d}/10,\tilde{d}]$ with step size $\tilde{d}/10$. For comparison purposes, the highest ARI and NMI obtained by each algorithm are reported.
3) For the NSC algorithm, a Gaussian kernel is used, and its kernel parameter $\delta^{2}$ is searched in $[0.1,2]$ with a step of $0.1$. Again, the highest ARI and NMI obtained are reported.
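The parameter search described in the settings above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the `score` callback (standing in for running a clustering algorithm with a given $\epsilon$ and returning its ARI or NMI), and the stopping rule based on interval width are all assumptions, since the source only states that the search stops once the best ARI and NMI converge.

```python
import math
from itertools import combinations


def mean_pairwise_distance(X):
    """Average Euclidean distance over all C(N, 2) unordered pairs of points."""
    pairs = list(combinations(X, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def mean_centroid_distance(X):
    """Average Euclidean distance from each point to the data mean."""
    n, dim = len(X), len(X[0])
    centroid = [sum(x[k] for x in X) / n for k in range(dim)]
    return sum(math.dist(x, centroid) for x in X) / n


def coarse_to_fine_search(score, upper, steps=10, tol=1e-3, max_rounds=20):
    """Grid-search eps in (0, upper], then repeatedly shrink the interval
    to the span between the two best grid points (assumed stopping rule:
    the interval becomes narrower than `tol`)."""
    lo, hi = 0.0, upper
    best_eps, best_score = None, -math.inf
    for _ in range(max_rounds):
        step = (hi - lo) / steps
        grid = [lo + step * i for i in range(1, steps + 1)]
        scored = sorted(((score(e), e) for e in grid), reverse=True)
        (s1, e1), (_, e2) = scored[0], scored[1]
        if s1 > best_score:
            best_eps, best_score = e1, s1
        lo, hi = min(e1, e2), max(e1, e2)
        if hi - lo < tol:
            break
    return best_eps, best_score


# Toy usage: a hypothetical score that peaks at eps = 0.37.
eps, value = coarse_to_fine_search(lambda e: -(e - 0.37) ** 2, upper=1.0)
print(eps, value)
```

In practice `upper` would be `mean_pairwise_distance(X)` for MAGNESCE, while a single pass over the coarse grid with `upper = mean_centroid_distance(X)` corresponds to the setting used for DBSCAN, CFSFDP and FKMCE.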

4.5 Experimental results

The ARI and NMI values of these algorithms are shown in Tables 3 and 4, respectively. Some of these data are taken from [5, 4]. With the highest value in each column shown in bold, it is easy to see that the MAGNESCE algorithm is more accurate than the other algorithms on most of the given datasets. One reason is that the fuzzy $k$-means fails to divide part of the data correctly. Without an evaluation of label credibility, ensemble algorithms may fail to recognize datasets whose clusters have different shapes. By considering label credibility, the FKMCE and MAGNESCE algorithms can recognize these clusters. Because AGNES performs better than fuzzy $k$-means, and because the centers of the base clusterings in MAGNESCE better reflect the points in their neighborhoods, the MAGNESCE algorithm proposed in this paper outperforms FKMCE. Moreover, as a form of hierarchical clustering, AGNES is far less capable of identifying non-convex datasets than nonlinear clustering algorithms such as DBSCAN. However, the tables show that MAGNESCE is even more effective than the three nonlinear clustering algorithms (NSC, DBSCAN, CFSFDP) in most cases. This means that, from a performance point of view, the proposed framework is not inferior to the nonlinear clustering methods.

According to Tables 3 and 4, the ARI and NMI values on real datasets are generally lower than those on artificial datasets. This is because real datasets have more features than artificial datasets, which makes the similarity matrix $\Delta$ built during the algorithm strongly sparse and interferes with the final spectral clustering. As the dimensionality of the data rises, the outcomes of different clustering algorithms vary depending on the internal structure of the data. For example, our algorithm still lags behind spectral clustering on the Wine dataset, but it outperforms spectral clustering on the Digits dataset. Overall, the average of our algorithm's metrics is higher than that of the other algorithms. Our future research will start with dimensionality reduction of real data to improve the algorithm's performance on real datasets.
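As a quick check of the averaging claim, the following sketch recomputes the mean NMI over the 13 datasets for NSC, FKMCE and MAGNESCE. The values are copied from the NMI table (Table 4); the restriction to these three methods and the variable names are choices made here for brevity, not part of the original analysis.

```python
# NMI values copied from Table 4, in the column order:
# Ring, Atom, T4.8k, Chainlink, Flame, Complex, Agg, Jain,
# Digits, Wine, Statlog, Iris, Breast
nmi = {
    "NSC": [1.0000, 1.0000, 0.9538, 1.0000, 0.7770, 0.9853, 0.9271,
            1.0000, 0.8119, 0.9016, 0.6243, 0.7980, 0.6328],
    "FKMCE": [1.0000, 1.0000, 0.9840, 1.0000, 0.9028, 0.9946, 0.9869,
              1.0000, 0.8919, 0.8667, 0.6774, 0.8381, 0.6667],
    "MAGNESCE": [1.0000, 1.0000, 0.9994, 1.0000, 0.9635, 1.0000, 0.9958,
                 1.0000, 0.9147, 0.8782, 0.6656, 0.8851, 0.7459],
}

# Mean NMI per method across all 13 datasets.
mean_nmi = {name: sum(values) / len(values) for name, values in nmi.items()}

# Number of datasets on which MAGNESCE is best or tied among these three.
n_datasets = len(nmi["MAGNESCE"])
best_or_tied = sum(
    1
    for j in range(n_datasets)
    if nmi["MAGNESCE"][j] >= max(values[j] for values in nmi.values())
)

for name, value in sorted(mean_nmi.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.4f}")
print("MAGNESCE best or tied on", best_or_tied, "of", n_datasets, "datasets")
```

The two datasets on which MAGNESCE is not the best of the three are Wine (NSC is higher) and Statlog (FKMCE is higher), which matches the discussion above.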

In addition, when $T_{\max}$ is set sufficiently large, the number of base clusterings depends on the parameter $\epsilon$. This part tests the influence of $\epsilon$ on the framework. Set $T_{\max}=1000$ and take the Iris and Wine data as examples. As can be seen from Figs 6a and 7a, the value of $T$ tends to decrease gradually as $\epsilon$ increases. However, Figs 6b and 7b show that ARI and NMI do not follow this trend; instead, they peak at a certain value of $\epsilon$. Thus, the choice of $\epsilon$ is crucial for obtaining good results with the MAGNESCE algorithm.

Figure 6.

(a) Effect of parameter $\epsilon$ on the value of $T$ on the iris data. (b) Effect of parameter $\epsilon$ on the ARI and NMI of the iris data.

Figure 7.

(a) Effect of parameter $\epsilon$ on the value of $T$ on the wine data. (b) Effect of parameter $\epsilon$ on the ARI and NMI of the wine data.

4.6 Effect of noise points and missing values

Noise points and missing values are two main factors that affect clustering performance. This section discusses how different treatments of noise points and missing values affect the clustering results.

4.6.1 Effect of noise points

This part takes the T4.8k dataset as an example to discuss noise points. Noise points are added to the T4.8k dataset, and the resulting dataset is named T4.8k_noise. The clustering results of the algorithm on the original and noisy datasets are shown in Fig. 8a and b. A comparison of the two datasets is given in Table 5. Table 6 lists the ARI and NMI values of the MAGNESCE algorithm on the two datasets; both values are lower on the noisy dataset than on the original dataset because of the noise points.

These results make it clear that noise points affect the algorithm's performance to some extent. Consequently, cluster analysis is typically performed on pre-processed, noise-free data to achieve the best results.
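The source does not state how the noise points were generated. One common way to build such a noisy variant, shown here purely as an assumed sketch, is to draw points uniformly inside the bounding box of the data and give them their own label, which would be consistent with Table 5, where $K$ grows from 6 to 7. The function name and the `noise_label` argument are hypothetical.

```python
import random


def add_uniform_noise(points, labels, n_noise, noise_label, seed=0):
    """Append n_noise points drawn uniformly from the bounding box of
    `points`, all labelled `noise_label` (assumed noise model)."""
    rng = random.Random(seed)
    dim = len(points[0])
    lows = [min(p[k] for p in points) for k in range(dim)]
    highs = [max(p[k] for p in points) for k in range(dim)]
    noise = [
        tuple(rng.uniform(lows[k], highs[k]) for k in range(dim))
        for _ in range(n_noise)
    ]
    return list(points) + noise, list(labels) + [noise_label] * n_noise


# Toy usage on four 2-D points with two classes.
toy_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.0, 2.0)]
toy_labels = [0, 0, 1, 1]
noisy_points, noisy_labels = add_uniform_noise(
    toy_points, toy_labels, n_noise=3, noise_label=2
)
print(len(noisy_points), sorted(set(noisy_labels)))
```

ARI and NMI can then be computed on both versions of the data to reproduce the kind of comparison reported in Table 6.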

Table 5
Comparison of original and noisy T4.8k dataset

Data set	N	D	K
T4.8k_origin	7235	2	6
T4.8k_noise	7901	2	7

Table 6
ARI and NMI for the original and noisy datasets

Data set	ARI	NMI
T4.8k_origin	0.9998	0.9994
T4.8k_noise	0.8509	0.8559

Figure 8.

(a) No noise points (b) With noise points.

4.6.2 Effect of missing values

Table 7
Description of the Dermatology dataset

Data set	Dermatology
N	358
D	34
K	6
M	8

Table 8
ARI and NMI obtained by different processing methods

Data set	ARI	NMI
Dermatology_delet	0.8443	0.9022
Dermatology_mean	0.8096	0.8684

Since real datasets normally contain a certain number of missing values, this section investigates how different missing-value processing methods affect the algorithm's performance. This section uses the dermatology dataset [1], whose details are given in Table 7. The final row of Table 7 gives the number of missing values (M): the dataset has 8 missing values. Two common methods are used to handle them: the deletion method and the mean filling method. The deletion method deletes every data object that contains a missing value, whereas the mean filling method replaces each missing value with the mean of the column in which it occurs.

Applying these two methods to the dermatology dataset yields two new datasets, named Dermatology_delet and Dermatology_mean. Table 8 compares the algorithm's ARI and NMI on these two datasets. MAGNESCE obtains higher ARI and NMI values on Dermatology_delet than on Dermatology_mean. This is because the deletion method removes the data with missing values without changing the original attributes of the remaining data, whereas the mean filling method is very likely to introduce some deviation into the original data, which affects the algorithm's clustering performance. As a result, when there are only a few missing values, the deletion method should be considered.
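The two treatments can be illustrated with a short sketch. It assumes that the data is a list of rows and that a missing entry is encoded as `None`; both the encoding and the function names are assumptions made for illustration.

```python
def delete_incomplete_rows(rows):
    """Deletion method: drop every row that contains a missing value."""
    return [list(r) for r in rows if all(v is not None for v in r)]


def mean_fill(rows):
    """Mean filling method: replace each missing value with the mean of
    the observed values in the same column (assumes every column has at
    least one observed value)."""
    dim = len(rows[0])
    means = []
    for k in range(dim):
        observed = [r[k] for r in rows if r[k] is not None]
        means.append(sum(observed) / len(observed))
    return [
        [means[k] if r[k] is None else r[k] for k in range(dim)]
        for r in rows
    ]


# Toy usage: the second row has a missing first attribute.
toy_rows = [[1.0, 2.0], [None, 4.0], [3.0, 6.0]]
print(delete_incomplete_rows(toy_rows))
print(mean_fill(toy_rows))
```

Deletion keeps only observed values but shrinks the dataset, while mean filling keeps every object at the cost of inserting values that were never measured, which is the trade-off discussed above.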

5. Conclusion

This paper proposes an ensemble framework called MAGNESCE that integrates multiple AGNES clusterings. The algorithm has three main steps: constructing multiple AGNES clusterings, extracting label-credible objects in the local space of cluster centers, and producing the relation among clusters. This strategy improves the robustness and quality of AGNES so that arbitrarily shaped clusters can be identified well. This paper also compares the MAGNESCE algorithm with other commonly used clustering algorithms and with a multiple fuzzy k-means ensemble algorithm on several synthetic and real datasets. The experimental results show that the MAGNESCE algorithm is more effective than these algorithms on most datasets. In future work, we hope to propose an efficient and general ensemble framework that does not require complex parameter tuning.

Acknowledgments

The authors are very grateful to the editors and reviewers for their valuable comments and suggestions to improve the quality of this manuscript.

References

1. [dataset] UCI Machine Learning Repository. http://archive.ics.uci.edu/ml/datasets/. Accessed 2020.

2. Avogadri and Valentini, Fuzzy ensemble clustering based on random projections for DNA microarray data analysis, Artificial Intelligence in Medicine 45(2–3) (2009), 173–183.

3. Ayad H.G. and Kamel M.S., Cluster-based cumulative ensembles, in: International Workshop on Multiple Classifier Systems, pages 236–245, Springer, 2005.

4. Bai, Liang and Cao, A multiple k-means clustering ensemble algorithm to find nonlinearly separable clusters, Information Fusion 61 (2020), 36–47.

5. Bai, Liang and Guo, An ensemble clusterer of multiple fuzzy k-means clusterings to recognize arbitrarily shaped clusters, IEEE Transactions on Fuzzy Systems 26(6) (2018), 3524–3533.

6. Barton, [dataset] Clustering benchmarks. https://github.com/deric/clustering-benchmark. Accessed 2020.

7. Boulis and Ostendorf, Combining multiple clustering systems, in: European Conference on Principles of Data Mining and Knowledge Discovery, pages 63–74, Springer, 2004.

8. Bradley P.S. and Fayyad U.M., Refining initial points for k-means clustering, in: ICML, Vol. 98, pages 91–99, Citeseer, 1998.

9. Dimitriadou, Weingessel and Hornik, Voting-merging: An ensemble method for clustering, in: International Conference on Artificial Neural Networks, pages 217–224, Springer, 2001.

10. Dimitriadou, Weingessel and Hornik, A combination scheme for fuzzy clustering, International Journal of Pattern Recognition and Artificial Intelligence 16(07) (2002), 901–912.

11. Dudoit and Fridlyand, Bagging to improve the accuracy of a clustering procedure, Bioinformatics 19(9) (2003), 1090–1099.

12. Ester, Kriegel H.-P., Sander et al., A density-based algorithm for discovering clusters in large spatial databases with noise, in: KDD, Vol. 96, pages 226–231, 1996.

13. Fern X.Z. and Brodley C.E., Random projection for high dimensional data clustering: A cluster ensemble approach, in: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 186–193, 2003.

14. Fischer and Buhmann J.M., Bagging for path-based clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 25(11) (2003), 1411–1415.

15. Fred A.L. and Jain A.K., Combining multiple clusterings using evidence accumulation, IEEE Transactions on Pattern Analysis and Machine Intelligence 27(6) (2005), 835–850.

16. Frossyniotis, Pertselakis and Stafylopatis, A multi-clustering fusion algorithm, in: Hellenic Conference on Artificial Intelligence, pages 225–236, Springer, 2002.

17. Gionis, Mannila and Tsaparas, Clustering aggregation, ACM Transactions on Knowledge Discovery from Data (TKDD) 1(1) (2007), 4–es.

18. Han, Kamber and Pei, Data mining concepts and techniques third edition, The Morgan Kaufmann Series in Data Management Systems 5(4) (2011), 83–124.

19. Iam-On, Boongoen, Garrett and Price, A link-based approach to the cluster ensemble problem, IEEE Transactions on Pattern Analysis and Machine Intelligence 33(12) (2011), 2396–2409.

20. Kaufman and Rousseeuw P.J., Finding groups in data: an introduction to cluster analysis, Vol. 344, John Wiley & Sons, 2009.

21. Hao and Li, Clustering ensembles based on normalized edges, in: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 664–671, Springer, 2007.

22. Munkres, Algorithms for the assignment and transportation problems, Journal of the Society for Industrial and Applied Mathematics 5(1) (1957), 32–38.

23. Ng A.Y., Jordan M.I., Weiss et al., On spectral clustering: Analysis and an algorithm, Advances in Neural Information Processing Systems 2 (2002), 849–856.

24. Nguyen and Caruana, Consensus clusterings, in: Seventh IEEE International Conference on Data Mining (ICDM 2007), pages 607–612, IEEE, 2007.

25. Press W.H., Teukolsky S.A., Vetterling W.T. and Flannery B.P., Numerical recipes 3rd edition: The art of scientific computing, Cambridge University Press, 2007.

26. Rand W.M., Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association 66(336) (1971), 846–850.

27. Rodriguez and Laio, Clustering by fast search and find of density peaks, Science 344(6191) (2014), 1492.

28. Shi and Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8) (2000), 888–905.

29. Strehl and Ghosh, Cluster ensembles – a knowledge reuse framework for combining multiple partitions, Journal of Machine Learning Research 3(Dec) (2002), 583–617.

30. Topchy, Jain A.K. and Punch, Clustering ensembles: Models of consensus and weak partitions, IEEE Transactions on Pattern Analysis and Machine Intelligence 27(12) (2005), 1866–1881.

31. Topchy A.P., Law M.H., Jain A.K. and Fred A.L., Analysis of consensus partition in cluster ensemble, in: Fourth IEEE International Conference on Data Mining (ICDM'04), pages 225–232, IEEE, 2004.

32. Wong H.-S. and Wang, Graph-based consensus clustering for class discovery from gene expression data, Bioinformatics 23(21) (2007), 2888–2896.

33. Zhou Z.-H., Ensemble methods: foundations and algorithms, CRC Press, 2012.

34. Zhou Z.-H. and Tang, Clusterer ensemble, Knowledge-Based Systems 19(1) (2006), 77–83.

35. MacQueen et al., Some methods for classification and analysis of multivariate observations, in: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, Oakland, CA, USA, 1967, pp. 281–297.

36. Bezdek J.C., Pattern recognition with fuzzy objective function algorithms, Springer Science & Business Media, 2013.

37. Hathaway R.J. and Bezdek J.C., Recent convergence results for the fuzzy c-means clustering algorithms, Journal of Classification 5(2) (1988), 237–247.

38. Ayad and Kamel, Finding natural clusters using multi-clusterer combiner based on shared nearest neighbors, in: International Workshop on Multiple Classifier Systems, Springer, 2003, pp. 166–175.

39. Bezdek J.C. and Pal N.R., Some new indexes of cluster validity, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 28(3) (1998), 301–315.

40. Pal N.R. and Bezdek J.C., On cluster validity for the fuzzy c-means model, IEEE Transactions on Fuzzy Systems 3(3) (1995), 370–379.

41. Rathore, Bezdek J.C., Erfani S.M., Rajasegarar and Palaniswami, Ensemble fuzzy clustering using cumulative aggregation on random projections, IEEE Transactions on Fuzzy Systems 26(3) (2018), 1510–1524.
