Manifold regularization ensemble clustering with many objectives using unsupervised extreme learning machines

Abstract

Spectral clustering has been an effective clustering method in recent decades because it can reach an optimal solution without any assumptions on the structure of the data. The key ingredient of spectral clustering is its similarity matrix. Despite many empirical successes in similarity matrix construction, almost all previous methods are limited to a single objective. To address multi-objective ensemble clustering, we introduce a new ensemble manifold regularization (MR) method based on the stacking framework. In our Manifold Regularization Ensemble Clustering (MREC) method, several objective functions are considered simultaneously to construct the similarity matrix robustly. Given this matrix, the unsupervised extreme learning machine (UELM) is employed to find the generalized eigenvectors that embed the data in a low-dimensional space. These eigenvectors are then used as the basis for spectral clustering to find the best partitioning of the data. The aims of this paper are to find a robust partitioning that satisfies multiple objectives, to handle noisy data, to preserve diversity-based goals, and to reduce dimensionality. Experiments on several real-world datasets and three benchmark protein datasets demonstrate the superiority of MREC over some state-of-the-art single and ensemble methods.

Keywords

Spectral clustering, ensemble learning, manifold regularization, stacking framework, similarity matrix, unsupervised extreme learning machine

1. Introduction

Spectral clustering has become one of the most interesting topics in data clustering because it can reach an optimal solution without any assumptions on the structure of the data and can find globally optimal results when processing non-convex data [1]. It is simple to implement and can be solved efficiently with standard linear algebra software. It obtains a low-dimensional representation of the data using eigenvectors, and very often outperforms traditional clustering algorithms such as $k$ -means [2].

From another point of view, spectral clustering is similar to manifold regularization (MR), which deals with deeper properties of the data [3]. In this paper, this view is used to construct an MR ensemble from the data by using different clustering methods. Generally, there are two approaches in MR to deal with traditional risk minimization: localization and penalization. MR uses penalization based on the geometry of the marginal distribution of the data. It is a form of regularization that prevents overfitting and ensures that a problem is well-posed by penalizing complex solutions [3]. More precisely, MR extends Tikhonov regularization as applied to reproducing kernel Hilbert spaces (RKHSs). In an RKHS, standard Tikhonov regularization attempts to learn a function $f$ from a hypothesis space ${\cal H}$ . Such a space is associated with a kernel $K$ , under which every candidate function $f$ has a norm $||f||_{K}$ that represents its complexity in the hypothesis space. When a candidate function is considered, its norm is taken into account to penalize complex functions [3].

MR adds an intrinsic regularizer as a second regularization term to standard Tikhonov regularization. Under the manifold assumption in machine learning, the data come from a nonlinear manifold $M\subset X$ instead of the entire input space $X$ . The geometry of this manifold is used to determine the regularization norm. As a result, the Laplacian matrix is tied to the structure of this manifold and plays the major role when regularization is applied in practice [3]. The Laplacian matrix is calculated from the similarity matrix [4]. The similarity matrix is provided as an input and consists of a quantitative assessment of the relative similarity of each pair of instances in the data. Thus, one of the most important factors governing the performance of spectral clustering is this similarity matrix.

The goal of the similarity matrix is to model the local neighborhood relationships between data instances. There are several popular constructions for transforming given data into a similarity matrix. The choice of the similarity matrix influences the spectral clustering result, and so much research has been done to find more effective similarity matrices [5]. However, almost all these methods are limited to a single objective and cannot capture all the characteristics of the data in their similarity matrices. To address this deficit, we introduce Manifold Regularization Ensemble Clustering [2] (MREC) as a new MR ensemble method based on the stacking framework, which exploits the ability of different clustering methods to construct similarity matrices. These abilities support many implicit objectives such as data density, cluster compactness and connectedness, cluster isolation and so on.

MREC has three basic steps. First, the MR ensemble, based on the stacking framework, is built with different objectives. Stacking is an ensemble method with a different learner (function) in each bag, so that the data are clustered from different views. The model combines effective algorithms with different goals so that all of these characteristics are supported. In the next step, the predictions of the stacking bags are aggregated to produce the similarity matrix, according to the upper-bound MR ensemble formulas proposed in this paper, and then the Laplacian matrix is calculated. Finally, the unsupervised extreme learning machine (UELM) [6] is used to embed the data in a new space and then to find the best partitions.

Some benchmark datasets are used to demonstrate the superiority of MREC. In addition, rigorous statistical tests are conducted to assess its significance. Our contributions in this paper are summarized as follows. i) Employing a new method, in the stacking framework, to find the similarity matrix while keeping diversity among the results. ii) Supporting many implicit objectives such as data density, cluster compactness and connectedness, and cluster isolation in constructing the similarity matrix. iii) Using a multi-objective approach to find the best structures of the data, based on the proposed upper-bound formulas for the MR ensemble method. iv) Using UELMs to obtain an effective learning mechanism in both the reproducing kernel Hilbert space and the new embedding space.

The rest of this paper is organized as follows. In Section 2, we briefly review some related work on spectral clustering and MR ensembles. Our MREC, as an MR ensemble using UELM, is explained in Section 3. Section 4 reports the experimental results and evaluations on various datasets. Finally, Section 5 presents a summary and some suggestions.

2. Literature review

Spectral clustering is a clustering technique based on graph theory. In recent years, it has attracted more and more attention because of its theoretical foundation and good clustering results [3, 4], and many spectral clustering algorithms have been proposed. In [7], Shi and Malik proposed the normalized cut. Their criterion considers both internal and external connections and can produce well-balanced clustering results, while Fred and Jain applied agglomerative hierarchical clustering [8].

Tao et al. proposed a robust representation of the co-association matrix through a low-rank constraint [9]. Their representation could reveal the cluster structure of a co-association matrix and thereby handle noise in spectral clustering. In [10], the authors introduced a sub-matrix based approximation as a powerful and efficient tool for spectral clustering. They used Fourier features to represent the data objects in the kernel space and then built an NP sub-matrix on which an efficient eigen-decomposition was performed.

Huang et al. interpreted the sparse sub-matrix as a bipartite graph and then used the transfer cut to efficiently partition the graph and obtain the best clustering result in spectral clustering [11]. Liu et al. [4] proposed an efficient spectral ensemble clustering based on co-association matrix and showed that their method has theoretical equivalence to weighted $k$ -means clustering while vastly reducing the algorithmic complexity. Their method was robust and generalized.

Yousefnezhad and Zhang developed a weighted spectral cluster ensemble with two kernels by exploiting concepts from community detection and graph-based clustering [12]. They provided an effective diversity estimation for individual clustering results by using modularity. Wu et al. [13] transformed ensemble clustering into a $k$ -means clustering problem with the $k$ -means based consensus clustering (KCC) utility function and gave necessary and sufficient conditions for KCC functions.

Although these methods performed better than traditional clustering algorithms, they could only handle one objective. Ensemble clustering, on the other hand, tries to use all the abilities and characteristics of the data for better performance. In this regard, different clustering algorithms may produce distinct partitions because they impose different views on the data. The goal of ensemble clustering is to exploit the complementary nature of these diverse partitions. Gullo et al. [14] proposed projective clustering ensembles for high-dimensional problems, based on two representations: object and feature. They used a single-objective formulation based on a cost function, and an evolutionary algorithm for the multi-objective formulation. They later extended the projective clustering ensembles using meta-clusters [15]. They aimed to find the correlation between object-based and feature-based representations and proposed a measure that considers both representations for better performance.

The research of Franek and Jiang [16] addressed ensemble clustering by means of cluster embedding in vector spaces. They computed the median of the vectors and converted these median vectors into a clustering, using a mapping that is not bijective. The authors in [17] presented a framework for hierarchical ensemble clustering. They constructed a consensus distance matrix from aggregated dendrogram distances and obtained the final clusters by hierarchical clustering on this consensus distance matrix. Gonzàlez and Turmo [18] proposed an unsupervised ensemble minority clustering. Their soft clustering was fuzzy and could assign an accumulated score to each object by averaging a co-association matrix.

Jing et al. proposed a stratified feature sampling method for ensemble clustering of high-dimensional data [19]. They constructed ensembles by clustering similar features, instead of selecting features at random, and then selecting features from each cluster. In their method, features were clustered and then each cluster of features was sampled. Finally, a graph was created from the cluster weights and partitioned by Jaccard similarity. In [20], the authors presented an incremental semi-supervised clustering ensemble for high-dimensional data. Their ensemble was built from random feature subspaces. For each subspace, a clustering was produced by the graph-based e2cp algorithm. This algorithm took into account the limited prior knowledge of must-link and cannot-link constraints, and used a graph and label propagation to construct a co-association matrix. Finally, the $n$ -cut algorithm was used to generate the final clustering.

The research reviewed above shows that the performance of spectral clustering methods depends greatly on the construction of the similarity matrix. Despite some improvements in constructing similarity matrices, existing methods often handle just one objective. To address this issue, we introduce an ensemble clustering method with many objectives in the stacking framework to construct the similarity matrix.

3. MREC, the proposed ensemble clustering method

As noted above, the key ingredient of spectral clustering is its similarity matrix, $W=[{w_{ij}}]$ . If spectral clustering cannot capture the main manifold structure of the data, its performance may be inferior to $k$ -means clustering. Intuitively, if the structure of the data, under different objectives, is taken into account in the clustering process, the performance of the ensemble can improve. In this paper, we focus on many objectives in ensemble clustering to find the ensemble manifolds and use the formulas in [8] to find the Laplacian matrix, based on the structure of the data. The UELM is then used to embed the data in a new space.

3.1 Customizing UELM for MREC

UELM is built on two assumptions: i) The manifold assumption states that the data lie on a manifold of much lower dimension than the input space. Therefore, this manifold can be learnt from unlabeled data to reduce the data dimension and so avoid the curse of dimensionality. ii) The smoothness assumption requires that the conditional probabilities of two close instances $x_{i}$ and $x_{j}$ be similar as well. To enforce this assumption, the MR framework [3] is used to minimize the objective function:

$\displaystyle\hat{L}_{m}=\frac{1}{2}\sum_{i,j}w_{ij}||\hat{y}_{i}-\hat{y}_{j}||^{2}$ (1)

where $w_{ij}$ is the pairwise similarity between $x_{i}$ and $x_{j}$ and $\hat{y}_{i}$ and $\hat{y}_{j}$ are their predictions, respectively. This expression can be simplified in a matrix form as:

$\displaystyle\hat{L}_{m}=Tr(\hat{Y}^{T}L\hat{Y})$ (2)

where $Tr(.)$ indicates the trace of a matrix and $L$ is the Laplacian matrix, defined below. MR brings three distinct concepts together: spectral clustering, manifold learning, and regularization in RKHS. The formula in (2) has the same form as the spectral clustering objectives used in [3]. The Laplacian matrix $L$ is defined as [4]:

$\displaystyle L=D-W$ (3)

where $D=[{d_{ij}}]$ is a diagonal matrix whose diagonal elements are computed as:

$\displaystyle d_{ii}=\sum_{j}w_{ij}$ (4)

Instead of using $L$ directly, it can be normalized by $D^{-\frac{1}{2}}LD^{-\frac{1}{2}}$ , as mentioned in [4]. In this paper, however, we use MR ensemble [3] and UELM [6] approaches to propose a new ensemble clustering method, MREC, that can estimate the Laplacian matrix and model the structure of data more efficiently.
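As a small numerical illustration of Eqs (1) to (4), the following sketch builds $L=D-W$ and its normalized form, and checks that the pairwise form in Eq. (1) and the trace form in Eq. (2) agree. The toy matrices `W` and `Y` and the function names are hypothetical and are not taken from the paper; the code also assumes a symmetric similarity matrix with strictly positive row sums.

```python
import numpy as np

def graph_laplacians(W):
    """Return (L, L_norm) for a symmetric, non-negative similarity matrix W."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                       # Eq. (4): d_ii = sum_j w_ij
    L = np.diag(d) - W                      # Eq. (3): L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(d)           # assumption: every degree is positive
    L_norm = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]  # D^{-1/2} L D^{-1/2}
    return L, L_norm

def smoothness_pairwise(W, Y):
    """Eq. (1): 0.5 * sum_{i,j} w_ij * ||y_i - y_j||^2."""
    diff = Y[:, None, :] - Y[None, :, :]
    return 0.5 * np.sum(W * np.sum(diff ** 2, axis=2))

# Hypothetical toy data: 3 instances with 2-dimensional predictions.
W = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
Y = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])

L, L_norm = graph_laplacians(W)
pairwise = smoothness_pairwise(W, Y)        # Eq. (1)
trace_form = np.trace(Y.T @ L @ Y)          # Eq. (2)
print(pairwise, trace_form)                 # both evaluate to 24.0
```

For this toy input the three pairs contribute 2·2 + 1·5 + 3·5 = 24, which matches the trace form.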

UELM consists of two main stages. The first stage constructs a hidden layer with $n_{h}$ neurons that maps the data from the input space into an $n_{h}$ -dimensional feature space. The second stage finds the output weights by using the spectral clustering objective function in (2) as regularization in RKHS. The optimization formula of UELM, based on the MR ensemble approach and expressed as a single objective function, is written as [6]:

$\displaystyle{\min}_{\beta}||\beta||^{2}+\sum_{k}\lambda_{k}Tr(\beta^{T}H^{T}(I-(D^{k})^{-\frac{1}{2}}W^{k}(D^{k})^{-\frac{1}{2}})H\beta)$ (5)

where $H$ is the hidden-layer output matrix obtained with the random weights of the first stage of UELM. Also, $\beta$ is the output weight matrix, whose columns are taken from the eigenvectors corresponding to the smallest eigenvalues of a generalized eigenvalue problem, as shown below. Indeed, $\beta$ carries the most important information about the data in lower dimensions. In Eq. (5), $k$ is the index of bags in the stacking framework and $\lambda_{k}$ is the regularization parameter of the $k$ th bag.

The $k$ th clustering algorithm in the MR ensemble considers the data from its own point of view and creates a similarity matrix $W^{k}=[{w_{ij}^{k}}]$ for $k=1,\ldots,n_{a}$ . To solve this multi-objective problem, it is common to find an upper bound and then minimize it:

$\displaystyle||\beta||^{2}+\sum_{k}\lambda_{k}Tr(\beta^{T}H^{T}(I-(D^{k})^{-\frac{1}{2}}W^{k}(D^{k})^{-\frac{1}{2}})H\beta)\leqslant||\beta||^{2}+\lambda Tr(\beta^{T}H^{T}(I-D^{*-\frac{1}{2}}WD^{*-\frac{1}{2}})H\beta)$ (6)

where

$\displaystyle W=\sum_{k}W^{k}$ (7)

and $D^{\ast}$ is defined as:

$\displaystyle d_{ii}^{\ast}=\sum_{j}\sum_{k}w_{ij}^{k}$ (8)

Also, $\lambda$ , as the regularization parameter, would be:

$\displaystyle\lambda={\max}_{k}\lambda_{k}$ (9)

The final optimization problem is:

$\displaystyle{\min}_{\beta}||\beta||^{2}+\lambda Tr(\beta^{T}H^{T}L^{\ast}H\beta)$ (10)

where $L^{\ast}$ implicitly induces a structure on the data in the ambient space:

$\displaystyle L^{\ast}=I-D^{*-\frac{1}{2}}WD^{*-\frac{1}{2}}$ (11)
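Equations (7), (8) and (11) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `ensemble_laplacian` and the two toy matrices are hypothetical, each $W^{k}$ is assumed symmetric and non-negative, and every row of the summed matrix is assumed to have a positive sum.

```python
import numpy as np

def ensemble_laplacian(similarity_matrices):
    """Combine per-bag similarity matrices W^k into W and L* (Eqs 7, 8, 11)."""
    W = np.sum([np.asarray(Wk, dtype=float) for Wk in similarity_matrices], axis=0)  # Eq. (7)
    d_star = W.sum(axis=1)                       # Eq. (8): d*_ii = sum_j sum_k w^k_ij
    d_inv_sqrt = 1.0 / np.sqrt(d_star)           # assumption: d*_ii > 0 for all i
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    L_star = np.eye(W.shape[0]) - S              # Eq. (11)
    return W, L_star

# Hypothetical bags for 3 instances: bag 1 groups {x1, x2}, bag 2 groups {x2, x3}.
W1 = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]])
W2 = np.array([[1, 0, 0], [0, 1, 1], [0, 1, 1]])

W_sum, L_star = ensemble_laplacian([W1, W2])
print(W_sum)
print(np.round(L_star, 3))
```

A quick sanity check on the result is that $L^{\ast}$ is symmetric and that the vector of square roots of the degrees $d_{ii}^{\ast}$ lies in its null space, as expected for a normalized Laplacian.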

According to our experiments, this Laplacian matrix $L^{\ast}$ corresponds to a more accurate underlying structure of the data. The objective in Eq. (10) is always minimized at $\beta=0$ . To avoid this degenerate solution, an additional constraint is introduced [6]:

$\displaystyle{\min}_{\beta}||\beta||^{2}+\lambda Tr(\beta^{T}H^{T}L^{\ast}H\beta)\text{ s.t. }(H\beta)^{T}H\beta=I_{n_{o}}$ (12)

An optimal solution to Eq. (12) is given by choosing $\beta$ as the matrix whose columns are the normalized eigenvectors corresponding to the first $n_{o}$ smallest eigenvalues of the generalized eigenvalue problem [6]:

$\displaystyle Av=\gamma Bv$ (13)

where $\gamma$ and $v$ are eigenvalues and eigenvectors, respectively. Also, $A=I_{n_{h}}+\lambda H^{T}L^{\ast}H$ and $B=H^{T}H$ . Let $\gamma_{1},\gamma_{2},\ldots,\gamma_{n_{o}+1}$ ( $\gamma_{1}\leqslant\gamma_{2}\leqslant\ldots\leqslant\gamma_{n_{o}+1}$ ) be the $n_{o}+1$ smallest eigenvalues of Eq. (13) and $v_{1},v_{2},\ldots,v_{n_{o}+1}$ be their corresponding eigenvectors. Then, the solution to the output weights $\beta$ is given by:

$\displaystyle\beta^{\ast}=\left[\frac{v_{2}}{||Hv_{2}||},\frac{v_{3}}{||Hv_{3}||},\ldots,\frac{v_{n_{o}+1}}{||Hv_{n_{o}+1}||}\right]$ (14)

Using the output weights $\beta^{\ast}$ , the embedded data in new $n_{o}$ -dimensional feature space, $E$ , is computed as:

$\displaystyle E=H\beta^{\ast}$ (15)
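Equations (13) to (15) might be sketched as follows. Several choices here are assumptions that the text does not specify: the sigmoid activation and Gaussian random weights of the hidden layer, the small `ridge` added to $B$ so that it stays positive definite, the Cholesky reduction to a standard symmetric eigenproblem, and all function and variable names. It is a minimal sketch, not the authors' implementation.

```python
import numpy as np

def uelm_embedding(X, L_star, n_h, n_o, lam, seed=0, ridge=1e-8):
    """Embed X into n_o dimensions by solving A v = gamma B v (Eqs 13 to 15)."""
    if n_h < n_o + 1:
        raise ValueError("need at least n_o + 1 hidden neurons")
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Stage 1: random hidden layer (sigmoid activation is an assumption).
    weights = rng.standard_normal((d, n_h))
    bias = rng.standard_normal(n_h)
    H = 1.0 / (1.0 + np.exp(-(X @ weights + bias)))       # n x n_h
    # Stage 2: generalized eigenvalue problem of Eq. (13).
    A = np.eye(n_h) + lam * H.T @ L_star @ H
    B = H.T @ H + ridge * np.eye(n_h)                      # ridge is an assumption
    # Reduce to a standard symmetric problem with B = C C^T.
    C = np.linalg.cholesky(B)
    M = np.linalg.solve(C, np.linalg.solve(C, A).T).T      # C^{-1} A C^{-T}
    M = 0.5 * (M + M.T)                                    # remove rounding asymmetry
    gamma, U = np.linalg.eigh(M)                           # eigenvalues ascending
    V = np.linalg.solve(C.T, U)                            # v = C^{-T} u
    V_sel = V[:, 1:n_o + 1]                                # v_2 ... v_{n_o+1}
    beta = V_sel / np.linalg.norm(H @ V_sel, axis=0)       # Eq. (14)
    E = H @ beta                                           # Eq. (15)
    return E, beta, gamma[:n_o + 1]

# Hypothetical toy run: 30 instances, 3 equal-sized groups define the similarity.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 4))
groups = np.repeat([0, 1, 2], 10)
W = (groups[:, None] == groups[None, :]).astype(float)
deg = W.sum(axis=1)
L_star = np.eye(30) - W / np.sqrt(np.outer(deg, deg))
E, beta, gamma = uelm_embedding(X, L_star, n_h=8, n_o=2, lam=0.1)
print(E.shape, beta.shape)
```

The first eigenvector is skipped to follow Eq. (14), and each retained eigenvector is scaled so that the corresponding column of $E$ has unit norm.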

To reach this point, the similarity matrices $W^{k}$ in Eq. (7) must be available; their construction is described in the next subsection.

3.2 Clustering with different objectives to build the similarity matrix

Clustering with different objectives, in which multiple objective functions are simultaneously optimized, has been used as a robust alternative method and has become popular in the past decades [21]. In this section, we propose a stacking framework, which uses some clustering algorithms with multiple objectives, in order to build a diverse ensemble of clusters and then to fill the similarity matrix, $W$ . The clustering results would be improved through properly combining the multiple learners.

Stacking is an efficient ensemble model that performs well despite its simplicity. To benefit from the advantages of all clustering algorithms, including isolation, compactness and connectedness of clusters, modelling data density, minimizing entropy, and so on, we choose clustering algorithms from all categories to keep diversity among clusters and obtain better performance. These categories, including some effective algorithms with different objectives, are outlined here.

The partitioning approaches, which construct various partitions and evaluate them by some criteria, include $k$ -means [2], $k$ -medoids [22], and CLARANS [23]. The hierarchical schemes, which use criteria to create a hierarchical decomposition of data, are Diana [24], Agnes [24], BIRCH [25], and CURE [26]. The density-based methods, which create clusters based on connectivity and density functions, consist of DBSCAN [27], OPTICS [28], and Denclue [29]. The model-based algorithms, which hypothesize a model for each of the clusters and try to find the best fit of the model to each cluster, include SOM [30] and COBWEB [31]. Finally, the frequent pattern-based method, which is based on the analysis of frequent patterns, is pCluster [32]. In this paper, we use $k$ -means from the partitioning approaches, DBSCAN for preserving density, CURE for hierarchy and distance learning, SOM as the model-based method, and pCluster as the frequent pattern-based method. Figure 1 shows the overall block diagram of our MREC.

Figure 1.

A brief view of MREC.

By using $n_{a}$ clustering methods with different objectives, the ensemble of clusterings is constructed. Each ensemble member, indeed, reveals the structure of the data as a manifold. Suppose bag ${b}_{k}$ denotes the $k$ th ensemble member, containing the clustering result of the $k$ th method, as represented by $W^{k}=[{w_{ij}^{k}}]$ . In this bag, each data instance resides in just one cluster. In order to find the best structure of the data, these bags are used to fill the similarity matrix $W=[{w_{ij}}]$ , according to the upper-bound formula in Section 3.1. Using Eq. (7), each entry $w_{ij}=\sum_{k}w_{ij}^{k}$ gives the co-occurrence frequency. That is, $w_{ij}$ counts the number of bags in which the two data instances $x_{i}$ and $x_{j}$ reside in the same cluster. Figure 2 shows three bags of 10 data instances, which are used to fill $W$ . According to this figure, $w_{15}=2$ means that $x_{1}$ and $x_{5}$ reside in the same cluster twice, in bags ${b}_{1}$ and ${b}_{2}$ , and $w_{12}=0$ confirms that $x_{1}$ and $x_{2}$ never co-occur. Clearly, $w_{ii}=3$ because there are three bags in this example.

Figure 2.

Three bags of ensembles, used to fill the similarity matrix for 10 data instances.

In this similarity matrix, if $w_{ij}$ has a positive value then two data instances $x_{i}$ and $x_{j}$ are connected in the graph and so, its edge weight is set accordingly; otherwise, they are not connected.
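The co-occurrence counting described above can be sketched as follows. The three label vectors are hypothetical and are not the bags of Figure 2; they are chosen only so that the same pattern appears ( $w_{15}=2$ , $w_{12}=0$ , $w_{ii}=3$ ). The function names are illustrative, and the sketch assumes that every instance carries exactly one cluster label per bag.

```python
import numpy as np

def coassociation(labels):
    """W^k for one bag: w_ij = 1 if x_i and x_j share a cluster, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def ensemble_similarity(bags):
    """Eq. (7): W = sum_k W^k, i.e. co-occurrence counts over all bags."""
    return sum(coassociation(b) for b in bags)

# Hypothetical cluster labels of 5 instances in 3 bags.
bags = [
    [0, 1, 0, 1, 0],   # b1
    [0, 1, 1, 1, 0],   # b2
    [0, 1, 2, 2, 1],   # b3
]
W = ensemble_similarity(bags)
print(W[0, 4])   # x1 and x5 share a cluster in b1 and b2 -> 2.0
print(W[0, 1])   # x1 and x2 never share a cluster        -> 0.0
```

If a base method marks some instances as noise with a shared label, that label would need separate handling, since this sketch would otherwise count all noise instances as one cluster.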

After obtaining the similarity matrix $W$ , the normalized Laplacian matrix $L^{\ast}$ is computed from Eq. (11). Then, a UELM network with $n_{h}$ hidden neurons and random weights is set up and the output of the hidden layer, $H$ , is obtained. By solving the generalized eigenvalue problem in Eq. (13), and using its generalized eigenvectors as in Eq. (14), the output weights $\beta$ are computed. Using $\beta$ in Eq. (15), the embedded data, $E$ , in the new feature space is obtained. Finally, the $k$ -means algorithm is used to group $E$ into $n_{c}$ clusters. The details of MREC are given in the following algorithm.

Algorithm: MREC
Input: unlabeled training data: $X$
0. Parameter setting
   Set no. of clustering algorithms: $n_{a}$
   Set no. of clusters: $n_{c}$
   Set no. of hidden neurons of UELM: $n_{h}$
   Set dimension of embedding: $n_{o}$
   Set regularization parameter: $\lambda$
1. Stacking ensemble with different objectives
   Apply $n_{a}$ clustering methods on $X$ , fill in similarity matrices $W^{k},k=1,\ldots,n_{a}$
2. Multi-objective similarity matrix construction
   Using Eq. (7), obtain similarity matrix $W$
3. MR clustering
   Using Eqs (8) and (11), compute normalized Laplacian matrix $L^{\ast}$
   Set up UELM with $n_{h}$ hidden neurons and random weights
   Compute output of hidden layer $H$
   Solve Eq. (13) to obtain eigenvectors $v_{1},v_{2},\ldots,v_{n_{o}+1}$ of smallest eigenvalues
   Using Eq. (14), compute output weights $\beta$
   Using Eq. (15), obtain embedded data $E$
   Employ $k$ -means algorithm to group $E$ into $n_{c}$ clusters
Output: index of $n_{c}$ clusters

4. Experimental results

In this section, the results of the experiments, which have been done to assess the performance of MREC, are reported. We compare our method with several baseline methods and some state-of-the-art ensemble and spectral clustering methods in the experiments.

Though determining the clustering quality of an algorithm is a subjective task, the accuracy is often employed as one of the major criteria, which is also used to evaluate the results of MREC. Additionally, we adopt different statistical techniques to decide whether the differences between the compared algorithms are real or random. We examine our results using some known statistical tests including absolute accuracy rank, $t$ -test, Wilcoxon signed rank test, Friedman test and Nemenyi test [33].

4.1 Datasets

In the experiments, some real-world datasets besides three standard benchmark protein datasets are examined to justify the capability of MREC. Nine real-world datasets from UCI machine learning repository [34] and two datasets from other sources are used. Their characteristics are summarized in Table 1.

Table 1
Characteristics of datasets

Dataset	# instances	# attributes	# classes
Banana	5300	2	2
Bank-Note	1372	5	2
Breast-C	699	9	2
Ionosphere	351	35	2
Iris	150	4	3
Isolet	7797	617	26
Leukemia	72	7130	2
Pendigits	10992	16	10
Semeion	1593	256	10
Spiral	77	7	3
Wine	178	13	3

Table 2

Characteristics of protein datasets

Dataset	Sequence similarity	# protein domains	# all_ $\alpha$ classes	# all_ $\beta$ classes	# $\alpha$ / $\beta$ classes	# $\alpha+\beta$ classes
1189	$<$ 40%	1092	223	294	334	241
640	$\leqslant$ 25%	640	138	154	177	171
25PDB	$\leqslant$ 25%	1673	443	443	346	441

In addition, three standard benchmark protein datasets, summarized in Table 2, are used to evaluate the capability of our method in protein structural clustering.

Four different groups are defined for proteins according to their structural similarity and characteristics. These groups are all_ $\alpha$ , all_ $\beta$ , $\alpha$ / $\beta$ and $\alpha+\beta$ classes of protein structures. When a novel protein is identified, we can guess its functional properties based on its predicted family clustering [39].

In this paper, we have extracted features from both secondary structures and their protein sequences. Different types of features such as amino acid composition [40], pseudo amino acid composition [41], polypeptide composition [42], and predicted secondary structure information [43] are used. Using PSIPRED [44], the secondary structure of proteins is predicted and 19 features are extracted [45]. Six of them are computed from $\alpha$ -helix segments: NAvgSeg ${}^{\text{H}}$ , CMV ${}^{\text{H}}_{1}$ , NCount ${}^{\text{H}}_{6}$ , NCount ${}^{\text{H}}_{8}$ , NmaxSeg ${}^{\text{H}}$ and P ${}_{\alpha}$ . The other six features are based on $\beta$ -strand segments: NmaxSeg ${}^{\text{E}}$ , CMV ${}^{\text{E}}_{1}$ , NCount ${}^{\text{E}}_{5}$ , CV ${}^{\text{E}}$ , NAvgSeg ${}^{\text{E}}$ , and P ${}_{\beta}$ . Also, protein length, p(E), p(H), P ${}_{\alpha\beta}$ and P ${}_{\beta\alpha}$ reflect the general contents and spatial arrangement of the secondary structure elements of a given protein sequence. The three protein datasets are 1189, 640 and 25PDB [46]. The characteristics of these protein datasets are given in Table 2.

4.2 Accuracy, $t$ -test and Wilcoxon test

A widely used evaluation criterion for quantitative analysis of clustering is accuracy (CAC). It measures the fraction of predicted labels, obtained by a clustering method, that match with ground-truth labels [35]:

$\displaystyle\textit{CAC}=\frac{1}{N}\sum_{i=1}^{N}\delta(y_{i},\textit{map}(c_{i}))$ (16)

where $N$ is the number of training patterns, $y_{i}$ is the true label of pattern $x_{i}$ and $\textit{map}({c_{i}})$ indicates its predicted label, in terms of the cluster $c_{i}$ containing $x_{i}$ . The equality function $\delta({a,b})$ returns 1 if $a=b$ ; otherwise it gives 0. According to Eq. (16), higher values of CAC indicate better clustering performance.
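A minimal sketch of Eq. (16) is given below. The text does not say how $\textit{map}(\cdot)$ is chosen; the sketch assumes the common convention of taking the one-to-one assignment of clusters to class labels that maximizes the number of matches, found here by brute force. The function name and the toy labels are hypothetical.

```python
from itertools import permutations

def clustering_accuracy(y_true, clusters):
    """CAC of Eq. (16) under the best one-to-one cluster-to-class mapping.

    Assumption: map(.) is the one-to-one assignment that maximizes the
    number of matches. Brute force, so only suitable for a few clusters.
    """
    if len(y_true) != len(clusters):
        raise ValueError("y_true and clusters must have the same length")
    classes = sorted(set(y_true))
    cluster_ids = sorted(set(clusters))
    # Pad with None so that surplus clusters can stay unmatched.
    targets = classes + [None] * max(0, len(cluster_ids) - len(classes))
    best_hits = 0
    for perm in permutations(targets, len(cluster_ids)):
        mapping = dict(zip(cluster_ids, perm))
        hits = sum(1 for y, c in zip(y_true, clusters) if mapping[c] == y)
        best_hits = max(best_hits, hits)
    return best_hits / len(y_true)

# Hypothetical labels: 8 patterns, 3 classes, cluster ids permuted.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
clusters = [1, 1, 2, 0, 0, 2, 2, 2]
cac = clustering_accuracy(y_true, clusters)
print(cac)   # 7 of 8 patterns match under the best mapping -> 0.875
```

The brute-force search grows factorially with the number of clusters; for datasets with many classes, an assignment solver such as the Hungarian algorithm is the usual replacement.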

Using CAC, the performance of MREC is compared against some baseline methods including $k$ -means, complete linkage, single linkage, average linkage, median linkage, fuzzy c-means, spectral clustering and DBSCAN. To reduce the effect of chance, every method is run 20 times in each experiment, and the average and standard deviation of CAC are reported. Table 3 shows these comparisons, where the best result for each dataset is shown in boldface.

Table 3

Accuracy comparison of MREC against baseline methods

Dataset	Single linkage	Complete linkage	Average linkage	Median linkage	$k$ -means	Fuzzy $c$ -means	DBSCAN	Spectral clustering	MREC
Banana	55.15 $\pm$ 1.02	53.98 $\pm$ 0.98	55.15 $\pm$ 1.02	55.15 $\pm$ 1.32	56.67 $\pm$ 1.72	56.60 $\pm$ 2.09	53.65 $\pm$ 2.21	54.66 $\pm$ 0.81	58.60 $\pm$ 1.43
Bank-Note	57.72 $\pm$ 1.72	57.72 $\pm$ 1.02	57.72 $\pm$ 1.72	57.72 $\pm$ 1.42	61.22 $\pm$ 1.90	60.93 $\pm$ 1.83	68.51 $\pm$ 4.18	59.98 $\pm$ 2.41	88.25 $\pm$ 4.19
Breast-C	65.37 $\pm$ 2.08	77.82 $\pm$ 1.23	96.28 $\pm$ 2.65	96.10 $\pm$ 1.65	95.85 $\pm$ 1.54	95.27 $\pm$ 2.43	95.99 $\pm$ 1.67	96.18 $\pm$ 0.12	96.42 $\pm$ 0.29
Ionosphere	64.38 $\pm$ 1.45	68.66 $\pm$ 0.98	65.78 $\pm$ 1.45	61.82 $\pm$ 0.98	71.22 $\pm$ 1.99	70.94 $\pm$ 1.53	69.39 $\pm$ 2.83	70.37 $\pm$ 0.22	73.23 $\pm$ 1.41
Iris	69.33 $\pm$ 1.65	77.33 $\pm$ 1.84	91.33 $\pm$ 2.87	86.00 $\pm$ 1.65	89.33 $\pm$ 1.98	89.33 $\pm$ 2.01	74.69 $\pm$ 2.13	76.00 $\pm$ 1.54	96.00 $\pm$ 4.06
Isolet	14.32 $\pm$ 2.01	30.96 $\pm$ 1.65	39.05 $\pm$ 2.01	21.00 $\pm$ 1.65	49.01 $\pm$ 1.21	17.70 $\pm$ 2.56	49.90 $\pm$ 2.43	51.23 $\pm$ 2.80	55.93 $\pm$ 1.76
Leukemia	65.27 $\pm$ 1.14	76.38 $\pm$ 1.41	66.66 $\pm$ 0.97	63.88 $\pm$ 1.14	65.27 $\pm$ 2.62	50.00 $\pm$ 0.09	65.28 $\pm$ 2.06	55.55 $\pm$ 0.78	87.50 $\pm$ 4.35
Pendigits	10.49 $\pm$ 2.78	58.23 $\pm$ 1.41	56.45 $\pm$ 2.78	44.35 $\pm$ 1.78	65.15 $\pm$ 3.12	47.02 $\pm$ 1.23	63.38 $\pm$ 1.22	69.08 $\pm$ 0.87	72.55 $\pm$ 0.65
Semeion	10.92 $\pm$ 1.54	44.25 $\pm$ 2.01	55.86 $\pm$ 0.98	10.94 $\pm$ 1.54	59.38 $\pm$ 4.93	31.32 $\pm$ 2.08	48.33 $\pm$ 2.22	55.05 $\pm$ 1.09	62.02 $\pm$ 0.23
Spiral	49.32 $\pm$ 1.87	43.91 $\pm$ 2.28	44.23 $\pm$ 2.28	38.14 $\pm$ 1.34	34.61 $\pm$ 1.72	33.97 $\pm$ 1.71	34.61 $\pm$ 2.04	34.97 $\pm$ 1.06	57.69 $\pm$ 2.46
Wine	43.25 $\pm$ 1.23	64.04 $\pm$ 0.91	67.41 $\pm$ 1.23	67.41 $\pm$ 0.91	50.00 $\pm$ 2.01	68.53 $\pm$ 1.90	66.45 $\pm$ 2.41	51.12 $\pm$ 1.86	70.23 $\pm$ 3.59
1189	30.67 $\pm$ 0.008	32.96 $\pm$ 0.14	35.80 $\pm$ 0.07	29.21 $\pm$ 0.14	34.34 $\pm$ 0.12	34.70 $\pm$ 0.00	32.77 $\pm$ 2.01	35.62 $\pm$ 0.39	56.59 $\pm$ 1.50
640	36.70 $\pm$ 0.01	46.08 $\pm$ 0.21	47.45 $\pm$ 0.01	43.09 $\pm$ 0.07	42.97 $\pm$ 0.21	42.43 $\pm$ 0.02	48.23 $\pm$ 1.98	46.56 $\pm$ 1.50	56.24 $\pm$ 0.62
25PDB	26.65 $\pm$ 0.007	29.34 $\pm$ 0.14	33.29 $\pm$ 0.01	29.82 $\pm$ 0.02	35.83 $\pm$ 0.029	35.83 $\pm$ 0.029	34.23 $\pm$ 1.45	38.49 $\pm$ 0.09	60.90 $\pm$ 1.69
Average	42.82 $\pm$ 1.32	54.38 $\pm$ 1.16	58.03 $\pm$ 1.42	50.32 $\pm$ 1.11	57.91 $\pm$ 1.79	52.46 $\pm$ 1.39	57.52 $\pm$ 2.20	56.77 $\pm$ 1.11	70.86 $\pm$ 2.01

MREC outperforms the baseline methods on every dataset in Table 3, especially on Bank-Note, Spiral, Isolet, Leukemia, Iris and all protein datasets, where the margin of MREC over the next best method is more than 4 percentage points. Among the compared methods, spectral clustering, DBSCAN, average linkage and $k$ -means are the nearest rivals. Compared with traditional spectral clustering, MREC is more accurate on every dataset; the margin ranges from 0.24 percentage points on Breast-C to about 32 on Leukemia.

Additionally, several state-of-the-art ensemble and spectral clustering methods are used in the experiments. These methods include hierarchical clustering on co-association matrix (HCC) [8], $k$ -means based consensus clustering (KCC) [13], spectral ensemble clustering (SEC) [4], robust spectral ensemble clustering (RSEC) [9], fast large-scale spectral clustering via explicit feature mapping (F-ESC) [10], and ultra-scalable spectral clustering and ensemble clustering (U-EPEC) [36]. The accuracies of these methods, along with MREC, are given in Table 4.

Table 4

Accuracy comparison of MREC against ensemble and spectral methods

Dataset	HCC	KCC	SEC	RSEC	F-ESC	U-EPEC	MREC
Banana	54.28 $\pm$ 1.23	48.91 $\pm$ 1.98	48.37 $\pm$ 1.32	57.00 $\pm$ 0.34	45.49 $\pm$ 0.87	57.12 $\pm$ 2.01	58.60 $\pm$ 1.43
Bank-Note	52.87 $\pm$ 2.08	52.99 $\pm$ 3.44	53.17 $\pm$ 1.34	56.69 $\pm$ 0.98	55.41 $\pm$ 2.65	79.02 $\pm$ 2.63	88.25 $\pm$ 4.19
Breast-C	94.99 $\pm$ 1.82	66.35 $\pm$ 2.09	95.81 $\pm$ 1.43	97.14 $\pm$ 1.34	90.56 $\pm$ 1.43	95.13 $\pm$ 2.13	96.42 $\pm$ 0.29
Ionosphere	68.38 $\pm$ 1.90	63.82 $\pm$ 2.00	63.53 $\pm$ 2.65	67.52 $\pm$ 0.28	67.18 $\pm$ 1.78	70.78 $\pm$ 2.43	73.23 $\pm$ 1.41
Iris	89.33 $\pm$ 2.01	88.92 $\pm$ 3.98	96.00 $\pm$ 1.92	97.33 $\pm$ 1.82	97.33 $\pm$ 0.78	98.12 $\pm$ 3.09	96.20 $\pm$ 4.06
Isolet	41.47 $\pm$ 2.12	42.08 $\pm$ 3.51	38.76 $\pm$ 1.83	49.68 $\pm$ 1.21	42.04 $\pm$ 2.09	52.12 $\pm$ 2.32	55.93 $\pm$ 1.76
Leukemia	69.12 $\pm$ 1.45	60.09 $\pm$ 2.55	66.99 $\pm$ 2.78	77.65 $\pm$ 0.98	81.23 $\pm$ 3.76	69.01 $\pm$ 2.87	87.50 $\pm$ 4.35
Pendigits	74.24 $\pm$ 2.43	63.97 $\pm$ 3.47	74.61 $\pm$ 1.53	71.57 $\pm$ 0.01	69.97 $\pm$ 0.71	84.17 $\pm$ 2.18	72.55 $\pm$ 0.65
Semeion	45.05 $\pm$ 1.54	56.03 $\pm$ 0.09	58.85 $\pm$ 1.54	61.33 $\pm$ 0.45	55.59 $\pm$ 0.03	61.44 $\pm$ 2.41	62.02 $\pm$ 0.23
Spiral	39.56 $\pm$ 1.98	36.27 $\pm$ 2.65	34.91 $\pm$ 2.98	43.82 $\pm$ 0.03	35.51 $\pm$ 1.43	52.12 $\pm$ 1.48	57.69 $\pm$ 2.46
Wine	50.00 $\pm$ 2.10	50.49 $\pm$ 0.98	51.12 $\pm$ 1.81	52.25 $\pm$ 2.40	51.69 $\pm$ 0.87	68.17 $\pm$ 3.22	70.23 $\pm$ 3.59
1189	51.2 $\pm$ 1.49	46.21 $\pm$ 2.01	45.97 $\pm$ 1.36	55.05 $\pm$ 0.76	43.44 $\pm$ 0.45	55.22 $\pm$ 2.65	56.59 $\pm$ 1.50
640	40.27 $\pm$ 2.02	43.18 $\pm$ 3.22	39.16 $\pm$ 1.33	49.88 $\pm$ 1.67	48.94 $\pm$ 2.19	54.98 $\pm$ 2.98	56.24 $\pm$ 0.62
25PDB	43.15 $\pm$ 1.43	56.73 $\pm$ 0.09	56.85 $\pm$ 1.54	59.32 $\pm$ 0.48	55.59 $\pm$ 0.07	59.43 $\pm$ 2.42	60.90 $\pm$ 1.69
Average	58.13 $\pm$ 1.82	55.43 $\pm$ 2.28	58.86 $\pm$ 1.81	64.01 $\pm$ 0.91	59.99 $\pm$ 1.36	68.34 $\pm$ 2.48	70.88 $\pm$ 2.01

As Table 4 shows, MREC performs better than the other methods on most datasets; the exceptions are Breast-C, Iris and Pendigits. This time, the main rival on average is U-EPEC (68.34% against 70.88% for MREC), which is also the best method on Iris and Pendigits, while RSEC is the best on Breast-C.

To compare the methods statistically, the Wilcoxon signed-rank test [37] is used. Table 5 summarizes the $p$-values of the tests between MREC and the baseline methods of Table 3. In this test, the mean accuracies are used to assess whether, on average, the accuracies of MREC and a baseline method are statistically different. The $p$-values are computed on each dataset and are marked as follows: ‘ $+$ ’ for those less than the critical value 0.05, ‘ $-$ ’ for greater values, and ‘ $\approx$ ’ if the values are almost the same.

Table 5

The $p$ -values comparison of MREC against baseline methods

Dataset	Single linkage	Complete linkage	Average linkage	Median linkage	$k$ -means	Fuzzy $c$ -means	DBSCAN	Spectral clustering
Banana	$+$	$+$	$+$	$+$	$+$	$+$	$+$	$+$
Bank-Note	$+$	$+$	$+$	$+$	$+$	$+$	$+$	$+$
Breast-C	$+$	$+$	$+$	$\approx$	$+$	$+$	$+$	$\approx$
Ionosphere	$+$	$+$	$+$	$+$	$+$	$+$	$+$	$+$
Iris	$+$	$+$	$+$	$+$	$+$	$+$	$+$	$+$
Isolet	$+$	$+$	$+$	$+$	$+$	$+$	$+$	$+$
Leukemia	$+$	$+$	$+$	$+$	$+$	$+$	$+$	$+$
Pendigits	$+$	$+$	$+$	$+$	$+$	$+$	$+$	$+$
Semeion	$+$	$+$	$+$	$+$	$+$	$+$	$+$	$+$
Spiral	$+$	$+$	$+$	$+$	$+$	$+$	$+$	$+$
Wine	$+$	$+$	$+$	$+$	$+$	$+$	$+$	$+$
1189	$+$	$+$	$+$	$+$	$+$	$+$	$+$	$+$
640	$+$	$+$	$+$	$+$	$+$	$+$	$+$	$+$
25PDB	$+$	$+$	$+$	$+$	$+$	$+$	$+$	$+$

Table 5 shows that the accuracy of MREC is statistically better than that of the baseline methods on almost all datasets. However, this was expected, as none of the baseline methods uses an ensemble of clusterings while MREC is an ensemble method.

To assess MREC against the ensemble and spectral clustering methods of Table 4, the Wilcoxon signed-rank test is applied again. Table 6 reports the resulting $p$-value comparison.

Table 6

The $p$ -values comparison of MREC against ensemble and spectral methods

Dataset	HCC	KCC	SEC	RSEC	F-ESC	U-EPEC
Banana	$+$	$+$	$+$	$+$	$+$	$+$
Bank-Note	$+$	$+$	$+$	$+$	$+$	$+$
Breast-C	$+$	$+$	$\approx$	$-$	$+$	$+$
Ionosphere	$+$	$+$	$+$	$+$	$+$	$+$
Iris	$+$	$+$	$+$	$\approx$	$\approx$	$-$
Isolet	$+$	$+$	$+$	$+$	$+$	$+$
Leukemia	$+$	$+$	$+$	$+$	$+$	$+$
Pendigits	$\approx$	$+$	$-$	$+$	$+$	$-$
Semeion	$+$	$+$	$+$	$\approx$	$+$	$+$
Spiral	$+$	$+$	$+$	$+$	$+$	$+$
Wine	$+$	$+$	$+$	$+$	$+$	$\approx$
1189	$+$	$+$	$+$	$+$	$+$	$+$
640	$+$	$+$	$+$	$+$	$+$	$+$
25PDB	$+$	$+$	$+$	$+$	$+$	$+$

According to Table 6, MREC statistically outperforms HCC, KCC and F-ESC on almost all datasets. It is outperformed by SEC on one dataset only (Pendigits), by RSEC on one dataset only (Breast-C), and by U-EPEC on two datasets (Iris and Pendigits).
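As a complementary view, the same test can also be applied across datasets rather than within each dataset, which is the usage Demšar [38] describes for comparing two methods. The sketch below is not the per-dataset procedure behind Tables 5 and 6; it only computes the signed-rank sums for MREC against U-EPEC from the mean accuracies of Table 4. The function name `signed_rank_sums` is hypothetical, and the smaller of the two sums would still have to be compared with a tabulated critical value for 14 pairs.

```python
def signed_rank_sums(a, b):
    """Return (W_plus, W_minus) of the Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped; tied absolute differences share the average rank.
    """
    # Round to suppress floating-point noise in the two-decimal accuracies.
    diffs = [round(x - y, 6) for x, y in zip(a, b)]
    diffs = [d for d in diffs if d != 0]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next absolute difference is tied.
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1  # average 1-based rank
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Mean accuracies copied from Table 4, in the table's dataset order.
mrec = [58.60, 88.25, 96.42, 73.23, 96.20, 55.93, 87.50,
        72.55, 62.02, 57.69, 70.23, 56.59, 56.24, 60.90]
u_epec = [57.12, 79.02, 95.13, 70.78, 98.12, 52.12, 69.01,
          84.17, 61.44, 52.12, 68.17, 55.22, 54.98, 59.43]

w_plus, w_minus = signed_rank_sums(mrec, u_epec)
print("W+ =", w_plus, "W- =", w_minus, "W =", min(w_plus, w_minus))
```

Here positive differences favour MREC, so a small `W-` relative to `W+` indicates that MREC wins on most datasets and by the larger margins.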

4.3 Accuracy ranking, absolute accuracy rank Friedman test and Nemenyi test

In this section, we follow Demšar’s proposal [38] and compare MREC against the other methods using two statistical tests: the Friedman test and the Nemenyi test.

First, we prepare the results for the Friedman test. To this end, the competing algorithms are ranked by their accuracy on each dataset. The algorithm with the highest accuracy gets rank 1; the next methods get ranks 2, 3 and so on. Then, the average rank of each method over all datasets is calculated as another performance measure.
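The ranking step can be sketched as follows, using three rows of mean accuracies from Table 4. The helper names `rank_row` and `average_ranks` are hypothetical, and giving tied methods the average of their ranks is an assumption taken from Demšar's recommendation [38]; the rankings reported in this paper may break ties differently.

```python
def rank_row(accuracies):
    """Rank one dataset's accuracies: rank 1 is the highest accuracy.

    Tied accuracies share the average of the ranks they span (assumption).
    """
    order = sorted(range(len(accuracies)), key=lambda i: -accuracies[i])
    ranks = [0.0] * len(accuracies)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next accuracy equals the current one.
        while (end + 1 < len(order)
               and accuracies[order[end + 1]] == accuracies[order[pos]]):
            end += 1
        shared = (pos + end) / 2 + 1  # average of 1-based positions
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def average_ranks(table):
    """Average each method's rank over all datasets in the table."""
    rows = [rank_row(row) for row in table.values()]
    return [sum(col) / len(rows) for col in zip(*rows)]


methods = ["HCC", "KCC", "SEC", "RSEC", "F-ESC", "U-EPEC", "MREC"]
# Mean accuracies copied from three rows of Table 4.
table4_subset = {
    "Banana": [54.28, 48.91, 48.37, 57.00, 45.49, 57.12, 58.60],
    "Iris": [89.33, 88.92, 96.00, 97.33, 97.33, 98.12, 96.20],
    "Pendigits": [74.24, 63.97, 74.61, 71.57, 69.97, 84.17, 72.55],
}

for name, row in table4_subset.items():
    print(name, dict(zip(methods, rank_row(row))))
print("average", dict(zip(methods, average_ranks(table4_subset))))
```

Applying `average_ranks` to all fourteen rows of an accuracy table gives the kind of per-method average that the Friedman test takes as input.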

Table 7 depicts the accuracy ranking of MREC against the rival baseline methods, according to Table 3. Clearly, MREC outperforms all compared methods, since its ranking distance to average linkage, the next best method, is more than 3.

Table 7
Accuracy ranking of MREC against baseline methods

Dataset	Single linkage	Complete linkage	Average linkage	Median linkage	$k$ -means	Fuzzy $c$ -means	DBSCAN	Spectral clustering	MREC
Banana	4	7	4	5	2	3	8	6	1
Bank-Note	8	6	8	7	3	4	2	5	1
Breast-C	9	8	2	4	6	7	5	3	1
Ionosphere	8	6	7	9	2	3	5	4	1
Iris	9	6	2	5	3	4	8	7	1
Isolet	9	6	5	7	4	8	3	2	1
Leukemia	5	2	3	7	6	9	4	8	1
Pendigits	9	5	6	8	3	7	4	2	1
Semeion	9	6	3	8	2	7	5	4	1
Spiral	2	4	3	7	8	9	6	5	1
Wine	9	6	4	3	8	2	5	7	1
1189	8	6	2	9	5	4	7	3	1
640	9	5	3	6	7	8	2	4	1
25PDB	8	7	5	6	3	3	4	2	1
Average	7.56	5.72	4.07	6.5	4.43	5.57	4.85	4.43	1

Using the accuracy results of the ensemble and spectral clustering methods in Table 4, their accuracy rankings are collected in Table 8. Again, MREC performs better than all compared methods, though its ranking distance to the next rival method, U-EPEC, is only about 0.7.

Table 8

Accuracy ranking of MREC against ensemble and spectral methods

Dataset	HCC	KCC	SEC	RSEC	F-ESC	U-EPEC	MREC
Banana	4	5	6	3	7	2	1
Bank-Note	7	6	5	3	4	2	1
Breast-C	5	7	3	1	6	4	2
Ionosphere	3	6	7	4	5	2	1
Iris	6	7	5	3	2	1	4
Isolet	6	4	7	3	5	2	1
Leukemia	4	7	6	3	2	5	1
Pendigits	3	7	2	5	6	1	4
Semeion	7	5	4	3	6	2	1
Spiral	4	5	7	3	6	2	1
Wine	7	6	5	3	4	2	1
1189	4	5	6	3	7	2	1
640	6	5	7	3	4	2	1
25PDB	7	5	4	3	6	2	1
Average	5.21	5.71	5.28	3.07	5	2.21	1.5

Additionally, we used the Friedman test to compare the average ranks in Tables 7 and 8. Suppose there are $K$ algorithms and $N$ datasets in the experiments, and let $R_{j}$ denote the average rank of the $j$th algorithm. The Friedman test compares the $R_{j}$ of the $K$ algorithms. Under the null hypothesis, which states that all algorithms perform equivalently and so their ranks $R_{j}$ should be equal, the Friedman statistic

$\displaystyle\chi_{F}^{2}=\frac{12N}{K(K+1)}\left(\sum_{j=1}^{K}R_{j}^{2}-\frac{K(K+1)^{2}}{4}\right)$ (17)

is distributed according to the $\chi^{2}$ distribution with $K-1$ degrees of freedom when $N$ and $K$ are big enough (as a rule of thumb, $N>10$ and $K>5$). Iman and Davenport showed that Friedman’s $\chi_{F}^{2}$ is undesirably conservative and derived a better statistic

$\displaystyle F_{F}=\frac{({N-1})\chi_{F}^{2}}{N({K-1})-\chi_{F}^{2}}$ (18)

which is distributed according to the $F$ -distribution with $K-1$ and $({K-1})({N-1})$ degrees of freedom [38].
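Equations (17) and (18) can be evaluated directly from a list of average ranks. The sketch below does so for the average ranks printed in the last row of Table 8; the function name `friedman_statistics` is hypothetical. Because the tabulated averages are rounded and the tie handling behind them is not stated, the printed values are not expected to coincide exactly with the statistics reported in the text.

```python
def friedman_statistics(avg_ranks, n_datasets):
    """Return (chi2_F, F_F) computed from Eqs. (17) and (18)."""
    k = len(avg_ranks)
    n = n_datasets
    sum_sq = sum(r * r for r in avg_ranks)
    # Eq. (17): Friedman statistic.
    chi2_f = 12.0 * n / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4.0)
    # Eq. (18): the less conservative F-distributed statistic.
    f_f = (n - 1) * chi2_f / (n * (k - 1) - chi2_f)
    return chi2_f, f_f


# Average ranks from the last row of Table 8 (rounded to two decimals).
table8_avg_ranks = [5.21, 5.71, 5.28, 3.07, 5.0, 2.21, 1.5]
n_datasets = 14

chi2_f, f_f = friedman_statistics(table8_avg_ranks, n_datasets)
k = len(table8_avg_ranks)
dof = (k - 1, (k - 1) * (n_datasets - 1))
print(f"chi2_F = {chi2_f:.2f}, F_F = {f_f:.2f}, degrees of freedom = {dof}")
```

The resulting $F_{F}$ is then compared with the critical value of the $F$-distribution for the printed degrees of freedom at the chosen significance level.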

When we apply the Friedman test to the baseline methods in Table 7, with $K=9$ methods and $N=14$ datasets, $F_{F}$ is distributed according to $F(K-1,(K-1)(N-1))=F(8,104)$, that is, with 8 and 104 degrees of freedom. Since $F_{F}=4.94$ is greater than the critical value of the $F$-distribution for (8, 104) degrees of freedom, which is 2.01 at a type-1 error of 0.05, we have sufficient evidence to reject the null hypothesis and conclude that the compared methods, MREC included, do not perform equivalently.

Similarly, for the $K=7$ ensemble and spectral clustering methods in Table 8, $F_{F}$ is distributed according to $F(6,78)$, with 6 and 78 degrees of freedom. This time, $F_{F}=14.23$ is much greater than the critical value 2.22 for (6, 78) degrees of freedom at a type-1 error of 0.05. Therefore, there is strong evidence for rejecting the null hypothesis, which indicates that the compared ensemble and spectral clustering schemes do not perform equivalently.

Since the Friedman test rejects its null hypothesis, a post-hoc test is needed to find where the differences lie among the compared methods. For this purpose, the Nemenyi multiple comparison test is used to find which algorithms perform best. The results of this test, for the average ranks in Tables 7 and 8, are illustrated in Figs 3 and 4, respectively. In these figures, the average rank of each scheme is marked by a circle and the horizontal bar across each circle indicates the critical difference. In short, methods whose horizontal bars do not overlap are significantly different; otherwise, they may perform equally well in some situations.
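For background, in Demšar’s formulation of the Nemenyi test [38], two methods are significantly different when their average ranks differ by at least the critical difference

$\displaystyle \mathrm{CD}=q_{\alpha}\sqrt{\frac{K(K+1)}{6N}}$

where $q_{\alpha}$ is the critical value based on the Studentized range statistic divided by $\sqrt{2}$, and $K$ and $N$ are the numbers of algorithms and datasets as above. How the bars in the figures were constructed is not stated here, so this formula is given as the standard definition rather than as the exact procedure used.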

Figure 3.

Nemenyi test on MREC and eight baseline methods.

According to Fig. 3, MREC is ranked best and is significantly better than the baseline methods, since its horizontal bar does not overlap any other bar.

Additionally, MREC is ranked better than the ensemble and spectral clustering methods, as Fig. 4 shows. However, the horizontal bars of MREC and U-EPEC overlap, which means that U-EPEC can perform similarly in some situations, although MREC has the better rank.

Figure 4.

Nemenyi test on MREC and six ensemble and spectral methods.

5. Conclusion

In this paper, we proposed MREC, a manifold regularization ensemble clustering method with respect to different objectives, which constructs an ensemble of manifold regularizations to predict the structure of the data from different views. The unsupervised extreme learning machine was used to find the generalized eigenvectors that embed the data in a new space. These eigenvectors were then used as the base point in spectral clustering to find a good partitioning of the data instances.

The experimental results showed that, by taking full advantage of ensemble learning with distinct objectives, MREC could achieve acceptable diversity and hence better clusters. We compared our proposed method against fifteen state-of-the-art methods on some UCI datasets and three protein datasets. The statistical tests confirmed that MREC could perform significantly better than the compared methods in terms of clustering accuracy.

As future work, we plan to extend the current work with multi-objective evolutionary algorithms and deep learning methods to improve the clustering performance.

References

1. Von Luxburg, A tutorial on spectral clustering, Statistics and Computing 17(4) (2007), 395–416.

2. Omran M.G., Engelbrecht A.P. and Salman, An overview of clustering methods, Intelligent Data Analysis 11(6) (2007), 583–605.

3. Belkin, Niyogi and Sindhwani, Manifold regularization: a geometric framework for learning from labeled and unlabeled examples, Journal of Machine Learning Research 7 (2006), 2399–2434.

4. Liu, Tao and Fu, Spectral ensemble clustering, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 715–724.

5. Sheng, Wang and Xu, Adaptive local learning regularized nonnegative matrix factorization for data clustering, Applied Intelligence 49(6) (2019), 2151–2168.

6. Huang, Song, Gupta J.N.D. and Wu, Semi-supervised and unsupervised extreme learning machines, IEEE Transactions on Cybernetics 44(12) (2014), 2405–2417.

7. Shi and Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8) (2000), 888–905.

8. Fred A.L. and Jain A.K., Combining multiple clusterings using evidence accumulation, IEEE Transactions on Pattern Analysis and Machine Intelligence 27(6) (2005), 835–850.

9. Tao, Liu, Ding and Fu, Robust spectral ensemble clustering via rank minimization, ACM Transactions on Knowledge Discovery from Data 13(1) (2019), 4.

10. Ray, Guan and Zhang, Fast large-scale spectral clustering via explicit feature mapping, IEEE Transactions on Cybernetics 49(3) (2018), 1058–1071.

11. Nie, Huang and Huang, Large-scale multi-view spectral clustering via bipartite graph, in: Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

12. Yousefnezhad and Zhang, Weighted spectral cluster ensemble, in: IEEE International Conference on Data Mining, 2015, pp. 549–558.

13. Liu, Xiong, Cao and Chen, K-means-based consensus clustering: a unified view, IEEE Transactions on Knowledge and Data Engineering 27(1) (2014), 155–169.

14. Gullo, Domeniconi and Tagarelli, Projective clustering ensembles, Data Mining and Knowledge Discovery 26(3) (2013), 452–511.

15. Gullo, Domeniconi and Tagarelli, Metacluster-based projective clustering ensembles, Machine Learning 98(1-2) (2015), 181–216.

16. Franek and Jiang, Ensemble clustering by means of clustering embedding in vector spaces, Pattern Recognition 47(2) (2014), 833–842.

17. Zheng and Ding, A framework for hierarchical ensemble clustering, ACM Transactions on Knowledge Discovery from Data 9(2) (2014), 9.

18. Gonzàlez and Turmo, Unsupervised ensemble minority clustering, Machine Learning 98(1-2) (2015), 217–268.

19. Jing, Tian and Huang J.Z., Stratified feature sampling method for ensemble clustering of high dimensional data, Pattern Recognition 48(11) (2015), 3688–3702.

20. Luo, You, Wong H.S., Leung, Zhang and Han, Incremental semi-supervised clustering ensemble for high dimensional data clustering, IEEE Transactions on Knowledge and Data Engineering 28(3) (2016), 701–714.

21. Jiamthapthaksin, Eick C.F. and Vilalta, A framework for multi-objective clustering and its application to co-location mining, in: International Conference on Advanced Data Mining and Applications, 2009, pp. 188–199.

22. Kaufman and Rousseeuw P.J., Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York, 1990.

23. Ng R.T. and Han, CLARANS: a method for clustering objects for spatial data mining, IEEE Transactions on Knowledge & Data Engineering 5 (2002), 1003–1016.

24. Karypis, Han E.H.S. and Kumar, Chameleon: hierarchical clustering using dynamic modeling, Computer 8 (1999), 68–75.

25. Zhang, Ramakrishnan and Livny, BIRCH: an efficient data clustering method for very large databases, ACM SIGMOD Record 25(2) (1996), 103–114.

26. Guha, Rastogi and Shim, CURE: an efficient clustering algorithm for large databases, ACM SIGMOD Record 27(2) (1998), 73–84.

27. Ester, Kriegel H.P., Sander and Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: KDD 96(34) (1996), 226–231.

28. Ankerst, Breunig M.M., Kriegel H.P. and Sander, OPTICS: ordering points to identify the clustering structure, ACM SIGMOD Record 28(2) (1999), 49–60.

29. Hinneburg and Keim D.A., An efficient approach to clustering in large multimedia databases with noise, in: KDD 98 (1998), 58–65.

30. Honkela, Self-organizing maps in natural language processing, Doctoral dissertation, Helsinki University of Technology, 1997.

31. Fisher D.H., Knowledge acquisition via incremental conceptual clustering, Machine Learning 2(2) (1987), 139–172.

32. Wang, Yang and Yu P.S., Clustering by pattern similarity in large data sets, in: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 2002, pp. 394–405.

33. Zimmerman D.W. and Zumbo B.D., Relative power of the Wilcoxon test, the Friedman test, and repeated-measures ANOVA on ranks, The Journal of Experimental Education 62(1) (1993), 75–86.

34. Asuncion and Newman D.J., UCI machine learning repository, Department of Information and Computer Science, University of California, Irvine, CA, online available: http://www.ics.uci.edu/mlearn/MLRepository.html, 2007.

35. Yan, Huang and Jordan M.I., Fast approximate spectral clustering, in: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009, pp. 907–916.

36. Huang, Wang C.D., Wu J.S., Lai J.H. and Kwoh C.K., Ultra-scalable spectral clustering and ensemble clustering, IEEE Transactions on Knowledge and Data Engineering, 2019.

37. Woolson R.F., Wilcoxon signed-rank test, Wiley Encyclopedia of Clinical Trials, 2007, 1–3.

38. Demšar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7 (2006), 1–30.

39. Mansoori E.G., Zolghadri M.J. and Katebi S.D., Protein superfamily classification using fuzzy rule-based classifier, IEEE Transactions on Nanobioscience 8(1) (2009), 92–99.

40. Cao D.S., Xu Q.S. and Liang Y.Z., Propy: a tool to generate various modes of Chou’s PseAAC, Bioinformatics 29(7) (2013), 960–962.

41. Crosio, Fimia G.M., Loury, Kimura, Okano, Zhou and Sassone-Corsi, Mitotic phosphorylation of histone H3: spatio-temporal regulation by mammalian Aurora kinases, Molecular and Cellular Biology 22(3) (2002), 874–885.

42. Zhang, Ding and Wang, High-accuracy prediction of protein structural class for low-similarity sequences based on predicted secondary structure, Biochimie 93(4) (2011), 710–714.

43. Jones D.T., Protein secondary structure prediction based on position-specific scoring matrices, Journal of Molecular Biology 292(2) (1999), 195–202.

44. Kurgan, Cios and Chen, SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences, BMC Bioinformatics 9(1) (2008), 226.

45. Wang Z.X. and Yuan, How good is prediction of protein structural class by the component-coupled method? Proteins: Structure, Function, Bioinformatics 38(2) (2000), 165–175.

46. Chen, Kurgan L.A. and Ruan, Prediction of protein structural class using novel evolutionary collocation-based sequence representation, Journal of Computational Chemistry 29(10) (2008), 1596–1604.
