Abstract
Spectral clustering has been an effective clustering method, in last decades, because it can get an optimal solution without any assumptions on data’s structure. The basic key in spectral clustering is its similarity matrix. Despite many empirical successes in similarity matrix construction, almost all previous methods suffer from handling just one objective. To address the multi-objective ensemble clustering, we introduce a new ensemble manifold regularization (MR) method based on stacking framework. In our Manifold Regularization Ensemble Clustering (MREC) method, several objective functions are considered simultaneously, as a robust method for constructing the similarity matrix. Using it, the unsupervised extreme learning machine (UELM) is employed to find the generalized eigenvectors to embed the data in low-dimensional space. These eigenvectors are then used as the base point in spectral clustering to find the best partitioning of the data. The aims of this paper are to find robust partitioning that satisfy multiple objectives, handling noisy data, keeping diversity-based goals, and dimension reduction. Experiments on some real-world datasets besides to three benchmark protein datasets demonstrate the superiority of MREC over some state-of-the-art single and ensemble methods.
Keywords
Introduction
Spectral clustering becomes one of the most interesting subjects on clustering data, because it can get an optimal solution without any assumptions on data’s structure and can find global optimal results in processing non-convex data [1]. It is simple to implement and solve efficiently by standard linear algebra software. It obtains data representation in the low-dimensional space using eigenvectors, and very often outperforms traditional clustering algorithms such as
In another view, spectral clustering is similar to manifold regularization (MR) that deals with deeper properties of data [3]. This view is considered, in this paper, to construct MR ensemble from data by using different clustering methods. Generally, there are two approaches in MR to deal with traditional risk minimization: localization and penalization. MR uses penalization based on geometry of marginal distribution of data. It is a regularization form for preventing overfitting and ensuring that a problem is well-posed by penalizing complex solutions [3]. More precisely, MR extends the Tikhonov technique of regularization as applied to reproducing kernel Hilbert spaces (RKHSs). In RKHSs, the standard Tikhonov regularization attempts to learn a function
MR adds the intrinsic regularizer as a second regularization term in standard Tikhonov regularization. Under the manifold assumption in machine learning, data come from a nonlinear manifold
The goal of similarity matrix is to model the local neighborhood relationships between data. There are several popular constructions to transform a given data to similarity matrix. The choice of the similarity matrix influences the spectral clustering result and so, many researches have been done to find the more effective similarity matrices [5]. However, almost all these methods suffer from handling just one objective and cannot serve all the characteristics of data in their similarity matrices. To address this deficit, we introduce Manifold Regularization Ensemble Clustering [2] (MREC) as a new MR ensemble method based on stacking framework to employ the ability of different clustering methods in constructing similarity matrices. These abilities support many implicit objectives such as data density, cluster’s compactness and connectedness, clusters’ isolation and so on.
In our MREC, there are three basic steps. Firstly, the MR ensemble, based on stacking framework, is built with different objectives. Stacking is an ensemble method with different learners (functions) in each bag to cluster the data in different views. This model uses an effective algorithm with different goals to support all kinds of these essences. In the next step, each prediction of stacking bags are aggregated to produce the similarity matrix, according to upper-bound MR ensemble formulas, proposed in this paper, and then the Laplacian matrix is calculated. At last, unsupervised extreme learning machine (UELM) [6] is used to embed the data in new view space and then to find the best partitions.
Some benchmark datasets are used to acknowledge the superiority of MREC. In addition, rigorous statistical tests are conducted to prove its significance. Our contributions, in this paper, is summarized as follows. i) Employing a new method, in stacking framework, to find the similarity matrix while keeping diversity among the results. ii) Supporting many implicit objectives such as data density, cluster’s compactness and connectedness and clusters’ isolation in constructing similarity matrix. iii) Using multi-objective approach to find the best structures of data, based on proposed upperbound formulas on MR ensemble method. iv) Using UELMs in order to get effective learning mechanism in both reproducing kernel Hilbert space and new embedding space.
The rest of this paper is organized as follows. In Section 2, we briefly review some related works on spectral clustering and MR ensemble. Our MREC, as an MR ensemble using UELM, is explained in Section 3. Section 4 reports the experimental results and evaluations in various datasets. Finally, Section 5 presents a summary and some suggestions.
Literature review
Spectral clustering is a clustering technique based on graph theory. In the last years, this algorithm attracted more and more attention because of its theoretical foundation and good clustering results [3, 4]. Recently, some researchers have proposed many spectral clustering algorithms. In [7], Shi and Malik proposed normalized cut. Their criterion considers both internal and external connections. It could produce well-balanced clustering results while Fred and Jain applied the agglomerative hierarchical clustering [8].
Tao et al. proposed a robust representation for co-association matrix by low-rank constraint [9]. Their representation could show the cluster structure of a co-association matrix and so handled noises in spectral clustering. In [10], the authors introduced a sub-matrix based approximation that could emerge as a powerful and efficient tool for spectral clustering. They used Fourier features to represent the data objects in the kernel space and then built an NP sub-matrix upon which the efficient eigen-decomposition were performed.
Huang et al. interpreted the sparse sub-matrix as a bipartite graph and then used the transfer cut to efficiently partition the graph and obtain the best clustering result in spectral clustering [11]. Liu et al. [4] proposed an efficient spectral ensemble clustering based on co-association matrix and showed that their method has theoretical equivalence to weighted
Yousefnezhad and Zhang developed a weighted spectral cluster ensemble with two kernels by exploiting some concepts from community detection arena and graph-based clustering [12]. They provided an effective diversity estimation for individual clustering results by using modularity. Wu et al. [13] changed the ensemble clustering into
Although these methods had better performance than traditional clustering algorithms, they could only handle one objective. Ensemble clustering, on the other hand, tries to use all the abilities and characteristics of data for better performance. In this regard, different clustering algorithms may produce distinct partitions because they impose different views on the data. The goal of ensemble clustering is to exploit the complementary nature of these diverse partitions. Gullo et al. [14] proposed projective clustering ensembles for high-dimensional problems, based on two representations: object and feature. They used single objective, based on a cost function, and evolutionary algorithm for multi-objectives in their work. After that, they extended [15] the projective clustering ensembles, based on meta-clusters. They aimed to find the correlation between object-based and feature-based representations and proposed a measure that could consider both representations for better performance.
Franek and Jiang [16] research was about ensemble clustering by means of cluster embedding in vector spaces. They computed the median of the vectors and converted these median vectors into a clustering, according to get not bijective mapping. The authors in [17] presented a framework for hierarchical ensemble clustering. They constructed a consensus distance matrix, with aggregate dendrogram distance. They obtained final clusters by hierarchical clustering using the consensus distance matrix. Gonzàlez and Turmo [18] proposed an unsupervised ensemble minority clustering. Their soft-clustering was fuzzy and could assign an accumulative score to each object by an average of a co-association matrix.
Jing et al. proposed a stratified feature sampling method for ensemble clustering for high-dimensional data [19]. They constructed ensembles by clustering of similar features, instead of random feature selection, and selecting features from each cluster. In their method, features were clustered and then each cluster of features were sampled. At last, a graph out of cluster weights was created and partitioned by Jaccard similarity. In [20], authors presented an incremental semi-supervised clustering ensemble for high-dimensional data. Their ensemble was made by random feature subspaces. For each subspace, a clustering was made by e2cp algorithm based on graph. This algorithm considered the limited number of must-links and cannot-links prior knowledge in clustering, and used a graph and label propagation to construct a co-association matrix. At last,
The above-mentioned researches show that the performance of spectral clustering methods greatly depend on the construction of the similarity matrix. Despite some improvements in constructing the similarity matrices, they often handle just one objective. To address this issue in this paper, we introduce an ensemble clustering method with many objectives in stacking framework to construct the similarity matrix.
MREC, the proposed ensemble clustering method
As we know, the basic key in spectral clustering is its similarity matrix,
Customizing UELM for MREC
UELM is built on two assumptions: i) Manifold assumption says that the data lies on a manifold with much lower dimension than the input space. Therefore, this manifold can be learnt, using unlabeled data, to decrease the data dimensions and so to avoid the curse of dimensionality. ii) Smoothness assumption imposes that the conditional probabilities of two close instances
where
where
where
Instead of using
UELM consists of two main stages. The first stage is to construct a hidden layer with
where
Every
where
and
Also,
The final optimization formula will be as:
where
According to experiments, this Laplacian matrix
An optimal solution to Eq. (12) is given by choosing
where
Using the output weights
In order to reach this point, Eq. (7) needs to know how to find
Clustering with different objectives, in which multiple objective functions are simultaneously optimized, has been used as a robust alternative method and has become popular in the past decades [21]. In this section, we propose a stacking framework, which uses some clustering algorithms with multiple objectives, in order to build a diverse ensemble of clusters and then to fill the similarity matrix,
Stacking is an efficient ensemble model that has significant performance despite its simplicity. In order to get advantages of all clustering algorithms, including isolation, compactness and connectedness of clusters, modelling data density, minimizing entropy, and so on, we try to choose clustering algorithms from all categories to keep diversity among clusters and get better performance. These categories, including some effective algorithms with different objects, are outlined here.
The partitioning approaches, which construct various partitions and evaluates them by some criteria, include
A brief view of MREC.
By using
Three bags of ensembles, used to fill the similarity matrix for 10 data instances.
In this similarity matrix, if
After obtaining the similarity matrix
In this section, the results of the experiments, which have been done to assess the performance of MREC, are reported. We compare our method with several baseline methods and some state-of-the-art ensemble and spectral clustering methods in the experiments.
Though determining the clustering quality of an algorithm is a subjective task, the accuracy is often employed as one of the major criteria, which is also used to evaluate the results of MREC. Additionally, we adopt different statistical techniques to decide whether the differences between the compared algorithms are real or random. We examine our results using some known statistical tests including absolute accuracy rank,
Datasets
In the experiments, some real-world datasets besides three standard benchmark protein datasets are examined to justify the capability of MREC. Nine real-world datasets from UCI machine learning repository [34] and two datasets from other sources are used. Their characteristics are summarized in Table 1.
Characteristics of datasets
Characteristics of datasets
Characteristics of protein datasets
Also, three standard benchmark protein datasets are used to justify the capability of our method in protein structural clustering in Table 2.
Four different groups are defined for proteins according to their structural similarity and characteristics. These groups are all_
In this paper, we have extracted features from both secondary structures and their protein sequences. Different types of features such as amino acid composition [40], pseudo amino acid composition [41], polypeptide composition [42], and predicted secondary structure information [43] are used. Hence, using PSIPRED [44], the secondary structure of proteins is predicted and 19 features are extracted [45]. Six of them are computed from
A widely used evaluation criterion for quantitative analysis of clustering is accuracy (CAC). It measures the fraction of predicted labels, obtained by a clustering method, that match with ground-truth labels [35]:
Where
Using CAC, the performance of MREC is compared against some baseline methods including
Accuracy comparison of MREC against baseline methods
It is obvious that our MREC outperforms other baseline methods in most of the datasets, especially in Bank-Note, Spiral, Isolet, Leukemia, Iris and all protein datasets where the difference of MREC to the next best one is more than 4%. Among the compared methods, the spectral clustering, DBSCAN, average linkage and
Additionally, several state-of-the-art ensemble and spectral clustering methods are used in the experiments. These methods include hierarchical clustering on co-association matrix (HCC) [8],
Accuracy comparison of MREC against ensemble and spectral methods
As Table 4 show, MREC performs better than other methods in most datasets, except Breast-C, Iris and Pendigits. This time, the main rival is RSEC, in average, and U-EPEC in Iris and Pendigits datasets.
In order to do comparisons in statistical manner, Wilcoxon signed rank test [37] is used. Table 5 reports the difference in
The
As is clear in Table 5, it justifies that the accuracy of MREC is statistically better than baseline methods, in almost all datasets. However, this was expected as none of the baseline methods use ensemble of clusters while MREC is an ensemble method.
To assess MREC against ensemble and spectral clustering methods, Wilcoxon signed rank test is reused for those in Table 4. Table 6 report the differences in
The
According to Table 6, MREC statistically outperforms HCC, KCC and F-ESC in almost all datasets. Setting aside one dataset, it achieves better results than SEC and RSEC. It also performs better than U-EPEC, except in two datasets.
In this section, we followed Demsar’s proposal [38] to compare our MREC against other methods via employing two statistical tests: the Friedman test and the Nemenyi test.
Firstly, we prepare the results to do the Friedman test. In this regard, the competing algorithms are ranked regarding their accuracy on each dataset. The algorithm with the highest accuracy gets rank 1; the next methods get ranks 2, 3 and so on. Then, the average rank of each method, on all datasets, is calculated as another performance measure.
Table 7 depicts the accuracy ranking of MREC against rival baseline methods, according to Table 3. Clearly, MREC outperforms all compared methods since its ranking distance to
Accuracy ranking of MREC against baseline methods
Accuracy ranking of MREC against baseline methods
Using the accuracy results for ensemble and spectral clustering methods in Table 4, their accuracy rankings are collected in Table 8. This time again, MREC performs better than all compared methods, though its ranking distance to the next rival method, U-EPEC, is less than 0.5.
Accuracy ranking of MREC against ensemble and spectral methods
Additionally, we have used Friedman test to evaluate a statistical hypothesis for comparing the average of ranks, in Tables. Let there are
is distributed according to
which is distributed according to the
When we apply the Friedman test for baseline methods in Table 7, with
Similarly, for
Since the Friedman test rejects its null hypothesis, we need a post-hoc test to find the exact differences in our group of experiments. In this regard, the Nemenyi multiple comparison test is used to find which algorithms have the best performance. The results of this test, for average of ranks in Tables 7 and 8, are illustrated in Figs 3 and 4 respectively. In these figures, the average of ranks for each scheme is pointed out by a circle and the horizontal bar across each circle indicates the critical difference. In short, those methods that have no overlap in their horizontal bar are significantly different; otherwise, they can do the same in some situation.
Nemenyi test on MREC and eight baseline methods.
According to Fig. 3, MREC is ranked as the best and significantly better than baseline methods since its horizontal bar has no overlap with other bars.
Additionally, MREC is better than the ensemble and spectral clustering methods, as Fig. 4 shows. However, there are some overlaps in horizontal bars of MREC and U-EPEC. This means that U-EPEC can perform similarly in some situations, though MREC has better rank than U-EPEC.
Nemenyi test on MREC and six ensemble and spectral methods.
In this paper, we proposed MREC, the manifold regularization ensemble clustering method with respect to different objects, which constructs ensemble of manifold regularization to predict the structure of data on different views. Unsupervised extreme learning machine was used to initialize and find the generalized eigenvectors in order to embed the data in new space. It was used as the base point in spectral clustering to find a good partitioning of the data instances.
The experimental results showed that by taking the full advantages of ensemble learning with distinct objects, MREC could get acceptable diversity and so better clusters. We compared our proposed method against fifteen state-of-the-art methods on some UCI and 3 protein datasets. The rigorous statistical tests confirmed that MREC could perform significantly better than the compared methods in terms of clustering accuracy.
As future works, we plan to extend the current work to different-objective evolutionary algorithms and deep learning methods to improve the clustering performance.
