Abstract
In recent years, tensor-Singular Value Decomposition (t-SVD) based tensor nuclear norm has achieved remarkable progress in multi-view subspace clustering. However, most existing clustering methods still have the following shortcomings: (a) It has no meaning in practical applications for singular values to be treated equally. (b) They often ignore that data samples in the real world usually exist in multiple nonlinear subspaces. In order to solve the above shortcomings, we propose a hyper-Laplacian regularized multi-view subspace clustering model that joints representation learning and weighted tensor nuclear norm constraint, namely JWHMSC. Specifically, in the JWHMSC model, firstly, in order to capture the global structure between different views, the subspace representation matrices of all views are stacked into a low-rank constrained tensor. Secondly, hyper-Laplace graph regularization is adopted to preserve the local geometric structure embedded in the high-dimensional ambient space. Thirdly, considering the prior information of singular values, the weighted tensor nuclear norm (WTNN) based on t-SVD is introduced to treat singular values differently, which makes the JWHMSC more accurately obtain the sample distribution of classification information. Finally, representation learning, WTNN constraint and hyper-Laplacian graph regularization constraint are integrated into a framework to obtain the overall optimal solution of the algorithm. Compared with the state-of-the-art method, the experimental results on eight benchmark datasets show the good performance of the proposed method JWHMSC in multi-view clustering.
Keywords
Introduction
In the era of big data, the speed of data generation and collection is increasing rapidly, and multi-view data is becoming more and more common in practical applications. Compared to traditional data, which describes objects from a single perspective, multi-view data is semantically richer and more useful, but more complex [1]. For instance, the same news can be reported by articles in different news centers. A document can be translated into multiple languages, such as French, German, Japanese, etc. A web page can be described by pictures, text or links. Due to the consistency and complementarity of multi-view data, the description of data samples can be enriched and the learning effect can be improved. Therefore, the analysis and learning of multi-view data have attracted more and more attention.
Clustering is a widely used unsupervised learning method, which uses a certain similarity measure to group a given dataset, so that the similarity of samples in the same clusters is as large as possible, and the similarity of samples in different clusters is as small as possible. Multi-view clustering as an important method of multi-view learning, which aims to make full use of the consistency and complementarity between multiple views of data to obtain more effective and accurate learning results than using only a single-view of data. The existing multi-view clustering methods can be roughly divided into five categories: collaborative training clustering methods [2–4], graph clustering methods [5–8], subspace clustering methods [9–12], multi-kernel learning clustering methods [13, 14], and deep learning clustering methods [15, 16]. This paper mainly studies methods based on subspace clustering.
The purpose of subspace clustering is to learn a robust coefficient matrix or similarity matrix commonly used for spectral clustering. Sparse subspace clustering (SSC) [17] and low-rank subspace clustering (LRR) [18] are two classic subspace clustering methods based on spectral clustering. The former adds a sparsity constraint on the coefficient matrix represented by the sample linearly, while the latter is a low-rank constraint on the coefficient matrix. SSC lacks a global constraint on the coefficient matrix. Although LRR has a low-rank global constraint, the coefficient matrix obtained is often dense, which is not conducive to subspace division. On the basis of LRR and SSC, a large number of multi-view clustering algorithms have been proposed [19–28]. In order to recover a shared similarity matrix for the final clustering task from multiple views, Xia et al. [19] proposed a robust multi-view spectral clustering (RMSC) method based on low rank and sparse decomposition. Considering that the original features may contain noise, Zhang et al. [20] tried to find latent representation and developed a latent multi-view subspace clustering (LMSC) method. Chen et al. [21] designed a multi-view clustering in latent embedding space (MCLES) which firstly mapped each view data into a common representation matrix, and then learned a global similarity matrix for spectral clustering. By relaxing the constraint of the global similarity matrix, Chen et al. [22] presented a relaxed multi-view clustering in latent embedding space (R-MCLES) wihch avoided the optimization problem of quadratic programming. Hu et al. [23] learned a sparse consensus similarity matrix from multiple views and introduced a multi-view spectral clustering (S-MVSC) method based on sparse graph learning. Considering the different contributions of multiple views, Nie et al. [24] proposed a self-weighted multi-view clustering with multiple graphs (SwMC) method. In order not to introduce additional parameters, Nie et al. [25] developed a parameter-free auto-weighted multiple graph learning (AMGL) method. Subsequently, Nie et al. [26] put forward a multi-view clustering via adaptively weighted procrustes (AWP) method. In order to avoid post-processing and reduce high computational cost, Hu et al. [27] came up with a multi-view spectral clustering via integrating nonnegative embedding and spectral embedding (NESE) method. For an overview of multi-view learning, please refer to [29].
Recently, by stacking the similarity matrices of multiple views into a 3-order tensor, the tensor-based multi-view spectral clustering methods have received widespread attention [30–35]. To be specific, the high-order correlations between multiple views can be captured by using the low-rank of the tensor. For instance, Zhang et al. [30] first proposed a low-rank tensor constrained multi-view subspace clustering (LT-MSC) method, which regards the similarity matrices constructed by different views through self-representations learning as the lateral slices of tensor. However, the tensor nuclear norm is not a tight convex relaxation of ℓ1-norm. In order to deal with this problem, Lu et al. [36] developed a tensor nuclear norm based on tensor-singular value decomposition (t-SVD). Subsequently, Xie et al. [31] presented an unifying multi-view self-representations for clustering by tensor multi-rank minimization (t-SVD-MSC), which effectively captures the high-order information embedded in the multi-view data. However, they ignore the local geometric structure embedded in the high-dimensional ambient space. To solve this problem, Xie et al. [32] introduced a hyper-Laplacian regularized multilinear multiview self-representations for clustering (HLR-M2VS). Combining the idea of robust principal component analysis, Wu et al. [33] used a tensor nuclear norm based on t-SVD to preserve the low-rank property of the essential tensor, a new method of essential tensor learning for multi-view spectral clustering (ETLMSC) is proposed. In order to deal with different types of errors, Wang et al. [34] developed a multi-view clustering error robust low-rank tensor approximation (ELRTA) method. Taking into account the limitations of step-by-step learning transition probability matrix, Chen et al. [35] adopted a one-step strategy to directly learn the transition probability matrix under the framework of robust principal component analysis.
Although these tensor-based spectral clustering methods have achieved good performance in multi-view clustering, they still have some limitations. First, they ignore that different singular values have significant difference, which limits their ability and flexibility in dealing with many practical problems. Second, they ignore that the data samples in the real world often exist in multiple nonlinear subspaces, which cannot effectively captures the local geometry structure of the data. In order to overcome the above problems, we proposes a hyper-Laplacian regularized multi-view subspace clustering (JWHMSC) model that joints representation learning and weighted tensor nuclear norm constraint for multi-view clustering. The Fig. 1 shows the framework of the proposed JWHMSC. The main contributions of our proposed model are as follows. The subspace representation matrices of all views are stacked into a low-rank constraint tensor, which not only captures the global structure of all views, but also exploits the complementary information between different views. hyper-Laplace graph regularization is adopted to preserve the local geometric structure embedded in the high-dimensional ambient space. WTNN based on t-SVD is introduced to treat singular values differently, which can make better use of the prior information of singular values. Therefore, the proposed JWHMSC can more accurately reveal the category information of the sample data distribution. Representation learning, WTNN constraint, and hyper-Laplacian graph regularization constraint are integrated into a framework to obtain the overall optimal solution of the algorithm. An effective algorithm based on the alternating direction method of multipliers (ADMM) is proposed to solve the JWHMSC model. Experiments on eight benchmark datasets validate our proposed method is better than many state-of-the-art clustering approaches.

Flowchart of JWHMSC method. (a) Multi-view Data. (b) Subspace Representations. (c) First, the subspace representations
It should be pointed out that the most similar method to our proposed JWHMSC is HLR-M2VS [32]. In fact, the model of HLR-M2VS can be regarded as a special case of our proposed model JWHMSC. When the weights of WTNN are all set to 1, the proposed model of JWHMSC degenerates to the model of HLR-M2VS.
The rest of this paper is structured as follows. Section 2 briefly reviews single-view subspace clustering and multi-view subspace clustering. Section 3 introduces the proposed JWHMSC method which provides an optimization solution process, convergence and complexity analysis. The experimental results and analysis are introduced in detail in Section 4. In Section 5, we present the conclusion and future work.
In this part, we briefly review the single-view subspace clustering method, and then introduce the multi-view self-representations subspace clustering method.
Single-view subspace clustering
Among the single-view subspace clustering methods, the self-representation subspace clustering model is the most typical and important. Self-representations subspace clustering is based on the assumption that any data point in the space can be obtained by a linear combination of other sample points. Commonly used single-view subspace clustering methods can be expressed in the following form
Since multi-view data can provide more category information, many single-view subspace clustering methods are extended to multi-view subspace clustering [30–32]. The multi-view subspace clustering model based on self-representation learning has the following general formula
In this section, some notations and peliminaries will be provided, then we introduce the JWHMSC model, and give the solution process of the model. Finally, the convergence and complexity of the model are analyzed.
Notations and peliminaries
For a 3-order tensor
Most existing t-SVD based tensor low-rank approximation methods [10, 33] mainly solve the following tensor nuclear norm minimization problem
The optimal solution of Equation (5) can be solved by solving n3 independent optimization problems. We use b to represent the number of singular values of
The most low-rank matrix
In multi-view self-representation learning, the weighted tensor nuclear norm is used to perform low-rank constraints on the tensor composed of representation coefficients, which can not only obtain the complementary information between different views, but also describe the global structure of the sample data. At the same time, the hpyer-Laplace [40] constraint is adopted to constrain the self-representation coefficients in order to maintain the local geometric structure of the sample data. Thereform, the objective function of our proposed JWHMSC model is shown below
We use the Alternating Direction Method of Multipliers (ADMM) [42] to optimize the proposed model, and the constructed augmented Lagrangian function is as follows
Fixing
Equation (11) is a smooth convex problem. Therefore, we take the derivative of the variable
Fixing
Lemma 4.1 in [18] can solve the problem in Equation (13), the introduction of the lemma is as follows.
Here
Fixing
To solve Equation (17), we show the following theorem.
From theorem 1, the solution of Equation (17) is as follows
In addition, we update the Lagrange multipliers
The whole solution process of Equation (9) is shown in Algorithm 1, in which the stopping criterion is given as follows
Here ε is a predefined tolerance, we set its value to 10-7. Once the convergence condition in Equation (93) is satisfied, we construct the final similarity matrix by unfolding the tensor representation
In literature [45], the strong convergence of the ADMM method with two blocks is proved. However, as the number of blocks increases, the convergence property of the ADMM method is difficult to prove. In our model, these blocks
For the computation complexity, as shown in Algorithm 1, our model is mainly composed of five stages, which are introduced separately. Define n to represent the number of data samples, V is the number of views. The first process is to solve hyper-Laplacian matrix
Experiments
In this section, the public standard datasets used in the experiment are first introduced, and then the comparison algorithms and experimental results and analysis are displayed. Finally we represent the parameter setting, convergence analysis, running time, weighted values analysis and the visualization results of some clustering algorithms. All experiments are conducted in Matlab 2019a on the computer with Intel(R) Core(TM) i5-4210U processors, 1.70 GHz CPU and 12 GB of memory.
Datasets
In our experiments, eight benchmark datasets are used to validate our method. They are one-handred plant species leaves dataset 1 (100leaves), 3 source dataset 2 (3source), BBC dataset 3 (BBC), Handwritten digit dataset 4 (Hdigit), Newsgroups dataset 5 (NGs), BBC-Sport dataset 6 , Caltech101-20 dataset [23], Calte ch101-07 dataset [23] respectively. Their detailed information can be found in literatures [6] and [23]. Table 2 briefly describes the characteristics of these datasets. Some sample images of these datasets are shown in Figs. 2–5.
Statistics of different datasets
Statistics of different datasets
Clustering results of ACC (%) on the tested multi-view datasets

The sample image of the 100leaves dataset.

The sample image of the Hdigit dataset.

The sample image of the Caltech101-07 dataset.

The sample image of the BBC-Sport dataset.
We compare JWHMSC with nine state-of-the-art clustering algorithms, which are SPC best [44], S-MVSC [23], SwMC [24], AMGL [25], AWP [26], GMC [6], MCGL [46], NESE [27] and HLR-M2VS [32] respectively.
Evaluation metrics
In our experiments, three evaluation metrics are used to evaluate all algorithms, which are ACC (Accuracy), NMI (Normalized Mutual Information) and ARI (Adjusted Rand Index) respectively. These evaluation metrics have been extensively used in previous studies [6, 32], and their detailed information can be found in [31]. For each evaluation metric, the larger the value, the better the result.
Parameter setting
In our proposed JWHMSC method, four parameters are involved, which are λ1, λ2, λ3 and ω
i
respectively. Their value ranges are [0.01,0.4], [0.1,1], [0.5,5] and (0,100] respectively. Specifically, λ1, λ2, λ3 and
Performance evaluation
Tables 2–4 shows the clustering results of all algorithms on eight datasets. The best clustering results are shown in bold. We report the average of 20 runs of all algorithms. The standard deviations of all algorithms on all datasets are less than 0.1. Due to page limitations, we do not display them. From Tables 2–4, we draw the following observations. In most cases, the performance of JWHMSC is better than all the compared algorithms, especially on the 100leaves, BBC, Hdigit, NGs and BBC-Sport datasets. Compared with the second-best method, take ACC as an example, our method obtains significant improvement around 3.0%, 2.5%, 9.6%, 0.5%, 1.4% on the 100leaves, 3source, BBC, Hdigit and Caltech101-20 datasets respectively. This may be because our method preserves the local geometric structure embedded in the high-dimensional ambient space through the hyper-Laplacian constraint, and considers the prior information of singular values through the t-SVD based WTNN constraint, which effectively utilizes the complementary information between different views. Compared with HLR-M2VS, on the 3source dataset, our method indicates a significant increase of 12.5%, 3.9% and 13.6% w.r.t. ACC, NMI and ARI respectively. On the BBC dataset, our method shows 9.6%, 23% and 22.8% of relative improvement w.r.t. ACC, NMI and ARI respectively. This may be because our method introduces a weighted tensor nuclear norm and treats each singular value differently, that is, considering the prior knowledge of singular values. Numerical results show that the clustering performance can be improved by applying different weights to each singular value. T-SVD based tensor low-rank methods, including HLR-M2VS and our method, have significantly improved the clustering results on BBC, NGs, BBC-Sport, and Caltech101-20 datasets compared with other algorithms. This may be because the tensor low-rank methods consider the high-order correlations embedded in the multi-view data. In addition, the complementary information among different views can be explored more efficiently and thoroughly by the t-SVD based tensor low-rank methods. Overall, the clustering results of the single-view standard spectral clustering algorithm are inferior to the multi-view clustering algorithms. It shows that by fully fusing multi-view data, the separability of multi-view data can be improved. SwMC algorithm achieves lower performance than the best spectral clustering method in all single-view data. It is probably because the heterogeneous features underlying different views leads to the poor similarity matrices of SwMC.
Clustering results of NMI (%) on the tested multi-view datasets
Clustering results of NMI (%) on the tested multi-view datasets
Clustering results of ARI (%) on the tested multi-view datasets
Average running time (in seconds) on all datasets
Figure 7 shows the empirical convergence analysis of our method on eight benchmark datasets. The vertical axis represents the error defined as

Performance of JWHMSC with various parameter k ranging in {2, 3, 4, 5, 6, 7, 8, 9, 10}.

Convergence analysis on all datasets.
Table 5 reports the average running time of our method and all compared methods on eight benchmark datasets. It is easy to see that the multi-view graph clustering methods run faster than the multi-view subspace clustering methods, which include HLR-M2VS and JWHMSC. The graph based method S-MVSC runs the fastest, the main reason is that the S-MVSC method can directly obtain the closed-form solutions without iteration in the process of solving variables. For the multi-view subspace methods, we also find that in most cases, our proposed method is more effective than the competitors, which is also verified in Tables 2–4.
Weighted values analysis
Figures 8 and 9 show ACC and NMI of our JWHMSC method versus weighted coefficient

Analysis of the weighted coefficient

Analysis of the weighted coefficient

Visualization of the embedding results on Hdigit dataset.
As shown in Fig. 10(a)–(k), which are obtained by performing t-distributed stochastic neighbor embedding (t-SNE) algorithm on each view, SPC, S-MVSC, SwMC, AMGL, AWP, MCGL, NESE, GMC, HLR-M2VS and JWHMSC. We can see that compared with Fig. 10(a)–(k), our proposed JWHMSC method can effectively group similar samples into the same clusters.
Conclusion
In this paper, we put forward a novel hyper-Laplacian regularized multi-view subspace clustering method by jointing representation learning and weighted tensor nuclear norm constraint, which is referred as JWHMSC. To be specific, in our proposed model, the subspace representation matrices of all views are stacked into a low-rank constraint tensor to capture the global structure of all views and exploit the complementary information between different views. In addition, the hyper-Laplace graph regularization is adopted to preserve the local geometric structure embedded in the high-dimensional ambient space. Considering that different singular values have different importance, a WTNN based on t-SVD is introduced to treat singular values differently, which can make better use of the prior information of singular values. Experiments on eight benchmark datasets with different applications validate our proposed method is superior to many state-of-the-art clustering approaches. In future work, we are interested in combining our method with representation learning of deep learning to consider large-scale datasets.
Footnotes
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (No.61866033), the Introduction of Talent Research Project of Northwest Minzu University (No.xbmuyjrc201904), the Gansu Provincial First-class Discipline Program of Northwest Minzu University (No.11080305), the Leading Talent of National Ethnic Affairs Commission (NEAC), the Young Talent of NEAC, and the Innovative Research Team of NEAC (2018) 98.
