Abstract
Multi-view subspace clustering arises in many computer visional tasks such as object recognition and image segmentation. The basic idea is to measure the same instance with multiple views. In this paper, we proposed two centralized joint sparse representation models, namely, Centralized Global Joint Sparse Representation (CGJSR) and Centralized Local Joint Sparse Representation (CLJSR) for multi-view subspace clustering. CGJSR and CLJSR force the concatenated representation matrix of all views and the representation matrix of each view to be sparse respectively. Both CGJSR and CLJSR allow the sparse coefficient matrix to approach a unified latent structure with an acceptable error. Noises and outliers regularization terms are included in CGJSR and CLJSR to reduce the influence of noises and outliers. Related optimization problems are solved using the alternating direction method of multipliers. Compared with seven state-of-the-art multi-view clustering algorithms, our proposed algorithms can achieve better or comparable results on four real-world datasets.
Introduction
In many real-world applications, data are collected from multiple views or multiple features. For example, web-pages can be represented by page-text or hyperlinks pointing to them [1]. Biometrics system can be presented by face, fingerprints or iris [2]. Since more useful information exists in multi-view dataset than in a single view, multi-view learning algorithms can result in better performances [3–5]. In recent years, many efforts have been made to develop multi-view clustering algorithms from different viewpoints due to its widely practical applications. Multi-view clustering aims to group the samples based on the multiple representation of the sample.
Most previous multi-view clustering methods can be divided into three major categories: (1) graph-based and multi-kernel learning algorithms; (2) co-training or co-regularized style algorithms; (3) subspace learning algorUithms [6]. First, the graph-based algorithms explore the relationship among different views by fusing multiple graphs [7–13]. Proposes a multi-view clustering algorithm which constructs a bipartite graph by disagreement minimization to connect the two-view feature [9]. Mainly fuses the corresponding weighted graphs by a common transition matrix to learn the consistent information [10]. Iteratively updates the hyper-parameters of traditional graph-based algorithms to obtain optimal graph representations and a consistent manifold without relying on any predefined parameter [11]. Mainly applies a rank constrain to the Laplacian matrix to directly output cluster indicators and proposes a new disagreement cost function to measure the difference between any two views. Similar to [11, 12], considers the rank property of the Laplacian matrix and learns a global graph [13]. Integrates Norm Regularization, Exponential Decay and p-th Root Loss into a unified weighting learning paradigm. Multiple Kernel Learning (MKL) [5, 14–15] technique has also been used to integrate information from different features by learning a weighted combination of respective kernels, which is most related to graph-based algorithms [15]. Proposes a multi-kernel learning based multi-view clustering algorithm by adaptively estimate the kernel weight. However, for multi-view systems, weight determination during testing is important, based on the quality of views [4]. Second, Co-training and co-regularized style algorithms often construct separate learners on distinct views, then utilize the information in each learner to constrain other views [16–18]. This kind of method should satisfy two main assumptions for success: sufficiency and conditional independence [16]. Proposes a co-training style multi-view clustering method which utilizes the spectral embedding from one view to constrain the adjacent matrices in other views [17]. Proposes two co-regularized style methods (CO-P and CO-C) which implicitly combines graphs from multiple views of the data to achieve a better result. Both the graph-based and multi-kernel learning algorithms and the co-training or co-regularized style algorithms can handle high-dimensional data by dimension reduction, but the limitation of them is that they depend tightly on a reasonable graph (kernel or spectral) based on clean data and an effective measurement method. In fact, it is hard to reasonably determine a measurement method and the similarity of the samples for a dataset with noise and outliers.
In recent years, many researchers have efforts to subspace learning methods that not only select features but also learn cleaner data by self-representation, Multi-view Canonical Correlation Analysis (CCA) and so on. This kind of algorithms supposes that the observations from different views are generated from a union of low-dimensional subspaces. These algorithms are roughly categorized: (1) multi-view CCA clustering; (2) multi-view subspace clustering. First, Multi-view CCA [19] projects the multi-view data onto a low-dimensional subspace for clustering, but such dimensionality reduction methods are specifically designed for two views. Tensor CCA in [20] extends the above CCA based algorithm to handle views more than two by analyzing the covariance tensor of different views. Second, multi-view subspace clustering methods expect to explore the latent subspace structure of samples with self-representation [21, 25]. Introduces a low-rank tensor constrain to explore the complementary information from multi-view data based on low-rank tensor representation [6] and [26] propose multi-view subspace clustering algorithms by utilizing the tensor-singular value decomposition (t-SVD) based nuclear norm (t-TNN) [24, 27]. Extends the classical subspace clustering algorithms SSC [28], LRR [29] and LRSSC [30] to multi-view data by exploiting the self-expressiveness property of each sample in its respective view and enforcing the common representation across the views. The latent representation is divided into two parts, i.e. shared consistency and specificity, and both parts are jointly learned in [31, 32]. Adds a block diagonal constrain to the latent representation. Pairwise MLRSSC and Centroid-based MLRSSC (C-MLRSSC) in [22] respectively encourage the consistency from pair views and all views to be similar [21]. SSuts forward a diversity-induced multi-view subspace clustering (DiMSC) algorithm which utilizes the Hilbert Schmidt Independence Criterion (HSIC) to explore the complementarity of multi-view representations [33]. Combines self-representation learning and clustering in each view, and explores the consensus graph structure by common clustering indicator. Similar to [21, 34], Preserves the diversity of each view by HSIC criterion and learns a common shared coefficient matrix by low-rank representation. Obviously, high-quality latent representation is crucial to construct an effective graph structure or to be used to clustering algorithms. Multi-view subspace clustering can embed all latent representations into a unified subspace to fuse complementary information from distinct views and doesn’t bridge any two subspaces repeatedly.
In the real word, most datasets often contain noises and outliers, and each view from a specific source should have unique effect. However, most previous multi-view subspace clustering methods either ignore the specificity of each view by using the common latent representation with no error of different views [24, 34], or ignore the noises or outliers of samples [22, 32]. Given these limitations, two centralized joint sparse representation algorithms, namely Centralized Global Joint Sparse Representation (CGJSR) and Centralized Local Joint Sparse Representation (CLJSR) are proposed in this paper. Fig. 1 lists the framework of our proposed algorithm. The main contributions of our algorithms are as follows:

Illustration of our method. Given a dataset
CGJSR and CLJSR not only consider the specific of representation matrix of different views, but also consider the influence of data noises and outliers. A centralized sparse term is used to represent the consistency information from all views. The sparse representation matrices of different views are not to be exactly the same to preserve the specificity of each view. For two types of noise, the Frobenius norm is used to enhance the robustness to the small and additive noise (such as Gaussian Noise), while the l2 norm is used to enhance the robustness to the sparse noise (such as outliers). Compared with seven state-of-the-art multi-view clustering algorithms, our proposed algorithms can achieve better or comparable results on four real-world datasets.
The paper is organized as follows. Section 2 introduces the multi-view subspace clustering problem and the related multi-view subspace clustering algorithms. In Section 3, two multi-view subspace clustering algorithms are proposed. The experimental results and analyses are presented in section 4. Finally, section 5 gives the conclusions and future work.
In this section, we list some algorithms that are most related to our work.
Multi-view subspace clustering
Given a dataset
Here we list some nations that will be used in the following sections: ∥
Related algorithms
Here we introduce some multi-view subspace clustering algorithms that are closely related with our work.
The C-MLRSSC in [22] enforces representations across different views towards a common centroid term. It aims to solve the following minimization problem:
The CSMSC in [31] jointly exploits the consistency and specificity for subspace representation learning. The optimization problem is as follows:
After obtaining the self-representation, the joint affinity matrix can be directly computed from the self-representation i.e.
MVSC in [33] use a common cluster structure to guarantee the consistence among different views. The optimization problem is as follows:
Noticing that the practical data may contain noises and outliers simultaneously, and the specificity of different views is also important to the learning of self-representation matrix, two centralized joint sparse representation algorithms CGJSR and CLJSR are proposed in this section. The proposed algorithms not only consider the error of the representation matrix of different views by permitting the representation matrix not to be exactly the same, but also consider the influence of data noises and outliers by imposing different error regularization. The alternating direction method of multipliers (ADMM) is used to solve the proposed optimization models.
Centralized global joint sparse representation algorithm (CGJSR)
Given a training dataset
As for each view i,
Once
The sparsity of concatenated matrix
As for each view i,
In this section, an optimization algorithm by the popular ADMM algorithm [36, 37] is developed to solve Equations (5) and (6). Because the solving processes of Equations (5) and (6) are similar, here we only present the detailed program for solving Equation (6).
Firstly, two auxiliary variables
The augmented Lagrange function is given by:
1) Update
The objective function in Equation (8) with
Computing the derivative of Equation (9) with respect to
2) Update E i
The objective function in Equation (8) with
Computing the derivative of Equation (11) with respect to
3) Update
Fixed the variables except
Calculating the derivative of Equation (13) with respect to
4) Update
The objective function in Equation (8) with
The last two items can be formulated into complete squares, so Equation (15) can be converted into the following problem:
Due to the separable structure of Equation (16), it can be solved by minimizing it with respect to each row of
The objective function in Equation (8) with
The last two items can be formulated into complete squares, so Equation (19) can be converted into the following problem:
It is shown in reference [24] that the Equation (20) can be solved in closed form using the shrinkage-thresholding operator as
6) Update the multipliers
The complete algorithm is outlined in Table 1.
Multi-view subspace clustering algorithm via CLJSR model
The proposed CLJSR which adopts ADMM method to solve CLJSR model, mainly includes the computation of
Experimental results
In this section, the performance of the proposed algorithms CGJSR and CLJSR is evaluated on four datasets including MSRC-V1, NGs, UCI Digits and ORL. We compare the performance of the proposed algorithms CGJSR and CLJSR with seven state-of-the-art multi-view subspace clustering methods such as: co-regularized information-based multi-view spectral clustering algorithms (Co-P and Co-C [17]), Centroid-based multi-view low-rank sparse subspace clustering algorithm (Centroid-based MLRSSC (C-MLRSSC) [22]), multiple graph learning-based clustering algorithms (AMGL [10] and MVGL [12]), multi-view subspace clustering algorithm (MVSC [33]) and clustering algorithm with consistency and specificity (CSMSC [31]).
The parameters of the proposed algorithms and the state-of-the-art algorithms are tuned by grid search method to achieve the best clustering accuracy. Our programs are written in Matlab2017b on a computer with Intel (R) Core i7-6700 2.6 GHz, 12GB RAM memory and Microsoft Windows 10.0.
Description of dataset
MSRC-V1 1 scene dataset consists of 240 images with 8 classes. Similar to the work in [33], we select 7 classes including airplane, car, bicycle, cow, face, tree and building as test dataset with additive noise. The five views are represented by Color Moment (CM, 24 features), HOG (576 features), GIST (512 features), LBP (256 features) and CENTRIST (CENT, 254 features) respectively.
7 Newsgroups (NGs) 2 dataset with 500 documents and 5 topics is a subset of the 20 Newsgroup dataset. It consists of three views and each view is constructed by applying a specific pre-processing method to each document. The detail of extraction method is introduced in [38].
UCI Handwritten Digits dataset (UCI Digits) 3 includes 2000 samples from 10-digit classes (200 samples per class). It is a natural multi-view dataset and is composed of six feature representations using the Fourier descriptors (76 features), Karhunen-Loeve coefficients (64 features), profile correlations (216 features), morphological (6 features), Zernike moment (47 features) and pixel averages in 2×3 windows (240 features). Specific steps can be referred to [16] and [12].
The ORL 4 face dataset covers 10 different images of 40 distinct subjects. These images present different facial expressions at different times and under diverse illuminations. The GIST (512 features), LBP (59 features), HOG (864 features) and CENT (254 features) are selected as four independent views to describe these images.
The detailed information is descripted in Table 2 and some specific instances of the MSRC-V1 and ORL datasets are displayed in Fig. 2 and Fig. 3 respectively.
Statistics of the experimental datasets
Statistics of the experimental datasets

The sample image of the MSRC-V1 dataset.

The sample image of the ORL dataset.
Experimental results
We evaluate clustering performance using five different metrics: normalized mutual information (NMI), F-score, Precision, Recall, and adjusted rand index (Adj-RI). For all the metrics, higher value means better result. Each algorithm runs 20 times and the average value and standard deviation are recorded. Table 3 displays the results. The best results of the nine algorithms are highlighted in bold type. It can be seen from the Table 3 that: On MSRC-V1, NGs and ORL dataset, CLJSR performs better than other seven state-of-the-art multi-view subspace clustering algorithms on all metrics. On UCI Digits dataset, the F-score, Precision and Adj-RI of CLJSR are 5.03%, 5.13% and 5.57% higher than the best results of the other seven state-of-art multi-view subspace clustering algorithms, though the NMI and Recall of CLJSR are 0.22% and 1.56% lower than that of MVGL algorithm. Compared with the other seven state-of-art multi-view subspace clustering algorithms, CGJSR can achieve the best results on MSRC-V1 and NGs datasets on all metrics. On UCI Digits dataset, the F-score, Precision and Adj-RI of CGJSR are 4.97%, 5.11 % and 5.53% higher than the best results of the other seven state-of-art multi-view subspace clustering algorithms, though the NMI and Recall of CGJSR are 0.45% and 1.44% lower than that of MVGL algorithm. On ORL dataset, the F-score, Precision and Adj-RI of CGJSR are 0.19%, 1.26% and 0.20% higher than the best results of the other seven state-of-art multi-view subspace clustering algorithms, though the NMI and Recall of CGJSR are 0.45% and 3.54% lower than that of MVGL algorithm. CLJSR performs slightly better than CGJSR, which means that the sparsity of the representation coefficient matrix of each view can make use of more discriminant information than the sparsity of the concatenated matrix of the representation coefficient matrix for multi-view subspace clustering problem.
Clustering results of nine multi-view clustering algorithms on four datasets
Clustering results of nine multi-view clustering algorithms on four datasets
We visualize the affinity matrices learned from nine different algorithms in Fig. 4 to directly present the relationship of all samples in the latent subspace. The ORL dataset is taken in this experiment. Brighter parts represent larger coefficient values. Note that MVGL, MVSC, CSMSC, C-MLRSSC, CGJSR and CLJSR aim to learn the unified coefficient matrix as the latent representation, while CO-P, CO-C and AMGL regard the indicator vectors as the latent representation which is used to construct an affinity matrix by Gaussian kernel criterion. As can be seen from Fig. 4 (a)-(c), the affinity matrix of CO-P, CO-C and AMGL are dense which can’t eliminate the influence of noise well. Fig. 4 (d)-(e) show that the affinity matrices of MVGL and MVSC are too sparse to lose some useful relevant information. It can be seen from Fig. 4 (f)-(i) that compared with CSMSC and C-MLRSSC, CGJSR and CLJSR can output more sparse coefficients which are more robust to noise and outliers. The visual representations of Fig. 4 are consistent with the precision of the ORL dataset in table 4, which demonstrates that a balance affinity matrix between preserving information and denoising can perform better.

The visual representation of similarity matrix on ORL dataset. (a)-(i) visualize the affinity matrices of CO-P, CO-C, AMGL, MVGL, MVSC, CSMSC, C-MLRSSC, CGJSR, CLJSR by gray graph, respectively. The whiter the point is, the closer the corresponding samples are.
Clustering results with different number of views on MSRC-V1 dataset for CGJSR
Note: The data in brackets is the standard deviation of running 20 times. The result of MVGL is unique, so the standard deviation is always equal to zero, which is marked as 0. For CO-P, CO-C and AMGL algorithms, their basic equation is standard spectral clustering model, which is represented as
In this part, we present the sensitivity analysis including parameter sensitivity, the effect of the number of views, and noise sensitivity of CGJSR and CLJSR algorithms.
Effect of the key term
As can be seen from the model (5) and (6), the performance of CGJSR and CLJSR algorithms is closely related to the sparsity term, noise occlusion regularization and the centralized term. Referring to [4], we expect to explore the importance of each key term to the model by varying λ1, λ2 and λ3. The subspace clustering task with 5 views from MSRC-V1 dataset is chosen to do the experiments. According to the results of grid search, λ1, λ2 and λ3 are set as 0.0001, 0.001, 0.01 respectively, since it is the most stable parameter setting in the grid search process.
(1) Effect of sparsity
Consider the parameter λ1 ∈ {0, 0.00005, 0.0001, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1} with λ2 = 0.001 and λ3 = 0.01 and study the relationship between the sparsity term and the NMI. From Fig. 5 (a), we can see that the NMI values of CGJSR and CLJSR are slightly increasing with λ1 varying from 0 to 0.01. If λ1 exceeds 0.01, it will lead to a sharp drop. This demonstrates that the l1,q sparse constrain is conductive to improving the precision, while the centralized representation matrix is too sparse will lead to a lack of structure information.

The effect of sparsity term λ1, noise regulation λ2 and centralized term λ3 on the clustering NMI on MSRC-V1 dataset.
(2) Effect of noise regularization
Consider the parameter λ2 ∈ {0, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1} with λ1 = 0.0001 and λ3 = 0.01 and study the relationship between the noise regularization and the NMI. From Fig. 5 (b), we can see that the noise regulation term is important to the clustering model. The NMI curve increases sharply when λ2 is from 0 to 0.001. The NMI starts decreasing when λ2 is greater than 0.001.
(3) Effect of centralized term
Consider the parameter λ3 ∈ {0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1} with λ1 = 0.0001 and λ2 = 0.001 and study the relationship between the centralized term and the NMI. From Fig. 5 (c), we can see that CGJSR and CLJSR obtain maximum NMI when λ3 equals to 0.01. These results are superior to the results of λ3 = 0, which means that considering the error of the sparse representation matrix of each view is very essential.
Here we discuss the effect of the number of views on CGJSR and CLJSR algorithms. MSRC-V1 dataset including Color Moment (CM), HOG, GIST, LBP, CENT of five views is used to do the experiments, setting the different view number from 1 to 5. As shown in Table 4-5, with the increase of the number of views, the NMI and Adi-RI of CGJSR and CLJSR algorithm grow rapidly. When the number of views reaches five, CGJSR and CLJSR algorithms achieve the best results which shows the validity of multi-view subspace clustering algorithms.
Clustering results with different number of views on MSRC-V1 dataset for CLJSR
Clustering results with different number of views on MSRC-V1 dataset for CLJSR
In the noise immunity experiment, ORL dataset is taken to evaluate the performance of nine algorithms.
All data is manually disturbed by Gaussian White Noise with different variance σ2 ={ 0, 0.002, 0.004, 0.006, 0.008, 0.01 }. As can be seen from Fig. 6, the NMI and Adj-RI of all algorithms have a different degree of reduction as the noise level σ increases. The performance of the proposed CGJSR and CLJSR algorithms is not much degraded with the increase of noise level. The NMI and Adj-RI of CLJSR algorithm are higher than that of the other seven algorithms at most of the noise level. The NMI and Adj-RI of CGJSR is only inferior to CLJSR when the noise level is below 0.004, though the NMI or Adj-RI of CGJSR is a little lower than that of CLJSR, AMGL and CSMSC when the noise level exceeds 0.004.

The clustering results with different levels of noise variance on ORL dataset for nine algorithms.
We study the convergence of CGJSR and CLJSR in terms of max error on the MSRC-V1 dataset. From Fig. 7, if the error threshold is 1e-4, the CGJSR and CLJSR converge fast within 10 iterations. If the error threshold is 1e-6, the CLJSR can converge with around 50 iterations and the CGJSR requires 5 more iterations to reach convergence.

Convergence curves of CGJSR and CLJSR in terms of max error - max(max|Gt+1 - G
t
|, max|
We propose two centralized joint sparse representation models for multi-view subspace clustering problem. These two algorithms not only preserve the specificity of each view, but also well describe the data representation together with noise and outliers. Specifically, CGJSR encourages the concatenated representation matrix to be row sparse, while CLJSR imposes the sparsity constraint on the row vectors of each views’ representation matrix. Furthermore, a unified latent structure is obtained for clustering. The Alternating direction method of multipliers (ADMM) algorithm is developed to solve the two models. Experimental results on four real-world datasets demonstrate the effectiveness of our approach. In the future, we will extend this algorithm to large-scale datasets and improve the efficiency of our algorithms.
Footnotes
Acknowledgments
The research was supported by the National Natural Science Foundation of China (NSFC) (Grant NO. 61502175, 61273295), the Humanities and social sciences research and planning fund of the Ministry of Education of China (x2lxY9180090) and the Natural Science Foundation of Guangdong Province (2016A030313545).
