Sample-Dependent Subspace Clustering with Elastic Structure Consistency Constraints

Abstract

Subspace clustering (SC) models high-dimensional data as a union of low-dimensional subspaces and is well suited to high-dimensional data analysis in domains such as image segmentation and face recognition. Existing SC methods typically obtain the global structure representation solely through the self-representation of the samples, thereby neglecting the intrinsic local connections among the samples. Moreover, due to their inherent framework design, obtaining additional a priori information in unsupervised scenarios presents a significant challenge. To address these limitations, this paper proposes a new method, named Sample-Dependent Subspace Clustering with Elastic Structure Consistency Constraints (SDSC). First, we introduce a new Elastic Structure Consistency Constraints (ESCC) strategy to measure global and local structures elastically. Benefiting from this strategy, SDSC can flexibly explore the structural information within the samples to obtain a comprehensive data representation. Second, by employing a joint regularization term, SDSC learns effective cluster assignment information directly from the constrained structured data representation. The cluster assignment information and the representation coefficient matrix are integrated into a unified framework and learned in a mutually reinforcing manner. This learning approach contributes to comprehensive and high-quality clustering results, enhancing the robustness and utility of SDSC. Extensive experiments on several real-world benchmarks and synthetic datasets demonstrate the feasibility and effectiveness of SDSC.

Keywords

Elastic structure consistency constraints; global and local structures; cluster assignment information; subspace clustering

1. Introduction

As a critical aspect of machine learning and artificial intelligence, clustering, particularly subspace clustering, is vital for exploring topological pairwise relationships in high-dimensional data.^1,2 Subspace clustering (SC) regards high-dimensional data as lying in a union of low-dimensional subspaces,^3,4 and its main purpose is to determine the subspace to which each cluster of associated data points belongs. Over the past decades, SC has been extensively studied and applied in various fields, including image clustering, motion segmentation, and face recognition. In general, the SC problem can be described as follows: Assume that the given data $X = [X_{1}, X_{2}, \dots, X_{k}] \in ℜ^{d \times n}$ are drawn from k subspaces $\{C_{i}\}_{i = 1}^{k}$ , where $X_{i}$ denotes the data samples belonging to the i-th subspace and $n_{i}$ denotes the number of samples in the i-th subspace, with $\sum_{i = 1}^{k} n_{i} = n$ .

Existing SC methods can generally be categorized into four types: iterative methods,⁵ statistical methods,⁶ spectral clustering-based methods,⁷ and algebraic methods.⁸ Among these four types, spectral clustering-based methods have demonstrated superior performance in various applications due to their theoretical guarantees.^9,10 The framework of a spectral clustering-based SC method consists of two steps. In the first step, a reconstruction coefficient matrix $Z \in ℜ^{n \times n}$ is learned from the data, where $X = X Z$ . Then, an affinity matrix $\tilde{Z} = (Z + Z^{T}) / 2$ is constructed. In the second step, $\tilde{Z}$ is partitioned using a spectral clustering method such as Normalized Cuts (Ncuts)¹¹ or Ratio Cut¹² to obtain the final clustering result. In the study of spectral clustering-based SC methods, the primary focus is on designing regularizations that impart desirable properties to the representation matrix. Current design strategies fall into two main groups: Sparse Subspace Clustering (SSC),¹³ which solves an $ℓ_{1}$ norm minimization problem to achieve a sparse representation of the data; and Low-Rank Representation (LRR),¹⁴ which employs the nuclear norm $‖ \cdot ‖_{*}$ to achieve a low-rank self-representation of the data. Numerous subspace clustering methods have emerged based on SSC and LRR.^15–18

From the framework of spectral clustering-based SC methods, it is evident that their performance is highly dependent on the quality of the reconstruction coefficient matrix. Consequently, these methods suffer from the following issues. To ensure that different subspaces are independent of each other, most $ℓ_{1}$ norm-based methods represent the data with as few samples as possible. However, this inevitably weakens the ability to represent data within the same subspace, is likely to discard otherwise beneficial data representations, and fails to establish correlations between samples, thereby negatively affecting the final clustering performance. Most nuclear norm-based methods impose strict requirements on the distribution of the data and can theoretically only be applied to datasets with k mutually independent subspaces. These factors make it difficult for the reconstruction coefficient matrix to represent the true structural distribution of the data. It is worth noting that popular subspace clustering methods generally use the self-representation of the data to explain its global structure while ignoring the intrinsic local structural information of the data.¹⁹ Studies^20,21 have shown that relying solely on global structural information does not fully capture the data distribution, and that additional sources of structural information are needed to complement it, i.e., both global and local structures should be considered.^22–24 Furthermore, spectral clustering-based SC methods make poor use of supervised information, primarily because they are designed for unsupervised scenarios. As shown in prior studies,^25,26 even minimal supervised information can enhance clustering performance. This is because supervisory information offers direct and accurate cluster assignments, whereas structural information is more complex to analyze. Consequently, clustering results become highly sensitive to the provided information. Moreover, obtaining reliable a priori supervised information from unsupervised datasets, in order to enhance clustering in unsupervised scenarios, presents a significant challenge.

To address the above issues, we first introduce a new strategy called Elastic Structure Consistency Constraints (ESCC). ESCC can flexibly explore the intrinsic structure of samples and accurately express the structural relationships between samples. Further, we propose a new clustering method called Sample-Dependent Subspace Clustering with Elastic Structure Consistency Constraints (SDSC). Instead of providing label information in advance, SDSC relies on samples to automatically learn reliable cluster assignment information. Meanwhile, SDSC introduces a joint regularization term to simultaneously learn the cluster information matrix and the representation coefficient matrix from the constrained structured data representation. This improvement mechanism allows SDSC to better adapt to various application scenarios.

The main contributions of the proposed method are as follows:

We propose a novel ESCC strategy that simultaneously captures both global and local structures while fully leveraging the complementarity of composite structures. Moreover, ESCC elastically captures the intrinsic structure of samples and accurately characterizes the affinity between samples.

We develop a subspace clustering model with ESCC strategy, which directly learns clustering assignment information from the samples themselves. By introducing a joint regularization term, the constrained structured data representation effectively guides the model in performing subspace clustering. This process optimizes the clustering information matrix and representation coefficient matrix in a mutually reinforcing manner, leading to improved clustering performance.

An effective iterative solution method is provided. Meanwhile, we perform several experiments on multiple real and synthetic datasets and analyze the results to illustrate the effectiveness of the SDSC method.

The rest of the paper is structured as follows: Section 2 provides a brief overview of existing subspace clustering methods. Section 3 introduces the proposed SDSC method, presents its solution, and analyzes its computational complexity. Section 4 reports comparative experiments on seven benchmark datasets and six synthetic datasets. Finally, Section 5 presents the conclusions and future work.

2. Related works

Subspace clustering is a significant research area in machine learning. It posits that the complete feature space can be divided into multiple low-dimensional subspaces to explore meaningful patterns and relationships among data points. By focusing on these related subspaces, SC can effectively reveal the structural distribution of the data, even when it is complex and masked by noise. SSC and LRR are two of the most representative subspace clustering methods. Both regard the data itself as a dictionary of independent and disjoint subspaces.²⁷ This self-expressive approach assumes that if several data samples are in the same subspace, they are linearly correlated with each other. The specific problem can be described as follows:

\begin{aligned} \min_{Z} Θ (Z) \quad \text{s.t.} \quad X = X Z \end{aligned}

(1)

where $Θ (Z)$ usually denotes some regularization term on Z. In practice, considering that the data often contain noise, the problem can be further described as follows:

\begin{aligned} \min_{Z, E} Θ (Z) + ζ Ψ (E) \quad \text{s.t.} \quad X = X Z + E \end{aligned}

(2)

where E denotes the error (noise) term, $Ψ (E)$ is used to separate this error from the clean data, and $ζ$ is a trade-off parameter. The design of $Θ (Z)$ is crucial for a subspace clustering method. For example, Reweighted Sparse Subspace Clustering (RSSC)²⁸ redefines the weighted sparsity constraint on the coefficient matrix by using the inverse of each element. Structured Sparse Subspace Clustering (SSSC)²⁹ adds a reweighted $ℓ_{1}$ norm to SSC, integrating SSC and spectral clustering into a unified framework. Structure-constrained LRR (SCLRR)³⁰ reveals the relationships between multiple subspaces by adding structural constraints to the representation matrix. Graph-regularized LRR (GLRR)³¹ simultaneously minimizes the nuclear norm and the Laplacian regularizer of the representation matrix. Block diagonal representation (BDR)³² proposes a new k-block diagonal regularizer to directly pursue coefficient matrices with a k-block diagonal structure. Table 1 summarizes these representative methods.

Table 1.

Related subspace clustering methods for different $Θ (Z)$ and $Ψ (E)$ .

Methods	$Θ (Z)$	$Ψ (E)$
SSC	$‖ Z ‖_{1}$	$‖ E ‖_{1}$
LRR	$‖ Z ‖_{*}$	$‖ E ‖_{2, 1}$
SSSC	$‖ (I + ζ Q) ⊙ Z ‖_{1}$	$‖ E ‖_{1}$
RSSC	$‖ W ⊙ Z ‖_{1}$	$‖ E ‖_{1}$
SCLRR	$‖ Z ‖_{1} + ‖ Z ‖_{*}$	$‖ E ‖_{2, 1}$
GLRR	$‖ Z ‖_{*} + t r (Z L Z^{T})$	$‖ E ‖_{2, 1}$
BDR	$‖ Z ‖_{k} = \sum_{i = n - k + 1}^{n} σ_{i} (L_{Z})$	$‖ E ‖_{F}^{2}$

* I is the identity matrix. $⊙$ denotes element-wise multiplication of two matrices. In RSSC, $W = 1 / (Z + ϖ I)$ is computed element-wise and $ϖ > 0$ denotes a small scalar. $‖ Z ‖_{k}$ denotes the sum of the k smallest eigenvalues of the Laplacian matrix $L_{Z}$ of Z.
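As a reading aid for Table 1, the following minimal NumPy sketch evaluates the three basic norms that the listed regularizers are built from: the $ℓ_{1}$ norm, the nuclear norm, and the $ℓ_{2,1}$ norm. The function names are illustrative and do not come from any of the cited methods. The $ℓ_{2,1}$ norm is taken column-wise here, which is the convention implied by the column-wise update in (33).

```python
import numpy as np

def l1_norm(Z):
    """Entry-wise l1 norm: sum of absolute values (the SSC regularizer)."""
    return np.abs(Z).sum()

def nuclear_norm(Z):
    """Nuclear norm: sum of singular values (the LRR regularizer)."""
    return np.linalg.svd(Z, compute_uv=False).sum()

def l21_norm(E):
    """l2,1 norm: sum of the Euclidean norms of the columns of E."""
    return np.linalg.norm(E, axis=0).sum()

# Small example matrix with a single nonzero column.
Z_example = np.array([[3.0, 0.0],
                      [4.0, 0.0]])
values = (l1_norm(Z_example), nuclear_norm(Z_example), l21_norm(Z_example))
```

The $ℓ_{1}$ norm penalizes every nonzero entry, the nuclear norm penalizes rank, and the $ℓ_{2,1}$ norm penalizes whole columns, which is why it is commonly paired with sample-specific corruption in E.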

However, these methods typically focus on a single representation of the data, either global or local. In reality, data usually exhibit a complex structure and are accompanied by various noise factors.³³ A single representation not only loses effective feature information but also makes these methods highly dependent on the data, potentially leading to overfitting³⁴ or out-of-sample problems³⁵ in practical applications. Additionally, supervised information can significantly improve clustering accuracy. However, since clustering results are sensitive to the quality of a priori supervised information, inaccurate supervised information often degrades clustering results. Furthermore, how to obtain feasible and effective supervised information from unsupervised datasets remains an open question. To address these issues, we propose a new subspace clustering method, named Sample-Dependent Subspace Clustering with Elastic Structure Consistency Constraints (SDSC). SDSC considers both the global geometric distribution and the local neighborhood relationships of the data and uses this structural information to uncover the complex associations among the data. Additionally, inspired by spectral graph theory, useful cluster assignment information is obtained directly from the samples, improving the accuracy of assigning samples to the correct clusters and corresponding subspaces. The joint regularization term enables SDSC to optimize the data representation and the label information together, so that they reinforce each other and ultimately achieve high-quality clustering performance. The following section describes the specific steps and implementation details of the proposed method.

3. Proposed method

3.1 Elastic structure consistency constraints

Subspace clustering methods are used to identify highly correlated samples within the same subspace, under the assumption that data samples are self-expressive. Essentially, any data sample in a clean dataset can be reconstructed as a linear combination of the other data samples, provided that there is no error or that the error is negligible. Consequently, $X \approx X Z$ , where Z represents the reconstruction coefficient matrix. Let $x_{i}$ and $x_{j}$ denote the i-th and j-th samples of the dataset X, and let $z_{i}$ and $z_{j}$ denote the coefficient representations corresponding to $x_{i}$ and $x_{j}$ , respectively. Motivated by prior work,^14,36 SDSC assumes that there is a relational transitivity between samples and coefficient representations, termed Structure Consistency. This implies that the relationship between any two coefficient representations remains consistent with the relationship between their corresponding samples. This can be succinctly expressed as:

\begin{aligned} ‖ x_{i} - x_{j} ‖_{F}^{2} \to 0 \Rightarrow ‖ z_{i} - z_{j} ‖_{F}^{2} \to 0 \end{aligned}

(3)

It is widely acknowledged that manifold learning^37,38 plays a pivotal role in revealing the local nearest neighbor structure among samples. By capturing the nonlinear relationships within the samples and maintaining the local geometric structure, manifold learning can more accurately reflect their intrinsic properties. Normally, the concept of the nearest neighbor is employed in manifold learning to emphasize the local relationships of the samples, as illustrated below.

\begin{aligned} w_{i j} = \left\{ \begin{array}{ll} \exp (- ‖ x_{i} - x_{j} ‖_{2}^{2} / 2 t^{2}), & x_{j} \in N_{k} (x_{i}) \\ 0, & \text{otherwise} \end{array} \right. \end{aligned}

(4)

where $N_{k} (x_{i})$ denotes the set of nearest neighbors of $x_{i}$ . By relational transitivity, it follows that if $x_{j}$ is one of the nearest neighbors of $x_{i}$ , then the same nearest neighbor relationship exists between $z_{i}$ and $z_{j}$ , which can be described as:

\begin{aligned} x_{j} \in N_{k} (x_{i}) \Rightarrow z_{j} \in N_{k} (z_{i}) \end{aligned}

(5)

To obtain a good local nearest-neighbor representation, we provide the following definition:

\begin{aligned} \min \sum_{i, j} ‖ z_{i} - z_{j} ‖_{F}^{2} w_{i j} = \min \operatorname{tr} (Z^{T} D Z - Z^{T} W Z) = \min \operatorname{tr} (Z^{T} L Z) \end{aligned}

(6)

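where $D = \operatorname{diag} (\operatorname{sum} (W))$ is a diagonal matrix and $L = D - W$ is a Laplacian matrix. Minimizing (6) encourages samples that are local nearest neighbors to have similar coefficient representations, which benefits the learning of the local neighborhood structure.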
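The construction in (4) and the trace identity in (6) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: samples are stored as columns of X, the neighbor count `k` and kernel width `t` are free parameters, W is symmetrized with an element-wise maximum (an assumption, since (4) alone is not symmetric), and $z_{i}$ is taken to be the i-th row of Z. With a symmetric W the weighted sum equals $2 \operatorname{tr} (Z^{T} L Z)$; the constant factor 2 does not change the minimizer.

```python
import numpy as np

def knn_heat_graph(X, k=5, t=1.0):
    """Heat-kernel kNN weights following (4); one sample per column of X."""
    n = X.shape[1]
    # Pairwise squared Euclidean distances between the columns of X.
    sq = np.sum(X**2, axis=0)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(d2[i])
        nbrs = [j for j in order if j != i][:k]  # k nearest samples, excluding i
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * t**2))
    # Assumption: symmetrize so that the graph Laplacian is symmetric.
    return np.maximum(W, W.T)

def graph_laplacian(W):
    """L = D - W with D = diag(row sums of W)."""
    return np.diag(W.sum(axis=1)) - W

def local_smoothness(Z, W):
    """Weighted sum in (6): sum_ij w_ij * ||z_i - z_j||^2, z_i = i-th row of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sum(W * np.sum(diff**2, axis=2))
```

For a symmetric W, `local_smoothness(Z, W)` agrees with `2 * np.trace(Z.T @ graph_laplacian(W) @ Z)` up to floating-point error, which is the identity that lets the pairwise penalty be written as a single trace term.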

Although local neighborhood relationships play an important role in determining whether two samples share similar features and assigning them to a subspace, relying on local structure alone is not a reliable approach. This is because data representations based on local relationships are sensitive to outliers and noise, and a single representation does not capture the full range of data associations. By considering both local and global structures, it is possible to obtain a comprehensive understanding of the data, enhancing both intra-class similarity and inter-class separability. For a more comprehensive global linear description, we give the following derivation:

For the reconstruction coefficient matrix $Z = [Z_{1}, Z_{2}, \dots, Z_{k}] \in ℜ^{n \times n}$ , its covariance matrix $S$ ³⁹ is

\begin{aligned} S & = \frac{1}{n} \sum_{i = 1}^{n} (z_{i} - \bar{z}) (z_{i} - \bar{z})^{T} \\ & = \frac{1}{n} (z_{1} - \bar{z}, z_{2} - \bar{z}, \dots, z_{n} - \bar{z}) (z_{1} - \bar{z}, z_{2} - \bar{z}, \dots, z_{n} - \bar{z})^{T} \end{aligned}

(7)

To simplify, we further deduce that

\begin{aligned} (z_{1} - \bar{z}, z_{2} - \bar{z}, \dots, z_{n} - \bar{z}) & = (z_{1}, z_{2}, \dots, z_{n}) - (\bar{z}, \bar{z}, \dots, \bar{z}) = Z^{T} - \bar{z} (1, 1, \dots, 1) \\ & = Z^{T} - \frac{1}{n} Z^{T} 1_{n} 1_{n}^{T} = Z^{T} (I - \frac{1}{n} 1_{n} 1_{n}^{T}) \end{aligned}

(8)

where $1_{n}$ is a column vector containing all ones. Thus, (7) can be rewritten as

\begin{aligned} S = Z^{T} (I - \frac{1}{n} 1_{n} 1_{n}^{T}) {(I - \frac{1}{n} 1_{n} 1_{n}^{T})}^{T} Z \end{aligned}

(9)

where $M = I - \frac{1}{n} 1_{n} 1_{n}^{T}$ is the centering matrix, which has the following properties:

Theorem 1

The centering matrix M is idempotent and symmetric.

Proof

Firstly, we prove that the matrix M is symmetric, i.e., $M^{T} = M$ .

Since I and $1_{n} 1_{n}^{T}$ are both symmetric matrices, we have

\begin{aligned} M^{T} = I^{T} - (1 / n 1_{n} 1_{n}^{T})^{T} = I - 1 / n 1_{n} 1_{n}^{T} = M \end{aligned}

(10)

Therefore, the matrix M is symmetric.

Secondly, we prove that the matrix M is idempotent, i.e., $M^{2} = M$ .

For $M^{2}$ , there is

\begin{aligned} M^{2} & = (I - \frac{1}{n} 1_{n} 1_{n}^{T}) (I - \frac{1}{n} 1_{n} 1_{n}^{T}) \\ & = I^{2} - I (\frac{1}{n} 1_{n} 1_{n}^{T}) - (\frac{1}{n} 1_{n} 1_{n}^{T}) I + (\frac{1}{n} 1_{n} 1_{n}^{T}) (\frac{1}{n} 1_{n} 1_{n}^{T}) \\ & = I - \frac{1}{n} 1_{n} 1_{n}^{T} - \frac{1}{n} 1_{n} 1_{n}^{T} + \frac{1}{n^{2}} (1_{n} 1_{n}^{T}) (1_{n} 1_{n}^{T}) \end{aligned}

(11)

Notice that $1_{n} 1_{n}^{T}$ is the all-ones matrix. Since $1_{n}^{T} 1_{n} = n$ , its square is the all-ones matrix scaled by n, i.e., $(1_{n} 1_{n}^{T}) (1_{n} 1_{n}^{T}) = 1_{n} (1_{n}^{T} 1_{n}) 1_{n}^{T} = n (1_{n} 1_{n}^{T})$ .

Thus,

\begin{aligned} M^{2} = I - \frac{1}{n} 1_{n} 1_{n}^{T} - \frac{1}{n} 1_{n} 1_{n}^{T} + \frac{n}{n^{2}} 1_{n} 1_{n}^{T} = I - \frac{1}{n} 1_{n} 1_{n}^{T} = M \end{aligned}

(12)

Therefore, the matrix M is idempotent.

According to Theorem 1, (9) can be expressed as

\begin{aligned} S = Z^{T} M Z \end{aligned}

(13)
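Theorem 1 and the matrix form of the covariance can be checked numerically with a short NumPy sketch. It assumes, as in (8), that the $z_{i}$ are the rows of Z, and it keeps the constant factor $1/n$ from (7) explicit; this factor does not affect the minimization that follows. The variable names are illustrative.

```python
import numpy as np

def centering_matrix(n):
    """M = I - (1/n) 1 1^T."""
    return np.eye(n) - np.ones((n, n)) / n

rng = np.random.default_rng(0)
n = 6
Z = rng.standard_normal((n, n))
M = centering_matrix(n)

# Properties stated in Theorem 1.
symmetric = np.allclose(M, M.T)
idempotent = np.allclose(M @ M, M)

# Covariance of the rows z_i of Z, written out term by term as in (7).
z_bar = Z.mean(axis=0)
S_direct = sum(np.outer(z - z_bar, z - z_bar) for z in Z) / n

# The same quantity in matrix form, using M^T M = M.
S_matrix = Z.T @ M @ Z / n
```

Both flags evaluate to true and the two covariance expressions agree to floating-point precision, which is the content of (9) to (13).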

Redefining $M \leftarrow - M = (1 / n) 1_{n} 1_{n}^{T} - I$ , we minimize the following problem to obtain a global structural representation of the samples:

\begin{aligned} \min \operatorname{tr} (Z^{T} M Z) \end{aligned}

(14)

To maintain the elasticity between global and local structures and to explore the intrinsic associations of the samples deeply and flexibly, a trade-off parameter is used to unite (6) and (14):

\begin{aligned} \min \operatorname{tr} ((1 - τ) Z^{T} L Z + τ Z^{T} M Z) \Leftrightarrow \min \operatorname{tr} (Z^{T} ((1 - τ) L + τ M) Z) \end{aligned}

(15)

where $τ \in [0, 1]$ is the trade-off parameter. Integrating (15) into the subspace clustering framework, we obtain

\begin{aligned} \min_{Z, E} β \operatorname{tr} (Z^{T} ((1 - τ) L + τ M) Z) + ζ Ψ (E) \quad \text{s.t.} \quad X = X Z + E \end{aligned}

(16)
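The elastic combination in (15) amounts to a convex mix of two n-by-n matrices. The sketch below is a minimal illustration that takes a precomputed graph Laplacian L and uses the sign-flipped matrix $M = (1/n) 1_{n} 1_{n}^{T} - I$ from the text; the function names are hypothetical.

```python
import numpy as np

def elastic_structure_matrix(L, tau):
    """(1 - tau) * L + tau * M, with M = (1/n) 1 1^T - I as redefined in the text."""
    n = L.shape[0]
    M = np.ones((n, n)) / n - np.eye(n)
    return (1.0 - tau) * L + tau * M

def escc_penalty(Z, L, tau):
    """Structure term tr(Z^T ((1 - tau) L + tau M) Z) appearing in (15) and (16)."""
    return np.trace(Z.T @ elastic_structure_matrix(L, tau) @ Z)
```

Setting `tau = 0` recovers the purely local term (6) and `tau = 1` the purely global term (14); intermediate values interpolate linearly between them.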

3.2 Complete objective function

Existing studies^40,41 have shown that even small amounts of supervised information can significantly improve the performance of clustering methods. While subspace clustering methods are often used in unsupervised scenarios, the challenge lies in how to obtain trustworthy labeling information in such contexts. Moreover, clustering performance is sensitive to the quality of supervised information; accurate supervised information guides the data to the correct subspace, while poor quality information leads to the wrong subspace. Because of this, we propose a new function that learns clustering information directly from samples. Furthermore, this function smoothly integrates the affinity matrix of the samples to enhance the confidence in H, as follows:

\begin{aligned} Φ (H, Z) = \operatorname{tr} (H^{T} T H) + \underbrace{\frac{α}{2} ‖ H H^{T} - Z ‖_{F}^{2}}_{\text{joint regularization term}} \end{aligned}

(17)

where $H \in ℜ^{n \times c}$ is the label matrix, $T = I - \overset{⌢}{Z}$ is the normalized Laplacian matrix, and $\overset{⌢}{Z} = D^{- 1 / 2} Z D^{- 1 / 2}$ or $D^{- 1} Z$ is the normalized structural matrix, with $D = \operatorname{diag} (\operatorname{sum} (Z))$ . Thanks to the function $Φ (H, Z)$ , SDSC can obtain the clustering information of the data samples directly, without requiring input labels in advance, which improves the accuracy of the clustering results. Meanwhile, the joint regularization term ensures that H and Z interact with each other during optimization, gradually improving their quality until convergence. Compared with the traditional strategy of optimizing them separately, this joint optimization mechanism better captures the complete structure of the data. Combining (17) with (16), the final objective function is as follows:

\begin{aligned} \min_{Z, H, E} \operatorname{tr} (H^{T} T H) + \frac{α}{2} ‖ H H^{T} - Z ‖_{F}^{2} + β \operatorname{tr} (Z^{T} ((1 - τ) L + τ M) Z) + ζ ‖ E ‖_{2, 1} \\ \text{s.t.} \quad X = X Z + E, \; H^{T} H = I, \; \operatorname{diag} (Z) = 0, \; Z_{i, j} \geq 0, \; Z 1 = 1 \end{aligned}

(18)

where $H^{T} H = I$ is the orthogonality constraint, which ensures that the columns of H are mutually orthogonal and improves their robustness and stability, and $Z 1 = 1$ is the normalization constraint (each row of Z sums to one), which helps maintain the integrity of the low-rank approximation.

3.2 Optimization

In this section, we use the Alternating Direction Method of Multipliers (ADMM) to find the solution to the objective function. We introduce a variable G and let $G = Z$ , then the augmented Lagrangian function of (18) is

\begin{aligned} \min_{Z, H, E, G, R_{1}, R_{2}, σ} \operatorname{tr} (H^{T} (I - D^{- 1 / 2} Z D^{- 1 / 2}) H) + \frac{α}{2} ‖ H H^{T} - G ‖_{F}^{2} + β \operatorname{tr} (Z^{T} \bar{L} Z) + ζ ‖ E ‖_{2, 1} \\ + \frac{λ}{2} ({‖ X - X Z - E + \frac{R_{1}}{λ} ‖}_{F}^{2} + {‖ Z - G + \frac{R_{2}}{λ} ‖}_{F}^{2}) - σ^{T} (G 1 - 1) \\ \text{s.t.} \quad H^{T} H = I, \; \operatorname{diag} (G) = 0, \; G_{i, j} \geq 0 \end{aligned}

(19)

where $\bar{L} = (1 - τ) L + τ M$ , $R_{1}$ , $R_{2}$ and $σ$ are Lagrange multipliers, and $λ > 0$ is the penalty parameter. Next, the variables in (19) are updated sequentially during the iteration process.

1. Update H with the other variables fixed

\begin{aligned} \min_{H} \operatorname{tr} (H^{T} (I - D^{- 1 / 2} Z D^{- 1 / 2}) H) + \frac{α}{2} ‖ H H^{T} - G ‖_{F}^{2} \quad \text{s.t.} \quad H^{T} H = I \end{aligned}

(20)

Further expansion for $‖ H H^{T} - G ‖_{F}^{2}$ leads to

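\begin{aligned} \min ‖ H H^{T} - G ‖_{F}^{2} = \min \operatorname{tr} (G^{T} G - H H^{T} G - G^{T} H H^{T} + H H^{T} H H^{T}) \end{aligned}

(21)

Since $H^{T} H = I$ , the term $\operatorname{tr} (H H^{T} H H^{T}) = \operatorname{tr} (H^{T} H)$ is constant, and $\operatorname{tr} (G^{T} G)$ does not depend on H, so

\begin{aligned} \min ‖ H H^{T} - G ‖_{F}^{2} = \min \operatorname{tr} (- H H^{T} G - G^{T} H H^{T}) \\ \Leftrightarrow \min - 2 \operatorname{tr} (H^{T} G H) \end{aligned}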

(22)

Substituting (22) into (20), we can get

\begin{aligned} \min_{H} \operatorname{tr} (H^{T} (I - D^{- 1 / 2} Z D^{- 1 / 2} - α G) H) \quad \text{s.t.} \quad H^{T} H = I \end{aligned}

(23)

For (23), the solution for H consists of the eigenvectors corresponding to the k smallest eigenvalues of the matrix $(I - D^{- 1 / 2} Z D^{- 1 / 2} - α G)$ .
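The H update can be sketched with a symmetric eigendecomposition. This is a minimal sketch under several assumptions that the text does not fix: D is built from the row sums of Z, a small floor `eps` guards against zero degrees, and the matrix is symmetrized before calling `np.linalg.eigh`. Symmetrization leaves the trace objective unchanged, because $\operatorname{tr} (H^{T} A H) = \operatorname{tr} (H^{T} A^{T} H)$ for any square A.

```python
import numpy as np

def update_H(Z, G, alpha, k, eps=1e-12):
    """Solve (23): eigenvectors of the k smallest eigenvalues, so that H^T H = I."""
    n = Z.shape[0]
    d = Z.sum(axis=1)                                # assumption: row-sum degrees
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, eps))   # eps is a hypothetical safeguard
    Z_norm = d_inv_sqrt[:, None] * Z * d_inv_sqrt[None, :]
    A = np.eye(n) - Z_norm - alpha * G
    A = (A + A.T) / 2.0        # symmetrize; the trace objective is unaffected
    vals, vecs = np.linalg.eigh(A)                   # eigenvalues in ascending order
    return vecs[:, :k]
```

Because `np.linalg.eigh` returns orthonormal eigenvectors sorted by ascending eigenvalue, the first k columns satisfy the orthogonality constraint by construction and attain the minimum of the trace objective.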

2. Update G with the other variables fixed

\begin{aligned} \min_{G} \frac{α}{2} ‖ G - H H^{T} ‖_{F}^{2} + \frac{λ}{2} {‖ G - Z - \frac{R_{2}}{λ} ‖}_{F}^{2} - σ^{T} (G 1 - 1) \\ \text{s.t.} \quad \operatorname{diag} (G) = 0, \; G_{i, j} \geq 0 \end{aligned}

(24)

Let $C_{1} = α H H^{T}$ , $C_{2} = Z + \frac{R_{2}}{λ}$ , $C_{3} = \frac{C_{1} + λ C_{2}}{α + λ}$ , and $δ = α + λ$ . Completing the square in G, (24) can be written, up to an additive constant, as

\begin{aligned} \min_{G} \frac{δ}{2} ‖ G - C_{3} ‖_{F}^{2} - σ^{T} (G 1 - 1) \\ \text{s.t.} \quad \operatorname{diag} (G) = 0, \; G_{i, j} \geq 0 \end{aligned}

(25)

Following the approach reported in the literature,²⁷ the optimal solution of G is:

\begin{aligned} G_{i, j} = \left\{ \begin{array}{ll} {(C_{3})}_{i j} + \frac{σ_{i}}{δ}, & i \neq j \\ 0, & \text{otherwise} \end{array} \right. \end{aligned}

(26)

To satisfy the constraint $G_{i, j} \geq 0$ , we then set $G = \max (G, 0)$ . Furthermore, the Lagrangian multiplier $σ$ is updated as follows:

\begin{aligned} σ_{i} = δ (1 - \sum_{j = 1, j \neq i}^{n} {(C_{3})}_{i j}) / (n - 1) \end{aligned}

(27)
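Equations (26) and (27) translate into a few vectorized lines. The sketch below takes an already formed $C_{3}$ and $δ$ as inputs and is only an illustration of the two formulas; the function name is hypothetical.

```python
import numpy as np

def update_G(C3, delta):
    """Closed-form G update: sigma from (27), G from (26), then G = max(G, 0)."""
    n = C3.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    # Row-wise sum of the off-diagonal entries of C3.
    row_sums = np.where(off_diag, C3, 0.0).sum(axis=1)
    sigma = delta * (1.0 - row_sums) / (n - 1)                  # (27)
    G = np.where(off_diag, C3 + sigma[:, None] / delta, 0.0)    # (26), zero diagonal
    return np.maximum(G, 0.0), sigma                            # enforce G_ij >= 0
```

Before the final clipping step every row of G sums exactly to one, which is how (27) is obtained from the constraint $G 1 = 1$. When clipping removes negative entries the row sums can exceed one, so the sum-to-one property holds only approximately in that case.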

3. Update Z with the other variables fixed

\begin{aligned} \min_{Z} \operatorname{tr} (H^{T} (I - D^{- 1 / 2} Z D^{- 1 / 2}) H) + β \operatorname{tr} (Z^{T} \bar{L} Z) + \frac{λ}{2} ({‖ X - X Z - E + \frac{R_{1}}{λ} ‖}_{F}^{2} + {‖ Z - G + \frac{R_{2}}{λ} ‖}_{F}^{2}) \end{aligned}

(28)

For simplicity, let $C_{4} = X - E + \frac{R_{1}}{λ}, C_{5} = G - \frac{R_{2}}{λ}$ . So, (28) can be rewritten as

\begin{aligned} \min_{Z} \operatorname{tr} (H^{T} (I - D^{- 1 / 2} Z D^{- 1 / 2}) H) + β \operatorname{tr} (Z^{T} \bar{L} Z) + \frac{λ}{2} (‖ C_{4} - X Z ‖_{F}^{2} + ‖ Z - C_{5} ‖_{F}^{2}) \end{aligned}

(29)

Taking the partial derivative of (29) with respect to Z and setting it to zero results in

\begin{aligned} Z = (2 β \bar{L} + λ X^{T} X + λ I)^{- 1} (λ X^{T} C_{4} + λ C_{5} + D^{- 1}) \end{aligned}

(30)

4. Update E with the other variables fixed

\begin{aligned} \min_{E} \frac{λ}{2} {‖ X - X Z - E + \frac{R_{1}}{λ} ‖}_{F}^{2} + ζ ‖ E ‖_{2, 1} \end{aligned}

(31)

Similarly, we let $C_{6} = X - X Z + \frac{R_{1}}{λ}$ , and (31) is equivalent to

\begin{aligned} \frac{1}{2} ‖ E - C_{6} ‖_{F}^{2} + \frac{ζ}{λ} ‖ E ‖_{2, 1} \end{aligned}

(32)

Define $e_{i}$ and $(C_{6})_{i}$ as the i-th columns of E and $C_{6}$ , respectively. Then (32) can be solved by

\begin{aligned} e_{i} = \left\{ \begin{array}{ll} (1 - \frac{ζ / λ}{{‖ {(C_{6})}_{i} ‖}_{F}}) {(C_{6})}_{i}, & {‖ {(C_{6})}_{i} ‖}_{F} > \frac{ζ}{λ} \\ 0, & \text{otherwise} \end{array} \right. \end{aligned}

(33)
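The column-wise rule in (33) is the standard shrinkage operator for the $ℓ_{2,1}$ norm. A minimal NumPy sketch, with `thresh` standing for $ζ / λ$ and a hypothetical function name:

```python
import numpy as np

def l21_shrink(C6, thresh):
    """Column-wise shrinkage (33): scale down columns whose norm exceeds thresh."""
    norms = np.linalg.norm(C6, axis=0)
    scale = np.zeros_like(norms)
    keep = norms > thresh
    # Columns at or below the threshold are set to zero; the rest are shrunk.
    scale[keep] = 1.0 - thresh / norms[keep]
    return C6 * scale[None, :]
```

For example, a column $(3, 4)^{T}$ with norm 5 and `thresh = 1` is scaled by 0.8, while a column with norm 0.1 is set to zero. Columns of the residual with small norm are therefore treated as clean and only large residual columns are kept as error.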

5. Update the remaining variables

\begin{aligned} R_{1} = R_{1} + λ (X - X Z - E) \\ R_{2} = R_{2} + λ (Z - G) \\ λ = min (λ_{max}, ρ λ) \end{aligned}

(34)

where $λ_{max}$ and $ρ$ are the scalars required for ADMM. The detailed pseudocode is presented in Algorithm 1.
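The multiplier and penalty updates in (34) can be written as one small function. The default values of `rho` and `lam_max` below are placeholders chosen for illustration, not settings reported in the paper.

```python
import numpy as np

def update_multipliers(X, Z, E, G, R1, R2, lam, rho=1.1, lam_max=1e6):
    """One pass of (34): dual ascent on R1 and R2, then growth of the penalty."""
    R1 = R1 + lam * (X - X @ Z - E)   # residual of the constraint X = XZ + E
    R2 = R2 + lam * (Z - G)           # residual of the constraint Z = G
    lam = min(lam_max, rho * lam)     # capped geometric increase of the penalty
    return R1, R2, lam
```

When both constraints hold exactly the multipliers stay unchanged and only the penalty grows, up to the cap `lam_max`.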

3.3 Complexity analysis

In this section, we analyze the computational complexity of the proposed method. Let d be the dimension of the samples and n the number of samples. According to Algorithm 1, the main cost of SDSC comes from updating the four variables H, G, Z, and E, whose complexities are $O (n^{3})$ , $O (n^{2})$ , $O (n^{3})$ and $O (d)$ , respectively. The inverse $(2 β \bar{L} + λ X^{T} X + λ I)^{- 1}$ required when solving for Z can be pre-computed and reused in all iteration steps. Therefore, the time complexity of SDSC is $O (ι (n^{3} + n^{2} + d))$ , where $ι$ denotes the number of iterations required for convergence.

3.4 Convergence analysis

The augmented Lagrangian of the objective function is a non-convex optimization problem, so strict global convergence is not guaranteed. However, SDSC can converge to a stationary point at which all Karush-Kuhn-Tucker (KKT) conditions are satisfied, albeit with relatively weak local convergence. For a detailed proof of model convergence, refer to Appendix I.

4. Experiment

In this section, we conduct experiments on seven real-world datasets and six synthetic datasets to evaluate the performance of the proposed SDSC. The following section presents a detailed description of the methodology used.

4.1 Datasets

The seven real datasets comprise five face datasets (GT, ORL, YALE, UMIST, Terravic thermal), the Coil20 object image dataset, and the Semeion handwritten digit recognition dataset. Details of these datasets are as follows:

The GT dataset contains images of 50 individuals, captured under varying conditions such as time of day, lighting, and facial expressions. Each individual has 15 color images in JPEG format.

The ORL dataset includes 10 different greyscale images of each of 40 individuals, taken at different times and with varying facial details.

The YALE dataset comprises 165 greyscale images in GIF format from 15 individuals, with 11 images per subject.

The UMIST dataset consists of 564 images of 20 individuals, encompassing a wide range of poses from side to front.

The Terravic thermal image dataset comprises thermal facial images of 20 individuals, captured from various angles.

The Coil20 dataset captures images at 5-degree intervals, totaling 1440 images of 20 objects, with 72 images per object.

The Semeion handwritten digit dataset consists of 1593 examples of digits written by approximately 80 individuals. Each digit is stretched into a 16 × 16 rectangular box with 256 grayscale values.

The details of the six synthetic datasets are provided in Table 2. The symbols N, D, and C represent the number of samples, the dimensionality of the dataset, and the number of categories in the dataset, respectively. For simplicity, the symbol “Mark” denotes the dataset name abbreviation.

Table 2.
Parameters of the synthetic datasets.

Synthetic datasets	Cigar	Half rings	Circle and 3gaussians	Petals	Enclosure	Worms
N	500	500	500	500	622	500
D	2	2	2	2	2	2
C	4	2	4	4	3	4
Mark	SYN-1	SYN-2	SYN-3	SYN-4	SYN-5	SYN-6

4.2 Compared methods

The comparison methods used in the experiment are listed below:

Spectral Clustering (SC)⁴²: This widely used method clusters data by constructing a graph based on the similarity between data points and using the spectrum of the graph (eigenvalues and eigenvectors) to group the data points.

Low-Rank Representation (LRR)¹⁴: Obtains a low-rank approximation matrix using the self-representation of the data for classification purposes.

Kernel Diagonal Block Representation (KBDR)⁴³: Directly tracks a representation matrix with a block-diagonal structure in kernel Hilbert space, where the number of diagonal blocks corresponds to the subspace of the data.

Structure-Aware Subspace Clustering (SASC)²³: Captures the intrinsic structure of the data, including both global and local structures, with lower computational cost and model complexity than self-expressive subspace methods.

Adaptive-order proximity Learning for Graph-based Clustering (AOPL)⁴⁴: A graph-based clustering method that introduces higher-order proximity into structured proximity matrix learning to derive consistent structured proximity matrices from multi-order proximity.

Structure-Preserving Projection Learning via Low-Rank Embedding (SPPLLRE)⁴⁵: Aims to find low-rank projective representations with high energy. The model's ability to preserve local manifolds is enhanced by introducing graph smoothing of the representation matrix.

Global and Local Structure Preserving Non-Negative Subspace Clustering (NSC)²²: A non-negative subspace clustering method that preserves global and local structures and kernelizes the space to improve its ability to handle non-linear data structures.

4.3 Clustering metric

The clustering performance of all methods in the experiment was evaluated using two commonly used metrics: accuracy (ACC) and normalized mutual information (NMI). ACC represents the proportion of correctly classified samples out of all samples, while NMI measures the consistency of the cluster labels with the true class labels. For both metrics, higher values indicate better clustering performance. All clustering results in the experiment are reported as percentages.
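Because cluster labels are arbitrary, ACC is usually computed after matching predicted clusters to true classes one-to-one. The sketch below illustrates both metrics under assumptions the text does not specify: the matching is found by brute force over permutations, which is only practical for a handful of clusters (an assignment solver such as `scipy.optimize.linear_sum_assignment` is the usual choice for larger problems), and NMI is normalized by the geometric mean of the two entropies, which is one of several common conventions.

```python
import itertools
import numpy as np

def clustering_accuracy(y_true, y_pred):
    """ACC under the best one-to-one matching of clusters to classes."""
    y_true, y_pred = np.asarray(y_true).ravel(), np.asarray(y_pred).ravel()
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    m = max(len(classes), len(clusters))
    # counts[i, j] = number of samples in cluster i whose true class is j.
    counts = np.zeros((m, m), dtype=int)
    for i, c in enumerate(clusters):
        for j, k in enumerate(classes):
            counts[i, j] = np.sum((y_pred == c) & (y_true == k))
    best = max(sum(counts[i, p[i]] for i in range(m))
               for p in itertools.permutations(range(m)))
    return best / y_true.size

def normalized_mutual_info(y_true, y_pred):
    """NMI = I(U; V) / sqrt(H(U) H(V)); the normalization is an assumption."""
    y_true, y_pred = np.asarray(y_true).ravel(), np.asarray(y_pred).ravel()
    _, ti = np.unique(y_true, return_inverse=True)
    _, pi = np.unique(y_pred, return_inverse=True)
    joint = np.zeros((ti.max() + 1, pi.max() + 1))
    np.add.at(joint, (ti, pi), 1.0)
    joint /= y_true.size
    pt, pp = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pt, pp)[nz]))
    ht, hp = -np.sum(pt * np.log(pt)), -np.sum(pp * np.log(pp))
    denom = np.sqrt(ht * hp)
    return mi / denom if denom > 0 else 0.0
```

Both functions return values in [0, 1]; multiplying by 100 gives the percentages used in the reported results.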

4.4 Parameter setting and robustness analysis

Based on the trends observed in Figures 1 and 2, the following conclusions can be drawn: (1) SDSC is extremely robust to the parameter ζ, with different values having an almost negligible effect on its final clustering performance; (2) The choice of parameters α and β affects the final performance of SDSC. The figures show that, when the other parameters are fixed, the clustering performance of SDSC fluctuates with different values of these parameters. Notably, for the ORL dataset, the clustering performance of SDSC improves significantly as α increases within the range $α < 1$ ; (3) Additionally, under the same conditions the clustering scores of SDSC increase steadily with the parameter values until they stabilize, especially when α changes from 1 to 50. This regularization parameter controls the SDSC method, enabling it to capture both global and local structural information of the data. This trend demonstrates that global and local information can be used in conjunction, allowing SDSC to obtain a comprehensive and accurate description of the data distribution. Furthermore, SDSC can exploit feature information hidden within local manifolds. These conclusions indicate that β and ζ only need coarse tuning, whereas α requires fine tuning. To verify this conclusion, additional experiments were conducted on the ORL dataset. With $ζ = 0.01$ fixed, the parameter α is varied over a fixed range of values, and β is chosen from $[1, 10 : 10 : 100]$ . The obtained ACC and NMI values are shown in Figure 3.

Figure 1.

Different α and ζ corresponding to ACC on the Coil20 dataset when β is fixed.

Figure 2.

Different α and ζ corresponding to ACC on the ORL dataset when β is fixed.

Figure 3.

Two clustering scores were obtained on the ORL dataset with fixed ζ.

The combination of Figures 2 and 3 demonstrates that SDSC is highly sensitive to the parameter α. This sensitivity arises primarily because α controls the joint optimization term of the labels and the representation matrix, which directly determines the ability to obtain high-quality label information from the multivariate data structure representation. Furthermore, the results indicate that our method can directly obtain useful label information from the sample data and that this information positively influences the clustering results. This evidence supports the feasibility and effectiveness of our proposed method.

4.5 Initialization settings for Z and H

The initialization of Z and H is crucial for SDSC's performance. Z represents the self-expressive relationships among the samples, whereas H denotes the assignment of the samples to the clusters. The closer the initial setting is to the true value, the more beneficial it is for SDSC. A common initialization for Z is to use a KNN graph as the original input. However, the performance of KNN depends heavily on the choice of the number of nearest neighbors k, and such a preset value is ill-suited to unknown data. It should be noted that the self-expression coefficient of any sample with itself must equal one. Consequently, the identity matrix is employed to initialize Z. For H, using the H derived from K-means as an initial approximation appears to be a viable approach. Nevertheless, it suffers from an issue similar to that of KNN: the results are inherently unpredictable and may not consistently yield a high-quality H. In light of this, we employ the maximum minimum distance criterion to select k objects from the original samples. The clustering scores obtained with different initialization strategies on the GT and Terravic datasets are presented in Figure 4. In the experiment, k is set to 5 for K-nearest neighbor (KNN), and k in K-means is set to the number of categories in the dataset. From Figure 4, it can be observed that the best performance in terms of both ACC and NMI is achieved when SDSC is initialized using an identity matrix and the maximum minimum distance. This demonstrates that the initialization approach employed in SDSC for Z and H is both feasible and efficient.
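The maximum minimum distance selection mentioned above is commonly implemented as a farthest-first traversal. The following is a minimal sketch under that assumption: it starts from an arbitrary seed sample (index 0 here, a choice the text does not specify) and repeatedly adds the sample whose distance to its nearest already-chosen sample is largest. How the selected objects are then turned into the initial H is not described in this section and is not shown. The function name is hypothetical.

```python
import math


def max_min_select(points, k, first=0):
    """Pick k sample indices by farthest-first (max-min distance) traversal."""
    chosen = [first]
    # min_dist[i] holds the distance from sample i to its nearest chosen sample.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        # The next pick maximizes the minimum distance to the chosen set.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


# Usage: five 2-D samples forming three loose groups.
samples = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.0, 0.2), (5.0, 8.0)]
seeds = max_min_select(samples, 3)
```

Unlike K-means, this selection is deterministic once the seed sample is fixed, which fits the motivation given above for avoiding unpredictable initial values.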

Figure 4.

Clustering scores for different initialization methods for Z and H on two real datasets. (KNN means Z initialization via KNN, K-means means H initialization via K-means, both means both methods are used).

4.6 Experimental results and analyses on real and synthetic datasets

For the experiments, we randomly select N images from each class for testing. This random selection is repeated ten times, and the final clustering result is the average over these ten runs. The average scores and medians of our method and the comparison methods on the seven real datasets are presented in Tables 3–9. For the synthetic datasets, the visualization results of all methods are shown in Figures 5–10. By analyzing the final experimental results, the following conclusions can be drawn:

Overall, SDSC achieves optimal clustering performance on nearly all real and synthetic datasets. In particular, SDSC outperforms other subspace methods that focus solely on a single data structure. This superiority stems from SDSC's ability to elastically explore both the global geometric distribution and local nearest-neighbor relationships, uncovering intrinsic structural information from multiple perspectives, thus enhancing clustering performance. Unlike previous methods, SDSC relies solely on unsupervised samples to find high-quality cluster assignment information. This self-reliance is a key reason for SDSC's outstanding performance across all datasets. Furthermore, SDSC demonstrates robustness, as evidenced by the analysis of its clustering scores and their medians presented in the tables.

As evidenced by the experimental results presented in the tables, LRR exhibits suboptimal performance relative to the other compared methods because it employs low-rank constraints to obtain only a single global low-rank representation. In contrast, SASC, SPPLLRE, and NSC further exploit the local structure while incorporating global information, which contributes significantly to their superior clustering performance compared to LRR. Furthermore, both SPPLLRE and NSC address noise and redundancy to enhance the model's capability in handling nonlinear data. SPPLLRE utilizes dimensionality reduction projection, while NSC employs kernel mapping. Both techniques have distinct advantages and disadvantages in handling high-dimensional data. Kernel mapping addresses nonlinear problems by projecting the original data to a higher-dimensional space. However, selecting an appropriate kernel function and parameters requires experience; otherwise, the risk of overfitting or underfitting is significant. Dimensionality reduction projects the original space onto a lower-dimensional subspace, retaining sufficient valid information while potentially losing some. This explains why neither SPPLLRE nor NSC consistently outperforms the other.

The visualization results on the synthetic datasets demonstrate that SDSC is clearly superior to the other methods. The figures illustrate that the comparison methods are susceptible to being misled by nearest-neighbor samples from different subspaces, leading to erroneous clustering outcomes. In contrast, on the SYN-1, SYN-3, and SYN-5 datasets, SDSC exhibits a clear advantage and is not affected by this factor. This indicates that the comparison methods require improvement in terms of inter-class separability. Additionally, SPPLLRE performs significantly worse than the other methods on all synthetic datasets. This is because the synthetic datasets are only two-dimensional, and reducing their dimension further would result in a massive loss of useful information. KBDR, SASC, and AOPL, by contrast, are graph-based clustering methods, and all three exhibit excellent performance on complex real-world datasets. However, in the experiments on synthetic datasets, the performance of KBDR is considerably inferior to that of SASC and AOPL. This may be attributed to the curse of dimensionality in the mapped high-dimensional space, which renders KBDR ineffective. In addition, the high-dimensional space leads to a sparse data distribution, which affects KBDR's ability to accurately assess the relationships between data points.

Compared with SASC, SPPLLRE, and NSC, which also focus on global and local information, SDSC further considers the elastic balance between global and local information to address different application scenarios. The experimental results in the tables demonstrate that the ACC and NMI scores of SDSC are generally higher than those of the three compared methods. The visualization results in Figures 5 to 10 show that although the comparison methods assign most sample points to the correct subspaces, samples in neighboring subspaces still interfere with one another. In contrast, SDSC successfully completes the clustering task. On both the simpler low-dimensional synthetic datasets and the more complex real-world image datasets, SDSC shows excellent clustering performance, proving its feasibility.
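The evaluation protocol described at the start of this subsection (N random images per class, ten repetitions, mean and median reported) can be sketched as follows. This is only an illustration: `score_fn` is a hypothetical callback that would run a clustering method on the selected sample indices and return a score such as ACC in percent, and the seed handling is an assumption.

```python
import random
import statistics


def sample_per_class(labels, n_per_class, rng):
    """Randomly pick n_per_class sample indices from every class."""
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    picked = []
    for lab in sorted(by_class):
        picked.extend(rng.sample(by_class[lab], n_per_class))
    return picked


def evaluate(labels, n_per_class, score_fn, trials=10, seed=0):
    """Repeat the random selection and report (mean, median) of the scores."""
    rng = random.Random(seed)
    scores = [
        score_fn(sample_per_class(labels, n_per_class, rng))
        for _ in range(trials)
    ]
    return statistics.mean(scores), statistics.median(scores)
```

Reporting the median next to the mean, as the tables do, gives a quick check on whether a few unlucky random draws dominate the average.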

Figure 5.

Clustering results of SDSC and comparison methods on SYN-1.

Figure 6.

Clustering results of SDSC and comparison methods on SYN-2.

Figure 7.

Clustering results of SDSC and comparison methods on SYN-3.

Figure 8.

Clustering results of SDSC and comparison methods on SYN-4.

Figure 9.

Clustering results of SDSC and comparison methods on SYN-5.

Figure 10.

Clustering results of SDSC and comparison methods on SYN-6.

Table 3.

Clustering scores on the GT dataset.

	N = 8				N = 10				N = 12
Methods	ACC	Median	NMI	Median	ACC	Median	NMI	Median	ACC	Median	NMI	Median
SC	42.70	42.13	66.50	66.64	43.86	44.00	65.34	65.09	44.23	44.67	64.35	64.48
LRR	32.67	33.00	58.25	58.51	34.44	35.80	57.74	58.16	37.62	37.25	58.77	58.97
KBDR	46.03	45.88	67.92	68.00	43.72	43.10	65.29	65.12	45.90	45.75	65.25	65.03
SASC	41.80	41.63	64.79	64.88	44.32	43.60	64.83	64.28	44.82	44.50	64.15	64.33
AOPL	50.45	50.13	64.63	64.41	51.00	50.80	63.09	63.00	50.67	50.83	61.12	61.11
SPPLLRE	38.40	39.63	62.33	62.33	38.50	38.50	60.61	60.59	38.55	38.42	59.35	59.29
NSC	45.93	45.63	63.73	63.12	41.42	40.60	58.68	57.67	41.18	41.17	56.07	55.93
SDSC	50.70	50.25	70.62	70.53	51.58	51.70	69.51	69.38	52.77	52.75	69.16	69.67

Table 4.

Clustering scores on the Coil20 dataset.

	N = 5				N = 6				N = 7
Methods	ACC	Median	NMI	Median	ACC	Median	NMI	Median	ACC	Median	NMI	Median
SC	63.40	62.50	75.65	75.69	62.42	60.83	75.58	76.23	65.57	67.50	76.87	77.62
LRR	57.90	57.50	71.48	71.52	55.67	57.50	69.69	70.68	52.93	54.64	67.51	67.77
KBDR	65.00	65.00	77.28	77.39	64.17	63.33	76.96	77.02	67.07	66.79	78.88	78.85
SASC	68.10	68.50	79.45	79.69	68.58	68.75	79.00	79.13	68.29	67.50	78.88	79.24
AOPL	70.70	71.00	80.02	79.71	70.08	70.83	79.41	79.52	72.50	72.50	80.41	80.79
SPPLLRE	66.00	66.00	78.11	78.06	65.58	66.25	77.57	77.42	67.21	67.14	77.31	75.77
NSC	64.30	63.50	78.12	78.16	64.25	62.50	78.19	78.44	65.07	63.93	78.34	78.11
SDSC	71.90	72.00	81.25	81.27	70.58	70.00	79.89	80.60	73.14	72.86	80.79	80.65

Table 5.

Clustering scores on the Terravic dataset.

	N = 2				N = 3				N = 4
Methods	ACC	Median	NMI	Median	ACC	Median	NMI	Median	ACC	Median	NMI	Median
SC	51.83	51.67	78.73	78.55	44.44	44.44	70.78	70.87	42.83	43.33	67.36	67.43
LRR	48.83	48.33	75.93	75.84	48.22	48.89	72.12	72.09	43.50	44.58	67.47	67.67
KBDR	52.17	53.33	78.50	77.81	48.56	48.33	72.60	72.95	46.33	46.67	69.83	70.27
SASC	57.00	58.33	81.25	81.52	51.67	52.78	75.14	75.35	47.25	47.08	70.50	70.38
AOPL	44.67	45.00	73.31	73.89	47.00	47.78	71.22	71.80	47.67	48.33	69.52	70.18
SPPLLRE	56.17	56.67	81.20	81.40	48.89	48.89	73.73	73.62	45.92	46.67	69.46	69.62
NSC	50.17	50.00	75.04	74.62	51.78	52.22	74.17	74.36	48.42	48.75	71.12	70.99
SDSC	58.67	58.33	81.44	81.65	53.89	53.33	75.52	75.82	49.08	49.58	71.03	71.79

Table 6.

Clustering scores on the ORL dataset.

	N = 2				N = 4				N = 8
Methods	ACC	Median	NMI	Median	ACC	Median	NMI	Median	ACC	Median	NMI	Median
SC	48.63	48.75	79.38	79.18	48.63	47.81	72.48	72.55	61.00	60.78	79.10	78.93
LRR	47.00	47.50	77.56	77.44	56.94	57.50	76.51	77.22	61.81	62.03	78.11	78.24
KBDR	65.00	64.38	85.89	85.45	63.69	63.75	81.31	81.34	65.66	65.78	80.97	80.84
SASC	71.50	71.25	88.44	88.34	66.88	67.81	83.09	83.35	68.69	67.81	83.15	83.05
AOPL	53.75	52.50	79.15	79.27	72.94	72.19	86.01	85.92	78.34	77.97	87.83	88.03
SPPLLRE	70.00	70.00	88.00	87.74	64.38	64.38	81.82	81.64	66.44	66.41	81.65	82.17
NSC	75.25	76.25	88.06	88.48	73.75	74.06	87.65	88.12	67.59	68.91	83.33	84.53
SDSC	75.50	75.63	90.32	90.39	74.00	75.00	87.31	87.62	79.69	80.00	88.47	88.51

Table 7.

Clustering scores on the Semeion dataset.

	N = 30				N = 40				N = 70
Methods	ACC	Median	NMI	Median	ACC	Median	NMI	Median	ACC	Median	NMI	Median
SC	55.07	54.50	54.23	53.99	56.58	56.63	54.79	54.79	62.29	64.14	59.49	60.28
LRR	52.17	51.33	48.56	48.81	50.00	49.13	46.78	45.06	48.86	48.43	45.59	45.59
KBDR	55.93	56.83	51.14	51.72	54.63	53.13	49.42	49.13	57.17	57.50	49.09	48.82
SASC	51.57	52.33	50.98	51.54	53.58	53.13	50.57	49.79	51.69	53.50	49.66	49.69
AOPL	49.40	50.50	51.55	51.14	48.60	49.63	49.40	50.16	49.37	50.57	49.64	51.92
SPPLLRE	49.27	49.83	46.77	46.93	47.35	47.33	44.78	44.87	46.94	46.79	43.71	43.16
NSC	53.80	54.67	54.03	55.09	54.78	55.38	53.95	53.36	59.20	61.43	58.03	57.09
SDSC	57.00	56.50	56.16	56.48	60.30	61.25	58.19	58.73	65.44	65.43	61.61	62.32

Table 8.

Clustering scores on the Umist dataset.

	N = 4				N = 5				N = 6
Methods	ACC	Median	NMI	Median	ACC	Median	NMI	Median	ACC	Median	NMI	Median
SC	46.88	46.25	65.64	65.75	46.30	46.00	64.87	64.91	48.08	49.17	64.61	65.13
LRR	45.88	45.00	63.41	63.13	47.20	47.50	63.44	64.55	41.33	40.83	58.40	57.41
KBDR	51.88	52.50	69.77	69.84	52.30	52.50	68.38	68.10	50.25	50.00	66.75	67.26
SASC	55.88	55.63	72.42	72.39	55.70	55.50	71.20	71.26	55.33	55.83	69.32	68.73
AOPL	50.88	51.88	68.12	68.69	54.70	54.00	69.54	69.55	53.67	54.17	68.43	68.99
SPPLLRE	55.00	54.38	71.70	71.40	54.10	54.00	70.27	70.29	51.33	51.25	68.67	68.68
NSC	54.38	54.38	71.84	71.74	53.40	54.00	70.55	70.93	51.92	51.25	68.91	68.83
SDSC	57.50	58.13	72.92	73.79	56.90	56.00	71.88	71.87	53.75	54.58	68.97	68.85

Table 9.

Clustering scores on the Yale dataset.

	N = 7				N = 8				N = 9
Methods	ACC	Median	NMI	Median	ACC	Median	NMI	Median	ACC	Median	NMI	Median
SC	48.96	48.05	52.82	51.81	47.05	46.59	50.86	50.74	46.57	46.97	50.40	50.70
LRR	40.91	42.21	44.51	43.64	45.45	43.75	51.73	52.75	49.39	48.99	51.55	51.46
KBDR	47.40	45.45	52.14	50.82	45.00	44.32	49.82	49.50	45.76	45.45	48.38	49.11
SASC	50.52	50.65	57.46	58.97	50.34	50.57	55.10	54.60	50.10	51.01	54.37	55.14
AOPL	51.95	53.25	54.84	56.43	49.89	50.00	52.88	52.80	48.08	48.99	50.21	51.11
SPPLLRE	52.34	53.90	58.46	60.14	50.91	50.57	55.66	54.71	50.10	51.01	54.21	54.07
NSC	48.44	46.75	53.08	52.02	44.43	44.89	50.09	49.71	45.45	44.95	49.83	49.60
SDSC	52.47	51.30	57.33	57.80	49.09	52.27	56.34	57.16	50.30	53.03	54.47	55.48

To further demonstrate the superiority of our method, as in,³ we performed the Bayesian signed rank test on ACC and NMI. The results are visualized in Figures 11 and 12, where each value reflects the probability that the method listed in the row is superior to the one listed in the column. From the results in Figures 11 and 12, it is evident that our method outperforms the other methods. Taking Figure 11 as an example, our method significantly outperforms the classical subspace method LRR, with probabilities ranging from 96% to 100%. Compared to recent methods such as NSC and SPPLLRE, our method also shows superior performance, with probabilities ranging from 54% to 100%. Across all the subplots in Figures 11 and 12, the last column gives the probability that the other methods perform better than ours, and these values are close to zero in most cases; only a few methods come close, such as AOPL, which achieves 46% in Figure 11(a). Overall, these results strongly confirm the superiority of our method.
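For readers unfamiliar with the Bayesian signed rank test, the sketch below follows the commonly used Monte Carlo formulation: Dirichlet weights are sampled over the paired score differences plus one pseudo-observation at zero, and each sample votes for whichever of the three outcomes (first method better, practically equivalent, second method better) carries the most mass. The prior strength, the region of practical equivalence (`rope`), the number of samples, and the function name are assumptions for illustration; the configuration used for Figures 11 and 12 is not stated in the text.

```python
import random


def bayesian_signed_rank(x, y, rope=0.0, prior=0.5, nsamples=2000, seed=0):
    """Return (P(x better), P(equivalent), P(y better)) for paired scores."""
    rng = random.Random(seed)
    # Paired differences plus one pseudo-observation at 0 carrying the prior.
    diffs = [0.0] + [b - a for a, b in zip(x, y)]
    shapes = [prior] + [1.0] * (len(diffs) - 1)
    n = len(diffs)
    # Region of each pairwise sum: -1 (x better), 0 (inside rope), +1 (y better).
    region = [
        [(diffs[i] + diffs[j] > 2 * rope) - (diffs[i] + diffs[j] < -2 * rope)
         for j in range(n)]
        for i in range(n)
    ]
    wins = [0, 0, 0]
    for _ in range(nsamples):
        # Dirichlet sample obtained by normalizing independent gamma draws.
        g = [rng.gammavariate(a, 1.0) for a in shapes]
        total = sum(g)
        w = [v / total for v in g]
        mass = [0.0, 0.0, 0.0]
        for i in range(n):
            for j in range(n):
                mass[region[i][j] + 1] += w[i] * w[j]
        wins[mass.index(max(mass))] += 1
    return tuple(c / nsamples for c in wins)
```

Here `x` and `y` would hold the paired scores of two methods over the same settings, for example their ACC values across datasets, and the first returned entry plays the role of the row-beats-column probability shown in the figures.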

Figure 11.

Visualization of the Bayesian signed rank test about ACC on seven image datasets.

Figure 12.

Visualization of the Bayesian signed rank test about NMI on seven image datasets.

4.6 Ablation experiment

In this subsection, we conduct ablation experiments to evaluate the feasibility of the proposed method, employing the following setup:

The proposed method does not consider the ESCC strategy (SDSCe):

\begin{aligned} & \min \ \mathrm{tr}(H^{T} T H) + \frac{α}{2} ‖ H H^{T} - Z ‖_{F}^{2} + ζ ‖ E ‖_{2, 1} \\ & \mathrm{s.t.} \ X = X Z + E, \ H^{T} H = I, \ \mathrm{diag}(Z) = 0, \ Z_{i, j} \geq 0, \ Z 1 = 1 \end{aligned}

(35)

The proposed method does not consider the joint regularization term (SDSCj):

\begin{aligned} & \min \ β \, \mathrm{tr}(Z^{T} ((1 - τ) L + τ M) Z) + ζ ‖ E ‖_{2, 1} \\ & \mathrm{s.t.} \ X = X Z + E, \ \mathrm{diag}(Z) = 0, \ Z_{i, j} \geq 0, \ Z 1 = 1 \end{aligned}

(36)

The proposed method does not consider the noise error (SDSCn):

\begin{aligned} & \min_{Z, E} \ \mathrm{tr}(H^{T} T H) + \frac{α}{2} ‖ H H^{T} - Z ‖_{F}^{2} + β \, \mathrm{tr}(Z^{T} ((1 - τ) L + τ M) Z) \\ & \mathrm{s.t.} \ X = X Z + E, \ H^{T} H = I, \ \mathrm{diag}(Z) = 0, \ Z_{i, j} \geq 0, \ Z 1 = 1 \end{aligned}

(37)

Figure 13 presents the clustering performance of the three variants alongside our original method across the seven image datasets. From Figure 13, it can be observed that each regularization term is crucial to our method. Regarding the noise term, the clustering performance of SDSCn decreases when noise errors are not considered, demonstrating the effectiveness of this term. In comparison, the importance of the joint regularization term is even greater. When this term is removed, SDSCj can no longer adaptively learn label information from samples to constrain clustering, leading to performance degradation. Furthermore, the impact of the ESCC strategy is far more significant than that of the above two regularization terms. As shown in Figure 13, when ESCC is removed, our method nearly fails to perform the clustering task. Although the clustering performance of SDSCj and SDSCn is lower than that of SDSC, they still utilize the ESCC strategy to extract intrinsic structural information from the data. This mechanism ensures that the final self-expressive matrix Z retains strong structural representation capabilities. In contrast, SDSCe abandons this advantage, resulting in the worst clustering performance. These experimental results not only confirm the superiority of the SDSC method but also further validate the feasibility and importance of the ESCC strategy.

Figure 13.

Ablation results analysis of the proposed method on seven datasets.

4.7 Convergence analysis

As described in,^46,47 we report the relative error of the objective function against the number of iterations on all datasets. Figure 14 shows the convergence curves of SDSC on the seven datasets. Because some values vary significantly, Figure 14 further illustrates the specific error values to accurately reflect the actual number of iterations needed for convergence. It can be observed from Figure 14 that SDSC tends to converge as the number of iterations increases. SDSC usually converges within 6 to 8 iterations, which provides further empirical evidence of its convergence.
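A convergence curve of this kind can be produced from the sequence of objective values recorded during optimization. The sketch below assumes the relative error is the absolute change between consecutive objective values divided by the previous value, which is a common definition but is not spelled out in the text; the tolerance and function names are hypothetical.

```python
def relative_errors(objective_values):
    """Relative change |f_t - f_(t-1)| / |f_(t-1)| between consecutive iterations."""
    errors = []
    for prev, cur in zip(objective_values, objective_values[1:]):
        # Guard against division by zero when the objective reaches 0.
        errors.append(abs(cur - prev) / max(abs(prev), 1e-12))
    return errors


def iterations_to_converge(objective_values, tol=1e-4):
    """First iteration whose relative error drops below tol, or None."""
    for t, err in enumerate(relative_errors(objective_values), start=1):
        if err < tol:
            return t
    return None


# Usage with made-up objective values that flatten out quickly.
history = [100.0, 50.0, 40.0, 39.999, 39.999]
stop_at = iterations_to_converge(history)
```

Plotting `relative_errors(history)` against the iteration index gives the type of curve shown in Figure 14.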

Figure 14.

Convergence curves on the seven datasets.

4.8 Computational complexity experiment

In this section, we provide a comparative analysis of running times and clustering performance across the seven datasets. To validate the efficiency of our method, Table 10 presents the running times, measured in seconds, of the various clustering algorithms. As indicated in Table 10, our method exhibits competitive running times on several datasets, outperforming approaches such as AOPL and SPPLLRE. While its runtime is slightly higher than that of faster methods like SC and LRR on specific datasets, our method achieves a commendable balance between efficiency and clustering accuracy, as demonstrated in Tables 3–9. Overall, our method not only delivers superior clustering performance but also maintains a relatively low computational cost, underscoring its accuracy and efficiency. These results indicate that our method holds significant promise for practical applications in clustering tasks.

Table 10.
Running times (in seconds) of different methods on the seven image datasets.

Method	ORL	COIL20	Terravic	GT	Semeion	Umist	Yale
SC	0.80	1.00	0.43	1.38	0.62	0.38	0.48
LRR	7.64	7.24	9.16	13.7	21.12	10.48	6.68
KBDR	1.22	0.75	0.83	2.29	3.03	0.88	0.38
SASC	3.47	1.66	1.86	9.70	24.70	2.05	2.54
AOPL	3.11	1.72	1.23	15.82	99.68	6.84	1.70
SPPLLRE	5.29	2.93	0.73	7.04	104.55	1.85	1.20
NSC	6.23	14.58	1.38	55.45	75.14	5.18	0.64
OURS	2.33	2.31	3.10	15.20	19.94	7.84	2.75

5. Conclusion and future work

This paper presents a new subspace clustering method, Sample-Dependent Subspace Clustering (SDSC). Benefitting from the Elastic Structure Consistency Constraints (ESCC) strategy, SDSC can uncover both global and local structural information by learning a high-quality and flexible affinity matrix. Furthermore, SDSC can extract clustering guidance information from samples during the learning process of the affinity matrix, which is iteratively optimized by mutual reinforcement with the affinity matrix to achieve optimal clustering performance. Finally, compared to other methods, SDSC has demonstrated superior experimental results on several well-known real and synthetic datasets, evidenced by higher clustering accuracy and enhanced robustness.

Two improvements are left for future work. First, an adaptive method should be designed to automatically determine the local nearest neighbor structure of the data to further improve the robustness of the method. Second, the method should be extended to multi-view learning to handle data with different feature perspectives and improve the diversity and accuracy of clustering results.

Footnotes

Acknowledgment

This work is supported by the Hong Kong Innovation and Technology Commission (InnoHK Project CIMDA) and City University of Hong Kong (Projects 9610034 and 9610460).

ORCID iD

Gang Zhu

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the Hong Kong Innovation and Technology Commission (InnoHK Project CIMDA) and City University of Hong Kong (Projects 9610034 and 9610460).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Appendix I

Assuming SDSC reaches a stationary point, the KKT conditions⁴⁹ for (19) can be derived as follows:

\begin{aligned} X - X Z - E = 0 \end{aligned}

(35)

\begin{aligned} Z - G = 0 \end{aligned}

(36)

\begin{aligned} G 1 - 1 = 0 \end{aligned}

(37)

\begin{aligned} \frac{\partial L}{\partial G_{i j}} = δ (G - C_{3}) - σ = 0, \forall i \neq j \end{aligned}

(38)

\begin{aligned} \frac{\partial L}{\partial Z} = 2 β \bar{L} Z - D^{- 1} - X^{T} R_{1} + λ R_{2} = 0 \end{aligned}

(39)

\begin{aligned} \frac{\partial L}{\partial R_{1}} \in \frac{ζ}{λ} \partial_{E} (‖ E ‖_{2, 1}) \end{aligned}

(40)

Next, we use the KKT conditions to show why SDSC converges to a stationary point.

As described in (27), $G_{i j} = (C_{3})_{i j} + \frac{σ_{i}}{δ}$ $(i \neq j)$. So (42) can be further written as

\begin{aligned} σ_{i}^{+} - σ_{i} = δ (1 - \sum_{j = 1, j \neq i}^{n} G_{i j}) / (n - 1) = δ (1 - G_{i :} 1) / (n - 1) \to 0 \end{aligned}

(43)

From the above equation, $G_{i :} 1 \to 1$ for every $i$, so the KKT condition (37) is satisfied.

To prove the KKT condition (39), the following equation can be given

\begin{aligned} Z^{+} - Z = [(2 β \bar{L} + λ X^{T} X + λ I)^{- 1} (λ X^{T} C_{4} + λ C_{5} + D^{- 1}) - Z] \end{aligned}

(44)

Then, (44) can be further described as follows.

\begin{aligned} & (2 β \bar{L} + λ X^{T} X + λ I) (Z^{+} - Z) \\ & = (λ X^{T} C_{4} + λ C_{5} + D^{- 1}) - (2 β \bar{L} + λ X^{T} X + λ I) Z \\ & = λ X^{T} (X - X Z - E) + λ (G - Z) + (X^{T} R_{1} - λ R_{2} + D^{- 1} - 2 β \bar{L} Z) \end{aligned}

(45)

When $Z^{+} - Z \to 0$, (45) also converges to 0. It has been shown above that $X - X Z - E \to 0$ and $(G - Z) \to 0$ hold, so $(X^{T} R_{1} - λ R_{2} + D^{- 1} - 2 β \bar{L} Z) \to 0$. Thus, the KKT condition (39) holds. Similarly, when the convergence condition $(G_{i j}^{+} - G_{i j}) \to 0$ is satisfied, we have

\begin{aligned} (G_{i j}^{+} - G_{i j}) = (C_{3})_{i j} + \frac{σ_{i}}{δ} - G_{i j} = \frac{1}{δ} [δ (C_{3})_{i j} - δ G_{i j} + σ_{i}] \end{aligned}

(46)

Thus, the KKT condition (38) is proved.

To prove the KKT condition (40), the following equation can be given

\begin{aligned} X - X Z + \frac{R_{1}}{λ} \in X - X Z + \frac{ζ}{λ} \partial_{E} (‖ E ‖_{2, 1}) \end{aligned}

(47)

We define a simple function $ℏ_{ζ / λ} (a) := a + \frac{ζ}{λ} \partial (‖ a ‖_{2, 1})$. Since $X - X Z - E \to 0$, (47) is then equivalent to $X - X Z + \frac{R_{1}}{λ} \in ℏ_{ζ / λ} (E) = E + \frac{ζ}{λ} \partial (‖ E ‖_{2, 1})$. Meanwhile, as reported in,⁴⁸ the following equation can be obtained

\begin{aligned} E = ℏ_{ζ / λ}^{- 1} (X - X Z + \frac{R_{1}}{λ}) ≅ Φ_{ζ / λ} (X - X Z + \frac{R_{1}}{λ}) \end{aligned}

(48)

According to (48), we have

\begin{aligned} E^{+} - E = Φ_{ζ / λ} (X - X Z + \frac{R_{1}}{λ}) - E \end{aligned}

(49)

When $E^{+} - E \to 0$, we have $Φ_{ζ / λ} (X - X Z + \frac{R_{1}}{λ}) - E \to 0$. Now, all the KKT conditions are proven, and Theorem 2 holds.
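To make the operator $Φ_{ζ / λ}$ in (48) and (49) more concrete, the sketch below implements the standard column-wise shrinkage operator associated with the $ℓ_{2,1}$ norm. This assumes that $‖ E ‖_{2, 1}$ is the sum of the $ℓ_{2}$ norms of the columns of E, as is common in low-rank representation methods; the exact definition of Φ is given elsewhere in the paper and may differ. The function name and the list-of-rows matrix layout are illustrative choices.

```python
import math


def l21_shrink(Q, tau):
    """Column-wise shrinkage: minimizer of tau*||E||_{2,1} + 0.5*||E - Q||_F^2."""
    rows, cols = len(Q), len(Q[0])
    E = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        norm = math.sqrt(sum(Q[i][j] ** 2 for i in range(rows)))
        if norm > tau:
            # Shrink the whole column toward zero by tau in l2 norm.
            scale = (norm - tau) / norm
            for i in range(rows):
                E[i][j] = scale * Q[i][j]
        # Columns with norm <= tau are set to zero (left as initialized).
    return E


# Usage: Q stands in for X - XZ + R_1 / lambda, and tau for zeta / lambda.
Q = [[3.0, 0.1],
     [4.0, 0.0]]
E_new = l21_shrink(Q, 1.0)
```

Columns with small norm are removed entirely while the others are only shortened, which is why the $ℓ_{2,1}$ penalty is suited to sample-specific corruptions.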

References

1. Parsons, Haque, Liu. Subspace clustering for high dimensional data: a review. ACM SIGKDD Explorat Newslett 2004; 6: 90–105.
2. Vidal. Subspace clustering. IEEE Signal Process Mag 2011; 28: 52–68.
3. Feng, Wang, Xiao, et al. Adaptive weighted dictionary representation using anchor graph for subspace clustering. Pattern Recognit 2024; 151: 110350.
4. Wei, Liu, et al. Subspace clustering via structured sparse relation representation. IEEE Trans Neur Netw Learn Syst 2021; 33: 4610–4623.
5. Tseng. Nearest q-flat to m points. J Optim Theory Appl 2000; 105: 249–252.
6. Rao, Tron, Vidal, et al. Motion segmentation in the presence of outlying, incomplete, or corrupted trajectories. IEEE Trans Pattern Anal Mach Intell 2009; 32: 1832–1845.
7. Zhao, Chang, et al. Latent low-rank representation with weighted distance penalty for clustering. IEEE Trans Cybern 2022; 53: 6870–6882.
8. Vidal, Sastry. Generalized principal component analysis (GPCA). IEEE Trans Pattern Anal Mach Intell 2005; 27: 1945–1959.
9. Chen, Tao, et al. PDRLRR: a novel low-rank representation with projection distance regularization via manifold optimization for clustering. Pattern Recognit 2024; 149: 110198.
10. Wang, Lin, et al. Multiview spectral clustering via structured low-rank matrix factorization. IEEE Trans Neur Netw Learn Syst 2018; 29: 4833–4843.
11. Nie, et al. A novel normalized-cut solver with nearest neighbor hierarchical initialization. IEEE Trans Pattern Anal Mach Intell 2023; 46: 659–666.
12. Pei, Nie, Wang, et al. Efficient clustering based on a unified view of k-means and ratio-cut. In: Advances in neural information processing systems, online, December 6–12, 2020, pp. 14855–14866.
13. Elhamifar, Vidal. Sparse subspace clustering: algorithm, theory, and applications. IEEE Trans Pattern Anal Mach Intell 2013; 35: 2765–2781.
14. Liu, Lin, Yan, et al. Robust recovery of subspace structures by low-rank representation. IEEE Trans Pattern Anal Mach Intell 2012; 35: 171–184.
15. Chen, Wang, Bai. Fuzzy sparse subspace clustering for infrared image segmentation. IEEE Trans Image Process 2023; 32: 2132–2146.
16. Wang, Nie, Wang, et al. Sparse robust subspace learning via Boolean weight. Inf Fusion 2023; 96: 224–236.
17. Guo, Tierney, Gao. Efficient sparse subspace clustering by nearest neighbour filtering. Signal Process 2021; 185: 108082.
18. Liu, Yan. Latent low-rank representation for subspace segmentation and feature extraction. In: 2011 international conference on computer vision, Barcelona, Spain, November 6–13, 2011, pp. 1615–1622.
19. Liu, Wang, Zhang, et al. Global and local structure preservation for feature selection. IEEE Trans Neur Netw Learn Syst 2013; 25: 1083–1095.
20. Zhou, Cheng, et al. Global and local structure preserving sparse subspace learning: an iterative approach to unsupervised feature selection. Pattern Recognit 2016; 53: 87–101.
21. Zhang, et al. Ensemble clustering via fusing global and local structure information. Expert Syst Appl 2024; 237: 121557.
22. Jia, Zhu, Huang, et al. Global and local structure preserving nonnegative subspace clustering. Pattern Recognit 2023; 138: 109388.
23. Kou, Yin, Wang, et al. Structure-aware subspace clustering. IEEE Trans Knowl Data Eng 2023; 35: 10569–10582.
24. Liu, Tan, et al. Preserving local and global information: an effective metric-based subspace clustering. In: Proceedings of the 31st ACM international conference on multimedia, Ottawa, Canada, October 29 to November 3, 2023, pp. 3619–3627.
25. Bai, Liang, Cao. Semi-supervised clustering with constraints of different types from multiple information sources. IEEE Trans Pattern Anal Mach Intell 2020; 43: 3247–3258.
26. Huang, Wang, et al. Ultra-scalable spectral clustering and ensemble clustering. IEEE Trans Knowl Data Eng 2019; 32: 1212–1226.
27. Zhao, Chang, et al. Auto-weighted low-rank representation for clustering. Knowl Based Syst 2022; 251: 109063.
28. Chen, et al. Reweighted sparse subspace clustering. Comput Vis Image Underst 2015; 138: 25–37.
29. You, Vidal. Structured sparse subspace clustering: a joint affinity learning and subspace clustering framework. IEEE Trans Image Process 2017; 26: 2988–3001.
30. Tang, Liu, et al. Structure-constrained low-rank representation. IEEE Trans Neur Netw Learn Syst 2014; 25: 2167–2179.
31. Wang, Yuan. Graph-regularized low-rank representation for destriping of hyperspectral images. IEEE Trans Geosci Remote Sens 2013; 51: 4009–4018.
32. Feng, Lin, et al. Subspace clustering by block diagonal representation. IEEE Trans Pattern Anal Mach Intell 2018; 41: 487–501.
33. Kang, Lin, Zhu, et al. Structured graph learning for scalable subspace clustering: from single view to multiview. IEEE Trans Cybern 2021; 52: 8976–8986.
34. Hawkins. The problem of overfitting. J Chem Inf Comput Sci 2004; 44: 1–12.
35. Peng, Tang, Zhang, et al. A unified framework for representation-based subspace clustering of out-of-sample and large-scale data. IEEE Trans Neur Netw Learn Syst 2015; 27: 2499–2512.
36. Min, Zhao, et al. Robust and efficient subspace segmentation via least squares regression. In: Computer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7–13, 2012, pp. 347–360.
37. Crawford, Tian. Local manifold learning-based k-nearest-neighbor for hyperspectral image classification. IEEE Trans Geosci Remote Sens 2010; 48: 4099–4109.
38. Hajizadeh, Aghagolzadeh, Ezoji. Local distances preserving based manifold learning. Expert Syst Appl 2020; 139: 112860.
39. Abdi, Williams. Principal component analysis. Wiley Interdiscip Rev Comput Stat 2010; 2: 433–459.
40. Peng. Exhaustive and efficient constraint propagation: a graph-based learning approach and its applications. Int J Comput Vision 2013; 103: 306–325.
41. Liu, Tao. Partition level constrained clustering. IEEE Trans Pattern Anal Mach Intell 2017; 40: 2469–2483.
42. Von Luxburg. A tutorial on spectral clustering. Stat Comput 2007; 17: 395–416.
43. Liu, Wang, Sun, et al. Adaptive low-rank kernel block diagonal representation subspace clustering. Appl Intell 2022; 52: 2301–2316.
44. Chang, et al. Adaptive-order proximity learning for graph-based clustering. Pattern Recognit 2022; 126: 108550.
45. Cai, Wan, Yang, et al. Structure preserving projections learning via low-rank embedding for image classification. Inf Sci (Ny) 2023; 648: 119636.
46. Guo, Sun, Gao, et al. Rank consistency induced multiview subspace clustering via low-rank matrix factorization. IEEE Trans Neur Netw Learn Syst 2021; 33: 3157–3170.
47. Wang, Deng, et al. Attention reweighted sparse subspace clustering. Pattern Recognit 2023; 139: 109438.
48. Kim, Lee. Elastic-net regularization of singular values for robust subspace learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Boston, Massachusetts, June 7–12, 2015, pp. 915–923.
49. Guo, Sun, Gao. Logarithmic Schatten-p norm minimization for tensorial multi-view subspace clustering. IEEE Trans Pattern Anal Mach Intell 2022; 45: 3396–3410.

Table 2.

Parameters of the synthetic datasets.

Synthetic datasets	Cigar	Half rings	Circle and 3gaussians	Petals	Enclosure	Worms
N	500	500	500	500	622	500
D	2	2	2	2	2	2
C	4	2	4	4	3	4
Mark	SYN-1	SYN-2	SYN-3	SYN-4	SYN-5	SYN-6