Adaptive soft subspace clustering combining within-cluster and between-cluster information

Abstract

To address the uncertain influence of the between-cluster distance on clustering in soft subspace clustering (SSC), a novel clustering technique called adaptive soft subspace clustering (ASSC) is proposed that employs both within-cluster and between-cluster information. First, a new objective function is constructed within the framework of the SSC algorithm by minimizing the within-cluster compactness and maximizing the between-cluster distance. Based on this objective function, new update rules for the clusters’ feature weights, centers and memberships are then derived using the Lagrange multiplier method. The distinguishing feature of ASSC is that its objective function introduces no additional control parameters, which avoids the sensitivity of clustering results to the initial values of such parameters. The properties of the algorithm are investigated and its performance is evaluated experimentally using UCI datasets. The comparative experimental results demonstrate that the proposed algorithm outperforms four existing clustering algorithms, i.e., ESSC, EWKM, FWKM and CIM_QPSO_SSC, in accuracy and stability.

Keywords

Soft subspace clustering, within-cluster compactness, between-cluster distance, no additional control parameters

1 Introduction

Clustering divides an unlabeled dataset into several meaningful categories according to the degree of similarity between its entities. A cluster is a homogeneous group of entities. While entities in the same cluster are supposed to be homogeneous according to some notion of similarity, entities in different clusters are expected to be heterogeneous [1, 31]; that is, clustering minimizes the between-cluster similarity while maximizing the within-cluster similarity [2, 28]. Driver et al. first applied the idea of cluster analysis to the field of anthropology, and Robert Tryon introduced it to psychology [4]. More recently, Ayub et al. modeled user rating preference behavior to improve the performance of collaborative-filtering-based recommender systems [32], and Qazi et al. proposed a hybrid technique for speech segregation and classification using a sophisticated deep neural network [33]. To solve practical problems, clustering has been developed together with many classic algorithms [5]. Those algorithms have been applied to various fields such as electronic commerce, the internet, bioinformatics and financial transactions [6, 7]. Among the algorithms studied, soft subspace clustering (SSC, for short) is a more suitable approach for datasets in which different clusters correlate with different subsets of features, since a different vector of feature weights is assigned to each cluster.

According to how the soft subspace weights are assigned, SSC algorithms, which have been widely used in scientific research and industrial applications, can be divided into two categories, namely fuzzy weighting [8] and entropy weighting [9, 30]. Algorithms in the former category assign a fuzzy weight w_kj^β to the jth feature of the kth cluster and adjust the feature weights of each cluster automatically during the clustering process, while algorithms in the latter category use entropy to control the feature weights in each cluster. Entropy is a measure based on local similarity and has certain advantages for measuring the similarity between sample points in the same subspace. The membership degree in entropy weighting is a hard partition matrix. Although many SSC algorithms have been developed for different application areas, there is still room to improve their performance, and many researchers have proposed notable methods. Wang et al. proposed a soft subspace algorithm based on a fuzzy partition index, applying the concept of the partition index to the existing SSC algorithm [10]. Zhu et al. proposed an online SSC algorithm that combines an online learning strategy with SSC to handle large-scale high-dimensional data and data streams [3]. Wang et al. proposed an enhanced SSC algorithm based on hybrid dissimilarity, whose optimization function is designed to minimize the hybrid dissimilarity and maximize the weighted entropy, which improves the performance of SSC [11].

However, one of the main weaknesses of SSC is that the clustering results are satisfactory for some datasets but less so for others. A possible reason is that the distance between clusters has not been studied in sufficient depth, so it is necessary to adaptively evaluate the relationship between the clusters and the between-cluster distance measure. Recent studies have shown that introducing between-cluster separation can effectively improve the clustering performance. Representative approaches include the alternating optimization approach [12], adaptive metric learning for self-organizing incremental neural networks [13] and entropy weighting fuzzy clustering in a composite kernel space (CKS, for short) for kernel space and feature space [14]. However, all these clustering algorithms frequently fall into a local optimum while searching for the cluster centers because of an improper selection of initial points, which causes clusters to merge and thus degrades the clustering performance.

In this work, the above problems are investigated by integrating within-cluster compactness and between-cluster separation into the framework of SSC, and a novel adaptive soft subspace clustering algorithm named ASSC is proposed accordingly. The first characteristic of ASSC is that the new objective function introduces no additional control parameters, which avoids the sensitivity of clustering results to the initial values of such parameters and can thus improve the computational efficiency. The second is that the proposed algorithm employs new update rules for the clusters’ feature weights, centers and membership degrees, derived by the Lagrange multiplier method. For ease of reference, the major notations used in this paper are summarized in Table 1.

Table 1
Notations used in this paper

Notation	Description
C	the number of clusters
N	the size of the dataset
M	the number of features
β	the feature weight index
m	the membership index
u_ik	the membership of the ith object in the kth cluster
w_kj	the weight of the jth dimension for the kth cluster
x_ij	the value of the ith object in the jth dimension
v_kj	the value of the kth cluster center in the jth dimension
v_oj	the global center of the jth feature in the entire dataset
U	the cluster partition matrix
V	the cluster center matrix
W	the feature weight matrix

The remainder of this paper is organized as follows. Section 2 presents related work and background. Section 3 introduces the ASSC algorithm. Section 4 reports comparative experimental results on real datasets that show the superiority of ASSC. Finally, Section 5 draws conclusions and describes possible extensions of this work.

2 Background and related work

Let X = {x_ij, i = 1, 2, …, N; j = 1, 2, …, M} be a dataset with N entities (i.e., objects) described by M variables (i.e., features). The matrices U = {u_ik, i = 1, 2, …, N; k = 1, 2, …, C}, V = {v_kj, j = 1, 2, …, M; k = 1, 2, …, C} and W = {w_kj, j = 1, 2, …, M; k = 1, 2, …, C} are defined in Table 1.

2.1 The soft subspace clustering algorithm

The SSC algorithm is a clustering algorithm that performs a local search on the relevant dimensions. It identifies the important features by setting feature weights, thereby reducing the effect of irrelevant features. The SSC algorithm can identify the feature subset of each cluster and find the clusters corresponding to different feature subsets. Its basic idea can be described as the minimization of an objective function [10], as follows:

$\begin{matrix} J_{SSC} (W, U, V) = \sum_{k = 1}^{C} \sum_{i = 1}^{N} \sum_{j = 1}^{M} u_{ik}^{m} \times w_{kj}^{β} \times (x_{ij} - v_{kj})^{2} + H \\ \text{s.t.} \quad u_{ik} \in \{0, 1\}, \quad \sum_{k = 1}^{C} u_{ik} = 1, \quad \sum_{j = 1}^{M} w_{kj} = 1, \quad w_{kj} \in [0, 1] \end{matrix}$ (1)

The first term in Equation (1) is the total weighted distance between each sample point x_i, i = 1, 2, …, N, and the cluster centers v_k, k = 1, 2, …, C, and the second term H is a penalty term that is often used to improve the performance of the algorithm. The parameter m (m > 0) controls the influence of the membership matrix U during the iterative process. The parameter β (β > 0) controls the influence of the weight matrix W during the iterative process.
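As a rough illustration of the first term of Equation (1), the sketch below evaluates the weighted within-cluster distance for a hard partition. The function name `ssc_weighted_distance`, the toy data and the default values m = 1 and β = 2 are assumptions made for this example only; the penalty term H is left out because its form depends on the specific SSC variant.

```python
def ssc_weighted_distance(X, U, V, W, m=1.0, beta=2.0):
    """First term of Equation (1); the penalty term H is omitted."""
    total = 0.0
    for k, (v_k, w_k) in enumerate(zip(V, W)):
        for x_i, u_i in zip(X, U):
            if u_i[k] == 0:
                continue  # object i does not belong to cluster k
            weighted = sum((w ** beta) * (x - v) ** 2
                           for x, v, w in zip(x_i, v_k, w_k))
            total += (u_i[k] ** m) * weighted
    return total


# Toy data (illustrative only): three objects, two features, two clusters.
X = [[0.0, 0.0], [2.0, 0.0], [10.0, 5.0]]
U = [[1, 0], [1, 0], [0, 1]]      # hard partition, exactly one 1 per row
V = [[1.0, 0.0], [10.0, 5.0]]     # cluster centers
W = [[0.5, 0.5], [0.5, 0.5]]      # feature weights, each row sums to 1
J = ssc_weighted_distance(X, U, V, W)
print(J)
```

With equal weights the value reduces to a scaled sum of squared distances; unequal rows of W would emphasize the features that matter for each cluster.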

2.2 Soft subspace clustering algorithm based on between-cluster distance

Clustering results are affected by both within-cluster compactness and between-cluster separation, and several subspace clustering algorithms that integrate within-cluster and between-cluster information have been studied. The enhanced soft subspace clustering (ESSC, for short) algorithm [2, 26] minimizes the within-cluster distance and the feature weight entropy while maximizing the between-cluster distance. It uses a soft membership matrix whose entries take values between 0 and 1. It modifies the membership partition to better fit each sample point and effectively reduces the influence of close cluster centers on the global center. Xia et al. proposed a multi-objective evolutionary SSC algorithm in 2013 [15], in which two objective functions are optimized and the optimal solution and the optimal number of clusters are obtained using the projection similarity criterion. Qiu et al. proposed an SSC algorithm based on the adaptation of the between-cluster distance in 2016 [16], deriving a new way of computing the cluster centers and feature weights that overcomes the sensitivity to input parameters and obtains better clustering results. Xu et al. proposed an SSC algorithm based on quantum-behaved particle swarm optimization (SSC_QPSO, for short) [17, 34]. The algorithm finds globally optimal cluster centers in the subspace by exploiting the global search ability of QPSO, and its objective function can be formulated as

$\begin{matrix} J_{SSC\_QPSO} (W, U, V) = \sum_{k = 1}^{C} \sum_{i = 1}^{N} \sum_{j = 1}^{M} u_{ik}^{m} \times w_{kj} \times (x_{ij} - v_{kj})^{2} \\ + γ \times \sum_{k = 1}^{C} \sum_{j = 1}^{M} w_{kj} \times \ln w_{kj} \\ - η \sum_{k = 1}^{C} \sum_{i = 1}^{N} \sum_{j = 1}^{M} u_{ik}^{m} \times w_{kj} \times (v_{kj} - v_{oj})^{2} \\ \text{s.t.} \quad u_{ik} \in \{0, 1\}, \quad \sum_{k = 1}^{C} u_{ik} = 1, \quad \sum_{j = 1}^{M} w_{kj} = 1, \quad w_{kj} \in [0, 1] \end{matrix}$ (2)

where γ > 0 and η > 0 are weight coefficients that control the effects of the weight entropy and the between-cluster separation, respectively, on the clustering results.

However, the influence of the between-cluster separation on the clustering results of the SSC_QPSO algorithm is not specified. During the iteration of Equation (2), the within-cluster compactness and the between-cluster separation are calculated in the same way: the objective function is based on the Euclidean distance, which still suffers from the curse of dimensionality in higher dimensions, where the Euclidean distance metric fails.

Xu et al. proposed CIM_QPSO_SSC (Correntropy Induced Metric is abbreviated to CIM) by modifying the metric in SSC_QPSO [18 , 29]. The corresponding objective function can be written as

$\begin{matrix} J (W, U, V) = \sum_{k = 1}^{C} \sum_{i = 1}^{N} \sum_{j = 1}^{M} u_{ik}^{m} \times w_{kj} \times [1 - \exp (- (x_{ij} - v_{kj})^{2} / σ^{2})] \\ - η \sum_{k = 1}^{C} \sum_{i = 1}^{N} \sum_{j = 1}^{M} u_{ik}^{m} \times w_{kj} \times [1 - \exp (- (v_{kj} - v_{oj})^{2} / σ^{2})] \\ + γ \times \sum_{k = 1}^{C} \sum_{j = 1}^{M} w_{kj} \times \ln w_{kj} \\ \text{s.t.} \quad u_{ik} \in \{0, 1\}, \quad \sum_{k = 1}^{C} u_{ik} = 1, \quad \sum_{j = 1}^{M} w_{kj} = 1, \quad w_{kj} \in [0, 1] \end{matrix}$ (3)

CIM_QPSO_SSC has demonstrated better performance than classical clustering algorithms on some popular datasets [10]. However, for a specific learning task it is difficult to select appropriate initial values for the control parameters (m, σ, γ, η), and incorrect choices can seriously affect the performance of CIM_QPSO_SSC. In addition, CIM_QPSO_SSC inherits a shortcoming of the QPSO algorithm it builds on, i.e., time efficiency is sacrificed because of the particle update process.

As described in [2], in the weighting subspace, the fuzzy weighting within-cluster compactness and the fuzzy weighting between-cluster separation of a dataset containing C clusters can be expressed as follows:

$\begin{matrix} J (W, U, V) = \sum_{k = 1}^{C} \sum_{i = 1}^{N} \sum_{j = 1}^{M} u_{ik}^{m} \times w_{kj}^{β} \times (x_{ij} - v_{kj})^{2} \\ - η \sum_{k = 1}^{C} \sum_{i = 1}^{N} \sum_{j = 1}^{M} u_{ik}^{m} \times w_{kj}^{β} \times (v_{kj} - v_{oj})^{2} \\ \text{s.t.} \quad u_{ik} \in \{0, 1\}, \quad \sum_{k = 1}^{C} u_{ik} = 1, \quad \sum_{j = 1}^{M} w_{kj} = 1, \quad w_{kj} \in [0, 1] \end{matrix}$ (4)

where η ⩾ 0. Note that when η = 0, Equation (4) reduces to the within-cluster compactness term alone.

The form of the objective function and the values of the control parameters have a significant influence on the performance of an algorithm. Therefore, a novel algorithm based on SSC is developed by integrating within-cluster compactness and between-cluster separation. Compared with the classic algorithms, the distinguishing feature of our work is that the new objective function introduces no additional control parameters. ASSC is developed through three update rules, stated as theorems: updating the membership degree matrix, updating the cluster center matrix and updating the feature weight matrix.

3 Adaptive soft subspace clustering algorithm (ASSC)

3.1 Design of objective function

ASSC is proposed based on the framework of SSC. The objective function J is given by

$\begin{matrix} J (W, U, V) = \sum_{k = 1}^{C} \sum_{i = 1}^{N} \sum_{j = 1}^{M} u_{ik} \times w_{kj}^{2} \times [1 - \exp (- γ_{j} (x_{ij} - v_{kj})^{2})] \\ - \sum_{k = 1}^{C} \sum_{i = 1}^{N} \sum_{j = 1}^{M} u_{ik} \times w_{kj}^{2} \times \frac{1}{γ_{j}} (v_{kj} - v_{oj})^{2} \\ \text{s.t.} \quad u_{ik} \in \{0, 1\}, \quad \sum_{k = 1}^{C} u_{ik} = 1, \quad \sum_{j = 1}^{M} w_{kj} = 1, \quad w_{kj} \in [0, 1] \end{matrix}$ (5)

The main idea of ASSC is to minimize the within-cluster compactness while maximizing the between-cluster separation, which are the two terms of Equation (5). ASSC measures distances with a non-Euclidean measure based on the Gaussian kernel, i.e., k_γ = exp(-γ_j × (x_ij - v_kj)²), so that the optimization function built on the original Euclidean distance metric is mapped to the Gaussian kernel space. The parameter γ_j is estimated as the inverse of the variance of all samples X_j = (x_1j, x_2j, …, x_Nj) in the j-th dimension, i.e., $γ_{j} = 1 / s_{j}, \quad s_{j} = \sum_{i = 1}^{N} (x_{ij} - \bar{x}_{j})^{2} / N, \quad \bar{x}_{j} = \sum_{i = 1}^{N} x_{ij} / N$.
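The estimate of γ_j and the kernel-based dissimilarity 1 - exp(-γ_j (x_ij - v_kj)²) can be sketched as follows. The function names and the toy data are assumptions for illustration, and the sketch assumes that no feature is constant, since a zero variance would make γ_j undefined.

```python
import math


def inverse_variance(X):
    """gamma_j = 1 / s_j, with s_j the population variance of feature j."""
    n = len(X)
    gammas = []
    for column in zip(*X):
        mean = sum(column) / n
        variance = sum((x - mean) ** 2 for x in column) / n
        gammas.append(1.0 / variance)  # assumes variance > 0
    return gammas


def kernel_dissimilarity(x, v, gamma):
    """Within-cluster term of Equation (5) for one feature value."""
    return 1.0 - math.exp(-gamma * (x - v) ** 2)


# Toy data (illustrative only): two objects, two features.
X = [[1.0, 0.0], [3.0, 4.0]]
gammas = inverse_variance(X)
print(gammas)
print(kernel_dissimilarity(1.0, 1.5, gammas[0]))
```

Unlike the squared Euclidean distance, this dissimilarity is bounded: it is 0 when x equals v and approaches 1 as the two move apart.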

Essentially, the minimization of the objective function in Equation (5) under its constraints is a constrained optimization problem with a non-Euclidean distance. ASSC minimizes Equation (5) by iteratively solving for the three groups of variables U, V and W, using the three update rules given below.

Membership matrix. First, the cluster partition matrix U is resolved for given centers V and weights W using the minimum-distance rule, with the distance defined as in Equation (5). The membership satisfies u_ik ∈ {0, 1} and ∑_{k=1}^{C} u_ik = 1, and ASSC minimizes Equation (5) with respect to U by the following update: $u_{ik} = \begin{cases} 1, & \text{if } \sum_{j = 1}^{M} w_{kj}^{2} \times D_{ik} ⩽ \sum_{j = 1}^{M} w_{lj}^{2} \times D_{il}, \quad 1 ⩽ l ⩽ C, \ l \neq k \\ 0, & \text{otherwise} \end{cases}$ (6)

where D_ik = (1 - exp(-γ_j (x_ij - v_kj)²)) - (1/γ_j)(v_kj - v_oj)² and D_il = (1 - exp(-γ_j (x_ij - v_lj)²)) - (1/γ_j)(v_lj - v_oj)². In Equation (6), u_ik = 1 means that the ith data point x_i is assigned to the kth cluster with center v_k, and u_ik = 0 means that it is not.
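A minimal sketch of the assignment rule in Equation (6) is given below: each object goes to the cluster with the smallest weighted sum of the per-feature terms D. The function name, the tie-breaking rule (the lowest cluster index wins) and the toy data are assumptions for this example.

```python
import math


def assign_clusters(X, V, W, gammas, v_global):
    """Return, for each object, the index k for which u_ik = 1 in Equation (6)."""
    labels = []
    for x_i in X:
        costs = []
        for v_k, w_k in zip(V, W):
            cost = 0.0
            for x, v, w, g, vo in zip(x_i, v_k, w_k, gammas, v_global):
                # Kernel within-cluster term minus between-cluster term.
                d = (1.0 - math.exp(-g * (x - v) ** 2)) - (v - vo) ** 2 / g
                cost += w ** 2 * d
            costs.append(cost)
        labels.append(costs.index(min(costs)))  # ties go to the lowest index
    return labels


# Toy 1-D data (illustrative only), with centers symmetric about the
# global center so that the between-cluster term is equal for both.
X = [[-1.2], [-0.8], [0.9], [1.1]]
V = [[-1.0], [1.0]]
W = [[1.0], [1.0]]
labels = assign_clusters(X, V, W, gammas=[1.0], v_global=[0.0])
print(labels)
```

Note that the between-cluster term does not depend on the object, so for a given cluster it shifts the cost of every object by the same amount.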

Cluster center matrix. For a given cluster partition U and weight matrix W, the cluster center matrix V that minimizes the objective function can be obtained with the Lagrange multiplier method by alternately updating the following equations: $\begin{matrix} v_{kj} = \frac{\sum_{i = 1}^{N} u_{ik} \times [γ_{j} \times \exp (- γ_{j} \times (x_{ij} - v_{kj})^{2}) \times x_{ij} - v_{oj} / γ_{j}]}{\sum_{i = 1}^{N} u_{ik} \times [γ_{j} \times \exp (- γ_{j} \times (x_{ij} - v_{kj})^{2}) - 1 / γ_{j}]} \\ v_{oj} = \frac{1}{N} \sum_{i = 1}^{N} x_{ij} \end{matrix}$ (7) which can be solved using the fixed point iteration method.

For given U and W, minimizing the within-cluster term of Equation (5) is equivalent to maximizing $\sum_{i = 1}^{N} u_{ik} \times \exp (- γ_{j} \times (x_{ij} - v_{kj})^{2}), \quad 1 ⩽ k ⩽ C, \ 1 ⩽ j ⩽ M$ (8)

Denoting the number of data points in the kth cluster by n_k and indexing these points by l, Equation (8) becomes $\sum_{l = 1}^{n_{k}} \exp (- γ_{j} \times (x_{lj} - v_{kj})^{2}), \quad 1 ⩽ k ⩽ C, \ 1 ⩽ j ⩽ M$ (9)

We differentiate the objective with respect to v_kj, including the between-cluster term of Equation (5), and set the derivative to zero, which gives the following equation

$v_{kj} = \frac{\sum_{l = 1}^{n_{k}} [γ_{j} \times \exp (- γ_{j} \times (x_{lj} - v_{kj})^{2}) \times x_{lj} - v_{oj} / γ_{j}]}{\sum_{l = 1}^{n_{k}} [γ_{j} \times \exp (- γ_{j} \times (x_{lj} - v_{kj})^{2}) - 1 / γ_{j}]}$ (10)

which is exactly Equation (7). Equation (10) is a nonlinear equation in v_kj, and it can be solved using the fixed point iteration method [10]. Denote

$f (v_{kj}) = \sum_{i = 1}^{N} u_{ik} \times \exp (- γ_{j} \times (x_{ij} - v_{kj})^{2})$ (11)

$φ (v_{kj}) = \frac{\sum_{l = 1}^{n_{k}} [γ_{j} \times \exp (- γ_{j} \times (x_{lj} - v_{kj})^{2}) \times x_{lj} - v_{oj} / γ_{j}]}{\sum_{l = 1}^{n_{k}} [γ_{j} \times \exp (- γ_{j} \times (x_{lj} - v_{kj})^{2}) - 1 / γ_{j}]}$ (12)

Then, by the fixed point iteration method, we obtain the sequence $v_{kj}^{(i + 1)} = φ (v_{kj}^{(i)}), \ i = 1, 2, \ldots$. It can further be shown that the sequences $\{v_{kj}^{(i)}, i = 1, 2, \ldots\}$ and $\{f (v_{kj}^{(i)}), i = 1, 2, \ldots\}$ converge. The derivation can be found in Ref. [19].
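The fixed point iteration for one coordinate of one cluster center can be sketched as below. The names `phi` and `fixed_point_center`, the tolerance, the iteration cap and the toy values (γ = 4, global center 0, two points near 1) are assumptions for this example. The sketch does not itself check the convergence conditions, so it simply stops after `max_iter` sweeps if the tolerance is not reached.

```python
import math


def phi(v, xs, gamma, v_global):
    """Right-hand side of the center update for the members xs of one cluster."""
    num = 0.0
    den = 0.0
    for x in xs:
        kernel = gamma * math.exp(-gamma * (x - v) ** 2)
        num += kernel * x - v_global / gamma
        den += kernel - 1.0 / gamma
    return num / den  # assumes the denominator is not zero


def fixed_point_center(xs, v_init, gamma, v_global, tol=1e-10, max_iter=200):
    """Iterate v <- phi(v) until successive values differ by less than tol."""
    v = v_init
    for _ in range(max_iter):
        v_new = phi(v, xs, gamma, v_global)
        if abs(v_new - v) < tol:
            return v_new
        v = v_new
    return v


# Toy example (illustrative only): one feature, two cluster members.
members = [0.9, 1.1]
v_star = fixed_point_center(members, v_init=1.0, gamma=4.0, v_global=0.0)
print(v_star)
```

In this toy case the resulting center lies slightly farther from the global center than the plain mean of the members, which reflects the between-cluster term.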

Feature weight matrix. For a given cluster partition U and center matrix V, finding the weight matrix W is a constrained optimization problem, the constraint being that the feature weights of each cluster sum to one. The Lagrange multiplier method can again be used to minimize the objective function. The weight matrix W is obtained by the following update formula: $w_{kj} = 1 / \sum_{q = 1}^{M} (D_{kj} / D_{kq})$ (13) where D_kj = ∑_{i=1}^{N} u_ik [(1 - exp(-γ_j (x_ij - v_kj)²)) - (1/γ_j)(v_kj - v_oj)²] represents the dispersion of feature j within cluster k. Accordingly, features that vary strongly within “true” clusters are weighted down.

The minimization of Equation (5) is subject to ∑_{j=1}^{M} w_kj = 1 for k = 1, 2, …, C, with a crisp partition U and centers V given. We apply the first-order optimality condition to the Lagrange function L: $L = \sum_{k = 1}^{C} \sum_{j = 1}^{M} w_{kj}^{2} \times D_{kj} - \sum_{k = 1}^{C} λ_{k} \times (\sum_{j = 1}^{M} w_{kj} - 1)$ (14) where λ_k is the Lagrange multiplier of the kth constraint. Setting the partial derivatives of Equation (14) with respect to w_kj and λ_k to zero, we obtain $\frac{\partial L}{\partial w_{kj}} = 2 \times w_{kj} \times D_{kj} - λ_{k} = 0, \quad \frac{\partial L}{\partial λ_{k}} = - (\sum_{j = 1}^{M} w_{kj} - 1) = 0$

It follows that w_kj = λ_k / (2D_kj). Summing these expressions over all M features gives ∑_{j=1}^{M} λ_k / (2D_kj) = 1 and thus λ_k = 2 / ∑_{q=1}^{M} (1/D_kq). Substituting λ_k back gives w_kj = (1/D_kj) / ∑_{q=1}^{M} (1/D_kq), which is Equation (13).
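The weight update can be sketched as follows: from w_kj = λ/(2D_kj) and the sum-to-one constraint, the weights of a cluster are proportional to 1/D_kj. The function name and the dispersion values are assumptions for this example, and the sketch assumes that every D_kj is strictly positive, which the formula needs in order to produce weights in [0, 1].

```python
def update_weights(D):
    """Row-wise weights proportional to 1 / D_kj, normalized to sum to one."""
    W = []
    for d_k in D:
        inverse = [1.0 / d for d in d_k]  # assumes every dispersion is > 0
        total = sum(inverse)
        W.append([value / total for value in inverse])
    return W


# Toy dispersions (illustrative only): two clusters, two features.
D = [[1.0, 3.0], [2.0, 2.0]]
W = update_weights(D)
print(W)
```

In the first cluster the second feature is three times as dispersed as the first, so it receives one third of the first feature's weight.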

3.2 Algorithm summary

The ASSC algorithm is summarized in Table 2. Before running the algorithm, the dataset is pre-processed so that every feature is standardized by subtracting its average from the data entries and dividing the result by half the feature’s range, i.e., the difference between the maximum and the minimum divided by 2 [20].
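The pre-processing step described above can be sketched as follows. The function name and the toy data are assumptions for this example, and the sketch assumes that no feature is constant, since a zero range would lead to a division by zero.

```python
def standardize(X):
    """Subtract each feature's mean and divide by half of its range."""
    n = len(X)
    columns = list(zip(*X))
    means = [sum(col) / n for col in columns]
    half_ranges = [(max(col) - min(col)) / 2.0 for col in columns]
    return [[(x - mu) / hr for x, mu, hr in zip(row, means, half_ranges)]
            for row in X]


# Toy data (illustrative only): three objects, one feature.
Z = standardize([[0.0], [2.0], [4.0]])
print(Z)
```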

Table 2
The pseudo-code of the ASSC algorithm

Solving optimization problem Equation (5) by ASSC algorithm
Input: X (the dataset), t (the iteration number), ɛ (the threshold for termination)
1: Pre-process the dataset X;
2: Randomly initialize V₀ and initialize W₀ with w_kj = 1/M; set ɛ = 10⁻⁵ and MaxGen = 500;
3: for t = 1 : MaxGen
4: Update U_t according to Equation (6) with V₀ and W₀;
5: Update V_t according to Equation (7) with U_t and W₀;
6: Update W_t according to Equation (13) with U_t and V_t;
7: Calculate the objective function J (U_t, V_t, W_t) with Equation (5);
8: if |J (U_t, V_t, W_t) - J (U_t, V₀, W₀)| < ɛ, break;
9: Set V₀ = V_t, W₀ = W_t;
10: end for
Output: U_t, V_t, W_t and J (U_t, V_t, W_t).
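The control flow of steps 3 to 10 in Table 2 can be sketched as a generic alternating loop that receives the three update rules and the objective as functions. Everything named here (`alternate` and the four stand-in functions) is hypothetical; in particular, the stand-ins below are plain 1-D k-means updates used only to make the loop runnable, not the ASSC updates of Equations (6), (7) and (13).

```python
def alternate(X, V0, W0, update_u, update_v, update_w, objective,
              max_gen=500, eps=1e-5):
    """Alternating minimization following steps 3-10 of Table 2."""
    V_prev, W_prev = V0, W0
    for t in range(1, max_gen + 1):
        U = update_u(X, V_prev, W_prev)                      # step 4
        V = update_v(X, U, W_prev)                           # step 5
        W = update_w(X, U, V)                                # step 6
        J = objective(X, U, V, W)                            # step 7
        if abs(J - objective(X, U, V_prev, W_prev)) < eps:   # step 8
            break
        V_prev, W_prev = V, W                                # step 9
    return U, V, W, J, t


# Stand-in updates (1-D k-means, for illustration only).
def nearest(X, V, W):
    return [min(range(len(V)), key=lambda k: (x - V[k]) ** 2) for x in X]


def means(X, U, W):
    centers = []
    for k in range(len(W)):
        members = [x for x, u in zip(X, U) if u == k]  # assumed non-empty
        centers.append(sum(members) / len(members))
    return centers


def keep_weights(X, U, V):
    return [1.0] * len(V)


def sse(X, U, V, W):
    return sum((x - V[u]) ** 2 for x, u in zip(X, U))


U, V, W, J, t = alternate([0.0, 1.0, 10.0, 11.0], [0.0, 10.0], [1.0, 1.0],
                          nearest, means, keep_weights, sse)
print(U, V, J, t)
```

Passing the update rules as arguments keeps the loop independent of the particular objective, so the same driver could be reused with functions that implement the ASSC updates.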

Observing Equation (7), we see that x_ij, the j-th feature of data point x_i, is assigned the weight γ_j exp(-γ_j (x_ij - v_kj)²). As γ_j → 0, γ_j exp(-γ_j (x_ij - v_kj)²) → 0 while 1/γ_j → ∞, so the numerator and the denominator of Equation (7) are infinities of the same order, and their ratio has a finite limit.

3.3 Convergence and complexity analysis of ASSC

The objective function in Equation (5) is minimized iteratively according to the three update rules. Suppose that partial minimization is achieved in the t-th iteration; then the following relationship holds: $\begin{matrix} J (U_{t + 1}, V_{t + 1}, W_{t + 1}) & ⩽ & J (U_{t + 1}, V_{t + 1}, W_{t}) \\ & ⩽ & J (U_{t + 1}, V_{t}, W_{t}) \\ & ⩽ & J (U_{t}, V_{t}, W_{t}) \end{matrix}$

This implies that J(U, V, W) is non-increasing with respect to the iteration number t. Therefore, the proposed ASSC algorithm converges to either a local optimal solution or a saddle point of the objective function. In other words, ASSC converges toward a local minimum in a finite number of iterations, which makes it suitable for data mining applications.

The computational complexity of the ASSC algorithm is O (tCNM), where t is the total number of iterations required for performing steps 4-6 in Table 2; each iteration therefore costs O (CNM). The space required by the algorithm to store the cluster centers V, the feature weight matrix W and the partition matrix U is O (CM), O (CM) and O (CN), respectively.

4 Experiments and analysis

The proposed ASSC algorithm is evaluated through a large number of experiments on real datasets of different complexities. The clustering results are compared with those obtained by the ESSC [2], EWKM [9] (entropy weighting k-means), FWKM [25, 31] (feature weighting k-means) and CIM_QPSO_SSC [18] algorithms. The five algorithms use different parameters, and their settings are tabulated in Table 4. All the experiments were run on a computer with an Intel(R) Core(TM) i3-4170 CPU, Windows 7 and Matlab 13.0.

4.1 Datasets selection

The proposed method was evaluated with experiments conducted using datasets obtained from the UCI repository [21]. The datasets are described with a data matrix of “objects × features ”. The details are shown in Table 3.

Table 3
Details of the UCI datasets

Dataset	N (Number of samples for each category)	M	C
Wine	178 (59,71,48)	13	3
Wall	5456 (2205,826,2097,328)	24	4
Sonar	208 (97,111)	60	2
Glass	214 (70,76,17,13,9,29)	9	6
Ionosphere	351 (225,126)	34	2

Before the experiments, all the datasets were normalized: the Wine, Wall and Sonar datasets were normalized by dimension, and the Glass and Ionosphere datasets were normalized by sample point. In dimension normalization, each value x_ij is divided by the square root of the sum of squares of all values in dimension j, i.e., $x_{ij} \leftarrow x_{ij} / \sqrt{\sum_{i = 1}^{N} x_{ij}^{2}}$; in sample point normalization, each value x_ij is divided by the square root of the sum of squares of sample i over all dimensions, i.e., $x_{ij} \leftarrow x_{ij} / \sqrt{\sum_{j = 1}^{M} x_{ij}^{2}}$.
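The two normalizations can be sketched as follows; the function names and the toy data are assumptions for this example, and both functions assume that no column or row consists only of zeros.

```python
import math


def normalize_by_dimension(X):
    """Divide each value by the Euclidean norm of its column (feature)."""
    norms = [math.sqrt(sum(x * x for x in col)) for col in zip(*X)]
    return [[x / norm for x, norm in zip(row, norms)] for row in X]


def normalize_by_sample(X):
    """Divide each value by the Euclidean norm of its row (sample point)."""
    result = []
    for row in X:
        norm = math.sqrt(sum(x * x for x in row))
        result.append([x / norm for x in row])
    return result


# Toy data (illustrative only): two objects, two features.
X = [[3.0, 0.0], [4.0, 5.0]]
by_dimension = normalize_by_dimension(X)
by_sample = normalize_by_sample(X)
print(by_dimension)
print(by_sample)
```

After dimension normalization every column has unit Euclidean norm, whereas after sample point normalization every row does.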

4.2 Performance metrics and settings

Two metrics, the rand index (RI) [22] and the normalized mutual information (NMI) [23], are used for evaluating the performance of the proposed ASSC algorithm. RI is defined as $RI = \frac{d_{00} + d_{11}}{N \times (N - 1) / 2}$ (15) where d₀₀ is the number of pairs of samples that have different labels and belong to different clusters, d₁₁ is the number of pairs of samples that have the same labels and belong to the same clusters and N is the total number of samples. The value of RI is between 0 and 1, and the closer to the value 1, the better the performance of the algorithm. NMI is defined and computed according to the formula below

$NMI = \frac{\sum_{k = 1}^{C} \sum_{i = 1}^{C} d_{ik} \times \log [(N \times d_{ik}) / (d_{i} \times d_{k})]}{\sqrt{[\sum_{i = 1}^{C} d_{i} \times \log (d_{i} / N)] \times [\sum_{k = 1}^{C} d_{k} \times \log (d_{k} / N)]}}$ (16)

where C is the number of clusters, N is the number of samples, d_ik is the number of data objects occurring in both the ith class and the kth cluster, d_i is the number of samples that have the ith label, and d_k is the number of samples that belong to the kth cluster. NMI is equal to 1 when the clustering results perfectly match the external category labels, and it is close to 0 for a random partition.
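Both metrics can be sketched directly from pair counts and contingency counts, as shown below. The function names and the toy label vectors are assumptions for this example. The NMI sketch normalizes by the square root of the product of the two entropy terms, and it assumes that both the true labels and the predicted clusters contain at least two groups, since otherwise the normalizer is zero.

```python
import math
from collections import Counter
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """(d00 + d11) divided by the number of sample pairs."""
    agreements = 0
    pairs = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agreements += (same_true == same_pred)
        pairs += 1
    return agreements / pairs


def nmi(labels_true, labels_pred):
    """Normalized mutual information with a geometric-mean normalizer."""
    n = len(labels_true)
    d_i = Counter(labels_true)
    d_k = Counter(labels_pred)
    d_ik = Counter(zip(labels_true, labels_pred))
    mutual = sum(count * math.log(n * count / (d_i[a] * d_k[b]))
                 for (a, b), count in d_ik.items())
    h_true = sum(count * math.log(count / n) for count in d_i.values())
    h_pred = sum(count * math.log(count / n) for count in d_k.values())
    return mutual / math.sqrt(h_true * h_pred)


truth = [0, 0, 1, 1]
print(rand_index(truth, [0, 0, 1, 2]))
print(nmi(truth, [1, 1, 0, 0]))
```

Both functions compare groupings rather than label values, so relabeling the clusters leaves the scores unchanged.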

4.3 Parameter setting

In the experiments, the maximum iteration number of the five clustering algorithms is set to 500 and the threshold value of iterative stop is set to 10^-5. The parameters of these algorithms and their settings are tabulated in Table 4.

Table 4
Algorithms and the setting of the parameters in the experiments

Algorithms	Parameter setting
ASSC	none
ESSC	m = min(N, M - 1)/[min(N, M - 1) -2]
	γ =1, 2, 10
	η = 0.01
CIM_QPSO_SSC	m = 1.5, 2.0
	η=0.01, 0.1, 1
	β₀ = 1.0
	γ =0.01, 0.1, 0.5, 1.0, 2.0
	σ =2.5, 10
	T = 200
	M = 20
FWKM	m = 1.0
	α =1.5, 2.0
	σ=0.01, 0.1
EWKM	m = 1.0
	α=1.0
	σ=1, 2, 2.5

4.4 Experimental results and analysis

In the experiments, each algorithm was run 10 times on each UCI dataset with different initial partitions under a fixed parameter setting. The best clustering results, expressed as the means and standard deviations of the RI and NMI values, are tabulated in Tables 5 and 6, respectively. The results of the four comparison algorithms are taken from Ref. [24].

Some interesting results can be observed in Table 5 for the RI metric. ASSC produced higher mean values than ESSC on 3 of the 5 datasets considered. The CIM_QPSO_SSC algorithm did even better, providing higher mean values than ESSC on all 5 datasets. The performance of ASSC is slightly lower than that of CIM_QPSO_SSC on the Wall and Sonar datasets, with differences of 0.0125 and 0.0257, respectively. This indicates that the ASSC algorithm has strong robustness. In terms of standard deviation, ASSC attains the smallest value on four of the five datasets; on Wine its value is 0.00001, while EWKM and ESSC report 0. We believe that the introduction of the between-cluster distance metric is the key factor in ensuring the stability of the clustering results.

Table 5
Best results obtained on UCI dataset with RI as metric

Dataset	ASSC	CIM_QPSO_SSC	FWKM	EWKM	ESSC
Glass
Mean	0.7830	0.7139	0.6736	0.6988	0.7009
Std	0.0005	0.0129	0.1308	0.0201	0.0234
Wine
Mean	0.9336	0.9103	0.9015	0.9036	0.9035
Std	0.00001	0.0001	0.0322	0	0
Wall
Mean	0.5935	0.6060	0.5851	0.5802	0.5858
Std	0	0.0079	0.0692	0.1312	0.0421
Ionosphere
Mean	0.5302	0.5880	0.5913	0.5731	0.5750
Std	0.000002	0.0096	0.0264	0.0263	0.0985
Sonar
Mean	0.5031	0.5288	0.5088	0.5116	0.5222
Std	0.00001	0.0058	0.0581	0.0543	0.0359

Table 6

Best results obtained on UCI dataset with NMI as metric

Dataset	ASSC	CIM_QPSO_SSC	FWKM	EWKM	ESSC
Glass
Mean	0.6053	0.3483	0.3728	0.3416	0.3700
Std	0.0006	0.0212	0.0181	0.0302	0.0160
Wine
Mean	0.8221	0.7712	0.7607	0.7615	0.7642
Std	0.00004	0.0039	0.0220	0.0165	0
Wall
Mean	0.1278	0.0979	0.1104	0.1050	0.1141
Std	0.000001	0.0014	0.0105	0.0106	0.0309
Ionosphere
Mean	0.0266	0.1305	0.1312	0.0949	0.1010
Std	0.000005	0.0156	0	0.1335	0.1156
Sonar
Mean	0.0660	0.0515	0.0300	0.0300	0.0494
Std	0.0004	0.0089	0.0288	0.0258	0.0223

For the NMI metric, Table 6 shows that ASSC produced higher mean values than the other four algorithms on 4 of the 5 datasets. In terms of standard deviation, ASSC attains the smallest value on the Glass, Wall and Sonar datasets and is within 0.00004 of the smallest value on the Wine and Ionosphere datasets.

A comprehensive comparison of the two evaluation metrics in Tables 5 and 6 shows that, overall, the ASSC algorithm is better than the other four algorithms in the accuracy and the stability of the clustering results. However, ASSC does not obtain the best RI and NMI values on all the datasets, which indicates that no single algorithm is always superior to the others on all datasets. Comparing Tables 5 and 6 further shows that the best clustering performance as indicated by RI is not always consistent with that indicated by NMI, i.e., an algorithm with a high RI value may not have a high NMI value. Therefore, it is necessary to evaluate the performance of a clustering algorithm with different metrics. The averages of the best clustering results obtained by the five algorithms are plotted in Fig. 1.

Fig. 1

The averages of clustering results obtained for the UCI datasets.

Figure 1 shows the average results of the five algorithms over the five UCI datasets. The histogram shows that the CIM_QPSO_SSC algorithm has the best average RI, but the average RI values of CIM_QPSO_SSC and ASSC are 0.6694 and 0.66868, respectively, so the difference between them is very small. For the NMI metric, the ASSC algorithm has the best average performance among the five algorithms. This again demonstrates that the proposed algorithm has good clustering accuracy and stability, as well as adaptive learning capabilities.
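The averages quoted for Fig. 1 can be reproduced from the mean RI values in Table 5 with a few lines of arithmetic. The dictionary layout and variable names are assumptions for this example; the numbers are copied from Table 5 in the order Glass, Wine, Wall, Ionosphere, Sonar.

```python
# Mean RI values from Table 5 (Glass, Wine, Wall, Ionosphere, Sonar).
ri_means = {
    "ASSC":         [0.7830, 0.9336, 0.5935, 0.5302, 0.5031],
    "CIM_QPSO_SSC": [0.7139, 0.9103, 0.6060, 0.5880, 0.5288],
    "FWKM":         [0.6736, 0.9015, 0.5851, 0.5913, 0.5088],
    "EWKM":         [0.6988, 0.9036, 0.5802, 0.5731, 0.5116],
    "ESSC":         [0.7009, 0.9035, 0.5858, 0.5750, 0.5222],
}

# Average over the five datasets for each algorithm.
averages = {name: sum(values) / len(values)
            for name, values in ri_means.items()}
best = max(averages, key=averages.get)

for name, value in averages.items():
    print(f"{name}: {value:.5f}")
print("highest average RI:", best)
```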

The better clustering results of the ASSC algorithm on most datasets can be attributed to the following points: (i) a novel objective function integrating the within-cluster compactness and the between-cluster separation is proposed based on the SSC objective function; (ii) the variance of the samples is chosen as the factor controlling the influence of the between-cluster distance on the clustering result, which helps to avoid local optima as far as possible; and (iii) the comparative experiments demonstrate that the accuracy and the stability of the ASSC algorithm exceed those of the four existing clustering algorithms. Together, these three points support the superiority of the proposed algorithm. In the next section, the clustering results of ASSC on the five datasets are described in detail.

4.5 ASSC algorithm performance metrics

Figure 2(a)–(e) shows the convergence curves of the ASSC algorithm on the five datasets. The horizontal axis represents the number of iterations and the vertical axis represents the value of the objective function; the curves shown are the best results among 10 trials. The objective function of ASSC drops significantly within at most a few dozen iterations, which shows that ASSC converges well: the curve flattens after about 10 iterations on the Glass dataset, 5 on the Wine dataset, 26 on the Wall dataset, 5 on the Ionosphere dataset and 7 on the Sonar dataset.
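The iteration counts quoted above are read off the curves in Fig. 2. One way to make such a reading reproducible is to report the first iteration after which the relative change of the objective stays below a tolerance. The sketch below is an illustration only: the function name, the tolerance `tol` and the sample objective values are assumptions and do not come from the experiments.

```python
def iterations_to_converge(objective, tol=1e-4):
    """objective[i] is the objective value after iteration i + 1.

    Returns the 1-based iteration from which every later relative change
    is below tol, or None if the final step still changes by tol or more.
    """
    last_big = 0  # 0-based index of the last value reached by a large step
    for i in range(1, len(objective)):
        prev, cur = objective[i - 1], objective[i]
        rel_change = abs(cur - prev) / max(abs(prev), 1e-12)
        if rel_change >= tol:
            last_big = i
    if len(objective) > 1 and last_big == len(objective) - 1:
        return None  # still moving at the end of the record
    return last_big + 1


curve = [10.0, 6.0, 4.0, 3.9999, 3.9999]  # hypothetical objective values
print(iterations_to_converge(curve))
```

A relative rather than absolute tolerance is used so that the same threshold can be applied to datasets whose objective values differ by orders of magnitude.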

Fig. 2

Convergence curves of (a) Glass, (b) Wine, (c) Wall, (d) Ionosphere, (e) Sonar, (f) Time.

The running time of the ASSC algorithm on the UCI datasets is shown in Fig. 2(f). The horizontal axis represents the trial number and the vertical axis represents time. ASSC runs fastest on the Glass, Wine, Ionosphere and Sonar datasets, with average CPU times over 10 trials of 0.0676 s, 0.0383 s, 0.1672 s and 0.0669 s, respectively. The Wall dataset is large-scale, so its running time is longer than that of the other four datasets, with an average over 10 runs of 402.2320 s. These results again reflect the adaptability and fast convergence of the proposed algorithm, which follow from the fact that it is based on the SSC framework and integrates the within-cluster and between-cluster information required in the clustering process. It is also useful to examine how the data are distributed under the learned feature weights. The distribution of the best clustering results obtained from the ASSC algorithm is plotted in Fig. 3, where X_j (j = 1, 2, …, M) represents the values of all entities in the j-th dimension.

Fig. 3

The left is the original data distribution; the right is the transformed data distribution according to feature weight.

The data distribution according to the feature weights is shown in Fig. 3. For each dataset, the left panel is the original data distribution and the right panel is the data distribution after applying the ASSC algorithm; the corresponding feature weight values are given in Table 7. The raw data of all five datasets overlap seriously, but the ASSC algorithm distinguishes these overlapping data well. For example, the classes of the Glass, Wine and Ionosphere datasets are completely separated, and the between-cluster distance reaches its maximum. The four classes in the Wall dataset and the two classes in the Sonar dataset show only a small amount of overlap. This indicates that introducing the between-cluster distance, together with the adaptive strength of the ASSC algorithm, improves both the clustering accuracy and the stability of the clustering results.

Table 7

The transformed data distribution according to feature weight

Dataset	Feature weight
Glass	0.0477, 0.1147;
	0.2220, 0.0407;
	0.0021, 0.3791;
	0.0060, 0.0751;
	0.0221, 0.2062;
	0.0266, 0.1341.
Wine	0.0521, 0.0573;
	0.0342, 0.0247;
	0.1027, 0.0454.
Wall	0.0687, 0.0528;
	0.0552, 0.0115;
	0.0155, 0.0718;
	0.0111, 0.0154.
Ionosphere	0.0267, 0.0206;
	0.0268, 0.0201.
Sonar	0.0068, 0.0114;
	0.0236, 0.0196.
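
How the transformed panels of Fig. 3 are produced from these weights is not spelled out in this section. One plausible reading, offered only as an assumption, is that each sample is rescaled feature by feature using the weights of the cluster it is assigned to. The sketch below illustrates this with the three Wine rows of Table 7; the sample points, the labels, the function name `transform_by_weights` and the mapping of table rows to cluster indices are all hypothetical.

```python
def transform_by_weights(points, labels, weights):
    """Scale every point by the feature weights of its assigned cluster.

    points  : list of feature tuples, one per sample
    labels  : cluster index of each sample
    weights : weights[k][j] is the weight of feature j in cluster k
    """
    return [
        tuple(w * x for w, x in zip(weights[k], point))
        for point, k in zip(points, labels)
    ]


# Wine rows of Table 7; treating row order as cluster order is an assumption.
wine_weights = [(0.0521, 0.0573), (0.0342, 0.0247), (0.1027, 0.0454)]

# Hypothetical two-feature samples, one per cluster.
points = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
labels = [0, 1, 2]

for original, scaled in zip(points, transform_by_weights(points, labels, wine_weights)):
    print(original, "->", scaled)
```

Under this reading, a feature with a larger weight in a cluster is stretched more for the samples of that cluster, which would spread the clusters apart along their most relevant features.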

5 Conclusions and future work

In this study, the ASSC algorithm is proposed and developed by considering both within-cluster and between-cluster information. The work involves the following aspects: (i) a new objective function is constructed by modifying the original SSC algorithm to combine a non-Euclidean distance with between-cluster separation; (ii) the adaptive soft subspace clustering (ASSC) algorithm is developed and its properties are investigated; and (iii) comprehensive experiments are carried out to evaluate the performance of the ASSC algorithm. The findings of this study demonstrate that the proposed ASSC algorithm is in general more effective in subspace clustering than the ESSC, EWKM, FWKM and CIM_QPSO_SSC algorithms.

This study will be further extended to improve the performance of other SSC algorithms, for example, entropy weighting subspace clustering algorithms. The test results on high-dimensional datasets in this paper do not fully reflect the superiority of the ASSC algorithm, so in future work we will use larger and more practical datasets to validate the effectiveness of the algorithm.

Acknowledgments

We would like to thank the anonymous reviewers whose thoughtful comments improved the quality of this paper. This work was jointly supported by the Fundamental Research Funds for the National Natural Science Foundation of China for General Program (Grant No. 51675414), National Key Research and Development Program of China (Grant No. 2017YFD0700200), China Postdoctoral Science Foundation (Grant No. 2018M643627) and National Natural Science Foundation of PR China (Approval No. 51575532).

References

1. De Amorim R.C. and Hennig, Recovering the number of clusters in data sets with noise features using feature rescaling factors, Information Sciences 324 (2015), 126–145.

2. Deng, Choi K.S., Chung F.L. and Wang, Enhanced soft subspace clustering integrating within-cluster and between-cluster information, Pattern Recognition 43 (2010), 767–781.

3. Zhu, Cao, Yang and Lei, Evolving soft subspace clustering, Applied Soft Computing 14 (2014), 210–228.

4. Tryon R.C., Cluster analysis: correlation profile and orthogonal (factor) analysis for the isolation of unities in mind and personality [M], Edwards brother, Incorporated, lithoprinters and publishers, 1939, 443–452.

5. , Kumar, Quinlan J.R., Ghosh, Yang, Motoda, Mclachlan G.J., Ng, Liu and Yu P.S., Top 10 algorithms in data mining, Knowledge & Information Systems 14 (2008), 1–37.

6. Jain A.K., 50 years beyond K-means, Pattern Recognition 31(8) (2010), 651–666.

7. Mirkin, Clustering: A Data Recovery Approach, Chapman & Hall Crc Computer Science & Data Analysis 72 (2012), 109–110.

8. Gan and Wu, A convergence theorem for the fuzzy subspace clustering (FSC) algorithm, Pattern Recognition 41 (2008), 1939–1947.

9. Jing, Ng M.K. and Huang J.Z., An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data, IEEE Transactions on Knowledge & Data Engineering 19 (2007), 1026–1041.

10. Wang, Wang, Chung and Deng, Fuzzy partition based soft subspace clustering and its applications in high dimensional data, Information Sciences 246 (2013), 133–154.

11. Wang, Hao, Cai and Wen, Enhanced soft subspace clustering through hybrid dissimilarity, Journal of Intelligent & Fuzzy Systems 29 (2015), 1395–1405.

12. Zhong, Huang and Liu C.L., Low Rank Metric Learning with Manifold Regularization, IEEE International Conference on Data Mining 2011, pp. 1266–1271.

13. Okada and Nishida, Online incremental clustering with distance metric learning for high dimensional data, International Joint Conference on Neural Networks 2011, pp. 2047–2054.

14. Wang, Deng, Choi K.S., Jiang, Luo, Chung F.L. and Wang, Distance metric learning for soft subspace clustering in composite kernel space, Pattern Recognition 52 (2015), 113–134.

15. Xia, Zhuang and Yu, Novel soft subspace clustering with multi-objective evolutionary approach for high-dimensional data, Pattern Recognition 46 (2013), 2562–2575.

16. Qiu, Longjuan D.I. and University L.T., Soft subspace clustering algorithm based on self-adaption of intercluster distance, Computer Engineering & Applications 52(21) (2016), 88–93.

17. and Wu, Soft subspace clustering algorithm based on quantum-behaved particle swarm optimization, Pattern Recognition and Artificial Intelligence 29(6) (2016), 558–566.

18. , Research on Subspace Clustering Algorithms and its Applications [D], Wuxi: School of IoT Engineering, Jiangnan University, 2016, 28–38.

19. Zhi, Fan and Zhao, Robust Local Feature Weighting Hard C-Means Clustering Algorithm, Neurocomputing 134 (2014), 20–29.

20. Amorim R.C.D. and Mirkin, Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering, Pattern Recognition 45 (2012), 1061–1075.

21. Blake, UCI repository of machine learning databases, 1998.

22. Sun, Feng and Xu, Particle swarm optimization with particles having quantum behavior, Evolutionary Computation, 2004. CEC2004. Congress on 2004, pp. 325–331 Vol. 321.

23. Rand, Objective Criteria for the Evaluation of Clustering Methods, Publications of the American Statistical Association 66 (1971), 846–850.

24. Yajun, Research on subspace clustering algorithms and its applications [D], Jiangnan university 2016, pp. 24–34.

25. Jing, Ng, Xu, et al., Subspace clustering of text documents with feature weighting k-means algorithm [M]. Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2005, 802–812.

26. Jia and Cheung Y.-M., Subspace Clustering of Categorical and Numerical Data With an Unknown Number of Clusters [J], IEEE Transactions on Neural Networks and Learning Systems 29 (2018), 3308–3325.

27. Zhou and Zhu, Kernel-based multiobjective clustering algorithm with automatic attribute weighting [J], Soft Computing 22 (2018), 3685–3709.

28. Xifeng, Xinwang, et al., Adaptive Self-paced Deep Clustering with Data Augmentation [J], IEEE Transactions on Knowledge and Data Engineering (2019), 1–14.

29. Sarwar, et al., A novel method for content-based image retrieval to improve the effectiveness of the bag-of-words model using a support vector machine [J], Journal of Information Science 45(1) (2019), 117–135.

30. Mehmood, et al., Scene search based on the adapted triangular regions and soft clustering to improve the effectiveness of the visual-bag-of-words model [J], EURASIP Journal on Image and Video Processing 2018(1) (2018), 48.

31. Mehmood, et al., Effect of complementary visual words versus complementary features on clustering for effective content-based image search [J], Journal of Intelligent & Fuzzy Systems Preprint (2018), 1–14.

32. Ayub, et al., Modeling user rating preference behavior to improve the performance of the collaborative filtering based recommender systems [J], PloS one 14(8) (2019), e0220129.

33. Qazi, et al., A hybrid technique for speech segregation and classification using a sophisticated deep neural network [J], PloS one 13(3) (2018), e0194151.

34. Sharif, et al., Scene analysis and search using local features and support vector machine for effective content-based image retrieval [J], Artificial Intelligence Review 52(2) (2019), 901–925.

