Adaptive kernel fuzzy C-Means clustering algorithm based on cluster structure

Abstract

The well-known Fuzzy C-Means (FCM) algorithm and its modified derivatives have been widely applied in various fields. However, previous studies have focused on the proportion of correctly clustered data, and few have examined whether the extracted influential areas of clusters align with the natural cluster structure. Because they rest on different clustering principles, clustering algorithms differ in how well they detect cluster structure. For example, Mahalanobis distance-based FCM algorithms effectively detect the influential direction of each cluster, while kernel-based FCM algorithms provide an interface for adjusting the influential range. Combining the advantages of these algorithms, this paper proposes the Adaptive Kernel Fuzzy C-Means (AKFCM) algorithm based on cluster structure. The AKFCM algorithm detects the influential direction and adjusts the influential range of each cluster through adaptive kernelization. Experiments applying the previous algorithms and AKFCM to both synthetic and real-world datasets show that the proposed algorithm performs better not only in clustering accuracy but also in extracting reasonable influential areas. The proposed algorithm could be helpful for clustering datasets composed of clusters that differ in structural direction and range.

Keywords

Fuzzy C-Means; Mahalanobis distance; kernel fuzzy C-Means; influential area; adaptive kernel

1 Introduction

Clustering, an unsupervised discrimination process applied to data features with little or no prior knowledge [1], has made substantial contributions in domains such as engineering, medicine, image processing and pattern recognition [2–6]. As one of the basic types of clustering, centroid-based clustering plays an important role in the development of clustering algorithms [7]. The Fuzzy C-Means (FCM) clustering algorithm [8, 9], in which each data point is shared among multiple clusters, is one of the most widely accepted and applied clustering algorithms. Many studies have examined the FCM family from different perspectives, including the influence of the fuzzy factor, cluster number selection, sensitivity to initialization, the introduction of intuitionistic fuzzy sets, the difficulty of handling unbalanced or large-scale datasets, and the restriction to detecting identical hyperspherical shapes [1, 10–16].

Using the FCM algorithm as a base algorithm, several modified algorithms have been introduced that consider the relationships among data from different angles. These algorithms follow two general research modes: one incorporates information from the neighboring region around each data point (region-based), and the other integrates the structural information of clusters (structure-based). Region-based studies, which consider the relationship between a data point and its neighboring region, have mainly been applied in image segmentation [17]. Most of these studies only need to handle single-feature input data, i.e. the intensity level [18]. In an image, a data point (the intensity level of a pixel) is closely associated with its neighboring region: neighboring pixels usually have similar intensity levels, and higher similarity implies a higher probability of belonging to the same cluster [19]. Therefore, by taking local spatial information into account, problems such as local intensity inhomogeneity [17, 21] and noise corruption [18, 21–23] can be mitigated, because image pixels are no longer treated as isolated points. However, apart from images, very few datasets carry such clear contextual information. For this reason, structure-based studies integrate the structural information of clusters into the clustering process, on the principle that data aligning better with the structure of a cluster are more likely to belong to it. Based on this principle, some studies have adopted kernelization to capture relationships between data and clusters that go beyond plain distance [24–32], while others have achieved good clustering performance by employing the Mahalanobis distance to bring the structural information of clusters into the calculation [33–39].
The details of typical structure-based algorithms are introduced in Section 3. More recently, methods combining structure-based and region-based techniques have also been developed [40, 41]. However, even though the structural information of clusters has been integrated into the clustering process to different extents, few studies have specifically analyzed a measure of cluster structure, namely the influential areas of clusters, which are introduced and discussed in this paper. This gap may prevent the natural cluster structure from being exploited effectively and thereby hinder the development of improved clustering algorithms.

The influential areas of a cluster are mappings of the data aggregation structure within the cluster. In this paper, the directional and range information of the influential areas are called the influential direction and the influential range, respectively. The influential areas extracted by the family of FCM algorithms can be described by the fuzzy membership degrees of the data points with respect to the clusters. In Fig. 1, the influential areas of the illustrated cluster are indicated by different colors, and their main envelope lines are also drawn. Data points A and B are equidistant from the cluster centre, so if only the plain distance is considered in clustering, the cluster attracts both points equally. However, point A lies more in line with the natural cluster structure than point B, so it should receive a larger membership degree for the clustering result to be reasonable. Constructing influential areas that are more consistent with cluster structure can therefore improve both the clustering accuracy and the validity of the membership degrees. Different clustering methods generate diverse influential areas in accordance with their clustering principles. For example, kernelization is better suited to adjusting the influential range of a cluster, while the Mahalanobis distance is better at detecting its influential direction. However, clustering performance remains limited, because few existing algorithms simultaneously consider and effectively tailor both the influential directions and the influential ranges of clusters so as to obtain influential areas that align well with the natural structure.

Fig. 1. The influential areas of a cluster.

Therefore, we developed a clustering algorithm that performs reliably not only in clustering accuracy but also in extracting reasonable influential areas of clusters. Our work makes two primary contributions:

The Adaptive Kernel Fuzzy C-Means (AKFCM) clustering algorithm based on cluster structure is proposed in this paper through an analysis of existing algorithms in the FCM family.

A novel clustering validity index that considers the influential areas of clusters is introduced, in addition to the conventional Rand index, to assess the effectiveness of clustering algorithms.

The paper is organized as follows: Section 2 reviews related work; Section 3 introduces a set of related algorithms, including basic FCM, FCM based on Mahalanobis distance (FCM-M) and its modified version, and Kernel Fuzzy C-Means (KFCM) and its modified version; Section 4 proposes and details the Adaptive Kernel Fuzzy C-Means (AKFCM) clustering algorithm and the novel clustering validity index; Section 5 compares the clustering performance of the previous algorithms and the proposed algorithm in terms of both clustering accuracy and influential areas; Section 6 concludes the paper.

2 Related work

Many adaptive algorithms in the FCM family have been proposed and analyzed in previous studies.

Nanthini and Devi [4] applied an adaptive skipping method to reduce the training time of FCM by paying different degrees of attention to correctly and incorrectly classified data samples. Using a skipping factor (SF), each data sample is labeled (SF = 0 or SF = 1) in every iteration of the algorithm, so that correctly classified samples are skipped and unnecessary calculation is avoided. The number of data samples included in the computation thus changes adaptively according to the skipping factor values.

For images, Feng et al. [17] and Guo et al. [23] proposed adaptive versions of FCM that adaptively select the neighboring region of each pixel based on mutual information and noise detection. These region-based methods suit image segmentation because image data provide clear contextual information, and their adaptiveness lies mainly in accounting for the heterogeneity of neighborhood information. For homogeneous data, therefore, these adaptive algorithms lose their advantage.

Searching for and capturing hidden characteristics is a good way to cluster homogeneous data reasonably. If the clusters have similar structures, all data points can be transformed according to an identical rule. M. Meng and Y. Zhang [24] used kernelization for source code mining. A. M. Yang et al. [25] developed a KFCM-based fuzzy classifier by selecting an appropriate kernel function and its parameters. X. Yang et al. [27] integrated KFCM into the support vector machine algorithm and found the resulting algorithm robust for classification problems. Besides kernelization, other kinds of transformation have also been tested. Y. Feng et al. [17] used a Hausdorff distance-based FCM to segment MR images effectively. O. Kesemen et al. [12] proposed a fuzzy C-Means clustering algorithm for directional data based on angular difference, in which data given as periodic values are converted into circular data. L. E. Aik and T. W. Choon [33] replaced the Euclidean distance of FCM with the Mahalanobis distance and increased the training accuracy. Building on FCM with the Mahalanobis distance, Yong et al. [35] also reported improved image segmentation.

In the studies mentioned above, the parameters of the kernelization, the Mahalanobis distance calculation and other transformations are usually the same for all clusters, which implies that all clusters obey an identical structural rule. However, the assumption that clusters are similarly structured does not always hold in data analysis.

To discover the unique characteristics of each cluster, an adaptive process is needed. Unlike approaches that adaptively change the size of the clustered data or the neighboring region, this approach extracts the distinct hidden characteristics of clusters by adaptively detecting and adjusting the structural information of each cluster. This structural information, namely the influential directions and influential ranges, can be incorporated into the clustering process by adopting different kernel parameters and covariance matrices for different clusters.

Ferreira and de Carvalho [29] automatically assigned weights to different variables in KFCM, which can be seen as an adaptive way of selecting the kernel parameters for different clustering dimensions. However, that study mainly focused on the dissimilarity of variables rather than of clusters. Huang et al. [30] proposed a multiple kernel fuzzy clustering algorithm that maps the original data features to new feature spaces. It demonstrates the effectiveness of using multiple kernels, but the learned kernel parameters may bear little relation to cluster structure. Similar attempts were made by Dzung and Long [31], in which different kernels were further combined into a new kernel. Chen et al. [32] also used multiple kernels and their composite kernels to fuse different information in clustering. Nevertheless, very few studies have discussed selecting different kernels to match different clusters, which would incorporate the specific range information of each cluster into the computation.

Liu et al. [34] proposed FCM based on a complete Mahalanobis distance that considers both the overall covariance matrix and the local covariance matrices. Melnykov and Melnykov [38] extended the K-means algorithm to a Mahalanobis distance-based algorithm by estimating local covariance matrices. Similarly, Zhao et al. [36] and Hamasuna et al. [37] calculated the Mahalanobis distance with the covariance matrix of each cluster, which makes it possible to consider the different influential directions of clusters. However, these algorithms can still run into difficulties in some cases, for example when a covariance matrix is not invertible or when the influential range needs to be adjusted.

Overall, the adaptive detection of influential directions and the adjustment of influential ranges still need to be integrated at the cluster level, which is the main work of this study. Our work focuses on improving FCM with more complete cluster structure information, rather than relying on additional information (the intensity of the neighboring region) or partial structure information (neglecting either the influential range or the influential direction). In the following sections, we review several representative algorithms from the related work and then introduce our proposed algorithm in detail.

3 Related algorithms

3.1 Fuzzy C-Means

The Fuzzy C-Means (FCM) clustering algorithm, introduced in 1984 [9] as a generalization of ISODATA [8], has been widely applied because of its simplicity and efficiency. The main idea of FCM is to partition data into reasonable groups by grouping samples that are close in feature space, with each data point shared by more than one cluster. FCM defines x_k (k = 1, 2, …, n) as the k-th data point in the target dataset, v_i (i = 1, 2, …, c) as the i-th cluster center, d_ik (i = 1, 2, …, c; k = 1, 2, …, n) as the Euclidean distance between the k-th data point and the i-th cluster center, and u_ik (i = 1, 2, …, c; k = 1, 2, …, n) as the fuzzy membership degree of the k-th data point with respect to the i-th cluster (in the Hard C-Means clustering algorithm, u_ik is either 0 or 1). In this paper, lowercase symbols, bold lowercase symbols and bold capitalized symbols represent scalars, vectors and matrices, respectively.

The objective function of FCM is given in Equation (1), where m is the fuzzy factor: $J = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} d_{ik}^{2} = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} \| x_{k} - v_{i} \|^{2}$ (1)

The membership degrees must also satisfy Equation (2), together with u_ik ≥ 0: $\sum_{i=1}^{c} u_{ik} = 1, \quad k = 1, 2, \ldots, n$ (2)

To minimize the objective function under the constraint of Equation (2), the gradient of its Lagrangian function with respect to the unknown variables (the matrices U and V) is set to zero. This yields Equations (3) and (4), which update the membership degrees and centers through iterative computation: $u_{ik} = \frac{\| x_{k} - v_{i} \|^{\frac{2}{1-m}}}{\sum_{j=1}^{c} \| x_{k} - v_{j} \|^{\frac{2}{1-m}}}$ (3) $v_{i} = \frac{\sum_{k=1}^{n} (u_{ik})^{m} x_{k}}{\sum_{k=1}^{n} (u_{ik})^{m}}$ (4)

The FCM algorithm proceeds as follows.

(a) Set the cluster number c, fuzzy factor m, stopping condition ε, and maximum iteration number T.

(b) Initialize the cluster centers $V^{(0)} = \{v_{i}^{(0)}; i = 1, 2, \ldots, c\}$ and the membership degrees $U^{(0)} = \{u_{ik}^{(0)}; i = 1, 2, \ldots, c; k = 1, 2, \ldots, n\}$, and let t = 1.

(c) Calculate the updated membership degrees $U^{(t)} = \{u_{ik}^{(t)}; i = 1, 2, \ldots, c; k = 1, 2, \ldots, n\}$ in iteration t according to Equation (3).

(d) Calculate the updated centers $V^{(t)} = \{v_{i}^{(t)}; i = 1, 2, \ldots, c\}$ in iteration t according to Equation (4).

(e) If $\| V^{(t)} - V^{(t-1)} \| < \varepsilon$ or t = T, stop; otherwise, let t = t + 1 and return to step (c).
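The FCM steps can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function name is ours, initial centers are drawn at random from the data (a placeholder for the initialization described in Section 4.4), and a small floor on the squared distances, which we added, avoids raising zero to a negative power when a point coincides with a center. Memberships are stored as a c × n array, so column k holds u_ik for data point x_k.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, T=300, seed=0):
    """Minimal FCM sketch: steps (a)-(e) with Eqs. (3) and (4)."""
    rng = np.random.default_rng(seed)
    # (b) assumption: initial centers are c distinct data points chosen at random
    V = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    for t in range(1, T + 1):
        # squared Euclidean distances d_ik^2, shape (c, n); floored to avoid 0 ** negative
        d2 = np.fmax(((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2), 1e-12)
        # (c) Eq. (3): u_ik proportional to d_ik^(2/(1-m)) = (d_ik^2)^(1/(1-m))
        w = d2 ** (1.0 / (1.0 - m))
        U = w / w.sum(axis=0, keepdims=True)
        # (d) Eq. (4): centers are weighted means with weights u_ik^m
        Um = U ** m
        V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # (e) stop when the centers move less than eps, or after T iterations
        moved = np.linalg.norm(V_new - V)
        V = V_new
        if moved < eps:
            break
    return U, V
```

On two well-separated Gaussian blobs, for example, the returned centers lie close to the blob means, and every column of U sums to 1, as Equation (2) requires.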

To obtain better clustering results, the optimal cluster number can be found by calculating a validity measure [42, 43]. However, because this is not the focus of this study, the cluster number of each dataset is set in advance to its actual number of clusters. The initialization of the cluster centers must also be considered; details are given in Section 4.4. For simplicity and clarity, the fuzzy factor m is set to 2.0 in the following sections, as many users prefer this value [44].

3.2 Fuzzy C-Means based on Mahalanobis distance

The traditional FCM clustering algorithm is based on the Euclidean distance metric and works well for detecting hyperspherical clusters. However, the assumptions behind the Euclidean distance, namely that the features of the target dataset are equally scaled and mutually independent, do not always hold in practice. Therefore, an FCM algorithm based on the Mahalanobis distance was proposed, and many modified forms of it have been studied [33–39].

The advantage of the Mahalanobis distance is that it takes the structure of a dataset into account, through the covariance matrix, when calculating the distance between data vectors. A distance metric expresses the difference between data points [45], and the Mahalanobis distance can adequately express relationships within differently shaped aggregation structures. From this point of view, the Fuzzy C-Means clustering algorithm based on Mahalanobis distance (FCM-M) may be more appropriate than FCM for analyzing data whose features are clearly correlated. The squared Mahalanobis distance is defined in Equation (5): $D_{M}^{2}(x_{i}, x_{j}) = \| x_{i} - x_{j} \|_{M}^{2} = (x_{i} - x_{j})^{T} \Sigma^{-1} (x_{i} - x_{j})$ (5) in which x_i and x_j are the target vectors and Σ is the covariance matrix of the given dataset.
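Equation (5) can be evaluated directly. The NumPy sketch below (the function name is ours) solves a linear system instead of forming Σ^{-1} explicitly, which is numerically preferable and mathematically equivalent when Σ is invertible. With the identity covariance it reduces to the squared Euclidean distance; with correlated features, a displacement along the correlation direction counts as much shorter than an equally long displacement across it, which is the kind of distinction needed to separate points such as A and B in Fig. 1.

```python
import numpy as np

def mahalanobis_sq(xi, xj, cov):
    """Squared Mahalanobis distance of Eq. (5): (xi - xj)^T Sigma^{-1} (xi - xj)."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    # Solve Sigma y = diff rather than inverting Sigma explicitly
    return float(diff @ np.linalg.solve(cov, diff))

# Strongly correlated 2-D features (correlation 0.9)
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
along = mahalanobis_sq([0, 0], [1, 1], cov)    # displacement along the correlation
across = mahalanobis_sq([0, 0], [1, -1], cov)  # same Euclidean length, across it
```

Here `along` equals 2/1.9 ≈ 1.05 while `across` equals 20, although both displacements have Euclidean length √2.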

The objective function of the FCM-M algorithm is given in Equation (6): $J = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} D_{ik}^{2} = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} \| x_{k} - v_{i} \|_{M}^{2} = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} (x_{k} - v_{i})^{T} \Sigma^{-1} (x_{k} - v_{i})$ (6) in which v_i (i = 1, 2, …, c) is the i-th cluster center; x_k (k = 1, 2, …, n) is the k-th data point in the target dataset; u_ik (i = 1, 2, …, c; k = 1, 2, …, n) is the fuzzy membership degree of data point k with respect to cluster i; D_ik is the Mahalanobis distance between cluster i and data point k; and Σ^{-1} is the inverse of the covariance matrix of the entire dataset.

By calculating the corresponding Lagrangian function, the updated membership degrees are obtained as shown in Equation (7), in which the Mahalanobis distances are computed according to Equation (5): $u_{ik} = \frac{\| x_{k} - v_{i} \|_{M}^{\frac{2}{1-m}}}{\sum_{j=1}^{c} \| x_{k} - v_{j} \|_{M}^{\frac{2}{1-m}}}$ (7) The center update is identical to Equation (4) of FCM. To keep the focus on the characteristics of the algorithm, the derivation is not presented here. FCM-M proceeds in the same way as FCM, except that step (c) computes the membership degrees with Equation (7) instead of Equation (3).
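A sketch of the FCM-M membership update of Equation (7), again in NumPy and with a function name of our own choosing. The single covariance matrix Σ is estimated from the entire dataset with `np.cov`, as the text specifies; the floor on the squared distances is our addition for points that coincide with a center. The center update would reuse Equation (4) unchanged.

```python
import numpy as np

def fcmm_memberships(X, V, m=2.0):
    """FCM-M membership update, Eq. (7), using the covariance of the whole dataset."""
    sigma_inv = np.linalg.inv(np.cov(X, rowvar=False))   # one Sigma^{-1} for all clusters
    diff = X[None, :, :] - V[:, None, :]                 # (c, n, d): x_k - v_i
    # Squared Mahalanobis distance for every (cluster, point) pair, Eq. (5)
    D2 = np.einsum('ind,de,ine->in', diff, sigma_inv, diff)
    w = np.fmax(D2, 1e-12) ** (1.0 / (1.0 - m))          # ||x_k - v_i||_M^(2/(1-m))
    return w / w.sum(axis=0, keepdims=True)
```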

3.3 Modified fuzzy C-Means based on Mahalanobis distance

Unlike FCM-M, the modified Fuzzy C-Means based on Mahalanobis distance (mFCM-M) considers the structure of each cluster rather than only the overall structure of the dataset. The covariance matrix of each cluster enters the computation, which helps mFCM-M detect the influential structure of clusters. The objective function is written as Equation (8): $J = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} D_{ik}^{2} = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} \| x_{k} - v_{i} \|_{M_{i}}^{2} = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} (x_{k} - v_{i})^{T} \Sigma_{i}^{-1} (x_{k} - v_{i})$ (8) in which v_i (i = 1, 2, …, c) is the i-th cluster center; x_k (k = 1, 2, …, n) is the k-th data point in the target dataset; u_ik (i = 1, 2, …, c; k = 1, 2, …, n) is the fuzzy membership degree of data point k with respect to cluster i; and $\Sigma_{i}^{-1}$ is the inverse of the covariance matrix of cluster i.

The membership degrees and centers are updated iteratively according to Equations (9) and (4), respectively. The covariance matrix of each cluster is calculated from the data points belonging to that cluster, as shown in Equation (10): $u_{ik} = \frac{\| x_{k} - v_{i} \|_{M_{i}}^{\frac{2}{1-m}}}{\sum_{j=1}^{c} \| x_{k} - v_{j} \|_{M_{j}}^{\frac{2}{1-m}}}$ (9) $\Sigma_{i} = \mathrm{cov}(\{x_{k} \mid u_{ik} > \bar{u}_{i}, k = 1, 2, \ldots, n\})$ (10) in which cov is a function that calculates the covariance matrix and $\bar{u}_{i}$ is a threshold on the membership degree used to detect the core structure of cluster i. In this paper, $\bar{u}_{i}$ is the mean membership degree of the data points belonging to the cluster, on the assumption that data with larger membership degrees and less noise interference are more useful for finding accurate structural information.
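Equations (9) and (10) can be sketched as two NumPy functions (names are ours). Two details are our own reading and should be treated as assumptions: a point "belongs" to the cluster in which its membership is largest, and ū_i is the mean membership of the points belonging to cluster i. We use `np.linalg.pinv`, which coincides with the ordinary inverse when Σ_i is invertible, so that a degenerate core does not abort the computation; Section 4.1 returns to this issue. The sketch assumes every cluster has enough core points for a covariance estimate.

```python
import numpy as np

def cluster_inv_covs(X, U):
    """Eq. (10): inverse covariance of each cluster's core points (u_ik > u_bar_i)."""
    labels = U.argmax(axis=0)                 # assumption: crisp assignment by largest membership
    inv_covs = []
    for i in range(U.shape[0]):
        u_bar = U[i, labels == i].mean()      # assumption: mean membership of cluster i's points
        core = X[U[i] > u_bar]                # core structure of cluster i
        cov_i = np.cov(core, rowvar=False)
        inv_covs.append(np.linalg.pinv(cov_i))  # equals the inverse when Sigma_i is invertible
    return inv_covs

def mfcmm_memberships(X, V, inv_covs, m=2.0):
    """Eq. (9): memberships from cluster-specific squared Mahalanobis distances."""
    D2 = np.stack([np.einsum('nd,de,ne->n', X - V[i], S, X - V[i])
                   for i, S in enumerate(inv_covs)])   # (c, n)
    w = np.fmax(D2, 1e-12) ** (1.0 / (1.0 - m))
    return w / w.sum(axis=0, keepdims=True)
```

For a cluster stretched along one axis, the estimated Σ_i^{-1} has a large entry for the narrow direction, so distances across the cluster are penalized more than distances along it.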

The mFCM-M algorithm proceeds as follows.

(a) Set the cluster number c, fuzzy factor m, stopping condition ε, and maximum iteration number T.

(b) Initialize the cluster centers $V^{(0)} = \{v_{i}^{(0)}; i = 1, 2, \ldots, c\}$ and the membership degrees $U^{(0)} = \{u_{ik}^{(0)}; i = 1, 2, \ldots, c; k = 1, 2, \ldots, n\}$, and let t = 1.

(c) Calculate the inverse of the covariance matrix of each cluster, $(\Sigma_{i}^{-1})^{(t)}$, in iteration t according to Equation (10).

(d) Calculate the updated membership degrees $U^{(t)} = \{u_{ik}^{(t)}; i = 1, 2, \ldots, c; k = 1, 2, \ldots, n\}$ in iteration t according to Equation (9).

(e) Calculate the updated centers $V^{(t)} = \{v_{i}^{(t)}; i = 1, 2, \ldots, c\}$ in iteration t according to Equation (4).

(f) If $\| V^{(t)} - V^{(t-1)} \| < \varepsilon$ or t = T, stop; otherwise, let t = t + 1 and return to step (c).

3.4 Kernel fuzzy C-Means

Owing to the growing popularity of kernel functions, the Kernel Fuzzy C-Means (KFCM) clustering algorithm and its modified versions have gradually been applied in various fields [24–32]. In this paper, KFCM uses the widely accepted Gaussian Radial Basis Function (GRBF) kernel of Equation (11), which expresses the relationship between data points mapped into a higher-dimensional space: $K(x_{i}, x_{j}) = \langle \varphi(x_{i}), \varphi(x_{j}) \rangle = \exp(-\| x_{i} - x_{j} \|^{2} / (2 k_{b}^{2}))$ (11) in which K(x_i, x_j) is the kernel value between data points x_i and x_j; φ(x_i) and φ(x_j) are the images of x_i and x_j in the high-dimensional space; 〈x, y〉 is the inner product of x and y; and k_b is the kernel width parameter.
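The GRBF kernel of Equation (11) and the kernel-induced distance behind Equation (12) can be checked numerically. Because K(x, x) = 1 for this kernel, ‖φ(x) − φ(v)‖² = K(x, x) + K(v, v) − 2K(x, v) = 2(1 − K(x, v)), which is where the factor 2[1 − K] in the KFCM objective comes from. The function names below are ours.

```python
import numpy as np

def grbf_kernel(X, V, kb):
    """GRBF kernel of Eq. (11) between every center v_i and point x_k, shape (c, n)."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * kb ** 2))

def kernel_distance_sq(X, V, kb):
    """||phi(x_k) - phi(v_i)||^2 = 2 (1 - K(x_k, v_i)) for the GRBF kernel."""
    return 2.0 * (1.0 - grbf_kernel(X, V, kb))
```

Kernel-induced squared distances are bounded above by 2, and a larger k_b pushes every kernel value toward 1, which is how the kernel width controls the range over which points and centers interact.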

The objective function of KFCM measures distances between mapped values in the high-dimensional space, as written in Equation (12). The updated membership degrees and centers are obtained by iterative calculation according to Equations (13) and (14). The derivation is not presented in this paper; interested readers can refer to [46] for details. $J^{\varphi} = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} \| \varphi(x_{k}) - \varphi(v_{i}) \|^{2} = 2 \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} [1 - K(x_{k}, v_{i})]$ (12) $u_{ik} = \frac{(1 - K(x_{k}, v_{i}))^{\frac{1}{1-m}}}{\sum_{j=1}^{c} (1 - K(x_{k}, v_{j}))^{\frac{1}{1-m}}}$ (13) $v_{i} = \frac{\sum_{k=1}^{n} (u_{ik})^{m} K(x_{k}, v_{i}) x_{k}}{\sum_{k=1}^{n} (u_{ik})^{m} K(x_{k}, v_{i})}$ (14) in which v_i (i = 1, 2, …, c) is the i-th cluster center; x_k (k = 1, 2, …, n) is the k-th data point in the target dataset; and u_ik (i = 1, 2, …, c; k = 1, 2, …, n) is the fuzzy membership degree of data point k with respect to cluster i.
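One KFCM iteration (kernel by Equation (11), memberships by Equation (13), centers by Equation (14)) might look as follows. This is a sketch with our own function name, a fixed kernel width k_b passed in as a number, and a floor on 1 − K that we added for points lying exactly on a center. Equation (14) is a fixed-point update: the weights K(x_k, v_i) are evaluated at the current centers, so the step is repeated until the centers settle.

```python
import numpy as np

def kfcm_step(X, V, kb, m=2.0):
    """One KFCM iteration: kernel (11), memberships (13), centers (14)."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # (c, n)
    K = np.exp(-d2 / (2.0 * kb ** 2))
    # Eq. (13): u_ik proportional to (1 - K(x_k, v_i))^(1/(1-m))
    w = np.fmax(1.0 - K, 1e-12) ** (1.0 / (1.0 - m))
    U = w / w.sum(axis=0, keepdims=True)
    # Eq. (14): kernel-weighted means with weights u_ik^m K(x_k, v_i)
    W = (U ** m) * K
    V_new = (W @ X) / W.sum(axis=1, keepdims=True)
    return U, V_new
```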

The KFCM algorithm proceeds as follows.

(a) Set the cluster number c, fuzzy factor m, kernel width parameter k_b, stopping condition ε, and maximum iteration number T.

(b) Initialize the cluster centers $V^{(0)} = \{v_{i}^{(0)}; i = 1, 2, \ldots, c\}$ and the membership degrees $U^{(0)} = \{u_{ik}^{(0)}; i = 1, 2, \ldots, c; k = 1, 2, \ldots, n\}$, and let t = 1.

(c) Calculate the kernel matrix $K^{(t)}(x, v)$ in iteration t according to Equation (11).

(d) Calculate the updated membership degrees $U^{(t)} = \{u_{ik}^{(t)}; i = 1, 2, \ldots, c; k = 1, 2, \ldots, n\}$ in iteration t according to Equation (13).

(e) Calculate the updated centers $V^{(t)} = \{v_{i}^{(t)}; i = 1, 2, \ldots, c\}$ in iteration t according to Equation (14).

(f) If $\| V^{(t)} - V^{(t-1)} \| < \varepsilon$ or t = T, stop; otherwise, let t = t + 1 and return to step (c).

By mapping the data into a high-dimensional space, KFCM can enlarge the differences among the features of the dataset and thereby improve efficiency and clustering performance. However, how to choose a suitable value of the kernel width parameter k_b has not been clearly specified in previous studies. The kernel width parameter sets the range over which the correlation between data points and centers changes. If it is too large, the kernel values become insensitive to distance; if it is too small, they are close to zero for most points. Either case prevents accurate clustering results and reasonable influential areas. Therefore, based on empirical performance, k_b is set in this paper to the root mean square of the distances between the cluster centers and the data points, as shown in Equation (15): $k_{b} = \sqrt{\frac{1}{nc} \sum_{k=1}^{n} \sum_{i=1}^{c} \| x_{k} - v_{i} \|^{2}}$ (15)
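Equation (15) averages the squared distances over all n·c point-to-center pairs and takes the square root. A short NumPy sketch (function name ours):

```python
import numpy as np

def kernel_width(X, V):
    """Eq. (15): k_b = sqrt( (1 / (n c)) * sum_k sum_i ||x_k - v_i||^2 )."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # (c, n)
    return float(np.sqrt(d2.mean()))                           # mean over all n*c pairs
```

For instance, with the points (0, 0) and (2, 0) and a single center at the origin, the squared distances are 0 and 4, so k_b = √2.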

3.5 Modified kernel fuzzy C-Means

A single kernel width parameter for the entire dataset limits how well the influential ranges of clusters can be delineated, since the influential range may differ between clusters and between dimensions. The modified Kernel Fuzzy C-Means (mKFCM) clustering algorithm with additional kernel width parameters is a viable option. The mKFCM considered in this paper uses different kernel width parameters for different clusters and dimensions, according to Equation (16): $k_{b,ip} = c \sqrt{\frac{1}{n_{i}} \sum_{s=1}^{n_{i}} (x_{sp} - v_{ip})^{2}}$ (16) in which k_b,ip (i = 1, 2, …, c; p = 1, 2, …, d) is the kernel width parameter of cluster i in dimension p; d is the number of dimensions; n_i is the number of data points in cluster i, obtained by assigning each data point to the cluster in which it has the largest membership degree; and x_sp is the value of the s-th data point of cluster i in dimension p.
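Equation (16) can be sketched as below, assuming memberships stored as a c × n array. The crisp assignment by largest membership follows the text. The leading factor, written c in Equation (16), is exposed as a parameter `scale` (our name) so that its value is chosen explicitly. A width of zero (for example, a cluster with a single point) would have to be floored before use in a kernel; this sketch does not do that.

```python
import numpy as np

def mkfcm_widths(X, U, V, scale):
    """Eq. (16): kernel width k_b,ip for every cluster i and dimension p, shape (c, d)."""
    labels = U.argmax(axis=0)              # crisp assignment by largest membership
    widths = np.empty_like(V, dtype=float)
    for i in range(V.shape[0]):
        Xi = X[labels == i]                # the n_i points of cluster i
        # root mean square deviation from the center, separately per dimension p
        widths[i] = scale * np.sqrt(((Xi - V[i]) ** 2).mean(axis=0))
    return widths
```

A cluster spread mostly along one axis receives a wide kernel in that dimension and a narrow one in the other, which is how mKFCM tailors the influential range per cluster and dimension.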

The objective function, updated membership degrees and updated centers of mKFCM are given by Equations (17)–(19), respectively: $J^{\varphi} = \sum_{k=1}^{n} \sum_{i=1}^{c} \sum_{p=1}^{d} (u_{ik})^{m} \| \varphi_{ip}(x_{kp}) - \varphi_{ip}(v_{ip}) \|^{2} = 2 \sum_{k=1}^{n} \sum_{i=1}^{c} \sum_{p=1}^{d} (u_{ik})^{m} [1 - K_{ip}(x_{kp}, v_{ip})]$ (17) $u_{ik} = \frac{\sum_{p=1}^{d} (1 - K_{ip}(x_{kp}, v_{ip}))^{\frac{1}{1-m}}}{\sum_{j=1}^{c} \sum_{p=1}^{d} (1 - K_{jp}(x_{kp}, v_{jp}))^{\frac{1}{1-m}}}$ (18) $v_{ip} = \frac{\sum_{k=1}^{n} (u_{ik})^{m} K_{ip}(x_{kp}, v_{ip}) x_{kp}}{\sum_{k=1}^{n} (u_{ik})^{m} K_{ip}(x_{kp}, v_{ip})}$ (19) in which v_ip (i = 1, 2, …, c; p = 1, 2, …, d) is the value of cluster center i in dimension p; x_kp (k = 1, 2, …, n; p = 1, 2, …, d) is the value of data point k in dimension p; u_ik (i = 1, 2, …, c; k = 1, 2, …, n) is the fuzzy membership degree of data point k with respect to cluster i; and K_ip(·, ·) is the kernel function with the kernel width parameter of cluster i in dimension p.
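A sketch of one mKFCM iteration. It implements Equation (18) as written (the terms (1 − K_ip)^{1/(1−m)} are summed over the dimensions before normalizing over clusters) and Equation (19) with each center coordinate v_ip computed from dimension p alone. The function name, the (c, n, d) array layout and the floor on 1 − K are our own choices; `widths` holds the k_b,ip values of Equation (16) and must be strictly positive.

```python
import numpy as np

def mkfcm_step(X, V, widths, m=2.0):
    """One mKFCM iteration: memberships by Eq. (18), centers by Eq. (19)."""
    # K[i, k, p] = exp(-(x_kp - v_ip)^2 / (2 k_b,ip^2)): one kernel per cluster and dimension
    diff2 = (X[None, :, :] - V[:, None, :]) ** 2               # (c, n, d)
    K = np.exp(-diff2 / (2.0 * widths[:, None, :] ** 2))
    # Eq. (18): sum over p of (1 - K_ip)^(1/(1-m)), then normalize over clusters
    w = (np.fmax(1.0 - K, 1e-12) ** (1.0 / (1.0 - m))).sum(axis=2)   # (c, n)
    U = w / w.sum(axis=0, keepdims=True)
    # Eq. (19): v_ip is a kernel-weighted mean of x_kp with weights u_ik^m K_ip
    Wt = (U ** m)[:, :, None] * K                               # (c, n, d)
    V_new = (Wt * X[None, :, :]).sum(axis=1) / Wt.sum(axis=1)
    return U, V_new
```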

The mKFCM algorithm proceeds in the same way as KFCM and, owing to space limitations, is not detailed here.

4 Methods

4.1 Defects of the previous algorithms

The aforementioned FCM, FCM-M and KFCM algorithms fail to detect the distinct influential direction and range of each cluster, which hinders a better understanding of natural cluster structure. In contrast, the mFCM-M algorithm effectively discovers the influential structure of clusters by computing the inverse of their covariance matrices. Nevertheless, two problems remain:

The inverse of the covariance matrix may not exist for some clusters when the Mahalanobis distance is calculated. Although the pseudo inverse of the covariance matrix can be used instead, it can reduce the accuracy of the computation. The clustering results on the synthetic dataset DS3 in Section 5 exhibit this problem.

Because a membership threshold is used to detect the core structure of each cluster, differing data densities in the core areas can cause the influential range to deviate.
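The first problem is easy to reproduce. In the NumPy sketch below (helper name ours), a cluster whose core points lie exactly on a line has a rank-deficient covariance matrix, so no ordinary inverse exists and the helper falls back to the Moore–Penrose pseudo inverse. The last lines show one way accuracy is lost: under the pseudo inverse, a displacement perpendicular to the line receives a squared Mahalanobis distance of zero, as if it were no deviation at all.

```python
import numpy as np

def safe_inverse(cov, tol=1e-10):
    """Sigma^{-1} when Sigma has full rank, otherwise the Moore-Penrose pseudo inverse."""
    if np.linalg.matrix_rank(cov, tol=tol) == cov.shape[0]:
        return np.linalg.inv(cov)
    return np.linalg.pinv(cov)

# Core points lying exactly on the line y = 2x give a rank-1 covariance matrix
core = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
cov = np.cov(core, rowvar=False)
P = safe_inverse(cov)

# A step of length sqrt(5) perpendicular to the line is invisible to the pseudo inverse
perp = np.array([2.0, -1.0])
perp_dist_sq = perp @ P @ perp
```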

To address the problems of the mFCM-M algorithm, the mKFCM algorithm offers a way of adjusting the influential range without matrix inversion, by using different kernel width parameters for different clusters. However, mKFCM is insensitive to the influential direction of each cluster. Taking advantage of both mFCM-M and mKFCM, this paper proposes a new algorithm that both detects the influential direction and adjusts the influential range of each cluster.

4.2 Adaptive kernel fuzzy C-Means

By combining the advantages of the earlier structure-based algorithms, this paper proposes the Adaptive Kernel Fuzzy C-Means (AKFCM) clustering algorithm. Unlike the region-based adaptive algorithms, AKFCM adaptively integrates cluster structure information into the clustering process. Moreover, compared with the earlier structure-based algorithms, AKFCM aims to extract more reasonable influential areas that align well with the natural cluster structure by tailoring the influential direction and the influential range of each cluster simultaneously. The AKFCM objective function is given in Equation (20); it adopts two types of feature space transformation:

the first transformation is for detecting the influential directions of clusters, in which the uncorrelated features for each cluster are built up using the directional information of the covariance matrix;

the second transformation is for adjusting the influential ranges of clusters, in which the influential ranges in the uncorrelated feature space are determined using the kernel function.

$J^{\varphi'} = \sum_{k=1}^{n} \sum_{i=1}^{c} \sum_{p=1}^{d} (u_{ik})^{m} \| \varphi_{ip}(w_{i}(x_{kp})) - \varphi_{ip}(w_{i}(v_{ip})) \|^{2} = 2 \sum_{k=1}^{n} \sum_{i=1}^{c} \sum_{p=1}^{d} (u_{ik})^{m} [1 - K_{ip}(w_{i}(x_{kp}), w_{i}(v_{ip}))]$ (20)

in which x_kp (k = 1, 2, …, n; p = 1, 2, …, d) is the value of data point k in dimension p, and v_ip (i = 1, 2, …, c; p = 1, 2, …, d) is the value of cluster center i in dimension p; w_i(·) transforms the given data into the uncorrelated feature space of cluster i, so that w_i(x_kp) denotes the p-th component of the transformed vector w_i(x_k); φ_ip(·) maps the given data into a higher-dimensional space according to the kernel function K_ip(·, ·), which has a different kernel width parameter for each cluster in each dimension; and u_ik (i = 1, 2, …, c; k = 1, 2, …, n) is the fuzzy membership degree of data point k with respect to cluster i.
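To make the two transformations concrete, the sketch below evaluates Equation (20) for given memberships. It is an illustration under assumptions, not the authors' implementation: each w_i is represented by an orthonormal d × d matrix `rotations[i]` acting as w_i(x) = x · E_i (in the spirit of the eigenvector transform of Equation (24)), w_i(x_kp) is read as the p-th component of w_i(x_k), and the per-cluster, per-dimension kernel widths `widths[i]` are simply passed in, since their adaptive rule is not part of this sketch.

```python
import numpy as np

def akfcm_objective(X, V, U, rotations, widths, m=2.0):
    """Eq. (20): J = 2 * sum_k sum_i sum_p u_ik^m [1 - K_ip(w_i(x_kp), w_i(v_ip))]."""
    J = 0.0
    for i, (E, kb) in enumerate(zip(rotations, widths)):
        Z = X @ E                 # first transformation: w_i(x_k) for all k, shape (n, d)
        zv = V[i] @ E             # w_i(v_i), shape (d,)
        # second transformation: one GRBF kernel per dimension p with width kb[p]
        K = np.exp(-((Z - zv) ** 2) / (2.0 * kb ** 2))
        J += 2.0 * np.sum((U[i] ** m)[:, None] * (1.0 - K))
    return float(J)
```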

To minimize the objective function, the Lagrangian function, which also accounts for the constraint on the membership degrees in Equation (2), is formed, and its partial derivatives with respect to the unknown variables (U and w(V)) are calculated. The Lagrangian function is shown in Equation (21), where λ_k (k = 1, 2, …, n) are the Lagrange multipliers. Setting these partial derivatives to zero and rearranging yields the updated membership degrees and the updated centers in the uncorrelated feature space, given by Equations (22) and (23), respectively. Because the kernel on the right-hand side of Equation (23) also depends on w_i(v_ip), Equation (23) is applied as a fixed-point update. The updated centers V in the original feature space are then recovered through the corresponding reverse transformation.

$$L(J^{\varphi'}) = 2\sum_{k=1}^{n}\sum_{i=1}^{c}\sum_{p=1}^{d}(u_{ik})^{m}\left[1 - K_{ip}(w_{i}(x_{kp}), w_{i}(v_{ip}))\right] + \sum_{k=1}^{n}\lambda_{k}\left(\sum_{i=1}^{c}u_{ik} - 1\right)$$ (21)

$$u_{ik} = \frac{\left\{\sum_{p=1}^{d}\left[1 - K_{ip}(w_{i}(x_{kp}), w_{i}(v_{ip}))\right]\right\}^{\frac{1}{1-m}}}{\sum_{j=1}^{c}\left\{\sum_{p=1}^{d}\left[1 - K_{jp}(w_{j}(x_{kp}), w_{j}(v_{jp}))\right]\right\}^{\frac{1}{1-m}}}$$ (22)

$$w_{i}(v_{ip}) = \frac{\sum_{k=1}^{n}(u_{ik})^{m}K_{ip}(w_{i}(x_{kp}), w_{i}(v_{ip}))\,w_{i}(x_{kp})}{\sum_{k=1}^{n}(u_{ik})^{m}K_{ip}(w_{i}(x_{kp}), w_{i}(v_{ip}))}$$ (23)
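A minimal sketch of one membership update (Equation (22)) and one center update (Equation (23)) follows, again assuming a Gaussian kernel with per-cluster, per-dimension widths. The membership step sums the kernel distances over the d dimensions before applying the exponent 1/(1 − m), which is what setting the derivative of Equation (21) with respect to u_ik to zero yields. A point that coincides with a center in every dimension is split evenly among the clusters concerned; this tie rule is an illustrative choice, and all function and argument names are hypothetical.

```python
import math


def gaussian_kernel(a, b, width):
    """Assumed kernel form: K(a, b) = exp(-(a - b)^2 / width^2)."""
    return math.exp(-((a - b) ** 2) / width ** 2)


def update_memberships(wx, wv, kb, m):
    """u_ik proportional to D_ik^(1/(1-m)), with D_ik = sum_p [1 - K_ip]."""
    c, n, d = len(wv), len(wx[0]), len(wv[0])
    D = [[sum(1.0 - gaussian_kernel(wx[i][k][p], wv[i][p], kb[i][p]) for p in range(d))
          for k in range(n)] for i in range(c)]
    U = [[0.0] * n for _ in range(c)]
    for k in range(n):
        on_center = [i for i in range(c) if D[i][k] == 0.0]
        if on_center:
            # the point coincides with one or more centers: share its membership among them
            for i in on_center:
                U[i][k] = 1.0 / len(on_center)
            continue
        weights = [D[i][k] ** (1.0 / (1.0 - m)) for i in range(c)]
        total = sum(weights)
        for i in range(c):
            U[i][k] = weights[i] / total
    return U


def update_centers(wx, wv, kb, U, m):
    """Eq. (23) as one fixed-point step for every cluster i and dimension p."""
    c, n, d = len(wv), len(wx[0]), len(wv[0])
    new_wv = []
    for i in range(c):
        row = []
        for p in range(d):
            w = [U[i][k] ** m * gaussian_kernel(wx[i][k][p], wv[i][p], kb[i][p])
                 for k in range(n)]
            row.append(sum(w_k * wx[i][k][p] for k, w_k in enumerate(w)) / sum(w))
        new_wv.append(row)
    return new_wv


# Identity transform for both clusters (the same coordinates are reused in each space)
pts = [[-1.0], [1.0], [9.0], [11.0]]
U = update_memberships([pts, pts], [[0.0], [10.0]], [[5.0], [5.0]], m=2.0)
print(update_centers([pts, pts], [[0.0], [10.0]], [[5.0], [5.0]], U, m=2.0))
```

The two steps alternate inside the main loop; in AKFCM the transformed coordinates and kernel widths would be refreshed from Equations (24)–(27) between iterations. If every kernel value for a cluster underflows to zero, the center step divides by zero, which a production version would need to guard against.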

The kernel width parameters in the AKFCM algorithm are not constants but adaptive parameters that change with the structural information of each cluster. The first transformation, which detects the directional information of each cluster, is given by Equation (24). It maps the current data set x into the uncorrelated feature space by rotating x along specified directions. Because each cluster assembles along its own direction, the transform differs from cluster to cluster. Multiplying the data by the eigenvectors of a cluster's covariance matrix expresses them in new coordinates aligned with the cluster's assembling direction, so that each cluster judges unlabelled data according to its own directional preference.

$$w_{i}(x) = x \cdot \mathrm{eig}\left(\mathrm{cov}\left(\{x_{k} \mid u_{ik} > \bar{u}_{i},\ k = 1, 2, \ldots, n\}\right)\right)$$ (24)

in which cov is a function that calculates the covariance matrix; eig is a function that finds the eigenvectors of the covariance matrix; and $\bar{u}_{i}$ is the threshold on the membership degrees used to detect the core structure of cluster i. In this paper, $\bar{u}_{i}$ is the mean membership degree of the data points with respect to cluster i, on the assumption that data with larger membership degrees suffer less noise interference and are therefore more helpful for finding accurate directional information.
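The first transformation can be sketched in plain Python. The snippet below selects the core data of cluster i (u_ik above the mean membership ū_i), estimates their sample covariance, and obtains its eigenvectors with a cyclic Jacobi rotation method, a swapped-in, dependency-free substitute for the `eig` function in Equation (24). Because the eigenvector matrix of a symmetric matrix is orthogonal, the reverse transformation used to recover centers in the original space is multiplication by its transpose. Function names are hypothetical.

```python
import math


def _matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def _transpose(a):
    return [list(row) for row in zip(*a)]


def covariance(points):
    """Sample covariance (n - 1 normalization) of a list of d-dimensional points."""
    n, d = len(points), len(points[0])
    mean = [sum(x[j] for x in points) / n for j in range(d)]
    return [[sum((x[a] - mean[a]) * (x[b] - mean[b]) for x in points) / (n - 1)
             for b in range(d)] for a in range(d)]


def jacobi_eigenvectors(sym, sweeps=50, tol=1e-22):
    """Eigenvectors (as columns) of a symmetric matrix via cyclic Jacobi rotations."""
    d = len(sym)
    a = [row[:] for row in sym]
    vecs = [[float(i == j) for j in range(d)] for i in range(d)]
    norm_sq = sum(x * x for row in a for x in row) or 1.0  # invariant under rotations
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(d) for j in range(d) if i != j) <= tol * norm_sq:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if a[p][q] == 0.0:
                    continue
                # rotation angle that zeroes a[p][q]: tan(2 theta) = 2 a_pq / (a_qq - a_pp)
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                rot = [[float(i == j) for j in range(d)] for i in range(d)]
                rot[p][p] = rot[q][q] = math.cos(theta)
                rot[p][q], rot[q][p] = math.sin(theta), -math.sin(theta)
                a = _matmul(_transpose(rot), _matmul(a, rot))
                vecs = _matmul(vecs, rot)
    return vecs


def cluster_rotation(X, u_i):
    """Eq. (24): eigenvectors of the covariance of cluster i's core data (u_ik > mean u_i)."""
    threshold = sum(u_i) / len(u_i)
    core = [x for x, u in zip(X, u_i) if u > threshold]
    if len(core) < 2:
        raise ValueError("need at least two core points to estimate a covariance")
    return jacobi_eigenvectors(covariance(core))


def to_uncorrelated(x, rot):
    """w_i(x) = x * E_i for a row vector x."""
    return _matmul([list(x)], rot)[0]


def from_uncorrelated(y, rot):
    """Reverse transformation: E_i is orthogonal, so its inverse is its transpose."""
    return _matmul([list(y)], _transpose(rot))[0]


X = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (10.0, -5.0)]
u = [0.9, 0.8, 0.9, 0.85, 0.95, 0.1]  # the last point has low membership and is excluded
rot = cluster_rotation(X, u)
print(covariance([to_uncorrelated(x, rot) for x in X[:5]]))  # off-diagonal entries near 0
```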

The second transformation, which determines the influential range of each cluster, is achieved by employing different kernel width parameters in the kernel function for each cluster and dimension according to Equations (25)–(27). Equation (25) measures the range of the core data in each cluster and dimension. Because the core data of different clusters can be detected to varying degrees, this range is adjusted by Equations (26) and (27) on the basis of the data density of the clusters: a higher density leads to a greater range adjustment.

$$k_{b,ip} = c \cdot \alpha_{i} \cdot \sqrt{\frac{1}{n_{i}}\sum_{s=1}^{n_{i}}\left\|w_{i}(x_{sp}) - w_{i}(v_{ip})\right\|^{2}}$$ (25)

$$\alpha_{i} = 1 + \left(\frac{\beta_{i}}{\sum_{j=1}^{c}\beta_{j}}\right)^{1/c}$$ (26)

$$\beta_{i} = \frac{n_{i}}{\prod_{p=1}^{d}\sqrt{\frac{1}{n_{i}}\sum_{s=1}^{n_{i}}\left\|w_{i}(x_{sp}) - w_{i}(v_{ip})\right\|^{2}}}$$ (27)

in which k_b,ip (i = 1, 2, …, c; p = 1, 2, …, d) is the kernel width parameter for cluster i on dimension p; w_i(*) transforms the given data into the new feature space corresponding to cluster i; c is the number of clusters; n_i is the number of data points in cluster i; α_i is the adjusting parameter for cluster i; and β_i is the data density of cluster i.
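Equations (25)–(27) reduce to a few lines once the core data are expressed in each cluster's uncorrelated coordinates. In this sketch the core set of cluster i is assumed to be the same u_ik > ū_i set used for Equation (24), n_i is taken as its size, and c is the number of clusters as stated above; the function name is illustrative. A zero spread in any dimension makes β_i undefined, and the sketch then raises `ZeroDivisionError` rather than guarding against it.

```python
import math


def kernel_widths(core_wx, wv):
    """Eqs. (25)-(27): adaptive kernel widths k_b[i][p].

    core_wx[i]: core points of cluster i, expressed in cluster i's uncorrelated space
    wv[i]:      center i, expressed in the same space
    """
    c = len(wv)
    spread, beta = [], []
    for i in range(c):
        pts, n_i, d = core_wx[i], len(core_wx[i]), len(wv[i])
        # RMS distance of the core data to the center, per dimension (square-root term of Eq. 25)
        s_i = [math.sqrt(sum((x[p] - wv[i][p]) ** 2 for x in pts) / n_i) for p in range(d)]
        spread.append(s_i)
        beta.append(n_i / math.prod(s_i))                        # Eq. (27): density of cluster i
    beta_sum = sum(beta)
    alpha = [1.0 + (b / beta_sum) ** (1.0 / c) for b in beta]    # Eq. (26)
    return [[c * alpha[i] * s for s in spread[i]] for i in range(c)]  # Eq. (25)


# Two 1-D clusters: the denser one (spread 1) gets the larger adjusting parameter alpha
print(kernel_widths([[[-1.0], [1.0]], [[-2.0], [2.0]]], [[0.0], [0.0]]))
```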

The AKFCM algorithm proceeds as follows.

(a) Set the cluster number c, fuzzy factor m, stopping condition ɛ, and maximum iteration number T.

(b) Initialize the cluster centers $V^{(0)} = \{v_{i}^{(0)};\ i = 1, 2, \ldots, c\}$ and the membership degrees $U^{(0)} = \{u_{ik}^{(0)};\ i = 1, 2, \ldots, c;\ k = 1, 2, \ldots, n\}$, and set t = 1.

(c) Detect the influential direction of each cluster and transform the original data and centers into the uncorrelated feature space corresponding to each cluster according to Equation (24).

(d) Calculate and adjust the influential range of each cluster in the transformed feature space, and adaptively determine the kernel width parameter for each cluster on each dimension according to Equations (25)–(27).

(e) Calculate the kernel matrix $K_{ip}^{(t)}(w_{i}(x_{kp}), w_{i}(v_{ip}))$ in iteration t according to Equation (11).

(f) Calculate the updated membership degrees $U^{(t)} = \{u_{ik}^{(t)};\ i = 1, 2, \ldots, c;\ k = 1, 2, \ldots, n\}$ in iteration t according to Equation (22).

(g) Calculate the updated centers in the uncorrelated feature space, $w(V)^{(t)} = \{w_{i}(v_{i})^{(t)};\ i = 1, 2, \ldots, c\}$, in iteration t according to Equation (23), and retrieve the updated centers $V^{(t)} = \{v_{i}^{(t)};\ i = 1, 2, \ldots, c\}$ in iteration t through the reverse transformation, i.e., from the uncorrelated feature space to the original feature space.

(h) If $\|V^{(t)} - V^{(t-1)}\| < \varepsilon$ or t = T, stop; otherwise, set t = t + 1 and return to step (c).

The time complexity of the AKFCM algorithm is O(cdnt), where c is the number of clusters, d is the number of dimensions, n is the number of data points and t is the number of iterations. The pseudo-code of the AKFCM algorithm is given in Algorithm 1.

4.3 Convergence of AKFCM

Following previous studies on the convergence of the FCM family of algorithms [29, 47–49], the convergence of AKFCM is proven as follows. Let U = {u_ik; i = 1, 2, …, c; k = 1, 2, …, n} and φ = {φ_ip(w_i(v_ip)); i = 1, 2, …, c; p = 1, 2, …, d}.

Theorem 1. Let φ be fixed; then U is a local minimum of J^φ′ in Equation (20) if and only if U is computed from Equation (22).

Proof. The first derivative of the Lagrangian function with respect to U confirms the only-if part. To prove sufficiency, the Hessian of the Lagrangian function with respect to U is inspected, as shown in Equation (28).

$$h_{ij,ab}(U) = \frac{\partial}{\partial u_{ij}}\left(\frac{\partial J^{\varphi'}(U)}{\partial u_{ab}}\right) = \begin{cases} m(m-1)(u_{ij})^{m-2}\sum_{p=1}^{d}\left\|\varphi_{ip}(w_{i}(x_{jp})) - \varphi_{ip}(w_{i}(v_{ip}))\right\|^{2}, & \text{if } i = a,\ j = b \\ 0, & \text{otherwise} \end{cases}$$ (28)

Since m > 1, u_ij > 0 and $\sum_{p=1}^{d}\|\varphi_{ip}(w_{i}(x_{jp})) - \varphi_{ip}(w_{i}(v_{ip}))\|^{2} > 0$, the Hessian of the Lagrangian function with respect to U is a diagonal matrix with strictly positive diagonal entries and is therefore positive definite. Thus, Equation (22) is also sufficient, and it achieves a local minimum of the objective function.□

Theorem 2. Let U be fixed; then φ is a local minimum of J^φ′ in Equation (20) if and only if φ is computed from Equation (23).

Proof. The first derivative of the Lagrangian function with respect to φ confirms the only-if part. Moreover, since the Lagrangian function is unconstrained and strictly convex in φ, its local minimum is obtained where the first derivative vanishes.□

According to Theorems 1 and 2, the objective function is non-increasing over the iterations, and the AKFCM algorithm is locally convergent.

4.4 Initialization of clustering centers

In the proposed AKFCM algorithm, the core structure of each cluster is detected in accordance with the data belonging to it. However, initial detection may be ineffective as the clustering centers are randomly initialized in the FCM family of algorithms. Therefore, an effective initialization process is implemented in this paper to avoid unnecessary computation.

A clustering center’s distance to the data within the cluster should be relatively small, and its distance to other clustering centers should be relatively large. In light of this point, a modified initialization process of the clustering centers V = {v_i ; i = 1, 2, . . . c} is proposed as follows.

(a) Find the alternative cluster centers $V' = \{v'_{j};\ j = 1, 2, \ldots, c'\}$, with $c' > c$, by applying subtractive clustering [50] to the dataset. This step generates several candidate centers located in regions of high data density, which ensures that the distance between each candidate and the data within its cluster is relatively small.

(b) Pick the two alternative cluster centers that are farthest from each other and add them to the list of clustering centers V.

(c) If the current size of V equals the number of clustering centers c, the initialization process terminates; otherwise, it continues to step (d).

(d) Calculate the minimum distance $d'_{j}$ ($j = 1, 2, \ldots, c''$) between each of the $c''$ remaining alternative cluster centers $v'_{j}$ and the current V.

(e) Add the remaining alternative cluster center with the largest minimum distance, $\max_{j} d'_{j}$, to the list of clustering centers V, and return to step (c). Steps (d) and (e) ensure that the distances between the clustering centers are relatively large.

The initialization process was tested by executing the basic FCM algorithm on the synthetic data sets S1, S2, S3 and S4, each of which contains 5000 data points in 15 predefined clusters with varying degrees of overlap [51]. Proper initialization of the clustering centers can improve the clustering results. Figure 2 shows the clustering results of the basic FCM algorithm using random initialization and using the modified initialization process; the basic FCM algorithm with random initialization was run 1000 times on each synthetic data set. Clustering accuracy is assessed with the Adjusted Rand (AR) index, calculated according to Equations (29) and (30); a higher AR index value represents greater accuracy. The clustering results are defuzzified by assigning each data point to the cluster with its largest membership degree. Fig. 2 shows that the clustering accuracy of the basic FCM algorithm with the modified initialization is much closer to the maximum clustering accuracy of the basic FCM algorithm using random initialization. Thus, the modified initialization of the clustering centers was adopted in the following analysis to avoid unnecessary and ineffective computation.

$$RI = \frac{TP + TN}{TP + FP + FN + TN}$$ (29)

$$ARI = \frac{RI - E(RI)}{\max(RI) - E(RI)}$$ (30)

in which TP is the number of pairs of data points that are clustered into the same cluster and have the same label; TN is the number of pairs that are clustered into different clusters and have different labels; FP is the number of pairs that are clustered into the same cluster but have different labels; FN is the number of pairs that are clustered into different clusters but have the same label; and max(RI) and E(RI) denote the maximum and expected values of RI.
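The defuzzification and the AR index of Equations (29) and (30) can be computed by counting pairs. This sketch follows the common Hubert–Arabie convention, in which E(RI) assumes random labelling with fixed cluster sizes and max(RI) equals 1; that convention is an assumption, since the section does not state how the expectation is modelled. It uses only the Python standard library (`math.comb` requires Python 3.8 or later), and the function names are hypothetical.

```python
from collections import Counter
from math import comb


def defuzzify(U):
    """Assign each point k to the cluster i with the largest membership U[i][k]."""
    return [max(range(len(U)), key=lambda i: U[i][k]) for k in range(len(U[0]))]


def adjusted_rand_index(labels_true, labels_pred):
    """Eqs. (29)-(30) via pair counting."""
    pairs = comb(len(labels_true), 2)
    tp = sum(comb(v, 2) for v in Counter(zip(labels_true, labels_pred)).values())
    same_label = sum(comb(v, 2) for v in Counter(labels_true).values())     # TP + FN
    same_cluster = sum(comb(v, 2) for v in Counter(labels_pred).values())   # TP + FP
    fp, fn = same_cluster - tp, same_label - tp
    tn = pairs - tp - fp - fn
    ri = (tp + tn) / pairs                                                   # Eq. (29)
    # E(RI): expected TP under random labelling with fixed sizes is A * B / pairs.
    # max(RI) = 1, reached at TP = (A + B) / 2 (Hubert-Arabie convention).
    expected_tp = same_label * same_cluster / pairs
    expected_ri = (pairs - same_label - same_cluster + 2 * expected_tp) / pairs
    max_ri = 1.0
    if max_ri == expected_ri:
        return 1.0
    return (ri - expected_ri) / (max_ri - expected_ri)                       # Eq. (30)


U = [[0.9, 0.7, 0.2, 0.1], [0.1, 0.3, 0.8, 0.9]]
print(adjusted_rand_index([0, 0, 1, 1], defuzzify(U)))  # identical partitions -> 1.0
```

Cluster indices need not match label values: a relabelled but otherwise identical partition still scores 1.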

Fig.2

Clustering results using different initialization processes.

4.5 Influential area index

Clustering accuracy can be properly evaluated by the Adjusted Rand (AR) index [52, 53], which measures the similarity between two partitions of the data: the clustering result and the true labels. However, to our knowledge, there is still no effective index for assessing the reasonability of the influential areas produced by clustering. To fill this gap, a new clustering validity index called the Influential Area (IA) index is proposed in this study. The IA index of a cluster l is calculated as follows:

$$IA_{l} = \frac{\mathrm{Area}\left(\{Z_{l} \mid u_{Z_{l}} \geqslant \min(u_{X_{l}}),\ \min(X_{l}) \leqslant Z_{l} \leqslant \max(X_{l})\}\right)}{\mathrm{Area}(X_{l})}$$ (31)

in which Area(*) is a function that calculates the area bounded by the convex hull of the target data; X_l is the data matrix with label l; Z_l is a constructed grid data matrix within the range of X_l; and u_{X_l} and u_{Z_l} are the corresponding membership degrees with respect to cluster l.

According to Equation (31), the IA index of a cluster is the ratio between the influential area of the clustering results and the area of the natural structure of the data. Because the minimum membership degree of each cluster is taken as the threshold, all of the data with a given label lie within the boundary of the constructed influential area; in other words, the constructed influential area is at least as large as the area of the data. Thus, the IA index is no less than 1, and a value of 1 indicates that the influential areas of the clusters exactly align with the natural structure of the data. In practice, the IA index rarely equals 1, as the data on the boundary of a labelled cluster rarely have identical membership degrees. Nevertheless, an IA index value closer to 1 is preferable, as it indicates an influential structure closer to the natural structure of the data. In Fig. 3, which shows the IA indexes of two clustering algorithms, the structure depicted by algorithm A (IA index closer to 1) matches the natural structure of the data better than that of algorithm B. The mean IA index over all clusters is used to assess the validity and rationality of the influential areas of clusters in the following section.
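A two-dimensional sketch of Equation (31) is shown below. It computes convex-hull areas with Andrew's monotone chain and the shoelace formula, reads min(X_l) ≤ Z_l ≤ max(X_l) as the per-dimension bounding box of X_l, and evaluates memberships on a regular grid. The grid resolution and the callable `membership_l`, which returns the membership of a point in cluster l (for example, from Equation (22) with the final centers and kernel widths), are assumptions rather than settings reported in this paper.

```python
def convex_hull_area(points):
    """Area enclosed by the convex hull of 2-D points (monotone chain + shoelace)."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) < 3:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]))
    return abs(twice_area) / 2.0


def influential_area_index(X_l, u_X_l, membership_l, grid_size=100):
    """Eq. (31) for one labelled cluster l in two dimensions.

    membership_l(z) is a hypothetical callable returning the membership of point z in cluster l.
    """
    threshold = min(u_X_l)
    xs, ys = [p[0] for p in X_l], [p[1] for p in X_l]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    step = grid_size - 1
    grid = [(x_lo + (x_hi - x_lo) * a / step, y_lo + (y_hi - y_lo) * b / step)
            for a in range(grid_size) for b in range(grid_size)]
    influential = [z for z in grid if membership_l(z) >= threshold]
    return convex_hull_area(influential) / convex_hull_area(X_l)


# A triangular cluster whose influential area fills its bounding box: IA = 1 / 0.5 = 2
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(influential_area_index(triangle, [0.5] * 3, lambda z: 0.5, grid_size=11))
```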

Fig.3

Influential area (IA) index.

Compared with the numerous cluster validity indexes in the literature, the proposed IA index checks whether the membership degrees obtained (which depict the influential area) are appropriate by comparing the membership-built area with the label-built area. The index is therefore suited to fuzzy clustering algorithms that output membership degrees, applied to data with true labels. Note also that the index is better suited to convex clusters, because the area is calculated from the convex hull of the target data.

5 Analysis and discussion

To discuss the characteristics of the clustering algorithms in detail, the performance of the previous algorithms is analyzed first in this section, followed by a comparative analysis with the proposed AKFCM algorithm in the next subsection.

5.1 Previous algorithms

The FCM, FCM-M, mFCM-M, KFCM and mKFCM algorithms were applied to synthetic datasets DS1, DS2 and DS3 (each dataset has 1200 data points) as shown in Figs. 4–6. The AR and IA indices are used to assess the accuracy of the clustering results and rationality of the influential areas of clusters produced. Higher AR index values and lower IA index values signify improved clustering performance.

Fig.4

Clustering performance of previous algorithms on the DS1 dataset.

Fig.5

Clustering performance of previous algorithms on the DS2 dataset.

Fig.6

Clustering performance of previous algorithms on the DS3 dataset.

The clustering performance of traditional FCM and FCM-M on synthetic dataset DS1 is given in Figs. 4(a1), 4(a2), 4(b1) and 4(b2). The accuracy of FCM-M (AR Index = 0.978) is higher than that of FCM (AR Index = 0.867). More importantly, considering the overall structure of the dataset, the influential areas of FCM-M (IA Index = 1.150) appear more rational than those of FCM. However, because FCM-M considers only the overall structure, it may perform no better than, or even worse than, FCM when the clusters in a dataset have structures that differ from the overall structure. For example, on synthetic dataset DS2 (Figs. 5(a1), 5(a2), 5(b1) and 5(b2)), the clustering performance of FCM (AR Index = 0.921; IA Index = 1.139) is much better than that of FCM-M (AR Index = 0.45; IA Index = 1.179). As the influential areas in Fig. 5(b2) suggest, the poor performance of FCM-M may result from its dependence on the overall structure of the dataset.

To overcome the defects of FCM-M, mFCM-M was applied to synthetic datasets DS1 and DS2. As shown in Figs. 4(c1), 4(c2), 5(c1) and 5(c2), mFCM-M achieves better overall performance than FCM and FCM-M (AR Index = 0.993 (DS1); IA Index = 1.156 (DS1); AR Index = 0.958 (DS2); IA Index = 1.132 (DS2)), as it considers the sub-structure of each cluster. Intuitively, the influential areas of mFCM-M, particularly the influential directions, resemble the structure of the clusters. However, mFCM-M has its own shortcomings, as mentioned in Section 4, and empirical evidence of these can be found in its clustering performance on synthetic dataset DS3. As shown in Figs. 6(c1) and 6(c2), the data in cluster 2 (blue) present a linear relationship in the uncorrelated feature space, which makes the covariance matrix singular so that its inverse cannot be calculated. Consequently, mFCM-M, which falls back on the pseudo-inverse of the covariance matrix, performs poorly (AR Index = 0.535).

The clustering performance of KFCM on synthetic datasets DS1, DS2 and DS3 is given in Figs. 4(d1), 4(d2), 5(d1), 5(d2), 6(d1) and 6(d2). The kernel width parameters for the three datasets are set according to Equation (20). The clustering results are reasonable (AR Index = 0.867 (DS1); IA Index = 1.216 (DS1); AR Index = 0.926 (DS2); IA Index = 1.139 (DS2); AR Index = 0.756 (DS3); IA Index = 1.26 (DS3)), but there is still room for improvement in both the clustering results and the influential areas. The figures show that the influential ranges of the clusters tend to be similar, which is caused by applying the same kernel width parameter indiscriminately to all clusters.

The clustering performance of mKFCM on synthetic datasets DS1, DS2 and DS3 is shown in Figs. 4(e1), 4(e2), 5(e1), 5(e2), 6(e1) and 6(e2) (AR Index = 0.918 (DS1); IA Index = 1.218 (DS1); AR Index = 0.909 (DS2); IA Index = 1.142 (DS2); AR Index = 0.739 (DS3); IA Index = 1.308 (DS3)). Compared with KFCM, the influential areas of mKFCM, particularly the influential ranges, vary among clusters, because mKFCM chooses the kernel width parameter separately for each cluster in each dimension. However, both KFCM and mKFCM are insensitive to the influential direction of each cluster. Their influential directions remain aligned with the horizontal and vertical coordinate axes (which is noticeable from the innermost envelopes of their influential areas), and this may limit their ability to handle datasets with differently structured clusters, particularly clusters with different directional preferences.

5.2 AKFCM

The clustering performance of the proposed AKFCM algorithm on synthetic datasets DS1, DS2 and DS3 is shown in Figs. 7(a1), 7(a2), 7(b1), 7(b2), 7(c1) and 7(c2). AKFCM generally performs better not only in clustering accuracy (AR Index = 0.983 (DS1); AR Index = 0.953 (DS2); AR Index = 0.966 (DS3)) but also in influential area detection (IA Index = 1.156 (DS1); IA Index = 1.128 (DS2); IA Index = 1.118 (DS3)). Figure 7(d) shows the average performance of FCM, FCM-M, mFCM-M, KFCM, mKFCM and AKFCM. Of the previous algorithms, mKFCM achieves the highest accuracy (Average AR Index = 0.855), and mFCM-M achieves the most reasonable influential areas (Average IA Index = 1.184). In comparison, the clustering results of AKFCM have both higher accuracy (Average AR Index = 0.967) and more reasonable influential areas (Average IA Index = 1.134).

Fig.7

Clustering performance of AKFCM on the DS1, DS2 and DS3 datasets.

To further evaluate the effectiveness of the proposed algorithm, the previous algorithms and AKFCM were also applied to various publicly available real-world and synthetic datasets [54], including Iris [55], Wine [56], WDBC [57], Aggregation [58] and the synthetic data sets S1, S2, S3 and S4 [51]. The clustering results are listed in Table 1 for comparison. On average, AKFCM outperforms the other algorithms both in clustering accuracy (Average AR Index = 0.847) and in extracting reasonable influential areas (Average IA Index = 1.271). The average running times of the six algorithms are given in Table 2. Although AKFCM is not the fastest algorithm on any dataset, its improvements in both clustering accuracy and influential areas can justify its computational cost. To illustrate the improvement, Figs. 8 and 9 show the clustering performance of basic FCM and AKFCM on the S3 dataset. The most obvious improvement can be observed on the marked cluster, which is structurally different from the other clusters.

Fig.8

Clustering performance of FCM on S3.

Fig.9

Clustering performance of AKFCM on S3.

Algorithm 1

Algorithm 1: AKFCM
1: Input: Unlabelled dataset X
2: Output: Membership degrees U and cluster centers V
3: Initialization: Set cluster number c, fuzzy factor m, stopping condition ɛ, and maximum iteration number T; initialize V⁽⁰⁾ and U⁽⁰⁾
4: for t = 1, 2, …, T do
5:   for i = 1, 2, …, c do
6:     for k = 1, 2, …, n do
7:       Compute w_i(x_k) according to Equation (24)
8:     end for
9:     Compute w_i(v_i) according to Equation (24)
10:   end for
11:   for i = 1, 2, …, c do
12:     for p = 1, 2, …, d do
13:       Compute k_b,ip according to Equations (25)–(27)
14:     end for
15:   end for
16:   for k = 1, 2, …, n do
17:     for i = 1, 2, …, c do
18:       for p = 1, 2, …, d do
19:         Compute $K_{ip}^{(t)}(w_{i}(x_{kp}), w_{i}(v_{ip}))$ according to Equation (11)
20:       end for
21:     end for
22:   end for
23:   for k = 1, 2, …, n do
24:     for i = 1, 2, …, c do
25:       Compute u_ik according to Equation (22)
26:     end for
27:   end for
28:   Update U^(t)
29:   for i = 1, 2, …, c do
30:     for p = 1, 2, …, d do
31:       Compute w_i(v_ip) according to Equation (23)
32:     end for
33:   end for
34:   Update w(V)^(t) and V^(t)
35:   if ∥V^(t) − V^(t−1)∥ < ɛ or t = T then
36:     Return U^(t) and V^(t)
37:   end if
38: end for

Table 1

Clustering performance on publicly available datasets

Dataset (index)	FCM	FCM-M	mFCM-M	KFCM	mKFCM	AKFCM
Iris (AR Index)	0.729	0.503	0.852	0.715	0.771	0.868
Iris (IA Index)	1.396	1.394	1.388	1.399	1.401	1.391
Wine (AR Index)	0.850	0.456	0.829	0.850	0.895	0.863
Wine (IA Index)	1.284	1.275	1.279	1.285	1.284	1.284
WDBC (AR Index)	0.731	0.367	0.336	0.731	0.501	0.756
WDBC (IA Index)	1.221	1.223	1.223	1.221	1.223	1.221
Aggregation (AR Index)	0.706	0.710	0.789	0.703	0.923	0.978
Aggregation (IA Index)	1.223	1.223	1.205	1.223	1.215	1.214
S1 (AR Index)	0.987	0.986	0.985	0.987	0.986	0.987
S1 (IA Index)	1.371	1.372	1.266	1.371	1.371	1.275
S2 (AR Index)	0.929	0.927	0.928	0.929	0.931	0.930
S2 (IA Index)	1.202	1.202	1.193	1.202	1.191	1.197
S3 (AR Index)	0.727	0.706	0.738	0.728	0.735	0.738
S3 (IA Index)	1.332	1.332	1.291	1.332	1.332	1.204
S4 (AR Index)	0.633	0.636	0.654	0.632	0.642	0.657
S4 (IA Index)	1.382	1.385	1.381	1.382	1.382	1.379
Average AR Index	0.786	0.661	0.764	0.784	0.798	0.847**
Average IA Index	1.301	1.301	1.278	1.302	1.300	1.271**

** Best average performance.

Table 2

Average running time of the six algorithms (s)

Dataset (size)	FCM	FCM-M	mFCM-M	KFCM	mKFCM	AKFCM
Iris (150×4)	0.00151	0.0689	0.0280	0.00939	0.00143	0.0213
Wine (178×13)	0.00168	0.0198	0.00900	0.00800	0.00232	0.0541
WDBC (569×30)	0.00210	0.0213	0.0587	0.0119	0.00428	0.407
Aggregation (788×2)	0.0103	0.129	0.0266	0.0345	0.00224	0.0306
S1 (5000×2)	0.00948	0.0239	0.236	0.0315	0.0142	0.0685
S2 (5000×2)	0.0125	0.0300	0.0569	0.0447	0.0181	0.111
S3 (5000×2)	0.0149	0.0479	0.828	0.0559	0.0158	0.135
S4 (5000×2)	0.0231	0.0432	0.698	0.0684	0.0168	0.443

5.3 Discussion

AKFCM improves upon the clustering performance of the FCM family of algorithms in two aspects: clustering accuracy and influential area detection. The former describes whether data are clustered correctly, and the latter indicates whether the natural cluster structure is properly extracted.

Of the previous algorithms, mFCM-M and mKFCM consider the structure of each cluster. However, mKFCM fails to achieve more reasonable influential areas because it is insensitive to the influential direction of each cluster. In addition, mFCM-M shows decreased accuracy due to deviations in calculating the pseudo-inverse of the covariance matrix and in adjusting the influential range.

Combining the advantages of the mFCM-M and mKFCM algorithms, the AKFCM algorithm proposed in this paper both detects the influential direction and adjusts the influential range of each cluster. In the AKFCM algorithm, the influential direction of each cluster is detected by the first transformation, which uses the covariance matrix information, and the influential range of each cluster is adjusted by the second transformation, which uses adaptive kernelization. Therefore, the AKFCM algorithm can help cluster datasets composed of clusters with different directions and ranges in structure.

In this paper, only the basic modifications of the FCM algorithm are considered and discussed. For example, the FCM-M and mFCM-M algorithms introduce the Mahalanobis distance into FCM, while the KFCM and mKFCM algorithms adopt kernelization. These modifications can be further extended to different clustering algorithms [30–33, 37]. However, such extended algorithms may inherit the innate problems of their fundamental algorithms to varying degrees. This study introduces a novel fundamental algorithm (AKFCM) to the FCM family by combining the advantages of Mahalanobis distance-based FCM algorithms and kernel-based FCM algorithms. Moreover, as an evolution of its predecessors, this fundamental algorithm can overcome their innate problems by utilizing their complementary advantages.

The AKFCM algorithm is suitable for data sets containing clusters with varying influential ranges and directions. In particular, it handles general ellipsoid-shaped clusters well, as shown in Figs. 7 and 9. However, as shown in Table 2, the better clustering performance of AKFCM comes at the cost of a considerably longer execution time than the basic FCM algorithm. Therefore, AKFCM is more applicable when accuracy matters more than execution time. Apart from real-time analysis, most clustering tasks can be conducted within a sufficient time window, and in applications such as tumor recognition, disease diagnosis and long-term anomaly detection, clustering accuracy comes first in order to avoid misclassification and misjudgment whenever possible.

Furthermore, several recent studies have improved the clustering performance of the FCM family of algorithms, including work on automatic variable weights [29], cluster number [59], sparse regularization [60], ensemble clustering [61] and local intensity inhomogeneities [41]. These techniques, which are not applied in this paper, could help further extend the AKFCM algorithm in future work.

Note that this paper focuses only on the FCM family of algorithms; other clustering algorithms, e.g. density-based algorithms, are not considered. As they are fundamentally different, more benchmark sets would be needed to compare them thoroughly, which is another important topic for future study.

6 Conclusions

Over years of research and testing, the basic FCM clustering algorithm has evolved into multiple modified versions. FCM algorithms based on Mahalanobis distance (FCM-M and mFCM-M) effectively detect directional information from the dataset structure. FCM algorithms based on kernelization (KFCM and mKFCM) provide an interface to adjust the influential range of clusters. Combining the advantages of these clustering algorithms, the AKFCM algorithm based on cluster structure is proposed in this paper. Compared with its predecessors, AKFCM focuses on improving FCM with more complete cluster structure information. The kernel width parameters in AKFCM are not constants but adaptive parameters that change according to the structural information (including directional and range information) of different clusters. The results show that the AKFCM algorithm improves upon the clustering performance of the FCM family of algorithms not only in clustering accuracy but also in extracting reasonable influential areas of clusters. Future research can be devoted to reducing the running time of the algorithm, developing extended versions of the algorithm and applying the algorithm in practice.


Acknowledgments

The authors would like to thank Dr. Y. Du (Tsinghua University), Dr. M. Xu (Tsinghua University) and Terigele (Dalian Medical University) for their valuable research insights and kind support in the preparation of the paper.

References

Rui and Donald, Survey of clustering algorithms, IEEE Transactions on Neural Networks 16 (2005), 645–678.

Du, Wu and Xu, Leveraging longitudinal driving behaviour data with data mining techniques for driving style analysis, IET Intelligent Transport Systems 9 (2015), 792–801.

Liu X.W. and Wang L.S., Computing the maximum similarity bi-clusters of gene expression data, Bioinformatics 23 (2007), 50–56.

Nanthini and Devi R.M., Adaptive fuzzy C-Means for human activity recognition, 2014 International Conference on Information Communication and Embedded Systems (ICICES) (2014), 1–5.

Verma, Agrawal R.K. and Sharan, An improved intuitionistic fuzzy C-Means clustering algorithm incorporating local information for brain image segmentation, Applied Soft Computing 46 (2016), 543–557.

Hruschka E.R., Campello R.J.G.B., Freitas A.A. and de Carvalho A.C.P.L.F., A Survey of Evolutionary Algorithms for Clustering, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 39 (2009), 133–155.

Dunn J.C., A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, Journal of Cybernetics 3 (1973), 32–57.

Bezdek J.C., Ehrlich and Full, FCM: The fuzzy C-Means clustering algorithm, Computers & Geosciences 10 (1984), 191–203.

Wang Y.N., Li C.S., Zuo and A., Selection Model for Optimal Fuzzy Clustering Algorithm and Number of Clusters Based on Competitive Comprehensive Fuzzy Evaluation, IEEE Transactions on Fuzzy Systems 17 (2009), 568–577.

10. Lin, Huang, Kuo C.H. and Lai Y.H., A size-insensitive integrity-based fuzzy C-Means method for data clustering, Pattern Recognition 47 (2014), 2042–2056.

11. Havens T.C., Bezdek J.C., Leckie, Hall L.O. and Palaniswami, Fuzzy C-Means Algorithms for Very Large Data, IEEE Transactions on Fuzzy Systems 20 (2012), 1130–1146.

12. Kesemen, Tezel and Özkul, Fuzzy C-Means clustering algorithm for directional data (FCM4DD), Expert Systems with Applications 58 (2016), 76–82.

13. Stetco, Zeng and Keane, Fuzzy C-Means++: Fuzzy C-Means with effective seeding initialization, Expert Systems with Applications 42 (2015), 7541–7548.

14. Namburu, Samayamantula S.K. and Edara S.R., Generalised rough intuitionistic fuzzy C-Means for magnetic resonance brain image segmentation, IET Image Processing 11 (2017), 777–785.

15. Liu, Hou, Kang and Liu, Unsupervised Binning of Metagenomic Assembled Contigs Using Improved Fuzzy C-Means Method, IEEE/ACM Transactions on Computational Biology & Bioinformatics 14 (2017), 1459–1467.

16. Liu, Hou and Liu, Improving fuzzy C-Means method for unbalanced dataset, Electronics Letters 51 (2015), 1880–1882.

17. Feng, Dong, Xia, Hu C.H., Fan, Hu, Gao and Mutic, An adaptive Fuzzy C-Means method utilizing neighboring information for breast tumor segmentation in ultrasound images, Medical Physics 44 (2017), 3752.

18. Memon K.H. and Lee, Generalised fuzzy C-Means clustering algorithm with local information, IET Image Processing 11 (2017), 1–12.

19. Zhao, Cheng and Cheng, Neighbourhood weighted fuzzy C-Means clustering algorithm for image segmentation, IET Image Processing 8 (2014), 150–161.

20. Cao and Wang Y.P., Segmentation of M-FISH Images for improved classification of chromosomes with an adaptive fuzzy C-Means clustering algorithm, IEEE Transactions on Fuzzy Systems 20 (2012), 1–8.

21. Dubey Y.K. and Mushrif M.M., FCM Clustering Algorithms for Segmentation of Brain MR Images, Advances in Fuzzy Systems (2016), Article ID 3406406, 1–14.

22. C.L. and C.S., New shadowed fuzzy C-Means algorithm for image segmentation, 2016 3rd International Conference on Informative and Cybernetics for Computational Social Systems (ICCSS) (2016), 43–46.

23. Guo F.F., Wang X.X. and Shen, Adaptive fuzzy C-Means algorithm based on local noise detecting for image segmentation, IET Image Processing 10 (2016), 272–279.

24. Meng and Zhang, Research on applying KFCM algorithm to source code mining, Computer Engineering and Design 31 (2010), 2249–2252.

25. Yang A.M., Jiang L.M. and Zhou Y.M., A KFCM-based fuzzy classifier, Fourth International Conference on Fuzzy Systems and Knowledge Discovery, Haikou (2007), 80–84.

26. Mahajan S.M. and Dubey Y.K., Color Image Segmentation Using Kernalized Fuzzy C-Means Clustering, 2015 Fifth International Conference on Communication Systems and Network Technologies (CSNT) (2015), 1142–1146.

27. Yang, Zhang, Lu and Ma, A Kernel fuzzy C-Means clustering-based fuzzy support vector machine algorithm for classification problems with outliers or noises, IEEE Transactions on Fuzzy Systems 19 (2011), 105–115.

28. Pal N.R. and Sarkar, What and when can we gain from the Kernel versions of C-Means algorithm? IEEE Transactions on Fuzzy Systems 22 (2014), 363–379.

29. Ferreira M.R.P. and de Carvalho F.D.A.T., Kernel fuzzy C-Means with automatic variable weighting, Fuzzy Sets and Systems 237 (2014), 1–46.

30. Huang H.C., Chuang Y.Y. and Chen C.S., Multiple Kernel fuzzy clustering, IEEE Transactions on Fuzzy Systems 20 (2012), 120–134.

31. Dzung D.N. and Long T.N., Multiple kernel interval type-2 fuzzy C-Means clustering, 2013 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Piscataway, NJ, USA (2013), 1–8.

32. Chen, Philip Chen C.L. and Lu M., A multiple-Kernel fuzzy C-Means algorithm for image segmentation, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 41 (2011), 1263–1274.

33. Aik L.E. and Choon T.W., An Incremental Clustering Algorithm Based on Mahalanobis Distance, AIP Conference Proceedings (2014), 788–793.

34. Liu H.C., Yih J.M., Wu D.B. and Liu S.W., Fuzzy C-mean algorithm based on complete Mahalanobis distances, Proceedings of 2008 International Conference on Machine Learning and Cybernetics (2008), 3569–3574.

35. Zhang, Li Z.R., Cai J.Y. and Wang J.Y., Image Segmentation Based on FCM with Mahalanobis Distance, Lecture Notes in Computer Science 6377 (2010), 205–212.

36. Zhao X.M., Li and Zhao Q.H., Mahalanobis distance based on fuzzy clustering algorithm for image segmentation, Digital Signal Processing 43 (2015), 8–16.

37. Hamasuna, Endo and Miyamoto, On Mahalanobis Distance Based Fuzzy C-Means Clustering for Uncertain Data Using Penalty Vector Regularization, IEEE International Conference on Fuzzy Systems (2011), 810–815.

38. Melnykov and Melnykov, On K-means algorithm with the use of Mahalanobis distances, Statistics & Probability Letters 84 (2014), 88–95.

39. Krishnapuram and Kim, A note on the Gustafson-Kessel and adaptive fuzzy clustering algorithms, IEEE Transactions on Fuzzy Systems 7 (1999), 453–461.

40. Liu, Yang, Zhou, Zhang, Fei and Tu, Robust dataset classification approach based on neighbor searching and kernel fuzzy C-Means, IEEE/CAA Journal of Automatica Sinica 2 (2015), 235–247.

41. Dougherty A.W. and You, A kernel-based adaptive fuzzy C-Means algorithm for M-FISH image segmentation, International Joint Conference on Neural Networks (IJCNN) (2017), 198–205.

42. Xie X.L. and Beni, A validity measure for fuzzy clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (1991), 841–847.

43. Kwon S.H., Cluster validity index for fuzzy clustering, Electronics Letters 34 (1998), 2176–2177.

44. Pal N.R. and Bezdek J.C., On cluster validity for the fuzzy C-Means model, IEEE Transactions on Fuzzy Systems 3 (1995), 370–379.

45. Jain A.K., Murty M.N. and Flynn P.J., Data clustering: A review, ACM Computing Surveys 31 (1999), 264–323.

46. Graves and Pedrycz, Kernel-based fuzzy clustering and fuzzy clustering: A comparative experimental study, Fuzzy Sets and Systems 161 (2010), 522–543.

47. Bezdek J.C., A convergence theorem for the fuzzy isodata clustering algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence 2 (1980), 1–8.

48. Selim S.Z. and Ismail M.A., K-means-type algorithms: A generalized convergence theorem and characterization of local optimality, IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (1984), 81–87.

49. Shen, Yang, Wang and Liu, Attribute weighted mercer kernel based fuzzy clustering algorithm for general non-spherical datasets, Soft Computing 10 (2006), 1061–1073.

50. Chiu S.L., Fuzzy model identification based on cluster estimation, Journal of Intelligent & Fuzzy Systems 2 (1994), 267–278.

51. Fränti and Virmajoki, Iterative shrinking method for clustering problems, Pattern Recognition 39 (2006), 761–775.

52. Rand, Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association 66 (1971), 846–850.

53. Hubert and Arabie, Comparing partitions, Journal of Classification 2 (1985), 193–218.

54. Lichman, UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences (2013).

55. Fisher R.A., The use of multiple measurements in taxonomic problems, Annals of Human Genetics 7 (1936), 179–188.

56. Aeberhard, Coomans and Vel O.D., The classification performance of RDA, Technical Report 92-01, Department of Computer Science and Department of Mathematics and Statistics, James Cook University, North Queensland, Australia (1992).

57. Street W.N., Wolberg W.H. and Mangasarian O.L., Nuclear feature extraction for breast tumor diagnosis, IS&T/SPIE’s Symposium on Electronic Imaging: Science and Technology (1993), 861–870.

58. Gionis, Mannila and Tsaparas, Clustering aggregation, ACM Transactions on Knowledge Discovery from Data (TKDD) 1 (2007), Article 4, 1–30.

59. Yang and Nataliani, Robust-learning fuzzy C-Means clustering algorithm with unknown number of clusters, Pattern Recognition 71 (2017), 45–59.

60. Chang, Wang, Liu and Wang, Sparse regularization in fuzzy C-Means for high-dimensional data clustering, IEEE Transactions on Cybernetics 47 (2017), 2616–2627.

61. Xin, Hao, Hong, Guannan and Maobo, Ensemble clustering via fuzzy C-Means, 2017 International Conference on Service Systems and Service Management (2017), 1–6.