Abstract
The well-known Fuzzy C-Means (FCM) algorithm and its modified clustering derivatives have been widely applied in various fields. However, previous studies have focused on the yield of correctly clustered data, and few have addressed the alignment of extracted influential areas of clusters to natural cluster structure. Various clustering algorithms present diverse characteristics in cluster structure detection due to the different clustering principles involved. For example, Mahalanobis distance-based FCM algorithms effectively detect the influential direction of each cluster, while kernel-based FCM algorithms provide an interface for adjusting the influential range. Combining the advantages of these previous algorithms, the Adaptive Kernel Fuzzy C-Means (AKFCM) algorithm based on cluster structure is proposed in this paper. The AKFCM algorithm can effectively detect the influential direction and adjust the influential range of each cluster with adaptive kernelization. By applying the previous and AKFCM algorithms to both synthetic and real-world datasets, the proposed algorithm is proven to achieve better performance not only in clustering accuracy but also in the extraction of reasonable influential areas. The proposed algorithm could be helpful for clustering datasets composed of clusters with different directions and ranges in structure.
Introduction
Clustering, an unsupervised discrimination process applied to homogeneous data features with little or no prior knowledge [1], has made outstanding contributions in several domains such as engineering, medicine, image processing and pattern recognition [2–6]. As one of the basic types of clustering, centroid-based clustering plays an important role in the development of clustering algorithms [7]. The Fuzzy C-Means (FCM) clustering algorithm [8, 9], in which each data point is shared among multipleclusters, is one of the most widely accepted and applied clustering algorithms. Many extensive studies have been conducted on the family of FCM algorithms from different perspectives, including the influence of fuzzy factors, cluster number selection, initialization sensitivity, the introduction of intuitionistic fuzzy sets, the difficulty of dealing with unbalanced or large-scale data sets and the constraints of solely detecting identical super spherical shapes [1, 10–16, 1, 10–16].
Using the FCM algorithm as a meta/basic algorithm, several modified algorithms have been introduced by considering the relationships among data from different angles. In these algorithms, there are two general research modes: one is incorporating the neighboring region information around the data (region-based) and the other is integrating the structure information of cluster (structure-based). The region-based studies, which consider the relationship between data and neighboring region, have been mainly applied in image segmentation [17]. Most of these studies only need to handle single feature input data, i.e. intensity level feature [18]. In the image, the data (intensity level feature of image pixel) is closely associated with its neighboring region information. The neighborhood pixels usually hold similar intensity level feature, and the higher similarity means higher probability of belonging to the same cluster [19]. Therefore, by taking into account the local spatial image information, the problems such as local intensity inhomogeneities [17, 21] and noise corruption [18, 21–23], can be settled as the image pixels are not regarded as separated points any more. However, except for the image data, very few datasets can be incorporated with such clear contextual information as in the images. For this reason, the structure-based studies have been carried out to integrate the structure information of cluster into the clustering process. The data that better aligns with the cluster structure is more likely to belong to the corresponding cluster. Based on this principle, some studies have adopted kernelization to reflect the relationships between data and clusters, which go beyond the general distance ties [24–32]. While some studies have achieved proper clustering performance by employing Mahalanobis distance to bring the structure information of clusters into calculation [33–39]. The details of the typical structure-based algorithms are introduced in Section 3. More recently, the research on using both of the structure-based and region-based techniques has also been implemented [40, 41]. However, even though the structure information of cluster has been integrated into the clustering process in different extent, few researches have focused on and pertinently analyzed the measure of cluster structure, the influential areas of clusters, which are further introduced and discussed in this paper. This may inhibit effectively exploiting the natural cluster structure and thereby developing improved clustering algorithms.
The influential areas of a cluster are mappings of the data aggregation structure within a cluster. The directional and range information of the influential areas are called influential direction and influential range respectively in this paper. The influential areas extracted by the family of FCM algorithms can be described by the fuzzy membership degree of the data points belonging to the clusters. In Fig. 1, the influential areas of the illustrated cluster are indicated by varied colors, and their important envelope lines are also drawn. In the figure, data point A and B hold same distance to the cluster centre, and thus, by only considering the general distance in clustering, the attraction of the cluster for both points keeps identical. However, compared with point B, point A’s participation is more in line with the natural cluster structure, which makes its membership degree larger to get more reasonable clustering results. By constructing the influential area that is more consistent with cluster structure, the clustering accuracy and validity of membership degrees can be improved. Different clustering methods can generate diverse influential areas of clusters in accordance with various clustering principles. For example, the kernelization technique is more concentrated on adjusting the influential range of cluster, while the Mahalanobis distance measure is better at detecting the influential direction of cluster. However, the clustering performance is still limited, as few existing algorithms simultaneously consider and effectively tailor both the influential directions and ranges of clusters for achieving reasonable influential areas of clusters that align well with natural structure.

The influential areas of a cluster.
Therefore, we developed a clustering algorithm that exhibits reliable performance not only in clustering accuracy but also in extracting reasonable influential areas of clusters. There are two primary contributions in our work: The Adaptive Kernel Fuzzy C-Means (AKFCM) clustering algorithm based on cluster structure is proposed in this paper through an analysis of existing algorithms in the FCM family. A novel clustering validity index considering the influential areas of clusters is also introduced in addition to the conventional rand index to assess the effectiveness of the clustering algorithms.
The paper is organized as follows: Section 2 includes and summarizes the related works; Section 3 introduces a set of related algorithms including basic FCM, FCM based on Mahalanobis distance (FCM-M) and its modified algorithm, and Kernel Fuzzy C-Means (KFCM) and its modified algorithm; Section 4 proposes and details the Adaptive Kernel Fuzzy C-Means (AKFCM) clustering algorithm and novel clustering validity index; Section 5 analyzes and discusses the clustering performance of various previous algorithms versus the proposed algorithm from both the perspective of clustering accuracy and influential areas; Section 6 concludes the paper.
Many adaptive FCM family algorithms have been raised and analyzed in the previous researches.
Nanthini and Devi [4] applied an adapting skipping method to reduce the training time of FCM by giving varying degrees of attention on the correctly and incorrectly classified data sample. By using the skipping factor (SF), the data sample is labeled differently (SF = 0 or SF = 1) in each iteration of the algorithm. In this way, the correctly classified data sample is skipped and extra calculation is avoided. The size of the data sample incorporating into computation is adaptively changed according to the skipping factor values.
For images, Feng et al. [17] and Guo et al. [23] also put forward the adaptive versions of the FCM by adaptively selecting the neighbor regions with respect to each pixel of image based on mutual information and noise detection. These region-based studies are appropriate for image segmentation due to the clear contextual information given by image data. The adaptiveness of the algorithms is mainly incarnated in considering the heterogeneity of neighboring information. Thus, for the homogeneous data, these adaptive algorithms would lose their magic.
Searching for and capturing the hidden characteristics is a good way to reasonably cluster the homogeneous data. If the clusters are with similar structure, then all data points can be converted according to the identical specified rule. M. Meng and Y. Zhang [24] utilized kernelization rule for source code mining. A. M. Yang et al. [25] developed a KFCM-based fuzzy classifer by selecting an appropriate kernel function and its parameters. X. Yang et al. [27] integrated the KFCM into support vector machine algorithm and found the algorithm is robust for classification problems. Except for the kernelization, other kinds of conversions have also been tried and tested. Y. Feng et al. [17] utilized the Hausdorff distance based FCM to segment MR images effectively. O. Kesemen et al. [12] porposed a fuzzy C-Means clustering algorithm adapted for directional data based on angular difference. In the study, the data with the form of a periodic value is converted into circular data. L. E. Aik and T. W. Choon [33] replaced the Euclidean distance of FCM with Mahalanobis distance and increased the training accuracy. Based on FCM with Mahalanobis distance, Yong et al. [35] also found the improvement on image segmentation.
In the above mentioned studies, the parameters of kernelization, Mahalanobis distance calculation and other conversions are usually same to all clusters, which means the structures of all clusters obey an identical rule. However, the assumption of having similarly structured clusters is not always true in the data analysis.
To discover the unique characteristics of clusters, the adaptive process is needed. Unlike adaptively making changes on the clustering data size and neighboring region, different hidden characteristics of clusters can be extracted by adaptively detecting and adjusting the structure information of each cluster. The structure information of clusters, the influential directions and influential ranges, can be considered into clustering process by adopting varied kernel parameters and covariance matrices for different clusters.
Ferreira and de Carvalho [29] automatically assigned weights for different variables in KFCM. Actually, it can be seen as an adaptive approach of selecting the kernel parameters for different clustering dimensions. But this study mainly focused on the dissimilarity of variables rather than clusters. Huang et al. [30] proposed a multiple kernel fuzzy clustering algorithm by mapping the original data features to new feature spaces. It demonstrates the effectiveness of using multiple kernels, but the learned kernel parameters may have little correlation with cluster structure. Similar attempts were also made by Dzung and Long [31], in which different kernels were further combined to a new kernel. Chen et al. [32] also utilized multiple kernels and their composite kernels to fuse different information in clustering. Nevertheless, very few researches have discussed the selection of different kernels matching different clusters, which can incorporate the specific range information of cluster into computation.
Liu et al. [34] proposed FCM based on complete Mahalanobis distance by considering both of the overall covariance matrix and local covariance matrix. Melnykoy and Melnykov [38] extended K-means algorithm to Mahalanobis distance based algorithm by estimating the local covariance matrices. Similarly, Zhao et al. [36] and Hamasuna et al. [37] calculated the Mahalanobis distance with the covariance matrix of each cluster, which gives the opportunity to consider different influential directions of clusters. However, in some cases such as occurring invalid inverse of covariance matrix and requiring influential range adjustment, these algorithms could still experience difficulties.
Overall, the adaptive detection of influential directions and adjustment of influential ranges need to be further integrated on cluster level, which is the main work of this study. Comparatively, our work will pay more attention on how to improve FCM based on more complete cluster structure information rather than relying on additional information (the intensity of neighboring region) or partial structure information (neglecting influential range or influential direction). In the following sections, by summarizing previous related work, we will introduce several representative algorithms and our proposed algorithm in detail.
Related algorithms
Fuzzy C-Means
The Fuzzy C-Means (FCM) clustering algorithm introduced in 1984 [9] as a generalization of ISODATA [8] has been widely applied due to its simplicity and efficiency. The main idea of the FCM algorithm is to cluster data into reasonable groups by grouping samples with close distance in feature space with each data point being shared by greater than one cluster. The FCM defines x k (k = 1, 2, . . . n) as the k-th data point in the target dataset, v i (i = 1, 2, . . . c) as the i-th center of the clusters, d ik (i = 1, 2, . . . c ; k = 1, 2, . . . n) as the Euclidean distance between the k-th data and i-th cluster center, and u ik (i = 1, 2, . . . c ; k = 1, 2, . . . n) as the fuzzy membership degree of the k-th data with respect to the i-th cluster (u ik is equal to 0 or 1 in the Hard C-Means clustering algorithm). In this paper, lowercase symbols, bold lowercase symbols and bold capitalized symbols represent values (1-dimension), vectors and matrices respectively.
The objective function of the FCM can be written as Equation (1) where m is a fuzzy factor.
The membership degrees of the data must also obey Equation (2) and, of course, u
ik
⩾ 0.
To minimize the objective function under the constraint of Equation (2), the gradient of its Lagrangian function on the unknown variables (U and V matrices) can be calculated. Equations (3) and (4) provide updated membership degrees and centers via iterative computation.
The FCM algorithm proceeds as follows. Set cluster number c, fuzzy factor m, stopping condition ɛ, and maximum iteration number T. Initialize the cluster centers Calculate the updated membership degrees Calculate the updated centers If ∥V(t) - V(t-1) ∥ < ɛ or t = T, then exit; if not, let t = t + 1 and return to step (c).
To obtain improved clustering results, the best cluster number can be found by calculating a validity measure [42, 43]. However, because this process is not the essence of this study, the cluster numbers of the datasets are predefined based on the actual cluster numbers in this paper. The initialization of the cluster centers should also be considered; details on this step can be found in Section 4.4. For the sake of simplicity and clarity, we set the fuzzy factor value m equal to 2.0 in the following sections as many users prefer this value of m [44].
The traditional FCM clustering algorithm is based on the Euclidean distance metric and works well for detecting clusters with super spherical shapes. However, the assumptions of the Euclidean distance metric, which require equivalence and independence in the feature space of the target dataset, are not always valid in practice. Therefore, an FCM algorithm based on Mahalanobis distance was proposed, and many modified forms of it have been studied [33–39].
The advantage of the Mahalanobis distance is that it takes the structure of a dataset into account (by achieving the covariance matrix) in calculating the distance between data vectors. A distance metric shows the difference between data points [45], and the Mahalanobis distance metric can adequately express varied relationships with different aggregation structures. From this point of view, the Fuzzy C-Means clustering algorithm based on Mahalanobis distance (FCM-M) may be more appropriate for analyzing correlated data that show clear correlation in the feature space than the FCM algorithm. The squared Mahalanobis distance is defined in Equation (5):
The objective function of the FCM-M algorithm can be written as Equation (6):
By calculating the corresponding Lagrangian function, the updated membership degrees can be obtained as shown in Equation (7). The relevant Mahalanobis distance is easily found according to Equation (5). The calculation of the updated centers is identical to Equation (4) in the FCM algorithm. To focus more attention on analyzing the characteristics of the algorithm, this derivation is not presented in this paper. The FCM-M algorithm proceeds similarly to the FCM algorithm; only the calculation of the updated membership degrees at step (c) is changed from Equations (3) to (7).
Unlike the FCM-M algorithm, the main idea of the modified Fuzzy C-Means based on Mahalanobis distance (mFCM-M) is to consider the structure of each cluster rather than solely exploit the overall structure of the dataset. The covariance matrix of each cluster is associated into the computation, which helps the mFCM-M detect the influential structure of clusters. The objective function is written asEquation (8):
The updated membership degrees and centers can be obtained via iterative calculation according to Equations (9) and (4). The covariance matrix of each cluster is calculated based on the data points belonging to the cluster, as shown in Equation (10):
The mFCM-M algorithm proceeds as follows. Set cluster number c, fuzzy factor m, stopping condition ɛ, and maximum iteration number T. Initialize the cluster centers Calculate the inverse of the covariance matrix of each cluster Calculate the updated membership degrees Calculate the updated centers If ∥V(t) - V(t-1) ∥ < ɛ or t = T, then exit; if not, let t = t + 1 and return to step (c).
Due to the increasing popularity of the kernel function, the Kernel Fuzzy C-Means (KFCM) clustering algorithm and its modified versions have been gradually applied in various fields [24–32]. The widely accepted Gaussian Radial Basis Function (GRBF) kernel shown in Equation (11) is used in the KFCM algorithm in this paper. The kernel function elaborates the relationship of data points that have been mapped into a higher dimensional space:
The objective function of the KFCM algorithm addresses the distance of mapped values in high dimensional space, which is written as Equation (12). The updated membership degrees and centers can be derived via iterative calculation and written as Equations (13) and (14). The derivation is not presented in this paper; however, interested readers can refer to [46] for details.
The KFCM algorithm proceeds as follows. Set cluster number c, fuzzy factor m, kernel width parameter k
b
, stopping condition ɛ, and maximum iteration number T. Initialize the cluster centers Calculate the kernel matrix K(t) (x, v) in iteration t according to Equation (11). Calculate the updated membership degrees Calculate the updated centers If ∥V(t) - V(t-1) ∥ < ɛ or t = T, then exit; if not, let t = t + 1 and return to step (c).
By mapping the data into a high dimensional space, the KFCM algorithm can expand the differences among the features of the dataset to gain higher efficiency and clustering performance. However, the determination of a favorable value of the kernel width parameter k
b
has not been properly specified in previous studies. The kernel width parameter is used to change the correlation between data points and centers according to the specified range. Too large or too small kernel width parameter can lead to insensitive and invalid correlation change, which further affects achieving accurate clustering results and reasonable influential areas. Therefore, combining with the empirical performance, the mean value of the distance between cluster centers and data points is used to select the kernel width parameter k
b
in this paper, as shown in Equation (15).
The single value of the kernel width parameter for an entire dataset limits the delineation of the influential ranges of clusters, as the influential range may vary on each cluster and each dimension. The modified Kernel Fuzzy C-Means (mKFCM) clustering algorithm with additional kernel width parameters is a viable option. The mKFCM introduced in this paper considers various kernel width parameters for different clusters and dimensions according to Equation (16):
The objective function, updated membership degrees and centers of the mKFCM algorithm can be written as Equations (17)–(19) respectively:
The mKFCM algorithm proceeds similarly to that of the KFCM algorithm and is not presented here due to the length limit of the paper.
Defects of the previous algorithms
The aforementioned FCM, FCM-M and KFCM algorithms fail to detect the distinct influential direction and range of each cluster, which hinders a better understanding of natural cluster structure. In contrast, the mFCM-M algorithm effectively discovers the influential structure of clusters by computing the inverse of their covariance matrices. Nevertheless, two problems remain: The inverse of the covariance matrix may not exist for certain clusters in the calculation of the Mahalanobis distance. Although the pseudo inverse of a covariance matrix can be used alternatively, it can reduce the accuracy of the computation. Clustering results on the synthetic dataset DS3 exhibit this problem in Section 5. As we use the threshold value of the membership degree to detect the core structure of each cluster, the varying data densities of the core areas can lead to influential range deviation.
To address the problems associated with the mFCM-M algorithm, the mKFCM algorithm provides an opportunity for adjusting the influential range without taking the inverse computation. This is accomplished by using varying kernel width parameters for different clusters. However, the mKFCM algorithm is insensitive to detecting the influential direction of each cluster. Taking advantage of the mFCM-M and mKFCM algorithms, a new algorithm is proposed in this paper to ensure both the detection of the influential direction and the adjustment of the influential range of each cluster.
Adaptive kernel fuzzy C-Means
By combining the advantages of the former structure-based algorithms, the Adaptive Kernel Fuzzy C-Means (AKFCM) clustering algorithm is proposed in this paper. Unlike the region-based adaptive algorithms, the AKFCM is devoted to adaptively integrate the cluster structure information into the clustering process. Moreover, compared with the former structure-based algorithms, the AKFCM aims to extract more reasonable influential areas that align well with natural cluster structure by simultaneously tailoring the influential direction and influential range of each cluster. In the AKFCM algorithm, the objective function is rewritten as Equation (20). It can be seen that two types of feature space transformation are adopted: the first transformation is for detecting the influential directions of clusters, in which the uncorrelated features for each cluster are built up using the directional information of the covariance matrix; the second transformation is for adjusting the influential ranges of clusters, in which the influential ranges in the uncorrelated feature space are determined using the kernel function.
To minimize the objective function, the gradient of the Lagrangian function (which also addresses the restrictions of the membership degrees in Equation (2)) on the unknown variables (U and w (V)) can be calculated. The Lagrangian function is shown in Equation (21), where λ is the Lagrangian parameter. The closed-form formulas for updating unknown variables (U and w (V)) are derived by taking the partial derivatives on the Lagrangian function. Through a specific operation and conversion, the updated membership degrees and centers in the uncorrelated feature space can be derived and written as Equations (22) and (23). The updated centers V in the original feature space can be easily found through a corresponding reverse transformation.
The kernel width parameters in the AKFCM algorithm are not constants but rather adaptive changing parameters based on the structural information of different clusters. The first transformation detecting the directional information of each cluster is obtained according to Equation (24). This transform function is to map the current data set x to the uncorrelated feature space by rotating x with specified directions. For each cluster, the transform function is varied due to the different assembling direction of cluster. By multiplying the given data with the eigenvectors of the covariance matrix of cluster, the data based on the new coordinates that align with the assembling direction of cluster can be achieved. This can make each cluster judge the unlabelled data according to the cluster’s own directional preferences.
The second transformation determining the influential range of each cluster is achieved by employing different kernel width parameters in the kernel function for each cluster and dimension according to Equations (25)–(27). The range of the core data in each cluster and dimension is measured by Equation (25). Meanwhile, as the core data of each cluster can be detected with varying degrees, adjustments to the core data range are made by Equations (26) and (27) (based on the data density of the clusters; higher density can lead to greater range adjustments):
The AKFCM algorithm proceeds as follows. Set cluster number c, fuzzy factor m, stopping condition ɛ, and maximum iteration number T. Initialize the cluster centers Detect the influential direction of each cluster and transform the original data and centers into the uncorrelated feature space corresponding to each cluster according to Equation (24). Calculate and adjust the influential range of each cluster in the transformed feature space and adaptively determine the kernel width parameter for each cluster on each dimension according to Equations (25)–(27). Calculate the kernel matrix Calculate the updated membership degrees Calculate the updated centers in the uncorrelated feature space w (V) (t) = {w
i
(v
i
) (t) ; i = 1, 2, . . . c} in iteration t according to Equation (23) and retrieve the updated centers If ∥V(t) - V(t-1) ∥ < ɛ or t = T, then exit; if not, let t = t + 1 and return to step (c).
The time complexity of the AKFCM algorithm is O(cdnt), where c is the number of clusters, d is the number of dimensions, n is the number of data and t is the iteration number. The pseudo-code of the AKFCM algorithm is as follows:
The convergence of AKFCM is proven as follows through referring to the previous studies on the convergence of the FCM family of algorithms [29, 47–49]. Let U = {u ik ; i = 1, 2, . . . c ; k = 1, 2, . . . n}. φ = {φ ip (w i (v ip )) ; i = 1, 2, . . . c ; p = 1, 2, . . . d}
Since m > 1, u ij > 0 and ∥φ ip (w i (x jp )- φ ip (w i (v ip ) ∥ 2 > 0, the Hessian of the Lagrangian function of U is positive definite. Thus, Equation (22) is sufficient and can achieve the local minimum of the objective function.
Proof. The derivation of the Lagrangian function of φ confirms the only-if part. Meanwhile, as the Lagrangian function of φ is unconstrained and strictly convex function, the local minimum can be obtained through first derivative computation.□
According to Theorems 1 and 2, it can be proved that the objective function is a decreasing function and the AKFCM algorithm is locally convergent.
In the proposed AKFCM algorithm, the core structure of each cluster is detected in accordance with the data belonging to it. However, initial detection may be ineffective as the clustering centers are randomly initialized in the FCM family of algorithms. Therefore, an effective initialization process is implemented in this paper to avoid unnecessary computation.
A clustering center’s distance to the data within the cluster should be relatively small, and its distance to other clustering centers should be relatively large. In light of this point, a modified initialization process of the clustering centers V = {v
i
; i = 1, 2, . . . c} is proposed as follows. Find the alternative cluster centers Pick the two data points with the greatest distance from the alternative cluster centers and add them to the list of clustering centers V. If the current size of V equals the number of clustering centers c, then the initialization process terminates; if not, it continues to step (d). Calculate the minimum distance Pick the data point with the largest minimum distance
The initialization process was tested by executing the basic FCM algorithm on synthetic data sets S1, S2, S3 and S4, which have 5000 data points for 15 predefined clusters with varying degrees of overlap [51]. Proper initialization of the clustering centers can result in improved clustering results. Figure 2 shows the clustering results of the basic FCM algorithm using random initialization and this modified initialization process. The basic FCM algorithm using random initialization was carried out 1000 times on each synthetic data set. The Adjusted Rand (AR) index of the clustering results is used to assess the clustering accuracy, which is calculated according to Equations (29) and (30). We defuzzify the clustering results by determining the affiliated cluster of data according to its largest membership degree. A higher AR index value represents greater accuracy. From Fig. 2, it can be seen that the clustering accuracy of the basic FCM algorithm with modified initialization is much closer to the maximum clustering accuracy of the basic FCM algorithm using random initialization. Thus, the modified initialization of the clustering centers was adopted in the following analysis to avoid unnecessary and ineffective computation.

Clustering results using different initialization processes.
Clustering accuracy can be properly evaluated by the Adjusted Rand (AR) index [52, 53], which is a measure of the similarity between two data clusterings: clustering results and data real labels. However, to our knowledge, there is still a lack of an effective index for assessing reasonability of the produced clustering influential areas. In light of this point, a new clustering validity index called the Influential Area (IA) index is proposed in this study. The IA index of a data clustering is calculated as follows:
According to Equation (31), the IA index of a cluster is the ratio between the influential area of the clustering results and the area of the natural structure of the data. As we take the minimum membership degree of each cluster as the threshold, all of the data belonging to a specific cluster are included in the boundaries of the constructed influential area. In other words, the constructed influential area is larger than the area of the data. Thus, the IA index has a value greater than 1, with 1 indicating that the influential areas of clusters exactly align with the natural structure of the data. Generally, the IA index cannot be 1 as the boundaries of data with identical labels rarely achieve completely consistent membership degrees. However, an IA index value closer to 1 is still preferable for portraying influential structure similar to the natural structure of the data. In Fig. 3, which shows the different IA indexes of two clustering algorithms, the depicted structure of algorithm A (IA index value closer to 1) matches the natural structure of the data better than that of algorithm B. The mean value of the IA indexes of all the clusters is used to assess the validity and rationality of the influential areas of clusters in the following section.

Influential area (IA) index.
With respect to the numerous cluster validity indexes in literature, the proposed Influential Area index is able to check whether the achieved membership degrees (depicting influential area) are appropriate by comparing the membership-built area and label-built area. Thus, the index is fit for the fuzzy clustering algorithms that output membership degrees and the data with real labels. Also note that the index is more suitable to describe convex clusters as the area is calculated according to the convex hull of the target data.
To clearly discuss the characteristics of the clustering algorithms in detail, the performance of the previous algorithms are analyzed first in this section, and a comparative analysis with the proposed AKFCM algorithm is discussed in the following section.
Previous algorithms
The FCM, FCM-M, mFCM-M, KFCM and mKFCM algorithms were applied to synthetic datasets DS1, DS2 and DS3 (each dataset has 1200 data points) as shown in Figs. 4–6. The AR and IA indices are used to assess the accuracy of the clustering results and rationality of the influential areas of clusters produced. Higher AR index values and lower IA index values signify improved clustering performance.

Clustering performance of previous algorithms on the DS1 dataset.

Clustering performance of previous algorithms on the DS2 dataset.

Clustering performance of previous algorithms on the DS3 dataset.
The clustering performance of traditional FCM and FCM-M on synthetic dataset DS1 is given in Figs. 4(a1), 4(a2), 4(b1) and 4(b2). The accuracy of FCM-M (AR Index = 0.978) is higher than that of FCM (AR Index = 0.867). More importantly, the influential areas of FCM-M appear more rational (IA Index = 1.150) than those of FCM, considering the overall structure of the dataset. However, because it solely considers overall structure, FCM-M may perform as well as or even worse than FCM when the clusters in a dataset have different structures with respect to the overall structure. For example, in Figs. 5(a1), 5(a2), 5(b1) and 5(b2) on synthetic dataset DS2, the clustering performance of FCM (AR Index = 0.921; IA Index = 1.139) is much higher than FCM-M (AR Index = 0.45; IA Index = 1.179). The poor performance of FCM-M may result from its dependency on the overall structure of the dataset, as seen in the influential areas in Fig. 5(b2).
To overcome the defects of FCM-M, mFCM-M was applied to synthetic datasets DS1 and DS2. As shown in Figs. 4(c1), 4(c2), 5(c1) and 5(c2), increased overall performance (AR Index = 0.993 (DS1); IA Index = 1.156 (DS1); AR Index = 0.958 (DS2); IA Index = 1.132 (DS2)) over FCM and FCM-M was achieved, as the mFCM-M considers the sub-structure of each cluster. Intuitively, the influential areas of mFCM-M, particularly the influential directions, resemble the structure of the clusters. However, mFCM-M has its own shortcomings, which were mentioned in Section 4, the empirical evidence for which can be found in its clustering performance on synthetic dataset DS3. As shown in Fig. 6(c1) and 6(c2), the data in cluster 2 (blue) present a linear relationship in the uncorrelated feature space, making the inverse of the covariance matrix impossible to calculate. Thus, mFCM-M performs poorly because it uses the pseudo inverse of the covariance matrix (AR Index = 0.535).
The clustering performance of KFCM on synthetic datasets DS1, DS2 and DS3 is given in Figs. 4(d1), 4(d2), 5(d1), 5(d2), 6(d1) and 6(d2). The kernel width parameters for the three datasets are set according to Equation (20). The clustering results are not that bad, with AR Index = 0.867 (DS1); IA Index = 1.216 (DS1); AR Index = 0.926 (DS2); IA Index = 1.139 (DS2); AR Index = 0.756 (DS3); IA Index = 1.26 (DS3). However, there is still room for improvement in the optimization of the clustering results and influential areas. From the figures, it is fairly straightforward to see that the influential ranges of the clusters tend to be similar, which is actually caused by using the same kernel width parameter indiscriminately on different clusters.
The clustering performances of mKFCM on synthetic datasets DS1, DS2 and DS3 are shown in Figs. 4(e1), 4(e2), 5(e1), 5(e2), 6(e1) and 6(e2) (AR Index = 0.918 (DS1); IA Index = 1.218 (DS1); AR Index = 0.909 (DS2); IA Index = 1.142 (DS2); AR Index = 0.739 (DS3); IA Index = 1.308 (DS3)). Compared to the clustering performances of the KFCM, the influential areas, particularly the influential ranges, of mKFCM are varied among different clusters because mKFCM chooses the kernel width parameter separately for each cluster in each dimension. However, unfortunately, both KFCM and mKFCM are insensitive to detecting the influential direction of each cluster. Their influential directions of clusters are still based on horizontal and vertical coordinates (which is noticeable from the innermost envelopes of their influential areas), which may inhibit their ability to handle datasets with differently structured clusters, particularly those with different directional preferences.
The clustering performance of the proposed AKFCM algorithm on synthetic datasets DS1, DS2 and DS3 is shown in Figs. 7(a1), 7(a2), 7(b1), 7(b2), 7(c1), 7(c2). The clustering performance of AKFCM is generally better not only in clustering accuracy (AR Index = 0.983 (DS1); AR Index = 0.953 (DS2); AR Index = 0.966(DS3)) but also in influential area detection (IA Index = 1.156 (DS1); IA Index = 1.128 (DS2); IA Index = 1.118(DS3)). Figure 7(d) demonstrates the average performance of FCM, FCM-M, mFCM-M, KFCM, mKFCM and AKFCM. Of the previous algorithms, mKFCM achieves the highest accuracy (Average AR Index = 0.855), and mFCM-M achieves the most reasonable influential areas (Average IA Index = 1.184). In comparison, the clustering results of AKFCM have both higher accuracy (Average AR Index = 0.967) and more reasonable influential areas (Average IA Index = 1.134).

Clustering performance of AKFCM on the DS1, DS2 and DS3 datasets.
To further evaluate the effectiveness of the proposed algorithm, the previous algorithms and AKFCM were also applied to various publicly available real-world and synthetic datasets [54] including Iris [55], Wine [56], WDBC [57], Aggregation [58] and the synthetic data sets S1, S2, S3 and S4 [51]. The clustering results are listed in Table 1 for the sake of comparison. AKFCM maintains higher performance compared with the other algorithms not only in clustering accuracy (Average AR Index = 0.847) but also in extracting more reasonable influential areas (Average IA Index = 1.271). The average running time of the six algorithms is also given in Table 2. Even though the AKFCM can hardly beat the other algorithms from the perspective of running time, the improvement on both clustering accuracy and influential areas can justify its computation cost. To properly illustrate the improvement, Figs. 8 and 9 show the clustering performance of basic FCM and AKFCM on the S3 dataset. The most obvious improvement can be observed on the marked cluster which is structurally different from other clusters.

Clustering performance of FCM on S3.

Clustering performance of AKFCM on S3.
Clustering performance on publicly available datasets
**best performance in average.
Average running time of the six algorithms (s)
AKFCM improves upon the clustering performance of the FCM family of algorithms in two aspects: clustering accuracy and influential area detection. The former describes whether data are clustered correctly, and the latter indicates whether the natural cluster structure is properly extracted.
Of the previous algorithms, mFCM-M and mKFCM consider the structure of each cluster. However, mKFCM fails to achieve more reasonable influential areas as it is insensitive to detecting the influential direction of each cluster. In addition,mFCM-M shows decreased accuracy due to deviations in calculating the pseudo inverse of the covariance matrix and adjusting the influential range.
Combining the advantages of mFCM-M and mKFCM algorithms, the proposed AKFCM algorithm in this paper ensures both the detection of the influential direction and adjustments to the influential range of each cluster. In the AKFCM algorithm, the influential direction of each cluster is detected in accordance with its first transformation using the covariance matrix information, and the adjustment of the influential range of each cluster is attributed to its second transformation with adaptive kernelization. Therefore, the AKFCM algorithm can contribute to clustering datasets composed of clusters with different directions and ranges in structure.
In this paper, only the basic modifications of the FCM algorithm are considered and discussed. For example, the FCM-M and mFCM-M algorithms introduce the Mahalanobis distance to the FCM, while the KFCM and the mKFCM algorithms adopt kernelization. These modifications can be further extended to different clustering algorithms [30–33, 37]. However, such extended algorithms may preserve the innate problems of their fundamental algorithms to varying degrees. This study introduces a novel fundamental algorithm (AKFCM) to the FCM family by combining the advantages of Mahalanobis distance-based FCM algorithms and kernel-based FCM algorithms. Moreover, this fundamental algorithm is an evolutionary algorithm that can overcome the innate problems of its predecessors by utilizing their complementary advantages.
The AKFCM algorithm is suitable for data sets containing clusters with varying influential ranges and directions. Especially, it can better handle the general ellipsoid shaped clusters as shown in Figs. 7 and 9. However, as shown in Table 2, the better clustering performance of AKFCM is achieved at the cost of much higher execution time. Therefore, AKFCM is more applicable to the conditions where the accuracy is of much importance against the execution time. Except for the real-time analysis, most of clustering tasks could be conducted in a sufficient time window and for some cases such as tumor recognition, disease diagnosis and long-term anomaly detection, the clustering accuracy comes first to avoid misclassification and misjudgment whenever possible.
Furthermore, several recent achievements have been made towards improving the clustering performance of the FCM family of algorithms, including studies on automatic variable weights [29], cluster number [59], sparse regularization [60], ensemble clustering [61] and local intensity inhomogeneities [41]. These achievements, which are not applied in this paper, can be helpful for further extending the potentials of the AKFCM algorithm in future work.
Noticeably, in this paper, we only focus on the FCM family algorithms, and other clustering algorithms, i.e. density based algorithms, are not involved. As they are fundamentally different algorithms, more benchmark sets are needed to compare them thoroughly, which is another important topic that can be included in a future study.
Conclusions
Over years of research and testing, the basic FCM clustering algorithm has evolved into multiple modified versions. FCM algorithms based on Mahalanobis distance (FCM-M and mFCM-M) effectively detect directional information from dataset structure. FCM algorithms based on kernelization (KFCM and mKFCM) provide an interface to adjust the influential range of clusters. Combining the advantages of these clustering algorithms, the AKFCM algorithm based on cluster structure is proposed in this paper. Comparatively, the AKFCM algorithm paid more attention on how to improve FCM based on more complete cluster structure information. The kernel width parameters in AKFCM are not constants but rather adaptive parameters that change according to the structural information (including directional and range information) of different clusters. The results show that the AKFCM algorithm improves upon the clustering performance of the FCM family of algorithms not only in clustering accuracy but also in extracting reasonable influential areas of clusters. Future research can be devoted to reducing the running time of the algorithm, developing the algorithm’s extended version and applying the algorithm in practice.
Footnotes
Acknowledgments
The authors would like to thank Dr. Y. Du (Tsinghua University), Dr. M. Xu (Tsinghua University) and Terigele (Dalian Medical University) for their valuable research insights and kind support in the preparation of the paper.
