A fast method for discovering suitable number of clusters for fuzzy clustering

Abstract

One main problem of Fuzzy c-Means (FCM) is deciding on an appropriate number of clusters. Although methods have been proposed to address this, they all require clustering algorithms to be executed several times before the right number is chosen. The aim of this study was to develop a method for determining cluster numbers without repeated execution. We propose a new method that combines FCM and singular value decomposition. Based on the percentage of variance, this method can calculate the appropriate number of clusters. The proposed method was applied to several well-known datasets to demonstrate its effectiveness.

Keywords

Fuzzy c-Means (FCM), clustering, singular value decomposition, number of clusters, processing time

1. Introduction

Clustering is an unsupervised technique for grouping a set of objects based on their similarity. Clustering in general and Fuzzy c-Means (FCM) in particular can be applied to many real-world challenges, such as medical image segmentation [1, 2], target selection in direct marketing [3], web mining [4], managing readiness activities for enterprise resource planning [5], clustering microarray data [6], stock market prediction [7], and even cluster analysis and nonlinear mapping of soil datasets for detection of polluted sites [8].

The need for clustering continues to grow. A pivotal challenge for most basic clustering algorithms is determining the number of clusters, which greatly influences clustering results [9]. Most clustering algorithms are sensitive to the initial number of clusters, which is difficult to determine without prior information [10]. Some methods have attempted to determine the number of clusters for crisp data [11, 12, 13, 14].

Only a limited number of studies have focused on determining a suitable number of centers for fuzzy clustering techniques [10, 11, 15, 13, 16]. In [15], Sun et al. proposed specifying values based on a cluster validity criterion. This algorithm splits the worst cluster into two clusters until the predefined criterion is met. Subbalakshmi et al. calculated a Silhouette score function for each cluster. The cluster with the lowest score is split, and the process is repeated until maximum average Silhouette scores are achieved [11]. In [16], Yu and Li proposed a new index for determining the number of clusters in FCM by applying a Hessian matrix derived from minimizing the objective function. They use the conditional index of this Hessian matrix as an index of stability. The main disadvantage of all these approaches is that the number of clusters is derived through repetitive trials. The resulting numbers are tightly coupled to the clustering algorithm that produced them and therefore cannot be transferred to other methods, so every FCM clustering method still has to find its own way to estimate the number of clusters. These methods search for the $k$ value by running the clustering algorithm several times, with a different $k$ value each time. Hence, their main limitation is execution time.

To independently identify a suitable number of clusters, this study proposes an algorithm that applies Singular Value Decomposition (SVD) to a similarity matrix of the data points. SVD is known for relating data points to latent features [17, 18, 19]. In this study, data points highly associated with the same latent feature are assumed to be related. As a result, the number of features is suggested as the number of clusters. The method proposed in this study reduces execution time by approximately 85% compared with other methods.

This paper is organized as follows. In Section 2, we review some studies closely related to our topic. In Section 3, we introduce the basic FCM algorithm and SVD. We also provide a detailed presentation of our new method, which incorporates a user-defined threshold. In Section 4, numerical examples are provided to demonstrate the exactness and usefulness of our method, including empirical comparisons. In Section 5, we offer conclusions and discuss current and future applications of this research.

2. Related work

Fuzzy clustering has emerged as a fruitful research topic. Accordingly, a great many clustering algorithms have been developed. However, determining the number of clusters remains a critical task. In [19], Lee and Olafsson showed that the number of clusters can affect the performance of clustering algorithms. In [20] Yu et al. noted that the quality of clustering results, including factors such as noise or outliers, is also affected by the number of clusters.

In [15], Sun et al. reported an FCM-based splitting algorithm for automatically determining the number of clusters, which was calculated using a score function. With U as the fuzzy partition matrix, V as the vector of centers, and $c$ as the number of clusters, they proposed VWSJ (U, V, $c$ ) as a validity index in the following form:

$\displaystyle\textit{VWSJ}(\textbf{U},\textbf{V},c)=\textit{Scat}(c)+\textit{Sep}(c)/\textit{Sep}(\textit{Cmax})$

The value of Scat ( $c$ ) generally decreases as $c$ increases because the clusters become more compact. Sep ( $c$ ) represents the separation between clusters. A cluster number $c$ that minimizes VWSJ (U, V, $c$ ) is considered the optimal value for the number of clusters present in the data.

In [11], Subbalakshmi et al. used a fuzzy Silhouette score function $S(i)$ associated with each cluster to identify problematic clusters and determine the optimal number of clusters based on a fuzzy silhouette cluster validity index calculated from dynamic data. An average silhouette value is checked to ensure the overall clustering quality of the dataset. The process is repeated until maximum average Silhouette scores are achieved.

In [10], Beringer and Hüllermeier reported an adaptive optimization method for adjusting the number of clusters in FCM clustering. They proposed a modification of the commonly used Xie-Beni index as a validity measure for fuzzy partitions. Each iteration of this method consists of a quality check to determine whether the cluster model might be improved by increasing ( $K+1$ ) or decreasing ( $K-1$ ) the number of clusters $K$ . The process is repeated until $K$ remains unchanged.

Although fuzzy clustering and methods for determining the number of clusters have been the topic of research, to the best of our knowledge no studies have focused on using SVD and FCM in a clustering algorithm to determine the number of clusters.

In [13], Thong et al. listed three popular approaches to determining the number of clusters: scanning [19, 21, 22, 23], preprocessing [24, 25], and pruning [11, 26, 27]. To our understanding, all of these approaches require that a clustering procedure be run numerous times to check the optimality of the number of clusters.

Several studies have attempted to use both FCM and SVD in certain applications. In [28], Li et al. combined SVD, FCM, and rough set theory to diagnose faults in rotating machinery. In [29], Muliawati et al. used FCM to identify trending topics on Twitter from data whose dimensions had been reduced by SVD. Guo et al. proposed a feature-clustering approach to recognize faults in resonant grounding distribution systems, using SVD to decompose the time-frequency matrix and merging with the polarity distribution matrix to establish an amplitude-polarity feature matrix (APFM). FCM was then applied to the APFM to detect system faults by classifying feeders into two types: fault feeders and sound feeders [30]. Oliynyk et al. used FCM clustering to identify clusters with various neuronal spike shapes, then applied SVD to classify neuronal waveforms. Although FCM clustering based on SVD has been applied in their algorithms, none of these studies attempted to identify the appropriate number of clusters [31]. Our proposed method therefore seeks to determine a suitable number of clusters using a combination of FCM and SVD. Compared with other methods, the number of clusters it identifies does not depend on a particular clustering procedure, so it can be combined with other approaches. To address the issue of execution time, this paper aims to reduce the time needed for clustering and compares the proposed method with benchmark methods.

3. Methodology

According to [16], there are two main approaches to determining the number of clusters. The first approach focuses on geometric properties (e.g., the Xie-Beni index [10]). The second approach uses the fuzzy partition concept, and its main advantage is superior data clustering performance in uncertain cases. The algorithm proposed in this paper is based on fuzzy partition.

3.1 Preliminaries of FCM

The main aim of FCM is to cluster data points by minimizing the objective function $G_{\omega}$ defined in Eq. (1) through an iterative process updating the degree of membership values.

$\displaystyle G_{\omega}(U,C,D)=\mathop{\sum}\limits_{i=1}^{k}\mathop{\sum}\limits_{j=1}^{N}\mu_{ij}^{\omega}\delta_{ij}^{2}$ (1)

Where:

$U$: matrix of membership values; $D$: dataset; $C$: cluster center matrix; $N$: number of data points in $D$; $k$: number of clusters; $\omega$: fuzziness parameter in FCM, a weighting exponent that controls the degree of fuzzy overlap between clusters; $\delta_{ij}$: distance between the $i^{\text{th}}$ centroid $c_{i}$ and the $j^{\text{th}}$ data point $d_{j}$, for example the Euclidean distance $\delta_{ij}=\lVert d_{j}-c_{i}\rVert$; the actual distance function can vary according to the application.

The centroid $c_{i}$ of the $i^{\text{th}}$ cluster is given by Eq. (2):

$\displaystyle c_{i}=\frac{\mathop{\sum}\nolimits_{j=1}^{N}\mu_{ij}^{\omega}d_{j}}{\mathop{\sum}\nolimits_{j=1}^{N}\mu_{ij}^{\omega}}$ (2)

where $\mu_{ij}$ is the membership value of the $j^{\text{th}}$ data point in the $i^{\text{th}}$ cluster, computed by Eq. (3) and subject to the constraint in Eq. (4):

$\displaystyle\mu_{ij}=\frac{1}{\mathop{\sum}\nolimits_{g=1}^{k}\left(\frac{\delta_{ij}}{\delta_{gj}}\right)^{\frac{2}{\omega-1}}}$ (3)

$\displaystyle\mathop{\sum}\limits_{i=1}^{k}\mu_{ij}=1$ (4)

After being clustered by FCM, the $j^{\text{th}}$ data point is represented by a $k$ -dimensional membership vector $\mu_{j}$ .

The pseudocode of FCM is shown in Fig. 1.

Figure 1.

Pseudocode of FCM algorithm.
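As an illustration, the iteration defined by Eqs. (2)-(4) can be sketched in Python with NumPy. This is a minimal sketch rather than the implementation used in the experiments (which were run in Matlab): the function name `fcm`, the Euclidean distance, the random initialization, the stopping tolerance `tol` and the iteration cap `max_iter` are all assumptions made for the example.

```python
import numpy as np

def fcm(D, k, omega=2.0, max_iter=100, tol=1e-5, seed=0):
    """Fuzzy c-Means on data D (N x d) with k clusters.

    Returns U (k x N membership matrix) and C (k x d centroids).
    """
    rng = np.random.default_rng(seed)
    N = D.shape[0]
    # Random initial memberships; each column sums to 1 as required by Eq. (4).
    U = rng.random((k, N))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        W = U ** omega
        # Eq. (2): centroids are membership-weighted means of the data points.
        C = (W @ D) / W.sum(axis=1, keepdims=True)
        # delta[i, j]: Euclidean distance between centroid i and data point j.
        delta = np.linalg.norm(C[:, None, :] - D[None, :, :], axis=2)
        delta = np.fmax(delta, 1e-12)  # guard against division by zero
        # Eq. (3): mu_ij = 1 / sum_g (delta_ij / delta_gj)^(2 / (omega - 1)).
        ratio = (delta[:, None, :] / delta[None, :, :]) ** (2.0 / (omega - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, C
```

Calling `fcm(D, k)` on an `N x d` array returns a membership matrix whose columns are the $k$-dimensional membership vectors $\mu_{j}$ described above.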

3.2 SVD method

Let $A$ denote a matrix with size $n\times m$ , which can be decomposed into three factorized matrices as follows:

$\displaystyle A=WSV^{T}$ (5)

where $W$ is an $n\times n$ orthonormal matrix, $S$ is an $n\times m$ diagonal matrix, $V$ is an $m\times m$ orthonormal matrix, and $V^{T}$ is the transpose of $V$. The columns of $W$ are called left singular vectors, and the rows of $V^{T}$ are right singular vectors. Both $W$ and $V^{T}$ can be viewed as a new set of axes representing the data shown in $A$ within the original hyperspace. The singular values $\lambda$ on the diagonal of matrix $S$ are arranged in descending order. A singular value relates to the dispersal of the data when mapped onto the corresponding orthonormal vector. Therefore, the larger the singular value, the more dispersed the data along that vector.
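The decomposition in Eq. (5) can be checked numerically with NumPy's `numpy.linalg.svd`. The matrix `A` below is a small hypothetical example chosen only for illustration; it does not come from the experiments.

```python
import numpy as np

# A small hypothetical 4 x 3 matrix A (n = 4, m = 3).
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0]])

# full_matrices=True returns W (n x n), the singular values, and V^T (m x m).
W, singular_values, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the n x m matrix S with the singular values on its diagonal.
S = np.zeros(A.shape)
r = singular_values.size
S[:r, :r] = np.diag(singular_values)

# Eq. (5): the product W S V^T reproduces A.
reconstruction = W @ S @ Vt
```

NumPy returns the singular values already sorted in descending order, which is the ordering assumed in the text.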

3.3 The proposed method

To correctly apply FCM, a reasonable $k$ , that is, the number of clusters, must be estimated. Although FCM has been applied in many different areas, little research has focused on estimating $k$ .

The methodology starts by initializing the membership matrix $U$ with an upper limit on the number of clusters, $\bar{k}$, which is larger than the $k$ that users expect. Based on $U$, a Pearson correlation matrix $RE$ is then computed. Each value $re_{jh}$ in $RE$ represents the similarity between the $j^{\text{th}}$ and $h^{\text{th}}$ data points based on the membership values in $U$. The correlation matrix is computed using Eq. (6).

$\displaystyle re_{jh}=\frac{\mathop{\sum}\nolimits_{x=1}^{N}(\mu_{jx}-\bar{\mu}_{j})(\mu_{hx}-\bar{\mu}_{h})}{\sqrt{\mathop{\sum}\nolimits_{x=1}^{N}(\mu_{jx}-\bar{\mu}_{j})^{2}}\sqrt{\mathop{\sum}\nolimits_{x=1}^{N}(\mu_{hx}-\bar{\mu}_{h})^{2}}}$ (6)

where $\bar{\mu}_{j}$ and $\bar{\mu}_{h}$ denote the means of the values $\mu_{jx}$ and $\mu_{hx}$ over $x$.

The $RE$ matrix represents the similarity between any pair of data points clustered using FCM. Clustering basically involves separating dissimilar data into different groups. FCM specifically maps data points into a $k$-dimensional space spanned by unit membership vectors and separates the data points in the converted space. Data points in different clusters are therefore represented by drastically different membership vectors.

Applying SVD to $R E$ can decompose the matrix and derive singular values. A large singular value indicates that data points can be scattered along the corresponding vector; that is, the vector has a high potential of distinguishing data points, and can thus represent a cluster in FCM. Therefore, by examining singular values, we can determine the required number of clusters.

The percentage threshold is defined in Eq. (7). The number of top (largest) singular values needed to reach this threshold indicates the suitable number of clusters $k$:

$\displaystyle\varphi=\frac{\mathop{\sum}\nolimits_{q=1}^{k}\lambda_{q}}{\mathop{\sum}\nolimits_{p=1}^{N}\lambda_{p}}$ (7)

where $\lambda_{q}$ is the $q^{\text{th}}$ largest singular value, $k$ is the number of top singular values retained, $N$ is the total number of singular values, and $\varphi$ is the percentage threshold.
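The selection rule in Eq. (7) can be sketched as a short Python function. The sketch assumes that $k$ is taken as the smallest number of top singular values whose cumulative share reaches the threshold $\varphi$; the function name `choose_k` and the singular values used below are hypothetical.

```python
import numpy as np

def choose_k(singular_values, phi):
    """Return the smallest k whose top-k singular values reach the share phi."""
    lam = np.sort(np.asarray(singular_values, dtype=float))[::-1]
    # Cumulative share of the total, i.e. Eq. (7) evaluated for k = 1, 2, ...
    share = np.cumsum(lam) / lam.sum()
    # Index of the first cumulative share that is at least phi, as a count.
    k = int(np.searchsorted(share, phi)) + 1
    # Never report more clusters than there are singular values.
    return min(k, lam.size)
```

With hypothetical singular values 5, 3, 1, 0.6 and 0.4, the cumulative shares are 0.5, 0.8, 0.9, 0.96 and 1, so thresholds of 75%, 85% and 95% select $k$ = 2, 3 and 4, respectively. A higher threshold therefore never yields fewer clusters.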

The pseudocode of the proposed algorithm is shown in Fig. 2.

Figure 2.

Pseudocode of the proposed algorithm.
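The overall procedure can be sketched end to end in Python. This is an illustrative reading of the steps described in this section, not the authors' Matlab code: it assumes that Eq. (6) is evaluated between the membership vectors of pairs of data points, as the text describes, that $k$ is the smallest count of top singular values reaching $\varphi$, and that the result is capped at $\bar{k}$. The names `fcm_memberships` and `estimate_k`, the Euclidean distance and the fixed iteration count are likewise assumptions.

```python
import numpy as np

def fcm_memberships(D, k, omega=2.0, n_iter=100, seed=0):
    """Compact FCM returning the k x N membership matrix U (Eqs. (2)-(3))."""
    rng = np.random.default_rng(seed)
    U = rng.random((k, D.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        W = U ** omega
        C = (W @ D) / W.sum(axis=1, keepdims=True)
        delta = np.linalg.norm(C[:, None, :] - D[None, :, :], axis=2)
        delta = np.fmax(delta, 1e-12)  # guard against division by zero
        # Algebraically equivalent form of Eq. (3).
        inv = delta ** (-2.0 / (omega - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
    return U

def estimate_k(D, k_bar, phi, omega=2.0):
    """Estimate the number of clusters from one FCM run with k_bar clusters."""
    U = fcm_memberships(D, k_bar, omega)
    # Pearson correlation between the membership vectors of every pair of
    # data points; np.corrcoef treats each row of its argument as a variable.
    RE = np.corrcoef(U.T)
    # Singular values of RE, returned in descending order.
    lam = np.linalg.svd(RE, compute_uv=False)
    # Eq. (7): cumulative share of the top singular values.
    share = np.cumsum(lam) / lam.sum()
    return min(int(np.searchsorted(share, phi)) + 1, k_bar)
```

Because each membership vector sums to one, the centred vectors span at most $\bar{k}-1$ dimensions, so only the first few singular values of the correlation matrix in this sketch are non-negligible. FCM is run once, and different thresholds only change the final counting step.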

Figure 3 provides a block diagram of the proposed method summarizing its algorithm in simplified steps.

Figure 3.

Block diagram of the proposed algorithm.

4. Experimental results

To test the effectiveness and reliability of the proposed method, a series of simulations was performed on both synthesized and real datasets. To ensure the validity of the results, all experiments were performed on the same device with the same initial parameters: a notebook with an Intel Core i5 processor and 4 GB of RAM running a 64-bit Windows 10 operating system. To evaluate the algorithms under the same development environment, all algorithms were implemented in Matlab.

4.1 Simulation 1

In this numerical experiment, we performed a simulation using the fcmdata.dat [20] dataset consisting of 140 instances in two dimensions. The proposed method was applied step by step. Users can set a threshold in Eq. (7) to identify the most influential clusters. The results for different thresholds are shown in Table 1.

Table 1
Relationship between thresholds and number of suitable clusters

Initial centers	Thresholds	Number of clusters
7	95%	5
7	90%	4
7	85%	4
7	80%	4
7	75%	3

Table 1 displays the number of influential clusters determined by the thresholds. When a higher threshold is set, more singular values are needed to adequately describe the data characteristics. The number of singular values with high variance in the dataset represents the number of clusters. For example, when the threshold is set at 95%, the number of influential clusters is five. The other singular values contain only small variations, and therefore cannot be counted as having great influence in the dataset. In most cases, a threshold is set so that the percentage of variance accounts for at least 70% [32].

4.2 Simulation 2

This simulation experiment had two main purposes. First, we checked the reliability of the proposed method’s experimental results by comparing them with three well-known cluster validity indices, namely, the Calinski-Harabasz index [33], the Silhouette index [34], and the Davies-Bouldin index [35]. Second, we compared the time consumed by other methods and by the new method through cross-testing with other criteria on both artificial and real datasets. The Fisher’s Iris dataset from the UCI machine learning repository [36] contains 150 instances and four dimensions, whereas the synthesized path-based dataset [37] has 300 instances and three dimensions. To ensure comparison under the same conditions, we used the same initial parameters by setting the number of clusters at six for all datasets. The aforementioned hardware specification was again used to compare the optimal number of clusters $k$ found by the proposed method and by the benchmark methods on the same dataset. The simulation demonstrated the optimal number of clusters determined by these methods. In addition, the time consumed by each was calculated, as shown in Tables 2 and 3.

Table 2
Cross-testing between the proposed method and other approaches in the Fisher’s Iris dataset

Dataset	Algorithms	Clustering evaluation criteria	Optimal number of clusters	Run time (seconds)
FisherIris	k-means	Calinski-Harabasz	3	0.452413
		Silhouette	2	0.567920
		Davies-Bouldin	2	0.480737
	The proposed method	Threshold $=$ 90%	3	0.060326
		Threshold $=$ 80%	3	0.062826
		Threshold $=$ 70%	2	0.074892
	Gaussian mixture distribution	Calinski-Harabasz	2	0.910766
		Silhouette	2	1.095470
		Davies-Bouldin	2	0.930187

Table 3

Cross-testing between the proposed method and other approaches in a path-based dataset

Dataset	Algorithms	Clustering evaluation criteria	Optimal number of clusters	Run time (seconds)
Pathbased	k-means	Calinski-Harabasz	2	0.693987
		Silhouette	3	0.61293
		Davies-Bouldin	3	0.51858
	The proposed method	Threshold $=$ 90%	4	0.13207
		Threshold $=$ 80%	3	0.13081
		Threshold $=$ 70%	2	0.12763
	Gaussian mixture distribution	Calinski-Harabasz	4	1.90889
		Silhouette	4	0.83927
		Davies-Bouldin	4	0.73344

Tables 2 and 3 provide an empirical comparison of validation techniques using real and artificial datasets, respectively. Two other well-known algorithms (k-means clustering and Gaussian mixture distribution clustering) were employed to cross-check the three aforementioned cluster validity indices, namely the Calinski-Harabasz index, the Davies-Bouldin index and the Silhouette index. The outputs of the numerical experiment indicated that the suitable numbers of clusters determined by our method and the other methods were largely the same. This result supports the accuracy of the proposed algorithm. We then checked whether our method outperformed the others in terms of time consumption. Figures 4 and 5 display the results.
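For contrast, the scanning strategy that the benchmark methods rely on can be sketched as follows. This is a simplified NumPy stand-in, not the Matlab routines used in the experiments: `kmeans`, `calinski_harabasz` and `scan_k` are hypothetical helper names, the farthest-point initialization is an assumption made to keep the example deterministic, and only the Calinski-Harabasz index is shown.

```python
import numpy as np

def kmeans(D, k, n_iter=50):
    """Plain k-means with a deterministic farthest-point initialization."""
    centers = [D[0]]
    for _ in range(1, k):
        # Next centre: the point farthest from all centres chosen so far.
        dist = np.min([np.linalg.norm(D - c, axis=1) for c in centers], axis=0)
        centers.append(D[np.argmax(dist)])
    C = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(D[:, None, :] - C[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for i in range(k):
            if np.any(labels == i):  # keep the old centre if a cluster empties
                C[i] = D[labels == i].mean(axis=0)
    return labels, C

def calinski_harabasz(D, labels, C):
    """Calinski-Harabasz index: between- over within-cluster dispersion."""
    N, k = D.shape[0], C.shape[0]
    overall = D.mean(axis=0)
    sizes = np.array([(labels == i).sum() for i in range(k)])
    between = (sizes * np.sum((C - overall) ** 2, axis=1)).sum()
    within = np.sum((D - C[labels]) ** 2)
    return (between / (k - 1)) / (within / (N - k))

def scan_k(D, k_max):
    """Run the clustering once per candidate k and keep the best-scoring k."""
    scores = {}
    for k in range(2, k_max + 1):
        labels, C = kmeans(D, k)
        scores[k] = calinski_harabasz(D, labels, C)
    return max(scores, key=scores.get), scores
```

The loop in `scan_k` makes the cost structure explicit: the clustering algorithm is executed once for every candidate $k$, which is the repeated execution that the proposed method is designed to avoid.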

Figure 4.

Run time on Fisher’s Iris dataset.

Figure 4 shows the results of testing with Fisher’s Iris dataset using three methods and three criteria. Because the output of our method was the same as that of the others, we wished to know which method required less execution time. Our method consumed less time than the others. Specifically, the proposed method reduced execution time by approximately 90%.
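One way to arrive at a figure of this size is to compare mean run times taken from Table 2. The averaging scheme below is an assumption made for illustration; the text does not state exactly how the percentage was computed.

```python
# Run times in seconds copied from Table 2 (Fisher's Iris dataset).
proposed = [0.060326, 0.062826, 0.074892]
benchmarks = [0.452413, 0.567920, 0.480737,   # k-means
              0.910766, 1.095470, 0.930187]   # Gaussian mixture distribution

mean_proposed = sum(proposed) / len(proposed)
mean_benchmark = sum(benchmarks) / len(benchmarks)

# Relative reduction in mean run time achieved by the proposed method.
reduction = 1.0 - mean_proposed / mean_benchmark
print(f"mean reduction: {reduction:.1%}")
```

Under this averaging the reduction works out to roughly 91%, in line with the approximate 90% quoted above.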

Figure 5.

Run time on path-based dataset.

Figure 5 shows the results of testing for time consumption with a path-based dataset. Although both our method and the others determined the same number of clusters, ours had lower time consumption than the others. Specifically, the proposed method reduced execution time by 83% to 93%.

4.3 Simulation 3

To demonstrate the efficiency of the proposed method, we compared its time consumption with that of other methods by testing with a different number of data points. Datasets of many different sizes, from 1000 to 7000 normally distributed random data points with two dimensions, were created. The result is displayed in Fig. 6.

Figure 6.

Comparison of time consumption between methods.

The line graph in Fig. 6 illustrates the average time consumption of the three methods for five numbers of data points. The proposed method outperformed the k-means and Gaussian mixture clustering methods. The smaller the dataset, the smaller the difference in run time between the methods; as the dataset size increased, this difference increased as well. For example, the execution time for 1000 data points did not differ greatly among the three methods, whereas when the number of data points increased to 7000, the proposed method reduced time consumption by approximately 88% compared with the Gaussian mixture method, and by 75.5% compared with the k-means method. These findings indicate that our proposed method can dramatically reduce the time required to determine a suitable number of clusters. The positive results of Simulation 3 suggest that the proposed method could be used with large datasets containing a massive number of data records.

Figure 7.

Visualization of the original dataset.

Figure 8.

Visualization of the dataset after clustering with 2 clusters when the threshold is 75%.

Figure 9.

Visualization of the dataset after clustering with 3 clusters when the threshold is 85%.

Figure 10.

Visualization of the dataset after clustering with 4 clusters when the threshold is 90%.

4.4 Simulation 4

The above simulations have shown the effectiveness of our approach in terms of time consumption. When clustering a dataset, it is also critically important to ensure the quality of the clustering obtained with a particular number of clusters. Thus, we visualize and compare the clustering results for the fcmdata.dat dataset, which has 140 data points and 2 features. In this simulation, we first plot the original data points (Fig. 7). We then plot the clusters obtained with each recommended number of clusters (Figs. 8 to 10). In this way, we can visually assess the quality of the clustering results for the different recommended numbers of clusters. The figures show the clusters in different colours after running the proposed algorithm with different thresholds.

From the visualized results obtained when clustering the same dataset, it is clear that the clustering results differ considerably across the recommended numbers of clusters. For this dataset, the best clustering result was obtained with two clusters. With three or four clusters, the results are less desirable because the data points are not well distributed among the clusters. The quality of the clustering can therefore be checked directly from the visualized figures.

4.5 Simulation 5

To validate the number of clusters, we added two more graphical simulations using two benchmark approaches, namely, the Gap statistic method [38] and the Silhouette approach [34], to determine the suitable number of clusters. We use visualization techniques to understand the statistical properties of the dataset and decide the number of clusters. The experiments are reported as follows:

Figure 11.

Relationship between Gap values and number of clusters.

Figure 12.

Relationship between Silhouette values and number of clusters.

$+$ Applying Gap statistic to decide the number of clusters:

The Gap statistic is a well-known graphical approach to cluster evaluation. It involves plotting an error measurement for several numbers of clusters, which produces an “elbow” in the plot. The “elbow” marks the most significant decrease in the error measurement. The Gap statistic approach estimates the “elbow” point as the number of clusters with the largest gap value.

Based on Fig. 11, the maximum value of the gap criterion occurs at 3 clusters. However, the value at 2 clusters is within a standard error of the maximum, so the suggested optimal number of clusters is 2.

$+$ Applying Silhouette clustering evaluation to decide the number of clusters:

To validate the number of clusters, we also performed a second simulation using the graphical Silhouette clustering evaluation, shown in Fig. 12.

From the above graphical simulations with the two benchmark methods, we can conclude that the suitable number of clusters for this dataset is two. Our approach also recommends two clusters for this dataset. Therefore, our approach is a trustworthy method for determining the number of clusters for fuzzy data.

4.6 Simulation 6: The high dimensional dataset

To evaluate the proposed method on a high-dimensional and skewed dataset, an experiment was conducted on the Forest Cover Type dataset drawn from the UCI Machine Learning Repository, which has 54 attributes and 581012 instances. This dataset is high-dimensional, skewed and has many outliers. The simulation results show that the proposed algorithm outperforms the benchmark methods when clustering more complicated datasets. The validity indices were applied to verify the efficiency and accuracy of the proposed method; the results are shown in Table 4.

Table 4
Cross-testing between the proposed method and other approaches in a skewed dataset

Dataset	Algorithms	Clustering evaluation criteria	Optimal number of clusters	Run time (seconds)
Forest Cover type	k-means	Calinski-Harabasz	5	7944.4118
		Silhouette	4	8138.1844
		Davies-Bouldin	5	7490.5772
	The proposed method	Threshold $=$ 90%	5	862.4006
		Threshold $=$ 80%	5	828.3866
		Threshold $=$ 70%	5	803.0564
	Gaussian mixture distribution	Calinski-Harabasz	5	10375.7886
		Silhouette	4	9265.1352
		Davies-Bouldin	5	8356.8256

Table 5

Cross-testing between the proposed method and other approaches in a skewed dataset

Dataset	Algorithms	Clustering evaluation criteria	Optimal number of clusters	Run time (seconds)
Boston_house_price	k-means	Calinski-Harabasz	2	4.476696
		Silhouette	2	11.461738
		Davies-Bouldin	2	4.763787
	The proposed method	Threshold $=$ 90%	2	0.94663
		Threshold $=$ 80%	2	0.319015
		Threshold $=$ 70%	2	0.162081
	Gaussian mixture distribution	Calinski-Harabasz	2	41.481328
		Silhouette	2	47.364483
		Davies-Bouldin	2	50.588044

4.7 Simulation 7: The skewed dataset

To assess the results on a skewed dataset, an experiment was conducted on the Boston_house_price dataset drawn from the UCI Machine Learning Repository. As the results in Table 5 show, the proposed method runs faster than the other approaches while returning the same number of clusters as the benchmark methods. Hence, the proposed method is consistent when clustering skewed datasets.

4.8 Simulation 8: Dataset which has many outliers

To test the proposed algorithm on data with outliers, we performed robustness testing on three synthetic datasets with two dimensions and 1000 instances. The datasets contain 0.1%, 0.2% and 0.3% outliers, respectively. The results in Table 6 show how well each method performs at these three outlier percentages and how the proposed method compares with the Calinski-Harabasz, Silhouette and Davies-Bouldin indices in computing the number of clusters. All methods return four clusters at every outlier percentage, the same as without outliers. Hence, the proposed method is robust to limited percentages of outliers.

Table 6
Cross-testing between the proposed method and other approaches in the datasets with different percentages of outliers

Algorithms	Clustering evaluation criteria	Optimal number of clusters without outliers	Optimal number of clusters with 0.1% outliers	Optimal number of clusters with 0.2% outliers	Optimal number of clusters with 0.3% outliers	Average run time (seconds)
k-means	Calinski-Harabasz	4	4	4	4	6.325891
	Silhouette	4	4	4	4	25.970057
	Davies-Bouldin	4	4	4	4	6.520138
The proposed method	Threshold $=$ 90%	4	4	4	4	0.810394
	Threshold $=$ 80%	4	4	4	4	0.455169
	Threshold $=$ 70%	4	4	4	4	0.336067
Gaussian mixture distribution	Calinski-Harabasz	4	4	4	4	267.971229
	Silhouette	4	4	4	4	561.028721
	Davies-Bouldin	4	4	4	4	361.871897

5. Conclusion

This paper presents a new method for calculating a suitable number of clusters using a mixture of fuzzy and SVD approaches. The proposed method can reduce time consumption when determining the most effective number of clusters. The model was tested on both artificial and real datasets for validation, and empirical comparisons were also performed to prove the reliability and accuracy of the proposed method. Furthermore, the numerical experiment results indicated that the proposed method was consistent with lower time consumption. Compared with more traditional methods, the proposed method can perform faster when working with large datasets.

Future research could use this method in real-world case studies involving particular scenarios. We also plan to use the proposed method in clustering the massive datasets generated by IoT devices, medical imaging, car sensors, satellite imaging, navigation systems, wearable health-monitoring devices, appliances, and other such sources. Furthermore, handling missing values in the datasets is left for future studies.

Acknowledgments

The authors thank the reviewers for their constructive comments and recommendations. This work was supported by Ho Chi Minh City University of Technology and Education.

Conflict of interest

No potential conflict of interest is reported by the authors.

References

1. Chen, Duan and Han, Image segmentation method for crop nutrient deficiency based on fuzzy c-means clustering algorithm, Intelligent Automation & Soft Computing 18(8) (2012), 1145–1155.

2. Zhang D.-Q. and Chen S.-C., A novel kernelized fuzzy c-means algorithm with application in medical image segmentation, Artificial Intelligence in Medicine 32(1) (2004), 37–50.

3. Magne and Kaymak, Fuzzy modeling of client preference from large data sets: An application to target selection in direct marketing, IEEE Transactions on Fuzzy Systems 9(1) (2001), 153–163.

4. Joshi and Raghu, Robust fuzzy clustering methods to support web mining, in: The Proc. Workshop in Data Mining and knowledge Discovery, 1998.

5. Ahmadi, Yeh C.-H., Papageorgiou E.I. and Martin, An FCM-FAHP approach for managing readiness-relevant activities for ERP implementation, Computers & Industrial Engineering 88 (2015), 501–517.

6. Doulaye and Kastner, Fuzzy c-means method for clustering microarray data, Bioinformatics 19(8) (2003), 973–980.

7. Enke and Mehdiyev, Stock market prediction using a combination of stepwise regression analysis, differential evolution-based fuzzy clustering, and a fuzzy inference neural network, Intelligent Automation & Soft Computing 19(4) (2013), 636–648.

8. Hanesch, Scholger and Dekkers, The application of fuzzy c-means cluster analysis and non-linear mapping to a soil data set for the detection of polluted sites, Physics and Chemistry of the Earth, Part A: Solid Earth and Geodesy 26(11) (2001), 885–891.

9. Wei and Mendel J.M., Optimality tests for the fuzzy c-means algorithm, Pattern Recognition 27(11) (1994), 1567–1573.

10. Beringer and Eyke, Fuzzy clustering of parallel data streams, Advances in fuzzy clustering and its application, 2007, 333–352.

11. Bai, Liang and Dang, An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data, Knowledge-Based Systems 24(6) (2011), 785–795.

12. Amorim and Hennig, Recovering the number of clusters in data sets with noise features using feature rescaling factors, Information Sciences 324 (2015), 126–145.

13. Sugar C.A. and M James, Finding the number of clusters in a dataset: An information-theoretic approach, Journal of the American Statistical Association 98(463) (2003), 750–763.

14. Sun, Wang and Jiang, FCM-based model selection algorithms for determining the number of clusters, Pattern Recognition 37(10) (2004), 2027–2037.

15. and Li C.-X., Novel cluster validity index for FCM algorithm, Journal of Computer Science and Technology 21(1) (2006), 137–140.

16. Dhillon I.S. and S Modha, Concept decompositions for large sparse text data using clustering, Machine Learning 42(1) (2001), 143–175.

17. Howland and Park, Generalizing discriminant analysis using the generalized singular value decomposition, IEEE Transactions on Pattern Analysis and Machine Intelligence 26(8) (2004), 995–1006.

18. Lee J.S. and Olafsson, A meta-learning approach for determining the number of clusters with consideration of nearest neighbors, Information Sciences 232 (2013), 208–224.

19. Liu and Wang, An automatic method to determine the number of clusters using decision-theoretic rough set, International Journal of Approximate Reasoning 55(1) (2014), 101–115.

20. Arima, Hakamada, Okamoto and Hanai, Modified fuzzy gap statistic for estimating preferable number of clusters in fuzzy k-means clustering, Journal of Bioscience and Bioengineering 105(3) (2008), 273–281.

21. Fang and Wang, Selection of the number of clusters via the bootstrap method, Computational Statistics & Data Analysis 56(3) (2012), 468–477.

22. Liang, Zhao, Cao and Dang, Determining the number of clusters using information entropy for mixed data, Pattern Recognition 45(6) (2012), 2251–2265.

23. Cichocki, Xie and Choi, Detecting the number of clusters in n-way probabilistic clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 32(11) (2010), 2006–2021.

24. Pakhira M.K., Finding number of clusters before finding clusters, Procedia Technology 4 (2012), 27–37.

25. Cheung Y.-M. and Jia, Categorical-and-numerical-attribute data clustering based on a unified similarity metric without knowing cluster number, Pattern Recognition 46(8) (2013), 2228–2238.

26. Maraziotis I.A., A semi-supervised fuzzy clustering algorithm applied to gene expression data, Pattern Recognition 45(1) (2012), 637–648.

27. R.-Q., Chen and Alugongo, Fault diagnosis of rotating machinery based on SVD, FCM and RST, The International Journal of Advanced Manufacturing Technology 27(1-2) (2005), 128–135.

28. Muliawati and Murfi, Eigenspace-based fuzzy c-means for sensing trending topics in Twitter, in: AIP Conference Proceedings, 2017.

29. Guo M.-F. and Yang N.-C., Features-clustering-based earth fault detection using singular-value decomposition and fuzzy c-means in resonant grounding distribution systems, International Journal of Electrical Power & Energy Systems 93 (2017), 97–108.

30. Oliynyk, Bonifazzi, Montani and Fadiga, Automatic online spike sorting with singular value decomposition and fuzzy C-mean clustering, BMC Neuroscience 13(1) (2012), 96.

31. Jolliffe I.T., Principal component analysis and factor analysis, Principal Component Analysis, 2002, 150–166.

32. Caliński and Harabasz, A dendrite method for cluster analysis, Communications in Statistics-theory and Methods 3(1) (1974), 1–27.

33. Rousseeuw P.J., Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics 20 (1987), 53–65.

34. Davies D.L. and Bouldin D.W., A cluster separation measure, IEEE Transactions on Pattern Analysis and Machine Intelligence 2 (1979), 224–227.

35. Asuncion and Newman, UCI machine learning repository, 2007.

36. Chang and Yeung D.-Y., Robust path-based spectral clustering, Pattern Recognition 41(1) (2008), 191–203.

37. Tibshirani, Walther and Hastie, Estimating the number of clusters in a data set via the gap statistic, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63(2) (2001), 411–423.

38. Jock and Blackard, UCI machine learning repository, http://archive.ics.uci.edu/ml, 1998.

39. Harrison and Rubinfeld