A new weighted fuzzy C-means clustering approach considering between-cluster separability

Abstract

The fuzzy C-means clustering algorithm (FCM) is an effective approach to clustering. However, most existing FCM-type frameworks take only in-cluster compactness into account and overlook between-cluster separability. In this paper, we combine feature weighting and data weighting and put forward a new weighted fuzzy C-means clustering approach that considers between-cluster separability. To achieve good compactness and separability, both the in-cluster distances and the between-cluster distances enter the objective function, so that the in-cluster distances are made as small as possible and the between-cluster distances as large as possible. The iterative formulas for the feature weights, membership degrees, data weights and cluster centers are obtained by maximizing the in-cluster compactness and the between-cluster separability. Experiments on real-world datasets show that the new approach obtains promising performance.

Keywords

Fuzzy C-means, data weighting, feature weighting, between-cluster separability

1 Introduction

Clustering is a commonly used tool in unsupervised machine learning. It divides a dataset into clusters by identifying inherent structure, so that data in different clusters have dissimilar properties whereas data in the same cluster are as similar as possible [1]. Clustering is used in a wide range of application fields, such as image segmentation [2], fault detection [3], vehicle suspension systems [4], text organization [5], bioinformatics [6] and others.

Owing to its simple structure, low computational complexity and easy implementation, the fuzzy C-means clustering approach (FCM) [7] has become a widely known clustering approach and is used in many real-world applications [8–18]. In addition, FCM overcomes a problem of hard clustering algorithms by using a soft partition, that is, a given data point can be assigned to different clusters with different membership degrees.

Two shortcomings remain. First, most existing FCM-type frameworks take only in-cluster compactness into account and overlook between-cluster separability. Second, the basic FCM treats every feature and every data point equally, so the different importance of different data objects and different features cannot be distinguished effectively. To overcome both shortcomings simultaneously, we put forward a new weighted fuzzy C-means clustering approach that considers between-cluster separability. The approach incorporates feature weighting and data weighting, and takes both in-cluster and between-cluster distances into account, so that the in-cluster distances are as small as possible and the between-cluster distances are as large as possible. Between-cluster separability is maximized by maximizing the sum of the distances between all data and the other cluster centers. To avoid overfitting, and to avoid the situation in which a few features with large feature weights or a few data points with large data weights dominate the clustering process, we add l²-norm regularization terms on the feature weights and data weights, which yields a novel objective function. Based on this objective function, the iterative formulas for the feature weights, membership degrees, data weights and cluster centers are obtained by maximizing the in-cluster compactness and the between-cluster separability. Real-world datasets are used to assess the new approach, and the experimental results show that it achieves good clustering performance.

2 Related work

In this section, we briefly review related work on improving the performance of FCM.

In [19], a feature-weight assignment method learned by gradient descent is given, and an improved FCM is proposed. To address the problem that the feature-weight vector cannot be adjusted adaptively during the training phase, an improved feature-weighted FCM (IFWFCM) is put forward [20]. Taking the internal connectivity of all data into account, an adaptive fuzzy clustering algorithm (AFCM) is proposed [21]. Considering data weights, an enhanced FCM method is presented in which the values of the adaptive parameters are optimized by simulated annealing integrated with particle swarm optimization (SA-PSO) [22]. Taking the neighborhood of each data point into account, a conditional spatial fuzzy C-means method is presented [23]. For clustering incomplete data and general non-spherical datasets, an attribute weighted Mercer kernel based fuzzy clustering algorithm is put forward [24]. By considering different feature weights and data weights, an improved fuzzy C-means method (DwfwFcm) is proposed [25]. To enhance the quality of image segmentation, a generalized kernel weighted FCM that considers local information is presented [26]. By introducing the feature weighted distance and the power exponent, a double-indices fuzzy subspace clustering algorithm is put forward [27]. A fuzzy clustering method that integrates entropy regularization of the feature weights (EWFCM) is put forward [28]. By adding the sum of the distances between cluster centers as a separation term, the fuzzy inter-cluster separation clustering method (FICSC) is proposed [29].

Most of the FCM-type frameworks above take only in-cluster compactness into account and overlook between-cluster separability. To address this limitation, we put forward a new weighted fuzzy C-means approach that not only performs data weighting and feature weighting but also takes in-cluster compactness and between-cluster separability into account at the same time. In addition, l²-norm regularization terms on the feature weights and data weights are incorporated into the objective function to harmonize the weighting scatter, which makes the approach more effective.

3 A new weighted fuzzy C-means clustering approach considering between-cluster separability

3.1 The proposed approach

To achieve good compactness and separability, the in-cluster distances and the between-cluster distances are both taken into account: the in-cluster distances are made as small as possible, the between-cluster distances are made as large as possible, and the data not belonging to cluster i are pushed as far as possible from the center of cluster i. To distinguish the significance of different features and different data, an adaptive feature weight matrix and a data weight vector [23] are introduced. To avoid overfitting, and to avoid the situation in which a few features with large feature weights or a few data points with large data weights dominate the clustering process, l²-norm regularization terms on the feature weights and data weights are added to the objective function. The objective function of the new weighted fuzzy C-means clustering algorithm considering between-cluster separability (WFCM_bc) is therefore defined as: $\begin{matrix} J_{WFCM_bc} (X, U, V, W, F) = \sum_{k = 1}^{n} w_{k}^{2} \sum_{i = 1}^{c} u_{ik}^{2} \sum_{j = 1}^{m} f_{ij}^{2} (x_{kj} - v_{ij})^{2} \\ - ɛ \sum_{k = 1}^{n} w_{k}^{2} \sum_{i = 1}^{c} (1 - u_{ik})^{2} \sum_{j = 1}^{m} f_{ij}^{2} (x_{kj} - v_{ij})^{2} \\ + \frac{1}{2} γ \sum_{k = 1}^{n} w_{k}^{2} + \frac{1}{2} θ \sum_{i = 1}^{c} \sum_{j = 1}^{m} f_{ij}^{2} \end{matrix}$ (1)

J_{WFCM_bc} is subject to the constraints: $\begin{matrix} \sum_{j = 1}^{m} f_{ij} = 1; \quad \sum_{i = 1}^{c} u_{ik} = 1; \quad \prod_{k = 1}^{n} w_{k} = 1; \\ f_{ij} \in [0, 1]; \quad u_{ik} \in [0, 1]; \quad 1 ⩽ i ⩽ c, \; 1 ⩽ k ⩽ n . \end{matrix}$ (2)

In Equation (1), the first term is the sum of in-cluster distances, which indicates the compactness of the fuzzy partition. Here X = [x₁, x₂, ⋯ , x_n] is the dataset, x_k = [x_k1, x_k2, ⋯ , x_km], n is the number of data, m is the number of features, and $U = [\begin{matrix} u_{11} & \dots & u_{1 n} \\ ⋮ & ⋱ & ⋮ \\ u_{c 1} & \dots & u_{cn} \end{matrix}]$ is the membership matrix, where u_ik is the membership degree of data x_k in cluster i. The second term is the sum of the between-cluster distances, which indicates the separability of the fuzzy partition; 1 - u_ik is the degree to which x_k does not belong to cluster i, and ɛ is a parameter that trades off the distances inside clusters against the distances between clusters. W = [w₁, ⋯ , w_k, ⋯ , w_n] is the data weight vector, where w_k is the weight of data x_k. $V = [\begin{matrix} v_{11} & \dots & v_{1 m} \\ ⋮ & ⋱ & ⋮ \\ v_{c 1} & \dots & v_{cm} \end{matrix}]$ is the matrix of cluster centers, whose ith row v_i is the ith cluster center. $F = [\begin{matrix} f_{11} & \dots & f_{1 m} \\ ⋮ & ⋱ & ⋮ \\ f_{c 1} & \dots & f_{cm} \end{matrix}]$ is the feature weight matrix, whose ith row f_i holds the weights of the m features for cluster i, 1 ⩽ i ⩽ c, where c is the number of clusters. The third term is the l²-norm regularization on the data weights, which harmonizes the data weighting scatter and encourages more data to take part in clustering; the fourth term is the l²-norm regularization on the feature weights. γ and θ are the regularization coefficients that control the regularization of the data weights and the feature weights, respectively, during clustering.
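As a minimal sketch of how Equation (1) can be evaluated, the following dependency-free Python function computes its four terms for given X, U, V, W and F. The function name, the nested-list layout (U and F stored with one row per cluster) and the reading of the feature-weight regularizer as a sum over all clusters and features are assumptions made for illustration only.

```python
def wfcm_bc_objective(X, U, V, W, F, eps, gamma, theta):
    """Evaluate the objective of Equation (1) (illustrative sketch).

    X: n x m data, U: c x n memberships, V: c x m centers,
    W: n data weights, F: c x m feature weights (all nested lists).
    """
    n, m, c = len(X), len(X[0]), len(V)
    compact = 0.0   # first term: weighted in-cluster distances
    separate = 0.0  # second term: weighted between-cluster distances
    for k in range(n):
        for i in range(c):
            # feature-weighted squared distance between x_k and v_i
            dist = sum(F[i][j] ** 2 * (X[k][j] - V[i][j]) ** 2 for j in range(m))
            compact += W[k] ** 2 * U[i][k] ** 2 * dist
            separate += W[k] ** 2 * (1.0 - U[i][k]) ** 2 * dist
    # l2 regularizers; summing f_ij^2 over all i and j is an assumption
    reg_w = 0.5 * gamma * sum(w ** 2 for w in W)
    reg_f = 0.5 * theta * sum(f ** 2 for row in F for f in row)
    return compact - eps * separate + reg_w + reg_f
```

Tracking this value across iterations is one simple way to monitor convergence of the updates derived in Section 3.2.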

3.2 The iterative rules

We solve the above optimization problem by introducing the Lagrange multipliers ϕ₁, ϕ₂ and ϕ₃ and forming the Lagrangian function: $\begin{matrix} ϕ_{WFCM_bc} (U, V, W, F) = \sum_{k = 1}^{n} w_{k}^{2} \sum_{i = 1}^{c} (u_{ik}^{2} - ɛ u_{ik}^{2} + 2 ɛ u_{ik} - ɛ) \sum_{j = 1}^{m} f_{ij}^{2} (x_{kj} - v_{ij})^{2} \\ + \frac{1}{2} γ \sum_{k = 1}^{n} w_{k}^{2} + \frac{1}{2} θ \sum_{i = 1}^{c} \sum_{j = 1}^{m} f_{ij}^{2} + ϕ_{1} (\sum_{i = 1}^{c} u_{ik} - 1) \\ + ϕ_{2} (\prod_{k = 1}^{n} w_{k} - 1) + ϕ_{3} (\sum_{j = 1}^{m} f_{ij} - 1) \end{matrix}$ (3)
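For readers checking the step from Equation (1) to Equation (3), the bracketed coefficient comes from merging the first two terms of Equation (1), which share the same weighted distance. A one-line expansion, using only symbols already defined, is: $u_{ik}^{2} - ɛ (1 - u_{ik})^{2} = u_{ik}^{2} - ɛ (1 - 2 u_{ik} + u_{ik}^{2}) = u_{ik}^{2} - ɛ u_{ik}^{2} + 2 ɛ u_{ik} - ɛ$. At u_ik = 1 this coefficient equals 1, and at u_ik = 0 it equals -ɛ, so a point with no membership in cluster i contributes its distance to v_i with a negative sign, which is how the objective rewards separation.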

3.2.1 The iterative rule of u_ik

Let $d_{ik}^{2} = w_{k}^{2} \sum_{j = 1}^{m} f_{ij}^{2} (x_{kj} - v_{ij})^{2}$. Setting ∂ϕ_{WFCM_bc} (U, V, W, F) / ∂u_ik = 0 gives the updating rule of u_ik: $\begin{matrix} \frac{\partial ϕ_{WFCM_bc} (U, V, W, F)}{\partial u_{ik}} = (2 u_{ik} - 2 ɛ u_{ik} + 2 ɛ) d_{ik}^{2} + ϕ_{1} = 0 \\ u_{ik} = \frac{- ϕ_{1} - 2 ɛ d_{ik}^{2}}{(2 - 2 ɛ) d_{ik}^{2}} \end{matrix}$

Because $\sum_{p = 1}^{c} u_{pk} = 1$, we have: $\begin{matrix} \sum_{p = 1}^{c} \frac{- ϕ_{1} - 2 ɛ d_{pk}^{2}}{(2 - 2 ɛ) d_{pk}^{2}} = 1 \\ - ϕ_{1} = \frac{1 - ɛ + ɛ c}{(1 - ɛ) \cdot \sum_{p = 1}^{c} ((2 - 2 ɛ) d_{pk}^{2})^{- 1}} \\ u_{ik} = \frac{(1 - ɛ + ɛ c) \cdot ((2 - 2 ɛ) d_{ik}^{2})^{- 1}}{(1 - ɛ) \cdot \sum_{p = 1}^{c} ((2 - 2 ɛ) d_{pk}^{2})^{- 1}} - \frac{ɛ}{1 - ɛ} \end{matrix}$ (4)
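The membership update of Equation (4) can be sketched in plain Python as follows. The function name, the nested-list layout and the small floor `tiny` (which guards against division by zero when a data point coincides with a center) are assumptions for illustration. The common factor (2 - 2ɛ) cancels between numerator and denominator and is therefore omitted.

```python
def update_memberships(X, V, W, F, eps, tiny=1e-12):
    """Membership update following Equation (4) (illustrative sketch).

    X: n x m data, V: c x m centers, W: n data weights,
    F: c x m feature weights. Returns U as a c x n nested list.
    """
    n, m, c = len(X), len(X[0]), len(V)
    U = [[0.0] * n for _ in range(c)]
    for k in range(n):
        # d_ik^2 = w_k^2 * sum_j f_ij^2 (x_kj - v_ij)^2, floored at `tiny`
        d2 = [max(W[k] ** 2 * sum(F[i][j] ** 2 * (X[k][j] - V[i][j]) ** 2
                                   for j in range(m)), tiny)
              for i in range(c)]
        inv_sum = sum(1.0 / d for d in d2)
        for i in range(c):
            U[i][k] = ((1.0 - eps + eps * c) / ((1.0 - eps) * d2[i] * inv_sum)
                       - eps / (1.0 - eps))
    return U
```

By construction, each column of the returned matrix sums to 1. For ɛ > 0 the raw formula can give values slightly below 0 for distant clusters (and slightly above 1 for the nearest one); the text does not say how such values are handled, so the sketch leaves them unchanged.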

3.2.2 The iterative rule of w_k

Let $g_{ik} = u_{ik}^{2} - ɛ u_{ik}^{2} + 2 ɛ u_{ik} - ɛ$. Setting ∂ϕ_{WFCM_bc} (U, V, W, F) / ∂w_k = 0 gives: $\begin{matrix} \frac{\partial ϕ_{WFCM_bc} (U, V, W, F)}{\partial w_{k}} = 2 w_{k} \sum_{i = 1}^{c} g_{ik} \sum_{j = 1}^{m} f_{ij}^{2} (x_{kj} - v_{ij})^{2} + γ w_{k} + ϕ_{2} \prod_{p = 1, p \neq k}^{n} w_{p} = 0 \\ ϕ_{2} = \frac{- [2 w_{k} \sum_{i = 1}^{c} g_{ik} \sum_{j = 1}^{m} f_{ij}^{2} (x_{kj} - v_{ij})^{2} + γ w_{k}]}{\prod_{p = 1, p \neq k}^{n} w_{p}} \end{matrix}$

Because $\prod_{p = 1}^{n} w_{p} = 1$, we have $\prod_{p = 1, p \neq k}^{n} w_{p} = 1 / w_{k}$, so that: $\begin{matrix} w_{k} = \frac{- ϕ_{2} \prod_{p = 1, p \neq k}^{n} w_{p}}{2 \sum_{i = 1}^{c} g_{ik} \sum_{j = 1}^{m} f_{ij}^{2} (x_{kj} - v_{ij})^{2} + γ} = {[\frac{- ϕ_{2}}{2 \sum_{i = 1}^{c} g_{ik} \sum_{j = 1}^{m} f_{ij}^{2} (x_{kj} - v_{ij})^{2} + γ}]}^{1 / 2} \end{matrix}$

Substituting this expression back into the constraint $\prod_{p = 1}^{n} w_{p} = 1$, we get: $\begin{matrix} - ϕ_{2} = \prod_{p = 1}^{n} [2 \sum_{i = 1}^{c} g_{ip} \sum_{j = 1}^{m} f_{ij}^{2} (x_{pj} - v_{ij})^{2} + γ]^{1 / n} \\ w_{k} = {[\frac{\prod_{p = 1}^{n} [2 \sum_{i = 1}^{c} g_{ip} \sum_{j = 1}^{m} f_{ij}^{2} (x_{pj} - v_{ij})^{2} + γ]^{1 / n}}{2 \sum_{i = 1}^{c} g_{ik} \sum_{j = 1}^{m} f_{ij}^{2} (x_{kj} - v_{ij})^{2} + γ}]}^{1 / 2} \end{matrix}$ (5)
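A hedged Python sketch of the data-weight update in Equation (5) is given below. The function name and data layout are illustrative assumptions. The geometric mean in the numerator is computed through logarithms to avoid overflow for large n, which requires every bracketed quantity to be positive; the explicit check for that condition is an addition of this sketch, not something stated in the text.

```python
import math


def update_data_weights(X, U, V, F, eps, gamma):
    """Data-weight update following Equation (5) (illustrative sketch).

    X: n x m data, U: c x n memberships, V: c x m centers,
    F: c x m feature weights. Returns the list of n data weights.
    """
    n, m, c = len(X), len(X[0]), len(V)
    bracket = []  # 2 * sum_i g_ik * sum_j f_ij^2 (x_kj - v_ij)^2 + gamma
    for k in range(n):
        a_k = 0.0
        for i in range(c):
            u = U[i][k]
            g = u * u - eps * u * u + 2.0 * eps * u - eps  # g_ik
            a_k += g * sum(F[i][j] ** 2 * (X[k][j] - V[i][j]) ** 2
                           for j in range(m))
        value = 2.0 * a_k + gamma
        if value <= 0.0:
            # assumption of this sketch: the update is only defined here
            raise ValueError("bracketed term must be positive")
        bracket.append(value)
    # log of the geometric mean of all bracketed terms
    log_geo_mean = sum(math.log(v) for v in bracket) / n
    return [math.exp(0.5 * (log_geo_mean - math.log(v))) for v in bracket]
```

The returned weights multiply to 1, as required by constraint (2), and a data point with a larger weighted distance to the centers receives a smaller weight.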

3.2.3 The iterative rule of f_ij

Setting ∂ϕ_{WFCM_bc} (U, V, W, F) / ∂f_ij = 0 gives the updating rule of f_ij: $\begin{matrix} \frac{\partial ϕ_{WFCM_bc} (U, V, W, F)}{\partial f_{ij}} = 2 \sum_{k = 1}^{n} w_{k}^{2} g_{ik} f_{ij} (x_{kj} - v_{ij})^{2} + ϕ_{3} + θ f_{ij} = 0 \\ f_{ij} = \frac{- ϕ_{3}}{2 \sum_{k = 1}^{n} w_{k}^{2} g_{ik} (x_{kj} - v_{ij})^{2} + θ} \end{matrix}$

Because $\sum_{p = 1}^{m} f_{ip} = 1$, we have: $\begin{matrix} - ϕ_{3} = \frac{1}{\sum_{p = 1}^{m} {(\sum_{k = 1}^{n} 2 w_{k}^{2} g_{ik} (x_{kp} - v_{ip})^{2} + θ)}^{- 1}} \\ f_{ij} = \frac{(\sum_{k = 1}^{n} 2 w_{k}^{2} g_{ik} (x_{kj} - v_{ij})^{2} + θ)^{- 1}}{\sum_{p = 1}^{m} {(\sum_{k = 1}^{n} 2 w_{k}^{2} g_{ik} (x_{kp} - v_{ip})^{2} + θ)}^{- 1}} \end{matrix}$ (6)
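The feature-weight update of Equation (6) can be sketched in the same style. The function name, the data layout and the positivity check on each denominator are assumptions of this illustration.

```python
def update_feature_weights(X, U, V, W, eps, theta):
    """Feature-weight update following Equation (6) (illustrative sketch).

    X: n x m data, U: c x n memberships, V: c x m centers,
    W: n data weights. Returns F as a c x m nested list.
    """
    n, m, c = len(X), len(X[0]), len(V)
    F = []
    for i in range(c):
        inverse = []
        for j in range(m):
            s = 0.0
            for k in range(n):
                u = U[i][k]
                g = u * u - eps * u * u + 2.0 * eps * u - eps  # g_ik
                s += 2.0 * W[k] ** 2 * g * (X[k][j] - V[i][j]) ** 2
            if s + theta <= 0.0:
                # assumption of this sketch: the update is only defined here
                raise ValueError("denominator must be positive")
            inverse.append(1.0 / (s + theta))
        total = sum(inverse)
        F.append([value / total for value in inverse])  # each row sums to 1
    return F
```

Within a cluster, a feature along which the weighted dispersion is small receives a larger weight, and a larger θ pulls the weights of a cluster towards the uniform value 1/m.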

3.2.4 The iterative rule of v_ij

By setting ∂ϕ_{WFCM_bc} (U, V, W, F) / ∂v_ij = 0, we get the updating rule of v_ij: $v_{ij} = \frac{\sum_{k = 1}^{n} w_{k}^{2} g_{ik} f_{ij}^{2} x_{kj}}{\sum_{k = 1}^{n} w_{k}^{2} g_{ik} f_{ij}^{2}}$ (7)
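A minimal Python sketch of the center update in Equation (7) follows. The function name and data layout are illustrative assumptions. Because f_ij² appears in both numerator and denominator, it cancels whenever f_ij is nonzero, so the sketch omits it; the guard against a non-positive normalizer is also an addition of this sketch.

```python
def update_centers(X, U, W, eps):
    """Center update following Equation (7) (illustrative sketch).

    X: n x m data, U: c x n memberships, W: n data weights.
    Returns V as a c x m nested list.
    """
    n, m, c = len(X), len(X[0]), len(U)
    V = []
    for i in range(c):
        coef = []
        for k in range(n):
            u = U[i][k]
            g = u * u - eps * u * u + 2.0 * eps * u - eps  # g_ik
            coef.append(W[k] ** 2 * g)
        total = sum(coef)
        if total <= 0.0:
            # assumption of this sketch: the update is only defined here
            raise ValueError("sum of w_k^2 * g_ik must be positive")
        V.append([sum(coef[k] * X[k][j] for k in range(n)) / total
                  for j in range(m)])
    return V
```

With ɛ = 0, unit data weights and crisp memberships, the update reduces to the ordinary cluster mean, which gives a quick sanity check.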

3.3 Complexity analysis

During clustering with the new approach, each iteration updates the membership matrix U, the feature weights F, the data weights W and the centroids V. Each of these updates costs O(cmn), so with t iterations the total computational cost is O(tcmn).

4 Experiments and results

4.1 Datasets

To assess the performance of the proposed WFCM_bc, experiments were carried out on five UCI datasets: Knowledge, Haberman, Messidor, Wine and Sonar. The characteristics of the datasets are shown in Table 1.

Table 1
The number of categories, number of features and sample size of different datasets

Dataset	Sample size	Number of features	Number of categories
Knowledge	403	5	4
Haberman	306	3	2
Messidor	1151	19	2
Wine	150	4	3
Sonar	208	60	2

4.2 Comparative study

In this paper, we apply external cluster performance metrics: normalized mutual information (NMI), Accuracy (AC), Rand index (RI) [30] and internal cluster performance metric: Xie and Beni index (XB) [31] to assess clustering performance.
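To make one of the external metrics concrete, the sketch below computes the Rand index using its standard pair-counting definition: the fraction of data pairs on which the true labelling and the clustering agree about being in the same group or in different groups. It is assumed here that this standard definition matches the one used in [30]; the function name is illustrative.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Rand index by pair counting (standard definition, illustrative sketch)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    # a pair agrees when both labellings put it together or both split it
    agree = sum(
        (labels_true[a] == labels_true[b]) == (labels_pred[a] == labels_pred[b])
        for a, b in pairs
    )
    return agree / len(pairs)
```

The index is invariant to a renaming of the cluster labels, so a clustering that recovers the true groups under different label names still scores 1.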

All datasets are normalized. To examine how the values of ɛ, γ and θ affect the clustering results, we compared the clustering performance (AC, NMI, RI) while varying each parameter with step sizes of 0.0001, 0.001, 0.02, 0.2 and 1 over the ranges (0, 0.001], (0.001, 0.01], (0.01, 0.1], (0.1, 1] and (1, 15], respectively. In general, the best clustering results are obtained when ɛ, γ and θ are set in the ranges (0, 0.05], (0, 0.5] and (0.01, 10], respectively, depending on the dataset. The smaller γ is, the larger the differences between data weights; likewise, the smaller θ is, the larger the differences between feature weights, and these differences shrink as θ increases.

We compare the clustering results produced by the proposed WFCM_bc, the conventional FCM [7] and commonly used weighted FCM algorithms: AFCM [21], the double-indices fuzzy subspace clustering algorithm based on feature weighted distance (DI-FSC) [27] and EWFCM [28]. To reduce the impact of initialization, each approach is run in 30 independent experiments.

Taking the Knowledge dataset as an example, Table 2 compares the XB index of the algorithms at convergence. XB is the ratio of compactness (COMP) to separation (SPT), where $COMP = \sum_{k = 1}^{n} \sum_{i = 1}^{c} u_{ik}^{2} d_{ik}^{2} / n$ and $SPT = \min_{i \neq j} {∥ v_{i} - v_{j} ∥}^{2}$. A good clustering partition has a high SPT and a small COMP, so the better the partition, the smaller the XB index; the optimal cluster number can therefore be determined from the minimum of the XB index. Table 2 shows that the XB index of WFCM_bc is smaller than that of the other clustering approaches. This indicates that, by taking both in-cluster and between-cluster distances into account, the proposed WFCM_bc approach obtains a better clustering partition, with a better trade-off between compactness and separability.
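The XB computation described above can be sketched as follows. The function name is illustrative, and the squared distances d_ik² are passed in as a precomputed c × n table so that the caller decides which distance to use, which is a design choice of this sketch.

```python
def xie_beni(U, D2, V):
    """Return (COMP, SPT, XB) for a fuzzy partition (illustrative sketch).

    U: c x n memberships, D2: c x n squared distances d_ik^2,
    V: c x m cluster centers.
    """
    c, n = len(U), len(U[0])
    # compactness: membership-weighted squared distances, averaged over data
    comp = sum(U[i][k] ** 2 * D2[i][k] for i in range(c) for k in range(n)) / n
    # separation: smallest squared distance between two distinct centers
    spt = min(
        sum((a - b) ** 2 for a, b in zip(V[i], V[j]))
        for i in range(c) for j in range(i + 1, c)
    )
    return comp, spt, comp / spt
```

For example, with two centers at (0, 0) and (3, 4) and two crisply assigned points that each lie at squared distance 1 from their own center, COMP is 1, SPT is 25 and XB is 0.04.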

Table 2
The comparison of XB index on Knowledge dataset with format: mean (±standard deviation)

Algorithm	FCM	AFCM	EWFCM	DI-FSC	WFCM_bc
COMP	0.2450(±0.0011)	0.2257(±0.0002)	0.0774(±0.0000)	0.2051(±0.0097)	0.0777(±0.0001)
SPT	0.0139(±0.0064)	0.1575(±0.0111)	0.0029(±0.0011)	0.3504(±0.0359)	0.1657(±0.0111)
XB	21.549(±11.370)	1.4363(±0.1028)	31.411(±18.039)	0.5869(±0.0323)	0.4701(±0.0325)

The comparison of the mean AC among the different approaches on the different datasets is shown in Fig. 1. An analysis of variance (ANOVA) was conducted at a confidence level of 95%; the ANOVA results and the comparison of the three external performance metrics are shown in Table 3. Based on the p-values, the AC and RI of the five approaches differ significantly on all datasets, and the NMI differs significantly on all datasets except Sonar (p = 0.1127). Fig. 1 and Table 3 show that WFCM_bc achieves the best mean AC and RI on all five datasets and the best mean NMI on Wine and Sonar, while its NMI on Knowledge, Haberman and Messidor is the second best. These results indicate that the new approach can effectively improve the clustering performance.

Fig. 1

The comparison of the mean AC among different approaches on different datasets.

Table 3

The statistical analysis of the external cluster performance metrics: ANOVA results and the comparison with format: mean (±standard deviation)

Metric	Algorithm	Knowledge	Haberman	Messidor	Wine	Sonar
AC	FCM	0.4475(±0.0158)	0.5204(±0.0015)	0.5308(±0.0000)	0.9494(±0.0000)	0.5531(±0.0024)
	AFCM	0.4511(±0.0010)	0.5062(±0.0066)	0.5317(±0.0000)	0.9484(±0.0227)	0.5746(±0.0049)
	EWFCM	0.4674(±0.0232)	0.6619(±0.1154)	0.5377(±0.0230)	0.9417(±0.0629)	0.5616(±0.0146)
	DI-FSC	0.4791(±0.0241)	0.7240(±0.0689)	0.5375(±0.0321)	0.9141(±0.0291)	0.5515(±0.0161)
	WFCM_bc	0.5188(±0.0012)	0.7450(±0.0000)	0.5535(±0.0090)	0.9515(±0.0283)	0.5817(±0.0017)
	p-value	1.31E-12	1.21E-14	2.83E-4	5.58E-8	2.88E-5
RI	FCM	0.6727(±0.0024)	0.4991(±0.0001)	0.5015(±0.0000)	0.9331(±0.0000)	0.5032(±0.0)
	AFCM	0.6787(±0.0001)	0.4985(±0.0002)	0.5016(±0.0000)	0.9319(±0.0026)	0.5088(±0.0)
	EWFCM	0.6789(±0.0111)	0.5752(±0.0605)	0.5032(±0.0034)	0.9239(±0.0076)	0.5056(±0.0)
	DI-FSC	0.6701(±0.0131)	0.6077(±0.0360)	0.5037(±0.0051)	0.8918(±0.0324)	0.5034(±0.0034)
	WFCM_bc	0.6913(±0.0032)	0.6189(±0.0000)	0.5077(±0.0024)	0.9355(±0.0033)	0.5126(±0.0054)
	p-value	7.81E-7	2E-14	2.32E-5	1.73E-8	2.35E-5
NMI	FCM	0.2630(±0.0067)	0.0024(±0.0001)	0.0015(±0.0000)	0.8336(±0.0000)	0.0086(±0.0)
	AFCM	0.2682(±0.0007)	0.0015(±0.0004)	0.0016(±0.0000)	0.8314(±0.0048)	0.0171(±0.0)
	EWFCM	0.2847(±0.0137)	0.0437(±0.0327)	0.0116(±0.0016)	0.8170(±0.0137)	0.0143(±0.0)
	DI-FSC	0.2303(±0.0332)	0.0700(±0.0232)	0.0189(±0.0054)	0.7479(±0.0659)	0.0180(±0.0128)
	WFCM_bc	0.2789(±0.0061)	0.0673(±0.0000)	0.0118(±0.0025)	0.8383(±0.0065)	0.0238(±0.0095)
	p-value	3.28E-9	1.12E-14	3.9E-21	7.98E-9	0.1127

5 Conclusion

In this paper, a new weighted fuzzy C-means clustering approach considering between-cluster separability is put forward. By incorporating feature weighting and data weighting, we construct a novel objective function in which both the in-cluster distances and the between-cluster distances are taken into account. To avoid overfitting, and to avoid the situation in which a few features with large feature weights or a few data points with large data weights dominate the clustering process, l²-norm regularization terms on the feature weights and data weights are incorporated into the objective function. The iterative formulas for the feature weights, membership degrees, data weights and cluster centers are obtained by maximizing the in-cluster compactness and the between-cluster separability. Experimental results show that the new algorithm can effectively improve the clustering performance.

This approach still has some limitations; for example, it is sensitive to initialization. In future studies, we could combine it with other methods to further improve the clustering quality and apply it to practical problems.

Compliance with ethical standards

Conflict of interest: the authors declare that there is no conflict of interest.

Acknowledgments

This work is supported by the Open Project of the Anhui Province Key Laboratory of Special and Heavy Load Robot (No. TZJQR004-2020), the Science and Technology Projects of Xuancheng (No. 1932) and the National Natural Science Foundation of China (No. 61601004, 61602008).

References

1. Huang, Ng, Rong and Li, Automated variable weighting in k-means type clustering[J], IEEE Transactions on Pattern Analysis & Machine Intelligence 27(5) (2005), 657–668.

2. Liu, Zhang and Wang, Incorporating adaptive local information into fuzzy clustering for image segmentation[J], IEEE Transactions on Image Processing 24(11) (2015), 3990–4000.

3. Yin, Gao, Qiu and Kaynak, Fault detection for nonlinear process with deterministic disturbances: a just-in-time learning based data driven method[J], IEEE Transactions on Cybernetics 47(11) (2016), 3649–3657.

4. Yin and Huang, Performance monitoring for vehicle suspension system via fuzzy positivistic c-means clustering based on accelerometer measurements[J], IEEE/ASME Transactions on Mechatronics 20(5) (2014), 2613–2620.

5. Jing, Ng and Huang, An entropy weighting k-means algorithm for subspace clustering of high-dimensional sparse data[J], IEEE Transactions on Knowledge & Data Engineering 19(8) (2007), 1026–1041.

6. Moreno-Hagelsieb, Wang, Walsh and El Sherbiny, Phylogenomic clustering for selecting non-redundant genomes for comparative genomics[J], Bioinformatics 29(7) (2013), 947–949.

7. Dunn J.C., A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters[J], Journal of Cybernetics 3 (1973), 32–57.

8. Bezdek J.C., Pattern recognition with fuzzy objective function algorithms, New York: Plenum Press (1981).

9. Maity S.P., Chatterjee and Acharya, On optimal fuzzy c-means clustering for energy efficient cooperative spectrum sensing in cognitive radio networks[J], Digital Signal Process 49 (2016), 104–115.

10. Kesemen, Özge and Özkul, Fuzzy c-means clustering algorithm for directional data (FCM4DD)[J], Expert Systems with Applications 58 (2016), 76–82.

11. Verma, Agrawal R.K. and Sharan, An improved intuitionistic fuzzy c-means clustering algorithm incorporating local information for brain image segmentation[J], Applied Soft Computing 46 (2016), 543–557.

12. Ramathilagam and Huang Y.M., Extended Gaussian kernel version of fuzzy c-means in the problem of data analyzing[J], Expert Systems with Applications 38 (2011), 3793–3805.

13. Maity S.P., Chatterjee and Acharya, On optimal fuzzy c-means clustering for energy efficient cooperative spectrum sensing in cognitive radio networks[J], Digital Signal Process 49 (2016), 104–115.

14. …, Song, Zhang, Ouyang and Khan S.U., MapReduce-based fast fuzzy c-means algorithm for large-scale underwater image segmentation[J], Future Generation Computer Systems 65 (2016), 90–101.

15. Ban, Ban and Tus E.D.A., Importance-performance analysis by fuzzy C-means algorithm[J], Expert Systems with Applications 50 (2016), 9–16.

16. Liu, Sun S.Z., Yu, Yue and Zhang, A modified Fuzzy C-Means (FCM) Clustering algorithm and its application on carbonate fluid identification[J], Journal of Applied Geophysics 129 (2016), 28–35.

17. Haldar N.A.H., Khan F.A., Ali and Abbas, Arrhythmia classification using mahalanobis distance based improved Fuzzy C-Means Clustering for mobile health monitoring systems[J], Neurocomputing 220 (2017), 221–235.

18. Pimentel B.A. and de Souza R.M.C.R., A weighted multivariate Fuzzy C-Means method in interval-valued scientific production data[J], Expert Systems with Applications 41 (2014), 3223–3236.

19. Wang, Wang and Wang, Improving fuzzy c-means clustering based on feature-weight learning[J], Pattern Recognition Letters 25 (2004), 1123–1132.

20. Xing H.J. and Ha M.H., Further improvements in feature-weighted Fuzzy C-means[J], Information Sciences 267 (2014), 1–15.

21. Tang C.L. and Wang S.G., Adaptive fuzzy clustering model based on internal connectivity of all data points[J], Acta Automatica Sinica 36(11) (2010), 1544–1556.

22. … Z.H., Wu Z.C. and Zhang, An improved FCM algorithm with adaptive weights based on SA-PSO[J], Neural Computing & Applications 28 (2017), 3113–3118.

23. Adhikari S.K., Sing J.K., Basu D.K. and Nasipuri, Conditional spatial fuzzy C-means clustering algorithm for segmentation of MRI images[J], Applied Soft Computing 34 (2015), 758–769.

24. Shen H.B., Yang, Wang S.T. and Liu X.J., Attribute weighted mercer kernel based fuzzy clustering algorithm for general non-spherical datasets[J], Soft Computing 10 (2006), 1061–1073.

25. … Z.H. and Wang, DwfwFcm: An effective fuzzy c-means clustering framework considering the different data weights and feature weights[J], Journal of Intelligent & Fuzzy Systems 37(3) (2019), 4339–4347.

26. Memon K.H. and Lee D.H., Generalised kernel weighted fuzzy C-means clustering algorithm with local information[J], Fuzzy Sets and Systems 340 (2018), 91–108.

27. Wang, Wang S.T. and Wang X.M., Double-indices fuzzy subspace clustering algorithm based on feature weighted distance[J], Control and Decision 25(8) (2010), 1207–1210.

28. Zhou, Chen C.L.P., Zhang and Li H.X., Fuzzy clustering with the entropy of attribute weights[J], Neurocomputing 198 (2016), 125–134.

29. … X.H., Wu, Sun and Zhao J.W., Mixed fuzzy inter-cluster separation clustering algorithm[J], Applied Mathematical Modelling 35 (2011), 4790–4795.

30. Silva Filho T.M., Pimentel B.A., Souza R.M.C.R. and Oliveira A.L.I., Hybrid methods for fuzzy clustering based on fuzzy c-means and improved particle swarm optimization[J], Expert Systems with Applications 42 (2015), 6315–6328.

31. Zhou K.L., Fu and Yang S.L., Fuzziness parameter selection in fuzzy c-means: the perspective of cluster validation[J], Science China Information Sciences 24(57) (2014), 1–8.
