Adaptive fuzzy C-means clustering integrated with local outlier factor

Abstract

The conventional fuzzy C-means (FCM) algorithm is sensitive to the initial cluster centers and to outliers, which may cause the centers to deviate from the real centers when the algorithm converges. To improve the performance of FCM, a method of initializing the cluster centers based on probabilistic suppression is proposed, and an improved local outlier factor is integrated into the FCM model. First, the probability of an object being a cluster center is defined by its local density, and all initial centers are obtained incrementally from this cluster center's probability and a probability suppression function. Next, an improved local outlier factor is constructed from the local distribution of an object, and its reciprocal is regarded as the contribution degree of the object to the cluster centers. Then, the improved local outlier factor is integrated into FCM to alleviate the negative effect of outliers. Finally, experiments on synthetic and real-world datasets demonstrate the clustering performance and anti-noise ability of the proposed method.

Keywords

FCM, local outlier factor, initial cluster center, outlier

1. Introduction

Clustering analysis aims to find the most natural way to partition a dataset. It is a research hotspot in unsupervised learning and is widely used in data mining, pattern recognition and computer vision [1, 2, 3, 4]. In essence, clustering divides a dataset into multiple non-empty, disjoint subsets, each of which is called a cluster. The final partition should ensure that objects in the same cluster are highly similar, while objects in different clusters have low similarity.

Researchers have proposed many clustering algorithms, which can be roughly divided into the following categories: partitioning clustering, spectral clustering, hierarchy-based clustering, density-based clustering, grid-based clustering, etc. The main idea of partitioning clustering is as follows: given a dataset ${\bf{X}}$ containing $n$ objects, a partitioning method divides ${\bf{X}}$ into $c\,(c<n)$ disjoint subsets. It iterates over all objects and reaches the optimal partition by repeatedly moving objects between clusters. During this process, the quality of a partition is measured by a user-defined loss function. The optimal partition usually optimizes the loss function, and each cluster reflects the natural geometric structure of the dataset. The FCM clustering algorithm [5], one of the classical partitioning methods, has attracted much attention, and the related work can be summarized as follows:

(1) Adaptively obtain the number of clusters, so that it does not have to be given manually.
(2) Adaptively initialize the cluster centers to avoid falling into a local optimum.
(3) Integrate a weighting factor into the FCM model when updating the cluster centers, to reduce sensitivity to outliers.
(4) Introduce an appropriate distance metric, which benefits datasets with complicated geometric structure.

This paper focuses on the second and third issues above. On the one hand, to weaken the influence of random initial cluster centers on clustering performance, some scholars obtain the initial cluster centers adaptively instead of selecting them at random. Their core idea is that objects with high density should serve as initial centers and that these centers should be far apart. On the other hand, some scholars have proposed sample-weighted or feature-weighted methods that assign a weight to each object. In recent years, many researchers have used the local outlier factor to modify the cluster centers and optimize the clustering process.

Although predecessors have produced many fruitful works, issues (2) and (3) above still leave room for improvement. In this study, we propose a method that initializes the cluster centers by probabilistic suppression, and an improved FCM algorithm integrated with a new local outlier factor. The probability suppression method is referred to as PSM, and the improved FCM is named FCM-iLOF. The contributions of this study are as follows:

(1) A new method, PSM, is proposed to improve the rule for selecting initial centers.
(2) A novel local outlier factor is defined to estimate the contribution degree of objects.
(3) Experimental results indicate that the proposed approaches achieve better performance than the compared methods.

The rest of this paper is organized as follows: Section 2 reviews the methods of initializing cluster centers and the improved local outlier factors proposed by previous scholars. Section 3 describes the original FCM algorithm. The PSM and FCM-iLOF algorithms are proposed in Section 4. Section 5 analyzes the experiments. Finally, Section 6 summarizes this work.
2. Related works

In this section, we review methods for selecting initial centers and algorithms for measuring outliers.

Many scholars have worked on improving the rule for selecting initial cluster centers, to avoid the local optima caused by randomly chosen initial centers [6]. Li et al. [7] proposed a novel initial cluster center selection algorithm that first calculates the density and the nearest-neighbor distance of each object within a specified radius, then computes the ratio of density to nearest-neighbor distance, and finally takes the $c$ objects with the largest ratios as the initial cluster centers. In [8], a multi-initialization genetic algorithm performs a global search, which decreases the possibility of falling into a local optimum. Zhou et al. [9] designed a density-based approach that uses a greedy algorithm to restrict the initial cluster centers to high-density regions. In [10, 11, 12], the initial cluster centers are determined by an iterative procedure. In [10], the object with the highest density is chosen first as an initial center, and its neighborhood is deleted once it has been identified. Next, the object with the highest density among the remaining objects becomes the next initial center. This process is repeated until $c$ initial centers have been obtained.

However, an inappropriate neighborhood radius may cause one cluster to contain multiple centers. To overcome this shortcoming, Yu et al. [11] optimized the K-medoids clustering method by increasing the number of clusters step by step from 2 to $c$. Although each initial center obtained by Yu's method falls into a distinct cluster, the centers may lie at the edges of their clusters, far from the real centers. Zhou et al. [12] adopted the roulette algorithm to select the centers from core objects with high density; however, this method also suffers from the drawback that multiple initial centers may appear in the same cluster. References [13, 14, 15, 16] also address the initialization of cluster centers and provide useful reference.

Outliers are a special subset that occupies a relatively small part of the dataset and differs greatly from its neighbors in spatial distribution. In clustering, objects should therefore carry different weights; in other words, the contribution degree of an outlier should be smaller than that of a normal object. It is thus worthwhile to employ an appropriate weighting function that assigns a weight to each object [17].

To meet this requirement, some researchers used sample-weighting or feature-weighting schemes to assign each object a weight that distinguishes outliers from normal objects. The idea of density peak clustering (DPC) [18] was integrated into the FCM algorithm by applying a local density that reflects denseness or sparsity [12]. Yu et al. [17] adopted a sample-weighted method based on the probability distribution and the maximum entropy principle. Pimentel et al. [19] proposed two ways to obtain the weights: the first models the membership degree of each object, and the second models the dispersion degree through an adaptive distance. Both are robust on datasets with outliers.

In addition to sample-weighted and feature-weighted approaches, some scholars calculate the contribution degree with the local outlier factor (LOF) [20]. The LOF algorithm was first proposed for outlier detection. Its main idea is to judge the abnormality of an object by how similar its spatial distribution is to that of its neighbors: the more similar an object is to its neighbors, the more likely it is to be a normal object; conversely, the less similar, the more likely it is to be an outlier. In [21, 22, 23, 24, 25, 26, 27], many scholars continued to improve the LOF algorithm with success. To prevent outliers from degrading clustering performance, Zhang et al. [21] proposed an improved FCM algorithm that uses an outlier factor to reduce the weights of outliers. Muhima et al. [22] combined the LOF algorithm with K-Means, partitioning the dataset after deleting outliers; this overcomes the influence of outliers to some extent and achieves better performance. However, many improved LOF algorithms proposed by previous scholars rely only on the neighborhood mean distance to measure local deviation, without considering the local distribution of an object within its neighborhood. Recently, Su et al. [23] defined a new factor called LDC, which first integrates the variance and expectation of the neighborhood distance into LOF to compute the degree of local deviation. In [24], LOF was extended using the entropy of the relative $k$-neighborhood.

In summary, all of the above improved LOF-based algorithms judge whether an object is an outlier by using the neighborhood mean distance to calculate the local outlier factor. Only a few papers consider the variation of the neighborhood distance, and their calculation rules cannot fully reflect the local distribution of an object within its neighborhood. In this study, we define a new local outlier factor that takes full advantage of the local distribution and integrates both distance and distribution information. The module (length) of the sum of the intra-cluster k-neighborhood vectors is used to describe the distribution of an object within its neighborhood, which alleviates the deficiency of variance. It is therefore better suited than existing algorithms to quantifying distribution information. The details are given in Section 4.

3. Fuzzy C-means algorithm

FCM is a clustering algorithm based on objective function minimization: it iteratively updates the membership matrix and the cluster centers until the objective function converges. Let the dataset ${\bf{X}}=\{x_{j}\}_{j=1}^{n}$ containing $n$ objects be partitioned into $c$ clusters, and let ${\bf{V}}=\{v_{i}\}_{i=1}^{c}$ denote the cluster centers. The objective function of FCM is

$\displaystyle J(\mathbf{U},\mathbf{V})=\sum_{i=1}^{c}\sum_{j=1}^{n}u_{ij}^{m}\|x_{j}-v_{i}\|^{2}$ (1) $\displaystyle\text{s.t.}\ \sum_{i=1}^{c}u_{ij}=1,\ u_{ij}\in[0,1]$

where $m>1$ is the fuzzy weighting exponent, and ${\bf{U}}=[u_{ij}]_{c\times n}$ is the membership matrix, whose entry $u_{ij}$ denotes the degree to which object $x_{j}$ belongs to the $i^{\rm{th}}$ cluster.

To minimize $J(\mathbf{U},\mathbf{V})$, the constrained problem is transformed into an unconstrained objective function by the Lagrange multiplier method, and the partial derivatives with respect to $v_{i}$, $u_{ij}$ and the Lagrange multipliers are set equal to zero. This yields the iterative update formulas of FCM:

$\displaystyle u_{ij}=\frac{(\|x_{j}-v_{i}\|^{2})^{-\frac{1}{m-1}}}{\sum_{k=1}^{c}(\|x_{j}-v_{k}\|^{2})^{-\frac{1}{m-1}}}$ (2) $\displaystyle v_{i}=\frac{\sum_{j=1}^{n}u_{ij}^{m}x_{j}}{\sum_{j=1}^{n}u_{ij}^{m}}$ (3)
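The alternating updates of Eqs. (2) and (3) can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used in the experiments; the function name `fcm`, the tolerance `tol`, the iteration cap `max_iter`, and the small floor on squared distances (which avoids dividing by zero when an object coincides with a center) are assumptions.

```python
import numpy as np

def fcm(X, V0, m=2.0, tol=1e-6, max_iter=300):
    """Plain FCM sketch: alternate Eq. (2) (memberships) and Eq. (3) (centers)."""
    X = np.asarray(X, dtype=float)
    V = np.asarray(V0, dtype=float).copy()
    for _ in range(max_iter):
        # d2[i, j] = ||x_j - v_i||^2, shape (c, n); floored to avoid 0 ** negative
        d2 = np.fmax(((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2), 1e-12)
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)          # Eq. (2): each column sums to 1
        Um = U ** m
        V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)  # Eq. (3)
        shift = np.abs(V_new - V).max()                   # largest coordinate change
        V = V_new
        if shift < tol:
            break
    return U, V
```

For example, `U, V = fcm(X, V0)` returns the final membership matrix and centers, and `U.argmax(axis=0)` gives hard labels.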

4. Adaptive FCM integrated with local outlier factor

4.1 A method for initial centers selection

Randomly chosen initial cluster centers affect the clustering performance of FCM. DPC is based on two assumptions: cluster centers lie in high-density regions, and the distances among them are large. However, DPC still requires the centers to be determined manually. Inspired by these assumptions, we propose an initial center selection method based on probabilistic suppression that obtains the initial cluster centers automatically.

Definition 1 (Local Density of Object $x$): For an object $x\in{\bf{X}}$, the local density $\rho(x)$ is defined as:

$\displaystyle\rho(x)=\sum\limits_{o\in{\bf{X}}}{\chi({\|{x-o}\|-dc})}$ (4) $\displaystyle\chi(z)=\left\{{\begin{array}[]{ll}{1,}&{z\leqslant 0}\\ {0,}&{{\text{otherwise}}}\end{array}}\right.$ (5)

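where $dc$ is the cutoff distance, generally taken as the value at the 2% position of the pairwise distances between all objects sorted in ascending order [18]. Because $\|x-x\|=0\leqslant dc$, every object counts itself, so $\rho(x)\geqslant 1$.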
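A minimal NumPy sketch of the local density in Eqs. (4) and (5) follows. The helper names are hypothetical, and reading "the 2% position" as the 2nd percentile of all pairwise distances (computed with `np.percentile`) is an assumption about how $dc$ is chosen.

```python
import numpy as np

def cutoff_distance(X, percent=2.0):
    """Return (dc, D): dc is the `percent`-th percentile of pairwise distances
    (assumed reading of [18]); D is the full pairwise distance matrix."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    iu = np.triu_indices(len(X), k=1)   # each unordered pair counted once
    return np.percentile(D[iu], percent), D

def local_density(D, dc):
    """Eqs. (4)-(5): count objects o with ||x - o|| <= dc (x itself included)."""
    return (D <= dc).sum(axis=1)
```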

Definition 2 (Cluster Center's Probability of Object $x$): The probability that object $x$ is selected as a cluster center is defined as

$\displaystyle cp(x)={{\rho(x)}\over{\sum\limits_{o\in{\bf{X}}}{\rho(o)}}}$ (6)

Eq. (6) conforms to the first assumption of DPC: the higher the local density of an object, the more likely it is to be a cluster center. The first initial center is therefore $v_{1}=\operatorname*{argmax}_{x\in\mathbf{X}}cp(x)$. The second assumption of DPC is equivalent to the following: once an object is identified as a cluster center, the probabilities of its surrounding objects becoming cluster centers should be weakened. Inspired by this, a novel probability suppression function $f$, which integrates both density and distance, is employed to implement this requirement. It is defined as:

$\displaystyle f({v,x})={1\over{1+{1\over{\rho(v)}}{{\|{v-x}\|}^{2}}}}$ (7)

where $v$ is a cluster center, and $f(v,x)$ is the degree to which center $v$ suppresses the cluster center's probability of object $x$. Clearly, $f(v,x)$ increases as the distance between $v$ and $x$ decreases, i.e., the closer an object is to center $v$, the more strongly its cluster center's probability is suppressed. In particular, $f(v,v)=1$, so the center itself is fully suppressed, and a larger $\rho(v)$ widens the suppressed region. Once the first initial center $v_{1}$ has been obtained, the cluster center's probabilities of all objects are updated by Eq. (8) with $v=v_{1}$. The second initial center is then $v_{2}=\operatorname*{argmax}_{x\in\mathbf{X}}cp(x)$. Repeating this operation based on Eqs. (7) and (8) yields all initial centers.

$\displaystyle cp(x)=[{1-f({v,x})}]\cdot cp(x)$ (8)
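One suppression step, Eqs. (6) to (8), might look as follows. In this sketch a center is identified by its row index in the data matrix (PSM selects centers among the objects themselves); the function names and the toy densities are illustrative assumptions.

```python
import numpy as np

def center_probability(rho):
    """Eq. (6): cp(x) = rho(x) / sum_o rho(o)."""
    rho = np.asarray(rho, dtype=float)
    return rho / rho.sum()

def suppress(cp, X, v_idx, rho):
    """Eqs. (7)-(8): multiply cp(x) by 1 - f(v, x), f(v, x) = 1 / (1 + ||v - x||^2 / rho(v))."""
    X = np.asarray(X, dtype=float)
    d2 = ((X - X[v_idx]) ** 2).sum(axis=1)
    f = 1.0 / (1.0 + d2 / rho[v_idx])
    return (1.0 - f) * cp

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]], dtype=float)
rho = np.array([3, 2, 2, 2, 2], dtype=float)   # illustrative densities
cp = center_probability(rho)
v1 = int(np.argmax(cp))          # object 0, the densest one
cp = suppress(cp, X, v1, rho)    # cp[0] becomes 0; the far group around (5, 5) now leads
```

Because $f(v,v)=1$, the selected center can never be picked again, while distant objects keep most of their probability.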

However, to further reduce the risk of multiple initial centers falling into the same cluster, it is necessary to check whether an existing initial center can be reached from the newly obtained one. The rule for determining reachability is given in Definition 3.

Definition 3 (Reachable Initial Center): Given two initial centers $v_{a}$ and $v_{b}$, let $FS=p_{ab}^{(1)},p_{ab}^{(2)},\ldots,p_{ab}^{(q)},\ldots$ be a finite sequence defined recursively by Eq. (9). If some term $p_{ab}^{(q)}$ satisfies $p_{ab}^{(q-1)}=p_{ab}^{(q)}$ or $p_{ab}^{(q)}=v_{b}$, then $p_{ab}^{(q)}$ is the last term of $FS$. If the last term is $v_{b}$, then $v_{b}$ can be reached from $v_{a}$, and $v_{a}$ and $v_{b}$ are called reachable initial centers.

$\displaystyle\left\{\begin{array}{ll}p_{ab}^{(q)}=v_{a},&\text{if }q=1\\ p_{ab}^{(q)}=\langle p_{ab}^{(q-1)},v_{b}\rangle,&\text{if }q>1\end{array}\right.$ (9)

where $\langle p_{ab}^{(q-1)},v_{b}\rangle$ denotes the object closest to $v_{b}$ in the $\varepsilon$-neighborhood of $p_{ab}^{(q-1)}$, i.e., the hypersphere of radius $dc$ centered at $p_{ab}^{(q-1)}$. Reachable and unreachable cases are shown in Fig. 1, where the terms of $FS$ and the $\varepsilon$-neighborhoods are marked as triangles and circles, respectively. In Fig. 1a, $v_{b}$ cannot be reached from $v_{a}$; in Fig. 1b, $v_{b}$ can be reached from $v_{a}$.

Now assume that $l-1\ (1<l\leqslant c)$ initial centers have been obtained and stored in set ${\bf{V}}$. When the next initial center $v_{l}$ is obtained, we check whether its nearest existing initial center $v^{\prime}_{l}=\operatorname*{argmin}_{v\in\mathbf{V}}\|v-v_{l}\|$ can be reached from $v_{l}$. If $v_{l}$ and $v^{\prime}_{l}$ are not reachable initial centers, $v_{l}$ is valid and is added to ${\bf{V}}$. Otherwise, they may lie in the same cluster, so $v_{l}$ is abandoned and the cluster center's probabilities of all objects are suppressed again by $v^{\prime}_{l}$. Using the reachability between initial centers reduces the risk of multiple initial centers falling into the same cluster. The details of PSM are shown in Algorithm 1.
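Definition 3 and Eq. (9) describe a greedy walk: starting at $v_a$, repeatedly hop to the object within distance $dc$ of the current term that is closest to $v_b$. The sketch below assumes centers are given as row indices into `X` and that `dc` is the cutoff distance of Eq. (4); the function name is hypothetical. Since `np.argmin` returns the first minimum, ties go to the lowest index, so each hop strictly decreases the pair (distance to $v_b$, index) and the walk always terminates.

```python
import numpy as np

def reachable(X, a, b, dc):
    """Definition 3 / Eq. (9): True if v_b (row b) can be reached from v_a (row a)."""
    X = np.asarray(X, dtype=float)
    dist_to_b = np.linalg.norm(X - X[b], axis=1)
    p = a                                                        # p^(1) = v_a
    while p != b:
        nbr = np.flatnonzero(np.linalg.norm(X - X[p], axis=1) <= dc)  # eps-neighborhood, p included
        nxt = nbr[np.argmin(dist_to_b[nbr])]                     # <p^(q-1), v_b>
        if nxt == p:                                             # p^(q) == p^(q-1): sequence ends
            return False
        p = nxt
    return True                                                  # last term is v_b
```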

Algorithm 1: PSM
Input: dataset ${\bf{X}}$, the number of clusters $c$
Output: initial centers ${\bf{V}}$

Initialize an empty set ${\bf{V}}$; set $l=0$;
Calculate the density of all objects via Eq. (4);
Calculate the cluster center's probability of all objects via Eq. (6);
Obtain the first initial center $v_{1}=\operatorname*{argmax}_{x\in\mathbf{X}}cp(x)$;
${\bf{V}}={\bf{V}}\cup\{v_{1}\}$; $l=l+1$;
for $x$ in ${\bf{X}}$ do $cp(x)=[1-f(v_{1},x)]\cdot cp(x)$;
repeat
  Obtain the next initial center $v_{l+1}=\operatorname*{argmax}_{x\in\mathbf{X}}cp(x)$;
  Obtain its nearest initial center $v^{\prime}_{l+1}=\operatorname*{argmin}_{v\in\mathbf{V}}\|v-v_{l+1}\|$;
  if $v_{l+1}$ and $v^{\prime}_{l+1}$ are not reachable initial centers then
    ${\bf{V}}={\bf{V}}\cup\{v_{l+1}\}$; $l=l+1$;
    for $x$ in ${\bf{X}}$ do $cp(x)=[1-f(v_{l},x)]\cdot cp(x)$;
  else
    for $x$ in ${\bf{X}}$ do $cp(x)=[1-f(v^{\prime}_{l+1},x)]\cdot cp(x)$;
  end if
until $l==c$
return ${\bf{V}}$
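Putting Eqs. (4) to (9) together, Algorithm 1 can be sketched compactly as below. This is an illustrative reading of the pseudocode, not the authors' code: centers are returned as row indices, $dc$ is taken as the 2nd percentile of pairwise distances, and `max_steps` is a hypothetical safeguard (not in the paper) against looping forever when fewer than $c$ mutually unreachable candidates exist.

```python
import numpy as np

def psm(X, c, percent=2.0, max_steps=10_000):
    """Sketch of Algorithm 1 (PSM); returns row indices of the initial centers."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    dc = np.percentile(D[np.triu_indices(len(X), k=1)], percent)  # assumed choice of dc
    rho = (D <= dc).sum(axis=1).astype(float)     # Eq. (4); every object counts itself
    cp = rho / rho.sum()                          # Eq. (6)

    def suppress(v):                              # Eqs. (7)-(8) for center row v
        nonlocal cp
        cp = (1.0 - 1.0 / (1.0 + D[v] ** 2 / rho[v])) * cp

    def reachable(a, b):                          # Definition 3, Eq. (9)
        p = a
        while p != b:
            nbr = np.flatnonzero(D[p] <= dc)
            nxt = nbr[np.argmin(D[b, nbr])]
            if nxt == p:
                return False
            p = nxt
        return True

    V = [int(np.argmax(cp))]                      # first initial center v_1
    suppress(V[0])
    for _ in range(max_steps):
        if len(V) == c:
            break
        cand = int(np.argmax(cp))                 # next candidate center
        nearest = min(V, key=lambda v: D[v, cand])  # nearest existing center v'
        if not reachable(cand, nearest):
            V.append(cand)                        # valid: keep it and suppress around it
            suppress(cand)
        else:
            suppress(nearest)                     # likely same cluster: suppress again by v'
    return V
```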

Figure 1.

Unreachable and reachable cases.

4.2 FCM algorithm integrated with local outlier factor

Equation (3) shows that every object participates equally in updating the cluster centers, regardless of the differences among objects. In fact, each object should be assigned its own weight. As analyzed in Section 2, LOF can describe the deviation of objects, but most existing improved LOFs are still inadequate. For example, Fig. 2 shows two different local distributions of object $x_{0}$. In Fig. 2a and b, $x_{0}$ has the same neighborhood distance and the same variation of neighborhood distance, yet its degree of abnormality clearly differs between the two plots. Therefore, the neighborhood distance and its variation alone are not sufficient to describe the local distribution within the neighborhood. To address this deficiency of the existing rules for calculating the local outlier factor, this paper fully takes the local distribution of an object into account and introduces the module of the intra-cluster k-neighborhood vector's sum to define a new improved local outlier factor, as follows.

Figure 2.

$x_{0}$ and its neighborhood.

Definition 4 (Intra-Cluster k-Neighborhood of Object $x$): Let $N_{k}(x)$ be a subset of ${\bf{X}}$ containing $k$ objects. If every object of $N_{k}(x)$ belongs to the same cluster as $x$ and satisfies Eq. (10), $N_{k}(x)$ is called the intra-cluster k-neighborhood of object $x$.

$\displaystyle\|x-o\|\leqslant\|x-p\|,\ \forall o\in N_{k}(x),\ \forall p\notin N_{k}(x)$ (10)

Definition 5 (Intra-Cluster k-Neighborhood Mean Distance of Object $x$): The intra-cluster k-neighborhood mean distance of object $x$ is defined as:

$\displaystyle N_{\textit{k-avg-dist}}(x)=\frac{1}{k}\sum_{o\in N_{k}(x)}\|x-o\|$ (11)

Definition 6 (Module of Intra-Cluster k-Neighborhood Vector's Sum of Object $x$): Let $N_{k}(x)=\{o_{1},o_{2},\ldots,o_{k}\}$ be the intra-cluster k-neighborhood of object $x$, with k-neighborhood vectors $\{\overrightarrow{xo_{1}},\overrightarrow{xo_{2}},\ldots,\overrightarrow{xo_{k}}\}$. The module (length) of the intra-cluster k-neighborhood vector's sum of object $x$ is defined as:

$\displaystyle N_{\textit{k-vector-m}}(x)=\Big|\sum_{o\in N_{k}(x)}\overrightarrow{xo}\Big|$ (12)

Since a normal object generally has a higher density than an outlier in the same cluster, $N_{\textit{k-vector-m}}(x)$ has the following properties:

(1) If $x$ is a high-density object, its neighbors are close to it and distributed uniformly around it. Thus, $|\overrightarrow{xo_{1}}|\approx|\overrightarrow{xo_{2}}|\approx\ldots\approx|\overrightarrow{xo_{k}}|$ and $\sum_{o\in N_{k}(x)}\overrightarrow{xo}\approx\mathbf{0}$.
(2) Conversely, if $x$ is an outlier, the values of $\{|\overrightarrow{xo_{l}}|\}_{l=1}^{k}$ differ greatly, its neighbors tend to lie on one side of it, and $\sum_{o\in N_{k}(x)}\overrightarrow{xo}$ is a large non-zero vector.

In general, an outlier has a larger $N_{\textit{k-vector-m}}(x)$ than a normal object.
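The two properties of Eq. (12) can be checked on a toy example: when the neighbors of $x$ surround it evenly, the neighborhood vectors cancel, and when they all lie on one side, their sum is large. The function name and the points are illustrative only.

```python
import numpy as np

def vector_sum_module(x, neighbors):
    """Eq. (12): Euclidean length of sum_o (o - x) over the given neighbors of x."""
    x = np.asarray(x, dtype=float)
    return float(np.linalg.norm((np.asarray(neighbors, dtype=float) - x).sum(axis=0)))

ring = [[1, 0], [0, 1], [-1, 0], [0, -1]]     # neighbors spread evenly around x
one_side = [[3, 0], [3, 1], [4, 0], [4, 1]]   # neighbors all on one side of x
print(vector_sum_module([0, 0], ring))        # 0.0
print(vector_sum_module([0, 0], one_side))    # length of (14, 2), about 14.142
```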

Definition 7 (Dispersion Degree of Object $x$): The dispersion degree of object $x$ is defined as:

$\displaystyle N_{\textit{k-diff}}(x)=(N_{\textit{k-vector-m}}(x)+1)\cdot(N_{\textit{k-avg-dist}}(x)+1)$ (13)

Definition 8 (Improved Local Outlier Factor of Object $x$): The improved local outlier factor of object $x$ is defined as:

$\displaystyle N_{\textit{k-outlier}}(x)=\frac{N_{\textit{k-diff}}(x)}{\frac{1}{k-1}\sum_{o\in N_{k}(x),\,o\neq x}N_{\textit{k-diff}}(o)}$ (14)

Definition 9 (Contribution Degree of Object $x$): Since a larger $N_{\textit{k-outlier}}(x)$ indicates a more abnormal object, its reciprocal $1/N_{\textit{k-outlier}}(x)$ can indicate the importance of object $x$. Normalizing it gives the contribution degree of object $x$:

$\displaystyle w(x)=\frac{1/N_{\textit{k-outlier}}(x)}{\sum_{o\in\mathbf{X}}1/N_{\textit{k-outlier}}(o)}$ (15)
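The chain of Eqs. (10) to (15) can be sketched as follows, given current cluster labels. Two conventions are our own assumptions rather than statements from the text: $N_k(x)$ is taken to contain $x$ itself plus its $k-1$ nearest objects of the same cluster (consistent with $\|x-x\|=0$ in Eq. (10) and the $k-1$ in Eq. (14)), and a cluster with fewer than $k$ members simply uses all of them. The function name is hypothetical.

```python
import numpy as np

def contribution_degree(X, labels, k):
    """Eqs. (10)-(15) sketch: normalized reciprocal of the improved local outlier factor."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    others, diff = [], np.empty(n)
    for j in range(n):
        # same-cluster objects other than x_j, ordered by distance (Eq. (10))
        same = np.flatnonzero((labels == labels[j]) & (np.arange(n) != j))
        nb = same[np.argsort(D[j, same], kind="stable")[: k - 1]]
        others.append(nb)
        Nk = np.concatenate(([j], nb))                       # N_k(x): x plus k-1 neighbors
        avg_dist = D[j, Nk].mean()                           # Eq. (11)
        vec_m = np.linalg.norm((X[Nk] - X[j]).sum(axis=0))   # Eq. (12)
        diff[j] = (vec_m + 1.0) * (avg_dist + 1.0)           # Eq. (13)
    outlier = np.array([diff[j] / diff[nb].mean() if len(nb) else 1.0
                        for j, nb in enumerate(others)])     # Eq. (14)
    inv = 1.0 / outlier
    return inv / inv.sum()                                   # Eq. (15)
```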

Incorporating the contribution degree into the objective function of FCM gives the objective function of FCM-iLOF:

$\displaystyle J(\mathbf{U},\mathbf{V},\mathbf{W})=\sum_{i=1}^{c}\sum_{j=1}^{n}u_{ij}^{m}w_{j}\|x_{j}-v_{i}\|^{2}$ (16) $\displaystyle\text{s.t.}\ \sum_{i=1}^{c}u_{ij}=1,\ u_{ij}\in[0,1]$

where $w_{j}=w(x_{j})$ is the contribution degree of object $x_{j}$. Using the Lagrange multiplier method, the constrained problem is transformed into the following unconstrained function:

$\displaystyle J(\mathbf{U},\mathbf{V},\mathbf{W})=\sum_{i=1}^{c}\sum_{j=1}^{n}u_{ij}^{m}w_{j}\|x_{j}-v_{i}\|^{2}+\sum_{j=1}^{n}\lambda_{j}\Big(\sum_{i=1}^{c}u_{ij}-1\Big)$ (17)

To obtain the estimators of ${\bf{U}}$ and ${\bf{V}}$, the derivatives of $J(\mathbf{U},\mathbf{V},\mathbf{W})$ with respect to $u_{ij}$ and $v_{i}$ are set equal to zero.

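Since $w_{j}$ is the same for every cluster $i$, it cancels when the constraint $\sum_{i}u_{ij}=1$ is applied, so $u_{ij}$ takes the same form as in Eq. (2):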

$\displaystyle u_{ij}=\frac{(\|x_{j}-v_{i}\|^{2})^{-\frac{1}{m-1}}}{\sum_{k=1}^{c}(\|x_{j}-v_{k}\|^{2})^{-\frac{1}{m-1}}}$ (18)

The ${v_{i}}$ is obtained as:

$\displaystyle v_{i}=\frac{\sum_{j=1}^{n}u_{ij}^{m}w_{j}x_{j}}{\sum_{j=1}^{n}u_{ij}^{m}w_{j}}$ (19)
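For completeness, the stationarity conditions of Eq. (17) that lead to Eqs. (18) and (19) can be written out as below, assuming $x_j\neq v_i$ so that all distances are positive. Note that $\lambda_j<0$ at the optimum, so $-\lambda_j/(m w_j)>0$, and that $w_j$ drops out of $u_{ij}$ but not out of $v_i$.

```latex
\begin{aligned}
\frac{\partial J}{\partial u_{ij}} &= m\,u_{ij}^{m-1}w_{j}\|x_{j}-v_{i}\|^{2}+\lambda_{j}=0
\;\Rightarrow\;
u_{ij}=\Big(\frac{-\lambda_{j}}{m\,w_{j}}\Big)^{\frac{1}{m-1}}\big(\|x_{j}-v_{i}\|^{2}\big)^{-\frac{1}{m-1}},\\
\sum_{i=1}^{c}u_{ij}=1 &\;\Rightarrow\;
\Big(\frac{-\lambda_{j}}{m\,w_{j}}\Big)^{\frac{1}{m-1}}
=\frac{1}{\sum_{k=1}^{c}\big(\|x_{j}-v_{k}\|^{2}\big)^{-\frac{1}{m-1}}}
\;\Rightarrow\;\text{Eq. (18)},\\
\frac{\partial J}{\partial v_{i}} &= -2\sum_{j=1}^{n}u_{ij}^{m}w_{j}(x_{j}-v_{i})=0
\;\Rightarrow\;
v_{i}=\frac{\sum_{j=1}^{n}u_{ij}^{m}w_{j}x_{j}}{\sum_{j=1}^{n}u_{ij}^{m}w_{j}}\quad\text{(Eq. (19))}.
\end{aligned}
```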

The membership matrix ${\bf{U}}$ and the cluster centers ${\bf{V}}$ are updated iteratively to minimize the objective function, as summarized in Algorithm 2.

Algorithm 2: FCM-iLOF
Input: dataset ${\bf{X}}$, initial cluster centers ${\bf{V}}$, the number of neighbors $k$
Output: labels of objects ${\bf{label}}$

Initialize parameters $\delta$, $m$, and $T$;
repeat
  Update the membership matrix ${\bf{U}}$ using Eq. (18);
  Calculate the contribution degrees ${\bf{W}}$ by Eq. (15);
  Update the cluster centers ${\bf{V}}$ by Eq. (19);
until the variation of each center is less than $\delta$ or the number of iterations exceeds $T$
Obtain the labels ${\bf{label}}=\{\operatorname*{argmax}_{i}u_{ij}\}_{j=1}^{n}$;
return ${\bf{label}}$
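A compact sketch of Algorithm 2 follows. The text does not say which partition defines the intra-cluster neighborhoods during the iterations, so this sketch assumes $\mathbf{W}$ is recomputed at each iteration from the hard labels $\operatorname{argmax}_i u_{ij}$, and that $N_k(x)$ holds $x$ plus its $k-1$ nearest same-label objects. The names `fcm_ilof` and `_contribution` and the default values of `m`, `delta` and `T` are assumptions.

```python
import numpy as np

def _contribution(X, D, labels, k):
    """Contribution degrees w (Eqs. (11)-(15)); N_k(x) = x plus its k-1 nearest same-label objects."""
    n = len(X)
    nbrs, diff = [], np.empty(n)
    for j in range(n):
        same = np.flatnonzero((labels == labels[j]) & (np.arange(n) != j))
        nb = same[np.argsort(D[j, same], kind="stable")[: k - 1]]
        nbrs.append(nb)
        avg = D[j, nb].sum() / (len(nb) + 1)               # Eq. (11); x itself adds distance 0
        vec = np.linalg.norm((X[nb] - X[j]).sum(axis=0))   # Eq. (12)
        diff[j] = (vec + 1.0) * (avg + 1.0)                # Eq. (13)
    lof = np.array([diff[j] / diff[nb].mean() if len(nb) else 1.0
                    for j, nb in enumerate(nbrs)])         # Eq. (14)
    inv = 1.0 / lof
    return inv / inv.sum()                                 # Eq. (15)

def fcm_ilof(X, V0, k=10, m=2.0, delta=1e-6, T=300):
    """Sketch of Algorithm 2; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    V = np.asarray(V0, dtype=float).copy()
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))  # pairwise distances
    for _ in range(T):
        d2 = np.fmax(((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2), 1e-12)
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)           # Eq. (18)
        w = _contribution(X, D, U.argmax(axis=0), k)       # Eq. (15) from current hard labels
        Uw = (U ** m) * w[None, :]
        V_new = (Uw @ X) / Uw.sum(axis=1, keepdims=True)   # Eq. (19)
        converged = np.abs(V_new - V).max() < delta        # largest coordinate change below delta
        V = V_new
        if converged:
            break
    return U.argmax(axis=0), V
```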

5. Evaluation

To verify the effectiveness of the proposed algorithms, a series of experiments were performed on synthetic datasets and on real-world datasets from the UCI Machine Learning Repository [29]. The comparison algorithms are DP-K-Means [10], INCK [11], FCM [5], DSFCM [12], LOFKMeans [22] and KMOR [28]. All experiments were run on Windows 10 Pro for Workstations with an Intel(R) Xeon(R) Silver 4114 CPU at 2.19 GHz and 64 GB RAM.

Table 1
Summary of synthetic datasets

Group	Centers	$n$ (1st)	$\sigma$ (1st)	$n$ (2nd)	$\sigma$ (2nd)	$n$ (3rd)	$\sigma$ (3rd)
1	(0, 0), (7, 7), (6, $-3$)	50	1.0	50	1.0	50	1.0
2	(0, 0), (7, 7), (6, $-3$)	50	1.0	60	1.2	50	1.1
3	(0, 0), (7, 7), (6, $-3$)	50	1.0	70	1.4	50	1.2
4	(0, 0), (7, 7), (6, $-3$)	50	1.0	80	1.6	50	1.3
5	(0, 0), (7, 7), (6, $-3$)	50	1.0	90	1.8	50	1.4
6	(0, 0), (7, 7), (6, $-3$)	50	1.0	100	2.0	50	1.5
7	(0, 0), (7, 7), (6, $-3$)	50	1.0	110	2.2	50	1.6
8	(0, 0), (7, 7), (6, $-3$)	50	1.0	120	2.4	50	1.7
9	(0, 0), (7, 7), (6, $-3$)	50	1.0	130	2.6	50	1.8
10	(0, 0), (7, 7), (6, $-3$)	50	1.0	140	2.8	50	1.9

5.1 Synthetic dataset

Table 1 lists the parameters of the synthetic datasets, each of which has three clusters. There are 10 groups of parameters, and each group is used to generate 10 datasets, giving 100 datasets in total. In Table 1, $n$ and $\sigma$ denote the number of samples and the standard deviation of each cluster, respectively.
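The 100 synthetic datasets of Table 1 could be regenerated roughly as below. The text does not state the sampling distribution, so isotropic Gaussian clusters, the pairing of the listed centers with the 1st, 2nd and 3rd clusters in order, the function name and the random seed are all assumptions.

```python
import numpy as np

CENTERS = np.array([[0.0, 0.0], [7.0, 7.0], [6.0, -3.0]])

def make_group(g, rng):
    """One dataset from parameter group g (1..10) of Table 1 (assumed isotropic Gaussians)."""
    sizes = [50, 50 + 10 * (g - 1), 50]                     # n of the 1st, 2nd, 3rd cluster
    sigmas = [1.0, 1.0 + 0.2 * (g - 1), 1.0 + 0.1 * (g - 1)]  # sigma of each cluster
    X = np.vstack([rng.normal(c, s, size=(n, 2))
                   for c, n, s in zip(CENTERS, sizes, sigmas)])
    y = np.repeat(np.arange(3), sizes)                      # ground-truth cluster labels
    return X, y

rng = np.random.default_rng(0)                              # arbitrary seed
datasets = [make_group(g, rng) for g in range(1, 11) for _ in range(10)]  # 100 datasets
```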

5.2 Evaluation of initial cluster centers

The effectiveness of PSM is verified on the synthetic datasets, whose real centers are known. The evaluation index is the mean error $E$ between the initial cluster centers and the real cluster centers, defined in Eq. (20). Figure 3 shows box plots of $E$ for DP-K-means, INCK and PSM on the 100 synthetic datasets. To show the initial cluster centers obtained by the three algorithms intuitively, Fig. 4 plots them for one selected dataset. In this experiment, the parameter $\alpha$ of DP-K-means is set to 0.15, the parameter $\lambda$ of INCK is set to 8, and the number of clusters is set to 3 for all three algorithms.

$\displaystyle E=\frac{1}{c}\sum_{i=1}^{c}\|v\_\textit{init}_{i}-v\_\textit{real}_{i}\|$ (20)

where $v\_\textit{init}_{i}$ is the $i^{\text{th}}$ initial cluster center and $v\_\textit{real}_{i}$ is the $i^{\text{th}}$ real cluster center.
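Eq. (20) presumes that each initial center is paired with a real center, but the pairing rule is not stated. The sketch below assumes the pairing that minimizes $E$, found by brute force over all permutations, which is cheap for $c=3$; the function name is hypothetical.

```python
import itertools
import numpy as np

def initial_center_error(V_init, V_real):
    """Eq. (20) under the assumed best one-to-one pairing of initial and real centers."""
    V_init = np.asarray(V_init, dtype=float)
    V_real = np.asarray(V_real, dtype=float)
    c = len(V_real)
    return min(
        np.linalg.norm(V_init[list(p)] - V_real, axis=1).mean()  # mean of ||v_init - v_real||
        for p in itertools.permutations(range(c))
    )
```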

Figure 3.

Error of initial centers.

Figure 4.

Initial centers obtained by different algorithms.

Fig. 3 shows that PSM achieves the smallest and most stable error $E$ among the three algorithms. As Fig. 4 and the discussion in Section 2 show, the initial centers obtained by DP-K-means may fall into the same cluster because the neighborhood radius is hard to determine. Although INCK overcomes this shortcoming, its initial centers are offset from the real centers. In contrast, the initial centers obtained by PSM fall correctly into each cluster and are close to the real centers, which can raise the probability of converging to the global optimum.

5.3 Evaluation of the anti-noise ability

To control the local distribution of objects easily, the two synthetic datasets in Fig. 2 are used to illustrate the effectiveness of the proposed contribution degree. Fig. 5 compares the contribution degrees derived from LOF and LDC with that of Eq. (15). Because LOF and LDC describe abnormality, their reciprocals are used to quantify the contribution degree. As Fig. 2 shows, $x_{0}$ has the lowest deviation in Fig. 2a and the highest deviation in Fig. 2b, so it should have the highest contribution degree in Fig. 2a and the lowest in Fig. 2b. Fig. 5 shows that LOF fails on the dataset in Fig. 2a and that LDC can only describe the differences among objects in Fig. 2b, whereas the proposed factor accurately distinguishes the objects in both datasets.

Figure 5.

Contribution degree of each object in Fig. 2 calculated by the reciprocal of different local outlier factors.

Next, we take 10 datasets generated with the first group of parameters in Table 1 and add a few outliers to each cluster to evaluate the anti-noise ability of different algorithms. For convenience, these 10 datasets with added outliers are called DSO. The outliers added to each cluster, which are far from the cluster centers, are listed in Table 2. One dataset from DSO is shown in Fig. 6: the artificially added outliers are clearly far from their clusters and can therefore be used to test anti-noise ability. FCM-iLOF and four comparison algorithms are executed on DSO, and the resulting intra-cluster distances, defined in Eq. (21), are shown in Table 3. As Eq. (19) shows, FCM-iLOF assigns a small weight to each outlier to weaken its influence on the center update, so that the centers stay closer to the real centers when the algorithm converges. Therefore, a smaller intra-cluster distance indicates better anti-noise ability. Note that each of the four comparison algorithms is executed 100 times, and each entry in Table 3 is the mean intra-cluster distance over the 100 clustering results. In this experiment, we set $k=10$ for LOFKMeans, $\gamma=3$ and $n_{0}=0.1n$ for KMOR, and $k=10$ for FCM-iLOF. The number of clusters is set to 3.

$\displaystyle\textit{ICD}=\frac{1}{c}\sum_{i=1}^{c}\sum_{x\in C_{i}}\|x-v\_\textit{con}_{i}\|$ (21)

where $C_{i}$ denotes the $i^{\text{th}}$ cluster and $v\_\textit{con}_{i}$ denotes the $i^{\rm{th}}$ cluster center when the algorithm converges.
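Eq. (21) translates directly into a few lines. The sketch below implements it exactly as printed (a per-cluster sum of distances to the converged center, averaged over the $c$ clusters), with hard labels assumed to index the rows of `V`; the function name is hypothetical.

```python
import numpy as np

def intra_cluster_distance(X, labels, V):
    """Eq. (21): (1/c) * sum_i sum_{x in C_i} ||x - v_con_i||."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    V = np.asarray(V, dtype=float)
    total = sum(np.linalg.norm(X[labels == i] - V[i], axis=1).sum()  # empty cluster adds 0
                for i in range(len(V)))
    return total / len(V)
```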

Table 2

Added outliers artificially

The 1st cluster	The 2nd cluster	The 3rd cluster
($-4$, $-4$)	(10, $-6$)	(12, 6)
($-2$, $-6$)	(12, $-4$)	(12, 8)
($-4$, $-2$)	(10, $-8$)	(14, 4)
($-6$, $-2$)	(12, $-6$)	(14, 7)
($-2$, $-4$)	(14, $-4$)	(14, 10)

Table 3

ICD of different clustering algorithms on DSO

Group	FCM	DSFCM	LOFKMeans	KMOR	FCM-iLOF
1	1.8295	1.7463	1.7656	1.9404	1.7449
2	1.6893	1.6874	1.6879	1.9398	1.6819
3	1.7248	1.7152	1.7179	1.9415	1.7129
4	1.6730	1.6720	1.6733	1.8303	1.6659
5	1.7788	1.7780	1.7857	1.9414	1.7761
6	1.7912	1.6265	1.6269	1.8275	1.6167
7	1.6463	1.6402	1.6475	1.8699	1.6377
8	1.7761	1.7674	1.7757	1.9923	1.7684
9	1.7325	1.7247	1.7249	1.8809	1.7206
10	1.7040	1.8086	1.7284	1.9604	1.7026

Figure 6.

A dataset with artificially added noise.

Table 3 lists the intra-cluster distance of each algorithm on DSO, with the best result in each group highlighted in bold. Although DSFCM has the smallest intra-cluster distance in the $8^{\rm{th}}$ group, FCM-iLOF performs best in the other 9 groups, and its intra-cluster distance is consistently small. Overall, FCM-iLOF shows better anti-noise ability than the four comparison algorithms.

5.4 Evaluation of clustering performance

In this subsection, we choose five real-world datasets from the UCI Machine Learning Repository [29], namely Iris, Seeds, Wine, Glass and New Thyroid, to evaluate the performance of the five algorithms. Detailed descriptions of the datasets are given in Table 4. These datasets are often used for clustering and outlier detection [32, 33, 34, 35, 36]. The Rand Index ($RI$) [30] is a commonly used measure for evaluating clustering performance, and it is calculated as shown in Eq. (22). $RI$ measures the degree of agreement between the ground-truth partition and the predicted partition obtained by the algorithm [31]. Its value ranges from 0 to 1, and a greater value indicates better clustering performance. The FCM, DSFCM, LOFKMeans, KMOR and FCM-iLOF algorithms are executed on the real-world datasets, and the resulting $RI$ values are shown in Table 5. The parameter settings are the same as in Section 5.3, and the number of clusters is set according to Table 4.

$\displaystyle RI=\frac{{A+B}}{{C_{n}^{2}}}$ (22)

where $A$ denotes the number of pairs of objects that have different ground-truth labels and are assigned to different clusters, $B$ denotes the number of pairs of objects that have the same ground-truth label and are assigned to the same cluster, and $C_{n}^{2}=n(n-1)/2$ is the total number of object pairs.
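Eq. (22) can be evaluated by direct pair counting, as in the following Python sketch; the function name `rand_index` and the toy label vectors are hypothetical.

```python
from itertools import combinations

def rand_index(truth, pred):
    """RI = (A + B) / C(n, 2) from Eq. (22), by checking every pair of objects."""
    n = len(truth)
    agree = 0
    for i, j in combinations(range(n), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        # A pair contributes to A (different label, different cluster)
        # or to B (same label, same cluster) when the two partitions agree on it.
        if same_truth == same_pred:
            agree += 1
    return agree / (n * (n - 1) / 2)

perfect = rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # 1.0: cluster ids are only permuted
partial = rand_index([0, 0, 1, 1], [0, 0, 0, 1])  # 0.5: 3 of the 6 pairs agree
```

Because only agreement on pairs matters, $RI$ is invariant to renaming the cluster labels, as the first call shows. Recent scikit-learn releases also provide `sklearn.metrics.rand_score`, which can serve as a cross-check.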

Table 4

Descriptions of real-world datasets

Name	Instances	Dimension	Clusters	Missing values
Iris	150	4	3	No
Seeds	210	7	3	No
Wine	178	13	3	No
Glass	214	9	6	No
New Thyroid	215	5	3	No

Table 5

$RI$ of different clustering algorithms on real-world datasets

Dataset	FCM	DSFCM	LOFKMeans	KMOR	FCM-iLOF
Iris	0.7976	0.7587	0.8010	0.7584	0.8368
Seeds	0.7275	0.7225	0.8994	0.8997	0.8941
Wine	0.6643	0.6667	0.9457	0.9482	0.9543
Glass	0.5430	0.6301	0.5923	0.6507	0.6814
New Thyroid	0.5481	0.4972	0.8112	0.6498	0.8329

To reduce the influence of randomly chosen initial centers on FCM, DSFCM, LOFKMeans and KMOR, each of these four comparison algorithms is executed 100 times, and the mean $RI$ is taken as the final evaluation index. Table 5 shows that FCM-iLOF achieves the highest $RI$ on the Iris, Wine, Glass and New Thyroid datasets. On the Seeds dataset, its $RI$ (0.8941) is slightly lower than those of KMOR (0.8997) and LOFKMeans (0.8994), although the gap is small. Overall, therefore, the FCM-iLOF algorithm performs better than the other four algorithms.
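The ranking in Table 5 can likewise be checked from the transcribed values. The snippet below reports, for each dataset, the algorithm with the highest $RI$ and the rank of FCM-iLOF (1 = highest).

```python
algorithms = ["FCM", "DSFCM", "LOFKMeans", "KMOR", "FCM-iLOF"]

# RI values transcribed from Table 5.
table5 = {
    "Iris":        [0.7976, 0.7587, 0.8010, 0.7584, 0.8368],
    "Seeds":       [0.7275, 0.7225, 0.8994, 0.8997, 0.8941],
    "Wine":        [0.6643, 0.6667, 0.9457, 0.9482, 0.9543],
    "Glass":       [0.5430, 0.6301, 0.5923, 0.6507, 0.6814],
    "New Thyroid": [0.5481, 0.4972, 0.8112, 0.6498, 0.8329],
}

# Best algorithm per dataset, and the rank of FCM-iLOF (last column) by descending RI.
best = {name: algorithms[row.index(max(row))] for name, row in table5.items()}
rank = {name: sorted(row, reverse=True).index(row[-1]) + 1 for name, row in table5.items()}
print(best["Seeds"], rank["Seeds"])  # KMOR 3
```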

6. Conclusions

In this study, we propose an adaptive FCM algorithm integrated with local outlier factor.

(1)

The probability of an object becoming an initial center is defined by its local density, and the initial centers are obtained incrementally by combining this cluster-center probability with a probability suppression function.

(2)

A new improved local outlier factor is reconstructed by defining the intra-cluster k-neighborhood mean distance and the modulus of the sum of the intra-cluster k-neighborhood vectors, and it is integrated into FCM to improve the iterative formula for the cluster centers.

(3)

The effectiveness of the proposed algorithm is verified on synthetic and real-world datasets. The initial centers obtained by PSM are close to the real centers, which increases the likelihood that the algorithm converges to the global optimum. The reconstructed local outlier factor not only describes an object's degree of deviation effectively but also improves the anti-noise performance of FCM.

(4)

In addition, if the intra-cluster k-neighborhood in Definition 4 is replaced with the ordinary k-neighborhood, the local outlier factor reconstructed in this study can be applied directly to outlier detection.
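Item (2) names the two quantities from which the improved local outlier factor is built. The Python sketch below computes only these ingredients for each object, assuming crisp cluster membership (for example, maximum membership), Euclidean distances, and k-neighborhood vectors that point from the object to its neighbors; how Definition 4 combines them into the factor is not reproduced here. The function name `intra_cluster_knn_stats` and the toy data are hypothetical.

```python
import numpy as np

def intra_cluster_knn_stats(X, labels, k):
    """For each object x_j, return (a) the mean distance to its k nearest
    neighbours inside its own cluster and (b) the modulus (norm) of the sum
    of the vectors from x_j to those neighbours."""
    n = X.shape[0]
    mean_dist = np.zeros(n)
    vec_norm = np.zeros(n)
    idx = np.arange(n)
    for j in range(n):
        # Cluster mates of x_j (same crisp label), excluding x_j itself.
        mates = idx[(labels == labels[j]) & (idx != j)]
        if mates.size == 0:
            continue  # singleton cluster: both quantities are left at 0
        diffs = X[mates] - X[j]              # vectors from x_j to its cluster mates
        dists = np.linalg.norm(diffs, axis=1)
        nn = np.argsort(dists)[:k]           # k nearest mates (fewer if the cluster is small)
        mean_dist[j] = dists[nn].mean()
        vec_norm[j] = np.linalg.norm(diffs[nn].sum(axis=0))
    return mean_dist, vec_norm

# Hypothetical cluster: a symmetric "plus" shape around the origin and one point at (4, 0).
X = np.array([[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [4.0, 0.0]])
labels = np.zeros(6, dtype=int)
mean_dist, vec_norm = intra_cluster_knn_stats(X, labels, k=4)
```

For the object at the origin the four neighbor vectors cancel, so the modulus is 0 and the mean distance is 1. For the object at (4, 0) all neighbors lie on one side, so the modulus is 15 and the mean distance is about 3.81, which is the kind of asymmetry the improved factor is meant to capture.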

A limitation of this study is that the number of clusters cannot be determined adaptively and must still be specified manually. In future work, we will strengthen the theoretical proof and analysis and apply the proposed approach to engineering problems.


Acknowledgments

This work was supported by the Key Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-K201900505), Chongqing University Innovation Research Group Funding (CXQT20015), and Innovation Project of Chongqing Normal University (YKC20032).

References

1. Nie, Wang, Jordan and Huang, The constrained laplacian rank algorithm for graph-based clustering, in: Proceedings of the AAAI Conference on Artificial Intelligence, 30, AAAI, Arizona, 2016, pp. 1969–1976.

2. Kouhi, Seyedarabi and Aghagolzadeh, Robust FCM clustering algorithm with combined spatial constraint and membership matrix local information for brain MRI segmentation, Expert Systems with Applications 146 (2020).

3. Memon K.H. and Lee D.H., Generalised kernel weighted fuzzy c means clustering algorithm with local information, Fuzzy Sets and Systems 340 (2018), 91–108.

4. Nie, Ding, Luo and Huang, Improved minmax cut graph clustering with nonnegative relaxation, in: Machine Learning and Knowledge Discovery in Databases, Springer, Berlin, 2010, pp. 451–466.

5. Bezdek J.C., Pattern recognition with fuzzy objective function algorithms, Advanced Applications in Pattern Recognition 22(1171) (1981), 203–239.

6. Bezdek J.C., A convergence theorem for the fuzzy ISODATA clustering algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-2(1) (1980), 1–8.

7. Cai, Yang, Zhang and Zhao, A novel algorithm for initial cluster center selection, IEEE Access 7 (2019), 74683–74693.

8. Broin P.Ó., Smith T.J. and Golden A.A., Alignment-free clustering of transcription factor binding motifs using a genetic-k-medoids approach, BMC Bioinformatics 16(1) (2015), 1–12.

9. Zhou S.B., W.X. and Chai, Data-weighted fuzzy C-means clustering algorithm, Systems Engineering and Electronics 36(11) (2014), 2314–2319.

10. Zhu and Ma, An effective partitional clustering algorithm based on new clustering validity index, Applied Soft Computing 71 (2018), 608–621.

11. Liu, Guo and Liu, An improved k-medoids algorithm based on step increasing and optimizing medoids, Expert Systems with Applications 92 (2018), 464–473.

12. Zhou S.B., W.X. and Xu L.K., Improved FCM algorithm based on density peaks and spatial neighborhood information, Chinese Journal of Scientific Instrument 40(4) (2019), 137–144.

13. Naik, Satapathy S.C. and Parvathi, Improvement of initial cluster center of c-means using teaching learning based optimization, Procedia Technology 6 (2012), 428–435.

14. Liu and Wang X.Y., K mean cluster algorithm with refined initial center point, Journal of Shenyang Normal University (Natural Science Edition) 27(4) (2009), 448–450.

15. Feng, Hao, Chen, Jin and Zhao, An improved PAM algorithm for optimizing initial cluster center, in: 2012 IEEE International Conference on Computer Science and Automation Engineering, IEEE, 2012, pp. 24–27.

16. Singh R.P. and Rajpoot D.S., Efficient identification of initial clusters centers for partitioning clustering methods, in: 2019 Fifth International Conference on Image Information Processing (ICIIP), IEEE, 2019, pp. 131–136.

17. Yang M.S. and Lee E.S., Sample-weighted clustering methods, Computers & Mathematics with Applications 62(5) (2011), 2200–2208.

18. Rodriguez and Laio, Clustering by fast search and find of density peaks, Science 344(6191) (2014), 1492–1496.

19. Pimentel B.A. and de Souza R.M., Multivariate fuzzy c-means algorithms with weighting, Neurocomputing 174 (2016), 946–965.

20. Breunig M.M., Kriegel H.P., R.T. and Sander, LOF: Identifying density-based local outliers, in: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, New York, 2000, pp. 93–104.

21. Zhang D.D., You Z.Y. and Chen S.G., Optimal clustering algorithm based on modified local outlier factor detection, Microelectronics & Computer 36(11) (2019), 43–48.

22. Muhimal R.R., Kurniawan and Pambudi O.T., A LOF k-means clustering on hotspot data, International Journal of Artificial Intelligence & Robotics 2(1) (2020), 29–33.

23. Xiao, Ruan, Wang and Xu, An efficient density-based local outlier detection approach for scattered data, IEEE Access 7 (2019), 1006–1020.

24. Yang, Wang, Wei and Li, An outlier detection approach based on improved self-organizing feature map clustering algorithm, IEEE Access 7 (2019), 115914–115925.

25. Schubert, Zimek and Kriegel H.P., Generalized outlier detection with flexible kernel density estimates, in: Proceedings of the 2014 SIAM International Conference on Data Mining (SDM), SIAM, 2014, pp. 542–550.

26. Zhang, Yin and Huang, An optimized LOF algorithm based on tree structure, in: 2020 3rd International Conference on Artificial Intelligence and Big Data (ICAIBD), IEEE, 2020, pp. 167–171.

27. Zhang, Liu, Cui, Sun, Yang and Guo, An outlier detection algorithm for electric power data based on DBSCAN and LOF, in: Proceedings of the 9th International Conference on Computer Engineering and Networks, Springer, Singapore, 2021, pp. 1097–1106.

28. Gan and Ng M.K.P., k-means clustering with outlier removal, Pattern Recognition Letters 90 (2017), 8–14.

29. Dua and Graff, UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and Computer Science, 2019.

30. Rand W.M., Objective criteria for the evaluation of clustering methods, Journal of the American Statistical Association 66(336) (1971), 846–850.

31. Ouadfel and Abd Elaziz, A multi-objective gradient optimizer approach-based weighted multi-view clustering, Engineering Applications of Artificial Intelligence 106 (2021).

32. Tax D.M.J. and Duin R.P.W., Support vector domain description, Pattern Recognition Letters 20 (1999), 1191–1199.

33. Hasan M.A., Chaoji, Salem and Zaki M.J., Robust partitional clustering by outlier and density insensitive seeding, Pattern Recognition Letters 30(11) (2009), 994–1002.

34. Sethi J.R., Study of distance-based outlier detection methods, 2013.

35. Gupta, Eswaran, Shah, Akoglu and Faloutsos, Beyond outlier detection: LookOut for pictorial explanation, Machine Learning and Knowledge Discovery in Databases (2019), 122–138.

36. Kokkula and Musti N.M., Classification and outlier detection based on topic based pattern synthesis, Machine Learning and Data Mining in Pattern Recognition (2013), 99–114.
