Incorporating spatial context into fuzzy-possibilistic clustering using Bayesian inference

Abstract

Data clustering is the generic process of splitting a set of datums into a number of homogeneous sets. Although a clustering process inputs datums as a set of separate mathematical objects, these entities are in fact correlated within a spatial context specific to the problem class at hand. For example, when the data acquisition process yields a 2D matrix of regularly sampled measurements, as is the case with image sensors which utilize different modalities, adjacent datums are highly correlated. Hence, the clustering process must take into consideration the spatial context of the datums. A review of the literature, however, reveals that a significant majority of the well-established clustering techniques ignore spatial context. Other approaches, which do consider spatial context, either utilize pre- or post-processing operations or engineer into the cost function one or more regularization terms which reward spatial contiguity. We argue that employing cost functions and constraints based on heuristics and intuition is a hazardous approach from an epistemological perspective. This is in addition to the other shortcomings of those approaches. Instead, in this paper, we apply Bayesian inference to the clustering problem and construct a mathematical model for data clustering which is aware of the spatial context of the datums. This model utilizes a robust loss function and is independent of the notion of homogeneity relevant to any particular problem class. We then provide a solution strategy and assess experimental results generated by the proposed method in comparison with the literature and from the perspective of computational complexity and spatial contiguity.

Keywords

Fuzzy clustering, Bayesian modeling, robust clustering, correlated clustering, spatial context, bilateral clustering

1 Introduction

Unsupervised grouping of datums into homogeneous clusters is an important prerequisite in a vast number of signal and image processing applications. In essence, these applications require a stable process which inputs a set of datums, of a particular mathematical model, and divides them into an appropriate number of clusters, where the clusters comply with a particular notion of homogeneity. Additionally, this process is required to satisfy conditions such as being robust, allowing for datums with different priorities, and detecting outliers. Moreover, it is an important advantage for such a process to be independent of any particular datum and cluster model. When this condition is satisfied, reuse of the clustering module becomes possible and time and effort can be saved. Also, from the point of view of practical usability, a clustering process which depends on a few parameters with perceptual definitions is highly preferred to an alternative which requires the deliberate adjustment of a large set of incomprehensible configuration variables.

Clustering algorithms are historically defined as operators which function on sets. It is important, however, to emphasize that the input set of datums to any clustering algorithm is inherently ordered. In other words, datums generally correspond to a physical phenomenon which is spatially correlated. This is important because a significant majority of the algorithms in the literature are indifferent to the order of the datums. Nevertheless, the spatial context of the datums is important because we generally expect spatial contiguity in the classification results. Additionally, the spatial context of a datum is an important component in clustering it and provides helpful clues when noise and cluster combination produce ambiguity for the clustering process.

In practice, the dominant approach for delivering a novel clustering algorithm is to translate a verbal description of the purpose of clustering into an objective function, one or more constraints, and a potential set of pruning procedures. In this framework, it is essentially the intuition of the researchers which drives the development of new clustering algorithms. As such, the inclusion of a new regularization term or a novel weight component in the objective function is commonly justified through verbal descriptions which are based on heuristics and metaphors. In other words, the development of fuzzy clustering algorithms is often output-based, where experimental results provide the “proof” that the heuristics of the researchers have been correct. Comparison of different fuzzy clustering algorithms, too, is often carried out based on the outputs of the respective formulations.

In this paper, we utilize Bayesian inference and directly derive the mathematical model for loss in a fuzzy-possibilistic clustering problem when spatial context is taken into consideration. This loss model is independent of any particular datum or cluster models and utilizes a generic notion of homogeneity. This model also utilizes a robust loss function and employs the concept of the relevance of each datum to the set. Moreover, the loss model developed in this paper is independent of any parameter which might have to be trained or tuned empirically or through repetition of any process. Hence, it is important to emphasize that the process utilized in this work develops the objective function organically and is, therefore, characteristically different from previous works in the field which are based on heuristics.

The rest of this paper is organized as follows. First, in Section 2, we review the related literature and then, in Section 3, we provide the developed method. Subsequently, in Section 4, we present experimental results produced by the developed method on three different problem classes and, in Section 5, we provide the concluding remarks.

2 Literature review

2.1 Notion of membership

The notion of membership is a key point of distinction between different clustering frameworks. Essentially, membership may be Hard or Fuzzy. Within the context of hard membership, each datum belongs to one cluster and is different from all other clusters. The fuzzy membership regime, however, maintains that each datum in fact belongs to all clusters, with the stipulation that the degree of membership to different clusters is different. K-means [104] and Hard C-means (HCM) [55] clustering algorithms, for example, utilize hard membership values. The reader is referred to [74] and the references therein for a history of K-means clustering and other methods closely related to it. Iterative Self-Organizing Data Clustering (ISODATA) [7] is a hard clustering algorithm as well.

With the introduction of Fuzzy Theory [161], many researchers incorporated this more natural notion into clustering algorithms [8, 155]. The premise for employing a fuzzy clustering algorithm is that fuzzy membership is more applicable in practical settings, where, generally, no distinct line of separation is present between the clusters. Additionally, from a practical perspective, it is observed that hard clustering techniques are far more prone to falling into local minima [40]. The reader is referred to [16, 18] for the wide array of fuzzy clustering methods developed in the past few decades.

Initial work on fuzzy clustering was done by Ruspini [136] and Dunn [41] and it was then generalized by Bezdek [16] into Fuzzy C-means (FCM). In FCM, datums, which are denoted as x₁, ⋯ , x_N, belong to $ℝ^{k}$ and clusters, which are identified as ψ₁, ⋯ , ψ_C, are represented as points in $ℝ^{k}$ . FCM makes the assumption that the number of clusters, C, is known through a separate process or expert opinion and minimizes the following objective function,

$Δ = \sum_{c = 1}^{C} \sum_{n = 1}^{N} f_{nc}^{m} {∥ x_{n} - ψ_{c} ∥}^{2} .$ (1)

This objective function is heuristically suggested to result in appropriate clustering results and is constrained by,

$\sum_{c = 1}^{C} f_{nc} = 1, \forall n .$ (2)

Here, f_nc ∈ [0, 1] denotes the membership of x_n to ψ_c.
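
As an illustration of (1) and (2), the following minimal Python sketch evaluates the FCM objective and performs alternating updates for point prototypes under the Euclidean distance. The closed-form updates are the standard ones obtained from (1) under (2) through Lagrange multipliers, with m being the exponent which appears in (1). The function names, the small clamp `eps`, and the toy data are assumptions made for this sketch and are not taken from the cited works.

```python
def sq_dist(x, psi):
    """Squared Euclidean distance between a datum and a prototype."""
    return sum((a - b) ** 2 for a, b in zip(x, psi))


def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Memberships minimizing (1) under (2) for fixed prototypes."""
    power = -1.0 / (m - 1.0)
    F = []
    for x in X:
        # eps guards against a datum which coincides with a prototype.
        inv = [max(sq_dist(x, psi), eps) ** power for psi in centers]
        total = sum(inv)
        F.append([v / total for v in inv])  # each row sums to one, as in (2)
    return F


def fcm_centers(X, F, m=2.0):
    """Prototypes minimizing (1) for fixed memberships (weighted means)."""
    N, C, k = len(X), len(F[0]), len(X[0])
    centers = []
    for c in range(C):
        w = [F[n][c] ** m for n in range(N)]
        total = sum(w)
        centers.append([sum(w[n] * X[n][j] for n in range(N)) / total
                        for j in range(k)])
    return centers


def fcm_objective(X, centers, F, m=2.0):
    """Value of the objective function in (1)."""
    return sum(F[n][c] ** m * sq_dist(X[n], centers[c])
               for n in range(len(X)) for c in range(len(centers)))


if __name__ == "__main__":
    X = [[0.0], [0.2], [5.0], [5.2]]   # toy 1D datums (assumed)
    centers = [[1.0], [4.0]]           # arbitrary initial prototypes
    for _ in range(10):
        F = fcm_memberships(X, centers)
        centers = fcm_centers(X, F)
    print(centers, fcm_objective(X, centers, F))
```

Because each of the two updates minimizes (1) with the other group of variables held fixed, the value of the objective function cannot increase from one step to the next.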

In (1), m > 1 is the fuzzifier (also called weighting exponent and fuzziness). The optimal choice for the value of the fuzzifier is a debated matter [94] and is suggested to be “an open question” [160]. Bezdek [145] suggests that 1 < m < 5 is a proper range and utilizes m = 2. The use of m = 2 is suggested by Dunn [41] in his early work on the topic as well and also by Frigui et al. [47], among others [79]. Bezdek [15] provided physical evidence for the choice of m = 2 and Pal et al. [116] suggested that the best choice for m is probably in the interval [1.5, 2.5]. Yu et al. [160] argue that the choices for the value of m are mainly empirical and lack a theoretical basis. They worked on providing such a basis and suggested that “a proper m depends on the data set itself” [160].

Recently, Zhou et al. [165] proposed a method for determining the optimal value of m in the context of FCM. They employed four Cluster Validity Index (CVI) models and utilized repeated clustering for m ∈ [1.1, 5] on four synthetic data sets as well as four real data sets adopted from the UCI Machine Learning Repository [99] (refer to [17] for a review of CVIs and [138] for coverage in the context of relational clustering). The range for m in that work is based on previous research [115] which provided lower and upper bounds on m. The investigation carried out in [165] indicates that m = 2.5 and m = 3 are optimal in many cases and that m = 2 may in fact not be appropriate for an arbitrary set of datums. This result is in line with other works which demonstrate that larger values of m provide more robustness against noise and outliers. Nevertheless, significantly large values of m are known to push the convergence towards the sample mean, in the context of Euclidean clustering [160]. Wu [150] analyzes FCM and some of its variants in the context of robustness and recommends m = 4.
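
The effect of the fuzzifier on the memberships can be illustrated with a short Python sketch. It applies the standard FCM membership update to a single datum whose squared distances to two prototypes are fixed, and repeats the calculation for several values of m. The function name and the numerical values are assumptions chosen for illustration only.

```python
def memberships(d2, m):
    """Standard FCM memberships of one datum, given its squared
    distances d2[c] > 0 to each prototype and the fuzzifier m > 1."""
    power = -1.0 / (m - 1.0)
    inv = [d ** power for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


if __name__ == "__main__":
    d2 = [1.0, 9.0]  # the datum is three times closer to the first prototype
    for m in (1.1, 2.0, 5.0, 50.0):
        print(m, memberships(d2, m))
```

As m approaches one the memberships become nearly crisp, and as m grows they flatten towards 1/C. With nearly uniform memberships every prototype is updated as an almost unweighted average of all datums, which is consistent with the drift towards the sample mean noted above.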

Rousseeuw et al. [135] suggested replacing $f_{nc}^{m}$ with $α f_{nc} + (1 - α) f_{nc}^{2}$ , for a known 0 < α < 1. Klawonn et al. [77, 78] suggested generalizing this effort and replacing $f_{nc}^{m}$ with an increasing and differentiable function g(f_nc).

Pedrycz [122–124] suggested modifying (2) in favor of customized constraints on $\sum_{c} f_{nc}$ for different values of n. That technique allows for the inclusion of a priori information into the clustering framework and is addressed as Conditional Fuzzy C-means (CFCM). The same modification is carried out in Credibilistic Fuzzy C-Means (CFCM) [27, 28], in which the “credibility” of datums is defined based on the distance between datums and clusters. Therefore, in that approach, (2) is modified in order to deflate the membership of outliers to the set of clusters (also see [156]). Customization of (2) is also carried out in Cluster Size Insensitive FCM (csiFCM) [112] in order to moderate the impact of datums in larger clusters on a smaller adjacent cluster. Leski [94] provides a generalized version of this approach in which $\sum β f_{nc}^{α}$ is constrained.

2.2 Prototype-based clustering

It is a common assumption that the notion of homogeneity depends on datum-to-datum distances. This assumption is made implicitly when clusters are modeled as prototypical datums, also called clustroids or cluster centroids, as in FCM, for example. A prominent choice in these works is the use of the Euclidean distance function [89]. For example, the potential function approach considers datums as energy sources scattered in a multi-dimensional space and seeks peak values in the field [154] (also see [12, 133]). We argue, however, that the distance between the datums may be neither defined nor meaningful and that what the clustering algorithm is to accomplish is the minimization of datum-to-cluster distances. For example, when datums are to be clustered into certain lower-dimensional subspaces, as is the case with Fuzzy C-Varieties (FCV) [95], the Euclidean distance between the datums is irrelevant.

Note that prototype-based clustering does not necessarily require explicitly present prototypes. For example, in kernel-based clustering, it is assumed that a non-Euclidean distance can be defined between any two datums. The clustering algorithm then functions based on an FCM-style objective function and produces clustroids which are defined in the same feature space as the datums [146]. These cluster prototypes may not be explicitly represented in the datum space, but, nevertheless, they share the same mathematical model as the datums [151] (the reader is referred to a review of Kernel FCM (KFCM) and Multiple-Kernel FCM (MKFCM) in [24] and several variants of KFCM in [25]). Another example of an intrinsically prototype-based clustering approach in which the prototypes are not explicitly “visible” is the Fuzzy PCA-guided Robust k-means (FPR k-means) clustering algorithm [66], in which a centroid-less formulation [162] is adopted which, nevertheless, defines homogeneity as datum-to-datum proximity.

Relational clustering approaches constitute another class of algorithms which are intrinsically based on datum-to-datum distances (for example, refer to Relational FCM (RFCM) [63] and its non-Euclidean extension Nerf C-means [60]). The goal of this class of algorithms is to group datums into self-similar bunches. Another algorithm in which the presence of prototypes may be less evident is the Multiple Prototype Fuzzy Clustering Model (FCMP) [107], in which datums are described as a linear combination of a set of prototypes, which are, nevertheless, members of the same $ℝ^{k}$ as the datums are. Additionally, some researchers utilize L_r-norms, for r ≠ 2 [19, 72], or other datum-to-datum distance functions [75].

We argue that a successful departure from the assumption of prototypical clustering is achieved when clusters and datums have different mathematical models. For example, the Gustafson-Kessel algorithm [57] models a cluster as a pair of a point and a covariance matrix and utilizes the Mahalanobis distance between datums and clusters (also see the Gath-Geva algorithm [50]). Fuzzy shell clustering algorithms [79] utilize more generic geometrical structures. For example, the FCV [95] algorithm can detect lines, planes, and other hyper-planar forms, the Fuzzy C Ellipsoidal Shells (FCES) [46] algorithm searches for ellipses, ellipsoids, and hyperellipsoids, and the Fuzzy C Quadric Shells (FCQS) [79] and its variants seek quadric and hyperquadric clusters (also see Fuzzy C Plano-Quadric Shells (FCPQS) [88]).

2.3 Robustification

Dave et al. [36] argue that the function of membership values in FCM and the concept of weight functions in robust statistics are related. Based on this perspective, it is argued that the classical FCM in fact provides an indirect means for attempting robustness. Nevertheless, it is known that FCM and other least-squares methods are highly sensitive to noise [28]. Hence, there has been ongoing research on the possible modifications of FCM in order to provide a (more) robust clustering algorithm [43, 93]. Dave et al. [36] provide an extensive list of relevant works and outline the intrinsic similarities within a unified view (also see [33, 56]).

The first attempt at robustifying FCM, based on one account [36], is the Ohashi Algorithm [56, 114]. That work adds a noise class to FCM and writes the robustified objective function as,

$Δ = α \sum_{c = 1}^{C} \sum_{n = 1}^{N} f_{nc}^{m} {∥ x_{n} - ψ_{c} ∥}^{2} + (1 - α) \sum_{n = 1}^{N} {\left(1 - \sum_{c = 1}^{C} f_{nc}\right)}^{m} .$ (3)

The transformation from (1) to (3) was also suggested independently by Dave [33, 34] when he developed the Noise Clustering (NC) algorithm. The core idea in NC is that there exists one additional imaginary prototype which is at a fixed distance from all of the datums and represents noise. That approach is similar to modeling approaches which perform consecutive identification and deletion of one cluster at a time [73, 166]. Those methods, however, are expensive to carry out and require reliable cluster validity measures.
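
The verbal description of NC can be turned into a small Python sketch by treating the noise class as a (C+1)-th prototype whose distance to every datum is a constant and then applying the usual FCM-style membership update. This sketch follows the description of the fixed-distance prototype rather than the exact parametrization in (3). The name `delta` for the fixed noise distance, the function name, and the numerical values are assumptions made here.

```python
def nc_memberships(d2, delta, m=2.0, eps=1e-12):
    """Memberships of one datum under a noise-prototype model.

    d2    : squared distances of the datum to the C real prototypes
    delta : assumed fixed distance between every datum and the noise prototype
    Returns (memberships to the C real clusters, membership to noise).
    """
    power = -1.0 / (m - 1.0)
    inv = [max(d, eps) ** power for d in d2]
    inv_noise = (delta ** 2) ** power
    total = sum(inv) + inv_noise
    return [v / total for v in inv], inv_noise / total


if __name__ == "__main__":
    inlier = [0.01, 25.0]     # very close to the first prototype
    outlier = [100.0, 144.0]  # far from both prototypes
    print(nc_memberships(inlier, delta=1.0))
    print(nc_memberships(outlier, delta=1.0))
```

In this sketch the memberships of a datum to the real clusters no longer sum to one, and the deficit is the share which is absorbed by the noise class. A datum which is far from every real prototype, relative to `delta`, is therefore assigned almost entirely to noise.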

Krishnapuram et al. [85] extended the idea behind NC and developed the Possibilistic C-means (PCM) algorithm by rewriting the objective function as,

$Δ = \sum_{c = 1}^{C} \sum_{n = 1}^{N} t_{nc}^{m} {∥ x_{n} - ψ_{c} ∥}^{2} + \sum_{c = 1}^{C} η_{c} \sum_{n = 1}^{N} (1 - t_{nc})^{m} .$ (4)

Here, t_nc denotes the degree of representativeness or typicality of x_n to ψ_c (also addressed as a possibilistic degree in contrast to the probabilistic model utilized in FCM). As expected from the modification in the way t_nc is defined, compared to that of f_nc, PCM removes the sum-to-one constraint, shown in (2), and in effect extends the idea of one noise class in NC into C noise classes. In other words, PCM could be considered as the parallel execution of C independent NC algorithms, each seeking a cluster. Therefore, the value of C is somewhat arbitrary in PCM [36]. For this reason, PCM has been called a mode-seeking algorithm where C is the upper bound on the number of modes.
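
The independence of the clusters in PCM can be made concrete with a short Python sketch. Setting the derivative of (4) with respect to t_nc to zero gives $t_{nc} = 1 / (1 + ({∥ x_{n} - ψ_{c} ∥}^{2} / η_{c})^{1 / (m - 1)})$, which involves only the distance of x_n to the cluster in question. The function name and the numerical values below are assumptions used for illustration.

```python
def pcm_typicality(d2, eta, m=2.0):
    """Typicality of a datum to one cluster, obtained by setting the
    derivative of (4) with respect to t_nc to zero.

    d2  : squared distance between the datum and the cluster prototype
    eta : the positive scale parameter of that cluster in (4)
    """
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))


if __name__ == "__main__":
    # Two coincident prototypes: both claim the same datum with full
    # typicality, because no constraint couples the clusters.
    d2_to_clusters = [0.0, 0.0]
    print([pcm_typicality(d, eta=1.0) for d in d2_to_clusters])
    # A far-away datum receives low typicality from every cluster.
    print([pcm_typicality(d, eta=1.0) for d in (49.0, 64.0)])
```

Since the typicality values of a datum are not normalized across clusters, nothing in this update prevents two prototypes from settling on the same mode.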

We argue that the interlocking mechanism present in FCM, i.e. (2), is valuable in that not only do clusters seek homogeneous sets, but they are also forced into more optimal “positions” through forces applied by competing clusters. In other words, borrowing the language used in [89], in FCM clusters “seize” datums and it is disadvantageous for multiple clusters to claim high membership to the same datum. There is no phenomenon, however, in NC and PCM which corresponds to this internal factor. Additionally, it is likely that PCM clusters coincide and/or leave portions of the data unclustered [10]. In fact, it is argued that at least some of the clusters generated through PCM are non-coincidental only because PCM gets trapped in a local minimum [142]. PCM is also known to be more sensitive to initialization than other algorithms in its class [89].

It has been argued that both concepts of possibilistic degrees and membership values have positive contributions to the purpose of clustering [32]. Hence, Pal et al. [117] combined FCM and PCM and wrote the optimization problem of Fuzzy Possibilistic C-Means (FPCM) as minimizing,

$Δ = \sum_{c = 1}^{C} \sum_{n = 1}^{N} (f_{nc}^{m} + t_{nc}^{η}) {∥ x_{n} - ψ_{c} ∥}^{2},$ (5)

subject to (2) and $\sum_{n = 1}^{N} t_{nc} = 1, \forall c$ . That approach was later shown to suffer from different scales for f_nc and t_nc values, especially when N ≫ C, and, therefore, additional linear coefficients and a PCM-style term were introduced to the objective function [118] (also see [147] for another variant). It has been argued that the resulting objective function employs four correlated parameters and that the optimal choice for them for a particular problem instance may not be trivial [89]. Additionally, in the new combined form, f_nc cannot necessarily be interpreted as a membership value [89].

Weight modeling is an alternative robustification technique and is exemplified in the algorithm developed by Keller [76], in which the objective function is rewritten as,

$Δ = \sum_{c = 1}^{C} \sum_{n = 1}^{N} f_{nc}^{m} u_{c} \frac{1}{ω_{n}^{q}} {∥ x_{n} - ψ_{c} ∥}^{2},$ (6)

subject to $\sum_{n = 1}^{N} ω_{n} = ω$ . Here, the values of ω_n are updated during the process as well.

Frigui et al. [47] included a robust loss function in the objective function of FCM and developed Robust C-Prototypes (RCP),

$Δ = \sum_{c = 1}^{C} \sum_{n = 1}^{N} f_{nc}^{m} u_{c} (∥ x_{n} - ψ_{c} ∥) .$ (7)

Here, u_c(·) is the robust loss function for cluster c. They further extended RCP and developed an unsupervised version of RCP, nicknamed URCP [47]. Wu et al. [151] used $u_{c}(x) = 1 - e^{-β x^{2}}$ and developed the Alternative HCM (AHCM) and Alternative FCM (AFCM) algorithms (also see [164]).
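
The role of the loss function in (7) can be illustrated with a brief Python sketch which evaluates the objective for a pluggable loss and uses the exponential form employed by Wu et al. [151] as the default. For simplicity, the sketch assumes that the same loss is shared by all clusters. The function names, the default value of β, and the toy inputs are assumptions made here.

```python
import math


def afcm_loss(x, beta=1.0):
    """Bounded loss 1 - exp(-beta * x^2) for a datum-to-cluster distance x."""
    return 1.0 - math.exp(-beta * x * x)


def squared_loss(x):
    """Unbounded loss which reduces (7) to the FCM objective in (1)."""
    return x * x


def robust_objective(dists, F, m=2.0, loss=afcm_loss):
    """Value of (7), with one loss function shared by all clusters.

    dists[n][c] : distance between datum n and cluster c
    F[n][c]     : membership of datum n to cluster c
    """
    return sum(F[n][c] ** m * loss(dists[n][c])
               for n in range(len(dists)) for c in range(len(dists[0])))


if __name__ == "__main__":
    dists = [[0.1, 3.0], [50.0, 60.0]]  # the second datum is an outlier
    F = [[0.9, 0.1], [0.5, 0.5]]
    print(robust_objective(dists, F, loss=squared_loss))
    print(robust_objective(dists, F, loss=afcm_loss))
```

With the squared loss the outlier dominates the value of the objective function, whereas with the bounded loss its contribution per cluster cannot exceed its weight $f_{nc}^{m}$.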

2.4 Number of clusters

The classical FCM and PCM, and many of their variants, are based on the assumption that the number of clusters is known (the reader is referred to [16, Chapter 4] and [71, Chapter 4] for reviews of this topic). While PCM-style formulations may appear to relax this requirement, the corresponding modification is carried out at the cost of yielding an ill-posed optimization problem [89]. Hence, provisions have been added to existing clustering algorithms in order to address this challenge. Repeating the clustering procedure for different numbers of clusters [50, 87] and progressive clustering are two of the approaches devised in the literature.

Among the many variants of progressive clustering are methods which start with a significantly large number of clusters and freeze “good” clusters [83, 88], approaches which combine compatible clusters [35, 88], and the technique of searching for one “good” cluster at a time until no more is found [73, 139]. Use of regularization terms in order to push the clustering results towards the “appropriate” number of clusters is another approach taken in the literature [48]. These regularization terms, however, generally involve additional parameters which are to be set carefully, and potentially per problem instance (for example see the mixed C-means clustering model proposed in [117]).

Dave et al. [36] conclude in their 1997 paper that the solution to the general problem of robust clustering, when the number of clusters is unknown, is “elusive” and that the techniques available in the literature each have their limitations. In this paper, we acknowledge that the problem of determining the appropriate number of clusters is hard to solve and even hard to formalize. Additionally, we argue that this challenge is equally applicable to many clustering problems independent of the particular clustering model utilized in the algorithms. Therefore, we designate this challenge as being outside the scope of this contribution and assume that either the appropriate number of clusters is known or that an exogenous means of cluster pruning is available which can be utilized within the context of the algorithm developed in this paper.

2.5 Weighted clustering

Many fuzzy and possibilistic clustering algorithms make the assumption that the datums are equally important. Weighted fuzzy clustering, however, works on input datums which have an associated positive weight [73]. This notion can be considered as a marginal case of clustering fuzzy data [42]. Other examples for this setting include clustering of a weighted set, clustering of sampled data, clustering in the presence of multiple classes of datums with different priorities [3], and a measure used in order to speed up the execution through data reduction [22, 157]. Nock et al. [109] formalize the case in which weights are manipulated in order to move the clustering results towards datums which are harder to include regularly. Chen et al. [23] utilize density-motivated weights in order to reduce the impact of outliers (refer to [58] for different variants of this framework). Semi Supervised FCM (ssFCM) [13] uses weight factors based on the Euclidean norm in order to balance the sizes of different hyper-spherical shaped clusters based on user intervention. Note that the extension of FCM on weighted sets has been developed under different names, including Density-Weighted FCM (WFCM) [62], Fuzzy Weighted C-means (FWCM) [96], and New Weighted FCM (NW-FCM) [70].

2.6 Spatial context

FCM effectively ignores spatial context [137]. In other words, in FCM, PCM, and many other variants of fuzzy clustering algorithms, datums are treated as separate realizations, where there is no a priori relationship between x_n and x_{n+1}. Nevertheless, in the physical world, datums are always correlated. Hence, the notion of spatial context suggests that a datum is defined in its context and must be classified while the context is taken into consideration. In the words of the authors of [137], “usually, one pixel is too small to represent part of an image”.

It is important to emphasize that the notion of spatial context and the premise of NC are somewhat contradictory. While NC attempts to classify noisy datums into a separate cluster and to subsequently discard them, the thinking behind spatial context is that a noisy datum may be classified based on its context [137]. This is an important requirement in approaches such as image segmentation, in which a pixel identified as noise is in effect a discontinuity in the output.

While it is theoretically possible to include datum coordinates, if applicable, as additional features and then to carry out clustering [71, 84], that approach is not theoretically justifiable, because coordinate information and datum features are often inherently different and are defined in different scales. From a practical perspective, too, defining a notion of homogeneity which encompasses both datum homogeneity as well as spatial contiguity for datums is not a trivial task.

Another primitive approach to including the spatial context into the clustering process is to perform data pre-processing [5, 152]. Potential loss of details, however, is among the caveats of this technique. A marginally more appropriate approach is to perform post-processing on the membership maps [20, 152] or to execute clustering at different scales and to fuse the results afterwards [4, 144]. The reader is referred to [45, 68] for other variations of these approaches. It has been argued [101] that the incorporation of spatial context as a pre- or post-processing stage is easy to implement but lacks proper theoretical justification.

Spatial context can also be incorporated into the optimization process as additional information. In fact, Gibbs random fields have been used in order to model spatial context within the framework of K-means clustering [14, 92]. Also see [119] for an iterative process which utilizes sliding windows which shrink over time. Another, more recent, example for this approach is Geometrically Guided FCM (GG-FCM) [110, 111], in which the semi-supervised framework developed in [125] is modified in order to use neighborhood information as training for image pixels (also see Spatial FCM (sFCM) [29] and Bilateral FCM (bFCM) [103]).

GG-FCM and Geometrically Guided Conditional FCM (GGC-FCM) utilize a reject class and eliminate datums which are found to be spurious [110]. That approach is not desirable in applications which require every datum to be classified into a cluster. That deficiency, however, is addressed in Spatially Guided FCM (SG-FCM) [113], in which a geometrical shape descriptor is incorporated into the objective function. Nevertheless, in those works, spatial context is either utilized as a static input [125] or as information which is dynamically recalculated through the process [113]. The latter case, however, commonly depends on engineered measures of compliance to spatial contiguity and often additionally requires the proper setting of one or more regularization coefficients. For example, the approaches outlined in [113, 125] depend on the proper setting of the value of the parameter α.

Another example for the utilization of spatial context as a priori information is the Improved FCM (IFCM) [157], in which a histogram-based FCM deployment produces cluster prototypes and membership values and then the resulting crisp membership information is utilized in order to produce the p_nc quantities for each datum and each cluster. Here, p_nc denotes the ratio of the neighbors of x_n which belong to ψ_c. The second stage of that algorithm then finds f_nc and ψ_c which minimize the following modified objective function,

$Δ = \sum_{c = 1}^{C} \sum_{n = 1}^{N} p_{nc}^{m} f_{nc}^{m} {∥ x_{n} - ψ_{c} ∥}^{2} .$ (8)

That framework effectively utilizes spatial context as static information which is injected into the objective function at some point during the process.
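
The p_nc quantities used in (8) can be sketched in Python for datums which lie on a regular 2D grid. The sketch takes a crisp label map and returns, for each pixel, the fraction of its neighbors which carry each label. The use of the 8-connected neighborhood, the handling of image borders by simply skipping missing neighbors, and the function name are assumptions made for this sketch, because the text above does not fix those details.

```python
def neighbor_ratios(labels, C):
    """Fraction of the neighbors of each pixel which belong to each cluster.

    labels : 2D list of crisp cluster indices in range(C)
    Returns p with p[i][j][c] for pixel (i, j) and cluster c.
    """
    H, W = len(labels), len(labels[0])
    p = [[[0.0] * C for _ in range(W)] for _ in range(H)]
    for i in range(H):
        for j in range(W):
            count = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di == 0 and dj == 0:
                        continue  # the pixel is not its own neighbor
                    a, b = i + di, j + dj
                    if 0 <= a < H and 0 <= b < W:
                        p[i][j][labels[a][b]] += 1.0
                        count += 1
            if count > 0:
                p[i][j] = [v / count for v in p[i][j]]
    return p


if __name__ == "__main__":
    labels = [[0, 0, 1],
              [0, 1, 1],
              [0, 0, 1]]
    p = neighbor_ratios(labels, C=2)
    print(p[1][1])  # ratios for the central pixel
```

The ratios are computed once from the crisp labels and then held fixed while (8) is minimized, which is the sense in which spatial context acts as static information in that framework.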

The aforementioned works belong to the general category of approaches which utilize engineered regularization terms. Many of the works in that category propose a superficially constructed term which penalizes excessive spatial variation. Additionally, as stated before, the regularization terms are generally multiplied by a constant, the proper setting of the value of which is an important prerequisite for the appropriateness of the outcomes of the algorithm (as an example containing both issues refer to [38]). An early example of that approach is the Contig-k-means [131, 141] algorithm which updates the crisp k-means clustering objective function in order to incorporate spatial contiguity into it. That approach, however, requires the proper adjustment of the value of the parameter λ. In another example, in [126], a term is devised which penalizes correlation between membership values of adjacent datums to different clusters. Another example is presented in [163], in which the authors inject a regularization term into the objective function of a Kernel FCM (KFCM) algorithm [52] and generate the Spatially constrained Kernel FCM (SKFCM) algorithm. That approach of modeling the spatial context has precedent in the literature [22, 159] and is inspired by Neighborhood EM (NEM) [6].

In NEM, the proper value for the parameter α is to be set by the user and is suggested to be dependent on the signal to noise ratio (SNR) of the input image. Dependence on additional parameters is a concern in other works as well. For example, in [137] the authors develop the Improved FCM (IFCM) algorithm wherein “neighborhood attraction” is modeled as a combination of “feature attraction” and “distance attraction”. That model is reminiscent of notions from the bilateral filtering literature [120]. Nevertheless, the term which utilizes spatial context in that work depends on two parameters, the values of which are to be set by an Artificial Neural Network (ANN) (also see [29]). Additionally, the attraction models utilized in [137] depend on datum-to-datum distances, which, as previously noted, are in fact irrelevant to a generic homogeneity model.

A more generalized approach to spatial context is utilized in Bias-Corrected FCM (BCFCM) [5], in which the following regularization term is employed,

$\frac{α}{∥ S_{n} ∥} \sum_{c = 1}^{C} \sum_{n = 1}^{N} [f_{nc}^{m} \sum_{n^{'} \in S_{n}} ∥ x_{n^{'}} - ψ_{c} ∥^{2}] .$ (9)

In that framework, α inversely depends on the SNR of the input image and its proper value is to be set through a separate process. An accelerated and robustified variant of that framework is given in [25] and its combination with [129] is proposed in [98]. A variant of that approach is utilized in Enhanced FCM (EnFCM) [140], in which acceleration is achieved through utilizing the image histogram. That work, as well as the Fast Generalized FCM (FGFCM) [22], requires the proper setting of two parameters which govern the tradeoff between the original image and a filtered version of it. The effects of those parameters are described as “crucial” and “experience” or “trial-and-error” are stated to be required in order for them to be properly selected [82]. Similar conditions are applicable to [25, 140], among other works.

In [128] the authors develop the Adaptive FCM (AFCM) algorithm through incorporating a multiplier field into the objective function as follows (also see [100, 149]),

$Δ = \sum_{c = 1}^{C} \sum_{n = 1}^{N} f_{nc}^{2} ∥ x_{n} - s_{n} ψ_{c} ∥^{2} + λ_{1} \sum_{n = 1}^{N} [{(Δ_{x} s_{n})}^{2} + {(Δ_{y} s_{n})}^{2}] + λ_{2} \sum_{n = 1}^{N} [{(Δ_{xx} s_{n})}^{2} + 2 {(Δ_{xy} s_{n})}^{2} + {(Δ_{yy} s_{n})}^{2}] .$ (10)

Here, the subscripted Δ terms indicate forward difference operators and the λ_i are regularization coefficients. In effect, the two regularization terms in (10) constrain the s_n field into a smooth surface. In addition to the difficulties of optimizing the AFCM objective function in terms of s_n, as outlined in [128], it is important to emphasize that (10) is essentially a Euclidean prototype-based framework and that the generalized formulation for generic datum and cluster models is not trivial.

Fuzzy Local Information C-Means (FLICM) [82] is among the state-of-the-art in the field of image segmentation. The formulation of FLICM does not depend on any particular parameters and it uses fuzzy local similarity measures which incorporate both gray level as well as spatial closeness. We argue, however, that the notion of datum-to-datum comparison which is used in FLICM, is only applicable to a certain category of problem classes which includes gray scale image segmentation but excludes color image segmentation. Moreover, the primary concern with FLICM is that it engineers a new concept, i.e. the Fuzzy Factor G. In fact, that work is a prime example of the introduction of a new concept based on intuition, as it is outlined in the list of “characteristics” given in [82, Section III.A]. That concept is then heuristically composed as a mathematical formula [82, (17)]. That paper then follows with verbal justification of the appropriateness of the engineered factor [82, Section III.B]. While there are important epistemological questions regarding the construction of FLICM, that framework is further extended in subsequent frameworks such as RFLICM [53] and KWFLICM [54]. An extension of FLICM is given in FCM with Edge and Local Information (FELICM) [97], in which results at image edges are improved through separate treatment of boundary pixels.

3 Developed method

In this section, we utilize Bayesian inference in order to develop the loss model for a fuzzy-possibilistic clustering algorithm which utilizes a generic datum model and a generic notion of cluster homogeneity and also employs a robust loss function. The main contribution of this section is the incorporation of spatial context into the loss model.

3.1 Model preliminaries

We assume that a problem class is given, within the context of which a datum model is known and denote a datum as x. We also assume that a particular cluster model is provided, which complies with the notion of homogeneity relevant to the problem class at hand, and denote a cluster as ψ.

In this work, we utilize a weighted set of datums, defined as,

$X = \{(ω_{n}; x_{n})\}, n = 1, \dots, N, ω_{n} > 0,$ (11)

and we define the weight of X as,

$Ω = \sum_{n = 1}^{N} ω_{n} .$ (12)

In this context, when estimating loss, we treat X as a set of realizations of the random variable x and write,

$p \{x_{n}\} = \frac{ω_{n}}{Ω} .$ (13)

We assume that the real-valued non-negative distance function φ (x, ψ) is defined on the datum x and the cluster representation ψ. We emphasize that this generalized assumption is a definite departure from prototype-based approaches. Those approaches assume that x and ψ have identical mathematical models, generally as members of $ℝ^{k}$ , and adopt the Euclidean distance function. We also assume that the distance function is unbounded, i.e. for any cluster representation ψ and any positive value L, there exist infinitely many datums x for which φ (x, ψ) > L. When the datum belongs to $ℝ^{k}$ , the Euclidean Distance, any L_r norm, and the Mahalanobis Distance are special cases of the notion of datum-to-cluster distance defined here. The corresponding cluster models in these cases would be $ψ \in ℝ^{k}$ , $ψ \in ℝ^{k}$ , and ψ identifying a pair of a member of $ℝ^{k}$ and a k × k covariance matrix, respectively.

We assume that φ (x, ψ) is differentiable in terms of ψ and that for any non-empty weighted set X, the following function of ψ,

$Δ_{X} (ψ) = \sum_{n = 1}^{N} ω_{n} φ (x_{n}, ψ),$ (14)

has one and only one minimizer which is also the only solution to the following equation,

$\sum_{n = 1}^{N} ω_{n} \frac{\partial}{\partial ψ} φ (x_{n}, ψ) = 0 .$ (15)

In this paper, we assume that a function Ψ(·) is given, which for the input weighted set X produces the optimal cluster ψ which minimizes (14) and is the solution to (15). We address Ψ(·) as the cluster fitting function. In fact, Ψ(·) is the solution to the M-estimator given in (14). Examples for Ψ(·) include mean and median when x and ψ are real values and φ (x, ψ) = (x - ψ)² and φ (x, ψ) = |x - ψ|, respectively. We note that when a closed-form representation for Ψ(·) is not available, conversion to a W-estimator can produce a procedural solution to (15) (refer to [44, 65] for details).
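The two scalar examples of the cluster fitting function can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `fit_mean`, `fit_median`, and `aggregate_distance` are hypothetical, and the weighted forms of the mean and the median are assumed because the input is a weighted set.

```python
import numpy as np

def fit_mean(weights, x):
    """Cluster fitting for phi(x, psi) = (x - psi)**2: the weighted mean."""
    weights = np.asarray(weights, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(weights * x) / np.sum(weights))

def fit_median(weights, x):
    """Cluster fitting for phi(x, psi) = |x - psi|: a weighted median."""
    weights = np.asarray(weights, dtype=float)
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)
    cumulative = np.cumsum(weights[order])
    # First sorted datum at which the cumulative weight reaches half of the total.
    index = int(np.searchsorted(cumulative, 0.5 * cumulative[-1]))
    return float(x[order][index])

def aggregate_distance(weights, x, psi, phi):
    """Delta_X(psi) of (14) for a scalar cluster representation psi."""
    weights = np.asarray(weights, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(weights * phi(x, psi)))

# Illustrative weighted set (values are made up for the example).
weights = [1.0, 2.0, 1.0, 0.5]
x = [0.0, 1.0, 4.0, 10.0]
psi_mean = fit_mean(weights, x)
psi_median = fit_median(weights, x)
```

Evaluating `aggregate_distance` on a grid of candidate values is a simple way to check that each fitted value minimizes (14) for its distance function.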

We assume that a function Ψ_∘(·), which may depend on X, is given that produces an appropriate number of initial clusters. We address this function as the cluster initialization function and denote the number of clusters produced by it as C. Note that C is not required to be explicitly known. It is in fact the responsibility of Ψ_∘(·) to produce an initial number of clusters which is relevant to the problem class or problem instance. As stated in Section 2:validity, an assumption in this work is that either the algorithm is to converge to C clusters or that the number of clusters is modified by an external process during the execution of the developed algorithm. This issue is addressed again in Section 3.5.

We assume that a robust loss function, u(·) : [0, ∞] → [0, U], is given. Here, U > 0 is the value which u (τ) converges to for τ → ∞. We assume that u(·) is an increasing differentiable function which satisfies u (0) = 0 and u (λ) = $\frac{1}{2}$ for a known value of λ > 0, which we will address as the scale parameter (note the similarity with the cluster-specific weights in PCM [85]). In fact, λ has a similar role to that of scale in robust statistics [69] (also called the resolution parameter [12]) and the idea of distance to noise prototype in the Noise Clustering (NC) algorithm [33, 34]. Scale can also be considered as the controller of the boundary between inliers and outliers [36]. From a geometrical perspective, λ controls the radius of spherical clusters and the thickness of planar and shell clusters [89]. One may investigate the possibility of generalizing the unique scale factor λ into cluster-specific scale factors λ_c in line with the η_c parameters in PCM [85].

In this work, we utilize the rational robust loss function given below,

$u (x) = \frac{x}{λ + x} .$ (16)

Note that while λ is the scale parameter for this loss function, varying λ does not impact the overall geometrical layout of u (x). In fact, u (x) depends on x only through the ratio x/λ, and therefore λ linearly stretches the function along the horizontal axis.
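As a quick numerical illustration of (16), the following Python sketch evaluates the rational loss and its derivative. The function names and the value of `lam` are assumptions made for the example; the derivative is obtained by differentiating (16) directly.

```python
import numpy as np

def rational_loss(x, lam):
    """u(x) = x / (lam + x), as in (16); it saturates at U = 1."""
    x = np.asarray(x, dtype=float)
    return x / (lam + x)

def rational_loss_derivative(x, lam):
    """u'(x) = lam / (lam + x)**2, obtained by differentiating (16)."""
    x = np.asarray(x, dtype=float)
    return lam / (lam + x) ** 2

lam = 25.0  # illustrative scale, not a value prescribed by the text
values = rational_loss([0.0, lam, 1e9], lam)
```

The three evaluated points correspond to u (0) = 0, u (λ) = 1/2, and the saturation of the loss for very large distances.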

We acknowledge that one may consider the possibility of utilizing Tukey’s biweight [11], Hampel [59], and Andrews loss functions in this framework. Huber and Cauchy loss functions are not bounded and therefore are not applicable to this work. Refer to [36, Table I] for mathematical formulations.

Now, we provide a mathematical model for spatial context. We note that the notion of spatial context implies that datums which are spatially adjacent must influence the classification of each other. Here, we model the influence of $x_{n_{1}}$ on $x_{n_{2}}$ as $ω_{n_{1} n_{2}} \geq 0$ . We utilize this model through a Bayesian inference framework, as follows,

$E \{Loss | x_{n} \in \hat{X}\} = \sum_{n^{'} = 1}^{N} ω_{{nn}^{'}} E \{loss | x_{n^{'}} \in \hat{X}\} .$ (17)

Here, $\hat{X}$ is any set. A special case of this loss model, in the context of color image segmentation, is used in Spatial FCM (SFCM) [102]. Note that the model employed in this paper refrains from comparing datums together, as for example is done in [22]. This is due to the fact that datum-to-datum distance values may in fact have no relevance to homogeneity and membership to similar clusters.

In (17), we differentiate between E{Loss} and E{loss}. The former is the expected loss associated to a particular setting while the spatial context is taken into consideration. In effect, the purpose of the clustering algorithm derived here is the minimization of E{Loss} for the entire set of datums. In contrast, E{loss} is the expected loss when the spatial context is ignored, i.e. the loss of the individual datum. As such, we write,

$E \{loss | x_{n} \text{ in cluster } c\} = u (φ (x_{n}, ψ_{c})) .$ (18)

While the hypothetical assumption that $\sum_{n^{'} = 1}^{N} ω_{{nn}^{'}}$ is independent of n would simplify the derivations, we avoid this assumption because in certain applications, especially within the field of image processing, this condition is hard to satisfy for boundary datums. Nevertheless, we assume that $\sum_{n^{'} = 1}^{N} ω_{{nn}^{'}}$ is non-zero. Note that the notion of spatial context which is utilized in this work is a generalization of the binary framework utilized in [158] and many other works.

3.2 Assessment of loss

Here, we employ a Bayesian inference framework and model the aggregate loss which corresponds to an arbitrary solution to the clustering problem. This process results in an objective function which is organically derived from the utilized loss model.

We write,

$\begin{matrix} E \{Loss\} = \frac{1}{Ω} \sum_{n = 1}^{N} ω_{n} E \{Loss | x_{n}\} \\ = \frac{1}{Ω} \sum_{n = 1}^{N} ω_{n} [E \{Loss | x_{n} \in \tilde{X}\} p \{x_{n} \in \tilde{X}\} \\ + E \{Loss | x_{n} \notin \tilde{X}\} p \{x_{n} \notin \tilde{X}\}] . \end{matrix}$ (19)

Here, $\tilde{X}$ denotes the subset of X which only includes the inliers. Then, we model the probability that x_n is an inlier as p_n and write,

$p \{x_{n} \in \tilde{X}\} = p_{n} .$ (20)

Therefore, we have,

$\begin{matrix} E \{Loss\} = \frac{1}{Ω} \sum_{n = 1}^{N} ω_{n} [p_{n} \sum_{c = 1}^{C} E \{Loss | x_{n} \in {\tilde{X}}_{c}\} p \{x_{n} \in {\tilde{X}}_{c} | x_{n} \in \tilde{X}\} \\ + (1 - p_{n}) E \{Loss | x_{n} \notin \tilde{X}\}] . \end{matrix}$ (21)

In this model, ${\tilde{X}}_{c}$ is the subset of $\tilde{X}$ which contains the datums in ψ_c. Therefore,

$\tilde{X} = ⋃_{c = 1}^{C} {\tilde{X}}_{c} .$ (22)

Now, we utilize (17) and write,

$\begin{matrix} E \{Loss | x_{n} \notin \tilde{X}\} \\ = \sum_{n^{'} = 1}^{N} ω_{{nn}^{'}} E \{loss | x_{n^{'}} \notin \tilde{X}\} = U \sum_{n^{'} = 1}^{N} ω_{{nn}^{'}} . \end{matrix}$ (23)

Here, we have modeled $E \{loss | x_{n^{'}} \notin \tilde{X}\}$ as $U = \lim_{τ \to \infty} u (τ)$ . In other words, the loss of an outlier is modeled as the maximum value to which the loss function saturates.

We continue the derivation of (21) by using (17) for $\hat{X} = {\tilde{X}}_{c}$ and write,

$\begin{matrix} E \{Loss\} = \frac{1}{Ω} \sum_{n = 1}^{N} ω_{n} \\ [p_{n} \sum_{c = 1}^{C} f_{nc} \sum_{n^{'} = 1}^{N} ω_{{nn}^{'}} E \{loss | x_{n^{'}} \in {\tilde{X}}_{c}\} \\ + (1 - p_{n}) U \sum_{n^{'} = 1}^{N} ω_{{nn}^{'}}] . \end{matrix}$ (24)

Here, f_nc denotes $p \{x_{n} \in {\tilde{X}}_{c} | x_{n} \in \tilde{X}\}$ . In other words, f_nc represents the membership of x_n to ψ_c subject to x_n being an inlier. Now, using (18) we have,

$\begin{matrix} E \{Loss\} = \frac{1}{Ω} \sum_{n = 1}^{N} ω_{n} [p_{n} \sum_{c = 1}^{C} f_{nc} \sum_{n^{'} = 1}^{N} ω_{{nn}^{'}} \\ u (φ (x_{n^{'}}, ψ_{c})) + (1 - p_{n}) U \sum_{n^{'} = 1}^{N} ω_{{nn}^{'}}], \end{matrix}$ (25)

which we reorder and rewrite as,

$\begin{matrix} Ω E \{Loss\} = \sum_{n = 1}^{N} \sum_{n^{'} = 1}^{N} \sum_{c = 1}^{C} ω_{n} ω_{{nn}^{'}} f_{nc} p_{n} \\ u (φ (x_{n^{'}}, ψ_{c})) + \\ U C \frac{1}{C} \sum_{n = 1}^{N} [ω_{n} (1 - p_{n}) \sum_{n^{'} = 1}^{N} ω_{{nn}^{'}}] . \end{matrix}$ (26)

Here, we have chosen to write U as $U C \frac{1}{C}$ for reasons which will become clear later.

Close assessment of (26) shows that this cost function contains an HCM-style hard formulation. It is known, however, that the use of the fuzzifier has important benefits, as outlined in Section 2:membership. Hence, we rewrite (26) as,

$\begin{matrix} Δ = \sum_{n = 1}^{N} \sum_{n^{'} = 1}^{N} \sum_{c = 1}^{C} ω_{n} ω_{{nn}^{'}} f_{nc}^{m} {p_{n}}^{m} \\ u (φ (x_{n^{'}}, ψ_{c})) + \\ U C^{1 - m} \sum_{n = 1}^{N} [ω_{n} (1 - p_{n})^{m} \sum_{n^{'} = 1}^{N} ω_{{nn}^{'}}] . \end{matrix}$ (27)

This objective function is to be minimized subject to (2).

It is important to emphasize that (27) converges to the objective function for the classical FCM when the additional features built into it are eliminated. In order to observe this relationship, we first need to assume that ω_nn′ is the identity operator, i.e. that ω_nn′ is one if n = n′ and zero otherwise. Additionally, we need to set p_n ≡ 1, ∀ n, i.e. assume that all datums are inliers, and also set u (x) ≡ x, i.e. ignore the robust loss function. Then, (27) will convert into the conventional FCM formulation. In other words, the process developed in this paper can be considered as theoretical justification for the FCM model when the presence of the outliers and the significance of spatial context are ignored and robustification is not required.
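The reduction to FCM can be checked numerically. The sketch below evaluates the objective function (27) with a dense N × N influence matrix and compares it with a weighted FCM objective under the three simplifications listed above. The function names, the dense-matrix representation, and the random test data are assumptions of this illustration.

```python
import numpy as np

def objective(omega, omega_ctx, f, p, loss, U, m):
    """Objective (27).

    omega     : (N,)   datum weights
    omega_ctx : (N, N) spatial influence, omega_ctx[n, k] plays the role of omega_{nn'}
    f         : (N, C) conditional memberships, each row sums to one
    p         : (N,)   inlier probabilities
    loss      : (N, C) values of u(phi(x_n, psi_c))
    """
    C = f.shape[1]
    context_loss = omega_ctx @ loss  # sum over n' of omega_{nn'} u(phi(x_n', psi_c))
    first = np.sum(omega[:, None] * (f * p[:, None]) ** m * context_loss)
    second = U * C ** (1.0 - m) * np.sum(omega * (1.0 - p) ** m * omega_ctx.sum(axis=1))
    return float(first + second)

def fcm_objective(omega, f, dist, m):
    """Weighted FCM objective: sum over n and c of omega_n f_nc^m phi(x_n, psi_c)."""
    return float(np.sum(omega[:, None] * f ** m * dist))

rng = np.random.default_rng(0)
N, C, m = 6, 3, 2.0
x = rng.uniform(0.0, 255.0, size=N)
psi = np.array([40.0, 120.0, 200.0])
dist = (x[:, None] - psi[None, :]) ** 2
f = rng.uniform(size=(N, C))
f /= f.sum(axis=1, keepdims=True)
omega = rng.uniform(0.5, 1.0, size=N)

# Identity context, all datums inliers, and u(x) = x (so loss equals dist).
reduced = objective(omega, np.eye(N), f, np.ones(N), dist, U=1.0, m=m)
```

With these settings the second term of (27) vanishes and `reduced` coincides with `fcm_objective(omega, f, dist, m)`.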

Note the presence of the product of $f_{nc}^{m}$ and $u (φ (x_{n^{'}}, ψ_{c}))$ in (27). In other words, the distance between x_n′ and ψ_c is weighted by the membership of x_n to this cluster. This notion of spatial correspondence is at the heart of the spatial context model which is utilized in the present work. A distantly similar approach is taken in [101], in which the Euclidean distance between the datums is coupled with their corresponding distances to the clusters. That framework is inherently based on the assumption that the Euclidean distance between the datums is relevant to the homogeneity of sets of datums (also see [91]).

A note on the concept of membership in the present framework is necessary. In this work, f_nc denotes the membership of x_n to ψ_c conditional on x_n being an inlier. However, for convenience, we may simply refer to f_nc as “membership of x_n to ψ_c”. It is important to realize that this reference contains an implicit “subject to x_n being an inlier”, which is excluded from the sentence in order to facilitate the flow of the conversation and the derivations. In other words, the membership of x_n to ψ_c, as it is commonly referenced in the data clustering literature, is in fact equal to p_n f_nc, as derived below,

$f_{nc} = \frac{p \{x_{n} \in {\tilde{X}}_{c} \cap \tilde{X}\}}{p \{x_{n} \in \tilde{X}\}},$ (28)

and as ${\tilde{X}}_{c} ⊂ \tilde{X}$ , we have,

$p \{x_{n} \in {\tilde{X}}_{c}\} = p_{n} f_{nc} .$ (29)

Also, it is important to realize that the equality,

$\sum_{c = 1}^{C} p_{n} f_{nc} + (1 - p_{n}) = 1, \forall n,$ (30)

carries an intuitive meaning, as follows: X is partitioned into the C + 1 disjoint sets ${\tilde{X}}_{1}, {\tilde{X}}_{2}, \dots, {\tilde{X}}_{C}, X - \tilde{X}$ . As such, (30) carries the probability values which indicate the membership of x_n to each one of these sets. Here, it is also worth pointing out the similarity as well as the differences between the p_n identifiers utilized in this work and the notion of “credibility” employed in CFCM [27]. CFCM, in effect, replaces the constant one in (2) with datum-specific credibility identifiers which are calculated using an engineered term. These terms are expected to result in low membership values for outliers. While that framework does not always yield the desired outcome, as discussed in [58], we emphasize the heuristic nature of that framework. In contrast, in this work, we have,

$\sum_{c = 1}^{C} p_{n} f_{nc} = p_{n}, \forall n,$ (31)

which exhibits the fact that the unconditional membership values to the clusters are indeed constrained differently for the inliers and the outliers. Nevertheless, this constraint is organically derived in the present work and is not based on heuristics.

We emphasize that the second term in (27) in fact acts as a regularization component and that it bears resemblance to the regularization term in PCM (see (3)). It is important to emphasize, however, that this term is derived organically through the mathematical modeling of the loss value. In comparison, a significant majority of the well-known methods in the literature utilize regularization components which are assumed, heuristically and based on the intuition of the researchers, to yield the desired effects (for example see [49]). Those works then utilize experimental evidence in order to exhibit that the desired effects are in fact present and valid. In contrast, in this work, we start from a Bayesian model for the loss function and derive an objective function which yields a term that may be understood as a regularization term. We find this distinction important from an epistemological perspective.

3.3 Alternating optimization strategy

In this section, we provide a solution strategy for the cost function given in (27). This cost function is to be minimized subject to (2).

First, we calculate $\frac{\partial Δ}{\partial p_{n}}$ and equate it to zero and derive,

$\frac{1}{p_{n}} = 1 + C {[\frac{\sum_{n^{'} = 1}^{N} \sum_{c = 1}^{C} ω_{{nn}^{'}} f_{nc}^{m} u (φ (x_{n^{'}}, ψ_{c}))}{U \sum_{n^{'} = 1}^{N} ω_{{nn}^{'}}}]}^{\frac{1}{m - 1}} .$ (32)

Then, we utilize Lagrange Multipliers and combine (2) with (27) and find the optimal value for f_nc as,

$f_{nc} = \frac{{[\sum_{n^{'} = 1}^{N} ω_{{nn}^{'}} u (φ (x_{n^{'}}, ψ_{c}))]}^{- \frac{1}{m - 1}}}{\sum_{c^{'} = 1}^{C} {[\sum_{n^{'} = 1}^{N} ω_{{nn}^{'}} u (φ (x_{n^{'}}, ψ_{c^{'}}))]}^{- \frac{1}{m - 1}}} .$ (33)

Similarly, by calculating $\frac{\partial Δ}{\partial ψ_{c}}$ and equating it with zero, we produce the update function for ψ_c, as follows. We define,

${\tilde{ω}}_{n c} = \sum_{n^{'} = 1}^{N} ω_{n^{'}} ω_{n^{'} n} f_{n^{'} c}^{m} p_{n^{'}}^{m} u (φ (x_{n^{'}}, ψ_{c})),$ (34)

and use (15) in order to produce a candidate solution as,

$ψ_{c}^{★} = Ψ (\{({\tilde{ω}}_{n c}; x_{n})\}) .$ (35)

Note that, in (35), the dependency of ${\tilde{ω}}_{n c}$ on the distances to ψ_c is ignored. In other words, the cluster representation $ψ_{c}^{★}$ calculated in (35) does not necessarily satisfy $Δ_{c} (ψ_{c}^{★}) \leq Δ_{c} (ψ_{c})$ . Here, Δ_c (ψ) is the contribution of ψ_c = ψ to the objective function,

$Δ_{c} (ψ) = \sum_{n = 1}^{N} \sum_{n^{'} = 1}^{N} ω_{n} ω_{{nn}^{'}} f_{nc}^{m} {p_{n}}^{m} u (φ (x_{n^{'}}, ψ)) .$ (36)

In order to address this issue, we propose that $ψ_{c}^{★}$ is only accepted if it produces a smaller value of Δ_c(·) compared to the value produced by the existing ψ_c. Otherwise, ψ_c will be fed to the next iteration as it is.
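The closed-form updates (33) and (32) can be sketched in Python as below. This is an illustration under stated assumptions: the function and variable names are hypothetical, `context_loss[n, c]` stands for the inner sum $\sum_{n^{'}} ω_{{nn}^{'}} u (φ (x_{n^{'}}, ψ_{c}))$, `context_mass[n]` stands for $\sum_{n^{'}} ω_{{nn}^{'}}$, and the loss values are random stand-ins rather than outputs of a real distance function.

```python
import numpy as np

def update_memberships(context_loss, m):
    """(33): f_nc is proportional to context_loss^(-1/(m-1)), normalized over the clusters."""
    powered = context_loss ** (-1.0 / (m - 1.0))
    return powered / powered.sum(axis=1, keepdims=True)

def update_inlier_probabilities(context_loss, f, context_mass, U, m):
    """(32): 1/p_n = 1 + C * (sum_c f_nc^m context_loss_nc / (U * context_mass_n))^(1/(m-1))."""
    C = f.shape[1]
    ratio = np.sum(f ** m * context_loss, axis=1) / (U * context_mass)
    return 1.0 / (1.0 + C * ratio ** (1.0 / (m - 1.0)))

rng = np.random.default_rng(1)
N, C, m, U = 5, 2, 2.0, 1.0
# Chain-shaped context: each datum is influenced by itself and its two neighbours.
omega_ctx = np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)
loss = rng.uniform(0.05, 0.95, size=(N, C))  # stand-in for u(phi(x_n, psi_c))
context_loss = omega_ctx @ loss
context_mass = omega_ctx.sum(axis=1)

f = update_memberships(context_loss, m)
p = update_inlier_probabilities(context_loss, f, context_mass, U, m)
```

Each row of `f` sums to one, in line with (2), and every `p[n]` lies in (0, 1]. The sketch assumes strictly positive `context_loss`; an implementation would need to guard against zeros.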

An alternative method is to utilize the technique developed by Weiszfeld [148] and Miehle [105, 134] (similar techniques are cited under different names as well [30, 90]). The Weiszfeld technique utilizes the fixed point method in order to solve (15) when φ(·) is not the Euclidean distance function (refer to [90] for details and to [39] for acceleration options). A weighted version of the Levenberg-Marquardt algorithm [106] may also be applicable for certain distance functions and loss functions.

3.4 Outlier removal

For certain classes of problems, it is not a requirement that every datum must be assigned to a cluster. In other words, the problem class under consideration may prefer a solution in which clusters are not unnecessarily “bloated” in order to include outliers. We satisfy the requirements of such problem classes through returning the C disjoint sets ${\tilde{X}}_{c}$ , c = 1, ⋯ , C, as the inliers, as well as X - $\tilde{X}$ as the set of outliers. This approach has precedent in the literature (for example refer to the use of the reject class in [110]).

We suggest utilizing a Maximum Likelihood inference framework and comparing the probability values corresponding to x_n ∈ ${\tilde{X}}_{c}$ , for c = 1, ⋯ , C, as well as the probability that x_n ∈ X - $\tilde{X}$ . Derivation shows that a datum belongs to one of the ${\tilde{X}}_{c}$ , and therefore is an inlier, when,

$p_{n} \max_{c} f_{nc} > 1 - p_{n} .$ (37)

Hence, if the problem is to be solved in inclusive mode, then x_n is assigned to cluster $c_{n} = \arg\max_{c} f_{nc}$ . In non-inclusive mode, however, c_n is not defined if x_n is not an inlier, i.e. if it does not satisfy (37). This strategy has similarities to Conditional Fuzzy C-means (CFCM) [94], but the method developed in this paper does not require the a priori knowledge needed in CFCM.
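The decision rule in (37) can be sketched as follows. The function name `assign_labels` and the use of -1 as the label for outliers are assumptions of this illustration; the text only states that c_n is undefined for outliers in non-inclusive mode.

```python
import numpy as np

def assign_labels(f, p, inclusive):
    """Return c_n = argmax_c f_nc; in non-inclusive mode outliers are labeled -1."""
    labels = np.argmax(f, axis=1)
    if not inclusive:
        is_inlier = p * f.max(axis=1) > 1.0 - p  # condition (37)
        labels = np.where(is_inlier, labels, -1)  # -1 is a hypothetical outlier marker
    return labels

# Illustrative memberships and inlier probabilities for three datums.
f = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
p = np.array([0.8, 0.6, 0.3])

inclusive_labels = assign_labels(f, p, inclusive=True)
non_inclusive_labels = assign_labels(f, p, inclusive=False)
```

In this example only the first datum satisfies (37), since 0.8 × 0.9 exceeds 0.2, while the other two fall short of their respective 1 - p_n values.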

3.5 Algorithm outline

This section presents an outline of the algorithm developed in this paper, as follows,

1. Generate the input set of datums X.

2. Call Ψ_∘(·) to produce ψ_c, for all c.

3. Calculate u (φ (x_n, ψ_c)), for all n and c.

4. Calculate f_nc, for all n and c, using (33).

5. Calculate p_n, for all n, using (32).

6. Calculate Δ using (27).

7. If change in Δ is negligible, exit the loop.

8. Update ψ_c, for all c, using (35).

9. Call external pruning process, if applicable.

10. Go to Line 3.

Note that, in Line 9, there is a place-holder for an external pruning process which may decide to remove clusters that it finds redundant or scarcely populated. In essence, in this context, any of the techniques listed in Section 2:validity can be utilized as external pruning procedures.
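The outline above can be sketched in Python for the simplest setting. This is not the authors' MATLAB/MEX implementation, and it rests on several assumptions that are ours: scalar datums with the squared difference as φ, the rational loss (16) so that U = 1, a dense N × N influence matrix, a candidate ψ_c computed as a weighted mean whose weights include u′ evaluated at the current distances (a W-estimator reading of the update step), the acceptance rule of Section 3.3, and no external pruning. The value of `lam` in the example is illustrative.

```python
import numpy as np

def cluster(x, omega, omega_ctx, psi0, lam, m=2.0, tol=1e-9, max_iter=200):
    """Alternating optimization of (27) for scalar datums with phi = squared difference."""
    u = lambda d: d / (lam + d)            # rational loss (16), hence U = 1
    du = lambda d: lam / (lam + d) ** 2    # derivative of the rational loss
    U, C = 1.0, len(psi0)
    mass = omega_ctx.sum(axis=1)           # sum over n' of omega_{nn'}
    psi = np.array(psi0, dtype=float)
    history = []
    for _ in range(max_iter):
        dist = (x[:, None] - psi[None, :]) ** 2
        ctx = np.maximum(omega_ctx @ u(dist), 1e-12)        # Line 3, guarded against zeros
        f = ctx ** (-1.0 / (m - 1.0))                       # Line 4, update (33)
        f /= f.sum(axis=1, keepdims=True)
        ratio = np.sum(f ** m * ctx, axis=1) / (U * mass)   # Line 5, update (32)
        p = 1.0 / (1.0 + C * ratio ** (1.0 / (m - 1.0)))
        # a[n', c]: total weight with which u(phi(x_n', psi_c)) enters (27).
        a = omega_ctx.T @ (omega[:, None] * (f * p[:, None]) ** m)
        delta = np.sum(a * u(dist)) + U * C ** (1.0 - m) * np.sum(omega * (1.0 - p) ** m * mass)
        history.append(float(delta))                        # Line 6
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break                                           # Line 7
        # Line 8: candidate from a weighted mean (assumed W-estimator weights),
        # accepted per cluster only if it lowers that cluster's contribution.
        w = a * du(dist)
        candidate = np.sum(w * x[:, None], axis=0) / np.sum(w, axis=0)
        old = np.sum(a * u(dist), axis=0)
        new = np.sum(a * u((x[:, None] - candidate[None, :]) ** 2), axis=0)
        psi = np.where(new < old, candidate, psi)
        # Line 9 (external pruning) is omitted in this sketch.
    return psi, f, p, history

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(50.0, 3.0, 20), rng.normal(200.0, 3.0, 20)])
x[10] = 125.0                              # one datum far from both levels
N = x.size
omega = np.ones(N)
omega_ctx = np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)   # 1-D chain neighbourhood
psi, f, p, history = cluster(x, omega, omega_ctx, psi0=[0.0, 255.0], lam=25.0 ** 2)
```

Because the membership and inlier-probability updates are exact minimizers of (27) for fixed clusters and the cluster update is only accepted when it lowers the cluster's contribution, the recorded values in `history` do not increase from one iteration to the next.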

3.6 Implementation notes

The developed algorithm is implemented as a class named Calista in MATLAB Version 8.1 (R2013a). It makes use of the Image Processing Toolbox for minor image-related primitives. The major operations in this class are implemented as C/MEX DLLs. The results carried in this paper are collected on Windows 7, 64 bit, on an Intel Core i5-2400 CPU, 3.10 GHz, with 8.00 GB of RAM.

In this work, each problem class is implemented as a child class of Calista. The child classes implement a constructor which creates the weighted set X based on the input image, data file, etc. The child classes also implement the three functions φ(·), Ψ_∘(·), and Ψ(·) and set the value of λ independent of X. The child classes are not responsible for any of the core operations of the developed algorithm. These operations are implemented in the parent class.

3.7 Complexity analysis

Analysis shows that the computational cost of the classical FCM can be estimated as,

$τ_{FCM} = O (INC (2 + φ + Ψ)) .$ (38)

Here, φ is the cost of executing φ(·) for one datum and one cluster representation and Ψ is the cost of executing Ψ(·) per input datum. In (38), I denotes the number of iterations required for convergence.

Similar derivations indicate that the computational complexity of the developed algorithm can be estimated as,

$τ = O (INC (11 F + u + u^{'} + φ + Ψ)) .$ (39)

In this equation, F denotes the area of the active kernel of $ω_{n_{1} n_{2}}$ and u and u′ are the costs of executing the corresponding functions for one scalar.

Substituting typical values in (38) and (39), we derive that the developed algorithm is expected to require about 10 times more computation than FCM when a 3 × 3 kernel is employed. With a 5 × 5 kernel this ratio rises to about 24.

4 Experimental results

In this section, we define three problem classes and then follow with the procedure required in order to deploy the developed method in the context of these problem classes. Here, we use the input sets of datums shown in Fig. 1.

4.1 Spatial context model

The problem classes utilized in this section all address datums in regularly sampled 2-D grids. Therefore, here, we define $ω_{n_{1} n_{2}}$ as a function of the Euclidean distance between the two points which correspond to $x_{n_{1}}$ and $x_{n_{2}}$ . We denote this distance as $d_{n_{1} n_{2}}$ and measure it in pixels. We then model the spatial context as a finite 2D kernel of size (2r + 1) × (2r + 1) and define $ω_{n_{1} n_{2}} = 0$ outside of this neighborhood of $x_{n_{1}}$ . Inside this square, we utilize,

$ω_{n_{1} n_{2}} = \frac{r^{2}}{r^{2} + d_{n_{1} n_{2}}^{2}} .$ (40)

In the experiments reviewed in this section, we always set r = 2.
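For illustration, the kernel defined in (40) can be tabulated in Python as below. The function name `spatial_kernel` is hypothetical; the kernel is indexed by the pixel offsets between the two datums.

```python
import numpy as np

def spatial_kernel(r):
    """(2r+1) x (2r+1) kernel of (40): r^2 / (r^2 + d^2), with d the pixel distance to the center."""
    offsets = np.arange(-r, r + 1)
    dy, dx = np.meshgrid(offsets, offsets, indexing="ij")
    return r ** 2 / (r ** 2 + dx ** 2 + dy ** 2)

kernel = spatial_kernel(2)  # r = 2, as in the experiments
```

For r = 2 this yields a 5 × 5 kernel whose center weight is one, whose direct neighbors carry a weight of 4/5, and whose corners carry a weight of 1/3.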

4.2 Problem classes

In this section, we review the utilization of the developed method in the context of three problem classes which address 1D, 3D, and 3D datums, as described below.

4.2.1 Grayscale image multi-level thresholding

The problem of grayscale image multi-level thresholding defines datums as grayscale values and models a cluster as an interval on the grayscale axis centered at the scalar ψ_c. The reader is referred to [21] for a review of the approaches to image segmentation with emphasis on Magnetic Resonance Images.

In order to produce the datums, we downsample the input image at the scale of 4 and produce the sample average and the sample standard deviation for each block. We address these entities as η_n and σ_n, respectively. Then, we utilize x_n = η_n and calculate ω_n as,

$ω_{n} = \frac{σ}{σ + σ_{n}} .$ (41)

In the experiments, we use σ = 20 gray levels.
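The construction of the datums and their weights can be sketched as follows. The function name `block_datums`, the cropping of the image to a multiple of the block size, and the use of the unbiased (`ddof=1`) standard deviation are assumptions of this sketch; the text only specifies block-wise sample statistics and (41).

```python
import numpy as np

def block_datums(image, scale=4, sigma=20.0):
    """Return the block means (x_n) and the weights (omega_n) of (41) for a grayscale image."""
    h, w = image.shape
    h, w = h - h % scale, w - w % scale  # crop to a multiple of the block size (assumption)
    blocks = image[:h, :w].astype(float).reshape(h // scale, scale, w // scale, scale)
    eta = blocks.mean(axis=(1, 3))               # sample average per block
    sigma_n = blocks.std(axis=(1, 3), ddof=1)    # sample standard deviation per block
    return eta, sigma / (sigma + sigma_n)

# A flat 8 x 8 image: every block has zero deviation and therefore unit weight.
flat = np.full((8, 8), 100.0)
eta_flat, omega_flat = block_datums(flat)
```

Blocks with a large internal deviation, for example blocks which straddle an edge, receive weights well below one and therefore contribute less to the clustering.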

In this problem class, the distance between a datum and a cluster is defined as the squared difference and the initial clusters are defined as uniformly-distributed points in the working range. The cluster fitting function in this problem class calculates the weighted mean of the input datums. This problem utilizes the scale of 25 gray levels and is defined as inclusive. The input images used in the experiments carried in this section are at the resolution of 512 × 512 pixels.

4.2.2 Plane finding in range data

The input data in this problem class contains 3D points captured by a Kinect 2 sensor. The depth-maps used in these experiments are captured at the resolution of 424 × 512 pixels. Here, intrinsic parameters of the camera are acquired through the Kinect SDK and each datum in this problem class has the weight of one.

Clusters in this problem class are defined as planes which have a thickness, i.e. ${\vec{ψ}}_{c}$ is a vector in $ℝ^{3}$ . Here, we model a plane using the following mathematical representation,

${\vec{ψ}}^{T} \vec{x} = {∥ \vec{ψ} ∥}^{2} .$ (42)

Moreover, the distance between ${\vec{x}}_{n}$ and ${\vec{ψ}}_{c}$ is defined as follows,

$φ ({\vec{x}}_{n}, {\vec{ψ}}_{c}) = \frac{1}{{∥ {\vec{ψ}}_{c} ∥}^{2}} {[{\vec{ψ}}_{c}^{T} {\vec{x}}_{n} - {∥ {\vec{ψ}}_{c} ∥}^{2}]}^{2} .$ (43)

This model is similar to the distance function employed in the FCV algorithm [95].
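As a small numerical check of (43), the sketch below evaluates the distance for a batch of 3D points. The function name `plane_distance` is hypothetical. Under the representation in (42), the plane passes through the point ψ and has ψ as its normal, so (43) is the squared point-to-plane distance.

```python
import numpy as np

def plane_distance(x, psi):
    """phi of (43): squared distance from the points x (N, 3) to the plane psi^T x = ||psi||^2."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    psi = np.asarray(psi, dtype=float)
    norm_sq = float(psi @ psi)
    return (x @ psi - norm_sq) ** 2 / norm_sq

psi = np.array([0.0, 0.0, 2.0])                    # the plane z = 2
points = np.array([[5.0, -3.0, 2.0], [1.0, 1.0, 5.0]])
distances = plane_distance(points, psi)
```

The first point lies on the plane and yields zero, while the second point is three units above it and yields nine.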

The initial set of clusters in this problem class are very rough estimates of the three ideal planes which are expected to exist in the scene. Here, each plane passes through the corresponding point at the tenth percentile of the marginal distribution of the points on the corresponding axis and is parallel to the two other axes. The cluster fitting function, Ψ(·), for this problem class fits a plane to a number of 3-D weighted points using a weighted variant of Singular Value Decomposition (SVD). This problem class utilizes a scale of 200 millimeters and is defined as non-inclusive.

4.2.3 Color image segmentation

The appropriateness of a Euclidean model for color homogeneity has been disputed in the literature. Klinker et al. [80] presented a new approach to measuring highlights from an arbitrary point of a dielectric object in 1988. Then, in 1990, they applied their theory to color image understanding [81]. About a decade later, Cheng et al. [26] used Principal Component Analysis (PCA) for color image processing. Then, in 2004, Nikolaev et al. [108] provided theoretical evidence for the appropriateness of linear models of color homogeneity in natural images. Later in 2008, Abadpour et al. [2] utilized a resulting cylindrical cluster model for color image segmentation for the purposes of compression and watermarking. Here, we utilize this model using the clustering algorithm developed in this paper in order to perform color image segmentation.

In order to produce the datums, we downsample the input image at the scale of 4 and utilize the same framework as (41) in order to produce the weights. Note that in this problem class we utilize 3σ instead of σ.

A cluster in this problem class is defined as the mean vector ${\vec{η}}_{c}$ and the principal vector ${\vec{v}}_{c}$ , i.e. $ψ_{c} = ({\vec{η}}_{c}, {\vec{v}}_{c})$ . In this problem class, distance between ${\vec{x}}_{n}$ and ψ_c is defined as [1],

$φ ({\vec{x}}_{n}, ψ_{c}) = {∥ ({\vec{x}}_{n} - {\vec{η}}_{c}) - {\vec{v}}_{c}^{T} ({\vec{x}}_{n} - {\vec{η}}_{c}) {\vec{v}}_{c} ∥}^{2} .$ (44)

The cluster fitting function in this problem class is calculated using Fuzzy PCA (FPCA) [31, 153]. Hence, $Ψ (X) = (\vec{η}; \vec{v})$ , where $\vec{η}$ is the sample mean of X and $\vec{v}$ is the eigenvector of

$C = \sum_{n = 1}^{N} ω_{n} ({\vec{x}}_{n} - \vec{η}) {({\vec{x}}_{n} - \vec{η})}^{T}$ (45)

corresponding to the largest eigenvalue.
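The distance (44) and the fitting step built on (45) can be sketched in Python as follows. The function names are hypothetical, the sample mean of the weighted set is read here as the weighted mean, and the eigen-decomposition is delegated to `numpy.linalg.eigh`; none of this is claimed to match the FPCA implementations cited above.

```python
import numpy as np

def line_distance(x, eta, v):
    """phi of (44): squared distance from x (N, 3) to the line through eta along the unit vector v."""
    centered = np.atleast_2d(np.asarray(x, dtype=float)) - eta
    residual = centered - np.outer(centered @ v, v)
    return np.sum(residual ** 2, axis=1)

def fit_line(weights, x):
    """Weighted mean and principal eigenvector of the scatter matrix in (45)."""
    weights = np.asarray(weights, dtype=float)
    x = np.asarray(x, dtype=float)
    eta = weights @ x / weights.sum()            # weighted mean (our reading of the sample mean)
    centered = x - eta
    scatter = (weights[:, None] * centered).T @ centered
    eigenvalues, eigenvectors = np.linalg.eigh(scatter)  # eigenvalues in ascending order
    return eta, eigenvectors[:, -1]

# Illustrative colors lying exactly on one line in RGB space.
direction = np.array([1.0, 2.0, 2.0]) / 3.0
t = np.linspace(-20.0, 20.0, 9)
colors = np.array([10.0, 20.0, 30.0]) + t[:, None] * direction
eta, v = fit_line(np.ones(t.size), colors)
residuals = line_distance(colors, eta, v)
```

For datums that lie on a single line, the fitted principal vector is parallel to that line (up to sign) and all distances vanish up to rounding.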

The clusters in this problem class are initialized to the PCA representations of a maximum of 6 homogeneous 256 × 256 colored patches. These patches are extracted from images found on the web through searching for terms such as Red, Green, and Blue. This problem class utilizes a scale of 25 gray levels and it is solved as inclusive.

4.3 Comparative results

4.3.1 Grayscale image multi-level thresholding

Figure 2 compares the outputs of classical FCM and the developed algorithm on the input image shown in Fig. 1(a). Here, C equals 2. In this figure, the bottom row graphs present the input data as well as the output clusters and also membership of datums to the clusters in the histogram domain. In fact, in these visualizations, the horizontal axis identifies the datums and the gray curve denotes the weights, i.e. the histogram of the downscaled version of the input image. Here, each set of colored points identifies membership to one cluster, i.e. p_n f_nc, and black points denote values of p_n.

Note that, as expected from the model, p_n is always one in FCM. Moreover, the x_n–p_n relationship in the developed algorithm is not one-to-one. In other words, in the developed algorithm, the same grayscale value may be considered an inlier with different probabilities based on its context. Hence, the black set of points in Figure 2(b-bottom) do not constitute a single-valued curve. Similarly, the x_n–f_nc relationship is not one-to-one for the developed algorithm either. Thus, while in FCM the membership representation for each cluster is a single-valued curve, it is in fact a cloud of points for the developed algorithm. This observation also exhibits that in the developed algorithm membership to clusters depends on the context of the datum.

We note that in the outputs of FCM, shown in Figure 2(a-bottom), the membership curve for the rightmost cluster is at over 0.5 at the end of the range. This is essentially because the distances between these datums and both clusters are significantly large. We compare this unwanted situation with the geometry of the output of the developed method, in which, as observed in Figure 2(b-bottom), at a distance from any cluster it is not significantly important how far the datum is from that particular cluster. In other words, as seen in the left side of Figure 2(b-bottom) for x_n ≤ 40, membership to both clusters drops without requiring a competing cluster. In FCM, however, a cluster centered between 60 and 80 assigns high membership values to x_n values below 10.

Figure 2 also compares the output of FCM with that of the developed algorithm in the image domain. Here, in the top row, in order to visualize the classification results, x_n is replaced with $ψ_{c_{n}}$ . We observe that the developed algorithm produces stronger spatial contiguity. This is essentially the main goal of the approaches which perform post- or pre-processing operations in order to increase correlation of the clustering information for adjacent datums. The developed algorithm achieves this goal using the datum-to-datum relationship model described in Section 3.1. In fact, we observe that compared to Figure 2(a-top), there are fewer isolated pixels in Figure 2(b-top) and that the boundaries of the clusters in Figure 2(b-top) are smoother than those in Figure 2(a-top).

It took FCM and the proposed method 30 milliseconds and 250 milliseconds to produce the results shown in Figure 2(a) and (b), respectively.

Figure 3 shows another pair of results for grayscale image multi-level thresholding produced by FCM and the proposed method on the image shown in Figure 1(b) for C = 5 clusters. We observe in Figure 3 that, similar to the observations made in Figure 2, FCM clusters are wider while the proposed method produces more specific clusters. This point can be observed, for example, at the two ends of the spectrum in Figure 3(a-bottom), where the FCM clusters assign membership values over half to x_n = 0 and x_n = 255. In contrast, the membership of these datums to the clusters in the output of the developed method falls to close to zero, as seen in Figure 3(b-bottom). We also emphasize the higher spatial contiguity of the result shown in Figure 3(b-top) compared to the one shown in Figure 3(a-top). In this experiment, FCM converged after 91 milliseconds while the proposed method required 811 milliseconds to converge.

4.3.2 Plane finding in range data

Figure 4 shows the output of FCM and the developed algorithm for the input shown in Figure 1(c). This scene contains a room in which the floor and two walls are visible. Therefore, the purpose of this experiment is to examine whether the two algorithms can successfully recognize these three planes.

The top images in Figure 4 denote the classification of the datums to the detected clusters. Here, datums are painted according to the cluster they are assigned to. Outlier datums, i.e. datums which are found to not belong to any of the clusters, are painted in black. As expected from the models, in the FCM results every datum is assigned to a cluster. The proposed method, however, labels some of the datums as outliers. These datums are the ones which belong to other objects in the scene as well as to areas in which the inherent geometrical distortions caused by the sensor push datums unacceptably far from the planes.
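The difference between the two kinds of classification maps can be sketched as follows. This is a minimal illustration under stated assumptions, not the decision rule of the developed method: the function name `classify`, the `OUTLIER` label, the per-datum inlier estimate, and the 0.5 threshold are all hypothetical.

```python
OUTLIER = -1  # hypothetical label for datums painted in black


def classify(memberships, inlier_estimate, threshold=0.5):
    """Return the index of the strongest cluster, or OUTLIER.

    Assumption: a datum is treated as an outlier when its inlier
    estimate falls below the threshold. A method without an inlier
    estimate, such as FCM, corresponds to inlier_estimate = 1.0.
    """
    if inlier_estimate < threshold:
        return OUTLIER
    return max(range(len(memberships)), key=lambda c: memberships[c])


# Three hypothetical datums: (memberships to three planes, inlier estimate).
datums = [
    ([0.70, 0.20, 0.10], 0.9),
    ([0.40, 0.35, 0.25], 0.1),
    ([0.10, 0.10, 0.80], 0.6),
]
labels = [classify(u, p) for u, p in datums]
forced = [classify(u, 1.0) for u, _ in datums]
print(labels)
print(forced)
```

With the inlier estimate fixed at one, every datum receives a cluster label, which mirrors the FCM behavior described above; with a per-datum estimate, the second datum is set aside as an outlier.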

The bottom images in Figure 4 show the relative positions of the converged clusters. Here, we observe that both algorithms have successfully detected the floor and the two walls. Nevertheless, examining the top images in Figure 4, we observe higher spatial contiguity and smoother edges for the clusters generated by the proposed method compared to the ones produced by FCM.

In this experiment, FCM converged after 2,194 milliseconds, whereas the proposed method required 19,526 milliseconds to converge. We observe that while the proposed method is computationally more expensive, as predicted in Section 3.7, the ratio between the computational costs of the two algorithms is in fact smaller than the estimate produced in Section 3.7.

The results shown in Figure 4 indicate a situation in which the proposed method and FCM perform similarly, with the added bonus that the proposed method performs outlier detection while the latter does not. The comparison shown in Figure 5, however, exhibits another case in which the clustering results generated by the two algorithms are in fact significantly different. This data belongs to the same room utilized in Figure 4 with the difference that the camera is moved to another corner of the room and three human bodies are added to the scene. In other words, the major difference between the data utilized in Figure 4 and the one used in Figure 5 is a visible increase in the portion of the data which does not belong to any of the clusters.

Under these conditions, as seen in the bottom row of Figure 5(a), FCM fails to produce meaningful clusters. In fact, we observe that the outliers in the scene pull the clusters into incorrect positions. This is also evident in the top image in Figure 5(a), i.e. the output of FCM. The proposed method, however, as seen in Figure 5(b), produces spatially contiguous clusters which also correctly correspond to the actual planes present in the scene.

4.3.3 Color image segmentation

Finally, we present two pairs of results generated by FCM and the proposed algorithm for color image segmentation. These results correspond to the two input images shown in Figure 1 and (f). As seen in Figure 6 and Figure 7, however, it is harder to assess the appropriateness of clustering results for color data than it was for the two other problem classes discussed in this section. Nevertheless, we observe spatial contiguity in the outputs of the proposed method, while the FCM results, as expected from the mathematical model, lack spatial context and contiguity.

4.4 Complexity analysis

Table 1 lists the elapsed times corresponding to the experimental results exhibited in this section. We observe that the computational cost of the proposed method is always less than 10 times that of FCM. This empirical value must be compared with the estimate given in Section 3.7, which predicts that the proposed method is 24 times more expensive than FCM.
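As a quick check, the cost ratio can be recomputed from the elapsed times quoted in the text of Section 4.3. The sketch below uses only the three experiments whose timings appear in the prose (Figures 2, 3 and 4); the dictionary layout and variable names are illustrative.

```python
# (FCM time, proposed-method time) in milliseconds, as quoted in Section 4.3.
timings_ms = {
    "Figure 2": (30, 250),
    "Figure 3": (91, 811),
    "Figure 4": (2194, 19526),
}

THEORETICAL_RATIO = 24  # estimate attributed to Section 3.7

ratios = {name: proposed / fcm for name, (fcm, proposed) in timings_ms.items()}
for name, ratio in ratios.items():
    print(f"{name}: proposed/FCM = {ratio:.2f}")
print("largest observed ratio:", round(max(ratios.values()), 2))
```

For these three experiments the ratio stays between 8 and 9, below both the empirical bound of 10 and the theoretical estimate of 24.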

In fact, we repeatedly executed the clustering procedures, both FCM and the proposed method, for the purpose of grayscale image multi-level thresholding on 16 samples from the USC-SIPI database for 9 different values of C, between 2 and 10, and recorded the results. We observed that FCM on average required 557.17 milliseconds to converge while the proposed method converged on average after 2,854.32 milliseconds. Thus, the computational cost of the proposed method is on average 9.38 times that of FCM for this problem class.

We also utilized 7 samples and examined the plane finding in range data problem class. For this problem class, FCM on average converged after 4,430.14 milliseconds while the proposed method on average required 18,490.29 milliseconds to converge. Hence, the proposed method is on average 4.69 times more computationally expensive for this problem class.

Finally, we carried out a similar investigation for the color image segmentation problem class. In this investigation, we utilized 8 samples from the USC-SIPI database and the same range of C investigated for the grayscale image multi-level thresholding problem class. We observed that FCM and the proposed method on average converged after 490.67 and 1,925.56 milliseconds, respectively. Hence, for this problem class, the computational cost of the proposed method is on average about 4.80 times that of FCM.

These investigations verify the finding, based on the data presented in Table 1, that the developed method is less than 10 times more computationally expensive than FCM.
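The averages reported above can be cross-checked with a short sketch. It divides the average elapsed times of each problem class and prints the result next to the reported average ratio. The quotient of the averages is lower than the reported ratio in all three classes; one possible reading, which is an assumption and is not stated in the text, is that the reported ratios were computed per run and then averaged, which in general differs from the quotient of the averages.

```python
# (average FCM time in ms, average proposed-method time in ms, reported ratio)
reported = {
    "grayscale thresholding": (557.17, 2854.32, 9.38),
    "plane finding": (4430.14, 18490.29, 4.69),
    "color segmentation": (490.67, 1925.56, 4.80),
}

quotients = {}
for name, (fcm_ms, proposed_ms, mean_ratio) in reported.items():
    quotients[name] = proposed_ms / fcm_ms
    print(f"{name}: quotient of averages = {quotients[name]:.2f}, "
          f"reported average ratio = {mean_ratio:.2f}")
```

Under either reading, the values remain below 10 for every problem class.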

4.5 Spatial contiguity

Figure 8 shows magnified sections of the outputs of FCM and the proposed method for some of the experiments discussed in this section. The three pairs visualized in this figure each present classification results corresponding to one of the problem classes utilized in this section. Here, for each problem class, FCM and the proposed method are applied on the same set of input datums and the same sections of the output are magnified.

We observe that, independent of the specifics of the problem class, the proposed method produces spatially contiguous classification results. In the output of FCM, however, there exist datums which are assigned to clusters to which none of their neighbors are assigned. In other words, as expected from the mathematical models, FCM disregards the expected correlation in the classification results and assigns datums to clusters with no regard for the spatial context in which the datums exist. The proposed method, however, produces patches of datums which are assigned to the same cluster and, therefore, generates smoother edges between the clusters.
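The notion of an isolated datum used above can be made concrete with a small counting routine. This is an illustrative sketch and not a measure used in the paper: the function name `count_isolated`, the choice of a 4-connected neighborhood, and the two toy label maps are assumptions.

```python
def count_isolated(labels):
    """Count datums none of whose 4-connected neighbors share their label."""
    rows, cols = len(labels), len(labels[0])
    isolated = 0
    for r in range(rows):
        for c in range(cols):
            neighbors = [
                labels[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            if neighbors and labels[r][c] not in neighbors:
                isolated += 1
    return isolated


# Hypothetical label maps: the first contains one stray datum at row 1, column 1.
noisy = [
    [0, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
smooth = [
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(count_isolated(noisy), count_isolated(smooth))
```

A lower count on the same input indicates a more spatially contiguous classification in the sense discussed in this section.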

5 Conclusions

In this work, we investigated a generic fuzzy-possibilistic clustering problem in which datums of an arbitrary mathematical model are classified into a number of clusters of an arbitrary model. This framework addresses the general case in which, due to the physical properties of the datums and the clusters, a particular notion of homogeneity is applicable to the problem class at hand. We showed that datum and cluster models and the notion of homogeneity can be abstracted out of the loss model, which we derived using Bayesian inference. As a result, we developed a loss model which employs a robust loss function and utilizes both the concept of fuzzy/probabilistic membership and the possibilistic estimate that a datum is in fact an inlier.

A key contribution of this paper is the incorporation of spatial context into the clustering process. We argued that while a significant majority of the works in the literature are indifferent to the order of the input datums, the input to any clustering problem is inherently correlated in the spatial domain. When algorithms ignore this relationship between the datums, however, they fail to produce spatially contiguous classification results. Additionally, spatial context is an important property of a datum in the presence of noise and when clusters meet. Thus, we modeled the loss for a datum as a function of the loss for the datums in its proximity. Then, we utilized a generic concept of datum-to-datum correlation and derived the loss function for the clustering problem. Subsequently, we developed a solution strategy for the developed clustering model.

We emphasize that the process developed in this paper avoids heuristically engineered terms which may be intuitively believed to lead to particular types of output, as is generally practiced in the literature. In fact, in this paper, we based the entire model on direct derivation of loss using a construction process and avoided any parameters which may have to be set by the user or through other processes.

We used problem instances in three different problem classes and described the process of adopting the developed algorithm in each one. We exhibited experimental results and compared the outputs of the developed algorithm with those of FCM. We showed that the developed method is successful in performing outlier detection as well as finding the clusters present in the data. We also demonstrated the spatial contiguity of the classification results generated by the developed algorithm. While in theory the developed algorithm is estimated to be 24 times more computationally expensive than FCM, we observe that in practice its computational cost is less than 10 times that of FCM.

Acknowledgments

The author wishes to thank the management of Epson Edge for their help and support during the course of this research. We thank Masoud Mazloom and Bahar T. for their help in locating critical pieces of literature needed for this work. We wish to thank Mahsa Pezeshki for proofreading this manuscript.

References

Abadpour

and Kasaei

, Principal color and its application to color image segmentation, Scientia Iranica 15(2) (2008), 238–245.

Abadpour

and Kasaei

, Color PCA eigenimages and their application to compression and watermarking, IEE Image & Vision Computing 26(7) (2008), 878–890.

Abadpour

, Alfa

A.S.

and Diamond

, Video-on-demand network design and maintenance using fuzzy optimization, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 38(2) (2008), 404–420.

Acton

S.T.

and Mukherjee

D.P.

, Scale space classification using area morphology, IEEE Transactions on Image Processing 9(4) (2000), 623–635.

Ahmed

M.N.

, Yamany

S.M.

, Mohamed

, Farag

A.A.

and Moriarty

, A modified fuzzy C-means algorithm for bias field estimation and segmentation of MRI data, IEEE Transactions on Medical Imaging 21(3) (2002), 193–199.

Ambroise

and Govaert

, Convergence of an EM-type algorithm for spatial clustering, Pattern Recognition Letters 19(10) (1998), 919–927.

Ball

G.H.

and Hall

D.J.

, A clustering technique for summarizing multivariate data, Behavioral Science 12(2) (1967), 153–155.

Baraldi

and Blonda

, A survey of fuzzy clustering algorithms for pattern recognition. I, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 29(6) (1999), 778–785.

Baraldi

and Blonda

, A survey of fuzzy clustering algorithms for pattern recognition. II, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 29(6) (1999), 786–801.

10.

Barni

, Cappellini

and Mecocci

, Comments on “A possibilistic approach to clustering”, IEEE Transactions on Fuzzy Systems 4(3) (1996), 393–396.

11.

Beaton

A.E.

and Tukey

J.W.

, The fitting of power series, meaning polynomials, illustrated on band-spectroscopic data, Technometrics 16 (1974), 147–185.

12.

Beni

and Liu

, A least biased fuzzy clustering method, IEEE Transactions on Pattern Analysis and Machine Intelligence 16(9) (1994), 954–960.

13.

Bensaid

A.M.

, Hall

L.O.

, Bezdek

J.C.

and Clarke

L.P.

, Partially supervised clustering for image segmentation, Pattern Recognition 29(5) (1996), 859–871.

14.

Besag

, On the statistical analysis of ditty pictures, Journal of the Royal Statistical Society Series B (Methodological) 48(3) (1986), 259–302.

15.

Bezdek

J.C.

, A physical interpretation of fuzzy ISODATA, IEEE Transactions on Systems, Man and Cybernetics SMC-6(5) (1976), 387–389.

16.

Bezdek

J.C.

, Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York, 1981.

17.

Bezdek

J.C.

and Pal

N.R.

, Some new indexes of cluster validity, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 28(3) (1998), 301–315.

18.

Bezdek

J.C.

, Keller

, Krishnapuram

and Pal

N.R.

, Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. Kluwer Academic Publishers, Boston, 1999.

19.

Bobrowski

and Bezdek

J.C.

, C-means clustering with the ℓ₁ and ℓ_∞ norms, IEEE Transactions on Systems, Man, and Cybernetics 21(3) (1991), 545–554.

20.

Boudraa

A.-E.-O.

, Automated detection of the left ventricular region in magnetic resonance images by fuzzy C-means model, The International Journal of Cardiac Imaging 13(4) (1997), 347–355.

21.

Bezdek

J.C.

, Hall

and Clarke

, Review of MR image segmentation techniques using pattern recognition, Medical Physics 20(4) (1993), 1033–1048.

22.

Cai

, Chen

and Zhang

, Fast and robust fuzzy Cmeans clustering algorithms incorporating local information for image segmentation, Pattern Recognition 40(3) (2007), 825–838.

23.

Chen

J.-L.

and Wang

J.-H.

, A new robust clustering algorithm-density-weighted fuzzy C-means, In Proceedings of IEEE International Conference on Systems, Man, and Cybernetics (SMC 1999) 3 (1999), 90–94.

24.

Chen

, Chen

and Lu

, A multiple-kernel fuzzy Cmeans algorithm for image segmentation, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 41(5) (2011), 1263–1274.

25.

Chen

and Zhang

, Robust image segmentation using FCM with spatial constraints based on new kernel-induced distance measure, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34(4) (2004), 1907–1916.

26.

Cheng

S-C

and Hsia

S-C

, Fast algorithm’s for color image processing by principal component analysis, Journal of Visual Communication and Image Representation 14 (2003), 184–203.

27.

Chintalapudi

K.K.

and Kam

, A noise-resistant fuzzy C means algorithm for clustering, In Proceedings of IEEE World Congress on Computational Intelligence 2 (1998), 1458–1463.

28.

Chintalapudi

K.K.

and Kam

, The credibilistic fuzzy Cmeans clustering algorithm, In IEEE International Conference on Systems, Man, and Cybernetics (SMC 1998) 2 (1998), 2034–2039.

29.

Chuang

K.-S.

, Tzeng

H.-L.

, Chen

, Wu

and Chen

T.-J.

, Fuzzy C-means clustering with spatial information for image segmentation, Computerized Medical Imaging and Graphics 30(1) (2006), 9–15.

30.

Cooper

, Location-allocation problems, Operation Research 11 (1963), 331–343.

31.

Cundari

, Sarbu

and Pop

H.F.

, Robust fuzzy principal component analysis (FPCA), A comparative study concerning interaction of carbon-hydrogen bonds with molybdenumoxo bonds, Journal of Chemical Information and Computer Sciences 42(6) (2002), 1363–1369.

32.

Dave

and Sen

, On generalising the noise clustering algorithms, In Proceedings of the 7th IFSA World Congress (IFSA 1997), 1997, pp. 205–210.

33.

Dave

R.N.

, Characterization and detection of noise in clustering, Pattern Recognition Letters 12(11) (1991), 657–664.

34.

Dave

R.N.

, Robust fuzzy clustering algorithms, In Second IEEE International Conference on Fuzzy Systems 2 (1993), 1281–1286.

35.

Dave

R.N.

and Fu

, Robust shape detection using fuzzy clustering: Practical applications, Fuzzy Sets and Systems 65(2-3) (1994), 161–185.

36.

Dave

R.N.

and Krishnapuram

, Robust clustering methods: A unified view, IEEE Transactions on Fuzzy Systems 5(2) (1997), 270–293.

37.

Derin

and Elliott

, Modeling and segmentation of noisy and textured images using Gibbs random fields, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI- 9(1) (1987), 39–55.

38.

Despotovic

, Vansteenkiste

and Philips

, Spatially coherent fuzzy clustering for accurate and noise-robust image segmentation, IEEE Signal Processing Letters 20(4) (2013), 295–298.

39.

Drezner

, A note on accelerating the Weiszfeld procedure, Location Science 3 (1995), 275–279.

40.

Duda

and Hart

, Pattern Classification and Scene Analysis, Wiley, New York, 1973.

41.

Dunn

, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, Journal of Cybernetics 3(3) (1973), 32–57.

42.

D’Urso

, Fuzzy clustering of fuzzy data, In de Oliveira

J.V.

and Pedrycz

, editors, Advances in Fuzzy Clustering and its Applications, Wiley, England, 2007, pp. 155–192.

43.

D’Urso

and Giovanni

L.D.

, Robust clustering of imprecise data, Chemometrics and Intelligent Laboratory Systems 136 (2014), 58–80.

44.

Dutter

, Numerical solution of robust regression problems: Computational aspects, a comparison, Journal of Statistical Computation and Simulation 5(3) (1977), 207–238.

45.

Eklundh

J.O.

, Yamamoto

and Rosenfeld

, A relaxation method for multispectral pixel classification, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-2(1) (1980), 72–75.

46.

Frigui

and Krishnapuram

, A comparison of fuzzy shellclustering methods for the detection of ellipses, IEEE Transactions on Fuzzy Systems 4(2) (1996), 193–199.

47.

Frigui

and Krishnapuram

, A robust algorithm for automatic extraction of an unknown number of clusters from noisy data, Pattern Recognition Letters 17(12) (1996), 1223–1232.

48.

Frigui

and Krishnapuram

, Clustering by competitive agglomeration, Pattern Recognition 30(7) (1997), 1109–1119.

49.

Frigui

and Krishnapuram

, A robust competitive clustering algorithm with applications in computer vision, IEEE Transactions on Pattern Analysis and Machine Intelligence 21(5) (1999), 450–465.

50.

Gath

and Geva

, Unsupervised optimal fuzzy clustering, IEEE Transaction on Pattern Analysis Machine Intelligence 11(7) (1989), 773–781.

51.

Geman

and Geman

, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI- 6(6) (1984), 721–741.

52.

Girolami

, Mercer kernel-based clustering in feature space, IEEE Transactions on Neural Networks 13(3) (2002), 780–784.

53.

Gong

, Zhou

and Ma

, Change detection in synthetic aperture radar images based on image fusion and fuzzy clustering, IEEE Transactions on Image Processing 21(4) (2012), 2141–2151.

54.

Gong

, Liang

, Shi

, Ma

and Ma

, Fuzzy C-means clustering with local information and kernel metric for image segmentation, Image Processing, IEEE Transactions on 22(2) (2013), 573–584.

55.

Gray

and Linde

, Vector quantizers and predictive quantizers for Gauss-Markov sources, IEEE Transactions on Communications 30(2) (1982), 381–389.

56.

Gruijter

J.J.D.

and McBratney

A.B.

, A modified fuzzy Kmeans method for predictive classification, In Bock

H.H.

, editor, Classification and Related Methods of Data Analysis, Elsevier, Amsterdam, The Netherlands, 1988, pp. 97–104.

57.

Gustafson

D.E.

and Kessel

W.C.

, Fuzzy clustering with a fuzzy covariance matrix, In IEEE Conference on Decision and Control including the 17th Symposium on Adaptive Processes, volume 17, San Diego, CA, 1979, pp. 761–766.

58.

Hadjahmadi

A.H.

, Homayounpour

M.M.

and Ahadi

S.M.

, Bilateral weighted fuzzy C-means clustering, Iranian Journal of Electrical & Electronic Engineering 8 (2012), 108–121.

59.

Hampel

F.R.

, Ponchotti

E.M.

, Rousseeuw

P.J.

and Stahel

W.A.

, Robust Statistics: The Approach based on Influence Functions. Wiley, New York, 2005.

60.

Hathaway

R.J.

and Bezdek

J.C.

, NERF C-means: Non- Euclidean relational fuzzy clustering, Pattern Recognition 27(3) (1994), 429–437.

61.

Hathaway

R.J.

and Bezdek

J.C.

, Optimization of clustering criteria by reformulation, IEEE Transactions on Fuzzy Systems 3 (1995), 241–246.

62.

Hathaway

R.J.

and Hu

, Density-weighted fuzzy C-means clustering, IEEE Transactions on Fuzzy Systems 17(1) (2009), 243–252.

63.

Hathaway

R.J.

, Davenport

J.W.

and Bezdek

J.C.

, Relational duals of the C-means clustering algorithms, Pattern Recognition 22(2) (1989), 205–212.

64.

Hathaway

R.J.

, Bezdek

J.C.

and Hu

, Generalized fuzzy C-means clustering strategies usingnorm distances, IEEE Transactions on Fuzzy Systems 8(5) (2000), 576–582.

65.

Holland

P.W.

and Welsch

R.E.

, Robust regression using iteratively reweighted least squares, Communication Statistics - Theory and Methods A6(9) (1977), 813–827.

66.

Honda

, Sugiura

and Ichihashi

, Fuzzy PCA-guided robust k-means clustering, IEEE Transactions on Fuzzy Systems 18(1) (2010), 67–79.

67.

Hsiao

J.Y.

and Sawchuk

A.A.

, Supervised textured image segmentation using feature smoothing and probabilistic relaxation techniques, IEEE Transactions on Pattern Analysis and Machine Intelligence 11(12) (1989), 1279–1292.

68.

Hsiao

J.Y.

and Sawchuk

A.A.

, Unsupervised textured image segmentation using feature smoothing and probabilistic relaxation techniques, Computer Vision, Graphics, and Image Processing 48(1) (1989), 1–21.

69.

Huber

P.J.

and Ronchetti

, Robust Statistics. Wiley, New York, 2009.

70.

Hung

C.-C.

, Kulkarni

and Kuo

B-C

, A new weighted fuzzy C-means clustering algorithm for remotely sensed image classification, IEEE Journal of Selected Topics in Signal Processing 5(3) (2011), 543–553.

71.

Jain

A.K.

and Dubes

R.C.

, Algorithms for Clustering Data. Prentice-Hall, 1981.

72.

Jajuga

, L₁-norm based fuzzy clustering, Fuzzy Sets and Systems 39(1) (1991), 43–50.

73.

Jolion

J.M.

, Meer

and Bataouche

, Robust clustering with applications in computer vision, IEEE Transactions on Pattern Analysis and Machine Intelligence 13(8) (1991), 791–802.

74.

Kanungo

, Mount

D.M.

, Netanyahu

N.S.

, Piatko

C.D.

, Silverman

and Wu

A.Y.

, An efficient k-means clustering algorithm: Analysis and implementation, IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7) (2002), 881–892.

75.

Karayiannisa

N.B.

and Randolph-Gips

M.M.

, Non-Euclidean C-means clustering algorithms, Intelligent Data Analysis 7 (2003), 405–425.

76.

Keller

, Fuzzy clustering with outliers, In Proceesings of the 19th International Conference of the North American Fuzzy Information Processing Society (NAFIPS 2000), 2000, pp. 143–147.

77.

Klawonn

, Fuzzy clustering: Insights and a new approach, Mathware and soft Computing 11 (2004), 125–142.

78.

Klawonn

and Hoppner

, What is fuzzy about fuzzy clustering? Understanding and improving the concept of the fuzzifier. In Berthold

M.R.

, Lenz

H.-J.

, Bradley

, Kruse

and Borgelt

, editors, Advances in Intelligent Data Analysis V,volume 2810 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, 2003, pp. 254–264.

79.

Klawonn

, Kruse

and Timm

, Fuzzy shell cluster analysis. In della Riccia

, Lenz

and Kruse

, editors, Learning, networks and statistics, Springer, 1997, pp. 105–120.

80.

Klinker

G.J.

, Shafer

S.A.

and Kanade

, The measurement of highlights in color images, International Journal of Computer Vision 2 (1988), 7–32.

81.

Klinker

G.J.

, Shafer

S.A.

and Kanade

, A physical approach to color image understanding, International Journal of Computer Vision 4 (1990), 7–38.

82.

Krinidis

and Chatzis

, A robust fuzzy local information C-means clustering algorithm, IEEE Transactions on Image Processing 19(5) (2010), 1328–1337.

83.

Krishnapuram

, Generation of membership functions via possibilistic clustering, In IEEE World Congress on Computational Intelligence 2 (1994), 902–908.

84.

Krishnapuram

and Freg

C.-P.

, Fitting an unknown number of lines and planes to image data through compatible cluster merging, Pattern Recognition 25(4) (1992), 385–400.

85.

Krishnapuram

and Keller

, A possibilistic approach to clustering, IEEE Transactions on Fuzzy Systems 1(2) (1993), 98–110.

86.

Krishnapuram

, Nasraoui

and Frigui

, The fuzzy Cspherical shells algorithm: A new approach, IEEE Transactions on Neural Networks 3(5) (1992), 663–671.

87.

Krishnapuram

, Frigui

and Nasraoui

, Quadric shell clustering algorithms and their applications, Pattern Recognition Letters 14(7) (1993), 545–552.

88.

Krishnapuram

, Frigui

and Nasraoui

, Fuzzy and possibilistic shell clustering algorithms and their application to boundary detection and surface approximation - Parts I & II, IEEE Transaction on Fuzzy Systems 3(1) (1995), 29–60.

89.

Kruse

, Doring

and Lesot

M.-J.

, Fundamentals of fuzzy clustering. In de Oliveira

J.V.

and Pedrycz

, editors, Advances in Fuzzy Clustering and its Applications, Wiley, England, 2007, pp. 3–29.

90.

Kuhn

H.W.

and Kuenne

R.E.

, An efficient algorithm for the numerical solution of the generalized Weber problem in the spatial economics, Journal of Regional Science 4 (1962), 21–33.

91.

Kwon

, Han

, Shin

and Park

, Hierarchical fuzzy segmentation of brain MR images, International Journal of Imaging Systems and Technology 13(2) (2003), 115–125.

92.

Lakshmanan

and Derin

, Simultaneous parameter estimation and segmentation of Gibbs random fields using simulated annealing, IEEE Transactions on Pattern Analysis and Machine Intelligence 11(8) (1989), 799–813.

93.

Leski

, Towards a robust fuzzy clustering, Fuzzy Sets and Systems 137(2) (2003), 215–233.

94.

Leski

J.M.

, Generalized weighted conditional fuzzy clustering, IEEE Transactions on Fuzzy Systems 11(6) (2003), 709–715.

95.

Leski

J.M.

, Fuzzy c-varieties/elliptotypes clustering in reproducing kernel Hilbert space, Fuzzy Sets and Systems 141(2) (2004), 259–280.

96.

C.-H.

, Huang

W.-C.

, Kuo

B.-C.

and Hung

C.-C.

, A novel fuzzy weighted C-means method for image classification, International Journal of Fuzzy Systems 10(3) (2008), 168–173.

97.

, Huo

, Ming Zhao

, Chen

and Fang

, A spatial clustering method with edge weighting for image segmentation, IEEE Geoscience and Remote Sensing Letters 10(5) (2013), 1124–1128.

98.

, Li

, Lu

, Chen

and Liang

, Inhomogeneity correction for magnetic resonance images with fuzzy C-mean algorithm, In Proceedings of SPIE 5032 (2003), 995–1005.

99.

M. Lichman, UCI machine learning repository, 2013. URL http://archive.ics.uci.edu/ml

100.

Liew

A.W.-C.

and Yan

, An adaptive spatial fuzzy clustering algorithm for 3-D MR image segmentation, IEEE Transactions on Medical Imaging 22(9) (2003), 1063–1075.

101.

Liew

A.W.-C.

, Leung

S.H.

and Lau

W.H.

, Fuzzy image clustering incorporating spatial continuity, IEE Proceedings on Vision, Image and Signal Processing 147(2) (2000), 185–192.

102.

Liew

A.W.-C.

, Hung Leung

and Lau

W.-H.

, Segmentation of color lip images by spatial fuzzy clustering, IEEE Transactions on Fuzzy Systems 11(4) (2003), 542–549.

103.

Liu

, Xiao

, Liang

and Guan

, Fuzzy C-means clustering with bilateral filtering for medical image segmentation. In Corchado

, Snasel

, Abraham

, Wozniak

, Grana

and Cho

S.-B.

, editors, Hybrid Artificial Intelligent Systems, volume 7208 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, 2012, pp. 221–230.

104.

MacQueen

J.B.

, Some methods for classification and analysis of multivariate observations, In Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, 1967, pp. 281–297.

105.

Miehle

, Link-length minimization in networks, Operations Research 6(2) (1958), 232–243.

106.

J.J.

, The Levenberg-Marquardt algorithm: Implementation and theory. In Watson

, editor, Numerical Analysis, volume 630 of Lecture Notes in Mathematics, Springer Berlin Heidelberg, 1978, pp. 105–116.

107.

Nascimento

, Mirkin

and Moura-Pires

, Multiple prototype model for fuzzy clustering. In Hand

D.J.

, Kok

J.N.

and Berthold

M.R.

, editors, Advances in Intelligent Data Analysis, volume 1642 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, 1999, pp. 269–279.

108.

Nikolaev

D.O.

and Nikolayev

P.O.

, Linear color segmentation and its implementation, Computer Vision and Image Understanding 94 (2004), 115–139.

109.

Nock

and Nielsen

, On weighting clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 28(8) (2006), 1223–1235.

110.

Noordam

and van den Broek

W.H.A.M.

, Multivariate image segmentation based on geometrically guided fuzzy C-means clustering, Journal of Chemometrics 16(1) (2002), 1–11.

111.

Noordam

, van den Broek

and Buydens

, Geometrically guided fuzzy C-means clustering for multivariate image segmentation, In Proceedings of 15th International Conference on Pattern Recognition 1 (2000), 462–465.

112.

Noordam

, van den Broek

and Buydens

, Multivariate image segmentation with cluster size insensitive fuzzy Cmeans, Chemometrics and Intelligent Laboratory Systems 64(1) (2002), 65–78.

113.

Noordam

, van den Broek

W.H.A.M.

and Buydens

L.M.C.

, Unsupervised segmentation of predefined shapes in multivariate images, Journal of Chemometrics 17(4) (2003), 216–224.

114.

Ohashi

, Fuzzy clustering and robust estimation, Presented at the 9th SAS Users Group International (SUGI) Meeting at Hollywood Beach, Florida, 1984.

115.

Ozkan

and Turksen

, Upper and lower values for the level of fuzziness in FCM. In Wang

P.P.

, Ruan

and Kerre

E.E.

, editors, Fuzzy Logic, volume 215 of Studies in Fuzziness and Soft Computing, Springer Berlin Heidelberg, 2007, pp. 99–112. ISBN 978-3-540-71257-2.

116.

Pal

N.R.

and Bezdek

J.C.

, On cluster validity for the fuzzy C-means model, IEEE Transactions on Fuzzy Systems 3(3) (1995), 370–379.

117.

Pal

N.R.

, Pal

and Bezdek

J.C.

, A mixed C-means clustering model, In Proceedings of the Sixth IEEE International Conference on Fuzzy Systems 1 (1997), 11–21.

118.

Pal

N.R.

, Pal

, Keller

and Bezdek

J.C.

, A new hybrid C-means clustering model, In Proceedings of the 2004 IEEE International Conference on Fuzzy Systems 1 (2004), 179–184.

119.

Pappas

T.N.

, An adaptive clustering algorithm for image segmentation, IEEE Transactions on Signal Processing 40(4) (1992), 901–914.

120.

Paris

, Kornprobst

, Tumblin

and Durand

, Bilateral filtering: Theory and applications, Foundations and Trends in Computer Graphics and Vision 4(1) (2009), 1–73.

121.

Park

S.H.

, Yun

I.D.

and Lee

S.U.

, Color image segmentation based on 3-D clustering: Morphological approach, Pattern Recognition 31(8) (1998), 1061–1076.

122.

Pedrycz

, Conditional fuzzy C-means, Pattern Recognition Letters 17(6) (1996), 625–631.

123.

Pedrycz

, Fuzzy set technology in knowledge discovery, Fuzzy Sets and Systems 98(3) (1998), 279–290.

124.

Pedrycz

, Conditional fuzzy clustering in the design of radial basis function neural networks, IEEE Transactions on Neural Networks 9(4) (1998), 601–612.

125.

Pedrycz

and Waletzky

, Fuzzy clustering with partial supervision, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 27(5) (1997), 787–795.

126.

Pham

D.L.

, Spatial models for fuzzy clustering, Computer Vision and Image Understanding 84(2) (2001), 285–297.

127.

Pham

D.L.

, Fuzzy clustering with spatial constraints, In Proceedings of the International Conference on Image Processing (ICIP 2002) volume 2, 2002, pp. II–65–II–68.

128.

Pham

D.L.

and Prince

J.L.

, An adaptive fuzzy C-means algorithm for image segmentation in the presence of intensity inhomogeneities, Pattern Recognition Letters 20(1) (1999), 57–68.

129.

Pham

D.L.

and Prince

J.L.

, Adaptive fuzzy segmentation of magnetic resonance images, IEEE Transactions on Medical Imaging 18(9) (1999), 737–752.

130.

Rezaee

M.R.

, Application of fuzzy techniques in image segmentation. PhD thesis, University of Leiden, Leiden, Netherlands, 1998.

131. Roberts, Gisler G.R. and Theiler J.P., Spatio-spectral image analysis using classical and neural algorithms, In Dagli C.H., Akay, Chen C.L.P., Fernández B.R. and Ghosh, editors, Smart Engineering Systems: Neural Networks, Fuzzy Logic, and Evolutionary Programming, volume 6 of Intelligent Engineering Systems Through Artificial Neural Networks, ASME Press, New York, 1996, pp. 425–430.

132. Rose, Gurewitz and Fox, A deterministic annealing approach to clustering, Pattern Recognition Letters 11(9) (1990), 589–594.

133. Rose, Gurewitz and Fox, Constrained clustering as an optimization method, IEEE Transactions on Pattern Analysis and Machine Intelligence 15(8) (1993), 785–794.

134. Rosen J.B. and Xue G.L., On the convergence of Miehle’s algorithm for the Euclidean multifactory location problem, Operations Research 40(1) (1992), 188–191.

135. Rousseeuw P.J., Trauwaert and Kaufman, Fuzzy clustering with high contrast, Journal of Computational and Applied Mathematics 64(1–2) (1995), 81–90.

136. Ruspini, A new approach to clustering, Information & Control 15(1) (1969), 22–32.

137. Shen, Sandham, Granat and Sterr, MRI fuzzy segmentation of brain tissue using neighborhood attraction with neural-network optimization, IEEE Transactions on Information Technology in Biomedicine 9(3) (2005), 459–467.

138. Sledge, Bezdek J.C., Havens and Keller, Relational generalizations of cluster validity indices, IEEE Transactions on Fuzzy Systems 18(4) (2010), 771–786.

139. Stewart, MINPRAN: A new robust estimator for computer vision, IEEE Transactions on Pattern Analysis and Machine Intelligence 17(10) (1995), 925–938.

140. Szilagyi, Benyo, Szilagyi and Adam H.S., MR brain image segmentation using an enhanced fuzzy C-means algorithm, In Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS 2003) 1 (2003), 724–726.

141. Theiler J.P. and Gisler G.R., A contiguity-enhanced k-means clustering algorithm for unsupervised multispectral image segmentation, Proceedings of SPIE (1997), 108–118.

142. Timm, Borgelt, Doring and Kruse, An extension to possibilistic fuzzy cluster analysis, Fuzzy Sets and Systems 147(1) (2004), 3–16.

143. Tolias and Panas, On applying spatial constraints in fuzzy image clustering using a fuzzy rule-based system, IEEE Signal Processing Letters 5(10) (1998), 245–247.

144. Tolias and Panas, Image segmentation by a fuzzy clustering algorithm using adaptive spatially constrained membership functions, IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans 28(3) (1998), 359–369.

145. Trivedi and Bezdek J.C., Low-level segmentation of aerial images with fuzzy clustering, IEEE Transactions on Systems, Man, and Cybernetics 16(4) (1986), 589–598.

146. Tsai D.-M. and Lin C.-C., Fuzzy C-means based clustering for linearly and nonlinearly separable data, Pattern Recognition 44(8) (2011), 1750–1760.

147. Wang X.-Y. and Garibaldi J.M., Simulated annealing fuzzy clustering in cancer diagnosis, Informatica 29(1) (2005), 61–70.

148. Weiszfeld, Sur le point pour lequel la somme des distances de n points donnés est minimum, Tohoku Mathematical Journal 43 (1937), 355–386.

149. Wells W.M., Grimson, Kikinis and Jolesz F.A., Adaptive segmentation of MRI data, IEEE Transactions on Medical Imaging 15(4) (1996), 429–442.

150. Wu K.-L., Analysis of parameter selections for fuzzy c-means, Pattern Recognition 45(1) (2012), 407–415.

151. Wu K.-L. and Yang M.-S., Alternative C-means clustering algorithms, Pattern Recognition 35(10) (2002), 2267–2278.

152. Xue J.-H., Pizurica, Philips, Kerre, Walle R.V.D. and Lemahieu, An integrated method of adaptive enhancement for unsupervised segmentation of MRI brain images, Pattern Recognition Letters 24(15) (2003), 2549–2560.

153. Yabuuchi and Watada, Fuzzy principal component analysis and its application, Biomedical Fuzzy and Human Sciences 3 (1997), 83–92.

154. Yager and Filev, Approximate clustering via the mountain method, IEEE Transactions on Systems, Man and Cybernetics 24(8) (1994), 1279–1284.

155. Yang M.-S., A survey of fuzzy clustering, Mathematical and Computer Modelling 18(11) (1993), 1–16.

156. Yang M.-S. and Wu K.-L., Unsupervised possibilistic clustering, Pattern Recognition 39(1) (2006), 5–21.

157. Yang, Image segmentation based on fuzzy clustering with neighborhood information, Optica Applicata 39(1) (2009), 135–147.

158. Yang and Huang, Image segmentation by fuzzy C-means clustering algorithm with a novel penalty term, Computing and Informatics 26(1) (2007), 17–31.

159. Yang, Zheng and Lin, Fuzzy C-means clustering algorithm with a novel penalty term for image segmentation, Opto-Electronics Review 13(4) (2005), 309–315.

160. Yu, Cheng and Huang, Analysis of the weighting exponent in the FCM, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34(1) (2004), 634–639.

161. Zadeh L.A., Fuzzy sets, Information and Control 8 (1965), 338–353.

162. Zha, Ding, Gu, He and Simon, Spectral relaxation for K-means clustering, In Proceedings of Advances in Neural Information Processing Systems, 2002, pp. 1057–1064.

163. Zhang and Chen, A novel kernelized fuzzy C-means algorithm with application in medical image segmentation, Artificial Intelligence in Medicine 32(1) (2004), 37–50.

164. Zhang D.-Q. and Chen S.-C., A comment on “Alternative C-means clustering algorithms”, Pattern Recognition 37(2) (2004), 173–174.

165. Zhou, Fu and Yang S.L., Fuzziness parameter selection in fuzzy c-means: The perspective of cluster validation, Science China Information Sciences 57(11) (2014), 1–8.

166. Zhuang, Wang and Zhang, A highly robust estimator through partially likelihood function modeling and its application in computer vision, IEEE Transactions on Pattern Analysis and Machine Intelligence 14(1) (1992), 19–35.