A robust fuzzy clustering algorithm using mean-field-approximation based hidden Markov random field model for image segmentation

Abstract

Although how to deal well with images corrupted with noise is a commonly encountered task in image segmentation, the design of efficient and robust segmentation algorithms still keeps a challenging research topic. In this paper, a robust fuzzy-clustering-based image segmentation algorithm is presented to effectively segment noisy images. The proposed algorithm is derived from both the conventional fuzzy c-means (FCM) clustering algorithm and the hidden Markov random field (HMRF) model with the capability of incorporating spatial information. The performance of the proposed algorithm is experimentally evaluated with the comparison algorithms. Experimental results on synthetic and real images demonstrate the effectiveness of the proposed algorithm.

Keywords

Image segmentation fuzzy c-means clustering hidden Markov random field mean field approximation

1 Introduction

Image segmentation is a process in which an image is divided or partitioned into a set of homogeneous regions [33]. It is significant for a lot of applications, including computer vision, pattern recognition, analysis of biologic images, medical image processing, image reconstruction, object tracking [2]. Furthermore, image segmentation is usually the first task of image analysis, and hence the quality of segmentation heavily affects subsequent tasks. However, the design of efficient and robust image segmentation algorithms is still a very challenging research topic, due to the variety and complexity of images, such as noise, outliers, and other imaging artifacts [28]. Therefore, many researchers have been contributing their endeavors to improve the performance of image segmentation [16].

Threshold technique is one of the widely used techniques for image segmentation because of its simplicity. The basic idea of threshold technique is to partition a raw image into different regions according to each pixel of the image based on a comparison with some optimal threshold value. Typical threshold based image segmentation methods include Mean method [31], P-title method [33] and Otus method [17, 21]. Threshold based image segmentation has less computational burden than other image segmentation techniques [19]. However, its drawback is its infeasibility for complex images.

Clustering is another technique used for image segmentation. When applied to image segmentation, clustering is a process in which all the pixels of a given image are divided into different clusters that have similar features in a same cluster. The k-means algorithm [11, 12] and the fuzzy c-means algorithm [3 , 36] are two most typical clustering schemes. The k-means algorithm is a hard clustering strategy in which data are divided into a number of distinct clusters where each data element belongs to exactly one cluster [29]. Except k-means algorithm is sensitive to the presence of outliers and noisy points [32], it is also not good enough for image segmentation because it does not take account of the spatial dependencies between image pixels. So quite often, it is only used in the initialization step for other approaches. The fuzzy clustering is a process that assigns all the data elements in a dataset to different clusters using fuzzy membership degrees. The fuzzy c-means algorithm called FCM is one of the most famous unsupervised fuzzy clustering techniques that have been applied successfully in image segmentation [4, 20]. Although FCM generally yields satisfactory results for segmenting noise-free images, it often fails to segment images corrupted by noise or containing inaccuracy edges. In the past studies [8 , 25], several scholars have tried their best to improve the robust capability of FCM. Nevertheless, all these methods still have the following drawbacks: failure to cluster data from arbitrary shapes, limited robustness to outliers, and high computationalcost.

Statistical techniques are also often used to segment images. Gaussian mixture models (GMM) is a common statistical technique for image segmentation [37]. Gaussian mixture model considers an image as the result generated from a mixture of a finite number of Gaussian distributions with unknown parameters, thus an image segmentation problem can be transformed into the maximum likelihood parameter estimation by the expectation-maximization (EM) algorithm. However, GMM has an intrinsic limitation that no spatial information is taken into account. This limitation causes the model to yield good results only for corrupted images with low noise levels. For corrupted images with high noise levels, GMM produces unreliable results. To address this problem, an approach combining Gaussian mixture model with hidden Markov random field (GMM-HMRF) is proposed in [23]. This approach has the advantage of incorporating neighbor information by using hidden Markov random field (HMRF). The HMRF model provides a powerful framework for introducing the information about the mutual influences among pixels in an image into the image segmentation procedure [9, 27]. Although considering dependency structure among pixels in an image can improve segmentation performance, the HMRF model has the heavy computational burden arising from the calculation of the conditional expectations of the hidden Markov processes, which may even become intractable due to the difficulty of analytical computation. In [10 , 34], the pseudo-likelihood approximation is used to make the HMRF model computationally feasible. Though the pseudo-likelihood approximation usually help the HMRF model provide a reasonable image segmentation result, it needs a considerable time cost. In this paper, we use the mean field approximation in [1, 7] instead of the pseudo-likelihood approximation to make the HMRF model become tractable. When considering a particular pixel i in an image, by neglecting the fluctuations of the pixels interacting with i, the mean field approximation can result in valid probability models and has much less computational burden than the pseudo-likelihoodapproximation.

The work in this study mainly aims to present a new image segmentation algorithm through the integration of both FCM and HMRF to greatly improve the segmentation results and enhance the flexibility in image segmentation settings. As we know well, FCM has two main advantages. One is that it is easy to understand and implement, and the other is that its fuzzy objective function is easy to extend by absorbing the considered items. As we also know, two strong correlations exist in a real-world image. One is the correlation between the state and the pixel value at each pixel, and the other is the correlation among the neighbor pixels. The HMRF model can capture this prior knowledge about these two strong correlations in an image by defining a graph structure with potential function over the states of cliques in this graph. As a result, the HMRF model considering the prior knowledge can produce smooth state for noisy images. Since the proposed algorithm is on the basis of both FCM and HMRF, it can benefit from their distinctive features.

This paper is organized as follows. A brief review of the FCM algorithm is stated in Section 2. Section 3 briefly describes a general version of the HMRF model for image segmentation. In Section 4, the proposed algorithm is given and its parameter estimation is also derived. Experimental results and comparisons are reported in Section 5. Finally, conclusions are drawn in Section 6.

2 FCM

In this section, we start with a brief view of the fuzzy clustering algorithm FCM [3]. Suppose x = {x ₁, x ₂, …, x _N} denotes a set of N objects to be divided into K clusters, where the ith object x _i is a vector with l dimensions. FCM minimizes the objective function defined as follows $J (Θ) = \sum_{i = 1}^{N} \sum_{j = 1}^{K} z_{ij}^{m} d_{ij}$ (1) where z _ij is the fuzzy membership of the ith object x _i belonging to the jth cluster, m the so-called fuzzy index which controls fuzzy degree of the membership z _ij, d _ij is a distance metric that measures the similarity between the ith object x _i and the jth cluster prototype. Both z _ij and m satisfy the constraints $z_{ij} \geq 0; \sum_{j = 1}^{K} z_{ij} = 1; m > 1$ (2)

FCM is an iterative clustering method. The new cluster center is obtained in each iteration step by $v_{j} = \frac{\sum_{i = 1}^{N} z_{ij}^{m} x_{i}}{\sum_{i = 1}^{N} z_{ij}^{m}}$ (3) and the new membership degree is obtained in each iteration step by $z_{ij} = {[\sum_{k = 1}^{K} {(\frac{| | x_{i} - v_{j} | |^{2}}{| | x_{i} - v_{k} | |^{2}})}^{\frac{1}{m - 1}}]}^{- 1}$ (4)

With Equations (3 and 4), we can summarize FCM as follows.

Fix K, m and ɛ; set the iteration counter t = 0.

Initialize the membership degree matrix Z ⁽⁰⁾.

Update the cluster centers using Equation (3).

Update the membership degree matrix using Equation (4).

t = t + 1.

Repeat Step 3 to Step 5 until ∥Z ^(t) − Z ^(t − 1) ∥ < ɛ.

It should be pointed out that since FCM does not consider/leverage the spatial information existing in an image, it is not always effective for image segmentation.

3 HMRF

In this section, we will briefly state that image segmentation can be generally explained in terms of the HMRF model such that the spatial information in an image may be leveraged.

The HMRF model here for image segmentation can be illustrated using Fig. 1. According to the HMRF model, the image to be segmented (called observed image in Fig. 1) can be viewed as a realization of a Markov random field y = (y ₁, y ₂, …, y _i, …, y _N) , i ∈ A where A = {1, 2, …, N} denotes the set of the pixel indexes. The segmented image can be viewed as the realization of another Markov Random Field s = (s ₁, …, s _i, …, s _N) , i ∈ A, taking values in the discrete state space B = {1, 2, …, K}. K represents the number of classes in the image. Each pixel y _i can be observed while each state s _i is hidden. Mathematically, the HMRF model is generally characterized by the following two aspects.

(1) The hidden random field s is an underlying MRF with the following probability distribution [27], $p (s | β) = T (β)^{- 1} exp (- U (s | β))$ (5) where T (β) is a normalizing constant called the partition function given by $T (β) = \sum_{s \in ℚ} exp (- U (s | β))$ (6) and U (s|β) is an energy function which is generally expressed in the following form $U (s | β) = \sum_{c \in C} V_{c} (s | β)$ (7) where C denotes all possible cliques.

Each clique c is a subset of all the pixels in an image in which every pair of distinct pixels is neighbors, V _c denotes the clique potential of the clique c. β is an inverse supercritical temperature parameter in the clique potentials and $ℚ = B^{N}$ in Equation (6) denotes the configuration set of the hidden random field s.

The intuition behind the HMRF model is that a pixel is more likely to be of a state if the pixels in its neighbors are also of that state. For example, in Fig. 1, the state of s _i, corresponding to the pixel y _i, intends to have the same state with the gray points, corresponding to the neighbors of the pixel y _i. Based on this assumption, Equation (5) is often calculated [10] by $p (s | β) = \prod_{i = 1}^{N} p (s_{i} | s_{\partial_{i}}; β)$ (8) where s _{∂_i} is the state vector of the ith pixel’s neighbors. p (s _i|s _{∂_i} ; β) [10] is given by $p (s_{i} | s_{\partial_{i}}; β) = \frac{exp (- \sum_{i \in c} V_{c} (s | β))}{\sum_{s_{i} = 1}^{K} exp (- \sum_{i \in c} V_{c} (s | β))}$ (9)

As shown in [26], Equation (8) often obtains consistent estimators for the HMRF model parameters using the pseudo-likelihood approximation. However the neighbors are still allowed to fluctuate and a large amount of computations are required in this method. In this study, the mean field approximation is adopted. The mean field approximation [7] can neglect fluctuations by taking the mean of the neighbors of each pixel. The state value of the pixel i does not depend on the state values of other pixels which are all set to constants independent on the state value of the pixel i. In this way, Equation (8) can be readily calculated and hence the computational burden of the HMRF model is accordingly reduced. When the mean field approximation is adopted, we can re-write Equation (8) as $p (s | β) = \prod_{i = 1}^{N} p (s_{i} | {\hat{s}}_{\partial_{i}}; β)$ (10) where ${\hat{s}}_{\partial_{i}}$ is a state vector of the ith pixel’s neighbors estimated by the mean field approximation.

(2) y is an observable random field. y follows the known conditional probability distribution [27], $p (y | s; θ) = \prod_{i = 1}^{N} p (y_{i} | s_{i} = j; θ_{j})$ (11) where θ _j denotes the parameter of the jth state in the HMRF model. Finally, the conditional probability distribution with a Gaussian emission distribution can be derived as $p (y | s; θ) = \prod_{i = 1}^{N} N (y_{i} | μ_{j}, Σ_{j})$ (12) where μ _j and Σ _j are the mean and covariance of the Gaussian emission distribution of the jth state of the HMRF model, respectively.

As can be seen in the above, the HMRF model defines a structure with potential function over the states of cliques. This structure generally has a node corresponding to each pixel, edges connecting certain pairs of neighboring states. Due to the use of this structure, the HMRF model indeed has a capability of leveraging the neighbor/spatial information.

4 Proposed algorithm

To overcome the drawback that spatial information is not considered in FCM when applied to image segmentation, a robust fuzzy clustering algorithm using hidden Markov random field model is proposed in this section. After spatial information, expressed in terms of the HMRF model, is incorporated into FCM, its new objective function of FCM becomes $J (Θ) = \sum_{i = 1}^{N} \sum_{j = 1}^{K} z_{ij} d_{ij} + λ \sum_{i = 1}^{N} \sum_{j = 1}^{K} (π_{ij} - z_{ij})^{2}$ (13) where the positive regularization parameter λ controls the role of the second item in Equation (13), d _ij is the dissimilarity function which measures the similarity between the observation y _i and the jth region prototype, z _ij is the fuzzy membership, π _ij is the prior probability of the observation y _i belonging to the jth region. Please note, since several researchers have pointed out that m used in FCM (see Equation (1)) is actually not a natural element [8, 30] and the degree of fuzziness of the clustering algorithm can be well reflected by considering the divergence information (e.g, KL divergence information), so we use the second term in Equation (13) as the divergence information instead of directly considering the fuzzy index m in Equation (1). Moreover, easy mathematical derivation is an additional benefit of doing so.

To obtain an estimate of π _ij, the mean field approximation of the Markov field prior p (s|β) in Equation (10) can be adopted. Using the mean field approximation [7], we have

$\begin{matrix} π_{ij} & = & p (s_{i} = j | {\hat{s}}_{\partial_{i}}; β) \\ = & \frac{exp (- \sum_{i \in c} V_{c} (s_{i} = j | {\hat{s}}_{\partial_{i}}; β))}{\sum_{h = 1}^{K} exp (- \sum_{i \in c} V_{c} (s_{i} = h | {\hat{s}}_{\partial_{i}}; β))} \end{matrix}$ (14)

In the proposed algorithm, the estimate s _i of the ith pixel’s state is obtained by using the following Equation (15): $s_{i} = {arg max}_{j = 1}^{K} z_{ij}$ (15)

In essence, in contrast to FCM, the proposed algorithm adds the second item in Equation (13). The second item plays a role of incorporating spatial information expressed by HMRF. As analyzed in Section 3, the power of the HMRF model lies in its ability to capture both the interaction between neighbor pixels and the relationship between the observation and the state at each pixel. That is to say, π _ij can capture the states of neighboring pixels. In terms of the second item, the fuzzy membership z _ij is encouraged to have similar value with the prior probability π _ij obtained by the states of neighboring pixels. This tendency contributes the robustness of the proposed objective function in Equation (13).

The fuzzy membership z _ij in Equation (13) satisfies the constraints $z_{ij} \geq 0; \sum_{j = 1}^{K} z_{ij} = 1$ (16)

Since the standard Euclidean dissimilarity function is not robust. The dissimilarity function d _ij in Equation (13) may be defined as $d_{ij} = - log p (y_{i} | s_{i} = j; θ_{j})$ (17)

According to Equation (12), the likelihood log p (y _i|s _i = j ; θ _j) in Equation (17) is defined as

$\begin{matrix} log p (y_{i} | s_{i} = j; θ_{j}) = - \frac{l}{2} log (2 π) \\ - \frac{1}{2} log | Σ_{j} | - \frac{1}{2} (y_{i} - μ_{j})^{T} Σ_{j}^{- 1} (y_{i} - μ_{j}) \end{matrix}$ (18) where l stands for the dimensions of observations. From Equations (17 and 18), the objective function in Equation (13) can be rewritten as

$\begin{matrix} J (Θ) & = & \sum_{i = 1}^{N} \sum_{j = 1}^{K} z_{ij} (\frac{l}{2} log (2 π) + \frac{1}{2} log | Σ_{j} | \\ + \frac{1}{2} (y_{i} - μ_{j})^{T} Σ_{j}^{- 1} (y_{i} - μ_{j})) \\ + λ \sum_{i = 1}^{N} \sum_{j = 1}^{K} (π_{ij} - z_{ij})^{2} \end{matrix}$ (19)

The estimate of the parameter β is given by $β^{(t + 1)} = \underset{β}{arg max} \sum_{i = 1}^{N} \sum_{j = 1}^{K} z_{ij} log p (s_{i} = j | {\hat{s}}_{\partial_{i}}; β^{(t)})$ (20)

To minimize the objective function in Equation (13), we use the Lagrangian optimization method. The Lagrange function can accordingly be defined as

$\begin{matrix} L (Θ) & = & \sum_{i = 1}^{N} \sum_{j = 1}^{K} z_{ij} d_{ij} + λ \sum_{i = 1}^{N} \sum_{j = 1}^{K} (π_{ij} - z_{ij})^{2} \\ + \sum_{i = 1}^{N} η_{i} (1 - \sum_{j = 1}^{K} z_{ij}) \end{matrix}$ (21) where η _i is a Lagrange multiplier. In order to optimize the parameter z _ij, we now consider the differential of the function L (Θ) in Equation (21) with respect to z _ij $\frac{\partial L (Θ)}{\partial z_{ij}} = d_{ij} + 2 λ (z_{ij} - π_{ij}) - η_{i}$ (22)

The solution of ∂L (Θ)/∂z _ij = 0 yields the minimization of z _ij at the new step $z_{ij} = π_{ij} + \frac{\frac{1}{K} \sum_{k = 1}^{K} d_{ik} - d_{ij}}{2 λ}$ (23)

Similarly, the solution of ∂L (Θ)/∂ μ _j = 0 yields the minimization of μ _j at the new step $μ_{j} = \frac{\sum_{i = 1}^{N} z_{ij} y_{i}}{\sum_{i = 1}^{N} z_{ij}}$ (24)

Similarly, the solution of ∂L (Θ)/∂Σ _j = 0 yields the minimization of Σ _j at the new step $Σ_{j} = \frac{\sum_{i = 1}^{N} z_{ij} (y_{i} - μ_{j}) (y_{i} - μ_{j})^{T}}{\sum_{i = 1}^{N} z_{ij}}$ (25)

In Equation (23), we can easily see that when λ→ ∞, z _ij = π _ij. This means that when λ is large enough, the fuzzy membership of each object tends to have the same value as its prior probability. When λ is very large, the second item in Equation (13) dominates the optimization such that the solution to minimize the objective function actually becomes the solution to minimize its second item. Indeed, for i = {1, 2, …, N}, $\sum_{j = 1}^{K} (π_{ij} - z_{ij})^{2}$ reaches its minimum at z _ij = π _ij. Obviously, this extreme situation is useless for practical applications. To optimize the original objective function, the second item in Equation (13) is introduced to regularize the first term of the conventional FCM algorithm. However, the second item should not be set too large to avoid the phenomenon that the regularization item is overemphasized to make it dominate the optimization.

Since $\frac{1}{K} \sum_{k = 1}^{K} d_{ik}$ in the second item of Equation (23) denotes the average distance between the object i to all the cluster centers, so the second item will become negative when the distance between the object i to the jth cluster center is greater than this average distance. Thus the membership z _ij has possibly negative values when the absolute values of the negative second term of Equation (23) is greater than the positive value of π _ij. In order to avoid such a case at the successive iteration steps, the non-negative constraint in Equation (16) should be considered. The Lagrange function in Equation (21) without considering this constraint should be modified as

$\begin{matrix} L (Θ) & = & \sum_{i = 1}^{N} \sum_{j = 1}^{K} z_{ij} d_{ij} + λ \sum_{i = 1}^{N} \sum_{j = 1}^{K} (π_{ij} - z_{ij})^{2} \\ + \sum_{i = 1}^{N} η_{i} (1 - \sum_{j = 1}^{K} z_{ij}) + \sum_{i = 1}^{N} \sum_{j = 1}^{K} γ_{ij} z_{ij} \end{matrix}$ (26) where γ _ij is a Lagrange multiplier. In terms of the Karush-Kuhn-Tucker conditions [14], we have ${\begin{matrix} \frac{\partial L}{\partial z_{ij}} = 0 \\ γ_{ij} > 0 \\ γ_{ij} z_{ij} = 0 \end{matrix}$ (27)

The solution of z _ij can finally be given by $z_{ij} = {\begin{matrix} 0 & j \in K^{-} \\ π_{ij} + \frac{\frac{1}{| K^{+} |} \sum_{k = 1}^{| K^{+} |} d_{ik} - d_{ij}}{2 λ} & j \in K^{+} \end{matrix}$ (28) and $K^{-} = {j : z_{ij} = 0}; K^{+} = {j : z_{ij} > 0}$ (29) where K ⁺ denotes a set of clusters where x _i belongs to or has positive values of fuzzy memberships, and K ⁻ denotes a set of clusters where this object does not belong to or has zero value of fuzzy memberships. |K ⁺| and |K ⁻| represent the sizes of the sets K ⁺ and K ⁻, respectively.

So far, we have given the main details of the proposed fuzzy clustering algorithm. Now we summarize the proposed algorithm as follows

Initialize the parameters Θ = (z _ij, μ _j, Σ _j, β); Set the termination threshold ɛ.

Compute the state vector s _i using Equation (15).

Compute the prior probability π _ij using Equation (14).

For each i ∈ {1, 2, …, N}

Decide K ⁺ and K ⁻ by calling

Procedure − K below

For each j ∈ {1, 2, …, K}

Calculate z _ij using Equation (28).

End

Update the mean vector μ _j using Equation (24).

Update the covariance matrices Σ _j using Equation (25).

Update the inverse temperature parameter β using Equation (20).

Repeat Step 2 to Step 7 until ∥J ^(t+1) − J ^(t) ∥ < ɛ

Procedure − K is given as follows:

Initialize $K_{0}^{+} = \emptyset$ , $K_{0}^{-} = {1, 2, \dots, K}$ , h = 0.

h ← h + 1, $K_{h}^{+} = K_{h - 1}^{+} + {v}$ ,

$K_{h}^{-} = K_{h - 1}^{-} - {v}$ , $v = \underset{c \in K_{h - 1}^{-}}{arg min} {d_{ic}}$ .

Compute z _ij in terms of Equation (28), go to step 2 if z _ij > 0, else set $K^{+} = K_{h - 1}^{+}$ , $K^{-} = K_{h - 1}^{-}$ and then terminate.

When a dissimilarity matrix is given, the computational complexity of the proposed algorithm is O (N ²) at each iteration step, which is the same as in other dissimilarity-based fuzzy clustering algorithms including KL-FCM [8].

The proposed algorithm combines the FCM algorithm and the HMRF model by adding a regularization item in the objective function of the FCM algorithm. The regularization item essentially encourages the fuzzy memberships to have similar values with the prior probability which captures the strong correlations not only among the neighboring pixels but also between the observations and the states at each pixel. The tendency that the fuzzy membership and the prior probability have similar value contributes the capability of dealing with noisy data such as corrupted images. In other words, the modified objective function has the characteristic of robustness. As we have analyzed above, the proposed algorithm combines the flexibility of the conventional FCM algorithm with the enhanced spatial information modeling capabilities of the HMRF model. In this way it has significant benefits over the related image segmentation algorithms. The superiority of the proposed algorithm over the related algorithms will be illustrated in Section 5 by reporting the experimental results about several synthetic and real-world image segmentation tasks.

5 Experimental evaluation

In this section, several experiments are conducted to evaluate the effectiveness and robustness of the proposed algorithm and do a comparative study between it and five comparison algorithms. We carry out the proposed algorithm and five comparison algorithms on synthetic gray images and MRI brain images corrupted with different noise types and levels, as well as real color images from Berkeley dataset.

Five comparison algorithms are FCM [3], KL-FCM [8], GMM [37], GMM-HMRF [23] and OSTU [21]. For the FCM algorithm, the fuzzy index m is set to 2. For the KL-FCM algorithm, the parameter λ is regarded as temperature, and the annealing schedule is used. We set the initial value λ ⁽⁰⁾ = 8, and λ ^(t+1) (t) = λ ^(t)/ln(2 + t) where t denotes the iteration number. λ is fixed after it reaches to 2. As for the parameter λ involved in the proposed algorithm, it is determined by the grid search strategy from (0.0, 100] with the step 0.2. For all these algorithms, the parameter ɛ involved in iteration termination criterion is set to 10^–5. In the proposed algorithm and the GMM-HMRF algorithm which are both based on the HMRF model, since a second-order neighborhood system [27] is adopted, the energy function in Equation (7) for the general version of the HMRF model for image segmentation can be exactly expressed in the following form $U (s | β) = - β \sum_{i = 1}^{N} \sum_{h \in \partial_{i}} δ (s_{i} - s_{h})$ (30)

After adopting the energy function in Equation (30), Equation (14) about the prior probability π _ij actually becomes $π_{ij} = \frac{exp (β \sum_{h \in \partial_{i}} δ (j - s_{h}))}{\sum_{k = 1}^{K} exp (β \sum_{h \in \partial_{i}} δ (k - s_{h}))}$ (31) where δ (·) denotes the Kronecker’s delta function which is given by $δ (s_{i} - s_{h}) = {\begin{matrix} 1, & if s_{i} = s_{h} \\ 0, & otherwise \end{matrix}$ (32)

In order to have an objective evaluation for the results obtained by the proposed algorithm and other comparison algorithms, the probabilistic rand index PR is adopted. Here we only give its definition, and its more computation details can be also seen in [24].

$\begin{matrix} PR (U_{g}, U_{s}) & = & \frac{2}{N (N - 1)} \sum_{i} \sum_{j > i} [c_{ij} p_{ij} \\ + (1 - c_{ij}) (1 - p_{ij})] \end{matrix}$ (33) where U _g is the ground truth which is usually given. U _s is the segmentation result which is yielded by various algorithms and will be evaluated. N is the number of pixels in an image. p _ij denotes the ground truth probability that a pair of pixels i and j have the same state. c _ij stands for the event that the pixel i and j belong to the same segmentation in U _s. The value of c _ij is one if the pixel i and j belong to the same segmentation, otherwise, its value is zero. The value of the PR index is in the range of 0 to 1. A bad segmentation will obtain the PR index value closing to 0. Otherwise, a good segmentation will obtain the PR index value closing to 1.

5.1 Segmentation of synthetic gray images

In the first experiment, we illustrate the robustness of the proposed algorithm against noise. For this purpose, we consider the application of the proposed algorithm in segmentation of some synthetic gray images with different noise types and levels.

The six images adopted in this experiment are illustrated in Figs. 2(a), 3(a) and 4(a). Each image is 128×128 resolutions. The two images in Fig. 2(b) are derived from the original image by corrupting it with 5% Gaussian noise, the two images in Fig. 3(b) are derived from the original image by corrupting it with 10% Salt & Pepper noise and the two images in Fig. 4(b) are derived from the original image by corrupting it with 15% Speckle noise. The numbers of the classes of the six images are 4, 3, 5, 4, 5 and 5 respectively.

The segmentation results of the proposed algorithm and the comparison algorithms FCM, KL-FCM, GMM, GMM-HMRF, OTSU are shown from Figs. 2(c) to (h), 3(c) to (h) and 4(c) to (h) respectively. Tables 1 –3 provide average performance comparison of the proposed algorithm and five comparison algorithms on six synthetic gray images under three noise types which are Gaussian noise, Salt and Pepper noise and Speckle noise, and six noise levels which are 0%, 5%, 10%, 15%, 20% and 25%. As we see from Tables 1 3, for Gaussian noise and Speckle noise, the proposed algorithm outperforms the comparison algorithms with the higher PR values in despite of noise levels. For Salt and Pepper noise, the proposed algorithm obtains almost the same PR values as OTSU, FCM and KL-FCM. While the PR values obtained by KL-FCM are always higher than that by FCM because of the robust capability of KL-FCM and the PR values by GMM-HMRF are always higher than that by GMM because of the ability of the incorporation of spatial information in the HMRF model. As evident from Tables 1 3, on average, the proposed algorithm is accurate and robust with respect to noise, in contrast to the comparison algorithms.

5.2 Segmentation of MRI brain images

In the second experiment, MRI brain images corrupted with different noise types and levels are used to evaluate the proposed algorithm and five comparison algorithms.

These brain images are obtained from the Brainweb dataset [5]. The Brainweb dataset contains a set of realistic MRI data volumes to evaluate the performance of various image analysis methods in a setting where the truth is known. A 3-dimensional data volumes using T1-weighted sequence, 1 mm slice thickness and 0% noise levels are generated. The fourteen brain images in the data volumes are selected in this experiment, as shown in Fig. 5 where each image is denoted as Slice #.

We test the performance of the proposed algorithm and all five comparison algorithms on the brain images under three noise types which are Gaussian noise, Salt and Pepper noise and Speckle noise, and six noise levels which are 0%, 5%, 10%, 15%, 20% and 25%. The objective is to segment these brain images into ten classes. As shown in Tables 4 –6, the PR values obtained by KL-FCM remain higher than that by FCM, and the PR values obtained by GMM-HMRF are also higher than that by GMM with the same reason as pointed out in the first experiment. In contrast to the comparison algorithms, the influence of noise on the final segmented image by the proposed algorithm is lowest. The segmentation accuracy of the proposed algorithm in this experiment is highest.

5.3 Segmentation of Berkeley images

Finally, we evaluate the proposed algorithm by using a subset of the Berkeley image segmentation dataset in [6]. The Berkeley image segmentation dataset consists of a set of natural color images and their ground truth segmentation. All of the images used in the experiment are 481 × 321 resolutions.

In Table 7, we provide the PR values obtained by the proposed algorithm and all the comparison algorithms on each test image without noise.

We illustrate some of the original images in the adopted dataset in Fig. 6 where each image is denoted as its corresponding image number, and the corresponding segmentations by the proposed algorithm in Fig. 7. Besides, we test the performance of the proposed algorithm and all five comparison algorithms on these images under three noise types and five noise levels and the results are listed in Tables 8 –10. As we can easily observe from Tables 7 10, on average, the proposed algorithm yields better segmentation results in contrast to the comparison algorithms.

6 Conclusion

In this paper, as an extension of the conventional fuzzy clustering algorithm FCM, A new robust fuzzy clustering algorithm for image segmentation is proposed. The power of the proposed algorithm lies in the new objective function modified from the objective function of the conventional FCM algorithm. The new objective function has a capability of absorbing neighbor information by adding a regularization item which contains the MRF prior. The regularization item encourages the fuzzy memberships to have the similar values with the MRF prior. The tendency that the fuzzy membership and the MRF prior have similar values contributes the robustness of the proposed algorithm. Considering the computational cost of the MRF prior, the mean field approximation is adopted. We derive a new way to address the problem that fuzzy memberships might have negative values at the successive iteration steps. Experimental results on synthetic and real images indicate the effectiveness of the proposed algorithm.

Future work will focus on how to speed up the proposed algorithm and how to seek its effective self-adaptive parameter setting strategy.

Footnotes

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Grants 61272210 and 61572236), the Ministry of education program for New Century Excellent Talents (Grant NCET-120882), the Fundamental Research Funds for the Central Universities (Grants JUSRP51321B and JUSRP51614A), the Jiangsu Province Outstanding Youth Fund (Grant BK20140001) and the Natural Science Foundation of Jiangsu Province (Grant BK20130155).

References

Dummur

A.P.

, and Titterington

, Parameter estimation in latent profile models, Computational Statistics & Data Analysis 27(4) (1998), 371–388.

Penga

, Zhang

, and Zhang

, A survey of graph theoretical approaches to image segmentation, Pattern Recognition 46(3) (2013), 1020–1038.

Bezdek

, Pattern Recognition with Fuzzy Objective Functions Algorithms, Plenum Press, 1981.

Che

, and Zhang

, Robust image segmentation using FCM with spatial constraints based in new kernel-induced distance measure, IEEE Transaction Systems Man Cybernetics 34(4) (2004), 1907–1916.

Collins

D.L.

, Zijdenbos

A.P.

, Kollokian

, Sled

J.G.

, Kabani

N.J.

, Holmes

C.J.

, and Evans

A.C.

, Design and Construction of a Realistic Digital Brain Phantom, IEEE Transactions on Medical Imaging 17(3) (1998), 463–468.

Martin

, Fowlkes

, Tal

, and Malik

, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, In Proc 8th IEEE Int Conf Comput Vis, Vancouver, BC, Canada, 2 2001, pp. 416–423.

Celeux

, Forbes

, and Peyrard

, EM procedures using mean field-like approximations for markov model-based image segmentation, Pattern Recognition 36(1) (2003), 131–144.

Ichihashi

, Miyagishi

, and Honda

, Fuzzy c-means clustering with regularization by K-L information, IEEE International Conference on Fuzzy System (1997), 86–92.

Besag

, On the statistical analysis of dirty pictures, J Royal Statistical Soc, Series B 48(3) (1986), 259–302.

10.

Besag

, Statistical analysis of non-lattice data, The Statistician 24(3) (1975), 179–195.

11.

Hartigan

, and Wong

, Algorithm AS136: A k-means clustering algorithm, Applied Statistics 28 (1979), 100–108.

12.

Hartigan

, Clustering Algorithms, John Wiley & Sons, New York, NY, 1975.

13.

Leski

, Toward a robust fuzzy clustering, Fuzzy Sets System 137(2) (2003), 215–233.

14.

Mei

J.P.

, and Chen

, Fuzzy clustering with weighted medoids for relational data, Pattern Recognition 43 (2010), 1964–1974.

15.

Wang

, Wang

, Chung

F.-L.

, and Deng

, Fuzzy partition based soft subspace clustering and its applications in high dimensional data, Inf Sci 246 (2013), 133–154.

16.

Plataniotis

, and Venetsanopoulos

A.N.

, Color image processing and applications, Springer Science & Business Media (2013), 237–278.

17.

Wei

, Zhang

, Shen

, and Liu

, An improved threshold selection algorithm based on particle swarm optimization for image segmentation, Third International Conference on Natural Computation 5 (2007), 591–594.

18.

Ibanez

M.V.

, and Simo

, Parameter estimation in Markov random field image modeling with imperfect observations. A comparative studey, Pattern Recognition Letters 24(14) (2003), 2377–2389.

19.

Baradez

M.-0.

, McGuckin

C.P.

, Forraz

, Pettengell

, and Hoppe

, Robust and automated unimodal histogram thresholding and potential applications, Pattern Recognition 37(6) (2004), 1131–1148.

20.

Ahmed

, Yamany

, Mohamed

, and Farag

, A modified fuzzy C-means algorithm for bias field estimation and segmentation of MRI data, IEEE Transaction on Medical Imaging 21(3) (2002), 193–199.

21.

Otsu

, A threshold selection method from gray-level histogram, IEEE Transaction System Man Cybernetics 9(1) (1979), 62–66.

22.

Pal

N.R.

, Pal

, Keller

J.M.

, and Bezdek

J.C.

, A possibilistic fuzzy c-means clustering algorithm, IEEE Transaction Fuzzy System 13(4) (2005), 517–530.

23.

Wang

, GMM-Based Hidden Markov Random Field for Color Image and 3D Volume Segmentation, Computer Science, 2012.

24.

Unnikrishnan

, Pantofaru

, and Hebert

, A measure for objective evaluation of image segmentation algorithms, IEEE International Conference on Computer Vision 2005, pp. 24–41.

25.

Chen

, and Zhang

, Robust image segmentation using FCM with spatial constraints based on new Kernel-induced distance measure, IEEE Transaction System, Man Cybernetics 34(4) (2004), 1907–1916.

26.

Geman

, and Graffigne

, Markov random field image models and their applications to computer vision, Proceeding of the International Congress of Mathematicians Berkeley, 1987, pp. 1498–1517.

27.

Geman

, and Geman

, Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence 6(6) (1984), 721–741.

28.

Krinidis

, and Chatzis

, A robust fuzzy local information C-means clustering algorithm, IEEE Transaction on Image Processing 19(5) (2010), 1328–1337.

29.

Naz

, Majeed

, and Irshad

, Image segmentation using fuzzy clustering: A survey, pp, International Conference on Emerging Technologies (2010), 181–186.

30.

Chatzis

S.P.

, and Varvarigou

T.A.

, Factor analysis latent subspace modeling and robust fuzzy clustering using t-distributions, IEEE Transaction on Fuzzy Systems 17(3) (2009), 505–517.

31.

Al-amri

S.S.

, and Kalyankar

N.V.

, Image Segmentation by using threshold techniques, Journal of Computing 2(5) (2010), 83–86.

32.

Theodoridis

, and Koutroumbas

, Pattern Recognition, Academic Press, Technology & Engineering, 2008.

33.

Khan

, Image Segmentation Techniques: A Survey, Journal of Image and Graphics 1(4) (2013), 166–170.

34.

Qian

, and Titterington

, Estimation of parameters in hidden Markov models, Philos Trans Roy Soc London A 337 (1991), 407–428.

35.

Jiang

, Chung

F.-L.

, Wang

, Deng

, Wang

, and Qian

, Collaborative fuzzy clustering from multiple weighted views, IEEE T Cybernetics 45(4) (2015), 688–701.

36.

Jiang

, Chung

F.-L.

, and Wang

, Enhanced fuzzy partitions vs data randomness in FCM, Journal of Intelligent and Fuzzy Systems 27(4) (2014), 1639–1648.

37.

, and Wang

, Color image segmentation using gaussian mixture model and EM algorithm, Multimedia and Signal Processing Communications in Computer and Information Science 346 (2012), 61–66.