Fast and automatic hesitant fuzzy clustering applied to image segmentation

Abstract

Image segmentation is a widely studied area that seeks the best clustering of pixels. The task is often difficult, especially for pixels at the edges of regions, where there is a gradient and it is hard to decide which region a pixel should be assigned to. Hesitant fuzzy sets (HFS) describe these situations better because they allow multiple possible values for each element, which gives more flexibility. This type of set has mainly been applied to decision-making problems, where it has obtained better results than other types of fuzzy sets. This research proposes a fast and automatic method based on hesitant fuzzy clustering (FAHFC). The method does not require the user to set parameters, since it determines the number of clusters for the segmentation with the Calinski-Harabasz index, which solves the initialization problem in clustering; it also proposes an alternative way to construct the HFS through fuzzy relations. The experiments show superiority in clustering quality and convergence over selected state-of-the-art algorithms.

Keywords

Fuzzy clustering, hesitant fuzzy sets, image segmentation, Calinski-Harabasz index

1 Introduction

Clustering is an unsupervised learning task that divides a data set according to similarity, so that objects in the same cluster are as similar as possible and objects in different clusters are as different as possible [1]. Unsupervised learning of this kind has been used in fields such as machine learning, image segmentation, and pattern recognition [2–7]. Image segmentation aims to separate an image into homogeneous regions with similar characteristics, i.e., regions that belong to the same object [8]. It is a complicated task because images differ widely in their characteristics.

One of the most studied algorithms for this task is Fuzzy C-means (FCM), which performs a fuzzy clustering analysis: each data point is assigned a degree of membership in the interval [0, 1] with respect to every group or region simultaneously, so that each data point belongs to each group to a certain degree [1, 9].

Several algorithms build on FCM to improve segmentation quality. FRFCM employs morphological reconstruction to smooth images [10]. AFCF [11] is inspired by image superpixels, the DP algorithm, and prior entropy-based fuzzy clustering. RSSFCA [12] uses regularization under a Gaussian metric to obtain sparse memberships that reduce non-homogeneous interference and improve classification, and it employs CCF-ADB to merge regions automatically. FCM_SICM [13] is another improved FCM with spatial and adaptive intensity constraints and membership linking.

The FCM algorithm expresses degrees of membership with precise values, a limitation that becomes particularly pronounced when dealing with imprecise values along the boundaries of regions. This issue extends to other theoretical extensions, including intuitionistic fuzzy sets, interval-valued fuzzy sets, and type-2 fuzzy sets, which share a common challenge: the degree of membership is hard to establish because there is a set of possible values, rather than a probability distribution or a single precise value [14]. In response to these limitations, Hesitant Fuzzy Sets (HFS) [15] emerge as an alternative. Proposed as a generalization of the fuzzy set, HFS describe situations where uncertainties arise during decision-making processes and offer a more flexible way to express preferences over objects. Unlike its counterparts, an HFS allows several possible membership values for a single element of a reference set [14, 16].

While traditional approaches to constructing HFS rely on expert input, determining the appropriate number of experts presents a challenge that significantly influences clustering performance [17]. Previous studies, such as the clustering algorithm PHFCM, have showcased superiority over state-of-the-art alternatives. However, their dependence on expert knowledge remains a limiting factor [18]. In this context, the proposed alternative aims to generate hesitant fuzzy information independently, reducing reliance on other algorithms and enhancing the adaptability of hesitant fuzzy sets. By addressing the deficiencies of FCM and other theoretical extensions, HFS emerges as a robust solution, offering a nuanced and flexible representation of uncertainty in scenarios where precise membership assignments prove challenging.

In this research, the theory of hesitant fuzzy sets is integrated into the field of clustering through a method called Fast and Automatic Hesitant Fuzzy Clustering (FAHFC). The emphasis is on its application to data clustering, specifically in the context of image segmentation. FAHFC has four main features: (1) The algorithm calculates the number of clusters c with the Calinski-Harabasz (CH) index, which eliminates the need for manual intervention and improves adaptability across diverse datasets. (2) FAHFC generates hesitant fuzzy information through fuzzy relations, in contrast to conventional methods that rely on other clustering algorithms or expert input. (3) An image reduction technique is incorporated to shorten execution time by reducing the number of iteration steps, which improves efficiency on large datasets. (4) Comparative analyses show that FAHFC outperforms some state-of-the-art methods in segmentation quality.

The remainder of this paper is organized as follows. Section 2 describes some concepts used in the formulation of the proposed algorithm. The mathematical justification of the proposed algorithm is detailed in Section 3. Section 4 presents the experimental results, and Section 5 gives the conclusions.

2 Background

This section briefly describes some basic concepts used for the formulation of the proposed algorithm, such as the theory of hesitant fuzzy sets and their distance measures, as well as the index used to select the number of clusters.

2.1 Hesitant fuzzy sets

The concept of hesitant fuzzy set was defined by Torra and Narukawa (2009) to deal with problems in which the membership of an element in a set is not clear; it can express this situation more completely than other extensions of the fuzzy set. An HFS allows several possible values instead of one precise value (fuzzy sets), a possibility distribution (type-2 fuzzy sets), or a margin of error (intuitionistic and interval-valued fuzzy sets). HFS are mostly used in decision making. They generalize the fuzzy set to better describe situations in which there are doubts when giving preferences over elements, by allowing several possible membership values for a single element of a reference set. Formally, they can be seen as fuzzy multisets [16].

Definition 1 [15]. Let X be a reference set; a hesitant fuzzy set on X is defined in terms of a function h that, when applied to X, returns a subset of [0, 1].

For better comprehension, an HFS can be written as [16]:

Z = \{ \langle x, h_{Z} (x) \rangle \mid x \in X \}

(1)

where h_Z (x) is a set of values in [0, 1] that denotes the possible membership degrees of the element x ∈ X in the set Z. h = h_Z (x) is called a Hesitant Fuzzy Element (HFE).
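As an illustration of Equation 1, the following minimal Python sketch stores an HFS as a mapping from each element to its HFE. The names `HesitantFuzzySet` and `make_hfs` and the sample values are hypothetical and not part of the original method; duplicates are kept, in line with the multiset view mentioned above.

```python
from typing import Dict, List

# A hesitant fuzzy set maps each element of the reference set X to a
# hesitant fuzzy element (HFE): a collection of candidate memberships in [0, 1].
HesitantFuzzySet = Dict[str, List[float]]


def make_hfs(memberships: Dict[str, List[float]]) -> HesitantFuzzySet:
    """Check that every membership lies in [0, 1] and sort each HFE."""
    hfs: HesitantFuzzySet = {}
    for element, values in memberships.items():
        if not values:
            raise ValueError(f"element {element!r} has an empty HFE")
        if any(v < 0.0 or v > 1.0 for v in values):
            raise ValueError(f"memberships of {element!r} must lie in [0, 1]")
        hfs[element] = sorted(values)
    return hfs


# Hypothetical example: three elements, each with three candidate memberships.
Z = make_hfs({
    "x1": [0.7, 0.2, 0.5],
    "x2": [0.9, 0.8, 0.85],
    "x3": [0.1, 0.3, 0.1],
})
```

Sorting each HFE is a convenience for later distance computations; it does not change the set of possible membership degrees.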

2.2 Hesitant fuzzy clustering to image segmentation

Let X be an image of size M × N. In image segmentation, the image is commonly transformed into a vector of the form X = {x_ij | i = 1, 2, …, n; j = 1, 2, …, D}, where n = M × N is the number of pixels and D is the number of image channels. The data must then be expressed as hesitant fuzzy information, so each pixel is first transformed to the fuzzy domain with:

μ (x_{ij}) = \frac{x_{ij} - \min (x_{j})}{\max (x_{j}) - \min (x_{j})}

(2)

where μ (x_ij) is the degree of membership of pixel x_i in channel j, and min (x_j) and max (x_j) are the minimum and maximum values of channel j.
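A minimal Python sketch of Equation 2, assuming the image has already been flattened into a list of n pixels with D channel values each. The function name `fuzzify` and the handling of a constant channel are assumptions made for illustration; Equation 2 leaves that case undefined.

```python
from typing import List


def fuzzify(pixels: List[List[float]]) -> List[List[float]]:
    """Min-max normalise each channel (column) of an n x D pixel list to [0, 1]."""
    n_channels = len(pixels[0])
    lows = [min(row[j] for row in pixels) for j in range(n_channels)]
    highs = [max(row[j] for row in pixels) for j in range(n_channels)]
    result = []
    for row in pixels:
        out = []
        for j in range(n_channels):
            span = highs[j] - lows[j]
            # Assumption: a constant channel (span == 0) is mapped to 0.0.
            out.append((row[j] - lows[j]) / span if span > 0 else 0.0)
        result.append(out)
    return result


# Three pixels with three channels; the third channel is constant.
mu = fuzzify([[0, 10, 255], [128, 20, 255], [255, 30, 255]])
```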

As stated in Definition 1, a hesitant fuzzy set is a function that, when applied to a reference set, returns a subset of [0, 1] [15]. It is expressed mathematically as A = {⟨x_i, h_A (x_i)⟩}, i = 1, 2, …, n, where h_A (x_i) is a set of values in [0, 1] denoting the possible membership degrees of each element x_i ∈ X. The function h = h_A (x_i) is a hesitant fuzzy element; according to the literature, it is defined by experts who provide preference information about a set of alternatives.

Image segmentation with a clustering algorithm requires minimizing an objective function, i.e., minimizing the distances between cluster centers and data, since the closer a data point is to a center, the higher its degree of membership. The hesitant fuzzy clustering objective function is:

\min J_{HFS} (X^{HFS}; U; V^{HFS}) = \sum_{i = 1}^{n} \sum_{j = 1}^{c} u_{ij}^{m} d^{2} (h_{i}, v_{j})

(3)

where X^HFS = {h₁, h₂, …, h_n} represents the n hesitant fuzzy elements, one per pixel, with h_i = h (x_i):

h (x_{i}) = \{ μ_{ij}^{k} \mid 0 \leq μ_{ij}^{k} \leq 1, k = 1, \dots, l \}, \forall x_{i} \in X

(4)

in which l is the number of possible membership degrees, u_ij is the degree of membership of the i-th element in the j-th cluster, and U = (u_ij)_c×n is the fuzzy membership matrix. c is the number of clusters (1 ≤ c ≤ n) and m is the fuzzifier parameter (m > 1). V^HFS are the cluster centers, also represented as hesitant fuzzy sets. Finally, d² (h_i, v_j) is the hesitant normalized Euclidean distance between the data and the cluster centers, defined in [21] and given in Equation 5.

d^{2} (A_{1}, A_{2}) = [\frac{1}{n} \sum_{i = 1}^{n} (\frac{1}{l_{x_{i}}} \sum_{j = 1}^{l_{x_{i}}} | h_{A_{1}}^{σ (j)} (x_{i}) - h_{A_{2}}^{σ (j)} (x_{i}) |^{2})]^{\frac{1}{2}}

(5)
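The distance in Equation 5 can be sketched in Python as follows. This is an illustration under two assumptions that the text does not state explicitly: the superscript σ(j) is read as "the j-th value after sorting each HFE", and both HFEs of a pair are assumed to have the same length. The function name `hesitant_distance` is hypothetical.

```python
import math
from typing import Sequence


def hesitant_distance(a1: Sequence[Sequence[float]],
                      a2: Sequence[Sequence[float]]) -> float:
    """Hesitant normalised Euclidean distance between two HFSs (Equation 5).

    a1 and a2 are lists of HFEs, one per element of the reference set.
    """
    if len(a1) != len(a2):
        raise ValueError("both HFSs must cover the same reference set")
    total = 0.0
    for h1, h2 in zip(a1, a2):
        if len(h1) != len(h2):
            raise ValueError("paired HFEs must have the same length")
        # Compare the values in sorted order (assumed meaning of sigma(j)).
        s1, s2 = sorted(h1), sorted(h2)
        total += sum((p - q) ** 2 for p, q in zip(s1, s2)) / len(s1)
    return math.sqrt(total / len(a1))


a1 = [[0.2, 0.4], [0.5, 0.5]]
a2 = [[0.4, 0.2], [0.1, 0.9]]
d = hesitant_distance(a1, a2)  # first pair matches after sorting
```

In the clustering setting, each call compares one pixel HFE with one center, which corresponds to n = 1 in the outer average.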

To minimize the objective function J_HFS, the membership degrees are optimized for fixed cluster parameters, then V^HFS is optimized for a fixed membership matrix, and this process is repeated until a stopping criterion is met. The optimal values at each of these steps are calculated with the parameter update equations. These formulas are obtained by setting the partial derivative of the Lagrangian of J_HFS (Equation 3) with respect to the parameter being optimized equal to zero [22]:

L = \sum_{i = 1}^{n} \sum_{j = 1}^{c} u_{ij}^{m} d^{2} (h_{i}, v_{j}) - \sum_{i = 1}^{n} λ_{i} (\sum_{j = 1}^{c} u_{ij} - 1)

(6)

The membership degree update equation is defined by:

\left. \begin{matrix} \frac{\partial L}{\partial u_{ij}} = 0, \\ \frac{\partial L}{\partial λ_{i}} = 0 \end{matrix} \right\} \Rightarrow u_{ij} = \frac{1}{\sum_{s = 1}^{c} (\frac{d^{2} (h_{i}, v_{j})}{d^{2} (h_{i}, v_{s})})^{\frac{2}{(m - 1)}}}

(7)

The centers of the groups are calculated from the following expressions:

\begin{matrix} \forall 1 \leq j \leq c, 1 \leq i \leq N, \\ \frac{\partial L}{\partial u_{1} (v_{j})} = \frac{\partial L}{\partial u_{2} (v_{j})} = \dots = \frac{\partial L}{\partial u_{l} (v_{j})} = 0 \end{matrix}

(8)

v_{j} = \frac{\sum_{i = 1}^{N} u_{ij}^{m} h_{i}}{\sum_{i = 1}^{N} u_{ij}^{m}}

(9)
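The alternating updates of Equations 7 and 9 can be sketched in Python as below. This is an illustration, not the authors' implementation: the ratio in Equation 7 is computed from the value returned by Equation 5 for a single pair of HFEs, the components of each HFE are assumed to be aligned by relation index k (no sorting), and the toy data and initial centers are invented.

```python
from typing import List

Vector = List[float]


def hfe_distance(h: Vector, v: Vector) -> float:
    """Equation 5 for one pair of equal-length HFEs (n = 1)."""
    return (sum((p - q) ** 2 for p, q in zip(h, v)) / len(h)) ** 0.5


def update_memberships(data: List[Vector], centers: List[Vector],
                       m: float = 2.0, eps: float = 1e-12) -> List[Vector]:
    """Equation 7: u_ij = 1 / sum_s (d_ij / d_is) ** (2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    u = []
    for h in data:
        # eps guards against division by zero when a pixel sits on a center.
        dists = [max(hfe_distance(h, v), eps) for v in centers]
        u.append([1.0 / sum((d_j / d_s) ** power for d_s in dists)
                  for d_j in dists])
    return u


def update_centers(data: List[Vector], u: List[Vector],
                   m: float = 2.0) -> List[Vector]:
    """Equation 9: membership-weighted mean of the HFEs, per component."""
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append([sum(w * h[k] for w, h in zip(weights, data)) / total
                        for k in range(len(data[0]))])
    return centers


# Invented toy data: four pixels, each with l = 3 membership values.
data = [[0.1, 0.2, 0.1], [0.15, 0.1, 0.2], [0.9, 0.8, 0.85], [0.8, 0.9, 0.9]]
centers = [[0.3, 0.3, 0.3], [0.6, 0.6, 0.6]]
for _ in range(20):
    u = update_memberships(data, centers)
    centers = update_centers(data, u)
labels = [row.index(max(row)) for row in u]
```

Each row of `u` sums to 1, which is the constraint enforced by the Lagrange multipliers in Equation 6.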

2.3 Calinski-Harabasz index

The Calinski-Harabasz index, introduced by Calinski and Harabasz in 1974 [23], measures how similar an object is to its own group (cohesion) compared to other groups (separation). Cohesion is a function of the distances from the data in a group to the group centroid, and separation is based on the distances from the group centroids to the global centroid [24, 25]. For K groups in a data set X = [x₁, x₂, …, x_n], the Calinski-Harabasz index is defined as:

CH = [\frac{\sum_{k = 1}^{K} n_{k} ∥ c_{k} - c ∥^{2}}{(K - 1)}] / [\frac{\sum_{k = 1}^{K} \sum_{i = 1}^{n_{k}} ∥ x_{i} - c_{k} ∥^{2}}{(n - K)}]

(10)

where n_k and c_k are the number of data points and the centroid of the k-th cluster, respectively, c is the centroid of the whole data set (i.e., the global centroid), and n is the total number of data points. A higher value of the Calinski-Harabasz index means that the clusters are dense and well separated, indicating that the clustering is better.
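A small Python sketch of Equation 10 for hard cluster labels follows. The function names and the toy points are hypothetical; the sketch assumes at least two clusters, fewer clusters than points, and a non-zero within-cluster dispersion.

```python
from typing import List, Sequence


def sqdist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a list of points."""
    return [sum(p[d] for p in points) / len(points)
            for d in range(len(points[0]))]


def calinski_harabasz(points: List[Sequence[float]], labels: List[int]) -> float:
    """Equation 10: between-cluster over within-cluster dispersion."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("the index needs 2 <= K < n")
    global_c = centroid(points)
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        c_k = centroid(members)
        between += len(members) * sqdist(c_k, global_c)
        within += sum(sqdist(p, c_k) for p in members)
    return (between / (k - 1)) / (within / (n - k))


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
good = calinski_harabasz(points, [0, 0, 1, 1])  # compact, separated clusters
bad = calinski_harabasz(points, [0, 1, 0, 1])   # clusters mix both groups
```

On this toy data the partition that matches the two visible groups scores far higher than the mixed one, which is the behaviour the index is used for when choosing the number of clusters.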

3 Proposed method

In this paper, a color image segmentation method is proposed, as shown in Fig. 1. The method has three main stages: 1) the image X is filtered and reduced to a smaller image f, on which the number of clusters C is found with the Calinski-Harabasz (CH) index and the hesitant fuzzy matrix of f, H_f, is obtained; 2) the matrix H_f is used to calculate the centroids; 3) these centroids are passed on as the initial centroids for segmenting the image X, which requires calculating the hesitant fuzzy matrix H_x.

Fig. 1

Process of the Fast and Automatic Hesitant Fuzzy Clustering algorithm for image segmentation.

According to the literature, no database contains hesitant fuzzy information, since this information is given by experts. This article proposes an alternative for generating it by means of membership functions based on fuzzy relations, described below:

–

Homogeneity, computed within a neighborhood of 3 × 3, is considered a fuzzy set and its membership function is defined as:

μ_{h} = 1 - \frac{G^{\max, l} - G^{\min, l}}{G^{\max, g} - G^{\min, g}}

(11)

where G^min,l and G^max,l are the local minimum and maximum gray levels, and G^min,g and G^max,g are the global minimum and maximum gray levels.

–

Color space. A transformation to the IJK color space is performed, which is a rotation of the RGB color space. First, each pixel of the image is transformed to the fuzzy domain using:

μ (x_{i}) = \frac{x_{i} - \min (x)}{\max (x) - \min (x)}

(12)

Then, the conversion from RGB to IJK color space is performed with the following equations:

I = \frac{R + G + B}{3}

(13)

J = \frac{2 R - G - B}{2 \sqrt{3}}

(14)

K = \frac{G - B}{\sqrt{6}}

(15)

–

Textures. Statistical approaches are used to describe the texture of the images, so that the algorithm remains simple to implement. The texture descriptors used by the algorithm are the following. Fuzzy entropy, which quantifies the uncertainty of the image content, is defined by:

H_{\log} (G) = \frac{1}{MN \ln 2} \sum_{m, n} S_{n} (μ_{mn})

(16)

where

S_{n} (μ_{mn}) = - μ_{mn} \ln (μ_{mn}) - (1 - μ_{mn}) \ln (1 - μ_{mn})

(17)

Standard deviation, used because its values are usually more intuitive, is described by:

σ (z) = \sqrt{\sum_{i = 0}^{L - 1} (z_{i} - m)^{2} p (z_{i})}

(18)

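where z_i is the i-th intensity level, p (z_i) is its normalized histogram value, L is the number of intensity levels, and m is the mean value of z (average intensity):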

m = \sum_{i = 0}^{L - 1} z_{i} p (z_{i})

(19)
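The two texture descriptors can be sketched in Python as follows. The function names are hypothetical, the Shannon term is taken as 0 at memberships of exactly 0 or 1 (the limit of the expression), and the standard deviation uses the usual exponent 2; these are assumptions of the sketch rather than details given in the text.

```python
import math
from typing import List, Tuple


def shannon(mu: float) -> float:
    """Equation 17; taken as 0 at mu = 0 or mu = 1 (limit of the expression)."""
    if mu <= 0.0 or mu >= 1.0:
        return 0.0
    return -mu * math.log(mu) - (1.0 - mu) * math.log(1.0 - mu)


def fuzzy_entropy(mu: List[List[float]]) -> float:
    """Equation 16: average Shannon term over the M x N memberships.

    Dividing by ln 2 scales the result so that its maximum is 1.
    """
    rows, cols = len(mu), len(mu[0])
    total = sum(shannon(v) for row in mu for v in row)
    return total / (rows * cols * math.log(2))


def intensity_stats(pixels: List[int], levels: int = 256) -> Tuple[float, float]:
    """Equations 18 and 19: mean and standard deviation from the histogram."""
    counts = [0] * levels
    for z in pixels:
        counts[z] += 1
    p = [c / len(pixels) for c in counts]  # normalised histogram p(z_i)
    mean = sum(z * p[z] for z in range(levels))
    variance = sum((z - mean) ** 2 * p[z] for z in range(levels))
    return mean, math.sqrt(variance)


most_uncertain = fuzzy_entropy([[0.5, 0.5], [0.5, 0.5]])
crisp = fuzzy_entropy([[0.0, 1.0]])
mean, sigma = intensity_stats([0, 0, 2, 2], levels=3)
```

A membership of 0.5 everywhere gives the largest entropy and a crisp image gives zero, which matches the reading of fuzzy entropy as a measure of uncertainty.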

As previously mentioned, the hesitant fuzzy matrix contains a set of values in [0, 1] that denote the possible degrees of membership of an element in the set. Therefore, the proposal is to make use of the three described relations to initialize this matrix, as indicated in Equation 4 where k represents each of the l established relations (homogeneity, color and textures).
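For the homogeneity and color relations (Equations 11 and 13 to 15), a minimal Python sketch is given below. The function names are hypothetical, and two choices are assumptions of the sketch: the 3 × 3 window is clipped at the image border, and a completely flat image is treated as fully homogeneous.

```python
import math
from typing import List, Tuple


def homogeneity(gray: List[List[float]]) -> List[List[float]]:
    """Equation 11: 1 - (local range in a 3 x 3 window) / (global range)."""
    rows, cols = len(gray), len(gray[0])
    g_min = min(min(row) for row in gray)
    g_max = max(max(row) for row in gray)
    span = g_max - g_min
    out = [[1.0] * cols for _ in range(rows)]
    if span == 0:
        return out  # assumption: a flat image is fully homogeneous
    for r in range(rows):
        for c in range(cols):
            # Assumption: the window is clipped at the image border.
            window = [gray[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = 1.0 - (max(window) - min(window)) / span
    return out


def rgb_to_ijk(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Equations 13 to 15, applied to one pixel as printed in the text."""
    i = (r + g + b) / 3.0
    j = (2.0 * r - g - b) / (2.0 * math.sqrt(3.0))
    k = (g - b) / math.sqrt(6.0)
    return i, j, k


gray = [[10, 10, 10, 200], [10, 10, 10, 200], [10, 10, 10, 200]]
mu_h = homogeneity(gray)           # low values next to the vertical edge
grey_pixel = rgb_to_ijk(0.5, 0.5, 0.5)
red_pixel = rgb_to_ijk(1.0, 0.0, 0.0)
```

Note that J and K can be negative for inputs in [0, 1], so a further rescaling would be needed before using them as membership degrees; the text does not specify one.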

Algorithm 1 summarizes the implementation of this method. Its input is a color image, which is first filtered; the experiments use the median filter because it outputs pixel values actually present in the image and is less sensitive to extreme values. The image is then reduced (Algorithm 2) to find the number of groups into which the image will be segmented and the initial centroids, both computed on the reduced image f. In steps 3 and 4, the image X is transformed to the hesitant fuzzy domain and the hesitant fuzzy matrix is initialized with the alternative described above, using the fuzzy homogeneity, color and texture relations. Finally, an iterative process updates the group centroids and the fuzzy membership matrix to obtain the segmented image.

Algorithm 1 Fast and Automatic Hesitant Fuzzy Clustering

Require: Image X (M, N)

Ensure: Segmented image Y (M, N)

1: Set m and ɛ (m = 2 and ɛ = 0.00001 as default)

2: Reduction (Algorithm 2)

3: X_HFS ← X

4: H_X ← h (x) ∈ [0, 1]

5: while $∥ u_{ij}^{a} - u_{ij}^{a - 1} ∥ \geq ɛ$ do

6: $v_{j} \leftarrow \frac{\sum_{i = 1}^{N} u_{ij}^{m} h_{i}}{\sum_{i = 1}^{N} u_{ij}^{m}}$

7: $u_{ij} \leftarrow \frac{1}{\sum_{s = 1}^{c} (\frac{d^{2} (h_{i}, v_{j})}{d^{2} (h_{i}, v_{s})})^{\frac{2}{(m - 1)}}}$

8: end while

9: $Y \leftarrow \arg\max_{j = 1, \dots, c} u_{ij}$

10: return Y

Algorithm 2 Reduction

Require: X (M, N)

Ensure: v_j

1: $\hat{X} \leftarrow$ Median filter{X} (3 × 3 neighborhood)

2: $F (p, q) \leftarrow \hat{X} (M, N)$

3: F^HFS ← F

4: Find number of clusters (Algorithm 3)

5: H_F ← h (f) ∈ [0, 1]

6: v_j ← rand ([0, 1])

7: $u_{ij} \leftarrow \frac{1}{\sum_{s = 1}^{c} (\frac{d^{2} (h_{i}, v_{j})}{d^{2} (h_{i}, v_{s})})^{\frac{2}{(m - 1)}}}$

8: while $∥ u_{ij}^{a} - u_{ij}^{a - 1} ∥ \geq ɛ$ do

9: $v_{j} \leftarrow \frac{\sum_{i = 1}^{N} u_{ij}^{m} h_{i}}{\sum_{i = 1}^{N} u_{ij}^{m}}$

10: $u_{ij} \leftarrow \frac{1}{\sum_{s = 1}^{c} (\frac{d^{2} (h_{i}, v_{j})}{d^{2} (h_{i}, v_{s})})^{\frac{2}{(m - 1)}}}$

11: end while

12: return v_j
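The first two steps of Algorithm 2 can be sketched in Python as below. The median filter follows the 3 × 3 neighborhood stated in step 1; the reduction itself is not specified in the text, so `subsample` with a fixed step is a purely hypothetical stand-in. The lower median is used so that every output value is a pixel value present in the window, and the window is clipped at the border (an assumption).

```python
from statistics import median_low
from typing import List

Image = List[List[int]]


def median_filter_3x3(img: Image) -> Image:
    """Replace each pixel by the (lower) median of its 3 x 3 neighbourhood."""
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Assumption: the window is clipped at the image border.
            window = [img[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(median_low(window))
        out.append(out_row)
    return out


def subsample(img: Image, step: int = 2) -> Image:
    """Hypothetical reduction: keep every `step`-th row and column."""
    return [row[::step] for row in img[::step]]


noisy = [[10, 10, 10, 10],
         [10, 10, 255, 10],
         [10, 10, 10, 10],
         [10, 10, 10, 10]]
filtered = median_filter_3x3(noisy)   # the isolated outlier is removed
reduced = subsample(filtered, step=2)
```

Running the clustering loop on `reduced` touches a quarter of the pixels when `step` is 2, which is the kind of saving the reduction stage relies on.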

As mentioned before, the number of clusters is determined by the Calinski-Harabasz index described in Equation 10. The index is calculated on the clustering result of the FCM algorithm for 2 to n clusters, and the number of clusters with the highest index value is selected; the process is detailed in Algorithm 3.

Algorithm 3 Find number of clusters

Require: Dataset X

Ensure: C

1: Set n (n = 15 as default) and $\max_{value} \leftarrow 2$

2: for i ← 2 to n do

3: clusters (i) ← FCM (X, i)

4: CH (i) ← Equation 10

5: if $CH (i) > CH (\max_{value})$ then

6: $\max_{value} \leftarrow i$

7: end if

8: end for

9: $C \leftarrow \max_{value}$

10: return C
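The selection loop of Algorithm 3 can be sketched in Python for one-dimensional data as follows. The tiny FCM with evenly spaced initial centers is an illustrative stand-in rather than the implementation used in the experiments, partitions with an empty cluster are skipped (an assumption), and all names and the toy data are hypothetical.

```python
from typing import List


def fcm_labels(data: List[float], c: int, m: float = 2.0,
               iters: int = 50) -> List[int]:
    """Tiny 1-D FCM with evenly spaced initial centers (illustrative only)."""
    lo, hi = min(data), max(data)
    centers = [lo + (hi - lo) * (j + 0.5) / c for j in range(c)]
    power = 2.0 / (m - 1.0)
    u: List[List[float]] = []
    for _ in range(iters):
        u = []
        for x in data:
            d = [max(abs(x - v), 1e-12) for v in centers]
            u.append([1.0 / sum((dj / ds) ** power for ds in d) for dj in d])
        centers = [sum(row[j] ** m * x for row, x in zip(u, data))
                   / sum(row[j] ** m for row in u) for j in range(c)]
    return [row.index(max(row)) for row in u]


def ch_index(data: List[float], labels: List[int], k: int) -> float:
    """Equation 10 for 1-D data; degenerate partitions score -inf."""
    n = len(data)
    groups = [[x for x, lab in zip(data, labels) if lab == j] for j in range(k)]
    if k >= n or any(not g for g in groups):
        return float("-inf")
    mean = sum(data) / n
    between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups)
    within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def find_number_of_clusters(data: List[float], max_clusters: int = 15) -> int:
    """Algorithm 3: keep the cluster count with the highest CH value."""
    best_c, best_score = 2, float("-inf")
    for c in range(2, min(max_clusters, len(data) - 1) + 1):
        score = ch_index(data, fcm_labels(data, c), c)
        if score > best_score:
            best_c, best_score = c, score
    return best_c


# Toy data with three well-separated groups; the range is capped at 5
# because the data set has only nine points.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
C = find_number_of_clusters(data, max_clusters=5)
```

Starting the loop at 2 matters because Equation 10 divides by K − 1 and is therefore undefined for a single cluster.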

3.1 Metrics

The proposed algorithm is applied to image segmentation. The metrics considered to measure segmentation performance are:

–
Probabilistic Rand Index (PRI) [33], which measures the similarity between the segmented images and the ground truth.

$PRI (S, G_{k}) = \frac{2}{N (N - 1)} \sum_{i, j, i < j} (p_{ij}^{c_{ij}} (1 - p_{ij})^{1 - c_{ij}})$
(20)
where S is the segmentation produced by the tested algorithm, N is the number of pixels, c_ij is a Boolean function denoting whether $l_{i}^{S} = l_{j}^{S}$ , p_ij is the expected value of a Bernoulli distribution for the pixel pair, $l_{i}^{G_{k}}$ is the label of pixel x_i in the k-th manually segmented image, and $l_{i}^{S}$ is the label of pixel x_i in the tested segmentation. The ground truth set is defined as {G₁, G₂, …, G_L}, where L is the number of manually segmented images.
–
Variation of information (VOI) [34] is defined by:

$VOI (S, G_{k}) = H (S) + H (G_{k}) - 2 I (S, G_{k})$
(21)
where H is the entropy $H (S) = - \sum_{i = 1}^{c} \frac{n_{i}}{n} \log \frac{n_{i}}{n}$ , n_i is the number of points belonging to the i-th cluster, $I (S, G_{k}) = \sum_{i = 1}^{c} \sum_{j = 1}^{c} \frac{n_{i, j}}{n} \log \frac{n_{i, j} / n}{(n_{i} / n) (n_{j} / n)}$ is the mutual information between the two clusterings, and n_i,j is the number of points in the intersection of cluster i of S and cluster j of G_k.
–
Global consistency error (GCE) [32], which evaluates the extent to which one segmentation can be considered a refinement of the other.

$GCE (S, G_{k}) = \frac{1}{n} \min (\sum_{i = 1}^{n} E (S, G_{k}, x_{i}), \sum_{i = 1}^{n} E (G_{k}, S, x_{i}))$
(22)
where $E (S, G_{k}, x_{i}) = \frac{| R (S, x_{i}) ∖ R (G_{k}, x_{i}) |}{| R (S, x_{i}) |}$ is an error measure for each pixel x_i, | · | is the cardinality, ∖ is the set difference and R (S, x_i) is the set of pixels in the region of segmentation S that contains the pixel x_i.
–
Boundary displacement error (BDE) [35], which evaluates the average pixel displacement error at the edges between two segmented images by calculating the distance between each boundary pixel and the nearest boundary pixel in the other segmentation.
$BDE (S, G_{k}) = \frac{1}{2} (D_{S}^{G_{k}} + D_{G_{k}}^{S})$
(23)
where $D_{S}^{G_{k}}$ is a distance distribution signature obtained by adding the distances over all points of S.
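Two of the segmentation metrics above, VOI (Equation 21) and GCE (Equation 22), can be sketched in Python for flat label lists as follows. The function names are hypothetical, natural logarithms are assumed, and the mutual information uses the standard form in which the joint frequency is divided by the product of the marginals.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set


def entropy(labels: Sequence[int]) -> float:
    """H = -sum (n_i / n) log (n_i / n) over the clusters."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(s: Sequence[int], g: Sequence[int]) -> float:
    """I(S, G) from the joint and marginal cluster frequencies."""
    n = len(s)
    cs, cg, joint = Counter(s), Counter(g), Counter(zip(s, g))
    return sum((nij / n) * math.log((nij / n) / ((cs[i] / n) * (cg[j] / n)))
               for (i, j), nij in joint.items())


def voi(s: Sequence[int], g: Sequence[int]) -> float:
    """Equation 21: H(S) + H(G) - 2 I(S, G)."""
    return entropy(s) + entropy(g) - 2.0 * mutual_information(s, g)


def regions(labels: Sequence[int]) -> List[Set[int]]:
    """For each pixel index, the set of pixel indices sharing its label."""
    groups: Dict[int, Set[int]] = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(idx)
    return [groups[lab] for lab in labels]


def gce(s: Sequence[int], g: Sequence[int]) -> float:
    """Equation 22: the smaller of the two summed refinement errors, over n."""
    n = len(s)
    rs, rg = regions(s), regions(g)
    e_sg = sum(len(rs[i] - rg[i]) / len(rs[i]) for i in range(n))
    e_gs = sum(len(rg[i] - rs[i]) / len(rg[i]) for i in range(n))
    return min(e_sg, e_gs) / n


same = voi([0, 0, 1, 1], [1, 1, 0, 0])        # same partition, relabelled
unrelated = voi([0, 0, 1, 1], [0, 1, 0, 1])   # independent partitions
refinement = gce([0, 0, 0, 0], [0, 0, 1, 1])  # G refines S
crossing = gce([0, 0, 1, 1], [0, 1, 0, 1])
```

For both metrics, lower values indicate closer agreement with the ground truth; GCE is zero whenever one segmentation is a refinement of the other.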

In addition, the following metrics are used to evaluate clustering quality:
–
Adjusted Rand Index (ARI) [36, 37], which computes a similarity measure between two clusterings by considering all pairs of samples and counting the pairs that are assigned to the same or to different clusters in both the clustering result and the ground truth.

$ARI = \frac{RI - E [RI]}{\max (RI) - E [RI]}$
(24)
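where RI is the unadjusted Rand Index, described by:
$RI = \frac{a + b}{C_{2}^{n}}$
(25)
where a is the number of sample pairs placed in the same cluster in both partitions, b is the number of pairs placed in different clusters in both partitions, and $C_{2}^{n}$ is the total number of pairs. The ARI is at most 1: it equals 1 only if the partition is identical to the intrinsic structure, is close to 0 for a random partition, and can be negative when the agreement is worse than expected by chance.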
–
Accuracy (Acc), which finds the one-to-one relationship between clusters and classes. Let l_i and $\hat{l_{i}}$ be the clustering result and the ground-truth label of x_i, respectively [38].

$Acc = \frac{\sum_{i = 1}^{n} δ (\hat{l_{i}}, map (l_{i}))}{n}$
(26)
where n is the number of samples, the delta function δ (x, y) is equal to one if and only if x = y and zero otherwise, and map (·) is a permutation mapping function that maps each cluster index to a class label.
–
Normalized mutual information (NMI) measures the clustering quality [38].

$NMI (L, \hat{L}) = \frac{\sum_{l \in L, \hat{l} \in \hat{L}} p (l, \hat{l}) \log (\frac{p (l, \hat{l})}{p (l) p (\hat{l})})}{\max (H (L), H (\hat{L}))}$
(27)
where p (l) and $p (\hat{l})$ represent the marginal probability distribution functions of L and $\hat{L}$ , respectively, induced from the joint distribution $p (l, \hat{l})$ of L and $\hat{L}$ . H (·) is the entropy function. A higher NMI value indicates better clustering performance.
–
Purity, which measures the degree to which the groups contain a single class [38].

$Purity = \sum_{i = 1}^{c} \frac{n_{i}}{n} P (C_{i}), \quad P (C_{i}) = \frac{1}{n_{i}} \max_{j} (n_{i}^{j})$
(28)
where n_i is the number of points in the C_i cluster and $n_{i}^{j}$ is the number of points of the i-th cluster that belong to the j-th category. There are c categories in total and higher purity indicates better clustering performance.
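The ARI and purity can be sketched in Python as below. The ARI is computed in its usual pair-counting form over the contingency table, which corresponds to Equation 24 with RI expressed in pair counts; this form, the function names and the toy labels are assumptions of the sketch, and `math.comb` requires Python 3.8 or later.

```python
from collections import Counter
from math import comb
from typing import Sequence


def adjusted_rand_index(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Pair-counting ARI: (index - expected) / (maximum - expected)."""
    n = len(pred)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(pred, truth)).values())
    sum_a = sum(comb(c, 2) for c in Counter(pred).values())
    sum_b = sum(comb(c, 2) for c in Counter(truth).values())
    expected = sum_a * sum_b / comb(n, 2)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        return 1.0  # assumption: both partitions are trivial and identical
    return (sum_ij - expected) / (maximum - expected)


def purity(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Equation 28: share of points that carry their cluster's majority class."""
    total = 0
    for cluster in set(pred):
        members = [t for p, t in zip(pred, truth) if p == cluster]
        total += max(Counter(members).values())
    return total / len(pred)


identical = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
pur = purity([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
```

The `crossed` case is negative, which illustrates that the ARI can fall below 0 when the agreement is worse than chance.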

4 Experiments and results

In this section, the performance of the proposed method is demonstrated through three experiments. 1) First, the execution time of each algorithm is measured to expose the advantage of the proposed method. 2) Then, the quality of clustering on datasets from the UCI repository [26] is evaluated, comparing the results with algorithms from the literature such as FCM [27], Kmediod [28], HPSOFCM [29], FRFCM [10], KMM [30] and FCM-ELPSO [31]; in addition, it is verified whether the algorithm correctly finds the number of clusters in each dataset. 3) Finally, the segmentation quality is quantified with metrics and compared objectively with other state-of-the-art algorithms: Fast and Robust Fuzzy C-Means Clustering (FRFCM) [10], Automatic Fuzzy Clustering Framework for Image Segmentation (AFCF) [11], Robust Self-Sparse Fuzzy Clustering (RSSFCA) [12], FCM with adaptive spatial & intensity constraint and membership linking (FCM_SICM) [13] and Parallel Hesitant Fuzzy C-Means (PHFCM) [18].

The proposed method and comparative algorithms were implemented in MATLAB R2021a, on a PC equipped with a quad-core Intel Core i7-4510U @ 2.00 GHz and 8 GB of RAM. The operating system was Ubuntu 18.04.6.

4.1 Databases

The evaluation of this algorithm was performed with images from the Berkeley Segmentation Data Set 500 (BSDS500) database [32]; the image size is 481 × 321 pixels. Figure 2 presents some of the images used in the evaluation. Most of the images in this database are complex because they contain different textures and lighting changes.

Fig. 2

Sample images from the database.

The datasets obtained from the UCI repository are summarized in Table 1. They were selected to cover different numbers of instances, features and classes; the number of classes of each dataset serves as the reference for the evaluation.

Table 1

Real datasets.

Dataset	Instances	Features	Classes
Lung-cancer	32	56	3
Iris	150	4	3
Wine	178	13	3
Seeds	210	7	3
Ecoli	336	7	8
WDBC	569	30	2
Yeast	1484	8	10
Banknote	1372	5	2
ImgSeg	2310	19	7

4.2 Experiment 1: Runtime test

The results of this experiment are presented in Table 2. The AFCF and RSSFCA algorithms show the highest number of iterations because this number is fixed by default at the start of each algorithm; consequently, their execution time is also high. Another important fact is that although the FCM algorithm is the fastest, it requires more iterations (20) than the proposed method (5).

Table 2
Average execution performance

Algorithm	Time (s)	Iterations
FCM	0.015	20
FRFCM	2.746	36
AFCF	2.306	50
RSSFCA	61.994	50
FCM_SICM	0.803	15
PHFCM	0.439	6
FAHFC	0.387	5

The proposed method converges in fewer iterations and in less time than most of the algorithms it is compared with; it even performs better than the PHFCM algorithm. This is due to the second stage, where the initial centroids are calculated on a reduction of the original image: since the image is smaller, the iterative process that updates the centroids and the membership matrix takes less time. The centroids obtained are then used in the third stage and, because they are already close to the optimal values, the algorithm converges in fewer iterations, 5 on average. This approximation to the final centroids in the second stage allows the proposed algorithm to improve the segmentation quality with respect to the FCM algorithm, at the cost of additional processing time.

4.3 Experiment 2: Clustering quality evaluation

One of the main contributions of this article is that the algorithm automatically determines the number of clusters into which the data will be grouped, solving the initialization problem. Figure 3 details the results obtained by the CH index for each of the databases. The index correctly finds the number of clusters for each database: in each graph, the maximum value of the index identifies the number of clusters into which the database has to be clustered.

Fig. 3

CH index result for each database.

After verifying that the algorithm is capable of determining the number of groups, the quality of the grouping was evaluated. The average value of each metric is summarized in Table 3, where the proposed algorithm (listed as AHFCM) obtains the best average in all four metrics selected to evaluate clustering quality. To visualize this evaluation, Fig. 4 presents the results of the metrics for each of the databases; the proposed algorithm obtains the highest values in most of the databases, corroborating the quantitative results.

Fig. 4

Quantitative results graphs.

Table 3

Average result of each algorithm

Algorithm	ARI	ACC	NMI	Purity
FCM	0.476	0.645	0.446	0.677
Kmediod	0.730	0.636	0.448	0.684
HPSOFCM	0.726	0.640	0.397	0.681
FRFCM	0.734	0.525	0.438	0.614
KMM	0.564	0.553	0.305	0.575
FCM-ELPSO	0.749	0.659	0.430	0.709
AHFCM	0.756	0.669	0.453	0.713

4.4 Experiment 3: Color image segmentation evaluation

The average result of each metric is summarized in Table 4, where the proposed FAHFC algorithm shows an advantage over the other algorithms in most of the metrics. For example, it is superior to the PHFCM algorithm in the PRI and GCE metrics, with differences of 0.02 and 0.006, respectively, while in the VOI metric it has a difference of 0.12 over RSSFCA. The exception is BDE, where AFCF obtains the lowest value. The performance of each algorithm is shown as a graph in Fig. 5.

Table 4
Average result of image segmentation

Algorithm	PRI	VOI	GCE	BDE
FCM	0.769	1.147	0.210	9.563
FRFCM	0.833	0.855	0.150	7.969
AFCF	0.868	0.735	0.132	7.113
RSSFCA	0.875	0.663	0.106	9.135
FCM_SICM	0.802	0.966	0.181	14.388
PHFCM	0.901	0.735	0.103	10.209
FAHFC	0.923	0.546	0.097	9.478

Fig. 5

Quantitative results.

Finally, a subjective evaluation is presented in Fig. 6, which shows the segmentation of some sample images by the different algorithms so that their performance can be compared. The figure also gives the parameter c, the number of regions into which each image is segmented, and the ground truth of each image.

Most of the images shown are difficult to segment because of their textures and illumination changes. Nevertheless, the proposed FAHFC algorithm outperforms the other algorithms in the comparison: its results are visually more similar to the ground truth, with more homogeneous and better delimited regions. The figure also reveals the difference between PHFCM and the proposed FAHFC. Although both are based on the theory of hesitant fuzzy sets, they do not generate the same results; the advantage lies with FAHFC, which uses three different fuzzy relations to initialize the hesitant fuzzy matrix and achieves better performance, while PHFCM still uses FCM.

Fig. 6

Segmentation result of some sample images.

5 Conclusion

In summary, this research has introduced a method for color image segmentation based on hesitant fuzzy sets (HFS) and on fuzzy relations for initializing the hesitant fuzzy matrix. The FAHFC (Fast and Automatic Hesitant Fuzzy Clustering) algorithm, which uses the Calinski-Harabasz (CH) index, has proved effective in reducing processing time through image reduction and centroid reuse, leading to convergence in a small number of iterations. Compared with the baseline FCM algorithm, the approach involves a slight sacrifice in processing time, but this trade-off is justified by the notable improvement in segmentation quality. The experiments show that the proposed FAHFC method is faster than PHFCM and also improves on its segmentation quality. While both approaches use HFS for clustering, the distinguishing factor lies in the initialization of the hesitant fuzzy matrix, where FAHFC uses fuzzy relations and PHFCM employs the FCM algorithm. Furthermore, the metric results indicate the superiority of the proposed algorithm over other state-of-the-art methods, both in image segmentation and in the clustering of data obtained from the UCI repository.

Regarding future research directions, it is important to expand the comparative studies to include other types of fuzzy techniques, such as intuitionistic fuzzy sets, possibilistic fuzzy sets, type-2 fuzzy sets and interval-valued fuzzy sets. This extension aims to provide a comprehensive understanding of how hesitant fuzzy sets compare with and complement different fuzzy methodologies in the context of image segmentation. Given the trade-off between processing time and segmentation quality with respect to the widely used FCM algorithm, future investigations will also focus on fine-tuning the algorithm to optimize processing time without compromising segmentation precision. Furthermore, it is planned to explore the incorporation of different types of textures during the initialization of the hesitant fuzzy matrix, anticipating that this improvement will allow the algorithm to segment images degraded by noise and contribute to its adaptability in real-world scenarios with various visual complexities.

In conclusion, this research not only advances the theoretical understanding of hesitant fuzzy sets in image segmentation, but also emphasizes the practical implications and potential real-world applications of the proposed methodology. The future research directions described aim to further refine and expand the capabilities of the algorithm, contributing to the continued evolution of fuzzy clustering methodologies in the field of image processing and pattern recognition.

Acknowledgments

The authors express their gratitude to CONACYT, as well as to the Tecnológico Nacional de México/CENIDET for their support of this research through the CONACYT National Scholarships.

Conflict of interest

The authors declare that they have no conflict of interest.



Table 2. Average execution performance

Algorithm    Time (s)    Iterations
FCM             0.015            20
FRFCM           2.746            36
AFCF            2.306            50
RSSFCA         61.994            50
FCM_SICM        0.803            15
PHFCM           0.439             6
FAHFC           0.387             5

Table 4. Average result of image segmentation

Algorithm      PRI      VOI      GCE       BDE
FCM          0.769    1.147    0.210     9.563
FRFCM        0.833    0.855    0.150     7.969
AFCF         0.868    0.735    0.132     7.113
RSSFCA       0.875    0.663    0.106     9.135
FCM_SICM     0.802    0.966    0.181    14.388
PHFCM        0.901    0.735    0.103    10.209
FAHFC        0.923    0.546    0.097     9.478