Aggregation similarity measure based on intuitionistic fuzzy closeness degree and its application to clustering analysis

Abstract

To distinguish different intuitionistic fuzzy sets (IFSs) effectively, we generalize the asymmetric relative entropy between IFSs into a distance measure with higher discernment. Next, a formula for attribute weights is derived from an optimization model based on TOPSIS, using the relative closeness degree constructed from this relative entropy. Then, we propose a similarity formula with strong discernibility and two co-correlation degree formulas from the viewpoint of probability theory, and prove that they share the traits of the traditional correlation coefficient. To make full use of the three similarity measures presented in this paper, we aggregate them and derive a synthetic similarity formula. Finally, the derived formula is used for clustering analysis under intuitionistic fuzzy (IF) information, and its effectiveness and superiority are verified through a detailed comparison with the clustering results obtained by other clustering algorithms.

Keywords

Intuitionistic fuzzy set, relative entropy, TOPSIS, similarity measure, clustering analysis

1 Introduction

In 1965, Zadeh [1] pioneered the concept of the fuzzy set (FS), which captures the uncertain information of the real world by means of a membership function. In 1986, Atanassov [2, 3] developed the intuitionistic fuzzy set (IFS) from the FS by adding the non-membership degree as another parameter, which in turn generates a third argument, the hesitancy degree. Compared with the FS, the IFS is more flexible and powerful in dealing with the uncertainty and complexity of practical matters. Entropy and correlation are two important information measures in information theory. In 1948, Shannon [4] first proposed the concept of information entropy, borrowed from the thermal entropy of thermodynamics, and gave its computational formula to quantify information. Subsequently, Kullback and Leibler [5] put forward the concept of relative entropy, based on information entropy, to describe the difference between two probability distributions. In 1997, Shang and Jiang [6] first extended the entropy measure to fuzzy theory in order to measure the uncertain information of fuzzy sets. Vlachos and Sergiadis [7] generalized it to IF information, discussed the relation between entropy and relative entropy, and applied it to pattern recognition, medical diagnosis and image segmentation. Since then, entropy and relative entropy have been crucial research topics in fuzzy theory, and the results have been applied to many fields such as data mining, pattern recognition, clustering analysis, decision making, granular computing and image processing [16–21]. Gerstenkorn and Manko [22] generalized the concept of the correlation coefficient in statistics to IFSs, and Hong and Hwang [23] defined it in a probability space. The correlation coefficient of IFSs has also been calculated by means of the "centroid" [24]. After that, correlation coefficients were extended to interval-valued IFSs [25, 26] by other scholars. It is with these uncertainty measures that IF information is granulated, leading to the formation of granular structures and granular spaces.

Cluster analysis is a process that divides the data of a dataset into multiple clusters at a certain granularity level, according to the similarity among samples. With the arrival of the information age, and especially with the trend toward big data, it has become an important analytic tool for data preprocessing in data mining, used to obtain the distribution of data, and it attracts more and more attention. For different fuzzy environments, a variety of clustering algorithms have been proposed to deal with different types of fuzzy data, such as IF clustering algorithms [27–29] and type-2 fuzzy clustering algorithms [30, 31]. In the IF setting, Xu et al. [27] used the derived correlation coefficient as a similarity measure and clustered the sample data through the transitive closure method, but the classification result is not fine enough and not very consistent with practice. Wang et al. [28] defined a novel closeness degree of two IFSs by modifying previous findings and then clustered with the netting method in the IF context, which handles and analyzes data directly on the derived correlation matrix. This method is simple to operate and intuitive for readers, so it yields results quickly for small samples. However, the operation is rough, and it becomes confusing or even infeasible when the sample size gets large. Moreover, it is very difficult to implement in a computer program.

To solve these problems, we make several improvements. The rest of this work is organized as follows. Section 2 reviews the basic content related to IFSs. In Section 3, we develop the relative entropy of two arbitrary IFSs and establish its properties. Section 4 establishes a nonlinear programming model on the basis of TOPSIS [33] with respect to the relative entropy closeness degree to obtain the formula of attribute weights. Section 5 defines three similarity measures for IFSs and proves their respective properties. In Section 6, we apply the notion of aggregation to similarity measures for IFSs and use the GWAA operator to aggregate the three kinds of similarity degrees, which makes the derived similarity of IFSs more comprehensive and effective. Finally, the association matrix is built from the aggregated association degree and is quickly converted into an equivalent matrix by means of the "square" method. On this basis, we carry out clustering analysis with IF information in Section 7.

2 Preliminaries

In this section, we briefly recall some necessary concepts and operations relating to IFSs, which will be needed hereinafter.

Definition 2.1. ([2]) Let X be a non-empty set called the universe of discourse. A set of triples A = {〈x, μ_A (x), ν_A (x) 〉|x ∈ X} defined on X is called an intuitionistic fuzzy set (IFS), where the mappings μ_A, ν_A : X → [0, 1] denote the membership and non-membership degrees of the element x to the set A, respectively, and μ_A (x) + ν_A (x) ⩽1, ∀x ∈ X. Moreover, π_A (x) =1 - μ_A (x) - ν_A (x) is called the uncertainty or hesitancy degree of x to A; obviously 0 ⩽ π_A (x) ⩽1. For convenience, we denote the family of all IFSs on the finite universe X by IFS(X) and abbreviate μ_A (x), ν_A (x), π_A (x) as μ_A, ν_A, π_A.

In particular, when the hesitation degree π_A (x) =0 (∀ x ∈ X), the IFS A degenerates into an ordinary fuzzy set. Therefore, the IFS is a generalization of the fuzzy set that adds one parameter, the non-membership degree.

Here, we give some basic operations for IFSs.

Definition 2.2. ([3]) Let A, B ∈ IFS (X) with A = {〈x, μ_A (x), ν_A (x) 〉|x ∈ X} and B = {〈x, μ_B (x), ν_B (x) 〉|x ∈ X}, define

(1) A^c = {〈x, ν_A (x), μ_A (x) 〉|x ∈ X}, where A^c denotes the complement of the IFS A; (2) A ⊆ B ⇔ μ_A (x) ⩽ μ_B (x), ν_A (x) ≥ ν_B (x); (3) A = B ⇔ A ⊆ B and B ⊆ A, i. e., μ_A (x) = μ_B (x), ν_A (x) = ν_B (x); (4) A ∪ B = {〈x, μ_A (x) ∨ μ_B (x), ν_A (x) ∧ ν_B (x) 〉|x ∈ X}; (5) A ∩ B = {〈x, μ_A (x) ∧ μ_B (x), ν_A (x) ∨ ν_B (x) 〉|x ∈ X}.
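The operations of Definition 2.2 can be illustrated with a minimal Python sketch. The dictionary representation `{element: (mu, nu)}` and all function names are assumptions made for illustration only; both sets are assumed to be defined on the same finite universe.

```python
from typing import Dict, Tuple

# An IFS on a finite universe, stored as {element: (mu, nu)} (assumed layout).
IFS = Dict[str, Tuple[float, float]]

def hesitancy(a: IFS) -> Dict[str, float]:
    """pi_A(x) = 1 - mu_A(x) - nu_A(x) for every element x."""
    return {x: 1.0 - mu - nu for x, (mu, nu) in a.items()}

def complement(a: IFS) -> IFS:
    """A^c swaps membership and non-membership."""
    return {x: (nu, mu) for x, (mu, nu) in a.items()}

def is_subset(a: IFS, b: IFS) -> bool:
    """A is contained in B iff mu_A <= mu_B and nu_A >= nu_B everywhere."""
    return all(a[x][0] <= b[x][0] and a[x][1] >= b[x][1] for x in a)

def union(a: IFS, b: IFS) -> IFS:
    """Pointwise max of memberships, min of non-memberships."""
    return {x: (max(a[x][0], b[x][0]), min(a[x][1], b[x][1])) for x in a}

def intersection(a: IFS, b: IFS) -> IFS:
    """Pointwise min of memberships, max of non-memberships."""
    return {x: (min(a[x][0], b[x][0]), max(a[x][1], b[x][1])) for x in a}

# Hypothetical sample data on X = {x1, x2}.
A = {"x1": (0.2, 0.3), "x2": (0.4, 0.3)}
B = {"x1": (0.2, 0.4), "x2": (0.4, 0.4)}
print(is_subset(B, A), union(A, B) == A, intersection(A, B) == B)
```

In this sample B ⊆ A, so the union returns A and the intersection returns B, in line with items (2), (4) and (5).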

3 Relative entropy for IFSs

In 1948, Shannon [4] put forward the concept of information entropy.

Definition 3.1. Let P = {p₁, p₂, ⋯, p_m} be a probability distribution, where $p_{i} \in [0, 1], \sum_{i = 1}^{m} p_{i} = 1$ , the information entropy of the distribution P is defined as

$H (P) = - \sum_{i = 1}^{m} p_{i} log p_{i} = \sum_{i = 1}^{m} p_{i} log \frac{1}{p_{i}} .$

Assume Q = {q₁, q₂, ⋯, q_m} is another probability distribution, whose information entropy is

$H (Q) = - \sum_{i = 1}^{m} q_{i} \log q_{i} .$

Kullback and Leibler [5] then gave the relative entropy between the two distributions P and Q, known as the Kullback-Leibler divergence:

$D_{KL} (P ∥ Q) = \sum_{i = 1}^{m} p_{i} \log \frac{1}{q_{i}} - \sum_{i = 1}^{m} p_{i} \log \frac{1}{p_{i}} = \sum_{i = 1}^{m} p_{i} \log \frac{p_{i}}{q_{i}} .$ (1)

If m = 2, Equation (1) becomes

$D_{KL} (P ∥ Q) = p \log \frac{p}{q} + (1 - p) \log \frac{1 - p}{1 - q} .$ (2)

Shang and Jiang [6] applied Equation (2) to fuzzy set theory to measure discrimination information and, at the same time, treated the cases p = 0, 1 or q = 0, 1.

Definition 3.2. Let A = {μ_A (x₁), μ_A (x₂), ⋯, μ_A (x_n)} and B = {μ_B (x₁), μ_B (x₂), ⋯, μ_B (x_n)} be two fuzzy sets on X = {x₁, x₂, ⋯, x_n}. Then the fuzzy relative entropy of A from B is expressed as

$E (A, B) = \sum_{i = 1}^{n} \left( μ_{A} (x_{i}) \ln \frac{μ_{A} (x_{i})}{\frac{1}{2} (μ_{A} (x_{i}) + μ_{B} (x_{i}))} + (1 - μ_{A} (x_{i})) \ln \frac{1 - μ_{A} (x_{i})}{1 - \frac{1}{2} (μ_{A} (x_{i}) + μ_{B} (x_{i}))} \right) .$
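As a quick illustration of Equation (2) and Definition 3.2, the following Python sketch evaluates the binary relative entropy and the fuzzy relative entropy. The function names and the convention 0 · ln(0/q) = 0 for boundary values are assumptions of this sketch, not necessarily the exact treatment used in [6]; natural logarithms are used, as in Definition 3.2.

```python
import math
from typing import Sequence

def _term(p: float, q: float) -> float:
    """p * ln(p / q), with the assumed convention 0 * ln(0 / q) = 0."""
    return 0.0 if p == 0.0 else p * math.log(p / q)

def binary_kl(p: float, q: float) -> float:
    """Equation (2): relative entropy of (p, 1 - p) from (q, 1 - q)."""
    return _term(p, q) + _term(1.0 - p, 1.0 - q)

def fuzzy_relative_entropy(mu_a: Sequence[float], mu_b: Sequence[float]) -> float:
    """Definition 3.2: each mu_A(x_i) is compared with the midpoint of mu_A and mu_B."""
    return sum(binary_kl(a, 0.5 * (a + b)) for a, b in zip(mu_a, mu_b))

# Hypothetical membership vectors.
print(fuzzy_relative_entropy([0.1, 0.6], [0.5, 0.6]))
print(fuzzy_relative_entropy([0.5, 0.6], [0.1, 0.6]))  # differs: E is not symmetric
```

Because the second argument of each logarithm is the midpoint of the two memberships, the denominator vanishes only when the numerator does, so the boundary convention is enough to keep the sum finite.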

Similarly, Vlachos and Sergiadis [7] extend the fuzzy relative entropy to IF situation and derive a relative entropy measure for IFSs.

Definition 3.3. Let A, B ∈ IFS (X). The IF relative entropy of A against B is defined as

$I (A, B) = \sum_{i = 1}^{n} \left( μ_{A} (x_{i}) \ln \frac{μ_{A} (x_{i})}{\frac{1}{2} (μ_{A} (x_{i}) + μ_{B} (x_{i}))} + ν_{A} (x_{i}) \ln \frac{ν_{A} (x_{i})}{\frac{1}{2} (ν_{A} (x_{i}) + ν_{B} (x_{i}))} \right)$ (3)

From Definition 3.3, we notice that Equation (3) is not a genuine relative entropy from the perspective of information theory: {μ_A, ν_A} and ${\frac{1}{2} (μ_{A} + μ_{B}), \frac{1}{2} (ν_{A} + ν_{B})}$ are used as two discrete distributions, so the probabilities in each should sum to 1, whereas ∀x_i ∈ X we only have μ_A (x_i) + ν_A (x_i) ⩽1. Moreover, the hesitation degree of the IFS is not considered and the measure is not normalized. To avoid these shortcomings, we give the following revised form:

Definition 3.4. Let A, B ∈ IFS (X), the IF relative entropy of A against B is defined as

$RE (A, B) = \frac{1}{n} \sum_{i = 1}^{n} RE (A (x_{i}), B (x_{i}))$ (4)

where

$RE (A (x_{i}), B (x_{i})) = μ_{A} (x_{i}) \log_{2} \frac{μ_{A} (x_{i})}{\frac{1}{2} (μ_{A} (x_{i}) + μ_{B} (x_{i}))} + ν_{A} (x_{i}) \log_{2} \frac{ν_{A} (x_{i})}{\frac{1}{2} (ν_{A} (x_{i}) + ν_{B} (x_{i}))} + π_{A} (x_{i}) \log_{2} \frac{π_{A} (x_{i})}{\frac{1}{2} (π_{A} (x_{i}) + π_{B} (x_{i}))}$ (5)

Equation (5) is undefined if μ_A, ν_A or π_A equals 0, so we make the following extension:

Proposition 3.1. If π_A (x_i) =0 and π_B (x_i) ≠0 for some x_i, we exchange the positions of π_A and π_B, namely we replace the term $π_{A} \log_{2} \frac{π_{A}}{\frac{1}{2} (π_{A} + π_{B})}$ of Equation (5) by $π_{B} \log_{2} \frac{π_{B}}{\frac{1}{2} (π_{A} + π_{B})}$ , which equals π_B; if π_A (x_i) =0 and π_B (x_i) =0, we naturally take $π_{A} \log_{2} \frac{π_{A}}{\frac{1}{2} (π_{A} + π_{B})} = 0$ at x_i. The cases in which μ_A, μ_B or ν_A, ν_B vanish are treated in the same way.
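A minimal Python sketch of Equations (4) and (5) with the extension of Proposition 3.1 is given below. The list-of-pairs representation and the function names are illustrative assumptions; the zero case follows our reading of Proposition 3.1, in which a vanishing component of A contributes the corresponding component of B.

```python
import math
from typing import List, Tuple

Pair = Tuple[float, float]  # (mu, nu) of one element; assumed layout

def _triple(p: Pair) -> Tuple[float, float, float]:
    """(mu, nu, pi) with pi = 1 - mu - nu, clamped against rounding noise."""
    mu, nu = p
    return (mu, nu, max(0.0, 1.0 - mu - nu))

def _term(a: float, b: float) -> float:
    """One summand of Equation (5), extended as in Proposition 3.1."""
    if a > 0.0:
        return a * math.log2(a / (0.5 * (a + b)))
    # a == 0: positions are exchanged, giving b * log2(2) = b (0 when b == 0).
    return b

def re_element(a: Pair, b: Pair) -> float:
    """RE(A(x_i), B(x_i)) of Equation (5)."""
    return sum(_term(x, y) for x, y in zip(_triple(a), _triple(b)))

def relative_entropy(A: List[Pair], B: List[Pair]) -> float:
    """RE(A, B) of Equation (4): the average of the element-wise values."""
    return sum(re_element(a, b) for a, b in zip(A, B)) / len(A)

# Hypothetical sample data on a two-element universe.
A = [(0.2, 0.3), (0.4, 0.3)]
B = [(0.2, 0.4), (0.4, 0.4)]
print(relative_entropy(A, B), relative_entropy(B, A))  # the two values differ
```

The two printed values differ slightly, which reflects the asymmetry of the measure.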

The relative entropy of Definition 3.4 takes into account all the three parameters characterizing IFSs, thus it contains much more information than those existing ones. Through its structure, we can conclude the following theorem.

Theorem 3.1. For A, B ∈ IFS (X), RE (A, B) satisfies the following properties: (1) RE (A, B) = RE (A^c, B^c) and RE (A, B^c) = RE (A^c, B); (2) 0 ⩽ RE (A, B) ⩽1 and RE (A, B) =0 iff A = B.

Proof. (1) According to the structure of Eqs. (4) and (5), Property (1) evidently holds.

(2) Let f (x) = log₂ x (x > 0). Then $f^{″} (x) = - \frac{1}{(\ln 2) x^{2}} < 0$ , so f is a concave function and Jensen's inequality gives $f (\sum_{i = 1}^{n} λ_{i} x_{i}) \geq \sum_{i = 1}^{n} λ_{i} f (x_{i})$ , where $λ_{i} \geq 0, \sum_{i = 1}^{n} λ_{i} = 1$ , with equality if and only if x₁ = x₂ = ⋯ = x_n. Taking μ_A (x_i), ν_A (x_i), π_A (x_i) as the weights, which sum to 1, we obtain ∀x_i ∈ X

$\begin{aligned} RE (A (x_{i}), B (x_{i})) = & - \left( μ_{A} (x_{i}) \log_{2} \frac{μ_{A} (x_{i}) + μ_{B} (x_{i})}{2 μ_{A} (x_{i})} + ν_{A} (x_{i}) \log_{2} \frac{ν_{A} (x_{i}) + ν_{B} (x_{i})}{2 ν_{A} (x_{i})} + π_{A} (x_{i}) \log_{2} \frac{π_{A} (x_{i}) + π_{B} (x_{i})}{2 π_{A} (x_{i})} \right) \\ \geq & - \log_{2} \left( μ_{A} (x_{i}) \cdot \frac{μ_{A} (x_{i}) + μ_{B} (x_{i})}{2 μ_{A} (x_{i})} + ν_{A} (x_{i}) \cdot \frac{ν_{A} (x_{i}) + ν_{B} (x_{i})}{2 ν_{A} (x_{i})} + π_{A} (x_{i}) \cdot \frac{π_{A} (x_{i}) + π_{B} (x_{i})}{2 π_{A} (x_{i})} \right) \\ = & - \log_{2} \left( \frac{μ_{A} (x_{i}) + ν_{A} (x_{i}) + π_{A} (x_{i})}{2} + \frac{μ_{B} (x_{i}) + ν_{B} (x_{i}) + π_{B} (x_{i})}{2} \right) = 0 , \end{aligned}$

that is, RE (A (x_i), B (x_i)) ≥0. Since x_i is arbitrary, RE (A, B) ≥0, and RE (A, B) =0 iff μ_A = μ_B, ν_A = ν_B, i.e. A = B. On the other hand,

$\frac{μ_{A} (x_{i})}{\frac{1}{2} (μ_{A} (x_{i}) + μ_{B} (x_{i}))} ⩽ 2, \quad \frac{ν_{A} (x_{i})}{\frac{1}{2} (ν_{A} (x_{i}) + ν_{B} (x_{i}))} ⩽ 2, \quad \frac{π_{A} (x_{i})}{\frac{1}{2} (π_{A} (x_{i}) + π_{B} (x_{i}))} ⩽ 2, \quad \forall x_{i} \in X .$

Therefore

$RE (A, B) ⩽ \frac{1}{n} \sum_{i = 1}^{n} (μ_{A} (x_{i}) \log_{2} 2 + ν_{A} (x_{i}) \log_{2} 2 + π_{A} (x_{i}) \log_{2} 2) = \frac{1}{n} \sum_{i = 1}^{n} (μ_{A} (x_{i}) + ν_{A} (x_{i}) + π_{A} (x_{i})) = 1 .$

Because the relative entropy defined by Equation (4) is neither symmetric nor satisfies the triangle inequality, it is not a geometrical distance in the strict sense but an information distance. The granularity structure induced by the relative entropy provides finer information granularity and, owing to its specific construction, higher discernibility than conventional distances. Moreover, previous distance formulas cannot order points that lie on the perpendicular bisector between two alternatives, whereas the relative entropy distance can, so it has better prospects for practical application. From Theorem 3.1, we find that the relative entropy of Equation (1) can be generalized further to IF granular space, and it is of practical significance to take the relative entropy as the separation between two IFSs. As an efficient separation measure, we will use it to measure the degree of dissimilarity between two IFSs: the larger RE (A, B), the larger the difference between A and B; conversely, the smaller RE (A, B), the closer A is to B.

4 Determination method of attribute weights based on relative entropy

Different attributes have different impacts on the final decision and should be given different weights. When attribute weights are completely or partly unknown, we need to extract weight information from the decision matrix. Since our relative entropy can be regarded as a distance measure, we can borrow the TOPSIS idea to determine the attribute weights. TOPSIS ranks alternatives by their similarity to an ideal solution: the preferred alternative has the shortest separation from the positive ideal solution (PIS) and the farthest separation from the negative ideal solution (NIS). Thus two directions are involved when deciding the preference relation of the alternatives. To use the PIS and NIS jointly, we define the relative closeness degree (RCD) for IFSs, which measures comprehensively the dissimilarity between every IFS and the two extremes. A lower RCD indicates that the object is relatively nearer to the PIS and farther from the NIS, and such an object is what we want. Accordingly, the weights should be distributed so that the RCD of all alternatives to the ideal ones is as small as possible. In other words, an attribute that makes the total RCD smaller is assigned a larger weight, and conversely a smaller one. In this work, we use the relative entropy as the distance measure.

Suppose the set of objects is A = {A₁, A₂, ⋯, A_m}, the set of attributes is X = {x₁, x₂, ⋯, x_n} and the weight vector is w={w₁, w₂, ⋯, w_n}^T, where A_i = {〈x_j, μ_{A_i} (x_j), ν_{A_i} (x_j) 〉|x_j ∈ X}, with μ_{A_i} (x_j), ν_{A_i} (x_j) denoting the membership and non-membership degrees of the ith alternative A_i over the jth attribute x_j respectively, i = 1, 2, ⋯, m ; j = 1, 2, ⋯, n. The PIS is A⁺ = {〈x_j, μ₊ (x_j), ν_- (x_j) 〉|x_j ∈ X}, where $μ_{+} (x_{j}) = ⋁_{i = 1}^{m} μ_{A_{i}} (x_{j}), ν_{-} (x_{j}) = ⋀_{i = 1}^{m} ν_{A_{i}} (x_{j})$ ; accordingly, the NIS is A^- = {〈x_j, μ_- (x_j), ν₊ (x_j) 〉|x_j ∈ X}, where $μ_{-} (x_{j}) = ⋀_{i = 1}^{m} μ_{A_{i}} (x_{j}), ν_{+} (x_{j}) = ⋁_{i = 1}^{m} ν_{A_{i}} (x_{j})$ (if x_j is a cost-type index, then A⁺ and A^- exchange roles). The RCD C (A_i (x_j), A^* (x_j)) of the performance value A_i (x_j) to the corresponding ideal value A^* (x_j) in the decision matrix under the attribute x_j is then

$C (A_{i} (x_{j}), A^{*} (x_{j})) = \frac{RE (A_{i} (x_{j}), A^{+} (x_{j}))}{RE (A_{i} (x_{j}), A^{+} (x_{j})) + RE (A_{i} (x_{j}), A^{-} (x_{j}))}$

Here A^* (x_j) represents the ideal solution with respect to the attribute x_j, which comprises both the PIS A⁺ (x_j) and the NIS A^- (x_j); depending on the context, it stands for either one.

Remark 4.1. For ∀i, j, if A_i (x_j) = A⁺ (x_j), then C (A_i (x_j), A^* (x_j)) =0; if A_i (x_j) = A^- (x_j), then C (A_i (x_j), A^* (x_j)) =1. Furthermore, if A_i (x_j) ≠ A⁺ (x_j), then A_i (x_j) → A⁺ (x_j) as C (A_i (x_j), A^* (x_j)) →0; if A_i (x_j) ≠ A^- (x_j), then A_i (x_j) → A^- (x_j) as C (A_i (x_j), A^* (x_j)) →1. The RCD C (A_i (x_j), A^* (x_j)) can therefore be used to describe the closeness between the characteristic A_i (x_j) and the ideal solution A^* (x_j) for the attribute x_j: the smaller the value of the RCD, the better the alternative A_i with respect to x_j.

According to the above analysis, the weight vector w should minimize the total RCD F ( w ) of all objects to the ideal ones over all attributes. Hence, we establish the following nonlinear programming model:

$(M\text{-}1) \quad \begin{cases} \min F (w) = \sum_{i = 1}^{m} \sum_{j = 1}^{n} w_{j}^{2} C^{2} (A_{i} (x_{j}), A^{*} (x_{j})) \\ s . t . \; \sum_{j = 1}^{n} w_{j} = 1, \; w_{j} \in [0, 1] \end{cases}$

To solve the above model, we first construct the Lagrange function

$L (w, ξ) = \sum_{i = 1}^{m} \sum_{j = 1}^{n} w_{j}^{2} C^{2} (A_{i} (x_{j}), A^{*} (x_{j})) + 2 ξ (\sum_{j = 1}^{n} w_{j} - 1)$

where ξ is a Lagrange multiplier. Setting all the partial derivatives of L to 0, we have

$\begin{cases} \frac{\partial L}{\partial w_{j}} = 2 w_{j} \sum_{i = 1}^{m} C^{2} (A_{i} (x_{j}), A^{*} (x_{j})) + 2 ξ = 0 \\ \frac{\partial L}{\partial ξ} = 2 (\sum_{j = 1}^{n} w_{j} - 1) = 0 \end{cases}$

By solving the above equations, we obtain the optimal weight formula:

$w_{j} = {(\sum_{i = 1}^{m} C^{2} (A_{i} (x_{j}), A^{*} (x_{j})))}^{- 1} \times {(\sum_{k = 1}^{n} {(\sum_{i = 1}^{m} C^{2} (A_{i} (x_{k}), A^{*} (x_{k})))}^{- 1})}^{- 1}, \quad j = 1, 2, \dots, n .$ (6)

Equation (6) shows that the lower the total RCD under an attribute, that is, the relatively nearer all the objects are to the PIS and the farther from the NIS, the larger the weight of that attribute; conversely, the weight is smaller. Hence, the result given by Equation (6) accords with the aforementioned requirements for assigning the weights.
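The weight formula of Equation (6) can be sketched in a few lines of Python. The function name and the sample RCD matrix are hypothetical; the sketch assumes that the RCD values C[i][j] have already been computed and that every attribute has a nonzero column total.

```python
from typing import List

def attribute_weights(C: List[List[float]]) -> List[float]:
    """Equation (6). C[i][j] is the RCD of object i under attribute j."""
    n = len(C[0])
    # Total squared RCD per attribute: sum_i C_ij^2.
    col = [sum(row[j] ** 2 for row in C) for j in range(n)]
    # Inverse totals; a zero total would need separate treatment (not covered).
    inv = [1.0 / s for s in col]
    total = sum(inv)
    return [v / total for v in inv]

# Hypothetical RCD matrix: 3 objects (rows) by 3 attributes (columns).
C = [
    [0.2, 0.5, 0.8],
    [0.4, 0.5, 0.6],
    [0.1, 0.9, 0.3],
]
w = attribute_weights(C)
print(w)  # the first attribute has the smallest total RCD and the largest weight
```

At the optimum the product of w_j and the column total is the same for every attribute, which is the stationarity condition of the Lagrange function.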

5 Similarity measure between IFSs

5.1 Similarity degree of IFSs

First, we symmetrize the relative entropy RE (A, B) of Equation (4) as SE (A, B):

$SE (A, B) = \frac{1}{2} [RE (A, B) + RE (B, A)]$

Then a similarity degree between the IFSs A and B can be defined as

$S (A, B) = \frac{SE (A, B^{c})}{SE (A, B) + SE (A, B^{c})}$ (7)

where B^c denotes the complement of B. Following the structural idea of Refs. [37, 38] about association measures, we give the following theorem without proof.

Theorem 5.1. ∀A, B ∈ IFS (X), S (A, B) satisfies the following properties: (1) S (A, B) = S (A^c, B^c) = S (B, A) = S (B^c, A^c); (2) S (A, B^c) = S (A^c, B) = S (B^c, A) = S (B, A^c); (3) 0 ⩽ S (A, B) ⩽1; (4) iff A = B, S (A, B) =1; iff A = B^c, S (A, B) =0; (5) S (A, B) + S (A, B^c) =1; (6) iff $S (A, B) = S (A, B^{c}), S (A, B) = \frac{1}{2}$ .

In fact, Equation (7) is proposed based on TOPSIS. The larger S (A, B), the higher the degree of similarity between A and B. Compared with the similarity formula induced by a distance measure, S (A, B) =1 - d (A, B), our formula not only determines the similarity degree of A and B but also tests the dissimilarity degree of A and B^c. Besides, in some cases, the result obtained by Equation (7) is more reasonable and more discriminating than the one transformed from a separation measure, as the following example shows.

Example 5.1. Suppose X = {x₁, x₂}, A, B, C are three IFSs defined on X with A = {〈x₁, 0.2, 0.3〉, 〈x₂, 0.4, 0.3〉}, B = {〈x₁, 0.2, 0.4〉, 〈x₂, 0.4, 0.4〉}, C = {〈x₁, 0.3, 0.4〉, 〈x₂, 0.5, 0.4〉}.

We notice that, in the above example, A and B have the same membership degrees at x₁ and x₂, while B and C have identical non-membership degrees at x₁ and x₂. As for A and C, both the membership and the non-membership degrees at x₁ and x₂ differ. Hence, intuitively, A is more similar to B than to C, that is, S (A, B) > S (A, C). Consider first the generalized normalized Hausdorff distance between two IFSs M and N:

$d_{H} (M, N) = \left( \frac{1}{n} \sum_{i = 1}^{n} \max \{ | μ_{M} (x_{i}) - μ_{N} (x_{i}) |^{p}, | ν_{M} (x_{i}) - ν_{N} (x_{i}) |^{p} \} \right)^{\frac{1}{p}}, \quad p \geq 1 .$

Whatever the parameter p is, d_H (A, B) = d_H (B, C) = d_H (A, C), and accordingly s_H (A, B) = s_H (B, C) = s_H (A, C). Clearly this does not conform to intuition. Consider next the generalized IF normalized distance between M and N:

$L_{mod} (M, N) = \frac{1}{n} \sum_{i = 1}^{n} \left( \frac{1}{2} | μ_{M} (x_{i}) - μ_{N} (x_{i}) |^{p} + \frac{1}{2} | ν_{M} (x_{i}) - ν_{N} (x_{i}) |^{p} \right)^{\frac{1}{p}}, \quad p \geq 1 .$

The parameter p again does not change the ordering in our example. For p = 1, L_mod (A, B) =0.05 and L_mod (A, C) =0.1, so the corresponding similarities are S_L (A, B) =0.95 and S_L (A, C) =0.9. This seems consistent with our intuition, but the similarity of A and C still reaches 0.9, and the two similarity values differ very little. Given how the membership and non-membership degrees compare, this is not very reasonable.

If we use the proposed similarity degree, we get S (A, B) =0.8387 and S (A, C) =0.6846. The result calculated by Equation (7) thus separates different pairs of IFSs more clearly and agrees better with the intuitive comparison above. In fact, the two approaches differ in focus: the traditional one emphasizes the diversity of decision information from a geometric perspective, whereas this paper emphasizes the fuzziness of decision information from the viewpoint of information theory.

In real applications, different elements of the universal set have different status and should hold different weights. The weight w_i of x_i in X is supposed to satisfy $\sum_{i = 1}^{n} w_{i} = 1, w_{i} \in [0, 1]$ . Adding the attribute weights to Equation (7), we obtain the weighted similarity degree:

$S_{w} (A, B) = \frac{{SE}_{w} (A, B^{c})}{{SE}_{w} (A, B) + {SE}_{w} (A, B^{c})}$ (8)

where ${SE}_{w} (A, B) = \frac{1}{2} [{RE}_{w} (A, B) + {RE}_{w} (B, A)]$ , and

${RE}_{w} (A, B) = \sum_{i = 1}^{n} w_{i} \left( μ_{A} (x_{i}) \log_{2} \frac{2 μ_{A} (x_{i})}{μ_{A} (x_{i}) + μ_{B} (x_{i})} + ν_{A} (x_{i}) \log_{2} \frac{2 ν_{A} (x_{i})}{ν_{A} (x_{i}) + ν_{B} (x_{i})} + π_{A} (x_{i}) \log_{2} \frac{2 π_{A} (x_{i})}{π_{A} (x_{i}) + π_{B} (x_{i})} \right) .$

Evidently, if w_i = 1/n (i = 1, 2, ⋯, n), then RE_w (A, B) = RE (A, B), and RE_w (A, B) also satisfies the two items of Theorem 3.1. SE_w (A, B^c) can be obtained similarly, so S_w (A, B) has the same properties as S (A, B).
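The following Python sketch implements Equations (7) and (8) and applies them to the sets of Example 5.1. The list-of-pairs layout, the function names and the handling of zero components (our reading of Proposition 3.1) are assumptions of the sketch. It is meant to illustrate the ordering S (A, B) > S (A, C) and the structural properties of Theorem 5.1; the exact values depend on these conventions and are not claimed to reproduce the figures reported above.

```python
import math
from typing import List, Optional, Tuple

Pair = Tuple[float, float]  # (mu, nu) of one element; assumed layout

def _term(a: float, b: float) -> float:
    """a * log2(2a / (a + b)); a zero component is handled as in Proposition 3.1."""
    return a * math.log2(2.0 * a / (a + b)) if a > 0.0 else b

def _re_w(A: List[Pair], B: List[Pair], w: List[float]) -> float:
    """Weighted relative entropy RE_w(A, B)."""
    total = 0.0
    for (ma, na), (mb, nb), wi in zip(A, B, w):
        pa, pb = max(0.0, 1.0 - ma - na), max(0.0, 1.0 - mb - nb)
        total += wi * (_term(ma, mb) + _term(na, nb) + _term(pa, pb))
    return total

def _se_w(A: List[Pair], B: List[Pair], w: List[float]) -> float:
    """Symmetrized relative entropy SE_w(A, B)."""
    return 0.5 * (_re_w(A, B, w) + _re_w(B, A, w))

def similarity(A: List[Pair], B: List[Pair], w: Optional[List[float]] = None) -> float:
    """Equation (8); with w omitted, uniform weights give Equation (7)."""
    n = len(A)
    w = w or [1.0 / n] * n
    Bc = [(nu, mu) for mu, nu in B]          # complement of B
    far, near = _se_w(A, Bc, w), _se_w(A, B, w)
    return far / (near + far)                # undefined when both vanish

A = [(0.2, 0.3), (0.4, 0.3)]
B = [(0.2, 0.4), (0.4, 0.4)]
C = [(0.3, 0.4), (0.5, 0.4)]
print(similarity(A, B), similarity(A, C))
```

Passing a weight vector such as `similarity(A, B, [0.7, 0.3])` evaluates the weighted form of Equation (8).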

5.2 Co-correlation degree of IFSs

In statistics, the correlation coefficient between two random variables X and Y is defined as $R (X, Y) = \frac{cov (X, Y)}{\sqrt{D (X) D (Y)}}$ where cov (X, Y) = E {[X - E (X)] [Y - E (Y)]} is the covariance between X and Y, and D (X) = E [X - E (X)] ², D (Y) = E [Y - E (Y)] ² denote the variances of X and Y, respectively. Inspired by this structure, we construct an analogous correlation measure on IFSs, called the co-correlation degree of IFSs. The definition is as follows:

Definition 5.1. Let A, B ∈ IFS (X), X = {x₁, x₂, ⋯, x_n}. Then the co-correlation degree between A and B is given by

$R (A, B) = \frac{| cov (A, B) |}{\sqrt{D (A) D (B)}}$ (9)

where

$cov (A, B) = \sum_{i = 1}^{n} \{ [μ_{A} (x_{i}) - \bar{μ}_{A}] [μ_{B} (x_{i}) - \bar{μ}_{B}] + [ν_{A} (x_{i}) - \bar{ν}_{A}] [ν_{B} (x_{i}) - \bar{ν}_{B}] + [π_{A} (x_{i}) - \bar{π}_{A}] [π_{B} (x_{i}) - \bar{π}_{B}] \} ,$

$D (A) = cov (A, A) = \sum_{i = 1}^{n} \{ [μ_{A} (x_{i}) - \bar{μ}_{A}]^{2} + [ν_{A} (x_{i}) - \bar{ν}_{A}]^{2} + [π_{A} (x_{i}) - \bar{π}_{A}]^{2} \} ,$

$D (B) = cov (B, B) = \sum_{i = 1}^{n} \{ [μ_{B} (x_{i}) - \bar{μ}_{B}]^{2} + [ν_{B} (x_{i}) - \bar{ν}_{B}]^{2} + [π_{B} (x_{i}) - \bar{π}_{B}]^{2} \} .$

Here $\bar{μ}_{A} = \sum_{k = 1}^{n} w_{k} μ_{A} (x_{k})$ is the weighted mean of μ_A over X, the means $\bar{ν}_{A}, \bar{π}_{A}, \bar{μ}_{B}, \bar{ν}_{B}, \bar{π}_{B}$ are defined analogously, and w_i stands for the weight of x_i (i = 1, 2, ⋯, n) with $w_{i} \in [0, 1], \sum_{i = 1}^{n} w_{i} = 1$ .

Theorem 5.2. ∀A, B ∈ IFS (X), R (A, B) satisfies the following conditions: (1) R (A, B) = R (B, A); (2) 0 ⩽ R (A, B) ⩽1; (3) A = B ⇔ R (A, B) =1.

Proof. Throughout, $\bar{μ}_{A} = \sum_{k = 1}^{n} w_{k} μ_{A} (x_{k})$ denotes the weighted mean of μ_A, and $\bar{ν}_{A}, \bar{π}_{A}, \bar{μ}_{B}, \bar{ν}_{B}, \bar{π}_{B}$ are defined analogously.

(1) By the definition in Equation (9), the first item is straightforward.

(2) The inequality R (A, B) ≥0 is obvious, so we only prove R (A, B) ⩽1. By the triangle inequality and the Cauchy-Schwarz inequality, we obtain

$\begin{aligned} | cov (A, B) | ⩽ & \sum_{i = 1}^{n} | [μ_{A} (x_{i}) - \bar{μ}_{A}] [μ_{B} (x_{i}) - \bar{μ}_{B}] | + \sum_{i = 1}^{n} | [ν_{A} (x_{i}) - \bar{ν}_{A}] [ν_{B} (x_{i}) - \bar{ν}_{B}] | + \sum_{i = 1}^{n} | [π_{A} (x_{i}) - \bar{π}_{A}] [π_{B} (x_{i}) - \bar{π}_{B}] | \\ ⩽ & \sqrt{\sum_{i = 1}^{n} [μ_{A} (x_{i}) - \bar{μ}_{A}]^{2} \cdot \sum_{i = 1}^{n} [μ_{B} (x_{i}) - \bar{μ}_{B}]^{2}} + \sqrt{\sum_{i = 1}^{n} [ν_{A} (x_{i}) - \bar{ν}_{A}]^{2} \cdot \sum_{i = 1}^{n} [ν_{B} (x_{i}) - \bar{ν}_{B}]^{2}} \\ & + \sqrt{\sum_{i = 1}^{n} [π_{A} (x_{i}) - \bar{π}_{A}]^{2} \cdot \sum_{i = 1}^{n} [π_{B} (x_{i}) - \bar{π}_{B}]^{2}} . \end{aligned}$

For the sake of simplicity, we adopt the following notations:

$\begin{aligned} a = \sum_{i = 1}^{n} [μ_{A} (x_{i}) - \bar{μ}_{A}]^{2}, \quad & b = \sum_{i = 1}^{n} [μ_{B} (x_{i}) - \bar{μ}_{B}]^{2}, \\ c = \sum_{i = 1}^{n} [ν_{A} (x_{i}) - \bar{ν}_{A}]^{2}, \quad & d = \sum_{i = 1}^{n} [ν_{B} (x_{i}) - \bar{ν}_{B}]^{2}, \\ e = \sum_{i = 1}^{n} [π_{A} (x_{i}) - \bar{π}_{A}]^{2}, \quad & f = \sum_{i = 1}^{n} [π_{B} (x_{i}) - \bar{π}_{B}]^{2} . \end{aligned}$

Hence

$\begin{aligned} R (A, B) ⩽ & \frac{\sqrt{ab} + \sqrt{cd} + \sqrt{ef}}{\sqrt{a + c + e} \cdot \sqrt{b + d + f}} \\ \Rightarrow R^{2} (A, B) ⩽ & \frac{(\sqrt{ab} + \sqrt{cd} + \sqrt{ef})^{2}}{(a + c + e) (b + d + f)} = \frac{ab + cd + ef + 2 \sqrt{abcd} + 2 \sqrt{abef} + 2 \sqrt{cdef}}{ab + cd + ef + ad + af + bc + cf + be + de} . \end{aligned}$

Since

$\begin{aligned} & ab + cd + ef + ad + af + bc + cf + be + de - (ab + cd + ef + 2 \sqrt{abcd} + 2 \sqrt{abef} + 2 \sqrt{cdef}) \\ & = (\sqrt{ad} - \sqrt{bc})^{2} + (\sqrt{af} - \sqrt{be})^{2} + (\sqrt{cf} - \sqrt{de})^{2} \geq 0 , \end{aligned}$

we have R² (A, B) ⩽1, which is equivalent to R (A, B) ⩽1.

(3) The necessity is evident, so we only prove the sufficiency. According to the equality condition of the above three Cauchy-Schwarz inequalities, R (A, B) =1 if and only if there exist nonzero real numbers k₁, k₂, k₃ such that

$\begin{cases} μ_{A} (x_{i}) - \bar{μ}_{A} = k_{1} [μ_{B} (x_{i}) - \bar{μ}_{B}], \\ ν_{A} (x_{i}) - \bar{ν}_{A} = k_{2} [ν_{B} (x_{i}) - \bar{ν}_{B}], \\ π_{A} (x_{i}) - \bar{π}_{A} = k_{3} [π_{B} (x_{i}) - \bar{π}_{B}] . \end{cases}$

The above equations hold for all x_i ∈ X, so k₁ = k₂ = k₃ = k ≠ 0. Thereby we can infer μ_A (x_i) = kμ_B (x_i), ν_A (x_i) = kν_B (x_i), π_A (x_i) = kπ_B (x_i). For any x_i ∈ X, π_A (x_i) =1 - μ_A (x_i) - ν_A (x_i) and π_B (x_i) =1 - μ_B (x_i) - ν_B (x_i), thus k = 1, i.e. μ_A = μ_B, ν_A = ν_B, π_A = π_B. So we have A = B. □

Note that when R (A, B) >0, the larger its value, the stronger the correlation between A and B. In other words, the nearer R (A, B) is to 1, the stronger we consider the correlation between A and B to be. Similar to the construction in reference [8], we can also give another form of co-correlation degree for the IFSs A and B:

$R^{'} (A, B) = \frac{| cov (A, B) |}{\max \{ D (A), D (B) \}}$ (10)

where cov (A, B), D (A) and D (B) are defined as in Definition 5.1.

Theorem 5.3. The correlation measure R′ (A, B) between A and B has the same properties as those listed in Theorem 5.2.

Proof. Items (1) and (3) are apparent, so we only prove R′ (A, B) ⩽1. From the proof of Theorem 5.2, we have $| cov (A, B) | ⩽ \sqrt{D (A)} \cdot \sqrt{D (B)}$ . Since $\forall a, b \in R^{+}, \sqrt{ab} ⩽ \frac{1}{2} (a + b) ⩽ \max \{ a, b \}$ , it follows that |cov (A, B) | ⩽ max {D (A), D (B)}, i.e., R′ (A, B) ⩽1.

If the weights of the different elements are taken into consideration, the two association coefficients are modified as

$R_{1} (A, B) = \frac{| {cov}_{w} (A, B) |}{\sqrt{D_{w} (A)} \sqrt{D_{w} (B)}}$ (11)

and

$R_{2} (A, B) = \frac{| {cov}_{w} (A, B) |}{\max \{ D_{w} (A), D_{w} (B) \}}$ (12)

where

${cov}_{w} (A, B) = \sum_{i = 1}^{n} w_{i} \{ [μ_{A} (x_{i}) - \bar{μ}_{A}] [μ_{B} (x_{i}) - \bar{μ}_{B}] + [ν_{A} (x_{i}) - \bar{ν}_{A}] [ν_{B} (x_{i}) - \bar{ν}_{B}] + [π_{A} (x_{i}) - \bar{π}_{A}] [π_{B} (x_{i}) - \bar{π}_{B}] \} ,$

$D_{w} (A) = {cov}_{w} (A, A), \quad D_{w} (B) = {cov}_{w} (B, B) ,$

with $\bar{μ}_{A} = \sum_{k = 1}^{n} w_{k} μ_{A} (x_{k})$ denoting the weighted mean of μ_A and the other means defined analogously. Note that if the weight vector is w=(1/n,1/n,…,1/n)^T, Eqs. (11) and (12) reduce to Eqs. (9) and (10), respectively. Similarly, we can derive the following theorem.

Theorem 5.4. For A, B ∈ IFS (X), R₁ (A, B) satisfies (1) R₁ (A, B) = R₁ (B, A); (2) 0 ⩽ R₁ (A, B) ⩽1; (3) A = B ⇔ R₁ (A, B) =1.

R₂ (A, B) expressed by Equation (12) obviously satisfies the three items of Theorem 5.3. Since the proofs of the above properties are analogous to that of Theorem 5.2, we do not repeat them here.
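The weighted co-correlation degrees of Equations (11) and (12) can be sketched in Python as follows. The function names, the list-of-pairs layout and the sample data are hypothetical. The sketch assumes D_w (A) and D_w (B) are nonzero, that is, neither IFS is constant across all elements with positive weight.

```python
import math
from typing import List, Tuple

Pair = Tuple[float, float]  # (mu, nu) of one element; assumed layout

def _triples(A: List[Pair]) -> List[Tuple[float, float, float]]:
    """Attach the hesitancy degree pi = 1 - mu - nu to every element."""
    return [(mu, nu, 1.0 - mu - nu) for mu, nu in A]

def cov_w(A: List[Pair], B: List[Pair], w: List[float]) -> float:
    """Weighted covariance cov_w(A, B) over the mu, nu and pi components."""
    ta, tb = _triples(A), _triples(B)
    mean_a = [sum(wi * t[k] for wi, t in zip(w, ta)) for k in range(3)]
    mean_b = [sum(wi * t[k] for wi, t in zip(w, tb)) for k in range(3)]
    return sum(
        wi * sum((a[k] - mean_a[k]) * (b[k] - mean_b[k]) for k in range(3))
        for wi, a, b in zip(w, ta, tb)
    )

def r1(A: List[Pair], B: List[Pair], w: List[float]) -> float:
    """Equation (11): normalized by the geometric mean of D_w(A) and D_w(B)."""
    return abs(cov_w(A, B, w)) / math.sqrt(cov_w(A, A, w) * cov_w(B, B, w))

def r2(A: List[Pair], B: List[Pair], w: List[float]) -> float:
    """Equation (12): normalized by the larger of D_w(A) and D_w(B)."""
    return abs(cov_w(A, B, w)) / max(cov_w(A, A, w), cov_w(B, B, w))

# Hypothetical data on a three-element universe.
A = [(0.2, 0.3), (0.4, 0.3), (0.6, 0.1)]
B = [(0.3, 0.5), (0.4, 0.4), (0.5, 0.2)]
w = [0.5, 0.3, 0.2]
print(r1(A, B, w), r2(A, B, w))
```

Since the maximum of two nonnegative numbers is at least their geometric mean, R₂ never exceeds R₁ for the same pair of sets.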

6 Aggregation similarity measure for IFSs

In this part, the aggregation similarity measure of IFSs is presented on the basis of the proposed similarity measures. Before that, we make some theoretical preparation for the aggregation metric. First, let us recall the general definition of a similarity measure.

Definition 6.1. ([9, 10]) Let A and B be two IFSs on X and let s: IFS (X) × IFS (X) → [0, 1] be a mapping. s (A, B) is called the degree of similarity between A and B if it satisfies the following properties: (P1) 0 ⩽ s (A, B) ⩽1; (P2) s (A, B) =1 if and only if A = B; (P3) s (A, B) = s (B, A); (P4) If A ⊆ B ⊆ C, then s (A, C) ⩽ s (A, B) ∧ s (B, C).

Definition 6.2. ([11]) Let a mapping f: [0, 1] ⁿ → [0, 1] (n > 1) satisfy the following conditions: (1) f is idempotent at (0, 0, ⋯, 0) and (1, 1, ⋯, 1), i.e., f (0, 0, ⋯, 0) =0 and f (1, 1, ⋯, 1) =1; (2) f is monotonic increasing in each of its components, i.e., if x_i ⩽ y_i, i = {1, 2, ⋯, n}, then f (x₁, x₂, ⋯, x_n) ⩽ f (y₁, y₂, ⋯, y_n). Then f is referred to as an n-ary aggregation function.

An IF similarity value always lies between 0 and 1, so the similarity degrees s (A, B) of Definition 6.1 can serve as the arguments of an aggregation function in the sense of Definition 6.2.

Definition 6.3. Let s₁ (A, B), s₂ (A, B), ⋯, s_n (A, B) be n similarity measures between A and B, and let f be an n-ary aggregation function with strict monotonicity. Then $s_{f} (A, B) = f (s_{1} (A, B), s_{2} (A, B), \dots, s_{n} (A, B))$ (13) is called an aggregation similarity measure between A and B.

With the above definition, we can give the following theorem.

Theorem 6.1. s_f (A, B) given by Equation (13) is a similarity measure.

Proof. Let us verify the items of Definition 6.1 one by one. (1) Since f is a mapping from [0, 1] ⁿ to [0, 1], we have 0 ⩽ s_f (A, B) ⩽ 1. (2) If A = B, then s_i (A, B) = 1, i = 1, 2, ⋯, n, hence s_f (A, B) = f (s₁ (A, B), s₂ (A, B), ⋯, s_n (A, B)) = f (1, 1, ⋯, 1) = 1. Conversely, if A ≠ B, then s_i (A, B) < 1 for every i by (P2), and the strict monotonicity of f gives s_f (A, B) < f (1, 1, ⋯, 1) = 1. (3) s_f (A, B) = f (s₁ (A, B), s₂ (A, B), ⋯, s_n (A, B)) = f (s₁ (B, A), s₂ (B, A), ⋯, s_n (B, A)) = s_f (B, A). (4) If A ⊆ B ⊆ C, then s_i (A, C) ⩽ s_i (A, B), ∀ i = 1, 2, ⋯, n. By the monotonicity of the aggregation function, s_f (A, C) = f (s₁ (A, C), s₂ (A, C), ⋯, s_n (A, C)) ⩽ f (s₁ (A, B), s₂ (A, B), ⋯, s_n (A, B)) = s_f (A, B). Analogously, we also have s_f (A, C) ⩽ s_f (B, C). Namely, s_f (A, C) ⩽ s_f (A, B) ∧ s_f (B, C). Therefore, s_f (A, B) is a similarity measure.

An aggregation measure relies on an aggregation function, so how should a suitable aggregation function be constructed? A general form is $P (x_{1}, x_{2}, \dots, x_{n}) = g (\sum_{i = 1}^{n} ω_{i} h (x_{i}))$, where (ω₁, ω₂, ⋯, ω_n) ∈ [0, 1] ⁿ is a weighting vector such that $\sum_{i = 1}^{n} ω_{i} = 1$, h is a convex automorphism, g is a non-decreasing function such that g (0) = 0, and the resultant function is P : [0, 1] ⁿ → [0, 1]. In particular, g can be taken as the inverse of h, and the function P turns into $P (x_{1}, x_{2}, \dots, x_{n}) = h^{- 1} (\sum_{i = 1}^{n} ω_{i} h (x_{i}))$.

The simplest choice is h (x) = x, which gives the most common aggregation operator, the weighted arithmetic averaging (WAA) operator, with many excellent traits such as monotonicity, idempotence and the intermediate value property: $f_{1} (x_{1}, x_{2}, \dots, x_{n}) = \sum_{i = 1}^{n} ω_{i} x_{i}$. More generally, taking h (x) = x^p (p is a constant), we obtain the well-known generalized WAA (GWAA) operator:

$\begin{matrix} f (x_{1}, x_{2}, \dots, x_{n}) \\ = (\sum_{i = 1}^{n} ω_{i} x_{i}^{p})^{1 / p}, ω_{i} \in [0, 1], \sum_{i = 1}^{n} ω_{i} = 1 . \end{matrix}$ (14)

This operator can adjust the aggregated result flexibly through the adjustable parameter p. When p = 1, it is the WAA operator; when p = -1, it becomes the weighted harmonic averaging (WHA) operator: $f_{- 1} (x_{1}, x_{2}, \dots, x_{n}) = (\sum_{i = 1}^{n} \frac{ω_{i}}{x_{i}})^{- 1}$. In principle p ≠ 0, but we can pass to the limit: as p → 0, Equation (14) tends to the weighted geometric averaging (WGA) operator: $f_{0} (x_{1}, x_{2}, \dots, x_{n}) = \prod_{i = 1}^{n} x_{i}^{ω_{i}}$.

In what follows we adopt these three specific GWAA functions to synthesize the three similarity degrees S_w, R₁, R₂ of this paper, so as to take all of their results into account; the synthetic similarity degree is defined by

$\begin{matrix} s_{f} (A, B) = (\sum_{i = 1}^{n} ω_{i} s_{i}^{p} (A, B))^{1 / p}, \\ ω_{i} \in [0, 1], \sum_{i = 1}^{n} ω_{i} = 1 . \end{matrix}$ (15)

In particular, when ω_i = 1 and ω_j = 0 (j ≠ i), s_f (A, B) = s_i (A, B). The key problem is how to set the weighting factors ω_i (i = 1, 2, ⋯, n) of this combined similarity degree. Bustince et al. [12] suggested that the distribution of weights should make the output (the aggregation similarity measure) least dissimilar to all the inputs (the existing similarity measures). This idea can be expressed by the following model, from which all the weights are obtained:

$(M - 2) \left\{ \begin{matrix} \min \sum_{i = 1}^{n} (s - s_{i})^{2} \\ s = (\sum_{i = 1}^{n} ω_{i} s_{i}^{p})^{1 / p} \\ s . t . \ 0 ⩽ ω_{i} ⩽ 1, \sum_{i = 1}^{n} ω_{i} = 1 \end{matrix} \right.$

In practice, ready-made mathematical software can be used to compute the optimal weights ω₁, ω₂, ⋯, ω_n once the parameter p is given.
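To make the role of p and of the weights concrete, here is a minimal Python sketch of the GWAA operator of Eq. (14)/(15) together with the objective of model (M-2). The names `gwaa` and `penalty` are hypothetical, and evaluating the objective for a single pair of IFSs is an assumption of this example, since the text does not state how the pairs are combined. The inputs are the S_w, R₁, R₂ entries for (A₁, A₂) and the weights for p = 1 reported in Section 7.2.

```python
import math

def gwaa(values, weights, p):
    """Generalized weighted arithmetic average; p = 0 is read as the WGA limit.

    For p < 0 every value must be strictly positive, otherwise the power is undefined.
    """
    if p == 0:
        return math.prod(x ** w for x, w in zip(values, weights))
    return sum(w * x ** p for x, w in zip(values, weights)) ** (1.0 / p)

def penalty(values, weights, p):
    """Objective of (M-2) for one pair: squared deviations of the output from the inputs."""
    s = gwaa(values, weights, p)
    return sum((s - x) ** 2 for x in values)

# S_w, R1, R2 entries for the pair (A1, A2) as reported in Section 7.2.
sims = [0.1677, 0.6712, 0.6671]
# Weights reported in Step 5 of Section 7.2 for p = 1.
omega = [0.3333, 0.3339, 0.3328]

for p in (1, 0, -1):
    print(p, round(gwaa(sims, omega, p), 4))
```

For p = 1 the aggregated value agrees with the entry 0.5020 of the matrix R in Section 7.2. For p = 1 the sum of squares in (M-2) is minimized by the arithmetic mean of the inputs, which equal weights attain exactly; this is consistent with the nearly uniform weights reported there.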

7 Clustering analysis based on the combined similarity degree

Supported by the above theory, we can carry out clustering analysis in IF granular space and illustrate it with a real-world numerical example.

7.1 The approaches of IF clustering

Suppose that A = {A₁, A₂, ⋯, A_m}, X = {x₁, x₂, ⋯, x_n} and w=(w_1,w_2,…,w_n)^T are, respectively, the set of objects, the set of attributes and the weight vector. The attribute values of every object are given in the form of IFSs, which yields the IF decision matrix D.

Because the similarity measures of this article possess only reflexivity and symmetry but not transitivity, they do not define an equivalence relation. In this case, we employ the algorithm from Refs. [8, 34–36] to cluster on the basis of a fuzzy equivalence relation, turning the similarity relation into an equivalence relation by the transitive closure technique. The detailed steps are as follows:

Step 1. Determine the PIS A⁺ and the NIS A^- from the decision matrix D, utilize Equation (5) to calculate the relative entropy of the object A_i to the PIS A⁺ and the NIS A^- with respect to the attribute x_j, and acquire the corresponding RCD between A_i and A^* over x_j.

Step 2. Utilize Equation (6) to calculate the weight w_j of the attribute x_j and get the weight vector w.

Step 3. Utilize Equation (4) to compute the relative entropy of any two IFSs A_i, A_j, obtain the weighted similarity degree by Equation (8), and eventually form an IF similarity matrix S.

Step 4. Utilize Equations (11) and (12), respectively, to work out the two kinds of co-correlation degrees of A_i, A_j (i, j = 1, 2, ⋯, m) and get the respective correlation matrices R₁, R₂ over all alternatives.

Step 5. Utilize Equation (15) to determine the integrated similarity degree s (A_i, A_j) between A_i and A_j from the three matrices S, R₁, R₂, and construct the resultant correlation matrix R after obtaining the proportions of S_w, R₁, R₂ by solving model (M-2).

Step 6. Adopt the squaring method to turn the association matrix R into an equivalence relation matrix (transitive closure) t (R) by max-min composition, i.e., R → R² → (R²) ² → ⋯ → R^{2^k} → ⋯, until R^{2^k} = R^{2^(k+1)}; then t (R) = R^{2^k}.

Step 7. Utilize t (R) to cluster all the objects: choose different cutting levels α to derive the corresponding α-cutting matrix t (R) _α; the cutting matrix (Boolean matrix) under each threshold α represents one classification. The clustering rule is that if all the elements of the ith row (column) of the Boolean matrix t (R) _α are the same as the corresponding elements of the jth row (column), then A_i and A_j are viewed as belonging to the same class. Obviously, if α = 1, every object becomes a single cluster, which is the smallest cluster granule. As the threshold α decreases, objects are gradually merged until, at α = 0, all the objects form one cluster. The larger the confidence level α, the finer the clustering granularity. When α varies from 1 to 0, the number of clusters changes from m to 1, and finally a set of clusterings with variable granularity is produced.
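Steps 6 and 7 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' MATLAB program: the function names are hypothetical, matrices are plain lists of lists, and the 4 × 4 matrix `R_toy` is invented purely to exercise the code.

```python
def maxmin_compose(P, Q):
    """Max-min composition of two square fuzzy relation matrices."""
    n = len(P)
    return [[max(min(P[i][k], Q[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(R):
    """Square R repeatedly (R -> R^2 -> R^4 -> ...) until it stops changing."""
    current = [row[:] for row in R]
    while True:
        squared = maxmin_compose(current, current)
        # min/max only select existing entries, so exact comparison is safe.
        if squared == current:
            return current
        current = squared

def alpha_cut_clusters(T, alpha):
    """Group objects whose rows coincide in the Boolean alpha-cut of T."""
    groups = {}
    for i, row in enumerate(T):
        boolean_row = tuple(1 if value >= alpha else 0 for value in row)
        groups.setdefault(boolean_row, []).append(i)
    return sorted(groups.values())

# Invented reflexive, symmetric similarity matrix (not taken from the paper).
R_toy = [
    [1.0, 0.8, 0.4, 0.2],
    [0.8, 1.0, 0.5, 0.2],
    [0.4, 0.5, 1.0, 0.6],
    [0.2, 0.2, 0.6, 1.0],
]

T_toy = transitive_closure(R_toy)
for alpha in (0.9, 0.7, 0.55, 0.5):
    print(alpha, alpha_cut_clusters(T_toy, alpha))
```

The cut uses `>=`, so two objects merge exactly when α reaches their closure value, which matches intervals of the form a < α ⩽ b. Object indices are zero-based, so index 0 stands for A₁.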

7.2 Applications

In this section, we discuss a practical issue concerning the classification of cars from [8]. Ten different cars A_i (i = 1, 2, ⋯, 10) are to be classified into several kinds, and experts evaluate these cars via six factors: fuel consumption x₁, friction coefficient x₂, price x₃, comfort degree x₄, design x₅, and safety x₆. The evaluation information of each car under the six indexes is expressed by IFSs; the overall data, listed in Table 1 (decision matrix D), denote the satisfaction and dissatisfaction degrees of each object with respect to all the attributes.

Table 1
The evaluation information of the cars.

	x ₁	x ₂	x ₃	x ₄	x ₅	x ₆
A ₁	〈0.3, 0.4〉	〈0.2, 0.7〉	〈0.4, 0.5〉	〈0.8, 0.1〉	〈0.4, 0.5〉	〈0.2, 0.7〉
A ₂	〈0.4, 0.3〉	〈0.5, 0.1〉	〈0.6, 0.2〉	〈0.2, 0.7〉	〈0.3, 0.6〉	〈0.7, 0.2〉
A ₃	〈0.4, 0.2〉	〈0.6, 0.1〉	〈0.8, 0.1〉	〈0.2, 0.6〉	〈0.3, 0.7〉	〈0.5, 0.2〉
A ₄	〈0.3, 0.4〉	〈0.9, 0.0〉	〈0.8, 0.1〉	〈0.7, 0.1〉	〈0.1, 0.8〉	〈0.2, 0.8〉
A ₅	〈0.8, 0.1〉	〈0.7, 0.2〉	〈0.7, 0.0〉	〈0.4, 0.1〉	〈0.8, 0.2〉	〈0.4, 0.6〉
A ₆	〈0.4, 0.3〉	〈0.3, 0.5〉	〈0.2, 0.6〉	〈0.7, 0.1〉	〈0.5, 0.4〉	〈0.3, 0.6〉
A ₇	〈0.6, 0.4〉	〈0.4, 0.2〉	〈0.7, 0.2〉	〈0.3, 0.6〉	〈0.3, 0.7〉	〈0.6, 0.1〉
A ₈	〈0.9, 0.1〉	〈0.7, 0.2〉	〈0.7, 0.1〉	〈0.4, 0.5〉	〈0.4, 0.5〉	〈0.8, 0.0〉
A ₉	〈0.4, 0.4〉	〈1.0, 0.0〉	〈0.9, 0.1〉	〈0.6, 0.2〉	〈0.2, 0.7〉	〈0.1, 0.8〉
A ₁₀	〈0.9, 0.1〉	〈0.8, 0.0〉	〈0.6, 0.3〉	〈0.5, 0.2〉	〈0.8, 0.1〉	〈0.6, 0.4〉

Step 1. Acquire the PIS A⁺ = {〈x₁, 0.3, 0.4〉, 〈x₂, 1.0, 0.0〉, 〈x₃, 0.2, 0.6〉, 〈x₄, 0.8, 0.1〉, 〈x₅, 0.8, 0.1〉, 〈x₆, 0.8, 0.0〉} and the NIS A^- = {〈x₁, 0.9, 0.1〉, 〈x₂, 0.2, 0.7〉, 〈x₃, 0.9, 0.0〉, 〈x₄, 0.2, 0.7〉, 〈x₅, 0.1, 0.8〉, 〈x₆, 0.1, 0.8〉} from Table 1. Note that, because x₁ and x₃ are cost-type indexes, their positive and negative ideal values are derived in the opposite way to those of the other attributes.
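The ideal solutions listed in Step 1 can be reproduced from Table 1 with a short Python sketch. The componentwise rule used here (largest μ with smallest ν as the best value of a benefit attribute, swapped for a cost attribute) is an assumption inferred from the listed values rather than a definition quoted from the text; the names `D`, `COST_COLUMNS` and `ideal_solutions` are hypothetical.

```python
# Table 1 as (mu, nu) pairs: one row per car A1..A10, one column per attribute x1..x6.
D = [
    [(0.3, 0.4), (0.2, 0.7), (0.4, 0.5), (0.8, 0.1), (0.4, 0.5), (0.2, 0.7)],
    [(0.4, 0.3), (0.5, 0.1), (0.6, 0.2), (0.2, 0.7), (0.3, 0.6), (0.7, 0.2)],
    [(0.4, 0.2), (0.6, 0.1), (0.8, 0.1), (0.2, 0.6), (0.3, 0.7), (0.5, 0.2)],
    [(0.3, 0.4), (0.9, 0.0), (0.8, 0.1), (0.7, 0.1), (0.1, 0.8), (0.2, 0.8)],
    [(0.8, 0.1), (0.7, 0.2), (0.7, 0.0), (0.4, 0.1), (0.8, 0.2), (0.4, 0.6)],
    [(0.4, 0.3), (0.3, 0.5), (0.2, 0.6), (0.7, 0.1), (0.5, 0.4), (0.3, 0.6)],
    [(0.6, 0.4), (0.4, 0.2), (0.7, 0.2), (0.3, 0.6), (0.3, 0.7), (0.6, 0.1)],
    [(0.9, 0.1), (0.7, 0.2), (0.7, 0.1), (0.4, 0.5), (0.4, 0.5), (0.8, 0.0)],
    [(0.4, 0.4), (1.0, 0.0), (0.9, 0.1), (0.6, 0.2), (0.2, 0.7), (0.1, 0.8)],
    [(0.9, 0.1), (0.8, 0.0), (0.6, 0.3), (0.5, 0.2), (0.8, 0.1), (0.6, 0.4)],
]

# Zero-based column indices of the cost-type attributes x1 and x3.
COST_COLUMNS = {0, 2}

def ideal_solutions(matrix, cost_columns):
    """Return (PIS, NIS) as lists of (mu, nu) pairs, one pair per attribute."""
    pis, nis = [], []
    for j in range(len(matrix[0])):
        mus = [row[j][0] for row in matrix]
        nus = [row[j][1] for row in matrix]
        best = (max(mus), min(nus))    # assumed best value of a benefit attribute
        worst = (min(mus), max(nus))   # assumed worst value of a benefit attribute
        if j in cost_columns:
            best, worst = worst, best  # cost attributes reverse the roles
        pis.append(best)
        nis.append(worst)
    return pis, nis

PIS, NIS = ideal_solutions(D, COST_COLUMNS)
print(PIS)
print(NIS)
```

Under this assumed rule the sketch returns exactly the A⁺ and A^- listed in Step 1.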

Step 2. Calculate all the relative entropies under each attribute and the corresponding RCDs (because the cross-entropy separations involving the six attributes and ten objects are numerous, the results are not listed here). Then, utilizing Equation (6), obtain the attribute weights w=(0.1773,0.2375,0.1228,0.1738,0.1411,0.1476)^T.

Step 3. Computing all the similarity degrees between IFSs by Equation (8), we obtain the similarity matrix S:

$S = (\begin{matrix} 1.0000 & 0.1677 & 0.2408 & 0.4475 & 0.4652 & 0.8702 & 0.3600 & 0.2453 & 0.4246 & 0.3656 \\ 0.1677 & 1.0000 & 0.8535 & 0.6323 & 0.5701 & 0.1529 & 0.7046 & 0.7660 & 0.6040 & 0.6248 \\ 0.2408 & 0.8535 & 1.0000 & 0.6764 & 0.6396 & 0.2346 & 0.7073 & 0.7590 & 0.6671 & 0.6323 \\ 0.4475 & 0.6323 & 0.6764 & 1.0000 & 0.7047 & 0.4744 & 0.5540 & 0.5637 & 0.9215 & 0.7516 \\ 0.4652 & 0.5701 & 0.6396 & 0.7047 & 1.0000 & 0.4740 & 0.5618 & 0.6528 & 0.6897 & 0.7791 \\ 0.8702 & 0.1529 & 0.2346 & 0.4744 & 0.4740 & 1.0000 & 0.3611 & 0.3376 & 0.4622 & 0.4730 \\ 0.3600 & 0.7046 & 0.7073 & 0.5540 & 0.5618 & 0.3611 & 1.0000 & 0.7525 & 0.5483 & 0.5730 \\ 0.2453 & 0.7660 & 0.7590 & 0.5637 & 0.6528 & 0.3376 & 0.7525 & 1.0000 & 0.5605 & 0.7039 \\ 0.4246 & 0.6040 & 0.6671 & 0.9215 & 0.6897 & 0.4622 & 0.5483 & 0.5605 & 1.0000 & 0.7416 \\ 0.3656 & 0.6248 & 0.6323 & 0.7516 & 0.7791 & 0.4730 & 0.5730 & 0.7039 & 0.7416 & 1.0000 \end{matrix})$

Step 4. Use Eqs. (11) and (12) separately to compute the co-correlation degrees of any two IFSs in D and get the corresponding correlation matrices as below:

$R_{1} = (\begin{matrix} 1.0000 & 0.6712 & 0.4979 & 0.1882 & 0.0302 & 0.8668 & 0.5643 & 0.6670 & 0.0572 & 0.3692 \\ 0.6712 & 1.0000 & 0.8427 & 0.1338 & 0.1849 & 0.6631 & 0.7686 & 0.7159 & 0.1081 & 0.0507 \\ 0.4979 & 0.8427 & 1.0000 & 0.3837 & 0.0091 & 0.6095 & 0.7125 & 0.7010 & 0.4246 & 0.0005 \\ 0.1882 & 0.1338 & 0.3837 & 1.0000 & 0.2705 & 0.0774 & 0.0107 & 0.0552 & 0.9594 & 0.0917 \\ 0.0302 & 0.1849 & 0.0091 & 0.2705 & 1.0000 & 0.1060 & 0.2086 & 0.1136 & 0.3594 & 0.6663 \\ 0.8668 & 0.6631 & 0.6095 & 0.0774 & 0.1060 & 1.0000 & 0.6399 & 0.7087 & 0.0653 & 0.0328 \\ 0.5643 & 0.7686 & 0.7125 & 0.0107 & 0.2086 & 0.6399 & 1.0000 & 0.7285 & 0.0017 & 0.0240 \\ 0.6670 & 0.7159 & 0.7010 & 0.0552 & 0.1136 & 0.7087 & 0.7285 & 1.0000 & 0.0074 & 0.0257 \\ 0.0572 & 0.1081 & 0.4246 & 0.9594 & 0.3594 & 0.0653 & 0.0017 & 0.0074 & 1.0000 & 0.2002 \\ 0.3692 & 0.0507 & 0.0005 & 0.0917 & 0.6663 & 0.0328 & 0.0240 & 0.0257 & 0.2002 & 1.0000 \end{matrix})$

$R_{2} = (\begin{matrix} 1.0000 & 0.6671 & 0.4679 & 0.1257 & 0.0296 & 0.6937 & 0.5605 & 0.5547 & 0.0385 & 0.2696 \\ 0.6671 & 1.0000 & 0.7870 & 0.0888 & 0.1824 & 0.5340 & 0.7682 & 0.5991 & 0.0723 & 0.0373 \\ 0.4679 & 0.7870 & 1.0000 & 0.2727 & 0.0084 & 0.4584 & 0.6650 & 0.5478 & 0.3042 & 0.0003 \\ 0.1257 & 0.0888 & 0.2727 & 1.0000 & 0.1771 & 0.0414 & 0.0071 & 0.0307 & 0.9516 & 0.0447 \\ 0.0296 & 0.1824 & 0.0084 & 0.1771 & 1.0000 & 0.0865 & 0.2058 & 0.0964 & 0.2372 & 0.4964 \\ 0.6937 & 0.5340 & 0.4584 & 0.0414 & 0.0865 & 1.0000 & 0.5156 & 0.6819 & 0.0352 & 0.0299 \\ 0.5605 & 0.7682 & 0.6650 & 0.0071 & 0.2058 & 0.5156 & 1.0000 & 0.6100 & 0.0011 & 0.0176 \\ 0.5547 & 0.5991 & 0.5478 & 0.0307 & 0.0964 & 0.6819 & 0.6100 & 1.0000 & 0.0041 & 0.0225 \\ 0.0385 & 0.0723 & 0.3042 & 0.9516 & 0.2372 & 0.0352 & 0.0011 & 0.0041 & 1.0000 & 0.0985 \\ 0.2696 & 0.0373 & 0.0003 & 0.0447 & 0.4964 & 0.0299 & 0.0176 & 0.0225 & 0.0985 & 1.0000 \end{matrix})$

Step 5. Use the MATLAB optimization toolbox to solve model (M-2), which gives the weighting vector ω = (0.3333, 0.3339, 0.3328) ^T for p = 1, and calculate the final correlation matrix R from the above three matrices by using Equation (15).

$R = (\begin{matrix} 1.0000 & 0.5020 & 0.4022 & 0.2538 & 0.1750 & 0.8103 & 0.4949 & 0.4891 & 0.1734 & 0.3349 \\ 0.5020 & 1.0000 & 0.8278 & 0.2850 & 0.3125 & 0.4501 & 0.7471 & 0.6937 & 0.2615 & 0.2376 \\ 0.4022 & 0.8278 & 1.0000 & 0.4443 & 0.2190 & 0.4343 & 0.6950 & 0.6693 & 0.4654 & 0.2110 \\ 0.2538 & 0.2850 & 0.4443 & 1.0000 & 0.3841 & 0.1977 & 0.1906 & 0.2165 & 0.9442 & 0.2960 \\ 0.1750 & 0.3125 & 0.2190 & 0.3841 & 1.0000 & 0.2222 & 0.3254 & 0.2876 & 0.4288 & 0.6474 \\ 0.8103 & 0.4501 & 0.4343 & 0.1977 & 0.2222 & 1.0000 & 0.5056 & 0.5761 & 0.1876 & 0.1786 \\ 0.4949 & 0.7471 & 0.6950 & 0.1906 & 0.3254 & 0.5056 & 1.0000 & 0.6971 & 0.1837 & 0.2049 \\ 0.4891 & 0.6937 & 0.6693 & 0.2165 & 0.2876 & 0.5761 & 0.6971 & 1.0000 & 0.1906 & 0.2507 \\ 0.1734 & 0.2615 & 0.4654 & 0.9442 & 0.4288 & 0.1876 & 0.1837 & 0.1906 & 1.0000 & 0.3468 \\ 0.3349 & 0.2376 & 0.2110 & 0.2960 & 0.6474 & 0.1786 & 0.2049 & 0.2507 & 0.3468 & 1.0000 \end{matrix})$

Step 6. By programming in the MATLAB environment, we obtain

$R^{16} = R^{8} \circ R^{8} = (\begin{matrix} 1.0000 & 0.5761 & 0.5761 & 0.4654 & 0.4288 & 0.8103 & 0.5761 & 0.5761 & 0.4654 & 0.4288 \\ 0.5761 & 1.0000 & 0.8278 & 0.4654 & 0.4288 & 0.5761 & 0.7471 & 0.6971 & 0.4654 & 0.4288 \\ 0.5761 & 0.8278 & 1.0000 & 0.4654 & 0.4288 & 0.5761 & 0.7471 & 0.6971 & 0.4654 & 0.4288 \\ 0.4654 & 0.4654 & 0.4654 & 1.0000 & 0.4288 & 0.4654 & 0.4654 & 0.4654 & 0.9442 & 0.4288 \\ 0.4288 & 0.4288 & 0.4288 & 0.4288 & 1.0000 & 0.4288 & 0.4288 & 0.4288 & 0.4288 & 0.6474 \\ 0.8103 & 0.5761 & 0.5761 & 0.4654 & 0.4288 & 1.0000 & 0.5761 & 0.5761 & 0.4654 & 0.4288 \\ 0.5761 & 0.7471 & 0.7471 & 0.4654 & 0.4288 & 0.5761 & 1.0000 & 0.6971 & 0.4654 & 0.4288 \\ 0.5761 & 0.6971 & 0.6971 & 0.4654 & 0.4288 & 0.5761 & 0.6971 & 1.0000 & 0.4654 & 0.4288 \\ 0.4654 & 0.4654 & 0.4654 & 0.9442 & 0.4288 & 0.4654 & 0.4654 & 0.4654 & 1.0000 & 0.4288 \\ 0.4288 & 0.4288 & 0.4288 & 0.4288 & 0.6474 & 0.4288 & 0.4288 & 0.4288 & 0.4288 & 1.0000 \end{matrix}) = R^{8},$ so the IF equivalent matrix t (R) = R⁸.

Step 7. Select different confidence levels α (∈ [0, 1]), and construct α-cutting matrix t (R) _α. Let α decrease from 1 to 0 and conduct multi-granularity clustering via each equivalence relation matrix t (R) _α:

(1) If 0.9442 < α ⩽ 1, A_i (i = 1, 2, ⋯, 10) are classified into 10 types, namely, each object forms a type of its own: {A₁}, {A₂}, {A₃}, {A₄}, {A₅}, {A₆}, {A₇}, {A₈}, {A₉}, {A₁₀};

(2) If 0.8278 < α ⩽ 0.9442, A_i (i = 1, 2, ⋯, 10) are classified into 9 types: {A₁}, {A₂}, {A₃}, {A₄, A₉}, {A₅}, {A₆}, {A₇}, {A₈}, {A₁₀};

(3) If 0.8103 < α ⩽ 0.8278, A_i (i = 1, 2, ⋯, 10) are classified into 8 types: {A₁}, {A₂, A₃}, {A₄, A₉}, {A₅}, {A₆}, {A₇}, {A₈}, {A₁₀};

(4) If 0.7471 < α ⩽ 0.8103, A_i (i = 1, 2, ⋯, 10) are classified into 7 types: {A₁, A₆}, {A₂, A₃}, {A₄, A₉}, {A₅}, {A₇}, {A₈}, {A₁₀};

(5) If 0.6971 < α ⩽ 0.7471, A_i (i = 1, 2, ⋯, 10) are classified into 6 types: {A₁, A₆}, {A₂, A₃, A₇}, {A₄, A₉}, {A₅}, {A₈}, {A₁₀};

(6) If 0.6474 < α ⩽ 0.6971, A_i (i = 1, 2, ⋯, 10) are classified into 5 types: {A₁, A₆}, {A₂, A₃, A₇, A₈}, {A₄, A₉}, {A₅}, {A₁₀};

(7) If 0.5761 < α ⩽ 0.6474, A_i (i = 1, 2, ⋯, 10) are classified into 4 types: {A₁, A₆}, {A₂, A₃, A₇, A₈}, {A₄, A₉}, {A₅, A₁₀};

(8) If 0.4654 < α ⩽ 0.5761, A_i (i = 1, 2, ⋯, 10) are classified into 3 types: {A₁, A₂, A₃, A₆, A₇, A₈}, {A₄, A₉}, {A₅, A₁₀};

(9) If 0.4288 < α ⩽ 0.4654, A_i (i = 1, 2, ⋯, 10) are classified into 2 types: {A₁, A₂, A₃, A₄, A₆, A₇, A₈, A₉}, {A₅, A₁₀};

(10) If 0 < α ⩽ 0.4288, A_i (i = 1, 2, ⋯, 10) are classified into 1 type, namely, all the objects form one group: {A₁, A₂, A₃, A₄, A₅, A₆, A₇, A₈, A₉, A₁₀}.

Refs. [8, 15] involve this real example and Ref. [14] has an analogous example. We make a simple comparison among these experimental outcomes in Tables 2 and 3.
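The thresholds in items (1) to (10) can be cross-checked without forming the closure explicitly. The sketch below swaps in a different but equivalent technique: single-linkage merging with a union-find structure, which is known to produce the same partitions as the α-cuts of the max-min transitive closure. It is not the procedure used in the text; `UPPER` holds the upper triangle of the matrix R from Step 5, and `merge_levels` is a hypothetical helper name.

```python
# Upper triangle of R from Step 5 (p = 1): row i lists R[i][j] for j > i.
UPPER = [
    [0.5020, 0.4022, 0.2538, 0.1750, 0.8103, 0.4949, 0.4891, 0.1734, 0.3349],
    [0.8278, 0.2850, 0.3125, 0.4501, 0.7471, 0.6937, 0.2615, 0.2376],
    [0.4443, 0.2190, 0.4343, 0.6950, 0.6693, 0.4654, 0.2110],
    [0.3841, 0.1977, 0.1906, 0.2165, 0.9442, 0.2960],
    [0.2222, 0.3254, 0.2876, 0.4288, 0.6474],
    [0.5056, 0.5761, 0.1876, 0.1786],
    [0.6971, 0.1837, 0.2049],
    [0.1906, 0.2507],
    [0.3468],
]

def merge_levels(upper):
    """Return the similarity levels at which two clusters merge, highest first."""
    n = len(upper) + 1
    parent = list(range(n))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairs (similarity, i, j) with i < j, sorted from most to least similar.
    edges = sorted(
        ((value, i, i + 1 + k)
         for i, row in enumerate(upper)
         for k, value in enumerate(row)),
        reverse=True,
    )
    levels = []
    for value, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:       # the pair joins two different clusters
            parent[root_i] = root_j
            levels.append(value)
    return levels

levels = merge_levels(UPPER)
print(levels)
```

The nine merge levels returned are 0.9442, 0.8278, 0.8103, 0.7471, 0.6971, 0.6474, 0.5761, 0.4654 and 0.4288, the same values that bound the intervals of α in items (1) to (10).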

Table 2

The clustering results of other methods and formulas.

Class	The cluster of Xu et al. [13]	The cluster of Xu et al. [8]
10	{A₁}, {A₂}, {A₃}, {A₄}, {A₅}, {A₆}, {A₇}, {A₈}, {A₉}, {A₁₀}	{A₁}, {A₂}, {A₃}, {A₄}, {A₅}, {A₆}, {A₇}, {A₈}, {A₉}, {A₁₀}
9	{A₁}, {A₂}, {A₃}, {A₄, A₉}, {A₅}, {A₆}, {A₇}, {A₈}, {A₁₀}	{A₁}, {A₂}, {A₃}, {A₄, A₉}, {A₅}, {A₆}, {A₇}, {A₈}, {A₁₀}
8	{A₁}, {A₂, A₇}, {A₃}, {A₄, A₉}, {A₅}, {A₆}, {A₈}, {A₁₀}	{A₁}, {A₂}, {A₃, A₇}, {A₄, A₉}, {A₅}, {A₆}, {A₈}, {A₁₀}
7	{A₁}, {A₂, A₃, A₇}, {A₄, A₉}, {A₅}, {A₆}, {A₈}, {A₁₀}	{A₁, A₆}, {A₂}, {A₃, A₇}, {A₄, A₉}, {A₅}, {A₈}, {A₁₀}
6	{A₁, A₆}, {A₂, A₃, A₇}, {A₄, A₉}, {A₅}, {A₈}, {A₁₀}	{A₁, A₆}, {A₂, A₃, A₇}, {A₄, A₉}, {A₅}, {A₈}, {A₁₀}
5(7)	{A₁, A₆}, {A₂, A₃, A₇}, {A₄, A₉}, {A₅, A₁₀}, {A₈}	{A₁, A₆}, {A₂}, {A₃, A₇}, {A₄, A₉}, {A₅}, {A₈}, {A₁₀}
4(6)	{A₁, A₆, A₈}, {A₂, A₃, A₇}, {A₄, A₉}, {A₅, A₁₀}	{A₁, A₆}, {A₂}, {A₃, A₇}, {A₄, A₉}, {A₅, A₁₀}, {A₈}
3(5)	{A₁, A₄, A₉}, {A₂, A₃, A₇, A₈}, {A₅, A₆, A₁₀}	{A₁, A₆}, {A₂}, {A₃, A₅, A₇, A₁₀}, {A₄, A₉}, {A₈}
2	{A₁, A₂, A₃, A₅, A₆, A₇, A₈, A₁₀}, {A₄, A₉}	{A₁, A₆}, {A₂, A₃, A₄, A₅, A₇, A₈, A₉, A₁₀}
1	{A₁, A₂, A₃, A₄, A₅, A₆, A₇, A₈, A₉, A₁₀}	{A₁, A₂, A₃, A₄, A₅, A₆, A₇, A₈, A₉, A₁₀}

Table 3

The clustering results of other methods and formulas.

Class	The cluster of Zhang et al. [14]	The cluster of Hwang et al. [15]
10	{A₁}, {A₂}, {A₃}, {A₄}, {A₅}, {A₆}, {A₇}, {A₈}, {A₉}, {A₁₀}
9	{A₁}, {A₂}, {A₃}, {A₄,A₉}, {A₅}, {A₆}, {A₇}, {A₈}, {A₁₀}
8	{A₁,A₆}, {A₂}, {A₃}, {A₄,A₉}, {A₅}, {A₇}, {A₈}, {A₁₀}
7	{A₁,A₆}, {A₂,A₃}, {A₄,A₉}, {A₅}, {A₇}, {A₈}, {A₁₀}
6
5	{A₁,A₆}, {A₂,A₃,A₇}, {A₄,A₉}, {A₅,A₁₀}, {A₈}
4	{A₁,A₆}, {A₂,A₃,A₇,A₈}, {A₄,A₉}, {A₅,A₁₀}	{A₁,A₆}, {A₂,A₃,A₇,A₈}, {A₄,A₉}, {A₅,A₁₀}
3	{A₁,A₆}, {A₂,A₃,A₅,A₇,A₈,A₁₀}, {A₄,A₉}	{A₁, A₆}, {A₂, A₃, A₅, A₇, A₈, A₁₀}, {A₄,A₉}
2	{A₁,A₂,A₃,A₅,A₆,A₇,A₈,A₁₀}, {A₄,A₉}	{A₁,A₂,A₃,A₅,A₆,A₇,A₈,A₁₀}, {A₄,A₉}
1	{A₁,A₂,A₃,A₄,A₅,A₆,A₇,A₈,A₉,A₁₀}	{A₁,A₂,A₃,A₄,A₅,A₆,A₇,A₈,A₉,A₁₀}

By studying the two tables, we can find at least the following points:

(1) Inspecting the results of Refs. [8] and [13] in Table 2, we identify an abnormal phenomenon: two objects that are assigned to one cluster may no longer be in the same group when the data are partitioned into fewer classes. For the clusters of Ref. [8], the cars cannot be classified into 3 or 4 classes, and there are two different results when they are clustered into 6 types. The authors of [13] employ the IFCM algorithm for clustering, which may produce "overlapped" clusters; that is, a data set assigned to one cluster may also pertain to another cluster at the same time, so the classification boundary is not crisp. For example, A₁, A₆ and A₈ belong to one cluster when the cars are classified into 4 classes. However, when the granularity of the classification is coarser, A₁, A₄ and A₉ are in the same class, while A₆ and A₈ fall into two other, different groups. Further, A₄ and A₉ are not with A₁ when the cars are divided into 2 types. The attribute weights are given directly in [8, 13], so the weighting information becomes a parameter. This raises the problem of determining which weights are better and how to choose appropriate weight values. Different weights affect the final outcome, which has been demonstrated fully via various dendrograms in [15]. As to the IFCM algorithm, it is essentially an iterative method like fuzzy C-means, in which the desired number of clusters c and the initial clustering seeds have to be pre-estimated and then modified continually until the preset convergence level ɛ is reached. Besides, this technique may sometimes fall into endless loops, because the algorithm requires the data provided to be "convex"; otherwise it generates only a locally optimal solution.

(2) From Table 3, we notice that the two classifications cannot show the entire data information, which may not meet the practical need for a variety of classifications. Why does this happen? An important factor is that they do not consider attribute weights, although different attributes should account for different proportions. Hwang et al. [15] obtain a hierarchical clustering tree by applying the improved similarity-based clustering method to the similarity matrix. Their dendrogram shows only the well-separated clusters and a smaller number of clusters, and the correlation matrix R calculated by the similarity measure formula provided in [15] has many equal values. This reveals that their formula lacks sufficient resolving power among several pairs of similar data sets. The clustering technique of Zhang et al. [14] transforms the given IF information into interval-valued information, and all the similarity degrees take the form of interval numbers. This requires much more computational effort and entails a loss of accuracy in the similarity degrees. In addition, they consider only the minimal and maximal deviation information and ignore all the attribute weight information in the similarity formula, which results in the loss of too much information.

Now let us look at our own clustering. The result displays a complete hierarchy of clusters from 10 classes down to one, without any abnormality. Although the procedure may involve a large amount of computation, it is well suited to computer programming. Because it is a hierarchical clustering with various cutting levels, we can select an appropriate number of clusters according to the specific application environment, so it can meet the requirements of different situations. This desirable result also derives from the strong discriminating power of the similarity formulas presented in this article. Another advantage is the calculation of the attribute weights. As a consequence, there are no repeated values in the upper or lower triangular part of the IF similarity matrices S, R₁ and R₂ computed in this article, which improves on the results of much of the literature, such as Refs. [8, 15]. However, no formula can be applied to all conditions in practical problems, and different similarity formulas yield different correlation degrees, sometimes even conflicting ones. Thus we adopt an aggregation operator to aggregate the various similarity degrees, so that the final similarity matrix R simultaneously contains the information from Eqs. (8), (11) and (12). The corresponding clustering result is therefore quite objective, comprehensive and reasonable. For example, when the samples are sorted into 8 groups, A₂ is most similar to A₃, yet the two are separated in [8, 14]. It is more suitable to put A₂ and A₃ together, as in our clusters. Besides, the clustering algorithm based on transitive closure is invariant under the initial numbering of the objects and under monotone transformations of the similarity measure [36], which contributes to the satisfying clustering results.

Table 4

The cluster information of the 10 cars (p = -1).

Classes	Confidence level	Hierarchical clustering for all cutting levels
10	0.9520 < α ⩽ 1	{A₁}, {A₂}, {A₃}, {A₄}, {A₅}, {A₆}, {A₇}, {A₈}, {A₉}, {A₁₀}
9	0.8381 < α ⩽ 0.9520	{A₁}, {A₂}, {A₃}, {A₄, A₉}, {A₅}, {A₆}, {A₇}, {A₈}, {A₁₀}
8	0.8361 < α ⩽ 0.8381	{A₁, A₆}, {A₂}, {A₃}, {A₄, A₉}, {A₅}, {A₇}, {A₈}, {A₁₀}
7	0.7574 < α ⩽ 0.8361	{A₁, A₆}, {A₂, A₃}, {A₄, A₉}, {A₅}, {A₇}, {A₈}, {A₁₀}
6	0.7128 < α ⩽ 0.7574	{A₁, A₆}, {A₂, A₃, A₇}, {A₄, A₉}, {A₅}, {A₈}, {A₁₀}
5	0.6504 < α ⩽ 0.7128	{A₁, A₆}, {A₂, A₃, A₇, A₈}, {A₄, A₉}, {A₅}, {A₁₀}
4	0.5988 < α ⩽ 0.6504	{A₁, A₆}, {A₂, A₃, A₇, A₈}, {A₄, A₉}, {A₅, A₁₀}
3	0.4261 < α ⩽ 0.5988	{A₁, A₂, A₃, A₆, A₇, A₈}, {A₄, A₉}, {A₅, A₁₀}
2	0.3614 < α ⩽ 0.4261	{A₁, A₂, A₃, A₄, A₆, A₇, A₈, A₉}, {A₅, A₁₀}
1	0 < α ⩽ 0.3614	{A₁, A₂, A₃, A₄, A₅, A₆, A₇, A₈, A₉, A₁₀}

Table 5

The cluster information of the 10 cars (p → 0).

Classes	Confidence level	Hierarchical clustering for all cutting levels
10	0.9468 < α ⩽ 1	{A₁}, {A₂}, {A₃}, {A₄}, {A₅}, {A₆}, {A₇}, {A₈}, {A₉}, {A₁₀}
9	0.8363 < α ⩽ 0.9468	{A₁}, {A₂}, {A₃}, {A₄, A₉}, {A₅}, {A₆}, {A₇}, {A₈}, {A₁₀}
8	0.8362 < α ⩽ 0.8363	{A₁}, {A₂, A₃}, {A₄, A₉}, {A₅}, {A₆}, {A₇}, {A₈}, {A₁₀}
7	0.7491 < α ⩽ 0.8362	{A₁, A₆}, {A₂, A₃}, {A₄, A₉}, {A₅}, {A₇}, {A₈}, {A₁₀}
6	0.7141 < α ⩽ 0.7491	{A₁, A₆}, {A₂, A₃, A₇}, {A₄, A₉}, {A₅}, {A₈}, {A₁₀}
5	0.6644 < α ⩽ 0.7141	{A₁, A₆}, {A₂, A₃, A₇, A₈}, {A₄, A₉}, {A₅}, {A₁₀}
4	0.5659 < α ⩽ 0.6644	{A₁, A₆}, {A₂, A₃, A₇, A₈}, {A₄, A₉}, {A₅, A₁₀}
3	0.4589 < α ⩽ 0.5659	{A₁, A₂, A₃, A₆, A₇, A₈}, {A₄, A₉}, {A₅, A₁₀}
2	0.4064 < α ⩽ 0.4589	{A₁, A₂, A₃, A₄, A₆, A₇, A₈, A₉}, {A₅, A₁₀}
1	0 < α ⩽ 0.4064	{A₁, A₂, A₃, A₄, A₅, A₆, A₇, A₈, A₉, A₁₀}

In Step 5, if p = -1, the GWAA operator becomes the WHA operator, and the corresponding optimal weights of the similarity degrees S_w, R₁, R₂ are ω = (0.1620, 0.6982, 0.1398) ^T; the rest of the steps remain unchanged. The resulting clustering is shown in Table 4. If p → 0 in Step 5, i.e., the GWAA operator approaches the WGA operator, the corresponding weights of S_w, R₁, R₂ are ω = (0.2947, 0.5389, 0.1664) ^T. The remaining steps are unchanged, and the possible classifications of the 10 cars A_i (i = 1, 2, ⋯, 10) are given in Table 5.

By observing the above tables, we note that the clustering of Table 4 differs from that of Table 5 only at 8 classes: A₁ and A₆ form one group in Table 4, while A₂ and A₃ are placed in one class in Table 5. We can also see that the clusters of Table 5 are identical to those of the case p = 1. This shows that different values of the parameter p have a slight influence on the final cluster granules for the GWAA function. As discussed above, A₂ should be clustered with A₃ first, so it is proper to take p equal to 1 or approaching 0. But whatever value of p is given, the clustering obtained is superior to the clustering results obtained with a single similarity measure formula in previous research. In fact, there are four brands in all among these cars, and cars of the same brand, in different versions, should belong to one class. Hence the actual classification is {A₁, A₆}, {A₂, A₃, A₇, A₈}, {A₄, A₉}, {A₅, A₁₀}, which is consistent with our clustering into 4 classes, no matter how the value of p varies.

8 Conclusions

Cluster analysis, as an unsupervised learning method, is an important research direction in the field of machine learning. After clustering, data of the same category should be as similar as possible and data of different categories as dissimilar as possible, so the selection of an appropriate similarity measure becomes the key to clustering. Because realistic classification is often accompanied by fuzziness, it is more natural, and more in line with objective reality, to use fuzzy theory for the clustering analysis of uncertain granular information. With the development of IF theory, clustering methods on IF granularity structures have drawn intense attention. In this article, three sensitive similarity measures for IFSs and an aggregation similarity degree under the IF environment are presented for IF clustering. In the acquisition of the attribute weights, we use the relative entropy separation with high distinguishability to construct the RCD instead of directly establishing the optimization model with the PIS or the NIS individually, which ensures that the resulting weight formula is more objective and rational. As a result, the correlation matrix obtained by the integrated similarity formula with weights can better depict the differences among diverse objects. Based on these theoretical preparations, we finally give the IF clustering algorithm and demonstrate its viability and validity via an illustrative example.

Acknowledgements

The authors are highly grateful to the anonymous reviewers for their careful reading and insightful comments; their constructive opinions and suggestions will help to improve this article. The work is supported by the National Natural Science Foundation of China (No. 61673020), the Provincial Natural Science Research Key Project for Colleges and Universities of Anhui Province (No. KJ2013A033), and the Talent Support Key Project for Outstanding Young of Colleges and Universities in 2016 (No. gxyqZD2016453).

References

Zadeh

L.A.

, Fuzzy sets, Information and Control 8 (1965),338–353.

Atanassov

K.T.

, Intuitionistic fuzzy sets, Fuzzy Sets and Systems 20 (1986),87–96.

Atanassov

K.T.

, More on intuitionistic fuzzy sets, Fuzzy Sets and Systems 33 (1989),37–46.

Shannon

C.E.

, A mathematical theory of communication, The Bell System Technical Journal 27 (1948),623–656.

Kullback

and Leibler

R.A.

, On information and sufficiency, The Annals of Mathematical Satistics 22 (1951),79–86.

Shang

X.G.

and Jiang

W.S.

, A note on fuzzy information measures, Pattern Recognition Letters 18 (1997),425–432.

Vlachos

I.K.

and Sergiadis

G.D.

, Intuitionistic fuzzy information-applications to pattern recognition, Pattern Recognition Letters 28 (2007),197–206.

Z.S.

, Chen

and Wu

J.J.

, Clustering algorithm for intuitionistic sets, Information Sciences 178 (2008),3775–3790.

D.F.

and Chen

C.T.

, New similarity measures of intuitionistic fuzzy sets and application to pattern recognitions, Pattern Recognition Letters 23 (2002),221–225.

10.

Mitchell

H.B.

, On the Dengfeng-Chuntian similarity measure and its application to pattern recognition, Pattern Recognition Letters 24 (2003),3101–3104.

11.

Beliakov

, Pradera

and Calvo

, Aggregation Functions: A Guide for Practitioners, Springer-Verlag, Berlin. Heidelberg. 2007.

12.

Bustince

, Jurio

, Pradera

, ., Generalization of the weighted voting method using penalty functions constructed via faithful restricted dissimilarity functions, European Journal of Operational Research 225 (2013),472–478.

13.

Z.S.

and Wu

J.J.

, Intuitionistic fuzzy C-means clustering algorithms, Journal of Systems Engineering and Electronics 21 (2010),580–590.

14.

Zhang

H.M.

, Xu

Z.S.

and Chen

, On clustering approach to intuitionistic fuzzy sets, Control and Decision 22 (2007),882–888.

15.

Hwang

C.M.

, Yang

M.S.

, Hung

W.L.

, ., A similarity measure of intuitionistic fuzzy sets based on the Sugeno integral with its application to pattern recognition, Information Sciences 189 (2012),93–109.

16.

Xia

M.M.

and Xu

Z.S.

, Entropy/Cross entropy-based group decision making under intuitionistic fuzzy environment, Information Fusion 13 (2012),31–47.

17. Olcer A.I. and Odabasi A.Y., A new fuzzy multiple attributive group decision making methodology and its application to propulsion manoeuvring system selection problem, European Journal of Operational Research 166 (2005), 93–114.

18. Leyva-López J.C. and Fernández-González, A new method for group decision support based on ELECTRE III methodology, European Journal of Operational Research 148 (2003), 14–27.

19. Wan S.P., Congregation of the experts’ weights based on relative entropy for group decision-making problem with incomplete information, Communication on Applied Mathematics and Computation 23 (2009), 66–70.

20. Qiu W.H., Management decision and applied entropy, China Machine Press, Beijing, 2001.

21. Zhou Y.F. and Wei F.J., Combination weighting approach in multiple attribute decision making based on relative entropy, Operations Research and Management Science 15 (2006), 48–53.

22. Gerstenkorn and Manko, Correlation of intuitionistic fuzzy sets, Fuzzy Sets and Systems 44 (1991), 39–43.

23. Hong D.H. and Hwang S.Y., Correlation of intuitionistic fuzzy sets in probability spaces, Fuzzy Sets and Systems 75 (1995), 77–81.

24. Hung W.L. and Wu J.W., Correlation of intuitionistic fuzzy sets by centroid method, Information Sciences 144 (2002), 219–225.

25. Bustince and Burillo, Correlation of interval-valued intuitionistic fuzzy sets, Fuzzy Sets and Systems 74 (1995), 237–244.

26. Hong D.H., A note on correlation of interval-valued intuitionistic fuzzy sets, Fuzzy Sets and Systems 95 (1998), 113–117.

27. Xu Z.S., Chen and Wu J.J., Clustering algorithm for intuitionistic fuzzy sets, Information Sciences 178 (2008), 3775–3790.

28. Wang, Xu Z.S., Liu S.S. and Tang, A netting clustering analysis method under intuitionistic fuzzy environment, Applied Soft Computing 11 (2011), 5558–5564.

29. Xu Z.S., Intuitionistic fuzzy hierarchical clustering algorithms, Journal of Systems Engineering and Electronics 20 (2009), 90–97.

30. Hwang and Rhee F.C.H., Uncertain fuzzy clustering: Interval type-2 fuzzy approach to C-means, IEEE Transactions on Fuzzy Systems 15 (2007), 107–120.

31. Yang M.S. and Lin D.C., On similarity and inclusion measures between type-2 fuzzy sets with an application to clustering, Computers and Mathematics with Applications 57 (2009), 896–907.

32. Cover T.M. and Thomas J.A., Elements of information theory, John Wiley and Sons, New York, 2006.

33. Mao J.J., Wang C.C., Yao D.B. et al., Multiple-attribute decision-making method of normal distribution interval number based on cross-entropy, Computer Engineering and Applications 48 (2012), 44–48.

34. Tamura, Higuchi and Tanaka, Pattern classification based on fuzzy relations, IEEE Transactions on Systems, Man, and Cybernetics 1 (1971), 61–66.

35. Dunn J.C., A graph theoretic analysis of pattern classification via Tamura’s fuzzy relation, IEEE Transactions on Systems, Man, and Cybernetics 3 (1974), 310–313.

36. Batyrshin and Rudas, Invariant hierarchical clustering schemes, in: Perception-based Data Mining and Decision Making in Economics and Finance, Springer, Berlin, Heidelberg, 2007, pp. 181–206.

37. Batyrshin I.Z., Association measures on [0, 1], Journal of Intelligent & Fuzzy Systems 29 (2015), 1011–1020.

38. Batyrshin I.Z., On definition and construction of association measures, Journal of Intelligent & Fuzzy Systems 29 (2015), 2319–2326.