Abstract
To distinguish different intuitionistic fuzzy sets (IFSs) effectively, we generalize the asymmetric relative entropy between IFSs into a distance measure with higher discernment. Next, a formula for the attribute weights is derived from an optimization model that follows TOPSIS and is built on the relative closeness degree constructed from this relative entropy. Then, we propose a similarity formula with strong discernibility and two correlation degree formulas from the viewpoint of probability theory, and prove that they share the main properties of the traditional correlation coefficient. To make full use of the three similarity measures presented in this paper, we aggregate them and derive a synthetic similarity formula. Finally, the derived formula is used for clustering analysis under intuitionistic fuzzy (IF) information, and its effectiveness and superiority are verified through a detailed comparison with the clustering results obtained by other clustering algorithms.
Introduction
In 1965, Zadeh [1] pioneered the concept of the fuzzy set (FS), which captures the uncertain information of the real world by means of a membership function. In 1986, Atanassov [2, 3] developed the intuitionistic fuzzy set (IFS) from the FS by adding the non-membership degree as a second parameter, which in turn yields a third one, the hesitancy degree. Compared with the FS, the IFS is more flexible and powerful in dealing with the uncertainty and complexity of practical matters. Entropy and correlation are two important kinds of information measure in information theory. In 1948, Shannon [4] first proposed the concept of information entropy, borrowing from the thermal entropy of thermodynamics, and gave a computational formula for quantifying information. Subsequently, Kullback and Leibler [5] put forward the concept of relative entropy, based on information entropy, to describe the difference between two probability distributions. In 1997, Shang and Jiang [6] first extended the entropy measure to fuzzy theory in order to measure the uncertain information of fuzzy sets, and Vlachos and Sergiadis [7] generalized it to IF information, discussed the relation between entropy and relative entropy, and applied it to pattern recognition, medical diagnosis and image segmentation. Since then, entropy and relative entropy have been crucial research topics in fuzzy theory, and the results have been applied to many fields such as data mining, pattern recognition, clustering analysis, decision making, granular computing and image processing [16–21]. Gerstenkorn and Manko [22] generalized the concept of the correlation coefficient in statistics to IFSs. Hong and Hwang [23] defined it in probability spaces. The correlation coefficient of IFSs has also been calculated by means of the "centroid" [24]. After that, correlation coefficients were extended to interval-valued IFSs [25, 26] by other scholars.
It is with these uncertainty measures that IF information is granulated, which leads to the formation of granular structures and granular spaces.
Cluster analysis is a process that divides the data of a dataset into multiple clusters at a certain granularity level according to the similarity among samples. With the arrival of the information age, and especially under the trend of big data, it has become an important analytic tool for data preprocessing in data mining to obtain the distribution of data, and it is attracting more and more attention. For different fuzzy environments, a variety of clustering algorithms have been proposed to deal with different types of fuzzy data, such as IF clustering algorithms [27–29] and type-2 fuzzy clustering algorithms [30, 31]. In the IF setting, Xu et al. [27] use the derived correlation coefficient as a similarity measure to cluster the sample data through the transitive closure method, but the classification result is not fine-grained and not very consistent with practice. Wang et al. [28] defined a novel closeness degree of two IFSs by modifying previous findings and then clustered by using the netting method in the IF context, which can handle and analyze data directly on the derived correlation matrix. This method is simple to operate and intuitive, so it can produce results quickly; however, the operation is rough, and it becomes confusing or even infeasible when the sample size gets large. Moreover, it is also very difficult to implement in a computer program.
To solve these problems, we make some improvements, and the rest of this work is organized as follows. Section 2 reviews basic content related to IFSs. In Section 3, we develop the relative entropy of two arbitrary IFSs together with its properties and functions. Section 4 establishes a nonlinear programming model on the basis of TOPSIS [33] with respect to the closeness degree of this relative entropy to obtain the formula of attribute weights. Section 5 defines three similarity measures for IFSs and proves their respective properties. In Section 6, we apply the notion of aggregation to similarity measures for IFSs and use the GWAA operator to aggregate the three kinds of similarity degrees, which makes the derived similarity of IFSs more comprehensive and effective. Finally, the association matrix is built with the aggregated association degree, and it is quickly converted into an equivalent matrix by means of the "square" method. On these bases, clustering granularity analysis with IF information is carried out in Section 7.
Preliminaries
In this section, we briefly recall some necessary concepts and operations relating to IFSs, which will be needed hereinafter.
In particular, when the hesitation degree π_A(x) = 0 (∀x ∈ X), the IFS A degenerates into an ordinary fuzzy set. Therefore, the IFS is a generalization of the fuzzy set that adds one parameter, the non-membership degree.
Here, we give some basic operations for IFSs.
(1) A^c = {〈x, ν_A(x), μ_A(x)〉 | x ∈ X}, where A^c denotes the complement of the IFS A;
(2) A ⊆ B ⇔ μ_A(x) ⩽ μ_B(x), ν_A(x) ≥ ν_B(x);
(3) A = B ⇔ A ⊆ B and B ⊆ A, i.e., μ_A(x) = μ_B(x), ν_A(x) = ν_B(x);
(4) A ∪ B = {〈x, μ_A(x) ∨ μ_B(x), ν_A(x) ∧ ν_B(x)〉 | x ∈ X};
(5) A ∩ B = {〈x, μ_A(x) ∧ μ_B(x), ν_A(x) ∨ ν_B(x)〉 | x ∈ X}.
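The five operations above can be sketched in Python. This is only an illustration: the `IFV` class and the function names are hypothetical, and an IFS over a finite universe is assumed to be stored as a list of 〈μ, ν〉 pairs in a fixed element order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IFV:
    """One intuitionistic fuzzy value <mu, nu> attached to a single element x."""
    mu: float  # membership degree
    nu: float  # non-membership degree

    def __post_init__(self):
        ok = 0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0
        if not ok or self.mu + self.nu > 1.0 + 1e-12:
            raise ValueError("need 0 <= mu, nu <= 1 and mu + nu <= 1")

    @property
    def pi(self):
        """Hesitation degree 1 - mu - nu."""
        return 1.0 - self.mu - self.nu


def complement(a):
    # (1) swap membership and non-membership
    return [IFV(v.nu, v.mu) for v in a]


def is_subset(a, b):
    # (2) A is contained in B: mu_A <= mu_B and nu_A >= nu_B at every element
    return all(p.mu <= q.mu and p.nu >= q.nu for p, q in zip(a, b))


def equals(a, b):
    # (3) mutual containment
    return is_subset(a, b) and is_subset(b, a)


def union(a, b):
    # (4) max of memberships, min of non-memberships
    return [IFV(max(p.mu, q.mu), min(p.nu, q.nu)) for p, q in zip(a, b)]


def intersection(a, b):
    # (5) min of memberships, max of non-memberships
    return [IFV(min(p.mu, q.mu), max(p.nu, q.nu)) for p, q in zip(a, b)]
```

When A ⊆ B, the union returns B and the intersection returns A, which gives a quick consistency check on an implementation.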
Relative entropy for IFSs
In 1948, Shannon [4] put forward the concept of information entropy.
Assume Q = {q1, q2, ⋯, q m } is the other probability distribution and its information entropy is
Similarly, Vlachos and Sergiadis [7] extend the fuzzy relative entropy to IF situation and derive a relative entropy measure for IFSs.
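For reference, the classical quantities that the IF versions build on can be sketched as follows. The sketch uses the standard Shannon entropy and Kullback-Leibler relative entropy of discrete distributions; the natural logarithm and the function names are assumptions made here for illustration.

```python
import math


def shannon_entropy(p):
    """H(P) = -sum_i p_i ln p_i, with 0 * ln 0 taken as 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def relative_entropy(p, q):
    """Kullback-Leibler relative entropy D(P || Q) = sum_i p_i ln(p_i / q_i)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # 0 * ln(0 / q) contributes nothing
        if qi == 0:
            return math.inf   # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total
```

The relative entropy is zero when the two distributions coincide and is not symmetric in its arguments, which is the asymmetry referred to in the abstract.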
From Definition 3.3, we notice that Equation (3) is not a real relative entropy from the perspective of information theory, since ∀x_i ∈ X, μ_A(x_i) + ν_A(x_i) ⩽ 1 while {μ_A, ν_A} and
We know Equation (5) is meaningless if μ_A, ν_A or π_A = 0; thereupon we make the following extension:
The relative entropy of Definition 3.4 takes into account all the three parameters characterizing IFSs, thus it contains much more information than those existing ones. Through its structure, we can conclude the following theorem.
space and it is of realistic significance to take relative entropy as the separation between two IFSs. As an efficient separation measure, it will be used to measure the degree of dissimilarity between two IFSs. The larger RE(A, B) is, the larger the difference between A and B; conversely, the smaller RE(A, B) is, the closer A is to B.
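To make the idea of a three-parameter relative entropy concrete, here is a minimal Python sketch. It is not the paper's Definition 3.4: it assumes a midpoint form in the style of Vlachos and Sergiadis, in which each of μ, ν and π of A is compared with the average of the corresponding values of A and B. The function names are hypothetical.

```python
import math


def _term(a, b):
    """a * ln(a / ((a + b) / 2)), with 0 * ln 0 taken as 0."""
    if a <= 0.0:
        return 0.0
    return a * math.log(a / (0.5 * (a + b)))


def if_relative_entropy(A, B):
    """Assumed midpoint-style relative entropy between two IFSs.

    A and B are lists of (mu, nu) pairs over the same universe.
    Illustrative stand-in only, not the formula of Definition 3.4.
    """
    total = 0.0
    for (ma, na), (mb, nb) in zip(A, B):
        # hesitation degrees, clamped against floating-point round-off
        pa = max(0.0, 1.0 - ma - na)
        pb = max(0.0, 1.0 - mb - nb)
        total += _term(ma, mb) + _term(na, nb) + _term(pa, pb)
    return total
```

Because the midpoint appears in the denominator, this form stays defined when a degree of B is zero, it is non-negative, and it is in general asymmetric in A and B.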
Different attributes have different impacts on the final decision, so they should be given different weights. When the attribute weights are completely or partly unknown, we need to extract weight information from the decision matrix. Since our relative entropy can be regarded as a distance measure, we can follow the idea of TOPSIS to determine the attribute weights. Its main idea is similarity to the ideal solution; that is, we choose the alternative with the shortest separation from the positive ideal solution (PIS) and the farthest separation from the negative ideal solution (NIS). Thus, there are two directions to consider when deciding the preference relation of the alternatives. In order to use the PIS and the NIS together, we define the relative closeness degree (RCD) for IFSs so as to measure comprehensively the level of dissimilarity between every IFS and the two extremes. A lower RCD indicates that the object is relatively nearer to the PIS and farther from the NIS, and such an object is what we want. Thereupon, the distribution of weights should meet the criterion that the RCD of all alternatives to the ideal ones is as small as possible. In other words, if an attribute makes the total RCD smaller, it is assigned a bigger weight, and conversely a smaller one. In this work, we use the relative entropy as the distance measure. Suppose the set of objects A = {A1, A2, ⋯, A_m}, the set of attributes X = {x1, x2, ⋯, x_n}, the weight vector
Here A*(x_j) represents the ideal solution with respect to the attribute x_j, which covers both the PIS A+(x_j) and the NIS A−(x_j); depending on the context, it stands for either the PIS or the NIS.
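The TOPSIS-style reasoning above can be illustrated with a small Python sketch. Several parts are assumptions rather than the paper's definitions: the PIS and NIS are taken attribute-wise as the largest-membership/smallest-non-membership and smallest-membership/largest-non-membership values, the RCD is taken as d+ / (d+ + d−) so that a lower value is better, and a normalized Hamming distance stands in for the relative entropy.

```python
def ideal_solutions(D):
    """D[i][j] = (mu, nu) of alternative i on attribute j (assumed layout)."""
    n = len(D[0])
    pis = [(max(r[j][0] for r in D), min(r[j][1] for r in D)) for j in range(n)]
    nis = [(min(r[j][0] for r in D), max(r[j][1] for r in D)) for j in range(n)]
    return pis, nis


def hamming(a, b):
    """Placeholder distance between two IF values; the paper uses relative entropy."""
    pa, pb = 1.0 - a[0] - a[1], 1.0 - b[0] - b[1]
    return 0.5 * (abs(a[0] - b[0]) + abs(a[1] - b[1]) + abs(pa - pb))


def relative_closeness(row, pis, nis, dist, weights):
    """Assumed RCD: weighted separation from the PIS over the total separation."""
    d_plus = sum(w * dist(a, p) for w, a, p in zip(weights, row, pis))
    d_minus = sum(w * dist(a, q) for w, a, q in zip(weights, row, nis))
    if d_plus + d_minus == 0:
        return 0.0  # all alternatives coincide on every attribute
    return d_plus / (d_plus + d_minus)
```

Under these assumptions, an alternative that coincides with the PIS has RCD 0 and one that coincides with the NIS has RCD 1, which matches the statement that a lower RCD is preferable.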
Following the above analysis, the weight vector
Similarity degree of IFSs
First, let’s symmetrize the relative entropy RE (A, B) of Equation (4) as SE (A, B):
In fact, Equation (7) is proposed on the basis of TOPSIS. The larger S(A, B) is, the higher the degree of similarity between A and B. Compared with the similarity formula induced from a distance measure by S(A, B) = 1 − d(A, B), our formula not only determines the similarity degree of A and B, but also tests the dissimilarity degree of A and B^c. Besides, in some cases the result obtained by Equation (7) is more reasonable than the one transformed from a separation measure and has higher discriminability, as the following example shows.
In the above example, A and B have the same membership degrees at x1 and x2, and B and C have the same non-membership degrees at x1 and x2. For A and C, both the membership and the non-membership degrees at x1 and x2 differ. Hence, intuitively, A is more similar to B than to C, that is, S(A, B) > S(A, C). If using the generalized normalized Hausdorff distance metric between two IFSs M and N
If utilizing the proposed formula of similarity degree, we get S(A, B) = 0.8387 and S(A, C) = 0.6846. It is noticeable that the results calculated by Equation (7) distinguish different pairs of IFSs more easily, and our results are more accurate. In fact, the two approaches differ in focus: the traditional one emphasizes the diversity of decision information from a geometric perspective, whereas this paper considers the degree of fuzziness of decision information from the viewpoint of information theory. In real applications, different elements of the universal set have different status, and they should hold different weights. The weight w_i of x_i in X is supposed to satisfy
In statistics, the correlation coefficient between two sets of data X and Y is defined as:
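In its standard (Pearson) form, this coefficient is the covariance of the two samples divided by the product of their standard deviations. A minimal Python sketch of that standard form, with a function name chosen here for illustration:

```python
import math


def pearson(x, y):
    """rho = sum((x_i - mx)(y_i - my)) / sqrt(sum((x_i - mx)^2) * sum((y_i - my)^2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    # raises ZeroDivisionError if either sample is constant
    return cov / math.sqrt(vx * vy)
```

The value lies in [−1, 1] and reaches ±1 exactly when one sample is an affine function of the other, which is the kind of property the IF correlation degrees are expected to mirror.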
For the sake of simplicity, we adopt the following notations:
(3) Necessity is evident, so we only prove sufficiency. According to the equality conditions of the above three Cauchy inequalities, R(A, B) = 1 if and only if there exist nonzero real numbers k1, k2, k3 such that
R2(A, B) expressed by Equation (12) obviously satisfies the three items of Theorem 5.3. Since the proof of the above characteristics is analogous to that of Theorem 5.1, we do not repeat it here.
In this part, the aggregated similarity measure of IFSs based on the proposed similarity measures is presented. Before that, we make some theoretical preparation for the aggregation metric. First, let us recall the general definition of a similarity measure.
The IF similarity value always lies between 0 and 1; thus s(A, B) of Definition 6.1 can be applied to Definition 6.2.
With the above definition, we can give the following theorem.
With the support of the above theory, we can carry out clustering analysis in the IF granular space through a real-world numerical example.
The approaches of IF clustering
Suppose that A = {A1, A2, ⋯, A_m}, X = {x1, x2, ⋯, x_n},
Because the similarity measures of this article have only reflexivity and symmetry, without transitivity, they cannot represent an equivalence relation. In this case, we employ the algorithm from Refs. [8, 34–36], which clusters on the basis of a fuzzy equivalence relation by turning the similarity relation into an equivalence relation, that is, the transitive closure technique. The detailed steps are as follows:
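The transitive closure technique is commonly implemented by repeatedly squaring the similarity matrix under max-min composition until it stops changing, and then cutting the resulting equivalence matrix at a level α. The Python sketch below follows that common scheme; the function names are illustrative and are not taken from the cited references.

```python
def maxmin_compose(R, S):
    """Max-min composition: (R o S)[i][j] = max_k min(R[i][k], S[k][j])."""
    n = len(R)
    return [[max(min(R[i][k], S[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(R):
    """Square R (R -> R o R) until R o R == R; R must be reflexive and symmetric."""
    while True:
        R2 = maxmin_compose(R, R)
        if R2 == R:
            return R
        R = R2


def alpha_cut_clusters(E, alpha):
    """Group indices i, j of an equivalence matrix E whenever E[i][j] >= alpha."""
    clusters, seen = [], set()
    for i in range(len(E)):
        if i in seen:
            continue
        group = [j for j in range(len(E)) if E[i][j] >= alpha]
        seen.update(group)
        clusters.append(group)
    return clusters
```

For example, the similarity matrix [[1, 0.8, 0.3], [0.8, 1, 0.5], [0.3, 0.5, 1]] has closure [[1, 0.8, 0.5], [0.8, 1, 0.5], [0.5, 0.5, 1]]; cutting it at α = 0.6 groups the first two objects and leaves the third alone. Lowering α can only merge clusters, which is why the method yields a nested hierarchy.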
Applications
In this section, we discuss a practical problem concerning the classification of cars from [8]. Ten different cars A_i (i = 1, 2, ⋯, 10) are to be classified into several kinds, and experts evaluate these cars on six factors: fuel consumption x1, friction coefficient x2, price x3, comfort degree x4, design x5, and safety x6. The evaluation information of each car under the six indexes is expressed by IFSs; the overall data are listed in Table 1 (decision matrix D), and they denote the degrees of satisfaction and dissatisfaction of each object with respect to every attribute.
The evaluation information of the cars.
(1) If 0.9442 < α ⩽ 1, A_i (i = 1, 2, ⋯, 10) are classified into 10 types, that is, each object forms a type of its own: {A1}, {A2}, {A3}, {A4}, {A5}, {A6}, {A7}, {A8}, {A9}, {A10};
(2) If 0.8278 < α ⩽ 0.9442, A i (i = 1, 2, ⋯, 10) are classified into 9 types: {A1}, {A2}, {A3}, {A4, A9}, {A5}, {A6}, {A7}, {A8}, {A10};
(3) If 0.8103 < α ⩽ 0.8278, A i (i = 1, 2, ⋯, 10) are classified into 8 types: {A1}, {A2, A3}, {A4, A9}, {A5}, {A6}, {A7}, {A8}, {A10};
(4) If 0.7471 < α ⩽ 0.8103, A i (i = 1, 2, ⋯, 10) are classified into 7 types: {A1, A6}, {A2, A3}, {A4, A9}, {A5}, {A7}, {A8}, {A10};
(5) If 0.6971 < α ⩽ 0.7471, A i (i = 1, 2, ⋯, 10) are classified into 6 types: {A1, A6}, {A2, A3, A7}, {A4, A9}, {A5}, {A8}, {A10};
(6) If 0.6474 < α ⩽ 0.6971, A i (i = 1, 2, ⋯, 10) are classified into 5 types: {A1, A6}, {A2, A3, A7, A8}, {A4, A9}, {A5}, {A10};
(7) If 0.5761 < α ⩽ 0.6474, A i (i = 1, 2, ⋯, 10) are classified into 4 types: {A1, A6}, {A2, A3, A7, A8}, {A4, A9}, {A5, A10};
(8) If 0.4654 < α ⩽ 0.5761, A i (i = 1, 2, ⋯, 10) are classified into 3 types: {A1, A2, A3, A6, A7, A8}, {A4, A9}, {A5, A10};
(9) If 0.4288 < α ⩽ 0.4654, A i (i = 1, 2, ⋯, 10) are classified into 2 types: {A1, A2, A3, A4, A6, A7, A8, A9}, {A5, A10};
(10) If 0 < α ⩽ 0.4288, A_i (i = 1, 2, ⋯, 10) are classified into 1 type, that is, all the objects form one group: {A1, A2, A3, A4, A5, A6, A7, A8, A9, A10}.
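The ten cases above amount to a lookup from the cutting level α to the number of classes. A small Python sketch of that lookup, using only the nine threshold values listed above (the function name is illustrative):

```python
from bisect import bisect_left

# Upper ends of the intervals for 1, 2, ..., 9 classes, copied from cases (10) down to (2).
THRESHOLDS = [0.4288, 0.4654, 0.5761, 0.6474, 0.6971, 0.7471, 0.8103, 0.8278, 0.9442]


def number_of_classes(alpha):
    """Number of clusters of the ten cars at cutting level alpha, 0 < alpha <= 1."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    # bisect_left counts the thresholds strictly below alpha, so each interval
    # is open on the left and closed on the right, as in the list above.
    return bisect_left(THRESHOLDS, alpha) + 1
```

For instance, α = 0.6 falls in (0.5761, 0.6474] and gives 4 classes, the partition that is later compared with the brands of the cars.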
Refs. [8, 15] involve this real example, and Ref. [14] has an analogous example. We make a simple comparison among these experimental outcomes in Tables 2 and 3.
The clustering results of other method and formula.
By studying the two tables, we can find at least the following points:
(1) Inspecting the results of Refs. [8] and [13] in Table 2, we identify an abnormal phenomenon: two objects that are placed in one cluster at a finer partition may not be in the same group when the data are partitioned into fewer classes. For the clusters of Ref. [8], the cars cannot be classified into 3 or 4 classes, and there are two different results when they are clustered into 6 types. The authors of [13] employ the IFCM algorithm for clustering, which may produce "overlapped" clusters; that is, a data point assigned to one cluster may also belong to another cluster at the same time. This makes the classification boundary not crisp. For example, A1, A6 and A8 belong to one cluster when the cars are classified into 4 classes. However, when the granularity of the classification is coarser, A1, A4 and A9 are in the same class, while A6 and A8 are partitioned into other groups, respectively. Further, A4 and A9 are not with A1 when the cars are divided into 2 types. The attribute weights are given directly in [8, 13], so the weighting information becomes a parameter. This raises the problem of determining which weights are better and how to choose appropriate weight values. Different weights affect the eventual outcome, which has been demonstrated fully via various dendrograms in [15]. As to the IFCM algorithm, it is in essence an iterative method like fuzzy C-means, where the desired number of clusters c and the initial cluster seeds have to be pre-estimated and then modified continually until the preset convergence level ɛ is reached. Besides, this technique may sometimes fall into endless loops, because the algorithm requires the data to be "convex"; otherwise it yields only a locally optimal solution.
(2) From Table 3, we notice that the two classifications cannot show the entire data information, which may not meet the requirement for a variety of classifications in practice. Why does this happen? An important factor is that they do not consider the attribute weights, although different attributes should account for different proportions. Hwang et al. [15] obtain a hierarchical clustering tree by applying the improved similarity-based clustering method to the similarity matrix. Their dendrogram shows only the well-separated clusters and a small number of clusters, and the correlation matrix R calculated by the similarity measure formula provided in [15] has many equal values. This reveals that the formula proposed there lacks sufficient resolving power among pairs of similar data sets. The clustering technique of Zhang et al. [14] transforms the given IF information into interval-valued information, and all the similarity degrees take the form of interval numbers. This requires much more computational effort and implies a lack of accuracy in the similarity degrees. In addition, they consider only the minimal and maximal deviation information and ignore all attribute weight information in the similarity formula, which results in the loss of too much information.
Now let us look at our clustering case. The result displays complete hierarchical clusters from 10 classes down to one, without any abnormality. Although it may require a large amount of computation, it is well suited to computer programming. Because it is a hierarchical clustering with various cutting levels, we can select an appropriate number of clusters in line with the specific application environment, so it can meet the requirements of different situations. This desirable result also derives from the strong discriminating power of the similarity formulas presented in this article. Another advantage is the calculation of attribute weights. As a consequence, there are no repeated values in the upper or lower triangular portion of the IF similarity matrices S, R1 and R2 computed in this article, which improves on the results of much of the literature, such as Refs. [8, 15]. However, no formula can be applied to all conditions in practical problems, and different similarity formulas yield different correlation degrees, sometimes even conflicting ones. Thus we adopt an aggregation operator to aggregate the various similarity degrees, so that the final similarity matrix R simultaneously contains the information from Eqs. (8), (11) and (12). The corresponding clustering result is then quite objective, comprehensive and reasonable. For example, when the samples are sorted into 8 groups, A2 is more similar to A3, but the two are separated in [8, 14]. It is more suitable to put A2 and A3 together, as in our clusters. Besides, the clustering algorithm based on transitive closure is invariant under the initial numbering of the objects and under monotone transformations of the similarity measure [36], which contributes to the satisfying clustering results.
The clusters information of 10 cars.
In step 5, if p = -1, the GWAA operator becomes the WHA operator and the corresponding optimal weights
Observing the above tables, we note that the clustering of Table 4 differs slightly from that of Table 5 only at 8 classes: A1 and A6 belong to one group in Table 4, while A2 and A3 are placed in one class in Table 5. We can also see that Table 5 is identical to the case when p = 1. This shows that different values of the argument p have slightly different influences on the ultimate cluster granules for the GWAA function. As discussed above, A2 should be clustered with A3 first, so it is proper that the value of p is 1 or approaches 0. But whatever value p is given, the clustering obtained is superior to the clustering results obtained with a single similarity measure formula in previous research. In fact, there are four brands in all among those cars, and cars of the same brand, in different versions, should belong to one class. Hence the actual classification is {A1, A6}, {A2, A3, A7, A8}, {A4, A9}, {A5, A10}, which is consistent with our clustering into 4 classes, no matter how the value of p varies.
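The role of p can be illustrated with a short Python sketch. It assumes that the GWAA operator has the usual generalized weighted mean form (Σ w_i a_i^p)^(1/p); the text only confirms that p = -1 yields the WHA operator, so the exact form, the function name and the sample values are assumptions.

```python
import math


def gwaa(values, weights, p):
    """Assumed GWAA form: (sum_i w_i * a_i**p) ** (1/p), with weights summing to 1.

    p = 1 gives the weighted arithmetic mean, p = -1 the weighted harmonic
    mean (WHA), and p -> 0 the weighted geometric mean. Values must be positive.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if p == 0:
        # limiting case p -> 0: weighted geometric mean
        return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)))
    return sum(w * v ** p for v, w in zip(values, weights)) ** (1.0 / p)


# Three hypothetical similarity degrees of one pair of objects, equally weighted.
sims = [0.8, 0.6, 0.7]
w = [1 / 3, 1 / 3, 1 / 3]
for p in (-1, 0, 1):
    print(p, round(gwaa(sims, w, p), 4))
```

Under this form the aggregated value increases with p, so different choices of p shift all entries of the aggregated similarity matrix slightly, which is one way small differences between the resulting partitions can arise.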
Cluster analysis, as an unsupervised learning method, is an important research direction in the field of machine learning. After clustering, data of the same category should be as similar as possible, and data of different categories should have low similarity, so the selection of an appropriate similarity measure becomes the key to clustering. Because real-world classification is often accompanied by fuzziness, it is more natural, and more in line with objective reality, to use fuzzy theory for the clustering analysis of uncertain granular information. With the development of IF theory, clustering methods on IF granularity structures have drawn intense attention. In this article, three sensitive similarity measures for diverse IFSs and an aggregated similarity degree under the IF environment are presented for IF clustering. In the acquisition of the attribute weights, we use the relative entropy separation, with its high distinguishability, to construct the RCD instead of directly establishing the optimization model with the PIS or the NIS individually, which makes the resulting weight formula more objective and rational. As a result, the correlation matrix obtained by the integrated similarity formula with weights can better depict the differences among diverse objects. Based on these theoretical preparations, we finally give the IF clustering algorithm and demonstrate its viability and effectiveness via an illustrative example.
Acknowledgements
The authors are highly grateful to the anonymous reviewers for their careful reading and insightful comments; their constructive opinions and suggestions will lead to an improved version of this article. The work is supported by the National Nature Science Foundation of China (No. 61673020), the Provincial Nature Science Research Key Project for Colleges and Universities of Anhui Province (No. KJ2013A033), and the Talent Support Key Project for Outstanding Young of Colleges and Universities in 2016 (No. gxyqZD2016453).
