Abstract
The dual hesitant fuzzy set (DHFS) is a comprehensive set that includes the fuzzy set, the intuitionistic fuzzy set and the hesitant fuzzy set as special cases. Distance and similarity measures play important roles in many areas, such as decision making and pattern recognition. In this paper, we introduce several distance and similarity measures for DHFSs based on the Hamming distance, the Euclidean distance and the Hausdorff distance. Two examples illustrate these measures and their applications in pattern recognition. Finally, DHFSs are compared in detail with the corresponding interval-valued intuitionistic fuzzy sets (IVIFSs) and hesitant fuzzy sets (HFSs) by means of the developed distance measures.
Introduction
Technologies that once appeared only in science fiction, such as iris recognition, character recognition and self-driving cars, have now become reality. Pattern recognition, which originated in the 1920s, is the main enabling technology. Relying mainly on mathematical methods, it aims to recognize environments and objects automatically. Although great progress has been made, many problems remain. A primary one is that in the real world it is hard to give precise definitions for uncertain things. As a result, the fuzzy set [1], which uses a membership degree to describe the relationship between an element and its class, has recently offered great help to pattern recognition.
The fuzzy set, which was introduced by Zadeh in 1965, has been developed greatly, and several well-known extensions have been introduced, such as the intuitionistic fuzzy set (IFS) [2], the interval-valued intuitionistic fuzzy set (IVIFS) [3] and the hesitant fuzzy set (HFS) [4]. For these sets, distance and similarity measures, which are vital concepts in pattern recognition, are defined to describe the relationships among elements and classes.
The Hamming distance and the Euclidean distance for IFSs were first introduced by Szmidt and Kacprzyk [5]. Then, Hung and Yang [6] proposed Hausdorff distances for IFSs. Thereafter, other distance and similarity measures for IFSs and IVIFSs were introduced [7–13], and all these measures can be applied in pattern recognition. Xu and Xia [14, 15] developed several hesitant fuzzy distance and similarity measures, as well as a number of hesitant fuzzy ordered weighted distance measures and hesitant fuzzy ordered weighted similarity measures for HFSs or hesitant fuzzy elements.
The dual hesitant fuzzy set (DHFS) was proposed by Zhu et al. [16] and is considered more comprehensive in capturing the decision maker's evaluation information in actual applications [17]. Until now, none of the existing distance and similarity measures has been applicable to DHFSs. Therefore, it is necessary to extend these measures so that they can deal with dual hesitant fuzzy information. The remainder of this paper is organized as follows: Section 2 reviews the preliminary knowledge of some helpful measures. Various distance measures for DHFSs are developed in Section 3, from which the corresponding similarity measures can be obtained. Some applications in pattern recognition are shown in Section 4 and demonstrated by two examples. To reveal the relationships between DHFSs and some other fuzzy sets, several comparisons among them are made in Section 5 by using distance and similarity measures. Section 6 ends this paper with some concluding remarks.
Preliminaries
Here we briefly review only two kinds of fuzzy sets, namely interval-valued intuitionistic fuzzy sets (IVIFSs) and hesitant fuzzy sets (HFSs), together with their distance and similarity measures, which will be used in the following sections.
IVIFSs and their distance and similarity measures
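For an IVIFS, Xu and Yager [18] called each pair of membership and nonmembership intervals an interval-valued intuitionistic fuzzy number (IVIFN). For convenience, an IVIFN can be presented as $([\gamma^-, \gamma^+], [\eta^-, \eta^+])$, where $[\gamma^-, \gamma^+] \subseteq [0, 1]$, $[\eta^-, \eta^+] \subseteq [0, 1]$ and $\gamma^+ + \eta^+ \le 1$.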
Several commonly used distances for IVIFSs are as follows [13]:
(1) The generalized interval-valued intuitionistic fuzzy normalized distance:
(2) The generalized interval-valued intuitionistic fuzzy weighted Hausdorff distance:
(3) The generalized hybrid interval-valued intuitionistic fuzzy weighted distance:
In general, distance and similarity measures are complementary to each other. Assume that there are two IVIFSs $A$ and $B$, that $d(A, B)$ denotes the distance measure between $A$ and $B$, and that $s(A, B)$ is the corresponding similarity measure. Then $s(A, B) = 1 - d(A, B)$. So in this paper, we focus on the discussion of distance measures, from which the similarity measures can be easily obtained.
In order to make HFS easily understood, Xia and Xu [19] defined a mathematical symbol:
To calculate the distance between two HFSs $A$ and $B$ on $X = \{x_1, x_2, \ldots, x_n\}$, some preparation is needed. First, let $l_{h_A}(x)$ be the number of values in $h_A(x)$, let $l_{x_i} = \max\{l_{h_A}(x_i), l_{h_B}(x_i)\}$ for each $x_i \in X$, and arrange the values of $h_A(x)$ in decreasing order so that its $j$th largest value can be referred to. Second, the HFEs must be extended before they can be compared. For example, assume that $h_A(x_i) = \{0.2, 0.3, 0.6\}$ and $h_B(x_i) = \{0.5, 0.6\}$, so that $l_{h_A}(x_i) = 3$ and $l_{h_B}(x_i) = 2$. Clearly, $l_{h_A}(x_i)$ and $l_{h_B}(x_i)$ are not always equal. To deal with this problem, Xu and Xia [14] introduced a method that extends the shorter HFE by adding values until both HFEs have the same length. There are two ways to extend $h_B(x_i)$. The optimistic rule repeats the largest value and extends $h_B(x_i)$ to $\{0.5, 0.6, 0.6\}$, while the pessimistic rule repeats the smallest value and extends it to $\{0.5, 0.5, 0.6\}$. The pessimistic rule is commonly adopted because decision makers are generally assumed to be pessimistic.
Finally, several distance measures for HFSs [14], based on the Hamming distance, the Euclidean distance and the Hausdorff distance, are introduced:
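The extension step can be sketched in a few lines of Python. This is only an illustration of the two rules on the example values from the text; the function name `extend_hfe` and its `rule` parameter are hypothetical and do not come from the cited papers.

```python
def extend_hfe(values, target_len, rule="pessimistic"):
    """Pad a hesitant fuzzy element (HFE) until it has target_len values.

    rule="pessimistic" repeats the smallest value,
    rule="optimistic" repeats the largest value.
    The result is returned in increasing order, as written in the text.
    """
    values = list(values)
    if not values:
        raise ValueError("an HFE needs at least one value")
    if target_len < len(values):
        raise ValueError("target_len is shorter than the HFE")
    if rule == "pessimistic":
        filler = min(values)
    elif rule == "optimistic":
        filler = max(values)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return sorted(values + [filler] * (target_len - len(values)))


h_a = [0.2, 0.3, 0.6]
h_b = [0.5, 0.6]
target = max(len(h_a), len(h_b))

print(extend_hfe(h_b, target, "optimistic"))   # [0.5, 0.6, 0.6]
print(extend_hfe(h_b, target, "pessimistic"))  # [0.5, 0.5, 0.6]
```

An HFE that already has the target length is returned unchanged apart from the sorting.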
(1) The generalized hesitant normalized distance:
(2) The generalized hesitant weighted Hausdorff distance:
(3) The generalized hybrid hesitant weighted distance:
There are also many other distances for HFSs, which can be seen as special cases of the three distances above. The corresponding similarity measures can be obtained from these distances by defining $s(A, B) = 1 - d(A, B)$, where $d(A, B)$ can be $d_{ghn}(A, B)$, $d_{ghwh}(A, B)$ or $d_{ghhw}(A, B)$, respectively.
As a new extension of fuzzy sets, DHFSs have attracted much attention recently. They can be defined as follows:
Until now, there has been no research on distance and similarity measures for DHFSs. So we first propose the axioms for distance and similarity measures in the dual hesitant fuzzy environment:
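As a rough sketch of how such a distance and its similarity measure are computed, the code below assumes the commonly used form of the generalized hesitant normalized distance, $d(A, B) = \left[\frac{1}{n}\sum_{i=1}^{n}\frac{1}{l_{x_i}}\sum_{j=1}^{l_{x_i}} |h_A^{(j)}(x_i) - h_B^{(j)}(x_i)|^{\lambda}\right]^{1/\lambda}$, where the superscript $(j)$ marks the $j$th largest value after pessimistic extension. This form is an assumption made for illustration, since the equation itself is not reproduced in the text, and all function names are hypothetical.

```python
def pad_descending(values, length):
    """Pessimistic extension: repeat the smallest value, then sort descending."""
    values = list(values)
    return sorted(values + [min(values)] * (length - len(values)), reverse=True)


def hesitant_normalized_distance(A, B, lam=1.0):
    """Assumed generalized hesitant normalized distance between two HFSs.

    A and B are lists of HFEs, one list of membership values per element x_i.
    """
    if not A or len(A) != len(B):
        raise ValueError("A and B must be non-empty and defined on the same X")
    total = 0.0
    for h_a, h_b in zip(A, B):
        length = max(len(h_a), len(h_b))
        a = pad_descending(h_a, length)
        b = pad_descending(h_b, length)
        total += sum(abs(p - q) ** lam for p, q in zip(a, b)) / length
    return (total / len(A)) ** (1.0 / lam)


def hesitant_similarity(A, B, lam=1.0):
    """Similarity obtained from the distance via s(A, B) = 1 - d(A, B)."""
    return 1.0 - hesitant_normalized_distance(A, B, lam)


# Single-element HFSs built from the example values in the text.
A = [[0.2, 0.3, 0.6]]
B = [[0.5, 0.6]]
print(hesitant_normalized_distance(A, B, lam=1))  # Hamming-type case
print(hesitant_normalized_distance(A, B, lam=2))  # Euclidean-type case
print(hesitant_similarity(A, B, lam=1))
```

With $\lambda = 1$ the sorted differences are 0, 0.2 and 0.3, so the distance is their mean, 0.5/3.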
0 ≤ d (A, B) ≤ 1;
d (A, B) = 0 if and only if A = B;
d (A, B) = d (B, A).
0 ≤ s (A, B) ≤ 1;
s (A, B) = 1 if and only if A = B;
s (A, B) = s (B, A).
As with the other fuzzy sets, the relationship between s (A, B) and d (A, B) obeys the formula s (A, B) = 1 - d (A, B). So we mainly discuss the distances for DHFSs, from which the similarity measures can be easily obtained.
For convenience, we assume that there are two DHFSs:
To calculate the distance between the two DHFSs, the first problem is that in most cases $l_{h_A}(x) \ne l_{h_B}(x)$ and $l_{g_A}(x) \ne l_{g_B}(x)$, where $l_{g_A}(x)$ is the number of values in $g_A(x)$. So we use the extension method introduced by Xu and Xia [14] to extend the shorter sets of membership degrees and of nonmembership degrees respectively. Let $l_{h(i)} = \max\{l_{h_A}(x_i), l_{h_B}(x_i)\}$ and $l_{g(i)} = \max\{l_{g_A}(x_i), l_{g_B}(x_i)\}$. The pessimistic principle is used for the calculation of distances between two DHFSs in this paper.
The Hamming distance, the Euclidean distance and the Hausdorff distance are three well-known distances, which will be adopted to define distances for DHFSs. DHFSs are composed of two parts, the sets of membership degrees and the sets of nonmembership degrees, and both play an important role in describing the relationships among DHFSs. So below we define a dual hesitant normalized Hamming distance:
Similarly, a dual hesitant normalized Euclidean distance can be defined as follows:
By generalizing the two distances (8) and (9), a generalized dual hesitant normalized distance can be obtained:
Hausdorff distance is recognized as the maximum distance of a set to the nearest point in the other set [20], which describes the dissimilarity degree of two sets. So below we will extend the Hausdorff distances for DHFSs. A generalized dual hesitant normalized Hausdorff distance can be expressed as:
Especially, when λ = 1, it reduces to a dual hesitant normalized Hamming–Hausdorff distance:
With the combination of the normalized Hamming distance, the normalized Euclidean distance and the normalized Hausdorff distance, a hybrid dual hesitant normalized Hamming distance, a hybrid dual hesitant normalized Euclidean distance, and a generalized hybrid dual hesitant normalized distance are given as follows respectively:
In practice, each $x_i$ plays a different role in the set $X$, so it should be given its own weight. We therefore propose weighted versions of the above distance measures.
Suppose that the weights of the elements $x_i$ ($i = 1, 2, \ldots, n$) are $w_i$ ($i = 1, 2, \ldots, n$), with $w_i \in [0, 1]$ and $\sum_{i=1}^{n} w_i = 1$. First of all, the normalized Hamming distances, the normalized Euclidean distances and the normalized Hausdorff distances can be rewritten as weighted distances, such as a generalized dual hesitant weighted distance:
As special cases, when λ = 1, Equations (17) and (18) reduce respectively to a dual hesitant weighted Hamming distance:
If λ = 2, then Equations (17) and (18) are respectively reduced to a dual hesitant weighted Euclidean distance:
With the combination of Equations (17) and (18), a generalized hybrid dual hesitant weighted distance is derived:
In particular, if λ = 1, then we can get a hybrid dual hesitant weighted Hamming distance:
If λ = 2, then we can get a hybrid dual hesitant weighted Euclidean distance:
In the distance formulas above, the variable $x_i$ is discrete. If $x$ takes values continuously in $X = [a, b]$ and the weight $w(x)$ of $x$ is continuous on $X$ with $w(x) \in [0, 1]$ and $\int_a^b w(x)\,dx = 1$, then the distance measures above can be extended as below:
(1) Equation (19) can be transformed into a continuous dual hesitant weighted Hamming distance:
(2) Equation (21) can be transformed into a continuous dual hesitant weighted Euclidean distance:
The two distances above can be summarized as the special cases of a generalized continuous dual hesitant weighted distance:
Based on Equations (18), (20) and (22), we can obtain a generalized continuous dual hesitant weighted Hausdorff distance:
In particular, for λ = 1, 2, the corresponding formulas are the continuous dual hesitant weighted Hamming-Hausdorff distance:
With the combination of the generalized continuous dual hesitant weighted distance and the generalized continuous dual hesitant weighted Hausdorff distance, a generalized hybrid continuous dual hesitant weighted distance can be obtained:
If λ = 1, then Equation (32) degenerates to a hybrid continuous dual hesitant weighted Hamming distance:
If λ = 2, then Equation (32) degenerates to a hybrid continuous dual hesitant weighted Euclidean distance:
In the above continuous dual hesitant distances, the weight $w(x)$ can vary with the variable $x \in [a, b]$. In probability and statistics, if $x$ is a continuous random variable and every $x \in [a, b]$ is equally likely to be chosen, then $x$ obeys the uniform distribution, and the corresponding probability density is $1/(b - a)$ for every $x \in [a, b]$.
Based on the uniform distribution, assume that the weight is $w(x) = 1/(b - a)$ for every $x \in [a, b]$. Correspondingly, the continuous dual hesitant weighted Hamming distance can be transformed into a continuous dual hesitant normalized Hamming distance:
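A short numerical check may help to see why the uniform weight turns a weighted continuous distance into a normalized one. The sketch below uses a midpoint rule on an arbitrary interval; the interval $[2, 5]$ and the pointwise deviation function are made-up illustration values, not data from this paper.

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width


a, b = 2.0, 5.0  # hypothetical interval X = [a, b]


def uniform_weight(x):
    """Uniform density on [a, b], used here as the weight w(x)."""
    return 1.0 / (b - a)


def deviation(x):
    """Hypothetical pointwise deviation between two DHFSs at x (illustration only)."""
    return 0.1 * (x - a) / (b - a)


# The uniform weight integrates to 1 over [a, b].
total_weight = integrate(uniform_weight, a, b)

# A weighted integral with w(x) = 1/(b - a) equals the plain integral divided by (b - a).
weighted = integrate(lambda x: uniform_weight(x) * deviation(x), a, b)
normalized = integrate(deviation, a, b) / (b - a)

print(total_weight, weighted, normalized)
```

Both `weighted` and `normalized` approximate the same average deviation, which is the sense in which the weighted formulas reduce to the normalized ones.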
Meanwhile, the continuous dual hesitant weighted Euclidean distance can be transformed into a continuous dual hesitant normalized Euclidean distance:
Furthermore, a generalized continuous dual hesitant normalized distance can be obtained:
It is obvious that the other continuous dual hesitant distances can be discussed similarly. As the special cases, the generalized continuous dual hesitant weighted Hausdorff distance can be transformed into a generalized continuous dual hesitant normalized Hausdorff distance:
If λ = 1, then Equation (38) degenerates to a continuous dual hesitant normalized Hamming-Hausdorff distance:
If λ = 2, then Equation (38) degenerates to a continuous dual hesitant normalized Euclidean-Hausdorff distance:
As for the generalized hybrid continuous dual hesitant weighted distance, if $w(x) = 1/(b - a)$ for every $x \in [a, b]$, then a generalized hybrid continuous dual hesitant normalized distance can be derived:
For λ = 1, 2, two distance measures can be gained, which are a hybrid continuous dual hesitant normalized Hamming distance:
The corresponding similarity measures can be easily obtained by utilizing the formula s (A, B) = 1 - d (A, B).
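To make the discrete measures more concrete, here is a minimal Python sketch of a weighted dual hesitant distance, a Hausdorff-type variant and the derived similarity. Several choices are assumptions made for illustration because the equations are not reproduced in the text: the membership and nonmembership parts are averaged with equal weight 1/2, the Hausdorff variant takes the largest deviation over both parts, and shorter sets are padded with their smallest value. The function names and the toy DHFSs are hypothetical.

```python
def _pad_descending(values, length):
    """Assumed pessimistic extension: repeat the smallest value, sort descending."""
    values = list(values)
    return sorted(values + [min(values)] * (length - len(values)), reverse=True)


def _abs_diffs(u, v):
    """Absolute differences between two value sets after extension to equal length."""
    length = max(len(u), len(v))
    return [abs(p - q) for p, q in zip(_pad_descending(u, length), _pad_descending(v, length))]


def _check(A, B, weights):
    n = len(A)
    if n == 0 or n != len(B):
        raise ValueError("A and B must be non-empty and defined on the same X")
    if weights is None:
        weights = [1.0 / n] * n  # equal weights give the normalized case
    if len(weights) != n:
        raise ValueError("one weight per element x_i is required")
    return weights


def dual_hesitant_weighted_distance(A, B, weights=None, lam=1.0):
    """Sketch of a generalized dual hesitant weighted distance (assumed form).

    A and B are lists of (h, g) pairs: membership and nonmembership value lists.
    """
    weights = _check(A, B, weights)
    total = 0.0
    for (h_a, g_a), (h_b, g_b), w in zip(A, B, weights):
        dh = _abs_diffs(h_a, h_b)
        dg = _abs_diffs(g_a, g_b)
        part_h = sum(d ** lam for d in dh) / len(dh)
        part_g = sum(d ** lam for d in dg) / len(dg)
        total += w * 0.5 * (part_h + part_g)
    return total ** (1.0 / lam)


def dual_hesitant_weighted_hausdorff(A, B, weights=None, lam=1.0):
    """Sketch of a Hausdorff-type variant: largest deviation per element (assumed form)."""
    weights = _check(A, B, weights)
    total = 0.0
    for (h_a, g_a), (h_b, g_b), w in zip(A, B, weights):
        worst = max(max(_abs_diffs(h_a, h_b)), max(_abs_diffs(g_a, g_b)))
        total += w * worst ** lam
    return total ** (1.0 / lam)


def similarity(distance):
    """s(A, B) = 1 - d(A, B)."""
    return 1.0 - distance


# Hypothetical DHFSs on two elements, for illustration only.
A = [([0.5, 0.4], [0.3]), ([0.7], [0.2, 0.1])]
B = [([0.6], [0.2, 0.1]), ([0.7], [0.2, 0.1])]

d = dual_hesitant_weighted_distance(A, B)
d_h = dual_hesitant_weighted_hausdorff(A, B)
print(d, d_h, similarity(d))
```

Under these assumptions the Hausdorff-type value is never smaller than the averaged one for the same weights and λ, because a maximum is at least as large as a mean.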
Pattern recognition, which aims to provide the most likely match for the input values, usually takes the distances between sets into account. Assume that there are $n$ categories $A_i$ ($i = 1, 2, \ldots, n$) and that $B$ is a set to be recognized. The basis for recognition is the minimum distance principle: if $d(A_j, B) = \min_{1 \le i \le n} d(A_i, B)$, then $B$ is recognized as belonging to $A_j$, $j \in \{1, 2, \ldots, n\}$.
To recognize which pattern a new building material B = {{ 0.8, 0.2 } , { 0.8, 0.2 } , { 0.5, 0.2 } , {0.7, 0.3}} belongs to, Chen et al. [21] let the weight vector of the attributes $G_i$ ($i = 1, 2, 3, 4$) be $w = (0.40, 0.22, 0.18, 0.20)^T$. Then, by utilizing three correlation coefficients, they obtained the result that $B$ belongs to $A_2$.
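The minimum distance principle can be sketched as a simple selection over precomputed distances. The function name `recognize` and the distance values below are made up for illustration; they are not taken from the tables of this paper.

```python
def recognize(distances):
    """Return the label of the category with the smallest distance to B.

    distances maps a category label to d(A_i, B). If several categories
    share the minimum, the first one in insertion order is returned.
    """
    if not distances:
        raise ValueError("at least one category is required")
    return min(distances, key=distances.get)


# Hypothetical distances d(A_i, B) for three categories.
example = {"A1": 0.31, "A2": 0.12, "A3": 0.27}
print(recognize(example))  # A2
```

Because $s(A, B) = 1 - d(A, B)$, choosing the smallest distance is equivalent to choosing the largest similarity.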
In this example, the generalized hybrid dual hesitant weighted distance is used for pattern recognition. To allow a comparison with the results of Chen et al. [21], which were derived from correlation measures, we follow the weight vector and the building material data of their example. We then calculate the generalized hybrid dual hesitant weighted distances between $A_i$ ($i = 1, 2, 3, 4, 5$) and $B$, and the pattern of $B$ is obtained by choosing the shortest distance. The distances between $A_i$ ($i = 1, 2, 3, 4, 5$) and $B$ with λ = 1, 2, 4, 10 are shown in Table 2.
The table shows that the generalized hybrid dual hesitant weighted distance between $A_i$ ($i = 1, 2, 3, 4, 5$) and $B$ increases with the value of λ and that, whatever the value of λ, the minimal distance is the one between $A_2$ and $B$. Based on the minimum distance principle, we conclude that $B$ belongs to $A_2$, which is consistent with the result of Chen et al. [21].
There are four patients: Al, Bob, Joe and Ted. Their symptoms are listed in Table 4.
Wang et al. [22] utilized correlation coefficients for pattern recognition. In this paper, we seek diagnoses for the patients based on the smallest distance between each patient and the diseases. Two kinds of distances are adopted: the generalized dual hesitant normalized distance and the generalized dual hesitant weighted Hausdorff distance. First of all, the generalized dual hesitant normalized distances between patients and diseases are shown in Table 5.
Generally speaking, the distances between each disease and patient increase with the value of λ. However, it is not hard to conclude that, whatever the value of λ, Al suffers from malaria, Bob from a stomach problem, and Ted from viral fever. As for Joe, by using Equation (10), the diagnosis is viral fever when λ = 1 and 2, and malaria when λ = 4 and 10. In some cases, therefore, the results are not stable. In our opinion, this is due to the nonlinearity of the distance formulas.
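The sensitivity of a diagnosis to λ can be reproduced with a small synthetic case. The sketch below applies a generic power-mean form, $\left(\frac{1}{n}\sum_i \delta_i^{\lambda}\right)^{1/\lambda}$, to two made-up vectors of per-symptom deviations; it is not the paper's Equation (10) and the numbers are not taken from the tables. A larger λ puts more emphasis on the largest deviation, so the nearest pattern can change.

```python
def generalized_distance(deviations, lam):
    """Power-mean distance over per-element deviations (illustrative form only)."""
    return (sum(d ** lam for d in deviations) / len(deviations)) ** (1.0 / lam)


# Hypothetical per-symptom deviations of one patient from two diseases.
to_disease_1 = [0.5, 0.0]  # one large deviation, one perfect match
to_disease_2 = [0.3, 0.3]  # two moderate deviations


def nearest(lam):
    """Return which hypothetical disease is closer for a given lambda."""
    d1 = generalized_distance(to_disease_1, lam)
    d2 = generalized_distance(to_disease_2, lam)
    return "disease 1" if d1 < d2 else "disease 2"


for lam in (1, 2, 4):
    print(lam,
          round(generalized_distance(to_disease_1, lam), 4),
          round(generalized_distance(to_disease_2, lam), 4),
          nearest(lam))
```

For λ = 1 the first disease is closer (0.25 against 0.3), while for λ = 2 and λ = 4 the second one is, which mirrors the kind of switch observed for Joe.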
Secondly, we take the generalized dual hesitant weighted Hausdorff distance into account. Assume that, because of their living conditions, the four patients are more likely to be troubled with stomach and chest problems, so we let the weight vector of the five diseases be $w = (0.2, 0.15, 0.15, 0.25, 0.25)^T$. By use of the distance formula (18), the distances between each patient and disease are obtained in Table 9.
From the above four tables, it can be concluded that Al suffers from malaria and that the main problem for Bob and Joe is a stomach problem (in particular, Joe may also be troubled with viral fever, because the distance between Joe and viral fever is 0.1550 when λ = 1), while Ted is troubled with viral fever or malaria, depending on the value of the parameter λ.
Finally, we compare the two distances mentioned above. The Hausdorff distance is recognized to be the largest distance. From the values in the eight tables, it is easy to see that the generalized dual hesitant weighted Hausdorff distances with the weight vector $w = (0.2, 0.15, 0.15, 0.25, 0.25)^T$ are much larger than the corresponding generalized dual hesitant normalized distances.
Although the formulas for the two distance measures are different, they give the same results for two patients: Al suffers from malaria and Bob is troubled with a stomach problem.
If we compare the results derived from our distance measures with those of Wang et al. [22], which utilize the correlation coefficients $\rho_{DHFS_1}$, $\rho_{DHFS_2}$ and $\rho_{DHFS_3}$, the diagnosis results are very similar. For example, the results obtained with $\rho_{DHFS_1}$ are that Ted suffers from viral fever, Al and Joe from malaria, and Bob from a stomach problem.
Discussion
IFSs, HFSs and fuzzy multisets (FMSs) can be seen as special cases of DHFSs under certain conditions [16], while the IVIFS is defined as an envelope of a typical DHFS [23]. Here, to distinguish strictly between DHFSs and other fuzzy sets, a typical DHFE (T-DHFE) is defined as one in which both the number of membership degrees and the number of nonmembership degrees are greater than one; correspondingly, T-DHFS denotes the typical DHFS. In fact, a T-DHFE can not only be reduced to an HFE by removing the nonmembership degree set but can also be extended to an IVIFN. For other data, such as fuzzy numbers and intuitionistic fuzzy numbers, it is hard to obtain the corresponding DHFE form. In addition, a DHFS is a T-DHFS in most cases, so we do not distinguish between DHFS and T-DHFS in the following discussion.
Since DHFSs can be reduced to other fuzzy sets, the relationships among them should be explored. If such a reduction does not influence the final results obtained with DHFSs, some algorithms for DHFSs can be simplified. So far, there has been no research comparing DHFSs with their corresponding reduced forms. Therefore, in the following, we focus on exploring whether reducing DHFSs influences the final results. Here we take only the IVIFSs and HFSs into account, by calculating the distances for the DHFSs and for the corresponding reduced forms respectively. The data of Example 2 are adopted.
Comparisons among DHFSs and the corresponding IVIFSs
First, we reduce the DHFSs to IVIFSs by using the method proposed by Zhu et al. [16].
Based on Definition 6, the IVIFSs corresponding to the DHFSs can be obtained easily.
First of all, we transform the data of Tables 3 and 4 into those in Table 13.
Secondly, we calculate the distances between patients and diseases by utilizing Equation (1) (the generalized interval-valued intuitionistic fuzzy normalized distance) and Equation (2) (the generalized interval-valued intuitionistic fuzzy weighted Hausdorff distance) with the weight vector $w = (0.2, 0.15, 0.15, 0.25, 0.25)^T$. These are the counterparts of Equation (10) (the generalized dual hesitant normalized distance) and Equation (18) (the generalized dual hesitant weighted Hausdorff distance). The derived results are shown in Tables 14–17.
From the data of the above four tables, we can see that, whatever the distances and parameters are, the conclusions are consistent: Al suffers from malaria and Ted suffers from viral fever. These results are steadier than those of Example 2, which are described by DHFSs.
Comparisons among DHFSs and the corresponding HFSs
In this part, we will repeat the process in Section 4.1. First of all, DHFSs are rewritten into the corresponding HFSs. The data of diseases and patients described by HFSs are contained in Table 18.
Secondly, we calculate the generalized hesitant normalized distances and the generalized hesitant weighted Hausdorff distances with the weight vector w = (0.2, 0.15, 0.15, 0.25, 0.25) T respectively. The results are shown in Tables 19 and 20 respectively.
As the above two tables show, although some information is lost, the diagnosis for Al is the same. So in some cases this reduction of DHFSs does not influence the final results of pattern recognition.
Comparing the two tables above, there is little difference among these diagnoses, the majority of which are viral fever with one exception. These results are steadier than those calculated with DHFSs. However, even though some information has been removed, the diagnosis for Ted does not become more explicit. We see two possible reasons: one is that two diseases sometimes differ little in their symptoms, and the other is that a patient may suffer from several diseases at the same time.
Generally speaking, DHFSs can sometimes be replaced by the simpler forms, namely when the pattern recognition results are the same whatever distances and parameters are adopted. In other words, if the pattern recognition results vary with the distances and parameters, the information in the DHFSs should not be reduced.
Concluding remarks
In this paper, by using the Hamming distance, the Euclidean distance, the Hausdorff distance and their generalizations, we have proposed several distance measures for DHFSs for both the discrete and the continuous case. The corresponding similarity measures follow directly from them. These distance and similarity measures are useful in pattern recognition, and they have been verified in the examples of building material recognition and disease diagnosis respectively. Finally, some comparisons among DHFSs and the corresponding IVIFSs and HFSs have been shown, and some interesting conclusions have been derived.
Acknowledgments
This research was funded by the National Natural Science Foundation of China (No. 61273209), and the Central University Basic Scientific Research Business Expenses Project (No. skgt201501).
