Abstract
In this paper, a new version of distance measure for hesitant fuzzy linguistic term sets is developed. The new version of distance measure provides an objective way to handle the diverse dimensions of two HFLTSs, i.e., without shortening the HFLTS to be a linguistic interval or extending the shorter one by adding linguistic terms. The Hamming distance, the Euclidean distance and the corresponding forms are put forward. By the relationship between the distance measure and the similarity measure, some parallel similarity measures are also investigated. As applications of the proposed measures, the issues of pattern recognition, clustering analysis and multi-criteria decision making are considered.
Introduction
The concept of hesitant fuzzy linguistic term sets (HFLTSs) was first introduced by Rodríguez, Martínez and Herrera [1, 5] to express hesitant preferences when evaluating with a linguistic term set. Different from the 2-tuple linguistic model [22, 25], HFLTSs represent hesitation among several linguistic terms, while the 2-tuple linguistic model considers a certain degree of hesitation around a given linguistic term. Thus, HFLTSs can be seen as a continuation of the 2-tuple linguistic model [25–27].
Since the notion of HFLTSs was developed, it has been widely studied and applied to practical issues. Beg and Rashid [2] proposed the TOPSIS (Technique for Order Preference by Similarity to an Ideal Solution) method for HFLTSs to aggregate the subjective evaluations given by experts in decision making. Chen and Hong [3] provided a multi-criteria linguistic decision method with HFLTSs using the pessimistic and the optimistic attitude of the decision maker. Yavuz, et al. [4] and Montes, et al. [9] applied the hesitant fuzzy linguistic model to the evaluation of alternative-fuel vehicles and to the housing market, respectively. Lee and Chen [6, 7] gave the likelihood-based comparison relations of HFLTSs and proposed some hesitant fuzzy linguistic aggregation operators, such as the hesitant fuzzy linguistic weighted average (HFLWA) operator and the hesitant fuzzy linguistic weighted geometric (HFLWG) operator, among others. Wang, et al. [8] developed a novel outranking rational system of HFLTSs and also introduced a dominance relation by using the outranking degrees. Rodríguez, Martínez and Herrera [10] utilized comparative linguistic expressions, which are closer to human beings' cognitive models, for expressing linguistic preferences on the basis of HFLTSs. Liu and Rodríguez [11] presented a new representation of HFLTSs by means of a fuzzy envelope to carry out the computing with words (CWW) process.
In data analysis and data mining with HFLTSs, the difference between units needs to be measured so that the similarity and category of each unit can be determined. Liao, Xu and Zeng [11] investigated a family of distance and similarity measures for HFLTSs and analyzed them for the discrete and continuous cases. Liao and Xu [12] introduced another family of distance and similarity measures for HFLTSs, such as the cosine distance and similarity measures, together with their weighted and continuous cases.
Similar to the approach used for hesitant fuzzy sets [13], Liao, Xu and Zeng [11] and Liao and Xu [12] extended the shorter HFLTS so that the compared HFLTSs have equal lengths. However, this approach has a drawback when it is applied in the hesitant fuzzy linguistic environment. For example, for two HFLTSs
Figure 1.1 shows the existing ways to handle the different dimensions in defining the distance measure and similarity measure.

Existing ways to handle HFLTSs with different dimensions.
The shortening way follows Beg and Rashid [2] and Wang, et al. [8], who utilized the envelope of a HFLTS and transformed the HFLTS into a linguistic interval. The extending way was given by Liao, et al. [11], who proceeded in a way similar to that used for hesitant fuzzy sets [14].
However, as mentioned above, both existing methods have certain drawbacks: the shortening way loses information provided by the evaluator, and the extending way is affected by the attitude of the evaluator. Thus, a new way to handle the dimension problem, which avoids the two drawbacks, is developed in this paper. Recently, Dong, Chen and Herrera [28] developed a difference measure between two HFLTSs, which can be seen as a certain distance measure for HFLTSs based on the numbers of elements in the union and intersection of the two HFLTSs. The difference measure provides a relative distance measure for HFLTSs, which handles the drawbacks mentioned above in a relative way and is not easy to extend to interval-valued hesitant fuzzy linguistic term sets (IVHFLTSs) [29]. In practical issues, however, a direct way may be needed.
To do this, the rest of this paper is arranged as follows:
In Section 2, we briefly review some elementary concepts which are needed in this paper. Section 3 discusses a new version of distance and similarity measure for HFLTSs. Section 4 provides the application of the defined distance and similarity measures to some real problems. Section 5 gives some conclusions and remarks of this paper.
In this section, we mainly review some relevant concepts so as to facilitate further discussions.
Let $S = \{s_0, s_1, \ldots, s_g\}$ be a linguistic term set, where $g$ is an even number and the element $s_i$ represents a possible value for the corresponding linguistic variable, and the following characteristics hold: (1) the set is ordered: $s_\alpha > s_\beta$ if $\alpha > \beta$; (2) there is a negation operator: $\mathrm{neg}(s_\alpha) = s_{g-\alpha}$; (3) if $s_\alpha > s_\beta$, then
To express the hesitancy when using linguistic variables, Rodríguez, et al. [1] introduced the following concept of hesitant fuzzy linguistic term set (HFLTS).
Hereinafter, without special instructions, the notations
The envelope of a HFLTS is defined as follows:
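As a hedged illustration of these preliminaries, the sketch below represents a linguistic term by its subscript and a HFLTS by a list of subscripts. It assumes the usual reading of the envelope in Rodríguez, et al. [1], namely the linguistic interval bounded by the smallest and the largest term of the HFLTS. The granularity `G = 6` and the function names `neg` and `envelope` are assumptions made for this example only.

```python
from typing import List, Tuple

G = 6  # assumed granularity: S = {s_0, ..., s_6}, with g even


def neg(alpha: int, g: int = G) -> int:
    """Negation operator neg(s_alpha) = s_{g - alpha}, acting on subscripts."""
    return g - alpha


def envelope(hflts: List[int]) -> Tuple[int, int]:
    """Subscripts of the lower and upper bounds of a HFLTS (assumed definition)."""
    if not hflts:
        raise ValueError("a HFLTS must contain at least one term")
    return min(hflts), max(hflts)


h = [3, 4, 5]        # the HFLTS {s_3, s_4, s_5}
print(envelope(h))   # (3, 5), i.e. the interval [s_3, s_5]
print(neg(2))        # 4, i.e. neg(s_2) = s_4 when g = 6
```

Working on subscripts keeps the ordering of the term set available through ordinary integer comparison.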
In Rodríguez, et al. [1], the following context-free grammar $G_H$ is defined to produce simple but rich linguistic expressions that can be easily represented by means of HFLTSs.
According to the production rules, different ways can be utilized to transform linguistic expressions into HFLTSs, as shown below:
$E_{G_H}(s_i) = \{s_i \mid s_i \in \ldots\}$;
$E_{G_H}(\text{less than } s_i) = \{s_j \mid s_j \in \ldots\}$;
$E_{G_H}(\text{greater than } s_i) = \{s_j \mid s_j \in \ldots\}$;
$E_{G_H}(\text{between } s_i \text{ and } s_j) = \{s_k \mid s_k \in \ldots\}$.
The distance measure and similarity measure of HFLTSs is defined according to
With the definitions mentioned above, Wang, et al. [8] gave the Euclidean distance measure of HFLTSs, in which the envelope of HFLTS is utilized, while Liao, et al. [11] defined several families of distance and similarity measures by using the distance and similarity measures of hesitant fuzzy set [14].
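The two existing treatments of unequal lengths can be illustrated with a small sketch. The shortening step follows the envelope idea described above. The extending step is an assumption made for illustration only: it pads the shorter HFLTS with its largest term under an optimistic attitude and with its smallest term under a pessimistic one, which may differ from the exact rule used in [11]. The function names `shorten` and `extend` are hypothetical.

```python
from typing import List, Tuple


def shorten(hflts: List[int]) -> Tuple[int, int]:
    """Shortening: keep only the envelope (smallest and largest subscript)."""
    return min(hflts), max(hflts)


def extend(hflts: List[int], length: int, optimistic: bool) -> List[int]:
    """Extending (assumed rule): pad with the largest term if optimistic,
    otherwise with the smallest term, until the target length is reached."""
    pad = max(hflts) if optimistic else min(hflts)
    return sorted(hflts + [pad] * (length - len(hflts)))


a = [1, 2, 5]   # {s_1, s_2, s_5}
b = [1, 5]      # {s_1, s_5}

# Shortening maps two different HFLTSs to the same linguistic interval.
print(shorten(a) == shorten(b))         # True

# Extending depends on the assumed attitude of the evaluator.
print(extend(b, 3, optimistic=True))    # [1, 5, 5]
print(extend(b, 3, optimistic=False))   # [1, 1, 5]
```

In this toy case the term $s_2$ in the first HFLTS disappears after shortening, and the extended form of the second HFLTS changes with the attitude, which are the two drawbacks discussed in the Introduction.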
New versions of distance measures for two sets of HFLTSs
Because HFLTSs may have different dimensions, Wang, et al. [8] and Liao, et al. [11] provided two models to deal with the dimensions, i.e., shortening the HFLTSs by using the envelope or extending the HFLTSs by adding linguistic labels so that the dimensions are the same. In this section, we give a different way to solve the dimension problem.
Before developing the new version of distance measures between two sets of HFLTSs, we first consider the following new distance of HFLTSs.
The notion of distance is used to measure the difference between two objects. When the distance measure is applied to HFLTSs, an effective way is comparing the elements of two HFLTSs one by one (see Fig. 3.1).

Comparison between two HFLTSs-Mode I.
Assume that
Actually, the comparison between two HFLTSs can also be finished in the following way:
As shown in Fig. 3.2, the comparison between two HFLTSs can also start from the ends of the two HFLTSs (named Mode II), which is different from Mode I shown in Fig. 3.1. Both modes can measure the difference between two HFLTSs. Hereinafter, the first mode is used.

Comparison between two HFLTSs-Mode II.
Therefore, the following distance measure for HFLTSs can be defined:
Similarly, the Euclidean distance measure can be defined as follows:
From Equations (1) and (2), the following two conclusions can be drawn:
Assume that
Since
For the reason that the linguistic term If We have If Thus, If So If We have If Thus, If
So
The conclusion that
Therefore, the conclusion is valid. □
With the distance measures of HFLTSs, the distance measure of two sets of HFLTSs can be derived. The main problems in defining the distance between two sets of HFLTSs can be summarized as follows: (1) the diverse dimensions of the elements in the two sets of HFLTSs; (2) the different dimensions of the two sets of HFLTSs, i.e., let

Comparisons between two sets of HFLTSs.
Similar to the way HFLTSs are handled, the distance measure of two sets of HFLTSs can be defined as follows:
Thus, the Euclidean distance measure between Θ1 and Θ2 is written as
To show the application of the proposed distance measures, consider the following example:
Distance measures between Θ1 and Θ2
From Example 3.2 and Equations (5) and (6), it can be seen that the normalized Hamming distance and the normalized Euclidean distance lie in the range [0, 1].
The normalized generalized distance measure for two sets of HFLTSs can be defined as below:
From Equation (7), $d_{nH}(\Theta_1, \Theta_2)$ and $d_{nE}(\Theta_1, \Theta_2)$ are two special cases of $d_{nG}(\Theta_1, \Theta_2)$.
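The special-case relationship can be checked numerically with a minimal sketch. It assumes that the generalized measure has the usual power-mean form over normalized term-wise differences in [0, 1], so that λ = 1 gives a Hamming-type distance and λ = 2 a Euclidean-type distance. The function `generalized_distance` and the list `diffs` are hypothetical and do not reproduce the exact form of Equation (7).

```python
from typing import Sequence


def generalized_distance(diffs: Sequence[float], lam: float) -> float:
    """Power mean (mean of |x|**lam) ** (1/lam) of normalized differences.

    Assumed form for illustration; diffs are differences already scaled to [0, 1].
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if not diffs:
        raise ValueError("diffs must be non-empty")
    return (sum(abs(x) ** lam for x in diffs) / len(diffs)) ** (1.0 / lam)


# Hypothetical normalized differences between compared linguistic terms.
diffs = [0.0, 0.5, 0.25, 0.25]

hamming = generalized_distance(diffs, 1)    # lam = 1: Hamming-type distance
euclidean = generalized_distance(diffs, 2)  # lam = 2: Euclidean-type distance
print(hamming, euclidean)
```

Under this assumed form, both values stay in [0, 1] whenever the inputs do, and the distance does not decrease as λ grows.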
By Property 2.1 and the normalized distance measures, we can get corresponding similarity measures.
In this section, several examples will be developed to show the application of proposed measures in the fields of pattern recognition, fuzzy clustering and multiple attribute decision making.
Hesitant fuzzy linguistic approach to pattern recognition
Assume that there are four patterns (adapted from Li, et al. [15] and Du & Hu [19]), all represented by HFLTSs and denoted as Θ1, Θ2, Θ3, Θ4 (see Table 4.1), and let F = {F1, F2, F3, F4} be the set of features for recognition. Given an unknown sample B, the practical issue is to determine which pattern the sample B belongs to.
Patterns description
Herein,
Using Equations (5) and (6), Table 4.2 shows the two distance measures between $\Theta_i$ ($i = 1, \ldots, 4$) and B.
Distance measures’ results
The notation Doc(j) represents the confidence of a distance metric in recognizing that a given sample belongs to the pattern (j). It was introduced by Hatzimichailidis, et al. [20] and Papakostas, et al. [21] and can be defined by:
The greater the value of Doc(j) is, the more confident the result would be.
From Table 4.2, both of the two distance measures show that the given sample B should be classified to Pattern 3.
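The recognition rule and the degree of confidence can be sketched as follows. The sketch assumes that Doc(j) takes the form commonly used in the cited literature, i.e., the sum of the absolute differences between the distance of pattern j to the sample and the distances of all other patterns to the sample. The distance values are hypothetical and are not the entries of Table 4.2; the function names are likewise illustrative.

```python
from typing import Dict


def classify(distances: Dict[str, float]) -> str:
    """Return the pattern whose distance to the sample is smallest."""
    return min(distances, key=distances.get)


def degree_of_confidence(distances: Dict[str, float], j: str) -> float:
    """Assumed form: sum over i != j of |d(P_i, B) - d(P_j, B)|."""
    return sum(abs(d - distances[j]) for i, d in distances.items() if i != j)


# Hypothetical distances between four patterns and the sample B
# (NOT the values reported in Table 4.2).
d = {"P1": 0.40, "P2": 0.35, "P3": 0.10, "P4": 0.30}

best = classify(d)                      # "P3" for these made-up numbers
print(best, degree_of_confidence(d, best))
```

A larger value for the selected pattern indicates that its distance to the sample is well separated from the distances of the competing patterns.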
When the normalized generalized distance measure $d_{nG}(\Theta, \Theta')$ is used, the results for varying values of the parameter λ are shown in Fig. 4.1.

The effect of parameter λ on the recognition.
According to Fig. 4.1, as the parameter λ changes, the comparison between d(Θ2, B) and d(Θ4, B) also changes. However, the given unknown sample B should always be classified to Pattern 3, i.e., the distance between Pattern 3 and sample B is always the smallest among the four distances.
Table 4.3 shows the recognition produced by some existing distance measures.
Recognition given by other distance measures
From Table 4.3, it can be concluded that the given sample B should be classified to Pattern 4, which differs from the recognition produced by our method. Actually, by comparing the corresponding linguistic terms in two HFLTSs, from a global aspect, Pattern 3 is closer to sample B. However, when the lengths of two HFLTSs are unified by using the envelope (or by adding some linguistic labels), the initial meanings of the hesitant fuzzy linguistic evaluations are changed.
Thus, compared with the existing distance measures, our proposed approach is objective and avoids excessive information loss.
Herein, the similarity measure generated by the normalized Euclidean distance measure $d_{nE}(\Theta_1, \Theta_2)$ is considered. Table 4.4 shows the similarities between the 9 patterns.
Similarities between the 9 patterns
By using the hierarchical clustering method, the clustering results are listed in the following Table 4.5.
Clustering results with different levels based on $d_{nE}$
According to Table 4.5, the hierarchical clustering yields partitions that are determined by the level α. Given a value of α in [0, 1], for instance α = 0.92, {A1, A4, A6, A8} forms a cluster, {A2} is a cluster with only one pattern, and {A3, A5} and {A7, A9} form another two distinct clusters.
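How a level α induces a partition can be illustrated with a short sketch. It assumes that the similarity is obtained from a normalized distance as s = 1 - d and that patterns are grouped whenever they are linked by a chain of similarities not smaller than α (a single-linkage reading, which may differ from the exact hierarchical procedure used for Table 4.5). The distance matrix is hypothetical and unrelated to Table 4.4.

```python
from typing import List


def alpha_clusters(sim: List[List[float]], alpha: float) -> List[List[int]]:
    """Group indices connected by a chain of similarities >= alpha."""
    n = len(sim)
    seen = [False] * n
    clusters = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        while stack:  # depth-first search over the alpha-level graph
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if not seen[j] and sim[i][j] >= alpha:
                    seen[j] = True
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters


# Hypothetical normalized distances between four patterns (not from the paper).
dist = [
    [0.00, 0.05, 0.30, 0.40],
    [0.05, 0.00, 0.35, 0.45],
    [0.30, 0.35, 0.00, 0.06],
    [0.40, 0.45, 0.06, 0.00],
]
sim = [[1.0 - d for d in row] for row in dist]  # assumed relation s = 1 - d

print(alpha_clusters(sim, 0.92))  # [[0, 1], [2, 3]]
print(alpha_clusters(sim, 0.60))  # [[0, 1, 2, 3]]
```

Lowering α merges clusters, which mirrors the way the partitions become coarser across the levels of a hierarchical clustering table.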
Multi-criteria decision making is an important branch of decision theories, which has been widely studied and applied to many practical issues. The combination of HFLTS and multi-criteria decision making has also been widely researched [1, 12]. With the concept of distance measure of HFLTSs, Beg and Rashid [2] considered the ‘Technique for Order Preference by Similarity to an Ideal Solution’ (TOPSIS) method of multi-criteria decision making with HFLTSs. To illustrate the application of proposed distance/similarity measures, the following TOPSIS method with new measures is given.
The assessments are listed in Table 4.6.
The assessments provided for this problem
The solution can be summarized as follows:
(1) Transforming the hesitant fuzzy linguistic term sets to linguistic labels. For convenience of analysis, the hesitant fuzzy linguistic term sets are transformed to linguistic labels, and Table 4.6 can be rewritten as Table 4.7.
(2) Determining the positive and negative ideal points. By comparing the evaluated HFLTSs under the same criterion, the positive ideal point (HFLTS+) and the negative ideal point (HFLTS-) can be obtained, i.e.,
(3) Calculating the distance/similarity between each alternative and the ideal points. Taking the Hamming distance measure as an example, the results are shown in Table 4.8. Let $\mathrm{HFLTS}_i = (\mathrm{HFLTS}_{i1}, \mathrm{HFLTS}_{i2}, \mathrm{HFLTS}_{i3}, \mathrm{HFLTS}_{i4})$ be the evaluated hesitant fuzzy linguistic term sets of the i-th alternative. Then, the distance between the i-th alternative and the positive (or negative) ideal point is denoted as
(4) Computing the closeness coefficients of all alternatives. With the distances between the ideal points and the alternatives, the closeness coefficient of the i-th alternative can be computed according to
By Equation (9) and Table 4.8, the closeness coefficients of all alternatives are calculated; the results are shown in the last row of Table 4.8.
(5) Ranking the alternatives. The larger the closeness coefficient of an alternative is, the better the alternative is. Therefore, the order of all alternatives (Hamming distance with Mode I) can be produced, i.e.,
The best choice is x1.
It can also be derived that the order of all alternatives (Hamming distance with mode II) is x1 ≻ x3 ≻ x2 ≻ x4, i.e., the best choice is x1.
Therefore, x1 is the best alternative.
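The closeness-coefficient and ranking steps can be sketched as follows. The sketch assumes the standard TOPSIS form of the closeness coefficient, i.e., the distance to the negative ideal point divided by the sum of the distances to both ideal points, which is consistent with "the larger, the better" but is not confirmed to be the exact form of Equation (9). The alternatives `a`, `b`, `c` and their distances are hypothetical and are not taken from Table 4.8.

```python
from typing import Dict, List


def closeness(d_pos: float, d_neg: float) -> float:
    """Assumed standard TOPSIS form: d_neg / (d_pos + d_neg)."""
    total = d_pos + d_neg
    if total == 0:
        raise ValueError("alternative coincides with both ideal points")
    return d_neg / total


def rank(d_pos: Dict[str, float], d_neg: Dict[str, float]) -> List[str]:
    """Sort alternatives by decreasing closeness coefficient."""
    cc = {x: closeness(d_pos[x], d_neg[x]) for x in d_pos}
    return sorted(cc, key=cc.get, reverse=True)


# Hypothetical distances to the positive and negative ideal points.
d_pos = {"a": 0.30, "b": 0.10, "c": 0.40}
d_neg = {"a": 0.20, "b": 0.40, "c": 0.10}

print(rank(d_pos, d_neg))  # ['b', 'a', 'c']
```

An alternative that is close to the positive ideal point and far from the negative one receives a coefficient near 1 and is ranked first.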
It is worth noting that our result differs from the conclusion given by Liu and Rodríguez [23], in which the best choice is x3. From Table 4.7, with respect to the criteria aroma and bouquet (C2), flavor and finish (C3) and overall quality level or potential (C4), x1 dominates x3. So our result is relatively intuitive.
The assessments provided for this problem
Hamming distance between each alternative and the ideal points
To assess the feasibility and efficiency of the developed distance and similarity measures, we compare diverse measures both theoretically and practically.
Comparisons among existing distance/similarity measures
The complexity of the developed method is the largest, but no information is lost during the calculation;
Our method preserves the information during processing because the two HFLTSs do not need to be shortened or enlarged in the calculation. Note that a certain information loss occurs if the HFLTSs are shortened to their envelopes or enlarged so that the dimensions of the two HFLTSs are the same;
In Refs. [2, 11], the forms of Hamming distance and Euclidean distance are utilized. Correspondingly, the correlation between two variables cannot be reflected, i.e., the case in which more than one variable describes the same characteristic is not measured well by such distance/similarity measures. For instance, to compare the distances among
When using our method, the result is
Therefore, considering the jumpy and nonlinear nature of human thinking, the distance given by our method is more intuitive.
In accordance with the natural processing way, the largest scale allows us to measure the distance/similarity among HFLTSs without any data compression.
This article introduced a new version of distance and similarity measures for HFLTSs. It is worth mentioning that the existing ways of defining the distance (or similarity) measures of HFLTSs usually either lose information or contain certain subjective factors. Because the distance (or similarity) measure is used to describe the difference (or similarity) between two HFLTSs, and the measuring process can be carried out by comparing the elements of the two HFLTSs one by one, the new version of the distance/similarity measure was developed. The Hamming distance, the Euclidean distance and their corresponding normalized forms were investigated. As applications of the proposed measures, three practical issues, namely pattern recognition, clustering analysis and multi-criteria decision making, were studied.
In the future, we will consider the importance of each HFLTS in two series of HFLTSs and put forward weighted distance/similarity measures. Besides, some other objective information measures of HFLTSs, for example the entropy measure, are also worth considering. Moreover, the new version of distance/similarity measures can also be extended to hesitant fuzzy sets [13–15].
Acknowledgments
The work was supported by the National Natural Science Foundation of China (Nos. 71301001, 71371011, 71501002), the Philosophy and Social Science Planning Project of Anhui Province (No. AHSKQ2016D13), and the Provincial Natural Science Research Project of Anhui Colleges (Nos. KJ2015A379, KJ2016A250).
