Clustering experts in linguistic environment: A hybrid method

Abstract

Investigating clusters of experts is an interesting topic in the large-group decision-making (LGDM) problem, since being familiar with patterns (groups) of experts is beneficial to some other actions needed for decision-making (e.g., reconciliation of opinions derived from different expert groups). However, not too much attention has been paid to expert clustering in the LGDM problem under a linguistic environment. Besides, it seems that only the decision information is utilized to group experts while the auxiliary (outside) knowledge (e.g., expertise and occupation) about these experts has not been fully considered during the clustering process. To address this issue, this study proposes a hybrid method integrating outside knowledge about experts with practical preference information under the interval-valued linguistic environment to cluster experts. The method consists of four elements: pre-clustering of experts according to the given knowledge, the optimization model to transform the interval-valued 2-tuple linguistic (IV2TL) decision information, the data envelopment analysis-discriminant analysis (DEA-DA) model to deal with a two-cluster issue, and iterative clustering based on the DEA-DA model to cluster experts into multiple clusters. The feasibility and validity of the proposed method are illustrated with a real-world example. A comparison with the maximal tree clustering method in the linguistic environment is provided.

Keywords

Large-group decision-making (LGDM); interval-valued 2-tuple linguistic (IV2TL) representation model; outside knowledge; expert clustering

List of notations

S

A linguistic term set

(s_i, α_i)

A 2-tuple

s_i

A linguistic term

α_i

The value of symbolic translation

β

The aggregation result of a set of linguistic terms

[(s_i, α_i), (s_j, α_j)]

An interval-valued 2-tuple

[β_i, β_j]

The aggregation result of a set of linguistic intervals

E = {e₁, e₂, . . . , e_N}

The expert set

X = {x₁, x₂, . . . , x_m}

The alternative set

U = {u₁, u₂, . . . , u_n}

The attribute set

\hat{B}, \hat{C}

The IV2TL decision matrix

The real-number decision matrix

G_g

The separated group (or cluster) to contain experts

μ

The variable used to transform the IV2TL decision information into numerical values

λ_g

The importance degree of each objective

d, c, c – ɛ

The discriminant score

1 Introduction

Group decision-making (GDM), which aims to reach a solution to a specific decision problem among several experts (decision makers), has long been a focus of academic research [1]. Generally, the GDM process involves only a small number of experts (e.g., 3–5 persons), and the complexity of the problem is low [2]. Nowadays, however, group sizes are growing with the increasing complexity of decision-making problems and the development of information technology, especially for problems concerning public interests [3]. Meanwhile, the decision environment, groups, and attributes in modern GDM problems have changed profoundly owing to rapid social and economic development [4]. As a result, conventional GDM models cannot effectively tackle these complex problems, in which multiple relations and interests must be balanced simultaneously. Chen and Liu [5] referred to such problems as large GDM (LGDM) problems and characterized them by four features: (a) thanks to the internet, experts in the group can make decisions at different times and in different places; (b) the group size generally exceeds 20, and both competition and cooperation occur among experts owing to their varied backgrounds; (c) connections may exist among different decision attributes; (d) the preference information of experts is uncertain. As such, the first step is to find a suitable type of data to represent the uncertain information given by experts. Currently, several representations have been adopted in GDM problems to assign values to decision attributes, such as fuzzy numbers [6], intuitionistic fuzzy numbers [7], interval-valued intuitionistic fuzzy numbers [8], and linguistic variables [9]. Since experts tend to express their opinions qualitatively in the real world, linguistic variables are more appropriate than numerical values for modeling human decision-making.
Furthermore, linguistic computing for dealing with uncertainty has been widely applied to different types of decision-making problems in practice owing to its good performance [10–12]. To avoid loss and distortion of information during linguistic information processing, Herrera and Martínez [13] developed the 2-tuple linguistic (2TL) representation model, in which each value consists of a linguistic term and a real number. In recent years, the 2TL representation model has been extensively used in GDM problems [14]. However, in most cases it is difficult for experts to express their preferences on decision attributes with a single linguistic term, owing to the complexity of human thinking and of decision-making problems under uncertainty [11]. Instead, an expert's description of a decision attribute may fall between two linguistic terms (i.e., in a linguistic interval), which expresses the expert's knowledge more accurately. For instance, a person may think that the price of a smartphone is between “Medium” and “High”. To handle this, Lin et al. [15] proposed the interval-valued 2-tuple linguistic (IV2TL) representation model to denote interval-valued linguistic information, under the precondition that all experts share the same linguistic term set. Accordingly, the present study considers the LGDM problem under an interval-valued linguistic environment.

The procedure for solving the LGDM problem is generally composed of several major stages: expert clustering, weight determination for experts, weight determination for attributes, consensus reaching among experts, and comprehensive ranking [5]. Research on these stages has laid a foundation for solving the LGDM problem [16, 17]. For instance, Liu et al. [11] developed a two-layer weight determination model to obtain the expert weights within a cluster and the cluster weights, under the precondition that all expert clusters are known in advance. Liu et al. [4] adopted an objective method for assigning weights to the primary attributes based on the decision information of all experts, where information transformation was achieved by a membership-based method directed at different expert groups (i.e., optimistic, neutral, and pessimistic). The features of the LGDM problem also make it difficult for experts to form a consensus in the decision-making process [18]. If the preferences of all experts are varied and mutually independent, their thinking modes can be divided into three types: (a) experts share similar thinking patterns (a consensus exists); (b) the views of experts may change, but they all belong to a similar group; and (c) experts aggregate within groups (or similar sub-groups) [19]. The third thinking mode is the most common and also reflects the focus of future study. To date, many approaches have been put forward to help reach a consensus after decision information is collected from experts in GDM and LGDM problems under various fuzzy environments [10, 21]. In this process, clustering experts yields decision information with higher consistency and a lower degree of conflict within each cluster, which simplifies the consensus reaching process and significantly improves its efficiency [20, 23].
A review of previous research shows that expert clustering plays a critical role in solving the LGDM problem, since it avoids redundant expert information and reduces both the complexity of group aggregation and the workload of subsequent activities (e.g., weight determination for experts or attributes). Therefore, this study proposes a novel method for clustering experts in the LGDM problem.

Currently, expert clustering has been addressed by many methods under a fuzzy environment, but only a few are directed at the LGDM problem under a (traditional) linguistic environment [22–25]. Wang et al. [26] measured the similarity between any two clusters and separated experts into clusters using an improved hierarchical clustering approach. Xu et al. [27] developed a risk measurement model for quantifying emergency events based on attributes assessed by linguistic information, and then utilized a group member clustering algorithm to generate several aggregates with equal levels of decision risk. Xiao et al. [28] collected the preference degree between alternatives from experts using the linguistic distribution assessment and adopted the preference clustering approach to decompose them into clusters. In expert clustering, information processing (from linguistic to numeric information) is an indispensable step in previous studies, but the existing methods mainly deal with general linguistic information, such as 2-tuples, and cannot be applied to clustering experts in LGDM problems under an interval-valued linguistic environment. Even after linguistic information is transformed, the typical approaches to clustering analysis (e.g., the minimal spanning tree [29], the vector space clustering method [19], and the k-means range algorithm [30]) need an initial ranking of the clustering elements, and the whole process is irreversible. As such, these methods are not well suited to clustering issues in LGDM problems. Moreover, it is difficult to provide an exact number or a range for the appropriate threshold, which significantly influences the number of clusters [26].

To address these shortcomings, this study has two objectives: (a) to reasonably transform the interval-valued linguistic information given by experts, which lays a foundation for the later process of expert clustering; (b) to solve the clustering issue without depending on an initial setting of the threshold, which is normally difficult to determine. For the latter, the idea of combining outside knowledge about experts with practical preference information is proposed to solve the clustering issue in LGDM problems when clustering standards are deficient. If we first divide experts into several groups according to the outside knowledge, this guarantees a sound consistency within each group and reflects the distinct features of the various groups. However, even within the same group, some members may hold different decision attitudes. Thus, we need to reallocate the individuals whose attitudes differ from those of their group to a more suitable group. To achieve the two objectives, this paper combines the optimization model, the data envelopment analysis-discriminant analysis (DEA-DA) model, and iterative clustering to develop a hybrid method for clustering experts in the LGDM problem under an interval-valued linguistic environment. This method not only lays the foundation for weight determination for experts (attributes) and decision information aggregation, but also extends the application of the DEA-DA model from the real number field to a linguistic environment.

The remainder of this paper is organized as follows. Section 2 introduces some concepts, theorems and definitions regarding the 2TL and IV2TL representation models. In Section 3, a framework of the hybrid method for clustering experts is established to solve the LGDM problem under an interval-valued linguistic environment, and its primary components are introduced in detail. In Section 4, a practical example illustrates the feasibility of the hybrid method, and the proposed method is then compared with the maximal tree clustering method under the 2TL environment. Section 5 concludes the paper and gives some future extensions.

2 Preliminaries

In this section, we first introduce basic concepts of the 2TL representation model as well as some definitions and theorems regarding 2-tuple(s). Then, the IV2TL representation model will be reviewed at the end of this section.

2.1 The 2TL representation model

Let S = {s_i|i = 0, 1, 2, . . . , t} be a linguistic term set with odd cardinality. Any label s_i represents a possible value for a linguistic variable, and it should satisfy the following characteristics: (a) the set is ordered: s_i > s_j, if i > j; (b) max operator: max(s_i, s_j) = s_i, if s_i ≥ s_j; (c) min operator: min(s_i, s_j) = s_i, if s_i ≤ s_j; (d) there is a negation operator: Neg (s_i) = s_j, where j = t–i [13]. For example, S can be defined as S={s₀= extremely poor, s₁= very poor, s₂= poor, s₃= medium, s₄= good, s₅= very good, s₆= extremely good}.

A 2TL representation model based on the concept of symbolic translation is used to represent the linguistic evaluation information by means of a 2-tuple (s_i, α_i), where s_i is a linguistic term in the predefined linguistic term set S and α_i ∈ [-0.5, 0.5) is the value of symbolic translation.

Definition 1 [13]. Suppose β ∈ [0, t] is the aggregation result of a set of linguistic terms from S. Let i = round (β) and α = β - i, where i ∈ [0, t] and α ∈ [-0.5, 0.5). Then α is called the symbolic translation, and round (·) is the usual rounding operation.

Definition 2 [13]. Suppose β ∈ [0, t] is the aggregation result of a set of linguistic terms from S. Then, the 2-tuple that expresses the information equivalent to β is obtained by the following function: $Δ : [0, t] \to S \times [- 0.5, 0.5),$ (1) $Δ (β) = (s_{i}, α), \text{ with } \left\{ \begin{matrix} s_{i}, & i = round (β) \\ α = β - i, & α \in [- 0.5, 0.5) \end{matrix} \right.$ (2)

Definition 3 [13]. Let S = {s_i|i = 0, 1, 2, . . . , t} be a linguistic term set and (s_i, α_i) be a 2-tuple. There is a Δ^-1 function that restores the 2-tuple to its equivalent numerical value β ∈ [0, t] ⊂ R, where $Δ^{- 1} : S \times [- 0.5, 0.5) \to [0, t],$ (3) $Δ^{- 1} (s_{i}, α) = i + α = β .$ (4)

It is obvious that the transformation of a linguistic term into a 2TL representation consists of adding a value zero as a symbolic translation, i.e., Δ^-1 (s_i, 0) = i.
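As an illustration of Definitions 2 and 3, the following minimal Python sketch implements Δ and Δ^-1. The function names `delta` and `delta_inv`, and the convention of representing the term s_i by its integer index i, are assumptions made here for illustration only. Rounding half up is used so that α always lies in [-0.5, 0.5).

```python
import math


def delta(beta, t):
    """Map beta in [0, t] to a 2-tuple (i, alpha), where i is the index of s_i."""
    if not 0 <= beta <= t:
        raise ValueError("beta must lie in [0, t]")
    # Round half up: Python's built-in round() rounds halves to the nearest
    # even integer, which could give alpha = +0.5, outside [-0.5, 0.5).
    i = math.floor(beta + 0.5)
    return i, beta - i


def delta_inv(i, alpha):
    """Restore the numerical value beta = i + alpha of the 2-tuple (s_i, alpha)."""
    return i + alpha


# With t = 6 (seven terms s_0..s_6), beta = 2.5 is mapped to (s_3, -0.5).
print(delta(2.5, 6))
print(delta_inv(*delta(3.8, 6)))
```

Applying `delta_inv` to the output of `delta` recovers β up to floating-point error, which mirrors the statement that the 2-tuple carries the same information as β.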

Definition 4 [31]. Let x = {(s₁, α₁) , (s₂, α₂) , . . . , (s_n, α_n)} be a set of 2-tuples. Then, the 2-tuple arithmetic mean is defined as $\begin{matrix} (\bar{s}, \bar{α}) = TAM ((s_{1}, α_{1}), (s_{2}, α_{2}), . . ., (s_{n}, α_{n})) \\ = Δ (\frac{1}{n} \sum_{j = 1}^{n} Δ^{- 1} (s_{j}, α_{j})), \quad \bar{s} \in S, \ \bar{α} \in [- 0.5, 0.5) . \end{matrix}$ (5)

Definition 5 [32]. Let (s_i, α_i) and (s_j, α_j) be two arbitrary 2-tuples derived from one linguistic term set. Then, the 2-tuple deviation value between (s_i, α_i) and (s_j, α_j) is defined as $d ((s_{i}, α_{i}), (s_{j}, α_{j})) = Δ^{- 1} (s_{i}, α_{i}) - Δ^{- 1} (s_{j}, α_{j}) .$ (6)

According to Definition 5, the properties of the 2-tuple deviation measure can be inferred as follows:

Theorem 1 [11]. Let (s_i, α_i) , (s_j, α_j) , (s_k, α_k) be three 2-tuples. Then,

(a) d ((s_i, α_i) , (s_j, α_j)) ∈ R. In particular, d ((s_i, α_i) , (s_j, α_j)) = 0 if Δ^-1 (s_i, α_i) = Δ^-1 (s_j, α_j);

(b) d ((s_i, α_i) , (s_j, α_j)) = - d ((s_j, α_j) , (s_i, α_i));

(c) d ((s_i, α_i) , (s_k, α_k)) = d ((s_i, α_i) , (s_j, α_j)) + d ((s_j, α_j) , (s_k, α_k)).
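The 2-tuple arithmetic mean of Definition 4 and the deviation measure of Definition 5 can be sketched in Python as follows. The helper names (`tam`, `deviation`) and the representation of a 2-tuple as a pair (i, α) are illustrative assumptions; the sample values are made up for the demonstration.

```python
import math


def delta(beta):
    """Delta of Eq. (2), with round-half-up so that alpha is in [-0.5, 0.5)."""
    i = math.floor(beta + 0.5)
    return i, beta - i


def delta_inv(two_tuple):
    """Delta^-1 of Eq. (4): (i, alpha) -> i + alpha."""
    i, alpha = two_tuple
    return i + alpha


def tam(two_tuples):
    """2-tuple arithmetic mean, Eq. (5)."""
    return delta(sum(delta_inv(x) for x in two_tuples) / len(two_tuples))


def deviation(a, b):
    """Signed 2-tuple deviation, Eq. (6)."""
    return delta_inv(a) - delta_inv(b)


# Three hypothetical 2-tuples: (s_3, 0.1), (s_4, -0.2), (s_2, 0).
a, b, c = (3, 0.1), (4, -0.2), (2, 0.0)
mean = tam([a, b, c])

# Properties (b) and (c) of Theorem 1, checked numerically.
antisymmetric = deviation(a, b) == -deviation(b, a)
additive = math.isclose(deviation(a, c), deviation(a, b) + deviation(b, c))
print(mean, antisymmetric, additive)
```

Because d is a signed difference rather than a distance, it can be negative; this is what makes the antisymmetry and additivity properties hold.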

Based on the 2-tuple deviation measure, the 2-tuple covariance and the 2-tuple standard deviation are introduced respectively, which will lay the foundation for the optimization model under the interval-valued linguistic environment in Section 3.

Definition 6 [11]. Let B = (b_ij) _m×n be a 2-tuple matrix, where b_ij = (s_ij, α_ij), s_ij ∈ S and α_ij ∈ [-0.5, 0.5). Then, $\begin{array}{l} Cov (b_{j}, b_{k}) = \frac{1}{m} \sum_{i = 1}^{m} [d (b_{ij}, \bar{b_{j}}) \times d (b_{ik}, \bar{b_{k}})] \quad (j, k = 1, 2, . . ., n) \\ = \frac{1}{m} \sum_{i = 1}^{m} [d ((s_{ij}, α_{ij}), (\bar{s_{j}}, \bar{α_{j}})) \times d ((s_{ik}, α_{ik}), (\bar{s_{k}}, \bar{α_{k}}))] \\ = \frac{1}{m} \sum_{i = 1}^{m} [(Δ^{- 1} (s_{ij}, α_{ij}) - Δ^{- 1} (Δ (\frac{1}{m} \sum_{i = 1}^{m} Δ^{- 1} (s_{ij}, α_{ij})))) \times (Δ^{- 1} (s_{ik}, α_{ik}) - Δ^{- 1} (Δ (\frac{1}{m} \sum_{i = 1}^{m} Δ^{- 1} (s_{ik}, α_{ik}))))] \\ = \frac{1}{m} \sum_{i = 1}^{m} [(β_{ij} - \frac{1}{m} \sum_{i = 1}^{m} β_{ij}) \times (β_{ik} - \frac{1}{m} \sum_{i = 1}^{m} β_{ik})], \end{array}$ (7) where b_j and b_k are two arbitrary columns of 2-tuples; Cov (b_j, b_k) is the covariance of b_j and b_k; $(\bar{s_{j}}, \bar{α_{j}}) = Δ (\frac{1}{m} \sum_{i = 1}^{m} Δ^{- 1} (s_{ij}, α_{ij}))$ is the mean of the 2-tuples in the jth column; $d ((s_{ij}, α_{ij}), (\bar{s_{j}}, \bar{α_{j}}))$ is the deviation between the 2-tuple in the ith row of the jth column and the mean of the 2-tuples in the jth column; $\bar{b_{k}}$ and $d (b_{ik}, \bar{b_{k}})$ have meanings analogous to those of $\bar{b_{j}}$ and $d (b_{ij}, \bar{b_{j}})$ , respectively.

Specifically, when j = k, the covariance reduces to the 2-tuple variance given in Definition 7.

Definition 7 [11]. Let B = (b_ij) _m×n be a 2-tuple matrix, where b_ij = (s_ij, α_ij), s_ij ∈ S and α_ij ∈ [-0.5, 0.5). Then, $\begin{array}{l} δ_{j}^{2} = \frac{1}{m} \sum_{i = 1}^{m} {[d (b_{i j}, \bar{b_{j}})]}^{2} \quad (j = 1, 2, ..., n) \\ = \frac{1}{m} \sum_{i = 1}^{m} {[d ((s_{i j}, α_{i j}), (\bar{s_{j}}, \bar{α_{j}}))]}^{2} \\ = \frac{1}{m} \sum_{i = 1}^{m} {[Δ^{- 1} (s_{i j}, α_{i j}) - Δ^{- 1} (Δ (\frac{1}{m} \sum_{i = 1}^{m} Δ^{- 1} (s_{i j}, α_{i j})))]}^{2} \\ = \frac{1}{m} \sum_{i = 1}^{m} {[β_{i j} - \frac{1}{m} \sum_{i = 1}^{m} β_{i j}]}^{2}, \end{array}$ (8) where $δ_{j}^{2}$ and $δ_{j}$ are the variance and the standard deviation of the jth column of the 2-tuple matrix, respectively; $\bar{b_{j}}$ and $d (b_{ij}, \bar{b_{j}})$ have the same meanings as in Definition 6.
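The last lines of Equations (7) and (8) show that the 2-tuple covariance and variance reduce to the ordinary population covariance and variance of the values β_ij = Δ^-1 (s_ij, α_ij). A minimal Python sketch of this computation is given below; the function names and the small 3 × 2 matrix are hypothetical and serve only as an illustration.

```python
def column_betas(matrix, j):
    """Numerical values beta_ij = i + alpha of the 2-tuples in column j."""
    return [row[j][0] + row[j][1] for row in matrix]


def covariance(matrix, j, k):
    """2-tuple covariance of columns j and k, last line of Eq. (7)."""
    m = len(matrix)
    bj, bk = column_betas(matrix, j), column_betas(matrix, k)
    mean_j, mean_k = sum(bj) / m, sum(bk) / m
    # Population form (divide by m), as in the definition.
    return sum((x - mean_j) * (y - mean_k) for x, y in zip(bj, bk)) / m


def variance(matrix, j):
    """2-tuple variance of column j, Eq. (8): the covariance with j = k."""
    return covariance(matrix, j, j)


# Hypothetical 3 x 2 matrix of 2-tuples written as (index, alpha).
B = [
    [(3, 0.0), (1, 0.0)],
    [(4, 0.0), (2, 0.0)],
    [(5, 0.0), (3, 0.0)],
]
print(variance(B, 0), covariance(B, 0, 1))
```

Note that the divisor is m rather than m - 1, following the definitions above.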

2.2 The IV2TL representation model

Since the opinions of experts are largely restricted by several factors (e.g., their knowledge, environment, and working experience), the attribute descriptions provided by experts may fall between two linguistic terms (i.e., within a linguistic interval). To handle this case, we introduce the IV2TL representation model [11, 15]. To ensure that the aggregation results of these linguistic intervals can be interpreted without loss of information, we assume that all the experts share the same linguistic term set.

Definition 8 [11, 15]. Let S = {s_i|i = 0, 1, 2, . . . , t} be a linguistic term set and [β_i, β_j] be the aggregation result of a set of linguistic intervals, where β_i, β_j ∈ [0, t] , β_i < β_j. The interval-valued 2-tuple can be derived as: $Δ ([β_{i}, β_{j}]) = [(s_{i}, α_{i}), (s_{j}, α_{j})], \text{ with } \left\{ \begin{matrix} s_{i}, & i = round (β_{i}) \\ s_{j}, & j = round (β_{j}) \\ α_{i} = β_{i} - i, & α_{i} \in [- 0.5, 0.5) \\ α_{j} = β_{j} - j, & α_{j} \in [- 0.5, 0.5) \end{matrix} \right.$ (9)

Definition 9 [11]. Let S = {s_i|i = 0, 1, 2, . . . , t} be a linguistic term set and [(s_i, α_i) , (s_j, α_j)] be an interval-valued 2-tuple. There is a Δ^-1 function that restores the interval-valued 2-tuple to its equivalent interval value [β_i, β_j] , β_i, β_j ∈ [0, t] , β_i < β_j, where $Δ^{- 1} [(s_{i}, α_{i}), (s_{j}, α_{j})] = [i + α_{i}, j + α_{j}] = [β_{i}, β_{j}] .$ (10)

Similarly, a linguistic interval [s_i, s_j] can be transformed into an IV2TL representation [(s_i, 0) , (s_j, 0)]. Besides, the 2-tuple could also be considered as a special interval-valued 2-tuple in the form of (s_i, α_i) = [(s_i, α_i) , (s_i, α_i)].

Example 1. Suppose the aggregation of a set of linguistic intervals assessed in S = {s₀, s₁, s₂, s₃, s₄, s₅, s₆} yields β_i = 3.1 and β_j = 3.8. Then, the corresponding interval-valued 2-tuple is Δ ([3.1, 3.8]) = [(s₃, 0.1) , (s₄, - 0.2)], as shown in Fig. 1.

Fig. 1

Example of an IV2TL representation model.
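Example 1 can be reproduced with a short Python sketch of Equations (9) and (10). The names `delta_interval` and `delta_interval_inv` are hypothetical, and terms are again represented by their indices. The check `lo <= hi` is an assumption made here so that the degenerate case mentioned above, a 2-tuple viewed as an interval with equal endpoints, is also accepted.

```python
import math


def delta(beta):
    """Delta of Eq. (2) with round-half-up, so alpha is in [-0.5, 0.5)."""
    i = math.floor(beta + 0.5)
    return i, beta - i


def delta_interval(lo, hi, t):
    """Eq. (9): map the interval [lo, hi] to an interval-valued 2-tuple."""
    if not 0 <= lo <= hi <= t:
        raise ValueError("need 0 <= lo <= hi <= t")
    return delta(lo), delta(hi)


def delta_interval_inv(iv):
    """Eq. (10): map [(s_i, a_i), (s_j, a_j)] back to [i + a_i, j + a_j]."""
    (i, alpha_i), (j, alpha_j) = iv
    return i + alpha_i, j + alpha_j


# Example 1: [3.1, 3.8] on S = {s_0, ..., s_6} gives approximately
# [(s_3, 0.1), (s_4, -0.2)], up to floating-point error.
print(delta_interval(3.1, 3.8, 6))
```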

3 A hybrid method for clustering experts in the interval-valued linguistic environment

This section has four parts. We first give the mathematical representation of the LGDM problem in an interval-valued linguistic environment. Based on the GDM information, a framework of the hybrid method for clustering experts is then proposed under this environment, and its components, including the optimization model and the DEA-DA model, are introduced in detail.

3.1 The LGDM problem under the interval-valued linguistic environment

In the LGDM problem, experts are invited to evaluate the attributes of several alternatives. According to the four features of the LGDM problem described in the introduction, suppose that E = {e₁, e₂, . . . , e_N} (N ≥ 20) is the expert set, X = {x₁, x₂, . . . , x_m} (m ≥ 2) is the alternative set, and U = {u₁, u₂, . . . , u_n} (n ≥ 2) is the attribute set. Each expert e_k ∈ E gives an opinion on each attribute u_j ∈ U of each alternative x_i ∈ X. Let ${\hat{B}}_{k} = ({\tilde{b}}_{ij}^{k})_{m \times n}$ be the IV2TL decision matrix of expert e_k, where ${\tilde{b}}_{ij}^{k}$ represents the evaluation value of expert e_k for attribute u_j of alternative x_i. We assume that the experts give each evaluation ${\tilde{b}}_{ij}^{k}$ in the form of an interval-valued 2-tuple [(s_i, α_i) , (s_j, α_j)] derived from the linguistic term set S = {s_i|i = 0, 1, 2, . . . , t}. Then, the IV2TL decision information in the LGDM problem can be represented by the following matrix: $\hat{B} = [\begin{matrix} {\hat{B}}_{1} \\ ⋮ \\ {\hat{B}}_{N} \end{matrix}] = {([\begin{matrix} {\tilde{b}}_{11}^{1} & \dots & {\tilde{b}}_{1 n}^{1} \\ ⋮ & ⋱ & ⋮ \\ {\tilde{b}}_{m 1}^{1} & \dots & {\tilde{b}}_{mn}^{1} \end{matrix}] \dots [\begin{matrix} {\tilde{b}}_{11}^{N} & \dots & {\tilde{b}}_{1 n}^{N} \\ ⋮ & ⋱ & ⋮ \\ {\tilde{b}}_{m 1}^{N} & \dots & {\tilde{b}}_{mn}^{N} \end{matrix}])}^{T}$

Thus, expert clustering in the LGDM problem under the interval-valued linguistic environment could be defined as the process of obtaining expert clusters based on the given decision matrix.

3.2 Framework of the hybrid method for clustering experts

Since mature standards for providing an exact number or a range for the proper threshold in the clustering issue are lacking, outside knowledge about experts (e.g., career, occupation, and age) can be regarded as the basis for pre-classification, and the choice of such attributes mainly depends on the actual decision-making issue. However, even within the same group, some members may hold different decision attitudes. Based on this, the paper proposes a hybrid method for clustering experts in the LGDM problem that combines outside knowledge and practical preference information. In other words, we first divide experts into several groups based on outside knowledge, and then reallocate the individuals whose attitudes differ from those of their group to a more suitable group by means of the proposed method. Compared with traditional clustering methods, the method in this paper is applied under the precondition that partial information about experts is known. A framework of the hybrid method for clustering experts in the LGDM problem under the interval-valued linguistic environment is displayed in Fig. 2.

Fig. 2

A framework of the hybrid method for clustering experts in the LGDM problem.

It can be seen that the hybrid method consists of four elements, that is, the IV2TL decision information, the optimization model, the two-stage MIP DEA-DA model (hereafter abbreviated as the DEA-DA model) to deal with a two-cluster issue, and iterative clustering to generate multiple expert clusters. The optimization model developed in this study transforms the IV2TL decision information into information represented by 2-tuples (and finally into numerical values). The DEA-DA model, a non-parametric approach proposed by Sueyoshi [33], incorporates the discriminant capabilities of DEA into DA (an important evaluation and clustering analysis method) in order to avoid misclassification and overlap in DA.

Suppose that there are N experts in the LGDM problem under an interval-valued linguistic environment and that the total sample G to be clustered consists of all the experts. If we divide all the experts into h (h ≥ 2) sample groups according to their outside knowledge, there are N_g experts in each sample group G_g (g = 1, 2, . . . , h), with $\sum_{g = 1}^{h} N_{g} = N$ . This paper adopts iterative clustering to transform an h-group clustering problem into h–1 two-group clustering problems, in order to overcome two limitations of the DEA-DA model: its performance largely depends on the sample pre-classification, and it applies only to clustering issues with two groups. As such, in Fig. 2, the entire iterative clustering has h–1 phases, and each phase has its own DEA-DA model labelled by a serial number. Taking phase 2 as an example, the DEA-DA model (i.e., DEA-DA₂) splits the cluster G_g generated at phase 1 into two further clusters (G₂ and G_g (g ≠ 1, 2)), iterating until the clustering results at this phase remain stable. Similarly, the new cluster G_g at each phase is sent to the next phase to repeat the steps, and the iterative process ends when h groups of experts have been generated at phase h–1. Fig. 3 illustrates the results of iterative clustering.

Fig. 3

Visualization of iterative clustering.

Figure 3(a) represents the discriminant result obtained from clustering the original sample, i.e., the result of the first iteration of the DEA-DA model based on the IV2TL decision information at the first phase. The two parallel lines in each iteration are the two hyperplanes determined in the HO model (the second stage of the DEA-DA model). It can be observed from this iterative process that the number of overlapping samples between the two sample groups decreases gradually, with a convergent trend, as the iteration continues. The whole process ends when no overlapping areas exist.

3.3 The optimization model

To give a relatively comprehensive description of the evaluation information from all experts regarding the different alternatives, this paper regards the attributes of each alternative as distinct evaluation objects. Thus, the number of attributes that the experts evaluate increases from n to l = m × n. If we concatenate the rows of ${\hat{B}}_{k} (k = 1, 2, . . ., N)$ into one group of attribute evaluation values, the vector ${\hat{B}}_{k}^{'} = ({\tilde{b}}_{ij}^{k})_{1 \times (m \times n)}$ is obtained. Thus, the original IV2TL decision matrix $\hat{B}$ can be transformed into the new IV2TL decision matrix $\hat{C} = ({\tilde{c}}_{kj})_{N \times l}$ , where l = m × n. Then, the corresponding interval value $[{\underline{β}}_{kj}, {\bar{β}}_{kj}]$ of ${\tilde{c}}_{kj}$ can be obtained by Equation (10). Basically, $[{\underline{β}}_{kj}, {\bar{β}}_{kj}]$ can be regarded as a set of numbers that are equally probable within the interval, where ${\underline{β}}_{kj}$ and ${\bar{β}}_{kj}$ represent the two endpoints of the interval value. Any point of $[{\underline{β}}_{kj}, {\bar{β}}_{kj}]$ can then be written as $β_{kj}^{'} = μ {\underline{β}}_{kj} + (1 - μ) {\bar{β}}_{kj} (0 \leq μ \leq 1)$ . Hence, when the variable μ is assigned a specific value, the interval value $[{\underline{β}}_{kj}, {\bar{β}}_{kj}]$ is transformed into an ordinary real number; in other words, the attribute value is converted from a segment to a point. The linguistic meaning of this real number can be recovered by Definition 2. However, different values of μ lead to different distributions of the attribute values and thus influence the clustering results of the experts. Therefore, the quality of the clustering results depends on choosing the value of μ reasonably and effectively.
This paper takes the optimal value of μ to be the one that makes the distances among intragroup samples as small as possible (good homogeneity) and the distances among intergroup samples as large as possible (good diversity). Fig. 4 takes three groups as an example to show the expected clustering results when the optimal μ is determined. Ideally, the values of d₁, d₂ and d₃ are as large as possible and the samples in G₁, G₂ and G₃ are relatively concentrated.

Fig. 4

The expected clustering results.
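The transformation described at the start of this subsection, from an expert's m × n matrix to one row of $\hat{C}$ and then from intervals to points, can be sketched in Python as follows. The function names `flatten` and `collapse` and the sample intervals are hypothetical; each interval is assumed to be already in the numerical form $[{\underline{β}}_{kj}, {\bar{β}}_{kj}]$ given by Equation (10).

```python
def flatten(matrix):
    """Concatenate the m rows of an m x n matrix into one row of length l = m * n."""
    return [cell for row in matrix for cell in row]


def collapse(interval, mu):
    """Point mu * lower + (1 - mu) * upper of an interval, with 0 <= mu <= 1."""
    lower, upper = interval
    return mu * lower + (1 - mu) * upper


# Hypothetical expert with m = 2 alternatives and n = 2 attributes.
B_k = [
    [(3.0, 4.0), (2.0, 2.5)],
    [(1.0, 3.0), (5.0, 6.0)],
]
row_of_c = flatten(B_k)                         # l = 4 interval values
points = [collapse(iv, 0.5) for iv in row_of_c]  # midpoints when mu = 0.5
print(points)
```

With μ = 1 every interval collapses to its lower endpoint and with μ = 0 to its upper endpoint, so μ acts as a single attitude parameter shared by all experts and attributes.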

To achieve this goal, variances are used to measure the deviation between samples and their sample means; variances have been applied in numerous studies [34–37]. Accordingly, we use the concept of variance to establish the optimization model for clustering experts, which lays the foundation for constructing the DEA-DA model under the interval-valued linguistic environment. If we divide all the experts into h (h ≥ 2) sample groups, there are N_g experts in each sample group G_g (g = 1, 2, . . . , h), with $\sum_{g = 1}^{h} N_{g} = N$ . Since the expected clustering results maximize the variance of intergroup samples and minimize the variances of intragroup samples, a multi-objective model should be established.

Step 1. Establish the objective function f (μ) for the maximum intergroup variance.

Based on Equation (5), in sample group G_g, the mean $c_{gj}^{'}$ of the N_g experts’ evaluation values for attribute u_j is $\begin{matrix} c_{gj}^{'} = TAM (Δ (μ {\underline{β}}_{1 j} + (1 - μ) {\bar{β}}_{1 j}), Δ (μ {\underline{β}}_{2 j} + (1 - μ) {\bar{β}}_{2 j}), . . ., Δ (μ {\underline{β}}_{N_{g} j} + (1 - μ) {\bar{β}}_{N_{g} j})) \\ = Δ (\frac{1}{N_{g}} \sum_{k = 1}^{N_{g}} (μ {\underline{β}}_{kj} + (1 - μ) {\bar{β}}_{kj})) \\ (g = 1, 2, . . ., h; j = 1, 2, . . ., l) . \end{matrix}$ (11)

According to Equations (5) and (8), the variance $σ_{j}^{2}$ across all sample groups of the group means of the evaluation values of attribute u_j is $\begin{array}{l} σ_{j}^{2} = \frac{1}{h} \sum_{g = 1}^{h} [d (c_{gj}^{'}, {\bar{c}}_{j}^{'})]^{2} \quad (j = 1, 2, . . ., l) \\ = \frac{1}{h} \sum_{g = 1}^{h} [d (Δ (\frac{1}{N_{g}} \sum_{k = 1}^{N_{g}} (μ {\underline{β}}_{kj} + (1 - μ) {\bar{β}}_{kj})), TAM (Δ (\frac{1}{N_{1}} \sum_{k = 1}^{N_{1}} (μ {\underline{β}}_{kj} + (1 - μ) {\bar{β}}_{kj})), Δ (\frac{1}{N_{2}} \sum_{k = 1}^{N_{2}} (μ {\underline{β}}_{kj} + (1 - μ) {\bar{β}}_{kj})), . . ., Δ (\frac{1}{N_{h}} \sum_{k = 1}^{N_{h}} (μ {\underline{β}}_{kj} + (1 - μ) {\bar{β}}_{kj}))))]^{2} \\ = \frac{1}{h} \sum_{g = 1}^{h} [\frac{1}{N_{g}} \sum_{k = 1}^{N_{g}} (μ {\underline{β}}_{kj} + (1 - μ) {\bar{β}}_{kj}) - \frac{1}{h} \sum_{g = 1}^{h} (\frac{1}{N_{g}} \sum_{k = 1}^{N_{g}} (μ {\underline{β}}_{kj} + (1 - μ) {\bar{β}}_{kj}))]^{2}, \end{array}$ (12) where ${\bar{c}}_{j}^{'}$ is the mean of the evaluation values of all the sample groups.

As clustering experts in the LGDM problem is based on the experts’ evaluations of each attribute of all the alternatives, this paper regards the importance degree of all attributes as equal to determine the value of μ more objectively. In this case, the variance weights of each attribute are all assigned to one. Then, the objective function with the maximum intergroup variance could be represented by $max f (μ) = \sum_{j = 1}^{l} σ_{j}^{2}$ .

Step 2. Establish the objective function f_g (μ) for the minimum intragroup variance.

According to Equations (5) and (8), in the sample group G_g, the variance $σ_{gj}^{2}$ of the N_g experts’ evaluations of attribute u_j can be represented by $\begin{array}{l} σ_{gj}^{2} = \frac{1}{N_{g}} \sum_{k = 1}^{N_{g}} [d (c_{kj}^{'}, {\bar{c}}_{j}^{″})]^{2} \quad (j = 1, 2, . . ., l) \\ = \frac{1}{N_{g}} \sum_{k = 1}^{N_{g}} [d (Δ (μ {\underline{β}}_{kj} + (1 - μ) {\bar{β}}_{kj}), TAM (Δ (μ {\underline{β}}_{1 j} + (1 - μ) {\bar{β}}_{1 j}), Δ (μ {\underline{β}}_{2 j} + (1 - μ) {\bar{β}}_{2 j}), . . ., Δ (μ {\underline{β}}_{N_{g} j} + (1 - μ) {\bar{β}}_{N_{g} j})))]^{2} \\ = \frac{1}{N_{g}} \sum_{k = 1}^{N_{g}} [(μ {\underline{β}}_{kj} + (1 - μ) {\bar{β}}_{kj}) - \frac{1}{N_{g}} \sum_{k = 1}^{N_{g}} (μ {\underline{β}}_{kj} + (1 - μ) {\bar{β}}_{kj})]^{2}, \end{array}$ (13) where ${\bar{c}}_{j}^{″}$ is the mean of the N_g experts’ 2-tuple evaluation values $c_{kj}^{'}$ of attribute u_j in the sample group G_g.

Similar to the disposal of variance weights of intergroup attributes, the objective function with the minimum intragroup variance could be represented by $min f_{g} (μ) = \sum_{j = 1}^{l} σ_{gj}^{2}, g = 1, 2, . . ., h$ .

Step 3. Based on the linear weighting method in multi-objective programming, an optimization model is established as Equation (14). Notably, the objective function with the maximum intergroup variance max f (μ) should be transformed into $\min - f (μ) = - \sum_{j = 1}^{l} σ_{j}^{2}$ so that every objective in the model is minimized. $\begin{matrix} \min \sum_{g = 1}^{h} λ_{g} f_{g} (μ) - λ_{h + 1} f (μ) \\ \text{s.t.} \ 0 \leq μ \leq 1, \\ \sum_{g = 1}^{h + 1} λ_{g} = 1, \ λ_{g} \geq 0 . \end{matrix}$ (14)

Substituting the expressions for f (μ) and f_g (μ) into the above formula gives $\begin{matrix} \min \sum_{g = 1}^{h} (\frac{λ_{g}}{N_{g}} \sum_{j = 1}^{l} \sum_{k = 1}^{N_{g}} [(μ {\underline{β}}_{kj} + (1 - μ) {\bar{β}}_{kj}) - \frac{1}{N_{g}} \sum_{k = 1}^{N_{g}} (μ {\underline{β}}_{kj} + (1 - μ) {\bar{β}}_{kj})]^{2}) \\ - \frac{λ_{h + 1}}{h} \sum_{j = 1}^{l} \sum_{g = 1}^{h} [\frac{1}{N_{g}} \sum_{k = 1}^{N_{g}} (μ {\underline{β}}_{kj} + (1 - μ) {\bar{β}}_{kj}) - \frac{1}{h} \sum_{g = 1}^{h} (\frac{1}{N_{g}} \sum_{k = 1}^{N_{g}} (μ {\underline{β}}_{kj} + (1 - μ) {\bar{β}}_{kj}))]^{2} \\ \text{s.t.} \ 0 \leq μ \leq 1, \\ \sum_{g = 1}^{h + 1} λ_{g} = 1, \ λ_{g} \geq 0, \end{matrix}$ (15) where λ_g is the importance degree of each objective. Methods for determining these values include Delphi, the analytic hierarchy process and the α-method. When the DEA-DA model is adopted to cluster experts, we assume that all objectives are equally important, that is, λ_g = 1/(h + 1). Because every term is the square of an expression that is linear in μ, the objective function simplifies to a quadratic function of μ under the constraint μ ∈ [0, 1]. If the quadratic function opens upward, μ^* has exactly one optimal solution; otherwise, one or two optimal solutions may exist.
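The optimization in Equation (15) can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: it assumes equal weights λ_g = 1/(h + 1), represents the interval bounds as plain lists, and uses hypothetical names (`blend`, `objective`, `best_mu`) and hypothetical toy data. Because the objective is quadratic in μ, the sketch recovers its coefficients from three evaluations and compares the vertex with the two endpoints of [0, 1].

```python
def blend(low, up, mu):
    """Collapse each interval [low, up] to mu*low + (1 - mu)*up."""
    return [[mu * a + (1 - mu) * b for a, b in zip(row_l, row_u)]
            for row_l, row_u in zip(low, up)]


def objective(mu, low, up, groups):
    """Equation (15) with equal weights: sum of intragroup variances
    minus the intergroup variance, each scaled by 1/(h + 1)."""
    x = blend(low, up, mu)
    h, l = len(groups), len(low[0])
    lam = 1.0 / (h + 1)
    intra = 0.0
    group_means = []
    for members in groups:
        n = len(members)
        means = [sum(x[k][j] for k in members) / n for j in range(l)]
        group_means.append(means)
        intra += sum((x[k][j] - means[j]) ** 2
                     for k in members for j in range(l)) / n
    grand = [sum(m[j] for m in group_means) / h for j in range(l)]
    inter = sum((m[j] - grand[j]) ** 2
                for m in group_means for j in range(l)) / h
    return lam * intra - lam * inter


def best_mu(low, up, groups):
    """Minimize the quadratic objective over mu in [0, 1]."""
    f0, fh, f1 = (objective(m, low, up, groups) for m in (0.0, 0.5, 1.0))
    a = 2.0 * (f1 - 2.0 * fh + f0)   # coefficient of mu**2
    b = f1 - f0 - a                  # coefficient of mu
    candidates = [0.0, 1.0]
    if a > 0 and 0.0 <= -b / (2 * a) <= 1.0:
        candidates.append(-b / (2 * a))  # interior vertex (opens upward)
    return min(candidates, key=lambda m: objective(m, low, up, groups))


# Hypothetical toy data: 4 experts, 2 attributes, 2 pre-classified groups.
low = [[1, 2], [2, 2], [4, 4], [5, 3]]
up = [[2, 4], [3, 3], [5, 6], [6, 5]]
groups = [[0, 1], [2, 3]]
mu_star = best_mu(low, up, groups)
```

When the quadratic opens downward, only the endpoints are compared, which corresponds to the "one or two optimal solutions" case mentioned in the text.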

3.4 The DEA-DA model

The DEA-DA model (essentially a mixed-integer linear programming model) can produce a locally optimal solution for the weight estimates of discriminant functions, which in turn generate evaluation scores to determine group membership [33]. Suppose there are two sample groups G₁ and G₂ in G during phase g (g = 1, 2, . . . , h - 1), containing N₁ and N₂ experts, respectively, with N₁ + N₂ ≤ N. We can build the DEA-DA model on the real numbers $β_{kj}^{'} = μ^{*} {\underline{β}}_{kj} + (1 - μ^{*}) {\bar{β}}_{kj} (0 \leq μ^{*} \leq 1)$ , where μ^* is calculated by Equation (15). For a more detailed introduction to the DEA-DA model, see Sueyoshi [33].

Stage 1. Build the classification and overlap identification (COI) model $\begin{matrix} \min s \\ \text{s.t.} \ \sum_{j = 1}^{l} (λ_{j}^{+} - λ_{j}^{-}) [μ^{*} {\underline{β}}_{kj} + (1 - μ^{*}) {\bar{β}}_{kj}] - d + s \geq 0, \ k \in G_{1}, \\ \sum_{j = 1}^{l} (λ_{j}^{+} - λ_{j}^{-}) [μ^{*} {\underline{β}}_{kj} + (1 - μ^{*}) {\bar{β}}_{kj}] - d - s \leq 0, \ k \in G_{2}, \\ \sum_{j = 1}^{l} (λ_{j}^{+} + λ_{j}^{-}) = 1, \\ ζ_{j}^{+} \geq λ_{j}^{+} \geq ɛ ζ_{j}^{+} \ \text{and} \ ζ_{j}^{-} \geq λ_{j}^{-} \geq ɛ ζ_{j}^{-}, \ j = 1, 2, \ldots, l, \\ ζ_{j}^{+} + ζ_{j}^{-} \leq 1, \ j = 1, 2, \ldots, l, \\ \sum_{j = 1}^{l} (ζ_{j}^{+} + ζ_{j}^{-}) = l, \\ d, s : \text{unrestricted}, \ ζ_{j}^{+} = 0 / 1, \ ζ_{j}^{-} = 0 / 1, \ \text{and all other variables} \geq 0, \end{matrix}$ (16) where d is a discriminant score and s is a slack that represents an absolute distance between the discriminant score (d) and the discriminant function.

Let $λ_{j}^{+ *}$ , $λ_{j}^{- *}$ , d^* and s^* be the optimal solution of the COI model. The corresponding discriminant rule is: if s^* < 0, then there is no overlap between the sample groups; if s^* ≥ 0: (a) if $\sum_{j = 1}^{l} (λ_{j}^{+ *} - λ_{j}^{- *}) [μ^{*} {\underline{β}}_{kj} + (1 - μ^{*}) {\bar{β}}_{kj}] > d^{*} + s^{*}$ , then sample k is allocated to G₁; (b) if $\sum_{j = 1}^{l} (λ_{j}^{+ *} - λ_{j}^{- *}) [μ^{*} {\underline{β}}_{kj} + (1 - μ^{*}) {\bar{β}}_{kj}] < d^{*} - s^{*}$ , then sample k is allocated to G₂; (c) if $d^{*} - s^{*} \leq \sum_{j = 1}^{l} (λ_{j}^{+ *} - λ_{j}^{- *}) [μ^{*} {\underline{β}}_{kj} + (1 - μ^{*}) {\bar{β}}_{kj}] \leq d^{*} + s^{*}$ , then sample k belongs to the overlap G₁ ∩ G₂. According to this rule, G can be divided into the following subsets:

$R_{1} = \{ k \in G_{1} \mid \sum_{j = 1}^{l} (λ_{j}^{+ *} - λ_{j}^{- *}) [μ^{*} {\underline{β}}_{kj} + (1 - μ^{*}) {\bar{β}}_{kj}] > d^{*} + s^{*} \},$

$R_{2} = \{ k \in G_{2} \mid \sum_{j = 1}^{l} (λ_{j}^{+ *} - λ_{j}^{- *}) [μ^{*} {\underline{β}}_{kj} + (1 - μ^{*}) {\bar{β}}_{kj}] < d^{*} - s^{*} \},$

D₁ = G₁ - R₁,

D₂ = G₂ - R₂,

$D = D_{1} \cup D_{2} = \{ k \in G \mid d^{*} - s^{*} \leq \sum_{j = 1}^{l} (λ_{j}^{+ *} - λ_{j}^{- *}) [μ^{*} {\underline{β}}_{kj} + (1 - μ^{*}) {\bar{β}}_{kj}] \leq d^{*} + s^{*} \},$

where, D = D₁ ∪ D₂ indicates the overlapping or misclassified samples, which could be further judged through the discriminant model in the second stage.
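The Stage 1 discriminant rule can be sketched as a small function. It only applies the rule to a given COI solution and does not solve the mixed-integer program of Equation (16), which would need an external solver. The function name `coi_assign` and all numbers are hypothetical, and the fallback for s^* < 0 (comparing the score with d^*) is an assumption of this sketch, since the text only states that no overlap exists in that case.

```python
def coi_assign(row, lam_plus, lam_minus, d, s):
    """Apply the COI discriminant rule to one expert's real-valued row.

    row       : transformed evaluations beta'_kj of expert k
    lam_plus  : optimal weights lambda_j^{+*}
    lam_minus : optimal weights lambda_j^{-*}
    d, s      : optimal discriminant score d* and slack s*
    """
    score = sum((p - m) * x for p, m, x in zip(lam_plus, lam_minus, row))
    if s < 0:
        # Assumption of this sketch: with no overlap, split at d*.
        return "G1" if score >= d else "G2"
    if score > d + s:
        return "G1"
    if score < d - s:
        return "G2"
    return "overlap"  # d* - s* <= score <= d* + s*, left to the HO model


# Hypothetical weights and threshold for two attributes.
lam_plus, lam_minus = [0.6, 0.0], [0.0, 0.4]
labels = [coi_assign(r, lam_plus, lam_minus, d=1.0, s=0.2)
          for r in ([3.0, 1.0], [1.0, 1.0], [2.0, 0.5])]
```

Samples labelled `"overlap"` form the set D and are passed to the second stage.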

Stage 2. Build the handling overlap (HO) model $\begin{matrix} \min \sum_{k \in D_{1}} y_{k} + \sum_{k \in D_{2}} y_{k} \\ \text{s.t.} \ \sum_{j = 1}^{l} (λ_{j}^{+} - λ_{j}^{-}) [μ^{*} {\underline{β}}_{kj} + (1 - μ^{*}) {\bar{β}}_{kj}] - c + M y_{k} \geq 0, \ k \in D_{1}, \\ \sum_{j = 1}^{l} (λ_{j}^{+} - λ_{j}^{-}) [μ^{*} {\underline{β}}_{kj} + (1 - μ^{*}) {\bar{β}}_{kj}] - c - M y_{k} \leq - ɛ, \ k \in D_{2}, \\ \sum_{j = 1}^{l} (λ_{j}^{+} + λ_{j}^{-}) = 1, \\ ζ_{j}^{+} \geq λ_{j}^{+} \geq ɛ ζ_{j}^{+} \ \text{and} \ ζ_{j}^{-} \geq λ_{j}^{-} \geq ɛ ζ_{j}^{-}, \\ ζ_{j}^{+} + ζ_{j}^{-} \leq 1, \ j = 1, 2, \ldots, l, \\ \sum_{j = 1}^{l} (ζ_{j}^{+} + ζ_{j}^{-}) = p, \\ c : \text{unrestricted}, \ ζ_{j}^{+} = 0 / 1, \ ζ_{j}^{-} = 0 / 1, \\ y_{k} = 0 / 1 \ \text{and all other variables} \geq 0, \end{matrix}$ (17) where M is a given large number; y_k, a binary variable, is used to count the number of misclassified observations; c (k ∈ G₁) and c - ɛ (k ∈ G₂) are the discriminant scores. If the number of overlapping or misclassified samples exceeds the number of observations l, then p = l; otherwise, p is a positive integer satisfying p < l.

Let $λ_{j}^{+ *}$ , $λ_{j}^{- *}$ and c^* be the optimal solution of the HO model. The corresponding discriminant rule is: (a) if $\sum_{j = 1}^{l} (λ_{j}^{+ *} - λ_{j}^{- *}) [μ^{*} {\underline{β}}_{kj} + (1 - μ^{*}) {\bar{β}}_{kj}] \geq c^{*}$ , then sample k is allocated to G₁; (b) if $\sum_{j = 1}^{l} (λ_{j}^{+ *} - λ_{j}^{- *}) [μ^{*} {\underline{β}}_{kj} + (1 - μ^{*}) {\bar{β}}_{kj}] \leq c^{*} - ɛ$ , then sample k is allocated to G₂.

Then, we classify the two sample groups G₁ and G₂ on an iterative basis until the memberships of G₁ and G₂ remain stable. Based on the framework presented in Fig. 2, the integrated process of the hybrid method is presented in Algorithm 1. Given the clusters derived from the DEA-DA model at a given iteration, the principles for selecting the group to be clustered in the next iteration are: (a) if the two clusters are significantly unbalanced, the larger group is handled further; (b) if the two clusters are nearly balanced, the group whose members come from more of the groups pre-classified by outside knowledge is selected.

Algorithm 1.

The hybrid method for clustering experts in the LGDM problem in linguistic environment.

Input: Experts with outside knowledge; the IV2TL decision matrix $\hat{B}$

Phase 1: Transform the IV2TL decision information into numerical values

Step 1. Establish the objective function $max f (μ) = \sum_{j = 1}^{l} σ_{j}^{2}$ for the intergroup variance by Equation (12).

Step 2. Establish the objective function $min f_{g} (μ) = \sum_{j = 1}^{l} σ_{gj}^{2}$ for the intragroup variance by Equation (13).

Step 3. Establish an optimization model integrating the above objective functions to obtain the best solution for μ^* by Equation (15).

Step 4. Generate the real-number decision matrix R.

Phase 2: Separate N experts into h clusters

For i ← 1 to h - 1 do

Step 1. Pre-classify all experts into two clusters according to outside knowledge.

Step 2. Execute the DEA-DA model (i.e., Equation (16) and Equation (17)) to generate two new clusters.

Step 3. Select one cluster from the two new clusters according to the principles for the next clustering iteration.

End For

Output: h clusters of all experts, where the numbers of experts N_g in the clusters satisfy $\sum_{g = 1}^{h} N_{g} = N$
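The selection principles used in Step 3 of Phase 2 can be sketched as follows. The source does not quantify "significantly unbalanced", so the threshold `imbalance_ratio` is a hypothetical parameter, and the function name `select_next` is likewise illustrative. Principle (b) is read here as preferring the cluster whose members come from more distinct pre-classified groups; this reading is an assumption.

```python
def select_next(c1, c2, pre_group, imbalance_ratio=2.0):
    """Pick the cluster to split in the next phase.

    c1, c2          : lists of expert ids produced by the DEA-DA model
    pre_group       : dict mapping expert id -> pre-classified group label
    imbalance_ratio : hypothetical cut-off for "significantly unbalanced"
    """
    big, small = (c1, c2) if len(c1) >= len(c2) else (c2, c1)
    # Principle (a): clearly unbalanced sizes -> continue with the larger one.
    if not small or len(big) / len(small) >= imbalance_ratio:
        return big

    # Principle (b): nearly balanced -> prefer the more heterogeneous cluster.
    def diversity(cluster):
        return len({pre_group[e] for e in cluster})

    return c1 if diversity(c1) >= diversity(c2) else c2


# Occupation-based pre-classification of 30 experts into three groups of ten.
pre_group = {e: (e - 1) // 10 + 1 for e in range(1, 31)}
g1 = [1, 2, 4, 9, 10]
g2 = [e for e in range(1, 31) if e not in g1]
chosen = select_next(g1, g2, pre_group)
```

With a 5-versus-25 split such as the one above, principle (a) applies and the larger cluster is returned for further clustering.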

4 Illustrative example

To investigate the feasibility and validity of the proposed hybrid method for clustering experts, the salary reform for teachers at the university is given as an example to show the application in the LGDM problem under an interval-valued linguistic environment. Then, we will compare the proposed method with another clustering method.

4.1 Example with the hybrid method

To reasonably determine the salary level of each professor, the university authority decides to enact a concrete salary reform plan. The financial department at the university invites 30 experts e_k (k = 1, 2, . . . , 30), including administrative staff, research and teaching staff as well as retirees, to evaluate three candidate alternatives x_i (i = 1, 2, 3). Assume that each alternative has three attributes (research contribution u₁, teaching quality u₂ and service assurance u₃) and all the experts adopt the linguistic term set S= {s₀= extremely poor, s₁= very poor, s₂ =poor, s₃ =medium, s₄ =good, s₅ =very good, s₆ =extremely good} to evaluate each attribute of all alternatives. Note that the number of experts, alternatives or attributes in reality may be much larger than that listed in the example.

Step 1. According to the occupation of each expert (i.e., administrative staff, research and teaching staff, and retirees), we divided the 30 experts into three groups. Suppose e₁-e₁₀ belonged to Group 1, e₁₁-e₂₀ belonged to Group 2, and e₂₁-e₃₀ belonged to Group 3.

Step 2. We gathered the evaluation values given by the experts with the IV2TL representation model in Table 1.

Table 1
The IV2TL decision matrix $\hat{B}$

Sample	Alternative	u ₁	u ₂	u ₃	Sample	Alternative	u ₁	u ₂	u ₃
e ₁	x ₁	[(s₂, 0), (s₄, 0)]	[(s₂, 0), (s₃, 0)]	[(s₁, 0), (s₂, 0)]	e ₂	x ₁	[(s₄, 0), (s₅, 0)]	[(s₃, 0), (s₄, 0)]	[(s₄, 0), (s₆, 0)]
	x ₂	[(s₄, 0), (s₅, 0)]	[(s₅, 0), (s₆, 0)]	[(s₄, 0), (s₅, 0)]		x ₂	[(s₃, 0), (s₅, 0)]	[(s₃, 0), (s₄, 0)]	[(s₃, 0), (s₅, 0)]
	x ₃	[(s₃, 0), (s₄, 0)]	[(s₃, 0), (s₅, 0)]	[(s₁, 0), (s₂, 0)]		x ₃	[(s₁, 0), (s₂, 0)]	[(s₂, 0), (s₃, 0)]	[(s₂, 0), (s₄, 0)]
e ₃	x ₁	[(s₅, 0), (s₆, 0)]	[(s₄, 0), (s₅, 0)]	[(s₅, 0), (s₆, 0)]	e ₄	x ₁	[(s₄, 0), (s₆, 0)]	[(s₃, 0), (s₄, 0)]	[(s₄, 0), (s₆, 0)]
	x ₂	[(s₄, 0), (s₅, 0)]	[(s₃, 0), (s₅, 0)]	[(s₄, 0), (s₅, 0)]		x ₂	[(s₃, 0), (s₄, 0)]	[(s₃, 0), (s₄, 0)]	[(s₃, 0), (s₅, 0)]
	x ₃	[(s₁, 0), (s₃, 0)]	[(s₂, 0), (s₄, 0)]	[(s₃, 0), (s₄, 0)]		x ₃	[(s₁, 0), (s₂, 0)]	[(s₂, 0), (s₃, 0)]	[(s₂, 0), (s₄, 0)]
e ₅	x ₁	[(s₂, 0), (s₄, 0)]	[(s₂, 0), (s₃, 0)]	[(s₁, 0), (s₂, 0)]	e ₆	x ₁	[(s₄, 0), (s₅, 0)]	[(s₃, 0), (s₄, 0)]	[(s₄, 0), (s₆, 0)]
	x ₂	[(s₄, 0), (s₆, 0)]	[(s₄, 0), (s₆, 0)]	[(s₄, 0), (s₆, 0)]		x ₂	[(s₃, 0), (s₄, 0)]	[(s₃, 0), (s₄, 0)]	[(s₃, 0), (s₄, 0)]
	x ₃	[(s₃, 0), (s₄, 0)]	[(s₄, 0), (s₅, 0)]	[(s₀, 0), (s₂, 0)]		x ₃	[(s₂, 0), (s₃, 0)]	[(s₂, 0), (s₄, 0)]	[(s₂, 0), (s₃, 0)]
... ... ...
e ₂₅	x ₁	[(s₄, 0), (s₅, 0)]	[(s₂, 0), (s₃, 0)]	[(s₃, 0), (s₄, 0)]	e ₂₆	x ₁	[(s₄, 0), (s₆, 0)]	[(s₄, 0), (s₅, 0)]	[(s₅, 0), (s₆, 0)]
	x ₂	[(s₂, 0), (s₄, 0)]	[(s₄, 0), (s₅, 0)]	[(s₂, 0), (s₃, 0)]		x ₂	[(s₃, 0), (s₅, 0)]	[(s₃, 0), (s₄, 0)]	[(s₃, 0), (s₄, 0)]
	x ₃	[(s₅, 0), (s₆, 0)]	[(s₄, 0), (s₅, 0)]	[(s₃, 0), (s₄, 0)]		x ₃	[(s₁, 0), (s₂, 0)]	[(s₂, 0), (s₄, 0)]	[(s₂, 0), (s₃, 0)]
e ₂₇	x ₁	[(s₃, 0), (s₅, 0)]	[(s₂, 0), (s₄, 0)]	[(s₃, 0), (s₄, 0)]	e ₂₈	x ₁	[(s₃, 0), (s₄, 0)]	[(s₂, 0), (s₃, 0)]	[(s₃, 0), (s₅, 0)]
	x ₂	[(s₂, 0), (s₃, 0)]	[(s₄, 0), (s₆, 0)]	[(s₁, 0), (s₂, 0)]		x ₂	[(s₂, 0), (s₃, 0)]	[(s₄, 0), (s₅, 0)]	[(s₂, 0), (s₃, 0)]
	x ₃	[(s₅, 0), (s₆, 0)]	[(s₄, 0), (s₅, 0)]	[(s₃, 0), (s₅, 0)]		x ₃	[(s₅, 0), (s₆, 0)]	[(s₄, 0), (s₆, 0)]	[(s₄, 0), (s₆, 0)]
e ₂₉	x ₁	[(s₃, 0), (s₅, 0)]	[(s₂, 0), (s₄, 0)]	[(s₃, 0), (s₄, 0)]	e ₃₀	x ₁	[(s₃, 0), (s₄, 0)]	[(s₂, 0), (s₃, 0)]	[(s₄, 0), (s₅, 0)]
	x ₂	[(s₂, 0), (s₄, 0)]	[(s₄, 0), (s₆, 0)]	[(s₁, 0), (s₂, 0)]		x ₂	[(s₃, 0), (s₄, 0)]	[(s₄, 0), (s₅, 0)]	[(s₂, 0), (s₃, 0)]
	x ₃	[(s₄, 0), (s₆, 0)]	[(s₅, 0), (s₆, 0)]	[(s₄, 0), (s₅, 0)]		x ₃	[(s₄, 0), (s₅, 0)]	[(s₄, 0), (s₆, 0)]	[(s₃, 0), (s₄, 0)]

Step 3. We selected each line of ${\hat{B}}_{k} (k = 1, 2, . . ., 30)$ as one group of evaluation values of attributes and transformed the matrix ${\hat{B}}_{k}$ into a vector. Then, matrix $\hat{B}$ was transformed into the new IV2TL decision matrix $\hat{C} = ({\tilde{c}}_{kj})_{30 \times 9}$ in Table 2.

Table 2

The IV2TL decision matrix $\hat{C}$

${\tilde{c}}_{kj}$	x ₁			x ₂			x ₃
	u ₁	u ₂	u ₃	u ₄	u ₅	u ₆	u ₇	u ₈	u ₉
e ₁	[(s₂, 0), (s₄, 0)]	[(s₂, 0), (s₃, 0)]	[(s₁, 0), (s₂, 0)]	[(s₄, 0), (s₅, 0)]	[(s₅, 0), (s₆, 0)]	[(s₄, 0), (s₅, 0)]	[(s₃, 0), (s₄, 0)]	[(s₃, 0), (s₅, 0)]	[(s₁, 0), (s₂, 0)]
e ₂	[(s₄, 0), (s₅, 0)]	[(s₃, 0), (s₄, 0)]	[(s₄, 0), (s₆, 0)]	[(s₃, 0), (s₅, 0)]	[(s₃, 0), (s₄, 0)]	[(s₃, 0), (s₅, 0)]	[(s₁, 0), (s₂, 0)]	[(s₂, 0), (s₃, 0)]	[(s₂, 0), (s₄, 0)]
e ₃	[(s₅, 0), (s₆, 0)]	[(s₄, 0), (s₅, 0)]	[(s₅, 0), (s₆, 0)]	[(s₄, 0), (s₅, 0)]	[(s₃, 0), (s₅, 0)]	[(s₄, 0), (s₅, 0)]	[(s₁, 0), (s₃, 0)]	[(s₂, 0), (s₄, 0)]	[(s₃, 0), (s₄, 0)]
e ₄	[(s₄, 0), (s₆, 0)]	[(s₃, 0), (s₄, 0)]	[(s₄, 0), (s₆, 0)]	[(s₃, 0), (s₄, 0)]	[(s₃, 0), (s₄, 0)]	[(s₃, 0), (s₅, 0)]	[(s₁, 0), (s₂, 0)]	[(s₂, 0), (s₃, 0)]	[(s₂, 0), (s₄, 0)]
e ₅	[(s₂, 0), (s₄, 0)]	[(s₂, 0), (s₃, 0)]	[(s₁, 0), (s₂, 0)]	[(s₄, 0), (s₆, 0)]	[(s₄, 0), (s₆, 0)]	[(s₄, 0), (s₆, 0)]	[(s₃, 0), (s₄, 0)]	[(s₄, 0), (s₅, 0)]	[(s₀, 0), (s₂, 0)]
e ₆	[(s₄, 0), (s₅, 0)]	[(s₃, 0), (s₄, 0)]	[(s₄, 0), (s₆, 0)]	[(s₃, 0), (s₄, 0)]	[(s₃, 0), (s₄, 0)]	[(s₃, 0), (s₄, 0)]	[(s₂, 0), (s₃, 0)]	[(s₂, 0), (s₄, 0)]	[(s₂, 0), (s₃, 0)]
... ... ...
e ₂₅	[(s₄, 0), (s₅, 0)]	[(s₂, 0), (s₃, 0)]	[(s₃, 0), (s₄, 0)]	[(s₂, 0), (s₄, 0)]	[(s₄, 0), (s₅, 0)]	[(s₂, 0), (s₃, 0)]	[(s₅, 0), (s₆, 0)]	[(s₄, 0), (s₅, 0)]	[(s₃, 0), (s₄, 0)]
e ₂₆	[(s₄, 0), (s₆, 0)]	[(s₄, 0), (s₅, 0)]	[(s₅, 0), (s₆, 0)]	[(s₃, 0), (s₅, 0)]	[(s₃, 0), (s₄, 0)]	[(s₃, 0), (s₄, 0)]	[(s₁, 0), (s₂, 0)]	[(s₂, 0), (s₄, 0)]	[(s₂, 0), (s₃, 0)]
e ₂₇	[(s₃, 0), (s₅, 0)]	[(s₂, 0), (s₄, 0)]	[(s₃, 0), (s₄, 0)]	[(s₂, 0), (s₃, 0)]	[(s₄, 0), (s₆, 0)]	[(s₁, 0), (s₂, 0)]	[(s₅, 0), (s₆, 0)]	[(s₄, 0), (s₅, 0)]	[(s₃, 0), (s₅, 0)]
e ₂₈	[(s₃, 0), (s₄, 0)]	[(s₂, 0), (s₃, 0)]	[(s₃, 0), (s₅, 0)]	[(s₂, 0), (s₃, 0)]	[(s₄, 0), (s₅, 0)]	[(s₂, 0), (s₃, 0)]	[(s₅, 0), (s₆, 0)]	[(s₄, 0), (s₆, 0)]	[(s₄, 0), (s₆, 0)]
e ₂₉	[(s₃, 0), (s₅, 0)]	[(s₂, 0), (s₄, 0)]	[(s₃, 0), (s₄, 0)]	[(s₂, 0), (s₄, 0)]	[(s₄, 0), (s₆, 0)]	[(s₁, 0), (s₂, 0)]	[(s₄, 0), (s₆, 0)]	[(s₅, 0), (s₆, 0)]	[(s₄, 0), (s₅, 0)]
e ₃₀	[(s₃, 0), (s₄, 0)]	[(s₂, 0), (s₃, 0)]	[(s₄, 0), (s₅, 0)]	[(s₃, 0), (s₄, 0)]	[(s₄, 0), (s₅, 0)]	[(s₂, 0), (s₃, 0)]	[(s₄, 0), (s₅, 0)]	[(s₄, 0), (s₆, 0)]	[(s₃, 0), (s₄, 0)]

Step 4. Based on Table 2, we transformed each element ${\tilde{c}}_{kj}$ of the IV2TL decision matrix $\hat{C}$ into the corresponding interval value $[{\underline{β}}_{kj}, {\bar{β}}_{kj}]$ by Equation (10). Then, the optimal solution of the optimization model in Equation (15) was obtained as μ^* = 0.731. Thus, the matrix $\hat{C}$ was finally converted into the real-number matrix R = (r_kj)_30×9 by $β_{kj}^{'} = 0.731 {\underline{β}}_{kj} + 0.269 {\bar{β}}_{kj}$ (see Table 3), which could be passed to the DEA-DA model for subsequent processing. The linguistic meaning of the numbers in Table 3 could be recovered by means of Definition 2.

Table 3

The real-number decision matrix R

r _kj	x ₁			x ₂			x ₃
	u ₁	u ₂	u ₃	u ₄	u ₅	u ₆	u ₇	u ₈	u ₉
e ₁	2.5	2.3	1.3	4.3	5.3	4.3	3.3	3.5	1.3
e ₂	4.3	3.3	4.5	3.5	3.3	3.5	1.3	2.3	2.5
e ₃	5.3	4.3	5.3	4.3	3.5	4.3	1.5	2.5	3.3
e ₄	4.5	3.3	4.5	3.3	3.3	3.5	1.3	2.3	2.5
e ₅	2.5	2.3	1.3	4.5	4.5	4.5	3.3	4.3	0.5
e ₆	4.3	3.3	4.5	3.3	3.3	3.3	2.3	2.5	2.3
... ... ...
e ₂₅	4.3	2.3	3.3	2.5	4.3	2.3	5.3	4.3	3.3
e ₂₆	4.5	4.3	5.3	3.5	3.3	3.3	1.3	2.5	2.3
e ₂₇	3.5	2.5	3.3	2.3	4.5	1.3	5.3	4.3	3.5
e ₂₈	3.3	2.3	3.5	2.3	4.3	2.3	5.3	4.5	4.5
e ₂₉	3.5	2.5	3.3	2.5	4.5	1.3	4.5	5.3	4.3
e ₃₀	3.3	2.3	4.3	3.3	4.3	2.3	4.3	4.5	3.3
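As a check on the transformation, the first row of Table 3 can be recomputed from the first row of Table 2. The sketch assumes that, because every symbolic translation in the example is 0, each 2-tuple (s_i, 0) maps to the real number i; the names `to_real` and `e1_intervals` are illustrative only.

```python
MU_STAR = 0.731  # optimal value reported for the example


def to_real(lower, upper, mu=MU_STAR):
    """beta' = mu * lower bound + (1 - mu) * upper bound."""
    return mu * lower + (1 - mu) * upper


# Term indices of expert e1 from Table 2, e.g. [(s2, 0), (s4, 0)] -> (2, 4).
e1_intervals = [(2, 4), (2, 3), (1, 2), (4, 5), (5, 6),
                (4, 5), (3, 4), (3, 5), (1, 2)]
e1_row = [round(to_real(lo, hi), 1) for lo, hi in e1_intervals]
```

The rounded values agree with the e₁ row of Table 3 (2.5, 2.3, 1.3, 4.3, 5.3, 4.3, 3.3, 3.5, 1.3).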

Step 5. We first divided all the experts into two sample groups, where samples from Group 1 made up G₁, and samples from Group 2 and Group 3 constituted G₂. From Fig. 2, if the experts are to be clustered into h groups, h - 1 phases should be processed. In terms of the salary reform problem, two phases need to be completed to realize the expert clustering.

Step 6. For phase 1, we adopted the DEA-DA model to conduct a two-group clustering of G₁ and G₂ through iterative clustering based on the matrix R. From Fig. 3, the DEA-DA model will undergo an iteration process, which means that the elements in G₁ and G₂ should always change until the process converges. To sum up, G₁ and G₂ could be simply regarded as two containers whose elements would vary during the iteration. The results of the first phase of expert clustering are summarized in Tables 4 and 5.

Table 4

The variable results of the DEA-DA model at phase 1

Variable	Iteration 1		Iteration 2
	COI model	HO model	COI model
$λ_{1}^{+}$	0	0	0
$λ_{1}^{-}$	0.335	0	0.245
$λ_{2}^{+}$	0.078	0	0
$λ_{2}^{-}$	0	0.029	0
$λ_{3}^{+}$	0	0	0
$λ_{3}^{-}$	0.007	0	0.080
$λ_{4}^{+}$	0.098	0	0.056
$λ_{4}^{-}$	0	0.082	0
$λ_{5}^{+}$	0	0.285	0
$λ_{5}^{-}$	0.082	0	0.052
$λ_{6}^{+}$	0.041	0	0
$λ_{6}^{-}$	0	0	0.048
$λ_{7}^{+}$	0.009	0	0
$λ_{7}^{-}$	0	0	0.013
$λ_{8}^{+}$	0	0	0
$λ_{8}^{-}$	0.197	0.408	0.322
$λ_{9}^{+}$	0.153	0.196	0.184
$λ_{9}^{-}$	0	0	0
d ^*	–1.150		–1.946
s ^*	0.012		–0.035
c ^*		0

Table 5

The clustering results of the experts at phase 1

Sample	Iteration 1		Iteration 2
	COI model	HO model	COI model
e ₁	G ₁		G ₁
e ₂	G ₁		G ₁
e ₃	G ₂		G ₂
e ₄	Overlap	G ₁	G ₁
e ₅	G ₂		G ₂
e ₆	G ₂		G ₂
e ₇	G ₂		G ₂
e ₈	G ₂		G ₂
e ₉	G ₁		G ₁
e ₁₀	G ₁		G ₁
e ₁₁	Overlap	G ₂	G ₂
e ₁₂	G ₂		G ₂
e ₁₃	G ₂		G ₂
e ₁₄	G ₂		G ₂
e ₁₅	G ₂		G ₂
e ₁₆	Overlap	G ₂	G ₂
e ₁₇	Overlap	G ₂	G ₂
e ₁₈	G ₂		G ₂
e ₁₉	Overlap	G ₂	G ₂
e ₂₀	Overlap	G ₂	G ₂
e ₂₁	G ₂		G ₂
e ₂₂	G ₂		G ₂
e ₂₃	G ₂		G ₂
e ₂₄	G ₂		G ₂
e ₂₅	G ₂		G ₂
e ₂₆	G ₂		G ₂
e ₂₇	G ₂		G ₂
e ₂₈	Overlap	G ₂	G ₂
e ₂₉	G ₂		G ₂
e ₃₀	G ₂		G ₂

After the COI model had been applied twice, the value of s^* reached –0.035, which indicated that the group memberships of all samples had been determined after two iterations of the DEA-DA model. Specifically, e₁, e₂, e₄, e₉ and e₁₀ belonged to G₁, while the remaining samples belonged to G₂.

Step 7. We re-divided the samples from G₂. Since the DEA-DA model applies only to the two-group clustering problem, the samples from G₂ should be divided into two new groups (again denoted G₁ and G₂). The main idea is still based on outside knowledge about experts, but differs slightly from the original pre-grouping. We pick any two pre-classified groups, allocate their samples to G₁ and G₂, respectively, and then add the samples of all the remaining groups to either G₁ or G₂. In this case, we selected the samples from Group 2 as G₁ and the samples from Group 3 as G₂, and all the remaining experts in other groups (here only Group 1) were added to G₁. Finally, e₃, e₅, e₆, e₇, e₈, e₁₁, e₁₂, e₁₃, e₁₄, e₁₅, e₁₆, e₁₇, e₁₈, e₁₉, and e₂₀ belonged to G₁, while e₂₁, e₂₂, e₂₃, e₂₄, e₂₅, e₂₆, e₂₇, e₂₈, e₂₉, and e₃₀ belonged to G₂.

Step 8. During phase 2, we conducted a clustering analysis of the new G₁ and G₂ by means of the DEA-DA model through iterative clustering. The results of the second phase of expert clustering are shown in Tables 6 and 7.

Table 6

The variable results of the DEA-DA model at phase 2

Variable	Iteration 1		Iteration 2
	COI model	HO model	COI model
$λ_{1}^{+}$	0	0	0
$λ_{1}^{-}$	0.120	0	0.274
$λ_{2}^{+}$	0.064	0.345	0.074
$λ_{2}^{-}$	0	0	0
$λ_{3}^{+}$	0	0	0
$λ_{3}^{-}$	0.098	0	0.005
$λ_{4}^{+}$	0.074	0	0.155
$λ_{4}^{-}$	0	0	0
$λ_{5}^{+}$	0.171	0	0
$λ_{5}^{-}$	0	0	0
$λ_{6}^{+}$	0	0.306	0
$λ_{6}^{-}$	0.027	0	0.019
$λ_{7}^{+}$	0	0.299	0.006
$λ_{7}^{-}$	0.094	0	0
$λ_{8}^{+}$	0	0.049	0
$λ_{8}^{-}$	0.169	0	0.255
$λ_{9}^{+}$	0.184	0	0.168
$λ_{9}^{-}$	0	0	0
d ^*	–0.221		–0.957
s ^*	0.037		–0.005
c ^*		3.279

Table 7

The clustering results of the experts at phase 2

Sample	Iteration 1		Iteration 2
	COI model	HO model	COI model
e ₃	G ₁		G ₁
e ₅	Overlap	G ₁	G ₁
e ₆	G ₂		G ₂
e ₇	G ₁		G ₁
e ₈	G ₂		G ₂
e ₁₁	G ₁		G ₁
e ₁₂	G ₂		G ₂
e ₁₃	G ₂		G ₂
e ₁₄	G ₂		G ₂
e ₁₅	G ₁		G ₁
e ₁₆	G ₁		G ₁
e ₁₇	G ₁		G ₁
e ₁₈	Overlap	G ₁	G ₁
e ₁₉	Overlap	G ₁	G ₁
e ₂₀	G ₁		G ₁
e ₂₁	G ₂		G ₂
e ₂₂	Overlap	G ₂	G ₂
e ₂₃	Overlap	G ₂	G ₂
e ₂₄	Overlap	G ₂	G ₂
e ₂₅	G ₂		G ₂
e ₂₆	Overlap	G ₂	G ₂
e ₂₇	Overlap	G ₂	G ₂
e ₂₈	Overlap	G ₂	G ₂
e ₂₉	Overlap	G ₂	G ₂
e ₃₀	G ₂		G ₂

After the COI model had been applied twice, the value of s^* reached –0.005, which indicated that the group memberships of all samples had been determined after two iterations of the DEA-DA model. Specifically, e₃, e₅, e₇, e₁₁, e₁₅, e₁₆, e₁₇, e₁₈, e₁₉ and e₂₀ belonged to G₁, while e₆, e₈, e₁₂, e₁₃, e₁₄, e₂₁, e₂₂, e₂₃, e₂₄, e₂₅, e₂₆, e₂₇, e₂₈, e₂₉ and e₃₀ belonged to G₂.

Step 9. We aggregated the clustering results of the two phases and obtained the expert clusters in the LGDM problem under the interval-valued linguistic environment (see Table 8).

Table 8

The final clustering result for experts in the LGDM problem

Group	Sample
Group 1	e₁, e₂, e₄, e₉, e₁₀
Group 2	e₃, e₅, e₇, e₁₁, e₁₅, e₁₆, e₁₇, e₁₈, e₁₉, e₂₀
Group 3	e₆, e₈, e₁₂, e₁₃, e₁₄, e₂₁, e₂₂, e₂₃, e₂₄, e₂₅, e₂₆, e₂₇, e₂₈, e₂₉, e₃₀

Step 10. End.

It could be seen that the final clustering result differed greatly from the pre-classification. Overall, the group sizes were ordered as Group 3 > Group 2 > Group 1. Only five administrative staff stayed in the original group, while the other five were distributed to the two other groups. The retiree group absorbed 5 members from other groups, resulting in a total of 15 members.

4.2 Comparison with the maximal tree clustering method

Currently, research into LGDM problems under the interval-valued linguistic environment is still at an early stage, and the relevant results are few [11]. Moreover, once the value of μ^* is determined, the input of the DEA-DA model is transformed from the IV2TL decision matrix into a matrix of real numbers. On this basis, this paper compares the DEA-DA model with the maximal tree clustering method under the 2TL environment proposed by Yu and Fan [38]. This method was selected for comparison because there is a lack of research into expert clustering in multi-attribute decision-making problems under the 2TL environment. Although the selected method mainly focuses on clustering alternatives, the algorithm can also be applied to clustering experts. Therefore, we briefly present the procedure of the maximal tree clustering method for clustering experts, based on the illustrative example, as follows.

Step 1. Based on the real-number matrix R in Table 3, we calculated the similarity coefficient matrix R′ by Equation (18).

$\begin{matrix} r_{ki}^{'} = \frac{| \sum_{j = 1}^{9} (r_{kj}^{'} - \bar{r_{k}^{'}}) (r_{ij}^{'} - \bar{r_{i}^{'}}) |}{(\sum_{j = 1}^{9} (r_{kj}^{'} - \bar{r_{k}^{'}})^{2} \sum_{j = 1}^{9} (r_{ij}^{'} - \bar{r_{i}^{'}})^{2})^{0.5}} \\ (k, i = 1, 2, . . ., 30), \end{matrix}$ (18) where, $\bar{r_{k}^{'}} = \frac{1}{9} \sum_{j = 1}^{9} r_{kj}^{'}$ and $\bar{r_{i}^{'}} = \frac{1}{9} \sum_{j = 1}^{9} r_{ij}^{'}$ .

Considering there was a great deal of information in the matrix R′, the results of the similarity coefficient of all the experts were partly shown in Table 9.

Table 9

The similarity coefficient matrix

	e ₁	e ₂	e ₃	e ₄	e ₅	e ₆	e ₇	e ₈		e ₂₃	e ₂₄	e ₂₅	e ₂₆	e ₂₇	e ₂₈	e ₂₉	e ₃₀
e ₁	1.000	0.150	0.246	0.186	0.953	0.174	0.106	0.285		0.906	0.906	0.100	0.273	0.002	0.101	0.053	0.074
e ₂	0.150	1.000	0.983	0.992	0.143	0.930	0.588	0.909		0.110	0.012	0.494	0.945	0.563	0.661	0.537	0.289
e ₃	0.246	0.983	1.000	0.979	0.230	0.898	0.676	0.896		0.228	0.084	0.565	0.941	0.630	0.722	0.605	0.422
e ₄	0.186	0.992	0.979	1.000	0.181	0.941	0.549	0.903		0.135	0.038	0.426	0.941	0.513	0.612	0.495	0.280
e ₅	0.953	0.143	0.230	0.181	1.000	0.139	0.036	0.205	... ...	0.876	0.958	0.059	0.229	0.080	0.179	0.105	0.065
e ₆	0.174	0.930	0.898	0.941	0.139	1.000	0.435	0.890		0.045	0.008	0.238	0.936	0.339	0.508	0.414	0.097
e ₇	0.106	0.588	0.676	0.549	0.036	0.435	1.000	0.582		0.217	0.084	0.855	0.571	0.938	0.944	0.876	0.791
e ₈	0.285	0.909	0.896	0.903	0.205	0.890	0.582	1.000		0.300	0.092	0.490	0.979	0.489	0.617	0.431	0.204
				... ...									... ...
e ₂₃	0.906	0.110	0.228	0.135	0.876	0.045	0.217	0.300		1.000	0.888	0.311	0.272	0.126	0.056	0.009	0.281
e ₂₄	0.906	0.012	0.084	0.038	0.958	0.008	0.084	0.092		0.888	1.000	0.044	0.102	0.106	0.225	0.111	0.135
e ₂₅	0.100	0.494	0.565	0.426	0.059	0.238	0.855	0.490		0.311	0.044	1.000	0.466	0.932	0.858	0.819	0.774
e ₂₆	0.273	0.945	0.941	0.941	0.229	0.936	0.571	0.979		0.272	0.102	0.466	1.000	0.474	0.638	0.466	0.249
e ₂₇	0.002	0.563	0.630	0.513	0.080	0.339	0.938	0.489	... ...	0.126	0.106	0.932	0.474	1.000	0.913	0.921	0.817
e ₂₈	0.101	0.661	0.722	0.612	0.179	0.508	0.944	0.617		0.056	0.225	0.858	0.638	0.913	1.000	0.898	0.778
e ₂₉	0.053	0.537	0.605	0.495	0.105	0.414	0.876	0.431		0.009	0.111	0.819	0.466	0.921	0.898	1.000	0.822
e ₃₀	0.074	0.289	0.422	0.280	0.065	0.097	0.791	0.204		0.281	0.135	0.774	0.249	0.817	0.778	0.822	1.000
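Equation (18) is the absolute value of the sample correlation coefficient between two experts' rows. A minimal sketch, using the e₂ and e₄ rows of Table 3 as input, is given below; the function name `similarity` is illustrative. Since Table 3 is rounded to one decimal place, values recomputed this way can differ slightly from those printed in Table 9.

```python
import math


def similarity(a, b):
    """Absolute correlation between two experts' rows, as in Equation (18)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = abs(sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den


# Rows of e2 and e4 taken from Table 3.
e2 = [4.3, 3.3, 4.5, 3.5, 3.3, 3.5, 1.3, 2.3, 2.5]
e4 = [4.5, 3.3, 4.5, 3.3, 3.3, 3.5, 1.3, 2.3, 2.5]
r_24 = similarity(e2, e4)
```

The coefficient is symmetric and equals 1 on the diagonal, which matches the structure of Table 9.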

Step 2. We constructed the maximal tree. If each expert is regarded as a vertex and all the vertices are connected, we obtain the fuzzy weighted graph (see related concepts in Yu and Fan [38]). The similarity coefficient r′_ki is the weight of the corresponding side. For any circuit selected from the fuzzy weighted graph, we removed the side with the smallest weight. Repeating this rule until no circuits remained in the fuzzy weighted graph, we obtained the maximal tree, which is shown in Fig. 5.

Fig. 5

The results of the maximal tree I.

Step 3. We obtained the clustering results. The maximal tree is cut based on the cutting level λ: when λ ≥ r′_ki (λ ∈ [0, 1]), the side corresponding to r′_ki is removed. The objects that remain interconnected form a group, which yields one clustering result. Since all the experts were expected to be divided into three groups, we assumed 0.751 ≤ λ < 0.907 and obtained the clustering results by means of the maximal tree clustering method (see Table 10).

Table 10

The clustering result for experts using the maximal tree clustering method

Group	Sample
Group 1	e₁, e₅, e₁₃, e₁₄, e₁₆, e₂₀, e₂₁, e₂₂, e₂₃, e₂₄
Group 2	e₂, e₃, e₄, e₆, e₈, e₉, e₁₁, e₁₈, e₁₉, e₂₆
Group 3	e₇, e₁₀, e₁₂, e₁₅, e₁₇, e₂₅, e₂₇, e₂₈, e₂₉, e₃₀

Step 4. End.
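Steps 2 and 3 can be sketched as follows. Instead of deleting the lightest side of each circuit, the sketch builds the same kind of tree with a Kruskal-style procedure (adding sides from heaviest to lightest and skipping any that would close a circuit), which is a substitution made here for brevity. The function names and the five-expert similarity matrix are hypothetical.

```python
def maximal_tree(sim):
    """Return the sides (weight, i, j) of a maximum spanning tree of sim."""
    n = len(sim)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    sides = sorted(((sim[i][j], i, j) for i in range(n)
                    for j in range(i + 1, n)), reverse=True)
    tree = []
    for w, i, j in sides:
        ri, rj = find(i), find(j)
        if ri != rj:          # adding this side creates no circuit
            parent[ri] = rj
            tree.append((w, i, j))
    return tree


def cut_tree(tree, n, lam):
    """Remove sides with weight <= lam and return the connected groups."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for w, i, j in tree:
        if w > lam:
            parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


# Hypothetical symmetric similarity matrix for five experts (indices 0..4).
sim = [[1.00, 0.75, 0.50, 0.84, 0.40],
       [0.75, 1.00, 0.62, 0.70, 0.30],
       [0.50, 0.62, 1.00, 0.50, 0.20],
       [0.84, 0.70, 0.50, 1.00, 0.35],
       [0.40, 0.30, 0.20, 0.35, 1.00]]
tree = maximal_tree(sim)
groups_low = cut_tree(tree, 5, lam=0.70)
groups_high = cut_tree(tree, 5, lam=0.80)
```

Raising the cutting level removes more sides and therefore splits the experts into more groups, so the number of clusters depends directly on the chosen λ.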

Comparing Tables 8 and 10, it could be seen that the clustering results of the two methods are completely different. The IV2TL representation model is a relatively new concept, and research into GDM problems in this environment is still at an early stage. Meanwhile, there are currently almost no methods directly comparable to the clustering method proposed in this paper. As such, we could only make a comparison after the IV2TL decision information had been transformed into real numbers using the optimization model. We attribute the large differences in the clustering results to this. In the following, we give some advantages of the proposed model over the one based on the maximal tree clustering method.

Firstly, more data information could be processed by the method proposed in this paper. The principle of the maximal tree clustering method is that the similarity coefficient, a relative index, is the basis for clustering all the samples. From Equation (18), the similarity degree is calculated from the deviation between the value of each attribute and the mean of all the attributes in a sample. Thus, this approach operates on a comprehensive similarity index, and not directly on the original data set. In other words, some information in the original data set is neglected. Conversely, the proposed DEA-DA model with iterative clustering can explore the original data set fully, since it constructs several dividing lines (or hyperplanes) for the clustering from the original data of all the samples. It not only maintains the initial differences among all the attributes, but also performs iterative clustering at each phase, which makes the clustering results stable. This does not mean that the maximal tree clustering method is incorrect; rather, the point is that the DEA-DA model used in this paper stays closer to the basic data resource (the original decision matrix provided by the experts).

Secondly, the guidance for selecting the range of the threshold is not clear for the maximal tree clustering method. To provide a brief description, we take five samples as an example to show the results of the maximal tree (see Fig. 6). When 0.62 ≤ λ < 0.75, all the samples are divided into groups ({e₃}, {e₅}, {e₁,e₂,e₄}); when 0.75 ≤ λ < 0.84, all the samples are divided into groups ({e₃}, {e₅}, {e₂}, {e₁,e₄}). It shows that different thresholds lead to the different clustering results, but how to choose the threshold range for this method is not so straightforward. However, we can pre-divide the experts into several groups based on some criteria, and then obtain the clustering groups with the DEA-DA model more easily without determining the threshold value.

Fig. 6. The results of the maximal tree II.
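The threshold sensitivity described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the paper: the tree edges below (their endpoints and the 0.50 weight) are hypothetical values chosen solely so that the cuts reproduce the two partitions quoted in the text, and the rule "keep an edge only if its weight is strictly greater than λ" is an assumption inferred from the stated threshold ranges. The names `TREE_EDGES` and `cut_tree` are likewise illustrative.

```python
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[str, str, float]

NODES: List[str] = ["e1", "e2", "e3", "e4", "e5"]

# Hypothetical maximal-tree edges (NOT read from Fig. 6): only the weights
# 0.62, 0.75 and 0.84 appear in the text; endpoints and 0.50 are assumed.
TREE_EDGES: List[Edge] = [
    ("e1", "e4", 0.84),
    ("e1", "e2", 0.75),
    ("e2", "e5", 0.62),
    ("e4", "e3", 0.50),
]


def cut_tree(nodes: List[str], edges: List[Edge], lam: float) -> Set[FrozenSet[str]]:
    """Return the clusters left after removing every edge with weight <= lam."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, w in edges:
        if w > lam:  # assumed rule: an edge survives only if its weight exceeds lam
            parent[find(u)] = find(v)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return {frozenset(g) for g in groups.values()}


if __name__ == "__main__":
    for lam in (0.70, 0.80):
        clusters = sorted(sorted(c) for c in cut_tree(NODES, TREE_EDGES, lam))
        print(lam, clusters)
```

Under these assumptions, any λ in [0.62, 0.75) gives three groups and any λ in [0.75, 0.84) gives four, so the analyst has to justify the choice of λ separately from the data.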

5 Conclusions

To address the expert clustering issue in the LGDM problem under the interval-valued linguistic environment, we first assumed that some information about the experts is known, which serves as the basis for pre-classification. Then, combining outside knowledge about experts with practical preference information, this paper proposed a hybrid method comprising the optimization model and the DEA-DA model with iterative clustering to cluster experts in the LGDM problem. The feasibility of the clustering method was illustrated with the example of salary reform for professors at a university. In comparison with the maximal tree clustering method, the proposed method offers two major advantages for expert clustering in the LGDM problem: (a) more information from the original data can be utilized; (b) no threshold needs to be determined to obtain the clusters, since importing groups divided by outside knowledge makes expert clustering much easier and more reasonable in an application scenario.

The main contributions of the paper can be summarized as follows: (a) we propose a hybrid method that combines outside knowledge about experts with practical preference information, which can improve clustering efficiency and reassign distinct members of a group to a more reasonable group; (b) the IV2TL decision information is integrated into the DEA-DA model after being transformed by the optimization model, which extends the application area of the traditional DEA-DA model; (c) clustering experts in the LGDM problem lays a reasonable foundation for determining experts’ weights and aggregating information based on the final clusters.

The key point of the research lies in a reasonable pre-classification of the experts, which should be guided by theoretical bases or practical experience. Here, the term “reasonable” does not mean “unique”. On the contrary, many reasonable criteria may be used for the pre-classification, and different pre-classifications will lead to different clusters. Once we decide to pre-classify the experts into several groups according to some criterion (say, occupation), it means that we are interested in the opinion characteristics associated with occupation. Similarly, we could also be interested in the opinion features from the perspective of age or salary, or from multiple perspectives (which pre-classifies the experts into more groups). Future studies will focus on the relationship between the clustering results and the pattern of the pre-classified groups. Clustering experts under the interval-valued linguistic environment is the primary focus of this study. Based on the separated clusters, future studies can consider how to reach a consensus within each cluster and among different clusters in order to aggregate decision information in the LGDM problem.

Acknowledgments

We gratefully acknowledge the financial support of the National Natural Science Foundation of China (Grant No. 71572123 and No. 71722004).
