A sample survey based method on transforming linguistic terms into fuzzy sets and the application in MADM problems

Abstract

People often describe objects in natural language, using words and sentences, so it is useful to compute with words. However, the same word can mean different things to different people. Hence, linguistic information must be transformed into numerical form before it can be aggregated. To deal with this problem, we propose a novel transformation method based on a sample survey. First, we use a survey questionnaire to collect the numerical data corresponding to each word. Then, we preprocess the collected data to remove invalid or unreasonable points. Fuzzy sets are well suited to representing uncertainty, and triangular fuzzy sets, the simplest kind, can capture both the similarity and the dissimilarity with which different people interpret the same word. Therefore, based on the means and standard deviations of the remaining data, we encode the linguistic terms into triangular fuzzy sets and establish codebooks. The feasibility and effectiveness of the method are illustrated through a real-life MADM problem of online shopping recommendation.

Keywords

Sample survey; Fuzzy numbers; MADM; Online shopping recommendation

1 Introduction

Computing with words (CWW) is a methodology in which the objects of computation are words and propositions drawn from a natural language [1]. It is often necessary when the available information is too imprecise to justify the use of numbers, and when there is a tolerance for imprecision that can be exploited to achieve tractability, robustness, low solution cost and better rapport with reality [2]. Moreover, computing with words is likely to evolve into a basic methodology in its own right, with wide-ranging ramifications at both the basic and the applied levels [2].

In day-to-day activities, some problems present qualitative aspects that are difficult to assess with precise numerical values [3]. Transforming linguistic variables into numerical values that computers can process directly is therefore a further challenge. Fortunately, Zadeh [4] proposed the concept of fuzzy sets, which account for the uncertainty in evaluation and judgment, and the fuzzy linguistic approach [5] has provided very good results [3]. Since fuzzy sets can represent the complexity of natural language, words in the CWW paradigm may be modeled by fuzzy sets [4, 6]. In recent years, a body of related research has appeared: linguistic term sets [7–10] and fuzzy information aggregation methods [11, 12] have been studied, and computing with words in decision making has been investigated in depth [13–17]. For example, Liu et al. encoded words into interval type-2 fuzzy sets using an interval approach [18], and Wu et al. enhanced the interval approach for encoding words into interval type-2 fuzzy sets and analyzed its convergence [6].

However, in the literature above, linguistic variables are transformed into numerical values in a regular way or by a fixed formula. In fact, a sample survey shows that people associate numerical values with linguistic terms irregularly, i.e., the differences between adjacent words in the same linguistic term set may differ [19]. For instance, the difference between “very good” and “good” may be smaller than the difference between “good” and “common”. Therefore, we present a sample-survey-based method to encode words into fuzzy sets. Specifically, through a sample survey, we establish a codebook that maps each word in the linguistic term set S to a triangular fuzzy number, which is simple yet effective at capturing the uncertainty of words. During data collection, we found that when expressing a linguistic variable, most subjects prefer a single point to an interval. We therefore call the method the point approach (PA); it consists of two parts, the data part and the fuzzy set (FS) part. In the data part, point data corresponding to each linguistic term are collected from a group of subjects, and the data are then preprocessed in three steps. The first step, bad data processing, removes nonsensical data. The second step, reasonable point processing, discards data points that are semantically superior (inferior) to some other term yet numerically smaller (larger). The third step, tolerance limit processing, removes numbers that are extremely small or large relative to the others. The remaining data are then valid and reasonable, and summary statistics are computed for the surviving points. In the FS part, the parameters of the triangular fuzzy number are determined from these statistics, and the resulting triangular fuzzy number is mapped to the linguistic term.

To illustrate feasibility, we present a real multi-attribute decision making (MADM) problem in which both the weights and the assessments of the attributes are expressed as linguistic terms. Using the codebooks established above, we transform the linguistic information into numerical values. The alternatives are then ranked according to the rules of fuzzy sets, so that the decision maker can make the optimal choice.

The remainder of the paper is organized as follows. Section 2 reviews some basic concepts, including fuzzy sets, triangular fuzzy numbers and their operations, and linguistic term sets. In Section 3, we design a survey questionnaire and use it to collect the numerical data corresponding to each word. Section 4 details the steps for processing the collected data. In Section 5, the words are encoded into triangular fuzzy sets and the codebooks are established. Section 6 presents an application to online shopping to illustrate the applicability of the proposed method. Finally, Section 7 draws conclusions and discusses future research.

2 Preliminaries

In this section, we introduce some basic concepts, operation rules and methods to be used.

2.1 Fuzzy sets and triangular fuzzy numbers

Definition 1. [20] A fuzzy set $\tilde{A}$ on the universe X of real numbers is defined by a membership function $μ_{\tilde{A}} (x)$ that assigns each element x in X a value in the interval [0, 1]. The numeric value $μ_{\tilde{A}} (x)$ is the grade of membership of x in $\tilde{A}$.

Definition 2. [20] A triangular fuzzy number $\tilde{a}$ is defined by a triplet $\tilde{a} = (a^{L}, a^{M}, a^{U})$ with $a^{L} \leq a^{M} \leq a^{U}$, whose membership function is

$μ_{\tilde{a}} (x) = \begin{cases} (x - a^{L}) / (a^{M} - a^{L}), & a^{L} \leq x \leq a^{M}, \\ (a^{U} - x) / (a^{U} - a^{M}), & a^{M} \leq x \leq a^{U}, \\ 0, & x \in (- \infty, a^{L}) \cup (a^{U}, + \infty). \end{cases}$ (1)
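As a minimal illustration of Eq. (1), the Python sketch below evaluates the membership grade of a point in a triangular fuzzy number. The function name `triangular_membership` is our own, and treating a degenerate left side (a^L = a^M) as full membership at the vertex is an assumption that Definition 2 does not address.

```python
def triangular_membership(x: float, a_l: float, a_m: float, a_u: float) -> float:
    """Membership grade of x in the triangular fuzzy number (a_l, a_m, a_u), Eq. (1)."""
    if a_l <= x <= a_m:
        # Rising side; a vertical left edge (a_l == a_m) gives full membership.
        return 1.0 if a_m == a_l else (x - a_l) / (a_m - a_l)
    if a_m < x <= a_u:
        # Falling side (a_u > a_m is guaranteed here because x > a_m).
        return (a_u - x) / (a_u - a_m)
    return 0.0  # x lies outside the support [a_l, a_u]


# Example triangular fuzzy number (7.94, 8.94, 9.94): the vertex has grade 1.
print(triangular_membership(8.94, 7.94, 8.94, 9.94))  # 1.0
```

Halfway up the rising side, for instance at x = 8.44, the grade is 0.5, and any x beyond 9.94 has grade 0.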

2.2 The operation rules and comparison method of the triangular fuzzy numbers

2.2.1 The operation rules of the triangular fuzzy numbers

Suppose that $\tilde{a} = (a^{L}, a^{M}, a^{U})$ and $\tilde{b} = (b^{L}, b^{M}, b^{U})$ are two triangular fuzzy numbers. According to the extension principle of fuzzy sets, the sum of two triangular fuzzy numbers is still a triangular fuzzy number. However, the product and the quotient of two triangular fuzzy numbers are not triangular fuzzy numbers but only fuzzy numbers of triangular shape [21], and computing them exactly requires complicated calculations at every α-cut. For simplicity, in this paper we adopt the approximation formulas of [21]. The operation rules are as follows [20].

Addition rules: $\tilde{a} + \tilde{b} = (a^{L} + b^{L}, a^{M} + b^{M}, a^{U} + b^{U}),$

Subtraction rules: $\tilde{a} - \tilde{b} = (a^{L} - b^{U}, a^{M} - b^{M}, a^{U} - b^{L}),$

Multiplicative rules: $\tilde{a} \times \tilde{b} ≅ (a^{L} \times b^{L}, a^{M} \times b^{M}, a^{U} \times b^{U}),$

Inverse rules: ${(\tilde{a})}^{- 1} ≅ {(a^{L}, a^{M}, a^{U})}^{- 1} = (\frac{1}{a^{U}}, \frac{1}{a^{M}}, \frac{1}{a^{L}}),$

Division rules: $\tilde{a} \div \tilde{b} ≅ (a^{L} \div b^{U}, a^{M} \div b^{M}, a^{U} \div b^{L}) .$

In the formulas above, all the elements in $\tilde{a}$ and $\tilde{b}$ are positive.
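The rules above can be collected into a small value type. The following Python sketch is illustrative only: the class name `TFN` and its field names are our own, and, as in the text, multiplication and division are the approximate formulas, valid for positive components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TFN:
    """Triangular fuzzy number (a^L, a^M, a^U); components assumed positive."""
    l: float
    m: float
    u: float

    def __add__(self, other: "TFN") -> "TFN":
        return TFN(self.l + other.l, self.m + other.m, self.u + other.u)

    def __sub__(self, other: "TFN") -> "TFN":
        # The lower bound pairs with the other's upper bound, and vice versa.
        return TFN(self.l - other.u, self.m - other.m, self.u - other.l)

    def __mul__(self, other: "TFN") -> "TFN":
        # Approximation: the exact product is only triangular in shape.
        return TFN(self.l * other.l, self.m * other.m, self.u * other.u)

    def inverse(self) -> "TFN":
        return TFN(1 / self.u, 1 / self.m, 1 / self.l)

    def __truediv__(self, other: "TFN") -> "TFN":
        # Approximation: a / b = (a^L / b^U, a^M / b^M, a^U / b^L).
        return TFN(self.l / other.u, self.m / other.m, self.u / other.l)
```

For example, `TFN(1, 2, 3) * TFN(2, 3, 4)` gives (2, 6, 12), and `TFN(2, 3, 4) - TFN(1, 2, 3)` gives (-1, 1, 3).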

2.2.2 The comparison method of the triangular fuzzy numbers

Suppose that $\tilde{a} = (a^{L}, a^{M}, a^{U})$ and $\tilde{b} = (b^{L}, b^{M}, b^{U})$ are two triangular fuzzy numbers. The defuzzified value of $\tilde{a}$ is defined as: $m (\tilde{a}) = \frac{a^{L} + 2 a^{M} + a^{U}}{4}$ . Similarly, we have the defuzzified value $m (\tilde{b}) = \frac{b^{L} + 2 b^{M} + b^{U}}{4}$ .

The comparison method [20] is defined as: If $m (\tilde{a}) \geq m (\tilde{b})$ , then $\tilde{a} \geq \tilde{b}$ .
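A minimal Python sketch of the defuzzification and comparison above; the function names `defuzzify` and `fuzzy_geq` are our own. The example shows that two numbers with the same central point are separated by the weights on their lower and upper bounds.

```python
from typing import Tuple

TFN = Tuple[float, float, float]


def defuzzify(t: TFN) -> float:
    """Defuzzified value m(a) = (a^L + 2 a^M + a^U) / 4."""
    a_l, a_m, a_u = t
    return (a_l + 2 * a_m + a_u) / 4


def fuzzy_geq(a: TFN, b: TFN) -> bool:
    """Treat a >= b whenever m(a) >= m(b)."""
    return defuzzify(a) >= defuzzify(b)


# Same central point, but b's upper side is longer, so b ranks higher.
a, b = (1.0, 2.0, 3.0), (0.0, 2.0, 5.0)
print(defuzzify(a), defuzzify(b), fuzzy_geq(b, a))  # 2.0 2.25 True
```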

2.3 Linguistic term sets

Zadeh [5] presented the concept of linguistic variables, whose values are words or sentences in a natural language instead of numbers. The fuzzy linguistic approach represents qualitative aspects as linguistic values and has been successfully applied to decision making problems [22]. It is therefore very important to choose an appropriate way to describe linguistic term sets and their semantics.

On the one hand, the same word does not always signify the same thing to different people. On the other hand, there must be some similarity in how people understand the same word. That is, a word carries both similarity and dissimilarity across different people simultaneously. For instance, on a scale of 0–10, “good” may correspond to “7.5” for an optimist but to “8.2” for a pessimist. Nevertheless, the numerical values that different people map to the word “good” should cluster around a certain number (for example, “8”).

In the literature, linguistic variables are mapped into numerical values by formulas in a regular way [23, 24]. For example, Rodriguez represented a set of seven linguistic terms with an ordered structure approach, shown in Fig. 1: S = { s₀, s₁, s₂, s₃, s₄, s₅, s₆ } = {nothing, very low, low, medium, high, very high, perfect}.

Fig.1

The membership functions of the linguistic terms.

From Fig. 1, we can see that all the terms have the same degree of dispersion and that the membership functions change uniformly from the lowest term “s₀” to the highest term “s₆”, i.e., the distance between the central points of adjacent linguistic terms is identical. However, this does not conform to actual ratings in decision making. When people map linguistic terms to numerical values, the degree of dispersion may differ markedly from term to term, and the membership functions of the terms change irregularly.

Therefore, in order to transform linguistic terms into numerical values more accurately, we present a novel method based on sample survey.

3 Data collecting

To translate linguistic information into numerical values, we establish a codebook that maps words to fuzzy sets. First, we designed a survey questionnaire, printed 40 copies and distributed them to 40 randomly chosen subjects. The survey was conducted in a classroom and a dining hall at Southeast University, Nanjing, China. After the initial data were collected from the 40 subjects, the codebook S could be established.

3.1 Survey design and questionnaire

The objective of the survey is to collect the point data corresponding to each word in the linguistic term sets. The subjects of the survey are undergraduates at Southeast University. We designed the questionnaire shown in Appendix A and distributed it to 40 subjects.

Note that, to minimize the influence of the linguistic terms on one another, we list them in random order.

3.2 Data collecting

First, the scale of 0–10 is established and a vocabulary of words believed to cover the entire scale is created. The methodology for collecting point data from a group of subjects then consists of two steps: (1) randomize the words, and (2) survey a group of subjects to provide point data for the words on the scale.

Words need to be randomized so that subjects do not correlate their points from one word to the next. For each word in the application-dependent encoding vocabulary, a group of n subjects is asked the following question:

On a scale of 0–10, what is the number that you associate to the word ____ ?

Note that the number may be a decimal between two integers. For example, the number for “Very good” might be 8.9.

Thus, n data points a⁽ⁱ⁾ (i = 1, 2, …, n) are collected for each word through the survey questionnaire. They are then preprocessed as described in the next section.

4 Data processing

In this section, we describe in detail the data processing that removes unreliable points.

4.1 Initial data display and rearrangement

After the data are collected through the survey questionnaire, they must be tabulated for statistical analysis. We construct a table for each group.

In the following, for ease of presentation, we take the first group as an example; the data of the other two groups are processed similarly.

First, the data are rearranged from worst to best. For the first group, the linguistic terms are, in order: Extremely bad, Very bad, Bad, Common, Good, Very good, Extremely good. They are processed and analyzed in the following steps.

4.2 Data processing

Processing the n points a⁽ⁱ⁾ (i = 1, 2, …, n) consists of three stages, and details are provided for each of these stages in the following.

(1) Bad data processing. This stage removes nonsensical results (some subjects do not take the survey seriously and provide useless answers). Only data with 0 ≤ a⁽ⁱ⁾ ≤ 10 (i = 1, 2, …, n) are accepted; the others are rejected.

After bad data processing, there will be n′ ≤ n remaining data points.

(2) Reasonable point processing. This stage is performed on the remaining n′ points a⁽ⁱ⁾: for any two words s_j and s_k (j < k), only points satisfying $a_{j}^{(i)} \leq a_{k}^{(i)}$ are kept, and the others are rejected. For example, if a subject associates the number “9” with the word “good” but “8.2” with “very good”, this is clearly unreasonable, so both numbers are removed.

Reasonable point processing thus discards data points whose words are semantically superior to some others but whose numbers are smaller. There are then n″ ≤ n′ remaining points, shown in Appendix A, for which the following statistics are computed: m_a and s_a (the sample mean and standard deviation of the n″ remaining points).

(3) Tolerance limit processing. Since the point data were collected from more than 30 independently chosen subjects, we assume the data to be approximately normally distributed. Tolerance limit processing is then performed on the remaining n″ points a⁽ⁱ⁾, and only points satisfying a⁽ⁱ⁾ ∈ [m_a - ks_a, m_a + ks_a] are kept, where the tolerance factor k is chosen so that one can assert with 100(1 - γ)% confidence that the given limits contain at least the proportion 1 - α of the measurements [25]. For example, when data have been collected from 30 subjects, k = 2.549 means one can be 95% confident that the interval [m_a - ks_a, m_a + ks_a] contains at least 95% of the measurements. The factor k decreases as the number of remaining points increases (tending to about 1.96 for 95% coverage); since more than 30 points remain for every term, we adopt k = 2 in this paper for simplicity.

Finally, after the step of tolerance limit processing there will be m ≤ n″ remaining data points.

After these three stages, the surviving data are considered reliable and valid, so they can be used to encode words into numerical values.
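The three stages above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the per-subject dictionary layout and the helper names (`bad_data`, `reasonable_points`, `tolerance_limit`, `survivor_stats`) are hypothetical, stage 2 is read as comparing every pair of terms rather than only adjacent ones, and k = 2 as adopted in this paper.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional, Tuple

# First-group terms, ordered from worst to best (Section 4.1).
TERMS = ["Extremely bad", "Very bad", "Bad", "Common",
         "Good", "Very good", "Extremely good"]

# Hypothetical layout: one dict per subject, term -> answer (None = missing/rejected).
Answers = Dict[str, Optional[float]]


def bad_data(responses: List[Answers]) -> List[Answers]:
    """Stage 1: keep only answers with 0 <= a <= 10; mark the rest as None."""
    return [{t: (v if v is not None and 0 <= v <= 10 else None) for t, v in r.items()}
            for r in responses]


def reasonable_points(responses: List[Answers], terms: List[str] = TERMS) -> List[Answers]:
    """Stage 2: for any pair s_j, s_k (j < k) with a_j > a_k, drop both answers."""
    cleaned = []
    for r in responses:
        kept = dict(r)
        for j, t_j in enumerate(terms):
            for t_k in terms[j + 1:]:
                a_j, a_k = r.get(t_j), r.get(t_k)
                if a_j is not None and a_k is not None and a_j > a_k:
                    kept[t_j] = kept[t_k] = None
        cleaned.append(kept)
    return cleaned


def tolerance_limit(points: List[float], k: float = 2.0) -> List[float]:
    """Stage 3: keep points inside [m_a - k*s_a, m_a + k*s_a] (needs >= 2 points)."""
    m_a, s_a = mean(points), stdev(points)
    return [p for p in points if m_a - k * s_a <= p <= m_a + k * s_a]


def survivor_stats(points: List[float], k: float = 2.0) -> Tuple[int, float, float]:
    """Return (m, m_a, s_a) for the points that survive stage 3."""
    kept = tolerance_limit(points, k)
    return len(kept), mean(kept), stdev(kept)
```

For the example in stage (2), a subject answering 9 for “Good” and 8.2 for “Very good” has both answers set to None, while an unrelated answer such as 5 for “Common” is kept.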

5 Encoding words into triangular fuzzy sets—codebooks

In this section, we establish the codebooks, which map linguistic term sets to triangular fuzzy sets.

5.1 Computing the statistics

The surviving m data points are analyzed term by term. For each word, we compute the statistics m_a and s_a (the sample mean and standard deviation of the m remaining points), which are shown in Table 1. For example, for the term “Very good”, we collected 39 numbers, which are shown in Appendix A. Some of these points may be unreasonable. For instance, subject 7 associates “10” with “Good” but “8” with “Very good”, which is clearly unreasonable, so both numbers are removed. For the remaining 35 numbers, the mean is $m_{a}^{'} = 8.9$ and the standard deviation is $s_{a}^{'} = 0.55$, from which we construct the tolerance interval [7.8, 10]. One number, “7.5”, lies outside this interval and is discarded. Finally, 34 numbers remain, with mean m_a = 8.94 and standard deviation s_a = 0.5. The detailed process and results are shown in Table 1.

Table 1
Remaining data points of the first group and their statistics (data part: processing)

Terms	n′ (Step 1)	n″ (Step 2)	Mean $m_{a}^{'}$	Std $s_{a}^{'}$	Tolerance interval	m (Step 3)	Mean $m_{a}$	Std $s_{a}$
Extremely bad	38	33	0.94	1.22	[0, 3.38]	32	0.78	0.82
Very bad	39	33	2.15	1	[0.15, 4.15]	31	2.28	0.86
Bad	39	38	3.37	1.27	[0.83, 5.91]	36	3.56	1.01
Common	39	39	5.55	0.82	[3.91, 7.19]	38	5.49	0.72
Good	40	36	7.84	0.93	[5.98, 9.7]	35	7.78	0.86
Very good	39	35	8.9	0.55	[7.8, 10]	34	8.94	0.5
Extremely good	39	39	9.76	0.44	[8.88, 10]	38	9.8	0.33

Intuitively, the mean denotes the central point of the numbers corresponding to a term, while the standard deviation denotes their dispersion. Therefore, the mean and the standard deviation capture, respectively, the similarity and the dissimilarity with which different people interpret the same term.

5.2 Encoding words into fuzzy sets—codebooks

For each word, based on these statistics, the m surviving points a⁽ⁱ⁾ (i = 1, 2, ⋯ , m) are mapped to the parameters of a triangular fuzzy number $\tilde{a} = (a^{L}, a^{M}, a^{U})$ using the following formulas [25]:

$a^{L} = \max \{0, m_{a} - 2 s_{a}\}$ (2)

$a^{M} = m_{a}$ (3)

$a^{U} = \min \{10, m_{a} + 2 s_{a}\}$ (4)

where the maximum in formula (2) guarantees that a^L is nonnegative and, similarly, the minimum in formula (4) guarantees that a^U does not exceed 10.

For the term “Very good”, a^L = 8.94 - 2 × 0.5 = 7.94, a^M = 8.94 and a^U = 8.94 + 2 × 0.5 = 9.94. Thus, “Very good” is transformed into the triangular fuzzy number (7.94, 8.94, 9.94). The other terms are transformed similarly, which yields the first codebook S¹ shown in Table 2. The membership functions of the linguistic terms in codebook S¹ are illustrated in Fig. 2.
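Formulas (2)–(4) can be applied to every row of Table 1 at once. The Python sketch below is illustrative (the names `TABLE1_STATS` and `encode` are our own); it rounds to two decimals, as the tables do. For “Good”, for instance, it gives a^L = 7.78 - 2 × 0.86 = 6.06.

```python
from typing import Dict, Tuple

# Surviving-point statistics (m_a, s_a) of the first group, from Table 1.
TABLE1_STATS: Dict[str, Tuple[float, float]] = {
    "Extremely bad": (0.78, 0.82), "Very bad": (2.28, 0.86), "Bad": (3.56, 1.01),
    "Common": (5.49, 0.72), "Good": (7.78, 0.86), "Very good": (8.94, 0.5),
    "Extremely good": (9.8, 0.33),
}


def encode(m_a: float, s_a: float, lo: float = 0.0, hi: float = 10.0) -> Tuple[float, float, float]:
    """Eqs. (2)-(4): clip m_a -/+ 2 s_a to the 0-10 scale, rounded to 2 decimals."""
    a_l = max(lo, m_a - 2 * s_a)
    a_u = min(hi, m_a + 2 * s_a)
    return round(a_l, 2), round(m_a, 2), round(a_u, 2)


codebook_s1 = {term: encode(*stats) for term, stats in TABLE1_STATS.items()}
for term, tfn in codebook_s1.items():
    print(f"{term:15s} {tfn}")
```

The clipping only takes effect for the two extreme terms, which is why “Extremely bad” and “Extremely good” are the only asymmetric entries.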

Table 2

The first codebook S¹

Terms	$a^{L}$	$a^{M}$	$a^{U}$
Extremely bad	0	0.78	2.42
Very bad	0.56	2.28	4
Bad	1.54	3.56	5.58
Common	4.05	5.49	6.93
Good	6.06	7.78	9.5
Very good	7.94	8.94	9.94
Extremely good	9.14	9.8	10

Fig.2

The membership functions of the linguistic terms in codebook S¹.

In Fig. 2, the horizontal axis x denotes the numerical value on the 0–10 scale corresponding to each term in codebook S¹, and the vertical axis denotes the membership degree of the linguistic terms. The numbers marked on the x-axis are the central points of the linguistic terms.

Similarly, we obtain the codebooks S² and S³, which are shown in Tables 3 and 4, respectively.

Table 3

The second codebook S²

Terms	$a^{L}$	$a^{M}$	$a^{U}$
Extremely low	0	0.82	2.42
Very low	0.44	1.98	3.52
Low	1.83	3.15	4.47
Somewhat low	2.78	4.2	5.62
Moderate	4.33	5.25	6.17
Somewhat high	5.74	7.38	9.02
High	6.4	7.98	9.56
Very high	7.92	8.94	9.96
Extremely high	9.24	9.84	10

Table 4

The third codebook S³

Terms	$a^{L}$	$a^{M}$	$a^{U}$
Very unimportant	0	1.47	3.77
Unimportant	0.74	3.02	5.3
Moderate important	4.1	5.56	7.02
Important	6.84	8.06	9.28
Very important	8.35	9.41	10

The membership functions of the terms in codebooks S² and S³ are illustrated in Figs. 3 and 4, respectively.

Fig.3

The membership functions of the linguistic terms in codebook S².

Fig.4

The membership functions of the linguistic terms in codebook S³.

5.3 The characteristics and novelty of the transforming method

In traditional methods, linguistic variables are mapped into numerical values by formulas in a regular way [23, 24]. In this paper, by contrast, we use a sample survey to establish codebooks that map the words in a linguistic term set to triangular fuzzy numbers. The transformation process consists of two parts, the data part and the fuzzy set (FS) part. In the data part, the data points corresponding to the linguistic terms are collected from a group of subjects and then preprocessed to ensure that the remaining data are valid and reasonable. In the FS part, the means and standard deviations of the surviving points are computed, the parameters of the triangular fuzzy numbers are determined from these statistics, and the resulting triangular fuzzy numbers are mapped to the linguistic terms. The proposed method has the following characteristics.

(1) The membership functions of the terms in the codebooks change irregularly, i.e., the distance between the central points of adjacent linguistic terms varies widely.

(2) The degrees of dispersion of the terms differ markedly. The degree of dispersion reflects the uncertainty of a term; intuitively, different terms carry different uncertainty, even for the same person.

(3) Except for the first and last terms of each linguistic term set, the membership functions of the terms in the codebooks are symmetric. All parameters lie between 0 and 10.

These characteristics can be illustrated with codebook S¹. For instance, the numerical values for the term “Common” concentrate around “5.49”; similarly, “Extremely bad” centers on “0.78”, “Very bad” on “2.28”, “Bad” on “3.56”, “Good” on “7.78”, “Very good” on “8.94” and “Extremely good” on “9.8”. Clearly, the gaps between the central points of adjacent terms vary. Specifically, the first three terms, “Extremely bad”, “Very bad” and “Bad”, are relatively close to one another, as are the last three terms, “Good”, “Very good” and “Extremely good”, whereas “Common” is relatively far from both of its neighbors. Intuitively, this is reasonable. The first three terms all belong to the broad category “Bad” and differ only in degree, which can be seen as a quantitative change; similarly, the last three terms belong to the broad category “Good”. By contrast, moving from “Bad” to “Common” or from “Common” to “Good” is a qualitative change, so the gap is much larger.
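The gap argument can be checked directly from the central points a^M of codebook S¹. This short Python sketch (variable names are our own) lists the distance between each pair of adjacent terms and reports the largest one.

```python
# Central points a^M of codebook S1 (Table 2), ordered from worst to best.
centers = {"Extremely bad": 0.78, "Very bad": 2.28, "Bad": 3.56, "Common": 5.49,
           "Good": 7.78, "Very good": 8.94, "Extremely good": 9.8}

terms = list(centers)  # dicts preserve insertion order (Python 3.7+)
gaps = {(lo, hi): round(centers[hi] - centers[lo], 2) for lo, hi in zip(terms, terms[1:])}

for (lo, hi), gap in gaps.items():
    print(f"{lo} -> {hi}: {gap}")
print("largest gap:", max(gaps, key=gaps.get))
```

The two gaps adjacent to “Common” (1.93 and 2.29) are the largest, while the remaining gaps lie between 0.86 and 1.5, consistent with the quantitative versus qualitative reading above.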

Therefore, the proposed method appears to be more consistent with the real world.

6 An application in E-commerce recommendation

Online shopping has become increasingly popular in China, on websites such as www.amazon.com, www.jd.com and www.taobao.com. On November 11, 2014, known in China as “Singles’ Day” and celebrated as an online shopping spree, e-commerce companies achieved huge sales. In particular, Alibaba Group Holding Ltd, China’s largest e-commerce company, broke a record when sales exceeded 10 billion Yuan ($1.6 billion) within 40 minutes of the start of its “11/11” shopping spree, and it reached a one-day sales record of 57.1 billion Yuan ($9.3 billion). To capture overseas markets, shopping websites also provide overseas shopping services, which attract much attention, especially from white-collar employees; www.amazon.com has clear advantages in overseas shopping owing to its rich experience and reliable logistics. In the United States, overall Black Friday sales in 2014 declined about 11 percent compared with 2013, according to the National Retail Federation (NRF), whereas online sales were up 14.3 percent, according to IBM Digital Analytics Benchmark.

Now, a sports enthusiast among the undergraduates plans to purchase a pair of sneakers on www.amazon.com. After a preliminary screening, four alternatives remain: a₁ (Adidas: isolation low G66010), a₂ (Nike: prime hype DF winterized 684892), a₃ (Puma: trinomic XS 850 plus 35614306) and a₄ (Mizuno: wave inspire 10 J1GC144402). He considers three attributes: c₁ (price), c₂ (design) and c₃ (evaluation), where price includes the commodity price itself and the delivery fees, design includes color and trade dress, and evaluation includes assessments of product quality, performance and service. He prefers to rate the attributes and their importance in natural language rather than with numerical values. His assessments of the alternatives with respect to each attribute are shown in Table 5, which can be regarded as a linguistic decision making matrix (denoted by $\tilde{A}$). For example, he considers alternative a₁ (Adidas) “very low” in price, “common” in design and “bad” in evaluation; the other alternatives are rated similarly. Meanwhile, he considers the weights of c₁ (price), c₂ (design) and c₃ (evaluation) to be w₁ (very important), w₂ (unimportant) and w₃ (important), respectively.

Table 5
Linguistic decision making matrix $\tilde{A}$

Alternative a_i	c₁ (price)	c₂ (design)	c₃ (evaluation)
Weight	Very important	Unimportant	Important
a₁	Very low	Common	Bad
a₂	Moderate	Good	Good
a₃	High	Very good	Common
a₄	Very high	Very good	Very good

Which alternative is the optimal choice? This can be regarded as an MADM problem [26] with linguistic evaluations. We help him choose the most suitable alternative through the following decision making steps.

Step 1. Transforming the linguistic decision matrix to triangular fuzzy matrix

For convenience, we denote the linguistic decision making matrix by $\tilde{A} = (\tilde{x}_{ij})_{4 \times 3}$, where $\tilde{x}_{ij}$ is a linguistic term from codebook S¹ or S². We then replace each $\tilde{x}_{ij}$ with its corresponding triangular fuzzy number to obtain the fuzzy matrix, also denoted by $\tilde{A}$.

For example, the element ${\tilde{x}}_{31}$ is “high”, which is transformed into the triangular fuzzy number ${\tilde{x}}_{31} = (6.4, 7.98, 9.56)$ using codebook S². The other elements are obtained similarly; the fuzzy matrix is shown in Table 6. The weights w₁ (very important), w₂ (unimportant) and w₃ (important) are also transformed into fuzzy numbers using codebook S³: w₁ = (8.35, 9.41, 10), w₂ = (0.74, 3.02, 5.3) and w₃ = (6.84, 8.06, 9.28).

Table 6

Fuzzy decision making matrix $\tilde{A}$

Alternative a _i	c₁ (price)	c₂ (design)	c₃ (evaluation)
a ₁	(0.44, 1.98, 3.52)	(4.05, 5.49, 6.93)	(1.54, 3.56, 5.58)
a ₂	(4.33, 5.25, 6.17)	(6.06, 7.78, 9.5)	(6.06, 7.78, 9.5)
a ₃	(6.4, 7.98, 9.56)	(7.94, 8.94, 9.94)	(4.05, 5.49, 6.93)
a ₄	(7.92, 8.94, 9.96)	(7.94, 8.94, 9.94)	(7.94, 8.94, 9.94)
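Step 1 amounts to a dictionary lookup. The Python sketch below rebuilds Table 6 and the fuzzy weights from excerpts of the codebooks (only the terms the example needs); the variable names are our own, and the assignment of S² to price, S¹ to design and evaluation, and S³ to the weights follows the text. The “Good” entry is (6.06, 7.78, 9.5), as used in Table 6.

```python
from typing import Dict, Tuple

TFN = Tuple[float, float, float]

# Excerpts of the codebooks (only the terms used in Table 5).
S1: Dict[str, TFN] = {"Bad": (1.54, 3.56, 5.58), "Common": (4.05, 5.49, 6.93),
                      "Good": (6.06, 7.78, 9.5), "Very good": (7.94, 8.94, 9.94)}
S2: Dict[str, TFN] = {"Very low": (0.44, 1.98, 3.52), "Moderate": (4.33, 5.25, 6.17),
                      "High": (6.4, 7.98, 9.56), "Very high": (7.92, 8.94, 9.96)}
S3: Dict[str, TFN] = {"Unimportant": (0.74, 3.02, 5.3), "Important": (6.84, 8.06, 9.28),
                      "Very important": (8.35, 9.41, 10.0)}

# Table 5: price is rated with S2 terms, design and evaluation with S1 terms.
ATTRIBUTE_CODEBOOKS = (S2, S1, S1)
LINGUISTIC_MATRIX = {
    "a1": ("Very low", "Common", "Bad"),
    "a2": ("Moderate", "Good", "Good"),
    "a3": ("High", "Very good", "Common"),
    "a4": ("Very high", "Very good", "Very good"),
}
WEIGHT_TERMS = ("Very important", "Unimportant", "Important")

# Replace every term by its triangular fuzzy number (Table 6) and encode the weights.
fuzzy_matrix = {alt: tuple(book[term] for book, term in zip(ATTRIBUTE_CODEBOOKS, row))
                for alt, row in LINGUISTIC_MATRIX.items()}
weights = tuple(S3[term] for term in WEIGHT_TERMS)
print(fuzzy_matrix["a3"], weights)
```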

Step 2. Standardizing the fuzzy matrix

The attribute c₁ (price) is a cost attribute: the higher the price, the worse the alternative. Its fuzzy numbers are therefore standardized to establish the normalized fuzzy matrix $\tilde{V} = (\tilde{v}_{ij})_{4 \times 3}$. For example, ${\tilde{x}}_{31}$ is normalized as

$\tilde{v}_{31} = (10, 10, 10) - \tilde{x}_{31} = (10, 10, 10) - (6.4, 7.98, 9.56) = (0.44, 2.02, 3.6).$

Similarly, we obtain $\tilde{v}_{11} = (6.48, 8.02, 9.56)$, $\tilde{v}_{21} = (3.83, 4.75, 5.67)$ and $\tilde{v}_{41} = (0.04, 1.06, 2.08)$.

By contrast, the attributes c₂ (design) and c₃ (evaluation) are benefit attributes, so their normalized values are the same as the fuzzy numbers, i.e., ${\tilde{v}}_{ij} = {\tilde{x}}_{ij}$ (i = 1, 2, 3, 4; j = 2, 3). The normalized fuzzy matrix $\tilde{V}$ is shown in Table 7.

Table 7

Normalized decision making matrix $\tilde{V}$

Alternative a _i	c₁ (price)	c₂ (design)	c₃ (evaluation)
a ₁	(6.48, 8.02, 9.56)	(4.05, 5.49, 6.93)	(1.54, 3.56, 5.58)
a ₂	(3.83, 4.75, 5.67)	(6.06, 7.78, 9.5)	(6.06, 7.78, 9.5)
a ₃	(0.44, 2.02, 3.6)	(7.94, 8.94, 9.94)	(4.05, 5.49, 6.93)
a ₄	(0.04, 1.06, 2.08)	(7.94, 8.94, 9.94)	(7.94, 8.94, 9.94)
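The standardization of the cost attribute can be sketched in Python as below (the function name `complement` is our own). It applies the subtraction rule of Section 2.2.1 to (10, 10, 10) - x̃, which reverses the order of the bounds, and rounds to two decimals as in the tables.

```python
from typing import Dict, Tuple

TFN = Tuple[float, float, float]


def complement(x: TFN, top: float = 10.0) -> TFN:
    """(top, top, top) - x via the subtraction rule: (top - x^U, top - x^M, top - x^L)."""
    x_l, x_m, x_u = x
    # Round to two decimals to match the precision used in the tables.
    return round(top - x_u, 2), round(top - x_m, 2), round(top - x_l, 2)


# Column c1 (price) of the fuzzy matrix in Table 6; price is a cost attribute.
price: Dict[str, TFN] = {"a1": (0.44, 1.98, 3.52), "a2": (4.33, 5.25, 6.17),
                         "a3": (6.4, 7.98, 9.56), "a4": (7.92, 8.94, 9.96)}
normalized_price = {alt: complement(x) for alt, x in price.items()}
print(normalized_price)
```

The benefit attributes c₂ and c₃ need no transformation, so only the price column is processed.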

Step 3. Computing the weighted value of the alternatives

The weighted value of alternative a_i is computed with the formula

$\tilde{v}_{i} = (v_{i}^{L}, v_{i}^{M}, v_{i}^{U}) = \frac{\sum_{j = 1}^{3} (w_{j} \times \tilde{v}_{ij})}{\sum_{j = 1}^{3} w_{j}}, \quad i = 1, 2, 3, 4.$

For example,

$\tilde{v}_{2} = (v_{2}^{L}, v_{2}^{M}, v_{2}^{U}) = \frac{\text{very important} \times \text{moderate} + \text{unimportant} \times \text{good} + \text{important} \times \text{good}}{\text{very important} + \text{unimportant} + \text{important}} = \frac{(8.35, 9.41, 10) \times (3.83, 4.75, 5.67) + (0.74, 3.02, 5.3) \times (6.06, 7.78, 9.5) + (6.84, 8.06, 9.28) \times (6.06, 7.78, 9.5)}{(8.35, 9.41, 10) + (0.74, 3.02, 5.3) + (6.84, 8.06, 9.28)} ≅ (4.89, 6.39, 7.94).$

The weighted values of alternatives a₁, a₃ and a₄ are obtained similarly and are shown in Table 8.
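The aggregation in Step 3 can be checked with the short Python sketch below (names are our own; the inputs are the normalized matrix, with the a₂ design entry taken as (6.06, 7.78, 9.5) as in Table 6, and the S³ weights). One detail is an assumption inferred from the numbers rather than stated in the text: Table 8 is reproduced when the numerator is divided by the denominator component-wise (L by L, M by M, U by U), not with the division rule of Section 2.2.1, which pairs the numerator's lower bound with the denominator's upper bound and would give roughly (3.17, 6.39, 12.25) for a₂.

```python
from typing import Dict, Sequence, Tuple

TFN = Tuple[float, float, float]

# Normalized matrix, columns c1 (price), c2 (design), c3 (evaluation).
V: Dict[str, Tuple[TFN, TFN, TFN]] = {
    "a1": ((6.48, 8.02, 9.56), (4.05, 5.49, 6.93), (1.54, 3.56, 5.58)),
    "a2": ((3.83, 4.75, 5.67), (6.06, 7.78, 9.5), (6.06, 7.78, 9.5)),
    "a3": ((0.44, 2.02, 3.6), (7.94, 8.94, 9.94), (4.05, 5.49, 6.93)),
    "a4": ((0.04, 1.06, 2.08), (7.94, 8.94, 9.94), (7.94, 8.94, 9.94)),
}
# Weights from codebook S3: very important, unimportant, important.
W: Tuple[TFN, TFN, TFN] = ((8.35, 9.41, 10.0), (0.74, 3.02, 5.3), (6.84, 8.06, 9.28))


def weighted_value(row: Sequence[TFN], weights: Sequence[TFN]) -> TFN:
    """sum_j(w_j x v_ij) / sum_j(w_j), with the approximate product (L*L, M*M, U*U)."""
    numerator = [sum(w[c] * v[c] for w, v in zip(weights, row)) for c in range(3)]
    denominator = [sum(w[c] for w in weights) for c in range(3)]
    # Assumption: divide L by L, M by M, U by U; this reproduces Table 8.
    l, m, u = (round(n / d, 2) for n, d in zip(numerator, denominator))
    return l, m, u


for alt, row in V.items():
    print(alt, weighted_value(row, W))
```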

Table 8

The decision result of the alternatives

Alternative a_i	Weighted value $\tilde{v}_{i}$	Defuzzified value $m (\tilde{v}_{i})$	Rank
a ₁	(4.25, 5.89, 7.49)	5.88	2
a ₂	(4.89, 6.39, 7.94)	6.4	1
a ₃	(2.34, 4.4, 6.22)	4.34	4
a ₄	(3.8, 5.32, 6.74)	5.3	3

Step 4. Computing and ranking the alternatives

Firstly, the defuzzified value of alternative a_i is computed with the formula: $m ({\tilde{v}}_{i}) = \frac{v_{i}^{L} + 2 v_{i}^{M} + v_{i}^{U}}{4}$

For example, $m (\tilde{v}_{1}) = \frac{v_{1}^{L} + 2 v_{1}^{M} + v_{1}^{U}}{4} = \frac{4.25 + 2 \times 5.89 + 7.49}{4} = 5.88.$

The defuzzified values of other alternatives can be acquired similarly, which are shown in Table 8.

The decision result is: a₂ ≻ a₁ ≻ a₄ ≻ a₃. So the alternative a₂ (Nike: prime hype DF winterized 684892) is the optimal choice.
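Step 4 can be reproduced from the weighted values listed in Table 8 with a few lines of Python (names are our own). The sketch defuzzifies each alternative and sorts them from best to worst.

```python
from typing import Dict, Tuple

# Weighted values of the alternatives, as listed in Table 8.
weighted: Dict[str, Tuple[float, float, float]] = {
    "a1": (4.25, 5.89, 7.49), "a2": (4.89, 6.39, 7.94),
    "a3": (2.34, 4.4, 6.22), "a4": (3.8, 5.32, 6.74),
}


def defuzzify(t: Tuple[float, float, float]) -> float:
    """m(v) = (v^L + 2 v^M + v^U) / 4."""
    l, m, u = t
    return (l + 2 * m + u) / 4


scores = {alt: defuzzify(t) for alt, t in weighted.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(" > ".join(ranking))  # best first
```

Starting from the rounded Table 8 entries, a₄ scores 5.295, which Table 8 reports as 5.3; the ranking a₂ ≻ a₁ ≻ a₄ ≻ a₃ is unaffected.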

In this problem, all the attributes (c₁ (price), c₂ (design), c₃ (evaluation)) are rated in linguistic terms, which are transformed into fuzzy numbers using the codebooks established through the sample survey. The linguistic information is then computed and aggregated according to the rules of fuzzy sets, and the alternatives are ranked by their defuzzified values.

Because the proposed method transforms linguistic terms into fuzzy numbers through an actual survey, the decision making process appears to be more consistent with real behavior.

Note that the sample survey was conducted on campus and the subjects were undergraduates, so the resulting codebooks are best suited to students. Since the sports enthusiast in this example is a student, the results are reasonable.

7 Conclusions

In this paper, we have proposed a sample-survey-based method to transform words into numerical values and established codebooks that map linguistic terms to fuzzy sets. Based on the rules of fuzzy sets, natural language information can then be computed and aggregated. The applicability of the method is illustrated through a real application in online shopping.

The novelties and characteristics of this paper are as follows.

The weights and the assessments of the attributes are all represented with linguistic variables. In some cases, it is difficult to assess the attributes as well as the weights, thus, we can adopt natural languages, which are convenient and friendly to the decision makers. Even for the unfamiliar domain, people can provide their own assessments.

The linguistic terms are transformed into numerical values based on a sample survey rather than a fixed formula. In real evaluations, the words in a linguistic term set do not progress evenly from worst to best, i.e., the differences between adjacent words may vary, and so may the spread of the numerical values assigned to each word. The proposed transformation method is therefore more consistent with real-world decision making.
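To make the uneven spacing concrete, the short sketch below computes the gaps between adjacent Step 3 means of the first group of terms and the range of their standard deviations, using the values from the data processing table (the variable names are ours).

```python
# Step 3 means and standard deviations of the first group of terms (data processing table)
terms = ["Extremely bad", "Very bad", "Bad", "Common", "Good", "Very good", "Extremely good"]
means = [0.78, 2.28, 3.56, 5.49, 7.78, 8.94, 9.80]
stds = [0.82, 0.86, 1.01, 0.72, 0.86, 0.50, 0.33]

# Distance between each pair of adjacent terms, rounded to two decimals
gaps = [round(b - a, 2) for a, b in zip(means, means[1:])]
for (lower_term, upper_term), gap in zip(zip(terms, terms[1:]), gaps):
    print(f"{lower_term} -> {upper_term}: {gap}")
print("standard deviations range from", min(stds), "to", max(stds))
```

The gap between "Common" and "Good" is more than twice the gap between "Very good" and "Extremely good", which is exactly the kind of irregularity a fixed, evenly spaced formula cannot capture.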

In future research, we will continue to work on linguistic decision making problems and apply the developed method to other domains, such as industrial structure evaluation [27], recommendation systems and behavioral decision making.


Appendix A

Survey questionnaire

Dear friends:

Good morning!

Thank you for your help! We are conducting this survey in order to transform verbal information into numerical values. Please answer the following questions.

Please note: ➀ The larger the number, the better (or higher) the meaning; ➁ Decimal numbers are allowed.

The first group:

On a scale of 0–10, what number do you think corresponds to the word good?

Answer: ___.

Similarly, bad corresponds to ___; common corresponds to ___; very good corresponds to ___; extremely bad corresponds to ___; extremely good corresponds to ___; very bad corresponds to ___.

The second group:

Similarly again, very high corresponds to ___; low corresponds to ___; somewhat high corresponds to ___; moderate corresponds to ___; somewhat low corresponds to ___; extremely high corresponds to ___; extremely low corresponds to ___; high corresponds to ___; very low corresponds to ___.

The third group:

Similarly again, important corresponds to ___; very unimportant corresponds to ___; moderate important corresponds to ___; unimportant corresponds to ___; very important corresponds to ___.

Thank you again! Good luck to you!

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) (71771002).

References

1. Zadeh L.A., From computing with numbers to computing with words: From manipulation of measurements to manipulation of perceptions, IEEE Transactions on Circuits and Systems I-Regular Papers 46(1) (1999), 105–119.

2. Zadeh L.A., Fuzzy logic equals computing with words, IEEE Transactions on Fuzzy Systems 4(2) (1996), 103–111.

3. Herrera and Martinez, A 2-tuple fuzzy linguistic representation model for computing with words, IEEE Transactions on Fuzzy Systems 8(6) (2000), 746–752.

4. Zadeh L.A., Fuzzy sets, Information and Control 8(3) (1965), 338–353.

5. Zadeh L.A., The concept of a linguistic variable and its application to approximate reasoning, Information Sciences 8 (1975), 199–249.

6. Wu D.R., Mendel J.M. and Coupland, Enhanced interval approach for encoding words into interval type-2 fuzzy sets and its convergence analysis, IEEE Transactions on Fuzzy Systems 20(3) (2012), 499–513.

7. Wang and Xu Z.S., Interactive algorithms for improving incomplete linguistic preference relations based on consistency measures, Applied Soft Computing 42 (2016), 66–79.

8. Zhou and Xu Z.S., Generalized asymmetric linguistic term set and its application to qualitative decision making involving risk appetites, European Journal of Operational Research 254(2) (2016), 610–621.

9. Wang and Xu Z.S., Some consistency measures of extended hesitant fuzzy linguistic preference relations, Information Sciences 297 (2015), 316–331.

10. Zhou and Xu Z.S., Asymmetric hesitant fuzzy sigmoid preference relations in the analytic hierarchy process, Information Sciences 358 (2016), 191–207.

11. D.J., Intuitionistic fuzzy information aggregation under confidence levels, Applied Soft Computing 19 (2014), 147–160.

12. D.J., Zhang W.Y. and Huang, Dual hesitant fuzzy aggregation operators, Technological and Economic Development of Economy 22(2) (2016), 194–209.

13. Herrera, Martinez and Sanchez P.J., Managing non-homogeneous information in group decision making, European Journal of Operational Research 166 (2005), 115–132.

14. Herrera et al., Computing with words in decision making: Foundations, trends and prospects, Fuzzy Optimization and Decision Making 8 (2009), 337–364.

15. Herrera, Herrera-Viedma and Martinez, A fusion approach for managing multi-granularity linguistic term sets in decision making, Fuzzy Sets and Systems 114 (2000), 43–58.

16. Herrera and Herrera-Viedma, Linguistic decision analysis: Steps for solving decision problems under linguistic information, Fuzzy Sets and Systems 115 (2000), 67–82.

17. Herrera, Herrera-Viedma and Martínez, A fuzzy linguistic methodology to deal with unbalanced linguistic term sets, IEEE Transactions on Fuzzy Systems 16(2) (2008), 354–370.

18. Liu and Mendel J.M., Encoding words into interval type-2 fuzzy sets using an interval approach, IEEE Transactions on Fuzzy Systems 16(6) (2008), 1503–1521.

19. Liu S.L. and Liu X.W., A sample survey based linguistic MADM method with prospect theory for online shopping problems, Group Decision and Negotiation 25(4) (2016), 749–774.

20. Dubois and Prade, Fuzzy Sets and Systems: Theory and Application, New York: Academic Press, 1980.

21. Laarhoven P.J.M. and Pedrycz, A fuzzy extension of Saaty’s priority theory, Fuzzy Sets and Systems 11 (1983), 229–241.

22. Lee L.-W. and Chen S.-M., Fuzzy decision making based on likelihood-based comparison relations of hesitant fuzzy linguistic term sets and hesitant fuzzy linguistic operators, Information Sciences 294 (2015), 513–529.

23. Xu Z.S., Linguistic Decision Making: Theory and Methods, Berlin: Springer, 2012.

24. Silva V.B.S. and Morais D.C., A group decision-making approach using a method for constructing a linguistic scale, Information Sciences 288 (2014), 423–436.

25. Walpole R.W. et al., Probability and Statistics for Engineers and Scientists, 8th ed., Upper Saddle River, NJ: Prentice-Hall, 2007.

26. Qin J.D. and Liu X.W., Multi-attribute group decision making using combined ranking value under interval type-2 fuzzy environment, Information Sciences 297 (2015), 293–315.

27. Cheng Z.H. et al., Industrial structure, technical progress and carbon intensity in China’s provinces, Renewable and Sustainable Energy Reviews 81(2) (2018), 2935–2946.

Data processing results for the first group of terms
Term	Step 1: n′	Step 2: n″	Mean m′_a	Std s′_a	Tolerance interval	Step 3: m	Mean m_a	Std s_a
Extremely bad	38	33	0.94	1.22	[0, 3.38]	32	0.78	0.82
Very bad	39	33	2.15	1	[0.15, 4.15]	31	2.28	0.86
Bad	39	38	3.37	1.27	[0.83, 5.91]	36	3.56	1.01
Common	39	39	5.55	0.82	[3.91, 7.19]	38	5.49	0.72
Good	40	36	7.84	0.93	[5.98, 9.7]	35	7.78	0.86
Very good	39	35	8.9	0.55	[7.8, 10]	34	8.94	0.5
Extremely good	39	39	9.76	0.44	[8.88, 10]	38	9.8	0.33
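The tolerance intervals in the table are consistent with m′_a ± 2s′_a clipped to the 0–10 answer scale; for instance, 0.94 ± 2 × 1.22 gives [−1.50, 3.38], which is clipped to [0, 3.38]. The sketch below reproduces that rule and a Step-3-style removal of answers outside the interval. The factor k = 2, the use of the sample standard deviation, and the function names are our assumptions inferred from the table, not the paper's stated procedure.

```python
from statistics import mean, stdev


def tolerance_interval(m: float, s: float, k: float = 2.0,
                       lo: float = 0.0, hi: float = 10.0) -> tuple[float, float]:
    """Return m ± k*s, clipped to the [lo, hi] answer scale of the questionnaire."""
    return (max(lo, m - k * s), min(hi, m + k * s))


def trim_answers(answers: list[float], k: float = 2.0) -> list[float]:
    """Drop answers outside the tolerance interval built from their own mean and sample std."""
    a, b = tolerance_interval(mean(answers), stdev(answers), k)
    return [x for x in answers if a <= x <= b]


# Check the rule against three rows of the table: (m'_a, s'_a) -> tolerance interval
for m, s in [(0.94, 1.22), (5.55, 0.82), (9.76, 0.44)]:
    a, b = tolerance_interval(m, s)
    print(f"[{a:.2f}, {b:.2f}]")
```

The three printed intervals match the "Extremely bad", "Common" and "Extremely good" rows; after trimming, the mean and standard deviation would be recomputed on the remaining answers to obtain m_a and s_a.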


Table 5
Linguistic decision making matrix $\tilde{A}$

Alternative aᵢ	c₁ (price)	c₂ (design)	c₃ (evaluation)
Weight	Very important	Unimportant	Important
a₁	Very low	Common	Bad
a₂	Moderate	Good	Good
a₃	High	Very good	Common
a₄	Very high	Very good	Very good