Abstract
Case retrieval is the key step in case-based reasoning (CBR). The similarity measurement between historical cases and the target case is central to case retrieval and affects the resulting decision. In practical CBR applications, case attributes often take hybrid formats of values. Representing such cases and retrieving them with high accuracy when attribute values come in multiple formats are significant challenges, yet they have not been studied in depth. The objective of this paper is to develop a new case retrieval method for hybrid multi-attribute cases whose attribute values take four formats, i.e., crisp numbers, interval numbers, multi-granularity linguistic variables, and intuitionistic fuzzy numbers (IFNs). First, crisp numbers, interval numbers, and multi-granularity linguistic variables are transformed into IFNs, and an attribute similarity measurement based on IFNs is proposed. The attribute weights are determined by an optimal matching model. This model is a multi-objective problem that can be solved using the min-max method. Furthermore, the case similarities between the historical cases and the target case are obtained by aggregating the attribute similarities using evidential reasoning (ER), and the proper historical case(s) can be retrieved according to the resulting case similarities. Finally, a case study of a gas explosion in China’s Fujian province demonstrates the proposed approach and its potential application.
Introduction
Case-based reasoning (CBR) is a knowledge-based problem-solving method that solves a present problem by referring to historical experience. It is consistent with human thinking and is convenient and easy to use. Over the decades, CBR has been widely used in areas such as medicine [1, 2], manufacturing [3, 4], business [5, 6], emergency decision making [7–9], and fault diagnosis [10]. CBR supports decision-makers in finding desirable solution(s) to problem(s). It usually includes four steps, i.e., case retrieval, case reuse, case revision, and case retention [11]. Case retrieval is the first and core step; its function is to find historical case(s) similar to the target case. If the retrieved historical case(s) are close to the target case, a solution based on them would be applicable; otherwise, it would be unacceptable. Because case retrieval relies on similarity measurement, a good similarity measurement is vital for retrieving proper historical case(s).
In CBR applications based on case retrieval [1, 13], problems are often described by multiple attributes whose values take various formats [15]. In reality, a case may contain hybrid data formats, e.g., crisp symbols, crisp numbers, and fuzzy variables. For example, in a gas explosion emergency, the value of the attribute 'ventilation' is expressed as a linguistic variable, e.g., 'very good', 'good', 'bad', or 'very bad'; the value of the attribute 'gas concentration' is represented as an interval number, e.g., [0.1, 0.25]; and the value of the attribute 'the number of underground personnel' is represented as a crisp number, e.g., 150. Therefore, measuring the case similarities between the target case and historical cases with hybrid multi-attribute values is an important research topic.
To date, a number of methods have been suggested for case retrieval [6, 12–17]. For example, Qi et al. [12] proposed a new fuzzy similarity measurement based on an adapted Gaussian membership function. Zarandi et al. [13] proposed a fuzzy representation of cases and a fuzzy clustering model for fuzzy data to perform similarity matching in case retrieval. Xiong [14] adopted fuzzy if-then rules as the form of knowledge representation and built a similarity model. Fan et al. [15] presented a hybrid similarity measure that considered five formats of attribute values: crisp symbols, crisp numbers, interval numbers, fuzzy linguistic variables, and random variables. Qi et al. [16] developed a case retrieval method that combines similarity measurement with multi-criteria decision making and considers fuzzy linguistic evaluations as well as triangular, trapezoidal, and Gaussian functions. Chang et al. [6] developed a semantic-based case retrieval method that integrates short-text semantic similarity and recognizing textual entailment. Zhao et al. [18] presented a similarity-based case retrieval algorithm that considered three formats of values: crisp symbols, crisp numbers, and interval numbers. Qi et al. [19] proposed a new similarity measurement that includes four independent metrics, i.e., fuzzy, numeric, textual, and interval similarity measurements. Qi et al. [20] developed a similarity measurement based on support vector regression in which the attribute types include textual, numerical, interval, and fuzzy variables. As these studies show, existing methods for measuring the similarity between each historical case and the target case usually consider hybrid formats of attribute values.
The hybrid similarity measurement methods in existing CBR studies have made significant contributions to case retrieval. Cases often contain fuzzy information. Since Zadeh [21] introduced the concept of fuzzy logic, fuzzy-set theory has been applied to case representation. Wu et al. [22] employed a triangular membership function to represent the variables and developed a fuzzy CBR retrieval mechanism. Chang et al. [23] proposed a membership function to calculate membership values and developed a fuzzy case-based reasoning approach. However, calculating membership values with membership functions is complex and time-consuming. To overcome this problem, Cheng et al. [24] used the membership degree itself to express the fuzzy number and, combining it with fuzzy-set theory, proposed a new similarity measurement to solve construction dispute problems. Qi et al. [16] represented fuzzy numbers as triangular numbers, trapezoidal numbers, and fuzzy linguistic evaluation terms and proposed a new similarity method that employs the membership function itself. Nevertheless, none of these studies considers intuitionistic fuzzy numbers (IFNs). In reality, fuzzy information cannot always be expressed by the degree of membership alone; there are also degrees of non-membership and hesitation. For example, in a gas explosion, decision makers may evaluate the safety of a mine as follows: the safety degree is 0.4, the un-safety degree is 0.5, and the uncertain degree is 0.1. IFNs express this situation well and can handle fuzzy and uncertain information more flexibly and practically. Therefore, new methods for hybrid multi-attribute case retrieval that consider four formats of attribute values, i.e., crisp numbers, interval numbers, multi-granularity linguistic variables, and IFNs, are worth studying.
To do this, we must deal with the following three challenges: (1) how to measure the attribute similarities for hybrid multi-attribute values while preserving as much of the original information as possible; (2) how to determine the attribute weights easily and accurately, since they significantly influence the accuracy of the case similarity measurement; and (3) how to integrate the attribute similarities into a case similarity.
For the first challenge, an attribute similarity measurement based on IFNs is given. For the second challenge, an optimization model based on distance variance is constructed to determine the attribute weights. For the third challenge, because the attribute similarities are themselves IFNs, they are integrated with the attribute weights using the evidential reasoning (ER) method. ER can not only deal with uncertainty caused by randomness or fuzziness but also distinguish between unknown and uncertain information. Over the past decades, ER has been widely and successfully used in information fusion [25–27]. In particular, ER has been applied to aggregate fuzzy information. For example, Wang et al. [28] developed a fuzzy ER analytic algorithm to aggregate triangular IFNs. Chen et al. [29] used ER to aggregate each decision maker’s decision matrix in fuzzy multi-attribute group decision making. Bao et al. [30] applied ER to calculate comprehensive prospect values in the form of IFNs. Therefore, ER is expected to be helpful for aggregating IFNs.
To verify the effectiveness of the proposed method, an example of generating an emergency alternative is used to illustrate the feasibility of the hybrid multi-attribute case retrieval method. The proposed method is then compared with other case retrieval methods.
The remainder of this study is arranged as follows. Section 2 briefly reviews the steps of case similarity measurement and the basic concepts of IFNs. Section 3 introduces the problem description and information transformation. Section 4 gives a similarity measurement based on IFNs. Section 5 introduces the case retrieval method based on ER and intuitionistic fuzzy theory. Section 6 discusses a case study to demonstrate the efficiency of the proposed case retrieval method, and Section 7 concludes the paper.
Preliminaries
In this section, the similarity calculation process is introduced, and some basic concepts of intuitionistic fuzzy theory are recalled.
Similarity measurement
Similarity measurement is the central issue for identifying the most similar historical case(s) to solve the target problem [31]. The most widely applied technique in similarity measurement studies is the distance function [32–34]. The similarity measurement includes three steps and is described as follows.
Step 1: Calculate the attribute distance.
The most commonly used distance formulas are the Manhattan and Euclidean distance functions. However, these functions apply only to numeric attributes, and when attributes are measured on different scales, the computed distances may be distorted. To avoid distorted results, the distance calculation is normalized by the Max function [31] or, in some studies, by the Max-Min function.
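A minimal sketch of normalized distance calculation follows, assuming Max-Min normalization over each attribute's range in the case base (the text also mentions normalizing by the Max function alone). The function names and the data are hypothetical; the two attributes loosely mirror head count and gas concentration, whose raw scales differ by orders of magnitude.

```python
import math
from typing import List, Sequence


def max_min_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Rescale every attribute (column) of the case matrix to [0, 1]."""
    n_attrs = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_attrs)]
    highs = [max(r[j] for r in rows) for j in range(n_attrs)]
    return [
        [
            # A constant column carries no discriminating information; map it to 0.
            0.0 if highs[j] == lows[j] else (r[j] - lows[j]) / (highs[j] - lows[j])
            for j in range(n_attrs)
        ]
        for r in rows
    ]


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute attribute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the sum of squared attribute differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical cases: row 0 is the target, rows 1-2 are historical cases.
# Column 0: number of underground personnel; column 1: gas concentration (%).
cases = [[150, 0.20], [120, 0.25], [300, 0.10]]
norm = max_min_normalize(cases)
target, history = norm[0], norm[1:]
for k, h in enumerate(history, start=1):
    print(f"C{k}: manhattan={manhattan(target, h):.4f} euclidean={euclidean(target, h):.4f}")
```

Without the normalization step, the head-count column would dominate both distances and the concentration column would barely matter.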
Aside from these, some studies have developed other distance formulas, such as the Gaussian distance [33], the cosine angle distance, and hybrid methods combining several distance functions [24].
In addition, attribute similarities have been calculated by various similarity measurements that do not use the distance value, such as the inverse exponential function [15], the grey correlation degree [33], and the integration of several similarity measurements [33].
Step 2: Calculate the attribute weights
The determination of attribute weights has a significant impact on the similarity measurement [35]. Hence, the method for allocating attribute weights is an important research topic. Currently, there are two main techniques for determining the attribute weights. One employs statistical methods, such as the optimization model [36] and the function based on the grey correlation degree [37]. The other uses machine learning [35, 38]. However, machine learning requires a sufficient amount of training data.
Step 3: Calculate the case similarity
After the attribute distances/similarities and the attribute weights have been calculated, all the attribute distances/similarities of each historical case are aggregated to obtain its case similarity. A commonly used aggregation method is linear weighting.
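Aggregation by linear weighting can be sketched as follows. The attribute similarities and weights are hypothetical, and the weights are assumed to sum to 1, as is usual for linear weighting; the function names are illustrative only.

```python
from typing import Dict, List, Sequence


def case_similarity(attr_sims: Sequence[float], weights: Sequence[float]) -> float:
    """Linear weighting: Sim_i = sum_j w_j * s_ij, with the weights summing to 1."""
    if len(attr_sims) != len(weights):
        raise ValueError("one weight per attribute is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, attr_sims))


def rank_cases(sims: Dict[str, List[float]], weights: Sequence[float]) -> List[str]:
    """Return historical case ids ordered from most to least similar."""
    scores = {cid: case_similarity(s, weights) for cid, s in sims.items()}
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


# Hypothetical attribute similarities of three historical cases to the target case.
sims = {"C1": [0.9, 0.4, 0.7], "C2": [0.6, 0.8, 0.9], "C3": [0.5, 0.5, 0.5]}
weights = [0.5, 0.3, 0.2]
print(rank_cases(sims, weights))  # ['C2', 'C1', 'C3']
```

C2 (0.72) edges out C1 (0.71) even though C1 matches best on the most heavily weighted attribute, which shows how directly the ranking depends on the attribute weights.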
Some basic concepts of intuitionistic fuzzy theory
For convenience, Xu [40] defined the IFN in the form $\alpha = (\mu_\alpha, \nu_\alpha)$, satisfying the conditions $\mu_\alpha \in [0, 1]$, $\nu_\alpha \in [0, 1]$, and $\mu_\alpha + \nu_\alpha \le 1$, with the score function defined as $s(\alpha) = \mu_\alpha - \nu_\alpha$.
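As a small illustration of this definition, the sketch below encodes an IFN and its score function, together with the hesitation degree π = 1 − μ − ν (the third degree mentioned in the Introduction). It reuses the mine-safety evaluation from the Introduction: safety degree 0.4, un-safety degree 0.5, uncertain degree 0.1. The class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IFN:
    """Intuitionistic fuzzy number alpha = (mu, nu) with mu, nu in [0, 1] and mu + nu <= 1."""

    mu: float  # membership degree
    nu: float  # non-membership degree

    def __post_init__(self) -> None:
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0):
            raise ValueError("mu and nu must lie in [0, 1]")
        if self.mu + self.nu > 1.0 + 1e-12:
            raise ValueError("mu + nu must not exceed 1")

    @property
    def hesitation(self) -> float:
        """pi = 1 - mu - nu, the degree that is neither membership nor non-membership."""
        return 1.0 - self.mu - self.nu

    def score(self) -> float:
        """Score function s(alpha) = mu - nu, which lies in [-1, 1]."""
        return self.mu - self.nu


# Mine-safety evaluation from the Introduction: safe 0.4, unsafe 0.5, uncertain 0.1.
safety = IFN(0.4, 0.5)
print(round(safety.score(), 4), round(safety.hesitation, 4))  # -0.1 0.1
# Two different IFNs can share a score, so the score alone loses information.
print(IFN(0.5, 0.25).score() == IFN(0.25, 0.0).score())  # True
```

The last line shows why reducing IFNs to scores too early discards information: ⟨0.5, 0.25⟩ and ⟨0.25, 0⟩ have the same score but hesitation degrees of 0.25 and 0.75.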
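A similarity measure $S$ between intuitionistic fuzzy sets (IFSs) $A_1$, $A_2$, and $A_3$ satisfies the following properties: (1) $S(A_1, A_2)$ is an IFN; (2) $S(A_1, A_2) = \langle 1, 0 \rangle$ if and only if $A_1 = A_2$; (3) $S(A_1, A_2) = S(A_2, A_1)$; (4) if $A_1 \subseteq A_2 \subseteq A_3$, then $S(A_1, A_3) \subseteq S(A_1, A_2)$ and $S(A_1, A_3) \subseteq S(A_2, A_3)$.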
Problem description and information transformation
Problem description
This section briefly describes the hybrid multi-attribute case retrieval problem with four formats of attribute values.
Suppose there are $m$ historical cases, denoted by $C_i$ ($i \in \{1, 2, \ldots, m\}$), and one target case, denoted by $C_0$. Let $Z = \{Z_1, Z_2, \ldots, Z_n\}$ be a finite set of $n$ attributes of the historical cases $C_i$ and the target case $C_0$, where $Z_j$ denotes the $j$th attribute ($j \in \{1, 2, \ldots, n\}$). Let $X_i = \{x_{i,1}, x_{i,2}, \ldots, x_{i,n}\}$ be the vector of attribute values of the historical case $C_i$, where $x_{i,j}$ represents the value of the $j$th attribute of $C_i$ ($i \in \{1, 2, \ldots, m\}$, $j \in \{1, 2, \ldots, n\}$). Let $X_0 = \{x_{0,1}, x_{0,2}, \ldots, x_{0,n}\}$ be the vector of attribute values of the target case $C_0$, where $x_{0,j}$ represents the value of the $j$th attribute of $C_0$. Let $W = (w_1, w_2, \ldots, w_n)$ be the vector of attribute weights, where $w_j$ denotes the weight of attribute $Z_j$, such that $w_j \ge 0$ and $\sum_{j=1}^{n} w_j = 1$.
Information transformation
Because the attribute values are hybrid, they must be converted into a uniform type before the case similarity is calculated; otherwise, the attribute similarities cannot be aggregated into a final result. Some attribute values are IFNs. A common way to convert IFNs into real numbers is the score function, but this leads to an incomplete expression of the decision information [42, 43]. It is therefore more reasonable to convert the other formats of attribute values, i.e., crisp numbers, interval numbers, and multi-granularity linguistic variables, into IFNs. The information transformation is given as follows.
First, to handle the hybrid attribute data easily, we normalize them into numbers from 0 to 1. For an attribute value $x_{i,j}$ that is a real number, let
For an attribute value $x_{i,j}$ that is an interval number, i.e.,
For an attribute value $x_{i,j}$ that is a multi-granularity linguistic variable, i.e.,
Then,
The next step is to transform the attributes into IFNs. If the normalized value of attribute $Z_j$ with regard to historical case $C_i$ is a real number
When the normalized attribute value
Through the above information transformation, the normalized attribute values of all four formats are expressed uniformly as IFNs.
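Equations (1)-(5) did not survive extraction, so the following is only a sketch of the kind of transformation described above, built on conventions we assume rather than on the authors' formulas: Max-Min normalization for crisp and interval values, mapping the k-th term of a g-term linguistic scale to k/(g − 1), converting a normalized crisp value r into ⟨r, 1 − r⟩, and converting a normalized interval [a, b] into ⟨a, 1 − b⟩, the usual correspondence between interval values and IFNs. All function names and data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class IFN:
    mu: float  # membership degree
    nu: float  # non-membership degree


def normalize_crisp(values: List[float]) -> List[float]:
    """Assumed Max-Min normalization of crisp numbers into [0, 1]."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def normalize_intervals(values: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Assumed Max-Min normalization of intervals [a, b], using the smallest
    lower bound and the largest upper bound over all cases."""
    lo = min(a for a, _ in values)
    hi = max(b for _, b in values)
    if hi == lo:
        return [(0.0, 0.0) for _ in values]
    return [((a - lo) / (hi - lo), (b - lo) / (hi - lo)) for a, b in values]


def linguistic_to_crisp(index: int, granularity: int) -> float:
    """Assumed mapping of term `index` (0 = worst) of a scale with `granularity`
    terms onto [0, 1], so that 5-term and 7-term scales become comparable."""
    if granularity < 2 or not 0 <= index < granularity:
        raise ValueError("invalid term index or granularity")
    return index / (granularity - 1)


def crisp_to_ifn(r: float) -> IFN:
    """A normalized crisp value r becomes <r, 1 - r>, with zero hesitation."""
    return IFN(r, 1.0 - r)


def interval_to_ifn(a: float, b: float) -> IFN:
    """A normalized interval [a, b] becomes <a, 1 - b>; the width b - a is the hesitation."""
    return IFN(a, 1.0 - b)


# Hypothetical data in the spirit of the gas explosion attributes.
personnel = normalize_crisp([100, 150, 300])
gas = normalize_intervals([(0.10, 0.25), (0.20, 0.30), (0.15, 0.20)])
print([crisp_to_ifn(r) for r in personnel])
print([interval_to_ifn(a, b) for a, b in gas])
# 'G' (good) is term 3 of the 5-term scale S1 but term 4 of the 7-term scale S2.
print(linguistic_to_crisp(3, 5), linguistic_to_crisp(4, 7))
```

Under these assumptions the width of each interval survives as the hesitation degree of its IFN, which is exactly the information a score-function conversion would discard.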
Once the attributes have been converted into IFNs, a similarity measurement for IFNs is needed. Szmidt et al. [45] proposed several definitions of distances between IFNs. Inspired by this work, an attribute similarity measurement based on IFNs is proposed in this section.
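For reference, the sketch below implements the normalized Hamming and Euclidean distances between IFSs in the three-parameter form associated with Szmidt et al. [45], which uses membership, non-membership, and hesitation. It is not the similarity measure of Theorem 1, whose formula did not survive extraction; it only illustrates the kind of IFN distance that measure builds on. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

IFNPair = Tuple[float, float]  # (mu, nu)


def _with_hesitation(p: IFNPair) -> Tuple[float, float, float]:
    """Return (mu, nu, pi) with pi = 1 - mu - nu."""
    mu, nu = p
    return mu, nu, 1.0 - mu - nu


def _check(a: Sequence[IFNPair], b: Sequence[IFNPair]) -> None:
    if len(a) != len(b) or not a:
        raise ValueError("both IFSs need the same, non-zero number of elements")


def normalized_hamming(a: Sequence[IFNPair], b: Sequence[IFNPair]) -> float:
    """1/(2n) * sum over elements of |d_mu| + |d_nu| + |d_pi|; lies in [0, 1]."""
    _check(a, b)
    total = sum(
        abs(u - v)
        for x, y in zip(a, b)
        for u, v in zip(_with_hesitation(x), _with_hesitation(y))
    )
    return total / (2 * len(a))


def normalized_euclidean(a: Sequence[IFNPair], b: Sequence[IFNPair]) -> float:
    """sqrt(1/(2n) * sum of squared mu, nu and pi differences); lies in [0, 1]."""
    _check(a, b)
    total = sum(
        (u - v) ** 2
        for x, y in zip(a, b)
        for u, v in zip(_with_hesitation(x), _with_hesitation(y))
    )
    return math.sqrt(total / (2 * len(a)))


# Single-element IFSs: the mine-safety evaluation <0.4, 0.5> versus <0.9, 0.1>.
A, B = [(0.4, 0.5)], [(0.9, 0.1)]
print(round(normalized_hamming(A, B), 4), round(normalized_euclidean(A, B), 4))
```

Including the hesitation term means two IFNs with equal μ and ν differences but different amounts of uncertainty are still told apart.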
1) Let us first prove that $S(A_s, A_t)$ is an IFN.
Because $0 \le H \le 1$ and $0 \le L \le 1$, obviously,
2) Because $S(A_s, A_t) = \langle 1, 0 \rangle$, then
Obviously, $|u_{s,l} - u_{t,l}| = 0$ and $|(1 - v_{s,l}) - (1 - v_{t,l})| = 0$, so that $u_{s,l} = u_{t,l}$ and $v_{s,l} = v_{t,l}$. Thus, $A_s = A_t$.
Conversely, because $A_s = A_t$, we have $u_{s,l} = u_{t,l}$ and $v_{s,l} = v_{t,l}$; thus $S(A_s, A_t) = \langle 1, 0 \rangle$.
3) Obviously, $S(A_s, A_t) = S(A_t, A_s)$.
4) If $A_1 \subseteq A_2 \subseteq A_3$, namely, $u_{1,l} \le u_{2,l} \le u_{3,l}$ and $v_{1,l} \ge v_{2,l} \ge v_{3,l}$, and
Similarly, we can derive $S(A_1, A_3) \subseteq S(A_2, A_3)$.
In particular, for two IFNs $\alpha_1 = \langle \mu_1, v_1 \rangle$ and $\alpha_2 = \langle \mu_2, v_2 \rangle$, we have
In this section, we give a case retrieval method based on IFNs. First, the formula for measuring attribute similarity is given. Then, a maximizing differential model is constructed to determine the attribute weights. Afterwards, the case similarity measurement is obtained by aggregating the attribute similarities with the attribute weights. Finally, the proper historical case(s) are retrieved according to the ranking of the case similarities. The method is introduced as follows.
Attribute similarity measurement
When the attributes $x_{i,j}$ and $x_{0,j}$ have been converted into IFNs by Equations (1)-(5), i.e., $x_{i,j} = \langle u_{i,j}, v_{i,j} \rangle$ and $x_{0,j} = \langle u_{0,j}, v_{0,j} \rangle$, the attribute similarity measurement is as follows.
Since $x_{i,j}$ and $x_{0,j}$ are IFNs, their attribute similarity can be calculated with the IFN-based similarity measurement proposed in Theorem 1. Let $s_{0i,j}$ denote the attribute similarity between the target case $C_0$ and the historical case $C_i$ with regard to the attribute $Z_j$; then $s_{0i,j}$ is calculated by
Determining attribute weights is essential in case retrieval. There are mainly two kinds of methods for doing so: optimization models [36] and machine learning methods [35, 46].
In our proposed method, to obtain objective retrieval results with an easy-to-use procedure, the attribute weights are determined by an optimization model. Wang et al. [47] proposed a weight determination method based on the variance between two attributes, which has proved to be simple and effective. Inspired by this, we compute the attribute weights with an optimization model based on attribute deviation. The weight determination method is introduced as follows.
Using the score function, let $x_a = (x_{1,a}, x_{2,a}, \ldots, x_{m,a})$ and $x_b = (x_{1,b}, x_{2,b}, \ldots, x_{m,b})$ be the vectors of values of the attributes $Z_a$ and $Z_b$ ($a, b \in \{1, 2, \ldots, n\}$). Then, $x_a$ and $x_b$ are converted into the score vectors $p_a = (p_{1,a}, p_{2,a}, \ldots, p_{i,a}, \ldots, p_{m,a})$ and $p_b = (p_{1,b}, p_{2,b}, \ldots, p_{i,b}, \ldots, p_{m,b})$. The variance between $p_a$ and $p_b$ is defined as
Because $d'$ has the same monotonicity as $d$, we use $d'$ instead of $d$ in the model defined by (10).
which is equivalent to
Then Equation (14) can be equivalently rewritten in matrix form such that
where
Clearly, R is a positive nonsingular matrix. Therefore
Solving (15) and (17) together, we obtain
According to (18), $w^* \ge 0$, which satisfies the normalization constraint and completes the proof.
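Equations (14)-(18) did not survive extraction. The surviving steps (a matrix form with a positive nonsingular matrix $R$, two equations solved jointly, and a nonnegative solution $w^*$ meeting the normalization constraint) are consistent with the standard closed form of the equality-constrained quadratic program $\min_w w^{\top} R w$ subject to $e^{\top} w = 1$ for symmetric $R$, namely $w^* = R^{-1} e / (e^{\top} R^{-1} e)$, where $e$ is the all-ones vector. The sketch below assumes that form. How $R$ is assembled from the score vectors $p_a$ is not recoverable here, so the matrices in the example are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def qp_weights(R: Matrix) -> List[float]:
    """Minimize w^T R w subject to sum(w) = 1 for a symmetric nonsingular R.

    Stationarity of w^T R w - 2*lam*(sum(w) - 1) gives R w = lam * e,
    so w* = R^{-1} e / (e^T R^{-1} e).
    """
    x = solve(R, [1.0] * len(R))  # x = R^{-1} e, without forming the inverse
    total = sum(x)                # e^T R^{-1} e
    w = [xi / total for xi in x]
    if any(wi < 0 for wi in w):
        raise ValueError("negative weight: R does not meet the conditions of the proof")
    return w


# Hypothetical matrices, for illustration only.
print(qp_weights([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 4.0]]))
print(qp_weights([[2.0, 1.0], [1.0, 3.0]]))
```

Solving the linear system rather than explicitly inverting $R$ is the usual numerically safer choice; the nonnegativity check mirrors the condition $w^* \ge 0$ stated in the proof.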
According to Corollary 1, the attribute weights $W = (w_1, w_2, \ldots, w_n)$ can be obtained.
To retrieve proper historical case(s), the case similarities must be measured; that is, the attribute similarities, which are IFNs, must be aggregated into case similarities. Because IFNs contain uncertain information, and ER is an uncertainty reasoning method with a distinct advantage in fusing uncertain information, we use ER to aggregate the attribute similarities. The aggregation process is as follows.
First, convert the attribute similarity $s_{0i,j}$ into the grade belief form. Let the grade set be $H = \{H_r \mid r = 1, 2\}$ with $H_1 = \langle 1, 0 \rangle$ and $H_2 = \langle 0, 1 \rangle$. The attribute similarity $s_{0i,j}$ can then be expressed as:
Then, transform the belief degrees into basic probability masses by combining the attribute weight $w_j$ with the belief degrees, using the following equations:
Next, the basic probability masses on n basic attributes are combined into an aggregated basic probability assignment by using the following analytic formulas [48]:
Finally, the aggregated probability assignments are normalized into overall belief degrees by using the following equations:
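The ER steps above can be sketched with the analytic ER algorithm in its commonly published form (Wang, Yang, and Xu), which we assume is what [48] refers to. Because the belief-form and combination equations did not survive extraction, we also assume that an attribute similarity ⟨μ, ν⟩ is read as belief μ in grade H1 = ⟨1, 0⟩ and belief ν in H2 = ⟨0, 1⟩, leaving the hesitation 1 − μ − ν unassigned, and that the weights sum to 1. The function name is hypothetical; this is an illustration, not the authors' implementation.

```python
from math import prod
from typing import Sequence, Tuple


def er_aggregate(
    beliefs: Sequence[Tuple[float, float]], weights: Sequence[float]
) -> Tuple[float, float, float]:
    """Aggregate per-attribute beliefs (beta_1j, beta_2j) on grades H1, H2.

    Returns (beta_1, beta_2, beta_H): the combined beliefs in H1 and H2 and the
    belief left unassigned because of incomplete (hesitant) assessments.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n_grades = 2
    n_attrs = len(weights)
    # Basic probability masses m_{n,j} = w_j * beta_{n,j}.
    m = [[weights[j] * beliefs[j][n] for j in range(n_attrs)] for n in range(n_grades)]
    # Unassigned mass: the part caused by the weight (m_bar) ...
    m_bar = [1.0 - w for w in weights]
    # ... and the part caused by incompleteness of the assessment (m_tilde).
    m_tilde = [weights[j] * (1.0 - sum(beliefs[j])) for j in range(n_attrs)]
    m_H = [m_bar[j] + m_tilde[j] for j in range(n_attrs)]

    prod_H = prod(m_H)
    prod_bar = prod(m_bar)
    per_grade = [prod(m[n][j] + m_H[j] for j in range(n_attrs)) for n in range(n_grades)]
    k = 1.0 / (sum(per_grade) - (n_grades - 1) * prod_H)  # normalization factor

    m_n = [k * (p - prod_H) for p in per_grade]
    m_tilde_H = k * (prod_H - prod_bar)
    m_bar_H = k * prod_bar
    # Remove the weight-induced unassigned mass and renormalize.
    scale = 1.0 - m_bar_H
    return m_n[0] / scale, m_n[1] / scale, m_tilde_H / scale


# Hypothetical example: two attribute similarities with equal weights.
b1, b2, bH = er_aggregate([(0.9, 0.1), (0.6, 0.3)], [0.5, 0.5])
print(round(b1, 4), round(b2, 4), round(bH, 4))
```

The three outputs sum to 1, and the non-zero unassigned belief is the hesitation from the second attribute carried through the aggregation rather than discarded.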
The result of ER aggregation is a distributed assessment vector, which does not directly rank the historical cases. To rank them, we convert $(\beta_{r,i}, \beta_{H,i})$ into IFNs as follows:
Furthermore, we use a score function $Sim_i$ to calculate the case similarities as follows:
The case similarities are thus obtained from Equation (30), with $Sim_i \in [0, 1]$. The greater $Sim_i$ is, the more similar the historical case $C_i$ is to the target case $C_0$.
The following case study applies the proposed case retrieval method to an emergency decision-making problem: generating an alternative for a gas explosion in China.
Problem description and analysis process
In recent years, gas explosions have occurred frequently in China, causing serious harm to society. The need to deal with gas explosions quickly and effectively has led the government and coal companies to give higher priority to safety. Fortunately, the causes of gas explosions, and the solutions to them, tend to be similar. This means that coal companies can draw on historical cases to make decisions when a new gas explosion occurs.
Company A is a coal company in China’s Fujian province that is very concerned about gas explosions. A gas explosion leads to serious consequences, such as casualties, collapse of the coal mine, and fire. To better deal with gas explosions, company A collected ten historical cases (C1, C2, …, C10), created a case base, and identified C0 as the target case. Each case consists of two parts: the problem and the solution. The problem is described by six attributes: the number of underground personnel (Z1, unit: person), gas concentration (Z2, unit: %), CO concentration (Z3, unit: %), ventilation (Z4), O2 concentration (Z5, unit: %), and mine safety (Z6). Among them, Z1 is a crisp number; Z2, Z3, and Z5 are interval numbers; Z4 is a multi-granularity linguistic variable whose linguistic assessment sets are S1 = {DB: definitely bad, B: bad, M: medium, G: good, DG: definitely good} and S2 = {DB: definitely bad, VB: very bad, B: bad, M: medium, G: good, VG: very good, DG: definitely good}; and Z6 is an IFN. The attribute values of the historical cases and the target case are shown in Table 1. The objective is to retrieve the most similar historical case from the case base and use it as a reference for the target case of company A.
To solve this problem, CBR is used; that is, the case retrieval method proposed in this paper is used to select the historical case most similar to the target case. The following steps are taken.
Information for historical cases and target case
Step 1: Using Equation (3), transform the multi-granularity linguistic values of Z4 into crisp numbers. Using Equations (1)-(2), normalize the values of Z1, Z2, Z3, Z4, and Z5.
Step 2: Using Equations (4)-(5), transform the values of Z1, Z2, Z3, Z4, and Z5 into IFNs.
Step 3: Using Equation (7), calculate the attribute similarities $s_{0i,j}$; the results are shown in Table 2.
Step 4: Convert the attribute similarities into score values. Using model (10), the attribute weights are obtained as W = (0.1880, 0.1598, 0.1523, 0.1885, 0.1687, 0.1427).
Step 5: Using Equations (19)-(28), convert the attribute similarities into grade belief form and aggregate them with ER; the results are shown in columns 2, 3, and 4 of Table 3. Then, the case similarities $Sim_i$ are obtained using the score function (column 5 of Table 3), and the historical cases are ranked by their case similarities (column 6 of Table 3).
Attribute similarity $s_{0i,j}$
Case similarities and ranking
Table 3 shows that C10 is the top-ranked historical case, i.e., the case most similar to the target case C0. Thus, the alternatives of historical case C10 can be used directly or modified to deal with the target case C0.
As this case shows, the proposed method can handle hybrid multi-attribute case retrieval in which the cases include four formats of attribute values, i.e., crisp numbers, interval numbers, multi-granularity linguistic variables, and IFNs. Furthermore, the attribute weights are determined by an optimal matching model, which makes them more objective and can improve the accuracy of the retrieval results. Finally, the attribute similarities are aggregated by ER, which handles uncertain information well. Therefore, the proposed method is feasible and effective for retrieving hybrid multi-attribute cases.
To further illustrate the novelty, validity, and feasibility of the proposed method, the following comparisons are performed.
First, to show the performance of the proposed method, we compare it with the hybrid similarity measurement (HSM) method [15], which does not convert all attribute values into IFNs, and with the Euclidean method [19].
Because the HSM method in [15] and the Euclidean method [19] cannot deal with IFNs, we convert the values of Z6 into crisp numbers using the score function. Then, we use the inverse exponential function proposed in [15] to calculate the attribute similarities and the linear weighting method to calculate the case similarities. These two methods are named F-CBR and C-CBR, respectively, while the method proposed in this paper is named IR-CBR. Finally, we compare the rankings of the ten historical cases according to their case similarities; the results are shown in Fig. 1.

Ranking comparison of the three methods.
As can be seen from Fig. 1, the most similar historical case is C10. The rankings produced by the two methods [15, 19] differ slightly from those of the proposed method; for example, they rank historical case C9 8th rather than 4th. This is because the methods calculate attribute similarity differently. Take the attribute Z5 of C1 as an example: the attribute similarity $s_{01,5}$ in Table 2 is $\langle 0.9, 0.1 \rangle$, whereas the HSM method gives $s_{01,5} = 0.8586$. To compare them, the IFN in Table 2 must be converted into a crisp number by subtracting the non-membership degree, i.e., $s_{01,5} = 0.9 - 0.1 = 0.8$. Using IFNs to calculate case similarity takes both determined and uncertain information into account and thus reduces the loss of information.
Next, to illustrate the superiority of the proposed method, we use the score function in [49] to integrate the attribute similarities in Table 2 into case similarities; this method is named SC-CBR. The resulting case similarities are shown in Fig. 2.
Case similarities of the two methods.
As Fig. 2 shows, the two methods produce the same ranking, but the case similarities obtained with ER are higher than those obtained with the score function. This is because ER also takes the uncertain information into account.
Finally, compared with the method of Qi et al. [16], our approach has an obvious advantage in determining attribute weights. The method in [16] uses average weights to calculate the case similarity, which ignores the relative importance of the attributes. The method in [7] gives the attribute weights in advance, which can be subjective. The method in [33] uses a membrane computing-based approach to optimize the attribute weights, which requires a certain amount of data. By contrast, our approach determines the attribute weights from the data in the case base, which is more objective and simpler and requires little data.
This study proposes a case retrieval method for hybrid multi-attribute cases. The method unifies the hybrid attribute values into IFNs. The proposed IFN-based similarity measurement is then used to calculate the similarities, which take into account not only the deterministic but also the uncertain parts of the attribute similarities. In this respect, the proposed method differs from the method in [15] and can provide better decision results. In the proposed method, the IFNs are aggregated by ER, which can better deal with uncertain or even missing information. Compared with the score function, ER leads to a more reasonable decision result, because the uncertain information is also considered in the case similarity measurement. The attribute weights are determined by a maximizing differential model that avoids subjective weighting and does not require large amounts of data. The proposed method is expected to find more applications in the near future.
In the future, we will focus on integrating a modified ER with other fuzzy theories, such as Pythagorean fuzzy sets [50] and neutrosophic sets [51], to address fuzzy information in different cases. Moreover, machine learning methods will be introduced into CBR, for example in attribute weight determination and case adaptation, to improve the accuracy of decision making.
Acknowledgments
This work was partly supported by the National Natural Science Foundation of China under Grant No. 61773123, the Humanities and Social Science Foundation of the Chinese Ministry of Education under Grant No. 16YJC630008, and the Fujian Natural Science Foundation of China under Grant No. 2017J01513.
