Abstract
Construction of membership functions from numerical data is very important in many applications of fuzzy set theory. Over the last two decades, many methods of membership function generation have been developed, but the majority of them are application-domain dependent and complex. In this paper, a simple method for constructing membership functions from numerical data is proposed. To validate the proposed method, commonly used evaluation measures are applied: average error rate, mean magnitude of relative error (MMRE), balanced mean magnitude of relative error (BMMRE), and coefficient of determination (R2). The validation results show that the proposed method achieves higher accuracy than the existing method with which it is compared. A sensitivity analysis is performed to analyze the impact of the input variables on the output variable.
Introduction
Fuzzy set theory provides a way to capture uncertainty, vagueness, and imprecision. Zadeh introduced a new way of thinking about uncertainty [27, 28]. Membership functions play a very important role in building fuzzy inference systems. Membership functions can be generated either with the help of a domain expert or from real data. However, most research articles dealing with fuzzy logic either rely on domain expert knowledge or do not explain how their membership functions were obtained [13, 15]. Developing membership functions from numerical data is one of the fundamental steps in solving a problem with fuzzy set theory. There are no standard guidelines or rules for choosing an appropriate membership function construction technique. Another problem that makes membership function construction an important task is the lack of consensus on the definition and interpretation of membership functions.
The interpretation of the meaning of a concept varies from person to person, so different fuzzy profiles may be developed to define the same concept. In [6], three different interpretations of a single statement are pointed out. The construction of membership functions is very important because the success of a method depends on the membership functions used. Therefore, it is necessary to explain how the membership functions are derived.
Many methods for generating membership functions have been reported in the literature, based on heuristics, histograms, neural networks, clustering, and genetic algorithms [4, 20]. Dombi [5] pointed out the following features common to these approaches: (i) all membership functions are continuous; (ii) all membership functions map an interval [a, b] to [0, 1]; and (iii) membership functions are monotonically increasing, monotonically decreasing, or increasing on one part of the interval and decreasing on another.
Heuristic methods use predefined membership function shapes [11]. The majority of the methods are application-domain dependent and complex. It is impractical to use a different membership function construction technique for each application problem, but it is not impossible to devise a single construction technique that works for most application problems.
Many research articles on generating fuzzy IF-THEN rules from numerical data can be found in the literature [1–3, 21–23]. A drawback of most of these models is that the membership functions used to map numerical data into linguistic terms are predefined. Hong and Lee [8] proposed a method based on fuzzy clustering and decision tables to derive membership functions from numerical data. They predefined the initial membership functions of the input variables and refined them through a series of merge operations. However, as the number of variables grows, the decision tables grow rapidly and become more complicated. Ping and Chen proposed a method for fuzzy profile generation based on α-cuts of equivalence relations; however, the complexity of their algorithm increases as the number of input variables grows and as the variables take a large number of values [23]. Mitra et al. [19] developed a method for automatic linguistic discretization of continuous attributes using quantiles. Recently, Makrehchi et al. [17] proposed a method for generating optimal fuzzy membership functions through a genetic algorithm.
Graphically, fuzzy profiles are represented in the form of membership functions. Many types of membership functions exist in the literature; however, triangular membership functions simplify computation [14, 24–26]. Therefore, only triangular membership functions are considered in the proposed model. A triangular membership function is shown in Fig. 1. Based on this literature survey, in this paper we propose a general learning method for constructing membership functions.
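For reference, a triangular membership function with left vertex a, peak b and right vertex c can be written as below. This is the standard textbook form; it is assumed, not confirmed by the text, to be the shape drawn in Fig. 1 and used in Step 6.

```latex
$$
\mu(x; a, b, c) =
\begin{cases}
0, & x \le a \ \text{or}\ x \ge c,\\[2pt]
\dfrac{x - a}{b - a}, & a < x \le b,\\[6pt]
\dfrac{c - x}{c - b}, & b < x < c.
\end{cases}
$$
```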
The rest of this paper is organized as follows. Section 2 explains the proposed methodology. Section 3 presents an illustrative example that demonstrates the proposed method. Section 4 discusses the testing results and validation. Finally, Section 5 presents the conclusion.
Proposed methodology
In this section, we propose a technique for constructing fuzzy profiles from numerical data. Let S denote a set of n training patterns described by the features F1, F2, ..., Fj, ..., where Fj denotes a feature. Feature Fj has n values v1j, v2j, ..., vnj, and Fjmin and Fjmax denote its minimum and maximum values. When a feature is quantitative, the following steps are performed to generate its membership functions.
Step 1. Sort the values of feature Fj in ascending order; the sorted values are written v1, v2, ..., vn below.
Step 2. Apply the k-means clustering algorithm to group the quantitative values of feature Fj into k clusters (y1, y2, ..., yi, ..., yk), where yimin and yimax denote the minimum and maximum values of the ith cluster. Also find the cluster centers (b1, b2, ..., bi, ..., bk) of the k clusters.
Step 3. Determine the membership values of the two boundary points of every cluster.
Substep 3.1. Find the difference between adjacent data: for each pair vi and vi+1 (i = 1, 2, ..., n − 1), the difference is diffi = vi+1 − vi.
Substep 3.2. Find the similarity between adjacent values of feature Fj according to the following formula [7]:
$$
s_m = \begin{cases} 1 - \dfrac{\mathit{diff}_i}{C\,\sigma_s}, & \mathit{diff}_i \le C\,\sigma_s, \\[4pt] 0, & \text{otherwise}, \end{cases} \tag{1}
$$
where sm represents the similarity between adjacent data, C is a control parameter that determines the shape of the membership functions, and σs is the standard deviation of the diffi values.
Substep 3.3. The minimum similarity value within the ith cluster is chosen as the membership value of the two boundary points yimin and yimax of the ith cluster.
Step 4. Determine the left vertex point (ai, 0) by interpolation.
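One way to carry out the interpolation in Step 4 is sketched below, assuming the left edge of the ith triangle is the straight line through (ai, 0) and the peak (bi, 1), and that this line passes through the boundary point (yimin, μ(yimin)). Solving that line equation for ai gives:

```latex
$$
\mu(y_i^{\min}) = \frac{y_i^{\min} - a_i}{b_i - a_i}
\quad\Longrightarrow\quad
a_i = b_i - \frac{b_i - y_i^{\min}}{1 - \mu(y_i^{\min})}
$$
```

With b1 = 2100, y1min = 2000 and μ(y1min) = 0.83 from the illustrative example, this gives a1 = 2100 − 100/0.17 ≈ 1511.76, the value reported there.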
Step 5. Determine the right vertex point (ci, 0) by interpolation.
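By symmetry with Step 4, a sketch of the Step 5 interpolation assumes the right edge is the line through the peak (bi, 1) and (ci, 0), passing through (yimax, μ(yimax)):

```latex
$$
\mu(y_i^{\max}) = \frac{c_i - y_i^{\max}}{c_i - b_i}
\quad\Longrightarrow\quad
c_i = b_i + \frac{y_i^{\max} - b_i}{1 - \mu(y_i^{\max})}
$$
```

For cluster y2 of the illustrative example (b2 = 2600, y2max = 2700, μ(y2max) = 0.83), this gives c2 = 2600 + 100/0.17 ≈ 3188.24, matching the reported value.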
Step 6. Find the membership value of each quantitative value of feature Fj.
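The six steps can be combined into a short Python sketch. Several details are assumptions, not fixed by the text: k-means is plain Lloyd iteration on the 1-D data with evenly spaced initial centres; σs is the population standard deviation, which reproduces the value 145.69 in the illustrative example; Equation 1 is taken as sm = 1 − diffi/(C·σs), clipped at 0; the boundary membership is rounded to two decimals, as in the worked example; and Step 6 uses the standard triangular function. The function names are hypothetical.

```python
import statistics


def triangular(x, a, b, c):
    """Step 6 (assumed form): 0 outside (a, c), rising to 1 at b, then falling."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def kmeans_1d(values, k, iters=100):
    """Step 2: Lloyd's k-means on sorted 1-D data.

    Initial centres are k evenly spaced sample points (an assumption).
    Values are appended in sorted order, so each cluster list is sorted.
    """
    n = len(values)
    centres = [float(values[(i * (n - 1)) // max(k - 1, 1)]) for i in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            clusters[nearest].append(v)
        new = [statistics.fmean(cl) if cl else centres[j]
               for j, cl in enumerate(clusters)]
        if new == centres:
            break
        centres = new
    return clusters, centres


def build_profile(values, k, C=4.0, digits=2):
    """Steps 1-5: return the triangles [(a_i, b_i, c_i), ...] and sigma_s."""
    values = sorted(values)                                   # Step 1
    clusters, centres = kmeans_1d(values, k)                  # Step 2
    diffs = [v2 - v1 for v1, v2 in zip(values, values[1:])]   # Substep 3.1
    sigma = statistics.pstdev(diffs)                          # population SD
    limit = C * sigma

    def similarity(d):                                        # Substep 3.2 (assumed Eq. 1)
        if limit == 0:
            return 1.0 if d == 0 else 0.0
        return 1 - d / limit if d <= limit else 0.0

    triangles = []
    for cl, b in zip(clusters, centres):
        if len(cl) < 2:
            raise ValueError("this sketch needs at least two values per cluster")
        # Substep 3.3: smallest similarity between adjacent values inside the
        # cluster, rounded as in the worked example
        mu = round(min(similarity(y - x) for x, y in zip(cl, cl[1:])), digits)
        if mu >= 1:
            raise ValueError("boundary membership 1 gives a degenerate triangle")
        a = b - (b - cl[0]) / (1 - mu)    # Step 4: left vertex by interpolation
        c = b + (cl[-1] - b) / (1 - mu)   # Step 5: right vertex by interpolation
        triangles.append((a, b, c))
    return triangles, sigma


def membership_table(values, triangles):
    """Step 6: membership of every value in every cluster's triangle."""
    return {v: [triangular(v, *t) for t in triangles] for v in values}


if __name__ == "__main__":
    fees = [2000, 2100, 2200, 2500, 2600, 2700, 3200, 3300]
    triangles, sigma = build_profile(fees, k=3)
    print(f"sigma_s = {sigma:.2f}")                  # sigma_s = 145.69
    for a, b, c in triangles:
        print(f"a={a:.2f} b={b:.2f} c={c:.2f}")
    # a=1511.76 b=2100.00 c=2688.24
    # a=2011.76 b=2600.00 c=3188.24
    # a=2955.88 b=3250.00 c=3544.12
```

The usage block runs on the eight insurance-fee values of the illustrative example. Every adjacent difference inside each cluster is 100, so each boundary membership rounds to 0.83.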
In this section, we demonstrate the proposed method with a training example. The numerical values of the training example from [8] are used to explain the proposed algorithm; the data are reproduced in Table 1.
Construction of membership function
The construction of the membership functions of insurance fee with the proposed algorithm is explained step by step as follows.
Step 1 and Step 2. Sort the values of the feature in ascending order. Let k = 3. After applying the k-means clustering algorithm, we obtain three clusters y1, y2, and y3. Cluster y1 contains 3 values (2000, 2100, 2200), cluster y2 contains 3 values (2500, 2600, 2700), and cluster y3 contains 2 values (3200, 3300). The cluster centers of y1, y2, and y3 are b1 = 2100, b2 = 2600, and b3 = 3250, respectively, as shown in Table 2.
Step 3. In this step, we determine the membership values of the two boundary points of every cluster by applying Substeps 3.1 to 3.3.
To obtain the similarity between adjacent values of insurance fee, the difference between adjacent data is calculated first, e.g., v2 − v1 = 2100 − 2000 = 100. The calculated values of diffi are shown in Table 2.
Let the constant C = 4. The standard deviation σs of the diffi values is 145.69, and the similarity values sm calculated using Equation 1 are shown in Table 2.
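These numbers can be checked by hand. With the sorted insurance fees 2000, 2100, 2200, 2500, 2600, 2700, 3200 and 3300, the seven adjacent differences are 100, 100, 300, 100, 100, 500 and 100. The value 145.69 is their population standard deviation (dividing by 7). Taking Equation 1 as sm = 1 − diffi/(C·σs) with C = 4, the similarities round to the following values:

```latex
$$
\bar{d} = \frac{1300}{7} \approx 185.71, \qquad
\sigma_s = \sqrt{\frac{1}{7}\sum_{i=1}^{7}\left(\mathit{diff}_i - \bar{d}\right)^2}
         = \sqrt{\frac{148571.43}{7}} \approx 145.69
$$
$$
C\,\sigma_s = 4 \times 145.69 = 582.76, \qquad
1 - \frac{100}{582.76} \approx 0.83, \qquad
1 - \frac{300}{582.76} \approx 0.49, \qquad
1 - \frac{500}{582.76} \approx 0.14
$$
```

Every adjacent difference inside y1, y2 and y3 equals 100, so all three clusters have minimum similarity 0.83. The smaller values 0.49 and 0.14 belong to the pairs (2200, 2500) and (2700, 3200), which cross cluster boundaries.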
It is clear from Table 2 that the minimum similarity values of clusters y1, y2, and y3 are 0.83, 0.83, and 0.83, respectively. Therefore, the membership value of the two boundary points yimin and yimax of yi (i = 1, 2, 3) is 0.83.
Step 4. Determine the left vertex point (ai, 0) by interpolation. For cluster y1, b1 = 2100, y1min = 2000, and μ(y1min) = 0.83. For cluster y2, b2 = 2600, y2min = 2500, and μ(y2min) = 0.83, and for cluster y3, b3 = 3250, y3min = 3200, and μ(y3min) = 0.83. The left vertex points are calculated using Equations 2 and 3 as follows:
a1 = 1511.76
a2 = 2011.76
a3 = 2955.88
Step 5. Determine the right vertex point (ci, 0) by interpolation. For cluster y1, b1 = 2100, y1max = 2200, and μ(y1max) = 0.83. For cluster y2, b2 = 2600, y2max = 2700, and μ(y2max) = 0.83, and for cluster y3, b3 = 3250, y3max = 3300, and μ(y3max) = 0.83. The right vertex points are calculated using Equations 4 and 5 as follows:
c1 = 2688.24
c2 = 3188.24
c3 = 3544.12
Step 6. In this step, the membership value of each quantitative value of feature Fj is calculated using Equation 6.
After Steps 1 to 6 are applied, the fuzzy profile of insurance fee is obtained as shown in Fig. 2. Similarly, the fuzzy profiles of Age and Property have been derived, as shown in Fig. 3 and Fig. 4.
Design fuzzy rule
In this step, fuzzy rules are defined in the form of IF–THEN conditional statements. The IF part of a rule is known as the antecedent and the THEN part as the consequent. The fuzzy rules are generated with the help of a domain expert and numerical data [22]. The rules are as follows:
If age is y1 and property is y1 then insurance fee is y1
If age is y1 and property is y2 then insurance fee is y1
If age is y1 and property is y3 then insurance fee is y1
If age is y2 and property is y1 then insurance fee is y2
If age is y2 and property is y2 then insurance fee is y2
If age is y2 and property is y3 then insurance fee is y2
If age is y3 and property is y1 then insurance fee is y3
If age is y3 and property is y2 then insurance fee is y3
If age is y3 and property is y3 then insurance fee is y3
Fuzzy inference system
The fuzzy inference system (FIS) for predicting the insurance fee is shown in Fig. 5. Age and property are the inputs to the FIS, and insurance fee is the output.
The fuzzy inference engine maps input fuzzy sets to an output fuzzy set; the max-min operator is used in this model. In many applications, a crisp value is needed as the output, and defuzzification methods such as centroid, mean of maximum, and bisection map a fuzzy set to a crisp value. The centroid method of defuzzification is used in this model.
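The following is a minimal sketch of max-min inference with centroid defuzzification over the nine rules. It uses the usual Mamdani reading of "max-min": AND as min, consequent clipped by min, rules aggregated by max. The centroid is approximated on a uniform grid. The fuzzy profiles of age and property (Fig. 3 and Fig. 4) are not given numerically, so their membership degrees are passed in directly as hypothetical inputs. The insurance-fee triangles follow the vertex values of the worked example. The function and variable names are hypothetical.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), peak 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


TERMS = ("y1", "y2", "y3")
# Rule base from the text: the consequent always equals the age term,
# whatever the property term is.
RULES = {(age, prop): age for age in TERMS for prop in TERMS}


def infer(age_deg, prop_deg, fee_mfs, lo, hi, steps=2100):
    """Max-min inference followed by centroid defuzzification on a uniform grid."""
    strength = {t: 0.0 for t in fee_mfs}
    for (age_t, prop_t), out_t in RULES.items():
        w = min(age_deg[age_t], prop_deg[prop_t])   # AND as min
        strength[out_t] = max(strength[out_t], w)   # combine rules as max
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    # Clip each output term at its strength (min), then take the pointwise max.
    mu = [max(min(strength[t], triangular(x, *fee_mfs[t])) for t in fee_mfs)
          for x in xs]
    total = sum(mu)
    if total == 0:
        raise ValueError("no rule fired, so the centroid is undefined")
    return sum(x * m for x, m in zip(xs, mu)) / total


if __name__ == "__main__":
    fee_mfs = {"y1": (1511.76, 2100.0, 2688.24),
               "y2": (2011.76, 2600.0, 3188.24),
               "y3": (2955.88, 3250.0, 3544.12)}
    age = {"y1": 0.0, "y2": 1.0, "y3": 0.0}     # hypothetical input degrees
    prop = {"y1": 0.0, "y2": 1.0, "y3": 0.0}
    print(f"{infer(age, prop, fee_mfs, 1500.0, 3600.0):.1f}")   # 2600.0
```

With age fully in y2, only the y2 output triangle is active. Because that triangle is symmetric about its peak, the centroid falls at 2600.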
Testing result and validation
To validate the suitability and applicability of the proposed algorithm, it was applied to the training example of [8] to predict the insurance fees. The insurance fees predicted by the proposed algorithm and by Hong et al. [8] are shown in Table 3. Table 3 shows that the predictions of the proposed algorithm are closer to the actual output data than those of the Hong et al. algorithm [8].
A comparison of the testing results of the proposed algorithm and the Hong et al. algorithm [8] is listed in Table 4. Table 4 shows that the predictive accuracy of the proposed algorithm, expressed by the different measures, is better than that of the Hong et al. algorithm [8]. These satisfactory validation results give confidence in the membership function construction, although further validation on large data sets would provide greater confidence in the suitability and applicability of the proposed algorithm. The size of the data set is not expected to limit the proposed method, because the membership functions are not predefined and the algorithm does not depend on the size of the data set.
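The following sketch covers three of the measures used in Table 4, assuming their common definitions. MMRE is the mean of |actual − predicted| / actual. BMMRE divides each error by the smaller of actual and predicted. R2 is taken as 1 − SSres/SStot. Average error rate is not sketched, because its exact formula is not restated here. The numbers in the usage block are hypothetical and are not the values from Table 3 or Table 4.

```python
def mmre(actual, predicted):
    """Mean magnitude of relative error: mean of |actual - predicted| / actual."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def bmmre(actual, predicted):
    """Balanced MMRE: each error is divided by min(actual, predicted)."""
    return sum(abs(a - p) / min(a, p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    actual = [2000, 2500]        # hypothetical values, not from Table 3
    predicted = [2100, 2400]
    print(round(mmre(actual, predicted), 4))       # 0.045
    print(round(bmmre(actual, predicted), 4))      # 0.0458
    print(round(r_squared(actual, predicted), 4))  # 0.84
```

Lower MMRE and BMMRE and an R2 closer to 1 indicate predictions closer to the actual outputs.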
A sensitivity analysis has been performed to analyze the impact of the input variables on the output. Fig. 6 and Fig. 7 show that changes in age produce increasing variation in insurance fee, and Fig. 8 shows that the input variable property is less significant than the input variable age in predicting the output variable insurance fee. Hong and Lee [8] report a similar result.
Conclusion
In this paper, a new approach is proposed for the automatic construction of fuzzy membership functions from numerical data. The proposed algorithm reduces the time and effort needed for fuzzy profile development and can help researchers and software practitioners develop fuzzy inference systems for various applications. The proposed method is better than the Hong et al. algorithm [8] for the following reasons: (1) there is no need to predefine the fuzzy profiles of the input and output variables; and (2) the predictive accuracy of the proposed algorithm is better than that of the algorithm presented in [8].
