A method for generating membership function from numerical data

Abstract

Construction of membership functions from numerical data is very important in various applications of fuzzy set theory. In the last two decades, many methods of membership function generation have been developed. The majority of these methods are application-domain dependent and complex. In this paper, a simple method for the construction of membership functions from numerical data is proposed. To validate the proposed method, commonly used and recommended evaluation measures are applied: average error rate, mean magnitude of relative error (MMRE), balanced mean magnitude of relative error (BMMRE), and coefficient of determination (R²). The validation results show that the proposed method has higher accuracy than existing methods. A sensitivity analysis has been performed to analyze the impact of the input variables on the output variable.

Keywords

Membership function; fuzzy set; fuzzy inference system; k-means clustering; training data

1 Introduction

Fuzzy set theory provides a way to capture uncertainty, vagueness, and imprecision. Zadeh provided a new way of thinking about uncertainty [27, 28]. Membership functions play a very important role in building a fuzzy inference system. A membership function can be generated either with the help of a domain expert or from real data. However, most research articles dealing with fuzzy logic either use domain expert knowledge or appear without describing how the membership functions were obtained [13, 15]. Developing membership functions from numerical data is one of the fundamental steps in the design of a solution based on fuzzy set theory. There are no standard guidelines or rules for choosing an appropriate membership function construction technique. Another problem that makes membership function construction an important task is the lack of consensus on the definition and interpretation of membership functions.

The personal interpretation of the meaning of a concept varies from person to person, so different fuzzy profiles may be developed to define the same concept. In [6], three different interpretations of a single statement are pointed out. The construction of membership functions is very important because the success of a method depends on the membership functions used. Therefore, it is necessary to explain how the membership functions are derived.

In the literature, many methods of membership function generation have been proposed, based on heuristics, histograms, neural networks, clustering, and genetic algorithms [4, 20]. Dombi [5] pointed out some common features among these different approaches, which are as follows:

all membership functions are continuous.

all membership functions map an interval [a, b] to [0, 1].

membership functions are either monotonically increasing or monotonically decreasing or both increasing and decreasing.

Heuristic methods use predefined shapes of membership functions [11]. The majority of the methods are application-domain dependent and complex. It is impractical to use a different membership function construction technique for each application problem. It is not impossible to come up with a single membership function construction technique that works for most application problems.

In the literature, many research articles address the generation of fuzzy if-then rules from numerical data [1–3, 21–23]. The drawback of most of these models is that the membership functions are predefined to map numerical data into linguistic terms. Hong and Lee [8] proposed a method based on a fuzzy clustering technique and decision tables to derive membership functions from numerical data. They predefined the initial membership functions of the input variables and updated them by a series of merge operations. However, as the number of variables becomes larger, the decision tables grow tremendously and become more complicated. Ping and Chen proposed a new method for fuzzy profile generation based on α-cuts of equivalence relations. As the number of input variables becomes larger and the variables take a huge number of values, the complexity of their algorithm increases [23]. Mitra et al. [19] developed a method for automatic linguistic discretization of continuous attributes using quantiles. Recently, Makrehchi et al. [17] proposed a method for generating optimal fuzzy membership functions through a genetic algorithm.

Graphically, fuzzy profiles are represented in the form of membership functions. Many types of membership functions exist in the literature. However, the triangular membership function simplifies the computation [14, 24–26]. Therefore, only triangular membership functions are considered in the proposed model. A triangular membership function is shown in Fig. 1. Based on the literature survey, in this paper we propose a general learning method for the construction of membership functions.

The rest of this paper is organized as follows. The proposed methodology is explained in Section 2. In Section 3, an illustrative example demonstrates the proposed method. In Section 4, the testing results and validation are discussed. Finally, the conclusion is presented in Section 5.

2 Proposed methodology

In this section, we propose a technique for the construction of a fuzzy profile from numerical data. Let S denote a set of N training patterns (F₁, F₂, ..., F_j, ..., F_n), and let F_j denote a feature. The feature F_j has n values: v_1j, v_2j, ..., v_nj. F_{j min} and F_{j max} denote the minimum and maximum values of feature F_j. When the input feature is quantitative, the following steps are performed to generate the membership functions.

Step 1. Sort the values of the feature in ascending order.

Step 2. Apply the k-means clustering algorithm to cluster the quantitative values of feature F_j into k clusters (y₁, y₂, ..., y_i, ..., y_k). y_{i min} and y_{i max} denote the minimum and maximum values of the i-th cluster. Also find the cluster centers (b₁, b₂, ..., b_i, ..., b_k) of the k clusters (y₁, y₂, ..., y_i, ..., y_k).
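
Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `kmeans_1d` and the evenly spaced initial centers are assumptions, since the method does not state how k-means is initialized, and a different initialization may yield different clusters on other data.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster 1-D values into k groups with a plain Lloyd-style k-means."""
    data = sorted(values)                      # Step 1: ascending order
    lo, hi = data[0], data[-1]
    # Assumption: initial centers are evenly spaced over [min, max].
    if k == 1:
        centers = [(lo + hi) / 2]
    else:
        centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    clusters = [data]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for v in data:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters, centers


# Insurance fee values from the illustrative example in Section 3.
fees = [2000, 2100, 2200, 2500, 2600, 2700, 3200, 3300]
clusters, centers = kmeans_1d(fees, k=3)
print(clusters)  # [[2000, 2100, 2200], [2500, 2600, 2700], [3200, 3300]]
print(centers)   # [2100.0, 2600.0, 3250.0]
```

Because the values are sorted and one-dimensional, each cluster is a contiguous run, so y_{i min} and y_{i max} are simply the first and last elements of each list.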

Step 3. Determine the membership value of the two boundary points of each cluster.

Substep 3.1. Find the difference between adjacent data. For each pair v_i and v_{i+1} (i = 1, 2, 3, ..., n - 1), the difference is diff_i = v_{i+1} - v_i.

Substep 3.2. Find the similarity value between adjacent data of the quantitative values of feature F_j. The similarity between adjacent data is obtained according to the following formula [7].

$s_{m} = \begin{cases} 1 - \frac{diff_{i}}{C \cdot \sigma_{s}} & \text{for } diff_{i} \leq C \cdot \sigma_{s} \\ 0 & \text{otherwise} \end{cases}$ (1)

where

s_m: the similarity between adjacent data.

C: control parameter deciding the shape of the membership functions.

σ_s: standard deviation of diff_i.

Substep 3.3. The minimum similarity value within the i-th cluster is chosen as the membership value of the two boundary points y_{i min} and y_{i max} of the i-th cluster.
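
Substeps 3.1 to 3.3 can be sketched as follows. The function names are hypothetical, and two choices are assumptions rather than statements of the method: σ_s is taken as the population standard deviation (this reproduces the value 145.69 used in Section 3), and a cluster with a single value, which has no internal pair, is given `None`.

```python
import statistics


def adjacent_similarity(sorted_values, C):
    """Substeps 3.1 and 3.2: differences and similarities of adjacent data."""
    diffs = [b - a for a, b in zip(sorted_values, sorted_values[1:])]
    sigma = statistics.pstdev(diffs)           # assumption: population std
    limit = C * sigma
    # Eq. (1); the limit > 0 guard avoids dividing by zero when all
    # differences are equal.
    sims = [1 - d / limit if limit > 0 and d <= limit else 0.0 for d in diffs]
    return diffs, sigma, sims


def boundary_membership(sims, cluster_sizes):
    """Substep 3.3: minimum similarity among the pairs inside each cluster."""
    result, start = [], 0
    for size in cluster_sizes:
        inner = sims[start:start + size - 1]   # pairs lying inside the cluster
        result.append(min(inner) if inner else None)
        start += size
    return result


fees = [2000, 2100, 2200, 2500, 2600, 2700, 3200, 3300]
diffs, sigma, sims = adjacent_similarity(fees, C=4)
mu_boundary = boundary_membership(sims, cluster_sizes=[3, 3, 2])
print(round(sigma, 2))                     # 145.69
print([round(s, 2) for s in sims])         # [0.83, 0.83, 0.49, 0.83, 0.83, 0.14, 0.83]
print([round(m, 2) for m in mu_boundary])  # [0.83, 0.83, 0.83]
```

The low similarities (0.49 and 0.14) fall on the pairs that straddle two clusters, so they are excluded from the per-cluster minimum.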

Step 4. Determine the left vertex point (a_i, 0) by interpolation.

$a_{i}^{'} = b_{i} - \frac{b_{i} - y_{i\,\min}}{1 - \mu(y_{i\,\min})}$ (2)

$a_{i} = \begin{cases} 0 & \text{for } a_{i}^{'} \leq 0 \\ b_{i-1} & \text{for } 0 < a_{i}^{'} \leq b_{i-1} \\ a_{i}^{'} & \text{for } a_{i}^{'} > b_{i-1} \end{cases}$ (3)

Step 5. Determine the right vertex point (c_i, 0) by interpolation.

$c_{i}^{'} = b_{i} + \frac{y_{i\,\max} - b_{i}}{1 - \mu(y_{i\,\max})}$ (4)

$c_{i} = \begin{cases} c_{i}^{'} & \text{for } c_{i}^{'} \leq b_{i+1} \\ b_{i+1} & \text{for } c_{i}^{'} > b_{i+1} \end{cases}$ (5)
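
A minimal sketch of Steps 4 and 5, following Equations 2 to 5 as stated. The function names are hypothetical. Two points are assumptions: the boundary membership value is taken to be strictly below 1, and for the first cluster (no b_{i-1}) and the last cluster (no b_{i+1}) the interpolated value is kept unclamped, since the equations do not cover these cases.

```python
def left_vertex(b, y_min, mu_min, b_prev=None):
    """Left foot a_i of the triangle, Eqs. (2) and (3)."""
    a_raw = b - (b - y_min) / (1 - mu_min)     # Eq. (2), requires mu_min < 1
    if a_raw <= 0:
        return 0.0
    # Assumption: with no previous cluster there is nothing to clamp to.
    if b_prev is not None and a_raw <= b_prev:
        return b_prev
    return a_raw


def right_vertex(b, y_max, mu_max, b_next=None):
    """Right foot c_i of the triangle, Eqs. (4) and (5)."""
    c_raw = b + (y_max - b) / (1 - mu_max)     # Eq. (4), requires mu_max < 1
    # Assumption: with no next cluster the interpolated value is kept.
    if b_next is not None and c_raw > b_next:
        return b_next
    return c_raw


# Insurance fee clusters from Section 3, boundary membership 0.83.
a1 = left_vertex(2100, 2000, 0.83)
c1 = right_vertex(2100, 2200, 0.83, b_next=2600)
c2 = right_vertex(2600, 2700, 0.83, b_next=3250)
c3 = right_vertex(3250, 3300, 0.83)
print(round(a1, 2), c1, round(c2, 2), round(c3, 2))
```

The clamp in Equation 5 is what pulls c₁ back to the neighbouring center 2600 in the worked example, because the interpolated value overshoots it.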

Step 6. Find the membership value of each quantitative value of feature F_j.

$\mu(v) = \begin{cases} \frac{v - a_{i}}{b_{i} - a_{i}} & \text{for } a_{i} \leq v < b_{i} \\ \frac{c_{i} - v}{c_{i} - b_{i}} & \text{for } b_{i} \leq v < c_{i} \\ 0 & \text{otherwise} \end{cases}$ (6)
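
Step 6 can be sketched as below, assuming the usual triangular form that rises from (a_i, 0) to (b_i, 1) and falls to (c_i, 0), which is the shape implied by the interpolation in Equations 2 and 4. The function name `triangular` is hypothetical, and a < b < c is assumed.

```python
def triangular(v, a, b, c):
    """Membership of v in a triangle with feet a, c and peak b (a < b < c)."""
    if a < v < b:
        return (v - a) / (b - a)               # rising side
    if b <= v < c:
        return (c - v) / (c - b)               # falling side, equals 1 at v = b
    return 0.0


# Cluster y1 of the insurance fee example: a1 = 1511.76, b1 = 2100, c1 = 2600.
print(round(triangular(2000, 1511.76, 2100, 2600), 2))  # 0.83
print(triangular(2100, 1511.76, 2100, 2600))            # 1.0
print(triangular(2200, 1511.76, 2100, 2600))            # 0.8
```

The left boundary point 2000 recovers the membership value 0.83 assigned in Step 3, while the right boundary point 2200 gets 0.8 because c₁ was clamped to 2600.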

3 An illustrative example

In this section, we demonstrate the proposed method with a training example. The numerical training data of [8] are used to explain the proposed algorithm. The data are reproduced in Table 1.

3.1 Construction of membership function

The construction of the membership functions of insurance fee with the proposed algorithm is explained step by step as follows:

Step 1 and Step 2. Sort the values of the feature in ascending order. Let k = 3. After applying the k-means clustering algorithm, we get three clusters y₁, y₂, and y₃. Cluster y₁ contains 3 values (2000, 2100, 2200), cluster y₂ contains 3 values (2500, 2600, 2700), and cluster y₃ contains 2 values (3200, 3300). The cluster centers of y₁, y₂, and y₃ are b₁ = 2100, b₂ = 2600, and b₃ = 3250, respectively, as shown in Table 2.

Step 3. In this step, we determine the membership value of the two boundary points of each cluster by applying Substeps 3.1 to 3.3.

To obtain the similarity value between adjacent values of insurance fee, the difference between adjacent data is calculated first, e.g., diff₁ = v₂ - v₁ = 2100 - 2000 = 100. The calculated values of diff_i are shown in Table 2.

Let the constant C = 4. The standard deviation of diff_i is calculated as 145.69, and the similarity values (s_m) are calculated using Equation 1 as follows:

$s_{1} = 1 - \frac{100}{145.69 \times 4} = 0.83$

$s_{2} = 1 - \frac{100}{145.69 \times 4} = 0.83$

. . . . . . . . . . . .

$s_{6} = 1 - \frac{500}{145.69 \times 4} = 0.14$

$s_{7} = 1 - \frac{100}{145.69 \times 4} = 0.83$

It is clear from Table 2 that the minimum similarity values of clusters y₁, y₂, and y₃ are 0.83, 0.83, and 0.83, respectively. Therefore, the membership value of the two boundary points y_{i min} and y_{i max} of y_i (i = 1, 2, 3) is 0.83.

Step 4. Determine the left vertex point (a_i, 0) by interpolation. For cluster y₁, b₁ = 2100, y_{1 min} = 2000, and μ(y_{1 min}) = 0.83. For cluster y₂, b₂ = 2600, y_{2 min} = 2500, and μ(y_{2 min}) = 0.83. For cluster y₃, b₃ = 3250, y_{3 min} = 3200, and μ(y_{3 min}) = 0.83. The left vertex points are calculated using Equations 2 and 3 as follows:

$a_{1}^{'} = 2100 - \frac{2100 - 2000}{1 - 0.83} = 1511.76$

a₁ = 1511.76

$a_{2}^{'} = 2600 - \frac{2600 - 2500}{1 - 0.83} = 2011.76$

a₂ = 2011.76

$a_{3}^{'} = 3250 - \frac{3250 - 3200}{1 - 0.83} = 2955.88$

a₃ = 2955.88

Step 5. Determine the right vertex point (c_i, 0) by interpolation. For cluster y₁, b₁ = 2100, y_{1 max} = 2200, and μ(y_{1 max}) = 0.83. For cluster y₂, b₂ = 2600, y_{2 max} = 2700, and μ(y_{2 max}) = 0.83. For cluster y₃, b₃ = 3250, y_{3 max} = 3300, and μ(y_{3 max}) = 0.83. The right vertex points are calculated using Equations 4 and 5 as follows:

$c_{1}^{'} = 2100 + \frac{2200 - 2100}{1 - 0.83} = 2688.24 > 2600$

c₁ = 2600

$c_{2}^{'} = 2600 + \frac{2700 - 2600}{1 - 0.83} = 3188.24$

c₂ = 3188.24

$c_{3}^{'} = 3250 + \frac{3300 - 3250}{1 - 0.83} = 3544.12$

c₃ = 3544.12

Step 6. In this step, the membership value of each quantitative value of feature F_j is calculated using Equation 6.

After Steps 1 to 6, the fuzzy profile of insurance fee is derived, as shown in Fig. 2. Similarly, the fuzzy profiles of Age and Property have been derived, as shown in Fig. 3 and Fig. 4.

3.2 Design fuzzy rule

In this step, fuzzy rules are defined in the form of IF–THEN conditional statements. The IF part of a rule is known as the antecedent and the THEN part as the consequent. The fuzzy rules are generated with the help of a domain expert and numerical data [22]. The rules are as follows:

If age is y1 and property is y1 Then Insurance fees is y1

If age is y1 and property is y2 Then Insurance fees is y1

If age is y1 and property is y3 Then Insurance fees is y1

If age is y2 and property is y1 Then Insurance fees is y2

If age is y2 and property is y2 Then Insurance fees is y2

If age is y2 and property is y3 Then Insurance fees is y2

If age is y3 and property is y1 Then Insurance fees is y3

If age is y3 and property is y2 Then Insurance fees is y3

If age is y3 and property is y3 Then Insurance fees is y3
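
The nine rules above can be encoded as a lookup table and fired as sketched below. This is an illustrative sketch only: the names `RULES` and `fire_rules` are hypothetical, the membership degrees in the usage lines are invented for illustration (they are not taken from the training data), and AND is taken as min with max aggregation per consequent, in line with the Max-Min operator named in Section 3.3.

```python
# (age label, property label) -> insurance fee label, one entry per rule.
RULES = {
    ("y1", "y1"): "y1", ("y1", "y2"): "y1", ("y1", "y3"): "y1",
    ("y2", "y1"): "y2", ("y2", "y2"): "y2", ("y2", "y3"): "y2",
    ("y3", "y1"): "y3", ("y3", "y2"): "y3", ("y3", "y3"): "y3",
}


def fire_rules(age_mu, property_mu, rules=RULES):
    """Return the firing strength of each insurance fee label."""
    strengths = {}
    for (age_label, prop_label), fee_label in rules.items():
        # AND of the two antecedents: minimum of their membership degrees.
        w = min(age_mu[age_label], property_mu[prop_label])
        # Rules sharing a consequent are combined with the maximum.
        strengths[fee_label] = max(strengths.get(fee_label, 0.0), w)
    return strengths


# Hypothetical membership degrees of one input pair (illustration only).
age_mu = {"y1": 0.2, "y2": 0.8, "y3": 0.0}
property_mu = {"y1": 0.5, "y2": 0.5, "y3": 0.0}
strengths = fire_rules(age_mu, property_mu)
print(strengths)  # {'y1': 0.2, 'y2': 0.5, 'y3': 0.0}
```

In this rule base the consequent label always equals the age label, so property only limits how strongly a rule fires.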

3.3 Fuzzy inference system

The fuzzy inference system (FIS) for predicting the insurance fee is shown in Fig. 5. Age and property are the inputs to the FIS, and insurance fee is the output.

The fuzzy inference engine maps a fuzzy set into a fuzzy set. A fuzzy Max-Min operator is used in this model. In many applications, a crisp value needs to be obtained as the output. Defuzzification methods such as centroid, max-min, and bisection map a fuzzy set into a crisp value. The centroid method of defuzzification is used in this model.
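
Centroid defuzzification of the clipped output sets can be sketched numerically as follows. The function names, the grid resolution `steps`, and the firing strengths in the usage lines are assumptions for illustration; the triangle vertices are the insurance fee values from the worked example in Section 3.1, rounded to two decimals.

```python
def triangular(v, a, b, c):
    """Membership of v in a triangle with feet a, c and peak b (a < b < c)."""
    if a < v < b:
        return (v - a) / (b - a)
    if b <= v < c:
        return (c - v) / (c - b)
    return 0.0


def centroid_defuzzify(strengths, triangles, steps=10000):
    """Clip each output triangle at its firing strength, take the pointwise
    maximum, and return the centroid of the aggregated set on a uniform grid."""
    lo = min(a for a, _, _ in triangles.values())
    hi = max(c for _, _, c in triangles.values())
    num = den = 0.0
    for k in range(steps + 1):
        x = lo + (hi - lo) * k / steps
        mu = max(min(strengths.get(label, 0.0), triangular(x, *abc))
                 for label, abc in triangles.items())
        num += x * mu
        den += mu
    return num / den if den > 0 else None      # None when no rule fires


# Output triangles (a_i, b_i, c_i) of insurance fee from Section 3.1.
FEE = {
    "y1": (1511.76, 2100, 2600),
    "y2": (2011.76, 2600, 3188.24),
    "y3": (2955.88, 3250, 3544.12),
}
# Hypothetical firing strengths (illustration only).
crisp_fee = centroid_defuzzify({"y1": 0.2, "y2": 0.5, "y3": 0.0}, FEE)
print(round(crisp_fee, 1))
```

When only one symmetric set fires fully, the centroid coincides with its peak; with several clipped sets the result is pulled toward the sets with the larger firing strength.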

4 Testing result and validation

To validate the suitability and applicability of the proposed algorithm, it is applied to the training example of [8] to predict the insurance fees. The insurance fees predicted by the proposed algorithm and those predicted by Hong and Lee [8] are shown in Table 3. From Table 3, it is clear that the testing results of the proposed algorithm are closer to the actual output data than those of Hong and Lee's algorithm [8].

A comparison of the testing results of the proposed algorithm and Hong and Lee's algorithm [8] is listed in Table 4. From Table 4, it is clear that the predictive accuracy of the proposed algorithm, expressed by the different measures, is better than that of Hong and Lee's algorithm [8]. These satisfactory validation results give confidence in the membership function construction, although further validation on larger data sets would provide even greater confidence in the suitability and applicability of the proposed algorithm. The proposed method is not expected to be adversely affected by a large data set, because the membership functions are not predefined and the algorithm does not depend on the size of the data set.

A sensitivity analysis has been performed to analyze the impact of the input variables on the output. It can be observed from Fig. 6 and Fig. 7 that age causes increasing variation in insurance fee. It is seen from Fig. 8 that the input variable property is less significant than the input variable age in predicting the output variable insurance fee. A similar result is also reported by Hong and Lee [8].
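
The accuracy measures can be computed as sketched below. The definitions are assumptions: the paper names the measures without spelling them out, so the commonly used forms are taken here (relative error |actual − predicted| / actual for MMRE, |actual − predicted| / min(actual, predicted) for BMMRE, and 1 − SS_res / SS_tot for R²). The average error rate is omitted because its definition is not given. The numbers in the usage lines are invented for illustration and are not the values of Table 3.

```python
def mmre(actual, predicted):
    """Mean magnitude of relative error (assumed standard definition)."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def bmmre(actual, predicted):
    """Balanced MMRE: error relative to the smaller of actual and predicted."""
    return sum(abs(a - p) / min(a, p)
               for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted fees (illustration only).
actual = [2000, 2500, 3200]
predicted = [2100, 2500, 3000]
print(round(mmre(actual, predicted), 4))       # 0.0375
print(round(bmmre(actual, predicted), 4))      # 0.0389
print(round(r_squared(actual, predicted), 4))  # 0.9312
```

Lower MMRE and BMMRE and a higher R² indicate better predictive accuracy; BMMRE penalizes over- and under-estimates more evenly than MMRE.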

5 Conclusion

In this paper, a new approach is proposed for the automatic construction of fuzzy membership functions from numerical data. The proposed algorithm reduces the time and effort needed for fuzzy profile development. It can help researchers and software practitioners develop fuzzy inference systems for various applications. The proposed method is better than Hong and Lee's algorithm [8] for the following reasons:

No need to predefine the fuzzy profiles of the input and output variables.

The predictive accuracy of the proposed algorithm is better than that of the one presented in [8].

References

1. Burkhardt, Bonissone (1992). Automated fuzzy knowledge base generation and tuning. In: Fuzzy Systems, 1992, IEEE International Conference on, pp. 179–188. IEEE.

2. Cano, Nava (2002). A fuzzy method for automatic generation of membership function using fuzzy relations from training examples. In: Fuzzy Information Processing Society, 2002, Proceedings NAFIPS 2002 Annual Meeting of the North American, pp. 158–162. IEEE.

3. Chen, Chang (2005). A new method to construct membership functions and generate weighted fuzzy rules from training instances. Cybernetics and Systems: An International Journal 36(4), 397–414.

4. Cintra, Camargo, Monard (2008). A study on techniques for the automatic generation of membership functions for pattern recognition. Congresso da Academia Trinacional de Ciências (C3N) 1, 1–10.

5. Dombi (1990). Membership function as an evaluation. Fuzzy Sets and Systems 35(1), 1–21.

6. Dubois, Prade (1994). Fuzzy sets-a convenient fiction for modeling vagueness and possibility. Fuzzy Systems, IEEE Transactions on 2(1), 16–21.

7. Hattori, Torii (1993). Effective algorithms for the nearest neighbor method in the clustering problem. Pattern Recognition 26(5), 741–746.

8. Hong, Lee (1996). Induction of fuzzy rules and membership functions from training examples. Fuzzy Sets and Systems 84(1), 33–47.

9. Ishibuchi, Fujioka, Tanaka (1993). Neural networks that learn from fuzzy if then rules. Fuzzy Systems, IEEE Transactions on 1(2), 85–97.

10. Ishibuchi, Nakashima (2001). Effect of rule weights in fuzzy rule-based classification systems. Fuzzy Systems, IEEE Transactions on 9(4), 506–515.

11. Ishibuchi, Nozaki, Tanaka (1993). Efficient fuzzy partition of pattern space for classification problems. Fuzzy Sets and Systems 59(3), 295–304.

12. Ishibuchi, Yamamoto (2005). Rule weight specification in fuzzy rule-based classification systems. Fuzzy Systems, IEEE Transactions on 13(4), 428–435.

13. Jin (2003). Advanced fuzzy systems design and applications, vol. 112. Springer.

14. Kaya, Alhajj (2003). A clustering algorithm with genetically optimized membership functions for fuzzy association rules mining. In: Fuzzy Systems, 2003, FUZZ'03, The 12th IEEE International Conference on, vol. 2, pp. 881–886. IEEE.

15. Kuncheva (2000). Fuzzy classifier design. Springer.

16. Lee (1990). Fuzzy logic in control systems: Fuzzy logic controller II. Systems, Man and Cybernetics, IEEE Transactions on 20(2), 419–435.

17. Makrehchi, Kamel (2011). An information theoretic approach to generating fuzzy hypercubes for if-then classifiers. Journal of Intelligent and Fuzzy Systems 22(1), 33–52.

18. Medasani, Kim, Krishnapuram (1998). An overview of membership function generation techniques for pattern recognition. International Journal of Approximate Reasoning 19(3), 391–417.

19. Mitra, Konwar, Pal (2002). Fuzzy decision tree, linguistic rules and fuzzy knowledge-based network: Generation and evaluation. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on 32(4), 328–339.

20. Ross (2009). Fuzzy logic with engineering applications. John Wiley & Sons.

21. Takagi, Sugeno (1985). Fuzzy identification of systems and its applications to modeling and control. Systems, Man and Cybernetics, IEEE Transactions on SMC-15(1), 116–132.

22. Wang, Mendel (1992). Generating fuzzy rules by learning from examples. Systems, Man and Cybernetics, IEEE Transactions on 22(6), 1414–1427.

23. Chen (1999). A new method for constructing membership functions and fuzzy rules from training examples. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 29(1), 25–40.

24. Yadav, Chaturvedi, Misra (2012). Early software defects prediction using fuzzy logic. International Journal of Performability Engineering 8(4), 399.

25. Yadav (2014). Early software reliability analysis using reliability relevant software metrics. International Journal of System Assurance Engineering and Management, 1–12.

26. Yadav (2015). A fuzzy logic based approach for phase-wise software defects prediction using software metrics. Information and Software Technology 63, 44–57.

27. Zadeh (1965). Fuzzy sets. Information and Control 8(3), 338–353.

28. Zadeh (1975). The concept of linguistic variable and its application to approximate reasoning-1. Information Sciences 8, 199–245.