A new modeling algorithm based on ANFIS and GMDH

Abstract

System modeling is one of the most important tasks of dynamic analysis and prediction systems, and imprecise model may lead to high bias. The presence of noise in sample data can make it more difficult to obtain precise system models. A new modeling algorithm called ANFIS-GMDH is presented in this paper, which builds upon the traditional ANFIS structure and utilizes the self-organizing mechanisms of GMDH. The aim of ANFIS-GMDH is to improve upon the traditional ANFIS method and prevent overfitting of noisy data. The well-studied Box-Jenkins gas furnace data is utilized to validate the algorithm, with results showing that the proposed algorithm performs better than traditional ANFIS, GMDH and subtractive clustering for both noisy and noiseless data, without any significant increase in execution time.

Keywords

ANFIS-GMDH noisy data system modeling gas furnace data

1 Introduction

System models are of great importance in almost all fields, with a well-defined system model allowing researchers to better understand and predict a system’s behavior. With the existence of time-varying characteristic, nonlinearity and noise corruption, however, it becomes harder to obtain a universal structure for complex systems [1].

As a result of this, fuzzy modeling has emerged as an important branch of systems modeling, and has attracted much attention in recent decades [1 –4]. Fuzzy systems can approximate any continuous function [3, 5] by decomposing the input space into subspaces and interpreting each subspace with output MF (Mamdani fuzzy model), constant (zero-order TS fuzzy model) or linear (one-order TS model) expressions [6, 7]. Of these widely used fuzzy models, the TS model has attracted more attention than the Mamdani model due to its simpler and more interpretable consequent parts [4, 8]. There are however no standard procedures for transforming expert knowledge into TS fuzzy rules, and the identification of the premise and consequent parts of TS fuzzy models are quite technical [2]. More attention has thus been paid to structure identification. Different kinds of algorithms, such as subtractive clustering [9], self-organizing mechanisms [10, 11], and neuro-networks [7, 8] have been adopted for TS models building.

The Adaptive Neuro Fuzzy Inference System (ANFIS) is one such algorithm, which solves the problems of fuzzy rule generation and structure identification by combining the TS fuzzy models with adaptive network. Since first proposed by Jang in 1993 [8], ANFIS has been widely used in many fields such as time series prediction [12], system modeling and control [13], fault diagnosis [14] and nonlinear fitting [15]. However, it has to be noted that two restrictions may narrow its applicability:

The performance of the ANFIS system depends heavily on the training and test data pairs [16], and non-comprehensive training datasets may result in the system being improperly trained, causing large identification errors or even failure [17].

After having chosen the number and shape of MFs, the structure of ANFIS remains fixed, and its complexity cannot be adjusted to trade-off between system simplicity and generalization ability, which will cause model overfit in noisy environments [18].

Since the existence of noise in real world data, ANFIS may overfit the noisy data and some local errors may occur due to the limitations mentioned above. Some previous researches about noisy data regression can also be found in [19, 20]. To better apply ANFIS in noisy environment, a new algorithm combining both ANFIS and the group method of data handling (GMDH) is presented. The new algorithm, called ANFIS-GMDH, is based on the traditional ANFIS structure and takes advantage of the self-organizing mechanism that is at the core of GMDH. We aim to combine the advantages of ANFIS at fuzzy rules generation with those of GMDH at finding optimal structure complexity according to training and testing data sets [21], and to generate a new modeling method which robustly handles noisy data.

The paper proceeds as follows: Section 2 briefly introduces existing algorithms, including ANFIS, subtractive clustering and GMDH; Section 3 presents the structure and procedures of the newly developed algorithm ANFIS-GMDH; Section 4 contains numerical simulations for the Box-Jenkins gas furnace data, and a conclusion is drawn at last.

2 Preliminaries and basic concepts

2.1 Adaptive Neuro Fuzzy Inference System (ANFIS)

ANFIS is firstly proposed for system modeling by Jang in 1993 [8]. It integrates the back-propagation (BP) and the least squares estimation (LSE), realizing identification of structure and parameters of TS fuzzy system by adaptive network, and during the process of system learning, the fuzzy rules are derived automatically.

A typical structure of a first order TS model can be descried as:

IF x₁ is A_i1 and x₂ is A_i2 ... and x_n is A_in,

Then y = b_i0 + b_i1x₁ + b_i2x₂ + … + b_inx_n.

Where x₁, x₂, …, x_n are system inputs, y stands for system output, A_i1, A_i2, …, A_in are fuzzy sets, b_i0, b_i1, …, b_in are output parameters of fuzzy system.

The graphical representation of the equivalent ANFIS structure for TS fuzzy system is shown in Fig. 1; for simplicity, it is assumed that this ANFIS example system has two input variables and one output variable.

As shown in Fig. 1, the structure of ANFIS can be divided into five layers. Each node in the same layer contains the same function, and note that the square indicates the adaptive node (parameters varying during training), and the circle stands for fixed nodes. The function of each layer can be illustrated as follows (O_ij stands for ith node in jth layer).

Layer 1: System Fuzzification

All of the input variables will be interpreted as linguistic labels in this layer, and the node number of individual input indicates the number of membership functions (MF). The shape of MFs can be chosen according to the system understudy. Typically speaking, the parameters in this layer are called premise parameters.

Layer 2: Firing Strength Calculation

Each node in Layer 2 multiplies input signals (i.e. the MFs) and outputs the product. Every output of individual node stands for the firing strength of relevant fuzzy rule.

Layer 3: Firing Strength Normalization

Each node of this layer outputs the ratio between the ith firing strength and the sum of all firing strengths, and is called normalized firing strength.

Layer 4: System Defuzzification

Each node in Layer 4 outputs the crisp value of the corresponding fuzzy rule, and the output is the product between output of layer 3 and input variables linear combination. The parameters of this layer are known as consequent parameters.

Layer 5: Overall Output

This layer contains only one node and outputs the overall crisp output value by summing all of the individual node output of Layer 4.

The training procedure of ANFIS contains the following steps:

Forward pass: Input data are used to calculate each node output until the crisp output value in layer 5 is obtained, then LSE is used to identify the consequent parameters. The premise parameters remain fixed in this step.

Backward pass: The error rate between ANFIS output and data output are calculated, and propagate backward toward the input end. Backpropagation gradient descent is utilized to update the premise parameters. The consequent parameters remain fixed during this process.

Test the error between ANFIS output and data output, stopping if the error achieves the predefined value, otherwise continue iteration.

As stated in the introduction part, the spirit of fuzzy system is “divide and conquer” [22]. ANFIS divides the input space equally into subspace, i.e. with the partition method called grid partition [8], and adjust the area for individual subspace by input-output data pairs. However, with the increment of input variables number and MFs, grid partition cannot escape the “curse of dimensionality” problem, which will increase computation time significantly.

2.2 Subtractive clustering

Subtractive clustering is proposed to overcome the drawbacks of traditional grid partition. It divides the data set into fuzzy clusters and then projects different clusters into input space [9 , 24], in this manner, the selected cluster centers can be treated as characteristic system behavior, and one cluster stands for one “if-then” rule in TS model.

For a given data set, the influence of certain point to other points, namely potential, is defined as: $P_{j} = \sum_{k = 1}^{N} e^{- (4 / r_{a}^{2}) {∥ d_{j} - d_{k} ∥}^{2}}$ (1) where r_a is a positive constant called cluster radius, which defines the radius of influence. Higher potential value of certain data point means more data points are surrounded. Assuming that P₁ has the highest potential, and corresponding data point $d_{1}^{*}$ is defined as the first cluster center.

Next, potentials need to be recalculated for the rest of data. To eliminate the effects of the first cluster center, new potential is calculated according to: $P_{i} \Leftarrow P_{i} - P_{1}^{*} e^{- (4 / r_{b}^{2}) {∥ d_{i} - d_{1}^{*} ∥}^{2}}$ (2) where r_b = δa is also a positive constant, and δ is called the quash factor. By reducing the influence of the first center, the data points near $d_{1}^{*}$ have smaller probability to be chosen as the second cluster center. This process will continue if the kth potential satisfies the following condition: $P_{k}^{*} > \bar{ɛ} P_{1}^{*}$ (3) where $\bar{ɛ}$ is called the acceptance ratio. The process will end if: $P_{k}^{*} < \underline{ε} P_{1}^{*}$ (4) where $\underline{ε}$ is called the rejection ratio. If the potential value of the newly calculated center falls into the area between these two ratios, new criterion is used: $\frac{v_{min}}{r_{a}} + \frac{P_{k}^{*}}{P_{1}^{*}} \geq 1$ (5) where v_min is the minimal distance between $d_{k}^{*}$ and previously calculated cluster centers. If $P_{k}^{*}$ doesn’t satisfied this criteria, the point with the second largest potential is selected and re-tested. For the parameters of δ, $\bar{ɛ}$ , $\underline{ε}$ , suggestion values are given here: δ = 1.25, $\bar{ɛ} = 0.5$ and $\underline{ε} = 0.15$ . Similar choice can be found in [9 , 25]. It should be noted that the choice of cluster radius isthe most important task of subtractive clustering, and the value will directly decide the number of cluster, i.e. the number of fuzzy rules for the generated fuzzy system.

Since the radius offers different “solution” for the studied system, and in order to keep the balance between system complexity and generalization ability, we used 8 radius values ranging from 0.25 to 0.6 (with an interval of 0.05) for the numerical examples and the one with the minimum MSE value is selected.

2.3 GMDH

Ivakhenko proposed GMDH to depict complex nonlinear systems via polynomial in 1970s [26]. By automatically choosing the combination of system partial expression according to a certain external criterion, the complexity of the generated polynomial will increase gradually within each iteration until the minimum external criterion value is achieved. In order to get satisfaction result for a variety of requirements, different kinds of external criteria may be applied to sift the candidate partial expression [27]. The principle and procedure of GMDH are as follows:

Given a MISO system with sampled input-output data pairs, GMDH can depict the system with a polynomial as expressed in: $y = a + \sum_{i = 1}^{m} b_{i} x_{i} + \sum_{i = 1}^{m} \sum_{j = 1}^{m} c_{ij} x_{i} x_{j} + \dots$ (6) where y is the output of the system; x_i, x_j, ⋯ , x_m are the input variables; and a, b_i, c_ij, ⋯ are the system parameters to be estimated. The modeling procedure is as follows:

Step 1. Divide the sampled data pairs into two sets: training set and checking set.

Step 2. Build partial functions according to different input variable combinations. Assume that the system has m input variables, taking n inputs from the input variables set and form a quadratic equation expressed with these n variables. For example, if we choose the variables x₁ and x₂, and the polynomial function can be expressed as: $z = A + {Bx}_{1} + {Cx}_{2} + {Dx}_{1}^{2} + {Ex}_{2}^{2} + {Fx}_{1} x_{2}$ (7) where A, B, C, D, E, F are connecting weights. Putting the result of this polynomial in the first column of estimation matrix, expressed as $z_{12}^{1}$ and the rest of the matrix can be calculated in the same manner. For m system variables, a total of $C_{2}^{m}$ values will be calculated accordingly.

Step 3. Substitute the original system variables with newly developed partial function which has lower external criteria value. In this paper, the Akaike information criterion (AIC) is applied as the external criterion [28]: $AIC = 2 k + n \ln (\frac{RSS}{n})$ (8) where k is the number of parameters to be estimated, n is the total data number of checking data set, RSS is the error variance between the estimated and actual outputs. It should be noted that some other kinds of external criterions can also be chosen at this step, and it’s hard to assert that which criterion is the best since the problem varied. Some external criterions can be found in [27].

Step 4. Choose the polynomial corresponding to the minimum external criterion value. With the increment of system complexity, the external criterion value of the newly developed polynomial will decrease at first, and begin to rise when the output of polynomial starts overfitting the checking data set. The selection and combination process will cease automatically when the minimum external criterion value is obtained, which means the balance point between overfitting and underfitting has been reached.

3 The algorithm of ANFIS-GMDH

The proposed ANFIS-GMDH algorithm is introduced in this section. Basically speaking, ANFIS-GMDH is a combination of ANFIS and GMDH, and the overall output layer of ANFIS is replaced by GMDH to further improve the accuracy of the modeling process. Figures 2 and 3 provide structural diagram and a flow chart respectively for the algorithm. ANFIS-GMDH contains seven steps, with steps 2 to 4 corresponding to the ANFIS, while steps 5 to 7 correspond to the self-organizing mechanism of GMDH. The process can be defined as follows:

Step 1. Data Division. At the first step, the input data will be divided into three parts: training data set A, testing data set B and validation data set C.

Step 2. Variable Fuzzification. This step corresponds to Layer 1 of the traditional ANFIS structure. Input variables are transformed into fuzzy sets by grid partition, and operators need to decide the number and the shape of MFs. It should be noted that these parameters need to be chosen carefully, as they will directly determine the execution time.

Step 3. Fuzzy implication. The structure and fuzzy rules of ANFIS are determined in this step. For the sample system shown in Fig. 2, nine fuzzy rules are generated in this step: $\{\begin{array}{c} \begin{array}{c} R^{1} : I f x_{1} i s A_{1} a n d x_{2} i s B_{1}, t h e n y^{1} = a_{1}^{0} + a_{1}^{1} x_{1} + a_{1}^{2} x_{2} \\ R^{2} : I f x_{1} i s A_{1} a n d x_{2} i s B_{2}, t h e n y^{2} = a_{2}^{0} + a_{2}^{1} x_{1} + a_{2}^{2} x_{2} \end{array} \\ ⋮ \\ R^{9} : I f x_{1} i s A_{3} a n d x_{2} i s B_{3}, t h e n y^{9} = a_{3}^{0} + a_{3}^{1} x_{1} + a_{3}^{2} x_{2} \end{array}$ (9)

Step 4. ANFIS parameter Identification. This step corresponding to the hybrid learning algorithm of ANFIS. The premise parameters and the consequent parameters generated in step 2 and step 3 are calculated by LSE and backpropagation gradient descent from the training dataset A and the testing dataset B.

Step 5. GMDH Parameter Identification. The output of Step 4, i.e. the linear descriptions of each subspace, is imported into the GMDH part and the system model is further explored through using of the self-organizing mechanism. All of the linear descriptions are combined in pairs, and candidate partial equations are formed according to Equation (7). The parameters of Equation (7) are calculated by LSE from training dataset A and testing dataset B. For the sample system in Fig. 2, a total of $C_{9}^{2} = 36$ equations are produced.

Step 6. Model Selection. The best M candidate equations with the smallest external criterion values are selected.

Step 7. External Criterion Judgment.

The smallest external criterion values of current generation and previous generations are compared. If the current criterion value is greater than the previous one, a system structure corresponding to the optimal complexity has been obtained. Otherwise, steps 5 to 7 need to be repeated until the minimum value is generated.

4 Simulation

In this section, a sample analysis of a practical dataset, Box-Jenkins gas furnace data [29], is conducted, to validate the performance of the proposed ANFIS-GMDH algorithm. Three other algorithms are presented in comparison: ANFIS (with grid partitioning), subtractive cluster algorithm, and GMDH. To the authors’ best knowledge, there is no standard method to decide the number and shape of MFs for ANFIS (with grid partitioning). In this case, after trail and error, the number of MFs for ANFIS is set to be 2, and the shape is chosen as bell shape. Further, for purpose of evaluating the robustness of the four algorithms, different levels of Gaussian white noise data are generated, with the variance of the being set to δ = 0, 0 . 1, 0 . 2, 0 . 3. It should be noted that in order to eliminate the randomness of the noise, the sets of noise are generated only once, with these same noise data being used with all of the four algorithms. For comparison, the mean square error (MSE) is chosen as the evaluation criterion.

4.1 Box-Jenkins gas furnace data

The Box-Jenkins gas furnace data is a benchmark data series for evaluating system modeling algorithms. This data set contains 296 pairs of measurements from a SISO system with sample interval of 9 seconds [29]. The input u_k is chosen as the gas flow rate of the furnace and the output y_k is the CO₂ concentration of outlet gas.

For the proposed ANFIS-GMDH algorithm, it’s assumed that six candidate inputs are available: u_k-3, u_k-2, u_k-1, y_k-3, y_k-2, y_k-1. In this case, a total of 293 data pairs can be further used for system modeling and validation. In the first step, these 293 data pairs are divided into two parts for input variable selection: the first 146 pairs are utilized as training set, and the remaining 147 pairs are defined as the testing set. Based on the algorithm proposed by Jang [30], of all of input candidate combination, four most relevant inputs are selected as input: [u_k-2, u_k-1, y_k-2, y_k-1]. The details about the input variable selection are shown in the Appendix part.

After input selection, the 293 input data pairs are divided into three parts for validation of different algorithms: the first 73 data pairs are defined as training set A, the following 73 data pairs are testing set B, and remaining 147 data pairs are system validation set C. The four levels of noise are added to the output variable y_k in A, B and C.

Figure 4 presents the results of these four different algorithms when there is no noise, from which it can be seen that all of the four methods can approximate the correct output quite well.

The results from the comparison of different noise level between our algorithm and other algorithms are presented in Table 1. When the noise was added, the MSE of the ANFIS-GMDH algorithm increased from 0.1243 to 0.4046, an increase of 0.2803, while the MSE of the other three algorithms increased by at least 0.3927. With the increment of noise level, the MSE value of the ANFIS, subtractive clustering and GMDH algorithms grew faster than that of ANFIS-GMDH. It can thus be seen that our method demonstrates superior performance in all of the four noise levels tested for the Box-Jenkins gas furnace data series. It should moreover be noted that for all of the four noise levels tested, the GMDH algorithm performed better than ANFIS and subtractive clustering, which can be interpreted as demonstrating that, by adjusting complexity according to the AIC criterion, the GMDH algorithm develops the property of noise-immunity.

The reasons why the proposed ANFIS-GMDH method performs better than the other three algorithms may can be interpreted from two aspects:

From the standpoint of traditional ANFIS, the structure of ANFIS is almost fixed and operators can only adjust the number and shape of MFs. Once these parameters are specified, ANFIS can only adjust the premise and consequent parameters with hybrid algorithm, which means for ANFIS cannot adapt to noisy data by adjusting system structure, which means that over-fitting of noisy data is hard to be prevented from the system structure viewpoint. However, for the proposed ANFIS-GMDH method, by replacing the overall output layer with a self-organizing mechanism, a more flexible system structure is built, and the model complexity can thus be adjusted automatically according to both training and validation data set. The simulation result reveal that this is an effective way to remedy the overfitting.

For the GMDH part, the input converts from traditional crisp values to a linear expression for each subspace of the input space. This renders the input of GMDH more interpretable than its original crisp input value, with previous researchers having concluded that fuzzy inference system is universal approximation of any nonlinear function.

The proposed ANFIS-GMDH method can thus integrate the advantages of both ANFIS and GMDH, showing the superior robustness to traditional ANFIS and GMDH algorithms.

The ANFIS-GMDH method does however have its own limitations:

The presented ANFIS-GMDH algorithm is based on ANFIS (grid partition), so if the system input and the number of MFs increases significantly, the structure’s complexity may increase and the time required for system training will increase rapidly. This method may hence not be suitable for use with numerous input variables; and the subtractive clustering may be a reasonable substitute for the first part of the algorithm. For the subtractive clustering method there is however no standard method of choosing the cluster radius, meaning its use must be approached on a case by case basis relying on the judgment of the operators.

Since the proposed algorithm is based on ANFIS, the limitations of ANFIS mentioned in the introduction part will also restrict its application. To build accurate system model, the training signal must be chosen carefully to meet the requirement of completeness. If the training signal is non-comprehensive in both time and frequency domain, high bias may appear.

4.2 Execution time analysis

For these four algorithms, the execution time comparison is presented in Table 2 1 , and the execution time is the mean value of 20 runs for each algorithm.

It can be observed that the execution time of the proposed ANFIS-GMDH method largely depends on the execution time of ANFIS and the complexity of the self-organizing mechanism. For the gas furnace data series, ANFIS generates a network with 2⁴ = 16 fuzzy rules, with an execution time of 0.3057 s, while the GMDH algorithm takes only 0.0083 s for four input variables with two hidden layers. For the ANFIS-GMDH algorithm, the number of input variables to the self-organizing mechanism is 16, which is equivalent to the number of fuzzy rules in the ANFIS stage, and the number of hidden layers is 3, which means the minimum external criterion value appears after 3 iterations. The difference in execution time between ANFIS (grid partition) and subtractive clustering can be interpreted as a result of the difference between the partition method used. ANFIS divides the input space equally into 16 subspaces and adjusts the area of each subspace by gradient descent, while subtractive clustering projects fuzzy clusters onto the input space, with complexity of the input space primarily depending on the radius of influence. It is hence hard to determine which method is superior in terms of execution time, thus the decision should be made on a case-by-case basis.

Since these four algorithms are trained off-line, and the ANFIS-GMDH algorithm performs the best among these four methods in both noisy noiseless environments, we can conclude that this algorithm is superior to other three methods for the gas furnace data and worthy further study.

5 Conclusion

In this paper, a new modeling approach is presented. Towards addressing the deficiencies of traditional ANFIS in terms of adjusting system complexity to handle noisy data, the self-organizing mechanism of GMDH is utilized to improve robustness. The combination of these two algorithms, termed ANFIS-GMDH, can automatically adjust system complexity to suit the training data set. The proposed algorithm can prevent overfitting of noisy data, and provides more reasonable input into the GMDH stage. Results of the simulation using Box-Jenkins gas furnace data demonstrate the superior performance of the proposed algorithm for both noiseless and noisy data, and the increment of execution time is acceptable.

Further research may concentrate on the following aspects:

From the viewpoint of system input, the ANFIS-GMDH algorithm cannot escape the “curse of dimensionality” when the number of input variables and member functions is relatively large. It may hence be fruitful to utilize clustering technology, which has been widely used for handling large numbers of input variables.

The proposed algorithm is trained off-line. In order to better apply this method real-time, the online version for ANFIS-GMDH needs further research. What’s more, the execution time for the online version worth studying, since it’s of great importance for real-time application.

6 Appendix:

One of the most important tasks of using ANFIS is selecting the most relevant input variables combination from candidate input variables [30, 31].

The basic idea of input variables selection is that after one epoch training trail, the model with the smallest RMSE has greater potential to generate satisfactory system than other model, i.e. the smallest RMSE input variables combination after one epoch will be chosen as the input variables for system modeling[30].

The input combination of gas surface data is chosen as: [u_k-2, u_k-1, y_k-2, y_k-1]. For the other 14 candidate input combinations, the training and testing error are shown in Fig. 5 and Table 3, and the detailed input combination in listed in Table 4.

Footnotes

The four solutions are obtained with i5 CPU @2.6GHz and 4GB RAM.

Acknowledgments

Acknowledgments to China Scholarships Council (CSC) for financial support of Mr. yechen Qin for the duration of the study in Texas A&M University.

References

10.

11.

12.

13.

14.

15.

16.

17.

18.

19.

20.

21.

22.

23.

24.

25.

26.

27.

28.

29.

30.

31.