Abstract
Fuzzy functions (FFs) models were introduced as an alternate representation of the fuzzy rule based approaches. This paper presents novel Interactively Recurrent Fuzzy Functions (IRFFs) for nonlinear chaotic time series prediction. Chaotic sequences are strongly dependent on their initial conditions as well as past states, therefore feed forward FFs models cannot perform properly. To overcome this weakness, recurrent structure of FFs is proposed by placing local and global feedbacks in the output parts of multidimensional subspaces. IRFFs’ optimized parameters should minimize the output error and maximize clusters density. To achieve these contradictory goals, Non-dominated Sorting Genetic Algorithm II (NSGAII) is applied for simultaneously optimizing the objectives. Also, feedback loop parameters are tuned by utilizing gradient descent algorithm with line search strategy based on the strong Wolfe condition. The experimental setup includes comparative studies on prediction of benchmark chaotic sequences and real lung sound data. Further simulations demonstrate that our proposed approach effectively learns complex temporal sequences and outperforms fuzzy rule based approaches and feed forward FFs.
Keywords
Introduction
Over the last three decades of research on the dynamical systems, it has shown that chaos has been widely observed in both natural and engineering systems. Several methods of computational intelligence and soft computing have been widely used for chaotic signals prediction and identification. Neuro-fuzzy systems is one of the interesting soft computing approaches utilized in various application such as modelling, classification, and prediction [1–8]. Turksen introduced another type of fuzzy structure, called Fuzzy Functions (FFs) [9, 10] which has a simpler structure compared with neuro-fuzzy rule based systems. The multidimensional structure characteristic of the input space of FFs leads to elimination of combination operators such as, t-norm and t-conorm in antecedent fuzzy subsets. It is one of the main differences of the multidimensional structures with the rule based structures [11]. Moreover the obtained membership values as well as input variables are utilized to estimate FFs’ functions which are equivalent to consequent parts of the rule based systems. These functions can be estimated by different regression methods like the Least Square Estimation (LSE) [10], Multi Adaptive Regression Spline (MARS) [12], and Support Vector Machine (SVM) [13].
The attractor of chaotic systems contains interactive structure in the state space. Therefore the corresponding time series has a considerable dependency on their past states and are very sensitive to the initial conditions. So their prediction are much more difficult than those of static/algebraic systems [14]. Recurrent structures enable the model to address temporal sequences powerfully by responding to memory information based on the system past states. In contrast to feed forward fuzzy rule based systems, the combination of recurrent structures and fuzzy systems has been investigated in several fuzzy models. Two categories of recurrent structures were introduced as local and global feedbacks. By adopting global feedbacks, some of fuzzy structures such as, Takagi–Sugeno–Kang (TSK) type recurrent fuzzy network (TRFN) [15], the enhanced memory TRFN [16], the high-order recurrent neuron-fuzzy system (HO-RNFS) [17], and the recurrent self-organizing neural fuzzy inference network (RSONFIN) [18], have been proposed. Some other fuzzy neural networks using only local feedbacks have also been proposed [2, 18–20]. In [18] a recurrent self-evolving fuzzy neural network with local feedbacks (RSEFNN-LF) is proposed that its recurrent structure comes from locally feeding the firing strength of a fuzzy rule back to itself. Dynamic fuzzy neural network (DFNN) [19], self-recurrent Wavelet Neural Network (SRWNN) [2], and Indirect Stable Adaptive Fuzzy Wavelet Neural Controller (ISAFWNC) [20] use a local feedback structure similar to RSEFNN-LF, with the exception that the local feedbacks are located in their conclusion parts. As a local and global method, the interactively recurrent self-evolving fuzzy neural network (IRSFNN) [21] has been introduced which its recurrent structure is formed as an external loops and internal feedback.
In the same way, the main goal of this paper is extending the FFs approach to a recurrent structure, which can utilized nonlinear and chaotic time series prediction. In contrast to conventional FFs structure that used weighting average for aggregation, in this study recurrent structure is applied to improve modeling capacity of relationships among states of the time series which resulted in our propose scheme, Interactively Recurrent Fuzzy Functions. Membership values of input vectors are calculated by using Unsupervised Optimal Fuzzy Clustering (UOFC) and then multidimensional fuzzy subspaces will be achieved. IRFFs’ recurrent links are placed in the output parts of the multidimensional subspaces and have led to the creation of FFs interactive structure. Beside the scalar input variables, their membership values are then used by Multivariate Adaptive Regression Spline (MARS) to determine each cluster’s FF.
One important task in designing IRFFs is optimization of free parameters such as number of fuzzy subsets and fuzzy exponents as well as feedback weights. In some studies, the number of rules is fixed [2, 22] or generated automatically based on a firing strength criterion [18, 21]. In our proposed system, objectives are minimizing IRFFs’ error and increasing the density of its fuzzy subspaces. Simultaneous optimization of both objectives by using conventional approaches is practically infeasible. To overcome this issue, we applied Non-dominated Sorting Genetic Algorithm II (NSGAII) [23] to optimize two objectives at the same time. Also, interactive feedback weights are trained with gradient descent algorithm with line search based on strong wolf condition that automatically tunes learning rate. Some of the well-known chaotic time series and a set of recorded lung sound signals in laboratory are considered for the prediction task and discussed to verify IRFF’s performance. Figure 1 shows the proposed scheme for IRFFs system:
The rest of this paper is organized as follows; the preliminaries of state space reconstruction is briefly reviewed in Section 2. Then in Section 3, UOFC method for multidimensional fuzzy subspaces will be presented. MARS regression and NSGA II algorithms for optimization are described separately in Sections 4and 5. Details of proposed IRFFs’ structure and parameter learning scheme are presented in Sections 6. In Section 7, we will compare our system with conventional methods. Finally, Section 8 is reserved for conclusions and future works.
Stare space reconstruction
State space may be used to study the geometric and dynamic properties of a given dynamic system. But in the most of the problems, it is not possible to measure all of dynamical/state variables, and a time series of observation is only available. The dynamics of the original system are unknown in given scalar data and so one of the most basic steps of analyzing time series is state space reconstruction.
According to Taken’s embedding theorem [24], reconstructed state space of time series x (n) is given by:
The key issue in state space reconstruction is properly determination of ‘m’ and ‘τ’, because different delay and embedding dimension cause different state space for one time series. For calculating embedding dimension, False Nearest Neighbor (FNN) [25] was utilized. The main idea of FNN is to examine how the number of neighbors of a point along a signal trajectory change with increasing embedding dimension. By examining how the number of neighbors change as a function of dimension, an appropriate embedding can be determined. If embedding dimension is chosen too low, many of the neighbors will be false, but in an appropriate embedding dimension or higher, the neighbors are real.
Also, τ must be chosen in such a way that makes the components of the states to be independent. The best determination method of τ is using mutual information function which shows nonlinear and general dependence between two variables. Criteria for determine τ is the first local minimum value of mutual information function for different values of delays [26].
By assuming that the number of clusters is known, the objective function of basic FCM algorithm is as following [27]:
After minimization, membership degree of features and cluster prototype are as follows [27]:
And the distance function is as follows [29]:
Suppose y indicates the dependent variable and x = (x1, …, x
n
) is the vector of explanatory variables in the set of observations where x
i
= (xi1, xi2, …, x
im
) ∈ R
m
. least square regression is one of the conventional methods utilized for modeling of the relation between explanatory and output variables. In this method, b
j
(x) ; j = 1, 2, …, P are basis functions and the coefficients are β
j
which calculated by minimizing the Sum of Squared Error (SSE) defined as below [30]:
If specific form considered for basis function b j (x) it will be non-adaptive predictive model and if they become determined base observe data, it will be called adaptive. MARS is one of the fast and compatible adaptive regression approaches. This nonparametric regression method which is a generalization of step by step linear regression can effectively reveal hidden patterns and nonlinear relation in data sets [30]. In MARS the basis functions are chosen as hinge functions (h1, h2) and formed by a pair of functions known as spline, given as [31]:
Since for each of the explanatory variables x j there are n independent observations, by considering each observation as a knot, we have 2n functions for each x j . By considering set B includes all functions h1j, h2j, j = 1, …, d and product of two or more of them, each member of set B can be utilized as basis function of Equation 8 [31]. MARS is performed in forward and backward steps as follows:
Forward phase: This step starts with intercepting b0 or average of the response value, and then adds iteratively basis function in couple form to model. Coefficients β in Equation 8, are estimates in order to have the most decrease in SSE. To add a new basis function, MARS must search all of the below items: Statements in the model. All of variables (can select one for making basis function). All values of variable (for specify knot in new h function).
Adding new statement ends once changes in SSE become negligible or given maximum number of statements is achieved. At the end of this phase, a model with a lot of statements is provided that has poor performance in generalization and thus it needs a backward elimination phase.
Backward phase starts with the last model of forward phase and in each step removes a basis function from the model that has the lowest contribution to increase SSE. By eliminating each of basis function, a new model is obtained that can be the candidate for the final model. Performance of new model will be assessed with “Generalized Cross Validation” and this process continues until all of the basis functions are removed. MARS selects an optimum model based on minimizing GCV. GCV for kth step of removing phase, calculated asbellow [31]:
To solve many real world problems, it is necessary to optimize several objectives, simultaneously. Thus, Multi-objective (MO) optimization has been recently popular and was applied in different applications, like [32]. Here, MO optimization can be stated as finding the vector of solutions which optimizes the following vector function:
There are several methods for implementation of MO such as NSGA [34], MOGA [35], and NPGA [36]. NSGAII is an evolutionary and population based metaheuristic algorithm with predefined generation numbers (gen) and number of populations (pop) in each generation. The algorithm by utilizing a specific selection operator, creates a mating pool combines the parent and offspring population, selects the best solutions. Finally NSGAII finds the best solution in each generation and creates the Pareto optimal solution.
The proposed IRFFs comes as an extended version of FFs and is illustrated in Fig. 3, the links between clusters create the interactively recurrent dynamical structure. By utilizing the interaction feedback, the proposed IRFFs captures the information from other subspaces. Our IRFFs system follows the following steps: State space reconstruction. Calculating membership values by use of UOFC clustering in state space. Reproducing augmented input by appending membership values to primary inputs. Estimate appropriate function for each cluster utilizing MARS method. Calculation aggregation weight in the form of interaction between rules and past states. Weighted averaging for output calculation.
IRFFs’ structure
Each rows of Sis one state in state space. So nth input data is s (n) = (x (n) , x (n + τ) , …, x (m - 1) τ)) and the corresponding output is y (n) = x (n + mτ).
Here and λ are the samples number, the membership value of tth data in cluster k, the weight of aggregation, and the coefficient of interactive and recurrent relation, respectively. The aggregation weights depend not only on current membership values of each cluster, but also on the previous membership values of other clusters and itself.
In this subsection the structure and parameters learnings of IRFFs are presented. All free design parameters are number of clusters (C), degree of fuzziness (m), penalty per knot (GCV), and recurrent layer coefficients (λ). Figure 4 presents IRFFs’ flow chart learning scheme which includes structure and parameterlearning.
Structure learning
In order to have compact clusters and better IRFFs’ performance, the following two objectives are considered:
1. Minimizing error value of prediction.
2. Maximizing partition density value for fuzzy clustering (Equation 7).
For having an optimum tradeoff between both objectives, the structure learning phase uses NSGAII algorithm for optimizing the IRFF structural parameters (m, C, GCV). At the end of training step, best value of m*, C*, and GCV* are specified and used for test data.
In all generations of NSGAII, recurrent layer coefficients (λ) are obtained by applying gradient descent algorithm with line search based on strong wolf condition. Its main advantage is automatically determination of learning rate (η).
Interactively recurrent weights learning based on error of modeling is done as an iterative algorithm. Based on the error function introduced in Equation 17, we trained coefficients by using the followingequations:
Where
Equal case as for recurrent weight and inequality for interactive weight, here we have critical assumption that interactive weight is bidirectional.
This section presents three examples including chaotic sequence prediction, time series prediction in noisy environment, and multichannel lung soundmodeling. In all of the following examples the validity of our proposed scheme is confirmed by simulation. Here, population size and the number of generations are set to 100 and 500, respectively. Also Root Mean Squared Error (RMSE) and percent root mean squared difference (PRD) are considered as the quality criteria:
Example 1: Lorenz equations time series
Here, the time series prediction problem applied for the data generated from the Lorenz equations given by [37]:
where x, y, and z are state variables and a, b, and c are adjusting parameters. For a = 10, b = 8/3, and c = 28 the Lorenz equations will have chaotic behavior. After generating data sequences by settingtime steps of 0.05 and initialization values of (1, 1, 1) for state, we remove the first 201 transition points and keep the next 1000 points for every state variables. The first 800 points are utilized for training dataset and last 200 points for checking set. Based on Taken’s embedding theorem [24], the delay and embedding dimension are set as 1 and 3, respectively. Figure 5 shows the prediction result for series xand the excelent ability of IRFFs for predicting Lorenz time series.
Table 1 lists comparative results of Lorenz equations time series prediction based on the proposed IRFFs and other approaches. Here, we compared three FRB and two FFs system including Recurrent Self-Evolving Fuzzy Neural Network (RSEFNN) [18], Fuzzy Recurrent Wavelet Neural Network (FRWNN) [2], and Squared Root Cubature Kalman Filter-γ Echo State Network (SCKF-γESN [38], Fuzzy Functions with LSE (FFs-LSE) and Fuzzy Functions with MARS (FFs-MARS) [10, 12]. The results indicate that by addition of recurrent layer and interactive structure in IRFFs, better RMSE and PRD than those of other fuzzy structures are achieved. Although the IRFFs performance and recurrent structures such as RSEFNN and FRWNN are similar, the number of clusters in IRFFs is fewer than the number of fuzzy rules of them.
Example 2: noisy Mackey-Glass time series
The Mackey–Glass time series s (t) is extracted from the following differential equation [39]:
Now, we want to compare performances of different prediction methods in the presence of additive noise which has uniform distribution. By setting SNR = 6 dB, the result of Fig. 6 and Table 2 are achieved. Figure 6 shows that the system is able to reasonably follow the sequence of Mackey Glass time-series in the presence of noise.
In Table 2, the performance of the proposed IRFFs is compared with the following fuzzy forecasters: 2u function type2 fuzzy systems (2u T2 FLS) [41], Quasi-T2 Fuzzy Logic System Q-T2 FLS [42], and other systems those are used in Example 1. Multi objective learning of IRFFs generates 5 multi-dimensional fuzzy subspaces while FF-MARS have 7 ones. Therefore IRFFs has simpler structure than that of FFs approaches which decreases the output error. Moreover, Table 2 shows that because of using local and global feedbacks, our proposed method outperforms the 2u T2 FLS [41] and Q-T2 FLS [42] approaches which used uncertainty in their type2 membership functions.
Example 3: (real lung sound signal modeling)
Since the lung sound signal is chaotic and has very complex dynamic [43], its estimation and modeling are used in different areas like classification based on model parameter, denoising, and compression [44]. Here, we use the signals recorded in the Department of Pneumology in Shariati Hospital by Bioinstrument and Biological signal processing laboratory researchers of Amirkabir University of Technology’s [45]. Data was recorded using a 6-channel respiratory sound acquisition device in a non-sound proof room. Each record contains 3 inhales and 3 exhales with 1 second pause after each breath.
In state space reconstruction, the parameters value of m = 6, τ = 8 are obtained. For estimation and prediction purpose, one inhale of a lung sound was picked up from a healthy subject. Due to the chaotic nature and existence of high relationship among the states of the lung sound signals, the shapes of the signals are very complex.
Figure 7 shows original lung sound signal along with one predicted by our proposed method in all 6 different channels. This figure confirms that the temporal dynamics of lung sound signals in each channel is effectively modeled by interactively structure. In channel 3, although IRFFs doesn’t track the signal in very short time duration around 1600, its performance in the rest time intervals and in all of the other channels aredesirable. Also, Fig. 8 shows error of proposed method for channel 6 data.
Table 3 lists the results of different FFs methods and other interactive structures on lung sound data modelling. The innovative learning algorithm generates optimal values compared to the other methods and leads to lower training and test errors as well as less complex system.
In this paper a recurrent structure for chaotic time-series prediction is added to fuzzy functions system. By placing local and global feedbacks in the output parts of FFs’ multidimensional subspaces, an interactive structure is achieved and IRFFs systems is proposed. IRFFs uses recurrent structure to memorize previous states of the system. Self-feedbacks in IRFFsprovide capturing the critical local information of time series sequences. Furthermore, global feedbacks feeding the output part of each multidimensional subspace to others, enable IRFFs to effectively address temporal sequences responding to memory information from prior system states. Therefore, generalization of IRFFs in comparison with feed forwards FFs is improved.
Multi-dimensional fuzzy subspaces are created by use of UOFC. This method can consider different types of cluster shape, density and number of data points in each cluster, which is necessary in time series clustering for identifying all kinds of system behaviors.
Also, MARS method has been applied for linear and nonlinear regression of each multidimensional subspace. In fact by MARS can learn different kind of dynamical behavior.
The structure and parameter learning algorithm enables IRFFs to automatically identify its desired structure. It eliminates the sensitivity of learning algorithm to initial selection of parameters and provides the admissible structure for given data-sets. Simultaneous optimization of parameters with NSGAII provides minimal structures and maximum prediction capability. Therefore, IRFFs has less complex architecture than that of conventional FFs. Also, in each generation of NSGAII, it is needed to optimize the recurrent coefficients. Gradient descent algorithm with line search based on Wolfe condition is utilized for this purpose. Because of automatically determination of learning rate, possibility of divergence and increasing the computational complexity resulted from inappropriate selection of learning rate are eliminated. Also convergence of algorithm become faster.
To confirm the performance of IRFFs, two bench-mark and one recorded lung sound time-series were utilized for simulation. This is the first time that FFs structures was implemented for biological signal processing. In all examples, IRFFs was compared with latest popular FRB and FFs methods. As a future work, we build a framework for approximating FFs in consequent of each multi-dimensional subspaces, based on fuzzy method and in such a way that can use parameter of fuzzy function in pattern recognition purposes.
