Interactively recurrent fuzzy functions with multi objective learning and its application to chaotic time series prediction

Abstract

Fuzzy functions (FFs) models were introduced as an alternate representation of the fuzzy rule based approaches. This paper presents novel Interactively Recurrent Fuzzy Functions (IRFFs) for nonlinear chaotic time series prediction. Chaotic sequences are strongly dependent on their initial conditions as well as past states, therefore feed forward FFs models cannot perform properly. To overcome this weakness, recurrent structure of FFs is proposed by placing local and global feedbacks in the output parts of multidimensional subspaces. IRFFs’ optimized parameters should minimize the output error and maximize clusters density. To achieve these contradictory goals, Non-dominated Sorting Genetic Algorithm II (NSGAII) is applied for simultaneously optimizing the objectives. Also, feedback loop parameters are tuned by utilizing gradient descent algorithm with line search strategy based on the strong Wolfe condition. The experimental setup includes comparative studies on prediction of benchmark chaotic sequences and real lung sound data. Further simulations demonstrate that our proposed approach effectively learns complex temporal sequences and outperforms fuzzy rule based approaches and feed forward FFs.

Keywords

Recurrent fuzzy functions chaotic time series prediction non-dominated sorting genetic algorithm unsupervised optimal fuzzy clustering multivariate adaptive regression spline

1 Introduction

Over the last three decades of research on the dynamical systems, it has shown that chaos has been widely observed in both natural and engineering systems. Several methods of computational intelligence and soft computing have been widely used for chaotic signals prediction and identification. Neuro-fuzzy systems is one of the interesting soft computing approaches utilized in various application such as modelling, classification, and prediction [1 –8]. Turksen introduced another type of fuzzy structure, called Fuzzy Functions (FFs) [9, 10] which has a simpler structure compared with neuro-fuzzy rule based systems. The multidimensional structure characteristic of the input space of FFs leads to elimination of combination operators such as, t-norm and t-conorm in antecedent fuzzy subsets. It is one of the main differences of the multidimensional structures with the rule based structures [11]. Moreover the obtained membership values as well as input variables are utilized to estimate FFs’ functions which are equivalent to consequent parts of the rule based systems. These functions can be estimated by different regression methods like the Least Square Estimation (LSE) [10], Multi Adaptive Regression Spline (MARS) [12], and Support Vector Machine (SVM) [13].

The attractor of chaotic systems contains interactive structure in the state space. Therefore the corresponding time series has a considerable dependency on their past states and are very sensitive to the initial conditions. So their prediction are much more difficult than those of static/algebraic systems [14]. Recurrent structures enable the model to address temporal sequences powerfully by responding to memory information based on the system past states. In contrast to feed forward fuzzy rule based systems, the combination of recurrent structures and fuzzy systems has been investigated in several fuzzy models. Two categories of recurrent structures were introduced as local and global feedbacks. By adopting global feedbacks, some of fuzzy structures such as, Takagi–Sugeno–Kang (TSK) type recurrent fuzzy network (TRFN) [15], the enhanced memory TRFN [16], the high-order recurrent neuron-fuzzy system (HO-RNFS) [17], and the recurrent self-organizing neural fuzzy inference network (RSONFIN) [18], have been proposed. Some other fuzzy neural networks using only local feedbacks have also been proposed [2 , 18–20]. In [18] a recurrent self-evolving fuzzy neural network with local feedbacks (RSEFNN-LF) is proposed that its recurrent structure comes from locally feeding the firing strength of a fuzzy rule back to itself. Dynamic fuzzy neural network (DFNN) [19], self-recurrent Wavelet Neural Network (SRWNN) [2], and Indirect Stable Adaptive Fuzzy Wavelet Neural Controller (ISAFWNC) [20] use a local feedback structure similar to RSEFNN-LF, with the exception that the local feedbacks are located in their conclusion parts. As a local and global method, the interactively recurrent self-evolving fuzzy neural network (IRSFNN) [21] has been introduced which its recurrent structure is formed as an external loops and internal feedback.

In the same way, the main goal of this paper is extending the FFs approach to a recurrent structure, which can utilized nonlinear and chaotic time series prediction. In contrast to conventional FFs structure that used weighting average for aggregation, in this study recurrent structure is applied to improve modeling capacity of relationships among states of the time series which resulted in our propose scheme, Interactively Recurrent Fuzzy Functions. Membership values of input vectors are calculated by using Unsupervised Optimal Fuzzy Clustering (UOFC) and then multidimensional fuzzy subspaces will be achieved. IRFFs’ recurrent links are placed in the output parts of the multidimensional subspaces and have led to the creation of FFs interactive structure. Beside the scalar input variables, their membership values are then used by Multivariate Adaptive Regression Spline (MARS) to determine each cluster’s FF.

One important task in designing IRFFs is optimization of free parameters such as number of fuzzy subsets and fuzzy exponents as well as feedback weights. In some studies, the number of rules is fixed [2 , 22] or generated automatically based on a firing strength criterion [18, 21]. In our proposed system, objectives are minimizing IRFFs’ error and increasing the density of its fuzzy subspaces. Simultaneous optimization of both objectives by using conventional approaches is practically infeasible. To overcome this issue, we applied Non-dominated Sorting Genetic Algorithm II (NSGAII) [23] to optimize two objectives at the same time. Also, interactive feedback weights are trained with gradient descent algorithm with line search based on strong wolf condition that automatically tunes learning rate. Some of the well-known chaotic time series and a set of recorded lung sound signals in laboratory are considered for the prediction task and discussed to verify IRFF’s performance. Figure 1 shows the proposed scheme for IRFFs system:

The rest of this paper is organized as follows; the preliminaries of state space reconstruction is briefly reviewed in Section 2. Then in Section 3, UOFC method for multidimensional fuzzy subspaces will be presented. MARS regression and NSGA II algorithms for optimization are described separately in Sections 4and 5. Details of proposed IRFFs’ structure and parameter learning scheme are presented in Sections 6. In Section 7, we will compare our system with conventional methods. Finally, Section 8 is reserved for conclusions and future works.

2 Stare space reconstruction

State space may be used to study the geometric and dynamic properties of a given dynamic system. But in the most of the problems, it is not possible to measure all of dynamical/state variables, and a time series of observation is only available. The dynamics of the original system are unknown in given scalar data and so one of the most basic steps of analyzing time series is state space reconstruction.

According to Taken’s embedding theorem [24], reconstructed state space of time series x (n) is given by:

$\begin{matrix} s (n) & = & [x (n), x (n + τ), \dots, x (n + (m - 1) τ)] \\ n = 1, 2, \dots, M \end{matrix}$ (1)where s (n), m and τ are nth state, embedding dimension, and time delay of space reconstruction, respectively. Also, M = n - (m - 1) τ and N is the length of time series.

The key issue in state space reconstruction is properly determination of ‘m’ and ‘τ’, because different delay and embedding dimension cause different state space for one time series. For calculating embedding dimension, False Nearest Neighbor (FNN) [25] was utilized. The main idea of FNN is to examine how the number of neighbors of a point along a signal trajectory change with increasing embedding dimension. By examining how the number of neighbors change as a function of dimension, an appropriate embedding can be determined. If embedding dimension is chosen too low, many of the neighbors will be false, but in an appropriate embedding dimension or higher, the neighbors are real.

Also, τ must be chosen in such a way that makes the components of the states to be independent. The best determination method of τ is using mutual information function which shows nonlinear and general dependence between two variables. Criteria for determine τ is the first local minimum value of mutual information function for different values of delays [26].

3 Unsupervised Optimal Fuzzy Clustering (UOFC)

By assuming that the number of clusters is known, the objective function of basic FCM algorithm is as following [27]: $J_{q} (V, U) = \sum_{i = 1}^{M} \sum_{j = 1}^{C} u_{ij}^{q} d (x_{i}, v_{j})$ (2)where q > 1 is fuzziness and x_i, v_j, M, and C are the ith input data, the center of the jth cluster, the number of data points and clusters respectively. Also, U is an M × m matrix whose ijth element equals to the membership degree of x_i in jth cluster and V is C × m matrix which contains m dimensional center of clusters, and d (x_i, v_j) is the distance between x and jth cluster center.

After minimization, membership degree of features and cluster prototype are as follows [27]: $u_{ij} = \frac{1}{\sum_{k = 1}^{m} {(\frac{d (x_{i}, v_{j})}{d (x_{i}, v_{k})})}^{\frac{1}{q - 1}}}$ (3) $v_{j} = \frac{\sum_{i = 1}^{N} u_{ij}^{q} x_{i}}{\sum_{i = 1}^{N} u_{ij}^{q}}$ (4) if we use exponential distance (de) instead of Euclidean norm, we will obtain the UOFC algorithm introduced by Gath and Geva [28]. In this case, the algorithm learns hyper ellipsoidal cluster with different densities and different number of clusters which solves more complex problems [28]. Covariance matrices of clusters are used to consider shapes of them in calculations. Fuzzy covariance matrix of ith cluster as follows [29]: $F_{i} = \frac{\sum_{k = 1}^{M} u_{ik}^{q} (x_{k} - v_{i}) (x_{k} - v_{i})^{T}}{\sum_{k = 1}^{M} u_{ik}^{q}}, 1 \leq i \leq C$ (5)

And the distance function is as follows [29]:

$\begin{matrix} d_{ik}^{2} (x_{k}, v_{i}) \\ = \frac{(2 π)^{(\frac{n}{2})} V_{j}}{α_{i}} exp (\frac{1}{2} (x_{k} - v_{i})^{T} F_{i}^{- 1} (x_{k} - v_{i})) \end{matrix}$ (6)where $α_{i} = \frac{1}{N} \sum_{k = 1}^{N} u_{ik}$ is prior probability and $V_{j} = {| F_{j} |}^{\frac{1}{2}}$ is fuzzy hypervolume of jth cluster. Based on the concept of hypervolume and cluster density Gath and Geva [28] proposed two indices for automatic determination of the number of clusters. Here we used one of the indices as follows: $PD = \frac{S}{FH}$ (7)where PD is partition density. Moreover, $FH = \sum_{j = 1}^{m} V_{j}$ and $S = \sum_{i = 1}^{m} \sum_{j = 1}^{N} u_{ij}$ are total fuzzy hypervolume and total sum of membership degrees, respectively. Compact clusters lead to large values of PD, and vice versa, so large values of PD are emblem of “compact” clusters [28].

4 Multivariate adaptive regression spline

Suppose y indicates the dependent variable and x = (x₁, …, x_n) is the vector of explanatory variables in the set of observations where x_i = (x_i1, x_i2, …, x_im) ∈ R^m. least square regression is one of the conventional methods utilized for modeling of the relation between explanatory and output variables. In this method, b_j (x) ; j = 1, 2, …, P are basis functions and the coefficients are β_j which calculated by minimizing the Sum of Squared Error (SSE) defined as below [30]: $SSE = \sum_{i = 1}^{n} (y_{i} - β_{0} - β_{1} b_{1} (x_{i}) - \dots - β_{p} b_{p} (x_{i}))^{2}$ (8)

If specific form considered for basis function b_j (x) it will be non-adaptive predictive model and if they become determined base observe data, it will be called adaptive. MARS is one of the fast and compatible adaptive regression approaches. This nonparametric regression method which is a generalization of step by step linear regression can effectively reveal hidden patterns and nonlinear relation in data sets [30]. In MARS the basis functions are chosen as hinge functions (h₁, h₂) and formed by a pair of functions known as spline, given as [31]:

$\begin{matrix} h_{1} & = & max {0, x - c} \\ h_{2} & = & max {0, c - x} \end{matrix}$ (9)where, C is a constant called knot. Figure 2 depicts the spline functions.

Since for each of the explanatory variables x_j there are n independent observations, by considering each observation as a knot, we have 2n functions for each x_j. By considering set B includes all functions h_1j, h_2j, j = 1, …, d and product of two or more of them, each member of set B can be utilized as basis function of Equation 8 [31]. MARS is performed in forward and backward steps as follows:

Forward phase: This step starts with intercepting b₀ or average of the response value, and then adds iteratively basis function in couple form to model. Coefficients β in Equation 8, are estimates in order to have the most decrease in SSE. To add a new basis function, MARS must search all of the below items:

Statements in the model.

All of variables (can select one for making basis function).

All values of variable (for specify knot in new h function).

Adding new statement ends once changes in SSE become negligible or given maximum number of statements is achieved. At the end of this phase, a model with a lot of statements is provided that has poor performance in generalization and thus it needs a backward elimination phase.

Backward phase starts with the last model of forward phase and in each step removes a basis function from the model that has the lowest contribution to increase SSE. By eliminating each of basis function, a new model is obtained that can be the candidate for the final model. Performance of new model will be assessed with “Generalized Cross Validation” and this process continues until all of the basis functions are removed. MARS selects an optimum model based on minimizing GCV. GCV for kth step of removing phase, calculated asbellow [31]:

$GCV = \frac{RSS}{N \times {(1 - \frac{effective number of parameters}{N})}^{2}}$ (10)

$\begin{matrix} effective number of parameters \\ = number of MARS terms + penalty \\ \times \frac{number of MARS terms - 1}{2} \end{matrix}$ (11)where N and 2 < λ < 4 are the number of observations and smoothing parameter, respectively. Also $\frac{number of MARS terms - 1}{2}$ is the number of knot of hing functions exist in model.

5 Non-dominant sorting Genetic Algorithm II (NSGA II)

To solve many real world problems, it is necessary to optimize several objectives, simultaneously. Thus, Multi-objective (MO) optimization has been recently popular and was applied in different applications, like [32]. Here, MO optimization can be stated as finding the vector ${\bar{p}}^{*} = {[p_{1}^{*}, p_{2}^{*}, \dots, p_{n}^{*}]}^{T}$ of solutions which optimizes the following vector function: $\bar{f} (\bar{p}) = {[f_{1} (\bar{p}), f_{2} (\bar{p}), \dots, f_{k} (\bar{p})]}^{T}$ (12)where k is the number of cost functions and will be optimized. There are some insights in finding the optimum vector ${\bar{p}}^{*}$ in this situations. When there are two or more objective functions which are conflicting it is needed to optimize all objectives simultaneously. The main difficulty in this situation is comparing one solution another. When the relative importance of the objectives is unknown, there are multiple solutions, and each of which is considered acceptable and equivalent. At the end of the procedure, there is a set of solutions named Pareto set. A decision vector ${\bar{p}}^{*}$ is called Pareto optimal if and only if there is no object that dominates ${\bar{p}}^{*}$ . In other words, ${\bar{p}}^{*}$ is Pareto optimal if there exists no feasible vector $\bar{p}$ which causes a reduction on some criterion without a simultaneous increase in at least another. Detailed information about definition of pareto optimal is available in [33].

There are several methods for implementation of MO such as NSGA [34], MOGA [35], and NPGA [36]. NSGAII is an evolutionary and population based metaheuristic algorithm with predefined generation numbers (gen) and number of populations (pop) in each generation. The algorithm by utilizing a specific selection operator, creates a mating pool combines the parent and offspring population, selects the best solutions. Finally NSGAII finds the best solution in each generation and creates the Pareto optimal solution.

6 The proposed Interactively Recurrent Fuzzy Functions (IRFFs)

The proposed IRFFs comes as an extended version of FFs and is illustrated in Fig. 3, the links between clusters create the interactively recurrent dynamical structure. By utilizing the interaction feedback, the proposed IRFFs captures the information from other subspaces. Our IRFFs system follows the following steps:

State space reconstruction.

Calculating membership values by use of UOFC clustering in state space.

Reproducing augmented input by appending membership values to primary inputs.

Estimate appropriate function for each cluster utilizing MARS method.

Calculation aggregation weight in the form of interaction between rules and past states.

Weighted averaging for output calculation.

6.1 IRFFs’ structure

Step 1: Here, X = [x₁ … x_N] is the input where N is the number of data points. In state space reconstruction, trajectory matrix with lag τ and dimension m is obtained as follows: $S = [\begin{matrix} x_{1} & x_{1 + τ} & \dots & x_{1 + (m - 1) τ} \\ ⋮ & ⋱ & \dots & ⋮ \\ x_{N - (m - 1) τ} & x_{N - (m - 2) τ} & \dots & x_{N} \end{matrix}]$ (13)

Each rows of Sis one state in state space. So nth input data is s (n) = (x (n) , x (n + τ) , …, x (m - 1) τ)) and the corresponding output is y (n) = x (n + mτ).

Step 2: State space points are clustered for determining system subspaces. UOFC clustering results in separating system behaviors. After termination of clustering, membership degrees of each state in each cluster is determined by Equation 9.

Step 3: In this step, augmented inputs of FFs are created. Then membership values and their transformations are appended to the primary inputs: $φ_{i} = [s, Γ, f (Γ)] i = 1, \dots, M .$ where Γ is the membership vector containing membership values of different clusters and f (Γ) indicates membership values transformation (Details of selecting appropriate transformation is available in [10]).

Step 4: to estimate the functions of IRFFs, MARS is applied which uses matrix φ as follows: $φ_{i} = [\begin{matrix} s_{1} & Γ_{1} & f (Γ_{1}) \\ ⋮ & ⋮ & ⋮ \\ s_{K} & Γ_{K} & f (Γ_{K}) \end{matrix}], i = 1, \dots, C$ (14)where, K is the number of members of a cluster and every row is an explanatory variable and its corresponding output is x_n+mT. Here, a step prior to prediction is done and NARMAX model for all clusters created which corresponds to the behavior of each cluster.

Step 5: Weights of recurrent part of IRFFs are considered to form local and global feedback loops. The outputs of this step are utilized for aggregation weights in calculation of IRFF output. These weights are computed as follows:

$\begin{matrix} ψ_{k}^{t} (u) & = & (1 - γ_{k}) u_{k}^{t} + \overset{C}{\underset{i \neq k}{\sum_{i = 1}}} λ_{ik} ψ_{i}^{t - 1} \\ k & = & 1, \dots, C, t = 1, \dots, M \end{matrix}$ (15)

Here $γ_{k} = \sum_{i = 1}^{C} λ_{ik},$ $t, u_{k}^{t}, ψ,$ and λ are the samples number, the membership value of tth data in cluster k, the weight of aggregation, and the coefficient of interactive and recurrent relation, respectively. The aggregation weights depend not only on current membership values of each cluster, but also on the previous membership values of other clusters and itself.

Step 6: Finally, system output is calculated as follows: $f (x_{j}) = \frac{\sum_{i = 1}^{C} ψ_{i} (u) . {FF}_{i} (φ_{j})}{\sum_{i = 1}^{C} ψ_{i} (u)}$ (16)where FF_i (φ_j) is the output of i th function rule for j th input data.

6.2 IRFFs’ learning

In this subsection the structure and parameters learnings of IRFFs are presented. All free design parameters are number of clusters (C), degree of fuzziness (m), penalty per knot (GCV), and recurrent layer coefficients (λ). Figure 4 presents IRFFs’ flow chart learning scheme which includes structure and parameterlearning.

6.2.1 Structure learning

In order to have compact clusters and better IRFFs’ performance, the following two objectives are considered:

1. Minimizing error value of prediction. $E = \frac{1}{2} {(y^{t} - f (x^{t}))}^{2}$ (17)y^t is the desired output value, and f (x^t) is IRFFs output value of tth sample.

2. Maximizing partition density value for fuzzy clustering (Equation 7).

For having an optimum tradeoff between both objectives, the structure learning phase uses NSGAII algorithm for optimizing the IRFF structural parameters (m, C, GCV). At the end of training step, best value of m^*, C^*, and GCV^* are specified and used for test data.

6.2.2 Parameter learning

In all generations of NSGAII, recurrent layer coefficients (λ) are obtained by applying gradient descent algorithm with line search based on strong wolf condition. Its main advantage is automatically determination of learning rate (η).

Interactively recurrent weights learning based on error of modeling is done as an iterative algorithm. Based on the error function introduced in Equation 17, we trained coefficients by using the followingequations: $λ_{ij}^{t + 1} = λ_{ij}^{t} + Δ λ_{ij}^{t} + \propto Δ λ_{ij}^{t - 1}$ (18) $Δ λ_{ij}^{t} = - η \frac{\partial E}{\partial λ_{ij}} i, j = 1, \dots, K$ (19) where 0< η<1 and ∝ are the learning step and momentum weight (which is practically selected around 0.8), respectively. Developing the previous expressions, it has that: $\frac{\partial E}{\partial λ_{ij}} = - (y - f (x)) \frac{\partial f}{\partial λ_{ij}}$ (20)

Where

$\begin{matrix} \frac{\partial f}{\partial λ_{ij}} = {\begin{matrix} (ψ_{i} (t - 1) - u_{i} (t)) {FF}_{i} (q), i = j \\ (ψ_{j} (t - 1) - u_{i} (t)) {FF}_{i} (q) + \end{matrix} \\ (ψ_{i} (t - 1) - u_{j} (t)) {FF}_{j} (q), i \neq j \end{matrix}$ (21)

Equal case as for recurrent weight and inequality for interactive weight, here we have critical assumption that interactive weight is bidirectional.

7 Simulations

This section presents three examples including chaotic sequence prediction, time series prediction in noisy environment, and multichannel lung soundmodeling. In all of the following examples the validity of our proposed scheme is confirmed by simulation. Here, population size and the number of generations are set to 100 and 500, respectively. Also Root Mean Squared Error (RMSE) and percent root mean squared difference (PRD) are considered as the quality criteria: $RMSE = \sqrt{\frac{\sum_{i = 1}^{N} {(y (i) - \hat{y} (i))}^{2}}{N}}$ (22) $PRD % = \sqrt{\frac{\sum_{i = 1}^{N} {(y (i) - \hat{y} (i))}^{2}}{\sum_{i = 1}^{N} {(y (i))}^{2}}} \times 100$ (23) where $\hat{y}$ is the IRFFs’ output.

Example 1: Lorenz equations time series

Here, the time series prediction problem applied for the data generated from the Lorenz equations given by [37]: ${\begin{matrix} \frac{dx}{dt} = a (y - x) \\ \frac{dy}{dt} = cx - xz - y \\ \frac{dz}{dt} = xy - bz \end{matrix}$ (24)

where x, y, and z are state variables and a, b, and c are adjusting parameters. For a = 10, b = 8/3, and c = 28 the Lorenz equations will have chaotic behavior. After generating data sequences by settingtime steps of 0.05 and initialization values of (1, 1, 1) for state, we remove the first 201 transition points and keep the next 1000 points for every state variables. The first 800 points are utilized for training dataset and last 200 points for checking set. Based on Taken’s embedding theorem [24], the delay and embedding dimension are set as 1 and 3, respectively. Figure 5 shows the prediction result for series xand the excelent ability of IRFFs for predicting Lorenz time series.

Table 1 lists comparative results of Lorenz equations time series prediction based on the proposed IRFFs and other approaches. Here, we compared three FRB and two FFs system including Recurrent Self-Evolving Fuzzy Neural Network (RSEFNN) [18], Fuzzy Recurrent Wavelet Neural Network (FRWNN) [2], and Squared Root Cubature Kalman Filter-γ Echo State Network (SCKF-γESN [38], Fuzzy Functions with LSE (FFs-LSE) and Fuzzy Functions with MARS (FFs-MARS) [10, 12]. The results indicate that by addition of recurrent layer and interactive structure in IRFFs, better RMSE and PRD than those of other fuzzy structures are achieved. Although the IRFFs performance and recurrent structures such as RSEFNN and FRWNN are similar, the number of clusters in IRFFs is fewer than the number of fuzzy rules of them.

Example 2: noisy Mackey-Glass time series

The Mackey–Glass time series s (t) is extracted from the following differential equation [39]: $\frac{ds (t)}{dt} = \frac{0.2 s (t - l)}{1 + s^{10} (t - l)} - 0.1 s (t)$ (25)where l is lag parameter. Here, chaotic behavior is occurred by setting l > 17. We obtain Mackey–Glass time sequences by applying the fourth-order Runge–Kutta method [40] for the Equation 25. In this study, the initial value s (0) is set 1.2 and s (k) is derived for 1 ≤ k ≤ 1200. Also, since the first 201 points are transition points, they are removed and the next 500 generated data are utilized for training and the remaining data is applied for testing.

Now, we want to compare performances of different prediction methods in the presence of additive noise which has uniform distribution. By setting SNR = 6 dB, the result of Fig. 6 and Table 2 are achieved. Figure 6 shows that the system is able to reasonably follow the sequence of Mackey Glass time-series in the presence of noise.

In Table 2, the performance of the proposed IRFFs is compared with the following fuzzy forecasters: 2u function type2 fuzzy systems (2u T2 FLS) [41], Quasi-T2 Fuzzy Logic System Q-T2 FLS [42], and other systems those are used in Example 1. Multi objective learning of IRFFs generates 5 multi-dimensional fuzzy subspaces while FF-MARS have 7 ones. Therefore IRFFs has simpler structure than that of FFs approaches which decreases the output error. Moreover, Table 2 shows that because of using local and global feedbacks, our proposed method outperforms the 2u T2 FLS [41] and Q-T2 FLS [42] approaches which used uncertainty in their type2 membership functions.

Example 3: (real lung sound signal modeling)

Since the lung sound signal is chaotic and has very complex dynamic [43], its estimation and modeling are used in different areas like classification based on model parameter, denoising, and compression [44]. Here, we use the signals recorded in the Department of Pneumology in Shariati Hospital by Bioinstrument and Biological signal processing laboratory researchers of Amirkabir University of Technology’s [45]. Data was recorded using a 6-channel respiratory sound acquisition device in a non-sound proof room. Each record contains 3 inhales and 3 exhales with 1 second pause after each breath.

In state space reconstruction, the parameters value of m = 6, τ = 8 are obtained. For estimation and prediction purpose, one inhale of a lung sound was picked up from a healthy subject. Due to the chaotic nature and existence of high relationship among the states of the lung sound signals, the shapes of the signals are very complex.

Figure 7 shows original lung sound signal along with one predicted by our proposed method in all 6 different channels. This figure confirms that the temporal dynamics of lung sound signals in each channel is effectively modeled by interactively structure. In channel 3, although IRFFs doesn’t track the signal in very short time duration around 1600, its performance in the rest time intervals and in all of the other channels aredesirable. Also, Fig. 8 shows error of proposed method for channel 6 data.

Table 3 lists the results of different FFs methods and other interactive structures on lung sound data modelling. The innovative learning algorithm generates optimal values compared to the other methods and leads to lower training and test errors as well as less complex system.

8 Conclusion

In this paper a recurrent structure for chaotic time-series prediction is added to fuzzy functions system. By placing local and global feedbacks in the output parts of FFs’ multidimensional subspaces, an interactive structure is achieved and IRFFs systems is proposed. IRFFs uses recurrent structure to memorize previous states of the system. Self-feedbacks in IRFFsprovide capturing the critical local information of time series sequences. Furthermore, global feedbacks feeding the output part of each multidimensional subspace to others, enable IRFFs to effectively address temporal sequences responding to memory information from prior system states. Therefore, generalization of IRFFs in comparison with feed forwards FFs is improved.

Multi-dimensional fuzzy subspaces are created by use of UOFC. This method can consider different types of cluster shape, density and number of data points in each cluster, which is necessary in time series clustering for identifying all kinds of system behaviors.

Also, MARS method has been applied for linear and nonlinear regression of each multidimensional subspace. In fact by MARS can learn different kind of dynamical behavior.

The structure and parameter learning algorithm enables IRFFs to automatically identify its desired structure. It eliminates the sensitivity of learning algorithm to initial selection of parameters and provides the admissible structure for given data-sets. Simultaneous optimization of parameters with NSGAII provides minimal structures and maximum prediction capability. Therefore, IRFFs has less complex architecture than that of conventional FFs. Also, in each generation of NSGAII, it is needed to optimize the recurrent coefficients. Gradient descent algorithm with line search based on Wolfe condition is utilized for this purpose. Because of automatically determination of learning rate, possibility of divergence and increasing the computational complexity resulted from inappropriate selection of learning rate are eliminated. Also convergence of algorithm become faster.

To confirm the performance of IRFFs, two bench-mark and one recorded lung sound time-series were utilized for simulation. This is the first time that FFs structures was implemented for biological signal processing. In all examples, IRFFs was compared with latest popular FRB and FFs methods. As a future work, we build a framework for approximating FFs in consequent of each multi-dimensional subspaces, based on fuzzy method and in such a way that can use parameter of fuzzy function in pattern recognition purposes.

References

Gholizade-Narm

and Shafiee Chafi

M.R.

, Using repetitive fuzzy method for chaotic time series prediction, Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 28 (2015), 1937–1946.

Ganjefar

and Tofighi

, Single-hidden-layer fuzzy recurrent wavelet neural network: Applications to function approximation and system identification, Information Sciences 294 (2015), 269–285.

Jang

J.-S.R.

, ANFIS: Adaptive-network-based fuzzy inference system, Systems, Man and Cybernetics, IEEE Transactions on 23 (1993), 665–685.

Tripathy

B.C.

and Dutta

A.J.

, Lacunary bounded variation sequence of fuzzy real numbers, Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 24 (2013), 185–189.

Nithya

and Duraiswamy

, Correlated gain ratio based fuzzy weighted association rule mining classifier for diagnosis health care data, Journal of Intelligent and Fuzzy Systems 29 (2015), 1453–1464.

Bazoobandi

H.-A.

and Eftekhari

, A fuzzy based memetic algorithm for tuning fuzzy wavelet neural network parameters, Journal of Intelligent and Fuzzy Systems 29(1) (2015), 241–252.

Tripathy

B.C.

, Braha

and Dutta

A.J.

, A new class of fuzzy sequences related to the ℓp space defined by Orlicz function, Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 26 (2014), 1273–1278.

Tripathy

B.C.

and Debnath

, γ-open sets and γ-continuous mappings in fuzzy bitopological spaces, Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology 24 (2013), 631–635.

Turksen

I.B.

, A review of developments from fuzzy rule bases to fuzzy functions, Hacettepe Journal of Mathematics and Statistics 41 (2012).

10.

Türkşen

I.B.

, Fuzzy functions with LSE, Applied Soft Computing 8 (2008), 1178–1188.

11.

Uncu

and Turksen

, A novel fuzzy system modeling approach: Multidimensional structure identification and inference, Fuzzy Systems, 2001. The 10th IEEE International Conference on, 2001, pp. 557–561.

12.

Zarandi

, Zarinbal

, Ghanbari

and Turksen

, A new fuzzy functions model tuned by hybridizing imperialist competitive algorithm and simulated annealing. Application: Stock price prediction, Information Sciences 222 (2013), 213–228.

13.

Çelikyilmaz

and Burhan Türkşen

, Fuzzy functions with support vector machines, Information Sciences 177 (2007), 5163–5177.

14.

Kantz

and Schreiber

, Nonlinear time series analysis vol. 7: Cambridge university press, 2004.

15.

Juang

C.-F.

, A TSK-type recurrent fuzzy network for dynamic systems processing by neural network and genetic algorithms, Fuzzy Systems, IEEE Transactions on 10 (2002), 155–170.

16.

Stavrakoudis

D.G.

and Theocharis

, A recurrent fuzzy neural network for adaptive speech prediction, Systems, Man and Cybernetics, 2007 ISIC IEEE International Conference on, 2007, pp. 2056–2061.

17.

Theocharis

, A high-order recurrent neuro-fuzzy system with internal dynamics: Application to the adaptive noise cancellation, Fuzzy Sets and Systems 157 (2006), 471–500.

18.

Juang

C.-F.

, Lin

Y.-Y.

and Tu

C.-C.

, A recurrent self-evolving fuzzy neural network with local feedbacks and its application to dynamic system processing, Fuzzy Sets and Systems 161 (2010), 2552–2568.

19.

Mastorocostas

and Theocharis

J.B.

, A recurrent fuzzy-neural model for dynamic system identification, Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on 32 (2002), 176–190.

20.

Alizadeh

and Tofighi

, Full-adaptive THEN-part equipped fuzzy wavelet neural controller design of FACTS devices to suppress inter-area oscillations, Neurocomputing 118 (2013), 157–170.

21.

Lin

Y.-Y.

, Chang

J.-Y.

and Lin

C.-T.

, Identification and prediction of dynamic systems using an interactively recurrent self-evolving fuzzy neural network, Neural Networks and Learning Systems, IEEE Transactions on 24 (2013), 310–321.

22.

Connor

J.T.

, Martin

R.D.

and Atlas

L.E.

, Recurrent neural networks and robust time series prediction, Neural Networks, IEEE Transactions on 5 (1994), 240–254.

23.

Deb

, Pratap

, Agarwal

and Meyarivan

, A fast and elitist multiobjective genetic algorithm: NSGA-II, Evolutionary Computation, IEEE Transactions on 6 (2002), 182–197.

24.

Takens

, Detecting strange attractors in turbulence, Dynamical systems and turbulence, Warwick 1980, ed: Springer, 1981, pp. 366–381.

25.

Kennel

M.B.

, Brown

and Abarbanel

H.D.

, Determining embedding dimension for phase-space reconstruction using a geometrical construction, Physical Review A 45 (1992), 3403.

26.

Fraser

A.M.

and Swinney

H.L.

, Independent coordinates for strange attractors from mutual information, Physical Review A 33 (1986), 1134.

27.

Theodoridis

and Koutroumbas

, Pattern Recognition: Elsevier Science, 2008.

28.

Gath

and Geva

A.B.

, Unsupervised optimal fuzzy clustering, Pattern Analysis and Machine Intelligence, IEEE Transactions on 11 (1989), 773–780.

29.

Bezdek

J.C.

and Dunn

J.C.

, Optimal fuzzy partitions: A heuristic for estimating the parameters in a mixture of normal distributions, Computers, IEEE Transactions on 100 (1975), 835–838.

30.

Hastie

, Tibshirani

, Friedman

, Hastie

, Friedman

and Tibshirani

, The elements of statistical learning vol. 2: Springer, 2009.

31.

Friedman

J.H.

, Multivariate adaptive regression splines, The annals of statistics 1991, pp. 1–67.

32.

Deb

, Multi-objective optimization using evolutionary algorithms vol. 16: John Wiley & Sons, 2001.

33.

Saha

, Maulik

and Plewczynski

, A new multi-objective technique for differential fuzzy clustering, Applied Soft Computing 11 (2011), 2765–2776.

34.

Srinivas

and Deb

, Muiltiobjective optimization using nondominated sorting in genetic algorithms, Evolutionary computation 2 (1994), 221–248.

35.

Deb

, Multi-objective genetic algorithms: Problem difficulties and construction of test problems, Evolutionary Computation 7 (1999), 205–230.

36.

Horn

, Nafpliotis

and Goldberg

D.E.

, A niched Pareto genetic algorithm for multiobjective optimization, Evolutionary Computation, 1994. IEEE World Congress on Computational Intelligence, Proceedings of the First IEEE Conference on, 1994, pp. 82–87.

37.

Banerjee

, Mitra

and Rondoni

, Applications of chaos and nonlinear dynamics in engineering vol. 1: Springer Science & Business Media, 2011.

38.

Han

, Xu

, Liu

and Wang

, Online multivariate time series prediction using SCKF-γESN model, Neurocomputing 147 (2015), 315–323.

39.

Belair

, Glass

, an der Heiden

and Milton

, Dynamical disease: Identification, temporal aspects and treatment strategies of human illness, Chaos: An Interdisciplinary Journal of Nonlinear Science 5 (1995), 1–7.

40.

Butt

, Introduction to Numerical Analysis Using MATLAB^®: Jones & Bartlett Learning, 2009.

41.

Molaeezadeh

S.F.

and Moradi

M.H.

, A 2uFunction representation for non-uniform type-2 fuzzy sets: Theory and design, International Journal of Approximate Reasoning 54 (2013), 273–289.

42.

Mendel

J.M.

, Liu

and Zhai

, -plane representation for type-2 fuzzy sets: Theory and applications, Fuzzy Systems, IEEE Transactions on 17 (2009), 1189–1207.

43.

Ahlstrom

, Johansson

, Hult

and Ask

, Chaotic dynamics of respiratory sounds, Chaos, Solitons & Fractals 29 (2006), 1054–1062.

44.

Palaniappan

, Sundaraj

and Ahamed

N.U.

, Machine learning in lung sound analysis: A systematic review, Biocybernetics and Biomedical Engineering 33 (2013), 129–135.

45.

Fard

P.J.M.

, Moradi

M.H.

and Saber

, Chaos to randomness: Distinguishing between healthy and non-healthy lung sound behaviour, Australasian Physical & Engineering Sciences in Medicine, 2014, pp. 1–8.