Abstract
Data-driven black-box and gray-box models, such as neural networks and fuzzy systems, have some disadvantages, such as high and uncertain dimensions and a complex learning process. In this paper, we combine the Takagi-Sugeno fuzzy model with long short-term memory (LSTM) cells to overcome these disadvantages. This novel model takes advantage of the interpretability of the fuzzy system and the good approximation ability of the LSTM cell. We propose a fast and stable learning algorithm for this model. Comparisons with other similar black-box and gray-box models are made to show the advantages of the proposal.
Introduction
The model of a system is a representation of the structure (properties) of the system; how a model is developed depends on what it is expected to represent. Models can be obtained in different ways. The most common is through physical laws (mathematical modeling), but this technique requires exact knowledge of the environment in which the system operates, as well as as many theoretical considerations as possible. Another way is through measurements of the aspects of interest of the system (black-box models), or through such measurements together with some equations that describe the system behavior (gray-box models); these approaches achieve high robustness and adaptability. Neural networks (NNs) and fuzzy systems are commonly used as black-box and gray-box models, respectively. NNs and fuzzy systems can generate models through learning processes, for either system modeling or adaptive control.
Fuzzy models use fuzzy rules of the IF-THEN type to identify systems. There are two main types of fuzzy models, Mamdani fuzzy systems and Takagi-Sugeno (TS) fuzzy systems, and several comparisons between them show that TS fuzzy systems are better suited to engineering tasks such as the modeling and control of systems [1]. Fuzzy systems represent expert knowledge, but they can also be constructed to emulate an expert through learning processes (like an NN), resulting in an ANFIS (adaptive-network-based fuzzy inference system) [2]. ANFIS systems are based on a TS fuzzy system and turn fuzzy systems into something similar to NNs. TS models with ANFIS training are very powerful in a wide range of scientific fields, such as energy [3] and economic decision-making [4].
If the consequents (THEN parts) of a TS fuzzy system are taken as nonlinear functions, the general performance of the fuzzy system can be improved [5, 6]. The inclusion of different types of NNs in ANFIS systems has been introduced and discussed in many works, generating fuzzy-neural or neuro-fuzzy models [7–9]. More recent works on this topic propose RNNs to estimate the consequents of fuzzy systems (to approximate a nonlinear function in each consequent part); for example, wavelet networks (WNs) are very suitable for this task [10]. In [31], different types of fuzzy systems are analyzed and compared with each other; these fuzzy systems are built with conventional representations and with NNs, and, according to that work, the construction of a fuzzy system depends essentially on the application for which it will be used. However, the approximation accuracy of the above TS models is not satisfactory.
Recently, a deep learning model named long short-term memory (LSTM) has been developed [11–14]. It has a recurrent structure and manages information through gates; these gates measure the suitability of the data received as input, the data stored by the LSTM, and the data generated by the LSTM as its result. LSTM networks overcome many disadvantages of RNNs and converge relatively fast [15–18]. Some training algorithms specifically for LSTM networks have been proposed to further improve their performance [19]. As a disadvantage, however, the internal structure of an LSTM is more complex than that of conventional RNNs. The use of deep LSTM networks is still under development, as is their use with other intelligent systems such as fuzzy systems. For example, in robotics, and specifically in medical robotics, some fuzzy-neural networks that include LSTM networks have been proposed to control surgical robots [20, 21]; other important applications that combine LSTM networks and fuzzy systems include resource management, such as the management of electrical energy [22]. However, in these proposals the LSTM networks and the fuzzy systems work in a decoupled way.
Supervised learning algorithms are among the simplest ways to adjust the parameters of NNs, notably the backpropagation (BP) algorithm, which is one of the most popular algorithms for training NNs because of its simplicity [24]. Recurrent NNs (RNNs) are the most widely used NNs for system modeling, because they can generate models relatively fast for both linear and nonlinear systems [26]. Variations of the BP algorithm have been developed to adjust the parameters of RNNs efficiently; the most notable is the backpropagation through time (BPTT) algorithm [25]. With a stability analysis, RNNs can be made to deal with noise and disturbances [27]. This last network shows that conventional RNNs can be turned into more complex structures to obtain better results, depending on the case.
To create a network that reacts faster and approximates better, especially for real-time applications related to system identification and control, this paper makes the following contributions. We propose a novel TS fuzzy model that employs LSTM networks inside the structure of the fuzzy system; the model is structured by the fuzzy system and benefits from the estimation of the LSTM network. We propose a learning process for this fuzzy network that runs in a short period of time and is feasible. We prove the stability of the proposed model, taking the training algorithm into account.
To show the advantages of the novel fuzzy LSTM network, the proposal is compared with other intelligent algorithms on the Mackey-Glass time series and a nonlinear benchmark system. These comparisons show that: 1) the novel model has better modeling performance than the other algorithms; 2) the proposed method converges fast and can easily accomplish the assigned task.
Fuzzy modeling using LSTM cells
A system can be represented as a nonlinear function in discrete time as follows:
The representation shown in (1) and (2) is known as a NARMA model. To model that system, we use fuzzy IF-THEN rules similar to those of a conventional TS fuzzy system; the p-th rule is:
The membership functions associated with each $A_{jp}$ are described as follows:
More generally, the membership value of each element of (4) in each fuzzy set can be written in vector form:
The consequent (THEN) part of one fuzzy rule in (3) is represented by $h_p(k)$. The function $h_p(k)$ is usually defined as a linear combination of the input signals (2) of the system (1), but, as mentioned in the introduction, better estimations are achieved with nonlinear functions of the input signals. These nonlinear functions can easily be obtained by NNs, and one of the best choices for this is an LSTM cell. Furthermore, when $n_y$ and $n_u$ in (2) are unknown, i.e., we do not know how long the current state depends on its previous information, and especially when the time series is long, the influence of the relevant past information on the current point becomes smaller and smaller. We therefore need a model that can handle "long-term dependencies", and LSTM cells have this property.
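To make the role of $n_y$ and $n_u$ concrete, the sketch below builds a regressor from past outputs and inputs. Equation (2) is not reproduced in this section, so the layout used here, $[y(k-1), \dots, y(k-n_y), u(k), \dots, u(k-n_u)]^T$, is an assumption, as is the hypothetical function name `build_regressor`. It only illustrates that a conventional regressor has a fixed length $n_y + n_u + 1$ that must be chosen in advance, which is the limitation the LSTM consequents are meant to relax.

```python
import numpy as np


def build_regressor(y, u, n_y, n_u, k):
    """Hypothetical NARMA regressor U_r(k) = [y(k-1..k-n_y), u(k..k-n_u)].

    The exact layout of (2) is an assumption; only the idea that the
    regressor length is fixed by n_y and n_u comes from the text.
    """
    if k < max(n_y, n_u):
        raise ValueError("not enough past samples for the requested orders")
    past_y = [y[k - i] for i in range(1, n_y + 1)]  # y(k-1), ..., y(k-n_y)
    past_u = [u[k - j] for j in range(0, n_u + 1)]  # u(k), ..., u(k-n_u)
    return np.array(past_y + past_u, dtype=float)


y = np.arange(10.0)
u = 10.0 * np.arange(10.0)
U_r = build_regressor(y, u, n_y=2, n_u=1, k=5)  # [y(4), y(3), u(5), u(4)]
```

If the true orders are larger than the chosen $n_y$ and $n_u$, the older information is simply absent from the regressor; a recurrent consequent can, in principle, carry it in its internal state instead.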
The estimation of (1) is obtained by the defuzzification of the fuzzy system (3) with p rules:
The concept of the fuzzy system using LSTM cells is shown in Fig. 1. It is divided into four layers: in the first layer the inputs of the network are organized, in the second layer these inputs are fuzzified, in the third layer the values of the IF and THEN parts are calculated, and in the fourth layer the estimation of the system is made according to (6).

Fuzzy model with LSTM cells.
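A minimal NumPy sketch of the four layers in Fig. 1 is given below. It assumes Gaussian membership functions (centers and widths playing the roles of $\chi_j$ and $\Upsilon_j$), a product t-norm for the IF part, a full grid of $\kappa^m$ rules, and the usual TS weighted average for the defuzzification (6). These choices and the function names are illustrative assumptions; the consequents $h_p$ are passed in as plain numbers, whereas in the proposed model each one would be produced by an LSTM cell.

```python
import itertools

import numpy as np


def gaussian_mf(x, center, width):
    """Assumed Gaussian membership function."""
    return np.exp(-((x - center) / width) ** 2)


def ts_output(x, centers, widths, consequents):
    """Forward pass of a grid-partitioned TS fuzzy system (illustrative).

    Layer 1: x is the organized input vector, shape (m,).
    centers, widths: (m, kappa); consequents: (kappa**m,) values h_p.
    """
    m, kappa = centers.shape
    mu = gaussian_mf(x[:, None], centers, widths)        # layer 2: fuzzification, (m, kappa)
    firing = np.array([                                   # layer 3: IF part, product t-norm
        np.prod([mu[j, idx[j]] for j in range(m)])
        for idx in itertools.product(range(kappa), repeat=m)
    ])
    return float(firing @ consequents / firing.sum())     # layer 4: weighted average, as in (6)


centers = np.array([[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]])
widths = np.full((2, 3), 0.8)
h = np.linspace(-1.0, 1.0, 9)  # stand-ins for the nine consequents h_p(k)
y_hat = ts_output(np.array([0.3, -0.2]), centers, widths, h)
```

Because the output is a normalized weighted average with positive weights, it always lies between the smallest and the largest consequent value; if every $h_p$ is equal, the output equals that value regardless of the input.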
The LSTM cell used in the consequent part is shown in Fig. 2. The cell processes data using "gates" that let useful information pass through its structure, which allows it to handle long-term and short-term data dependencies more efficiently than a conventional RNN. The cells can work together as a network and can also be organized as an array.

LSTM cell for the consequent part.
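The gating described above can be illustrated with a standard LSTM cell step. The sketch assumes that $W_f, W_i, W_s, W_o$ act on the cell input and $V_f, V_i, V_s, V_o$ on the previous hidden state, with $s$ denoting the candidate cell state; this mapping of the paper's symbols, the bias terms, and the name `lstm_step` are assumptions, since the stage equations are not reproduced in this section.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, V, b):
    """One LSTM step; W, V, b are dicts keyed by gate name 'f', 'i', 's', 'o'.

    W[g] multiplies the cell input x and V[g] the previous hidden state
    h_prev (assumed mapping of the paper's symbols); b[g] is an assumed bias.
    """
    f = sigmoid(W["f"] @ x + V["f"] @ h_prev + b["f"])  # forget gate
    i = sigmoid(W["i"] @ x + V["i"] @ h_prev + b["i"])  # input gate
    s = np.tanh(W["s"] @ x + V["s"] @ h_prev + b["s"])  # candidate cell state
    o = sigmoid(W["o"] @ x + V["o"] @ h_prev + b["o"])  # output gate
    c = f * c_prev + i * s                              # keep part of the old state, add new information
    h = o * np.tanh(c)                                  # gated hidden output
    return h, c


rng = np.random.default_rng(0)
n_in, n_hidden = 2, 4
W = {g: rng.normal(scale=0.5, size=(n_hidden, n_in)) for g in "fiso"}
V = {g: rng.normal(scale=0.5, size=(n_hidden, n_hidden)) for g in "fiso"}
b = {g: np.zeros(n_hidden) for g in "fiso"}
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in rng.normal(size=(5, n_in)):  # run the cell over a short input sequence
    h, c = lstm_step(x, h, c, W, V, b)
```

The forget gate $f$ decides how much of the stored state survives each step, which is what lets the cell retain information over long horizons without a fixed regressor length.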
The LSTM network has several stages, which are described by:
From (6), the output of the fuzzy system is
According to function approximation theories of fuzzy systems [28], the identified nonlinear process (1) can be represented as:
Once the structure of the fuzzy system has been defined, a training algorithm must be designed to adjust its parameters or weights. In this paper, a variation of the BPTT algorithm is chosen to train the fuzzy system: BPTT is applied over a narrow "window" that only considers the values generated by the fuzzy LSTM network in the current iteration and in the immediately preceding one. In this training method, the values generated by the fuzzy LSTM network in older iterations are forgotten, which also makes the method easy to apply to online training. The training algorithm is defined by:
In (17), $\eta_W$ determines the amount by which each weight increases or decreases, while $\alpha_W$ helps to stabilize the modification by taking the past weight adjustment into account. The modeling error between the desired value and the output of the fuzzy model is defined as:
The modeling objective of the fuzzy system is
The modification (19) can be obtained by applying the chain rule, diagrammatic rules, and the signal flow of the network, and it can be organized into an array as in (17). With these considerations, the adjustment of the parameters of the fuzzy LSTM network described in (4)-(13) is easy to obtain. For example, if we consider $m = 1$, $l = 1$ and $\kappa > 1$, the gradient (19) for each element of $W_i$ in the consequent part is:
A similar calculation is made for the adjustment of $W_f$, $W_s$, $W_o$, $V_f$, $V_i$, $V_s$ and $V_o$. On the other hand, for the premise part, the adjustment of $\chi_j$ in the membership functions (5) is, for example:
The adjustment of $\Upsilon_j$ is computed in a similar way. As mentioned before, in this paper we are only interested in open-loop identification; we assume that the plant (1) is bounded-input bounded-output stable, i.e., $y(k)$ and $U_r(k)$ in (1) are bounded. The following theorem gives a stable gradient descent training algorithm for the fuzzy neural model.
Each element in (27) works independently, and every element is defined in a similar manner. Here we only show the proof for $L_1(k)$.
So, $L_1$ is an ISS-Lyapunov function, and the dynamics of the identification error are input-to-state stable (ISS), since $L_1$ is a function of $e(k)$ and $\mu(k)$. The "input" corresponds to the second term of (36), i.e., the modeling error $\mu(k)$. The "state" corresponds to the first term of (36), i.e., the identification error $e(k)$. Because the input $\mu(k)$ is bounded and the dynamics are ISS, the state $e(k)$ is bounded.
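The ISS argument can be illustrated on a scalar toy system. The sketch below is not the paper's proof: it simulates $e(k+1) = a\,e(k) + \mu(k)$ with $|a| < 1$, where the bounded disturbance $\mu(k)$ plays the role of the modeling error (the "input") and $e(k)$ the identification error (the "state"). For such a system $|e(k)| \le |a|^k |e(0)| + \sup_k |\mu(k)| / (1 - |a|)$, so a bounded input yields a bounded state. The function name and the values of $a$ and $\mu$ are hypothetical.

```python
import numpy as np


def simulate_error(a, mu, e0=0.0):
    """Toy error dynamics e(k+1) = a*e(k) + mu(k) (hypothetical stand-in)."""
    e = np.empty(len(mu) + 1)
    e[0] = e0
    for k, mu_k in enumerate(mu):
        e[k + 1] = a * e[k] + mu_k
    return e


rng = np.random.default_rng(0)
mu = rng.uniform(-0.1, 0.1, size=1000)  # bounded "input": |mu(k)| <= 0.1
e = simulate_error(0.9, mu, e0=2.0)
# ISS-type bound: decaying effect of the initial state plus a gain on the input bound
bound = 0.9 ** np.arange(len(e)) * 2.0 + 0.1 / (1 - 0.9)
```

The first term of the bound vanishes as $k$ grows, so the error eventually stays within a band whose width is set only by the size of the modeling error.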
Continuing, (36) can be rewritten as
In this section, we use two examples to compare our method with other classical methods. Our fuzzy system with LSTM cells is denoted "fuzzy LSTM"; the other established intelligent algorithms are: the RNN with Kalman filter (KFRNN) [27], deep LSTM networks (LSTM) [16], the zero-order ANFIS system (ANFIS 0) [9], the first-order ANFIS system (ANFIS 1) [8], the fuzzy wavelet network (fuzzy WN) [10], and the stable fuzzy-neural network similar to KFRNN (fuzzy KFRNN) [30]. The common hyper-parameters of all these models are the same, such as the number of inputs, the number of fuzzy rules, and the training and testing data.
Mackey-Glass time series
The first example consists of generating a model for the Mackey-Glass (MG) time-delay system, also known as the MG time series:
This time series is chaotic with no clearly defined period. The series neither converges nor diverges, and the trajectory is highly sensitive to initial conditions. Equation (42) was solved for 1,200 s, and samples of the time series were taken with a sampling period $T = 1$ s, creating the vector $y(k)$ with $k = 1, \dots, 1201$. We use the values of $y(k)$ to define $U_r(k) = [y(k-3), y(k-20)]^T$, which was used to make the estimation.
We established $p = 9$ fuzzy rules for the fuzzy systems ($m = 2$, $\kappa = 3$, $l = 1$); the dimensions of the NNs were defined from several tests with different sizes, choosing the smallest NNs that offer good performance. The learning rates in (24) are chosen as $\eta = 0.8$ and $\alpha = 0.7$. The comparison results are shown in Table 1. Here the modeling error $E(k)$ at the end of each phase is defined as in (18) and represents the performance of the algorithms; a lower value indicates better performance. This table shows that all algorithms have similar average performance, but our algorithm has a slight advantage over the others.
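The rule counts in both examples ($p = 9$ with $m = 2$, $\kappa = 3$, $l = 1$, and $p = 18$ with $l = 2$) are consistent with a full grid partition, $p = l\,\kappa^m$, i.e., one rule base of $\kappa^m$ rules per output. This is our reading, not a formula stated in this section; the helper `rule_grid` is hypothetical and only enumerates the antecedent combinations.

```python
import itertools


def rule_grid(m, kappa, l):
    """Hypothetical full grid partition: kappa**m antecedent combinations per output.

    Each rule is (output index, tuple of fuzzy-set indices, one per input).
    """
    return [(out, combo)
            for out in range(l)
            for combo in itertools.product(range(kappa), repeat=m)]


rules_mg = rule_grid(m=2, kappa=3, l=1)    # Mackey-Glass example
rules_mimo = rule_grid(m=2, kappa=3, l=2)  # MIMO benchmark
```

A full grid grows exponentially with the number of inputs $m$, which is one reason the examples keep $m = 2$.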
Modeling errors of MG time series estimation (×10⁻²)
Fig. 3 shows the modeling process of the "LSTM", the "fuzzy WN" and the "fuzzy LSTM". We can see that only our method is able to generate an acceptable model for the MG time series. We show only these three algorithms because the performance of the "KFRNN" was very similar to that of the "LSTM", and the performance of the other fuzzy systems was similar to that of the "fuzzy WN". This example is important because it shows the ability of the algorithms to generate models when they do not have access to the immediate past information of a process and when the samples used to construct the model are not close to each other in time; in this case, the proposal outperforms the other algorithms.

Modeling of MG time series. The subfigures: (a) LSTM, (b) fuzzy WN, and (c) fuzzy LSTM. “A” is the time series response and “B” is the network response.
We selected the benchmark problem proposed in [29] and [23] as the second example; this problem corresponds to a MIMO (multi-input multi-output) nonlinear system in discrete time. As in the first example, a model of this system must be generated. The system is defined as:
with $n_{u1} = n_{u2} = 10, 20, 30, 40$.
As in the previous example, the vector $y(k) = [y_1(k), y_2(k)]^T$ was constructed by sampling the system with a sampling period $T = 0.01$ s, and the input vector was defined as $U_r(k) = [u_1(k), u_2(k)]$. To simulate perturbations, random values in $[-0.5, 0.5]$ are added to $U_r(k)$ and $y(k)$ in the training phase, and random values in $[-0.2, 0.2]$ are added to $U_r(k)$ and $y(k)$ in the testing phase. In this example, we used $p = 18$ fuzzy rules for the fuzzy systems ($m = 2$, $\kappa = 3$, $l = 2$); the dimensions of the NNs were defined from several tests with different sizes, choosing the smallest NNs that offer good performance.
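The perturbations can be sketched as additive noise. The text gives only the ranges, $[-0.5, 0.5]$ for training and $[-0.2, 0.2]$ for testing, so the uniform distribution, the independence across samples, and the helper name `add_uniform_noise` are assumptions.

```python
import numpy as np


def add_uniform_noise(signal, amplitude, rng):
    """Add i.i.d. noise drawn uniformly from [-amplitude, amplitude] (assumed distribution)."""
    signal = np.asarray(signal, dtype=float)
    return signal + rng.uniform(-amplitude, amplitude, size=signal.shape)


rng = np.random.default_rng(42)
y_clean = np.zeros((18001, 2))                             # placeholder for [y1(k), y2(k)]
y_train = add_uniform_noise(y_clean, 0.5, rng)             # training-phase perturbation
y_test = add_uniform_noise(np.zeros((6001, 2)), 0.2, rng)  # testing-phase perturbation
```

The same helper would be applied to $U_r(k)$; passing an explicit `rng` keeps the perturbed runs reproducible.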

Nonlinear system modeling with “fuzzy WN”. The subfigures: (a) y1 (k) in the training, (b) y2 (k) in the training, (c) y1 (k) in the testing, (d) y2 (k) in the testing. “A” is the system response and “B” is the model response.

Nonlinear system modeling with “fuzzy LSTM”. The subfigures: (a) y1 (k) in the training, (b) y2 (k) in the training, (c) y1 (k) in the testing, (d) y2 (k) in the testing. “A” is the system response and “B” is the model response.
We simulated the system as follows: we trained the algorithms to learn the system (43) with the input (44) for 180 s, obtaining 18,001 iterations for the training process; a test was performed immediately after training with the same input signal for 60 s (6,001 iterations). In addition, a test with an input different from the training input (45) was performed for 180 s, obtaining 18,001 iterations for this process.
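The iteration counts follow from sampling a closed time interval, which includes both endpoints: $180/0.01 + 1 = 18{,}001$ and $60/0.01 + 1 = 6{,}001$ (and likewise $1{,}200/1 + 1 = 1{,}201$ in the Mackey-Glass example). A small check, with a hypothetical helper name:

```python
def num_samples(duration_s, period_s):
    """Number of samples on [0, duration_s] with a fixed period, both ends included."""
    # round() guards against floating-point results such as 17999.999999999996
    return int(round(duration_s / period_s)) + 1


train_iters = num_samples(180.0, 0.01)  # training phase
test_iters = num_samples(60.0, 0.01)    # testing with the same input
```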
Table 2 shows the modeling errors, computed according to (18), obtained for each intelligent algorithm at the end of the training and testing phases. In this table, the NNs seem to perform better than the fuzzy systems, in the sense that they converge fast and offer a lower modeling error. Only our proposal has a performance similar to (even slightly better than) that of the NNs.
Modeling errors of the nonlinear system estimation (×10⁻²)
Figs. 4 and 5 show the modeling processes of the "fuzzy WN" and the "fuzzy LSTM". We show these two because, in this example, all the other fuzzy systems performed similarly to the "fuzzy WN", and the other neural models performed similarly to the "fuzzy LSTM". So, our proposal can generate an acceptable model for nonlinear systems with fast convergence, like an NN, while offering a more complete approach: a gray-box model instead of a black-box model.
As shown in the above figures and tables, the proposed model offers very good modeling results for both the time series and the nonlinear system. It also has better robustness and adaptability. Our method has better testing results for multi-step prediction, or when some recent data are not available.
To obtain better approximation and faster training for TS models, we propose a novel fuzzy-neural network that applies LSTM networks within the TS model. This model can be interpreted as a more complete LSTM network. Since the data for the model can be explained in the sense of fuzzy systems, the performance of both TS models and LSTM models is improved.
We also design a fast training method for this fuzzy LSTM network to overcome the training problem of LSTM networks when they are applied in online settings. A stability analysis of the proposed algorithm is given. We use two examples to compare our model with other intelligent algorithms; the results show that the new model is faster and performs better than the other algorithms for nonlinear system identification. Future work will address real-world applications of the proposed fuzzy-neural network.
