Abstract
Chaotic systems are dynamic systems with aperiodic and pseudo-random properties, and systems in many fields exhibit chaotic time-series behavior. To address the fuzzy modeling of chaotic time series, this paper proposes a new fuzzy identification method that incorporates the selection of important input variables. The aim is to achieve higher modeling and prediction accuracy with a structurally simple model. First, the Two Stage Fuzzy Curves method is used to determine the weight of the correlation between each candidate input variable and the output, and the relevant input variables are then quickly selected from a large number of candidates according to the resulting input variable index. Next, the center and width of the irregular Gaussian membership function are optimized using the fuzzy C-means clustering algorithm and particle swarm optimization, which determines the premise parameters of the fuzzy model. Finally, the conclusion parameters of the fuzzy model are determined using the recursive least squares method. The model is applied to three chaotic time series, and the simulation results are compared and analyzed. The results show that the proposed fuzzy identification method achieves higher prediction accuracy with a simpler structure, which demonstrates its validity.
Keywords
Introduction
Chaotic systems are dynamic systems with aperiodic and pseudo-random properties, and they have been studied extensively over the past few decades [1, 2]. Traditional prediction of chaotic time series is based on the dynamic trajectory. Phase space reconstruction is used to recover the trajectory of the underlying system from the one-dimensional chaotic time series, and other methods are then used to model the evolution law of that trajectory, which makes prediction possible [3]. With the development of technology, machine learning and deep learning have been applied to chaotic time series prediction. These methods can be roughly divided into four categories. The first is local models based on phase space reconstruction, such as the zero-order local model [4]. The second is global models based on phase space reconstruction, such as the recurrent predictive neural network [5] and the neural fuzzy network [6]. The third uses a variety of machine learning techniques to predict chaotic time series directly [7]. The fourth uses deep learning methods, such as the long short-term memory (LSTM) neural network and its improved variants [8]. In recent years, because of the shortcomings and limitations of single forecasting models, more and more research has addressed combined forecasting models. Reference [9] combined a chaotic linear regression model with an Elman neural network model to model and predict the Lorenz chaotic time series. A prediction model based on a hybrid neural network and an attention mechanism (Att-CNN-LSTM) was proposed in [10]. First, phase space reconstruction and data normalization were carried out on the chaotic time series. Next, a convolutional neural network (CNN) was used to extract the spatial features of the reconstructed phase space. Then, the features extracted by the CNN were combined with the original time series, and the temporal features were extracted with an LSTM network. Finally, the key temporal and spatial features of the time series were captured by the attention mechanism, and the final prediction was produced. Combined models of this kind do improve prediction accuracy, but they also increase model complexity.
A fuzzy model is essentially a nonlinear model that can approximate any nonlinear system to arbitrary accuracy, and it combines a simple structure with high identification accuracy. Fuzzy logic based on fuzzy models has therefore become an effective way to describe complex, ill-defined, and nonlinear dynamic systems. Reference [11] combined the Takagi-Sugeno (T-S) fuzzy model with long short-term memory (LSTM) units to overcome the problems of high dimensionality, uncertainty, and a complex learning process. That model made full use of the interpretability of fuzzy systems and the good approximation ability of LSTM units. Some research has also been carried out on fuzzy models for the identification and control of chaotic systems [12]. Fuzzy clustering is one of the methods commonly used in chaotic system modeling. Reference [13] proposed a fuzzy modeling method based on hierarchical fuzzy clustering, which optimized the structure of the T-S fuzzy model through a series of steps and realized the modeling and prediction of nonlinear systems. Reference [14] used algorithms based on nearest-neighbor fuzzy clustering and robust fuzzy clustering. By introducing a local division correlation factor, each group of input sample vectors was smoothed, and the robustness, identification accuracy, and identification speed of system modeling were improved. Both of these works use fuzzy clustering to predict chaotic time series, which indicates that fuzzy clustering is well suited to prediction with the T-S fuzzy model. However, neither considered the influence of the input variables on the accuracy of fuzzy system modeling and prediction.
Generally speaking, building a T-S fuzzy model involves two parts: structure identification and parameter identification. The selection of input variables, the determination of fuzzy rules, and the partition of the fuzzy space all belong to structure identification. In fuzzy modeling, the relative importance of input variable selection, the rest of structure identification, and parameter identification is reported to be 100:10:1 [15], which shows how important input variable selection is in T-S fuzzy modeling. For an unknown system, the number of fuzzy rules grows exponentially if all candidate variables are used, whereas in practice the rule base must remain finite and small. For fuzzy rule-based identification problems, selecting the important input variables resolves the conflict between improving model accuracy and reducing the size of the rule base. Fuzzy modeling is usually based on input variables fixed by experience, so it is difficult to guarantee that the input set contains no redundant variables, and redundant variables directly reduce the identification accuracy of the model. For the fuzzy modeling of nonlinear systems, a robust input variable selection (IVS) method is therefore needed, one that does not rely on prior knowledge or assumptions about the system, has low computational cost, and can represent the nonlinear and interdependent relationships between the candidate inputs. In [16], a two-stage fuzzy curves and surfaces (TSFCS) method was proposed to identify the input structure of nonlinear systems. The first stage fuzzy curve is the local average of the output for each input, and the second stage fuzzy curve is the local estimate of the variance. The two-stage fuzzy surface is the two-dimensional analog of the two-stage fuzzy curve. The method can automatically and quickly identify the important input variables of a system with a large number of candidate inputs.
In this paper, this method of selecting important input variables is introduced into fuzzy system modeling.
In the identification of the premise parameters, the partition of the fuzzy subsets is determined by the premise membership function. A membership function describes the degree of membership in a fuzzy subset and plays an important role in determining the fuzzy transformation. This paper therefore designs an irregular Gaussian function to describe the fuzzy subsets of the T-S fuzzy model. In practice, when this function is used as a membership function, its center and width determine whether the fuzzy partition is reasonable. However, the function itself cannot optimize or update these two parameters automatically, so an appropriate optimization algorithm is used to determine the center and width of the Gaussian function.
To solve the above problems, a Mackey-Glass chaotic time series prediction model based on input variable selection is proposed in this paper. First, the two stage fuzzy curve method is used to quickly select the important input variables from a large number of candidate input variables. Next, the fuzzy C-means clustering algorithm is used to obtain the centers of the input data, which are taken as the centers of the irregular Gaussian functions. Then, the PSO algorithm is used to optimize the widths of the irregular Gaussian functions while the centers are held fixed. Finally, the recursive least squares algorithm is used to identify the conclusion parameters of the fuzzy model. The proposed method is used to predict the Mackey-Glass chaotic time series, and the simulation results verify its effectiveness.
T-S fuzzy model
The T-S fuzzy dynamic model is described by fuzzy IF-THEN rules that locally represent the input-output relationship of the nonlinear model [17]. First, considering the data set
Using product inference, the singleton fuzzifier, and the center-average defuzzifier, the defuzzified output of the entire T-S fuzzy model can be expressed as:
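As a minimal sketch of how such an output can be computed, assume ordinary Gaussian membership functions and first-order (linear) rule conclusions. The firing strength of each rule is the product of its memberships, and the model output is the firing-strength-weighted average of the rule outputs. The function name `ts_output` and the array layouts are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def ts_output(x, centers, widths, coeffs):
    """Output of a first-order T-S fuzzy model for one input vector.

    x:       (n,) input vector
    centers: (c, n) Gaussian centers, one row per rule (assumed layout)
    widths:  (c, n) Gaussian widths, one row per rule (assumed layout)
    coeffs:  (c, n + 1) conclusion parameters [p0, p1, ..., pn] per rule
    """
    x = np.asarray(x, dtype=float)
    # Membership of each input in each rule's fuzzy set (plain Gaussian, assumed).
    mu = np.exp(-((x - centers) ** 2) / (2.0 * widths ** 2))
    # Product inference: a rule's firing strength is the product over inputs.
    w = mu.prod(axis=1)
    # Local linear conclusions y_i = p0 + p1*x1 + ... + pn*xn.
    y_local = coeffs[:, 0] + coeffs[:, 1:] @ x
    # Center-average defuzzification: firing-strength-weighted mean.
    return float(np.sum(w * y_local) / np.sum(w))
```

With two rules whose centers are equally far from the input and whose widths are equal, both rules fire equally, so the output is the plain average of the two local outputs.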
Selection of important input variables
The TSFCS method in [16] is best suited to situations in which the input variables are interdependent. Its first stage fuzzy curves are local averages of the output for each input, and its second stage fuzzy curves are local estimates of the variance. The two stage fuzzy surfaces are the two-dimensional analogs of the two stage fuzzy curves. In the benchmark problems for the T-S fuzzy model, the correlation between the input variables is very small, so the TSFCS method can be simplified to make it more suitable for the T-S fuzzy model. In this section, the TSFCS method is simplified to a two stage fuzzy curve (TSFC) method, which gives the weight of the correlation between each input variable and the output. From these weights, an importance index is obtained for every input variable, and the important input variables are selected according to this index. The flow chart of the algorithm is shown in Fig. 1.

Box and Jenkins example (case 1) fuzzy model performance.
The first stage fuzzy curves are based on a simple idea: the most important inputs do the best job of approximating the output. Suppose that the fuzzy system has $M$ candidate input variables $x_1, x_2, \ldots, x_M$, one output $y$, and $N$ input-output data pairs $(x_{k1}, x_{k2}, \ldots, x_{kM}, y_k)$.
First, for each input $x_i$ ($i = 1, 2, \ldots, M$), a Gaussian membership function $\mu_{ki}(x_i)$ in the $x_i$-$y$ space is defined as follows:
Then, using formula (4), we create a fuzzy curve
If the degree of correlation between the variable $x_i$ and the output $y$ is higher than that between $x_j$ and $y$, then $\forall k$, the value of
Under i ≠ j conditions, if
Using the first stage fuzzy curves, the second stage fuzzy curves can be given as follows:
If ∀k, there is a significant difference between
In contrast to
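The idea behind the first stage fuzzy curve can be sketched as follows. A Gaussian membership function is placed on every sample of one candidate input, and the curve is the membership-weighted local average of the output. This is a sketch under stated assumptions: the width `b` (20% of the input range), the function names `fuzzy_curve` and `rank_inputs`, and the ranking score (mean squared error of the curve as an approximation of the output) are illustrative choices and are not the paper's performance index.

```python
import numpy as np

def fuzzy_curve(xi, y, b=None):
    """First stage fuzzy curve of one candidate input, evaluated at the samples."""
    xi = np.asarray(xi, dtype=float)
    y = np.asarray(y, dtype=float)
    if b is None:
        # Assumed width: 20% of the input range (illustrative choice).
        b = 0.2 * (xi.max() - xi.min())
    # mu[k, m]: membership of sample xi[m] in the fuzzy set centered at xi[k].
    mu = np.exp(-((xi[:, None] - xi[None, :]) / b) ** 2)
    # Local (membership-weighted) average of the output at each sample.
    return (mu * y[:, None]).sum(axis=0) / mu.sum(axis=0)

def rank_inputs(X, y):
    """Rank candidate inputs by how well their fuzzy curve approximates y.

    Returns (order, scores); a lower score means a more important input.
    The score is a hypothetical stand-in for the paper's index.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = np.array([
        np.mean((fuzzy_curve(X[:, i], y) - y) ** 2) for i in range(X.shape[1])
    ])
    return np.argsort(scores), scores
```

An input that the output actually depends on yields a curve that tracks the output, whereas an irrelevant input yields a nearly flat curve close to the mean of the output, so its score is close to the output variance.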
The fuzzy identification methods in [18] and [19] show that the Gaussian shape is well suited to membership functions for fuzzy identification, and that determining its two parameters (center and width) is the key step. To improve identification accuracy, particle swarm optimization (PSO) is used in the literature to optimize the center and width of the Gaussian function simultaneously, which improves accuracy but increases the optimization time. To avoid a complex iterative optimization process and to simplify the model structure, this paper designs an irregular Gaussian function, whose shape is shown in Fig. 2. Fuzzy C-means (FCM) clustering is combined with the irregular Gaussian function: the clustering automatically finds the centers of the Gaussian membership functions, which makes up for the fact that the centers cannot otherwise be determined automatically.
Membership function of system.
T-S fuzzy model identification approach based on TSFC.

Step 1: Selection of important input variables
Step 2: Fuzzy C-means cluster center acquisition
Fuzzy C-means (FCM) clustering, proposed by Bezdek in 1981 [21], is one of the most widely used clustering methods. Because it is efficient, simple, and easy to implement, it has been widely used in pattern recognition and data clustering. Among the many methods for automatically generating fuzzy IF-THEN rules, fuzzy clustering is the most popular. The main idea of the algorithm is to minimize the following objective function:
Each element $\varphi_{ik} \in [0, 1]$ of the partition matrix quantifies the degree to which the data point $X_k$ belongs to cluster $i$, and satisfies the constraints expressed by the following equation:
The FCM algorithm minimizes the objective function in formula (4) with respect to the cluster centers and the fuzzy partition coefficients, and the update relations are then defined from the results of this minimization:
According to the given data set (X k , y k ), the FCM algorithm is summarized into the following steps:
(1) Initialize the number of iterations l = 0, set the number of clusters c, weight index m, stop criterion ɛ, and initialize the fuzzy partition matrix Φ randomly;
(2) Update the cluster center
(3) Use formula (5) to update the distances $D_{ik}$ to each cluster center;
(4) Update the fuzzy partition matrix $\Phi = (\varphi_{ik})$ using formula (8);
(5) If $\|\Phi^{(l)} - \Phi^{(l-1)}\| > \varepsilon$, go back to step (2); otherwise stop.
According to the above algorithm, the final cluster center
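The iteration above can be sketched in a few lines. This is a minimal sketch of the standard FCM updates, assuming Euclidean distance; the function name `fcm`, the defaults `m=2.0` and `eps=1e-5`, and the iteration cap `max_iter` are illustrative assumptions.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Standard fuzzy C-means on data X of shape (N, n).

    Returns (centers, phi): centers is (c, n), phi is the (c, N) partition matrix.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step (1): random partition matrix whose columns sum to 1.
    phi = rng.random((c, X.shape[0]))
    phi /= phi.sum(axis=0)
    for _ in range(max_iter):
        # Step (2): cluster centers as membership-weighted means of the data.
        w = phi ** m
        centers = (w @ X) / w.sum(axis=1, keepdims=True)
        # Step (3): Euclidean distance between center i and data point k.
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        # Step (4): phi_ik = 1 / sum_j (D_ik / D_jk)^(2 / (m - 1)).
        inv = d ** (-2.0 / (m - 1.0))
        new_phi = inv / inv.sum(axis=0)
        # Step (5): stop once the partition matrix no longer changes.
        converged = np.linalg.norm(new_phi - phi) <= eps
        phi = new_phi
        if converged:
            break
    return centers, phi
```

In the setting described here, the returned `centers` would serve as the centers of the membership functions, one cluster per fuzzy rule.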
Step 3: Obtain the width of Gaussian function
In the PSO algorithm, each potential solution of the optimization problem is a "particle" in the search space. A particle has only two attributes: velocity and position. The velocity describes how fast and in which direction the particle moves, and the position represents a candidate solution. Each particle has a fitness value, and it searches the space for the optimal solution by following the best particle found so far. The PSO algorithm first initializes the particle swarm randomly, so that each particle has an initial position and velocity, and then improves the swarm iteratively.
Assume that the target search space is $d$-dimensional and that the swarm consists of $N$ particles. The position and velocity of the $i$th particle are recorded as the $d$-dimensional vectors $B_i = (b_{i1}, b_{i2}, \ldots, b_{id})$ and $V_i = (v_{i1}, v_{i2}, \ldots, v_{id})$, $i = 1, 2, \ldots, N$. The best position found so far by the $i$th particle, that is, the individual extreme value, is recorded as $P_{id} = (p_{i1}, p_{i2}, \ldots, p_{id})$, and the best position found so far by the entire swarm, that is, the global extreme value, is recorded as $P_{gd} = (p_{g1}, p_{g2}, \ldots, p_{gd})$. The particle population is initialized randomly and then updated iteratively.
In each iteration, each particle follows its individual best solution $P_{id}$ and the population best solution $P_{gd}$, and updates its velocity and position according to the following formula:
Here, $j = 1, 2, \ldots, d$; $c_1$ and $c_2$ are the adjustment factors of $P_{id}$ and $P_{gd}$, also called acceleration constants; $r_1$ and $r_2$ are random numbers between 0 and 1 that add randomness to the search; and $\omega$ is the inertia factor, which can be calculated by the linear iterative equation shown in formula (18):
In the formula, $\omega_{\min}$ and $\omega_{\max}$ are the minimum and maximum values of the inertia weight, respectively, $DT$ is the current iteration number, and $MaxDT$ is the maximum number of iterations. As the number of iterations increases, the inertia weight decreases linearly from 0.9 to 0.4. This gives the particle swarm a stronger search ability at the beginning of the iteration, so that it can explore the solution space over a large range and make full use of its global search capability. In the later stage, as the best fitness of the population improves, the particles gather around a better position, and a finer search is needed to exploit the local search ability of PSO. The iteration terminates when the preset maximum number of iterations or the preset minimum fitness threshold is reached. The width of the Gaussian function
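A minimal PSO sketch with a linearly decreasing inertia weight is given below. It assumes the common form $\omega = \omega_{\max} - (\omega_{\max} - \omega_{\min}) \cdot DT / MaxDT$, a fitness that is minimized, and box bounds on the search space. The function name `pso`, the acceleration constants `c1 = c2 = 2.0`, the swarm size, and the bounds are illustrative assumptions.

```python
import numpy as np

def pso(fitness, dim, n_particles=30, max_dt=100, lo=0.0, hi=1.0,
        c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, seed=0):
    """Minimize `fitness` over the box [lo, hi]^dim with basic PSO."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    p_best = pos.copy()                              # individual best positions
    p_val = np.array([fitness(p) for p in pos])      # individual best fitness
    g_best = p_best[p_val.argmin()].copy()           # global best position
    for dt in range(max_dt):
        # Inertia weight decreases linearly from w_max to w_min.
        w = w_max - (w_max - w_min) * dt / max_dt
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (p_best - pos) + c2 * r2 * (g_best - pos)
        pos = np.clip(pos + vel, lo, hi)
        val = np.array([fitness(p) for p in pos])
        better = val < p_val
        p_best[better] = pos[better]
        p_val[better] = val[better]
        g_best = p_best[p_val.argmin()].copy()
    return g_best, float(p_val.min())
```

For width optimization, `fitness` would map a vector of candidate widths to the model error obtained with the cluster centers held fixed; that mapping depends on the membership function and is not shown here.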
Step 4: Obtain the premise membership function
In this paper, an irregular Gaussian membership function is designed to identify the premise parameters; it is given in formula (19). Putting the center $v_i$ and width
Step 5: Use the recursive least squares algorithm to identify the conclusion parameters
Suppose the identification object is $P(X, Y)$, where $X$ is the input of the system and $Y$ is the output, with $X \in R^N$ and $Y \in R^q$. Because such a MIMO system can be divided into $q$ MISO subsystems for identification, only the identification of a MISO system is discussed. The model used in this paper is the T-S fuzzy model given in formula (20). Definition:
From the formula (2), the output of the system can be expressed as:
Substituting the $N$ input-output data pairs into the above equation gives a matrix equation:
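Once the model is written as a linear regression in the conclusion parameters, recursive least squares can process the data pairs one at a time. The sketch below is the textbook RLS recursion; the function name `rls`, the forgetting factor `lam`, and the initial covariance scale `delta` are illustrative assumptions. For a T-S model, each regressor row would typically hold the normalized firing strengths multiplied by $[1, x_1, \ldots, x_n]$, which is also an assumption here.

```python
import numpy as np

def rls(Phi, y, lam=1.0, delta=1e6):
    """Recursive least squares estimate of theta in y ~ Phi @ theta.

    Phi: (N, p) regressor matrix, one row per data pair
    y:   (N,) observed outputs
    """
    Phi = np.asarray(Phi, dtype=float)
    y = np.asarray(y, dtype=float)
    p = Phi.shape[1]
    theta = np.zeros(p)        # initial parameter estimate
    P = delta * np.eye(p)      # large initial covariance: weak prior
    for phi_k, y_k in zip(Phi, y):
        P_phi = P @ phi_k
        gain = P_phi / (lam + phi_k @ P_phi)
        # Correct the estimate with the prediction error of this sample.
        theta = theta + gain * (y_k - phi_k @ theta)
        # Covariance update (P is symmetric, so phi_k^T P equals P_phi^T).
        P = (P - np.outer(gain, P_phi)) / lam
    return theta
```

Because each update uses only the current data pair and the stored covariance, the parameters can be refreshed as new samples arrive without solving the full matrix equation again.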
Step 6: Calculate the mean square error (MSE) performance index using the following formula. If the MSE meets the required identification accuracy, the identification algorithm ends; otherwise, increase the number of clusters c and go to Step 2.
To verify the predictive ability of the fuzzy model based on the selection of important input variables, experiments are carried out on the Logistic, Lorenz, and Mackey-Glass chaotic time series. Several indicators are used to evaluate the prediction error. The Root Mean Square Error (RMSE) measures the degree of dispersion of the deviations between the predicted values and the true values. The Mean Absolute Error (MAE) avoids the mutual cancellation of positive and negative errors, so it reflects the size of the actual prediction error. The Mean Absolute Percentage Error (MAPE) describes accuracy in relative terms and is widely used as a statistical measure of forecast accuracy, for example in time series forecasting. In addition to these three indicators, the Root Mean Square Percentage Error (RMSPE), which is commonly used to evaluate time series forecasting performance, is introduced, namely:
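The four indicators can be computed as in the sketch below, which uses their common textbook definitions. The percentage errors are returned as fractions rather than multiplied by 100, which is an assumption, as is the requirement that the true values are nonzero.

```python
import numpy as np

def rmse(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    # Relative error as a fraction; assumes no true value is zero.
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs((y - y_hat) / y)))

def rmspe(y, y_hat):
    # Root mean square of the relative errors; assumes no true value is zero.
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean(((y - y_hat) / y) ** 2)))
```

For example, with true values `[1, 2, 4]` and predictions `[1, 1, 2]`, the absolute errors are 0, 1, and 2, so the MAE is 1.0, and the relative errors are 0, 0.5, and 0.5, so the MAPE is one third.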
The logistic chaotic mapping equation is
For 3 ≤ μ ≤ 4, the Logistic map passes from periodic behavior to chaos as μ increases (chaos sets in at μ ≈ 3.57). In this paper, the initial value is $x_0 = 0.32$ and $\mu = 3.8$. To ensure that the series is in the chaotic regime, the first 1000 data points are discarded, and the following 3000 data points are used as the model data samples.
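The data generation described above can be reproduced with a short sketch, assuming the standard Logistic map $x_{n+1} = \mu x_n (1 - x_n)$. The function name `logistic_series` and its parameter names are illustrative.

```python
def logistic_series(x0=0.32, mu=3.8, n_discard=1000, n_keep=3000):
    """Iterate the Logistic map and keep n_keep values after a transient."""
    x = x0
    out = []
    for k in range(n_discard + n_keep):
        x = mu * x * (1.0 - x)
        # Skip the first n_discard iterates so the transient has died out.
        if k >= n_discard:
            out.append(x)
    return out
```

With $\mu = 3.8$ every iterate stays in the interval (0, 0.95], because the map attains its maximum value $\mu / 4$ at $x = 0.5$.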
Using the 3000 input-output data pairs obtained from formula (1), let $x_i = x(t - i)$, $i = 1, \ldots, 50$, and $y = x(t)$,
Using the TSFC method proposed in Section 3, the performance index $P_i$ of each variable $x_i$ is calculated, and the variables are listed in order of importance (see Table 1). Table 1 shows that $x(t - 1)$, $x(t - 2)$, $x(t - 3)$, and $x(t - 4)$ are the four important input variables of the chaotic system. The number of fuzzy rules is set to 3, and a fuzzy model based on the selected important input variables is trained and used for prediction. Both the conventional Gaussian membership function and the irregular Gaussian membership function proposed in this paper are used. The prediction results and errors of the fuzzy model based on the irregular Gaussian membership function are shown in Fig. 4 and 5. To show the advantage of the proposed membership function, Table 2 compares the prediction results of the fuzzy model based on the conventional Gaussian function with those of the fuzzy model based on the irregular Gaussian function.
List of performance index for Logistic time series
Model error comparison for Logistic time series
Fig. 4 shows that the fuzzy model predicts the Logistic time series well. The model follows the trend of the series, and the error between the true values and the predicted values is small, which indicates that the proposed model achieves high prediction accuracy. As shown in Table 2, the irregular Gaussian function fuzzy model that considers important input variables has a smaller prediction error on every indicator than the fuzzy model based on the conventional Gaussian function.

Comparison of our model and the original system for Logistic time series (Testing).
The Lorenz chaotic system is described by
The initial value is x = y = z = 1, with a = 10, e = 28, and b = 8/3. To ensure that the series is in the chaotic regime, the first 1000 data points are discarded, and the following 3000 data points are used as the model data samples. The x component of the Lorenz system is selected for predictive analysis, with the first 80% of the data used as the training set and the last 20% as the test set. As before, the TSFC method proposed in Section 3 is first used to calculate the performance index $P_i$ of each variable $x_i$, and the variables are listed in order of importance (see Table 3). The fuzzy model is then trained and used for prediction, and the two membership functions are again compared to highlight the advantage of the irregular Gaussian function. The prediction results and the relative prediction error of the fuzzy model based on the irregular Gaussian function are shown in Fig. 5. Table 4 lists the prediction errors of the compared fuzzy models.
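The Lorenz data set can be generated with a sketch like the following. It assumes the standard Lorenz equations $\dot{x} = a(y - x)$, $\dot{y} = x(e - z) - y$, $\dot{z} = xy - bz$ and a fourth-order Runge-Kutta integrator. The step size `dt = 0.01` is an assumption, since the source does not state it, and the function names are illustrative.

```python
import numpy as np

def lorenz_rhs(s, a=10.0, e=28.0, b=8.0 / 3.0):
    """Right-hand side of the Lorenz equations for state s = (x, y, z)."""
    x, y, z = s
    return np.array([a * (y - x), x * (e - z) - y, x * y - b * z])

def lorenz_x_series(n_discard=1000, n_keep=3000, dt=0.01):
    """Integrate from (1, 1, 1) with RK4 and return (train, test) for x."""
    s = np.array([1.0, 1.0, 1.0])
    xs = []
    for k in range(n_discard + n_keep):
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * dt * k1)
        k3 = lorenz_rhs(s + 0.5 * dt * k2)
        k4 = lorenz_rhs(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        # Keep only the x component after the transient.
        if k >= n_discard:
            xs.append(s[0])
    xs = np.array(xs)
    split = len(xs) * 4 // 5  # first 80% for training, last 20% for testing
    return xs[:split], xs[split:]
```

The split is chronological rather than random, so the test set consists of values that follow the training data in time.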

Comparison of our model and the original system for Lorenz chaotic time series (Testing).
List of performance index for Lorenz chaotic time series
Model error comparison for Lorenz chaotic time series
The prediction results for the x component of the Lorenz system in Fig. 5 illustrate the prediction performance of the fuzzy model that considers input variable selection. On the test set, the prediction error is small and the predicted values are very close to the true values. Table 4 shows that the proposed model still performs well on the Lorenz sample set, which reflects its robustness. Compared with the fuzzy model based on the conventional Gaussian function, the fuzzy model based on the irregular Gaussian function is slightly inferior in terms of the degree of dispersion and the actual prediction error, but it shows some advantages in terms of accuracy. From this point of view, the irregular Gaussian function proposed in this paper still has practical value.
The Lorenz chaotic time series has also been modeled and predicted with a hybrid model called CEEMDAN-LSTM, which combines Complete Ensemble Empirical Mode Decomposition with Adaptive Noise and long short-term memory [22]. That model's prediction accuracy is RMSE = 1.327, MAE = 1.124, and MAPE = 0.119. The fuzzy model based on the selection of important input variables described in this paper has significant advantages over the CEEMDAN-LSTM model on all three metrics, which indicates that it achieves higher identification accuracy than this neural network model.
The first two simulation examples show the advantage of the fuzzy model in forecasting. However, because the important input variables in those examples turn out to be ranked in sequential order, the benefit of selecting important input variables is not fully demonstrated. This paper therefore adds a third simulation example, the Mackey-Glass chaotic delay differential equation, which is widely used to compare the learning and generalization capabilities of different fuzzy models. The Mackey-Glass chaotic time series is generated by the following differential equation:
The task of time series prediction is to use known values of the time series up to $x(t)$ to predict the future value $x(t + P)$, where $P$ is the number of forecast steps. To obtain the values of the time series at integer time points, the fourth-order Runge-Kutta method is used to solve formula (20) numerically. Fig. 6 shows the Mackey-Glass chaotic time series. With the initial value $x(0) = 1.2$, 1000 input-output data pairs are collected between t = 124 and t = 1123. Fig. 7 shows the phase diagram of the system.
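A sketch of this data generation is given below. It assumes the commonly used Mackey-Glass form $\dot{x}(t) = a\,x(t - \tau) / (1 + x^{10}(t - \tau)) - b\,x(t)$ with $a = 0.2$, $b = 0.1$, and $\tau = 17$. These parameter values, the step size `h = 0.1`, and the convention $x(t) = 0$ for $t < 0$ are assumptions, since the source does not state them. The delayed term is held constant within each Runge-Kutta step, which is a simplification.

```python
import numpy as np

def mackey_glass(t_end=1123, tau=17.0, a=0.2, b=0.1, x0=1.2, h=0.1):
    """Return x(t) at integer t = 0, 1, ..., t_end (assumed parameters)."""
    steps_per_unit = int(round(1.0 / h))
    delay_steps = int(round(tau / h))
    n = t_end * steps_per_unit
    x = np.zeros(n + 1)
    x[0] = x0
    for k in range(n):
        # Delayed value x(t - tau); assumed to be 0 before t = 0.
        x_tau = x[k - delay_steps] if k >= delay_steps else 0.0
        drive = a * x_tau / (1.0 + x_tau ** 10)
        # RK4 on dx/dt = drive - b*x with the delayed term frozen over the step.
        k1 = drive - b * x[k]
        k2 = drive - b * (x[k] + 0.5 * h * k1)
        k3 = drive - b * (x[k] + 0.5 * h * k2)
        k4 = drive - b * (x[k] + h * k3)
        x[k + 1] = x[k] + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    # Sample the fine grid at integer time points.
    return x[::steps_per_unit]

series = mackey_glass()
samples = series[124:1124]  # the 1000 values from t = 124 to t = 1123
```

Starting the sample window at t = 124 leaves room for the lagged inputs and skips the initial transient of the delay equation.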

Mackey-Glass chaotic time series.

Phase diagram of chaotic system.
After obtaining the 1000 input-output data pairs, let $x_i = x(t - i)$, $i = 1, \ldots, 18$, and $y = x(t + 1)$,
List of performance index for Mackey-Glass chaotic system
Using the TSFC method proposed in Section 3, the performance index $P_i$ of each variable $x_i$ is calculated, and the variables are listed in order of importance (see Table 5). Table 5 shows that $x(t - 1)$, $x(t - 2)$, $x(t - 3)$, $x(t - 4)$, $x(t - 5)$, and $x(t - 18)$ are the six important input variables of the chaotic system.
To verify the predictive performance and generalization of the T-S fuzzy model that considers the selection of important input variables, the 1000 data pairs are divided into two parts: the first 500 data pairs are used as the training sample set, and the remaining 500 data pairs are used as the test sample set.

Comparison of our model and the original system for Mackey-Glass chaotic system (Training).

Comparison of our model and the original system for Mackey-Glass chaotic system (Testing).
Comparison of the results of selecting input variables for Mackey-Glass chaotic system
Comparison of different models for Mackey-Glass chaotic system
The fuzzy modeling approach presented in this paper, which is based on the selection of important input variables and the optimization of the structure of the T-S fuzzy model, achieves high-precision modeling and prediction of nonlinear systems. The important input variable selection method removes redundant inputs, reduces the fuzzy rule base, and resolves the conflict between model accuracy and rule base size. The fuzzy subsets are described by an irregular Gaussian membership function, which allows reliable identification of the premise parameters of the fuzzy model. The recursive least squares method is used to update the conclusion parameters of the system in real time. Together, these steps provide a practical and efficient method for the identification of complicated nonlinear systems and for the prediction and control of chaotic time series. In future research, the model will be applied to the prediction of wind speed and other real systems with chaotic characteristics.
