Abstract
As a model for reasoning and decision-making based on fuzzy rules, fuzzy systems have high interpretability. However, when the data dimension increases, a fuzzy system faces the problem of “rule explosion”, making it difficult to learn and predict effectively. In this paper, the fuzzy system trained by the FLOWFS (Fast-Learning with Optimal Weights for Fuzzy Systems) algorithm is used as a sub-module, and the deep fuzzy system DFLOWFS (Deep FLOWFS) is constructed as a bottom-up hierarchical structure in the following three steps. 1) The FLOWFS algorithm assigns a weight attribute to each fuzzy rule, and the rule weights are trained by the least squares method with a regularization term to shorten training time and improve accuracy. 2) Three strategies for dividing high-dimensional inputs into multiple low-dimensional inputs are proposed: sequential division, random division and correlation division. Experiments verify that correlation division performs best. 3) A sub-module discarding method is proposed to discard poorly performing sub-modules, giving a maximum improvement of 13.8% over DFLOWFS without sub-module discarding. The optimized DFLOWFS is then verified and compared with three classic regression models on three UCI datasets. Experiments show that as the data dimension increases, DFLOWFS has not only good interpretability but also good accuracy. Furthermore, DFLOWFS performs best among all models in comprehensive scores, with good learning ability and generalization ability. Therefore, the proposed strategies with a hierarchical structure for optimal shallow fuzzy systems are effective and give new insight for fuzzy system research.
Introduction
From traditional machine learning to artificial neural networks to deep learning, artificial intelligence has provided us with great convenience in daily life, especially in fields such as facial recognition and autonomous driving [1]. However, its lack of interpretability also brings security risks [2]. As a reasoning and decision model based on fuzzy rules, a fuzzy system usually has high interpretability [3].
Although the fuzzy system is a highly interpretable artificial intelligence method, its theoretical research is not mature enough [4]. In the early stage, the extraction of fuzzy rules was mainly based on the knowledge and experience of experts, which is called fuzzy system modeling based on expert knowledge. As the amount of information gradually increases, a lack of expert knowledge becomes inevitable. Therefore, research began to turn to data-driven fuzzy system modeling, especially the Wang-Mendel (WM) method proposed by Wang Lixin and Mendel [5–7]. It extracts fuzzy rules from data and no longer relies mainly on expert knowledge and experience, greatly improving the intelligence level of fuzzy systems. However, the accuracy of the WM method is not high enough, which inevitably affects the reliability of decision-making. Therefore, how to improve the accuracy of fuzzy system modeling has become the key issue [8].
Many scholars have tried new methods, such as those based on fuzzy clustering, genetic algorithms, and neural networks. Guo J [9] proposed an improved WM method based on the approximate fuzzy c-means (AFCM) algorithm to generate fuzzy rules, which improves the performance of fuzzy systems. Zhang R [10] combined an improved genetic algorithm (IGA) with a dynamic autoregressive with exogenous inputs (ARX) model to propose a Takagi-Sugeno (T-S) fuzzy model that has good modeling accuracy and a simple structure. Harifi [11] proposed using the newly released Emperor Penguin Swarm Algorithm to optimize ANFIS (adaptive-network-based fuzzy inference system), achieving higher model accuracy. Liu [12] proposed a Hebbian-based rule reduction algorithm (HeRR) and a new rough-set-based attribute selection algorithm, RS-HeRR, for fuzzy system rule reduction, which are used to generate effective, interpretable and compact rule sets. Although these methods improve the accuracy of fuzzy system modeling, they all have some shortcomings. On the one hand, the models are relatively complex and the optimization time is too long, and most of them start from a single direction without jointly considering the domain partition, membership function parameters, number of rules, fuzzy system structure, and other factors. On the other hand, they struggle with high-dimensional data when fuzzy systems face the “dimensional disaster”. The DFLOWFS method proposed in this article is a fuzzy system modeling method that can process high-dimensional data efficiently and quickly. Compared with other methods, it has better learning ability and generalization ability as well as higher interpretability. As the data dimension increases, DFLOWFS still performs well and fits the data better. It therefore has both theoretical and practical significance.
FLOWFS: Fast learning-algorithm with optimal weights for fuzzy systems
The algorithm that assigns weight attributes to fuzzy rules and trains the rule weights through the least squares method is named FLOWFS (Fast Learning-algorithm with Optimal Weights for Fuzzy Systems). The least squares method is a numerical optimization technique that finds the optimal parameters of the function to be fitted by minimizing the sum of squared errors between the true and fitted values. It can be solved easily and quickly, has solid mathematical support, and gives good fitting accuracy. When FLOWFS uses the least squares method to train weights, it adds a regularization term to counter multicollinearity in the feature matrix, which also restrains overfitting and improves the generalization ability of the model. This significantly reduces the time complexity of the algorithm, shortens its running time, and improves accuracy. DFLOWFS therefore builds on fuzzy systems that are fast and accurate in low-dimensional space: a high-dimensional problem is transformed into multiple low-dimensional problems whose solutions are combined, so that high-dimensional data can be processed effectively. The specific implementation steps of FLOWFS are shown in Table 1.
FLOWFS algorithm implementation steps
Through the analysis, the optimal structure is searched from all the fuzzy set structures, thus changing the structure of the initial fuzzy sets to adapt to the data, and in turn changing the membership degree of the data in the rules [13]. Compared with the initial structure, the membership degree of the data on the rules expands or shrinks by a certain multiple. To achieve this effect, we assign weight attributes to the fuzzy rules and change the membership degree of the data on the rules through the rule weights, which is equivalent to changing the structure of the fuzzy sets [14, 15].
For the extraction of fuzzy rules in the fuzzy system [16], a complete fuzzy rule base is obtained by training the full rule base, so it is only necessary to determine the weights of the rules. At the same time, it is not necessary to reduce redundant rules after adding rule weights. When the weight of a fuzzy rule is 0 or close to 0, the rule does not take effect in the model, which is equivalent to the rule being removed. The addition of fuzzy rule weights therefore not only simplifies the optimization of the fuzzy set structure but also removes redundant rules, which greatly reduces the training time and the time complexity of the model.
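The rule-weight training described above can be illustrated with a small sketch of regularized (ridge-style) least squares. This is not the authors' implementation: the matrix `phi` (one row per sample, one column per rule, holding rule firing strengths), the function name `ridge_weights`, and the penalty `lam` are all assumptions made for illustration, and the exact form of the regularization term used by FLOWFS is not given in the text.

```python
def ridge_weights(phi, y, lam=1e-2):
    """Solve (phi^T phi + lam * I) w = phi^T y for the rule weights w.

    phi : list of rows, one per sample; phi[k][i] is the (assumed) firing
          strength of rule i on sample k.
    y   : list of target values, one per sample.
    lam : regularization strength; lam > 0 keeps the system solvable even
          when columns of phi are collinear.
    """
    n_rules = len(phi[0])
    # Build the regularized normal equations A w = b.
    a = [[sum(row[i] * row[j] for row in phi) + (lam if i == j else 0.0)
          for j in range(n_rules)] for i in range(n_rules)]
    b = [sum(row[i] * t for row, t in zip(phi, y)) for i in range(n_rules)]

    # Forward elimination with partial pivoting.
    for col in range(n_rules):
        pivot = max(range(col, n_rules), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n_rules):
            factor = a[r][col] / a[col][col]
            for c in range(col, n_rules):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    w = [0.0] * n_rules
    for i in reversed(range(n_rules)):
        tail = sum(a[i][j] * w[j] for j in range(i + 1, n_rules))
        w[i] = (b[i] - tail) / a[i][i]
    return w


# Two identical (perfectly collinear) rule columns: plain least squares has
# no unique solution here, but the regularized system does.
phi = [[1.0, 1.0], [1.0, 1.0]]
weights = ridge_weights(phi, [2.0, 2.0], lam=1.0)
print(weights)
```

In this toy case the two collinear rules receive equal weights of about 0.8, which shows how the regularization term resolves the multicollinearity that would otherwise make the normal equations singular.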
FLOWFS is a fuzzy system modeling algorithm based on rule weight optimization. It assigns weight to each fuzzy rule and trains the rule weights by the least squares method with a regularization term [17]. The accuracy of the model has also been improved. The fuzzy system trained by the FLOWFS is used as a sub-module in the deep fuzzy system and the deep fuzzy system is constructed from the bottom up [18]. The deep fuzzy system is named DFLOWFS, and its algorithm steps are shown in Table 2. Taking the window size of 3 and the sliding step size of 2 as an example, the structure of the DFLOWFS is shown in Fig. 1.
DFLOWFS algorithm steps

Structure diagram of deep fuzzy system.
Suppose an n-dimensional input and single-output training data
Sub-dataset
We train the second layer in the same way as we train the first layer. A submodule is constructed with a window size of 3 and a window sliding step size of 2. First, the dataset will be divided into
During the construction process, in the case of one-dimensional convolution, the data dimension in the last window of each layer may have the problem of insufficient window size. For example, for a four-dimensional input data, assuming that the window size is 3 and the sliding step size of the window is 2, then after the window is slid once, the data in the window only includes the third dimension and the fourth dimension, less than three dimensions. Therefore, this paper uses random filling to fill the vacancies, and randomly selects the missing number of data dimensions from the data dimensions that are not in the current window to fill. In the above example, this method randomly selects one dimension from the first and the second dimensions, and forms a three-dimensional sub-dataset with the third and the fourth dimensions, which satisfies the window size.
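The windowing and random-filling procedure described above can be sketched as follows. The function name `sliding_windows` and its parameters are hypothetical, and the stopping rule (stop once a window reaches the last dimension) is an assumption inferred from the four-dimensional example in the text.

```python
import random


def sliding_windows(dims, window=3, step=2, rng=None):
    """Split an ordered list of feature indices into fixed-size windows.

    If the last window is short, it is padded with dimensions drawn at
    random from those not already in that window (random filling).
    Assumes unique dimension labels and 1 <= step <= window <= len(dims).
    """
    if not 1 <= step <= window <= len(dims):
        raise ValueError("require 1 <= step <= window <= len(dims)")
    rng = rng or random.Random()
    windows = []
    start = 0
    while True:
        current = list(dims[start:start + window])
        missing = window - len(current)
        if missing > 0:
            # Random filling: choose from dimensions outside this window.
            candidates = [d for d in dims if d not in current]
            current += rng.sample(candidates, missing)
        windows.append(current)
        if start + window >= len(dims):
            break  # this window already reached the last dimension
        start += step
    return windows


# The four-dimensional example from the text: window size 3, step 2.
print(sliding_windows([1, 2, 3, 4], window=3, step=2))
```

For the input `[1, 2, 3, 4]` the first window is `[1, 2, 3]`, and the second window contains dimensions 3 and 4 plus one of dimensions 1 or 2 chosen at random, matching the example above.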
High-dimensional input partition strategy
In Section 3, we divided the sub-datasets according to the initial order of the data, and we defined this division as sequential division. Sequential division can be regarded as an unguided division, which only constructs sub-datasets in a continuous range based on the initial order, so the initial order of data input features will affect each sub-fuzzy system, thereby affecting the performance of deep fuzzy systems.
In this section, we propose two other partitioning strategies. The first, which we define as random division, is also an unguided division strategy. A random shuffle function is used to shuffle the data dimensions, which introduces randomness into the division of the dataset so that it no longer depends on the initial data order [20]. The other is defined as correlation division. This strategy first calculates the Pearson correlation coefficient between each dimension of the input features and the target variable according to Formula (3), then arranges the data dimensions in descending order of correlation coefficient, and finally divides the data sequentially [21]. This is a guided partitioning strategy. Suppose there is a four-dimensional input, and we number the dimensions 1, 2, 3, and 4. If, after sorting by correlation coefficient in descending order, the order of the dimensions becomes 3, 2, 4, and 1, then the sub-datasets are divided in this order according to the sequential division strategy. If the window size is 3 and the sliding step is 1, the third, second and fourth dimensions form the first sub-dataset, and the second, fourth and first dimensions form the second sub-dataset.
Among them,
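As an illustration of the three division strategies, the sketch below computes the order in which input dimensions would be handed to the windowing step. The names `pearson` and `division_order` and the strategy labels are hypothetical. The coefficient is the standard sample Pearson correlation, and dimensions are sorted by the signed coefficient in descending order, as the text states; whether the authors sort by absolute value instead is not specified. Indices are 0-based here.

```python
import math
import random


def pearson(x, y):
    """Standard Pearson correlation coefficient between two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # assumption: treat a constant column as uncorrelated
    return cov / (sd_x * sd_y)


def division_order(columns, target, strategy="corr", rng=None):
    """Return the order of feature indices for a division strategy.

    columns  : list of feature columns (each a list of values).
    target   : list of target values.
    strategy : "order" (sequential), "random", or "corr" (correlation).
    """
    dims = list(range(len(columns)))
    if strategy == "order":
        return dims
    if strategy == "random":
        (rng or random.Random()).shuffle(dims)
        return dims
    if strategy == "corr":
        scores = [pearson(col, target) for col in columns]
        return sorted(dims, key=lambda d: scores[d], reverse=True)
    raise ValueError(f"unknown strategy: {strategy}")


target = [1.0, 2.0, 3.0, 4.0]
columns = [
    [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with the target
    [1.0, 3.0, 2.0, 4.0],  # partly correlated
    [1.0, 2.0, 3.0, 4.0],  # perfectly correlated
]
print(division_order(columns, target, "corr"))
```

The resulting order can then be divided sequentially with a sliding window, exactly as in the sequential strategy.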
We use the Bc dataset in Table 8 to conduct comparative experiments with DFLOWFS with a window size of 3 and a sliding step size of 2, to compare the effects of the three partitioning strategies on the model performance. Figure 2 illustrates the difference between the predicted value and the true value of DFLOWFS on the training set and test set under three partitioning strategies, where order means sequential division, random means random division, and corr means correlation coefficient for guided division (Fig. 2(1) shows the errors of the three partitioning strategies on the training set, and Fig. 2(2) shows the errors of the three partitioning strategies on the test set).

Error images of DFLOWFS under three partition strategies.
At the same time, we use the objective evaluation indicators mean absolute error MAE (Formula (4)), mean square error MSE (Formula (5)), coefficient of determination R2 (Formula (6)), and symmetric mean absolute percentage error SMAPE (Formula (7)) for evaluation. Validation is carried out on three datasets from the UCI Machine Learning Repository (detailed in Section 5.2). The evaluation indicators of DFLOWFS on the training set under the three partitioning strategies are shown in Table 3, and those on the test set are shown in Table 4.
Evaluation index values of DFLOWFS on the training set under the three partition strategies
Evaluation index values of DFLOWFS on the test set under the three partition strategies
From the above experiments, it can be seen that DFLOWFS using the correlation division strategy has the best performance on both the training and test sets. As shown in Tables 3 and 4, DFLOWFS with correlation division ranks first on all four evaluation indicators and has good learning and generalization abilities. DFLOWFS under the random division strategy also fits well, owing to the introduction of randomness. Compared with the random and sequential division strategies, the correlation division strategy therefore performs better.
In Section 3, each sub-module of each layer is retained, so that there may be some sub-modules with poor performance participating in the establishment of the model, causing the model to be biased. On the other hand, all sub-modules are retained, which will make the model too complex and easily lead to overfitting.
Therefore, this section proposes the submodule discarding method [22], which selectively discards submodules at each layer. After the training of each layer is completed, the performance of the sub-module is evaluated. Here we use the coefficient of determination R2 to evaluate the performance of the sub-module and set a drop threshold. When the R2 value of the sub-module is lower than the drop threshold, the sub-module will be discarded and no longer participates in the construction of the next layer of fuzzy systems. Suppose after training, the first layer has four sub-modules, sub-module
In order to verify the effectiveness of the sub-module discarding method, we again use the Bc dataset in Table 8, with a window size of 3, a sliding step size of 2, and correlation division of the sub-input space, to conduct comparative experiments on the performance of DFLOWFS with and without the sub-module discarding method. The discarding threshold is set to 0.2. Figure 3 shows the difference between the predicted value and the true value of DFLOWFS on the training set and test set under the two sub-module selection strategies.

Error images of DFLOWFS under two submodule selection strategies.
At the same time, we use the objective evaluation indicators MAE, MSE, R2 and SMAPE for evaluation. Validation is carried out on three datasets from the UCI Machine Learning Repository (detailed in Section 5.2). The evaluation index values of DFLOWFS under the two sub-module selection strategies on the training set are shown in Table 5, and the evaluation index values on the test set are shown in Table 6.
Evaluation index values of DFLOWFS on the training set under the two submodule selection strategies
Evaluation index values of DFLOWFS on the test set under the two submodule selection strategies
It can be seen from the above experiments that although each indicator worsens slightly on the training set, all indicators improve on the test set, and the improvement is larger than the decline on the training set. On the test set, MAE improved by 7.8%, MSE by 13.8%, R2 by 4.7%, and SMAPE by 10.1%, while the maximum decline on the training set was only 4.7%. The performance gain on the test set is therefore greater than the loss on the training set, and the generalization ability of the model is greatly improved. In practical applications, generalization ability is more important than learning ability, because better generalization means better prediction of unknown data. From another point of view, the sub-module discarding method reduces the complexity of the model and prevents overfitting by discarding some sub-modules with poor performance.
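The discarding rule used in this section can be sketched as a simple filter on per-sub-module R2 values. The function names and the data layout (one list of predictions per sub-module) are assumptions made for illustration; the threshold of 0.2 is the value used in the experiments above. How the method behaves if every sub-module of a layer falls below the threshold is not stated in the text, so this sketch simply returns an empty list in that case.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination; assumes y_true is not constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def keep_submodules(outputs, y_true, threshold=0.2):
    """Return indices of sub-modules whose R2 is not below the threshold.

    outputs : list of prediction lists, one per sub-module of a layer.
    Sub-modules with R2 < threshold are discarded and do not feed the
    next layer.
    """
    return [i for i, pred in enumerate(outputs)
            if r2_score(y_true, pred) >= threshold]


y = [1.0, 2.0, 3.0, 4.0]
layer_outputs = [
    [1.1, 1.9, 3.2, 3.8],  # close fit, R2 near 1
    [4.0, 3.0, 2.0, 1.0],  # reversed trend, negative R2
    [2.5, 2.5, 2.5, 2.5],  # predicts the mean, R2 equal to 0
]
print(keep_submodules(layer_outputs, y))
```

Only the first sub-module survives in this toy layer, so only its output would be passed on as an input dimension to the next layer.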
Through the above research, we improved the learning algorithm of DFLOWFS in Table 2. The complete learning algorithm of DFLOWFS is shown in Table 7 (The implementation process of the FLOWFS module is shown in Table 1).
Steps of the optimized DFLOWFS algorithm
In order to evaluate the proposed DFLOWFS algorithm, the back propagation (BP) neural network, the radial basis function (RBF) neural network [23], and the long short-term memory (LSTM) method are introduced for comparison and validation. In addition, to verify the impact of introducing fuzzy rule weights, we add a variant without rule-weight training to the comparison: the algorithm that does not execute step 4 in Table 4-1 is named FRFS. We use mean absolute error MAE (Formula (4)), mean square error MSE (Formula (5)), coefficient of determination R2 (Formula (6)), and symmetric mean absolute percentage error SMAPE (Formula (7)) for quantitative evaluation.
Assuming n m-dimensional training data
For the three evaluation indicators MAE, MSE, and SMAPE, the smaller the indicator value, the smaller the fitting error of the model and the better the fitting effect. For R2, the upper limit of R2 is 1. When R2 is negative, it indicates that the model does not have fitting ability. When R2 is between 0 and 1, the model has a certain fitting ability. The closer R2 is to 1, the better the fitting effect of the model. For runtime, the shorter the time, the faster the model is established.
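For reference, the four indicators can be computed as in the sketch below. MAE, MSE and R2 follow their standard definitions. SMAPE has several variants in the literature; the form used here, the mean of 2|y - ŷ| / (|y| + |ŷ|) expressed as a fraction, is an assumption and may differ from Formula (7). The function name `regression_metrics` is hypothetical.

```python
def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, R2 and SMAPE for paired true/predicted values.

    Assumes y_true is not constant (otherwise R2 is undefined). Pairs
    where both values are zero contribute 0 to SMAPE.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    smape = sum(
        2.0 * abs(e) / (abs(t) + abs(p))
        for e, t, p in zip(errors, y_true, y_pred)
        if abs(t) + abs(p) > 0
    ) / n
    return {"MAE": mae, "MSE": mse, "R2": r2, "SMAPE": smape}


print(regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

In this example only the third prediction is off by 1, so MAE and MSE are both 1/3 and R2 is 0.5, since the residual sum of squares is 1 and the total sum of squares is 2.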
In order to verify the effectiveness of DFLOWFS, we introduce the BP neural network, the RBF neural network, and LSTM for comparison experiments. As before, we use the four objective evaluation indicators MAE, MSE, R2 and SMAPE for quantitative evaluation, and the four indicators are combined by a comprehensive scoring system. The computing environment is as follows: processor: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1992 MHz, 4 cores, 8 logical processors. The proposed algorithm is verified on three datasets from the UCI Machine Learning Repository: the Bias correction of numerical prediction model temperature forecast Dataset (Bc dataset), the Superconductivity Data Dataset (Scd dataset), and the Smartphone Dataset (Smartphone dataset). The predicted results are evaluated on the average of 20 experimental runs. The characteristics of each dataset are summarized in Table 8; 70% of each dataset is used as the training set and 30% as the test set.
Summary of dataset features
Figure 4(1) shows the error between the predicted results and the true values of DFLOWFS, LSTM, BP neural network, and RBF neural network on the Bc training set. Figure 4(2) shows the error between the predicted results and the true values of DFLOWFS, LSTM, BP neural network, and RBF neural network on the Bc test set. The results show that DFLOWFS is superior to the other three methods.

Prediction error map of each model on the Bc dataset.
Table 9 shows the evaluation index values of each model on the Bc training set, Table 10 shows the evaluation values of each model on the Bc test set, and Table 11 shows the comprehensive scores obtained by each model on the Bc dataset based on the comprehensive scoring system.
Evaluation index values of each model on the Bc training set
Evaluation index values of each model on the Bc test set
Comprehensive scores of each model on the Bc dataset
In Table 9, our proposed DFLOWFS is significantly better than the other three models in terms of MAE, R2 and SMAPE, and it is second only to RBF in MSE, with a gap of only 0.000165. DFLOWFS therefore has good learning ability on the Bc training set. In Table 10, DFLOWFS ranks first on all four evaluation indicators and has the best generalization ability on the Bc test set, with indicators significantly ahead of the other three models. Finally, according to the comprehensive scores of each model on the Bc dataset in Table 11, DFLOWFS ranks first on both the training set and the test set.
Figure 5(1) shows the error between the predicted results and the true values of DFLOWFS, LSTM, BP neural network, and RBF neural network on the Scd training set. Figure 5(2) shows the error between the predicted results and the true values of DFLOWFS, LSTM, BP neural network, and RBF neural network on the Scd test set. The results indicate that the DFLOWFS method is superior to the other three methods.

Prediction error map of each model on the Scd dataset.
Table 12 shows the evaluation index values of each model on the Scd training set, Table 13 shows the evaluation values of each model on the Scd test set, and Table 14 shows the comprehensive scores obtained by each model on the Scd dataset based on the comprehensive scoring system.
Evaluation index values of each model on the Scd training set
Evaluation index values of each model on the Scd test set
Comprehensive scores of each model on the Scd dataset
On the training set, Table 12 shows that DFLOWFS again ranks first on all four evaluation indicators, with clear advantages in R2 and MAE. On the test set, Table 13 shows that DFLOWFS ranks first in MAE and R2 and second in MSE, where the gap to the first-ranked BP is only 0.000586. In terms of the comprehensive score, Table 14 shows that DFLOWFS ranks first on the training set and has the best ability to learn rules from the data. DFLOWFS and BP are tied for first place on the test set, and both predict unknown data well. However, DFLOWFS is a rule-based model and therefore has higher interpretability than the BP neural network model. In summary, the DFLOWFS method has good generalization ability and can better predict unknown data, while reducing the complexity of the model and preventing overfitting.
Figure 6(1) shows the error between the predicted results and the true values of DFLOWFS, LSTM, BP neural network, and RBF neural network on the Smartphone training set. Figure 6(2) shows the error between the predicted results and the true values of DFLOWFS, LSTM, BP neural network, and RBF neural network on the Smartphone test set. The results show that DFLOWFS is superior to the other three algorithms.

Prediction error map of each model on the Smartphone dataset.
Table 15 shows the evaluation index values of DFLOWFS and the other three algorithms on the Smartphone training set, Table 16 shows the evaluation values of each model on the Smartphone test set, and Table 17 shows the comprehensive scores obtained by each model on the Smartphone dataset based on the comprehensive scoring system.
Evaluation index values of each model on the Smartphone training set
Evaluation index values of each model on the Smartphone test set
Comprehensive scores of each model on the Smartphone dataset
Compared with the first two datasets, the Smartphone dataset has a larger dimension, reaching 560 dimensions, but DFLOWFS still performs well. According to Table 15, the R2 value of DFLOWFS on the training set reaches 0.99, which means it fits the data well, and DFLOWFS is better than the other models on all indicators, with clear advantages. From Table 16, DFLOWFS significantly outperforms the other models in MSE, R2 and SMAPE on the test set. Finally, from the comprehensive scores in Table 17, DFLOWFS ranks first on both the training set and the test set, with good learning ability and generalization ability.
Artificial intelligence has gradually evolved from expert systems to data-driven algorithms, overcoming the limitations of expert knowledge caused by the increasing amount of information with social development [24]. The fuzzy system is a highly interpretable artificial intelligence method, but its theoretical research is not mature enough. Current methods are not accurate enough, and low accuracy inevitably affects the reliability of decision-making [25]. This paper proposes a deep fuzzy system, DFLOWFS, built layer by layer from the bottom up on the fuzzy system FLOWFS optimized by rule weights, and optimizes the structure of the deep fuzzy system from two aspects. The three main innovations of the DFLOWFS algorithm are as follows. 1) By assigning rule weights, the fuzzy system is optimized constructively, and the least squares method, a numerical optimization method, is introduced to train the weights. Regularization terms are added in the training process, which suppresses overfitting, reduces the training time, and improves the accuracy of the model. 2) The initial order of the input features affects each sub fuzzy system and thereby the performance of the deep fuzzy system. Therefore, we propose two other partitioning strategies: random division and correlation division. The experiments show that DFLOWFS using the correlation division strategy performs best on both the training and test sets and has good learning and generalization abilities. The random division strategy is second only to correlation division on the various indicators, since the introduction of randomness also gives good fitting. 3) A sub-module discarding method is proposed, which selectively discards sub-modules at each layer, greatly improving the model’s generalization ability and enabling better prediction of unknown data. The sub-module discarding method also reduces the complexity of the model and prevents overfitting by discarding some sub-modules with poor performance.
Acknowledgments
The main idea of this paper dates back 10 years, to when Dewang Chen, the corresponding author of this paper, was at the University of California at Berkeley as a visiting scholar with Prof. Lotfi Zadeh, father of fuzzy logic and member of the Academy of Engineering of the USA. During discussions in the seminar series “Interpretability vs Accuracy” hosted by Prof. Zadeh, the idea of interpretable fuzzy modeling occurred to Chen. After long-time thinking, coding and writing, this article was formed. We would like to express our gratitude to the late Prof. Zadeh, who inspired us to pursue interpretable AI, not just high-accuracy AI.
This work is jointly supported by the National Natural Science Foundation of China under Grant 61976055, the Special Fund for Education and Scientific Research of Fujian Provincial Department of Finance under Grant GY-Z21001, and Scientific Research Foundation of Fujian University of Technology under Grant GY-Z22071.
