Abstract
Fuzzy rules are central to Takagi-Sugeno-Kang (TSK) fuzzy systems: they not only provide a mapping mechanism for input patterns but also make fuzzy systems interpretable. Recent works further introduce rule weights to restrict or strengthen fuzzy rules in more situations. However, most rule weights embedded in fuzzy rules are static; that is, they remain unchanged once they have been determined by the learning algorithm. In practical applications, a fuzzy rule is often expected to yield different confidence degrees for different input patterns. In this paper, a new TSK fuzzy system is proposed in which each fuzzy rule is equipped with an individual dynamic rule weight (DRW). A DRW is a nonlinear function of the input pattern that reflects the confidence degree (acceptability) of the fuzzy rule acting on that input pattern. Furthermore, the “isolation” level of an input pattern can be measured by the aggregated DRW values of the fuzzy rules in the proposed fuzzy system. Specifically, the proposed fuzzy system can be used to identify outliers, i.e., patterns whose aggregated DRW values over all fuzzy rules are very small. To embed a DRW in each fuzzy rule effectively, a stack-like structure consisting of a basic input-output unit and an augmenting unit is proposed. The stacked architecture has three features: (i) the augmented information from the augmenting unit provides indirect pattern information for DRW learning; (ii) the predictive information from the augmenting unit can be kept separate from the fuzzy rules so that their interpretability is preserved; and (iii) the modeling performance can be improved by the stacked generalization principle, which leverages the predictive information in the manifold of the input pattern space during system approximation. Experimental results on 16 real-life datasets demonstrate the approximation accuracy, interpretability and outlier detection ability of the proposed fuzzy system.
Introduction
Takagi-Sugeno-Kang (TSK) fuzzy systems are rule-based fuzzy systems that have been successfully adopted in practical applications such as process control, medical diagnosis, financial prediction, and image processing [1–3] because of their strong approximation capability and good interpretability. Fuzzy rules are the souls and skeletons of TSK fuzzy systems because they not only provide a mapping mechanism for input features but also make fuzzy systems interpretable. In classical TSK fuzzy systems (e.g., the 0-order and first-order TSK fuzzy systems [4]), each fuzzy rule is treated equally. To examine the effect of rule weights in fuzzy rule-based systems, some representative results [5–9] have been reported, in which rule weights are also called certainty factors or certainty grades. In [5], Nauck and Kruse analyzed the interpretation of rule weights in both Mamdani-type and Sugeno-type fuzzy systems from two aspects, i.e., rule weights applied to complete rules and rule weights applied only to the then-parts of fuzzy rules. They showed that the learning of rule weights could be replaced by modifying the membership functions of the antecedent or consequent fuzzy sets. In [6], Ishibuchi and Nakashima used the rule weight to decide the size of the decision area of each fuzzy rule in Sugeno-type fuzzy systems. Unlike the results reported in [5], they showed that certainty grades could be learned without modifying the membership functions, so that the interpretability of fuzzy rules was not compromised and the classification performance could also be improved. Later, Ishibuchi et al. [7] presented two heuristic algorithms for rule weight specification. In [8], Tsang et al. introduced global weights and local weights to fuzzy rules in order to enrich the knowledge representation and accordingly designed a fuzzy neural network-based learning algorithm to tune the global and local weights. In [9], Chen and Huang first presented a weighting mechanism in which the features appearing in the if-parts were endowed with different weights, and then employed genetic algorithms to adjust the feature weights in order to estimate null values in relational database systems. More references on rule weights can be found in [10–13].
From the existing research on rule weights, it can be seen that most studies concentrate on restricting or strengthening the corresponding fuzzy rules. In other words, once rule weights are determined by learning algorithms, they are kept unchanged. However, in practice, rule weights could be relaxed and exploited more judiciously. Figure 1 illustrates a medical diagnostic system, where the knowledge of medical experts is summarized into fuzzy rules. The if-part of a fuzzy rule can be represented by the symptom descriptions of a patient and the then-part by the diagnosis result for the patient. Suppose one expert has the knowledge that “If a patient has photophobia, impaired vision, double images, and hyperglycemia, then the patient may suffer from diabetic retinopathy (DR).” When this fuzzy rule is applied to a patient without a family history of diabetes and to a patient with such a history, the confidence degrees (i.e., rule weights) of a DR diagnosis should clearly be different. The patient with a family history of diabetes may suffer from DR with a 70% confidence degree, while the other may suffer from DR with a 10% confidence degree. In other words, fixing a static rule weight on each fuzzy rule is not sufficient to model complex real-world scenarios. Thus, a dynamic rule weight (DRW) should be exploited to reflect the confidence degree of each fuzzy rule with respect to each input pattern.

Fig. 1. An instance of a medical diagnostic system.
Inspired by these practical requirements, this work proposes a new TSK fuzzy system, termed DRW-TSK-FS, in which a DRW is embedded in each fuzzy rule. DRW-TSK-FS differs from the existing TSK fuzzy systems with rule weights in the following aspects:
(i) In DRW-TSK-FS, the rule weight in each fuzzy rule is dynamic. When different input patterns are judged by the same fuzzy rule, even if all of them match the if-part of the fuzzy rule, the rule weight of this fuzzy rule acting on each input pattern may be different. In other words, the rule weight can be considered as a function of the input pattern. By contrast, the rule weight in most existing fuzzy systems is only interpreted as the strength of each fuzzy rule: the larger the rule weight is, the larger the decision area of the fuzzy rule is.
(ii) DRW learning is very simple. From the mathematical view, DRW can be expressed as the product of several membership functions. Hence, clustering techniques (e.g., FCM [14]) can be employed to achieve DRW learning.
The contributions of this paper can be summarized as follows:
(i) A new TSK fuzzy system, DRW-TSK-FS, with dynamic rule weights is proposed. The structure of DRW-TSK-FS consists of a basic input-output unit and an augmenting unit. The augmenting unit adds predictive information to the original input pattern space, and the augmented space is taken as the new input of the basic input-output unit. Such a structure is similar to a stacked structure, so its performance can be ensured by the stacked generalization principle: the predictive information can be leveraged to open the manifold of the feature space and improve the generalization capability. Although the fuzzy rules in the basic input-output unit take the predictive information from the augmenting unit as augmented inputs, they can be understood as rules with only the original input patterns in the if-parts, together with a DRW that depends on the predictive information from the augmenting unit. Hence, the if-parts of all fuzzy rules in DRW-TSK-FS always have concise physical meanings.
(ii) DRW in each fuzzy rule is a nonlinear function of the input pattern which reflects the confidence degree (acceptability) of the fuzzy rule with respect to the input pattern. In addition, for an input pattern, its “isolation” level can be measured by the summed value of DRW of all fuzzy rules in DRW-TSK-FS. Specifically, DRW-TSK-FS can be used to identify outliers whose summed value of DRW is very small.
(iii) DRW-TSK-FS has been evaluated on 16 datasets and the experimental results have demonstrated its enhanced or at least comparable modeling performance as well as its ability in outlier detection.
The remainder of this paper is organized as follows. TSK fuzzy systems and their learning algorithms are briefly reviewed in Section II. In Section III, the structure of DRW-TSK-FS is given and its learning process is discussed. Section IV reports the experimental results. Section V concludes the paper and describes future work.
In this section, the TSK fuzzy system and its learning algorithms are briefly reviewed since the TSK fuzzy system is the basic unit of DRW-TSK-FS.
Concept and principle
TSK fuzzy systems are the most frequently used fuzzy systems in which the kth fuzzy rule can be formulated as
In (1),
Generally speaking, parameters involved in the if-parts and then-parts are usually learned separately. In the if-parts, the Gaussian membership function is adopted as the mapping function and a fuzzy set
In (4),
In (5), $u_{jk}$ represents the fuzzy membership degree of the object $x_j$ belonging to the kth cluster obtained by FCM clustering, and h is a manually tuned scale parameter.
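As an illustration of how the if-part parameters can be derived from FCM memberships, the following minimal sketch assumes the commonly used estimates: the center of each Gaussian is the membership-weighted mean of the feature, and the width is the membership-weighted variance scaled by h. The function names and the exact Gaussian form exp(-(x - c)^2 / (2δ)) are assumptions made for this example and are not taken from the text above.

```python
import math


def antecedent_params(X, U, h=1.0):
    """Estimate Gaussian centers and widths per rule from fuzzy memberships.

    X: list of N samples, each a list of d feature values.
    U: list of N rows, where U[j][k] is the membership of sample j in cluster k.
    h: manually tuned scale parameter.
    Assumption: weighted mean for centers, h-scaled weighted variance for widths.
    """
    n, d, n_rules = len(X), len(X[0]), len(U[0])
    centers, widths = [], []
    for k in range(n_rules):
        total = sum(U[j][k] for j in range(n))
        c = [sum(U[j][k] * X[j][i] for j in range(n)) / total for i in range(d)]
        w = [
            h * sum(U[j][k] * (X[j][i] - c[i]) ** 2 for j in range(n)) / total
            for i in range(d)
        ]
        centers.append(c)
        widths.append(w)
    return centers, widths


def gaussian_membership(x, c, delta):
    """Gaussian membership degree of a scalar x (assumed form, delta > 0)."""
    return math.exp(-((x - c) ** 2) / (2.0 * delta))
```

For two one-dimensional samples 0 and 2 with equal membership in a single cluster and h = 1, this gives a center of 1 and a width of 1.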
When the parameters in the if-parts are determined by clustering techniques, let
so the solution to the output of the TSK fuzzy systems in (2) can also be rewritten as the following form
With respect to the parameter learning in (7), many learning algorithms and criteria exist. In [17], the least square criterion function was introduced for the parameter learning of the then-parts. However, this criterion is not robust when TSK fuzzy systems are used to model small or noisy datasets [4]. In [18], the ɛ-insensitive criterion was proposed for the parameter learning of the then-parts. With an L1-norm penalty term, the parameter learning can be cast as a quadratic programming problem, and the corresponding TSK fuzzy system is called L1-TSK-FS. Compared with the least square criterion, L1-TSK-FS is more robust on small and noisy datasets. Furthermore, in [4], the authors pointed out that an L2-norm penalty term can also be employed to develop ɛ-insensitive criterion-based algorithms (i.e., L2-TSK-FS) for the parameter learning of the then-parts. The insensitivity parameter ɛ is involved in the objective function as a penalty term. Compared with L1-TSK-FS, L2-TSK-FS is superior because the insensitivity parameter ɛ can be obtained automatically.
In (1), if
DRW-TSK-FS: The proposed TSK fuzzy system with dynamic rule weights
In this section, detailed analyses about DRW-TSK-FS including its structure, fuzzy rules, and the involved dynamic rule weights are given. Then, the learning algorithm of DRW-TSK-FS is presented.
Structure of DRW-TSK-FS
The structure of DRW-TSK-FS is illustrated in Fig. 2 where

Fig. 2. Structure of DRW-TSK-FS.
The augmenting unit can be realized by many regression models such as SVR [19, 20] and neural networks (NN) [21–23]. In this paper, the single layer neural network is employed to realize the augmenting unit. Hence, the augmenting unit is termed the NN unit. As for the TSK fuzzy system in Fig. 2, the 0-order TSK fuzzy system is adopted in order to keep the interpretability of DRW-TSK-FS. The justification is as follows.
Obviously, when the NN unit involved in DRW-TSK-FS is silent, the TSK fuzzy system in Fig. 2 is a 0-order TSK fuzzy system with the kth rule expressed as
If x1 is
where $y_t$ is the tth output of the NN unit and
With the singleton fuzzifier, the product inference and the center average defuzzifier, the output of DRW-TSK-FS can be computed as
From (10), it can be seen that the outputs generated by the NN unit are involved in the if-parts of DRW-TSK-FS, so that the interpretability of the fuzzy rule in (10) degenerates. This is because the outputs of the NN unit are not endowed with specific physical meanings. Despite the potential advantage that the approximation accuracy of DRW-TSK-FS can be ensured by the stacked structure, DRW-TSK-FS becomes incomprehensible. Furthermore, the confidence degree with which the fuzzy rule can act on an input pattern cannot be properly modeled. In the following, let us discuss another form of DRW-TSK-FS that addresses the aforementioned issues.
As pointed out in [25, 26], when the Gaussian membership function is employed as the corresponding fuzzy membership function, the activation degrees cannot be normalized. Hence, without taking the denominator in (11) into consideration, the output of DRW-TSK-FS can be expressed as
It is obvious that the output formulated in (12) can be expressed in another form,
which indicates that there exists another form of DRW-TSK-FS whose output is the same as that in (13). Accordingly, the kth fuzzy rule of the new form of DRW-TSK-FS is defined as
If x1 is
With regard to the if-parts in (14), it is clear that, apart from the original input patterns, no other patterns are involved, and thus the interpretability is preserved compared with the complicated fuzzy rule shown in (10). With respect to the predictive information, a new concept termed the dynamic rule weight (DRW) is defined for each fuzzy rule. Thus, the predictive information from the NN unit no longer complicates the interpretation of the fuzzy rules in DRW-TSK-FS because it can be hidden in DRW as shown in (14).
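The equivalence between the two rule forms can be checked numerically. The sketch below assumes Gaussian memberships, the unnormalized output described above, and a constant (0-order) then-part p for each rule; the dictionary keys (`cx`, `wx`, `cy`, `wy`, `p`) and function names are hypothetical names chosen for this example. In the first form, the NN outputs are extra if-part inputs; in the second, their memberships are multiplied into a DRW that scales a rule defined on the original features only.

```python
import math


def gauss(v, c, delta):
    """Gaussian membership degree (assumed form)."""
    return math.exp(-((v - c) ** 2) / (2.0 * delta))


def product_membership(values, centers, widths):
    """Product of Gaussian memberships over a list of scalar values."""
    out = 1.0
    for v, c, w in zip(values, centers, widths):
        out *= gauss(v, c, w)
    return out


def output_augmented(x, y_hat, rules):
    """Unnormalized output when the NN outputs y_hat sit in the if-parts."""
    total = 0.0
    for r in rules:
        firing = product_membership(x + y_hat, r["cx"] + r["cy"], r["wx"] + r["wy"])
        total += firing * r["p"]
    return total


def output_with_drw(x, y_hat, rules):
    """Same output, with the NN-output memberships collected into a DRW."""
    total = 0.0
    for r in rules:
        firing = product_membership(x, r["cx"], r["wx"])  # original features only
        drw = product_membership(y_hat, r["cy"], r["wy"])  # dynamic rule weight
        total += drw * firing * r["p"]
    return total
```

Both functions return the same value for any input, which is why the if-parts can be read in terms of the original features alone.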
Please note, as a new concept, DRW here is a function of the input pattern
In DRW-TSK-FS, with regards to the parameter learning of the if-parts, FCM is employed to obtain the center vectors and the kernel width vectors for fuzzy membership functions. After all the if-parts of fuzzy rules are determined, one may obtain
Hence, the training problem of DRW-TSK-FS can be transformed into the learning of parameters involved in the linear regression model expressed in (16). Based on the least square (LS) criterion, the solution for the parameter learning of the regression model in (16) can be derived by minimizing the following LS criterion function
where η is the learning rate given by users and iter denotes the current iteration. The learning algorithm of DRW-TSK-FS is listed in Algorithm 1.
In Algorithm 1, the value of t influences the approximation accuracy of DRW-TSK-FS. Our extensive experimental results in the following section reveal that setting t in the range from d/3 to d/2 meets our requirements.
In this subsection, the time complexity of Algorithm 1 is analyzed first, and then the interpretability of DRW-TSK-FS is discussed quantitatively.
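A minimal sketch of a gradient-descent solution to a least-squares problem of this kind is given next. It assumes that the model is linear in the then-part parameters, y ≈ G p, where row j of G holds the DRW-weighted firing levels of the K rules for sample j, and that the criterion is half the mean squared error. The names `fit_consequents`, `G`, `eta` and `iter_max` are hypothetical, and the exact criterion function used by the learning algorithm may differ.

```python
def fit_consequents(G, y, eta=0.5, iter_max=500):
    """Gradient descent on 0.5 * mean squared error for the linear model y ~ G p.

    G: list of N rows with K entries each (assumed rule activations).
    y: list of N target outputs.
    eta: learning rate; iter_max: number of iterations.
    """
    n, n_rules = len(G), len(G[0])
    p = [0.0] * n_rules
    for _ in range(iter_max):
        residual = [
            sum(G[j][k] * p[k] for k in range(n_rules)) - y[j] for j in range(n)
        ]
        grad = [
            sum(residual[j] * G[j][k] for j in range(n)) / n for k in range(n_rules)
        ]
        p = [p[k] - eta * grad[k] for k in range(n_rules)]
    return p
```

The learning rate must be small enough for the iteration to converge; in this sketch it is a fixed user-supplied constant.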
The time complexity of DRW-TSK-FS contains four parts, i.e., generating t groups of predictive information, learning the parameters involved in the if-parts of fuzzy rules, computing the values of the if-parts of fuzzy rules, and learning the parameters involved in the then-parts of fuzzy rules. The time complexity of the first part depends on the training approach for the NN. For a given training approach, e.g., back-propagation [27], the time complexity is $O(THN^2)$, where H is the number of nodes in the hidden layer. The second part is achieved by FCM [14], so its time complexity is $O(NK(d+t)^2)$. The time complexity of the third part is $O(NK)$. Lastly, for the parameter learning of the then-parts, the upper bound of the time complexity is $O(\mathrm{IterMax}\,NK^2)$.
Therefore, the overall asymptotic time complexity of DRW-TSK-FS is $O(N(THN + K(1 + (d+t)^2) + \mathrm{IterMax}\,K^2))$, i.e., the sum of the four parts above. Obviously, with the augmenting unit (NN), the time complexity of DRW-TSK-FS is at least on the order of $O(N^2)$, which is higher than that of a classic TSK fuzzy system, e.g., the 0-order TSK fuzzy system. However, we should keep in mind that the predictive information generated by the augmenting unit (NN) can be leveraged to open the manifold of the feature space to improve the generalization capability of DRW-TSK-FS. Also, for the implementation of the NN in Algorithm 1, the neural network toolbox provided by the Matlab environment is employed [40], which is effective and efficient for many application scenarios.
As stated in [37–39], the interpretability of a TSK fuzzy system can be evaluated at both the rule-base level (measured by the number of fuzzy rules) and the fuzzy partition level (measured by the number of features and membership functions). In other words, the interpretability of DRW-TSK-FS can be quantitatively evaluated by the number of parameters involved in the training procedure. With the fuzzy rule shown in (14), DRW-TSK-FS needs 2Kd parameters in the if-parts and K parameters in the then-parts, so the total number of parameters is K(2d + 1). However, if the fuzzy rule in (10) is adopted, the number of parameters involved in the if-parts is 2K(d + t), and hence the total number of parameters is K(2d + 1) + 2Kt. That is to say, by introducing DRW, the interpretability is enhanced since the predictive information is hidden in DRW.
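The two parameter counts can be compared directly. The short sketch below simply encodes the expressions stated above; the function names are hypothetical.

```python
def param_count_drw(n_rules, d):
    """Parameters with rule form (14): 2Kd in the if-parts plus K in the then-parts."""
    return n_rules * (2 * d + 1)


def param_count_augmented(n_rules, d, t):
    """Parameters with rule form (10): 2K(d + t) in the if-parts plus K in the then-parts."""
    return 2 * n_rules * (d + t) + n_rules


# Example with K = 3 rules, d = 4 original features and t = 2 NN outputs.
saved = param_count_augmented(3, 4, 2) - param_count_drw(3, 4)  # equals 2Kt
```

With K = 3, d = 4 and t = 2, form (14) uses 27 parameters and form (10) uses 39, a difference of 2Kt = 12.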
Experimental results
In this section, DRW-TSK-FS is evaluated from the perspective of its approximation ability and outlier detection performance. The Gas Furnace dataset [28] and 15 KEEL regression datasets [29] are respectively chosen for this evaluation. Five benchmarking regression models, 0-order-TSK-FS [4], L2-TSK-FS [4], FS-FCSVM [30], TSFS-SVR [31] and a single hidden layer neural network (NN) are employed for comparison. The detailed experiments are organized as follows. In subsection IV.A, experimental setups including the parameter setting and the experimental environment configuration are given. The experimental results on the Gas Furnace dataset and KEEL datasets are respectively reported in subsections IV.B and IV.C. In subsection IV.D, the advantages of DRW-TSK-FS are discussed. Lastly, two statistical approaches are employed to analyze the significant differences between DRW-TSK-FS and benchmarking regression models.
Experimental setup
In our experiments, the parameters of all benchmarking regression models and DRW-TSK-FS are determined by 5-fold cross-validation combined with a grid search strategy on the training datasets. In 0-order-TSK-FS, L2-TSK-FS, FS-FCSVM and DRW-TSK-FS, the number of fuzzy rules is searched in a relatively small range from 2 to 15 [32, 33] in order to balance the approximation ability and the interpretability of each model. In addition, for NN, the number of hidden nodes is also searched from 2 to 15. Table 1 shows the parameter search ranges of the corresponding models.
Table 1. Parameter setup (search) of different models
To evaluate the performance (model error) quantitatively, the performance measure used in [28] is adopted:
where N is the size of the training set, and $y_{\mathrm{real}}$ and $y_{\mathrm{predicting}}$ denote the real output and the predicted output, respectively. The smaller the value of Err, the better the modeling performance of the corresponding regression model.
In order to test the statistical significance between DRW-TSK-FS and the benchmarking regression models, two non-parametric statistical tests [34] are introduced in our experiments. More specifically, the Friedman ranking test [35] is used to detect statistical differences among the results generated by different models and the Holm post-hoc test [36] is used to find the models that reject the equality hypothesis with respect to a chosen control model. The two non-parametric statistical tests are both provided by the KEEL toolbox.
All adopted models are implemented in the Matlab environment, and all the experiments are conducted on a PC with a 4-core i5-7200U processor and 8 GB of memory.
To evaluate the performance of DRW-TSK-FS, the Gas Furnace dataset downloaded from http://openmv.net/info/gas-furnace is firstly employed. The dataset is a time series, containing successive pairs of 296 observations of the gas rate X (b) and the percentage CO2 in the gas Y (b) generated from continuous records at 9 s intervals.
In the experiments, X(b), X(b) - X(b - 1), Y(b - 1) and Y(b - 1) - Y(b - 2) are taken as the input features affecting the current output Y(b). The first 250 vectors of the form [X(b), X(b) - X(b - 1), Y(b - 1), Y(b - 1) - Y(b - 2)] are selected to train DRW-TSK-FS and the rest are used for model testing.
Suppose that t denotes the number of activations of the NN unit, i.e., the number of pieces of predictive information augmented to the original feature space of the training set. To observe the influence of t on the model error of DRW-TSK-FS, t is set to each value from 0 to 5 in turn, and DRW-TSK-FS is run for 30 trials with each t. Figure 3 illustrates the average model error Err with different t.
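The construction of the input vectors from the two time series can be sketched as follows. The function names are hypothetical, and the indexing assumes that the first usable sample is the third observation, since Y(b - 2) must exist.

```python
def build_gas_furnace_samples(X, Y):
    """Build [X(b), X(b)-X(b-1), Y(b-1), Y(b-1)-Y(b-2)] -> Y(b) pairs.

    X, Y: equally long lists holding the gas rate and the CO2 percentage.
    Samples start at index 2 (0-based) so that Y[b - 2] is defined.
    """
    inputs, targets = [], []
    for b in range(2, len(X)):
        inputs.append([X[b], X[b] - X[b - 1], Y[b - 1], Y[b - 1] - Y[b - 2]])
        targets.append(Y[b])
    return inputs, targets


def split_first_n(inputs, targets, n_train=250):
    """Use the first n_train vectors for training and the rest for testing."""
    train = (inputs[:n_train], targets[:n_train])
    test = (inputs[n_train:], targets[n_train:])
    return train, test
```

Under this indexing, a series of 296 observations yields 294 vectors, of which 250 are used for training and 44 for testing.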

Fig. 3. Modeling error Err of DRW-TSK-FS with different t.
When t is set to 0, the NN unit is not activated. Hence, DRW-TSK-FS degenerates to a 0-order-TSK-FS, whose modeling error Err is as high as $4.221 \times 10^{-3}$. When the NN unit is activated with a non-zero t, the average Err is reduced significantly and reaches its minimum when t = 2. However, as the value of t continues to grow, the modeling error Err does not decrease further, but increases slightly and then remains stable. This experimental result indicates that augmenting the original feature space with a moderate amount of predictive information from the NN unit can indeed reduce the modeling error. However, excessive predictive information may overwhelm the original features and even increase the model error. In addition, excessive predictive information means a higher time complexity of DRW-TSK-FS. In fact, the experimental results (including the following results on KEEL datasets) suggest the guideline that t can be set in the range from d/3 to d/2, where d is the number of features in the original feature space.
Table 2 lists the comparative results of the adopted benchmarking models and DRW-TSK-FS in terms of the minimum (Min) and average (Mean) modeling errors, the standard deviation (Std) and the optimal number of fuzzy rules/hidden nodes. The best average modeling errors are marked in bold. Please note that, for NN, the number of hidden nodes instead of the number of fuzzy rules is reported. From Table 2, we can see that, compared with the benchmarking models 0-order-TSK-FS, L2-TSK-FS, FS-FCSVM, NN and TSFS-SVR, DRW-TSK-FS achieves the best performance in terms of minimum and average modeling errors.
Table 2. The performance of different regression models on the Gas Furnace dataset
In addition to approximation performance, the interpretability of a fuzzy system is also very important. To observe the interpretability of DRW-TSK-FS more concretely, Table 3 shows 3 interpretable fuzzy rules with t being set to 1 and 2, respectively. It is obvious that the predictive information from the NN unit is not involved in the if-parts and does not complicate the interpretation of the then-parts of the fuzzy rules because it is hidden in DRW of these rules. Each row in Table 3 can be translated into a fuzzy rule with a linear then-part with DRW. For example, when t is set to 1, the first fuzzy rule can be expressed as
If x1 is
then f
k
(
Table 3. Rules obtained from the Gas Furnace dataset
In practice, data collected from sensors may be contaminated at one point. How to detect the abnormal data (outliers) is also very important. In DRW-TSK-FS, DRW reflects the acceptability of each fuzzy rule for
To investigate DRW-TSK-FS’s ability to detect outliers, we simulate abnormal data (outliers) by randomly selecting 3 vectors (objects) from the testing set and deviating them from their original distributions by adding 20% noise. For an arbitrary object i in the testing set, its isolation level, denoted as DRW(i), is computed as the sum of the DRW values of all fuzzy rules. Figures 4(a) and 4(b) illustrate the isolation level of each object in the testing set without and with noise, respectively. Note that, in order to magnify the differences between isolation levels, the logarithm of the isolation level of each object is plotted.
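The isolation level described here can be computed with a few lines of code. The sketch below assumes that the DRW value of every rule for every object is already available as a matrix, and that the natural logarithm is used; neither the function name nor the base of the logarithm is specified in the text, so both are assumptions.

```python
import math


def log_isolation_levels(drw_matrix, floor=1e-300):
    """Return log(DRW(i)) for each object i.

    drw_matrix[i][k] is the DRW value of rule k for object i.
    DRW(i) is the sum over all rules; `floor` guards against log(0).
    """
    return [math.log(max(sum(row), floor)) for row in drw_matrix]


# Two objects and two rules: the second object is barely accepted by any rule.
levels = log_isolation_levels([[0.5, 0.5], [0.01, 0.01]])
```

In this toy example, the first object has a log isolation level of 0, while the second has a clearly negative value, which is the pattern expected for an outlier.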

Fig. 4. Isolation level of each object. (a) Testing set without noises. (b) Testing set with noises.
Figure 4 shows that there exist three objects (or at least two objects) whose isolation levels are significantly different from those of the others. We verified that these three objects are exactly the ones that were randomly selected from the testing set and perturbed with noise.
Next, more datasets will be selected to demonstrate the characteristics involved in DRW-TSK-FS.
15 real-world regression datasets are selected from the KEEL project repository to further demonstrate the characteristics of DRW-TSK-FS. Table 4 lists the detailed information (i.e., number of features, number of objects) of the selected datasets.
Table 5 gives the experimental results of DRW-TSK-FS and the benchmarking models in terms of minimum and average model errors, the standard deviation and the optimal number of fuzzy rules/hidden nodes. Except for BAS, WIZ, MAC and DEL, DRW-TSK-FS obtains the lowest modeling errors among all benchmarking models. The results further demonstrate the promising performance of DRW-TSK-FS. Also, 10 datasets are selected and the relationships between the modeling errors and t are plotted in Fig. 5.

Fig. 5. Modeling error of DRW-TSK-FS with different t on KEEL datasets.
As discussed in subsection IV.B, an appropriate t allows DRW-TSK-FS to balance the modeling error and the time complexity. From Fig. 5, we find that setting t to 4, 2, 3, 6, 3, 3, 3, 2, 1 and 2 on STO (d = 9), FRI (d = 5), CON (d = 8), MOR (d = 11), LAS (d = 4), WAN (d = 9), AUT (d = 7), ELE (d = 4), QUA (d = 3) and DIA (d = 2), respectively, basically meets our requirements. Also, we find that the adopted t values on these 10 datasets mostly lie in or near the range from d/3 to d/2. The results further support the guideline revealed in subsection IV.B.
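The guideline can be checked against the (d, t) pairs quoted above. The following sketch uses integer arithmetic for the condition d/3 ≤ t ≤ d/2; the dictionary and function names are hypothetical.

```python
# (d, t) pairs as quoted in the text for the 10 KEEL datasets.
ADOPTED = {
    "STO": (9, 4), "FRI": (5, 2), "CON": (8, 3), "MOR": (11, 6), "LAS": (4, 3),
    "WAN": (9, 3), "AUT": (7, 3), "ELE": (4, 2), "QUA": (3, 1), "DIA": (2, 2),
}


def in_guideline(d, t):
    """True if d/3 <= t <= d/2, written without floating-point division."""
    return 3 * t >= d and 2 * t <= d


inside = [name for name, (d, t) in ADOPTED.items() if in_guideline(d, t)]
outside = [name for name, (d, t) in ADOPTED.items() if not in_guideline(d, t)]
```

On the quoted values, seven of the ten datasets satisfy the range exactly, while MOR, LAS and DIA use a t above d/2.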
Similarly, to further investigate DRW-TSK-FS’s ability in outlier detection, we select 4 datasets (MOR, STO, AUT and ELE) and simulate outliers by randomly selecting 10 vectors from the testing dataset and adding 20% noise to them. The experimental results are plotted in Fig. 6.

Fig. 6. Isolation level of each object on MOR, STO, AUT and ELE.
From Fig. 6, we find that DRW-TSK-FS detects 9 and 10 outliers on MOR and STO, respectively, based on their isolation levels. For AUT and ELE, although the visual results are less clear than those on MOR and STO, we can use a threshold to select the outliers dynamically. For example, on ELE, we can consider objects as outliers when their values of log(DRW(i)) are less than −1.5 or −2.0, depending on the specific application scenario.
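Threshold-based selection of this kind can be sketched as follows. The function name and the sample values are hypothetical; the threshold is an application-dependent choice, with −1.5 used here only because it is one of the values mentioned for ELE.

```python
def flag_outliers(log_isolation, threshold=-1.5):
    """Return the indices of objects whose log(DRW(i)) is below the threshold."""
    return [i for i, value in enumerate(log_isolation) if value < threshold]


# Hypothetical log isolation levels for six objects.
sample_levels = [-0.2, -0.4, -1.7, -0.1, -2.3, -1.5]
flagged = flag_outliers(sample_levels, threshold=-1.5)
stricter = flag_outliers(sample_levels, threshold=-2.0)
```

A lower threshold flags fewer objects, so the choice trades missed outliers against false alarms.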
The experimental results on all selected datasets demonstrate that, with the predictive information from the NN unit, the approximation accuracy of DRW-TSK-FS is enhanced. Also, by introducing DRW, the interpretability of DRW-TSK-FS is preserved. Moreover, the sum of the DRW values of all fuzzy rules can help detect outliers. Overall, based on the experimental results, the main advantages of DRW-TSK-FS over the benchmarking models are highlighted as follows.
(i) The structure of DRW-TSK-FS is significantly different from those of the benchmarking models. It is very similar to a stacked structure, so the predictive information can be leveraged to open the manifold of the training set and enhance the approximation accuracy of DRW-TSK-FS. For instance, DRW-TSK-FS achieves the best performance on the Gas Furnace dataset and on 11 of the 15 KEEL regression datasets.
(ii) Although the original feature space is augmented by the predictive information from the NN unit, an equivalent transformation removes the predictive information from the if-part of each fuzzy rule and hides it in DRW, so the interpretability of DRW-TSK-FS is preserved. From the fuzzy rules listed in Table 3, it is clear that the if-parts do not contain the predictive information, whose physical meaning is not explicit.
(iii) With the introduced DRW, another advantage over the benchmarking models is that the sum of the DRW values of all fuzzy rules reflects the isolation level of each object. Compared with normal objects, outliers have very small summed DRW values. In other words, outliers have very low isolation levels in DRW-TSK-FS and can likely be detected by their summed DRW values over all fuzzy rules.
With the above advantages, DRW-TSK-FS is expected to be applicable to practical engineering problems. Here, two guidelines are given from the aspects of approximation accuracy and outlier detection based on the above experimental results. (i) The number of pieces of predictive information generated by the NN unit should be set in the range from d/3 to d/2. (ii) For abnormal data (outliers), it is strongly suggested to first plot a decision graph of log(DRW) for each object. Then, according to the graph, it is recommended to use a threshold to select the abnormal data dynamically based on the application requirements.
Non-parametric statistical analysis
In this section, we employ two non-parametric statistical methods, i.e., the Friedman ranking test and the Holm post-hoc test, to analyze the statistical significance of the performance of the proposed DRW-TSK-FS model compared with the benchmarking models. The significance level α is set to 0.05.
The Friedman ranking test is employed first, and the ranking performance is illustrated in Fig. 7, which indicates that DRW-TSK-FS achieves the best ranking among all the benchmarking models. The p-value in Fig. 7 is much smaller than the specified α value, indicating that the differences in performance between DRW-TSK-FS and the benchmarking models are statistically significant.
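For readers unfamiliar with the Friedman ranking test, the following sketch computes the average rank of each model and the standard Friedman chi-square statistic from a table of errors (one row per dataset, one column per model). It is an independent illustration, not the KEEL toolbox implementation; the function names are hypothetical, ties receive average ranks, and no tie correction is applied to the statistic.

```python
def rank_row(errors):
    """Rank the errors of one dataset (1 = lowest); ties get the average rank."""
    order = sorted(range(len(errors)), key=lambda j: errors[j])
    ranks = [0.0] * len(errors)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and errors[order[j + 1]] == errors[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = average
        i = j + 1
    return ranks


def friedman(error_table):
    """Return (average ranks per model, Friedman chi-square statistic)."""
    n, k = len(error_table), len(error_table[0])
    sums = [0.0] * k
    for row in error_table:
        for j, r in enumerate(rank_row(row)):
            sums[j] += r
    avg = [s / n for s in sums]
    chi2 = 12.0 * n / (k * (k + 1)) * (
        sum(r * r for r in avg) - k * (k + 1) ** 2 / 4.0
    )
    return avg, chi2
```

The statistic is compared with a chi-square distribution with k − 1 degrees of freedom to obtain the p-value; that step is omitted here to keep the sketch free of external dependencies.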

Fig. 7. Friedman ranking test.
Next, we use the Holm post-hoc test to analyze the statistical significance when DRW-TSK-FS is compared with each of the other models. The results are recorded in Table 6. All models are ordered by the z-value computed during the test procedure. Holm’s procedure rejects those hypotheses whose unadjusted p-value is smaller than the Holm threshold (α/i). From Table 6, we can see that, with respect to 0-order-TSK-FS, although the hypothesis is not rejected, the correspondingly very low p-value still reveals the competitiveness of DRW-TSK-FS.
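The step-down logic of Holm's procedure can be sketched as below. This is the textbook procedure, written independently of the KEEL toolbox: the unadjusted p-values are sorted in ascending order, the smallest is compared with α/m, the next with α/(m − 1), and so on, stopping at the first hypothesis that is not rejected. The function name and the model labels in the example are hypothetical, and the p-values are made up for illustration.

```python
def holm(p_values, alpha=0.05):
    """Holm step-down procedure.

    p_values: dict mapping a model name to its unadjusted p-value
              (comparison against the control model).
    Returns a dict mapping each name to True if its hypothesis is rejected.
    """
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = {name: False for name in p_values}
    for step, (name, p) in enumerate(ordered):
        if p < alpha / (m - step):
            rejected[name] = True
        else:
            break  # once one hypothesis is retained, all later ones are too
    return rejected


# Illustrative p-values only; they are not results from the paper.
decisions = holm({"A": 0.001, "B": 0.02, "C": 0.03, "D": 0.2}, alpha=0.05)
```

In this example only hypothesis A is rejected: B has a p-value of 0.02, which is not below its threshold of 0.05/3, so the procedure stops there even though C would pass a naive per-comparison check at 0.05.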
Table 4. Main characteristics of 15 KEEL datasets
Table 5. The modeling performance in terms of the maximum accuracy, the average number of rules, the average accuracy and the standard deviation on 15 datasets
Table 6. Holm test results for DRW-TSK-FS vs. FS-FCSVM, 0-order-TSK-FS, NN, TSFS-SVR and L2-TSK-FS with α = 0.05
In this paper, a novel TSK fuzzy system, DRW-TSK-FS, is proposed, in which dynamic rule weights are introduced to guarantee interpretability and the stacked generalization principle is used to ensure modeling performance. A basic input-output unit and an augmenting unit are involved in DRW-TSK-FS and form a special structure that is similar to a stacked structure. By means of the stacked structure, the predictive information from the augmenting unit, used as augmented features, can be leveraged to open the manifold structure of the feature space so that enhanced modeling performance can be expected. Superficially, since the predictive information has no specific physical meaning, it seems to reduce the interpretability of fuzzy rules. However, from the mathematical view, fuzzy rules in DRW-TSK-FS can be rewritten in another form in which the predictive information is removed from the if-parts and hidden in DRW. Hence, the if-parts of all fuzzy rules always have concise physical meanings. More importantly, DRW can be taken as a nonlinear function of the input pattern which reflects the acceptability of a fuzzy rule with regard to an input pattern. In addition, the isolation level of an input pattern can be measured by the summed DRW values of all fuzzy rules in DRW-TSK-FS. The promising modeling performance, high interpretability and outlier detection ability are evaluated on 16 real-life datasets.
In future work, we aim to develop a TSK fuzzy system that provides a mechanism to measure the acceptability of fuzzy rules acting on each individual feature of the input patterns. Moreover, following the issues discussed in [41, 42], we will extend the current results to the control of general nonlinear systems based on fuzzy/piecewise dynamic models and to similar problems.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants 81701793, 61272210, 61572236, 61300151 and NSFC-JSPS Grant 6161101250, by the Natural Science Foundation of Jiangsu Province under Grant BK20161268, by the Fundamental Research Funds for the Central Universities (JUDCF13030), and by the Science and Technology Plan Funding Project of Nantong (MS12017016-2).
