An adaptive incremental TSK fuzzy system based on stochastic configuration and its approximation capability analysis

Abstract

The aim of this study is to improve randomized methods for designing a Takagi-Sugeno-Kang (TSK) fuzzy system. A novel adaptive incremental TSK fuzzy system based on stochastic configuration, named the stochastic configuration fuzzy system (SCFS), is proposed. SCFS determines an appropriate number of fuzzy rules for a TSK fuzzy system by an incremental learning approach: starting from an initial system, new fuzzy rules are added incrementally until the specified performance is achieved. During rule generation, a stochastic configuration supervisory mechanism ensures that each added rule improves the performance. The premise parameters of newly added fuzzy rules are randomly assigned under the supervisory mechanism, and the consequent parameters are evaluated by the Moore-Penrose generalized inverse. It is proved theoretically that the supervisory mechanism ensures the universal approximation property of SCFS. SCFS can reach any predetermined tolerance level when there are enough fuzzy rules, and the training process is finite. Synthetic data and benchmark datasets are used to verify the performance of SCFS. According to the experimental results, SCFS achieves satisfactory prediction accuracy compared with other models.

Keywords

Stochastic configuration, fuzzy system, universal approximation, incremental learning

1 Introduction

In the past few decades, Takagi-Sugeno-Kang (TSK) fuzzy system [1], as one of the most important and popular fuzzy systems, has received more and more attention and has been successfully applied in many different fields, such as data modeling [2], pattern recognition [3], model approximation [4], adaptive control [5], etc. However, as the dimension and scale of the dataset increase, the rule base and the parameters of TSK fuzzy system increase exponentially. How to determine a TSK fuzzy system with appropriate architecture and good learning and generalization performance is a significant research problem.

The task of designing a TSK fuzzy system includes structure recognition and parameter optimization of both the fuzzy antecedents and the functional consequents. Fuzzy clustering methods [6, 7] have been widely applied for structure recognition, while optimization techniques [8, 9], feature selection [10] and the back propagation (BP) algorithm are used for parameter optimization.

When fuzzy clustering methods are used to determine the system structure, a crucial problem is to choose an appropriate number of clusters in advance, since it determines the number of rules. A fuzzy system with too few rules performs poorly, while too many rules may cause overfitting.

The parameter learning of TSK fuzzy systems can be realized by data-driven techniques similar to the training methods of artificial neural networks. Randomized approaches, which have great potential for developing fast training algorithms, have become popular as a result of their good modeling performance and training speed. In [11], the extreme learning machine (ELM) method is applied to optimize the fuzzy system parameters. In [12], a regularized ELM is introduced to randomly select the fuzzy layer parameters and to determine the neural layer parameters by solving optimization problems in a regularization framework. The scope of the random parameters is important to the training process of randomized approaches. In [13], the universal approximation capacity of the incremental random vector functional link network (IRVFLN) is found to be affected by the scope of the random parameters. In [14], the stochastic configuration network (SCN) is presented to solve this problem; it ensures universal approximation by assigning random parameters under a supervisory mechanism. Various SCN models [15–17] have been proposed for data modeling with universal approximation. In [18], a complex multidimensional numerical integration method based on SCN is proposed to deal with engineering problems. A deep stacked SCN is presented for data stream modeling in [19]. In [20], the structural parameters of SCN are improved by incorporating the driving amount.

According to current results, determining an appropriate number of fuzzy rules is a critical step in improving fuzzy system approximation. Traditional learning methods for fuzzy systems set the number of rules in advance, which may affect the approximation ability and generalization ability of the fuzzy system. In addition, although randomized approaches have been used to optimize the parameters of fuzzy systems, how to ensure the universal approximation of randomly generated fuzzy systems is still an open problem.

Motivated by these facts, this study proposes an adaptive incremental TSK fuzzy system based on stochastic configuration (SCFS). We determine the architecture by an adaptive incremental learning approach and randomly assign the premise parameters under inequality constraints, so as to establish a TSK fuzzy system with an appropriate number of fuzzy rules and good approximation capability. The incremental learning approach of SCFS starts with an initial structure and then gradually adds fuzzy rules until predefined termination criteria are met, which makes it straightforward to obtain a compact fuzzy system with an appropriate number of fuzzy rules. Fuzzy rules are added randomly under a supervisory mechanism that ensures universal approximation. The proposed SCFS is evaluated on a number of popular regression benchmarks and compared with some state-of-the-art approaches. The numerical examples indicate that SCFS is capable of achieving satisfactory performance.

The rest of the paper is organized as follows. Section 2 gives some preliminaries on the TSK fuzzy system. The construction of SCFS is introduced in Section 3. Section 4 investigates the approximation properties of SCFS and Section 5 proposes the parameter learning algorithm of SCFS. In Section 6, experiments on synthetic data and benchmark datasets illustrate the performance of SCFS. Section 7 gives some conclusions.

2 Preliminaries

The TSK fuzzy system is the most popular fuzzy system, and its fuzzy rules have the following form: $R_{k} : \text{If } x_{1} \text{ is } A_{k1} \text{ and } \dots \text{ and } x_{m} \text{ is } A_{km}, \text{ then } y = f_{k}(x), \quad k = 1, 2, \dots, K,$ (1) where $x = (x_{1}, x_{2}, \dots, x_{m})^{T} \in ℝ^{m}$ is the system input vector, $A_{ki}$ is a fuzzy set defined on the input variable $x_{i}$, and the consequent part $f_{k}(x)$ is a function of the input $x$.

The rule R_k of TSK fuzzy system maps antecedent fuzzy sets (A_k1, ⋯ , A_km) to consequent function f_k (x). By sum-product inference, the defuzzified output is written as follows: $y = \frac{\sum_{k = 1}^{K} τ_{k} (x) f_{k} (x)}{\sum_{k = 1}^{K} τ_{k} (x)},$ (2) where $τ_{k} (x) = \prod_{i = 1}^{m} A_{ki} (x_{i})$ is the activation level of the rule R_k.

The centralized TSK fuzzy system, which omits the denominator in (2), is suggested in [21–23]; its output simplifies to $y = \sum_{k = 1}^{K} τ_{k} (x) f_{k} (x),$ (3) which is adopted in this study.

The consequent function $f_{k}(x)$ is usually a polynomial of the input variables. The zero-order and first-order TSK fuzzy systems are two popular variants, in which a constant and a first-order polynomial are chosen as the consequent function, respectively. However, other consequent functions can also be used. In [24], Chebyshev polynomials are used to improve the nonlinear expression ability of the consequent function; this form is employed in this study: $f_{k}(x) = φ(x)^{T} a_{k},$ (4) where $φ(x) = (1, T_{1}(x_{1}), \dots, T_{1}(x_{m}), T_{2}(x_{1}), \dots, T_{n}(x_{m}))^{T}$, $T_{l}$ is the Chebyshev polynomial defined by $T_{l+1}(x_{i}) = 2 x_{i} T_{l}(x_{i}) - T_{l-1}(x_{i})$ with $T_{0}(x_{i}) = 1$ and $T_{1}(x_{i}) = x_{i}$, $n$ is the degree of the Chebyshev polynomial and $a_{k}$ is the consequent parameter vector.
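As an illustration of the feature vector in (4), the following minimal sketch builds φ(x) with the three-term recurrence. The function name `chebyshev_features` is hypothetical and is introduced only for this example.

```python
def chebyshev_features(x, n):
    """Return phi(x) = (1, T_1(x_1), ..., T_1(x_m), ..., T_n(x_m)) as a flat list.

    x is a list of m input components and n is the polynomial degree,
    so the result has n * m + 1 entries.
    """
    m = len(x)
    prev = [1.0] * m      # T_0(x_i) = 1
    curr = list(x)        # T_1(x_i) = x_i
    phi = [1.0]           # leading constant term
    for _ in range(n):
        phi.extend(curr)
        # T_{l+1}(x_i) = 2 x_i T_l(x_i) - T_{l-1}(x_i)
        nxt = [2.0 * x[i] * curr[i] - prev[i] for i in range(m)]
        prev, curr = curr, nxt
    return phi
```

For m inputs and degree n the vector has nm + 1 entries, which matches the dimension of the consequent parameter vector of each rule.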

3 Construction of stochastic configuration fuzzy system

The construction of SCFS is elaborated in this section. SCFS retains the form (3) of a centralized TSK fuzzy system, but adopts the idea of stochastic configuration to gradually add rules until specified condition is met.

Given a target function $f : ℝ^{m} \to ℝ$ and an initial SCFS $S_{L_{0}}$ with $L_{0}$ fuzzy rules, $R_{k} : \text{If } x_{1} \text{ is } A_{k1} \text{ and } \dots \text{ and } x_{m} \text{ is } A_{km}, \text{ then } y = f_{k}(x), \quad k = 1, 2, \dots, L_{0},$ where the consequent function $f_{k}(x)$ adopts the Chebyshev polynomial form (4), i.e., $f_{k}(x) = φ(x)^{T} a_{k}$, and the membership function of the antecedent fuzzy set $A_{ki}$ is of the form $A_{ki}(x_{i}) = \exp [- (\frac{x_{i} - c_{ki}}{σ_{ki}})^{2}],$ where $c_{ki}$ and $σ_{ki}$ are the center and width of the fuzzy set $A_{ki}$. The premise parameters of $R_{k}$ are denoted as $c_{k} = (c_{k1}, \dots, c_{km})$ and $σ_{k} = (σ_{k1}, \dots, σ_{km})$.

Hence, the output of $S_{L_{0}}$ can be represented as $S_{L_{0}}(x) = \sum_{k = 1}^{L_{0}} τ_{k}(x) φ(x)^{T} a_{k}$ and the residual error of $S_{L_{0}}$ is $e_{L_{0}}(x) = f(x) - S_{L_{0}}(x),$ where the activation level is $τ_{k}(x) = \prod_{i = 1}^{m} A_{ki}(x_{i})$.
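The output of the centralized system can be sketched in a few lines. This is an illustrative example under the definitions above, not the authors' implementation; the names `activation`, `phi` and `scfs_output`, and the representation of a rule as a tuple `(c, sigma, a)`, are assumptions made for the example.

```python
import math

def activation(x, c, sigma):
    """tau_k(x): product of Gaussian memberships exp(-((x_i - c_i) / sigma_i)^2)."""
    return math.exp(-sum(((xi - ci) / si) ** 2 for xi, ci, si in zip(x, c, sigma)))

def phi(x, n):
    """Chebyshev feature vector (1, T_1(x_1..x_m), ..., T_n(x_1..x_m))."""
    prev, curr, out = [1.0] * len(x), list(x), [1.0]
    for _ in range(n):
        out += curr
        prev, curr = curr, [2.0 * xi * c - p for xi, c, p in zip(x, curr, prev)]
    return out

def scfs_output(x, rules, n):
    """Centralized TSK output: sum over rules of tau_k(x) * phi(x)^T a_k.

    rules is a list of (c, sigma, a) tuples, one per fuzzy rule.
    """
    feats = phi(x, n)
    return sum(
        activation(x, c, sigma) * sum(f * ak for f, ak in zip(feats, a))
        for c, sigma, a in rules
    )
```

Because the denominator of (2) is omitted, the output is a plain weighted sum, which is linear in the consequent parameters once the premise parameters are fixed.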

If $∥e_{L_{0}}∥$ fails to reach the tolerance level ɛ, a new fuzzy rule $R_{L_{0}+1}$ is added based on the supervisory mechanism $\frac{(\sum_{i = 1}^{N} τ_{L_{0}+1}(x_{i}) e_{L_{0}}(x_{i}))^{2}}{\sum_{i = 1}^{N} τ_{L_{0}+1}(x_{i})^{2} \sum_{i = 1}^{N} e_{L_{0}}(x_{i})^{2}} \geq r_{L_{0}+1},$ where the sums run over the N training samples and $r_{L_{0}+1}$ is a given real number. The supervisory mechanism ensures that the new SCFS $S_{L_{0}+1}$ has an improved residual error. The new rule $R_{L_{0}+1}$ has the following form: $R_{L_{0}+1} : \text{If } x_{1} \text{ is } A_{(L_{0}+1)1} \text{ and } \dots \text{ and } x_{m} \text{ is } A_{(L_{0}+1)m}, \text{ then } y = φ(x)^{T} a_{L_{0}+1},$ where $c_{L_{0}+1}$ and $σ_{L_{0}+1}$ are randomly allocated based on the supervisory mechanism, and $a_{L_{0}+1}$ can be evaluated easily by using the Moore-Penrose generalized inverse to solve a least squares problem.

Hence, the output of S_L₀+1 can be represented as $S_{L_{0} + 1} (x) = \sum_{k = 1}^{L_{0} + 1} τ_{k} (x) φ {(x)}^{T} a_{k},$ (5) and the residual error of S_L₀+1 is $e_{L_{0} + 1} (x) = e_{L_{0}} (x) - τ_{L_{0} + 1} (x) φ {(x)}^{T} a_{L_{0} + 1} .$ (6)

The incremental learning approach of SCFS, which repeats the above incremental process and gradually adds new rules until the predefined tolerance level ɛ is met, is illustrated in Fig. 1.

Fig. 1

The incremental learning approach of SCFS.

The initial system $S_{L_{0}}$ with $L_{0}$ fuzzy rules can either be generated by traditional methods or be a null system. For a given target function f, new rules are produced incrementally under a supervisory mechanism to improve the performance of the SCFS until the specified performance ɛ is met.

Unlike the traditional fuzzy system, which requires the number of fuzzy rules to be given in advance, SCFS adds rules gradually by incremental learning to obtain a suitable number of rules. Moreover, the supervisory mechanism of SCFS ensures universal approximation, as proved in the next section.

4 Convergence analysis of stochastic configuration fuzzy system

In this section, the approximation capability of SCFS will be analyzed theoretically.

Consider a dataset with inputs $X = (x_{1}, x_{2}, \dots, x_{N})^{T} \in ℝ^{N \times m}$ and outputs $Y = f(X) = (y_{1}, y_{2}, \dots, y_{N})^{T} \in ℝ^{N}$, where $x_{i} = (x_{i1}, x_{i2}, \dots, x_{im}) \in ℝ^{m}$. The output of the SCFS $S_{L-1}$ with L - 1 fuzzy rules on the dataset {X, Y} can be represented as $S_{L-1}(X) = (S_{L-1}(x_{1}), S_{L-1}(x_{2}), \dots, S_{L-1}(x_{N}))^{T}$ and the residual error of $S_{L-1}$ is $e_{L-1}(X) = (e_{L-1}(x_{1}), e_{L-1}(x_{2}), \dots, e_{L-1}(x_{N}))^{T},$ where $S_{L-1}(x_{i}) = \sum_{k = 1}^{L-1} τ_{k}(x_{i}) φ(x_{i})^{T} a_{k}$ and $e_{L-1}(x_{i}) = y_{i} - S_{L-1}(x_{i})$.

If $∥e_{L-1}(X)∥ > ɛ$, a new fuzzy rule $R_{L}$ is generated and added to the old SCFS $S_{L-1}$ to obtain the new SCFS $S_{L}$ with L fuzzy rules. The output and the residual error of $S_{L}$ on the dataset {X, Y} can be represented as $S_{L}(X) = \sum_{l = 1}^{L} η_{l}^{T} a_{l} = S_{L-1}(X) + η_{L}^{T} a_{L},$ (7) $e_{L}(X) = Y - S_{L}(X) = e_{L-1}(X) - η_{L}^{T} a_{L},$ (8) where $η_{l} = (τ_{l}(x_{1}) φ(x_{1}), \dots, τ_{l}(x_{N}) φ(x_{N}))^{T}$.

The premise parameters c_L and σ_L of the newly added rule R_L are randomly assigned under an inequality constraint.

Then, we will give the first theorem of SCFS and analyze the convergence of the approximation error as the number of rules increases.

Theorem 1. Given $r_{L} \in [\frac{1}{N + 1}, \frac{e_{L-1}(x_{q})^{2}}{\sum_{i = 1}^{N} e_{L-1}(x_{i})^{2}}),$ if the randomly generated premise parameters c_L, σ_L satisfy the inequality $\frac{(\sum_{i = 1}^{N} τ_{L}(x_{i}) e_{L-1}(x_{i}))^{2}}{\sum_{i = 1}^{N} τ_{L}(x_{i})^{2} \sum_{i = 1}^{N} e_{L-1}(x_{i})^{2}} \geq r_{L},$ (9) and the consequent parameters $a_{L} \in ℝ^{nm + 1}$ are evaluated as $a_{L} = (\frac{\sum_{i = 1}^{N} τ_{L}(x_{i}) e_{L-1}(x_{i})}{\sum_{i = 1}^{N} τ_{L}(x_{i})^{2}}, 0, \dots, 0)^{T},$ then we have $\lim_{L \to + \infty} ∥e_{L}(X)∥ = 0,$ where $τ_{L}(x_{i}) = \exp [- \sum_{j = 1}^{m} (\frac{x_{ij} - c_{Lj}}{σ_{Lj}})^{2}]$ and $x_{q} \in \{x_{j} : |e_{L-1}(x_{j})| = \max_{i} |e_{L-1}(x_{i})|\}$.

Proof: From (8), we have $∥ e_{L} (X) ∥^{2} = ∥ e_{L - 1} (X) - η_{L}^{T} a_{L} ∥^{2}$ $= ∥ e_{L - 1} (X) ∥^{2} - 2 e_{L - 1} {(X)}^{T} η_{L}^{T} a_{L} + a_{L}^{T} η_{L} η_{L}^{T} a_{L} .$

Since $a_{L} = {(\frac{\sum_{i = 1}^{N} τ_{L} (x_{i}) e_{L - 1} (x_{i})}{\sum_{i = 1}^{N} τ_{L} {(x_{i})}^{2}}, 0, \dots, 0)}^{T},$ then $η_{L}^{T} a_{L} = {(τ_{L} (x_{1}), \dots, τ_{L} (x_{N}))}^{T} \frac{\sum_{i = 1}^{N} τ_{L} (x_{i}) e_{L - 1} (x_{i})}{\sum_{i = 1}^{N} τ_{L} {(x_{i})}^{2}} .$

So, $∥e_{L}(X)∥^{2} = ∥e_{L-1}(X)∥^{2} (1 - \frac{(\sum_{i = 1}^{N} τ_{L}(x_{i}) e_{L-1}(x_{i}))^{2}}{\sum_{i = 1}^{N} τ_{L}(x_{i})^{2} \sum_{i = 1}^{N} e_{L-1}(x_{i})^{2}}) \leq ∥e_{L-1}(X)∥^{2} (1 - r_{L}) \leq ∥e_{0}(X)∥^{2} \prod_{l = 1}^{L} (1 - r_{l}),$ which, since $r_{l} \geq \frac{1}{N + 1}$ for every l, implies that $∥e_{L}(X)∥^{2} \leq ∥e_{0}(X)∥^{2} (1 - \frac{1}{N + 1})^{L}$ and $\lim_{L \to + \infty} ∥e_{L}(X)∥ = 0$. □
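The key identity in the proof, namely that the constructive choice of a_L shrinks the squared residual norm by exactly the factor one minus the left-hand side of (9), can be checked numerically. The sketch below is illustrative only; `supervisory_ratio` and `constructive_update` are hypothetical names, and the vectors hold the values τ_L(x_i) and e_{L-1}(x_i) over the samples.

```python
def supervisory_ratio(tau, e):
    """Left-hand side of inequality (9) for activation levels tau and residuals e."""
    num = sum(t * r for t, r in zip(tau, e)) ** 2
    den = sum(t * t for t in tau) * sum(r * r for r in e)
    return num / den

def constructive_update(tau, e):
    """Residual after adding a rule whose only nonzero consequent entry is the
    first one, a_L = (sum(tau * e) / sum(tau^2), 0, ..., 0)^T as in Theorem 1."""
    a0 = sum(t * r for t, r in zip(tau, e)) / sum(t * t for t in tau)
    return [r - t * a0 for t, r in zip(tau, e)]

tau = [1.0, 0.5, 0.1]
e_prev = [2.0, -1.0, 0.5]
e_new = constructive_update(tau, e_prev)

sq_norm = lambda v: sum(r * r for r in v)
lhs = sq_norm(e_new)
rhs = sq_norm(e_prev) * (1.0 - supervisory_ratio(tau, e_prev))
print(abs(lhs - rhs) < 1e-12)
```

By the Cauchy-Schwarz inequality the ratio never exceeds 1, so the update can never increase the residual norm; the supervisory mechanism additionally forces a reduction of at least the factor 1 - r_L.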

Remark 1. We present a constructive scheme for consequent parameters in Theorem 1, which leads to the convergence of approximation error of SCFS. The inequality constraint (9) is proposed for finding appropriate premise parameters c_L and σ_L. According to the proof of Theorem 1, the inequality constraint (9) can ensure the universal approximation, which is the theoretical basis of data modeling using SCFS.

The next question is whether c_L and σ_L satisfying the supervisory mechanism (9) exist, which is studied in the following Theorem 2.

Theorem 2. Suppose the dataset $X = (x_{1}, x_{2}, \dots, x_{N})^{T}$ satisfies $x_{i} \neq x_{q}$ for any i ≠ q. Then, for any $r_{L} < \frac{e_{L-1}(x_{q})^{2}}{\sum_{i = 1}^{N} e_{L-1}(x_{i})^{2}}$, there exist c_L and σ_L such that $\frac{(\sum_{i = 1}^{N} τ_{L}(x_{i}) e_{L-1}(x_{i}))^{2}}{\sum_{i = 1}^{N} τ_{L}(x_{i})^{2} \sum_{i = 1}^{N} e_{L-1}(x_{i})^{2}} \geq r_{L},$ where $τ_{L}(x_{i}) = \exp [- \sum_{j = 1}^{m} (\frac{x_{ij} - c_{Lj}}{σ_{Lj}})^{2}]$ and $x_{q} \in \{x_{j} : |e_{L-1}(x_{j})| = \max_{i} |e_{L-1}(x_{i})|\}$.

Proof: Let c_L = x_q; then τ_L (x_q) = 1. Since the dataset X satisfies x_i ≠ x_q for any i ≠ q, we get $\lim_{σ_{L} \to 0} τ_{L}(x_{i}) = 0$ for every i ≠ q, which implies that for any $u \in (0, \frac{1}{2 (N - 1)}]$ there exists σ_L such that $\max_{i \neq q} τ_{L}(x_{i}) \leq u$.

Thus, we have $\sum_{i = 1}^{N} τ_{L} {(x_{i})}^{2} = 1 + \sum_{i \neq q} τ_{L} {(x_{i})}^{2} \leq 1 + (N - 1) u^{2}$ .

On the other hand, $(\sum_{i = 1}^{N} τ_{L}(x_{i}) e_{L-1}(x_{i}))^{2} = (|e_{L-1}(x_{q})| + \sum_{i \neq q} τ_{L}(x_{i}) \cdot δ_{iq} \cdot |e_{L-1}(x_{i})|)^{2},$ where $δ_{iq} = \frac{sign(e_{L-1}(x_{i}))}{sign(e_{L-1}(x_{q}))}$ and sign (·) is the sign function. Since $∥e_{L-1}(X)∥ > ɛ$ implies $e_{L-1}(x_{q}) \neq 0$, and since $δ_{iq} \geq -1$ and $τ_{L}(x_{i}) \leq u$ for i ≠ q, we get $(\sum_{i = 1}^{N} τ_{L}(x_{i}) e_{L-1}(x_{i}))^{2} \geq (|e_{L-1}(x_{q})| - \sum_{i \neq q} u |e_{L-1}(x_{i})|)^{2} \geq (1 - 2 u (N - 1)) |e_{L-1}(x_{q})|^{2},$ where the last step uses $|e_{L-1}(x_{i})| \leq |e_{L-1}(x_{q})|$ for all i and $u (N - 1) \leq \frac{1}{2}$.

Hence, we have $\frac{{(\sum_{i = 1}^{N} τ_{L} (x_{i}) e_{L - 1} (x_{i}))}^{2}}{\sum_{i = 1}^{N} τ_{L} {(x_{i})}^{2} \sum_{i = 1}^{N} e_{L - 1} {(x_{i})}^{2}}$ $\geq \frac{(1 - 2 u (N - 1)) e_{L - 1} {(x_{q})}^{2}}{(1 + (N - 1) u^{2}) \sum_{i = 1}^{N} e_{L - 1} {(x_{i})}^{2}} .$

We notice that $\frac{(1 - 2 u (N - 1)) e_{L-1}(x_{q})^{2}}{(1 + (N - 1) u^{2}) \sum_{i = 1}^{N} e_{L-1}(x_{i})^{2}}$ increases to $\frac{e_{L-1}(x_{q})^{2}}{\sum_{i = 1}^{N} e_{L-1}(x_{i})^{2}}$ as u → 0⁺. Hence, for any $r_{L} < \frac{e_{L-1}(x_{q})^{2}}{\sum_{i = 1}^{N} e_{L-1}(x_{i})^{2}}$, there exists $u_{0} \in (0, \frac{1}{2 (N - 1)}]$ such that $\frac{(1 - 2 u (N - 1)) e_{L-1}(x_{q})^{2}}{(1 + (N - 1) u^{2}) \sum_{i = 1}^{N} e_{L-1}(x_{i})^{2}} \geq r_{L}$ for any u < u₀. □

Remark 2. Theorem 2 proves the existence of c_L and σ_L satisfying the supervisory mechanism, which is used in Theorem 1 to ensure the continuous approximation improvement. In fact, according to the proof of Theorem 2, we can easily find the feasible parameter values $c_{L} = x_{q}$ and $σ_{Lj} = \min_{k \in Q_{j}} (x_{kj} - x_{qj})^{2},$ where $Q_{j} = \{k \in \{1, 2, \dots, N\} : (x_{kj} - x_{qj})^{2} \neq 0\},$ which means that premise parameters satisfying the inequality constraint are sure to be found.
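The limit argument in the proof of Theorem 2 can be illustrated numerically: centering the new rule on the sample with the largest residual and shrinking its width drives the left-hand side of (9) toward the upper end of the admissible interval for r_L. The sketch below uses made-up data and a single common width for all dimensions, both of which are assumptions of the example; the function names are hypothetical.

```python
import math

def activation_levels(X, c, sigma):
    """tau_L(x_i) for every sample x_i in X, with Gaussian memberships."""
    return [
        math.exp(-sum(((xi - ci) / si) ** 2 for xi, ci, si in zip(x, c, sigma)))
        for x in X
    ]

def supervisory_ratio(tau, e):
    """Left-hand side of inequality (9)."""
    num = sum(t * r for t, r in zip(tau, e)) ** 2
    return num / (sum(t * t for t in tau) * sum(r * r for r in e))

def ratio_centered_on_worst(X, e, width):
    """Set c_L = x_q (sample with the largest |residual|) and use one width for
    every dimension; return the ratio and its limit e(x_q)^2 / sum_i e(x_i)^2."""
    q = max(range(len(e)), key=lambda i: abs(e[i]))
    tau = activation_levels(X, X[q], [width] * len(X[q]))
    return supervisory_ratio(tau, e), e[q] ** 2 / sum(r * r for r in e)

# Illustrative data: four distinct 1-D samples and their current residuals.
X = [[0.0], [0.1], [0.2], [0.3]]
e = [0.5, -2.0, 1.0, 0.2]
for width in (1.0, 0.1, 0.001):
    h, limit = ratio_centered_on_worst(X, e, width)
    print(width, h, limit)
```

With a wide rule the residuals of opposite sign partly cancel and the ratio is small, whereas a sufficiently narrow rule isolates x_q and the ratio approaches its limit.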

The consequent parameters a_L in Theorem 1 are given constructively and remain fixed once assigned, which may lead to slow convergence of the incremental learning. An alternative is a recalculation scheme, in which the consequent parameters are determined by minimizing the global residual and are recalculated each time a rule is added. Theorem 3 gives the convergence when the consequent parameters are updated by the recalculation scheme.

Theorem 3. Given $r_{L} \in [\frac{1}{N + 1}, \frac{e_{L-1}^{*}(x_{q})^{2}}{\sum_{i = 1}^{N} e_{L-1}^{*}(x_{i})^{2}}),$ if the randomly generated premise parameters c_L, σ_L satisfy the inequality $\frac{(\sum_{i = 1}^{N} τ_{L}(x_{i}) e_{L-1}^{*}(x_{i}))^{2}}{\sum_{i = 1}^{N} τ_{L}(x_{i})^{2} \sum_{i = 1}^{N} e_{L-1}^{*}(x_{i})^{2}} \geq r_{L},$ and the consequent parameters are evaluated by $[a_{1}^{*}, \dots, a_{L}^{*}] = \arg \min_{a} ∥Y - \sum_{l = 1}^{L} η_{l}^{T} a_{l}∥,$ (10) then we have $\lim_{L \to + \infty} ∥e_{L}^{*}(X)∥ = 0,$ where $e_{L}^{*}(X) = Y - \sum_{l = 1}^{L} η_{l}^{T} a_{l}^{*}$.

Proof: Denote ${\hat{e}}_{L}(X) = e_{L-1}^{*}(X) - η_{L}^{T} {\hat{a}}_{L}$ where ${\hat{a}}_{L} = (\frac{\sum_{i = 1}^{N} τ_{L}(x_{i}) e_{L-1}^{*}(x_{i})}{\sum_{i = 1}^{N} τ_{L}(x_{i})^{2}}, 0, \dots, 0)^{T}$.

By the optimality of the parameters in (10) and the same argument as in the proof of Theorem 1, $∥e_{L}^{*}(X)∥^{2} \leq ∥{\hat{e}}_{L}(X)∥^{2} \leq (1 - r_{L}) ∥e_{L-1}^{*}(X)∥^{2},$ which implies that $∥e_{L}^{*}(X)∥^{2} \leq ∥e_{0}^{*}(X)∥^{2} (1 - \frac{1}{N + 1})^{L}$ and $\lim_{L \to + \infty} ∥e_{L}^{*}(X)∥ = 0 .$ □

Remark 3. The optimization of the consequent parameters in Theorem 3 can be carried out by using the Moore-Penrose generalized inverse, which improves the convergence rate of SCFS compared with the parameters that are directly given and remain fixed in Theorem 1.

Remark 4. According to the proof of Theorem 3, we can find that for any given ɛ > 0, if the rule number L satisfies $L \geq 2 ln (\frac{ɛ}{∥ Y ∥}) / ln (1 - \frac{1}{N + 1})$ then the residual error norm $∥ e_{L}^{*} (X) ∥$ satisfies $∥ e_{L}^{*} (X) ∥ \leq ɛ$ . This means that any predetermined tolerance level can be reached when there are enough fuzzy rules, and training process of the proposed method is finite.
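As a small worked example of the bound in Remark 4, the following sketch computes the smallest integer L that satisfies it. The function name `sufficient_rule_count` is hypothetical, and the early return for ɛ ≥ ∥Y∥ is an added convenience, since in that case the null system already meets the tolerance.

```python
import math

def sufficient_rule_count(eps, y_norm, n_samples):
    """Smallest integer L with L >= 2 ln(eps / ||Y||) / ln(1 - 1 / (N + 1))."""
    if eps >= y_norm:
        return 0  # the null system already satisfies the tolerance
    bound = 2.0 * math.log(eps / y_norm) / math.log(1.0 - 1.0 / (n_samples + 1))
    return math.ceil(bound)

# Illustrative values: ||Y|| = 1, tolerance 0.01, N = 100 samples.
L = sufficient_rule_count(0.01, 1.0, 100)
print(L)
```

The bound is a worst-case guarantee that grows roughly linearly in N, so it is a sufficient condition rather than a prediction of the number of rules needed in practice.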

5 Parameters learning algorithm for stochastic configuration fuzzy system

Theorem 1 and Theorem 3 prove the convergence of two kinds of parameter learning approaches for SCFS. The two approaches evaluate the consequent parameters by a constructive scheme and an optimization scheme, respectively. Since the latter performs better than the former, we only give the parameter learning algorithm associated with Theorem 3 here. The proposed algorithm is composed of the following components.

Configuration of premise parameters: Premise parameters c_L and σ_L are randomly assigned under the supervisory mechanism.

Evaluation of consequent parameters: Evaluating and recalculating the consequent parameters $[a_{1}^{*}, \dots, a_{L}^{*}]$ by solving the optimization problem using Moore-Penrose generalized inverse, then adding the new generated fuzzy rule to the system.

In the process of configuring the premise parameters, the value ranges of the random parameters c_L and σ_L are designed as $c_{L, j} \in [\min_{i} x_{ij}, \max_{i} x_{ij}],$ $σ_{L, j} \in [0, K \min_{k \in Q_{j}} (x_{kj} - x_{qj})^{2}],$ where $Q_{j} = \{k \in \{1, 2, \dots, N\} : (x_{kj} - x_{qj})^{2} \neq 0\},$ $x_{q} \in \{x_{j} : |e_{L-1}(x_{j})| = \max_{i} |e_{L-1}(x_{i})|\}$ and K is a volatility multiplier. We randomly generate T pairs of parameters c_L, σ_L at a time from their respective ranges and choose the pairs that satisfy the supervisory mechanism (9). Denote $H_{L} = \frac{(\sum_{i = 1}^{N} τ_{L}(x_{i}) e_{L-1}(x_{i}))^{2}}{\sum_{i = 1}^{N} τ_{L}(x_{i})^{2} \sum_{i = 1}^{N} e_{L-1}(x_{i})^{2}} .$ (11)

The supervisory mechanism can be represented as $H_{L} \geq r_{L}$ (12) where r_L is set to $\frac{1}{N + 1}$. If no pair satisfies condition (12), we select a bigger K from a given set of positive numbers ϒ and again randomly generate T pairs of parameters c_L, σ_L, repeating until at least one satisfactory pair is found. If more than one satisfactory pair is found, we choose the pair with the largest value of H_L, because, according to the proof of Theorem 3, a larger value of H_L means that the new rule R_L brings more improvement to the approximation error.

In the process of evaluating the consequent parameters, the consequent parameters $[a_{1}^{*}, \dots, a_{L}^{*}]$ are evaluated by solving the optimization problem using the Moore-Penrose generalized inverse as ${a^{(L)}}^{*} = \arg \min_{a^{(L)}} ∥Y - η^{(L)} a^{(L)}∥^{2} = ({η^{(L)}}^{T} η^{(L)})^{-1} {η^{(L)}}^{T} Y,$ where $η^{(L)} = (η_{1}^{T}, \dots, η_{L}^{T})$ and $a^{(L)} = (a_{1}^{T}, \dots, a_{L}^{T})^{T}$. The aim is to obtain the consequent parameters ${a^{(L)}}^{*}$ with the minimum learning error. However, this can lead to overfitting, especially if ${η^{(L)}}^{T} η^{(L)}$ is ill-conditioned. Hence, we add a regularization term $λ ∥a^{(L)}∥^{2}$ and evaluate the consequent parameters as ${a^{(L)}}^{+} = \arg \min_{a^{(L)}} (∥Y - η^{(L)} a^{(L)}∥^{2} + λ ∥a^{(L)}∥^{2}) = ({η^{(L)}}^{T} η^{(L)} + λ I)^{-1} {η^{(L)}}^{T} Y,$ (13) where λ is the regularization coefficient.
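The closed form (13) amounts to solving a regularized system of normal equations. The following sketch does this with plain Gaussian elimination and partial pivoting so that it stays dependency-free; this solver is a choice made for the example, not a method stated in the paper, and the name `ridge_solve` is hypothetical. Here `eta` holds the design matrix row by row, one row per sample.

```python
def ridge_solve(eta, y, lam):
    """Solve (eta^T eta + lam * I) a = eta^T y for the coefficient vector a."""
    p = len(eta[0])
    # Augmented matrix [eta^T eta + lam * I | eta^T y].
    aug = [
        [sum(row[i] * row[j] for row in eta) + (lam if i == j else 0.0)
         for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(eta, y))]
        for i in range(p)
    ]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    # Back substitution.
    a = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(aug[i][k] * a[k] for k in range(i + 1, p))
        a[i] = (aug[i][p] - tail) / aug[i][i]
    return a

# Illustrative data: three samples, two columns.
eta = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
print(ridge_solve(eta, y, 0.0))  # ordinary least squares
print(ridge_solve(eta, y, 1.0))  # regularized: coefficients shrink toward zero
```

With λ > 0 the matrix being inverted is positive definite, so the system has a unique solution even when the columns of the design matrix are nearly collinear.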

Thus, the new fuzzy rule R_L is generated and added to the old system S_L-1 to form a new system S_L. The residual error of the new system S_L is $e_{L} (X) = Y - η^{(L)} {a^{(L)}}^{+}$ (14)

In this way, new fuzzy rules are generated gradually under the supervisory mechanism and added to the system until the specified performance ɛ is met. To avoid an overly long training process and to make the training process easier to control, we also set a maximum number of rules L_max.

The parameters training algorithm for SCFS is summarized as follows:

Parameters training algorithm for SCFS

Given dataset {X, Y}; set the error tolerance ɛ, the maximum configuration times T, a set of positive numbers ϒ = { K_min : ΔK : K_max } and the maximum number of rules L_max.

1. Initialize e = Y, L = 0, and two empty sets P and W.

2. While L < L_max and ∥e∥ > ɛ, Do (set L = L + 1 and empty P and W at the start of each pass)

Stage 1: Premise parameters configuration

3. For K ∈ ϒ

4. For t = 1, ⋯ , T

5. Randomly allocate c_L and σ_L;

6. Calculate H_L based on Equation (11);

7. If $H_{L} \geq \frac{1}{N + 1}$

8. save c_L and σ_L in W, H_L in P;

9. End If

10. End For

11. If W is not empty

12. Find $c_{L}^{*}$ , $σ_{L}^{*}$ that maximize H_L in P;

13. Break and go to step 16;

14. End If

15. End For

Stage 2: Consequent parameters evaluation

16. Compute a^(L) based on Equation (13);

17. Compute e based on Equation (14);

18. End while

Return c_i, σ_i, a_i, i = 1, 2, ⋯ , L.
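The steps above can be sketched compactly as follows. This is a minimal illustration under stated assumptions, not the authors' code: all names are hypothetical, the widths are drawn from (1e-12, K·min] to avoid division by zero, the loop stops if no admissible candidate is found for any K, and (13) is solved by Gaussian elimination.

```python
import math
import random

def phi(x, n):
    """Chebyshev features (1, T_1(x_1..x_m), ..., T_n(x_1..x_m))."""
    prev, curr, out = [1.0] * len(x), list(x), [1.0]
    for _ in range(n):
        out += curr
        prev, curr = curr, [2.0 * xi * c - p for xi, c, p in zip(x, curr, prev)]
    return out

def tau(x, c, s):
    """Activation level of a rule with centers c and widths s."""
    return math.exp(-sum(((xi - ci) / si) ** 2 for xi, ci, si in zip(x, c, s)))

def ridge(A, y, lam):
    """Solve (A^T A + lam I) a = A^T y, i.e. Equation (13)."""
    p = len(A[0])
    M = [[sum(r[i] * r[j] for r in A) + lam * (i == j) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    a = [0.0] * p
    for i in reversed(range(p)):
        a[i] = (M[i][p] - sum(M[i][k] * a[k] for k in range(i + 1, p))) / M[i][i]
    return a

def train_scfs(X, Y, n=2, lam=1e-6, eps=1e-2, T=50,
               Ks=(10.0, 100.0, 1000.0), L_max=20, seed=0):
    rng = random.Random(seed)
    N, m = len(X), len(X[0])
    feats = [phi(x, n) for x in X]
    lo = [min(x[j] for x in X) for j in range(m)]
    hi = [max(x[j] for x in X) for j in range(m)]
    norm = lambda v: math.sqrt(sum(t * t for t in v))
    rules, A, a, e = [], [[] for _ in X], [], list(Y)
    while len(rules) < L_max and norm(e) > eps:
        q = max(range(N), key=lambda i: abs(e[i]))  # sample with largest residual
        # Smallest nonzero squared gap to x_q in each dimension (1.0 if none).
        base = [min([(x[j] - X[q][j]) ** 2 for x in X if x[j] != X[q][j]] or [1.0])
                for j in range(m)]
        best = None
        for K in Ks:                                # Stage 1: premise parameters
            for _ in range(T):
                c = [rng.uniform(lo[j], hi[j]) for j in range(m)]
                s = [rng.uniform(1e-12, K * base[j]) for j in range(m)]
                t = [tau(x, c, s) for x in X]
                den = sum(v * v for v in t) * sum(r * r for r in e)
                H = sum(v * r for v, r in zip(t, e)) ** 2 / den if den > 0 else 0.0
                if H >= 1.0 / (N + 1) and (best is None or H > best[0]):
                    best = (H, c, s, t)
            if best is not None:
                break
        if best is None:
            break                                   # assumption: give up on this pass
        _, c, s, t = best
        rules.append((c, s))
        # Stage 2: append the new rule's columns and recompute all consequents.
        A = [row + [ti * f for f in fi] for row, ti, fi in zip(A, t, feats)]
        a = ridge(A, Y, lam)
        e = [yi - sum(u * v for u, v in zip(row, a)) for row, yi in zip(A, Y)]
    return rules, a, norm(e)
```

A call such as `train_scfs(X, Y, n=2, eps=1e-3, L_max=6)` returns the premise parameters of the accepted rules, the stacked consequent vector and the final residual norm; the default values of `n`, `lam`, `T` and `Ks` are placeholders and not the settings used in the experiments.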

Thus, the proposed parameter training algorithm completes the design of SCFS.

Remark 5. In the parameter learning algorithm of SCFS, better premise parameters c_L, σ_L are selected from random parameters satisfying inequality constraints, and the consequent parameters are optimized, so the approximation performance of the model is better.

6 Numerical simulations

We investigate the performance of the proposed SCFS in this section. The proposed SCFS is compared with MQ [25], ELM [26], IRVFLN [13], SCN [14], ISCN [27], CSSA-SCN [28], BLS [29], FBLS [30], HPFNN [31] and other models. The root mean square error (RMSE) is adopted to evaluate the prediction accuracy of the proposed SCFS and the selected comparison models. It is defined as $RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}}$ where N is the number of samples, y_i is the actual value and ${\hat{y}}_{i}$ is the predicted value. Every experiment is repeated 30 times and the mean and standard deviation of the RMSE are reported.
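For reference, the RMSE defined above can be computed as in the following sketch; the function name `rmse` and the input validation are choices made for this example.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))
```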

6.1 Synthetic data (Syn_1D)

A one-dimensional nonlinear function f₁ (x) is considered: $f_{1} (x) = 0.2 e^{- {(10 x - 4)}^{2}} + 0.5 e^{- {(80 x - 40)}^{2}} + 0.3 e^{- {(80 x - 20)}^{2}}, \quad x \in [0, 1] .$

As in [14], the training dataset consists of 1000 samples uniformly distributed in the interval [0, 1], and the testing dataset consists of 300 samples produced from an equally spaced grid on [0, 1].
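The data described above can be generated as in the following sketch. It assumes that "uniformly distributed" means independent uniform random draws; the seed, the helper name `make_syn_1d` and the inclusion of both endpoints in the test grid are likewise assumptions of the example.

```python
import math
import random

def f1(x):
    """Target function of the Syn_1D experiment."""
    return (0.2 * math.exp(-(10 * x - 4) ** 2)
            + 0.5 * math.exp(-(80 * x - 40) ** 2)
            + 0.3 * math.exp(-(80 * x - 20) ** 2))

def make_syn_1d(n_train=1000, n_test=300, seed=0):
    """Return (x_train, y_train, x_test, y_test) for Syn_1D."""
    rng = random.Random(seed)
    x_train = [rng.random() for _ in range(n_train)]      # uniform draws on [0, 1)
    x_test = [i / (n_test - 1) for i in range(n_test)]    # equally spaced grid
    return x_train, [f1(x) for x in x_train], x_test, [f1(x) for x in x_test]
```

The function has a broad bump near x = 0.4 and two narrow peaks near x = 0.25 and x = 0.5, which is what makes it a demanding test for models with a fixed scope of random parameters.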

The parameters are set as follows. Let the degree of the Chebyshev polynomial n be 4, the regularization parameter λ be 1e-10 and the collection of positive constants ϒ be { 1e3 : 1e3 : 1e4 } in SCFS. Let the predefined maximum configuration times T be 100 in ISCN, SCN, CSSA-SCN and SCFS, the scope of random parameters be [-1, 1] in IRVFLN and ELM, and the learning rate in MQ be 0.5.

Table 1 compares the training and testing RMSE of the proposed SCFS and other six models, in which L_max is maximum rules number or hidden nodes number (from the perspective of neural fuzzy network, each rule corresponds to a hidden node, so we will not distinguish hidden nodes and fuzzy rules below). From the comparison results in Table 1 where the bold items show the best results, it is obvious that the proposed SCFS has the best training performance and testing performance.

Table 1
RMSE comparisons on Syn_1D

Dataset	Algorithms	Training RMSE				Testing RMSE			
		L_max=25		L_max=50		L_max=25		L_max=50	
		mean	std	mean	std	mean	std	mean	std
Syn_1D	MQ	0.1031	0.0001	0.103	0.0001	0.1011	0.0003	0.1011	0.0003
	ELM	0.0631	0.0001	0.0631	0.0001	0.0631	0.0001	0.063	0.0001
	IRVFLN	0.163	0.0008	0.1626	0.0005	0.1622	0.0012	0.1617	0.0008
	SCN [14]	0.0332	0.0065	0.0097	0.0036	0.0308	0.006	0.01	0.0033
	ISCN [27]	0.0745	0.0049	0.034	0.0091	0.0773	0.0039	0.0367	0.0093
	CSSA-SCN [28]	0.0058	0.0025	0.0009	0.0003	0.0058	0.0025	0.0009	0.0003
	SCFS	8E-05	9E-05	1E-05	1.1E-05	1E-04	0.0001	4.2E-05	7.8E-05

Figure 2 shows the testing performance of the IRVFLN, SCN and SCFS. It can be found that the SCFS has distinctly better generalization ability compared against the IRVFLN and SCN.

Fig. 2

Testing performance on Syn_1D when L_max = 25.

Table 2 presents the results of the efficiency comparison. IRVFLN fails to achieve the expected training tolerances when L_max is set to 50, and SCFS requires fewer fuzzy rules than SCN to achieve the expected training tolerances.

Table 2

Efficiency comparisons under the same error tolerance on Syn_1D

Algorithms	Error tolerance = 0.05		Error tolerance = 0.03	
	Hidden nodes (mean)	Hidden nodes (std)	Hidden nodes (mean)	Hidden nodes (std)
IRVFLN	–	–	–	–
SCN	19.9	3.617	30.6	3.565
SCFS	2	0	2.5	0.527

The proposed SCFS uses an incremental learning approach to design a TSK fuzzy system. A common problem with an incremental learning process is that it may be very long (possibly infinite in unlucky cases). In Section 4, we address this problem theoretically from two aspects: the existence of admissible random parameters and sufficient conditions for achieving a given precision. The supervisory mechanism of SCFS ensures that the incremental learning process of SCFS is finite. The results in Table 2 support this conclusion from the perspective of simulation experiments.

Figure 3 compares the convergence rates on the testing data of the proposed SCFS, SCN and IRVFLN. SCFS has the best convergence performance; that is, SCFS can achieve satisfactory performance with fewer fuzzy rules.

Fig. 3

Testing RMSE with 50 additive nodes or rules.

6.2 Stock prices of ten aerospace companies (Stock)

The Stock dataset from the KEEL (http://keel.es/) dataset repository contains 950 daily stock prices of ten aerospace companies. 75% of the samples are randomly chosen as training dataset and the rest as testing dataset.

The parameters are set as follows. Let the degree of the Chebyshev polynomial n be 3, the regularization parameter λ be 0.1 and the collection of positive constants ϒ be { 1, 100, 1e4, 1e6, 1e8 } in SCFS. Let the maximum configuration times T be 100 in ISCN, SCN, CSSA-SCN and SCFS, and the scope of random parameters (w, b) be [-1, 1] in BLS and FBLS.

Table 3 compares the training and testing RMSE of the proposed SCFS and five other models on Stock. In Table 3, the bold items show the best RMSE. SCFS performs best in both training and testing, which shows that the proposed SCFS performs well not only on synthetic data but also on real datasets.

Table 3
RMSE comparisons on Stock

Dataset	Algorithms	Training RMSE				Testing RMSE			
		L_max=25		L_max=50		L_max=25		L_max=50	
		mean	std	mean	std	mean	std	mean	std
Stock	SCN [14]	0.0399	0.001	0.0309	0.0006	0.044	0.0013	0.0374	0.0015
	ISCN [27]	0.0355	0.0012	0.0271	0.0008	0.0471	0.0069	0.0414	0.0058
	CSSA-SCN [28]	0.0401	0.0011	0.0305	0.0006	0.0442	0.0016	0.0365	0.0016
	BLS	0.04862	0.0015	0.0361	0.00303	0.0505	0.00055	0.039379	0.00312
	FBLS	0.03552	0.0012	0.0278	0.0008	0.039	0.00265	0.032202	0.00131
	SCFS	0.01984	0.0001	0.0165	0.00031	0.0269	0.00169	0.025349	0.00102

Figures 4–7 show the performance of SCFS with different values of the parameters n and λ. Although a larger degree n of the Chebyshev polynomial and a smaller regularization parameter λ contribute to better training performance, they do not necessarily improve testing performance and may cause overfitting. Selecting appropriate parameters can improve the performance of the model. In this paper, grid search is used to select the parameters of SCFS and of the other models.

Fig. 4

Training RMSE of SCFS with different n.

Fig. 5

Testing RMSE of SCFS with different n.

Fig. 6

Training RMSE of SCFS with different λ.

Fig. 7

Testing RMSE of SCFS with different λ.

6.3 Twenty open datasets

We next train 10 algorithms (including the proposed SCFS) on 20 datasets from the UCI and KEEL repositories and perform a statistical analysis to further evaluate the proposed SCFS. Table 4 gives the details of the 20 datasets, including the abbreviations used in subsequent tables.

Table 4
Information about the 20 open datasets

Datasets	Abbr.	No. of features	No. of data
Airfoil Self-Noise	ASN	5	1503
Boston Real State Prices	BP	13	506
Concrete Strength	CS	8	1030
Automobile Miles Per Gallon	MPG	7	392
Yacht Hydrodynamics	YH	6	308
QSAR Fish Toxicity	QFT	6	908
Energy Efficiency (Cooling)	EC	8	768
Treasury	TR	15	1049
Abalone	AB	8	4177
Red Wine Quality	RW	11	1599
Energy Efficiency (Heating)	EH	8	786
Analyzing Categorical	AC	7	4052
Auto MPG6	MP6	5	392
Daily Electricity Energy	DEE	6	365
Baseball	BA	16	337
Weather Izmir Data	WIZ	9	1461
Weather Ankara	WA	9	1609
Laser Generated Data	LA	4	993
Electrical-Maintenance	ELE	4	1056
Computer Activity	CA	21	8192

Table 5 lists the testing RMSE of the proposed SCFS and 9 other algorithms: MLP, SVR, LR, KNN, HPFNN [31], SCN [14], BLS [29], FBLS [30] and SCBLS [32]. The results of HPFNN, MLP, SVR, LR and KNN are taken from [31], and the results of FBLS, BLS, SCN and SCBLS are the best testing RMSE obtained by running their source code on each dataset. The regularization parameter λ and the degree of Chebyshev polynomial n are selected from {1e-3, 1e-2, 1e-1} and {1, 2, 3, 4}, respectively. The collection of positive constants ϒ is set to {1e3 : 1e3 : 1e4} and the predefined maximum configuration times T is set to 100.
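The parameter grid described above can be enumerated as in the following minimal sketch. It assumes that "1e3 : 1e3 : 1e4" is MATLAB-style start:step:stop notation, and `evaluate` and `toy_evaluate` are hypothetical stand-ins for training a model and returning a validation RMSE; they are not part of the original experiments.

```python
from itertools import product

# Candidate values quoted in the text.
LAMBDAS = [1e-3, 1e-2, 1e-1]   # regularization parameter lambda
DEGREES = [1, 2, 3, 4]         # degree n of the Chebyshev polynomial
# Assumption: "1e3 : 1e3 : 1e4" means start:step:stop, i.e. 1000, 2000, ..., 10000.
UPSILON = [1e3 * k for k in range(1, 11)]
T_MAX = 100                    # predefined maximum configuration times T


def grid_search(evaluate):
    """Return (best_score, best_lambda, best_n) over the lambda x n grid.

    `evaluate(lam, n)` is a hypothetical callable that trains a model with
    the given settings and returns a validation RMSE (lower is better).
    """
    best = None
    for lam, n in product(LAMBDAS, DEGREES):
        score = evaluate(lam, n)
        if best is None or score < best[0]:
            best = (score, lam, n)
    return best


def toy_evaluate(lam, n):
    # Stand-in for real training: a made-up score minimized at lambda=1e-2, n=3.
    return abs(lam - 1e-2) + abs(n - 3)


if __name__ == "__main__":
    print(len(LAMBDAS) * len(DEGREES), "grid points")
    print(grid_search(toy_evaluate))
```

The grid has 3 × 4 = 12 points, so an exhaustive search stays cheap even when each point requires a full training run.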

Table 5

Testing RMSE Comparisons of 10 algorithms on 20 open datasets

Dataset	SCFS	MLP	SVR	LR	KNN	HPFNN	SCN	BLS	FBLS	SCBLS
ASN	3.5623	4.3612	4.8892	4.8185	2.7477	3.6479	5.2468	4.1255	4.2006	3.9016
BP	4.0518	4.4957	5.033	4.8768	4.6623	3.6176	6.1395	5.7109	5.7305	4.7395
CS	5.3247	7.7434	10.943	10.495	9.4581	6.7854	7.2041	7.166	7.0735	6.7368
MPG	2.9045	3.2941	3.4528	3.3755	3.3837	2.5723	3.1761	3.178	3.1271	3.1633
YH	1.3912	1.1839	10.904	8.9711	7.6057	0.8783	9.3073	6.9055	5.2876	4.3551
QFT	0.8486	0.9554	0.9505	0.9415	0.8715	0.9166	0.8986	0.8697	0.8798	0.8763
EC	1.2389	2.4109	3.2947	3.2011	5.223	1.773	3.0224	2.8301	2.039	2.943
TR	0.2027	0.2407	0.2455	0.243	0.2496	0.2305	0.225	0.2328	0.2272	0.2149
AB	2.1089	2.3723	2.2679	2.2245	2.860	2.0966	2.2502	2.183	2.1872	2.2228
RW	0.6380	0.7392	0.6588	0.6535	0.7666	0.629	0.6601	0.6571	0.6616	0.6552
EH	0.6381	1.3836	2.9975	2.9413	5.1824	0.7662	2.3875	2.4324	1.2139	2.3985
AC	0.0795	0.0951	0.5134	0.4129	0.1032	0.078	0.113	0.102	0.0921	0.0924
MP6	2.3414	3.277	3.5248	3.4446	3.3228	2.6323	2.085	2.174	2.0788	2.1359
DEE	0.3837	0.4635	0.4129	0.4084	0.8301	0.3883	0.3881	0.3811	0.3714	0.3791
BA	631.21	1021.5	758.26	744.4	838.37	615.14	711.36	666.68	642.84	678.4
WIZ	1.095	1.3819	1.259	1.2591	2.4036	1.1028	1.1402	1.1426	1.1511	1.1389
WA	1.2346	1.4974	1.5762	1.5702	2.8141	1.2754	1.2453	1.3104	1.2811	1.2825
LA	6.4787	7.2183	23.557	23.075	11.749	7.0169	6.3307	6.2107	6.1545	6.3058
ELE	91.37	153.14	168.11	164.5	114.95	115.26	110.75	113.69	111.97	99.984
CA	2.3123	3.1991	12.436	9.7016	3.4753	2.9284	4.2178	2.9228	4.6363	3.4578

The bold items in Table 5 are the minimum testing RMSE of the 10 algorithms on each dataset. SCFS attains the minimum testing RMSE on 9 of the 20 datasets, more than any of the other 9 algorithms, which indicates the best generalization ability among the compared models. HPFNN is second, with the minimum on 7 datasets, so the gap between the two is small. We therefore perform statistical tests below to check whether the difference between the proposed SCFS and HPFNN is significant.
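The per-dataset winners can be recounted directly from Table 5. The sketch below transcribes the table values and counts, for each algorithm, the number of datasets on which it has the smallest testing RMSE; the names `TABLE5` and `count_wins` are illustrative only.

```python
from collections import Counter

MODELS = ["SCFS", "MLP", "SVR", "LR", "KNN", "HPFNN", "SCN", "BLS", "FBLS", "SCBLS"]

# Testing RMSE values transcribed from Table 5 (columns follow MODELS).
TABLE5 = {
    "ASN": [3.5623, 4.3612, 4.8892, 4.8185, 2.7477, 3.6479, 5.2468, 4.1255, 4.2006, 3.9016],
    "BP":  [4.0518, 4.4957, 5.033, 4.8768, 4.6623, 3.6176, 6.1395, 5.7109, 5.7305, 4.7395],
    "CS":  [5.3247, 7.7434, 10.943, 10.495, 9.4581, 6.7854, 7.2041, 7.166, 7.0735, 6.7368],
    "MPG": [2.9045, 3.2941, 3.4528, 3.3755, 3.3837, 2.5723, 3.1761, 3.178, 3.1271, 3.1633],
    "YH":  [1.3912, 1.1839, 10.904, 8.9711, 7.6057, 0.8783, 9.3073, 6.9055, 5.2876, 4.3551],
    "QFT": [0.8486, 0.9554, 0.9505, 0.9415, 0.8715, 0.9166, 0.8986, 0.8697, 0.8798, 0.8763],
    "EC":  [1.2389, 2.4109, 3.2947, 3.2011, 5.223, 1.773, 3.0224, 2.8301, 2.039, 2.943],
    "TR":  [0.2027, 0.2407, 0.2455, 0.243, 0.2496, 0.2305, 0.225, 0.2328, 0.2272, 0.2149],
    "AB":  [2.1089, 2.3723, 2.2679, 2.2245, 2.860, 2.0966, 2.2502, 2.183, 2.1872, 2.2228],
    "RW":  [0.6380, 0.7392, 0.6588, 0.6535, 0.7666, 0.629, 0.6601, 0.6571, 0.6616, 0.6552],
    "EH":  [0.6381, 1.3836, 2.9975, 2.9413, 5.1824, 0.7662, 2.3875, 2.4324, 1.2139, 2.3985],
    "AC":  [0.0795, 0.0951, 0.5134, 0.4129, 0.1032, 0.078, 0.113, 0.102, 0.0921, 0.0924],
    "MP6": [2.3414, 3.277, 3.5248, 3.4446, 3.3228, 2.6323, 2.085, 2.174, 2.0788, 2.1359],
    "DEE": [0.3837, 0.4635, 0.4129, 0.4084, 0.8301, 0.3883, 0.3881, 0.3811, 0.3714, 0.3791],
    "BA":  [631.21, 1021.5, 758.26, 744.4, 838.37, 615.14, 711.36, 666.68, 642.84, 678.4],
    "WIZ": [1.095, 1.3819, 1.259, 1.2591, 2.4036, 1.1028, 1.1402, 1.1426, 1.1511, 1.1389],
    "WA":  [1.2346, 1.4974, 1.5762, 1.5702, 2.8141, 1.2754, 1.2453, 1.3104, 1.2811, 1.2825],
    "LA":  [6.4787, 7.2183, 23.557, 23.075, 11.749, 7.0169, 6.3307, 6.2107, 6.1545, 6.3058],
    "ELE": [91.37, 153.14, 168.11, 164.5, 114.95, 115.26, 110.75, 113.69, 111.97, 99.984],
    "CA":  [2.3123, 3.1991, 12.436, 9.7016, 3.4753, 2.9284, 4.2178, 2.9228, 4.6363, 3.4578],
}


def count_wins(table, models):
    """Count, per model, the datasets on which it has the lowest RMSE."""
    wins = Counter()
    for rmse in table.values():
        wins[models[rmse.index(min(rmse))]] += 1
    return wins


if __name__ == "__main__":
    for model, n_wins in count_wins(TABLE5, MODELS).most_common():
        print(model, n_wins)
```

With the values as transcribed, the count gives 9 wins for SCFS and 7 for HPFNN, with the remaining 4 datasets won by FBLS and KNN.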

We divide the 9 comparison models into two groups according to whether they are randomized methods: (1) MLP, SVR, LR, and KNN; and (2) HPFNN, SCBLS, FBLS, BLS and SCN.

First, the Friedman test is applied to each group to analyze the ranking of the models. Tables 6 and 7 display the results of the Friedman test for each group. SCFS ranks first in both groups. Moreover, both p-values are less than 0.1, which demonstrates that the rankings are significantly different.

Table 6

Result of Friedman tests among SCFS, MLP, SVR, LR, and KNN

	SCFS	MLP	SVR	LR	KNN
Ranking	1.1	2.75	4.2	3.25	3.7
p-value	3.41E-09

Table 7

Result of Friedman tests among SCFS, HPFNN, SCBLS, FBLS, BLS, and SCN

	SCFS	HPFNN	SCBLS	FBLS	BLS	SCN
Ranking	1.9	2.95	3.45	3.7	4.25	4.75
p-value	2.60E-05
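As a rough check of Tables 6 and 7, the Friedman p-values can be recomputed from the average rankings alone. The sketch below assumes the chi-square form of the Friedman statistic, χ²_F = 12N / (k(k+1)) · (Σ R_j² − k(k+1)²/4), with N = 20 datasets and k models; the text does not state which form was used, so this is an assumption. The helper `chi2_sf` is a small standard-library implementation of the chi-square upper tail for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: erfc term plus a finite sum with half-integer gamma values.
    terms = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


def friedman_from_ranks(avg_ranks, n_datasets):
    """Return (chi-square statistic, p-value) from average rankings."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks)
    stat = 12.0 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4.0)
    return stat, chi2_sf(stat, k - 1)


if __name__ == "__main__":
    group1 = [1.1, 2.75, 4.2, 3.25, 3.7]          # Table 6 rankings
    group2 = [1.9, 2.95, 3.45, 3.7, 4.25, 4.75]   # Table 7 rankings
    print(friedman_from_ranks(group1, 20))
    print(friedman_from_ranks(group2, 20))
```

Under these assumptions the two p-values come out close to the 3.41E-09 and 2.60E-05 reported in the tables.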

Second, the Holm post-hoc test is applied to compare the proposed SCFS with each of the other models one by one. The results are listed in Tables 8 and 9. If the adjusted p-value is smaller than the significance level α, we reject the null hypothesis H₀ that the two compared models have no significant difference, which means that the difference is significant. The proposed SCFS is significantly better than SVR, KNN, LR, MLP, SCBLS, FBLS, BLS and SCN. Table 9 also shows that the average rankings of SCFS and HPFNN differ significantly at α = 0.1. Hence, compared with the other 9 models, the proposed SCFS has good generalization performance.

Table 8

Result of Holm test for SCFS vs MLP, SVR, LR, and KNN with α = 0.1

Models	Statistic	Adjusted p-value	Hypothesis
SVR	6.2	0	H0 is rejected
KNN	5.2	0	H0 is rejected
LR	4.3	0.00003	H0 is rejected
MLP	3.3	0.00097	H0 is rejected

Table 9

Result of Holm test for SCFS vs SCN, BLS, FBLS, SCBLS and HPFNN with α = 0.1

Models	Statistic	Adjusted p-value	Hypothesis
SCN	4.81738	0.00001	H0 is rejected
BLS	3.97222	0.00028	H0 is rejected
FBLS	3.04256	0.00704	H0 is rejected
SCBLS	2.61998	0.01759	H0 is rejected
HPFNN	1.77482	0.07593	H0 is rejected
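The statistics in Tables 8 and 9 are consistent with the usual rank-based comparison against a control, z = (R_i − R_SCFS) / sqrt(k(k+1) / (6N)), followed by Holm's step-down adjustment of two-sided normal p-values. The sketch below assumes exactly that procedure (the text does not spell it out) and takes the average rankings from Tables 6 and 7 with N = 20; the function name `holm_vs_control` is illustrative.

```python
import math


def holm_vs_control(avg_ranks, control, n_datasets):
    """Compare every model with `control`; return (name, z, adjusted p) rows."""
    k = len(avg_ranks)
    se = math.sqrt(k * (k + 1) / (6.0 * n_datasets))
    rows = []
    for name, rank in avg_ranks.items():
        if name == control:
            continue
        z = (rank - avg_ranks[control]) / se
        p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
        rows.append((name, z, p))
    rows.sort(key=lambda r: r[2])                # smallest p-value first
    m = len(rows)
    adjusted, running_max = [], 0.0
    for i, (name, z, p) in enumerate(rows):
        # Holm step-down: multiply by the number of remaining hypotheses
        # and keep the adjusted values non-decreasing.
        running_max = max(running_max, min(1.0, (m - i) * p))
        adjusted.append((name, z, running_max))
    return adjusted


if __name__ == "__main__":
    group2 = {"SCFS": 1.9, "HPFNN": 2.95, "SCBLS": 3.45,
              "FBLS": 3.7, "BLS": 4.25, "SCN": 4.75}
    for name, z, p_adj in holm_vs_control(group2, "SCFS", 20):
        print(f"{name}\t{z:.5f}\t{p_adj:.5f}")
```

A hypothesis is rejected when its adjusted p-value is below α; for HPFNN the adjusted value is about 0.076, so the rejection in Table 9 holds at α = 0.1 but would not hold at α = 0.05.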

7 Conclusion

This paper develops SCFS, an adaptive incremental TSK fuzzy system based on stochastic configuration, to improve randomized methods for designing fuzzy systems. The proposed SCFS determines the appropriate number of fuzzy rules in a TSK fuzzy system by an incremental learning approach, which starts from an initial system and gradually adds randomly generated fuzzy rules to improve the system performance until the specified performance is achieved. The premise parameters of newly added fuzzy rules are randomly assigned under the supervisory mechanism to ensure the continuous improvement of system performance. The universal approximation property and the convergence of the approximation error of SCFS have been proved theoretically in Section 4.

Hence, the proposed SCFS retains the adaptive and fast modeling of randomized methods while guaranteeing good approximation ability. We compare the performance of the proposed SCFS with 14 other models on a series of synthetic data and benchmark datasets. The experimental results and statistical analyses show that the proposed SCFS significantly outperforms the other methods in terms of both approximation and generalization performance.

In the future, we can combine the broad learning system with SCFS to further improve the approximation performance and training speed of the fuzzy system by drawing on the characteristics of the broad learning system, such as fast training and strong nonlinear representation ability. In addition, improving the interpretability of the fuzzy system and integrating intelligent optimization algorithms [33–35] are also interesting directions.

Acknowledgments

This work is supported by the National Key Research and Development Program of China (2018AAA0100300), the Fundamental Research Funds for the Central Universities (DUT22YG227) and the National Natural Science Foundation (NNSF) of China (12071056, 61773088).

References

1. Sugeno, Takagi, Fuzzy identification of systems and its applications to modeling and control, IEEE Transactions on Systems, Man, and Cybernetics 15(1) (1985), 116–132.
2. Ashrafi, Prasad D.K., Quek, IT2-GSETSK: An evolving interval Type-II TSK fuzzy neural system for online modeling of noisy data, Neurocomputing 407 (2020), 1–11.
3. Zhang, Deng, Ishibuchi, Pang L.M., Robust TSK Fuzzy System Based on Semisupervised Learning for Label Noise Data, IEEE Transactions on Fuzzy Systems 29(8) (2021), 2145–2157.
4. Sadjadi E.N., Garcia Herrero, Manuel Molina, Hatami Moghaddam, On Approximation Properties of Smooth Fuzzy Models, International Journal of Fuzzy Systems 20(8) (2018), 2657–2667.
5. Mahmoud T.A., Abdo M.I., Elsheikh E.A., Elshenawy L.M., Direct adaptive control for nonlinear systems using a TSK fuzzy echo state network based on fractional-order learning algorithm, Journal of the Franklin Institute 358 (2021), 9034–9060.
6. …, Zou, Zhang, Lai, An evolving T–S fuzzy model identification approach based on a special membership function and its application on pump-turbine governing system, Engineering Applications of Artificial Intelligence 69 (2018), 93–103.
7. …, Zhou, Fu, Kou, Xiao, T–S fuzzy model Identification with a gravitational search-based hyperplane clustering algorithm, IEEE Transactions on Fuzzy Systems 20(2) (2012), 305–317.
8. Mahajan, Abualigah, Pandit A.K., Altalhi, Hybrid Aquila optimizer with arithmetic optimization algorithm for global optimization tasks, Soft Comput 26 (2022), 4863–4881.
9. Mahajan, Abualigah, Pandit A.K., et al., Fusion of modern meta-heuristic optimization methods using arithmetic optimization algorithm for global optimization tasks, Soft Comput 26 (2022), 6749–6763.
10. Mahajan, Pandit A.K., Hybrid method to supervise feature selection using signal processing and complex algebra techniques, Multimed Tools Appl 81 (2021), 28755–28778.
11. Wang, Song, Li, Approximation properties of ELM-fuzzy systems for smooth functions and their derivatives, Neurocomputing 149 (2015), 265–274.
12. …, Pillai G.N., Regularized extreme learning adaptive neuro-fuzzy algorithm for regression and classification, Knowledge-Based Systems 127 (2017), 100–113.
13. …, Wang, Insights into randomized algorithms for neural networks: Practical issues and common pitfalls, Information Sciences 382–383 (2017), 170–178.
14. Wang, Li, Stochastic Configuration Networks: Fundamentals and Algorithms, IEEE Transactions on Cybernetics 47(10) (2017), 3466–3479.
15. Zhang, Ding, Zhang, Jia, Parallel stochastic configuration networks for large-scale data regression, Applied Soft Computing 103 (2021), 107143.
16. Dai, Zhou, Li, Zhu, Wang, Hybrid Parallel Stochastic Configuration Networks for Industrial Data Analytics, IEEE Trans Ind Inf 18(4) (2022), 2331–2341.
17. Felicetti M.J., Wang, Deep stochastic configuration networks with optimised model and hyper-parameters, Information Sciences 600 (2022), 431–441.
18. …, Huang, Wang, Stochastic configuration networks for multi-dimensional integral evaluation, Information Sciences 601 (2022), 323–339.
19. Pratama, Wang, Deep stacked stochastic configuration networks for lifelong learning of non-stationary data streams, Information Sciences 495 (2019), 150–174.
20. Wang, Dai, Ma, Shang, Driving amount based stochastic configuration network for industrial process modeling, Neurocomputing 394 (2020), 61–69.
21. Wang, Chung F.L., Shen H.B., Hu D.W., Cascaded centralized TSK fuzzy system: Universal approximator and high interpretation, Applied Soft Computing Journal 5 (2005), 131–145.
22. Zhou, Chung, Wang, Deep TSK Fuzzy Classifier with Stacked Generalization and Triplely Concise Interpretability Guarantee for Large Data, IEEE Transactions on Fuzzy Systems 25(5) (2017), 1207–1221.
23. Buckley J.J., Sugeno type controllers are universal controllers, Fuzzy Sets and Systems 53 (1993), 299–303.
24. Pratama, Lu, Anavatti, Lughofer, Lim C.P., An incremental meta-cognitive-based scaffolding fuzzy neural network, Neurocomputing 171 (2016), 89–105.
25. Kwok T.Y., Yeung D.Y., Objective functions for training new hidden units in constructive neural networks, IEEE Transactions on Neural Networks 8(5) (1997), 1131–1148.
26. Huang G.B., Zhu Q.Y., Siew C.K., Extreme learning machine: Theory and applications, Neurocomputing 70 (2006), 489–501.
27. Zhu, Feng, Wang, Jia, He, A further study on the inequality constraints in stochastic configuration networks, Information Sciences 487 (2019), 77–83.
28. Zhang, Ding, A stochastic configuration network based on chaotic sparrow search algorithm, Knowledge-Based Systems 220 (2021), 1–20.
29. Chen C.L.P., Liu, Broad Learning System: An Effective and Efficient Incremental Learning System Without the Need for Deep Architecture, IEEE Transactions on Neural Networks and Learning Systems 29(1) (2018), 10–24.
30. Feng, Chen C.L.P., Xu, Liu, On the Accuracy–Complexity Tradeoff of Fuzzy Broad Learning System, IEEE Transactions on Fuzzy Systems 29(10) (2021), 2963–2974.
31. Zhang, Oh S.K., Fu, Hierarchical polynomial-based fuzzy neural networks driven with the aid of hybrid network architecture and ranking-based neuron selection strategies, Applied Soft Computing 113 (2021), 1–17.
32. Zhou, Wang, Li H.X., Bao M.H., Stochastic configuration broad learning system and its approximation capability analysis, International Journal of Machine Learning and Cybernetics 13 (2021), 797–810.
33. Mahajan, Pandit A.K., Image Segmentation and Optimization Techniques: A Short Overview, Medicon Engineering Themes 2(2) (2022), 47–49.
34. Mahajan, Abualigah, Pandit A.K., Hybrid arithmetic optimization algorithm with hunger games search for global optimization, Multimed Tools Appl 81 (2022), 28755–28778.
35. Lakshmi Y.V., Singh, Abouhawwash, Mahajan, Pandit A.K., Ahmed A.B., Improved Chan Algorithm Based Optimum UWB Sensor Node Localization Using Hybrid Particle Swarm Optimization, IEEE Access 10 (2022), 32546–32565.