Reservoir structure optimization of echo state networks: A detrended multiple cross-correlation pruning perspective

Abstract

Reservoir structure optimization of echo state networks (ESN) is an important enabler for improving network performance. In this regard, pruning provides an effective means of optimizing the reservoir structure by removing redundant components from the network. Existing studies achieve reservoir pruning by removing insignificant neuronal connections. However, the pruned neurons then remain in the reservoir and continue to participate in computations, which hinders network inference and leaves the benefits of pruning only partly exploited. To solve this problem, this paper proposes an adaptive pruning algorithm for ESN within the detrended multiple cross-correlation (DMC²) framework, i.e., DMAP. It contains two main functional parts: the DMC² measure of reservoir neurons and reservoir pruning. The former quantifies the correlation among neurons. Based on this, the latter completely removes highly correlated neurons from the reservoir, and the optimized network structure is finally obtained by retraining the output weights. Experimental results show that DMAP-ESN outperforms its competitors in nonlinear approximation capability and reservoir stability.

Keywords

Echo state network; reservoir structure optimization; pruning; time-series prediction; detrended multiple cross-correlation

1 Introduction

The echo state network (ESN), proposed by Jaeger [1, 2], is regarded as an effective bio-inspired method for dealing with temporal trend data. Its defining structure is the reservoir, a large-scale internal recurrent topology. The reservoir projects the input signal into a high-dimensional state space and thereby performs a nonlinear transformation of the time-dependent signal, which can be viewed as a typical feature-learning process. In a classical ESN, the output weights are the only parameters that need to be trained, and they can be obtained by solving a simple least-squares problem. This avoids the slow convergence and high computational cost of traditional recurrent neural networks (RNN). Thanks to its simple topology and powerful nonlinear feature characterization ability, ESN is now widely used in many fields, such as time series prediction [3], anomaly detection [4], and language modeling [5].

However, the structural optimization of ESN remains a challenging problem. On the one hand, because the reservoir of a classical ESN is randomly generated, it is often difficult to satisfy the accuracy requirements of complex nonlinear time series prediction. On the other hand, choosing a suitable reservoir size usually requires many trials and even luck. If there are too few neurons, the reservoir cannot learn enough from the input data to satisfy the inference requirements, while too many neurons may lead to overfitting [6]. These difficulties severely limit the wider adoption of ESN.

To this end, it is highly necessary to design efficient structural optimization methods for such an ESN, especially in the optimization of reservoir size.

1.1 Related work

Many researchers have attempted the structural optimization of ESN. For example, Rodan and Tino [7] developed a simple cycle reservoir network (SCRN) with deterministic parameters, which performs comparably to a standard random ESN on widely used time-series benchmarks. Xue et al. [8] proposed an ESN model with both a ring topology and leaky-integrator units, which produced richer reservoir dynamics and lower reservoir state correlation. Kawai et al. [9] introduced the small-world network topology into the design of the reservoir structure. Such a clustered structure can extend the range of the echo state property and contributes to the further study of ESN topology. Xue et al. [10] proposed an ESN structure optimization algorithm based on particle swarm optimization, which gave the adjusted network better adaptability, prediction accuracy, and stability. However, these solutions pursue high estimation accuracy without considering applicability in real deployments. For example, the size of these networks can easily surge in order to reach the best possible performance, which is inefficient and impractical. In addition, these networks easily fall into overfitting as the network size increases. Hence, it is imperative to explore an efficient ESN of small size.

To tackle the above problem, network pruning can be considered an effective structural optimization method for ESN, since it can significantly reduce the model size, runtime memory, and computational operations at a low training overhead. Designers generally make reservoirs large enough to ensure that the extracted features are sufficiently comprehensive. Nevertheless, not every neuron in the reservoir is able to detect the key features of the network input, which leads to redundant neurons. In other words, a good reservoir should exhibit high cohesion and low coupling while retaining sufficient feature mapping capability [11]. Such a reservoir structure can be obtained by network pruning. On the one hand, pruning removes unimportant network parameters, so that the pruned network is smaller. On the other hand, only important neurons are retained after pruning, so the reservoir does not focus on unimportant features, which avoids potential overfitting. In recent work, various methods have been proposed to remove redundant parameters in ESN. Scardapane et al. [12] proposed an online pruning algorithm for redundant reservoir connections, which obtains optimal sparse reservoirs in a robust manner. Liu et al. [13, 14] proposed a method to remove redundant connections based on the correlation between reservoir nodes, which addressed the redundancy of reservoir information. In addition, some studies have focused on pruning redundant connections from the reservoir to the output layer [15–18]. These studies contributed to reservoir structure optimization by reducing redundant parameters in the network. However, the neurons optimized by the above methods still exist in the reservoir, where they may perturb the reservoir dynamics directly or indirectly and even negatively affect network inference by participating in the computation. As a result, the network cannot fully exploit the benefits of pruning, and the performance improvement of the pruned network is sub-optimal. Hence, our interest lies in finding an efficient pruning algorithm that removes entire redundant neurons from the reservoir, aiming to completely eliminate their negative impact on the ESN.

To optimize the topology of an ESN from a pruning perspective, it is critical to accurately assess the correlation between reservoir neurons. Such a correlation provides important a priori knowledge for the pruning procedure, enabling it to accurately identify the neurons that should be removed from the reservoir. The traditional Pearson correlation coefficient (PCC) is commonly used to measure the association between two neurons [18, 19]. However, this tool relies heavily on the linearity assumption and is poorly suited to non-stationary data [20]. The correlation between reservoir neurons is usually assessed by analyzing the dependencies between their state series, and these state series frequently exhibit significant nonlinear and non-stationary behavior [21]. Hence, PCC is not well suited to the analysis of correlation between reservoir neurons.

The detrended multiple cross-correlation (DMC²) coefficient is a powerful tool for modeling the dependencies among multiple variables [22]. Unlike PCC, it not only overcomes the limitations of the linearity assumption of the original data, but also does not have strict requirements on the stationarity of the data. So far, it has been widely used in many fields [22, 23]. For example, Wang et al. accurately identified PM2.5 as the main factor affecting air quality of Beijing using DMC² and demonstrated that their analysis results have statistical significance [20]. Da et al. defined a method for assessing the statistical significance of DMC² based on a probability distribution function and accurately quantified the interactions between stock data from different financial markets [24]. Moreover, some researchers have also applied DMC² to the modeling of ecological data and have shown that it can provide better statistical results than traditional methods [25]. In view of the aforementioned advantages, this paper considers employing DMC² to assess the correlation among reservoir neurons, with the aim of providing accurate and reliable prior knowledge for the subsequent pruning procedure.

1.2 Contributions

In this paper, we propose an adaptive pruning algorithm for ESN structure optimization within the DMC² framework, i.e., DMAP. The DMC² measure of reservoir neurons and reservoir pruning are its two main functional parts. Specifically, the former can evaluate the correlation of reservoir neurons to provide guidance for reservoir pruning. Subsequently, the latter can completely remove the neurons with high correlation from the reservoir, and finally, the optimal reservoir topology can be obtained by retraining the output weights of the network. Extensive experiments show that DMAP algorithm greatly reduces the size of the ESN network, while the nonlinear approximation performance and stability are improved. The contributions of this study are outlined below.

We propose a novel adaptive pruning algorithm (DMAP) for removing unimportant neurons from the reservoir, which enhances the network’s prediction capability and stability.

We quantify the correlation among multiple reservoir neurons by DMC².

We demonstrate the effectiveness of DMAP through extensive simulation experiments.

We measure the reservoir richness after each pruning using the average state entropy (ASE).

The remainder of this paper is structured as follows. Section 2 provides an overview of the ESN. Section 3 presents the DMAP algorithm. Section 4 reports simulations that evaluate the effectiveness of our strategy. The pruning process of DMAP is discussed in detail in Section 5. Finally, Section 6 provides a summary of our work.

2 Echo state network

A standard ESN consists of a reservoir with N nodes characterized by a nonlinear transfer function f(·). At moment t, this network is driven by the input signal $u^{t}$ and generates the output $y^{t}$.

Specifically, as a discrete-time nonlinear system, the ESN reads

$x^{t} = f (W^{res} x^{(t - 1)} + W^{in} u^{t}),$ (1)

$y^{t} = W^{out} x^{t},$ (2)

where $x^{t}$ is the global reservoir state at time t, $W^{in} \in ℝ^{N \times I}$ is the input weight matrix (connections from the input to the reservoir), $W^{res} \in ℝ^{N \times N}$ is the reservoir weight matrix (internal connections of the reservoir), $W^{out} \in ℝ^{O \times N}$ is the output weight matrix (connections from the reservoir to the output), and f(·) is the reservoir activation function, usually tanh.
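To make the state update concrete, the following minimal NumPy sketch iterates Equation (1) over an input sequence and stacks the resulting states row by row. The function name `run_reservoir`, the zero initial state, and the toy weights are illustrative assumptions and are not the settings used in the experiments.

```python
import numpy as np

def run_reservoir(W_in, W_res, inputs, x0=None):
    """Iterate x^t = tanh(W_res x^(t-1) + W_in u^t) and return all states as rows."""
    n_nodes = W_res.shape[0]
    # Assumption: the reservoir starts from the zero state unless x0 is given.
    x = np.zeros(n_nodes) if x0 is None else np.asarray(x0, dtype=float)
    states = np.empty((len(inputs), n_nodes))
    for t, u in enumerate(inputs):
        x = np.tanh(W_res @ x + W_in @ np.atleast_1d(u))
        states[t] = x
    return states

# Toy example (hypothetical sizes and weights): N = 5 neurons, I = 1 input.
rng = np.random.default_rng(0)
N, I = 5, 1
W_in = rng.uniform(-0.1, 0.1, size=(N, I))
W_res = 0.1 * rng.uniform(-1.0, 1.0, size=(N, N))
u = np.sin(0.2 * np.arange(50))
X = run_reservoir(W_in, W_res, u)
```

Each row of `X` is one reservoir state, so column i is the state series of the i-th neuron.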

As mentioned above, $W^{out}$ is the only parameter that needs to be trained in an ESN. During the training process, the states of the reservoir neurons are collected into the state matrix X, denoted as

$X = (x_{i}) = \begin{pmatrix} x_{i}^{t_{min} + 1} \\ x_{i}^{t_{min} + 2} \\ \vdots \\ x_{i}^{t} \end{pmatrix},$ (3)

where i = 1, 2, …, N is the index of the corresponding reservoir neuron.

The corresponding network readouts are then collected into the target matrix Y, denoted as

$Y = \begin{pmatrix} y^{t_{min} + 1} \\ y^{t_{min} + 2} \\ \vdots \\ y^{t} \end{pmatrix}.$ (4)

Specifically, in order to mitigate the effects of reservoir initialization, the states within an initial washout time $t_{min}$ are discarded. Determining a suitable $W^{out}$ can then be viewed as a regression problem, that is

$Y = X W^{out}.$ (5)

In general, the least squares method is used to solve the problem, denoted as

$W^{out} = \underset{w}{\arg\min} \| X w - Y \|,$ (6)

where $\| \cdot \|$ denotes the 2-norm operator. The optimal solution for $W^{out}$ is calculated as follows

$W^{out} = X^{+} Y = {(X^{T} X)}^{-1} X^{T} Y,$ (7)

where $X^{+}$ represents the Moore-Penrose generalized inverse of X (the second equality holds when $X^{T} X$ is invertible). Determining the optimal $W^{out}$ completes the ESN training.
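The readout training of Equations (5)-(7) can be sketched with an ordinary least-squares solver. The helper name `train_readout` and the synthetic data are assumptions for illustration. Note that in the row-stacked convention of Equation (5) the solved matrix has shape N × O, i.e., it is the transpose of $W^{out}$ as written in Equation (2).

```python
import numpy as np

def train_readout(X, Y):
    """Least-squares readout: minimise ||X W - Y||_2 over W."""
    # lstsq returns the minimum-norm solution, equal to pinv(X) @ Y,
    # without explicitly forming (X^T X)^(-1).
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

# Synthetic check (hypothetical data): 200 collected states of 4 neurons.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 4))
W_true = np.array([[0.5], [-1.0], [2.0], [0.0]])
Y = X @ W_true                      # noiseless targets
W_out = train_readout(X, Y)
```

Using a solver rather than the explicit normal equations is a design choice made here for numerical robustness when reservoir states are nearly collinear.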

Notably, the three main hyperparameters of an ESN need to be initialized before the training procedure, i.e., α, β, and r.

α is an input-scaling parameter, and the elements of $W^{in}$ are commonly initialized randomly from a uniform distribution in [-α, α].

β is the sparsity parameter, defined as the proportion of non-zero elements to the total elements in the reservoir weight matrix $W^{res}$.

r is the spectral radius parameter, defined as the maximum of the absolute values of all the eigenvalues of the reservoir weight matrix $W^{res}$. $W^{res}$ is initialized based on a matrix W:

$W^{res} = r \frac{W}{| ς_{max} (W) |},$ (8)

where the elements of W are generated randomly in [-1, 1], and $ς_{max} (W)$ is the eigenvalue of W with the largest absolute value.
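The three hyperparameters can be combined into a short initialization sketch. The function name `init_esn`, the use of a random Bernoulli mask to realise the sparsity β, and the uniform distribution for W are assumptions; the source only states the ranges and Equation (8).

```python
import numpy as np

def init_esn(n_nodes, n_inputs, alpha, beta, r, seed=0):
    """Draw W_in in [-alpha, alpha] and rescale a sparse W to spectral radius r."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-alpha, alpha, size=(n_nodes, n_inputs))
    W = rng.uniform(-1.0, 1.0, size=(n_nodes, n_nodes))
    # Assumption: each entry is kept with probability beta, so the fraction
    # of non-zero weights is approximately (not exactly) beta.
    W = W * (rng.random((n_nodes, n_nodes)) < beta)
    radius = np.max(np.abs(np.linalg.eigvals(W)))
    if radius == 0.0:
        raise ValueError("W has zero spectral radius; redraw the matrix")
    return W_in, r * W / radius          # Equation (8)

W_in, W_res = init_esn(100, 1, alpha=0.1, beta=0.3, r=0.99)
```

The example values α = 0.1, β = 0.3, and r = 0.99 mirror the MG and MSO settings listed in Table 1.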

3 Methodologies

This section describes the general framework for optimizing the reservoir structure by DMAP, as shown in Fig. 1. The framework involves two main parts: the DMC² measure of reservoir neurons and reservoir pruning. The former estimates the correlation between neurons in the reservoir based on the quantitative DMC² measure. On this basis, the latter obtains a compact reservoir by removing highly correlated neurons. The algorithms involved in these two parts are described in detail below.

Fig. 1

Schematic diagram of optimizing reservoir structure by DMAP.

3.1 DMC² measure of reservoir neurons

In this subsection, we provide a detailed construction procedure for the DMC². First, the state vectors of all neurons in the reservoir are obtained. Subsequently, the detrended cross-correlation analysis (DCCA) coefficient (defined as the ratio between the detrended covariance function $F_{DCCA}^{2}$ and the detrended variance functions $F_{DFA}$ [23]) between every pair of neurons is computed from these state vectors to construct the detrended cross-correlation matrix. On this basis, the DMC² of a selected neuron versus all remaining neurons in the reservoir can be calculated. The detailed computational procedure is elaborated below.

Assuming that $x_{i}$ and $x_{j}$ represent the state vectors of the i-th and j-th neurons in the reservoir, respectively, we can obtain two new vectors by computing the cumulative deviation values of each state vector, described as follows

$\begin{cases} S_{i}^{k} = \sum_{t = 1}^{k} (x_{i}^{t} - m_{i}) \\ S_{j}^{k} = \sum_{t = 1}^{k} (x_{j}^{t} - m_{j}) \end{cases},$ (9)

where $m_{i}$ and $m_{j}$ represent the mean values of $x_{i}$ and $x_{j}$, respectively, $x_{i}^{t}$ and $x_{j}^{t}$ represent the states of the i-th and j-th neurons at moment t, respectively, and k = 1, 2, …, L (the state vector length of each reservoir neuron). Subsequently, each new vector is divided into Υ windows of equal length n to better fit the local trend. In this study, we set 4 ≤ n ≤ L/4. Then, we calculate the local trends $\tilde{S}_{i}^{k, v}$ and $\tilde{S}_{j}^{k, v}$ of $S_{i}^{k}$ and $S_{j}^{k}$ in each window by least-squares fitting. Based on the local trends obtained, the variances $f_{DFA_{i}}$ and $f_{DFA_{j}}$ as well as the covariance $f_{DCCA}$ of the residuals after detrending can be calculated, given by

$\begin{cases} f_{DFA_{i}} (n, v) = {\left[ n^{-1} \sum_{k = 1}^{n} {(S_{i}^{(v - 1) n + k} - \tilde{S}_{i}^{k, v})}^{2} \right]}^{1/2} \\ f_{DFA_{j}} (n, v) = {\left[ n^{-1} \sum_{k = 1}^{n} {(S_{j}^{(v - 1) n + k} - \tilde{S}_{j}^{k, v})}^{2} \right]}^{1/2} \\ f_{DCCA} (n, v) = {\left[ n^{-1} \sum_{k = 1}^{n} (S_{i}^{(v - 1) n + k} - \tilde{S}_{i}^{k, v}) (S_{j}^{(v - 1) n + k} - \tilde{S}_{j}^{k, v}) \right]}^{1/2} \end{cases},$ (10)

where v = 1, 2, 3, …, Υ. Furthermore, we can obtain the corresponding detrended variance functions $F_{DFA_{i}}$ and $F_{DFA_{j}}$ as well as the detrended covariance function $F_{DCCA}$ from Equation (10), denoted as follows

$\begin{cases} F_{DFA_{i}} (n) = {\left[ Υ^{-1} \sum_{v = 1}^{Υ} f_{DFA_{i}}^{2} (n, v) \right]}^{1/2} \\ F_{DFA_{j}} (n) = {\left[ Υ^{-1} \sum_{v = 1}^{Υ} f_{DFA_{j}}^{2} (n, v) \right]}^{1/2} \\ F_{DCCA} (n) = {\left[ Υ^{-1} \sum_{v = 1}^{Υ} f_{DCCA}^{2} (n, v) \right]}^{1/2} \end{cases}$ (11)
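The quantities in Equations (9)-(11) can be sketched as follows. This is a minimal illustration under stated assumptions: non-overlapping windows, a first-order (linear) local trend, and the helper name `fluctuation_functions` are all choices made here, since the source only specifies least-squares fitting. Because the window covariance can be negative, the sketch returns $F_{DCCA}^{2}$ directly as the mean window covariance instead of taking a square root.

```python
import numpy as np

def fluctuation_functions(xi, xj, n):
    """Return F_DFA_i(n), F_DFA_j(n) and F_DCCA^2(n) for two state series."""
    xi = np.asarray(xi, dtype=float)
    xj = np.asarray(xj, dtype=float)
    Si = np.cumsum(xi - xi.mean())           # profiles of Equation (9)
    Sj = np.cumsum(xj - xj.mean())
    n_win = len(Si) // n                     # number of windows
    k = np.arange(n)
    var_i, var_j, cov_ij = [], [], []
    for v in range(n_win):
        seg_i = Si[v * n:(v + 1) * n]
        seg_j = Sj[v * n:(v + 1) * n]
        # Assumption: the local trend is a straight line fitted per window.
        res_i = seg_i - np.polyval(np.polyfit(k, seg_i, 1), k)
        res_j = seg_j - np.polyval(np.polyfit(k, seg_j, 1), k)
        var_i.append(np.mean(res_i ** 2))    # f_DFA_i^2(n, v)
        var_j.append(np.mean(res_j ** 2))    # f_DFA_j^2(n, v)
        cov_ij.append(np.mean(res_i * res_j))  # f_DCCA^2(n, v)
    return np.sqrt(np.mean(var_i)), np.sqrt(np.mean(var_j)), np.mean(cov_ij)

rng = np.random.default_rng(0)
x = rng.standard_normal(400)
Fi, Fj, F2 = fluctuation_functions(x, x, n=20)
Fi_neg, Fj_neg, F2_neg = fluctuation_functions(x, -x, n=20)
```

For a series paired with itself the detrended covariance equals the product of the two detrended variance functions, and pairing it with its negation flips the sign, which is a quick sanity check on the implementation.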

Next, the DCCA coefficient of the i-th neuron versus the j-th neuron can be evaluated as a function $ρ_{i, j} (n)$ of the window length n, that is

$ρ_{i, j} (n) = {[F_{DFA_{i}} (n) F_{DFA_{j}} (n)]}^{-1} F_{DCCA}^{2} (n),$ (12)

where -1 ⩽ $ρ_{i, j}$ ⩽ 1. Because the local trends are removed from the state vectors of the considered neurons, $ρ_{i, j} (n)$ can accurately estimate the similarity of the two reservoir neurons. We can now extend DCCA to a multivariate scenario to capture the multiple cross-correlation among N reservoir neurons, i.e., DMC². Concretely, for the reservoir state matrix X obtained by Equation (3), we can evaluate the DMC² coefficient of the i-th neuron versus the remaining N-1 neurons in the reservoir, which is related to the DCCA of the paired neurons in the reservoir, given by

$δ_{i} (n) = ψ_{i, J} {(n)}^{T} ρ^{-1} (n) ψ_{i, J} (n),$ (13)

where $ψ_{i, J} {(n)}^{T}$ is the detrended cross-correlation vector of the i-th neuron with the remaining N-1 neurons, denoted as

$ψ_{i, J} {(n)}^{T} = (ρ_{i, 1}, ρ_{i, 2}, \dots, ρ_{i, j}, \dots), \quad j \neq i,$ (14)

and

$ρ^{-1} (n) = {\begin{pmatrix} 1 & \dots & ρ_{1, i - 1} & ρ_{1, i + 1} & \dots \\ \vdots & \ddots & \vdots & \vdots & \vdots \\ ρ_{i - 1, 1} & \dots & 1 & ρ_{i - 1, i + 1} & \dots \\ ρ_{i + 1, 1} & \dots & ρ_{i + 1, i - 1} & 1 & \dots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}}^{-1}$ (15)

is the inverse of the detrended cross-correlation coefficient matrix of the N-1 neurons other than the i-th neuron.

For any given scenario, 0 ≤ δ_i (n) ≤1, with a rigorous derivation in [20]. Specifically, δ_i (n) =0 indicates no multiple cross-correlation, while δ_i (n) =1 indicates perfect multiple cross-correlation [23]. In our scenario, δ_i is used to measure the correlation between the reservoir neurons, aiming to provide guidance for the subsequent pruning procedure to identify redundant neurons.
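As a small worked illustration of Equation (13), the sketch below evaluates the DMC² coefficient from a precomputed DCCA matrix. It assumes the pairwise coefficients have already been assembled into a symmetric matrix `rho` with unit diagonal; the helper name `dmc2` and the 3-neuron example values are hypothetical.

```python
import numpy as np

def dmc2(rho, i):
    """DMC^2 coefficient of neuron i versus all other neurons (Equation (13))."""
    rho = np.asarray(rho, dtype=float)
    others = np.delete(np.arange(rho.shape[0]), i)
    psi = rho[others, i]                 # cross-correlation vector, Equation (14)
    R = rho[np.ix_(others, others)]      # matrix of the remaining neurons, Equation (15)
    # Solving R z = psi avoids forming the inverse explicitly.
    return float(psi @ np.linalg.solve(R, psi))

# Hypothetical example: neurons 0 and 1 are strongly coupled, neuron 2 is independent.
rho = np.array([[1.0, 0.8, 0.0],
                [0.8, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
gamma = [dmc2(rho, i) for i in range(3)]
```

Here the two coupled neurons obtain δ = 0.8² = 0.64 and the independent neuron obtains δ = 0. When several neurons are nearly collinear the sub-matrix can become ill-conditioned, in which case a pseudoinverse may be a safer substitute; that choice is not specified in the source.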

3.2 Reservoir pruning

In this subsection, we provide a detailed description of reservoir pruning in the DMAP framework. Unlike methods that only set weights to zero [12–18], DMAP completely removes the selected redundant neurons from the reservoir under the guidance of the DMC² criterion. The advantage is that any potential perturbation caused by the pruned neurons is eliminated because they no longer exist in the network, which results in a more efficient and simplified network structure.

Algorithm 1 gives the pseudocode for optimizing ESN structures using DMAP. First, an ESN of sufficiently large size is created and its reservoir is driven by the sample data; the neuron states are recorded and stored in X. Then, the output weight matrix $W^{out}$ is calculated by Equation (7) and the network error E is obtained. Next, the DMC² coefficient of each reservoir neuron is calculated by Equation (13) and stored in the variable Γ = (δ₁, δ₂, δ₃, …). Based on Γ, a user-defined number k of neurons is removed from the reservoir. To reduce the reservoir size while keeping the performance as stable as possible, the neurons with the highest δ-values are removed, since they are the most strongly correlated with the remaining neurons. Finally, when E exceeds the tolerable range or the maximum number of prunings is reached, the pruning stops and the optimized DMAP-ESN is obtained by retraining $W^{out}$; otherwise, the above pruning operations are repeated.

Algorithm 1 Pseudocode for optimizing ESN structure using DMAP.

Input: Sample data u^t, reservoir size N, spectral radius r, washout time t_min, state collection time t_max, maximum number of prunings m, and the number k of neurons removed in each pruning.

Output: Optimized DMAP-ESN.

1: ♯ Initialize ESN

2: Randomly generate W_in, W_res;

3: Set r∈(0, 1);

4: Configure an ESN;

5: for p = 1 to m do

6: ♯ Training ESN

7: for t = t_min to t_max do

8: Update reservoir states using Equation (1);

9: Collect network state X and output Y ;

10: end for

11: Calculate W^out by Equation (7) and network error E;

12: N← Obtain reservoir size by X ;

13: ♯ DMC² measure of reservoir neurons

14: for i = 1 to N do

15: Calculate δ_i of i-th reservoir neuron by Equation (13);

16: Save δ_i to Γ (i);

17: end for

18: ♯ Reservoir pruning

19: for q = 1 to k do

20: i←Obtain the index of the largest value in Γ

21: W _in (i, :) = null;

22: W _res (:, i) = null;

23: W _res (i, :) = null;

24: Γ (i) = null;

25: end for

26: if E exceeds the tolerable range then

27: break;

28: end if

29: end for
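The pruning step of Algorithm 1 amounts to deleting the rows and columns that belong to the selected neurons. The sketch below removes the k neurons with the largest δ-values in one pass, which matches removing the current maximum k times when the values are distinct. The helper name `prune_neurons` and the toy matrices are assumptions for illustration.

```python
import numpy as np

def prune_neurons(W_in, W_res, gamma, k):
    """Remove the k neurons with the largest DMC^2 values from W_in and W_res."""
    gamma = np.asarray(gamma, dtype=float)
    # Indices of the k largest delta values (empty when k == 0).
    drop = np.argsort(gamma)[len(gamma) - k:]
    keep = np.setdiff1d(np.arange(len(gamma)), drop)
    # Delete the input rows and both the rows and columns of the reservoir matrix.
    return W_in[keep, :], W_res[np.ix_(keep, keep)], keep

# Hypothetical 4-neuron reservoir with easily traceable entries.
W_in = np.arange(4, dtype=float).reshape(4, 1)
W_res = np.arange(16, dtype=float).reshape(4, 4)
gamma = [0.1, 0.9, 0.3, 0.8]
W_in_p, W_res_p, keep = prune_neurons(W_in, W_res, gamma, k=2)
```

Neurons 1 and 3 carry the largest δ-values in this example, so neurons 0 and 2 are retained. After this step the states are collected again and the readout is retrained, as in the outer loop of Algorithm 1.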

4 Simulations

In this section, we verify the effectiveness of the proposed DMAP method on three datasets: the chaotic Mackey-Glass (MG) system, the Multiple Superimposed Oscillator (MSO) problem, and the daily mean temperature (Temp) of Beijing. Each dataset is divided into two parts for training and testing, respectively. Moreover, to emphasize the advantages of our method, we also evaluate three other ESN-based models: the unoptimized ESN (U-ESN), the ESN based on Pearson correlation pruning (named P-ESN in this paper) [14], and the ESN based on contribution pruning (C-ESN) [17]. In the following experiments, none of the evaluated models has output feedback. To ensure fairness, the same hyperparameter settings are used for each model in the same prediction task, and Table 1 lists the detailed parameter settings.

Table 1
Hyperparameter settings of the evaluated models in three time series prediction tasks

Hyperparameter	MG & MSO				Temp
	U-ESN	C-ESN	P-ESN	DMAP-ESN	U-ESN	C-ESN	P-ESN	DMAP-ESN
Input scaling (α)	0.1	0.1	0.1	0.1	0.05	0.05	0.05	0.05
Sparsity ratio (β)	0.3	0.3	0.3	0.3	0.3	0.3	0.3	0.3
Spectral radius (r)	0.99	0.99	0.99	0.99	0.3	0.3	0.3	0.3
Initial reservoir size (N)	50/100	50/100	50/100	50/100	50/100	50/100	50/100	50/100

4.1 Datasets

In this part, we describe the considered datasets: MG, Temp, and MSO. Each dataset is split into a training set and a testing set, each of length 300. In addition, 100 time steps are discarded to wash out the initial transient. The datasets are described below.

The MG system [26, 27] is a typical benchmark with chaotic attractors for time series modeling and has been extensively used in the literature to assess predictive models. The MG time series is derived from the following time-delay differential equation

$\frac{dy (t)}{dt} = \frac{a y (t - τ)}{1 + y^{n} (t - τ)} + b y (t),$ (16)

where the parameters are set as n = 10, a = 0.2, and b = -0.1. The MG system exhibits chaotic behavior when the time delay τ > 16.8. In our experiments, 600 data points are generated from Equation (16) with τ = 17.
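One common way to generate such a series is to integrate Equation (16) numerically. The sketch below uses a forward Euler scheme, which is an assumption: the source does not state the integration method, step size, initial history, or how many initial points were dropped, so `dt`, `y0`, and `discard` are hypothetical parameters chosen for illustration.

```python
import numpy as np

def mackey_glass(n_points, tau=17.0, a=0.2, b=-0.1, n=10,
                 dt=0.1, y0=1.2, discard=1000):
    """Integrate Equation (16) with forward Euler and sample once per time unit."""
    per_unit = int(round(1.0 / dt))      # Euler steps per unit of time
    delay = int(round(tau / dt))         # delay expressed in Euler steps
    total = (n_points + discard) * per_unit
    # Assumption: constant history y(t) = y0 on [-tau, 0].
    y = np.full(total + delay + 1, y0)
    for t in range(delay, total + delay):
        y_tau = y[t - delay]
        y[t + 1] = y[t] + dt * (a * y_tau / (1.0 + y_tau ** n) + b * y[t])
    sampled = y[delay + per_unit::per_unit]   # values at times 1, 2, 3, ...
    # Drop the initial transient of the integration, then keep n_points samples.
    return sampled[discard:discard + n_points]

mg = mackey_glass(600)
```

A higher-order integrator such as a fourth-order Runge-Kutta scheme would give a more accurate trajectory; Euler is used here only to keep the example short.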

Additionally, the real-world dataset of daily mean temperature in Beijing is used for our experimental evaluation. The data collection period spanned from December 9, 2021, to July 31, 2023, comprising a total of 600 records.

Finally, the MSO time series is created by summing several simple sinusoidal functions, given by

$y (n) = \sum_{i = 1}^{s} \sin (α_{i} n),$ (17)

where s is the number of sine waves, $α_{i}$ is the frequency of the i-th sine wave, and n is the integer index of the time step. MSO problems with various numbers of sine waves are considered in the literature [28], where the sine wave frequencies are drawn from the same set: α₁ = 0.2, α₂ = 0.311, α₃ = 0.42, α₄ = 0.51, α₅ = 0.63, α₆ = 0.74, α₇ = 0.85, and α₈ = 0.97. We generate an MSO time series of length 600 using s = 5 in our experiments.
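Equation (17) translates directly into a few lines of code. In this sketch the function name `mso` and the choice of starting the time index at n = 0 are assumptions; the frequencies are those listed above.

```python
import numpy as np

# Frequencies alpha_1 ... alpha_8 from the text.
FREQS = [0.2, 0.311, 0.42, 0.51, 0.63, 0.74, 0.85, 0.97]

def mso(n_points, s=5):
    """Sum of the first s sine waves evaluated at integer time steps (Equation (17))."""
    n = np.arange(n_points)          # assumption: the index starts at 0
    return sum(np.sin(alpha * n) for alpha in FREQS[:s])

y_mso = mso(600, s=5)
```

With s = 5 the series is bounded by 5 in absolute value, since it is a sum of five unit-amplitude sine waves.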

4.2 Evaluation criteria

In our scenario, the Root Mean Square Error (RMSE) and the maximum local Lyapunov exponent (λ) are considered to measure the nonlinear approximation performance and stability of the evaluated models, respectively.

The RMSE reflects the deviation between the actual and predicted values; the smaller the RMSE, the better the prediction performance. It is expressed as

$RMSE = \sqrt{\frac{\sum_{k = 1}^{l} (y (k) - \hat{y} (k))^{2}}{l}},$ (18)

where l is the number of samples, and y (k) and $\hat{y} (k)$ denote the expected output and the predicted output, respectively.
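For reference, Equation (18) can be computed as in the following short sketch; the function name `rmse` and the three-sample example are illustrative only.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error between expected and predicted outputs (Equation (18))."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Only the last sample differs (by 2), so RMSE = sqrt(4 / 3).
err = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```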

On the other hand, we examine the Jacobian matrix of the reservoir state update in Equation (1) and derive from it the maximum local Lyapunov exponent, λ. This quantity approximates the separation rate in phase space of trajectories with very similar initial conditions. λ is used to characterize reservoirs and is widely adopted as a measure of reservoir stability in the literature [29, 30]. When the neurons use a hyperbolic tangent activation function, the Jacobian matrix at moment t is

$J (x^{t}) = \begin{bmatrix} 1 - {(x_{1}^{t})}^{2} & 0 & \dots & 0 \\ 0 & 1 - {(x_{2}^{t})}^{2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 - {(x_{N}^{t})}^{2} \end{bmatrix} W^{res},$ (19)

where $x_{i}^{t}$ denotes the state of the i-th neuron at moment t. Then λ is derived from

$λ = \max_{i = 1, \dots, N} \frac{1}{l} \sum_{t = 1}^{l} \log (ξ_{i}^{t}),$ (20)

where $ξ_{i}^{t}$ represents the absolute value of the i-th eigenvalue of $J (x^{t})$ and l is the total number of time steps in the considered trajectory. Typically, the dynamics are ordered if λ < 0 and chaotic if λ > 0. In addition, many studies have shown that a reservoir near the critical point (λ ≈ 0) has optimal computational power. This point is called the edge of chaos [9].
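Equations (19) and (20) can be evaluated along a recorded state trajectory as sketched below. Two choices here are assumptions rather than details from the source: the eigenvalues at each time step are ordered by decreasing modulus so that "the i-th eigenvalue" is well defined across time, and a small constant `eps` guards against taking the logarithm of zero. The function name `max_lyapunov` is hypothetical.

```python
import numpy as np

def max_lyapunov(W_res, states, eps=1e-12):
    """Maximum local Lyapunov exponent of a tanh reservoir (Equation (20))."""
    log_mods = []
    for x in np.asarray(states, dtype=float):
        # diag(1 - x^2) @ W_res scales row i of W_res by 1 - x_i^2 (Equation (19)).
        J = (1.0 - x ** 2)[:, None] * W_res
        # Assumption: eigenvalue moduli are sorted in decreasing order per step.
        mods = np.sort(np.abs(np.linalg.eigvals(J)))[::-1]
        log_mods.append(np.log(mods + eps))
    # Average over time for each eigenvalue index, then take the maximum.
    return float(np.max(np.mean(log_mods, axis=0)))

# Hypothetical check: at the zero state J equals W_res, so the exponent
# is the logarithm of the spectral radius.
W_demo = np.diag([0.5, 0.2])
lam_zero = max_lyapunov(W_demo, np.zeros((3, 2)))
lam_half = max_lyapunov(W_demo, np.full((3, 2), 0.5))
```

In the second call every state component equals 0.5, so each row is scaled by 0.75 and the exponent drops to log(0.375), illustrating how saturation of the tanh units pushes λ further below zero.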

4.3 Results

Fig. 2 shows the fitting results of each model to the original signal in the three time series prediction tasks. For better visualization, part of the fitting results in each subfigure is enlarged as a picture-in-picture (PIP). As the details in these PIPs show, the prediction curves of the proposed DMAP-ESN follow the desired curves more closely than those of the other models, especially in the MG and MSO prediction tasks, which indicates the best nonlinear approximation capability among the evaluated models.

Fig. 2

Prediction results of the evaluated models when the minimum RMSE_test can be obtained (initialized reservoir size N = 100).

Fig. 3 shows violin plots of the absolute prediction errors of the evaluated models in the three prediction tasks. Each violin plot consists of a box plot and density traces on its left and right sides. The symmetric density traces visualize the distribution of the absolute prediction errors, and their width reflects how frequently data points occur. The built-in box plots summarize the basic distribution of the data. On the one hand, the boxes of DMAP-ESN are the flattest and closest to zero overall, especially in Fig. 3(a), meaning smaller deviations in the predictions. On the other hand, its minimum mean values also indicate its advantage in prediction performance. Table 2 shows the performance comparison of each model for N = 50 and N = 100. Once again, the minimum mean RMSE_test suggests that DMAP-ESN is the best of the evaluated architectures for these prediction tasks.

Fig. 3

Violin plots of the prediction errors after taking the absolute values of the evaluated models in three time series prediction tasks (initialized reservoir size N = 100).

Table 2

Minimum RMSE_test (×10^-3) condition of each experiment group

Dataset	Descriptive statistic	Initialized reservoir size N=50				Initialized reservoir size N=100
		U-ESN	C-ESN	P-ESN	DMAP-ESN	U-ESN	C-ESN	P-ESN	DMAP-ESN
MG	Mean	4.26	4.07	3.60	2.53	1.10	0.93	0.79	0.23
	Std.	0.23	0.37	0.24	0.57	0.07	0.26	0.21	0.12
Temp	Mean	4620.90	2765.08	3033.27	2699.43	4106.18	2727.62	2904.27	2662.05
	Std.	41.35	15.37	13.54	10.90	58.73	8.57	41.73	15.57
MSO	Mean	25.46	23.91	16.00	10.79	3.03	2.72	1.40	0.91
	Std.	3.28	2.81	4.43	3.85	0.16	0.33	0.51	0.38

Bold values indicate the best test performance for the current task.

Table 3 lists the number of reservoir neurons removed by DMAP when the minimum RMSE_test is obtained. The number of removable neurons is clearly task-dependent. For example, when the initial N = 100, only 30 neurons can be removed from the reservoir in the MSO task, whereas up to 90 neurons can be removed in the Temp task. This implies that the characteristics of the input data should be analyzed in more depth in order to choose an appropriate initial reservoir size for that data, which helps reduce the time consumed by pruning.

Table 3

Number of removed reservoir neurons by DMAP

Reservoir size	MG	Temp	MSO
50	17	42	18
100	38	90	30

Fig. 4 shows the λ obtained for the reservoirs of the evaluated models as r varies. As expected, the λ of DMAP-ESN is the smallest and is the last to reach zero as r increases, especially in Fig. 4(a). This indicates that the reservoir topology optimized by DMAP suppresses the chaos of the neural dynamics to a greater extent and is the most stable among the considered alternatives. It is therefore reasonable to believe that completely removing the neurons with high DMC² from the reservoir eliminates perturbation components of the network and contributes to enhancing reservoir stability.

Fig. 4

Lyapunov exponent versus spectral radius.

In fact, the nonlinear approximation capability of ESN is closely related to reservoir stability, as shown in Fig. 5. For each task, the RMSE_test of each model stays small while λ < 0 (refer to the relevant panels in Fig. 4). However, once λ > 0 (beyond the stability edge), the RMSE_test increases significantly and fluctuates drastically as r grows. Combined with the discussion of reservoir stability in Fig. 4, this suggests that DMAP-ESN allows a larger r to be used while still obtaining satisfactory prediction performance and stable dynamics.

Fig. 5

RMSE_test versus spectral radius.

5 Discussion

In this section, the process of ESN optimization by DMAP is discussed in detail for the initial reservoir size N = 100. Our interest focuses on clarifying the effects of pruning on the network, such as reservoir richness and memory capacity, in order to explain how the proposed method affects the dynamic behavior of the network.

5.1 DMC² Visualization

Fig. 6 shows the DMC² coefficients of the removed neurons during the pruning process. Note that only one neuron is removed in each pruning. The figure shows that the DMC² coefficients of the removed neurons are significantly high in the early stage of pruning, implying a high coupling of the reservoir at this time. As pruning continues, however, DMC² declines gradually because high-similarity neurons are selectively removed by the proposed discard strategy. This indicates that the DMAP algorithm possesses a remarkable de-redundancy ability. It also suggests that our method may have the potential to address the problem of reservoir collinearity [31], although this is beyond the scope of this paper.

Fig. 6

Heat map of the removed neurons δ during the pruning process, where the data are normalized for better visualization.

5.2 Performance insight

Fig. 7 shows the RMSE_test of the models evaluated at different reservoir sizes. Specifically, in each subfigure, the red curve is drawn according to the RMSE_test obtained by the DMAP-ESN after each pruning operation while processing the current task, where the initialized reservoir size is 100. The blue curve is drawn according to the RMSE_test obtained by the unoptimized U-ESN at different reservoir sizes while processing the current task.

Fig. 7

RMSE_test curves of DMAP-ESN during the pruning process.

On the one hand, the red curves in Figs. 7(a)-(c) show that the RMSE_test of DMAP-ESN decreases significantly after each pruning operation in the early stage of structure optimization, implying an improvement in the model's nonlinear approximation performance. This is because pruning an appropriate number of reservoir neurons is equivalent to a feature selection process for the model [32]. This process removes redundant and irrelevant features from the reservoir, mitigates their detrimental effect on the network readout, and thus improves the nonlinear approximation capability of DMAP-ESN. However, the performance of DMAP-ESN degrades significantly when the reservoir is compressed further, such as in Fig. 7(a) when N < 55, in Fig. 7(b) when N < 4, and in Fig. 7(c) when N < 67. This is because excessive pruning removes important reservoir neurons, so that the network fails to detect key features related to the input signal and the prediction ability of DMAP-ESN deteriorates. This also appears to be related to the loss of reservoir richness [33].

On the other hand, the blue curves in Figs. 7(a)-(c) fluctuate significantly, meaning that U-ESN produces unreliable results across reservoir sizes, even though in some cases it may perform better than DMAP-ESN at the same reservoir size. As discussed at the beginning of this paper, this is why choosing an appropriate reservoir size is difficult for an ESN with significant stochastic properties. Our method addresses this challenge effectively. At the beginning of network design, engineers do not need to tune the reservoir size carefully; they can set it arbitrarily, and DMAP then optimizes the initialized ESN to a reasonable size with excellent performance.

5.3 Reservoir richness

To investigate the effect of the removed neurons on the information content of the reservoir, we calculate the average state entropy (ASE) of the reservoir after each pruning operation, which is a good metric for quantifying dynamical richness [34]. In this paper, the ASE of the reservoir is given by

$ASE = \frac{1}{l} \sum_{t = 1}^{l} H(X^{t}),$ (21)

where $X^{t} = (x_{1}^{t}, x_{2}^{t}, \dots, x_{N}^{t})$ is the state vector of reservoir neurons at moment t, and $H(X^{t})$ is the Shannon entropy of $X^{t}$, i.e.,

$H(X^{t}) = - \sum_{x^{t} \in X^{t}} \rho(x^{t}) \log \rho(x^{t}),$ (22)

where $\rho(\cdot)$ is the probability mass function.
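Equations (21) and (22) can be illustrated with a minimal sketch. The text does not state how ρ(·) is estimated, so the sketch assumes a histogram estimate: the N activations at time t are placed into equal-width bins on [-1, 1] (a natural range for tanh neurons), and the bin frequencies serve as the probability mass function. The bin count, the range and the function names `state_entropy` and `average_state_entropy` are assumptions made for illustration only.

```python
import math
from collections import Counter


def state_entropy(state, n_bins=10, lo=-1.0, hi=1.0):
    """Shannon entropy (in nats) of one reservoir state vector X^t, Eq. (22).

    The probability mass function is estimated by histogramming the N
    activations into n_bins equal-width bins on [lo, hi] (an assumption).
    """
    width = (hi - lo) / n_bins
    # Map each activation to a bin index, clamping values at the edges.
    counts = Counter(
        min(n_bins - 1, max(0, int((x - lo) / width))) for x in state
    )
    n = len(state)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def average_state_entropy(states, **kwargs):
    """ASE of Eq. (21): mean of H(X^t) over the l recorded time steps."""
    return sum(state_entropy(s, **kwargs) for s in states) / len(states)
```

Under this estimate, a reservoir whose neurons all take nearly the same value at each time step has an ASE close to zero, while activations spread evenly across the bins give the maximum value log(n_bins).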

Fig. 8 shows the ASE of the DMAP-ESN reservoir after each pruning operation. The ASE changes only weakly and remains at a high level while an appropriate number of neurons is pruned, indicating that the removed neurons contribute little to the information content of the reservoir. However, the curves decline rapidly as the number of removed neurons increases further, for example in the MSO task when m > 36. Excessive pruning increases the risk of removing important neurons, which leads to a significant decrease in reservoir information content. This supports the inference drawn from Fig. 7: excessive pruning negatively affects reservoir richness and thereby degrades network performance.

Fig. 8

ASE curves of DMAP-ESN reservoir during the pruning process, where m denotes the number of removed neurons.

5.4 Memory capacity

To investigate the difference in the ability of DMAP-ESN to encode past input information, we evaluate its memory capacity (MC). As described by Jaeger in [35], MC measures the ability to reconstruct the input signal from a previous time step. For a given time delay k, the quality of the fit is measured by the squared correlation coefficient between the desired output (i.e., the input signal delayed by k time steps) and the observed network output y(t):

$MC_{k} = \frac{\mathrm{Cov}^{2}(u(t - k), y(t))}{\mathrm{Var}(u(t)) \, \mathrm{Var}(y(t))},$ (23)

where Cov represents covariance, Var represents variance, u(t) is the model input at time step t, and y(t) is the model output for u(t - k). Then, the short-term memory (STM) capacity can be defined as

$MC = \sum_{k = 1}^{T} MC_{k}.$ (24)
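As a worked illustration of Equations (23) and (24), the sketch below computes MC_k and the STM capacity from an input sequence and a set of readout outputs. It assumes that a separate readout has already been trained for each delay k and that its output sequence is aligned with the input, so that `outputs[k][t]` is the estimate of `u[t - k]`. Training the readouts is not shown, and the names `squared_correlation` and `memory_capacity` and the dictionary layout are hypothetical.

```python
def squared_correlation(target, output):
    """MC_k of Eq. (23): squared correlation of delayed input and readout."""
    n = len(target)
    mean_t = sum(target) / n
    mean_o = sum(output) / n
    cov = sum((a - mean_t) * (b - mean_o) for a, b in zip(target, output)) / n
    var_t = sum((a - mean_t) ** 2 for a in target) / n
    var_o = sum((b - mean_o) ** 2 for b in output) / n
    if var_t == 0.0 or var_o == 0.0:
        return 0.0  # a constant signal cannot be correlated with anything
    return cov * cov / (var_t * var_o)


def memory_capacity(u, outputs, max_delay):
    """STM capacity of Eq. (24): sum of MC_k for k = 1, ..., max_delay.

    u: input sequence u(t).
    outputs: dict mapping delay k to the readout sequence y_k(t), aligned
             with u so that outputs[k][t] estimates u[t - k] (assumption).
    The variance of the input is taken over the aligned window u(t - k).
    """
    total = 0.0
    for k in range(1, max_delay + 1):
        delayed_input = u[:-k]    # u(t - k) for t = k, ..., len(u) - 1
        readout = outputs[k][k:]  # y_k(t) for the same t
        total += squared_correlation(delayed_input, readout)
    return total
```

Each MC_k lies between 0 and 1, so a network that reproduced every delayed input perfectly would reach MC = T.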

Fig. 9 shows the forgetting curves of DMAP-ESN in the three prediction tasks for k = 1, 2, . . . , 40, where detCoeff is the squared correlation coefficient (i.e., MC_k in Equation (23)). DMAP-ESN shows weaker memory capacity than U-ESN in most situations as k increases. Furthermore, Table 4 lists the STM capacity of DMAP-ESN in the three prediction tasks when T = 40. DMAP-ESN has a lower STM capacity than U-ESN in all three tasks. This is consistent with the observation that a smaller reservoir has a weaker memory capacity [36]. Therefore, the number of removed neurons should be limited if the pruned network is expected to retain as much of its STM capacity as possible.

Fig. 9

The forgetting curves of DMAP-ESN.

Table 4

STM capacity of DMAP-ESN in three time series prediction tasks

Model	MG	Temp	MSO
U-ESN	39.92	16.81	39.93
DMAP-ESN	39.52	13.47	39.84

6 Conclusion

In this paper, we investigate the problem of structural optimization in ESN modeling for nonlinear regression tasks. Unlike existing studies, this paper proposes a novel pruning algorithm within the DMC² framework for optimizing the reservoir structure, namely DMAP. It completely removes highly correlated neurons from the reservoir, thereby eliminating their negative impact on the network. This makes the optimized network structure more suitable for nonlinear regression tasks. Extensive simulation results show that DMAP-ESN outperforms the comparative models in nonlinear approximation capability and reservoir stability.

So far, our study has only considered the simple situation of using DMAP-ESN to predict noise-free time series, whereas random noise may degrade model performance. In recent work [37], Chen et al. proposed a probabilistic regularization method that optimizes the output weights of an ESN by taking into account the distributional information of the modeling error, which improves the ability of the model to handle noisy data. Inspired by this, we will investigate similar methods to optimize the output weights of DMAP-ESN in the future, aiming to enhance its resistance to noise interference. One potential solution is to optimize the output weights of the network by removing the connections from reservoir neurons with high DMC² to the output layer, and we expect that such an optimization will enhance the robustness of DMAP-ESN to noise interference.

Acknowledgements

This work was supported in part by the Science and Technology Project of Hebei Education Department, Grant ZD2021088, and in part by the open fund project from Marine Ecological Restoration and Smart Ocean Engineering Research Center of Hebei Province, Grant HBMESO2315.

References

1. Jaeger, The “echo state” approach to analysing and training recurrent neural networks-with an erratum note, Bonn, Germany: German National Research Center for Information Technology GMD Technical Report 148(34) (2001), 13.

2. Jaeger, Haas, Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication, Science 304(5667) (2004), 78–80.

3. Zhou, Han, Xiao, Gui, Adebisi, Gacanin, Sari, Multiscale network traffic prediction method based on deep echo-state network for internet of things, IEEE Internet of Things Journal 9(21) (2022), 21862–874.

4. Ullah, Hussain, Khan Z.A., Haroon, Baik S.W., Intelligent dual stream cnn and echo state network for anomaly detection, Knowledge-Based Systems 253 (2022), 109456.

5. Liu, Li, Pan, Loo C.K., Grammatical structure detection by instinct plasticity based echo state networks with genetic algorithm, Neurocomputing 467 (2022), 173–183.

6. Yang, Nie, Qiao, Wang, Robust echo state network with sparse online learning, Information Sciences 594 (2022), 95–117.

7. Rodan, Tino, Minimum complexity echo state network, IEEE Transactions on Neural Networks 22(1) (2010), 131–144.

8. Xue, Li, The combination of circle topology and leaky integrator neurons remarkably improves the performance of echo state network on time series prediction, PloS one 12(7) (2017), e0181816.

9. Kawai, Park, Asada, A small-world topology enhances the echo state property and signal propagation in reservoir computing, Neural Networks 112 (2019), 15–23.

10. Xue, Zhang, Slowik, Automatic topology optimization of echo state network based on particle swarm optimization, Engineering Applications of Artificial Intelligence 117 (2023), 105574.

11. Y.-C., Wang, Zhang, Liu, Modeling data-driven sensor with a novel deep echo state network, Chemometrics and Intelligent Laboratory Systems 206 (2020), 104062.

12. Scardapane, Nocco, Comminiello, Scarpiniti, Uncini, An effective criterion for pruning reservoir’s connections in echo state networks, in 2014 International Joint Conference on Neural Networks (IJCNN). IEEE (2014), 1205–1212.

13. Liu, Bai, Jin, Wang, Su, Kong, Broad echo state network with reservoir pruning for nonstationary time series prediction, Computational Intelligence and Neuroscience (2022), 2022.

14. Liu W.-J., Bai Y.-T., Jin X.-B., Su T.-L., Kong J.-L., Adaptive broad echo state network for nonstationary time series forecasting, Mathematics 10(17) (2022), 3188.

15. Dutoit, Schrauwen, Van Campenhout, Stroobandt, Van Brussel and Nuttin, Pruning and regularization in reservoir computing, Neurocomputing 72(7-9) (2009), 1534–1546.

16. Wang, Ni, Yan, Optimizing the echo state network based on mutual information for modeling fed-batch bioprocesses, Neurocomputing 225 (2017), 111–118.

17. Liu, Qiao, Li, Structure optimization for echo state network based on contribution, Tsinghua Science and Technology 24(1) (2018), 97–105.

18. Wang, Wu Q.J., Wang, Wu, Yu, Optimizing simple deterministically constructed cycle reservoir network with a redundant unit pruning auto-encoder algorithm, Neurocomputing 356 (2019), 184–194.

19. Shen, Zhang, Mao, Improving deep echo state network with neuronal similarity-based iterative pruning merging algorithm, Applied Sciences 13(5) (2023), 2918.

20. Wang, Xu, Fan, Statistical properties of the detrended multiple cross-correlation coefficient, Communications in Nonlinear Science and Numerical Simulation 99 (2021), 105781.

21. Shen, Cottrell G.W., Deepr-esn: A deep projection-encoding echo-state network, Information Sciences 511 (2020), 152–171.

22. Zebende, da Silva Filho, Detrended multiple cross-correlation coefficient, Physica A: Statistical Mechanics and its Applications 510 (2018), 91–97.

23. Guedes, da Silva Filho, Zebende, Detrended multiple cross-correlation coefficient with sliding windows approach, Physica A: Statistical Mechanics and its Applications 574 (2021), 125990.

24. da Silva Filho, Zebende, de Castro, Guedes, Statistical test for multiple detrended cross-correlation coefficient, Physica A: Statistical Mechanics and its Applications 562 (2021), 125285.

25. Millán, Limberger, Cumbrera, Cross-correlating deforestation and atmospheric indexes to local rainfall and temperature patterns through detrended multiple cross-correlation analysis, 2023.

26. Wang, Zhao, Zheng, Niu, Gao, Li, A novel time series prediction method based on pooling compressed sensing echo state network and its application in stock market, Neural Networks (2023).

27. Viehweg, Worthmann, Mäder, Parameterizing echo state networks for multi-step time series prediction, Neurocomputing 522 (2023), 214–228.

28. Koryakin, Lohmann, Butz M.V., Balanced echo state networks, Neural Networks 36 (2012), 35–45.

29. Bianchi F.M., Livi, Alippi, Investigating echo-state networks dynamics by means of recurrence analysis, IEEE Transactions on Neural Networks and Learning Systems 29(2) (2016), 427–439.

30. Wang, Yan, Improved simple deterministically constructed cycle reservoir network with sensitive iterative pruning algorithm, Neurocomputing 145 (2014), 353–362.

31. Ren, Liu, Han, Hierarchical echo state network with sparse learning: A method for multidimensional chaotic time series prediction, IEEE Transactions on Neural Networks and Learning Systems (2022).

32. Liu, Sun, Luo, Yang, Cao, Zhai, Echo state network optimization using binary grey wolf algorithm, Neurocomputing 385 (2020), 310–318.

33. Sun, Hao, Wang, Li, Reservoir dynamic interpretability for time series prediction: A permutation entropy view, Entropy 24(12) (2020), 1709.

34. Ozturk M.C., Xu, Principe J.C., Analysis and design of echo state networks, Neural Computation 19(1), 111–138.

35. Jaeger, Short term memory in echo state networks, 2001.

36. Verstraeten, Schrauwen, d'Haene, Stroobandt, An experimental unification of reservoir computing methods, Neural Networks 20(3), 391–403.

37. Chen, Liu, Li, Echo state network with probabilistic regularization for time series prediction, IEEE/CAA Journal of Automatica Sinica 10(8) (2023), 1743–1753.
