A novel intelligent strategy for probabilistic electricity price forecasting: Wavelet neural network based modified dolphin optimization algorithm

Abstract

To support the decisions of market participants, an accurate and reliable method for forecasting electricity market prices is indispensable. However, because market clearing prices (MCPs) are highly volatile, forecasting them accurately is difficult. Probabilistic forecasting is a recent way to address the low accuracy of point forecasts. Moving from traditional point forecasts to probabilistic interval forecasts is important for modeling forecast uncertainty, so that the decisions of market participants are effectively supported against uncertainty and risk. This paper proposes a hybrid approach for constructing prediction intervals (PIs) of MCPs, in which the modified dolphin echolocation optimization algorithm (MDEOA) is applied to estimate point forecasts, model uncertainty, and noise variance. The proposed probabilistic electricity price forecasting method is evaluated within a general and comprehensive framework. Real price data from the Ontario, New England, and Australian electricity markets are used to test the hybrid method and validate its effectiveness.

Keywords

Probabilistic forecasting, wavelet neural network, modified dolphin echolocation optimization algorithm, wavelet preprocessing, prediction intervals

1 Introduction

Throughout the world, traditional regulated and monopolistic electricity markets are being transformed into deregulated and competitive markets. In these deregulated markets, market participants need accurate electricity price forecasts for decisions such as bidding strategies and investments. However, because power demand varies over time and the share of renewable energy is increasing, the electricity price is highly variable, which makes accurate price forecasting very difficult. Given this complexity, probabilistic electricity price forecasting is of great importance. Probabilistic forecasts can account for the inherent uncertainty in price series and therefore help to manage forecasting risk. Hence, an accurate and reliable probabilistic electricity price forecast is helpful for simplifying decision making.

Most research has focused on point forecasting of electricity prices. Some earlier works are based on time-series models such as autoregressive integrated moving average (ARIMA) [1] and generalized autoregressive conditional heteroscedasticity (GARCH) [2]. In [3], the wavelet transform is used to improve the performance of the ARIMA model. Although time-series models have reported interesting results in electricity price forecasting, they are linear and therefore cannot accurately predict the highly non-linear behavior of electricity price signals. To overcome the restrictions of time-series models, neural networks (NNs) have been used for electricity price forecasting [4–6]. A combination of the NN and ARIMA models is presented in [7].

Interest in probabilistic forecasting methods has increased because of the inherent restrictions of point forecasting methods. For example, PIs are used to account for the uncertainty associated with point forecasts of electricity prices. Using PIs, market participants can prepare for the best and worst situations. PIs are obtained from the price forecast and its errors. In [8], ARIMA models are used to predict the price and the errors. However, ARIMA is a linear model that cannot follow the behavior of electricity price signals. A combination of neural networks and the Kalman filter is used in [9] to forecast the MCP and the associated confidence intervals (CIs). In [10], the support vector machine (SVM) is used for point forecasting, and prediction intervals are computed using a variance equation and maximum likelihood estimation. In [11], an extreme learning machine (ELM) is used to predict the electricity price and the prediction uncertainty, combined with the maximum likelihood method to predict the noise variance. In [12], a recursive dynamic factor analysis algorithm is used to track the rank of the principal components, and forecasting is done recursively by a Kalman filter, which yields the covariance and hence the prediction intervals of electricity prices. A model based on the variational heteroscedastic Gaussian process (VHGP) and an active learning technique is presented in [13]; in that work, the VHGP is used to determine heteroscedastic Gaussian prediction intervals. As can be seen, neural networks are widely used in electricity price forecasting [4, 15] because of their excellent approximation capabilities [14].

In this paper, a hybrid approach based on the modified dolphin echolocation optimization algorithm (MDEOA), wavelet preprocessing, a wavelet neural network (WNN), and a feature selection technique is proposed. The evolutionary algorithm is used to train the WNN, with the least squared error (LSE) as the objective function. A two-stage feature selection technique is used to select the most effective inputs for the forecasting process. The bootstrapping technique is used to compute the model uncertainty. Finally, the prediction intervals are obtained from the noise variance and the model uncertainty.

The result is a hybrid, accurate, and reliable approach for hourly probabilistic electricity price forecasting. To evaluate its performance, the method is tested on data from the Ontario, New England, and Australian markets. The numerical results show that the proposed method performs effectively and efficiently, so it is usable in practical applications as a tool for electricity price forecasting.

2 Methodology

2.1 Wavelet preprocessing

The wavelet transform is an important tool for analyzing the frequency components of signals, and it overcomes limitations of the Fourier and short-time Fourier transforms. It is well suited to extracting relevant information from time-frequency, non-periodic, and transient signals. The transform splits the data into different frequency components, and each component is studied with a resolution matched to its scale [16]. In this paper, the wavelet transform is used to decompose the price series into a set of constitutive series that are better behaved. Each constitutive series is forecast separately, and the final price forecast is obtained by the inverse wavelet transform. For the price time series, the decomposition coefficients of the wavelet transform are computed by the method proposed in [3].

$p_{mn}^{L} = 2^{- \frac{m}{2}} \sum_{t = 0}^{T - 1} p_{t} L \left( \frac{t - n \cdot 2^{m}}{2^{m}} \right) = 2^{- \frac{m}{2}} \sum_{t = 0}^{T - 1} p_{t} L_{mn} (t)$ (1)

where L (.) is the selected wavelet function, p_t is the price at hour t, T is the length of the price series, and $p_{mn}^{L}$ is the decomposition coefficient associated with position n and resolution level m. The computation can be accelerated by expressing this equation as a convolution and using an efficient fast Fourier transform [17].

An effective way to apply the wavelet transform is to use multi-resolution techniques based on complementary father and mother wavelet functions. The father wavelet extracts the low-frequency components of the series and the mother wavelet extracts the high-frequency components. Orthogonal wavelet functions are preferred because of their convenient mathematical properties. The approximation series A_m and the detail series D_m (m = 1, 2, …, M) are defined as follows: $A_{m} = \sum_{n} p_{mn}^{φ} φ_{mn} (t), m = 1, 2, \dots, M$ (2)

and $D_{m} = \sum_{n} p_{mn}^{ψ} ψ_{mn} (t), m = 1, 2, \dots, M$ (3)

where $p_{mn}^{φ}$ and $p_{mn}^{ψ}$ are the coefficients obtained from Equation (1), and φ_mn and ψ_mn are the father and mother wavelet functions, which are the solutions of the following equations, respectively: $φ (t) = \sum_{k = - \infty}^{\infty} α_{k} \sqrt{2} φ (2 t - k)$ (4)

and $ψ (t) = \sum_{k = - \infty}^{\infty} (- 1)^{k} α_{- k + 1} \sqrt{2} φ (2 t - k)$ (5)

More details on the above equations are given in [20]. The price series in the time domain p_t (t = 1, 2, …, T) can be reconstructed as follows: $p_{t} = D_{1} + A_{1} + \dots + D_{M} + A_{M}$ (6)

The Daubechies wavelet is the most appropriate choice for non-stationary series [18] and is used in this work.
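The decomposition into approximation and detail series can be illustrated with a minimal sketch. To stay free of external libraries it uses the Haar wavelet rather than the Daubechies wavelet used in this work, and it checks the standard multi-resolution identity p_t = A_M + D_M + … + D_1; both choices, the function names, and the price values are assumptions made for illustration only.

```python
import math

S = 1.0 / math.sqrt(2.0)  # normalisation of the orthonormal Haar filters


def haar_step(signal):
    """One decomposition level: (approximation, detail) coefficients."""
    approx = [S * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [S * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one decomposition level."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([S * (a + d), S * (a - d)])
    return out


def to_time_domain(approx, detail, steps):
    """Invert `steps` levels; all details after the first level are zero."""
    out = haar_inverse_step(approx, detail)
    for _ in range(steps - 1):
        out = haar_inverse_step(out, [0.0] * len(out))
    return out


def multiresolution(prices, levels):
    """Return (A_M, [D_1, ..., D_M]) as series with the length of `prices`.

    The length of `prices` must be divisible by 2**levels.
    """
    approx = list(prices)
    details = []
    for m in range(1, levels + 1):
        approx, d = haar_step(approx)
        # D_m: detail coefficients of level m mapped back to the time domain
        details.append(to_time_domain([0.0] * len(d), d, m))
    # A_M: approximation coefficients of the deepest level in the time domain
    a_last = to_time_domain(approx, [0.0] * len(approx), levels)
    return a_last, details


prices = [30.0, 32.0, 35.0, 40.0, 38.0, 36.0, 31.0, 29.0]  # hypothetical hourly prices
a2, (d1, d2) = multiresolution(prices, levels=2)
rebuilt = [a + x + y for a, x, y in zip(a2, d1, d2)]
```

In this sketch `rebuilt` matches `prices` up to rounding, which is the property that allows each subseries to be forecast separately and the forecasts to be recombined.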

2.2 Two stages feature selection

The electricity price series can be considered a non-linear mapping of explanatory variables such as historical prices. Assuming that historical prices and any other explanatory variables are available, a set of inputs (candidate features), denoted CF(t), is formed.

To determine the length of CF(t), a compromise must be made between the number of candidate features and the length of the input data. If these values are large, more information is available to the forecasting engine but the computational burden is high, and vice versa. Therefore, a subset of CF(t) is selected by a feature selection technique: the subset keeps the most influential features and the rest are filtered out. This process is shown in Fig. 1.

Single-stage and two-stage feature selection techniques were presented in [19, 20]. These techniques are based on the information-theoretic criterion of mutual information (MI), and the approach has also been applied to wind power forecasting in [21]. The two-stage technique is used here. A brief description follows; details are available in [20].

I. First stage (irrelevancy filter): In this stage, the MI between each candidate feature x (t) (a member of CF(t)) and the target variable is calculated by the following equation:

$MI (x, y) = \sum_{i = 1}^{n} \sum_{j = 1}^{m} P (x_{i}, y_{j}) \log_{2} \left( \frac{P (x_{i}, y_{j})}{P (x_{i}) P (y_{j})} \right)$ (7)

where P (x_i) and P (y_j) are the marginal probabilities and P (x_i, y_j) is the joint probability. The equation is written for two generic variables, x with n distinct values and y with m distinct values. A larger value indicates that the candidate feature shares more information with the target variable, so that x (t) is a highly relevant feature. The candidate features are ranked by their MI obtained from the above equation, and a feature with larger MI is ranked higher. Features with MI larger than a threshold T₁ are selected as relevant features for forecasting and form the subset CF₁(t). The other features are regarded as irrelevant and are placed in the subset CF₁’(t) (Fig. 1).

II. Second stage (redundancy filter): In this stage, the redundancy among the features in CF₁(t) is found and filtered. For this purpose, the MI between every pair x₁ (t) and x₂ (t) (x₁, x₂ ∈ CF₁) is calculated and denoted MI [x₁, x₂]. A larger value represents more common information and hence more redundancy between the two features. The redundancy criterion of each x_k (t) in CF₁(t) with respect to the other members is calculated as follows: $RC [x_{k} (t)] = \max_{x_{l} (t) \in {CF}_{1} (t) \setminus \{ x_{k} (t) \}} (MI [x_{k} (t), x_{l} (t)])$ (8)

A feature with a larger RC is more redundant, i.e. it contributes less new information. The features can be ranked according to redundancy, and an RC larger than a threshold T₂ indicates that the feature is too redundant with another feature and one of the two should be removed. To do this, if the RC of x_k (t) is larger than T₂, the feature x_j (t) that has the largest MI with x_k (t) (MI [x_k (t) , x_j (t)] > MI [x_k (t) , x_l (t)] for all x_l (t) ∈ CF₁(t) − {x_k (t) , x_j (t)}) is identified.

To choose which of x_k (t) and x_j (t) is deleted, their MIs with the target variable are compared and the feature with the smaller MI (the less relevant one) is deleted. This operation is repeated on the members of CF₁(t) until no feature has an RC larger than T₂. The remaining subset, denoted CF₂(t) (Fig. 1), is a relevant and non-redundant set. The thresholds T₁ and T₂ are set as a compromise between the quality and the number of input features.
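The two filters can be sketched as follows. This is a minimal illustration, not the implementation of [20]: it assumes that every series has already been discretized (Equation (7) needs discrete values), it removes the most redundant pair first, and the feature names, data, and thresholds are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """MI in bits between two equally long sequences of discrete values (Eq. 7)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())


def two_stage_selection(features, target, t1, t2):
    """features: {name: discretized series}. Returns the names kept in CF_2."""
    relevance = {name: mutual_information(xs, target) for name, xs in features.items()}
    # Stage 1 (irrelevancy filter): keep features whose MI with the target exceeds T1.
    selected = [name for name in features if relevance[name] > t1]
    # Stage 2 (redundancy filter): repeat until no redundancy criterion exceeds T2.
    while len(selected) > 1:
        worst = None  # (RC, x_k, x_j) of the most redundant pair found so far
        for k in selected:
            others = [l for l in selected if l != k]
            j = max(others, key=lambda l: mutual_information(features[k], features[l]))
            rc = mutual_information(features[k], features[j])  # Eq. (8)
            if rc > t2 and (worst is None or rc > worst[0]):
                worst = (rc, k, j)
        if worst is None:
            break
        _, k, j = worst
        # Of the redundant pair, delete the feature that is less relevant to the target.
        selected.remove(k if relevance[k] < relevance[j] else j)
    return selected


target = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "lag1": [0, 0, 1, 1, 0, 0, 1, 1],       # fully informative
    "lag1_copy": [1, 1, 0, 0, 1, 1, 0, 0],  # same information as lag1
    "lag24": [0, 0, 1, 1, 0, 0, 1, 0],      # partly informative
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],      # independent of the target
}
kept = two_stage_selection(features, target, t1=0.5, t2=0.6)
```

Here the irrelevancy filter drops `noise`, and the redundancy filter drops one of the two features that carry identical information.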

2.3 Developed wavelet neural network (WNN)

In recent works [22–24], the wavelet transform is used as a preprocessing step in electricity price forecasting. Another approach is to use wavelets inside the forecasting engine by constructing a wavelet neural network, in which a wavelet function is the activation function of the hidden neurons. Because of the local properties of wavelets and the ability to adapt the wavelet shape to the training data, instead of adapting the parameters of an activation function with a fixed shape [25], WNNs generalize better than classical feedforward neural networks.

In this work, a WNN with the Morlet wavelet as the activation function of the hidden neurons is used to forecast the electricity price. Figure 2 shows the structure of a three-layer feedforward WNN. In the figure, X = [x₁, x₂, …, x_m] is the input vector and y is the target variable.

The forecasting engine produces the input-output mapping from X to y. In the hidden layer, the activation function of each neuron is determined as follows: $f_{i} (x_{1}, x_{2}, \dots, x_{m}) = \prod_{j = 1}^{m} ψ_{a_{i}, b_{i}} (x_{j}), \forall i = 1, 2, \dots, n$ (9) $ψ_{a_{i}, b_{i}} (x_{j}) = ψ (\frac{x_{j} - b_{i}}{a_{i}})$ (10)

where n is the number of neurons in the hidden layer. ψ (.) is defined as follows: $ψ (x) = e^{- 0.5 x^{2}} \cos (5 x)$ (11)

In Equations (9) and (10), ψ_{a_i,b_i} (.) is the scaled and shifted version of ψ (.), with a_i and b_i as the scale and shift parameters, respectively. Finally, the output is calculated as follows: $y = \sum_{i = 1}^{n} w_{i} f_{i} (x_{1}, x_{2}, \dots, x_{m}) + \sum_{j = 1}^{m} v_{j} x_{j}$ (12)

In the above equation, w_i is the weight between hidden neuron i and the output, and v_j is the weight between input j and the output. In other words, the output of the WNN is a combination of the inputs and the outputs of the wavelet functions. The vector of free parameters of the WNN is therefore as follows:

$Z = [v_{1}, \dots, v_{m}, w_{1}, \dots, w_{n}, a_{1}, \dots, a_{n}, b_{1}, \dots, b_{n}]$ (13)

As a result, WNN has np = 3n + m free parameters.
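The forward pass of Equations (9) to (12) can be written compactly. The following is a minimal sketch with hypothetical parameter values; training of the parameter vector Z is not shown.

```python
import math


def morlet(x):
    """Morlet mother wavelet of Eq. (11)."""
    return math.exp(-0.5 * x * x) * math.cos(5.0 * x)


def wnn_forward(x, v, w, a, b):
    """Output of the three-layer WNN, Eq. (12).

    x: m inputs, v: m direct input-output weights,
    w, a, b: n hidden-output weights, scales and shifts.
    """
    y = sum(vj * xj for vj, xj in zip(v, x))  # linear part: sum_j v_j x_j
    for wi, ai, bi in zip(w, a, b):
        fi = 1.0
        for xj in x:  # Eq. (9): product over inputs of the scaled, shifted wavelet
            fi *= morlet((xj - bi) / ai)
        y += wi * fi
    return y


# Hypothetical network with m = 2 inputs and n = 3 hidden neurons.
v = [0.4, -0.1]
w = [1.0, 0.5, -0.3]
a = [1.0, 2.0, 0.5]
b = [0.0, 0.2, -0.1]
n_free = len(v) + len(w) + len(a) + len(b)  # 3n + m = 11 free parameters
y = wnn_forward([0.3, 0.7], v, w, a, b)
```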

2.4 Modified dolphin echolocation optimization algorithm (MDEOA)

The dolphin echolocation algorithm is a recent optimization algorithm with a simple concept, easy implementation, and fast convergence. Among marine mammals, echolocation has been studied best in dolphins. A dolphin produces sounds in the form of high-frequency clicks. When a click strikes a target, part of its energy is reflected back to the dolphin. As soon as the echo is received, the dolphin produces another click. From the time delay between a click and its echo, the dolphin can assess its distance from the target. By continuously sending clicks and receiving their echoes, the dolphin can track objects [26].

The algorithm simulates dolphin echolocation by limiting the search in proportion to the distance from the target. To describe this process more clearly, two phases are distinguished. In the first phase, the algorithm explores the whole search space to perform a global search, so it looks for unexplored regions; this is done by generating a set of random locations in the search space. In the second phase, the algorithm concentrates on the neighborhood of the best locations found in the first phase. These two phases are inherent characteristics of all evolutionary algorithms [27].

Before the optimization begins, the search space must be sorted according to the following rule:

For each variable to be optimized, the alternatives in the search space have to be sorted in ascending or descending order. If an alternative has more than one characteristic, the sorting is done according to the most important one [31–33]. In this way, for variable i a vector A_i of length N_i is created that contains all possible alternatives of variable i.

Placing these vectors as the columns of a matrix gives the alternatives matrix (AM) of dimensions M × N_v, where M is the maximum length of the vectors A_i and N_v is the number of variables. In addition, the curve along which the convergence factor (CF) changes during the optimization process must be specified. Here, the convergence factor is determined as follows: $CF (I_{i}) = {PP}_{1} + (1 - {PP}_{1}) \frac{I_{i}^{p} - 1}{I_{m}^{p} - 1}$ (14)

where PP is the predefined probability and PP₁ is the convergence factor of the first iteration, in which the answers are chosen randomly. I_i, I_m, and p are the current iteration number, the maximum number of iterations, and the degree of the curve, respectively.
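The shape of the convergence curve is easy to inspect numerically. The sketch below reads Equation (14) as PP₁ plus (1 − PP₁) times the fraction (I_i^p − 1)/(I_m^p − 1), so that the factor starts at PP₁ and reaches 1 in the last iteration; this reading and the parameter values are assumptions.

```python
def convergence_factor(iteration, max_iteration, pp1, power):
    """Convergence factor of Eq. (14) for iterations 1..max_iteration."""
    fraction = (iteration ** power - 1) / (max_iteration ** power - 1)
    return pp1 + (1.0 - pp1) * fraction


# Hypothetical settings: 10 iterations, PP1 = 0.1, linear (p = 1) and quadratic (p = 2) curves.
linear = [convergence_factor(i, 10, 0.1, 1) for i in range(1, 11)]
quadratic = [convergence_factor(i, 10, 0.1, 2) for i in range(1, 11)]
```

With p = 1 the factor grows linearly; with p > 1 it stays low for longer, which keeps the search exploratory in the early iterations.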

The main steps of the dolphin echolocation optimization algorithm are as follows.

Step 1: Generate N_l random locations for the dolphin, i.e. create the location matrix L with dimensions N_l × N_v, where N_l and N_v are the number of locations and the number of variables, respectively.

Step 2: Calculate CF for current iteration by Equation (14).

Step 3: Calculate the fitness function for each of locations.

Step 4: Calculate the accumulative fitness according to the dolphin rules as follows:

A: Find the position of L (i, j) in column j of the alternatives matrix and name it A.

for i = 1 to number of locations

for j = 1 to number of variables

for -R_e < k < R_e $af_{(A + k) j} = \frac{1}{R_{e}} (R_{e} - | k |) F (i) + af_{(A + k) j}$ (15)

where af_(A+k)j is the accumulative fitness of alternative A + k for variable j, R_e is the effective radius within which the accumulative fitness of the neighbors of A is affected by its fitness, and F (i) is the fitness of location i. If an alternative lies near an edge (A + k is invalid, i.e. A + k < 0 or A + k > L_j), af is calculated using a reflective characteristic: if the distance between the alternative and the edge is less than R_e, it is assumed that a mirror is placed on the edge, so that the same alternatives exist on the other side of it.

B: To obtain a uniform distribution of possibilities over the search space, a small value ε is added to all entries: af = af + ε.

C: Find the best location of the current iteration and call it “the best location”. Find the alternatives assigned to the variables of the best location and set their af to zero. In other words:

for j = 1 to number of variables

for i = 1 to number of alternatives

if i = the best location(j) ${af}_{ij} = 0$ (16)

Step 5: For variable j, the probability of selecting alternative i is calculated by the following equation: $P_{ij} = \frac{{af}_{ij}}{\sum_{i = 1}^{L_{j}} {af}_{ij}}$ (17)

Step 6: Set the probability of all alternatives chosen for the variables of the best location equal to PP, and distribute the rest of the probability among the other alternatives as follows:

for j = 1 to number of variables

for i = 1 to number of alternatives

if i = the best location(j) $P_{ij} = PP$ (18)

else $P_{ij} = (1 - PP) P_{ij}$ (19)

Step 7: Select the locations of the next iteration according to the probabilities assigned to the alternatives. Repeat Steps 2 to 6 until the maximum number of iterations is reached.
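Steps 4 to 6 can be sketched for a single variable. The sketch makes several assumptions that are not stated in the text: alternatives are indexed from 0, a larger fitness is better (an error measure such as LSE would first have to be converted), reflection at the edges is implemented by mirroring the index, and R_e does not exceed the number of alternatives. All names are hypothetical.

```python
def accumulate_fitness(positions, fitness, n_alternatives, r_e, eps=1e-6):
    """Steps 4A-4B for one variable.

    positions[i] is the index of the alternative chosen by location i,
    fitness[i] is the fitness F(i) of that location (larger is better).
    """
    af = [0.0] * n_alternatives
    for pos, fit in zip(positions, fitness):
        for k in range(-r_e + 1, r_e):  # -R_e < k < R_e
            idx = pos + k
            if idx < 0:  # mirror at the lower edge
                idx = -idx
            elif idx > n_alternatives - 1:  # mirror at the upper edge
                idx = 2 * (n_alternatives - 1) - idx
            af[idx] += (r_e - abs(k)) / r_e * fit  # Eq. (15)
    return [value + eps for value in af]  # Step 4B


def selection_probabilities(af, best_pos, pp):
    """Steps 4C, 5 and 6 for one variable; pp is the current convergence factor."""
    af = list(af)
    af[best_pos] = 0.0  # Eq. (16)
    total = sum(af)
    probs = [(1.0 - pp) * value / total for value in af]  # Eqs. (17) and (19)
    probs[best_pos] = pp  # Eq. (18)
    return probs


af = accumulate_fitness(positions=[2, 4], fitness=[1.0, 0.5], n_alternatives=6, r_e=2)
probs = selection_probabilities(af, best_pos=2, pp=0.4)
```

The next location of each dolphin would then be drawn from `probs`, for example with `random.choices(range(6), weights=probs)`.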

2.4.1 Modifying

To make the algorithm more powerful, a modification is described in this section. Denote the best location by l_best. The speed at which each location l_i moves toward the best location is determined by the distance between l_i and l_best. In each iteration, this distance is calculated as follows: $D_{i, best} = ∥ l_{best} - l_{i} ∥ = \sqrt{\sum_{k = 1}^{N} (l_{best, k} - l_{i, k})^{2}}$ (20)

where N is the length of the control vector. The absorption parameter with respect to the best location is then defined as follows: $β = β_{0} \exp (- D_{i, best})$ (21)

where β₀ is the absorption parameter at zero distance. According to the above equation, the absorption parameter decreases as the distance increases. The location l_i is then updated by the following equations: $l_{i}^{new} = β l_{i} + (1 - β) l_{best} + u_{k}$ (22) $u_{k} = γ (rand (.) - 0.5)$ (23)

In Equation (22), the first two terms balance l_i and l_best, and the third term is a random movement in the search space. γ is the absorption coefficient, which is defined to control the click rate.
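The update of Equations (20) to (23) can be sketched as below. The sketch assumes that the random term u_k is drawn independently for each component of the location vector and that rand(.) is uniform on [0, 1); the function name and the numerical values are hypothetical.

```python
import math
import random


def modified_update(l_i, l_best, beta0, gamma, rng=random):
    """Move location l_i toward l_best following Eqs. (20)-(23)."""
    distance = math.sqrt(sum((b - x) ** 2 for b, x in zip(l_best, l_i)))  # Eq. (20)
    beta = beta0 * math.exp(-distance)  # Eq. (21)
    return [beta * x + (1.0 - beta) * b + gamma * (rng.random() - 0.5)  # Eqs. (22), (23)
            for x, b in zip(l_i, l_best)]


rng = random.Random(0)  # seeded generator so that the run is repeatable
new_location = modified_update([0.0, 0.0], [3.0, 4.0], beta0=1.0, gamma=0.1, rng=rng)
```

With γ = 0 the update is deterministic, which makes the effect of β visible: a location far from l_best has β close to zero and is placed almost on l_best, whereas a nearby location keeps a larger share of its own position.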

3 Probabilistic forecasting

3.1 PI formulation

Uncertainty of the forecasting model is the main source of uncertainty in price forecasting. It arises from misspecification of the structure and parameters of the neural networks. Noise in the training data is another source of forecast uncertainty, caused by the random behavior of the data around the regression. For a set of distinct pairs {(x_i, t_i)} , i = 1, 2, …, N, the forecasting target is expressed as follows: $t_{i} = r (x_{i}) + e (x_{i})$ (24)

where t_i is the i-th forecast target and x_i is the input vector. e (.) is zero-mean noise and r (.) is the true regression mean. The noise is assumed to be normally distributed with a variance ${\hat{σ}}_{e}^{2}$ that depends on the input variables. According to [10, 31], these assumptions are reasonable. The trained neural network $\hat{r} (.)$ can be considered an estimator of the true regression r (.). The forecasting error can then be expressed as follows: $t_{i} - \hat{r} (x_{i}) = [r (x_{i}) - \hat{r} (x_{i})] + e (x_{i})$ (25)

In the above equation, $t_{i} - \hat{r}$ and $r - \hat{r}$ represent the forecasting error and the estimation error of the neural network with respect to the true regression, respectively. Assuming that the noise and the estimation error are statistically independent, the total variance of the forecasting error ${\hat{σ}}_{t}^{2}$ is equal to the sum of the model uncertainty variance ${\hat{σ}}_{r}^{2}$ and the noise variance ${\hat{σ}}_{e}^{2}$: ${\hat{σ}}_{t}^{2} (x_{i}) = {\hat{σ}}_{r}^{2} (x_{i}) + {\hat{σ}}_{e}^{2} (x_{i})$ (26)

For a set of data {(x_i, t_i)}, the PIs with a (1-α) ×100% confidence level are denoted [D_t,α (x_i) , H_t,α (x_i)] and are expressed as follows: $\begin{cases} D_{t, α} (x_{i}) = \hat{r} (x_{i}) - z_{1 - α / 2} \sqrt{{\hat{σ}}_{t}^{2} (x_{i})} \\ H_{t, α} (x_{i}) = \hat{r} (x_{i}) + z_{1 - α / 2} \sqrt{{\hat{σ}}_{t}^{2} (x_{i})} \end{cases}$ (27)

where D_t,α (.) and H_t,α (.) are the lower and upper bounds of the PI, respectively, and z_1-α/2 is the critical value of the standard normal distribution, which depends on the confidence level.
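As a worked illustration of Equations (26) and (27), the sketch below builds one interval from a point forecast and the two variance components. The numerical values are hypothetical; the critical value is taken from the standard normal distribution in the Python standard library.

```python
import math
from statistics import NormalDist


def prediction_interval(r_hat, var_model, var_noise, alpha):
    """Lower and upper PI bounds for one sample, Eqs. (26) and (27)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value z_{1 - alpha/2}
    half_width = z * math.sqrt(var_model + var_noise)  # total variance, Eq. (26)
    return r_hat - half_width, r_hat + half_width


# Hypothetical forecast of 50 with model variance 9 and noise variance 16 (total std 5).
lower, upper = prediction_interval(50.0, 9.0, 16.0, alpha=0.05)
```

For α = 0.05 the critical value is about 1.96, so the interval in this example is roughly 50 ± 9.8.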

3.2 Bootstrapping method

The bootstrapping technique is a powerful tool for statistical inference, which was introduced in 1979 [29, 30]. Bootstrapping is based on using a sample of a population to make inferences about that population. Simplicity and robustness are its advantages over methods based on parametric assumptions. Other methods of computing the standard error are unreliable and complex [18]. The bootstrapping technique is used to incorporate the uncertainty of the forecasting model into the PIs; the procedure is as follows:

Step 1: Obtain $\hat{r} (.)$ for the given training data {(x_i, t_i)}.

Step 2: Calculate the errors between the forecast outputs and the original targets as ${\hat{e}}_{i} = t_{i} - \hat{r} (x_{i})$ .

Step 3: Center the resulting errors at zero as ${\tilde{e}}_{i} = {\hat{e}}_{i} - (\sum_{i} {\hat{e}}_{i}) / N$ .

Step 4: Replace the previous errors with these new errors, calculate new targets using ${\tilde{t}}_{i} = \hat{r} (x_{i}) + {\tilde{e}}_{i}$ , and define the new training data as {(x_i, t̃_i)}.

Step 5: Forecast ${\hat{r}}_{q} (x_{i})$ for the new training data in bootstrapping iteration q.

Step 6: Repeat Steps 2 to 5 until B bootstrapping iterations are completed.

At the end of these steps, the training data have been generated B times and a neural network has been trained each time. The mean of the outputs of the trained neural networks is expressed as follows: $\hat{r} (x_{i}) = (\sum_{q} {\hat{r}}_{q} (x_{i})) / B$ (28)

In this equation, ${\hat{r}}_{q} (x_{i})$ is the value forecast in bootstrapping iteration q. Finally, the variance of the model uncertainty is estimated as follows: ${\hat{σ}}_{r}^{2} (x_{i}) = (\sum_{q} ({\hat{r}}_{q} (x_{i}) - \hat{r} (x_{i}))^{2}) / (B - 1)$ (29)
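The residual centering of Step 3 and the ensemble statistics of Equations (28) and (29) can be sketched as follows. The forecasts of the B networks are given as plain lists here; training the networks is outside the scope of the sketch, and all values are hypothetical.

```python
def center_residuals(targets, fitted):
    """Steps 2 and 3: residuals shifted so that their mean is zero."""
    errors = [t - r for t, r in zip(targets, fitted)]
    mean_error = sum(errors) / len(errors)
    return [e - mean_error for e in errors]


def bootstrap_mean_and_variance(forecasts):
    """Eqs. (28) and (29); forecasts[q][i] is the forecast of network q for sample i."""
    b = len(forecasts)
    n = len(forecasts[0])
    mean = [sum(f[i] for f in forecasts) / b for i in range(n)]
    variance = [sum((f[i] - mean[i]) ** 2 for f in forecasts) / (b - 1) for i in range(n)]
    return mean, variance


# Hypothetical forecasts of B = 3 bootstrap networks for two hours.
mean, variance = bootstrap_mean_and_variance([[10.0, 20.0], [12.0, 20.0], [14.0, 20.0]])
centered = center_residuals([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

In this example the three networks disagree on the first hour (model variance 4) and agree on the second (model variance 0).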

3.3 Noise variance

To model the uncertainty of the data noise, the noise variance must be computed. According to Equation (25), the noise variance is estimated as follows: ${\hat{σ}}_{e}^{2} = E [(t - \hat{r})^{2}] - {\hat{σ}}_{r}^{2}$ (30)

To fit the remaining residuals with a model, the set of squared residuals is defined as follows: $R^{2} (x_{i}) = \max ([t_{i} - \hat{r} (x_{i})]^{2} - {\hat{σ}}_{r}^{2} (x_{i}), 0)$ (31)

A new training set with non-negative outputs is then defined as {(x_i, R² (x_i))}, i = 1, 2, …, N. To model the uncertainty of the data noise, a neural network is trained on this new data, and its output is added to the model uncertainty variance of the main forecast. Therefore, B + 1 forecasts are made in total.
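The targets of the noise-variance network in Equation (31) are simple to compute once the bootstrap mean and model variance are available. A minimal sketch with hypothetical values:

```python
def noise_variance_targets(targets, r_hat, var_model):
    """Non-negative squared residuals R^2(x_i) of Eq. (31), one per sample."""
    return [max((t - r) ** 2 - v, 0.0) for t, r, v in zip(targets, r_hat, var_model)]


# First hour: squared error 9 exceeds the model variance 4, leaving 5 for the noise.
# Second hour: squared error 0.25 is below the model variance 1, so the target is clipped to 0.
r2 = noise_variance_targets([10.0, 10.0], [7.0, 9.5], [4.0, 1.0])
```

The clipping at zero keeps every training target a valid variance even when the model uncertainty alone already explains the observed error.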

3.4 PI evaluation

In this section, the reliability evaluation (RE) and sharpness evaluation (SE) criteria of PIs are introduced to evaluate the proposed method. Reliability is a key property for the validity of PIs. The targets are expected to fall within the PIs with a nominal probability of (1–α)×100%, which is called the prediction interval nominal confidence (PINC). For N_t available data samples, the prediction interval coverage probability (PICP) is expressed as follows: $PICP = (\sum_{i = 1}^{N_{t}} I_{i, α}) / N_{t}$ (32)

where I_i,α is the PICP indicator, defined as follows: $I_{i, α} = \begin{cases} 1, & t_{i} \in [D_{t, α} (x_{i}), H_{t, α} (x_{i})] \\ 0, & t_{i} \notin [D_{t, α} (x_{i}), H_{t, α} (x_{i})] \end{cases}$ (33)

PICP indicates the reliability of the PIs; when reliability is high, this value should asymptotically approach PINC. Therefore, the average coverage error (ACE) is defined as the difference between PICP and PINC. This index reflects the quality of the PIs: for high-quality PIs, ACE should be as close to zero as possible. $ACE = PICP - PINC$ (34)

High reliability can easily be achieved by widening the intervals, but very wide intervals are of little use in practice. Therefore, the sharpness evaluation criterion, which measures the average width of the prediction intervals, is also used. This measure reflects the ability of the model to concentrate the uncertainty of the probabilistic forecast. The criterion is calculated as follows: $SE = (\sum_{i = 1}^{N_{t}} [H_{t, α} (x_{i}) - D_{t, α} (x_{i})]) / N_{t}$ (35)

A smaller SE reflects narrower prediction intervals and better performance.
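The three criteria of Equations (32) to (35) can be computed together. The following minimal sketch uses hypothetical targets and interval bounds.

```python
def evaluate_intervals(targets, lower, upper, pinc):
    """Return (PICP, ACE, SE) for a set of prediction intervals."""
    n = len(targets)
    hits = sum(1 for t, lo, hi in zip(targets, lower, upper) if lo <= t <= hi)  # Eq. (33)
    picp = hits / n  # Eq. (32)
    ace = picp - pinc  # Eq. (34)
    se = sum(hi - lo for lo, hi in zip(lower, upper)) / n  # Eq. (35)
    return picp, ace, se


# Hypothetical example: three of four targets are covered at a nominal level of 90%.
picp, ace, se = evaluate_intervals(
    targets=[10.0, 20.0, 30.0, 40.0],
    lower=[8.0, 21.0, 25.0, 35.0],
    upper=[12.0, 25.0, 35.0, 45.0],
    pinc=0.90,
)
```

Here PICP is 0.75, so ACE is negative (the intervals cover less than promised), and the average width SE is 7.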

4 Model implementation

In this section, the implementation of the model and the related considerations are presented. For clarity, a flowchart of the proposed model is given in Fig. 3. The various parts of the flowchart are explained below.

First, the length of the training data is determined. The aim is to forecast one week ahead, and the data of the two previous months are used for training. Historical price is the only data used as input. This choice was made to achieve the desired accuracy and to allow the performance of the proposed model to be compared with previous works. Because a large number of training inputs enlarges the neural network and increases the computational burden, feature selection is used to reduce the number of input parameters. The historical data are passed to the feature selection, where the irrelevancy and redundancy filters are applied. Hence, the hours with the greatest effect on the next hour are defined as inputs. For example, the most effective data for Nov 2014 in the Ontario and Australian electricity markets are shown in Table 1.

As mentioned in the previous sections, wavelet preprocessing is used to improve the forecasting performance. In this paper, the Daubechies (db4) wavelet transform is used to turn the two-month training data set into four subseries (Fig. 4): two approximation subseries (A₁, A₂) and two detail subseries (D₁, D₂). These subseries are used for training and price forecasting. To capture the model uncertainty by bootstrapping, B neural networks are trained on each of these subseries using the modified dolphin echolocation optimization algorithm. The outputs of the model uncertainty neural networks are transformed back into the time domain by the inverse wavelet transform. Based on the original data and the outputs of the model uncertainty part, one neural network is trained to model the uncertainty of the data noise. As a result, 4B + 1 neural networks are trained for each hour. The outputs of the model uncertainty part and the data noise uncertainty part are used to calculate the PIs. The process of these 4B + 1 neural networks is shown in Fig. 5.

Then the forecasting hour is checked, and the process stops if the time period is over (i.e. the forecast for hour 168 has been made). If the time period is not over, the forecast of the current hour is appended to the input data and used to forecast the next hour. This process continues until the end of the forecast period.
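The recursive loop described above can be sketched as follows. The `predict` callable stands for the trained forecasting engine and `lags` for the hours kept by feature selection; both are placeholders introduced for illustration.

```python
def recursive_forecast(history, lags, horizon, predict):
    """Forecast `horizon` hours ahead, feeding each forecast back as an input.

    history: past hourly prices, lags: selected lags in hours (1 = previous hour),
    predict: callable mapping the list of lagged prices to the next price.
    """
    series = list(history)
    forecasts = []
    for _ in range(horizon):
        inputs = [series[-lag] for lag in lags]  # lagged prices selected as features
        next_price = predict(inputs)
        forecasts.append(next_price)
        series.append(next_price)  # the forecast becomes an input for later hours
    return forecasts


# Toy engine (assumption): average of the selected lags. A weekly forecast uses horizon=168.
history = [30.0, 31.0, 33.0, 32.0, 34.0, 35.0]
week = recursive_forecast(history, lags=[1, 2, 3], horizon=168,
                          predict=lambda xs: sum(xs) / len(xs))
```

Because each forecast is reused as an input, forecast errors can accumulate over the 168 hours, which is one reason why interval forecasts are informative for this horizon.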

5 Numerical results

The proposed model to forecast electricity price for the year 2014 from Ontario (IESO), Australian (ANEM), and New England electricity markets so as to have encyclopedic assessment of its performance is exerted. Historical price of markets is taken from [34 –36], respectively. Information of the Australian electricity market is related to the South Australia. In all markets, hourly prices are considered as inputs of model.

To compare and validate the proposed model, some other methods based on same data are considered. First method is persistence that is a simple solution to time series forecasting. In this method current price is obtained directly from price of a previous day. In second method to calculate the PI, bootstrapping is used and neural networks are trained by Back Propagation (BP) technique. Third method is presented in [18], which is based on bootstrapping and ELM. In this method the data noise is considered in calculation of PI.

For year 2014 in the Ontario, Australian, and New England electricity markets and for two confidence level (90% and 95%), forecasting is done by these four methods. The results are represented in Tables 2 to 6. These tables are related to May 2014 and Nov 2014 of Ontario market, May 2014 and Oct 2014 of Australian electricity market, and May 2014 of New England electricity market, respectively.

According to the information of these Tables, the proposed method has better performance to comparison with other methods in most tests. In the proposed hybrid method, value of PICP was very close to PINC in all tests, therefore ACE is too small. Absolute value of ACE for the proposed method in all of the tests isn’t more than 2% and also value of SE evaluation in most tests is less than other methods. These results are shown a very good performance method.

Persistence is a very simple method and doesn’t have good result, therefore isn’t suitable for this application. Although in some cases BP method has good value of ACE, but it doesn’t have good performance in all of the tests. Also training neural networks with BP method takes high computation burden, therefore this method can’t be a reasonable choice for probabilistic electricity prices forecasting. Low computation burden is advantage of the ELM method, but this method can implement only on single layer neural networks. The proposed method has higher accuracy compared with ELM method. To use wavelet preprocess makes a good performance method compared with the method that proposed in [18]. However, ELM method has acceptable results so computation burden has vital impact to designate these two methods. The proposed method has better performance and higher computation burden, whereas ELM method has weaker performance and lower computation burden. As regards, because of weekly horizon forecast, there is no time restriction and according to good performance of the proposed hybrid method, this method is more reasonable.

The graphically results of probabilistic forecasting using proposed method are shown in Figs. 6 to 10. This results are for two weeks of 2014, in the Ontario and the Australian electricity market, and one week of 2014, in New England electricity market. Confidence level of 95% for three cases and 90 % for two cases are considered. These results comprise actual value of prices and high and down bound of prediction intervals.

The numerical and graphical results above show that the proposed method can follow the price time series with good reliability. This reliability and flexibility indicate that the method is usable for probabilistic forecasting in practical electricity markets.

Historical prices are the only neural network inputs in this paper, whereas the electricity price also depends on other variables such as chronological events, weather conditions, and the bidding strategies of participants. Adding these variables as neural network inputs could improve the performance of the method.

6 Conclusion

A practical method for electricity price forecasting can help market participants in their decision-making processes. Price time series are highly nonlinear and depend on many parameters, so forecasts always contain unavoidable errors. A common way to address this in electricity price forecasting is to combine several efficient tools. In this paper, the modified dolphin echolocation optimization algorithm, wavelet preprocessing, a WNN, and a feature selection technique are combined to obtain a high-performance method. The hybrid method accounts for both model uncertainty and data noise uncertainty, and prediction intervals are obtained by considering the two together. Finally, data from the Ontario, New England, and Australian electricity markets are used to test the proposed method, which gives acceptable results. Its high performance in uncertainty modeling and its high training accuracy make the method a good choice for probabilistic electricity price forecasting.

References

1. Contreras, Espinola, Nogales F.J. and Conejo A.J., ARIMA models to predict next-day electricity prices, IEEE Transactions on Power Systems 18 (2003), 1014–1020.

2. Garcia R.C., Contreras, Van Akkeren and Garcia J.B.C., A GARCH forecasting model to predict day-ahead electricity prices, IEEE Transactions on Power Systems 20 (2005), 867–874.

3. Conejo A.J., Plazas M.A., Espinola and Molina A.B., Day-ahead electricity price forecasting using the wavelet transform and ARIMA models, IEEE Transactions on Power Systems 20 (2005), 1035–1042.

4. , Dong and Liu, Neural network models for electricity market forecasting, in: Proc. of the 4th Australian Workshop on Signal Processing and its Applications, Australia, 1996.

5. , Dong and Liu, Short-term electricity price forecasting using wavelet and SVM techniques, in: Dynamics of Continuous Discrete and Impulsive Systems, Series B: Applications, 2003, pp. 372–377.

6. Amjady, Day-ahead price forecasting of electricity markets by a new fuzzy neural network, IEEE Transactions on Power Systems 21 (2006), 887–896.

7. Areekul, Senjyu, Toyama and Yona, Notice of violation of IEEE publication principles: A hybrid ARIMA and neural network model for short-term price forecasting in deregulated market, IEEE Transactions on Power Systems 25 (2010), 524–530.

8. Zhou, Yan, Ni, Li and Nie, Electricity price forecasting with confidence-interval estimation through an extended ARIMA approach, IEE Proceedings - Generation, Transmission and Distribution 153 (2006), 187–195.

9. Zhang and Luh P.B., Neural network-based market clearing price prediction and confidence interval estimation with an improved extended Kalman filter method, IEEE Transactions on Power Systems 20 (2005), 59–66.

10. Zhao J.H., Dong Z.Y., Xu and Wong K.P., A statistical approach for interval forecasting of the electricity price, IEEE Transactions on Power Systems 23 (2008), 267–276.

11. Chen, Dong Z.Y., Meng, Xu, Wong K.P. and Ngan, Electricity price forecasting with extreme learning machine and bootstrapping, IEEE Transactions on Power Systems 27 (2012), 2055–2062.

12. , Chan, Tsui and Hou, A new recursive dynamic factor analysis for point and interval forecast of electricity price, IEEE Transactions on Power Systems 28 (2013), 2352–2365.

13. Kou, Liang, Gao and Lou, Probabilistic electricity price forecasting with variational heteroscedastic Gaussian process and active learning, Energy Conversion and Management 89 (2015), 298–308.

14. Hornik, Stinchcombe and White, Multilayer feedforward networks are universal approximators, Neural Networks 2 (1989), 359–366.

15. Szkuta, Sanabria and Dillon, Electricity price short-term forecasting using artificial neural networks, IEEE Transactions on Power Systems 14 (1999), 851–857.

16. Graps, An introduction to wavelets, IEEE Computational Science & Engineering 2 (1995), 50–61.

17. Nievergelt and Nievergelt, Wavelets made easy, vol. 174, Springer, 1999.

18. Wan, Xu, Wang, Dong Z.Y. and Wong K.P., A hybrid approach for probabilistic forecasting of electricity price, IEEE Transactions on Smart Grid 5 (2014), 463–470.

19. Amjady and Keynia, Day-ahead price forecasting of electricity markets by mutual information technique and cascaded neuro-evolutionary algorithm, IEEE Transactions on Power Systems 24 (2009), 306–318.

20. Amjady and Keynia, Electricity market price spike analysis by a hybrid data model and feature selection technique, Electric Power Systems Research 80 (2010), 318–327.

21. Chitsaz, Amjady and Zareipour, Wind power forecast using wavelet neural network trained by improved Clonal selection algorithm, Energy Conversion and Management 89 (2015), 588–598.

22. Shrivastava N.A. and Panigrahi B.K., A hybrid wavelet-ELM based short term price forecasting for electricity markets, International Journal of Electrical Power & Energy Systems 55 (2014), 41–50.

23. Catalao, Pousinho and Mendes, Hybrid wavelet-PSO-ANFIS approach for short-term electricity prices forecasting, IEEE Transactions on Power Systems 26 (2011), 137–144.

24. Mandal, Haque A.U., Meng, Srivastava A.K. and Martinez, A novel hybrid approach using wavelet, firefly algorithm, and fuzzy ARTMAP for day-ahead electricity price forecasting, IEEE Transactions on Power Systems 28 (2013), 1041–1051.

25. Pindoriya, Singh and Singh, An adaptive wavelet neural network-based energy price forecasting in electricity markets, IEEE Transactions on Power Systems 23 (2008), 1423–1432.

26. May, The Greenpeace book of dolphins, Sterling Pub. Co., 1990.

27. Najibi and Niknam, Stochastic scheduling of renewable micro-grids considering photovoltaic source uncertainties, Energy Conversion and Management 98 (2015), 484–499.

28. Kavousi-Fard, Niknam and Khooban M.H., Intelligent stochastic framework to solve the reconfiguration problem from the reliability view, IET Science, Measurement & Technology 8(5) (2014), 245–259.

29. Soltanpour M.R., Khooban M.H. and Soltani, Robust fuzzy sliding mode control for tracking the robot manipulator in joint space and in presence of uncertainties, Robotica 32(03) (2014), 433–446.

30. Khooban M.H. and Niknam, A new intelligent online fuzzy tuning approach for multi-area load frequency control: Self Adaptive Modified Bat Algorithm, International Journal of Electrical Power & Energy Systems 71 (2015), 254–261.

31. Hamilton J.D., Time series analysis, vol. 2, Princeton University Press, Princeton, 1994.

32. Efron, Bootstrap methods: Another look at the jackknife, The Annals of Statistics, 1979, pp. 1–26.

33. Efron and Tibshirani R.J., An introduction to the bootstrap, CRC Press, 1994.

34. Website of Ontario electricity market, http://www.ieso.ca.

35. Website of Australian electricity market, http://www.aemo.com.au.

36. Website of New England electricity market, http://www.iso-ne.com.