Abstract
To simplify decision making of market participants, a careful and reliable electricity market price forecasting method is indispensable. Nevertheless, due to the Instability in market clearing prices (MCPs), it is rather tough to forecast MCPs accurately. Using probabilistic forecasting is a new solution to overcome the low accuracy of forecast. Transformation from traditional point forecasts to probabilistic interval forecasts is too important to model the uncertainties of forecasts. Thus the decision making activities of market participants are supported against uncertainties and risks effectively. In this paper a hybrid approach to achieve prediction intervals (PIs) of MCPs is proposed that modified dolphin echolocation optimization algorithm (MDEOA) is applied to estimate point forecasts, model uncertainties, and noise variance. This proposed electricity price probabilistic forecasting method is evaluated by a generalized and comprehensive framework. To test the proposed hybrid method, real price data from Ontario, New England, and, Australian electricity markets are used and effectiveness of the method is validated.
Keywords
Introduction
Throughout the world, the traditional regulated and monopolistic electricity markets are transformed to deregulated and competitive markets. In these deregulated markets, an accurate electricity price forecasting is necessary for market participants in their decision makings such as bidding strategies and investment decisions. Nevertheless, due to time varying nature of power demand and the increasing renewable energies, the electricity price is greatly variable. Therefore, accurate electricity price forecasting is a very difficult work. According to this complex nature of electricity price forecasting, the importance of probabilistic electricity price forecasting is so high. The probabilistic forecasts have ability to considering the inherent uncertainties in prices series so they help to overcome the forecasting risks. Hence, to simplify decision making activities, a careful and reliable probabilistic electricity price forecasting is helpful.
Most research has focused on point electricity price forecasting. Some of before works are based on time series which can be traced autoregressive integrated moving average (ARIMA) [1] and generalized autoregressive conditional heteroscedasticity (GARCH) [2]. In [3], the wavelet method is used to improve performance of the ARIMA model. Although time series reported interesting results in electricity price forecast. But due to their linearity, these techniques can’t predict high non-linear behavior of electricity price signals accurately. To overcome the restrictions of time-series models, neural networks are used for electricity price forecasting [4–6]. Also, a composition of the NN and ARIMA models are presented in [7].
Interest in probabilistic forecasting methods is increased due to inherent restrictions of point forecasting methods. For example, PIs method is used to consider uncertainties associated with point electricity price forecasting. Using PIs market participants can prepare themselves for best and worst situation. PIs are obtained based on price forecasting and its errors. In [8], ARIMA models are used to predict the price and the errors. However, ARIMA is a linear model that couldn’t pursue the behavior of the electricity price signals. A combination of neural networks and Kalman filter for forecasting of MCP and relevant confident intervals (CI) is used in [9]. In [10], the support vector machine (SVM) is used for point forecasting, and prediction intervals are computed using variance equation and maximum likelihood estimate. also in [11] a combination contains of extreme learning machine (ELM) in order to predict electricity price and uncertainty of prediction and maximum likelihood method to predict the variance of noise is used. In [12], to track rank of principal components a recursive dynamic factor analysis algorithm is used and forecasting is done recursively by Kalman filter. Hence, the covariance and therefore the prediction intervals of electricity prices are predicted. Also a model based on variational heteroscedastic Gaussian process (VHGP) and active learning technique is presented in [13]. In this works VHGP is used to determine heteroscedastic Gaussian prediction intervals. As can be seen, according to an excellent approximation capabilities of neural networks [14], it is widely used in electricity price forecasting [4, 15].
In this paper, a hybrid approach based on modified dolphin echolocation optimization algorithm (MDEOA), wavelet preprocess, wavelet neural network (WNN) and feature selection technique is proposed. The evolutionary algorithm is used to train WNN. In this algorithm, least squared error (LSE) is considered as objective function. A two stages feature selection technique is presented to select the most efficacious input for forecasting process. Also, bootstrapping technique is used to compute the model uncertainty and PIs. Finally, based on variance of noise and model uncertainty, prediction intervals are achieved.
This method is a hybrid, accurate, and reliable approach for hourly probabilistic electrical price forecasting. To evaluate performance of this method, it is tested on data of Ontario, New England, and Australian markets. Obtained numerical results show the proposed method has effective and efficient performance so it’s usable in practical applications as a device for electricity prices forecasting.
Methodology
Wavelet preprocessing
Wavelet transform is a significant tool for analyses the frequency components of signals that has overcome limitations of Fourier and short-time Fourier transforms. This tool has great ability to extract relevant information from frequency-time, non-periodic, and transient signals. This transform split data into various frequency components and study on each component take place with matched resolution to its scale [16]. In this paper, to decompose price series into a set of producer series that have better behavior, the wavelet transform is used. Forecasting is done for each producer series separately, and actual forecasted price is obtained by inverse wavelet transform. For time series of price, decomposition coefficients of wavelet transform are adjusted by the method that proposed in [3].
Where L (.) is the selected wavelet function, p t is price value at hour t, T is length of price series, and is decomposition coefficient that is associated with position n and resolution level m. Calculation speed can be increased with correct expression of this equation as a complication and using efficient fast Fourier transform [17].
Using multi-resolution techniques that are based on father and mother wavelet function as complement is an effective way to apply the wavelet transform. To extract low and high frequency of the series, the father and mother wavelets are used respectively. Because of appropriate mathematical properties of orthogonal wavelet functions, preferably they are selected. So, approximation and detail series (A
m
(m = 1, 2, …, M) and D
m
(m = 1, 2, …, M)) are defined as follow:
And
Where and are coefficients that are obtained from Equation (1). φ
mn
and ψ
mn
are the father and mother wavelet functions that are the solutions of following equations respectively:
And
More details of above equations are achieved from [20]. The price series in time domain p
t
(t = 1, 2, …, T) can be reconstructed as follows:
Daubechies wavelet is most appropriate case for non-stationary series [18] and in this work is used.
The electricity prices series can be considered as a non-linear mapping of effective variables like historical price. By assuming that historical price or any other effective parameter is available, a set of inputs (candidate features) is made that shown by CF(t).
In order to determine length of CF(t), a compromise between the number of candidate features and the length of inputs data must be done. If these values were big, more information is available for forecasting engine but computation burden is high and vice versa. Thus, a subset of CF(t) shall be selected by a feature selection technique. This subset must have the most influential features and the others must be filtered. This process is shown in Fig. 1.
In [19, 20], single-stage and two-stage feature selection techniques were presented. These techniques are based on information-theoretic criterion of mutual information (MI). It also in [21] for wind power forecast is applied. This two-stage technique also is used here. In following a brief description of this method is provided, detailed in [20] is available.
I. First stage (irrelevancy filter): In this stage, the MI between each candidate feature x (t) (a member of CF(t)) with target variable is calculated by the following equation:
Where P (X) is the probability distribution of X. The equation generally is between two variables x with n members and y with m members. Bigger amount provided by the equation indicates that this candidate feature has more common information with target variable. As a result, x (t) is a highly relevant feature. The candidate features are regular basis on their MI throwing from the above equation, and feature with larger MI is ranked higher. Set of features with MI bigger than a threshold T1 as relevant features are selected to forecast and the subset is shown as CF1(t). Other features known as irrelevant entries and are put in the subset CF1’(t) (Fig. 1).
II. Second stage (redundancy filter): In this stage, redundancy of features in CF1(t) is founded and filtered. For the purpose, between both x1 (t) and x2 (t) (x1, x2 ∈ CF1), MI is calculated that is shown to form of MI [x1, x2]. The bigger value represents more common information and redundancy between two features. Thus, the redundancy criterion for each x
k
(t) from CF1(t) with other members is calculated as follows:
A feature with bigger RC represents more redundancy or feature with less information. Features can be ranked according redundancy and if a feature has a RC bigger than a threshold T2 is indicates that redundancy of the feature with another feature is too much and one of them should be removed. To do this, if RC of x k (t) is bigger than T2, the feature x j (t) that have biggest MI with x k (t) (MI [x k (t) , x j (t)] > MI [x k (t) , x l (t)] , x l (t) ∈ CF1- {x k (t) , x j (t)}) is identified.
Now to choose between x k (t) and x j (t) to be deleted, their MIs with target variable are used and feature with smaller MI (less relevant) is deleted. This operation is performed on all members of CF1(t) until no feature was available with RC bigger than T2. The remained subseries known as CF2(t) (Fig. 1) that is a relevant and non-redundant set. Setting the thresholds T1 and T2 is done with a compromise between quality and number of input features.
In the recent works [22–24] the wavelet transform as a preprocessing in electricity prices forecasting is used. Another approach is using wavelet in forecasting process through the wavelet neural network construction which a wavelet function as activation function of hidden neurons is considered. According to wavelet’s local properties and the ability of adapting wavelet shape according to training data instead of adapting parameters of the activation function with fixed shape [25], WNNs high ability to generalize compared with classical forward neural networks are provided.
In this work a WNN with Morlet wavelet as activation function of hidden neurons is used for forecasting the price of electricity. Figure 2 shows the structure of a three layer forward WNN. In the figure X = [x1, x2, …, x m ] is the input vector and y is the target variable.
Forecasting engine produce the input-output mapping from x to y. In the hidden layer, activation function of each neuron is determined as follows:
Where n is the number of hidden layer’s neurons. ψ (.) is defined as follows:
In Equations (9) and (10), ψa
i
,b
i
(.) is the shifted and scaled version of the ψ (.) with a
i
and b
i
as shift and scale parameters. Finally, the output is calculated as follows:
In the above equation, w i is weight between the hidden neuron i and output and v j is weight between input j and output. In other words, the output of WNN is obtained by a combination of input and output of wavelet functions. With respect to the above, vector of free parameters of WNN is as follows:
As a result, WNN has np = 3n + m free parameters.
The dolphin echolocation algorithm is a new optimization algorithm. The algorithm has specific characteristics like simple concept, easy implementation, and fast convergence. Best conducted study on marine mammals is related to the dolphins. A dolphin can produce sounds in the form of clicks in high-frequency. When the sound hit to the target, the number of its energy returned and recovered to dolphin. As soon as the sound received, dolphin produces other sounds again. With time delay between the sounds and their returns, the dolphin can assess its distance from the target. By sending and receiving reflected clicks in continuous, the dolphin can follow objects [26].
This algorithm, simulate the dolphin’s echolocation and limiting the search related by distance from the target. For defined this process more clearly, two phases are introduced: In the first phase, the algorithm evaluate all space search to form that to a general search space, so it should be looking for unexplored areas. This task is done by create a series of random locations in the search space. In the second phase concentrate to evaluate the best places from the first phase. These are inherent characteristics of all evolutionary algorithms [27].
Before then begin optimization, the search space must be classified according to the following rules:
For each variable that must be optimized during the process, alternatives of the search space have to be sort ascending or descending. If a variable has more than one characteristic, the sorting is done for the most important [31–33]. Using this method for the variable i, vector A i is made with length N i that include all possible scenarios of variable i.
By placing these vectors as columns of a matrix, the alternatives matrix (AM) is made in dimensions M * N
v
. Where M is the maximum length of vectors A
i
and N
v
is the number of variables. In addition, the curve of convergence factor (CF) that it should change during the optimization process must be specified. Here, the convergence factor is determined as follows:
Where PP is predefined probability and PP1 is convergence factor in the first iteration that answers are chosen randomly. I i , I m , and p are iteration i, maximum iteration, and degree of the curve, respectively.
The main steps of dolphin echolocation optimization algorithm are as follow.
Step 1: Generate N l random locations for dolphin and also create matrix L (alternatives matrix) with dimensions N l * N v . Where N l and N v are the number of locations and variables, respectively.
Step 2: Calculate CF for current iteration by Equation (14).
Step 3: Calculate the fitness function for each of locations.
Step 4: Calculate the accumulative fitness by rules of dolphins as follow:
A: The location L (i, j) in column j of the alternatives matrix is founded, and name it as A.
for i = 1 to number of locations
for j = 1 to number of variables
for -R
e
< k < R
e
Where af (A + k) j is the accumulative fitness of the alternative A + k to be chosen for the variable j. R e is the effective radius that accumulative fitness of A’s neighbors are impressed with its fitness. F (i) is the fitness of location i. It is noteworthy that if alternatives are selected on the edges (A + k is invalid i.e. A + k < 0 or A + k > L i ) the value of af is calculated using reflection characteristic. In this case, if distance between alternative and edges is less than R e , assuming that there is a similar alternative like that a mirror be placed on edges.
B: For equal probability distribution in the search space, a small amount φ is added to all arrays as af = af + ɛ.
C: Find the best location in current iteration and call it “the best location”. Find alternative assigned to variables of the best location and put them af equal to zero. In other words:
for i = 1 to number of variables
for j = 1 to number of alternative
if i = the best location(j)
Step 5: For the variable j probability of selection alternative i is calculated based on the equation described in below:
Step 6: Determine the probability for all alternatives of all variables of the best location equal to PP and then the rest divide between other alternative as follow:
for i = 1 to number of variables
for j = 1 to number of alternative
if i = the best location(j)
else
Step 7: Calculate the locations in the next iteration using the probabilities available for each alternative. Repeat steps 2 to 6 until achieve maximum iteration.
To improve and have powerful algorithm, a new modification on the algorithm is described in this section. Call the best place as l
b
est. In fact, the speed of each location l
i
to the best location is determine by distance between l
i
and l
b
est. In each iteration, this distance is calculated based as follows:
Where N is the length of control vector. Now absorption parameter is defined for the best location as follows:
Where β0 is the absorption parameter in distance zero. Considering the above equation, absorption parameter has inversely related to the distance. Now the location l
i
is updated by following equations:
In Equation (22), two first term is for making balance between l b est and l i , and third norm is for a random movement in search space. γ is absorption coefficient which is defined to control click rate.
PI formulation
Uncertainty of forecasting model is the main factor of uncertainty in price forecasting. The uncertainty is happened due to incorrect structure and parameters of neural networks. The noise of training data is another reason for uncertainty of forecasting. Random behavior of data regression causes this type of uncertainty. For a set of separate pairs as {(x
i
, t
i
)} , i = 1, 2, …, N, purpose of forecasting is expressed as follows:
Where t
i
is forecast target i and x is inputs vector. e (.) is a noise with zero mean and r (.) is mean of true regression. It is assumed that the noise has a normal distribution and variance that is related to the input variables. According to [10, 31], this assumptions are reasonable. Trained neural network , can be considered as an estimator of true regression r (.). In such a case, forecasting error can be expressed as follows:
In above equation, and represent forecasting error and neural network estimation error according to true regression, respectively. Assuming that the noise and estimation error are statistically independent, total variance of forecasting error is equal to sum of the model uncertainty variance and noise variance .
For a set of data as {(x
i
, t
i
)}, the PIs with (1-α) ×100% confidence level are shown as [Dt,α (x
i
) , Ht,α (x
i
)] and are expressed as follow:
Where Dt,α (.) and Ht,α (.) indicate down bound and high bound of PI respectively. z1-α/2 is critical value of standard normal distribution that associate on confidence level.
Bootstrapping technique is a powerful device for statistical inference processes, which was introduced in 1979 [29, 30]. Bootstrapping is mainly based on using the sample of a population for inferences about that population. Simplicity and robustness are advantages of bootstrapping method in comparison with methods based on parametric assumptions. Other methods in computing the standard error are unreliable and complex [18]. The bootstrapping technique is used in order to compose the uncertainty of forecasting model in PIs, which this process can be expressed as follow:
Step 1: Obtain for the given training data {(x i , t i )}.
Step 2: Calculate the error between forecasting output and original targets as .
Step 3: Transfer the center of resulting errors to zero as .
Step 4: Insert this new errors instead of previous errors then calculate new targets using equation and define new training data in form of {(x i , t i )}.
Step 5: Forecast for new training data in bootstrapping iteration q.
Step 6: Repeat steps 2 to 6 until achieve to B bootstrapping iterations.
At the end of these steps, training data is created B times and in each time, neural networks are trained. Now, mean of trained neural networks output is expressed as follows.
In this equation, represent the forecasted value in bootstrapping iteration q. Finally, variance of model uncertainty is estimated as follows:
For modeling the uncertainty of data noise, it’s required to compute the variance of noise. According to Equation (25), variance of estimated noise is obtained as follows:
To estimate the model with the purpose of fitting extra residual, square of residuals set can be defined as follows:
Now a new set of training data with positive output is defined as {(x i , R2 (x i ))}, i = 1, 2, …, N. To model the uncertainty of data noise, a neural network is trained by this new data and its output and variance of main forecast are added. Therefore, B + 1 forecasts are done.
In this section, to evaluate the proposed method, the reliability evaluation (RE) criterion and sharpness evaluation (SE) criterion of PI are investigated. Reliability is a significant feature for validity and authenticity of PIs. It is expected with normal probability equal to (1–α)×100%, outputs would be close to the main targets that is called prediction interval nominal confidence (PINC). For N
t
available data samples, prediction interval coverage probability (PICP) is expressed as follows:
Where Ii,α is PICP Index that is achieved as follows:
PICP indicates the reliability of PIs and when reliability is high, this value must asymptotically attain to PINC. Therefore, the average coverage error (ACE) is expressed as difference between PICP and PINC. This index reflects the quality of PIs. For PIs with high quality, ACE should be as much as possible close to zero.
High reliability prediction intervals can be easily achieved by increasing the width of the intervals. However, it is illogical for practical applications. So, sharpness evaluation criterion for measuring the average of prediction intervals width is used. This measure reflects the ability of the model to focus on uncertainty of probabilistic forecasting. This criterion is calculated as follows:
Smaller value for SE reflects the narrower the prediction intervals and better performance.
In this section, implementation and considerations of model are presented. In Fig. 3 for better perception, a flowchart of proposed model performance is presented. Elucidation for various parts of the flowchart is provided in follow.
First of all, the length of training data is determined. The purpose is forecasting for a week and data of two previous months is used for training. Historical price is the only data which is used as input. The desired accuracy and ability to compare proposed model performance with previous works are the reasons for this choice. Due to the high number of training data, which enlarge neural network and increase computation burden, the feature selection is used to reduce the input parameters. Historical data is given to feature selection and irrelevancy and redundancy filters are applied on them. Therefore, hours with higher effect on next hour are defined as inputs. For example, more efficient data of Nov 2014 in Ontario and Australian electricity markets is shown in Table 1.
As mentioned in previous sections, the wavelet preprocessing is used to improve the forecasting performance. In this paper, daubechies (dB4) wavelet transform is used to turn two months training data set into four subseries (Fig. 4). Two of them are approximation subseries (A1, A2) and two others are details subseries (D1, D2). Now for training and price forecasting, these subseries are used. To compose the uncertainty of model using bootstrapping, B neural networks are trained by each one of these subseries using modified dolphin echolocation optimization algorithm. Using inverse wavelet transform, outputs of model uncertainty neural networks are backed into time domain. Based on original data and outputs of model uncertainty part, a neural network is trained to model uncertainty of data noise. As a result, 4B + 1 neural networks are trained for each hour. Outputs of model uncertainty and noise data uncertainty are used to calculate PIs. The process of these 4B + 1 neural networks is shown in Fig. 5.
Then, the forecasting hour is checked and process is done if time period was over (forecasting for hour 168 is done). If the time period is not over, forecasted result of current hour is used to forecast next hour by adding it to end of inputs data. Until end of the forecast period this process is continued.
Numerical results
The proposed model to forecast electricity price for the year 2014 from Ontario (IESO), Australian (ANEM), and New England electricity markets so as to have encyclopedic assessment of its performance is exerted. Historical price of markets is taken from [34–36], respectively. Information of the Australian electricity market is related to the South Australia. In all markets, hourly prices are considered as inputs of model.
To compare and validate the proposed model, some other methods based on same data are considered. First method is persistence that is a simple solution to time series forecasting. In this method current price is obtained directly from price of a previous day. In second method to calculate the PI, bootstrapping is used and neural networks are trained by Back Propagation (BP) technique. Third method is presented in [18], which is based on bootstrapping and ELM. In this method the data noise is considered in calculation of PI.
For year 2014 in the Ontario, Australian, and New England electricity markets and for two confidence level (90% and 95%), forecasting is done by these four methods. The results are represented in Tables 2 to 6. These tables are related to May 2014 and Nov 2014 of Ontario market, May 2014 and Oct 2014 of Australian electricity market, and May 2014 of New England electricity market, respectively.
According to the information of these Tables, the proposed method has better performance to comparison with other methods in most tests. In the proposed hybrid method, value of PICP was very close to PINC in all tests, therefore ACE is too small. Absolute value of ACE for the proposed method in all of the tests isn’t more than 2% and also value of SE evaluation in most tests is less than other methods. These results are shown a very good performance method.
Persistence is a very simple method and doesn’t have good result, therefore isn’t suitable for this application. Although in some cases BP method has good value of ACE, but it doesn’t have good performance in all of the tests. Also training neural networks with BP method takes high computation burden, therefore this method can’t be a reasonable choice for probabilistic electricity prices forecasting. Low computation burden is advantage of the ELM method, but this method can implement only on single layer neural networks. The proposed method has higher accuracy compared with ELM method. To use wavelet preprocess makes a good performance method compared with the method that proposed in [18]. However, ELM method has acceptable results so computation burden has vital impact to designate these two methods. The proposed method has better performance and higher computation burden, whereas ELM method has weaker performance and lower computation burden. As regards, because of weekly horizon forecast, there is no time restriction and according to good performance of the proposed hybrid method, this method is more reasonable.
The graphically results of probabilistic forecasting using proposed method are shown in Figs. 6 to 10. This results are for two weeks of 2014, in the Ontario and the Australian electricity market, and one week of 2014, in New England electricity market. Confidence level of 95% for three cases and 90 % for two cases are considered. These results comprise actual value of prices and high and down bound of prediction intervals.
Above numerical and graphical results show high ability of proposed method to follows prices time series with well reliability. This reliability and flexibility of the proposed method prove that it’s usable in practical electricity markets probability forecasting.
Historical price is the only neural networks input in this paper, whereas electrical price depends on other variables chronological events, weather conditions, bidding strategy of participants etc. By adding these parameters as neural networks entrance, performance of the method can be improved.
Conclusion
An applicable method for electricity price forecasting can help market participants in decision making processes. Price time series has really nonlinear behavior and depends on a lot of parameters, so forecasting always has inevitable errors. To use more efficient tools and their combination is a common way to electricity price forecasting. In this paper, modified dolphin echolocation optimization algorithm, wavelet preprocessing, WNN, and feature selection technique are combined in order to provide a high performance method. In this hybrid method, uncertainty of model and uncertainty of data noise is composed. By considering both uncertainties, prediction intervals are obtained. Finally, data of Ontario, New England, and Australian electricity markets is used to test the proposed method that provide acceptable results. High performance in uncertainty modeling and high training accuracy made the method a good choice for probabilistic electricity price forecasting.
