Application of SVR with backtracking search algorithm for long-term load forecasting

Abstract

Long-term load forecasting is an important issue for a country’s power suppliers when determining future electric system planning, investment and operation. This paper presents a novel hybrid long-term forecasting method that combines support vector regression (SVR) with the backtracking search algorithm (BSA), which is used to obtain the parameters of the SVR. The practical case of China’s annual electricity demand is used to evaluate the effectiveness of the proposed method. According to the results, the proposed method performs better than the SVR model with default parameters, the back propagation artificial neural network (BPNN) and regression forecasting models in annual load forecasting.

Keywords

Long-term load forecasting, support vector regression (SVR), backtracking search algorithm

1 Introduction

Load forecasting, which is important for electricity market participants when making trading decisions, can be categorized into short-term and long-term forecasting. The time horizon of short-term forecasting ranges from a few minutes to a day, and it provides the information needed to decide an electric company’s dispatching schedule. The time horizon of long-term load forecasting ranges from a few months to ten years [1]. It plays a key role in decisions on long-term power system investment, power maintenance planning and power system construction scheduling. Annual load forecasting is one of the most typical long-term forecasting problems. However, it is difficult to forecast the annual load accurately, because it is affected by many non-linear factors such as GDP, population and other national economic and social factors, which are always uncertain and uncontrollable.

In the last two decades, much research has focused on long-term load forecasting. The forecasting methods can be categorized into two types. One is the economic input-output forecasting method, which uses GDP, population, other economic conditions and historical load values as the input variables to create a load forecasting model such as multiple linear regression, artificial neural network (ANN) and so on [2–4]. The input variables, which may affect the long-term load characteristics, can be divided into four categories: economic variables, social variables, planning development variables and load variables.

The other type of forecasting method uses only historical load values to create a forecasting model, treating the forecasting problem as a time-series forecasting problem. Such a model does not accept economic, social and planning development variables, because there is no way to confirm the accuracy of their values, and obtaining their future values is itself a difficult forecasting problem. One of the classical time-series forecasting models is ARMA, which is often employed in short-term load forecasting [5–7]. As in the economic input-output forecasting method above, ANN and support vector regression (SVR) are also used as time-series forecasting models [8, 9]. The greatest advantage of this type of model is that the historical load data can be obtained easily and no other variable values need to be forecast. For this reason, the time-series forecasting model is a better choice for long-term forecasting.

Recently, support vector regression (SVR) has become one of the popular intelligent models in load forecasting. Intelligent models such as ANN and SVR need no prior knowledge and can fit non-linear functions to arbitrary accuracy [10, 11]. SVR uses the structural risk minimization principle to minimize the generalization error, rather than the empirical risk minimization principle used by ANN, and it can easily obtain a global optimal solution. In small-sample situations, SVR can be considered as a replacement for ANN. According to many studies, the forecasting performance of SVR is better than that of ANN and other time-series forecasting models [12, 13].

The core problem in using SVR for forecasting is determining its structure, including input variables, output variables and parameters, which are usually determined by expert experience, correlation analysis or optimization algorithms. The purpose of correlation analysis between candidate variables and the load time series is to find suitable input variables for the intelligent model, while optimization algorithms are usually used to determine the three SVR parameters: the kernel function parameter σ, the error penalty parameter C and the insensitive loss function parameter ɛ. The optimization algorithms include evolutionary and swarm intelligence algorithms, such as the genetic algorithm and the differential evolution (DE) algorithm [13]. The backtracking search algorithm (BSA), proposed by P. Civicioglu in 2013, is a new optimization algorithm for numerical optimization problems. The algorithm is easy to use because it uses real-valued coding, needs only a single control parameter and is not overly sensitive to the initial values. According to comparison results on dozens of benchmark functions, BSA is more successful than six widely used evolutionary algorithms, namely PSO, CMAES, ABC, JDE, CLPSO and SADE [14]. In order to enhance the forecasting performance of SVR models, we employ BSA to determine the parameters of SVR and report its annual load forecasting performance to demonstrate its effectiveness.

The rest of the paper is organized as follows: Section 2 introduces an annual forecasting model with SVR optimized by BSA. Section 3 describes the data and the forecasting results. Finally, conclusions are given in Section 4.

2 Support vector regression model optimized by BSA in annual load forecasting

2.1 A brief introduction of support vector regression

The support vector machine (SVM), based on statistical learning theory, was first used in pattern analysis. It has been successfully applied in many areas such as text categorization, pattern recognition and signal processing. With the development of SVM, it has been extended to deal with non-linear regression function fitting problems, giving rise to the support vector regression (SVR) method. Suppose a set of data G = {(x_i, d_i)}, i = 1, …, N, where x_i ∈ Rⁿ is an n-dimensional input vector, d_i ∈ R¹ is the corresponding target output and N is the total number of pattern records. The regression estimate function can be expressed as [15, 16] $y = f (x) = w ψ (x) + b$ (1)

In which, ψ (x) is a nonlinear mapping from the input space to a high dimensional feature space. w is a weight vector and b is a threshold value, which can be estimated by minimizing the regularized risk function. $R (C) = (C / N) \sum_{i = 1}^{N} L_{ɛ} (d_{i}, y_{i}) + {∥ w ∥}^{2} / 2$ (2)

In which, $L_{ɛ} (d, y) = \begin{cases} 0, & | d - y | \leq ɛ \\ | d - y | - ɛ, & \text{otherwise} \end{cases}$ (3)

∥w∥²/2 measures the flatness of the function, and the function L_ɛ (d, y) is called the ɛ-insensitive loss function. C specifies the trade-off between the empirical risk and the model flatness, and ɛ defines the insensitive zone of Vapnik’s linear loss function used to measure the empirical error. After two slack variables ζ and ζ^* are introduced to represent the distance from the actual values to the corresponding boundary values of the ɛ-tube, Equation (2) can be transformed to Equation (4). $\begin{matrix} R (w, ζ, ζ^{*}) = {∥ w ∥}^{2} / 2 + C \sum_{i = 1}^{N} (ζ_{i} + ζ_{i}^{*}) \\ \text{s.t.} \\ w ψ (x_{i}) + b - d_{i} \leq ɛ + ζ_{i}^{*}, i = 1, 2, \dots, N \\ d_{i} - w ψ (x_{i}) - b \leq ɛ + ζ_{i}, i = 1, 2, \dots, N \\ ζ_{i}, ζ_{i}^{*} \geq 0, i = 1, 2, \dots, N \end{matrix}$ (4)

This constrained optimization problem can be solved with the primal Lagrangian form $\begin{matrix} L (w, b, ζ, ζ^{*}, α_{i}, α_{i}^{*}, β_{i}, β_{i}^{*}) \\ = {∥ w ∥}^{2} / 2 + C \sum_{i = 1}^{N} (ζ_{i} + ζ_{i}^{*}) \\ - \sum_{i = 1}^{N} β_{i} [w ψ (x_{i}) + b - d_{i} + ɛ + ζ_{i}] \\ - \sum_{i = 1}^{N} β_{i}^{*} [d_{i} - w ψ (x_{i}) - b + ɛ + ζ_{i}^{*}] \\ - \sum_{i = 1}^{N} (α_{i} ζ_{i} + α_{i}^{*} ζ_{i}^{*}) \end{matrix}$ (5)

Equation (5) must be minimized with respect to w, b, ζ and ζ^*; therefore, the following conditions need to be satisfied. $\begin{matrix} \frac{\partial L}{\partial w} = 0 \to w - \sum_{i = 1}^{N} (β_{i} - β_{i}^{*}) ψ (x_{i}) = 0 \\ \frac{\partial L}{\partial b} = 0 \to \sum_{i = 1}^{N} (β_{i} - β_{i}^{*}) = 0 \\ \frac{\partial L}{\partial ζ_{i}} = 0 \to C - β_{i} - α_{i} = 0 \\ \frac{\partial L}{\partial ζ_{i}^{*}} = 0 \to C - β_{i}^{*} - α_{i}^{*} = 0 \end{matrix}$ (6)

Using the Karush-Kuhn-Tucker conditions and substituting Equation (6) into Equation (5), the dual can be obtained as $\begin{matrix} ϑ (β_{i}, β_{i}^{*}) = \sum_{i = 1}^{N} d_{i} (β_{i} - β_{i}^{*}) - ɛ \sum_{i = 1}^{N} (β_{i} + β_{i}^{*}) \\ - \frac{1}{2} \sum_{i = 1}^{N} \sum_{j = 1}^{N} (β_{i} - β_{i}^{*}) (β_{j} - β_{j}^{*}) K (x_{i}, x_{j}) \\ \text{s.t.} \sum_{i = 1}^{N} (β_{i} - β_{i}^{*}) = 0, \\ 0 \leq β_{i} \leq C, 0 \leq β_{i}^{*} \leq C, i = 1, 2, \dots, N \end{matrix}$ (7)

In which, $β_{i} β_{i}^{*} = 0$, and the optimal weight vector of the regression hyperplane is $w^{*} = \sum_{i = 1}^{N} (β_{i} - β_{i}^{*}) ψ (x_{i})$ (8)

The final regression function is $f (x, β, β^{*}) = \sum_{i = 1}^{N} (β_{i} - β_{i}^{*}) K (x, x_{i}) + b$ (9)

Here K (x_i, x_j) = ψ (x_i) ψ (x_j), the inner product of the two vectors ψ (x_i) and ψ (x_j) in the feature space, is called the kernel function and needs to satisfy Mercer’s condition. The most commonly used kernel function is the Gaussian RBF kernel K (x_i, x_j) = exp(−∥x_i − x_j∥²/(2σ²)), which is also employed in this study. Therefore, using the above SVR model requires determining three parameters: C, ɛ and σ.
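As a minimal sketch of Equation (9) with the Gaussian RBF kernel, the following Python fragment evaluates the regression function for given dual coefficients. The function names `rbf_kernel` and `svr_predict` and the toy coefficient values are illustrative assumptions and are not taken from the original study.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(x, train_points, dual_coefs, b, sigma):
    """Evaluate Eq. (9): f(x) = sum_i (beta_i - beta_i*) K(x, x_i) + b.

    dual_coefs[i] holds the difference beta_i - beta_i* for training point x_i.
    """
    return sum(c * rbf_kernel(x, xi, sigma)
               for c, xi in zip(dual_coefs, train_points)) + b


# Toy values chosen only for illustration (not fitted to any data).
points = [[0.0, 0.0], [1.0, 1.0]]
coefs = [0.5, -0.25]
print(svr_predict([0.0, 0.0], points, coefs, b=0.1, sigma=1.0))
```

Training points whose coefficient difference is zero contribute nothing to the sum, which is why only the support vectors matter at prediction time.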

It is important to determine these three parameters when using SVR for forecasting. C controls the degree of empirical risk of the SVR, ɛ controls the width of the ɛ-insensitive loss function and σ controls the width of the Gaussian kernel function. Most researchers select the three parameters by experience or prior knowledge, and efficient automatic methods for determining them are lacking. In this study, the backtracking search algorithm (BSA) is used to determine the parameters of the proposed SVR forecasting method.
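The roles of C and ɛ can be illustrated numerically. The sketch below implements the ɛ-insensitive loss of Equation (3) and the regularized risk of Equation (2) in plain Python; the function names and the sample targets, outputs and weights are hypothetical values used only for illustration.

```python
def eps_insensitive_loss(d, y, eps):
    """Eq. (3): zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(d - y) - eps)


def regularized_risk(targets, outputs, w, C, eps):
    """Eq. (2): R(C) = (C / N) * sum_i L_eps(d_i, y_i) + ||w||^2 / 2."""
    n = len(targets)
    empirical = sum(eps_insensitive_loss(d, y, eps)
                    for d, y in zip(targets, outputs)) / n
    flatness = sum(wi * wi for wi in w) / 2.0
    return C * empirical + flatness


# Hypothetical targets, model outputs and weight vector.
targets = [1.0, 2.0, 3.0]
outputs = [1.05, 2.5, 2.0]
for eps in (0.0, 0.1, 0.6):
    print(eps, regularized_risk(targets, outputs, w=[1.0, 1.0], C=10.0, eps=eps))
```

Widening the tube (larger ɛ) ignores more of the small deviations, so the empirical term shrinks, while a larger C makes the remaining deviations weigh more heavily against the flatness term.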

2.2 Backtracking search algorithm

The backtracking search algorithm (BSA) is an evolutionary algorithm (EA) proposed by Pinar Civicioglu in 2013. Like other evolutionary algorithms such as the genetic algorithm (GA), BSA is a population-based algorithm. It contains five steps, as in other EAs: initialization, selection-I, mutation, crossover and selection-II. The detailed working steps are as follows [14]:

Step 1. Parameters initialization

The main parameters of the BSA algorithm are the population size N and the length of the chromosome, which is always the problem dimension D. Set g = 0 and generate an N × D matrix of random values with a uniform probability distribution. The generation method is $X_{ij} = rand \cdot (high [j] - low [j]) + low [j]$ (10)

In which, i = 1, 2, …, N, j = 1, 2, …, D, rand is a random number with a uniform probability distribution, and high [j] and low [j] are the upper bound and lower bound of the jth column, respectively.
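Equation (10) can be sketched in a few lines of Python. The function name `init_population` and the bounds used for (C, ɛ, σ) are assumptions made for illustration; the text does not state the search ranges that were used.

```python
import random


def init_population(n, low, high, rng):
    """Eq. (10): X[i][j] = rand * (high[j] - low[j]) + low[j]."""
    d = len(low)
    return [[rng.random() * (high[j] - low[j]) + low[j] for j in range(d)]
            for _ in range(n)]


rng = random.Random(0)           # seeded only to make the sketch repeatable
low = [0.1, 0.001, 0.01]         # hypothetical lower bounds for (C, eps, sigma)
high = [100.0, 1.0, 10.0]        # hypothetical upper bounds
P = init_population(5, low, high, rng)
for row in P:
    print(row)
```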

Step 2. Selection-I

In this stage, BSA determines the historical population oldP that is used to calculate the search direction. The initial historical population is determined using Equation (11), where U is the uniform distribution: $oldP_{ij} \sim U (low_{j}, up_{j})$ (11) BSA has the option of redefining oldP at the beginning of each iteration through the IF-THEN rule in Equation (12): $\text{if } a < b \text{ then } oldP := P \mid a, b \sim U (0, 1)$ (12) where “:=” is the update operation. After oldP is determined, Equation (13) is used to randomly change the order of the individuals in oldP: $oldP := permuting (oldP)$ (13)

The permuting function used in Equation (13) is a random shuffling function.
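A minimal Python sketch of the Selection-I stage, assuming populations are stored as lists of rows, is given below. The function name `selection_one` is illustrative, and the inputs are copied so that the caller's populations are left unchanged.

```python
import random


def selection_one(P, oldP, rng):
    """Selection-I: optionally redefine oldP as P (Eq. 12), then shuffle it (Eq. 13)."""
    a, b = rng.random(), rng.random()
    if a < b:
        new_old = [row[:] for row in P]      # oldP := P
    else:
        new_old = [row[:] for row in oldP]   # keep the previous historical population
    rng.shuffle(new_old)                     # oldP := permuting(oldP)
    return new_old


rng = random.Random(1)
P = [[1.0], [2.0], [3.0]]
oldP = [[4.0], [5.0], [6.0]]
print(selection_one(P, oldP, rng))
```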

Step 3. Mutation operation

BSA generates the trial population using the mutation function in Equation (14). $Mutant = P + F \cdot (oldP - P)$ (14)

In Equation (14), F controls the amplitude of the search-direction matrix (oldP - P). Because the historical population is used in the calculation of the search-direction matrix, BSA generates a trial population, taking partial advantage of its experiences from previous generations. This paper uses F = 3 · rndn, where rndn ∼ N (0, 1).
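The mutation of Equation (14) can be sketched as follows. Drawing a single scale factor F per call is an assumption of this sketch, and the function name `mutate` is illustrative.

```python
import random


def mutate(P, oldP, rng):
    """Eq. (14): Mutant = P + F * (oldP - P), with F = 3 * rndn, rndn ~ N(0, 1)."""
    F = 3.0 * rng.gauss(0.0, 1.0)   # assumed: one scale factor for the whole population
    mutant = [[p + F * (o - p) for p, o in zip(p_row, o_row)]
              for p_row, o_row in zip(P, oldP)]
    return mutant, F


rng = random.Random(2)
P = [[1.0, 2.0], [3.0, 4.0]]
oldP = [[1.5, 2.5], [2.0, 5.0]]
mutant, F = mutate(P, oldP, rng)
print(F, mutant)
```

Because F is drawn from a zero-mean normal distribution, the step can point towards or away from the historical population, and individuals that coincide with their historical counterpart are left unchanged.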

Step 4. Crossover operation

BSA’s crossover process generates the final form of the trial population T. The initial value of the trial population is Mutant, as set in the mutation process. Trial individuals with better fitness values for the optimization problem are used to evolve the target population individuals. BSA’s crossover process has two steps. The first step calculates a binary integer-valued matrix of size N * D that indicates the individuals of T to be manipulated by using the relevant individuals of P.
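The text above describes only the first step of the crossover. As a hedged sketch of that step, the fragment below builds a binary map and restores the corresponding entries of P in the trial population. The per-element rule controlled by the hypothetical parameter `keep_rate` is a simplifying assumption of this sketch and is not the map-generation rule of the original BSA.

```python
import random


def crossover(P, mutant, keep_rate, rng):
    """Build the trial population T from Mutant, taking P's value where map == 1."""
    T = []
    for p_row, m_row in zip(P, mutant):
        # Assumed rule: each map entry is 1 with probability keep_rate.
        cmap = [1 if rng.random() < keep_rate else 0 for _ in p_row]
        T.append([p if flag == 1 else m
                  for p, m, flag in zip(p_row, m_row, cmap)])
    return T


rng = random.Random(5)
P = [[1.0, 2.0, 3.0]]
mutant = [[9.0, 8.0, 7.0]]
print(crossover(P, mutant, keep_rate=0.5, rng=rng))
```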

Step 5. Selection-II operation

In the Selection-II stage, the individuals T_i that have better fitness values than the corresponding P_i are used to update those P_i based on a greedy selection. If the best individual Pbest has a better fitness value than the global minimum value obtained so far by BSA, the global minimizer is updated to be Pbest, and the global minimum value is updated to be the fitness value of Pbest.
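The greedy update can be sketched as below for a minimization problem. The function name `selection_two` and the argument layout (populations with parallel lists of fitness values) are illustrative assumptions.

```python
def selection_two(P, fit_P, T, fit_T, best, best_fit):
    """Greedy replacement of P_i by T_i, then update of the global minimizer."""
    new_P, new_fit = [], []
    for p, fp, t, ft in zip(P, fit_P, T, fit_T):
        if ft < fp:                       # the trial individual is better
            new_P.append(t[:])
            new_fit.append(ft)
        else:                             # keep the current individual
            new_P.append(p[:])
            new_fit.append(fp)
    i_best = min(range(len(new_fit)), key=new_fit.__getitem__)
    if new_fit[i_best] < best_fit:        # update the global minimum if improved
        best, best_fit = new_P[i_best][:], new_fit[i_best]
    return new_P, new_fit, best, best_fit


result = selection_two([[1.0], [2.0]], [5.0, 1.0],
                       [[3.0], [4.0]], [2.0, 3.0],
                       None, float("inf"))
print(result)
```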

The BSA algorithm performs better than the PSO, CMAES, ABC, JDE, CLPSO and SADE algorithms on several widely used benchmark functions [14]. According to that study, BSA is stable and can obtain better solutions than the other algorithms; the experimental results also show that BSA gets closer to the optimal solution than the PSO and GA algorithms. Therefore, BSA is used here to solve the parameter determination problem, with the mean absolute percentage error (MAPE), which is commonly used in load forecasting performance evaluation, as the fitness function. $MAPE = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{A (i) - F (i)}{A (i)} \right| \times 100\%$ (15)

Here A (i) is the actual value, F (i) is the forecast value and n is the total number of forecast points.
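Equation (15) translates directly into code. The sketch below is a plain Python version; the function name `mape` and the two sample values are illustrative and are not data from the study.

```python
def mape(actual, forecast):
    """Eq. (15): mean absolute percentage error, returned in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Errors of 10% and 5% average to roughly 7.5%.
print(mape([100.0, 200.0], [110.0, 190.0]))
```

The measure is undefined when an actual value is zero, which is not a concern for annual load series but would need handling in other settings.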

2.3 Backtracking search algorithm for parameters selection of SVR model

A flowchart of the BSA algorithm for parameter selection of the SVR model (BSASVR) is shown in Fig. 1.

The details of the BSASVR model are as follows.

Step 1. Parameter initialization. The population size (N), the mutation factor (F) and the crossover rate (R) of the BSA algorithm are determined first. A chromosome consists of the three SVR parameters to be determined, C, ɛ and σ. Therefore, the length of a chromosome is three.

Step 2. Start of evolution. Set g = 0 and employ Equation (10) to generate the population randomly.

Step 3. SVR parameter optimization with BSA. Input the generated population into the SVR for load forecasting and, according to the load forecasting results, calculate and record the fitness function values. Then employ Equations (11) to (14) to generate offspring, input the offspring into the SVR for load forecasting and calculate the fitness values again. Set g = g + 1.

Step 4. Loop until the stop criterion is satisfied. If g equals the maximum number of generations, output the best chromosome, which gives the best parameters of the SVR model; otherwise, go back to Step 3.
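The four steps above can be sketched as a single loop. This is a simplified illustration under stated assumptions, not the implementation used in the study: the crossover rule (`keep_rate`), the boundary handling by resampling, the default settings and the bounds are all assumed, and `stand_in_fitness` is a placeholder for "train the SVR with (C, ɛ, σ), forecast, and return the MAPE".

```python
import random


def bsa_optimize(fitness, low, high, n_pop=20, max_gen=50, keep_rate=0.5, seed=0):
    """Minimise fitness(chromosome) within box bounds using a simplified BSA loop."""
    rng = random.Random(seed)
    dim = len(low)

    def sample(j):
        # Uniform draw inside the bounds of dimension j, as in Eqs. (10) and (11).
        return rng.random() * (high[j] - low[j]) + low[j]

    # Step 2: random population P and historical population oldP.
    P = [[sample(j) for j in range(dim)] for _ in range(n_pop)]
    oldP = [[sample(j) for j in range(dim)] for _ in range(n_pop)]
    fit = [fitness(x) for x in P]

    for g in range(max_gen):                      # Step 4: stop at max_gen
        # Selection-I, Eqs. (12)-(13).
        if rng.random() < rng.random():
            oldP = [row[:] for row in P]
        rng.shuffle(oldP)
        # Mutation scale, Eq. (14), with F = 3 * rndn.
        F = 3.0 * rng.gauss(0.0, 1.0)
        for i in range(n_pop):
            trial = []
            for j in range(dim):
                if rng.random() < keep_rate:      # assumed crossover rule
                    v = P[i][j]
                else:
                    v = P[i][j] + F * (oldP[i][j] - P[i][j])
                if not low[j] <= v <= high[j]:    # assumed boundary control
                    v = sample(j)
                trial.append(v)
            # Selection-II: greedy replacement (Step 3 fitness evaluation).
            f_trial = fitness(trial)
            if f_trial < fit[i]:
                P[i], fit[i] = trial, f_trial

    best = min(range(n_pop), key=fit.__getitem__)
    return P[best], fit[best]


def stand_in_fitness(chromosome):
    # Placeholder objective; a real run would return the MAPE of an SVR forecast.
    C, eps, sigma = chromosome
    return (C - 10.0) ** 2 + (eps - 0.1) ** 2 + (sigma - 1.0) ** 2


low, high = [0.1, 0.001, 0.01], [100.0, 1.0, 10.0]   # hypothetical bounds
best_params, best_fit = bsa_optimize(stand_in_fitness, low, high)
print(best_params, best_fit)
```

Because the replacement is greedy, the best fitness in the population can never get worse from one generation to the next, so the chromosome returned at the end is at least as good as the best initial one.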

3 A numeric example and results analysis

The annual electricity consumption of Beijing is considered in the numeric example. In total, 31 load data from 1978 to 2014, listed in Table 1 and Fig. 2, were collected from Beijing’s Statistical Yearbook. The proposed BSASVR model is implemented in MATLAB with the libsvm toolbox [17]. It is necessary to divide the data into a training set and a testing set. According to a series of experiments, when the previous six load values L_{n-6}, L_{n-5}, …, L_{n-1} are fed into the SVR model with default parameters, satisfactory performance is obtained. Therefore, the input variables of the BSASVR model are L_{n-6}, L_{n-5}, …, L_{n-1}, and the output variable is L_n.

In the training stage, a rolling process is used. First, the first six load values (from 1984 to 1989) of the load series are fed into the BSASVR model, giving the first forecast value, for 1990. Second, the actual value for 1990 is used in the next forecasting step; in other words, the six input values run from 1985 to 1990 and the forecast value for 1991 is obtained. The process is repeated until all of the required forecast values are obtained. The three parameters C, ɛ and σ are evolved generation by generation; when BSASVR reaches the stopping criterion, the three parameters are finally determined from the best chromosome in the final population.
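The rolling process can be sketched as a sliding window over the load series. The function names, the stand-in predictor (a simple mean of the window) and the synthetic series are assumptions used only to show the data flow; in the proposed method the predictor would be the trained SVR.

```python
def make_rolling_samples(series, window=6):
    """Return (inputs, target) pairs with inputs L_{n-6..n-1} and target L_n."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]


def rolling_forecast(series, predict, window=6):
    """One-step-ahead forecasts; actual values, not forecasts, feed the next window."""
    return [predict(x) for x, _ in make_rolling_samples(series, window)]


# Synthetic series standing in for 31 annual load values.
series = [100.0 + 5.0 * k for k in range(31)]
samples = make_rolling_samples(series)
forecasts = rolling_forecast(series, predict=lambda x: sum(x) / len(x))
print(len(samples), samples[0])
```

With a window of six, a series of 31 values yields 25 input-target pairs.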

As shown in Fig. 2, the load series has an increasing, approximately linear trend. Therefore, a linear regression forecasting model is employed for comparison. The SVR forecasting model with default parameters and a back propagation artificial neural network (BPNN) model are also employed for comparison. The SVR’s structure is the same as that of the BSASVR. Similarly, the BPNN contains three layers with six input nodes, eleven hidden nodes and one output node. Its input and output variables are also the same as those of the BSASVR model. The training accuracy goal is set to 10^–4 and the maximum number of generations is 1000; the BPNN is implemented with the MATLAB toolbox.

Figure 2 and Table 1 list the results of the BSASVR, SVR, BPNN and regression forecasting models. Because the BPNN forecasting results differ from run to run, the best performance in ten experiments is chosen. Table 1 also lists the errors of these models. According to Fig. 2, it can be clearly seen that BSASVR performs better than the other models. All four forecasting models capture the increasing trend of the actual data.

From the errors in Table 1, the deviation between the actual values and the forecasting results of the four models can be seen. For load forecasting, 3% is commonly used as a standard for the acceptable error range, and this range is used to compare the four methods as follows. First, the proposed model has 3 forecast points that exceed the range out of 25 result points in total: 3.9%, 4.29% and 4.28% in 1989, 1995 and 1999, respectively. The SVR model has 10 such points in total and reaches its maximum error of 14.33% in 2000. The BPNN has 9 points with errors above 3%. The regression errors are almost always higher than those of the three other forecasting models, and 13 of them exceed 3%. Compared with the BPNN model, the SVR model is more suitable for small-sample load forecasting. Furthermore, the proposed BSASVR model performs better than the SVR model. The results show that the parameters determined by the BSA algorithm can efficiently improve the forecasting accuracy of the SVR.

MAPE is used to measure the performance of the four forecasting models. The MAPE values are listed in Table 2. The proposed method has the best performance, with a MAPE of 0.92%. Second is the SVR model with default parameters, with a MAPE of 2.72%. The BPNN is third, with a MAPE of 3.04%. Finally, the regression model has the largest MAPE, 3.42%.

As shown in the two tables and the figure, the proposed BSASVR model outperforms the three other models in annual load forecasting. The comparison with the SVR model clearly shows that the parameters found by the BSA algorithm can effectively improve the forecasting accuracy.

4 Conclusions

In this paper, a hybrid forecasting model that uses the BSA algorithm to determine the SVR model parameters is proposed for annual power consumption forecasting of Beijing, China. The experimental results show the following. First, the intelligent load forecasting models perform better than the regression model in the experiment, because they have good non-linear fitting capacity. Second, the SVR forecasting model performs stably in small-sample forecasting, while the BPNN easily falls into over-fitting. Finally, the BSA algorithm can determine appropriate parameters for the SVR model and can effectively improve the forecasting accuracy. The proposed model captures the increasing curve pattern of the annual power consumption of Beijing more easily; this is a common electric load pattern in a developing country. Further research will consider other advanced optimization search algorithms for finding appropriate parameters of the SVR model, as well as how to reasonably select the input variables of the SVR model.

Acknowledgments

This research was supported by the Natural Science Foundation of China (71401054, 71403030), the Beijing Planning Project of Philosophy and Social Science (Grant no. 14JGC108) and the State Grid Corporation headquarters (Grant no. KJGW2015-020).

References

1. AlRashidi M.R. and El-Naggar K.M., Long term electric load forecasting based on particle swarm optimization, Applied Energy 10 (2010), 320–326.

2. Kermanshahi, Recurrent neural network for forecasting next 10 years loads of nine Japanese utilities, Neurocomputing 23 (1998), 125–133.

3. Hsu C.-C. and Chen C.-Y., Regional load forecasting in Taiwan–applications of artificial neural networks, Energy Conversion and Management 44 (2003), 1941–1949.

4. Kermanshahi and Iwamiya, Up to year load forecasting using neural nets, International Journal of Electrical Power & Energy Systems 24 (2002), 789–797.

5. Chen J.-F., Wang W.-M. and Huang C.-M., Analysis of an adaptive time-series autoregressive moving-average (ARMA) model for short-term load forecasting, Electric Power Systems Research 34 (1995), 187–196.

6. Pappas S.S., Ekonomou, Karamousantas D.C., Chatzarakis G.E., Katsikas S.K. and Liatsis, Electricity demand loads modeling using AutoRegressive Moving Average (ARMA) models, Energy 33 (2008), 1353–1360.

7. Wang, Tai, Zhai, Ye, Zhu and Qi, A new ARMAX model based on evolutionary algorithm and particle swarm optimization for short-term load forecasting, Electric Power Systems Research 78 (2008), 1679–1685.

8. Mamlook, Badran and Abdulhadi, A fuzzy inference model for short-term load forecasting, Energy Policy 37 (2008), 1239–1248.

9. Avci, Selecting of the optimal feature subset and kernel parameters in digital modulation classification by using hybrid genetic algorithm-support vector machines: HGASVM, Expert Systems with Applications 36 (2009), 1391–1402.

10. Hippert H.S., Pedreira C.E. and Souza R.C., Neural networks for short-term load forecasting: A review and evaluation, IEEE Transactions on Power Systems 16 (2001), 44–55.

11. Pai P.F. and Hong W.C., Forecasting regional electric load based on recurrent support vector machines with genetic algorithms, Electric Power Systems Research 74(3) (2005), 417–425.

12. Hong W.C., Application of chaotic ant swarm optimization in electric load forecasting, Energy Policy 38(10) (2010), 5830–5839.

13. Jianjun, Li, Dongxiao and Zhongfu, An annual load forecasting model based on support vector regression with differential evolution algorithm, Applied Energy 94 (2012), 65–70.

14. Civicioglu, Backtracking search optimization algorithm for numerical optimization problems, Applied Mathematics and Computation 219(15) (2013), 8121–8144.

15. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag Press, 1995.

16. Vapnik, Statistical Learning Theory, Wiley Press, 1998.

17. Chen B.-J., Chang M.-W. and Lin C.-J., Load forecasting using support vector machines: A study on EUNITE Competition, IEEE Transactions on Power Systems 19 (2004), 1821–1830.