A deep learning approach to electric energy consumption modeling

Abstract

Automated Metering Infrastructure (AMI) is an integral part of a smart grid. Using the data that the AMI collects from consumers to generate accurate electricity consumption forecasts can help the utility significantly improve the quality of service delivered to the consumer. This study presents the design and empirical validation of machine learning based electric energy consumption forecasting systems. Models based on Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU) and Extreme Learning Machines (ELM) are designed and evaluated. A major aspect of the work is that the proposed forecasting systems are designed as generalized models, i.e. a single model generates forecasts for any of the consumers considered, as opposed to the conventional technique of building a separate model for each consumer. The forecasting systems generate half-hour-ahead and two-hour-ahead electric energy consumption forecasts. The proposed systems are validated on data for 485 Small and Medium Enterprise (SME) consumers in the CER electric energy consumption dataset. Results indicate that the proposed models achieve good forecast accuracy and are hence well suited for electric energy consumption forecasting.

Keywords

CNN, electric energy consumption forecast, ELM, GRU, LSTM, machine learning

1 Introduction

Automated Metering Infrastructure (AMI) [1] is an important part of the smart-grid infrastructure. Utilizing the electric energy consumption data collected by AMI nodes to generate accurate consumption forecasts can help the electricity supply providers (also referred to as utilities) improve their service quality significantly [2]. An attempt has been made in this study to develop an accurate and reliable electric energy consumption forecasting system for AMI. Previous work [3, 4] by the authors attempted to develop electric energy consumption forecasting models for each consumer individually. Though the results were encouraging, the extremely nonlinear nature of the AMI data resulted in limited forecasting accuracy. This study presents an empirical validation of the applicability of current state-of-the-art machine learning/deep learning techniques in an attempt to improve the accuracy and eliminate the requirement for individual user-specific models for electric energy consumption forecasting. Results obtained are validated on five different performance measures to establish the effectiveness of the proposed methodology. The rest of the paper is organized as follows. Section 2 presents the system description. Section 3 describes the training and testing methodologies adopted. The results and analysis are presented in Section 4. Section 5 presents the conclusions.

2 System description

Energy consumption forecasting is a challenging prediction problem because the data exhibit temporal sequence dependencies. Extreme Learning Machines (ELM) and deep learning based algorithms are well suited to such cases, as they can exploit these temporal dependencies effectively, as observed in [5].

The forecasting techniques proposed in this study are Machine Learning (ML) and deep learning based models, trained and validated on users in the SME category of the CER electric energy consumption dataset obtained from the Irish Social Science Data Archive (ISSDA) [6]. The ISSDA SME category dataset comprises half-hourly time series data from 12 am on July 14, 2009 to 11:59 pm on December 29, 2010 for 485 SME users (a total of 25,632 data points per user). For modeling purposes, no preprocessing other than ‘0-1’ normalization is carried out. The normalized data are then used as input to the machine learning algorithms to train the forecasting models.
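The ‘0-1’ normalization step can be sketched as follows. This is only a minimal illustration: it assumes min-max scaling applied per user, which the text does not state explicitly, and the function name `min_max_normalize` and the toy readings are hypothetical.

```python
def min_max_normalize(series):
    """Scale one user's consumption series to the range [0, 1].

    Assumption: scaling is applied per user; a constant series maps to zeros.
    """
    lo, hi = min(series), max(series)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in series]
    return [(v - lo) / span for v in series]


# Toy half-hourly readings (illustrative values, not taken from the CER data).
readings = [2.0, 4.0, 3.0, 6.0]
normalized = min_max_normalize(readings)
print(normalized)  # [0.0, 0.5, 0.25, 1.0]
```

If the scaling constants are computed on the training portion only, the same `lo` and `hi` would need to be reused when scaling the test portion.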

Fourteen ML and deep learning forecasting models based on Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN) and ELM are designed and evaluated in the present study. A detailed block diagram of the proposed methodology is depicted in Fig. 1. Previous work [3, 4] in the area involved designing and validating a separate forecasting system for each individual user in the dataset. This approach was found to be capable of good forecast accuracy; however, scalability was an issue since each user’s consumption had to be modeled independently, resulting in one model per user. In the present study, the consumption patterns exhibited by the entire set of 485 consumers in the ISSDA SME dataset are modeled using a single amalgamated model. Consumption data from all 485 consumers are fed in parallel to the forecasting system, which then generates forecasts for all 485 users simultaneously, thus removing the need for ‘n’ individual models per technique for ‘n’ users (in this study, n = 485). The simulations are carried out on an NVIDIA GeForce GTX 750 Ti GPU, using the Keras API with the TensorFlow backend. Table 5 gives the system specification of the models developed. A short description of the machine learning algorithms considered and the parameters selected is given below.

Fig. 1

Block diagram of proposed system.

2.1 LSTM based forecasting system

LSTM is a special type of RNN that can learn long-term dependencies in the data. Its basic structure is described in [7]. LSTMs have been used effectively in several time series forecasting applications, e.g. in [8]. In the current study, 2-layer and 3-layer LSTM structures are used for modeling, with 64 neurons in each hidden layer. The number of neurons in the hidden layers is selected by trial and error.

The simple LSTM architecture [9] with forget gate [10] is depicted in Fig. 2.

Fig. 2

LSTM architecture [10].

The forget gate determines which components of the previous cell state are discarded, and is computed as follows [10]:

$f_{t} = \sigma (W_{f} X_{t} + U_{f} h_{t - 1} + b_{f})$ (1)

The state update of the cell is determined by the input gate and the ‘tanh’ layer, and is calculated as

$i_{t} = \sigma (W_{i} X_{t} + U_{i} h_{t - 1} + b_{i})$ (2)

$C_{t}^{*} = \tanh (W_{c} X_{t} + U_{c} h_{t - 1} + b_{c})$ (3)

$C_{t} = f_{t} \times C_{t - 1} + i_{t} \times C_{t}^{*}$ (4)

The output passed from the cell to the next cell is calculated by the output gate as

$o_{t} = \sigma (W_{o} X_{t} + U_{o} h_{t - 1} + b_{o})$ (5)

$h_{t} = o_{t} \times \tanh (C_{t})$ (6)

Here, W_f, W_i, W_c and W_o are the weight matrices of the forget gate, input gate, memory cell and output gate, respectively; U_f, U_i, U_c and U_o are the corresponding recurrent connection weights; and b_f, b_i, b_c and b_o are the corresponding bias vectors. X_t is the input at time t and h_{t-1} is the hidden state from the previous time step.
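Equations (1)-(6) can be traced step by step with a short NumPy sketch. It is an illustration rather than the implementation used in the study (which relied on Keras): the parameter dictionary layout, the random initial weights and the assumption that each time step carries one reading for each of the 485 users are all hypothetical choices made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following Eqs. (1)-(6).

    p maps names such as "W_f", "U_f", "b_f" to arrays (hypothetical layout).
    """
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])     # Eq. (1)
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])     # Eq. (2)
    c_star = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])  # Eq. (3)
    c_t = f_t * c_prev + i_t * c_star                                # Eq. (4)
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])     # Eq. (5)
    h_t = o_t * np.tanh(c_t)                                         # Eq. (6)
    return h_t, c_t


def init_params(n_in, n_hidden, gates, rng):
    """Random weights and zero biases for each gate label in `gates`."""
    p = {}
    for g in gates:
        p[f"W_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p[f"U_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p[f"b_{g}"] = np.zeros(n_hidden)
    return p


rng = np.random.default_rng(0)
n_users, n_hidden = 485, 64            # sizes quoted in the text
params = init_params(n_users, n_hidden, "fico", rng)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.random((4, n_users)):   # four past samples, all users in parallel
    h, c = lstm_step(x_t, h, c, params)
print(h.shape)  # (64,)
```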

2.2 GRU based forecasting system

GRU [10] is an RNN variant that involves fewer computations than LSTM. Several experimental evaluations of the performance of LSTM and GRU on different applications are available [9, 10]. In the current study, 2-layer and 3-layer GRU structures are used for modeling, with 64 neurons in each hidden layer. The neuron count is chosen by trial and error.

The architecture of the GRU is shown in Fig. 3.

Fig. 3

GRU architecture [10].

The update gate, reset gate, candidate state and hidden state are computed as follows [10]:

$z_{t} = \sigma (W_{z} X_{t} + U_{z} h_{t - 1} + b_{z})$ (7)

$r_{t} = \sigma (W_{r} X_{t} + U_{r} h_{t - 1} + b_{r})$ (8)

$h_{t}^{*} = \tanh (W_{h} X_{t} + U_{h} (r_{t} \times h_{t - 1}) + b_{h})$ (9)

$h_{t} = (1 - z_{t}) \times h_{t - 1} + z_{t} \times h_{t}^{*}$ (10)

Here, W_z, W_r and W_h are the weight matrices of the update gate, reset gate and hidden state, respectively. In the same way, U_z, U_r and U_h are the recurrent connection weights of the update gate, reset gate and hidden state, respectively.
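The GRU update can be sketched in the same way. In this sketch the reset gate $r_t$ scales $h_{t-1}$ inside the candidate state, which is the usual GRU formulation; the parameter dictionary layout, the random weights and the 485-value input vector are assumptions made for illustration only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x_t, h_prev, p):
    """One GRU time step following Eqs. (7)-(10)."""
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])   # Eq. (7), update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])   # Eq. (8), reset gate
    # Eq. (9): candidate state, with the reset gate applied to h_prev
    h_star = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev) + p["b_h"])
    return (1.0 - z_t) * h_prev + z_t * h_star                     # Eq. (10)


rng = np.random.default_rng(0)
n_in, n_hidden = 485, 64               # sizes quoted in the text
params = {}
for g in "zrh":                        # update gate, reset gate, hidden state
    params[f"W_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_in))
    params[f"U_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    params[f"b_{g}"] = np.zeros(n_hidden)

h = np.zeros(n_hidden)
for x_t in rng.random((4, n_in)):      # four past samples, all users in parallel
    h = gru_step(x_t, h, params)
print(h.shape)  # (64,)
```

Compared with the LSTM step, the GRU keeps a single state vector and three weight groups instead of four, which is where the reduction in computation comes from.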

2.3 CNN based forecasting system

CNN is a feedforward neural network that has been widely used in computer vision applications and requires minimal preprocessing compared with other ML algorithms. The recent evolution of data mining algorithms has made the CNN an effective contender for time series forecasting; for example, CNN has been used in financial forecasting [8]. In the present study, 2-layer and 3-layer CNNs are used to develop the forecast models, with 64 neurons in each hidden layer. The neuron count is chosen by trial and error.

2.4 ELM based forecasting system

ELM is a variant of the single-hidden-layer feedforward neural network [11] with a wide range of data mining applications. The performance of ELM is often comparable to that of many state-of-the-art deep learning models. The results presented in [3] make ELM a competitive candidate for time series forecasting. In the present study, 485 neurons are used in each of the input and output layers and 970 neurons are employed in the hidden layer.
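The ELM training procedure of [11] (random, untrained hidden-layer weights followed by a least-squares solution for the output weights) can be sketched as below. The sketch uses toy sizes instead of the 485-970-485 network of the study, and the function names `elm_fit` and `elm_predict`, the normal weight initialization and the toy targets are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def elm_fit(X, T, n_hidden, rng):
    """Train an ELM: random fixed hidden layer, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights, never trained
    b = rng.normal(size=n_hidden)                 # random hidden biases, never trained
    H = sigmoid(X @ W + b)                        # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                  # Moore-Penrose least-squares solution
    return W, b, beta


def elm_predict(X, model):
    W, b, beta = model
    return sigmoid(X @ W + b) @ beta


rng = np.random.default_rng(0)
n_users = 5                          # toy size; the study uses 485 inputs and outputs
X = rng.random((200, n_users))       # normalized inputs in [0, 1]
T = 0.5 * X + 0.1                    # toy targets, one output per user
model = elm_fit(X, T, n_hidden=2 * n_users, rng=rng)   # hidden size = 2 x input size
pred = elm_predict(X, model)
mse_fit = float(np.mean((pred - T) ** 2))
mse_zero = float(np.mean(T ** 2))    # error of an all-zero prediction, for reference
print(pred.shape, mse_fit < mse_zero)
```

Because only the output weights are solved for, training reduces to a single pseudo-inverse, which is why Table 5 lists no optimizer, batch size or iteration count for ELM.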

3 Training and testing approaches

The ratio of training data to testing data for the ML models under consideration is maintained at 92:8. The training and testing sample sizes are 17,520 and 1,440, respectively. The CER smart metering data contain 48 samples per day.
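The sample counts above can be related to calendar time and to the stated ratio with a short calculation. This is a plain arithmetic check using only the numbers quoted in the text.

```python
SAMPLES_PER_DAY = 48                 # half-hourly readings
train_samples, test_samples = 17_520, 1_440

train_days = train_samples / SAMPLES_PER_DAY
test_days = test_samples / SAMPLES_PER_DAY
train_share = 100.0 * train_samples / (train_samples + test_samples)

print(train_days, test_days)         # 365.0 30.0
print(round(train_share, 1))         # 92.4
```

The training portion therefore corresponds to 365 days of readings and the testing portion to 30 days, which gives the approximately 92:8 split.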

The training and testing are done based on two different approaches as depicted by the equations given below.

$X = \{y_{t}, y_{t - 1}, y_{t - 2}, \dots, y_{t - n}\}$ (11)

The predictor variable in the current study holds the past four samples of the time series, with two different types of target variable, a 1-step-ahead forecast and a 4-step-ahead forecast, as given in Equations (12) and (13).

$Y = \begin{pmatrix} Y_{0} \\ \vdots \\ Y_{k} \end{pmatrix} = \begin{pmatrix} y_{t} & \dots & y_{t - 3} \\ \vdots & \ddots & \vdots \\ y_{t - n + 4} & \dots & y_{t - n} \end{pmatrix}$ (12)

$T_{N} = \begin{pmatrix} t_{0} \\ t_{1} \\ \vdots \\ t_{k} \end{pmatrix} = \begin{pmatrix} y_{t + N} \\ y_{t - 1 + N} \\ \vdots \\ y_{t - n + N} \end{pmatrix}$ (13)

Here, N corresponds to the number of steps ahead. The forecasting models, generically represented here as mdl(·), are a function of Y and T_N and are trained on the training dataset using the machine learning techniques to minimize the error between the actual N-step-ahead consumption value and the model output. The forecasts can then be represented as:

$\hat{t}_{m + N} = \mathrm{mdl}(Y_{m}) \quad \forall m > k$ (14)
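The construction of the predictor matrix and the N-step-ahead targets can be sketched for a single series as follows. The function name `make_windows` is hypothetical, and the toy series is chosen so that each value equals its time index. Each window lists the newest sample first, as in the rows of Eq. (12), while the windows themselves are emitted in chronological order.

```python
def make_windows(series, n_lags=4, steps_ahead=1):
    """Pair each window of n_lags past samples with the value steps_ahead later."""
    inputs, targets = [], []
    for t in range(n_lags - 1, len(series) - steps_ahead):
        window = [series[t - j] for j in range(n_lags)]   # y_t, y_{t-1}, ..., y_{t-3}
        inputs.append(window)
        targets.append(series[t + steps_ahead])           # y_{t+N}
    return inputs, targets


series = list(range(10))     # toy series 0..9, so each value equals its time index
Y1, T1 = make_windows(series, steps_ahead=1)   # half-hour-ahead targets
Y4, T4 = make_windows(series, steps_ahead=4)   # two-hour-ahead targets
print(Y1[0], T1[0])          # [3, 2, 1, 0] 4
print(Y4[0], T4[0])          # [3, 2, 1, 0] 7
print(len(Y1), len(Y4))      # 6 3
```

The longer horizon yields fewer training pairs from the same series because the last `steps_ahead` samples can only serve as targets.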

The performance of each approach on the different models developed was evaluated using Median Absolute Percentage Error (MdAPE) and Mean Absolute Percentage Error (MAPE) (Table 1), Root Mean Squared Error (RMSE) (Table 3) and Mean Absolute Error (MAE) (Table 4). In addition to these error metrics, the Directional Accuracy (DA) measure (Table 2) is also considered. DA is a useful measure in AMI forecasting since it indicates how well the model forecasts the direction of change in electric energy consumption (increase or decrease).
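The five measures can be sketched in plain Python as below. MAPE, MdAPE, RMSE and MAE follow their standard definitions. The exact DA formula is not given in the text, so the version here (share of steps where the predicted and actual changes from the previous actual value have the same sign) is an assumption, as are the function names and the toy values.

```python
import math
import statistics


def abs_pct_errors(actual, predicted):
    """Absolute percentage errors; infinite where the actual value is zero."""
    return [abs(a - p) / abs(a) * 100.0 if a != 0 else math.inf
            for a, p in zip(actual, predicted)]


def mape(actual, predicted):
    errors = abs_pct_errors(actual, predicted)
    return sum(errors) / len(errors)


def mdape(actual, predicted):
    return statistics.median(abs_pct_errors(actual, predicted))


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def directional_accuracy(actual, predicted):
    """Percentage of steps whose predicted direction of change matches the actual one."""
    hits = 0
    for k in range(1, len(actual)):
        actual_change = actual[k] - actual[k - 1]
        predicted_change = predicted[k] - actual[k - 1]
        if actual_change * predicted_change > 0:
            hits += 1
    return 100.0 * hits / (len(actual) - 1)


actual = [100.0, 110.0, 105.0, 120.0]       # toy values, not from the CER data
predicted = [90.0, 115.0, 100.0, 102.0]
print(mae(actual, predicted))               # 9.5
print(round(mape(actual, predicted), 2), round(mdape(actual, predicted), 2))
print(round(rmse(actual, predicted), 2), round(directional_accuracy(actual, predicted), 2))
```

With this definition of the percentage error, a single zero reading in the actual series makes MAPE infinite, while MdAPE stays finite as long as fewer than half of the readings are zero.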

4 Results and analysis

A total of 14 different models were developed using the modeling techniques described in Section 2. The models depicted in Fig. 1 are numbered as follows: the odd-numbered models (1, 3, 5, 7, 9, 11 and 13) are 1-step-ahead forecast models and the even-numbered models (2, 4, 6, 8, 10, 12 and 14) are 4-step-ahead forecast models. The training and testing data for all the models are obtained from the CER dataset, with 92% of the data employed for training and the remaining 8% used for testing. Hence, the training and testing sample sizes are 17,520 and 1,440, respectively. All the models are evaluated on five performance metrics, namely MAE, RMSE, MAPE, MdAPE and DA. The results are tabulated as follows: performance in terms of MAPE and MdAPE is presented in Table 1, performance in terms of DA in Table 2, and performance in terms of RMSE and MAE in Tables 3 and 4, respectively.

For the purpose of comparison, the results reported in [3] (validated on the same CER dataset) are also presented alongside those obtained using the models proposed in this study. In Table 1, six error bands are considered for both MAPE and MdAPE: 0–20%, 20.1–40%, 40.1–60%, 60.1–80%, 80.1–100% and >100%, labeled 20, 40, 60, 80, 100 and 100+. The value in each band indicates the number of SME users for which the forecasting error falls within that range.

From Table 1, it can be observed that the ELM and CNN based approaches perform better than the other modeling approaches considered in the present study. The individual modeling approach reported in [3] shows that ELM based models are good candidates for forecasting energy consumption, since 90.92% of the users fall within the cumulative ‘0–40%’ MdAPE band for 4-step-ahead prediction, whereas the proposed systems place only 36.88%, 37.95% and 35.18% of the users within the same band for the 2-layer CNN, 3-layer CNN and ELM, respectively. Even though the best techniques in the present study place fewer users in this error band than the previous work (59.34%, 58.24% and 61.54% fewer, respectively), the proposed methodology has the advantage of requiring only a single model capable of generating independent forecasts for each of the 485 users considered.

Table 1

User count on error metrics of SME dataset

Technique	Layers	S	MdAPE %						MAPE %
Technique	Layers	S	20	40	60	80	100	100+	20	40	60	80	100	100+
CNN	2	1	63	137	61	45	19	144	19	81	54	34	24	176
		4	56	117	73	35	19	169	16	76	45	39	22	190
	3	1	58	133	73	42	19	144	14	82	51	39	30	172
		4	48	130	65	40	25	161	14	74	48	32	28	192
GRU	2	1	49	133	58	43	34	152	15	83	43	37	19	191
		4	45	109	73	45	32	165	11	72	36	38	30	201
	3	1	54	125	60	39	35	156	11	84	41	32	24	196
		4	48	116	70	47	21	167	10	75	44	36	25	198
LSTM	2	1	50	122	70	45	27	155	14	80	35	45	22	192
		4	45	112	77	42	24	169	10	70	46	31	31	200
	3	1	46	123	72	41	27	160	14	76	40	43	17	198
		4	45	120	68	39	27	170	9	69	46	35	30	199
ELM	1	1	80	116	55	46	18	154	33	76	41	31	27	180
ELM	1	4	45	120	68	39	27	170	19	66	46	32	22	203
ANN – 1[3]	1	4	253	127	58	23	9	15	101	130	66	39	28	121
ANN – 2[3]	1	4	191	139	64	28	17	46	74	129	63	48	37	134
ELM – 1[3]	1	4	237	145	56	23	6	18	105	125	64	41	36	114
ELM – 2[3]	1	4	304	137	28	9	3	4	121	151	78	41	21	73
ERT – 1[3]	–	4	0	33	418	30	2	2	0	7	278	79	26	95
ERT – 2[3]	–	4	243	148	57	21	8	8	82	132	76	54	22	119
Linear – 1[3]	–	4	0	96	337	47	4	1	0	38	259	110	26	52
Linear – 2[3]	–	4	162	129	73	30	18	73	70	131	61	42	33	148

1[3] with Hodrick–Prescott filter preprocessing, 2[3] without Hodrick–Prescott filter preprocessing, S – Steps Ahead.
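The cumulative ‘0–40%’ MdAPE shares quoted in the text can be recomputed from the 4-step-ahead rows of Table 1 with the short calculation below. The band counts are copied from the table; the helper name `share_within_40` is hypothetical. The shares are taken relative to each row's own total, which is 469 for the proposed models (presumably 485 minus the 16 users with undefined percentage errors) and 485 for the models of [3]. This is an observation from the table rather than a statement made in the text.

```python
# MdAPE band counts (20, 40, 60, 80, 100, 100+) for 4-step-ahead rows of Table 1.
rows = {
    "CNN, 2 layers": [56, 117, 73, 35, 19, 169],
    "CNN, 3 layers": [48, 130, 65, 40, 25, 161],
    "ELM": [45, 120, 68, 39, 27, 170],
    "ELM - 2 [3]": [304, 137, 28, 9, 3, 4],
}


def share_within_40(counts):
    """Percentage of counted users whose MdAPE falls in the 0-40% bands."""
    return 100.0 * (counts[0] + counts[1]) / sum(counts)


shares = {name: share_within_40(counts) for name, counts in rows.items()}
for name, counts in rows.items():
    print(f"{name}: {sum(counts)} users, {shares[name]:.2f}% within 0-40%")
```

The computed shares agree with the quoted 36.88%, 37.95%, 35.18% and 90.92% to within 0.01 percentage points, the difference being consistent with truncation rather than rounding.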

The best and worst DA forecasting results (1-step and 4-step ahead, i.e. half-hour and two-hour ahead) for SME consumers are given in Table 2 as representative of all 485 SME consumers.

Table 2

Best and worst forecast DA for SME users

Technique	Layers	1 – Step Ahead		4 – Step Ahead
Technique	Layers	Best	Worst	Best	Worst
CNN	2	69.23	44.69	70.65	43.41
CNN	3	67.81	44.43	66.50	43.02
GRU	2	67.68	50.13	67.49	48.77
GRU	3	68.39	49.48	66.14	47.84
LSTM	2	68.13	49.61	65.27	47.60
LSTM	3	67.42	49.29	64.88	48.02
ELM	1	71.57	52.07	74.25	51.18

Table 3

RMSE Best and Worst values of the different techniques considered

Technique	Layers	1 – Step Ahead		4 – Step Ahead
Technique	Layers	Best	Worst	Best	Worst
CNN	2	0.0049	0.2902	0.0052	0.2913
CNN	3	0.0054	0.3237	0.0036	0.3055
GRU	2	0.0236	0.3052	0.0245	0.3102
GRU	3	0.0221	0.3053	0.0247	0.3030
LSTM	2	0.0247	0.3040	0.0255	0.3080
LSTM	3	0.0244	0.3125	0.0242	0.3060
ELM	1	0.0254	0.1702	0.0173	0.2163

Table 4

MAE Best and Worst values of the different techniques considered

Technique	Layers	1 – Step Ahead		4 – Step Ahead
Technique	Layers	Best	Worst	Best	Worst
CNN	2	0.0035	0.2805	0.0041	0.2698
CNN	3	0.0042	0.3116	0.0028	0.2740
GRU	2	0.0093	0.2344	0.0106	0.2596
GRU	3	0.0099	0.2270	0.0096	0.2499
LSTM	2	0.0105	0.2388	0.0104	0.2449
LSTM	3	0.0102	0.2384	0.0112	0.2576
ELM	1	0.0112	0.1206	0.0123	0.1541

Table 5

System specification of the models

Technique	Layers	Neurons	Activation	Optimizer	Loss metrics	Batch Size	Max iterations
CNN	2, 3	64	Hidden: relu, Input and output: linear	ADAM^*	MSE (10^-6)	32	1000
LSTM	2, 3	64	linear	ADAM^*	MSE (10^-6)	32	1000
GRU	2, 3	64	linear	ADAM^*	MSE (10^-6)	32	1000
ELM	1	2 x input size	Sigmoid	–	MSE (10^-6)	–	–

*The ADAM optimizer’s parameters are (η = 1e-3, β₁ = 0.9, β₂ = 0.999, ɛ = 1e-08, decay = 0, clipnorm = NULL, clipvalue = NULL).
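To show what the listed ADAM parameters control, the sketch below applies the standard Adam update rule to a single scalar parameter. It is not the Keras implementation used in the study; the function name `adam_step` and the toy quadratic loss are assumptions made for illustration, with η, β₁, β₂ and ɛ set to the values listed above and no learning-rate decay or gradient clipping.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at iteration t (t starts at 1)."""
    m = beta1 * m + (1.0 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                 # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Minimize the toy loss (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):     # 1000 iterations, the maximum listed in Table 5
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t)
print(round(theta, 3))
```

Because the update is normalized by the running gradient magnitude, each step moves the parameter by roughly η at most, so with η = 1e-3 the number of iterations bounds how far a weight can travel during training.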

It is apparent from Table 2 that the ELM based forecasting system outperforms all other forecast models considered in terms of predicting the trend of the time series: its best DA exceeds that of its CNN counterparts by at least 3.38% and 5.1% (in relative terms) for 1-step-ahead and 4-step-ahead prediction, respectively. A clear picture of directionally accurate forecasting can be observed in Fig. 4 and Fig. 4a, which also show that the proposed methodology can predict even the directional changes with a good degree of accuracy.
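The 3.38% and 5.1% figures appear to be relative improvements of the best ELM DA over the best CNN DA in Table 2, as the following check suggests. The values are copied from Table 2; the helper name `relative_gain` is hypothetical.

```python
best_da = {  # best DA values (%) copied from Table 2
    "1-step": {"ELM": 71.57, "CNN-2": 69.23, "CNN-3": 67.81},
    "4-step": {"ELM": 74.25, "CNN-2": 70.65, "CNN-3": 66.50},
}


def relative_gain(new, reference):
    """Improvement of `new` over `reference`, as a percentage of `reference`."""
    return 100.0 * (new - reference) / reference


gains = {}
for horizon, scores in best_da.items():
    best_cnn = max(scores["CNN-2"], scores["CNN-3"])
    gains[horizon] = relative_gain(scores["ELM"], best_cnn)
    print(horizon, round(gains[horizon], 2))
# 1-step 3.38
# 4-step 5.1
```

In absolute terms the differences are 2.34 and 3.60 percentage points, respectively.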

Fig. 4

A 2-day window of the time series of actual vs. predicted values for the techniques considered, for SME user number 317 [6]; (a) expanded view of the results for user number 317 [6].

The MAPE values of 16 of the 485 users in the SME dataset were found to be infinite because the actual values in the test data are approximately zero (presumably indicating no consumption or unavailability of readings from the meter). From the results obtained, the models developed in the current study avoid the need for ‘n’ individual models for ‘n’ users, although they place fewer users in the low-error bands than the individual models of the previous work [3]. The models based on the ELM and CNN architectures outperform all other models developed for the dataset considered, and a single-step-ahead prediction horizon gives the lowest prediction error together with good DA.

5 Conclusions

Accurate and reliable forecast models are essential for forecasting the electric energy consumption as this forecasting plays a vital role in grid stability and profitable resource management. Designing and managing individual models for forecasting in this kind of multi-user environment is always a cumbersome process. The proposed methodology efficiently addresses this issue. Based on the analysis of results obtained from the predictive models developed for the time series forecasting the conclusions are summarized as follows:

The proposed technique of designing a single unified model that can generate consumption forecasts for all the users under consideration simultaneously is a viable alternative to the traditional technique, which requires a separate model for each user.

ELM and CNN based models can be adopted as reliable candidates for electric energy consumption forecasting, since they give consistent results for both longer and shorter prediction horizons.

References

1. Hart D.G., Using AMI to realize the Smart Grid, in: 2008 IEEE Power Energy Soc. Gen. Meet. – Convers. Deliv. Electr. Energy 21st Century, IEEE, 2008: pp. 1–2. 10.1109/PES.2008.4596961.

2. Mohan, Soman K.P., Sachin Kumar, A data-driven strategy for short-term electric load forecasting using dynamic mode decomposition model, Appl. Energy 232 (2018) 229–244. 10.1016/J.APENERGY.2018.09.190.

3. Jayanth Balaji, Harish Ram D.S. and Nair B.B., Machine learning approaches to electricity consumption forecasting in automated metering infrastructure (AMI) systems: An empirical study, in: Adv. Intell. Syst. Comput., Springer, Cham, 2017: pp. 254–263. 10.1007/978-3-319-57264-2_26.

4. Jayanth Balaji, Harish Ram D.S. and Nair B.B., Modeling of consumption data for forecasting in automated metering infrastructure (AMI) systems, in: Adv. Intell. Syst. Comput., Springer, Cham, 2016: pp. 165–173. 10.1007/978-3-319-33389-2_16.

5. Fekri, Time Series Prediction Via Recurrent Neural Networks with the Information Bottleneck Principle, in: 2018 IEEE 19th Int. Work. Signal Process. Adv. Wirel. Commun., IEEE, 2018: pp. 1–5. 10.1109/SPAWC.2018.8445943.

6. Commission for Energy Regulation (CER), CER Smart Metering Project – Electricity Customer Behaviour Trial, 2009–2010, 2012.

7. Gers F.A., Learning to forget: continual prediction with LSTM, in: 9th Int. Conf. Artif. Neural Networks ICANN ’99, IEE, 1999: pp. 850–855. 10.1049/cp:19991218.

8. H.M., G.E.A., Menon V.K. and S.K.P., NSE Stock Market Prediction Using Deep-Learning Models, Procedia Comput. Sci. 132 (2018) 1351–1362. 10.1016/J.PROCS.2018.05.050.

9. Chung, Gulcehre, Cho, Bengio, Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, ArXiv Prepr. ArXiv1412.3555 (2014) 1–9. 10.1109/IJCNN.2015.7280624.

10. Jozefowicz, Zaremba, Sutskever, An empirical exploration of recurrent network architectures, in: Int. Conf. Mach. Learn., 2015: pp. 2342–2350. 10.1109/CVPR.2015.7298761.

11. Huang, Zhu and Siew, Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks, in: Int. Jt. Conf. Neural Networks, IEEE, Budapest, Hungary, 2004: pp. 985–990.