Abstract
This article presents the Recurrent Neural Network (RNN) and its Attention mechanism for developing forecasting models for renewable energy applications. In this study, wind speed and solar irradiance forecasting models have been developed, as these two factors play a significant role in renewable energy production. The irregular nature of wind makes accurate wind speed prediction challenging, while solar irradiance forecasting can aid in the planning and deployment of solar power plants. In this paper, six RNN techniques, namely RNN, GRU, LSTM, Content-based Attention, Luong Attention, and Self-Attention based RNN, are considered for forecasting the future values of wind speed and solar irradiance at particular geographical locations. The aim is to compare the different recurrent neural network methods and to identify their advantages and importance for forecasting models. All models are developed on datasets from the National Renewable Energy Laboratory (NREL) and NASA’s Prediction of Worldwide Energy Resource (POWER).
Keywords
Introduction
Increasing environmental pollution and the depletion of fossil fuel reserves have led to the search for clean energy sources. Wind and solar power are inexhaustible, clean sources that are considered potential energy resources with rapid growth worldwide. The design, installation, and running costs of wind and solar power plants are very high. A wind farm may consist of several hundred individual wind turbines. For these farms to be profitable, consistently excellent performance is a necessity. Optimizing the placement criteria of turbines and the wind farm layout design can maximize the production of operational wind farms. At the design stage, the issues related to placement criteria of turbines and layout of the wind farm are addressed in Chowdhury et al. (2013) and Elkinton et al. (2008), respectively. In research on estimating wind turbine power output, Thapar et al. (2011), Wadhvani et al. (2018), and Dongre et al. (2019) observed that the power produced at wind power plants has a strong relationship with the wind speed at the reported site. Most of the time, the random nature of wind hinders the process of power estimation. Algorithms applied to predict future energy production cannot provide good accuracy, especially in areas with frequent altitude variations. Kushwah et al. (2020) stated that wind speed forecasting plays a significant role in producing cost-effective energy, as it is beneficial in the planning and operation of power generation. In the case of solar energy, a Photovoltaic (PV) system is necessary for converting solar energy to electricity. PV power introduces volatility into the grid, which then affects grid stability (Kumar et al., 2020). The power grid operator requires production forecasts to ensure a cost-effective and secure electricity supply system. Amrouche et al. (2013) stated that there exists a linear correlation between the power output of PV modules and solar irradiance.
Sharma and Kakkar (2017) stated that solar irradiance forecasting can serve several purposes in solar energy systems, such as maintenance of stability and regulation, monitoring, and management of scheduling and unit commitment.
Several previous studies have developed accurate time series models for wind speed (Mostafaeipour et al., 2019; Kushwah and Wadhvani, 2020) and solar irradiance (Amrouche and Pivert, 2014; Thapar, 2019) forecast applications. Blanchard and Samanta (2019) applied a nonlinear autoregressive neural network with exogenous inputs to wind speed forecasting and showed that these models perform better than the baseline models. Sheela and Deepa (2013) reviewed neural-network-based wind speed prediction models from the preceding 15 years and concluded that the neural network models performed better than the non-neural network models. Yagli et al. (2019) evaluated 68 different machine learning models on satellite-derived irradiance data from many locations. The study concluded that Multi-Layer Perceptron (MLP) models were among the best performing. Following this work, the Neural Network (NN) models utilized here focus on short-term forecasting based on satellite-derived data. Neural network models do not require data to be stationary and operate through nonlinear layers. They are also capable of evaluating complex data structures and of reconstructing a noisy system from data. This makes neural networks suitable for variable and complex time series forecasting.
Alzahrani et al. (2017) proposed a solar irradiance prediction approach using deep Recurrent Neural Networks (RNN) to increase the complexity of the model and enable extraction of high-level features. This method achieved better accuracy than conventional support vector machines and feedforward neural networks. The RNN (Rumelhart et al., 1986) architecture accounts for dependencies between data nodes by preserving sequential information in an inner state. However, the RNN is prone to exploding and vanishing gradients. Variants of the RNN were therefore developed, such as Long Short-Term Memory (LSTM) networks (Hochreiter et al., 1997) and Gated Recurrent Units (GRU), which replace the conventional perceptron architecture with gating mechanisms that regulate the flow of information across the network. These variants are widely used in forecasting applications.
In designing a cost-effective power system, the first significant issue noted in the literature is that reliable meteorological data should be readily available to create a techno-economically viable power system. Owing to difficulties in installation and maintenance and the high cost of measuring these data, they are either unavailable or only partially available at the installation site. Hence, there is demand for alternative ways to predict them. Existing methods of wind speed and solar irradiance forecasting have been observed to suffer from low forecast accuracy and limited data scalability. Further, it is essential to compare time series forecasting methods and find the technique that provides the best prediction of future values. This paper starts by providing an overview of the RNN models used for forecasting. The RNN variants for time series forecasting, namely RNN, LSTM, GRU, and Attention-based RNNs, have been used to model the data. Attention-based time series models utilize complex and nonlinear interdependencies between time steps as well as between time series to predict future data values. Applying the Attention mechanism to deep neural network models allows the network to focus adaptively on the input features that are most important to the current output and to mitigate interference from other features. An empirical investigation has been conducted on the data provided by NREL and NASA. Finally, the developed models’ performance has been measured using mean absolute error (MAE) and root mean squared error (RMSE).
Methodology for forecasting
Deep Neural Networks are a class of artificial neural networks that consist of many layers of perceptrons, organized into an input layer, hidden layers, and an output layer. Recurrent Neural Networks (Unnikrishnan et al., 1994) differ from other deep neural networks in their capacity to send information across time steps. Recurrent Neural Networks can be considered as an optimization over programs (Patterson and Gibson, 2017). This section presents an overview of Recurrent Neural Networks, their variants, and the Attention-based networks used for forecasting wind speed and solar irradiance.
Recurrent neural network
Recurrent neural networks model the time aspect of data by creating cycles in the network. The RNN is a special type of neural network that accounts for data dependencies between nodes. It preserves sequential information in an inner state, allowing the network to retain the knowledge accrued from earlier time steps (Goodfellow et al., 2016).
Equation (1) represents the working of an RNN cell with
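Since the symbols of equation (1) are not reproduced in this text, the following is only a minimal sketch of a commonly used recurrent cell, h_t = tanh(W_x x_t + W_h h_{t-1} + b). The names `rnn_step`, `rnn_forward`, `W_x`, `W_h`, and `b`, the tanh activation, and the toy weights are illustrative assumptions rather than the notation or settings of this study.

```python
import math

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One step of a simple recurrent cell (assumed form):
    h_t = tanh(W_x x_t + W_h h_prev + b)."""
    a = matvec(W_x, x_t)
    r = matvec(W_h, h_prev)
    return [math.tanh(ai + ri + bi) for ai, ri, bi in zip(a, r, b)]

def rnn_forward(xs, h0, W_x, W_h, b):
    """Run the cell over a sequence, carrying the inner state forward."""
    h = h0
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, W_x, W_h, b)
        states.append(h)
    return states

# Toy example: univariate input, hidden size 2 (hypothetical weights).
W_x = [[0.5], [-0.3]]
W_h = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]
states = rnn_forward([[1.0], [0.5], [0.2]], [0.0, 0.0], W_x, W_h, b)
```

Each entry of `states` depends on the current input and on the previous hidden state, which is how the cycle in the network lets information persist across time steps.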
Long short term memory network
LSTM (Gers et al., 2000) adds three computational components to the RNN cell: an input gate, an output gate, and a forget gate. The equation for the hidden vector is changed for LSTM through the use of a longer-term memory. The LSTM operations are designed to give fine-grained control over the data. The equations for the forward pass are represented in equation (2):
The present input and previous state are processed by
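As equation (2) is not reproduced in this text, the forward pass can be illustrated with a minimal sketch of the widely used LSTM formulation with a forget gate. The function `lstm_step`, the `params` layout, the gate names `f`, `i`, `o`, `g`, and the zero weights in the example are assumptions made for illustration and may differ from the notation of equation (2).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W v + b for a matrix W (a list of rows) and vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step (assumed standard form).
    params maps 'f', 'i', 'o', 'g' to a (W, b) pair acting on [x_t, h_prev]."""
    xh = x_t + h_prev  # list concatenation [x_t, h_prev]
    gate = {}
    for name in ("f", "i", "o", "g"):
        W, b = params[name]
        act = math.tanh if name == "g" else sigmoid
        gate[name] = [act(a) for a in affine(W, xh, b)]
    # Forget part of the old cell state, then add the gated candidate.
    c_t = [f * c + i * g
           for f, c, i, g in zip(gate["f"], c_prev, gate["i"], gate["g"])]
    # The output gate decides how much of the cell state is exposed.
    h_t = [o * math.tanh(c) for o, c in zip(gate["o"], c_t)]
    return h_t, c_t

# Toy example: input size 1, hidden size 2, all weights zero (hypothetical).
zero_W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
params = {name: (zero_W, [0.0, 0.0]) for name in ("f", "i", "o", "g")}
h_t, c_t = lstm_step([1.0], [0.0, 0.0], [1.0, -2.0], params)
```

With zero weights every sigmoid gate equals 0.5 and the candidate is 0, so the cell state is simply halved; this makes the separate roles of the forget, input, and output gates easy to check by hand.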
Gated recurrent unit network
The gated recurrent unit (Cho et al., 2014) is a simplification of the LSTM that does not use an explicit cell state. The state vectors are merged into a single vector. Equation (3) presents the equations of a GRU cell.
A single gate controller controls both the forget gate and the input gate. When the controller outputs 1, the input gate is open and the forget gate is closed; when it outputs 0, the reverse holds. In other words, before a memory is stored, the location in which it will be stored is erased.
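The coupling of the two gates can be illustrated with a minimal GRU sketch. It assumes the convention described in the text, in which a controller output z = 1 lets the new candidate in and discards the old state; other formulations swap the roles of z and 1 - z. The reset gate `r` follows the usual GRU formulation and, like the names `gru_step` and `params`, is an assumption not taken from equation (3).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W v + b for a matrix W (a list of rows) and vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def gru_step(x_t, h_prev, params):
    """One GRU step. params maps 'z', 'r', 'h' to (W, b) pairs.
    Assumed convention: z = 1 -> input gate open, forget gate closed."""
    xh = x_t + h_prev  # list concatenation [x_t, h_prev]
    Wz, bz = params["z"]
    Wr, br = params["r"]
    Wh, bh = params["h"]
    z = [sigmoid(a) for a in affine(Wz, xh, bz)]  # gate controller
    r = [sigmoid(a) for a in affine(Wr, xh, br)]  # reset gate
    gated = x_t + [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = [math.tanh(a) for a in affine(Wh, gated, bh)]
    # A single controller z plays both roles: (1 - z) forgets, z inputs.
    return [(1.0 - zi) * hp + zi * hc
            for zi, hp, hc in zip(z, h_prev, h_cand)]

# Toy example: input size 1, hidden size 2, zero weights (hypothetical).
zero_W = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
h_prev = [0.8, -0.4]

def make_params(z_bias):
    return {"z": (zero_W, [z_bias, z_bias]),
            "r": (zero_W, [0.0, 0.0]),
            "h": (zero_W, [0.0, 0.0])}

h_open = gru_step([1.0], h_prev, make_params(20.0))   # z close to 1
h_keep = gru_step([1.0], h_prev, make_params(-20.0))  # z close to 0
```

With the controller near 1 the old state is overwritten by the candidate (here 0), and with the controller near 0 the old state is kept almost unchanged.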
Attention-based networks for time series forecasting
In a sequence-to-sequence model, the last encoder hidden state is forwarded to the decoder as a vector representation, providing a numerical summary of the input sequence. As such, for a long input, the decoder receives only one vector representation, which results in forgetting. This led to the introduction of Attention, which acts as an interface providing the decoder with information from each encoder hidden state. This allows the model to focus selectively on useful portions of the input sequence (Brahma and Wadhvani, 2020a).
In general sequence-to-sequence models, only the first decoder hidden state is used to predict the outputs. In an Attention mechanism, however, a score is calculated between each encoder hidden state and the decoder hidden state. The scores are passed through a softmax layer to obtain the Attention distribution. Each softmax score is then multiplied with its encoder hidden state to obtain an alignment vector. The alignment vectors are summed to obtain the context vector, which is fed to the decoder to produce the output (Shih et al., 2019).
In equation (4), x is input,
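The steps described above (score, softmax, alignment vectors, context vector) can be sketched as follows. A dot-product score is assumed here purely for illustration; the function names and toy vectors are hypothetical and are not taken from equation (4).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_context(encoder_states, decoder_state):
    """Return (attention weights, context vector)."""
    # 1. Score each encoder hidden state against the decoder hidden state.
    scores = [dot(h, decoder_state) for h in encoder_states]
    # 2. Softmax turns the scores into an Attention distribution.
    weights = softmax(scores)
    # 3. Alignment vectors: each encoder state scaled by its weight.
    aligned = [[w * x for x in h] for w, h in zip(weights, encoder_states)]
    # 4. The context vector is the sum of the alignment vectors.
    context = [sum(col) for col in zip(*aligned)]
    return weights, context

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attention_context(encoder_states, decoder_state)
```

In this toy example the first and third encoder states receive equal, larger weights because they align with the decoder state, while the second state is down-weighted.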
Bahdanau and Luong attention
Bahdanau Attention (Bahdanau et al., 2014) was one of the earliest forms of Attention and introduced the concept. The alignment weights, which directly determine the score, are adjusted while training the model. The encoder and decoder consist of Gated Recurrent Units, while the scoring function is additive.
Luong Attention (Luong et al., 2015) was introduced for simplification and generalization of the Bahdanau Attention. Here, the encoder and decoder are both stacked LSTMs. The score function can be additive, dot product, location-based or general.
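To make the difference between the scoring functions concrete, the sketch below implements the dot-product, general, and additive scores in their commonly cited forms. The weight matrices `W`, `W_dec`, `W_enc` and the vector `v` would be learned in practice; the identity matrices used here are placeholders, and the location-based score is omitted. All names are illustrative assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [dot(row, v) for row in M]

def score_dot(h_dec, h_enc):
    """Dot-product score: h_dec . h_enc"""
    return dot(h_dec, h_enc)

def score_general(h_dec, h_enc, W):
    """General score: h_dec . (W h_enc)"""
    return dot(h_dec, matvec(W, h_enc))

def score_additive(h_dec, h_enc, W_dec, W_enc, v):
    """Additive score: v . tanh(W_dec h_dec + W_enc h_enc)"""
    pre = [a + b for a, b in zip(matvec(W_dec, h_dec), matvec(W_enc, h_enc))]
    return dot(v, [math.tanh(p) for p in pre])

# Placeholder parameters (hypothetical; learned during training in practice).
identity = [[1.0, 0.0], [0.0, 1.0]]
h_dec, h_enc = [1.0, 2.0], [3.0, -1.0]

s_dot = score_dot(h_dec, h_enc)
s_general = score_general(h_dec, h_enc, identity)
s_additive = score_additive(h_dec, h_enc, identity, identity, [1.0, 1.0])
```

With an identity weight matrix the general score reduces to the dot-product score, which shows that the general form is a learned generalization of the dot product.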
Content-based attention
The Attention vectors in cosine scoring, or content-based Attention (Graves et al., 2014), are based on the similarity between a key and the rows of memory. The cosine similarity is computed, as shown in equation (7), and then normalized by the softmax function.
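A minimal sketch of this computation is given below: the key is compared with every memory row by cosine similarity, and the similarities are normalized with a softmax. The sharpening factor `beta` and the function names are assumptions added for illustration and are not taken from equation (7).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is a zero vector)."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def content_attention(key, memory, beta=1.0):
    """Softmax-normalized cosine similarity between key and each memory row.
    beta is an assumed sharpening factor; beta = 1.0 leaves scores unscaled."""
    sims = [beta * cosine_similarity(key, row) for row in memory]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

memory = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights = content_attention([2.0, 0.0], memory)
```

Because cosine similarity ignores vector length, the key `[2.0, 0.0]` matches the first memory row exactly, is orthogonal to the second, and is opposite to the third, so the weights decrease in that order.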
Self-attention
Self-Attention (Cheng et al., 2016) shares its fundamental concepts with Attention, taking in n inputs and returning n outputs. The self-attention mechanism allows the inputs to interact with each other and find out which of them to pay more Attention to. The outputs are aggregates of these interactions and the Attention scores. It consists of a query, key, and value and learns the correlation between the current and past values (Vaswani et al., 2017). The equation below shows the Attention score function with Q, K, and V representing Query, Key, and Value respectively,
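As a hedged illustration, the sketch below assumes the scaled dot-product form of Vaswani et al. (2017), softmax(Q K^T / sqrt(d_k)) V; the score function used in this study may differ. In practice Q, K, and V are learned linear projections of the input, whereas here the input is used directly for all three, which is a simplifying assumption.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(Q, K, V):
    """Scaled dot-product attention (assumed form): n queries -> n outputs."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Interaction of this query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output: Attention-weighted aggregate of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V))
                        for j in range(len(V[0]))])
    return outputs

# Toy input with identity projections assumed (Q = K = V = X).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = self_attention(X, X, X)
```

The three inputs yield three outputs, each a weighted average of all value vectors, so every output mixes information from the whole sequence.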
Proposed methodology
The model architectures followed here fall into two types: the memory-based RNN, consisting of vanilla RNN, LSTM, and GRU, and the Attention-based RNN, consisting of Luong, content-based, and self-attention. The input is a univariate dataset separated by appropriate window lags. Figure 1 represents the model architecture followed for future prediction of wind speed and solar irradiance data. In the figure, RNN represents the models, since it is the RNN variants that are being evaluated in this work. The layers denote the hidden layers that constitute a deep neural network. Finally, the output is the future value predicted for a particular time horizon.
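Separating a univariate series by window lags can be sketched as follows. The function name `make_windows` and the parameters `lag` and `horizon` are hypothetical, and the values used are illustrative rather than the settings of this study.

```python
def make_windows(series, lag, horizon=1):
    """Split a univariate series into (inputs, targets).
    Each input holds `lag` consecutive values; its target is the value
    `horizon` steps after the last value of that window."""
    inputs, targets = [], []
    for start in range(len(series) - lag - horizon + 1):
        end = start + lag
        inputs.append(series[start:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X, y = make_windows(series, lag=3, horizon=2)
# X == [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]], y == [5.0, 6.0]
```

A larger `horizon` leaves fewer usable windows, because the target must still lie inside the observed series.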

Figure 1. Memory-based RNN architecture for wind speed forecasting.
Figure 2 represents the Attention-based architecture for forecasting. The difference between the two architectures is the presence of the Attention layer. The hidden layer before the Attention layer consists of RNN nodes. However, the layer after it need not be an RNN layer; a fully connected dense layer can also be used as the last hidden layer. As such, this layer is represented as DNN in the figure shown below.

Figure 2. Attention-based RNN architecture for wind speed forecasting.
The wind speed forecast modeling in the univariate scenario takes the wind speed data at t-1+△ time, say
Empirical investigation with real data
This section presents and discusses the experimental results achieved on the different wind speed and solar irradiance datasets. First, the details of the datasets used for the experiments are discussed. Then, the performance evaluation metrics for testing the developed models are described. Finally, the results of the experiments are analyzed.
Dataset
The datasets used in the experiments are taken from the NREL (National Renewable Energy Laboratory [NREL], 2012) and NASA’s POWER (Prediction of Worldwide Energy Resources [POWER], 2019) data repositories. Wind speed and solar irradiance forecasting are considered here as two applications of renewable energy. The work focuses on univariate forecasting for different time horizons. The proposed methodology is the recurrent neural network forecasting technique, with comparisons between memory-based and Attention-based methods. The wind speed data is taken from wind farms at latitude and longitude (46.95513535, −107.0999908) and (48.80713654, −106.3518372), while the solar irradiance data was taken from sites with coordinates (22.71961, 75.85771) and (23.25991, 77.41261). The characteristics of the wind speed and solar irradiance data for the sites on which the experiments are performed are presented in Tables 1 and 2.
Table 1. Wind speed data characteristics.
Table 2. Solar irradiance data characteristics.
The wind speed data consists of averaged 5-minute wind speed values, while the solar irradiance data consists of daily values. For our experiments, the observations are divided into training, validation, and testing sets.
Performance evaluation metrics
The developed model is required to characterize the underlying patterns of the actual data. Model evaluation is required to test the model’s capacity to generalize. Hence, performance evaluation metrics are essential. To compare the models on their test performance, the metrics used are Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). Table 3 states the equations of the performance metrics used for the evaluation of forecast performance, where
Table 3. Performance evaluation metrics.
MAE measures the average absolute (vertical) distance between the predicted and actual values, while RMSE is the square root of the mean squared error and therefore penalizes large errors more heavily. Lower MAE and RMSE values denote better performance.
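The two metrics can be sketched with their standard definitions, MAE = (1/n) Σ |y_i − ŷ_i| and RMSE = sqrt((1/n) Σ (y_i − ŷ_i)²). The sample values below are made up for illustration and are not results from this study.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))

# Illustrative values only (errors: 1, 0, -1, -4).
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 13.0]

mae_value = mae(actual, predicted)    # 1.5
rmse_value = rmse(actual, predicted)  # sqrt(4.5), about 2.12
```

The single large error of 4 raises RMSE well above MAE, which is why the two metrics are usually reported together.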
Results analysis
This section presents the results and analysis of the experiments performed on the wind and solar datasets. Tables 4 and 5 show the wind forecasting results for the first and second datasets, respectively, for different numbers of hidden layers and horizon lengths of 1 and 6 hours. Similarly, Tables 6 and 7 present the results for the solar forecasting datasets with 1-day-ahead and 4-days-ahead forecasting. Tables 8 and 9 show the results for the persistence, DNN, and Support Vector Regression (SVR) forecast models in comparison to the RNN-based models. The best performing model in each case is shown in bold in Tables 4–9.
Table 4. Comparison of prediction errors for different forecast models on first wind speed dataset for different hidden layers and horizon length.
Table 5. Comparison of prediction errors for different forecast models on second wind speed dataset for different hidden layers and horizon length.
Table 6. Comparison of prediction errors for different forecast models on first solar irradiance dataset for different hidden layers and horizon length.
Table 7. Comparison of prediction errors for different forecast models on second solar irradiance dataset for different hidden layers and horizon length.
Table 8. Performance comparison for forecast models on wind speed datasets.
Table 9. Performance comparison for forecast models on solar irradiance datasets.
Table 4 shows the forecast performance for the first wind speed dataset when predicting values 1 hour ahead and 6 hours ahead. The table also shows how the models perform as the number of hidden layers is modified. For 1-hour-ahead forecasting, the GRU model with three hidden layers performs the best. The other models also achieve good performance metrics, but the GRU outperforms them in this case. For 6-hours-ahead forecasting, the self-attention based models show the best results. For the second wind speed dataset, the Attention-based models perform the best at both horizons. Also, increasing the number of hidden layers does not contribute to better performance for one-step-ahead forecasting. Two hidden layers perform better than three in all the one-hour-ahead forecasting cases. For 1-hour-ahead forecasting, the cosine-based Attention gives the best performance metric values, while for 6-hours-ahead forecasting, the Luong Attention performs the best. The RNN models are thus capable of performing well for wind speed forecasting.
In solar irradiance forecasting, the GRU and LSTM perform better for one-step-ahead forecasting, while the Attention-based models show superior performance in multi-step-ahead forecasting. The solar irradiance data has strong seasonality and is complex in nature. Nevertheless, the RNN models are able to model the data from both locations with good prediction performance. Daily data was utilized to forecast solar irradiance values 1 day ahead and 4 days ahead. As in the case of wind speed forecasting, performance is better for the shorter horizon. However, hourly data was present in the case of the wind time series. Despite the difference in time intervals and applications, the models perform in a generalized manner in both cases.
The performance of the models for one-step-ahead forecasting is better than for multi-step-ahead forecasting. This is because recurrent neural networks are sequential models with the capacity to understand temporal dependencies. All the RNN models can perform exceptionally well for the shorter time horizon. The different variants were developed to overcome the challenges of vanishing and exploding gradients, so that the models could remember over a longer period of time. This also implies that models performing better at longer horizons capture long-term dependencies more efficiently. The results for both wind speed and solar irradiance forecasting indicate the superior performance of the Attention-based models. The performance achieved by the Attention mechanism is better in all the cases of time series forecasting for the longer time horizon.
Modifying the number of hidden layers changes the performance metrics. In some cases, a lower number of hidden layers performs well, while in other cases, increasing the number of layers improves the forecast performance. Regularization in the form of dropout is also applied to the models. This helps the models avoid overfitting and generalize according to the data.
It can also be observed that the models perform more accurately on the wind speed datasets than on the solar irradiance datasets. This is because the wind speed datasets are available at 5-minute intervals, while the solar irradiance datasets are updated once a day. This difference in temporal resolution causes the difference in performance, indicating that data availability at shorter intervals of time can improve forecast performance.
Furthermore, Tables 8 and 9 present the forecast results of the recurrent neural network models in comparison with existing time series forecasting models, namely the persistence model, the Support Vector Regression (SVR) model, and Deep Neural Network (DNN) models. The forecast performance is shown for one-step-ahead forecasting on both datasets. As can be observed, the GRU, the Attention-based mechanisms, and the LSTM perform the best. The comparison models do not perform as accurately as the RNN-based models.
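For reference, a persistence baseline simply predicts that the value at the forecast horizon equals the most recently observed value. The sketch below assumes this common definition; the function names and the sample series are illustrative and are not taken from the experiments.

```python
def persistence_forecast(series, horizon=1):
    """Pair each observation with the value `horizon` steps later.
    The persistence forecast for series[t + horizon] is series[t]."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    predictions = series[:-horizon]
    actuals = series[horizon:]
    return predictions, actuals

def mae(actual, predicted):
    """Mean Absolute Error between two equally long lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Illustrative series only.
series = [1.0, 2.0, 4.0, 7.0]
predictions, actuals = persistence_forecast(series, horizon=1)
baseline_mae = mae(actuals, predictions)  # (1 + 2 + 3) / 3 = 2.0
```

A learned model is only useful for a given horizon if its error is lower than that of this baseline.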
Overall, the RNN models, both memory-based and Attention-based, were used for modeling the wind and solar datasets with nonlinear, complex, and seasonal characteristics. The models could understand the temporal dependencies in the datasets and use them for forecast applications. The Attention-based mechanisms can capture the long-term dependencies across multiple time steps and prove to be an enhancement to the traditional memory-based RNN, since they outperform the traditional RNN models in many cases. The RNN models and Attention mechanisms can therefore be utilized for practical wind speed and solar irradiance forecasting in a generalized and cost-effective manner.
Conclusion
Forecasting of wind speed and solar irradiance is required for energy assessment and for the selection of sites for wind farms and solar parks. Popular recurrent neural network methods, namely RNN, GRU, LSTM, Content-based Attention, Luong Attention, and Self-Attention based RNN, are applied to model the forecast applications. The models were able to detect the underlying patterns of the datasets and forecast future values. The Attention models also captured the nonlinear and temporal dependencies of the wind speed and solar irradiance datasets. They were able to outperform the traditional RNN models in many cases. Also, the GRU model proved to be an effective forecast model for solar irradiance forecasts. Hence, the recurrent neural network model variants show good forecast performance in capturing the long-term temporal dependencies, and the Attention mechanism proves to be an enhancement to the memory-based RNNs. This review can be utilized for the planning of power stations and research developments.
Footnotes
Acknowledgements
The data used in the research were obtained from the NASA Langley Research Center (LaRC) POWER Project funded through the NASA Earth Science/Applied Science Program.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The data used in the research were obtained from the NASA Langley Research Center (LaRC) POWER Project funded through the NASA Earth Science/Applied Science Program.
