Abstract
This work focuses on predicting renewable energy generation (solar and wind) using machine learning (ML) algorithms. Accurate generation forecasts are important for sizing storage in microgrids. The algorithms compared are linear regression (LR), random forest (RF), and the ARIMA time series model. The performance of each algorithm is evaluated using the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and mean absolute percentage error (MAPE). The MAE of the ARIMA model (0.06 for solar and 0.20 for wind) is much lower than that of RF (15.65 and 61.73) and LR (15.78 and 54.65). The same holds for MSE and RMSE: for ARIMA they are 0.01 and 0.08 for solar energy and 0.07 and 0.27 for wind energy, respectively. From a comparative analysis of these metrics for each algorithm on both datasets, we conclude that the ARIMA model is the best fit for forecasting solar and wind energy.
Keywords
Introduction
Electricity is generated from a variety of sources, including hydropower, nuclear power, and renewable energy. Coal, oil, and natural gas used to produce power account for one third of global greenhouse gas emissions. It is vital to raise citizens’ quality of life by supplying clean and efficient electricity. India’s energy demand is increasing as the country pursues its economic development goals, and a growing supply of energy is a basic requirement for a nation’s economic success. The National Electricity Plan of the Ministry of Power sets out a 10-year plan of action for supplying power across the nation and for ensuring that power reaches residents efficiently and at an affordable price. In this way, the government aims to accelerate and broaden the transition to renewable energy (RE) technology in order to attain sustainable development. Renewable energy is considered capable of meeting power demand while also reducing pollution (Ahmad et al., 2020; Alkandari and Ahmad, 2020; Ikram, 2021).
To conserve energy, residents are encouraged to use wind and solar power, as these sources are inexhaustible. Sustainable energy is also less costly and less hazardous. India set a target of 175 GW of renewable energy capacity by 2022: 100 GW from solar, 60 GW from wind, 10 GW from biomass, and 5 GW from hydro. Renewable energy comes from natural processes that are renewed on a regular basis. The majority of renewable energy sources are environmentally beneficial and help to reduce carbon emissions, which in turn helps to combat global warming.
Renewable energy is seen as the most promising replacement for fossil fuels since it is clean, green, and regenerated over a large geographic region; yet it also introduces unplanned uncertainty, endangering energy reliability and stability, particularly in large-scale renewable energy integration (Qiao, 2019). In recent decades, the electricity market has turned to renewable energy sources to reduce greenhouse gas emissions from power production. Solar and wind are often used in combination because of their diversity and reliability as RE sources. Several approaches for predicting RE have been developed over time, most of which emphasize the efficiency of the estimation technique with little or no regard for environmental conditions. Renewable energy can efficiently cut fossil energy use, minimize pollution, and promote the healthy growth of the social economy (Fathollahzadeh et al., 2021; Jia et al., 2021).
Because power production from solar and wind is variable, operators must regulate and maintain the power system carefully. The electricity production of photovoltaic and wind systems must be forecasted in order to schedule power transmission over the long and short term (Somu et al., 2021). Energy prediction using hybrid renewable energy systems has also been proposed. Figure 1 shows the classification of generation (power) forecasts by time horizon (Bakhtiari et al., 2021; Lee et al., 2019; Rahman et al., 2021; Somu et al., 2021).

Horizon and timeframe of prediction.
LR: Linear Regression
RF: Random Forest
ARIMA: Autoregressive integrated moving average
AR: Autoregressive etc. …
Machine learning (ML) approaches are now widely used in a variety of renewable energy applications, including energy development and integration, utilization, and forecasting (Hosein et al., 2020). The remainder of this study is organized as follows: the proposed plan is presented in Part 2, the study design and methodology are described in Part 3, and Parts 4 and 5 close the paper with the results and conclusion.
Proposed plan
In this study, for the prediction of Solar and Wind Energy, we are using three machine learning algorithms.
ARIMA
Linear Regression
Random Forest
After applying these algorithms, we evaluate the performance of each algorithm using the following metrics:
Mean Absolute Error
Mean Squared Error
Root Mean Squared Error
Mean Absolute Percentage Error
Figure 2 shows the workflow for this study.

Flowchart for proposed model.
ARIMA
ARIMA is used for time series analysis, in which observations are collected at specified time intervals. From such a series, we can predict the next values on the basis of past data. Generally, a time series has only two variables: time and the feature that we want to forecast. ARIMA combines two models, (a) AR (autoregressive) and (b) MA (moving average), and applies them to a series that has been differenced (the "integrated" part).
ARIMA has three hyperparameters that need to be tuned in order to achieve good performance.
(i) p (Autoregressive lags),
(ii) d (Order of differencing),
(iii) q (Moving Average lags)
Before applying any time series model, the stationarity of the data is evaluated. Stationarity means that, over different periods of time, the data should have:
(i) Constant mean, (ii) Constant variance and standard deviation, (iii) Autocovariance that does not depend on time.
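As a hedged illustration of how the three hyperparameters fit together (the notation below is the standard textbook form, not taken from the original study), an ARIMA(p, d, q) model applies an ARMA(p, q) equation to the series after it has been differenced d times:

$$ y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t , \qquad y'_t = (1 - B)^d \, y_t $$

Here $y_t$ is the observed series, $B$ is the backshift operator ($B y_t = y_{t-1}$), $\phi_i$ are the autoregressive coefficients, $\theta_j$ are the moving average coefficients, $c$ is a constant, and $\varepsilon_t$ is the error term. These symbols are assumptions introduced for this sketch.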
Linear regression (LR)
Linear regression is used to predict a value from one or more input features. The model is first trained on known input and output features. Linear regression is of two types.
Simple linear regression
In this case, only one input feature is taken into consideration with one output variable.
$$ y = m x + C $$
Where,
x is the input feature and y is the output variable,
m is the parameter (slope) that needs to be optimized in order to find the best-fit line,
C is the bias that needs to be optimized in order to find the best-fit line.
Multiple linear regression (MLR)
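It has multiple input variables with one output variable. The equation for multiple regression is:
$$ y = a x_1 + b x_2 + c x_3 + \dots + C $$
Where,
$x_1, x_2, x_3, \dots$ are the input variables and y is the output variable,
a, b, c, … are the parameters that need to be optimized in order to find the best-fit line,
C is the bias that needs to be optimized in order to find the best-fit line.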
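For the simple (one-feature) case, the optimization of the slope m and bias C of a line y = m·x + C has a closed-form least-squares solution. The following is a minimal sketch under that assumption; the function name `fit_simple_linear_regression` and the toy data are hypothetical and are not taken from the study, which lists Sklearn among its tools.

```python
from statistics import fmean


def fit_simple_linear_regression(x, y):
    """Return (m, C) that minimize the sum of squared residuals of y = m*x + C."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = sxy / sxx          # slope
    C = y_bar - m * x_bar  # bias (intercept)
    return m, C


# Toy data lying exactly on y = 2x + 1 (illustrative values only).
m, C = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
```

On this toy data the fit recovers the slope and bias of the generating line; on real data the residuals would be non-zero and the line would be the best fit in the least-squares sense.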
Random forest
Random forest builds multiple decision trees on samples of the training data and combines their outputs. A key feature of this algorithm is its ability to handle data sets with continuous variables for regression and categorical variables for classification. That said, the algorithm is reported to give better results for classification problems than for regression (Liaw and Wiener, 2002).
To understand how random forest works, one needs the concept of the ensemble technique. Ensembles use two types of methods: bagging, in which models are trained independently on bootstrap samples and their outputs are combined, and boosting, in which models are trained sequentially so that each corrects the errors of the previous ones. Random forest is based on bagging.
Mean absolute error
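MAE measures the average absolute difference between predicted values and actual values.
Absolute error equation:
$$ |e_i| = |y_i - \hat{y}_i| $$
Mean absolute error equation:
$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| $$
Where:
∑ is the summation notation, n is the total number of observations, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value.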
Mean squared error
MSE is the mean of the squared errors between predicted values and actual values. The lower the MSE, the better the performance of the machine learning model.
Root mean squared error
RMSE is the square root of the MSE. The lower the RMSE, the better the performance of the machine learning model.
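For reference, the standard definitions of these two metrics can be written as follows, with $y_i$ the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of observations (the symbols are assumptions introduced for this sketch):

$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 , \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}} $$

Because the errors are squared before averaging, both metrics penalize large errors more heavily than MAE does, and RMSE is expressed in the same units as the data (here, MU).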
Mean absolute percentage error
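MAPE also measures the performance of machine learning algorithms, but it expresses the error relative to the actual values, as a percentage, rather than in the units of the data.
Equation for MAPE is:
$$ \mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| $$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and n is the number of observations; the result is often multiplied by 100 to express it as a percentage.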
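The four metrics can be sketched in a few lines of plain Python. This is an illustrative implementation of the standard definitions, not the code used in the study; the function names and the toy values are hypothetical, and `mape` returns a fraction rather than a percentage.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (multiply by 100 for %).

    Undefined when any actual value is zero.
    """
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Toy values, for illustration only (not data from the study).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
scores = {
    "MAE": mae(actual, predicted),
    "MSE": mse(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
}
```

Note that MAE, MSE, and RMSE depend on the scale of the series they are computed on, so values are only comparable between models when they are computed on the same scale.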
Study design and methodology
(i) Data collection: For this study, we collected daily solar and wind energy generation data in MU (million units) from the National Load Dispatch Centre. A total of 1912 observations were collected for both solar and wind energy generation, from 2017-01-01 to 2022-03-27.
(ii) Tools used: The Python programming language is used for the machine learning implementation because of its simplicity, compactness, and readability.
Analysis descriptive summary of data
As Table 1 shows, the total number of observed values for both datasets is 1912. The maximum generation is 266 MU for solar energy and 541 MU for wind energy. The standard deviation is 55.19 for solar energy and 100.66 for wind energy; that is, the wind data vary much more, so it may be more difficult for machine learning algorithms to perform well on the wind data.
Descriptive summary of solar and wind energy generation.
Figures 3 and 4 show the rolling mean and rolling standard deviation of solar and wind energy generation. For solar energy, both the rolling mean and the rolling standard deviation change over time; they are not constant, so the solar energy series is non-stationary. The wind generation data, however, look roughly stationary.

Rolling mean and standard deviation of solar energy generation (MU).

Rolling mean and standard deviation of wind energy generation (MU).
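The rolling statistics used for this visual stationarity check can be sketched as follows. This is a minimal illustration in plain Python; the function name `rolling_stats`, the window size, and the toy series are assumptions, and the population standard deviation is used here, whereas a library such as Pandas defaults to the sample standard deviation.

```python
from statistics import fmean, pstdev


def rolling_stats(values, window):
    """Return a list of (mean, std) pairs, one for each full window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    stats = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        stats.append((fmean(chunk), pstdev(chunk)))
    return stats


# Toy series with a steady upward trend (illustrative values only).
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
stats = rolling_stats(series, window=3)
rolling_means = [mean for mean, _ in stats]
rolling_stds = [std for _, std in stats]
```

For this toy series the rolling mean keeps increasing while the rolling standard deviation stays constant, so the series is non-stationary in its mean, which is the kind of pattern the plots are inspected for.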
Data transformation to achieve stationarity
To make the data stationary, we first take the log of the data and then remove the trend. For consistency, we also take logarithmic values of the wind data, even though it already appears stationary. To remove the trend, we take the difference between the log values and the moving average of the log values, for both solar and wind energy generation.
Log scale(L) = stationary part(L1) + trend (LT)
moving avg of log scale(A) = stationary part (A1) + trend (AT)
result series (R) = L − A = (L1 + LT) − (A1 + AT) = (L1 − A1) + (LT − AT)
Now, both the log scale (L) and the moving average of the log scale (A) come from the same series, so their trends will be almost the same, that is, LT − AT ≈ 0, and the trend component is almost removed:
R ≈ L1 − A1
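The transformation R = L − A can be sketched in plain Python as follows. This is an illustration under stated assumptions: the function name `log_minus_moving_average`, the trailing window of 3 points, and the toy series are hypothetical, and the window length used in the study is not stated in the text.

```python
import math


def log_minus_moving_average(values, window):
    """Return R = L - A, where L is the log of the series and A is the
    trailing moving average of L over `window` points.

    All values must be strictly positive for the logarithm to exist.
    """
    logs = [math.log(v) for v in values]
    result = []
    for end in range(window, len(logs) + 1):
        moving_avg = sum(logs[end - window:end]) / window
        result.append(logs[end - 1] - moving_avg)
    return result


# Toy series with an exponential trend, so its log is a straight line.
series = [math.exp(0.1 * t) for t in range(10)]
r = log_minus_moving_average(series, window=3)
```

For this toy series every element of `r` is approximately 0.1, so the upward trend has been removed and only a constant remains. Real generation data containing zero values would need separate handling before taking the log.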
The following graphs (Figures 5 and 6) show each series after removal of the trend.

Rolling mean and standard deviation of solar energy generation (MU).

Rolling mean and standard deviation of wind energy generation (MU).
The black line in the above graphs represents the result series (R), which is L1 − A1. The series appear stationary: their mean and standard deviation are almost the same over any particular interval.
Building time series model
In this study, we use the time series models AR, MA, and ARIMA to forecast solar and wind energy generation 1 year ahead. To select the better model for forecasting energy generation, we compare the RSS (Residual Sum of Squares) values of these three models.
The Residual Sum of Squares measures the variation in the data that the fitted model does not explain; a lower value indicates a closer fit.
$$ \mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
Where:
$y_i$ is the observed value, $\hat{y}_i$ is the value fitted by the model, and n is the number of observations.
From Tables 2 and 3, the RSS value of ARIMA is lower than that of both AR and MA; therefore, ARIMA is expected to forecast both solar and wind energy generation better than AR and MA.
RSS Value of time series model for solar energy.
RSS Value of time series model for wind energy.
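The model selection step described above, computing the RSS of each candidate and keeping the lowest, can be sketched as follows. The fitted values below are invented purely for illustration and are not the values behind Tables 2 and 3; the names `rss` and `fitted_by_model` are hypothetical.

```python
def rss(actual, fitted):
    """Residual sum of squares between observed and fitted values."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted))


# Hypothetical observed series and fitted values from three candidate models.
actual = [1.0, 2.0, 3.0, 4.0]
fitted_by_model = {
    "AR": [1.5, 2.5, 2.5, 3.5],
    "MA": [0.0, 2.0, 3.0, 5.0],
    "ARIMA": [1.1, 1.9, 3.1, 3.9],
}

# Score every model and keep the one with the smallest RSS.
scores = {name: rss(actual, fitted) for name, fitted in fitted_by_model.items()}
best_model = min(scores, key=scores.get)
```

In this toy setup the model whose fitted values track the observations most closely has the smallest RSS and is selected. In practice the fitted values would come from a time series library rather than being typed in by hand.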
Linear regression model
Having prepared the ARIMA model, we now examine how linear regression performs on this problem. We selected “Solar energy generation (MU)” as the input and the average of “Solar energy generation (MU) after 365 days” as the output. Since there are only two variables in this problem, this is a simple linear regression problem. The equation for simple linear regression is:
$$ y = m x + c $$
m is the slope of the line that needs to be optimized
c is the intercept of the line that needs to be optimized
Solar energy
After fitting the linear regression model to our data, the obtained values of m and c for solar energy are 0.88 and 46.87 respectively. Therefore, the final equation of linear regression for solar energy generation is:
$$ y = 0.88x + 46.87 $$
Thus, from the above equation, as the value of x increases, the value of y also increases.
Wind energy
Similarly, for wind energy generation, the obtained m and c values are 0.704 and 53.49 respectively. Therefore, the equation of linear regression for wind energy generation is:
$$ y = 0.704x + 53.49 $$
Here also, as the value of x increases, the value of y increases.
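As a small worked example, the slope and intercept values reported above (0.88 and 46.87 for solar, 0.704 and 53.49 for wind) can be plugged into a line of the form m·x + c. The input of 100 MU is a hypothetical value chosen only to show the arithmetic, and the function name `predict` is an assumption.

```python
# Coefficients reported in the text for the fitted regression lines.
SOLAR_M, SOLAR_C = 0.88, 46.87
WIND_M, WIND_C = 0.704, 53.49


def predict(x, m, c):
    """Evaluate the fitted line m*x + c for a given input x (in MU)."""
    return m * x + c


# Hypothetical input of 100 MU for each source.
solar_prediction = predict(100.0, SOLAR_M, SOLAR_C)  # 0.88 * 100 + 46.87
wind_prediction = predict(100.0, WIND_M, WIND_C)     # 0.704 * 100 + 53.49
```

With these coefficients, an input of 100 MU gives about 134.87 MU for solar and about 123.89 MU for wind.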
Random forest model
Next, we are using the Random Forest machine learning model.
Random forest is a popular and powerful machine learning algorithm that combines multiple decision trees. For this study, we use a combination of 500 decision trees to build the random forest model. One such tree, for solar energy, is shown in Figure 7.

Decision tree of Random Forest for first estimator of solar energy.
In the same way, the decision tree of the first estimator of the random forest for wind energy can also be presented.
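The idea of combining many trees can be illustrated with a deliberately simplified sketch: bagging of one-split regression trees ("stumps") on a single feature, with predictions averaged across trees. This is not the random forest implementation used in the study (which lists Sklearn among its tools); a full random forest grows deeper trees and also samples features at each split. All function names and the toy data are hypothetical.

```python
import random
from statistics import fmean


def fit_stump(xs, ys):
    """Fit a one-split regression tree; return (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # points with x <= t go to the left leaf
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_mean, right_mean = fmean(left), fmean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    if best is None:  # all x identical: predict the overall mean on both sides
        overall = fmean(ys)
        return (xs[0], overall, overall)
    return best[1:]


def fit_bagged_stumps(xs, ys, n_trees=500, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_ensemble(trees, x):
    """Average the predictions of all stumps for input x."""
    return fmean([left if x <= t else right for t, left, right in trees])


# Toy step-shaped data (illustrative values only).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [10, 10, 10, 10, 50, 50, 50, 50]
trees = fit_bagged_stumps(xs, ys, n_trees=500, seed=0)
```

Averaging over bootstrap-trained trees reduces the variance of any single tree, which is the motivation for using several hundred estimators.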
Results and discussion
Visualizations
Now that all the algorithms are prepared, we forecast energy generation 1 year ahead. The last date in the collected data is 2022-03-27, so the 1-year prediction is for 2023-03-27. The forecasts of each algorithm for both energy sources are shown in the graphs below.
ARIMA
Figures 8 and 9 show the ARIMA forecast trend of solar and wind energy generation. The shaded region in Figure 8 is the confidence interval of the predicted values, that is, the forecast is expected to fall within the shaded region with a probability of 95%.

ARIMA forecast trend of solar energy generation.

ARIMA forecast trend of wind energy generation.
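As a hedged note on how a 95% band of this kind is commonly constructed (this assumes approximately normal forecast errors and is not a description of the software settings used in the study), the interval for the h-step-ahead forecast can be written as:

$$ \hat{y}_{T+h} \pm 1.96 \, \hat{\sigma}_h $$

where $\hat{y}_{T+h}$ is the point forecast made at the last observed time $T$ and $\hat{\sigma}_h$ is the estimated standard deviation of the h-step forecast error. Because $\hat{\sigma}_h$ typically grows with $h$, the shaded region widens further into the future. The symbols are assumptions introduced for this sketch.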
Random forest
Figures 10 and 11 show the random forest predictions for solar and wind energy generation.

Random forest prediction for solar energy generation.

Random forest prediction for wind energy generation.
Linear regression
Figures 12 and 13 represent Linear regression model prediction for solar and wind energy generation.

Linear regression model prediction for solar energy generation.

Linear regression model prediction for wind generation.
Prediction results
The values of solar and wind energy generation predicted by ARIMA, Random Forest, and Linear Regression for 2023-03-27 are shown in Table 4.
Predicted values, Mean and Std for solar and wind energy using ARIMA, RF, and LR.
Table 4 compares the predicted results of each model for both energy sources on 2023-03-27. Because the values differ between models, we need to identify the best model for predicting solar and wind energy generation. To do so, we compare the algorithms using the metrics MAE, MSE, RMSE, and MAPE (Table 5).
Evaluating the performance of ARIMA, Random Forest and Linear Regression using metrics like MAE, MSE, RMSE, and MAPE.
The main modeling issues are training efficiently on time-series RE data from a large dataset and selecting a suitable network and algorithm for the forecast output. The appropriate choice varies with the specific dataset.
Data collection
For this study, we collected daily solar and wind energy generation data in MU (million units) from the National Load Dispatch Center and GitHub. A total of 1912 observations were collected for both solar and wind energy generation, from 2017-01-01 to 2022-03-27.
Tools used:
The Python programming language is used for the machine learning implementation because of its simplicity, compactness, and readability, together with the following Python libraries: Sklearn, Pandas, Numpy, Matplotlib, Seaborn, time.
Literature analysis
A broad analysis of RE prediction based on deep neural network, ANN, FUZZY, LSTM, ARIMA, CNN, Spatio-Temporal Methods, XG Boost Model, LightG, RM, etc. techniques is presented in tabular form in Table 6, along with additional techniques, to examine their efficacy and applications.
Summary of models and sources used in prediction.
Conclusion
In this study, we applied several machine learning and time series algorithms to predict solar and wind generation, which can help to ensure adequate sizing of energy storage in microgrids.
Our main objective was to forecast renewable energy generation 1 year ahead, and for that we used the ARIMA (Autoregressive Integrated Moving Average) model, Linear Regression, and Random Forest. We then compared these algorithms using the performance metrics MAE, MSE, RMSE, and MAPE (see Table 5). The lower the values of these metrics, the better the performance of the algorithm. From Table 5, the MAE of the ARIMA model (0.06 for solar and 0.20 for wind) is much lower than that of Random Forest (15.65 and 61.73) and Linear Regression (15.78 and 54.65). The same holds for MSE and RMSE: for ARIMA they are 0.01 and 0.08 for solar energy and 0.07 and 0.27 for wind energy, respectively, which is very low compared with the MSE and RMSE values of Random Forest and Linear Regression for both solar and wind energy. However, the MAPE of ARIMA (32.07 and 21.42) is higher than that of Random Forest (0.12 and 0.40) and Linear Regression (0.12 and 0.35).
After comparing these metrics for each algorithm on both datasets, we conclude that the ARIMA model is the best fit for forecasting renewable energy. In future work on solar and wind forecasting, other time series models such as VAR (Vector Autoregression) and SARIMAX, and ensemble learning methods such as gradient boosting, voting, XGBoost, and AdaBoost, can be used to reduce MAE, MSE, and RMSE further, that is, to achieve better model performance.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
