Abstract
A wind turbine power curve provides the technical specification of the wind turbine in the form of nominal wind power readings. This information may be used to monitor the performance of the power system, estimate the power produced by the turbine, optimize the operational cost, and improve the reliability of the power system. However, the nominal curve alone is not sufficient to accomplish these tasks; accurate modeling of the wind power curve is required. In this article, various curve-fitting techniques, namely polynomial regression, locally weighted polynomial regression, spline regression, piecewise polynomial regression, and smoothing spline, are applied to model the power curve of a wind turbine. All these techniques are applied to the National Renewable Energy Laboratory (NREL) 2012 dataset with site-id 124693.
Keywords
Introduction
Wind energy offers a promising answer to a world threatened by an energy crisis and related environmental disasters. To make the wind industry a reliable source of energy, the efficiency of power plants must be optimized. Research reported in Chowdhury et al. (2013), Long et al. (2015), Abdulrahman et al. (2017), and Morshedizadeh et al. (2017) has emphasized end-to-end control of operational wind farms to optimize the efficiency of power generation. This includes the design of the wind farm, the design of individual turbines, the choice of turbine types based on geographical location and past climatic conditions, and finally the monitoring of power generation performance in the operational wind farm. This work analyzes a wide category of models for accurately monitoring the performance of wind power generation and provides a quantitative comparison of the applied modeling techniques.
In the wind energy industry, the performance of a wind turbine can be suitably modeled using the concept of the power curve (Carrillo et al. 2013; Lydia et al. 2014; Sohoni et al., 2016). A wind turbine power curve describes the relationship between wind velocity and power generated; the more accurate the curve, the more precise the power prediction. Hence, developing accurate models for power curves is an important area of research. Previously, most research focused on developing power curves from the data provided by the manufacturer of the given wind turbine (Diaf et al., 2008; Nand Kishore and Fernandez, 2011; Yang et al., 2003). Thaper et al. (2011) applied various techniques to manufacturer data to obtain a good-quality power curve. In their work, they developed a machine-specific power curve which exhibits the same behavior while operating in dissimilar site conditions at different locations. For this reason, power curve modeling requires a curve in which machine-specific as well as site-specific conditions are considered.
This article begins with an overview of polynomial regression (Shokrzadeh et al., 2014; Wadhvani et al., 2017), which has been used to fit the empirical power curve of a wind turbine using actual data. One problem with polynomial regression is that it over-fits with higher degree polynomials. In addition, the curve changes quickly in response to anomalies in the data. The anomaly problem of the polynomial is addressed by the locally weighted polynomial (Cleveland and Devlin, 1988), but it does not provide flexibility in the curvature of the graph. The piecewise polynomial and splines (Ai et al., 2003; Belhamel et al., 2007; Llombart et al., 2005) provide this flexibility and solve the over-fitting problem but show erratic behavior at the knot boundaries. Therefore, the smoothing spline (Hollander et al., 2014) has been applied to smooth the curve by tuning a smoothing parameter, whose value can vary from zero to infinity. The smoothing spline criterion is defined over an infinite-dimensional function space; this problem can be resolved by applying the natural spline, which makes it possible to define the smoothing spline explicitly in finite dimension. All the models have been implemented and analyzed on the SCADA dataset collected from the NREL dataset of year 2012 with site-id 124693 (National Renewable Energy Laboratory (NREL), 2012).
Wind power prediction using statistical models
In this section, various statistical methods, namely polynomial regression, piecewise polynomial regression, spline regression, and smoothing spline regression, have been applied to fit the empirical power curve to wind power data.
Polynomial regression
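Polynomial regression models the relationship between the explanatory variable x (wind speed) and the dependent variable y (power output) with a polynomial function of degree k. It can be expressed as

\[ y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k + \varepsilon_i, \qquad i = 1, \ldots, n \tag{1} \]

The goodness of fit of this function to the observed data can be assessed by computing the sum of squared deviations. The least-squares method estimates β by minimizing the residual sum of squares (RSS), which can be expressed as

\[ \mathrm{RSS}(\beta) = \sum_{i=1}^{n} \Big( y_i - \sum_{j=0}^{k} \beta_j x_i^j \Big)^2 = (y - X\beta)^T (y - X\beta) \tag{2} \]

where X is the n × (k + 1) matrix whose i-th row is (1, x_i, x_i^2, …, x_i^k). The parameters β are obtained by taking the partial derivative of the RSS with respect to each of the k + 1 coefficients and setting it to zero:

\[ \frac{\partial\, \mathrm{RSS}}{\partial \beta} = -2 X^T (y - X\beta) = 0 \tag{3} \]

Solving this system gives the least-squares estimate

\[ \hat{\beta} = (X^T X)^{-1} X^T y \tag{4} \]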
where X^T stands for the transpose of X. A polynomial can approximate the data efficiently, but it is sensitive to anomalies in the observed dataset and also suffers from over-fitting and under-fitting. Under-fitting occurs if the data are fitted with a polynomial of too low a degree. Increasing the degree may improve the fit but leads to over-fitting. Figure 1 illustrates both problems: the left panel shows under-fitting and the right panel shows over-fitting.

Figure 1. Polynomial regression on the NREL 2012 dataset with site-id 124693. The left panel shows that a third-degree polynomial under-fits the data. The fit improves as the degree of the polynomial increases, but a high-degree polynomial over-fits the data, as shown in the right panel.
Polynomial regression offers a simple approach to fitting complex datasets, but a low-degree curve is neither accurate nor adaptive enough for data that change frequently. The degree of the curve must therefore be increased for a better fit, which also increases the fitting time. In addition, the error rates rise suddenly beyond a certain degree, which shows that only a narrow band of polynomial degrees is suitable for the dataset and makes the method less reliable to use.
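The least-squares fit described in this section can be sketched in a few lines. The following is a minimal, dependency-free illustration rather than the implementation used for the reported results: it solves the normal equations with a small Gaussian-elimination helper, and the toy data (`speeds`, `powers`) and all function names are hypothetical. It compares a degree-1 fit, which under-fits an S-shaped response, with a degree-3 fit.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Return [beta_0, ..., beta_degree] from the normal equations X^T X beta = X^T y."""
    design = [[x ** j for j in range(degree + 1)] for x in xs]
    p = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(beta, x):
    """Evaluate the fitted polynomial at x."""
    return sum(b * x ** j for j, b in enumerate(beta))


def rss(beta, xs, ys):
    """Residual sum of squares of the fitted polynomial."""
    return sum((y - predict(beta, x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data: normalized wind speed in [0, 1] and an S-shaped response.
speeds = [i / 20 for i in range(21)]
powers = [3 * v ** 2 - 2 * v ** 3 for v in speeds]

beta_linear = fit_polynomial(speeds, powers, degree=1)
beta_cubic = fit_polynomial(speeds, powers, degree=3)
print("RSS, degree 1:", rss(beta_linear, speeds, powers))
print("RSS, degree 3:", rss(beta_cubic, speeds, powers))
```

Solving the normal equations directly is adequate for low degrees on normalized inputs; for high degrees the system becomes ill-conditioned, and an orthogonal-decomposition solver would be the safer choice.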
Locally weighted polynomial regression
Cleveland (1979) and Cleveland and Devlin (1988) applied locally weighted polynomial regression, a form of local regression in which every data point is assigned a weight based on its distance from the query point. The lowest weight is given to the data points that lie farthest from the target point and the highest weight to the data points closest to it, so the curve at each target point is constructed mainly from its nearest data points. The anomaly problem, and to some extent the over-fitting problem, of the polynomial can be mitigated by locally weighted polynomial regression using the least-squares method (Cleveland, 1979; National Renewable Energy Laboratory (NREL) 2012). With X the polynomial design matrix and y the vector of observed responses, the problem is formulated at each target point x_0 as

\[ \min_{\beta} \; (y - X\beta)^T \, W_S(x_0) \, (y - X\beta) \tag{5} \]

where W_S(x_0) = diag(K(x_0, x_1), …, K(x_0, x_n)) is the diagonal weight matrix and K(x_0, x_i) is the smoothing kernel function. The solution to the above problem is

\[ \hat{\beta}(x_0) = \big( X^T W_S(x_0) X \big)^{-1} X^T W_S(x_0) \, y \tag{6} \]

where $\hat{\beta}(x_0)$ is the parameter vector in matrix form.
The locally weighted polynomial may solve the anomaly problem of the polynomial, but it does not solve the over-fitting problem. Figure 2 shows curve fitting using the locally weighted polynomial: the left panel shows the fit with a third-degree locally weighted polynomial and the right panel the fit with a tenth-degree locally weighted polynomial. The tenth-degree fit is better than the third-degree fit, and neither curve changes quickly in response to anomalies in the observed data. The tenth-degree locally weighted polynomial thus improves on the third-degree one and removes the sensitivity to anomalies, but it cannot solve the over-fitting problem: the right panel of Figure 2 still shows slight over-fitting.

Figure 2. Locally weighted polynomial regression on the NREL 2012 dataset with site-id 124693: the left panel shows the third-degree locally weighted polynomial and the right panel shows the tenth-degree locally weighted polynomial.
Locally weighted polynomial regression gives a higher degree of control over the fitting process and keeps the model complexity low over the dataset. However, this in turn decreases the flexibility of the fitted model and can increase error rates in some cases. In such cases, the ordinary polynomial models are a better choice than the locally weighted ones.
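The weighting idea can be illustrated with a short, dependency-free sketch. It is an illustration under stated assumptions, not the configuration used for Figure 2: the Gaussian kernel, the bandwidth of 0.1, the local degree of 1, the toy data, and all function names are assumptions introduced here. At each query point the sketch solves a weighted least-squares problem and reads off the local intercept.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def gaussian_kernel(x0, xi, bandwidth):
    """Kernel weight K(x0, xi); the Gaussian form is an assumption of this sketch."""
    return math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2)


def local_poly_predict(x0, xs, ys, degree=1, bandwidth=0.1):
    """Fit a kernel-weighted polynomial centred on x0 and evaluate it at x0."""
    weights = [gaussian_kernel(x0, x, bandwidth) for x in xs]
    # Centring the design on x0 means the fitted value at x0 is simply beta_0.
    design = [[(x - x0) ** j for j in range(degree + 1)] for x in xs]
    p = degree + 1
    xtwx = [[sum(w * r[i] * r[j] for w, r in zip(weights, design)) for j in range(p)]
            for i in range(p)]
    xtwy = [sum(w * r[i] * y for w, r, y in zip(weights, design, ys)) for i in range(p)]
    return solve_linear_system(xtwx, xtwy)[0]


# Hypothetical toy data: a linear trend with a single anomaly at v = 0.9.
speeds = [i / 20 for i in range(21)]
clean = [2 * v + 1 for v in speeds]
noisy = clean[:]
noisy[18] += 5.0

far_from_anomaly = local_poly_predict(0.1, speeds, noisy)
near_anomaly = local_poly_predict(0.9, speeds, noisy)
print("fit at v = 0.1:", far_from_anomaly, "(true value 1.2)")
print("fit at v = 0.9:", near_anomaly, "(true value 2.8)")
```

In this toy setting the anomaly distorts the fit only in its own neighbourhood, because its kernel weight is negligible for query points that lie far away.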
Piecewise polynomial regression
The main disadvantage of the locally weighted polynomial is that it does not provide flexibility in the curvature of the curve. Therefore, to provide more flexibility and obtain an efficient result, the piecewise polynomial is a good choice. In piecewise polynomial regression, the domain of x is divided into contiguous intervals, and a separate polynomial function is fitted in each interval. The functions for each interval are
Each piece of the model can be estimated using the least-squares method. It can be formulated as
The constraints on the basis functions are
where (·)+ denotes the positive part and (X − ξ2)+ is the truncated polynomial. Spline regression generalizes the piecewise polynomial by adding some constraints.
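The role of the truncated terms can be made concrete with a small, dependency-free sketch. It assumes a truncated power basis, that is, the powers 1, x, …, x^k followed by one term (x − ξ_m)+^k per knot, fitted by ordinary least squares; the knot positions, the toy data, and all function names are hypothetical. With degree 1 and two knots the basis reproduces a ramp-and-plateau shape exactly.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def spline_basis(x, degree, knots):
    """Row of the truncated power basis (degree >= 1 is assumed)."""
    powers = [x ** j for j in range(degree + 1)]
    # (x - knot)_+ is the positive part: zero to the left of the knot.
    truncated = [max(x - knot, 0.0) ** degree for knot in knots]
    return powers + truncated


def fit_spline(xs, ys, degree, knots):
    """Least-squares coefficients of the regression spline."""
    design = [spline_basis(x, degree, knots) for x in xs]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_spline(beta, x, degree, knots):
    """Evaluate the fitted spline at x."""
    return sum(b * h for b, h in zip(beta, spline_basis(x, degree, knots)))


# Hypothetical toy curve: zero output up to 0.2, a linear ramp, then a plateau from 0.7.
speeds = [i / 20 for i in range(21)]
knots = [0.2, 0.7]
powers = [2 * max(v - 0.2, 0.0) - 2 * max(v - 0.7, 0.0) for v in speeds]

beta = fit_spline(speeds, powers, degree=1, knots=knots)
print("coefficients:", beta)
print("fit at v = 0.45:", predict_spline(beta, 0.45, 1, knots))
```

Because every truncated term is continuous at its knot, the fitted pieces join without jumps, which is the constraint that distinguishes a spline from an unconstrained piecewise polynomial.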
Spline regression
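Spline regression is a generalized form of piecewise polynomial regression. It is among the most widely used statistical methods of curve fitting and gives effective results in estimating the shape of the curve. In the truncated power basis, it can be represented as

\[ f(x) = \sum_{j=0}^{k} \beta_j x^j + \sum_{m=1}^{K} \beta_{k+m} \, (x - \xi_m)_+^k \tag{10} \]

where k is the degree of the spline, K is the number of knots, and ξ_1, …, ξ_K are the knot locations.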
Smoothing spline
The erratic behavior of the spline can be remedied by the smoothing spline. The smoothing spline uses regularization to control the complexity of the model. It fits the curve by balancing two terms: the first measures closeness to the data and the second penalizes the curvature of the function. The smoothing spline criterion can be represented as

\[ \mathrm{RSS}(f, \lambda) = \sum_{i=1}^{n} \big( y_i - f(x_i) \big)^2 + \lambda \int \big( f''(t) \big)^2 \, dt \tag{11} \]

where RSS(f, λ) is the penalized residual sum of squares and λ is the smoothing parameter. The smoothing parameter provides the trade-off between the two terms. The two special cases of the smoothing parameter are as follows:
Case 1: when λ = 0, the function interpolates the data.
Case 2: when λ = ∞, the function is the simple least-squares line fit, since no curvature is tolerated.
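The smoothness of the curve varies from very rough to very smooth depending on the value of the smoothing parameter. The criterion of equation (11) is defined on an infinite-dimensional function space. However, its minimizer is a natural cubic spline with knots at the n distinct values x_i, which reduces the problem to a finite-dimensional one. The solution can be written as

\[ f(x) = \sum_{j=1}^{n} N_j(x) \, \theta_j \]

where N_j(x) are the basis functions of this family of natural splines. The criterion then reduces to

\[ \mathrm{RSS}(\theta, \lambda) = (y - N\theta)^T (y - N\theta) + \lambda \, \theta^T \Omega_N \theta \]

where $\{N\}_{ij} = N_j(x_i)$ and $\{\Omega_N\}_{jk} = \int N_j''(t) N_k''(t) \, dt$, and its minimizer is

\[ \hat{\theta} = (N^T N + \lambda \Omega_N)^{-1} N^T y \]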
The smoothing spline, through its smoothing parameter, solves the erratic-behavior problem of the spline, although it does not provide the most accurate results. The infinite-dimensional problem is solved using the natural spline. Figure 3 shows that the cubic spline result is more erratic than the smoothing spline result; the smoothing spline is smoother and fits better than the cubic spline.

Figure 3. Piecewise polynomial regression on the NREL 2012 dataset with site-id 124693: (a) the cubic spline and (b) the smoothing spline.
This method is mainly used when the nature of the dataset changes continually, which makes a single global model inefficient. It takes more time than the methods described above but gives a more accurate fit and better predictions.
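The trade-off controlled by the smoothing parameter can be explored with a simplified discrete analogue. The sketch below is not the natural-spline solution discussed above: it assumes equally spaced points and replaces the integral of the squared second derivative with a sum of squared second differences (a penalty of the kind sometimes called a Whittaker smoother). The toy data and all function names are hypothetical. It reproduces the two limiting cases: λ = 0 returns the data unchanged, and a very large λ approaches the least-squares line.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def second_difference_penalty(n):
    """Return D^T D, where D takes second differences of n equally spaced values."""
    d = [[0.0] * n for _ in range(n - 2)]
    for i in range(n - 2):
        d[i][i], d[i][i + 1], d[i][i + 2] = 1.0, -2.0, 1.0
    return [[sum(d[r][i] * d[r][j] for r in range(n - 2)) for j in range(n)]
            for i in range(n)]


def smooth(ys, lam):
    """Minimize sum (y_i - f_i)^2 + lam * sum (second differences of f)^2."""
    n = len(ys)
    penalty = second_difference_penalty(n)
    # Setting the gradient to zero gives the linear system (I + lam * D^T D) f = y.
    system = [[(1.0 if i == j else 0.0) + lam * penalty[i][j] for j in range(n)]
              for i in range(n)]
    return solve_linear_system(system, ys)


# Hypothetical toy data: a linear trend with alternating +/- 0.1 disturbances.
speeds = [i / 20 for i in range(21)]
powers = [v + 0.1 * (-1) ** i for i, v in enumerate(speeds)]

rough = smooth(powers, 0.0)    # lam = 0: reproduces the data
medium = smooth(powers, 1.0)   # intermediate trade-off
stiff = smooth(powers, 1e6)    # very large lam: close to a straight line
print("first three values, lam = 0:  ", rough[:3])
print("first three values, lam = 1:  ", medium[:3])
print("first three values, lam = 1e6:", stiff[:3])
```

In practice λ would be chosen by a criterion such as cross-validation instead of being fixed by hand.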
Real data application
This section presents a practical application of the modeling techniques discussed above, using a case study based on real data from wind farms in North America. The dataset is obtained from the resource file of the National Renewable Energy Laboratory (NREL) (National Renewable Energy Laboratory (NREL), 2012) for site-id 124693. The site is located at longitude −120.005463 and latitude 46.901657, has an average wind speed of 6.91 m/s, and has a capacity factor of 0.359. NREL obtained these observations from the supervisory control and data acquisition (SCADA) system of the wind plant, at a height of 100 m. Observations were recorded from January 2012 to December 2012 as pairs of 10-min average wind speed and the corresponding 10-min average power output.
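After developing a model that represents the behavior of the actual data, it is also important to select suitable criteria to assess the model's ability to generalize. To assess the goodness of fit, a loss function is required that measures the difference between the estimated and the true values. As in Kohavi (1995), the mean absolute error (MAE) and root mean square error (RMSE) have been applied as loss functions for measuring the error between the target value p and the model prediction $\hat{p}$:

\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert p_i - \hat{p}_i \rvert, \qquad \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} ( p_i - \hat{p}_i )^2 } \]

where n is the number of observations, p_i is the i-th observed power value, and $\hat{p}_i$ is the corresponding model prediction.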
The experimental results obtained by applying polynomial regression, locally weighted polynomial regression, cubic spline, and smoothing spline methods are shown in Table 1 for the different loss functions. The comparative analysis is based on MAE, RMSE, MSE, and the R2 score.
Table 1. Results obtained for various curve-fitting approaches on the NREL 2012 dataset with site-id 124693.
MAE: mean absolute error; RMSE: root mean square error; MSE: mean square error.
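The error measures named in Table 1 can be computed as in the following minimal sketch. The function names and the four sample values are hypothetical and are not taken from the NREL dataset; the R2 score is assumed here to be the usual coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(mse(actual, predicted))


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted power values (arbitrary units).
actual = [0.0, 0.5, 1.0, 1.5]
predicted = [0.1, 0.4, 1.1, 1.4]

print("MAE: ", mae(actual, predicted))
print("MSE: ", mse(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("R2:  ", r2_score(actual, predicted))
```

RMSE penalizes large deviations more heavily than MAE, so comparing the two gives a rough indication of whether a model's error is dominated by a few large misses.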
In the above table, the third-degree polynomial acts as the base case. The table shows that, in all cases, polynomials and weighted polynomials of lower degree show very high error and only average goodness of fit. Furthermore, the weighted polynomial gives a similar fit on all the metrics used and does not show any improvement, as the preprocessed observations over the period of one complete year are normally distributed. In addition, significant variation is seen in the error rate when modeling is done using cubic spline regression, and the accompanying drop in the R2 value indicates a poor fit. In contrast, the tenth-degree polynomial performs best, as it shows a lower error rate, a better goodness of fit, and a high R2 score. Significant improvement is seen when the model is fitted with the third-degree polynomial and the third-degree weighted polynomial, but no significant improvement is seen with the tenth-degree polynomial and the tenth-degree weighted polynomial. We therefore conclude that the tenth-degree polynomial gives the best fit for the given data.
Conclusion
This article presented a comparative study of different empirical power curve modeling techniques for predicting the output power of a turbine as a function of wind speed. The performance of the various methods was analyzed using real data from wind farms in North America, obtained from the resource file of the National Renewable Energy Laboratory (NREL) for site-id 124693. Overall, the polynomial model is the simplest one and takes very little time to fit. In practice, lower degree polynomials show high error in modeling, which is not ideal for practical applications. In all cases, the results show that weighted polynomial curve fitting does not improve significantly on the accuracy of the simple polynomial; the improvements are instead achieved by applying a higher degree polynomial. The results demonstrate that the tenth-degree polynomial and its weighted version show low error and a higher R2 score. Finally, it can be concluded that if the data observations favor a simple model, imposing a penalty does not improve the result significantly.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
