Abstract
This work extends the previously developed mixed distribution by adding one parameter and one function. We introduce the improved mixed distribution model, which combines three distributions, as an alternative wind speed distribution. Four extreme value distributions are considered to construct the mixed and improved mixed distributions and to estimate wind speed. The performance of univariate, mixed, and improved mixed distributions is evaluated on actual wind speed data from diverse regions of China. According to the minimum standard error of fit (SEF) criterion, 65.5% of the samples are best fitted by the improved mixed distribution. The return levels of the best model are also calculated. The addition of another parameter and function to the mixed distribution improved the results. The statistical analysis indicates that improved mixed distributions are more suitable and accurate for assessing wind speed data.
Introduction
Renewable energy accounted for 18.1% of the total energy consumption in 2017 (Nemitallah et al., 2020), and the market share is continuously growing and replacing fossil-based power generation, which is much needed to minimize climate change impacts and to accomplish United Nations’ sustainable development goal #7 (Affordable and Clean Energy) (Al Huneidi et al., 2022; Qadir et al., 2021). Solar and wind-based power generations are the primary renewable resources because of the significant advancements and the decreasing tariff in the past decade (Qadir et al., 2020; Tahir et al., 2022). China is amongst the economic hubs with a population exceeding 1.4 billion, and with the growing energy needs, further installation of 1000 GW is required in the coming decades. In addition, efforts are being made to reduce energy consumption per capita and to establish more renewable-based power plants to lower the environmental effects (Imteyaz et al., 2021; Sahu, 2018). It has been estimated that the wind power potential in China is around 4350 GW, out of which 1200 GW is commercially viable (Zhang and Li, 2012). In 2016, the installed capacity of wind power in China was 23.32 GW (Sahu, 2018). As more wind farms will be set up in the coming years, wind speed estimation with good accuracy is required for feasibility assessment and future forecasting. Furthermore, the wind speed analysis can also provide the likelihood of hurricanes/disasters (Cui et al., 2019).
Wind speed is an essential factor in determining a region’s wind energy capacity. Wind speed data are typically analyzed using statistical distributions (Holmes, 2019; Jung and Schindler, 2019). The wind power estimate is formulated as an explicit function of the estimated parameters of the wind speed distribution. As a result, the type of data, the selection of a suitable distribution, and the estimation of its parameters are all critical for accurately estimating a region’s wind energy capacity (Tahir et al., 2020).
Numerous studies have been carried out to model wind speed distributions and variations (Carta et al., 2009; Perrin et al., 2006; Simiu et al., 2001). The probability density function (PDF) of the wind speed is also vital in many wind power generation applications. The renewable energy literature suggests various PDFs to describe the wind speed distribution. Probability distributions such as the Gumbel, Fréchet, Weibull, and generalized extreme value (GEV) distributions have been used to fit wind data (Jung and Schindler, 2019). Some studies (Dukes and Palutikof, 1995; Heckert et al., 1998; Simiu and Heckert, 1996; Simiu et al., 2001) were based on the epochal and peaks-over-threshold (POT) methods for modeling extreme wind speeds, for which the Weibull distribution was considered better than the Gumbel distribution.
Pashardes and Christofides (1995) conducted a statistical study of wind speed and direction in Cyprus, which laid a meteorological basis for evaluating Cyprus’ wind energy resources and provided appropriate data for estimating potential wind energy. Carta et al. (2009) studied probability distributions for wind speed analysis and performed statistical tests to check whether a wind speed sample originates from a population with a specific probability distribution. Dorvlo (2002) examined wind speed data for four sites in Oman and used the two-parameter Weibull distribution for modeling. He estimated the parameters using the regression and moments methods and found the Chi-square method to give improved estimates. Jung and Schindler (2019) carried out a comprehensive literature survey of statistical distributions for wind speed analysis. Their review covered distributions, parameter estimation methods, and goodness-of-fit metrics, and they recommended that the available methods be improved to predict wind speeds accurately.
Numerous studies have examined a variety of mixture distributions for the purpose of modeling wind speed data (Qin et al., 2012; Shi et al., 2015; Shin et al., 2016). Previously published research has used the Extreme value and Normal (N) distributions as density components of mixture distributions. These components’ mixture distributions resulted in a good fit to wind speed data.
Mood et al. (1974) presented a mixed distribution to model data samples drawn from two populations:
Where
Furthermore, the parameter estimation methodology for distributional models is one of the subjects discussed in conjunction with wind speed distribution determination. Since the estimated wind energy depends on the estimated distributional parameters, their accurate estimation is critical. Several estimation techniques have been used and compared on several criteria. The fmincon function in MATLAB, a function minimizer that handles linear and nonlinear constraints, is one of the most flexible tools for parameter optimization.
The improved mixed distribution, a combination of three distributions, is proposed for the first time in this study as an alternative to commonly used wind speed distributions and is evaluated to demonstrate its capability in estimating wind speed data. The improved mixed distribution has more shape and scale parameters and can produce increasing, decreasing, and upside-down hazard functions. The rest of this paper is laid out as follows. Section 2 presents commonly used extreme value distributions and introduces the mixed distributions built from them, together with the estimation procedures used to describe the wind speed distribution. Section 3 presents the analysis and the results obtained from a MATLAB program. Finally, Section 4 concludes the investigation and summarizes its outcomes.
Materials and methods
Problem definition
This study modifies the previous mixed distribution model (1) by adding one more function to generate the new, improved mixed distribution model. An additional parameter is employed so that data samples from three populations can be modeled, with the aim of obtaining better results.
Where
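A minimal Python sketch of the mixing idea follows. It assumes that the two-component model is the usual weighted sum p·f1(x) + (1 − p)·f2(x) and that the improved model extends it to three components with weights p1, p2, and 1 − p1 − p2. This parameterization, like the helper names `mixture_pdf`, `two_component`, and `three_component`, is an assumption made for illustration and may differ from the exact formulation used in this study.

```python
from typing import Callable, Sequence

Pdf = Callable[[float], float]


def mixture_pdf(components: Sequence[Pdf], weights: Sequence[float]) -> Pdf:
    """Return the density of a finite mixture, sum_i w_i * f_i(x)."""
    if len(components) != len(weights):
        raise ValueError("need exactly one weight per component")
    # Small tolerances absorb floating-point round-off in 1 - p1 - p2.
    if any(w < -1e-12 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")

    def pdf(x: float) -> float:
        return sum(w * f(x) for w, f in zip(weights, components))

    return pdf


def two_component(f1: Pdf, f2: Pdf, p: float) -> Pdf:
    """Assumed two-population model: p*f1 + (1 - p)*f2."""
    return mixture_pdf([f1, f2], [p, 1.0 - p])


def three_component(f1: Pdf, f2: Pdf, f3: Pdf, p1: float, p2: float) -> Pdf:
    """Assumed three-population model: p1*f1 + p2*f2 + (1 - p1 - p2)*f3."""
    return mixture_pdf([f1, f2, f3], [p1, p2, 1.0 - p1 - p2])
```

Because the weights sum to one and each component integrates to one, the mixture is itself a valid density; the third component adds one extra weight plus that component's own parameters to the fit.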
Data selection
The data used in this article are mainly meteorological data. They come from the “China Surface Meteorological Observation Monthly Data Set 3.0” collected by the National Meteorological Information Center of the China Meteorological Administration (China Meteorological Data Service Center, n.d.). The data set contains monthly observations from 29 national weather stations from 1951 to 2015, covering the capital, provincial capitals, municipalities, autonomous regions, and special administrative regions (SAR), as shown in Figure 1. The recorded elements include the maximum wind speed, the monthly mean wind speed, the monthly minimum wind speed, etc. Since most weather stations in China were built between 1951 and 1960, the record lengths vary from 11 to 64 years, and the number of wind speed observations available varies from station to station. Missing data are replaced with the average of the adjacent years, as shown in Table 1.

Figure 1. Geographical area of study (China) (China Map, 2011).
Table 1. Summary of wind speed data at different locations in China.
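The gap-filling rule described above can be sketched in a few lines of Python. The study does not spell out the procedure, so this sketch assumes that a missing value is replaced by the mean of the same calendar month in the immediately preceding and following years, and that a single available neighbour is used on its own at the ends of the record. The function name `fill_missing_with_adjacent_years` is hypothetical.

```python
from typing import List, Optional


def fill_missing_with_adjacent_years(
    values: List[Optional[float]],
) -> List[Optional[float]]:
    """Fill gaps in a year-by-year series for one calendar month.

    values[i] is the observation for year i (None marks a missing value).
    A gap is replaced by the mean of the observed neighbours in years
    i - 1 and i + 1; it stays None when neither neighbour was observed.
    """
    filled = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        # Only original observations are used, never previously filled values.
        neighbours = [
            values[j]
            for j in (i - 1, i + 1)
            if 0 <= j < len(values) and values[j] is not None
        ]
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled
```

For example, a January series of 2.0, missing, 3.0 m/s over three consecutive years becomes 2.0, 2.5, 3.0 m/s.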
Methodology
Jung et al. (2017) pointed out that accurate wind speed distribution modeling is a pivotal step in estimating the average wind power generation output. Thus, selecting the distribution that best fits the wind speed data at a specific site is important. For this reason, this study considers four extreme value distributions, namely the Gumbel, Fréchet, Weibull, and generalized extreme value distributions, to generate mixed distributions and to model wind speed data. In this section, equations (3)–(6) are first substituted into equations (1) and (2) to produce the mixed distributions. The selected distribution functions and the associated parameter estimation method are then explained in detail.
Univariate extreme value distributions
Extreme value distributions have been commonly employed to fit extreme wind speed distributions. These distributions are applied to the extreme values since they can be attained as the limit distribution (as
The solution of the functional equation that extreme values must satisfy is called the Generalized Extreme Value (GEV) distribution. The shape parameter
Gumbel distribution
There is strong evidence in the literature supporting the use of the Gumbel distribution to fit extreme events (Garciano and Koike, 2007; Kunz et al., 2010). This distribution is also useful for defining probabilistic properties (Hosking and Wallis, 1987). The probability density function (PDF) of the Gumbel distribution is:
Where
Fréchet distribution
The Fréchet distribution often provides the best approximation of the structure’s maximum load distribution (R, 1990; Van Den Brink and Können, 2008). The PDF of the Fréchet distribution is
Where
Weibull distribution
This distribution has been used to fit source data, that is, actual wind data, for many years. The probability distribution function can be used to analyze wind data characteristics in any area (Jaramillo and Borja, 2004; Ouarda et al., 2016). The PDF of the Weibull distribution is:
Where
Generalized extreme value distribution (GEV)
Dukes and Palutikof (1995) employed the GEV to analyze extreme wind speeds as an alternative distribution to Weibull. This distribution incorporates the extreme value distributions of Gumbel, Fréchet, and Weibull. However, the best distribution for statistical extremes cannot be decided. The PDF of the GEV is:
Where
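For reference, the four component densities can be sketched in Python using their common textbook parameterizations. The parameter names (`loc`, `scale`, `shape`) and the sign convention for the GEV shape parameter (positive values give the heavy-tailed, Fréchet-type case) are assumptions and are not necessarily those used in this study. The functions are not hardened against overflow for extreme arguments.

```python
import math


def gumbel_pdf(x: float, loc: float, scale: float) -> float:
    """Gumbel (type I, maxima) density."""
    z = (x - loc) / scale
    return math.exp(-(z + math.exp(-z))) / scale


def frechet_pdf(x: float, shape: float, scale: float, loc: float = 0.0) -> float:
    """Frechet (type II) density; zero at or below the lower bound loc."""
    z = (x - loc) / scale
    if z <= 0.0:
        return 0.0
    return (shape / scale) * z ** (-1.0 - shape) * math.exp(-(z ** (-shape)))


def weibull_pdf(x: float, shape: float, scale: float) -> float:
    """Two-parameter Weibull density for x > 0 (returns 0 otherwise)."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def gev_pdf(x: float, loc: float, scale: float, shape: float) -> float:
    """GEV density; shape == 0 reduces to the Gumbel density."""
    if shape == 0.0:
        return gumbel_pdf(x, loc, scale)
    t = 1.0 + shape * (x - loc) / scale
    if t <= 0.0:
        return 0.0  # x lies outside the support
    return t ** (-1.0 / shape - 1.0) * math.exp(-(t ** (-1.0 / shape))) / scale
```

Any two or three of these functions can serve as the components of a mixed or improved mixed distribution once they are combined with mixing weights.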
Mixed distributions (with the combination of two distributions)
Mixed Gumbel and Weibull distributions (GW): If
Where
Mixed Gumbel and Fréchet distributions (GF): If
Where
Mixed Gumbel and Generalized Extreme Value (GEV) distributions (GGEV): If
Where
Mixed GEV and Fréchet distributions (GEVF): If
Where
Mixed Weibull and Fréchet distributions (WF): If
Where
Mixed Weibull and GEV distributions (WGEV): If
Where
Improved mixed distributions (with the combination of three distributions)
Mixed Gumbel, Fréchet, and Generalized extreme value (GEV) distributions (GFGEV): If
Where
Mixed Gumbel, Fréchet, and Weibull distributions (GFW): If
where
Mixed Gumbel, Weibull, and GEV distributions (GWGEV): If
Where
Mixed Weibull, Fréchet and GEV distributions (WFGEV): If
Where
The 14 univariate and mixed extreme value distributions all have unknown parameters. Because of the complexity of the mathematical expressions and the number of unknown parameters, the fmincon optimization routine is used to determine them. The best univariate or mixed distribution is then selected according to the minimum standard error of fit (SEF) criterion (Kite, 1988):
where
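As a hedged illustration of the selection criterion, the sketch below assumes the form of the SEF usually attributed to Kite (1988): the square root of the summed squared differences between the observed values and the corresponding values computed from the fitted distribution, divided by the sample size minus the number of fitted parameters. The function name `standard_error_of_fit` and its argument names are hypothetical.

```python
import math
from typing import Sequence


def standard_error_of_fit(
    observed: Sequence[float],
    fitted: Sequence[float],
    n_params: int,
) -> float:
    """SEF = sqrt( sum (observed_i - fitted_i)^2 / (n - n_params) ).

    observed and fitted must be paired, e.g. sorted observations and the
    fitted quantiles at the matching empirical probabilities.
    """
    n = len(observed)
    if len(fitted) != n:
        raise ValueError("observed and fitted must have the same length")
    if n <= n_params:
        raise ValueError("need more observations than fitted parameters")
    squared_error = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return math.sqrt(squared_error / (n - n_params))
```

Dividing by n minus the number of parameters penalizes models with more parameters, which matters when three-component mixtures are compared with univariate fits.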
Parameters estimation
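After selecting the functional form of the model, the next step in the model building process is to estimate the function’s unknown parameters. Usually, this is achieved with an optimization method. An objective function links the response variable to the functional part of the model that contains the unknown parameters, and minimizing it yields parameter estimates that are close to the actual unknown parameter values. In general terms, the unknown parameters are treated as the variables to be found by the optimization, while the data enter the objective function as known coefficients. One of the most versatile tools for parameter optimization is the fmincon function in MATLAB, a function minimizer that handles linear and nonlinear constraints. It is a nonlinear programming solver that finds the minimum of a problem specified by:
$$\min_{x} f(x) \quad \text{such that} \quad \begin{cases} c(x) \le 0 \\ c_{eq}(x) = 0 \\ A\,x \le b \\ A_{eq}\,x = b_{eq} \\ lb \le x \le ub \end{cases}$$
Where $b$ and $b_{eq}$ are vectors, $A$ and $A_{eq}$ are matrices, $c(x)$ and $c_{eq}(x)$ are functions that return vectors, and $f(x)$ is a function that returns a scalar; $f(x)$, $c(x)$, and $c_{eq}(x)$ can be nonlinear functions, and $lb$ and $ub$ are the lower and upper bounds on $x$.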
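To illustrate bounded minimization without MATLAB, the sketch below implements a simple compass (pattern) search restricted to box bounds. It is a stand-in chosen for brevity, not the algorithm used by fmincon, and it handles only lower and upper bounds, not general linear or nonlinear constraints. The function name `bounded_pattern_search` and its arguments are hypothetical; in this study's setting the objective would be a fitting criterion evaluated on the wind speed data.

```python
from typing import Callable, List, Sequence, Tuple


def bounded_pattern_search(
    objective: Callable[[List[float]], float],
    x0: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    step: float = 0.5,
    tol: float = 1e-6,
    max_iter: int = 10000,
) -> Tuple[List[float], float]:
    """Minimize objective(x) subject to lower <= x <= upper.

    Each iteration tries a move of +/- step along every coordinate, keeps
    any move that lowers the objective, and halves the step when no move
    helps. Only a local minimum is found, so the start point matters.
    """
    def clip(value: float, lo: float, hi: float) -> float:
        return max(lo, min(hi, value))

    x = [clip(v, lo, hi) for v, lo, hi in zip(x0, lower, upper)]
    fx = objective(x)
    iterations = 0
    while step > tol and iterations < max_iter:
        iterations += 1
        improved = False
        for i in range(len(x)):
            for direction in (1.0, -1.0):
                trial = list(x)
                trial[i] = clip(x[i] + direction * step, lower[i], upper[i])
                f_trial = objective(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step *= 0.5
    return x, fx
```

The bounds play the role of the `lb` and `ub` arguments: for example, they can keep scale parameters positive and mixing weights between 0 and 1 during the search.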
Recurrence interval
One of the essential goals of frequency analysis is to calculate the return period or recurrence interval. If the event with variable
The wind speed associated with the 50- or 100-year return period cannot be estimated directly from the data set. Instead, it must be extracted from the 98th and 99th percentiles of the fitted distribution, respectively (i.e.
Where
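The link between a return period and a percentile of the fitted distribution can be sketched as follows. The code assumes the standard relation in which a return period T corresponds to the non-exceedance probability 1 − 1/T (so T = 50 gives 0.98 and T = 100 gives 0.99), and it inverts an arbitrary fitted CDF by bisection. The function names and the Gumbel parameters used in the usage lines are illustrative assumptions, not values from this study.

```python
import math
from typing import Callable


def non_exceedance_probability(return_period: float) -> float:
    """Probability 1 - 1/T associated with a return period T > 1."""
    if return_period <= 1.0:
        raise ValueError("return period must exceed 1")
    return 1.0 - 1.0 / return_period


def quantile_by_bisection(
    cdf: Callable[[float], float],
    prob: float,
    lo: float,
    hi: float,
    tol: float = 1e-10,
) -> float:
    """Solve cdf(x) = prob for x in [lo, hi], assuming cdf is non-decreasing."""
    if not cdf(lo) <= prob <= cdf(hi):
        raise ValueError("bracket [lo, hi] does not contain the quantile")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def return_level(
    cdf: Callable[[float], float], return_period: float, lo: float, hi: float
) -> float:
    """Wind speed whose non-exceedance probability matches the return period."""
    return quantile_by_bisection(cdf, non_exceedance_probability(return_period), lo, hi)


# Usage with an illustrative Gumbel CDF (location 2.0 m/s, scale 0.5 m/s).
def gumbel_cdf(x: float) -> float:
    return math.exp(-math.exp(-(x - 2.0) / 0.5))


level_50 = return_level(gumbel_cdf, 50.0, 0.0, 20.0)
```

Bisection is used because a mixed distribution generally has no closed-form quantile function, whereas its CDF is straightforward to evaluate.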
Results and discussion
In this section, 14 univariate and mixed distributions are used. The best univariate or mixed distribution is selected according to the minimum standard error of fit (SEF). The discussion covers selected stations and their parameters, identifies the suitable distributions, and gives their return levels. First, a station with a long record is discussed, together with its parameters and the return levels obtained from the best-fit models. Second, the parameters and return levels based on the best-fit models are shown for a station with a short record. Finally, the best-fit distributions and return levels for all stations are presented.
As is well known, no multivariate constrained nonlinear optimization technique can guarantee the global optimum, so caution is needed to avoid local optima. It is therefore recommended to start from a reasonable set of initial parameters, which can be computed with the Method of Moments (Bowman and Shenton, 1998). After obtaining the initial values, the parameters of each univariate and mixed distribution were estimated with the nonlinear optimization routine fmincon. For example, at the Beijing station, for the case of the Weibull distribution, the mean is 2.39 m/s and the standard deviation is 0.68 m/s. With these values, and using the Standard Deviation Method (STDM) (Shaban et al., 2020) in equations (22) and (23), the initial parameters for the Weibull distribution are computed
Where
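The initial Weibull parameters can be reproduced with a short Python sketch. It assumes that the STDM takes its commonly used form, in which the shape is (standard deviation / mean) raised to the power −1.086 and the scale is the mean divided by Γ(1 + 1/shape); whether equations (22) and (23) use exactly this form is an assumption, and the function name `weibull_initial_parameters` is hypothetical.

```python
import math
from typing import Tuple


def weibull_initial_parameters(mean: float, std: float) -> Tuple[float, float]:
    """Return (shape, scale) starting values from the sample mean and std.

    shape = (std / mean) ** -1.086       (empirical approximation)
    scale = mean / Gamma(1 + 1 / shape)  (matches the Weibull mean exactly)
    """
    if mean <= 0.0 or std <= 0.0:
        raise ValueError("mean and std must be positive")
    shape = (std / mean) ** -1.086
    scale = mean / math.gamma(1.0 + 1.0 / shape)
    return shape, scale


# Beijing station values quoted in the text: mean 2.39 m/s, std 0.68 m/s.
shape0, scale0 = weibull_initial_parameters(2.39, 0.68)
```

Under this assumed form, the Beijing values give a starting shape of roughly 3.9 and a starting scale of roughly 2.6 m/s, which would then be refined by the constrained optimization.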
Figure 2 shows the wind speed series of the Beijing station. The Beijing station (Table 2), located in China’s capital, has 64 years of record, from 1951 to 2015, with 774 observations. The optimum univariate fit was found with the Generalized Extreme Value distribution, SEF = 0.92 m/s and

Figure 2. Annual mean wind speed series of Beijing station during 1961–2012 (China Meteorological Data Service Center, n.d.).
Table 2. Parameters of the Beijing station.
Figure 3 shows the PDF graph fitting capabilities of both univariate and mixed distributions, including Weibull, Gumbel, Fréchet, GEV, GW, GF, GGEV, GEVF, WF, WGEV, GFGEV, GFW, GWGEV, and WFGEV distributions on wind speed data measured in the Beijing station. It can be seen from Figure 3 that for the peak wind data, GWGEV shows a better fit than the widely accepted reference distribution in terms of wind speed characteristics. Besides, Figure 4(a) to (f) shows PDF graphs of actual and projected data for Weibull, Gumbel, Fréchet, GEV, GW, and GF distributions. Figure 5(a) to (f) shows PDF graphs of actual and projected data for GGEV, GEVF, WF, WGEV, GFGEV, and GFW distributions. Figure 6(a) and (b) shows PDF graphs of actual and projected data for GWGEV and WFGEV distributions. Each distribution was determined with the predicted values, and the SEF values were found.

Figure 3. PDF graphs of all univariate and mixed distributions for Beijing station.

Figure 4. PDF graphs of the actual and predicted data: (a) Weibull distribution, (b) Gumbel distribution, (c) Fréchet distribution, (d) GEV distribution, (e) GW distribution, and (f) GF distribution.

Figure 5. PDF graphs of the actual and predicted data: (a) GGEV distribution, (b) GEVF distribution, (c) WF distribution, (d) WGEV distribution, (e) GFGEV distribution, and (f) GFW distribution.

Figure 6. PDF graphs of the actual and predicted data: (a) GWGEV distribution and (b) WFGEV distribution.
The means and standard deviations of the Beijing station for 14 univariate and mixed distributions (combining two and three distribution models) are shown in Table 3. The GWGEV distribution in Beijing station has the lowest standard deviation, indicating that data is more closely clustered around the mean and hence more dependable. On the other hand, the Fréchet distribution exhibits a negative standard deviation since the data generated by it contains some negative values, indicating that the Fréchet distribution does not fit the data.
Table 3. Statistics of Beijing station.
As another case, the Wenjiang District station (Table 4) has 11 years of record and 138 observations. The best univariate fit was found with the GEV distribution, SEF = 0.32 m/s and
Table 4. Parameters of Wenjiang District station.
Table 5 shows the final return levels. Return periods of 2, 5, 10, 20, 50, and 100 years are listed vertically from top to bottom, and the return level of the best-fit distribution is given for each site. For example, for the Beijing station, which is best fitted by the GWGEV distribution, the wind speed return levels for return periods of 2, 5, 10, 20, 50, and 100 years are 2.28, 2.84, 3.19, 3.49, 3.8, and 3.95 m/s, respectively.
Table 5. The return level of the best univariate or mixed distribution for each wind station (m/s).
The presented methods have a few limitations, such as their numerical complexity and their dependence on the number of iterations. In addition, parameters estimated from wind records that are not robust suffer in accuracy. The gauging instrument must be correctly calibrated before the data are evaluated to obtain improved performance.
For each evaluated station, the mixed and univariate return level
For the case of mixed distributions with the combination of two distributions, 10 out of 29 stations, or 34.48% of the samples, showed the best fit with the GGEV distribution, 9 out of 29 stations, or 31.03%, with the GW distribution, 7 out of 29 stations, or 24.14%, with the WGEV distribution, 2 out of 29 stations, or 6.90%, with the GF distribution, and 1 out of 29 stations, or 3.45%, with the WF distribution.
For the case of mixed distributions with the combination of three distributions, 14 out of 29 stations, or 48.3% of the samples, showed the best fit with the GFGEV distribution, 6 out of 29 stations, or 20.7%, with each of the GFW and GWGEV distributions, and 3 out of 29 stations, or 10.3%, with the WFGEV distribution.
As in the previous study (Carlos Agustín, 2013), the following results are obtained when only univariate distributions and mixed distributions with the combination of two distributions are considered: 24 out of 29 stations, or 82.76% of the samples, were better fitted by univariate distributions, including 20 stations best fitted by the GEV distribution and 4 stations by the Weibull distribution. The remaining 5 out of 29 stations, or 17.24% of the samples, were better fitted by mixed distributions with the combination of two distributions: two stations by GW, two stations by GGEV, and one station by the WGEV distribution.
This study compared univariate and mixed distributions (combining two and three distribution models). Because of the complexity of the mathematical formulas and the number of unknown parameters, MATLAB nonlinear optimization with the fmincon routine was used to determine the unknown parameters. The best model was selected based on the minimum standard error of fit (SEF). We found that the improved mixed distribution model, a mixture of three distributions, is more accurate than the univariate distributions and the previous mixed distribution model, a mixture of two distributions. The improved mixed distribution model gave the best fit at 19 of the 29 wind stations. Of the remaining stations, four were best fitted by the previous mixed distribution model and six by univariate distributions. That is, 65.5% of the samples were best fitted by the improved mixed distribution model, 13.8% by the previous mixed distribution model, and 20.7% by univariate distributions. These findings suggest that GFGEV is the most frequently selected distribution, with eight wind stations. GFW and GEV were each selected for six stations, GWGEV for four stations, GGEV for two stations, and GW, WGEV, and WFGEV for one station each.
Conclusion
In this study, the wind speed is estimated using univariate and mixed distribution models. First, the unknown parameters of each probability density function are estimated with the fmincon routine. Then, random numbers are generated from each distribution using the estimated parameters. Finally, the SEF criterion is used to select the best models, and their return levels are computed.
The accuracy of 14 parametric distributions for analyzing the wind speed distribution was investigated at 29 locations in China, including the national capital and provincial capitals. Monthly wind speed data from 1951 to 2015 were used for the analysis. For this large data set, the newly proposed mixed distribution model was found to be better than the other distributions for wind speed estimation. The results show that GFGEV is the most frequently selected distribution. However, this work also shows that it is challenging to identify a single distribution that fits universally. In absolute terms, the GFGEV, GFW, GWGEV, and GEV distributions are the most frequently selected and recommended distributions. Overall, 65.5% of the samples were best fitted by mixed distributions with a combination of three distributions. If only univariate distributions were applied to the wind speed data, 86.2% of the samples would be best fitted by the GEV distribution.
By comparing 14 univariate and mixed distributions, this study finds that the proposed mixed distribution model yields accurate return levels and gives the best fit among the distributions considered, regardless of record length. These results suggest that the improved mixed distribution (with three or more distributions) should be employed as an additional mathematical tool when analyzing extreme wind speeds.
Footnotes
Appendix
Authors contributions
All authors contributed to the study’s conception and design. Material preparation, writing-original draft preparation, investigation, data collection, and analysis were performed by Hassan Tahir. The visualization was done by Arsalan Ahmed. Supervision and writing-reviewing were performed by Weidong Zhao, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethical Approval
All the ethics codes are observed carefully in this study.
