Accurate wind speed forecasting through ensemble models: A real-world case study in Sfax, Tunisia

Abstract

The inherent variability of wind energy necessitates precise forecasting for its effective grid integration. This study investigates machine learning techniques for wind speed forecasting, applying Random Forest, AdaBoost, and Support Vector Regression (SVR), alongside a novel two-layer stacking ensemble model developed to leverage their combined strengths. The models were trained and validated using meteorological data from the National School of Electronics and Telecommunications of Sfax (ENET'Com) for October 2024 and May 2025. The ensemble model consistently outperformed the base models, achieving a Mean Absolute Error (MAE) of 0.255, a Root Mean Square Error (RMSE) of 0.334, and an R² of 0.801 for October. High accuracy was maintained in May, with an MAE of 0.314, an RMSE of 0.429, and an R² of 0.745. These findings validate the efficacy of advanced machine learning, particularly ensemble methods, for enhancing wind energy’s predictability and reliability.

Keywords

machine learning; forecasting; SVR; random forest; AdaBoost

Introduction

Given the growing demand for electricity worldwide, national and international policies are moving towards the use of renewable energies, notably wind energy. Accurate predictions of renewable sources such as wind are becoming integral to the power system and distribution infrastructure. In fact, the major challenge for transmission and distribution operators is to balance supply and demand in the market. Wind energy is considered a sustainable energy source that has achieved remarkable development due to its efficiency, affordability, and pollution-free power (Perry Sadorsky, 2021). Moreover, the use of wind turbines to generate electricity has attracted significant attention thanks to their green power output and potential contribution to decarbonization efforts. Given its instability and stochastic nature, generated wind energy requires improved management before being injected into the electrical grid (Gilles Notton et al., 2018). Efficient management in turn requires wind power prediction in order to avoid energy loss and to plan energy exchange with the electric grid (Brahmi and Chaabene, 2025).

While efficiency improvements in wind generation devices are expected to moderate the increasing energy demand, researchers warn that these gains alone may not be enough to offset the continuous electrification of society (Hamed Rahmani et al., 2023). Driven by factors like the electrification of transportation, including electric vehicles, and the emergence of new electronic devices, global energy demand is projected to keep rising year after year (Michaelides, 2021; Tamor and Stechel, 2022). This trend underscores the urgency of developing new energy sources, especially considering estimates from the World Energy Forum that non-renewable resources like coal, gas, and oil will be depleted within 100 years (Xihui Haviour Chen et al., 2023). With fossil fuels currently accounting for 79% of the world’s primary energy consumption (Ruoso et al., 2024), renewable energy sources like wind power offer a promising solution (Cristea et al., 2022; Paraschiv and Paraschiv, 2023).

Over the coming decades, the transition from fossil-based generation to sustainable energy systems will rely not only on the deployment of clean energy technologies but also on the implementation of responsible energy consumption practices (Paraschiv Spiru, 2023). Among renewable resources, wind energy has emerged as a reliable and abundant option, contributing approximately 743 GW to global electricity generation, accounting for more than 6% of the world’s total electricity supply in 2020 (Cristea et al., 2021). The continued development of large-scale wind farms is expected to play a pivotal role in addressing future energy demand (Paraschiv et al., 2020). Nevertheless, the inherent variability and intermittency of wind power pose significant challenges for its integration into existing power grids. Consequently, accurate wind speed forecasting has become essential for enhancing grid integration, improving energy dispatch strategies, and mitigating the adverse impacts of wind power fluctuations on system reliability and stability (Cristea et al., 2021). As large-scale wind farms are grid-connected, their inherent variability in energy output becomes magnified (Zhou & Wang, 2021). This intermittency, characterized by significant fluctuations in wind speed, raises issues for grid stability (Li and Liu, 2022; Nabiha Brahmi et al., 2017). To ensure smooth integration and minimize the impact of these unpredictable shifts, accurate wind speed prediction and proactive risk management are crucial (Liu and Zhang, 2022; Wang and Chen, 2023). By quantifying the inherent uncertainty of wind power, we can better anticipate and mitigate disruptions, paving the way for seamless grid integration of this clean energy source.

The wind energy produced depends not only on geographic location but also on the number and size of installed wind turbines. Besides, the non-stationarity of wind energy output calls for accurate prediction to ease its adoption in the energy market. Wind speed is influenced by a variety of factors such as temperature, pressure, and weather conditions, which are difficult to model using conventional mathematical approaches. Furthermore, wind speed exhibits non-linear and non-stationary behavior, making it particularly challenging to capture its patterns and fluctuations accurately.

Conventionally, meteorological forecasting models have relied on physical and statistical approaches. Physical models, such as numerical weather prediction models, attempt to simulate the complex interactions between different weather variables based on the laws of physics. However, these models present limitations related to the representation of atmospheric processes and the availability of accurate initial conditions. Statistical models, on the other hand, exploit historical data patterns to forecast future outcomes. While statistical models have been widely used in wind speed prediction, they often struggle to capture the non-linear relationships and temporal dependencies in the data, leading to limited forecasting performance.

Accurate wind speed prediction is fundamental for estimating wind energy potential, despite the inherent challenges posed by its chaotic and stochastic behavior (Nabiha Brahmi et al., 2023). Based on the prediction horizon, wind speed forecasts can be classified into short-term (minutes to hours), medium-term (hours to a week), and long-term (weeks to years). The available forecasting methods fall into three main categories: physical models such as Weather Research and Forecasting (WRF) (Li and Liu, 2022; Qin and Li, 2015), statistical models such as autoregressive moving average (ARMA) (Zhang and Wang, 2020), and machine learning (ML) models such as support vector regression (SVR).

Considering recent advancements in artificial intelligence, the pursuit of enhanced forecasting accuracy has evolved into a complex and demanding challenge. Nevertheless, accurate forecasting remains a fundamental component in ensuring autonomous operation, operational stability, predictive maintenance, and overall efficiency of wind energy systems. Machine learning techniques play a crucial role in forecasting by reducing irregularities and improving precision. Specifically, these approaches offer advantages such as pattern recognition, flexibility, ensemble learning, feature selection, scalability, and continuous learning. By recognizing complex patterns and non-linear dependencies in wind speed data, machine learning models can enhance prediction accuracy. Ensemble learning methods, such as AdaBoost, combine multiple models to reduce errors and improve overall accuracy. Furthermore, machine learning algorithms can automatically select relevant features, eliminate noise, and handle large datasets efficiently, capturing spatial dependencies and enhancing precision. Continuous learning capabilities allow models to adapt to new data, ensuring up-to-date and accurate predictions. Leveraging these capabilities can support better decision-making in renewable energy planning and operational strategies. The aim of this study is to present a comparative analysis of machine learning algorithms used in wind speed forecasting. Validation is based on data collected from ENET’COM Sfax, Tunisia. This research employs a comprehensive dataset covering the period from July 2024 to May 2025.

First, the significance of precise wind speed forecasting is emphasized, and the limitations of traditional empirical and numerical models are discussed, positioning machine learning, particularly ensemble learning, as a promising alternative to enhance accuracy. Subsequently, the evolution of wind forecasting is reviewed, comparing conventional methods to modern data-driven approaches, and clearly defining the study’s motivation, objectives, and scope. Thereafter, the proposed ensemble learning-based methodology is detailed, which employs a two-layer stacking framework: SVR, AdaBoost, and Random Forest as base learners, with a Random Forest meta-learner for optimal prediction integration. The model is trained using real meteorological data from ENET’COM Sfax, Tunisia. Following this, the selected models are explained, including their theoretical principles, relevance to wind forecasting, and optimized hyperparameters. Then, experimental results are presented using MAE, RMSE, and R² metrics, demonstrating that the ensemble model consistently outperforms standalone methods, validating the effectiveness of the stacking approach. Finally, key findings are summarized, the advantages of ensemble learning in wind energy forecasting are reinforced, and future research directions are suggested to further improve predictive performance.

Advancements in wind forecasting: From conventional methods to machine learning approaches

Accurate wind speed and power generation forecasts are crucial for various sectors, including energy planning, grid management, aviation safety, and disaster response. This study explores the impact of day-night variations and probabilistic forecasting models on improving forecast accuracy. The research also highlights the benefits of using wind farm variability data and offshore wind data for more reliable predictions.

Conventional methods of wind forecasting

Conventional methods of wind forecasting are based on meteorological techniques and models that have been established over many years. In this paper, we present empirical models and Numerical Weather Prediction (NWP) models.

Empirical models

Various models and algorithms have been employed to enhance the accuracy of wind speed and wind power forecasts, including autoregressive moving average (ARMA), autoregressive and GARCH models, automatic learning and retrieval of weather information, and wind speed and wind power dynamics models. The fundamental-based models proposed for wind power forecasts are more accurate than the statistical-based models. However, the statistical models are the most commercially used forecasting methods for wind power due to their simplicity and high speed. Because the phenomena underlying wind speed and wind power are random processes that are non-linear and follow non-Gaussian distributions, time series methods cannot capture the full complexity of wind speed series. Conversely, due to the complex, non-stationary, and turbulent nature of the wind over time and space, the physics-based models used in NWP systems cannot extract all the information contained in the wind data.

Wind speed and wind power forecasts are the most important components of wind energy management and grid power scheduling and dispatch (Liu et al., 2023). Due to the inherent uncertainty and intermittency of wind, power systems with high wind penetration require larger reserve capacities, to the point where the frequency and intensity of regulating services (e.g., frequency regulation and plant-level reserves) make grid operation less economical (Zhang and Wang, 2020).

Since reserve capacities are associated with the security of the grid, the operational reserve amount must be estimated correctly. In this regard, reducing the uncertainty of wind power forecasts is necessary. In other words, power system operation with higher penetrations of variable power increases the market value of metrics that characterize the certainty of the forecast (Wang and Chen, 2023). Long-duration variations in wind speed are typically forecasted using Numerical Weather Prediction (NWP) models. These models solve non-linear differential equations that simulate the evolving atmosphere and are the most sophisticated and widely recognized meteorological tools for weather forecasting. However, NWP forecast models are characterized by significant numerical uncertainties because of chaos in the atmosphere and limitations in computational power and grid size (Bauer et al., 2015). Consequently, estimating the wind speed at specific locations and times from NWP output fields often results in large biases and poorly defined variances about the mean forecasts. It is therefore necessary to post-process a numerical forecast, or a perturbed ensemble of forecasts, with observed data in order to obtain a high-quality estimate of the true wind speed under given atmospheric conditions.

Wind forecasting is extremely important for the efficient operation of wind energy projects. Forecasting of wind energy resources has evolved from its initial role in short-term power scheduling to covering imbalances between supply and demand and providing reserve power (Li and Liu, 2022). Due to the inherent unpredictability of power generation from intermittent resources, producing accurate forecasts is a clear challenge for grid and system operators, energy traders, and marketers. Historically, generated wind power has been unpredictable, and wind power plants could only be accommodated up to a fraction of the power system load (Zhang and Wang, 2020). However, with the advent of wind power forecasting, large-scale wind capacity has been integrated into power systems.

Numerical weather forecasting models

The situation for NWP parameterization is increasingly rare: a 40-year-old computational kernel that is still expected to serve for at least another decade. Operational numerical weather prediction is a small area compared with the full range of potential scientific and engineering development, but the intensifying development already seen in the machine learning and predictive science communities can be expected to continue in the NWP modeling community. Because there is increasing recognition that operational numerical weather prediction is important outside of the NWP modeling community, a future era of parameterization development may be associated with the creation of at least one, probably several, organizations dedicated to their ESM for weather prediction and historical reanalysis.

One subsequent development has been to generalize Lorenz’s notion of a convective scale model and a planetary scale model. This separation into the dynamical core and physical parameterization is increasingly less clear, and it is recognized that different scales are increasingly interdependent (Bauer et al., 2015). The physical parameterizations mix slow and fast processes, and an increasingly inflexible partitioning of the system into these two types of processes adds to the costs of NWP. A flexible, deep model of the type that has been proposed and studied in recent years is one potential solution.

Weather prediction has improved significantly over the past few decades, particularly on short to medium length timescales. The advent of powerful supercomputers and major developments in mathematical and physical techniques in the early 21st century has meant that the current horizons for weather prediction are much longer than when Bjerknes theorized the numerical weather prediction (NWP) problem in the 1910s (Palmer and Hagedorn, 2006). The most important development over the last decade has been machine learning (ML) techniques applied to NWP model output data: the physical data-intensive predictions made by these models seem perfectly suited to ML and a significant high-profile paper substantiated this belief in 2015 (Shi et al., 2015).

Advanced machine learning techniques for wind speed forecasting

Sustained, mostly steady wind speeds have made a compelling case for numerous power generation mechanisms. Wind energy is now among the most reliable power supply resources with sustainable growth around the globe, particularly in Korea. The electric power generated from wind has grown dramatically as part of efforts to limit further climate change and minimize emissions from electricity generation; however, wind speed is highly unpredictable and non-linear (Li and Liu, 2022). The annual performance factor, day-to-day efficiency, and peak power output depend strongly on meteorological conditions, especially wind speed (Zhang and Wang, 2020). It is therefore important to develop more trustworthy and efficient procedures for estimating rapidly changing wind power. Because the amount of wind power generated has major consequences for cost and reliability, rapid and precise wind speed forecasting is needed to optimize expenditure and enhance reliability.

Wind power forecasting is regarded as an important topic for commercial and applied use. In general terms, wind power forecasting can be divided into short-term and long-term forecasting. Short-term wind speed forecasting models concentrate on producing forecasts more than 6 hours in advance, where the economic stakes are greatest. As a result, precise short-term wind speed forecasting is of great importance and urgency for energy systems, given future network uncertainties and the pressing needs of wind power generators. More broadly, wind speed is a vital parameter whose forecasting beyond one- or two-hour horizons is used in several industries, and it affects numerous economic and societal activities.

This paper investigates wind speed (WS) forecasting. Machine learning techniques are employed to capture the non-linearity of the underlying variables and improve forecasting accuracy. Accurate WS prediction is crucial for integrating wind energy into power grids and optimizing electricity generation.

Single model forecasting approaches

Single-model approaches rely on training one supervised learning algorithm to map input features directly to wind speed. Commonly used algorithms include:

• Support Vector Regression (SVR): An extension of Support Vector Machines to regression tasks, SVR seeks to fit a function within a user-defined error margin (ε) while maximizing flatness. Kernel functions (e.g., radial basis function) allow SVR to handle non-linear relationships effectively.

• Random Forest (RF): An ensemble of decision trees trained on random subsets of the data (bagging). By averaging the predictions of many decorrelated trees, Random Forest reduces variance and improves generalization over a single decision tree.

• Artificial Neural Networks (ANN): Multi-layer perceptron and other neural architectures can model highly non-linear mappings by learning hierarchical feature representations. Their flexibility makes them well suited to complex meteorological datasets, though they require careful tuning to avoid overfitting.

While each of these single models can achieve good accuracy under certain conditions, their performance may degrade when faced with noisy data, strong seasonal variability, or when trained on limited datasets. No single algorithm consistently dominates all geographic locations and temporal horizons.

Ensemble learning techniques in renewable energy forecasting

Ensemble learning methods improve predictive performance by combining multiple base learners, thereby compensating for individual model weaknesses and enhancing overall robustness. Key techniques include:

• Bagging (bootstrap aggregating): Multiple instances of the same base learner are trained independently on bootstrapped samples of the training data. Predictions are then aggregated, typically by averaging (regression) or majority vote (classification). Random Forest is a prime example of bagging applied to decision trees.

• Boosting: Models are built sequentially, each new learner focusing on the errors made by the ensemble thus far. By iteratively reweighting mispredicted samples, boosting can achieve high accuracy, though it can be more sensitive to noisy labels. AdaBoost is one of the most well-known boosting algorithms.

• Stacking (Stacked Generalization): Base models (of potentially different types) are trained in parallel. Their predictions become inputs to a higher-level “meta-learner,” which learns how to best combine them into a final forecast. Stacking often yields superior performance by leveraging the diversity of heterogeneous base models.

In the context of wind speed forecasting for renewable energy systems, ensemble methods have proven especially effective at handling heterogeneous data sources and highly variable climatic patterns. By reducing variance (bagging), focusing on difficult cases (boosting), or optimally blending complementary models (stacking), ensembles deliver more accurate and reliable forecasts, which are critical for the design, operation, and economic planning of hybrid renewable energy installations.

Proposed methodology

The investigation consists of forecasting the wind speed by applying three machine learning algorithms (Figure 1). The selected algorithm that offers the best accuracy will be utilized in various wind energy applications, including sizing, management of wind conversion systems, and control. The meticulously collected and maintained wind data was acquired from a database provided by the real-time acquisition chain installed at ENET’COM, Sfax, Tunisia.

Figure 1. Proposed approach.

The dataset consists of several meteorological parameters, including wind speed, radiation, rain rate, relative humidity, and wind direction. These input variables serve as key factors influencing the accuracy of the prediction models.

Machine learning models undergo a selection process in which their effectiveness in predicting wind speed is evaluated. To enhance forecasting accuracy, an ensemble learning approach is implemented. This involves a multi-layer structure (Layer 1 and Layer 2) where different algorithms contribute to the final prediction through a voting mechanism.

Finally, the model (or combination of models) that offers the highest accuracy is selected to generate the final forecasted wind speed, ensuring optimal performance for wind energy applications.

Data processing is a critical aspect of machine learning (ML) workflows, involving the transformation and manipulation of raw data to make it suitable for analysis and modeling. Proper data processing is essential for ensuring the quality and reliability of the input data, which directly affects the performance of machine learning models. Algorithms are evaluated on a real-time database from the acquisition chain installed at ENET’COM.

The data processing steps are shown in Figure 2.

Figure 2. Data processing steps.

Overview of methodology

This section outlines the proposed methodology for wind speed forecasting using an ensemble learning approach. The strategy is based on a two-layer stacking model, which aims to improve prediction accuracy by combining the strengths of multiple machine learning models. The methodology begins with data preprocessing and feature extraction, followed by training of three base learners: Support Vector Regression (SVR), AdaBoost, and Random Forest (RF). These models generate preliminary predictions that are then passed to a second-layer meta-learner, which integrates their outputs to produce the final forecast. This architecture is designed to capture both linear and non-linear patterns in the wind speed data.

Ensemble learning strategy: Two-layer stacking model

Stacking is a powerful ensemble learning technique that combines multiple predictive models in a layered architecture. The first layer consists of base learners that independently learn from the training data. The outputs of these models serve as input features for a subsequent model layer, referred to as the meta-learner. This hierarchical structure helps to reduce individual model bias and variance, leading to improved generalization performance.

Layer 1: Base learners (SVR, AdaBoost, Random Forest)

In the first layer, three distinct regression models are used to capture different aspects of the wind speed data:

• Support Vector Regression (SVR) is effective at handling high-dimensional data and capturing complex relationships by using kernel functions.

• AdaBoost improves prediction by focusing on difficult instances, sequentially combining weak learners to form a strong regressor.

• Random Forest (RF) provides robustness through the aggregation of multiple decision trees trained on different subsets of data and features.

Each model is trained independently on the same training dataset and generates a separate wind speed prediction.

Layer 2: Meta-learner (Random Forest integration)

The second layer of the stack employs Random Forest as a meta-learner. It takes the outputs of the base learners as input features and learns to weight and combine them optimally. The choice of Random Forest as the meta-model is motivated by its ability to handle non-linear interactions, prevent overfitting through ensemble averaging, and maintain high predictive performance. This integration layer enhances the overall forecast by correcting potential errors from individual base models and synthesizing complementary patterns learned by each.
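The two-layer structure described above can be sketched with scikit-learn's `StackingRegressor`. This is only an illustrative sketch: the data are synthetic stand-ins (not the ENET’COM measurements), and the hyperparameter values, the chronological 80/20 split, and the use of 5-fold out-of-fold predictions to train the meta-learner are assumptions made for the example rather than the configuration used in this study.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for four meteorological inputs (hypothetical data).
n_samples = 300
X = rng.normal(size=(n_samples, 4))
y = 3.0 + np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n_samples)

# Chronological-style split: first 80% for training, last 20% for testing.
split = int(0.8 * n_samples)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Layer 1: the three base learners (SVR is scaled because it is distance-based).
base_learners = [
    ("svr", make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))),
    ("ada", AdaBoostRegressor(n_estimators=50, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
]

# Layer 2: a Random Forest meta-learner trained on the base learners'
# out-of-fold predictions (cv=5), which limits leakage into layer 2.
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)
```

Training the meta-learner on out-of-fold predictions, rather than on predictions made for the same samples the base learners were fitted on, is a common design choice to keep the second layer from simply trusting an overfitted base model.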

Model evaluation metrics (MAE, RMSE, R²)

To evaluate the performance of the proposed stacking model and its individual components, three commonly used regression evaluation metrics are employed: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination (R²). These metrics offer complementary insights into the accuracy and robustness of predictions.

Mean Absolute Error (MAE) measures the average magnitude of the differences between forecasted and observed values, irrespective of their direction. It is defined as:

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - \hat{y}_{i} |

(1)

where $y_{i}$ is the actual value, $\hat{y}_{i}$ is the predicted value, and $n$ is the number of samples.

Root Mean Square Error (RMSE) gives higher weight to large errors due to the squaring operation. It is especially useful when large errors are undesirable:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}

(2)

Coefficient of Determination (R²) evaluates the extent to which the model accounts for the variance in the observed data. It typically ranges from 0 to 1, where higher values signify superior predictive performance; it can become negative when a model fits worse than simply predicting the mean:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(3)

where $\bar{y}$ is the mean of the observed values.

These metrics together deliver a comprehensive evaluation of the predictive accuracy, error distribution, and explanatory power of the models under study.
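As a worked illustration of equations (1) to (3), the three metrics can be computed directly from their definitions. The sketch below uses plain Python and a small set of made-up values chosen only to make the arithmetic easy to follow; they are not measurements from this study.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error, equation (1)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error, equation (2)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination, equation (3)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and forecast wind speeds (illustrative values only).
observed = [2.0, 3.0, 4.0, 5.0]
forecast = [2.5, 3.0, 3.5, 5.0]

print(mae(observed, forecast))   # 0.25
print(rmse(observed, forecast))  # sqrt(0.125), about 0.3536
print(r2(observed, forecast))    # 1 - 0.5 / 5.0 = 0.9
```

In this example two of the four forecasts are off by 0.5, so the RMSE (about 0.354) exceeds the MAE (0.25), which reflects the heavier penalty that squaring places on the non-zero errors.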

Machine learning models for wind speed prediction

This research investigates three widely used machine learning algorithms, namely Random Forest, AdaBoost, and Support Vector Regression (SVR), for wind speed prediction. Furthermore, a two-layer stacking ensemble framework is implemented to integrate these base learners and leverage their complementary strengths to improve predictive performance and robustness.

Support Vector Regression (SVR) algorithm

Support Vector Regression (SVR) is a robust machine learning technique derived from the principles of Support Vector Machines (SVM) and designed for regression tasks. In the context of wind speed prediction, SVR is particularly advantageous due to its ability to handle non-linear relationships and high-dimensional data. It is widely used in time series analysis and environmental forecasting because of its generalization capability and resistance to overfitting, especially in the presence of noise.

The SVR algorithm operates under the principle of finding a function that approximates the relationship between the input features and the target variable (wind speed) while minimizing prediction errors, constrained by a specified margin of tolerance.

The key steps involved in SVR-based wind speed forecasting are as follows:

(1) Feature Vector Construction: The input space for SVR consists of carefully selected features that influence wind speed. These may include:

• Lagged wind speed values (autoregressive features),

• Meteorological parameters (e.g., temperature, humidity, and pressure),

• Time-related variables (e.g., hour of the day and seasonality indicators),

• Spatial information (if wind speed is recorded across multiple sites).

(2) Kernel Function Selection: Since wind speed data often exhibit non-linear trends, SVR employs kernel functions to map the input data into a higher-dimensional space where linear regression can be performed. The Radial Basis Function (RBF) kernel is commonly used for wind forecasting due to its flexibility and effectiveness in modeling non-linear relationships.

(3) Model Formulation: SVR seeks to find a function $f(x) = \langle w, \phi(x) \rangle + b$ that approximates the wind speed while allowing a deviation of at most $\varepsilon$ from the true target values. Errors within this margin are ignored, while larger deviations are penalized using a regularization parameter $C$. The optimization process aims to minimize the model complexity and the forecasting error simultaneously.

(4) Model Training: The training process involves solving a convex optimization problem that balances the trade-off between model flatness and tolerance to forecast error. The parameters $C$, $\varepsilon$, and kernel-specific parameters (e.g., gamma in RBF) are typically optimized using grid search and cross-validation.

(5) Forecast Generation: Once trained, the SVR model is used to forecast wind speed values for unseen input instances. The resulting model captures both short-term fluctuations and broader temporal trends, making it suitable for medium-range wind speed forecasting.

SVR is especially useful in applications where the dataset is not large but contains complex, non-linear relationships. Its controlled margin and robust generalization ability make it an effective base learner in ensemble architectures, such as stacking models, for enhancing the accuracy and reliability of wind speed forecasting.
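The role of the tolerance margin and the regularization parameter in the model formulation step can be made concrete with a small sketch. The function names, the weight vector, and the numerical values below are hypothetical and serve only to illustrate the ε-insensitive loss and the standard regularized objective, 0.5·‖w‖² + C·Σ loss, for a model whose weights are given explicitly.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample loss: zero inside the epsilon tube, linear outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


def svr_objective(weights, losses, C):
    """Flatness term 0.5 * ||w||^2 plus C times the total tube violation."""
    flatness = 0.5 * sum(w * w for w in weights)
    return flatness + C * sum(losses)


# Hypothetical wind speeds and forecasts (illustrative values only).
observed = [4.0, 5.0, 6.0]
forecast = [4.05, 5.5, 5.0]

losses = epsilon_insensitive_loss(observed, forecast, epsilon=0.1)
# The first error (0.05) lies inside the tube and costs nothing;
# the other two are penalized by their excess over epsilon: 0.4 and 0.9.

objective = svr_objective(weights=[0.5, -1.0], losses=losses, C=10.0)
# 0.5 * (0.25 + 1.0) + 10.0 * (0.0 + 0.4 + 0.9) = 13.625
```

A larger `C` makes tube violations more expensive relative to the flatness term, which pushes the fit closer to the training data; a larger `epsilon` widens the tube and ignores more of the small errors.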

AdaBoost regressor algorithm

The AdaBoost Regressor (Adaptive Boosting for Regression) is an ensemble learning method designed to enhance the performance of weak regression models by focusing on difficult-to-forecast data points. In wind speed forecasting, it is particularly effective in handling non-linear patterns and temporal fluctuations by sequentially combining multiple base regressors to produce a more accurate and resilient forecast.

The forecasting process using AdaBoost Regressor involves the following steps:

(1) Initialization of Sample Weights: All observations in the wind speed dataset are initially assigned equal weights. This ensures that the first base learner treats all data points uniformly.

(2) Training of the First Base Regressor: A weak regression model—typically a shallow decision tree (e.g., a decision stump)—is trained on the dataset. The model learns to forecast wind speed based on key input features, such as:

•Historical wind speed measurements (lagged values),

•Hour of the day or day of the year (to capture temporal patterns),

•Meteorological variables (e.g., temperature, pressure, and humidity),

•Geographical and terrain-related data, if available.

(3) Error Evaluation and Weight Update: The forecast error is computed for each data point, commonly as an absolute or squared error. Instances with larger errors are assigned higher weights, increasing their influence in the next iteration. This allows the algorithm to focus on wind speed patterns that are more difficult to forecast.

(4) Iterative Model Training: Subsequent base regressors are trained on the reweighted dataset, where higher emphasis is placed on previously poorly forecast observations. This iterative process continues for a predefined number of boosting rounds (e.g., 50–200 iterations), or until no significant improvement in forecasting accuracy is observed.

(5) Forecast Aggregation: The final wind speed forecast is generated by combining the outputs of all trained base regressors. Each model’s contribution is weighted based on its individual forecasting accuracy. The aggregated output is a weighted sum of the forecasts from all learners, producing a more stable and precise forecast than any individual model alone.
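The reweighting in steps (1), (3), and (4) can be sketched as a single boosting round. The sketch assumes the AdaBoost.R2 variant with a linear loss; the paper does not name the variant it uses, and the function name `adaboost_r2_update` is hypothetical. In AdaBoost.R2 the final forecast is usually taken as a weighted median of the learners' outputs, with weights $\log(1/\beta)$, rather than a plain weighted sum.

```python
def adaboost_r2_update(y_true, y_pred, weights):
    """One AdaBoost.R2 reweighting round with the linear loss.

    Returns (new_weights, beta); a smaller beta means a more accurate learner.
    """
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    max_error = max(errors)
    if max_error == 0:
        return list(weights), 0.0  # perfect fit: nothing to reweight
    losses = [e / max_error for e in errors]  # per-sample loss scaled to [0, 1]
    avg_loss = sum(w * l for w, l in zip(weights, losses))
    if avg_loss >= 0.5:
        raise ValueError("weak learner too inaccurate; boosting should stop")
    beta = avg_loss / (1.0 - avg_loss)
    # Well-forecast points (loss near 0) are multiplied by beta < 1,
    # so poorly forecast points gain relative weight after normalisation.
    raw = [w * beta ** (1.0 - l) for w, l in zip(weights, losses)]
    total = sum(raw)
    return [r / total for r in raw], beta
```

Starting from equal weights, a sample that the current learner misses badly ends the round with a larger share of the total weight, which is what steers the next base regressor toward it.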

AdaBoost is particularly well suited to wind speed forecasting in environments where patterns are complex and the error distribution is heterogeneous. Its ability to iteratively focus on difficult cases makes it a powerful tool when used alongside other models in a hybrid or stacked ensemble.

Random Forest Regressor Algorithm

Random forest (Figure 3) is an ensemble learning algorithm that enhances predictive accuracy and robustness by combining multiple models. Ensemble learning, a powerful machine learning paradigm, is effective for both classification and regression tasks, with this study focusing on regression using time-series data to forecast future values. By training diverse models with different features, algorithms, or hyperparameters, ensemble methods aggregate predictions through techniques like averaging or weighted voting. Common approaches include bagging, which trains models on bootstrapped data; boosting, which iteratively corrects errors from prior models; and stacking, where a meta-model learns to optimally combine base-model predictions. This work employs bagging, adaptive boosting, gradient boosting, extreme gradient boosting (XGBoost), and random forest regressors, with detailed derivations available in (Asbai and Amrouche, 2017). For comparison, standalone models such as decision trees, LSTMs, and ARIMA are also evaluated to benchmark performance against the ensemble-based approach.

Figure 3.

Random Forest Regressor algorithm.
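The bagging idea behind random forest can be sketched in plain Python. This is a simplified illustration, not the configuration used in the study: each tree is reduced to a one-split stump on a single feature, and the random feature subsampling that distinguishes a full random forest from plain bagging is omitted. All function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single feature.

    Returns (threshold, left_mean, right_mean).
    """
    best = None
    for thr in sorted(set(xs))[:-1]:  # excluding the max keeps both sides non-empty
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        m_left, m_right = mean(left), mean(right)
        sse = (sum((y - m_left) ** 2 for y in left)
               + sum((y - m_right) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, thr, m_left, m_right)
    if best is None:  # all feature values identical: fall back to the mean
        m = mean(ys)
        return xs[0], m, m
    return best[1:]


def predict_stump(stump, x):
    thr, left_mean, right_mean = stump
    return left_mean if x <= thr else right_mean


def fit_bagged_stumps(xs, ys, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Average the individual forecasts, as bagging does for regression."""
    return mean(predict_stump(s, x) for s in stumps)
```

Averaging over bootstrap-trained trees reduces the variance of any single tree, which is the source of the robustness attributed to random forest above.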

Experiments, results, and discussion

This section outlines the experimental setup, performance evaluation, and analysis of the results obtained from the proposed two-layer stacking model for wind speed forecasting. The results of the base learners (SVR, AdaBoost, and Random Forest) as well as the meta-learner (stacked Random Forest) are compared using standard evaluation metrics: MAE, RMSE, and R².

Experimental setup and data description

The dataset used in this study was collected using a real-time acquisition system located at ENET'Com, Sfax, Tunisia. The acquisition chain was specifically designed for renewable energy research and provides reliable, high-frequency meteorological measurements.

The recorded variables include:

• Wind speed (m/s)—the primary forecast target, measured at a height of 10 m,

• Air temperature (°C),

• Solar irradiation (W/m²),

• Relative humidity (%),

• Timestamp (hourly resolution).

The dataset features two representative months, October 2024 (characterizing the transition to the cold season) and May 2025 (signaling the shift to the hot season), to illustrate seasonal variability. These data were acquired under real-world operational conditions and accurately reflect local climatic patterns in the Sfax region. The acquisition system ensures precise and consistent readings, serving as a critical testbed for developing and validating intelligent forecasting and energy management systems within renewable energy applications.
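As an illustration of how hourly records like these might be turned into supervised learning samples, the sketch below builds lagged wind speed features plus the hour of day and holds out the most recent samples for testing. The number of lags, the feature set, and the 20% hold-out fraction are assumptions made for the example; the paper does not report these choices here.

```python
def make_lagged_samples(wind_speed, hour_of_day, n_lags=3):
    """Build (X, y) where each row of X holds the previous n_lags wind
    speeds followed by the hour of day of the value being forecast."""
    if len(wind_speed) != len(hour_of_day):
        raise ValueError("wind_speed and hour_of_day must have equal length")
    X, y = [], []
    for t in range(n_lags, len(wind_speed)):
        X.append(list(wind_speed[t - n_lags:t]) + [hour_of_day[t]])
        y.append(wind_speed[t])
    return X, y


def chronological_split(X, y, test_fraction=0.2):
    """Hold out the last test_fraction of samples, without shuffling."""
    cut = int(len(y) * (1 - test_fraction))
    return X[:cut], y[:cut], X[cut:], y[cut:]
```

Splitting chronologically rather than at random keeps future observations out of the training set, which matters when the inputs are lagged values of the target itself.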

Individual model performance

This subsection presents the forecasting performance evaluation of the three base learners applied independently on the wind speed dataset: Support Vector Regression (SVR), AdaBoost Regressor, and Random Forest Regressor. Each model was trained and evaluated using the preprocessed data described in Section 5, with hyperparameters optimized via grid search and five-fold cross-validation. Table 1 summarizes the results obtained on the test set in terms of Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²).

Table 1.

Comparative performance of single and ensemble models for wind speed forecasting (MAE, RMSE, R²).

Months	Models	MAE	RMSE	R²
October 2024	Random Forest	0.306	0.389	0.731
	AdaBoost	0.280	0.357	0.773
	SVR	0.301	0.410	0.701
	Ensemble Learning	0.255	0.334	0.801
May 2025	Random Forest	0.362	0.465	0.701
	AdaBoost	0.324	0.440	0.732
	SVR	0.372	0.485	0.674
	Ensemble Learning	0.314	0.429	0.745
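The three metrics reported in Table 1 can be computed from paired observations and forecasts as in the following sketch. It uses the standard definitions of MAE, RMSE, and R²; the function names are illustrative and not taken from the authors' code.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the forecast errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalises large errors more than MAE."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

MAE and RMSE are expressed in the units of the target (m/s for wind speed), whereas R² is dimensionless, so lower is better for the first two and higher is better for the third.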

Performance of the ensemble learning model

Figure 4 presents a comparison between actual wind speed measurements and the predictions generated by four different models—Random Forest, AdaBoost, Support Vector Regression (SVR), and an ensemble learning approach—over the month of October 2024. The actual wind speed is depicted as a continuous reference line, while each model’s prediction is represented by a distinct dashed line. Among the models, the ensemble learning approach demonstrates the closest alignment with the actual wind speed values, effectively capturing both the amplitude and the temporal dynamics of the observed fluctuations. AdaBoost also exhibits a satisfactory performance, particularly in regions of moderate wind speed variation. In contrast, SVR shows more pronounced deviations, frequently overestimating or underestimating peak values. The Random Forest model tends to underestimate wind speed during periods of high variability, suggesting a potential underfitting behavior. Overall, the ensemble model achieves superior predictive accuracy and temporal consistency, supporting its effectiveness in modeling complex wind speed patterns through the integration of multiple base learners.

Figure 4.

Model-based forecasting of wind speed in October 2024.

Figure 5 zooms in on a sub-period of Figure 4 to show the varying predictive capabilities of the different machine learning algorithms. While the Ensemble Learning model appears to offer the most robust performance for much of the observed period, all models face challenges in accurately predicting sudden and drastic changes in wind speed, as evidenced by the significant overestimation during the sharp decline on October 19th. This highlights an area for potential future research, focusing on improving model robustness to extreme or atypical meteorological events.

Figure 5.

Detailed wind speed forecast for October 2024 (zoomed view).

Figure 6 offers a broad view across a significant portion of May 2025, demonstrating the general efficacy of all implemented models in tracking the actual wind speed fluctuations. Notably, the Ensemble Learning approach consistently exhibits superior predictive accuracy, evidenced by its close alignment with the observed data throughout this broader temporal span. Conversely, Figure 7 provides a granular examination of model performance during a particularly volatile 6-day period within May 2025. This zoomed-in perspective reveals a shared limitation across all models: difficulty in predicting the precise magnitude and timing of abrupt and substantial decreases in wind speed to minimal values, as exemplified by the significant overestimation observed during the sharp trough on May 18th. These visualizations underline the robust capabilities of ensemble-based methods in general wind speed forecasting, while highlighting the persistent challenge of forecasting extreme and rapid meteorological shifts, which warrants further research into model resilience and outlier detection.

Figure 6.

Model-based forecasting of wind speed in May 2025.

Figure 7.

Detailed wind speed forecast for May 2025 (zoomed view).

The combined analysis, summarized in Figure 8, demonstrates that Ensemble Learning consistently outperforms the individual models (Random Forest, AdaBoost, SVR) in wind speed forecasting for both October 2024 and May 2025, as evidenced by its lower MAE and RMSE and higher R² values. However, a recurring limitation across all models and timeframes is their difficulty in predicting rapid and extreme drops in wind speed, suggesting an area for future research and model improvement, potentially through enhanced anomaly detection or the incorporation of more granular meteorological data.

Figure 8.

Model performance summary.

Comparative analysis: Single models versus ensemble model

This study provides a comprehensive evaluation of three individual machine learning models—Random Forest (RF), AdaBoost (Ad), and Support Vector Regression (SVR)—alongside an Ensemble Stacking approach for wind speed forecasting. The comparative analysis reveals distinct performance characteristics across different seasonal conditions, measured through key metrics including Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and coefficient of determination (R²).

The comparative analysis of single and ensemble models for wind speed forecasting demonstrates the advantage of an integrated approach. As shown in Table 1, for October 2024, the Ensemble Learning model outperformed each of its individual components (Random Forest, AdaBoost, and SVR), achieving the lowest MAE of 0.255, the lowest RMSE of 0.334, and the highest R² of 0.801. This trend continued into May 2025, where the Ensemble model again led with an MAE of 0.314, an RMSE of 0.429, and an R² of 0.745.
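To put these differences in relative terms, the short sketch below computes the percentage reduction in MAE and RMSE that the ensemble achieves over the best base learner in each month. The numbers are copied from Table 1; the helper names are illustrative, and the relative-reduction measure is an addition for illustration that the paper itself does not report.

```python
# (MAE, RMSE) per model and month, copied from Table 1.
TABLE_1 = {
    "October 2024": {
        "Random Forest": (0.306, 0.389),
        "AdaBoost": (0.280, 0.357),
        "SVR": (0.301, 0.410),
        "Ensemble Learning": (0.255, 0.334),
    },
    "May 2025": {
        "Random Forest": (0.362, 0.465),
        "AdaBoost": (0.324, 0.440),
        "SVR": (0.372, 0.485),
        "Ensemble Learning": (0.314, 0.429),
    },
}


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


def ensemble_gain_over_best_base(month):
    """Return (best base learner by MAE, MAE reduction %, RMSE reduction %)."""
    rows = TABLE_1[month]
    base = {name: v for name, v in rows.items() if name != "Ensemble Learning"}
    best_name = min(base, key=lambda name: base[name][0])  # lowest MAE
    ensemble = rows["Ensemble Learning"]
    return (
        best_name,
        relative_reduction(base[best_name][0], ensemble[0]),
        relative_reduction(base[best_name][1], ensemble[1]),
    )
```

In both months the best base learner by MAE is AdaBoost. Relative to it, the ensemble lowers MAE by roughly 8.9% in October 2024 and 3.1% in May 2025, and RMSE by roughly 6.4% and 2.5%, respectively.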

However, the ensemble approach is not without limitations. Its training process demands significantly greater computational resources compared to individual models, and its performance may vary when applied to regions with climatic conditions differing from the training data. Despite these challenges, the Ensemble Learning model delivers a consistent improvement over the standalone methods on all three metrics in both months, offering enhanced reliability and accuracy for wind speed forecasting. This makes it particularly valuable for applications in wind energy systems, where precise predictions are critical for operational efficiency and grid stability. Future research could explore dynamic weighting mechanisms to further optimize the ensemble’s real-time performance and adaptability to diverse meteorological conditions.

Conclusion and perspectives

This study confirmed the potential of machine learning techniques to improve the accuracy and reliability of wind speed forecasting, using real-world meteorological data from Sfax, Tunisia. Comparing three widely used models, Support Vector Regression (SVR), AdaBoost, and Random Forest, with a two-layer stacking ensemble built on them showed that combining diverse algorithms can lead to more precise predictions. Among the base learners, AdaBoost delivered the highest accuracy in both months (Table 1), while the ensemble model achieved the best overall performance, with the lowest MAE and RMSE and the highest R² in both October 2024 and May 2025.

Although the margin of the stacking ensemble over the best base learner was modest, particularly in May 2025, it offered a more balanced and adaptable forecasting solution across seasons. These results highlight the practical added value of ensemble learning strategies in renewable energy forecasting, especially when managing the uncertainties associated with wind power integration into the electrical grid. However, the added complexity and computational demand of ensemble models remain important considerations, particularly for real-time applications or deployment in regions with different climatic patterns.

Future research should explore ways to make ensemble models more adaptive to seasonal and real-time changes, potentially through dynamic weighting or online learning techniques. Integrating machine learning with traditional physical models, such as numerical weather prediction, could also create hybrid systems that better capture both data-driven and physics-based insights. Additionally, testing these approaches across diverse geographical settings and incorporating a broader range of environmental features could improve generalizability and robustness. Ultimately, the goal is to enable more intelligent, accurate, and scalable forecasting systems that support the efficient operation of wind energy infrastructure and contribute to a more sustainable energy future.

Footnotes

Author contributions

Nabiha Brahmi: Conceptualization, Methodology, Data Analysis, Writing – Original Draft, and Editing. Leila Hadj Mefteh: Writing. Maher Chaabene: Supervision, Project Administration.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data supporting the findings of this study are available on a secured website with authentication access.

References

Asbai, Amrouche (2017) Boosting scores fusion approach using front-end diversity and AdaBoost algorithm, for speaker verification. Computers & Electrical Engineering 62: 648–662.

Bauer, Thorpe, Brunet (2015) The quiet revolution of numerical weather prediction. Nature 525(7567): 47–55.

Brahmi, Chaabene (2025) Automatic hyperparameter optimization of random forest for wind speed forecasting and wind energy potential assessment, case study Sfax, Tunisia. In: Springer Book: Advances in Integrated Design and Production III. Springer, Cham, 606–614, 0.1007/978-3-032-04742-7.

Brahmi, Charfi, Maher (2017) Wind potential assessment for an efficient wind farm sizing. Wind Engineering 41(6): 369–382.

Brahmi et al. (2023) Machine learning-based wind speed prediction: a study on gradient boosting regressor. In: Algorithm, 14th international renewable energy congress (IREC). doi: 10.1109/IREC59750.2023.10389466.

Chen et al. (2023) Assessing the environmental impacts of renewable energy sources: a case study on air pollution and carbon emissions in China. Journal of Environmental Management 345: 118525. doi: 10.1016/j.jenvman.2023.118525.

Cristea A-G, Spiru, Simona Paraschiv et al. (2021) Assessment of onshore wind energy potential under temperate continental climate conditions. Energy Reports 8: 105–114.

Cristea A-G, Spiru, Paraschiv (2022) Assessment of onshore wind energy potential under temperate continental climate conditions. Energy Reports 8: 105–114.

Liu (2022) A review of short-term wind power generation forecasting methods. Energy Reports 8: 456–469.

Liu, Zhang (2022) Uncovering wind power forecasting uncertainty sources and their impacts. Renewable and Sustainable Energy Reviews 156: 111993.

Liu (2023) Enhancing short-term wind speed forecasting using graph attention and frequency-enhanced mechanisms. Energy 263: 125789.

Michaelides (2021) Primary energy use and environmental effects of electric vehicles. World Electric Vehicle Journal 12(3): 138.

Notton et al. (2018) Intermittent and stochastic character of renewable energy sources: consequences, cost of intermittence and benefit of forecasting. Renewable and Sustainable Energy Reviews 87: 96–105.

Palmer, Hagedorn (2006) Predicting uncertainty in forecasts of weather and climate. Reports on Progress in Physics 69(3): 671–712.

Paraschiv et al. (2020) Technical and economic analysis of a solar air heating system integration in a residential building wall to increase energy efficiency by solar heat gain and thermal insulation. Energy Reports 6: 197–206.

Paraschiv (2023) Contribution of renewable energy (hydro, wind, solar and biomass) to decarbonization and transformation of the electricity generation sector for sustainable development. Energy Reports 9: 535–544. doi: 10.1016/j.egyr.2023.07.024.

Qin (2015) Wind speed forecasting approach using secondary decomposition algorithm and Elman neural network. Applied Energy 157: 183–194.

Rahmani et al. (2023) Next-generation IoT devices: sustainable eco-friendly manufacturing, energy harvesting, and wireless connectivity. IEEE Journal of Microwaves 3(1): 237–255.

Ruoso, Ribeiro JLD, Olaru (2024) Electric vehicles' impact on energy balance: three-country comparison. Renewable and Sustainable Energy Reviews 203: 114768.

Sadorsky (2021) Wind energy for sustainable development: driving factors and future outlook. Journal of Cleaner Production 289: 125779.

Shi, Chen, Wang et al. (2015) Convolutional LSTM network: a machine learning approach for precipitation nowcasting. Preprint arXiv:1506.04214. https://arxiv.org/abs/1506.04214

Spiru (2023) Assessment of renewable energy generated by a hybrid system based on wind, hydro, solar, and biomass sources for decarbonizing the energy sector and achieving a sustainable energy transition. Energy Reports 9: 167–175.

Tamor, Stechel (2022) Electrification of transportation means a lot more than a lot more electric vehicle. iScience 25(6): 104376.

Wang, Chen (2023) Ultra-short-term wind power forecasting based on deep Bayesian learning. Renewable Energy 182: 789–798.

Zhang, Wang (2020) Integrating wind energy into the power grid: impact and solutions. Energy Reports 6: 123–130.

Zhou, Wang (2021) Wind power potential and intermittency issues in the context of renewable energy development. Energy Policy 149: 112007.
