Abstract
The integration of large-scale regional water-wind-solar hybrid energy systems threatens power grid stability because their persistent output fluctuations are difficult for conventional automatic generation control (AGC) systems to mitigate. To address this issue and make better use of frequency modulation resources, this study presents an AGC optimization strategy based on bidirectional communication. The proposed approach improves the reinforcement learning (RL) algorithm with a dual-estimator framework, enabling dynamic power distribution among generating units. The methodology also coordinates grid power flow adjustments, achieving integrated uncertainty modeling and coordinated optimization for regional hydro-wind-solar power systems. Experimental validation shows that the improved control strategy raises Control Performance Standard (CPS) metrics by 2.2%–5.8% compared with conventional methods, confirming its superior regulation capability.
Keywords
Introduction
Amid the urgent global energy transition, the depletion of fossil fuels and the serious environmental and carbon-emission problems they cause have made renewable and clean energy the core force for sustainable energy development. Hydro-wind-solar power generation occupies a pivotal position in AGC because of its rich resource reserves and significant environmental benefits. 1 However, hydro-wind-solar power generation is highly uncertain. 2 This uncertainty stems from a variety of complex factors: wind and photovoltaic (PV) outputs fluctuate with the weather, and hydroelectric units are subject to inter-annual, intra-annual, and intra-day variations in natural inflow. As a result, power allocation control for hydro-wind-solar generation systems faces a number of difficult problems, 3 and complex boundary constraints make effective scheduling decisions harder to reach. When solving for the optimal allocation strategy, mathematical programming methods struggle to guarantee accuracy, 4 while intelligent optimization methods tend to fall into local optima and converge slowly. 5 Therefore, an innovative approach is urgently needed to handle the dynamic uncertainties of hydro-wind-solar power generation more effectively and to provide robust support for the secure, stable operation and optimal dispatch of power systems.
Modeling the dynamic uncertainty of regional hydro-wind power generation falls within the control and allocation domain of AGC. 6 Research on AGC control dates back to proportional-integral (PI) control, which is widely used in AGC because it eliminates steady-state error in simple power systems. 7 Hota and Mohanty 6 applied sliding-mode variable-structure control to the AGC controller and proposed an inverse controller with adaptive capability. This method is robust to external disturbances; however, as power systems become more nonlinear and regions more tightly interconnected, AGC control strategies must account for nonlinearity and interconnection. Barakat 8 developed a three-layer AGC model for dynamic simulation of the whole power-system process and proposed a control strategy for nonlinear deviations; however, the method depends heavily on the model parameters and on how the model is constructed, which limits its generalizability.
Model predictive control (MPC) requires an embedded representation of the dynamic system, usually a linear empirical model of the system identified during operation. 9 Hasan et al. 10 proposed a cooperative AGC strategy based on distributed MPC that achieves dynamic coordination between wind and thermal power plants and further improves overall AGC performance. Unlike PI control, MPC can predict future states and act on those predictions 11 ; however, it must satisfy constraints during control and relies on an empirical system model. Adaptive control (APC) 12 can keep the system operating safely and stably despite a degree of uncertainty. 13 Wu and Wang 14 applied two adaptive dynamic programming control algorithms to AGC to reduce the power deviation of the interconnected grid. Compared with MPC, APC controllers have fewer constraints and require less prior knowledge, but they are considerably more complex and costly than conventional controllers.
In the emerging multi-region interconnected power system, information links between regions are close and complex. 15 To consume renewable energy efficiently, the system must be considered as a whole and renewable generation must be given dispatch priority. 16 Traditional AGC control strategies are limited and struggle with complex decisions. RL, an important branch of machine learning, solves complex problems by interacting with the environment and accumulating knowledge through its own learning. Yin et al. 17 applied Q-learning to AGC, modeling the dynamic uncertainty of the total power of a simple power system by continuously updating a state-action table, which improves the adaptability of AGC to its control objectives. Li and Yu 18 proposed an imitation learning strategy that incorporates eligibility traces into RL to realize AGC for islanded power systems; their results show that the controller converges faster and has better dynamic performance. Xi et al. 19 proposed multi-agent RL to model AGC power uncertainty and form an optimal joint control strategy for interconnecting different control regions. Zhao et al. 20 proposed a bio-inspired population cooperation strategy that incorporates win-lose evaluation criteria and a spatiotemporal tunneling mechanism, guaranteeing rapid convergence to a Nash equilibrium. Built on a stochastic cooperative multi-agent framework, this method enables frequent information exchange between agents and improves the efficiency of power allocation. Muduli et al. 21 established a three-tier RL architecture to dynamically allocate AGC power, exploiting the independence, autonomy, and collaboration of agents to achieve coordinated control that remains logically unified while being physically distributed.
The analysis of existing research shows that wind and PV units have volatile, uncertain outputs, while hydroelectric units exhibit a counter-regulation characteristic caused by the water hammer effect. Large-scale grid integration of these renewable sources weakens power system stability and makes frequency control more difficult. AGC, as the main means of secondary frequency regulation, therefore faces new problems. For this reason, this paper models the dynamic uncertainty of regional hydro-wind power generation based on RL. First, a hydro-wind-solar power allocation strategy for the AGC region based on two-way communication is proposed; fully accounting for the characteristics of hydroelectric units, it corrects the power to be allocated according to the actual response of the hydro-wind-solar generating units to avoid wasting frequency modulation (FM) resources. The RL algorithm is then improved with dual estimators to maximize long-term returns. Finally, the regional power grids use the improved RL algorithm in a multi-agent dynamic game to obtain their respective total generation targets, and the total generation command is dispatched to each power generation group (PGG). Each PGG is treated as a multi-agent system in which the improved RL algorithm dynamically distributes the aggregate regulation command among individual units. Together, these steps model the dynamic uncertainty of regional hydro-wind-solar power generation and maintain frequency stability across the multi-region hybrid energy system. Experimental results show that the proposed method reduces the area control error by 18.01%–47.41%, offering a new approach to coordinated multi-energy AGC regulation under high uncertainty.
Methods and materials
Reinforcement learning
RL is a computational framework in which autonomous agents learn optimal policies through interaction with the environment, as shown in Figure 1. This interaction can be described by a Markov decision process (MDP). 22 The MDP is the modeling tool of RL, and almost any RL process can be formulated as an MDP. 23 An MDP is usually written as the tuple $\langle S, A, P, R, \gamma \rangle$, where $S$ is the state space, $A$ the action space, $P$ the state transition probability, $R$ the reward function, and $\gamma \in [0, 1)$ the discount factor.
Figure 1. Reinforcement learning model.
The policy of an agent specifies how it acts in the current state. 24 There are two types of policies: stochastic and deterministic. A stochastic policy can be expressed as a probability distribution over actions:
$$\pi(a \mid s) = P(A_t = a \mid S_t = s)$$
A deterministic policy maps each state directly to an action, denoted by $a = \mu(s)$.
The goal of RL is to learn the policy that maximizes the total accumulated reward of the agent, also called the return. In practice, the rewards are not simply added up; they are discounted before being summed, so the return of a sequence of actions starting at time $t$ is
$$G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$$
The expected return of the agent starting from state $s$ is defined as the value of state $s$, denoted by the value function
$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right]$$
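To make the return and value definitions concrete, the following minimal Python sketch computes a discounted return for a finite reward sequence and a Monte Carlo estimate of a state value. The function names and the `sample_episode` callable are illustrative assumptions, not part of the method; the discount factor 0.9 in the example matches the value reported for the experiments.

```python
from typing import Callable, List


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Return G_t = r_{t+1} + gamma*r_{t+2} + gamma^2*r_{t+3} + ... for a finite episode."""
    g = 0.0
    # Work backwards using the recursion G_t = r_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def mc_state_value(sample_episode: Callable[[], List[float]], gamma: float, n_episodes: int) -> float:
    """Monte Carlo estimate of V(s): the average discounted return of episodes starting in s.

    sample_episode is a hypothetical callable that rolls out the current policy from s
    and returns the reward sequence r_{t+1}, r_{t+2}, ...
    """
    total = 0.0
    for _ in range(n_episodes):
        total += discounted_return(sample_episode(), gamma)
    return total / n_episodes


# Example with gamma = 0.9: three unit rewards give 1 + 0.9 + 0.81 (about 2.71).
example_return = discounted_return([1.0, 1.0, 1.0], 0.9)
```

The backward recursion avoids computing powers of gamma explicitly and is the same recursion that temporal-difference methods bootstrap on.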
Automatic generation control
AGC is a closed-loop automatic control system built on the energy management system (EMS) and is the basic function of active-power dispatch and control in the grid. 25 It calculates the active power deficit of the grid from the frequency deviation of the regional grid and the power change on the inter-regional tie-lines, adjusts the active power output of the generating units in real time, and ensures safe, stable, and economic grid operation. 26
In the power system, AGC is realized by adjusting the output of each AGC unit and belongs to secondary frequency regulation. Specifically, AGC in a modern interconnected power system has three basic functions. (1) Control the active power output of generating units to track load changes in real time and balance supply and demand in the region; this relates to primary frequency regulation of the units in the grid. (2) Keep the frequency deviation of the regional grid within the permissible range so that the grid frequency stays within its rated range, 27 while holding the power exchanged on the inter-regional tie-lines at the scheduled value to balance active power between regions; this relates to secondary frequency control, also known as load frequency control (LFC). (3) Subject to system security constraints, distribute active load reasonably among the generating units in the control area to ensure economic grid operation; this relates to tertiary frequency regulation, also known as economic dispatch (EDC).
The advantage of this control mode is that each regional controller only needs to collect the frequency deviation and tie-line exchange power of its own region, with no interactive communication between regions, so the control principle is simple and practical. Centralized AGC based on tie-line and frequency-deviation control modes is therefore widely used in interconnected grid control systems in China and abroad. However, as the penetration of strongly stochastic renewable sources in the power system grows, traditional centralized AGC can hardly meet the control requirements. 17 In this context, a multilevel dynamic power control strategy that can cope with strong stochastic disturbances is needed to solve the problems faced by traditional centralized AGC.
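As an illustration of the tie-line bias control mode, the sketch below computes an area control error (ACE) from local measurements only. It assumes the common textbook convention ACE = ΔP_tie + BΔf with a positive frequency bias B in MW/Hz, a 50 Hz nominal frequency, and a hypothetical bias value; actual grid operators may use different sign conventions and units.

```python
from dataclasses import dataclass


@dataclass
class AreaMeasurement:
    tie_actual_mw: float            # measured net tie-line export of the area
    tie_scheduled_mw: float         # scheduled net tie-line export
    freq_hz: float                  # measured system frequency
    freq_nominal_hz: float = 50.0   # assumed nominal frequency
    bias_mw_per_hz: float = 1000.0  # hypothetical frequency bias B (> 0 in this convention)


def area_control_error(m: AreaMeasurement) -> float:
    """Tie-line bias ACE = dP_tie + B * df (textbook sign convention with B > 0).

    A negative ACE means the area is under-generating and should raise output.
    """
    d_ptie = m.tie_actual_mw - m.tie_scheduled_mw
    d_f = m.freq_hz - m.freq_nominal_hz
    return d_ptie + m.bias_mw_per_hz * d_f


# Exporting 20 MW less than scheduled while frequency sags by 0.02 Hz.
ace = area_control_error(AreaMeasurement(tie_actual_mw=480.0, tie_scheduled_mw=500.0, freq_hz=49.98))
```

Both terms use only quantities measured inside the area, which is why this mode needs no inter-regional communication.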
Mathematical modeling of regional water-wind power allocation in AGC
AGC framework and its communication model
To address the power stability problem faced by the power system, this paper models the power allocation strategy for hydro and wind generation in the AGC region, improving frequency stability while preserving the regulation capability of the AGC system. First, a hydro-wind power allocation strategy for the AGC area based on two-way communication is proposed; it corrects the power to be allocated according to the actual response of the hydro and wind units to avoid wasting FM resources.
Traditional AGC generally uses one-way communication: the dispatching center periodically calculates the power that each AGC hydro or wind unit should adjust and sends it to that unit. Because the center cannot know the exact status of a unit, the scheduling strategy may be inadequate and can even lead to serious accidents. At present, most AGC systems in China use two-way communication. In addition to the dispatching center sending commands to the AGC hydro and wind units, the units periodically report their operating data, and the dispatching center uses the latest unit data to calculate the commands for the next AGC cycle. However, current AGC systems make little use of these operating data (for example, for real-time calculation of reserve capacity) and have not yet exploited the additional value of bidirectional communication. 28 To make full use of the operating data fed back by the units, the strategy in this paper adopts the bidirectional communication mode.
The dispatching center continuously calculates the regional control demand from the system frequency deviation. Every 4∼6 s, it obtains the latest ARR, calculates the power to be allocated, performs the allocation, and sends the result to each unit; meanwhile, each AGC unit reports its operating status to the dispatching center every 0.5∼2 s to support the center's decisions.
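The dual-rate timing of the bidirectional mode can be sketched as a small discrete-time simulation, assuming fixed periods of 5 s for dispatch and 1.5 s for unit feedback (values chosen from within the stated ranges). It shows which unit report each allocation would use and how stale that report is; the function name and the fixed periods are assumptions.

```python
import math
from typing import List, Tuple


def simulate_dual_rate(
    horizon_ms: int, dispatch_ms: int = 5000, feedback_ms: int = 1500
) -> Tuple[List[Tuple[int, int]], int]:
    """Simulate the two-way AGC loop on an integer millisecond clock.

    Units report every feedback_ms; the dispatch center allocates every dispatch_ms
    using the most recent report. Returns (dispatch_time_ms, age_of_report_ms) pairs
    and the number of reports received. Fixed periods are an assumption; real
    systems jitter within the 4~6 s and 0.5~2 s windows.
    """
    tick = math.gcd(dispatch_ms, feedback_ms)
    latest_report = 0
    reports = 0
    dispatches: List[Tuple[int, int]] = []
    for t in range(0, horizon_ms, tick):
        if t % feedback_ms == 0:
            latest_report = t  # unit uploads its operating state
            reports += 1
        if t % dispatch_ms == 0:
            # The allocation for the next AGC cycle uses the freshest unit data.
            dispatches.append((t, t - latest_report))
    return dispatches, reports
```

Because feedback is several times faster than dispatch, each allocation works with data that is at most one feedback period old.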
Eligibility of regional hydro-wind-solar generating units for dispatch signals
Let
Suppose the regulation direction is unchanged in the current cycle; for example, the unit was asked to increase power in the previous cycle and is asked to increase power again in the current cycle. If the unit has not yet fully responded to the previous command, applying a new command may cause a larger response deviation. Therefore, in this paper, a new task is assigned to a unit only after it has responded to the previous command, which keeps the response deviation low.
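A minimal sketch of this same-direction rule follows, assuming each unit's state is summarized by its previous signed command and the increment achieved so far. The data layout and the 1 MW completion tolerance are hypothetical, not values from the paper.

```python
from typing import Dict, List, Tuple


def units_ready_for_new_task(
    units: Dict[str, Tuple[float, float]], new_direction: int, tol_mw: float = 1.0
) -> List[str]:
    """Return units that may receive a new same-direction task this cycle.

    units maps a unit name to (previous_command_mw, achieved_mw), both signed
    increments relative to the unit's output at the start of the previous cycle.
    new_direction is +1 (raise) or -1 (lower). tol_mw is a hypothetical tolerance
    for deciding that the previous command has been fully followed.
    """
    ready = []
    for name, (prev_cmd, achieved) in units.items():
        prev_dir = (prev_cmd > 0) - (prev_cmd < 0)
        if prev_dir not in (0, new_direction):
            continue  # direction reversal: handled by the correction strategy instead
        if abs(prev_cmd - achieved) <= tol_mw:
            ready.append(name)  # previous command essentially completed
    return ready


fleet = {"H1": (10.0, 9.5), "W1": (10.0, 4.0), "PV1": (0.0, 0.0), "H2": (-5.0, -5.0)}
ready_units = units_ready_for_new_task(fleet, +1)
```

Here H1 and the idle PV1 qualify, W1 is still ramping toward its previous target, and H2 is excluded because raising it would reverse its direction.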
Correction strategy for the power to be allocated
Assuming that the AGC cycle is
However, when the regulation direction changes in the current cycle (for example, a unit was asked to increase power in the previous cycle and is asked to decrease power in the current cycle), a unit that was increasing power may not yet have fully responded and may continue to increase its output while another unit receives a decrease command and begins to reduce its output. The two movements cancel, so the power deficit does not actually change significantly. In this case, it is therefore necessary to immediately stop the response of the units whose regulation is reversed in the current cycle and make a correction for
After
It should be noted that equation (9) is only a modification of the data within the dispatch center to facilitate subsequent optimal scheduling calculations, and is not a direct reference value sent down to the unit.
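Because the correction equation itself is not recoverable from the text, the sketch below only illustrates one plausible bookkeeping rule for the direction-reversal case, under a stated assumption about how the requirement is computed. The sign convention and function name are hypothetical and should not be read as equation (9).

```python
from typing import Iterable, Tuple


def corrected_power_to_allocate(required_mw: float, halted_units: Iterable[Tuple[float, float]]) -> float:
    """Correct the power to be allocated after a regulation-direction reversal.

    Hypothetical convention: required_mw is the signed requirement computed as if all
    previously issued commands will be completed. halted_units lists
    (previous_command_mw, achieved_mw) for units whose response is stopped now.
    Stopping them cancels their unfinished increments, which already shifts total
    output by -(unfinished) relative to that assumption, so the amount is credited back.
    """
    unfinished = sum(prev - achieved for prev, achieved in halted_units)
    return required_mw + unfinished


# A 50 MW decrease is needed; a unit halted 20 MW short of a raise command already
# delivers 20 MW of it, leaving 30 MW to allocate.
remaining_mw = corrected_power_to_allocate(-50.0, [(20.0, 0.0)])
```

As in the text, such a correction only changes the dispatch center's internal figure; it is not a setpoint sent to any unit.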
Optimization of federated aggregation strategies based on knowledge distillation
Dynamic uncertainty modeling framework for regional hydropower, wind power, and solar power generation power
Existing regional hydro-wind power allocation algorithms that obtain the total regional generation have difficulty controlling renewable generation and reducing the total carbon emissions of the system. In addition, hydro-wind-solar generating units that incorporate many new energy sources differ substantially in ramping capability and spatial distance, so the traditional centralized AGC model can hardly meet the new demands of large-scale interconnected power systems; this has led to the development of multi-agent hierarchical control frameworks. Such a framework uses a clustering idea to classify the generating units of each regional grid into different PGGs, as shown in Figure 2.
Figure 2. Dynamic uncertainty modeling framework for regional hydro-wind-solar generation power.
First, the regional power grids play a multi-agent game, using the improved RL algorithm to achieve optimal regional power distribution, and the leader of each PGG receives the total power command. Each PGG is viewed as a multi-agent system: the RL algorithm dynamically allocates the total regulation power command to each individual unit, while inter-regional power flow regulation maintains frequency stability across the multi-energy, multi-region grid. When the PGG leader lacks sufficient remaining generating capacity or ramping capability at the current moment, it triggers an election within the PGG to hand the leader role to another agent. The improved RL algorithm uses the remaining generating capacity and ramping capability at the current moment as the election criteria: the agent with the strongest combination of the two is the first to send a signal informing the units of its own PGG and of the other PGGs that a new leader has been elected. By dynamically and optimally allocating the total regulation power command of each regional grid to each leader, the dynamic uncertainty of regional hydro-wind power generation is modeled, ensuring holistic coordination and optimal operation of the interconnected multi-area power system.
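The leader election rule can be sketched as follows, assuming the "strongest combination" is a weighted sum of headroom and ramp rate, each normalized by the largest value in the PGG. The equal weights, the step-down test, and all names are illustrative assumptions rather than the paper's criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UnitState:
    name: str
    headroom_mw: float        # remaining generating capacity in the regulation direction
    ramp_mw_per_min: float    # current ramping capability


def leader_must_step_down(leader: UnitState, demand_mw: float, window_min: float) -> bool:
    """Trigger an election if the leader lacks headroom or cannot ramp to the demand in time."""
    need = abs(demand_mw)
    return leader.headroom_mw < need or leader.ramp_mw_per_min * window_min < need


def elect_leader(units: List[UnitState], w_cap: float = 0.5, w_ramp: float = 0.5) -> UnitState:
    """Pick the unit with the strongest weighted mix of normalized headroom and ramp rate."""
    max_cap = max(u.headroom_mw for u in units) or 1.0   # avoid division by zero
    max_ramp = max(u.ramp_mw_per_min for u in units) or 1.0

    def score(u: UnitState) -> float:
        return w_cap * u.headroom_mw / max_cap + w_ramp * u.ramp_mw_per_min / max_ramp

    return max(units, key=score)


pgg = [UnitState("H1", 120.0, 30.0), UnitState("W1", 40.0, 60.0), UnitState("PV1", 20.0, 80.0)]
new_leader = elect_leader(pgg)
```

Shifting the weights toward ramp rate would favor fast units for short, sharp disturbances, which is one reason to keep them as tunable parameters.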
Improvements in reinforcement learning algorithms
The core of the RL algorithm is iteration, and each iteration consists of state evaluation and action selection. 29 State evaluation is the interaction between the algorithm and the current system, in which the value of the current environment state is computed; action selection uses this state value to choose the action that best influences the environment. By alternating between state evaluation and action selection, the policy eventually converges to the optimal policy sequence. Because standard RL updates only from the present state to the next state, it follows a one-step greedy policy. Iterating with a one-step greedy policy, however, biases the policy toward short-term optimization, which cannot meet the long-term stability needs of the power system. Therefore, this paper integrates eligibility traces, which provide information backtracking, into the RL algorithm to minimize the control deviation and maximize the long-term return.
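To show how eligibility traces propagate credit backward, here is a tabular sketch of Watkins's Q(λ) with accumulating traces. It is a standard algorithm used here as a stand-in, not necessarily the paper's exact update; λ = 0.8 is an assumed value, while the defaults α = 0.001 and γ = 0.9 follow the experimental settings.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, List, Tuple

Key = Tuple[Hashable, int]


def watkins_q_lambda_step(
    Q: DefaultDict[Key, float],
    E: DefaultDict[Key, float],
    s: Hashable, a: int, r: float, s_next: Hashable, a_next: int,
    actions: List[int],
    alpha: float = 0.001, gamma: float = 0.9, lam: float = 0.8,
) -> float:
    """One tabular Watkins Q(lambda) update with accumulating eligibility traces."""
    a_star = max(actions, key=lambda b: Q[(s_next, b)])
    if Q[(s_next, a_next)] == Q[(s_next, a_star)]:
        a_star = a_next  # on ties, treat the chosen next action as greedy
    delta = r + gamma * Q[(s_next, a_star)] - Q[(s, a)]
    E[(s, a)] += 1.0  # mark the visited state-action pair as eligible
    for key in list(E.keys()):
        # Every eligible pair receives credit in proportion to its trace.
        Q[key] += alpha * delta * E[key]
        # Traces decay by gamma*lambda while the policy stays greedy, else they are cut.
        E[key] = gamma * lam * E[key] if a_next == a_star else 0.0
    return delta


Q: DefaultDict[Key, float] = defaultdict(float)
E: DefaultDict[Key, float] = defaultdict(float)
watkins_q_lambda_step(Q, E, 0, 0, 1.0, 1, 0, [0, 1], alpha=0.5)
watkins_q_lambda_step(Q, E, 1, 0, 1.0, 2, 1, [0, 1], alpha=0.5)
```

Because the first pair's trace has decayed to γλ but not to zero, the reward received one step later still raises its Q value, which a one-step update would not do.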
The RL algorithm uses a one-step greedy strategy, 30 and in recent years it has been shown that multi-step greedy strategies outperform the one-step greedy strategy in convergence performance. Therefore, in this chapter,
In traditional RL, maximum expectation estimation 31 relies heavily on greedy selection to maximize the current reward: when making decisions, it tends to choose the action with the largest Q value, which causes action values to be overestimated during policy exploration. This paper therefore adopts a dual-estimator approach, which reduces the discrepancy in Q values and gives the algorithm fast convergence. The improved RL algorithm adopts mutually exclusive value operations
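Assuming the dual-estimator idea corresponds to double Q-learning (van Hasselt, 2010), in which one table selects the greedy action and the other evaluates it, a tabular sketch follows, together with a small experiment showing why the single maximum estimator overestimates. The paper's "mutually exclusive value operations" are not spelled out in the text, so this illustrates the technique rather than the authors' algorithm.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Hashable, List, Optional, Tuple

Key = Tuple[Hashable, int]


def double_q_update(
    QA: DefaultDict[Key, float], QB: DefaultDict[Key, float],
    s: Hashable, a: int, r: float, s_next: Hashable, actions: List[int],
    alpha: float = 0.001, gamma: float = 0.9, rng: Optional[random.Random] = None,
) -> None:
    """One double Q-learning step: one table selects the action, the other evaluates it."""
    if rng is None:
        rng = random.Random()
    # Randomly pick which table to update; the other one evaluates.
    sel, ev = (QA, QB) if rng.random() < 0.5 else (QB, QA)
    a_star = max(actions, key=lambda b: sel[(s_next, b)])  # selection by one estimator
    target = r + gamma * ev[(s_next, a_star)]               # evaluation by the other
    sel[(s, a)] += alpha * (target - sel[(s, a)])


def overestimation_demo(n_actions: int = 10, trials: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """All true action values are 0; estimates are noisy. Compare the two estimators."""
    rng = random.Random(seed)
    single = double = 0.0
    for _ in range(trials):
        qa = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
        qb = [rng.gauss(0.0, 1.0) for _ in range(n_actions)]
        best = max(range(n_actions), key=qa.__getitem__)
        single += qa[best]   # max estimator: biased upward
        double += qb[best]   # independent estimate of the chosen action: unbiased
    return single / trials, double / trials
```

With ten actions whose true values are all zero, the single estimator's average comes out well above zero (near the expected maximum of ten standard normal draws), whereas the double estimator's average stays close to zero.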
Dynamic optimal allocation strategy of power generation based on improved RL algorithm
Building on the improved RL algorithm, the strong consensus strategy in this paper is based on coordinated control: each PGG is conceptualized as a multi-agent network, and the coordinated consensus strategy updates each agent's state through information exchange with neighboring agents so that the states of all agents converge to a common value. The strong consensus strategy elects a leader that is responsible for allocating PGG power and absorbing larger disturbances. When the leader fails to meet the performance requirements, it voluntarily steps down and an election is run within the PGG. Each unit in the PGG sends an election message to the other units after
Within the regional grid, each PGG is treated as a multi-agent system and each generating unit within the PGG as an agent. The consensus algorithm uses graph theory so that each agent updates its own information state based on the information states of its neighbors, and the specified information states of all agents in the network converge to a common value. Assume that there are P agents in the PGG, and the information exchange between agents
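A minimal sketch of a discrete-time consensus update follows, assuming a hypothetical adjacency matrix with self-loops and row-normalized weights (the paper's exact weight matrix is not given in the text). In this example the directed ring is strongly connected and the weights are doubly stochastic, so all states converge to the average of the initial values; with merely row-stochastic weights they converge to a weighted average instead.

```python
from typing import List


def consensus_step(x: List[float], adjacency: List[List[float]]) -> List[float]:
    """x_i(k+1) = sum_j d_ij * x_j(k), with d_ij = a_ij / sum_j a_ij (row-stochastic)."""
    n = len(x)
    x_next = []
    for i in range(n):
        row_sum = sum(adjacency[i])
        x_next.append(sum(adjacency[i][j] / row_sum * x[j] for j in range(n)))
    return x_next


def run_consensus(x0: List[float], adjacency: List[List[float]], iters: int = 200) -> List[float]:
    """Apply the consensus update repeatedly starting from the initial states x0."""
    x = list(x0)
    for _ in range(iters):
        x = consensus_step(x, adjacency)
    return x


# Hypothetical directed ring of three units with self-loops: unit i listens to itself and unit i+1.
ring = [[1.0, 1.0, 0.0],
        [0.0, 1.0, 1.0],
        [1.0, 0.0, 1.0]]
final_states = run_consensus([3.0, 6.0, 9.0], ring)
```

Each agent uses only its own state and its in-neighbors' states, so the update can run locally on every unit.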
The improved RL algorithm selects the ramp time of each generating unit as its consensus variable, so that the PGG leader takes on more of the disturbances and power commands. For the wth generating unit of the ith PGG in the regional grid, the ramp time used as the consensus variable can be expressed as follows.
The update equation for the consensus variable of each agent in the PGG, based on ramping capability, is as follows.
Meanwhile, when the power command of a generating unit exceeds its capacity constraint, the power requirement of the unit is set at its maximum ramp time as follows.
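Since the ramp-time update equations are not reproduced in the text, the sketch below computes only the point a ramp-time consensus would settle at: unsaturated units share a common ramp time t, so unit w receives t·r_w, and units whose share would exceed their headroom are fixed at their limit while the rest is redistributed. The restriction to a positive (raise) command, the function name, and the data layout are assumptions.

```python
from typing import List


def equal_ramp_time_allocation(total_mw: float, ramp_rates: List[float], headroom: List[float]) -> List[float]:
    """Split a positive regulation command so unsaturated units share a common ramp time.

    ramp_rates (MW/min, > 0) and headroom (MW) are per unit. Units whose share would
    exceed their headroom are clamped at the limit and the remainder is redistributed.
    """
    n = len(ramp_rates)
    alloc = [0.0] * n
    free = set(range(n))
    remaining = total_mw
    while free:
        t = remaining / sum(ramp_rates[i] for i in free)  # common ramp time of free units
        saturated = {i for i in free if t * ramp_rates[i] > headroom[i]}
        if not saturated:
            for i in free:
                alloc[i] = t * ramp_rates[i]
            break
        for i in saturated:
            alloc[i] = headroom[i]  # unit held at its capacity limit
            remaining -= headroom[i]
        free -= saturated
    return alloc


split = equal_ramp_time_allocation(90.0, [10.0, 20.0, 30.0], [100.0, 100.0, 20.0])
```

If the command exceeds the total headroom, every unit ends at its limit and the shortfall stays unallocated, which is where a higher-level controller would have to step in.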
When the agents exchange information frequently and the gain is constant, the multi-agent system reaches cooperative consensus if the network topology of the directed graph L over the discrete-time sequence is strongly connected. To control regional generation accurately and ensure regional frequency stability, the control part uses a normalized linear weighting of the dimensionless instantaneous ACE value
Experimental results and analyses
To test the performance of the improved RL (EORL) algorithm on the optimization model of the regional hydro-wind power allocation strategy, operational data from a demonstration zone of a distributed renewable energy grid are used for simulation studies. A simulated grid with 126 nodes and 62 generating units (35 hydroelectric, 18 wind, and 19 photovoltaic) was constructed from the grid data, grid structure, and operating rules. All algorithms were implemented in the PyTorch framework and run on a computer with an Intel i7-10700K (3.8 GHz) CPU and 32 GB of RAM. The learning rate was set to 0.001, the discount factor to 0.9, and the weighting factor to 0.01.
Performance comparison of improved RL algorithms.
To verify the effectiveness of the EORL allocation strategy designed in this research, the AEAGC, 17 DPMSRL, 20 and APIDGC 21 strategies are selected as benchmark methods; the system frequency deviation curves of the different strategies are shown in Figure 3. Assuming that node 20 loses 100 MW of normal load at 4 s, the system is examined under the different AGC control strategies. With IGRL, ANSRL, and RL, the AGC action causes the system frequency to oscillate at ultra-low frequency with a period of about 50 s, and the maximum frequency deviation stays at about 0.14 Hz owing to the strong frequency regulation effect of the DC load. With the EORL strategy, the AGC units do not cause frequency oscillation and stop acting after 150 s, indicating better stabilizing regulation performance. With ITRL, no oscillation occurs, but a large frequency deviation remains after regulation, and the system frequency deviation finally settles at about 0.05 Hz. With EORL, the deviation settles at about 0.007 Hz, indicating that the proposed strategy has better frequency recovery performance.
Figure 3. System frequency deviation curves of different strategies.
In this paper, CPS, 35 regional control error
Comparison of control performance of different allocation strategies.
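Since CPS is one of the evaluation metrics, the following sketch computes CPS1 in the form used by NERC's BAL-001 standard: the compliance factor averages [ACE/(−10B)]·Δf over one-minute samples in the evaluation period and divides by ε1², and CPS1 = (2 − CF) × 100%. Here B is in MW/0.1 Hz and negative by that convention, and the ε1 value in the example is illustrative; the paper's own CPS settings are not given in the text.

```python
from typing import Sequence


def cps1_percent(
    ace_mw: Sequence[float],
    delta_f_hz: Sequence[float],
    bias_mw_per_0p1hz: float,
    epsilon1_hz: float,
) -> float:
    """CPS1 = (2 - CF) * 100 %, with CF = mean([ACE / (-10 B)] * df) / epsilon1^2.

    ace_mw and delta_f_hz are one-minute averages; B is negative (MW/0.1 Hz).
    """
    if not ace_mw or len(ace_mw) != len(delta_f_hz):
        raise ValueError("ACE and frequency series must be non-empty and equal in length")
    if bias_mw_per_0p1hz >= 0:
        raise ValueError("this convention expects a negative frequency bias B")
    terms = [(ace / (-10.0 * bias_mw_per_0p1hz)) * df for ace, df in zip(ace_mw, delta_f_hz)]
    cf = (sum(terms) / len(terms)) / epsilon1_hz ** 2
    return (2.0 - cf) * 100.0


# Illustrative values: B = -50 MW/0.1 Hz, epsilon1 = 0.018 Hz.
example_cps1 = cps1_percent([5.0], [0.01], -50.0, 0.018)
```

A value of at least 100% is the usual compliance threshold; an ACE that opposes the frequency deviation (helping restore frequency) makes the products negative and pushes CPS1 above 200%.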
Conclusion
The dual-carbon target is driving the rapid transformation of the power system toward cleaner, lower-carbon operation. However, the large-scale integration of renewable sources such as PV, wind, and hydropower, together with random loads, increases system stochasticity, and traditional control methods struggle to eliminate the resulting frequency fluctuations, which poses a great challenge to power system stability. To address these issues, this paper models the dynamic uncertainty of regional hydro-wind power generation based on RL. First, an AGC regional hydro-wind power allocation strategy is designed on the basis of two-way communication, correcting the power to be allocated according to the actual response of the hydro and wind generating units to avoid wasting FM resources. The RL algorithm is then improved with dual estimators to maximize long-term returns. Finally, the regional grids play a multi-agent dynamic game through the improved RL algorithm to obtain the total regional power and send the total generation command to each PGG. All units in a PGG form an agent network, and a coordinated consensus strategy updates system states through information exchange between neighboring agents. Using the improved RL algorithm, the total regulation power command is dynamically allocated to individual generating units, modeling the dynamic uncertainty of regional hydro-wind-solar power generation. Experimental results show that the proposed approach significantly reduces regional frequency deviation, supports scientific scheduling and stable operation of the power system, and helps promote the efficient consumption of renewable energy and the low-carbon transformation of the power system.
Although the method presented in this article achieves satisfactory control performance, several areas still need improvement and extension. (1) In real power grids, regions are interconnected through complex transmission lines. The more interconnected areas there are, the larger the topology and state space that must be considered, and modeling becomes rapidly more difficult. The controllability of the proposed method still needs to be improved, and this aspect can be expanded in future research. (2) The theoretical basis of this paper's approach is RL, which is good at decision-making but weaker than deep learning at perceiving the environment. Artificial intelligence is developing rapidly and deep learning has become an important research field; combining RL with the strong perception ability of deep learning could let the two complement each other's strengths.
Footnotes
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research received financial assistance from the Luquan Wudongde Power Plant of Three Gorges Jinsha River Yunchuan Hydropower Development Co., Ltd. [Z522402001].
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
