Dynamic uncertainty model of regional hydro-wind-solar power generation based on reinforcement learning

Abstract

The large-scale integration of regional hydro-wind-solar hybrid energy systems challenges power grid stability, because their persistent output fluctuations are difficult for conventional automatic generation control (AGC) systems to mitigate. To address this issue and make better use of frequency regulation resources, this study presents an AGC optimization strategy based on bidirectional communication. The proposed approach enhances a reinforcement learning (RL) algorithm with a dual-estimator framework, enabling dynamic power distribution among generating units. The method also coordinates inter-regional power flow adjustments, achieving integrated uncertainty modeling and coordinated optimization of regional hydro-wind-solar power systems. Experimental validation shows that the enhanced control strategy improves Control Performance Standard (CPS) metrics by 2.2%–5.8% compared with conventional methods, confirming its superior regulation capability.

Keywords

automatic generation control; area control; uncertainty modeling; power allocation; reinforcement learning

Introduction

Amid the urgent global energy transition, the depletion of traditional fossil fuels and the serious environmental and carbon-emission problems they cause have made renewable and clean energy the core of sustainable energy development. Owing to its abundant resources and significant environmental benefits, hydro-wind-solar generation occupies a pivotal position in AGC.¹ However, hydro-wind-solar generation is highly uncertain.² This uncertainty stems from a variety of complex factors: wind and photovoltaic (PV) outputs fluctuate with changing weather, and hydroelectric units are subject to inter-annual, intra-annual, and intra-day variations in natural inflow. These factors make power allocation control of hydro-wind-solar systems difficult,³ and complex boundary constraints further complicate effective scheduling decisions. When solving for the optimal allocation strategy, mathematical programming methods struggle to guarantee accuracy,⁴ while intelligent optimization methods tend to fall into local optima and converge slowly.⁵ There is therefore an urgent need for an approach that handles the dynamic uncertainty of hydro-wind-solar generation more effectively and provides robust support for the secure, stable operation and optimal dispatch of power systems.

Dynamic uncertainty modeling of regional hydro-wind power generation falls within the control and allocation domain of AGC.⁶ Research on AGC control dates back to proportional-integral (PI) control, which has been widely used in AGC because it achieves zero steady-state error for simple power systems.⁷ Hota and Mohanty⁶ applied sliding-mode variable-structure control to the AGC controller and proposed an inverse controller with adaptive capability. This method is robust to external disturbances; however, as the nonlinearity of power systems intensifies and interconnections between regions grow, AGC control strategies must become nonlinear and interconnection-aware. Barakat⁸ developed a three-layer AGC model for dynamic simulation of the whole power system process and proposed a control strategy for power systems under nonlinear deviations; however, the method depends heavily on model parameters and on how the model is constructed, which limits its generalizability.

Model predictive control (MPC) requires an embedded dynamic representation of the system, which is usually built as a linear empirical model of the system during operation.⁹ Hasan et al.¹⁰ proposed a cooperative AGC strategy based on distributed MPC, which achieves dynamic cooperation between wind and thermal power plants and further improves overall AGC performance. Unlike PI control, MPC can predict future states and take corresponding control actions¹¹; however, it must satisfy constraints during control and relies on an empirical system model. Adaptive control (APC)¹² can ensure safe and stable operation of the system under a certain degree of uncertainty.¹³ Wu and Wang¹⁴ applied two adaptive dynamic programming control algorithms to AGC to reduce the power deviation of the interconnected grid. Compared with MPC, APC controllers have fewer constraints and require less a priori knowledge; however, they are considerably more complex and costly than conventional controllers.

In the new form of multi-region interconnected power systems, the information links between regions are close and complex.¹⁵ To consume new energy efficiently, the system must be considered as a whole and priority must be given to new-energy consumption.¹⁶ Traditional AGC control strategies are limited in their ability to make such complex decisions. RL, an important branch of machine learning, solves complex problems by interacting with the environment and accumulating knowledge through its own learning. Yin et al.¹⁷ applied Q-learning to AGC and, by continuously updating the state-action table, achieved dynamic uncertainty modeling of the total power of a simple power system model, improving the adaptability of AGC to its control objectives. Li and Yu¹⁸ proposed an imitation learning strategy that incorporates eligibility traces into RL to realize AGC for islanded power systems; their results show that the controller converges faster and has better dynamic performance. Xi et al.¹⁹ proposed multi-agent RL to model the uncertainty of AGC power and form an optimal joint control strategy, solving the problem of interconnecting different control regions. Zhao et al.²⁰ proposed a bio-inspired population cooperation strategy that incorporates win-lose evaluation criteria and a spatiotemporal tunneling mechanism, guaranteeing rapid convergence to a Nash equilibrium. Built on a stochastic cooperative framework for multi-agent systems, this method enables frequent information exchange among agents and improves the efficiency of power allocation.
Muduli et al.²¹ established a three-tier RL architecture for dynamic AGC power allocation, exploiting the independence, autonomy, and collaboration of agents to achieve coordinated control that maintains logical unity while physically distributing control.

The above analysis of existing research shows that wind and PV units have volatile and uncertain outputs, and that hydroelectric units exhibit inverse-regulation characteristics due to the water hammer effect. Large-scale grid integration of these renewable sources weakens power system stability and makes frequency control more difficult, so AGC, an important means of secondary frequency regulation, faces new problems. For this reason, this paper models the dynamic uncertainty of regional hydro-wind power generation based on RL. First, a hydro-wind-solar power allocation strategy for the AGC region based on two-way communication is proposed; taking full account of the characteristics of hydroelectric units, it corrects the power to be allocated according to the actual response of the hydro-wind-solar generating units, preventing waste of frequency-modulation (FM) resources. Next, the RL algorithm is improved with dual estimators to maximize long-term returns. Finally, the regional grids use the improved RL algorithm in a multi-agent dynamic game to obtain their respective total generation targets, and each total generation command is dispatched to the power generation groups (PGGs). Each PGG is treated as a multi-agent system in which the improved RL algorithm dynamically distributes the aggregate power adjustment command among individual units. This achieves dynamic uncertainty modeling of regional hydro-wind-solar generation and collectively maintains frequency stability across multi-regional hybrid energy systems. Experimental results show that the proposed method reduces the area control error (ACE) by 18.01%–47.41%, offering a new approach to coordinated multi-energy AGC regulation in highly uncertain environments.

Methods and materials

Reinforcement learning

RL is a computational framework in which autonomous agents learn optimal policies through interaction with the environment, as shown in Figure 1. This interaction process can be described by a Markov decision process (MDP).²² The MDP is the standard modeling tool for RL, and almost any RL process can be modeled as an MDP.²³ An MDP is usually written as the tuple $(S, A, R, P)$, where S is the state space, i.e., the set of all possible states; A is the action space, i.e., the set of all possible actions of an agent; R is the reward function, whose value is the feedback from the environment after the agent performs an action; and P is the state transition probability, which is a property of the environment.

Figure 1. Reinforcement learning model.

An agent's policy determines how the agent acts in its current state.²⁴ There are two common types of policies: stochastic and deterministic. A stochastic policy can be expressed as a conditional probability distribution over actions:

π (s, a) = P (a ∣ s)

(1)

A deterministic policy, denoted by $π (s) = a$, outputs a single action a for the current state s rather than a probability distribution over actions. During RL training, the agent learns by continuously interacting with the environment: it takes an action a according to the environment state s, the environment responds with a reward r, and the state transitions from s to $s^{'}$. This continual interaction produces trajectories of the form $s_{1}, a_{1}, r_{1}, \dots, s_{t}, a_{t}, r_{t}, s_{t + 1}, \dots$, which are generated by the underlying MDP.
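To make the interaction loop concrete, the sketch below rolls out a short trajectory on a small MDP using either a stochastic policy π(a | s) or a deterministic policy π(s) = a. The two-state transition table `TRANSITIONS`, both policies, and all numbers are hypothetical illustrations, not part of the paper's power-system model.

```python
import random

# Hypothetical two-state MDP: TRANSITIONS[s][a] -> list of (probability, next_state, reward)
TRANSITIONS = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.5)], 1: [(1.0, 1, 2.0)]},
}

def stochastic_policy(state, rng):
    """pi(a|s): pick action 1 with probability 0.7 in every state."""
    return 1 if rng.random() < 0.7 else 0

def deterministic_policy(state, rng=None):
    """pi(s) = a: always pick action 1."""
    return 1

def step(state, action, rng):
    """Sample (s', r) from the environment's transition probabilities P."""
    u, cumulative = rng.random(), 0.0
    for prob, next_state, reward in TRANSITIONS[state][action]:
        cumulative += prob
        if u < cumulative:
            return next_state, reward
    return next_state, reward  # guard against floating-point round-off

def rollout(policy, start_state=0, horizon=5, seed=0):
    """Generate the trajectory s_1, a_1, r_1, ..., s_T, a_T, r_T."""
    rng = random.Random(seed)
    state, trajectory = start_state, []
    for _ in range(horizon):
        action = policy(state, rng)
        next_state, reward = step(state, action, rng)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory

print(rollout(deterministic_policy))
print(rollout(stochastic_policy, seed=3))
```

Even under the deterministic policy the trajectory is random, because the environment's transitions P are stochastic.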

The goal of RL is to learn the policy that maximizes the agent's cumulative reward, also called the return. In practice, the return is not the plain sum of the rewards; instead, each reward is discounted before summation, so the return of a sequence of actions starting at time t is

R_{t} = r_{t} + γ r_{t + 1} + γ^{2} r_{t + 2} + γ^{3} r_{t + 3} + \dots = \sum_{k = 0}^{\infty} γ^{k} r_{t + k}

(2)

where $γ$ is the discount rate, indicating the weight of future rewards at the current moment. The value of $γ$ lies in [0, 1]: the closer it is to 0, the more the current reward dominates and the more future rewards can be ignored; the closer it is to 1, the more important future rewards become.
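As a quick illustration of equation (2), the helper below computes the discounted return of a finite reward sequence; truncating the infinite sum at the end of the available rewards is an assumption of this sketch. It accumulates backwards, using $R_t = r_t + γ R_{t+1}$.

```python
def discounted_return(rewards, gamma):
    """R_t = sum_k gamma^k r_{t+k}, truncated to the rewards provided."""
    total = 0.0
    for reward in reversed(rewards):  # backward accumulation: R_t = r_t + gamma * R_{t+1}
        total = reward + gamma * total
    return total

rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, 0.0))  # 1.0: only the immediate reward counts
print(discounted_return(rewards, 0.9))  # future rewards partially discounted
print(discounted_return(rewards, 1.0))  # 3.0: all rewards weighted equally
```

The two extremes mirror the text: γ = 0 keeps only the current reward, while γ = 1 weights every future reward equally.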

The expected return of the agent starting from state s is defined as the value of state s, denoted by the value function $V^{π} (s)$ below, where $E_{π}$ is the expectation when executing policy $π$ from state s, and $s^{'}$ denotes the successor state of s.

V^{π} (s) = E_{π} [r_{t} + γ V^{π} (s^{'}) | s_{t} = s]

(3)
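Equation (3) is a fixed-point equation, so $V^{π}$ can be found by repeatedly applying its right-hand side (iterative policy evaluation). The sketch below does this for a hypothetical two-state MDP whose policy-averaged transition matrix `P_pi` and expected rewards `r_pi` are made-up values, and cross-checks the result against the closed-form linear solve.

```python
import numpy as np

# Hypothetical 2-state MDP under a fixed policy pi:
# P_pi[s, s'] = sum_a pi(a|s) P(s'|s, a); r_pi[s] = expected immediate reward in s.
P_pi = np.array([[0.5, 0.5],
                 [0.2, 0.8]])
r_pi = np.array([1.0, 0.0])
gamma = 0.9

def evaluate_policy(P_pi, r_pi, gamma, tol=1e-10, max_iter=10_000):
    """Iterate V <- r_pi + gamma * P_pi @ V, the Bellman expectation backup of Eq. (3)."""
    V = np.zeros_like(r_pi)
    for _ in range(max_iter):
        V_new = r_pi + gamma * P_pi @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

V = evaluate_policy(P_pi, r_pi, gamma)
# Closed-form check: V = (I - gamma * P_pi)^{-1} r_pi
V_exact = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
print(V, V_exact)
```

Because the backup is a γ-contraction, the iteration converges for any γ < 1.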

Similarly, the state-action value function $Q^{π} (s, a)$, also called the Q function, is defined as

Q^{π} (s, a) = E_{π} [\sum_{k = 0}^{\infty} γ^{k} r_{t + k} ∣ s_{t} = s, a_{t} = a]

(4)
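The Q function of equation (4) and the value function of equation (3) are linked by the standard one-step relation $Q^{π}(s, a) = R(s, a) + γ \sum_{s'} P(s' \mid s, a) V^{π}(s')$, with $V^{π}(s) = \sum_a π(a \mid s) Q^{π}(s, a)$. The sketch below computes both on a hypothetical two-state, two-action MDP; the arrays `P`, `R`, and `pi` are illustrative values only.

```python
import numpy as np

# Hypothetical MDP: P[a, s, s'] transition probabilities, R[s, a] expected rewards
P = np.array([[[1.0, 0.0], [1.0, 0.0]],    # action 0: always go to state 0
              [[0.2, 0.8], [0.0, 1.0]]])   # action 1: mostly go to state 1
R = np.array([[0.0, 1.0],
              [0.5, 2.0]])
pi = np.array([[0.3, 0.7],                 # pi[s, a] = pi(a|s)
               [0.3, 0.7]])
gamma = 0.9

# V^pi from the linear Bellman system of Eq. (3)
P_pi = np.einsum("sa,ast->st", pi, P)
r_pi = np.sum(pi * R, axis=1)
V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

# Q^pi(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) V^pi(s')
Q = R + gamma * np.einsum("ast,t->sa", P, V)

print(Q)
print(np.allclose(V, np.sum(pi * Q, axis=1)))  # True: V is the policy-weighted average of Q
```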

Automatic generation control

AGC is a closed-loop automatic control system built on the energy management system (EMS) and is a basic function of active power dispatch and control in the grid.²⁵ It calculates the active power imbalance of the grid from the frequency deviation of the regional grid and the power change on the inter-regional tie-lines, adjusts the active power output of the generating units in real time, and ensures safe, stable, and economic grid operation.²⁶ AGC is realized by adjusting the output of each AGC unit and belongs to secondary frequency regulation. Specifically, AGC in modern interconnected power systems has the following three basic functions.

(1) Control the active power output of generating units to track load changes in real time and balance supply and demand in the region; this relates to the basic frequency regulation of units in the grid.

(2) Keep the frequency deviation of the regional grid within the permissible range so that the grid frequency stays within its rated range,²⁷ and keep the power exchanged on inter-regional tie-lines at the scheduled values to maintain active power balance between regions. This relates to secondary frequency control, also known as load frequency control (LFC).

(3) Subject to system security constraints, distribute active load reasonably among the generating units in the control area to ensure economic grid operation; this relates to tertiary frequency regulation, also known as economic dispatch control (EDC).

The advantage of this scheme is that each regional controller only needs to measure the frequency deviation and tie-line exchange power in its own region, with no communication between regions, so the control principle is simple and practical. Therefore, centralized AGC based on tie-line and frequency deviation control modes is commonly used in interconnected grid control systems in China and abroad. However, as the penetration of highly stochastic new energy sources increases, traditional centralized AGC can hardly meet the control requirements of the power system.¹⁷ It is therefore necessary to develop a multilevel dynamic power control strategy that can adapt to strong stochastic disturbances and overcome the problems of traditional centralized AGC.
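A common textbook way to combine the two local measurements named above is the area control error under tie-line bias control, ACE = ΔP_tie + B·Δf. The sketch assumes that form, with B a positive frequency-bias coefficient in MW/Hz; sign conventions differ between grid codes, and the bias and measurement values are hypothetical.

```python
def area_control_error(delta_p_tie_mw, delta_f_hz, bias_mw_per_hz):
    """ACE = dP_tie + B * df under tie-line bias control (assumed convention, B > 0 in MW/Hz).

    delta_p_tie_mw: actual minus scheduled net tie-line export (MW).
    delta_f_hz: actual minus nominal frequency (Hz).
    """
    return delta_p_tie_mw + bias_mw_per_hz * delta_f_hz

# Hypothetical area: exports 50 MW above schedule while frequency sags by 0.02 Hz
ace = area_control_error(50.0, -0.02, 1000.0)
print(ace)
```

Under this convention a positive ACE (about 30 MW here) means the area is over-generating relative to its obligation and should reduce output; only local measurements are needed, which is the simplicity the paragraph describes.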

Mathematical modeling of regional water-wind power allocation in AGC

AGC framework and its communication model

To address the power stability problems faced by power systems, this paper models the power allocation strategy of hydro and wind generation in the AGC region, improving frequency stability while preserving the regulation capability of the AGC system. First, a power allocation strategy for hydro and wind power in the AGC area based on two-way communication is proposed; it corrects the power to be allocated according to the actual response of the hydro and wind units to prevent waste of FM resources.

Traditional AGC generally uses one-way communication: the dispatching center periodically calculates the power that each hydro and wind generating set in the AGC should adjust and sends it to each set. However, without knowledge of the exact unit status, the scheduling strategy may be inadequate and may even lead to serious accidents. At present, most AGC systems in China use two-way communication: in addition to the commands sent from the dispatching center to the AGC hydro and wind units, the units periodically report their operating data, and the dispatching center uses the latest unit operation to calculate the commands for the next AGC cycle. However, current AGC systems make limited use of unit operating data (for example, for real-time calculation of reserve capacity) and have not yet exploited the additional value of bidirectional communication.²⁸ To make full use of the operating data fed back from the units, the strategy in this paper adopts the bidirectional communication mode.

The dispatching center continuously calculates the regional control demand from the system frequency deviation. Every 4–6 s, it obtains the latest area regulation requirement (ARR), calculates the power to be allocated, performs the allocation, and sends the results to each unit; meanwhile, each AGC unit reports its operating status to the dispatching center every 0.5–2 s to support its decision-making.

Dispatchability flag of regional hydro-wind-solar generating sets

Let $u_{i}$ be a flag indicating whether hydro-wind turbine i in the region can accept dispatch in the current cycle, with 1 meaning dispatchable and 0 non-dispatchable. It is defined as follows.

u_{i} (t) = \begin{cases} 1, & | P_{r e f, i} - P_{i} | < Δ P_{g a p} \\ 0, & | P_{r e f, i} - P_{i} | \geq Δ P_{g a p} \end{cases}

(5)

where $Δ P_{g a p}$ is the maximum response error allowed given the action dead zone, mechanical factors, and similar effects of the hydro-wind generating set. This paper ignores the effect of communication delay on the calculation and assumes that the AGC cycle is an integer multiple of the unit's data upload cycle, so the start of each AGC cycle coincides with the receipt of the units' latest information.
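Equation (5) is a per-unit threshold test, which vectorizes directly. In the sketch below, the function name, the three-unit snapshot, and the value of ΔP_gap are hypothetical illustrations.

```python
import numpy as np

def dispatchable_flags(p_ref, p_actual, delta_p_gap):
    """Eq. (5): u_i = 1 if |P_ref,i - P_i| < dP_gap, else 0."""
    p_ref = np.asarray(p_ref, dtype=float)
    p_actual = np.asarray(p_actual, dtype=float)
    return (np.abs(p_ref - p_actual) < delta_p_gap).astype(int)

# Hypothetical snapshot of three AGC units (MW) at the start of an AGC cycle
u = dispatchable_flags([100.0, 80.0, 60.0], [99.5, 74.0, 60.2], delta_p_gap=1.0)
print(u)  # [1 0 1]
```

The second unit is still 6 MW away from its reference, so it is flagged as not yet ready for a new dispatch task.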

Consider first the case in which the regulation direction is unchanged: for example, a unit asked to increase power in the previous cycle is again asked to increase power in the current cycle. If the unit has not yet fully responded to the previous command, applying a new command may cause a larger response deviation. Therefore, this paper assigns a new task to a unit only after it has responded to the previous command, keeping the response deviation low.

Power to be allocated correction strategy

Let the AGC cycle be $T_{A G C}$, the Area Regulation Requirement (ARR) acquired in the current cycle be $P_{A R R, c u r}$, and the ARR acquired in the previous cycle be $P_{A R R, p r e}$. The power $Δ P_{T B D}$ to be allocated in the current cycle is then

Δ P_{T B D} = P_{A R R, c u r} - P_{A R R, p r e}

(6)

However, when the regulation direction changes in the current cycle (for example, a unit asked to increase power in the previous cycle is asked to decrease power in the current cycle), a unit that was increasing power may not yet have fully responded and may keep increasing, while another unit receives a command to decrease power and begins to do so, so the power deficit does not actually change significantly. In this case, the response of units still performing the now-reversed regulation must be stopped immediately and $Δ P_{T B D}$ corrected; that is, when $P_{A R R, c u r} \cdot P_{A R R, p r e} < 0$, equation (7) is applied.

Δ P_{T B D, r e} = Δ P_{T B D} + Δ P_{r e}

(7)

where $Δ P_{r e}$ is the correction power for this cycle, calculated by equation (8). It means that, on top of $Δ P_{T B D}$, the outstanding power of units that have not yet completed the reversed regulation is cut off.

Δ P_{r e} = \sum_{i = 1}^{N_{A G C}} (P_{r e f, i} - P_{i}) (1 - u_{i})

(8)

After $Δ P_{T B D}$ is corrected, the reference values of the hydro-wind generating units in the AGC area must also be corrected: units that have not yet fully responded immediately stop their remaining response and become available for the current cycle's dispatch, which increases the FM resources that can be deployed. The correction is

P_{r e f, i} = \begin{cases} P_{r e f, i}, & u_{i} = 1 \\ P_{i}, & u_{i} = 0 \end{cases}

(9)

It should be noted that equation (9) is only a modification of the data within the dispatch center to facilitate subsequent optimal scheduling calculations, and is not a direct reference value sent down to the unit.
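The correction steps of equations (6)–(9) can be sketched as a single function. The function name, argument layout, and the numbers in the usage example are hypothetical; the flags `u` are assumed to come from equation (5), and equation (9) is applied only to a copy held inside the dispatch center, as the text notes.

```python
import numpy as np

def power_to_allocate(p_arr_cur, p_arr_pre, p_ref, p_actual, u):
    """Eqs. (6)-(9): power to allocate this AGC cycle and the dispatch center's internal references.

    p_ref, p_actual: reference and measured outputs of the N_AGC units (MW).
    u: dispatchability flags u_i from Eq. (5).
    """
    p_ref = np.asarray(p_ref, dtype=float)
    p_actual = np.asarray(p_actual, dtype=float)
    u = np.asarray(u, dtype=int)

    delta_p_tbd = p_arr_cur - p_arr_pre                     # Eq. (6)
    if p_arr_cur * p_arr_pre >= 0:                          # regulation direction unchanged
        return float(delta_p_tbd), p_ref.copy()

    delta_p_re = np.sum((p_ref - p_actual) * (1 - u))       # Eq. (8): unfinished response
    delta_p_tbd_re = delta_p_tbd + delta_p_re               # Eq. (7)
    p_ref_internal = np.where(u == 1, p_ref, p_actual)      # Eq. (9): stop unfinished units
    return float(delta_p_tbd_re), p_ref_internal

# Hypothetical cycle: ARR flips from +30 MW to -20 MW; the second unit is still 6 MW short
dp, ref = power_to_allocate(-20.0, 30.0, [100.0, 80.0, 60.0], [99.5, 74.0, 60.2], [1, 0, 1])
print(dp)            # -44.0
print(ref.tolist())  # [100.0, 74.0, 60.0]
```

Stopping the second unit's unfinished 6 MW increase means 6 MW less reduction is needed, so the power to allocate shrinks from −50 MW to −44 MW.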

Optimization of federated aggregation strategies based on knowledge distillation

Dynamic uncertainty modeling framework for regional hydro-wind-solar generation power

Existing regional hydro and wind power allocation algorithms that derive the total regional generation have difficulty controlling new-energy generation and reducing total system carbon emissions. Moreover, hydro-wind-solar generating sets that incorporate many new energy sources differ substantially in ramping capability and spatial distance, so the traditional centralized AGC model can hardly meet the new demands of large-scale interconnected power systems; this has motivated the multi-agent hierarchical control framework. To handle these differences, the framework uses a clustering idea to divide the generating units of each regional grid into different PGGs, as shown in Figure 2.

Figure 2. Dynamic uncertainty modeling framework for regional hydro-wind-solar generation power.

First, the regional grids play a multi-agent game, using the improved RL algorithm to obtain the optimal regional power distribution, and the leader of each PGG receives the group's total power command. Each PGG is itself treated as a multi-agent system: the RL algorithm dynamically allocates the total regulation power command to each unit, while inter-regional power flow regulation maintains frequency stability across the multi-energy, multi-region grid. When the PGG leader lacks sufficient remaining generation capacity or ramping capability at the current moment, it triggers an election within the PGG to hand the leader position to another agent. The improved RL algorithm uses the remaining generation capacity and ramping capability of each unit at the current moment as the election criteria: the agent with the strongest combination of the two is the first to send a signal informing the units of this PGG and the other PGGs that a new leader has been elected. By dynamically and optimally allocating each regional grid's total regulation power command to the leaders, dynamic uncertainty modeling of regional hydro-wind power generation is realized, ensuring holistic coordination and optimal operation of the interconnected multi-area power system.

Improvements in reinforcement learning algorithms

RL algorithms are iterative, and each iteration consists of state evaluation and action selection.²⁹ State evaluation interacts with the current system and computes the value of the current environment state; action selection uses this state value to choose the optimal action that acts on the environment. By alternating between state evaluation and action selection, the policy eventually converges to the optimal policy sequence. Because each iteration looks only from the present state to the next state, standard RL follows a one-step greedy policy. However, iterating a one-step greedy policy tends toward short-term optimality and cannot meet the long-term stability requirements of the power system. Therefore, this paper integrates eligibility traces, which provide information backtracking, into the RL algorithm to minimize control deviation and maximize long-term return.

Standard RL uses a one-step greedy policy,³⁰ whereas recent studies have shown that multi-step greedy policies converge better than one-step greedy policies. Therefore, this paper incorporates $Q (λ)$ into the RL algorithm to speed up the convergence of dynamic uncertainty modeling of regional hydro-wind power generation. The agent performs exploratory actions to obtain rewards from the current environment, from which the error of the current value function and its evaluated counterpart are computed as follows.

σ_{t} = R (s_{t}, a_{t}) + γ_{1} Q^{'} (s_{t + 1}, a_{g}) - Q^{'} (s_{t}, a_{t})

(10)

M_{t} = R (s_{t}, a_{t}) + γ_{1} Q_{t}^{'} (s_{t + 1}, a_{g}) - Q_{t}^{'} (s_{t}, a_{g})

(11)

where $a_{g}$ is the greedy action; $σ_{t}$ is the Q-value error of the agent at step t; $M_{t}$ is the evaluated Q-function error; $Q_{t}^{'} (s_{t}, a_{g})$ is the value of the Q function at iteration step t under state $s_{t}$ and action $a_{g}$; and $γ_{1}$ is the discount factor.
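The two error terms of equations (10) and (11) have the same shape as the errors in Peng's Q(λ), where σ_t corrects the visited state-action pair and M_t is propagated backwards along the eligibility traces. The source does not spell out the trace update, so the sketch below assumes the standard Peng-style scheme; it also reads the last term of equation (11) as the greedy value in $s_t$. The table sizes, step size `alpha`, and trace-decay `lam` are hypothetical.

```python
import numpy as np

def q_lambda_step(Q, E, s, a, r, s_next, alpha=0.1, gamma=0.9, lam=0.9):
    """One Peng-style Q(lambda) update built from the errors of Eqs. (10) and (11).

    Q, E: (n_states, n_actions) Q-table and eligibility-trace table, modified in place.
    Returns (sigma_t, M_t).
    """
    a_g = int(np.argmax(Q[s_next]))             # greedy action a_g in s_{t+1}
    target = r + gamma * Q[s_next, a_g]
    sigma = target - Q[s, a]                    # Eq. (10): error at the action taken
    m = target - Q[s].max()                     # Eq. (11): error at the greedy action in s_t
    E *= gamma * lam                            # decay all eligibility traces
    Q += alpha * m * E                          # backtrack M_t along earlier visits
    Q[s, a] += alpha * sigma                    # correct the current pair with sigma_t
    E[s, a] += 1.0                              # mark (s_t, a_t) as eligible
    return float(sigma), float(m)

Q, E = np.zeros((2, 2)), np.zeros((2, 2))
print(q_lambda_step(Q, E, s=0, a=1, r=1.0, s_next=1))  # (1.0, 1.0)
q_lambda_step(Q, E, s=1, a=0, r=0.0, s_next=0)
print(Q)  # the first pair also receives part of the second error via its trace
```

The traces let a single reward update several earlier state-action pairs, which is the multi-step backtracking the text relies on for faster convergence.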

In traditional RL, maximum-expectation estimation³¹ relies heavily on greedy maximization of the current reward and tends to choose the action with the largest Q value, so the exploration process overestimates action values. This paper therefore adopts a double-estimator approach, which speeds up convergence by reducing this overestimation of Q values. The improved RL algorithm replaces the single value function Q with two independent value functions $Q_{A}$ and $Q_{B}$, and obtains the greedy policies $π_{A}^{*}$ and $π_{B}^{*}$ by cross-maximizing the two estimators, as shown in the following equations.

π_{A}^{*} (s) = \arg \max_{a} Q_{B}^{*} (s, a)

(12)

π_{B}^{*} (s) = \arg \max_{a} Q_{A}^{*} (s, a)

(13)
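For reference, the sketch below shows standard tabular double Q-learning (van Hasselt, 2010), the textbook instance of the double-estimator idea: one estimator chooses the greedy next action and the other evaluates it. The source gives only the cross-selection of equations (12) and (13), not the full update, so this is a general illustration rather than the paper's EORL update; table sizes and step sizes are hypothetical.

```python
import numpy as np

def double_q_update(QA, QB, s, a, r, s_next, rng, alpha=0.1, gamma=0.9):
    """One tabular double Q-learning step; QA and QB are modified in place.

    A fair coin picks the estimator to update. That estimator selects the greedy
    next action and the other one evaluates it, which reduces the overestimation
    caused by taking a max over a single noisy estimator.
    """
    if rng.random() < 0.5:
        a_star = int(np.argmax(QA[s_next]))
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
    else:
        b_star = int(np.argmax(QB[s_next]))
        QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])

def greedy_action(QA, QB, s):
    """Act greedily with respect to the sum of both estimators."""
    return int(np.argmax(QA[s] + QB[s]))

rng = np.random.default_rng(0)
QA, QB = np.zeros((2, 2)), np.zeros((2, 2))
double_q_update(QA, QB, 0, 1, 1.0, 1, rng)
print(QA[0, 1] + QB[0, 1])  # 0.1: exactly one estimator was updated
```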

Dynamic optimal allocation strategy of power generation based on improved RL algorithm

Building on the improved RL algorithm, the strong consistency strategy in this paper is derived from a coordinated control strategy in which each PGG is modeled as a multi-agent network; the coordinated consistency strategy updates states through information exchange between neighboring agents so that the states of all agents converge to a common value. The strong consistency strategy elects a leader, which allocates the PGG power and absorbs larger disturbances. When the leader fails to meet the performance requirements, it voluntarily steps down and an election is run in the PGG: each unit sends an election message to the other units after waiting $T_{i w}$ seconds, and a unit that receives an election message stops its clock and does not send its own. $T_{i w}$ is given by

T_{i w} = 4 - \frac{P_{i w} - P_{i w}^{'}}{P_{i}^{\max}}

(14)

where $P_{i w}$ and $P_{i w}^{'}$ are the capacity and the current output of the wth generator in $P G G_{i}$, respectively.
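Equation (14) turns spare capacity into a countdown: the more headroom a unit has, the shorter it waits, so it broadcasts first and wins the election. The sketch below assumes $P_{i}^{\max}$ is a given normalizing capacity for the PGG (here, hypothetically, the largest unit capacity), and breaks exact ties by the lower index, which the source does not specify.

```python
def election_wait_times(capacity, current_power, p_max):
    """Eq. (14): T_iw = 4 - (P_iw - P'_iw) / P_i^max, in seconds."""
    return [4.0 - (cap - p) / p_max for cap, p in zip(capacity, current_power)]

def elect_leader(capacity, current_power, p_max):
    """The first unit whose timer expires wins; the others stop their clocks."""
    waits = election_wait_times(capacity, current_power, p_max)
    return min(range(len(waits)), key=waits.__getitem__), waits

# Hypothetical PGG with three units (MW); p_max assumed to be the largest unit capacity
leader, waits = elect_leader([300.0, 200.0, 150.0], [280.0, 100.0, 60.0], p_max=300.0)
print(leader)  # 1: the unit with 100 MW of headroom waits least
```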

Within the regional grid, each PGG is treated as a multi-agent system and each generating unit within it as an agent. The consensus algorithm uses graph theory so that each agent updates its information state from those of its neighboring agents, and the specified information states of all agents in the network converge to a common value. Assume that the PGG contains P agents and that information exchange between agents $v_{p}$ and $v_{q} (p \neq q; p, q = 1, \dots, P)$ occurs randomly with probability $b_{p q}$ ($0 \le b_{p q} \le 1$), independently of the other agents in the PGG. Agents that exchange information are interconnected. The Laplacian matrix $L = [l_{p q}]$, which reflects the topology of the multi-agent system, can be expressed as follows.

\begin{cases} l_{p p} = \sum_{q = 1, q \neq p}^{P} b_{p q}, & p = 1, \dots, P \\ l_{p q} = - b_{p q}, & \forall p \neq q \end{cases}

(15)
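Equation (15) builds the Laplacian from the exchange probabilities. A short sketch with a hypothetical three-agent directed ring shows the defining property that every row sums to zero.

```python
import numpy as np

def laplacian(b):
    """Eq. (15): l_pq = -b_pq for p != q, and l_pp = sum_{q != p} b_pq."""
    b = np.array(b, dtype=float)
    np.fill_diagonal(b, 0.0)  # self-links carry no information exchange
    return np.diag(b.sum(axis=1)) - b

# Hypothetical exchange probabilities among P = 3 agents in a directed ring
b = [[0.0, 0.6, 0.0],
     [0.0, 0.0, 0.5],
     [0.4, 0.0, 0.0]]
L = laplacian(b)
print(L)
print(L.sum(axis=1))  # every row sums to zero
```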

The improved RL algorithm selects the ramping time of each generating unit as its consensus variable. In this case, the PGG leader takes on more of the disturbances and power commands. For the wth generating unit of the ith PGG in the regional grid, the ramping time used as the consensus variable is

t_{i w} = Δ P_{i w} / Δ P_{i w}^{r a t e}

(16)

where $Δ P_{i w}$ and $Δ P_{i w}^{r a t e}$ are the power command and ramp rate of the wth generating unit in the ith PGG, respectively. The ramp rate $Δ P_{i w}^{r a t e}$ of the unit is

Δ P_{i w}^{r a t e} = \begin{cases} Δ P_{i w}^{r a t e +}, & Δ P_{i} > 0 \\ Δ P_{i w}^{r a t e -}, & Δ P_{i} < 0 \end{cases}

(17)
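Equations (16) and (17) can be sketched as two small helpers. The sketch assumes signed rates (upward rate positive, downward rate negative) so that ramping times come out positive in both directions; the source does not state the sign convention, and the numbers are hypothetical.

```python
def ramp_rate(rate_up, rate_down, delta_p_pgg):
    """Eq. (17): upward rate for a positive PGG command, downward rate otherwise."""
    return rate_up if delta_p_pgg > 0 else rate_down

def ramp_time(delta_p_unit, rate_up, rate_down, delta_p_pgg):
    """Eq. (16): t_iw = dP_iw / dP_iw^rate (seconds if the rates are in MW/s)."""
    return delta_p_unit / ramp_rate(rate_up, rate_down, delta_p_pgg)

# Hypothetical unit: +2.0 MW/s up, -1.5 MW/s down (signed rates assumed)
print(ramp_time(30.0, 2.0, -1.5, 50.0))    # 15.0
print(ramp_time(-30.0, 2.0, -1.5, -50.0))  # 20.0
```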

Each agent within the PGG updates its ramping-time consensus variable as follows.

t_{i w} [k + 1] = \sum_{ν = 1}^{W_{i}} d_{w ν} [k] t_{i ν} [k]

(18)

where $W_{i}$ is the number of generating units in the PGG, and $d_{w ν} [k]$ is the (w, ν)-th entry of the row-stochastic matrix $D[k] = [d_{w ν} [k]] \in \mathbb{R}^{W_{i} \times W_{i}}$ at step k.
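Equation (18) is a linear consensus iteration. The sketch assumes one common way to obtain a row-stochastic D from a Laplacian, D = I − εL, which has nonnegative entries and unit row sums whenever 0 < ε ≤ 1 / max_p l_pp; the ring topology, ε, and initial ramping times are hypothetical. With this symmetric-weight ring, D is also column-stochastic, so the agents agree on the average of their initial values.

```python
import numpy as np

def consensus(t0, D, steps=200):
    """Eq. (18): t[k+1] = D @ t[k] with a row-stochastic D."""
    t = np.asarray(t0, dtype=float)
    for _ in range(steps):
        t = D @ t
    return t

# Hypothetical directed ring 0 -> 1 -> 2 -> 0 with unit exchange weights
L = np.array([[ 1.0, -1.0,  0.0],
              [ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0]])
eps = 0.5                      # satisfies 0 < eps <= 1 / max_p l_pp
D = np.eye(3) - eps * L        # nonnegative entries, rows sum to 1
t_final = consensus([10.0, 4.0, 1.0], D)
print(t_final)                 # all three ramping times approach 5.0
```

The ring is strongly connected and every agent keeps a positive self-weight, which is what lets the iteration reach a common value.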

Meanwhile, when the power command of a generating unit exceeds its capacity limits, the command is clipped to the limit and the unit's ramping time is fixed at the corresponding maximum value:

Δ P_{i w} = \begin{cases} Δ P_{i w}^{\max}, & Δ P_{i w} > Δ P_{i w}^{\max} \\ Δ P_{i w}^{\min}, & Δ P_{i w} < Δ P_{i w}^{\min} \end{cases}

(19)

t_{i w} = t_{i w}^{\max} = \begin{cases} \frac{Δ P_{i w}^{\max}}{Δ P_{i w}^{r a t e +}}, & Δ P_{i w} > Δ P_{i w}^{\max} \\ \frac{Δ P_{i w}^{\min}}{Δ P_{i w}^{r a t e -}}, & Δ P_{i w} < Δ P_{i w}^{\min} \end{cases}

(20)

where $Δ P_{i w}^{\max}$ and $Δ P_{i w}^{\min}$ are the maximum and minimum adjustable power of the wth generating set in the ith PGG, respectively. Moreover, when the power command of this generating set exceeds its adjustable range, its constant gain $b_{w ν}$ becomes $b_{w ν} = 0, ν = 1, 2, \dots, W_{i}$.
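The saturation rules of equations (19) and (20), together with zeroing the gains of a saturated unit, can be sketched as follows. The signed convention (ΔP^min < 0 < ΔP^max, downward rate negative), the 0-based unit index, and all numbers are assumptions of this sketch; for an unsaturated unit it returns no ramping time, since that value comes from equations (16)–(18).

```python
def saturate(delta_p, p_min, p_max, rate_up, rate_down):
    """Eqs. (19)-(20): clip a unit command and pin its ramping time at the limit.

    Returns (command, ramp_time or None, saturated).
    Assumed signed convention: p_min < 0 < p_max and rate_down < 0 < rate_up.
    """
    if delta_p > p_max:
        return p_max, p_max / rate_up, True      # upper limit: first case of Eq. (20)
    if delta_p < p_min:
        return p_min, p_min / rate_down, True    # lower limit: second case of Eq. (20)
    return delta_p, None, False                  # within range: ramping time from consensus

def isolate_row(b, w):
    """Set the gains b_wv = 0 (v = 1..W_i) of saturated unit w (0-based index here)."""
    b = [list(row) for row in b]
    b[w] = [0.0] * len(b[w])
    return b

print(saturate(45.0, -20.0, 30.0, 2.0, -1.0))   # (30.0, 15.0, True)
print(saturate(-25.0, -20.0, 30.0, 2.0, -1.0))  # (-20.0, 20.0, True)
```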

With frequent information exchange among agents and constant gains, cooperative consensus of the multi-agent system is achieved if and only if the directed graph with Laplacian L is strongly connected over the discrete-time sequence. To control regional generation accurately and maintain regional frequency stability, the controller uses a normalized linear weighting of the dimensionless instantaneous ACE value $A (i)$ and the instantaneous frequency deviation $Δ f (i)$, giving the following reward function, where $η$ is the weighting coefficient.

R_{1} = - 100 η {| Δ f (i) |}^{2} - \frac{(1 - η) {| A (i) |}^{2}}{10}

(21)
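Eq. (21) can be evaluated directly. The sketch below assumes $A(i)$ and $\Delta f(i)$ have already been made dimensionless, as the text describes, and assumes that the weighting factor of 0.01 reported in the experiments is $\eta$; the function name and sample inputs are hypothetical. Because both terms are non-positive, the reward is at most zero and reaches zero only when the ACE and the frequency deviation both vanish.

```python
def reward_r1(delta_f: float, ace: float, eta: float) -> float:
    """Reward of eq. (21): R1 = -100*eta*|df|^2 - (1 - eta)*|ACE|^2 / 10."""
    return -100.0 * eta * abs(delta_f) ** 2 - (1.0 - eta) * abs(ace) ** 2 / 10.0


# eta = 0.01 is assumed to be the weighting factor reported in the experiments.
print(reward_r1(delta_f=0.1, ace=1.0, eta=0.01))
```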

Experimental results and analyses

To test the solution performance of the improved RL algorithm (EORL) on the optimization model of the regional hydro-wind power allocation strategy, simulation studies were carried out using operational data from a demonstration zone of a distributed renewable energy grid system. A simulated grid with 126 nodes and 62 generating units (35 hydroelectric, 18 wind, and 19 photovoltaic) was constructed from the grid data, grid structure, and operating rules. All algorithms were implemented in PyTorch and run on a computer with an Intel i7-10700K (3.8 GHz) CPU and 32 GB of RAM. The learning rate was set to 0.001, the discount factor to 0.9, and the weighting factor to 0.01.
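The learning rate (0.001) and discount factor (0.9) set above enter the value update of the RL agent. The paper describes its improvement as a dual-estimator RL algorithm but does not spell out the update in this section; the sketch below assumes it follows standard tabular double Q-learning, in which one table selects the greedy next action and the other evaluates it, which reduces the overestimation bias of single-estimator Q-learning. The table layout, sample values, and the function name `double_q_update` are hypothetical.

```python
import random
from typing import List

QTable = List[List[float]]  # QTable[state][action]


def double_q_update(
    qa: QTable,
    qb: QTable,
    s: int,
    a: int,
    r: float,
    s_next: int,
    alpha: float,
    gamma: float,
    update_a: bool,
) -> None:
    """One tabular double Q-learning step, modifying qa or qb in place."""
    if update_a:
        # Table A picks the greedy next action; table B evaluates it.
        a_star = max(range(len(qa[s_next])), key=lambda j: qa[s_next][j])
        target = r + gamma * qb[s_next][a_star]
        qa[s][a] += alpha * (target - qa[s][a])
    else:
        # Roles swapped: table B picks the action, table A evaluates it.
        b_star = max(range(len(qb[s_next])), key=lambda j: qb[s_next][j])
        target = r + gamma * qa[s_next][b_star]
        qb[s][a] += alpha * (target - qb[s][a])


rng = random.Random(0)
qa = [[0.0, 0.0], [1.0, 2.0]]
qb = [[0.0, 0.0], [5.0, 3.0]]
# Each step updates one of the two estimators, chosen with probability 0.5.
double_q_update(qa, qb, s=0, a=0, r=1.0, s_next=1,
                alpha=0.001, gamma=0.9, update_a=rng.random() < 0.5)
```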

This paper first analyzes the performance of the EORL algorithm. For comparison, the conventional RL algorithm and three improved variants, ANSRL,³² IGRL,³³ and ITRL,³⁴ are selected. All models were trained on the same dataset used in this study, the real-time operational data of the power system were processed, and each problem was solved 10 times. Five performance metrics are compared: the minimum, maximum, and average solution times of each algorithm, and the average and variance of the optimal objective value over the 10 runs. The results are listed in Table 1. EORL and ITRL solve faster than the other algorithms; ITRL is slightly slower than EORL, and their maximum solution times differ by only about 0.1 s. The layered architecture of EORL introduces measurable but acceptable latency. The results of ANSRL, IGRL, and RL show two characteristic limitations: lower average optimal values and larger dispersion of solutions, and in some instances these algorithms fail to converge to the optimum. EORL achieves the highest average optimal objective value (778.47) and the lowest variance (0.019), so its solutions are markedly more stable than those obtained by the other approaches.

Table 1.

Performance comparison of improved RL algorithms.

Algorithm	Minimum solution time (s)	Maximum solution time (s)	Average solution time (s)	Average optimal objective value	Variance of optimal objective value
RL	15.458	16.672	16.633	768.93	0.052
ANSRL	13.732	14.584	14.682	747.30	0.043
IGRL	12.248	12.743	12.825	752.71	0.028
ITRL	10.523	10.856	11.283	772.87	0.025
EORL	9.632	10.742	10.448	778.47	0.019
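The relative differences implied by Table 1 can be computed directly from its averages. The short Python sketch below (variable names are illustrative) computes how much shorter EORL's average solution time is, and how much higher its average optimal objective value is, than each baseline; the objective comparison assumes, as the text implies, that a larger objective value is better.

```python
# Average solution time (s) and average optimal objective value from Table 1.
avg_time = {"RL": 16.633, "ANSRL": 14.682, "IGRL": 12.825, "ITRL": 11.283, "EORL": 10.448}
avg_obj = {"RL": 768.93, "ANSRL": 747.30, "IGRL": 752.71, "ITRL": 772.87, "EORL": 778.47}

time_cut = {}
obj_gain = {}
for name in ("RL", "ANSRL", "IGRL", "ITRL"):
    # Percentage reduction in average solution time relative to the baseline.
    time_cut[name] = 100.0 * (avg_time[name] - avg_time["EORL"]) / avg_time[name]
    # Percentage increase in average optimal objective value relative to the baseline.
    obj_gain[name] = 100.0 * (avg_obj["EORL"] - avg_obj[name]) / avg_obj[name]
    print(f"{name:6s} time -{time_cut[name]:.2f}%  objective +{obj_gain[name]:.2f}%")
```

With the tabulated values, EORL's average solution time is roughly 7.4% (against ITRL) to 37.2% (against RL) shorter, and its average optimal objective value is roughly 0.72% (against ITRL) to 4.17% (against ANSRL) higher.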

To verify the effectiveness of the proposed EORL allocation strategy, the AEAGC,¹⁷ DPMSRL,²⁰ and APIDGC²¹ strategies are selected as benchmark methods; the system frequency deviation curves of the different strategies are shown in Figure 3. Node 20 is assumed to lose 100 MW of normal load at t = 4 s, and the system response is examined under different AGC control strategies. With IGRL, ANSRL, and RL, the AGC action drives the system frequency into an ultra-low-frequency oscillation with a period of about 50 s, and the maximum frequency deviation remains at about 0.14 Hz owing to the strong frequency regulation effect of the DC load. With ITRL, no oscillation occurs, but a large deviation remains after regulation, and the system frequency deviation finally settles at about 0.05 Hz. With EORL, the AGC units do not cause frequency oscillation and stop acting after 150 s, and the system frequency deviation settles at about 0.007 Hz, indicating that the proposed strategy provides better stabilizing regulation and frequency recovery performance.

Figure 3.

System frequency deviation curves of different strategies.

In this paper, CPS,³⁵ the absolute area control error $|ACE|$,³⁶ and the absolute frequency deviation $|\Delta f|$ are chosen as evaluation indexes, and the control performance of the different allocation strategies is compared in Figure 4. Compared with the other strategies, EORL significantly reduces the regional frequency deviation: $|\Delta f|$ decreases by 5.66%–40.61%, $|ACE|$ decreases by 18.01%–47.41%, and the average CPS value increases by 2.2%–5.8%. In summary, EORL achieves effective regional hydro-wind power allocation with good tracking characteristics and better control performance when large-scale wind, hydro, biomass, and other renewable sources and loads are connected to the simulated complementary power generation system.
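The section reports CPS without restating its formula. As an illustration, the sketch below computes CPS1 as defined in the NERC control performance standard (ref. 35): $\mathrm{CPS1} = (2 - CF_1) \times 100\%$, where $CF_1$ is the mean of $ACE_{1\min} \cdot \Delta F_{1\min} / (-10B)$ divided by $\varepsilon_1^2$, $B$ is the area frequency bias in MW/0.1 Hz (negative by convention), and $\varepsilon_1$ is the target bound on the RMS one-minute frequency error. Whether the paper uses exactly this CPS variant is an assumption, and the function name and sample values are hypothetical.

```python
from typing import Sequence


def cps1(ace_1min: Sequence[float], df_1min: Sequence[float],
         bias_mw_per_01hz: float, eps1_hz: float) -> float:
    """CPS1 in percent from one-minute averages of ACE (MW) and frequency error (Hz).

    bias_mw_per_01hz is the frequency bias B in MW/0.1 Hz (negative by convention),
    so -10 * B is expressed in MW/Hz and is positive.
    """
    if len(ace_1min) != len(df_1min) or not ace_1min:
        raise ValueError("need equally long, non-empty sequences")
    scale = -10.0 * bias_mw_per_01hz
    compliance = [ace * df / scale for ace, df in zip(ace_1min, df_1min)]
    cf1 = (sum(compliance) / len(compliance)) / eps1_hz ** 2
    return (2.0 - cf1) * 100.0


# Hypothetical data: B = -50 MW/0.1 Hz, epsilon_1 = 0.018 Hz.
print(cps1([10.0, -5.0, 0.0], [0.018, 0.010, -0.005], -50.0, 0.018))
```

Under NERC's criterion an area is compliant when CPS1 is at least 100%; an ACE that helps restore frequency (so that ACE·Δf is negative) pushes CPS1 above 200%.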

Figure 4.

Comparison of control performance of different allocation strategies.

Conclusion

The dual-carbon target is driving the rapid transformation of the power system toward cleaner, lower-carbon operation. However, the large-scale integration of renewable sources such as photovoltaic, wind, and hydropower, together with random loads, increases system stochasticity, and traditional control methods struggle to suppress the frequency fluctuations caused by large-scale grid integration of renewables, which poses a great challenge to power system stability. To address these issues, this paper models the dynamic uncertainty of regional hydro-wind power generation based on RL. First, an AGC regional hydro-wind power allocation strategy is designed on the basis of two-way communication, and the optimized power correction is adjusted according to the actual response of the hydro and wind generating units to avoid wasting frequency modulation (FM) resources. Next, the RL algorithm is improved with dual estimators to maximize long-term returns. Finally, the regional grids play a multi-agent dynamic game through the improved RL algorithm to obtain the total regulation power of the region and send the total generation power command to each PGG. An agent-network topology is specified for all units in each PGG, and a coordinated consensus strategy updates the system states through information exchange between neighboring agents. Using the improved RL algorithm, the total regulation power command is dynamically allocated to individual generating units, achieving dynamic uncertainty modeling of regional hydro-wind-solar power generation. The experimental results show that the proposed approach significantly reduces the regional frequency deviation, supports the scientific scheduling and stable operation of the power system, and helps promote the efficient consumption of renewable energy and the low-carbon transformation of the power system.

Although the proposed method achieves satisfactory control performance, some aspects still need to be improved and extended, mainly as follows.

(1) In real power grids, regions are interconnected through complex transmission lines. The more interconnected areas there are, the more topology and state-space factors must be considered, which makes modeling increasingly difficult. The controllability of the proposed method therefore still needs to be improved, and this aspect can be explored further in future research.

(2) The approach in this paper is built on RL, which excels at decision-making but is weaker than deep learning at perceiving the environment. With the rapid growth of artificial intelligence, deep learning has become an important research field; combining its strong perception ability with RL could improve RL so that the two complement each other's advantages.

Footnotes

ORCID iD

Wei Zhang

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research received financial support from the Luquan Wudongde Power Plant of Three Gorges Jinsha River Yunchuan Hydropower Development Co., Ltd. (grant number Z522402001).

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Khan, Islam, Das, et al. Energy sustainability–survey on technology and control of microgrid, smart grid and virtual power plant. IEEE Access 2021; 9: 104663–104694.

2. Chen, Liu, Xiao, et al. Power generation scheduling for a hydro-wind-solar hybrid system: a systematic survey and prospect. Energies 2022; 15: 8747.

3. Patel, Meegahapola, et al. Enhancing optimal automatic generation control in a multi-area power system with diverse energy resources. IEEE Trans Power Syst 2019; 34: 3465–3475.

4. Ram Babu, Bhagat, Saikia, et al. A comprehensive review of recent strategies on automatic generation control/load frequency control in power systems. Arch Comput Methods Eng 2023; 30: 543–572.

5. Niu, Wan. A review on applications of heuristic optimization algorithms for optimal power flow in modern power systems. J Mod Power Syst Clean Energy 2014; 2: 289–297.

6. Hota, Mohanty. Automatic generation control of multi source power generation under deregulated environment. Int J Electr Power Energy Syst 2016; 75: 205–214.

7. Sahu, Pati, Mohanty, et al. Teaching–learning based optimization algorithm based fuzzy-PID controller for automatic generation control of multi-area power system. Appl Soft Comput 2015; 27: 240–249.

8. Barakat. Optimal design of fuzzy-PID controller for automatic generation control of multi-source interconnected power system. Neural Comput Appl 2022; 34: 18859–18880.

9. Kassem, Yousef. Voltage and frequency control of an autonomous hybrid generation system based on linear model predictive control. Sustain Energy Technol Assessments 2013; 4: 52–61.

10. Hasan, Alsaidan, Sajid, et al. Hybrid MPC-based automatic generation control for dominant wind energy penetrated multisource power system. Model Simulat Eng 2022; 20: 55–68.

11. Kuang, Tian, Liu, et al. A review of control strategies for automatic generation control in power systems with renewable energy. Prog Energy 2024; 6: 22–35.

12. Puebla-Gutierrez, Favela-Contreras, Avila, et al. Embedded asynchronous MIMO adaptive predictive control. IEEE Trans Industr Inform 2023; 20: 2244–2252.

13. Vali, Petrović, Pao, et al. Model predictive active power control for optimal structural load equalization in waked wind farms. IEEE Trans Control Syst Technol 2021; 30: 30–44.

14. Wang. Deep learning adaptive dynamic programming for real time energy management and control strategy of micro-grid. J Clean Prod 2018; 204: 1169–1177.

15. Ghasemi-Marzbali. Multi-area multi-source automatic generation control in deregulated power system. Energy (Calg) 2020; 21: 11–24.

16. Dong, Sun, Wang, et al. Power flow analysis considering automatic generation control for multi-area interconnection power networks. IEEE Trans Ind Appl 2017; 53: 5200–5208.

17. Yin, Zhou, et al. Artificial emotional reinforcement learning for automatic generation control of large-scale interconnected power grids. IET Generation Trans & Dist 2017; 11: 2305–2313.

18. Deep reinforcement learning based multi-objective integrated automatic generation control for multiple continuous power disturbances. IEEE Access 2020; 8: 156839–156850.

19. Zhou, et al. A multi-step unified reinforcement learning method for automatic generation control in multi-area interconnected power grid. IEEE Trans Sustain Energy 2020; 12: 1406–1415.

20. Zhao, Zeng, Liu, et al. Automatic generation control in a distributed power grid based on multi-step reinforcement learning. Prot Control Mod Power Syst 2024; 9: 39–50.

21. Muduli, Jena, Moger. Application of reinforcement learning-based adaptive PID controller for automatic generation control of multi-area power system. IEEE Trans Autom Sci Eng 2024; 22: 1057–1068.

22. Lecarpentier, Rachelson. Non-stationary Markov decision processes, a worst-case approach using model-based reinforcement learning. Adv Neural Inf Process Syst 2019; 32: 171–178.

23. Abdulla, Bhatnagar. Reinforcement learning based algorithms for average cost Markov decision processes. Discret Event Dyn Syst 2007; 17: 23–52.

24. Al-Saadi, Al-Greer, Short. Reinforcement learning-based intelligent control strategies for optimal power management in advanced power distribution systems: a survey. Energies 2023; 16: 16–28.

25. Keyhani, Chatterjee. Automatic generation control structure for smart power grids. IEEE Trans Smart Grid 2012; 3: 1310–1316.

26. Yang C-W, Dubinin, Vyatkin. Automatic generation of control flow from requirements for distributed smart grid automation control. IEEE Trans Industr Inform 2019; 16: 403–413.

27. Asadi, Farsangi, Amani, et al. Data-driven automatic generation control of interconnected power grids subject to deception attacks. IEEE Internet Things J 2022; 10: 7591–7600.

28. Sajadi, Strezoski, et al. Integration of renewable energy systems and challenges for dynamics, control, and automation of electrical power systems. WIREs Energy & Environment 2019; 8: e321.

29. Hessel, Czarnecki, et al. Discovering reinforcement learning algorithms. Adv Neural Inf Process Syst 2020; 33: 1060–1070.

30. Ying K-C, Lin S-W. Reinforcement learning iterated greedy algorithm for distributed assembly permutation flowshop scheduling problems. J Ambient Intell Humaniz Comput 2023; 14: 11123–11138.

31. Dayan, Hinton. Using expectation-maximization for reinforcement learning. Neural Comput 1997; 9: 271–278.

32. Ding, Zhao, et al. A new asynchronous reinforcement learning algorithm based on improved parallel PSO. Appl Intell 2019; 49: 4211–4222.

33. Wang, Kang, Shao, et al. Improving generalization in reinforcement learning with mixture regularization. Adv Neural Inf Process Syst 2020; 33: 7968–7978.

34. Mounjid, Lehalle. Improving reinforcement learning algorithms: towards optimal learning rate policies. Math Finance 2024; 34: 588–621.

35. Yao, Shoults, Kelm. AGC logic based on NERC's new control performance standard and disturbance control standard. IEEE Trans Power Syst 2000; 15: 852–857.

36. Aziz, Stojcevski. Analysis of frequency sensitive wind plant penetration effect on load frequency control of hybrid power system. Int J Electr Power Energy Syst 2018; 99: 603–617.