Optimizing intelligent startup strategy of power system using PPO algorithm

Abstract

This article aimed to use the proximal policy optimization (PPO) algorithm to address the limitations of power system startup strategies, to enhance the adaptability, coping ability, and overall robustness of the system to variable grid demand and integrated renewable energy, the constraints in the power system start-up strategy are optimized. Firstly, this article constructed a dynamic model of the power system, including key components such as generators, transformers, and transmission lines; secondly, it integrated the PPO algorithm and designed interfaces that allow the algorithm to interact with the power system model; afterward, the state variables of the power system were determined, and a reward function was designed to evaluate the startup efficiency and stability of the system. Next, this article adjusted the reward function and trained and iterated multiple times in the simulation environment to guide the algorithm to learn the optimal startup strategy. Finally, an effective evaluation of the strategy can be conducted. The research results showed that after optimization by the PPO algorithm, the stable frequency startup of the power system only took about 23 seconds, and the system recovery time was reduced by 33.3% under a sudden load increase. The algorithm used can significantly optimize the intelligent startup strategy of the power system.

Keywords

Power system startup strategy proximal policy optimization system adaptability and robustness dynamic modeling performance evaluation

1. Introduction

With distributed generation and dynamic demand, traditional power grid start-up scheduling faces challenges like grid stability [1, 2], energy integration [3, 4], and rapid response to extreme events [5, 6]. Conventional methods, based on linear programming or classical control theory [7, 8], are limited with fluctuating demands and renewable integration [9]. Intelligent, flexible scheduling is essential for efficient, reliable grid operation.

This article employs the PPO algorithm to optimize power generation startup and simulate a dynamic power system. A reward function, considering stability, startup efficiency, and penalties for instability and delays, drives iterative optimization. Initial strategy parameters, based on historical data, are encoded as neural network weights. During simulation training, PPO optimizes strategies through environmental interaction. The experiment evaluates the impact on startup time and system stability, verifying PPO’s effectiveness. It also tests the algorithm’s adaptability and robustness in emergencies by measuring recovery time and performance indicators. In this paper, the PPO algorithm is combined with a neural network to optimize the start-up strategy of the power system, which significantly enhances the adaptability and robustness of the system under a complex environment. The innovation lies in the successful construction of a dynamic model, which can comprehensively evaluate and optimize the behavior of the power system, and improve the response efficiency and stable operation ability of the system to various emergencies.

•
Optimizing the startup strategy of intelligent power systems using the Proximal Policy Optimization algorithm, enhancing system stability and startup efficiency comprehensively through dynamic model construction and reward function design.
•
In a simulated environment, by adjusting the reward function and iterating through training, the algorithm guides the learning of the optimal startup strategy, significantly reducing the startup time of the power system and enhancing its ability to respond to emergencies such as sudden load increases.
•
Following optimization with the PPO algorithm, the stable frequency startup time of the power system is approximately 23 seconds, reducing system recovery time by 33.3% under sudden load increase scenarios. The application of the algorithm greatly optimizes the intelligent startup strategy of the power system.

2. Related works

Optimizing power system start-up scheduling often relies on deterministic models and basic data-driven methods. Some studies employ Optimal Power Flow (OPF) [10, 11] and adaptive control strategies [12, 13]. Other studies have explored the application of machine learning techniques such as neural networks [14, 15] and support vector machines (SVM) [16, 17] to predict electricity demand and load changes. Furthermore, a multi-step recovery method based on Distributed Generators (DGs) can achieve maximum energy recovery in the power grid system startup strategy [18]. The comprehensive power grid startup system utilizing ubiquitous power IoT can effectively solve the problem of strong random disturbances and continuously coordinate key behavioral strategies [19]. Researchers such as Shaik Jan Bhasha [20] proposed the Firefly Algorithm-based Time-series Neural Network (FATNN) assisted voltage source converter high-voltage direct current technology power system recovery method for soft start problems in power systems. Chen Changming [21] and other researchers discussed the grid startup system based on renewable energy and the emerging power system startup method of energy internet.

The PPO algorithm, as a reinforcement learning technique, has demonstrated its superiority in multiple high-complexity fields and is suitable for handling dynamic environments and uncertainty problems. In automatic control [22], intelligent robot navigation [23], and complex resource allocation problems [24], PPO significantly improves decision quality and efficiency by learning optimal strategies [25, 26]. Currently, a large number of studies have applied the PPO algorithm to intelligent control of power systems. Due to the negative impedance potential of Constant Power Load (CPL), the stability of distribution networks based on power electronic converters is prone to instability. Some studies [27] have successfully applied the PPO algorithm to adjust the coefficients of MFSMC (Model-Free Sliding Mode Controller) controllers. To adapt to the volatility of renewable energy and changes in grid topology, other researchers [28] propose using the PPO algorithm for real-time decision-making, to achieve effective integration of renewable energy in the power system. For resource management and spectrum sharing issues in the power system, researchers such as Zhang Haijun [29] have adopted power control based on Distributed Proximal Policy Optimization (DPPO). Regarding the distribution network embedded with renewable energy and storage devices, scholars such as Cao Di [30] have formulated the multi-period nonlinear programming decision problem as a Markov Decision Process (MDP) process and solved it sequentially using the PPO algorithm. When facing modern power grid startup problems such as load fluctuations and energy integration, applying PPO to the intelligent startup strategy of the power system can effectively improve the system’s ability to handle complex power grid dynamics and overall robustness.

Recent optimization advances have improved power system management. Sun Xinwei et al. [31] used Proximal Policy Optimization for grid reliability. Zhang Bin et al. [32] applied deep reinforcement learning for renewable energy systems. Abidi Mustufa Haider et al. [33] introduced chaotic elephant herding for cyber-physical systems. Shengren Hou et al. [34] developed a constraint-aware reinforcement learning algorithm for energy scheduling. Alkahtani Mohammed et al. [35] proposed a Gannet optimization algorithm to reduce industrial costs. Chen Qingchuan et al. [36] created a deep reinforcement learning-based anti-jamming algorithm for wireless communications. Garg Vaneet et al. [37] proposed a predictive virtual machine consolidation approach that significantly improves resource utilization and system response speed. Srivani B et al. [38] conducted a theoretical analysis and compared the performance of 10 optimization algorithms with the DMS algorithm, further deepening the understanding and application of various optimization algorithms. For large-scale wireless sensor networks with multiple sink nodes, Reddy Chada Sampath et al. [39] developed a secure and energy-efficient technique based on adaptive squirrel-coyote optimization. Dimitropoulos Nikos et al. [40] conducted a comparative analysis of AI-based models for short-term photovoltaic power forecasting. These research outcomes collectively advance the optimization of power system management, enhancing overall system efficiency and reliability.

3. Environmental modeling and PPO algorithm construction

3.1 Environmental modeling

The research topic is to optimize the startup strategy of intelligent power systems, constructing the following system model: including three key components - generators, transformers, and transmission lines. The generator model considers parameters such as rotor inertia coefficient, damping ratio, excitation system, etc.; the transformer model considers parameters such as rated capacity, turns ratio, and short-circuit impedance; the transmission line model considers parameters such as resistance, inductance, and capacitance to simulate the behavior of the power system under different voltage and load conditions. The PPO algorithm is utilized for policy optimization, updating policy network parameters by maximizing the objective function. The algorithm comprises policy and value function networks, where the policy network generates the probability distribution of actions, and the value function network estimates state values. The advantage function is calculated to evaluate the advantage of taking certain actions. The goal is to dynamically adjust the strategy through interaction with the power system model to achieve efficient system startup. A PWM (Pulse Width Modulation) -controlled DC (direct current) motor model is used to optimize the motor startup process by adjusting PWM signals to control motor power and speed, reducing grid load and improving startup efficiency.

Based on the above system model, the following questions are clarified: How to define the state of the power system and executable actions to interact with the PPO algorithm for dynamic policy adjustment of the system? How to construct an appropriate reward function to evaluate the performance of the power system and apply it to the PPO algorithm to guide policy optimization and implementation. How to iterate and optimize startup strategies within the framework of the PPO algorithm to adapt to different power system startup scenarios and ensure efficient and stable system startup. How to adjust the parameters of the PPO algorithm, including learning rate and neural network structure, to improve the training efficiency of the algorithm and the performance of startup strategies.

A dynamic model is constructed based on MATLAB software in the study to simulate the behavior and response of the actual power system. This model includes three key components of the power system: generators, transformers, and transmission lines. Each component is assigned detailed parameters and characteristics.

The generator model is equipped with a 4.5 Hz rotor inertia coefficient and a 0.02 damping ratio, the excitation system time constant is set to 1.6 seconds, and the gain of the governor is set to 3.5. For the electrical model of the generator, the synchronous reactance of 0.15 p.u., the transient reactance of 0.07 p.u., and sub transient reactance of 0.05 p.u. can be set to accurately simulate the response of the generator under different load conditions. The transformer model is set with a rated capacity of 100 MVA, a 10 kV/110 kV ratio, and a short-circuit impedance of 0.01 to simulate the behavior of the transformer during power system startup and load changes.

3.2 Algorithm development

The PPO algorithm updates policies by maximizing the objective function, which is expressed as an estimate of expected returns under a specific policy:

$\displaystyle L(\theta)=\hat{E}_{t}\left[\min\left(r_{t}(\theta)\hat{A}_{t},% \operatorname{clip}\left(r_{t}(\theta),1-\epsilon,1+\epsilon\right)\hat{A}_{t}% \right)\right]$ (1)

Here, is the policy network parameter; is the probability ratio, representing the ratio of new and old policies; is the estimate of the dominance function, and is the hyperparameter used to limit the magnitude of policy updates. The advantage function is used to evaluate the superiority of taking a certain action compared to the average action. The calculation is as follows:

$\displaystyle\hat{A}_{t}=\delta_{t}+(\gamma\lambda)\delta_{t+1}+\ldots+(\gamma% \lambda)^{T-t+1}\delta_{T-1}$ (2)

Among them, is the TD (Temporary Difference) residual; is the reward, and is the discount factor. is a GAE (Generalized Advantage Estimation) parameter, and is a value function used to estimate the expected return in states. In the study, the PPO algorithm uses two main neural networks: policy network and value function network. The strategy network is used to generate the probability distribution of actions, and the value function network is used to estimate the state value. The strategy network updates use the following gradients:

$\displaystyle\nabla_{\theta}J(\theta)=\hat{E}_{t}\left[\nabla_{\theta}\log\pi_% {\theta}\left(a_{t}\mid s_{t}\right)\hat{A}_{t}\right]$ (3)

The goal of a value function network is to minimize the following loss functions:

$\displaystyle L(\phi)=\hat{E}_{t}\left[\left(V_{\phi}\left(s_{t}\right)-V_{t}^% {\text{target }}\right)^{2}\right]$ (4)

Here, $V_{\phi}\left({s_{t}}\right)$ is the estimate of the current value function, and $V_{t}^{\mbox{target}}$ is the objective value function.

Figure 1.

PWM-controlled DC motor model.

Figure 1 shows the constructed PWM-controlled DC motor system model. “DC” represents Direct Current, while “RPM” represents Revolutions Per Minute. By adjusting the PWM signal to optimize the motor starting process, the power grid load is reduced and the starting efficiency is improved, achieving energy management in the power system. PWM reference voltage is used to set the required motor speed, determine the duty cycle of the PWM signal, and thereby control the power and speed of the motor. The Controlled PWM Voltage is the actual output of the PWM signal, which is adjusted according to the setting of the PWM reference voltage. The change in the duty cycle of the signal can cause a change in the effective value of the motor input voltage. H-Bridge is a circuit composed of switches that allow current to flow in two directions, thereby controlling the direction of motor rotation. The H-Bridge in the model receives PWM signals and adjusts the current flowing to the motor accordingly.

4. Status and reward design

As shown in Fig. 2 the entire process of designing and optimizing the state and reward in the startup strategy of intelligent power systems based on the PPO algorithm starts from “state definition”, identifying key state variables: “voltage level”, “system frequency”, and “load demand”.

Each state variable undergoes specific analysis to ensure a comprehensive evaluation. This article incorporates the analysis results into the “reward function construction” step, where a reward mechanism is developed to evaluate different aspects of power system performance. The reward function considers “system stability evaluation”, “startup efficiency evaluation”, and “penalty for poor performance”, the latter including penalties for “instability” and “startup delay”. It can maintain positive rewards for stability and efficiency while optimizing the reward function for any unstable or delayed initiation of negative rewards. One is an iterative process that includes data-driven adjustments, balancing positive and negative rewards, and experimental validation to optimize the reward function and guide optimal performance. Next, this article establishes an optimized reward function to guide strategy implementation and testing. Theoretical models can be applied to practical scenarios. In the performance evaluation, the effectiveness of the implementation strategy is evaluated, and based on these evaluations, feedback, and re-optimization are carried out to further improve the strategy, ensuring that the startup process of the power system is as efficient and stable as possible. The flowchart finishes with an “end” node, indicating the completion of the optimization cycle.

Some parameter settings for the startup strategy of the power system are shown in Table 1.

Table 1
Different reward function parameter settings in power system startup strategy

Setting no.	Stability reward	Startup efficiency reward	Instability penalty	Startup delay penalty
1	$+$ 3.25	$+$ 2.10	$-$ 4.15	$-$ 3.05
2	$+$ 3.75	$+$ 2.35	$-$ 4.50	$-$ 3.30
3	$+$ 4.10	$+$ 2.60	$-$ 4.85	$-$ 3.55
4	$+$ 4.55	$+$ 2.80	$-$ 5.20	$-$ 3.75
5	$+$ 4.90	$+$ 3.05	$-$ 5.55	$-$ 4.00
6	$+$ 5.30	$+$ 3.30	$-$ 5.80	$-$ 4.25
7	$+$ 5.65	$+$ 3.60	$-$ 6.05	$-$ 4.50
8	$+$ 6.00	$+$ 3.85	$-$ 6.40	$-$ 4.75
9	$+$ 6.35	$+$ 4.10	$-$ 6.70	$-$ 4.95
10	$+$ 6.70	$+$ 4.35	$-$ 7.00	$-$ 5.20

Figure 2.

State and reward design process of intelligent power system startup strategy.

In Table 1, each row represents a different set of reward function parameter settings. The values of stability reward and start-up efficiency reward increase with the increase of setting number, indicating higher rewards for system stability and rapid start-up. At the same time, the values of the instability penalty and startup delay penalty also increase, indicating stricter punishment for poor performance. The data shows different attempts of the power system startup strategy in the process of adjusting the reward function parameters, reflecting the adjustment and balance of different factor weights in the optimization process.

5. Initiation strategy iteration and optimization

5.1 Strategy update

Figure 3.

Strategy updates in different power system startup scenarios. (a) High-load startup. (b) Low-temperature start. (c) Emergency recovery start. (d) Distributed resource integration initiation.

The initial strategy parameters, based on the analysis of historical power system startup data, include the probability distribution of startup actions and the expected system response, encoded as the initial weights of the neural network. During the simulation training phase, the PPO algorithm learns and optimizes strategies through interaction with the simulated environment. It executes a series of startup actions and adjusts its strategies based on feedback from the reward function. This process involves two key steps: strategy execution, which generates startup commands based on the current strategy, and strategy update, which adjusts strategy parameters based on the results (rewards). To improve training efficiency, a batch training method is used, with each batch containing dozens of simulated startups to collect sufficient data for strategy updates. The strategy update process, shown in Fig. 3, includes major scenarios such as high-load startup, low-temperature startup, emergency recovery startup, and distributed resource integration startup.

Figure 3(a), (b), (c) and (d) respectively show the strategy updates for power system startup under high load, low temperature, emergency recovery, and distributed resource integration scenarios. The X-axis represents time (10 minutes), and the Y-axis represents strategy parameter values. Different curves represent different simulated start-up batches (16 times for high load and 13 times for the rest). With time, the PPO algorithm can effectively update the parameters of the startup strategy based on real-time situations, and can efficiently cope with different power system startup scenarios. After a certain number of training iterations, the current strategy can be evaluated by executing the startup strategy in different testing scenarios and recording its performance indicators.

5.2 Strategy optimization

Figure 4.

Learning rate adjustment method and convergence speed calculation.

Figure 5.

Correlation between neural network architecture optimization scheme and system utilization.

This article analyzes the underperforming strategies and makes corresponding adjustments to the PPO algorithm parameters, including adjusting the learning rate (Fig. 4) and optimizing the neural network architecture (Fig. 5).

During the simulation training phase, the learning rate adjustment method of the PPO algorithm plays a crucial role in the stability and efficiency of the optimization process. The choice of learning rate affects the step size of the algorithm’s parameter updates, thereby directly influencing the convergence speed and stability of the model. A comparison of different learning rate adjustment methods in Fig. 4 reveals that with an increase in the number of iterations, various learning rate adjustment methods can effectively reduce the model’s convergence speed, indicating that an appropriate learning rate can expedite the optimization process to reach a stable state more quickly. However, if the learning rate is too large, it may cause parameters to update too rapidly, leading to instability in the optimization process, or even failure to converge to the optimal solution. Conversely, if the learning rate is too small, the optimization process will be slow, requiring more iterations to reach the desired state, thereby reducing training efficiency. Therefore, when selecting a learning rate adjustment method, it is necessary to balance convergence speed and stability to ensure that the optimization process proceeds smoothly and converges to the optimal solution.

Figure 4 shows the learning rate adjustment method and the convergence speed calculation, and the number of iterations is set to 100, 500, and 1000 times, respectively. The X-axis representation includes 10 different learning rate adjustment methods: Fixed, Time Decay, Adaptive, Exponential Decay, Step Decay, Cyclical, Momentum-based, adaptive Moment Estimation, Cosine Annealing, and Learning Rate Restart. The Y-axis represents the convergence speed in seconds, and a comparison of the convergence speed before and after optimization was conducted. It can be seen that when the iteration number is 100, the average convergence speed of the 10 learning rate adjustment methods changes from 1.07s to 1.09s. At this point, due to too few iterations, the model was unable to effectively reduce the convergence speed. At 500 iterations, the convergence speed decreased by 27.1%, while at 1000 iterations, it reached 25.9%. It can be seen that as the number of iterations increases, various learning rate adjustment methods in the research can effectively reduce the convergence speed of the model, making the power system start-up to a higher level of response.

Figure 5 shows the correlation between neural network architecture optimization schemes and system utilization. The X-axis represents optimization solutions for neural network architecture, including Incremental Layers/Neurons, Convolutional Neural Networks, Recurrent Neural Networks, short-term Term Memory, Gated Recurrent Unit, Self Attention Mechanism, Residual Networks, Batch Normalization, Dropout, and Network Pruning. The Y-axis represents the improved performance indicators, including Response Time, Computational efficiency, Stability, Resource Usage, Error Rate, Generalization Ability, Training Time, Memory Usage, Energy consumption, and Parallel Processing Capability. The Z-axis represents the system utilization rate (%) corresponding to different optimization schemes and performance indicators. It can be seen that the optimization plan can make different and efficient adjustments to the various performance indicators to be improved. Among them, through network pruning optimization, the parallel processing ability of the system is effectively improved, and the system utilization rate reaches 91.7%.

6. Evaluation of the effectiveness of intelligent startup strategies

6.1 System startup time evaluation

Figure 6.

Power system startup time evaluation.

This article uses the time series analysis method to evaluate the system startup time, records the time required for the system to reach a stable operating state during multiple simulated startup processes, and quantifies the impact of the PPO algorithm on startup efficiency, as shown in Fig. 6.

The X-axis in Fig. 6 represents the system startup time (1 second to 76 seconds), and the Y-axis represents the power frequency deviation. The two subgraphs on the left and right show a comparison of the time it takes for the system to reach a stable frequency before and after optimization, including four scenarios: high load, low temperature, emergency recovery, and distributed resource integration startup. Before optimization by the algorithm, the power frequency deviation of the system began to stabilize around 39 seconds and fluctuated around 0. However, after optimization by the PPO algorithm, the time was reduced to about 23 seconds. Based on the PPO algorithm, startup time optimization can be effectively carried out for different scenarios.

6.2 System startup stability

System stability assessment is conducted by monitoring and analyzing the fluctuations of key parameters (including voltage, frequency, and load) during the startup process of the power system. The voltage stability index can be used to quantify the performance of the system under different startup strategies, and the system response before and after PPO algorithm optimization can be compared to evaluate the effectiveness of the algorithm. The results are shown in Fig. 7.

In Fig. 7, the X-axis represents different startup strategies, and the Y-axis represents the voltage stability index (0 to 1). Each strategy involves conducting three different initiation experiments separately. The voltage stability index before PPO optimization: the fast response is 0.90, 0.92, 0.88, and the load balancing is 0.85, 0.87, and 0.86, respectively. The frequency adjustment is 0.91, 0.90, 0.92, and the voltage control is 0.89, 0.88, and 0.90 respectively. The demand forecast was 0.87, 0.85, 0.86, and the emergency response was 0.93, 0.94, and 0.92 respectively. The voltage stability index after PPO optimization: the fast response is 0.95, 0.96, 0.94, and the load balancing is 0.92, 0.93, and 0.91, respectively. The frequency adjustment is 0.94, 0.95, 0.93, and the voltage control is 0.91, 0.90, and 0.92 respectively. The demand forecast was 0.89, 0.90, 0.88, and the emergency response was 0.97, 0.96, and 0.98 respectively.

6.3 System emergency response capability

Table 2
Evaluation of emergency response capability of power system

Type of emergency event	Average recovery time before PPO optimization (seconds)	Average recovery time after PPO optimization (seconds)	Improvement in performance (%)
Sudden load increase	45	30	33.3
Generator failure	60	40	33.3
Transmission line damage	70	50	28.6
Main transformer fault	65	45	30.8
Frequency anomalies	50	35	30.0
Voltage drop	55	38	30.9
Control system failure	75	55	26.7
Ring network breakdown	80	60	25.0
Equipment overheating	68	48	29.4
Natural disaster impact	90	65	27.8

Figure 7.

Power system startup stability assessment.

This article records the time required for the system to recover to a stable state and the performance indicators during the recovery process, to quantify the response efficiency of the intelligent startup strategy to emergencies. The results are shown in Table 2.

Table 2 shows the performance of the power system intelligent startup strategy optimized using the PPO algorithm in response to different types of emergencies. It can be seen that whether facing common emergencies such as a sudden increase in load, generator failure, transmission line damage, or complex situations such as main transformer failure, abnormal frequency, voltage drop, or even extreme challenges such as control system failure, ring network disconnection, equipment overheating, and natural disasters, the optimized startup strategy of PPO algorithm significantly reduces the time required for the system to recover from event occurrence to stable state. Under the sudden increase in load, the system recovery time before and after optimization was reduced from 45 seconds to 30 seconds, an improvement of 33.3%. The data reflects that the PPO algorithm not only improves the rapid response ability of the power system to emergencies but also enhances its adaptability and robustness in complex and ever-changing environments.

Table 3

Comparison of reinforcement learning algorithms in power system startup optimization

Method	Startup time (seconds)	Recovery time reduction (%)	Stability index improvement (%)	Advantages
PPO algorithm	23	33.3	25.0	Performs well in various emergency scenarios, with significant improvements in stability and startup efficiency.
Baseline method	40	20.0	15.0	Longer startup time compared to the PPO algorithm, lower reduction rate in recovery time, and stability index improvement.
DDPG algorithm	25	31.0	22.5	Slightly better in some emergencies, overall performance comparable to the PPO algorithm.
A3C algorithm	30	28.0	20.0	May outperform the PPO algorithm in certain scenarios, but not as significant improvement in stability as the PPO algorithm.
SARSA algorithm	35	25.0	18.0	Longer startup time, lower reduction rate in recovery time, and stability index improvement compared to the PPO algorithm.

Table 3 indicates PPO’s superior performance in reducing startup time, recovery time, and improving stability index compared to baseline methods. While alternatives like DDPG, A3C, and SARSA may have slight advantages in certain cases, overall, their performance is comparable or slightly lower. However, for scenarios with continuous action spaces or non-stationary environments, other algorithms might be more appropriate. Thus, choosing the right algorithm depends on specific application requirements.

7. Conclusions

This article demonstrates PPO’s effectiveness in optimizing power system startup strategies. By constructing a dynamic model and designing a reward function, system stability and startup efficiency were improved. However, further research is needed to assess the model’s adaptability to different power system configurations and operational conditions. Future studies could explore challenges in transferring the optimized strategy to diverse setups and validate effectiveness in practical scenarios, enhancing power system efficiency and stability.

References

Jain

Sakamuri

Cutululis

. Grid-forming control strategies for black start by offshore wind power plants. Wind Energy Science. 2020; 5(4): 1297-1313. doi: 10.5194/wes-5-1297-2020.

Qudaih

Engel

Truijen

KJD

Stockman

Hoffstaedt

, et al. The contribution of low-head pumped hydro storage to grid stability in future power systems. IET Renewable Power Generation. 2023; 17(14): 3594-3608. doi: 10.1049/rpg2.12668.

Karimianfard

Haghighat

Zeng

. Co-optimization of battery storage investment and grid expansion in integrated energy systems. IEEE Systems Journal. 2021; 16(4): 5928-5938. doi: 10.1109/JSYST.2021.3130057.

Zhao

Wang

Hatziargyriou

Shen

. Optimal generator start-up sequence for bulk system restoration with active distribution networks. IEEE Transactions on Power Systems. 2020; 36(3): 2046-2057. doi: 10.1109/TPWRS.2020.3040130.

Yang

Xiang

Zhou

Zuo

Wen

. A single-end protection scheme for hybrid MMC HVDC grids considering the impacts of the active fault current-limiting control. IEEE Transactions on Power Delivery. 2020; 36(4): 2001-2013. doi: 10.1109/TPWRD.2020.3017895.

Samani Arash

D’amicis

De Kooning Jeroen

Bozalakov

Silva

Vandevelde

. Grid balancing with a large-scale electrolyzer providing primary reserve. IET Renewable Power Generation. 2020; 14(16): 3070-3078. doi: 10.1049/iet-rpg.2020.0453.

Znad

Mohamed

Elhaija

. Speeding-up startup process of a clean coal supercritical power generation station via classical model predictive control. Process Integration and Optimization for Sustainability. 2022; 6(3): 751-764. doi: 10.1007/s41660-022-00243-5.

Haghighat

Karimianfard

Zeng

. Integrating energy management of autonomous smart grids in electricity market operation. IEEE Transactions on Smart Grid. 2020; 11(5): 4044-4055. doi: 10.1109/TSG.2020.2992570.

Nikolaidis

Poullikkas

. Evolutionary priority-based dynamic programming for the adaptive integration of intermittent distributed energy resources in low-inertia power systems. Eng. 2021; 2(4): 643-660. doi: 10.3390/eng2040041.

10.

Inoue

Sadamoto

Arahata

Chakrabortty

. Optimal power flow design for enhancing dynamic performance: Potentials of reactive power. IEEE Transactions on Smart Grid. 2020; 12(1): 599-611. doi: 10.1109/TSG.2020.3019417.

11.

Bukhsh

Edmunds

Bell

. Oats: Optimisation and analysis toolbox for power systems. IEEE Transactions on Power Systems. 2020; 35(5): 3552-3561. doi: 10.1109/TPWRS.2020.2986081.

12.

Liu

Sun

Ren

Zhang

. Dynamic circulating current suppression method for multiple hybrid power parallel grid-connected inverters with model reference adaptive system. IEEE Transactions on Industrial Electronics. 2021; 69(5): 4364-4375. doi: 10.1109/TIE.2021.3084166.

13.

Meng

Wang

Guo

. An improved damping adaptive grid-forming control for black start of permanent magnet synchronous generator wind turbines supported with battery energy storage system. IET Generation, Transmission and Distribution. 2023; 17(2): 354-366. doi: 10.1049/gtd2.12753.

14.

Shaik

Ganesh

. Deep neural network and social ski-driver optimization algorithm for power system restoration with VSC-HVDC technology. Journal of Computational Mechanics Power System and Control. 2020; 3(1): 1-9. doi: 10.46253/jcmps.v3i1.a1.

15.

Vaish

Tiwari

Kaimal

. Multi-objective optimization of distributed energy resources based microgrid using random forest model. Bulletin of Electrical Engineering and Informatics. 2024; 13(1): 67-75. doi: 10.11591/eei.v13i1.7087.

16.

Vosoogh

Rashidinejad

Abdollahi

Ghaseminezhad

. An intelligent day-ahead energy management framework for networked microgrids considering high penetration of electric vehicles. IEEE Transactions on Industrial Informatics. 2020; 17(1): 667-677. doi: 10.1109/TII.2020.2977989.

17.

Ding

Wang

Chen

Wang

. Urban monthly power load forecasting based on economy-meteorology-gas demand coupling. Electrical Engineering. 2022; 104(5): 3497-3507. doi: 10.1007/s00202-022-01528-1.

18.

Sekhavatmanesh

Cherkaoui

. A multi-step reconfiguration model for active distribution network restoration integrating DG start-up sequences. IEEE Transactions on Sustainable Energy. 2020; 11(4): 2879-2888. doi: 10.1109/TSTE.2020.2980890.

19.

Xie

Sun

. Automatic generation control strategy for an integrated energy system based on ubiquitous power internet of things. IEEE Internet of Things Journal. 2022; 10(9): 7645-7654. doi: 10.1109/JIOT.2022.3209792.

20.

Shaik

Ganesh

. A power system restoration method using voltage source converter–high-voltage direct current technology, aided by time-series neural network with firefly algorithm. Soft Computing. 2020; 24(13): 9495-9506. doi: 10.1007/s00500-019-04459-5.

21.

Chen

Liang

Zhai

Zhang

Liu

Lin

, et al. Review of restoration technology for renewable-dominated electric power systems. Energy Conversion and Economics. 2022; 3(5): 287-303. doi: 10.1049/enc2.12064.

22.

Zhu

Wang

Pan

Wang

, et al. Research on volt/var control of distribution networks based on the PPO algorithm. Computer Modeling in Engineering and Sciences. 2023; 134(1): 599-609. doi: 10.32604/cmes.2022.021052.

23.

Zhou

Huang

Franti

. A review of motion planning algorithms for intelligent robots. Journal of Intelligent Manufacturing. 2022; 33(2): 387-424. doi: 10.1007/s10845-021-01867-z.

24.

Lee

. Resource allocation for vehicular fog computing using reinforcement learning combined with heuristic information. IEEE Internet of Things Journal. 2020; 7(10): 10450-10464. doi: 10.1109/JIOT.2020.2996213.

25.

Sun

Vivekananda

Shankar

. Achieving concurrency in cloud-orchestrated Internet of Things for resource sharing through multiple concurrent access. Computational Intelligence. 2024; 40(1): e12296. doi: 10.1111/coin.12296.

26.

Achyut

. Efficient data interpretation and artificial intelligence enabled IoT based smart sensing system. Artificial Intelligence Review. 2023; 56(12): 15053-15077. doi: 10.1007/s10462-023-10519-y.

27.

Gheisarnejad

Akhbari

Rahimi

Andresen

Khooban

. Reducing the impact of constant power loads on DC energy systems by artificial intelligence. IEEE Transactions on Circuits and Systems II: Express Briefs. 2022; 69(12): 4974-4978. doi: 10.1109/TCSII.2022.3194271.

28.

Zhou

Lee

Diao

Shi

. Deep reinforcement learning-based real-time AC optimal power flow considering uncertainties. Journal of Modern Power Systems and Clean Energy. 2021; 10(5): 1098-1109. doi: 10.35833/MPCE.2020.000885.

29.

Zhang

Yang

Huangfu

Long

Leung

. Power control based on deep reinforcement learning for spectrum sharing. IEEE Transactions on Wireless Communications. 2020; 19(6): 4209-4219. doi: 10.1109/TWC.2020.2981320.

30.

Cao

Huang

Chen

, et al. Deep reinforcement learning-based approach for optimal power flow of distribution networks embedded with renewable energy and storage devices. Journal of Modern Power Systems and Clean Energy. 2021; 9(5): 1101-1110. doi: 10.35833/MPCE.2020.000557.

31.

Sun

Han

Wang

Shi

Liao

Zheng

, et al. Proximal policy optimization-based power grid structure optimization for reliable splitting. Energies. 2024; 17(4): 834. doi: 10.3390/en17040834.

32.

Zhang

Cao

Huang

Chen

Frede

. Deep reinforcement learning–based approach for optimizing energy conversion in integrated electrical and heating systems with renewable energy. Energy conversion and management. 2019; 202: 112199. doi: 10.1016/j.enconman.2019.112199.

33.

Abidi

Alkhalefah

Moiduddin

Al-Ahmari

. Novel improved chaotic elephant herding optimization algorithm-based optimal defense resource allocation in cyber-physical systems. Soft Computing. 2023; 27(6): 2965-2980. doi: 10.1007/s00500-022-07455-4.

34.

Shengren

Vergara

Duque

EMS

Palensky

. Optimal energy system scheduling using a constraint-aware reinforcement learning algorithm. International Journal of Electrical Power and Energy Systems. 2023; 152: 109230. doi: 10.1016/j.ijepes.2023.109230.

35.

Alkahtani

Abidi

Obaid

HSB

Alotaik

. Modified gannet optimization algorithm for reducing system operation cost in the engine parts industry with pooling management and transport optimization. Sustainability. 2023; 15(18): 13815. doi: 10.3390/su151813815.

36.

Chen

Niu

Wan

Xiang

. A Novel intelligent anti-jamming algorithm based on deep reinforcement learning assisted by meta-learning for wireless communication systems. Applied Sciences. 2023; 13(23): 12642. doi: 10.3390/app132312642.

37.

Garg

Jindal

. Resource optimization using predictive virtual machine consolidation approach in cloud environment. Intelligent Decision Technologies. 2023; 17(2): 471-484. doi: 10.3233/IDT-220222.

38.

Srivani

Sandhya

Padmaja Rani

. Theoretical analysis and comparative study of top 10 optimization algorithms with DMS algorithm. Intelligent Decision Technologies. 2023; 17(3): 607-620. doi: 10.3233/IDT-220114.

39.

Reddy

Narsimha

. Adaptive squirrel coyote optimization-based secured energy efficient routing technique for large scale WSN with multiple sink nodes. Intelligent Decision Technologies. 2023; 17(4): 1207-1232. doi: 10.3233/IDT-220045.

40.

Dimitropoulos

Mylona

Marinakis

Kapsalis

Sofias

Primo

, et al. Comparative analysis of AI-based models for short-term photovoltaic power forecasting in energy cooperatives. Intelligent Decision Technologies. 2021; 15(4): 691-705. doi: 10.3233/IDT-210210.

Optimizing intelligent startup strategy of power system using PPO algorithm

Abstract

Keywords

1. Introduction

3. Environmental modeling and PPO algorithm construction

3.1 Environmental modeling

3.2 Algorithm development

Table 1 Different reward function parameter settings in power system startup strategy

5.1 Strategy update

6.1 System startup time evaluation

6.3 System emergency response capability

Table 2 Evaluation of emergency response capability of power system

References

Table 1
Different reward function parameter settings in power system startup strategy

Table 2
Evaluation of emergency response capability of power system