Markov decision process framework of optimal energy dispatch in a smart data center with uninterruptible power supplies

Abstract

This paper examines the use of an Uninterruptible Power Supply (UPS) to enhance the operational efficiency of data centers. It focuses on developing an optimal energy scheduling strategy for a data center equipped with UPS, using the Markov decision process (MDP) framework. The MDP framework simulates the decision-making process involved in minimizing energy costs. Each unit’s available power output in the data center is treated as a Markov state, taking into account the uncertainty associated with renewable distributed generation. This uncertainty drives the system to transition to other Markov states in subsequent decision times. A recursive optimization model is established for each Markov state at each decision time to guide state-based operations, which includes determining the unit output while considering both current and future costs. The challenge of dealing with high dimensionality, arising from a substantial number of states and actions in the model, is effectively addressed by adopting an approximate dynamic programming (ADP) method. This approach incorporates decision-state and forwards dynamic algorithms to tackle the complexity of the MDP-based model. By employing ADP, the computational burden is reduced, enabling efficient and practical solutions to be obtained.

Keywords

Data center; Markov decision process; reinforcement learning; UPS

1. Introduction

The growing adoption of renewable energy (RE) has significantly accelerated the global transition towards a cleaner and more sustainable energy landscape. With increasing concerns about climate change, environmental degradation, and the depletion of traditional fossil fuel resources, there has been widespread recognition of the need to reduce greenhouse gas emissions and promote renewable sources of energy. However, the increased penetration of RE presents challenges in maintaining a balanced power supply. Furthermore, the decline in flexible power resources has negatively impacted power system dispatch operations, emphasizing the need for new dispatchable resources to ensure power system security. Data centers (DCs) have emerged as significant electricity consumers due to the rapid expansion of Internet services [1]. In 2020, data centers (DCs) played a significant role in China’s energy consumption, accounting for approximately 2.7% of the country’s total electricity consumption. These DCs consumed a staggering amount of electricity, estimated to be around 204.5 terawatt-hours (TW $\cdot$ h). This substantial energy demand is attributed to the rapid growth of digital services, cloud computing, artificial intelligence, and other data-intensive applications [2]. To support the growing number of DCs and maintain their operational security, the scale of uninterruptible power supply (UPS) systems has also expanded [3]. In addition to their primary function of providing backup power during utility outages, UPS systems have emerged as a promising resource for power dispatch in DCs. These UPS systems offer a unique advantage by leveraging the stored energy to meet power demands through charging and discharging operations.

Data centers play a pivotal role in the modern digital landscape, serving as the backbone of the information age. They are not only substantial consumers of electricity but also act as crucial backup power resources, ensuring the continuous operation of critical services in the event of power interruptions. One of the key technologies that underpin the reliability and resilience of data centers is the Uninterruptible Power Supply (UPS). In this paper, we examine the relationship between data centers and UPS systems and explore its implications for power system dispatch. By considering dynamic electricity tariffs and incentive rewards, DCs have the potential to play a significant role as demand response resources and actively participate in power system dispatch control. Dynamic electricity tariffs, which vary based on the time of day, season, and grid conditions, incentivize DC operators to adjust their power consumption patterns to align with the availability of renewable energy and the overall grid demand [4]. A coordination operation model has been proposed to optimize the collaboration between DCs and conventional units, taking into account the temporarily interruptible and shiftable characteristics of DCs [5]. This approach enables better utilization of RE, enhances dispatch control efficiency and flexibility, and provides economic incentives through interruptible and shiftable characteristics [6]. Coordinating DC loads across different locations offers spatial flexibility, allowing power demand to be transferred between power systems via DCs.

Data centers, as voracious consumers of electricity, are indispensable for the seamless functioning of our data-driven world. Their significance goes beyond mere data storage and processing; they stand as guardians of uninterrupted service delivery. Amidst an increasingly complex and interconnected power grid, data centers offer a dual role – as power consumers and, critically, as dependable backup power resources. It is within this dual role that the Uninterruptible Power Supply (UPS) emerges as a linchpin in ensuring the reliability and safety of data center operations. During periods of high demand for DC services, DCs alone may not be sufficient to meet dispatch instructions. Hence, it becomes feasible to jointly dispatch other controllable resources alongside data centers (DCs). Researchers have explored the integration of DCs with solar photovoltaic (PV) generation to predict energy demand and facilitate dispatch, thereby mitigating the risk of reduced power system stability [7]. Moreover, the coordination of DCs offers enhanced flexibility, stability, and reliability compared to conventional units, further benefiting the economics and stability of the power system [8]. Furthermore, DCs can devise incentive-compatible strategies in electricity markets to lower power system operation costs and increase their revenues. However, realizing such a mutually beneficial scenario comes with its own set of challenges [9]. While previous studies have primarily focused on power system dispatch involving DCs’ interruptible and shiftable characteristics, the role of DCs’ uninterruptible power supply (UPS) aspect in power dispatch has received comparatively less attention, despite being an integral part of the power supply resources.

UPS systems serve as vital backup power equipment in data centers, guaranteeing the safe and continuous operation of electrical equipment during power outages [10]. Among the various types of UPS systems, online UPS is widely preferred in data centers due to its ability to provide high-quality power output and fast response times. To further optimize the performance of UPS systems in the power system, advanced control systems have been introduced to enhance their fast dynamic response capabilities [11, 12]. However, the utilization factor of UPS as the backup power source in data centers is typically low.

Moreover, in an era where sustainable energy sources like wind and solar power are becoming increasingly prevalent, the integration of renewable energy into the power grid presents both opportunities and challenges. Data centers, as substantial power consumers, are uniquely positioned to contribute to the management of renewable energy uncertainties. This paper explores the dynamic relationship between data centers and renewable energy sources, focusing on the role of UPS scheduling as a mitigation strategy. By leveraging UPS capabilities, we propose a scheduling strategy that actively mitigates the impacts of renewable energy variability and uncertainty on power system dispatch.

This paper presents a novel approach to harnessing the transferability of data center loads in power system dispatch. The proposed approach introduces a hierarchical dispatch strategy for UPS in data centers, utilizing the principles of model predictive control (MPC). The primary objective of this strategy is to enhance power system stability, optimize economic efficiency, and maximize the utilization of UPS. By bridging the application gap of UPS in power system dispatch, the proposed hierarchical dispatch strategy fills an important area of research. It offers a promising solution to leverage the inherent capabilities of UPS in data centers and integrate them effectively into the broader power system dispatch framework. Through the application of MPC, the strategy enables efficient and coordinated decision-making, ensuring the reliable and optimal operation of UPS while contributing to overall power system stability and economy.

2. Literature review

Recent studies have proposed various innovative ideas and strategies for UPS dispatch, which can also be applied to the control of UPS. Efficient dispatch strategies are necessary to ensure stable power supply services for UPS, as highlighted in numerous literature sources [13]. UPS can mitigate the unpredictability and volatility of RE power output and correct dispatch instruction errors during emergencies [14]. Efficient dispatch strategies play a crucial role in minimizing dispatch costs in the operation of UPS. Multi-objective optimization models and robust optimal strategies have been developed, taking into account different electricity prices to adjust the charging/discharging power. These strategies have shown significant potential in reducing dispatch costs and improving economic efficiency [15]. To enhance dispatch reliability, long-term optimization approaches have been proposed, considering uncertainties in RE, loads, and electricity prices. One such approach is the state of charge (SOC) interval management method, which optimizes the SOC range and charging/discharging power output. By incorporating uncertainty factors, this method enhances dispatch reliability by providing flexibility in managing the SOC, ensuring a stable and reliable power supply [16]. However, it is important to note that the error tolerance of power dispatch using a single UPS is limited by its power and energy capacity. The dispatch capabilities of a single UPS may not be sufficient to handle large fluctuations in demand or supply, potentially leading to increased risks and compromised system stability. Therefore, alternative approaches, such as coordinated dispatch with other dispatchable resources, need to be explored to overcome these limitations and achieve more efficient and reliable power dispatch operations.

In order to tackle the aforementioned challenges, a novel energy management system has been proposed, utilizing a UPS [17]. This integrated system enables efficient coordination of active power output, resulting in reduced operation costs and minimized power fluctuations [18, 19]. These studies highlight the advantages of UPS in optimizing power dispatch operations by leveraging its ability to discharge during periods of high electricity prices and charge during periods of low prices. However, it is crucial to consider the unique characteristics of Data Centers (DCs), including factors such as power outage duration and available energy, when integrating UPS systems into the power dispatch process [20]. Existing approaches often overlook the specific operational characteristics of UPS systems in DCs, which hampers their effective participation in power dispatch and diminishes their overall utilization [21, 22]. Therefore, it is imperative to address this research gap and develop tailored strategies that fully exploit the potential of UPS systems in DCs, enabling their active engagement in power dispatch and maximizing their utilization.

Various optimization models, including linear programming, quadratic programming, stochastic mixed-integer programming, and reinforcement learning, have been proposed to facilitate active power dispatch [23]. Among these models, model predictive control (MPC) has emerged as a highly promising approach. MPC optimizes the active power output of dispatch resources by taking into account predictive models, rolling horizon optimization, and feedback correction. By implementing MPC in UPS control, it becomes possible to reduce RE fluctuations and minimize dispatch errors [24]. Moreover, the performance of the MPC controller can be further enhanced by considering the risk-averse nature of UPS, thereby improving power system resilience and reducing costs [25]. In the context of distributed systems, distributed MPC ensures the coordinated operation of the power system with UPS. This approach effectively balances voltage regulation and power sharing, while simultaneously mitigating the impact of communication delays on power dispatch [26, 27]. By leveraging distributed MPC, the power system can achieve optimal operation and effectively utilize the capabilities of UPS, leading to enhanced stability and efficiency in power dispatch. Research within the domain of MDP-based power system dispatch has demonstrated its effectiveness in various contexts. Several studies have applied the MDP framework to optimize generation schedules, allocate resources, and enhance grid reliability. Notably, the MDP approach has shown promise in addressing the intermittent nature of renewable energy sources, a challenge that is becoming increasingly prevalent in modern power systems. However, signal-type UPSs face challenges in balancing dispatch requirements and economy during emergencies.

The utilization of the MDP framework within the context of UPS scheduling, as proposed in this paper, builds upon this body of work. By seamlessly integrating MDP-based decision-making into UPS operations, our approach aims to not only bolster data center efficiency but also contribute to the broader goals of power system dispatch optimization. This section forms a foundation for the subsequent discussions, highlighting the MDP framework as a logical and effective tool for addressing the challenges posed by renewable energy variability and uncertainty.

Existing studies indicate that optimal energy management with a UPS utilizing model predictive control (MPC) can effectively minimize economic losses and mitigate random risks associated with renewable energy (RE) under failure conditions [28]. Furthermore, the literature highlights the use of UPS, specifically with battery and supercapacitor dispatch instruction tracking based on MPC and iterative learning control, as a means to reduce power losses [29]. Additionally, multi-stage stochastic programming incorporating MPC has been proposed to coordinate the operation of UPS between slow-timescale and fast-timescale, resulting in reduced power loss and improved accounting for forecasting uncertainties through power circulation [30]. While these studies have primarily focused on the application of MPC in UPS dispatch operation control, there is a noticeable lack of research concerning the utilization of MPC in dispatch operation control of uninterruptible power supplies (UPS) in data centers (DCs). Therefore, exploring the potential of MPC for UPS dispatch control in DCs remains an area with a limited investigation, presenting an opportunity for further research and development.

In recent years, there has been growing interest in the development and application of hybrid actor-critic algorithms in various domains, particularly in the field of reinforcement learning. These algorithms combine the strengths of both actor and critic approaches to enhance the stability and convergence speed of learning processes. Notable examples include the Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO) algorithms, which have demonstrated impressive performance in tasks ranging from robotic control to game playing. The hybrid actor-critic paradigm leverages actor networks to determine optimal policies and critic networks to estimate value functions, resulting in more efficient and effective policy optimization. This approach has gained recognition for its ability to handle high-dimensional state and action spaces, making it a promising candidate for addressing complex control and optimization challenges, including those in the domain of UPS dispatch and control. While the literature on hybrid actor-critic algorithms is extensive, its specific application in the context of UPS systems and renewable energy integration may require further exploration and adaptation to address the unique challenges of power system dispatch.

3. Model formulation of the proposed MDP framework

The proposed model for optimal energy dispatch in smart DCs with uninterruptible power supplies (UPS) follows a Markov Decision Process (MDP) framework, encompassing the following key components:

1. State space: Defining state variables that capture the current system state, including the energy level of UPS, workload status, and electricity price.
2. Action space: Specifying the available actions for the controller at each time step, such as determining the amount of power to dispatch from the UPS.
3. Reward function: Establishing a reward function that reflects the system’s objective, which could involve minimizing electricity costs, maximizing UPS utilization, or reducing carbon emissions.
4. Transition probabilities: Describing the probabilities of transitioning to new states based on the current state and action taken.
5. Discount factor: Incorporating a discount factor to weigh the importance of future rewards relative to immediate rewards.
6. Policy: Identifying the optimal policy that maps each state to the corresponding action, maximizing the expected total reward.

The model formulation typically involves representing the problem as a finite-horizon or infinite-horizon MDP and employing dynamic programming or reinforcement learning algorithms to solve it. The specific formulation and solution method depend on the system’s particular objectives and constraints.
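
As a rough illustration of the dynamic programming route mentioned above, the following sketch runs backward induction on a tiny finite-horizon MDP and minimizes the expected discounted cost. The function name `backward_induction`, the two-state example, and all numbers are hypothetical and are not taken from the paper's model; the sketch only shows the shape of the recursion, assuming a zero terminal value.

```python
def backward_induction(cost, trans, horizon, gamma=0.9):
    """Finite-horizon dynamic programming that minimizes expected discounted cost.

    cost[s][a]      : immediate cost of action a in state s (hypothetical data)
    trans[s][a][s2] : probability of moving from s to s2 under action a
    Returns the value at the first decision time and one policy per time step.
    """
    n_states = len(cost)
    n_actions = len(cost[0])
    value = [0.0] * n_states            # assumption: zero terminal value
    policy = []
    for _ in range(horizon):
        new_value, step_policy = [], []
        for s in range(n_states):
            # Current cost plus discounted expected future cost for each action.
            q = [
                cost[s][a]
                + gamma * sum(trans[s][a][s2] * value[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            best = min(range(n_actions), key=lambda a: q[a])
            new_value.append(q[best])
            step_policy.append(best)
        value = new_value
        policy.insert(0, step_policy)   # earliest decision time comes first
    return value, policy


# Toy example: action 0 always leads to state 0, action 1 always leads to state 1.
cost = [[1.0, 2.0], [3.0, 0.5]]
trans = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
value, policy = backward_induction(cost, trans, horizon=2)
```

The table-based recursion is only practical for small state and action sets, which is the dimensionality issue that motivates the approximate and learning-based methods discussed later in the paper.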

To address the resource allocation problem in DCs, the MDP framework is utilized, and reinforcement learning (RL) techniques are applied. The objective is to formulate the resource allocation problem by interacting with the DC environment and making optimal decisions. Acting as a controller within the DC environment, the Energy-efficient Resource Allocator senses the environment’s state and takes actions to explore the impact of those actions on the current state.

State space: The state of a DC encompasses various critical components that collectively define its operational status and requirements. These components include:

1. Current Time: The current time reflects the temporal context of the DC and serves as a crucial parameter for time-sensitive operations, scheduling, and decision-making within the facility.
2. Workloads: Workloads represent the tasks and processing requirements of the DC, encompassing activities performed by servers and network equipment. Workloads can vary in terms of their nature, priority, and resource demands.
3. IT Equipment Capacity: The capacity of IT equipment refers to the maximum processing capability and resource limits of the servers and other computing devices present in the DC. It takes into account factors such as the number of servers, their processing power, and their memory capacity.
4. Energy Supplies: Energy supplies encompass the various sources and systems responsible for providing power to the DC. These include utility grid electricity, backup generators, and renewable energy sources employed within the facility.
5. Energy Demand: Energy demand represents the total amount of power required by the DC to support its operations, including the power consumption of IT equipment, cooling systems, lighting systems, and other power-consuming components. It reflects the overall power needs of the facility at a given point in time.

By considering these elements as part of the DC state, operators and management personnel can gain a comprehensive understanding of the operational context, available resources, and energy requirements. This information is vital for making informed decisions, optimizing resource allocation, managing energy usage, and ensuring the smooth and efficient operation of the data center.

$\displaystyle S=\left\{{t,w,B,s,d_{u}}\right\}$ (1)

Action space: In this research, various variables related to the power output of uninterruptible power systems (UPS), the state of servers, and the overall workloads of the servers are taken into account. The assumption is made that all servers within a single data center (DC) are powered by UPS. The total energy consumption of a DC can be categorized into different components, including IT equipment, cooling and lighting systems, and power distribution systems. The energy consumed by the IT equipment is influenced by factors such as the number of workloads and power usage effectiveness (PUE). In this study, the focus is specifically on delay-tolerant workloads, which are workloads that do not require immediate processing. These delay-tolerant workloads are considered demand-side resources, while the analysis excludes delay-sensitive workloads.

$\displaystyle A=\left\{p_{u},U_{s}\;\middle|\;p_{u}^{\min}\leqslant p_{u}\leqslant p_{u}^{\max},\;U_{s}\in\left\{0,1\right\}\right\}$ (2)
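
To make the state and action definitions in Eqs. (1) and (2) concrete, the following sketch encodes them as plain Python data classes with a feasibility check for the action bounds. The class names `DCState` and `HybridAction`, the helper `is_feasible`, and the choice of scalars versus lists for each field are assumptions made for illustration; the paper does not specify a data layout.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DCState:
    """State S = {t, w, B, s, d_u} from Eq. (1); field shapes are assumed."""
    t: int                  # current decision time step
    w: float                # total workload
    B: float                # IT equipment capacity
    s: Sequence[float]      # energy supplies (assumed one entry per source)
    d_u: Sequence[float]    # energy demand (assumed one entry per UPS u)


@dataclass(frozen=True)
class HybridAction:
    """Action A = {p_u, U_s} from Eq. (2)."""
    p_u: Sequence[float]    # continuous power output of each UPS
    U_s: Sequence[int]      # binary state of each server (0 or 1)


def is_feasible(action, p_min, p_max):
    """Check p_u^min <= p_u <= p_u^max and U_s in {0, 1}."""
    power_ok = all(lo <= p <= hi for p, lo, hi in zip(action.p_u, p_min, p_max))
    switch_ok = all(u in (0, 1) for u in action.U_s)
    return power_ok and switch_ok


# Hypothetical bounds and actions, for illustration only.
ok = is_feasible(HybridAction([50.0, 60.0], [1, 0, 1]), [0.0, 0.0], [100.0, 100.0])
too_much_power = is_feasible(HybridAction([150.0], [1]), [0.0], [100.0])
bad_switch = is_feasible(HybridAction([50.0], [2]), [0.0], [100.0])
```

Keeping the continuous and discrete parts in separate fields mirrors the mixed action space that the solving method has to handle.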

Reward signal: Under typical circumstances, the UPS in a DC is supplied with electricity from the utility grid. However, in the event of abnormal conditions or power outages, the UPS serves as an emergency power source to ensure continuous operation. For this paper, it is assumed that the DC operates under normal conditions, relying on grid electricity for power supply. To account for the locational wholesale price of electricity, the operational cost can be formulated as follows:

$\displaystyle r=\textit{price}\ast p^{\textit{total}}$ (3)

Environment: In this research, it is assumed that all servers within the same DC are powered by a UPS, irrespective of their specific locations. However, the energy consumption of each node does not directly correspond to the energy output provided by the UPS. To account for this, a matrix is introduced to represent the energy usage efficiency $\sigma_{u}$, which is a function of the UPS load ratio $\nu_{u}$. This relationship can be linearized as shown in Eq. (4). The constants $\alpha$ and $\beta$ in the equation depend on the specific characteristics of each UPS. Inefficient UPS operation results in heat loss to the equipment, so the efficiency is constrained to remain above a minimum value $\sigma_{u}^{\min}$. Furthermore, the power output capacity of the UPS is subject to limitations, resulting in the following constraints:

$\displaystyle\left\{\begin{array}{l}p^{\textit{total}}=\sum\limits_{u}p_{u}\\ p_{u}=\dfrac{d_{u}}{\sigma_{u}}\\ \sigma_{u}=\mathbb{Q}\left(\nu_{u}\right)=\alpha\nu_{u}+\beta\\ \sigma_{u}^{\min}\leqslant\sigma_{u}\leqslant 1\end{array}\right.$ (4)
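
The cost in Eq. (3) and the efficiency constraints in Eq. (4) can be sketched together as follows. This is a minimal illustration under stated assumptions: the second line of Eq. (4) is read as "power drawn by UPS u equals the node demand divided by the UPS efficiency", the default `sigma_min=0.85` is a placeholder rather than a value from the paper, and the function names are hypothetical. The coefficients in the example are those of UPS 1 in the UPS parameter table.

```python
def ups_efficiency(load_ratio, alpha, beta, sigma_min):
    """Linearized efficiency sigma_u = alpha * nu_u + beta, bounded as in Eq. (4)."""
    sigma = alpha * load_ratio + beta
    if not (sigma_min <= sigma <= 1.0):
        raise ValueError(f"efficiency {sigma:.3f} violates [{sigma_min}, 1]")
    return sigma


def operating_cost(price, demands, load_ratios, alphas, betas, sigma_min=0.85):
    """Cost r = price * p_total from Eq. (3), with p_total summed over all UPS."""
    p_total = 0.0
    for d, nu, a, b in zip(demands, load_ratios, alphas, betas):
        sigma = ups_efficiency(nu, a, b, sigma_min)
        p_total += d / sigma        # assumption: node demand divided by efficiency
    return price * p_total


# UPS 1 coefficients (alpha = 0.07, beta = 0.89) at a hypothetical 50% load ratio.
sigma_1 = ups_efficiency(0.5, 0.07, 0.89, sigma_min=0.85)
cost = operating_cost(price=2.0, demands=[92.5], load_ratios=[0.5],
                      alphas=[0.07], betas=[0.89])
```

Because Eq. (3) is a cost, an agent that maximizes return would typically be trained on its negative; that sign convention is an assumption here, not something the paper states.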

The energy consumption of a data center (DC) is allocated among different components, including IT equipment (such as servers, data storage devices, and network devices), cooling and lighting systems, and power distribution systems. The energy consumed by the IT equipment can be determined by factors such as the number of workloads and power usage effectiveness (PUE). As previously mentioned, the workloads in the DC can be classified into two categories: delay-sensitive and delay-tolerant workloads. In this research, the focus is on delay-tolerant workloads, which do not require immediate processing. These workloads are considered demand-side resources. To model the behavior of these delay-tolerant workloads, Eq. (5) is employed. This equation captures the relationship between the workloads and their corresponding energy consumption.

$\displaystyle\left\{\begin{array}{l}w=\sum\limits_{i}\lambda_{i}\\ 0\leqslant w\leqslant B\end{array}\right.$ (5)

4. Solving method

To solve the Markov model with mixed continuous and discrete action spaces, a novel reinforcement learning algorithm is proposed. Conventional reinforcement learning algorithms are designed to handle either continuous or discrete actions, making them unsuitable for this mixed-action space problem.

Hybrid actions, which combine discrete and continuous actions, are necessary for certain decision-making scenarios where both types of actions need to be taken simultaneously to achieve the desired outcome. For example, in robotics, a hybrid action could involve discrete movement steps combined with continuous motion within those steps. In manufacturing, a hybrid action might involve selecting a machine (discrete action) and adjusting its speed or power (continuous action).

To tackle MDPs with hybrid action spaces using reinforcement learning (RL), algorithms capable of handling both discrete and continuous actions are employed. One common approach is to use actor-critic methods, where the policy and value function approximations are separated. The actor network learns to generate continuous actions, while the critic network approximates the value function.

Another approach involves extending deep Q-learning networks (DQNs) to handle continuous action spaces by incorporating algorithms such as deterministic policy gradients (DPG) or deep deterministic policy gradients (DDPG). In DDPG, the actor network outputs continuous actions, and the critic network approximates the action-value function.

Recent advancements in RL have introduced algorithms like soft actor-critic (SAC), twin delayed deep deterministic policy gradient (TD3), and proximal policy optimization (PPO) that can be adapted to hybrid action spaces. These algorithms leverage techniques such as entropy regularization, target networks, and clipped surrogate objective functions to enhance training stability and convergence.

In summary, solving MDPs with hybrid action spaces using RL requires selecting an appropriate algorithm that can handle both continuous and discrete actions. Proper hyperparameter tuning, including learning rates, discount factors, and exploration rates, is also crucial for achieving satisfactory results.
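
One way to picture a hybrid action is as a pair produced by two actor outputs, one continuous and one discrete, with an $\varepsilon$-greedy rule deciding between exploration and exploitation. The sketch below is illustrative only: `select_hybrid_action`, the idea of a per-server score whose sign decides on/off, and all numbers are assumptions, not the paper's network design.

```python
import random


def select_hybrid_action(p_mean, u_score, p_min, p_max, epsilon, rng):
    """Return (p_u, U_s): continuous UPS outputs and binary server states.

    p_mean  : continuous outputs proposed by a (hypothetical) actor head
    u_score : per-server scores from a (hypothetical) discrete head;
              a positive score is read as "server on"
    """
    if rng.random() < epsilon:
        # Explore: sample both action parts uniformly within their ranges.
        p_u = [rng.uniform(lo, hi) for lo, hi in zip(p_min, p_max)]
        U_s = [rng.randint(0, 1) for _ in u_score]
    else:
        # Exploit: clip the continuous part to its bounds, threshold the discrete part.
        p_u = [min(max(m, lo), hi) for m, lo, hi in zip(p_mean, p_min, p_max)]
        U_s = [1 if s > 0 else 0 for s in u_score]
    return p_u, U_s


rng = random.Random(0)
greedy_p, greedy_u = select_hybrid_action(
    [120.0, -5.0], [0.3, -0.2], [0.0, 0.0], [100.0, 100.0], epsilon=0.0, rng=rng)
explore_p, explore_u = select_hybrid_action(
    [120.0, -5.0], [0.3, -0.2], [0.0, 0.0], [100.0, 100.0], epsilon=1.0, rng=rng)
```

Clipping keeps the continuous part inside the bounds of Eq. (2) regardless of what the actor proposes, so the environment never receives an out-of-range power set point.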

Table 1
Server parameters

Server	PUE	$P_{\textit{peak}}$	$P_{\textit{idle}}$	$B_{\textit{IT}}$
1	1.3	105	65	750
2	1.9	93	72	850
3	1.6	108	65	750
4	1.3	91	72	850
5	1.1	100	65	750
6	1.3	41	72	850

Figure 1.

Solving framework of a hybrid actor-critic algorithm.

Figure 2.

Illustration of one test DC.

5. Results and discussion

Consider a DC with 6 servers distributed across 3 nodes, as depicted in Fig. 2. Table 1 provides the parameter settings related to the power consumption of the IT equipment in the DC: Power Usage Effectiveness (PUE), peak power consumption, idle power consumption, and IT equipment capacity for each server. PUE represents the efficiency of power usage in the DC, with a lower value indicating more efficient energy utilization. The peak power consumption reflects the maximum power demand of the IT equipment during periods of high workload or intense processing, whereas the idle power consumption represents the power consumed by the IT equipment when it is in a low or idle state. The IT equipment capacity indicates the maximum workload or processing capacity that a server can handle efficiently. These settings establish a foundation for analyzing and evaluating the power consumption characteristics of the IT equipment in the DC. The objective of this study, for delay-tolerant workloads, is to fulfil the network demand, as illustrated in Fig. 3.

Regarding the power supply side, the DC includes 3 nodes, each equipped with a set of UPS. The reinforcement learning procedure used for the energy dispatch is summarized in Table 2. Table 3 presents the parameter settings for each UPS, namely the coefficients $\alpha$ and $\beta$ of the linearized efficiency curve that relates the load ratio to the efficiency. These parameters are needed to model and simulate the behavior of the UPSs within the dispatch framework and to assess the effectiveness of the proposed dispatch strategy. The reinforcement learning parameters are specified in Table 4.

Table 2
The procedure of reinforcement learning for optimal energy dispatch of DC

The procedure of hybrid actor-critic algorithm-based reinforcement learning
1	Initialize the actor, critic, actor target, and critic target networks.
2	Initialize the replay buffer.
3	For every episode do:
4	Observe the current state $s$
5	Select continuous and discrete actions $a$ from the actor network according to an $\varepsilon$-greedy policy
6	Obtain the new state and reward from the environment
7	Update the policies based on the Bellman equation
8	Update the current state
9	End for
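
The procedure above relies on three standard building blocks: a replay buffer, a Bellman (temporal-difference) target, and target networks that track the learned networks. The sketch below shows one common way to write each of them using only the standard library. It is not the authors' implementation: the soft-update rule and the rate `tau` are assumptions, parameters are represented as plain lists of floats instead of network weights, and only the discount factor 0.9 and batch size 64 come from the paper's parameter table.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)   # oldest transitions are dropped first

    def push(self, state, action, reward, next_state, done):
        self._data.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng):
        # Draw without replacement; return fewer items if the buffer is small.
        return rng.sample(list(self._data), min(batch_size, len(self._data)))

    def __len__(self):
        return len(self._data)


def td_target(reward, next_q, done, gamma=0.9):
    """Bellman target y = r + gamma * Q_target(s', a'), with no bootstrap at episode end."""
    return reward + (0.0 if done else gamma * next_q)


def soft_update(target_params, source_params, tau=0.005):
    """Move target parameters a fraction tau toward the learned ones (assumed rule)."""
    return [(1.0 - tau) * t + tau * s for t, s in zip(target_params, source_params)]


rng = random.Random(0)
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(step, (0.0, 1), -1.0, step + 1, False)   # hypothetical transitions
batch = buffer.sample(64, rng)
y = td_target(reward=-10.0, next_q=-50.0, done=False)
y_terminal = td_target(reward=-10.0, next_q=-50.0, done=True)
new_target = soft_update([0.0], [1.0], tau=0.5)
```

In a full training loop, each sampled batch would be used to regress the critic toward `td_target` and to update the actor, after which `soft_update` refreshes the target networks.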

Table 3

UPS parameters

	$\alpha$	$\beta$
1	0.07	0.89
2	0.12	0.86
3	0.08	0.85

Table 4

Reinforcement learning algorithm parameters

Parameter	Symbol	Value
Learning rate	–	$10^{-3}$
Batch size	$N_{B}$	64
Discount factor	$\gamma$	0.9
Episode size	$T$	96
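
To give a feel for what a discount factor of 0.9 implies over an episode of 96 steps, the sketch below computes a discounted return by the usual backward recursion. The function name and the constant unit reward are illustrative assumptions; only $\gamma=0.9$ and $T=96$ are taken from the table above.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**k * rewards[k], accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


short = discounted_return([1.0, 1.0, 1.0])      # 1 + 0.9 + 0.81
episode = discounted_return([1.0] * 96)         # one unit of reward per step
```

With a constant unit reward the episode return stays just below $1/(1-\gamma)=10$, which indicates that steps far in the future contribute very little to the value seen at the current decision time.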

Figure 3.

Workload of the test DC.

To illustrate the benefits of the proposed approach, Table 5 compares the base case, which uses the proposed approach, with two other scenarios. Case A represents the results without optimizing UPS power output, while Case B does not consider workload allocation.

Table 5

Comparison with other cases

Case	Operation cost (k$)
Base case	20.36
Case A	32.65
Case B	31.20

In addition to efficiency, the operational cost is a crucial factor for assessing the effectiveness of the optimization. Table 5 demonstrates that the approach proposed in this paper successfully reduces operational costs. The base case costs 20.36 k$, which is a saving of 37.64% compared to Case A (32.65 k$) and of 34.74% compared to Case B (31.20 k$), highlighting the effectiveness of the results.
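
The reported percentages can be reproduced from the operation costs in Table 5 by taking the relative reduction with respect to each comparison case. The helper name `saving_percent` is introduced here only for illustration.

```python
def saving_percent(reference_cost, optimized_cost):
    """Relative cost reduction, in percent, with respect to the reference case."""
    return 100.0 * (reference_cost - optimized_cost) / reference_cost


base_case, case_a, case_b = 20.36, 32.65, 31.20   # operation costs in k$ (Table 5)
saving_vs_a = round(saving_percent(case_a, base_case), 2)   # 37.64
saving_vs_b = round(saving_percent(case_b, base_case), 2)   # 34.74
```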

Table 6 provides a comparison of the average efficiency of uninterruptible power systems (UPS) in different nodes. The purpose of this comparison is to evaluate the efficiency performance of UPS under different scenarios. The base case in the table represents the optimization result, indicating the average efficiency achieved when the workload is distributed optimally among the nodes. This serves as a benchmark for assessing the efficiency of the UPS. In Case C, a specific scenario is examined where the workload is sequentially distributed to the nodes in a predetermined order, starting with UPS1, followed by UPS2, and finally UPS3. This scenario allows for a closer examination of the efficiency performance when workload distribution is not optimized but follows a predefined sequence. By comparing the average efficiency values across the different nodes and scenarios, valuable insights can be gained regarding the effectiveness of workload distribution strategies and their impact on UPS efficiency. This information can aid in decision-making processes related to system optimization, energy management, and cost-efficiency considerations in the deployment and operation of UPS in various applications.

Table 6

Comparison of average efficiency of UPS (%)

Case	UPS 1	UPS 2	UPS 3
Base case	96.85	89.32	87.25
Case C	91.02	84.28	59.62

Table 6 presents the average efficiency of UPS1, UPS2, and UPS3 in section 1. In the base case, the average efficiency of every unit exceeds 85%, indicating efficient energy supply operation. In Case C, where the workload is distributed in a predetermined sequence, UPS1 still operates at a comparatively high average efficiency (91.02%), but UPS2 and UPS3 fall to 84.28% and 59.62%, respectively. Such inefficient operation of UPS2 and UPS3 could lead to increased heat generation and higher costs, which underlines the need to optimize the entire UPS system rather than individual units. The comparison therefore supports the optimization approach: optimizing the workload distribution and the operational strategy raises the efficiency of the UPS system as a whole, and these efficiency metrics help system operators balance efficiency, reliability, and cost.
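Why sequential loading penalizes the last unit can be illustrated with a simple loss model. The sketch below is not the paper's UPS model: the loss coefficients, the per-unit load of 2.1, and the equal split used as a stand-in for an optimized allocation are all hypothetical assumptions chosen only to show the mechanism, and the numbers it produces are not those of Table 6.

```python
def ups_efficiency(load_fraction, no_load_loss=0.04, proportional_loss=0.02,
                   quadratic_loss=0.08):
    """Efficiency of one UPS at a load fraction in (0, 1].

    Hypothetical loss model (per unit of rated power):
    losses = no_load_loss + proportional_loss * x + quadratic_loss * x**2
    """
    if load_fraction <= 0.0:
        return 0.0
    losses = (no_load_loss
              + proportional_loss * load_fraction
              + quadratic_loss * load_fraction ** 2)
    return load_fraction / (load_fraction + losses)


def sequential_fill(total_load, n_units):
    """Load the first unit up to its rating, then the next, and so on."""
    loads = []
    remaining = total_load
    for _ in range(n_units):
        share = min(1.0, remaining)
        loads.append(share)
        remaining -= share
    return loads


def equal_split(total_load, n_units):
    """Simple stand-in for an optimized allocation: share the load evenly."""
    return [total_load / n_units] * n_units


def system_efficiency(loads):
    """Total delivered power divided by total input power over all loaded units."""
    total_output = sum(loads)
    total_input = sum(x / ups_efficiency(x) for x in loads if x > 0)
    return total_output / total_input


total_load = 2.1  # hypothetical demand, in multiples of one UPS rating
seq = sequential_fill(total_load, 3)
eq = equal_split(total_load, 3)

for name, loads in (("sequential", seq), ("equal split", eq)):
    per_unit = [round(100 * ups_efficiency(x), 1) for x in loads]
    print(name, per_unit, round(100 * system_efficiency(loads), 1))
```

Under these assumptions the last unit in the sequential ordering runs at a light load, where the fixed no-load loss dominates, so both its own efficiency and the system efficiency fall below those of the balanced allocation.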

Figure 4.

Convergence result of reinforcement learning.

Fig. 4 shows the convergence of the reinforcement learning algorithm. The learned operating cost decreases overall, indicating the effectiveness of the proposed strategy and learning algorithm. The learning curve fluctuates between 5000 and 7000 iterations, which corresponds to the exploration phase typical of reinforcement learning, and it converges at around 8000 iterations.
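One common way to locate a convergence point on such a learning curve is to compare the mean cost over two consecutive windows of iterations. The sketch below is an assumption for illustration only: the paper does not state which criterion was used, and the function name, window length, tolerance, and synthetic cost curve are all hypothetical.

```python
from itertools import accumulate


def first_converged_iteration(costs, window=500, tolerance=0.01):
    """Return the first iteration count at which the mean cost of the latest
    window differs from the mean of the preceding window by at most
    `tolerance` (relative), or None if this never happens."""
    prefix = [0.0] + list(accumulate(costs))  # prefix sums for fast window means
    for end in range(2 * window, len(costs) + 1):
        previous = (prefix[end - window] - prefix[end - 2 * window]) / window
        current = (prefix[end] - prefix[end - window]) / window
        if abs(current - previous) <= tolerance * abs(previous):
            return end
    return None


# Synthetic, smoothly decaying cost curve (not the data behind Fig. 4).
demo_costs = [20.0 + 15.0 * 0.999 ** k for k in range(10000)]
converged_at = first_converged_iteration(demo_costs)
print("Converged after", converged_at, "iterations")
```

A longer window smooths out exploration noise such as the fluctuation described above, at the price of reporting convergence later.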

6. Conclusion

This paper presents a novel model designed to allocate delay-tolerant workloads among multiple nodes in a data center (DC) to minimize operational costs. The model considers both the power supply and demand response aspects, offering a comprehensive approach to the problem. To address this complex optimization challenge, the paper formulates the problem as a Markov Decision Process (MDP). Several decision variables are defined and justified in the model, including server states, workload allocation, and power output. These variables play a crucial role in achieving efficient workload distribution and optimal power supply management within the DC. A key contribution of this paper is the introduction of operational efficiency as a metric to evaluate the optimization results. By focusing on operational efficiency, the model aims to improve resource utilization and minimize costs.

To assess the performance of the proposed model, a sample DC model is utilized. The model is evaluated based on operational costs, and a comparison of these costs demonstrates the effectiveness of the proposed scheme in achieving cost savings. The results indicate that the model successfully optimizes workload allocation and power output, leading to improved operational efficiency and reduced expenses.

Future research directions are also outlined in the paper. One area of interest is the inclusion of both delay-sensitive and delay-tolerant workloads in the model, expanding its applicability to a wider range of scenarios. Furthermore, the integration of electricity prices and renewable energy volatility into the workload distribution process will enhance the model’s ability to adapt to dynamic energy market conditions and further optimize cost-effectiveness. By addressing these research directions and continually refining the proposed model, this work paves the way for more efficient and sustainable management of workloads and power supply in data centers.

Acknowledgments

This research was supported by the project "Research on design scheme of energy supply system in data center" (Project No. 030100QQ00210003).

References

1. Ding, Østergaard, Sørensen, Meibom. Towards a European renewable-based energy system enabled by smart grid: status and prospects. Dianli Xitong Zidonghua/Automation of Electric Power Systems. 2011; 35(22): 12-7.

2. Sadiq, Ali, Terriche, Mutarraf, Hassan, Hamid, et al. Future Greener Seaports: A Review of New Infrastructure, Challenges, and Energy Efficiency Measures. IEEE Access. 2021; 9: 75568-87.

3. Ding, Cao, Xie, Wang. Integrated Stochastic Energy Management for Data Center Microgrid Considering Waste Heat Recovery. IEEE Transactions on Industry Applications. 2019; 55(3): 2198-207.

4. Ding, Huang, Yang, Blaabjerg. A hierarchical modeling for reactive power optimization with joint transmission and distribution networks by curve fitting. IEEE Systems Journal. 2018; 12(3): 2739-48.

5. Dou, Wei, Song. Carbon-aware electricity cost minimization for sustainable data centers. IEEE Transactions on Sustainable Computing. 2017; 2(2): 211-23.

6. Badiei, Zhan, Azimi, Reda. DiBA: Distributed Power Budget Allocation for Large-Scale Computing Clusters. In: 16th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2016. SEAS, Harvard University, United States: Institute of Electrical and Electronics Engineers Inc.; 2016. pp. 70-9.

7. Bao. Modeling demand response capability by internet data centers processing batch computing jobs. IEEE Transactions on Smart Grid. 2015; 6(2): 737-47.

8. Liang, Goh, Kurniawan, Zhang, Dai, Liu, et al. Utilizing landfill gas (LFG) to electrify digital data centers in China for accelerating energy transition in Industry 4.0 era. Journal of Cleaner Production. 2022; 369.

9. Ghamkhari, Mohsenian-Rad. Energy and performance management of green data centers: A profit maximization approach. IEEE Transactions on Smart Grid. 2013; 4(2): 1017-25.

10. Ghamkhari, Mohsenian-Rad. Optimal integration of renewable energy resources in data centers with behind-the-meter renewable generator. IEEE International Conference on Communications. 2012; 3340-4.

11. Jiang, Cao. Risk-constrained operation for internet data centers under smart grid environment. In: 2013 International Conference on Wireless Communications and Signal Processing. IEEE; 2013. pp. 1-6.

12. Dou, Wei, Song. Minimizing Electricity Bills for Geographically Distributed Data Centers with Renewable and Cooling Aware Load Balancing. Proceedings – 2015 International Conference on Identification, Information, and Knowledge in the Internet of Things, IIKI 2015. 2016; 210-4.

13. Chen, Wang, Giannakis. Cooling-Aware Energy and Workload Management in Data Centers via Stochastic Optimization. IEEE Journal on Selected Topics in Signal Processing. 2016; 10(2): 402-15.

14. Guo, Gong, Fang, Khargonekar, Geng. Energy and network aware workload management for sustainable data centers with thermal storage. IEEE Transactions on Parallel and Distributed Systems. 2014; 25(8): 2030-42.

15. Rao, Liu. Temporal load balancing with service delay guarantees for data center energy cost optimization. IEEE Transactions on Parallel and Distributed Systems. 2014; 25(3): 775-84.

16. Yang, Wang, Liu, Nie, et al. Predictive energy-saving optimization based on nonlinear model predictive control for cooperative connected vehicles platoon with V2V communication. Energy. 2019; 189: 116120.

17. Guo, Xin, Sun, Zhang. Rapid-charging navigation of electric vehicles based on real-time power systems and traffic data. IEEE Transactions on Smart Grid. 2014; 5(4): 1969-79.

18. Lin, McPhee, Azad. Comparison of Deep Reinforcement Learning and Model Predictive Control for Adaptive Cruise Control. IEEE Transactions on Intelligent Vehicles. 2021; 6(2): 221-31.

19. Jiang, Cao, Zhang. Risk-constrained operation for internet data centers in deregulated electricity markets. IEEE Transactions on Parallel and Distributed Systems. 2014; 25(5): 1306-16.

20. Lyu, Yan, Ding, Lyu, Sun. Optimal switching sequence model predictive control for three-level NPC grid-connected inverters. IET Power Electronics. 2021; 14(3): 626-39.

21. Huang, Rong, Zhang, Liao. Robust Predictive Torque Control of N3-Phase PMSM for High-Power Traction Application. IEEE Transactions on Power Electronics. 2020; 35(10): 10799-809.

22. Lee, Kim, Cha. Energy efficient speed planning of electric vehicles for car-following scenario using model-based reinforcement learning. Applied Energy. 2022; 313: 118460.

23. Zhu. Hierarchical load tracking control of a grid-connected solid oxide fuel cell for maximum electrical efficiency operation. Energies. 2015; 8(3): 1896-916.

24. Huang, Guo, Lin. Bi-level decentralised active power control for large-scale wind farm cluster. IET Renewable Power Generation. 2018; 12(13): 1486-92.

25. Liao. A Robust Load Frequency Control Scheme for Power Systems Based on Second-Order Sliding Mode and Extended Disturbance Observer. IEEE Transactions on Industrial Informatics. 2018; 14(7): 3076-86.

26. Liu. Predictive control of wind turbine for load reduction during ramping events. International Journal of Electrical Power and Energy Systems. 2017; 93: 135-45.

27. Guo, Luo, Yang. Energy-oriented car-following control for a front- and rear-independent-drive electric vehicle platoon. Energy. 2022; 257: 124732.

28. Huang, Liu, Zhang. Multi-time-scale Slack Optimal Control in Distribution Network Based on Voltage Optimization for Point of Common Coupling of PV. Dianli Xitong Zidonghua/Automation of Electric Power Systems. 2019; 43(3): 92-100.

29. Sun, Zhang, Sun, Zhou. Stochastic co-optimization of speed planning and powertrain control with dynamic probabilistic constraints for safe and ecological driving. Applied Energy. 2022; 325: 119874.

30. Huang, Zhao, Liao. Distributed Optimal Voltage Control for VSC-HVDC Connected Large-Scale Wind Farm Cluster Based on Analytical Target Cascading Method. IEEE Transactions on Sustainable Energy. 2020; 11(4): 2152-61.
