Abstract
In cloud computing, task scheduling is a critical process that involves efficiently allocating computing resources to fulfill diverse task requirements. To address issues such as unstable response times, extensive computations, and challenges in parameter adjustment faced by traditional task scheduling methods, an enhanced deep Q-learning cloud-task-scheduling algorithm was proposed. This algorithm utilizes deep reinforcement learning and introduces an improved strategy. The optimization of the objective function was achieved by defining the state space, action space, and reward function. The agent’s exploration capability was enhanced through the utilization of a UCB exploration strategy and Boltzmann action exploration. Simulation experiments were conducted using Pycloudsim. The average instruction response time ratio and standard deviation of CPU utilization were compared to measure the advantages and disadvantages of the algorithm. The results indicate that the proposed algorithm surpasses the random, earliest, and RR algorithms in terms of the instruction-to-response time ratio and CPU utilization, demonstrating enhanced efficiency and performance in cloud-task scheduling.
Keywords
Introduction
A traditional data center is a data center that has traditional internet technologies, infrastructure, and management models. Usually, a traditional data center is built on the enterprise’s own site by the enterprise to buy and maintain the server, storage equipment, network equipment, and other infrastructure through the proprietary software and management team to support and manage a variety of business applications, but traditional data centers have many defects. First of all, they occupy a large area and require a lot of physical space and equipment. The energy consumption and maintenance costs are high, and a lot of power and management personnel are needed to support and maintain the normal operation of the equipment. Secondly, a professional IT team is needed to be responsible for the management and maintenance of servers, storage, network, security, and other aspects, and there is the risk of a single point of failure; once one of the pieces of equipment fails, it may affect the normal operation of the entire data center. The last point is that it requires a long deployment and debugging time, which cannot quickly respond to business changes and requirements. In addition, traditional data centers have disadvantages such as high energy consumption, high maintenance costs, and single point of failure. In these issues, cloud computing has great advantages, and cloud computing is flexible and scalable.
With the rise of cloud computing, virtualization technology [1], and software-defined data centers [2], cloud computing service centers have gained widespread adoption. Virtualization technology allows for the partitioning of infrastructure, such as servers, into multiple virtual machines, leading to improved hardware resource utilization. Cloud computing models enable the provisioning of infrastructure [3], platforms [4], and software services, enhancing resource flexibility and scalability. Essentially, cloud computing is a computing model that leverages the internet to provide immediate access to a range of computing resources, including processing power, storage capacity, network bandwidth, and more. Users can easily purchase and utilize these resources via the internet, eliminating the need for physical equipment procurement and maintenance. The enhanced computing power of cloud computing empowers enterprises and individuals to swiftly meet their business requirements at a lower investment cost.
In the cloud computing environment, in addition to cloud environment network security [5], task scheduling [6, 7] is also a problem worthy of attention. Due to the virtual nature of cloud computing resources, task scheduling needs to consider different factors, such as resource requirements, quality of service (QoS) requirements, and resource utilization, in order to effectively use resources and meet the QoS requirements in a cloud computing environment. Specifically, task scheduling needs to reasonably allocate computing, storage, and network resources to maximize resource utilization, thereby reducing costs. Moreover, in cloud computing, users usually put forward some QoS requirements, such as response time, reliability, scalability, etc. Task scheduling should ensure that these QoS requirements are met. In the cloud computing environment, resource usage is very dynamic. Task scheduling needs to monitor resource usage in real-time and create dynamic scheduling according to the actual situation. Additionally, in cloud computing, there may be load imbalances, resulting in some resources being idle and other resources being overloading. Therefore, task scheduling also needs to consider load balancing so that the resources under different circumstances maintain a balanced load. In summary, the importance of task scheduling in cloud computing is very significant. Only through effective task scheduling can we achieve the maximization of resource utilization, quality-of-service guarantees, and load balancing in a cloud computing environment.
Deep reinforcement learning [8, 9] is an advanced algorithm that employs deep neural networks to learn the mapping between environment states and actions, enabling intelligent decision-making. Deep Q-learning [10, 11], a variant of deep reinforcement learning, is based on the Q-learning [12] algorithm. It leverages deep neural networks to learn the Q-value function, facilitating intelligent decision-making. Within task scheduling, deep reinforcement learning and deep Q-learning algorithms find applications in resource scheduling, task allocation, and load balancing. The following steps outline their implementation:
Model the environment: Represent the task-scheduling problem as a reinforcement learning problem, defining states, actions, rewards, and other relevant factors. Data collection: Gather the training data set by collecting data from actual task scheduling in the environment. Design the neural network: Develop a deep neural network to capture the relationship between states and actions. Train the model: Utilize the back propagation algorithm to train the neural network, continuously updating the Q-value function to achieve optimal decision-making. Evaluate the model’s performance and effectiveness.
The advantages of deep reinforcement learning and deep Q-learning algorithms lie in their ability to enable intelligent resource scheduling, task allocation, and load balancing, consequently enhancing system performance and efficiency.
Common cloud task scheduling algorithms
Traditional cloud task-scheduling algorithms, such as the first come, first serve (FCFS) algorithm [13], schedule according to the order of task arrival. The advantage is that it is simple to implement and suitable for simple task-scheduling problems, but it cannot adapt to complex task-scheduling scenarios, which easily leads to low resource utilization and long task response times. The shortest job first (SJF) algorithm [14] schedules tasks according to the length of their execution time. The advantage of SJF is that it can improve the task response speed and the throughput of the system, but it may experience the problem of starvation for tasks that take a long time. The RR (round robin) algorithm [15] divides time into several time slices, and each task executes a certain task time within a time slice and then switches to the next task. The advantage is that it can improve the task response time, but it easily leads to the problem of large task switching overheads and unstable task response times. Heuristic-intelligent [16] scheduling algorithms use heuristic rules and prior knowledge to guide the search process so as to improve search efficiency and search quality. Common heuristic-intelligent algorithms, such as the particle swarm optimization (PSO) algorithm [17, 18], are based on the idea of adaptive search to minimize the energy consumption and task response time of the system by constantly adjusting the task allocation strategy. The advantage is that the task allocation strategy can be adjusted adaptively, but the parameters of the algorithm are difficult to determine, and this requires a lot of computing resources. The genetic algorithm (GA) [19, 20] searches for the optimal task allocation strategy through the inheritance and mutation of genes [21]. The advantage of GA is that it can search the task allocation strategy globally, but the search time is long and requires a lot of computing resources. The basic idea of the ant colony optimization (ACO) algorithm [22] is to abstract the problem into a graph theory problem, place some virtual ants randomly on the graph, and optimize the solution of the problem through the process of ants moving on the graph. The advantage is that the ant colony algorithm has strong global search abilities and adaptive abilities and can avoid falling into local optimal solutions, with better optimization results for complex nonlinear problems and advanced optimization problems. However, the convergence speed is slow, and the ant colony algorithm needs more time and iterations in the early search stage to obtain better solutions. It is difficult to adjust the parameters. There are many parameters to be adjusted in the ant colony algorithm, and it is necessary to determine the optimal parameters through a large number of experiments.
Deep reinforcement learning and deep Q-learning
The core idea of the deep Q-learning algorithm is to estimate the action-value function
where
Schematic diagram of the DQN model structure.
Algorithm process and framework
After initializing the simulation environment and parameters, the current state is derived from the task list and virtual machine list. Subsequently, UCB exploration [24] is employed, along with the addition of Boltzmann exploration [25], to generate an action based on the Q-network. The unknown action space can be explored using the UCB exploration strategy, which explores the unknown action by introducing a confidence upper bound in order to gather more information about the action. In cloud task scheduling, this means that the algorithm can actively try out some perhaps less certain but potentially beneficial task scheduling strategies to better understand the environment and find the best one. Boltzmann’s exploration strategy uses a probability distribution to determine the probability that each action will be selected, rather than simply choosing the action with the highest value. This gives the algorithm a certain probability of choosing other actions, even though they may not be the optimal action at the moment. In cloud task scheduling, this “soft exploration” approach can help algorithms be more comprehensive and balanced when exploring unknown Spaces of action. The scheduler then assigns the task to the virtual machine according to the generated action, initiating the simulation process. During the simulation, the reward is calculated, and the next state is observed and stored in the memory unit. To ensure sufficient training data for the neural network, the algorithm allows the agent to explore the environment for a designated period before training. This decision is based on comparing the current step to the predetermined exploration step. The value network is trained by updating the target value network every E step using randomly selected data from the experience pool. Finally, the program determines whether the maximum number of steps has been reached. If so, the program terminates; Otherwise, the network continues. The specific process of cloud task scheduling based on a DQN is shown in Fig. 2.
Flow chart of cloud task scheduling based on DQN.
State space, action space, and reward function
In the cloud task scheduling algorithm, the agent’s perception of the current environment encompasses both the task state and the virtual machine cluster state. Hence, when it becomes necessary to schedule a task to the cluster, the current state can be succinctly described as follows:
where
Action refers to the behavior of assigning tasks to virtual machines, so the action space can be expressed as a numbered set of virtual machines, and the action space is expressed as follows:
In general, a reasonable reward function can greatly improve the performance of deep reinforcement learning algorithms. In this paper, only the average instruction response time ratio is considered here, so the reward function design of the task is expressed as:
where
Designing an appropriate optimization objective holds significant importance in task scheduling as it directly impacts the algorithm’s convergence and accuracy. In this paper, the optimization objective focuses on the average instruction response time ratio of tasks, considering user QoS requirements.
Firstly, the average instruction response time ratio of all tasks is defined, which is expressed as follows:
where
where mi (million instructions) is the instruction length of the task, and
where
The transfer time can be expressed as the amount of data transferred by the task divided by the network transmission speed from the scheduler to the virtual machine, which is expressed as follows:
When the task arrives, any task that has not been executed before will be executed according to the principle of first come, first serve, which is executed by the server in the virtual machine; then, the waiting time will be zero. If the task arrives with an unfinished task in front of it, the task will wait. Suppose there are X tasks that have not completed and the virtual machine can transfer tasks while executing them; then, the waiting time of a task in the virtual machine can be expressed as:
The execution time of a task is mainly related to the instruction speed of the virtual machine CPU mips (million instructions per second), the task instruction length
By integrating Eqs (6) and (11), the final optimization objective of this paper is obtained, which is expressed as follows:
where
Experimental environment
Since it is expensive to schedule tasks in the actual environment, this paper used a simulation experiment environment. The software environment was Python 3.7, the deep learning framework was PyTorch 1.13, and the integrated development environment was JetBrains PyCharm 2022. Pycloudsim [26], a cloud scheduling simulation experiment platform, was also used.
Our dataset used the dataset generated by simulations shared in the literature [26]. The characteristics of the dataset were that the number of submissions of each task at each time was assumed to be a Poisson distribution, the average number of submissions was set, and the instruction length, CPU utilization, and transmission data size of the task were assumed to meet a normal distribution.
Experimental parameter setting
This paper is based on reference [26], and the experimental parameter setting part refers to reference [26], with some parameters also being improved. The task generation parameter settings are shown in Table 1.
Task parameter settings of simulated data set
Task parameter settings of simulated data set
The virtual machine cluster was set to contain 10 virtual machines, among which two virtual machines with mips values of 200, 300, 400, 500, and 600 were used to represent virtual machines with different computing capabilities, and the virtual machine parameters were set as per Table 2.
Settings of virtual machine parameters
When setting the algorithm parameters, we set the system to train the network after 600 steps of exploration. The reward hyper parameter was set to 0.01. The cache memory unit needs to be set larger than this, so the cache unit is set to 15000. The learning rate of the value network was set to 0.004, the target value network was updated every 70 steps, the future reward discount was set to 0.7 when calculating the Q-value, and the batch size of training was 32. The specific parameter settings are shown in Table 3.
DQN algorithm parameter settings
When the poisson_lam value is 2, the DQN loss converges to about 10k after training, and the loss change curve during training is shown in Fig. 3. Where, the vertical coordinate represents the loss value, and the horizontal coordinate represents the number of training steps.
Change diagram of algorithm training loss.
The Q-value first increases and then gradually becomes stable. The Q-value is a response to the reward of the algorithm, and the purpose of training the algorithm is to maximize the Q-value. A curve of Q-value change is shown in Fig. 4. Where, the ordinate represents the Q value and the abscissa represents the number of training steps.
Q value change diagram of algorithm training.
When the poisson_lam value is set to 2, the average command response time ratio of four task-scheduling algorithms, namely Random, Earliest, RR (round-robin), and DQN (improved DQN), was compared. The simulation results demonstrate that the algorithm presented in this study achieves higher task instruction execution within a shorter timeframe. As a result, it effectively reduces task response time, meeting the user service agreement. Figure 5 depicts the comparative experimental outcomes for the average command response time of the DQN algorithm.
Comparison of DQN average instruction response time ratio.
The DQN algorithm exhibits superiority over other algorithms in terms of the standard deviation of CPU utilization throughout the entire duration. This metric serves as an indicator of cluster load balance, suggesting that the DQN algorithm effectively promotes load balancing within the cluster. The experimental results comparing the standard deviation of CPU utilization are depicted in Fig. 6.
Comparison of CPU utilization standard deviation.
The DQN algorithm is better than the other algorithms in the average response time ratio regarding instructions, but when compared to the other algorithms at the beginning of scheduling, the convergence speed and scheduling performance of the DQN algorithm are not as high. As the number of submitted tasks increases, the performance of the DQN algorithm increases to be better than the performance of the executing greedy strategy. This is because the algorithm implements the UCB strategy, and UCB is mainly based on exploration, so its average revenue may not be good at the beginning. The average response time instruction comparisons for each algorithm with a poisson_lam value of 2 were designed, and the experimental results are shown in Fig. 7.
While there have been numerous studies on cloud task scheduling, research specifically focusing on cloud task-scheduling algorithms based on deep reinforcement learning remains limited. This paper contributes to this field by proposing a deep Q-learning task-scheduling algorithm that incorporates an enhanced exploration strategy. To evaluate its performance, this algorithm was compared with the random, earliest, and RR algorithms. The results are shown in Table 4.
Performance comparison of different algorithms
Performance comparison of different algorithms
Comparison of average response time of instructions.
The table reveals that the DQN algorithm outperforms other algorithms in terms of CPU utilization (standard deviation), exhibiting a significantly lower value. The standard deviation of CPU utilization serves as an indicator to assess cluster load balancing, highlighting the DQN algorithm’s capability to effectively enhance load balancing within the cluster. Additionally, the DQN algorithm demonstrates remarkable superiority over other algorithms in terms of both average and optimal instruction response time. This observation indicates the algorithm’s ability to execute a greater number of task instructions within a given time frame, effectively reducing task response time and satisfying the user service agreement.
In today’s rapidly developing cloud computing technology environment, cloud task scheduling occupies an important position. The cloud task scheduling algorithm proposed in this paper with an improved DQN is superior to other scheduling algorithms in terms of CPU utilization and average instruction-to-response time ratios, and experiments were designed to verify this on a simulation platform. However, there are also some shortcomings. Firstly, the performance of the algorithm is not very good when the number of scheduled tasks is small. In the future, it is hoped that the algorithm can be improved to achieve optimal scheduling performance at each stage. Secondly, the experiment was carried out on a simulation platform, in which the influencing factors in the actual task scheduling are often ignored, meaning there would have been some errors in the experimental results. And for large-scale cloud computing environments, there may be some limitations or challenges that may apply to deep reinforcement learning-based task scheduling algorithms. For example, 1. A large-scale cloud computing environment may contain thousands or even hundreds of thousands of tasks and virtual machine instances, and the task scheduling algorithm needs to be able to efficiently process large-scale tasks and resources. 2. Task scheduling algorithms in large-scale cloud computing environments need to have good scalability and can maintain efficient scheduling performance in the face of huge tasks and resource scales. 3. In a large-scale cloud computing environment, the arrival of tasks and changes in resources may be dynamic and real-time, and the scheduling algorithm needs to have real-time and dynamic adaptability to make decisions and adjustments in a short period of time.
Footnotes
Acknowledgments
The authors acknowledge the 2022 Jilin Provincial Department of Education Science and Technology Research Project(Grant: JJKH20220013KJ), the 2019 key project of Jilin Provincial Education and Science “13th Five Year Plan” (Grant: ZD19077).
