A survey on cloud computing scheduling algorithms

Abstract

Cloud computing has emerged as one of the hottest topics in technology and has quickly become a widely used information and communication technology model. Performance is a critical component in the cloud environment concerning constraints like economic, time, and hardware issues. Various characteristics and conditions for providing solutions and designing strategies must be dealt with in different situations to perform better. For example, task scheduling and resource allocation are significant challenges in cloud management. Adopting proper techniques in such conditions leads to performance improvement. This paper surveys existing scheduling algorithms concerning the macro design idea. We classify these algorithms into four main categories: deterministic algorithms, metaheuristic algorithms, learning algorithms, and algorithms based on game theory. Each category is discussed by citing appropriate studies, and the MapReduce review is addressed as an example.

Keywords

Cloud computing scheduling deterministic algorithms optimization algorithms metaheuristic algorithms learning algorithms

1. Introduction

The advent of the Internet and its popularity have changed communication’s meaning over time. Besides, the need and possibilities of various computations lead to the increased importance of information computing. The concept of cloud computing was introduced in 1960. However, the first use of cloud computing in its modern context occurred in 2006. Many companies such as Amazon, Google, Yahoo, HP, and Intel have been involved in the development of cloud computing [1].

There are many definitions for cloud computing. The National Institute of Standard and Technologies (NIST) [2], defines it as a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources like networks, servers, storage, applications, and services. These resources must be rapidly provisioned and released with minimal management effort or service provider interaction.

In addition to cloud computing, two other well-known computing models emerged. In terms of becoming popular over time, grid computing was first followed by cloud computing and edge computing. Although this paper focuses on cloud computing, there are related significant works studying job scheduling and resource allocation regarding different structures in grid computing and edge computing.

In the cloud environment, there are service consumers and service providers. Consumers submit their work and expect them to be done with the least cost and time, and providers provide the necessary resources to do consumer submitted works. Providers want to make the most profit by the best use of resources and to meet customers’ demands. Thus, the scheduling demand is raising [3].

One of the significant and exciting challenges in the cloud environment is task scheduling. In general, task scheduling is arranging the user requests (tasks) in a particular order so that available resources will be used suitably. Scheduling is the technique of mapping a set of jobs to virtual machines or allocating virtual machines to run on the available resources to fulfill users’ demands [4]. According to the importance of proper scheduling and management of resources, much attention is needed in designing and choosing a suitable scheduling algorithm. Considering the studies that have been done in the field of scheduling in cloud computing, various methods have been proposed. Therefore, it is necessary to be acquainted with different challenges and issues and select the most appropriate option between existing methods based on available resources and requirements of the problem. This paper also provides the necessary background for this selection by reviewing different algorithms regarding the main macro idea on designation.

Various algorithms have been proposed to optimize and schedule job executions in shared environments like the cloud. Among them, deterministic algorithms have received much attention due to their simplicity. However, the complexity of problems and the need for more flexibility have led to the utilization of metaheuristic algorithms. These algorithms can provide a relevant result by creating approximate strategies and exploring the space of scenarios at the right time. Another category includes the learning algorithms, as they can solve a diverse set of problems and challenges in cloud computing. These algorithms receive much attention due to their ability to solve complex problems and find accurate answers using datasets. However, there are crucial challenges in seeking potent hardware compared to other methods. In addition to mentioned categories, game theory provides a new perspective by dealing with conflicts and competitions between objectives in a scheduling problem. Therefore, it has been utilized as a helpful set of algorithms for addressing related problems. Overall, to review scheduling algorithms, we focus on four main categories deterministic algorithms, metaheuristic algorithms, learning algorithms, and algorithms based on game theory. In the following sections, each of these categories is noticed briefly. An ideal and famous example of the cloud utilization is MapReduce while various scheduling algorithms are practically utilized in the MapReduce framework to improve performance.

The rest of this paper is structured as follows. Section 2 outlines the related topics in the cloud environment such as simulators. We present categories of deterministic algorithms, metaheuristic algorithms, and learning algorithms in Section 3, Section 4 and Section 5, respectively. The category of scheduling algorithms based on game theory is explored in Section 6. We review the MapReduce framework regarding scheduling algorithms as an example in Section 7. Finally, Section 8 concludes our work and discusses some future directions.

2. Background

2.1 Problem description

Scheduling is the process of allocating resources to tasks and assigning start and finish times to tasks.

Let us assume several tasks need to be performed while each has resources in the desired range but in chronological order. They should be planned to answer tasks in the shortest possible time. One of the difficulties of such scheduling is the disruption of planned work caused by environmental variations. In addition, challenges in such issues include: Machine task capacity, Machine availability changes, Variations in task required resources, The priority of interrelated tasks, and Preference between independent tasks.

Overall, the issue of scheduling in cloud computing generally consists of two major stages. The first one is the choice of tasks. In this way, the option is about which task is to be selected and start performed. The second one is resource allocation. The scheduler has to decide how many resources on which hardware nodes to allocate to the assigned task.

Resources are one of the most crucial parts of each system, and because of limitations in this part, any system should have proper resource management. Based on work in [5], resource management in the cloud has two critical parts: resource provisioning and resource scheduling.

The process of scheduling is a way to analyze customer required and related Quality of Service (QoS) parameters and decide how to handle them based on available providers’ resources. Regarding works in [3], the first objective of resource scheduling is to identify the appropriate resources for scheduling, the appropriate tasks on time, and the increasing resource utilization effectiveness. The second objective of resource scheduling is to identify the adequate and suitable resource that supports the scheduling of multiple tasks to fulfill numerous QoS requirements such as CPU utilization, availability, reliability, security. In order to increase information, know that security is a set of policies, controls, procedures, and technologies that work together to protect cloud-based systems, data, and infrastructure. The security issue is present in almost all areas of information technology. Due to the connection to the Internet and the existence of user data, this case becomes more sensitive. There are remedies like isolating users’ data and authentication and encrypting stored data [6]. It is good to note that the exploited algorithms must be fast enough to avoid QoS violations. Whereas there are many cryptography methods with various speeds and power, which must be carefully selected in the appropriate method. A good review of cloud computing security is provided in [7].

There are various important goals in scheduling algorithms. As suggested in papers [8, 9, 10], the conceivable scheduling parameters are: Make Span, processing time, quality of service, load balancing, finish time, maximum profit, cost, performance, reliability, and security. Also, there can be many vital parameters in scheduling, such as number of processing nodes, number of storage nodes, bandwidth, service request time, user request, task type, task dependency, physical machines, computing power, computing energy, communication energy. The work in [9] includes the definition and detailed information about these terms.

Energy consumption is a good example of the achievements of improving scheduling algorithms. Energy consumption is becoming important in cloud data centers in recent years. Energy consumption affects the cost of cloud service providers. It also has been considered in green computing because of its impacts on the environment, such as carbon emission [11]. Another impact is reliability, so high energy consumption causes low system reliability [12]. There are lots of works on schedule in compliance with energy consumption [13, 14, 15]. Xing et al. [16] proposed a scheduling algorithm with the goal of energy savings and fairness improvement. Li et al. [17] mainly focus on energy-aware and failure-aware workload scheduling policies. Liang et al. [18] proposed a low-power task scheduling algorithm for heterogeneous cloud computing with the ability to reduce energy consumption by reducing the computational overhead. Another good example is [19].

Here, we illustrate scheduling problem formulation regarding Fig. 1. Let’s assume $\Psi$ is a schedule used by a service provider while the set $S$ is a list of different schedules as $S=\{s_{1},s_{2},\ldots,s_{|S|}\}$ . Moreover, $\textit{time}(s,t)$ is the amount of time it takes for the task $t$ to be completed.

In simple form, based on Eq. (1), the scheduling goal is to minimize the consumption time of all tasks.

$\displaystyle\Psi(S)=\underset{s\in S}{\mathrm{argmin}}\Sigma_{t\in T}{\textit% {time}(s,t)}$ (1)

For example, in a more complicated form, each user requests a specific time for each task based on QoS. Therefore, the best schedule is the one with the lowest violation of QoS as demonstrated in Eq. (2).

$\displaystyle\Psi(S)=\underset{s\in S}{\mathrm{argmin}}\Sigma_{u_{i}\in U}{% \textit{violation}(u_{i},s)}$ (2)

Furthermore, to cover QoS violation for user i based on schedule $s$ , Eq. (3) can be established.

$\displaystyle\textit{violation}(u_{i},s)=\textit{is\_violated}(\textit{Demand}% (u_{i},t_{i}),s)\wedge\textit{QoS}(\textit{Demand}(u_{i},t_{i}))\textit{ is true}$ (3)

Figure 1.

Scheduling schema.

Note that the above formulas can be similarly extended for the other issues that may be raised.

There can be different classifications for various scheduling algorithms. As an example of the first classification, work in [8] categorized scheduling algorithms in cloud computing into two main categories batch mode and online mode.

In batch mode, algorithms receive a set of jobs as a batch, schedule them, and after a fixed time, they receive the next batch. As an example, FCFS and RR are batch algorithms. In online mode, a new task can be received, and it will enter the list of ready tasks and should schedule in its entrance time.

As other works with different criteria to classify scheduling algorithms, we can refer to [20, 21] based on different parameters, [20] based on simulators and adopted tools, [22] based on scheduling objective such as makespan and cost, [4, 22] based on different strategies, [21] based on dynamic and static approaches.

This work considers the macro idea of designing a scheduling algorithm as the main criteria; this classification includes the categories deterministic, metaheuristics, learning, and game theory based algorithms. Let it be noted that the various aims and parameters of scheduling can be achieved in each category. For a better understanding of the content presented in the paper, a list of abbreviations is provided in Table 1.

Table 1

Abbreviations

Abbreviations	Complete form
NIST	National Institute of Standard and Technologies
FCFS	First Come First Serve
FIFO	First In First Out
RR	Round-Robin
MCT	Minimum Completion Time
MECT	Minimum Execution Completion Time
VM	Virtual Machine
SLA	Service Level Agreement
QoS	Quality of Service
GA	Genetic Algorithm
PSO	Particle Swarm Optimization
GSA	Gravitational Search Algorithm
ML	Machine Learning
RL	Reinforcement Learning
DL	Deep Learning
CNN	Convolutional Neural Network
SVM	Support Vector Machine
LSTM	Long Short-Term Memory
RNN	Recurrent Neural Network
DRL	Deep Reinforcement Learning
DQL	Deep Q-learning
DAG	Directed Acyclic Graphs
GT	Game Theory

2.2 Simulators

The simulator is software to model the environment and test the proposed methods and algorithms. What makes a simulator popular can be originated from the ease of usage, change capability to test various conditions and scenarios, learning to use the tool, and the higher cost of actual hardware and maintenance in real environments. One of the significant challenges in evaluating an algorithm is to select a suitable simulator based on the available capabilities and required features. Works in [23, 24, 25] are discussed the available cloud computing simulators.

CloudSim is the most famous simulator in cloud computing, while many developed simulators are based on CloudSim. Other most-used simulators are WorkflowSim, CloudAnalyst, and GreenCloud [25].

A brief explanation of some well-known simulators is provided in the following. CloudSim is an extensible simulation toolkit that enables modeling and simulation of the cloud environment. This simulator supports VM modeling and can be used to study the migration of VMs for reliability and automatic scaling of applications [26]. GreenCloud simulator is for energy-aware cloud computing data centers. The simulation results are obtained for different architectures [27]. CloudAnalyst is based on simulation frameworks such as SimJava and CloudSim. It has an easy-to-use GUI, a high degree of configurability and flexibility, allows modelers to save simulation experiments input parameters and results that mean repeatability [28]. However, it does not support the pricing model [25]. WorkflowSim is a workflow simulator that can be used for evaluating workflow optimization techniques with better accuracy and wider support [29].

3. Deterministic algorithms

A significant category of scheduling algorithms is deterministic algorithms. This category includes several items, some of which are selected as examples. In this category, the scheduling decisions are based on several definite criteria while the outcome under the same conditions leads to the same choice. However, this condition may be changed if the algorithm fails to select between two tasks and lead to random selection.

Let it be noted that most algorithms in this category are exploited as an elemental algorithm or as a base testable algorithm for comparing with a proposed novel algorithm. Generally, the simplicity of these methods is one of the crucial factors in their popularity. Mainly, classical scheduling methods have formed this category. Because of their simplicity and efficiency continue to be used, e.g., FIFO scheduling is used in Hadoop. Overall, a lot of research and works belong to this category [30]. In the following, the well-known deterministic scheduling algorithms are pointed out, whereas instances of these algorithms are given in Table 2. Note that the order of the algorithms is near to their simplicity.

Table 2
Deterministic algorithms

DA	MCT	[31][32]
	FCFS	[33]
	Deadline Aware	[34][35][14][36]
	RR	[37][38]
	MAx-Min, Min-Min	[39][40][33][41][42][43][44]
	Market Oriented	[34][45]
	Fair	[46]
	Others	[30][47]

3.1 Task features oblivious algorithms

In this class, algorithms are oblivious to information such as length of tasks, execution time, and QoS.

First Come First Serve (FCFS) or First In First Out (FIFO) is one of the most popular and, at the same time, one of the simplest known scheduling algorithms. This algorithm queues tasks in the order that they arrive in the ready queue. Algorithm 3.1 is FCFS pseudocode. The default scheduler in Hadoop is JobQueueTaskScheduler, which is a FIFO scheduler. However, a more advanced version is implemented by the priority queue that has to be enabled and is not the default scheduler. Therefore, the superiority of the user’s tasks can be considered. However, a famine challenge for users with lower priority can happen. This method can be used in homogeneous environments and has acceptable behavior, but in heterogeneous environments, performance decreases.

: FCFS initial list_of_tasks as empty list [1] Running free resource is available and length (list_of_tasks) $>$ 0 candidate $\leftarrow$ first_added_element_of (list_of_tasks) remove first_element_of list_of_tasksnew_task receives append new_task to list_of_tasks

A Round-Robin (RR) method is an arrangement of choosing all elements in a group equally in some rational order. In this method, all tasks are chosen in rotational order. Each task has a unit of time after its neighbor. Works such as [37, 38] have used the RR algorithm or the modified versions for task scheduling in the cloud environment.

3.2 Task features aware algorithms

In this class of algorithms, some task related features such as length of tasks, and Minimum Completion Time (MCT) are considered. In the minimum completion time algorithm, tasks are assigned to the virtual resources or virtual machines based on the best predictable completion time in random order. A VM, which is expected to complete a task sooner than the others, will perform it. An improved version of MCT, the Minimum Execution Completion Time (MECT) algorithm, was proposed in [31]. Moreover, inspired by the MCT method, a scheduling algorithm to reach the trade-off between energy savings and delivered performance was proposed in [32]. This scheduling algorithm guarantees to meet the task deadlines and the minimum power consumption.

Longest Approximate Time to End, available in [30], estimates the approximate execution time of each task while there is no exact prediction of execution times. For example, this method may find the average available resources the length of tasks and calculate how long it would take. In this method, the task’s dependence can be considered. In contrast, the return time of a task will be calculated as the sum of the maximum return time of its precedence tasks and its own approximate execution time. However, other policies can be assumed, and calculation is done based on them. The better approximation leads to more hope to reach the promised results in the longest approximate time to end method.

Min-Min assigns the task with the minimum size or completion time to the resource with minimum overall execution time. This method is simple and produces a good output to reduce the average time to complete tasks. Algorithm 3.2 is its pseudocode. Meanwhile, the major challenge of this algorithm is the risk of starvation for some tasks and, at the same time, the reduction of parallelization between tasks. Regarding tasks parallelism, assume three tasks with lengths 2, 2, and 4 and two identical machines. Although the first and second choices will be sizes 2 and 2 running in parallel, the task with size four will be on one node, and the other node will be idle. However, if both tasks with size two are assigned to one node and the task with size four is assigned to other nodes, the parallelism will be higher, and the total execution time will be shorter.

: Min-Min initial list_of_tasks as empty list [1] Running free resource is available and length (list_of_tasks) $>$ 0 selected_task $\leftarrow$ first_element_of (ascending_sort (list_of_tasks based on size (or completion time))) selected_resource $\leftarrow$ first_element_of (ascending_sort (resources based on overall execution time for a task)) do assignment for selected_task and selected_resourcenew_task receives append new_task to list_of_tasks

An improvement of this method was presented by Patel et al. in [39]. This paper emphasized the need to enhance the makespan of the Min-Min algorithm and proposed an improved version of the algorithm. Max-Min assigns the task with the maximum size or maximum completion time to the resource with minimum overall execution time. Algorithm 3.2 is its pseudocode. This method is as simple as the Min-Min method, and the challenge of reducing the degree of parallelism, which was mentioned earlier, has been addressed. For example, in the previous example, a task of size four is selected and placed on one node, and the other two tasks will be on the other node. This method has its disadvantages. If large tasks are abundant, small tasks remain, and the average completion time has an undesirable increase that would not occur if one of nodes was processing them. On the other hand, starvation for larger tasks has been solved, but this problem will now happen for small tasks.

: Max-Min initial list_of_tasks as empty list [1] Running free resource is available and length (list_of_tasks) $>$ 0 selected_task $\leftarrow$ first_element_of (descending_sort (list_of_tasks based on size (or completion time))) selected_resource $\leftarrow$ first_element_of (ascending_sort (resources based on overall execution time for a task)) do assignment for selected_task and selected_resourcenew_task receives RECEIVE_TASK (new_task)

To reduce the wait time and makespan in the cloud environment, an improved scheduling algorithm based on Min-Min and Max-Min is proposed in [40]. The paper [33] provides a comprehensive performance comparison between scheduling algorithms, including FCFS, MCT, Min-Min, and Max-Min. The improved Min-Min [41] is based on adding different rules and combining existing methods. In this algorithm, an attempt has been made to solve the problem of reducing the degree of parallelism. There can be issues for longer tasks in the improved Min-Min algorithm. However, adding a multidimensional priority to solve the problem is very useful. In this regard, prioritization is defined as a weighted average of several measurements. These measurements include the user-defined priority. Moreover, one of the task’s priorities is based on the length, and another priority is related to time, i.e., the longer the waiting time, the higher the priority. Therefore, long tasks can execute after a reasonable time. In the Improved Max-Min method, the average completion time will increase if the number of long tasks is large, while the small tasks could be processed sooner.

The paper [42] proposed a version of this algorithm to assign the longest task to the slowest node and select the slowest nodes in order. As a result, fast nodes match with shorter tasks. Since shorter tasks processing speeds up, fewer problems occur in practice. In this proposed method, statistically and according to the reported results, a better answer has been obtained than the original Max-Min algorithm. The work in [43] utilizes the max-min scheduling heuristic. Their proposed algorithm improves the completion time of requests by using machine learning in clustering requests according to the size and percentage of utilization of virtual machines. Then assigns the largest task to the least utilized virtual machine.

Resource Aware Scheduling method has been introduced for grid computing in [44]. This method is a combination of two methods, Min-Min and Max-Min. In such a way that each time a task is selected, it rotates between the min-min and max-min. If it starts with the odd number of tasks, then start with the Min-Min and if it begins with the even number of tasks, then start with the Max-Min. This heuristic is statistically based on different samples that have been tested and obtained practically.

As the scheduling problem becomes more extensive, the time complexity of the Resource Aware Scheduling method is approximately equal to the time complexities of the Min-Min and Max-Min methods. Let it be noted that reducing the execution time of each task separately does not necessarily cause the reduction of the total execution time. Moreover, there is no obligation on the relationship between throughput and makespan, and the improvement of one does not necessarily affect the other. Earlier, the Min-Min and Max-Min algorithms’ weaknesses have been mentioned, which are problematic in parallelism and average completion time. The Resource Aware Scheduling algorithm has been able to reduce the problem of starvation.

In addition, using the Max-Min algorithm represents the solution to the challenge of weakness in the paralleling execution that has been eliminated. Two components that are often observed in various studies are efforts to optimize the balance of resource allocation between tasks and increase throughput. The second component is more significant when the number of input tasks increases.

Parsa et al. [44] considered this component while Min-Min makes this goal possible. The simulation result of this paper has been achieved based on the grid system. However, the result is still helpful for cloud systems due to its similarities and advantages.

3.3 User demand aware algorithms

In some algorithms, other kinds of features are considered. These features are based on users. For example, deadline of each task is based on user asks, awareness of QoS and SLA, and other priorities of users and service providers.

In Deadline Aware, as the name implies, there is an attempt to meet the required time as a deadline in Deadline Aware schedulers, which the user probably chooses. In this regard, a simple example is to select the task that has the earliest deadline. It is also common to ignore a task if it does not meet its deadline or to extend the deadline at an additional cost. Works in [14, 34, 35, 36] has addressed the deadline-aware scheduling.

In priority based algorithms, Based on [48], priority of jobs is crucial in scheduling because some jobs should be serviced earlier than others and cannot stay for a long time in a system. A suitable job scheduling algorithm must consider the priority of jobs. Ghanbari et al. [48] proposed a scheduling algorithm based on multiple criteria decision-making model.

Market oriented algorithms are another important class. It is essential to pay attention to notable factors for users and aspects that are economically significant for job scheduling. Moreover, the available resources are not infinite. Therefore, the resource allocation should be optimized to maximize the provider’s profit. When there is only one resource, the scheduling determines the order of tasks. However, if there are various resources and the possibility of parallelism between tasks, it is necessary to have a suitable resource allocation and task scheduling. Several simple methods have been used to provide a time estimate of task completion by formulating problems. However, such methods lose effectiveness when multiple resources are available and parallel execution is possible. Note that both processing and memory resources are considered when it comes to resources. In a problem formulation, these resources are usually weighed according to the problem conditions and the type of tasks. An example of a scheduling algorithm with a market-oriented heuristic has been presented in [45]. This algorithm aims to give the most revenue to the service providers while resources are limited. Details of this algorithm are as follows: The scheduler first decides whether to accept or reject the input task. It tries to start the scheduling process by prioritizing tasks and determining what resources to allocate to each task in order. It creates a prioritized queue, selects the highest priority, and allocates resources to the next task if it is not possible for each. Assuming that the volume of work received is large, it is not cost-effective in time and processing if each work is wholly rescheduled. Each job may be stopped in the middle of the run and lose resources while another job runs, but the configuration settings do not change. Each job may have physically new resources after retrieving allocated resources, but the configuration mentioned above is retained. If a task is running, it means that it will meet its deadline. Moreover, paper [45] proposed a general linear decay value function to characterize a client’s willingness-to-pay. This amount does not exceed the maximum value defined by the user, but it can be negative.

It is also necessary to get a good and fast estimate of how long a task will execute. This depends on the size of the input and the assigned configuration. The mentioned paper proposed a collaborative filtering-based approach to quickly and accurately predict the execution time of parallel jobs running in a Big Data analytics service platform. Another example is an algorithm that is presented in paper [34]. This algorithm minimizes Monetary Costs for Deadline Constrained Workflows in the cloud environment. At the same time, it can focus on on-demand provisioning, pay-as-you-go model, instance heterogeneity, performance variation, and VM acquisition delay.

On-demand service means serving resources based on user requirements. Cloud computing users define their needs and pay based on their usage. The expected service level will be addressed in the service level agreement (SLA), and the service provider’s schedule must be such that the provided service does not violate SLA.

In a fair scheduling algorithm, all tasks will be given resources as needed if there is no resource constraint. However, if there are resource restrictions, we should try to establish fairness. It is also important what level of services agreement the user has registered in the contract. Some works tried to establish fairness by targeting the quality of services like paper [46]. Two steps should be considered in the scheduling process: prioritizing tasks and then deciding which resources to allocate to each. The first solution that comes to mind is to build a matrix between tasks and resources that determine how long it takes and choose the shortest one each time, thus responding to both scheduling steps. However, this solution does not create enough fairness because longer tasks are practically ignored. An introduction of three well-known fair scheduling heuristics is provided in the following [46].

Max-Min fair sharing: Although the allocated amount may be more than the required one, all tasks are given the same resources. Redistribute the rest and continue this process as long as possible. This method is computationally simple but not entirely fair, especially since some tasks are completed after a while. It remains to be seen how the freed resources will be handled.

Simple fair task order: To complete Max-Min fair sharing, it is stated that the released resources prioritize between tasks and prioritize the task that finishes the earliest. This method also does not have complete fairness.

Another change can be made to improve fairness so that the freed resources are divided between tasks. In the first type, it should be distributed equally among everyone, and in the second type, it should be rescheduled for them, and the allocation of those resources should occur again. For example, the size of tasks should be considered. The second type is fairer, but there is more computational and temporal complexity due to multiple rescheduling.

The heuristic of dynamic optimization is introduced in the paper [47]. In a dynamic optimization algorithm, tasks are divided into two categories based on their nature: tasks that are important to reaching the deadline and more costly tasks. Each type is arranged according to the index of the same category. There is now competition between the two rows as to which row to process first. In this regard, it can be done in turns and rotations, and there is a queue each time. It is examined to select and allocate resources to establish the resource with the fastest processing time for the considered task.

4. Metaheuristic algorithms

The use of statistics and probabilities in computer science is inevitable. Various algorithms with non-deterministic behaviors have been proposed in the category of metaheuristic algorithms. Much effort has been made to develop and utilize scheduling methods in this category.

The general scheduling problem is a well-known NP-hard problem; therefore, if a method inspects all the cases, it is not cost-effective in terms of time and computational factors. As a result, researchers have used metaheuristic techniques to approach optimal answers. Such methods find a proper solution by trying to examine the state space and seek better answers in a reasonable time [49]. Although it does not guarantee that the best answer will be found due to the wide range of problem situations and limited time, the result is usually perfect.

In deterministic algorithms, there is less flexibility than in metaheuristics in interacting with different environments and problems. Also, deterministic methods are often complex in interaction with extensive data, so they cannot find the answer. At the same time, one of the most significant weaknesses of the metaheuristic algorithms is the lack of guarantee of finding the best solution.

In many metaheuristic algorithms, many agents search the problem space and try to find a better answer. This is called swarm intelligence. These algorithms are divided into three general subcategories regarding the number of search agents. The first subcategory has one agent, and only one agent tries to find the desired answer. The second subcategory is multi-agent, and a set of agents are searching in the problem space. The third subcategory combines different methods that may be multi-agents or single-agent.

One of the most critical challenges in metaheuristic algorithms is getting stuck in local optimums. Special attention has been paid to this challenge in the scheduling context regarding metaheuristic techniques. At the same time, trying to examine more of the state space and find a better possible answer and the algorithm’s speed has been considered. A simple example is given below to explain the local optimum problem further. Let us assume a robot climbs a mountain and tries to reach the highest point. This robot has simple tilt sensors only to perceive its surroundings at very short distances. So, it chooses the highest point near and goes to it, while if the surrounding points are lower, it stays at this point and thinks it has reached the peak. In Fig. 2, two robots, 1 and 2, search for the highest peak. Robot 1 moves to its left to reach point A, which is the main peak, and stops there, in which case it succeeds. Robot 2 stops at point B, but it is clear that it has not reached the main peak while stuck in a lower peak called local optimum.

Figure 2.

An example of local optimum.

In the following, a broad range of metaheuristic algorithms is discussed. In addition, Instances of the mentioned algorithms are given in Table 3. Houssein et al. [5] prepared a good review about metaheuristic based algorithms for task scheduling in cloud.

Table 3

Metaheuristic algorithms

MA	PSO	[49][50][51][52][53][54][55][56][57]
	Hill Climbing	[50]
	GA	[58][59][60][61][62][63][64]
	Fuzzy based	[56][65][66][67]
	GSA	[68][69][70][71]
	Cuckoo	[72]
	Antlion	[73][74]
	Others	[75][76][77][78][4]

4.1 Single-agent, nature-inspired algorithms

As an simple example Hill climbing is one of the most popular single-agent algorithms. This algorithm is practically similar in appearance to the example in Fig. 2. However, its simple form is a single-agent and changes its position to find a better answer. How fast the hill climbing is, how the location changes, and the local optimization escape method are discussed differently. For example, the algorithm is simulated annealing, which changes the velocity as an exponential function that gradually decreases the speed to initially escaping from local optimums and progressively decreases the speed to allow convergence. A good example for simulated annealing in [64] used a combination of this method and genetic algorithm.

Another popular version of hill climbing is the Tabu Search algorithm [50]. This algorithm keeps a list of visited points to not return to them or other explored places. This method is especially effective if there are several restarts. Let it be noted that this way of searching can be considered by different algorithms. Another improvement that has been proposed in this regard is the use of two parallel processors, one of which is responsible for checking the list of prohibited and one for performing agent moves and other tasks.

4.2 Multi-agents, nature-inspired algorithms

The behavior of nature has strangely taken steps towards optimization, and with inspiration from nature, many algorithms have been designed and introduced. Meanwhile, some of these algorithms have been used in scheduling optimization to allocate resources in cloud computing.

The Genetic algorithm (GA) tries to produce the desired output by simulating the behavior of living things. In a GA solution, there are common points that form the framework of the algorithm. In general, this algorithm starts by randomly generating several instances, which is the first generation. Each generation member is valued by a function that indicates the degree of fitness called fitness function. As in the environment, a fast gazelle is more valuable; in other words, it is more compatible with its environment and is more robust in the face of dangers. Its chances of survival are greater than those of a slow-moving gazelle. Genes make up a chromosome and express its characteristics. At the same time, each generation member has a chromosome made up of genes. Just as the production of the next generation is based on the composition of genes of the previous generation, that is, the output of the next generation is produced by combining current genes and a kind of intersection between them. How much each member of the current generation can transmit their traits through the company of their chromosomes to produce the next generation is based on a fitness function. In many cases, some top members are selected to produce the next generation, and in other cases, everyone has a chance. This probability is evaluated based on a function and the degree of fitness for the related problem. Also, in nature, one factor that reveals new traits is changes caused by mutations. In this algorithm, mutations may also occur, causing changes in the chromosome sequence. Note that the production of each generation is a forgetfulness of the previous generation, and to complete this process, various contracts are made. A contract can include implementing a certain number of generations and reaching a produced member with a minimum level of success regarding the output of the evaluation function. Algorithm 4.2 is a pseudocode for simple form of GA.

: GA[1] initial population randomly $\textit{iteration}\leftarrow 0$

$\textit{iteration}<\textit{max\_iteration}$ individual from population $\textit{score}\leftarrow$ evaluate the individual score append to fitness_list $\textit{new\_parents}\leftarrow$ Select list of most fitted individuals $\textit{crossovered\_population}\leftarrow$ crossover (new_parents) $\textit{mutationed\_population}\leftarrow$ mutation (crossovered_population) $\textit{population}\leftarrow\textit{mutationed\_population}$

As a common method for task scheduling and resource allocation in the cloud environment, each chromosome is one of the scheduling modes and is valued based on the target. Each chromosome has a gene for the number of functions, and each gene is a number from a set of numbers corresponding to the index number that determines each of the available resources. The mutation changes these numbers, using that resource to perform the specified task.

Many researchers focusing on task scheduling has been used GA approach to design scheduling framework. Pang et al. in [59] used GA to expand the search range of solutions due to GA mutation and crossover ability. Saha et al. in [60] introduced an algorithm that uses GA to minimize the waiting time and queue length in their considered system. Cui et al. in [58] proposed a workflow task scheduling algorithm based on GA that consists of a two-dimensional coding method and has a novelty in genetic crossover and mutation operation to increase population diversity. Paper [79] proposed a scheduling algorithm based on the multi-objective genetic algorithm. Its goal is to decrease energy usage and earn the provided access by agencies. This makes an appropriate scheduling mapping as per user real-time needs.

Modified Genetic Algorithm is similar to the general genetic algorithm while the first generation is received spatial attention [61]. For example, eight chromosomes are produced first, and all organs are random in the general genetic method. However, in the modified version, it is suggested that some of these organs be made in different ways to help to achieve a better answer. That is because these methods themselves have pursued the possibility of achieving a better answer. For this reason, one of the primary members assigns tasks sorted in ascending order by size to resources, which are sorted by speed. In another member, tasks are assigned in descending order by size to resources sorted by speed. These two cause these traits to be deliberately introduced to the population and cause the hope of increasing the speed of finding the answer.

In general, each algorithm has advantages and disadvantages, and trying to combine algorithms to obtain benefits of both is common. One of the problems with the genetic algorithm is that it is slow because the combination of chromosomes can destroy the valuable part of the product. The use of greedy algorithms accelerates operations and results in fewer execution cycles. In [62], unlike many genetic algorithms that only pay attention to the completion time of all tasks, the combined algorithm considers the average response time, balance degree, service quality, and cost of work. In this algorithm, everyone has a chance to produce the next generation based on the amount of a fitness function, defined other statistical functions for the intersection, and mutation that decreases over time. At first, a large amount of intersection and mutation makes more space visible, and as it reduces, the rate of convergence and finding a better answer increases. Note that the absence of an intersection event means selecting and transferring the same member directly to the next generation. However, to improve the work, the greedy algorithm is used in such a way that, after applying the intersection and mutation, the following works are done: Total processing time computation for different virtual machines. Sorting virtual machines by computed amounts, finding the task with the shortest execution time, and choosing the task on the slowest machine. If it is the fastest machine, then the job is done, in other words, the worst run time is on the fastest node, and practically nothing can be done. Otherwise, this task will be transferred to the fastest processing node.

These steps continue until the job is done, and then members are replaced. Note, although this operation improves balance and helps to improve output, it also reduces the chance of examining the entire issue space. Of course, this case can be ignored due to time limitations. The quality of services and financial issues have also been considered during the evaluation of chromosomes, which are not difficult to apply. Regardless of the above greedy section, a multi-component evaluation function is needed.

There are applications in the task performance matters while there are dependencies between tasks modeled by a workflow [63]. For example, in a multi-stage MapReduce, the input of a task can be the output of other tasks. In this case, it is common to use a DAG to model task dependencies. In addition to the priorities, the order of dependencies has to be met in such scheduling. Therefore, Workflow Scheduling using GA follows the order of tasks to perform any task in the left genes sooner. As a result, it will face a severe negative score during the evaluation if priorities are not met. The algorithm also needs a resource selection function for each task, a Min-Min algorithm. Note that genes represent each task’s performance on each chromosome, so duplicate numbers should not be.

One of the most popular algorithms for swarm intelligence is particle swarm optimization (PSO). This algorithm is generally several randomly generated particles that start moving in the problem space and are updated to get closer to the optimal solution. PSO algorithm is simply like several hill-climbing agents searching in the problem space with different starting points. One of the most critical and effective superiority of PSO is that a particle considers both its best-found point and the best-found point so far by all other particles whereas, considering both of these points on its motion. Algorithm 4.2 is a pseudocode for simple form of PSO. In the following, three versions of particle swarm optimization are described. However, there are many versions regarding various applications. Pradhan et al. [57] proposed a great overview of PSO based algorithms in cloud computing.

: PSO[1] $\textit{particles\_position}\leftarrow\textit{initial\_random\_position\_in\_% space\_for\_each\_particle}$ $\textit{best\_position}\leftarrow\textit{initial\_randomly}$ $\textit{best\_score}\leftarrow-\infty$

$\textit{iteration}<\textit{max\_iteration}$ or particles stop meaningful improvement particle from particles

$\textit{position\_score\_list}\leftarrow$ empty list $\textit{vicinity}\leftarrow$ select candidates from vicinity based on best_position and particle_position $\textit{particle}\leftarrow$ choose best candidate from vicinity $\textit{score}\leftarrow$ computes based on problem for candidate $\textit{score}>\textit{best\_score}$ $\textit{best\_particle}\leftarrow\textit{particle}$ $\textit{best\_score}\leftarrow\textit{score}$ answer is best_particle

Tsai et al. [50] proposed multiple particle swarm optimization for task scheduling in the cloud environment. The evaluation function has several different inputs in this method, e.g., the cost and returns time. As a result, a function is merely a weighted average of the various input values normalized to a specific range when necessary.

Saeedi et al. [51] proposed an improved many-objective particle swarm optimization algorithm for dynamic workflow scheduling in the cloud environment. Its goal is to minimize the makespan, the cost and the maximization of reliability for users and energy consumption for providers. Moreover, paper [52] proposed a bi-criteria priority-based particle swarm optimization to schedule workflow tasks. This method minimizes the execution cost and time with specified deadlines and meets budget constraints.

Particle swarm optimization can be combined with other algorithms to take advantage of other methods. A comprehensive review is presented in paper [54]. For example, paper [53] proposed a scheduling algorithm by combining PSO with GA and ant colony algorithm. As a result, the combination of PSO and GA compensates for the low genetic speed while escaping from local optimums.

Cost and time in data flow computing are crucial while paying attention to only one of these factors can lead to serious challenges for the other factor due to their contradictory behavior. The paper [55] proposed genetic algorithm based on greedy strategy. It is is focused on meeting deadlines and speed of convergence while also considering costs. This paper divides all particles into equal groups, and this division causes a kind of distribution that, according to the paper, is useful in managing massive data. In addition, the size of the group changes, and it is dynamic. This resizing allows the speed of convergence to be controlled. If the groups’ size is small, more space is observed, making it easier to escape from local optimums. In addition, each time, only the position of the worst particle in each group is updated, not all the particles, which leads to the observation of more space and, of course, a reduction in the rate of convergence. Note that each group is seen as a large particle.

Another innovation in paper [55] is considering the importance of numbering indexes in space. For example, when moving from the first position to the second position, numbers between the two have no special meaning. Still, creating a valuation function and calculating the proportion in space makes points between them meaningful. A matrix is calculated, which is the time of execution of each task on each node. A matrix is also calculated, representing the information transfer time between nodes. Normally pointers to pointers are meaningless, but efforts have been made to improve numbers in this paper. This numbering is done according to the importance of the cost and reaching deadlines. It is clear that not reaching the deadline is very undesirable, so first arrange them according to it, and then if several items have the same amount, arrange them based on cost. An adaptive renumbering strategy is proposed to make the index of resources meaningful. It adaptively uses two metrics related to execution time and execution cost, respectively, to sort and renumber resources to make the learning among particles clearer and more reasonable.

It is necessary to note that the paper mentioned can find a potential balance between diversity and convergence, and several groups are coevolved using the master-slave multigroup distributed model.

Another algorithm is antlion. It has been named in works that examine the behavior between antlions and ants [73, 74]. In this method, ants move in the problem space and search for food. At the same time, antlions try to trap ants by creating traps, finding better places for traps, and building bigger traps. In addition to these, a trapped ant tries to escape while the antlion tries to slow it down by throwing pebbles to catch the ant.

Like nature, ants move randomly, which makes the whole space of the problem visible. Movements of ants are in multidimensional space, as each dimension represents one of the considered characteristics. It should be noted that ants are like particles in optimizing particle swarm or chromosomes of the genetic algorithm.

Features of the antlion Algorithm are as follows: Movements are random in all dimensions. The degree of possibility of accidental movement is affected by ant traps. As antlions receive better adaption, their traps become bigger. The term elite is the same as the antlion with the best fitness. Ants can be caught by any of antlions at any stage. For ants to move towards the antlion, the effect of random movements decreases over time. If an ant is fitter than an antlion, it is caught in the middle of a trap by an antlion. In this case, the antlion builds a new trap there. Random movements must be in the problem space and not outside. For this reason, normalization occurs, which affects the minimum and maximum possible value of each dimension. All ants have a chance of success based on the output of a fitness function. In hunting, it is assumed that each ant is only for one antlion. Antlion tries to trap ants and slow down movements of ants. In this regard, a coefficient is applied that is affected by the number of execution rounds to the total execution rounds, which causes it to remain in the optimized points. The final elite is the best match among all antlions.

The cuckoo is a bird in nature that attempts to survive by laying its eggs in other birds’ nests. In the meantime, the attempt is to find nests that can deceive the host bird, and in this case, the bird tries to be in the same areas next time, and in practice, convergent behavior occurs. The research on the use of cuckoos has been investigated by presenting the implementation in such a way that each cuckoo creates a new answer, and tasks are the eggs, and nests are the processors [72]. Each time eggs generation occurs, a new mapping is produced, mapping success is evaluated, and successive implementation cycles eventually lead to an optimal response. Also, the assumption that each nest has the capacity of one cocoa egg solves the problem of interfering with the demand for tasks from the same sources.

Gravitational Search Algorithm (GSA) is constructed based on the law of Gravity, and the notion of mass interactions and its searcher agents are collections of masses [68, 69, 70, 71]. Improved Lévy based whale optimization algorithm is introduced by paper [75] to reduce costs and resource waste and save energy. Moreover, task scheduling based on Cat swarm and Bat behavior [76], based on mean grey wolf optimization [77] and based on self-adaptive fruit fly optimization [78] are among the other nature-inspired methods.

4.3 Using fuzzy logic and fuzzy system to improve metaheuristic algorithms

Fuzzy logic and fuzzy systems are suitable for an environment where several factors are not precisely identified or already known. The main advantage of using a fuzzy approach for defining a controller is that there is no need for complex mathematical modeling to develop the controller. Fuzzy logic is about approximations, not exact solutions. The output of a fuzzy system is not an exact number but a number in the range of 0 to 1 as one represents total accuracy. Further, such behavior is observed frequently in the real world [56].

Paper [56] proposed a hybrid task scheduling algorithm based on a fuzzy system and Modified Particle Swarm Optimization technique to enhance load balancing and cloud throughput. The initial population is generated in the PSO part by assigning the smallest tasks to virtual machines with the fastest processor. Moreover, for fitness, the calculation uses the fuzzy system while the length of tasks, CPU speed, RAM size, and the total execution time are the input of the fuzzy system. Another example is the paper [65] that utilizes fuzzy logic to improve its result. In their method, the Gravitational Search Algorithm (GSA) is considered a base algorithm, and fuzzy logic is used to determine the number of masses that affect one another during the implementation of the GSA. As other examples, we can mention [66, 67] that use fuzzy logic to calculate the fitness value in GA and to improve the priority-based Max-Min algorithm, respectively.

5. Learning algorithms

Recently, there has been increased interest in using artificial intelligence and learning models. Various methods based on learning algorithms have been proposed to help get the job done. In this regard, trying to allocate resources, schedule jobs, and predict workloads are among the popular ones.

There are several strategies in artificial intelligence used to solve problems. The general category is called machine learning (ML). Moreover, reinforcement learning (RL) and deep learning (DL) strategies have received significant attention. In the following, ML, RL, and DL methods are briefly explained with representative algorithms in each category. The mentioned papers of these categories are given in Table 4.

Table 4
Learning algorithms

ML	RL	[14][80][81][82][83][84][85]
	DL	DRL, DQL	[86][87][88][89][90][91][92][93]
		LSTM	[94][95][96][97]
		others	[98][99]
	Simple	[100][101]

5.1 Simple machine learning

Machine learning is a subset of methods based on artificial intelligence. Its primary focus is developing computer programs that can access data and use it for their learning. The learning process begins with observations and then uses data to find a pattern in the data and make better decisions. The main goal is to allow the computer to learn automatically without human intervention and to be able to adjust its actions accordingly.

One of the simplest models in machine learning is the Support Vector Machine (SVM). For example, a virtual resource scheduling prediction scheme based on SVM is proposed in [100]. In this scheme, reconstructing the phase space produces SVM inputs for training and predicting. As another example, a workload prediction model based on machine learning is presented in [101]. This algorithm selects idle VMs or VMs with a low workload and turns them off to reduce power consumption.

5.2 Reinforcement learning

Reinforcement Learning is among the most popular algorithms in machine learning. In these methods, a learning agent starts to experience different situations and gather information with the help of feedback. As a result, it can act better in future decisions. Usually, there is a linear equation of the weighted average of the current score and the score from the feedback. Reinforcement learning can be learned based on experience without the need for basic computation rules, leading to flexibility in various issues. Here, several examples of algorithms that utilize Reinforcement Learning are provided.

Algorithm 5.2 is a pseudocode of its simple form.

RL can be said to be a promising approach to auto-scale in the cloud, as it is possible to learn clear, dynamic, and consistent resource management policies [85].

: RL[1] initial policy randomly

! (converge on policy) $\textit{decision}\leftarrow$ decide between exploration or exploitation randomly based on probabilistic decision is exploration $\textit{action}\leftarrow$ choose randomly from possible actionsdecision is exploitation $\textit{action}\leftarrow$ policy (agent position in space) $\textit{score}\leftarrow$ COMPUTE_SCORE (action) position compute_new_position (space, position, action) $\textit{policy}\leftarrow$ update (policy, score, space)

Attention has been paid to energy consumption, but it should not be forgotten that less energy consumption means fewer resources and reduces execution time. Therefore, a balance must be struck between these two. The proposed model in the paper [80] also tries to minimize energy consumption without violating the contract with the user and exceeding the deadlines. In this regard, an attempt has been made to provide a model to predict energy consumption and know how much energy consumption is to be faced. Therefore, the valuation function is defined based on the absolute value of the difference between the predicted amount and the experienced amount in practice. This difference is called the forecast error, and it should be reduced if possible. The point to be added is that the forecasting cost can be updated in different cases with different parameters and used in algorithms mentioned earlier and need to be estimated.

Paper [81] proposed an energy-aware multi-objective reinforcement learning algorithm that tries to minimize the makespan and energy consumption while meeting the budget constraint for a multi-objective workflow scheduling problem. In addition, works in [82, 83] used a multi-agent reinforcement learning algorithm for job scheduling.

The complexity of scheduling increases if several parameters need to be optimized, especially if these parameters conflict with cost and makespan. The paper [14] investigated a bi-objective scheduling problem for deadline-sensitive tasks in the heterogeneous cloud environment. The scheduling objectives are to minimize energy consumption and makespan. It proposed a scheduling algorithm that uses reinforcement learning behavior. In this scheduling, a flexible decision-making unit learns from the performed operation and improves itself. The presence of this reinforcement learning mode helps to approach the optimal answer in a dynamic environment. It should be noted that automatic learning is suitable for dynamic and complex environments with many uncertainties. Therefore, the introduced method with such flexibility should be able to allocate resources to tasks best. In reinforcement learning, if the action selected for one task is successful, the probability for other actions may decrease, and the probability of this action will increase and vice versa.

5.3 Deep learning

Deep Learning is a new subset of machine learning that has increasingly become a popular trend in the last few years. This popular method is easy to use, powerful, and an important element of data science. Deep learning is beneficial to perform tasks with a large amount of data. Another important point is to use experiences of previous cases like the reinforcement learning paradigm. Some deep learning models can also learn from the past and influence their internal information matrix. Also, they can understand sequential behavior when receiving sequences. One-dimensional Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM) are suitable models for this kind of input.

As an example, in LSTM based algorithm proposed in paper [94], there is a storage resource that is updated at each step and uses the previous information as an additional input and uses it to generate new output. This leads to detecting various behaviors such as high and low slopes and oscillations while reducing the volume of the stored matrix. Multiple resources, including CPU, bandwidth, RAM, and storage resources, are used as inputs, which means the model regards all of them. As another example of LSTM utilization, work in paper [95] used this model to predict the amount of work that would be requested at any given time. As this paper states, the advantages of this model are the ability to detect patterns that exist at different time intervals, the ability to decide what period is important and influential automatically, and provides a nonlinear behavior that can more easily manage possible noise than something like the regression line.

Q-learning is a model-free reinforcement learning algorithm. This algorithm considers the problem state space as a set of spaces and tries to move and learn between these spaces. This method creates a policy of which path to go in each case. This algorithm is fast, and essential information can be easily applied by writing in the value of each house. Q-learning defines a measure of the success of monitoring, allocating, and controlling a scheduling process. This measure is a linear combination of performance and cost. In the case of rapid changes in inputs, values of the Q-learning table may fluctuate, whereas to compensate, weights are frozen. After receiving the result of several steps, updates will occur. Many deep learning models somehow follow behaviors of machine learning algorithms, such as using reinforcement learning and creating a model called Deep Reinforcement Learning (DRL). For example, Wang et al. [93] proposed a new DRL based algorithm. This was also introduced in a model called deep Q-learning (DQL). As an example, Kaur et al. [92] proposed DQL based algorithm to improve the earliest-finish-time. The paper [86] aims to balance resource consumption and execution time by using deep learning.

Another example of Q-learning for task scheduling as an energy-efficient cloud computing is paper [84]. Paper [87] proposed deep Q-learning task scheduling, and it is aimed at solving the problem of handling directed acyclic graphs (DAG). Another good example is [92].

Zhang et al. [88] proposed a DRL-based scheduling approach to minimize the job completion time, the load imbalance value, and total cost. It is a multi-task deep reinforcement learning approach for scalable parallel task scheduling. Paper [89] used DRL to reduce energy consumption and cost through resource provisioning and task scheduling. Paper [90] used DRL for QoS-aware job scheduling. Paper [91] proposed a job scheduling algorithm based on deep reinforcement learning to minimize the makespan.

Workload prediction can benefit different proposed algorithms in cloud management and scheduling. An accurate forecast helps mitigate energy consumption, meet the required quality of service, predicting energy consumption [98]. As an example, paper [99] proposed a deep learning model for predicting cloud workload. Paper [96] used LSTM for forecasting CPU usage of machines in data centers. Apart from the above, artificial intelligence has various other usages in cloud computing; for example, Gao et al. [97] proposed an algorithm to predict failure by utilizing Bidirectional Long Short-Term Memory as a model with two LSTMs. The mentioned paper aims to predict whether tasks and jobs are failed or completed.

6. Algorithms based on game theory

Game theory (GT) mainly focuses on the strategic interaction among rational decision-makers and is widely applied in all fields including logic and systems science [102]. Game theory models have been developed to achieve goals such as increasing resource utilization and high quality of service for consumers and allocating resources efficiently and equitably between cloud centers, cloud service providers, and data consumers [103]. There are many good paper examples, [104] proposed algorithm schedule assigns each task on its best virtual machine based on Matching Game theory in the final phase. Each task can choose the appropriate machine in this model, respecting each machineâ€™s maximum number of quotas. As an example, [105] proposed using game theory and optimization methods in optimization problems. It is necessary to determine the strategy, payoff, and players in game theory. In the context of scheduling, each task is assumed to be a player, and each strategy is one of the virtual machines, and each task tries to receive a better total output by choosing the suitable virtual machine. For payoff, it used completion time and waiting time. Another examples are [106, 107, 108]. For an overview, simple game theory based pseudocode is available in Algorithm 6.

: Simple scheduling based on game theory[1] $\textit{assignment}\leftarrow$ randomly assign resources to tasks! Nash_equilibrium ( $\textit{assignment},\textit{tasks},\textit{resources}$ ) task from tasksIt is possible to improve task situation by action, either take a resource or be added to the queue for a resourcedo the action and update assignment

In game theory, two main classes are Cooperative and non-cooperative, representing games in which players work together to achieve the overall goal and compete to achieve their goals, respectively. For scheduling, Patra et al. [109] have implemented both cases and tried to provide proper real-time task scheduling in cloud computing. According to the output of this work, the cooperative type had a more successful result. [110] proposed a cooperative game model for scheduling cloud computing tasks. Participants can optimize their game strategy based on a series of rational assumptions. A sequential game-based scheduling algorithm is proposed based on the Nash equilibrium solution analysis. Ye et al. [111] is a good example of non-cooperatives.

Nash equilibrium is a concept in game theory that no player gets a better payoff by changing its strategy assuming all other players do not change their strategy [112]. The work in paper [113] used game theory to minimize thermal imbalance and hotspots while it introduced a game strategy to allocate resources in data centers. In this paper, tasks are allocated according to their temperature status and the effect on other nodes. In addition, Nash equilibrium is used to provide a non-cooperative game as a solution. The work in paper [114] used game theory to help cloud providers in the decision-making process while the goal is to increase resource utilization and profit. Paper [115] has been tried to reduce the average response time and energy consumption by using the Stackelberg game. Paper [116] proposed a model based on game theory that the job scheduling is considered as a n-player and non-cooperative game. In this model, each job is a player whose payoff is the amount of makespan. Moreover, achieving the optimal solution in planning with Nash equilibrium is checked. Genetic algorithms have also been used to find different points to reach the Nash equilibrium. Paper [117] introduced Nash Equilibrium-based VM Replacement that is designed such that most data center hosts are either in optimal resource utilization or shut down. Paper [118] introduced the usage of Nash Equilibrium concepts to improve throughput and reduce the overall idle time of the processors. Overall, there are many related works in the context of Nash equilibrium in cloud environments [110, 112, 119, 120, 121].

7. MapReduce review as an example

Different categories of scheduling in cloud computing were introduced in the previous sections. The following is an exploration of a famous example of using cloud computing. This section examine MapReduce and some proposed ideas and algorithms to improve its performance, especially the scheduling part.

MapReduce is a programming model for processing and generating large data sets. Users specify a map function that processes a key/value pair to create a set of intermediate key/value pairs and a reduce function that merges all intermediate values associated with the same intermediate key [122]. The widespread popularity of big data processing platforms using the MapReduce framework is the growing demand to optimize their performance for various purposes. In particular, enhancing resources and job scheduling are becoming critical since they fundamentally determine whether applications can achieve the performance goals in different use cases. Scheduling plays a vital role in big data, mainly in reducing the execution time and cost of processing [123].

MapReduce has become popular because of its ability to perform parallel tasks, balance data sharing, efficient use of resources, and optimal management. In addition, the ease of use increased its popularity. Due to the high popularity of MapReduce, it has attracted the attention of many researchers, and there is much discussion about its optimization. Many famous companies like Google and Yahoo have been instrumental in developing and improving MapReduce. Many sources on this subject provide helpful information, including papers [30, 124, 125], which are references for a general introduction. A brief introduction to this context is provided in the following.

The design of MapReduce consists of different parts, and there is variety in its implementation, but in general, the main parts are common. As the name implies, there are two critical steps: Map and Reduce. Nodes are also categorized in several ways, such as master and slave nodes or manager and worker nodes or nominal nodes, and data nodes. Nominal nodes are responsible for the overall management of operations and are responsible for dividing tasks and specifying tasks for other nodes. On the other hand, they are responsible for monitoring the performance of others and dealing with the problem of failed nodes. Data nodes are responsible for performing their assigned tasks. The nominal nodes send a message to the worker nodes to monitor, and if they do not receive a response, it means that the node is in trouble, which means that it must think about delegating its tasks to other nodes.

7.1 Hadoop

Hadoop is one of the most popular open-source implementations of MapReduce. This implementation provides a software framework for distributed storage and processing of big data.

Hadoop is widely used because it can handle all types of data like structured or unstructured, and supports redundancy, scalability, parallel processing, and distributed architecture [30]. Some Hadoop technology companies are Amazon Web Services, Cloudera, ScienceSoft, Pivotal, Hortonworks, IBM, MapR.

In the MapReduce process, the data is split between different nodes and processed based on a user-defined function. After that, it becomes a set of crucial order and value pairs. This set will be the output of the mapping stage and the next stage’s input. This data is called intermediate data and is used as input to nodes that are supposed to produce the final output and reduce the so-called intermediate data to the last data. In the meantime, it is important to choose which node each of these pairs of orders will be on. This is called shuffling. Various challenges may be raised in this regard, such as maintaining data security at different stages, optimizing processes, selecting appropriate nodes, and dealing with failed or slowed-down nodes.

With the provision of Hadoop infrastructure, the user only needs to prepare the two-step executable functions and try to determine the configurable parameters in the executive system and not deal with the complexities of the work. Another challenge is the issue of the locality. There is a gap between the node that has to process a specific task and its associated data. The shorter the distance, the better the performance because the cost of data transfer time is reduced. The problem is that it is not always possible to place data on the associated node, making it more complicated. Kalia et al. [126] proposed a good overview of Hadoop framework, Hadoop shortcomings and issues, and different coupled algorithms.

7.2 Algorithms based on MapReduce

Numerous instances can be seen in various papers and studies that attempt to provide a suitable function for a particular task or algorithm, in other words, to rewrite different algorithms based on MapReduce. An introductory example is the counting of different words in a text. This example, although simple, describes the operation well. In what follows, a brief description of word counting is given. In this case, a counter is designed to create statistics of the repetition of words in a text. Figure 3 is a sample abstract example of word counting that is used for description. As a part of this method, each of the sentences on the left is assigned to one of the processing nodes. Then, each of these nodes arranges a pair for a received word such that the word and one are set as its key and its value, respectively. This indicates one repetition. Keys sort this data, which is called the intermediate data set, and all pairs of sequences with the same initial letter are matched to a node to reduce. This node calculates the total value of pairs with the same key, and now the number of repetitions of each word will be the product of the output aggregation of the reducing nodes. Note that this is a simple expression of word counting and some unpleasant effects are available, such as an imbalance in the allocation of intermediate data to the reduction nodes. Different algorithms can be used for scheduling in MapReduce. For example, Pandey et al. [127] proposed a heuristic method that tries to improve energy consumption. Singh et al. [128] proposed a metaheuristic based method. This paper used ant colony optimization.

Figure 3.

An example of MapReduce for word counting.

As other examples, It can mention the problem of rewriting the decision tree algorithm with MapReduce [129]. Another simple example is the estimation of the number Pi [125]. The input data are many different points that are randomly generated in a square interval. Determine whether the point belongs to an enclosed circle or has a perpendicular rectangle, and as a result, the middle data keys are the same as the circle and the square.

7.3 Improving MapReduce

Intermediate data is an important part of the MapReduce process because it connects the two main parts; Map and Reduce. i.e., the output of the first stage is the input of the next stage. Knowing that different nodes may be disrupted, all intermediate data associated with a disrupted node must be reproduced. Therefore, a repetitive process occurs. By looking at this issue from a statistical point of view and given a large amount of data, this concern will remain even in the face of a small percentage of failure.

A paper by Moise et al. [130] aimed to solve this problem by using a distributed file system. This type of file is ideal for applications with large data volumes and data transactions that are fast and high, which is precisely what happens to intermediate data. The complete reproduction process does not need to occur again using a distributed file system and optimality is brought for node failures. However, an effective data management method is essential while a distributed file system is utilized. BlobSeer is a large-scale distributed storage service that addresses advanced data management requirements resulting from ever-increasing data sizes. This system can address concerns related to the management. The BlobSeer collects large volumes of data with a binary view written separately after each information change, and the previous information is not lost, precisely what demands here. The paper [130] has succeeded in eliminating reproduction overheads in the case of failure, whereas by providing management service facilities, there will be no challenge in the productivity of distributed files. As a result, the system will behave better, and the processing overhead will be less.

In various cases, a sequence of MapReduce invocations is necessary to solve the problem, i.e., the problem is better to be divided into smaller subproblems. This issue does not affect the optimality of the model itself but rather an improved method for solving it. One of its advantages is reducing the overhead of dealing with failure during the process and simplifying the problem. paper [131] proposed Hashdoop, which is a MapReduce framework for network anomaly detection.

In Hashdoop, for example, if the number of packets sent from a source to a destination is too high, unusual behavior is possible. The proposed algorithm of Hashdoop consists of two MapReduce steps. First, the received data is checked in two ways based on origin and destination, and outputs of this step of MapReduce are categories whose key is their common origin or their common goal. In the meantime, it will sort these data chronologically, and in the second step, it will count the repetition of each input and output pair at different times. Abnormal behavior is recognized when a large value indicator is large for a regular set of origin, destination, and time. As a result, the complexity of the problem has been dramatically reduced in Hashdoop.

Moreover, to improve MapReduce performance, one thing that can be done is to delete the somehow wrong data in a related context. Suppose the administrator node notices that nodes have trouble processing part of the data, and failure occurs. In these cases, a part of the input data is probably flawed, and thus, the nominal node itself commands to remove that data from the input data. This will improve performance and deal with multiple failures for such reasons.

Table 5
Comparison between four categories of scheduling algorithms

	Deterministic algorithms	Metaheuristic algorithms	Learning based algorithms	Algorithms based on game theory
When consider in the field of cloud computing scheduling	Simultaneously with the need for scheduling in cloud computing	Almost like deterministic category, but less practical usage	Almost like deterministic category, but less practical usage	Almost like deterministic category, but less practical usage
Time and space complexity	Usually, the least computational complexity	It depends on algorithms and number of agents	The most computational complexity	Depend on the details of the proposed algorithm
Difficulty of algorithm design	Usually less than other categories	Depend on designed algorithms	Similar to other ML and DL algorithms	Depends on designed algorithms
Disadvantages	Challenges and difficulties in complex problems – weak result in many cases	Local optimum problem – random behavior	Hungary over data problem, high computational load, large models, and overfitting challenge, specially in DL	Possible hardness for finding a suitable answer and equation
Advantages	Simple algorithm that is easy to understand and implement comparing to most other methods	Get the desired answer in many complex problems sooner than methods in other categories	Highly independent of the problem complexity and no need of feature engineering to explore the dependencies and importance of different cases	Provide a new perspective on solving scheduling problems – dealing with conflicts and competitions between objectives in a scheduling problem

7.4 Scheduling algorithms

Many scheduling algorithms have been proposed for MapReduce taking into account various factors such as user preferences, makespan, deadline, and available resources [123]. For example, consider meeting the deadline one of the desired scheduling factors. Paper [35] consider this criterion. The time required for processing, including mapping and reduction, is calculated, and the purpose of this paper is to serve the maximum number of tasks requested. The order of task checking is based on their deadlines, and if it is estimated that a task cannot complete the process before its deadline, it is ignored and set aside. Such methods tend to prioritize tasks or even eliminate some of them rather than targeting the MapReducing pattern itself.

Scheduling methods are generally divided into two categories; the static type occurs at the beginning of the schedule. The second type is dynamic, in which decisions are made during work upon receiving each task. Scheduling algorithms consider and prioritize only one component: availability of resources, FIFO, and fair scheduling. Section 3 covers the overview of these scheduling algorithms that there is not much distinction between methods for MapReduce and general ones.

8. Conclusion

Improving performance is one of the most significant concerns in the cloud environment. Meanwhile, the adopted resource allocation and job scheduling significantly affect cloud management performance. This paper explores motivations for demanding better scheduling and investigates various scheduling methods. Therefore, this paper focuses on examining scheduling algorithms in four main categories: deterministic algorithms, metaheuristic algorithms, learning algorithms, and algorithms based on game theory. Deterministic algorithms are simpler but, at the same time, have less flexibility. In the case of complicated problems, attempts have been made to find a near-optimal answer with indefinite methods and with the help of metaheuristic algorithms. Providing suitable methods becomes more and more difficult as the situation becomes increasingly complex. Accordingly, the utilization of learner patterns, especially reinforcement learning and deep learning, has facilitated implementation and high processing power and understanding of time patterns. Game theory is another helpful approach because it can study relations between agents and make better strategies. Moreover, a comparison between the four proposed categories are provided in Table. 5. In the end, in this paper, there is a review of the MapReduce framework as a example of big data processing that required proper scheduling approaches for accepted performance. Due to the increasing importance and popularity of edge computing, reviewing existing papers in this field that aims to improve performance and better scheduling can guide future reviews. The combination of simpler algorithms can also be seen in different algorithms. In this paper, we tried to examine the advantages of each method, the possibility of combining various algorithms, and better achievable results.

Footnotes

Author’s Bios

Marzieh Malekimajd received the B.Sc., M.Sc., and Ph.D. degrees in computer engineering from Sharif University of Technology in 2009, 2011, and 2018, respectively. She was a visiting Ph.D. student with the Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Milan, Italy, in 2014. She was also a Postdoctoral Researcher with the School of Computer Science, Institute for Research in Fundamental Science (IPM), Tehran, Iran, between 2018 to 2019. She is an Assistant Professor at the School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran. Her research interests include graph theory and algorithmic aspects of cloud computing, Big Data, and communication networks.

Ali Safarpoor-Dehkordi is an M.Sc. student at the Sharif University of Technology, Tehran, Iran. He received his B.Sc. from Iran University of Science and Technology, Tehran, Iran. His research interests are Algorithms, Cloud Computing, Deep Learning, and Machine learning.

References

Arutyunov

V.V.

, Cloud computing: Its history of development, modern state, and future considerations, Scientific and Technical Information Processing 39(3) (2012), 173–178.

Mell

and Grance

, The NIST definition of cloud computing, Computer Security Division, Information Technology Laboratory, 2011.

Singh

and Chana

, A survey on resource scheduling in cloud computing: Issues and challenges, Journal of Grid Computing 14(2) (2016), 217–264.

Yahia

H.S.

Zeebaree

S.R.

Sadeeq

M.A.

Salim

N.O.

Kak

S.F.

Adel

A.Z.

Salih

A.A.

and Hussein

H.A.

, Comprehensive survey for cloud computing based nature-inspired algorithms optimization scheduling, Asian Journal of Research in Computer Science 8(2) (2021), 1–6.

Houssein

E.H.

Gad

A.G.

Wazery

Y.M.

and Suganthan

P.N.

, Task scheduling in cloud computing based on meta-heuristics: Review, taxonomy, open challenges, and future trends, Swarm and Evolutionary Computation 62 (2021), 100841.

Wang

Ren

and Lou

, Achieving secure, scalable, and fine-grained data access control in cloud computing, in: 2010 Proceedings IEEE INFOCOM, San Diego, USA, 2010, pp. 1–9.

Alouffi

Hasnain

Alharbi

Alosaimi

Alyami

and Ayaz

, A systematic literature review on cloud computing security: Threats and mitigation strategies, IEEE Access 9 (2021), 57792–57807.

Salot

, A survey of various scheduling algorithm in cloud computing environment, International Journal of Research in Engineering and Technology 2(2) (2013), 131–135.

Srichandan

Kumar

T.A.

and Bibhudatta

, Task scheduling for cloud computing using multi-objective hybrid bacteria foraging algorithm, Future Computing and Informatics Journal 3(2) (2018), 210–230.

10.

Zhan

Z.H.

Liu

X.F.

Gong

Y.I.

Zhang

Chung

H.S.

and Li

, Cloud computing resource scheduling and a survey of its evolutionary approaches, ACM Computing Surveys (CSUR) 47(4) (2015), 1–33.

11.

Bhattacherjee

Das

Khatua

and Roy

, Energy-efficient migration techniques for cloud environment: A step toward green computing, The Journal of Supercomputing 76(7) (2020), 5192–5220.

12.

Zhu

Yang

L.T.

Chen

Wang

Yin

and Liu

, Real-time tasks oriented energy-aware scheduling in virtualized clouds, IEEE Transactions on Cloud Computing 2(2) (2014), 168–180.

13.

Cai

Wang

Song

Zhang

and Cao

, An energy-efficiency-aware resource allocation strategy in multi-granularity provision for green computing, in: 2019 International Conference on Computing, Networking and Communications (ICNC), Honolulu, Hawaii, USA, 2019, pp. 782–786.

14.

Sahoo

and Turuk

A.K.

, A learning automata-based scheduling for deadline sensitive task in the cloud, IEEE Transactions on Services Computing 14(6) (2019), 1662–1674.

15.

Kaja

Shakshuki

E.M.

Guntuka

Yasar

A.U.

and Malik

, Acknowledgment scheme using cloud for node networks with energy-aware hybrid scheduling strategy, Journal of Ambient Intelligence and Humanized Computing 11(10) (2020), 3947–3962.

16.

Xing

Xiang

Xue

and Yang

, Fair energy-efficient virtual machine scheduling for Internet of Things applications in cloud environment, International Journal of Distributed Sensor Networks 13(2) (2017), 1–11.

17.

Jiang

Garraghan

and Wu

, Holistic energy and failure aware workload scheduling in Cloud datacenters, Future Generation Computer Systems 78 (2018), 887–900.

18.

Liang

Dong

Wang

and Zhang

, A low-power task scheduling algorithm for heterogeneous cloud computing, The Journal of Supercomputing 76(9) (2020), 7290–7314.

19.

Walia

N.K.

Kaur

Alowaidi

Bhatia

K.S.

Mishra

Sharma

N.K.

Sharma

S.K.

and Kaur

, An energy-efficient hybrid scheduling algorithm for task scheduling in the cloud computing environments, IEEE Access 9 (2021), 117325–117337.

20.

Ibrahim

I.M.

, Task scheduling algorithms in cloud computing: A review, Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12(4) (2021), 1041–1053.

21.

Panda

S.K.

Nanda

S.S.

and Bhoi

S.K.

, A pair-based task scheduling algorithm for cloud computing environment, Journal of King Saud University-Computer and Information Sciences 34(1) (2022), 1434–1445.

22.

Sana

M.U.

and Li

, Efficiency aware scheduling techniques in cloud computing: A descriptive literature review, PeerJ Computer Science 7 (2021), e509.

23.

Fakhfakh

Kacem

H.H.

and Kacem

A.H.

, Simulation tools for cloud computing: A survey and comparative study, in: 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), China, 2017, pp. 221–226.

24.

Abreu

D.P.

Velasquez

Curado

and Monteiro

, A comparative analysis of simulators for the cloud to fog continuum, Simulation Modelling Practice and Theory 101 (2020), 102029.

25.

Mansouri

Ghafari

and Zade

B.M.

, Cloud computing simulators: A comprehensive review, Simulation Modelling Practice and Theory 104 (2020), 102144.

26.

Buyya

Ranjan

and Calheiros

R.N.

, Modeling and simulation of scalable Cloud computing environments and the CloudSim toolkit: Challenges and opportunities, in: 2009 International Conference on High Performance Computing & Simulation, Kochi, India, 2009, pp. 1–11.

27.

Kliazovich

Bouvry

and Khan

S.U.

, GreenCloud: A packet-level simulator of energy-aware cloud computing data centers, The Journal of Supercomputing 62(3) (2012), 1263–1283.

28.

Wickremasinghe

Calheiros

R.N.

and Buyya

, Cloudanalyst: A cloudsim-based visual modeller for analysing cloud computing environments and applications, in: 2010 24th IEEE International Conference on Advanced Information Networking and Applications, Perth, Australia, 2010, pp. 446–452.

29.

Chen

and Deelman

, Workflowsim: A toolkit for simulating scientific workflows in distributed environments, in: 2012 IEEE 8th International Conference on E-science, Chicago, USA, 2012, pp. 1–8.

30.

Mohamed

and Hong

, Hadoop-MapReduce job scheduling algorithms survey, in: 2016 7th International Conference on Cloud Computing and Big Data (CCBD), Macau, China, 2016, pp. 237–242.

31.

Kim

H.D.

and Kim

J.S.

, An online scheduling algorithm for grid computing systems, in: International Conference on Grid and Cooperative Computing, Shanghai, China, 2003, pp. 34–39.

32.

Mehdi

N.A.

Mamat

Amer

and Abdul-Mehdi

Z.T.

, Minimum completion time for power-aware scheduling in cloud computing, in: 2011 Developments in E-systems Engineering, Dubai, UAE, 2011, pp. 484–489.

33.

Madni

S.H.

Abd Latiff

M.S.

Abdullahi

Abdulhamid

S.I.

and Usman

M.J.

, Performance comparison of heuristic algorithms for task scheduling in IaaS cloud computing environment, PloS One 12(5) (2017), e0176321.

34.

Han

Chen

and Du

, Minimizing monetary costs for deadline constrained workflows in cloud environments, IEEE Access 8 (2020), 25060–25074.

35.

and Anyanwu

, Scheduling hadoop jobs to meet deadlines, in: 2010 IEEE Second International Conference on Cloud Computing Technology and Science, Indianapolis, Indiana USA, 2010, pp. 388–392.

36.

Dubey

and Sharma

S.C.

, A novel multi-objective CR-PSO task scheduling algorithm with deadline constraint in cloud computing, Sustainable Computing: Informatics and Systems 32 (2021), 100605.

37.

Runsungnoen

and Anusas-amornkul

, Round Robin scheduling based on remaining time and median (RR_RT&M) for cloud computing, in: Smart Trends in Computing and Communications; Springer, Singapore, 2020, pp. 21–29.

38.

Venkataraman

, Threshold based multi-objective memetic optimized round robin scheduling for resource efficient load balancing in cloud, Mobile Networks and Applications 24(4) (2019), 1214–1225.

39.

Patel

Mehta

and Bhoi

, Enhanced load balanced min-min algorithm for static meta task scheduling in cloud computing, Procedia Computer Science 57 (2015), 545–553.

40.

Derakhshan

and Bateni

, Optimization of tasks in cloud computing based on MAX-MIN, MIN-MIN and priority, in: 2018 4th International Conference on Web Research (ICWR), Tehran, Iran, 2018, pp. 45–50.

41.

Liu

and Xu

, An improved min-min algorithm in cloud computing, in: Proceedings of the 2012 International Conference of Modern Computer Science and Applications, Wuhan, China, 2013, pp. 47–52.

42.

Devipriya

and Ramesh

, Improved max-min heuristic model for task scheduling in cloud, in: 2013 International Conference on Green Computing, Communication and Conservation of Energy (ICGCE), Tamil Nadu, India, 2013, pp. 883–888.

43.

Hung

T.C.

Hieu

L.N.

P.T.

and Phi

N.X.

, MMSIA: improved max-min scheduling algorithm for load balancing on cloud computing, in: Proceedings of the 3rd International Conference on Machine Learning and Soft Computing, Da Lat Vietnam, 2019, pp. 60–64.

44.

Parsa

and Entezari-Maleki

, RASA: a new grid task scheduling algorithm, International Journal of Digital Content Technology and its Applications 3(4) (2009), 91–99.

45.

Shao

Liu

Pan

Yang

and Niu

, A market-oriented heuristic algorithm for scheduling parallel applications in big data service platform, in: 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC), Tokyo, Japan, Vol. 1, 2018, pp. 677–686.

46.

Doulamis

N.D.

Doulamis

A.D.

Varvarigos

E.A.

and Varvarigou

T.A.

, Fair scheduling algorithms in grids, IEEE Transactions on Parallel and Distributed Systems 18(11) (2007), 1630–1648.

47.

Choudhary

and Peddoju

S.K.

, A dynamic optimization algorithm for task scheduling in cloud environment, International Journal of Engineering Research and Applications (IJERA) 2(3) (2012), 2564–2568.

48.

Ghanbari

and Othman

, A priority based job scheduling algorithm in cloud computing, Procedia Engineering 50 (2012), 778–785.

49.

Dordaie

and Navimipourm

N.J.

, A hybrid particle swarm optimization and hill climbing algorithm for task scheduling in the cloud environments, ICT Express 4(4) (2018), 199–202.

50.

Tsai

c.w.

and Rodrigues

j.j.

, Metaheuristic scheduling for cloud: A survey, IEEE Systems Journal 8(1) (2013), 279–291.

51.

Zhang

Wang

Sun

and Liao

, Multi-task deep reinforcement learning for scalable parallel task scheduling, USA, in: 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 2019, pp. 2992–3001.

52.

Verma

and Kaushal

, Bi-criteria priority based particle swarm optimization workflow scheduling algorithm for cloud, in: 2014 Recent Advances in Engineering and Computational Sciences (RAECS), Chandigarh, India, 2014, pp. 1–6.

53.

Chen

and Long

, Task scheduling of cloud computing using integrated particle swarm algorithm and ant colony algorithm, Cluster Computing 22(2) (2019), 2761–2769.

54.

Masdari

Salehi

Jalali

and Bidaki

, A survey of PSO-based scheduling algorithms in cloud computing, Journal of Network and Systems Management 25(1) (2017), 122–158.

55.

Wang

Z.J.

Zhan

Z.H.

W.J.

Lin

Zhang

T.L.

and Zhang

, Dynamic group learning distributed particle swarm optimization for large-scale optimization and its application in cloud workflow scheduling, IEEE Transactions on Cybernetics 50(6) (2019), 2715–2729.

56.

Mansouri

Zade

B.M.

and Javidi

M.M.

, Hybrid task scheduling strategy for cloud computing by modified particle swarm optimization and fuzzy theory, in: Computers & Industrial Engineering 130 (2019), 597–633.

57.

Pradhan

Bisoy

S.K.

and Das

, A survey on PSO based meta-heuristic scheduling mechanism in cloud computing environment, Journal of King Saud University-Computer and Information Sciences, 2021. doi: 10.1016/j.jksuci.2021.01.003.

58.

Cui

and Xiaoqing

, Workflow tasks scheduling optimization based on genetic algorithm in clouds, in: 2018 IEEE 3rd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), Chengdu, China, 2018, pp. 6–10.

59.

Pang

Shan

and Wang

, An EDA-GA hybrid algorithm for multi-objective task scheduling in cloud computing, IEEE Access 7 (2019), 146379–146389.

60.

Saha

Pal

and Pattnaik

P.K.

, A novel scheduling algorithm for cloud computing environment, in: Computational Intelligence in Data Mining, Springer, Vol. 1, 2016, pp. 387–398.

61.

Kaur

and Verma

, An efficient approach to genetic algorithm for task scheduling in cloud computing environment, International Journal of Information Technology and Computer Science (IJITCS) 4(10) (2012), 74–79.

62.

Zhou

Zhu

Xie

Abawajy

J.H.

and Chowdhury

M.U.

, An improved genetic algorithm using greedy strategy toward task scheduling optimization in cloud environments, Neural Computing and Applications 32(6) (2020), 1531–1541.

63.

Kaur

Chhabra

and Singh

, Heuristics based genetic algorithm for scheduling static tasks in homogeneous parallel system, International Journal of Computer Science and Security (IJCSS) 4(2) (2010), 183–98.

64.

Tanha

Hosseini Shirvani

and Rahmani

A.M.

, A hybrid meta-heuristic task scheduling algorithm based on genetic and thermodynamic simulated annealing algorithms in cloud computing environments, Neural Computing and Applications 33(24) (2021), 16951–16984.

65.

Shooli

R.G.

and Javidi

M.M.

, Using gravitational search algorithm enhanced by fuzzy for resource allocation in cloud computing environments, SN Applied Sciences 2(2) (2020), 1–13.

66.

Shojafar

Javanmardi

Abolfazli

and Cordeschi

, FUGE: A joint meta-heuristic approach to cloud job scheduling algorithm using fuzzy theory and a genetic method, Cluster Computing 18(2) (2015), 829–844.

67.

Sandana Karuppan

Meena Kumari

S.A.

and Sruthi

, A Priority-Based Max-Min Scheduling Algorithm for Cloud Environment Using Fuzzy Approach, in: International Conference on Computer Networks and Communication Technologies; Springer, Singapore, 2019, pp. 819–828.

68.

Rashedi

Nezamabadi-Pour

and Saryazdi

, GSA: A gravitational search algorithm, Information Sciences 179(13) (2009), 2232–2248.

69.

Choudhary

Gupta

Singh

and Jana

P.K.

, A GSA based hybrid algorithm for bi-objective workflow scheduling in cloud computing, Future Generation Computer Systems 83 (2018), 14–26.

70.

Chaudhary

and Kumar

, Cost optimized hybrid genetic-gravitational search algorithm for load scheduling in cloud computing, Applied Soft Computing 83 (2019), 105627.

71.

Thakur

A.S.

Biswas

and Kuila

, Binary quantum-inspired gravitational search algorithm-based multi-criteria scheduling for multi-processor computing systems, The Journal of Supercomputing 77(1) (2021), 796–817.

72.

Agarwal

and Srivastava

G.M.

, A cuckoo search algorithm-based task scheduling in cloud computing, China, in: Advances in Computer and Computational Sciences; Springer: Singapore, 2018, pp. 293–299.

73.

Abualigah

and Diabat

, A novel hybrid antlion optimization algorithm for multi-objective task scheduling problems in cloud computing environments, Cluster Computing 24(1) (2021), 205–223.

74.

Mirjalili

, The ant lion optimizer, Advances in Engineering Software 83 (2015), 80–98.

75.

Abdel-Basset

Abdle-Fatah

and Sangaiah

A.K.

, An improved Lévy based whale optimization algorithm for bandwidth-efficient virtual machine placement in cloud computing environment, Cluster Computing 22(4) (2019), 8319–8334.

76.

Sonangeri Pushpavati

U.K.

and D’Mello

D.A.

, A tree based mechanism for the load balancing of virtual machines in cloud environments, International Journal of Information Technology 13(3) (2021), 911–920.

77.

Natesan

and Chokkalingam

, Task scheduling in heterogeneous cloud environment using mean grey wolf optimization algorithm, ICT Express 5(2) (2019), 110–114.

78.

Aggarwal

Dimri

Agarwal

and Bhatt

, Self adaptive fruit fly algorithm for multiple workflow scheduling in cloud computing environment, Kybernetes 50(6) (2020), 1704–1730.

79.

Liu

Luo

X.G.

Zhang

X.M.

Zhang

and Li

B.N.

, Job scheduling model for cloud computing based on multi-objective genetic algorithm, International Journal of Computer Science Issues (IJCSI) 10(1) (2013), 134.

80.

Shaw

Howley

and Barrett

, An advanced reinforcement learning approach for energy-aware virtual machine consolidation in cloud data centers, in: 2017 12th International Conference for Internet Technology and Secured Transactions (ICITST), Cambridge, UK, 2017, pp. 61–66.

81.

Qin

Wang

and Zhai

, An energy-aware scheduling algorithm for budget-constrained scientific workflows based on multi-objective reinforcement learning, The Journal of Supercomputing 76(1) (2020), 455–480.

82.

Gabel

and Riedmiller

, Adaptive reactive job-shop scheduling with reinforcement learning agents, International Journal of Information Technology and Intelligent Computing 24(4) (2008), 14–18.

83.

Zhang

and Liu

, A novel multi-agent reinforcement learning approach for job scheduling in grid computing, Future Generation Computer Systems 27(5) (2011), 430–439.

84.

Ding

Fan

Zhao

Kang

Yin

and Zeng

, Q-learning based dynamic task scheduling for energy-efficient cloud computing, Future Generation Computer Systems 108 (2020), 361–371.

85.

Garí

Monge

D.A.

Pacini

Mateos

and Garino

C.G.

, Reinforcement learning-based application autoscaling in the cloud: A survey, Engineering Applications of Artificial Intelligence 102 (2021), 104288.

86.

Zhang

Yao

and Guan

, Intelligent cloud resource management with deep reinforcement learning, IEEE Cloud Computing 4(6) (2017), 60–69.

87.

Tong

Chen

Deng

and Li

, A scheduling scheme in the cloud computing environment using deep Q-learning, Information Sciences 512 (2020), 1170–1191.

88.

Zhang

Wang

Sun

and Liao

, Multi-task deep reinforcement learning for scalable parallel task scheduling, in: 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 2019, pp. 2992–3001.

89.

Cheng

and Nazarian

, DRL-cloud: Deep reinforcement learning-based resource provisioning and task scheduling for cloud service providers, in: 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), Jeju, South Korea, 2018, pp. 129–134.

90.

Wei

Pan

Liu

and Meng

, DRL-scheduling: An intelligent QoS-aware job scheduling framework for applications in clouds, IEEE Access 6 (2018), 55112–55125.

91.

and Hu

, Deepjs: Job scheduling based on deep reinforcement learning in cloud data center, in: Proceedings of the 2019 4th International Conference on Big Data and Computing, Guangzhou, China, 2019, pp. 48–53.

92.

Kaur

Singh

Singh Batth

and Peng Lim

, Deep-Q learning-based heterogeneous earliest finish time scheduling algorithm for scientific workflows in cloud, Software: Practice and Experience 52(3)(2022), 689–709.

93.

Wang

Liu

and Lin

, Energy-efficient VM scheduling based on deep reinforcement learning, Future Generation Computer Systems 125 (2021), 616–628.

94.

Rjoub

Bentahar

Wahab

O.A.

and Bataineh

, Deep smart scheduling: A deep learning approach for automated big data scheduling over the cloud, in: 2019 7th International Conference on Future Internet of Things and Cloud (FiCloud), Istanbul, Turkey, 2019, pp. 189–196.

95.

Song

Zhou

Wang

and Du

, Host load prediction with long short-term memory in cloud computing, The Journal of Supercomputing 74(12) (2018), 6554–6568.

96.

Janardhanan

and Barrett

, CPU workload forecasting of machines in data centers using LSTM recurrent neural networks and ARIMA models, in: 2017 12th International Conference for Internet Technology and Secured Transactions (ICITST), Cambridge, United Kingdom, 2017, pp. 55–60.

97.

Gao

Wang

and Shen

, Task failure prediction in cloud data centers using deep learning, IEEE Transactions on Services Computing 15(3) (2020), 1411–1422.

98.

Masdari

and Khoshnevis

, A survey and classification of the workload forecasting methods in cloud computing, Cluster Computing 23(4) (2020), 2399–2424.

99.

Zhang

Yang

L.T.

Yan

Chen

and Li

, An efficient deep learning model to predict cloud workload for industry informatics, IEEE Transactions on Industrial Informatics 14(7) (2018), 3170–3178.

100.

Shen

, Virtual resource scheduling prediction based on a support vector machine in cloud computing, in: 2015 8th International Symposium on Computational Intelligence and Design (ISCID), Zhejiang, China, Vol. 1, 2015, pp. 110–113.

101.

Verma

and Sharma

, Workload prediction model based on supervised learning for energy efficiency in cloud, in: 2017 2nd International Conference on Communication Systems, Computing and IT Applications (CSCITA), Mumbai, India, 2017, pp. 66–71.

102.

Jia

Z.H.

Pan

Liu

and Li

X.J.

, A novel cloud workflow scheduling algorithm based on stable matching game theory, The Journal of Supercomputing 77(10) (2021), 11597–11624.

103.

Shamshirband

Joloudari

J.H.

Shirkharkolaie

S.K.

Mojrian

Rahmani

Mostafavi

and Mansor

, Game theory and evolutionary optimization approaches applied to resource allocation problems in computing environments: A survey, Mathematical Biosciences and Engineering 18(6) (2021), 9190–9232.

104.

Mahmoud

Thabet

Khafagy

M.H.

and Omara

F.A.

, An efficient load balancing technique for task scheduling in heterogeneous cloud environment, Cluster Computing 24(4) (2021), 3405–3419.

105.

Sohrabi

M.K.

and Azgomi

, A survey on the combined use of optimization methods and game theory, Archives of Computational Methods in Engineering 27(1) (2020), 59–80.

106.

Ananth

and Sekaran

K.C.

, Game theoretic approaches for job scheduling in cloud computing: A survey, in: 2014 International Conference on Computer and Communication Technology (ICCCT), Allahabad, India, 2014, pp. 79–85.

107.

Song

Meng

and Lin

, A game theory based mapreduce scheduling algorithm, in: Emerging Technologies for Information Systems, Computing, and Management; Springer, New York, 2013, pp. 287–296.

108.

Anselmi

Ardagna

and Passacantando

, Generalized nash equilibria for saas/paas clouds, European Journal of Operational Research 236(1) (2014), 326–339.

109.

Patra

M.K.

Sahoo

and Turuk

A.K.

, Game theoretic approach for real-time task scheduling in cloud computing environment, in: 2019 International Conference on Information Technology (ICIT), Bhubaneswar, India, 2019, pp. 454–459.

110.

Yang

Jiang

and Choo

K.K.

, A task scheduling algorithm considering game theory designed for energy management in cloud computing, Future Generation computer systems 105 (2020), 985–992.

111.

and Chen

, Non-cooperative games on multidimensional resource allocation, Future Generation Computer Systems 29(6) (2013), 1345–1352.

112.

Baranwal

and Vidyarthi

D.P.

, Admission control in cloud computing using game theory, The Journal of Supercomputing 72(1) (2016), 317–346.

113.

Akbar

Malik

S.U.

Choo

K.K.

Khan

S.U.

Ahmad

and Anjum

, A game-based thermal-aware resource allocation strategy for data centers, IEEE Transactions on Cloud Computing 9(3) (2019), 845–853.

114.

Wen

and Sun

, A game theory based resource sharing scheme in cloud computing environment, in: 2012 World Congress on Information and Communication Technologies, Trivandrum, India, 2012, pp. 1097–1102.

115.

Yang

Chen

Wang

and Li

, Stackelberg game approach for energy-aware resource allocation in data centers, IEEE Transactions on Parallel and Distributed systems 27(12) (2016), 3646–3658.

116.

Zhou

Jiang

and Huang

G.Q.

, A game-theory approach for job scheduling in networked manufacturing, The International Journal of Advanced Manufacturing Technology 41(9) (2009), 972–985.

117.

Qaiser

H.u.R.

and Shu

, Nash equilibrium based replacement of virtual machines for efficient utilization of cloud data centers, Computing 102 (2020), 2521–2540.

118.

Abdeyazdan

Parsa

and Rahmani

A.M.

, Task graph pre-scheduling, using Nash equilibrium in game theory, The Journal of Supercomputing 64(1) (2013), 177–203.

119.

Yuan

Min

Yang

L.T.

Ding

and Fang

, A game theory-based dynamic resource allocation strategy in geo-distributed datacenter clouds, Future Generation Computer Systems 76 (2017), 63–72.

120.

Zhang

Wang

Liu

and Qian

, Game theory based real-time shop floor scheduling strategy and method for cloud manufacturing, International Journal of Intelligent Systems 32(4) (2017), 437–463.

121.

Pillai

P.S.

and Rao

, Resource allocation in cloud computing using the uncertainty principle of game theory, IEEE Systems Journal 10(2) (2014), 637–648.

122.

Dean

and Ghemawat

, MapReduce: A flexible data processing tool, Communications of the ACM 53(1) (2010), 72–77.

123.

Hashem

I.A.T.

Anuar

N.B.

Marjani

Ahmed

Chiroma

Firdaus

Abdullah

M.T.

Alotaibi

Ali

W.K.

Yaqoob

and Gani

, MapReduce scheduling algorithms: A review, The Journal of Supercomputing 76(7) (2020), 4915–4945.

124.

Dean

and Ghemawat

, MapReduce: Simplified data processing on large clusters, Communications of the ACM 51(1) (2008), 107–113.

125.

Vaidya

, Survey of parallel data processing in context with MapReduce, Computer Science and Information Technology 3 (2011), 69–80.

126.

Kalia

and Gupta

, Analysis of hadoop MapReduce scheduling in heterogeneous environment, Ain Shams Engineering Journal 12(1) (2021), 1101–1110.

127.

Pandey

and Saini

, A heuristic method towards deadline-aware energy-efficient mapreduce scheduling problem in Hadoop YARN, Cluster Computing 24(2) (2021), 683–699.

128.

Singh

Sharma

Jeyaraj

and Paul

, Handling Non-Local Executions to Improve MapReduce Performance Using Ant Colony Optimization, IEEE Access 9 (2021), 96176–96188.

129.

Dai

and Ji

, A mapreduce implementation of C4.5 decision tree algorithm, International Journal of Database Theory and Application 7(1) (2014), 49–60.

130.

Moise

Trieu

T.T.

Bougé

and Antoniu

, Optimizing intermediate data management in MapReduce computations, in: 2011 Proceedings of the First International Workshop on Cloud Computing Platforms, Salzburg Austria, Vol. 10, 2011, pp. 1–7.

131.

Fontugne

Mazel

and Fukuda

, Hashdoop. A MapReduce framework for network anomaly detection, in: 2014 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Toronto, Canada, 2014, pp. 494–499.

A survey on cloud computing scheduling algorithms

Abstract

Keywords

1. Introduction

2. Background

2.1 Problem description

3. Deterministic algorithms

Table 2 Deterministic algorithms

3.2 Task features aware algorithms

3.3 User demand aware algorithms

4. Metaheuristic algorithms

4.2 Multi-agents, nature-inspired algorithms

4.3 Using fuzzy logic and fuzzy system to improve metaheuristic algorithms

5. Learning algorithms

Table 4 Learning algorithms

5.2 Reinforcement learning

5.3 Deep learning

6. Algorithms based on game theory

7. MapReduce review as an example

7.1 Hadoop

7.2 Algorithms based on MapReduce

Table 5 Comparison between four categories of scheduling algorithms

8. Conclusion

Footnotes

Author’s Bios

References

Table 2
Deterministic algorithms

Table 4
Learning algorithms

Table 5
Comparison between four categories of scheduling algorithms