Distribution slack allocation algorithm for energy aware task scheduling in cloud datacenters

Abstract

Regarding the ever-increasing development of data and computational centers due to the contribution of high-performance computing systems in such sectors, energy consumption has always been of great importance due to CO₂ emissions that can result in adverse effects on the environment. In recent years, the notions such as “energy” and also “Green Computing” have played crucial roles when scheduling parallel tasks in datacenters. The duplication and clustering strategies, as well as Dynamic Voltage and Frequency Scaling (DVFS) techniques, have focused on the reduction of the energy consumption and the optimization of the performance parameters. Concerning scheduling Directed Acyclic Graph (DAG) of a datacenter processors equipped with the technique of DVFS, this paper proposes an energy- and time-aware algorithm based on dual-phase scheduling, called EATSDCDD, to apply the combination of the strategies for duplication and clustering along with the distribution of slack-time among the tasks of a cluster. DVFS and control procedures in the proposed green system are mapped into Petri net-based models, which contribute to designing a multiple decision process. In the first phase, we use an intelligent combined approach of the duplication and clustering strategies to run the immediate tasks of DAG along with monitoring the throughput by concentrating on the reduction of makespan and the energy consumed in the processors. The main idea of the proposed algorithm involves the achievement of a maximum reduction in energy consumption in the second phase. To this end, the slack time was distributed among non-critical dependent tasks. Additionally, we cover the issues of negotiation between consumers and service providers at the rate of μ based on Green Service Level Agreement (GSLA) to achieve a higher saving of the energy. Eventually, a set of data established for conducting the examinations and also different parameters of the constructed random DAG are assessed to examine the efficiency of our proposed algorithm. The obtained results confirms that our algorithm outperforms compared the other algorithms considered in this study.

Keywords

Green service level agreement hroughput dynamic voltage and frequency scaling energy-aware scheduling slack-time distribution petri nets

1 Introduction

The growth of applications requiring a high amount of energy has urged the academic community to investigate them concerning several issues such as economic aspects, environmental challenges, low reliability and availability [1]. Estimations indicate that the total power consumption in the information and communication technology (ICT) industry is over 868 billion kWh, which accounts for 5.3% of the total electricity used across the globe in 2008 [2]. Due to the ever-increasing technological advancement, it is expected that the total power demand will increase up to four times by 2025. Carbon emissions related to the energy consumption may potentially increase the cooling cost. Any boosts in the usage of efficient computing systems will end in a very hot temperature caused by the increased generated power for fueling them. It has been proven that some failures can be observed up to two times with the heat rising of 10°C; this affects the systems’ reliability and availability, which may result in severe damages [3].

Reliability refers to the ability of a system to perform its intended function successfully, for a specified period of time, under predetermined conditions. Accordingly, the power consumption for running and scheduling parallel tasks has heightened concerns in the last decade. Two specific approaches to energy conservation are dynamic power management (DPM) [4] and dynamic voltage frequency scaling (DVFS) [5]. The former usually aims to deactivate idle nodes due to their long durations. The DPM approach shows no participation in short idle durations events due to a quick runtime between each task. On the other hand, in the DVFS approach, due to the reduction of power consumption, the tasks receive a specified frequency. Due to the power consumption and other effective parameters, scheduling parallel tasks with limited prioritizing, i.e., directed acyclic graph, is of high concern in both homogenous and heterogeneous computing environments (e.g., cloud data centers). This issue raises one of the hard optimization problems [6] for which many heuristic algorithms have been put forward in the literature. Duplication and clustering strategies, as well as DVFS, have been found capable of reducing energy consumption. They focus on the optimization of performance parameters, including throughput and makespan. The propagation of several specific tasks and running the cluster ones occur on multi-processors to reduce the communication overheads among the processors equipped with DVFS.

The main contributions of this paper are summarized as follows: (1) We develop formal models for parallel tasks and an energy aware cloud and we also define the task scheduling issue, (2) We develop a novel energy-aware scheduling heuristics algorithm for parallel tasks called EATSDCDD, which can be adapted for a wide range of DAG applications, (3) We present the green SLA use scenarios and propose a new scheduling heuristics for energy aware parallel task scheduling, which makes a study of the tradeoff between the energy consumption and task execution time, (4) Unlike the most existing studies for DAG applications, we try to allocate a slack to tasks belonging to the maximum independent set of a task.

The remainder of this paper is organized as follows. Section 2 introduces background and related work. Section 3 introduces the system model and formulates the estimation scheduling problem model and Petri Net model. In Section 4, the proposed EATSDCDD algorithm is described and its performance is evaluated. Section 5 gives the Tracing algorithms on example DAG. Section 6 gives the experimental setup and simulation results comparing them with those of other algorithms, in both homogeneous and heterogeneous environments. Section 7 presents the Green Service Level Agreement. Finally, Section 8 concludes the whole paper.

2 Related work

The recent advancement in ICT has resulted in continuous growth of cloud data centers and computing centers, hence increasing the energy consumption and, consequently, negative impact on the environment through generation of greenhouse gases and excessive emission of CO₂ [7]. In recent years, great efforts have been made focusing on developing parallel task scheduling algorithms for distributed platforms such as clusters, girds, and clouds. These studies have achieved good results in energy and performance improvements using List scheduling [8], DVFS techniques [1, 9-11], Clustering-based scheduling [1, 8], Heuristic algorithms [12-18] and Duplication-based scheduling [1]. This section gives a brief review of various existing task scheduling algorithms that mainly consider the improvement of energy efficiency and optimization of the performance parameters in distributed computing.

2.1 Energy reduction via dynamic voltage frequency scaling

The DVFS technique is one of the most efficient ways of providing significant energy saving for processors through simultaneous minimization of frequency and supply voltage for slack time slot of tasks as well as communication and idle phases [19, 20].

Wang et al. [8] employed an energy-aware scheduling heuristic algorithm called PALS and PATC to simultaneously reduce makespan and energy consumption for scheduling parallel tasks in a cluster using the DVFS technique. After determining the critical and non-critical paths, the algorithm assigns jobs in the critical paths to processors with the highest voltage/frequency. Then, the slack time of each job is calculated in the non-critical paths, and the voltage/frequency of the assigned processors is scaled down to process the non-critical jobs. This strategy mitigates energy consumption without increasing makespan.

Another approach to scheduling tasks was proposed by Wu et al. [21] to reduce energy consumption using the DVFS technique. This technique was adopted to dynamically control the frequency and voltage of cloud computing servers. The scheduling algorithm takes into account the maximum job (F_max) and minimum job (F_min) frequencies given to each job and multiple server S_i running at maximum S_i (F_max) and minimum Si (F_min) frequencies. For specific jobs, the scheduling algorithm efficiently assigns proper servers that run between (F_min, F_max) to jobs based on the requirements of job frequencies.

Juarez et al. [22] proposed a real-time dynamic scheduling method, called Multi-heuristic Resource Allocation (MHRA), for efficient execution of task-based applications on a distributed computing platform of cloud computing. This served to mitigate energy consumption and makespan. This method involved a polynomial time algorithm combining a set of heuristic rules and resource allocation techniques. To balance the two-objective function, a weight factor, α, was introduced 0 ≤ a ≤ 1 by which the user can specify the significance of each objective.

Hu et al. [3] developed the EASLA algorithm to schedule parallel applications through the DVFS technique, while maintaining the SLA on a cluster platform. The main idea behind the EASLA algorithm is to allocate each slack to a maximum set of independent tasks for each task using a compatible task matrix and scale frequencies down in order to minimize energy consumption within certain extension rate of makespan mutually accepted by user and service provider. Some researchers [23-25] proposed new task scheduling in distributed computing systems using the DVFS, clustering and duplication technique. Moreover, some authors [26-28] proposed new task slack time algorithm for task scheduling in distributed computing systems using the DVFS technique. Vydyanathan [29] proposed a multi objective task scheduling algorithm using the clustering and duplication technique.

2.2 Energy reduction via duplication, clustering, and list scheduling

A list scheduling algorithm maintains a list of all tasks of a given DAG according to their priorities. List scheduling has two phases: selecting the tasks with the highest priorities and selecting suitable processors. Heterogeneous Earliest Finish Time (HEFT) [16] is a well-known list-scheduling algorithm. HEFT operates in two phases. Prioritizing tasks and assigning tasks to processors. Mohamed et al. [30] proposed a modified non-square direct matrix converters (MC) for renewable microgrid applications using generative adversarial networks (GAN). They also provided a new application of the non-square direct MC in the microgrid system, which was able to provide a balanced output with any desired amplitude and frequency under unbalanced conditions, specifically in the case of using renewable energy sources such as photovoltaics (PVs) and wind turbines.

Clustering heuristics lower considerably the communication costs of data transferring to almost zero by propagating the tasks to the specified processors. Furthermore, task clustering helps to schedule based on the reduction of the active power of processors and power consumption [8]. Yang and Apostolos [31] proposed the DSC (Dominant Sequence Clustering) algorithm for Critical path task scheduling in homogeneous distributed computing systems using EST.

Mohamed et al. [32] proposed an energy management scheme considering five different agents, including an energy hub, a networked multi-microgrid with three agents, and a transportation system. Additionally, they developed an effective transportation system model within the smart island to save energy and minimize the operation costs.

The current increase in the chip density and improper reduction of CPU voltage in cloud data centers will certainly increase the frequency of errors during the execution of the workflow. As a result, this problem will occur in the timely completion of workflow programs, which raises serious concerns about the use and maintenance of data in cloud data centers. In this regard, Wu et al. [33] an error-conscious energy-saving work planning approach for workflow applications in DVFS-enabled cloud data centers. This approach resulted more than 30% energy savings while providing both reliability and completion time.

Li et al. [34] use reverse fault recovery, running a program with uniform frequency or high and low of target frequency, in the absence of desired frequency. This not only consumed less energy, but also led to the highest capability. They compared the strategy of running the program in real time with those proposed in recently-published articles. Experimental results proved that their proposed method can save up to 12% more energy than other methods without compromising the reliability of the program and the implementation deadlineb.

Safari and Khorsand [35] proposed an energy-aware scheduling algorithm for time-limited workflows by means of the DVFS method, in which the host reduces the operating frequency using different voltage levels. The purpose was to reduce energy consumption and customer service contract (SLA) violations and to improve resource utilization. The simulation results showed that their proposed method performs better when evaluating criteria such as energy use, average execution time, use of average resources, and relative breach of customer service contract.

Today by minimizing the energy of applications with a circular graphical structure in a heterogeneous cloud system, energy consumption can be saved. In this regard, Safari and Khoshnod [46] combined a power-based list-based programming algorithm with dynamic voltage and frequency-scaling technique (DVFS) for real-time work (PL-DVFS) in a way to maintain service quality by setting deadlines. The purpose was to improve performance and reduce overall energy consumption, including the3 CPU energy (busy and idle) and communication energy [36]. DVFS is becoming an industry standard due to its location in the CPU hardware; proper software approaches are essential for calibration. Hence, Amulu and Ramraj [37] proposed an approach to minimize the right time and cost, increase user satisfaction, and reduce energy consumption.

In multi-core systems, depending on the computational demand of an application, the cores either operate separately at different voltage and frequency levels or are placed on several voltage frequency islands (VFI) to reduce the system power consumption. In this regard, Hajiamini et al. [38] addressed the issue of VFI scheduling and partitioning with the aim of optimizing the execution time of a set of tasks for a given amount of energye.

Nejat et al. [39] considered a QOS-based synchronized resource management (RMA) algorithm for the first time in a multi-core system, which dynamically adjusts the size of the end-level cash partitions and the voltage frequency settings of each core. It was done in order to save energy while complying with QoS. Moreover, each program run on multi-core processors during multi-program workflow by the configuration space within the LLC partition size range and the dynamic voltage-frequency scaling (DVFS) settings at low overhead at runtime.

Despite the implementation, design, modeling, computing, communication, and several architectural challenges to the Internet of Things (IOT), the use of the cloud IOT is synchronized with energy efficiency, which has not been considered by researchers yet. To fill this gap, Toor et al. [40] considered the concept of renewable energy in the cloud environment. In order to add authentication as well as a layer of security to this sensitive data, block chain offers an ideal solution for providing a central office for data usage. The deployment of block chain technology and the cross-platform nature makes it an ideal proposition for exploiting the full potential of IOT. Thus, they presented a compact and energy-conscious design for the cloud computing environment of the Internet. Moreover some researchers [41-43] increased the reliability by providing a solution to reduce energy consumption.

3 System model

This section suggests the formal models applicable to systems architecture, tasks dependency graph, DVFS, the energy consumption of processors and communications, makespan, and throughput considering some limitations and hypothesis for the problems formulation. Table 1 summarizes notations on the mathematical formulas.

Table 1
Definition of notations for problem formulation

Notation Definition Notation Definition

t _i The task number ith v _highj The highest voltage of processor jth

N The number of tasks (nodes) in DAG f _highj The highest frequency of processor jth

w _i The weight of task ith $p_{j} . f_{k}^{op}$ The processor jth operating frequency at level k

$t_{i}^{st}$ The start time of task ith ND _ t_i The number of duplication task ith

et (t_i, p_j) The execution time of task ith on processor jth CCR Communication to Computation Ratio

$t_{i}^{end}$ The end time of task ith p Power consumption

CPI The clock per instruction P _dynamic Dynamic Power consumption

succ (t_i) The set of successors of task ith P _static Static Power consumption

pred (t_i) The set of predecessors of task ith E Energy consumption

d _ij The independent between task ti and tj ec _ij Communication energy by edge d_ij

ct (d_ij) The communication time to transfer message dij PC Power of interconnect

et (C_i) The execution time of cluster C_i C _max Makespan (total length of the schedule)

|j| The number of computational nodes CPL Critical Path Length

|k| The number of VM in each computing node CommR (G) Communication Rate for DAG

|m| The number of processors in each VM CompR(G) computation Rate for DAG

(v_j, f_j) The voltage and frequency pairs of processor jth Th (G) Throughout for DAG

(v_kj, f_kj) The voltage/frequency pairs of processor jth at level k μ Makespan extension rate based on agreement

Notation	Definition	Notation	Definition
t _i	The task number ith	v _highj	The highest voltage of processor jth
N	The number of tasks (nodes) in DAG	f _highj	The highest frequency of processor jth
w _i	The weight of task ith	$p_{j} . f_{k}^{op}$	The processor jth operating frequency at level k
$t_{i}^{st}$	The start time of task ith	ND _ t_i	The number of duplication task ith
et (t_i, p_j)	The execution time of task ith on processor jth	CCR	Communication to Computation Ratio
$t_{i}^{end}$	The end time of task ith	p	Power consumption
CPI	The clock per instruction	P _dynamic	Dynamic Power consumption
succ (t_i)	The set of successors of task ith	P _static	Static Power consumption
pred (t_i)	The set of predecessors of task ith	E	Energy consumption
d _ij	The independent between task ti and tj	ec _ij	Communication energy by edge d_ij
ct (d_ij)	The communication time to transfer message dij	PC	Power of interconnect
et (C_i)	The execution time of cluster C_i	C _max	Makespan (total length of the schedule)
\|j\|	The number of computational nodes	CPL	Critical Path Length
\|k\|	The number of VM in each computing node	CommR (G)	Communication Rate for DAG
\|m\|	The number of processors in each VM	CompR(G)	computation Rate for DAG
(v_j, f_j)	The voltage and frequency pairs of processor jth	Th (G)	Throughout for DAG
(v_kj, f_kj)	The voltage/frequency pairs of processor jth at level k	μ	Makespan extension rate based on agreement

3.1 Architecture model

In this study, a model with the three layers (i.e., User layer, COMP superscaler layer, and Cloud Resource layer) is proposed for the aim of scheduling parallel tasks on a cloud data center. To run their efficient applications including main codes, users need to employ the processors of a cloud data center.

3.2 Tasks dependency graph

The task dependency analyzer in COMPSs layers converts the user’s sequential program to DAGs. A directed acyclic graph (or DAG) is a digraph that has no cycles. A rooted tree is a special kind of DAG and a DAG is a special kind of directed graph [44]. The formed DAGs, called tasks dependency graph, are represented as G(T,D), where:

• T: It consists of a set of tasks in G, as given in (1). All tasks ∀t_i ∈ T are the components of the application code (nodes in a DAG): $T = \cup {t_{i}}, 1 ⩽ i ⩽ n$ (1) where t_i is a task ith in DAG. n is the total number of tasks. w_i is weight on task, t_i represents the instruction number of task t_i. $t_{i}^{st}$ is the start time of task t_i.

et (t_i, p_j) is the execution/computation time of task t_i on processor p_j. The task execution time t_i is calculated as formulation (2): $et (t_{i}, p_{j}) = \frac{t_{i} . wCPI}{p_{j} . f^{op}}$ (2)

CPI denotes the number of clock cycle per instruction for a task by processor. The ideal CPI is 1.

$t_{i}^{end}$ is the end/finish time of task t_i through the formulation (3): $(t_{i}^{end}, p_{j}) = (t_{i}^{st}, p_{j}) + et (t_{i}, p_{j})$ (3) • D: consists of a set of directed edges between the tasks in G to represent precedence constraints of an edge, d_ij ∈ D represents that task t_j is independent on task t_i, where t_j must be executed after the end of t_i, t_i is the parent and t_j is the child (or t_i is the predecessor of t_j and t_j is the success or of t_i).

ct (d_ij) is the communication time/cost of an edge d_ij, for transfer message d_ij. This time/cost is incurred if t_i and t_j are scheduled on different processors, while it is considered to be zero if t_i and t_j are scheduled on the same processor.

3.3 Model for dynamic voltage and frequency scaling

Processors currently utilize the DVFS technique to reduce the power consumption in a computing data center. It allows processors to operate using a discrete set of voltage and frequency pairs, which is (v_j, f_j). The system enables us to deal with the heavy load of tasks with a useful performance by adjusting the voltage and frequency at the same time. Due to a specifically-required increase in both the voltage and frequency or, on the other hand, a possible lowest rate in each one in terms of saving energy, Equation (4) can be formulated as follows: $\begin{matrix} (v_{j}, f_{j}) = {(v_{minj}, f_{maxj}) = (v_{1 j}, f_{1 j}) < (v_{2 j}, f_{2 j}) \\ < \dots < (v_{kj}, f_{kj}) = (v_{\max j}, f_{maxj})} \end{matrix}$ (4) where k indicates a total number of operating points in a processor, and (v_kj, f_kj) is either the voltage or the frequency of a processor j that exists at level k.

Furthermore, the execution time of task t_i on processor p_j with the set of working frequencies from f_1j to f_kj can be calculated through formulation (5). In fact, greater frequency level leads to a shorter makespan. ${et}^{'} (t_{i}, p_{j}) = [\frac{t_{i} . w \times CPI}{p_{j} . f_{1}^{op}}, \frac{t_{i} . w \times CPI}{p_{j} . f_{2}^{op}}, \dots, \frac{t_{i} . w \times CPI}{p_{j} . f_{k}^{op}}]$ (5)

3.4 Scheduling directed acyclic graph estimation model

In this section, we present the models for the performance evaluation of DAGs of different sizes and the determination of the energy consumption, makespan, and throughput.

3.4.1 Estimating makespan

Makespan (C_max) is defined as the amount of time (from start to end) required to complete a set of sequences. The aim of scheduling algorithms is the minimization of makespan. Equation (6) indicates how to obtain makespan. To calculate makespan, the largest task end time is subtracted from the smallest task start time. $C_{\max} = [\max (taski {. t}^{end}) - \min (taskj {. t}^{st})], 1 ⩽ i, j ⩽ n$ (6)

The critical path in a DAG constitutes the longest path from an entry to exit point of a node in a graph. In the case of dependency of tasks, directed edges of a parent task will pursuit the other tasks. It then should determine all possible paths. Against the others, the path that includes a longer duration will indicate an appropriate run time of the efficient application. Accordingly, the tasks will indeed stick to the defined execution time. Furthermore, it can be assumed that the tasks of the critical path may assign to the processing resources with high voltage and frequency (higher computational capability); on the other hand, the processing resources with lower voltage and frequency may choose to allocate to those that exist in the DAG G, but not the critical path. This will decrease the power computation in the processors of a data center. Accordingly, it can be expressed as follow: $C_{\max} (G) = CPL (G)$ (7) where makespan can be equal to the Critical Path Length (G).

3.4.2 Estimating throughput

This adopted model [25] works with the evaluation of throughput. Two rates of CommR(G) and CompR(G) must be taken into account in evaluating the throughput associated with a DAG G. The data elements are then consecutively processed regarding the fact that all the available tasks in a cluster, for instance C_i, are accomplished in only one processor. The rate of computation for such a cluster is equal to $\frac{1}{et (C_{i})}$ data items per time unit. Since the slowest cluster involves determining the rate of computation for DAG G, CompR(G) can be calculated by formulation (8) as follows: $CompR (G) = min_{\forall C_{i} \in C} \frac{1}{et (C_{i})}$ (8)

Likewise, the rate of communication for d_ij edges is equal to $\frac{1}{ct (d_{ij})}$ data items per time unit. Since the slowest communication rate of the edges in G involves determining the rate of communication for DAG G, CommR(G) can be calculated by formulation (9): $CommR (G) = min_{\forall d_{ij} \in D^{'}} \frac{1}{ct (d_{ij})}$ (9)

As processors are capable of handling both communication and computation at the same time, the overall throughput can be calculated by formulation (10): $Th (G) = min_{\forall d_{ij} \in D^{'}} {compR (G), commR (G)}$ (10)

3.4.3 Energy consumption model

Scheduling of dependency tasks in computational systems needs the consumption of energy equivalent to the total energy consumption of the processors in both tasks execution and data transmission to another processor in a communication network.

3.4.3.1 Energy for computation Nowadays, CMOS technology design is commonly used to build processors in which power is consumed in one way; the static or dynamic consumption of energy is obtained by formulation (11): $P = P_{dynamic} + P_{static}$ (11)

Static power consumption, i.e., the main source of static current, is leakage current and reverse based PN junction when there is no circuit activity, whereas dynamic power consumption involves charging and discharging of capacitances when inputs are active.

P_static can be eliminated from the equation, regarding the energy consumption for the execution of the parallel tasks based on the energy for computation in processors and the energy in communications among processors. Thus, formulation (12) is used to calculate the dynamic energy consumption in a processor: $P_{dynamic} = {ACv}^{2} f$ (12) where A indicates the percentages of active logic gates with the capacity of dynamic charging, C denotes a full load of a capacitor, v is voltage, and f is the frequency of the processor.

DVFS-based processors meet the maximization of power consumption (P_proc.highest) when the operation is carried out with the highest possible voltage (v_highest) and frequency (f_highest). Therefore, the active power consumption of a processor can be calculated based on a set of voltage and frequency (v_j, f_j) using formulation (13): $P_{procj} = P_{proc . highest} \times \frac{v_{j}^{2} \times f_{j}}{v_{highest}^{2} \times f_{highest}}$ (13) $P_{proc . highest} = {ACv}_{highest}^{2} f_{highest}$

Since the proposed algorithm adopts the task duplication strategy for scheduling a DAG with n tasks on DVFS-enabled processors, the total energy consumption can be calculated by formulation (14): $\begin{matrix} P_{processors . active} = \sum_{i = 1}^{n} P_{proc . highest} \\ (\sum_{j = 1}^{k} \frac{v_{j}^{2} \times f_{j}}{v_{highest}^{2} \times f_{highest}} + ND_t_{i}) \end{matrix}$ (14) $\begin{matrix} E_{processors . active} = \sum_{i = 1}^{n} P_{proc . highest} \\ (\sum_{j = 1}^{k} \frac{v_{j}^{2} \times f_{j}}{v_{highest}^{2} \times f_{highest}} \times et (t_{i}, p_{m} (v_{j}, f_{j}))) \\ + ND_t_{i} \times et (t_{i}, p_{m} (v_{highest}, f_{highest})) \end{matrix}$ where ND_ti is the number of duplication task ith, et (t_i, p_m (v_j, f_j)) is the execution time of task ith on processor mth with voltage and frequency pairs of processor jth, et (t_i, p_m (v_highest, f_highest)) denotes the execution time of task ith on processor mth with highest voltage and frequency pairs of processor jth.

Formulation (15) presents the obtaining of the energy consumption of the processors in idle durations, as follows: $\begin{matrix} E_{processors . idle} = {ACv}_{lowest}^{2} f_{lowest} (| m | \times makespan \\ - \sum_{i = 1}^{n} (\sum_{j = 1}^{k} et (t_{i}, p_{m} (v_{j}, f_{j}))) + ND_t_{i} \\ \times et (t_{i}, p_{m} (v_{highest}, f_{highest})) \end{matrix}$ (15) where m is the number of processors and makespan is the maximum time for completion of tasks by processors, also known as scheduling length.

Finally, the total energy consumed by processors to execute the task dependency graph can be obtained by the sum of (14) and (15) as follows: $E_{processors} = E_{processors . active} + E_{processors . idle}$ (16)

3.4.3.2 Energy for communications. Since the processors in each datacenter have been assumed to be homogeneous, the data transfer speed and power consumption are identical (see formulation (17)): ${ec}_{ij} = PC \times ct (d_{ij})$ (17) where ec_ij indicates the energy consumed for the communications of d_ij ∈ D, and PC denotes the power of interconnect value. Therefore, the total communication energy for the entire network can be calculated through formulation (18): $E_{Communications} = \sum_{i = 1}^{n} \sum_{v_{j} \in Succ (v_{i})} (x_{ij} \times {ec}_{ij})$ (18) where x_ij is expressed with formulation (19), as follows: $x_{ij} = {\begin{matrix} 0 & if (t_{i}^{end}, p_{m}) = (t_{j}^{st}, p_{m}) \\ 1 & O . W \end{matrix}$ (19)

3.4.3.3 Total energy for datacenter. This can be expressed by formulation (20) for a cloud data center that can be obtained through sum of (16) and (18), thus: $\begin{matrix} E_{Total} = E_{dynamic (processor . active)} \\ + E_{dynamic (processor . idle)} + E_{communication} \end{matrix}$ (20)

3.5 Problem formulation

The energy-performance tradeoff scheduling problem stated before is formally defined as follows. If an application consists of n parallel tasks, the required number of PEs p, the schedule length of the best effort schedule makespan_best, and the negotiated makespan extension rate μ, after an initial schedule, allocate the slacks to the appropriate tasks for downscaling their frequencies to try to minimize energy consumption. More formally, the problem can be formulated as follows:

min En = En_active + En_idle + En_com, C_max (G) , MaxTh_cur (G)

Subject To:

Makespan ⩽ Makespan_best × (1 + μ)

En_cur (G) ⩽ En_c (G) , Th_cur (G) ⩾ Th_c (G)

Where $X_{ij} \in {0, 1}, 0 ⩽ μ ⩽ 1, Σ β_{k}^{i} = 1, β_{k}^{i} \in {0, 1}$

The timing constraint is enforced by formulation (20). Since the available frequency range after an initial schedule is finite and discrete, we use boolean $β_{k}^{i}$ as an indicator to reflect which frequency level k is used for task i on the allocated PE l. Now, the question is how to assign different frequencies to each task to minimize energy consumption without violating the time constraint.

3.6 Petri Nets-based proposed model

Petri Nets (PNs) is a language for the modelling and validation of systems in which concurrency, communication, and synchronization play a major role. Petri Nets is a discrete-event modelling language with the functional programming language Standard ML and provides the foundation of the graphical notation such as token, place, arc, and transitions for concurrency, communication, and synchronization [45, 46].

Traditionally, Petri Nets are classified under two categories: low-level representation and high-level representation. The latter (e.g., Colored Petri nets (CPNs), Time Petri nets (TPNs), and Hierarchical Petri nets (HPNs)) is considered in applied objectives, regarding the possibility that allows for the compact modeling with quantitative capabilities [47]. In this section, the High-level Petri Nets provide a model of energy consumption in terms of an improvement for the rates of the throughput and makespan. Figures 1 and 2 show the optimized model of the proposed objectives, where the system states are modeled by places and the controlled and decidable operators are modeled by transitions.

Fig. 1

The Petri Nets-based model for energy consumption.

Fig. 2

Dynamic voltage and frequency scaling structured by Petri Nets.

In the following, the places and transitions of the proposed Petri Nets-based model of energy consumption are described in Tables 2 and 3, respectively.

Table 2

Descriptions for the places of the proposed petri nets-based model of energy consumption

Places	Name	Place description
P₁	Consumer	Users, clients and consumers are ready to send the sequential and efficient applications of them, as a request, for execution by the processors of a cloud data center.
P₂	Queue	The system holds the request in a queue.
P₃	Ready	The user’s request is ready to go to the task dependency section.
P₄	DAG	The sent sequential program of the user to COMPSs is then converted to a DAG by the task dependency analyzer.
P₅	Objective Function	This section highlights the objectives and constraints for both the green and function, for instance, the energy consumption and makespan and throughput, respectively.
P₆	File manager	This provides the information of the file as well as the management options for the transmission of it.
P₇	Job manager	It presents the information for the transmission responsibility besides the monitoring of the execution. The application sends the scheduling of the tasks. The required file is sent to the job manager by them.
P₈	Resource manager	This section gathers the necessary information associated with the introduction of the available resources.
P₉	Best sequence schedule	It sets appropriate scheduling based on three information principals including existing sources as well as their capacities, objective functions with the task constraints of user and data storage.
P₁₀	Tasks	The tasks on the directed acyclic graph are ready to carry out the clustering, duplication, and dynamic voltage and frequency scaling technique.
P₁₁	Set of Tasks Clustering &Duplication	The clustering and duplication techniques then train the tasks on the directed acyclic graph.
P₁₂	Output	The generated outputs by the second phase of the proposed algorithm are therefore applied in the first phase.

Table 3

Descriptions for the transitions of the proposed petri nets-based model of energy consumption

Transitions	Name	Transition Description
T₁	Submit	This allows the user to run their applications on the processors of a cloud data center by sending their files and the main code, using a personal computer or tablet through a web browser, for instance, Firefox.
T₂	Task Dependency	It is responsible for the specification of the dependency of the tasks by identifying all the related parameters; accordingly, it creates a DAG automatically.
T₃	Task Scheduler	This section provides the best scheduling of the tasks by using the other information sources.
T₄	Energy-Aware Allocation	This transition leads to the formation of the specified parameters to schedule the energy-aware tasks.
T₅	Clustering &Duplication	It applies the first algorithm, i.e., energy-aware tasks scheduling with duplication &clustering (EATSDC), over an input graph to create an output that acts as a transition input of DVFS.
T₆	DVFS	This phase of transition uses the latest output of the previous transition to adapt with voltage/frequency scaling algorithm.

4 Proposed method

This section presents energy aware task scheduling with combination of Duplication, Clustering, and DVFS slack allocation distribution algorithm, called EATSDCDD, in cloud datacenters, which is aimed at achieving high throughput and reducing the schedule length (makespan) and energy consumptions. Our proposed algorithm contains two main phases: EATSDCA and EADVFSDA. The former operates with a scheduling algorithm to achieve reduced energy consumption in communications and increase the throughput. On the other hand, the latter utilizes the DVFS technique on each processor to reduce the energy consumed for computing in processors and to run a DAG using the energy-aware dynamic and voltage frequency scaling (EADVFSD) algorithm. It will be discussed later in detail. The steps of the proposed method, which consists of three phases, are shown in Fig. 3.

Fig. 3

The proposed EATSDCDD flowchart.

4.1 Energy-aware task duplication clustering algorithm

The first phase presents the Energy-Aware Task Duplication-Clustering Algorithm (EATSDC) for parallel task scheduling. Our EATSDC attempts to satisfy makespan, throughput, and energy constraints using duplication and clustering strategy. Task clustering reduces makespan by zeroing edges of high communication time and proper adoption of the strategy. Task duplication decreases the communication overhead by reducing the allocation of certain tasks to multiple processors and, consequently, lowering energy consumption. The pseudo-code of EATSDC is shown in Algorithm 1 (Fig. 4).

Fig. 4

Algorithm 1.

Since the purpose of scheduling is the reduction of makespan, it can be carried out in a shortened term on the critical path using the b-level descending order (it offers prioritizing of a task). Inside each cluster, tasks are executed in the order of their b-level. b_level is a normal priority assignment for tasks. The b-level of a node is bounded from above by the length of a critical path (CP). b-level is calculated with Algorithm 2 (Fig. 5).

Fig. 5

Algorithm 2.

Algorithm 1 executes the clustering and duplication strategies on the input graph. Lines 3 and 4 require |T|, while Lines 5 to 8 require |D| operation times to calculate CompR(G) and CommR(G). Sorting in Line 9 can be done at time |D|log|D| based on quick sort. Lines 10 to 16 require 2×(|T|+|D|)² operations; the calculation of CP in Line 17 requires (|T|+|D|)² operations, and Lines 19 to 25 require |D|² operations to satisfy the energy consumption. Moreover, the calculation of b-level for all tasks in the integrated cluster requires (|T|+|D|+|D|) operations. As a result, the time complexity of EATDCA is equal to O(|D|×(|T|+|D|)²). While the complexity of the PATC algorithm is O(|E|×|E|Lg|E|+CT×A3).

4.2 Energy-Aware Dynamic Voltage/Frequency Scaling Distribution (EADVFSD) algorithm

After applying the duplication and clustering strategy to the input DAG, which leads to lower energy consumption and makespan and higher throughput, it is intended to mitigate the energy consumed by processors through determining the critical path and non-critical paths. In addition, the slack time of non-critical tasks is specified, and the voltage and frequency of processors allocated to processing of tasks in non-critical paths as well as idle and communication phases through scaling down DVFS are calculated. For this reason, it is essential to first explore the important parameters used in applying DVFS techniques to reduce energy consumption.

4.2.1 Calculation of slack time for tasks

Calculation of the Earliest Start Time (EST) is a top-down method, which starts with the first task and ends with the last one, calculated by formulation (21) starting from the entry task and terminating at the exit task. For the entry task, this parameter is 0: $\begin{matrix} EST (t_{i}) = {\begin{matrix} 0 & if pred (t_{i}) = φ \\ MAX (EFT (t_{i}), & O . W \\ MAX (EFT (t_{k})) + d_{ki})) \end{matrix} \\ t_{k} \in pred (t_{i}), d_{ij} \in D \end{matrix}$ (21)

After calculating EST, the Earliest Finish Time (EFT) can be calculated for task t_i by formulation (22), where et (t_i, p_m) is execution time of task ith on processor jth: $EFT (t_{i}) = EST (t_{i}) + et (t_{i}, p_{m})$ (22)

Moreover, the calculation of Last Finish Time (LFT) is a bottom-up method, which starts with the last task and ends with the first task, calculated by formulation (23); for the exit task, this parameter equals to its earliest completion time: $\begin{matrix} LFT (t_{i}) {\begin{matrix} EFT (t_{i}) or makespan & if succ (t_{i}) = φ \\ Min (LST (t_{j}), & O . W \\ Min (LST (t_{k}) - d_{ki})) \end{matrix} \\ t_{k} \in succ (t_{i}), d_{ij} \in D \end{matrix}$ (23)

The Latest Start Time (LST) for task t_i is also calculated by formulation (24): $LST (t_{i}) = LFT (t_{i}) - et (t_{i}, p_{m})$ (24)

The non-critical tasks in a DAG are distinguished by the presence of slack. Critical tasks have zero slack, while non-critical tasks have slack value that is known as slack time. For each task ith, the slack time is calculated using formulation (25): $\begin{matrix} Slack time for task t_{i} = LST (t_{i}) - EST (t_{i}) or \\ LFT (t_{i}) - EFT (t_{i}) \end{matrix}$ (25)

Algorithm 3 indicates the pseudo-code for computing the slack time; in case of critical and non-critical paths (Fig. 6).

Fig. 6

Algorithm 3.

4.2.2 The compatible matrix of tasks

It indicates a list of tasks with the possibility of sharing and distributing the slack-time to each other. The compatible matrix of a directed acyclic graph with a specified number of tasks (n) is a form of an n*n matrix that is shown in Fig. 7. A graph of a task generally structures a symmetric compatible matrix. Compatible matrix elements, e.g., m_ij, show whether t_i and t_j tasks are capable of sharing the slack time. In case the element m_ij equals 1, it can potentially be expected the share and distribution of the slack-time between t_i and t_j tasks, regarding their independent execution. Moreover, the distributed slack time will neither disturb the specified time for the tasks, nor increase the total time of the execution. The results indicate that the distribution of the time among the dependent tasks of a cluster has the chance to save more percentage of the energy rather in the case of an only one task in a cluster.

Fig. 7

Compatible matrix.

4.2.3 Voltage and frequency scaling

This section discusses how the dedicated voltage and frequency of processors to non-critical tasks, idle, and communication phases can be reduced based on the knowledge DVFS technique. In addition, it demonstrates the case of an increase in the dedicated voltage and frequency of processors to critical tasks, which will end in a diminished consumption of the energy. The critical path in a graph of scheduled tasks located in the Gantt chart contains a set of specified timeslots for the execution of the tasks as well as the data communication between the earliest task to the latest, in which makespan is the sum of the time for the computation and communications. The best effort scheduling algorithm fails to expand makespan and modify the voltage and frequency of the processors associated with the timeslots for the execution of tasks in the critical path. However, the other timeslots voltage and frequency of the Gantt chart can be decreased. To obtain the operating frequency of a specified processor (k), formulation (26) should be followed: Assume taskt_i is a non-critical task and is executed on Processor _k. Then, the taskt_i’s execution time can be extended to slak time for task t_i without violating precedence constraints (without changing the finish time of its precursors and the start time of its successors). Processor _k operating frequency can be scaled to Processor k′.freq^op: $\begin{matrix} {Processor}_{k}^{'} . {freq}^{op} = {freq}_{highest} \\ \frac{et (t_{i}, p_{m} (v_{highest}, f_{highest}))}{slak time for task t_{i}} \end{matrix}$ (26)

Algorithm 4 shows the proposed pseudo-code algorithm for the scaling of voltage and frequency (Fig. 8).

Fig. 8

Algorithm 4.

5 Running example

In the present study, a simple tracing example is used to illustrate our proposed algorithms application to the input DAG shown in Fig. 9. The number of processors |m| is 3, and throughput constraint is assumed to be 0.04 according to formulation (28). In the first step, we execute the EATSDC algorithm on the input DAG. This algorithm serves to satisfy the throughput, makespan, and somewhat energy consumption.

Fig. 9

Running EATSDC algorithm: Throughput, makespan, and energy constraint satisfaction.

Initially, each task was assigned to an independent cluster (Line 3), then we start to compute the number of replicas for each cluster as numr(C₁)= 0.8, numr(C₂)=2, numr(C₃)=0.4, numr(C₄)=0.5, numr(C₅)=0.4, and numr(C₆)=0.2 (line 4). In this procedure, CompR(G) satisfied the throughput constraint.

Then, it was time to specify edges weighing greater than 25 and violating the throughput constraint (Lines 7 to 10). These edges included d₁₂, d₃₄ and d₃₅. An edge with the highest weight (d₁₂) was select, and then two tasks t₁ and t₂ were clustered. Edge d₃₄was selected, and task t₃ was duplicated. At the next stage, the first copy of t₃ with t₄ and another copy of t₃ with t₅ were clustered. As a result of this operation, the required throughput was met (Lines 11 to 18) (Fig. 4.b and 4.c). Given that En_cur was larger than En_c, Lines 21 through 23 of the algorithm selected the edge with the largest communication time, while the tasks clustered the beginning and end of the selected edge. This process continued until the energy constraint was achieved. Fig. 4.d shows the result of EATDCA on the input graph, which includes clusters with tasks C₁={t₁, t₂ }, C₂={t₃, t₄ }, and C₃={t′₃, t₅, t₆ }, where each cluster was assigned to a processor. Then, the b-level of each task was calculated by Algorithm 2, which is equal to bl(t₁)=52, bl(t₅)=12, bl(t₄)=20, bl(t’₃)=24, bl(t₃)=32, bl(t₂)=42, and bl(t₆)=4. The sequence of task execution in each cluster is based on their b-level. Then, Line 2 of Algorithm 3 specified the topological order of each task in DAG.

After the calculation of EST, EFT, LST, and LFT for each task t_i (Lines 3 to 5), it determines the value of slack time for each task. Since this value is zero for the tasks t₁, t₃ and t₆, these are considered critical tasks located in a critical path, whereas tasks t₃, t’₃, t₄ and t₅ are considered non-critical (Fig. 10).

Fig. 10

The use of EATSDCDD algorithm to do the operation using the DVFS technique and the distribution of the slack-time on the early sample of DAG.

Figure 11 shows the results after applying the duplication & clustering technique to the Gantt chart. To further satisfy the energy constraints, there is a need to apply a voltage/frequency scaling algorithm (Algorithm 4) to the graph derived from the output of Algorithm 3. For each time slot associated with each processor, if the processor is executing critical tasks, the processor frequency reaches its highest value (Lines 4 to 6). If these are processing non-critical tasks, the processor frequency is calculated by Equation 26 (Lines 7 to 9). And if they are in the idle and communication phases, the processor frequency changes to the lowest level (Lines 10 to 12). In this algorithm, the slack time of non-critical tasks in each cluster is distributed according to the execution time of each task. This leads to greater saving of energy consumed by processors to execute tasks in DAG (Fig. 12).

Fig. 11

Gantt chart after employ duplication & clustering technique.

Fig. 12

Energy Gantt chart after stack time distribution (employ DVFS distribution).

6 Experimental results with simulation

This section presents the experimental evaluation of our proposed heuristic algorithm, i.e., Energy-Aware Task Scheduling with Duplication Clustering Dynamic Voltage/Frequency Distribution (EATSDCDD). Moreover, the results of the proposed EATSDCDD algorithm is compared with those of some previously-introduced algorithms, i.e., Power-Aware Task Clustering (PATC) [8], Power-Aware List-based Scheduling (PALS) [8], Energy Aware Duplication Scheduling (EADUS) & TEBUS [24], as well as EATSDCD, in terms of saving and consuming energy. Moreover, we highlight the contrast of EATSDCDD algorithm with HEFT [16] and RASD [23], underlying the reduction on the execution time or makespan. The mentioned algorithm is compared to the Throughput Constrained Latency Optimization heuristic (TCLO), Throughput Constrained Latency Optimization Pipelined (TCLO-P) [25], and EATSDCD [1] in both homogeneous and heterogeneous environments, considering the value of throughput and energy consumption. The simulation results confirmed the superiority of our proposed method to Energy Aware Service Level Agreement (EASLA) [3] and PALS [8] in terms of energy saving, based on the Green Service Level Agreement and the negotiation between users and service providers at the μ rate.

6.1 Experimental setup

First of all, the settings required for our experiments are presented, including the information of target platform and the energy, throughput, and makespan constraint settings, which are used for the following performance evaluation.

6.1.1 Settings for constraints of energy, throughput, and makespan

To avoid some unreasonable constraint settings that are impossible for an input DAG to meet, a lower bound of the energy constraint (denoted as En_min), an upper bound of the throughput constraint (denoted as Th_max), and a lower bound of the makespan or critical path constraint (denoted as CP_min) were defined in this study.The maximum amount of reduction in energy consumption by processors to execute tasks can be achieved by eliminating all communication overheads and allowing no duplication (numr(t_i)=1). To that end, it is essential to execute all tasks on a single processor. Therefore, the minimum energy consumption can be calculated as shown by formulation (27). ${En}_{\min} (G) = \sum_{\forall C_{i} \in C} et (C_{i})$ (27)

Moreover, the maximum throughput can be achieved when we have |m| processors in the system and by clustering all tasks and replicating this cluster to all processors, as indicated by formulation (28). ${Th}_{\max} (G) = \frac{| m |}{\sum_{\forall C_{i} \in C} et (C_{i})}$ (28)

The minimum makespan or critical path is represented as CP_min(G) of which the realization depends on the execution of all tasks on one processor. It causes the makespan constraint due to neglecting the communication costs. One can obtain this with formulation (29) as follows: ${CP}_{\min} (G) = EFT (t_{exit})$ (29)

A generally-accepted fact is that it is impossible to achieve both the minimal energy consumption and the maximal throughput simultaneously, since they are naturally conflicting with each other. Therefore, a coefficient should be established for En_min (G) and Th_max (G) that is, four main constraints of En_min (G): 1.1En_min (G) , 1.5En_min (G) , 1.2En_min (G), and 2En_min (G), and three throughput constraints Th_max (G) are set: 0.25Th_max (G) , 0.5Th_max (G) and0.75Th_max (G).

6.1.2 Randomly-generated task graphs

There are many random graph generator tools that generate weighted application DAG, e.g., STG (standard task graph) [48]. STG is a kind of benchmark for evaluation of proposed scheduling algorithms. In our simulation experiments, Graphs are generated for all combinations of the above parameters with the number of tasks ranging between 40 and 200, with incremental steps of 40. In addition, the tasks of 100, 200, 400, and 800 in graphs are randomly selected as the costs of nodes and edges. The CCR values are considered as 0.1, 1, 5, and 10. Equation (30) indicates how CCR is calculated in a DAG. The number of levels is determined by the parallelism factor λ(0.5, 1.0, 2.0, 5.0). Every set of the above parameters is used to generate 200 random task graphs in order to avoid scattering effects and then the average result is computed. $CCR = \frac{Σ_{1 ⩽ i, j ⩽ n} ct (d_{ij})}{Σ_{1 ⩽ i ⩽ n} w_{i}}$ (30)

6.2 Simulation results

The platform of simulation environment used to evaluate our work is CloudSim toolkit [49] that operates based on Java. The CloudSim was installed in an Asus Notebook with Intel core i7-A540UP CPU 2.4 GHz with 8 cores and 4 GB memory. We created five datacenters in our simulation, and set 200 virtual machines, each involving three processors. Table 4 shows the details of the three processor types.

Table 4
Power consumption for different voltage/frequency of processors [1]

Processors AMD Opteron 2218 AMD Turion MT-34 Intel Core i3-540

Voltage (V) 1.1, 1.15, 1.15, 1.20, 1.25, 1.30 0.9, 1.0, 1.05, 1.1, 1.15, 1.2 1.125, 1.125, 1.2, 1.2, 1.3, 1.3, 1.375

Frequency (GHz) 1.0, 1.8, 2.0, 2.2, 2.4, 2.6 0.8, 1.0, 1.2, 1.4, 1.6, 1.8 3.07, 3.2, 3.4, 3.6, 3.8, 4.0, 4.2

Highest power (W) 95 25 108

Lowest power (W) 26.16 6.25 53

Processors	AMD Opteron 2218	AMD Turion MT-34	Intel Core i3-540
Voltage (V)	1.1, 1.15, 1.15, 1.20, 1.25, 1.30	0.9, 1.0, 1.05, 1.1, 1.15, 1.2	1.125, 1.125, 1.2, 1.2, 1.3, 1.3, 1.375
Frequency (GHz)	1.0, 1.8, 2.0, 2.2, 2.4, 2.6	0.8, 1.0, 1.2, 1.4, 1.6, 1.8	3.07, 3.2, 3.4, 3.6, 3.8, 4.0, 4.2
Highest power (W)	95	25	108
Lowest power (W)	26.16	6.25	53

The first set of experiments compared the energy saving of algorithms with respect to various graph size. This section presents the simulation results obtained from our EATSDCDD scheme. The performance of the EATSDCDD algorithm was compared with that of five other algorithms, i.e., the PALS algorithm, PATC algorithm [8], Energy Aware Duplication Scheduling (EADUS) algorithm [24], TEBUS algorithm [24], and Energy Aware Task Scheduling Duplication Clustering DVFS (EATSDCD) algorithm [1]. PALS and PATC are critical path-based scheduling algorithms that attempt to achieve the shortest schedule length through clustering tasks in critical path and employ DVFS technique to decrease the consumption of energy and makespan. On the other hand, EADUS and TEBUS use the tasks duplication strategies to schedule DAG-based parallel tasks in a cluster computing in a way to reduce power consumption. The EATSDCD algorithm allocates the slack to one task in each cluster to reduce energy consumption by combining DVFS technique with duplication and clustering scheduling algorithm. EATSDCDD can achieve up to 58.3% energy saving respectively in the simulation. The EATSDCDD algorithm distributes the time of slack to all non-critical dependent tasks of a cluster along with the mentioned techniques; thus, it results in a more efficient saving of the energy. Results show the high impact of the parameters involved with the size of DAG and CCR on saving energy. The DAG highly focuses on computations and communications in the condition of CCR = 10 and CCR = 0.1. EATSDCDD achieved better energy saving among the others. The reason was that in our proposed method, a slack time was distributed and allocated to the tasks belonging to a maximum independent set of tasks in cluster. Table 5 compares the proposed method with the other energy-aware scheduling algorithms regarding the capability of saving the energy more efficiently.

Table 5

Comparing the energy saving capacity in EATSDCDD and five other algorithms

Energy aware DAG scheduling algorithm	Techniques used	Maximum energy saving (%)
PATC [8]	DVFS &Clustering	39.7
PALS [8]	DVFS &ETF scheduling	44.3
EADUS &TEBUS [24]	Clustering &Duplication	16.8
EATSDCD [1]	Duplication &Clustering &DVFS	52.7
EATSDCDD (proposed method)	Duplication &Clustering &DVFS Distribution	58.3

The second set of experiments compares the makespan of algorithms with respect to different numbers of tasks and CCRs. The first phase of the proposed method (EATSDC) was compared with two other algorithms: Reliability Aware Scheduling (RASD) [23] and Heterogeneous Earliest Finish Time (HEFT) [16]. Figure 13 shows the makespan when running the EATSDC algorithm in different scenarios of numbers of tasks ranging from 40 to 200, with incremental step of 40, and CCR (0.1, 1, 5). The comparative results confirmed that the proposed EATSDC algorithm outperformed the others in terms of the items examined in this paper. For CCR = 0.1, CCR = 1, and CCR = 5, however, the impact of the makespan on scheduling decisions among the three algorithms is clear, that is, the incorporation of makespan overhead results in a significantly improvement of the performance. For CCR = 1, as the communication increases, Fig. 8.b reveals that EATSDC performs better, with respect to the quality of makespan. Specifically, as CCR = 5, when the communication dominates the application, Fig. 8.c shows that EATSDC significantly outperforms HEFT and RASD by 20.4% and 3.4%, respectively, in term of the average makespan.

Fig. 13

Makespan of the EATSDC, RASD, and HEFT algorithms for different CCRs.

In the third group of experiments, the number of feasible solutions of the proposed method (EATSDCDD) was compared with that of the Energy Aware Task Scheduling Duplication Clustering DVFS (EATSDCD) [1] algorithm, Parallel Pipeline Latency Optimization) PaPilo) [29] algorithm, Throughput Constrained Latency Optimization heuristic (TCLO), and Throughput Constrained Latency Optimization Pipelined (TCLO-P) [25] algorithm. It was done among the set of 100 random task graphs from the STG benchmark [45], with respect to CCR = 1, under different Energy and throughput constraints (more specifically, four energy constraints En_min (G): 1.1En_min (G) , 1.5En_min (G) , 1.2En_min (G), and 2En_min (G), and three throughput constraints Th_max (G): 0.25Th_max (G) , 0.5Th_max (G) and 0.75Th_max (G)). Therefore, there are a total of seven constraint settings. Figure 14 shows the number of feasible solutions with three throughput constraints in CCR = 1 and En_cur = 2En_min (G), En_cur = 1.5En_min (G) .

Fig. 14

The number of feasible solutions for 100 random task graph with n = 200, with different throughput and energy constraints.

From these results, the first observation indicates that EATSDCDD, PaPiLo, and EATSDCD can always find more solutions than TCLO and TCLO-P, without any violation.

Figure 15 shows the number of feasible solutions of 100 random task graph with three throughput constraints in CCR = 1 and En_cur = 1.2En_min (G). Accordingly, only EATSDCDD and EATSDCD algorithms observed the determined constraints and successfully met them, while PaPilo algorithm was not able to adapt to the specified constraints in some of the graphs. This is because in our proposed algorithm and EATSDCD, we attempt to allocate a slack to the tasks and use DVFS technique.

Fig. 15

n = 200, En_cur = 1.2En_min (G), CCR = 1.

Moreover, when the energy constraint is relatively loose, EATSDCDD can get better performance than our previous work. On the contrary, EATSDCD can get lower feasible solution than EATSDCDD under more strict energy constraint. From the feasible solutions shown in Figure 16, under the strict energy consumption (En_cur = 1.1En_min (G)), EATSDCDD can get around 10% improvement in comparison with EATSDCD. The main reason is that the proposed algorithm distributes each slack to a set of tasks in a cluster, instead of assigning it to a task, and scales frequencies down to minimize the energy consumption.

Fig. 16

n = 200, En_cur = 1.1En_min (G), CCR = 1.

Table 6 compares our algorithm with other energy aware DAG scheduling algorithms in term of max energy saving for DAG with large task size (7200 –14400) MI, Numerous nodes among (1000–2000).

Table 6

Comparing the saving of energy between EATSDCDD and two other Algorithms for large test cases

Algorithms	Technique used	Maximum energy saving (Joule %)
PAlS [8]	DVFS &ETF	32.5
EATSDCD [1]	Duplication &Clustering &DVFS	53.4
EATSDCDD (proposed method)	Duplication &Clustering &DVFS slack Distribution	61.2

The proposed method for the large test cases saves more energy, while the execution time of the proposed algorithm is longer than other algorithms.

7 Energy-performance tradeoff scheduling algorithm via Green Service Level Agreement (GSLA)

In the previous section, we proposed the method to decrease the energy consumption without increasing the task execution time. This section presents a tradeoff between energy and makespan. Accordingly, a scenario must take place in which consumers and service providers negotiate at μ rate to realize possible scheduling regarding GSLA with respect to environment. The agreement is due to a reduction of energy consumption along with an increase in the task execution length. To obtain this, formulation (31) is used: $et (t_{i}, p_{m}) . extension = et (t_{i}, p_{m} (v_{highest}, f_{highest})) \times μ$ (31) where et (t_i, p_m (v_highest, f_highest)) denotes the longest length for the execution of a specified critical task (i) on a processor (m) with the highest possible voltage/frequency pair. In addition, a determined processor (k) operating frequency in running an expanded critical task can be obtained by formulation (32): $\begin{matrix} {Processor k}^{'} . {freq}^{op} = {freq}_{highest} \\ \times \frac{et (t_{i}, p_{m} (v_{highest}, f_{highest}))}{et (t_{i}, p_{m}) . extension} \end{matrix}$ (32)

To calculate non-critical tasks and their slack-time, we use formulation (33) as follows: ${Task}_{n} . slack . extension = {task}_{n} . slack \times μ$ (33)

Also, Equation (34) expresses the operating frequency of the processor (k) in running non-critical tasks: $\begin{matrix} {Processor}_{k}^{'} . {freq}^{op} = {freq}_{highest} \\ \times \frac{et (t_{i}, p_{m} (v_{highest}, f_{highest}))}{slak time for task t_{i} \times μ} \end{matrix}$ (34)

Algorithm 5. Energy &makespan trade off scheduling algorithm
1. Input: Green SLA negotiation between Consumer and Resource provider; μ
2. schedule tasks in DAG via EATSDC alg orithm 1
3. scale down processors frequency for critical &non –- critical tasks and idle or communication phase with algorithm 6

The energy & makespan trade off algorithm is shown in Algorithm 5. Algorithm 6 may reduce the voltage and frequency of dedicated processors to run critical and non-critical tasks, idle, and communications based on GSLA and the agreed rate of μ.

Algorithm 6. DVFS Algorithm based on Green SLA negotiation
16. Begin
17. for each proc_j do
18. for each time slot in proc_j’s do
19. if proc_j execute a crititcal task t_i then
20. calculate extension time t_j based on GSLA and μ as Equation (29)
21. calculate proc_j frequency to proc_j, f^op as Equation (30)
22. end if
23. if proc_j execute a non critical task t_i then
24. calculate extension time t_j based on GSLA and μ as Equation (31)
25. calculate proc_j frequency to proc_j, f^op as Equation (32)
26. end if
27. if proc_j is idle or it executes a communication phase then
28. scale down proc_j frequency to lowest
29. end if
30. end for
31. end for
32. end

In the fourth group of experiments, the energy saving capacity of the EATSDCDD and EASLA algorithms are compared with respect to various makespan extension rates. To this end, experimental results were provided for makespan extension rates (μ) equal to 5%, 10%, 20%, 30%, and 40% and 100, 200, 400, 800 for the number of tasks (n), and the number of processors as 10, 20, 30, and 40, while the value of CCR was set to 0.2. Figure 17 shows that in terms of energy saving, EATSDCDD outperformed EASLA for various numbers of tasks. For instance, when the makespan extension rate was 40%, the ESR of EATSDCDD algorithm was greater than the EASLA algorithm by (37%, 45%), (35%, 43%), (32%, 40%), and (33%, 35%) for the number of tasks of 100, 200, 400 and 800, respectively, with 10 processors.

Fig. 17

Comparison between EATSDCDD and EASLA algorithms in terms of the energy saving in percentages, in the condition of CCR, m, n.

In the simulation for energy and makespan trade off algorithm, when the makespan extension rate was 40%, the ESR of the EATSDCDD algorithm was greater than the EASLA algorithm by (32%, 39%), (34%, 40%), (39%, 43%), and (45%, 49%) for the number of processors of 10, 20, 30, and 40, respectively, with the tasks number set to 400 as shown in Fig. 18. We can see that when the makespan extension rate increases, the energy saving increases, too.

Fig. 18

Energy savings vs. makespan extension in the condition of CCR, m, n.

Comparing the energy saving capability of the two algorithms, EATSDCDD and PALS, the fifth set of simulation results are considered. In this regard, the makespan extension rates (μ) are 10%, 20%, and 30%. Furthermore, the numbers of tasks, CCR, and the number of processors were 200, 1, 50, respectively. Figure 19 shows the results of the conducted simulation in which expanding the makespan rate will result in energy saving; if the former increases up to 30%, the latter shows the significant amount of 77%.

Fig. 19

Energy savings vs. makespan extension (CCR = 1, m = 50, n = 200).

8 Conclusion

A good energy-aware scheduling algorithm on DVFS-enabled datacenters can reduce energy consumption, which decreases the operational cost, reduces carbon footprint and CO₂ emissions in high performance distributed computing systems (HPDCSs), and increases the system reliability and availability. To this end, in the present paper, the energy-performance tradeoff scheduling algorithm (EATSDCDD) was proposed. The proposed algorithm employed the clustering & duplication technique and distributed the slack time for set of non critical tasks in each cluster under a lower voltage and frequency. We also developed a Green SLA attempting to save more amount of energy. Eventually, a set of data was established for conducting the examinations, and also different parameters of the constructed random DAG were assessed to identify the efficiency of the proposed algorithm in comparison with a cluster and duplication-based algorithm as well as those algorithms that use the DVFS technique. The results indicated that the proposed algorithm succeeded by 8.3%, 20%, and 10% more than PALS, PATC, and EATSDCD algorithms, respectively, in terms of the energy consumption and makespan. The results also demonstrated up to 16% improvement of the proposed algorithm compared with the PaPilo algorithm where En_cur = 1.2 En_min(G). In addition, the proposed algorithm was shown almost better than the EATSDCD algorithm where En_cur = 1.1 En_min(G). Furthermore, with the increase of the μ rate up to 30% and 40%, the proposed algorithm showed 7% and 6% more advances in energy saving than the PALS and EASLA algorithm, respectively.

It is worth mentioning that the proposed method with the simultaneous use of three techniques of Duplication, Clustering and dynamic voltage and frequency scaling has a longer execution time compared to those ones use only one of these three techniques. Moreover, the proposed algorithm only reduces the power consumption of processors and does not pay attention to the power consumption of other parts of a data center such as memory, etc. In the future work, we are planning to test the algorithm on various degrees of heterogeneity among processors in datacenters and subtasks. Furthermore, we will consider individual components such as disks and memory to reduce energy consumption, and we will exam the application of the energy-aware scheduling algorithm to some real applications such as sparse Cholesky decomposition. Finally, the proposed algorithm-based approach does not guarantee the converging to the global optima. Thus, the development of the proposed method for converging to the global optima; is left to the next study.

Footnotes

Acknowledgement

The authors would like to thank the anonymous reviewers and the associate editor for their insightful comments and suggestions.

References

Barzegar

, Motameni

and Movaghar

, EATSDCD: A green Energy-aware scheduling algorithm for parallel Task-based application using clustering, duplication and DVFS technique in cloud datacenters, Journal of Intelligent & Fuzzy Systems 36(6) (2019), 5135–5152.

Koomey

, Growth in data center electricity use to A report by Analytical Press, completed at the request of The New York Times, 9.2011 (2011), 161.

, Chubo

, Kenli

, Xuedi

and Keqin

, Slack allocation algorithm for energy minimization in cluster systems, Future Generation Computer Systems 74 (2017), 119–131.

Venkatachalam

and Michael

, Power reduction techniques for microprocessor systems, ACM Computing Surveys (CSUR) 37(3) (2005), 195–237.

Zhu

, Rami

and Bruce

R.C.

, Scheduling with dynamicvoltage/speed adjustment using slack reclamation in multiprocessorreal-time systems, IEEE Transactions on Parallel andDistributed Systems 14(7) (2003), 686–700.

Ullman

J.D.

, NP-complete scheduling problems, Journal of Computer and System Sciences 10(3) (1975), 384–393.

Sutha

and Nawaz

G.M.

, Energy and service level agreement aware resource allocation heuristics for cloud data centers, KSII Transactions on Internet and Information Systems (TIIS) 12(11) (2018), 5357–5381.

Wang

, Khan

S.U.

, Chen

, KołOdziej

, Ranjan

, Xu

C.Z.

and Zomaya

, Energy-aware parallel task scheduling in a cluster, Future Generation Computer Systems 29(7) (2013), 1661–1670.

Rizvandi

N.B.

, Taheri

and Zomaya

A.Y.

, Some observations on optimal frequency selection in DVFS-based energy consumption minimization, Journal of Parallel and Distributed Computing 71(8) (2011), 1154–1164.

10.

Tang

, Qi

, Cheng

, Li

, Khan

S.U.

and Li

, An energy-efficient task scheduling algorithm in DVFS-enabled cloud environment.”, Journal of Grid Computing 14(1) (2016), 55–74.

11.

Entezari-Maleki

, Sousa

and Movaghar

, Performance and power modeling and evaluation of virtualized servers in IaaS clouds, Information Sciences 394 (2017), 106–122.

12.

Barzegar

, Rahmani

A.M.

, Zamanifar

and Divsalar

, Gravitational emulation local search algorithm for advanced reservation and scheduling in grid computing systems, Fourth International Conference on Computer Sciences and Convergence Information Technology. IEEE, 2009.

13.

Zanywayingoma

F.N.

and Yang

, Effective task scheduling and dynamic resource optimization based on heuristic algorithms in cloud computing environment, TIIS 11(12) (2017), 5780–5802.

14.

Patwal

R.S.

, Nitin

and Harish

, A novel TVAC-PSO based mutation strategies algorithm for generation scheduling of pumped storage hydrothermal system incorporating solar units, Energy 142 (2018), 822–837.

15.

Garg

, A hybrid PSO-GA algorithm for constrained optimization problems, Applied Mathematics and Computation 274 (2016), 292–305.

16.

Topcuoglu

, Salim

and Min-you

, Performance-effective andlow-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems 13(3) (2002), 260–274.

17.

Sabbir

and Huh

E.N.

, Heuristic based energy-aware resource allocation by dynamic consolidation of virtual machines in cloud data center, KSII Transactions on Internet & Information Systems 7(8) (2013).

18.

Salimi

, Motameni

and Omranpour

, Task scheduling using NSGA II with fuzzy adaptive operators for computational grids, Journal of Parallel and Distributed Computing 74(5) (2014), 2333–2350.

19.

Beloglazov

, Jemal

and Rajkumar

, Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing, Future generation computer systems 28(5) (2012), 755–768.

20.

Ding

, Qin

, Liu

and Wang

, Energy efficient scheduling of virtual machines in cloud with deadline constraint, Future Generation Computer Systems 50 (2015), 62–74.

21.

C.M.

, Ruay-Shiung

and Hsin-Yu

, A green energy-efficient scheduling algorithm using the DVFS technique for cloud datacenters, Future Generation Computer Systems 37 (2014), 141–147.

22.

Juarez

, Jorge

and Rosa

M.B.

, Dynamic energy-aware schedulingfor parallel task-based application in cloud computing, Future Generation Computer Systems 78 (2018), 257–271.

23.

Tang

, Li

and Veeravalli

, Reliability-aware scheduling strategy for heterogeneous distributed computing systems, Journal of Parallel and Distributed Computing 70(9) (2010), 941–952.

24.

Zong

, Manzanares

, Stinar

and Qin

, Energy-aware duplication strategies for scheduling precedence-constrained parallel tasks on clusters, IEEE International Conference on Cluster Computing. IEEE, 2006.

25.

Lin

C.S.

, Lin

C.S.

, Lin

Y.S.

, Hsiung

P.A.

and Shih

, Multi-objective exploitation of pipeline parallelism using clustering, replication and duplication in embedded multi-core systems, Journal of Systems Architecture 59(10) (2013), 1083–1094.

26.

Rizvandi

N.B.

, Taheri

and Zomaya

A.Y.

, Some observations on optimal frequency selection in DVFS-based energy consumption minimization, Journal of Parallel and Distributed Computing 71(8) (2011), 1154–1164.

27.

Baskiyar

and Rabab

A.K.

, Energy aware DAG scheduling on heterogeneous systems, Cluster Computing 13(4) (2010), 373–383.

28.

Kimura

, Sato

, Hotta

, Boku

and Takahashi

, Emprical study on reducing energy of parallel programs using slack reclamation by dvfs in a power-scalable high performance cluster, 2006 IEEE international conference on cluster computing. IEEE, 2006.

29.

Vydyanathan

, Catalyurek

, Kurc

, Sadayappan

and Saltz

, Optimizing latency and throughput of application workflows on clusters, Parallel Computing 37(10–11) (2011), 694–712.

30.

Mohamed

M.A.

, Almalaq

, Awwad

E.M.

, El-Meligy

M.A.

, Sharaf

and Ali

Z.M.

, A modified balancing approach for renewable based microgrids using deep adversarial learning, IEEE Transactions on Industry Applications (2020).

31.

Yang

and Apostolos

, DSC: Scheduling parallel tasks on an unbounded number of processors, IEEE Transactions on Parallel and Distributed Systems 5(9) (1994), 951–967.

32.

Mohamed

M.A.

, Jin

and Su

, Multi-agent energy management of smart islands using primal-dual method of multipliers, Energy 208 (2020), 118306.

33.

, Gu

, Zhou

, Wei

, Liu

and Chen

, Soft error-aware energy-efficient task scheduling for workflow applications in DVFS-enabled cloud, Journal of Systems Architecture 84 (2018), 12–27.

34.

, Ren

and Quan

, Energy minimization for reliability-guaranteed real-time applications using DVFS and checkpointing techniques.”, Journal of Systems Architecture 61(2) (2015), 71–81.

35.

Safari

and Khorsand

, Energy-aware scheduling algorithm for time-constrained workflow tasks in DVFS-enabled cloud environment, Simulation Modelling Practice and Theory 87 (2018), 311–326.

36.

Safari

and Khorsand

, PL-DVFS: combining Power-awareList-based scheduling algorithm with DVFStechnique for real-time tasks in Cloud Computing, The Journalof Supercomputing 74(10) (2018), 5578–5600.

37.

Amulu

L.M.

and Ramraj

, Combinatorial meta-heuristics approaches for DVFS-enabled green clouds.” The Journal of Supercomputing (2019), 1–10.

38.

Hajiamini

, Shirazi

, Dong

and Cain

, Optimality of dynamic voltage/frequency scaling in many-core systems with voltage-frequency islands, Sustainable Computing: Informatics and Systems 24 (2019), 100344.

39.

Nejat

, Manivannan

, Pericàs

and StenströmCoordinated

, management of DVFS and cache partitioning under QoSconstraints to save energy in multi-core systems, Journal ofParallel and Distributed Computing 144 (2020), 246–259.

40.

Toor

, ul Islam

, Sohail

, Akhunzada

, Boudjadar

, Khattak

H.A.

, Din

I.U.

and Rodrigues

J.J.

, Energy and performance awarefog computing: A case of DVFS and green renewable energy, Future Generation Computer Systems 101 (2019), 1112–1121.

41.

Harish

, Rani

and Sharma

S.P.

, An efficient two phase approach for solving reliability–redundancy allocation problem using artificial bee colony technique, Computers & Operations Research 40(12) (2013), 2961–2969.

42.

Garg

and Sharma

S.P.

, A two-stage approach for reliability and maintainability analysis of an industrial system, International Journal of Reliability, Quality and Safety Engineering 19(3) (2012), 1–20.

43.

Garg

, Multi objective non-linear programming problem for reliability optimization in intuitionistic fuzzy environment, Frontiers in Information Systems 2 (2018), 197–229.

44.

Duan

, Quynh

N.V.

, Abdullah

H.M.

, Almalaq

, Do

T.D.

, Abdelkader

S.M.

and Mohamed

M.A.

, Optimal scheduling and management of a smart city within the safe framework, IEEE Access 8 (2020), 161847–161861.

45.

Barzegar

, Motameni

and Bozorgi

, Solving flexible job-shop scheduling problem using gravitational search algorithm and colored Petri net, Journal of Applied Mathematics 2012, 1–20. doi:10.1155/2012/651310.

46.

Ouyang

Y.C.

, Chiang

Y.J.

, Hsu

C.H.

and Yi

, An optimal control policy to realize green cloud systems with SLA-awareness, The Journal of Supercomputing 69(3) (2014), 1284–1310.

47.

Jensen

, Kristensen

L.M.

and Wells

, Coloured Petri Nets and CPN Tools for modelling and validation of concurrent systems, International Journal on Software Tools for Technology Transfer 9(3–4) (2007), 213–254.

48.

Kasahara

, Standard Task Graph Set.[Online] Available: http://www.kasahara.elec.waseda.ac.jp/schedule, (2008).

49.

Buyya

, Rajiv

and Calheiros

R.N.

, Modeling and simulation of scalable Cloud computing environments and the CloudSim toolkit: Challenges and opportunities, 2009 international conference on high performance computing & simulation. IEEE, 2009.