A static task scheduling algorithm for heterogeneous systems based on merging tasks and critical tasks

Abstract

A novel task scheduling algorithm called Merge Tasks and Predict Earliest Finish Time (MTPEFT) is proposed for static task scheduling in a heterogeneous computing environment. The algorithm merges tasks that satisfy the merging constraints and assigns the best processor to any node that has at least one immediate successor that is a critical node, thereby effectively reducing the schedule length without increasing the time complexity of the algorithm. Experiments are performed on randomly generated graphs and real-world application graphs, and comparisons are made based on the scheduling length ratio, robustness and frequency of the best result. The results show that the MTPEFT algorithm outperforms the PEFT, CPOP and HEFT algorithms in terms of the schedule length ratio, frequency of the best result and robustness while maintaining the same time complexity.

Keywords

Task scheduling, DAG scheduling, heterogeneous systems, static scheduling, task graphs, scheduling algorithms

1. Introduction

A heterogeneous computing system (HCS) is a computation platform composed of diverse computing resources that are interconnected by a high-speed network to execute parallel and distributed applications [1]. The key to obtaining high performance in an HCS, which is generally addressed by task scheduling, is to map an application efficiently onto the available resources. Task scheduling aims to assign tasks to processors and to order their execution so that the precedence requirements are satisfied and the schedule length is minimized [1, 2, 3, 4, 5]. However, the task scheduling problem is NP-complete in general [1, 6]. Consequently, it has been studied extensively, and various heuristics have been presented in the literature [3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]. These heuristics are classified into several categories, such as list scheduling algorithms, task-duplication scheduling algorithms and task-clustering scheduling algorithms.

List scheduling algorithms basically consist of two phases: a task prioritizing phase, where an ordered list is constructed by assigning a priority to each task in a given directed acyclic graph (DAG), and a processor selection phase, where each task from this ordered list is assigned to the processor that minimizes a predefined cost function. The basic idea of task-duplication scheduling algorithms is to duplicate the parents of the currently selected task onto the selected processor or onto another processor, aiming to reduce the task finish time [6, 22]. The basic idea of task-clustering scheduling algorithms is to schedule tasks that have high communication costs onto the same processor, so as to reduce the dependency waiting time. Task-clustering scheduling uses a clustering method [23, 24] to map all of the tasks of a given task graph to clusters and then assigns processors to these clusters. The main weakness of task-duplication scheduling algorithms is that they often have higher time complexity and require a large amount of processor resources. Task-clustering scheduling algorithms may sacrifice parallelism between tasks while reducing the dependency waiting time. Therefore, the trade-off between maximizing parallelism and minimizing communication delay should be taken into consideration when designing the clustering method [25]. List scheduling algorithms are widely used because they generate good schedules with low complexity, which is generally quadratic in the number of tasks. The heterogeneous earliest finish time (HEFT) [20, 21], Critical Path On a Processor (CPOP) [21] and Predict Earliest Finish Time (PEFT) [10] algorithms have a complexity of $O(v^{2}\cdot p)$, where $v$ is the number of tasks and $p$ is the number of processors.

The HEFT algorithm is widely used because of its low complexity and the relatively short schedule lengths it produces. In [26], the authors compared 20 heuristics and found that HEFT [20, 21] performs best in terms of robustness and schedule length. However, the HEFT algorithm does not account for heavily communicating tasks or for the parents of the critical tasks that determine the DAG scheduling time.

The PEFT algorithm outperforms the HEFT algorithm in terms of the scheduling length ratio and frequency of the best result with the same time complexity. However, it does not take into account the parents of the critical tasks that determine the DAG scheduling time.

In this paper, a synthesized heuristic task scheduling algorithm for HCS called Merge Tasks and Predict Earliest Finish Time (MTPEFT) is developed, based on both task-clustering techniques and the list scheduling approach. It aims to outperform the HEFT, CPOP and PEFT algorithms in terms of the schedule length ratio, robustness and frequency of the best results with the same time complexity.

The contributions of this paper mainly include: 1) a new scheduling algorithm with low time complexity is proposed; 2) MTPEFT merges tasks that satisfy the merging constraints before task scheduling; 3) MTPEFT takes into consideration the impact that the parents of a critical task have on that task; 4) MTPEFT takes the impact of the current task assignment on its children into account; 5) results from randomly generated and real-world application graphs are presented.

The remainder of this paper is organized as follows: the task scheduling problem is introduced in Section 2; related work is surveyed in Section 3; the MTPEFT algorithm is described in Section 4; and the experimental results are discussed in Section 5, followed by the conclusions.

2. Task-scheduling problem

In static task scheduling for HCS, the application can be represented by a DAG as shown in Fig. 1, defined by the tuple $(V,E)$, where $V$ is a set of $v$ tasks, and each task $v_{i}\in V$ represents an application task. The terms ‘task’ and ‘node’ are used interchangeably throughout the paper. $E$ is a set of directed edges among tasks. The directed edge $e_{i,j}\in E$ represents the task dependence constraint, i.e., task $v_{i}$ must finish its execution and transfer the resulting data to satisfy the data dependency before task $v_{j}$ starts.

The HCS is represented by a set $P$ of $p$ processors that have diverse capabilities and are fully connected. In each processor, the executions and communications of tasks can be carried out concurrently, and task execution is assumed to be non-preemptive. The $v\times p$ computation costs matrix $W$ stores the execution costs of tasks, where $v$ represents the task number, $p$ represents the processor number and each element $w_{i,j}$ denotes the estimated execution time for task $v_{i}$ in processor $p_{j}$ . Therefore, the average execution time $\overline{w_{i}}$ of task $v_{i}$ is defined as

$\overline{w_{i}}=\sum\limits_{j=1}^{p}{w_{i,j}}/p$ (1)

Table 1 shows the computation costs matrix $W$.

Table 1

Computation time matrix $W$

Task	$P_{1}$	$P_{2}$	$P_{3}$
$v_{1}$	27	23	37
$v_{2}$	19	9	40
$v_{3}$	43	27	32
$v_{4}$	7	7	37
$v_{5}$	22	20	47
$v_{6}$	27	11	25
$v_{7}$	23	44	8
$v_{8}$	27	42	31
$v_{9}$	29	23	59
$v_{10}$	14	19	37
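
As a quick illustration of Eq. (1), the following Python sketch computes the average execution time of every task from the values in Table 1. The function name `average_costs` is introduced here for illustration only.

```python
# Computation cost matrix W from Table 1 (rows: v1..v10, columns: P1..P3).
W = [
    [27, 23, 37],
    [19, 9, 40],
    [43, 27, 32],
    [7, 7, 37],
    [22, 20, 47],
    [27, 11, 25],
    [23, 44, 8],
    [27, 42, 31],
    [29, 23, 59],
    [14, 19, 37],
]


def average_costs(w):
    """Eq. (1): mean execution time of each task over all processors."""
    return [sum(row) / len(row) for row in w]


avg_w = average_costs(W)
for task, value in enumerate(avg_w, start=1):
    print(f"v{task}: {value:.2f}")
```

For instance, the entry for $v_{7}$ is $(23+44+8)/3=25$, the value used later in the merging example.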

Figure 1.

Application DAG.

$c_{i,j}$ is the cost for transferring data from task $v_{i}$ to task $v_{j}$ , namely, the communication cost of the edge $e_{i,j}\in E$ . Because $c_{i,j}$ can only be computed after the determination of where tasks $v_{i}$ and $v_{j}$ will be executed, the average communication costs are used to label the edges [10, 21]. The average communication cost $\overline{c_{i,j}}$ of an edge $e_{i,j}$ is defined as

$\overline{c_{i,j}}=\overline{L}+\frac{\textit{data}_{i,j}}{\overline{B}}$ (2)

where $\overline{L}$ is the average latency of all processors, $\overline{B}$ is the average bandwidth among processors, and $\textit{data}_{i,j}$ is the amount of data that $v_{i}$ sends to $v_{j}$. If $v_{i}$ and $v_{j}$ are assigned to the same processor, $c_{i,j}$ becomes 0, because the intraprocessor communication cost is negligible compared with the interprocessor communication cost. In this study, the latency is assumed to be negligible, and the bandwidth is assumed to be 1.0 [4, 15]. Consequently, the average communication cost and the amount of data to be transferred are identical.
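
The communication model of Eq. (2) can be sketched in a few lines of Python. The function name `comm_cost` and its parameters are hypothetical; the defaults encode the assumptions stated above (negligible latency, bandwidth of 1.0).

```python
def comm_cost(data, proc_i, proc_j, latency=0.0, bandwidth=1.0):
    """Cost of sending `data` units from a task on proc_i to a task on proc_j.

    Follows Eq. (2), with the cost dropping to zero when both tasks share
    a processor. The defaults mirror the assumptions made in this study.
    """
    if proc_i == proc_j:
        return 0.0
    return latency + data / bandwidth


# With the default settings the cost equals the amount of data transferred.
print(comm_cost(74, proc_i=0, proc_j=1))
print(comm_cost(74, proc_i=0, proc_j=0))
```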

The following will present some common attributes for task scheduling and these attributes will be referred to in the forthcoming sections.

•

$\textit{pred}(v_{i})$ [10]: Denotes the set of immediate predecessors of task $v_{i}$ in a given DAG. If $\textit{pred}(v_{i})=\phi$, $v_{i}$ is called an entry task, denoted as $v_{\textit{entry}}$. If a DAG has multiple entry tasks, a dummy entry task with zero weight and zero communication can be added to the graph.

•

$\textit{succ}(v_{i})$ [10]: Denotes the set of immediate successors of task $v_{i}$ in a given DAG. If $\textit{succ}(v_{i})=\phi$, $v_{i}$ is called an exit task, denoted as $v_{\textit{exit}}$. If a DAG has more than one exit task, a dummy exit task with zero weight and zero communication can be added to the graph.

•

Makespan [10, 21] (or schedule length): It is the finish time of the last task in the scheduled DAG, i.e.

$\displaystyle\textit{makespan}=\max\{\textit{AFT}(v_{\textit{exit}})\}$ (3)

where $\textit{AFT}(v_{\textit{exit}})$ is the actual finish time of the exit task.

•

Critical Path (CP) [21]: The CP of a DAG is the longest path from the entry task to the exit task in the DAG. The length of this path CP is the sum of the computation costs of the nodes and inter-node communication costs along the path. The lower bound of the schedule length is the minimum critical path length ( $\textit{CP}_{\textit{MIN}}$ ), which is accumulated by the minimum computational costs of each task in the critical path.

•

$\textit{EST}(v_{i},p_{j})$ [10, 21]: Denotes the earliest start time of task $v_{i}$ on processor $p_{j}$ , i.e.,

$\textit{EST}(v_{i},p_{j})=\max\{T_{\textit{Available}}(p_{j}),\max\limits_{v_{m}\in\textit{pred}(v_{i})}\{\textit{AFT}(v_{m})+c_{m,i}\}\}$ (4)

where $T_{\textit{Available}}(p_{j})$ is the earliest available time for processor $p_{j}$ , $\max\limits_{v_{m}\in\textit{pred}(v_{i})}\{\textit{AFT}(v_{m})+c_{m,i}\}$ represents the arrival time of all the input data for task $v_{i}$ on processor $p_{j}$ . If task $v_{m}$ is assigned to $p_{j}$ , $c_{m,i}=0$ . For all the entry tasks, $\textit{EST}(v_{\textit{entry}},p_{j})=0$ .

•

$\textit{EFT}(v_{i},p_{j})$ [10, 21]: Denotes the earliest finish time of task $v_{i}$ on processor $p_{j}$, which equals the sum of the earliest start time and the computation cost of task $v_{i}$ on processor $p_{j}$, i.e.

$\textit{EFT}(v_{i},p_{j})=\textit{EST}(v_{i},p_{j})+w_{i,j}$ (5)

•

$\textit{rank}_{u}(v_{i})$ [21]: Denotes the upward rank of task $v_{i}$ , which can be calculated recursively by traversing the DAG upward starting from the exit task $v_{\textit{exit}}$ , i.e.

$\textit{rank}_{u}(v_{i})=\overline{w_{i}}+\max_{v_{j}\in\textit{succ}(v_{i})}\{\overline{c_{i,j}}+\textit{rank}_{u}(v_{j})\}$ (6)

where $\overline{w_{i}}$ is the average execution time of task $v_{i}$ , $\textit{rank}_{u}(v_{\textit{exit}})=\overline{w_{\textit{exit}}}$ .

•

$\textit{rank}_{d}(v_{i})$ [21]: Denotes the downward rank of task $v_{i}$ , which can be calculated recursively by traversing the DAG downward starting from the entry task $v_{\textit{entry}}$ , i.e.

$\textit{rank}_{d}(v_{i})=\max\limits_{v_{j}\in\textit{pred}(v_{i})}\{\textit{rank}_{d}(v_{j})+\overline{w_{j}}+\overline{c_{j,i}}\}$ (7)

where $\textit{rank}_{d}(v_{\textit{entry}})=0$ .

•

Critical Node [21] (CN): A node for which the sum of $\textit{rank}_{u}(v_{i})$ and $\textit{rank}_{d}(v_{i})$ equals the upward rank of the entry node, i.e.

$\displaystyle\forall v_{i}\in\textit{CN},\textit{rank}_{u}(v_{i})+\textit{rank}_{d}(v_{i})=\textit{rank}_{u}(v_{\textit{entry}})$ (8)

•

OCT [10]: Denotes a matrix whose rows correspond to tasks and whose columns correspond to processors. Each element $\textit{OCT}(v_{i},p_{k})$ is the maximum, over the children of $v_{i}$, of the shortest path length from that child to the exit task, given that $v_{i}$ is assigned to processor $p_{k}$. The OCT value of task $v_{i}$ on processor $p_{k}$ can be recursively defined as follows:

$\displaystyle\textit{OCT}(v_{i},p_{k})=\max\limits_{v_{j}\in\textit{succ}(v_{i})}[\min\limits_{p_{m}\in P}\{\textit{OCT}(v_{j},p_{m})+w_{j,m}+\overline{c_{i,j}}\}]$ (9)

where $w_{j,m}$ is the execution time of task $v_{j}$ on processor $p_{m}$, and $\overline{c_{i,j}}=0$ if $p_{m}=p_{k}$. For the exit task, $\textit{OCT}(v_{\textit{exit}},p_{k})=0$.
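
The recursive attributes above (Eqs. (6) to (9)) can be illustrated with a small Python sketch. The 4-task, 2-processor graph below is a hypothetical example, not the DAG of Fig. 1, and all function names are chosen here for illustration. For the OCT, the sketch assumes, as in PEFT [10], that the communication term vanishes when the child is placed on the same processor as its parent.

```python
from functools import lru_cache

# Hypothetical example graph (not the DAG of Fig. 1).
succ = {0: [1, 2], 1: [3], 2: [3], 3: []}
pred = {0: [], 1: [0], 2: [0], 3: [1, 2]}
w = {0: [4, 6], 1: [8, 4], 2: [3, 5], 3: [2, 2]}   # w[i][j]: cost of v_i on p_j
c = {(0, 1): 3, (0, 2): 10, (1, 3): 4, (2, 3): 1}  # average communication costs
procs = range(2)


def avg_w(i):
    """Eq. (1): average execution time of task i."""
    return sum(w[i]) / len(w[i])


@lru_cache(maxsize=None)
def rank_u(i):
    """Eq. (6): upward rank; the exit task has rank_u equal to its average cost."""
    return avg_w(i) + max((c[i, j] + rank_u(j) for j in succ[i]), default=0.0)


@lru_cache(maxsize=None)
def rank_d(i):
    """Eq. (7): downward rank; zero for the entry task."""
    return max((rank_d(j) + avg_w(j) + c[j, i] for j in pred[i]), default=0.0)


@lru_cache(maxsize=None)
def oct_value(i, k):
    """Eq. (9): optimistic cost of task i on processor k; zero for the exit task."""
    return max(
        (
            min(oct_value(j, m) + w[j][m] + (0 if m == k else c[i, j]) for m in procs)
            for j in succ[i]
        ),
        default=0,
    )


entry = 0
# Eq. (8): critical nodes satisfy rank_u + rank_d == rank_u(entry).
critical = [i for i in succ if abs(rank_u(i) + rank_d(i) - rank_u(entry)) < 1e-9]

print("rank_u:", [rank_u(i) for i in succ])
print("rank_d:", [rank_d(i) for i in succ])
print("critical nodes:", critical)
print("OCT:", [[oct_value(i, k) for k in procs] for i in succ])
```

Memoization via `lru_cache` keeps each attribute linear in the number of edges (times the number of processor pairs for the OCT), which matches the recursive definitions without recomputing shared sub-paths.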

3. Related work

In this section, we introduce a brief survey of task scheduling algorithms.

Classical examples of list scheduling algorithms have been presented in [3, 4, 5, 10, 12, 15, 19, 20, 21]. The heterogeneous earliest finish time (HEFT) [20, 21] and Critical Path On a Processor (CPOP) [21] algorithms were proposed when few research results were available for heterogeneous environments. Both HEFT and CPOP have two phases: a task prioritizing phase and a processor selection phase. In the task prioritizing phase, HEFT uses a recursive procedure to calculate an upward rank as the priority of each task, and then generates a list in descending order of upward rank. CPOP instead uses the sum of the upward and downward ranks as the priority of each task. In the processor selection phase, HEFT takes the tasks in list order and assigns each one to the “best” processor, i.e., the one that minimizes the task completion time. CPOP uses a different policy to determine the “best” processor: a selected task is scheduled on the critical path processor if it is on the critical path; otherwise it is assigned to a processor as in HEFT. Fast Load Balancing (FLB) [19] reduces the time complexity of HEFT. However, the schedules produced by FLB are worse than those of HEFT for irregular task graphs and higher processor heterogeneities [10]. Rather than assigning priorities to the tasks, heterogeneous critical parent trees (HCPT) [3] proposes a new mechanism to construct the scheduling list. High Performance Task Scheduling (HPS) [5] has three phases: level sorting, task prioritization, and processor selection. Performance Effective Task Scheduling (PETS) [4] has the same three phases as HPS. The Lookahead algorithm [12] improves upon the processor selection strategy of HEFT: the processor for the current task is selected based not only on the completion time of the current task but also on the impact of the scheduling decision on the allocation of the current task's child nodes. The Longest Dynamic Critical Path (LDCP) [15] algorithm needs to construct a DAG for each processor and must update it whenever a task is scheduled. Improvement heterogeneous earliest finish time (IHEFT) [22] changes the method for calculating the task's upward weight to obtain a better task list. Predict earliest finish time (PEFT) [10] has the same two phases as HEFT, both of which are based only on an optimistic cost table (OCT).

Classical examples of task duplication heuristics have been presented in [8, 11, 13, 17]. The main weakness of task-duplication scheduling algorithms is that they often have higher time complexity and require a large amount of processor resources. In addition, two scheduling algorithms that combine a duplication strategy with the DVS technique have been proposed in [27, 28] to reduce energy consumption. The algorithms presented in [29, 30] are designed to balance energy and performance.

Classic task clustering algorithms [9, 18] are mainly applicable to homogeneous systems. Algorithms for heterogeneous systems have been proposed in [7, 14]; however, these algorithms have limitations in systems with a high degree of heterogeneity [10]. Two novel algorithms were proposed in [31, 32] to reduce energy consumption without increasing the makespan.

Figure 2.

An example of schedule.

4. The MTPEFT algorithm

In this section, a new task scheduling algorithm for a bounded number of heterogeneous processors is introduced, called MTPEFT. The algorithm consists of three phases, namely, task merging, task prioritization, and processor selection.

In the task merging phase, MTPEFT attempts to merge tasks and minimizes communications among tasks so that it can reduce the schedule length. In the processor selection, MTPEFT assigns parents of a CN onto processors with earliest finish time (EFT), which may advance the earliest start time of the CN to reduce the schedule length. In addition, to select a processor for the current task, MTPEFT considers information not only about the EFT of the current task but also the impact of the chosen processor on the path length from the current task’s child nodes to the exit node.

The MTPEFT algorithm is explained in detail in the following sections.

4.1 The phases of MTPEFT

4.1.1 Task merging

Assume that task $v_{i}$ is an immediate predecessor of task $v_{j}$, so that task $v_{j}$ is an immediate successor of task $v_{i}$. To reduce the communication between $v_{i}$ and $v_{j}$, many existing scheduling algorithms assign $v_{i}$ and $v_{j}$ to the same processor. However, this is not always the “best” processor: by the time a processor is chosen for $v_{j}$, $v_{i}$ has already been assigned to the processor that minimizes its own EFT, and $v_{j}$ can only follow $v_{i}$ onto that processor to avoid the communication between them. For example, when the processor for task $v_{1}$ in Fig. 2 is selected first, the EFT of $v_{1}$ on processor $P_{1}$ is 20 and the EFT of $v_{1}$ on processor $P_{2}$ is 15, so $v_{1}$ is assigned to $P_{2}$. When the processor for task $v_{2}$ is then selected, the EFT of $v_{2}$ on processor $P_{1}$ is 15 $+$ 24 $+$ 10 $=$ 49 and the EFT of $v_{2}$ on processor $P_{2}$ is 15 $+$ 30 $=$ 45. Therefore, $v_{2}$ is assigned to $P_{2}$. However, if $v_{1}$ and $v_{2}$ are treated as a whole when the processor is selected, their EFT on processor $P_{1}$ is 20 $+$ 10 $=$ 30 and their EFT on processor $P_{2}$ is 15 $+$ 30 $=$ 45, so $v_{1}$ and $v_{2}$ are assigned to $P_{1}$.

To address this problem, our algorithm proposes a task merging strategy, which is described as follows.

Tasks in the DAG graph will be merged together if they satisfy the merging constraints, and they will be processed as a whole in the following task prioritization and processor selection phases. Merging constraints of tasks are as follows:

$\begin{cases}v_{j}\in\textit{succ}(v_{i})\\ \textit{succCount}(v_{i})=1\wedge\textit{isMerg}(v_{i})=\textit{false}\\ \textit{predCount}(v_{j})=1\wedge\textit{isMerg}(v_{j})=\textit{false}\\ \overline{c_{i,j}}\geqslant\overline{w_{j}}\\ \end{cases}$ (10)

where: $\textit{succCount}(v_{i})$ is the number of immediate successors of $v_{i}$ . $\textit{predCount}(v_{j})$ is the number of immediate predecessors of $v_{j}$ . $\textit{isMerg}(v_{i})=\textit{false}$ indicates that $v_{i}$ is not merged with other nodes in the DAG. $v_{j}$ will be merged into $v_{i}$ if they satisfy the above constraints, and the merged task is denoted by $v_{i}^{\ast}$ .

For example, Fig. 3 is obtained by merging the tasks in Fig. 1 that satisfy the constraints. $v_{3}$ has a unique immediate successor $v_{7}$, i.e. $\textit{succCount}(v_{3})=1$, and $v_{3}$ is the unique immediate predecessor of $v_{7}$, i.e. $\textit{predCount}(v_{7})=1$. Meanwhile, $\overline{c_{3,7}}>\overline{w_{7}}$ ($\overline{c_{3,7}}=74$, $\overline{w_{7}}=25$). Thus $v_{3}$ and $v_{7}$ satisfy the merging conditions, and $v_{7}$ is merged into $v_{3}$; the merged task is denoted by $v_{3}^{\ast}$.
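
The merging constraints of Eq. (10) can be checked with a short Python sketch. The graph fragment below is hypothetical: only $\overline{c_{3,7}}=74$ and $\overline{w_{7}}=25$ are taken from the example above, while the remaining edges and weights are placeholders chosen for illustration, as is the function name `merge_tasks`.

```python
def merge_tasks(succ, pred, avg_w, avg_c):
    """Return the (i, j) pairs for which v_j is merged into v_i under Eq. (10)."""
    merged = set()   # tasks that already take part in a merge (isMerg = true)
    pairs = []
    for i in succ:
        if len(succ[i]) != 1 or i in merged:
            continue                      # v_i needs exactly one successor
        j = succ[i][0]
        if len(pred[j]) == 1 and j not in merged and avg_c[i, j] >= avg_w[j]:
            pairs.append((i, j))
            merged.update((i, j))
    return pairs


# Hypothetical fragment; only c(3,7) = 74 and avg_w[7] = 25 come from the text.
succ = {1: [3, 5], 3: [7], 5: [10], 7: [10], 10: []}
pred = {1: [], 3: [1], 5: [1], 7: [3], 10: [5, 7]}
avg_w = {1: 29.0, 3: 34.0, 5: 30.0, 7: 25.0, 10: 23.0}
avg_c = {(1, 3): 20, (1, 5): 20, (3, 7): 74, (5, 10): 40, (7, 10): 40}

pairs = merge_tasks(succ, pred, avg_w, avg_c)
print(pairs)
```

In this fragment only $v_{3}$ and $v_{7}$ qualify: $v_{1}$ has two successors, and $v_{10}$ has two predecessors, so neither $v_{5}$ nor $v_{7}$ can absorb it.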

Figure 3.

After merging the DAG in Fig. 1.

4.1.2 Task prioritization phase

In the task prioritization phase, the priority of each node is set to its $\textit{rank}_{u}$ value. The task list is sorted in descending order of $\textit{rank}_{u}$.

4.1.3 Processor selection phase

To select a processor for a task, the earliest finish time of the task on each processor will be first calculated. The insertion-based policy is applied for computing EFT and the possibility of inserting tasks in the earliest idle time slot between two scheduled tasks on the identical processor should be taken into consideration.

The idle time slot must be at least as long as the computation cost of the task to be scheduled, and scheduling the task in this slot must preserve the precedence constraints.
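
A minimal sketch of the insertion-based slot search on a single processor is given below. The function name `earliest_start` and the interval representation are assumptions made for illustration; `ready_time` stands for the time at which all input data of the task have arrived on that processor, which is what preserves the precedence constraints.

```python
def earliest_start(busy, ready_time, duration):
    """Earliest start time of a task on one processor using insertion.

    busy: list of (start, finish) intervals already scheduled on the
          processor, sorted by start time and non-overlapping.
    ready_time: time at which all input data of the task are available.
    duration: computation cost of the task on this processor.
    """
    start = ready_time
    for slot_start, slot_finish in busy:
        if start + duration <= slot_start:
            return start              # the task fits in the idle gap before this interval
        start = max(start, slot_finish)
    return start                      # no gap fits: run after the last scheduled task


busy = [(0, 10), (20, 30)]
print(earliest_start(busy, ready_time=5, duration=8))    # fits between the two intervals
print(earliest_start(busy, ready_time=5, duration=12))   # too long for the gap
```

The EFT of the task on that processor is then the returned start time plus `duration`, in line with Eq. (5).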

Then the $\textit{EFT}^{\ast}$ for the current task $v_{i}$ is calculated, which is based on $\textit{CNP}(v_{i})$ and OCT. $\textit{CNP}(v_{i})$ denotes whether $v_{i}$ is the parent of CN. If $v_{i}$ is not a CN and has an immediate successor as the CN, then $\textit{CNP}(v_{i})=\textit{true}$ ; otherwise $\textit{CNP}(v_{i})=\textit{false}$ , i.e.,

$\textit{CNP}(v_{i})=\begin{cases}\textit{true},&\text{if }\exists v_{j}\in\textit{succ}(v_{i}),v_{j}\text{ is }\textit{CN}\text{ and }v_{i}\text{ is not }\textit{CN}\\ \textit{false},&\text{otherwise}\\ \end{cases}$ (11)

If the current task is the parent of a CN, the value of $\textit{EFT}^{\ast}$ is equal to EFT; otherwise, the value is the sum of EFT and OCT, i.e.,

$\textit{EFT}^{\ast}(v_{i},p_{j})=\begin{cases}\textit{EFT}(v_{i},p_{j}),&\text{if }\textit{CNP}(v_{i})=\textit{true}\\ \textit{EFT}(v_{i},p_{j})+\textit{OCT}(v_{i},p_{j}),&\text{otherwise}\\ \end{cases}$ (12)

Finally, select the processor with the minimum $\textit{EFT}^{\ast}$ value for the current task.

This processor selection policy shortens the schedule length in two ways. First, if the current task has an immediate successor that is a CN (namely, the current task is the parent of a CN), the current task is assigned to the processor that achieves the minimum EFT, which may advance the earliest start time of the critical task and thus reduce the schedule length. Second, if the current task is not the parent of a critical task, the task is assigned to a processor based not only on the EFT of the current task but also on the predicted impact of the chosen processor on the path length from the current task’s child nodes to the exit node. Therefore, although the finish time on the chosen processor is not always the earliest, this policy is expected to yield a shorter finish time for the subsequent tasks, which reduces the schedule length.
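
The selection rule of Eq. (12) can be sketched as follows, using the EFT, OCT and CNP values listed for tasks 1 and 2 in Tables 2 and 3. The function name `select_processor` is introduced here for illustration only.

```python
def select_processor(eft, oct_row, cnp):
    """Apply Eq. (12) and return (index of the chosen processor, EFT* values)."""
    if cnp:
        eft_star = list(eft)                              # parent of a CN: plain EFT
    else:
        eft_star = [e + o for e, o in zip(eft, oct_row)]  # otherwise EFT + OCT
    best = min(range(len(eft_star)), key=eft_star.__getitem__)
    return best, eft_star


# Task 1 (CNP = False): EFT and OCT values on P1..P3 from Tables 2 and 3.
best_1, star_1 = select_processor([27, 23, 37], [80, 90, 143], cnp=False)
# Task 2 (CNP = True): the OCT row is ignored.
best_2, star_2 = select_processor([68, 51, 82], [43, 61, 68], cnp=True)

print(f"task 1 -> P{best_1 + 1}, EFT* = {star_1}")
print(f"task 2 -> P{best_2 + 1}, EFT* = {star_2}")
```

Task 1 illustrates the second case: $P_{2}$ gives the smallest EFT (23), but adding the OCT makes $P_{1}$ the better choice, which agrees with the first row of Table 3.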

4.2 The detailed description of the MTPEFT algorithm

The pseudo code of MTPEFT is shown as Algorithm 1.

In the algorithm, tasks that satisfy the merging conditions are first merged in Lines 1–5. Second, the CNP, $\textit{rank}_{u}$, $\textit{rank}_{d}$ and OCT of all of the tasks are calculated in Line 6. Third, an empty ready-list is created, and the entry task is placed at the top of the list in Line 7. In every iteration of the while loop (Lines 8–20), the task with the highest $\textit{rank}_{u}$ value in the ready-list is selected, and its $\textit{EFT}^{\ast}$ values on all of the processors are calculated. In Line 18, the processor $p_{k}$ with the minimum $\textit{EFT}^{\ast}(v_{i},p_{k})$ is selected to execute task $v_{i}$.

The time complexities of each step in the algorithm are as follows:

1.
The time complexity of tasks merging is $O(v)$ ;
2.
For calculating CNP of all the tasks, there is $O(e+v)$ , where $e$ is the number of edges;
3.
For calculating $\textit{rank}_{u}$ and $\textit{rank}_{d}$ of all the tasks, both are $O(e+v)$ ;
4.
For calculating OCT of all the tasks, there is $O(p\cdot(v+e))$ ;
5.
For assigning processors to all the tasks, there is $O(v^{2}\cdot p)$ .

Therefore, the general time complexity of MTPEFT is $O(v^{2}\cdot p)$ , which is equivalent to those of PEFT and HEFT.

Table 2 shows the values of OCT, $\textit{rank}_{u}$ , $\textit{rank}_{d}$ and CNP for all tasks in Fig. 1 by utilizing the MTPEFT algorithm. Table 3 shows the procedure of selecting processors for all the tasks in Fig. 1 with the MTPEFT algorithm.

Algorithm 1 MTPEFT algorithm

Input: DAG $G$

Output: Schedule Map

1. for each node $v_{i}$ do
2.   if $v_{i}$ has a unique immediate successor $v_{j}$ and $v_{i}$ and $v_{j}$ satisfy merging conditions then
3.     $v_{j}$ can be merged into $v_{i}$, and the merged node is denoted by $v_{i}^{\ast}$
4.   end if
5. end for
6. Compute the CNP, $\textit{rank}_{u}$, $\textit{rank}_{d}$ and OCT table for each node;
7. Create Empty List ready-list and put $v_{\textit{entry}}$ as initial task;
8. while ready-list is not empty do
9.   $v_{i}\leftarrow$ the task with highest $\textit{rank}_{u}$ from ready-list
10.   for each processor $p_{k}$ in the processor set $P$ do
11.     Compute the $\textit{EFT}(v_{i},p_{k})$ value using insertion-based scheduling policy
12.     if $\textit{CNP}(v_{i})=\textit{true}$ then
13.       Compute $\textit{EFT}^{\ast}(v_{i},p_{k})=\textit{EFT}(v_{i},p_{k})$
14.     else
15.       Compute $\textit{EFT}^{\ast}(v_{i},p_{k})=\textit{EFT}(v_{i},p_{k})+\textit{OCT}(v_{i},p_{k})$
16.     end if
17.   end for
18.   Assign task $v_{i}$ to the processor $p_{k}$ that minimizes $\textit{EFT}^{\ast}$ of task $v_{i}$
19.   Update ready-list
20. end while
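
The main loop of Algorithm 1 (Lines 7–20) can be sketched in Python as shown below. This is a simplified illustration under stated assumptions rather than a reference implementation: the merging phase is omitted, tasks are appended after the last task on each processor instead of using the insertion-based policy, and $\textit{rank}_{u}$, OCT and CNP are passed in as precomputed tables. The 4-task, 2-processor graph and all identifiers are hypothetical.

```python
def mtpeft_schedule(succ, pred, w, c, rank_u, oct_table, cnp, n_procs):
    """Simplified list-scheduling loop of Algorithm 1 (no merging, no insertion)."""
    aft, proc_of = {}, {}                 # actual finish time and processor per task
    avail = [0.0] * n_procs               # time at which each processor becomes free
    ready = [v for v in succ if not pred[v]]          # entry task(s)
    while ready:
        v = max(ready, key=lambda t: rank_u[t])       # highest upward rank first
        ready.remove(v)
        best = None
        for k in range(n_procs):
            # Arrival time of all inputs; communication is free on the same processor.
            data_ready = max(
                (aft[m] + (0 if proc_of[m] == k else c[m, v]) for m in pred[v]),
                default=0.0,
            )
            eft = max(avail[k], data_ready) + w[v][k]
            eft_star = eft if cnp[v] else eft + oct_table[v][k]   # Eq. (12)
            if best is None or eft_star < best[0]:
                best = (eft_star, eft, k)
        _, aft[v], proc_of[v] = best
        avail[proc_of[v]] = aft[v]
        for s in succ[v]:                 # a successor is ready once all parents finished
            if all(m in aft for m in pred[s]):
                ready.append(s)
    return proc_of, max(aft.values())


# Hypothetical example graph with precomputed rank_u, OCT and CNP tables.
succ = {0: [1, 2], 1: [3], 2: [3], 3: []}
pred = {0: [], 1: [0], 2: [0], 3: [1, 2]}
w = {0: [4, 6], 1: [8, 4], 2: [3, 5], 3: [2, 2]}
c = {(0, 1): 3, (0, 2): 10, (1, 3): 4, (2, 3): 1}
rank_u = {0: 22.0, 1: 12.0, 2: 7.0, 3: 2.0}
oct_table = {0: [9, 7], 1: [2, 2], 2: [2, 2], 3: [0, 0]}
cnp = {0: False, 1: True, 2: False, 3: False}

assignment, makespan = mtpeft_schedule(succ, pred, w, c, rank_u, oct_table, cnp, 2)
print(assignment, makespan)
```

Each iteration evaluates every processor once and scans the predecessors of the selected task, so the loop stays within the $O(v^{2}\cdot p)$ bound stated for the processor assignment step.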

Figure 4 shows the scheduling result of the DAG graph in Fig. 1 by algorithms of MTPEFT, PEFT, HEFT and CPOP. The scheduling length of MTPEFT is 153, which is shorter than those of PEFT, HEFT and CPOP (183, 202 and 207, respectively).

5. Experimental results and discussion

In this section, a comparison among MTPEFT, HEFT, CPOP and PEFT will be conducted from aspects of randomly generated DAGs and real-world application graphs. The primary comparison metrics are offered first.

5.1 Comparison metrics

To compare the algorithms, three metrics are defined as follows.

(1)
Scheduling Length Ratio (SLR)

SLR [10, 21] is the normalization of schedule length, i.e.

$\textit{SLR}=\frac{\textit{makespan}(\textit{solution})}{\sum\nolimits_{v_{i}\in\textit{CP}_{\textit{MIN}}}{\min_{p_{j}\in P}(w_{i,j})}}$ (13)

where, the denominator is the sum of the minimum computation cost of the critical tasks. A lower SLR indicates a superior algorithm.

Table 2

Optimistic Cost Table (OCT), $\textit{rank}_{u}$, $\textit{rank}_{d}$ and CNP

Task	OCT ($P_{1}$)	OCT ($P_{2}$)	OCT ($P_{3}$)	$\textit{rank}_{u}$	$\textit{rank}_{d}$	$\textit{rank}_{u}+\textit{rank}_{d}$	CNP
1	80	90	143	356.0	0.0	356.0	False
2	43	61	68	184.0	44.0	228.0	True
$3^{\ast}$	14	19	34	102.3	68.0	170.3	False
4	43	61	68	178.3	55.0	233.3	True
5	43	42	96	236.0	120.0	356.0	False
6	41	61	68	143.7	91.0	234.7	False
8	14	19	37	91.7	143.0	234.7	True
9	14	19	37	143.3	212.7	356.0	False
10	0	0	0	23.3	332.7	356.0	False

Table 3

Schedule produced by the MTPEFT algorithm in each iteration

Step	Ready list	Task selected	EFT ($P_{1}$)	EFT ($P_{2}$)	EFT ($P_{3}$)	$\textit{EFT}_{\textit{OCT}}$ ($P_{1}$)	$\textit{EFT}_{\textit{OCT}}$ ($P_{2}$)	$\textit{EFT}_{\textit{OCT}}$ ($P_{3}$)	$\textit{EFT}^{\ast}$ ($P_{1}$)	$\textit{EFT}^{\ast}$ ($P_{2}$)	$\textit{EFT}^{\ast}$ ($P_{3}$)	CPU selected
1	1	1	27	23	37	107	113	180	107	113	180	$P_{1}$
2	2, $3^{\ast}$, 4, 5, 6	5	49	138	165	92	180	261	92	180	261	$P_{1}$
3	2, $3^{\ast}$, 4, 5	2	68	51	82	111	112	150	68	51	82	$P_{2}$
4	2, $3^{\ast}$, 5	4	56	60	90	99	121	158	56	60	90	$P_{1}$
5	$3^{\ast}$, 5, 9	6	83	100	114	124	161	182	124	161	182	$P_{1}$
6	5, 8, 9	9	112	135	171	126	154	208	126	154	208	$P_{1}$
7	8, 9	$3^{\ast}$	178	137	106	192	156	140	192	156	140	$P_{3}$
8	9	8	139	156	145	153	175	182	139	156	145	$P_{1}$
9	10	10	153	214	232	153	214	232	153	214	232	$P_{1}$

In each iteration, the processor that would be assigned to the current task based on EFT, $\textit{EFT}_{\textit{OCT}}$ or $\textit{EFT}^{\ast}$ is the one with the minimum value in the corresponding group of columns.

Figure 4.
Schedules of the DAG In Fig. 1 with (a) MTPEFT (Makespan $=$ 153) (b) PEFT (Makespan $=$ 183) (c) HEFT (Makespan $=$ 202) (d) CPOP (Makespan $=$ 207).

(2)
Number of Occurrences of Better Quality of Schedules (NOBQS)

The NOBQS is the percentage of better, equal and worse schedule lengths produced by one algorithm compared with another algorithm.
(3)
Slack

Slack [10, 33] is a measure of the robustness of the schedules produced by an algorithm to uncertainty in the task processing time, and can be defined as

$\textit{Slack}=\left[\sum\limits_{v_{i}\in V}\left(M-b_{\textit{level}}(v_{i})-t_{\textit{level}}(v_{i})\right)\right]\Big/n$ (14)

where $M$ is the makespan of the DAG, $n$ is the number of tasks, $b_{\textit{level}}$ is the length of the longest path from task $v_{i}$ to the exit task, and $t_{\textit{level}}$ denotes the length of the longest path from the entry task to task $v_{i}$ (not including $v_{i}$ ).
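
As a worked illustration of the SLR in Eq. (13), the sketch below uses the makespans reported in Fig. 4 together with the costs in Table 1. It assumes that the critical path consists of tasks 1, 5, 9 and 10, i.e., the tasks whose $\textit{rank}_{u}+\textit{rank}_{d}$ equals 356.0 in Table 2; this is an inference from the table rather than a statement made in the text, and the variable names are illustrative.

```python
# Execution costs on P1..P3 of the assumed critical tasks (rows of Table 1).
W = {1: [27, 23, 37], 5: [22, 20, 47], 9: [29, 23, 59], 10: [14, 19, 37]}
critical_path = [1, 5, 9, 10]          # assumption inferred from Table 2

# Makespans reported in Fig. 4.
makespans = {"MTPEFT": 153, "PEFT": 183, "HEFT": 202, "CPOP": 207}

# Denominator of Eq. (13): sum of the minimum cost of each critical task.
cp_min = sum(min(W[t]) for t in critical_path)
slr = {name: m / cp_min for name, m in makespans.items()}

for name, value in slr.items():
    print(f"{name}: SLR = {value:.3f}")
```

Under this assumption the denominator is $23+20+23+14=80$, so every algorithm is normalized by the same lower bound and the ordering of the SLR values follows the ordering of the makespans.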

5.2 Randomly generated application graphs

In this section, the method used to generate the random graphs for the experiments is presented, and the performance of the MTPEFT, HEFT, CPOP and PEFT algorithms is compared.

5.2.1 Random graph generator

A composite DAG generator [10, 34] is adopted as the random graph generator in this paper and its primary parameters are as follows:

•
$n$ : The number of nodes in the DAG;
•
fat: Affects the height and width of the DAG. The width of each level is drawn from a uniform distribution with mean $\textit{fat}\cdot\sqrt{n}$, and levels are added until $n$ tasks are defined in the DAG, which determines the number of levels (the height). The width of the DAG is the maximum number of tasks that can be executed concurrently;
•
density: Determines the number of edges between the two levels of the DAG;
•
regularity: Determines the uniformity of task numbers in every level; a lower value indicates that there is a dissimilarity of task numbers among levels;
•
jump: Distance of an edge from level $l$ to level $l+\textit{jump}$ ; a jump of 1 indicates a direct connection between two consecutive levels.

Different DAG structures are established by setting different parameter values to the generator. The following arguments are used to calculate the computation and communication costs.

Figure 5.
For randomly generated DAGs: (a) average SLR for different numbers of tasks. The bars represent the 95% confidence intervals of the mean. (b) average slack for different numbers of tasks.

•
CCR (ratio of communication to computation): the ratio between the sum of the edge weights and the sum of the node weights in a DAG;
•
$\beta$ (range of computing cost percentage on the processors): the basic heterogeneous factor of processor speed. If $\beta$ is high, it indicates a higher degree of heterogeneity and distinct computation costs among processors, while a low one means that the computation costs for a specific task are almost equal among processors [10, 21]; the average computing cost $\overline{w_{i}}$ of task $n_{i}$ is randomly selected in the range of the uniform distribution $[0,2\times\overline{w_{\textit{DAG}}}]$ , where $\overline{w_{\textit{DAG}}}$ is the average computing cost of a given DAG that is randomly generated. The computing cost of task $n_{i}$ for each processor $p_{j}$ is randomly set in the following range:

$\overline{w_{i}}\times\left({1-\frac{\beta}{2}}\right)\leqslant w_{i,j}\leqslant\overline{w_{i}}\times\left({1+\frac{\beta}{2}}\right)$ (15)
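As a rough illustration of Eq. (15) and the CCR definition above, the following Python sketch draws the per-processor computation costs of one task and computes the CCR of a small graph. The function names `sample_costs` and `ccr`, and the use of Python's `random` module, are assumptions made for illustration; this is not the generator used in the experiments.

```python
import random


def sample_costs(w_dag_mean, beta, num_procs, rng):
    """Draw the costs of one task on each processor, following Eq. (15).

    w_dag_mean: average computing cost of the DAG (w_DAG bar).
    beta:       heterogeneity factor.
    rng:        a random.Random instance (hypothetical choice of RNG).
    """
    # Average cost of the task, uniform in [0, 2 * w_DAG bar].
    w_bar = rng.uniform(0.0, 2.0 * w_dag_mean)
    # Per-processor cost bounds from Eq. (15).
    low = w_bar * (1.0 - beta / 2.0)
    high = w_bar * (1.0 + beta / 2.0)
    costs = [rng.uniform(low, high) for _ in range(num_procs)]
    return w_bar, costs


def ccr(edge_weights, node_weights):
    """Ratio of the summed edge weights to the summed node weights."""
    return sum(edge_weights) / sum(node_weights)


if __name__ == "__main__":
    rng = random.Random(0)
    w_bar, costs = sample_costs(w_dag_mean=50.0, beta=0.5, num_procs=4, rng=rng)
    print(w_bar, costs)
    print(ccr([10, 20], [15, 15]))
```

With a small $\beta$ the sampled costs stay close to $\overline{w_{i}}$ on every processor, whereas $\beta=2$ allows costs anywhere between 0 and $2\overline{w_{i}}$.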

In the experiments, the following parameter values are used to generate the random graphs.

•
$n=[10,20,30,40,50,60,70,80,90,100,120,150,200,250,300,350,400]$
•
$\textit{fat}=[0.1,0.4,0.8]$
•
$\textit{density}=[0.2,0.8]$
•
$\textit{regularity}=[0.2,0.8]$
•
$\textit{jump}=[1,2,4]$
•
$\textit{CCR}=[0.1,0.25,0.8,1,2,5,8,10]$
•
$\beta=[0.1,0.2,0.5,0.75,1,2]$
•
$\textit{Processors}=[4,8,16,32]$

Exactly 117,504 DAGs are generated from the combinations of these parameter values, and for each DAG, 20 random graphs are generated with the same structure but with different node and edge weights. Thus, the total number of DAGs is 2,350,080.
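The two totals follow directly from the sizes of the parameter lists. A minimal check in Python, where the dictionary name `params` is only an illustrative choice:

```python
from math import prod

# Parameter values listed above for the random graph experiments.
params = {
    "n": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100,
          120, 150, 200, 250, 300, 350, 400],
    "fat": [0.1, 0.4, 0.8],
    "density": [0.2, 0.8],
    "regularity": [0.2, 0.8],
    "jump": [1, 2, 4],
    "CCR": [0.1, 0.25, 0.8, 1, 2, 5, 8, 10],
    "beta": [0.1, 0.2, 0.5, 0.75, 1, 2],
    "processors": [4, 8, 16, 32],
}

# One DAG per combination of parameter values.
n_combinations = prod(len(values) for values in params.values())
# 20 weight assignments per combination.
n_total = n_combinations * 20

print(n_combinations)  # 117504
print(n_total)         # 2350080
```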
5.2.2 Performance results

For randomly generated DAGs, the average SLR and the average slack for different numbers of tasks are shown in Fig. 5a and b respectively.

Compared with the HEFT algorithm, the MTPEFT algorithm achieves an average SLR that is 15% lower when the number of tasks is 10; the improvement decreases to 8.4% and 6.6% when the number of tasks reaches 100 and 400, respectively. This implies that the effect of key child nodes on the scheduling decisions for parent nodes decreases as the number of tasks increases. Meanwhile, the MTPEFT algorithm achieves a smaller average slack than the HEFT algorithm for all task quantities. Compared with the PEFT and CPOP algorithms, the MTPEFT algorithm also obtains a lower average SLR and a lower average slack. Thus, compared with the CPOP, HEFT and PEFT algorithms, the MTPEFT algorithm not only improves the SLR but also achieves better robustness. For random graphs, CPOP is the worst algorithm. This is because CPOP simply schedules all critical nodes on the key processor, without considering how the earliest start time of a critical node is affected by non-critical nodes assigned to the key processor and by the parents of that critical node.

Figure 6a presents the average SLR of the algorithms for CCR values of [0.1, 0.5, 0.8, 1, 2, 5, 8, 10]. The MTPEFT algorithm outperforms the other algorithms in terms of the average SLR for random task graphs at every CCR value. The average SLRs for different heterogeneity values are illustrated in Fig. 6b. The average SLR obtained by the MTPEFT algorithm is smaller than those of the CPOP, HEFT and PEFT algorithms by (7.9%, 3.8%, 3.7%), (7.5%, 4.4%, 3.8%), (8.7%, 5%, 4%), (9.8%, 5.7%, 4.2%), (12.0%, 6.7%, 4.4%) and (21.8%, 13.6%, 4.5%) when the heterogeneity value is equal to 0.1, 0.2, 0.5, 0.8, 1 and 2, respectively.

Table 4 lists the percentages of better, equal and worse schedule lengths generated by MTPEFT compared with the other algorithms. Compared with the CPOP, HEFT and PEFT algorithms, the MTPEFT algorithm produces a shorter schedule in 87%, 82% and 67% of the cases, an equal schedule in 3%, 6% and 21% of the cases, and a longer schedule in 10%, 12% and 12% of the cases, respectively.

Table 4
Pairwise schedule length comparison of the scheduling algorithms

		MTPEFT	PEFT	HEFT	CPOP
MTPEFT	Better	*	67%	82%	87%
	Worse		12%	12%	10%
	Equal		21%	6%	3%
PEFT	Better	12%	*	70%	74%
	Worse	67%		25%	20%
	Equal	21%		5%	6%
HEFT	Better	12%	25%	*	50%
	Worse	82%	70%		39%
	Equal	6%	5%		11%
CPOP	Better	10%	20%	39%	*
	Worse	87%	74%	50%
	Equal	3%	6%	11%
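The entries of Table 4 are pairwise counts over the same set of task graphs. The following Python sketch shows one way such percentages could be computed from two lists of schedule lengths; the function name `pairwise_comparison`, the tolerance `tol` and the sample values are hypothetical and are not taken from the experiments.

```python
def pairwise_comparison(lengths_a, lengths_b, tol=1e-9):
    """Percentage of graphs on which A is better, equal or worse than B.

    lengths_a, lengths_b: schedule lengths of algorithms A and B on the
    same graphs, in the same order. A shorter schedule counts as better.
    """
    better = equal = worse = 0
    for a, b in zip(lengths_a, lengths_b):
        if abs(a - b) <= tol:
            equal += 1
        elif a < b:
            better += 1
        else:
            worse += 1
    total = better + equal + worse
    return {
        "better": 100.0 * better / total,
        "equal": 100.0 * equal / total,
        "worse": 100.0 * worse / total,
    }


# Hypothetical schedule lengths on four graphs.
result = pairwise_comparison([100, 90, 120, 80], [110, 90, 115, 95])
print(result)  # {'better': 50.0, 'equal': 25.0, 'worse': 25.0}
```

Swapping the two arguments exchanges the "better" and "worse" percentages, which is why the rows of Table 4 mirror each other.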

Figure 6.

For randomly generated DAGs: (a) Average SLR for different CCRs. The bars represent the 95% confidence intervals of the mean. (b) Average SLR for different heterogeneous values. The bars represent the 95% confidence intervals of the mean.

5.3 Real-world application graphs

In addition to randomly generated task graphs, application graphs of three real-world problems are also considered: Fast Fourier Transform [21], Montage [35] and LIGO [36]. In these graphs, the values of CCRs, $\beta$ and processors are set as follows:

•
$\textit{CCR}=[0.1,0.25,0.8,1,2,5,8,10];$
•
$\beta=[0.1,0.2,0.5,0.75,1,2];$
•
$\textit{Processors}=[2,4,8,16,32]$ .

5.3.1 Fast Fourier transform

The Fast Fourier Transform (FFT) is composed of recursive calls and butterfly operations. The number of FFT points ( $n$ ) determines the number of tasks: there are $2\times(n-1)$ recursive call tasks and $n\log_{2}n$ butterfly operation tasks. In this experiment, $n$ takes the values [2, 4, 8]. The combination of CCRs, $\beta$ , processors and $n$ produces 720 distinct DAGs. For each DAG, 1000 random graphs are generated with the same structure but with different edge and node weights. As a result, the total number of DAGs is 720,000.

Figure 7.

For the Fast Fourier Transform DAGs: (a) Average SLR for different heterogeneous values. The bars represent the 95% confidence intervals of the mean. (b) Average SLR for different CCRs. The bars represent the 95% confidence intervals of the mean.

The comparisons of the average SLRs for different levels of heterogeneity and for different CCRs are shown in Fig. 7a and b, respectively. The experimental analysis indicates that the average SLRs of the MTPEFT algorithm are 1.4%, 1.8%, 3.2%, 5.5%, 7.2% and 17.8% smaller than those of the HEFT algorithm and 2.6%, 2.8%, 3.1%, 5.0%, 5.5% and 5.8% smaller than those of the PEFT algorithm when the heterogeneity values are 0.1, 0.2, 0.5, 0.75, 1 and 2, respectively. For different CCRs, the MTPEFT algorithm obtains a smaller average SLR than the HEFT and PEFT algorithms. In this type of application, the results obtained by the MTPEFT algorithm are similar to those obtained by the CPOP algorithm.

5.3.2 Montage

The second real-world application DAG is the Montage task DAG with 25 tasks. The combination of CCRs, $\beta$ and processors produces 240 different DAGs. For each DAG, 1000 random graphs are generated with the same structure but with different edge and node weights. Consequently, the experiment covers a total of 240,000 DAGs.

The comparisons of the average SLRs for different levels of heterogeneity and for different CCRs are illustrated in Fig. 8a and b, respectively. The experimental analysis shows that, when the heterogeneity takes the values 0.1, 0.2, 0.5, 0.75, 1 and 2, the average SLRs of the MTPEFT algorithm are 12.7%, 13.0%, 13.7%, 14.7%, 15.0% and 23.2% smaller than those of the CPOP algorithm, 8.3%, 9.2%, 10.5%, 11.0%, 11.7% and 17.9% smaller than those of the HEFT algorithm, and 4.7%, 5.2%, 4.9%, 5.0%, 4.3% and 4.0% smaller than those of the PEFT algorithm, respectively. For different CCRs, the MTPEFT algorithm obtains a smaller average SLR than the CPOP, HEFT and PEFT algorithms.

Figure 8.

For Montage DAGs: (a) Average SLR for different heterogeneous values. The bars represent the 95% confidence intervals of the mean. (b) Average SLR for different CCRs. The bars represent the 95% confidence intervals of the mean.

Figure 9.

For LIGO DAGs: (a) Average SLR for different heterogeneous values. The bars represent the 95% confidence intervals of the mean. (b) Average SLR for different CCRs. The bars represent the 95% confidence intervals of the mean.

5.3.3 LIGO

The third real-world application DAG is the LIGO task DAG with 30 and 50 tasks. For each task number, the combination of CCRs, $\beta$ and processors produces 240 different DAGs. For each DAG, 1000 random graphs are generated with the same structure but with different edge and node weights. Thus, the total number of DAGs is 480,000.

Figure 9 presents the average SLR as a function of different parameters. In this type of application, MTPEFT obtains lower SLR values than the other algorithms. In particular, the higher $\beta$ is, the larger the SLR improvement obtained by the MTPEFT algorithm (see Fig. 9a). Concerning the CCRs, the average SLR improvement of MTPEFT over HEFT is 2.5% for CCR $=$ 0.1 and increases to 14.8% for CCR $=$ 10 (see Fig. 9b). In contrast, the average SLR improvement of MTPEFT over CPOP is 24% for CCR $=$ 0.1 and decreases to 16.9% for CCR $=$ 10.

6. Conclusions

We have proposed a new task scheduling algorithm, MTPEFT, with quadratic time complexity for heterogeneous computing systems. The MTPEFT algorithm first merges tasks that satisfy the merging constraints, then assigns a priority to each task, and finally selects a processor for each task. In the processor selection phase, the impact of a parent task on the critical task is considered: each parent of a critical task is assigned to the processor that minimizes its EFT, which may advance the earliest start time of the critical task and improve scheduling performance. When selecting a processor for a task that has no critical task among its children, the impact of the assignment on all children of the current task is taken into account. In this case, the selected processor does not always give the earliest completion time for the current task, but it ensures shorter finish times for the tasks scheduled in the following steps, which consequently reduces the schedule length of the DAG.

Our algorithm has the same time complexity as the CPOP, HEFT and PEFT algorithms but performs better. Regarding schedule length, the MTPEFT algorithm achieved more favorable results than the HEFT and PEFT algorithms on randomly generated graphs when the number of tasks ranges from 10 to 300. The robustness of the MTPEFT algorithm is also better than that of the CPOP, HEFT and PEFT algorithms. In addition, the MTPEFT algorithm outperformed the CPOP, HEFT and PEFT algorithms in terms of the frequency of the best results. The MTPEFT algorithm also outperformed the HEFT and PEFT algorithms on real-world application graphs.

In our future work, we intend to extend the algorithm by taking Quality of Service parameters such as cost, reliability and energy into consideration.

Conflict of interest

The authors confirm that the content of this article involves no conflict of interest.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61402183 and 61070015), Guangdong Provincial Science and Technology Projects (Grant Nos. 2014B090901028, 2014B010117001, 2014A010103022 and 2014A010103008), and the Fundamental Research Funds for the Central Universities, SCUT (No. x2jsD2153930).

References

1. Dai and Zhang, A synthesized heuristic task scheduling algorithm, The Scientific World Journal 2014(10) (2014), 1–9.

2. Feitelson D.G., Rudolph, Schwiegelshohn, Sevcik K.C. and Wong, Theory and practice in parallel job scheduling, in: Proceedings of the Job Scheduling Strategies for Parallel Processing, Springer (1997), 1–34.

3. Hagras and Janecek, A simple scheduling heuristic for heterogeneous computing environments, in: Proceedings of the Parallel and Distributed Computing, International Symposium on, IEEE (2003), 104–110.

4. Ilavarasan and Thambidurai, Low complexity performance effective task scheduling algorithm for heterogeneous computing environments, Journal of Computer Science 3(2) (2007), 94–103.

5. Ilavarasan, Thambidurai and Mahilmannan, High performance task scheduling algorithm for heterogeneous computing system, Distributed and Parallel Computing, Springer (2005), 193–203.

6. Hagras and Janeček, A high performance, low complexity algorithm for compile-time task scheduling in heterogeneous systems, Parallel Computing 31(7) (2005), 653–670.

7. Boeres, Filho J.V. and Rebello V.E.F., A cluster-based strategy for scheduling task on heterogeneous processors, in: Proceedings of 16th Symposium on Computer Architecture and High Performance Computing, IEEE (2004), 214–221.

8. Ahmad and Kwok Y.K., On exploiting task duplication in parallel program scheduling, IEEE Transactions on Parallel and Distributed Systems 9(9) (1998), 872–892.

9. Yang and Gerasoulis, DSC: Scheduling parallel tasks on an unbounded number of processors, IEEE Transactions on Parallel and Distributed Systems 5(9) (1994), 951–967.

10. Arabnejad and Barbosa J.G., List scheduling algorithm for heterogeneous systems by an optimistic cost table, IEEE Transactions on Parallel and Distributed Systems 25(3) (2014), 682–694.

11. Bajaj and Agrawal D.P., Improving scheduling of tasks in a heterogeneous environment, IEEE Transactions on Parallel and Distributed Systems 15(2) (2004), 107–118.

12. Bittencourt L.F., Sakellariou and Madeira E.R.M., Dag scheduling using a lookahead variant of the heterogeneous earliest finish time algorithm, in: Proceedings of 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, IEEE (2010), 27–34.

13. Choe T.Y. and Park C.I., A task duplication based scheduling algorithm with optimality condition in heterogeneous systems, in: Proceedings of the International Conference on Parallel Processing Workshops, IEEE (2002), 531–536.

14. Cirou and Jeannot, Triplet: A clustering scheduling algorithm for heterogeneous systems, in: Proceedings of the International Conference on Parallel Processing Workshops, Valencia, IEEE (2001), 231–236.

15. Daoud M.I. and Kharma, A high performance algorithm for static task scheduling in heterogeneous distributed computing systems, Journal of Parallel and Distributed Computing 68(4) (2008), 399–409.

16. Kwok Y.K. and Ahmad, Dynamic critical-path scheduling: An effective technique for allocating task graphs to multiprocessors, IEEE Transactions on Parallel and Distributed Systems 7(5) (1996), 506–521.

17. Zhou and Shixin, Scheduling algorithm based on critical tasks in heterogeneous environments, Journal of Systems Engineering and Electronics 19(2) (2008), 398–404.

18. Liou J.C. and Palis M.A., An efficient task clustering heuristic for scheduling dags on multiprocessors, in: Proceedings of the Workshop on Resource Management, Symposium of Parallel and Distributed Processing (1996), 152–156.

19. Radulescu and Gemund A.J.C.V., Fast and effective task scheduling in heterogeneous systems, in: Proceeding of the 9th Heterogeneous Computing Workshop, IEEE (2000), 229–238.

20. Topcuoglu, Hariri and Wu M.Y., Task scheduling algorithms for heterogeneous processors, in: Proceeding of the 8th Heterogeneous Computing Workshop, IEEE (1999), 3–14.

21. Topcuoglu, Hariri and Wu M.Y., Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems 13(3) (2002), 260–274.

22. Wang X.L., Huang H.B. and Deng, List scheduling algorithm for static task with precedence constraints for cyber-physical systems, Acta Automatica Sinica 38(11) (2012), 1870–1879.

23. Wang, Zheng, Qiu and Zheng, Research on schedule-based user recommendation model based on improved K-means algorithm, Journal of Computational Methods in Sciences & Engineering 16(3) (2016), 1–10.

24. Zhou, Wang, Yang et al., Decision tree based medical image clustering algorithm in computer-aided diagnoses, Journal of Computational Methods in Sciences & Engineering 15(4) (2015), 645–651.

25. Cao, Shaohe L.V. et al., A comparative study of DAG clustering, in: Proceeding of the International Conference on Information Society, IEEE (2015), 84–89.

26. Canon L.C., Jeannot et al., Comparative evaluation of the robustness of dag scheduling heuristics, in: Proceedings of Grid Computing, Achievements and Prospects (2008), 73–84.

27. Mei and Li, Energy-aware scheduling algorithm with duplication on heterogeneous computing systems, in: Proceedings of ACM/IEEE International Conference on Grid Computing, IEEE (2012), 122–129.

28. Zong, Manzanares, Ruan and Qin, EAD and PEBD: Two energy-aware duplication scheduling algorithms for parallel tasks on homogeneous clusters, IEEE Transactions on Computers 60(3) (2011), 360–374.

29. Liang and Pang, A novel, energy-aware task duplication-based scheduling algorithm of parallel tasks on clusters, Mathematical and Computational Applications 22(2) (2017), 2–13.

30. Liu and Chen, Duplication based energy aware scheduling algorithm in heterogeneous many-core system-on-chip, Journal of Computational Information Systems 11(8) (2015), 2981–2988.

31. Liang, Xiao and Zhang, Energy aware scheduling for precedence constrained parallel tasks in a power-scalable cluster, in: Proceedings of IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, IEEE (2013), 1016–1021.

32. Wang, Khan S.U. et al., Energy-aware parallel task scheduling in a cluster, Future Generation Computer Systems 29(7) (2013), 1661–1670.

33. Shi, Jeannot and Dongarra J.J., Robust task scheduling in non-deterministic heterogeneous computing systems, in: Proceedings of IEEE International Conference on Cluster Computing, IEEE (2006), 1–10.

34. Suter, A synthetic task graph generator, https://github.com/frs69wq/daggen [10 May 2015].

35. Deelman, Singh et al., Pegasus: A framework for mapping complex scientific workflows onto distributed systems, Scientific Programming 13(3) (2005), 219–237.

36. Juve, Chervenak et al., Characterizing and profiling scientific workflows, Future Generation Computer Systems 29(3) (2013), 682–692.

Algorithm 1 MTPEFT algorithm
Input: DAG $G$
Output: Schedule Map
1.	for each node $v_{i}$ do
2.	    if $v_{i}$ has a unique immediate successor $v_{j}$ and $v_{i}$ and $v_{j}$ satisfy the merging conditions then
3.	        merge $v_{j}$ into $v_{i}$ and denote the merged node by $v_{i}^{\ast}$
4.	    end if
5.	end for
6.	Compute the CNP, rank ${}_{u}$ , rank ${}_{d}$ and OCT table for each node
7.	Create an empty list ready-list and put $v_{\textit{entry}}$ in it as the initial task
8.	while ready-list is not empty do
9.	    $v_{i}\leftarrow$ the task with the highest rank ${}_{u}$ in ready-list
10.	    for each processor $p_{k}$ in the processor set $P$ do
11.	        Compute EFT $(v_{i},p_{k})$ using the insertion-based scheduling policy
12.	        if CNP $(v_{i})=$ true then
13.	            Compute EFT ${}^{*}(v_{i},p_{k})=$ EFT $(v_{i},p_{k})$
14.	        else
15.	            Compute EFT ${}^{*}(v_{i},p_{k})=$ EFT $(v_{i},p_{k})+$ OCT $(v_{i},p_{k})$
16.	        end if
17.	    end for
18.	    Assign task $v_{i}$ to the processor $p_{k}$ that minimizes EFT ${}^{*}$ of task $v_{i}$
19.	    Update ready-list
20.	end while
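Steps 10 to 18 of Algorithm 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `select_processor` is hypothetical, the EFT values are taken as given rather than computed with the insertion-based policy, and processors are indexed from 0. The inputs are taken from the worked example's schedule table (steps 2, 7 and 8); the OCT values for step 7 are derived as the EFT ${}_{\rm OCT}$ column minus the EFT column.

```python
def select_processor(eft, oct_row, cnp):
    """Pick the processor that minimizes EFT* for one task.

    eft:     EFT of the task on each processor.
    oct_row: OCT of the task on each processor.
    cnp:     CNP flag of the task; True means EFT* = EFT,
             False means EFT* = EFT + OCT.
    Returns (index of the chosen processor, list of EFT* values).
    """
    if cnp:
        eft_star = list(eft)
    else:
        eft_star = [e + o for e, o in zip(eft, oct_row)]
    best = min(range(len(eft_star)), key=eft_star.__getitem__)
    return best, eft_star


# Step 2, task 5 (CNP false): EFT* = EFT + OCT, processor P1 is chosen.
print(select_processor([49, 138, 165], [43, 42, 96], cnp=False))
# Step 7, merged task 3* (CNP false): processor P3 is chosen.
print(select_processor([178, 137, 106], [14, 19, 34], cnp=False))
# Step 8, task 8 (CNP true): EFT* = EFT, processor P1 is chosen.
print(select_processor([139, 156, 145], [14, 19, 37], cnp=True))
```

The returned index 0 corresponds to $P_{1}$ and index 2 to $P_{3}$, matching the last column of the schedule table for these steps.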

