Abstract
Energy saving in multiprocessor system is emerging as a primary issue in recent years. Mostly the work available has stressed on reducing makespan of application and little attention has been paid to energy management. In this paper, a novel dynamic power management based clustering task scheduling algorithm is proposed for heterogeneous computing platforms. The prime objective of this paper is to reduce the dynamic and static power consumption of the processors. This paper couples clustering scheduling heuristics with dynamic power management technique to reduce energy consumption. The simulation based results of the proposed energy-aware clustering task scheduling algorithm with dynamic power management are analysed and compare with well-known list based heterogeneous earliest finish time (HEFT) and list based energy-aware precedence-constrained tasks algorithm (PASTA) algorithm. The result carried out for an extensive set of random and real world graphs demonstrate the potency of the proposed algorithm.
Keywords
Introduction
Multiprocessor platform is composed of two or more central processing units (CPUs) that execute tasks of parallel application with high performance and low computation cost. The significant turnover in list of most powerful supercomputers by Top500 from previous lists imparted the necessity of such high computation system these days [1]. However, this architecture has its own challenges such as load balancing, resource utilization, intercommunication of processor, and process synchronization. These challenges can be dealt with help of judicious scheduling algorithms. In recent years, power management of such systems has also become a concern issue for research fraternity. This revolution is result of the rapid increase of number of users of computer, laptops, and mobile phones day by day. On the motherboard of portable devices, processors are found to be the most energy consuming component following the memory units and other components [2]. Adequate energy saving in these computing infrastructures benefit many monetary and environmental aspects such as less electricity consumption, heat dissipation, maintenance cost, greenhouse emission, and more reliability. Several power management techniques are suggested at circuits, architectures, system software, and applications level [3].
Dynamic and static energy are two main sources of energy dissipation in processors. Dynamic energy dissipation occurs due the switching of the transistor at run time. Static energy dissipation occurs due to the continuous flow of current in gates and also when a transistor switches [4]. With the advancement of the technology, chip miniaturization, improved idle states, fast memory cache, and increase in number of transistors, the static energy dissipation has dominated dynamic power dissipation [5]. Hence, it’s important to focus both static as well as dynamic energy consumption while scheduling.
In this paper, a new energy-aware scheduling algorithm is proposed for directed acyclic graph (DAG) for heterogeneous computing platforms that judiciously builds the energy-aware clusters to reduce dynamic energy consumption. Further, the dynamic power management (DPM) techniques are made use of by exploiting three low power states for the available idle slots to reduce static energy consumption. The main contributions of the proposed energy-aware clustering task scheduling algorithm with dynamic power management (CDPM) are as:
As clustering scheduling heuristics are helpful in reducing makespan, we focus on reducing energy consumption for such heuristics. We focus on reducing static energy consumption by exploiting DPM with clustering heuristics besides reducing dynamic energy consumption. The different low power states of a processor are considered while utilizing DPM technique along with context switching overheads. The target machine model undertaken is heterogeneous computing system.
The rest of the paper is organized as follows: In Section 2, related work is presented from literature. Section 3 discusses task model, a machine model and an energy consumption model that are adopted in this paper. Next Section 4 presents proposed task scheduling strategy. Simulation platform and results for random and regular graph suites are stated in Section 5. Finally, Section 6 presents the conclusions.
DAG scheduling is to map tasks onto machines and order their execution such that task precedence requirements are satisfied with minimum makespan. In general, the scheduling can be performed in two ways: online scheduling and offline scheduling. Former is performed at run time where the behavior of the task may or may not be known. In the latter, scheduling is performed at compile time, knowing the behavior of upcoming tasks thus creates better schedules with lesser scheduling overhead than the former one. Task scheduling problem is NP-complete [7, 13] and many heuristics based scheduling algorithms list, cluster, and duplication are given in literature [23, 24]. The idea of list based algorithms is to assign rank to the tasks and then map them to the processors. Cluster based algorithms involve grouping of tasks in clusters such that intercommunication among processors is minimized. Duplication based algorithms duplicate the predecessor of a task on many processor to get shorter makespan, thereby eliminating unnecessary communication delay among processors.
Dynamic voltage and frequency scaling (DVFS) and dynamic power management (DPM) are two basic techniques used for energy management. DVFS enables the processor to run at different speeds by scaling the voltage and frequency level. The energy consumed by the processor is directly proportional to the speed of the processor. Hence, more the processor run at low speed less will be the energy consumed [3]. The challenge in this technique is to calculate the adequate voltage at which a task is to execute. If for a certain slack time, a task is executed at lower voltage it may finishes late and can increase makespan [2]. Recently, incorporation of DVFS technique on existing list based [6, 7, 8] and duplication based [9, 10] scheduling heuristics have been targeted for heterogeneous platforms. Some clustering based heuristics also have been suggested to incorporate with DVFS but they focused homogenous platforms [11, 12]. In actual, DVFS technique requires special hardware build with complementary metal-oxide-semiconductor (CMOS) technology that incurs extra cost and scaling the voltage and frequency level has negative impact on reliability of processor [13].
Another DPM technique dynamically reconfigures an electronic system to reduce the number of active components and reduces static energy consumption. For example, in laptops a timeout policy turns off some components after a fixed inactivity to save the energy. However, state transition from active to idle, accompany some time units to turn off. To retain the active state from an idle state some energy and time penalty is involved and is called context switching overhead in this paper. The challenge lying here is that a processor cannot be turned off for all the idle periods because this penalty will have to be considered for every idle to active transition of the processor. For instance, Ma et al. focused clustering and duplication heuristics to schedule tasks on homogenous platforms and used time threshold for idle periods to deploy DPM technique [14].
In above works, the scheduling algorithms that were originally designed to reduce makespan are incorporated with DVFS and DPM techniques further to make them energy efficient. On other hand, Mei et al. focused to design duplication based energy-aware algorithms that duplicates the task on basis of both time and energy parameters [15]. Kaur et al. also proposed duplication controlled energy efficient scheduling and further deploy DPM technique to reduce static energy consumption [16, 18]. Some researchers have proposed to reduce the number of processor required to finish the tasks which is a good strategy to reduce the total energy consumption [17, 19]. Many researchers have proposed multi-objective scheduling algorithms including energy as one of the objectives [20, 21, 22]. However, these techniques are not suitable for precedence constrained parallel applications.
From the above works, majority the researchers have targeted list and duplication based energy-aware algorithm and cluster based heuristics are less focused. Further, the suggested works have taken homogeneous platform as target machine model. As heterogeneous systems are suitable to execute computation extensive applications, we have considered such platform in this paper. In addition, researchers in past years have focused more to reduce dynamic energy consumption. With chip miniaturization static energy consumption is surpassing dynamic energy consumption and DVFS is becoming less efficient, hence need to be addressed. Therefore, DPM technique with several low power states is considered to reduce static energy consumption besides dynamic energy consumption. Keeping in view the above stated issues, a cluster heuristic based with DPM method is proposed that reduce dynamic as well as static power consumption for heterogeneous computing system (HCS).
Models
This section presents the formal description and the unifying notation for processor, task and energy consumption models. Table 1 enlists the notations used in the proposed CDPM algorithm.
Notations used in the proposed CDPM algorithm
Notations used in the proposed CDPM algorithm
The target system has fully connected processors and the communication between the processors is done via message transfer. Let
Task model
A parallel application is decomposed into subtasks and dependencies among subtasks can be represented in the form of a directed acyclic graph (DAG). To execute the application correctly, the order of execution of tasks should be according to the precedence constraints. A DAG denoted by
Energy model
The complementary metal-oxide-semiconductor (CMOS) based processor has two components responsible for its power dissipation (i) dynamic and (ii) idle power dissipation. For
where working power dissipation
On other hand, idle energy consumed on processor
where idle power dissipation
Thus total power dissipation
In this paper, an energy-aware clustering task scheduling algorithm with dynamic power management (CDPM) is developed. The framework of proposed CDPM algorithm is depicted in Fig. 1, and it comprises in three phases: (i) tasks clustering phase; (ii) cluster merging phase; (iii) cluster mapping and DPM phase. The three phases are illustrated in the subsequent section using the DAG as shown in Fig. 2.
Framework for energy-aware clustering scheduling of directed acyclic graph (DAG) applications on heterogeneous computing system (HCS).
This phase generates the initial set of clusters to reduce the communication cost. First, the tasks of parallel application are ranked on the basis of bottom level
where
Computation cost on three processors with
Then, the earliest start time
where
Next, two parameters: favourite processor and favourite predecessor which are used to generate the initial clusters are calculated. The favourite processor
The favourite predecessor
Table 3 shows the calculated values of
Values of different parameters for cluster generation
A sample application represented as a task graph [23].
This phase attempts to reduce dynamic energy by reducing the number of processors and number of initial clusters to be used in scheduling. Number of processors to be reduced is based on the maximum parallelism of the task graph using the method described in precedence-constrained tasks algorithm (PASTA) [17] and comes out to be 5 for given DAG.
In proposed CDPM algorithm, processor reduction is performed to reduce number of clusters generated in phase 1. For this, two parameters namely computation score
Next, processors are arranged in ascending order of
Calculated values of
The number of initial cluster obtained in phase 1 and number of processor selected in this phase could be different. In case, the number of initial clusters obtained and the reduced number of processors is less or equal then cluster merging is not required. On other hand, when number of initial clusters is greater than the reduced number of processors, then the extra clusters occurring at the end of the sorted set of initial cluster are required to be merged. Further, the two modified parameters
where critical path
Now, for all task
Values of different parameters for cluster combination
CDPM performs energy-aware clustering on the basis of energy score
where
(a)
Furthermore, DPM is applied to reduce static energy consumption on processor
DPM deployment in the idle state of a processor.
Specifically, a processor cannot transit to low power state if
Thus, the idle energy consumed on all processors
Typically, the main performance metrics considered are makespan and energy consumption for all simulation results. The energy consumption and makespan of proposed algorithm CDPM against other referenced algorithm are defined as:
Parameters for generating task graphs
The ACP donates the maximum power that a processor consumes for a task which indeed can increase or decrease according to the application. The uniform distribution used to determine working power dissipation (WPD) of any processor
where ACP is selected randomly from the set of four ACP values from Table 7. The variance in the power consumption is set by the different values of the energy heterogeneity factor (
Proposed CDPM algorithm is compared with other relevant scheduling algorithms – heterogeneous earliest finish time (HEFT) [23], and a power-aware solution to scheduling of precedence-constrained tasks algorithm (PASTA) [17]. HEFT is an insertion-based list scheduling algorithm that gives good performance in terms of makespan, but with no consideration of power consumption. PASTA is a list-based energy-aware algorithm that reduces the number of processors to reduce system energy. However, it ignores static power consumption and the impact of communication cost on makespan. In this paper, clustering is integrated with DPM technique to minimize communication cost and energy consumption respectively. In addition, we have considered three low power states (LPS) standby, dormant, and shutdown instead of a single shutdown state [4]. The low power states, break event time, energy penalty, and energy consumption to deploy DPM technique are given in Table 8b. Figure 4 shows Gantt chart of HEFT, PASTA and proposed CDPM algorithms for the given DAG in Fig. 2.
(a) HEFT makespan 
(a) Working and idle power dissipation on three processors; (b) Low power states (LPS) of processor in DPM
PASTA allocates most of its tasks on lowest power consuming processor
This shows significant amount of static energy consumption saved by applying DPM. In comparison to list based HEFT, proposed CDPM saves 35% energy with increase of 14% makespan, and against energy-aware PASTA, it saves 24% energy with 16% decrease in makespan for under taken DAG.
Based on parameters in Table 5, the 40 benchmark task graphs for homogeneous system are simulated to give 1365 different DAGs {with three CCRs, five
The first set of experiment compares the total energy consumption and makespan of all algorithms for all tested CCR values (Fig. 5). Average makespan (Fig. 5a) and average total energy consumption (Fig. 5b) of all algorithms tend to increase as CCR increases. At CCR
(a) Average makespan and (b) total energy consumption for varying CCRs.
The second set of experiment compares the total energy consumption and makespan of the algorithms for all tested task graph sizes (Fig. 6). Average makespan of CDPM lies below PASTA and near to HEFT same as in experiment first (Fig. 6a). Figure 6b shows that energy consumption of CDPM decreases with the increasing number of tasks and outperforms the other two algorithms. For example, average energy saving of CDPM is 3.34%, 6.16%, 4.14%, 32.95%, and 55.43%, for 12, 20, 30, 50, and 100 respectively over PASTA. This is because when task size is small, less number of clusters are formed with little cluster merging. The only possible way to reduce energy consumption is allocating the clusters on low energy processor selected. For large task size graphs, more clusters are generated and more chances of cluster merging. Therefore, for small task size graphs CDPM behaves as that of PASTA but for large task size graphs it saves more energy than PASTA.
(a) Average makespan and (b) total energy consumption for varying task size.
(a) Average makespan and (b) total energy consumption for varying heterogeneities.
(a) Average makespan and (b) total energy consumption for varying number of processors.
The third set of experiment is conducted to compare the performance and energy consumption for varying heterogeneities (Fig. 7). At each heterogeneity parameter
The last set of experiment is performed on the varying number of processors (Fig. 8). The CDPM observes highest energy efficiency (Fig. 8b) regardless of number of processors. Like, for number of processor 3, 7, 10, 20, and 25 energy savings of CDPM over PASTA is 21.66%, 42.08%, 41.63%, 2.58%, and 2.22% respectively. From results, the CDPM tends to behave same like PASTA with increasing number of processors. This is because the selection of the reduced processor is same for both, but energy saving clusters generated by proposed CDPM aid more energy saving than PASTA.
To evaluate the performance of CDPM, we have used the real world task graphs of Fast Fourier Transform (FFT). The task graph contained
(a) Average makespan and (b) total energy consumption of FFT for varying CCRs.
(a) Average makespan and (b) total energy consumption of FFT for varying number of processors.
With the increasing number of processors, CDPM can obtain greater percentage of energy saving compared with PASTA and HEFT. For example, when the tasks are scheduled on three processors, the energy saving of CDPM compared with PASTA is 27.51%, and up to 55.79% when the number of processors is 10. Table 8 shows with the increasing CCR, CDPM can reduce more energy consumption compared with PASTA and HEFT algorithm. For example, at CCR
Average results of randomly generated applications and FFT
This paper worked on clustering scheduling approach to reduce static and dynamic energy consumption of precedence constrained tasks on heterogeneous computing platforms. As the clustering heuristics gives reduced makespan of an application program, these are made energy efficient in this paper. The proposed energy-aware clustering task scheduling algorithm with dynamic power management (CDPM) judiciously allocates the clusters on the processors such that dynamic energy consumption reduction is achieved. For this, the proposed algorithm builds clusters on the basis of favourite processor and favourite predecessor selection while keeping an eye on makespan and dynamic energy consumption. Further, DPM technique was exploited to the processor for the available idle periods with one of the feasible low power state (shutdown, dormant, standby) to reduce static energy consumption. The potency of CDPM algorithm is compared with well-known list-based HEFT and energy-aware PASTA algorithms with respect to number of processors, CCRs, heterogeneity, and number of tasks. For an average of random and regular graphs, CDPM gives better energy reduction of 73.5% and 47.5% on communication intensive applications against HEFT and PASTA algorithm respectively. For computation intensive applications, CDPM gives energy reduction with 65% and 20% against HEFT and PASTA algorithm respectively. Simulation results conducted on both random graphs and regular graph revealed that CDPM significantly reduces energy consumption, with results being more superior for communication intensive graphs.
Footnotes
Authors’ Bios
