Abstract
In contemporary and future embedded as well as high-performance microprocessors, power consumption is one of the most important design considerations. Because in current technologies, the dynamic power consumption dominates the static power consumption, voltage scaling is an effective technique to reduce the power consumption. In multiprocessor systems, an efficient scheduling of sequential and parallel tasks onto the processors is known to be NP- Hard problem. In this paper, the problem of minimizing schedule length with energy consumption constraint and the problem of minimizing energy consumption with schedule length constraint on homogeneous and heterogeneous multiprocessor computers through independent sequential and parallel tasks are proposed. These problems emphasize the tradeoff between power and performance and are defined such that the power-performance product is optimized by fixing one factor and minimizing the other and vice versa. The performances of the proposed algorithm with optimal solutions are validated using Discrete Particle Swarm Optimization (DPSO). The proposed algorithms achieve 47.5% of power savings and 45.5% of energy saving with 23.5% increased schedule length when the processors operate its maximum frequency.
Keywords
Introduction
In recent years, processor performance has increased at the expense of drastically increased power consumption. On the one hand, such increased power consumption decreases the lifetime of battery operated systems, such as hand-held mobile systems or remote solar explorers. On the other hand, increased power consumption generates more heat, which causes heat dissipation a problem which requires more expensive packaging and cooling technology. Further this problem decreases reliabilityof the systems that have many processors. To reduce processor power consumption, many hardware techniques have been proposed, such as shutting down unused parts or reducing the power level of non-fully utilized functional. Nowadays, most of the processors have multiple supply voltages which have chosen by the operating frequency of the processor. A modern high-performance computing system normally consists of heterogeneous computing and communication resources, including heterogeneous processors, heterogeneous memories, and heterogeneous communication interconnections. The energy consumption problem in heterogeneous multiprocessor systems has also become more and more important and received extensive attention as green computing becomes a new trend. The energy consumption and schedule length models for homogeneous Dynamic Voltage Scaling (DVS) enabled multiprocessor systems for parallel tasks were discussed [1]. The concept of Particle Swarm Optimization (PSO) was applied to find the power supply among different power supplies. The same concept is extended to a heterogeneous DVS enabled multiprocessor systems using Discrete PSO (DPSO) in this paper. This problem essentially relaxes the rule in the previous problem that all the processors must be identical, and instead assumes that each processor can take a differing amount of time to process any given task [2]. The objectives of heterogeneous task scheduling are to map tasks onto processors and to order their executions, so that the task precedence constraints are satisfied and other performance and resource requirements can be met. It is well known that heterogeneous task scheduling problem with resource constraints is NP-hard. As more and more heterogeneous processors become available, the same type of operations can be processed by different processors with different execution time and different energy consumption.
This paper is organized as follows: Section 2 discusses background and related work. The problem statement and the proposed energy and schedule length models are discussed in Section 3. Section 4 describes the detailed tasks scheduling concepts using discrete particle swarm optimization. Section 5 presents the results and analysis of simulation. The conclusions and future work are stated in Section 6.
Related work
The problem of scheduling applications to meet all deadlines while reducing the processor voltage (and frequency) as much as possible was first addressed on uniprocessor platforms. Nowadays, there is an abundant literature covering numerous task and platform models, including multiprocessor platform models. An optimal integer linear programming method [3] has proposed to solve energy-aware heterogeneous data allocation and task scheduling on various multiprocessor systems for real-time applications problem. The energy-aware scheduling algorithm with reduced task duplication known as Energy-Aware Scheduling by Minimizing Duplication (EAMD) [4], which takes the energy consumption as well as the length of the task of an applications were considered in multiprocessor environment. The same concept has been extended as a resource-aware scheduling algorithm [7], which searches and deletes unnecessary task duplications dynamically in the process of scheduling. The energy consumption reduction for multicore server processors by using the technique of workload dependent dynamic power management has been proposed in recent years. The power-aware scheduling of sequential tasks [5, 11] with precedence constraints on multiprocessor computers were discussed to reduce the power consumption of the multiprocessor. The lightly loaded multicore processors [8] have less processing cores than running tasks and have sufficient energetic voltage and frequency scaling capability to reduce the energy consumption of multicore processors. Apart from power consumption, the reliability-driven task setting up scheme for multiprocessor real-time embedded systems have analyzed under stochastic fault occurrences [6]. The Reliability Maximization with Energy Constraint (RMEC) algorithm [9] has been proposed, which contains three important phases, including task priority establishment, frequency selection, and processor assignment. An Energy Aware Scheduling algorithm with Equalized Frequency called EASEF [10] has proposed for parallel applications on various computing systems. The same concept extended as energy aware list-based scheduling algorithm called EALS [12] used for parallel applications in the context of service level agreement (SLA) on cloud data centers. A real-time and energy-efficient resource scheduling and optimization framework [13] has proposed to allocate the tasks by utilizing an energy-efficient heuristic and a critical path scheduling mechanism subject to the architectural requirements. The heuristic schedule algorithms [14] for task graphs on a number of processors were discussed to meet the given deadline, but at the same time minimize the power consumption. A new variant of continuous Particle Swarm Optimization (PSO) algorithm named Discrete PSO [15] has proposed to solve the bi-objective task scheduling problem in cloud which performs the smallest position value (SPV) rule [16] based PSO technique. The main idea of the modified PSO [17] was that the tasks were allocated on the available resources to minimize the execution time in addition to the computation cost.
Problem Definition
Power dissipation and circuit delays in CMOS can be accurately modeled by simple equations, even for complex microprocessor circuits. CMOS circuits have both static and dynamic power dissipations; however, the dominant component is dynamic power consumption P which is directly proportional to fv2, where f is the clock frequency and v is the supply voltage. Since f is directly proportional to v and processor speed [5] s is directly proportional to f, power consumption of the CMOS processor P is directly proportional to s3. For high supply voltages occurring when carrier velocity saturates, the frequency f is directly proportional to v γ which implies that voltage is directly proportional to and power consumption P is directly proportional to f and is also directly proportional to s α , where α is the variation factor of the processor and has the constraint [5] that . It is clear that linear change in supply voltage results in at least linear change in processor speed and linear change in processor speed results in at least cubic change in power supply. Assume that, given n independent parallel tasks to be executed on m identical processors. Task i requires π i processors to execute where 1 ≤ i ≤ n, and any π i of the m processors can be allocated to task i. π i is called as the size of task i. It is possible that in executing task i, the π i processors may have different execution requirements (i.e., the numbers of CPU cycles or the numbers of instructions executed on the processors). Let r i represent the maximum execution requirement on the π i processors executing task i. p i represents the power supplied to execute task i. Further, it is assumed that p i is simply , where is the execution speed of task i. The execution time of task i is .
Problem 1: Minimizing schedule length Model
Given n independent parallel tasks with task sizes π1, π2, …, π n and task execution requirements are r1, r2, …, r n the problem of minimizing schedule length with energy consumption constraint E on a multiprocessor is to find the power supplies p1, p2, …, p n and a nonpreemptive schedule of the n parallel tasks on the m processors such that the schedule length is minimized and the total energy consumed does not exceed E. The scheduling problems contain three nontrivial subproblems, namely, task decomposition, task scheduling, and power supplying. Scheduling the tasks is essentially to partition the n tasks into m groups such that each processor executes one group of tasks. Once a partition (schedule) is given, power supplies that minimize the schedule length within energy consumption constraint can be determined. The proposed model considers the case when the n independent parallel tasks have to be scheduled sequentially. In this case, the m heterogeneous processors are treated as one unit, called a cluster, to be allocated to one taskIt is clear that the problem of minimizing schedule length with energy consumption constraint E is simply to find the power supplies p1, p2, …, p n such that the schedule length(T) is minimized as given in (1)
Given n independent parallel tasks with task sizes π1, π2, …, π n and task execution requirements are r1, r2, …, r n the problem of minimizing energy consumption with schedule length constraint T on a heterogeneous multiprocessor computers with different speed of m processors is to find the power supplies p1, p2, …, p n and a nonpreemptive schedule of the n parallel tasks on the m different speed processors such that the total energy consumption is minimized and the schedule length does not exceed T. The problem of minimizing energy consumption with schedule length constraint is simply to find the power supplies p1, p2, …, p n , such that the total energy consumption
For a given partition M1, M2, …, M j of the n tasks into j groups on a heterogeneous multiprocessor computer partitioned into j clusters, the total energy consumption is minimized when task i in group k is executed with power where 1 ≤ ki ≤ j. The minimum energy consumption is
For a given partition M1, M2, …, M j of the n tasks into j groups on a heterogeneous multiprocessor computer partitioned into j clusters, the schedule length is minimized when task i in group k is supplied with power where the ojective function of the DPSO is given as
Particle Swarm Optimization algorithm starts with a population of randomly generated initial solutions called particles (swarm). It is to be noted that the particle structure is taken as a string, which consists of tasks numbers in certain order. The order of tasks in the string represents a sequence. After the swarm is initialized, each potential solution is assigned a velocity randomly. The length of the velocity of each particle v is generated randomly between 0 and n and the corresponding lists of transpositions are generated randomly for each particle. The above formulation permits exchange of tasks in the given order. Each particle keeps track of its improvement and the best objective function value achieved by the individual particles so far is stored as local best(Pb) solution and the overall best objective function achieved by all the particles together so far is stored as the global best solution(Gb). The particle velocity and position are updated continuously in all iterations. The iterative improvement process is continued afterwards to further improve the solution quality.
The details of Discrete PSO algorithm is described below. The algorithm begins with k random particle The step by step procedure for implementing the proposed discrete PSO algorithm is as follows.
Step1: Initialize a swarm Pi with random positions and velocities in the problem space X. Step2: For each particle, evaluate the desired optimization fitness function. Step3: Compare the fitness function with its previous best. If current value is better than previous best, then set previous best equal to current value and P i equal to the current location X i . Step4: Identify the particle in the neighborhood with the best success so far, and assign its index to the variable G. Step5: Apply local search algorithm to all the particles at the end of each iteration and evaluate for the objective function. Step6: Change the velocity and position of the particle. Step7: Loop to step (2) until a criterion is met (usually number of iterations).
The inertia weight called random inertia weight is calculated using the equation
The velocity is a continuous value and our task scheduling is a discrete permutation in PSO algorithm. The Small Position Value (SPV) rule which is borrowed from the random key representation to solve the task assignment can convert the continuous value to discrete permutation. Table 4 shows the solution representation of the particles. The SPV rule transforms a continuous position vector to a dispersed value permutation vector . In order to counting the processing time, the algorithm should map each element of the vector into processor’s vector . The converting operation is defined as the equation (12)
The proposed algorithms were tested through a series of simulation on the heterogeneous multiprocessor environment. It is to be noted that for a given heuristic algorithm, the expected Normalized Schedule Length (NSL) and the expected Normalized Energy Consumption (NEC) are determined by m, n, α, the probability distributions of the and . The are treated as independent and identically distributed (i.i.d.) continuous random variables. The discrete random variables are defined as thefollwing range, 1 ≤ r i ≤ 100 and 1 ≤ π i ≤ 5. The four different speeds of heterogeneous processors are considered in the simulation of multiprocessor environment and assumed that the processing speed of each processor and the processing time of each task are known. Figure 1 shows the processing time of 40 independent parallel tasks while numbers of sub tasks are shown in Figure 2. Table 1 shows the Simulation parameters of Dicrete PSO for obtaining optimal solution. The four different speeds of heterogeneous processors are normalized as 0.55, 0.7, 0.85 and 1. He optimal values of the best solutions throughout the optimization iterations are recorded and all tasks are scheduled in different processors.The simulation results have the following four parts. 1. HOmogeneous Multiprocessor (HOM) with equal completion time 2. Heterogeneous Multiprocessor with Different Completion Time (HMDCT) 3. Heterogeneous Multiprocessor with Equal Completion Time (HMECT) 4. Heterogeneous Multiprocessor (HM) with one non DVS processor.
Homogeneous Multiprocessor with equal completion time has the operating frequency of m processors which are equal and all the sequential (π i = 1) or parallel tasks are scheduled to that processor with equal completion time using the objective funicon (1) and (2). It is observed that, the given 40 independent sequential or parallel tasks are scheduled and the given total amount of energy consumption or schedule length can be equally distributed to the m processors. When the speed factors (α) increases, the expected energy consumption has been achieved while the schedule length of the m homogeneous multiprocessors is varied but all m processors consume equal energy. Similarly when the speed factor (α) increases, the expected schedule length has been achieved while the energy consumption of the m the homogeneous multiprocessors are varied but all m processors complete the tasks in equal time of 450 unit as shown in Figure 3.
Heterogeneous Multiprocessor with different completion time has the operating frequency of m heterogeneous processors which are distinct. Each processor has its own operating frequency. The sequential or parallel tasks are scheduled to that processors and the resulting schedules comprising different completion time 550, 480, 456 and 382 time units are as shown in Figure 4. Heterogeneous Multiprocessor with equal completion time has the operating frequency of the m heterogeneous processors are distinct. Each processor has its own operating frequency. The sequential or parallel tasks are scheduled to that processors and the resulting schedules comprising equal completion time 550 units are as shown in Figure 5.
Heterogeneous Multiprocessor with one non DVS processor has the operating frequency of the m heterogeneous processors are distinct. Each processor has its own operating frequency, but among m processors, one particular processor does not involve the DVS operations where the particular processor operates with maximum speed and resulting schedules comprising different completion time 550, 480, 550 and 550 time units are as shown in Figure 6. The proposed Discrete PSO based algorithm is compared with existing List Scheduling (LS) Algorithm. Shortest Processing Time first (SPT) is a strategy for List Scheduling algorithm in which the tasks are arranged in order of non-decreasing processing time before the application of List Scheduling algorithm. The processor 4 takes 490 units of time to complete the tasks. The proposed Discrete PSO algorithm CPU computation time are compared with existing Genetic Algorithm(GA) and List Scheduling Heuristic (LSH) as shown inn Figure 7. List scheduling is taken because the tasks are arranged as per the precedence relations. The proposed task scheduling method using GA also takes the tasks in their precedence relation sequence.
Table 2 shows the simulation results of Discrete PSO minimizing schedule length with energy consumption constraint for parallel tasks of four different types speed level.It is observed that when the processor operates at maximum speed (α=3), there is no power saving of the homogeneous and heterogeneous multiprocessor systems. It shows that as the value of α increases (speed decreases) the expected energy consumption is achieved by increasing the schedule length. It is also noticed that as the speed of the multiprocessor decreases, the deadline of the tasks are increased to attain the expected energy level. If the processors operate at maximum speed, i.e. (α=3), HMDCT has 50% excessive schedule length and HMECT has 23.5% excessive schedule length compared with HOM with the expected energy consumption of 10000 Joules. Further it is noticed that if the processors operate at minimum speed, i.e. (α=10), HMDCT has 31% excessive schedule length and HMECT has 23.5% excessive schedule length compared with HOM with the expected energy consumption of 10000 Joules.
Table 3 shows the simulation results of Discrete PSO minimizing energy consumption with schedule length constraint for parallel tasks of four different types speed leve usnig the objective function (7) and (8) l.It is observed that when the processor operates at maximum speed (α=3), there is no power saving of the homogeneous and heterogeneous multiprocessor systems. It shows that as the value of increases (speed decreases) the expected schedule is achieved by increasing the energy consumption. It is also noticed that as the speed of the multiprocessor decreases, the energy consumption of the tasks are increased to attain the expected schedule length.Iftheprocessors operates at maximum speed, i.e. (α=3), HMDCT has 22% excessive energy consumption and HMECT has 34.5% excessive energy consumption compared with HOM with the expected schedule length of 1500 seconds. Further it is noticed that if the processors operate at minimum speed, i.e. (α=10), HMDCT has 48% excessive energy consumption and HMECT has 81.4% excessive energy consumption compared with HoM with the expected energy consumption of 10000 Joules.
Conclusion
An energy efficient dynamic voltage scaling with heterogeneous multiprocessor system is discussed in this paper. The problem of minimizing schedule length with energy consumption constraint and the problem of minimizing energy consumption with schedule length constraint on heterogeneous multiprocessor systems are discussed.The concept of Discrete PSO is applied to obtain an energy efficient schedule and to find the optimal power supply among different power supplies. In each iteration, an independent parallel task is selected and remapped to a new processor and new schedule is generated. The evolutionary algorithm calculates the objective function according to the schedule and adjusts the power supply of the processor while meeting the constraints.The simulation results emphasize the tradeoff between power-performance product is optimized by fixing one factor and minimizing the other in DVS enabled multiprocessor systems.The proposed algorithm achieves 47.5% and 32% of power savings for scheduling sequential and parallel tasks to the processors respectively and also 45.5% of energy saving are achieved for scheduling both sequential or parallel tasks to the processors. It also achives when the processors operate at maximum speed, i.e. (α=3), HMDCT has 50% excessive schedule length and HMECT has 23.5% excessive schedule length compared with HOM with the expected energy consumption of 10000 Joules. Further it is noticed that if the processors operate at minimum speed, i.e. (α=10), HMDCT has 31% excessive schedule length and HMECT has 23.5% excessive schedule length compared with HOM with the expected energy consumption of 10000 Joules. Heat dissipation is a main factor for multiprocessor embedded system. The excessive heat leads to system failure and components reliability. Further, this research can be extended to reduce the heat dissipation within the multicore processors and increase the components reliability of the multiprocessor.
